Liquid Data Layer
The Clarative team has years of experience building data applications, pipelines, and infrastructure, as well as bringing AI into enterprises and governments. We have been building tight rails around AI for mission-critical use cases in security-minded industries, from manufacturing to defense & intelligence, for 5+ years. Reach out to learn more about the future of AI and data.
Feel free to reach out to info@clarative.ai with any questions.
Introduction
Advancements in AI are going to change how work is done in every industry. Incumbents across the modern data stack are already working to add ChatGPT-like co-pilots to their offerings. These co-pilots will make it easier to write and maintain ETL jobs, structure databases, and build dashboards. All of these improvements will surely be productivity bumps, but they won’t be paradigm shifts.
What the existing solutions are missing is that AI has the power to completely change the data ecosystem. Rather than hard and inflexible data models powered by layers of ETL, AI will empower users to directly use data across organizations in the form they understand, regardless of the domain. Imagine a marketing analyst effortlessly querying real-time sales data from Salesforce, combined with customer feedback from Zendesk and campaign performance metrics from Google Analytics — all without writing a single line of integration code or waiting for an ETL process.
What is the Modern Data Stack?
The Modern Data Stack (MDS) refers to a collection of tools used by organizations to manage, store, transform, and analyze their data. A few examples of layers in the stack are:
Ingestion: Stitch, Fivetran
Storage (Lake/Warehouse): Google BigQuery, Amazon Redshift, Databricks, Snowflake
Transformation: dbt, Databricks
BI: Looker, Mode, Tableau, ThoughtSpot, PowerBI
Governance: DataHub, Amundsen, Atlan
Many of the products above also cover additional pieces of the stack, and some offer a full MDS solution (like Palantir Foundry).
Not included above are the source systems, or systems of record. These are the systems that originate the data, like applications and services. Some examples are application databases like Postgres, ElasticSearch, or Oracle, tracking services like Amplitude or Google Analytics, CRMs like Salesforce or HubSpot, and APIs like Stripe or Shopify.
Additionally, products like Customer Data Platforms (Segment, for example) have cropped up to add vertical-specific views on top of source systems and warehouses. "Reverse ETL" capabilities of these products also allow you to write back to these systems to maintain a single source of truth.
Why the Modern Data Stack?
The MDS exists for scale and convenience. Analytical teams want to answer questions that can only be answered by querying data produced by different siloed data sources. High-scale event data often requires very fast analytical databases to query efficiently. For these reasons, the canonical solution has been to move data from where it's created (applications, services, APIs) to a single place where it can all be queried efficiently together.
Unfortunately, moving enterprise data into a single data warehouse didn't solve every data problem.
Problems with the Modern Data Stack
The MDS solved a lot of problems by centralizing data and governance in organizations, but it wasn’t able to entirely remove complexity. Complexity shifted left to the ETL (ingestion and transformation) of data instead. Now organizations need to build and maintain data pipelines that bring data into a data lake, then transform and clean it inside of a data warehouse. Services like Fivetran exist to manage these pipelines for organizations, but using these managed services means orgs are dependent on the included connectors.
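The extract-transform-load pattern that this complexity shift forces onto data engineering teams can be sketched in a few lines. This is a toy illustration, not Clarative or Fivetran code; the field names and the "paid" filter are invented for the example:

```python
# Toy ETL sketch: raw records are extracted from a source system,
# cleaned into a warehouse schema, and loaded. Every schema change
# upstream means revisiting code like this.

def extract(source_rows):
    """Pull raw records from a source system (here, a list of dicts)."""
    return list(source_rows)

def transform(rows):
    """Clean and reshape records into the warehouse schema."""
    return [
        {"customer_id": r["id"], "revenue_usd": round(r["amount_cents"] / 100, 2)}
        for r in rows
        if r.get("status") == "paid"  # drop records that aren't finalized
    ]

def load(warehouse, rows):
    """Append cleaned records to the 'warehouse' (here, a list)."""
    warehouse.extend(rows)
    return warehouse

raw = [
    {"id": 1, "amount_cents": 1999, "status": "paid"},
    {"id": 2, "amount_cents": 500, "status": "refunded"},
]
warehouse = load([], transform(extract(raw)))
print(warehouse)  # [{'customer_id': 1, 'revenue_usd': 19.99}]
```

Multiply this by every source system, every field, and every downstream consumer, and the maintenance burden becomes clear.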
If organizations are using a Business Intelligence layer that requires a strict semantic model (data model matching the business objects) like Looker or ThoughtSpot, then they also must manage transformations to get data to a BI-ready state. This is ultimately more work for data engineering teams to configure and manage, and that’s if you can get the various stakeholders internally to agree on what such a semantic model is supposed to look like. Solutions like dbt try to make defining this strict semantic model easier, but these solutions don’t make BI any more flexible.
The impact of this is that when a source system has a field or object that stakeholders need that isn’t included in the managed ingest connector, data lake, or BI layer, it can take weeks-to-months of coordination across data engineering, analytical, and business teams to get that new data into a dashboard. Wasn’t putting all the data in one place supposed to solve this back-and-forth?
What Does AI Change?
The MDS exists to get all the data into one place, in a form that BI applications can read and people can understand. It enables point-and-click, self-serve analysis, as well as centralized data governance. So what’s next?
With the rise of Large Language Models (LLMs) and semantic search systems, suddenly we have the power to make data legible and composable across different systems without moving it around and transforming it. This opens up a massive opportunity for ad-hoc analytics. We are introducing the Liquid Data Layer to do just that.
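To make the idea of semantic matching concrete, here is a toy sketch of routing a user's question to the right table across different source systems. It uses bag-of-words cosine similarity as a stand-in for a real LLM embedding model, and the catalog entries are invented for illustration:

```python
# Toy sketch: match a natural-language question to table descriptions
# from different systems, without moving or transforming any data.
# Bag-of-words cosine similarity stands in for LLM embeddings here.
from collections import Counter
from math import sqrt

def vectorize(text):
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# Illustrative table descriptions spanning several source systems.
catalog = {
    "salesforce.opportunities": "sales pipeline deals revenue by account",
    "zendesk.tickets": "customer support feedback tickets satisfaction",
    "ga.campaigns": "marketing campaign performance clicks conversions",
}

question = "which marketing campaign drove the most conversions"
q = vectorize(question)
best = max(catalog, key=lambda name: cosine(q, vectorize(catalog[name])))
print(best)  # ga.campaigns
```

A production system would use learned embeddings and far richer metadata, but the principle is the same: the data stays where it lives, and the semantic layer figures out where to look.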
Liquid Data Layer
The Liquid Data Layer (LDL) is the first and only AI-native data layer. Unlike semantic models or hard ontologies of the past, the LDL is a flexible data model. It combines knowledge of existing data source systems, business nuance, and context about user intent to create a virtual data model, purpose-built to answer a user’s question or to help a user discover data.
The LDL plugs into every part of the Modern Data Stack, from source systems to ETL to BI to governance, to build an AI-native, flexible model of an organization's data ecosystem.
LLMs make it easy to create flashy demo-ware. The data tooling market is flooded with naive attempts to “chat with your data” based on primitive LLM context, or worse, insecure methods like arbitrary code execution or fine-tuning models on internal schemas and data. Without a platform layer like the Liquid Data Layer to provide security, governance, and scalability on top of internal data, applications like SQL/code generation or question & answer flows will fall short of expectations.
The Liquid Data Layer respects centralized governance, whether configured in-platform or pulled in from an existing data catalog or governance solution. This allows users to leverage the entire corpus of data they have access to - and only that data - in any form that makes sense to them and their goal.
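The shape of that guarantee is simple to illustrate: before any AI component sees a candidate data source, the candidates are intersected with what the user is actually allowed to access. The policy model and table names below are invented for the sketch, not Clarative's actual implementation:

```python
# Toy sketch: governance as a filter applied before AI-driven query
# planning. Roles, tables, and the policy structure are illustrative.

ACCESS_POLICY = {
    "analyst": {"ga.campaigns", "zendesk.tickets"},
    "admin": {"ga.campaigns", "zendesk.tickets", "salesforce.opportunities"},
}

def allowed_tables(role, candidates):
    """Return only the candidate tables this role may query."""
    return sorted(set(candidates) & ACCESS_POLICY.get(role, set()))

candidates = ["salesforce.opportunities", "ga.campaigns"]
print(allowed_tables("analyst", candidates))  # ['ga.campaigns']
```

In practice the policy would be pulled from an existing catalog or governance solution rather than defined in code, but the enforcement point is the same: access control happens before generation, not after.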
The LDL leverages AI advancements to self-document existing data, dashboards, transformations, and canonical queries. This enables users to leverage data in new and powerful ways, changing data models in seconds rather than in weeks, all while using business nuance and full knowledge of the existing ecosystem.
The Liquid Data Layer is the basis of Clarative's Data Navigator application, which provides data discovery and natural language question and answer capabilities (like Google for your Data). But the LDL is an open ecosystem that provides an API so other applications can be built on top of its flexible data model. In this way, the LDL future-proofs against changing chat interfaces. The Liquid Data Layer can serve as the data backbone and security layer for applications like ChatGPT or Claude enterprise, or for whatever cutting-edge models come next.
Conclusion
AI is going to transform the way analysts, data scientists, engineers, and stakeholders work with data. The Clarative team has years of experience building data applications, pipelines, and infrastructure, as well as bringing AI into enterprises and governments. Reach out to learn more about the future of AI and data.