This principle is about empowering business domains to be truly responsible for their data. It’s not just about who physically stores the data, but who is accountable for its entire lifecycle and quality.
- From Centralized ETL to Domain-Managed Pipelines: Instead of a central team building complex ETL (Extract, Transform, Load) pipelines to ingest data from various sources, each domain team is responsible for exposing its operational data as a consumable data product. For instance, the Sales domain owns the process of transforming its CRM data into a “Customer Sales History” data product.
- Embedded Data Expertise: Domain teams will need to develop or acquire data expertise. This doesn’t necessarily mean every developer becomes a data engineer; rather, data literacy increases across the domain, and data specialists are embedded within the teams.
- Contextual Understanding: Only the domain team truly understands the nuances of its data – what a specific field means, how it relates to other data points within the domain, and its business implications. This deep contextual understanding is crucial for creating valuable and accurate data products.
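To make the Sales example concrete, here is a minimal Python sketch of a domain-owned transform. It assumes a hypothetical raw CRM record shape (`CrmOrder`) and a hypothetical output row (`CustomerSalesHistory`); neither comes from any real CRM or platform. The point it illustrates is that the business rule (cancelled orders are not sales) lives in the Sales domain's own code rather than in a central ETL job.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date
from typing import Iterable, List


@dataclass(frozen=True)
class CrmOrder:
    """Hypothetical raw record from the Sales domain's CRM."""
    customer_id: str
    order_date: date
    amount: float
    status: str  # e.g. "closed_won" or "cancelled"


@dataclass(frozen=True)
class CustomerSalesHistory:
    """Hypothetical row of the "Customer Sales History" data product."""
    customer_id: str
    order_count: int
    total_amount: float
    first_order: date
    last_order: date


def build_customer_sales_history(orders: Iterable[CrmOrder]) -> List[CustomerSalesHistory]:
    """Domain-owned transform from raw CRM orders to the published data product."""
    by_customer = defaultdict(list)
    for order in orders:
        # Sales-domain business rule: cancelled orders do not count as sales.
        if order.status == "cancelled":
            continue
        by_customer[order.customer_id].append(order)

    history = []
    # Sort by customer_id so the product's output order is deterministic.
    for customer_id, customer_orders in sorted(by_customer.items()):
        dates = [o.order_date for o in customer_orders]
        history.append(CustomerSalesHistory(
            customer_id=customer_id,
            order_count=len(customer_orders),
            total_amount=round(sum(o.amount for o in customer_orders), 2),
            first_order=min(dates),
            last_order=max(dates),
        ))
    return history


if __name__ == "__main__":
    orders = [
        CrmOrder("c1", date(2024, 1, 5), 100.0, "closed_won"),
        CrmOrder("c1", date(2024, 3, 1), 50.0, "closed_won"),
        CrmOrder("c2", date(2024, 2, 2), 75.0, "cancelled"),
    ]
    for row in build_customer_sales_history(orders):
        print(row)
```

In this example customer `c2` does not appear in the output at all, because its only order was cancelled. Whether that is the right behaviour is exactly the kind of decision the domain team, not a central team, is best placed to make.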
2. Data as a Product: Beyond Raw Data to Curated Assets
Thinking of data as a product is perhaps the most transformative aspect of Data Mesh. It’s about shifting from an “ingest and dump” mentality to one of thoughtful design and continuous improvement.
- Data Product API: A data product should expose its data through well-defined interfaces, much like microservices expose APIs. This could be a streaming API (e.g., Kafka topics), a batch API (e.g., Parquet files in S3 buckets), or a queryable API (e.g., a Presto/Trino endpoint).
- Metadata is Paramount: A good data product is self-describing. It comes with rich metadata that includes:
  - Schema: The structure of the data.
  - Semantics: The meaning of the data (e.g., “customer_id” means a unique identifier for a purchasing entity).
  - Quality Metrics: Indicators of data freshness, completeness, and accuracy.
  - SLOs/SLAs: Service Level Objectives and Agreements for data availability and performance.
  - Ownership and Contact Information: Who owns the product and who to contact for support.
  - Usage Instructions: How to consume the data product.
- Lifecycle Management: Just like software products, data products have a lifecycle: creation, evolution, and deprecation. Domain teams are responsible for managing this lifecycle, communicating changes, and supporting consumers.
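The metadata items listed under “Metadata is Paramount” can be captured in a machine-readable descriptor that travels with the product. The following is a hypothetical Python model, not the schema of any particular Data Mesh platform. The class names, the `output_ports` mapping (one entry per interface kind mentioned under “Data Product API”), and the two quality checks are illustrative assumptions: completeness is taken as the share of non-null declared fields, and freshness is measured against an SLO.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Dict, List, Optional


@dataclass
class FieldSpec:
    name: str
    dtype: str        # schema: the structure of the field
    description: str  # semantics: what the field means


@dataclass
class DataProductDescriptor:
    name: str
    owner: str                     # ownership
    contact: str                   # who to contact for support
    schema: List[FieldSpec]
    output_ports: Dict[str, str]   # interface kind -> location (hypothetical values)
    freshness_slo: timedelta       # SLO: maximum acceptable age of the data
    usage: str                     # usage instructions for consumers

    def completeness(self, rows: List[dict]) -> float:
        """Share of declared fields that are non-null across all rows."""
        if not rows or not self.schema:
            return 0.0
        names = [f.name for f in self.schema]
        filled = sum(1 for row in rows for n in names if row.get(n) is not None)
        return filled / (len(rows) * len(names))

    def is_fresh(self, last_updated: datetime, now: Optional[datetime] = None) -> bool:
        """True if the latest update falls within the freshness SLO."""
        if now is None:
            now = datetime.now(timezone.utc)
        return now - last_updated <= self.freshness_slo

    def quality_report(self, rows: List[dict], last_updated: datetime,
                       now: Optional[datetime] = None) -> dict:
        """Quality metrics a consumer could read alongside the data."""
        return {
            "product": self.name,
            "completeness": self.completeness(rows),
            "fresh": self.is_fresh(last_updated, now),
        }


if __name__ == "__main__":
    descriptor = DataProductDescriptor(
        name="customer_sales_history",
        owner="Sales domain",
        contact="sales-data@example.com",
        schema=[
            FieldSpec("customer_id", "string", "Unique identifier for a purchasing entity"),
            FieldSpec("total_amount", "decimal", "Sum of non-cancelled order amounts"),
        ],
        output_ports={"batch": "s3://example-bucket/customer_sales_history/"},
        freshness_slo=timedelta(hours=24),
        usage="Read the Parquet files under the batch port.",
    )
    rows = [{"customer_id": "c1", "total_amount": 150.0},
            {"customer_id": "c2", "total_amount": None}]
    updated = datetime(2024, 1, 1, tzinfo=timezone.utc)
    print(descriptor.quality_report(rows, updated, now=updated + timedelta(hours=6)))
    # {'product': 'customer_sales_history', 'completeness': 0.75, 'fresh': True}
```

Keeping the quality checks on the descriptor itself means the same document that tells consumers what the data means also tells them how the domain measures its freshness and completeness. Accuracy is left out because it usually needs domain-specific rules.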
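The lifecycle described under “Lifecycle Management” can be modeled as a small state machine. This is a minimal sketch under assumed stage names (`DRAFT`, `PUBLISHED`, `DEPRECATED`, `RETIRED`) and an assumed rule that each republish counts as a new version (evolution). The `notices` list is a hypothetical stand-in for whatever channel a real platform would use to tell consumers about changes.

```python
from enum import Enum
from typing import List


class Stage(Enum):
    DRAFT = "draft"
    PUBLISHED = "published"
    DEPRECATED = "deprecated"
    RETIRED = "retired"


# Assumed transition rules: PUBLISHED -> PUBLISHED is an evolution (new version).
ALLOWED = {
    Stage.DRAFT: {Stage.PUBLISHED},
    Stage.PUBLISHED: {Stage.PUBLISHED, Stage.DEPRECATED},
    Stage.DEPRECATED: {Stage.RETIRED},
    Stage.RETIRED: set(),
}


class DataProductLifecycle:
    def __init__(self, name: str):
        self.name = name
        self.stage = Stage.DRAFT
        self.version = 0
        self.notices: List[str] = []  # stand-in for a consumer notification channel

    def _check(self, target: Stage) -> None:
        """Reject any transition not listed in ALLOWED."""
        if target not in ALLOWED[self.stage]:
            raise ValueError(
                f"{self.name}: cannot move from {self.stage.value} to {target.value}"
            )

    def _notify(self, message: str) -> None:
        self.notices.append(f"[{self.name}] {message}")

    def publish(self, change_note: str) -> None:
        """Create (first release) or evolve (new version of) the product."""
        self._check(Stage.PUBLISHED)
        self.version += 1
        self.stage = Stage.PUBLISHED
        self._notify(f"v{self.version} published: {change_note}")

    def deprecate(self, replacement: str) -> None:
        """Announce deprecation so consumers can migrate before retirement."""
        self._check(Stage.DEPRECATED)
        self.stage = Stage.DEPRECATED
        self._notify(f"deprecated; migrate to {replacement}")

    def retire(self) -> None:
        """Stop serving the product once consumers have migrated."""
        self._check(Stage.RETIRED)
        self.stage = Stage.RETIRED
        self._notify("retired; no longer served")
```

For example, calling `publish` twice leaves the product at version 2. After `deprecate(...)`, any further `publish` raises `ValueError`, and only `retire()` is allowed. An explicit transition table keeps the rules in one place and makes illegal moves fail loudly instead of silently surprising consumers.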