Building a Data Mesh with Starburst

Explore how data mesh architecture with Starburst enables decentralized data access without requiring a traditional data lake or data warehouse.
Steve Novak
Vice President

Data mesh is a decentralized approach to data that lets end users access data where it lives, without first moving it into a data lake or data warehouse. Domain-specific teams manage and serve data as a product to be consumed by others. The objective is to let data products be created from virtually any data source while minimizing intervention from data engineers.

Data mesh rests on four principles that support this objective:

  1. Domain-Oriented Ownership
  2. Data as a Product
  3. Self-Service Data Infrastructure
  4. Federated Computational Governance

Starburst can be used to implement a data mesh. The following sections outline how Starburst aligns with each of the four principles.

Domain-Oriented Ownership

In a data mesh, data teams are organized by domain, that is, by subject area. Teams publish data products that other teams can access and use to derive new data products of their own. Starburst’s goal is to let teams spend less effort building infrastructure and data pipelines for serving data products, and more on preparing those products for end users with familiar tools such as SQL.

To achieve this, Starburst provides a large set of connectors that let each domain query data through a SQL interface, wherever the data lives and whatever its format.

Figure 1 shows various example data sources that can be accessed from Starburst.

Figure 1: Example data sources in Starburst

Data as a Product

After you connect a data source, Starburst lets you curate data products from it for other users to access.

Users can browse the published data products as shown in Figure 2.

Figure 2: Example data products created from data sources in Starburst
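
A data product can be thought of as a dataset plus the metadata that lets other teams find and evaluate it. The Python sketch below models that idea as a conceptual illustration only; it is not Starburst's API. The class names `DataProduct` and `DataProductCatalog`, their fields, and the sample entries are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DataProduct:
    """Hypothetical record for one published data product."""
    name: str
    domain: str          # subject area that owns the product
    owner: str           # team accountable for its quality
    description: str
    defining_query: str  # SQL that produces the product from its sources


class DataProductCatalog:
    """Hypothetical in-memory catalog that domains publish to and users browse."""

    def __init__(self):
        self._products = {}

    def publish(self, product: DataProduct) -> None:
        # Refuse silent overwrites so each product name has one clear owner.
        if product.name in self._products:
            raise ValueError(f"data product already published: {product.name}")
        self._products[product.name] = product

    def browse(self, domain: Optional[str] = None) -> list:
        # Return all products, or only those owned by the given domain.
        products = self._products.values()
        if domain is not None:
            products = [p for p in products if p.domain == domain]
        return sorted(products, key=lambda p: p.name)


catalog = DataProductCatalog()
catalog.publish(DataProduct(
    name="orders_daily",
    domain="sales",
    owner="sales-data-team",
    description="One row per order, refreshed daily.",
    defining_query="SELECT * FROM lake.orders",  # placeholder query
))
catalog.publish(DataProduct(
    name="campaign_results",
    domain="marketing",
    owner="marketing-analytics",
    description="Campaign outcomes by channel.",
    defining_query="SELECT * FROM crm.campaigns",  # placeholder query
))

for product in catalog.browse(domain="sales"):
    print(product.name, "-", product.description)
```

Keeping the owning domain and the defining query on the record reflects the data mesh idea that each product stays tied to the team that produces it.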

Self-Service Data Infrastructure

Starburst’s SQL query interface allows users to discover, understand, and evaluate the trustworthiness of data products. Figure 3 shows an example of using the SQL query interface to query an Amazon S3-based data product.

Figure 3: Querying an S3 file in Starburst

Using the SQL query interface, you can also join data products that live in different technologies. Figure 4, for example, joins a PostgreSQL-based data product with an Amazon S3-based data product on a common field. The result of this join can be considered a new, derived data product that can also be registered in Starburst.

Figure 4: Deriving a new data product from different technologies using Starburst
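
The cross-source join described above can be tried locally with a stand-in. The Python sketch below uses SQLite's `ATTACH DATABASE` rather than Starburst: each attached in-memory database plays the role of one source system, and the query addresses tables by a source prefix. The names `crm` (standing in for PostgreSQL), `lake` (standing in for S3 files), the tables, and the sample rows are all assumptions made for illustration.

```python
import sqlite3


def build_demo_connection() -> sqlite3.Connection:
    """Create one connection with two attached in-memory databases.

    Each attached database is a stand-in for a separate source system;
    nothing here talks to Starburst, PostgreSQL, or S3.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("ATTACH DATABASE ':memory:' AS crm")   # hypothetical PostgreSQL source
    conn.execute("ATTACH DATABASE ':memory:' AS lake")  # hypothetical S3 source
    conn.execute(
        "CREATE TABLE crm.customers (customer_id INTEGER PRIMARY KEY, name TEXT)"
    )
    conn.execute(
        "CREATE TABLE lake.orders (order_id INTEGER, customer_id INTEGER, amount REAL)"
    )
    conn.executemany(
        "INSERT INTO crm.customers VALUES (?, ?)",
        [(1, "Acme"), (2, "Globex")],
    )
    conn.executemany(
        "INSERT INTO lake.orders VALUES (?, ?, ?)",
        [(10, 1, 250.0), (11, 1, 100.0), (12, 2, 75.0)],
    )
    return conn


# One SQL statement joins the two sources on their common field.
DERIVED_PRODUCT_SQL = """
SELECT c.name AS customer, SUM(o.amount) AS total_spend
FROM crm.customers AS c
JOIN lake.orders AS o ON o.customer_id = c.customer_id
GROUP BY c.name
ORDER BY c.name
"""


def customer_spend(conn: sqlite3.Connection) -> list:
    """Run the join in place; no rows are copied between the two sources."""
    return conn.execute(DERIVED_PRODUCT_SQL).fetchall()


conn = build_demo_connection()
for customer, total in customer_spend(conn):
    print(customer, total)
```

The result set is the derived data product: it is computed from both sources at query time instead of being assembled by a separate pipeline.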

Federated Computational Governance

Data mesh proposes a federated model for data governance, in which the domains and the central IT organization share responsibility for meeting governance, risk, and compliance requirements while the domains retain adequate autonomy.

Starburst provides connectors and access to various data governance and data catalog tools such as Collibra and Alation to help users discover, understand, and evaluate the trustworthiness of data products.

Starburst also significantly reduces the need to copy data between systems, because its query engine can read across data sources and can replace or reduce a traditional ETL/ELT pipeline. Every copy requires entitlements to be reapplied, and each reapplication is an opportunity for a mistake that could lead to a data breach. With Starburst that risk is reduced because data is mostly queried at the source, so fewer copies exist. This practice, known as data minimization, makes data privacy, security, and governance more achievable in organizations that adopt Starburst together with data mesh.

Sources

https://www.starburst.io/resources/starburst-data-products/

https://blog.starburst.io/data-mesh-and-starburst-domain-oriented-ownership-architecture

https://blog.starburst.io/data-mesh-and-starburst-data-as-a-product

https://blog.starburst.io/data-mesh-starburst-self-service-data-infrastructure

https://blog.starburst.io/data-mesh-federated-computational-governance
