Blog
Building Better Analytics Pipelines

Building Better Analytics Pipelines

May 23, 2023
Building Better Analytics Pipelines
Building Better Analytics Pipelines

A recap of our live event on the benefits and techniques for orchestrating analytics pipelines.

As data analytics teams scale their efforts, they rapidly run into challenges.  They are overrun by an increasingly complex collection of systems, code, files, requirements and stakeholders.  They become over-reliant on institutional memory, and see degrading morale as more time is sent fixing things than on building anything new.

Bringing more people onto the team is one option, but it's expensive and does not address the root of the problem: Dealing with growing organizational complexity.  In modern data engineering, it is not 'Big Data' that presents the big problems, it's 'Big Complexity' that is found inside most data-driven organizations.

In and of itself, tooling is not the answer.  But a lack of framework and tooling for managing the critical data assets the team is in charge of producing will make it very hard to navigate organizational complexity, respond to many requests from stakeholders, enable self-service on data assets, and collaborate with other data practitioners across the enterprise.

From this perspective, an orchestrator is the essential platform for scaling your data analytics efforts. It provides a proper development process, observability and testability for your data pipelines, and performance optimization tools such as scheduling and partitioning.

In this article we dive into the benefits of bringing orchestration to your data analytics efforts, and provide a demonstration of the value orchestration will bring.

Managing complexity in data analytics

Brian Kernighan

According to Brian Kernighan, Professor of Computer Science at Princeton University  "Managing complexity is the essence of computer programming."

Managing complexity is important in traditional software engineering; it is equally important in the realm of data management and data engineering.  In many ways, it is even more of a challenge when managing large platforms built around heterogeneous technologies and systems, including on-prem, cloud, SaaS, and/or open-source solutions.

Data platforms can span across multiple compute environments, be coded in different languages, and be operating on many isolated scheduling tools.  Upstream data sources are unpredictable and beyond the control of the downstream consumers.

A typical data analytics platform is a complicated montage of heterogeneous systems.
   A typical data analytics platform is a complicated montage of heterogeneous systems.  

Understanding how these systems interact, who owns them, what their current status is can be challenging. When something breaks, debugging, fixing and recovering from the error can be time-consuming.  Errors are often discovered in production by business owners downstream.

the data assets these fragile, untrusted platforms produce are critical for the efficient performance of the organization.

Data engineers are only too familiar with the stream of frustrated emails or Slack messages asking when the issues will be addressed or the dashboards updated.  They also know how much time is wasted doing data wrangling across many services.  Analytics can be chaos, and engineers can get burnt out, spelunking through thousands of lines of poorly documented and duplicated code, trying to figure out how it all holds together.

And yet, the data assets these fragile, untrusted platforms produce are critical for the efficient performance of the organization.

The role of orchestration in mastering complexity

Data analytics teams are discovering the value an orchestration tool can provide in their work.  Simply put:

  • Orchestration brings order to the codebase: it provides a "factory floor view" of the assets your code produces and manages the metadata for you.
  • Build locally, then ship with confidence: no more pushing to prod, burdensome delays or jumping through hoops.
  • Code becomes testable: You can test your code rapidly and in a totally safe environment.
  • Unified control plane, which provides observability across your platform (including sources and definitions). It allows you to monitor status and troubleshoot issues rapidly.

Most orchestration platforms would claim to deliver some of the above benefits. But Dagster brings a unique declarative asset-centric approach to orchestration.  This greatly simplifies the mental model in data analytics as the outputs of each defined asset can be directly stored and refreshed as a table. You can observe the state of each asset, its metadata, and trace the lineage and dependencies all in one interface.

Dagster fully supports dbt models with a native integration, making it easy to import all of your models, then manage and execute them in the context of the larger platform.

Moving your analytics pipelines to Dagster

If you currently run an analytics data pipeline in Python, with dbt models or other common analytics tools, porting it over to Dagster is typically straightforward.   It is often simply a case of setting up the configuration of the resources that the data pipeline uses and moving the Python functions over, adding the Dagster @asset decorator.  Yuhan Luo shows us how in this section of the event.

In most cases, you can tap into Dagster's free 30-day trial and run your initial pipeline on Dagster+.

Our data analytics showcase

In early May 2023, we invited Pedram Navid, founder of West Marin Data, to hand us (the Elementl team) a typical analytics pipeline he would build for his clients, using Dagster as the orchestrator.

From this example, we ran a Dagster showcase, highlighting the features and techniques demonstrated by Pedram.  Here, you can watch the recording, access the code for this example in our public repo, and read a summary of the key takeaways from our event.

>    Watch the entire Analytics event as broadcast on Youtube on May 10th, 2023.  

Pedram points to the key high-level benefits of orchestration in his work:

  • Build locally and ship confidently to speed up his development cycles with cheap local resources, rapid testing, and iteration.
  • Separate I/O and compute to free himself up from having to plan where the pipeline will ultimately be executed so it can be 'plugged in' to production systems once ready.
  • Think in Assets which is a much more natural mental model in data analytics than focusing on ops.
  • Use selective "materializations" - i.e., create and refresh parts of the data in a very surgical fashion, such as only updating data for the last week, thus speeding up his work and reducing costs.

In his example, Pedram taps into many of Dagster's features, including schedules, integrations, metadata analysis, partitions, and backfills.  Thanks to these, he can build with confidence, deploy with ease and put his data pipelines on autopilot.

For local development, his example uses CSV files, Steampipe, the GitHub API, DuckDB.In production, he easily switches to Google Big Query and taps into HEX for visualisation.

Explore the video and the companion repo, and join us on Slack to find out how to boost your analytics pipeline with Dagster.

Interested in trying Dagster Cloud for Free?

        Enterprise orchestration that puts developer experience first. Serverless or hybrid deployments, native branching, and out-of-the-box CI/CD.        

             Try Dagster Cloud Free for 30 days            

Have feedback or questions? Start a discussion in Slack or Github.

Interested in working with us? View our open roles.

Want more content like this? Follow us on LinkedIn.

Dagster Newsletter

Get updates delivered to your inbox

Latest writings

The latest news, technologies, and resources from our team.

Multi-Tenancy for Modern Data Platforms
Webinar

April 7, 2026

Multi-Tenancy for Modern Data Platforms

Learn the patterns, trade-offs, and production-tested strategies for building multi-tenant data platforms with Dagster.

Deep Dive: Building a Cross-Workspace Control Plane for Databricks
Webinar

March 24, 2026

Deep Dive: Building a Cross-Workspace Control Plane for Databricks

Learn how to build a cross-workspace control plane for Databricks using Dagster — connecting multiple workspaces, dbt, and Fivetran into a single observable asset graph with zero code changes to get started.

Dagster Running Dagster: How We Use Compass for AI Analytics
Webinar

February 17, 2026

Dagster Running Dagster: How We Use Compass for AI Analytics

In this Deep Dive, we're joined by Dagster Analytics Lead Anil Maharjan, who demonstrates how our internal team utilizes Compass to drive AI-driven analysis throughout the company.

Monorepos, the hub-and-spoke model, and Copybara
Monorepos, the hub-and-spoke model, and Copybara
Blog

April 3, 2026

Monorepos, the hub-and-spoke model, and Copybara

How we configure Copybara for bi-directional syncing to enable a hub-and-spoke model for Git repositories

Making Dagster Easier to Contribute to in an AI-Driven World
Making Dagster Easier to Contribute to in an AI-Driven World
Blog

April 1, 2026

Making Dagster Easier to Contribute to in an AI-Driven World

AI has made contributing to open source easier but reviewing contributions is still hard. At Dagster, we’re improving the contributor experience with smarter review tooling, clearer guidelines, and a focus on contributions that are easier to evaluate, merge, and maintain.

DataOps with Dagster: A Practical Guide to Building a Reliable Data Platform
DataOps with Dagster: A Practical Guide to Building a Reliable Data Platform
Blog

March 17, 2026

DataOps with Dagster: A Practical Guide to Building a Reliable Data Platform

DataOps is about building a system that provides visibility into what's happening and control over how it behaves

How Magenta Telekom Built the Unsinkable Data Platform
Case study

February 25, 2026

How Magenta Telekom Built the Unsinkable Data Platform

Magenta Telekom rebuilt its data infrastructure from the ground up with Dagster, cutting developer onboarding from months to a single day and eliminating the shadow IT and manual workflows that had long slowed the business down.

Scaling FinTech: How smava achieved zero downtime with Dagster
Case study

November 25, 2025

Scaling FinTech: How smava achieved zero downtime with Dagster

smava achieved zero downtime and automated the generation of over 1,000 dbt models by migrating to Dagster's, eliminating maintenance overhead and reducing developer onboarding from weeks to 15 minutes.

Zero Incidents, Maximum Velocity: How HIVED achieved 99.9% pipeline reliability with Dagster
Case study

November 18, 2025

Zero Incidents, Maximum Velocity: How HIVED achieved 99.9% pipeline reliability with Dagster

UK logistics company HIVED achieved 99.9% pipeline reliability with zero data incidents over three years by replacing cron-based workflows with Dagster's unified orchestration platform.

Modernize Your Data Platform for the Age of AI
Guide

January 15, 2026

Modernize Your Data Platform for the Age of AI

While 75% of enterprises experiment with AI, traditional data platforms are becoming the biggest bottleneck. Learn how to build a unified control plane that enables AI-driven development, reduces pipeline failures, and cuts complexity.

Download the eBook on how to scale data teams
Guide

November 5, 2025

Download the eBook on how to scale data teams

From a solo data practitioner to an enterprise-wide platform, learn how to build systems that scale with clarity, reliability, and confidence.

Download the e-book primer on how to build data platforms
Guide

February 21, 2025

Download the e-book primer on how to build data platforms

Learn the fundamental concepts to build a data platform in your organization; covering common design patterns for data ingestion and transformation, data modeling strategies, and data quality tips.