Customers
Case Study: KIPP - Building a Resilient Data Platform with Dagster

Case Study: KIPP - Building a Resilient Data Platform with Dagster

October 23, 2024
Case Study: KIPP - Building a Resilient Data Platform with Dagster

How KIPP’s solo data engineer radically improved KIPP’s ability to leverage data across the organization.

A Platform Approach for Faster Development

The KIPP NJ & Miami schools are part of the nationally recognized “Knowledge Is Power Program” network of free, open-enrollment, college-preparatory public schools. The network, known as KIPP Team & Family, comprises of 19 schools serving over 10,000 students across Newark, Camden, and Miami, FL.

The network faced a challenge familiar to many growing organizations: managing a complex and fragmented data stack. Data is vital at KIPP, informing everything from student progress and attendance to operational finance and HR. However, with different systems spread across multiple regions, scaling data management became a considerable task.

As the sole data engineer for KIPP's Newark, Camden, and Miami regions, Charlie Bini had to find a solution that could support growth, streamline data operations, and provide much-needed visibility into the data stack.

Charlie Bini, Data Engineer at KIPP
Charlie Bini - Data Engineer at KIPP

This is the story of how KIPP Team & Family Schools moved from a disparate collection of tools to a unified data platform powered by Dagster.              

In doing so, Charlie changed how the entire data team works, providing better observability and alerting, a faster development cycle, and easier self-service for team members. Processes that took ”two weeks to a month to develop” can now be done “in a couple of days," says Charlie.      

People love the idea of what we're doing, and the vision of a cohesive data platform is something that resonates with everyone.

The Problem: A Patchwork of Tools

Back in 2017, KIPP's data infrastructure was a tangle of multiple technologies—Python scripts, cron jobs, stored procedures, Fivetran, and even browser automation with Selenium. These tools were stitched together to extract data from various systems, including SFTP folders, APIs, and databases. Fivetran acted as the hub, ingesting data and moving it into a SQL Server for reporting. While it worked, the setup was brittle. If one step took longer than expected, the whole process would fail, and the data quality issues would be a constant challenge.

KIPP's old data architecture before building a data platform on Dagster
     KIPP's old data architecture before adopting a true platform approach.    

"Everything was very synchronous," said Charlie. "If our pull from a student information system took too long, the next process would miss its tick, and we'd end up waiting until the next scheduled run." This lack of resilience not only affected data reliability but also consumed Charlie's time in debugging and firefighting.

It was clear to Charlie and the rest of the team that in order to scale and free up time to take on new challenges, they would first need to overhaul the foundation of their data processes, moving from a patchwork of systems that had evolved organically, to a fully featured data platform where key capabilities such as orchestration, metadata management, end-to-end observability, and data quality could be centralized and unified.

Selecting the Right Orchestration Solution

KIPP experimented briefly with Airflow to improve orchestration, but it didn't provide the experience they were looking for. The complexity and the cost of running Airflow in Google Cloud Composer also made it unsustainable. Charlie knew that to scale, KIPP needed a new approach—one that would bring together all these disparate data processes in a cohesive and manageable way.

In 2021, Charlie began experimenting with Dagster, an emerging orchestrator that promised more visibility and control over data pipelines. With Dagster, KIPP could centralize all its data processes, bringing data extraction, transformation, and orchestration into a unified platform. One of the key benefits was the real-time visibility Dagster provided into pipeline health.

"The visibility into the pipelines is a game changer," Charlie noted. "We know as soon as something fails and why, and it's pretty easy to remedy it." This was a major shift from the old setup, where issues often remained undetected until downstream effects were noticed, such as outdated reports or missing data in dashboards.

Building the Data Platform

The new data stack brought all of KIPP's data operations under a single, cohesive framework:

KIPP's updated data architecture after migrating to a data platform built on Dagster
     KIPP's new data architecture takes a platform approach, orchestrated by Dagster.    
  • Dagster for Orchestration and Cataloging: The core of the new data platform was Dagster. Dagster orchestrated everything—from data extraction from third-party systems to transformations and data loading into the warehouse. The platform also provided clear lineage graphs, which allowed the team to easily track the flow of data from source to report.
  • BigQuery as the Data Warehouse: KIPP centralized its data storage in Google BigQuery, providing a scalable and performant environment for storing and querying data.
  • dbt for Transformations: dbt was introduced to handle transformations within BigQuery, making the entire ELT process transparent and manageable with version control.
  • Containerized Ingestion with Kubernetes: Data ingestion scripts were containerized and run on Kubernetes, making the system more scalable and resilient.

In this new setup, Dagster also manages data quality checks, leveraging asset checks to ensure incoming data conforms to expected schemas. This setup drastically reduced the number of data quality issues, and tickets related to data problems dropped by 30% year over year.

The Impact: Faster Development, Fewer Issues

The shift to Dagster wasn't just about technology—it changed how the entire data team worked. Data pipeline development, which previously took weeks, was reduced to just days. "It used to take two weeks to a month to develop a new integration. Now, with Dagster, we can get a proof of concept done in a couple of days," Charlie explained.

The time savings weren't just in development but also in maintaining data quality. Thanks to Dagster's real-time alerts and lineage tracking, Charlie and the team could respond to issues immediately. Slack alerts provided instant notifications when a job failed, allowing quick triage and resolution, often before downstream users even noticed a problem.

Empowering the Team

Another significant benefit of adopting Dagster was the growth in the team's capabilities. With everything centralized in a Git repository and using pull requests for changes, the analysts at KIPP were able to participate more directly in the data engineering process. This shift brought software engineering best practices—like code reviews and DevOps concepts—to the data team, raising everyone's technical skills.

"Our team has grown skill-wise," Charlie said. "Analysts are now more proficient in SQL, Git, and understanding how the entire data stack works together."

The introduction of Dagster's data catalog feature also provided the team with a single source of truth for data assets. Analysts could trace the lineage of data, understand dependencies, and find documentation, all within Dagster. This made onboarding new team members easier and increased overall productivity.

Conclusion: A Unified, Scalable Data Platform

By adopting Dagster, KIPP transformed its data platform from a fragmented set of ad-hoc tools into a unified, resilient, and scalable solution. The result was a data stack that not only met the organization's growing demands but also empowered the data team to work more efficiently and collaboratively.

Charlie hopes that other KIPP regions will eventually adopt a similar setup, allowing them to standardize data operations across the entire network and reduce redundancy. "People love the idea of what we're doing," he said, "and the vision of a cohesive data platform is something that resonates with everyone."

For other data teams facing similar challenges, KIPP's story serves as an inspiring example of how the right orchestrator can simplify operations, improve data quality, and ultimately help scale data capabilities to support meaningful organizational growth.

Have feedback or questions? Start a discussion in Slack or Github.

Interested in working with us? View our open roles.

Want more content like this? Follow us on LinkedIn.

Dagster Newsletter

Get updates delivered to your inbox

Latest writings

The latest news, technologies, and resources from our team.

Multi-Tenancy for Modern Data Platforms
Webinar

April 7, 2026

Multi-Tenancy for Modern Data Platforms

Learn the patterns, trade-offs, and production-tested strategies for building multi-tenant data platforms with Dagster.

Deep Dive: Building a Cross-Workspace Control Plane for Databricks
Webinar

March 24, 2026

Deep Dive: Building a Cross-Workspace Control Plane for Databricks

Learn how to build a cross-workspace control plane for Databricks using Dagster — connecting multiple workspaces, dbt, and Fivetran into a single observable asset graph with zero code changes to get started.

Dagster Running Dagster: How We Use Compass for AI Analytics
Webinar

February 17, 2026

Dagster Running Dagster: How We Use Compass for AI Analytics

In this Deep Dive, we're joined by Dagster Analytics Lead Anil Maharjan, who demonstrates how our internal team utilizes Compass to drive AI-driven analysis throughout the company.

DataOps with Dagster: A Practical Guide to Building a Reliable Data Platform
DataOps with Dagster: A Practical Guide to Building a Reliable Data Platform
Blog

March 17, 2026

DataOps with Dagster: A Practical Guide to Building a Reliable Data Platform

DataOps is about building a system that provides visibility into what's happening and control over how it behaves

Unlocking the Full Value of Your Databricks
Unlocking the Full Value of Your Databricks
Blog

March 12, 2026

Unlocking the Full Value of Your Databricks

Standardizing on Databricks is a smart strategic move, but consolidation alone does not create a working operating model across teams, tools, and downstream systems. By pairing Databricks and Unity Catalog with Dagster, enterprises can add the coordination layer needed for dependency visibility, end-to-end lineage, and faster, more confident delivery at scale.

Announcing AI Driven Data Engineering
Announcing AI Driven Data Engineering
Blog

March 5, 2026

Announcing AI Driven Data Engineering

AI coding agents are changing how data engineers work. This Dagster University course shows how to build a production-ready ELT pipeline from prompts while learning practical patterns for reliable AI-assisted development.

How Magenta Telekom Built the Unsinkable Data Platform
Case study

February 25, 2026

How Magenta Telekom Built the Unsinkable Data Platform

Magenta Telekom rebuilt its data infrastructure from the ground up with Dagster, cutting developer onboarding from months to a single day and eliminating the shadow IT and manual workflows that had long slowed the business down.

Scaling FinTech: How smava achieved zero downtime with Dagster
Case study

November 25, 2025

Scaling FinTech: How smava achieved zero downtime with Dagster

smava achieved zero downtime and automated the generation of over 1,000 dbt models by migrating to Dagster's, eliminating maintenance overhead and reducing developer onboarding from weeks to 15 minutes.

Zero Incidents, Maximum Velocity: How HIVED achieved 99.9% pipeline reliability with Dagster
Case study

November 18, 2025

Zero Incidents, Maximum Velocity: How HIVED achieved 99.9% pipeline reliability with Dagster

UK logistics company HIVED achieved 99.9% pipeline reliability with zero data incidents over three years by replacing cron-based workflows with Dagster's unified orchestration platform.