Built to serve communities in need, Connect 211 provides a platform that standardizes and publishes health and human services data from 211 call centers across the United States. Connect 211 lets service organizations rapidly launch their own search engine, without needing in-house developers, to make their resources searchable, shareable, and profitable. The company needed a data orchestration platform that helped Connect 211’s users efficiently and affordably connect people with the information they need to live better lives in their communities, from finding food or shelter to accessing health care or education.
Key Results
- Issue response time down from days to hours
- “Game-changing” efficiency: from a low-throughput, hands-on system to a high-throughput, low-touch one
- Reduced pipeline troubleshooting time by 80% with improved visibility and error handling
- Affordability unlocked: Lower devops costs and better insight into operations and issues made “the overall cost point much more affordable”
Connect 211’s use case demonstrates how Dagster's improved visibility and error-handling capabilities can dramatically reduce troubleshooting time for complex, multi-source data pipelines. Strong integration with dbt enables comprehensive data quality testing within a unified interface as a part of a well-architected orchestration system that can transform fragmented, legacy data sources into a standardized, high-value data platform.
Complexity, little visibility, and limited resources
When Connect 211 set out to create a user-friendly directory for health and human services resources, they faced a challenging technical problem: the project required connecting to hundreds of data silos, many using outdated technology dating back to the 1960s-1990s.
“We had to connect to all of these different data silos, many of them in various stages of awful technology,” explains Connect 211 CEO Skyler Young. “Then we have a vast orchestration process that connects to all the things, pulls the data in somewhere between hourly and daily. Cleans it, standardizes it, transforms and enriches it, and then publishes to various services.” In other words, he says, data orchestration is Connect 211’s business.
Initially, they had hard-coded pipelines that were brittle and difficult to maintain. The organization then moved to Airflow, which was an upgrade but still required more DevOps support than a bootstrapped startup could manage. “Airflow is just a generally heavy lift for a small team,” Skyler says, describing the Connect 211 team’s struggles:
- Limited visibility: Error logging and visibility were handled separately for every single component because Airflow could not provide enough insight into their data pipelines.
- Diagnostic difficulties: This limited visibility made troubleshooting problems a nightmare, Skyler says. “Because we're orchestrating a bunch of different things in what’s not necessarily a tightly coupled or integrated environment, we had to handle error logging and visibility separately for every component that's being orchestrated.”
- Complexity at scale: As the platform expanded to connect with more data sources, they encountered a series of scaling issues and problems that were taking a long time to diagnose.
- Integration challenges: dbt was their core transformation tool, and they needed better integration between their orchestration platform and transformation processes.
We had a series of scaling issues, and problems that were taking a long time to diagnose. I only had junior level engineers to help me and they were just like, Nope, can't handle it. Gotta do something a little easier. - Skyler Young
Why Connect 211 chose Dagster
Skyler says he looked at Prefect and also briefly considered Mage, which he found to be too new. “Dagster won out pretty easily for being enterprise grade, scalable, and big-feature rich — but also a hell of a lot easier to run and use than Airflow,” he says.
- dbt integration: “The thing that made Dagster a total win for us is the really tight integration with dbt, which is the nucleus of our data orchestration system.”
- Error visibility: "Being able to get the dbt errors in real time, seamlessly integrated with any errors that might be coming from the platform itself, has made a huge difference."
- Industry recommendations: Multiple data analytics engineering teams they consulted with recommended Dagster, saying "yeah, Dagster's really good."
Dagster made data transformation really accessible to us when we were pretty early and inexperienced and not heavily resourced — and now we're the largest data publisher for Health and Human Services data in 20 states now, which is crazy.
Building a data federation platform with Dagster at its core
Skyler and his team constructed a robust data orchestration system that can connect health and human services data sources, standardize “some incredibly variable and fuzzy data,” and publish it to a user-friendly directory. Dagster serves as the backbone of this comprehensive data federation platform:
- Data ingestion pipeline: They implemented a "vast orchestration process that connects to many things, pulls the data in somewhere between hourly and daily," from various data silos, many using wildly outdated technology.
- Transformation workflow: After ingestion, data goes through cleaning, standardization, transformation, and enrichment processes before being published to various services.
- Publishing structure: Their project structure reflects a pub-sub architecture with "writers that push data into the system and process it and prepare it for publishing" and "readers" that "aggregate multiple writers together and then publish them to a specific destination."
- Data quality testing: "We use dbt as our first line of defense for quality checks," explains Young. "I wrote a dbt Schema YAML that runs tests for all the core relationships and constraints on data types and values. In a full dataset, that's something north of 800 tests that get run after we've done our initial transformation."
- Elasticsearch integration: They integrated Elasticsearch to search thousands or tens of thousands of resources by keyword, taxonomy, and geospatial data, automatically filtering on geographic service areas with millisecond response times.
- Geospatial processing: Connect 211 built a specialized service that "parses the usually non-structured or semi-structured service area definitions" from source data and turns them into GeoJSON format, enabling jurisdiction-based searches.
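To make the search side of this concrete, here is a minimal sketch of what a keyword search with a service-area filter could look like as an Elasticsearch query body. It is not Connect 211's actual query: the field names (`name`, `description`, `taxonomy_terms`, `service_area`) and the `build_resource_query` helper are assumptions for illustration. The only piece taken from Elasticsearch itself is the shape of the `bool`, `multi_match`, and `geo_shape` clauses.

```python
from typing import Any, Dict


def build_resource_query(keywords: str, lon: float, lat: float, size: int = 20) -> Dict[str, Any]:
    """Build a query body: keyword relevance plus a service-area filter.

    All field names below are hypothetical placeholders.
    """
    return {
        "size": size,
        "query": {
            "bool": {
                # Scored full-text match across a few assumed text fields.
                "must": [
                    {
                        "multi_match": {
                            "query": keywords,
                            "fields": ["name", "description", "taxonomy_terms"],
                        }
                    }
                ],
                # Non-scoring filter: keep only resources whose service-area
                # shape intersects the searcher's location.
                # GeoJSON coordinate order is [longitude, latitude].
                "filter": [
                    {
                        "geo_shape": {
                            "service_area": {
                                "shape": {"type": "Point", "coordinates": [lon, lat]},
                                "relation": "intersects",
                            }
                        }
                    }
                ],
            }
        },
    }


body = build_resource_query("food pantry", lon=-122.33, lat=47.61)
```

Putting the geographic condition in `filter` rather than `must` keeps it out of relevance scoring, which is one plausible way to apply a jurisdiction check to every search without affecting ranking.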
“We have a vast orchestration process that connects to many things, pulls the data in somewhere between hourly and daily,” Skyler says. “Cleans it, standardizes it, transforms and enriches it, and then publishes to various services. It's becoming quite a large kind of data federation platform.”
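The writer and reader roles described above can be sketched in a few lines of plain Python. This is an illustrative toy, not Connect 211's code or a Dagster API: the `Writer` and `Reader` classes, the `source` tag, and the in-memory `destination` list are all assumptions standing in for real ingestion jobs and publishing targets.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Record = Dict[str, str]


@dataclass
class Writer:
    """Pulls one source and prepares its records for publishing."""
    name: str
    extract: Callable[[], List[Record]]

    def run(self) -> List[Record]:
        # Minimal "standardize" step: trim whitespace in every value and
        # tag each record with the source it came from.
        return [
            {**{key: value.strip() for key, value in rec.items()}, "source": self.name}
            for rec in self.extract()
        ]


@dataclass
class Reader:
    """Aggregates several writers and publishes to one destination."""
    writers: List[Writer]
    destination: List[Record] = field(default_factory=list)

    def publish(self) -> int:
        # Fan-in: collect the prepared output of every writer.
        for writer in self.writers:
            self.destination.extend(writer.run())
        return len(self.destination)


# Two hypothetical sources feeding one published dataset.
county_a = Writer("county_a", lambda: [{"name": " Food Bank "}])
county_b = Writer("county_b", lambda: [{"name": "Shelter"}])
reader = Reader([county_a, county_b])
count = reader.publish()
```

Keeping source-specific cleanup inside each writer means a new data silo can be added without touching the readers that publish the combined result.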
Data orchestration is our primary business, so Dagster has been a total game changer for us.
Reliability, visibility, and operational excellence
In just about a year since its implementation, Connect 211's Dagster-based data platform has transformed how health and human services information is collected, processed, and shared. This foundation enables better access to critical community resources and opens new possibilities for the future.
- Pipeline reliability: Connect 211’s data pipeline reliability has greatly improved since implementing Dagster, Skyler says. “We’re maintaining higher levels of data throughput with less developer time than we used to. We’re also much faster at identifying and fixing errors.”
- Improved observability: Their previous solution required error logging and visibility to be handled separately for every single component. Now, Dagster provides comprehensive visibility across their entire data platform.
- Dramatically reduced troubleshooting: When errors occur, the team can now quickly identify the source and take corrective action, rather than spending hours hunting through disconnected logs.
- Standardized data: Connect 211 now delivers "a very usable stockpile of standardized data that is ready for engineers to consume and use in reality," addressing the challenges of working with inconsistent source data.
- Scalable architecture: Already "the largest data publisher for Health and Human Services data in 20 states," Connect 211 now has a scalable data platform that can continue to grow with them.
- Foundation for innovation: Their standardized data platform provides opportunities for advanced applications, including AI-powered entity resolution, data cleaning, and sentiment analysis for improved search interactions.
"At the end of it, what we have is a very usable stockpile of standardized data that is ready for engineers to consume and use in reality, because we had to build it for that purpose. I'm super proud. It's just, it's remarkable," Skyler says.
One of Connect 211's most significant innovations is their service area processing: "One of our biggest successes was building a service that parses the usually non-structured or semi-structured service area definitions from source data and turns them into GeoJSON reliably and accurately," explains Young. "That's the type of enrichment you need to make this data something developers can actually use."
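As a rough illustration of this kind of enrichment, the sketch below splits a free-text service area into jurisdiction names and maps the ones it recognizes to GeoJSON features. It is a simplified assumption of how such a step might work, not the service Connect 211 built: `parse_service_area` and the `BOUNDARIES` table are hypothetical, and the polygons are placeholder rectangles rather than real county boundaries.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical lookup: jurisdiction name -> exterior ring of [lon, lat] pairs.
# Placeholder rectangles only; a real system would load actual boundaries.
BOUNDARIES: Dict[str, List[List[float]]] = {
    "king county": [[-122.5, 47.1], [-121.1, 47.1], [-121.1, 47.8], [-122.5, 47.8], [-122.5, 47.1]],
    "pierce county": [[-122.9, 46.7], [-121.4, 46.7], [-121.4, 47.3], [-122.9, 47.3], [-122.9, 46.7]],
}


def parse_service_area(raw: str) -> Tuple[dict, List[str]]:
    """Return (GeoJSON FeatureCollection, names that could not be matched)."""
    # Split on common separators: semicolon, comma, slash, or the word "and".
    # Note: this naive split would break names that themselves contain "and".
    parts = re.split(r"[;,/]|\band\b", raw, flags=re.IGNORECASE)
    features: List[dict] = []
    unmatched: List[str] = []
    for part in parts:
        # Normalize whitespace and case before the lookup.
        name = " ".join(part.split()).lower()
        if not name:
            continue
        ring = BOUNDARIES.get(name)
        if ring is None:
            # Keep unknown names so they can be reviewed instead of dropped.
            unmatched.append(name)
            continue
        features.append({
            "type": "Feature",
            "properties": {"name": name},
            "geometry": {"type": "Polygon", "coordinates": [ring]},
        })
    return {"type": "FeatureCollection", "features": features}, unmatched


collection, unmatched = parse_service_area("King County;  Pierce county and Narnia")
```

Returning the unmatched names separately, rather than silently discarding them, gives a pipeline something to surface as a data quality issue.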
Our users are having a huge positive impact in their communities, and we're helping them just do their job easier and better and get supported for the work they do. I love this. It's good work.