BenchSci: A Leap Forward with Dagster | Dagster Blog

February 20, 2024 · 3 minute read


Learn about how BenchSci uses Dagster in their journey to expedite drug development.
TéJaun RiChard (@tejaun)

Speed is paramount in the pharmaceutical industry, where winning the race to market depends not only on how quickly an organization moves but also on its ability to innovate, understand, and adapt. Bringing groundbreaking medical treatments from concept to reality presents a significant challenge: managing the flood of data from countless research avenues and clinical trials.


BenchSci stands out in this race, using cutting-edge AI to help bring life-saving medicines to market faster. Its ASCEND technology is a science-first disease biology GenAI platform that acts as an AI assistant for preclinical research and development scientists, helping them discover and evaluate experimental data, optimize research strategies, and mitigate potential risks so that scientific experimentation is both efficient and effective.

In a deep-dive conversation with the BenchSci analytics team, we learned how they're transforming the industry, with Dagster playing a pivotal role in their journey to expedite drug development.

The Analytics Team

BenchSci's analytics unit is a diverse team of six product analysts and data analysts whose work goes beyond mere number-crunching and reporting: they provide critical data insights into the performance of BenchSci's core product suite. Their toolset, which includes Dagster, forms the backbone of a system designed to distill vast data sets into actionable insights.


From Major Challenges at BenchSci to Adopting Dagster

The Pre-Dagster Challenges

Before building on Dagster, BenchSci faced several key challenges that impeded its analytics data operations:

  • A Growing Set of Heterogeneous Tools: While the team's toolset was still relatively straightforward, it was clear that the current orchestration approach would become a bottleneck as the team built out its platform.
  • A Lack of Observability: To properly manage the expanding number of data sources and growing sophistication of their processes, the team needed a single pane of glass that gave them precise insights into the current state of their analytics data pipelines.
  • Cost Management: The team was aware that many processes could be better optimized for spend, but this required both observability and the right framework for pinpoint control and conditional execution.

In response to these challenges, BenchSci sought a solution to bring order and efficiency to their data management processes.

Choosing Dagster for Advanced Data Orchestration

BenchSci ultimately selected Dagster because of its ability to offer:

  • Effective Data Management: Dagster could handle the growing data load while bringing clarity and structure to BenchSci's analytics workflows.
  • Isolated Development Environments: With Dagster Cloud's CI process (branch deployments), each team member can spin up their own ephemeral deployment for sandboxing and testing, then merge changes to production with confidence.
  • Seamless Integration: Dagster could integrate with the team's existing and future tools, providing a single source of truth for their data.
  • Event-Driven Automation: Dagster's core abstraction, the Software-defined Asset, represents each data product (whether a database table, a report, or an ML model) and was crucial for managing modern data workflows. This aspect of Dagster was particularly appealing because it aligned with BenchSci's vision for efficient and responsive data handling.
  • Ease of Deployment: Compared to other solutions, deploying Dagster was much easier and resulted in a more stable and easier-to-manage setup. Dependencies in other solutions they evaluated caused major headaches and lost time for the team.

Each team member works in their own dev environment. With Dagster's setup, they can simply clone tables from production into their dev datasets, which saves the team a lot of time and compute costs.
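As a rough illustration of what cloning production tables into a per-developer dataset can look like, here is a minimal sketch that generates table-clone DDL. The post does not say which warehouse BenchSci uses or how their Dagster setup triggers the clone, so everything here is an assumption: the `CREATE TABLE ... CLONE ...` syntax (accepted by warehouses such as BigQuery and Snowflake), the `prod` dataset, the `dev_<username>` naming scheme, and the helper names `dev_dataset_for` and `clone_statements` are all hypothetical.

```python
# Hypothetical sketch: generate table-clone DDL that copies selected production
# tables into a per-developer dev dataset. Assumptions (not from the post):
# a warehouse that supports "CREATE TABLE ... CLONE ..." (e.g. BigQuery or
# Snowflake), a production dataset named "prod", and dev datasets named
# "dev_<username>".


def dev_dataset_for(username: str) -> str:
    """Map a team member's username to their (hypothetical) dev dataset name."""
    # Replace anything that isn't a letter or digit so the name is a safe identifier.
    safe = "".join(ch if ch.isalnum() else "_" for ch in username.lower())
    return f"dev_{safe}"


def clone_statements(tables: list[str], username: str, prod_dataset: str = "prod") -> list[str]:
    """Build one CLONE statement per production table for the given developer."""
    dev_dataset = dev_dataset_for(username)
    return [
        f"CREATE TABLE {dev_dataset}.{table} CLONE {prod_dataset}.{table};"
        for table in tables
    ]


if __name__ == "__main__":
    for stmt in clone_statements(["daily_usage", "accounts"], "jane.doe"):
        print(stmt)
```

The design choice being illustrated is that a clone reuses production data instead of recomputing it in each dev environment; in warehouses like these, clones are typically created without copying the underlying data up front.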

If an error does occur upstream, the team now notifies stakeholders of its impact on the affected dashboards.

Strategic Decision: Dagster Over Other Options

BenchSci's choice of Dagster over the alternatives was a strategic one. Specifically, BenchSci valued Dagster's asset-based approach to data and its suitability for event-driven automation, logging, and retries in data-rich environments. This approach allowed BenchSci to pinpoint areas of redundant compute and eliminate the associated costs.

Dagster's Impact on BenchSci

The deployment of Dagster revolutionized BenchSci's data analytics operations, streamlining complex workflows and enhancing data reliability.

"Dagster acts as a traffic controller," said the team, emphasizing its role in connecting disparate data sources and analytics tools.

Because the framework encourages pipeline optimization and provides the right observability, running on Dagster allowed the team to achieve a marked reduction in computational costs and data errors. The key is materializing assets (i.e., running compute) only when there is a clear benefit to doing so. Costs also fell because the team cut back on test runs, full-batch replication, and processing of poor-quality data.
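The principle of materializing an asset only when there is a clear benefit can be illustrated with a toy "skip if unchanged" check: fingerprint an asset's upstream inputs and rerun its computation only when that fingerprint changes. This is a minimal pure-Python sketch under our own assumptions. It is not Dagster's API, and the post does not describe the exact policies BenchSci uses; the function names, the in-memory `cache` (standing in for metadata an orchestrator would persist), and the `daily_active_users` example asset are hypothetical.

```python
import hashlib
import json
from typing import Any, Callable

# Toy "skip if unchanged" materialization (NOT Dagster's API). The in-memory
# cache maps asset name -> (input fingerprint, last computed value) and stands
# in for metadata an orchestrator would persist between runs.
Cache = dict[str, tuple[str, Any]]


def fingerprint(inputs: dict[str, Any]) -> str:
    """Stable SHA-256 hash of an asset's upstream inputs (must be JSON-serializable)."""
    payload = json.dumps(inputs, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def materialize_if_stale(
    name: str,
    inputs: dict[str, Any],
    compute: Callable[..., Any],
    cache: Cache,
) -> tuple[Any, bool]:
    """Run `compute(**inputs)` only if the inputs changed since the last run.

    Returns (value, ran), where `ran` is False when the cached value was reused.
    """
    fp = fingerprint(inputs)
    cached = cache.get(name)
    if cached is not None and cached[0] == fp:
        return cached[1], False  # upstream unchanged: no compute spent
    value = compute(**inputs)
    cache[name] = (fp, value)
    return value, True


def daily_active_users(events: list[dict[str, str]]) -> int:
    """Hypothetical example asset: count distinct users in a batch of events."""
    return len({e["user"] for e in events})


if __name__ == "__main__":
    cache: Cache = {}
    events = [{"user": "a"}, {"user": "b"}, {"user": "a"}]
    print(materialize_if_stale("daily_active_users", {"events": events}, daily_active_users, cache))  # (2, True)
    print(materialize_if_stale("daily_active_users", {"events": events}, daily_active_users, cache))  # (2, False)
    events.append({"user": "c"})
    print(materialize_if_stale("daily_active_users", {"events": events}, daily_active_users, cache))  # (3, True)
```

Hashing the inputs keeps the skip decision cheap relative to the computation it guards. In a real deployment, the comparison would more plausibly rely on source-data versions or partition metadata than on hashing entire datasets.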

Insights into how the data platform was performing further enabled BenchSci to tackle the complexity of its data ecosystem head-on.

By integrating and managing data from diverse sources through a single, coherent orchestration platform, Dagster gave BenchSci a comprehensive, cataloged view of its data assets.

Analytics As A Product

Dagster's robust data orchestration greatly supported BenchSci's transition to treating analytics as a product, enabling the analytics team to derive actionable insights from platform usage to inform business decisions.

“We’ve been focused in the last year on ‘analytics as a product’. We can serve the teams with a hands-off approach, where stakeholders can help themselves. We break away from handling request after request. Now we can just let Dagster run, and only step in when we need to,” says the team.

With plans to integrate Dagster's capabilities further into its analytics framework, the team's vision for a data-centric, agile, and interconnected future is well on its way.

Conclusion

BenchSci's deployment of Dagster highlights an essential trend in modern business: the critical role of advanced data orchestration in meeting today's complex demands. Their success story shows how Dagster enables organizations to swiftly understand and trust their data, adapt to new challenges, and monetize their data assets at a time when the strategic use of data is critical to business success.



We're always happy to hear your feedback, so please reach out to us! If you have any questions, ask them in the Dagster community Slack or start a GitHub discussion. If you run into any bugs, let us know with a GitHub issue. And if you're interested in working with us, check out our open roles!



Filed under: User Story