Vibe Coding Survival Guide

May 15, 2025

How data engineers can get the most from AI coding

There is a lot of excitement around AI and its potential to enhance workflows. Coupling AI agents with well-crafted frameworks like Dagster can help teams build more scalable data platforms that support complex use cases with far less development effort.

At the same time, handing over too much responsibility for your data platform can feel risky. Data applications are particularly prone to subtle bugs that may not surface until they reach production. That’s why it's essential to follow best practices when building with AI to ensure you maintain high standards, write understandable code, and preserve long-term maintainability.

Start a conversation

You don’t necessarily need to generate code with your first few prompts. AI agents can also be used to explain existing code or clarify unfamiliar concepts. Starting with a more conversational approach helps establish shared context and ensures that both you and the agent are aligned.

This approach not only builds a clearer understanding of the problem space, but also results in better, more relevant code when you do start generating it. Gradually shaping the conversation by asking questions can guide the agent more effectively and avoid confusion or unnecessary complexity. It is much easier to work these finer points out at the beginning before code is involved.

Think in steps

An AI agent can build an entire website from a single prompt. The same is possible in data engineering: you can describe a pipeline and generate much of the implementation automatically. But the output of a prompt that broad can be overwhelming. The sheer volume of code and number of steps involved puts you on the defensive, debugging a codebase you didn’t write and don’t fully understand.

That’s why it’s better to start small and build out. At Dagster, we believe in thinking in assets, and we've found that this approach works especially well when coding alongside an AI agent. Instead of describing your entire pipeline up front, start with a single asset, whether it's a table in a database, a file in the cloud, or a machine learning model. This keeps the scope manageable and helps you iterate more effectively.
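To make this concrete, here is a minimal sketch of what that first asset might look like. The campaigns asset below is hypothetical (it is the same asset the tests in the next section exercise), and the inline rows stand in for real extraction logic:

import dagster as dg


@dg.asset
def campaigns(context) -> dict:
    """A single, well-scoped asset: load campaign records from a source system."""
    # Placeholder extraction; in practice this would query your source system
    rows = [{"id": 1, "name": "spring_launch"}, {"id": 2, "name": "summer_sale"}]
    context.log.info(f"Loaded {len(rows)} campaign rows")
    return {
        "metadata": {"message": "ETL completed"},
        "run_config": {"etl_completed": True},
    }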

Test each step

Another important step in AI-assisted development is incorporating tests. As you work with an AI agent, you’ll want to validate that the generated code behaves as expected. This is where agents can really shine. Once you’re confident in the asset you've developed, a simple prompt can generate tests that cover its core functionality.

Write a test to ensure that the campaigns asset returns the expected output
from dagster import build_op_context

from my_project.assets import campaigns  # hypothetical import path for the asset under test


def test_campaigns_asset():
    # Create a context for invoking the asset directly
    context = build_op_context()

    # Execute the asset
    result = campaigns(context)

    # Verify the result
    assert isinstance(result, dict)
    assert result["metadata"]["message"] == "ETL completed"
    assert result["run_config"]["etl_completed"] is True

Testing also serves another crucial purpose: it helps you better understand the generated code. Before moving on to your next asset, make sure you have relevant tests in place and that you understand what those tests are doing. If a test is unclear, redundant, or targets edge cases you don’t care about, remove it. It's more valuable to maintain a tight, purposeful test suite than one filled with noise.

To deepen your understanding, try modifying each test to make it fail, then revert the change to ensure it passes again. This kind of hands-on validation confirms that the test is meaningful and that you grasp both the behavior it enforces and the code it supports. You don’t need to fully embrace Test-Driven Development, but having solid tests in place will be invaluable as your asset graph grows.

Stay opinionated about tools

There are many ways to interact with data. Each data platform is unique, composed of its own collection of tools. That’s why it’s important to stay opinionated in your prompts and include details about the tools and methods you want to use.

Let’s start with a vague prompt:

Write a Dagster asset that loads data from Postgres into S3.

An AI agent may return working code to accomplish this. However, the code may be more complex than necessary. Because the tooling was unspecified, the agent might default to building custom resources using psycopg2 for Postgres and boto3 for S3, something like the sketch below. The result may be impressive, but anyone with data engineering experience knows that replicating data at scale involves far more nuance than simply connecting a source and a destination.
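A hypothetical sketch of that default output, with placeholder credentials and table names; note that the entire table is pulled into memory:

import io

import boto3
import pandas as pd
import psycopg2
import dagster as dg


@dg.asset
def postgres_to_s3() -> None:
    # Hand-rolled extraction: pull the whole table into memory
    conn = psycopg2.connect(
        host="your-postgres-host",
        dbname="your_db",
        user="your_user",
        password="your_password",
    )
    df = pd.read_sql("SELECT * FROM my_table", conn)
    conn.close()

    # Hand-rolled load: buffer the CSV in memory and upload it
    buffer = io.BytesIO()
    df.to_csv(buffer, index=False)
    boto3.client("s3").put_object(
        Bucket="your-s3-bucket", Key="my_table.csv", Body=buffer.getvalue()
    )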

Now let’s refine the prompt with more context:

Write a Dagster asset that loads data from Postgres into S3 using the Dagster Sling integration.

This time, the AI understands it can use more specific tooling. It walks through installing the dagster-sling package and sets up the Postgres and S3 connections using Sling’s built-in abstractions.

from dagster_sling import SlingConnectionResource, SlingResource, sling_assets

# Define the Postgres source and S3 target connections
# (all values below are placeholders)
sling_resource = SlingResource(
    connections=[
        SlingConnectionResource(
            name="MY_POSTGRES",
            type="postgres",
            host="your-postgres-host",
            port=5432,
            database="your_db",
            user="your_user",
            password="your_password",
        ),
        SlingConnectionResource(
            name="MY_S3",
            type="s3",
            bucket="your-s3-bucket",
            access_key_id="your-access-key",
            secret_access_key="your-secret-key",
        ),
    ]
)

# Replicate the "my_table" table from Postgres to S3 as a CSV
replication_config = {
    "source": "MY_POSTGRES",
    "target": "MY_S3",
    "defaults": {"mode": "full-refresh", "target_options": {"format": "csv"}},
    "streams": {"public.my_table": None},
}


@sling_assets(replication_config=replication_config)
def postgres_to_s3(context, sling: SlingResource):
    # Sling handles the extraction, staging, and load for each stream
    yield from sling.replicate(context=context)

The resulting code achieves the same goal, but leverages Sling to handle more complex aspects of change data capture and replication without reinventing the wheel. When coding with an AI agent, it’s important to remember: you don’t have to build everything from scratch. Providing clarity on tools and intent can drastically simplify the outcome and make the code more aligned with real-world production needs.

Be mindful of types

Even though Python is a dynamically typed language, you may notice that generated code often includes type annotations. This is helpful for readability and tooling support. However, it's important to remember that in plain Python, type annotations are not enforced: they impose no runtime constraints.
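For example, this function violates its own annotation, and plain Python never complains:

def load_orders() -> dict:
    # The annotation promises a dict, but nothing checks it:
    # this returns a list at runtime without any error
    return [{"order_id": 1, "amount": 9.99}]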

That’s not the case in Dagster. If an asset is configured to return a DataFrame, it will only succeed if a DataFrame is actually returned. Enforcing data contracts like this is especially valuable when coding with AI, where you may not fully control or audit every line of code. It’s also critical to provide the right context in your prompt; otherwise, the AI might default to an inappropriate return type. For example, you likely don’t want an asset returning an in-memory DataFrame if you're querying a Snowflake table with billions of rows.
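In Dagster, a return annotation on an asset becomes a runtime type check. A minimal sketch:

import dagster as dg
import pandas as pd


@dg.asset
def orders() -> pd.DataFrame:
    # Dagster coerces this annotation into a type check: returning
    # anything other than a DataFrame fails the materialization
    return pd.DataFrame({"order_id": [1, 2], "amount": [9.99, 24.50]})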

Another best practice is to record the metadata you care about, not just the final output of the asset. Dagster provides powerful flexibility for logging metadata. Including this in your prompt, such as asking the AI to log row counts, schema versions, or timestamps, helps you generate more production-ready, observable pipelines from the start.
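For example, a prompt that asks the AI to log row counts and column names might yield something like this sketch using MaterializeResult (the storage step here is a placeholder):

import dagster as dg
import pandas as pd


@dg.asset
def orders_summary() -> dg.MaterializeResult:
    df = pd.DataFrame({"order_id": [1, 2], "amount": [9.99, 24.50]})
    df.to_parquet("orders_summary.parquet")  # placeholder storage step

    # Record the metadata you care about alongside the materialization
    return dg.MaterializeResult(
        metadata={
            "row_count": len(df),
            "columns": list(df.columns),
        }
    )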

You are the engineer

Do not get lulled into the rhythm of accepting everything the agent provides. Always keep in mind that you are responsible for the code. If the agent suggests installing a Python package you would rather avoid, or writes a function that is difficult to test, you do not have to accept it.

Coding with AI can significantly boost productivity, but that efficiency should never come at the cost of long-term maintainability or code quality. Stay intentional, question outputs, and make sure every decision aligns with your standards and team conventions.

Wrapping Up

We’re all still figuring out the best practices for writing code alongside AI. What’s clear, though, is that this workflow will only become more common. So far, it’s been encouraging to see how well Dagster’s approach to data engineering aligns with AI-assisted code generation, helping make complex workflows more accessible and manageable for our users.

Have feedback or questions? Start a discussion in Slack or GitHub.

