Vibe Coding Survival Guide

May 15, 2025

How data engineers can get the most from AI coding

There is a lot of excitement around AI and its potential to enhance workflows. Coupling AI agents with well-crafted frameworks like Dagster can help teams build more scalable data platforms that support complex use cases with far less development effort.

At the same time, handing over too much responsibility for your data platform can feel risky. Data applications are particularly prone to subtle bugs that may not surface until they reach production. That’s why it's essential to follow best practices when building with AI to ensure you maintain high standards, write understandable code, and preserve long-term maintainability.

Start a conversation

You don’t necessarily need to generate code with your first few prompts. AI agents can also be used to explain existing code or clarify unfamiliar concepts. Starting with a more conversational approach helps establish shared context and ensures that both you and the agent are aligned.

This approach not only builds a clearer understanding of the problem space, but also results in better, more relevant code when you do start generating it. Gradually shaping the conversation by asking questions can guide the agent more effectively and avoid confusion or unnecessary complexity. It is much easier to work these finer points out at the beginning before code is involved.

Think in steps

An AI agent can build an entire website from a single prompt. The same is possible in data engineering: you can describe a pipeline and generate much of the implementation automatically. But the output of a prompt that broad can be overwhelming. The sheer volume of code and number of steps involved puts you on the defensive, debugging a codebase you didn’t write and don’t fully understand.

That’s why it’s better to start small and build out. At Dagster, we believe in thinking in assets, and we've found that this approach works especially well when coding alongside an AI agent. Instead of describing your entire pipeline up front, start with a single asset, whether it's a table in a database, a file in the cloud, or a machine learning model. This keeps the scope manageable and helps you iterate more effectively.
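To make this concrete, here is a minimal sketch of what that first asset might look like. The campaigns asset below is hypothetical (it is the same asset the tests in the next section exercise), and the inline rows stand in for real extraction logic:

import dagster as dg


@dg.asset
def campaigns(context) -> dict:
    """A single, well-scoped asset: load campaign records from a source system."""
    # Placeholder extraction; in practice this would query your source system
    rows = [{"id": 1, "name": "spring_launch"}, {"id": 2, "name": "summer_sale"}]
    context.log.info(f"Loaded {len(rows)} campaign rows")
    return {
        "metadata": {"message": "ETL completed"},
        "run_config": {"etl_completed": True},
    }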

Test each step

Another important step in AI-assisted development is incorporating tests. As you work with an AI agent, you’ll want to validate that the generated code behaves as expected. This is where agents can really shine. Once you’re confident in the asset you've developed, a simple prompt can generate tests that cover its core functionality.

Write a test to ensure that the campaigns asset returns the expected output
from dagster import build_op_context

from my_project.assets import campaigns  # hypothetical import path for the asset under test


def test_campaigns_asset():
    # Create a context for invoking the asset directly
    context = build_op_context()

    # Execute the asset
    result = campaigns(context)

    # Verify the result
    assert isinstance(result, dict)
    assert result["metadata"]["message"] == "ETL completed"
    assert result["run_config"]["etl_completed"] is True

Testing also serves another crucial purpose: it helps you better understand the generated code. Before moving on to your next asset, make sure you have relevant tests in place and that you understand what those tests are doing. If a test is unclear, redundant, or targets edge cases you don’t care about, remove it. It's more valuable to maintain a tight, purposeful test suite than one filled with noise.

To deepen your understanding, try modifying each test to make it fail, then revert the change to ensure it passes again. This kind of hands-on validation confirms that the test is meaningful and that you grasp both the behavior it enforces and the code it supports. You don’t need to fully embrace Test-Driven Development, but having solid tests in place will be invaluable as your asset graph grows.

Stay opinionated about tools

There are many ways to interact with data. Each data platform is unique, composed of its own collection of tools. That’s why it’s important to stay opinionated in your prompts and include details about the tools and methods you want to use.

Let’s start with a vague prompt:

Write a Dagster asset that loads data from Postgres into S3.

An AI agent may return working code to accomplish this. However, the code may be more complex than necessary. Because the tooling was unspecified, the agent might default to building custom resources using psycopg2 for Postgres and boto3 for S3, something like the sketch below. The result may be impressive, but anyone with data engineering experience knows that replicating data at scale involves far more nuance than simply connecting a source and a destination.
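A hypothetical sketch of that default output, with placeholder credentials and table names; note that the entire table is pulled into memory:

import io

import boto3
import pandas as pd
import psycopg2
import dagster as dg


@dg.asset
def postgres_to_s3() -> None:
    # Hand-rolled extraction: pull the whole table into memory
    conn = psycopg2.connect(
        host="your-postgres-host",
        dbname="your_db",
        user="your_user",
        password="your_password",
    )
    df = pd.read_sql("SELECT * FROM my_table", conn)
    conn.close()

    # Hand-rolled load: buffer the CSV in memory and upload it
    buffer = io.BytesIO()
    df.to_csv(buffer, index=False)
    boto3.client("s3").put_object(
        Bucket="your-s3-bucket", Key="my_table.csv", Body=buffer.getvalue()
    )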

Now let’s refine the prompt with more context:

Write a Dagster asset that loads data from Postgres into S3 using the Dagster Sling integration.

This time, the AI understands it can use more specific tooling. It walks through installing the dagster-sling package and sets up the Postgres and S3 connections using Sling’s built-in abstractions.

from dagster_sling import SlingConnectionResource, SlingResource, sling_assets

# Define the Postgres source and S3 target connections
# (all values below are placeholders)
sling_resource = SlingResource(
    connections=[
        SlingConnectionResource(
            name="MY_POSTGRES",
            type="postgres",
            host="your-postgres-host",
            port=5432,
            database="your_db",
            user="your_user",
            password="your_password",
        ),
        SlingConnectionResource(
            name="MY_S3",
            type="s3",
            bucket="your-s3-bucket",
            access_key_id="your-access-key",
            secret_access_key="your-secret-key",
        ),
    ]
)

# Replicate the "my_table" table from Postgres to S3 as a CSV
replication_config = {
    "source": "MY_POSTGRES",
    "target": "MY_S3",
    "defaults": {"mode": "full-refresh", "target_options": {"format": "csv"}},
    "streams": {"public.my_table": None},
}


@sling_assets(replication_config=replication_config)
def postgres_to_s3(context, sling: SlingResource):
    # Sling handles the extraction, staging, and load for each stream
    yield from sling.replicate(context=context)

The resulting code achieves the same goal, but leverages Sling to handle more complex aspects of change data capture and replication without reinventing the wheel. When coding with an AI agent, it’s important to remember: you don’t have to build everything from scratch. Providing clarity on tools and intent can drastically simplify the outcome and make the code more aligned with real-world production needs.

Be mindful of types

Even though Python is a dynamically typed language, you may notice that generated code often includes type annotations. This is helpful for readability and tooling support. However, it's important to remember that in plain Python, type annotations are not enforced: they impose no runtime constraints.
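For example, this function violates its own annotation, and plain Python never complains:

def load_orders() -> dict:
    # The annotation promises a dict, but nothing checks it:
    # this returns a list at runtime without any error
    return [{"order_id": 1, "amount": 9.99}]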

That’s not the case in Dagster. If an asset is configured to return a DataFrame, it will only succeed if a DataFrame is actually returned. Enforcing data contracts like this is especially valuable when coding with AI, where you may not fully control or audit every line of code. It’s also critical to provide the right context in your prompt; otherwise, the AI might default to an inappropriate return type. For example, you likely don’t want an asset returning an in-memory DataFrame if you're querying a Snowflake table with billions of rows.
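In Dagster, a return annotation on an asset becomes a runtime type check. A minimal sketch:

import dagster as dg
import pandas as pd


@dg.asset
def orders() -> pd.DataFrame:
    # Dagster coerces this annotation into a type check: returning
    # anything other than a DataFrame fails the materialization
    return pd.DataFrame({"order_id": [1, 2], "amount": [9.99, 24.50]})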

Another best practice is to record the metadata you care about, not just the final output of the asset. Dagster provides powerful flexibility for logging metadata. Including this in your prompt, such as asking the AI to log row counts, schema versions, or timestamps, helps you generate more production-ready, observable pipelines from the start.
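For example, a prompt that asks the AI to log row counts and column names might yield something like this sketch using MaterializeResult (the storage step here is a placeholder):

import dagster as dg
import pandas as pd


@dg.asset
def orders_summary() -> dg.MaterializeResult:
    df = pd.DataFrame({"order_id": [1, 2], "amount": [9.99, 24.50]})
    df.to_parquet("orders_summary.parquet")  # placeholder storage step

    # Record the metadata you care about alongside the materialization
    return dg.MaterializeResult(
        metadata={
            "row_count": len(df),
            "columns": list(df.columns),
        }
    )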

You are the engineer

Do not get lulled into the rhythm of accepting everything the agent provides. Always keep in mind that you are responsible for the code. If the agent suggests installing a Python package you would rather avoid, or writes a function that is difficult to test, you do not have to accept it.

Coding with AI can significantly boost productivity, but that efficiency should never come at the cost of long-term maintainability or code quality. Stay intentional, question outputs, and make sure every decision aligns with your standards and team conventions.

Wrapping Up

We’re all still figuring out the best practices for writing code alongside AI. What’s clear, though, is that this workflow will only become more common. So far, it’s been encouraging to see how well Dagster’s approach to data engineering aligns with AI-assisted code generation, helping make complex workflows more accessible and manageable for our users.

Have feedback or questions? Start a discussion in Slack or GitHub.

