How dependency injection and smart resource management can save your sanity (and your deployments)
We talk a lot about how Dagster brings software engineering best practices to data engineering. Resources are an abstraction that helps you avoid repeating yourself, handle dependency injection, simplify testing, and build modular, scalable data platforms.
What Are Resources?
Resources are a standard way of defining external services, tools, and storage locations in Dagster. Instead of hard-coding every database connection, API client, or storage location directly into your assets, you define them once as resources and inject them where needed.
Here's a simple example:
import dagster as dg

class DatabaseResource(dg.ConfigurableResource):
    connection_string: str

    def get_connection(self):
        # In real life, this would return a proper connection
        return f"Connected to {self.connection_string}"

@dg.asset
def user_data(database: DatabaseResource):
    conn = database.get_connection()
    # Your actual logic here
    return "user data"
If you can access it in Python, you can make a resource out of it. Databases, APIs, file systems, and legacy systems are all fair game.
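As a quick illustration of that flexibility, here's a minimal sketch of a resource wrapping plain local files (the class and method names here are our own invention, not a Dagster API):

import json
import dagster as dg

class LocalFileResource(dg.ConfigurableResource):
    base_dir: str

    def read_json(self, filename: str) -> dict:
        # Any Python I/O can hide behind a resource method
        with open(f"{self.base_dir}/{filename}") as f:
            return json.load(f)

@dg.asset
def app_settings(files: LocalFileResource):
    return files.read_json("settings.json")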
Environment Management
Managing different environments is one of the most challenging aspects of data engineering, largely because so much of the stack lives in the cloud. Whether it's a data warehouse or a distributed processing system, it can be difficult to replicate the same experience locally, in staging, and in production. With data warehouses, many data engineers get around this by using different databases and schemas for each environment, so the experience stays the same no matter where the code runs.
Here's how we handle this at Dagster:
import dagster as dg
import os

class SnowflakeResource(dg.ConfigurableResource):
    account: str
    user: str
    password: str
    database: str
    schema: str
    warehouse: str

    @property
    def connection_params(self):
        return {
            'account': self.account,
            'user': self.user,
            'password': self.password,
            'database': self.database,
            'schema': self._get_schema(),
            'warehouse': self.warehouse,
        }

    def _get_schema(self):
        # Pick a schema based on the deployment environment
        env = os.getenv('DAGSTER_ENVIRONMENT', 'local')
        if env == 'prod':
            return self.schema
        elif env == 'staging':
            return f"{self.schema}_staging"
        else:
            return f"{self.schema}_{env}"
Now your assets don't need to know which environment they're running in—they just use the resource, and the resource figures out the rest:
@dg.asset
def daily_metrics(snowflake: SnowflakeResource):
    # This automatically goes to the right place
    query = "SELECT * FROM metrics WHERE date = CURRENT_DATE"
    # Execute against whatever environment we're in
    return execute_query(snowflake.connection_params, query)
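The execute_query helper above isn't part of Dagster; it's a stand-in for whatever query function you already use. A minimal sketch using the snowflake-connector-python package might look like this:

import snowflake.connector

def execute_query(connection_params: dict, query: str):
    # Open a connection with whichever environment's params we were handed
    with snowflake.connector.connect(**connection_params) as conn:
        with conn.cursor() as cur:
            cur.execute(query)
            return cur.fetchall()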
Dependency Injection: Clean Architecture for Data Pipelines
Resources become really powerful when it comes to dependency injection. Instead of your assets knowing about database connections, API keys, and file paths, they simply declare what they need. This separation keeps your business logic clean and focused on data transformation while infrastructure concerns are handled separately.
Without resources, you might have something like this scattered across your codebase:
# Don't do this
import snowflake.connector
import requests

@dg.asset
def messy_asset():
    # Hard-coded nightmare
    conn = snowflake.connector.connect(
        account='xy12345.us-east-1',
        user='data_eng_user',
        password='definitely_not_in_version_control',
        database='ANALYTICS_DB',
        schema='STAGING_SCHEMA',
        warehouse='COMPUTE_WH'
    )
    api_client = requests.Session()
    api_client.headers.update({
        'Authorization': 'Bearer another_secret_token',
        'User-Agent': 'our-data-pipeline/1.0'
    })
    # And then your actual business logic gets lost in the noise
    data = api_client.get('https://api.example.com/data')
    # ... do something with data
    return processed_data
With resources, your asset becomes clean and focused:
@dg.asset
def clean_asset(
    database: SnowflakeResource,
    api_client: APIResource
):
    # Clear, focused business logic
    raw_data = api_client.fetch_data()
    processed_data = transform_data(raw_data)
    database.save(processed_data)
    return processed_data
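APIResource here is hypothetical. A minimal sketch, assuming a bearer-token API like the one in the messy example, might look like this; note that all the session setup now lives in one place instead of in every asset:

import dagster as dg
import requests

class APIResource(dg.ConfigurableResource):
    token: str
    base_url: str = "https://api.example.com"

    def fetch_data(self) -> dict:
        # The session setup from the messy example, written exactly once
        session = requests.Session()
        session.headers.update({
            'Authorization': f'Bearer {self.token}',
            'User-Agent': 'our-data-pipeline/1.0',
        })
        response = session.get(f"{self.base_url}/data")
        response.raise_for_status()
        return response.json()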
API Encapsulation
One of our favorite uses for resources is wrapping REST APIs. Here's a real example from our Scout integration (which powers the ask-ai feature in our Slack channel and docs):
import dagster as dg
import requests
from typing import Dict, List

class ScoutResource(dg.ConfigurableResource):
    api_key: str
    base_url: str = "https://api.scout.example.com"

    def _get_headers(self) -> Dict[str, str]:
        return {
            'Authorization': f'Bearer {self.api_key}',
            'Content-Type': 'application/json'
        }

    def write_documents(self, documents: List[Dict]) -> bool:
        """Upload documents to Scout for indexing"""
        response = requests.post(
            f"{self.base_url}/documents",
            json={"documents": documents},
            headers=self._get_headers()
        )
        return response.status_code == 200

    def search_documents(self, query: str) -> List[Dict]:
        """Search indexed documents"""
        response = requests.get(
            f"{self.base_url}/search",
            params={"q": query},
            headers=self._get_headers()
        )
        return response.json().get('results', [])
Now the assets that use Scout are nice and clean:
@dg.asset
def index_documentation(scout: ScoutResource):
    docs = load_documentation_from_somewhere()
    success = scout.write_documents(docs)
    return {"documents_indexed": len(docs), "success": success}

@dg.asset
def search_results(scout: ScoutResource):
    results = scout.search_documents("dagster best practices")
    return process_search_results(results)
Testing Without the Pain
Testing is a software engineering best practice that less technical data practitioners often skip. Introducing even simple functional tests can dramatically improve the reliability of your data platform. That said, be thoughtful about what you test: nobody wants to spin up a Snowflake instance just to verify that their transformation logic works. With resources, you can create lightweight mocks that return predictable test data:
class MockDatabaseResource(dg.ConfigurableResource):
    def get_users(self):
        return [
            {"id": 1, "name": "Alice", "email": "alice@example.com"},
            {"id": 2, "name": "Bob", "email": "bob@example.com"}
        ]

    def save_processed_data(self, data):
        # In tests, just verify the data structure
        assert isinstance(data, list)
        assert all('processed_at' in item for item in data)
        return True
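The test below exercises a user_processing_asset we haven't shown; a minimal hypothetical version, written against the mock's interface, might look like this:

import dagster as dg

@dg.asset
def user_processing_asset(database: MockDatabaseResource):
    # In real code this would be typed against your production resource;
    # the mock is used here only to keep the sketch self-contained
    users = database.get_users()
    processed = [{**user, "processed_at": "2024-01-01"} for user in users]
    database.save_processed_data(processed)
    return processed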
def test_user_processing():
    # Use the mock instead of a real database
    result = user_processing_asset(MockDatabaseResource())
    assert len(result) == 2
    assert result[0]['processed_at'] is not None
Your tests run in milliseconds instead of minutes, and they're not dependent on external services. You're testing what actually matters: your business logic and data transformations.
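If you want to exercise the full Dagster machinery rather than call the asset function directly, you can also pass the mock through Dagster's materialize helper (a sketch, assuming the asset and mock above):

import dagster as dg

def test_user_processing_materialization():
    # The resource key matches the parameter name the asset declares
    result = dg.materialize(
        [user_processing_asset],
        resources={"database": MockDatabaseResource()},
    )
    assert result.success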
We recently released a course on testing in Dagster on Dagster University, and it is a great resource if you are new to testing in data engineering or want to learn the best practices for Dagster.
Configuration That Makes Sense
Resources surface their configuration in Dagster's UI under the deployment tab, making it easy to see which assets are using which resources and how they're configured. This visibility is crucial when you're debugging issues or onboarding new team members.
# In your definitions file
import dagster as dg

defs = dg.Definitions(
    assets=[user_data, daily_metrics, search_results],
    resources={
        # Resource keys match the parameter names the assets declare
        "database": DatabaseResource(
            # hypothetical env var holding the toy resource's connection string
            connection_string=dg.EnvVar("DATABASE_URL")
        ),
        "snowflake": SnowflakeResource(
            account="xy12345.us-east-1",
            user="data_eng_user",
            # dg.EnvVar reads the secret from the environment at launch,
            # keeping it out of version control
            password=dg.EnvVar("SNOWFLAKE_PASSWORD"),
            database="ANALYTICS_DB",
            schema="PRODUCTION",
            warehouse="COMPUTE_WH"
        ),
        "scout": ScoutResource(
            api_key=dg.EnvVar("SCOUT_API_KEY")
        )
    }
)
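An alternative to having the resource read the environment internally (as _get_schema does above) is to select a fully configured resource per environment at definition time. A sketch of that pattern, reusing the DAGSTER_ENVIRONMENT variable from earlier:

import os
import dagster as dg

env = os.getenv("DAGSTER_ENVIRONMENT", "local")

# One fully configured resource per environment; only the differences vary
snowflake_by_env = {
    "local": SnowflakeResource(
        account="xy12345.us-east-1",
        user="local_dev_user",
        password=dg.EnvVar("SNOWFLAKE_PASSWORD"),
        database="ANALYTICS_DB",
        schema="DEV",
        warehouse="COMPUTE_WH",
    ),
    "prod": SnowflakeResource(
        account="xy12345.us-east-1",
        user="data_eng_user",
        password=dg.EnvVar("SNOWFLAKE_PASSWORD"),
        database="ANALYTICS_DB",
        schema="PRODUCTION",
        warehouse="COMPUTE_WH",
    ),
}

defs = dg.Definitions(
    assets=[daily_metrics],
    resources={"snowflake": snowflake_by_env[env]},
)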
When to Use Resources
Use resources whenever you need to interact with an API or database, or whenever you have a common pattern that recurs throughout your project. If you're leveraging a Dagster integration like Fivetran or Snowflake, there's usually a resource you'll need to configure with your API keys and connection details.
The general rule: if you find yourself repeating the same setup code across multiple assets, that's a resource waiting to happen.
The Bigger Picture
Resources are one of those abstractions that make you a better engineer while solving practical problems. They enforce clean separation of concerns, make your code more testable, and save you from the environment-specific bugs that come with hard-coded configuration.
More importantly, they scale with your team. When you have multiple people working on the same project, resources provide a standard way to interact with external services. New team members don't need to figure out how to connect to the database—they just use the resource.
Getting Started
Start small. Pick one external service you're currently hard-coding and turn it into a resource. You'll immediately see benefits in terms of code clarity and testability. Then expand from there.
Remember: if you can access it in Python, you can make a resource out of it. And trust us, your future self (and your teammates) will thank you for taking the time to do it right.