How dependency injection and smart resource management can save your sanity (and your deployments)
We talk a lot about how Dagster brings software engineering best practices to data engineering. Resources are an abstraction that helps you avoid repeating yourself, handle dependency injection, simplify testing, and build modular, scalable data platforms.
What Are Resources?
Resources are a standard way of defining external services, tools, and storage locations in Dagster. Instead of hard-coding every database connection, API client, or storage location directly into your assets, you define them once as resources and inject them where needed.
Here's a simple example:
import dagster as dg

class DatabaseResource(dg.ConfigurableResource):
    connection_string: str

    def get_connection(self):
        # In real life, this would return a proper connection
        return f"Connected to {self.connection_string}"

@dg.asset
def user_data(database: DatabaseResource):
    conn = database.get_connection()
    # Your actual logic here
    return "user data"
If you can access it in Python, you can make a resource out of it. Databases, APIs, file systems, and legacy systems are all fair game.
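As a quick illustration of that flexibility, here's a minimal sketch of a resource wrapping plain local files (the class and method names here are our own invention, not a Dagster API):

import json
import dagster as dg

class LocalFileResource(dg.ConfigurableResource):
    base_dir: str

    def read_json(self, filename: str) -> dict:
        # Any Python I/O can hide behind a resource method
        with open(f"{self.base_dir}/{filename}") as f:
            return json.load(f)

@dg.asset
def app_settings(files: LocalFileResource):
    return files.read_json("settings.json")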
Environment Management
Managing different environments is one of the most challenging aspects of data engineering, largely because so much of the stack lives in the cloud. Whether it's a data warehouse or a distributed processing system, it can be difficult to replicate the same experience locally, in staging, and in production. With data warehouses, many data engineers get around this by using different databases and schemas for each environment, so the experience stays the same no matter where the code runs.
Here's how we handle this at Dagster:
import dagster as dg
import os

class SnowflakeResource(dg.ConfigurableResource):
    account: str
    user: str
    password: str
    database: str
    schema: str
    warehouse: str

    @property
    def connection_params(self):
        return {
            'account': self.account,
            'user': self.user,
            'password': self.password,
            'database': self.database,
            'schema': self._get_schema(),
            'warehouse': self.warehouse,
        }

    def _get_schema(self):
        # Pick a schema based on the deployment environment
        env = os.getenv('DAGSTER_ENVIRONMENT', 'local')
        if env == 'prod':
            return self.schema
        elif env == 'staging':
            return f"{self.schema}_staging"
        else:
            return f"{self.schema}_{env}"
Now your assets don't need to know which environment they're running in—they just use the resource, and the resource figures out the rest:
@dg.asset
def daily_metrics(snowflake: SnowflakeResource):
    # This automatically goes to the right place
    query = "SELECT * FROM metrics WHERE date = CURRENT_DATE"
    # Execute against whatever environment we're in
    return execute_query(snowflake.connection_params, query)
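The execute_query helper above isn't part of Dagster; it's a stand-in for whatever query function you already use. A minimal sketch using the snowflake-connector-python package might look like this:

import snowflake.connector

def execute_query(connection_params: dict, query: str):
    # Open a connection with whichever environment's params we were handed
    with snowflake.connector.connect(**connection_params) as conn:
        with conn.cursor() as cur:
            cur.execute(query)
            return cur.fetchall()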
Dependency Injection: Clean Architecture for Data Pipelines
Resources become really powerful when it comes to dependency injection. Instead of your assets knowing about database connections, API keys, and file paths, they simply declare what they need. This separation keeps your business logic clean and focused on data transformation while infrastructure concerns are handled separately.
Without resources, you might have something like this scattered across your codebase:
# Don't do this
import snowflake.connector
import requests

@dg.asset
def messy_asset():
    # Hard-coded nightmare
    conn = snowflake.connector.connect(
        account='xy12345.us-east-1',
        user='data_eng_user',
        password='definitely_not_in_version_control',
        database='ANALYTICS_DB',
        schema='STAGING_SCHEMA',
        warehouse='COMPUTE_WH'
    )
    api_client = requests.Session()
    api_client.headers.update({
        'Authorization': 'Bearer another_secret_token',
        'User-Agent': 'our-data-pipeline/1.0'
    })
    # And then your actual business logic gets lost in the noise
    data = api_client.get('https://api.example.com/data')
    # ... do something with data
    return processed_data
With resources, your asset becomes clean and focused:
@dg.asset
def clean_asset(
    database: SnowflakeResource,
    api_client: APIResource
):
    # Clear, focused business logic
    raw_data = api_client.fetch_data()
    processed_data = transform_data(raw_data)
    database.save(processed_data)
    return processed_data
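APIResource here is hypothetical. A minimal sketch, assuming a bearer-token API like the one in the messy example, might look like this; note that all the session setup now lives in one place instead of in every asset:

import dagster as dg
import requests

class APIResource(dg.ConfigurableResource):
    token: str
    base_url: str = "https://api.example.com"

    def fetch_data(self) -> dict:
        # The session setup from the messy example, written exactly once
        session = requests.Session()
        session.headers.update({
            'Authorization': f'Bearer {self.token}',
            'User-Agent': 'our-data-pipeline/1.0',
        })
        response = session.get(f"{self.base_url}/data")
        response.raise_for_status()
        return response.json()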
API Encapsulation
One of our favorite uses for resources is wrapping REST APIs. Here's a real example from our Scout integration (which powers the ask-ai feature in our Slack channel and docs):
import dagster as dg
import requests
from typing import Dict, List

class ScoutResource(dg.ConfigurableResource):
    api_key: str
    base_url: str = "https://api.scout.example.com"

    def _get_headers(self) -> Dict[str, str]:
        return {
            'Authorization': f'Bearer {self.api_key}',
            'Content-Type': 'application/json'
        }

    def write_documents(self, documents: List[Dict]) -> bool:
        """Upload documents to Scout for indexing"""
        response = requests.post(
            f"{self.base_url}/documents",
            json={"documents": documents},
            headers=self._get_headers()
        )
        return response.status_code == 200

    def search_documents(self, query: str) -> List[Dict]:
        """Search indexed documents"""
        response = requests.get(
            f"{self.base_url}/search",
            params={"q": query},
            headers=self._get_headers()
        )
        return response.json().get('results', [])
Now the assets that use Scout are nice and clean:
@dg.asset
def index_documentation(scout: ScoutResource):
    docs = load_documentation_from_somewhere()
    success = scout.write_documents(docs)
    return {"documents_indexed": len(docs), "success": success}

@dg.asset
def search_results(scout: ScoutResource):
    results = scout.search_documents("dagster best practices")
    return process_search_results(results)
Testing Without the Pain
Testing is a software engineering best practice that less technical data practitioners often skip. Introducing even simple functional tests can dramatically improve the reliability of your data platform. That said, be thoughtful about what you test: nobody wants to spin up a Snowflake instance just to verify that their transformation logic works. With resources, you can create lightweight mocks that return predictable test data:
class MockDatabaseResource(dg.ConfigurableResource):
    def get_users(self):
        return [
            {"id": 1, "name": "Alice", "email": "alice@example.com"},
            {"id": 2, "name": "Bob", "email": "bob@example.com"}
        ]

    def save_processed_data(self, data):
        # In tests, just verify the data structure
        assert isinstance(data, list)
        assert all('processed_at' in item for item in data)
        return True
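The test below exercises a user_processing_asset we haven't shown; a minimal hypothetical version, written against the mock's interface, might look like this:

import dagster as dg

@dg.asset
def user_processing_asset(database: MockDatabaseResource):
    # In real code this would be typed against your production resource;
    # the mock is used here only to keep the sketch self-contained
    users = database.get_users()
    processed = [{**user, "processed_at": "2024-01-01"} for user in users]
    database.save_processed_data(processed)
    return processed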
def test_user_processing():
    # Use the mock instead of a real database
    result = user_processing_asset(MockDatabaseResource())
    assert len(result) == 2
    assert result[0]['processed_at'] is not None
Your tests run in milliseconds instead of minutes, and they're not dependent on external services. You're testing what actually matters: your business logic and data transformations.
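If you want to exercise the full Dagster machinery rather than call the asset function directly, you can also pass the mock through Dagster's materialize helper (a sketch, assuming the asset and mock above):

import dagster as dg

def test_user_processing_materialization():
    # The resource key matches the parameter name the asset declares
    result = dg.materialize(
        [user_processing_asset],
        resources={"database": MockDatabaseResource()},
    )
    assert result.success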
We recently released a course on testing in Dagster on Dagster University, and it is a great resource if you are new to testing in data engineering or want to learn the best practices for Dagster.
Configuration That Makes Sense
Resources surface their configuration in Dagster's UI under the deployment tab, making it easy to see which assets are using which resources and how they're configured. This visibility is crucial when you're debugging issues or onboarding new team members.
# In your definitions file
import dagster as dg

defs = dg.Definitions(
    assets=[user_data, daily_metrics, search_results],
    resources={
        # Resource keys match the parameter names the assets declare
        "database": DatabaseResource(
            # hypothetical env var holding the toy resource's connection string
            connection_string=dg.EnvVar("DATABASE_URL")
        ),
        "snowflake": SnowflakeResource(
            account="xy12345.us-east-1",
            user="data_eng_user",
            # dg.EnvVar reads the secret from the environment at launch,
            # keeping it out of version control
            password=dg.EnvVar("SNOWFLAKE_PASSWORD"),
            database="ANALYTICS_DB",
            schema="PRODUCTION",
            warehouse="COMPUTE_WH"
        ),
        "scout": ScoutResource(
            api_key=dg.EnvVar("SCOUT_API_KEY")
        )
    }
)
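An alternative to having the resource read the environment internally (as _get_schema does above) is to select a fully configured resource per environment at definition time. A sketch of that pattern, reusing the DAGSTER_ENVIRONMENT variable from earlier:

import os
import dagster as dg

env = os.getenv("DAGSTER_ENVIRONMENT", "local")

# One fully configured resource per environment; only the differences vary
snowflake_by_env = {
    "local": SnowflakeResource(
        account="xy12345.us-east-1",
        user="local_dev_user",
        password=dg.EnvVar("SNOWFLAKE_PASSWORD"),
        database="ANALYTICS_DB",
        schema="DEV",
        warehouse="COMPUTE_WH",
    ),
    "prod": SnowflakeResource(
        account="xy12345.us-east-1",
        user="data_eng_user",
        password=dg.EnvVar("SNOWFLAKE_PASSWORD"),
        database="ANALYTICS_DB",
        schema="PRODUCTION",
        warehouse="COMPUTE_WH",
    ),
}

defs = dg.Definitions(
    assets=[daily_metrics],
    resources={"snowflake": snowflake_by_env[env]},
)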
When to Use Resources
Use resources whenever you need to interact with an API or database, or whenever you have a common pattern that recurs throughout your project. If you're leveraging a Dagster integration like Fivetran or Snowflake, there's usually a resource you'll need to configure with your API keys and connection details.
The general rule: if you find yourself repeating the same setup code across multiple assets, that's a resource waiting to happen.
The Bigger Picture
Resources are one of those abstractions that make you a better engineer while solving practical problems. They enforce clean separation of concerns, make your code more testable, and save you from the environment-specific bugs that come with hard-coded configuration.
More importantly, they scale with your team. When you have multiple people working on the same project, resources provide a standard way to interact with external services. New team members don't need to figure out how to connect to the database—they just use the resource.
Getting Started
Start small. Pick one external service you're currently hard-coding and turn it into a resource. You'll immediately see benefits in terms of code clarity and testability. Then expand from there.
Remember: if you can access it in Python, you can make a resource out of it. And trust us, your future self (and your teammates) will thank you for taking the time to do it right.