November 21, 2022 • 2 minute read
Safe and Easy: Managing Secrets in Dagster Cloud
By Erin Cochran and Daniel Gibson
Along with the 1.1 release of Dagster’s open-source library this week, we’re excited to roll out a new and improved way for Dagster Cloud users to manage their secrets.
New: Setting Environment Variables in Dagster Cloud
Every data pipeline needs to authenticate with external services, but securely managing the credentials for these services can be challenging. One common approach is to use environment variables, which are key-value pairs configured outside your source code. For example, instead of hard-coding database credentials, which is bad practice and cumbersome for development, you can use environment variables to supply them. This allows you to configure your pipeline without modifying code or insecurely storing sensitive data.
Dagster Cloud’s new Environment Variables UI makes it easy to set up environment variables, then make them available in your code. You can simply sign in to your Dagster Cloud account, click on the Deployment > Environment Variables tab, and add or remove environment variables, all within the Dagster Cloud UI.
These environment variables are automatically included any time your code is executed in Dagster Cloud.
Environment Variable Scopes
In keeping with Dagster’s focus on supporting data practitioners at every stage of the developer lifecycle, Dagster Cloud allows you to easily set different scopes for environment variables, from local development to production.
For example, let's say we want to use one database password in production, and another while testing using Branch Deployments. In our code, we use the SNOWFLAKE_PASSWORD environment variable to pass in the database password. To use different passwords between production and Branch Deployments, we can create two instances of SNOWFLAKE_PASSWORD. One instance is scoped to the prod deployment and the other only to Branch Deployments.
For local development, you can also create environment variables with a Local scope, which allows you to download them in a .env file to your local machine. Running dagit in the same folder as the .env file will cause those environment variables to be automatically included in the environment, just like in production.
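To make the idea concrete, here is a minimal sketch of what loading a .env file into the process environment can look like. It is not Dagster's implementation: the function name load_env_file is hypothetical, the parser handles only simple KEY=VALUE lines, and the choice not to override variables that are already set is an assumption made for this example.

```python
import os
from pathlib import Path


def load_env_file(path: str = ".env") -> dict:
    """Read KEY=VALUE lines from `path` and add them to os.environ.

    Hypothetical helper for illustration only. Variables that are already
    set in the environment are left untouched (an assumption of this sketch).
    """
    loaded = {}
    env_path = Path(path)
    if not env_path.exists():
        return loaded
    for raw_line in env_path.read_text().splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and anything that is not KEY=VALUE.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key = key.strip()
        # Drop surrounding whitespace and optional quotes around the value.
        value = value.strip().strip('"').strip("'")
        loaded[key] = value
        os.environ.setdefault(key, value)
    return loaded
```

Calling load_env_file() from the folder that contains the downloaded .env file returns the parsed pairs and makes them visible to os.getenv in the same process.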
Accessing Environment Variables in Dagster Code
Once you have your environment variables configured, you have several options for accessing them in your code. The simplest is to read the environment variable directly:
import os
database_name = os.getenv("DATABASE_NAME")
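Note that os.getenv returns None when the variable is not set, which can surface later as a confusing error. One possible way to fail early is a small helper like the one below. The name require_env is hypothetical and not part of Dagster; it is only a sketch of the pattern.

```python
import os


def require_env(name: str) -> str:
    """Return the value of environment variable `name`.

    Hypothetical helper: raises immediately if the variable is unset or
    empty, instead of letting a None value propagate into later code.
    """
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f"Required environment variable {name!r} is not set")
    return value


# Example usage (DATABASE_NAME is the variable from the snippet above):
# database_name = require_env("DATABASE_NAME")
```

Whether a missing variable should be a hard error or fall back to a default depends on the pipeline; for credentials, failing early is usually the safer choice.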
You can also reference environment variables in Dagster’s config system, to configure Dagster ops, assets, or resources. For example, you might want to use Dagster’s built-in snowflake_io_manager to store the outputs of your assets in a Snowflake table, without needing to hard-code your Snowflake password:
import os

from dagster_snowflake import snowflake_io_manager

# Only the password comes from the environment; the other fields are hard-coded.
prod_snowflake_io_manager = snowflake_io_manager.configured({
    "account": "abc1234.us-east-1",
    "user": "system@company.com",
    "password": os.environ["SYSTEM_SNOWFLAKE_PASSWORD"],
    "database": "PRODUCTION",
    "schema": "PROD_SCHEMA",
})
This code creates a snowflake_io_manager and configures it so that it can be used when materializing assets in Dagster. Much of the configuration is hard-coded, but the value of the more sensitive password field is pulled from the SYSTEM_SNOWFLAKE_PASSWORD environment variable.
In addition to environment variables that you specify, Dagster Cloud automatically includes a set of system environment variables whenever your code runs. You can use these to vary the configuration of your jobs in different environments. For example, you can have every Branch Deployment pull request use a different Snowflake database using the following code:
import os
database_name = f"PRODUCTION_CLONE_{os.getenv('DAGSTER_CLOUD_PULL_REQUEST_ID')}"
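Outside of a Branch Deployment, DAGSTER_CLOUD_PULL_REQUEST_ID may not be set, in which case the f-string above would produce the name PRODUCTION_CLONE_None. A minimal sketch of one way to guard against that follows. The function name resolve_database_name and the fallback to the PRODUCTION database are assumptions for illustration, not behavior defined by Dagster Cloud.

```python
import os


def resolve_database_name(default: str = "PRODUCTION") -> str:
    """Pick a Snowflake database name based on the current environment.

    Hypothetical helper: uses a per-pull-request clone when the pull request
    ID is available, and falls back to `default` otherwise (the fallback
    value is an assumption of this sketch).
    """
    pr_id = os.getenv("DAGSTER_CLOUD_PULL_REQUEST_ID")
    if pr_id:
        return f"PRODUCTION_CLONE_{pr_id}"
    return default
```

For local development you would likely pass a non-production default, for example resolve_database_name(default="DEV"), so that a missing variable never points your code at production data.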
See our Testing Against Production with Dagster Cloud Branch Deployments guide for a full example of how to test your Dagster code in your cloud environment without impacting your production data.
Secure Storage
To securely store environment variables defined using the Dagster Cloud UI, Dagster Cloud uses AWS Key Management Service (KMS) and envelope encryption. Envelope encryption is a multi-layered approach to key encryption. Plaintext data is encrypted using a data key, and then the data key itself is encrypted under another key.
Here's a look at how it works in Dagster Cloud:
Each customer account is assigned a unique symmetric data key, generated by AWS KMS using a non-exportable master key. This data key is then used to encrypt and decrypt the data associated with that account using the Fernet symmetric encryption algorithm. Both customer keys and the master key are periodically rotated. This approach isolates each account's data and reduces the risk of exposure by limiting the amount of data a single key can access.
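The following is a minimal sketch of the envelope-encryption pattern described above, using the Fernet implementation from the Python cryptography package. It is not Dagster Cloud's code. In particular, the local master key is only a stand-in for AWS KMS (where the master key never leaves the service), and the function names are hypothetical.

```python
from cryptography.fernet import Fernet

# Stand-in for the KMS master key. This is an assumption of the sketch:
# with a real KMS, wrapping and unwrapping happen inside the service.
master = Fernet(Fernet.generate_key())


def create_wrapped_data_key() -> bytes:
    """Generate a per-account data key and return it encrypted under the master key."""
    data_key = Fernet.generate_key()
    return master.encrypt(data_key)


def encrypt_secret(wrapped_data_key: bytes, plaintext: str) -> bytes:
    """Unwrap the account's data key and use it to encrypt a secret value."""
    data_key = master.decrypt(wrapped_data_key)
    return Fernet(data_key).encrypt(plaintext.encode("utf-8"))


def decrypt_secret(wrapped_data_key: bytes, token: bytes) -> str:
    """Unwrap the account's data key and use it to decrypt a secret value."""
    data_key = master.decrypt(wrapped_data_key)
    return Fernet(data_key).decrypt(token).decode("utf-8")
```

Because each account has its own data key, a secret encrypted for one account cannot be decrypted with another account's key, which is the isolation property the paragraph above describes.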
Conclusion
We hope that these new ways to configure and access environment variables make it much easier for you to securely manage and access secrets within your Dagster code. Keep an eye out for more secrets improvements coming soon, like better support for integrating with external secrets managers.
We're always happy to hear your feedback, so please reach out to us! If you have any questions, ask them in the Dagster community Slack (join here!) or start a GitHub discussion. If you run into any bugs, let us know with a GitHub issue. And if you're interested in working with us, check out our open roles!