Environment Variables in Python

August 7, 2023

In part V of our series on Data Engineering with Python, we cover best practices for managing environment variables in Python.

This article is part of a series on Python for data engineering, aimed at helping data engineers, data scientists, data analysts, machine learning engineers, and others who are new to Python master the basics.

Sign up for our newsletter to get all the updates! And if you enjoyed this guide check out our data engineering glossary, complete with Python code examples.

Environment variables offer a way to configure applications without hardcoding values, allowing modification of application behavior without changing the code itself. This becomes especially important when parameterizing data pipelines in a production environment. They enable storing sensitive information like database credentials or API keys outside the codebase, which enhances security and makes code more portable and easier to manage.

This article simplifies the concept of environment variables in Python, explaining their function, importance, and effective utilization to improve your Python programming skills. It provides practical examples, techniques, and best practices for setting up your Python environment, configuring paths to important tools, or defining variables for your scripts.

What are environment variables?

Python provides a built-in module named os as an interface to interact with the underlying operating system. This module provides a dictionary-like object, os.environ, that allows you to interact with environment variables.

os.environ acts as a mapping between environment variable names and their values. It can access, modify, or create environment variables in a Python program. Let's break it down.

Reading Python Environment Variables

To read the value of an environment variable in Python, you treat os.environ as a dictionary and access the variable by its name. Here's an example:

import os

# Accessing an environment variable
print(os.environ['HOME'])

This will print your default ‘home’ directory: /Users/username on a macOS system or /home/username on a Linux system:

/Users/elliot

Here, HOME is the name of the environment variable that typically stores the path to the current user's home directory. If the environment variable exists, its value will be printed to the console. If the environment variable does not exist, however, this will raise a KeyError.

To avoid this error, you might choose to use the get method, which allows you to provide a default value that will be returned if the environment variable is not found:

import os

# Accessing an environment variable with a default value
print(os.environ.get('HOME', '/home/default'))

In this case, if the HOME variable does not exist, the string /home/default will be returned instead.
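To make the difference between the access patterns concrete, here is a minimal sketch that contrasts them for a variable that is not set. The name DEMO_MISSING_VAR is hypothetical and was chosen only because it is unlikely to exist on your system.

```python
import os

# Hypothetical variable name, used only for illustration.
name = "DEMO_MISSING_VAR"
os.environ.pop(name, None)  # make sure it is unset for this demo

# 1. Index access raises KeyError when the variable is missing.
try:
    os.environ[name]
    lookup_failed = False
except KeyError:
    lookup_failed = True

# 2. .get() without a default returns None.
value_or_none = os.environ.get(name)

# 3. .get() with a default returns the fallback value.
value_or_default = os.environ.get(name, "fallback")

print(lookup_failed, value_or_none, value_or_default)
```

Index access is a reasonable choice when the variable is strictly required, while get suits optional settings.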

You can print out and explore all environment variables with the following script:

import os

# Print every environment variable as "NAME: value"
for name, value in os.environ.items():
    print(f"{name}: {value}")

Modifying and Adding Environment Variables

You can modify the value of an existing environment variable, or add a new one, by assigning a value to a key in the os.environ object:

import os

# Setting an environment variable
os.environ['MY_VARIABLE'] = 'foo'

# Now, the new variable is accessible via os.environ
print(os.environ['MY_VARIABLE'])  # Outputs: foo

In this example, the MY_VARIABLE environment variable is created and set to foo. However, this variable will be available only as long as the current process is running.

Changes made to the os.environ object are local to the process where they are made. If you set an environment variable in your Python script, it will be available to that script and any subprocesses it creates. However, it will not be visible in the broader environment or to other unrelated processes.

The scope and lifecycle of environment variables set in this way are limited to the current process where they are assigned. As this might be a complicated point, consider our example above.

A Python script sets an environment variable named MY_VARIABLE to foo. If this script launches another Python script as a subprocess (Process 1), the second script can also access MY_VARIABLE. But if you try to access MY_VARIABLE from a different Python script that was not launched by the first script (Process 2), or from the command line after running the script, MY_VARIABLE will not be available, because it is not part of the broader environment. Both processes, however, have access to any variable defined in the broader environment, such as a GLOBAL_VARIABLE set at the user or system level.
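The inheritance described here can be observed directly. The following sketch sets MY_VARIABLE in the current process and then starts a child Python interpreter with the standard subprocess module; the one-line child program is an illustrative stand-in for a second script.

```python
import os
import subprocess
import sys

# Set a variable in the current (parent) process.
os.environ["MY_VARIABLE"] = "foo"

# Launch a child Python process. By default it receives a copy of
# the parent's environment, including MY_VARIABLE.
child_code = "import os; print(os.environ.get('MY_VARIABLE'))"
result = subprocess.run(
    [sys.executable, "-c", child_code],
    capture_output=True,
    text=True,
    check=True,
)

child_value = result.stdout.strip()
print(child_value)  # the child sees the value set by the parent
```

A script started separately from your terminal would not see MY_VARIABLE, because it does not descend from this process.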

Scope and lifecycle of environment variables

To persist environment variables across different sessions or processes, you'll need to set them in your operating system's environment outside of Python. The way to do this varies based on your operating system and shell but generally involves editing shell configuration files or using command-line utilities.

Let’s review what we mean when we talk about operating systems, shells, and command-line utilities.

Operating systems, shells, and command-line utilities

An operating system is the most fundamental piece of software that runs on a computer. It acts as an intermediary between the user and the computer hardware, making it possible for users to execute programs, manage files, and interact with the device. Operating systems manage files, processes, memory, and security. Popular operating systems include Microsoft Windows, macOS, and Linux distributions (like Ubuntu, Fedora, etc.).

Shells allow you to interact with your computer's operating system. As a data engineer, you'll primarily work with command-line shells, which let you issue commands to the operating system by typing them in as text.

When you open a terminal window (on Linux or Mac) or command prompt (on Windows), you interact with a shell. You might type a command like python my_script.py to run a Python script, or ls (on Linux or Mac) or dir (on Windows) to list the files in the current directory. These are all examples of interacting with a command-line shell.

Shell configuration files are special files that the shell reads when it starts up. As a Dagster data engineer, you will probably use them to set environment variables that should be available every time you open a new terminal window. This simplifies things since the shell automatically executes them every time it starts.

Command-line utilities are programs designed for use from a text interface. Instead of clicking on buttons, you can run Python scripts from the command line (e.g., use pip to install Python packages or jupyter to start a Jupyter notebook server) or use Git from the command line for version control. Other examples include Unix command-line utilities such as grep, awk, sed to search, filter, and transform text data; curl and wget to download files from the web; and ssh to connect with remote servers when working with distributed systems or cloud-based resources.

Process-level scope

Environment variables set or modified via os.environ in your Python script are accessible within the same process. If your script starts another process (a subprocess), that subprocess inherits the environment of its parent, including any environment variables set by the parent. However, if two separate processes both set an environment variable with the same name, they will not interfere with each other; each process has its own independent copy of the environment.
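As a rough illustration of these independent copies, the sketch below lets a child process overwrite MY_VARIABLE and then checks the parent's value after the child exits. The child program is again a hypothetical inline stand-in for a separate script.

```python
import os
import subprocess
import sys

os.environ["MY_VARIABLE"] = "parent"

# The child overwrites the variable, but only in its own copy
# of the environment.
child_code = (
    "import os\n"
    "os.environ['MY_VARIABLE'] = 'child'\n"
    "print(os.environ['MY_VARIABLE'])\n"
)
result = subprocess.run(
    [sys.executable, "-c", child_code],
    capture_output=True,
    text=True,
    check=True,
)
child_value = result.stdout.strip()

# The parent's value is unchanged after the child has finished.
parent_value = os.environ["MY_VARIABLE"]
print(child_value, parent_value)
```

The flow of information is one-way: a parent can pass values down to a child at launch, but a child cannot push changes back up.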

User-level and system-level scope

To set environment variables accessible to all processes of a particular user or to all processes on the system, you must do this outside of Python, in the configuration files for your shell or operating system. These changes have a broader scope but are not immediate; they typically require you to start a new shell session or reboot the system.

Persistence of changes

Changes to environment variables made using os.environ in a Python script do not persist after the script finishes executing. This means that if you run a script that sets an environment variable and then run another script or return to your shell, the environment variable changes from the first script will not be visible.

This non-persistence can be an advantage. It allows your script to modify its environment as needed for its own purposes without affecting other processes or leaving lasting changes that could affect future shell sessions.

If you want to set an environment variable that persists across multiple shell sessions or is available to other applications, you must do so at the shell level or via the operating system's interfaces for setting environment variables. The exact method depends on your operating system and shell. However, you should do this carefully, as it can potentially have wide-ranging effects on your system's behavior.

Environment variables and configuration

Using environment variables for configuration is considered a best practice in software development for two reasons.

First, they help avoid storing sensitive information, such as passwords, API keys, or database URIs, directly in your code. This prevents such information from being inadvertently exposed, for example, by being included in version control repositories.

Second, they also make your code more portable. When different settings are needed for development, testing, and production environments, these can be controlled through environment variables without modifying the code. If a configuration value needs to change, it can be updated in the environment variable without requiring a change to the application's code and without needing to redeploy the application.
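One way to picture environment-driven configuration is a small settings object built from the environment. This is only a sketch under stated assumptions: the Settings class, the load_settings helper, the APP_ENV and DEBUG variable names, and the default values are all hypothetical. Note that environment values are always strings, so anything like a boolean has to be converted explicitly.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    app_env: str
    database_uri: str
    debug: bool


def load_settings(environ=os.environ) -> Settings:
    """Build settings from a mapping of environment variables."""
    return Settings(
        app_env=environ.get("APP_ENV", "development"),
        database_uri=environ.get("DATABASE_URI", "sqlite:///dev.db"),
        # Environment values are always strings, so convert explicitly.
        debug=environ.get("DEBUG", "false").lower() in ("1", "true", "yes"),
    )


# Passing a plain dict keeps the demonstration independent of your machine.
dev = load_settings({})
prod = load_settings(
    {
        "APP_ENV": "production",
        "DATABASE_URI": "postgres://db.internal/app",
        "DEBUG": "0",
    }
)
print(dev.app_env, prod.app_env)
```

The same code produces development or production behavior depending only on the mapping it is given, which is the portability benefit described above.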

Using environment variables for sensitive information

To use environment variables for sensitive data like API keys or database URIs, ensure that the relevant environment variables are set in the environment where your Python code is running.

For example, you might have a Python script that connects to a database and uses an API. Instead of hardcoding the database URI and the API key in the script, use environment variables:

import os

# Accessing sensitive data from environment variables
db_uri = os.environ.get('DATABASE_URI')
api_key = os.environ.get('API_KEY')

In this case, the DATABASE_URI and API_KEY environment variables would need to be set in the environment where the script is run. This could be done in the shell before starting the script, or in a configuration file that the shell reads when it starts.

Don’t hardcode!

Hardcoding sensitive information, that is, embedding values such as API keys or database URIs directly in your code, poses a significant security risk. If your code is shared or made public, for example on GitHub, anyone who sees the code can access these sensitive details. This could allow unauthorized access to your database or misuse of your API keys.

Even if you do not plan to share your code, a risk remains. Hardcoded information stays in version control history, so anyone who later gains access to the repository could find sensitive data in old commits.

Hardcoding also makes your code less flexible. If you want to use a different database for testing or need to change API keys, you would need to change and redeploy your code. By contrast, using environment variables for such details allows you to simply change the environment variables when needed without touching your code.

Best practices for managing environment variables

With "dotenv" files, you can store your configuration in a .env file that loads when your application starts. The python-dotenv library provides a way to load environment variables from .env files into the os.environ object in Python. Here is how you can use it:

Install the python-dotenv library if you haven't done so already. You can install it via pip:

pip install python-dotenv

Create a .env file in your project root directory and add some variables to it. The file could look something like this:

DATABASE_URI=postgres://myuser:mypassword@localhost:5432/mydatabase
API_KEY=abcdef123456

Note: this file should not be committed to your version control system. You should add .env to your .gitignore file to ensure Git ignores it.

In your Python script, you can use python-dotenv to load the .env file:

import os

from dotenv import load_dotenv

# Load the .env file into os.environ
load_dotenv()

# Now you can access the variables
db_uri = os.getenv('DATABASE_URI')
api_key = os.getenv('API_KEY')

The load_dotenv() function reads the .env file and loads its contents into os.environ.

Here are additional best practices for managing environment variables:

  • Always .gitignore .env files: Explicitly add .env to your .gitignore file to prevent sensitive data from being accidentally committed to version control.
  • Use descriptive variable names: Choose clear and unambiguous names (e.g., DATABASE_URL instead of DB_CONN) to make variables easy to understand and manage.
  • Validate required variables at startup: Implement checks at the beginning of your application to ensure all necessary environment variables are set. Raise an error if a critical variable is missing to prevent unexpected behavior.
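A startup check for required variables might look like the following sketch. The check_required helper and the REQUIRED_VARIABLES tuple are hypothetical names introduced for illustration; the variable names are the ones used in the .env example above.

```python
import os

# Hypothetical list of variables this application cannot run without.
REQUIRED_VARIABLES = ("DATABASE_URI", "API_KEY")


def check_required(environ=os.environ, required=REQUIRED_VARIABLES):
    """Raise RuntimeError naming every required variable that is unset or empty."""
    missing = [name for name in required if not environ.get(name)]
    if missing:
        raise RuntimeError(
            "Missing required environment variables: " + ", ".join(missing)
        )


# Demonstration with an explicit mapping instead of the real environment.
try:
    check_required({"DATABASE_URI": "postgres://localhost/mydatabase"})
    error_message = None
except RuntimeError as exc:
    error_message = str(exc)

print(error_message)
```

Reporting all missing names at once, rather than stopping at the first, saves a round of trial and error when setting up a new environment.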

Environment variables in production

In production, you need a simple and secure way to set the environment variables that your application uses. This can look very different depending on your specific production environment.

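You can leverage dedicated services for managing secrets in cloud environments, such as AWS Secrets Manager, Azure Key Vault, or Google Cloud Secret Manager.

You can also use the secret management objects provided by container orchestration systems, such as Kubernetes Secrets and Docker secrets.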

We’ll see how this works in Amazon Elastic Container Service (Amazon ECS) and Kubernetes.

Amazon ECS task definitions and AWS Secrets Manager

You can define environment variables directly within the task definition. Here's an example in JSON format:

{
  "containerDefinitions": [
    {
      "name": "my-container",
      "image": "my-image",
      "environment": [
        {
          "name": "ENV_VARIABLE_NAME",
          "value": "value"
        }
      ]
    }
  ]
}

For sensitive data, you can store the secrets in AWS Secrets Manager and reference them in your task definition:

{
  "containerDefinitions": [
    {
      "name": "my-container",
      "image": "my-image",
      "secrets": [
        {
          "name": "DB_PASSWORD",
          "valueFrom": "arn:aws:secretsmanager:region:aws_account_id:secret:secret_name"
        }
      ]
    }
  ]
}

Kubernetes ConfigMaps and Secrets

In Kubernetes, you can manage environment variables using ConfigMaps for non-sensitive data and Secrets for sensitive data.

Here's an example of defining environment variables using a ConfigMap. First, create a ConfigMap in a YAML file:

apiVersion: v1
kind: ConfigMap
metadata:
  name: app-config
data:
  ENV_VARIABLE_NAME: value

Then, reference the ConfigMap in a Pod:

apiVersion: v1
kind: Pod
metadata:
  name: my-app
spec:
  containers:
    - name: my-container
      image: my-image
      envFrom:
        - configMapRef:
            name: app-config

Alternatively, you can use Kubernetes Secrets for sensitive data. First, create a Secret in a YAML file:

apiVersion: v1
kind: Secret
metadata:
  name: app-secrets
type: Opaque
data:
  # Placeholder: replace with the base64-encoded value of your secret
  DB_PASSWORD: <base64-encoded-password>

Then, reference the Secret in a Pod:

apiVersion: v1
kind: Pod
metadata:
  name: my-app
spec:
  containers:
    - name: my-container
      image: my-image
      env:
        - name: DB_PASSWORD
          valueFrom:
            secretKeyRef:
              name: app-secrets
              key: DB_PASSWORD
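Values stored under the data field of a Kubernetes Secret manifest are base64-encoded. As a small sketch, the following Python snippet encodes a value for such a manifest and decodes it again; the password shown is a made-up placeholder.

```python
import base64

# Made-up placeholder value; never commit a real password.
plain = "example-password"

# Encode the value for use under the `data` field of a Secret manifest.
encoded = base64.b64encode(plain.encode("utf-8")).decode("ascii")

# Decoding reverses the step exactly.
decoded = base64.b64decode(encoded).decode("utf-8")

print(encoded)
```

Base64 is an encoding, not encryption: anyone who can read the manifest can decode the value, so Secret manifests deserve the same care as .env files.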

When to use environment variables

Environment variables are best used for configuration data that varies between deployment environments and for sensitive data that should not be stored directly in the code.

You should consider using environment variables for:

  • Database URLs and other related settings
  • API keys, tokens, or secrets
  • Hostnames or URLs for external services
  • Any kind of sensitive data that should not be exposed in the code

However, environment variables may not be the best choice for:

  • Fine-tuned configuration options that may need to be changed often
  • Large amounts of binary data
  • Data that would be better stored in a database or other storage service

Additional considerations for managing secrets and sensitive information

Managing secrets and sensitive information is one of the key uses of environment variables. By storing this data in environment variables, you keep it out of your code. This prevents secrets from being exposed in your version control system and allows you to change the secrets without modifying your code.

While environment variables are useful, using them alone is not enough to keep your secrets secure. You should also:

  • Avoid logging environment variables, as logs can often be accessed by people who should not see the secrets.
  • Be cautious about error messages that might expose environment variables.
  • Ensure that your .env file is ignored by your version control system.
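If you do need to log configuration for debugging, one possible approach is to mask values whose names look sensitive before they reach the log. The sketch below is a simple heuristic, not a complete safeguard; the redacted_environment helper and the SENSITIVE_MARKERS tuple are hypothetical names.

```python
import os

# Hypothetical markers used to guess which variable names hold secrets.
SENSITIVE_MARKERS = ("KEY", "SECRET", "TOKEN", "PASSWORD")


def redacted_environment(environ=os.environ):
    """Return a copy of the mapping with likely-sensitive values masked."""
    safe = {}
    for name, value in environ.items():
        if any(marker in name.upper() for marker in SENSITIVE_MARKERS):
            safe[name] = "***"
        else:
            safe[name] = value
    return safe


sample = {"API_KEY": "abcdef123456", "APP_ENV": "production"}
print(redacted_environment(sample))
```

Name-based matching will miss some secrets, for example a DATABASE_URI that embeds a password, so treat it as a safety net rather than a replacement for not logging the environment at all.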

Environment variables foster consistency

Keeping development and production environments consistent can help prevent bugs that appear when your code behaves differently in different environments. Environment variables can help with this.

By using environment variables for configurations that vary between environments, you can ensure that your code itself remains consistent across all environments. Only the settings in the environment variables change.

This means that you can run your code in a development, testing, or production environment just by setting the appropriate environment variables. If your code works in one environment, it's more likely to work in others.

However, this requires discipline. You should resist the temptation to hardcode configuration data and instead always use environment variables. Also, all members of your team should understand how to set and use environment variables in their development environments. This can be facilitated by using tools like python-dotenv, which simplify the process of managing environment variables.

Conclusion

This article explored how using environment variables can help manage production applications more securely and parameterize data engineering pipelines.

For further assistance, join the Dagster Slack and ask the community for help.

Read the next article in our Python series, which builds on these data engineering concepts and explores type hints in Python.

FAQs about Python Environment Variables

What is the primary benefit of using environment variables in Python?

The main advantage of using environment variables in Python is to separate sensitive configuration data from your codebase. This practice enhances security by keeping details like API keys and database credentials out of version control, making your applications more portable and easier to manage across different environments. It allows you to modify application behavior without directly altering the code.

How do os.environ and os.getenv() differ when accessing variables?

When accessing environment variables in Python, os.environ['KEY'] will raise a KeyError if the specified variable is not found, which is useful when a variable is strictly required. In contrast, os.getenv('KEY') returns None if the variable doesn't exist, and os.getenv('KEY', 'default_value') allows you to provide a fallback default, offering a safer way to access optional variables. Both methods are part of the built-in os module for interacting with the operating system.

Why is it recommended to add .env files to .gitignore?

Adding .env files to .gitignore is a critical security best practice to prevent sensitive information from being committed to version control systems like Git. These files often contain credentials, API keys, and other secrets that should never be publicly exposed. Ignoring them ensures that your local configurations remain separate and secure, especially when collaborating or open-sourcing projects.

Are changes made to environment variables using os.environ permanent?

Changes made to environment variables via os.environ in a Python script are local and temporary, affecting only the current process and its subprocesses. These modifications do not persist after the script finishes executing and are not visible to other unrelated processes or future shell sessions. To establish permanent environment variables, changes must be made at the operating system or shell configuration level outside of Python.

How do organizations manage sensitive environment variables in production applications?

In production environments, organizations manage sensitive environment variables using specialized secret management services provided by cloud platforms or container orchestration systems. Examples include AWS Secrets Manager, Azure Key Vault, Google Cloud Secret Manager, Kubernetes Secrets, and Docker secrets. These tools securely store and inject credentials and API keys into applications at runtime, avoiding hardcoding and ensuring robust security practices.
