Python Packages: a Primer for Data People (part 2 of 2) | Dagster Blog

March 6, 2023 · 3 minute read


An introduction to managing Python dependencies and some virtual environment best practices.
Elliot Gunn (@elliot)


In part 1 of this series, we explored the basics of Python modules, Python packages and how to import modules into your own projects.

As you build larger and more complex packages, you'll often need to use code from other packages in your project. This is where managing dependencies becomes important.

Let’s talk about what dependency management in Python looks like today. We’ll cover the old and new ways to declare your package’s dependencies, an alternative dependency management tool, and how to isolate each project’s dependencies with virtual environments.


Managing dependencies

Dependencies are other packages that your package relies on to work correctly. Keeping track of dependencies can be a challenge, but there are tools to help you manage them effectively.

A central piece of this ecosystem is the Python Package Index (PyPI), a public repository of open-source Python packages. By default, pip downloads packages from PyPI, and you can search it for packages to include in your project or publish your own packages there for others to use.

In the following sections, we'll look at two different ways to manage dependencies in your Python projects: the old way using setup.py, and the new way using pyproject.toml.

Managing Dependencies the Old Way: setup.py

Before the introduction of pyproject.toml, the recommended way to manage dependencies in Python projects was to use a setup.py file.

setup.py is a Python script that you include in the root of your project. It describes your package and its dependencies, and pip (through setuptools) uses it to build and install your package along with those dependencies.

Here is an example of a setup.py file:

from setuptools import setup, find_packages

setup(
    name='your-package-name',
    version='0.0.1',
    description='A brief description of your package',
    author='Your Name',
    author_email='your.email@example.com',
    packages=find_packages(),
    install_requires=[
        'dependency1',
        'dependency2',
    ],
)
  • The name and version are required fields that specify the name and version of your package.
  • The description field provides a brief description of your package.
  • The author and author_email fields specify the name and email address of the person responsible for the package.
  • The packages field specifies the packages that are included in your project. The find_packages() function is used to automatically find all packages in your project.
  • The install_requires field is a list of dependencies that your package needs in order to run correctly. In this example, the package depends on two other packages, dependency1 and dependency2.
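To see where install_requires ends up, it can help to look at an installed package's metadata. When pip installs your package, each install_requires entry is recorded as a Requires-Dist line in the package's METADATA file, and the standard-library importlib.metadata module can read those lines back. The following is a minimal sketch, assuming Python 3.9+ and that the package is already installed in the current environment; declared_dependencies is a hypothetical helper name.

```python
import importlib.metadata


def declared_dependencies(dist_name: str) -> list[str]:
    """Return the Requires-Dist entries recorded for an installed distribution.

    Hypothetical helper for illustration. Raises
    importlib.metadata.PackageNotFoundError if dist_name isn't installed
    in the current environment.
    """
    # requires() returns None when the distribution declares no dependencies.
    return importlib.metadata.requires(dist_name) or []


if __name__ == "__main__":
    # "your-package-name" is the placeholder name from the setup.py example.
    try:
        for requirement in declared_dependencies("your-package-name"):
            print(requirement)
    except importlib.metadata.PackageNotFoundError:
        print("your-package-name is not installed in this environment")
```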

To install your package along with its dependencies, run the following command from the root of your project:

pip install -e .

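The -e option tells pip to perform an "editable" install: rather than copying your code into the environment, pip points the installed package at your source directory, so changes to your code take effect without reinstalling (changes to metadata, such as the dependency list, still require reinstalling). The . at the end of the command refers to the current directory, so run it from your project's root, where setup.py lives.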
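If you're ever unsure whether a package in your environment was installed in editable mode, pip leaves a record. For installs from a local directory or URL, it writes a direct_url.json file into the package's metadata (specified in PEP 610), and editable installs set "editable": true under "dir_info". Here is a minimal sketch that reads that file, assuming Python 3.9+; is_editable is a hypothetical helper, not part of pip.

```python
import importlib.metadata
import json


def is_editable(dist_name: str) -> bool:
    """Return True if pip recorded dist_name as an editable (-e) install.

    Hypothetical helper; relies on the direct_url.json file described in PEP 610.
    """
    dist = importlib.metadata.distribution(dist_name)
    # read_text() returns None when the file is absent, e.g. for a regular
    # install of a release downloaded from PyPI.
    raw = dist.read_text("direct_url.json")
    if raw is None:
        return False
    info = json.loads(raw)
    return bool(info.get("dir_info", {}).get("editable", False))


if __name__ == "__main__":
    name = "your-package-name"  # placeholder from the setup.py example
    try:
        print(f"{name} editable: {is_editable(name)}")
    except importlib.metadata.PackageNotFoundError:
        print(f"{name} is not installed in this environment")
```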

It is important to note that if no virtual environment is active, pip installs your package and its dependencies into the global Python environment, which can create conflicts when you are working on multiple projects. Virtual environments, covered later in this blog post, address this.

Managing Dependencies the New Way: pyproject.toml

pyproject.toml is a standardized configuration file intended to replace setup.py for declaring a project's metadata and dependencies. PEP 518 introduced the file to specify build-system requirements, and PEP 621 standardized the [project] table that holds package metadata.

Because pyproject.toml is a declarative TOML file rather than an executable Python script, it is simpler to read and maintain than setup.py. pip reads it to find out how to build your package and which dependencies to install.

Here is an example of a pyproject.toml file:

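# Tells pip which build backend builds and installs the project
[build-system]
requires = ["setuptools>=64"]
build-backend = "setuptools.build_meta"

[project]
name = "your-package-name"
version = "0.0.1"
description = "A brief description of your package"
authors = [{ name = "Your Name", email = "your.email@example.com" }]
# PEP 508 requirement strings, each with an optional version range
dependencies = [
    "dependency1>=1.0,<2.0",
    "dependency2>=2.0,<3.0",
]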
  • The name and version fields are required, and specify the name and version of your package.
  • The description field provides a brief description of your package.
  • The authors field specifies the name and email address of the person responsible for the package.
  • The dependencies field specifies the dependencies that your package needs in order to run correctly. In this example, the package depends on two other packages, dependency1 and dependency2.

As with setup.py, you can run the same command to perform an editable install:

pip install -e .

Installing “extras”

Some packages have optional features that require additional dependencies. These optional dependency groups are called "extras".

If you are using the setup.py file to manage your dependencies, you can specify the extras by including them in the extras_require argument of the setup() function. For example:

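from setuptools import setup, find_packages

setup(
    name='your-package-name',
    version='0.0.1',
    packages=find_packages(),
    install_requires=['dependency1', 'dependency2'],
    # Optional dependency groups, installed with: pip install -e ".[extra_feature]"
    extras_require={
        'extra_feature': ['dependency3', 'dependency4'],
    },
)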

To install the extra dependencies, you would run the following command (the quotes stop shells such as zsh from treating the square brackets as a glob pattern):

pip install -e ".[extra_feature]"

If you are using the pyproject.toml file, you can specify the extras in the [project.optional-dependencies] table of the file. For example:

[project.optional-dependencies]
extra_feature = ["dependency3", "dependency4"]

You can use the same command to install the extra dependencies as above:

pip install -e ".[extra_feature]"

As mentioned earlier, the -e flag stands for "editable" and it installs the package in "developer mode". You’ll use this if you want to test any changes before publishing a new version.

Running pip install -e ".[dev]" works the same way, but also installs the "dev" extra: any packages listed under a dev key in extras_require (setup.py) or in [project.optional-dependencies] (pyproject.toml). "dev" is a naming convention rather than a special keyword, and it is commonly used for development-only tools, such as test runners and linters, that aren't needed in production.

Alternative Python Dependency Management Tools

Aside from pip, there are alternative tools available for managing dependencies in Python projects. One such tool is Poetry.

Poetry is a packaging and dependency management tool for Python projects. It is designed to be more user-friendly than pip, combining dependency resolution, a lock file (poetry.lock) that pins exact versions, and automatic virtual environment management in a single tool.

One of the main advantages of using Poetry is its simplicity and ease of use. Poetry automatically creates and manages a virtual environment for each project, so each project's dependencies are isolated and version conflicts between projects are less likely. Poetry also provides a concise syntax for version constraints in your pyproject.toml file; for example, a caret constraint such as ^1.0 in the [tool.poetry.dependencies] table allows any 1.x release.
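Caret constraints like ^1.0 are Poetry syntax, not the PEP 440 version specifiers that pip understands. Poetry's documented rule is that a caret allows updates that don't change the left-most non-zero version component, so ^1.0 means >=1.0,<2.0 and ^0.2.3 means >=0.2.3,<0.3.0. The following is a simplified sketch of that translation; caret_to_pep440 is a hypothetical helper, not Poetry's implementation, and it ignores pre-releases and wildcards.

```python
def caret_to_pep440(constraint: str) -> str:
    """Translate a Poetry-style caret constraint (e.g. "^1.0") into a PEP 440 range.

    Hypothetical, simplified helper: handles numeric release segments only.
    """
    if not constraint.startswith("^"):
        raise ValueError(f"not a caret constraint: {constraint!r}")
    parts = [int(piece) for piece in constraint[1:].split(".")]
    # Bump the left-most non-zero component for the upper bound; if every
    # component is zero, bump the last one given (so ^0.0 becomes <0.1).
    index = next((i for i, part in enumerate(parts) if part != 0), len(parts) - 1)
    upper = parts[:index] + [parts[index] + 1] + [0] * (len(parts) - index - 1)
    lower_text = ".".join(str(part) for part in parts)
    upper_text = ".".join(str(part) for part in upper)
    return f">={lower_text},<{upper_text}"


print(caret_to_pep440("^2.0"))  # >=2.0,<3.0
```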

However, there are also some disadvantages to using Poetry. For one, it is not as widely used as pip, and may not be as well-supported in the wider Python community. Some developers may prefer the more flexible and customizable approach offered by pip.

Virtual environments (a.k.a. 'venvs')

Virtual environments in Python provide a solution to the problem of dependency conflicts. By default, pip installs packages into a single site-packages directory shared by everything that uses that Python installation, so two projects on the same machine that need different versions of the same library will conflict.

A virtual environment is an isolated Python environment with its own site-packages directory. You can have many of them on the same computer, typically one per project, each with its own set of package versions that don't interfere with one another. An environment uses the Python interpreter it was created with, so projects can also run on different Python versions if you create their environments with different interpreters.

To create a new virtual environment, you can use the python -m venv command. For example, to create an environment named "myenv", you would run:

python -m venv myenv
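The python -m venv command is a thin command-line wrapper around the standard-library venv module, so you can also create environments from Python code. The following minimal sketch creates an environment and then asks its interpreter for sys.prefix, which points at the environment directory rather than the system Python; that redirection is what isolates the environment's packages. The helper names create_env and interpreter_prefix are hypothetical, and the sketch skips installing pip (with_pip=False) to stay fast.

```python
import os
import subprocess
import tempfile
import venv
from pathlib import Path


def create_env(env_dir: Path) -> Path:
    """Create a virtual environment at env_dir and return its Python executable."""
    # symlinks mirrors the `python -m venv` default (symlink on POSIX, copy on Windows);
    # with_pip=False skips bootstrapping pip to keep the sketch fast.
    venv.create(env_dir, symlinks=(os.name != "nt"), with_pip=False)
    if os.name == "nt":
        return env_dir / "Scripts" / "python.exe"
    return env_dir / "bin" / "python"


def interpreter_prefix(python: Path) -> Path:
    """Ask an interpreter which prefix (environment directory) it runs from."""
    result = subprocess.run(
        [str(python), "-c", "import sys; print(sys.prefix)"],
        capture_output=True, text=True, check=True,
    )
    return Path(result.stdout.strip())


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        env_dir = Path(tmp) / "myenv"
        python = create_env(env_dir)
        # The environment's interpreter reports the environment directory as its prefix.
        print(interpreter_prefix(python).resolve() == env_dir.resolve())
```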

To activate a virtual environment, run the environment's activate script. On Linux and macOS, use the source command:

source myenv/bin/activate

On Windows, you would run:

myenv\Scripts\activate

While the environment is active, the python and pip commands use the environment's interpreter and packages rather than the system-wide installation, so anything you install goes into the environment.
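If a script should only ever run inside a virtual environment, for example a project bootstrap script that calls pip, it can check for one first. In an environment created with venv, sys.prefix points at the environment directory while sys.base_prefix still points at the Python installation it was created from, so comparing the two is a common check. A minimal sketch; in_virtualenv is a hypothetical helper name.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the running interpreter belongs to a venv-created environment."""
    # In a virtual environment, sys.prefix is the environment directory, while
    # sys.base_prefix is the installation the environment was created from.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    if in_virtualenv():
        print(f"Using virtual environment at {sys.prefix}")
    else:
        print("No virtual environment is active; packages would go into the system-wide environment.")
```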

To deactivate a virtual environment, simply type deactivate in the terminal.

You will notice that when you activate a virtual environment, your command line prompt changes to indicate the active venv:

myname@mymachine myProject % source myenv/bin/activate
(myenv) myname@mymachine myProject % deactivate
myname@mymachine myProject %

It is best practice to create a separate virtual environment for each project and to keep it in the project directory. This makes the environment easy to find and keeps each project's dependencies contained within its own folder.

If your project is a Git repository, add the virtual environment's directory to your .gitignore file. Virtual environments contain platform-specific binaries and absolute paths, so they shouldn't be committed; instead, each developer creates their own environment and installs the dependencies declared in setup.py or pyproject.toml.
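If you create environments often, a small helper can add the environment directory to .gitignore only when it isn't already listed. This is a minimal sketch; ensure_gitignored is a hypothetical helper, and it does an exact line match, so it won't recognize equivalent patterns such as /myenv or myenv written without the trailing slash.

```python
from pathlib import Path


def ensure_gitignored(project_dir: Path, entry: str = "myenv/") -> bool:
    """Append entry to project_dir/.gitignore unless that exact line is already there.

    Hypothetical helper. Returns True if .gitignore was created or changed.
    """
    gitignore = project_dir / ".gitignore"
    text = gitignore.read_text() if gitignore.exists() else ""
    if entry in (line.strip() for line in text.splitlines()):
        return False
    # Keep the existing last line intact by starting the new entry on its own line.
    if text and not text.endswith("\n"):
        text += "\n"
    gitignore.write_text(text + entry + "\n")
    return True
```

Run from the project root, ensure_gitignored(Path("."), "myenv/") adds the myenv environment from the earlier example and returns False on later calls.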

Up next…

We hope this blog post has provided a useful introduction to effectively managing Python dependencies. If you have any questions or need further clarification, feel free to join the Dagster Slack and ask the community for help. Thank you for reading!

Our next article will explore a roadmap for structuring Python projects.



We're always happy to hear your feedback, so please reach out to us! If you have any questions, ask them in the Dagster community Slack or start a GitHub discussion. If you run into any bugs, let us know with a GitHub issue. And if you're interested in working with us, check out our open roles!

Filed under: Python Guide