Breaking Packages in Python | Dagster Blog

February 27, 2024 · 8 minute read

An exposé of the nooks and crannies of Python’s modules and packages.
Pedram Navid (@pdrmnvd)


Anyone who has spent sufficient time in Python has been hit with seemingly odd behaviors that defy expectations. For a language that claims “There should be one-- and preferably only one --obvious way to do it”, there seem to be a great many ways of doing things, and none of them all that obvious.

In this post, we’ll dive deep into Python’s system of imports, modules, and packages to look for some sharp edges and hopefully learn more about how Python works under the hood. Let’s dive in!

Two ways of running a module

Let’s start with a simple example and see how much we can break our expectations. We will start with a root project folder; in my case, it’s called python-envs.

We’ll create two files, hello.py and goodbye.py, and add some simple print statements.

We will import hello from the goodbye module to show how a simple import works.

# File: hello.py
# --------------
def hello():
    print("Hello, from the hello function in {}!".format(__name__))

if __name__ == "__main__":
    hello()
    print("Hello, from the main block!")

# File: goodbye.py
# ----------------
import hello

def goodbye():
    print("Goodbye, from the goodbye function in {}!".format(__name__))

if __name__ == "__main__":
    hello.hello()
    goodbye()
    print("Goodbye, from the main block!")

We can easily run hello.py two ways in Python, either by invoking the script directly with the command python hello.py or as a module with python -m hello. Both seem to work great!

❯ python hello.py
Hello, from the hello function in __main__!
Hello, from the main block!

❯ python -m hello
Hello, from the hello function in __main__!
Hello, from the main block!

And likewise, we can do the same with goodbye.py:

❯ python goodbye.py
Hello, from the hello function in hello!
Goodbye, from the goodbye function in __main__!
Goodbye, from the main block!

❯ python -m goodbye
Hello, from the hello function in hello!
Goodbye, from the goodbye function in __main__!
Goodbye, from the main block!

So far so good.
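To make the difference in __name__ concrete, here is a minimal sketch that runs a throwaway hello.py three ways from a temporary folder: as a script, with -m, and through a plain import. The helper name run_three_ways and the one-line module body are assumptions made for illustration; they are not part of the project above.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

# A stand-in for hello.py that only reports its own __name__.
HELLO_SOURCE = 'print("__name__ is", __name__)\n'


def run_three_ways() -> dict[str, str]:
    """Run a throwaway hello.py as a script, with -m, and via import."""
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, "hello.py").write_text(HELLO_SOURCE)
        commands = {
            "script": [sys.executable, "hello.py"],
            "module": [sys.executable, "-m", "hello"],
            "import": [sys.executable, "-c", "import hello"],
        }
        return {
            label: subprocess.run(
                cmd, cwd=tmp, capture_output=True, text=True, check=True
            ).stdout.strip()
            for label, cmd in commands.items()
        }


if __name__ == "__main__":
    for label, output in run_three_ways().items():
        print(f"{label:>6}: {output}")
```

Both ways of running the file should report __main__, while the import should report hello, which is why the main block only fires when the file is the entry point.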

Structuring our project into subfolders

Let’s say we wanted to start organizing our files. What if we moved these two files into a subfolder?

❯ mkdir -p subfolder
❯ mv *.py ./subfolder
❯ tree -I __pycache__      # ignore __pycache__ folders
.
└── subfolder
    ├── goodbye.py
    └── hello.py

2 directories, 2 files

Nothing has really changed yet. Let’s try and run hello and goodbye from the root folder.

❯ python subfolder/hello.py
Hello, from the hello function in __main__!
Hello, from the main block!

❯ python subfolder/goodbye.py
Hello, from the hello function in hello!
Goodbye, from the goodbye function in __main__!
Goodbye, from the main block!

❯ python -m subfolder.hello
Hello, from the hello function in __main__!
Hello, from the main block!

❯ python -m subfolder.goodbye
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/Users/pedram/projects/python-envs/subfolder/goodbye.py", line 1, in <module>
    import hello
ModuleNotFoundError: No module named 'hello'

What happened? These two seemingly similar ways of running a Python file result in different behaviors. Let’s start with the first example, python subfolder/goodbye.py.

What does Python actually do when we run the above commands? One great way to find out is to run Python in interactive mode with python -i subfolder/goodbye.py.

This will run our script and then drop us in the interactive REPL.

 ❯ python -i subfolder/goodbye.py
Hello, from the hello function in hello!
Goodbye, from the goodbye function in __main__!
Goodbye, from the main block!
>>>

Let’s take a look at sys.path, which lists every folder Python will search when looking for modules:

>>> import sys
>>> print(sys.path)
['/Users/pedram/projects/python-envs/subfolder', '/Users/pedram/.pyenv/versions/3.11.5/lib/python311.zip', '/Users/pedram/.pyenv/versions/3.11.5/lib/python3.11', '/Users/pedram/.pyenv/versions/3.11.5/lib/python3.11/lib-dynload', '/Users/pedram/.pyenv/versions/3.11.5/lib/python3.11/site-packages']

When we run a Python file directly by invoking it with python script.py, Python treats this file as a script and adds the folder of that script to the system path. The system path determines where Python looks for modules, and since both hello.py and goodbye.py are in the same folder, Python is able to import hello.

>>> hello
<module 'hello' from '/Users/pedram/projects/python-envs/subfolder/hello.py'>

When we try to load subfolder.goodbye as a submodule, however, it doesn’t work:

❯ python -im subfolder.goodbye
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/Users/pedram/projects/python-envs/subfolder/goodbye.py", line 1, in <module>
    import hello
ModuleNotFoundError: No module named 'hello'
>>> import sys
>>> print(sys.path)
['/Users/pedram/projects/python-envs', '/Users/pedram/.pyenv/versions/3.11.5/lib/python311.zip', '/Users/pedram/.pyenv/versions/3.11.5/lib/python3.11', '/Users/pedram/.pyenv/versions/3.11.5/lib/python3.11/lib-dynload', '/Users/pedram/.pyenv/versions/3.11.5/lib/python3.11/site-packages']

The sharp edge of Python’s import system

Our sys.path is missing the subfolder! It turns out we’ve discovered the first sharp edge of Python’s import system: when running a script directly, Python adds the script’s folder to the system path, but it does not do this when running a module.

However, since it does include our root folder, we can import subfolder as a module:

>>> import subfolder
>>> subfolder.__path__
_NamespacePath(['/Users/pedram/projects/python-envs/subfolder'])

We can update our existing code to change our import from import hello to import subfolder.hello as hello. Now our script works!

import subfolder.hello as hello

def goodbye():
    print("Goodbye, from the goodbye function in {}!".format(__name__))

if __name__ == "__main__":
    hello.hello()
    goodbye()
    print("Goodbye, from the main block!")

❯ python -im subfolder.goodbye
Hello, from the hello function in subfolder.hello!
Goodbye, from the goodbye function in __main__!
Goodbye, from the main block!

Let’s see if we can still run the file directly:

❯ python subfolder/goodbye.py
Traceback (most recent call last):
  File "/Users/pedram/projects/python-envs/subfolder/goodbye.py", line 1, in <module>
    import subfolder.hello as hello
ModuleNotFoundError: No module named 'subfolder'

It turns out that in Python when you fix one problem, you sometimes create a new one.

Let’s see if we can diagnose this problem. We’ll run Python in interactive mode:

❯ python -i subfolder/goodbye.py
Traceback (most recent call last):
  File "/Users/pedram/projects/python-envs/subfolder/goodbye.py", line 1, in <module>
    import subfolder.hello as hello
ModuleNotFoundError: No module named 'subfolder'

>>> import subfolder
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ModuleNotFoundError: No module named 'subfolder'

>>> import sys
>>> sys.path
['/Users/pedram/projects/python-envs/subfolder', '/Users/pedram/.pyenv/versions/3.11.5/lib/python311.zip', '/Users/pedram/.pyenv/versions/3.11.5/lib/python3.11', '/Users/pedram/.pyenv/versions/3.11.5/lib/python3.11/lib-dynload', '/Users/pedram/.pyenv/versions/3.11.5/lib/python3.11/site-packages']
>>>

There appear to be two differences between running python subfolder/goodbye.py and python -m subfolder.goodbye. When running a script directly, Python adds the folder where the script is located to sys.path, but it does not add the current working directory. Conversely, when running a file as a module with -m, Python adds your current working directory to sys.path, but it does not add the folder of the script.

Let’s summarize what we’ve learned so far!

                           python -m foo.bar        python foo/bar.py
Current Working Directory  Added to sys.path        Not added to sys.path
Folder of the script       Not added to sys.path    Added to sys.path
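The summary above can be checked with a small experiment. The sketch below builds a temporary subfolder containing a hypothetical probe.py that prints sys.path[0], then runs it both ways. The names first_path_entries and probe.py are made up for this illustration, and the sketch assumes the PYTHONSAFEPATH environment variable is not set, since that changes what Python puts at the front of sys.path.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

# probe.py prints the first sys.path entry, the one Python adds at startup.
PROBE_SOURCE = "import sys\nprint(sys.path[0])\n"


def first_path_entries() -> dict[str, Path]:
    """Report sys.path[0] for script-style and module-style invocations."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp).resolve()
        subfolder = root / "subfolder"
        subfolder.mkdir()
        (subfolder / "probe.py").write_text(PROBE_SOURCE)

        commands = {
            "python subfolder/probe.py": [sys.executable, "subfolder/probe.py"],
            "python -m subfolder.probe": [sys.executable, "-m", "subfolder.probe"],
        }
        results = {}
        for label, cmd in commands.items():
            output = subprocess.run(
                cmd, cwd=root, capture_output=True, text=True, check=True
            ).stdout.strip()
            # Resolve symlinks so both results are directly comparable.
            results[label] = Path(output).resolve()
        return results


if __name__ == "__main__":
    for label, entry in first_path_entries().items():
        print(f"{label:<28} sys.path[0] = {entry}")
```

The script-style run should report the subfolder itself, and the -m run should report its parent, the directory the command was launched from.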

Now, you’re wondering “well how do I make sure this works in all cases?” The short answer is to install your code as a package. If you don’t know what that means, don’t worry, we’ll talk about it below, or you can check out our Python Packages Primer. In the meantime, let’s explore another sharp edge.

Implicit Namespace Packages

I didn’t tell you this, but this whole time, we were dealing with a second sharp edge: implicit namespace packages.

We saw in the previous section how, when we have a folder, we can use it as part of the namespace when importing modules. When we have this:

.
└── subfolder
    ├── goodbye.py
    └── hello.py

We can do this:

❯ python -m subfolder.goodbye
Hello, from the hello function in subfolder.hello!
Goodbye, from the goodbye function in __main__!
Goodbye, from the main block!

But this wasn’t always the case. In fact, prior to the release of Python 3.3 in 2012, the above code wouldn’t work. Since I couldn’t get Python 3.2 working on my Mac, I’ll use Docker.

❯ cat Dockerfile
FROM python:3.2

COPY . .
RUN python -m subfolder.goodbye

❯ docker build .
> [3/3] RUN python -m subfolder.goodbye:
0.752 /usr/local/bin/python: No module named subfolder

The reason our code above worked in Python 3.3 and later is due to implicit namespace packages, as discussed in PEP 420. I won’t get into the nitty-gritty aspects of it (for the curious, take some time to read the Python-Dev thread on this), but in essence, PEP 420 made it possible to treat folders as a Python package without having to add an __init__.py file.

If we add the __init__.py file to our subfolder, then even Python 3.2 can run our code:

❯ touch subfolder/__init__.py
=> [3/3] RUN python -m subfolder.goodbye

What is this __init__.py file anyway? Let’s take a close look at what happens before and after we use one, back to Python 3.11:

# No __init__.py
>>> import namepkg.hello
>>> namepkg.__path__
_NamespacePath(['/Users/pedram/projects/python-envs/namespaces/namepkg'])

# With __init__.py
>>> import normalpkg.hello
>>> normalpkg.__path__
['/Users/pedram/projects/python-envs/namespaces/normalpkg']

When we import a module, Python acts differently depending on whether or not an __init__.py file is found in the folder. In the first example, without __init__.py, Python creates a namespace package when importing the module. The second example, with the __init__.py file, creates a normal Python package.
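If you would rather check this from code than from the REPL, the following sketch creates one folder without and one with __init__.py in a temporary directory, imports both, and records a few attributes. The package names namepkg_demo and normalpkg_demo and the function describe_packages are hypothetical, chosen so they do not clash with anything real. The exact loader class name varies between Python versions, so treat the printed names as indicative.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def describe_packages() -> dict[str, dict[str, object]]:
    """Import one folder without and one with __init__.py and describe both."""
    names = ("namepkg_demo", "normalpkg_demo")  # hypothetical package names
    with tempfile.TemporaryDirectory() as tmp:
        for name in names:
            Path(tmp, name).mkdir()
        # Only the second folder gets an __init__.py.
        Path(tmp, "normalpkg_demo", "__init__.py").write_text("")

        sys.path.insert(0, tmp)
        try:
            importlib.invalidate_caches()
            report = {}
            for name in names:
                module = importlib.import_module(name)
                init_file = getattr(module, "__file__", None)
                report[name] = {
                    "file": Path(init_file).name if init_file else None,
                    "path_type": type(module.__path__).__name__,
                    "loader": type(module.__spec__.loader).__name__,
                }
            return report
        finally:
            # Undo the changes to interpreter-wide state.
            sys.path.remove(tmp)
            for name in names:
                sys.modules.pop(name, None)


if __name__ == "__main__":
    for name, details in describe_packages().items():
        print(name, details)
```

On a recent Python 3, the folder without __init__.py should report no file and a _NamespacePath, while the folder with __init__.py should report __init__.py and a plain list.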

We can see the differences between these two types of imports in a few ways:

# Normal Packages
# ---------------
>>> import normalpkg

# Have a defined __file__ attribute
>>> normalpkg.__file__
'/Users/pedram/projects/python-envs/namespaces/normalpkg/__init__.py'

# Have a mutable __path__ attribute
>>> normalpkg.__path__
['/Users/pedram/projects/python-envs/namespaces/normalpkg']

# Use a SourceFileLoader
>>> normalpkg.__loader__
<_frozen_importlib_external.SourceFileLoader object at 0x100a0d9d0>

# Namespace Packages
# ---------------
>>> import namepkg

# Have __file__ set to None (so the REPL prints nothing)
>>> namepkg.__file__

# Have an immutable __path__ attribute
>>> namepkg.__path__
_NamespacePath(['/Users/pedram/projects/python-envs/namespaces/namepkg'])

# Use a NamespaceLoader
>>> namepkg.__loader__
<_frozen_importlib_external.NamespaceLoader object at 0x100a0d3d0>

Namespace Packages let you do some unholy things:

What the __init__.py file does is define whether a particular folder should be considered a package. A package is a way of structuring Python modules with namespaces. Let’s build out a deeper example.

For example, say you had a project that dealt with various languages. You may want to be explicit about which language a module belongs to. Packages allow you to use dotted module names to reference these. Suppose you had a structure like this:

lang/                       # Root package directory
    __init__.py
    english/                # English package
        __init__.py
        translator.py       # lang.english.translator module
    french/                 # French package
        __init__.py
        translator.py       # lang.french.translator module

You could then reference your modules like so:

import lang.english.translator
import lang.french.translator

lang.english.translator.word("hello")
lang.french.translator.word("goodbye")

This is the normal way that Python packages work, but ever since the introduction of namespace packages, Python can also load packages implicitly, without any __init__.py files:

mkdir -p lang lang/french lang/english
touch lang/english/translator.py lang/french/translator.py

# lang/english/translator.py
def word(word):
    return word

# lang/french/translator.py
def word(word):
    return "le " + word

>>> import lang.english.translator
>>> import lang.french.translator
>>>
>>> lang.english.translator.word("hello")
'hello'

>>> lang.french.translator.word("goodbye")
'le goodbye'

So when would you want to use namespace packages and when would you prefer to use explicit packages with __init__.py? In general, you should almost always lean on explicit packages. Namespace packages earn their keep mainly when a single package has to be split across several directories or separately installed distributions.
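One thing a namespace package can do that a regular package cannot is span more than one directory. The sketch below is an illustration under assumptions: it creates two unrelated root folders, each holding a portion of a hypothetical lang_demo package with no __init__.py, puts both roots on sys.path, and imports a module from each portion. The names lang_demo, first, second, and import_split_namespace are invented for the example.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def import_split_namespace() -> tuple[list[str], list[str]]:
    """Import two modules of one namespace package from two separate roots."""
    with tempfile.TemporaryDirectory() as tmp:
        roots = []
        for root_name, module_name in [("first", "english"), ("second", "french")]:
            portion = Path(tmp, root_name, "lang_demo")  # no __init__.py here
            portion.mkdir(parents=True)
            (portion / f"{module_name}.py").write_text(f"NAME = {module_name!r}\n")
            roots.append(str(portion.parent))

        sys.path[:0] = roots  # put both roots at the front of sys.path
        try:
            importlib.invalidate_caches()
            english = importlib.import_module("lang_demo.english")
            french = importlib.import_module("lang_demo.french")
            package = sys.modules["lang_demo"]
            # Each __path__ entry is one portion; report which root it lives in.
            origins = [Path(entry).parent.name for entry in package.__path__]
            return [english.NAME, french.NAME], origins
        finally:
            # Undo the changes to interpreter-wide state.
            for root in roots:
                sys.path.remove(root)
            for name in ("lang_demo", "lang_demo.english", "lang_demo.french"):
                sys.modules.pop(name, None)


if __name__ == "__main__":
    names, origins = import_split_namespace()
    print("imported:", names)
    print("portions found under:", origins)
```

If either portion contained an __init__.py, Python would treat that folder as a regular package and stop looking for the other portion, which is one reason mixing the two styles tends to cause confusion.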

The trouble with relative imports

Odds are you have seen relative imports in the wild:


# ├── tree/
# │   ├── __init__.py
# │   ├── child.py
# │   └── sibling.py

# in tree/sibling.py
from . import child

Odds are you’ve even tried to use relative imports yourself:

❯ python sibling.py
Traceback (most recent call last):
  File "/Users/pedram/projects/python-envs/relatives/tree/sibling.py", line 1, in <module>
    from . import child
ImportError: attempted relative import with no known parent package

At this point, you may have just given up, or you may have tried to be persistent.

❯ python -m sibling
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/Users/pedram/projects/python-envs/relatives/tree/sibling.py", line 1, in <module>
    from . import child
ImportError: attempted relative import with no known parent package

❯ python
Python 3.11.5 (main, Jan 18 2024, 19:32:12) [Clang 15.0.0 (clang-1500.1.0.2.5)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import child
hello my child
>>> import sibling
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/pedram/projects/python-envs/relatives/tree/sibling.py", line 1, in <module>
    from . import child
ImportError: attempted relative import with no known parent package
>>>

Eventually, you may give up and just try

❯ cat sibling.py
import child

❯ python -m sibling
hello my child

So, what does this error even mean?

ImportError: attempted relative import with no known parent package

You created __init__.py so your tree folder must be a package.

Turns out, it is, but only when you’re not in the folder.

~/projects/python-envs/relatives/tree main*
❯ python
Python 3.11.5 (main, Jan 18 2024, 19:32:12) [Clang 15.0.0 (clang-1500.1.0.2.5)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import child
hello my child
>>> child.__package__
''
❯ cd ..
~/projects/python-envs/relatives main*
❯ python
Python 3.11.5 (main, Jan 18 2024, 19:32:12) [Clang 15.0.0 (clang-1500.1.0.2.5)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import tree
>>> tree.__package__
'tree'
>>> from tree import sibling
hello my child

Python will only detect a package when it is invoked from outside the package’s folder. The reasons for this are arcane and involve mysterious concepts such as __name__ and __main__, but for our discussion it’s enough to know that to have a folder detected as a package, you must be one step above the folder when invoking Python from the command line.

From the Python docs:

When importing the package, Python searches through the directories on sys.path looking for the package subdirectory.

When we ran python -m sibling, Python ran sibling.py as a top-level module, not as part of a package. Even though we have __init__.py, Python does not create a package.

To get Python to treat our files as a package, we need to move up one directory and import tree.
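The behavior described above can be reproduced end to end. This sketch rebuilds the tree folder in a temporary directory and runs sibling.py three ways, keeping only the last line each command prints. The helper names last_line and relative_import_demo are assumptions for the example; the file contents mirror the ones used in this section.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def last_line(cmd: list[str], cwd: Path) -> str:
    """Run a command and return the final line of its combined output."""
    proc = subprocess.run(cmd, cwd=cwd, capture_output=True, text=True)
    return (proc.stdout + proc.stderr).strip().splitlines()[-1]


def relative_import_demo() -> dict[str, str]:
    """Run tree/sibling.py from inside and from above the tree folder."""
    with tempfile.TemporaryDirectory() as tmp:
        parent = Path(tmp)
        tree = parent / "tree"
        tree.mkdir()
        (tree / "__init__.py").write_text("")
        (tree / "child.py").write_text("print('hello my child')\n")
        (tree / "sibling.py").write_text("from . import child\n")

        return {
            "inside tree: python sibling.py": last_line(
                [sys.executable, "sibling.py"], tree
            ),
            "inside tree: python -m sibling": last_line(
                [sys.executable, "-m", "sibling"], tree
            ),
            "above tree: python -m tree.sibling": last_line(
                [sys.executable, "-m", "tree.sibling"], parent
            ),
        }


if __name__ == "__main__":
    for label, line in relative_import_demo().items():
        print(f"{label:<36} -> {line}")
```

Only the last invocation gives sibling a parent package, so it is the only one where the relative import can be resolved.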

Of course, if we want to avoid all of the pain we’ve experienced, the best option we have is to install our code as a Python Package.

A Minimal Python Package

my_package
│   setup.py
│
└───tree
    │   __init__.py
    │   child.py
    │   sibling.py

In setup.py

from setuptools import setup, find_packages

setup(
    name='tree',
    version='0.1',
    packages=find_packages(),
)

Now, from the root my_package folder, install this folder as an editable package:

pip install -e .
Obtaining file:///Users/pedram/projects/python-envs/relatives/tree
  Preparing metadata (setup.py) ... done
Installing collected packages: tree
  Attempting uninstall: tree
    Found existing installation: tree 0.1
    Uninstalling tree-0.1:
      Successfully uninstalled tree-0.1
  Running setup.py develop for tree
Successfully installed tree-0.1

Finally, to make it so we can run scripts from anywhere without worry, we’ll change all our relative imports to absolute imports:

test ❯ bat *.py
───────┬────────────────────────────────────────────────────────────────────────
       │ File: child.py
───────┼────────────────────────────────────────────────────────────────────────
       │ print('hello my child')
───────┴────────────────────────────────────────────────────────────────────────
───────┬────────────────────────────────────────────────────────────────────────
       │ File: sibling.py
───────┼────────────────────────────────────────────────────────────────────────
       │ from tree import child
───────┴────────────────────────────────────────────────────────────────────────

Now we can run our script in any way we want, anywhere we want, without ever having to think about the dreaded Python system path again.

~/projects/python-envs/relatives/tree/tree main*
test ❯ python -m sibling
hello my child

~/projects/python-envs/relatives/tree/tree main*
test ❯ python sibling.py
hello my child

~/projects/python-envs/relatives/tree main*
test ❯ python tree/sibling.py
hello my child

~/projects/python-envs/relatives/tree main*
test ❯ python -m tree.sibling
hello my child

test ❯ cd ~
~
test ❯ python
Python 3.11.5 (main, Jan 18 2024, 19:32:12) [Clang 15.0.0 (clang-1500.1.0.2.5)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import tree.sibling
hello my child
>>>
>>> import sys
>>> sys.path
['', '/Users/pedram/.pyenv/versions/3.11.5/lib/python311.zip',
'/Users/pedram/.pyenv/versions/3.11.5/lib/python3.11',
'/Users/pedram/.pyenv/versions/3.11.5/lib/python3.11/lib-dynload',
'/Users/pedram/.pyenv/versions/test/lib/python3.11/site-packages',
'/Users/pedram/projects/python-envs/relatives/tree']
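When an import behaves unexpectedly, it can help to ask Python where it would load a module from before actually importing it. The sketch below uses importlib.util.find_spec from the standard library; the wrapper name locate and its output wording are assumptions for illustration. It is meant for top-level names such as tree, since looking up a dotted name imports the parent package first.

```python
import importlib.util


def locate(module_name: str) -> str:
    """Describe where a top-level module would be imported from."""
    spec = importlib.util.find_spec(module_name)
    if spec is None:
        return f"{module_name}: not importable from the current sys.path"
    if spec.origin is not None:
        # Regular modules and packages report a file (or 'built-in').
        return f"{module_name}: {spec.origin}"
    # Namespace packages have no single origin, only search locations.
    locations = list(spec.submodule_search_locations or [])
    return f"{module_name}: namespace package at {locations}"


if __name__ == "__main__":
    for name in ("json", "sys", "tree"):
        print(locate(name))
```

Run from any directory after the editable install, the tree entry should point back into the project folder rather than into the current working directory.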

Wrapping Up

I hope this was a fun exploration of the nooks and crannies of Python’s packaging system. For those of you who’ve had the pleasure of using a more modern system, Python’s package management can seem archaic and clunky, and well, it is. But hopefully understanding how Python deals with packages and imports can help the next time you get frustrated trying to do something as simple as importing a module.



