An exposé of the nooks and crannies of Python’s modules and packages.
The following article is part of a series on Python for data engineering aimed at helping data engineers, data scientists, data analysts, Machine Learning engineers, or others who are new to Python master the basics. To date this beginner's guide consists of:
- Part 1: Python Packages: a Primer for Data People (part 1 of 2), explored the basics of Python modules, Python packages and how to import modules into your own projects.
- Part 2: Python Packages: a Primer for Data People (part 2 of 2), covered dependency management and virtual environments.
- Part 3: Best Practices in Structuring Python Projects, covered 9 best practices and examples on structuring your projects.
- Part 4: From Python Projects to Dagster Pipelines, we explore setting up a Dagster project, and the key concept of Data Assets.
- Part 5: Environment Variables in Python, we cover the importance of environment variables and how to use them.
- Part 6: Type Hinting, or how type hints reduce errors.
- Part 7: Factory Patterns, or learning design patterns, which are reusable solutions to common problems in software design.
- Part 8: Write-Audit-Publish in data pipelines, a design pattern frequently used in ETL to ensure data quality and reliability.
- Part 9: CI/CD and Data Pipeline Automation (with Git), learn to automate data pipelines and deployments with Git.
- Part 10: High-performance Python for Data Engineering, learn how to code data pipelines in Python for performance.
- Part 11: Breaking Packages in Python, in which we explore the sharp edges of Python’s system of imports, modules, and packages.
Anyone who has spent sufficient time in Python has been hit with seemingly odd behaviors that defy expectations. For a language that claims “There should be one-- and preferably only one --obvious way to do it”, there seem to be very many ways of doing things, and none of them all that obvious.

In this post, we’ll dive deep into Python’s system of imports, modules, and packages to look for some sharp edges and hopefully learn more about how Python works under the hood. Let’s dive in!
Two ways of running a module
Let’s start with a simple example and see how much we can break our expectations. We will start with a root project folder; in my case, it’s called python-envs.
We’ll create two files, hello.py and goodbye.py, and add some simple print statements.
We will import hello from goodbye.py to show how a simple import works.
### File: hello.py
### --------------
def hello():
    print("Hello, from the hello function in {}!".format(__name__))

if __name__ == "__main__":
    hello()
    print("Hello, from the main block!")

### File: goodbye.py
### ----------------
import hello

def goodbye():
    print("Goodbye, from the goodbye function in {}!".format(__name__))

if __name__ == "__main__":
    hello.hello()
    goodbye()
    print("Goodbye, from the main block!")
We can easily run hello.py two ways in Python, either by invoking the script directly with the command python hello.py or as a module with python -m hello. Both seem to work great!
❯ python hello.py
Hello, from the hello function in __main__!
Hello, from the main block!
❯ python -m hello
Hello, from the hello function in __main__!
Hello, from the main block!
And likewise, we can do the same with goodbye.py:
❯ python goodbye.py
Hello, from the hello function in hello!
Goodbye, from the goodbye function in __main__!
Goodbye, from the main block!
❯ python -m goodbye
Hello, from the hello function in hello!
Goodbye, from the goodbye function in __main__!
Goodbye, from the main block!
So far so good.
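The reason the same function reports __main__ in one run and hello in another comes down to how the file was loaded. As a minimal sketch (the file name hello_demo.py and the variable seen_name are made up for illustration), the snippet below writes a one-line module to a temporary folder, executes it once roughly the way python file.py would and once via import, and records what __name__ was each time:

```python
import importlib
import runpy
import sys
import tempfile
from pathlib import Path


def name_when_run_and_imported():
    """Return the value of __name__ seen when a file is run vs. imported."""
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical module used only for this illustration.
        module_path = Path(tmp) / "hello_demo.py"
        module_path.write_text("seen_name = __name__\n")

        # Execute the file as the main program, similar to `python hello_demo.py`.
        as_script = runpy.run_path(str(module_path), run_name="__main__")["seen_name"]

        # Import the same file as an ordinary module.
        sys.path.insert(0, tmp)
        importlib.invalidate_caches()
        try:
            as_import = importlib.import_module("hello_demo").seen_name
        finally:
            # Clean up so the demo leaves no trace in the interpreter.
            sys.path.remove(tmp)
            sys.modules.pop("hello_demo", None)
    return as_script, as_import


if __name__ == "__main__":
    print(name_when_run_and_imported())  # → ('__main__', 'hello_demo')
```

This is why the if __name__ == "__main__": block in hello.py stays silent when goodbye.py imports it.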
Structuring our project into subfolders
Let’s say we wanted to start organizing our files. What if we moved these two files into a subfolder?
❯ mkdir -p subfolder
❯ mv *.py ./subfolder
❯ tree -I __pycache__ # ignore __pycache__ folders
.
└── subfolder
    ├── goodbye.py
    └── hello.py

2 directories, 2 files
Nothing has really changed yet. Let’s try to run hello and goodbye from the root folder.
❯ python subfolder/hello.py
Hello, from the hello function in __main__!
Hello, from the main block!
❯ python subfolder/goodbye.py
Hello, from the hello function in hello!
Goodbye, from the goodbye function in __main__!
Goodbye, from the main block!
❯ python -m subfolder.hello
Hello, from the hello function in __main__!
Hello, from the main block!
❯ python -m subfolder.goodbye
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/Users/pedram/projects/python-envs/subfolder/goodbye.py", line 1, in <module>
    import hello
ModuleNotFoundError: No module named 'hello'
What happened? These two seemingly similar ways of running a Python file result in different behaviors. Let’s start with the first example, python subfolder/goodbye.py.
When we run the above commands, what does Python actually do? One great way to find out is to run Python in interactive mode with python -i subfolder/goodbye.py.
This will run our script and then drop us into the interactive REPL.
❯ python -i subfolder/goodbye.py
Hello, from the hello function in hello!
Goodbye, from the goodbye function in __main__!
Goodbye, from the main block!
>>>
Let’s take a look at sys.path, which lists every folder Python will search for modules:
>>> import sys
>>> print(sys.path)
['/Users/pedram/projects/python-envs/subfolder', '/Users/pedram/.pyenv/versions/3.11.5/lib/python311.zip', '/Users/pedram/.pyenv/versions/3.11.5/lib/python3.11', '/Users/pedram/.pyenv/versions/3.11.5/lib/python3.11/lib-dynload', '/Users/pedram/.pyenv/versions/3.11.5/lib/python3.11/site-packages']
When we run a Python file directly by invoking it with python script.py, Python treats this file as a script, and adds the folder of that script to the system path. The system path determines where Python looks for modules, and since both hello.py and goodbye.py are in the same folder, Python is able to import hello.
>>> hello
<module 'hello' from '/Users/pedram/projects/python-envs/subfolder/hello.py'>
When we try to load subfolder.goodbye as a submodule, however, it doesn’t work:
❯ python -im subfolder.goodbye
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/Users/pedram/projects/python-envs/subfolder/goodbye.py", line 1, in <module>
    import hello
ModuleNotFoundError: No module named 'hello'
>>> import sys
>>> print(sys.path)
['/Users/pedram/projects/python-envs', '/Users/pedram/.pyenv/versions/3.11.5/lib/python311.zip', '/Users/pedram/.pyenv/versions/3.11.5/lib/python3.11', '/Users/pedram/.pyenv/versions/3.11.5/lib/python3.11/lib-dynload', '/Users/pedram/.pyenv/versions/3.11.5/lib/python3.11/site-packages']
The sharp edge of Python’s import system
Our sys.path is missing the subfolder! It turns out we’ve discovered the first sharp edge of Python’s import system: when running a script directly, Python adds the script’s folder to the system path, but it does not do this when running a module.
However, since it does include our root folder, we can import subfolder as a module:
>>> import subfolder
>>> subfolder.__path__
_NamespacePath(['/Users/pedram/projects/python-envs/subfolder'])
We can update our existing code to change our import from import hello to import subfolder.hello as hello. Now our script works!
import subfolder.hello as hello

def goodbye():
    print("Goodbye, from the goodbye function in {}!".format(__name__))

if __name__ == "__main__":
    hello.hello()
    goodbye()
    print("Goodbye, from the main block!")
❯ python -im subfolder.goodbye
Hello, from the hello function in subfolder.hello!
Goodbye, from the goodbye function in __main__!
Goodbye, from the main block!
Let’s see if we can still run the file directly:
❯ python subfolder/goodbye.py
Traceback (most recent call last):
  File "/Users/pedram/projects/python-envs/subfolder/goodbye.py", line 1, in <module>
    import subfolder.hello as hello
ModuleNotFoundError: No module named 'subfolder'
It turns out that in Python when you fix one problem, you sometimes create a new one.
Let's see if we can diagnose this problem. We'll run Python in interactive mode:
❯ python -i subfolder/goodbye.py
Traceback (most recent call last):
  File "/Users/pedram/projects/python-envs/subfolder/goodbye.py", line 1, in <module>
    import subfolder.hello as hello
ModuleNotFoundError: No module named 'subfolder'
>>> import subfolder
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ModuleNotFoundError: No module named 'subfolder'
>>> import sys
>>> sys.path
['/Users/pedram/projects/python-envs/subfolder', '/Users/pedram/.pyenv/versions/3.11.5/lib/python311.zip', '/Users/pedram/.pyenv/versions/3.11.5/lib/python3.11', '/Users/pedram/.pyenv/versions/3.11.5/lib/python3.11/lib-dynload', '/Users/pedram/.pyenv/versions/3.11.5/lib/python3.11/site-packages']
>>>
It appears there are two differences between running python subfolder/goodbye.py and python -m subfolder.goodbye. When running a script directly, Python does add the folder where the script is located to your sys.path, but it does not add the current working directory to your path. Conversely, when running a file as a module with -m, Python does add your current working directory to your sys.path, but it does not add the folder of the script.
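Let’s summarize what we’ve learned so far:
- python subfolder/goodbye.py: the script’s folder is added to sys.path; the current working directory is not.
- python -m subfolder.goodbye: the current working directory is added to sys.path; the module’s folder is not.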
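To check the two behaviors side by side, here is a small sketch that builds a throwaway project in a temporary folder and launches a child interpreter both ways, reporting the first sys.path entry each time. The file name show_path.py and the function first_path_entries are invented for this illustration, and the sketch assumes neither the -P flag nor the PYTHONSAFEPATH variable is in effect, since both change how sys.path is populated:

```python
import os
import subprocess
import sys
import tempfile


def first_path_entries():
    """Return (project_root, sys.path[0] as a script, sys.path[0] with -m)."""
    with tempfile.TemporaryDirectory() as tmp:
        root = os.path.realpath(tmp)
        subfolder = os.path.join(root, "subfolder")
        os.mkdir(subfolder)

        # Hypothetical script that reports the first entry of sys.path.
        with open(os.path.join(subfolder, "show_path.py"), "w") as handle:
            handle.write("import os, sys\nprint(os.path.realpath(sys.path[0]))\n")

        # Drop variables that would alter sys.path in the child process.
        env = {
            key: value
            for key, value in os.environ.items()
            if key not in ("PYTHONPATH", "PYTHONSAFEPATH")
        }

        def run(*args):
            result = subprocess.run(
                [sys.executable, *args],
                cwd=root,
                env=env,
                capture_output=True,
                text=True,
                check=True,
            )
            return result.stdout.strip()

        as_script = run(os.path.join("subfolder", "show_path.py"))
        as_module = run("-m", "subfolder.show_path")
    return root, as_script, as_module


if __name__ == "__main__":
    root, as_script, as_module = first_path_entries()
    print("script run:", as_script)  # the subfolder inside the project root
    print("module run:", as_module)  # the project root itself
```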
Now, you’re wondering “well how do I make sure this works in all cases?” The short answer is to install your code as a package. If you don’t know what that means, don’t worry, we’ll talk about it below, or you can check out our Python Packages Primer. In the meantime, let’s explore another sharp edge.
Implicit Namespace Packages
I didn’t tell you this, but this whole time, we were dealing with a second sharp edge: implicit namespace packages.
We saw in the previous section how, when we have a folder, we can use it as part of the namespace when importing modules. When we have a subfolder containing hello.py and goodbye.py, we can do this:
❯ python -m subfolder.goodbye
Hello, from the hello function in subfolder.hello!
Goodbye, from the goodbye function in __main__!
Goodbye, from the main block!
But this wasn’t always the case. In fact, prior to the release of Python 3.3 in 2012, the above code wouldn’t work. Since I couldn’t get Python 3.2 working on my Mac, I’ll use Docker.
> cat Dockerfile
FROM python:3.2
COPY . .
RUN python -m subfolder.goodbye
❯ docker build .
> [3/3] RUN python -m subfolder.goodbye:
0.752 /usr/local/bin/python: No module named subfolder
The reason our code above worked in Python 3.3 and later is due to implicit namespace packages, as discussed in PEP 420. I won’t get into the nitty-gritty aspects of it (for the curious, take some time to read the Python-Dev thread on this), but in essence, PEP 420 made it possible to treat folders as a Python package without having to add an __init__.py file.
If we add an __init__.py file to our subfolder, then even Python 3.2 can run our code:
❯ touch subfolder/__init__.py
=> [3/3] RUN python -m subfolder.goodbye
What is this __init__.py file anyway? Let’s take a close look at what happens before and after we use one, back in Python 3.11:
### No __init__.py
>>> import namepkg.hello
>>> namepkg.__path__
_NamespacePath(['/Users/pedram/projects/python-envs/namespaces/namepkg'])
### With __init__.py
>>> import normalpkg.hello
>>> normalpkg.__path__
['/Users/pedram/projects/python-envs/namespaces/normalpkg']
When we import a module, Python acts differently depending on whether or not the __init__.py file is found in the folder. In the first example, without __init__.py, Python creates a namespace package when trying to import a module. The second example, with the __init__.py file, creates a normal Python package.
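One way to tell the two kinds of package apart without importing them is to look at the module spec. The helper below is only a sketch: is_namespace_package, namepkg_demo, and normalpkg_demo are names invented for this example, and it relies on my understanding that a namespace package's spec has no origin file (None on recent Python versions, the string "namespace" on older 3.x releases).

```python
import importlib
import importlib.util
import sys
import tempfile
from pathlib import Path


def is_namespace_package(name):
    """Best-effort check: a package whose spec has no origin file."""
    spec = importlib.util.find_spec(name)
    if spec is None or spec.submodule_search_locations is None:
        return False  # not found, or a plain module rather than a package
    return spec.origin in (None, "namespace")


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        # Folder without __init__.py: should be seen as a namespace package.
        (root / "namepkg_demo").mkdir()
        (root / "namepkg_demo" / "hello.py").write_text("")
        # Folder with __init__.py: should be seen as a normal package.
        (root / "normalpkg_demo").mkdir()
        (root / "normalpkg_demo" / "__init__.py").write_text("")

        sys.path.insert(0, tmp)
        importlib.invalidate_caches()
        try:
            return (
                is_namespace_package("namepkg_demo"),
                is_namespace_package("normalpkg_demo"),
            )
        finally:
            sys.path.remove(tmp)


if __name__ == "__main__":
    print(demo())  # → (True, False)
```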
We can see the differences between these two types of imports in a few ways:
### Normal Packages
### ---------------
>>> import normalpkg
### Have a defined __file__ attribute
>>> normalpkg.__file__
'/Users/pedram/projects/python-envs/namespaces/normalpkg/__init__.py'
### Have a plain, mutable list as their __path__ attribute
>>> normalpkg.__path__
['/Users/pedram/projects/python-envs/namespaces/normalpkg']
### Use a SourceFileLoader
>>> normalpkg.__loader__
<_frozen_importlib_external.SourceFileLoader object at 0x100a0d9d0>
### Namespace Packages
### ------------------
>>> import namepkg
### Have __file__ set to None (so the REPL prints nothing)
>>> namepkg.__file__
### Have a dynamically computed __path__ (a _NamespacePath, not a plain list)
>>> namepkg.__path__
_NamespacePath(['/Users/pedram/projects/python-envs/namespaces/namepkg'])
### Use a NamespaceLoader
>>> namepkg.__loader__
<_frozen_importlib_external.NamespaceLoader object at 0x100a0d3d0>
Namespace packages let you do some unholy things.
What the __init__.py file does is define whether a particular folder should be considered a package. A package is a way of structuring Python modules with namespaces. Let’s build out a deeper example.
For example, say you had a project that dealt with various languages. You may want to be explicit about which language a module belongs to. Packages allow you to use dotted module names to reference these. Suppose you had a structure like this:
lang/                      # Root package directory
    __init__.py
    english/               # English package
        __init__.py
        translator.py      # english.translator module
    french/                # French package
        __init__.py
        translator.py      # french.translator module
You could then reference your modules like so:
import lang.english.translator
import lang.french.translator
lang.english.translator.word("hello")
lang.french.translator.word("goodbye")
This is the normal way that Python works, but ever since the introduction of namespace packages, Python can also load packages implicitly, without any __init__.py files:
mkdir -p lang lang/french lang/english
touch lang/english/translator.py lang/french/translator.py
### lang/english/translator.py
def word(word):
    return word
### lang/french/translator.py
def word(word):
    return "le " + word
>>> import lang.english.translator
>>> import lang.french.translator
>>>
>>> lang.english.translator.word("hello")
'hello'
>>> lang.french.translator.word("goodbye")
'le goodbye'
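So when would you want to use namespace packages and when would you prefer to use explicit packages with __init__.py? In general, you should almost always lean on explicit packages unless you’re building complex packages, such as a single package whose contents are split across several directories or separately installed distributions.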
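To make the "split across directories" case concrete, here is a sketch in which two unrelated folders each contribute one half of the same namespace package. The names lang_demo, dist_a, and dist_b are placeholders chosen for this example; the translator functions mirror the ones above.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def demo_split_namespace():
    """Import one namespace package whose parts live in two separate roots."""
    with tempfile.TemporaryDirectory() as tmp:
        root_a = Path(tmp) / "dist_a"
        root_b = Path(tmp) / "dist_b"
        # No __init__.py anywhere, so lang_demo is a namespace package.
        (root_a / "lang_demo" / "english").mkdir(parents=True)
        (root_b / "lang_demo" / "french").mkdir(parents=True)
        (root_a / "lang_demo" / "english" / "translator.py").write_text(
            "def word(word):\n    return word\n"
        )
        (root_b / "lang_demo" / "french" / "translator.py").write_text(
            "def word(word):\n    return 'le ' + word\n"
        )

        added = [str(root_a), str(root_b)]
        sys.path[:0] = added
        importlib.invalidate_caches()
        try:
            english = importlib.import_module("lang_demo.english.translator")
            french = importlib.import_module("lang_demo.french.translator")
            # __path__ lists every folder that contributes to the package.
            portions = len(list(sys.modules["lang_demo"].__path__))
            return english.word("hello"), french.word("goodbye"), portions
        finally:
            # Remove the demo entries so nothing leaks into the interpreter.
            for entry in added:
                sys.path.remove(entry)
            for name in [n for n in sys.modules if n.split(".")[0] == "lang_demo"]:
                del sys.modules[name]


if __name__ == "__main__":
    print(demo_split_namespace())  # → ('hello', 'le goodbye', 2)
```

Adding an __init__.py to either lang_demo folder would turn it into a regular package and hide the other half, which is one reason explicit packages are the safer default.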
The trouble with relative imports
Odds are you have seen relative imports in the wild:
### ├── tree/
### │   ├── __init__.py
### │   ├── child.py
### │   └── sibling.py
### in tree/sibling.py
from . import child
Odds are you’ve even tried to use relative imports yourself:
❯ python sibling.py
Traceback (most recent call last):
  File "/Users/pedram/projects/python-envs/relatives/tree/sibling.py", line 1, in <module>
    from . import child
ImportError: attempted relative import with no known parent package
At this point, you may have just given up, or you may have tried to be persistent.
❯ python -m sibling
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/Users/pedram/projects/python-envs/relatives/tree/sibling.py", line 1, in <module>
    from . import child
ImportError: attempted relative import with no known parent package
❯ python
Python 3.11.5 (main, Jan 18 2024, 19:32:12) [Clang 15.0.0 (clang-1500.1.0.2.5)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import child
hello my child
>>> import sibling
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/pedram/projects/python-envs/relatives/tree/sibling.py", line 1, in <module>
    from . import child
ImportError: attempted relative import with no known parent package
>>>
Eventually, you may give up and just try:
❯ cat sibling.py
import child
❯ python -m sibling
hello my child
So, what does this error even mean?
ImportError: attempted relative import with no known parent package
You created __init__.py, so your tree folder must be a package.
Turns out, it is, but only when you’re not in the folder.
~/projects/python-envs/relatives/tree main*
❯ python
Python 3.11.5 (main, Jan 18 2024, 19:32:12) [Clang 15.0.0 (clang-1500.1.0.2.5)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import child
hello my child
>>> child.__package__
''
> cd ..
~/projects/python-envs/relatives main*
❯ python
Python 3.11.5 (main, Jan 18 2024, 19:32:12) [Clang 15.0.0 (clang-1500.1.0.2.5)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import tree
>>> tree.__package__
'tree'
>>> from tree import sibling
hello my child
When you run a Python script, it will only detect a package when it is not run from within the folder itself. The reasons for this are arcane and involve mysterious concepts such as __name__ and __main__, but for our discussion it’s enough to know that to have a folder detected as a package, you must be one step above the folder when invoking Python from the command line.
From the Python docs:
When importing the package, Python searches through the directories on sys.path looking for the package subdirectory.
When we ran python -m sibling, Python ran sibling.py as a top-level module, not as part of a package. Even though we have __init__.py, Python does not create a package.
To get Python to treat our files as a package, we need to move up one directory and import tree.
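The same behavior can be reproduced without changing directories by controlling which folder is on sys.path. In this sketch (tree_demo and sibling_demo are stand-in names so the example does not collide with real modules), the relative import succeeds when the parent of the package is on the path and fails when the package folder itself is:

```python
import importlib
import sys
import tempfile
from pathlib import Path


def demo_relative_import():
    """Return (parent package seen from outside, result of importing from inside)."""
    with tempfile.TemporaryDirectory() as tmp:
        pkg = Path(tmp) / "tree_demo"
        pkg.mkdir()
        (pkg / "__init__.py").write_text("")
        (pkg / "child.py").write_text("GREETING = 'hello my child'\n")
        (pkg / "sibling_demo.py").write_text("from . import child\n")

        # Case 1: the folder ABOVE the package is on sys.path ("one step above").
        sys.path.insert(0, tmp)
        importlib.invalidate_caches()
        try:
            module = importlib.import_module("tree_demo.sibling_demo")
            # __spec__.parent is the package the module believes it belongs to.
            from_parent = module.__spec__.parent
        finally:
            sys.path.remove(tmp)
            for name in [n for n in sys.modules if n.split(".")[0] == "tree_demo"]:
                del sys.modules[name]

        # Case 2: the package folder ITSELF is on sys.path ("inside the folder").
        sys.path.insert(0, str(pkg))
        importlib.invalidate_caches()
        try:
            importlib.import_module("sibling_demo")
            from_inside = "imported"
        except ImportError as exc:
            from_inside = str(exc)
        finally:
            sys.path.remove(str(pkg))
            sys.modules.pop("sibling_demo", None)
    return from_parent, from_inside


if __name__ == "__main__":
    for result in demo_relative_import():
        print(result)
```

In the second case the module is loaded as a top-level module with no parent package, so the leading dot in from . import child has nothing to refer to.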
Of course, if we want to avoid all of the pain we’ve experienced, the best option we have is to install our code as a Python Package.
A Minimal Python Package
my_package
│   setup.py
│
└───tree
    │   __init__.py
    │   child.py
    │   sibling.py
In setup.py:
from setuptools import setup, find_packages

setup(
    name='tree',
    version='0.1',
    packages=find_packages(),
)
Now, from the root folder, install this folder as an editable package:
pip install -e .
Obtaining file:///Users/pedram/projects/python-envs/relatives/tree
Preparing metadata (setup.py) ... done
Installing collected packages: tree
Attempting uninstall: tree
Found existing installation: tree 0.1
Uninstalling tree-0.1:
Successfully uninstalled tree-0.1
Running setup.py develop for tree
Successfully installed tree-0.1
Finally, to make it so we can run scripts from anywhere without worry, we’ll change all our relative imports to absolute imports:
test ❯ bat *.py
───────┬────────────────────────────────────────────────────────────────────────
│ File: child.py
───────┼────────────────────────────────────────────────────────────────────────
1 │ print('hello my child')
───────┴────────────────────────────────────────────────────────────────────────
───────┬────────────────────────────────────────────────────────────────────────
│ File: sibling.py
───────┼────────────────────────────────────────────────────────────────────────
1 │ from tree import child
───────┴────────────────────────────────────────────────────────────────────────
Now we can run our script in any way we want, anywhere we want, without ever having to think about the dreaded Python system path again.
~/projects/python-envs/relatives/tree/tree main*
test ❯ python -m sibling
hello my child
~/projects/python-envs/relatives/tree/tree main*
test ❯ python sibling.py
hello my child
~/projects/python-envs/relatives/tree main*
test ❯ python tree/sibling.py
hello my child
~/projects/python-envs/relatives/tree main*
test ❯ python -m tree.sibling
hello my child
test ❯ cd ~
~
test ❯ python
Python 3.11.5 (main, Jan 18 2024, 19:32:12) [Clang 15.0.0 (clang-1500.1.0.2.5)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import tree.sibling
hello my child
>>>
>>> import sys
>>> sys.path
['', '/Users/pedram/.pyenv/versions/3.11.5/lib/python311.zip',
'/Users/pedram/.pyenv/versions/3.11.5/lib/python3.11',
'/Users/pedram/.pyenv/versions/3.11.5/lib/python3.11/lib-dynload',
'/Users/pedram/.pyenv/versions/test/lib/python3.11/site-packages',
'/Users/pedram/projects/python-envs/relatives/tree']
Wrapping Up
I hope this was a fun exploration of the nooks and crannies of Python’s packaging system. For those of you who’ve had the pleasure of using a more modern system, Python’s package management can seem archaic and clunky, and well, it is. But hopefully understanding how Python deals with packages and imports can help the next time you get frustrated trying to do something as simple as importing a module.