November 7, 2022 • 5 minute read

Adding Types to a Large Python Codebase

Name: Sean Mackesey

Python is one of the world’s most beloved programming languages. Dynamic typing makes it easy to learn and fast to build with. Its popularity has led to its use in some of the most advanced AI systems and web applications on the planet.

However, traditional dynamically typed Python code struggles at scale. Specifically:

Refactoring is painful. It is challenging to refactor any large dynamically typed codebase. This is because dynamically typed code can’t be processed with the sort of static analysis and refactoring tools widely available in statically typed languages (e.g. Java, C#). These tools support automatic symbol renaming, reference lookup, edit-time type-checking, etc.
It’s hard to ensure correctness. Dynamically typed Python code is difficult to check for correctness. The best available method is to write comprehensive automated tests, but writing and maintaining a large test suite is expensive. Tests are also often too slow to provide feedback while editing in the same way as static analysis tools, allowing errors to compound for longer.

Python is not the only language with these problems. The world’s most popular language, JavaScript, is in the same boat. TypeScript, a statically typed variant of JavaScript first released in 2012, has taken the JavaScript world by storm. TypeScript added rich type annotations to JavaScript, which opened the door to refactoring and static analysis tools that facilitate development and maintenance of large JavaScript projects.

Mypy is the most popular type checker for Python.

That same year, Jukka Lehtosalo presented mypy, a project that promised to bring similar benefits to Python. Two years later, mypy-inspired type annotations became a Python standard with PEP 484 in Python 3.5, with improvements released in each subsequent major version of Python. Today in 2022, there are a variety of mature type-checking tools (e.g. mypy, pyright, pyre) that consume Python type annotations, and libraries are increasingly expected to provide a type-annotated public interface.

📚 The Dagster story
💻 Step 1: configure a language server
✅ Step 2: mark published package with py.typed
📖 Step 3: formalize public API
🔌 Step 4: set up CI
🏦 Step 5: annotate!
🏆 The payoff

📚 The Dagster story

Development of Dagster started in 2018, before type-annotating Python code was a best practice. We wrote thousands of files containing hundreds of thousands of lines of code without types. At some point we started including types with most new code, but crucial older parts of the codebase remained untyped. As Dagster grew in scale and popularity, both our developers and users started to suffer from the lack of type annotations. Developers struggled to orient themselves in unfamiliar parts of the codebase, and users struggled to master Dagster’s many APIs.

We committed to making large improvements to Dagster’s typing, starting by driving the core dagster package to a 100%-typed public interface. This has turned out to be a significant undertaking. In this post we document what we learned for others embarking on similar projects. We’ve broken it down into a 5-step process for sprucing up and maintaining the type annotations in any Python project.

💻 Step 1: configure a language server

Annotating a large codebase is extremely tedious without proper tooling. It’s not uncommon for a single module to need to import a dozen or more symbols purely for annotation purposes. You do not want to do this manually in hundreds of files.

A type checker alone cannot help here: you need auto-import, an autocomplete “side effect” that seamlessly adds completed symbols to a module’s imports. Auto-import is particularly important in Python because so many common types (Optional, Sequence, List, etc.) are not in the global namespace and will need to be imported from typing in almost every module.
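To give a sense of the import overhead, here is a small hypothetical module (the function and its name are ours for illustration, not Dagster code). Even this one signature needs three names from typing, each of which a language server with auto-import would add to the import line as you complete it.

```python
# Every name on this import line exists only to support annotations.
from typing import List, Optional, Sequence


def filter_by_prefix(names: Sequence[str], prefix: Optional[str] = None) -> List[str]:
    """Return the names that start with `prefix` (all names if prefix is None)."""
    if prefix is None:
        return list(names)
    return [name for name in names if name.startswith(prefix)]
```

Multiply that import line by hundreds of modules and the case for automating it becomes clear.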

Auto-import (and more generally autocomplete) is best provided by a language server. What is the difference between a type checker and a language server? A type checker (e.g. mypy) is a program that analyzes your source code and flags type inconsistencies. Most can be run like a typical linter or other command line program: you pass in input/configuration and the program analyzes your code and dumps a list of errors as output. A language server (e.g. pylance) is a process that runs continuously in the background during editing. It is typically launched by an editor, keeps a living representation of your code in memory, and communicates with the editor using the Language Server Protocol. A full-featured language server provides a superset of type checker functionality: it can perform type checking for a target file on each keystroke, or collate type errors with other kinds of errors.

Capability	Type checker	Full language server
Checks types	✅	✅
Real-time feedback (autocomplete, etc.)	❌	✅
Interactive code navigation (jump-to-definition, find references, etc.)	❌	✅
Automated refactoring (rename symbol, etc.)	❌	✅
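As a rough illustration of what "flags type inconsistencies" means, the hypothetical snippet below runs without raising, yet its last call passes a tuple where the annotation asks for a list. A checker such as mypy or pyright should report that argument when run over the file; a language server surfaces the same diagnostic while you type.

```python
from typing import List


def average(values: List[float]) -> float:
    """Return the arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


# Consistent with the annotation: a list of floats.
ok = average([1.0, 2.0, 3.0])

# Inconsistent with the annotation: a tuple is not a List[float].
# Python itself does not complain, so only static analysis catches this.
bad = average((1.0, 2.0))
```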

Language servers can also leverage type annotations in ways outside the scope of a type-checker. In addition to the aforementioned auto-import, many language servers provide an API to jump to the class definition of an instance under your cursor, search for all instances in code where a class is referenced, and more. These features are very useful in large codebases. Do yourself a favor and do not attempt to annotate your code without first setting up a language server.

Python’s dominant language server is Microsoft’s pyright/pylance (there is also an unmaintained language server by Palantir named python-language-server; skip this one). It is important to understand that pyright/pylance are totally separate entities from mypy. Mypy is Python's most popular type checker because it came first, is written in Python itself, and is semi-official (it’s hosted under the python organization on GitHub). But it is not a language server, although a variety of (limited functionality) third-party language servers or other editor extensions can convert mypy output into real-time editing feedback.

The marketing of pyright/pylance is somewhat confusing (see here for an incomplete list of differences).

Pyright is open-source and is billed as a “static type checker for Python”. This is accurate but misleading. Pyright does include a full-featured static type checker (written in TypeScript and wholly separate from mypy's implementation), but it also (unlike mypy) includes a language server providing many powerful features, including auto-import.

Pylance is closed-source and is billed as a “language server extension for Visual Studio Code”. Whereas pyright can be run in any context, pylance is released as a VSCode extension and is only supposed to be run with VSCode. The core of the extension is simply a language server executable that in principle could be used with any editor that speaks the Language Server Protocol. However, Microsoft has implemented protections to block execution outside of VSCode. Type-checking and the bulk of pylance’s other functionality are actually provided by the open-source pyright, but there is an additional proprietary layer of code that provides more powerful language server features.

Unless you have an existing mypy workflow or configuration, we recommend skipping mypy entirely and just using pyright (for command line type-checking and a language server for non-VSCode editors) and pylance (as a language server with VSCode). It’s not just the language server functionality: pyright’s type-checking is also better than mypy's. Pyright is faster, has a much more responsive development team (bugs are fixed fast; mypy bugs linger on the issue tracker for years), and understands certain typing constructs (particularly recursive types) better than mypy.
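For readers unfamiliar with the term, a recursive type is an alias that refers to itself. A JSON value is a common case. The alias and helper below are illustrative names of our own, and how well a given checker version handles such aliases is something to verify against your own toolchain.

```python
from typing import Dict, List, Union

# The alias mentions itself (as a forward reference) inside List and Dict.
JSON = Union[str, int, float, bool, None, List["JSON"], Dict[str, "JSON"]]


def depth(value: JSON) -> int:
    """Return how many levels of list/dict nesting `value` contains."""
    if isinstance(value, list):
        return 1 + max((depth(item) for item in value), default=0)
    if isinstance(value, dict):
        return 1 + max((depth(item) for item in value.values()), default=0)
    return 0
```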

Note that configuring pyright/pylance (together with other tools like linters) can be tricky, particularly in a monorepo. It is quite easy to inadvertently run tools in the wrong virtual environment, spawn multiple language server instances when you only want one, overwrite configuration due to having multiple configuration files in the same repo, or exclude a subsidiary library from type-checking. It is well worth it to thoroughly acquaint yourself with pyright’s configuration (also consumed by pylance) and your editor’s documentation on language server management. If you see some confusing type error, it’s best to do a sanity check: ensure that pyright/pylance is using the right configuration and is running in the proper Python environment before banging your head against a wall trying to solve typing puzzles.
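One way to make that sanity check concrete is a small script like the sketch below. It relies on two things we are confident of: pyright reads settings from a pyrightconfig.json file or from a [tool.pyright] table in pyproject.toml. The function names are hypothetical, and the plain substring match on pyproject.toml is a deliberate simplification rather than a real TOML parse.

```python
import sys
from pathlib import Path
from typing import List


def find_pyright_configs(repo_root: Path) -> List[Path]:
    """List files under repo_root that may carry pyright settings."""
    configs = sorted(repo_root.rglob("pyrightconfig.json"))
    for pyproject in sorted(repo_root.rglob("pyproject.toml")):
        # Simplification: look for the table header instead of parsing TOML.
        if "[tool.pyright]" in pyproject.read_text(encoding="utf-8"):
            configs.append(pyproject)
    return configs


def describe_environment() -> str:
    """Report which interpreter and environment this script is running in."""
    return f"{sys.executable} (prefix: {sys.prefix})"
```

If find_pyright_configs returns more than one path for your repo, it is worth confirming which file your editor's language server is actually picking up. Note that rglob walks the whole tree, so on a large monorepo you may want to skip virtual environments and vendored directories.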


✅ Step 2: mark published package with `py.typed`

Because so much Python code is untyped, popular static type checkers (mypy, pyright) do not use type information from third-party packages by default. You must explicitly mark your package as typed by including a py.typed file in your published package. This is described in PEP 561:

Package maintainers who wish to support type checking of their code MUST add a marker file named py.typed to their package supporting typing. This marker applies recursively: if a top-level package includes it, all its sub-packages MUST support type checking as well.

If your package does not include py.typed, your users will not get the benefit of your hard work adding type annotations! Even if every function in your package is fully annotated, mypy will ignore those annotations and print warnings like this:

main.py:1: error: Skipping analyzing 'dagster': module is installed, but missing library stubs or py.typed marker

This is easy to miss as a developer because it only occurs when mypy discovers a package via an import statement (typical for a library consumer), whereas a developer is likely to pass the source root directly to mypy, in which case it is checked regardless of the existence of py.typed. You can read more about this behavior here (for mypy) and here (for pyright/pylance).

Note that despite PEP 561’s claim that “if a top-level package includes it, all its sub-packages MUST support type checking as well”, you don’t have to wait until your package is 100% typed before adding py.typed. Type checkers are forgiving of untyped code, and it is better to offer users access to whatever type annotations you’ve added rather than waiting until 100% coverage.

One final gotcha: it’s not enough to just add py.typed to your source repo. You need to ensure it’s included in the published distribution of your package. If you are using setuptools (you probably are), PEP 561 recommends:

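from setuptools import setup

setup(
    # ... other setup() arguments ...
    package_data={
        # Ship the PEP 561 marker file inside the built distribution.
        'foopkg': ['py.typed'],
    },
    # ... other setup() arguments ...
)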
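To confirm the marker actually made it into an installed distribution, a check along these lines can be run in a fresh environment after installing the built package. This is a minimal sketch using only the standard library; the helper name has_py_typed is our own.

```python
import importlib.util
from pathlib import Path


def has_py_typed(package_name: str) -> bool:
    """Return True if the importable top-level package ships a py.typed marker."""
    spec = importlib.util.find_spec(package_name)
    if spec is None or not spec.submodule_search_locations:
        # Not installed, or a plain module rather than a package.
        return False
    return any(
        (Path(location) / "py.typed").is_file()
        for location in spec.submodule_search_locations
    )
```

Running has_py_typed against your package name in an environment that installed it from the published wheel (not from the source checkout) is what catches the packaging gotcha described above.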

📖 Step 3: formalize public API

Our recommendation as library authors is to prioritize bringing your public API to 100% type coverage. To meet and maintain this standard, it helps to formalize your public API, meaning describe it in a machine-legible form. This will allow (a) typing coverage analysis to assess your public API separately from the rest of your codebase; (b) other static analysis tools to analyze and interpret your code for users (by, e.g., hiding private symbols from autocomplete); (c) generated API documentation to reliably include only APIs you intend to be public.

Historically, Python hasn’t had a way to formalize a public API at the library level. Fortunately, there is now a nascent Python standard (still not in a PEP to our knowledge). TL;DR:

A library's public API is generally opt-out: modules and (most) symbols defined in those modules are public by default.
All modules/packages that you want to hide from your public API must be prefixed with an _. An _-prefixed package also marks all contained modules private, so these contained modules don’t themselves require an _-prefix.
Within public modules, there is a distinction between public and private symbols. _-prefixed symbols and imported symbols are not public by default. To mark an imported symbol public, you’ll need to alias it (from foo import bar as <some_alias>) or include it in __all__. (Note that the default private status of imported symbols can be unintuitive, since it is common to want to “reexport” imported symbols from a root-level package for a public interface).

In Dagster, our public API was previously informally defined as “everything importable from top-level dagster”. To bring this into conformance with the above, we needed to _-prefix all of the top-level dagster submodules (e.g. dagster.core became dagster._core). Dagster is a large project with many source files, so this entailed changing thousands of import statements across multiple published libraries (as well as the occasional submodule reference in text) in addition to our private internal code. It was a significant project that we accomplished using a combination of the refactoring capabilities of the PyCharm IDE and pylance, and the command-line tools sed and rpl. Note that PyCharm performed much better than pylance for large rename operations; pylance sometimes crashed when the operation was too large (i.e. changes required in ~1000 files).

We also redundantly aliased all of the imported symbols in the top-level dagster module (per above, this is needed for these imported symbols to be public):

from dagster._builtins import (
    Any as Any,
    Bool as Bool,
    Float as Float,
    Int as Int,
    Nothing as Nothing,
    String as String,
)
from dagster._config.config_schema import (
    ConfigSchema as ConfigSchema,
)
...

Once we’d done this, we wanted to measure the typing coverage of just our public API. This can be done with the “verify types” functionality of pyright. Below is some (truncated) example output from running this against the dagster package (note that this won’t work if your package lacks a py.typed file):

$ pyright --ignoreexternal --verifytypes dagster

dagster._core.launcher.default_run_launcher.DefaultRunLauncher.inst_data
  /Users/smackesey/stm/code/elementl/oss/python_modules/dagster/dagster/_core/launcher/default_run_launcher.py: error: Return type annotation is missing
... more type errors...

Symbols exported by "dagster": 274
  With known type: 251
  With ambiguous type: 3
  With unknown type: 20
    (Ignoring unknown types imported from other packages)
  Functions without docstring: 4
  Functions without default param: 14
  Classes without docstring: 0

Other symbols referenced but not exported by "dagster": 5097
  With known type: 3091
  With ambiguous type: 64
  With unknown type: 1942

Type completeness score: 91.6%

Completed in 3.006sec
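The reported score appears to be the share of exported symbols with a known type. That reading is our inference from the numbers in this output rather than something the output itself states, but the arithmetic is consistent:

```python
# Counts copied from the "Symbols exported by" section of the output above.
exported = 274
known = 251
ambiguous = 3
unknown = 20

# The three categories account for every exported symbol.
assert known + ambiguous + unknown == exported

# Share of exported symbols whose type is fully known, as a percentage.
score = 100 * known / exported
print(f"{score:.1f}%")
```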

The above approach allowed us to formalize our public API at the level of modules and symbols exported from those modules. However, we wanted to be even more specific: we wanted the ability to have a “public” method on an exported class that was nevertheless hidden from our users.

This situation can easily arise in large projects. You have a class that is widely used internally, but also sometimes accessed by users (e.g. DagsterInstance). You want to provide a different class interface for internal code and for users. Python’s convention of _-prefixing methods cannot help you here: this marks a method private in the traditional object-oriented sense, meaning that the method is not intended to be called outside the class. But within the set of public (non-_-prefixed) methods, there is no standard way to distinguish methods intended for users from those intended for internal use. This problem is shared by many languages (and partially solvable with friend classes, which Python lacks).

To solve this problem, we decided to require that methods on public classes opt in to our public API through a custom @public decorator:

@public
def get_run_by_id(self, run_id: str) -> Optional[DagsterRun]:
    return cast(DagsterRun, self._run_storage.get_run_by_id(run_id))

Since this isn’t a Python standard, pyright --verifytypes does not respect it: it includes non-@public methods in its evaluation of our public API. However, we were able to use a custom Sphinx extension to filter the methods included in our API docs to the @public ones:

#9632 Fix sphinx autoclass members
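For readers who want to try the same pattern, here is a minimal sketch of how an opt-in marker decorator and a matching filter could work. It is not Dagster's implementation: the marker attribute name, the helper public_method_names, and the Instance class are all hypothetical.

```python
from typing import Callable, List, TypeVar

F = TypeVar("F", bound=Callable[..., object])

_PUBLIC_ATTR = "_is_public"  # hypothetical marker attribute name


def public(fn: F) -> F:
    """Mark a method as user-facing without changing its behavior."""
    setattr(fn, _PUBLIC_ATTR, True)
    return fn


def public_method_names(cls: type) -> List[str]:
    """Return names of members defined directly on `cls` that opted in via @public."""
    return sorted(
        name
        for name, member in vars(cls).items()
        if getattr(member, _PUBLIC_ATTR, False)
    )


class Instance:
    @public
    def get_run_by_id(self, run_id: str) -> str:
        return f"run:{run_id}"

    def internal_sync(self) -> None:
        # Callable from internal code, but not marked for users.
        pass
```

A documentation filter can then call something like public_method_names(Instance) and emit only those members. Because vars() looks at one class at a time, inherited methods would need a walk over the class's MRO.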

🔌 Step 4: set up CI

Continuous integration (CI) is essential for maintaining standards in a large collaborative codebase. We use Buildkite for our CI platform, with our configuration defined in this package. In addition to assorted tests and linters, our setup also checks for type correctness and will soon check for type coverage.

Despite our earlier recommendation of pyright, our CI type correctness checks still use mypy for almost every package in our codebase. This is due to inertia: we started our typing journey with mypy, and pyright is by default significantly stricter than mypy. We haven’t yet gotten the codebase to a point where it will pass pyright.

Mypy works well in CI because of its conservative behavior: by default, it will “ignore” untyped code. This means several things, but the most important is that the return values of untyped functions are assigned Any, which means they will never trigger type errors. This is convenient for running in CI on largely untyped code, since mypy will only flag errors in code that someone has tried to type.
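A small hypothetical example shows what this looks like in practice. Under mypy's default settings the call at the bottom is not reported, even though it passes a string to a function annotated to take an int.

```python
def load_count():
    # No annotations: mypy's defaults treat the return value as Any.
    return "3"


def double(x: int) -> int:
    return x * 2


# Any is compatible with int, so no type error is reported here by default.
# At runtime the string is repeated rather than doubled.
result = double(load_count())
```

A checker that infers return types for unannotated functions, as pyright does, would be expected to flag this call, which is part of why pyright feels stricter on a partially typed codebase.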

We will soon be using pyright --ignoreexternal --verifytypes in CI to check completeness of dagster. Many of our integration libraries lack thoroughly typed public interfaces, so we won’t be adding completeness checks for them for now. We are still just shy of a 100% typed public API for dagster, so the CI addition remains an open PR:

#9113 Add pyright typecheck and coverage steps to BK
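A completeness gate of this kind could be scripted roughly as follows. This is a sketch, not the contents of the PR above: it parses the human-readable "Type completeness score" line shown earlier in this post, and the function names and the idea of a minimum threshold are our own assumptions.

```python
import re
import subprocess
from typing import Optional

_SCORE_PATTERN = re.compile(r"Type completeness score: ([0-9.]+)%")


def parse_completeness(output: str) -> Optional[float]:
    """Extract the completeness percentage from pyright's text report, if present."""
    match = _SCORE_PATTERN.search(output)
    return float(match.group(1)) if match else None


def check_completeness(package: str, minimum: float) -> bool:
    """Run the verify-types report for `package` and compare its score to `minimum`."""
    proc = subprocess.run(
        ["pyright", "--ignoreexternal", "--verifytypes", package],
        capture_output=True,
        text=True,
        check=False,  # a non-zero exit is expected while coverage is incomplete
    )
    score = parse_completeness(proc.stdout)
    return score is not None and score >= minimum
```

A CI step could call check_completeness("dagster", 100.0) and fail the build when it returns False. Starting with a lower threshold and ratcheting it up is one way to adopt the check before reaching full coverage.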

⌨️ Step 5: annotate!

At this point, it’s time to actually add type annotations! Here are a bunch of PRs where we did this incrementally:

🏆 The payoff

So was it worth it? We think so. Despite the large developer investment, our type annotation improvements have paid off in both enhanced development velocity and user experience. We expect even greater dividends as Dagster ages, features are added, and our developers lose familiarity with older parts of the codebase.

As consumers of many other open-source Python libraries, we hope other Python developers start to prioritize fully typed public interfaces. We'd like to contribute where we can. In the coming months, we’ll be publishing at least one follow-up piece on this blog that gets into some of the nitty-gritty details around Python type annotations and associated tooling.

If you want to support the Dagster Open Source project, be sure to star our GitHub repo.

We're always happy to hear your feedback, so please reach out to us! If you have any questions, ask them in the Dagster community Slack (join here!) or start a GitHub discussion. If you run into any bugs, let us know with a GitHub issue. And if you're interested in working with us, check out our open roles!
