Explore another set of powerful yet overlooked Python features, from `overload` and `cached_property` to `contextvars` and `ExitStack`
In our previous post, we peeled back the curtain on some lesser-known Python features that power the Dagster library. These features often won’t appear in Copilot suggestions, but they can make a big difference in your code’s performance, clarity, and maintainability.
Of course, Python has far too many of these gems to fit into a single post. In this sequel, we’ll explore another set of features and patterns we rely on in Dagster that are just as powerful, yet often overlooked.
So fire up your favorite IDE, follow along if you’d like, and let’s keep digging into Python features that your coding assistant probably won’t mention but you’ll be glad you know about.
overload
Function overloading lets you declare multiple variations of the same function name, each with a different signature. This is useful when you want a single function name to handle different styles of input while still giving developers precise type hints and IDE assistance.
In Python, the `typing.overload` decorator enables this at type-checking time. Dagster uses it in decorators like `resource` (which produces a `ResourceDefinition`) to present user-friendly call signatures while still wrapping callables behind the scenes.
from typing import AbstractSet, Callable, Optional, overload

@overload
def resource(config_schema: ResourceFunction) -> "ResourceDefinition": ...

@overload
def resource(
    config_schema: CoercableToConfigSchema = ...,
    description: Optional[str] = ...,
    required_resource_keys: Optional[AbstractSet[str]] = ...,
    version: Optional[str] = ...,
) -> Callable[[ResourceFunction], "ResourceDefinition"]: ...
Each `@overload` here defines a valid call pattern for the `resource` decorator:
- The first overload supports `@resource` used directly on a function.
- The second overload supports `@resource(...)` with arguments.
It's important to note that `@overload` definitions are used only by static type checkers: they don't generate runtime behavior and should not be called directly.
To make them work, you still need to define a single implementation function without `@overload`:
def resource(
    config_schema: Union[ResourceFunction, CoercableToConfigSchema] = None,
    description: Optional[str] = None,
    required_resource_keys: Optional[AbstractSet[str]] = None,
    version: Optional[str] = None,
) -> Union[Callable[[ResourceFunction], "ResourceDefinition"], "ResourceDefinition"]:
    # Bare usage: @resource was applied directly to a function,
    # so config_schema is actually the decorated callable.
    if callable(config_schema) and not is_callable_valid_config_arg(config_schema):
        return _ResourceDecoratorCallable()(config_schema)

    # Parameterized usage: @resource(...) returns a decorator.
    def _wrap(resource_fn: ResourceFunction) -> "ResourceDefinition":
        return _ResourceDecoratorCallable(
            config_schema=cast("Optional[dict[str, Any]]", config_schema),
            description=description,
            required_resource_keys=required_resource_keys,
            version=version,
        )(resource_fn)

    return _wrap
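To see the pattern outside Dagster, here's a minimal self-contained sketch (a hypothetical `convert` function, not from the Dagster codebase). A type checker matches each call site against an overload and narrows the return type accordingly:

from typing import Union, overload

@overload
def convert(value: int) -> str: ...
@overload
def convert(value: str) -> int: ...

def convert(value: Union[int, str]) -> Union[str, int]:
    # The single runtime implementation; the overloads above exist only for type checkers.
    if isinstance(value, int):
        return str(value)
    return int(value)

s = convert(3)    # a type checker infers: str
n = convert("3")  # a type checker infers: int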
cached_property
In the last post, we looked at `functools.lru_cache`, a handy way to cache the results of expensive operations. But it’s not the only caching tool in Python’s standard library. `functools` also provides `cached_property`, which Dagster uses in situations like the `DbtCliResource` to determine the dbt CLI version only once.
from functools import cached_property

class DbtCliResource(ConfigurableResource):
    @cached_property
    def _cli_version(self) -> version.Version:
        ...
At first glance, `cached_property` might seem similar to `lru_cache`, but they serve different purposes:
cached_property
- Works only on instance methods meant to be accessed like attributes.
- Computes the value once on first access and stores it in the instance's `__dict__` (classes whose instances lack a `__dict__`, such as those defining `__slots__`, aren't supported).
- Each instance gets its own cached value.
lru_cache
- Can decorate any function or method.
- Caches results keyed by all arguments passed to the function.
- The cache is stored on the function itself, so it’s shared across all callers and instances.
- Supports an eviction policy (`maxsize`), unlike `cached_property`.
In general, use `cached_property` for anything you'd access as an attribute and that should stay fixed for the lifetime of the instance; `lru_cache` works better for functions (or methods) called repeatedly, especially when the result depends on the arguments.
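As a minimal sketch of that guidance (the `Report` class and `fib` function are hypothetical, not Dagster code), the property is computed once per instance, while the function cache is keyed by argument and shared globally:

from functools import cached_property, lru_cache

class Report:
    def __init__(self, path: str) -> None:
        self.path = path

    @cached_property
    def size(self) -> int:
        # Runs once per instance on first access; the result is stored in __dict__.
        print(f"computing size for {self.path}")
        return len(self.path)  # stand-in for an expensive computation

@lru_cache(maxsize=128)
def fib(n: int) -> int:
    # Cached per distinct argument, shared across all callers.
    return n if n < 2 else fib(n - 1) + fib(n - 2)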
contextvars
Building a data orchestration tool means managing state across many different execution contexts, often spanning threads, async tasks, or subprocesses. Python offers several ways to handle this, but one we use extensively in Dagster is `contextvars`.
A `ContextVar` is a safe, efficient way to store values that are isolated to the current logical flow of execution. Context variables also work with asynchronous code, preventing state from leaking between coroutines or unrelated tasks.
import contextvars

traced_counter: contextvars.ContextVar[Optional[Counter]] = contextvars.ContextVar(
    "traced_counts",
    default=None,
)
We declare `ContextVar` objects at the module level, never inside functions or closures, so they have a stable identity and are easy to locate.
One common Dagster pattern is pairing a `ContextVar` with a context manager to manage and restore state automatically:
@contextmanager
def enter_loadable_target_origin_load_context(
    loadable_target_origin: LoadableTargetOrigin,
) -> Iterator[None]:
    token = _current_loadable_target_origin.set(loadable_target_origin)
    try:
        yield
    finally:
        _current_loadable_target_origin.reset(token)
Here, entering the context temporarily sets the active execution path, and `reset()` ensures the previous value is restored, no matter how the block exits. This guarantees that state changes don’t bleed into other parts of the system.
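To illustrate the isolation guarantee with asynchronous code, here's a small self-contained sketch (the `request_id` variable is hypothetical). Each asyncio task runs in its own copy of the context, so concurrent sets never collide:

import asyncio
import contextvars

request_id: contextvars.ContextVar[str] = contextvars.ContextVar("request_id")

async def handle(rid: str) -> None:
    request_id.set(rid)  # visible only within this task's context
    await asyncio.sleep(0)  # yield control so the tasks interleave
    assert request_id.get() == rid  # still our own value; no leakage

async def main() -> None:
    await asyncio.gather(handle("a"), handle("b"))

asyncio.run(main())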
get_origin
Dagster relies heavily on Python’s type system not just to keep our own codebase high quality, but to make sure users can use the framework effectively and integrate it seamlessly with their own tooling.
In addition to standard typing features, we also work with many custom Dagster-specific types. To handle these at runtime, we often use `typing.get_origin()` and `typing.get_args()` for type introspection. These functions let us pull out the "base" generic type and its parameters from an annotation.
from typing import get_args, get_origin

...

if get_origin(dagster_type) == list and len(get_args(dagster_type)) == 1:  # noqa: E721
    list_inner_type = get_args(dagster_type)[0]
    return (
        list_inner_type == DynamicOutput
        or get_origin(list_inner_type) == DynamicOutput
    )
- `get_origin` confirms the annotation is a `list`
- `get_args` verifies that the list has exactly one type parameter and isn't malformed
- `list_inner_type` uses `get_args` to pull out the `T` in `list[T]`
- The final comparison checks whether that inner type is `DynamicOutput` itself or a parameterized `DynamicOutput`
This pattern allows Dagster to differentiate between valid return types and unsupported ones like a bare `DynamicOutput` or a list of something else entirely.
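If you want to poke at these functions yourself, a quick standalone example (unrelated to the Dagster snippet above) shows what they return for a nested annotation:

from typing import get_args, get_origin

annotation = list[dict[str, int]]

print(get_origin(annotation))  # <class 'list'>
print(get_args(annotation))    # (dict[str, int],)

inner = get_args(annotation)[0]
print(get_origin(inner))  # <class 'dict'>
print(get_args(inner))    # (<class 'str'>, <class 'int'>)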
TYPE_CHECKING
You may be noticing a theme: we invest heavily in type checking. This pays off in code quality and developer experience, though it can occasionally add a bit of overhead.
For example, the EMR Pipes client uses the `mypy_boto3_emr` type stubs to get rich, accurate AWS EMR typings. However, we don’t want to require this package (and its transitive dependencies) to be installed at runtime, especially in production.
To avoid that, we wrap these imports in a `TYPE_CHECKING` block:
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    from mypy_boto3_emr import EMRClient
    from mypy_boto3_emr.literals import ClusterStateType
    from mypy_boto3_emr.type_defs import (
        ConfigurationTypeDef,
        DescribeClusterOutputTypeDef,
        RunJobFlowInputTypeDef,
        RunJobFlowOutputTypeDef,
    )
At runtime, `TYPE_CHECKING` is always `False`, but static type checkers like Pyright or MyPy treat it as `True`. This means:
- Static analysis can see and use these imports for type checking, completion, and validation.
- Runtime skips them entirely, avoiding unnecessary imports and dependencies.
In the rest of the code, we can still use these types as forward references by putting the type name in quotes:
@public
class PipesEMRClient(PipesClient, TreatAsResourceParam):
    @property
    def client(self) -> "EMRClient":
        return self._client
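The same pattern works for any heavy optional dependency. Here's a minimal sketch using pandas as a stand-in (not taken from the Dagster codebase): the module imports fine even when pandas isn't installed, yet type checkers still see the real `DataFrame` type:

from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Imported only during static analysis; pandas need not be installed at runtime.
    import pandas as pd

def summarize(df: "pd.DataFrame") -> str:
    # The quoted forward reference keeps this annotation safe at runtime.
    return f"{len(df)} rows"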
ExitStack
Dagster often needs to run multiple cleanup tasks (closing file handles, removing temporary directories, shutting down threads) depending on which assets are being executed. This means the number and type of cleanup operations vary at runtime, and the library uses many different context managers to handle them.
You could handle all these cases with deeply nested `try/finally` blocks, but that quickly becomes hard to read and maintain. A more elegant solution comes from Python’s `contextlib` library: `ExitStack`.
`ExitStack` lets you dynamically enter and manage an arbitrary number of context managers, then clean them all up in the correct order when the `with` block exits, no matter how it exits.
from contextlib import ExitStack

with ExitStack() as stack:
    if shutdown_pipe:
        stack.enter_context(interrupt_on_ipc_shutdown_message(shutdown_pipe))
    instance = stack.enter_context(
        get_possibly_temporary_instance_for_cli("dagster dev", logger=logger)
    )
Here:
- `enter_context(...)` enters each context manager and registers it for cleanup.
- If `shutdown_pipe` is set, it first adds a context to handle interrupt messages.
- It then creates (and registers) an instance for the CLI, which might be temporary.
- When the `with` block ends, whether normally or due to an exception, `ExitStack` exits each context in LIFO order, ensuring all resources are released correctly.
This approach not only improves exception safety but also keeps the code linear and easy to follow, even when the number of resources to manage is decided at runtime.
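As a self-contained illustration of the same idea (hypothetical file handling, not Dagster code), the number of contexts entered here depends on runtime data, and everything unwinds in LIFO order, files first, then the directory:

import tempfile
from contextlib import ExitStack

names = ["a.txt", "b.txt", "c.txt"]  # determined at runtime

with ExitStack() as stack:
    tmpdir = stack.enter_context(tempfile.TemporaryDirectory())
    files = [
        stack.enter_context(open(f"{tmpdir}/{name}", "w"))
        for name in names
    ]
    for f in files:
        f.write("hello\n")
# On exit, each file is closed first, then the temporary directory is removed.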
Improving your code
It’s clear that more and more software will be generated with the help of AI in the years ahead. That’s an exciting shift, but it doesn’t mean the craft of programming disappears. In fact, understanding the deeper features of the languages you work in can make you a more effective collaborator with AI, enabling you to guide it toward cleaner, more elegant, and more sophisticated solutions. Mastery of these tools ensures that, even in an AI-assisted future, your code carries the mark of thoughtful, human design.