What CoPilot Won’t Teach You About Python (Part 2)

August 20, 2025

Explore another set of powerful yet overlooked Python features—from overload and cached_property to contextvars and ExitStack

In our previous post, we peeled back the curtain on some lesser-known Python features that power the Dagster library. These features often won’t appear in Copilot suggestions, but they can make a big difference in your code’s performance, clarity, and maintainability.

Of course, Python has far too many of these gems to fit into a single post. In this sequel, we’ll explore another set of features and patterns we rely on in Dagster that are just as powerful, yet often overlooked.

So fire up your favorite IDE, follow along if you’d like, and let’s keep digging into Python features that your coding assistant probably won’t mention but you’ll be glad you know about.

overload

Function overloading lets you declare multiple variations of the same function name, each with a different signature. This is useful when you want a single function name to handle different styles of input while still giving developers precise type hints and IDE assistance.

In Python, the `typing.overload` decorator enables this at type-checking time. Dagster uses it in definitions like `ResourceDefinition` (and other decorators) to present user-friendly call signatures while still wrapping callables behind the scenes.

from typing import AbstractSet, Callable, Optional, overload

@overload
def resource(config_schema: ResourceFunction) -> ResourceDefinition: ...

@overload
def resource(
   config_schema: CoercableToConfigSchema = ...,
   description: Optional[str] = ...,
   required_resource_keys: Optional[AbstractSet[str]] = ...,
   version: Optional[str] = ...,
) -> Callable[[ResourceFunction], "ResourceDefinition"]: ...

Each `@overload` here defines a valid call pattern for the `resource` decorator:

  • The first overload supports `@resource` used directly on a function.
  • The second overload supports `@resource(...)` with arguments.

Keep in mind that overload definitions are used only by static type checkers. They don’t generate any runtime behavior and should not be called directly.

To make them work, you still need to define a single implementation function without `@overload`:

def resource(
   config_schema: Union[ResourceFunction, CoercableToConfigSchema] = None,
   description: Optional[str] = None,
   required_resource_keys: Optional[AbstractSet[str]] = None,
   version: Optional[str] = None,
) -> Union[Callable[[ResourceFunction], "ResourceDefinition"], "ResourceDefinition"]:
   if callable(config_schema) and not is_callable_valid_config_arg(config_schema):
       return _ResourceDecoratorCallable()(config_schema)

   def _wrap(resource_fn: ResourceFunction) -> "ResourceDefinition":
       return _ResourceDecoratorCallable(
           config_schema=cast("Optional[dict[str, Any]]", config_schema),
           description=description,
           required_resource_keys=required_resource_keys,
           version=version,
       )(resource_fn)

   return _wrap
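
To make the mechanics concrete, here’s a minimal, self-contained sketch of the same pattern, using a hypothetical `logged` decorator rather than Dagster’s actual code:

from typing import Callable, overload

@overload
def logged(fn: Callable[..., object]) -> Callable[..., object]: ...

@overload
def logged(*, prefix: str = ...) -> Callable[[Callable[..., object]], Callable[..., object]]: ...

def logged(fn=None, *, prefix="LOG"):
    # Single runtime implementation that handles both call patterns.
    def wrap(inner):
        def wrapper(*args, **kwargs):
            print(f"{prefix}: calling {inner.__name__}")
            return inner(*args, **kwargs)
        return wrapper
    if fn is not None:
        return wrap(fn)      # used as @logged, directly on a function
    return wrap              # used as @logged(prefix="...")

With these declarations, both `@logged` and `@logged(prefix="DEBUG")` type-check, and your IDE shows the appropriate signature for whichever form you use.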

cached_property

In the last post, we looked at `functools.lru_cache`, a handy way to cache the results of expensive operations. But it’s not the only caching tool in Python’s standard library. `functools` also provides `cached_property`, which Dagster uses in situations like the `DbtCliResource` to determine the dbt CLI version only once.

from functools import cached_property

class DbtCliResource(ConfigurableResource):

   @cached_property
   def _cli_version(self) -> version.Version:
       ...

At first glance, `cached_property` might seem similar to `lru_cache`, but they serve different purposes:

cached_property

  • Works only on instance methods meant to be accessed like attributes.
  • Computes the value once on first access and stores it in that instance’s `__dict__`, so it requires instances that have a `__dict__` (it won’t work on `__slots__`-only classes).
  • Each instance gets its own cached value.

lru_cache

  • Can decorate any function or method.
  • Caches results keyed by all arguments passed to the function.
  • The cache is stored on the function itself, so it’s shared across all callers and instances.
  • Supports an eviction policy (maxsize), unlike cached_property.

In general, use `cached_property` for anything you would access as an attribute and that should stay fixed for the lifetime of the instance. `lru_cache` is the better fit for functions (or methods) that are called repeatedly, especially when the result depends on the arguments.
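
Here’s a small sketch (unrelated to Dagster) that shows the difference in caching behavior:

from functools import cached_property, lru_cache

class Report:
    def __init__(self, rows: list[int]) -> None:
        self.rows = rows

    @cached_property
    def total(self) -> int:
        print("computing total")   # runs once per instance
        return sum(self.rows)

@lru_cache(maxsize=128)
def fib(n: int) -> int:
    # One cache shared by every caller, keyed by the argument n.
    return n if n < 2 else fib(n - 1) + fib(n - 2)

a, b = Report([1, 2, 3]), Report([10, 20])
print(a.total, a.total)   # "computing total" prints only once for a
print(b.total)            # b computes and caches its own value
print(fib(30))            # one shared cache for all calls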

contextvars

Building a data orchestration tool means managing state across many different execution contexts, often spanning threads, async tasks, or subprocesses. Python offers several ways to handle this, but one we use extensively in Dagster is `contextvars`.

A `ContextVar` is a safe, efficient way to store values that are isolated to the current logical flow of execution. Context variables also work with asynchronous code, preventing state from leaking between coroutines or unrelated tasks.

import contextvars

traced_counter: contextvars.ContextVar[Optional[Counter]] = contextvars.ContextVar(
   "traced_counts",
   default=None,
)

We declare `ContextVar` objects at the module level, never inside functions or closures, so they have a stable identity and are easy to locate.

One common Dagster pattern is pairing a `ContextVar` with a context manager to manage and restore state automatically:

from contextlib import contextmanager
from typing import Iterator

@contextmanager
def enter_loadable_target_origin_load_context(
   loadable_target_origin: LoadableTargetOrigin,
) -> Iterator[None]:
   token = _current_loadable_target_origin.set(loadable_target_origin)
   try:
       yield
   finally:
       _current_loadable_target_origin.reset(token)

Here, entering the context temporarily sets the active execution path, and `reset()` ensures the previous value is restored, no matter how the block exits. This guarantees that state changes don’t bleed into other parts of the system.
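
Outside of Dagster, a minimal sketch shows how a `ContextVar` keeps state isolated between concurrent asyncio tasks:

import asyncio
import contextvars

request_id: contextvars.ContextVar[str] = contextvars.ContextVar("request_id", default="unset")

async def handle(name: str) -> None:
    token = request_id.set(name)     # visible only within this task's context
    try:
        await asyncio.sleep(0.01)    # other tasks run while we're suspended
        print(name, "sees", request_id.get())
    finally:
        request_id.reset(token)      # restore the previous value, as in the pattern above

async def main() -> None:
    await asyncio.gather(handle("task-a"), handle("task-b"))

asyncio.run(main())   # each task prints its own name; nothing leaks between them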

get_origin

Dagster relies heavily on Python’s type system not just to keep our own codebase high quality, but to make sure users can use the framework effectively and integrate it seamlessly with their own tooling.

In addition to standard typing features, we also work with many custom Dagster-specific types. To handle these at runtime, we often use `typing.get_origin` and `typing.get_args` for type introspection. These functions let us pull out the “base” generic type and its parameters from an annotation.

from typing import get_args, get_origin

   ...
   if get_origin(dagster_type) == list and len(get_args(dagster_type)) == 1:  # noqa: E721
       list_inner_type = get_args(dagster_type)[0]
       return (
           list_inner_type == DynamicOutput
           or get_origin(list_inner_type) == DynamicOutput
       )

  1. `get_origin` confirms the annotation is a `list`.
  2. `get_args` verifies that the list has exactly one type parameter and isn’t malformed.
  3. `get_args` then pulls out the `T` in `list[T]` as `list_inner_type`.
  4. The return expression checks whether that inner type is `DynamicOutput` itself or a parameterized `DynamicOutput`.

This pattern allows Dagster to differentiate between valid return types and unsupported ones like a bare `DynamicOutput` or a list of something else entirely.
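
To see what these helpers return on their own, here’s a minimal sketch using only built-in types:

from typing import get_args, get_origin

annotation = list[dict[str, int]]

print(get_origin(annotation))   # <class 'list'>
print(get_args(annotation))     # (dict[str, int],)

inner = get_args(annotation)[0]
print(get_origin(inner))        # <class 'dict'>
print(get_args(inner))          # (<class 'str'>, <class 'int'>)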

TYPE_CHECKING

You may be noticing a theme: we invest heavily in type checking. This pays off in code quality and developer experience, though it can occasionally add a bit of overhead.

For example, the EMR Pipes client uses the `mypy_boto3_emr` type stubs to get rich, accurate AWS EMR typings. However, we don’t want to require this package (and its transitive dependencies) to be installed at runtime, especially in production.

To avoid that, we wrap these imports in a `TYPE_CHECKING` block:

from typing import TYPE_CHECKING

if TYPE_CHECKING:
   from mypy_boto3_emr import EMRClient
   from mypy_boto3_emr.literals import ClusterStateType
   from mypy_boto3_emr.type_defs import (
       ConfigurationTypeDef,
       DescribeClusterOutputTypeDef,
       RunJobFlowInputTypeDef,
       RunJobFlowOutputTypeDef,
   )

`TYPE_CHECKING` is always `False` at runtime, but static type checkers like Pyright or mypy treat it as `True`. This means:

  • Static analysis can see and use these imports for type checking, completion, and validation.
  • Runtime skips them entirely, avoiding unnecessary imports and dependencies.

In the rest of the code, we can still use these types as forward references by putting the type name in quotes:

@public
class PipesEMRClient(PipesClient, TreatAsResourceParam):

   @property
   def client(self) -> "EMRClient":
       return self._client
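
Here’s a minimal, self-contained sketch of the same pattern; `some_heavy_stub_package` and `HeavyClient` are hypothetical placeholders, not real Dagster or AWS names:

from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Imported only while type checking, never at runtime.
    from some_heavy_stub_package import HeavyClient

class MyResource:
    def __init__(self, client: "HeavyClient") -> None:
        self._client = client

    @property
    def client(self) -> "HeavyClient":
        # The quoted annotation is a forward reference, so the name
        # doesn't need to exist when this module is imported.
        return self._client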

ExitStack

Dagster often needs to run multiple cleanup tasks (closing file handles, removing temporary directories, shutting down threads), depending on which assets are being executed. The number and type of cleanup operations therefore vary at runtime, and the library uses many different context managers to handle them.

You could handle all these cases with deeply nested `try/finally` blocks, but that quickly becomes hard to read and maintain. A more elegant solution comes from Python’s `contextlib` library: `ExitStack`.

`ExitStack` lets you dynamically enter and manage an arbitrary number of context managers, then clean them all up in the correct order when the `with` block exits, no matter how it exits.

from contextlib import ExitStack

with ExitStack() as stack:
   if shutdown_pipe:
       stack.enter_context(interrupt_on_ipc_shutdown_message(shutdown_pipe))
   instance = stack.enter_context(
       get_possibly_temporary_instance_for_cli("dagster dev", logger=logger)
   )

Here:

  • `enter_context(...)` enters each context manager and registers it for cleanup.
  • If `shutdown_pipe` is set, it first adds a context to handle interrupt messages.
  • It then creates (and registers) an instance for the CLI, which might be temporary.
  • When the `with` block ends, whether normally or due to an exception, `ExitStack` exits each context in LIFO order, ensuring all resources are released correctly.

This approach not only improves exception safety but also keeps the code linear and easy to follow, even when the number of resources to manage is decided at runtime.
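
As a minimal sketch outside of Dagster, the same pattern handles a number of files decided only at runtime:

import os
from contextlib import ExitStack
from tempfile import TemporaryDirectory

names = ["a.txt", "b.txt", "c.txt"]   # determined at runtime

with ExitStack() as stack:
    tmp_dir = stack.enter_context(TemporaryDirectory())   # removed on exit
    files = [
        stack.enter_context(open(os.path.join(tmp_dir, name), "w"))
        for name in names
    ]
    for f in files:
        f.write("hello\n")
# Every file is closed and the directory deleted here, in LIFO order,
# even if an exception was raised inside the block.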

Improving your code

It’s clear that more and more software will be generated with the help of AI in the years ahead. That’s an exciting shift, but it doesn’t mean the craft of programming disappears. In fact, understanding the deeper features of the languages you work in can make you a more effective collaborator with AI, enabling you to guide it toward cleaner, more elegant, and more sophisticated solutions. Mastery of these tools ensures that, even in an AI-assisted future, your code carries the mark of thoughtful, human design.

Have feedback or questions? Start a discussion in Slack or GitHub.

Interested in working with us? View our open roles.

Want more content like this? Follow us on LinkedIn.
