In part VI of our Data Engineering with Python series, we explore type hinting functions and classes, and how type hints reduce errors.
The following article is part of a series on Python for data engineering, aimed at helping data engineers, data scientists, data analysts, Machine Learning engineers, and others who are new to Python master the basics. To date, this beginner's guide consists of:
- Part 1: Python Packages: a Primer for Data People (part 1 of 2), explored the basics of Python modules, Python packages and how to import modules into your own projects.
- Part 2: Python Packages: a Primer for Data People (part 2 of 2), covered dependency management and virtual environments.
- Part 3: Best Practices in Structuring Python Projects, covered 9 best practices and examples on structuring your projects.
- Part 4: From Python Projects to Dagster Pipelines, we explore setting up a Dagster project, and the key concept of Data Assets.
- Part 5: Environment Variables in Python, we cover the importance of environment variables and how to use them.
- Part 6: Type Hinting, or how type hints reduce errors.
- Part 7: Factory Patterns, or learning design patterns, which are reusable solutions to common problems in software design.
- Part 8: Write-Audit-Publish in data pipelines, a design pattern frequently used in ETL to ensure data quality and reliability.
- Part 9: CI/CD and Data Pipeline Automation (with Git), learn to automate data pipelines and deployments with Git.
- Part 10: High-performance Python for Data Engineering, learn how to code data pipelines in Python for performance.
- Part 11: Breaking Packages in Python, in which we explore the sharp edges of Python’s system of imports, modules, and packages.
Sign up for our newsletter to get all the updates! And if you enjoyed this guide check out our data engineering glossary, complete with Python code examples.
One of the powerful tools Python provides to promote clear and reliable code is the concept of 'type hints'. You might wonder, "Python is a dynamically-typed language, so why should I bother with types?"
As a data engineer or a Python beginner interested in coding best practices, understanding and applying type hints in your Python code can be a real asset.
In this article, we will delve deeper into type hints, their applications, and their benefits in Python programming. As Dagster is a type-annotated framework, we’ll also explain how types can be used in data engineering pipelines to make them more readable and less error-prone. It's like providing a map to your future self and other developers who may interact with your code - a map that details the types of data flowing in and out of your functions and classes.
Table of contents
- What is dynamic typing?
- Basic type hints
- Built-in types for Python
- Function annotations
- Complex types
- User-defined types
- Generics
- Type checking with pyright
- Type hints and docstrings
- Appendix: Common Python types
What is dynamic typing?
Python is a dynamically-typed language. In statically-typed languages like Java or C++, you have to declare the type of variables before using them. For example, you need to specify whether a variable is an integer, a float, a string, etc. In Python, you can code without giving a second thought to data types until runtime–which is one of the features that make Python particularly beginner-friendly.
For example, you can declare a variable and directly assign a value to it without specifying its type, hence the term 'dynamically-typed'. The Python interpreter implicitly binds the value and its type at runtime.
x = 10  # x is an integer
x = "Hello"  # now x is a string
In the first line, x is an integer. In the second line, the same x becomes a string. Python handles this transition seamlessly thanks to its dynamic typing nature.
However, this dynamic nature can also lead to bugs that are difficult to debug, especially in large codebases or complex data processing pipelines, where the data flow might not be immediately obvious.
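As a minimal sketch of the kind of bug described above, consider a hypothetical total_price helper (invented for this illustration) that receives one value as a string, as can happen when values are read from a CSV file. Nothing in the code says what type each price should be, so the mistake only surfaces when the failing line actually runs.

```python
def total_price(prices):
    """Add up a list of prices; nothing states what type each price must be."""
    total = 0
    for price in prices:
        total += price
    return total


# Works as expected when every price is a number.
clean_total = total_price([10, 5, 3])

# One value arrived as a string. The interpreter only notices
# when it tries to add an int and a str together.
try:
    total_price([10, "5", 3])
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(clean_total)
print(error_message)
```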
Type hints, introduced in Python 3.5 as a part of the standard library via PEP 484, allow you to specify the expected type of a variable, function parameter, or return value.
Why use type hints?
While dynamic typing offers flexibility, it also creates room for potential bugs. Here's where type hints come in. They can significantly enhance code readability and prevent type-related errors.
Improved code readability: Type hints act as a form of documentation that helps developers understand the types of arguments a function expects and what it returns. This enhanced clarity makes the code more readable and easier to understand.
Error detection: Tools like pyright and mypy can be used to statically analyze your Python code. They check the consistency of types in your code based on the type hints and alert you about type-related errors before runtime. Learn why the Dagster team recommends skipping mypy entirely and just using pyright.
Better IDE support: Many Integrated Development Environments (IDEs) and linters can utilize type hints to provide better code completion, error checking, and refactoring.
Facilitates large-scale projects: For larger projects with multiple developers, type hints can be very beneficial in understanding the structure and flow of data throughout the codebase. We’ve published a guide on how to include and maintain type annotations for public Python projects.
Limitations
Not enforced at runtime: Python's type hints are not enforced but are merely hints, and the Python interpreter will not raise errors if the provided types do not match the actual values. This might lead to a misconception that type hints can enforce type safety, which they cannot.
Over-complicated: For small or simple scripts, type hints might seem like overkill, and could complicate code that should be straightforward and simple.
Not flexible: One of the reasons for Python's popularity is its dynamic nature and type hints can restrict this.
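The first limitation is easy to observe directly. In this small sketch, double is a hypothetical function used only for illustration: the interpreter stores the hints as metadata but never validates arguments against them.

```python
def double(value: int) -> int:
    """Annotated for integers, but nothing checks this at runtime."""
    return value * 2


# The hints are stored on the function object as plain metadata...
hints = double.__annotations__

# ...but arguments are never validated against them.
result_int = double(4)     # matches the hint
result_str = double("ab")  # contradicts the hint, yet runs without error

print(hints)
print(result_int, result_str)
```

A static type checker is the tool that would flag the second call; the interpreter itself simply repeats the string.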
Basic type hints
Python's typing module contains several functions and classes that are used to provide type hints for your Python code. Here's how you can apply type hints in different scenarios.
Declare types for variables
To provide type hints for variables, you can use the colon : symbol followed by the type. Here's an example:
age: int = 20
name: str = "Alice"
is_active: bool = True
Here, age is hinted as an integer, name as a string, and is_active as a boolean.
Function annotations
You can provide type hints for function parameters and return values. This helps other developers understand what types of arguments are expected by the function and what type the function returns.
def greet(name: str) -> str:
    return f"Hello, {name}"
In this example, the function greet expects name to be a string and will return a string.
Built-in types in Python
Python has several built-in types. The most commonly used are:
- int: Represents an integer
- float: Represents a floating-point number
- bool: Represents a boolean value (True or False)
- str: Represents a string
There are also complex types such as lists, tuples, and dictionaries, which can be used to provide more detailed type hints. We will look at these later on.
You will also find a list of Python's main types in the appendix.
Atomic vs. composite types
In Python, there is a distinction between atomic and composite types when it comes to type hinting. Atomic types, such as int, float, and str, are simple and indivisible, and their type annotations can be provided directly using the type itself, like str.
def my_function(my_string: str) -> int:
    return len(my_string)
On the other hand, composite types like List and Dict are composed of other types, and before Python 3.9, they often required importing specific definitions from the typing module, such as typing.List[int] for a list of integers.
from typing import List
def my_function(numbers: List[int]) -> int:
    return sum(numbers)
In Python 3.9 and later, you can write list[int] instead of typing.List[int].
Function annotations
Type hints can be particularly useful when incorporated into function signatures. This not only allows developers to understand what types of arguments a function expects but also gives them an idea of what the function will return.
How to specify argument types and return type of a function
You can specify the types of arguments and the return type of a function using the : symbol for the arguments and the -> symbol for the return type. Here's the general syntax:
def function_name(arg1: type1, arg2: type2, ...) -> return_type:
    # function body
In this syntax, arg1, arg2, etc. are the function arguments, and type1, type2, etc. are the types of these arguments. return_type is the type of value the function returns.
Examples of using type hints in function signatures
Let's consider a function that calculates the area of a rectangle:
def area_rectangle(length: float, breadth: float) -> float:
    return length * breadth
In this function, length and breadth are expected to be floats, and the function also returns a float. The function will still run if you pass integers, because the hints are not checked at runtime, but passing two strings such as "2.5" and "4" will raise a TypeError when the multiplication is attempted. The type hint makes it clear that the function is designed to handle floating-point numbers.
Another example can be a function that accepts a list of integers and returns their sum as an integer:
def sum_elements(numbers: list[int]) -> int:
    return sum(numbers)
In this example, the numbers parameter is hinted as a list of integers, and the return type is an integer.
Note that these type hints do not enforce type checking at runtime. They are hints for developers and tools, and Python will not raise a TypeError simply because the actual types do not match the annotations; an error appears only if an operation inside the function fails.
Complex types
The typing module in Python provides several classes that can be used to provide more complex type hints. Below are some of the most commonly used classes:
List, dict, tuple, set
The list, dict, tuple, and set classes can be used to provide type hints for lists, dictionaries, tuples, and sets respectively. They can be parameterized to provide even more detailed type hints.
# A list of integers
numbers: list[int] = [1, 2, 3]
# A dictionary with string keys and float values
weights: dict[str, float] = {"apple": 0.182, "banana": 0.120}
# A tuple with an integer and a string
student: tuple[int, str] = (1, "John")
# A set of strings
flags: set[str] = {"apple", "banana", "cherry"}
In these examples, numbers is hinted as a list of integers, weights is a dictionary with string keys and float values, student is a tuple with an integer and a string, and flags is a set of strings.
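Parameterized types can also be nested to describe more deeply structured data. The following sketch assumes Python 3.9 or later; the variable names and the average_scores helper are made up for illustration.

```python
# Scores per student: string keys, each mapped to a list of integers.
scores: dict[str, list[int]] = {"alice": [90, 85], "bob": [72]}

# A list of (id, name) pairs.
roster: list[tuple[int, str]] = [(1, "John"), (2, "Maria")]

# A tuple of any length whose items are all floats.
readings: tuple[float, ...] = (0.5, 1.25, 2.0)


def average_scores(all_scores: dict[str, list[int]]) -> dict[str, float]:
    """Return the mean score for each student."""
    return {name: sum(values) / len(values) for name, values in all_scores.items()}


averages = average_scores(scores)
print(averages)
```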
Optional
The Optional type hint can be used to indicate that a variable can be either a specific type or None.
from typing import Optional
def find_student(student_id: int) -> Optional[dict[str, str]]:
    # If the student is found, return a dictionary containing their data
    # If the student is not found, return None
    ...
Union
The Union type hint is used to indicate that a variable can be one of several types. For example, if a variable can be either a str or an int, you can provide a type hint like this:
from typing import Union
def process(data: Union[str, int]) -> None:
    # This function can handle either a string or an integer
    ...
In Python 3.10 and later, you can use the pipe (|) operator to indicate a type that can be one of several options, replacing the need for Union:
def process(data: str | int) -> None:
    # This function can handle either a string or an integer
    ...
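A function that accepts a union usually has to decide which member it received. The sketch below uses a hypothetical describe function and is written with typing.Union so that it also runs on Python versions before 3.10. It branches with isinstance, which type checkers generally understand as narrowing the type within each branch.

```python
from typing import Union


def describe(data: Union[str, int]) -> str:
    """Handle each member of the union in its own branch."""
    if isinstance(data, str):
        # Inside this branch, data is known to be a str.
        return f"text of length {len(data)}"
    # Only int remains here.
    return f"number doubled is {data * 2}"


print(describe("hello"))
print(describe(21))
```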
Any
The Any class is used to indicate that a variable can be of any type. This is equivalent to not providing a type hint at all.
from typing import Any
def process(data: Any) -> None:
    # This function can handle data of any type
    ...
These tools from the typing module can help you provide detailed type hints that make your code easier to understand and debug.
However, remember that Python's type hints are optional and not enforced at runtime. They are intended as a tool for developers, not a way to enforce type safety.
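Returning to Optional from earlier in this section, here is one way the find_student signature might be filled in and consumed. The STUDENTS dictionary and the student_name helper are assumptions made for this sketch; a real lookup would query a database or an API.

```python
from typing import Optional

# Hypothetical in-memory data standing in for a real lookup.
STUDENTS: dict[int, dict[str, str]] = {1: {"name": "John", "major": "Physics"}}


def find_student(student_id: int) -> Optional[dict[str, str]]:
    """Return the student's data, or None if the ID is unknown."""
    return STUDENTS.get(student_id)


def student_name(student_id: int) -> str:
    student = find_student(student_id)
    if student is None:
        # The Optional return type is a reminder to handle the missing case.
        return "unknown"
    return student["name"]


print(student_name(1))
print(student_name(99))
```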
User-Defined types
In Python, you can define your own types using classes, which is the fundamental mechanism to create custom types. You can use these classes in type hints just like you'd use the built-in types. The typing module also provides additional tools for creating more specific types, including Type and NewType.
Defining your own types using classes
You can create a class and use it as a type hint. Here's an example:
class Student:
    def __init__(self, name: str, age: int):
        self.name = name
        self.age = age
def print_student_details(student: Student) -> None:
    print(student.name, student.age)
Student is a user-defined type, and it's used as a type hint in the print_student_details function.
Using Type for type hinting
The Type class from the typing module can be used to indicate that a variable will be a class, not an instance of a class. This is commonly used when a function argument is expected to be a class, for example in factory functions.
from typing import Type
def create_student(cls: Type[Student], name: str, age: int) -> Student:
    return cls(name, age)
In this example, create_student expects a Student class (or a subclass of Student) as its first argument.
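To see the "or a subclass" part in action, here is a self-contained sketch. GraduateStudent is a hypothetical subclass added only for this illustration.

```python
from typing import Type


class Student:
    def __init__(self, name: str, age: int):
        self.name = name
        self.age = age


class GraduateStudent(Student):
    """Hypothetical subclass used only for this illustration."""


def create_student(cls: Type[Student], name: str, age: int) -> Student:
    # cls is the class itself, so calling it builds a new instance.
    return cls(name, age)


alice = create_student(Student, "Alice", 20)
bob = create_student(GraduateStudent, "Bob", 27)

print(type(alice).__name__, type(bob).__name__)
```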
Using NewType to create distinct types
NewType is used to create distinct types. It's useful when you want to distinguish between two types that would otherwise be the same.
For example, let's say you're dealing with student IDs and course IDs in your program, and you want to make sure you don't mix them up. Both are represented as integers, so you can use NewType to create two distinct types:
from typing import NewType
StudentID = NewType('StudentID', int)
CourseID = NewType('CourseID', int)
def get_student(student_id: StudentID) -> None:
    # Fetch student data...
    ...
def enroll_in_course(student_id: StudentID, course_id: CourseID) -> None:
    # Enroll the student in the course...
    ...
Even though StudentID and CourseID are both integers, a type checker treats them as distinct types that cannot be used interchangeably. However, remember that this check is not enforced at runtime, but during static type checking using tools like pyright or mypy.
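The following sketch makes the runtime behavior concrete. Here enroll_in_course returns a message string instead of None purely so the result can be inspected; that change is an assumption of this example.

```python
from typing import NewType

StudentID = NewType("StudentID", int)
CourseID = NewType("CourseID", int)


def enroll_in_course(student_id: StudentID, course_id: CourseID) -> str:
    return f"student {student_id} enrolled in course {course_id}"


student = StudentID(42)
course = CourseID(7)

# Correct order: accepted by a type checker and works at runtime.
message = enroll_in_course(student, course)

# At runtime, a NewType value is just the underlying int.
is_plain_int = type(student) is int

# Swapped arguments: a static type checker is expected to flag this call,
# but the interpreter runs it without complaint.
swapped = enroll_in_course(course, student)

print(message)
print(swapped)
```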
Generics
Generics allow you to define a function, class, or data structure that works with different types. The Generic class and the TypeVar function from the typing module are used to define generic types. For example, a list is a generic data structure because it can contain elements of any type.
TypeVar
TypeVar is used to define a type variable, which can be any type, and the specific type is determined by the client code. Here's an example:
from typing import List, TypeVar
T = TypeVar('T')
def first_element(lst: List[T]) -> T:
    return lst[0]
Here, T is a type variable that can be any type. The first_element function works with a list of any type and returns an element of that type. The specific type of T would be determined by the list you pass to the function.
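A short usage sketch shows how T is resolved per call. It uses the built-in list[T] form, which assumes Python 3.9 or later.

```python
from typing import TypeVar

T = TypeVar("T")


def first_element(lst: list[T]) -> T:
    return lst[0]


first_number = first_element([3, 1, 2])      # T is inferred as int
first_word = first_element(["a", "b", "c"])  # T is inferred as str

# Because the return type follows the input type, a type checker knows
# that arithmetic is valid on the first result and str methods on the second.
print(first_number + 1, first_word.upper())
```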
Generic
Generic is used to define generic classes. A generic class can be initialized with a variety of types, and those types are used in type hints within the class.
from typing import Generic, TypeVar
T = TypeVar('T')
class Box(Generic[T]):
    def __init__(self, value: T):
        self.value = value
    def get(self) -> T:
        return self.value
Here, Box is a generic class that works with any type T. When you create an instance of Box, you can specify the type of T, and that type is used in the value attribute and the get method.
box1 = Box[int](10)
box2 = Box[str]("Hello")
box1 is a Box that contains an integer, and box2 is a Box that contains a string.
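As with other hints, the type argument of a generic class is information for the type checker rather than a runtime constraint. This sketch repeats the Box class so that it runs on its own; the variable names are illustrative.

```python
from typing import Generic, TypeVar

T = TypeVar("T")


class Box(Generic[T]):
    def __init__(self, value: T):
        self.value = value

    def get(self) -> T:
        return self.value


int_box = Box[int](10)
doubled = int_box.get() * 2  # a type checker knows get() returns an int here

# The type argument can also be left for the checker to infer.
inferred_box = Box("Hello")

# The parameter is not checked at runtime: this line runs, although a
# type checker is expected to report the int/str mismatch.
wrong_box = Box[int]("not an int")

print(doubled, inferred_box.get(), wrong_box.get())
```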
Type checking with pyright
A type checker like pyright is a tool used to enforce type hinting in Python. At Dagster, we really like pyright because it is faster than other alternatives such as mypy.
Python itself is a dynamically-typed language, which means type checks happen at runtime and it does not enforce type hinting rules. If you try to perform an operation that's not supported for a given data type, Python will raise an error during runtime. For example, calling an undefined method on an object will only trigger an error during runtime.
However, when developing large or complex systems, enforcing type consistency can help catch potential bugs early. pyright performs static type checking, meaning it checks the types of your variables, function arguments, and return values before the code is actually run. It uses the type hints you've provided in your code to do this. It's important to understand that pyright does not execute or run your code; it simply reads and analyzes it.
How to use a type checker to verify your types
To use pyright, you first need to install it:
pip install pyright
Then, to check a Python file, you run pyright with the file as an argument:
pyright my_file.py
Pyright will then analyze the file and report any type errors it finds.
For example, if you have a function that's annotated to receive a str as an argument and you pass an int, pyright will catch this.
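As a sketch, suppose my_file.py contained the following (the function name shout is hypothetical). The second call passes an int where a str is annotated, which is the kind of mismatch a static checker reports without running the code; the exact wording of the report depends on the pyright version.

```python
def shout(text: str) -> str:
    """Return the text in upper case."""
    return text.upper()


print(shout("hello"))

# A static checker flags the call below because 42 is not a str.
# Without a checker, the problem only appears at runtime, as an AttributeError.
try:
    shout(42)
    runtime_error = None
except AttributeError as exc:
    runtime_error = type(exc).__name__

print(runtime_error)
```

The try/except is there only so the example finishes cleanly; in real code the goal is to fix the call once the checker points it out.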
Static vs dynamic type checking
Static type checking is the process of verifying the type safety of a program based on analysis of a program's text (source code). Static type checking is done at compile-time (before the program is run). Languages that enforce static type checking include C++, Java, and Rust.
Dynamic type checking, on the other hand, is the process of verifying the type safety of a program at runtime. Dynamic type checking occurs while the program is running. Languages that use dynamic type checking include Python, Ruby, and JavaScript.
In static type checking, types are checked before the program runs, which makes it easier to catch and prevent type errors. This makes the program safer to run, as most type-related bugs have been caught at compile-time. However, it also requires the programmer to explicitly declare the types of all variables and function return values, which can be seen as reducing flexibility.
Dynamic type checking provides more flexibility, as you don't have to explicitly declare the type of every variable. However, this also means that type errors can occur at runtime, which could potentially cause the program to crash.
Python is a dynamically-typed language, but it also supports optional static type checking through tools like pyright and type hints. This provides Python programmers with a unique flexibility, allowing them to choose when they want the safety of static type checking and when they prefer the flexibility of dynamic typing.
Type hints and docstrings
Type hints, as we've discussed, indicate the types of variables, function parameters, and return values. They can help other developers understand what types of data your function expects and what it will return.
Docstrings, on the other hand, are used to provide a description of what your function, class, or module does. A docstring can include a description of the function's purpose, its arguments, its return value, and any exceptions it may raise.
Here's an example of how you can use type hints and docstrings together:
from typing import Any
def filter_and_sort_products(products: list[dict[str, Any]], attribute: str, min_value: int) -> list[dict[str, Any]]:
    """
    Filters a list of products by a given attribute and minimum value, and then sorts the filtered products by the attribute.
    Args:
        products (list[dict[str, Any]]): A list of products represented as dictionaries.
        attribute (str): The attribute to filter and sort by.
        min_value (int): The minimum acceptable value of the specified attribute.
    Returns:
        list[dict[str, Any]]: A list of filtered and sorted products.
    Raises:
        KeyError: If the specified attribute is missing from one of the products.
    Examples:
        >>> products = [{"name": "Apple", "price": 10}, {"name": "Banana", "price": 5}]
        >>> filter_and_sort_products(products, "price", 6)
        [{'name': 'Apple', 'price': 10}]
    """
    # Values are typed as Any because a product mixes strings (name) and integers (price).
    filtered_products = [product for product in products if product[attribute] >= min_value]
    return sorted(filtered_products, key=lambda x: x[attribute])
Here, the function signature shows that the function takes a list of dictionaries representing products, a string representing an attribute, and an integer representing a minimum value. It returns a list of filtered and sorted dictionaries.
The docstring explains the purpose of the function, its parameters, return value, and possible exceptions (such as a KeyError if the given attribute is not present), and includes an example of how to call the function.
This combination of type hints and docstrings can greatly improve the readability and maintainability of your code.
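Both pieces of information are also available to tools at runtime, which is part of why editors and documentation generators can display them. A small sketch, using a hypothetical scale function and the standard library's typing.get_type_hints and inspect.getdoc:

```python
import inspect
from typing import get_type_hints


def scale(values: list[float], factor: float) -> list[float]:
    """Multiply every value by factor.

    Args:
        values: The numbers to scale.
        factor: The multiplier applied to each number.

    Returns:
        A new list with the scaled numbers.
    """
    return [value * factor for value in values]


# The hints come from the signature, the description from the docstring.
hints = get_type_hints(scale)
summary = inspect.getdoc(scale).splitlines()[0]

print(hints["factor"])
print(summary)
```

In this sketch the types live only in the signature and the docstring describes meaning, which is one way to avoid the two drifting apart; repeating the types in the docstring, as in the example above, is also a common convention.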
Conclusion
Building on Python programming best practices, we’ve looked at how type hints improve the readability and maintainability of your code.
If you have any questions or need further clarification, feel free to join the Dagster Slack and ask the community for help. Thank you for reading!
Our next article builds on these data engineering concepts and explores how Factory Patterns help you automate steps in your pipeline.
Sign up for our newsletter to stay in the loop!
Appendix: Python Types
To summarize, here are the most common built-in data types in Python. You might also come across or utilize custom data types from external libraries or those defined by other developers.
1. Numeric Types
- int: Integer. Examples: 5, -3, 42
- float: Floating-point number. Examples: 3.14, -0.001, 2.71
- complex: Complex number. Examples: 3+4j, 2-5j
2. Text Type
- str: String. Examples: "Hello, World!", 'Python'
3. Sequence Types
- list: List. Examples: [1, 2, 3], ["a", "b", "c"]
- tuple: Tuple. Examples: (1, 2, 3), ("a", "b", "c")
- range: Range object. Examples: range(5), range(0, 5, 2)
4. Mapping Type
- dict: Dictionary. Examples: {"name": "John", "age": 30}, {1: "one", 2: "two"}
5. Set Types
- set: Mutable set. Examples: {1, 2, 3}, {"apple", "banana", "cherry"}
- frozenset: Immutable set. Created with: frozenset(["a", "b"])
6. Boolean Type
- bool: Boolean. Values: True, False
7. Binary Types
- bytes: Immutable sequence of bytes. Examples: b'hello', bytes([65, 66, 67])
- bytearray: Mutable sequence of bytes. Example: bytearray([65, 66, 67])
- memoryview: Memory view object. Created with: memoryview(b'abc')
8. None Type
- NoneType: Represents the absence of a value. Only value: None
Other Useful Modules & Types
1. datetime Module
- datetime.date: Represents a date
- datetime.datetime: Represents date and time
- datetime.time: Represents a time of day
- datetime.timedelta: Duration or difference between two dates/times
- datetime.tzinfo: Base class for time zone info
2. collections Module
- namedtuple: Factory function for creating tuple subclasses with named fields
- deque: Double-ended queue
- Counter: A dict subclass for counting hashable objects
- OrderedDict: Dict that remembers insertion order
- defaultdict: Dict that provides default values for missing keys
3. array Module
- array.array: Space-efficient array with type specification
4. struct Module
- Used for packing and unpacking binary data
5. json Module
- Tools for encoding and decoding JSON data
6. enum Module
- Enum: Base class for creating enumerated constants
- IntEnum: Enumerations that are also subclasses of int





