August 11, 2023 • 8 minute read •
Type Hinting in Python
- Name
- Elliot Gunn
- Handle
- @elliot
One of the powerful tools Python provides to promote clear and reliable code is the concept of 'type hints'. You might wonder, "Python is a dynamically-typed language, so why should I bother with types?"
As a data engineer or a Python beginner interested in coding best practices, understanding and applying type hints in your Python code can be a real asset.
In this article, we will delve deeper into type hints, their applications, and their benefits in Python programming. As Dagster is a type-annotated framework, we’ll also explain how types can be used in data engineering pipelines to improve their readability and make them less error-prone. Type hints are like a map for your future self and other developers who may interact with your code: a map that details the types of data flowing in and out of your functions and classes.
Table of contents
- What is dynamic typing?
- Basic type hints
- Built-in types for Python
- Function annotations
- Complex types
- User-defined types
- Generics
- Type checking with pyright
- Type hints and docstrings
- Appendix: Common Python types
What is dynamic typing?
Python is a dynamically-typed language. In statically-typed languages like Java or C++, you have to declare the type of variables before using them. For example, you need to specify whether a variable is an integer, a float, a string, etc. In Python, you can code without giving a second thought to data types until runtime, which is one of the features that make Python particularly beginner-friendly.
For example, you can declare a variable and directly assign a value to it without specifying its type, hence the term 'dynamically-typed'. The Python interpreter implicitly binds the value and its type at runtime.
x = 10 # x is an integer
x = "Hello" # now x is a string
In the first line, x is an integer. In the second line, the same x becomes a string. Python handles this transition seamlessly thanks to its dynamic typing.
However, this dynamic nature can also lead to bugs that are difficult to debug, especially in large codebases or complex data processing pipelines, where the data flow might not be immediately obvious.
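To make this concrete, here is a minimal sketch of how such a bug can slip through. The function name double and its inputs are hypothetical and chosen only for illustration. The function is meant for numbers, but a string (for example, one read from a file or an API) passes through it without any error and silently produces the wrong result.

```python
def double(value):
    # Intended for numbers, but nothing stops a caller from passing a string.
    return value * 2


print(double(21))    # 42
print(double("21"))  # 2121, because multiplying a str repeats it
```

Nothing fails here, which is exactly the problem: the mistake only shows up later, wherever the unexpected string is used.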
Type hints, introduced in Python 3.5 as a part of the standard library via PEP 484, allow you to specify the expected type of a variable, function parameter, or return value.
Why use type hints?
While dynamic typing offers flexibility, it also creates room for potential bugs. Here's where type hints come in. They can significantly enhance code readability and prevent type-related errors.
Improved code readability: Type hints act as a form of documentation that helps developers understand the types of arguments a function expects and what it returns. This enhanced clarity makes the code more readable and easier to understand.
Error detection: Tools like pyright and mypy can be used to statically analyze your Python code. They check the consistency of types in your code based on the type hints and alert you to type-related errors before runtime. Learn why the Dagster team recommends skipping mypy entirely and just using pyright.
Better IDE support: Many Integrated Development Environments (IDEs) and linters can utilize type hints to provide better code completion, error checking, and refactoring.
Facilitates large-scale projects: For larger projects with multiple developers, type hints can be very beneficial in understanding the structure and flow of data throughout the codebase. We’ve published a guide on how to include and maintain type annotations for public Python projects.
Limitations
Not enforced at runtime: Python's type hints are not enforced but are merely hints, and the Python interpreter will not raise errors if the provided types do not match the actual values. This might lead to a misconception that type hints can enforce type safety, which they cannot.
Over-complicated: For small or simple scripts, type hints might seem like overkill and could complicate code that should be straightforward.
Not flexible: One of the reasons for Python's popularity is its dynamic nature and type hints can restrict this.
Basic type hints
Python's typing module contains several functions and classes that are used to provide type hints for your Python code. Here's how you can apply type hints in different scenarios.
Declare types for variables
To provide type hints for variables, you can use the colon : symbol followed by the type. Here's an example:
age: int = 20
name: str = "Alice"
is_active: bool = True
Here, age is hinted as an integer, name as a string, and is_active as a boolean.
Function annotations
You can provide type hints for function parameters and return values. This helps other developers understand what types of arguments are expected by the function and what type the function returns.
def greet(name: str) -> str:
    return f"Hello, {name}"
In this example, the function greet expects name to be a string and will return a string.
Built-in types in Python
Python has several built-in types. The most commonly used are:
- int: Represents an integer
- float: Represents a floating-point number
- bool: Represents a boolean value (True or False)
- str: Represents a string
There are also complex types such as lists, tuples, and dictionaries that can be used to provide more detailed type hints that we will look at later on.
You will also find a list of Python's main types in the appendix.
Atomic vs. composite types
In Python, there is a distinction between atomic and composite types when it comes to type hinting. Atomic types, such as int, float, and str, are simple and indivisible, and their type annotations can be provided directly using the type itself, like str.
def my_function(my_string: str) -> int:
    return len(my_string)
On the other hand, composite types like List and Dict are composed of other types, and before Python 3.9, they often required importing specific definitions from the typing module, such as typing.List[int] for a list of integers.
from typing import List
def my_function(numbers: List[int]) -> int:
    return sum(numbers)
In Python 3.9 and later, you can write list[int] instead of typing.List[int].
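As a small illustration, the two spellings below describe the same type to a type checker. This sketch assumes Python 3.9 or later for the built-in generic form, and the function names are made up for the example.

```python
from typing import List


def total_legacy(numbers: List[int]) -> int:
    # Pre-3.9 style: the generic alias comes from the typing module.
    return sum(numbers)


def total_modern(numbers: list[int]) -> int:
    # Python 3.9+ style: the built-in list is subscripted directly.
    return sum(numbers)


print(total_legacy([1, 2, 3]))  # 6
print(total_modern([1, 2, 3]))  # 6
```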
Function annotations
Type hints can be particularly useful when incorporated into function signatures. This not only allows developers to understand what types of arguments a function expects but also gives them an idea of what the function will return.
How to specify argument types and return type of a function
You can specify the types of arguments and the return type of a function using the : symbol for the arguments and the -> symbol for the return type. Here's the general syntax:
def function_name(arg1: type1, arg2: type2, ...) -> return_type:
    # function body
In this syntax, arg1, arg2, etc. are the function arguments, and type1, type2, etc. are the types of these arguments. return_type is the type of value the function returns.
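As a hedged illustration of this syntax, the sketch below annotates a function with two parameters, one of which has a default value. The function repeat_message and its parameter names are hypothetical.

```python
def repeat_message(message: str, times: int = 2) -> str:
    # A default value follows the annotation: name: type = default.
    return " ".join([message] * times)


print(repeat_message("hi"))     # hi hi
print(repeat_message("hi", 3))  # hi hi hi

# The annotations are stored on the function object and can be inspected.
print(repeat_message.__annotations__)
```

The annotations are ordinary metadata attached to the function, which is how tools and IDEs are able to read them.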
Examples of using type hints in function signatures
Let's consider a function that calculates the area of a rectangle:
def area_rectangle(length: float, breadth: float) -> float:
    return length * breadth
In this function, length and breadth are expected to be floats, and the function also returns a float. The function will still run if you pass integers, because Python does not check the hints at runtime. Passing two strings, however, fails with a TypeError raised by the multiplication itself, not by the type hint. The hint makes it clear that the function is designed to handle floating-point numbers.
Another example can be a function that accepts a list of integers and returns their sum as an integer:
def sum_elements(numbers: list[int]) -> int:
    return sum(numbers)
In this example, the numbers parameter is hinted as a list of integers, and the return type is an integer.
Note that these type hints do not enforce type checking at runtime. They are hints for developers, and Python will not raise a TypeError if the actual types do not match the specified types.
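The sketch below illustrates this point using the sum_elements function from above, with inputs chosen for illustration. The call with floats contradicts the hints yet runs without complaint, which is why a static type checker is needed to catch the mismatch.

```python
def sum_elements(numbers: list[int]) -> int:
    return sum(numbers)


# The hint says list[int], but Python does not check it at runtime.
result = sum_elements([1.5, 2.5])
print(result)        # 4.0
print(type(result))  # <class 'float'>, despite the -> int annotation
```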
Complex types
The typing module in Python provides several classes that can be used to provide more complex type hints. Below are some of the most commonly used classes:
List, dict, tuple, set
The list, dict, tuple, and set classes can be used to provide type hints for lists, dictionaries, tuples, and sets respectively. They can be parameterized to provide even more detailed type hints.
# A list of integers
numbers: list[int] = [1, 2, 3]
# A dictionary with string keys and float values
weights: dict[str, float] = {"apple": 0.182, "banana": 0.120}
# A tuple with an integer and a string
student: tuple[int, str] = (1, "John")
# A set of strings
flags: set[str] = {"apple", "banana", "cherry"}
In these examples, numbers is hinted as a list of integers, weights is a dictionary with string keys and float values, student is a tuple with an integer and a string, and flags is a set of strings.
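Parameterized types can also be nested. The following is a minimal sketch, assuming Python 3.9 or later. The function average_scores and the sample data are hypothetical.

```python
def average_scores(scores: dict[str, list[float]]) -> dict[str, float]:
    # Map each student name to the mean of that student's scores.
    return {name: sum(values) / len(values) for name, values in scores.items()}


# A tuple of fixed length and element types: (x, y) coordinates.
origin: tuple[float, float] = (0.0, 0.0)

# A tuple of any length whose items are all ints uses an ellipsis.
primes: tuple[int, ...] = (2, 3, 5, 7)

print(average_scores({"Ana": [90.0, 80.0], "Ben": [70.0]}))
```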
Optional
The Optional type hint can be used to indicate that a variable can be either a specific type or None.
from typing import Optional
def find_student(student_id: int) -> Optional[dict[str, str]]:
    # If the student is found, return a dictionary containing their data
    # If the student is not found, return None
    ...
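One way the function body might look is sketched below. The in-memory STUDENTS dictionary and its contents are assumptions made for this example, not part of any real API. The caller checks for None before using the result, which is the pattern a type checker will expect.

```python
from typing import Optional

# Hypothetical in-memory data store used only for this example.
STUDENTS: dict[int, dict[str, str]] = {
    1: {"name": "John", "major": "Physics"},
}


def find_student(student_id: int) -> Optional[dict[str, str]]:
    # dict.get returns None when the key is missing.
    return STUDENTS.get(student_id)


student = find_student(2)
if student is None:
    print("Student not found")
else:
    print(student["name"])
```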
Union
The Union type hint is used to indicate that a variable can be one of several types. For example, if a variable can be either a str or an int, you can provide a type hint like this:
from typing import Union
def process(data: Union[str, int]) -> None:
    # This function can handle either a string or an integer
    ...
In Python 3.10 and later, you can use the pipe (|) operator to indicate a type that can be one of several options, replacing the need for Union:
def process(data: str | int) -> None:
    # This function can handle either a string or an integer
    ...
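Inside such a function, an isinstance check is a common way to tell the alternatives apart, and type checkers use it to narrow the type in each branch. The sketch below uses Union so that it also runs on Python versions before 3.10. The function describe is hypothetical.

```python
from typing import Union


def describe(data: Union[str, int]) -> str:
    if isinstance(data, str):
        # In this branch a type checker treats data as str.
        return f"text of length {len(data)}"
    # Only int remains here, so int-specific methods are safe to call.
    return f"integer using {data.bit_length()} bits"


print(describe("hello"))  # text of length 5
print(describe(8))        # integer using 4 bits
```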
Any
The Any type is used to indicate that a variable can be of any type. To a type checker, this is effectively the same as not providing a type hint at all.
from typing import Any
def process(data: Any) -> None:
    # This function can handle data of any type
    ...
These tools from the typing module can help you provide detailed type hints that make your code easier to understand and debug.
However, remember that Python's type hints are optional and not enforced at runtime. They are intended as a tool for developers, not a way to enforce type safety.
User-defined types
In Python, you can define your own types using classes, which is the fundamental mechanism to create custom types. You can use these classes in type hints just like you'd use the built-in types. The typing module also provides additional tools for creating more specific types, including Type and NewType.
Defining your own types using classes
You can create a class and use it as a type hint. Here's an example:
class Student:
    def __init__(self, name: str, age: int):
        self.name = name
        self.age = age
def print_student_details(student: Student) -> None:
    print(student.name, student.age)
Student is a user-defined type, and it's used as a type hint in the print_student_details function.
Using Type for type hinting
The Type class from the typing module can be used to indicate that a variable will be a class, not an instance of a class. This is commonly used when a function argument is expected to be a class, for example in factory functions.
from typing import Type
def create_student(cls: Type[Student], name: str, age: int) -> Student:
    return cls(name, age)
In this example, create_student expects a Student class (or a subclass of Student) as its first argument.
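To show why accepting the class itself is useful, here is a minimal sketch in which a subclass is passed to the factory. GraduateStudent is a hypothetical subclass introduced for this example, and Student is repeated so the block runs on its own.

```python
from typing import Type


class Student:
    def __init__(self, name: str, age: int):
        self.name = name
        self.age = age


class GraduateStudent(Student):
    # Hypothetical subclass used only for illustration.
    pass


def create_student(cls: Type[Student], name: str, age: int) -> Student:
    # cls is the class object itself, so calling it builds an instance.
    return cls(name, age)


alice = create_student(Student, "Alice", 20)
bob = create_student(GraduateStudent, "Bob", 27)
print(type(alice).__name__)  # Student
print(type(bob).__name__)    # GraduateStudent
```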
Using NewType to create distinct types
NewType is used to create distinct types. It's useful when you want to distinguish between two types that would otherwise be the same.
For example, let's say you're dealing with student IDs and course IDs in your program, and you want to make sure you don't mix them up. Both are represented as integers, so you can use NewType to create two distinct types:
from typing import NewType
StudentID = NewType('StudentID', int)
CourseID = NewType('CourseID', int)
def get_student(student_id: StudentID) -> None:
    # Fetch student data...
    ...
def enroll_in_course(student_id: StudentID, course_id: CourseID) -> None:
    # Enroll the student in the course...
    ...
Even though StudentID and CourseID are both integers, they are considered distinct types and cannot be used interchangeably. However, remember that this check is not enforced at runtime, but during static type checking using tools like mypy.
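The sketch below illustrates both halves of that statement. The IDs and the describe_enrollment helper are made up for the example. At runtime a NewType value is just the underlying int, so the swapped call at the end still executes, whereas a static type checker is expected to flag it.

```python
from typing import NewType

StudentID = NewType("StudentID", int)
CourseID = NewType("CourseID", int)


def describe_enrollment(student_id: StudentID, course_id: CourseID) -> str:
    return f"student {student_id} -> course {course_id}"


sid = StudentID(42)
cid = CourseID(101)

print(describe_enrollment(sid, cid))  # student 42 -> course 101
print(type(sid))                      # <class 'int'>: no wrapper at runtime

# Arguments swapped: runs without error, but a type checker should reject it.
print(describe_enrollment(cid, sid))  # student 101 -> course 42
```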
Generics
Generics allow you to define a function, class, or data structure that works with different types. The Generic and TypeVar classes from the typing module are used to define generic types. For example, a list is a generic data structure because it can contain elements of any type.
TypeVar
TypeVar is used to define a type variable, which can be any type, and the specific type is determined by the client code. Here's an example:
from typing import TypeVar
T = TypeVar('T')
def first_element(lst: list[T]) -> T:
    return lst[0]
Here, T is a type variable that can be any type. The first_element function works with a list of any type and returns an element of that type. The specific type of T would be determined by the list you pass to the function.
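A short usage sketch, with sample lists chosen for illustration, shows how the return type follows the argument. The built-in list[T] form assumes Python 3.9 or later.

```python
from typing import TypeVar

T = TypeVar("T")


def first_element(lst: list[T]) -> T:
    return lst[0]


number = first_element([10, 20, 30])  # a type checker infers int
word = first_element(["a", "b"])      # a type checker infers str

print(number + 1)    # 11
print(word.upper())  # A
```

Because the checker knows that number is an int and word is a str, it can validate the arithmetic and the upper() call that follow.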
Generic
Generic is used to define generic classes. A generic class can be initialized with a variety of types, and those types are used in type hints within the class.
from typing import Generic, TypeVar
T = TypeVar('T')
class Box(Generic[T]):
    def __init__(self, value: T):
        self.value = value
    def get(self) -> T:
        return self.value
Here, Box is a generic class that works with any type T. When you create an instance of Box, you can specify the type of T, and that type is used in the value attribute and the get method.
box1 = Box[int](10)
box2 = Box[str]("Hello")
box1 is a Box that contains an integer, and box2 is a Box that contains a string.
Type checking with pyright
A type checker like pyright is a tool used to enforce type hinting in Python. At Dagster, we really like pyright because it is faster than other alternatives such as mypy.
Python itself is a dynamically-typed language, which means type checks happen at runtime and it does not enforce type hinting rules. If you try to perform an operation that's not supported for a given data type, Python will raise an error during runtime. For example, calling an undefined method on an object will only trigger an error during runtime.
However, when developing large or complex systems, enforcing type consistency can help catch potential bugs early. pyright performs static type checking, meaning it checks the types of your variables, function arguments, and return values before the code is actually run. It uses the type hints you've provided in your code to do this. It's important to understand that pyright does not execute or run your code; it simply reads and analyzes it.
How to use a type checker to verify your types
To use pyright, you first need to install it:
pip install pyright
Then, to check a Python file, you run pyright with the file as an argument:
pyright my_file.py
Pyright will then analyze the file and report any type errors it finds.
For example, if you have a function that's annotated to receive a str as an argument and you pass an int, pyright will catch this.
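As an illustration, consider the hypothetical file below. The function shout is made up for this example. The second call passes an int where a str is annotated: a static checker such as pyright is expected to report it without running the code, whereas plain Python only fails once that line executes.

```python
def shout(text: str) -> str:
    return text.upper()


print(shout("hello"))  # HELLO

try:
    # A static type checker should flag this call before the program runs.
    shout(42)
except AttributeError as error:
    # Without a checker, the mistake only surfaces at runtime.
    print(f"Runtime failure: {error}")
```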
Static vs dynamic type checking
Static type checking is the process of verifying the type safety of a program based on analysis of a program's text (source code). Static type checking is done at compile-time (before the program is run). Languages that enforce static type checking include C++, Java, and Rust.
Dynamic type checking, on the other hand, is the process of verifying the type safety of a program at runtime. Dynamic type checking occurs while the program is running. Languages that use dynamic type checking include Python, Ruby, and JavaScript.
In static type checking, types are checked before the program runs, which makes it easier to catch and prevent type errors. This makes the program safer to run, as most type-related bugs have been caught at compile-time. However, it also requires the programmer to explicitly declare the types of all variables and function return values, which can be seen as reducing flexibility.
Dynamic type checking provides more flexibility, as you don't have to explicitly declare the type of every variable. However, this also means that type errors can occur at runtime, which could potentially cause the program to crash.
Python is a dynamically-typed language, but it also supports optional static type checking through tools like pyright and type hints. This provides Python programmers with a unique flexibility, allowing them to choose when they want the safety of static type checking and when they prefer the flexibility of dynamic typing.
Type hints and docstrings
Type hints, as we've discussed, indicate the types of variables, function parameters, and return values. They can help other developers understand what types of data your function expects and what it will return.
Docstrings, on the other hand, are used to provide a description of what your function, class, or module does. A docstring can include a description of the function's purpose, its arguments, its return value, and any exceptions it may raise.
Here's an example of how you can use type hints and docstrings together:
from typing import Any
def filter_and_sort_products(products: list[dict[str, Any]], attribute: str, min_value: int) -> list[dict[str, Any]]:
    """
    Filters a list of products by a given attribute and minimum value, and then sorts the filtered products by the attribute.
    Args:
        products (list[dict[str, Any]]): A list of products represented as dictionaries.
        attribute (str): The attribute to filter and sort by.
        min_value (int): The minimum acceptable value of the specified attribute.
    Returns:
        list[dict[str, Any]]: A list of filtered and sorted products.
    Raises:
        KeyError: If the specified attribute is missing from a product.
    Examples:
        >>> products = [{"name": "Apple", "price": 10}, {"name": "Banana", "price": 5}]
        >>> filter_and_sort_products(products, "price", 6)
        [{'name': 'Apple', 'price': 10}]
    """
    filtered_products = [product for product in products if product[attribute] >= min_value]
    return sorted(filtered_products, key=lambda x: x[attribute])
Here, the function signature shows that the function takes a list of dictionaries representing products, a string representing an attribute, and an integer representing a minimum value. It returns a list of filtered and sorted dictionaries.
The docstring explains the purpose of the function, its parameters, its return value, and possible exceptions (such as a KeyError if the given attribute is not present), and includes an example of how to call the function.
This combination of type hints and docstrings can greatly improve the readability and maintainability of your code.
Conclusion
Building on Python programming best practices, we’ve looked at how type hints improve the readability and maintainability of your code.
If you have any questions or need further clarification, feel free to join the Dagster Slack and ask the community for help. Thank you for reading!
Our next article builds on these data engineering concepts and explores how Factory Patterns help you automate steps in your pipeline.
Appendix: Python types
To summarize, here are the most common types used in Python. You might also come across or utilize custom data types from external libraries or those defined by other developers.
Numeric Types:
- int: Integer, e.g., 5, -3, 42
- float: Floating-point number, e.g., 3.14, -0.001, 2.71
- complex: Complex number, e.g., 3+4j, 2-5j
Text Type:
- str: String, e.g., "Hello, World!", 'Python'
Sequence Types:
- list: List, e.g., [1, 2, 3], ["a", "b", "c"]
- tuple: Tuple, e.g., (1, 2, 3), ("a", "b", "c")
- range: Range, e.g., range(5), range(0, 5, 2)
Mapping Type:
- dict: Dictionary, e.g., {"name": "John", "age": 30}, {1: "one", 2: "two"}
Set Types:
- set: Set, e.g., {1, 2, 3}, {"apple", "banana", "cherry"}
- frozenset: Immutable set, created using frozenset()
Boolean Type:
- bool: Boolean, e.g., True, False
Binary Types:
- bytes: Immutable sequence of bytes, e.g., b'hello', bytes([65, 66, 67])
- bytearray: Mutable sequence of bytes, e.g., bytearray([65, 66, 67])
- memoryview: Memory view object, created using memoryview()
None Type:
- NoneType: Represents the absence of a value or a null value. The only value it can have is None.
Python also has many built-in modules that offer additional data types, and you can also define custom data types using classes. Some of the other noteworthy types/modules include:
datetime Module:
- datetime.date: Represents a date.
- datetime.datetime: Represents date and time.
- datetime.time: Represents time.
- datetime.timedelta: Represents a duration or difference between two dates/times.
- datetime.tzinfo: Base for dealing with time zones.
collections Module:
- collections.namedtuple: Returns a new tuple subclass named 'typename'.
- collections.deque: Double-ended queue.
- collections.Counter: Dict subclass for counting hashable objects.
- collections.OrderedDict: Dict subclass that remembers the order entries were added.
- collections.defaultdict: Dict subclass that calls a factory function to supply missing values.
array Module:
- array.array: A space-efficient array with type specification.
struct Module:
- Used for packing and unpacking binary data.
json Module:
- Helps in encoding and decoding JSON data.
enum Module:
- enum.Enum: Base class for creating enumerated constants.
- enum.IntEnum: Base class for creating enumerated constants that are also subclasses of int.