Dagster + Pandera

Generate Dagster Types from Pandera dataframe schemas.

Installation

pip install dagster-pandera

Example

import random
import pandas as pd
import pandera as pa
from dagster_pandera import pandera_schema_to_dagster_type
from pandera.typing import Series
from dagster import asset

APPLE_STOCK_PRICES = {
    "name": ["AAPL", "AAPL", "AAPL", "AAPL", "AAPL"],
    "date": ["2018-01-22", "2018-01-23", "2018-01-24", "2018-01-25", "2018-01-26"],
    "open": [177.3, 177.3, 177.25, 174.50, 172.0],
    "close": [177.0, 177.04, 174.22, 171.11, 171.51],
}


class StockPrices(pa.DataFrameModel):
    """Open/close prices for one or more stocks by day."""

    name: Series[str] = pa.Field(description="Ticker symbol of stock")
    date: Series[str] = pa.Field(description="Date of prices")
    open: Series[float] = pa.Field(ge=0, description="Price at market open")
    close: Series[float] = pa.Field(ge=0, description="Price at market close")


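# The generated Dagster type validates the returned DataFrame against StockPrices.
@asset(dagster_type=pandera_schema_to_dagster_type(StockPrices))
def apple_stock_prices_dirty():
    prices = pd.DataFrame(APPLE_STOCK_PRICES)
    # Deliberately blank out the prices for one randomly chosen day.
    # StockPrices columns are non-nullable by default, so the Dagster type
    # check on this asset's output is expected to fail.
    i = random.choice(prices.index)
    prices.loc[i, "open"] = pd.NA
    prices.loc[i, "close"] = pd.NA
    return prices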

About Pandera

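Pandera is a statistical data testing toolkit and data validation library for scientists, engineers, and analysts seeking correctness. It lets you declare the expected columns, data types, and value checks of a dataframe, then validate data against that declaration at runtime.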

About this integration

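The dagster-pandera integration library provides an API for generating Dagster types from Pandera dataframe schemas. Its entry point, pandera_schema_to_dagster_type, accepts either a DataFrameSchema object or a schema model class and returns a Dagster type whose type check validates dataframes against that schema.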

Like any Dagster type, a type generated by dagster-pandera can annotate op inputs and outputs, or an asset's output as in the example above. Dagster then validates each dataframe at runtime and reports every failed check, and Dagit (now called the Dagster UI) can display the dataframe's column structure.