Dagster Integration:
Dagster + Pandera
Generate Dagster Types from Pandera dataframe schemas.
Installation
pip install dagster-pandera
Example
import random

import pandas as pd
import pandera as pa
from dagster_pandera import pandera_schema_to_dagster_type
from pandera.typing import Series

from dagster import asset

APPLE_STOCK_PRICES = {
    "name": ["AAPL", "AAPL", "AAPL", "AAPL", "AAPL"],
    "date": ["2018-01-22", "2018-01-23", "2018-01-24", "2018-01-25", "2018-01-26"],
    "open": [177.3, 177.3, 177.25, 174.50, 172.0],
    "close": [177.0, 177.04, 174.22, 171.11, 171.51],
}


class StockPrices(pa.SchemaModel):
    """Open/close prices for one or more stocks by day."""

    name: Series[str] = pa.Field(description="Ticker symbol of stock")
    date: Series[str] = pa.Field(description="Date of prices")
    open: Series[float] = pa.Field(ge=0, description="Price at market open")
    close: Series[float] = pa.Field(ge=0, description="Price at market close")


# The generated Dagster type validates the asset's output against StockPrices.
@asset(dagster_type=pandera_schema_to_dagster_type(StockPrices))
def apple_stock_prices_dirty():
    prices = pd.DataFrame(APPLE_STOCK_PRICES)
    # Blank out both prices in one random row. Pandera fields are
    # non-nullable by default, so this frame fails the type check.
    i = random.choice(prices.index)
    prices.loc[i, "open"] = pd.NA
    prices.loc[i, "close"] = pd.NA
    return prices
About Pandera
Pandera is a statistical data testing toolkit and a data validation library for scientists, engineers, and analysts seeking correctness.
About this integration
The dagster-pandera integration library provides an API for generating Dagster Types from Pandera dataframe schemas.
Like all Dagster types, dagster-pandera-generated types can be used to annotate op inputs and outputs. This provides runtime type-checking with rich error reporting and allows Dagit to display information about a dataframe's structure.