Dagster + DuckDB

Read from and write to DuckDB natively from Software-defined Assets.

About this integration

This library provides an integration with the DuckDB database, including an out-of-the-box I/O manager, so that you can make DuckDB your storage layer of choice.

Installation

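pip install dagster-duckdb dagster-duckdb-pandas

dagster-duckdb-pandas provides the DuckDBPandasTypeHandler used in the example below; install it alongside dagster-duckdb if you want to store pandas DataFrames.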

Example

from dagster_duckdb import build_duckdb_io_manager
from dagster_duckdb_pandas import DuckDBPandasTypeHandler
from dagster import Definitions, asset
import pandas as pd

@asset(
    key_prefix=["my_schema"]  # will be used as the schema in duckdb
)
def my_table() -> pd.DataFrame:  # the name of the asset will be the table name
    # DuckDB cannot create a table with zero columns, so return at least one column
    return pd.DataFrame({"a": [1, 2, 3]})

duckdb_io_manager = build_duckdb_io_manager([DuckDBPandasTypeHandler()])

defs = Definitions(
    assets=[my_table],
    resources={"io_manager": duckdb_io_manager.configured({"database": "my_db.duckdb"})}
)

About DuckDB

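DuckDB is a column-oriented, embeddable OLAP database. A typical OLTP relational database such as SQLite is row-oriented: data is organised physically as consecutive tuples, so all the values of one row sit next to each other. A column-oriented database instead stores all the values of each column contiguously, which suits analytical queries that scan a few columns across many rows.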
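To make the layout difference concrete, here is a toy sketch in plain Python. It only models the ordering of values in memory, not how DuckDB or SQLite actually lay out pages on disk; the table, its values, and the helper names `row_layout` and `column_layout` are hypothetical.

```python
# Toy table with columns (id, name, price); values are illustrative only
COLUMNS = ("id", "name", "price")
ROWS = [
    (1, "apple", 0.5),
    (2, "bread", 2.0),
    (3, "cheese", 4.5),
]


def row_layout(rows):
    """Row-oriented: the values of each tuple are stored next to each other."""
    return [value for row in rows for value in row]


def column_layout(columns, rows):
    """Column-oriented: the values of each column are stored next to each other."""
    return {name: [row[i] for row in rows] for i, name in enumerate(columns)}


row_store = row_layout(ROWS)
col_store = column_layout(COLUMNS, ROWS)

# Summing "price" in the row layout means striding over every tuple...
width = len(COLUMNS)
price_idx = COLUMNS.index("price")
total_from_rows = sum(row_store[price_idx::width])

# ...while the column layout reads one contiguous list
total_from_columns = sum(col_store["price"])

print(total_from_rows, total_from_columns)
```

Both totals agree, but the column layout touches only the price values, which is why analytical scans over a few columns favour column-oriented engines like DuckDB.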