About this integration
This library provides an integration with the DuckDB database and Pandas data processing library, allowing you to build a DuckDB I/O Manager that can store and load Pandas DataFrames.
Installation
pip install dagster_duckdb dagster_duckdb_pandas
Example
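from dagster_duckdb import build_duckdb_io_manager
from dagster_duckdb_pandas import DuckDBPandasTypeHandler
from dagster import asset, with_resources
import pandas as pd


@asset
def my_table():
    # The returned DataFrame is handed to the I/O manager for storage.
    return pd.DataFrame()


# Build a DuckDB I/O manager that can store and load Pandas DataFrames.
duckdb_io_manager = build_duckdb_io_manager([DuckDBPandasTypeHandler()])

# Attach the I/O manager to the asset and point it at a database file.
assets = with_resources(
    [my_table],
    {"io_manager": duckdb_io_manager.configured({"database": "my_db.duckdb"})},
)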
About DuckDB and Pandas
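DuckDB is a column-oriented, embeddable OLAP database. A typical OLTP relational database such as SQLite is row-oriented: data is organised physically as consecutive tuples (rows). A column-oriented database instead stores the values of each column together, which suits analytical queries that read a few columns across many rows.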
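The difference between the two layouts can be illustrated with a small, purely conceptual sketch in plain Python. The table contents and variable names here are hypothetical, and the lists only mimic the ordering of values in storage. This is not how DuckDB or SQLite actually lay out data on disk.

```python
# Three records of a hypothetical table with columns (id, fruit, price).
rows = [
    (1, "apple", 3.50),
    (2, "pear", 1.25),
    (3, "plum", 2.00),
]

# Row-oriented layout: the values of each tuple sit next to each other.
row_layout = [value for row in rows for value in row]

# Column-oriented layout: the values of each column sit next to each other.
columns = list(zip(*rows))
column_layout = [value for column in columns for value in column]

# An aggregate over one column reads a single contiguous run of values in
# the column layout, but has to step over every tuple in the row layout.
total_from_columns = sum(columns[2])
total_from_rows = sum(row_layout[2::3])
```

Both totals are the same; the point is only that the column layout keeps the values needed by an analytical query together.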
Pandas is a very popular Python package that provides data structures designed to make working with “relational” or “labeled” data both easy and intuitive. Pandas aims to be the fundamental high-level building block for doing practical, real-world data analysis in Python.
DuckDB can efficiently run SQL queries directly on Pandas DataFrames.