Topics
dbt unit tests are a dbt feature that validates a model's SQL logic in isolation, using a small, controlled set of static input data.
Data quality platforms work by automating the processes involved in identifying and correcting data errors. This automation reduces manual effort and minimizes the risk of human error.
SQL models in dbt are SQL files that define transformations in a data warehouse, each creating a view or table from a SELECT statement.
dbt (data build tool) macros are reusable, parameterized SQL and Jinja templates that automate repetitive SQL operations within a dbt project.
A data pipeline is a series of processes that move data from one system to another.
AWS provides an ecosystem of services to support data pipelines at every stage—from data ingestion and transformation to storage, analysis, and monitoring.
An ETL (extract, transform, load) pipeline is a data processing system that automates extracting data from various sources, transforming it into an analysis-ready form, and loading it into a destination system.
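The three ETL stages can be sketched in plain Python; the stage functions and sample records below are illustrative assumptions, not any specific tool's API.

```python
# Minimal ETL sketch: extract raw records, transform them into a
# consistent shape, and load them into a destination.

def extract():
    """Pull raw records from a source system (here, a hard-coded list)."""
    return [{"id": 1, "amount": "10.5"}, {"id": 2, "amount": "7.25"}]

def transform(records):
    """Normalize types so downstream systems receive consistent data."""
    return [{**r, "amount": float(r["amount"])} for r in records]

def load(records, destination):
    """Write processed records to a destination (here, an in-memory list)."""
    destination.extend(records)

warehouse = []
load(transform(extract()), warehouse)
```

In a production pipeline each stage would talk to real systems (APIs, databases, object stores), but the extract → transform → load structure stays the same.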
Data reliability refers to the consistency and dependability of data over time.
dbt (data build tool) seeds are static CSV files stored within your dbt project that are loaded into your analytics warehouse as database tables.
Data engineering tools are software applications and platforms that assist in building, managing, and optimizing data pipelines.
A data pipeline framework is a structured system that enables the movement and transformation of data within an organization.
Data orchestration tools manage data workflows, automating the movement and transformation of data across different systems.
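At their core, orchestration tools resolve dependencies between tasks and run them in a valid order. A minimal sketch using Python's standard-library `graphlib`, with hypothetical task names:

```python
from graphlib import TopologicalSorter

# Hypothetical workflow: each task maps to the set of tasks it depends on.
# An orchestrator resolves these dependencies into a valid execution order.
tasks = {
    "extract": set(),
    "transform": {"extract"},
    "load": {"transform"},
    "report": {"load"},
}

run_order = list(TopologicalSorter(tasks).static_order())
```

Real orchestrators add scheduling, retries, and observability on top of this dependency-resolution core.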
Data visibility refers to how accessible, understandable, and useful data is within an organization.
dbt snapshots are a mechanism within dbt (data build tool) designed to track and preserve changes in data over time, specifically for tables that are mutable or have Slowly Changing Dimensions (SCD) Type 2. They allow users to maintain a historical record of how rows within a table evolve.
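dbt generates the snapshot tables automatically; the Python sketch below only illustrates the SCD Type 2 bookkeeping a snapshot performs (close the superseded row, append a new version) and is not dbt's implementation. All names and sample rows are assumptions.

```python
def apply_snapshot(history, current_rows, key, check_col, ts):
    """Close out changed rows and append new versions (SCD Type 2 sketch)."""
    latest = {r[key]: r for r in history if r["valid_to"] is None}
    for row in current_rows:
        old = latest.get(row[key])
        if old is None or old[check_col] != row[check_col]:
            if old is not None:
                old["valid_to"] = ts          # close the superseded version
            history.append({**row, "valid_from": ts, "valid_to": None})
    return history

history = apply_snapshot([], [{"id": 1, "status": "pending"}],
                         "id", "status", "2024-01-01")
history = apply_snapshot(history, [{"id": 1, "status": "shipped"}],
                         "id", "status", "2024-01-02")
```

After the second run, the table holds both versions of row 1: the "pending" version with a closed validity window and the current "shipped" version.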
ETL (Extract, Transform, Load) tools are software solutions that help organizations manage and process data from multiple sources.
A data catalog is a centralized repository that provides an organized inventory of data assets within an organization.
Data engineering is the practice of designing, building, and maintaining the infrastructure necessary for collecting, storing, and processing large-scale data.
Ingestion capabilities are important for collecting structured, semi-structured, and unstructured data. By ensuring data arrives in a consistent, well-organized manner, organizations can eliminate bottlenecks in data processing.
Data pipeline architecture automates the collection, processing, and transfer of data from various sources to destinations for analysis or storage.
Data orchestration refers to the automated coordination and management of data movement and data processing across different systems and environments.
Data observability refers to the ability to fully understand the health and state of data in an organization.
Data quality testing involves evaluating data to ensure it meets specific standards for accuracy, completeness, consistency, and more.
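Those standards translate directly into checks over the data. A hedged sketch, where the rules and sample records are assumptions rather than any platform's API:

```python
# Illustrative data quality checks covering completeness, accuracy, and
# consistency over a small set of sample records.
records = [
    {"order_id": 1, "email": "a@example.com", "total": 42.0},
    {"order_id": 2, "email": None, "total": 13.5},
    {"order_id": 2, "email": "c@example.com", "total": -5.0},
]

def run_checks(rows):
    failures = []
    if any(r["email"] is None for r in rows):
        failures.append("completeness: missing email")
    if any(r["total"] < 0 for r in rows):
        failures.append("accuracy: negative total")
    ids = [r["order_id"] for r in rows]
    if len(ids) != len(set(ids)):
        failures.append("consistency: duplicate order_id")
    return failures

issues = run_checks(records)
```

A quality platform runs checks like these on a schedule and surfaces the failures instead of leaving them to ad hoc scripts.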
A dbt Python model is a type of transformation within the dbt (data build tool) ecosystem that lets developers write business logic using Python, instead of SQL.
Data lineage refers to the end-to-end tracking of data as it moves through systems, from its origin to its final destination.
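Lineage is often modeled as a graph mapping each dataset to its direct upstream sources; tracing lineage is then a walk from a dataset back to its origins. The dataset names below are illustrative.

```python
# Toy lineage graph: each dataset maps to its direct upstream sources.
upstream = {
    "revenue_report": ["orders_clean"],
    "orders_clean": ["orders_raw"],
    "orders_raw": [],
}

def trace_lineage(dataset, graph):
    """Return all upstream ancestors of a dataset, nearest first."""
    ancestors = []
    for parent in graph.get(dataset, []):
        ancestors.append(parent)
        ancestors.extend(trace_lineage(parent, graph))
    return ancestors

lineage = trace_lineage("revenue_report", upstream)
```

Production lineage systems build this graph automatically from query logs or pipeline metadata rather than by hand.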
A data engineering workflow involves a series of structured steps for data management, from data acquisition to applications for organizational data users.