Dagster Data Engineering Glossary:
Data Engineering Terms Explained
Terms and Definitions You Need to Know as a Data Engineer
Cursor
A database object used to traverse the results of a SQL query, allowing individual rows to be accessed.
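For illustration, Python's built-in `sqlite3` module exposes this pattern directly; the table and rows below are hypothetical:

```python
import sqlite3

# Hypothetical in-memory table for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)", [(1, "Ada"), (2, "Grace")])

cur = conn.cursor()            # the cursor traverses the result set
cur.execute("SELECT id, name FROM users ORDER BY id")
first = cur.fetchone()         # advance one row at a time
rest = cur.fetchall()          # consume the remaining rows
conn.close()
```

Fetching row by row lets a program process large result sets without loading everything into memory at once.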
Cybersecurity
The practice of protecting systems, networks, and programs from digital attacks aimed at accessing, changing, or destroying sensitive information.
DAG (Directed Acyclic Graph)
A finite directed graph with no directed cycles, used extensively in representing data flow in data processing systems like Apache Airflow.
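The execution order implied by a DAG can be recovered with a topological sort. A minimal sketch using Python's standard library, with hypothetical step names:

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each key lists the steps it depends on.
dag = {
    "extract": [],
    "transform": ["extract"],
    "validate": ["extract"],
    "load": ["transform", "validate"],
}
# static_order() yields steps so every step runs after its dependencies.
order = list(TopologicalSorter(dag).static_order())
```

Because the graph is acyclic, such an ordering always exists; a cycle would make `static_order()` raise `CycleError`.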
Dagster
An open-source platform for defining, building, and managing critical data assets.
Data Aggregation
The process of gathering and summarizing information in a specified form, often used in statistical analysis.
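A minimal sketch of aggregation with the standard library, using hypothetical sales records rolled up into per-region totals:

```python
from collections import defaultdict

# Hypothetical (region, amount) records aggregated into per-region totals.
sales = [("east", 100), ("west", 250), ("east", 50)]
totals = defaultdict(int)
for region, amount in sales:
    totals[region] += amount
```

The same group-and-summarize pattern underlies SQL's `GROUP BY` and dataframe `groupby` operations.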
Data Allocation
The assignment of storage space to specific data, often in the context of distributed databases where data is allocated across multiple nodes.
Data Analytics
The science of analyzing raw data to make conclusions about that information.
Data Annotation
The process of adding explanatory notes or comments to data, often used in the context of machine learning to create labeled training data.
Data Architecture
The overall structure, organization, and rules used to manage and use data within an organization, including the arrangement of data and data processing.
Data Block
The smallest unit of data storage in a database, storing a set of rows or a subset of a table's columns.
Data Catalog
A centralized repository that allows for the management, collaboration, discovery, and consumption of organizational datasets, serving as a metadata inventory.
Data Dictionary
A collection of descriptions of the data objects or items in a data model for the benefit of programmers and others who need to refer to them.
Data Drift
A phenomenon where the statistical properties of incoming data change over time, potentially impacting model performance and accuracy.
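One simple detection heuristic compares summary statistics of incoming data against a baseline. A sketch with illustrative numbers and an assumed two-standard-deviation threshold:

```python
from statistics import mean, stdev

# Hypothetical baseline vs. incoming feature values; flag drift when the
# incoming mean moves more than two baseline standard deviations away.
baseline = [10.0, 11.0, 9.5, 10.5, 10.0]
incoming = [14.0, 15.5, 14.5, 15.0, 14.0]
shift = abs(mean(incoming) - mean(baseline))
drifted = shift > 2 * stdev(baseline)
```

Production systems typically use distribution-level tests (e.g. Kolmogorov-Smirnov or population stability index) rather than a bare mean comparison.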
Data Fabric
A unified architecture that provides a consistent and coherent set of capabilities and services across different environments.
Data Federation
The process of aggregating data from different sources to create a single, unified view.
Data Fusion
The process of integrating multiple data sources to produce more consistent, accurate, and useful information than that provided by any individual data source.
Data Governance
The overall management of the availability, usability, integrity, and security of data employed in an enterprise, involving a set of practices and policies.
Data Lake
A centralized storage repository that allows storing structured and unstructured data at any scale, usually used for big data and real-time analytics.
Data Lakehouse
A modern data architecture that combines the best elements of data lakes and data warehouses, enabling efficient handling of both structured and unstructured data.
Data Lifecycle
The journey that data goes through from creation and initial storage to the time it becomes obsolete and is deleted.
Data Lifecycle Management
The process of managing the flow of data throughout its lifecycle from creation and initial storage to the time it is archived or deleted.
Data Mart
A subset of a data warehouse that is designed for a specific line of business or department within an organization.
Data Mesh
A decentralized approach to data architecture and organizational structure that treats data as a product and emphasizes domain-oriented decentralized data ownership and architecture.
Data Ops
An automated, process-oriented methodology used to improve the quality and reduce the cycle time of data analytics.
Data Pipeline
A series of data processing steps involved in the flow of data from the source to its final destination, usually used in the context of ETL and data integration.
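At its simplest, a pipeline is a chain of stages where each stage's output feeds the next. A sketch with hypothetical extract, transform, and load functions:

```python
# Hypothetical three-stage pipeline: each stage is a plain function,
# so the flow is explicit and each step can be tested in isolation.
def extract():
    return ["  Alice ", "BOB", ""]

def transform(rows):
    # Clean up whitespace and casing; drop empty records.
    return [r.strip().title() for r in rows if r.strip()]

def load(rows, sink):
    sink.extend(rows)
    return sink

sink = []
load(transform(extract()), sink)
```

Orchestrators like Dagster and Airflow manage scheduling, retries, and dependencies around exactly this kind of staged flow.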
Data Provenance
Information that helps to trace the origins, processing, and use of data, helping to determine the quality and reliability of the dataset.
Data Quality
A comprehensive way of maintaining the accuracy, reliability, and consistency of data over its entire life cycle.
Data Redundancy
The existence of data that is additional to the actual data and permits correction of errors in stored or transmitted data.
Data Reservoir
An expansive storage repository that allows for the integration and storage of data from various sources in its native format.
Data Silo
A repository of data isolated or segregated from other parts of the organization's data system.
Data Stewardship
Responsible management and oversight of an organization's data to help provide business users with high-quality data.
Data Vault Modeling
A database modeling method designed for enterprise data warehouses, with a focus on long-term historical storage, traceability, and scalability.
Data Volume
The amount of data available for analysis, usually referred to in the context of Big Data.
Data Warehouse
A central repository of integrated data from disparate sources, used to store and manage large volumes of historical data and enable fast, complex queries across all the consolidated data.
Database Indexing
The use of special data structures that speed up operations on a table, such as search, filter, and sort.
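SQLite makes the effect easy to observe: after creating an index, the query planner reports an index search instead of a full table scan. The table, index name, and rows below are illustrative:

```python
import sqlite3

# Hypothetical table; the index lets SQLite look up rows by email
# without scanning the whole table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, email TEXT)")
conn.execute("CREATE INDEX idx_users_email ON users (email)")
conn.executemany("INSERT INTO users VALUES (?, ?)",
                 [(1, "a@x.com"), (2, "b@x.com")])
# EXPLAIN QUERY PLAN shows whether the index is used; the detail text
# is in the last column of the returned row.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT id FROM users WHERE email = ?", ("b@x.com",)
).fetchone()
conn.close()
```

Indexes trade extra storage and slower writes for faster reads, so they are usually added only on frequently queried columns.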
Database Management System (DBMS)
A software package designed to define, manipulate, retrieve, and manage data in a database.
Database Mirroring
A technique used to increase data availability by maintaining two copies of a single database that must reside on different server instances of SQL Server Database Engine.
Database Normalization
A systematic approach of decomposing tables to eliminate data redundancy and undesirable characteristics like insertion, update, and deletion anomalies.
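The core idea can be sketched outside SQL: a flat table that repeats a customer's city on every order is split so each fact is stored once. The data below is hypothetical:

```python
# Hypothetical denormalized rows repeating each customer's city on every
# order; normalization splits them into a customers table and an orders
# table, so each city is stored exactly once.
orders_flat = [
    (1, "alice", "Paris", "book"),
    (2, "alice", "Paris", "pen"),
    (3, "bob", "Lyon", "lamp"),
]
customers = {name: city for _, name, city, _ in orders_flat}
orders = [(oid, name, item) for oid, name, _, item in orders_flat]
```

After the split, updating a customer's city touches one row instead of every order, eliminating update anomalies.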
Database Schema
The structure or blueprint of a database that outlines how data is organized and how the data entities relate to one another.
De-identify
Remove personally identifiable information (PII) from data to protect privacy and comply with regulations.
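A minimal sketch of one masking approach, using regular expressions over a hypothetical record; real de-identification pipelines use more robust detectors than these simple patterns:

```python
import re

# Hypothetical record; masks email addresses and a simple SSN-style pattern.
record = "Contact jane.doe@example.com about SSN 123-45-6789."
masked = re.sub(r"[\w.+-]+@[\w-]+\.[\w.]+", "[EMAIL]", record)
masked = re.sub(r"\b\d{3}-\d{2}-\d{4}\b", "[SSN]", masked)
```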
Deadlock
A condition where two or more database transactions are unable to proceed because each is waiting for the other to release a lock, leading to a cyclic waiting condition.
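The classic prevention technique is to acquire locks in a single global order, so a cyclic wait can never form. A minimal sketch with two hypothetical workers:

```python
import threading

lock_a, lock_b = threading.Lock(), threading.Lock()
results = []

# Both workers acquire the locks in the same global order (lock_a before
# lock_b); with a consistent ordering, a cyclic wait cannot form.
def worker(name):
    with lock_a:
        with lock_b:
            results.append(name)

threads = [threading.Thread(target=worker, args=(n,)) for n in ("t1", "t2")]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Had one worker taken `lock_b` first while the other held `lock_a`, each could block waiting on the other's lock indefinitely.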
Decision Tree
A tree-like model of decisions used to make predictions, especially in machine learning algorithms.
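In practice such trees are learned from data (e.g. with scikit-learn), but the structure itself is just nested conditions. A hand-rolled illustrative example with made-up rules:

```python
# Hypothetical hand-written decision tree predicting whether to bike to
# work: split first on weather, then on distance.
def predict(weather, distance_km):
    if weather == "rain":
        return "no"
    return "yes" if distance_km <= 10 else "no"
```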
Deep Learning
A subset of machine learning that utilizes neural networks with many layers (hence “deep”) to analyze various factors of data and to learn and make intelligent decisions.
Delta Lake
An open-source storage layer that brings reliability to data lakes, ensuring ACID transactions, scalable metadata handling, and unifying streaming and batch data processing.
Denormalize
Optimize data for faster read access by reducing the number of joins needed to retrieve related data.
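A sketch of the idea with hypothetical tables: the customer's city is pre-joined into each order row so reads need no lookup at query time:

```python
# Hypothetical normalized tables; denormalizing copies the customer's city
# into each order row, trading duplicated data for faster reads.
customers = {"alice": "Paris", "bob": "Lyon"}
orders = [(1, "alice", "book"), (2, "bob", "lamp")]
orders_denorm = [(oid, name, customers[name], item) for oid, name, item in orders]
```

The trade-off is the mirror image of normalization: reads get cheaper, but updates must now touch every duplicated copy.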
Dependency Parsing
A Natural Language Processing (NLP) technique to analyze the grammatical structure of a sentence to establish relationships between words.
Deserialize
The reverse of serialization: reconstructing an in-memory object from its stored or transmitted representation. See: 'Serialize'.
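For example, parsing a JSON string back into a Python object is deserialization; the payload below is hypothetical:

```python
import json

# Hypothetical JSON payload deserialized back into a Python dict.
payload = '{"id": 7, "tags": ["etl", "batch"]}'
obj = json.loads(payload)
```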
DevOps
A set of practices that combines software development (Dev) and IT operations (Ops), aiming to shorten the systems development life cycle and provide continuous delivery.
Differential Privacy
A system for publicly sharing information about a dataset by describing the patterns of groups within the dataset while withholding information about individuals in the dataset.
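A minimal sketch of the Laplace mechanism, the standard way to answer a count query with differential privacy: noise scaled to sensitivity/epsilon is added to the true answer. The parameters and count below are illustrative:

```python
import math
import random

# Minimal sketch of the Laplace mechanism: answer a count query with noise
# drawn from Laplace(sensitivity / epsilon). Values here are illustrative.
def private_count(true_count, epsilon, rng):
    sensitivity = 1.0              # one person changes a count by at most 1
    scale = sensitivity / epsilon
    u = rng.random() - 0.5         # inverse-CDF sampling of Laplace noise
    noise = -scale * math.copysign(1.0, u) * math.log(1 - 2 * abs(u))
    return true_count + noise

rng = random.Random(0)             # seeded for reproducibility
noisy = private_count(100, epsilon=1.0, rng=rng)
```

Smaller epsilon means more noise and stronger privacy; the released value is close to, but deliberately not equal to, the true count.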
Dimension Table
A table in a star schema of a data warehouse that stores categorical, descriptive, hierarchical, or textual attributes of data.
Dimensional Modeling
A design technique used in data warehousing to map and visualize data in a way that’s intuitive to business users, typically using facts and dimensions.
Dimensionality
The number of features or attributes in a dataset; high dimensionality can degrade model performance and increase computational cost.
Dimensionality Reduction
The process of reducing the number of random variables under consideration by obtaining a set of principal variables, crucial for dealing with the “curse of dimensionality” in high-dimensional spaces.
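Projection methods like PCA are the usual tools; a simpler selection-based sketch using only the standard library drops low-variance columns from an illustrative dataset:

```python
from statistics import pvariance

# Minimal sketch of dimensionality reduction via feature selection: keep
# only columns whose variance exceeds a threshold. Data are illustrative.
rows = [(1.0, 5.0, 0.0), (2.0, 5.0, 0.0), (3.0, 5.0, 0.1)]
cols = list(zip(*rows))            # transpose rows into columns
keep = [i for i, col in enumerate(cols) if pvariance(col) > 0.01]
reduced = [tuple(row[i] for i in keep) for row in rows]
```

The constant and near-constant columns carry little information, so discarding them shrinks the feature space with minimal loss.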