Dagster Glossary | Data Orchestration Terms Explained

Cache

Store expensive computation results so they can be reused, not recomputed.
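
As a minimal sketch of this idea, Python's standard `functools.lru_cache` decorator can memoize a function so that repeated calls with the same argument reuse the stored result. The function `slow_square` and the counter `call_count` are hypothetical names used only for illustration.

```python
from functools import lru_cache

call_count = 0  # counts how often the function body actually runs


@lru_cache(maxsize=None)
def slow_square(n: int) -> int:
    """Hypothetical stand-in for an expensive computation."""
    global call_count
    call_count += 1
    return n * n


first = slow_square(12)   # computed, then stored in the cache
second = slow_square(12)  # served from the cache; the body does not run again
```

After both calls, `call_count` is still 1 because the second call never reaches the function body.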

Cache Invalidation

The process of replacing or removing cache entries because the underlying data has changed.

Caching

The process of storing copies of data in a cache, or temporary storage location, so that they can be accessed more quickly.

Callback

A piece of executable code that is passed as an argument to other code and is expected to execute at a given time.
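
A short illustrative sketch in Python, where functions are ordinary values that can be passed as arguments. The names `process_items` and `on_done` are hypothetical and chosen only for this example.

```python
from typing import Callable


def process_items(items: list[int], on_done: Callable[[int], None]) -> None:
    """Sum the items, then hand the result to the caller-supplied callback."""
    total = sum(items)
    on_done(total)  # the callback runs once the work is finished


results: list[int] = []
# results.append is passed as the callback; it receives the computed total.
process_items([1, 2, 3], results.append)
```

The caller decides what happens with the result, while `process_items` decides when the callback runs.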

Capacity Planning

The process used to determine how much hardware and software is required to meet future workload demands.

Cap’n Proto

A data interchange format similar to Protobuf that is designed to be faster. Instead of parsing the data and then unpacking it, a program accesses the data directly in the binary form in which it is stored, which reduces processing time.

Cassandra

A highly scalable NoSQL database designed to handle large amounts of data.

Categorical Data

A type of data that can take on one of a limited and usually fixed number of possible values, representing the membership of an object in a group, such as ‘male’ or ‘female’.

Categorize

Organize and classify data into different categories, groups, or segments.

Causal Inference

A process used to make conclusions about one variable’s effect on another, critical in understanding relationships in data and making informed decisions based on those relationships.

Chaining

Linking two or more computing tasks together so that, as soon as one task is finished, the next task immediately begins.

Character Encoding

A scheme that maps each character in a repertoire to a numeric value and a sequence of bytes, e.g., ASCII or UTF-8.
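
To make this concrete, the sketch below encodes the same character under different encodings using Python's built-in `str.encode` and `bytes.decode`. The sample character is an arbitrary choice for illustration.

```python
text = "é"

utf8_bytes = text.encode("utf-8")      # two bytes in UTF-8
latin1_bytes = text.encode("latin-1")  # one byte in Latin-1

# Decoding with the same encoding recovers the original character.
decoded = utf8_bytes.decode("utf-8")

# ASCII has no representation for this character, so encoding fails.
try:
    text.encode("ascii")
    ascii_can_encode = True
except UnicodeEncodeError:
    ascii_can_encode = False
```

The same text yields different bytes under different encodings, which is why the encoding must be known in order to read data back correctly.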

Checkpoint

A snapshot of the state of a system at a specific point in time, usually used to recover from failures.

Checkpointing

Saving the state of a process at certain points so that it can be restarted from that point in case of failure.
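
One possible sketch of this pattern: a loop that writes its progress to a JSON file after every item, so that a rerun resumes where the previous run stopped. The function `process`, the `fail_at` parameter, and the file name are hypothetical, and the crash is simulated.

```python
import json
import tempfile
from pathlib import Path


def process(items, checkpoint: Path, fail_at=None):
    """Sum items, saving progress after each one so a rerun can resume."""
    state = {"next_index": 0, "total": 0}
    if checkpoint.exists():
        # Resume from the last saved state instead of starting over.
        state = json.loads(checkpoint.read_text())
    for i in range(state["next_index"], len(items)):
        if i == fail_at:
            raise RuntimeError("simulated crash")
        state["total"] += items[i]
        state["next_index"] = i + 1
        checkpoint.write_text(json.dumps(state))
    return state["total"]


with tempfile.TemporaryDirectory() as tmp:
    ckpt = Path(tmp) / "checkpoint.json"
    try:
        process([1, 2, 3, 4], ckpt, fail_at=2)  # crashes before the third item
    except RuntimeError:
        pass
    saved = json.loads(ckpt.read_text())  # state recorded before the crash
    total = process([1, 2, 3, 4], ckpt)   # resumes at index 2
```

The second call skips the two items that were already processed and only handles the remainder.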

Circular Dependency

A relation between two or more modules which either directly or indirectly depend on each other to function properly.

Class Variable

A variable that is shared by all instances of a class, belonging to the class rather than any object instance.

Class-Method

A method that is bound to the class and not the instance of the class.
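
The following Python sketch illustrates both a class variable and a class method. The class `Counter` and its members are hypothetical names used only for this example.

```python
class Counter:
    instances = 0  # class variable: one value shared by every instance

    def __init__(self) -> None:
        # Update the shared value on the class, not on the instance.
        Counter.instances += 1

    @classmethod
    def how_many(cls) -> int:
        """Class method: receives the class itself (cls), not an instance."""
        return cls.instances


a = Counter()
b = Counter()
count_via_class = Counter.how_many()  # called on the class, no instance needed
count_via_instance = a.instances      # instances read the same shared value
```

Both lookups return the same number because the value lives on the class rather than on either object.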

Classify

The process of organizing data by relevant categories for efficient use and secure data management.

Clean Code

Code that is easy to understand and easy to change, adhering to good programming principles and practices.

Clean or Cleanse

Remove invalid or inconsistent data values, such as empty fields or outliers.

Cloud Computing

The delivery of various services over the Internet, such as storage, processing, and networking resources.

Cloudera

A provider of software for data engineering, data warehousing, machine learning, and analytics.

Cluster

Group data points based on similarities or patterns to facilitate analysis and modeling.

Cluster Analysis

A group of algorithms used to categorize data into groups, or clusters, where objects in the same group are more similar to each other than to those in other groups.

Coalesce

A SQL function that returns the first non-null value in its list of arguments, or null if every argument is null.
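
A small sketch of the function's behavior, run against an in-memory SQLite database through Python's standard `sqlite3` module. The literal values are arbitrary examples.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# The first two arguments are NULL, so the third one is returned.
first_non_null = conn.execute(
    "SELECT COALESCE(NULL, NULL, 'fallback', 'ignored')"
).fetchone()[0]

# When every argument is NULL, the result is NULL (None in Python).
all_null = conn.execute("SELECT COALESCE(NULL, NULL)").fetchone()[0]

conn.close()
```

A common use is supplying a default for a nullable column, for example `COALESCE(nickname, full_name)`, where both column names are hypothetical.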

Cold storage

A storage strategy for data that is accessed infrequently and is primarily for archival purposes, offering cost-efficiency at the expense of retrieval speed.

Columnar Database

A database optimized for reading and writing columns of data as opposed to rows of data, often used for analytics and reporting.

Combinatorial Explosion

A phenomenon in computer science where the number of possible solutions or combinations in a problem grows exponentially with the size of the problem.

Command-Line Interface (CLI)

A text-based user interface used to interact with software by entering commands into the interface.

Comment

A programming language feature allowing the insertion of human-readable descriptions or annotations in the source code.

Commit

The act of saving changes in a database, version control system, or transactional system, making them permanent.

Common Gateway Interface (CGI)

A standard protocol for web servers to execute programs and generate dynamic content, often used for form processing.

Compact

Reduce the size of data while preserving its essential information.

Compilation

The process of translating a high-level programming language into machine language or bytecode that can be executed by a computer’s CPU.

Compound Key

A key that consists of multiple attributes to uniquely identify an entity in a database.
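
As an illustration, the sketch below declares a two-column primary key in an in-memory SQLite database via Python's standard `sqlite3` module. The table `enrollments` and its columns are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE enrollments (
        student_id INTEGER,
        course_id  INTEGER,
        grade      TEXT,
        PRIMARY KEY (student_id, course_id)  -- the compound key
    )
    """
)
conn.execute("INSERT INTO enrollments VALUES (1, 101, 'A')")
# Same student, different course: the combination is still unique.
conn.execute("INSERT INTO enrollments VALUES (1, 102, 'B')")

try:
    # Same student and same course: the combination already exists.
    conn.execute("INSERT INTO enrollments VALUES (1, 101, 'C')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM enrollments").fetchone()[0]
conn.close()
```

Neither column is unique on its own; only the pair identifies a row.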

Compress

Reduce the size of data to save storage space and improve processing performance.
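
A minimal sketch using Python's standard `zlib` module, which performs lossless compression. The sample payload is an arbitrary, highly repetitive byte string chosen because repetitive data compresses well.

```python
import zlib

original = b"dagster " * 1000  # 8,000 bytes of repetitive data

compressed = zlib.compress(original)
restored = zlib.decompress(compressed)  # lossless: the original bytes return
```

How much space is saved depends on the data; already-compressed or random data may not shrink at all.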

Computed Column

A virtual column in a database table that is based on a calculation or expression using other columns in the table.

Concurrency Control

Techniques to manage simultaneous operations in a database system, ensuring consistency and resolving conflicts.

Concurrent Processing

A computing concept where several tasks are executed during overlapping time periods, enabling more efficient use of computing resources.

Configuration File

A file used to configure the initial settings of software programs, usually written in XML, JSON, or YAML.

Configuration Management

The process of systematically managing, organizing, and controlling the changes in the documents, codes, and other entities during the development process.

Connection Pool

A cache of database connections maintained to be reused by future requests, reducing the overhead of opening and closing connections.

Consensus Algorithm

A process used in computer science to achieve agreement on a single data value among distributed processes or systems.

Consolidate

Combine multiple datasets into one to create a more comprehensive view of the data.

Container

A lightweight, stand-alone, and executable software package that includes everything needed to run a piece of software, including the code, runtime, and system libraries.

Containerization

The practice of packaging a piece of software together with everything it needs to run, including the code, runtime, system tools, and libraries, into a lightweight, stand-alone container.

Continuous Delivery

A software development discipline where software is built in such a way that it can be released to production at any time.

Continuous Deployment (CD)

A software engineering approach in which software functionalities are delivered and deployed continuously and automatically into production, after passing a series of automated tests.

Continuous Integration (CI)

A development practice where developers integrate code into a shared repository frequently, ideally several times a day, to detect errors quickly.

Control Flow

The order in which individual statements, instructions, or function calls are executed within a program.

Convergence

The state where different nodes (or systems) update their internal states to a common value, usually used in the context of iterative algorithms and distributed systems.

Convolutional Neural Network (CNN)

A class of deep learning neural networks, most commonly applied to analyzing visual imagery, used in image recognition and classification tasks.

Cosine Similarity

A measure of similarity between two non-zero vectors, computed as the cosine of the angle between them; widely used in text analysis and natural language processing.
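
Assuming the two entities are represented as numeric vectors of equal length, the measure is the dot product divided by the product of the vector lengths. The helper below is an illustrative sketch in plain Python, not a library API.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Return the cosine of the angle between vectors a and b."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


same_direction = cosine_similarity([1, 2, 3], [2, 4, 6])  # parallel vectors
orthogonal = cosine_similarity([1, 0], [0, 1])            # perpendicular
opposite = cosine_similarity([1, 0], [-1, 0])             # opposite direction
```

Values range from -1 (opposite direction) through 0 (perpendicular) to 1 (same direction), regardless of vector magnitude.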

Covariance

A statistical measure that indicates the extent to which two variables change together.
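
As a worked sketch, the sample covariance averages the products of each variable's deviations from its mean, dividing by n - 1. The function name and the data are illustrative assumptions.

```python
def sample_covariance(xs: list[float], ys: list[float]) -> float:
    """Sample covariance of two equally long sequences (divides by n - 1)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    products = [(x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)]
    return sum(products) / (len(xs) - 1)


rising_together = sample_covariance([1, 2, 3, 4], [2, 4, 6, 8])   # positive
moving_apart = sample_covariance([1, 2, 3, 4], [8, 6, 4, 2])      # negative
```

A positive value means the variables tend to rise together; a negative value means one tends to fall as the other rises.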

Crash Recovery

The process by which an operating system or application restarts operation after a crash, possibly recovering lost data.

Cron Job

A task scheduled with cron on Unix-like operating systems to run at fixed times or intervals, used to automate repetitive tasks.

Cross-Join

A SQL join that returns the Cartesian product of the joined tables, meaning every row of the first table is combined with every row of the second table.
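
The sketch below runs a cross join in an in-memory SQLite database through Python's standard `sqlite3` module. The tables `sizes` and `colors` are hypothetical examples.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sizes (size TEXT)")
conn.execute("CREATE TABLE colors (color TEXT)")
conn.executemany("INSERT INTO sizes VALUES (?)", [("S",), ("M",)])
conn.executemany(
    "INSERT INTO colors VALUES (?)", [("red",), ("blue",), ("green",)]
)

# Every size is paired with every color: 2 rows x 3 rows = 6 rows.
rows = conn.execute(
    "SELECT size, color FROM sizes CROSS JOIN colors"
).fetchall()
conn.close()
```

Because the result size is the product of the input sizes, cross joins on large tables can become very expensive.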

Cross-Validation

A statistical method used to estimate the skill of machine learning models. It repeatedly splits the data into training and evaluation subsets, and is primarily used in applied machine learning to assess a predictive modeling algorithm's performance when there is no separate test dataset available.
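
One common variant is k-fold cross-validation, in which the data is divided into k folds and each fold serves once as the evaluation set. The helper below is an illustrative sketch of the index splitting only, assuming unshuffled data; it is not a library API and does not train any model.

```python
def k_fold_indices(n_samples: int, k: int) -> list[tuple[list[int], list[int]]]:
    """Split range(n_samples) into k (train_indices, test_indices) pairs."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds = []
    start = 0
    for size in fold_sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        folds.append((train_idx, test_idx))
        start += size
    return folds


folds = k_fold_indices(10, 3)
```

Each sample appears in exactly one evaluation fold, so averaging a model's score across the folds uses every sample for evaluation once.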

Cryptography

The practice and study of techniques for securing communication and data from third parties or the public.

Curate

Select, organize, and annotate data to make it more useful for analysis and modeling.
