Dagster Glossary | Data Orchestration Terms Explained

Race Condition

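A flaw in which the outcome of a program depends on the unpredictable timing or ordering of concurrent operations on a shared resource. Handling it means coordinating access to that resource, for example with locks.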
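
As a minimal sketch of coordinating access to a shared resource, the Python example below has four threads increment one counter while holding a lock. The names `counter`, `lock`, and `increment` are illustrative assumptions, not part of any particular library.

```python
import threading

counter = 0
lock = threading.Lock()

def increment(times: int) -> None:
    """Add 1 to the shared counter `times` times, holding the lock for each update."""
    global counter
    for _ in range(times):
        # The read-modify-write below is not atomic; without the lock,
        # two threads could read the same value and one update could be lost.
        with lock:
            counter += 1

threads = [threading.Thread(target=increment, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)  # 40000, because every update happened under the lock
```

Removing the `with lock:` line turns this into a race: the final count may then fall short of 40000, and by a different amount on each run.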

Rack Awareness

A placement strategy in distributed computing that takes each node's physical rack into account. Spreading replicas across racks keeps data available when a component or a whole rack fails, while reading from nearby copies reduces latency and network usage.

Radial Basis Function (RBF)

A function whose value depends only on the distance between the input and some fixed point (the center), typically used in areas such as function approximation, time series prediction, and classification.
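
One common choice is the Gaussian RBF. The sketch below assumes Euclidean distance and a hypothetical shape parameter `epsilon`; other kernels and distance measures are equally valid.

```python
import math

def gaussian_rbf(x, center, epsilon=1.0):
    """Gaussian RBF: exp(-(epsilon * r)^2), where r is the Euclidean distance to center."""
    r = math.dist(x, center)
    return math.exp(-((epsilon * r) ** 2))

print(gaussian_rbf((0.0, 0.0), (0.0, 0.0)))  # 1.0 at the center
# Points at the same distance from the center give the same value.
print(gaussian_rbf((3.0, 4.0), (0.0, 0.0), epsilon=0.1))
print(gaussian_rbf((-5.0, 0.0), (0.0, 0.0), epsilon=0.1))
```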

Random Forest

An ensemble learning method for classification, regression, and other tasks that operates by constructing a multitude of decision trees at training time and outputting the class that is the mode of the classes (classification) or the mean prediction (regression) of the individual trees.

Range Query

A type of query that retrieves data based on a range of values, typically used in the context of numerical or datetime values.
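
For illustration, here is a range query over dates using Python's built-in `sqlite3` module. The `orders` table and its columns are made up for this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, placed_on TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders (placed_on, amount) VALUES (?, ?)",
    [("2024-01-05", 20.0), ("2024-01-15", 35.5), ("2024-02-01", 12.0), ("2024-02-20", 80.0)],
)

# BETWEEN is inclusive on both ends; ISO-8601 date strings sort chronologically.
rows = conn.execute(
    "SELECT id, placed_on FROM orders WHERE placed_on BETWEEN ? AND ? ORDER BY placed_on",
    ("2024-01-10", "2024-02-10"),
).fetchall()
print(rows)  # [(2, '2024-01-15'), (3, '2024-02-01')]
```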

Real-Time Bidding (RTB)

A means by which advertising inventory is bought and sold on a per-impression basis, via programmatic instantaneous auction.

Real-Time Processing

The processing of data as it continuously enters a system, producing results within a timeframe short enough to affect the sources of the incoming data.

Rebalance

Redistributing data across nodes or partitions for optimal performance.

Recommender System

A subclass of information filtering system that seeks to predict the 'rating' or 'preference' a user would give to an item.

Reconcile

The process of ensuring that two or more datasets are consistent with each other, identifying any discrepancies and resolving them.
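
A minimal sketch of the identification step, assuming both datasets are keyed dictionaries. The function `reconcile` and the sample data are hypothetical; resolving the discrepancies is left to whatever business rule applies.

```python
def reconcile(source: dict, target: dict) -> dict:
    """Compare two keyed datasets and report where they disagree."""
    return {
        "missing_in_target": sorted(source.keys() - target.keys()),
        "missing_in_source": sorted(target.keys() - source.keys()),
        "mismatched": sorted(
            k for k in source.keys() & target.keys() if source[k] != target[k]
        ),
    }

ledger = {"A1": 100, "A2": 250, "A3": 75}
warehouse = {"A1": 100, "A2": 200, "A4": 40}
report = reconcile(ledger, warehouse)
print(report)
# {'missing_in_target': ['A3'], 'missing_in_source': ['A4'], 'mismatched': ['A2']}
```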

Record Linkage

The process of finding entries that refer to the same entity in different data sources.

Recurrent Neural Network (RNN)

A class of artificial neural networks designed for sequence prediction problems and other tasks where data points have connections to previous points, such as time series analysis and natural language processing.

Redis

An in-memory data structure store, used as a database, cache, and message broker.

Reduce

Convert a large set of data into a smaller, more manageable form without significant loss of information.

Redundancy

The duplication of critical components or functions of a system with the intention of increasing reliability of the system, usually in the form of a backup or fail-safe.

Referential Integrity

A property of data requiring that every reference is valid, for example that each foreign key value matches an existing row, so that relationships between tables remain consistent.
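
The sketch below shows a database enforcing this property through a foreign key, using Python's built-in `sqlite3` module. The `customers` and `orders` tables are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce foreign keys by default
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    "CREATE TABLE orders ("
    "id INTEGER PRIMARY KEY, "
    "customer_id INTEGER NOT NULL REFERENCES customers(id))"
)
conn.execute("INSERT INTO customers (id, name) VALUES (1, 'Ada')")
conn.execute("INSERT INTO orders (customer_id) VALUES (1)")  # valid reference

try:
    conn.execute("INSERT INTO orders (customer_id) VALUES (99)")  # no such customer
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

print(rejected)  # True
```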

Regression Analysis

A statistical process for estimating the relationships among variables, often used for prediction and forecasting, where one variable is dependent on one or more independent variables.

Regular Expression (Regex)

A sequence of characters defining a search pattern, typically used by string-searching algorithms for 'find' or 'find and replace' operations on strings. Regular expressions are widely used in data cleaning and transformation.
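
Two small data-cleaning uses with Python's built-in `re` module are sketched here, one 'find and replace' and one 'find'. The sample strings are made up.

```python
import re

# Find and replace: strip everything except digits from phone numbers.
raw = ["(555) 123-4567", "555.123.4567", "555 123 4567"]
cleaned = [re.sub(r"\D", "", s) for s in raw]
print(cleaned)  # ['5551234567', '5551234567', '5551234567']

# Find: pull ISO-style dates out of free text.
dates = re.findall(r"\b\d{4}-\d{2}-\d{2}\b", "Shipped 2024-03-01, delivered 2024-03-04.")
print(dates)  # ['2024-03-01', '2024-03-04']
```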

Regularization

A technique used to prevent overfitting in a machine learning model by adding a penalty term to the model's loss function. Common forms are L1 and L2 regularization.
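
As a rough sketch, the penalty is simply added to the data-fit loss. The example assumes a linear model without an intercept and a hypothetical strength parameter `lam`; the helper names are illustrative.

```python
def mse(weights, rows):
    """Mean squared error of a linear model y = w . x (no intercept)."""
    errors = [sum(w * x for w, x in zip(weights, xs)) - y for xs, y in rows]
    return sum(e * e for e in errors) / len(errors)

def l1_penalty(weights, lam):
    """L1: lam times the sum of absolute weights (tends to push weights to zero)."""
    return lam * sum(abs(w) for w in weights)

def l2_penalty(weights, lam):
    """L2: lam times the sum of squared weights (tends to keep weights small)."""
    return lam * sum(w * w for w in weights)

rows = [((1.0, 2.0), 5.0), ((2.0, 1.0), 4.0)]  # (features, target) pairs
weights = [1.0, 2.0]                           # fits both rows exactly

print(mse(weights, rows))                             # 0.0
print(mse(weights, rows) + l1_penalty(weights, 0.5))  # 1.5
print(mse(weights, rows) + l2_penalty(weights, 0.5))  # 2.5
```

Even a perfect fit carries a nonzero regularized loss, which is what pulls training toward smaller weights.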

Reinforcement Learning

A type of machine learning where an agent learns how to behave in an environment by performing certain actions and receiving rewards or penalties in return.

Relational Algebra

A formal system of operators on relations, such as selection, projection, and join, that forms the theoretical foundation for implementing and optimizing queries in relational database management systems.

Relational Database

A type of database that stores data in structured tables and is based on the relational model.

Relational Model

A database model based on first-order predicate logic, serving as the basis for relational databases, where all data is represented in terms of tuples, grouped into relations.

Repartition

Redistribute data across multiple partitions for improved parallelism and performance.

Replica Set

A group of database nodes that maintains the same data set, providing redundancy and increasing data availability with multiple copies of data on different database servers.

Replicate

Create a copy of data for redundancy or distributed processing.

Representation Learning

An area of machine learning where automatic feature learning from raw data is explored, aimed at identifying better representations and improving model generalization.

Request-Response

A message exchange pattern in which a requester sends a request message to a replier system, which then sends a response message in return.

Reshape

Change the structure of data to better fit specific analysis or modeling requirements.
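
One common reshape is wide to long, sketched here in plain Python. The helper `melt` and the sample rows are hypothetical; data-frame libraries offer similar operations.

```python
wide = [
    {"city": "Oslo", "2023": 10, "2024": 12},
    {"city": "Lima", "2023": 7, "2024": 9},
]

def melt(rows, id_key):
    """Turn one row per entity into one row per (entity, variable) pair."""
    return [
        {id_key: row[id_key], "variable": key, "value": value}
        for row in rows
        for key, value in row.items()
        if key != id_key
    ]

long_rows = melt(wide, "city")
print(long_rows[0])    # {'city': 'Oslo', 'variable': '2023', 'value': 10}
print(len(long_rows))  # 4
```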

Resilient Distributed Dataset (RDD)

A fault-tolerant collection of elements that can be processed in parallel; the fundamental data structure of Apache Spark.

Response Variable

The variable that is being predicted or modeled, often denoted as the dependent variable or output variable.

Ridge Regression

A regularization technique for analyzing multiple regression data that suffer from multicollinearity. It adds an L2 penalty to the loss, shrinking the coefficients of the model towards zero to stabilize them.

Risk Analysis

The process of identifying and analyzing potential issues that could negatively impact key business initiatives or projects.

Rollback

An operation in which the database management system undoes the changes made by a failed or partially completed transaction, restoring the database to its previous consistent state.
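
A small sketch with Python's built-in `sqlite3` module: a transfer fails halfway and the change that already succeeded is undone. The `accounts` table and the amounts are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()

try:
    # Credit bob first, then debit alice; the debit violates the CHECK constraint.
    conn.execute("UPDATE accounts SET balance = balance + 150 WHERE name = 'bob'")
    conn.execute("UPDATE accounts SET balance = balance - 150 WHERE name = 'alice'")
    conn.commit()
except sqlite3.IntegrityError:
    conn.rollback()  # undo the credit that already succeeded

balances = dict(conn.execute("SELECT name, balance FROM accounts ORDER BY name"))
print(balances)  # {'alice': 100, 'bob': 50}
```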

Root Mean Square Error (RMSE)

A standard way to measure the error of a model in predicting quantitative data: the square root of the average of the squared differences between the predicted and the observed values.
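
The definition translates directly into a few lines of Python. The function name `rmse` and the sample values are illustrative.

```python
import math

def rmse(predicted, observed):
    """Square root of the mean of squared (predicted - observed) differences."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(squared) / len(squared))

# Every prediction is off by exactly 3, so the RMSE is 3.
print(rmse([4, 2, 7, 1], [1, 5, 4, 4]))  # 3.0
```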

Routing

The process of selecting a path for traffic in a network or between or across multiple networks, based on routing table information.

Row-Level Security (RLS)

A method of restricting access at the database row level, based on parameters such as user roles or identity, enabling fine-grained access control.

Ruby on Rails

A server-side web application framework written in Ruby. It follows the model-view-controller (MVC) pattern and provides default structures for a database, a web service, and web pages.

SQL (Structured Query Language)

A standardized programming language used for managing and querying relational databases.

SQL Injection

A code injection technique, used to attack data-driven applications, in which malicious SQL statements are inserted into an entry field for execution.
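
The sketch below contrasts a vulnerable query with a parameterized one, using Python's built-in `sqlite3` module. The `users` table and the malicious input are made up for the demonstration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, is_admin INTEGER)")
conn.executemany("INSERT INTO users VALUES (?, ?)", [("alice", 0), ("root", 1)])

malicious = "nobody' OR '1'='1"

# Vulnerable: the input is pasted into the SQL text, so its quote closes the
# string literal and the OR clause matches every row.
unsafe = conn.execute(f"SELECT name FROM users WHERE name = '{malicious}'").fetchall()

# Safer: a bound parameter is treated purely as a value, never as SQL.
safe = conn.execute("SELECT name FROM users WHERE name = ?", (malicious,)).fetchall()

print(len(unsafe), len(safe))  # 2 0
```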

SQLite

A C library that provides a lightweight, disk-based database.

Sample

Extract a subset of data for exploratory analysis or to reduce computational complexity.

Sampling

The process of selecting a subset of elements from a larger set to approximate the properties of the whole set, often used for statistical analysis.
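
As an illustration, a simple random sample can approximate the mean of a larger population. The synthetic population and the seed are assumptions made for this sketch.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the run is repeatable
population = [rng.gauss(50, 10) for _ in range(10_000)]

# Simple random sampling without replacement.
sample = rng.sample(population, 500)

print(round(statistics.mean(population), 2))
print(round(statistics.mean(sample), 2))  # close to the population mean
```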

Sandboxing

A security mechanism used to run an application in a confined environment, isolating it from the system, preventing it from causing harm or accessing sensitive data.

Scalability

The capability of a system, network, or process to handle a growing amount of work, or its potential to be enlarged to accommodate that growth.

Scalar

A quantity represented by a single element in the corresponding field, usually a single number, as opposed to a vector or matrix.

Scaling

Increasing the capacity or performance of a system to handle more data or traffic.

Schema

The organization or structure for a database, defining tables, fields, relationships, indexes, etc.

Schema Evolution

The ability of a database system to handle changes in a database schema, especially relevant for systems that require flexibility and adaptability to changing data requirements.

Schema Inference

Automatically identify the structure of a dataset.
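
A deliberately simple sketch of the idea: try progressively wider types on each CSV column. The helper `infer_type` and the sample data are hypothetical, and real tools handle many more cases (dates, booleans, nested data).

```python
import csv
import io

def infer_type(values):
    """Return the narrowest of int, float, str that fits every non-empty value."""
    for caster, name in ((int, "int"), (float, "float")):
        try:
            for v in values:
                if v != "":  # treat empty cells as missing, not as evidence
                    caster(v)
            return name
        except ValueError:
            continue
    return "str"

data = "id,price,city\n1,9.99,Oslo\n2,12,Lima\n3,,Rome\n"
rows = list(csv.DictReader(io.StringIO(data)))
schema = {col: infer_type([r[col] for r in rows]) for col in rows[0]}
print(schema)  # {'id': 'int', 'price': 'float', 'city': 'str'}
```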

Schema Mapping

Translate data from one schema or structure to another to facilitate data integration.

Schema-on-Read

A strategy where data structure is inferred at read time, typically used in big data processing where the structure is not defined in advance and the data is instead interpreted when it is analyzed.

Schema-on-Write

A strategy where data structure is defined before writing data, typically used in relational databases where data must conform to a known schema before it's written to disk.

SciPy

An open-source Python library used for scientific and technical computing.

Scikit-learn

A free software machine learning library for the Python programming language. It features classification, regression, and clustering algorithms, along with efficient tools for data mining and data analysis.

Scrape

Extract data from a website or another source.

Scraping

The process of extracting data from websites, converting it from unstructured to structured form.

Scrub

A process of amending or removing data in a database that is incorrect, incomplete, improperly formatted, or duplicated, also known as data cleansing.

Search Engine

A software application designed to search for information in a database, with requested information returned to the user as search results.

Search Engine Optimization (SEO)

The practice of optimizing content to be discovered through a search engine’s organic search results, affecting the visibility of a website or a web page.
