Dagster Glossary | Data Orchestration Terms Explained

Secondary Index

An additional index on columns other than the primary key, used to improve the efficiency of data retrieval in a database or storage system.
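
As a minimal sketch using Python's built-in `sqlite3` module, the code below adds a secondary index on the non-key `email` column of a hypothetical `users` table and asks SQLite, via `EXPLAIN QUERY PLAN`, whether an email lookup uses it. The table, column, and index names are illustrative assumptions.

```python
import sqlite3

def build_users_db() -> sqlite3.Connection:
    """Create an in-memory table keyed by `id`, plus a secondary index on `email`."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, name TEXT)")
    conn.executemany(
        "INSERT INTO users (email, name) VALUES (?, ?)",
        [("ada@example.com", "Ada"), ("grace@example.com", "Grace")],
    )
    # The secondary index lets lookups by email avoid a full table scan.
    conn.execute("CREATE INDEX idx_users_email ON users (email)")
    return conn

def query_plan(conn: sqlite3.Connection, email: str) -> str:
    """Return SQLite's plan description for a lookup by email."""
    rows = conn.execute(
        "EXPLAIN QUERY PLAN SELECT name FROM users WHERE email = ?", (email,)
    ).fetchall()
    # The last column of each EXPLAIN QUERY PLAN row is a human-readable detail string.
    return " ".join(row[-1] for row in rows)

conn = build_users_db()
print(query_plan(conn, "ada@example.com"))
```

Each secondary index speeds up reads on its column at the cost of extra storage and slower writes, since every insert or update must also maintain the index.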

Secure

Protect data from unauthorized access, modification, or destruction.

Segmentation

The process of dividing a data set into distinct and meaningful groups, usually to perform more specific analysis, or to target specific subsets of users.

Semantic Analysis

The process of analyzing the meanings of words, texts, and sentences, typically used in NLP to understand the context and intent behind the words.

Semi-Supervised Learning

A class of machine learning tasks and techniques that use both labeled and unlabeled data for training, typically a small amount of labeled data combined with a large amount of unlabeled data.

Sentiment Analysis

Analyze text data to identify and categorize the emotional tone or sentiment expressed.

Sequential Pattern Mining

A method of discovering frequent subsequences or patterns in a sequence of items or events, usually in datasets of customer transactions or other sequence data.
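
A minimal sketch of the core step, support counting, restricted to ordered pairs of items: a pair (a, b) is supported by a sequence if a occurs somewhere before b, with gaps allowed, and support is the number of sequences that contain the pair. The `frequent_pairs` helper and the session data are hypothetical; algorithms such as GSP and PrefixSpan extend this idea to longer patterns.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, List, Sequence, Tuple

def frequent_pairs(
    sequences: List[Sequence[str]], min_support: int
) -> Dict[Tuple[str, str], int]:
    """Return ordered pairs (a, b), a before b with gaps allowed, found in >= min_support sequences."""
    support: Counter = Counter()
    for seq in sequences:
        # combinations() preserves positional order, so (a, b) always means "a before b".
        # A set counts each pattern at most once per sequence.
        support.update(set(combinations(seq, 2)))
    return {pair: count for pair, count in support.items() if count >= min_support}

sessions = [["home", "search", "checkout"], ["home", "checkout"], ["search", "home"]]
print(frequent_pairs(sessions, min_support=2))  # {('home', 'checkout'): 2}
```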

Serialize

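Convert in-memory data structures or objects into a linear sequence of bytes or characters (for example JSON or a binary encoding) for efficient storage, transmission, and processing; deserialization reverses the process.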
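
A minimal sketch of serialization with Python's standard `json` module: a hypothetical `Order` record is converted to UTF-8 bytes and back. JSON is only one choice; binary formats such as Avro or Protocol Buffers trade human readability for compactness and speed.

```python
import json
from dataclasses import asdict, dataclass

@dataclass
class Order:
    order_id: int
    customer: str
    total: float

def serialize(order: Order) -> bytes:
    """Turn an Order into UTF-8 encoded JSON bytes."""
    # sort_keys=True makes the output deterministic, which helps when payloads are hashed or compared.
    return json.dumps(asdict(order), sort_keys=True).encode("utf-8")

def deserialize(payload: bytes) -> Order:
    """Rebuild an Order from bytes produced by serialize()."""
    return Order(**json.loads(payload.decode("utf-8")))

order = Order(order_id=1, customer="acme", total=9.5)
print(serialize(order))  # b'{"customer": "acme", "order_id": 1, "total": 9.5}'
```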

Serverless Computing

A cloud-computing execution model where the cloud provider runs the server and dynamically manages the allocation of machine resources, allowing developers to focus on individual functions.

Service-Oriented Architecture (SOA)

An architectural pattern in software design in which application components provide services to other components through a communication protocol over a network.

Shard

A horizontal partition of a database; sharding splits a database into smaller, more manageable pieces, often distributed across separate servers.
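
Hash-based routing is one common way to decide which shard a record belongs to (range-based sharding is another). The sketch below assumes string keys and a fixed shard count; `shard_for` and `partition` are hypothetical helpers, not part of any particular database.

```python
import hashlib
from collections import defaultdict
from typing import Dict, Iterable, List

def shard_for(key: str, num_shards: int) -> int:
    """Map a key to a shard number in [0, num_shards) using a stable hash."""
    # hashlib gives the same digest in every process, unlike Python's salted built-in hash() for str.
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards

def partition(keys: Iterable[str], num_shards: int) -> Dict[int, List[str]]:
    """Group keys by the shard they route to."""
    shards: Dict[int, List[str]] = defaultdict(list)
    for key in keys:
        shards[shard_for(key, num_shards)].append(key)
    return dict(shards)
```

With plain modulo routing, changing `num_shards` reassigns most keys, which is why systems that add shards frequently often use consistent hashing instead.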

Shred

Break down large datasets into smaller, more manageable pieces for easier processing and analysis.

Shuffle

Randomize the order of data records to improve analysis and prevent bias.
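
A minimal sketch using Python's standard `random` module: the helper returns a shuffled copy, and passing a seed makes the order reproducible, which is useful when a shuffled dataset must be regenerated exactly. The `shuffled` helper is hypothetical.

```python
import random
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")

def shuffled(records: Sequence[T], seed: Optional[int] = None) -> List[T]:
    """Return a shuffled copy of records, leaving the input untouched."""
    rng = random.Random(seed)  # private generator, so the global random state is unaffected
    out = list(records)
    rng.shuffle(out)  # in-place Fisher-Yates shuffle of the copy
    return out
```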

Similarity Measure

A numeric measure of how alike two data objects are, often used in clustering, classification, or nearest neighbor analysis.
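
Cosine similarity is one widely used similarity measure for numeric vectors; the sketch below is a plain-Python version. Euclidean distance and Jaccard similarity are common alternatives, chosen depending on the data.

```python
import math
from typing import Sequence

def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)
```

It returns 1 for vectors pointing in the same direction, 0 for orthogonal vectors, and -1 for opposite ones, regardless of vector length.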

Single Source of Truth (SSOT)

A practice of structuring information models and associated schema such that every data element is mastered in only one place.

Site Reliability Engineering (SRE)

A discipline that applies aspects of software engineering to infrastructure and operations problems, with the aim of creating scalable and highly reliable software systems.

Skew

An imbalance in the distribution or representation of data.

Skewness

A measure of the asymmetry of the probability distribution of a real-valued random variable about its mean, indicating whether the data points are skewed to the left or right.
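
One common definition is the Fisher-Pearson moment coefficient, g1 = m3 / m2^(3/2), where m2 and m3 are the second and third central moments. A plain-Python sketch, without the small-sample bias correction that some libraries apply:

```python
from typing import Sequence

def skewness(values: Sequence[float]) -> float:
    """Fisher-Pearson moment coefficient of skewness, g1 = m3 / m2**1.5 (uncorrected)."""
    n = len(values)
    if n == 0:
        raise ValueError("need at least one value")
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n  # second central moment (variance)
    m3 = sum((x - mean) ** 3 for x in values) / n  # third central moment
    if m2 == 0:
        raise ValueError("skewness is undefined when all values are equal")
    return m3 / m2 ** 1.5
```

Positive values indicate a longer right tail and negative values a longer left tail; a perfectly symmetric sample gives 0.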

Sliding Window

A technique used in analyzing or processing sequences of data, where a window of specified size moves across the data, and for each position of the window, a computation is performed.
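
A minimal sketch of a sliding-window computation: a moving average over a fixed-size window. Keeping a running total means each step adds the incoming value and subtracts the outgoing one, instead of re-summing the whole window. The `moving_average` helper is hypothetical.

```python
from collections import deque
from typing import Deque, Iterable, Iterator

def moving_average(values: Iterable[float], size: int) -> Iterator[float]:
    """Yield the mean of each full window of `size` consecutive values."""
    if size <= 0:
        raise ValueError("size must be positive")
    window: Deque[float] = deque()
    total = 0.0
    for v in values:
        window.append(v)
        total += v
        if len(window) > size:
            total -= window.popleft()  # drop the value that slid out of the window
        if len(window) == size:
            yield total / size

print(list(moving_average([1, 2, 3, 4, 5], size=3)))  # [2.0, 3.0, 4.0]
```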

Snappy

A fast data compression and decompression library developed by Google, designed to favor high compression and decompression speed over maximum compression ratio. It is often used to compress data stored in Hadoop environments and in similar applications.

Snapshot

A point-in-time copy of data that can be used as a backup for recovery purposes.

Snapshot Isolation

A guarantee provided by some database systems that all reads made in a transaction will see a consistent snapshot of the database, and the transaction itself will successfully commit only if no updates it has made conflict with any concurrent updates made since that snapshot.

Snowflake

A cloud-based data warehouse service designed for high-performance analytics.

Social Graph

A graph that depicts personal relations of internet users, representing the interconnection of relationships in an online social network.

Soft Delete

A data removal strategy where records are marked as deleted but are not physically removed from the database, enabling potential recovery.
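
A minimal sketch using Python's built-in `sqlite3` module, assuming a hypothetical `customers` table with a nullable `deleted_at` timestamp column (a common convention, but only one of several; a boolean `is_deleted` flag is another). Deleting sets the timestamp, restoring clears it, and reads filter on it.

```python
import sqlite3
from datetime import datetime, timezone
from typing import List

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, deleted_at TEXT)")
conn.executemany("INSERT INTO customers (name) VALUES (?)", [("Ada",), ("Grace",)])

def soft_delete(customer_id: int) -> None:
    # Mark the row as deleted instead of issuing a DELETE.
    conn.execute(
        "UPDATE customers SET deleted_at = ? WHERE id = ?",
        (datetime.now(timezone.utc).isoformat(), customer_id),
    )

def restore(customer_id: int) -> None:
    # Recovery is just clearing the marker.
    conn.execute("UPDATE customers SET deleted_at = NULL WHERE id = ?", (customer_id,))

def active_names() -> List[str]:
    # Every read path must filter out soft-deleted rows.
    rows = conn.execute(
        "SELECT name FROM customers WHERE deleted_at IS NULL ORDER BY id"
    ).fetchall()
    return [name for (name,) in rows]
```

The trade-off is that every query, index, and uniqueness constraint must account for soft-deleted rows, and the data still occupies storage until it is purged.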

Software as a Service (SaaS)

A cloud computing service model that provides access to software and its functions remotely as a web-based service, allowing users to access software applications over the internet.

Software-defined Asset

A declarative design pattern that represents a data asset through code.

Sorting Algorithm

An algorithm that puts elements of a list in a certain order, often numerical or lexicographical.
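
Merge sort is one classic sorting algorithm: it splits the list in half, sorts each half recursively, and merges the two sorted halves, for O(n log n) comparisons in all cases. The sketch below is for illustration; in everyday Python code the built-in `sorted()` is the usual choice.

```python
from typing import List, Sequence

def merge_sort(items: Sequence[int]) -> List[int]:
    """Return a new list sorted in ascending order using top-down merge sort."""
    if len(items) <= 1:
        return list(items)
    mid = len(items) // 2
    left = merge_sort(items[:mid])
    right = merge_sort(items[mid:])
    merged: List[int] = []
    i = j = 0
    while i < len(left) and j < len(right):
        # "<=" takes from the left half on ties, which keeps the sort stable.
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    # At most one of these still has elements, already in order.
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged

print(merge_sort([5, 2, 9, 1, 5, 6]))  # [1, 2, 5, 5, 6, 9]
```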

Sparse Matrix

A matrix in which most values are zero, represented and stored efficiently in memory by storing only the non-zero elements.
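
A minimal sketch of one simple storage scheme, a dictionary of keys, where only non-zero entries are kept in a dict keyed by (row, column). The `SparseMatrix` class is hypothetical; production code typically uses a library such as SciPy's `scipy.sparse`, which also provides compressed formats like CSR.

```python
from typing import Dict, List, Sequence, Tuple

class SparseMatrix:
    """Dictionary-of-keys sparse matrix: only non-zero entries are stored."""

    def __init__(self, rows: int, cols: int) -> None:
        self.shape = (rows, cols)
        self._data: Dict[Tuple[int, int], float] = {}

    def __setitem__(self, key: Tuple[int, int], value: float) -> None:
        r, c = key
        if not (0 <= r < self.shape[0] and 0 <= c < self.shape[1]):
            raise IndexError(key)
        if value == 0:
            self._data.pop(key, None)  # zeros stay implicit
        else:
            self._data[key] = value

    def __getitem__(self, key: Tuple[int, int]) -> float:
        return self._data.get(key, 0.0)  # anything not stored is zero

    def nnz(self) -> int:
        """Number of explicitly stored (non-zero) entries."""
        return len(self._data)

    def matvec(self, x: Sequence[float]) -> List[float]:
        """Multiply by a dense vector, touching only the stored entries."""
        result = [0.0] * self.shape[0]
        for (r, c), v in self._data.items():
            result[r] += v * x[c]
        return result
```

A 1000 x 1000 matrix with two non-zero entries stores two dict items here instead of a million values.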

Spatial Database

A database optimized to store and query data representing objects defined in a geometric space, often used for storing and analyzing geographical or spatial information.

Spatial Index

A data structure that allows for accessing a spatial object efficiently, essential in spatial databases and geodatabases.

Spatial Indexing

A data structure that allows for accessing a spatial object in a database in a more efficient manner, crucial in GIS systems, spatial databases, and spatial data processing.

Speculative Execution

An optimization technique where a computer system performs some tasks before it knows whether these tasks will be needed, to reduce latency and improve throughput.

Spill

Temporarily transfer data that exceeds available memory to disk.
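
Spilling is easiest to see in an external sort. The sketch below assumes integer input and treats `max_in_memory` as a stand-in for a real memory budget (both hypothetical): whenever the buffer fills, it is sorted and written to a temporary file, and the sorted runs are merged back from disk at the end.

```python
import heapq
import tempfile
from typing import IO, Iterable, Iterator, List

def external_sort(values: Iterable[int], max_in_memory: int) -> Iterator[int]:
    """Sort integers while buffering at most `max_in_memory` of them in RAM."""
    if max_in_memory <= 0:
        raise ValueError("max_in_memory must be positive")
    runs: List[IO[str]] = []
    buffer: List[int] = []

    def spill() -> None:
        # Sort the full buffer and write it to disk as one sorted "run".
        buffer.sort()
        run = tempfile.TemporaryFile(mode="w+")
        run.writelines(f"{v}\n" for v in buffer)
        run.seek(0)  # rewind so the run can be read back during the merge
        runs.append(run)
        buffer.clear()

    for v in values:
        buffer.append(v)
        if len(buffer) >= max_in_memory:
            spill()
    if buffer:
        spill()

    readers = [(int(line) for line in run) for run in runs]
    try:
        # heapq.merge holds roughly one pending value per run, not whole runs.
        yield from heapq.merge(*readers)
    finally:
        for run in runs:
            run.close()

print(list(external_sort([5, 3, 9, 1, 7, 2, 8], max_in_memory=3)))  # [1, 2, 3, 5, 7, 8, 9]
```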

Split

Divide a dataset into training, validation, and testing sets for machine learning model training.
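
A minimal sketch of a random train/validation/test split, assuming a 70/15/15 ratio and a fixed seed for reproducibility (both illustrative defaults). The `train_val_test_split` helper is hypothetical; scikit-learn's `train_test_split` covers the two-way case.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")

def train_val_test_split(
    records: Sequence[T],
    train_frac: float = 0.7,
    val_frac: float = 0.15,
    seed: int = 0,
) -> Tuple[List[T], List[T], List[T]]:
    """Shuffle, then cut into train/validation/test; the test set gets the remainder."""
    if train_frac < 0 or val_frac < 0 or train_frac + val_frac > 1:
        raise ValueError("fractions must be non-negative and sum to at most 1")
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # shuffle first so each split is a random sample
    n_train = round(len(shuffled) * train_frac)
    n_val = round(len(shuffled) * val_frac)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test
```

For time-ordered data, a random split can leak future information into training; splitting by time is usually safer there.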

Stack

A last-in, first-out (LIFO) data structure that stores a collection of elements and supports two principal operations: push, which adds an element to the collection, and pop, which removes the most recently added element.
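
A minimal sketch of a stack backed by a Python list, whose `append` and `pop` operate on the end of the list in amortized constant time. The `Stack` class is hypothetical; plain lists or `collections.deque` are typically used directly.

```python
from typing import Generic, List, TypeVar

T = TypeVar("T")

class Stack(Generic[T]):
    """Last-in, first-out collection backed by a Python list."""

    def __init__(self) -> None:
        self._items: List[T] = []

    def push(self, item: T) -> None:
        self._items.append(item)  # add to the top (end of the list)

    def pop(self) -> T:
        if not self._items:
            raise IndexError("pop from empty stack")
        return self._items.pop()  # remove and return the most recently pushed item

    def __len__(self) -> int:
        return len(self._items)
```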

Standardize

Transform data to a common unit or format to facilitate comparison and analysis.
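
Standardization often means converting inconsistent representations into one canonical form. A minimal sketch, assuming upstream sources deliver dates in three hypothetical formats (with slash-separated dates read as day/month/year) and that ISO 8601 is the target:

```python
from datetime import datetime

# Hypothetical input formats seen upstream; order matters if formats could overlap.
INPUT_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%Y%m%d")

def standardize_date(raw: str) -> str:
    """Parse a date in any known input format and return it as ISO 8601 (YYYY-MM-DD)."""
    for fmt in INPUT_FORMATS:
        try:
            return datetime.strptime(raw.strip(), fmt).date().isoformat()
        except ValueError:
            continue  # not this format; try the next one
    raise ValueError(f"unrecognized date format: {raw!r}")
```

Raising on unrecognized input, rather than guessing, keeps malformed values from silently entering downstream tables.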

Star Schema

The simplest style of data warehouse schema that organizes data in a single fact table linked to one or more dimension tables, enabling easy and efficient data retrieval.

Stateful Application

An application that saves client data from the activities of one session for use in the next session.

Stateless Application

An application that does not save client data generated in one session for use in the next session with that client.

Stateless Protocol

A communications protocol that treats each request as an independent transaction, without requiring the server to retain session information or status about each communicating partner for the duration of multiple requests.

Stemming

The process of reducing a word to its stem by stripping affixes such as suffixes and prefixes, so that related forms (for example "connected" and "connecting") map to the same root.
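
A deliberately naive sketch of suffix-stripping stemming. The suffix list and minimum stem length are illustrative assumptions; real stemmers such as the Porter stemmer (available in NLTK as `nltk.stem.PorterStemmer`) apply carefully ordered rule sets.

```python
# Toy suffix list for illustration only; "es" is checked before "s" so "boxes" becomes "box".
SUFFIXES = ("ing", "ed", "es", "s")
MIN_STEM_LENGTH = 3

def naive_stem(word: str) -> str:
    """Strip the first matching suffix if at least MIN_STEM_LENGTH letters remain."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= MIN_STEM_LENGTH:
            return word[: -len(suffix)]
    return word
```

Stems need not be dictionary words: this sketch maps "horses" to "hors", which is acceptable as long as related forms collapse to the same stem.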

Stored Procedure

A set of SQL statements and procedural logic stored (and often precompiled) in the database, so that routine operations and complex data manipulations can be invoked by name.

Strategic Information Systems

Information systems that are developed in response to corporate business initiatives to give competitive advantage to organizations.

Stream Processing

The real-time processing of data continuously, concurrently, and record by record, often used in applications that require real-time response and analytics.

Streaming Data

Data that is generated continuously, often by many (sometimes thousands of) data sources that send records simultaneously and in small sizes.

Structured Data

Data that is organized and formatted in a way that is easily searchable, often residing in relational databases and including data types such as numbers, dates, and strings.

Structured Query Language (SQL)

A standard programming language specifically for managing and querying data in relational databases.

Subquery

A SQL query nested inside a larger query, used to retrieve data that will be used in the main query as a condition to further restrict the data to be retrieved.
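
A minimal sketch using Python's built-in `sqlite3` module and hypothetical `customers` and `orders` tables. The innermost subquery computes the average order amount, the next one finds customers with an order above that average, and the outer query returns their names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
    INSERT INTO customers (id, name) VALUES (1, 'Ada'), (2, 'Grace'), (3, 'Linus');
    INSERT INTO orders (customer_id, amount) VALUES (1, 120.0), (2, 30.0), (3, 60.0), (1, 10.0);
    """
)

query = """
    SELECT name FROM customers
    WHERE id IN (
        SELECT customer_id FROM orders
        WHERE amount > (SELECT AVG(amount) FROM orders)  -- average here is 55.0
    )
    ORDER BY name
"""
print([name for (name,) in conn.execute(query)])  # ['Ada', 'Linus']
```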

Support Vector Machine (SVM)

A supervised machine learning algorithm, used for classification or regression analysis, that separates data into classes by finding the hyperplane that maximizes the margin between the classes.

Surrogate Key

A unique identifier for a record in a database table that serves as a substitute for natural primary keys and is typically auto-generated.

Swarm Intelligence

The collective behavior of decentralized, self-organized systems, typically inspired by nature, like ant colonies, bird flocking, and fish schooling, used in artificial intelligence for problem-solving and optimization.

Synchronization

The coordination of events to operate a system in unison, ensuring that multiple threads or processes do not interfere with each other.
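
A minimal sketch with Python's `threading` module: several threads increment a shared counter, and a lock ensures that each read-modify-write happens as a unit so no increments are lost. The `SharedCounter` class and `run` helper are hypothetical.

```python
import threading

class SharedCounter:
    """Counter whose increments are serialized by a lock."""

    def __init__(self) -> None:
        self.value = 0
        self._lock = threading.Lock()

    def increment(self) -> None:
        # Without the lock, two threads could read the same value and one update would be lost.
        with self._lock:
            self.value += 1

def run(num_threads: int = 4, increments: int = 10_000) -> int:
    """Start the workers, wait for all of them, and return the final count."""
    counter = SharedCounter()

    def worker() -> None:
        for _ in range(increments):
            counter.increment()

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # wait for every worker before reading the result
    return counter.value

print(run())  # 40000
```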

Synchronize

Ensure that data in different systems or databases are in sync and up-to-date.

Syntactic Sugar

Syntax within a programming language that is designed to make things easier to read or to express.

Syntax Analysis

The analysis of the symbols or statements in a computer program to ensure their correct arrangement, often used in compilers to check the syntax of the programming code.

Synthetic Data

Data that's artificially created, rather than being generated by actual events, often used for testing and training machine learning models when real data is scarce or sensitive.

Systematic Sampling

A statistical method that selects elements from an ordered sampling frame at a fixed interval, taking every k-th item in the frame (where k is a constant).
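
A minimal sketch: pick a random start within the first k items, then take every k-th item after it. The `systematic_sample` helper is hypothetical, and the seed is only there to make the example reproducible.

```python
import random
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")

def systematic_sample(frame: Sequence[T], k: int, seed: Optional[int] = None) -> List[T]:
    """Select every k-th item, starting from a random offset in [0, k)."""
    if k <= 0:
        raise ValueError("k must be positive")
    start = random.Random(seed).randrange(k)  # random start so the first item is not always chosen
    return list(frame[start::k])
```

If the frame has a periodic pattern that lines up with k, the sample can be biased, so the ordering of the frame matters.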

Systems Development Life Cycle (SDLC)

The process of creating or altering systems, and the models and methodologies that development teams use to develop systems.

t-distributed Stochastic Neighbor Embedding (t-SNE)

A machine learning algorithm for dimensionality reduction, particularly well suited for the visualization of high-dimensional datasets.
