Dagster Data Engineering Glossary:
Data Engineering Terms Explained
Terms and Definitions You Need to Know as a Data Engineer
T-distribution
A type of probability distribution that is symmetrical and bell-shaped, like the normal distribution, but has heavier tails.
Tableau
A data visualization tool used to convert raw data into an understandable, visual format such as interactive charts and dashboards.
Tagging
The practice of labeling data with tags that categorize or annotate it, often used in organizing content or in natural language processing to identify parts of speech.
Talend
A software integration vendor that provides data integration, data management, enterprise application integration, and big data software and services.
Temporal Database
A database that is optimized to manage data relating to time instances, maintaining information about the times at which certain data is valid.
Tensor
A mathematical object that generalizes scalars, vectors, and matrices to higher dimensions, represented as a multi-dimensional array and used in machine learning and deep learning models, particularly in neural networks.
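For illustration, a minimal sketch assuming NumPy is installed, building arrays of increasing rank:

```python
import numpy as np

scalar = np.array(5.0)                      # rank 0: a single value
vector = np.array([1.0, 2.0, 3.0])          # rank 1: a 1-D array
matrix = np.array([[1.0, 2.0],
                   [3.0, 4.0]])             # rank 2: a 2-D array
tensor = np.zeros((2, 3, 4))                # rank 3: a 3-D array

print(scalar.ndim, vector.ndim, matrix.ndim, tensor.ndim)  # 0 1 2 3
```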
TensorFlow
An open-source software library for dataflow and differentiable programming across a range of tasks, developed by the Google Brain team.
Terabyte (TB)
A unit of information or computer storage equal to approximately one trillion bytes: 10^12 bytes in the decimal convention, or 1,024 gigabytes (2^40 bytes) in the binary convention.
Teradata
A data warehousing company whose products include a powerful, scalable, and reliable data warehousing platform.
Text Mining
The process of deriving meaningful information from natural language text. It involves preprocessing (cleaning and transforming) the text data and applying natural language processing (NLP) techniques.
Thread
A unit of execution that enables concurrency by running tasks that are not sequentially dependent; in Python, threads are created and managed with the standard threading module.
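For illustration, a minimal sketch using Python's standard threading module; the fetch function and source names are hypothetical placeholders for I/O-bound work:

```python
import threading

def fetch(source):
    # Placeholder for I/O-bound work such as an API call or file read.
    print(f"fetching {source}")

# Run two independent tasks concurrently on separate threads.
threads = [threading.Thread(target=fetch, args=(s,)) for s in ("orders", "users")]
for t in threads:
    t.start()
for t in threads:
    t.join()  # wait for both tasks to finish
```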
Throughput
The amount of data transferred or processed in a specified time period, often used as a measure of system or network performance.
Time Complexity
A concept in computer science that describes the amount of time an algorithm takes to run as a function of the length of the input.
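For illustration, a small sketch of how time complexity shows up in practice: a membership test scans a Python list element by element (O(n)) but uses a hash lookup in a set (O(1) on average):

```python
data_list = list(range(1_000_000))
data_set = set(data_list)

# O(n): the list is scanned element by element until a match is found.
print(999_999 in data_list)

# O(1) on average: the set uses a hash table lookup.
print(999_999 in data_set)
```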
Time Series Analysis
The analysis of data collected over time to identify trends, patterns, and relationships.
Time Series Database (TSDB)
A database optimized for handling time series data, which are data points indexed in time order, commonly used for analyzing, storing, and querying time series data.
Tokenization
The process of converting input text into smaller units, or tokens, typically words or phrases, used in natural language processing to understand the structure of the text.
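For illustration, a minimal word-level tokenization sketch using Python's standard re module:

```python
import re

text = "Data engineers move data; data scientists model it."

# Word-level tokenization with a simple regular expression.
tokens = re.findall(r"\w+", text.lower())
print(tokens)
# ['data', 'engineers', 'move', 'data', 'data', 'scientists', 'model', 'it']
```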
Top-Down Design
A design methodology that begins with specifying the high-level structure of a system and decomposes it into its components, focusing on the system as a whole before examining its parts.
Topology
In networking, it refers to the arrangement of different elements (links, nodes, etc.) in a computer network. In data analysis, it refers to the study of geometric properties and spatial relations.
Training Set
A subset of a dataset used to train machine learning models, helping the models make predictions or decisions without being explicitly programmed to perform the task.
Transactional Database
A type of database that manages transaction-oriented applications, ensuring ACID properties (Atomicity, Consistency, Isolation, Durability) to maintain reliability in every transaction.
Transfer Learning
A research problem in machine learning that focuses on storing knowledge gained while solving one problem and applying it to a different but related problem.
Transformation
The process of converting data from one format or structure into another, often involving cleaning, aggregating, enriching, and reformatting the data.
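For illustration, a minimal transformation sketch assuming pandas is installed; the orders data here is hypothetical:

```python
import pandas as pd

# Hypothetical raw orders data.
raw = pd.DataFrame({
    "order_id": [1, 2, 3],
    "amount": ["10.5", "20.0", None],
    "country": ["us", "US", "ca"],
})

# Clean, reformat, and aggregate: a typical transformation step.
clean = (
    raw.dropna(subset=["amount"])
       .assign(amount=lambda df: df["amount"].astype(float),
               country=lambda df: df["country"].str.upper())
)
totals = clean.groupby("country", as_index=False)["amount"].sum()
print(totals)
```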
Tree Structure
A hierarchical structure used in computer science to represent relationships between individual data points or nodes, where each node (except the root) has exactly one parent node and zero or more child nodes.
Triggers
Procedural code automatically executed in response to certain events on a particular table or view in a database, often used to maintain the integrity of the data.
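For illustration, a minimal sketch using Python's built-in sqlite3 module; the accounts and audit_log tables are hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE accounts (id INTEGER, balance REAL);
    CREATE TABLE audit_log (account_id INTEGER, old_balance REAL, new_balance REAL);

    -- Trigger: runs automatically after every balance update.
    CREATE TRIGGER log_balance_change AFTER UPDATE OF balance ON accounts
    BEGIN
        INSERT INTO audit_log VALUES (OLD.id, OLD.balance, NEW.balance);
    END;

    INSERT INTO accounts VALUES (1, 100.0);
    UPDATE accounts SET balance = 150.0 WHERE id = 1;
""")

print(conn.execute("SELECT * FROM audit_log").fetchall())  # [(1, 100.0, 150.0)]
```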
Tuple
An ordered list of elements, often used to represent a single row in a relational database table, or a single record in a dataset.
Turing Machine
A mathematical model of computation that defines an abstract machine, which manipulates symbols on a strip of tape according to a table of rules, foundational in the theory of computation.
Type Casting
The process of converting a variable from one data type to another, such as changing a float to an integer or a string to a number.
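For illustration, a minimal type casting sketch in plain Python:

```python
price = "19.99"
quantity = 3.0

price_as_float = float(price)    # str -> float
quantity_as_int = int(quantity)  # float -> int (truncates toward zero)
label = str(quantity_as_int)     # int -> str

print(price_as_float, quantity_as_int, label)  # 19.99 3 3
```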
URL Encoding
A method of encoding information in a Uniform Resource Identifier (URI) where certain characters are replaced by corresponding hexadecimal values, used in the submission of form data in HTTP requests.
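For illustration, a minimal sketch using Python's standard urllib.parse module:

```python
from urllib.parse import quote, urlencode

# Percent-encode a single value for safe inclusion in a URL.
print(quote("data engineering & pipelines"))  # data%20engineering%20%26%20pipelines

# Encode a dict of query parameters for an HTTP request.
print(urlencode({"q": "dagster docs", "page": 2}))  # q=dagster+docs&page=2
```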
Undirected Graph
A graph in which edges have no orientation, meaning the edge from vertex A to vertex B is identical to the edge from vertex B to vertex A.
Union
An SQL operation that combines the result sets of two or more queries into a single result set, removing duplicate rows by default.
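For illustration, a minimal sketch using Python's built-in sqlite3 module; the order tables are hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE online_orders (customer TEXT);
    CREATE TABLE store_orders  (customer TEXT);
    INSERT INTO online_orders VALUES ('alice'), ('bob');
    INSERT INTO store_orders  VALUES ('bob'), ('carol');
""")

# UNION returns one distinct result set across both queries.
rows = conn.execute("""
    SELECT customer FROM online_orders
    UNION
    SELECT customer FROM store_orders
    ORDER BY customer
""").fetchall()
print(rows)  # [('alice',), ('bob',), ('carol',)]
```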
Unique Constraint
A constraint applied to a column to ensure that it cannot contain duplicate values.
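For illustration, a minimal sketch using Python's built-in sqlite3 module; the users table is hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (email TEXT UNIQUE)")
conn.execute("INSERT INTO users VALUES ('a@example.com')")

try:
    # A second insert with the same value violates the unique constraint.
    conn.execute("INSERT INTO users VALUES ('a@example.com')")
except sqlite3.IntegrityError as err:
    print(err)  # UNIQUE constraint failed: users.email
```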
Univariate Analysis
The simplest form of data analysis, examining a single variable without regard to any others in order to describe and summarize its underlying patterns.
Unstructured Data
Information that doesn't reside in a traditional row-column database and is often text-heavy.
Unstructured Data Analysis
The analysis of unstructured data, such as text or images, to extract insights and meaning.
Unsupervised Learning
A type of machine learning algorithm used to draw inferences from datasets consisting of input data without labeled responses, often for clustering or association.
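For illustration, a minimal clustering sketch assuming scikit-learn and NumPy are installed; the points are synthetic:

```python
import numpy as np
from sklearn.cluster import KMeans

# Unlabeled 2-D points forming two loose groups.
X = np.array([[1, 2], [1, 4], [1, 0],
              [10, 2], [10, 4], [10, 0]])

# K-means infers cluster assignments without any labeled responses.
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
print(labels)  # e.g. [1 1 1 0 0 0]
```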
Update Anomaly
A data inconsistency that occurs when not all instances of a redundant piece of data are updated, leading to inconsistent and inaccurate data in a database.
Upstream
In data processing, refers to the tasks, operations, or stages that occur before a particular stage in a pipeline or data flow.
User-Defined Function (UDF)
A function provided by the user of a program or environment, allowing for the creation of functions that are not included in the original software.
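For illustration, a minimal sketch registering a UDF with Python's built-in sqlite3 module; reverse_text is a hypothetical function name:

```python
import sqlite3

def reverse_text(value):
    # A simple user-defined function not built into SQLite.
    return value[::-1] if value is not None else None

conn = sqlite3.connect(":memory:")
conn.create_function("reverse_text", 1, reverse_text)

print(conn.execute("SELECT reverse_text('dagster')").fetchone())  # ('retsgad',)
```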
Variable Selection
The process of selecting the most relevant features (variables, predictors) for use in model construction, reducing dimensionality and improving model performance.
Variance Inflation Factor (VIF)
A measure used to quantify how much the variance of a regression coefficient is inflated due to multicollinearity in the model.
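For reference, the standard formula: the VIF for predictor i is based on the R² obtained by regressing that predictor on all the other predictors:

$$\mathrm{VIF}_i = \frac{1}{1 - R_i^2}$$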
Variational Autoencoder (VAE)
A type of autoencoder with added constraints on the encoded representations being learned, often used for generating new data that's similar to the training data.
Vectorization
The process of converting an algorithm from operating on a single value at a time to operating on a set of values (vector) at one time, improving performance by exploiting data-level parallelism.
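For illustration, a minimal sketch assuming NumPy is installed, contrasting a per-element Python loop with the equivalent vectorized operation:

```python
import numpy as np

prices = np.random.rand(1_000_000)

# Scalar approach: one value at a time in a Python loop.
discounted_loop = [p * 0.9 for p in prices]

# Vectorized approach: the whole array at once, in optimized C code.
discounted_vec = prices * 0.9

assert np.allclose(discounted_loop, discounted_vec)
```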
Version Control
The management of changes to documents, computer programs, large websites, and other collections of information, allowing for revisions and variations to be tracked and managed efficiently.
Vertex
In graph theory, a vertex (or node) is a fundamental unit of a graph that can be connected to other vertices by edges; vertices represent entities in graph-based storage and analysis systems.
Vertical Scaling
Adding more resources, such as CPU or memory, to an existing server, or replacing the server with a more powerful one.
View
A virtual table based on the result-set of an SQL statement, often used to focus, simplify, and customize the perception each user has of the database.
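For illustration, a minimal sketch using Python's built-in sqlite3 module; the orders table and eu_orders view are hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER, region TEXT, amount REAL);
    INSERT INTO orders VALUES (1, 'EU', 100.0), (2, 'US', 250.0), (3, 'EU', 50.0);

    -- A view: a virtual table defined by a query, not by stored data.
    CREATE VIEW eu_orders AS
        SELECT id, amount FROM orders WHERE region = 'EU';
""")

print(conn.execute("SELECT * FROM eu_orders").fetchall())  # [(1, 100.0), (3, 50.0)]
```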
Virtual Private Network (VPN)
A technology that creates a safe and encrypted connection over a less secure network, such as the internet, allowing for secure remote access to network resources.
Virtualization
The process of creating a virtual version of something, including virtual computer hardware systems, storage devices, and network resources.
Virtualization (in analytics)
A data integration process to provide a unified, real-time, and consistent view of data across different data sources without having to move or replicate the data.
Visualization
The graphical representation of information and data, using visual elements like charts, graphs, and maps.
Volatile Memory
Computer memory that requires power to maintain the stored information; all data is lost when the system’s power is turned off or interrupted.
Volume Testing
A type of software testing that checks the system’s performance and behavior under high volumes of data, ensuring the software can handle large data quantities effectively.