Dagster Data Engineering Glossary:
Data Engineering Terms Explained
Terms and Definitions You Need to Know as a Data Engineer
Web Application Firewall (WAF)
A security policy enforcement point positioned between a web application and its clients that monitors, filters, and controls HTTP traffic to protect the application against attacks such as SQL injection and cross-site scripting.
Web Crawling
The automated process of browsing the web to collect information about websites and their pages, often used by search engines to index web content.
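A minimal sketch of the core crawling loop, assuming a breadth-first traversal. To keep it self-contained, the hypothetical fetch_links callable stands in for downloading a page and extracting its links, and WEB is a made-up in-memory site. A real crawler would also respect robots.txt, rate-limit its requests, and normalize URLs.

```python
from collections import deque
from typing import Callable, Iterable, List


def crawl(seed: str, fetch_links: Callable[[str], Iterable[str]], max_pages: int = 100) -> List[str]:
    """Breadth-first crawl from seed; returns pages in the order they were visited."""
    seen = {seed}             # URLs already discovered, so each is queued only once
    frontier = deque([seed])  # URLs waiting to be visited (FIFO gives breadth-first order)
    visited: List[str] = []
    while frontier and len(visited) < max_pages:
        url = frontier.popleft()
        visited.append(url)
        for link in fetch_links(url):
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return visited


# Hypothetical in-memory "web": each page maps to the links it contains.
WEB = {
    "/": ["/about", "/blog"],
    "/about": ["/"],
    "/blog": ["/blog/post-1", "/about"],
    "/blog/post-1": [],
}

print(crawl("/", lambda url: WEB.get(url, [])))  # ['/', '/about', '/blog', '/blog/post-1']
```

The seen set is what keeps the crawler from looping forever on pages that link back to each other, such as "/" and "/about" here.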
Web Framework
A software framework designed to aid the development of web applications including web services, web resources, and web APIs.
Web Scraping
An automated method for extracting data from websites, typically by downloading pages and parsing their HTML. It is often used in data mining to collect large amounts of raw data from which useful information or knowledge can then be extracted.
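A minimal sketch of the parsing half of a scraper, using only Python's standard-library html.parser module. It assumes the HTML has already been downloaded (here it is an inline string) and extracts every link's href and text. The LinkExtractor class is illustrative; production scrapers often use a dedicated HTML parsing library instead.

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple


class LinkExtractor(HTMLParser):
    """Collects (href, link text) pairs for every <a> element in a document."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[Tuple[str, str]] = []
        self._href: Optional[str] = None  # href of the <a> currently open, if any
        self._text: List[str] = []        # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


html = '<ul><li><a href="/a">First</a></li><li><a href="/b"> Second </a></li></ul>'
parser = LinkExtractor()
parser.feed(html)
print(parser.links)  # [('/a', 'First'), ('/b', 'Second')]
```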
Web Services
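Software systems designed to communicate over a network such as the Internet using standard protocols and data formats (for example HTTP, SOAP, XML, and JSON), allowing different applications to exchange data regardless of the language or platform they run on.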
Weighted Graph
A graph in which a number (the weight) is assigned to each edge, representing quantities such as cost, length, or capacity, depending on the problem at hand.
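A weighted graph is commonly stored as an adjacency mapping from each node to its neighbors and the weights of the connecting edges. The sketch below uses that representation with Dijkstra's shortest-path algorithm, a standard choice when all weights are non-negative; the node names and weights are made up for illustration.

```python
import heapq


def dijkstra(graph, source):
    """Return the cheapest total weight from source to every reachable node."""
    dist = {source: 0}
    heap = [(0, source)]  # (distance so far, node), smallest distance popped first
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # stale entry: a shorter path to this node was already found
        for neighbor, weight in graph.get(node, {}).items():
            candidate = d + weight
            if candidate < dist.get(neighbor, float("inf")):
                dist[neighbor] = candidate
                heapq.heappush(heap, (candidate, neighbor))
    return dist


# Directed weighted graph: node -> {neighbor: edge weight}
graph = {
    "A": {"B": 4, "C": 1},
    "C": {"B": 2, "D": 5},
    "B": {"D": 1},
}
print(dijkstra(graph, "A"))  # {'A': 0, 'B': 3, 'C': 1, 'D': 4}
```

The direct edge A to B costs 4, but the path A to C to B costs only 3, which is why the weights, not the number of hops, determine the result.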
Whitespace Tokenization
The process of breaking up text into tokens based on whitespace characters such as spaces, tabs, and newline characters, commonly used in natural language processing.
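In Python, whitespace tokenization is essentially str.split() called with no arguments, as in this small sketch (the function name is illustrative). Punctuation stays attached to words, which is why NLP pipelines often follow this step with further tokenization. Splitting on a single space character instead produces empty tokens and misses tabs and newlines.

```python
def whitespace_tokenize(text: str) -> list:
    """Split text on runs of whitespace (spaces, tabs, newlines), dropping empty tokens."""
    # str.split() with no argument treats any run of whitespace as one separator
    # and ignores leading and trailing whitespace.
    return text.split()


text = "Data  engineers\tbuild\npipelines. "
print(whitespace_tokenize(text))  # ['Data', 'engineers', 'build', 'pipelines.']
print(text.split(" "))            # ['Data', '', 'engineers\tbuild\npipelines.', '']
```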
Wide Column Store
A type of NoSQL database that uses tables, rows, and columns but, unlike a relational database, allows the names and format of the columns to vary from row to row in the same table.
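The following toy in-memory model, which is not the API of any real database, illustrates the data layout: each row key maps to its own dictionary of columns, so two rows in the same table can have entirely different column sets. Real wide column stores such as Apache Cassandra or HBase add persistence and partitioning across nodes on top of this idea.

```python
from collections import defaultdict


class WideColumnTable:
    """Toy model: each row key maps to its own, independently shaped set of columns."""

    def __init__(self):
        self._rows = defaultdict(dict)  # row key -> {column name: value}

    def put(self, row_key, column, value):
        self._rows[row_key][column] = value

    def get_row(self, row_key):
        # .get avoids creating an empty row as a side effect of reading
        return dict(self._rows.get(row_key, {}))

    def columns(self, row_key):
        return sorted(self._rows.get(row_key, {}))


users = WideColumnTable()
users.put("user:1", "name", "Ada")
users.put("user:1", "email", "ada@example.com")
users.put("user:2", "name", "Grace")
users.put("user:2", "last_login", "2024-01-01")

print(users.columns("user:1"))  # ['email', 'name']
print(users.columns("user:2"))  # ['last_login', 'name']
```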
Wildcard Character
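A character used to stand in for other characters in string comparisons, often used in search operations to represent unknown characters in a pattern. For example, in shell-style patterns * matches any sequence of characters (including none) and ? matches exactly one character; SQL's LIKE operator uses % and _ for the same purposes.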
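Python's standard-library fnmatch module implements shell-style wildcards, which makes for a quick illustration; the file names below are made up. fnmatchcase is used so that matching is case-sensitive on every platform.

```python
from fnmatch import fnmatchcase

files = ["sales_2023.csv", "sales_2024.csv", "sales_summary.csv", "users.parquet"]

# * matches any run of characters (including none).
all_sales = [f for f in files if fnmatchcase(f, "sales_*.csv")]
# ? matches exactly one character, so this requires exactly two characters after "20".
yearly_sales = [f for f in files if fnmatchcase(f, "sales_20??.csv")]

print(all_sales)     # ['sales_2023.csv', 'sales_2024.csv', 'sales_summary.csv']
print(yearly_sales)  # ['sales_2023.csv', 'sales_2024.csv']
```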
Window Function
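In SQL, a function that performs a calculation across a set of table rows related to the current row (the window, defined with an OVER clause) without collapsing those rows into a single output row as GROUP BY does. Window functions such as LAG and LEAD use this to access rows at a specified physical offset from the current row without a self-join.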
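A small illustration using Python's built-in sqlite3 module with an in-memory database. It assumes the bundled SQLite library is version 3.25 or newer, the release that added window functions, and the sales table is made up. SUM(...) OVER computes a running total while every input row stays in the result, and LAG reads the previous row's value without a self-join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (day INTEGER, amount INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?)", [(1, 10), (2, 20), (3, 5)])

rows = conn.execute(
    """
    SELECT day,
           amount,
           SUM(amount) OVER (ORDER BY day) AS running_total,   -- sum of rows up to this one
           LAG(amount) OVER (ORDER BY day) AS previous_amount  -- value from the prior row
    FROM sales
    ORDER BY day
    """
).fetchall()
conn.close()

print(rows)  # [(1, 10, 10, None), (2, 20, 30, 10), (3, 5, 35, 20)]
```

Unlike SELECT SUM(amount) ... GROUP BY, which would return one row, the query returns all three rows, each annotated with values computed over its window.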
Workflow
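The sequence of industrial, administrative, or other processes through which a piece of work passes from initiation to completion, in many cases automated by software. In data engineering, a workflow is usually a set of dependent tasks, often modeled as a directed acyclic graph (DAG) and run by an orchestrator.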
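The dependency structure of an automated workflow can be sketched as a directed acyclic graph and ordered with Python's graphlib module (Python 3.9+). The task names are hypothetical, and this is a plain illustration of dependency ordering, not Dagster's API; an orchestrator adds scheduling, retries, and monitoring on top.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each task maps to the set of tasks it depends on.
workflow = {
    "extract": set(),
    "clean": {"extract"},
    "validate": {"clean"},
    "aggregate": {"clean"},
    "report": {"aggregate", "validate"},
}

# static_order() yields every task only after all of its dependencies;
# it raises graphlib.CycleError if the graph contains a cycle.
order = list(TopologicalSorter(workflow).static_order())
for task in order:
    print("running", task)
```

The exact order of independent tasks (here validate and aggregate) is not specified, only that each task runs after its dependencies.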
Wrapper
A function, method, or class that encapsulates existing code, typically adding functionality or converting its inputs or outputs without modifying the wrapped code itself.
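A Python decorator is a common form of wrapper. This sketch (count_calls is a made-up name) wraps a function to add call counting without changing the function's code or its results; functools.wraps copies the wrapped function's name and docstring onto the wrapper.

```python
import functools


def count_calls(func):
    """Wrap func so that every call is counted, without changing its result."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        wrapper.calls += 1            # the added behavior
        return func(*args, **kwargs)  # delegate to the original, unchanged code

    wrapper.calls = 0
    return wrapper


@count_calls
def add(a, b):
    """Return the sum of a and b."""
    return a + b


add(1, 2)
add(3, 4)
print(add.calls, add.__name__)  # 2 add
```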
Write-Ahead Logging (WAL)
A technique in which changes are first recorded in a durable log and only then applied to the main data store. If the system fails, the log can be replayed to recover the logged changes, ensuring data integrity and consistency.
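A toy key-value store illustrating the idea, assuming a JSON-lines file as the log (the class and file names are hypothetical). Each write is appended to the log and flushed to disk with os.fsync before the in-memory state is updated, and a new instance recovers its state by replaying the log. Production systems add checksums, checkpoints, and log truncation, none of which are shown here.

```python
import json
import os
import tempfile


class TinyWALStore:
    """Toy key-value store: each change is logged durably before it is applied."""

    def __init__(self, log_path):
        self.log_path = log_path
        self.data = {}
        self._replay()

    def _replay(self):
        # Recovery: rebuild the state by re-applying every logged change in order.
        if not os.path.exists(self.log_path):
            return
        with open(self.log_path, encoding="utf-8") as log:
            for line in log:
                entry = json.loads(line)
                self.data[entry["key"]] = entry["value"]

    def put(self, key, value):
        # 1. Write ahead: append the change to the log and force it to disk...
        with open(self.log_path, "a", encoding="utf-8") as log:
            log.write(json.dumps({"key": key, "value": value}) + "\n")
            log.flush()
            os.fsync(log.fileno())
        # 2. ...and only then apply it to the main state.
        self.data[key] = value


log_path = os.path.join(tempfile.mkdtemp(), "store.wal")
store = TinyWALStore(log_path)
store.put("a", 1)
store.put("b", 2)
store.put("a", 3)

recovered = TinyWALStore(log_path)  # simulate a restart after a crash
print(recovered.data)  # {'a': 3, 'b': 2}
```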
XML (eXtensible Markup Language)
A markup language that defines a set of rules for encoding documents in a format that is both human-readable and machine-readable. Widely used for the representation of arbitrary data structures such as those used in web services.
XML Database
A database that stores data as XML documents, allowing complex, hierarchical data relationships to be stored and queried, typically with query languages such as XPath or XQuery.
XML Parsing
The process of reading an XML document and converting its markup into a form, such as an in-memory tree or a stream of events, that a program can use to access or modify the data it contains.
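A minimal sketch using Python's standard-library xml.etree.ElementTree module, which parses a document into an in-memory tree; the orders document is made up. It reads values from the tree, modifies them, and serializes the result back to XML text.

```python
import xml.etree.ElementTree as ET

doc = """<orders>
  <order id="1"><customer>Ada</customer><total>19.99</total></order>
  <order id="2"><customer>Grace</customer><total>5.00</total></order>
</orders>"""

root = ET.fromstring(doc)  # parse the text into a tree of Element objects

# Access: read attributes and child element text.
for order in root.findall("order"):
    print(order.get("id"), order.findtext("customer"), order.findtext("total"))
# 1 Ada 19.99
# 2 Grace 5.00

# Modify: apply a 10% discount to every total, then serialize back to XML text.
for total in root.iter("total"):
    total.text = f"{float(total.text) * 0.9:.2f}"
print(ET.tostring(root, encoding="unicode"))
```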
XOR (Exclusive Or)
A logical operator that outputs true only when inputs differ (one is true, the other is false).
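In Python the ^ operator computes XOR, both on booleans and bit by bit on integers. The sketch below prints the truth table and shows a property that makes XOR useful in checksums and simple masking: applying the same value twice restores the original.

```python
# Truth table: a ^ b is True only when a and b differ.
for a in (False, True):
    for b in (False, True):
        print(a, b, a ^ b)

# On integers, ^ works bit by bit: a result bit is 1 where the input bits differ.
x, key = 0b1100, 0b1010
masked = x ^ key
print(bin(masked))        # 0b110
print(masked ^ key == x)  # True: XOR-ing with the same key again undoes it
```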
XPath
A query language for selecting nodes from an XML document, providing a way to navigate through elements and attributes in XML documents.
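Python's xml.etree.ElementTree supports a limited subset of XPath through find, findall, and iterfind, which is enough for a small illustration (the library document is made up). Full XPath 1.0 requires a third-party library such as lxml.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    "<library>"
    "<book lang='en'><title>Dune</title></book>"
    "<book lang='fr'><title>Le Petit Prince</title></book>"
    "<shelf><book lang='en'><title>Emma</title></book></shelf>"
    "</library>"
)

# ./book/title selects titles of <book> elements that are direct children of the root.
top_level = [t.text for t in root.findall("./book/title")]
# .//book selects <book> elements at any depth; [@lang='en'] filters on an attribute.
english = [t.text for t in root.findall(".//book[@lang='en']/title")]

print(top_level)  # ['Dune', 'Le Petit Prince']
print(english)    # ['Dune', 'Emma']
```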
YARN (Yet Another Resource Negotiator)
A resource-management technology in Hadoop, allocating resources to various applications and managing resource consumption and task scheduling.
Z-Index
A CSS property specifying the stacking order of positioned elements; an element with a higher z-index is drawn in front of one with a lower value. Commonly used in web development to manage overlapping elements.
Z-Score
A statistical measurement that describes a value's relationship to the mean of a group of values, measured in terms of standard deviations from the mean.
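The z-score of a value x is z = (x - μ) / σ, where μ is the mean and σ is the standard deviation of the group. A minimal sketch with Python's statistics module follows. It assumes the population standard deviation (pstdev); when the values are a sample, the sample standard deviation (stdev) is often used instead. The data values are illustrative.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Return how many population standard deviations each value lies from the mean."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        raise ValueError("z-scores are undefined when all values are equal")
    return [(x - mu) / sigma for x in values]


data = [2, 4, 4, 4, 5, 5, 7, 9]  # mean 5, population standard deviation 2
print(z_scores(data))  # [-1.5, -0.5, -0.5, -0.5, 0.0, 0.0, 1.0, 2.0]
```

A z-score of 2.0 for the value 9 means it lies two standard deviations above the mean.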
Zero Trust Security
A security concept centered on the belief that organizations should not automatically trust anything inside or outside their perimeters and must instead verify anything and everything trying to connect to their systems before granting access.
Zero-Copy
A method of transferring data in computer systems so that it does not need to be copied from one buffer or memory location to another.
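Python's memoryview gives a small, language-level illustration of the idea: slicing a memoryview creates a view onto the same buffer instead of copying bytes, whereas slicing a bytearray allocates a copy. Operating-system mechanisms such as sendfile apply the same principle to transfers between files and sockets; this sketch does not show those.

```python
data = bytearray(b"header:payload-bytes")
view = memoryview(data)

payload = view[7:]        # a view onto the same buffer: no bytes are copied
copied = bytes(data[7:])  # slicing the bytearray itself allocates a new copy

data[7:14] = b"PAYLOAD"   # overwrite part of the underlying buffer in place (same length)

print(payload.tobytes())  # b'PAYLOAD-bytes' (the view sees the change)
print(copied)             # b'payload-bytes' (the copy does not)
```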
Zettabyte
A unit of digital information storage used to denote the size of data. It is equivalent to one sextillion (10^21) bytes or 1000 exabytes.
Zone Replication
The process of replicating data across different zones in a multi-zone environment, usually for data redundancy and availability.