Data Obfuscation | Dagster Glossary


Data Obfuscation

Make data unintelligible or difficult to understand.

Data obfuscation definition:

Data obfuscation is a technique for making data unintelligible or difficult to understand. It alters the data so that it cannot be interpreted without additional context or knowledge, typically by replacing the original values with new values that carry no meaning but keep the same data format.

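Whether obfuscated data is reversible depends on the technique. With irreversible techniques such as masking or random substitution, the original data cannot be retrieved from the obfuscated output. Reversible techniques such as encryption or tokenization allow recovery, but only with a key or lookup table.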
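To illustrate what irreversibility means in practice, here is a minimal sketch of masking, one common irreversible technique. The function name mask_card_number and the sample card numbers are hypothetical and exist only for this illustration; this is not a Dagster API.

```python
def mask_card_number(card_number: str, visible: int = 4) -> str:
    """Replace every digit except the last `visible` ones with '*'."""
    total_digits = sum(ch.isdigit() for ch in card_number)
    digits_to_mask = max(total_digits - visible, 0)
    masked = []
    for ch in card_number:
        if ch.isdigit() and digits_to_mask > 0:
            masked.append("*")
            digits_to_mask -= 1
        else:
            masked.append(ch)  # keep separators and the trailing digits
    return "".join(masked)


# Two different (made-up) inputs collapse to the same masked value,
# so the original cannot be recovered from the output alone.
print(mask_card_number("4111-1111-1111-1234"))  # ****-****-****-1234
print(mask_card_number("5500-0000-0000-1234"))  # ****-****-****-1234
```

Because many inputs map to the same output, no function can undo the masking. That property is what separates masking from reversible approaches such as encryption.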

Data obfuscation example using Python:

One simple way to obfuscate data in Python is to use the standard library's random module to replace each original character with a random character of the same type:

import random

# define the original data
original_data = "Sensitive data that needs to be obfuscated."

# obfuscate the data by replacing each letter or digit with a random
# character of the same kind; spaces and punctuation are kept as-is
obfuscated_chars = []
for char in original_data:
    if char.isalpha():
        if char.isupper():
            obfuscated_chars.append(random.choice("ABCDEFGHIJKLMNOPQRSTUVWXYZ"))
        else:
            obfuscated_chars.append(random.choice("abcdefghijklmnopqrstuvwxyz"))
    elif char.isdigit():
        obfuscated_chars.append(random.choice("0123456789"))
    else:
        obfuscated_chars.append(char)
obfuscated_data = "".join(obfuscated_chars)

print("Original data:", original_data)
print("Obfuscated data:", obfuscated_data)

The output of the above code sample would be something like:

Original data: Sensitive data that needs to be obfuscated.
Obfuscated data: Xqkzvmwpe hrtu bnqa lkwop ze cy qwhzmtkpbd.

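In this example, each letter and digit of the original sensitive data is replaced with a random character of the same type, so the original format (length, letter case, spacing, and punctuation) is preserved. The resulting obfuscated data is unreadable, and because the replacements are random, the original text cannot be recovered from it. It remains usable wherever only the format matters, such as test or demo data. Note that the output will also change each time you execute the code.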
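If the run-to-run variation is a problem, for instance in tests that compare outputs, one option is to seed the random number generator. The sketch below is a variation on the example above under that assumption. The obfuscate function, its seed parameter, and the sample string are hypothetical names introduced for illustration. Note that Python's random module is not designed for security purposes, so this suits demo or test data rather than protecting real secrets.

```python
import random
import string


def obfuscate(text: str, seed=None) -> str:
    """Replace letters and digits with random ones of the same kind."""
    rng = random.Random(seed)  # seed=None gives a different result each run
    result = []
    for char in text:
        if char.isupper():
            result.append(rng.choice(string.ascii_uppercase))
        elif char.islower():
            result.append(rng.choice(string.ascii_lowercase))
        elif char.isdigit():
            result.append(rng.choice(string.digits))
        else:
            result.append(char)  # keep spaces and punctuation
    return "".join(result)


original = "Order 66 shipped to Alice."
first = obfuscate(original, seed=42)
second = obfuscate(original, seed=42)
print(first == second)              # True: same seed, same output
print(len(first) == len(original))  # True: length is preserved
```

Using a dedicated random.Random instance rather than the module-level functions keeps the seed local to this function, so other code that relies on the random module is unaffected.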

