Asynchronous HTTP Client/Server for asyncio and Python.
pip install aiohttp
Pull data out of HTML and XML files.
pip install beautifulsoup4
Compress and decompress files using the bzip2 compression algorithm.
Bz2 comes packaged with Python. It is unlikely you will need to install it.
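As a minimal sketch of the round trip described above, the standard-library bz2 module can compress and restore a byte string in memory. The sample payload is made up for illustration.

```python
import bz2

# Hypothetical sample payload; repetitive data compresses well.
original = b"example data " * 100

# One-shot compression and decompression of an in-memory byte string.
compressed = bz2.compress(original)
restored = bz2.decompress(compressed)

print(len(original), len(compressed))
```

For files on disk, bz2.open() works much like the built-in open() and handles compression transparently.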
A lightweight and extensible data validation library that supports type checking, value constraints, and schema-based validation.
pip install Cerberus
Provides various cryptographic services, including encryption, decryption, hashing, and key management.
pip install cryptography
Parallel computing: allows you to scale Python computations across multiple CPUs or even across a cluster of machines.
python -m pip install dask
Data preparation, cleaning, and exploration: provides tools for handling missing data, encoding categorical variables, and visualizing data distributions.
pip install dataprep
Exploratory data analysis and feature engineering: provides tools for data visualization, data profiling, and data cleaning.
pip install dora
A version control system for data science projects that enables you to track changes to your data and models, and collaborate with others on your project.
brew install dvc - or - pip install dvc
Profile and analyze data. Provides statistics and visualizations to help you understand your data and identify potential issues.
pip install DataProfiler
A Python implementation of the Parquet format, aiming to integrate into Python-based big data workflows.
pip install fastparquet
Python Avro (data serialization and data exchange services for Apache Hadoop), but fast.
pip install fastavro
Fast tools for functional programming.
Functools comes packaged with Python. It is unlikely you will need to install it.
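The functools module ships with the Python standard library. Here is a short sketch of three of its commonly used helpers; the function names fib, power, and square are illustrative only.

```python
from functools import lru_cache, partial, reduce


# lru_cache memoizes results, so repeated recursive calls are served from a cache.
@lru_cache(maxsize=None)
def fib(n: int) -> int:
    return n if n < 2 else fib(n - 1) + fib(n - 2)


def power(base: int, exponent: int) -> int:
    return base ** exponent


# partial pre-fills the exponent argument, producing a one-argument function.
square = partial(power, exponent=2)

# reduce folds a sequence into a single value, here a running sum.
total = reduce(lambda acc, x: acc + x, [1, 2, 3, 4], 0)

print(fib(30), square(5), total)
```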
Work with geospatial data: provides tools for reading, writing, and manipulating spatial data in a pandas-like framework.
pip install geopandas
Data validation, testing, and documentation, enabling you to define and enforce expectations about your data.
python -m pip install great_expectations
Compress and decompress files using the gzip compression algorithm.
Gzip is installed natively in Python 3 and it’s unlikely you will need to install it.
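A minimal sketch of writing and reading a gzip-compressed text file with the standard-library gzip module. The file name sample.txt.gz and the temporary directory are assumptions for illustration.

```python
import gzip
import os
import tempfile

text = "line one\nline two\n"

# Work in a throwaway directory so the example leaves no files behind.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.txt.gz")

    # "wt" opens the file for writing text; data is compressed on the way out.
    with gzip.open(path, "wt", encoding="utf-8") as f:
        f.write(text)

    # "rt" reads text back, decompressing transparently.
    with gzip.open(path, "rt", encoding="utf-8") as f:
        round_trip = f.read()

print(round_trip == text)
```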
Secure hashing: provides various hash functions such as SHA-256 and SHA-512.
Hashlib comes packaged with Python. It is unlikely you will need to install it.
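The hashlib module is part of the Python standard library. As a brief sketch, the SHA-256 and SHA-512 functions mentioned above can be used either in one shot or incrementally. The message is an arbitrary example.

```python
import hashlib

message = b"hello world"

# One-shot hashing: hexdigest() returns the digest as a hexadecimal string.
sha256_hex = hashlib.sha256(message).hexdigest()
sha512_hex = hashlib.sha512(message).hexdigest()

# Incremental hashing: feed the data in chunks, useful for large files.
hasher = hashlib.sha256()
hasher.update(b"hello ")
hasher.update(b"world")
incremental_hex = hasher.hexdigest()

print(sha256_hex)
print(incremental_hex == sha256_hex)
```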
Clean and transform data with tools for renaming columns, removing duplicate rows, and filling missing values.
pip install pyjanitor
Data visualization: provides tools for creating various types of charts and plots.
python3 -m pip install -U matplotlib
Natural Language Toolkit (NLTK) library
Natural language processing (NLP): provides tools for tokenization, part-of-speech tagging, and sentiment analysis.
pip install --user -U nltk
A Python interface for storing multidimensional scientific data (variables) such as temperature, humidity, pressure, wind speed, and direction.
pip install netCDF4
A leading platform for building Python programs to work with human language data.
pip install nltk
Scientific computing: provides tools for working with arrays, matrices, and numerical operations.
pip install numpy
Computer vision: provides tools for image and video processing, object detection, and feature extraction.
pip install opencv-python
Data manipulation and analysis: provides tools for working with tabular data, handling missing values, and performing aggregations.
pip install pandas
**Note: Obsolete** Generate data profiling reports: provides statistics and visualizations to help you understand your data and identify potential issues.
pip install pandas_profiling (Note this package is obsolete and you should use ydata-profiling instead.)
A blazingly fast and memory-efficient Python library for data manipulation and analysis: provides a DataFrame API similar to Pandas but optimized for performance on large datasets.
pip install polars
Time series forecasting: provides a simple yet powerful model based on decomposable time series components.
pip install prophet (Note: before version 1.0 this package was published as fbprophet.)
The most popular PostgreSQL database adapter for Python.
pip install psycopg2
The Python API of Apache Arrow. Apache Arrow is a development platform for in-memory analytics.
pip install pyarrow
Data validation and settings management: provides a declarative syntax for defining data models with type hints and validation rules.
pip install pydantic
Work with MongoDB: provides tools for connecting to a MongoDB instance, querying data, and performing CRUD operations.
pip install pymongo
Apache Spark: provides tools for distributed computing, data processing, and machine learning on large datasets.
pip install pyspark - or - brew install apache-spark
A Python interface to Stan, a probabilistic programming language for Bayesian inference and statistical modeling.
pip install pystan
A publish-subscribe API to facilitate event-based programming and decoupling an application’s in-memory components.
pip install pypubsub
Work with SQLite databases: provides tools for connecting to a database, querying data, and performing CRUD operations.
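Assuming this entry refers to SQLite access through Python's built-in sqlite3 module (the entry itself does not name a package), the connect, query, and CRUD steps could be sketched as follows. The readings table and its columns are hypothetical.

```python
import sqlite3

# ":memory:" creates a temporary database that lives only in RAM.
conn = sqlite3.connect(":memory:")

# Create: define a table and insert rows using parameterized queries.
conn.execute(
    "CREATE TABLE readings (id INTEGER PRIMARY KEY, sensor TEXT, value REAL)"
)
conn.executemany(
    "INSERT INTO readings (sensor, value) VALUES (?, ?)",
    [("a", 1.5), ("b", 2.5)],
)

# Update and delete complete the CRUD cycle.
conn.execute("UPDATE readings SET value = ? WHERE sensor = ?", (3.0, "a"))
conn.execute("DELETE FROM readings WHERE sensor = ?", ("b",))
conn.commit()

# Read: fetch the remaining rows as a list of tuples.
rows = conn.execute(
    "SELECT sensor, value FROM readings ORDER BY id"
).fetchall()
conn.close()

print(rows)
```

The "?" placeholders let the driver handle quoting, which avoids SQL injection from untrusted values.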
Wavelet transforms and signal processing: provides tools for time-frequency analysis, denoising, and compression.
pip install PyWavelets
Regular expressions: provides tools for pattern matching and pattern-based string manipulation.
pip install regex
Elegant mapping of the HTTP protocol onto Python's object-oriented semantics.
pip install requests
Scientific and technical computing: provides tools for optimization, interpolation, signal processing, and statistics.
pip install scipy
Machine learning tools for classification, regression, clustering, dimensionality reduction, and model selection.
pip install -U scikit-learn
Natural Language Processing (NLP) library.
pip install -U spacy
Statistical modeling and analysis tools for regression analysis, time series analysis, and hypothesis testing.
pip install statsmodels
Machine learning and deep learning tools for building and training neural networks and other machine learning models.
python3 -m pip install tensorflow
Data profiling tools for generating data profiling reports with statistics and visualizations.
pip install ydata-profiling
Create, read, and extract ZIP archives, with optional compression of the archived files.
ZipFile is installed natively in Python 3 and it’s unlikely you will need to install it.
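A small sketch of building and reading a ZIP archive with the standard-library zipfile module. It uses an in-memory buffer instead of a file on disk, and the member name notes/readme.txt is made up for illustration.

```python
import io
import zipfile

# An in-memory buffer stands in for a file on disk.
buffer = io.BytesIO()

# Write one member into the archive, compressed with DEFLATE.
with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
    archive.writestr("notes/readme.txt", "hello from a zip archive")

# Reopen the same buffer for reading and inspect its contents.
with zipfile.ZipFile(buffer) as archive:
    names = archive.namelist()
    content = archive.read("notes/readme.txt").decode("utf-8")

print(names, content)
```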