Package | Description | Install command | Docs link |
---|---|---|---|
aiohttp | Asynchronous HTTP Client/Server for asyncio and Python. | pip install aiohttp | docs |
beautifulsoup | Pull data out of HTML and XML files. | pip install beautifulsoup4 | docs |
bz2 | Compress and decompress files using the bzip2 compression algorithm. | Bz2 comes packaged with Python. It is unlikely you will need to install it. | docs |
Cerberus | A lightweight and extensible data validation library that supports type checking, value constraints, and schema-based validation. | pip install Cerberus | docs |
Cryptography | Provides various cryptographic services, including encryption, decryption, hashing, and key management. | pip install cryptography | docs |
Dask | Parallel computing: allows you to scale Python computations across multiple CPUs or even across a cluster of machines. | python -m pip install dask | docs |
dataprep | Data preparation, cleaning, and exploration: provides tools for handling missing data, encoding categorical variables, and visualizing data distributions. | pip install dataprep | docs |
dora | Exploratory data analysis and feature engineering: provides tools for data visualization, data profiling, and data cleaning. | pip install dora | docs |
DVC | A version control system for data science projects that enables you to track changes to your data and models, and collaborate with others on your project. | brew install dvc or pip install dvc | docs |
DataProfiler | Profile and analyze data. Provides statistics and visualizations to help you understand your data and identify potential issues. | pip install DataProfiler | docs |
Fastparquet | A Python implementation of the Parquet format, aiming to integrate into Python-based big data workflows. | pip install fastparquet | docs |
Fastavro | Python Avro (data serialization and data exchange services for Apache Hadoop), but fast. | pip install fastavro | docs |
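Functools | Higher-order functions and operations on callable objects, such as caching and partial application. | functools is part of the Python standard library, so there is nothing to install. | docs |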
GeoPandas | Work with geospatial data: provides tools for reading, writing, and manipulating spatial data in a pandas-like framework. | pip install geopandas | docs |
Great Expectations | Data validation, testing, and documentation: enables you to define and enforce expectations about your data. | python -m pip install great_expectations | docs |
gzip | Compress and decompress files using the gzip compression algorithm. | Gzip is installed natively in Python3 and it’s unlikely you will need to install it. | docs |
hashlib | Secure hashing: provides various hash functions such as SHA-256 and SHA-512. | hashlib is part of the Python standard library, so there is nothing to install. | docs |
janitor | Clean and transform data with tools for renaming columns, removing duplicate rows, and filling missing values. | pip install pyjanitor | docs |
Matplotlib | Data visualization: provides tools for creating various types of charts and plots. | python3 -m pip install -U matplotlib | docs |
Natural Language Toolkit (NLTK) library | Natural language processing (NLP): provides tools for tokenization, part-of-speech tagging, and sentiment analysis. | pip install --user -U nltk | docs |
NetCDF4 | A Python interface for storing multidimensional scientific data (variables) such as temperature, humidity, pressure, wind speed, and direction. | pip install netCDF4 | docs |
nltk | A leading platform for building Python programs to work with human language data. | pip install nltk | docs |
numpy | Scientific computing: provides tools for working with arrays, matrices, and numerical operations. | pip install numpy | docs |
OpenCV (cv2) | Computer vision: provides tools for image and video processing, object detection, and feature extraction. | pip install opencv-python | docs |
Pandas | Data manipulation and analysis: provides tools for working with tabular data, handling missing values, and performing aggregations. | pip install pandas | docs |
Pandas Profiling | **Note: Obsolete** Generate data profiling reports: provides statistics and visualizations to help you understand your data and identify potential issues. | pip install pandas_profiling (Note this package is obsolete and you should use ydata-profiling instead.) | docs |
Polars | A blazingly fast and memory-efficient Python library for data manipulation and analysis: provides a DataFrame API similar to Pandas but optimized for performance on large datasets. | pip install polars | docs |
Prophet | Time series forecasting: provides a simple yet powerful model based on decomposable time series components. | pip install prophet (the package was published as fbprophet before version 1.0) | docs |
Psycopg2 | The most popular PostgreSQL database adapter for Python. | pip install psycopg2 | docs |
Pyarrow | The Python API of Apache Arrow. Apache Arrow is a development platform for in-memory analytics. | pip install pyarrow | docs |
Pydantic | Data validation and settings management, provides a declarative syntax for defining data models with type hints and validation rules. | pip install pydantic | docs |
pymongo | Work with MongoDB: provides tools for connecting to a MongoDB instance, querying data, and performing CRUD operations. | pip install pymongo | docs |
PySpark | Apache Spark: provides tools for distributed computing, data processing, and machine learning on large datasets. | pip install pyspark or brew install apache-spark | docs |
PyStan | A Python interface to Stan, a probabilistic programming language for Bayesian inference and statistical modeling. | pip install pystan | docs |
PyPubSub | A publish-subscribe API to facilitate event-based programming and decoupling an application’s in-memory components. | pip install pypubsub | docs |
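pysqlite3 | Work with SQLite databases: provides tools for connecting to a database, querying data, and performing CRUD operations. | pip install pysqlite3 (the standard-library sqlite3 module needs no install; SQLite itself is available from https://www.sqlite.org/download.html) | docs |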
pywt | Wavelet transforms and signal processing: provides tools for time-frequency analysis, denoising, and compression. | pip install PyWavelets | docs |
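Re (regex) | Regular expressions: provides tools for pattern matching and string manipulation based on a specified pattern. | re is part of the Python standard library, so there is nothing to install. The third-party regex package, an extended alternative, installs with pip install regex. | docs |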
requests | Elegant mapping of the HTTP protocol onto Python's object-oriented semantics. | pip install requests | docs |
SciPy | Scientific and technical computing: provides tools for optimization, interpolation, signal processing, and statistics. | pip install scipy | docs |
sklearn | Machine learning tools for classification, regression, clustering, dimensionality reduction, and model selection. | pip install -U scikit-learn | docs |
spaCy | Natural Language Processing (NLP) library. | pip install -U spacy | docs |
statsmodels | Statistical modeling and analysis tools for regression analysis, time series analysis, and hypothesis testing. | pip install statsmodels | docs |
Tensorflow | Machine learning and deep learning tools for building and training neural networks and other machine learning models. | python3 -m pip install tensorflow | docs |
Ydata Profiling | Data profiling tools for generating data profiling reports with statistics and visualizations. | pip install ydata-profiling | docs |
zipfile | Compress and decompress files using the ZIP compression algorithm. | ZipFile is installed natively in Python3 and it’s unlikely you will need to install it. | docs |
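
Several entries in the table (bz2, gzip, hashlib, zipfile, functools, and re) ship with Python itself. As a minimal sketch of what that means in practice, the snippet below imports all of them in a fresh interpreter with nothing installed and exercises each one briefly. The sample payload, the archive member name `payload.txt`, and the helper `fib` are made up for illustration.

```python
import bz2
import functools
import gzip
import hashlib
import io
import re
import zipfile

# Hypothetical sample data, repeated so the compressors have something to shrink.
payload = b"hello, data engineering " * 100

# bz2 and gzip: compress in memory, then check the round trip restores the bytes.
bz2_blob = bz2.compress(payload)
gzip_blob = gzip.compress(payload)
assert bz2.decompress(bz2_blob) == payload
assert gzip.decompress(gzip_blob) == payload

# hashlib: SHA-256 digest of the payload as a hex string.
digest = hashlib.sha256(payload).hexdigest()

# zipfile: write a one-member archive to an in-memory buffer and read it back.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
    archive.writestr("payload.txt", payload)
with zipfile.ZipFile(buffer) as archive:
    names = archive.namelist()
    restored = archive.read("payload.txt")


# functools: memoize a recursive function with lru_cache.
@functools.lru_cache(maxsize=None)
def fib(n: int) -> int:
    return n if n < 2 else fib(n - 1) + fib(n - 2)


# re: extract the lowercase words from the first repetition of the payload.
words = re.findall(r"[a-z]+", payload[:24].decode())

print(len(payload), len(bz2_blob), len(gzip_blob))
print(digest[:16], names, fib(30), words)
```

If any of these imports fails, the likely cause is an unusual or stripped-down Python build rather than a missing pip package.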