Useful resources for using the Parquet format
- Apache Arrow - A library with support for reading and writing Parquet files, with multiple packages for C++, Java, JavaScript, Python, R, Rust, and more.
- DuckDB - An in-process database library that supports reading and writing Parquet files, with multiple packages for C, Java, Python, R, JavaScript (WASM), and more.
- parquet - A Go library for reading and writing Parquet files.
- parquet-carpet - A Java library for serializing and deserializing Parquet files efficiently using Java records.
- parquet-java - A Java implementation of the Parquet format, owned by the Apache Software Foundation.
- hyparquet - A lightweight, dependency-free, pure JavaScript library for parsing Apache Parquet files.
- parquet-wasm - WebAssembly bindings to read and write the Apache Parquet format to and from Apache Arrow using the Rust parquet and arrow crates.
- fastparquet - A Python implementation of the Parquet columnar file format.
- petastorm - A library enabling the use of Parquet storage from TensorFlow, PyTorch, and other Python-based ML training frameworks.
- dask - A flexible parallel computing library for analytics that can efficiently load and process multiple Parquet files as a unified dataset, enabling distributed computations on datasets larger than memory.
- nanoparquet - A reader and writer for a common subset of Parquet files.
- Polars - A DataFrame interface on top of an OLAP Query Engine that supports reading and writing Parquet files, with bindings for Python.
- DuckDB CLI - A single, dependency-free executable that can read and write Parquet files, with a SQL interface.
- parquet-tools - Python-based CLI tool for exploring Parquet files, built on top of Apache Arrow.
- parquet-cli - Java-based CLI tool for exploring Parquet files.
- parquet-cli-standalone - A JAR file for the parquet-cli tool which can be run without any dependencies.
- Spark - A multi-language engine for executing data engineering, data science, and machine learning on single-node machines or clusters.
- Tabiew - A lightweight TUI application to view and query tabular data files, such as CSV, TSV, and Parquet.
- Pink Parquet - A free and open-source, user-friendly viewer for Parquet files for Windows.
- Tad - An application for viewing and analyzing tabular data sets.
- nf-parquet - A Nextflow plugin able to read and write Parquet files.
- ChatDB - Online tools for viewing and converting from and to Parquet files.
- DataConverter.io - Online tools for viewing, converting, and transforming Parquet files.
- Datasette - A tool to explore datasets, with support for reading Parquet files.
- Onyxia Data Explorer - A web-based tool to explore Parquet files in the browser.
- Quak - A scalable data profiler for quickly scanning large tables.
- icem7 - A blog about data science tools, with in-depth articles on Parquet.
- Hyparquet: The Quest for Instant Data - 6 optimization tricks to read Parquet files faster in the browser.
- Querying Parquet with Precision Using DuckDB - Describes how DuckDB optimizes queries to a Parquet file using projection & filter pushdown.
- Why Parquet Is the Go-To Format for Data Engineers - A graphical description of the Parquet format with optimization and best practices.
- Parquet - The specification for Apache Parquet and Apache Thrift definitions to read and write Parquet metadata.
- Apache Parquet Documentation - The official documentation for Apache Parquet.
- ssphub - A workshop by Insee illustrating the use of French census data 🇫🇷 published in the Parquet format.
Contributions welcome! Read the contribution guidelines first.