severo/awesome-parquet
Awesome Parquet

Parquet Logo

Useful resources for using the Parquet format

Contents

Libraries

C GLib

  • Arrow GLib - A wrapper library for Arrow C++.
  • DuckDB - An in-process database library that supports reading and writing Parquet files.

C++

  • Apache Arrow C++ - A library with support for reading and writing Parquet files.
  • DuckDB C++ API - Internal DuckDB C++ API.
  • libcudf - A GPU-accelerated DataFrame library for tabular data processing.

Dart

Go

  • duckdb-go - DuckDB Go client.
  • parquet - Official Go implementation of Apache Parquet, part of the Apache Arrow project.
  • parsyl/parquet - A Go library for reading and writing Parquet files.

Java

  • cudf - Java bindings for cudf, enabling processing of large amounts of data on a GPU.
  • duckdb-java - DuckDB Java/JDBC API.
  • parquet-carpet - A Java library for serializing and deserializing Parquet files efficiently using Java records.
  • parquet-java - A Java implementation of the Parquet format, owned by the Apache Software Foundation.

JavaScript

  • duckdb-wasm - WebAssembly version of DuckDB.
  • duckdb-node-neo - DuckDB Node.js client.
  • hyparquet - A lightweight, dependency-free, pure JavaScript library for parsing Apache Parquet files.
  • parquet-wasm - WebAssembly bindings to read and write the Apache Parquet format to and from Apache Arrow using the Rust parquet and arrow crates.

Julia

  • DuckDB - Official DuckDB Julia package.
  • Parquet.jl - A Julia reader for the Parquet columnar file format.

.NET

PHP

Python

  • duckdb-python - DuckDB Python client.
  • pyarrow - A Python API for functionality provided by the Arrow C++ libraries, along with tools for Arrow integration and interoperability with Pandas, NumPy, and other software in the Python ecosystem.
  • pylibcudf - A lightweight Cython interface to libcudf that provides near-zero overhead for GPU-accelerated data processing in Python.
  • fastparquet - A Python implementation of the Parquet columnar file format.

R

  • arrow - The arrow package provides an Arrow C++ backend to dplyr, and access to the Arrow C++ library through familiar base R and tidyverse functions, or R6 classes.
  • duckdb-r - DuckDB R package.
  • nanoparquet - A reader and writer for a common subset of Parquet files.

Ruby

  • Red Parquet - The Ruby bindings of Apache Parquet, based on GObject Introspection.

Rust

  • datafusion - An extensible query engine written in Rust that can read/write Parquet files using SQL or a DataFrame API.
  • duckdb-rs - DuckDB Rust client.
  • parquet - The official Native Rust implementation of Apache Parquet, part of the Apache Arrow project.
  • Polars - A DataFrame interface on top of an OLAP Query Engine that supports reading and writing Parquet files, with bindings for Python.

Swift

Tools

Command-line

  • DataFusion CLI - A single, dependency-free executable that can read and write Parquet files, with a SQL interface.
  • DuckDB CLI - A single, dependency-free executable that can read and write Parquet files, with a SQL interface.
  • parquet-tools - Python-based CLI tool for exploring Parquet files (part of Apache Arrow).
  • parquet-cli - Java-based CLI tool for exploring Parquet files.
  • parquet-cli-standalone - A JAR file for the parquet-cli tool which can be run without any dependencies.
  • Spark - A multi-language engine for executing data engineering, data science, and machine learning on single-node machines or clusters.
  • Tabiew - A lightweight TUI application to view and query tabular data files, such as CSV, TSV, and Parquet.

Desktop applications

  • Pink Parquet - A free and open-source, user-friendly viewer for Parquet files for Windows.
  • Tad - An application for viewing and analyzing tabular data sets.

Plugins

  • nf-parquet - A Nextflow plugin able to read and write parquet files.

Web

  • ChatDB - Online tools for viewing and converting from and to Parquet files.
  • DataConverter.io - Online tools for viewing, converting, and transforming Parquet files.
  • Datasette - A tool to explore datasets, with support for reading Parquet files.
  • Onyxia Data Explorer - A web-based tool to explore Parquet files in the browser.
  • Parquet File Visualizer - A Claude Code-generated Parquet metadata visualizer that runs in your browser.
  • Parquet Viewer - View parquet files online.
  • Quak - A scalable data profiler for quickly scanning large tables.

Resources

Blogs

Documentation

  • Parquet - The specification for Apache Parquet and Apache Thrift definitions to read and write Parquet metadata.
  • Apache Parquet Documentation - The official documentation for Apache Parquet.

Educational resources

  • ssphub - A workshop by Insee (the French national statistics institute) illustrating the use of French 🇫🇷 census data distributed in the Parquet format.

Tests

Related formats

  • F3 - A data file format that is designed with efficiency, interoperability, and extensibility in mind.
  • GeoParquet - Specification for storing geospatial vector data (point, line, polygon) in Parquet.
  • Iceberg - A high-performance format for huge analytic tables, that supports Parquet as one of its storage formats.
  • Lance - Modern columnar data format for ML and LLMs.
  • Nimble - File format for storage of large columnar datasets.
  • ORC - Self-describing type-aware columnar file format designed for Hadoop workloads.
  • Vortex - A columnar file format designed for high-performance data processing.

Contributing

Contributions welcome! Read the contribution guidelines first.
