Useful resources for using the Parquet format
- Arrow GLib - A wrapper library for Arrow C++.
- DuckDB - An in-process database library that supports reading and writing Parquet files.
- Apache Arrow C++ - A library with support for reading and writing Parquet files.
- DuckDB C++ API - Internal DuckDB C++ API.
- libcudf - A GPU-accelerated DataFrame library for tabular data processing.
- DuckDB.Dart - DuckDB Dart bindings.
- duckdb-go - DuckDB Go client.
- parquet - Official Go implementation of Apache Parquet, part of the Apache Arrow project.
- parsyl/parquet - A Go library for reading and writing Parquet files.
- cudf - Java bindings for cudf, enabling processing of large amounts of data on a GPU.
- duckdb-java - DuckDB Java/JDBC API.
- parquet-carpet - A Java library for serializing and deserializing Parquet files efficiently using Java records.
- parquet-java - A Java implementation of the Parquet format, owned by the Apache Software Foundation.
- duckdb-wasm - WebAssembly version of DuckDB.
- duckdb-node-neo - DuckDB Node.js client.
- hyparquet - A lightweight, dependency-free, pure JavaScript library for parsing Apache Parquet files.
- parquet-wasm - WebAssembly bindings to read and write the Apache Parquet format to and from Apache Arrow using the Rust parquet and arrow crates.
- DuckDB - Official DuckDB Julia package.
- Parquet.jl - Julia implementation of Parquet columnar file format reader.
- ParquetSharp - A .NET wrapper over the C++ Parquet library that integrates with .NET Arrow.
- Parquet.Net - A fully managed Parquet library for .NET.
- duckdb-php - DuckDB API for PHP.
- duckdb-python - DuckDB Python client.
- pyarrow - A Python API for functionality provided by the Arrow C++ libraries, along with tools for Arrow integration and interoperability with Pandas, NumPy, and other software in the Python ecosystem.
- pylibcudf - A lightweight Cython interface to libcudf that provides near-zero overhead for GPU-accelerated data processing in Python.
- fastparquet - A Python implementation of the Parquet columnar file format.
- arrow - The arrow package provides an Arrow C++ backend to dplyr, and access to the Arrow C++ library through familiar base R and tidyverse functions, or R6 classes.
- duckdb-r - DuckDB R package.
- nanoparquet - A reader and writer for a common subset of Parquet files.
- Red Parquet - The Ruby bindings of Apache Parquet, based on GObject Introspection.
- datafusion - An extensible query engine written in Rust that can read/write Parquet files using SQL or a DataFrame API.
- duckdb-rs - DuckDB Rust client.
- parquet - The official Native Rust implementation of Apache Parquet, part of the Apache Arrow project.
- Polars - A DataFrame interface on top of an OLAP Query Engine that supports reading and writing Parquet files, with bindings for Python.
- duckdb-swift - DuckDB Swift client.
- DataFusion CLI - A single, dependency-free executable that can read and write Parquet files, with a SQL interface.
- DuckDB CLI - A single, dependency-free executable that can read and write Parquet files, with a SQL interface.
- parquet-tools - Python-based CLI tool for exploring Parquet files (part of Apache Arrow).
- parquet-cli - Java-based CLI tool for exploring Parquet files.
- parquet-cli-standalone - A JAR file for the parquet-cli tool which can be run without any dependencies.
- Spark - A multi-language engine for executing data engineering, data science, and machine learning on single-node machines or clusters.
- Tabiew - A lightweight TUI application to view and query tabular data files, such as CSV, TSV, and Parquet.
- Pink Parquet - A free and open-source, user-friendly viewer for Parquet files for Windows.
- Tad - An application for viewing and analyzing tabular data sets.
- nf-parquet - A Nextflow plugin for reading and writing Parquet files.
- ChatDB - Online tools for viewing and converting to and from Parquet files.
- DataConverter.io - Online tools for viewing, converting, and transforming Parquet files.
- Datasette - A tool to explore datasets, with support for reading Parquet files.
- Onyxia Data Explorer - A web-based tool to explore Parquet files in the browser.
- Parquet File Visualizer - A Claude Code-generated Parquet metadata visualizer that runs in your browser.
- Parquet Viewer - View Parquet files online.
- Quak - A scalable data profiler for quickly scanning large tables.
- icem7 - A blog about data science tools, with in-depth articles on Parquet.
- Hyparquet: The Quest for Instant Data - 6 optimization tricks to read Parquet files faster in the browser.
- Querying Parquet with Precision Using DuckDB - Describes how DuckDB optimizes queries to a Parquet file using projection & filter pushdown.
- Why Parquet Is the Go-To Format for Data Engineers - A graphical description of the Parquet format with optimization and best practices.
- Parquet - The specification for Apache Parquet and Apache Thrift definitions to read and write Parquet metadata.
- Apache Parquet Documentation - The official documentation for Apache Parquet.
- ssphub - An Insee workshop illustrating the use of French 🇫🇷 census data released in the Parquet format.
- parquet-testing - Testing Data and Utilities for Apache Parquet.
- F3 - A data file format that is designed with efficiency, interoperability, and extensibility in mind.
- GeoParquet - Specification for storing geospatial vector data (point, line, polygon) in Parquet.
- Iceberg - A high-performance format for huge analytic tables that supports Parquet as one of its storage formats.
- Lance - Modern columnar data format for ML and LLMs.
- Nimble - File format for storage of large columnar datasets.
- ORC - Self-describing type-aware columnar file format designed for Hadoop workloads.
- Vortex - A columnar file format designed for high-performance data processing.
Contributions welcome! Read the contribution guidelines first.