djfrancesco/awesome-parquet

 
 

Awesome Parquet

Useful resources for using the Parquet format

Libraries

Multiple languages

  • Apache Arrow - A library with support for reading and writing Parquet files, with multiple packages for C++, Java, JavaScript, Python, R, Rust, and more.
  • DuckDB - An in-process database library that supports reading and writing Parquet files, with multiple packages for C, Java, Python, R, JavaScript (WASM), and more.

Go

  • parquet - A Go library for reading and writing Parquet files.

Java

  • parquet-carpet - A Java library for serializing and deserializing Parquet files efficiently using Java records.
  • parquet-java - A Java implementation of the Parquet format, owned by the Apache Software Foundation.

JavaScript

  • hyparquet - A lightweight, dependency-free, pure JavaScript library for parsing Apache Parquet files.
  • parquet-wasm - WebAssembly bindings to read and write the Apache Parquet format to and from Apache Arrow using the Rust parquet and arrow crates.

Python

  • fastparquet - A Python implementation of the Parquet columnar file format.
  • petastorm - A library that enables the use of Parquet storage from TensorFlow, PyTorch, and other Python-based ML training frameworks.
  • dask - A flexible parallel computing library for analytics that can load and process multiple Parquet files as a single dataset, enabling distributed computations on datasets larger than memory.

R

  • nanoparquet - A reader and writer for a common subset of Parquet files.

Rust

  • Polars - A DataFrame interface on top of an OLAP Query Engine that supports reading and writing Parquet files, with bindings for Python.

Tools

Command-line

  • DuckDB CLI - A single, dependency-free executable that can read and write Parquet files, with a SQL interface.
  • parquet-tools - Python-based CLI tool for exploring Parquet files (part of Apache Arrow).
  • parquet-cli - Java-based CLI tool for exploring Parquet files.
  • parquet-cli-standalone - A JAR file for the parquet-cli tool which can be run without any dependencies.
  • Spark - A multi-language engine for executing data engineering, data science, and machine learning on single-node machines or clusters.
  • Tabiew - A lightweight TUI application to view and query tabular data files, such as CSV, TSV, and Parquet.

Desktop applications

  • Pink Parquet - A free and open-source, user-friendly viewer for Parquet files for Windows.
  • Tad - An application for viewing and analyzing tabular data sets.

Plugins

  • nf-parquet - A Nextflow plugin that can read and write Parquet files.

Web

  • ChatDB - Online tools for viewing and converting from and to Parquet files.
  • DataConverter.io - Online tools for viewing, converting, and transforming Parquet files.
  • Datasette - A tool to explore datasets, with support for reading Parquet files.
  • Onyxia Data Explorer - A web-based tool to explore Parquet files in the browser.
  • Quak - A scalable data profiler for quickly scanning large tables.

Resources

Blogs

Documentation

  • Parquet - The specification for Apache Parquet and Apache Thrift definitions to read and write Parquet metadata.
  • Apache Parquet Documentation - The official documentation for Apache Parquet.
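
As a small illustration of the file layout the specification describes, the sketch below checks the `PAR1` magic bytes at both ends of a file and reads the 4-byte little-endian footer length stored just before the trailing magic. It uses only the Python standard library. The function name `read_footer_length` and the synthetic byte string are illustrative assumptions, not part of any library listed here, and the sketch does not cover files with an encrypted footer (which use a different magic) or decode the Thrift metadata itself.

```python
import struct

PARQUET_MAGIC = b"PAR1"  # 4-byte marker at the start and end of a Parquet file


def read_footer_length(data: bytes) -> int:
    """Return the byte length of the serialized file metadata (footer).

    Hypothetical helper for illustration: it validates the layout only
    and does not parse the Thrift-encoded metadata.
    """
    # Smallest possible layout: magic (4) + footer length (4) + magic (4).
    if len(data) < 12:
        raise ValueError("too short to be a Parquet file")
    if data[:4] != PARQUET_MAGIC or data[-4:] != PARQUET_MAGIC:
        raise ValueError("missing PAR1 magic bytes")
    # The footer length is a little-endian uint32 just before the trailing magic.
    (footer_len,) = struct.unpack("<I", data[-8:-4])
    if footer_len > len(data) - 12:
        raise ValueError("footer length exceeds file size")
    return footer_len


# Synthetic bytes that mimic the layout only; this is NOT a valid Parquet file.
fake = PARQUET_MAGIC + b"x" * 10 + b"m" * 7 + struct.pack("<I", 7) + PARQUET_MAGIC
print(read_footer_length(fake))  # 7
```

For real files, pass the result of `open(path, "rb").read()`, or seek to the last 8 bytes for large files; any of the libraries listed above will then decode the metadata itself.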

Educational resources

  • ssphub - A workshop by Insee illustrating the use of census data 🇫🇷 published in Parquet format.

Contributing

Contributions welcome! Read the contribution guidelines first.
