Apsalar/parquet-writer

Introduction

This project provides libparquetfile, a C++ library which can generate parquet files.

Additionally, the proto2parq application is provided, which converts data files or streams containing protobuf-defined records into parquet format.

Acknowledgements

The early development on this project was inspired by Neal Sidhwaney's cpp-parquet project.

Some of the parquet writing C++ components are extracted from the Impala Database Project.

Building

You'll need the Thrift development tools installed:

sudo dnf install thrift-devel

Update the parquet-format submodule:

git submodule update --init

Build with GNU make from the top-level directory:

gmake

Running the Sample Program

The sample program generates, in protobuf format, the sample data described in the Dremel paper and the parquet-mr annotation document.

The output of this program can be piped into proto2parq and converted to parquet format:

cd sample/OBJDIR
./sample | ../../proto2parq/OBJDIR/proto2parq --outfile=sample.parquet

The protobuf schema is prepended to the beginning of the protobuf data output.
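
As a quick sanity check on the converted file, the sketch below tests whether a file starts and ends with the 4-byte magic string PAR1 that the parquet format places at both ends of a file. The helper name is_parquet is hypothetical and not part of this project; the check only confirms the framing bytes, not that the contents are readable.

```shell
# Hypothetical helper (not part of this project): returns 0 when the file
# begins and ends with the parquet magic bytes "PAR1", non-zero otherwise.
is_parquet() {
    file="$1"
    [ -f "$file" ] || return 1

    # A parquet file holds at least two 4-byte magics and a 4-byte footer length.
    size=$(($(wc -c < "$file")))
    [ "$size" -ge 12 ] || return 1

    head_magic=$(head -c 4 "$file")
    tail_magic=$(tail -c 4 "$file")
    [ "$head_magic" = "PAR1" ] && [ "$tail_magic" = "PAR1" ]
}
```

For example, after running the pipeline above in sample/OBJDIR:

is_parquet sample.parquet && echo "sample.parquet has parquet framing"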
