A fast, pure C library for reading and writing Apache Parquet files.
- Pure C11 with three external dependencies (zstd, zlib, lz4) -- all auto-fetched by CMake
- ~200KB binary vs ~50MB+ for Arrow
- Built-in CLI for file inspection (schema, info, head, tail, stat, ...) and C code generation (codegen)
- 70x faster reads than Arrow C++ on uncompressed data (mmap zero-copy), 150x faster than PyArrow
- 1.2-2.6x faster compressed reads than Arrow C++ on the same file (cross-read benchmark)
- Writes 1.0-2.3x faster than Arrow C++ across codecs and platforms
- Reads 10M uncompressed rows in 0.25ms (mmap zero-copy on Apple M3)
- Full Parquet spec: all types, encodings, compression codecs, nested schemas, bloom filters, page indexes
- SIMD-optimized (SSE4.2, AVX2, AVX-512, NEON, SVE) with runtime detection and scalar fallbacks
- PyArrow, DuckDB, Spark compatible out of the box
Carquet vs Arrow C++ 23.0.1 at 10M rows (the most representative size). Higher ratio = Carquet faster.
| Codec | Write (x86, Xeon D-1531) | Read (x86) | Write (ARM, Apple M3) | Read (ARM) |
|---|---|---|---|---|
| snappy | 1.55x | 1.25x | 1.10x | 1.53x |
| zstd | 1.31x | 1.04x | 1.37x | 1.28x |
| lz4 | 1.02x | 0.83x | 1.25x | 0.96x |
| none | 1.13x | 40.6x* | 1.33x | 70.4x* |
* Uncompressed reads use mmap zero-copy -- see note below.
Compressed reads involve full decompression and decoding of every value, with no shortcuts -- and both libraries use the same system lz4/zstd shared libraries, so raw codec speed is identical. The most meaningful comparison is the same-file cross-read table (below), where both libraries read the exact same Parquet file: Carquet reads compressed data 1.2-2.6x faster than Arrow C++ in that apples-to-apples test.
Benchmark methodology
All benchmarks use identical data (deterministic LCG PRNG), identical Parquet settings (no dictionary, BYTE_STREAM_SPLIT for floats, page checksums, mmap reads), trimmed median of 11-51 iterations, with OS page cache purged between write and read phases and cooldown between configurations. Schema: 3 columns (INT64, DOUBLE, INT32). Compared against Arrow C++ 23.0.1 low-level Parquet reader (bypassing Arrow Table materialization) and PyArrow 23.0.1.
The same-file cross-read benchmark is the fairest comparison: both libraries read the exact same Parquet file (written by one, read by both). This eliminates differences in page sizes, encoding choices, and row group layout.
Uncompressed reads marked with * use Carquet's mmap zero-copy path: for PLAIN-encoded, uncompressed, fixed-size, required columns, the batch reader returns pointers directly into the memory-mapped file with no memcpy. Arrow always materializes into its own buffers. The compressed read numbers are the most representative measure of end-to-end read throughput.
Full x86 results (Intel Xeon D-1531, Linux)
12 threads @ 2.2GHz, 32GB RAM, Ubuntu 24.04 -- ZSTD level 1
| Codec | Carquet Write | Arrow C++ Write | W ratio | Carquet Read | Arrow C++ Read | R ratio | Size |
|---|---|---|---|---|---|---|---|
| none | 1557ms | 1766ms | 1.13x | 1.25ms | 50.8ms | 40.6x* | 190.7MB |
| snappy | 1002ms | 1549ms | 1.55x | 78ms | 97.8ms | 1.25x | 125.1MB |
| zstd | 1311ms | 1714ms | 1.31x | 76.8ms | 80.2ms | 1.04x | 95.3MB |
| lz4 | 1521ms | 1554ms | 1.02x | 59.1ms | 49.0ms | 0.83x | 122.9MB |
| Codec | Carquet Write | Arrow C++ Write | W ratio | Carquet Read | Arrow C++ Read | R ratio |
|---|---|---|---|---|---|---|
| none | 180ms | 196ms | 1.09x | 0.22ms | 6.2ms | 28x* |
| snappy | 141ms | 148ms | 1.05x | 8.1ms | 11.6ms | 1.44x |
| zstd | 131ms | 185ms | 1.41x | 10.3ms | 9.1ms | 0.88x |
| lz4 | 143ms | 149ms | 1.04x | 8.5ms | 6.1ms | 0.72x |
| Codec | Carquet Write | Arrow C++ Write | W ratio | Carquet Read | Arrow C++ Read | R ratio |
|---|---|---|---|---|---|---|
| none | 14.1ms | 18.4ms | 1.30x | 0.11ms | 2.18ms | 19.8x* |
| snappy | 10.1ms | 10.6ms | 1.05x | 1.27ms | 5.97ms | 4.70x |
| zstd | 8.7ms | 14.1ms | 1.62x | 1.58ms | 3.88ms | 2.46x |
| lz4 | 9.6ms | 11.0ms | 1.14x | 0.77ms | 2.78ms | 3.61x |
Both libraries read the same Parquet file — the fairest apples-to-apples comparison.
| Codec | Writer | Carquet Read | Arrow C++ Read | Ratio |
|---|---|---|---|---|
| none | Carquet | 0.99ms | 73.6ms | 74x* |
| none | Arrow | 7.6ms | 51.2ms | 6.8x* |
| snappy | Carquet | 41.0ms | 107ms | 2.61x |
| snappy | Arrow | 43.4ms | 101ms | 2.33x |
| zstd | Carquet | 46.1ms | 88.4ms | 1.92x |
| zstd | Arrow | 49.1ms | 79.5ms | 1.62x |
| lz4 | Carquet | 34.8ms | 74.8ms | 2.15x |
| lz4 | Arrow | 27.4ms | 52.0ms | 1.90x |
| Codec | Carquet Write | PyArrow Write | W ratio | Carquet Read | PyArrow Read | R ratio |
|---|---|---|---|---|---|---|
| none | 1557ms | 1806ms | 1.16x | 1.25ms | 213ms | 170x* |
| snappy | 1002ms | 1649ms | 1.65x | 78ms | 384ms | 4.91x |
| zstd | 1311ms | 1796ms | 1.37x | 76.8ms | 369ms | 4.81x |
| lz4 | 1521ms | 1676ms | 1.10x | 59.1ms | 281ms | 4.76x |
* Zero-copy mmap path
Full ARM results (Apple M3, macOS)
MacBook Air M3, 16GB RAM, macOS 26.2, Arrow C++ 23.0.1, PyArrow 23.0.1 -- ZSTD level 1
| Codec | Carquet Write | Arrow C++ Write | W ratio | Carquet Read | Arrow C++ Read | R ratio | Size |
|---|---|---|---|---|---|---|---|
| none | 99.4ms | 131.9ms | 1.33x | 0.25ms | 17.59ms | 70.4x* | 190.7MB |
| snappy | 231.0ms | 253.1ms | 1.10x | 16.15ms | 24.75ms | 1.53x | 125.1MB |
| zstd | 253.3ms | 347.5ms | 1.37x | 22.91ms | 29.38ms | 1.28x | 95.3MB |
| lz4 | 198.3ms | 248.8ms | 1.25x | 18.90ms | 18.05ms | 0.96x | 122.9MB |
| Codec | Carquet Write | Arrow C++ Write | W ratio | Carquet Read | Arrow C++ Read | R ratio |
|---|---|---|---|---|---|---|
| none | 7.57ms | 12.91ms | 1.71x | 0.05ms | 1.77ms | 35.4x* |
| snappy | 13.43ms | 24.50ms | 1.82x | 1.52ms | 2.55ms | 1.68x |
| zstd | 15.05ms | 34.12ms | 2.27x | 2.29ms | 3.06ms | 1.34x |
| lz4 | 13.09ms | 25.11ms | 1.92x | 1.03ms | 1.74ms | 1.69x |
| Codec | Carquet Write | Arrow C++ Write | W ratio | Carquet Read | Arrow C++ Read | R ratio |
|---|---|---|---|---|---|---|
| none | 1.13ms | 1.56ms | 1.38x | 0.02ms | 0.23ms | 11.5x* |
| snappy | 1.64ms | 2.50ms | 1.52x | 0.37ms | 0.90ms | 2.43x |
| zstd | 1.69ms | 3.52ms | 2.08x | 0.64ms | 1.31ms | 2.05x |
| lz4 | 1.58ms | 2.49ms | 1.58x | 0.25ms | 0.57ms | 2.28x |
Both libraries read the same Parquet file — the fairest apples-to-apples comparison.
| Codec | Writer | Carquet Read | Arrow C++ Read | Ratio |
|---|---|---|---|---|
| none | Carquet | 0.36ms | 18.33ms | 50.9x* |
| none | Arrow | 1.01ms | 17.60ms | 17.4x* |
| snappy | Carquet | 20.54ms | 24.52ms | 1.19x |
| snappy | Arrow | 14.91ms | 23.65ms | 1.59x |
| zstd | Carquet | 23.11ms | 34.71ms | 1.50x |
| zstd | Arrow | 22.03ms | 29.87ms | 1.36x |
| lz4 | Carquet | 10.96ms | 18.54ms | 1.69x |
| lz4 | Arrow | 10.54ms | 17.43ms | 1.65x |
| Codec | Carquet Write | PyArrow Write | W ratio | Carquet Read | PyArrow Read | R ratio |
|---|---|---|---|---|---|---|
| none | 99.4ms | 193.4ms | 1.95x | 0.25ms | 37.64ms | 150.6x* |
| snappy | 231.0ms | 306.3ms | 1.33x | 16.15ms | 48.01ms | 2.97x |
| zstd | 253.3ms | 405.7ms | 1.60x | 22.91ms | 61.63ms | 2.69x |
| lz4 | 198.3ms | 309.4ms | 1.56x | 18.90ms | 40.09ms | 2.12x |
| Codec | Carquet Write | PyArrow Write | W ratio | Carquet Read | PyArrow Read | R ratio |
|---|---|---|---|---|---|---|
| none | 7.57ms | 18.41ms | 2.43x | 0.05ms | 2.63ms | 52.6x* |
| snappy | 13.43ms | 30.73ms | 2.29x | 1.52ms | 3.65ms | 2.40x |
| zstd | 15.05ms | 39.84ms | 2.65x | 2.29ms | 4.43ms | 1.93x |
| lz4 | 13.09ms | 30.27ms | 2.31x | 1.03ms | 3.10ms | 3.01x |
| Codec | Carquet Write | PyArrow Write | W ratio | Carquet Read | PyArrow Read | R ratio |
|---|---|---|---|---|---|---|
| none | 1.13ms | 1.95ms | 1.73x | 0.02ms | 0.23ms | 11.5x* |
| snappy | 1.64ms | 2.98ms | 1.82x | 0.37ms | 0.59ms | 1.59x |
| zstd | 1.69ms | 4.15ms | 2.46x | 0.64ms | 0.81ms | 1.27x |
| lz4 | 1.58ms | 3.05ms | 1.93x | 0.25ms | 0.40ms | 1.60x |
* Zero-copy mmap path
- C11 compiler (GCC 4.9+, Clang 3.4+, MSVC 2015+)
- CMake 3.16+
- zstd, zlib, lz4 (auto-fetched if missing)
- OpenMP (optional, for parallel column reading)
git clone https://github.com/Vitruves/carquet.git
cd carquet
cmake -B build -DCMAKE_BUILD_TYPE=Release
cmake --build build -j$(nproc)

| Option | Default | Description |
|---|---|---|
| `CARQUET_BUILD_DEV` | OFF | Build everything (tests, examples, benchmarks) |
| `CARQUET_BUILD_TESTS` | OFF | Build test suite only |
| `CARQUET_BUILD_CLI` | ON | Build carquet CLI tool |
| `CARQUET_BUILD_SHARED` | OFF | Build shared library instead of static |
| `CARQUET_NATIVE_ARCH` | OFF | `-march=native` for max performance |
| `CARQUET_ENABLE_SVE` | OFF | ARM SVE (experimental) |
All x86 SIMD (SSE, AVX, AVX2, AVX-512) and ARM NEON are auto-detected and enabled by default.
All build options
| Option | Default | Description |
|---|---|---|
| `CARQUET_BUILD_EXAMPLES` | OFF | Build example programs |
| `CARQUET_BUILD_BENCHMARKS` | OFF | Build benchmark and profiling programs |
| `CARQUET_BUILD_ARROW_CPP_BENCHMARK` | OFF | Optional Arrow C++ comparison benchmark |
| `CARQUET_BUILD_INTEROP` | OFF | Build interoperability tests |
| `CARQUET_BUILD_FUZZ` | OFF | Build fuzz targets |
| `CARQUET_ENABLE_SSE` | ON | SSE optimizations (x86, auto-detected) |
| `CARQUET_ENABLE_AVX` | ON | AVX optimizations (x86, auto-detected) |
| `CARQUET_ENABLE_AVX2` | ON | AVX2 optimizations (x86, auto-detected) |
| `CARQUET_ENABLE_AVX512` | ON | AVX-512 optimizations (x86, auto-detected) |
| `CARQUET_ENABLE_NEON` | ON | NEON optimizations (ARM, auto-detected) |
cmake -B build -DCMAKE_BUILD_TYPE=Release
cmake --build build -j$(nproc)
sudo cmake --install build

This installs:
- `libcarquet.a` (or `.so`/`.dylib` with `-DCARQUET_BUILD_SHARED=ON`)
- `include/carquet/` headers
- `carquet` CLI binary
After installation, link your project with -lcarquet.
If you just need a file reader, the CLI can generate one directly:
carquet info data.parquet
carquet codegen -f data.parquet -o reader.c

To build and run the test suite:

cmake -B build -DCARQUET_BUILD_DEV=ON
cmake --build build -j$(nproc)
cd build && ctest --output-on-failure

Carquet ships with a command-line tool for inspecting Parquet files and generating C reader code, built and installed by default alongside the library.
Commands:
schema Print file schema
info Print detailed file metadata
head Print first N rows
tail Print last N rows
count Print total row count
columns List column names (one per line)
stat Print column statistics
validate Verify file integrity
sample Print N random rows
codegen Generate C reader code
carquet schema data.parquet
carquet head -n 20 data.parquet
carquet stat data.parquet
carquet validate data.parquet

Generate a complete, compilable C reader from any Parquet file's schema:
carquet codegen -f data.parquet -o reader.c
# Generated: reader.c
# Compile: clang -o reader reader.c -I.../include -L.../build -lcarquet ...
./reader # reads data.parquet (embedded as default)
./reader other.parquet # override with a different file

Options:
| Flag | Description |
|---|---|
| `-f, --file FILE` | Parquet file to inspect schema from |
| `-o, --output FILE` | Output source file (default: stdout) |
| `--mmap` | Use memory-mapped I/O in generated code |
| `--skeleton` | Generate empty process_batch for custom logic |
| `-c, --columns COLS` | Comma-separated column filter |
| `-b, --batch-size N` | Batch size (default: 1024) |
The top-level README is intentionally short. For day-to-day usage, prefer the versioned manual in docs/:
- Manual index
- Reading files
- Writing files
- Nested and nullable data
- Performance and tuning
- Error handling and type reference
#include <carquet/carquet.h>
int main(void) {
carquet_error_t err = CARQUET_ERROR_INIT;
// Define schema
carquet_schema_t* schema = carquet_schema_create(&err);
carquet_schema_add_column(schema, "id", CARQUET_PHYSICAL_INT64, NULL, CARQUET_REPETITION_REQUIRED, 0, 0);
carquet_schema_add_column(schema, "value", CARQUET_PHYSICAL_DOUBLE, NULL, CARQUET_REPETITION_REQUIRED, 0, 0);
// Configure writer
carquet_writer_options_t opts;
carquet_writer_options_init(&opts);
opts.compression = CARQUET_COMPRESSION_ZSTD;
// Write
carquet_writer_t* w = carquet_writer_create("output.parquet", schema, &opts, &err);
int64_t ids[] = {1, 2, 3, 4, 5};
double values[] = {1.1, 2.2, 3.3, 4.4, 5.5};
carquet_writer_write_batch(w, 0, ids, 5, NULL, NULL);
carquet_writer_write_batch(w, 1, values, 5, NULL, NULL);
carquet_writer_close(w);
carquet_schema_free(schema);
return 0;
}

#include <carquet/carquet.h>
#include <stdio.h>
int main(void) {
carquet_error_t err = CARQUET_ERROR_INIT;
// Open with mmap for best read performance
carquet_reader_options_t opts;
carquet_reader_options_init(&opts);
opts.use_mmap = true;
carquet_reader_t* r = carquet_reader_open("output.parquet", &opts, &err);
if (!r) { printf("Error: %s\n", err.message); return 1; }
printf("Rows: %lld, Columns: %d\n",
(long long)carquet_reader_num_rows(r),
carquet_reader_num_columns(r));
// Batch reader for efficient iteration
carquet_batch_reader_config_t cfg;
carquet_batch_reader_config_init(&cfg);
cfg.batch_size = 65536;
carquet_batch_reader_t* br = carquet_batch_reader_create(r, &cfg, &err);
carquet_row_batch_t* batch = NULL;
while (carquet_batch_reader_next(br, &batch) == CARQUET_OK && batch) {
const void* data;
const uint8_t* nulls;
int64_t n;
carquet_row_batch_column(batch, 0, &data, &nulls, &n);
const int64_t* ids = (const int64_t*)data;
// process ids[0..n-1] ...
carquet_row_batch_free(batch);
batch = NULL;
}
carquet_batch_reader_free(br);
carquet_reader_close(r);
return 0;
}

// Schema with nullable column
carquet_schema_add_column(schema, "name", CARQUET_PHYSICAL_BYTE_ARRAY,
NULL, CARQUET_REPETITION_OPTIONAL, 0, 0);
// Write with definition levels (1 = present, 0 = null)
carquet_byte_array_t names[] = {{(uint8_t*)"Alice", 5}, {(uint8_t*)"Bob", 3}};
int16_t def_levels[] = {1, 0, 1}; // Alice, NULL, Bob (3 rows, 2 values)
carquet_writer_write_batch(writer, col, names, 3, def_levels, NULL);

// list<int32>
int32_t list_leaf = carquet_schema_add_list(
schema, "tags", CARQUET_PHYSICAL_INT32, NULL,
CARQUET_REPETITION_OPTIONAL, 0, 0);
// map<string, int32>
int32_t map_val = carquet_schema_add_map(
schema, "props",
CARQUET_PHYSICAL_BYTE_ARRAY, NULL, 0, // key: string
CARQUET_PHYSICAL_INT32, NULL, 0, // value: int32
CARQUET_REPETITION_OPTIONAL, 0);
// Write list data: row0=[100,200], row1=NULL, row2=[300]
int32_t vals[] = {100, 200, 300};
int16_t def[] = { 3, 3, 0, 3}; // 3 = element present, 0 = null list (max def level 3)
int16_t rep[] = { 0, 1, 0, 0}; // 0 = first element of a row, 1 = continuation
carquet_writer_write_batch(writer, col, vals, 4, def, rep);

carquet_batch_reader_config_t cfg;
carquet_batch_reader_config_init(&cfg);
// Read only specific columns
const char* names[] = {"id", "timestamp"};
cfg.column_names = names;
cfg.num_column_names = 2;

Skip entire row groups that cannot match a query, based on column statistics:
// Filter callback: only read row groups where column 0 might have values > threshold
bool filter_fn(const carquet_reader_t* reader, int32_t rg, void* ctx) {
int64_t threshold = *(int64_t*)ctx;
bool might_match = true;
carquet_reader_row_group_matches(reader, rg, 0,
CARQUET_COMPARE_GT, &threshold, sizeof(threshold), &might_match);
return might_match;
}
int64_t threshold = 1000;
cfg.row_group_filter = filter_fn;
cfg.row_group_filter_ctx = &threshold;
// Non-matching row groups are skipped with zero I/O

Pre-buffer multiple columns in a single read (reduces seeks on the fread path; a no-op for mmap):
int32_t cols[] = {0, 2, 5};
carquet_reader_prebuffer(reader, 0, cols, 3, &err);
// Subsequent column reads from row group 0 use the cached data

| Codec | Enum | Best For |
|---|---|---|
| ZSTD | `CARQUET_COMPRESSION_ZSTD` | Best overall (great ratio + speed) |
| LZ4 | `CARQUET_COMPRESSION_LZ4_RAW` | Read-heavy workloads (fastest decompression) |
| Snappy | `CARQUET_COMPRESSION_SNAPPY` | Wide compatibility |
| GZIP | `CARQUET_COMPRESSION_GZIP` | Maximum compatibility with older tools |
opts.compression = CARQUET_COMPRESSION_ZSTD;
opts.compression_level = 1; // 0 = codec default; ZSTD: 1-22, GZIP: 1-9

carquet_writer_options_t opts;
carquet_writer_options_init(&opts);
opts.compression = CARQUET_COMPRESSION_ZSTD;
opts.row_group_size = 128 * 1024 * 1024; // 128 MB row groups
opts.write_statistics = true; // min/max for predicate pushdown
opts.write_crc = true; // CRC32 page verification
opts.write_bloom_filters = true; // bloom filters per column
opts.write_page_index = true;    // column/offset page indexes

carquet_error_t err = CARQUET_ERROR_INIT;
carquet_reader_t* r = carquet_reader_open("data.parquet", NULL, &err);
if (!r) {
printf("[%s] %s\n", carquet_status_name(err.code), err.message);
printf("Hint: %s\n", carquet_error_recovery_hint(err.code));
return 1;
}

All functions return carquet_status_t or use carquet_error_t* out-parameters. Programming errors (NULL where a valid pointer is required) trigger assertions; runtime errors (bad files, OOM) return error codes.
Carquet files are fully compatible with PyArrow, DuckDB, Spark, and any Parquet reader:
import pyarrow.parquet as pq
table = pq.read_table("carquet_output.parquet")  # just works

-- DuckDB
SELECT * FROM read_parquet('carquet_output.parquet');

Bidirectional interop testing:
cmake -B build -DCARQUET_BUILD_INTEROP=ON && cmake --build build
python3 interop/run_interop.py

| Feature | Status |
|---|---|
| Physical types | All 8 (BOOLEAN through FIXED_LEN_BYTE_ARRAY) |
| Logical types | STRING, DATE, TIME, TIMESTAMP, DECIMAL, UUID, JSON |
| Encodings | PLAIN, RLE, DICTIONARY, DELTA_BINARY_PACKED, DELTA_LENGTH_BYTE_ARRAY, DELTA_BYTE_ARRAY, BYTE_STREAM_SPLIT |
| Compression | UNCOMPRESSED, SNAPPY, GZIP, LZ4, ZSTD |
| Nested schemas | Groups, lists, maps with definition/repetition levels |
| Bloom filters | Read, write, and query (carquet_bloom_filter_check_*) |
| Page indexes | Column index + offset index (read + write + per-page stats access) |
| Statistics | Min/max/null count per column chunk |
| Predicate pushdown | Row group filtering via statistics; page-level via column index |
| Key-value metadata | Read and write arbitrary footer metadata |
| Per-column options | Per-column encoding, compression, statistics, bloom filter |
| Buffer writer | Write Parquet to in-memory buffer |
| CRC32 | Page-level verification (HW-accelerated on ARM) |
| Memory-mapped I/O | Zero-copy reads for uncompressed PLAIN data |
| Column projection | Read only selected columns |
| I/O coalescing | Pre-buffer multi-column reads in a single I/O |
| Speculative footer | Single-I/O file open for most files |
| OpenMP parallel reads | When available |
| Encryption | Not supported |
# Build with max optimizations
cmake -B build -DCMAKE_BUILD_TYPE=Release -DCARQUET_NATIVE_ARCH=ON -DCARQUET_BUILD_DEV=ON
cmake --build build -j$(nproc)
cd build
./benchmark_carquet # Carquet standalone
python3 ../benchmark/run_benchmark.py # Full comparison (+ PyArrow, + Arrow C++)
# Skip 100M-row (xlarge) configs — they write ~2GB files per codec
# and can take 30+ minutes depending on hardware
python3 ../benchmark/run_benchmark.py --skip-xlarge
# Override ZSTD level (default: 1)
CARQUET_BENCH_ZSTD_LEVEL=3 python3 ../benchmark/run_benchmark.py

Optional Arrow C++ benchmark
cmake -B build -DCMAKE_BUILD_TYPE=Release -DCARQUET_NATIVE_ARCH=ON \
-DCARQUET_BUILD_BENCHMARKS=ON \
-DCARQUET_BUILD_ARROW_CPP_BENCHMARK=ON
cmake --build build -j$(nproc)
# Or point at a custom Arrow install
cmake -B build ... -DCARQUET_ARROW_CPP_ROOT=/path/to/arrow-prefix

The Arrow C++ benchmark uses the low-level parquet::ParquetFileReader API (bypassing Arrow Table materialization overhead) with parallel row group readers. The same-file cross-read mode has both libraries read the exact same Parquet file, eliminating differences in page sizes, encoding, and row group layout. Both benchmarks use identical data, row group sizing, no dictionary, page checksums, mmap reads, and BYTE_STREAM_SPLIT for floats.
Full API is in include/carquet/carquet.h. Key types:
| Type | Purpose |
|---|---|
| `carquet_reader_t` | File reader (open from path, FILE*, or memory buffer) |
| `carquet_writer_t` | File writer |
| `carquet_batch_reader_t` | High-level batch iteration |
| `carquet_schema_t` | Schema definition and introspection |
| `carquet_error_t` | Rich error info (code, message, source location, recovery hint) |
Core API functions
Reader
carquet_reader_t* carquet_reader_open(const char* path, const carquet_reader_options_t* opts, carquet_error_t* err);
carquet_reader_t* carquet_reader_open_buffer(const void* buf, size_t size, const carquet_reader_options_t* opts, carquet_error_t* err);
void carquet_reader_close(carquet_reader_t* reader);
int64_t carquet_reader_num_rows(const carquet_reader_t* reader);
int32_t carquet_reader_num_columns(const carquet_reader_t* reader);

Batch Reader
carquet_batch_reader_t* carquet_batch_reader_create(carquet_reader_t* reader, const carquet_batch_reader_config_t* cfg, carquet_error_t* err);
carquet_status_t carquet_batch_reader_next(carquet_batch_reader_t* br, carquet_row_batch_t** batch);
carquet_status_t carquet_row_batch_column(const carquet_row_batch_t* batch, int32_t col, const void** data, const uint8_t** nulls, int64_t* n);

Writer
carquet_writer_t* carquet_writer_create(const char* path, const carquet_schema_t* schema, const carquet_writer_options_t* opts, carquet_error_t* err);
carquet_status_t carquet_writer_write_batch(carquet_writer_t* w, int32_t col, const void* values, int64_t n, const int16_t* def, const int16_t* rep);
carquet_status_t carquet_writer_close(carquet_writer_t* w);

Schema
carquet_schema_t* carquet_schema_create(carquet_error_t* err);
carquet_status_t carquet_schema_add_column(carquet_schema_t* s, const char* name, carquet_physical_type_t type, const carquet_logical_type_t* logical, carquet_field_repetition_t rep, int32_t type_len, int32_t parent);
int32_t carquet_schema_add_list(carquet_schema_t* s, const char* name, carquet_physical_type_t elem_type, const carquet_logical_type_t* elem_logical, carquet_field_repetition_t rep, int32_t type_len, int32_t parent);
int32_t carquet_schema_add_map(carquet_schema_t* s, const char* name, carquet_physical_type_t key_type, const carquet_logical_type_t* key_logical, int32_t key_len, carquet_physical_type_t val_type, const carquet_logical_type_t* val_logical, int32_t val_len, carquet_field_repetition_t rep, int32_t parent);

Filtering
int32_t carquet_reader_filter_row_groups(const carquet_reader_t* reader, int32_t col, carquet_compare_op_t op, const void* value, int32_t value_size, int32_t* matching, int32_t max);

include/carquet/   Public API (carquet.h, types.h, error.h)
src/
core/ Arena allocator, buffer, bitpack, endian
encoding/ PLAIN, RLE, DELTA, DICTIONARY, BYTE_STREAM_SPLIT
compression/ Snappy (internal), GZIP, ZSTD, LZ4 (wrappers)
thrift/ Thrift compact protocol for Parquet metadata
simd/ Runtime dispatch + x86 (SSE/AVX2/AVX-512) + ARM (NEON/SVE)
reader/ File, row group, column, page, batch readers + mmap
writer/ File, row group, column, page writers
metadata/ Schema, statistics, bloom filters, page indexes
cli/ CLI tool and code generator
util/ CRC32, xxHash
tests/ 18 test files
examples/ basic_write_read, data_types, compression_codecs, nullable_columns, advanced_features
benchmark/ Performance benchmarks and comparison tools
MIT
