Firn — Go bindings for Polars with optimized FFI performance.
Firn is a Go library providing bindings to the Polars data manipulation library. Named after the granular snow that forms the transitional layer between fresh snow and dense glacial ice, Firn provides an efficient interface between Go applications and Polars operations.
Firn focuses on minimizing CGO overhead through operation batching and a stack-machine architecture for expression evaluation.
Unlike existing Go-Polars libraries that incur high CGO costs for each method invocation, Firn employs a batch-oriented architecture to minimize overhead:
- RPN Stack Machine - Batch multiple operations into single FFI calls
- Deferred Execution - Build operation graphs without CGO overhead
- Direct Rust Integration - Minimal wrapper around native Polars
- Multi-Architecture Support - Native ARM64 and AMD64 binaries
- Minimize CGO overhead through operation batching and static linking
- Leverage native Polars performance with minimal Go wrapper cost
- Memory-efficient zero-copy data sharing where possible
- Cross-platform native performance on ARM64 and AMD64
┌─────────────────────────────────────────────────────────────────┐
│ Go Layer (polars/) │
├─────────────────┬─────────────────┬─────────────────────────────┤
│ DataFrame │ ExprNode │ Operation Queue │
│ Operations │ (Lazy Iter) │ []Operation{opcode,args} │
│ .Filter().Sort()│ Col("x").Gt(5) │ (Zero CGO until │
│ │ │ .Collect()) │
└─────────────────┴─────────────────┴─────────────────────────────┘
│
▼ Single CGO Call (.Collect())
┌─────────────────────────────────────────────────────────────────┐
│ Rust Layer (rust/src/) │
├─────────────────┬─────────────────┬─────────────────────────────┤
│ ExecutionContext│ Expression │ DataFrame Dispatch │
│ {expr_stack, │ Stack Machine │ match opcode { │
│ operation_args}│ Vec<Expr> │ OpFilter => filter(), │
│ │ │ OpSort => sort(), ... │
└─────────────────┴─────────────────┴─────────────────────────────┘
│
▼ Direct Polars API calls
┌─────────────────────────────────────────────────────────────────┐
│ Native Polars Library │
│ LazyFrame::filter(expr).sort().collect() -> DataFrame │
└─────────────────────────────────────────────────────────────────┘
Key Architecture Points:
- Go Layer: Builds operation queues with zero CGO overhead
- Single FFI Call: All operations batched into one `execute_operations()` call
- Rust Execution Engine: Processes operation queue with expression stack machine
- Context Tracking: Maintains DataFrame/LazyFrame/LazyGroupBy state across operations
- Native Polars: Direct integration with Polars LazyFrame for optimal performance
Critical Architecture Note: Polars follows an immutable DataFrame design where each operation returns a new DataFrame instance rather than modifying the original. This is fundamental to Polars' thread-safety and performance model.
Evidence from Firn's Rust layer (rust/src/lib.rs:540):
// Each operation creates a NEW DataFrame
match df.clone().lazy().filter(filter_expr).collect() {
Ok(new_df) => Result::success(new_df), // Returns NEW handle
Err(e) => Result::error(ERROR_POLARS_OPERATION, &e.to_string()),
}
Memory Management: Our Go bindings automatically handle the lifecycle of intermediate DataFrames by releasing old handles when `Execute()` creates new ones, preventing memory leaks while maintaining Polars' immutable semantics.
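To make that lifecycle concrete, here is a minimal sketch of the handle-swap pattern, assuming a hypothetical handle type and release function; these names are illustrative, not Firn's internal API.

```go
// Hypothetical sketch of releasing a superseded DataFrame handle when a
// batched Execute() produces a new one. Types and functions are illustrative.
package main

import "fmt"

type handle uintptr

// release would free the Rust-side DataFrame behind the handle.
func release(h handle) { fmt.Printf("released handle %d\n", h) }

// dataFrame owns exactly one live Rust handle at a time.
type dataFrame struct{ h handle }

// execute swaps in the handle returned by the single FFI call and releases
// the old one, preserving Polars' immutable semantics without leaking.
func (df *dataFrame) execute(newHandle handle) {
	old := df.h
	df.h = newHandle
	if old != 0 {
		release(old)
	}
}

func main() {
	df := &dataFrame{h: 1}
	df.execute(2) // prints: released handle 1
}
```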
Why Not SIMBA Trampolines? While SIMBA provides ultra-fast FFI for simple SIMD operations, Polars operations are complex library functions involving file I/O, parsing, and deep call stacks that exceed Go's NOSPLIT stack constraints (~2KB). Therefore, we use optimized CGO with static linking instead.
package main
import (
"fmt"
"github.com/miretskiy/firn/polars"
)
func main() {
// Create DataFrame from CSV
df, err := polars.ReadCSV("employees.csv")
if err != nil {
panic(err)
}
// Chain operations efficiently
result := df.
Filter(polars.Col("age").Gt(25)).
WithColumns(
polars.Col("salary").Mul(1.1).Alias("salary_with_bonus"),
).
GroupBy("department").
Agg(
polars.Col("salary_with_bonus").Mean().Alias("avg_salary"),
polars.Col("age").Max().Alias("max_age"),
polars.Col("*").Count().Alias("employee_count"),
).
SortBy([]polars.SortField{polars.Desc("avg_salary")})
fmt.Printf("Processed %d rows\n", result.Height())
fmt.Println(result.String())
}
// Read CSV files (with glob pattern support)
df := polars.ReadCSV("data.csv")
df := polars.ReadCSV("data_*.csv") // Multiple files
// Read Parquet files with full optimization support
df := polars.ReadParquet("data.parquet")
// Advanced Parquet reading with options
df := polars.ReadParquetWithOptions("large_dataset.parquet", polars.ParquetOptions{
Columns: []string{"id", "name", "value"}, // Column pruning
NRows: 1000, // Row limiting
Parallel: true, // Parallel reading
WithGlob: true, // Glob pattern support
})
// Parquet with glob patterns for partitioned datasets
df := polars.ReadParquet("year=2024/month=*/data_*.parquet")// Create DataFrame from Go slices
df := polars.NewDataFrame(
polars.NewSeries("name", []string{"Alice", "Bob", "Charlie"}),
polars.NewSeries("age", []int{25, 30, 35}),
polars.NewSeries("salary", []float64{50000, 60000, 70000}),
)
// Or from a map
data := map[string]interface{}{
"product": []string{"A", "B", "C", "A", "B"},
"price": []float64{10.0, 15.0, 20.0, 12.0, 18.0},
"qty": []int{100, 200, 150, 80, 120},
}
df := polars.FromMap(data)
- Go 1.23+ (requires iterator support)
- CGO enabled (`CGO_ENABLED=1`)
- Rust toolchain (for building the Polars integration)
Since this project requires compiling Rust libraries, you cannot use `go get` directly. You must build from source:
git clone https://github.com/miretskiy/firn
cd firn
# Build Rust library and Go bindings
bazel build //rust:all
bazel build //polars:all
# Run tests to verify installation
bazel test //polars:all
# Or run Go tests directly with linker warning suppression
CGO_LDFLAGS="-w" go test -v ./polarsAfter building, you can import and use Firn in your Go projects:
import "github.com/miretskiy/firn/polars"If you see macOS version compatibility warnings during compilation like:
ld: warning: object file (...) was built for newer 'macOS' version (15.5) than being linked (15.0)
You can suppress them using any of these methods:
# Set once in your shell profile (.zshrc, .bashrc, etc.)
export CGO_LDFLAGS="-w"
# Then run tests/builds normally
go test -v ./polars
go build
# For testing
CGO_LDFLAGS="-w" go test -v ./polars
# For building
CGO_LDFLAGS="-w" go build
# For specific test runs
CGO_LDFLAGS="-w" go test -v -run TestBasicOperations ./polarsAdd to your project's build scripts or CI configuration:
# In your build script
export CGO_LDFLAGS="-w"
# Then run your normal commands
go test -v ./polars
go build
Note: These warnings are harmless - they occur because the Rust library was compiled with a newer macOS SDK than Go's default target. The `-w` flag suppresses all linker warnings.
You can also build using Bazel:
bazel build //polars:all
bazel test //polars:all
Some performance tests require large test data files that are not included in the repository due to GitHub's file size limits. These tests will be automatically skipped if the required files are not present.
Performance Tests Requiring Large Data:
- `TestPerformanceBenchmarks` - Tests with 10M+ row datasets
- Large weather data files (`weather_data_part_*.csv`) - ~340MB each
- 100M+ row aggregation tests
To generate test data locally:
# Generate large CSV test files (optional - for performance testing)
python3 scripts/generate_large_csv.py
# This creates weather_data_part_*.csv files in testdata/ and scripts/testdata/
# These files are automatically ignored by git (.gitignore)
What gets skipped without large data:
- Performance benchmarks on 10M+ row datasets
- Complex filtering tests on large datasets
- 100M row aggregation performance tests
All other tests work without large data:
- Core DataFrame operations (uses small `sample.csv`)
- Expression system tests
- Join operations
- SQL query tests
- Parquet integration tests (uses `fortune1000_2024.parquet`)
- Window functions
- Error handling tests
The repository includes smaller test files that cover all functionality:
- `testdata/sample.csv` - 7 rows for basic operations
- `testdata/fortune1000_2024.parquet` - Fortune 1000 companies data
- Various small CSV files for specific test scenarios
Our performance strategy focuses on minimizing the overhead that plagues existing Go-Polars solutions:
- Reduced CGO calls through operation batching
- Static linking with pre-compiled Rust libraries (.syso files)
- Zero-copy data sharing where possible
- Multi-architecture native binaries (ARM64/AMD64)
Firn includes comprehensive performance tests that demonstrate real-world DataFrame operations:
- 10M row operations: 76-88 million rows/second
- 100M row operations: 59-67 million rows/second
- Complex filtering and aggregations: Maintains high throughput on large datasets
- Memory efficiency: Automatic handle cleanup prevents memory leaks
Run benchmarks with:
# Run all tests including performance benchmarks
bazel test //polars:all
# Or with Go directly
CGO_LDFLAGS="-w" go test -v ./polars
# Detailed benchmarking
cd benchmarks && CGO_LDFLAGS="-w" go test -bench=. -benchmemFirn provides comprehensive Parquet support with advanced optimization features:
// Basic Parquet reading
df := polars.ReadParquet("dataset.parquet")
// Advanced Parquet with column pruning and row limiting
df := polars.ReadParquetWithOptions("large_dataset.parquet", polars.ParquetOptions{
Columns: []string{"id", "timestamp", "value"}, // Only read needed columns
NRows: 100000, // Limit rows for sampling
Parallel: true, // Enable parallel reading
WithGlob: true, // Support glob patterns
})
// Partitioned datasets with glob patterns
df := polars.ReadParquet("year=*/month=*/data_*.parquet")
// Combine with Firn operations for optimal performance
result := polars.ReadParquetWithOptions("fortune1000.parquet", polars.ParquetOptions{
Columns: []string{"Rank", "Company", "Revenue", "Sector"},
NRows: 100, // Top 100 companies
}).
Filter(polars.Col("Revenue").Gt(polars.Lit(50000))).
GroupBy("Sector").
Agg(
polars.Col("Revenue").Mean().Alias("avg_revenue"),
polars.Col("Company").Count().Alias("company_count"),
).
SortBy([]polars.SortField{polars.Desc("avg_revenue")}).
Collect()
Parquet Performance Benefits:
- Column Pruning: Only read columns you need, dramatically reducing I/O
- Row Limiting: Sample large datasets efficiently with the `NRows` parameter
- Parallel Reading: Leverage multiple cores for faster file processing
- Native Integration: Seamless integration with Firn's RPN stack machine
- Memory Efficient: Polars' zero-copy architecture minimizes memory usage
// Basic CSV reading
df := polars.ReadCSV("data.csv")
// Multiple files with glob patterns
df := polars.ReadCSV("data_part_*.csv")
// Advanced CSV options
df := polars.ReadCSVWithOptions("data.csv", hasHeader, inferSchema)
// Build computation graph without executing
lazy := polars.LazyFrame().
ReadCSV("large_file.csv").
Filter(polars.Col("status").Eq("active")).
GroupBy("category").
Agg(polars.Col("value").Sum()).
Sort("value", polars.Descending)
// Execute when ready - optimized query plan
result := lazy.Collect()
// Advanced column operations
df = df.WithColumns(
// Mathematical operations
polars.Col("price").Mul(polars.Col("quantity")).Alias("total"),
// String operations (basic operations available)
polars.Col("name").StrLen().Alias("name_length"),
polars.Col("name").StrToUppercase().Alias("name_upper"),
// Arithmetic and comparison
polars.Col("salary").Add(polars.Col("bonus")).Alias("total_comp"),
polars.Col("age").Gt(polars.Lit(30)).Alias("is_senior"),
)
Firn provides SQL CASE-like conditional expressions with a fluent API for building complex conditional logic:
// Basic conditional expression - equivalent to SQL CASE WHEN
df = df.WithColumns(
polars.When(polars.Col("age").Gt(polars.Lit(65))).
Then(polars.Lit("senior")).
Otherwise(polars.Lit("adult")).
Alias("age_category"),
)
// Multiple conditions with chained When statements
df = df.WithColumns(
polars.When(polars.Col("score").Gt(polars.Lit(90))).
Then(polars.Lit("A")).
When(polars.Col("score").Gt(polars.Lit(80))).
Then(polars.Lit("B")).
When(polars.Col("score").Gt(polars.Lit(70))).
Then(polars.Lit("C")).
Otherwise(polars.Lit("F")).
Alias("grade"),
)
// Complex conditional expressions with computed values
df = df.WithColumns(
polars.When(polars.Col("department").Eq(polars.Lit("Engineering"))).
Then(polars.Col("salary").Mul(polars.Lit(1.15))). // 15% bonus for engineers
When(polars.Col("department").Eq(polars.Lit("Sales"))).
Then(polars.Col("salary").Mul(polars.Lit(1.10))). // 10% bonus for sales
Otherwise(polars.Col("salary").Mul(polars.Lit(1.05))). // 5% for others
Alias("salary_with_bonus"),
)
// Use in filtering operations
filtered := df.Filter(
polars.When(polars.Col("status").Eq(polars.Lit("active"))).
Then(polars.Col("last_login").Gt(polars.Lit("2024-01-01"))).
Otherwise(polars.Lit(false)),
)
// Use in aggregations
summary := df.GroupBy("department").Agg(
polars.When(polars.Col("performance").Gt(polars.Lit(8.0))).
Then(polars.Lit(1)).
Otherwise(polars.Lit(0)).
Sum().Alias("high_performers"),
polars.Col("salary").Mean().Alias("avg_salary"),
)
Conditional Expression Features:
- SQL CASE Equivalent: Familiar `WHEN condition THEN value` syntax
- Chained Conditions: Multiple `When().Then()` pairs for complex logic
- Type Safety: Full type checking at compile time through Go's type system
- Performance Optimized: Compiled to native Polars conditional expressions
- Flexible Usage: Works in columns, filters, aggregations, and any expression context
- RPN Stack Integration: Zero CGO overhead during expression building
Firn provides flexible SQL support that can be mixed seamlessly with fluent-style expressions, giving you the best of both worlds:
// Execute SQL queries directly on DataFrames
// The DataFrame is automatically registered as "df" table
result := df.Query(`
SELECT name, salary * 1.1 as new_salary
FROM df
WHERE age > 25 AND department = 'Engineering'
`).Collect()
// Complex SQL with aggregations and grouping
summary := df.Query(`
SELECT
department,
AVG(salary) as avg_salary,
COUNT(*) as employee_count,
MAX(age) as max_age
FROM df
GROUP BY department
HAVING COUNT(*) > 2
ORDER BY avg_salary DESC
`).Collect()
// Mix SQL strings with fluent expressions for maximum flexibility
result := df.
Query("SELECT * FROM df WHERE active = true"). // SQL for complex filtering
WithColumns(polars.Col("bonus").Mul(polars.Lit(1.1))). // Fluent for type-safe operations
SortBy([]polars.SortField{polars.Desc("salary")}). // Fluent for programmatic control
Collect()
// Use SQL for what it's best at (complex queries, familiar syntax)
// Use fluent API for what it's best at (type safety, IDE support, composition)
complex := df.
Query(`
SELECT *,
CASE WHEN age > 50 THEN 'senior' ELSE 'junior' END as category
FROM df
WHERE department IN ('Engineering', 'Data Science')
`).
WithColumns(
polars.Col("salary").Quantile(0.95).Over("category").Alias("p95_salary"),
polars.Col("performance_score").Rank().Over("department").Alias("dept_rank"),
).
Filter(polars.Col("dept_rank").Lt(polars.Lit(10))).
Collect()
// Basic join operations
employees, _ := polars.ReadCSV("employees.csv").Collect()
departments, _ := polars.ReadCSV("departments.csv").Collect()
// Inner join (most common)
result, _ := employees.InnerJoin(departments, "dept_id").Collect()
// Left join with all employees, even those without departments
result, _ := employees.LeftJoin(departments, "dept_id").Collect()
// Advanced join with different column names
result, _ := employees.Join(departments,
polars.LeftOn("department_id").RightOn("id")).Collect()
// Join with custom suffix for duplicate columns
result, _ := employees.Join(departments,
polars.On("dept_id").WithType(polars.JoinTypeLeft).WithSuffix("_dept")).Collect()
// Cross join (Cartesian product)
result, _ := employees.CrossJoin(departments).Collect()
// Concatenate DataFrames vertically
combined, _ := polars.Concat(df1, df2, df3).Collect()
// Window operations
df = df.WithColumns(
// Running sum
polars.Col("sales").Sum().Over("department").Alias("dept_total"),
// Rank within groups
polars.Col("score").Rank().Over("team").Alias("team_rank"),
// Moving average (using Over with partition - window functions need partitioning)
polars.Col("price").Mean().Over("date").Alias("price_ma7"),
)
// Operations build an execution plan without CGO calls
result, err := df.
Filter(polars.Col("active").Eq(true)). // No CGO - builds operation
WithColumns(polars.Col("a").Add(polars.Col("b")).Alias("computed")). // No CGO
SortBy([]polars.SortField{polars.Desc("timestamp")}). // No CGO
Execute() // Single CGO call executes all
Performance comparison (M4 Mac measurements):
Traditional go-polars approach:
- `df.Filter(...)` → ~22ns CGO overhead + C string alloc/dealloc costs
- `df.WithColumns(...)` → ~22ns CGO overhead + C string alloc/dealloc costs
- `df.Sort(...)` → ~22ns CGO overhead + C string alloc/dealloc costs
- Total: ~66ns + 3x string allocation/free overhead + actual work
Firn's batched approach: ~22ns + actual work
Key architectural advantage: Firn's Operation argument buffers remain alive for the duration of the CGO call, allowing raw strings to be passed to Rust as a pointer and length (char* + len) without CGO allocation/deallocation. Rust copies these buffers as needed, eliminating repeated boundary costs.
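A rough illustration of this argument-capture idea, with hypothetical names rather than Firn's actual internals: the Go string's bytes are exposed as a pointer/length pair that stays valid for the single CGO call, and the Rust side copies what it needs.

```go
// Hypothetical sketch of passing a Go string to the Rust side as a
// (pointer, length) pair with no C string allocation; names are
// illustrative, not Firn's internal API.
package main

import (
	"fmt"
	"unsafe"
)

// stringArg mirrors the idea of an argument block that must stay alive
// for the duration of the single CGO call.
type stringArg struct {
	ptr unsafe.Pointer
	len int
}

func makeStringArg(s string) stringArg {
	if len(s) == 0 {
		return stringArg{}
	}
	// unsafe.StringData exposes the string's bytes without copying; the
	// caller must keep s reachable until the FFI call returns, after which
	// Rust has copied whatever it needs.
	return stringArg{ptr: unsafe.Pointer(unsafe.StringData(s)), len: len(s)}
}

func main() {
	arg := makeStringArg("salary")
	fmt.Printf("ptr=%p len=%d\n", arg.ptr, arg.len)
}
```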
Firn implements a Reverse Polish Notation (RPN) stack machine for expression evaluation to optimize FFI performance:
// Go side: Build expression as operation sequence
expr := Col("salary").Mul(Lit(2)).Add(Col("bonus"))
// Generates RPN sequence:
// [push_col("salary"), push_lit(2), mul, push_col("bonus"), add]
// Single FFI call executes entire expression tree
result, err := df.WithColumns(expr.Alias("total_comp")).Execute()
- Single FFI Call: Entire expression trees execute in one CGO boundary crossing
- Zero CGO During Construction: Expressions build locally in Go with zero CGO overhead
- Native Polars Integration: Stack operations map directly to `polars::Expr` operations
- Memory Efficient: Linear operation sequence vs heap-allocated expression trees
- Type Safe: All operations validated at the Rust boundary with proper error reporting
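As a rough illustration of the zero-CGO construction described above, here is a hypothetical Go sketch of how an expression chain could accumulate its RPN sequence locally; the types and methods are illustrative, not Firn's actual API.

```go
// Hypothetical sketch of an expression node accumulating an RPN sequence
// locally in Go; type and method names are illustrative, not Firn's API.
package main

import "fmt"

type exprOp struct {
	name string      // "push_col", "push_lit", "mul", "add", ...
	arg  interface{} // column name or literal value, nil for operators
}

// exprNode records operations in postfix order; no CGO is involved.
type exprNode struct{ ops []exprOp }

func Col(name string) exprNode   { return exprNode{ops: []exprOp{{name: "push_col", arg: name}}} }
func Lit(v interface{}) exprNode { return exprNode{ops: []exprOp{{name: "push_lit", arg: v}}} }

// binary concatenates both operand sequences and appends the operator,
// producing the postfix (RPN) order expected by the Rust stack machine.
func (e exprNode) binary(op string, rhs exprNode) exprNode {
	ops := append(append([]exprOp{}, e.ops...), rhs.ops...)
	return exprNode{ops: append(ops, exprOp{name: op})}
}

func (e exprNode) Mul(rhs exprNode) exprNode { return e.binary("mul", rhs) }
func (e exprNode) Add(rhs exprNode) exprNode { return e.binary("add", rhs) }

func main() {
	expr := Col("salary").Mul(Lit(2)).Add(Col("bonus"))
	for _, op := range expr.ops {
		fmt.Print(op.name, " ") // push_col push_lit mul push_col add
	}
	fmt.Println()
}
```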
// Rust side: Execute RPN sequence on expression stack
let mut expr_stack: Vec<Expr> = Vec::new();
for operation in operations {
match operation.func_ptr {
expr_column => expr_stack.push(col(&args.name)),
expr_literal => expr_stack.push(lit(args.value)),
expr_mul => {
let right = expr_stack.pop().unwrap();
let left = expr_stack.pop().unwrap();
expr_stack.push(left * right);
}
// ... other operations
}
}
// Traditional approach (multiple CGO calls):
df.Filter(col.Gt(5)) // ~22ns CGO overhead
.WithColumns(expr) // ~22ns CGO overhead
.Sort("name") // ~22ns CGO overhead
// Total: 66ns + actual work
// Stack machine approach (single CGO call):
df.Filter(col.Gt(5)).WithColumns(expr).Sort("name").Execute()
// Total: 22ns + actual work (3x improvement!)
type Operation struct {
funcPtr unsafe.Pointer // Points to Rust dispatch function
args unsafe.Pointer // Operation-specific arguments
}
Advantages:
- Uniform Interface: All operations use a `(handle, context) -> Result` signature
- RPN Evaluation: Natural expression tree evaluation via stack machine
- Type Safety: Each operation validates its specific argument types
- Performance: Direct function calls, no opcode dispatch overhead
match operation.opcode {
OP_FILTER => dispatch_filter(handle, args),
OP_SELECT => dispatch_select(handle, args),
// ... opcode-based dispatch system
}
Why Selected:
- Uniform Interface: Consistent opcode-based dispatch system
- Type Safety: Each operation validates its specific argument types
- Extensible: Easy to add new operations by defining new opcodes
- Performance: Direct opcode matching with minimal overhead
The opcode dispatch system provides a clean, extensible architecture for operation handling.
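To connect this back to the Go layer's `[]Operation{opcode,args}` queue from the architecture diagram, here is a minimal, hypothetical sketch of how operations might be accumulated with zero CGO until a single batched call; the constants, types, and method names are illustrative, not Firn's internal API.

```go
// Hypothetical sketch of a Go-side operation queue feeding an opcode
// dispatcher; constants, types, and the final FFI hand-off are illustrative.
package main

import "fmt"

type opcode uint32

const (
	opFilter opcode = iota
	opSort
	opGroupBy
)

// operation pairs an opcode with its already-encoded arguments.
type operation struct {
	code opcode
	args string
}

// frame accumulates operations without any CGO calls.
type frame struct{ ops []operation }

func (f *frame) push(code opcode, args string) *frame {
	f.ops = append(f.ops, operation{code: code, args: args})
	return f
}

func main() {
	f := (&frame{}).
		push(opFilter, `age > 25`).
		push(opGroupBy, `department`).
		push(opSort, `avg_salary desc`)
	// A real Collect()/Execute() would now hand f.ops to Rust in ONE FFI call.
	fmt.Printf("batched %d operations into a single call\n", len(f.ops))
}
```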
# Build library
bazel build //rust:all //polars:all
# Run tests
bazel test //polars:all
# Or with Go directly
CGO_LDFLAGS="-w" go test -v ./polars
# Run benchmarks
cd benchmarks && CGO_LDFLAGS="-w" go test -bench=. -benchmem
# CPU/memory profiling
CGO_LDFLAGS="-w" go test -cpuprofile=cpu.prof -memprofile=mem.prof -bench=. ./polars- Rust library automatically built via
scripts/build_rust.sh - Type-safe bindings in
internal/cgo/andinternal/ffi/ - Memory management handled automatically
- Project structure and architecture design
- RPN stack machine implementation with function pointers
- Unified dispatch system with ExecutionContext
- Core DataFrame and Series types
- Basic I/O operations (CSV with glob support)
- Parquet I/O operations with column pruning and row limiting
- Memory management and safety (automatic handle cleanup)
- Expression system with move semantics
- Column operations and expressions (Col, Lit, arithmetic, boolean)
- Filtering with complex expressions
- Selection and projection operations
- WithColumns for computed columns (single and multiple)
- Comprehensive aggregation operations (Count, Sum, Mean, Min, Max, Median, First, Last, NUnique, Std, Var)
- Null-aware operations (IsNull, IsNotNull, Count vs CountWithNulls)
- Statistical functions with ddof parameter support
- DataFrame concatenation
- Expression aliases and column naming
- Deferred execution API for performance (Execute pattern)
- Complex expression composition (chained operations)
- Multi-file operations with glob patterns
- GroupBy and aggregation operations (complete implementation)
- Sort operations with multi-column and nulls ordering support
- String operations (Tier 1: length, contains, starts/ends with, case conversion)
- Context-aware lazy evaluation (DataFrame, LazyFrame, LazyGroupBy)
- SQL query support with full Polars SQL syntax
- Join operations (inner, left, right, outer, cross) with comprehensive API
- Window functions and rolling operations
- Conditional expressions (When/Then/Otherwise) - SQL CASE-like functionality
- Cast operations with comprehensive data type support
- Advanced string operations (Tier 2: slice, replace, split)
- Date/time operations
- Golden test framework for output validation
- Multi-architecture support (ARM64/AMD64)
- Extended I/O support (JSON, Arrow, ORC, Avro)
- Go extension framework for custom data sources
- Plugin system for user-defined functions
- Streaming I/O for large datasets
- API Reference - Complete API documentation
- Performance Guide - Optimization techniques
- CGO Integration - Internal architecture details
- Benchmarking - Performance measurement
We welcome contributions! Here's how to get started:
- Clone the repository
- Build the project with `bazel build //rust:all //polars:all`
- Run tests with `bazel test //polars:all` or `CGO_LDFLAGS="-w" go test -v ./polars`
- Make your changes and add tests
- Ensure all tests pass before submitting
- Submit a pull request
- Follow existing code style and patterns
- Add tests for new functionality
- Update documentation as needed
- Ensure all tests pass before submitting
- AI Tools Encouraged: Use of AI tools such as Cline is welcomed and encouraged for development
Licensed under the Apache License, Version 2.0.
- Polars - The amazing DataFrame library
- SIMBA - High-performance FFI inspiration
- go-polars - Prior art and inspiration
Built for speed. Designed for scale. Optimized for Go.