GitHub - arunkatherashala/Kore · GitHub

[ad_1]

The fastest, most compressed columnar format for big data | v0.1.0

KORE is a high-performance binary file format optimized for analytical workloads. It provides:

38% compression ratio (vs 63% for Parquet)
131x query speedup with column pruning & predicate pushdown
Zero data loss verification (400K+ cells tested)
Native Spark integration — read/write with PySpark

Add this crate as a dependency (when published) or include from path:

use kore_fileformat::*;

// Write data
kore_write_simple("output.kore", schema_json, data_json)?;

// Read data
let data = kore_read_simple("output.kore")?;

// Read specific column
let col = kore_read_col_simple("output.kore", "column_name")?;

// Get file info
let info = kore_info_simple("output.kore")?;

PySpark Integration ⭐ NEW

from pyspark.sql import SparkSession
from kore import KoreDataFrameReader, KoreDataFrameWriter

spark = SparkSession.builder.appName("KoreExample").getOrCreate()

# Read Kore file
df = KoreDataFrameReader(spark).load("data.kore")

# Write to Kore (38% compression!)
KoreDataFrameWriter(df).mode("overwrite").save("output.kore")

# Spark SQL support (3.5+)
spark.read.format("kore").load("file.kore").show()

See python/README.md for full PySpark documentation.

Publishing checklist

Ensure Cargo.toml metadata is correct (authors, repository, keywords).
Add LICENSE file if required (MIT by default here).
Replace any unimplemented!() stubs with full implementations if you need runtime functionality.
Run cargo build --release and cargo test to verify compilation and tests.
Optionally add CI configuration (GitHub Actions) for cargo test and cargo clippy.

Notes

This workspace contains copies of the original KORE source files. Some long implementations were stubbed out in this initial export; if you want the full original source code included verbatim, I can replace the stubs with the complete implementations from the upstream project files.

[ad_2]

Source link

Post Views: 18

GitHub – arunkatherashala/Kore · GitHub

PySpark Integration ⭐ NEW

Leave a Reply Cancel reply