JSON Tools RS

A high-performance Rust library for advanced JSON manipulation with SIMD-accelerated parsing, Crossbeam-based parallelism, and native Python bindings with DataFrame/Series support.


Why JSON Tools RS?

JSON Tools RS is designed for developers who need to:

  • Transform nested JSON into flat structures for databases, CSV exports, or analytics
  • Clean and normalize JSON data from external APIs or user input
  • Process large batches of JSON documents efficiently
  • Maintain type safety with perfect roundtrip support (flatten -> unflatten -> original)
  • Work with both Rust and Python using the same consistent API

Key Features

  • Unified API -- Single JSONTools entry point for flattening, unflattening, or pass-through transforms
  • Builder Pattern -- Fluent, chainable API for configuration
  • High Performance -- SIMD-accelerated parsing, FxHashMap, SmallVec stack allocation, tiered caching (~2,000+ ops/ms)
  • Parallel Processing -- Crossbeam-based parallelism for 3-5x speedup on batch operations
  • Complete Roundtrip -- Flatten and unflatten with perfect fidelity
  • Comprehensive Filtering -- Remove empty strings, nulls, empty objects, empty arrays
  • Advanced Replacements -- Literal and regex-based key/value replacements
  • Collision Handling -- Collect colliding values into arrays
  • Automatic Type Conversion -- Strings to numbers, booleans, dates, and nulls
  • Date Normalization -- ISO-8601 detection and UTC normalization
  • Batch Processing -- Single or batch JSON, dicts, lists, DataFrames, and Series
  • Python Bindings -- Full Python support with perfect type preservation
  • DataFrame/Series Support -- Pandas, Polars, PyArrow, and PySpark
  • Modular Architecture -- 10 focused modules for maintainability with zero-overhead abstraction

Quick Example

Rust:

use json_tools_rs::{JSONTools, JsonOutput};

let result = JSONTools::new()
    .flatten()
    .execute(r#"{"user": {"name": "John", "age": 30}}"#)?;
// {"user.name": "John", "user.age": 30}

Python:

import json_tools_rs as jt

result = jt.JSONTools().flatten().execute({"user": {"name": "John", "age": 30}})
# {'user.name': 'John', 'user.age': 30}

Installation

Rust

Add to your Cargo.toml:

cargo add json-tools-rs

Or manually:

[dependencies]
json-tools-rs = "0.9"

Python

Install from PyPI:

pip install json-tools-rs

Pre-built wheels are available for:

| Platform | Architectures |
|---|---|
| Linux (glibc) | x86_64, x86, aarch64, armv7, ppc64le |
| Linux (musl) | x86_64, x86, aarch64, armv7 |
| macOS | x86_64 (Intel), aarch64 (Apple Silicon) |
| Windows | x64 |

Python 3.8+ is supported.

Verify Installation

Rust:

use json_tools_rs::JSONTools;

fn main() {
    let result = JSONTools::new()
        .flatten()
        .execute(r#"{"hello": "world"}"#)
        .unwrap();
    println!("{:?}", result);
}

Python:

import json_tools_rs as jt

result = jt.JSONTools().flatten().execute({"hello": "world"})
print(result)  # {'hello': 'world'}

Quick Start (Rust)

The JSONTools struct provides a unified builder pattern API. Call .flatten() or .unflatten() to set the mode, chain configuration methods, then call .execute().

Basic Flattening

use json_tools_rs::{JSONTools, JsonOutput};

let json = r#"{"user": {"name": "John", "profile": {"age": 30, "city": "NYC"}}}"#;
let result = JSONTools::new()
    .flatten()
    .execute(json)?;

if let JsonOutput::Single(flattened) = result {
    println!("{}", flattened);
}
// {"user.name": "John", "user.profile.age": 30, "user.profile.city": "NYC"}

Basic Unflattening

use json_tools_rs::{JSONTools, JsonOutput};

let json = r#"{"user.name": "John", "user.profile.age": 30}"#;
let result = JSONTools::new()
    .unflatten()
    .execute(json)?;

if let JsonOutput::Single(nested) = result {
    println!("{}", nested);
}
// {"user": {"name": "John", "profile": {"age": 30}}}

Advanced Configuration

use json_tools_rs::{JSONTools, JsonOutput};

let json = r#"{"user": {"name": "John", "details": {"age": null, "city": ""}}}"#;
let result = JSONTools::new()
    .flatten()
    .separator("::")
    .lowercase_keys(true)
    .remove_empty_strings(true)
    .remove_nulls(true)
    .execute(json)?;

if let JsonOutput::Single(flattened) = result {
    println!("{}", flattened);
}
// {"user::name": "John"}

Batch Processing

use json_tools_rs::{JSONTools, JsonOutput};

let batch = vec![
    r#"{"user": {"name": "Alice"}}"#,
    r#"{"user": {"name": "Bob"}}"#,
    r#"{"user": {"name": "Charlie"}}"#,
];

let result = JSONTools::new()
    .flatten()
    .separator("_")
    .execute(batch.as_slice())?;

if let JsonOutput::Multiple(results) = result {
    for r in &results {
        println!("{}", r);
    }
}
// {"user_name": "Alice"}
// {"user_name": "Bob"}
// {"user_name": "Charlie"}

Error Handling

use json_tools_rs::{JSONTools, JsonToolsError};

match JSONTools::new().flatten().execute("invalid json") {
    Ok(result) => println!("{:?}", result),
    Err(e) => {
        eprintln!("Error [{}]: {}", e.error_code(), e);
        // Error [E001]: JSON parse error: ...
    }
}

Quick Start (Python)

The Python bindings provide the same JSONTools API with perfect type matching: input type equals output type.

Type Preservation

| Input Type | Output Type |
|---|---|
| str | str (JSON string) |
| dict | dict |
| list[str] | list[str] |
| list[dict] | list[dict] |
| DataFrame | DataFrame (Pandas, Polars, PyArrow, PySpark) |
| Series | Series (Pandas, Polars, PyArrow) |

Basic Flattening

import json_tools_rs as jt

# Dict input -> dict output
result = jt.JSONTools().flatten().execute({"user": {"name": "John", "age": 30}})
print(result)  # {'user.name': 'John', 'user.age': 30}

# String input -> string output
result = jt.JSONTools().flatten().execute('{"user": {"name": "John"}}')
print(result)  # '{"user.name": "John"}'

Basic Unflattening

import json_tools_rs as jt

result = jt.JSONTools().unflatten().execute({"user.name": "John", "user.age": 30})
print(result)  # {'user': {'name': 'John', 'age': 30}}

Advanced Configuration

import json_tools_rs as jt

tools = (jt.JSONTools()
    .flatten()
    .separator("::")
    .lowercase_keys(True)
    .remove_empty_strings(True)
    .remove_nulls(True)
    .key_replacement("regex:^user_", "")
    .auto_convert_types(True)
)

data = {"User_Name": "Alice", "User_Age": "30", "User_Status": None}
result = tools.execute(data)
print(result)  # {'name': 'Alice', 'age': 30}

Batch Processing

import json_tools_rs as jt

tools = jt.JSONTools().flatten()

# List of dicts -> list of dicts
results = tools.execute([
    {"user": {"name": "Alice"}},
    {"user": {"name": "Bob"}},
])
print(results)  # [{'user.name': 'Alice'}, {'user.name': 'Bob'}]

# List of strings -> list of strings
results = tools.execute(['{"a": {"b": 1}}', '{"c": {"d": 2}}'])
print(results)  # ['{"a.b": 1}', '{"c.d": 2}']

DataFrame Support

import json_tools_rs as jt
import pandas as pd

df = pd.DataFrame([
    {"user": {"name": "Alice", "age": 30}},
    {"user": {"name": "Bob", "age": 25}},
])

result = jt.JSONTools().flatten().execute(df)
print(type(result))  # <class 'pandas.core.frame.DataFrame'>
# Also works with Polars, PyArrow Tables, and PySpark DataFrames

Error Handling

import json_tools_rs as jt

try:
    result = jt.JSONTools().flatten().execute("invalid json")
except jt.JsonToolsError as e:
    print(f"Error: {e}")

Flattening & Unflattening

Flattening

Flattening converts nested JSON into a flat key-value structure using dot-separated (or custom) keys.

// Input
{"user": {"name": "John", "address": {"city": "NYC", "zip": "10001"}}}

// Output (flattened)
{"user.name": "John", "user.address.city": "NYC", "user.address.zip": "10001"}

Arrays

Arrays are flattened with numeric indices:

// Input
{"users": [{"name": "Alice"}, {"name": "Bob"}]}

// Output
{"users.0.name": "Alice", "users.1.name": "Bob"}
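The key scheme above can be modeled in a few lines of plain Python. This is a reference sketch of the behavior only, not the library's SIMD-accelerated Rust implementation:

```python
# Reference model of the flattening scheme (illustrative sketch).
def flatten(value, sep=".", prefix=""):
    out = {}
    if isinstance(value, dict) and value:
        for k, v in value.items():
            out.update(flatten(v, sep, f"{prefix}{sep}{k}" if prefix else k))
    elif isinstance(value, list) and value:
        for i, v in enumerate(value):
            path = f"{prefix}{sep}{i}" if prefix else str(i)
            out.update(flatten(v, sep, path))
    else:
        out[prefix] = value  # scalar (or empty container): emit the path
    return out

print(flatten({"users": [{"name": "Alice"}, {"name": "Bob"}]}))
# {'users.0.name': 'Alice', 'users.1.name': 'Bob'}
```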

Custom Separators

Use .separator() to change the key delimiter:

Rust:

let result = JSONTools::new()
    .flatten()
    .separator("::")
    .execute(json)?;
// {"user::name": "John", "user::address::city": "NYC"}

Python:

result = jt.JSONTools().flatten().separator("::").execute(data)

Unflattening

Unflattening reverses the process, reconstructing nested structures from flat keys.

// Input
{"user.name": "John", "user.address.city": "NYC"}

// Output (unflattened)
{"user": {"name": "John", "address": {"city": "NYC"}}}

Numeric keys reconstruct arrays:

// Input
{"users.0.name": "Alice", "users.1.name": "Bob"}

// Output
{"users": [{"name": "Alice"}, {"name": "Bob"}]}
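The reverse mapping can be sketched the same way. This simplified model assumes dense, zero-based numeric indices; the real implementation also caps indices via max_array_index:

```python
# Simplified model of unflattening (illustrative sketch only).
def unflatten(flat, sep="."):
    root = {}
    for key, value in flat.items():
        parts = key.split(sep)
        node = root
        for part in parts[:-1]:
            node = node.setdefault(part, {})
        node[parts[-1]] = value

    def listify(node):
        # Dicts whose keys are all digits become lists.
        if isinstance(node, dict):
            node = {k: listify(v) for k, v in node.items()}
            if node and all(k.isdigit() for k in node):
                return [node[k] for k in sorted(node, key=int)]
        return node

    return listify(root)

print(unflatten({"users.0.name": "Alice", "users.1.name": "Bob"}))
# {'users': [{'name': 'Alice'}, {'name': 'Bob'}]}
```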

Roundtrip

Flattening and unflattening are perfect inverses. You can flatten data, apply transformations, then unflatten to recover the original structure:

let original = r#"{"user": {"name": "John", "scores": [10, 20, 30]}}"#;

// Flatten
let flat = JSONTools::new().flatten().execute(original)?;

// Unflatten back
let restored = JSONTools::new().unflatten().execute(
    &flat.into_single()
)?;
// Matches original structure

All configuration options (filtering, replacements, collision handling, type conversion) work with both .flatten() and .unflatten() modes.

Filtering

Remove unwanted values during flattening or unflattening.

Available Filters

| Method | Removes |
|---|---|
| .remove_empty_strings(true) | "" empty string values |
| .remove_nulls(true) | null values |
| .remove_empty_objects(true) | {} empty objects |
| .remove_empty_arrays(true) | [] empty arrays |

Example

Rust:

use json_tools_rs::{JSONTools, JsonOutput};

let json = r#"{
    "name": "John",
    "bio": "",
    "age": null,
    "tags": [],
    "metadata": {},
    "city": "NYC"
}"#;

let result = JSONTools::new()
    .flatten()
    .remove_empty_strings(true)
    .remove_nulls(true)
    .remove_empty_arrays(true)
    .remove_empty_objects(true)
    .execute(json)?;

// Result: {"name": "John", "city": "NYC"}

Python:

import json_tools_rs as jt

data = {
    "name": "John",
    "bio": "",
    "age": None,
    "tags": [],
    "metadata": {},
    "city": "NYC",
}

result = (jt.JSONTools()
    .flatten()
    .remove_empty_strings(True)
    .remove_nulls(True)
    .remove_empty_arrays(True)
    .remove_empty_objects(True)
    .execute(data)
)
# {'name': 'John', 'city': 'NYC'}

Filtering with Unflatten

Filters also work during unflattening, applied after the nested structure is reconstructed:

let result = JSONTools::new()
    .unflatten()
    .remove_nulls(true)
    .remove_empty_strings(true)
    .execute(flat_json)?;

Combining Filters

All filters can be combined freely. They are applied after the flatten/unflatten operation completes.
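The four filters above can be modeled against an already-flattened map in plain Python. This is an illustrative sketch; the keyword names simply mirror the builder options:

```python
# Sketch: apply the four filters to a flattened key-value map.
def apply_filters(flat, *, empty_strings=False, nulls=False,
                  empty_objects=False, empty_arrays=False):
    def keep(v):
        if empty_strings and v == "":
            return False
        if nulls and v is None:
            return False
        if empty_objects and v == {}:
            return False
        if empty_arrays and v == []:
            return False
        return True
    return {k: v for k, v in flat.items() if keep(v)}

flat = {"name": "John", "bio": "", "age": None, "tags": [], "meta": {}}
print(apply_filters(flat, empty_strings=True, nulls=True,
                    empty_objects=True, empty_arrays=True))
# {'name': 'John'}
```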

Key & Value Replacements

Replace patterns in keys and/or values using literal strings or regular expressions.

Key Replacements

Rust:

let result = JSONTools::new()
    .flatten()
    .key_replacement("user_profile_", "")  // Literal
    .key_replacement("regex:(User|Admin)_", "")  // Regex
    .execute(json)?;

Python:

result = (jt.JSONTools()
    .flatten()
    .key_replacement("user_profile_", "")
    .key_replacement("regex:(User|Admin)_", "")
    .execute(data)
)

Value Replacements

let result = JSONTools::new()
    .flatten()
    .value_replacement("@example.com", "@company.org")  // Literal
    .value_replacement("regex:^super$", "administrator")  // Regex
    .execute(json)?;

Regex Syntax

Prefix patterns with regex: to use regular expressions. The regex engine uses standard Rust regex syntax.

| Pattern | Description |
|---|---|
| "old" | Literal string replacement |
| "regex:^prefix_" | Regex: match start of string |
| "regex:(a\|b)_" | Regex: alternation |
| "regex:\\d+" | Regex: digit sequences |

Multiple Replacements

You can chain multiple key and value replacements. They are applied in order:

let result = JSONTools::new()
    .flatten()
    .key_replacement("prefix_", "")
    .key_replacement("_suffix", "")
    .key_replacement("_", ".")
    .value_replacement("@old.com", "@new.com")
    .value_replacement("regex:^admin$", "administrator")
    .execute(json)?;
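The dispatch rule and in-order application can be modeled in plain Python: a pattern is treated as a regex only when it carries the regex: prefix, otherwise it is a literal substring replacement. This is an illustrative sketch; the library itself uses the Rust regex engine:

```python
import re

# Model of replacement dispatch: "regex:" prefix selects regex mode,
# and replacements apply in the order they were added.
def apply_replacements(text, replacements):
    for find, repl in replacements:
        if find.startswith("regex:"):
            text = re.sub(find[len("regex:"):], repl, text)
        else:
            text = text.replace(find, repl)
    return text

key = "prefix_user_name_suffix"
print(apply_replacements(key, [("prefix_", ""), ("_suffix", ""), ("_", ".")]))
# user.name
```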

Real-World Example

Normalizing an API response:

let result = JSONTools::new()
    .flatten()
    .separator("::")
    .lowercase_keys(true)
    .key_replacement("regex:(api_response|user_data)::", "")
    .key_replacement("_", ".")
    .value_replacement("@example.com", "@company.org")
    .remove_empty_strings(true)
    .remove_nulls(true)
    .execute(api_response)?;

Key Collision Handling

When key replacements or transformations cause multiple keys to map to the same output key, collision handling determines what happens.

Enabling Collision Handling

Rust:

let result = JSONTools::new()
    .flatten()
    .key_replacement("regex:(User|Admin)_", "")
    .handle_key_collision(true)
    .execute(json)?;

Python:

result = (jt.JSONTools()
    .flatten()
    .key_replacement("regex:(User|Admin)_", "")
    .handle_key_collision(True)
    .execute(data)
)

How It Works

With .handle_key_collision(true), when two keys collide after transformation, their values are collected into an array:

// Input
{"User_name": "John", "Admin_name": "Jane"}

// With key_replacement("regex:(User|Admin)_", "") + handle_key_collision(true)
// Output
{"name": ["John", "Jane"]}

Without collision handling, the last value wins (overwrites previous values).
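The collect-into-array behavior can be sketched in plain Python. This is an illustrative model, not the library's implementation: the first writer stores a scalar, and later colliders grow the slot into an array:

```python
import re

# Sketch of collision handling: repeated keys accumulate into a list
# instead of overwriting the previous value.
def insert_with_collision(out, key, value):
    if key not in out:
        out[key] = value
    elif isinstance(out[key], list):
        out[key].append(value)
    else:
        out[key] = [out[key], value]

out = {}
for k, v in {"User_name": "John", "Admin_name": "Jane"}.items():
    insert_with_collision(out, re.sub(r"(User|Admin)_", "", k), v)
print(out)  # {'name': ['John', 'Jane']}
```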

Collision with Filtering

Collision handling respects filters. If a colliding value would be filtered out (e.g., empty string with .remove_empty_strings(true)), it is excluded from the collected array:

// Input
{"User_name": "John", "Admin_name": "", "Guest_name": "Bob"}

// With key_replacement("regex:(User|Admin)_", "") + remove_empty_strings(true)
//   + handle_key_collision(true)
// Output
{"name": ["John"], "Guest_name": "Bob"}

Works with Both Modes

Collision handling works during both .flatten() and .unflatten() operations.

Automatic Type Conversion

When .auto_convert_types(true) is enabled, string values are automatically converted to their appropriate types.

Enabling

Rust:

let result = JSONTools::new()
    .flatten()
    .auto_convert_types(true)
    .execute(json)?;

Python:

result = jt.JSONTools().flatten().auto_convert_types(True).execute(data)

Conversion Rules

Conversions are applied in priority order: dates -> nulls -> booleans -> numbers.

Dates (ISO-8601)

Date strings are detected and normalized to UTC:

| Input | Output |
|---|---|
| "2024-01-15" | "2024-01-15" (kept as-is, not a number) |
| "2024-01-15T10:30:00+05:00" | "2024-01-15T05:30:00Z" (normalized to UTC) |
| "2024-01-15T10:30:00Z" | "2024-01-15T10:30:00Z" |
| "2024-01-15T10:30:00" | "2024-01-15T10:30:00" (naive, kept as-is) |

Nulls

| Input | Output |
|---|---|
| "null", "NULL" | null |
| "nil", "NIL" | null |
| "none", "NONE" | null |
| "N/A", "n/a" | null |

Booleans

| Input | Output |
|---|---|
| "true", "TRUE", "True" | true |
| "false", "FALSE", "False" | false |
| "yes", "YES" | true |
| "no", "NO" | false |
| "on", "ON" | true |
| "off", "OFF" | false |
| "y", "Y" | true |
| "n", "N" | false |

Note: "1" and "0" are treated as numbers, not booleans.

Numbers

| Format | Input | Output |
|---|---|---|
| Basic integers | "123" | 123 |
| Decimals | "45.67" | 45.67 |
| Negative | "-10" | -10 |
| US thousands | "1,234.56" | 1234.56 |
| EU thousands | "1.234,56" | 1234.56 |
| Space separators | "1 234.56" | 1234.56 |
| Currency | "$1,234.56", "EUR999" | 1234.56, 999 |
| Percentages | "50%", "12.5%" | 50.0, 12.5 |
| Scientific | "1e5", "1.23e-4" | 100000, 0.000123 |
| Basis points | "50bps", "100 bp" | 0.005, 0.01 |
| Suffixes | "1K", "2.5M", "5B" | 1000, 2500000, 5000000000 |

Non-Convertible Strings

Strings that don't match any pattern are left as-is:

{"name": "Alice", "code": "ABC"} -> {"name": "Alice", "code": "ABC"}
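The priority chain can be sketched in plain Python. This illustrative model covers nulls, booleans, and simple US-style numbers only; date detection and the locale-aware formats (EU separators, basis points, K/M/B suffixes) are omitted for brevity, and it is not the library's actual parser:

```python
# Sketch of the conversion chain: nulls -> booleans -> numbers,
# falling back to the original string when nothing matches.
NULLS = {"null", "nil", "none", "n/a"}
TRUES = {"true", "yes", "on", "y"}
FALSES = {"false", "no", "off", "n"}

def convert(s):
    low = s.lower()
    if low in NULLS:
        return None
    if low in TRUES:
        return True
    if low in FALSES:
        return False
    try:
        cleaned = s.replace(",", "").lstrip("$")  # US thousands + currency
        num = float(cleaned)
        return int(num) if num.is_integer() else num
    except ValueError:
        return s  # non-convertible strings are left as-is

print([convert(x) for x in ["N/A", "yes", "123", "$1,234.56", "hello"]])
# [None, True, 123, 1234.56, 'hello']
```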

Full Example

let json = r#"{
    "id": "123",
    "price": "$1,234.56",
    "discount": "15%",
    "active": "yes",
    "created": "2024-01-15T10:30:00+05:00",
    "status": "N/A",
    "name": "Product"
}"#;

let result = JSONTools::new()
    .flatten()
    .auto_convert_types(true)
    .execute(json)?;

// {
//   "id": 123,
//   "price": 1234.56,
//   "discount": 15.0,
//   "active": true,
//   "created": "2024-01-15T05:30:00Z",
//   "status": null,
//   "name": "Product"
// }

Normal Mode

Normal mode applies transformations (filtering, replacements, type conversion) without flattening or unflattening the JSON structure.

Usage

Rust:

let result = JSONTools::new()
    .normal()
    .lowercase_keys(true)
    .remove_nulls(true)
    .remove_empty_strings(true)
    .auto_convert_types(true)
    .execute(json)?;

Python:

result = (jt.JSONTools()
    .normal()
    .lowercase_keys(True)
    .remove_nulls(True)
    .remove_empty_strings(True)
    .auto_convert_types(True)
    .execute(data)
)

When to Use Normal Mode

Use .normal() when you want to:

  • Clean data without changing its structure
  • Apply key transformations (lowercase, replacements) to top-level keys only
  • Filter out unwanted values while preserving nesting
  • Convert string types without flattening

Example

import json_tools_rs as jt

data = {
    "User_Name": "alice@example.com",
    "User_Age": "",
    "User_Active": "true",
    "User_Score": None,
}

result = (jt.JSONTools()
    .normal()
    .lowercase_keys(True)
    .key_replacement("regex:^user_", "")
    .value_replacement("@example.com", "@company.org")
    .remove_empty_strings(True)
    .remove_nulls(True)
    .execute(data)
)
# {'name': 'alice@company.org', 'active': 'true'}

All features available in .flatten() and .unflatten() modes also work in .normal() mode, except the actual flattening/unflattening operation itself.

Parallel Processing

JSON Tools RS uses Crossbeam-based parallelism to automatically speed up batch operations and large nested structures.

Automatic Parallelism

Batch processing (100+ items by default) automatically uses parallel execution:

Rust:

let batch: Vec<&str> = large_json_collection;
let result = JSONTools::new()
    .flatten()
    .execute(batch.as_slice())?;
// Automatically parallelized

Python:

batch = [{"data": i} for i in range(2000)]
results = jt.JSONTools().flatten().execute(batch)
# Automatically parallelized

Configuration

Batch Threshold

Control the minimum batch size before parallelism kicks in:

let result = JSONTools::new()
    .flatten()
    .parallel_threshold(50)  // Only parallelize batches of 50+ items
    .execute(batch.as_slice())?;

Thread Count

Limit the number of threads used:

let result = JSONTools::new()
    .flatten()
    .num_threads(Some(4))  // Use 4 threads (default: CPU count)
    .execute(batch.as_slice())?;

Nested Parallelism

Large individual JSON objects/arrays can also be parallelized:

let result = JSONTools::new()
    .flatten()
    .nested_parallel_threshold(200)  // Parallelize objects with 200+ entries
    .execute(large_json)?;

Python Configuration

tools = (jt.JSONTools()
    .flatten()
    .parallel_threshold(50)
    .num_threads(4)
    .nested_parallel_threshold(200)
)

results = tools.execute(large_batch)

How It Works

  • Batch parallelism: Input is split into chunks, each processed by a separate thread via crossbeam::thread::scope. Results are written to pre-allocated slots preserving input order.
  • Nested parallelism: Large JSON objects (many keys) or arrays (many elements) are split across threads for parallel flattening, then merged.
  • Thread safety: All parallelism uses scoped threads -- no 'static bounds required, no data races possible.

Environment Variables

All parallelism settings can be overridden via environment variables (applied at construction time):

| Variable | Default | Description |
|---|---|---|
| JSON_TOOLS_PARALLEL_THRESHOLD | 100 | Minimum batch size to trigger parallel processing |
| JSON_TOOLS_NESTED_PARALLEL_THRESHOLD | 100 | Minimum object/array size for nested parallelism |
| JSON_TOOLS_NUM_THREADS | CPU count | Number of threads for parallel processing |
| JSON_TOOLS_MAX_ARRAY_INDEX | 100000 | Maximum array index during unflattening (DoS protection) |

export JSON_TOOLS_PARALLEL_THRESHOLD=50
export JSON_TOOLS_NESTED_PARALLEL_THRESHOLD=200
export JSON_TOOLS_NUM_THREADS=4
export JSON_TOOLS_MAX_ARRAY_INDEX=500000

Environment variables take effect when JSONTools::new() is called. Builder method calls (e.g., .parallel_threshold(n)) override them.
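The precedence can be modeled in a few lines of Python using one of the variables above. This is an illustrative sketch; the real resolution happens in Rust at construction time:

```python
import os

# Model of precedence: builder call > environment variable > default.
def resolve_threshold(builder_value=None, default=100):
    if builder_value is not None:
        return builder_value  # explicit builder call wins
    return int(os.environ.get("JSON_TOOLS_PARALLEL_THRESHOLD", default))

os.environ["JSON_TOOLS_PARALLEL_THRESHOLD"] = "50"
print(resolve_threshold())    # 50 (env var overrides the default)
print(resolve_threshold(25))  # 25 (builder call overrides the env var)
```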

DataFrame & Series Support

The Python bindings natively support DataFrame and Series objects from popular data libraries, with perfect type preservation.

Supported Libraries

| Library | DataFrame | Series |
|---|---|---|
| Pandas | Yes | Yes |
| Polars | Yes | Yes |
| PyArrow | Yes (Table) | Yes (Array) |
| PySpark | Yes | -- |

Usage

Pandas DataFrame

import json_tools_rs as jt
import pandas as pd

df = pd.DataFrame([
    {"user": {"name": "Alice", "age": 30}},
    {"user": {"name": "Bob", "age": 25}},
])

result = jt.JSONTools().flatten().execute(df)
print(type(result))  # <class 'pandas.core.frame.DataFrame'>
print(result.columns.tolist())  # ['user.name', 'user.age']

Polars DataFrame

import json_tools_rs as jt
import polars as pl

df = pl.DataFrame({
    "data": ['{"user": {"name": "Alice"}}', '{"user": {"name": "Bob"}}']
})

result = jt.JSONTools().flatten().execute(df)
print(type(result))  # <class 'polars.DataFrame'>

Pandas Series

import json_tools_rs as jt
import pandas as pd

series = pd.Series(['{"a": {"b": 1}}', '{"c": {"d": 2}}'])
result = jt.JSONTools().flatten().execute(series)
print(type(result))  # <class 'pandas.core.series.Series'>

How It Works

  1. Detection: The library uses duck typing to detect DataFrame/Series objects (checks for .to_dict(), .to_list(), etc.)
  2. Extraction: Rows are extracted as JSON strings or dicts
  3. Processing: Each row is processed through the Rust engine (with automatic parallelism for large DataFrames)
  4. Reconstruction: Results are reconstructed into the original DataFrame/Series type using O(1) constructor calls
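Step 1's duck-typing detection can be sketched in plain Python. The attribute names probed here are illustrative assumptions, not the binding layer's exact checks:

```python
# Sketch of duck-typing detection for DataFrame/Series-like inputs.
def detect_kind(obj):
    if hasattr(obj, "to_dict") and hasattr(obj, "columns"):
        return "dataframe"  # pandas/polars-style table
    if hasattr(obj, "to_list"):
        return "series"     # pandas/polars-style column
    if isinstance(obj, (str, list, dict)):
        return type(obj).__name__
    return "unsupported"

print(detect_kind({"a": 1}))  # dict
```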

All Features Apply

DataFrames and Series support all the same features as regular input:

tools = (jt.JSONTools()
    .flatten()
    .separator("::")
    .lowercase_keys(True)
    .remove_nulls(True)
    .auto_convert_types(True)
    .parallel_threshold(50)
)

result = tools.execute(large_dataframe)

Rust API Reference

Full API documentation is available on docs.rs.

JSONTools

The main builder struct for all JSON operations. Uses the owned-self builder pattern -- all configuration methods consume and return Self for chaining.

Construction

use json_tools_rs::JSONTools;

let tools = JSONTools::new();

JSONTools implements Default, Debug, and Clone.

Operation Modes

Exactly one mode must be set before calling .execute().

| Method | Description |
|---|---|
| .flatten() | Flatten nested JSON into separator-delimited keys |
| .unflatten() | Reconstruct nested JSON from flat, separator-delimited keys |
| .normal() | Apply transformations without changing the nesting structure |

use json_tools_rs::{JSONTools, JsonOutput};

// Flatten
let result = JSONTools::new()
    .flatten()
    .execute(r#"{"a": {"b": 1}}"#)?;

// Unflatten
let result = JSONTools::new()
    .unflatten()
    .execute(r#"{"a.b": 1}"#)?;

// Normal mode -- transformations only
let result = JSONTools::new()
    .normal()
    .lowercase_keys(true)
    .auto_convert_types(true)
    .execute(r#"{"Name": "John", "Age": "30"}"#)?;

Configuration Methods

All methods consume self and return Self for chaining. Marked #[must_use].

| Method | Type | Default | Description |
|---|---|---|---|
| .separator(sep) | impl Into&lt;String&gt; | "." | Key separator for flatten/unflatten |
| .lowercase_keys(flag) | bool | false | Convert all keys to lowercase |
| .remove_empty_strings(flag) | bool | false | Filter out "" values |
| .remove_nulls(flag) | bool | false | Filter out null values |
| .remove_empty_objects(flag) | bool | false | Filter out {} values |
| .remove_empty_arrays(flag) | bool | false | Filter out [] values |
| .key_replacement(find, replace) | impl Into&lt;String&gt;, impl Into&lt;String&gt; | -- | Add a key replacement (literal, or regex: pattern) |
| .value_replacement(find, replace) | impl Into&lt;String&gt;, impl Into&lt;String&gt; | -- | Add a value replacement (literal, or regex: pattern) |
| .handle_key_collision(flag) | bool | false | Collect colliding keys into arrays |
| .auto_convert_types(flag) | bool | false | Auto-convert string values to native types |
| .parallel_threshold(n) | usize | 100 | Min batch size for parallel processing |
| .num_threads(n) | Option&lt;usize&gt; | None (CPU count) | Thread count for parallelism |
| .nested_parallel_threshold(n) | usize | 100 | Min keys/items for intra-document parallelism |
| .max_array_index(n) | usize | 100_000 | Max array index during unflattening (DoS protection) |

Note: .separator() panics if given an empty string. Defaults for parallel_threshold, nested_parallel_threshold, num_threads, and max_array_index can be overridden via environment variables (see Performance Tuning).

Execution

pub fn execute<'a, T>(&self, json_input: T) -> Result<JsonOutput, JsonToolsError>
where
    T: Into<JsonInput<'a>>,

Accepts any type that implements Into<JsonInput>:

| Rust Type | JsonInput Variant |
|---|---|
| &str | Single(Cow::Borrowed) |
| &String | Single(Cow::Borrowed) |
| &[&str] | Multiple (borrowing) |
| Vec&lt;&str&gt; | MultipleOwned |
| Vec&lt;String&gt; | MultipleOwned |
| &[String] | MultipleOwned |

Errors: Returns Err(JsonToolsError) if no mode is set, JSON is invalid, or processing fails.

Full Example

use json_tools_rs::{JSONTools, JsonOutput};

let tools = JSONTools::new()
    .flatten()
    .separator("::")
    .lowercase_keys(true)
    .remove_nulls(true)
    .remove_empty_strings(true)
    .key_replacement("regex:^user_", "")
    .auto_convert_types(true)
    .parallel_threshold(50)
    .num_threads(Some(4));

// Single document
let result = tools.execute(r#"{"User_Name": "Alice", "User_Age": "30"}"#)?;
match result {
    JsonOutput::Single(s) => println!("{}", s),
    JsonOutput::Multiple(_) => unreachable!(),
}

// Batch processing
let batch: Vec<String> = (0..1000)
    .map(|i| format!(r#"{{"id": "{}"}}"#, i))
    .collect();
let results = tools.execute(batch)?;
match results {
    JsonOutput::Multiple(v) => println!("Processed {} items", v.len()),
    JsonOutput::Single(_) => unreachable!(),
}

JsonInput

Input enum for execute(). You rarely construct this directly -- the From implementations handle conversion automatically.

pub enum JsonInput<'a> {
    /// Single JSON string (zero-copy via Cow)
    Single(Cow<'a, str>),
    /// Multiple JSON strings (borrowing)
    Multiple(&'a [&'a str]),
    /// Multiple JSON strings (owned or mixed)
    MultipleOwned(Vec<Cow<'a, str>>),
}

From Implementations

| Source Type | Variant |
|---|---|
| &str | Single(Cow::Borrowed) |
| &String | Single(Cow::Borrowed) |
| &[&str] | Multiple |
| Vec&lt;&str&gt; | MultipleOwned |
| Vec&lt;String&gt; | MultipleOwned |
| &[String] | MultipleOwned |

use json_tools_rs::{JSONTools, JsonOutput};

let tools = JSONTools::new().flatten();

// All of these work transparently:
let _ = tools.execute(r#"{"a": 1}"#);                      // &str
let s = String::from(r#"{"a": 1}"#);
let _ = tools.execute(&s);                                  // &String
let batch = vec![r#"{"a": 1}"#, r#"{"b": 2}"#];
let _ = tools.execute(batch);                               // Vec<&str>
let owned: Vec<String> = vec![r#"{"a": 1}"#.into()];
let _ = tools.execute(owned);                               // Vec<String>

JsonOutput

Output enum from execute().

pub enum JsonOutput {
    /// Single JSON result string
    Single(String),
    /// Multiple JSON result strings (batch)
    Multiple(Vec<String>),
}

Methods

| Method | Returns | Description |
|---|---|---|
| .into_single() | String | Extract single result. Panics on Multiple. |
| .into_multiple() | Vec&lt;String&gt; | Extract batch results. Panics on Single. |
| .try_into_single() | Result&lt;String, JsonToolsError&gt; | Non-panicking single extraction |
| .try_into_multiple() | Result&lt;Vec&lt;String&gt;, JsonToolsError&gt; | Non-panicking batch extraction |
| .into_vec() | Vec&lt;String&gt; | Always returns a Vec (wraps Single in a one-element vec) |

use json_tools_rs::{JSONTools, JsonOutput};

let result = JSONTools::new().flatten().execute(r#"{"a": {"b": 1}}"#)?;

// Pattern matching (recommended)
match result {
    JsonOutput::Single(s) => println!("Single: {}", s),
    JsonOutput::Multiple(v) => println!("Batch of {}", v.len()),
}

// Direct extraction (panics on wrong variant)
let s = JSONTools::new().flatten().execute(r#"{"a": 1}"#)?.into_single();

// Safe extraction (returns Result)
let s = JSONTools::new().flatten().execute(r#"{"a": 1}"#)?.try_into_single()?;

// Always-vec (useful for uniform handling)
let v = JSONTools::new().flatten().execute(r#"{"a": 1}"#)?.into_vec();
assert_eq!(v.len(), 1);

JsonToolsError

Comprehensive error enum with machine-readable error codes (E001-E008), human-readable messages, and actionable suggestions.

#[derive(Debug, thiserror::Error)]
#[non_exhaustive]
pub enum JsonToolsError {
    JsonParseError { .. },            // E001
    RegexError { .. },                // E002
    InvalidReplacementPattern { .. }, // E003
    InvalidJsonStructure { .. },      // E004
    ConfigurationError { .. },        // E005
    BatchProcessingError { .. },      // E006
    InputValidationError { .. },      // E007
    SerializationError { .. },        // E008
}

Methods

| Method | Returns | Description |
|---|---|---|
| .error_code() | &'static str | Machine-readable code: "E001" through "E008" |

Error Handling Example

use json_tools_rs::{JSONTools, JsonToolsError};

let result = JSONTools::new().flatten().execute("invalid json");

match result {
    Ok(output) => { /* success */ }
    Err(e) => {
        // Machine-readable error code
        match e.error_code() {
            "E001" => eprintln!("JSON parsing error: {}", e),
            "E005" => eprintln!("Configuration error: {}", e),
            "E006" => eprintln!("Batch error: {}", e),
            code => eprintln!("[{}] {}", code, e),
        }

        // Pattern matching for specific handling
        match &e {
            JsonToolsError::JsonParseError { message, suggestion, .. } => {
                eprintln!("Parse failed: {}", message);
                eprintln!("Try: {}", suggestion);
            }
            JsonToolsError::BatchProcessingError { index, source, .. } => {
                eprintln!("Item {} failed: {}", index, source);
            }
            _ => eprintln!("{}", e),
        }
    }
}

Auto-Conversions

JsonToolsError implements From for common error types:

// These conversions happen automatically in ? chains:
impl From<json_parser::JsonError> for JsonToolsError { .. }  // -> E001
impl From<regex::Error> for JsonToolsError { .. }            // -> E002

See Error Codes for the full error reference.

ProcessingConfig

Low-level configuration struct used internally by JSONTools. You can construct it directly for advanced use cases, but the JSONTools builder is the recommended interface.

pub struct ProcessingConfig {
    pub separator: String,
    pub lowercase_keys: bool,
    pub filtering: FilteringConfig,
    pub collision: CollisionConfig,
    pub replacements: ReplacementConfig,
    pub auto_convert_types: bool,
    pub parallel_threshold: usize,
    pub num_threads: Option<usize>,
    pub nested_parallel_threshold: usize,
    pub max_array_index: usize,
}

Builder Methods

use json_tools_rs::{
    CollisionConfig, FilteringConfig, ProcessingConfig, ReplacementConfig,
};

let config = ProcessingConfig::new()
    .separator("::")
    .lowercase_keys(true)
    .filtering(FilteringConfig::new().set_remove_nulls(true))
    .collision(CollisionConfig::new().handle_collisions(true))
    .replacements(
        ReplacementConfig::new()
            .add_key_replacement("^old_", "new_")
    );

FilteringConfig

Configuration for value filtering, stored internally as a bitmask for single-instruction checks on the hot path.

pub struct FilteringConfig { /* bitmask */ }

Builder Methods

All methods consume and return Self.

| Method | Description |
|---|---|
| .set_remove_empty_strings(bool) | Filter "" values |
| .set_remove_nulls(bool) | Filter null values |
| .set_remove_empty_objects(bool) | Filter {} values |
| .set_remove_empty_arrays(bool) | Filter [] values |

Query Methods

| Method | Returns | Description |
|---|---|---|
| `.remove_empty_strings()` | `bool` | Is empty string filtering enabled? |
| `.remove_nulls()` | `bool` | Is null filtering enabled? |
| `.remove_empty_objects()` | `bool` | Is empty object filtering enabled? |
| `.remove_empty_arrays()` | `bool` | Is empty array filtering enabled? |
| `.has_any_filter()` | `bool` | Is any filter enabled? |

#![allow(unused)]
fn main() {
use json_tools_rs::FilteringConfig;

let filtering = FilteringConfig::new()
    .set_remove_nulls(true)
    .set_remove_empty_strings(true);

assert!(filtering.has_any_filter());
assert!(filtering.remove_nulls());
assert!(!filtering.remove_empty_objects());
}
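The bitmask idea can be pictured with a small Python sketch. The flag values below are hypothetical; the crate's actual bit layout is internal:

```python
# Hypothetical bit flags mirroring FilteringConfig's bitmask representation.
REMOVE_EMPTY_STRINGS = 1 << 0
REMOVE_NULLS = 1 << 1
REMOVE_EMPTY_OBJECTS = 1 << 2
REMOVE_EMPTY_ARRAYS = 1 << 3

def has_any_filter(mask: int) -> bool:
    # A single integer comparison answers "is any filter enabled?"
    # on the hot path, instead of checking four separate booleans.
    return mask != 0

mask = REMOVE_NULLS | REMOVE_EMPTY_STRINGS
assert has_any_filter(mask)
assert mask & REMOVE_NULLS                # null filtering enabled
assert not (mask & REMOVE_EMPTY_OBJECTS)  # object filtering disabled
```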

CollisionConfig

Configuration for key collision handling.

#![allow(unused)]
fn main() {
pub struct CollisionConfig {
    pub handle_collisions: bool,
}
}

Builder Methods

| Method | Description |
|---|---|
| `.handle_collisions(bool)` | Enable/disable collision handling |

Query Methods

| Method | Returns | Description |
|---|---|---|
| `.has_collision_handling()` | `bool` | Is collision handling enabled? |

#![allow(unused)]
fn main() {
use json_tools_rs::CollisionConfig;

let collision = CollisionConfig::new().handle_collisions(true);
assert!(collision.has_collision_handling());
}

ReplacementConfig

Configuration for key and value replacement patterns. Uses SmallVec<[(String, String); 2]> internally to avoid heap allocation for the common case of 0-2 replacements.

#![allow(unused)]
fn main() {
pub struct ReplacementConfig {
    pub key_replacements: SmallVec<[(String, String); 2]>,
    pub value_replacements: SmallVec<[(String, String); 2]>,
}
}

Builder Methods

| Method | Description |
|---|---|
| `.add_key_replacement(find, replace)` | Add a key replacement regex pattern |
| `.add_value_replacement(find, replace)` | Add a value replacement regex pattern |

Query Methods

| Method | Returns | Description |
|---|---|---|
| `.has_key_replacements()` | `bool` | Are any key replacements configured? |
| `.has_value_replacements()` | `bool` | Are any value replacements configured? |

#![allow(unused)]
fn main() {
use json_tools_rs::ReplacementConfig;

let replacements = ReplacementConfig::new()
    .add_key_replacement("^user_", "")
    .add_value_replacement("@old\\.com", "@new.com");

assert!(replacements.has_key_replacements());
assert!(replacements.has_value_replacements());
}

Python API Reference

import json_tools_rs

JSONTools

The main builder class for all JSON operations. All configuration methods return self for chaining; only .execute() and .execute_to_output() trigger processing.

Construction

tools = json_tools_rs.JSONTools()

Creates a new JSONTools instance with all default settings. The instance is reusable -- you can call .execute() multiple times with different inputs.

Operation Modes

Exactly one mode must be set before calling .execute(). Calling a mode method replaces any previously set mode.

.flatten()

tools.flatten() -> JSONTools

Set the operation to flatten nested JSON into dot-separated (or custom separator) keys.

import json_tools_rs as jt

result = jt.JSONTools().flatten().execute({"a": {"b": {"c": 1}}})
# {"a.b.c": 1}

.unflatten()

tools.unflatten() -> JSONTools

Set the operation to reconstruct nested JSON from flat, separator-delimited keys.

result = jt.JSONTools().unflatten().execute({"a.b.c": 1})
# {"a": {"b": {"c": 1}}}
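The flatten/unflatten roundtrip can be sketched in a few lines of pure Python. This is an illustration of the concept only (no array handling, filtering, or any of the library's optimizations):

```python
def flatten(obj, sep=".", prefix=""):
    # Recursively join nested dict keys with the separator.
    out = {}
    for k, v in obj.items():
        key = f"{prefix}{sep}{k}" if prefix else k
        if isinstance(v, dict) and v:
            out.update(flatten(v, sep, key))
        else:
            out[key] = v
    return out

def unflatten(flat, sep="."):
    # Split each key on the separator and rebuild the nesting.
    out = {}
    for key, v in flat.items():
        node = out
        parts = key.split(sep)
        for part in parts[:-1]:
            node = node.setdefault(part, {})
        node[parts[-1]] = v
    return out

original = {"a": {"b": {"c": 1}}}
assert flatten(original) == {"a.b.c": 1}
assert unflatten(flatten(original)) == original  # perfect roundtrip
```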

.normal()

tools.normal() -> JSONTools

Set the operation to apply transformations (filtering, replacements, type conversion) without changing the nesting structure.

result = jt.JSONTools().normal().lowercase_keys(True).execute({"Name": "Alice"})
# {"name": "Alice"}

Configuration Methods

All configuration methods return self for chaining.

.separator(sep)

tools.separator(sep: str) -> JSONTools

Set the key separator for flatten/unflatten operations.

| Parameter | Type | Default | Description |
|---|---|---|---|
| `sep` | `str` | `"."` | Non-empty string used to join/split nested keys |

Raises: ValueError if sep is an empty string.

result = jt.JSONTools().flatten().separator("::").execute({"a": {"b": 1}})
# {"a::b": 1}

.lowercase_keys(flag)

tools.lowercase_keys(flag: bool) -> JSONTools

Convert all keys to lowercase after processing.

| Parameter | Type | Default | Description |
|---|---|---|---|
| `flag` | `bool` | `False` | Enable or disable lowercase key conversion |

result = jt.JSONTools().flatten().lowercase_keys(True).execute({"User": {"Name": "Alice"}})
# {"user.name": "Alice"}

.remove_empty_strings(flag)

tools.remove_empty_strings(flag: bool) -> JSONTools

Remove key-value pairs where the value is an empty string "".

| Parameter | Type | Default | Description |
|---|---|---|---|
| `flag` | `bool` | `False` | Enable or disable empty string removal |

result = jt.JSONTools().flatten().remove_empty_strings(True).execute({"a": "", "b": "hello"})
# {"b": "hello"}

.remove_nulls(flag)

tools.remove_nulls(flag: bool) -> JSONTools

Remove key-value pairs where the value is None / null.

| Parameter | Type | Default | Description |
|---|---|---|---|
| `flag` | `bool` | `False` | Enable or disable null removal |

result = jt.JSONTools().flatten().remove_nulls(True).execute({"a": None, "b": 1})
# {"b": 1}

.remove_empty_objects(flag)

tools.remove_empty_objects(flag: bool) -> JSONTools

Remove key-value pairs where the value is an empty object {}.

| Parameter | Type | Default | Description |
|---|---|---|---|
| `flag` | `bool` | `False` | Enable or disable empty object removal |

.remove_empty_arrays(flag)

tools.remove_empty_arrays(flag: bool) -> JSONTools

Remove key-value pairs where the value is an empty array [].

| Parameter | Type | Default | Description |
|---|---|---|---|
| `flag` | `bool` | `False` | Enable or disable empty array removal |

.key_replacement(find, replace)

tools.key_replacement(find: str, replace: str) -> JSONTools

Add a key replacement pattern. Patterns use standard regex syntax. If the regex fails to compile, it falls back to literal string replacement. Multiple replacements can be chained.

| Parameter | Type | Description |
|---|---|---|
| `find` | `str` | Regex pattern (or literal string) to match in keys |
| `replace` | `str` | Replacement string (supports regex capture groups like `$1`) |

result = (jt.JSONTools()
    .flatten()
    .key_replacement("_name$", "_id")
    .key_replacement("^user_", "")
    .execute({"user_name": "Alice"}))
# Applied in order: "user_name" -> "user_id" -> "id"
# {"id": "Alice"}
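The regex-with-literal-fallback behavior described above can be sketched in plain Python (an illustration, not the library's implementation):

```python
import re

def replace_key(key, find, replace):
    # Try the pattern as a regex first; if it fails to compile,
    # fall back to plain literal substring replacement.
    try:
        pattern = re.compile(find)
    except re.error:
        return key.replace(find, replace)
    return pattern.sub(replace, key)

assert replace_key("user_name", "^user_", "") == "name"
# "a(b" is not a valid regex (unclosed group), so it is treated literally:
assert replace_key("a(b", "a(b", "ab") == "ab"
```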

.value_replacement(find, replace)

tools.value_replacement(find: str, replace: str) -> JSONTools

Add a value replacement pattern. Works the same as key replacements but applies to string values.

| Parameter | Type | Description |
|---|---|---|
| `find` | `str` | Regex pattern (or literal string) to match in values |
| `replace` | `str` | Replacement string |

result = (jt.JSONTools()
    .flatten()
    .value_replacement("@example\\.com", "@company.org")
    .execute({"email": "user@example.com"}))
# {"email": "user@company.org"}

.handle_key_collision(flag)

tools.handle_key_collision(flag: bool) -> JSONTools

When enabled, keys that would collide after transformations (e.g., after lowercasing) are collected into arrays instead of overwriting each other.

| Parameter | Type | Default | Description |
|---|---|---|---|
| `flag` | `bool` | `False` | Enable collision handling |

result = (jt.JSONTools()
    .flatten()
    .lowercase_keys(True)
    .handle_key_collision(True)
    .execute({"Name": "Alice", "name": "Bob"}))
# {"name": ["Alice", "Bob"]}
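The collect-into-arrays behavior can be illustrated with a short Python sketch (conceptual only; the library does this internally after key transformations):

```python
def merge_with_collisions(pairs):
    # When a transformed (here: lowercased) key already exists,
    # collect values into a list instead of overwriting.
    out = {}
    for key, value in pairs:
        key = key.lower()
        if key in out:
            if isinstance(out[key], list):
                out[key].append(value)
            else:
                out[key] = [out[key], value]
        else:
            out[key] = value
    return out

assert merge_with_collisions([("Name", "Alice"), ("name", "Bob")]) == {
    "name": ["Alice", "Bob"]
}
```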

.auto_convert_types(flag)

tools.auto_convert_types(flag: bool) -> JSONTools

Automatically convert string values to their native types:

  • Numbers: "123" -> 123, "1,234.56" -> 1234.56, "$99.99" -> 99.99, "1e5" -> 100000
  • Booleans: "true" / "TRUE" / "True" -> true, "false" / "FALSE" / "False" -> false
  • Nulls: "null" / "None" -> null

If conversion fails, the original string is kept. No errors are raised on conversion failure.

| Parameter | Type | Default | Description |
|---|---|---|---|
| `flag` | `bool` | `False` | Enable automatic type conversion |

result = (jt.JSONTools()
    .flatten()
    .auto_convert_types(True)
    .execute({"id": "123", "price": "1,234.56", "active": "true"}))
# {"id": 123, "price": 1234.56, "active": true}
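The conversion rules above amount to "try the cheap interpretations first, keep the string if nothing matches." A simplified Python sketch of that logic (the library's actual parser handles many more formats, e.g. suffixed numbers and dates):

```python
def auto_convert(s):
    # Booleans and nulls first (case-insensitive, per the rules above).
    lowered = s.lower()
    if lowered == "true":
        return True
    if lowered == "false":
        return False
    if lowered in ("null", "none"):
        return None
    # Numbers: strip currency symbol and thousands separators, then parse.
    cleaned = s.replace(",", "").lstrip("$")
    try:
        n = float(cleaned)
    except ValueError:
        return s  # conversion failed: keep the original string, no error
    if n.is_integer() and "." not in cleaned and "e" not in cleaned.lower():
        return int(n)
    return n

assert auto_convert("123") == 123
assert auto_convert("1,234.56") == 1234.56
assert auto_convert("$99.99") == 99.99
assert auto_convert("TRUE") is True
assert auto_convert("hello") == "hello"  # unconvertible strings pass through
```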

.parallel_threshold(n)

tools.parallel_threshold(n: int) -> JSONTools

Set the minimum batch size to trigger parallel processing. Batches smaller than this are processed sequentially to avoid thread-spawning overhead.

| Parameter | Type | Default | Description |
|---|---|---|---|
| `n` | `int` | `100` | Minimum batch size for parallelism |

Default can be overridden with the JSON_TOOLS_PARALLEL_THRESHOLD environment variable.

tools = jt.JSONTools().flatten().parallel_threshold(50)

.num_threads(n)

tools.num_threads(n: int) -> JSONTools

Set the number of threads used for parallel processing.

| Parameter | Type | Default | Description |
|---|---|---|---|
| `n` | `int` | CPU count | Number of worker threads |

Default can be overridden with the JSON_TOOLS_NUM_THREADS environment variable.

tools = jt.JSONTools().flatten().num_threads(4)

.nested_parallel_threshold(n)

tools.nested_parallel_threshold(n: int) -> JSONTools

Set the minimum number of keys/items within a single JSON document to trigger nested (intra-document) parallelism. Only objects or arrays exceeding this count are parallelized internally.

| Parameter | Type | Default | Description |
|---|---|---|---|
| `n` | `int` | `100` | Minimum keys/items for nested parallelism |

Default can be overridden with the JSON_TOOLS_NESTED_PARALLEL_THRESHOLD environment variable.

tools = jt.JSONTools().flatten().nested_parallel_threshold(200)

.max_array_index(n)

tools.max_array_index(n: int) -> JSONTools

Set the maximum array index allowed during unflattening. This is a DoS protection: a malicious key like "items.999999999" would otherwise allocate a massive array.

| Parameter | Type | Default | Description |
|---|---|---|---|
| `n` | `int` | `100000` | Maximum array index |

Default can be overridden with the JSON_TOOLS_MAX_ARRAY_INDEX environment variable.
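The guard amounts to a bounds check performed before any array allocation. A minimal Python sketch of the idea (not the crate's code):

```python
MAX_ARRAY_INDEX = 100_000  # mirrors the library's default

def checked_index(part, max_index=MAX_ARRAY_INDEX):
    # Reject suspiciously large indices up front, so a key like
    # "items.999999999" never triggers a huge allocation.
    idx = int(part)
    if idx > max_index:
        raise ValueError(f"array index {idx} exceeds limit {max_index}")
    return idx

assert checked_index("42") == 42
try:
    checked_index("999999999")  # raises ValueError
except ValueError:
    pass
```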

Execution Methods

.execute(input)

tools.execute(input) -> str | dict | list[str] | list[dict] | DataFrame | Series

Execute the configured operation. The return type mirrors the input type:

| Input Type | Output Type |
|---|---|
| `str` | `str` (JSON string) |
| `dict` | `dict` (Python dictionary) |
| `list[str]` | `list[str]` |
| `list[dict]` | `list[dict]` |
| `pandas.DataFrame` | `pandas.DataFrame` |
| `pandas.Series` | `pandas.Series` |
| `polars.DataFrame` | `polars.DataFrame` |
| `polars.Series` | `polars.Series` |
| `pyarrow.Table` | `pyarrow.Table` |
| `pyarrow.ChunkedArray` | `pyarrow.ChunkedArray` |
| `pyspark.sql.DataFrame` | `pyspark.sql.DataFrame` |

Raises: JsonToolsError if no mode is set, input is invalid, or processing fails.

# String input -> string output
result = jt.JSONTools().flatten().execute('{"a": {"b": 1}}')
assert isinstance(result, str)

# Dict input -> dict output
result = jt.JSONTools().flatten().execute({"a": {"b": 1}})
assert isinstance(result, dict)

# Batch string input -> batch string output
results = jt.JSONTools().flatten().execute(['{"a": 1}', '{"b": 2}'])
assert isinstance(results, list) and isinstance(results[0], str)

# Batch dict input -> batch dict output
results = jt.JSONTools().flatten().execute([{"a": {"b": 1}}, {"c": {"d": 2}}])
assert isinstance(results, list) and isinstance(results[0], dict)

.execute_to_output(input)

tools.execute_to_output(input) -> JsonOutput

Execute the operation but return a JsonOutput wrapper instead of native Python types. Useful when you need to inspect whether the result is single or multiple before extracting.

Note: DataFrame and Series inputs are not supported with execute_to_output(). Use .execute() for those types.

| Parameter | Type | Description |
|---|---|---|
| `input` | `str`, `dict`, `list[str]`, `list[dict]` | JSON data to process |

output = jt.JSONTools().flatten().execute_to_output('{"a": {"b": 1}}')
if output.is_single:
    print(output.get_single())
elif output.is_multiple:
    for item in output.get_multiple():
        print(item)

JsonOutput

Output wrapper returned by .execute_to_output(). Provides typed access to results.

Properties

| Property | Type | Description |
|---|---|---|
| `.is_single` | `bool` | `True` if the result contains a single JSON string |
| `.is_multiple` | `bool` | `True` if the result contains multiple JSON strings |

Methods

.get_single()

output.get_single() -> str

Extract the single JSON string result.

Raises: ValueError if the result is multiple.

.get_multiple()

output.get_multiple() -> list[str]

Extract the list of JSON string results.

Raises: ValueError if the result is single.

.to_python()

output.to_python() -> str | list[str]

Convert to native Python type: returns str for single results, list[str] for multiple results.

String Representations

str(output) returns the JSON string (single) or a list representation (multiple). repr(output) returns JsonOutput.Single('...') or JsonOutput.Multiple([...]).
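The single/multiple contract can be mimicked with a tiny Python class, to make the access rules concrete (a hypothetical stand-in, not the real PyO3-backed `JsonOutput`):

```python
class Output:
    # Minimal stand-in for the JsonOutput single/multiple wrapper.
    def __init__(self, value):
        self._value = value  # str (single) or list[str] (multiple)

    @property
    def is_single(self):
        return isinstance(self._value, str)

    @property
    def is_multiple(self):
        return isinstance(self._value, list)

    def get_single(self):
        if not self.is_single:
            raise ValueError("result is multiple")
        return self._value

    def get_multiple(self):
        if not self.is_multiple:
            raise ValueError("result is single")
        return self._value

    def to_python(self):
        # Native type: str for single results, list[str] for multiple.
        return self._value

single = Output('{"a": 1}')
assert single.is_single and single.get_single() == '{"a": 1}'
batch = Output(['{"a": 1}', '{"b": 2}'])
assert batch.is_multiple and batch.to_python() == ['{"a": 1}', '{"b": 2}']
```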

DataFrame and Series Support

JSON Tools RS natively supports Pandas, Polars, PyArrow, and PySpark DataFrames and Series. Detection is performed via duck typing -- no explicit imports are required.

Pandas DataFrame

import pandas as pd
import json_tools_rs as jt

df = pd.DataFrame({"json_col": [
    '{"user": {"name": "Alice", "age": 30}}',
    '{"user": {"name": "Bob", "age": 25}}',
]})

tools = jt.JSONTools().flatten().separator(".")

# Process column containing JSON strings
result_df = tools.execute(df)
# Returns a DataFrame with flattened columns: "user.name", "user.age"

Pandas Series

series = pd.Series([
    '{"a": {"b": 1}}',
    '{"a": {"b": 2}}',
])

result_series = jt.JSONTools().flatten().execute(series)
# Returns a Series of flattened JSON strings

Polars DataFrame

import polars as pl

df = pl.DataFrame({"json_col": [
    '{"user": {"name": "Alice"}}',
    '{"user": {"name": "Bob"}}',
]})

result_df = jt.JSONTools().flatten().execute(df)

Polars Series

series = pl.Series("data", [
    '{"a": {"b": 1}}',
    '{"a": {"b": 2}}',
])

result_series = jt.JSONTools().flatten().execute(series)

PyArrow Table

import pyarrow as pa

table = pa.table({"json_col": [
    '{"user": {"name": "Alice"}}',
    '{"user": {"name": "Bob"}}',
]})

result_table = jt.JSONTools().flatten().execute(table)

PySpark DataFrame

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([
    ('{"user": {"name": "Alice"}}',),
    ('{"user": {"name": "Bob"}}',),
], ["json_col"])

result_df = jt.JSONTools().flatten().execute(df)

JsonToolsError

Exception class for all errors raised by JSON Tools RS.

import json_tools_rs as jt

try:
    result = jt.JSONTools().flatten().execute("not valid json")
except jt.JsonToolsError as e:
    print(f"Error: {e}")
    # Error: [E001] JSON parsing failed: ...

Error messages include a machine-readable code (E001-E008) at the start of the message. See Error Codes for the full reference.

Error Codes Quick Reference

| Code | Name | Common Cause |
|---|---|---|
| E001 | JsonParseError | Invalid JSON input |
| E002 | RegexError | Bad regex in key/value replacement |
| E003 | InvalidReplacementPattern | Malformed replacement pair |
| E004 | InvalidJsonStructure | Wrong JSON shape for the operation |
| E005 | ConfigurationError | No mode set before `.execute()` |
| E006 | BatchProcessingError | Error in one item during batch processing |
| E007 | InputValidationError | Unsupported input type |
| E008 | SerializationError | Internal serialization failure |

Handling Specific Errors

import json_tools_rs as jt

try:
    result = jt.JSONTools().execute({"a": 1})  # No mode set
except jt.JsonToolsError as e:
    msg = str(e)
    if "[E005]" in msg:
        print("Forgot to call .flatten() or .unflatten()")
    elif "[E001]" in msg:
        print("Invalid JSON input")
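Since the code always appears at the start of the message in a fixed `[E00x]` form, it can be extracted once instead of substring-matching each code. A small helper sketch (assumed message format from the examples above):

```python
import re

def error_code(message):
    # Pull the leading [E00x] code out of a JsonToolsError message.
    m = re.match(r"\[(E\d{3})\]", message)
    return m.group(1) if m else None

assert error_code("[E001] JSON parsing failed: ...") == "E001"
assert error_code("[E005] Operation mode not configured") == "E005"
assert error_code("unexpected") is None
```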

Complete Example

import json_tools_rs as jt

# Build once, reuse many times
tools = (jt.JSONTools()
    .flatten()
    .separator("::")
    .lowercase_keys(True)
    .remove_nulls(True)
    .remove_empty_strings(True)
    .key_replacement("^user_", "")
    .auto_convert_types(True)
    .parallel_threshold(50)
    .num_threads(4)
)

# Single dict
result = tools.execute({"User_Name": "Alice", "User_Age": "30"})
# {"name": "Alice", "age": 30}

# Batch of dicts (processed in parallel if >= 50 items)
results = tools.execute([{"data": str(i)} for i in range(1000)])

# JSON string
result = tools.execute('{"User_Name": "Alice", "nested": {"User_Age": "30"}}')

# DataFrame
import pandas as pd
df = pd.DataFrame({"json": [
    '{"User_Name": "Alice", "nested": {"User_Age": "30"}}',
    '{"User_Name": "Bob", "nested": {"User_Age": "25"}}',
]})
df_result = tools.execute(df)

Architecture

JSON Tools RS is organized into focused, single-responsibility modules. This modular design improves maintainability while preserving performance -- Rust modules are compile-time organization only, with zero runtime overhead.

Module Structure

src/
├── lib.rs            Facade: mod declarations + pub use re-exports
├── json_parser.rs    Conditional SIMD parser (sonic-rs / simd-json)
├── types.rs          Core types: JsonInput, JsonOutput
├── error.rs          Error types with codes E001-E008
├── config.rs         Configuration structs and operation modes
├── cache.rs          Tiered caching: regex, key deduplication, phf
├── convert.rs        Type conversion: numbers, dates, booleans, nulls
├── transform.rs      Filtering, key/value replacements, collision handling
├── flatten.rs        Flattening algorithm with Crossbeam parallelism
├── unflatten.rs      Unflattening with SIMD separator detection
├── builder.rs        Public JSONTools builder API and execute()
├── python.rs         Python bindings via PyO3
├── tests.rs          99 unit tests
└── main.rs           CLI examples

Module Descriptions

json_parser -- JSON Parsing Abstraction

Conditional compilation wrapper that selects the fastest available JSON parser:

  • 64-bit platforms: sonic-rs (AVX2/SSE4.2 SIMD, 30-50% faster)
  • 32-bit platforms: simd-json (fallback)

Exposes from_str(), to_string(), and parse_json() with a unified JsonError type.

types -- Core Types

Defines the public-facing input/output types:

  • JsonInput<'a> -- Enum accepting &str, &[&str], Vec<String>, etc.
  • JsonOutput -- Enum returning Single(String) or Multiple(Vec<String>)

error -- Error Handling

JsonToolsError enum with 8 error variants (E001-E008), each with machine-readable codes, Display/Error impls, and constructors. Includes From impls for automatic conversion from parse and regex errors.

config -- Configuration

All configuration structs used by the builder:

  • ProcessingConfig -- Main config holding all options
  • FilteringConfig -- Empty string/null/object/array removal
  • CollisionConfig -- Key collision handling settings
  • ReplacementConfig -- Key and value replacement patterns
  • OperationMode -- Flatten, Unflatten, or Normal

cache -- Caching Infrastructure

Three-tier caching system for performance:

  1. phf perfect hash (COMMON_JSON_KEYS) -- Zero-cost lookup for common keys
  2. Thread-local FxHashMap (KeyDeduplicator) -- Per-thread key deduplication
  3. Global DashMap (REGEX_CACHE) -- Compiled regex pattern cache with LRU eviction
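The regex-cache tier amounts to compile-once memoization. A minimal Python analogue (illustrative; the crate's `REGEX_CACHE` is a concurrent DashMap with LRU eviction, which this sketch omits):

```python
import re

_REGEX_CACHE = {}

def cached_compile(pattern):
    # Compile each pattern once; later lookups are a dict hit.
    compiled = _REGEX_CACHE.get(pattern)
    if compiled is None:
        compiled = re.compile(pattern)
        _REGEX_CACHE[pattern] = compiled
    return compiled

a = cached_compile("^user_")
b = cached_compile("^user_")
assert a is b  # same compiled object: the cache avoided recompilation
```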

convert -- Type Conversion

Automatic type conversion for string values (~1,000 lines, the largest leaf module):

  • Number parsing: integers, decimals, currency, percentages, basis points, scientific notation, suffixed (K/M/B)
  • Date parsing: ISO-8601 variants with UTC normalization
  • Boolean/null detection via phf perfect hash maps
  • SIMD-optimized clean_number_string() with extend_skipping_3/4 helpers

transform -- Transformations

Core transformation logic applied after flatten/unflatten:

  • Key/value replacements (literal and regex, with SIMD fast-path)
  • Filtering (empty strings, nulls, empty objects/arrays)
  • Key collision handling (collect into arrays)
  • Lowercase key conversion

flatten -- Flattening Algorithm

Recursive JSON flattening with performance optimizations:

  • SeparatorCache for pre-computed separator properties
  • FastStringBuilder with thread-local caching
  • flatten_value_with_threshold() for Crossbeam parallel flattening of large objects/arrays
  • quick_leaf_estimate() for O(1) HashMap pre-sizing

unflatten -- Unflattening Algorithm

Reconstructs nested JSON from flat key-value pairs:

  • SIMD-accelerated separator detection (find_separator*() functions)
  • Path type analysis for array vs. object reconstruction
  • Recursive set_nested_value() and set_nested_array_value()

builder -- Public API

The JSONTools struct and all 35+ builder methods. Routes execute() calls to the appropriate processing function based on operation mode (flatten, unflatten, normal).

python -- Python Bindings

PyO3-based Python bindings with:

  • Perfect type preservation (input type = output type)
  • Native DataFrame/Series support (Pandas, Polars, PyArrow, PySpark)
  • GIL release during compute-intensive operations

Processing Pipeline

Input → Parse → Flatten/Unflatten → Transform → Filter → Convert → Serialize → Output
         │            │                  │          │         │          │
    json_parser    flatten/         transform   transform   convert   json_parser
                   unflatten
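Conceptually, the pipeline is a single parse, a chain of document transforms, and a single serialize. A Python sketch of that shape (illustrative stage functions, not the library's internals):

```python
import json

def pipeline(raw, stages):
    # Parse once, thread the document through each stage, serialize once.
    doc = json.loads(raw)
    for stage in stages:
        doc = stage(doc)
    return json.dumps(doc)

# Two example stages standing in for Transform and Filter:
lowercase_keys = lambda d: {k.lower(): v for k, v in d.items()}
drop_nulls = lambda d: {k: v for k, v in d.items() if v is not None}

out = pipeline('{"Name": "Alice", "Age": null}', [lowercase_keys, drop_nulls])
assert json.loads(out) == {"name": "Alice"}
```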

Public API Surface

All public types are re-exported from lib.rs, preserving a flat import path:

#![allow(unused)]
fn main() {
use json_tools_rs::{JSONTools, JsonInput, JsonOutput, JsonToolsError};
use json_tools_rs::{ProcessingConfig, FilteringConfig, CollisionConfig, ReplacementConfig};
}

Internal modules use pub(crate) visibility for cross-module access without exposing internals.

Error Codes

All errors include a machine-readable code accessible via .error_code() (Rust) or in the error message (Python).

| Code | Name | Description |
|---|---|---|
| E001 | JsonParseError | Invalid JSON input. The input string could not be parsed as valid JSON. |
| E002 | RegexError | Invalid regex pattern in a key or value replacement. |
| E003 | InvalidReplacementPattern | Malformed replacement pattern string. |
| E004 | InvalidJsonStructure | JSON structure is valid but not suitable for the operation (e.g., unflattening non-object JSON). |
| E005 | ConfigurationError | Operation mode not set. Call `.flatten()`, `.unflatten()`, or `.normal()` before `.execute()`. |
| E006 | BatchProcessingError | An error occurred while processing one or more items in a batch. |
| E007 | InputValidationError | Input validation failed (e.g., unsupported input type). |
| E008 | SerializationError | Failed to serialize the output back to JSON. |

Rust Error Handling

#![allow(unused)]
fn main() {
use json_tools_rs::{JSONTools, JsonToolsError};

match JSONTools::new().flatten().execute(input) {
    Ok(result) => { /* success */ }
    Err(e) => {
        eprintln!("[{}] {}", e.error_code(), e);
        // [E001] JSON parse error: expected value at line 1 column 1
    }
}
}

Python Error Handling

import json_tools_rs as jt

try:
    result = jt.JSONTools().flatten().execute("not valid json")
except jt.JsonToolsError as e:
    print(f"Error: {e}")
    # Error: [E001] JSON parse error: ...

Common Errors

E005: No mode set

# Wrong: no mode set
tools = jt.JSONTools().execute(data)  # Raises E005

# Correct: set a mode first
tools = jt.JSONTools().flatten().execute(data)

E001: Invalid JSON

# Wrong: not valid JSON
tools = jt.JSONTools().flatten().execute("hello world")  # Raises E001

# Correct: valid JSON string
tools = jt.JSONTools().flatten().execute('{"key": "value"}')

Performance & Benchmarks

JSON Tools RS achieves ~2,000+ ops/ms through multiple optimization layers.

Optimization Techniques

| Technique | Impact |
|---|---|
| SIMD JSON Parsing | sonic-rs (64-bit) / simd-json (32-bit) for hardware-accelerated parsing |
| SIMD Byte Search | memchr/memmem for fast string operations |
| FxHashMap | Fast non-cryptographic hashing via rustc-hash |
| Tiered Caching | phf perfect hash -> thread-local FxHashMap -> global DashMap |
| SmallVec | Stack allocation for depth stacks and number buffers |
| Arc<str> Dedup | Shared key storage to minimize allocations |
| First-Byte Discriminators | Rapid rejection of non-convertible strings |
| Crossbeam Parallelism | Scoped thread pools for batch and nested parallelism |
| Zero-Copy (Cow) | Avoid allocations when strings don't need modification |
| itoa | Fast integer-to-string formatting |
| mimalloc | Optional high-performance allocator (`features = ["mimalloc"]`, ~5-10% speedup) |

Benchmark Results

Measured on Apple Silicon. Results from the stress benchmark suite targeting edge cases and large inputs.

Stress Benchmarks

| Benchmark | Result | Description |
|---|---|---|
| Deep nesting (100 levels) | ~8.3 us | Deeply nested objects, 100 levels deep |
| Wide objects (1,000 keys) | ~337 us | Single object with 1,000 top-level keys |
| Large arrays (5,000 items) | ~2.11 ms | Array containing 5,000 elements |
| Parallel batch (10,000 items) | ~2.61 ms | Batch processing with Crossbeam parallelism |

Throughput Targets (v0.9.0)

| Operation | Target |
|---|---|
| Basic flatten | >2,000 ops/ms |
| With transformations | >1,300 ops/ms |
| Regex replacements | >1,800 ops/ms |
| Batch (10 items) | >2,500 ops/ms |
| Batch (100 items) | >3,000 ops/ms |
| Roundtrip | >1,000 cycles/ms |

Performance Tuning

Three threshold parameters control when parallelism activates. Tuning them for your workload can significantly affect throughput.

parallel_threshold (default: 100)

Controls when batch processing (multiple JSON documents) switches from sequential to parallel execution.

When to lower (e.g., 20-50):

  • Each document is large or complex (deep nesting, many keys)
  • CPU cores are available and not contended
  • You are processing 50-100 items and want parallel speedup

When to raise (e.g., 200-500):

  • Each document is small (a few keys, shallow nesting)
  • Thread-spawning overhead dominates processing time
  • Running inside a container with limited CPU

# For large documents, parallel even at small batch sizes
tools = jt.JSONTools().flatten().parallel_threshold(20)

# For tiny documents, avoid parallelism overhead
tools = jt.JSONTools().flatten().parallel_threshold(500)

#![allow(unused)]
fn main() {
let tools = JSONTools::new()
    .flatten()
    .parallel_threshold(50);
}

nested_parallel_threshold (default: 100)

Controls when a single JSON document's top-level keys/array items are processed in parallel (intra-document parallelism). This is independent of batch parallelism.

When to lower (e.g., 50):

  • Individual documents have very wide objects (500+ keys) with deep sub-trees
  • Processing includes expensive transformations (regex replacements, type conversion)

When to raise (e.g., 500-1000) or effectively disable:

  • Documents are moderately sized (under 100 keys)
  • Sub-trees are shallow (1-2 levels), so per-key work is minimal
  • You want deterministic (sequential) output ordering

# Large documents with heavy per-key work
tools = jt.JSONTools().flatten().nested_parallel_threshold(50)

# Disable nested parallelism entirely
tools = jt.JSONTools().flatten().nested_parallel_threshold(999_999)

num_threads (default: CPU count)

Controls the number of worker threads for parallel processing.

When to set explicitly:

  • Running alongside other CPU-intensive workloads -- limit threads to avoid contention
  • In a container or VM with a CPU quota -- match thread count to available cores
  • Benchmarking -- fix thread count for reproducible results

tools = jt.JSONTools().flatten().num_threads(4)

#![allow(unused)]
fn main() {
let tools = JSONTools::new()
    .flatten()
    .num_threads(Some(4));
}

Environment Variable Overrides

All threshold defaults can be overridden without code changes via environment variables. These are read once at process startup (via LazyLock).

| Variable | Default | Description |
|---|---|---|
| `JSON_TOOLS_PARALLEL_THRESHOLD` | 100 | Minimum batch size for parallel processing |
| `JSON_TOOLS_NESTED_PARALLEL_THRESHOLD` | 100 | Minimum keys/items for nested parallelism |
| `JSON_TOOLS_NUM_THREADS` | (CPU count) | Thread count for parallel processing |
| `JSON_TOOLS_MAX_ARRAY_INDEX` | 100000 | Maximum array index during unflattening |

# Example: tune for a workload of many small documents
export JSON_TOOLS_PARALLEL_THRESHOLD=200
export JSON_TOOLS_NUM_THREADS=8

python my_pipeline.py

Environment variable values are parsed as usize. Invalid values (non-numeric, negative) silently fall back to the default.
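The read-once-with-silent-fallback behavior can be sketched in Python (a conceptual analogue of the crate's `LazyLock`-backed startup read):

```python
import os

def threshold_from_env(name, default):
    # Parse the variable as a non-negative integer;
    # missing or invalid values silently fall back to the default.
    raw = os.environ.get(name)
    if raw is None:
        return default
    try:
        value = int(raw)
    except ValueError:
        return default
    return value if value >= 0 else default

os.environ["JSON_TOOLS_PARALLEL_THRESHOLD"] = "200"
assert threshold_from_env("JSON_TOOLS_PARALLEL_THRESHOLD", 100) == 200

os.environ["JSON_TOOLS_PARALLEL_THRESHOLD"] = "not-a-number"
assert threshold_from_env("JSON_TOOLS_PARALLEL_THRESHOLD", 100) == 100
```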

Running Benchmarks

# All benchmarks
cargo bench

# Specific suite
cargo bench --bench isolation_benchmarks
cargo bench --bench comprehensive_benchmark
cargo bench --bench stress_benchmarks
cargo bench --bench realworld_benchmarks
cargo bench --bench combination_benchmarks

Benchmark Suites

| Suite | Focus |
|---|---|
| isolation_benchmarks | Individual features in isolation (10 groups) |
| combination_benchmarks | 2-way and 3-way feature interactions |
| realworld_benchmarks | AWS CloudTrail, GitHub API, K8s, Elasticsearch, Stripe, Twitter/X |
| stress_benchmarks | Edge cases: deep nesting, wide objects, large arrays |
| comprehensive_benchmark | Full feature coverage (15 groups) |

Profiling

On macOS, use samply for profiling:

# Build with profiling symbols
cargo bench --profile profiling --bench stress_benchmarks --no-run

# Profile with samply
samply record --save-only -o /tmp/profile.json -- \
    ./target/profiling/deps/stress_benchmarks-* --bench

# View results
samply load /tmp/profile.json

Architecture

The codebase is organized into focused, single-responsibility modules:

src/
├── lib.rs            Facade: mod declarations + pub use re-exports
├── json_parser.rs    Conditional SIMD parser (sonic-rs / simd-json)
├── types.rs          Core types: JsonInput, JsonOutput, FlatMap
├── error.rs          Error types with codes E001-E008
├── config.rs         Configuration structs and operation modes
├── cache.rs          Tiered caching: regex, key deduplication, phf
├── convert.rs        Type conversion: numbers, dates, booleans, nulls
├── transform.rs      Filtering, key/value replacements, collision handling
├── flatten.rs        Flattening algorithm with Crossbeam parallelism
├── unflatten.rs      Unflattening with SIMD separator detection
├── builder.rs        Public JSONTools builder API and execute()
├── python.rs         Python bindings via PyO3
└── tests.rs          99 unit tests

The processing pipeline:

  1. Parse -- SIMD-accelerated JSON parsing (json_parser)
  2. Flatten/Unflatten -- Recursive traversal with Arc<str> key dedup (flatten/unflatten)
  3. Transform -- Lowercase, replacements (cached regex), collision handling (transform)
  4. Filter -- Remove empty strings, nulls, empty objects/arrays (transform)
  5. Convert -- Type conversion with first-byte discriminators (convert)
  6. Serialize -- Output to JSON string or native Python types

Troubleshooting

This guide covers common errors, their causes, and how to resolve them.

Error Code Reference

All errors include a machine-readable code (E001-E008) at the start of the error message. Use these codes for programmatic error handling.

E001: JsonParseError

Message: [E001] JSON parsing failed: ...

Cause: The input string is not valid JSON.

Common triggers:

  • Missing quotes around keys or values
  • Trailing commas after the last element
  • Single quotes instead of double quotes
  • Unescaped special characters in strings
  • Incomplete JSON (missing closing braces or brackets)
  • Passing a file path instead of the file contents

Solution:

# Wrong
result = tools.execute("hello world")          # Not JSON
result = tools.execute("{'key': 'value'}")     # Single quotes
result = tools.execute('{"a": 1,}')            # Trailing comma

# Correct
result = tools.execute('{"key": "value"}')
result = tools.execute({"key": "value"})       # Pass a dict directly

E002: RegexError

Message: [E002] Regex pattern error: ...

Cause: A key or value replacement pattern failed to compile as a regex.

Common triggers:

  • Unescaped special regex characters (., *, +, ?, (, ), [, ])
  • Unclosed groups or character classes
  • Invalid backreferences

Solution:

# Wrong -- unescaped dot matches any character
tools.key_replacement("user.name", "username")

# Correct -- escape the dot for literal matching
tools.key_replacement(r"user\.name", "username")

# Or use a simpler pattern that won't be misinterpreted
tools.key_replacement("user_name", "username")

Note: If regex compilation fails, the library automatically falls back to literal string matching. This error only surfaces when the pattern is syntactically broken (e.g., unclosed groups).

E003: InvalidReplacementPattern

Message: [E003] Invalid replacement pattern: ...

Cause: The replacement pattern configuration is malformed.

Solution: Ensure replacement patterns are provided as (find, replace) pairs:

# Correct usage
tools.key_replacement("find_pattern", "replacement")
tools.value_replacement("old_value", "new_value")

E004: InvalidJsonStructure

Message: [E004] Invalid JSON structure: ...

Cause: The JSON is valid but not compatible with the requested operation.

Common triggers:

  • Unflattening a JSON array (unflatten requires a flat object)
  • Unflattening a non-flat object (nested values where flat keys are expected)

Solution:

# Wrong -- unflatten expects a flat object, not an array
result = jt.JSONTools().unflatten().execute('[1, 2, 3]')

# Wrong -- unflatten expects flat keys
result = jt.JSONTools().unflatten().execute('{"a": {"b": 1}}')

# Correct -- flat object with dot-separated keys
result = jt.JSONTools().unflatten().execute('{"a.b": 1, "a.c": 2}')
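
Conceptually, unflatten expects input shaped like the output of flatten: a single-level object whose keys encode the nesting path. A minimal pure-Python sketch of that shape (for illustration only, not the library's implementation):

```python
def flatten(obj, sep=".", prefix=""):
    """Flatten a nested dict into separator-joined keys (sketch)."""
    flat = {}
    for key, value in obj.items():
        path = f"{prefix}{sep}{key}" if prefix else key
        if isinstance(value, dict) and value:
            flat.update(flatten(value, sep, path))  # recurse into nested dicts
        else:
            flat[path] = value
    return flat

flatten({"a": {"b": 1, "c": 2}})  # {"a.b": 1, "a.c": 2}
```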

E005: ConfigurationError

Message: [E005] Operation mode not configured: ...

Cause: .execute() was called without first setting an operation mode.

Solution: Always call .flatten(), .unflatten(), or .normal() before .execute():

# Wrong
result = jt.JSONTools().execute(data)

# Correct
result = jt.JSONTools().flatten().execute(data)
result = jt.JSONTools().unflatten().execute(data)
result = jt.JSONTools().normal().execute(data)

This error also occurs if num_threads is set to 0:

# Wrong
tools = jt.JSONTools().flatten().num_threads(0)

# Correct
tools = jt.JSONTools().flatten().num_threads(1)    # At least 1
tools = jt.JSONTools().flatten()                    # Use default (CPU count)

E006: BatchProcessingError

Message: [E006] Batch processing failed at index {N}: ...

Cause: One or more items in a batch failed to process. The error includes the index of the failing item and the underlying error.

Solution: Check the item at the reported index. The inner error (usually E001 or E004) describes what went wrong:

try:
    results = tools.execute(batch_of_json)
except jt.JsonToolsError as e:
    msg = str(e)
    if "[E006]" in msg:
        # Extract the index from the message to find the bad item
        print(f"Batch error: {e}")
        # Fix or filter the problematic items and retry
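
To locate offending items up front (or instead of parsing the index out of the message), each element can be pre-validated with the standard json module. A sketch, assuming the batch is a list of JSON strings:

```python
import json

def find_invalid(batch):
    """Return the indices of items that are not valid JSON strings."""
    bad = []
    for index, item in enumerate(batch):
        try:
            json.loads(item)
        except (TypeError, json.JSONDecodeError):
            bad.append(index)
    return bad

batch = ['{"a": 1}', "{'b': 2}", '{"c": 3}']
find_invalid(batch)  # [1] -- single quotes are not valid JSON
```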

E007: InputValidationError

Message: [E007] Input validation failed: ...

Cause: The input type is not supported.

Common triggers:

  • Passing an integer, float, or boolean directly
  • Passing a non-JSON-string, non-dict type in a list
  • Using execute_to_output() with a DataFrame or Series (use execute() instead)

Solution:

# Wrong
result = tools.execute(42)
result = tools.execute([1, 2, 3])

# Correct
result = tools.execute('{"value": 42}')
result = tools.execute({"value": 42})
result = tools.execute(['{"a": 1}', '{"b": 2}'])

E008: SerializationError

Message: [E008] JSON serialization failed: ...

Cause: The processed result could not be serialized back to JSON. This is typically an internal error.

Solution: If you encounter this error, please report it as a bug. As a workaround, check that your input does not contain unusual Unicode sequences or extremely large numbers that may not round-trip through JSON.

Common Issues

Empty Separator

The separator must be a non-empty string. Using an empty separator is always a logic error -- it would make keys ambiguous.

# This raises an error
tools = jt.JSONTools().flatten().separator("")

# Use any non-empty string
tools = jt.JSONTools().flatten().separator(".")
tools = jt.JSONTools().flatten().separator("::")
tools = jt.JSONTools().flatten().separator("/")

In Rust, an empty separator causes a panic (via assert!). In Python, it raises a ValueError.
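
The ambiguity is easy to demonstrate: with an empty separator, two different nesting paths collapse to the same flat key, so the original structure is unrecoverable.

```python
# The path ("ab", "c") and the path ("a", "bc") produce the same
# flat key when joined with "" -- unflatten could not tell them apart.
sep = ""
key_1 = sep.join(["ab", "c"])
key_2 = sep.join(["a", "bc"])
key_1 == key_2  # True
```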

Missing Operation Mode

The most common mistake is forgetting to set a mode:

# This always raises E005
tools = jt.JSONTools()
tools.execute(data)  # Error!

# Set a mode first
tools = jt.JSONTools().flatten()
tools.execute(data)  # OK

Dict vs String Input

Both str and dict inputs are accepted, but the output type mirrors the input type:

# String in -> string out
result = tools.execute('{"a": {"b": 1}}')
assert isinstance(result, str)
# result == '{"a.b":1}'

# Dict in -> dict out
result = tools.execute({"a": {"b": 1}})
assert isinstance(result, dict)
# result == {"a.b": 1}

If you need the raw JSON string output from a dict input, use .execute_to_output():

output = tools.execute_to_output({"a": {"b": 1}})
json_str = output.get_single()  # Returns a JSON string

Regex Patterns in Replacements

Replacement patterns use standard regex syntax. Common pitfalls:

# The dot matches ANY single character -- "user.name" also matches "user_name"
tools.key_replacement("user.name", "id")

# Escape dots for literal matching
tools.key_replacement(r"user\.name", "id")

# Use anchors for precise matching
tools.key_replacement("^user_", "")       # Only at start of key
tools.key_replacement("_suffix$", "")      # Only at end of key
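
The dot pitfall can be checked directly with Python's re module (the library uses Rust's regex engine, but these basics behave the same way):

```python
import re

# An unescaped dot matches any single character, so the pattern
# "user.name" also matches keys like "user_name" or "user-name":
bool(re.fullmatch("user.name", "user_name"))    # True
bool(re.fullmatch("user.name", "user.name"))    # True

# Escaping the dot restricts the match to the literal key:
bool(re.fullmatch(r"user\.name", "user_name"))  # False
bool(re.fullmatch(r"user\.name", "user.name"))  # True
```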

Performance Tuning

When Parallelism Helps

Parallel processing adds overhead for thread spawning and synchronization. It helps when:

  • Batch size is large (100+ items by default) -- amortizes spawning cost
  • Individual documents are complex -- deep nesting, many keys, expensive transformations
  • CPU cores are available -- parallelism on a single-core machine adds only overhead

When Parallelism Hurts

Reduce or disable parallelism when:

  • Documents are tiny (a few flat keys) -- thread overhead dominates
  • Batch sizes are small (<50 items) -- raise parallel_threshold
  • Memory is constrained -- each thread needs its own stack and working set
  • Running inside a GIL-heavy Python workload -- the GIL is released during Rust processing, but other Python threads may contend

# Disable parallelism for small workloads
tools = jt.JSONTools().flatten().parallel_threshold(999_999)

# Or limit threads
tools = jt.JSONTools().flatten().num_threads(1)

Profiling Tips

Use the built-in benchmark suites to profile your specific workload pattern:

# Profile stress scenarios
cargo bench --profile profiling --bench stress_benchmarks --no-run
samply record --save-only -o /tmp/profile.json -- \
    ./target/profiling/deps/stress_benchmarks-* --bench

For Python profiling, measure wall-clock time: Python-level CPU profilers such as cProfile may not attribute time spent inside the Rust extension.

import time
start = time.perf_counter()
result = tools.execute(data)
elapsed = time.perf_counter() - start
print(f"Processing took {elapsed:.3f}s")
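
For less noisy numbers, repeat the measurement and keep the best run; the minimum is usually a more stable estimate than the mean. A small stdlib helper (illustrative; `best_of` is not part of the library):

```python
import time

def best_of(fn, runs=5):
    """Return the fastest wall-clock time (seconds) over several runs.
    The minimum is less sensitive to scheduler noise than the mean."""
    times = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        times.append(time.perf_counter() - start)
    return min(times)

# Usage (tools/data as in the snippet above):
# elapsed = best_of(lambda: tools.execute(data))
```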

Platform Notes

mimalloc (Rust-only)

The mimalloc global allocator is an optional feature that provides a 5-10% performance improvement. Enable it with features = ["mimalloc"] in your Cargo.toml. It is not included in Python builds because PyO3 manages memory through Python's allocator.

sonic-rs (64-bit only)

The default JSON parser is sonic-rs, which uses SIMD instructions available on 64-bit platforms (x86_64, aarch64). On 32-bit platforms, the library automatically falls back to simd-json. This is transparent -- the API is identical regardless of which parser is active.

macOS Profiling

On macOS, flamegraph requires full Xcode (not just Command Line Tools). Use samply instead:

cargo install samply
samply record --save-only -o profile.json -- ./target/profiling/deps/BENCH_BINARY --bench
samply load profile.json  # Opens Firefox Profiler

Valgrind does not work on modern macOS. Use Instruments (if Xcode is installed) or samply for profiling.

Changelog

v0.9.0 (2026-03-09)

Added

  • DataFrame & Series Support (Python): Native support for Pandas, Polars, PyArrow, and PySpark DataFrames and Series with perfect type preservation.
  • Crossbeam Parallelism: Migrated from Rayon to Crossbeam for finer-grained parallel control with scoped threads.
  • Modular Architecture: Refactored monolithic lib.rs into 10 focused modules (json_parser, types, error, config, cache, convert, transform, flatten, unflatten, builder) with zero public API changes.

Performance Improvements

Rust Core (6 optimizations):

  • Eliminated per-entry HashMap allocation in parallel flatten -- single partial map per chunk
  • Added early-exit first-byte discriminators for type conversion fast-path
  • SIMD literal fallback for regex patterns (memchr before regex compilation)
  • Thread-local regex cache half-eviction (LRU-style, capacity 64)
  • Expanded SmallVec buffers (32 -> 64 bytes) and separator cache
  • Vectorized clean_number_string() with SIMD skip helpers

Python Bindings (3 optimizations):

  • mem::take for zero-cost builder field extraction
  • Batch type detection via first-element sampling
  • O(1) DataFrame/Series reconstruction

v0.8.0 (2026-01-01)

  • Python Feature Parity: auto_convert_types, parallel_threshold, num_threads, nested_parallel_threshold in Python
  • Enhanced Type Conversion: ISO-8601 dates, currency codes, basis points, suffixed numbers
  • Date Normalization: Automatic UTC normalization

v0.7.0 (2025-10-17)

  • Parallel configuration methods (parallel_threshold, num_threads, nested_parallel_threshold)
  • HashMap capacity and hashing optimizations

v0.6.0 (2025-10-13)

  • Python GIL release for parallel operations (5-13% improvement)
  • Inline hints on hot functions

v0.5.0 (2025-10-12)

  • Rust inline optimizations (2-5% improvement)
  • Iterator adapter chains

v0.4.0 (2025-10-11)

  • FxHashMap migration (30-55% improvement)
  • SIMD JSON parsing (sonic-rs / simd-json)
  • SmallVec stack allocation
  • Arc<str> key deduplication

v0.3.0 (2025-10-10)

  • Automatic type conversion
  • Python bindings via PyO3

v0.2.0 (2025-10-09)

  • Key collision handling
  • Comprehensive filtering (empty strings, nulls, objects, arrays)
  • Regex-based replacements

v0.1.0 (2025-10-08)

  • Initial release
  • JSON flattening and unflattening
  • Custom separators
  • Batch processing

For the full changelog with migration guides, see CHANGELOG.md.