JSON Tools RS
A high-performance Rust library for advanced JSON manipulation with SIMD-accelerated parsing, Crossbeam-based parallelism, and native Python bindings with DataFrame/Series support.
Why JSON Tools RS?
JSON Tools RS is designed for developers who need to:
- Transform nested JSON into flat structures for databases, CSV exports, or analytics
- Clean and normalize JSON data from external APIs or user input
- Process large batches of JSON documents efficiently
- Maintain type safety with perfect roundtrip support (flatten -> unflatten -> original)
- Work with both Rust and Python using the same consistent API
Key Features
- Unified API -- Single JSONTools entry point for flattening, unflattening, or pass-through transforms
- Builder Pattern -- Fluent, chainable API for configuration
- High Performance -- SIMD-accelerated parsing, FxHashMap, SmallVec stack allocation, tiered caching (~2,000+ ops/ms)
- Parallel Processing -- Crossbeam-based parallelism for 3-5x speedup on batch operations
- Complete Roundtrip -- Flatten and unflatten with perfect fidelity
- Comprehensive Filtering -- Remove empty strings, nulls, empty objects, empty arrays
- Advanced Replacements -- Literal and regex-based key/value replacements
- Collision Handling -- Collect colliding values into arrays
- Automatic Type Conversion -- Strings to numbers, booleans, dates, and nulls
- Date Normalization -- ISO-8601 detection and UTC normalization
- Batch Processing -- Single or batch JSON, dicts, lists, DataFrames, and Series
- Python Bindings -- Full Python support with perfect type preservation
- DataFrame/Series Support -- Pandas, Polars, PyArrow, and PySpark
- Modular Architecture -- 10 focused modules for maintainability with zero-overhead abstraction
Quick Example
Rust:
use json_tools_rs::{JSONTools, JsonOutput};

let result = JSONTools::new()
    .flatten()
    .execute(r#"{"user": {"name": "John", "age": 30}}"#)?;
// {"user.name": "John", "user.age": 30}
Python:
import json_tools_rs as jt
result = jt.JSONTools().flatten().execute({"user": {"name": "John", "age": 30}})
# {'user.name': 'John', 'user.age': 30}
Installation
Rust
Add to your Cargo.toml:
cargo add json-tools-rs
Or manually:
[dependencies]
json-tools-rs = "0.9"
Python
Install from PyPI:
pip install json-tools-rs
Pre-built wheels are available for:
| Platform | Architectures |
|---|---|
| Linux (glibc) | x86_64, x86, aarch64, armv7, ppc64le |
| Linux (musl) | x86_64, x86, aarch64, armv7 |
| macOS | x86_64 (Intel), aarch64 (Apple Silicon) |
| Windows | x64 |
Python 3.8+ is supported.
Verify Installation
Rust:
use json_tools_rs::JSONTools;

fn main() {
    let result = JSONTools::new()
        .flatten()
        .execute(r#"{"hello": "world"}"#)
        .unwrap();
    println!("{:?}", result);
}
Python:
import json_tools_rs as jt
result = jt.JSONTools().flatten().execute({"hello": "world"})
print(result) # {'hello': 'world'}
Quick Start (Rust)
The JSONTools struct provides a unified builder pattern API. Call .flatten() or .unflatten() to set the mode, chain configuration methods, then call .execute().
Basic Flattening
use json_tools_rs::{JSONTools, JsonOutput};

let json = r#"{"user": {"name": "John", "profile": {"age": 30, "city": "NYC"}}}"#;
let result = JSONTools::new()
    .flatten()
    .execute(json)?;
if let JsonOutput::Single(flattened) = result {
    println!("{}", flattened);
}
// {"user.name": "John", "user.profile.age": 30, "user.profile.city": "NYC"}
Basic Unflattening
use json_tools_rs::{JSONTools, JsonOutput};

let json = r#"{"user.name": "John", "user.profile.age": 30}"#;
let result = JSONTools::new()
    .unflatten()
    .execute(json)?;
if let JsonOutput::Single(nested) = result {
    println!("{}", nested);
}
// {"user": {"name": "John", "profile": {"age": 30}}}
Advanced Configuration
use json_tools_rs::{JSONTools, JsonOutput};

let json = r#"{"user": {"name": "John", "details": {"age": null, "city": ""}}}"#;
let result = JSONTools::new()
    .flatten()
    .separator("::")
    .lowercase_keys(true)
    .remove_empty_strings(true)
    .remove_nulls(true)
    .execute(json)?;
if let JsonOutput::Single(flattened) = result {
    println!("{}", flattened);
}
// {"user::name": "John"}
Batch Processing
use json_tools_rs::{JSONTools, JsonOutput};

let batch = vec![
    r#"{"user": {"name": "Alice"}}"#,
    r#"{"user": {"name": "Bob"}}"#,
    r#"{"user": {"name": "Charlie"}}"#,
];
let result = JSONTools::new()
    .flatten()
    .separator("_")
    .execute(batch.as_slice())?;
if let JsonOutput::Multiple(results) = result {
    for r in &results {
        println!("{}", r);
    }
}
// {"user_name": "Alice"}
// {"user_name": "Bob"}
// {"user_name": "Charlie"}
Error Handling
use json_tools_rs::{JSONTools, JsonToolsError};

match JSONTools::new().flatten().execute("invalid json") {
    Ok(result) => println!("{:?}", result),
    Err(e) => {
        eprintln!("Error [{}]: {}", e.error_code(), e);
        // Error [E001]: JSON parse error: ...
    }
}
Quick Start (Python)
The Python bindings provide the same JSONTools API with perfect type matching: input type equals output type.
Type Preservation
| Input Type | Output Type |
|---|---|
| str | str (JSON string) |
| dict | dict |
| list[str] | list[str] |
| list[dict] | list[dict] |
| DataFrame | DataFrame (Pandas, Polars, PyArrow, PySpark) |
| Series | Series (Pandas, Polars, PyArrow) |
Basic Flattening
import json_tools_rs as jt
# Dict input -> dict output
result = jt.JSONTools().flatten().execute({"user": {"name": "John", "age": 30}})
print(result) # {'user.name': 'John', 'user.age': 30}
# String input -> string output
result = jt.JSONTools().flatten().execute('{"user": {"name": "John"}}')
print(result) # '{"user.name": "John"}'
Basic Unflattening
import json_tools_rs as jt
result = jt.JSONTools().unflatten().execute({"user.name": "John", "user.age": 30})
print(result) # {'user': {'name': 'John', 'age': 30}}
Advanced Configuration
import json_tools_rs as jt
tools = (jt.JSONTools()
.flatten()
.separator("::")
.lowercase_keys(True)
.remove_empty_strings(True)
.remove_nulls(True)
.key_replacement("^user_", "")
.auto_convert_types(True)
)
data = {"User_Name": "Alice", "User_Age": "30", "User_Status": None}
result = tools.execute(data)
print(result) # {'name': 'Alice', 'age': 30}
Batch Processing
import json_tools_rs as jt
tools = jt.JSONTools().flatten()
# List of dicts -> list of dicts
results = tools.execute([
{"user": {"name": "Alice"}},
{"user": {"name": "Bob"}},
])
print(results) # [{'user.name': 'Alice'}, {'user.name': 'Bob'}]
# List of strings -> list of strings
results = tools.execute(['{"a": {"b": 1}}', '{"c": {"d": 2}}'])
print(results) # ['{"a.b": 1}', '{"c.d": 2}']
DataFrame Support
import json_tools_rs as jt
import pandas as pd
df = pd.DataFrame([
{"user": {"name": "Alice", "age": 30}},
{"user": {"name": "Bob", "age": 25}},
])
result = jt.JSONTools().flatten().execute(df)
print(type(result)) # <class 'pandas.core.frame.DataFrame'>
# Also works with Polars, PyArrow Tables, and PySpark DataFrames
Error Handling
import json_tools_rs as jt
try:
result = jt.JSONTools().flatten().execute("invalid json")
except jt.JsonToolsError as e:
print(f"Error: {e}")
Flattening & Unflattening
Flattening
Flattening converts nested JSON into a flat key-value structure using dot-separated (or custom) keys.
// Input
{"user": {"name": "John", "address": {"city": "NYC", "zip": "10001"}}}
// Output (flattened)
{"user.name": "John", "user.address.city": "NYC", "user.address.zip": "10001"}
Arrays
Arrays are flattened with numeric indices:
// Input
{"users": [{"name": "Alice"}, {"name": "Bob"}]}
// Output
{"users.0.name": "Alice", "users.1.name": "Bob"}
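The key scheme above (dot-joined object keys, numeric indices for arrays) can be pictured with a short pure-Python sketch. This is an illustration of the semantics only, not the library's Rust implementation, which adds SIMD parsing, filtering, and parallelism:

```python
def flatten(value, prefix="", sep=".", out=None):
    """Recursively flatten nested dicts/lists into a single flat dict."""
    if out is None:
        out = {}
    if isinstance(value, dict) and value:
        for k, v in value.items():
            flatten(v, f"{prefix}{sep}{k}" if prefix else str(k), sep, out)
    elif isinstance(value, list) and value:
        for i, v in enumerate(value):
            flatten(v, f"{prefix}{sep}{i}" if prefix else str(i), sep, out)
    else:
        out[prefix] = value  # scalars (and empty containers) become leaves
    return out

flatten({"users": [{"name": "Alice"}, {"name": "Bob"}]})
# {'users.0.name': 'Alice', 'users.1.name': 'Bob'}
```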
Custom Separators
Use .separator() to change the key delimiter:
let result = JSONTools::new()
    .flatten()
    .separator("::")
    .execute(json)?;
// {"user::name": "John", "user::address::city": "NYC"}
result = jt.JSONTools().flatten().separator("::").execute(data)
Unflattening
Unflattening reverses the process, reconstructing nested structures from flat keys.
// Input
{"user.name": "John", "user.address.city": "NYC"}
// Output (unflattened)
{"user": {"name": "John", "address": {"city": "NYC"}}}
Numeric keys reconstruct arrays:
// Input
{"users.0.name": "Alice", "users.1.name": "Bob"}
// Output
{"users": [{"name": "Alice"}, {"name": "Bob"}]}
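The reverse direction can be sketched in pure Python as well (illustration only; the library's Rust implementation adds caching and the max_array_index DoS guard):

```python
def unflatten(flat, sep="."):
    """Rebuild nested structure from dot-separated keys; numeric segments become list indices."""
    root = {}
    for key, value in flat.items():
        parts = key.split(sep)
        node = root
        for part, nxt in zip(parts, parts[1:] + [None]):
            idx = int(part) if part.isdigit() else part
            if nxt is None:
                node[idx] = value
            else:
                node = node.setdefault(idx, {})
    return _listify(root)

def _listify(node):
    """Convert dicts whose keys are exactly 0..n-1 into lists."""
    if not isinstance(node, dict):
        return node
    node = {k: _listify(v) for k, v in node.items()}
    if node and all(isinstance(k, int) for k in node) and sorted(node) == list(range(len(node))):
        return [node[i] for i in range(len(node))]
    return node

unflatten({"users.0.name": "Alice", "users.1.name": "Bob"})
# {'users': [{'name': 'Alice'}, {'name': 'Bob'}]}
```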
Roundtrip
Flattening and unflattening are perfect inverses. You can flatten data, apply transformations, then unflatten to recover the original structure:
let original = r#"{"user": {"name": "John", "scores": [10, 20, 30]}}"#;

// Flatten
let flat = JSONTools::new().flatten().execute(original)?;

// Unflatten back
let restored = JSONTools::new().unflatten().execute(&flat.into_single())?;
// Matches original structure
All configuration options (filtering, replacements, collision handling, type conversion) work with both .flatten() and .unflatten() modes.
Filtering
Remove unwanted values during flattening or unflattening.
Available Filters
| Method | Removes |
|---|---|
| .remove_empty_strings(true) | "" empty string values |
| .remove_nulls(true) | null values |
| .remove_empty_objects(true) | {} empty objects |
| .remove_empty_arrays(true) | [] empty arrays |
Example
use json_tools_rs::{JSONTools, JsonOutput};

let json = r#"{
    "name": "John",
    "bio": "",
    "age": null,
    "tags": [],
    "metadata": {},
    "city": "NYC"
}"#;
let result = JSONTools::new()
    .flatten()
    .remove_empty_strings(true)
    .remove_nulls(true)
    .remove_empty_arrays(true)
    .remove_empty_objects(true)
    .execute(json)?;
// Result: {"name": "John", "city": "NYC"}
import json_tools_rs as jt
data = {
"name": "John",
"bio": "",
"age": None,
"tags": [],
"metadata": {},
"city": "NYC",
}
result = (jt.JSONTools()
.flatten()
.remove_empty_strings(True)
.remove_nulls(True)
.remove_empty_arrays(True)
.remove_empty_objects(True)
.execute(data)
)
# {'name': 'John', 'city': 'NYC'}
Filtering with Unflatten
Filters also work during unflattening, applied after the nested structure is reconstructed:
let result = JSONTools::new()
    .unflatten()
    .remove_nulls(true)
    .remove_empty_strings(true)
    .execute(flat_json)?;
Combining Filters
All filters can be combined freely. They are applied after the flatten/unflatten operation completes.
Key & Value Replacements
Replace patterns in keys and/or values using literal strings or regular expressions.
Key Replacements
let result = JSONTools::new()
    .flatten()
    .key_replacement("user_profile_", "")        // Literal
    .key_replacement("regex:(User|Admin)_", "")  // Regex
    .execute(json)?;
result = (jt.JSONTools()
.flatten()
.key_replacement("user_profile_", "")
.key_replacement("regex:(User|Admin)_", "")
.execute(data)
)
Value Replacements
let result = JSONTools::new()
    .flatten()
    .value_replacement("@example.com", "@company.org")   // Literal
    .value_replacement("regex:^super$", "administrator") // Regex
    .execute(json)?;
Regex Syntax
Prefix patterns with regex: to use regular expressions. The regex engine uses standard Rust regex syntax.
| Pattern | Description |
|---|---|
| "old" | Literal string replacement |
| "regex:^prefix_" | Regex: match start of string |
| "regex:(a\|b)_" | Regex: alternation |
| "regex:\\d+" | Regex: digit sequences |
Multiple Replacements
You can chain multiple key and value replacements. They are applied in order:
let result = JSONTools::new()
    .flatten()
    .key_replacement("prefix_", "")
    .key_replacement("_suffix", "")
    .key_replacement("_", ".")
    .value_replacement("@old.com", "@new.com")
    .value_replacement("regex:^admin$", "administrator")
    .execute(json)?;
Real-World Example
Normalizing an API response:
let result = JSONTools::new()
    .flatten()
    .separator("::")
    .lowercase_keys(true)
    .key_replacement("regex:(api_response|user_data)::", "")
    .key_replacement("_", ".")
    .value_replacement("@example.com", "@company.org")
    .remove_empty_strings(true)
    .remove_nulls(true)
    .execute(api_response)?;
Key Collision Handling
When key replacements or transformations cause multiple keys to map to the same output key, collision handling determines what happens.
Enabling Collision Handling
let result = JSONTools::new()
    .flatten()
    .key_replacement("regex:(User|Admin)_", "")
    .handle_key_collision(true)
    .execute(json)?;
result = (jt.JSONTools()
.flatten()
.key_replacement("regex:(User|Admin)_", "")
.handle_key_collision(True)
.execute(data)
)
How It Works
With .handle_key_collision(true), when two keys collide after transformation, their values are collected into an array:
// Input
{"User_name": "John", "Admin_name": "Jane"}
// With key_replacement("regex:(User|Admin)_", "") + handle_key_collision(true)
// Output
{"name": ["John", "Jane"]}
Without collision handling, the last value wins (overwrites previous values).
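The collect-into-array behavior can be sketched in pure Python. This illustrates the documented semantics only, not the library's implementation; here find is a plain regex rather than the library's regex:-prefixed pattern string:

```python
import re

def apply_with_collisions(flat, find, replace, collect=True):
    """Apply a key replacement; optionally collect colliding values into lists."""
    out = {}
    for key, value in flat.items():
        new_key = re.sub(find, replace, key)
        if new_key in out and collect:
            prev = out[new_key]
            out[new_key] = prev + [value] if isinstance(prev, list) else [prev, value]
        else:
            out[new_key] = value  # without collision handling, the last value wins
    return out

apply_with_collisions({"User_name": "John", "Admin_name": "Jane"}, r"(User|Admin)_", "")
# {'name': ['John', 'Jane']}
```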
Collision with Filtering
Collision handling respects filters. If a colliding value would be filtered out (e.g., empty string with .remove_empty_strings(true)), it is excluded from the collected array:
// Input
{"User_name": "John", "Admin_name": "", "Guest_name": "Bob"}
// With key_replacement("regex:(User|Admin|Guest)_", "") + remove_empty_strings(true) + handle_key_collision(true)
// Output
{"name": ["John", "Bob"]}
Works with Both Modes
Collision handling works during both .flatten() and .unflatten() operations.
Automatic Type Conversion
When .auto_convert_types(true) is enabled, string values are automatically converted to their appropriate types.
Enabling
let result = JSONTools::new()
    .flatten()
    .auto_convert_types(true)
    .execute(json)?;
result = jt.JSONTools().flatten().auto_convert_types(True).execute(data)
Conversion Rules
Conversions are applied in priority order: dates -> nulls -> booleans -> numbers.
Dates (ISO-8601)
Date strings are detected and normalized to UTC:
| Input | Output |
|---|---|
| "2024-01-15" | "2024-01-15" (kept as-is, not a number) |
| "2024-01-15T10:30:00+05:00" | "2024-01-15T05:30:00Z" (UTC normalized) |
| "2024-01-15T10:30:00Z" | "2024-01-15T10:30:00Z" |
| "2024-01-15T10:30:00" | "2024-01-15T10:30:00" (naive, kept as-is) |
Nulls
| Input | Output |
|---|---|
| "null", "NULL" | null |
| "nil", "NIL" | null |
| "none", "NONE" | null |
| "N/A", "n/a" | null |
Booleans
| Input | Output |
|---|---|
| "true", "TRUE", "True" | true |
| "false", "FALSE", "False" | false |
| "yes", "YES" | true |
| "no", "NO" | false |
| "on", "ON" | true |
| "off", "OFF" | false |
| "y", "Y" | true |
| "n", "N" | false |
Note: "1" and "0" are treated as numbers, not booleans.
Numbers
| Format | Input | Output |
|---|---|---|
| Basic integers | "123" | 123 |
| Decimals | "45.67" | 45.67 |
| Negative | "-10" | -10 |
| US thousands | "1,234.56" | 1234.56 |
| EU thousands | "1.234,56" | 1234.56 |
| Space separators | "1 234.56" | 1234.56 |
| Currency | "$1,234.56", "EUR999" | 1234.56, 999 |
| Percentages | "50%", "12.5%" | 50.0, 12.5 |
| Scientific | "1e5", "1.23e-4" | 100000, 0.000123 |
| Basis points | "50bps", "100 bp" | 0.005, 0.01 |
| Suffixes | "1K", "2.5M", "5B" | 1000, 2500000, 5000000000 |
Non-Convertible Strings
Strings that don't match any pattern are left as-is:
{"name": "Alice", "code": "ABC"} -> {"name": "Alice", "code": "ABC"}
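Putting the priority order together, here is a pure-Python sketch of a converter covering a handful of the formats above. It is an illustration only; the real Rust converter handles many more formats, and its rounding and date normalization differ:

```python
import re

NULLS = {"null", "nil", "none", "n/a"}
TRUES = {"true", "yes", "on", "y"}
FALSES = {"false", "no", "off", "n"}

def convert(s):
    """Apply conversions in priority order: dates -> nulls -> booleans -> numbers."""
    if re.fullmatch(r"\d{4}-\d{2}-\d{2}(T.*)?", s):
        return s                      # date-like: kept (UTC-normalized by the library)
    if s.lower() in NULLS:
        return None
    if s.lower() in TRUES:
        return True
    if s.lower() in FALSES:
        return False
    plain = s.rstrip("%").replace("$", "").replace(",", "")
    try:
        n = float(plain)
        return int(n) if n.is_integer() and "." not in plain else n
    except ValueError:
        return s                      # non-convertible: left as-is
```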
Full Example
let json = r#"{
    "id": "123",
    "price": "$1,234.56",
    "discount": "15%",
    "active": "yes",
    "created": "2024-01-15T10:30:00+05:00",
    "status": "N/A",
    "name": "Product"
}"#;
let result = JSONTools::new()
    .flatten()
    .auto_convert_types(true)
    .execute(json)?;
// {
//   "id": 123,
//   "price": 1234.56,
//   "discount": 15.0,
//   "active": true,
//   "created": "2024-01-15T05:30:00Z",
//   "status": null,
//   "name": "Product"
// }
Normal Mode
Normal mode applies transformations (filtering, replacements, type conversion) without flattening or unflattening the JSON structure.
Usage
let result = JSONTools::new()
    .normal()
    .lowercase_keys(true)
    .remove_nulls(true)
    .remove_empty_strings(true)
    .auto_convert_types(true)
    .execute(json)?;
result = (jt.JSONTools()
.normal()
.lowercase_keys(True)
.remove_nulls(True)
.remove_empty_strings(True)
.auto_convert_types(True)
.execute(data)
)
When to Use Normal Mode
Use .normal() when you want to:
- Clean data without changing its structure
- Apply key transformations (lowercase, replacements) to top-level keys only
- Filter out unwanted values while preserving nesting
- Convert string types without flattening
Example
import json_tools_rs as jt
data = {
"User_Name": "alice@example.com",
"User_Age": "",
"User_Active": "true",
"User_Score": None,
}
result = (jt.JSONTools()
.normal()
.lowercase_keys(True)
.key_replacement("^user_", "")
.value_replacement("@example.com", "@company.org")
.remove_empty_strings(True)
.remove_nulls(True)
.execute(data)
)
# {'name': 'alice@company.org', 'active': 'true'}
All features available in .flatten() and .unflatten() modes also work in .normal() mode, except the actual flattening/unflattening operation itself.
Parallel Processing
JSON Tools RS uses Crossbeam-based parallelism to automatically speed up batch operations and large nested structures.
Automatic Parallelism
Batch processing (100+ items by default) automatically uses parallel execution:
let batch: Vec<&str> = large_json_collection;
let result = JSONTools::new()
    .flatten()
    .execute(batch.as_slice())?;
// Automatically parallelized
batch = [{"data": i} for i in range(2000)]
results = jt.JSONTools().flatten().execute(batch)
# Automatically parallelized
Configuration
Batch Threshold
Control the minimum batch size before parallelism kicks in:
let result = JSONTools::new()
    .flatten()
    .parallel_threshold(50) // Only parallelize batches of 50+ items
    .execute(batch.as_slice())?;
Thread Count
Limit the number of threads used:
let result = JSONTools::new()
    .flatten()
    .num_threads(Some(4)) // Use 4 threads (default: CPU count)
    .execute(batch.as_slice())?;
Nested Parallelism
Large individual JSON objects/arrays can also be parallelized:
let result = JSONTools::new()
    .flatten()
    .nested_parallel_threshold(200) // Parallelize objects with 200+ entries
    .execute(large_json)?;
Python Configuration
tools = (jt.JSONTools()
.flatten()
.parallel_threshold(50)
.num_threads(4)
.nested_parallel_threshold(200)
)
results = tools.execute(large_batch)
How It Works
- Batch parallelism: Input is split into chunks, each processed by a separate thread via crossbeam::thread::scope. Results are written to pre-allocated slots, preserving input order.
- Nested parallelism: Large JSON objects (many keys) or arrays (many elements) are split across threads for parallel flattening, then merged.
- Thread safety: All parallelism uses scoped threads -- no 'static bounds required, no data races possible.
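The order-preserving slot scheme from the first bullet can be pictured in Python terms. This is an analogy for the ordering guarantee only -- Python threads share the GIL, whereas the library's Rust scoped threads get true parallelism:

```python
from concurrent.futures import ThreadPoolExecutor

def process_batch(items, transform, num_threads=4, chunk_size=None):
    """Split items into chunks, process each on a worker, write into pre-allocated slots."""
    chunk_size = chunk_size or max(1, len(items) // num_threads)
    results = [None] * len(items)  # pre-allocated output slots preserve input order

    def work(start):
        for i in range(start, min(start + chunk_size, len(items))):
            results[i] = transform(items[i])

    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        list(pool.map(work, range(0, len(items), chunk_size)))
    return results
```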
Environment Variables
All parallelism settings can be overridden via environment variables (applied at construction time):
| Variable | Default | Description |
|---|---|---|
| JSON_TOOLS_PARALLEL_THRESHOLD | 100 | Minimum batch size to trigger parallel processing |
| JSON_TOOLS_NESTED_PARALLEL_THRESHOLD | 100 | Minimum object/array size for nested parallelism |
| JSON_TOOLS_NUM_THREADS | CPU count | Number of threads for parallel processing |
| JSON_TOOLS_MAX_ARRAY_INDEX | 100000 | Maximum array index during unflattening (DoS protection) |
export JSON_TOOLS_PARALLEL_THRESHOLD=50
export JSON_TOOLS_NESTED_PARALLEL_THRESHOLD=200
export JSON_TOOLS_NUM_THREADS=4
export JSON_TOOLS_MAX_ARRAY_INDEX=500000
Environment variables take effect when JSONTools::new() is called. Builder method calls (e.g., .parallel_threshold(n)) override them.
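The resulting precedence (builder call > environment variable > built-in default) can be sketched as follows; resolve_threshold is a hypothetical helper for illustration, not part of the library's API:

```python
import os

def resolve_threshold(builder_value=None, default=100):
    """Resolve a setting: builder call wins, then env var, then the default."""
    if builder_value is not None:
        return builder_value  # explicit .parallel_threshold(n) overrides everything
    env = os.environ.get("JSON_TOOLS_PARALLEL_THRESHOLD")
    if env is not None:
        return int(env)       # env var overrides the default at construction time
    return default
```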
DataFrame & Series Support
The Python bindings natively support DataFrame and Series objects from popular data libraries, with perfect type preservation.
Supported Libraries
| Library | DataFrame | Series |
|---|---|---|
| Pandas | Yes | Yes |
| Polars | Yes | Yes |
| PyArrow | Yes (Table) | Yes (Array) |
| PySpark | Yes | -- |
Usage
Pandas DataFrame
import json_tools_rs as jt
import pandas as pd
df = pd.DataFrame([
{"user": {"name": "Alice", "age": 30}},
{"user": {"name": "Bob", "age": 25}},
])
result = jt.JSONTools().flatten().execute(df)
print(type(result)) # <class 'pandas.core.frame.DataFrame'>
print(result.columns.tolist()) # ['user.name', 'user.age']
Polars DataFrame
import json_tools_rs as jt
import polars as pl
df = pl.DataFrame({
"data": ['{"user": {"name": "Alice"}}', '{"user": {"name": "Bob"}}']
})
result = jt.JSONTools().flatten().execute(df)
print(type(result)) # <class 'polars.DataFrame'>
Pandas Series
import json_tools_rs as jt
import pandas as pd
series = pd.Series(['{"a": {"b": 1}}', '{"c": {"d": 2}}'])
result = jt.JSONTools().flatten().execute(series)
print(type(result)) # <class 'pandas.core.series.Series'>
How It Works
- Detection: The library uses duck typing to detect DataFrame/Series objects (checks for .to_dict(), .to_list(), etc.)
- Extraction: Rows are extracted as JSON strings or dicts
- Processing: Each row is processed through the Rust engine (with automatic parallelism for large DataFrames)
- Reconstruction: Results are reconstructed into the original DataFrame/Series type using O(1) constructor calls
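The duck-typing idea from the detection step looks roughly like this in Python; the exact attributes probed here are an assumption for illustration, not the bindings' actual checks:

```python
def detect_input_kind(obj):
    """Classify an input by the attributes it exposes, not by importing any library."""
    if hasattr(obj, "to_dict") and hasattr(obj, "columns"):
        return "dataframe"  # pandas/polars-style DataFrame (assumed markers)
    if hasattr(obj, "to_list") or hasattr(obj, "to_pylist"):
        return "series"     # pandas/polars Series or pyarrow Array (assumed markers)
    if isinstance(obj, (dict, str)):
        return "single"
    if isinstance(obj, list):
        return "batch"
    return "unsupported"
```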
All Features Apply
DataFrames and Series support all the same features as regular input:
tools = (jt.JSONTools()
.flatten()
.separator("::")
.lowercase_keys(True)
.remove_nulls(True)
.auto_convert_types(True)
.parallel_threshold(50)
)
result = tools.execute(large_dataframe)
Rust API Reference
Full API documentation is available on docs.rs.
JSONTools
The main builder struct for all JSON operations. Uses the owned-self builder pattern -- all configuration methods consume and return Self for chaining.
Construction
use json_tools_rs::JSONTools;

let tools = JSONTools::new();
JSONTools implements Default, Debug, and Clone.
Operation Modes
Exactly one mode must be set before calling .execute().
| Method | Description |
|---|---|
| .flatten() | Flatten nested JSON into separator-delimited keys |
| .unflatten() | Reconstruct nested JSON from flat, separator-delimited keys |
| .normal() | Apply transformations without changing the nesting structure |
use json_tools_rs::{JSONTools, JsonOutput};

// Flatten
let result = JSONTools::new()
    .flatten()
    .execute(r#"{"a": {"b": 1}}"#)?;

// Unflatten
let result = JSONTools::new()
    .unflatten()
    .execute(r#"{"a.b": 1}"#)?;

// Normal mode -- transformations only
let result = JSONTools::new()
    .normal()
    .lowercase_keys(true)
    .auto_convert_types(true)
    .execute(r#"{"Name": "John", "Age": "30"}"#)?;
Configuration Methods
All methods consume self and return Self for chaining. Marked #[must_use].
| Method | Type | Default | Description |
|---|---|---|---|
| .separator(sep) | impl Into<String> | "." | Key separator for flatten/unflatten |
| .lowercase_keys(flag) | bool | false | Convert all keys to lowercase |
| .remove_empty_strings(flag) | bool | false | Filter out "" values |
| .remove_nulls(flag) | bool | false | Filter out null values |
| .remove_empty_objects(flag) | bool | false | Filter out {} values |
| .remove_empty_arrays(flag) | bool | false | Filter out [] values |
| .key_replacement(find, replace) | impl Into<String>, impl Into<String> | -- | Add a key replacement pattern |
| .value_replacement(find, replace) | impl Into<String>, impl Into<String> | -- | Add a value replacement pattern |
| .handle_key_collision(flag) | bool | false | Collect colliding keys into arrays |
| .auto_convert_types(flag) | bool | false | Auto-convert string values to native types |
| .parallel_threshold(n) | usize | 100 | Min batch size for parallel processing |
| .num_threads(n) | Option<usize> | None (CPU count) | Thread count for parallelism |
| .nested_parallel_threshold(n) | usize | 100 | Min keys/items for intra-document parallelism |
| .max_array_index(n) | usize | 100_000 | Max array index during unflattening (DoS protection) |
Note: .separator() panics if given an empty string. Defaults for parallel_threshold, nested_parallel_threshold, num_threads, and max_array_index can be overridden via environment variables (see Performance Tuning).
Execution
pub fn execute<'a, T>(&self, json_input: T) -> Result<JsonOutput, JsonToolsError>
where
    T: Into<JsonInput<'a>>,
Accepts any type that implements Into<JsonInput>:
| Rust Type | JsonInput Variant |
|---|---|
| &str | Single(Cow::Borrowed) |
| &String | Single(Cow::Borrowed) |
| &[&str] | Multiple (borrowing) |
| Vec<&str> | MultipleOwned |
| Vec<String> | MultipleOwned |
| &[String] | MultipleOwned |
Errors: Returns Err(JsonToolsError) if no mode is set, JSON is invalid, or processing fails.
Full Example
use json_tools_rs::{JSONTools, JsonOutput};

let tools = JSONTools::new()
    .flatten()
    .separator("::")
    .lowercase_keys(true)
    .remove_nulls(true)
    .remove_empty_strings(true)
    .key_replacement("^user_", "")
    .auto_convert_types(true)
    .parallel_threshold(50)
    .num_threads(Some(4));

// Single document
let result = tools.execute(r#"{"User_Name": "Alice", "User_Age": "30"}"#)?;
match result {
    JsonOutput::Single(s) => println!("{}", s),
    JsonOutput::Multiple(_) => unreachable!(),
}

// Batch processing
let batch: Vec<String> = (0..1000)
    .map(|i| format!(r#"{{"id": "{}"}}"#, i))
    .collect();
let results = tools.execute(batch)?;
match results {
    JsonOutput::Multiple(v) => println!("Processed {} items", v.len()),
    JsonOutput::Single(_) => unreachable!(),
}
JsonInput
Input enum for execute(). You rarely construct this directly -- the From implementations handle conversion automatically.
pub enum JsonInput<'a> {
    /// Single JSON string (zero-copy via Cow)
    Single(Cow<'a, str>),
    /// Multiple JSON strings (borrowing)
    Multiple(&'a [&'a str]),
    /// Multiple JSON strings (owned or mixed)
    MultipleOwned(Vec<Cow<'a, str>>),
}
From Implementations
| Source Type | Variant |
|---|---|
| &str | Single(Cow::Borrowed) |
| &String | Single(Cow::Borrowed) |
| &[&str] | Multiple |
| Vec<&str> | MultipleOwned |
| Vec<String> | MultipleOwned |
| &[String] | MultipleOwned |
use json_tools_rs::{JSONTools, JsonOutput};

let tools = JSONTools::new().flatten();

// All of these work transparently:
let _ = tools.execute(r#"{"a": 1}"#); // &str

let s = String::from(r#"{"a": 1}"#);
let _ = tools.execute(&s); // &String

let batch = vec![r#"{"a": 1}"#, r#"{"b": 2}"#];
let _ = tools.execute(batch); // Vec<&str>

let owned: Vec<String> = vec![r#"{"a": 1}"#.into()];
let _ = tools.execute(owned); // Vec<String>
JsonOutput
Output enum from execute().
pub enum JsonOutput {
    /// Single JSON result string
    Single(String),
    /// Multiple JSON result strings (batch)
    Multiple(Vec<String>),
}
Methods
| Method | Returns | Description |
|---|---|---|
| .into_single() | String | Extract single result. Panics on Multiple. |
| .into_multiple() | Vec<String> | Extract batch results. Panics on Single. |
| .try_into_single() | Result<String, JsonToolsError> | Non-panicking single extraction |
| .try_into_multiple() | Result<Vec<String>, JsonToolsError> | Non-panicking batch extraction |
| .into_vec() | Vec<String> | Always returns a Vec (wraps Single in a one-element vec) |
use json_tools_rs::{JSONTools, JsonOutput};

let result = JSONTools::new().flatten().execute(r#"{"a": {"b": 1}}"#)?;

// Pattern matching (recommended)
match result {
    JsonOutput::Single(s) => println!("Single: {}", s),
    JsonOutput::Multiple(v) => println!("Batch of {}", v.len()),
}

// Direct extraction (panics on wrong variant)
let s = JSONTools::new().flatten().execute(r#"{"a": 1}"#)?.into_single();

// Safe extraction (returns Result)
let s = JSONTools::new().flatten().execute(r#"{"a": 1}"#)?.try_into_single()?;

// Always-vec (useful for uniform handling)
let v = JSONTools::new().flatten().execute(r#"{"a": 1}"#)?.into_vec();
assert_eq!(v.len(), 1);
JsonToolsError
Comprehensive error enum with machine-readable error codes (E001-E008), human-readable messages, and actionable suggestions.
#[derive(Debug, thiserror::Error)]
#[non_exhaustive]
pub enum JsonToolsError {
    JsonParseError { .. },            // E001
    RegexError { .. },                // E002
    InvalidReplacementPattern { .. }, // E003
    InvalidJsonStructure { .. },      // E004
    ConfigurationError { .. },        // E005
    BatchProcessingError { .. },      // E006
    InputValidationError { .. },      // E007
    SerializationError { .. },        // E008
}
Methods
| Method | Returns | Description |
|---|---|---|
| .error_code() | &'static str | Machine-readable code: "E001" through "E008" |
Error Handling Example
use json_tools_rs::{JSONTools, JsonToolsError};

let result = JSONTools::new().flatten().execute("invalid json");

match result {
    Ok(output) => { /* success */ }
    Err(e) => {
        // Machine-readable error code
        match e.error_code() {
            "E001" => eprintln!("JSON parsing error: {}", e),
            "E005" => eprintln!("Configuration error: {}", e),
            "E006" => eprintln!("Batch error: {}", e),
            code => eprintln!("[{}] {}", code, e),
        }

        // Pattern matching for specific handling
        match &e {
            JsonToolsError::JsonParseError { message, suggestion, .. } => {
                eprintln!("Parse failed: {}", message);
                eprintln!("Try: {}", suggestion);
            }
            JsonToolsError::BatchProcessingError { index, source, .. } => {
                eprintln!("Item {} failed: {}", index, source);
            }
            _ => eprintln!("{}", e),
        }
    }
}
Auto-Conversions
JsonToolsError implements From for common error types:
// These conversions happen automatically in ? chains:
impl From<json_parser::JsonError> for JsonToolsError { .. } // -> E001
impl From<regex::Error> for JsonToolsError { .. }           // -> E002
See Error Codes for the full error reference.
ProcessingConfig
Low-level configuration struct used internally by JSONTools. You can construct it directly for advanced use cases, but the JSONTools builder is the recommended interface.
pub struct ProcessingConfig {
    pub separator: String,
    pub lowercase_keys: bool,
    pub filtering: FilteringConfig,
    pub collision: CollisionConfig,
    pub replacements: ReplacementConfig,
    pub auto_convert_types: bool,
    pub parallel_threshold: usize,
    pub num_threads: Option<usize>,
    pub nested_parallel_threshold: usize,
    pub max_array_index: usize,
}
Builder Methods
use json_tools_rs::{ProcessingConfig, FilteringConfig, CollisionConfig, ReplacementConfig};

let config = ProcessingConfig::new()
    .separator("::")
    .lowercase_keys(true)
    .filtering(FilteringConfig::new().set_remove_nulls(true))
    .collision(CollisionConfig::new().handle_collisions(true))
    .replacements(ReplacementConfig::new().add_key_replacement("^old_", "new_"));
FilteringConfig
Configuration for value filtering, stored internally as a bitmask for single-instruction checks on the hot path.
pub struct FilteringConfig { /* bitmask */ }
Builder Methods
All methods consume and return Self.
| Method | Description |
|---|---|
| .set_remove_empty_strings(bool) | Filter "" values |
| .set_remove_nulls(bool) | Filter null values |
| .set_remove_empty_objects(bool) | Filter {} values |
| .set_remove_empty_arrays(bool) | Filter [] values |
Query Methods
| Method | Returns | Description |
|---|---|---|
| .remove_empty_strings() | bool | Is empty string filtering enabled? |
| .remove_nulls() | bool | Is null filtering enabled? |
| .remove_empty_objects() | bool | Is empty object filtering enabled? |
| .remove_empty_arrays() | bool | Is empty array filtering enabled? |
| .has_any_filter() | bool | Is any filter enabled? |
use json_tools_rs::FilteringConfig;

let filtering = FilteringConfig::new()
    .set_remove_nulls(true)
    .set_remove_empty_strings(true);

assert!(filtering.has_any_filter());
assert!(filtering.remove_nulls());
assert!(!filtering.remove_empty_objects());
CollisionConfig
Configuration for key collision handling.
pub struct CollisionConfig {
    pub handle_collisions: bool,
}
Builder Methods
| Method | Description |
|---|---|
.handle_collisions(bool) | Enable/disable collision handling |
Query Methods
| Method | Returns | Description |
|---|---|---|
.has_collision_handling() | bool | Is collision handling enabled? |
```rust
use json_tools_rs::CollisionConfig;

let collision = CollisionConfig::new().handle_collisions(true);
assert!(collision.has_collision_handling());
```
ReplacementConfig
Configuration for key and value replacement patterns. Uses SmallVec<[(String, String); 2]> internally to avoid heap allocation for the common case of 0-2 replacements.
```rust
pub struct ReplacementConfig {
    pub key_replacements: SmallVec<[(String, String); 2]>,
    pub value_replacements: SmallVec<[(String, String); 2]>,
}
```
Builder Methods
| Method | Description |
|---|---|
.add_key_replacement(find, replace) | Add a key replacement regex pattern |
.add_value_replacement(find, replace) | Add a value replacement regex pattern |
Query Methods
| Method | Returns | Description |
|---|---|---|
.has_key_replacements() | bool | Are any key replacements configured? |
.has_value_replacements() | bool | Are any value replacements configured? |
```rust
use json_tools_rs::ReplacementConfig;

let replacements = ReplacementConfig::new()
    .add_key_replacement("^user_", "")
    .add_value_replacement("@old\\.com", "@new.com");

assert!(replacements.has_key_replacements());
assert!(replacements.has_value_replacements());
```
Python API Reference
import json_tools_rs
JSONTools
The main builder class for all JSON operations. All configuration methods return self for chaining; only .execute() and .execute_to_output() trigger processing.
Construction
tools = json_tools_rs.JSONTools()
Creates a new JSONTools instance with all default settings. The instance is reusable -- you can call .execute() multiple times with different inputs.
Operation Modes
Exactly one mode must be set before calling .execute(). Calling a mode method replaces any previously set mode.
.flatten()
tools.flatten() -> JSONTools
Set the operation to flatten nested JSON into dot-separated (or custom separator) keys.
import json_tools_rs as jt
result = jt.JSONTools().flatten().execute({"a": {"b": {"c": 1}}})
# {"a.b.c": 1}
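For intuition, the mapping from nested structure to separator-joined keys can be sketched in a few lines of pure Python. This is an illustrative model only, not the library's SIMD-accelerated Rust implementation, and `flatten_dict` is a hypothetical helper name:

```python
def flatten_dict(obj, sep=".", prefix=""):
    """Recursively flatten nested dicts/lists into separator-joined keys."""
    flat = {}
    if isinstance(obj, dict):
        items = obj.items()
    elif isinstance(obj, list):
        items = ((str(i), v) for i, v in enumerate(obj))
    else:
        return {prefix: obj}  # leaf value at the top level
    for key, value in items:
        path = f"{prefix}{sep}{key}" if prefix else str(key)
        if isinstance(value, (dict, list)) and value:
            flat.update(flatten_dict(value, sep, path))  # recurse into non-empty containers
        else:
            flat[path] = value  # leaves (and empty containers) stop here
    return flat

print(flatten_dict({"a": {"b": {"c": 1}}}))  # {'a.b.c': 1}
```

Array elements get their index as a path segment, which is why `{"xs": [1, 2]}` flattens to `xs.0` and `xs.1`.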
.unflatten()
tools.unflatten() -> JSONTools
Set the operation to reconstruct nested JSON from flat, separator-delimited keys.
result = jt.JSONTools().unflatten().execute({"a.b.c": 1})
# {"a": {"b": {"c": 1}}}
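Unflattening is the inverse walk: split each key on the separator and rebuild the tree. A minimal pure-Python sketch (the library additionally turns numeric path segments back into arrays, which this sketch omits):

```python
def unflatten_dict(flat, sep="."):
    """Rebuild a nested dict from separator-delimited keys."""
    nested = {}
    for key, value in flat.items():
        parts = key.split(sep)
        node = nested
        for part in parts[:-1]:
            node = node.setdefault(part, {})  # create intermediate objects on demand
        node[parts[-1]] = value
    return nested

print(unflatten_dict({"a.b.c": 1}))  # {'a': {'b': {'c': 1}}}
```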
.normal()
tools.normal() -> JSONTools
Set the operation to apply transformations (filtering, replacements, type conversion) without changing the nesting structure.
result = jt.JSONTools().normal().lowercase_keys(True).execute({"Name": "Alice"})
# {"name": "Alice"}
Configuration Methods
All configuration methods return self for chaining.
.separator(sep)
tools.separator(sep: str) -> JSONTools
Set the key separator for flatten/unflatten operations.
| Parameter | Type | Default | Description |
|---|---|---|---|
sep | str | "." | Non-empty string used to join/split nested keys |
Raises: ValueError if sep is an empty string.
result = jt.JSONTools().flatten().separator("::").execute({"a": {"b": 1}})
# {"a::b": 1}
.lowercase_keys(flag)
tools.lowercase_keys(flag: bool) -> JSONTools
Convert all keys to lowercase after processing.
| Parameter | Type | Default | Description |
|---|---|---|---|
flag | bool | False | Enable or disable lowercase key conversion |
result = jt.JSONTools().flatten().lowercase_keys(True).execute({"User": {"Name": "Alice"}})
# {"user.name": "Alice"}
.remove_empty_strings(flag)
tools.remove_empty_strings(flag: bool) -> JSONTools
Remove key-value pairs where the value is an empty string "".
| Parameter | Type | Default | Description |
|---|---|---|---|
flag | bool | False | Enable or disable empty string removal |
result = jt.JSONTools().flatten().remove_empty_strings(True).execute({"a": "", "b": "hello"})
# {"b": "hello"}
.remove_nulls(flag)
tools.remove_nulls(flag: bool) -> JSONTools
Remove key-value pairs where the value is None / null.
| Parameter | Type | Default | Description |
|---|---|---|---|
flag | bool | False | Enable or disable null removal |
result = jt.JSONTools().flatten().remove_nulls(True).execute({"a": None, "b": 1})
# {"b": 1}
.remove_empty_objects(flag)
tools.remove_empty_objects(flag: bool) -> JSONTools
Remove key-value pairs where the value is an empty object {}.
| Parameter | Type | Default | Description |
|---|---|---|---|
flag | bool | False | Enable or disable empty object removal |
.remove_empty_arrays(flag)
tools.remove_empty_arrays(flag: bool) -> JSONTools
Remove key-value pairs where the value is an empty array [].
| Parameter | Type | Default | Description |
|---|---|---|---|
flag | bool | False | Enable or disable empty array removal |
.key_replacement(find, replace)
tools.key_replacement(find: str, replace: str) -> JSONTools
Add a key replacement pattern. Patterns use standard regex syntax. If the regex fails to compile, it falls back to literal string replacement. Multiple replacements can be chained.
| Parameter | Type | Description |
|---|---|---|
find | str | Regex pattern (or literal string) to match in keys |
replace | str | Replacement string (supports regex capture groups like $1) |
result = (jt.JSONTools()
    .flatten()
    .key_replacement("_name$", "_id")
    .key_replacement("^user_", "")
    .execute({"user_name": "Alice"}))
# {"id": "Alice"}
.value_replacement(find, replace)
tools.value_replacement(find: str, replace: str) -> JSONTools
Add a value replacement pattern. Works the same as key replacements but applies to string values.
| Parameter | Type | Description |
|---|---|---|
find | str | Regex pattern (or literal string) to match in values |
replace | str | Replacement string |
result = (jt.JSONTools()
.flatten()
.value_replacement("@example\\.com", "@company.org")
.execute({"email": "user@example.com"}))
# {"email": "user@company.org"}
.handle_key_collision(flag)
tools.handle_key_collision(flag: bool) -> JSONTools
When enabled, keys that would collide after transformations (e.g., after lowercasing) are collected into arrays instead of overwriting each other.
| Parameter | Type | Default | Description |
|---|---|---|---|
flag | bool | False | Enable collision handling |
result = (jt.JSONTools()
.flatten()
.lowercase_keys(True)
.handle_key_collision(True)
.execute({"Name": "Alice", "name": "Bob"}))
# {"name": ["Alice", "Bob"]}
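The collect-into-arrays rule can be modeled in pure Python; `insert_with_collision` is a hypothetical helper for illustration, not part of the library's API:

```python
def insert_with_collision(flat, key, value, handle_collisions=True):
    """Insert a key, collecting values into a list when the key already exists."""
    if key not in flat:
        flat[key] = value
    elif handle_collisions:
        existing = flat[key]
        if isinstance(existing, list):
            existing.append(value)  # third and later collisions extend the list
        else:
            flat[key] = [existing, value]  # first collision creates the list
    else:
        flat[key] = value  # default behavior: last writer wins
    return flat

out = {}
for k, v in [("Name", "Alice"), ("name", "Bob")]:
    insert_with_collision(out, k.lower(), v)
print(out)  # {'name': ['Alice', 'Bob']}
```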
.auto_convert_types(flag)
tools.auto_convert_types(flag: bool) -> JSONTools
Automatically convert string values to their native types:
- Numbers: `"123"` -> `123`, `"1,234.56"` -> `1234.56`, `"$99.99"` -> `99.99`, `"1e5"` -> `100000`
- Booleans: `"true"`/`"TRUE"`/`"True"` -> `true`, `"false"`/`"FALSE"`/`"False"` -> `false`
- Nulls: `"null"`/`"None"` -> `null`
If conversion fails, the original string is kept. No errors are raised on conversion failure.
| Parameter | Type | Default | Description |
|---|---|---|---|
flag | bool | False | Enable automatic type conversion |
result = (jt.JSONTools()
.flatten()
.auto_convert_types(True)
.execute({"id": "123", "price": "1,234.56", "active": "true"}))
# {"id": 123, "price": 1234.56, "active": true}
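The conversion rules above can be approximated in pure Python. This sketch is illustrative only -- the library's converter handles many more formats (percentages, basis points, K/M/B suffixes, dates) and preserves large-integer precision that a round-trip through `float` would lose:

```python
def auto_convert(s):
    """Best-effort conversion of a string to a native type; keep the string on failure."""
    if not isinstance(s, str):
        return s
    lowered = s.strip().lower()
    if lowered == "true":
        return True
    if lowered == "false":
        return False
    if lowered in ("null", "none"):
        return None
    try:
        num = float(s.replace(",", "").lstrip("$"))  # handles "1,234.56", "$99.99", "1e5"
        return int(num) if num.is_integer() else num
    except ValueError:
        return s  # conversion failed: keep the original string, raise nothing

print([auto_convert(v) for v in ["123", "1,234.56", "true", "None", "hello"]])
# [123, 1234.56, True, None, 'hello']
```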
.parallel_threshold(n)
tools.parallel_threshold(n: int) -> JSONTools
Set the minimum batch size to trigger parallel processing. Batches smaller than this are processed sequentially to avoid thread-spawning overhead.
| Parameter | Type | Default | Description |
|---|---|---|---|
n | int | 100 | Minimum batch size for parallelism |
Default can be overridden with the JSON_TOOLS_PARALLEL_THRESHOLD environment variable.
tools = jt.JSONTools().flatten().parallel_threshold(50)
.num_threads(n)
tools.num_threads(n: int) -> JSONTools
Set the number of threads used for parallel processing.
| Parameter | Type | Default | Description |
|---|---|---|---|
n | int | CPU count | Number of worker threads |
Default can be overridden with the JSON_TOOLS_NUM_THREADS environment variable.
tools = jt.JSONTools().flatten().num_threads(4)
.nested_parallel_threshold(n)
tools.nested_parallel_threshold(n: int) -> JSONTools
Set the minimum number of keys/items within a single JSON document to trigger nested (intra-document) parallelism. Only objects or arrays exceeding this count are parallelized internally.
| Parameter | Type | Default | Description |
|---|---|---|---|
n | int | 100 | Minimum keys/items for nested parallelism |
Default can be overridden with the JSON_TOOLS_NESTED_PARALLEL_THRESHOLD environment variable.
tools = jt.JSONTools().flatten().nested_parallel_threshold(200)
.max_array_index(n)
tools.max_array_index(n: int) -> JSONTools
Set the maximum array index allowed during unflattening. This is a DoS protection: a malicious key like "items.999999999" would otherwise allocate a massive array.
| Parameter | Type | Default | Description |
|---|---|---|---|
n | int | 100000 | Maximum array index |
Default can be overridden with the JSON_TOOLS_MAX_ARRAY_INDEX environment variable.
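Why the cap matters can be shown with a pure-Python model of sparse array reconstruction (`set_array_value` is a hypothetical helper, not the library's internal function):

```python
MAX_ARRAY_INDEX = 100_000  # mirrors the max_array_index default

def set_array_value(arr, index, value, max_index=MAX_ARRAY_INDEX):
    """Grow a list to hold `index`, refusing indices beyond the cap."""
    if index > max_index:
        raise ValueError(f"array index {index} exceeds max_array_index {max_index}")
    if index >= len(arr):
        arr.extend([None] * (index + 1 - len(arr)))  # pad the gap with nulls
    arr[index] = value
    return arr

print(set_array_value([], 3, "x"))  # [None, None, None, 'x']
# set_array_value([], 999_999_999, "x") raises instead of allocating a gigantic list
```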
Execution Methods
.execute(input)
tools.execute(input) -> str | dict | list[str] | list[dict] | DataFrame | Series
Execute the configured operation. The return type mirrors the input type:
| Input Type | Output Type |
|---|---|
str | str (JSON string) |
dict | dict (Python dictionary) |
list[str] | list[str] |
list[dict] | list[dict] |
pandas.DataFrame | pandas.DataFrame |
pandas.Series | pandas.Series |
polars.DataFrame | polars.DataFrame |
polars.Series | polars.Series |
pyarrow.Table | pyarrow.Table |
pyarrow.ChunkedArray | pyarrow.ChunkedArray |
pyspark.sql.DataFrame | pyspark.sql.DataFrame |
Raises: JsonToolsError if no mode is set, input is invalid, or processing fails.
# String input -> string output
result = jt.JSONTools().flatten().execute('{"a": {"b": 1}}')
assert isinstance(result, str)
# Dict input -> dict output
result = jt.JSONTools().flatten().execute({"a": {"b": 1}})
assert isinstance(result, dict)
# Batch string input -> batch string output
results = jt.JSONTools().flatten().execute(['{"a": 1}', '{"b": 2}'])
assert isinstance(results, list) and isinstance(results[0], str)
# Batch dict input -> batch dict output
results = jt.JSONTools().flatten().execute([{"a": {"b": 1}}, {"c": {"d": 2}}])
assert isinstance(results, list) and isinstance(results[0], dict)
.execute_to_output(input)
tools.execute_to_output(input) -> JsonOutput
Execute the operation but return a JsonOutput wrapper instead of native Python types. Useful when you need to inspect whether the result is single or multiple before extracting.
Note: DataFrame and Series inputs are not supported with execute_to_output(). Use .execute() for those types.
| Parameter | Type | Description |
|---|---|---|
input | str, dict, list[str], list[dict] | JSON data to process |
output = jt.JSONTools().flatten().execute_to_output('{"a": {"b": 1}}')
if output.is_single:
print(output.get_single())
elif output.is_multiple:
for item in output.get_multiple():
print(item)
JsonOutput
Output wrapper returned by .execute_to_output(). Provides typed access to results.
Properties
| Property | Type | Description |
|---|---|---|
.is_single | bool | True if the result contains a single JSON string |
.is_multiple | bool | True if the result contains multiple JSON strings |
Methods
.get_single()
output.get_single() -> str
Extract the single JSON string result.
Raises: ValueError if the result is multiple.
.get_multiple()
output.get_multiple() -> list[str]
Extract the list of JSON string results.
Raises: ValueError if the result is single.
.to_python()
output.to_python() -> str | list[str]
Convert to native Python type: returns str for single results, list[str] for multiple results.
String Representations
str(output) returns the JSON string (single) or a list representation (multiple).
repr(output) returns JsonOutput.Single('...') or JsonOutput.Multiple([...]).
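The single/multiple semantics can be modeled with a small Python class. This is a conceptual stand-in for illustration only, not the actual PyO3-backed `JsonOutput`:

```python
class FakeJsonOutput:
    """Conceptual model of JsonOutput: wraps one JSON string or a list of them."""

    def __init__(self, value):
        self._value = value

    @property
    def is_single(self):
        return isinstance(self._value, str)

    @property
    def is_multiple(self):
        return isinstance(self._value, list)

    def get_single(self):
        if not self.is_single:
            raise ValueError("result is multiple")
        return self._value

    def get_multiple(self):
        if not self.is_multiple:
            raise ValueError("result is single")
        return self._value

    def to_python(self):
        return self._value  # str for single, list[str] for multiple

out = FakeJsonOutput('{"a.b": 1}')
print(out.is_single, out.get_single())  # True {"a.b": 1}
```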
DataFrame and Series Support
JSON Tools RS natively supports Pandas, Polars, PyArrow, and PySpark DataFrames and Series. Detection is performed via duck typing -- no explicit imports are required.
Pandas DataFrame
import pandas as pd
import json_tools_rs as jt
df = pd.DataFrame({"json_col": [
'{"user": {"name": "Alice", "age": 30}}',
'{"user": {"name": "Bob", "age": 25}}',
]})
tools = jt.JSONTools().flatten().separator(".")
# Process column containing JSON strings
result_df = tools.execute(df)
# Returns a DataFrame with flattened columns: "user.name", "user.age"
Pandas Series
series = pd.Series([
'{"a": {"b": 1}}',
'{"a": {"b": 2}}',
])
result_series = jt.JSONTools().flatten().execute(series)
# Returns a Series of flattened JSON strings
Polars DataFrame
import polars as pl
df = pl.DataFrame({"json_col": [
'{"user": {"name": "Alice"}}',
'{"user": {"name": "Bob"}}',
]})
result_df = jt.JSONTools().flatten().execute(df)
Polars Series
series = pl.Series("data", [
'{"a": {"b": 1}}',
'{"a": {"b": 2}}',
])
result_series = jt.JSONTools().flatten().execute(series)
PyArrow Table
import pyarrow as pa
table = pa.table({"json_col": [
'{"user": {"name": "Alice"}}',
'{"user": {"name": "Bob"}}',
]})
result_table = jt.JSONTools().flatten().execute(table)
PySpark DataFrame
from pyspark.sql import SparkSession
spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([
('{"user": {"name": "Alice"}}',),
('{"user": {"name": "Bob"}}',),
], ["json_col"])
result_df = jt.JSONTools().flatten().execute(df)
JsonToolsError
Exception class for all errors raised by JSON Tools RS.
import json_tools_rs as jt
try:
result = jt.JSONTools().flatten().execute("not valid json")
except jt.JsonToolsError as e:
print(f"Error: {e}")
# Error: [E001] JSON parsing failed: ...
Error messages include a machine-readable code (E001-E008) at the start of the message. See Error Codes for the full reference.
Error Codes Quick Reference
| Code | Name | Common Cause |
|---|---|---|
E001 | JsonParseError | Invalid JSON input |
E002 | RegexError | Bad regex in key/value replacement |
E003 | InvalidReplacementPattern | Malformed replacement pair |
E004 | InvalidJsonStructure | Wrong JSON shape for the operation |
E005 | ConfigurationError | No mode set before .execute() |
E006 | BatchProcessingError | Error in one item during batch processing |
E007 | InputValidationError | Unsupported input type |
E008 | SerializationError | Internal serialization failure |
Handling Specific Errors
import json_tools_rs as jt
try:
result = jt.JSONTools().execute({"a": 1}) # No mode set
except jt.JsonToolsError as e:
msg = str(e)
if "[E005]" in msg:
print("Forgot to call .flatten() or .unflatten()")
elif "[E001]" in msg:
print("Invalid JSON input")
Complete Example
import json_tools_rs as jt
# Build once, reuse many times
tools = (jt.JSONTools()
.flatten()
.separator("::")
.lowercase_keys(True)
.remove_nulls(True)
.remove_empty_strings(True)
.key_replacement("^user_", "")
.auto_convert_types(True)
.parallel_threshold(50)
.num_threads(4)
)
# Single dict
result = tools.execute({"User_Name": "Alice", "User_Age": "30"})
# {"name": "Alice", "age": 30}
# Batch of dicts (processed in parallel if >= 50 items)
results = tools.execute([{"data": str(i)} for i in range(1000)])
# JSON string
result = tools.execute('{"User_Name": "Alice", "nested": {"User_Age": "30"}}')
# DataFrame
import pandas as pd
df = pd.DataFrame({"json": [
'{"User_Name": "Alice", "nested": {"User_Age": "30"}}',
'{"User_Name": "Bob", "nested": {"User_Age": "25"}}',
]})
df_result = tools.execute(df)
Architecture
JSON Tools RS is organized into focused, single-responsibility modules. This modular design improves maintainability while preserving performance -- Rust modules are compile-time organization only, with zero runtime overhead.
Module Structure
src/
├── lib.rs Facade: mod declarations + pub use re-exports
├── json_parser.rs Conditional SIMD parser (sonic-rs / simd-json)
├── types.rs Core types: JsonInput, JsonOutput
├── error.rs Error types with codes E001-E008
├── config.rs Configuration structs and operation modes
├── cache.rs Tiered caching: regex, key deduplication, phf
├── convert.rs Type conversion: numbers, dates, booleans, nulls
├── transform.rs Filtering, key/value replacements, collision handling
├── flatten.rs Flattening algorithm with Crossbeam parallelism
├── unflatten.rs Unflattening with SIMD separator detection
├── builder.rs Public JSONTools builder API and execute()
├── python.rs Python bindings via PyO3
├── tests.rs 99 unit tests
└── main.rs CLI examples
Module Descriptions
json_parser -- JSON Parsing Abstraction
Conditional compilation wrapper that selects the fastest available JSON parser:
- 64-bit platforms: sonic-rs (AVX2/SSE4.2 SIMD, 30-50% faster)
- 32-bit platforms: simd-json (fallback)
Exposes from_str(), to_string(), and parse_json() with a unified JsonError type.
types -- Core Types
Defines the public-facing input/output types:
- `JsonInput<'a>` -- Enum accepting `&str`, `&[&str]`, `Vec<String>`, etc.
- `JsonOutput` -- Enum returning `Single(String)` or `Multiple(Vec<String>)`
error -- Error Handling
JsonToolsError enum with 8 error variants (E001-E008), each with machine-readable codes, Display/Error impls, and constructors. Includes From impls for automatic conversion from parse and regex errors.
config -- Configuration
All configuration structs used by the builder:
- `ProcessingConfig` -- Main config holding all options
- `FilteringConfig` -- Empty string/null/object/array removal
- `CollisionConfig` -- Key collision handling settings
- `ReplacementConfig` -- Key and value replacement patterns
- `OperationMode` -- Flatten, Unflatten, or Normal
cache -- Caching Infrastructure
Three-tier caching system for performance:
- phf perfect hash (`COMMON_JSON_KEYS`) -- Zero-cost lookup for common keys
- Thread-local FxHashMap (`KeyDeduplicator`) -- Per-thread key deduplication
- Global DashMap (`REGEX_CACHE`) -- Compiled regex pattern cache with LRU eviction
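The regex tier is conceptually a memoized compile. The real `REGEX_CACHE` is a concurrent DashMap with LRU eviction; this dict-based Python sketch has neither property and exists only to show the idea:

```python
import re

_REGEX_CACHE = {}

def cached_regex(pattern):
    """Compile a pattern once and reuse the compiled object on later calls."""
    compiled = _REGEX_CACHE.get(pattern)
    if compiled is None:
        compiled = re.compile(pattern)  # pay the compile cost only on a cache miss
        _REGEX_CACHE[pattern] = compiled
    return compiled

a = cached_regex("^user_")
b = cached_regex("^user_")
print(a is b)  # True: the second lookup hits the cache
```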
convert -- Type Conversion
Automatic type conversion for string values (~1,000 lines, the largest leaf module):
- Number parsing: integers, decimals, currency, percentages, basis points, scientific notation, suffixed (K/M/B)
- Date parsing: ISO-8601 variants with UTC normalization
- Boolean/null detection via phf perfect hash maps
- SIMD-optimized `clean_number_string()` with `extend_skipping_3/4` helpers
transform -- Transformations
Core transformation logic applied after flatten/unflatten:
- Key/value replacements (literal and regex, with SIMD fast-path)
- Filtering (empty strings, nulls, empty objects/arrays)
- Key collision handling (collect into arrays)
- Lowercase key conversion
flatten -- Flattening Algorithm
Recursive JSON flattening with performance optimizations:
- `SeparatorCache` for pre-computed separator properties
- `FastStringBuilder` with thread-local caching
- `flatten_value_with_threshold()` for Crossbeam parallel flattening of large objects/arrays
- `quick_leaf_estimate()` for O(1) HashMap pre-sizing
unflatten -- Unflattening Algorithm
Reconstructs nested JSON from flat key-value pairs:
- SIMD-accelerated separator detection (`find_separator*()` functions)
- Path type analysis for array vs. object reconstruction
- Recursive `set_nested_value()` and `set_nested_array_value()`
builder -- Public API
The JSONTools struct and all 35+ builder methods. Routes execute() calls to the appropriate processing function based on operation mode (flatten, unflatten, normal).
python -- Python Bindings
PyO3-based Python bindings with:
- Perfect type preservation (input type = output type)
- Native DataFrame/Series support (Pandas, Polars, PyArrow, PySpark)
- GIL release during compute-intensive operations
Processing Pipeline
Input → Parse → Flatten/Unflatten → Transform → Filter → Convert → Serialize → Output
│ │ │ │ │ │
json_parser flatten/ transform transform convert json_parser
unflatten
Public API Surface
All public types are re-exported from lib.rs, preserving a flat import path:
```rust
use json_tools_rs::{JSONTools, JsonInput, JsonOutput, JsonToolsError};
use json_tools_rs::{ProcessingConfig, FilteringConfig, CollisionConfig, ReplacementConfig};
```
Internal modules use pub(crate) visibility for cross-module access without exposing internals.
Error Codes
All errors include a machine-readable code accessible via .error_code() (Rust) or in the error message (Python).
| Code | Name | Description |
|---|---|---|
E001 | JsonParseError | Invalid JSON input. The input string could not be parsed as valid JSON. |
E002 | RegexError | Invalid regex pattern in a key or value replacement. |
E003 | InvalidReplacementPattern | Malformed replacement pattern string. |
E004 | InvalidJsonStructure | JSON structure is valid but not suitable for the operation (e.g., unflattening non-object JSON). |
E005 | ConfigurationError | Operation mode not set. Call .flatten(), .unflatten(), or .normal() before .execute(). |
E006 | BatchProcessingError | An error occurred while processing one or more items in a batch. |
E007 | InputValidationError | Input validation failed (e.g., unsupported input type). |
E008 | SerializationError | Failed to serialize the output back to JSON. |
Rust Error Handling
```rust
use json_tools_rs::{JSONTools, JsonToolsError};

match JSONTools::new().flatten().execute(input) {
    Ok(result) => { /* success */ }
    Err(e) => {
        eprintln!("[{}] {}", e.error_code(), e);
        // [E001] JSON parse error: expected value at line 1 column 1
    }
}
```
Python Error Handling
import json_tools_rs as jt
try:
result = jt.JSONTools().flatten().execute("not valid json")
except jt.JsonToolsError as e:
print(f"Error: {e}")
# Error: [E001] JSON parse error: ...
Common Errors
E005: No mode set
# Wrong: no mode set
result = jt.JSONTools().execute(data)  # Raises E005
# Correct: set a mode first
result = jt.JSONTools().flatten().execute(data)
E001: Invalid JSON
# Wrong: not valid JSON
result = jt.JSONTools().flatten().execute("hello world")  # Raises E001
# Correct: valid JSON string
result = jt.JSONTools().flatten().execute('{"key": "value"}')
Performance & Benchmarks
JSON Tools RS achieves ~2,000+ ops/ms through multiple optimization layers.
Optimization Techniques
| Technique | Impact |
|---|---|
| SIMD JSON Parsing | sonic-rs (64-bit) / simd-json (32-bit) for hardware-accelerated parsing |
| SIMD Byte Search | memchr/memmem for fast string operations |
| FxHashMap | Fast non-cryptographic hashing via rustc-hash |
| Tiered Caching | phf perfect hash -> thread-local FxHashMap -> global DashMap |
| SmallVec | Stack allocation for depth stacks and number buffers |
| Arc<str> Dedup | Shared key storage to minimize allocations |
| First-Byte Discriminators | Rapid rejection of non-convertible strings |
| Crossbeam Parallelism | Scoped thread pools for batch and nested parallelism |
| Zero-Copy (Cow) | Avoid allocations when strings don't need modification |
| itoa | Fast integer-to-string formatting |
| mimalloc | Optional high-performance allocator (features = ["mimalloc"], ~5-10% speedup) |
Benchmark Results
Measured on Apple Silicon. Results from the stress benchmark suite targeting edge cases and large inputs.
Stress Benchmarks
| Benchmark | Result | Description |
|---|---|---|
| Deep nesting (100 levels) | 8.3 us | Deeply nested objects, 100 levels deep |
| Wide objects (1,000 keys) | ~337 us | Single object with 1,000 top-level keys |
| Large arrays (5,000 items) | ~2.11 ms | Array containing 5,000 elements |
| Parallel batch (10,000 items) | ~2.61 ms | Batch processing with Crossbeam parallelism |
Throughput Targets (v0.9.0)
| Operation | Target |
|---|---|
| Basic flatten | >2,000 ops/ms |
| With transformations | >1,300 ops/ms |
| Regex replacements | >1,800 ops/ms |
| Batch (10 items) | >2,500 ops/ms |
| Batch (100 items) | >3,000 ops/ms |
| Roundtrip | >1,000 cycles/ms |
Performance Tuning
Three threshold parameters control when parallelism activates. Tuning them for your workload can significantly affect throughput.
parallel_threshold (default: 100)
Controls when batch processing (multiple JSON documents) switches from sequential to parallel execution.
When to lower (e.g., 20-50):
- Each document is large or complex (deep nesting, many keys)
- CPU cores are available and not contended
- You are processing 50-100 items and want parallel speedup
When to raise (e.g., 200-500):
- Each document is small (a few keys, shallow nesting)
- Thread-spawning overhead dominates processing time
- Running inside a container with limited CPU
# For large documents, parallel even at small batch sizes
tools = jt.JSONTools().flatten().parallel_threshold(20)
# For tiny documents, avoid parallelism overhead
tools = jt.JSONTools().flatten().parallel_threshold(500)
```rust
let tools = JSONTools::new()
    .flatten()
    .parallel_threshold(50);
```
nested_parallel_threshold (default: 100)
Controls when a single JSON document's top-level keys/array items are processed in parallel (intra-document parallelism). This is independent of batch parallelism.
When to lower (e.g., 50):
- Individual documents have very wide objects (500+ keys) with deep sub-trees
- Processing includes expensive transformations (regex replacements, type conversion)
When to raise (e.g., 500-1000) or effectively disable:
- Documents are moderately sized (under 100 keys)
- Sub-trees are shallow (1-2 levels), so per-key work is minimal
- You want deterministic (sequential) output ordering
# Large documents with heavy per-key work
tools = jt.JSONTools().flatten().nested_parallel_threshold(50)
# Disable nested parallelism entirely
tools = jt.JSONTools().flatten().nested_parallel_threshold(999_999)
num_threads (default: CPU count)
Controls the number of worker threads for parallel processing.
When to set explicitly:
- Running alongside other CPU-intensive workloads -- limit threads to avoid contention
- In a container or VM with a CPU quota -- match thread count to available cores
- Benchmarking -- fix thread count for reproducible results
tools = jt.JSONTools().flatten().num_threads(4)
```rust
let tools = JSONTools::new()
    .flatten()
    .num_threads(Some(4));
```
Environment Variable Overrides
All threshold defaults can be overridden without code changes via environment variables. These are read once at process startup (via LazyLock).
| Variable | Default | Description |
|---|---|---|
JSON_TOOLS_PARALLEL_THRESHOLD | 100 | Minimum batch size for parallel processing |
JSON_TOOLS_NESTED_PARALLEL_THRESHOLD | 100 | Minimum keys/items for nested parallelism |
JSON_TOOLS_NUM_THREADS | (CPU count) | Thread count for parallel processing |
JSON_TOOLS_MAX_ARRAY_INDEX | 100000 | Maximum array index during unflattening |
# Example: tune for a workload of many small documents
export JSON_TOOLS_PARALLEL_THRESHOLD=200
export JSON_TOOLS_NUM_THREADS=8
python my_pipeline.py
Environment variable values are parsed as usize. Invalid values (non-numeric, negative) silently fall back to the default.
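The read-once, silent-fallback behavior is equivalent to the following sketch (`read_usize_env` is a hypothetical name; the real parsing happens in Rust at startup via `LazyLock`):

```python
import os

def read_usize_env(name, default):
    """Read a non-negative integer env var, silently falling back to the default."""
    raw = os.environ.get(name)
    if raw is None:
        return default  # unset: use the built-in default
    try:
        value = int(raw)
    except ValueError:
        return default  # non-numeric: silent fallback
    return value if value >= 0 else default  # negative: silent fallback

os.environ["JSON_TOOLS_PARALLEL_THRESHOLD"] = "200"
print(read_usize_env("JSON_TOOLS_PARALLEL_THRESHOLD", 100))  # 200
os.environ["JSON_TOOLS_PARALLEL_THRESHOLD"] = "not-a-number"
print(read_usize_env("JSON_TOOLS_PARALLEL_THRESHOLD", 100))  # 100
```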
Running Benchmarks
# All benchmarks
cargo bench
# Specific suite
cargo bench --bench isolation_benchmarks
cargo bench --bench comprehensive_benchmark
cargo bench --bench stress_benchmarks
cargo bench --bench realworld_benchmarks
cargo bench --bench combination_benchmarks
Benchmark Suites
| Suite | Focus |
|---|---|
isolation_benchmarks | Individual features in isolation (10 groups) |
combination_benchmarks | 2-way and 3-way feature interactions |
realworld_benchmarks | AWS CloudTrail, GitHub API, K8s, Elasticsearch, Stripe, Twitter/X |
stress_benchmarks | Edge cases: deep nesting, wide objects, large arrays |
comprehensive_benchmark | Full feature coverage (15 groups) |
Profiling
On macOS, use samply for profiling:
# Build with profiling symbols
cargo bench --profile profiling --bench stress_benchmarks --no-run
# Profile with samply
samply record --save-only -o /tmp/profile.json -- \
./target/profiling/deps/stress_benchmarks-* --bench
# View results
samply load /tmp/profile.json
Architecture
The codebase is organized into focused, single-responsibility modules:
src/
├── lib.rs Facade: mod declarations + pub use re-exports
├── json_parser.rs Conditional SIMD parser (sonic-rs / simd-json)
├── types.rs Core types: JsonInput, JsonOutput, FlatMap
├── error.rs Error types with codes E001-E008
├── config.rs Configuration structs and operation modes
├── cache.rs Tiered caching: regex, key deduplication, phf
├── convert.rs Type conversion: numbers, dates, booleans, nulls
├── transform.rs Filtering, key/value replacements, collision handling
├── flatten.rs Flattening algorithm with Crossbeam parallelism
├── unflatten.rs Unflattening with SIMD separator detection
├── builder.rs Public JSONTools builder API and execute()
├── python.rs Python bindings via PyO3
└── tests.rs 99 unit tests
The processing pipeline:
- Parse -- SIMD-accelerated JSON parsing (`json_parser`)
- Flatten/Unflatten -- Recursive traversal with `Arc<str>` key dedup (`flatten`/`unflatten`)
- Transform -- Lowercase, replacements (cached regex), collision handling (`transform`)
- Filter -- Remove empty strings, nulls, empty objects/arrays (`transform`)
- Convert -- Type conversion with first-byte discriminators (`convert`)
- Serialize -- Output to JSON string or native Python types
Troubleshooting
This guide covers common errors, their causes, and how to resolve them.
Error Code Reference
All errors include a machine-readable code (E001-E008) at the start of the error message. Use these codes for programmatic error handling.
E001: JsonParseError
Message: [E001] JSON parsing failed: ...
Cause: The input string is not valid JSON.
Common triggers:
- Missing quotes around keys or values
- Trailing commas after the last element
- Single quotes instead of double quotes
- Unescaped special characters in strings
- Incomplete JSON (missing closing braces or brackets)
- Passing a file path instead of the file contents
Solution:
# Wrong
result = tools.execute("hello world") # Not JSON
result = tools.execute("{'key': 'value'}") # Single quotes
result = tools.execute('{"a": 1,}') # Trailing comma
# Correct
result = tools.execute('{"key": "value"}')
result = tools.execute({"key": "value"}) # Pass a dict directly
E002: RegexError
Message: [E002] Regex pattern error: ...
Cause: A key or value replacement pattern failed to compile as a regex.
Common triggers:
- Unescaped special regex characters (`.`, `*`, `+`, `?`, `(`, `)`, `[`, `]`)
- Unclosed groups or character classes
- Invalid backreferences
Solution:
# Wrong -- unescaped dot matches any character
tools.key_replacement("user.name", "username")
# Correct -- escape the dot for literal matching
tools.key_replacement(r"user\.name", "username")
# Or use a simpler pattern that won't be misinterpreted
tools.key_replacement("user_name", "username")
Note: If regex compilation fails, the library automatically falls back to literal string matching. This error only surfaces when the pattern is syntactically broken (e.g., unclosed groups).
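The fallback behavior can be modeled with Python's `re` module (note the library actually uses Rust's `regex` crate, whose syntax and error conditions differ slightly; this is a sketch of the idea, not the implementation):

```python
import re

def replace_key(key, find, replacement):
    """Try a regex replacement; fall back to literal substring replacement
    when the pattern does not compile."""
    try:
        pattern = re.compile(find)
    except re.error:
        return key.replace(find, replacement)  # literal fallback for broken patterns
    return pattern.sub(replacement, key)

print(replace_key("user.name", r"user\.name", "username"))  # username
print(replace_key("user(name", "user(", "u("))  # unclosed group -> literal: u(name
```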
E003: InvalidReplacementPattern
Message: [E003] Invalid replacement pattern: ...
Cause: The replacement pattern configuration is malformed.
Solution: Ensure replacement patterns are provided as (find, replace) pairs:
# Correct usage
tools.key_replacement("find_pattern", "replacement")
tools.value_replacement("old_value", "new_value")
E004: InvalidJsonStructure
Message: [E004] Invalid JSON structure: ...
Cause: The JSON is valid but not compatible with the requested operation.
Common triggers:
- Unflattening a JSON array (unflatten requires a flat object)
- Unflattening a non-flat object (nested values where flat keys are expected)
Solution:
# Wrong -- unflatten expects a flat object, not an array
result = jt.JSONTools().unflatten().execute('[1, 2, 3]')
# Wrong -- unflatten expects flat keys
result = jt.JSONTools().unflatten().execute('{"a": {"b": 1}}')
# Correct -- flat object with dot-separated keys
result = jt.JSONTools().unflatten().execute('{"a.b": 1, "a.c": 2}')
E005: ConfigurationError
Message: [E005] Operation mode not configured: ...
Cause: .execute() was called without first setting an operation mode.
Solution: Always call .flatten(), .unflatten(), or .normal() before .execute():
# Wrong
result = jt.JSONTools().execute(data)
# Correct
result = jt.JSONTools().flatten().execute(data)
result = jt.JSONTools().unflatten().execute(data)
result = jt.JSONTools().normal().execute(data)
This error also occurs if num_threads is set to 0:
# Wrong
tools = jt.JSONTools().flatten().num_threads(0)
# Correct
tools = jt.JSONTools().flatten().num_threads(1) # At least 1
tools = jt.JSONTools().flatten() # Use default (CPU count)
E006: BatchProcessingError
Message: [E006] Batch processing failed at index {N}: ...
Cause: One or more items in a batch failed to process. The error includes the index of the failing item and the underlying error.
Solution: Check the item at the reported index. The inner error (usually E001 or E004) describes what went wrong:
try:
results = tools.execute(batch_of_json)
except jt.JsonToolsError as e:
msg = str(e)
if "[E006]" in msg:
# Extract the index from the message to find the bad item
print(f"Batch error: {e}")
# Fix or filter the problematic items and retry
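A minimal sketch of pulling the failing index out of the message text, assuming the `[E006] ... at index {N}` format shown above (the message wording is the only contract here, so anchor on the `at index` phrase):

```python
import re

def batch_error_index(message: str):
    """Extract the failing item's index from an E006 message, or None."""
    match = re.search(r"at index (\d+)", message)
    return int(match.group(1)) if match else None

msg = "[E006] Batch processing failed at index 3: [E001] Invalid JSON"
print(batch_error_index(msg))  # 3
```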
E007: InputValidationError
Message: [E007] Input validation failed: ...
Cause: The input type is not supported.
Common triggers:
- Passing an integer, float, or boolean directly
- Passing a non-JSON-string, non-dict type in a list
- Using `execute_to_output()` with a DataFrame or Series (use `execute()` instead)
Solution:
# Wrong
result = tools.execute(42)
result = tools.execute([1, 2, 3])
# Correct
result = tools.execute('{"value": 42}')
result = tools.execute({"value": 42})
result = tools.execute(['{"a": 1}', '{"b": 2}'])
E008: SerializationError
Message: [E008] JSON serialization failed: ...
Cause: The processed result could not be serialized back to JSON. This is typically an internal error.
Solution: If you encounter this error, please report it as a bug. As a workaround, check that your input does not contain unusual Unicode sequences or extremely large numbers that may not round-trip through JSON.
Common Issues
Empty Separator
The separator must be a non-empty string. An empty separator is always a logic error -- it makes keys ambiguous: `{"a": {"bc": 1}}` and `{"ab": {"c": 1}}` would both flatten to `{"abc": 1}`.
# This raises an error
tools = jt.JSONTools().flatten().separator("")
# Use any non-empty string
tools = jt.JSONTools().flatten().separator(".")
tools = jt.JSONTools().flatten().separator("::")
tools = jt.JSONTools().flatten().separator("/")
In Rust, an empty separator causes a panic (via `assert!`). In Python, it raises a `ValueError`.
Missing Operation Mode
The most common mistake is forgetting to set a mode:
# This always raises E005
tools = jt.JSONTools()
tools.execute(data) # Error!
# Set a mode first
tools = jt.JSONTools().flatten()
tools.execute(data) # OK
Dict vs String Input
Both `str` and `dict` inputs are accepted, but the output type mirrors the input type:
# String in -> string out
result = tools.execute('{"a": {"b": 1}}')
assert isinstance(result, str)
# result == '{"a.b":1}'
# Dict in -> dict out
result = tools.execute({"a": {"b": 1}})
assert isinstance(result, dict)
# result == {"a.b": 1}
If you need the raw JSON string output from a dict input, use .execute_to_output():
output = tools.execute_to_output({"a": {"b": 1}})
json_str = output.get_single() # Returns a JSON string
Regex Patterns in Replacements
Replacement patterns use standard regex syntax. Common pitfalls:
# The dot matches ANY character -- "user.name" matches "username" too
tools.key_replacement("user.name", "id")
# Escape dots for literal matching
tools.key_replacement(r"user\.name", "id")
# Use anchors for precise matching
tools.key_replacement("^user_", "") # Only at start of key
tools.key_replacement("_suffix$", "") # Only at end of key
Performance Tuning
When Parallelism Helps
Parallel processing adds overhead for thread spawning and synchronization. It helps when:
- Batch size is large (100+ items by default) -- amortizes spawning cost
- Individual documents are complex -- deep nesting, many keys, expensive transformations
- CPU cores are available -- parallelism on a single-core machine adds only overhead
When Parallelism Hurts
Reduce or disable parallelism when:
- Documents are tiny (a few flat keys) -- thread overhead dominates
- Batch sizes are small (<50 items) -- raise `parallel_threshold`
- Memory is constrained -- each thread needs its own stack and working set
- Running inside a GIL-heavy Python workload -- the GIL is released during Rust processing, but other Python threads may contend
# Disable parallelism for small workloads
tools = jt.JSONTools().flatten().parallel_threshold(999_999)
# Or limit threads
tools = jt.JSONTools().flatten().num_threads(1)
Profiling Tips
Use the built-in benchmark suites to profile your specific workload pattern:
# Profile stress scenarios
cargo bench --profile profiling --bench stress_benchmarks --no-run
samply record --save-only -o /tmp/profile.json -- \
./target/profiling/deps/stress_benchmarks-* --bench
For Python profiling, measure wall-clock time since CPU profilers may not capture time spent in Rust:
import time
start = time.perf_counter()
result = tools.execute(data)
elapsed = time.perf_counter() - start
print(f"Processing took {elapsed:.3f}s")
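Single measurements are noisy; taking the median over a few repeats gives a steadier number. A small helper sketch (the workload below is a stand-in -- substitute `lambda: tools.execute(data)` for your real call):

```python
import statistics
import time

def median_seconds(fn, repeats=5):
    """Median wall-clock time of fn() over several runs."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)

# Stand-in workload; replace with your actual tools.execute(data) call
elapsed = median_seconds(lambda: sum(i * i for i in range(100_000)))
print(f"median: {elapsed:.6f}s")
```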
Platform Notes
mimalloc (Rust-only)
The mimalloc global allocator is an optional feature that provides a 5-10% performance improvement. Enable it with features = ["mimalloc"] in your Cargo.toml. It is not included in Python builds because PyO3 manages memory through Python's allocator.
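A minimal `Cargo.toml` entry enabling the feature (the version number here is illustrative -- pin whichever release you actually depend on):

```toml
[dependencies]
json_tools_rs = { version = "0.9", features = ["mimalloc"] }
```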
sonic-rs (64-bit only)
The default JSON parser is sonic-rs, which uses SIMD instructions available on 64-bit platforms (x86_64, aarch64). On 32-bit platforms, the library automatically falls back to simd-json. This is transparent -- the API is identical regardless of which parser is active.
macOS Profiling
On macOS, flamegraph requires full Xcode (not just Command Line Tools). Use samply instead:
cargo install samply
samply record --save-only -o profile.json -- ./target/profiling/deps/BENCH_BINARY --bench
samply load profile.json # Opens Firefox Profiler
Valgrind does not work on modern macOS. Use Instruments (if Xcode is installed) or samply for profiling.
Changelog
v0.9.0 (2026-03-09)
Added
- DataFrame & Series Support (Python): Native support for Pandas, Polars, PyArrow, and PySpark DataFrames and Series with perfect type preservation.
- Crossbeam Parallelism: Migrated from Rayon to Crossbeam for finer-grained parallel control with scoped threads.
- Modular Architecture: Refactored monolithic `lib.rs` into 10 focused modules (`json_parser`, `types`, `error`, `config`, `cache`, `convert`, `transform`, `flatten`, `unflatten`, `builder`) with zero public API changes.
Performance Improvements
Rust Core (6 optimizations):
- Eliminated per-entry HashMap allocation in parallel flatten -- single partial map per chunk
- Added early-exit first-byte discriminators for type conversion fast-path
- SIMD literal fallback for regex patterns (memchr before regex compilation)
- Thread-local regex cache half-eviction (LRU-style, capacity 64)
- Expanded SmallVec buffers (32 -> 64 bytes) and separator cache
- Vectorized `clean_number_string()` with SIMD skip helpers
Python Bindings (3 optimizations):
- `mem::take` for zero-cost builder field extraction
- Batch type detection via first-element sampling
- O(1) DataFrame/Series reconstruction
v0.8.0 (2026-01-01)
- Python Feature Parity: `auto_convert_types`, `parallel_threshold`, `num_threads`, `nested_parallel_threshold` in Python
- Enhanced Type Conversion: ISO-8601 dates, currency codes, basis points, suffixed numbers
- Date Normalization: Automatic UTC normalization
v0.7.0 (2025-10-17)
- Parallel configuration methods (`parallel_threshold`, `num_threads`, `nested_parallel_threshold`)
- HashMap capacity and hashing optimizations
v0.6.0 (2025-10-13)
- Python GIL release for parallel operations (5-13% improvement)
- Inline hints on hot functions
v0.5.0 (2025-10-12)
- Rust inline optimizations (2-5% improvement)
- Iterator adapter chains
v0.4.0 (2025-10-11)
- FxHashMap migration (30-55% improvement)
- SIMD JSON parsing (sonic-rs / simd-json)
- SmallVec stack allocation
- Arc<str> key deduplication
v0.3.0 (2025-10-10)
- Automatic type conversion
- Python bindings via PyO3
v0.2.0 (2025-10-09)
- Key collision handling
- Comprehensive filtering (empty strings, nulls, objects, arrays)
- Regex-based replacements
v0.1.0 (2025-10-08)
- Initial release
- JSON flattening and unflattening
- Custom separators
- Batch processing
For the full changelog with migration guides, see CHANGELOG.md.