Performance & Benchmarks

JSON Tools RS achieves ~2,000+ ops/ms through multiple optimization layers.

Optimization Techniques

Technique	Impact
SIMD JSON Parsing	sonic-rs (64-bit) / simd-json (32-bit) for hardware-accelerated parsing
SIMD Byte Search	memchr/memmem for fast string operations
FxHashMap	Fast non-cryptographic hashing via rustc-hash
Tiered Caching	phf perfect hash -> thread-local FxHashMap -> global DashMap
SmallVec	Stack allocation for depth stacks and number buffers
Arc<str> Dedup	Shared key storage to minimize allocations
First-Byte Discriminators	Rapid rejection of non-convertible strings
Crossbeam Parallelism	Scoped thread pools for batch and nested parallelism
Zero-Copy (Cow)	Avoid allocations when strings don't need modification
itoa	Fast integer-to-string formatting
mimalloc	Optional high-performance allocator (`features = ["mimalloc"]`, ~5-10% speedup)

Benchmark Results

Measured on Apple Silicon. Results from the stress benchmark suite targeting edge cases and large inputs.

Stress Benchmarks

Benchmark	Result	Description
Deep nesting (100 levels)	8.3 us	Deeply nested objects, 100 levels deep
Wide objects (1,000 keys)	~337 us	Single object with 1,000 top-level keys
Large arrays (5,000 items)	~2.11 ms	Array containing 5,000 elements
Parallel batch (10,000 items)	~2.61 ms	Batch processing with Crossbeam parallelism

Throughput Targets (v0.9.0)

Operation	Target
Basic flatten	>2,000 ops/ms
With transformations	>1,300 ops/ms
Regex replacements	>1,800 ops/ms
Batch (10 items)	>2,500 ops/ms
Batch (100 items)	>3,000 ops/ms
Roundtrip	>1,000 cycles/ms

Performance Tuning

Three threshold parameters control when parallelism activates. Tuning them for your workload can significantly affect throughput.

`parallel_threshold` (default: 100)

Controls when batch processing (multiple JSON documents) switches from sequential to parallel execution.

When to lower (e.g., 20-50):

Each document is large or complex (deep nesting, many keys)
CPU cores are available and not contended
You are processing 50-100 items and want parallel speedup

When to raise (e.g., 200-500):

Each document is small (a few keys, shallow nesting)
Thread-spawning overhead dominates processing time
Running inside a container with limited CPU

# For large documents, parallel even at small batch sizes
tools = jt.JSONTools().flatten().parallel_threshold(20)

# For tiny documents, avoid parallelism overhead
tools = jt.JSONTools().flatten().parallel_threshold(500)

#![allow(unused)]
fn main() {
let tools = JSONTools::new()
    .flatten()
    .parallel_threshold(50);
}

`nested_parallel_threshold` (default: 100)

Controls when a single JSON document's top-level keys/array items are processed in parallel (intra-document parallelism). This is independent of batch parallelism.

When to lower (e.g., 50):

Individual documents have very wide objects (500+ keys) with deep sub-trees
Processing includes expensive transformations (regex replacements, type conversion)

When to raise (e.g., 500-1000) or effectively disable:

Documents are moderately sized (under 100 keys)
Sub-trees are shallow (1-2 levels), so per-key work is minimal
You want deterministic (sequential) output ordering

# Large documents with heavy per-key work
tools = jt.JSONTools().flatten().nested_parallel_threshold(50)

# Disable nested parallelism entirely
tools = jt.JSONTools().flatten().nested_parallel_threshold(999_999)

`num_threads` (default: CPU count)

Controls the number of worker threads for parallel processing.

When to set explicitly:

Running alongside other CPU-intensive workloads -- limit threads to avoid contention
In a container or VM with a CPU quota -- match thread count to available cores
Benchmarking -- fix thread count for reproducible results

tools = jt.JSONTools().flatten().num_threads(4)

#![allow(unused)]
fn main() {
let tools = JSONTools::new()
    .flatten()
    .num_threads(Some(4));
}

Environment Variable Overrides

All threshold defaults can be overridden without code changes via environment variables. These are read once at process startup (via LazyLock).

Variable	Default	Description
`JSON_TOOLS_PARALLEL_THRESHOLD`	`100`	Minimum batch size for parallel processing
`JSON_TOOLS_NESTED_PARALLEL_THRESHOLD`	`100`	Minimum keys/items for nested parallelism
`JSON_TOOLS_NUM_THREADS`	(CPU count)	Thread count for parallel processing
`JSON_TOOLS_MAX_ARRAY_INDEX`	`100000`	Maximum array index during unflattening

# Example: tune for a workload of many small documents
export JSON_TOOLS_PARALLEL_THRESHOLD=200
export JSON_TOOLS_NUM_THREADS=8

python my_pipeline.py

Environment variable values are parsed as usize. Invalid values (non-numeric, negative) silently fall back to the default.

Running Benchmarks

# All benchmarks
cargo bench

# Specific suite
cargo bench --bench isolation_benchmarks
cargo bench --bench comprehensive_benchmark
cargo bench --bench stress_benchmarks
cargo bench --bench realworld_benchmarks
cargo bench --bench combination_benchmarks

Benchmark Suites

Suite	Focus
`isolation_benchmarks`	Individual features in isolation (10 groups)
`combination_benchmarks`	2-way and 3-way feature interactions
`realworld_benchmarks`	AWS CloudTrail, GitHub API, K8s, Elasticsearch, Stripe, Twitter/X
`stress_benchmarks`	Edge cases: deep nesting, wide objects, large arrays
`comprehensive_benchmark`	Full feature coverage (15 groups)

Profiling

On macOS, use samply for profiling:

# Build with profiling symbols
cargo bench --profile profiling --bench stress_benchmarks --no-run

# Profile with samply
samply record --save-only -o /tmp/profile.json -- \
    ./target/profiling/deps/stress_benchmarks-* --bench

# View results
samply load /tmp/profile.json

Architecture

The codebase is organized into focused, single-responsibility modules:

src/
├── lib.rs            Facade: mod declarations + pub use re-exports
├── json_parser.rs    Conditional SIMD parser (sonic-rs / simd-json)
├── types.rs          Core types: JsonInput, JsonOutput, FlatMap
├── error.rs          Error types with codes E001-E008
├── config.rs         Configuration structs and operation modes
├── cache.rs          Tiered caching: regex, key deduplication, phf
├── convert.rs        Type conversion: numbers, dates, booleans, nulls
├── transform.rs      Filtering, key/value replacements, collision handling
├── flatten.rs        Flattening algorithm with Crossbeam parallelism
├── unflatten.rs      Unflattening with SIMD separator detection
├── builder.rs        Public JSONTools builder API and execute()
├── python.rs         Python bindings via PyO3
└── tests.rs          99 unit tests

The processing pipeline:

Parse -- SIMD-accelerated JSON parsing (json_parser)
Flatten/Unflatten -- Recursive traversal with Arc<str> key dedup (flatten/unflatten)
Transform -- Lowercase, replacements (cached regex), collision handling (transform)
Filter -- Remove empty strings, nulls, empty objects/arrays (transform)
Convert -- Type conversion with first-byte discriminators (convert)
Serialize -- Output to JSON string or native Python types

JSON Tools RS Guide