Parallel Processing

JSON Tools RS uses Crossbeam-based parallelism to automatically speed up batch operations and large nested structures.

Automatic Parallelism

Batch processing (100+ items by default) automatically uses parallel execution:

#![allow(unused)]
fn main() {
let batch: Vec<&str> = large_json_collection;
let result = JSONTools::new()
    .flatten()
    .execute(batch.as_slice())?;
// Automatically parallelized
}
batch = [{"data": i} for i in range(2000)]
results = jt.JSONTools().flatten().execute(batch)
# Automatically parallelized

Configuration

Batch Threshold

Control the minimum batch size before parallelism kicks in:

#![allow(unused)]
fn main() {
let result = JSONTools::new()
    .flatten()
    .parallel_threshold(50)  // Only parallelize batches of 50+ items
    .execute(batch.as_slice())?;
}

Thread Count

Limit the number of threads used:

#![allow(unused)]
fn main() {
let result = JSONTools::new()
    .flatten()
    .num_threads(Some(4))  // Use 4 threads (default: CPU count)
    .execute(batch.as_slice())?;
}

Nested Parallelism

Large individual JSON objects/arrays can also be parallelized:

#![allow(unused)]
fn main() {
let result = JSONTools::new()
    .flatten()
    .nested_parallel_threshold(200)  // Parallelize objects with 200+ entries
    .execute(large_json)?;
}

Python Configuration

tools = (jt.JSONTools()
    .flatten()
    .parallel_threshold(50)
    .num_threads(4)
    .nested_parallel_threshold(200)
)

results = tools.execute(large_batch)

How It Works

  • Batch parallelism: Input is split into chunks, each processed by a separate thread via crossbeam::thread::scope. Results are written to pre-allocated slots preserving input order.
  • Nested parallelism: Large JSON objects (many keys) or arrays (many elements) are split across threads for parallel flattening, then merged.
  • Thread safety: All parallelism uses scoped threads -- no 'static bounds required, no data races possible.

Environment Variables

All parallelism settings can be overridden via environment variables (applied at construction time):

VariableDefaultDescription
JSON_TOOLS_PARALLEL_THRESHOLD100Minimum batch size to trigger parallel processing
JSON_TOOLS_NESTED_PARALLEL_THRESHOLD100Minimum object/array size for nested parallelism
JSON_TOOLS_NUM_THREADSCPU countNumber of threads for parallel processing
JSON_TOOLS_MAX_ARRAY_INDEX100000Maximum array index during unflattening (DoS protection)
export JSON_TOOLS_PARALLEL_THRESHOLD=50
export JSON_TOOLS_NESTED_PARALLEL_THRESHOLD=200
export JSON_TOOLS_NUM_THREADS=4
export JSON_TOOLS_MAX_ARRAY_INDEX=500000

Environment variables take effect when JSONTools::new() is called. Builder method calls (e.g., .parallel_threshold(n)) override them.