Troubleshooting

This guide covers common errors, their causes, and how to resolve them.

Error Code Reference

All errors include a machine-readable code (E001-E008) at the start of the error message. Use these codes for programmatic error handling.
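
Because the code sits at the start of the message, it can be pulled out with a short regular expression. The following is a minimal sketch using only the standard library; the helper name error_code and the sample message text are illustrative and not part of the library.

```python
import re
from typing import Optional

# Matches a code such as "[E001]" at the very start of an error message.
_CODE_RE = re.compile(r"^\[(E\d{3})\]")


def error_code(message: str) -> Optional[str]:
    """Return the leading error code (e.g. "E001"), or None if there is none."""
    match = _CODE_RE.match(message)
    return match.group(1) if match else None


# The detail text after the prefix is made up for this example.
print(error_code("[E001] JSON parsing failed: sample detail"))  # E001
```

Branching on the returned code is less brittle than searching for substrings of the human-readable part of the message.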

E001: JsonParseError

Message: [E001] JSON parsing failed: ...

Cause: The input string is not valid JSON.

Common triggers:

  • Missing quotes around keys or values
  • Trailing commas after the last element
  • Single quotes instead of double quotes
  • Unescaped special characters in strings
  • Incomplete JSON (missing closing braces or brackets)
  • Passing a file path instead of the file contents

Solution:

# Wrong
result = tools.execute("hello world")          # Not JSON
result = tools.execute("{'key': 'value'}")     # Single quotes
result = tools.execute('{"a": 1,}')            # Trailing comma

# Correct
result = tools.execute('{"key": "value"}')
result = tools.execute({"key": "value"})       # Pass a dict directly
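
To find where a document is malformed before handing it to the library, one option is to run it through Python's own json module first. This is only a rough pre-check: Python's parser is not the library's parser and may be more lenient in places (for example, it accepts NaN by default). The helper name describe_json_error is hypothetical.

```python
import json
from typing import Optional


def describe_json_error(text: str) -> Optional[str]:
    """Return None if text parses as JSON, else where parsing failed."""
    try:
        json.loads(text)
    except json.JSONDecodeError as exc:
        # lineno and colno are 1-based positions of the first problem found.
        return f"line {exc.lineno}, column {exc.colno}: {exc.msg}"
    return None


for candidate in ['{"key": "value"}', "{'key': 'value'}", '{"a": 1,}']:
    print(repr(candidate), "->", describe_json_error(candidate))
```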

E002: RegexError

Message: [E002] Regex pattern error: ...

Cause: A key or value replacement pattern failed to compile as a regex.

Common triggers:

  • Unescaped special regex characters (., *, +, ?, (, ), [, ])
  • Unclosed groups or character classes
  • Invalid backreferences

Solution:

# Wrong -- compiles without error, but the unescaped dot matches any character
tools.key_replacement("user.name", "username")

# Correct -- escape the dot for literal matching
tools.key_replacement(r"user\.name", "username")

# Or use a simpler pattern that won't be misinterpreted
tools.key_replacement("user_name", "username")

Note: If regex compilation fails, the library automatically falls back to literal string matching. This error only surfaces when the pattern is syntactically broken (e.g., unclosed groups).
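
When a key should be matched literally, escaping it programmatically avoids hand-written backslashes. The sketch below uses Python's re module as a stand-in; the regex dialect used by the library may differ from Python's, so treat the compile check as a rough pre-flight test only. Both helper names are hypothetical.

```python
import re


def literal_pattern(text: str) -> str:
    """Escape regex metacharacters so that text matches only itself."""
    return re.escape(text)


def compiles(pattern: str) -> bool:
    """Return True if Python's re module can compile the pattern."""
    try:
        re.compile(pattern)
    except re.error:
        return False
    return True


print(literal_pattern("user.name"))  # user\.name
print(compiles("(unclosed"))         # False
```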

E003: InvalidReplacementPattern

Message: [E003] Invalid replacement pattern: ...

Cause: The replacement pattern configuration is malformed.

Solution: Ensure replacement patterns are provided as (find, replace) pairs:

# Correct usage
tools.key_replacement("find_pattern", "replacement")
tools.value_replacement("old_value", "new_value")
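
If replacement rules are loaded from a configuration file, they can be checked before being applied. This is a sketch under the assumption that the rules are held as a list of two-string pairs; check_pairs is a hypothetical helper, not a library function.

```python
from typing import List


def check_pairs(pairs: list) -> List[str]:
    """Return a description of every entry that is not a (find, replace) pair."""
    problems = []
    for index, pair in enumerate(pairs):
        if not (isinstance(pair, (tuple, list)) and len(pair) == 2):
            problems.append(f"entry {index}: expected a (find, replace) pair")
        elif not all(isinstance(part, str) for part in pair):
            problems.append(f"entry {index}: both parts must be strings")
    return problems


rules = [("find_pattern", "replacement"), ("orphan",), ("old_value", 42)]
print(check_pairs(rules))
```

Each well-formed pair can then be passed on as tools.key_replacement(find, replace) or tools.value_replacement(find, replace).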

E004: InvalidJsonStructure

Message: [E004] Invalid JSON structure: ...

Cause: The JSON is valid but not compatible with the requested operation.

Common triggers:

  • Unflattening a JSON array (unflatten requires a flat object)
  • Unflattening a non-flat object (nested values where flat keys are expected)

Solution:

# Wrong -- unflatten expects a flat object, not an array
result = jt.JSONTools().unflatten().execute('[1, 2, 3]')

# Wrong -- unflatten expects flat keys
result = jt.JSONTools().unflatten().execute('{"a": {"b": 1}}')

# Correct -- flat object with dot-separated keys
result = jt.JSONTools().unflatten().execute('{"a.b": 1, "a.c": 2}')
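
A quick shape check before unflattening can catch both triggers early. The sketch below treats an input as flat when it is a JSON object with no nested objects as values; whether the library also rejects arrays as values is not stated here, so this sketch allows them. The name is_flat_object is hypothetical.

```python
import json


def is_flat_object(text: str) -> bool:
    """Return True if text is a JSON object whose values are not objects."""
    try:
        value = json.loads(text)
    except json.JSONDecodeError:
        return False
    if not isinstance(value, dict):
        return False  # arrays and scalars at the top level are not flat objects
    return not any(isinstance(item, dict) for item in value.values())


print(is_flat_object('[1, 2, 3]'))             # False
print(is_flat_object('{"a": {"b": 1}}'))       # False
print(is_flat_object('{"a.b": 1, "a.c": 2}'))  # True
```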

E005: ConfigurationError

Message: [E005] Operation mode not configured: ...

Cause: .execute() was called without first setting an operation mode.

Solution: Always call .flatten(), .unflatten(), or .normal() before .execute():

# Wrong
result = jt.JSONTools().execute(data)

# Correct
result = jt.JSONTools().flatten().execute(data)
result = jt.JSONTools().unflatten().execute(data)
result = jt.JSONTools().normal().execute(data)

This error also occurs if num_threads is set to 0:

# Wrong
tools = jt.JSONTools().flatten().num_threads(0)

# Correct
tools = jt.JSONTools().flatten().num_threads(1)    # At least 1
tools = jt.JSONTools().flatten()                    # Use default (CPU count)

E006: BatchProcessingError

Message: [E006] Batch processing failed at index {N}: ...

Cause: One or more items in a batch failed to process. The error includes the index of the failing item and the underlying error.

Solution: Check the item at the reported index. The inner error (usually E001 or E004) describes what went wrong:

try:
    results = tools.execute(batch_of_json)
except jt.JsonToolsError as e:
    msg = str(e)
    if "[E006]" in msg:
        # Extract the index from the message to find the bad item
        print(f"Batch error: {e}")
        # Fix or filter the problematic items and retry
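
The index can be recovered from the message with a regular expression built on the documented prefix "[E006] Batch processing failed at index {N}". This is a minimal sketch; the helper names are hypothetical, and the text after the index in the sample message is an assumed shape, not confirmed output.

```python
import re
from typing import Optional

_INDEX_RE = re.compile(r"\[E006\] Batch processing failed at index (\d+)")


def failing_index(message: str) -> Optional[int]:
    """Return the batch index named in an E006 message, or None."""
    match = _INDEX_RE.search(message)
    return int(match.group(1)) if match else None


def drop_item(batch: list, index: int) -> list:
    """Return a copy of batch without the item at index."""
    return batch[:index] + batch[index + 1:]


# Sample message; everything after the index is illustrative only.
message = "[E006] Batch processing failed at index 1: sample inner error"
batch = ['{"a": 1}', "not json", '{"b": 2}']
index = failing_index(message)
if index is not None:
    print(drop_item(batch, index))  # ['{"a": 1}', '{"b": 2}']
```

If several items are bad, retrying after each removal reports them one at a time, so validating the whole batch up front may be quicker.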

E007: InputValidationError

Message: [E007] Input validation failed: ...

Cause: The input type is not supported.

Common triggers:

  • Passing an integer, float, or boolean directly
  • Passing a non-JSON-string, non-dict type in a list
  • Using execute_to_output() with a DataFrame or Series (use execute() instead)

Solution:

# Wrong
result = tools.execute(42)           # Bare integer
result = tools.execute([1, 2, 3])    # List items are neither JSON strings nor dicts

# Correct
result = tools.execute('{"value": 42}')
result = tools.execute({"value": 42})
result = tools.execute(['{"a": 1}', '{"b": 2}'])
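
The accepted shapes listed above can be checked up front. This sketch covers only str, dict, and lists of those; it does not cover DataFrame or Series inputs, and how the library treats an empty list is not stated here. The name is_supported_input is hypothetical.

```python
def is_supported_input(value: object) -> bool:
    """Return True for a JSON string, a dict, or a list made up of those."""
    if isinstance(value, (str, dict)):
        return True
    if isinstance(value, list):
        return all(isinstance(item, (str, dict)) for item in value)
    return False


print(is_supported_input(42))                      # False
print(is_supported_input([1, 2, 3]))               # False
print(is_supported_input(['{"a": 1}', {"b": 2}]))  # True
```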

E008: SerializationError

Message: [E008] JSON serialization failed: ...

Cause: The processed result could not be serialized back to JSON. This is typically an internal error.

Solution: If you encounter this error, please report it as a bug. As a workaround, check that your input does not contain unusual Unicode sequences or extremely large numbers that may not round-trip through JSON.

Common Issues

Empty Separator

The separator must be a non-empty string. An empty separator is always a logic error: path segments would be joined with nothing between them, so different nested structures could produce the same flattened key.

# This raises an error
tools = jt.JSONTools().flatten().separator("")

# Use any non-empty string
tools = jt.JSONTools().flatten().separator(".")
tools = jt.JSONTools().flatten().separator("::")
tools = jt.JSONTools().flatten().separator("/")

In Rust, an empty separator causes a panic (via assert!). In Python, it raises a ValueError.
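
The ambiguity is easy to reproduce with a toy flattener. The function below is an illustration written for this guide, not the library's implementation; it only handles nested dicts.

```python
def flatten(obj: dict, sep: str, prefix: str = "") -> dict:
    """Flatten nested dicts by joining key paths with sep (toy version)."""
    flat = {}
    for key, value in obj.items():
        path = f"{prefix}{sep}{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, sep, path))
        else:
            flat[path] = value
    return flat


first = {"ab": {"c": 1}}
second = {"a": {"bc": 1}}

print(flatten(first, "."), flatten(second, "."))  # {'ab.c': 1} {'a.bc': 1}
print(flatten(first, ""), flatten(second, ""))    # {'abc': 1} {'abc': 1}
```

With "." the two inputs stay distinguishable; with an empty separator both collapse to the key "abc", and the original nesting cannot be recovered.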

Missing Operation Mode

The most common mistake is forgetting to set a mode:

# This always raises E005
tools = jt.JSONTools()
tools.execute(data)  # Error!

# Set a mode first
tools = jt.JSONTools().flatten()
tools.execute(data)  # OK

Dict vs String Input

Both str and dict inputs are accepted, but the output type mirrors the input type:

# String in -> string out
result = tools.execute('{"a": {"b": 1}}')
assert isinstance(result, str)
# result == '{"a.b":1}'

# Dict in -> dict out
result = tools.execute({"a": {"b": 1}})
assert isinstance(result, dict)
# result == {"a.b": 1}

If you need the raw JSON string output from a dict input, use .execute_to_output():

output = tools.execute_to_output({"a": {"b": 1}})
json_str = output.get_single()  # Returns a JSON string
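
Code that must handle either result type can normalize to a dict before comparing, which also avoids depending on the exact whitespace of a JSON string. This is a small sketch using the standard json module; as_dict is a hypothetical helper.

```python
import json
from typing import Union


def as_dict(result: Union[str, dict]) -> dict:
    """Parse a JSON string result; return a dict result unchanged."""
    return json.loads(result) if isinstance(result, str) else result


string_result = '{"a.b":1}'   # shape of a string-in result
dict_result = {"a.b": 1}      # shape of a dict-in result
print(as_dict(string_result) == as_dict(dict_result))  # True
```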

Regex Patterns in Replacements

Replacement patterns use standard regex syntax. Common pitfalls:

# The dot matches ANY single character -- "user.name" also matches "user_name" and "user-name"
tools.key_replacement("user.name", "id")

# Escape dots for literal matching
tools.key_replacement(r"user\.name", "id")

# Use anchors for precise matching
tools.key_replacement("^user_", "")       # Only at start of key
tools.key_replacement("_suffix$", "")      # Only at end of key
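
These pitfalls can be tried out without the library. The sketch below uses Python's re module as a stand-in; for patterns this simple, the behavior of dots and anchors is the same in common regex dialects, but the library's engine is not Python's, so verify anything more exotic there.

```python
import re

keys = ["user.name", "user_name", "user_id"]

# Unescaped dot: any single character may sit between "user" and "name".
print([k for k in keys if re.search("user.name", k)])
# ['user.name', 'user_name']

# Escaped dot: only a literal "." matches.
print([k for k in keys if re.search(r"user\.name", k)])
# ['user.name']

# Anchors restrict where the match may occur.
print(re.sub("^user_", "", "user_id"))        # id
print(re.sub("^user_", "", "super_user_id"))  # super_user_id (unchanged)
print(re.sub("_suffix$", "", "id_suffix"))    # id
```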

Performance Tuning

When Parallelism Helps

Parallel processing adds overhead for thread spawning and synchronization. It helps when:

  • Batch size is large (100+ items by default) -- amortizes spawning cost
  • Individual documents are complex -- deep nesting, many keys, expensive transformations
  • CPU cores are available -- parallelism on a single-core machine adds only overhead

When Parallelism Hurts

Reduce or disable parallelism when:

  • Documents are tiny (a few flat keys) -- thread overhead dominates
  • Batch sizes are small (<50 items) -- raise parallel_threshold
  • Memory is constrained -- each thread needs its own stack and working set
  • Running inside a GIL-heavy Python workload -- the GIL is released during Rust processing, but other Python threads may contend

# Disable parallelism for small workloads
tools = jt.JSONTools().flatten().parallel_threshold(999_999)

# Or limit threads
tools = jt.JSONTools().flatten().num_threads(1)
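
The guidance above can be condensed into a rule of thumb for deciding whether to leave parallelism enabled. This is a sketch of the reasoning only, not the library's internal logic; the function name is hypothetical, and the default of 100 is taken from the "100+ items" figure mentioned earlier.

```python
import os
from typing import Optional


def worth_parallelizing(
    batch_size: int, threshold: int = 100, cpus: Optional[int] = None
) -> bool:
    """Rule of thumb: need more than one core and a batch at or above threshold."""
    if cpus is None:
        cpus = os.cpu_count() or 1  # cpu_count() can return None
    return cpus > 1 and batch_size >= threshold


print(worth_parallelizing(500, cpus=8))  # True
print(worth_parallelizing(500, cpus=1))  # False
print(worth_parallelizing(20, cpus=8))   # False
```

Document complexity and memory pressure are not modeled here; measuring on a representative batch remains the more reliable guide.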

Profiling Tips

Use the built-in benchmark suites to profile your specific workload pattern:

# Profile stress scenarios
cargo bench --profile profiling --bench stress_benchmarks --no-run
samply record --save-only -o /tmp/profile.json -- \
    ./target/profiling/deps/stress_benchmarks-* --bench

For Python profiling, measure wall-clock time since CPU profilers may not capture time spent in Rust:

import time
start = time.perf_counter()
result = tools.execute(data)
elapsed = time.perf_counter() - start
print(f"Processing took {elapsed:.3f}s")
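
A single measurement can be noisy, so repeating the call and keeping the fastest run often gives a steadier number. The helper below is a generic sketch; best_of is a hypothetical name, and the lambda stands in for a call such as tools.execute(data).

```python
import time
from typing import Callable


def best_of(fn: Callable[[], object], repeats: int = 5) -> float:
    """Call fn repeats times and return the fastest wall-clock time in seconds."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return min(timings)


# Stand-in workload; replace the lambda body with the call being measured.
elapsed = best_of(lambda: sum(range(100_000)))
print(f"Fastest run took {elapsed:.6f}s")
```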

Platform Notes

mimalloc (Rust-only)

The mimalloc global allocator is an optional feature that provides a 5-10% performance improvement. Enable it with features = ["mimalloc"] in your Cargo.toml. It is not included in Python builds because PyO3 manages memory through Python's allocator.

sonic-rs (64-bit only)

The default JSON parser is sonic-rs, which uses SIMD instructions available on 64-bit platforms (x86_64, aarch64). On 32-bit platforms, the library automatically falls back to simd-json. This is transparent -- the API is identical regardless of which parser is active.
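
When comparing performance across machines, it can help to record the platform's pointer width and architecture alongside the timings. The sketch below reports these for the running Python interpreter using only the standard library; it does not query the library, which (per the above) exposes the same API either way.

```python
import platform
import struct


def pointer_width_bits() -> int:
    """Return the pointer size of the running interpreter, in bits."""
    return struct.calcsize("P") * 8


print(f"{pointer_width_bits()}-bit interpreter on {platform.machine()}")
```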

macOS Profiling

On macOS, flamegraph requires full Xcode (not just Command Line Tools). Use samply instead:

cargo install samply
samply record --save-only -o profile.json -- ./target/profiling/deps/BENCH_BINARY --bench
samply load profile.json  # Opens Firefox Profiler

Valgrind does not work on modern macOS. Use Instruments (if Xcode is installed) or samply for profiling.