Python API Reference

import json_tools_rs

JSONTools

The main builder class for all JSON operations. All configuration methods return self for chaining; only .execute() and .execute_to_output() trigger processing.

Construction

tools = json_tools_rs.JSONTools()

Creates a new JSONTools instance with all default settings. The instance is reusable -- you can call .execute() multiple times with different inputs.

Operation Modes

Exactly one mode must be set before calling .execute(). Calling a mode method replaces any previously set mode.

.flatten()

tools.flatten() -> JSONTools

Set the operation to flatten nested JSON into single-level keys joined by a separator ("." by default, configurable with .separator()).

import json_tools_rs as jt

result = jt.JSONTools().flatten().execute({"a": {"b": {"c": 1}}})
# {"a.b.c": 1}
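To make the key-joining rule concrete, here is a minimal pure-Python sketch of what flattening does. It is not the library's implementation: flatten_sketch is a hypothetical helper, and two behaviours are assumptions made for illustration (list elements are addressed by their index, and empty containers are kept as leaf values).

```python
def flatten_sketch(value, sep=".", prefix=None):
    """Illustrative only: flatten nested dicts/lists into one flat dict."""
    if isinstance(value, dict):
        children = [(str(k), v) for k, v in value.items()]
    elif isinstance(value, list):
        # Assumption: list elements use their index as the key segment.
        children = [(str(i), v) for i, v in enumerate(value)]
    else:
        children = []

    if not children:
        # Scalars (and, by assumption, empty containers) are leaves.
        return {} if prefix is None else {prefix: value}

    flat = {}
    for key, child in children:
        path = key if prefix is None else f"{prefix}{sep}{key}"
        flat.update(flatten_sketch(child, sep, path))
    return flat


print(flatten_sketch({"a": {"b": {"c": 1}}}))      # {'a.b.c': 1}
print(flatten_sketch({"a": {"b": 1}}, sep="::"))   # {'a::b': 1}
```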

.unflatten()

tools.unflatten() -> JSONTools

Set the operation to reconstruct nested JSON from flat, separator-delimited keys.

result = jt.JSONTools().unflatten().execute({"a.b.c": 1})
# {"a": {"b": {"c": 1}}}
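The reverse direction can be sketched the same way. unflatten_sketch below is a hypothetical helper, not the library's code. As a simplification it treats every key segment as an object key, so it does not rebuild arrays from numeric segments, and it assumes no key is both a leaf and a parent.

```python
def unflatten_sketch(flat, sep="."):
    """Illustrative only: rebuild nested dicts from separator-delimited keys."""
    nested = {}
    for path, value in flat.items():
        *parents, leaf = path.split(sep)
        node = nested
        for part in parents:
            # Create intermediate objects on demand.
            node = node.setdefault(part, {})
        node[leaf] = value
    return nested


print(unflatten_sketch({"a.b.c": 1}))   # {'a': {'b': {'c': 1}}}
```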

.normal()

tools.normal() -> JSONTools

Set the operation to apply transformations (filtering, replacements, type conversion) without changing the nesting structure.

result = jt.JSONTools().normal().lowercase_keys(True).execute({"Name": "Alice"})
# {"name": "Alice"}

Configuration Methods

All configuration methods return self for chaining.

.separator(sep)

tools.separator(sep: str) -> JSONTools

Set the key separator for flatten/unflatten operations.

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| sep | str | "." | Non-empty string used to join/split nested keys |

Raises: ValueError if sep is an empty string.

result = jt.JSONTools().flatten().separator("::").execute({"a": {"b": 1}})
# {"a::b": 1}

.lowercase_keys(flag)

tools.lowercase_keys(flag: bool) -> JSONTools

Convert all keys to lowercase after processing.

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| flag | bool | False | Enable or disable lowercase key conversion |

result = jt.JSONTools().flatten().lowercase_keys(True).execute({"User": {"Name": "Alice"}})
# {"user.name": "Alice"}

.remove_empty_strings(flag)

tools.remove_empty_strings(flag: bool) -> JSONTools

Remove key-value pairs where the value is an empty string "".

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| flag | bool | False | Enable or disable empty string removal |

result = jt.JSONTools().flatten().remove_empty_strings(True).execute({"a": "", "b": "hello"})
# {"b": "hello"}

.remove_nulls(flag)

tools.remove_nulls(flag: bool) -> JSONTools

Remove key-value pairs where the value is None / null.

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| flag | bool | False | Enable or disable null removal |

result = jt.JSONTools().flatten().remove_nulls(True).execute({"a": None, "b": 1})
# {"b": 1}

.remove_empty_objects(flag)

tools.remove_empty_objects(flag: bool) -> JSONTools

Remove key-value pairs where the value is an empty object {}.

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| flag | bool | False | Enable or disable empty object removal |

.remove_empty_arrays(flag)

tools.remove_empty_arrays(flag: bool) -> JSONTools

Remove key-value pairs where the value is an empty array [].

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| flag | bool | False | Enable or disable empty array removal |
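The four remove_* flags share one idea: drop a key when its value is "empty" in a particular sense. The sketch below shows that filter on a single-level dict. drop_empty_sketch and its keyword names are hypothetical, and whether the library applies the filter recursively, or before or after flattening, is not stated here, so this version only looks at top-level values.

```python
def drop_empty_sketch(data, *, strings=False, nulls=False,
                      objects=False, arrays=False):
    """Illustrative only: filter one dict the way the remove_* flags describe."""
    def is_removed(value):
        if strings and value == "":
            return True
        if nulls and value is None:
            return True
        if objects and isinstance(value, dict) and not value:
            return True
        if arrays and isinstance(value, list) and not value:
            return True
        return False

    return {key: value for key, value in data.items() if not is_removed(value)}


data = {"a": {}, "b": [], "c": "", "d": None, "e": 0}
print(drop_empty_sketch(data, objects=True, arrays=True))
# {'c': '', 'd': None, 'e': 0}
```

Note that falsy values such as 0 or False are never treated as empty.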

.key_replacement(find, replace)

tools.key_replacement(find: str, replace: str) -> JSONTools

Add a key replacement pattern. Patterns use standard regex syntax. If the regex fails to compile, it falls back to literal string replacement. Multiple replacements can be chained.

| Parameter | Type | Description |
|-----------|------|-------------|
| find | str | Regex pattern (or literal string) to match in keys |
| replace | str | Replacement string (supports regex capture groups like $1) |

result = (jt.JSONTools()
    .flatten()
    .key_replacement("^user_", "")
    .key_replacement("name$", "id")
    .execute({"user_name": "Alice"}))
# "user_name" -> "name" -> "id", assuming replacements run in the order they were added
# {"id": "Alice"}
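The regex-with-literal-fallback rule described above can be sketched in a few lines. This is an illustration only: replace_key_sketch is a hypothetical helper, Python's re module stands in for whatever regex engine the library uses (the two may disagree on which patterns compile), and capture-group expansion such as $1 is left out.

```python
import re


def replace_key_sketch(key, find, replace):
    """Illustrative only: regex replacement with a literal fallback."""
    try:
        pattern = re.compile(find)
    except re.error:
        # Not a valid regex, so treat `find` as plain text.
        return key.replace(find, replace)
    # A function replacement inserts `replace` verbatim (no group expansion).
    return pattern.sub(lambda _match: replace, key)


print(replace_key_sketch("user_name", "^user_", ""))   # name
print(replace_key_sketch("total(usd", "(", "_"))       # total_usd
```

The second call shows the fallback: "(" is not a valid pattern on its own, so it is matched as a literal character.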

.value_replacement(find, replace)

tools.value_replacement(find: str, replace: str) -> JSONTools

Add a value replacement pattern. Works the same as key replacements but applies to string values.

| Parameter | Type | Description |
|-----------|------|-------------|
| find | str | Regex pattern (or literal string) to match in values |
| replace | str | Replacement string |

result = (jt.JSONTools()
    .flatten()
    .value_replacement("@example\\.com", "@company.org")
    .execute({"email": "user@example.com"}))
# {"email": "user@company.org"}

.handle_key_collision(flag)

tools.handle_key_collision(flag: bool) -> JSONTools

When enabled, keys that would collide after transformations (e.g., after lowercasing) are collected into arrays instead of overwriting each other.

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| flag | bool | False | Enable collision handling |

result = (jt.JSONTools()
    .flatten()
    .lowercase_keys(True)
    .handle_key_collision(True)
    .execute({"Name": "Alice", "name": "Bob"}))
# {"name": ["Alice", "Bob"]}
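The collect-into-arrays behaviour can be pictured with a small pure-Python sketch. lowercase_with_collisions_sketch is a hypothetical helper, lowercasing is used as the only transformation, and the order of values inside the collected list (input order) is an assumption.

```python
def lowercase_with_collisions_sketch(flat):
    """Illustrative only: lowercase keys, collecting colliding values in a list."""
    merged = {}
    collected = set()  # keys whose value is already a collision list
    for key, value in flat.items():
        low = key.lower()
        if low not in merged:
            merged[low] = value
        elif low in collected:
            merged[low].append(value)
        else:
            # Second value for this key: start a list with both values.
            merged[low] = [merged[low], value]
            collected.add(low)
    return merged


print(lowercase_with_collisions_sketch({"Name": "Alice", "name": "Bob"}))
# {'name': ['Alice', 'Bob']}
```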

.auto_convert_types(flag)

tools.auto_convert_types(flag: bool) -> JSONTools

Automatically convert string values to their native types:

  • Numbers: "123" -> 123, "1,234.56" -> 1234.56, "$99.99" -> 99.99, "1e5" -> 100000
  • Booleans: "true" / "TRUE" / "True" -> true, "false" / "FALSE" / "False" -> false
  • Nulls: "null" / "None" -> null

If conversion fails, the original string is kept. No errors are raised on conversion failure.

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| flag | bool | False | Enable automatic type conversion |

result = (jt.JSONTools()
    .flatten()
    .auto_convert_types(True)
    .execute({"id": "123", "price": "1,234.56", "active": "true"}))
# {"id": 123, "price": 1234.56, "active": True}
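The conversion rules listed above can be approximated in plain Python. This is a rough sketch, not the library's parser: convert_sketch is a hypothetical helper that covers only the forms named in the list, and details such as accepting any comma placement are simplifications.

```python
import math
import re

_NUMBER = re.compile(r"[+-]?(?:\d+\.?\d*|\.\d+)(?:[eE][+-]?\d+)?")
_BOOLEANS = {"true": True, "TRUE": True, "True": True,
             "false": False, "FALSE": False, "False": False}
_NULLS = {"null", "None"}


def convert_sketch(value):
    """Illustrative only: turn a string into a number, bool, or None if it looks like one."""
    if not isinstance(value, str):
        return value
    if value in _BOOLEANS:
        return _BOOLEANS[value]
    if value in _NULLS:
        return None
    # Strip one leading currency sign and any thousands separators.
    cleaned = value[1:] if value.startswith("$") else value
    cleaned = cleaned.replace(",", "")
    if not _NUMBER.fullmatch(cleaned):
        return value  # not numeric: keep the original string
    if re.fullmatch(r"[+-]?\d+", cleaned):
        return int(cleaned)
    number = float(cleaned)
    if not math.isfinite(number):
        return value
    # "1e5" has no decimal point and is whole, so report it as an int.
    if "." not in cleaned and number.is_integer():
        return int(number)
    return number


row = {"id": "123", "price": "1,234.56", "active": "true", "note": "n/a"}
print({key: convert_sketch(val) for key, val in row.items()})
# {'id': 123, 'price': 1234.56, 'active': True, 'note': 'n/a'}
```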

.parallel_threshold(n)

tools.parallel_threshold(n: int) -> JSONTools

Set the minimum batch size to trigger parallel processing. Batches smaller than this are processed sequentially to avoid thread-spawning overhead.

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| n | int | 100 | Minimum batch size for parallelism |

Default can be overridden with the JSON_TOOLS_PARALLEL_THRESHOLD environment variable.

tools = jt.JSONTools().flatten().parallel_threshold(50)
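The threshold rule amounts to a size check before choosing a code path. The sketch below illustrates that dispatch with Python's standard thread pool. process_batch_sketch is a hypothetical helper, and Python threads merely stand in for whatever worker threads the library uses internally.

```python
from concurrent.futures import ThreadPoolExecutor


def process_batch_sketch(items, func, parallel_threshold=100, num_threads=4):
    """Illustrative only: go parallel once the batch is large enough."""
    if len(items) < parallel_threshold:
        # Small batch: a plain loop avoids thread start-up cost.
        return [func(item) for item in items]
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        # map() returns results in input order.
        return list(pool.map(func, items))


small = process_batch_sketch(list(range(10)), lambda x: x * 2)        # sequential
large = process_batch_sketch(list(range(200)), lambda x: x * 2, 50)   # parallel
print(small[:3], large[:3])   # [0, 2, 4] [0, 2, 4]
```

Either path returns the same results in the same order; only the scheduling differs.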

.num_threads(n)

tools.num_threads(n: int) -> JSONTools

Set the number of threads used for parallel processing.

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| n | int | CPU count | Number of worker threads |

Default can be overridden with the JSON_TOOLS_NUM_THREADS environment variable.

tools = jt.JSONTools().flatten().num_threads(4)

.nested_parallel_threshold(n)

tools.nested_parallel_threshold(n: int) -> JSONTools

Set the minimum number of keys/items within a single JSON document to trigger nested (intra-document) parallelism. Only objects or arrays exceeding this count are parallelized internally.

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| n | int | 100 | Minimum keys/items for nested parallelism |

Default can be overridden with the JSON_TOOLS_NESTED_PARALLEL_THRESHOLD environment variable.

tools = jt.JSONTools().flatten().nested_parallel_threshold(200)

.max_array_index(n)

tools.max_array_index(n: int) -> JSONTools

Set the maximum array index allowed during unflattening. This is a DoS protection: a malicious key like "items.999999999" would otherwise allocate a massive array.

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| n | int | 100000 | Maximum array index |

Default can be overridden with the JSON_TOOLS_MAX_ARRAY_INDEX environment variable.
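Why an index cap matters is easiest to see in code: placing a value at index n in an empty array means allocating n padding slots first. The sketch below is a hypothetical helper, not the library's code, and raising ValueError is an assumption (how the library reports an over-limit index is not specified here).

```python
def set_index_sketch(array, index, value, max_array_index=100_000):
    """Illustrative only: put value at array[index], padding gaps with None."""
    if index < 0 or index > max_array_index:
        # Refuse before allocating anything.
        raise ValueError(f"array index {index} outside 0..{max_array_index}")
    if index >= len(array):
        array.extend([None] * (index + 1 - len(array)))
    array[index] = value
    return array


print(set_index_sketch([], 2, "x"))   # [None, None, 'x']

try:
    set_index_sketch([], 999_999_999, "x")
except ValueError as exc:
    print(exc)   # array index 999999999 outside 0..100000
```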

Execution Methods

.execute(input)

tools.execute(input) -> str | dict | list[str] | list[dict] | DataFrame | Series

Execute the configured operation. The return type mirrors the input type:

| Input Type | Output Type |
|------------|-------------|
| str | str (JSON string) |
| dict | dict (Python dictionary) |
| list[str] | list[str] |
| list[dict] | list[dict] |
| pandas.DataFrame | pandas.DataFrame |
| pandas.Series | pandas.Series |
| polars.DataFrame | polars.DataFrame |
| polars.Series | polars.Series |
| pyarrow.Table | pyarrow.Table |
| pyarrow.ChunkedArray | pyarrow.ChunkedArray |
| pyspark.sql.DataFrame | pyspark.sql.DataFrame |

Raises: JsonToolsError if no mode is set, input is invalid, or processing fails.

# String input -> string output
result = jt.JSONTools().flatten().execute('{"a": {"b": 1}}')
assert isinstance(result, str)

# Dict input -> dict output
result = jt.JSONTools().flatten().execute({"a": {"b": 1}})
assert isinstance(result, dict)

# Batch string input -> batch string output
results = jt.JSONTools().flatten().execute(['{"a": 1}', '{"b": 2}'])
assert isinstance(results, list) and isinstance(results[0], str)

# Batch dict input -> batch dict output
results = jt.JSONTools().flatten().execute([{"a": {"b": 1}}, {"c": {"d": 2}}])
assert isinstance(results, list) and isinstance(results[0], dict)

.execute_to_output(input)

tools.execute_to_output(input) -> JsonOutput

Execute the operation but return a JsonOutput wrapper instead of native Python types. Useful when you need to inspect whether the result is single or multiple before extracting.

Note: DataFrame and Series inputs are not supported with execute_to_output(). Use .execute() for those types.

| Parameter | Type | Description |
|-----------|------|-------------|
| input | str, dict, list[str], list[dict] | JSON data to process |

output = jt.JSONTools().flatten().execute_to_output('{"a": {"b": 1}}')
if output.is_single:
    print(output.get_single())
elif output.is_multiple:
    for item in output.get_multiple():
        print(item)

JsonOutput

Output wrapper returned by .execute_to_output(). Provides typed access to results.

Properties

| Property | Type | Description |
|----------|------|-------------|
| .is_single | bool | True if the result contains a single JSON string |
| .is_multiple | bool | True if the result contains multiple JSON strings |

Methods

.get_single()

output.get_single() -> str

Extract the single JSON string result.

Raises: ValueError if the result is multiple.

.get_multiple()

output.get_multiple() -> list[str]

Extract the list of JSON string results.

Raises: ValueError if the result is single.

.to_python()

output.to_python() -> str | list[str]

Convert to native Python type: returns str for single results, list[str] for multiple results.

String Representations

str(output) returns the JSON string (single) or a list representation (multiple). repr(output) returns JsonOutput.Single('...') or JsonOutput.Multiple([...]).

DataFrame and Series Support

JSON Tools RS natively supports Pandas, Polars, PyArrow, and PySpark DataFrames and Series. Detection is performed via duck typing -- no explicit imports are required.

Pandas DataFrame

import pandas as pd
import json_tools_rs as jt

df = pd.DataFrame({"json_col": [
    '{"user": {"name": "Alice", "age": 30}}',
    '{"user": {"name": "Bob", "age": 25}}',
]})

tools = jt.JSONTools().flatten().separator(".")

# Process column containing JSON strings
result_df = tools.execute(df)
# Returns a DataFrame with flattened columns: "user.name", "user.age"

Pandas Series

series = pd.Series([
    '{"a": {"b": 1}}',
    '{"a": {"b": 2}}',
])

result_series = jt.JSONTools().flatten().execute(series)
# Returns a Series of flattened JSON strings

Polars DataFrame

import polars as pl

df = pl.DataFrame({"json_col": [
    '{"user": {"name": "Alice"}}',
    '{"user": {"name": "Bob"}}',
]})

result_df = jt.JSONTools().flatten().execute(df)

Polars Series

series = pl.Series("data", [
    '{"a": {"b": 1}}',
    '{"a": {"b": 2}}',
])

result_series = jt.JSONTools().flatten().execute(series)

PyArrow Table

import pyarrow as pa

table = pa.table({"json_col": [
    '{"user": {"name": "Alice"}}',
    '{"user": {"name": "Bob"}}',
]})

result_table = jt.JSONTools().flatten().execute(table)

PySpark DataFrame

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([
    ('{"user": {"name": "Alice"}}',),
    ('{"user": {"name": "Bob"}}',),
], ["json_col"])

result_df = jt.JSONTools().flatten().execute(df)

JsonToolsError

Exception class for all errors raised by JSON Tools RS.

import json_tools_rs as jt

try:
    result = jt.JSONTools().flatten().execute("not valid json")
except jt.JsonToolsError as e:
    print(f"Error: {e}")
    # Error: [E001] JSON parsing failed: ...

Error messages include a machine-readable code (E001-E008) at the start of the message. See Error Codes for the full reference.

Error Codes Quick Reference

| Code | Name | Common Cause |
|------|------|--------------|
| E001 | JsonParseError | Invalid JSON input |
| E002 | RegexError | Bad regex in key/value replacement |
| E003 | InvalidReplacementPattern | Malformed replacement pair |
| E004 | InvalidJsonStructure | Wrong JSON shape for the operation |
| E005 | ConfigurationError | No mode set before .execute() |
| E006 | BatchProcessingError | Error in one item during batch processing |
| E007 | InputValidationError | Unsupported input type |
| E008 | SerializationError | Internal serialization failure |

Handling Specific Errors

import json_tools_rs as jt

try:
    result = jt.JSONTools().execute({"a": 1})  # No mode set
except jt.JsonToolsError as e:
    msg = str(e)
    if "[E005]" in msg:
        print("Forgot to call .flatten(), .unflatten(), or .normal()")
    elif "[E001]" in msg:
        print("Invalid JSON input")
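When more than a couple of codes need handling, pulling the code out of the message once may be tidier than repeated substring checks. The helper below is a hypothetical sketch that relies only on the documented "[E00x]" message format. ERROR_NAMES is copied from the quick-reference table, and the sample message is the one shown earlier in this section.

```python
import re

# Names copied from the quick-reference table.
ERROR_NAMES = {
    "E001": "JsonParseError",
    "E002": "RegexError",
    "E003": "InvalidReplacementPattern",
    "E004": "InvalidJsonStructure",
    "E005": "ConfigurationError",
    "E006": "BatchProcessingError",
    "E007": "InputValidationError",
    "E008": "SerializationError",
}

_CODE = re.compile(r"\[(E\d{3})\]")


def error_code_sketch(message):
    """Return the first code such as 'E001' found in message, or None."""
    match = _CODE.search(message)
    return match.group(1) if match else None


code = error_code_sketch("[E001] JSON parsing failed: ...")
print(code, ERROR_NAMES.get(code))   # E001 JsonParseError
```

In an except block, this would be called as error_code_sketch(str(e)).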

Complete Example

import json_tools_rs as jt

# Build once, reuse many times
tools = (jt.JSONTools()
    .flatten()
    .separator("::")
    .lowercase_keys(True)
    .remove_nulls(True)
    .remove_empty_strings(True)
    .key_replacement("(?i)^user_", "")  # (?i) so the pattern also matches "User_"
    .auto_convert_types(True)
    .parallel_threshold(50)
    .num_threads(4)
)

# Single dict
result = tools.execute({"User_Name": "Alice", "User_Age": "30"})
# {"name": "Alice", "age": 30}

# Batch of dicts (processed in parallel if >= 50 items)
results = tools.execute([{"data": str(i)} for i in range(1000)])

# JSON string
result = tools.execute('{"User_Name": "Alice", "nested": {"User_Age": "30"}}')

# DataFrame
import pandas as pd
df = pd.DataFrame({"json": [
    '{"User_Name": "Alice", "nested": {"User_Age": "30"}}',
    '{"User_Name": "Bob", "nested": {"User_Age": "25"}}',
]})
df_result = tools.execute(df)