Python JSON: Parse, Read, Write, and Convert JSON Data

You fetch data from a REST API, and the response is a JSON string. You need to extract specific fields, transform the data, and save it to a file. Or you have a Python dictionary that needs to be sent as a JSON payload to another service. You reach for json.loads() but hit a JSONDecodeError because the string has single quotes instead of double quotes. Or you try to serialize a datetime object and get TypeError: Object of type datetime is not JSON serializable.

JSON is the universal data exchange format for web APIs, configuration files, and data pipelines. Python's built-in json module handles serialization and deserialization, but its quirks trip up developers constantly -- from encoding edge cases to performance bottlenecks when processing large files. This guide covers everything you need to work with JSON in Python confidently, from basic parsing to advanced patterns used in production systems.

What Is JSON?

JSON (JavaScript Object Notation) is a lightweight text-based data format. It supports six data types: strings, numbers, booleans, null, arrays, and objects. A typical JSON document looks like this:

{
  "name": "Alice",
  "age": 30,
  "is_active": true,
  "skills": ["Python", "SQL", "Machine Learning"],
  "address": {
    "city": "San Francisco",
    "state": "CA"
  }
}

Python's json module maps JSON types to Python types automatically:

| JSON Type      | Python Type  |
|----------------|--------------|
| object         | dict         |
| array          | list         |
| string         | str          |
| number (int)   | int          |
| number (float) | float        |
| true / false   | True / False |
| null           | None         |

The Four Core Functions

The json module has four primary functions. Two work with strings, two work with files.

| Function     | Input         | Output        | Use Case                |
|--------------|---------------|---------------|-------------------------|
| json.loads() | JSON string   | Python object | Parse API response body |
| json.dumps() | Python object | JSON string   | Build request payload   |
| json.load()  | File object   | Python object | Read config/data file   |
| json.dump()  | Python object | File object   | Write data to JSON file |

The naming convention: the functions ending in "s" (loads, dumps) work with strings; the ones without (load, dump) work with file-like objects.
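
The convention is easy to verify with a quick round trip; an in-memory io.StringIO stands in for a real file:

```python
import io
import json

data = {"name": "Alice", "active": True}

# String pair: dumps -> loads
s = json.dumps(data)
assert json.loads(s) == data

# File pair: dump -> load, using an in-memory buffer as the file object
buf = io.StringIO()
json.dump(data, buf)
buf.seek(0)
assert json.load(buf) == data
```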

Parsing JSON Strings with json.loads()

json.loads() (load from string) converts a JSON-formatted string into a Python object.

import json
 
json_string = '{"name": "Alice", "age": 30, "skills": ["Python", "SQL"]}'
data = json.loads(json_string)
 
print(data["name"])       # Alice
print(data["skills"][0])  # Python
print(type(data))         # <class 'dict'>

Parsing JSON Arrays

When the JSON root is an array, json.loads() returns a Python list:

import json
 
json_array = '[1, 2, 3, "four", null, true]'
result = json.loads(json_array)
 
print(result)       # [1, 2, 3, 'four', None, True]
print(type(result)) # <class 'list'>

Handling Parsing Errors

Invalid JSON raises json.JSONDecodeError. Always wrap parsing in a try/except block when dealing with external data:

import json
 
bad_json = "{'name': 'Alice'}"  # Single quotes are not valid JSON
 
try:
    data = json.loads(bad_json)
except json.JSONDecodeError as e:
    print(f"Invalid JSON: {e}")
    # Invalid JSON: Expecting property name enclosed in double quotes: line 1 column 2 (char 1)

Common causes of JSONDecodeError:

  • Single quotes instead of double quotes
  • Trailing commas after the last element
  • Unquoted keys
  • Comments in the JSON (JSON does not support comments)
  • BOM characters at the start of the string
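
If the offending string is actually a Python literal rather than JSON (single quotes, True/None), the standard-library ast.literal_eval can parse it safely. This is a fallback sketch for that one specific case, not a fix for genuinely malformed JSON:

```python
import ast
import json

text = "{'name': 'Alice', 'active': True}"  # Python repr, not JSON

try:
    data = json.loads(text)
except json.JSONDecodeError:
    # Fallback: safely evaluate a Python literal (no arbitrary code execution)
    data = ast.literal_eval(text)

print(data["name"])  # Alice
```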

Converting Python to JSON with json.dumps()

json.dumps() (dump to string) serializes a Python object into a JSON-formatted string.

import json
 
data = {
    "name": "Alice",
    "age": 30,
    "is_active": True,
    "scores": [95, 87, 92],
    "address": None
}
 
json_string = json.dumps(data)
print(json_string)
# {"name": "Alice", "age": 30, "is_active": true, "scores": [95, 87, 92], "address": null}

Notice that Python True becomes JSON true, None becomes null, and the output uses double quotes.

Pretty Printing with indent

Raw JSON on a single line is hard to read. Use the indent parameter:

import json
 
data = {"user": {"name": "Alice", "roles": ["admin", "editor"]}, "active": True}
 
print(json.dumps(data, indent=2))

Output:

{
  "user": {
    "name": "Alice",
    "roles": [
      "admin",
      "editor"
    ]
  },
  "active": true
}

Sorting Keys

Use sort_keys=True for consistent, deterministic output -- useful for diffs and testing:

import json
 
data = {"banana": 2, "apple": 5, "cherry": 1}
print(json.dumps(data, sort_keys=True, indent=2))

Output:

{
  "apple": 5,
  "banana": 2,
  "cherry": 1
}

Controlling Separators

By default, json.dumps() uses ", " between items and ": " between keys and values. You can make compact output by removing spaces:

import json
 
data = {"a": 1, "b": 2, "c": 3}
 
# Compact (no spaces)
print(json.dumps(data, separators=(",", ":")))
# {"a":1,"b":2,"c":3}
 
# Default
print(json.dumps(data))
# {"a": 1, "b": 2, "c": 3}

Compact separators reduce file size, which matters when transmitting large JSON payloads over the network.
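
You can measure the saving directly by comparing encoded byte lengths; with many small objects, the one-space-per-separator difference adds up:

```python
import json

data = {"users": [{"id": i, "name": f"user_{i}"} for i in range(1000)]}

default_size = len(json.dumps(data).encode("utf-8"))
compact_size = len(json.dumps(data, separators=(",", ":")).encode("utf-8"))

print(f"default: {default_size} bytes")
print(f"compact: {compact_size} bytes")
print(f"saved:   {default_size - compact_size} bytes")
```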

Handling Non-ASCII Characters

By default, json.dumps() escapes non-ASCII characters. Set ensure_ascii=False to preserve them:

import json
 
data = {"city": "Zurich", "greeting": "こんにちは"}
 
print(json.dumps(data))
# {"city": "Zurich", "greeting": "\u3053\u3093\u306b\u3061\u306f"}
 
print(json.dumps(data, ensure_ascii=False))
# {"city": "Zurich", "greeting": "こんにちは"}

Reading JSON Files with json.load()

json.load() reads JSON directly from a file object:

import json
 
with open("config.json", "r", encoding="utf-8") as f:
    config = json.load(f)
 
print(config["database"]["host"])
print(config["database"]["port"])

Always specify encoding="utf-8" when opening files to avoid platform-specific encoding issues.

Reading Large JSON Files

For very large JSON files, loading everything into memory at once may not be feasible. Consider these approaches:

import json
 
# Approach 1: Read and process line-delimited JSON (JSONL)
with open("events.jsonl", "r", encoding="utf-8") as f:
    for line in f:
        event = json.loads(line.strip())
        process(event)
 
# Approach 2: Use ijson for streaming large JSON arrays
# pip install ijson
import ijson
 
with open("huge_file.json", "rb") as f:
    for item in ijson.items(f, "item"):
        process(item)

Writing JSON Files with json.dump()

json.dump() writes a Python object directly to a file as JSON:

import json
 
data = {
    "users": [
        {"name": "Alice", "age": 30},
        {"name": "Bob", "age": 25}
    ],
    "total": 2,
    "generated_at": "2026-02-11"
}
 
with open("output.json", "w", encoding="utf-8") as f:
    json.dump(data, f, indent=2, ensure_ascii=False)

This creates a properly formatted, human-readable JSON file. For data interchange where file size matters, omit indent:

import json
 
with open("output_compact.json", "w") as f:
    json.dump(data, f, separators=(",", ":"))

Handling Custom Objects

Python's json module cannot serialize custom objects, datetime, set, bytes, or Decimal by default. You have three approaches to solve this.

Approach 1: The default Parameter

Pass a function to the default parameter that converts unsupported types:

import json
from datetime import datetime, date
from decimal import Decimal
 
def json_serializer(obj):
    if isinstance(obj, (datetime, date)):
        return obj.isoformat()
    if isinstance(obj, Decimal):
        return float(obj)
    if isinstance(obj, set):
        return list(obj)
    if isinstance(obj, bytes):
        return obj.decode("utf-8")
    raise TypeError(f"Type {type(obj)} is not JSON serializable")
 
data = {
    "timestamp": datetime(2026, 2, 11, 14, 30),
    "price": Decimal("19.99"),
    "tags": {"python", "json", "tutorial"},
    "raw": b"hello"
}
 
print(json.dumps(data, default=json_serializer, indent=2))

Output (the order of the set-derived "tags" list may vary):

{
  "timestamp": "2026-02-11T14:30:00",
  "price": 19.99,
  "tags": ["tutorial", "json", "python"],
  "raw": "hello"
}

Approach 2: Custom JSONEncoder

For reusable serialization logic, subclass json.JSONEncoder:

import json
from datetime import datetime, date
from decimal import Decimal
 
class CustomEncoder(json.JSONEncoder):
    def default(self, obj):
        if isinstance(obj, (datetime, date)):
            return obj.isoformat()
        if isinstance(obj, Decimal):
            return str(obj)
        if isinstance(obj, set):
            return sorted(list(obj))
        return super().default(obj)
 
data = {"created": datetime.now(), "price": Decimal("29.99")}
 
# Use as cls argument
print(json.dumps(data, cls=CustomEncoder, indent=2))
 
# Or instantiate directly
encoder = CustomEncoder(indent=2)
print(encoder.encode(data))

Approach 3: Custom Decoder with object_hook

To convert JSON back into custom Python objects, use object_hook:

import json
from datetime import datetime
 
def decode_dates(obj):
    for key, value in obj.items():
        if isinstance(value, str):
            try:
                obj[key] = datetime.fromisoformat(value)
            except ValueError:
                pass
    return obj
 
json_string = '{"name": "Event", "start": "2026-02-11T14:30:00"}'
data = json.loads(json_string, object_hook=decode_dates)
 
print(type(data["start"]))  # <class 'datetime.datetime'>
print(data["start"].year)    # 2026

Parsing JSON from APIs

One of the most common uses of JSON in Python is handling API responses. The requests library makes this straightforward:

import json
import requests
 
response = requests.get("https://api.github.com/repos/python/cpython")
 
# Method 1: Use response.json() (handles decoding automatically)
data = response.json()
print(data["full_name"])       # python/cpython
print(data["stargazers_count"])
 
# Method 2: Parse manually
data = json.loads(response.text)

Building JSON Request Bodies

import requests
import json
 
payload = {
    "query": "SELECT * FROM users",
    "limit": 100,
    "filters": {"status": "active"}
}
 
# requests automatically serializes dict to JSON with json= parameter
response = requests.post(
    "https://api.example.com/data",
    json=payload,  # Automatically sets Content-Type: application/json
    headers={"Authorization": "Bearer token123"}
)
 
result = response.json()

Handling Paginated API Responses

import requests
 
def fetch_all_items(base_url):
    items = []
    page = 1
 
    while True:
        response = requests.get(f"{base_url}?page={page}&per_page=100")
        data = response.json()
 
        if not data["results"]:
            break
 
        items.extend(data["results"])
        page += 1
 
    return items

If you work with data from APIs frequently and want to explore the results interactively, PyGWalker can turn your parsed JSON data (loaded into a Pandas DataFrame) into a Tableau-like visual interface for drag-and-drop exploration -- no extra code required.

Working with Nested JSON

Real-world JSON is often deeply nested. Here are patterns for working with complex structures:

Safe Access with .get()

import json
 
data = json.loads('''{
    "user": {
        "profile": {
            "name": "Alice",
            "settings": {"theme": "dark"}
        }
    }
}''')
 
# Risky: KeyError if any level is missing
# name = data["user"]["profile"]["name"]
 
# Safe: Use .get() with defaults
name = data.get("user", {}).get("profile", {}).get("name", "Unknown")
print(name)  # Alice
 
# Missing path returns default
email = data.get("user", {}).get("profile", {}).get("email", "N/A")
print(email)  # N/A
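
Chained .get() calls get verbose past two or three levels. A small helper (get_path here is a hypothetical name, not part of the json module) can walk a dot-separated path instead:

```python
def get_path(obj, path, default=None):
    """Walk a dot-separated path through nested dicts; return default on any miss."""
    current = obj
    for key in path.split("."):
        if isinstance(current, dict) and key in current:
            current = current[key]
        else:
            return default
    return current

data = {"user": {"profile": {"name": "Alice"}}}

print(get_path(data, "user.profile.name"))          # Alice
print(get_path(data, "user.profile.email", "N/A"))  # N/A
```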

Flattening Nested JSON

import json
 
def flatten_json(obj, prefix=""):
    """Flatten a nested JSON object into a single-level dict."""
    flat = {}
    for key, value in obj.items():
        new_key = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten_json(value, new_key))
        elif isinstance(value, list):
            for i, item in enumerate(value):
                if isinstance(item, dict):
                    flat.update(flatten_json(item, f"{new_key}[{i}]"))
                else:
                    flat[f"{new_key}[{i}]"] = item
        else:
            flat[new_key] = value
    return flat
 
nested = {
    "user": {"name": "Alice", "address": {"city": "SF", "zip": "94102"}},
    "tags": ["admin", "editor"]
}
 
flat = flatten_json(nested)
for key, value in flat.items():
    print(f"{key}: {value}")

Output:

user.name: Alice
user.address.city: SF
user.address.zip: 94102
tags[0]: admin
tags[1]: editor

Converting Nested JSON to a DataFrame

import json
import pandas as pd
 
json_data = '''[
    {"name": "Alice", "scores": {"math": 95, "science": 88}},
    {"name": "Bob", "scores": {"math": 78, "science": 92}},
    {"name": "Charlie", "scores": {"math": 88, "science": 85}}
]'''
 
data = json.loads(json_data)
 
# pd.json_normalize handles nested structures
df = pd.json_normalize(data)
print(df)

Output:

      name  scores.math  scores.science
0    Alice           95              88
1      Bob           78              92
2  Charlie           88              85
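
When each record contains a nested list rather than a nested dict, json_normalize can explode the list into one row per element via record_path, carrying parent fields along with meta. A sketch with hypothetical order data:

```python
import pandas as pd

data = [
    {"name": "Alice", "orders": [{"id": 1, "total": 50}, {"id": 2, "total": 75}]},
    {"name": "Bob", "orders": [{"id": 3, "total": 20}]},
]

# One row per order; "name" is repeated from the parent record
df = pd.json_normalize(data, record_path="orders", meta="name")
print(df)
```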

JSON vs Python Dict: Key Differences

JSON strings and Python dictionaries look similar but have important differences:

| Feature         | JSON                          | Python Dict                    |
|-----------------|-------------------------------|--------------------------------|
| Quotes          | Double quotes only ("key")    | Single or double ('key', "key")|
| Booleans        | true / false                  | True / False                   |
| Null value      | null                          | None                           |
| Key types       | Strings only                  | Any hashable type              |
| Trailing commas | Not allowed                   | Allowed                        |
| Comments        | Not supported                 | # comments                     |
| Data type       | Text (string)                 | Python object                  |
| Tuple           | No tuple type (becomes array) | Supported                      |
| Set             | No set type                   | Supported                      |

A common mistake is using str() or repr() on a dict and expecting valid JSON:

import json
 
data = {"active": True, "count": None}
 
# WRONG: Python repr, not valid JSON
print(str(data))   # {'active': True, 'count': None}
 
# RIGHT: Valid JSON
print(json.dumps(data))  # {"active": true, "count": null}

Command-Line JSON with json.tool

Python includes a built-in command-line tool for pretty-printing and validating JSON:

# Pretty print a JSON file
python -m json.tool data.json
 
# Pretty print from a pipe (e.g., curl output)
curl -s https://api.github.com/meta | python -m json.tool
 
# Validate JSON (exits with error code on invalid JSON)
echo '{"valid": true}' | python -m json.tool
 
# Sort keys
python -m json.tool --sort-keys data.json
 
# Compact output
python -m json.tool --compact data.json

This is invaluable for quick debugging and data inspection without writing a script.

Performance: json vs orjson vs ujson

The built-in json module is reliable but not the fastest option. For performance-critical applications, consider these alternatives:

| Library         | Parse Speed   | Serialize Speed | Notes                                |
|-----------------|---------------|-----------------|--------------------------------------|
| json (built-in) | 1x (baseline) | 1x (baseline)   | Always available, no install needed  |
| ujson           | ~2-3x faster  | ~2-3x faster    | Drop-in replacement, C extension     |
| orjson          | ~5-10x faster | ~5-10x faster   | Returns bytes, not str; most strict  |
| rapidjson       | ~3-5x faster  | ~3-5x faster    | Many configuration options           |

Using orjson

# pip install orjson
import orjson
 
# Parse JSON (same interface)
data = orjson.loads('{"name": "Alice", "age": 30}')
 
# Serialize (returns bytes, not str)
json_bytes = orjson.dumps(data)
print(json_bytes)  # b'{"name":"Alice","age":30}'
 
# Pretty print with orjson
json_bytes = orjson.dumps(data, option=orjson.OPT_INDENT_2)
 
# orjson handles datetime natively
from datetime import datetime
data = {"timestamp": datetime.now()}
print(orjson.dumps(data))  # e.g. b'{"timestamp":"2026-02-11T14:30:00.123456"}'

Using ujson

# pip install ujson
import ujson
 
# Drop-in replacement for json
data = ujson.loads('{"name": "Alice"}')
json_string = ujson.dumps(data, indent=2)
print(json_string)

Benchmark Comparison

import json
import time
 
# Generate test data
data = [{"id": i, "name": f"user_{i}", "scores": [i * 10, i * 20]} for i in range(10000)]
json_string = json.dumps(data)
 
# Benchmark json
start = time.perf_counter()
for _ in range(100):
    json.loads(json_string)
json_time = time.perf_counter() - start
 
print(f"json:   {json_time:.3f}s")
 
# Benchmark orjson (if installed)
try:
    import orjson
    start = time.perf_counter()
    for _ in range(100):
        orjson.loads(json_string)
    orjson_time = time.perf_counter() - start
    print(f"orjson: {orjson_time:.3f}s ({json_time/orjson_time:.1f}x faster)")
except ImportError:
    print("orjson not installed")

For data-heavy workflows -- parsing API responses, transforming JSON into DataFrames for analysis -- consider using RunCell as your Jupyter environment. Its AI agent can help you write and debug JSON parsing code interactively.

Common Errors and Solutions

JSONDecodeError

import json
 
# Problem: Invalid JSON format
try:
    json.loads("{'key': 'value'}")  # Single quotes
except json.JSONDecodeError as e:
    print(f"Error at position {e.pos}: {e.msg}")
 
# Solution: Ensure proper JSON format
data = json.loads('{"key": "value"}')

TypeError: Object is not JSON serializable

import json
from datetime import datetime
 
# Problem: datetime is not serializable
try:
    json.dumps({"now": datetime.now()})
except TypeError as e:
    print(e)  # Object of type datetime is not JSON serializable
 
# Solution 1: Convert before serializing
data = {"now": datetime.now().isoformat()}
print(json.dumps(data))
 
# Solution 2: Use default parameter
print(json.dumps({"now": datetime.now()}, default=str))

Handling NaN and Infinity

JSON does not support NaN or Infinity, but Python's json module allows them by default (non-standard behavior):

import json
import math
 
data = {"value": float("nan"), "big": float("inf")}
 
# Default: allows NaN/Infinity (non-standard JSON)
print(json.dumps(data))  # {"value": NaN, "big": Infinity}
 
# Strict: raise error on NaN/Infinity
try:
    json.dumps(data, allow_nan=False)
except ValueError as e:
    print(e)  # Out of range float values are not JSON compliant
 
# Solution: Replace NaN with None
clean = {k: (None if isinstance(v, float) and (math.isnan(v) or math.isinf(v)) else v)
         for k, v in data.items()}
print(json.dumps(clean))  # {"value": null, "big": null}

Encoding Issues

import json
 
# Problem: BOM character in JSON string
json_with_bom = '\ufeff{"key": "value"}'
try:
    json.loads(json_with_bom)
except json.JSONDecodeError:
    # Solution: Strip BOM
    clean = json_with_bom.lstrip('\ufeff')
    data = json.loads(clean)
    print(data)

Real-World Patterns

Configuration Files

import json
from pathlib import Path
 
def load_config(config_path="config.json"):
    """Load config with defaults and validation."""
    defaults = {
        "debug": False,
        "log_level": "INFO",
        "max_retries": 3,
        "timeout": 30
    }
 
    path = Path(config_path)
    if path.exists():
        with open(path, "r") as f:
            user_config = json.load(f)
        # Merge: user config overrides defaults
        return {**defaults, **user_config}
    return defaults
 
def save_config(config, config_path="config.json"):
    """Save config with pretty formatting."""
    with open(config_path, "w") as f:
        json.dump(config, f, indent=2, sort_keys=True)
 
# Usage
config = load_config()
config["debug"] = True
save_config(config)

JSON Lines (JSONL) for Log Files

JSONL stores one JSON object per line, making it efficient for append-only writes and line-by-line streaming reads:

import json
from datetime import datetime
 
def log_event(event_type, data, log_file="events.jsonl"):
    """Append a JSON event to a JSONL log file."""
    event = {
        "timestamp": datetime.now().isoformat(),
        "type": event_type,
        "data": data
    }
    with open(log_file, "a") as f:
        f.write(json.dumps(event) + "\n")
 
def read_events(log_file="events.jsonl", event_type=None):
    """Read and optionally filter events from a JSONL file."""
    events = []
    with open(log_file, "r") as f:
        for line in f:
            event = json.loads(line.strip())
            if event_type is None or event["type"] == event_type:
                events.append(event)
    return events
 
# Usage
log_event("user_login", {"user_id": 42, "ip": "192.168.1.1"})
log_event("page_view", {"user_id": 42, "page": "/dashboard"})
log_event("user_login", {"user_id": 99, "ip": "10.0.0.1"})
 
logins = read_events(event_type="user_login")
print(f"Total logins: {len(logins)}")

API Response Caching

import json
import time
from pathlib import Path
 
class JSONCache:
    """Simple file-based JSON cache with TTL."""
 
    def __init__(self, cache_dir=".cache"):
        self.cache_dir = Path(cache_dir)
        self.cache_dir.mkdir(exist_ok=True)
 
    def get(self, key, ttl=3600):
        """Get cached value if it exists and hasn't expired."""
        path = self.cache_dir / f"{key}.json"
        if not path.exists():
            return None
 
        with open(path, "r") as f:
            cached = json.load(f)
 
        if time.time() - cached["timestamp"] > ttl:
            path.unlink()  # Delete expired cache
            return None
 
        return cached["data"]
 
    def set(self, key, data):
        """Store data in cache."""
        path = self.cache_dir / f"{key}.json"
        with open(path, "w") as f:
            json.dump({"timestamp": time.time(), "data": data}, f)
 
# Usage
cache = JSONCache()
cache.set("user_42", {"name": "Alice", "role": "admin"})
result = cache.get("user_42", ttl=300)  # 5-minute TTL

JSON Schema Validation

# pip install jsonschema
from jsonschema import validate, ValidationError
import json
 
schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string", "minLength": 1},
        "age": {"type": "integer", "minimum": 0, "maximum": 150},
        # "format" is annotation-only unless a FormatChecker is passed to validate()
        "email": {"type": "string", "format": "email"}
    },
    "required": ["name", "age"]
}
 
# Valid data
valid = {"name": "Alice", "age": 30, "email": "alice@example.com"}
validate(instance=valid, schema=schema)  # No error
 
# Invalid data
invalid = {"name": "", "age": -5}
try:
    validate(instance=invalid, schema=schema)
except ValidationError as e:
    print(f"Validation failed: {e.message}")

Quick Reference: json Module Parameters

| Parameter         | Function(s)              | Description                                      |
|-------------------|--------------------------|--------------------------------------------------|
| indent            | dumps, dump              | Pretty print with N spaces of indentation        |
| sort_keys         | dumps, dump              | Sort dictionary keys alphabetically              |
| default           | dumps, dump              | Function to serialize non-standard types         |
| cls               | dumps, dump, loads, load | Custom encoder/decoder class                     |
| ensure_ascii      | dumps, dump              | Escape non-ASCII characters (default: True)      |
| separators        | dumps, dump              | Tuple of (item_sep, key_sep)                     |
| allow_nan         | dumps, dump              | Allow NaN/Infinity (default: True)               |
| object_hook       | loads, load              | Function to transform decoded dicts              |
| object_pairs_hook | loads, load              | Function receiving ordered pairs                 |
| parse_float       | loads, load              | Function to parse float strings (e.g., Decimal)  |
| parse_int         | loads, load              | Function to parse int strings                    |
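
The two parse_* hooks are easy to overlook. Routing JSON floats through Decimal, for instance, avoids binary floating-point rounding for money values:

```python
import json
from decimal import Decimal

# Default: JSON numbers with a fractional part become float
data = json.loads('{"price": 19.99}')
print(type(data["price"]))  # <class 'float'>

# parse_float routes every JSON float through Decimal instead
data = json.loads('{"price": 19.99}', parse_float=Decimal)
print(type(data["price"]))  # <class 'decimal.Decimal'>
print(data["price"])        # 19.99
```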

Frequently Asked Questions

What is the difference between json.loads() and json.load()?

json.loads() parses a JSON string into a Python object. json.load() reads JSON from a file object. Use loads() when you already have the JSON as a string (e.g., from an API response), and load() when reading from a file on disk.

How do I convert a Python dictionary to a JSON string?

Use json.dumps(your_dict). For pretty formatting, add indent=2. For example: json.dumps({"name": "Alice", "age": 30}, indent=2) produces a nicely formatted JSON string with 2-space indentation.

How do I handle datetime objects in JSON?

Python's json module cannot serialize datetime objects by default. Use the default parameter: json.dumps(data, default=str) converts any non-serializable object to its string representation. For more control, write a custom function or use orjson, which handles datetime natively.

Why do I get JSONDecodeError when parsing a Python dictionary string?

Python dictionaries use single quotes, True/False, and None. JSON requires double quotes, true/false, and null. Use json.dumps() to convert a Python dict to valid JSON, not str() or repr().

Which JSON library is fastest in Python?

orjson is the fastest, typically 5-10x faster than the built-in json module for both parsing and serialization. ujson is 2-3x faster and acts as a drop-in replacement. The built-in json module is sufficient for most use cases and has no external dependencies.

How do I read a large JSON file without running out of memory?

Use ijson for streaming large JSON files, or switch to JSONL (JSON Lines) format where each line is a separate JSON object. JSONL allows line-by-line processing with minimal memory usage. For large arrays, ijson.items() yields one element at a time.

Conclusion

Python's json module handles the vast majority of JSON tasks you will encounter: parsing API responses with json.loads(), building request payloads with json.dumps(), and reading/writing files with json.load() and json.dump(). For custom objects, use the default parameter or a custom JSONEncoder. When performance matters, switch to orjson for a significant speed boost.

The key patterns to remember: always use try/except around parsing external JSON, use indent=2 for human-readable output, use ensure_ascii=False for international text, and never use str() on a dict when you need valid JSON. For large-scale data processing, JSONL format with line-by-line streaming keeps memory usage low while maintaining the simplicity of JSON.
