Python JSON: Parse, Read, Write, and Convert JSON Data
You fetch data from a REST API, and the response is a JSON string. You need to extract specific fields, transform the data, and save it to a file. Or you have a Python dictionary that needs to be sent as a JSON payload to another service. You reach for json.loads() but hit a JSONDecodeError because the string has single quotes instead of double quotes. Or you try to serialize a datetime object and get TypeError: Object of type datetime is not JSON serializable.
JSON is the universal data exchange format for web APIs, configuration files, and data pipelines. Python's built-in json module handles serialization and deserialization, but its quirks trip up developers constantly -- from encoding edge cases to performance bottlenecks when processing large files. This guide covers everything you need to work with JSON in Python confidently, from basic parsing to advanced patterns used in production systems.
What Is JSON?
JSON (JavaScript Object Notation) is a lightweight text-based data format. It supports six data types: strings, numbers, booleans, null, arrays, and objects. A typical JSON document looks like this:
{
"name": "Alice",
"age": 30,
"is_active": true,
"skills": ["Python", "SQL", "Machine Learning"],
"address": {
"city": "San Francisco",
"state": "CA"
}
}
Python's json module maps JSON types to Python types automatically:
| JSON Type | Python Type |
|---|---|
| object | dict |
| array | list |
| string | str |
| number (int) | int |
| number (float) | float |
| true/false | True/False |
| null | None |
The Four Core Functions
The json module has four primary functions. Two work with strings, two work with files.
| Function | Input | Output | Use Case |
|---|---|---|---|
| json.loads() | JSON string | Python object | Parse API response body |
| json.dumps() | Python object | JSON string | Build request payload |
| json.load() | File object | Python object | Read config/data file |
| json.dump() | Python object | File object | Write data to JSON file |
The naming convention: functions ending in s (loads, dumps) work with strings; functions without it (load, dump) work with file-like objects.
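The symmetry is easy to verify with a quick round trip through all four functions (the temp-file path below is just for illustration):

```python
import json
import os
import tempfile

record = {"id": 1, "tags": ["a", "b"]}

# String round trip: dumps then loads
s = json.dumps(record)
assert json.loads(s) == record

# File round trip: dump then load
path = os.path.join(tempfile.gettempdir(), "record.json")
with open(path, "w", encoding="utf-8") as f:
    json.dump(record, f)
with open(path, "r", encoding="utf-8") as f:
    assert json.load(f) == record
```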
Parsing JSON Strings with json.loads()
json.loads() (load from string) converts a JSON-formatted string into a Python object.
import json
json_string = '{"name": "Alice", "age": 30, "skills": ["Python", "SQL"]}'
data = json.loads(json_string)
print(data["name"]) # Alice
print(data["skills"][0]) # Python
print(type(data)) # <class 'dict'>
Parsing JSON Arrays
When the JSON root is an array, json.loads() returns a Python list:
import json
json_array = '[1, 2, 3, "four", null, true]'
result = json.loads(json_array)
print(result) # [1, 2, 3, 'four', None, True]
print(type(result)) # <class 'list'>
Handling Parsing Errors
Invalid JSON raises json.JSONDecodeError. Always wrap parsing in a try/except block when dealing with external data:
import json
bad_json = "{'name': 'Alice'}" # Single quotes are not valid JSON
try:
data = json.loads(bad_json)
except json.JSONDecodeError as e:
print(f"Invalid JSON: {e}")
# Invalid JSON: Expecting property name enclosed in double quotes: line 1 column 2 (char 1)
Common causes of JSONDecodeError:
- Single quotes instead of double quotes
- Trailing commas after the last element
- Unquoted keys
- Comments in the JSON (JSON does not support comments)
- BOM characters at the start of the string
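If the "JSON" actually came from Python's repr() (single quotes, True/None), ast.literal_eval can safely recover the object before re-serializing. This is a workaround for Python-repr strings specifically, not a fix for genuinely malformed JSON:

```python
import ast
import json

# A Python-repr "dict string" -- not valid JSON
pseudo_json = "{'name': 'Alice', 'active': True, 'score': None}"

# ast.literal_eval evaluates Python literals safely (no arbitrary code),
# then json.dumps produces real JSON
data = ast.literal_eval(pseudo_json)
print(json.dumps(data))  # {"name": "Alice", "active": true, "score": null}
```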
Converting Python to JSON with json.dumps()
json.dumps() (dump to string) serializes a Python object into a JSON-formatted string.
import json
data = {
"name": "Alice",
"age": 30,
"is_active": True,
"scores": [95, 87, 92],
"address": None
}
json_string = json.dumps(data)
print(json_string)
# {"name": "Alice", "age": 30, "is_active": true, "scores": [95, 87, 92], "address": null}
Notice that Python True becomes JSON true, None becomes null, and the output uses double quotes.
Pretty Printing with indent
Raw JSON on a single line is hard to read. Use the indent parameter:
import json
data = {"user": {"name": "Alice", "roles": ["admin", "editor"]}, "active": True}
print(json.dumps(data, indent=2))
Output:
{
"user": {
"name": "Alice",
"roles": [
"admin",
"editor"
]
},
"active": true
}
Sorting Keys
Use sort_keys=True for consistent, deterministic output -- useful for diffs and testing:
import json
data = {"banana": 2, "apple": 5, "cherry": 1}
print(json.dumps(data, sort_keys=True, indent=2))
Output:
{
"apple": 5,
"banana": 2,
"cherry": 1
}
Controlling Separators
By default, json.dumps() uses ", " between items and ": " between keys and values. You can make compact output by removing spaces:
import json
data = {"a": 1, "b": 2, "c": 3}
# Compact (no spaces)
print(json.dumps(data, separators=(",", ":")))
# {"a":1,"b":2,"c":3}
# Default
print(json.dumps(data))
# {"a": 1, "b": 2, "c": 3}
Compact separators reduce file size, which matters when transmitting large JSON payloads over the network.
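A quick measurement shows what the compact separators save on a repetitive payload (exact numbers depend on the data):

```python
import json

# 1,000 small records with repeated keys
data = {"items": [{"id": i, "name": f"item_{i}"} for i in range(1000)]}

default_len = len(json.dumps(data))
compact_len = len(json.dumps(data, separators=(",", ":")))

# Compact output drops one space per separator, so the saving
# grows with the number of keys and items
print(default_len - compact_len, "bytes saved")
```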
Handling Non-ASCII Characters
By default, json.dumps() escapes non-ASCII characters. Set ensure_ascii=False to preserve them:
import json
data = {"city": "Zurich", "greeting": "こんにちは"}
print(json.dumps(data))
# {"city": "Zurich", "greeting": "\u3053\u3093\u306b\u3061\u306f"}
print(json.dumps(data, ensure_ascii=False))
# {"city": "Zurich", "greeting": "こんにちは"}
Reading JSON Files with json.load()
json.load() reads JSON directly from a file object:
import json
with open("config.json", "r", encoding="utf-8") as f:
config = json.load(f)
print(config["database"]["host"])
print(config["database"]["port"])
Always specify encoding="utf-8" when opening files to avoid platform-specific encoding issues.
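A related pitfall: files saved by some Windows editors begin with a UTF-8 BOM, which makes json.load fail under plain utf-8. Opening with encoding="utf-8-sig" strips the BOM transparently (the temp file below just simulates such a file):

```python
import json
import os
import tempfile

path = os.path.join(tempfile.gettempdir(), "bom_config.json")

# Simulate a file saved with a UTF-8 BOM
with open(path, "w", encoding="utf-8-sig") as f:
    json.dump({"host": "localhost"}, f)

# "utf-8-sig" strips the BOM on read; plain "utf-8" would leave
# \ufeff at the start of the string and break json.load
with open(path, "r", encoding="utf-8-sig") as f:
    config = json.load(f)
print(config["host"])  # localhost
```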
Reading Large JSON Files
For very large JSON files, loading everything into memory at once may not be feasible. Consider these approaches:
import json
# Approach 1: Read and process line-delimited JSON (JSONL)
with open("events.jsonl", "r") as f:
for line in f:
event = json.loads(line.strip())
process(event)
# Approach 2: Use ijson for streaming large JSON arrays
# pip install ijson
import ijson
with open("huge_file.json", "rb") as f:
for item in ijson.items(f, "item"):
process(item)
Writing JSON Files with json.dump()
json.dump() writes a Python object directly to a file as JSON:
import json
data = {
"users": [
{"name": "Alice", "age": 30},
{"name": "Bob", "age": 25}
],
"total": 2,
"generated_at": "2026-02-11"
}
with open("output.json", "w", encoding="utf-8") as f:
json.dump(data, f, indent=2, ensure_ascii=False)
This creates a properly formatted, human-readable JSON file. For data interchange where file size matters, omit indent:
import json
with open("output_compact.json", "w") as f:
json.dump(data, f, separators=(",", ":"))
Handling Custom Objects
Python's json module cannot serialize custom objects, datetime, set, bytes, or Decimal by default. You have three approaches to solve this.
Approach 1: The default Parameter
Pass a function to the default parameter that converts unsupported types:
import json
from datetime import datetime, date
from decimal import Decimal
def json_serializer(obj):
if isinstance(obj, (datetime, date)):
return obj.isoformat()
if isinstance(obj, Decimal):
return float(obj)
if isinstance(obj, set):
return list(obj)
if isinstance(obj, bytes):
return obj.decode("utf-8")
raise TypeError(f"Type {type(obj)} is not JSON serializable")
data = {
"timestamp": datetime(2026, 2, 11, 14, 30),
"price": Decimal("19.99"),
"tags": {"python", "json", "tutorial"},
"raw": b"hello"
}
print(json.dumps(data, default=json_serializer, indent=2))
Output:
{
"timestamp": "2026-02-11T14:30:00",
"price": 19.99,
"tags": ["tutorial", "json", "python"],
"raw": "hello"
}
Approach 2: Custom JSONEncoder
For reusable serialization logic, subclass json.JSONEncoder:
import json
from datetime import datetime, date
from decimal import Decimal
class CustomEncoder(json.JSONEncoder):
def default(self, obj):
if isinstance(obj, (datetime, date)):
return obj.isoformat()
if isinstance(obj, Decimal):
return str(obj)
if isinstance(obj, set):
return sorted(list(obj))
return super().default(obj)
data = {"created": datetime.now(), "price": Decimal("29.99")}
# Use as cls argument
print(json.dumps(data, cls=CustomEncoder, indent=2))
# Or instantiate directly
encoder = CustomEncoder(indent=2)
print(encoder.encode(data))
Approach 3: Custom Decoder with object_hook
To convert JSON back into custom Python objects, use object_hook:
import json
from datetime import datetime
def decode_dates(obj):
for key, value in obj.items():
if isinstance(value, str):
try:
obj[key] = datetime.fromisoformat(value)
except ValueError:
pass
return obj
json_string = '{"name": "Event", "start": "2026-02-11T14:30:00"}'
data = json.loads(json_string, object_hook=decode_dates)
print(type(data["start"])) # <class 'datetime.datetime'>
print(data["start"].year) # 2026
Parsing JSON from APIs
One of the most common uses of JSON in Python is handling API responses. The requests library makes this straightforward:
import json
import requests
response = requests.get("https://api.github.com/repos/python/cpython")
# Method 1: Use response.json() (handles decoding automatically)
data = response.json()
print(data["full_name"]) # python/cpython
print(data["stargazers_count"])
# Method 2: Parse manually
data = json.loads(response.text)
Building JSON Request Bodies
import requests
import json
payload = {
"query": "SELECT * FROM users",
"limit": 100,
"filters": {"status": "active"}
}
# requests automatically serializes dict to JSON with json= parameter
response = requests.post(
"https://api.example.com/data",
json=payload, # Automatically sets Content-Type: application/json
headers={"Authorization": "Bearer token123"}
)
result = response.json()
Handling Paginated API Responses
import requests
def fetch_all_items(base_url):
items = []
page = 1
while True:
response = requests.get(f"{base_url}?page={page}&per_page=100")
data = response.json()
if not data["results"]:
break
items.extend(data["results"])
page += 1
return items
If you work with data from APIs frequently and want to explore the results interactively, PyGWalker can turn your parsed JSON data (loaded into a Pandas DataFrame) into a Tableau-like visual interface for drag-and-drop exploration -- no extra code required.
Working with Nested JSON
Real-world JSON is often deeply nested. Here are patterns for working with complex structures:
Safe Access with .get()
import json
data = json.loads('''{
"user": {
"profile": {
"name": "Alice",
"settings": {"theme": "dark"}
}
}
}''')
# Risky: KeyError if any level is missing
# name = data["user"]["profile"]["name"]
# Safe: Use .get() with defaults
name = data.get("user", {}).get("profile", {}).get("name", "Unknown")
print(name) # Alice
# Missing path returns default
email = data.get("user", {}).get("profile", {}).get("email", "N/A")
print(email) # N/A
Flattening Nested JSON
import json
def flatten_json(obj, prefix=""):
"""Flatten a nested JSON object into a single-level dict."""
flat = {}
for key, value in obj.items():
new_key = f"{prefix}.{key}" if prefix else key
if isinstance(value, dict):
flat.update(flatten_json(value, new_key))
elif isinstance(value, list):
for i, item in enumerate(value):
if isinstance(item, dict):
flat.update(flatten_json(item, f"{new_key}[{i}]"))
else:
flat[f"{new_key}[{i}]"] = item
else:
flat[new_key] = value
return flat
nested = {
"user": {"name": "Alice", "address": {"city": "SF", "zip": "94102"}},
"tags": ["admin", "editor"]
}
flat = flatten_json(nested)
for key, value in flat.items():
print(f"{key}: {value}")
Output:
user.name: Alice
user.address.city: SF
user.address.zip: 94102
tags[0]: admin
tags[1]: editor
Converting Nested JSON to a DataFrame
import json
import pandas as pd
json_data = '''[
{"name": "Alice", "scores": {"math": 95, "science": 88}},
{"name": "Bob", "scores": {"math": 78, "science": 92}},
{"name": "Charlie", "scores": {"math": 88, "science": 85}}
]'''
data = json.loads(json_data)
# pd.json_normalize handles nested structures
df = pd.json_normalize(data)
print(df)
Output:
name scores.math scores.science
0 Alice 95 88
1 Bob 78 92
2 Charlie 88 85
JSON vs Python Dict: Key Differences
JSON strings and Python dictionaries look similar but have important differences:
| Feature | JSON | Python Dict |
|---|---|---|
| Quotes | Double quotes only ("key") | Single or double ('key' or "key") |
| Booleans | true / false | True / False |
| Null value | null | None |
| Key types | Strings only | Any hashable type |
| Trailing commas | Not allowed | Allowed |
| Comments | Not supported | # comments |
| Data type | Text (string) | Python object |
| Tuple | No tuple type (becomes array) | Supported |
| Set | No set type | Supported |
A common mistake is using str() or repr() on a dict and expecting valid JSON:
import json
data = {"active": True, "count": None}
# WRONG: Python repr, not valid JSON
print(str(data)) # {'active': True, 'count': None}
# RIGHT: Valid JSON
print(json.dumps(data)) # {"active": true, "count": null}
Command-Line JSON with json.tool
Python includes a built-in command-line tool for pretty-printing and validating JSON:
# Pretty print a JSON file
python -m json.tool data.json
# Pretty print from a pipe (e.g., curl output)
curl -s https://api.github.com/zen | python -m json.tool
# Validate JSON (exits with error code on invalid JSON)
echo '{"valid": true}' | python -m json.tool
# Sort keys
python -m json.tool --sort-keys data.json
# Compact output
python -m json.tool --compact data.json
This is invaluable for quick debugging and data inspection without writing a script.
Performance: json vs orjson vs ujson
The built-in json module is reliable but not the fastest option. For performance-critical applications, consider these alternatives:
| Library | Parse Speed | Serialize Speed | Notes |
|---|---|---|---|
| json (built-in) | 1x (baseline) | 1x (baseline) | Always available, no install needed |
| ujson | ~2-3x faster | ~2-3x faster | Drop-in replacement, C extension |
| orjson | ~5-10x faster | ~5-10x faster | Returns bytes, not str; most strict |
| rapidjson | ~3-5x faster | ~3-5x faster | Many configuration options |
Using orjson
# pip install orjson
import orjson
# Parse JSON (same interface)
data = orjson.loads('{"name": "Alice", "age": 30}')
# Serialize (returns bytes, not str)
json_bytes = orjson.dumps(data)
print(json_bytes) # b'{"name":"Alice","age":30}'
# Pretty print with orjson
json_bytes = orjson.dumps(data, option=orjson.OPT_INDENT_2)
# orjson handles datetime natively
from datetime import datetime
data = {"timestamp": datetime.now()}
print(orjson.dumps(data)) # e.g. b'{"timestamp":"2026-02-11T14:30:00.123456"}'
Using ujson
# pip install ujson
import ujson
# Drop-in replacement for json
data = ujson.loads('{"name": "Alice"}')
json_string = ujson.dumps(data, indent=2)
print(json_string)
Benchmark Comparison
import json
import time
# Generate test data
data = [{"id": i, "name": f"user_{i}", "scores": [i * 10, i * 20]} for i in range(10000)]
json_string = json.dumps(data)
# Benchmark json
start = time.perf_counter()
for _ in range(100):
json.loads(json_string)
json_time = time.perf_counter() - start
print(f"json: {json_time:.3f}s")
# Benchmark orjson (if installed)
try:
import orjson
start = time.perf_counter()
for _ in range(100):
orjson.loads(json_string)
orjson_time = time.perf_counter() - start
print(f"orjson: {orjson_time:.3f}s ({json_time/orjson_time:.1f}x faster)")
except ImportError:
print("orjson not installed")
For data-heavy workflows -- parsing API responses, transforming JSON into DataFrames for analysis -- consider using RunCell as your Jupyter environment. Its AI agent can help you write and debug JSON parsing code interactively.
Common Errors and Solutions
JSONDecodeError
import json
# Problem: Invalid JSON format
try:
json.loads("{'key': 'value'}") # Single quotes
except json.JSONDecodeError as e:
print(f"Error at position {e.pos}: {e.msg}")
# Solution: Ensure proper JSON format
data = json.loads('{"key": "value"}')
TypeError: Object is not JSON serializable
import json
from datetime import datetime
# Problem: datetime is not serializable
try:
json.dumps({"now": datetime.now()})
except TypeError as e:
print(e) # Object of type datetime is not JSON serializable
# Solution 1: Convert before serializing
data = {"now": datetime.now().isoformat()}
print(json.dumps(data))
# Solution 2: Use default parameter
print(json.dumps({"now": datetime.now()}, default=str))
Handling NaN and Infinity
JSON does not support NaN or Infinity, but Python's json module allows them by default (non-standard behavior):
import json
import math
data = {"value": float("nan"), "big": float("inf")}
# Default: allows NaN/Infinity (non-standard JSON)
print(json.dumps(data)) # {"value": NaN, "big": Infinity}
# Strict: raise error on NaN/Infinity
try:
json.dumps(data, allow_nan=False)
except ValueError as e:
print(e) # Out of range float values are not JSON compliant
# Solution: Replace NaN with None
clean = {k: (None if isinstance(v, float) and (math.isnan(v) or math.isinf(v)) else v)
for k, v in data.items()}
print(json.dumps(clean)) # {"value": null, "big": null}
Encoding Issues
import json
# Problem: BOM character in JSON string
json_with_bom = '\ufeff{"key": "value"}'
try:
json.loads(json_with_bom)
except json.JSONDecodeError:
# Solution: Strip BOM
clean = json_with_bom.lstrip('\ufeff')
data = json.loads(clean)
print(data)
Real-World Patterns
Configuration Files
import json
from pathlib import Path
def load_config(config_path="config.json"):
"""Load config with defaults and validation."""
defaults = {
"debug": False,
"log_level": "INFO",
"max_retries": 3,
"timeout": 30
}
path = Path(config_path)
if path.exists():
with open(path, "r") as f:
user_config = json.load(f)
# Merge: user config overrides defaults
return {**defaults, **user_config}
return defaults
def save_config(config, config_path="config.json"):
"""Save config with pretty formatting."""
with open(config_path, "w") as f:
json.dump(config, f, indent=2, sort_keys=True)
# Usage
config = load_config()
config["debug"] = True
save_config(config)
JSON Lines (JSONL) for Log Files
JSONL stores one JSON object per line, making it efficient for append-only writes and line-by-line streaming reads:
import json
from datetime import datetime
def log_event(event_type, data, log_file="events.jsonl"):
"""Append a JSON event to a JSONL log file."""
event = {
"timestamp": datetime.now().isoformat(),
"type": event_type,
"data": data
}
with open(log_file, "a") as f:
f.write(json.dumps(event) + "\n")
def read_events(log_file="events.jsonl", event_type=None):
"""Read and optionally filter events from a JSONL file."""
events = []
with open(log_file, "r") as f:
for line in f:
event = json.loads(line.strip())
if event_type is None or event["type"] == event_type:
events.append(event)
return events
# Usage
log_event("user_login", {"user_id": 42, "ip": "192.168.1.1"})
log_event("page_view", {"user_id": 42, "page": "/dashboard"})
log_event("user_login", {"user_id": 99, "ip": "10.0.0.1"})
logins = read_events(event_type="user_login")
print(f"Total logins: {len(logins)}")
API Response Caching
import json
import time
from pathlib import Path
class JSONCache:
"""Simple file-based JSON cache with TTL."""
def __init__(self, cache_dir=".cache"):
self.cache_dir = Path(cache_dir)
self.cache_dir.mkdir(exist_ok=True)
def get(self, key, ttl=3600):
"""Get cached value if it exists and hasn't expired."""
path = self.cache_dir / f"{key}.json"
if not path.exists():
return None
with open(path, "r") as f:
cached = json.load(f)
if time.time() - cached["timestamp"] > ttl:
path.unlink() # Delete expired cache
return None
return cached["data"]
def set(self, key, data):
"""Store data in cache."""
path = self.cache_dir / f"{key}.json"
with open(path, "w") as f:
json.dump({"timestamp": time.time(), "data": data}, f)
# Usage
cache = JSONCache()
cache.set("user_42", {"name": "Alice", "role": "admin"})
result = cache.get("user_42", ttl=300) # 5-minute TTL
JSON Schema Validation
# pip install jsonschema
from jsonschema import validate, ValidationError
import json
schema = {
"type": "object",
"properties": {
"name": {"type": "string", "minLength": 1},
"age": {"type": "integer", "minimum": 0, "maximum": 150},
"email": {"type": "string", "format": "email"}
},
"required": ["name", "age"]
}
# Valid data
valid = {"name": "Alice", "age": 30, "email": "alice@example.com"}
validate(instance=valid, schema=schema) # No error
# Invalid data
invalid = {"name": "", "age": -5}
try:
validate(instance=invalid, schema=schema)
except ValidationError as e:
print(f"Validation failed: {e.message}")
Quick Reference: json Module Parameters
| Parameter | Function(s) | Description |
|---|---|---|
| indent | dumps, dump | Pretty print with N spaces of indentation |
| sort_keys | dumps, dump | Sort dictionary keys alphabetically |
| default | dumps, dump | Function to serialize non-standard types |
| cls | dumps, dump, loads, load | Custom encoder/decoder class |
| ensure_ascii | dumps, dump | Escape non-ASCII characters (default: True) |
| separators | dumps, dump | Tuple of (item_sep, key_sep) |
| allow_nan | dumps, dump | Allow NaN/Infinity (default: True) |
| object_hook | loads, load | Function to transform decoded dicts |
| object_pairs_hook | loads, load | Function receiving ordered pairs |
| parse_float | loads, load | Function to parse float strings (e.g., Decimal) |
| parse_int | loads, load | Function to parse int strings |
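Two of the less common hooks from the table in action: parse_float to keep exact decimal values, and object_pairs_hook to catch duplicate keys, which json.loads would otherwise silently resolve by keeping the last value (the reject_duplicates helper is illustrative):

```python
import json
from decimal import Decimal

# parse_float=Decimal avoids lossy binary floats
data = json.loads('{"price": 19.99}', parse_float=Decimal)
print(data["price"])  # 19.99 (a Decimal, not a float)

# object_pairs_hook receives every (key, value) pair, duplicates included
def reject_duplicates(pairs):
    keys = [k for k, _ in pairs]
    if len(keys) != len(set(keys)):
        raise ValueError(f"Duplicate keys: {keys}")
    return dict(pairs)

json.loads('{"a": 1, "b": 2}', object_pairs_hook=reject_duplicates)  # fine
try:
    json.loads('{"a": 1, "a": 2}', object_pairs_hook=reject_duplicates)
except ValueError as e:
    print(e)
```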
Frequently Asked Questions
What is the difference between json.loads() and json.load()?
json.loads() parses a JSON string into a Python object. json.load() reads JSON from a file object. Use loads() when you already have the JSON as a string (e.g., from an API response), and load() when reading from a file on disk.
How do I convert a Python dictionary to a JSON string?
Use json.dumps(your_dict). For pretty formatting, add indent=2. For example: json.dumps({"name": "Alice", "age": 30}, indent=2) produces a nicely formatted JSON string with 2-space indentation.
How do I handle datetime objects in JSON?
Python's json module cannot serialize datetime objects by default. Use the default parameter: json.dumps(data, default=str) converts any non-serializable object to its string representation. For more control, write a custom function or use orjson, which handles datetime natively.
Why do I get JSONDecodeError when parsing a Python dictionary string?
Python dictionaries use single quotes, True/False, and None. JSON requires double quotes, true/false, and null. Use json.dumps() to convert a Python dict to valid JSON, not str() or repr().
Which JSON library is fastest in Python?
orjson is the fastest, typically 5-10x faster than the built-in json module for both parsing and serialization. ujson is 2-3x faster and acts as a drop-in replacement. The built-in json module is sufficient for most use cases and has no external dependencies.
How do I read a large JSON file without running out of memory?
Use ijson for streaming large JSON files, or switch to JSONL (JSON Lines) format where each line is a separate JSON object. JSONL allows line-by-line processing with minimal memory usage. For large arrays, ijson.items() yields one element at a time.
Conclusion
Python's json module handles the vast majority of JSON tasks you will encounter: parsing API responses with json.loads(), building request payloads with json.dumps(), and reading/writing files with json.load() and json.dump(). For custom objects, use the default parameter or a custom JSONEncoder. When performance matters, switch to orjson for a significant speed boost.
The key patterns to remember: always use try/except around parsing external JSON, use indent=2 for human-readable output, use ensure_ascii=False for international text, and never use str() on a dict when you need valid JSON. For large-scale data processing, JSONL format with line-by-line streaming keeps memory usage low while maintaining the simplicity of JSON.