# Python Requests Library: A Complete Guide to HTTP Requests in Python
Making HTTP requests with Python's built-in urllib module is notoriously complex and verbose. You have to encode parameters by hand, work through response objects with multiple method calls, and write dozens of lines of boilerplate just to send a simple API request. That complexity slows development and makes code harder to maintain.

The Python requests library eliminates this friction by providing an elegant, human-friendly API for HTTP communication. Whether you are consuming REST APIs, scraping websites, uploading files, or writing automation scripts, requests makes HTTP operations intuitive and direct, often in just a few lines of code.

In this guide you will learn everything from basic GET and POST requests to advanced features such as authentication, sessions, error handling, and real-world API integration patterns.
## Installing the Python Requests Library

The requests library is not part of the Python standard library, so you need to install it separately with pip:

```
pip install requests
```

For conda users:

```
conda install requests
```

Once installed, import it in your Python scripts:

```python
import requests
```

Verify the installation and check the version:

```python
import requests
print(requests.__version__)
```

## Making GET Requests with Python Requests
GET is the most common HTTP method, used to retrieve data from a server. The requests library makes GET requests remarkably simple.
### Basic GET Request

Here is how to make a basic GET request:

```python
import requests

response = requests.get('https://api.github.com')
print(response.status_code)  # 200
print(response.text)         # Response body as string
```

### GET Requests with Query Parameters
Rather than concatenating query strings onto URLs by hand, use the params argument:
```python
import requests

# Pass query parameters as a dictionary
params = {
    'q': 'python requests',
    'sort': 'stars',
    'order': 'desc'
}
response = requests.get('https://api.github.com/search/repositories', params=params)

# The URL is constructed automatically:
# https://api.github.com/search/repositories?q=python+requests&sort=stars&order=desc
print(response.url)     # View the constructed URL
data = response.json()  # Parse the JSON response
```

### GET Requests with Custom Headers
Many APIs require custom headers for authentication or content negotiation:
```python
import requests

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36',
    'Accept': 'application/json',
    'Accept-Language': 'en-US,en;q=0.9'
}
response = requests.get('https://api.example.com/data', headers=headers)
print(response.json())
```

## Making POST Requests in Python
POST requests send data to a server and are commonly used for form submissions and API operations.

### POST Requests with Form Data

Send form-encoded data, just like an HTML form submission:
```python
import requests

# Send form data
data = {
    'username': 'john_doe',
    'password': 'secret123',
    'remember_me': True
}
response = requests.post('https://example.com/login', data=data)
print(response.status_code)
```

### POST Requests with a JSON Payload
Modern REST APIs typically expect JSON payloads. requests handles JSON serialization automatically:
```python
import requests

# Method 1: Using the json parameter (recommended)
payload = {
    'name': 'New Project',
    'description': 'A test project',
    'tags': ['python', 'api']
}
response = requests.post('https://api.example.com/projects', json=payload)

# Method 2: Manual JSON encoding
import json

headers = {'Content-Type': 'application/json'}
response = requests.post(
    'https://api.example.com/projects',
    data=json.dumps(payload),
    headers=headers
)
```

### POST Requests with Files
Upload files with the files argument:
```python
import requests

# Upload a single file (opened in binary mode)
files = {'file': open('report.pdf', 'rb')}
response = requests.post('https://example.com/upload', files=files)

# Upload multiple files
files = {
    'file1': open('document.pdf', 'rb'),
    'file2': open('image.jpg', 'rb')
}
response = requests.post('https://example.com/upload', files=files)

# Upload a file together with additional form data
files = {'file': open('data.csv', 'rb')}
data = {'description': 'Monthly report', 'category': 'finance'}
response = requests.post('https://example.com/upload', files=files, data=data)
```

Note that the file handles above are never closed; in production code, open each file in a `with` block (or call `close()` after the upload) to avoid leaking file descriptors.

## Other HTTP Methods: PUT, PATCH, DELETE
The requests library supports all standard HTTP methods:
```python
import requests

# PUT - Replace an entire resource
data = {'name': 'Updated Name', 'status': 'active'}
response = requests.put('https://api.example.com/users/123', json=data)

# PATCH - Partially update a resource
data = {'status': 'inactive'}
response = requests.patch('https://api.example.com/users/123', json=data)

# DELETE - Remove a resource
response = requests.delete('https://api.example.com/users/123')
print(response.status_code)  # 204 No Content

# HEAD - Get headers only (no response body)
response = requests.head('https://example.com')
print(response.headers)

# OPTIONS - Discover supported methods
response = requests.options('https://api.example.com/users')
print(response.headers.get('Allow'))
```

## Understanding the Response Object
The Response object contains everything the server sent back:
```python
import requests

response = requests.get('https://api.github.com/users/github')

# Status code
print(response.status_code)  # 200, 404, 500, etc.

# Response body as a string
print(response.text)

# Response body parsed as JSON (for JSON APIs)
data = response.json()
print(data['login'])

# Raw binary content (for images, files)
image_data = response.content
with open('profile.jpg', 'wb') as f:
    f.write(image_data)

# Response headers
print(response.headers)
print(response.headers['Content-Type'])

# Encoding
print(response.encoding)  # 'utf-8'

# Information about the request that was sent
print(response.request.headers)
print(response.request.url)

# Check whether the request was successful
if response.ok:  # True if status_code < 400
    print("Success!")
```

## Working with Request Headers
Headers carry critical metadata with every request:

### Setting Custom Headers
```python
import requests

headers = {
    'User-Agent': 'MyApp/1.0',
    'Accept': 'application/json',
    'Accept-Encoding': 'gzip, deflate',
    'Connection': 'keep-alive',
    'Custom-Header': 'custom-value'
}
response = requests.get('https://api.example.com', headers=headers)
```

### Accessing Response Headers
```python
import requests

response = requests.get('https://api.github.com')

# Dictionary-like access
print(response.headers['Content-Type'])
print(response.headers.get('X-RateLimit-Remaining'))

# Access is case-insensitive
print(response.headers['content-type'])  # Works!

# Iterate over all headers
for key, value in response.headers.items():
    print(f"{key}: {value}")
```

## Authentication with Python Requests
The requests library supports several authentication mechanisms:

### Basic Authentication
```python
import requests
from requests.auth import HTTPBasicAuth

# Method 1: Using the auth parameter (recommended)
response = requests.get(
    'https://api.example.com/protected',
    auth=('username', 'password')
)

# Method 2: Explicit HTTPBasicAuth
response = requests.get(
    'https://api.example.com/protected',
    auth=HTTPBasicAuth('username', 'password')
)

# Method 3: Building the header manually (not recommended)
import base64

credentials = base64.b64encode(b'username:password').decode('utf-8')
headers = {'Authorization': f'Basic {credentials}'}
response = requests.get('https://api.example.com/protected', headers=headers)
```

### Bearer Token Authentication
Commonly used with JWT tokens and OAuth 2.0:
```python
import requests

token = 'your_access_token_here'
headers = {'Authorization': f'Bearer {token}'}
response = requests.get('https://api.example.com/user', headers=headers)
```

### API Key Authentication
```python
import requests

# Method 1: Query parameter
params = {'api_key': 'your_api_key_here'}
response = requests.get('https://api.example.com/data', params=params)

# Method 2: Custom header
headers = {'X-API-Key': 'your_api_key_here'}
response = requests.get('https://api.example.com/data', headers=headers)
```

### OAuth 2.0 Authentication
For OAuth 2.0, use the requests-oauthlib library:
```python
from requests_oauthlib import OAuth2Session

client_id = 'your_client_id'
client_secret = 'your_client_secret'
token_url = 'https://oauth.example.com/token'

oauth = OAuth2Session(client_id)
token = oauth.fetch_token(token_url, client_secret=client_secret)

# Make authenticated requests
response = oauth.get('https://api.example.com/protected')
```

## Using Sessions for Persistent Connections
A Session maintains cookies, connection pooling, and configuration across multiple requests:
```python
import requests

# Create a session
session = requests.Session()

# Set headers for every request made through this session
session.headers.update({
    'User-Agent': 'MyApp/1.0',
    'Accept': 'application/json'
})

# Log in; the session keeps the cookies
login_data = {'username': 'john', 'password': 'secret'}
session.post('https://example.com/login', data=login_data)

# Subsequent requests reuse the session cookies
response1 = session.get('https://example.com/dashboard')
response2 = session.get('https://example.com/profile')

# Close the session
session.close()
```

### Sessions with Authentication
```python
import requests

session = requests.Session()
session.auth = ('username', 'password')

# Every request in this session uses the credentials
response1 = session.get('https://api.example.com/users')
response2 = session.get('https://api.example.com/posts')
```

### Session as a Context Manager
```python
import requests

with requests.Session() as session:
    session.headers.update({'Authorization': 'Bearer token123'})
    response1 = session.get('https://api.example.com/data')
    response2 = session.post('https://api.example.com/data', json={'key': 'value'})
# The session is closed automatically when the with block exits
```

## Timeouts and Retry Strategies
Always set a timeout so requests cannot hang indefinitely:

### Setting Timeouts
```python
import requests

# Single timeout value (applies to both connect and read)
response = requests.get('https://api.example.com', timeout=5)

# Separate connect and read timeouts:
# 3 seconds to establish the connection, 10 seconds to read the response
response = requests.get('https://api.example.com', timeout=(3, 10))

# No timeout (dangerous - may hang forever)
response = requests.get('https://api.example.com', timeout=None)
```

### Implementing Retry Logic
```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

# Configure the retry strategy
retry_strategy = Retry(
    total=3,           # Total number of retries
    backoff_factor=1,  # Exponential backoff between retries
    status_forcelist=[429, 500, 502, 503, 504],  # Retry on these status codes
    allowed_methods=["HEAD", "GET", "OPTIONS", "POST"]
)

adapter = HTTPAdapter(max_retries=retry_strategy)
session = requests.Session()
session.mount("http://", adapter)
session.mount("https://", adapter)

# Requests made through this session retry automatically on failure
response = session.get('https://api.example.com/data')
```

## Error Handling with Python Requests
Robust error handling is essential for production applications:

### Handling Common Exceptions
```python
import requests
from requests.exceptions import (
    ConnectionError,
    Timeout,
    HTTPError,
    RequestException
)

try:
    response = requests.get('https://api.example.com/data', timeout=5)
    response.raise_for_status()  # Raises HTTPError for 4xx/5xx status codes
    data = response.json()
except ConnectionError:
    print("Failed to connect to the server")
except Timeout:
    print("Request timed out")
except HTTPError as e:
    print(f"HTTP error occurred: {e}")
    print(f"Status code: {e.response.status_code}")
except RequestException as e:
    # Catches all requests exceptions
    print(f"An error occurred: {e}")
except ValueError:
    # JSON decoding error
    print("Invalid JSON response")
```

### Checking Status Codes
```python
import requests

response = requests.get('https://api.example.com/data')

# Method 1: Manual check
if response.status_code == 200:
    data = response.json()
elif response.status_code == 404:
    print("Resource not found")
elif response.status_code >= 500:
    print("Server error")

# Method 2: Using raise_for_status()
try:
    response.raise_for_status()
    data = response.json()
except requests.exceptions.HTTPError as e:
    if response.status_code == 404:
        print("Resource not found")
    elif response.status_code == 401:
        print("Authentication required")
    else:
        print(f"HTTP error: {e}")

# Method 3: Using response.ok
if response.ok:  # True if status_code < 400
    data = response.json()
else:
    print(f"Request failed with status {response.status_code}")
```

## SSL Verification and Certificates
By default, requests verifies SSL certificates:
```python
import requests

# Default behavior - verify the SSL certificate
response = requests.get('https://api.example.com')

# Disable SSL verification (not recommended for production)
response = requests.get('https://example.com', verify=False)

# Suppress the InsecureRequestWarning
import urllib3
urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)
response = requests.get('https://example.com', verify=False)

# Use a custom CA bundle
response = requests.get('https://example.com', verify='/path/to/ca_bundle.crt')

# Client-side certificates
response = requests.get(
    'https://example.com',
    cert=('/path/to/client.crt', '/path/to/client.key')
)
```

## Using Proxies with Python Requests
Route your requests through a proxy server:
```python
import requests

# HTTP and HTTPS proxies
proxies = {
    'http': 'http://10.10.1.10:3128',
    'https': 'http://10.10.1.10:1080',
}
response = requests.get('https://api.example.com', proxies=proxies)

# SOCKS proxy (requires requests[socks])
proxies = {
    'http': 'socks5://user:pass@host:port',
    'https': 'socks5://user:pass@host:port'
}

# Use environment variables:
# set HTTP_PROXY and HTTPS_PROXY, and requests picks them up automatically
response = requests.get('https://api.example.com')

# Disable proxies
response = requests.get('https://api.example.com', proxies={'http': None, 'https': None})
```

## Comparison of Python HTTP Libraries
Here is how requests compares with the main alternatives:
| Feature | requests | urllib | httpx | aiohttp |
|---|---|---|---|---|
| Ease of Use | Excellent (Pythonic API) | Poor (verbose) | Excellent | Good |
| Async Support | No | No | Yes | Yes |
| HTTP/2 Support | No | No | Yes | No |
| Session Management | Built-in | Manual | Built-in | Built-in |
| JSON Handling | Automatic | Manual | Automatic | Automatic |
| Connection Pooling | Yes | No | Yes | Yes |
| Standard Library | No (pip install) | Yes | No (pip install) | No (pip install) |
| Documentation | Excellent | Good | Excellent | Good |
| Performance | Good | Fair | Excellent | Excellent (async) |
| SSL/TLS | Full support | Full support | Full support | Full support |
| Best For | Synchronous HTTP, general use | Simple scripts, no dependencies | Modern sync/async HTTP | High-performance async |
When to use which:

- requests: The default choice for most synchronous HTTP work. Best for web scraping, API consumption, and general HTTP tasks.
- urllib: Use only when you cannot install external packages and must stick to the standard library.
- httpx: Use when you need HTTP/2 support or want a modern, requests-compatible API with async capability.
- aiohttp: Suited to high-performance async applications that handle many concurrent requests.
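To make the verbosity gap concrete, the sketch below builds the same search URL twice: by hand with urllib.parse.urlencode, and via the PreparedRequest machinery that runs behind requests.get(url, params=...). The endpoint is hypothetical and no network call is made:

```python
from urllib.parse import urlencode

import requests

params = {'q': 'python requests', 'sort': 'stars'}

# urllib: encode the query string yourself and splice it onto the URL
urllib_url = 'https://api.example.com/search?' + urlencode(params)

# requests: pass params= and let the library do the encoding for you
req = requests.Request('GET', 'https://api.example.com/search', params=params).prepare()

print(urllib_url)
print(req.url)
```

Both lines print the same URL; requests simply handles the bookkeeping, and the difference only grows once headers, sessions, and JSON enter the picture.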
## Rate Limiting and Respectful Scraping

When scraping websites or calling APIs, implement rate limiting:
```python
import requests
import time
from datetime import datetime

class RateLimitedSession:
    def __init__(self, requests_per_second=1):
        self.session = requests.Session()
        self.min_interval = 1.0 / requests_per_second
        self.last_request_time = 0

    def get(self, url, **kwargs):
        # Wait if the previous request was too recent
        elapsed = time.time() - self.last_request_time
        if elapsed < self.min_interval:
            time.sleep(self.min_interval - elapsed)
        # Make the request
        response = self.session.get(url, **kwargs)
        self.last_request_time = time.time()
        return response

# Use the rate-limited session
session = RateLimitedSession(requests_per_second=2)  # 2 requests per second
urls = ['https://api.example.com/item/1', 'https://api.example.com/item/2']
for url in urls:
    response = session.get(url)
    print(f"{datetime.now()}: {response.status_code}")
```

### Respecting robots.txt
```python
import requests
from urllib.parse import urlparse
from urllib.robotparser import RobotFileParser

def can_fetch(url, user_agent='MyBot'):
    """Check whether robots.txt allows scraping this URL"""
    parsed = urlparse(url)
    robots_url = f"{parsed.scheme}://{parsed.netloc}/robots.txt"
    rp = RobotFileParser()
    rp.set_url(robots_url)
    rp.read()
    return rp.can_fetch(user_agent, url)

url = 'https://example.com/page'
if can_fetch(url):
    response = requests.get(url)
else:
    print("Scraping not allowed by robots.txt")
```

## Real-World Examples and Use Cases
### Example 1: Consuming a REST API
```python
import requests

class GitHubAPI:
    def __init__(self, token=None):
        self.base_url = 'https://api.github.com'
        self.session = requests.Session()
        if token:
            self.session.headers.update({'Authorization': f'token {token}'})

    def get_user(self, username):
        """Get user information"""
        response = self.session.get(f'{self.base_url}/users/{username}')
        response.raise_for_status()
        return response.json()

    def search_repositories(self, query, sort='stars', limit=10):
        """Search repositories"""
        params = {'q': query, 'sort': sort, 'per_page': limit}
        response = self.session.get(f'{self.base_url}/search/repositories', params=params)
        response.raise_for_status()
        return response.json()['items']

    def create_issue(self, owner, repo, title, body):
        """Create an issue in a repository"""
        url = f'{self.base_url}/repos/{owner}/{repo}/issues'
        data = {'title': title, 'body': body}
        response = self.session.post(url, json=data)
        response.raise_for_status()
        return response.json()

# Usage
api = GitHubAPI(token='your_github_token')

user = api.get_user('torvalds')
print(f"Name: {user['name']}, Followers: {user['followers']}")

repos = api.search_repositories('python requests', limit=5)
for repo in repos:
    print(f"{repo['full_name']}: {repo['stargazers_count']} stars")
```

### Example 2: Downloading Files with a Progress Bar
```python
import requests
from tqdm import tqdm

def download_file(url, filename):
    """Download a file, showing a progress bar"""
    response = requests.get(url, stream=True)
    response.raise_for_status()

    total_size = int(response.headers.get('content-length', 0))
    with open(filename, 'wb') as f, tqdm(
        desc=filename,
        total=total_size,
        unit='B',
        unit_scale=True,
        unit_divisor=1024,
    ) as progress_bar:
        for chunk in response.iter_content(chunk_size=8192):
            f.write(chunk)
            progress_bar.update(len(chunk))

# Download a file
download_file('https://example.com/large-file.zip', 'downloaded.zip')
```

### Example 3: Web Scraping with Error Handling
```python
import requests
from bs4 import BeautifulSoup
import time

def scrape_articles(base_url, max_pages=5):
    """Scrape article titles from a news website"""
    session = requests.Session()
    session.headers.update({
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36'
    })

    articles = []
    for page in range(1, max_pages + 1):
        try:
            url = f"{base_url}?page={page}"
            response = session.get(url, timeout=10)
            response.raise_for_status()

            soup = BeautifulSoup(response.content, 'html.parser')
            titles = soup.find_all('h2', class_='article-title')
            for title in titles:
                articles.append({
                    'title': title.text.strip(),
                    'url': title.find('a')['href'] if title.find('a') else None
                })

            print(f"Scraped page {page}: {len(titles)} articles")
            time.sleep(1)  # Rate limiting
        except requests.exceptions.RequestException as e:
            print(f"Error scraping page {page}: {e}")
            continue

    return articles

# Usage
articles = scrape_articles('https://news.example.com/articles', max_pages=3)
print(f"Total articles collected: {len(articles)}")
```

### Example 4: API Integration with Pagination
```python
import requests

def fetch_all_items(api_url, headers=None):
    """Fetch all items from a paginated API"""
    items = []
    page = 1

    while True:
        try:
            params = {'page': page, 'per_page': 100}
            response = requests.get(api_url, params=params, headers=headers, timeout=10)
            response.raise_for_status()

            data = response.json()
            if not data:  # No more items
                break

            items.extend(data)
            print(f"Fetched page {page}: {len(data)} items")
            page += 1

            # Check the Link header for pagination hints
            if 'Link' in response.headers:
                links = response.headers['Link']
                if 'rel="next"' not in links:
                    break
        except requests.exceptions.RequestException as e:
            print(f"Error fetching page {page}: {e}")
            break

    return items

# Usage
all_items = fetch_all_items('https://api.example.com/items')
print(f"Total items: {len(all_items)}")
```

## Testing APIs in Jupyter with RunCell
When developing and testing API integrations, RunCell (opens in a new tab) provides an AI agent environment that runs directly inside Jupyter notebooks. Instead of manually debugging HTTP requests and responses, RunCell's agent can help you:

- Automatically construct and test API requests with the correct authentication
- Debug response parsing and error handling in real time
- Generate code snippets for common HTTP patterns
- Validate API responses against expected schemas
- Iterate quickly on the logic that transforms API responses into data

This is especially valuable when you work with APIs that involve multi-step authentication, pagination handling, or complex parsing logic. By cutting down the back-and-forth of manually testing HTTP requests, RunCell speeds up the overall development workflow.
## FAQ

### What is the Python requests library used for?

The Python requests library is used to make HTTP requests to web servers and APIs. It simplifies tasks such as fetching web pages, consuming REST APIs, sending form data, uploading files, and handling authentication. Its intuitive, full-featured API has made it the most popular HTTP library for Python.

### How do I install the Python requests library?

Install it with pip: pip install requests. In conda environments, use conda install requests. After installation, bring it into your code with import requests. requests is not part of the Python standard library, so this extra step is required.

### What is the difference between requests.get() and requests.post()?

requests.get() retrieves data from a server without modifying resources; it is typically used to fetch web pages or API data. requests.post() sends data to a server to create or update resources, and is common for form submissions, file uploads, and API operations that change server state. GET usually puts parameters in the URL, while POST carries data in the request body.
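You can see this difference without any network traffic by preparing (but not sending) two requests that carry the same field, against a hypothetical endpoint:

```python
import requests

# Prepare, without sending, a GET and a POST carrying the same field
get_req = requests.Request('GET', 'https://api.example.com/items',
                           params={'id': '1'}).prepare()
post_req = requests.Request('POST', 'https://api.example.com/items',
                            data={'id': '1'}).prepare()

print(get_req.url)    # the parameter travels in the URL
print(get_req.body)   # None - this GET carries no body
print(post_req.url)   # plain URL, no query string
print(post_req.body)  # 'id=1' - the data travels in the request body
```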
### How do I handle errors with the Python requests library?

Wrap requests in try-except blocks: catch ConnectionError for network problems, Timeout for timeouts, HTTPError for 4xx/5xx status codes, and RequestException as a catch-all. Call response.raise_for_status() after each request so failures raise an HTTPError automatically. Always set a timeout as well, so requests cannot hang indefinitely.

### How do I send JSON data with Python requests?

Use the json parameter: requests.post(url, json=data). requests serializes the Python dict to JSON automatically and sets the Content-Type: application/json header. To parse a JSON response, call response.json(), which deserializes the body into a Python dict.
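As a quick check, preparing (but not sending) a request against a hypothetical URL confirms that json= both serializes the payload and sets the header:

```python
import requests

# Prepare the request without sending it
req = requests.Request('POST', 'https://api.example.com/projects',
                       json={'name': 'test'}).prepare()

print(req.headers['Content-Type'])  # application/json
print(req.body)                     # the serialized JSON payload
```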
### Should I use requests or urllib in Python?

Use requests for most HTTP work. Compared with urllib it offers a cleaner API, automatic JSON handling, built-in session management, and better error handling. Use urllib only when you cannot install external packages and must rely on the standard library. If you need HTTP/2 or async support, consider httpx as an alternative.

### How do I add authentication to Python requests?

For Basic auth: requests.get(url, auth=('username', 'password')). For Bearer tokens (JWT, OAuth), add an Authorization header: headers = {'Authorization': f'Bearer {token}'}. API keys can be passed as query parameters via params or as a custom header such as 'X-API-Key'. You can also use a session to persist credentials across requests.

### What is a session in Python requests, and when should I use one?

A Session preserves configuration (headers, cookies, authentication) across multiple requests to the same server. Use one when you make several requests to the same API, need cookies to stay logged in, or want to reuse TCP connections for better performance. Create one with session = requests.Session() and call session.get() instead of requests.get().
## Conclusion

The Python requests library is an indispensable tool for HTTP communication in Python. Its elegant API turns complex HTTP operations into simple, readable code. From basic GET requests through authentication, sessions, file uploads, and error handling, requests provides everything you need to build robust HTTP interactions.

By mastering the patterns and best practices in this guide (setting timeouts, implementing retries, handling errors gracefully, and respecting rate limits), you can build applications that communicate with web services and APIs efficiently and reliably. Whether you are consuming REST APIs, scraping the web, or building automation tools, requests keeps HTTP operations straightforward and maintainable.