Python subprocess: Running External Commands from Python (Complete Guide)
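Python scripts often need to call external programs. You might run a shell command to compress files, invoke git for version control, use a system tool such as ffmpeg to process video, or execute a compiled binary as part of a data pipeline. Reaching for os.system() or shell-style backtick tricks, however, tends to produce code that is brittle, insecure, and very hard to debug when something goes wrong.
The problems compound quickly. Output is lost because nothing captures it. Errors pass silently because nobody checks the return code. A single user-supplied filename containing a space or a semicolon can turn a harmless-looking script into a shell injection vulnerability. And when a child process hangs, your whole Python program hangs with it: no timeout, no recovery, no explanation.
Python's subprocess module is the standard answer. It replaces os.system(), os.popen(), and the commands module (removed in Python 3) with one consistent API for spawning processes, capturing output, handling errors, setting timeouts, and building pipelines. This guide covers what you need to use it effectively and safely.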
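Quick start with subprocess.run()
subprocess.run(), introduced in Python 3.5, is the recommended way to run external commands. It executes the command, waits for it to finish, and returns a CompletedProcess object.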
import subprocess
# Run a simple command
result = subprocess.run(["ls", "-la"], capture_output=True, text=True)
print(result.stdout) # standard output as a string
print(result.stderr) # standard error as a string
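print(result.returncode) # 0 means success
Key parameters:
- capture_output=True captures stdout and stderr (equivalent to stdout=subprocess.PIPE, stderr=subprocess.PIPE)
- text=True decodes output to str instead of bytes
- The command is passed as a list of strings, with each argument as a separate list element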
import subprocess
# Run a command with arguments
result = subprocess.run(
["python", "--version"],
capture_output=True,
text=True
)
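print(result.stdout.strip()) # e.g., "Python 3.12.1"
Understanding command arguments: list vs. string
The first argument to subprocess.run() can be a list or a string. The difference matters for both correctness and security.
List form (recommended)
Each element of the list is a separate argument. Python passes them directly to the operating system without any shell interpretation.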
import subprocess
# Each argument is a separate list element
result = subprocess.run(
["grep", "-r", "TODO", "/home/user/project"],
capture_output=True,
text=True
)
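print(result.stdout)
Because each argument is passed through unchanged, this works correctly even when a filename contains spaces, quotes, or special characters: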
import subprocess
# Filename with spaces -- works correctly as a list element
result = subprocess.run(
["cat", "my file with spaces.txt"],
capture_output=True,
text=True
)
String form (requires shell=True)
Passing a single string usually requires shell=True, which invokes the system shell (/bin/sh on Unix, cmd.exe on Windows) to interpret the command.
import subprocess
# String form requires shell=True
result = subprocess.run(
"ls -la | grep '.py'",
shell=True,
capture_output=True,
text=True
)
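print(result.stdout)
This enables shell features such as pipes (|), redirection (>), globbing (*.py), and environment variable expansion ($HOME). It also introduces serious security risks, covered in detail in the security section later in this guide.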
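As a hedged illustration of doing one of these jobs without a shell, the sketch below expands an environment variable in Python with os.path.expandvars() and then passes the result as an ordinary list argument. The variable name DEMO_DIR and its value are made up for the example, and the child process is simply the current interpreter echoing the argument it received.

```python
import os
import subprocess
import sys

# DEMO_DIR is a hypothetical variable set only for this example.
os.environ["DEMO_DIR"] = "/tmp/demo"

# Expand $DEMO_DIR in Python rather than asking a shell to do it.
expanded = os.path.expandvars("$DEMO_DIR/report.txt")

# The child prints the argument it received; no shell is involved.
result = subprocess.run(
    [sys.executable, "-c", "import sys; print(sys.argv[1])", expanded],
    capture_output=True,
    text=True,
    check=True,
)
print(result.stdout.strip())
```

The same idea applies to the other features: do the expansion in Python, then pass plain strings in the argument list.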
Capturing output
Capturing stdout and stderr separately
import subprocess
result = subprocess.run(
["python", "-c", "import sys; print('out'); print('err', file=sys.stderr)"],
capture_output=True,
text=True
)
print(f"stdout: {result.stdout}") # "out\n"
print(f"stderr: {result.stderr}") # "err\n"
Merging stderr into stdout
Sometimes you want all output in a single stream. Use stderr=subprocess.STDOUT:
import subprocess
result = subprocess.run(
["python", "-c", "import sys; print('out'); print('err', file=sys.stderr)"],
stdout=subprocess.PIPE,
stderr=subprocess.STDOUT,
text=True
)
print(result.stdout) # Contains both "out\n" and "err\n"
Discarding output
Redirect output to subprocess.DEVNULL to suppress it:
import subprocess
# Run silently -- discard all output
result = subprocess.run(
["apt-get", "update"],
stdout=subprocess.DEVNULL,
stderr=subprocess.DEVNULL
)
Binary output
Omit text=True to get raw bytes. This is useful for binary data such as images or compressed files:
import subprocess
# Capture binary output (e.g., from curl)
result = subprocess.run(
["curl", "-s", "https://example.com/image.png"],
capture_output=True
)
image_bytes = result.stdout # bytes object
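print(f"Downloaded {len(image_bytes)} bytes")
Error handling
Checking the return code manually
By default, subprocess.run() does not raise an exception when a command fails. You have to check returncode yourself: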
import subprocess
result = subprocess.run(
["ls", "/nonexistent/path"],
capture_output=True,
text=True
)
if result.returncode != 0:
    print(f"Command failed with code {result.returncode}")
    print(f"Error: {result.stderr}")
Raising automatically on failure with check=True
check=True raises subprocess.CalledProcessError when the return code is non-zero:
import subprocess
try:
    result = subprocess.run(
        ["ls", "/nonexistent/path"],
        capture_output=True,
        text=True,
        check=True
    )
except subprocess.CalledProcessError as e:
    print(f"Command failed with return code {e.returncode}")
    print(f"stderr: {e.stderr}")
    print(f"stdout: {e.stdout}")
For commands that should always succeed, this is the recommended pattern. It forces you to handle failure explicitly instead of silently ignoring it.
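If you want to look at the output before deciding whether a failure is fatal, CompletedProcess also offers a check_returncode() method that raises the same CalledProcessError on demand. A minimal sketch, using the current interpreter as a stand-in for a failing command (the exit code 3 is arbitrary):

```python
import subprocess
import sys

# Stand-in for a failing command: a child interpreter that exits with code 3.
result = subprocess.run(
    [sys.executable, "-c", "import sys; print('partial output'); sys.exit(3)"],
    capture_output=True,
    text=True,
)

# The output is still available because no exception has been raised yet.
captured = result.stdout.strip()

failed_code = None
try:
    result.check_returncode()  # Raises CalledProcessError if returncode != 0
except subprocess.CalledProcessError as e:
    failed_code = e.returncode

print(captured, failed_code)
```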
Handling a missing command
If the executable does not exist, Python raises FileNotFoundError:
import subprocess
try:
    result = subprocess.run(
        ["nonexistent_command"],
        capture_output=True,
        text=True
    )
except FileNotFoundError:
    print("Command not found -- is it installed and in PATH?")
Timeouts
A slow or hung process can block your script forever. The timeout parameter (in seconds) kills the process once the limit is exceeded and raises subprocess.TimeoutExpired:
import subprocess
try:
    result = subprocess.run(
        ["sleep", "30"],
        timeout=5,
        capture_output=True,
        text=True
    )
except subprocess.TimeoutExpired:
    print("Process timed out after 5 seconds")
This is essential for network commands, external API calls, or any process that might hang:
import subprocess
def run_with_timeout(cmd, timeout_seconds=30):
    """Run a command with timeout and error handling."""
    try:
        result = subprocess.run(
            cmd,
            capture_output=True,
            text=True,
            timeout=timeout_seconds,
            check=True
        )
        return result.stdout
    except subprocess.TimeoutExpired:
        print(f"Command timed out after {timeout_seconds}s: {' '.join(cmd)}")
        return None
    except subprocess.CalledProcessError as e:
        print(f"Command failed (code {e.returncode}): {e.stderr}")
        return None
    except FileNotFoundError:
        print(f"Command not found: {cmd[0]}")
        return None
# Usage
output = run_with_timeout(["ping", "-c", "4", "example.com"], timeout_seconds=10)
if output:
    print(output)
Passing input to a process
Use the input parameter to send data to the process's stdin:
import subprocess
# Send text to stdin
result = subprocess.run(
["grep", "error"],
input="line 1\nerror on line 2\nline 3\nerror on line 4\n",
capture_output=True,
text=True
)
print(result.stdout)
# "error on line 2\nerror on line 4\n"
This can replace the common pattern of shuttling data through shell pipes:
import subprocess
import json
# Send JSON to a processing command
data = {"name": "Alice", "score": 95}
json_string = json.dumps(data)
result = subprocess.run(
["python", "-c", "import sys, json; d = json.load(sys.stdin); print(d['name'])"],
input=json_string,
capture_output=True,
text=True
)
print(result.stdout.strip()) # "Alice"
Environment variables
By default, a child process inherits the current environment. You can modify it:
import subprocess
import os
# Add or override environment variables
custom_env = os.environ.copy()
custom_env["API_KEY"] = "secret123"
custom_env["DEBUG"] = "true"
result = subprocess.run(
["python", "-c", "import os; print(os.environ.get('API_KEY'))"],
env=custom_env,
capture_output=True,
text=True
)
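print(result.stdout.strip()) # "secret123"
Always start from os.environ.copy(). Passing a dict that does not include the existing environment replaces the inherited environment entirely, which breaks commands that rely on variables such as PATH or HOME.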
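The difference is easy to see in a small experiment. The sketch below is illustrative only: PARENT_MARKER and CHILD_ONLY are made-up variable names, and the child is the current interpreter reporting which of the two it can see. It assumes the interpreter can start with an almost empty environment, which is typically true on Linux and macOS; SYSTEMROOT is carried over when present because Windows needs it.

```python
import os
import subprocess
import sys

# PARENT_MARKER is a hypothetical variable used only to show inheritance.
os.environ["PARENT_MARKER"] = "from-parent"

# The child reports whether it can see each variable.
child_code = (
    "import os; "
    "print(os.environ.get('PARENT_MARKER'), os.environ.get('CHILD_ONLY'))"
)

# Replacement environment: only what is listed here reaches the child.
bare_env = {"CHILD_ONLY": "1"}
if "SYSTEMROOT" in os.environ:  # Windows needs this to start processes reliably
    bare_env["SYSTEMROOT"] = os.environ["SYSTEMROOT"]

# Extended environment: a copy of the parent's variables plus one addition.
full_env = os.environ.copy()
full_env["CHILD_ONLY"] = "1"

bare = subprocess.run([sys.executable, "-c", child_code],
                      env=bare_env, capture_output=True, text=True, check=True)
full = subprocess.run([sys.executable, "-c", child_code],
                      env=full_env, capture_output=True, text=True, check=True)

print(bare.stdout.strip())  # PARENT_MARKER is missing in the child
print(full.stdout.strip())  # Both variables are visible
```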
Working directory
The cwd parameter sets the working directory for the child process:
import subprocess
# Run git status in a specific repository
result = subprocess.run(
["git", "status", "--short"],
cwd="/home/user/my-project",
capture_output=True,
text=True
)
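print(result.stdout)
subprocess.run() vs. Popen: when to use which
subprocess.run() is a convenience wrapper around subprocess.Popen. For most use cases run() is enough. Reach for Popen only when you need one of the following:
- Real-time streaming output (reading line by line and processing output as it is produced)
- Interaction with a running process (sending input and reading output in a loop)
- Multi-step pipelines that chain several processes together
- Non-blocking execution with manual control over the process lifecycle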
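The last item, non-blocking execution, can be sketched with Popen.poll(), which returns None while the child is still running. This is a minimal illustration rather than a production pattern: the child is the current interpreter sleeping briefly, and the 10-second limit and 0.05-second polling interval are arbitrary.

```python
import subprocess
import sys
import time

# Stand-in for a long-running command: a child interpreter that sleeps briefly.
proc = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(0.5)"])

polls = 0
deadline = time.monotonic() + 10  # Give up after 10 seconds (arbitrary limit)
while proc.poll() is None:        # poll() returns None while the child is running
    polls += 1                    # Placeholder for useful work in the parent
    if time.monotonic() > deadline:
        proc.kill()               # Stop the child if it overruns the limit
        break
    time.sleep(0.05)

proc.wait()  # Reap the child so returncode is set
print(f"Exit code: {proc.returncode} after {polls} polls")
```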
Comparison table
| Feature | subprocess.run() | subprocess.Popen | os.system() |
|---|---|---|---|
| Recommended | Yes (Python 3.5+) | Yes (advanced use) | No (legacy interface) |
| Output capture | Yes (capture_output=True) | Yes (via PIPE) | No |
| Return value | CompletedProcess object | Popen process object | Exit status (int) |
| Timeout support | Yes (timeout param) | Manual (via wait/communicate) | No |
| Error checking | check=True raises exception | Manual | Must inspect exit status |
| stdin input | input parameter | communicate() or stdin.write() | No |
| Real-time output | No (waits for completion) | Yes (stream line by line) | Output goes to terminal |
| Pipes | Limited (single command) | Yes (chain multiple Popen) | Yes (via shell string) |
| Security | Safe with list args | Safe with list args | Shell injection risk |
| Shell features | Only with shell=True | Only with shell=True | Always uses shell |
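Advanced: subprocess.Popen
Popen gives you full control over the process lifecycle. The constructor starts the process immediately and returns a Popen object you can interact with.
Basic Popen usage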
import subprocess
proc = subprocess.Popen(
["ls", "-la"],
stdout=subprocess.PIPE,
stderr=subprocess.PIPE,
text=True
)
stdout, stderr = proc.communicate() # Wait for completion and get output
print(f"Return code: {proc.returncode}")
print(stdout)
Streaming output in real time
Unlike run(), Popen lets you read output line by line and process it as soon as it is produced:
import subprocess
proc = subprocess.Popen(
["ping", "-c", "5", "example.com"],
stdout=subprocess.PIPE,
text=True
)
# Read output line by line as it arrives
for line in proc.stdout:
    print(f"[LIVE] {line.strip()}")
proc.wait() # Wait for process to finish
print(f"Exit code: {proc.returncode}")
This is essential for long-running commands that need to show progress or write logs in real time:
import subprocess
import sys
def run_with_live_output(cmd):
    """Run a command and stream its output in real time."""
    proc = subprocess.Popen(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
        bufsize=1 # Line-buffered
    )
    output_lines = []
    for line in proc.stdout:
        line = line.rstrip()
        print(line)
        output_lines.append(line)
    proc.wait()
    return proc.returncode, "\n".join(output_lines)
# Usage
code, output = run_with_live_output(["pip", "install", "requests"])
print(f"\nFinished with exit code: {code}")
Building pipelines
Connect multiple commands by wiring one process's stdout to the next process's stdin:
import subprocess
# Equivalent to: cat /var/log/syslog | grep "error" | wc -l
p1 = subprocess.Popen(
["cat", "/var/log/syslog"],
stdout=subprocess.PIPE
)
p2 = subprocess.Popen(
["grep", "error"],
stdin=p1.stdout,
stdout=subprocess.PIPE
)
# Allow p1 to receive SIGPIPE if p2 exits early
p1.stdout.close()
p3 = subprocess.Popen(
["wc", "-l"],
stdin=p2.stdout,
stdout=subprocess.PIPE,
text=True
)
p2.stdout.close()
output, _ = p3.communicate()
print(f"Error count: {output.strip()}")
Calling p1.stdout.close() after connecting it to p2 matters: if p2 exits early, it lets p1 receive SIGPIPE, which avoids a deadlock.
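For short pipelines whose intermediate output fits comfortably in memory, a simpler alternative is to chain subprocess.run() calls and hand each stage's stdout to the next stage through input. This is a sketch of that idea, not a replacement for the Popen version: the stages run one after another instead of concurrently, and the two stages here are stand-ins built from the current interpreter rather than cat and grep.

```python
import subprocess
import sys

# Stage 1: produce some log-like lines.
producer = "print('ok'); print('error: disk full'); print('ok'); print('error: timeout')"
# Stage 2: keep only lines containing "error", similar to grep.
filter_code = (
    "import sys\n"
    "for line in sys.stdin:\n"
    "    if 'error' in line:\n"
    "        sys.stdout.write(line)\n"
)

stage1 = subprocess.run([sys.executable, "-c", producer],
                        capture_output=True, text=True, check=True)
stage2 = subprocess.run([sys.executable, "-c", filter_code],
                        input=stage1.stdout, capture_output=True, text=True, check=True)

error_lines = stage2.stdout.splitlines()
print(f"Error count: {len(error_lines)}")
```

Because each stage finishes before the next one starts, this approach suits small outputs; for large or streaming data the Popen chain is the better fit.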
Interactive process communication
import subprocess
# Start a Python REPL as a subprocess
proc = subprocess.Popen(
["python", "-i"],
stdin=subprocess.PIPE,
stdout=subprocess.PIPE,
stderr=subprocess.PIPE,
text=True
)
# Send commands and get results
stdout, stderr = proc.communicate(input="print(2 + 2)\nprint('hello')\n")
print(f"stdout: {stdout}")
print(f"stderr: {stderr}")
shell=True: powerful but dangerous
Setting shell=True runs the command through the system shell. That enables shell features but also introduces security risks.
When shell=True is useful
import subprocess
# Shell features: pipes, redirects, globbing, env vars
result = subprocess.run(
"ls *.py | wc -l",
shell=True,
capture_output=True,
text=True
)
print(f"Python files: {result.stdout.strip()}")
# Environment variable expansion
result = subprocess.run(
"echo $HOME",
shell=True,
capture_output=True,
text=True
)
print(result.stdout.strip())
The shell injection problem
Never pass unsanitized user input to a command run with shell=True:
import subprocess
# DANGEROUS -- shell injection vulnerability
user_input = "file.txt; rm -rf /" # malicious input
subprocess.run(f"cat {user_input}", shell=True) # Executes "rm -rf /"!
# SAFE -- use list form without shell=True
subprocess.run(["cat", user_input]) # Treats entire string as filename
If you truly must interpolate dynamic values into a shell=True command, use shlex.quote():
import subprocess
import shlex
user_input = "file with spaces.txt; rm -rf /"
safe_input = shlex.quote(user_input)
# shlex.quote wraps in single quotes, neutralizing shell metacharacters
result = subprocess.run(
f"cat {safe_input}",
shell=True,
capture_output=True,
text=True
)
The safest approach, though, is to avoid shell=True entirely and reproduce the shell features in Python:
import subprocess
import glob
# Instead of: subprocess.run("ls *.py | wc -l", shell=True)
py_files = glob.glob("*.py")
print(f"Python files: {len(py_files)}")
# Instead of: subprocess.run("cat file1.txt file2.txt > combined.txt", shell=True)
with open("combined.txt", "w") as outfile:
    result = subprocess.run(
        ["cat", "file1.txt", "file2.txt"],
        stdout=outfile
    )
Security best practices
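| Practice | Do | Don't |
|---|---|---|
| Command format | ["cmd", "arg1", "arg2"] | f"cmd {user_input}" with shell=True |
| User input | Use shlex.quote() when a shell is unavoidable | Concatenate strings directly into a command |
| Shell mode | shell=False (the default) | shell=True with untrusted input |
| Executable path | Use a full path such as /usr/bin/git | Rely on PATH in security-sensitive code |
| Input validation | Validate and sanitize before passing | Hand raw user input straight to a command |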
import subprocess
from pathlib import Path
def safe_file_operation(filename):
    """Safely run a command with user-supplied filename."""
    # Validate input
    path = Path(filename)
    if not path.exists():
        raise FileNotFoundError(f"File not found: {filename}")
    # Check for path traversal. is_relative_to() needs Python 3.9+; unlike a
    # string prefix test it rejects siblings such as /home/user/uploads_evil.
    resolved = path.resolve()
    allowed_dir = Path("/home/user/uploads").resolve()
    if not resolved.is_relative_to(allowed_dir):
        raise PermissionError("Access denied: file outside allowed directory")
    # Use list form -- no shell injection possible
    result = subprocess.run(
        ["wc", "-l", str(resolved)],
        capture_output=True,
        text=True,
        check=True
    )
    return result.stdout.strip()
Real-world examples
Running git commands
import subprocess
def git_status(repo_path):
    """Get git status for a repository."""
    result = subprocess.run(
        ["git", "status", "--porcelain"],
        cwd=repo_path,
        capture_output=True,
        text=True,
        check=True
    )
    return result.stdout.strip()
def git_log(repo_path, n=5):
    """Get the last n commits, one line each."""
    result = subprocess.run(
        ["git", "log", "--oneline", f"-{n}"],
        cwd=repo_path,
        capture_output=True,
        text=True,
        check=True
    )
    return result.stdout.strip()
status = git_status("/home/user/my-project")
if status:
    print("Uncommitted changes:")
    print(status)
else:
    print("Working directory clean")
Compressing and extracting files
import subprocess
def compress_directory(source_dir, output_file):
    """Create a tar.gz archive of a directory."""
    subprocess.run(
        ["tar", "-czf", output_file, "-C", source_dir, "."],
        check=True
    )
    print(f"Created archive: {output_file}")
def extract_archive(archive_file, dest_dir):
    """Extract a tar.gz archive."""
    subprocess.run(
        ["tar", "-xzf", archive_file, "-C", dest_dir],
        check=True
    )
    print(f"Extracted to: {dest_dir}")
compress_directory("/home/user/data", "/tmp/data_backup.tar.gz")
Inspecting system information
import subprocess
def get_disk_usage():
    """Get disk usage summary."""
    result = subprocess.run(
        ["df", "-h", "/"],
        capture_output=True,
        text=True,
        check=True
    )
    return result.stdout
def get_memory_info():
    """Get memory usage on Linux."""
    result = subprocess.run(
        ["free", "-h"],
        capture_output=True,
        text=True,
        check=True
    )
    return result.stdout
def get_process_list(filter_name=None):
    """List running processes, optionally filtered."""
    cmd = ["ps", "aux"]
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    if filter_name:
        lines = result.stdout.strip().split("\n")
        header = lines[0]
        matching = [line for line in lines[1:] if filter_name in line]
        return header + "\n" + "\n".join(matching)
    return result.stdout
print(get_disk_usage())
Processing data files with external tools
import subprocess
def sort_csv_by_column(input_file, column_index=1):
    """Sort a CSV file using the system sort command (fast for large files)."""
    result = subprocess.run(
        ["sort", "-t,", f"-k{column_index}", input_file],
        capture_output=True,
        text=True,
        check=True
    )
    return result.stdout
def count_lines(filepath):
    """Count lines in a file using wc (faster than Python for huge files)."""
    result = subprocess.run(
        ["wc", "-l", filepath],
        capture_output=True,
        text=True,
        check=True
    )
    return int(result.stdout.strip().split()[0])
def search_in_files(directory, pattern, file_type="*.py"):
    """Search for a pattern in files using grep."""
    result = subprocess.run(
        ["grep", "-rn", "--include", file_type, pattern, directory],
        capture_output=True,
        text=True
    )
    # grep returns exit code 1 if no matches found (not an error)
    if result.returncode == 0:
        return result.stdout
    elif result.returncode == 1:
        return "" # No matches
    else:
        raise subprocess.CalledProcessError(result.returncode, result.args)
matches = search_in_files("/home/user/project", "TODO")
if matches:
    print(matches)
else:
    print("No TODOs found")
Automated deployment script
import subprocess
import sys
def deploy(repo_path, branch="main"):
    """Simple deployment script using subprocess."""
    steps = [
        (["git", "fetch", "origin"], "Fetching latest changes"),
        (["git", "checkout", branch], f"Switching to {branch}"),
        (["git", "pull", "origin", branch], "Pulling latest code"),
        (["pip", "install", "-r", "requirements.txt"], "Installing dependencies"),
        (["python", "manage.py", "migrate"], "Running migrations"),
        (["python", "manage.py", "collectstatic", "--noinput"], "Collecting static files"),
    ]
    for cmd, description in steps:
        print(f"\n--- {description} ---")
        try:
            result = subprocess.run(
                cmd,
                cwd=repo_path,
                capture_output=True,
                text=True,
                check=True,
                timeout=120
            )
            if result.stdout:
                print(result.stdout)
        except subprocess.CalledProcessError as e:
            print(f"FAILED: {e.stderr}")
            sys.exit(1)
        except subprocess.TimeoutExpired:
            print(f"TIMEOUT: {description} took too long")
            sys.exit(1)
    print("\nDeployment complete")
Cross-platform considerations
Commands behave differently on Windows and Unix. To write portable code:
import subprocess
import platform
def run_command(cmd_unix, cmd_windows=None):
    """Run a command with platform awareness."""
    if platform.system() == "Windows":
        cmd = cmd_windows or cmd_unix
        # Windows often needs shell=True for built-in commands
        return subprocess.run(cmd, shell=True, capture_output=True, text=True)
    else:
        return subprocess.run(cmd, capture_output=True, text=True)
# List directory contents
result = run_command(
    cmd_unix=["ls", "-la"],
    cmd_windows="dir"
)
print(result.stdout)
Key platform differences:
| Feature | Unix/macOS | Windows |
|---|---|---|
| Shell | /bin/sh | cmd.exe |
| Path separator | / | \\ |
| Built-in commands (dir, copy) | Not available | Require shell=True |
| Executable extension | Not needed | Sometimes requires .exe |
| Signal handling | Full POSIX signals | Limited |
| shlex.quote() | Available | Use subprocess.list2cmdline() instead |
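One portable way to cope with the PATH and file-extension differences is to resolve the executable with shutil.which() before running it. The sketch below uses only the standard library; the helper name resolve_executable is made up for illustration, and the current interpreter stands in for whichever tool you actually need.

```python
import shutil
import subprocess
import sys

def resolve_executable(name):
    """Return the full path of an executable found on PATH, or raise."""
    path = shutil.which(name)  # Honors PATHEXT on Windows, so "git" can match git.exe
    if path is None:
        raise FileNotFoundError(f"{name} is not installed or not on PATH")
    return path

# shutil.which() also accepts a full path and returns it if it is executable.
python_path = resolve_executable(sys.executable)
result = subprocess.run([python_path, "-c", "print('ok')"],
                        capture_output=True, text=True, check=True)
print(result.stdout.strip())
```

Resolving the path up front also gives you a clear error message before any process is started.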
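Running subprocess in Jupyter notebooks
Running shell commands from a Jupyter notebook is a common workflow for data scientists. Jupyter supports the !command syntax for quick shell calls, but subprocess gives you proper error handling and output capture inside your Python code.
When you debug subprocess calls in a notebook, especially when a command fails silently or produces unexpected output, RunCell can help. RunCell is an AI agent for Jupyter that understands your notebook context. It can diagnose why a subprocess command failed, suggest correct arguments, and deal with platform-specific pitfalls. Instead of switching between a terminal and the notebook to debug shell commands, you can trace the problem directly in a cell.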
import subprocess
# In a Jupyter notebook: capture and display command output
result = subprocess.run(
["pip", "list", "--format=columns"],
capture_output=True,
text=True
)
# Display as formatted output in the notebook
print(result.stdout)
Common mistakes and how to fix them
Mistake 1: forgetting to capture output
import subprocess
# Output goes to terminal, not captured
result = subprocess.run(["ls", "-la"])
print(result.stdout) # None!
# Fix: add capture_output=True
result = subprocess.run(["ls", "-la"], capture_output=True, text=True)
print(result.stdout) # Actual output
Mistake 2: passing a string without shell=True
import subprocess
# Fails: string passed without shell=True
# subprocess.run("ls -la") # FileNotFoundError: "ls -la" is not a program
# Fix option 1: use a list
subprocess.run(["ls", "-la"])
# Fix option 2: use shell=True (less safe)
subprocess.run("ls -la", shell=True)
Mistake 3: ignoring errors
import subprocess
# Bad: silently continues on failure
result = subprocess.run(["rm", "/important/file"], capture_output=True, text=True)
# ... continues even if rm failed
# Good: check=True raises exception on failure
try:
    result = subprocess.run(
        ["rm", "/important/file"],
        capture_output=True,
        text=True,
        check=True
    )
except subprocess.CalledProcessError as e:
    print(f"Failed to delete: {e.stderr}")
Mistake 4: Popen deadlock
import subprocess
# DEADLOCK: stdout buffer fills up, process blocks, .wait() waits forever
proc = subprocess.Popen(["command_with_lots_of_output"], stdout=subprocess.PIPE)
proc.wait() # Deadlock!
# Fix: use communicate() which handles buffering
proc = subprocess.Popen(["command_with_lots_of_output"], stdout=subprocess.PIPE, text=True)
stdout, stderr = proc.communicate() # Safe
Mistake 5: not handling encoding
import subprocess
# Bytes output can cause issues
result = subprocess.run(["cat", "data.txt"], capture_output=True)
# result.stdout is bytes, not str
# Fix: use text=True or encoding parameter
result = subprocess.run(["cat", "data.txt"], capture_output=True, text=True)
# For specific encodings:
result = subprocess.run(
["cat", "data.txt"],
capture_output=True,
encoding="utf-8",
errors="replace" # Handle invalid bytes
)
Complete subprocess.run() parameter reference
import subprocess
result = subprocess.run(
args, # Command as list or string
stdin=None, # Input source (PIPE, DEVNULL, file object, or None)
stdout=None, # Output destination
stderr=None, # Error destination
capture_output=False, # Shorthand for stdout=PIPE, stderr=PIPE
text=False, # Decode output as strings (alias: universal_newlines)
shell=False, # Run through system shell
cwd=None, # Working directory
timeout=None, # Seconds before TimeoutExpired
check=False, # Raise CalledProcessError on non-zero exit
env=None, # Environment variables dict
encoding=None, # Output encoding (alternative to text=True)
errors=None, # Encoding error handling ('strict', 'replace', 'ignore')
input=None, # String/bytes to send to stdin
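)
FAQ
What is the subprocess module in Python?
subprocess is the standard-library module for running external commands and programs from a Python script. It replaces older approaches such as os.system(), os.popen(), and the commands module. It lets you spawn new processes, connect to their stdin/stdout/stderr pipes, obtain return codes, and enforce timeouts. The main interfaces are subprocess.run() for straightforward command execution and subprocess.Popen for advanced cases that need real-time I/O or process pipelines.
What is the difference between subprocess.run() and subprocess.Popen?
subprocess.run() is a high-level convenience function: it runs a command, waits for it to finish, and returns a CompletedProcess object containing the output, which suits the vast majority of tasks. subprocess.Popen is the lower-level class that gives you direct control over the process: you can stream output line by line, send input interactively, build multi-process pipelines, and manage the process lifecycle yourself. Use Popen when you need real-time output streaming or need to connect several processes.
Is shell=True dangerous in subprocess?
Yes. Combining untrusted input with shell=True creates a shell injection vulnerability. With shell=True the command string is interpreted by the system shell, so metacharacters such as ;, |, &&, and $() take effect and an attacker can use them to inject arbitrary commands. The safe default is shell=False with arguments passed as a list. If you must use shell=True, escape every dynamic value with shlex.quote() and never pass raw user input.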
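To make the effect of shlex.quote() concrete, the short sketch below quotes a hostile string and then parses the resulting command line with shlex.split(), which follows POSIX shell rules. No command is executed here. Note that shlex is designed for Unix shells, so this does not describe how cmd.exe would treat the string.

```python
import shlex

user_input = "file.txt; rm -rf /"
quoted = shlex.quote(user_input)
print(quoted)  # 'file.txt; rm -rf /'

# A POSIX-style parser now sees one argument instead of two commands.
parsed = shlex.split(f"cat {quoted}")
print(parsed)  # ['cat', 'file.txt; rm -rf /']

# Strings made only of safe characters are returned unchanged.
plain = shlex.quote("report.txt")
print(plain)  # report.txt
```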
How do I capture the output of a subprocess command?
Pass capture_output=True and text=True to subprocess.run(). Standard output is stored in result.stdout (as a string) and errors in result.stderr. For example: result = subprocess.run(["ls", "-la"], capture_output=True, text=True), then read result.stdout. Without text=True the output is returned as bytes.
How do I handle subprocess timeouts in Python?
Pass the timeout parameter (in seconds) to subprocess.run(). If the process runs longer than the limit, Python kills it and raises subprocess.TimeoutExpired. For example: subprocess.run(["slow_command"], timeout=30). With Popen, use proc.communicate(timeout=30) or proc.wait(timeout=30). Always wrap timeout-sensitive code in try/except.
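Unlike subprocess.run(), Popen.communicate(timeout=...) leaves the child running when the timeout expires, so you need to kill it yourself and then call communicate() again to collect the remaining output. A minimal sketch of that pattern, with the current interpreter sleeping for 30 seconds as a stand-in for a hung command and a 1-second timeout chosen arbitrarily:

```python
import subprocess
import sys

# Stand-in for a hung command: a child interpreter that sleeps for 30 seconds.
proc = subprocess.Popen(
    [sys.executable, "-c", "import time; time.sleep(30)"],
    stdout=subprocess.PIPE,
    stderr=subprocess.PIPE,
    text=True,
)

timed_out = False
try:
    stdout, stderr = proc.communicate(timeout=1)
except subprocess.TimeoutExpired:
    timed_out = True
    proc.kill()                          # communicate() does not kill the child for you
    stdout, stderr = proc.communicate()  # Collect remaining output and reap the process

print(f"Timed out: {timed_out}, return code: {proc.returncode}")
```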
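If subprocess is recommended, why does os.system() still work?
os.system() has not been formally deprecated, but it is regarded as a legacy interface. It runs the command through a shell (much like shell=True), cannot capture output, has no timeout mechanism, and returns only the exit status. subprocess.run() does everything os.system() does and adds output capture, error handling, timeouts, and safer argument passing. New code should use subprocess.
Summary
The subprocess module is Python's standard tool for running external commands. For straightforward command execution, use subprocess.run(): a single call handles output capture, error checking, timeouts, and input. Reach for subprocess.Popen only when you need real-time output streaming, interactive process communication, or multi-step pipelines.
The most important habit is to avoid combining shell=True with user input. Passing arguments as a list takes the shell out of the picture, and with it the risk of shell injection. Use check=True to catch failures early, timeout to keep a hung process from blocking you, and text=True to work with strings instead of bytes.
Whether you are automating git or orchestrating a data pipeline, subprocess offers a level of control and safety that os.system() cannot match. Once you are comfortable with these patterns, you can integrate external tools into your Python workflows with confidence.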