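PyGWalker Dataset Inputs

PyGWalker works with tabular data. Most public APIs accept a pandas DataFrame, a polars DataFrame, a pyarrow Table, a database connector, or a connector-style SQL/data-source string. Some adapters also accept a reusable pygwalker.Walker.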

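Supported Input Matrix

Dataset input	Typical APIs	Notes
pandas DataFrame	All major APIs	The most common local input.
polars DataFrame	All major APIs	Parsed through the DataFrame parser layer.
pyarrow Table	All major APIs	Supported by the public API signatures and covered by the parser tests.
database `Connector`	`walk`, `render`, `table`, Streamlit, Gradio, webserver, cloud helper	Connector datasets are queried on the kernel side.
SQL/data-source string	Top-level, notebook, anywidget, marimo, webserver, component, HTML chart helper	For connector-style paths that the adapter supports.
`pygwalker.Walker`	`walk`, anywidget, marimo, webserver, Streamlit, `to_html`	Reuses an already constructed PyGWalker object.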

Pandas

Use pandas when the data is already in memory.

import pandas as pd
import pygwalker as pyg
 
df = pd.read_csv("data.csv")
walker = pyg.walk(df, spec_path="./gw_config.json")

Polars

A Polars DataFrame can be passed in directly.

import polars as pl
import pygwalker as pyg
 
df = pl.read_csv("data.csv")
walker = pyg.walk(df, computation="browser")

PyArrow Table

PyArrow Tables are supported by the public DataFrame types and covered by the parser tests.

import pyarrow as pa
import pygwalker as pyg
 
table = pa.table({
    "city": ["London", "Paris", "Tokyo"],
    "sales": [120, 95, 140],
})
 
walker = pyg.walk(table, computation="browser")

Database Connector

Use a Connector when the data should stay behind a SQL query instead of being loaded into a local DataFrame first.

from pygwalker.data_parsers.database_parser import Connector
import pygwalker as pyg
 
conn = Connector(
    "postgresql+psycopg2://username:password@host:5432/database",
    "SELECT * FROM table_name",
)
 
walker = pyg.walk(conn, spec_path="./gw_config.json", computation="kernel")

Connector datasets are treated as kernel-computation inputs by default, because the query needs a live backend.
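The connection string in the example above appears to follow the SQLAlchemy URL format, where special characters in the username or password need to be percent-encoded. The following is a minimal sketch of building such a URL safely. `build_postgres_url` is a hypothetical helper, not part of PyGWalker, and the credentials and host are made-up placeholders.

```python
from urllib.parse import quote_plus


def build_postgres_url(user: str, password: str, host: str, port: int, database: str) -> str:
    """Assemble a PostgreSQL URL in the style used above (hypothetical helper)."""
    # Percent-encode the credentials so characters such as "@", ":" or "/"
    # are not mistaken for URL delimiters.
    return (
        f"postgresql+psycopg2://{quote_plus(user)}:{quote_plus(password)}"
        f"@{host}:{port}/{database}"
    )


# Placeholder values for illustration only.
url = build_postgres_url("analyst", "p@ss:word/1", "db.example.com", 5432, "sales")
print(url)
# postgresql+psycopg2://analyst:p%40ss%3Aword%2F1@db.example.com:5432/sales
```

The resulting string could then be passed as the first argument to `Connector(...)`, with the SQL query as the second argument, as in the example above.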

Reusable `Walker`

Create a Walker when you want the same dataset and configuration to flow to multiple adapters.

import pygwalker as pyg
 
# df is a dataset loaded earlier, e.g. the pandas DataFrame from the Pandas section.
walker = pyg.Walker(
    df,
    spec_path="./gw_config.json",
    computation="browser",
)
 
walker.show()
html = pyg.to_html(walker, width="100%", height="720px")

Adapters reject construction options that conflict with an existing Walker. Set spec_path, field_specs, appearance, and computation on the Walker constructor.
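The rule can be illustrated with a small pre-flight check. This is only a sketch of the idea, not PyGWalker's actual validation logic: `find_conflicting_options` is a hypothetical function, and the `Walker` class below is a stand-in so that the example runs without PyGWalker installed.

```python
from typing import Any, List

# Options that, according to the text, belong on the Walker constructor.
WALKER_ONLY_OPTIONS = ("spec_path", "field_specs", "appearance", "computation")


class Walker:
    """Stand-in for pygwalker.Walker, for illustration only."""


def find_conflicting_options(dataset: Any, **options: Any) -> List[str]:
    """Return the option names that should have been set on the Walker instead."""
    if not isinstance(dataset, Walker):
        # Plain datasets may receive these options directly.
        return []
    return [name for name in WALKER_ONLY_OPTIONS if options.get(name) is not None]


conflicts = find_conflicting_options(Walker(), spec_path="./gw_config.json", width="100%")
print(conflicts)
```

Here `width` is not reported because it is a display option of the adapter call, while `spec_path` is flagged because it belongs on the Walker constructor.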

FieldSpec

Use FieldSpec to override automatically inferred field metadata.

from pygwalker import FieldSpec
import pygwalker as pyg
 
field_specs = [
    FieldSpec(
        fname="order_date",
        semantic_type="temporal",
        analytic_type="dimension",
        display_as="Order Date",
    ),
    FieldSpec(
        fname="revenue",
        semantic_type="quantitative",
        analytic_type="measure",
        display_as="Revenue",
    ),
]
 
# df is a dataset loaded earlier that contains the order_date and revenue columns.
pyg.walk(df, field_specs=field_specs)

Definition:

FieldSpec(
    fname: str,
    semantic_type: "?" | "nominal" | "ordinal" | "temporal" | "quantitative" = "?",
    analytic_type: "?" | "dimension" | "measure" = "?",
    display_as: str = None,
)

Use "?" to let PyGWalker infer the value automatically.
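To make the allowed values concrete, here is a minimal sketch of a stand-in dataclass that mirrors the definition above and rejects unknown values. `FieldSpecSketch` is a hypothetical name used only for illustration; it is not PyGWalker's implementation, and the real FieldSpec may validate its arguments differently or not at all.

```python
from dataclasses import dataclass
from typing import Optional

# Allowed values, copied from the definition above.
SEMANTIC_TYPES = {"?", "nominal", "ordinal", "temporal", "quantitative"}
ANALYTIC_TYPES = {"?", "dimension", "measure"}


@dataclass(frozen=True)
class FieldSpecSketch:
    """Illustrative stand-in for FieldSpec (not the library class)."""

    fname: str
    semantic_type: str = "?"  # "?" means: infer automatically
    analytic_type: str = "?"  # "?" means: infer automatically
    display_as: Optional[str] = None

    def __post_init__(self) -> None:
        # Fail early on typos such as "quantitive".
        if self.semantic_type not in SEMANTIC_TYPES:
            raise ValueError(f"unknown semantic_type: {self.semantic_type!r}")
        if self.analytic_type not in ANALYTIC_TYPES:
            raise ValueError(f"unknown analytic_type: {self.analytic_type!r}")


# Only the semantic type is pinned; the analytic type is left to inference.
spec = FieldSpecSketch(fname="order_date", semantic_type="temporal")
print(spec)
```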

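Common Pitfalls

Pitfall	Fix
Passing a local spec file through `spec` in new code	Use `spec_path="./gw_config.json"` so that the local file is explicit.
Passing `spec_path` to an adapter that already receives a `Walker`	Set `spec_path` on `pyg.Walker(...)` instead.
Exporting static HTML with `computation="kernel"` or `"cloud"`	Use `computation="browser"` for static exports.
New examples that still use the legacy `kernel_computation=True`	Use `computation="kernel"`.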
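Two of these pitfalls can be caught mechanically before calling PyGWalker. The sketch below assumes that keyword arguments are collected in a plain dict. Both `migrate_legacy_kwargs` and `check_static_export` are hypothetical helpers that are not part of the library, and dropping `kernel_computation=False` without a replacement is an assumption of this sketch.

```python
from typing import Any, Dict

# Computation modes that the table above marks as unsuitable for static HTML.
LIVE_BACKEND_MODES = ("kernel", "cloud")


def migrate_legacy_kwargs(kwargs: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy in which kernel_computation=True becomes computation="kernel"."""
    migrated = dict(kwargs)
    legacy_flag = migrated.pop("kernel_computation", None)
    # An explicit computation value takes precedence over the legacy flag.
    if legacy_flag and "computation" not in migrated:
        migrated["computation"] = "kernel"
    return migrated


def check_static_export(computation: str) -> None:
    """Raise if the computation mode needs a live backend."""
    if computation in LIVE_BACKEND_MODES:
        raise ValueError(
            f'computation="{computation}" needs a live backend; '
            'use computation="browser" for static HTML export'
        )


new_kwargs = migrate_legacy_kwargs({"spec_path": "./gw_config.json", "kernel_computation": True})
print(new_kwargs)
check_static_export("browser")  # passes silently
```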