Seaborn Pairplot: Visualize Relationships Across All Variables
You have a dataset with ten numeric columns. You want to understand how every variable relates to every other variable -- which pairs are correlated, which clusters exist, and where the outliers hide. Manually writing scatter plots for every combination means 45 separate charts for just ten columns. That is tedious, error-prone, and slow. This is the exact problem that seaborn pairplot solves.
sns.pairplot() generates a grid of scatter plots for every pair of numeric variables in your DataFrame, with distribution plots along the diagonal. One function call gives you a complete visual overview of your data's structure. It is one of the most powerful tools for exploratory data analysis (EDA) in Python, and this guide covers everything you need to use it effectively -- from basic syntax to advanced customization with PairGrid.
Every code example below is copy-paste ready.
What sns.pairplot() Does
sns.pairplot() creates a matrix of Axes. Each cell in the matrix shows the relationship between two variables:
- Off-diagonal cells show scatter plots (by default) of every variable pair.
- Diagonal cells show the distribution of each variable individually (histogram or KDE).
The result is a single figure that lets you scan for linear relationships, non-linear patterns, clusters, and outliers across your entire dataset at once.
Basic Syntax
import seaborn as sns
import matplotlib.pyplot as plt
# Load a built-in dataset
df = sns.load_dataset("iris")
# One line to visualize all pairwise relationships
sns.pairplot(df)
plt.show()
This produces a 4x4 grid (the iris dataset has four numeric columns: sepal_length, sepal_width, petal_length, petal_width). The diagonal shows histograms, and every off-diagonal cell shows a scatter plot. You can immediately see that petal_length and petal_width are strongly correlated, while sepal_width has a weaker relationship with the other features.
Adding Color with the hue Parameter
Raw scatter plots become much more informative when you color points by a categorical variable. The hue parameter does this automatically.
import seaborn as sns
import matplotlib.pyplot as plt
df = sns.load_dataset("iris")
sns.pairplot(df, hue="species")
plt.show()
Now each species (setosa, versicolor, virginica) gets its own color. This reveals something critical that the uncolored version hides: setosa forms a distinct cluster that separates cleanly from the other two species in almost every variable pair. Versicolor and virginica overlap more, especially on sepal measurements.
The hue parameter also changes the diagonal from histograms to overlapping KDE (kernel density estimation) curves, so you can compare the distribution of each variable across groups.
Customizing hue Colors
import seaborn as sns
import matplotlib.pyplot as plt
df = sns.load_dataset("iris")
sns.pairplot(
    df,
    hue="species",
    palette={"setosa": "#e74c3c", "versicolor": "#3498db", "virginica": "#2ecc71"}
)
plt.show()
You can pass any seaborn palette name ("Set2", "husl", "dark") or a dictionary mapping each category to a specific color. For an in-depth look at choosing color schemes, see the matplotlib colormap guide.
Controlling Plot Types with kind and diag_kind
The kind parameter controls what appears in the off-diagonal cells. The diag_kind parameter controls the diagonal.
kind Options
| kind Value | Off-Diagonal Plot Type | Best For |
|---|---|---|
| "scatter" | Scatter plot (default) | General-purpose; seeing individual points |
| "kde" | 2D kernel density contour | Large datasets where scatter plots become overplotted |
| "hist" | 2D histogram (heatmap bins) | Large datasets; showing density without overlapping points |
| "reg" | Scatter plot with regression line | Identifying linear trends between variables |
diag_kind Options
| diag_kind Value | Diagonal Plot Type | Best For |
|---|---|---|
| "auto" | Histogram (no hue) or KDE (with hue) | Default behavior |
| "hist" | Histogram | Always show histograms |
| "kde" | KDE curve | Smooth density estimation |
| None | No distribution plot; the off-diagonal plot type is drawn in the diagonal cells too | Rarely useful, since each diagonal cell then plots a variable against itself |
Scatter with Regression Lines
import seaborn as sns
import matplotlib.pyplot as plt
df = sns.load_dataset("iris")
sns.pairplot(df, kind="reg", diag_kind="kde")
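plt.show()
Each off-diagonal cell now includes a linear regression line with a confidence band, which lets you judge the direction and strength of linear relationships at a glance. Points that hug a clearly sloped line with a narrow band suggest a strong linear relationship. A wide band, or points scattered far from the line, suggests a weak one. The slope itself depends on each variable's units, so it is not a direct measure of correlation.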
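Regression slopes depend on each variable's units. A Pearson correlation matrix from pandas gives a unit-free number to go with the reg panels and makes a reasonable cross-check. Here is a minimal sketch on synthetic columns with hypothetical names. The column y is built to be strongly linear in x, and z is pure noise:

```python
import numpy as np
import pandas as pd

# Hypothetical data: y is a noisy linear function of x, z is unrelated noise
rng = np.random.default_rng(3)
x = rng.normal(size=500)
df = pd.DataFrame({
    "x": x,
    "y": 2.0 * x + rng.normal(scale=0.5, size=500),
    "z": rng.normal(size=500),
})

# Pearson correlation for every pair of columns (a numeric summary of the reg panels)
corr = df.corr(method="pearson")
print(corr.round(2))
```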
KDE Contour Plots
import seaborn as sns
import matplotlib.pyplot as plt
df = sns.load_dataset("iris")
sns.pairplot(df, kind="kde", diag_kind="kde", hue="species")
plt.show()
KDE contour plots replace individual scatter points with smooth density contours. This works particularly well when you have thousands of data points and scatter plots turn into solid blobs. The contours show where data is concentrated versus sparse.
2D Histograms
import seaborn as sns
import matplotlib.pyplot as plt
df = sns.load_dataset("iris")
sns.pairplot(df, kind="hist", diag_kind="hist")
plt.show()
The 2D histogram bins data into a grid and uses color intensity to represent counts. Each cell draws one grid of bins rather than one marker per point (scatter) or a fitted density (KDE), so it typically scales better to very large datasets.
Selecting Variables with vars, x_vars, and y_vars
By default, sns.pairplot() includes every numeric column. With many columns this gets unwieldy -- a 15-column dataset would produce a 15x15 grid with 225 cells. Three parameters let you control which variables appear.
vars: Select a Subset of Columns
import seaborn as sns
import matplotlib.pyplot as plt
df = sns.load_dataset("iris")
sns.pairplot(df, vars=["sepal_length", "petal_length", "petal_width"], hue="species")
plt.show()
This produces a 3x3 grid instead of 4x4, focusing only on the columns you care about.
x_vars and y_vars: Asymmetric Grids
When you want to examine how a specific set of features relates to a different set, use x_vars and y_vars together.
import seaborn as sns
import matplotlib.pyplot as plt
df = sns.load_dataset("iris")
sns.pairplot(
    df,
    x_vars=["sepal_length", "sepal_width"],
    y_vars=["petal_length", "petal_width"],
    hue="species",
    height=4
)
plt.show()
This creates a 2x2 grid where the rows are petal measurements and the columns are sepal measurements. There is no diagonal in this case because the row and column variables are different. This is particularly useful when you have a clear division between feature groups (e.g., input features vs. target variables).
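To confirm the orientation of an asymmetric grid, inspect the shape of the returned Axes array. Rows follow y_vars and columns follow x_vars. The sketch below uses hypothetical feature and target column names:

```python
import numpy as np
import pandas as pd
import seaborn as sns

rng = np.random.default_rng(5)
# Hypothetical feature/target split
df = pd.DataFrame(rng.normal(size=(200, 5)),
                  columns=["f1", "f2", "f3", "target_a", "target_b"])

g = sns.pairplot(df, x_vars=["f1", "f2", "f3"], y_vars=["target_a", "target_b"])

# Rows follow y_vars and columns follow x_vars
print(g.axes.shape)  # (2, 3)
```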
Corner Plots with corner=True
A full pairplot grid is symmetric -- the scatter plot of X vs. Y is the mirror of Y vs. X. Showing both halves is redundant. The corner parameter removes the upper triangle.
import seaborn as sns
import matplotlib.pyplot as plt
df = sns.load_dataset("iris")
sns.pairplot(df, corner=True, hue="species")
plt.show()
Corner plots display only the lower triangle and the diagonal. This is cleaner, takes up less space, and is the standard format in many scientific publications (especially in astronomy and physics, where they are called "corner plots" or "triangle plots").
Customizing Appearance with plot_kws and diag_kws
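The plot_kws and diag_kws parameters pass keyword arguments to the seaborn functions that draw the off-diagonal and diagonal cells (for example sns.scatterplot, sns.histplot, or sns.kdeplot). Those functions forward most styling options on to matplotlib.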
Adjusting Scatter Point Size, Transparency, and Shape
import seaborn as sns
import matplotlib.pyplot as plt
df = sns.load_dataset("iris")
sns.pairplot(
    df,
    hue="species",
    plot_kws={"s": 40, "alpha": 0.6, "edgecolor": "white", "linewidth": 0.5},
    diag_kws={"alpha": 0.5, "linewidth": 2}
)
plt.show()
| Keyword | What It Controls | Example Values |
|---|---|---|
| s | Marker size in scatter plots | 20, 50, 100 |
| alpha | Transparency (0 = invisible, 1 = opaque) | 0.3, 0.6, 1.0 |
| edgecolor | Color of marker border | "white", "black", "none" |
| linewidth | Width of marker border or KDE line | 0.5, 1, 2 |
| marker | Marker shape | "o", "s", "D", "^" |
Using Different Markers per Group
import seaborn as sns
import matplotlib.pyplot as plt
df = sns.load_dataset("iris")
sns.pairplot(
    df,
    hue="species",
    markers=["o", "s", "D"],
    plot_kws={"alpha": 0.7}
)
plt.show()
The markers parameter takes a list of marker shapes, one per hue level, so the list length should match the number of categories. Varying the shape matters for accessibility: readers who cannot distinguish colors can still identify groups by marker shape.
Controlling Figure Size with height and aspect
sns.pairplot() does not use figsize directly. Instead, it uses height and aspect to control the size of each individual subplot.
import seaborn as sns
import matplotlib.pyplot as plt
df = sns.load_dataset("iris")
sns.pairplot(df, hue="species", height=3, aspect=1.2)
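plt.show()
- height: Height (in inches) of each facet. Default is 2.5.
- aspect: Ratio of width to height for each facet. Default is 1.
The total figure size is roughly (height * aspect * n_vars) inches wide by (height * n_vars) inches tall. For a 4-variable dataset with height=3 and aspect=1, the figure is approximately 12x12 inches. With the aspect=1.2 used above, it is about 14.4 inches wide by 12 inches tall. A hue legend placed outside the grid adds some extra width.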
PairGrid: Full Control Over Each Cell
sns.pairplot() is a convenience wrapper around sns.PairGrid. When you need different plot types in the upper triangle, lower triangle, and diagonal, use PairGrid directly.
import seaborn as sns
import matplotlib.pyplot as plt
df = sns.load_dataset("iris")
g = sns.PairGrid(df, hue="species", diag_sharey=False)
g.map_upper(sns.scatterplot, alpha=0.5)
g.map_lower(sns.kdeplot)
g.map_diag(sns.histplot, kde=True)
g.add_legend()
plt.show()
This creates a grid where:
- Upper triangle: Scatter plots
- Lower triangle: 2D KDE contour plots
- Diagonal: Histograms with KDE overlay
PairGrid Methods
| Method | Where It Draws | Use For |
|---|---|---|
| g.map(func) | All cells | Same plot everywhere |
| g.map_diag(func) | Diagonal only | Distribution plots |
| g.map_upper(func) | Upper triangle | One plot type above diagonal |
| g.map_lower(func) | Lower triangle | Different plot type below diagonal |
| g.map_offdiag(func) | All off-diagonal cells | Same plot for upper and lower |
| g.add_legend() | Figure level | Adds the hue legend |
Custom Functions in PairGrid
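You can pass any function that accepts x and y data plus arbitrary keyword arguments (PairGrid supplies extras such as color and, with hue, label) and draws on the currently active matplotlib Axes: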
import seaborn as sns
import matplotlib.pyplot as plt
from scipy import stats
df = sns.load_dataset("iris")
def corrfunc(x, y, **kwargs):
    # PairGrid makes the target cell the current Axes before calling this function
    r, _ = stats.pearsonr(x, y)  # the p-value is not used here
    ax = plt.gca()
    ax.annotate(f"r = {r:.2f}", xy=(0.05, 0.95), xycoords="axes fraction",
                fontsize=12, ha="left", va="top",
                bbox=dict(boxstyle="round,pad=0.3", facecolor="wheat", alpha=0.5))
g = sns.PairGrid(df.drop(columns="species"))
g.map_upper(corrfunc)
g.map_upper(sns.scatterplot, alpha=0.3, s=15)
g.map_lower(sns.kdeplot, cmap="Blues_d")
g.map_diag(sns.histplot, kde=True, color="steelblue")
plt.show()
This adds the Pearson correlation coefficient as a text annotation in every upper-triangle cell -- a common pattern in academic publications.
Handling Large Datasets
Pairplots get slow and cluttered with large datasets. Here are practical strategies.
Sampling
import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd
# The diamonds dataset has about 54,000 rows
df = sns.load_dataset("diamonds")
# Sample 1,000 rows for the pairplot
sample = df.sample(n=1000, random_state=42)
sns.pairplot(
    sample,
    vars=["carat", "depth", "table", "price"],
    hue="cut",
    plot_kws={"alpha": 0.4, "s": 15}
)
plt.show()
Sampling 1,000-5,000 rows is usually enough to reveal the major patterns. Combined with alpha transparency, this keeps the plot readable and typically keeps rendering time to a few seconds.
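One caveat of plain df.sample() is that a rare hue level can nearly vanish from the sample. A stratified alternative is pandas' groupby(...).sample(frac=...), which draws the same fraction from every group. Here is a sketch on hypothetical imbalanced data:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(11)
# Hypothetical imbalanced data: group "rare" is only about 1% of rows
df = pd.DataFrame({
    "value": rng.normal(size=100_000),
    "group": rng.choice(["common", "medium", "rare"], size=100_000,
                        p=[0.79, 0.20, 0.01]),
})

# Sample the same fraction from every group so each hue level keeps its share
sample = df.groupby("group").sample(frac=0.02, random_state=42)

print(sample["group"].value_counts())
```

You can then pass the resulting sample to sns.pairplot(sample, hue="group").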
Switch to KDE for Dense Data
import seaborn as sns
import matplotlib.pyplot as plt
df = sns.load_dataset("diamonds").sample(n=2000, random_state=42)
sns.pairplot(
    df,
    vars=["carat", "depth", "table", "price"],
    kind="kde",
    diag_kind="kde",
    hue="cut",
    corner=True
)
plt.show()
KDE plots handle density much better than scatter plots when thousands of points overlap. The contour lines clearly show where data concentrates. The trade-off is compute time: KDE is slower to draw than a scatter plot, which is why this example still samples 2,000 rows first.
Limit Variables
With 15+ numeric columns, even a sampled pairplot becomes unreadable. Focus on the variables that matter most:
import seaborn as sns
import matplotlib.pyplot as plt
df = sns.load_dataset("diamonds").sample(n=2000, random_state=42)
# Only show the most important variables
sns.pairplot(df, vars=["carat", "price", "depth"], hue="cut", corner=True)
plt.show()
Use domain knowledge or a quick correlation heatmap to identify which variables deserve a spot in the pairplot.
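One way to turn "identify the variables that matter" into code is to rank numeric columns by their absolute correlation with a column of interest. Here is a minimal sketch in which top_correlated is a hypothetical helper. Absolute Pearson correlation only captures linear relationships, so treat the ranking as a starting point:

```python
import numpy as np
import pandas as pd

def top_correlated(df, target, k=3):
    """Return the k numeric columns most correlated (in absolute value) with target."""
    corr = df.select_dtypes("number").corr()[target].drop(target)
    return corr.abs().sort_values(ascending=False).head(k).index.tolist()

rng = np.random.default_rng(12)
base = rng.normal(size=1_000)
df = pd.DataFrame({
    "target": base,
    "strong": base + rng.normal(scale=0.3, size=1_000),
    "moderate": base + rng.normal(scale=1.5, size=1_000),
    "noise1": rng.normal(size=1_000),
    "noise2": rng.normal(size=1_000),
    "label": rng.choice(["a", "b"], size=1_000),  # non-numeric, ignored by the helper
})

cols = top_correlated(df, "target", k=2)
print(cols)
# These columns, plus the target, could then be passed as vars=... to sns.pairplot
```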
Real-World EDA Workflow with Pairplot
Here is a practical workflow for using sns.pairplot() during exploratory data analysis.
import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd
# Step 1: Load and inspect
df = sns.load_dataset("penguins").dropna()
print(df.shape)
print(df.dtypes)
print(df.describe())
# Step 2: Quick overview with pairplot
sns.pairplot(df, hue="species", corner=True, plot_kws={"alpha": 0.6})
plt.suptitle("Penguin Measurements by Species", y=1.02, fontsize=16)
plt.show()
# Step 3: Focus on interesting pairs identified from the overview
sns.pairplot(
    df,
    x_vars=["bill_length_mm", "flipper_length_mm"],
    y_vars=["bill_depth_mm", "body_mass_g"],
    hue="species",
    kind="reg",
    height=4
)
plt.show()
Step 1 gives you the raw numbers. Step 2 gives you a visual map of the entire dataset. Step 3 zooms into the pairs that stood out. This three-step pattern is a common opening move for any new tabular dataset.
Pairplot vs PairGrid vs Manual Subplots
| Feature | sns.pairplot() | sns.PairGrid() | Manual plt.subplots() |
|---|---|---|---|
| Effort | One function call | 3-5 lines | 10-50+ lines |
| Flexibility | Medium (parameters only) | High (different plots per triangle) | Full control |
| Speed to prototype | Fastest | Fast | Slow |
| Different plot types per region | No | Yes (map_upper, map_lower) | Yes (manual) |
| Custom annotations | Limited | Yes (custom functions) | Yes |
| Automatic hue handling | Yes | Yes | Manual |
| Corner plot support | corner=True | corner=True | Manual masking |
| When to use | Quick EDA, presentations | Publication figures, mixed plot types | Highly custom layouts |
Decision rule: Start with sns.pairplot(). If you need different plot types in the upper and lower triangles, upgrade to PairGrid. Resort to manual subplots only when you need completely custom layouts that PairGrid cannot handle.
sns.pairplot() Parameter Reference
| Parameter | Description | Default |
|---|---|---|
| data | Input DataFrame | Required |
| hue | Column name for color grouping | None |
| hue_order | Order for hue levels | None |
| palette | Colors for hue levels | None |
| vars | List of variable names to include | None (all numeric columns) |
| x_vars | Variables for the columns of the grid | None |
| y_vars | Variables for the rows of the grid | None |
| kind | Plot type for off-diagonal: "scatter", "kde", "hist", "reg" | "scatter" |
| diag_kind | Plot type for diagonal: "auto", "hist", "kde", None | "auto" |
| markers | Marker style(s) for scatter points, one per hue level | None |
| height | Height (inches) of each facet | 2.5 |
| aspect | Width-to-height ratio of each facet | 1 |
| corner | Show only the lower triangle | False |
| dropna | Drop missing values before plotting | False |
| plot_kws | Dict of kwargs passed to the off-diagonal plot function | None |
| diag_kws | Dict of kwargs passed to the diagonal plot function | None |
| grid_kws | Dict of kwargs passed to PairGrid constructor | None |
Saving Pairplots to File
Since sns.pairplot() returns a PairGrid object, you can save the figure using its savefig method:
import seaborn as sns
df = sns.load_dataset("iris")
g = sns.pairplot(df, hue="species")
g.savefig("iris_pairplot.png", dpi=300, bbox_inches="tight")
Supported formats include PNG, SVG, PDF, and EPS. For publications, use dpi=300 or higher for raster formats such as PNG. Alternatively, save as SVG or PDF to get vector graphics that scale without losing sharpness. If labels get clipped during export, see matplotlib savefig troubleshooting.
Interactive Alternative: PyGWalker
Static pairplots are excellent for reports and papers, but during the exploration phase you often want to filter data, change variable selections, and try different chart types without rewriting code every time.
PyGWalker (Python binding of Graphic Walker) turns any pandas DataFrame into a Tableau-like interactive visualization UI directly inside Jupyter Notebook. Instead of writing a new sns.pairplot() call each time you want to explore a different variable combination, you can drag and drop fields to build scatter plots, histograms, heatmaps, and more.
pip install pygwalker
import pandas as pd
import pygwalker as pyg
df = pd.read_csv("your_data.csv")
walker = pyg.walk(df)
Once the interactive interface loads, you can:
- Drag numeric fields to X and Y axes to create scatter plots instantly.
- Add a categorical field to the Color channel to replicate the hue behavior.
- Switch between scatter, line, bar, histogram, and other chart types with a click.
- Filter rows interactively without writing pandas code.
- Compare multiple variable pairs side by side.
This workflow pairs well with seaborn pairplot: use PyGWalker for the fast, interactive exploration phase, then create your final static pairplot with sns.pairplot() for sharing and publication.
Frequently Asked Questions
How do I change the size of a seaborn pairplot?
Use the height and aspect parameters. height controls the size of each individual subplot in inches (default 2.5), and aspect controls the width-to-height ratio (default 1). For example, sns.pairplot(df, height=3.5, aspect=1.2) makes each subplot 3.5 inches tall and 4.2 inches wide. The total figure size is approximately (height * aspect * n_vars) inches wide by (height * n_vars) inches tall.
What is the difference between sns.pairplot() and sns.PairGrid()?
sns.pairplot() is a high-level convenience function that creates a complete pair grid with one call. sns.PairGrid() is the underlying class that gives you more control -- you can assign different plot types to the upper triangle, lower triangle, and diagonal using map_upper(), map_lower(), and map_diag(). Use pairplot for quick exploration and PairGrid when you need mixed plot types.
How do I show only specific columns in a pairplot?
Pass a list of column names to the vars parameter: sns.pairplot(df, vars=["col1", "col2", "col3"]). For an asymmetric grid where rows and columns show different variables, use x_vars and y_vars together: sns.pairplot(df, x_vars=["a", "b"], y_vars=["c", "d"]).
How do I create a corner plot (lower triangle only)?
Set corner=True: sns.pairplot(df, corner=True). This removes the redundant upper triangle and shows only the lower triangle plus the diagonal. Corner plots are common in scientific publications where the symmetric information is unnecessary.
How do I handle a large dataset with pairplot?
Three strategies: (1) Sample your data with df.sample(n=2000) before passing it to pairplot. (2) Use kind="kde" instead of scatter plots to show density contours that handle overlap better. (3) Limit variables with the vars parameter to only the most important columns. Combining all three keeps the plot readable and fast to render.
Can I add a regression line to a pairplot?
Yes. Set kind="reg" to add a linear regression line with confidence interval to every off-diagonal scatter plot: sns.pairplot(df, kind="reg"). For a polynomial fit, forward regplot arguments through plot_kws, for example sns.pairplot(df, kind="reg", plot_kws={"order": 2}). For different fits in different regions of the grid, use PairGrid with sns.regplot as the mapping function.
Conclusion
The seaborn pairplot is one of the first tools you should reach for when working with a new tabular dataset. One function call gives you a complete map of pairwise relationships, distributions, and cluster structure across all your numeric variables. Start with a basic sns.pairplot(df, hue="category") to get the big picture, then narrow your focus with vars, corner=True, or kind="reg" as your analysis deepens.
When you need more control -- different plot types in the upper and lower triangles, custom annotations, or correlation coefficients overlaid on each cell -- step up to sns.PairGrid and its map_upper, map_lower, and map_diag methods. For the interactive exploration phase before you commit to a final static visualization, PyGWalker gives you a drag-and-drop interface for building charts from DataFrames without writing plotting code.
All the examples in this guide are ready to copy and run. Swap in your own DataFrame, adjust the variables and colors, and you will have a publication-quality pairplot in under a minute.
Related Guides
- Seaborn Heatmap -- use a correlation heatmap as a complement to pairplot for identifying the strongest variable relationships.
- Seaborn Histogram -- deep dive into the distribution plots shown on the pairplot diagonal.
- Matplotlib Colormap -- choose the right color palette for your pairplot KDE contours.
- Matplotlib Subplots -- build custom multi-panel figures when PairGrid does not offer enough layout control.
- Pandas GroupBy -- aggregate data before visualization when working with large datasets.