Seaborn Boxplot: How to Create and Customize Box Plots in Python
Understanding how your data is distributed is one of the most fundamental tasks in data analysis. You need to spot outliers, compare groups, and identify skewness before running any model or drawing conclusions. Yet staring at raw numbers in a DataFrame rarely gives you the full picture. A seaborn boxplot solves this problem by condensing an entire distribution into a compact, readable graphic that shows the median, spread, and outliers at a glance.
In this guide, you will learn how to create, customize, and interpret box plots using Python's seaborn library. Most code examples use seaborn's example datasets, which sns.load_dataset() downloads on first use, so you can run them immediately in a Jupyter notebook with an internet connection.
What Is a Box Plot?
A box plot (also called a box-and-whisker plot) is a standardized way to display a distribution based on five summary statistics:
| Component | What It Represents |
|---|---|
| Median line | The middle value of the dataset (50th percentile) |
| Box (IQR) | The interquartile range, spanning from Q1 (25th percentile) to Q3 (75th percentile) |
| Lower whisker | The smallest data point that is still at or above Q1 - 1.5 * IQR |
| Upper whisker | The largest data point that is still at or below Q3 + 1.5 * IQR |
| Outlier points | Individual data points that fall outside the whisker range |
The box captures the middle 50% of your data. A tall box means high variability; a short box means your values cluster tightly. When the median line sits off-center inside the box, the distribution is skewed. Dots beyond the whiskers flag potential outliers that deserve investigation.
Box plots are particularly effective when you need to compare distributions across several categories side by side, making them a staple of exploratory data analysis.
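The components in the table can be reproduced by hand, which helps when checking what a plot is actually showing. The sketch below uses NumPy on a small made-up sample (the values are illustrative and do not come from any dataset in this guide) and applies the 1.5 * IQR rule described above.

```python
import numpy as np

# Small illustrative sample; 40 is an obvious high value
values = np.array([2, 4, 5, 6, 7, 8, 9, 10, 12, 40], dtype=float)

q1, median, q3 = np.percentile(values, [25, 50, 75])
iqr = q3 - q1

# Fences: points beyond these limits are drawn as outliers
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

inside = values[(values >= lower_fence) & (values <= upper_fence)]
lower_whisker = inside.min()  # smallest point inside the fences
upper_whisker = inside.max()  # largest point inside the fences
outliers = values[(values < lower_fence) | (values > upper_fence)]

print(f"Q1={q1}, median={median}, Q3={q3}, IQR={iqr}")
print(f"whiskers: {lower_whisker} to {upper_whisker}")
print(f"outliers: {outliers}")
```

Note that the whiskers stop at actual data points, not at the fences themselves, so the upper whisker here ends at 12 even though the fence sits higher.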
Basic Seaborn Boxplot Syntax
Creating a boxplot in seaborn requires just one function: sns.boxplot(). At minimum, you pass the data and specify which variable to plot.
import seaborn as sns
import matplotlib.pyplot as plt
# Load a built-in dataset
tips = sns.load_dataset("tips")
# Basic vertical boxplot
sns.boxplot(data=tips, y="total_bill")
plt.title("Distribution of Total Bill")
plt.show()
This produces a single box showing the distribution of total_bill values. The function automatically calculates the quartiles, whiskers, and outliers for you.
To split the data by a categorical variable, add the x parameter:
sns.boxplot(data=tips, x="day", y="total_bill")
plt.title("Total Bill by Day of the Week")
plt.show()
Now you see four boxes side by side, one for each day, making it easy to compare spending patterns across the week.
Creating Boxplots from Different Data Formats
Seaborn handles multiple data formats gracefully. Here are the most common scenarios.
Long-Form DataFrame (Tidy Data)
Most seaborn functions work best with long-form (tidy) data where each row is a single observation and columns represent variables. The tips dataset is already in this format:
# Long-form: each row is one restaurant visit
sns.boxplot(data=tips, x="day", y="total_bill")
plt.show()
Wide-Form DataFrame
If your data has one column per group (wide format), seaborn can still produce boxplots directly:
import pandas as pd
import numpy as np
# Create wide-form data
np.random.seed(42)
wide_df = pd.DataFrame({
"Group A": np.random.normal(50, 10, 100),
"Group B": np.random.normal(60, 15, 100),
"Group C": np.random.normal(45, 8, 100),
})
sns.boxplot(data=wide_df)
plt.title("Comparing Three Groups (Wide-Form Data)")
plt.ylabel("Value")
plt.show()
Seaborn automatically treats each column as a separate category and plots them side by side.
Selecting Specific DataFrame Columns
When you only want to visualize certain numeric columns from a larger DataFrame, filter them first:
iris = sns.load_dataset("iris")
# Select only measurement columns
measurement_cols = iris[["sepal_length", "sepal_width", "petal_length", "petal_width"]]
sns.boxplot(data=measurement_cols)
plt.title("Iris Feature Distributions")
plt.xticks(rotation=15)
plt.show()
Customizing Your Seaborn Boxplot
Seaborn provides extensive options to tailor the appearance and behavior of your box plots.
Colors and Palettes
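Change the color scheme using the palette parameter or set a single color for all boxes. Seaborn 0.13 and later emit a deprecation warning when palette is passed without hue; on those versions, also pass hue="day" and legend=False to get the same result without the warning: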
# Use a named palette
sns.boxplot(data=tips, x="day", y="total_bill", palette="Set2")
plt.title("Custom Palette")
plt.show()
# Single color for all boxes
sns.boxplot(data=tips, x="day", y="total_bill", color="skyblue")
plt.title("Uniform Color")
plt.show()
Popular palette options include "Set2", "pastel", "muted", "deep", "husl", and "coolwarm".
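Before committing to a palette, it can help to look at the colors it will produce. The following is a minimal sketch using sns.color_palette(); the choice of four colors is an assumption that matches the four days in the tips dataset.

```python
import seaborn as sns

# Ask for as many colors as there are categories (four days in `tips`)
colors = sns.color_palette("Set2", n_colors=4)

# Each entry is an (r, g, b) tuple with components between 0 and 1
for rgb in colors:
    print(rgb)

# Hex codes are convenient for reuse in other tools
print(colors.as_hex())
```

The resulting list can be passed directly as palette=colors, and the palette parameter also accepts a dict that maps each category name to a color when specific groups need fixed colors.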
Horizontal vs Vertical Orientation
Swap the axes to create a horizontal boxplot. This is useful when category labels are long:
sns.boxplot(data=tips, x="total_bill", y="day", orient="h")
plt.title("Horizontal Boxplot")
plt.show()
Grouped Boxplots with the hue Parameter
The hue parameter splits each category into subgroups, adding a second dimension to your comparison:
sns.boxplot(data=tips, x="day", y="total_bill", hue="sex")
plt.title("Total Bill by Day and Gender")
plt.legend(title="Gender")
plt.show()
Each day now shows two boxes (one per gender), making it straightforward to compare male vs. female spending patterns on every day of the week.
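A grouped boxplot can be cross-checked against a numeric summary of the same groups. The sketch below is one way to do that with pandas; it uses a synthetic DataFrame as a stand-in (the column names mirror tips, but the values are randomly generated for illustration).

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for a dataset with two categorical columns
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "day": rng.choice(["Sat", "Sun"], size=200),
    "sex": rng.choice(["Female", "Male"], size=200),
    "total_bill": rng.gamma(shape=4.0, scale=5.0, size=200),
})

# One row per (day, sex) pair: the same groups that hue draws as boxes
summary = (
    df.groupby(["day", "sex"])["total_bill"]
      .quantile([0.25, 0.5, 0.75])
      .unstack()
)
summary.columns = ["Q1", "median", "Q3"]
print(summary.round(2))
```

Each row of the summary corresponds to one box: Q1 and Q3 are the box edges and the median is the line inside it.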
Controlling Figure Size
Seaborn plots inherit the figure size from matplotlib. Set it before calling sns.boxplot():
plt.figure(figsize=(12, 6))
sns.boxplot(data=tips, x="day", y="total_bill", hue="smoker", palette="muted")
plt.title("Total Bill by Day and Smoking Status")
plt.show()
Adding Swarm or Strip Plot Overlays
A boxplot summarizes the distribution, but it hides individual data points. Overlay a swarm plot or strip plot to show every observation:
plt.figure(figsize=(10, 6))
sns.boxplot(data=tips, x="day", y="total_bill", palette="pastel")
sns.stripplot(data=tips, x="day", y="total_bill", color="0.3", size=3, jitter=True, alpha=0.5)
plt.title("Boxplot with Strip Plot Overlay")
plt.show()
plt.figure(figsize=(10, 6))
sns.boxplot(data=tips, x="day", y="total_bill", palette="pastel")
sns.swarmplot(data=tips, x="day", y="total_bill", color="0.25", size=3, alpha=0.6)
plt.title("Boxplot with Swarm Plot Overlay")
plt.show()
The swarm plot arranges points so they do not overlap, giving a better sense of the data density. Use strip plots when you have many data points and swarm plots when you have fewer (swarm plots can get slow with thousands of points).
Seaborn Boxplot Parameter Reference
Here is a quick reference for the most commonly used parameters in sns.boxplot():
| Parameter | Type | Description |
|---|---|---|
| data | DataFrame, array, or list | Input data structure |
| x, y | str or array | Variables for the axes |
| hue | str | Grouping variable for color-coded subgroups |
| order | list of str | Order to plot the categorical levels |
| hue_order | list of str | Order for the hue levels |
| orient | "v" or "h" | Orientation of the plot |
| color | str | Single color for all elements |
| palette | str, list, or dict | Colors for different levels |
| saturation | float | Proportion of original saturation (0 to 1) |
| fill | bool | Whether to fill the box with color (seaborn >= 0.13) |
| width | float | Width of the boxes (default 0.8) |
| dodge | bool | Whether to shift hue groups along the categorical axis |
| fliersize | float | Size of outlier markers |
| linewidth | float | Width of the lines framing the box |
| whis | float or tuple | Whisker length as a multiple of IQR (default 1.5) |
| ax | matplotlib Axes | Axes object to draw the plot onto |
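To show how several of these parameters combine in one call, here is a minimal sketch. It uses a synthetic DataFrame with hypothetical column names (group, value) so it runs without downloading a dataset; the specific parameter values are arbitrary choices for illustration.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic data so the example does not depend on a download
rng = np.random.default_rng(1)
df = pd.DataFrame({
    "group": np.repeat(["low", "mid", "high"], 50),
    "value": np.concatenate([
        rng.normal(10, 2, 50),
        rng.normal(20, 4, 50),
        rng.normal(30, 6, 50),
    ]),
})

fig, ax = plt.subplots(figsize=(8, 5))
sns.boxplot(
    data=df,
    x="group",
    y="value",
    order=["high", "mid", "low"],  # explicit category order
    width=0.5,                     # narrower boxes than the 0.8 default
    whis=2.0,                      # whiskers reach up to 2 * IQR
    linewidth=1.5,                 # thicker outlines
    color="skyblue",
    ax=ax,                         # draw onto an existing Axes
)
ax.set_title("Boxplot with explicit order, width, and whiskers")
plt.show()
```

Passing ax explicitly, rather than relying on the current figure, makes it easier to place several boxplots in a grid created with plt.subplots().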
Comparing Distributions
Multiple Columns Side by Side
When comparing distributions of several numeric features, melt your DataFrame into long form:
iris = sns.load_dataset("iris")
# Melt from wide to long format
iris_long = iris.melt(id_vars="species", var_name="measurement", value_name="cm")
plt.figure(figsize=(12, 6))
sns.boxplot(data=iris_long, x="measurement", y="cm", palette="Set3")
plt.title("Iris Measurements Compared")
plt.show()
Grouped Comparisons with hue
Combine both a category and a grouping variable to compare distributions across two dimensions:
plt.figure(figsize=(14, 6))
sns.boxplot(data=iris_long, x="measurement", y="cm", hue="species", palette="husl")
plt.title("Iris Measurements by Species")
plt.legend(title="Species", bbox_to_anchor=(1.05, 1), loc="upper left")
plt.tight_layout()
plt.show()
This produces a grouped boxplot where each measurement type shows three boxes (one per species), making cross-species comparison immediate.
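The melt step is easier to follow on a tiny table. The sketch below applies the same call to a two-row DataFrame whose values are made up for illustration, so the reshaped result can be read in full.

```python
import pandas as pd

# Tiny wide-form frame: one column per measurement (values are made up)
wide = pd.DataFrame({
    "species": ["setosa", "virginica"],
    "sepal_length": [5.1, 6.3],
    "petal_length": [1.4, 6.0],
})

# id_vars stay as columns; every other column becomes (name, value) rows
long = wide.melt(id_vars="species", var_name="measurement", value_name="cm")
print(long)
```

Two rows with two measurement columns become four rows, one per (species, measurement) pair, which is the shape that x="measurement" and hue="species" expect.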
Seaborn Boxplot vs Matplotlib Boxplot
Both seaborn and matplotlib can produce box plots, but they differ significantly in ease of use and visual quality.
| Feature | Seaborn sns.boxplot() | Matplotlib ax.boxplot() |
|---|---|---|
| Default aesthetics | Polished, publication-ready | Basic, minimal styling |
| DataFrame integration | Native support for pandas DataFrames | Requires extracting arrays manually |
| Hue grouping | Built-in hue parameter | Manual positioning and coloring |
| Palettes | One-line palette assignment | Manual color list management |
| Statistical annotations | Easy overlay with swarm/strip plots | Requires manual scatter overlays |
| Customization depth | Moderate (delegates to matplotlib) | Full low-level control |
| Learning curve | Low | Medium to high |
| Code verbosity | 1-2 lines for a grouped plot | 10-20 lines for equivalent |
Verdict: Use seaborn for rapid exploratory analysis and clean visuals. Fall back to matplotlib when you need pixel-level control over every element.
# Matplotlib boxplot (more verbose)
import matplotlib.pyplot as plt
days = ["Thur", "Fri", "Sat", "Sun"]
fig, ax = plt.subplots(figsize=(8, 5))
# Extract one array of bills per day; matplotlib does not group for you
data_by_day = [tips[tips["day"] == d]["total_bill"].values for d in days]
bp = ax.boxplot(data_by_day, patch_artist=True)
# Label the ticks separately (the labels= keyword is deprecated in newer matplotlib)
ax.set_xticklabels(days)
# Color each box by hand
for patch, color in zip(bp["boxes"], ["#8dd3c7", "#ffffb3", "#bebada", "#fb8072"]):
    patch.set_facecolor(color)
ax.set_title("Matplotlib Boxplot (Manual Styling)")
ax.set_ylabel("Total Bill")
plt.show()
# Seaborn equivalent (concise)
sns.boxplot(data=tips, x="day", y="total_bill", palette="Set3",
            order=["Thur", "Fri", "Sat", "Sun"])
plt.title("Seaborn Boxplot (One Line)")
plt.show()
Box Plot vs Violin Plot: When to Use Which
Seaborn also offers sns.violinplot(), which shows the full density shape of the distribution rather than just the summary statistics. Here is when to choose each:
| Criterion | Box Plot | Violin Plot |
|---|---|---|
| Best for | Quick summary statistics and outlier detection | Understanding the full shape of the distribution |
| Shows outliers | Yes, as individual dots | No (absorbed into the density curve) |
| Shows bimodality | No | Yes (visible as two bumps) |
| Readability | High, even for non-technical audiences | Requires more explanation |
| Space efficiency | Compact | Wider, takes more horizontal space |
| Performance | Fast to render | Slower with large datasets (KDE computation) |
Rule of thumb: Start with a boxplot for quick checks. Switch to a violin plot if you suspect the distribution has multiple peaks or an unusual shape.
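The bimodality row in the table is the case where the two plots differ most. As a rough illustration, the sketch below builds a synthetic sample from two well-separated clusters (the cluster centers and sizes are arbitrary assumptions) and shows that the median falls in a region that contains almost no data, something the box alone does not reveal.

```python
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns

# Synthetic bimodal sample: two well-separated clusters
rng = np.random.default_rng(42)
bimodal = np.concatenate([rng.normal(0, 1, 500), rng.normal(10, 1, 500)])

q1, median, q3 = np.percentile(bimodal, [25, 50, 75])
near_median = np.sum(np.abs(bimodal - median) < 1)
print(f"Q1={q1:.2f}, median={median:.2f}, Q3={q3:.2f}")
print(f"points within 1 unit of the median: {near_median} of {bimodal.size}")

fig, axes = plt.subplots(1, 2, figsize=(10, 4))
sns.boxplot(y=bimodal, ax=axes[0])
axes[0].set_title("Box plot: one wide box")
sns.violinplot(y=bimodal, ax=axes[1])
axes[1].set_title("Violin plot: two peaks")
plt.tight_layout()
plt.show()
```

The same side-by-side layout works with real data: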
fig, axes = plt.subplots(1, 2, figsize=(14, 5))
sns.boxplot(data=tips, x="day", y="total_bill", palette="Set2", ax=axes[0])
axes[0].set_title("Box Plot")
sns.violinplot(data=tips, x="day", y="total_bill", palette="Set2", ax=axes[1])
axes[1].set_title("Violin Plot")
plt.tight_layout()
plt.show()
Interactive Alternative: Explore Distributions with PyGWalker
Static box plots work well for reports and notebooks, but when you are in the early exploration phase, you often want to drag different variables in and out of a chart without rewriting code every time.
PyGWalker is an open-source Python library that turns any pandas DataFrame into an interactive, Tableau-like visual exploration interface directly inside Jupyter Notebook. You can create box plots, violin plots, histograms, scatter plots, and more by simply dragging fields onto the axes. No code changes needed.
pip install pygwalker
import pandas as pd
import pygwalker as pyg
tips = pd.read_csv("https://raw.githubusercontent.com/mwaskom/seaborn-data/master/tips.csv")
walker = pyg.walk(tips)
Once the UI loads, drag day to the X axis and total_bill to the Y axis, then select the box plot mark type. You can instantly switch to a violin plot, add color dimensions, or filter by any column, all without writing a single extra line of code.
This is particularly valuable for teams where not everyone writes Python. Share the notebook and let stakeholders explore the data themselves.
FAQ
How do I remove outliers from a seaborn boxplot?
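These options change only what is drawn; the underlying data is not modified. To hide the outlier markers entirely, pass showfliers=False (seaborn forwards it to matplotlib) or set flierprops={"marker": ""}; both leave the whisker calculation unchanged. Alternatively, set the whis parameter to a larger value (e.g., whis=3.0) to extend the whiskers so that fewer points are drawn as outliers.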
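To see how much a larger whis value changes the outlier count before replotting, the statistics can be computed directly. This sketch uses matplotlib's boxplot_stats helper, which applies the same whisker rule, on a synthetic right-skewed sample (the exponential distribution and its scale are arbitrary choices for illustration).

```python
import numpy as np
from matplotlib.cbook import boxplot_stats

# Synthetic right-skewed sample (illustrative only)
rng = np.random.default_rng(7)
values = rng.exponential(scale=10.0, size=500)

for whis in (1.5, 3.0):
    stats = boxplot_stats(values, whis=whis)[0]
    print(f"whis={whis}: upper whisker={stats['whishi']:.1f}, "
          f"outliers={len(stats['fliers'])}")
```

A larger whis moves the upper whisker outward and shrinks the list of flagged points, but the quartiles and the box itself stay the same.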
Can I plot a seaborn boxplot without using a DataFrame?
Yes. You can pass raw arrays or lists directly: sns.boxplot(x=["A"]*50 + ["B"]*50, y=np.random.randn(100)). However, using a DataFrame with named columns is recommended because it automatically generates axis labels.
How do I change the outlier marker style in sns.boxplot?
Pass a dictionary to the flierprops parameter: sns.boxplot(data=tips, x="day", y="total_bill", flierprops={"marker": "D", "markerfacecolor": "red", "markersize": 5}). This changes outlier markers to red diamonds.
What is the difference between hue and x in seaborn boxplot?
The x parameter defines the primary categorical grouping on the horizontal axis. The hue parameter adds a secondary grouping within each category, displayed as side-by-side colored boxes. Use x alone for simple comparisons and add hue when you want to break down each category by a second variable like gender or status.
How do I save a seaborn boxplot to a file?
After creating the plot, call plt.savefig("boxplot.png", dpi=300, bbox_inches="tight"). Seaborn plots are matplotlib figures under the hood, so all matplotlib saving methods work. Supported formats include PNG, PDF, SVG, and EPS.
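A minimal end-to-end sketch of saving a plot is shown below. It draws on synthetic values and writes to a temporary directory; the file name and location are arbitrary assumptions, so substitute your own path.

```python
import os
import tempfile

import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns

# Synthetic values so the example runs without a dataset download
rng = np.random.default_rng(3)
values = rng.normal(50, 10, 200)

fig, ax = plt.subplots(figsize=(6, 4))
sns.boxplot(y=values, ax=ax)
ax.set_title("Saved boxplot")

# Write to a temporary directory; the file name is arbitrary
out_path = os.path.join(tempfile.mkdtemp(), "boxplot.png")
fig.savefig(out_path, dpi=300, bbox_inches="tight")
plt.close(fig)  # release the figure once it is on disk
print(f"Saved to {out_path}")
```

Saving through the Figure object before any call to plt.show() is the safer order, because some backends discard the figure once it has been shown, which can leave a blank image file.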
Conclusion
The seaborn boxplot is one of the fastest ways to understand how your data is distributed, spot outliers, and compare groups. With sns.boxplot(), you go from a raw DataFrame to a publication-quality visualization in a single line of code. The hue parameter adds grouped comparisons without extra effort, and overlaying strip or swarm plots fills in the individual-data-point detail that box plots abstract away.
For static analysis in notebooks and reports, seaborn boxplots are hard to beat. When you need an interactive exploration layer on top of your pandas data, tools like PyGWalker let you build and iterate on visualizations without code changes.
Start with the basic examples above, experiment with palettes and overlays, and you will have a reliable visual toolkit for any data distribution question that comes your way.