Seaborn Histogram: Create Distribution Plots in Python
Understanding the distribution of your data is critical for statistical analysis and decision-making. Yet many data professionals struggle with creating clear, informative distribution plots that reveal patterns and outliers. Generic plotting tools often require extensive customization and produce visually unappealing results.
Seaborn's histogram functions solve this problem by providing a high-level interface for creating beautiful distribution plots with minimal code. The library automatically selects appropriate defaults for bin sizes, colors, and styling while giving you fine-grained control when needed.
This guide covers everything you need to master histograms in seaborn, from basic plots to advanced customization techniques. You'll learn how to use sns.histplot() and sns.displot(), control binning strategies, overlay KDE curves, compare multiple distributions, and avoid common pitfalls.
What is a Histogram?
A histogram displays the distribution of a continuous variable by dividing the data range into bins and counting the number of observations in each bin. The height of each bar represents the frequency or density of data points within that bin range.
Histograms help you identify:
- The central tendency (where most values cluster)
- The spread and variability of data
- Skewness and outliers
- Multi-modal distributions (multiple peaks)
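The binning step can be written out by hand. The following minimal NumPy sketch is an illustration, not seaborn's internal code. It splits a small made-up sample into four equal-width bins, counts the observations in each bin, and cross-checks the result with np.histogram:

```python
import numpy as np

# Small made-up sample so the counts are easy to check by eye
values = np.array([1.0, 2.5, 2.7, 3.1, 4.8, 5.0, 5.2, 7.9, 8.4, 9.9])

# Divide the data range [min, max] into 4 equal-width bins
n_bins = 4
edges = np.linspace(values.min(), values.max(), n_bins + 1)

# Count observations per bin; bins are half-open [left, right),
# except the last one, which also includes the maximum value
counts = np.zeros(n_bins, dtype=int)
for v in values:
    idx = np.searchsorted(edges, v, side="right") - 1
    idx = min(idx, n_bins - 1)  # the maximum value belongs to the last bin
    counts[idx] += 1

# Cross-check against NumPy's own implementation
np_counts, np_edges = np.histogram(values, bins=n_bins)
print(counts)  # [4 3 0 3]
```

The empty third bin is exactly the kind of gap a histogram makes visible and a summary statistic such as the mean would hide.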
Seaborn provides two main functions for creating histograms: histplot() for axes-level plots and displot() for figure-level plots with automatic faceting support.
Basic Histogram with sns.histplot()
The histplot() function is the primary tool for creating histograms in seaborn. It offers a simple interface with powerful customization options.
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np
# Generate sample data
np.random.seed(42)
data = np.random.normal(100, 15, 1000)
# Create basic histogram
sns.histplot(data=data)
plt.title('Basic Seaborn Histogram')
plt.xlabel('Value')
plt.ylabel('Count')
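plt.show()
This creates a histogram with automatically chosen bins. By default, histplot() uses bins='auto', which delegates to NumPy's 'auto' estimator. That estimator takes whichever of the Sturges and Freedman-Diaconis rules produces narrower bins, balancing detail against noise.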
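You can preview the edges the default is likely to start from with np.histogram_bin_edges. This sketch compares the 'auto', 'fd' (Freedman-Diaconis) and 'sturges' estimators on the same kind of normal sample used above.

NumPy documents 'auto' as using the smaller bin width of the other two, which is the same as taking whichever gives more bins. Seaborn may adjust the result further, for example when binwidth, binrange or discrete is set, so treat this as an approximation of what histplot() draws.

```python
import numpy as np

# Same kind of sample as the basic example above
rng = np.random.default_rng(42)
data = rng.normal(100, 15, 1000)

# Bin edges from three of NumPy's built-in estimators
edges_auto = np.histogram_bin_edges(data, bins="auto")
edges_fd = np.histogram_bin_edges(data, bins="fd")            # Freedman-Diaconis
edges_sturges = np.histogram_bin_edges(data, bins="sturges")  # Sturges

for name, edges in [("auto", edges_auto), ("fd", edges_fd), ("sturges", edges_sturges)]:
    width = edges[1] - edges[0]
    print(f"{name:>8}: {len(edges) - 1:3d} bins, width {width:.2f}")
```

For 1,000 roughly normal values, Sturges always gives 11 bins. The Freedman-Diaconis count depends on the sample's spread and interquartile range.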
For data in a DataFrame:
import pandas as pd
# Create DataFrame
df = pd.DataFrame({
    'values': np.random.normal(100, 15, 1000),
    'category': np.random.choice(['A', 'B', 'C'], 1000)
})
# Plot histogram from DataFrame
sns.histplot(data=df, x='values')
plt.show()
Controlling Bins: Size, Number, and Range
Bin selection dramatically affects how your histogram reveals patterns. Too few bins oversimplify the distribution, while too many bins create noise.
Specify Number of Bins
# Create histogram with 30 bins
sns.histplot(data=data, bins=30)
plt.title('Histogram with 30 Bins')
plt.show()
Set Bin Width
# Set specific bin width
sns.histplot(data=data, binwidth=5)
plt.title('Histogram with Bin Width = 5')
plt.show()Define Bin Edges
# Custom bin edges (observations outside 70-130 are left out of the counts)
bin_edges = [70, 80, 90, 100, 110, 120, 130]
sns.histplot(data=data, bins=bin_edges)
plt.title('Histogram with Custom Bin Edges')
plt.show()
Control Bin Range
# Limit histogram range (observations outside 80-120 are excluded)
sns.histplot(data=data, binrange=(80, 120))
plt.title('Histogram with Limited Range (80-120)')
plt.show()
Adding KDE Overlay
Kernel Density Estimation (KDE) provides a smooth estimate of the probability density function, helping you see the overall shape of the distribution.
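For intuition about what the KDE curve is, here is a minimal Gaussian KDE written with NumPy. Every observation contributes a small normal "bump", and the curve is the average of those bumps.

The bandwidth uses Scott's rule (sample standard deviation × n^(-1/5)). We understand this to be the default in scipy.stats.gaussian_kde and the bw_method='scott' default in seaborn's KDE functions. Seaborn's own implementation also handles weights, bw_adjust and clipping, which this sketch omits.

```python
import numpy as np

def gaussian_kde_1d(sample, grid):
    """Evaluate a Gaussian KDE of `sample` at the points in `grid`."""
    sample = np.asarray(sample, dtype=float)
    n = sample.size
    # Scott's rule of thumb for the bandwidth in one dimension
    bandwidth = sample.std(ddof=1) * n ** (-1 / 5)
    # Standardized distance from every grid point to every observation
    z = (grid[:, None] - sample[None, :]) / bandwidth
    # One standard normal bump per observation, averaged and rescaled
    bumps = np.exp(-0.5 * z**2) / np.sqrt(2 * np.pi)
    return bumps.mean(axis=1) / bandwidth

rng = np.random.default_rng(42)
sample = rng.normal(100, 15, 1000)
grid = np.linspace(20, 180, 801)
density = gaussian_kde_1d(sample, grid)

# Riemann-sum approximation of the area under the curve (should be close to 1)
area = density.sum() * (grid[1] - grid[0])
print(f"area under KDE: {area:.4f}, peak at x = {grid[np.argmax(density)]:.1f}")
```

A larger bandwidth averages over wider bumps and gives a smoother curve. A smaller one follows the data more closely, much like choosing fewer or more histogram bins.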
# Histogram with KDE overlay
sns.histplot(data=data, kde=True)
plt.title('Histogram with KDE Overlay')
plt.show()
You can also show only the KDE curve:
# KDE only (no histogram bars)
sns.kdeplot(data=data)
plt.title('KDE Curve Only')
plt.show()
Or combine multiple visualizations:
fig, axes = plt.subplots(1, 3, figsize=(15, 4))
# Histogram only
sns.histplot(data=data, ax=axes[0])
axes[0].set_title('Histogram Only')
# KDE only
sns.kdeplot(data=data, ax=axes[1])
axes[1].set_title('KDE Only')
# Both combined
sns.histplot(data=data, kde=True, ax=axes[2])
axes[2].set_title('Histogram + KDE')
plt.tight_layout()
plt.show()
Understanding the stat Parameter
The stat parameter controls what statistic is computed for each bin, affecting the y-axis interpretation.
| stat Value | Description | Use Case |
|---|---|---|
| count | Number of observations in each bin (default) | Show absolute frequencies |
| frequency | Number of observations divided by the bin width | Bins of unequal width |
| density | Normalized so the total area of the bars equals 1 | Compare distributions with different sample sizes |
| probability | Normalized so the bar heights sum to 1 (alias: proportion) | Show the share of observations in each bin |
| percent | Normalized so the bar heights sum to 100 | More intuitive than probability |
fig, axes = plt.subplots(2, 3, figsize=(15, 8))
stats = ['count', 'frequency', 'density', 'probability', 'percent']
for idx, stat_type in enumerate(stats):
    ax = axes[idx // 3, idx % 3]
    sns.histplot(data=data, stat=stat_type, kde=True, ax=ax)
    ax.set_title(f'stat="{stat_type}"')
# Remove the unused sixth subplot
fig.delaxes(axes[1, 2])
plt.tight_layout()
plt.show()
Use density when comparing distributions with different sample sizes, as it normalizes the histogram so the total area equals 1.
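Each row of the stat table can be reproduced from raw bin counts. This NumPy sketch follows seaborn's documented definitions:

- frequency is the count divided by the bin width.
- density makes the total bar area equal 1.
- probability and percent make the bar heights sum to 1 and 100.

It is meant to explain the y-axis, not to replace histplot().

```python
import numpy as np

rng = np.random.default_rng(42)
data = rng.normal(100, 15, 1000)

counts, edges = np.histogram(data, bins=20)
widths = np.diff(edges)
n = counts.sum()

heights = {
    "count": counts,
    "frequency": counts / widths,          # observations per unit of x
    "density": counts / (n * widths),      # total bar area = 1
    "probability": counts / n,             # bar heights sum to 1
    "percent": 100 * counts / n,           # bar heights sum to 100
}

print("area under density bars:", (heights["density"] * widths).sum())
print("sum of probability bars:", heights["probability"].sum())
```

With equal-width bins, count, probability and percent differ only by a constant factor, so the bar shapes are identical. The same holds for frequency and density. Only the y-axis scale changes.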
Comparing Multiple Distributions with hue
The hue parameter allows you to compare distributions across different categories in a single plot.
# Create multi-category data
df = pd.DataFrame({
    'values': np.concatenate([
        np.random.normal(90, 10, 500),
        np.random.normal(105, 12, 500),
        np.random.normal(100, 8, 500)
    ]),
    'group': ['A'] * 500 + ['B'] * 500 + ['C'] * 500
})
# Plot with hue
sns.histplot(data=df, x='values', hue='group', kde=True)
plt.title('Multiple Distributions by Group')
plt.show()
Control Overlay Behavior with multiple
The multiple parameter determines how multiple distributions are displayed:
| multiple Value | Description | When to Use |
|---|---|---|
| layer | Overlay with transparency (default) | Compare shapes and overlaps |
| dodge | Place bars side-by-side | Emphasize differences per bin |
| stack | Stack bars vertically | Show total and proportions |
| fill | Stack bars and rescale each bin to a total height of 1 | Focus on proportions |
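The difference between stack and fill is easiest to see in the bar heights themselves. The sketch below uses plain NumPy and three hypothetical groups drawn to resemble the df used in this section:

- With stack, the top of each bin is the running sum of the group counts.
- With fill, each bin is divided by its total, so every stacked column reaches 1.

```python
import numpy as np

rng = np.random.default_rng(0)
groups = {
    "A": rng.normal(90, 10, 500),
    "B": rng.normal(105, 12, 500),
    "C": rng.normal(100, 8, 500),
}

# Shared bin edges for all groups (like common_bins=True)
all_values = np.concatenate(list(groups.values()))
edges = np.histogram_bin_edges(all_values, bins=15)
counts = np.array([np.histogram(v, bins=edges)[0] for v in groups.values()])  # shape (3, 15)

# multiple="stack": each group's bar sits on top of the previous groups
stack_tops = counts.cumsum(axis=0)

# multiple="fill": divide each bin (column) by its total; empty bins stay at 0
totals = counts.sum(axis=0)
fill_heights = np.divide(counts, totals, out=np.zeros(counts.shape), where=totals > 0)
fill_tops = fill_heights.cumsum(axis=0)

print("stacked totals per bin:", stack_tops[-1])
print("filled totals per bin:", fill_tops[-1])
```

Because fill discards the bin totals, it shows composition but hides where most of the data lies. Pair it with a stack or layer plot when both matter.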
fig, axes = plt.subplots(2, 2, figsize=(12, 10))
multiple_types = ['layer', 'dodge', 'stack', 'fill']
for idx, mult_type in enumerate(multiple_types):
    ax = axes[idx // 2, idx % 2]
    sns.histplot(data=df, x='values', hue='group',
                 multiple=mult_type, ax=ax)
    ax.set_title(f'multiple="{mult_type}"')
plt.tight_layout()
plt.show()
Using sns.displot() for Figure-Level Plots
While histplot() is an axes-level function, displot() is a figure-level function that provides additional capabilities like automatic faceting.
# Basic displot (creates entire figure)
sns.displot(data=df, x='values', hue='group', kde=True)
plt.show()
Advantages of displot()
- Automatic faceting with col and row parameters
- Consistent sizing across subplots
- Legend outside the plot by default
- Easy switching between histogram, KDE, and ECDF
# Add faceting dimension
df['dataset'] = np.random.choice(['Train', 'Test'], len(df))
# Create faceted plot
sns.displot(data=df, x='values', hue='group',
            col='dataset', kde=True, height=4, aspect=1.2)
plt.show()
Switch Plot Types with kind
# Each displot() call creates its own figure, so plt.subplots() is not needed
# Histogram
g = sns.displot(data=df, x='values', kind='hist', kde=True)
g.ax.set_title('kind="hist"')
# KDE
g = sns.displot(data=df, x='values', kind='kde')
g.ax.set_title('kind="kde"')
# ECDF (Empirical Cumulative Distribution Function)
g = sns.displot(data=df, x='values', kind='ecdf')
g.ax.set_title('kind="ecdf"')
plt.show()
Function Comparison Table
| Feature | histplot() | displot() | distplot() (deprecated) | matplotlib hist() |
|---|---|---|---|---|
| Level | Axes-level | Figure-level | Axes-level | Axes-level |
| Faceting | No | Yes (col/row) | No | No |
| KDE Support | Yes | Yes | Yes | No (manual) |
| Hue Support | Yes | Yes | No | No |
| Multiple Distributions | layer/dodge/stack/fill | layer/dodge/stack/fill | No | Side-by-side or stacked (pass a list of arrays) |
| Stat Options | 5 options | 5 options | Limited | Limited |
| Default Style | Modern | Modern | Modern | Basic |
| Status | Current | Current | Deprecated | Standard |
| Best For | Subplots, integration | Standalone plots, faceting | Legacy code | Basic plots |
Recommendation: Use histplot() when creating multiple subplots with plt.subplots(). Use displot() for standalone visualizations or when you need faceting.
histplot() Parameter Reference
| Parameter | Type | Default | Description |
|---|---|---|---|
| data | DataFrame/array/mapping | None | Input data structure |
| x, y | vector/string | None | Variables for x and y axes |
| hue | vector/string | None | Grouping variable for colors |
| weights | vector/string | None | Per-observation weights; each bin sums the weights instead of counting |
| stat | string | "count" | Statistic to compute (count/frequency/probability/percent/density) |
| bins | string/int/vector | "auto" | Bin-selection rule, number of bins, or bin edges |
| binwidth | float | None | Width of bins (overrides bins) |
| binrange | tuple | None | Lowest and highest bin edge (min, max) |
| discrete | bool | None | Treat variable as discrete: binwidth=1, bars centered on values |
| cumulative | bool | False | Compute cumulative distribution |
| common_bins | bool | True | Use same bins for all hue levels |
| common_norm | bool | True | Normalize all hue levels together when using a normalized stat |
| multiple | string | "layer" | How to plot multiple distributions (layer/dodge/stack/fill) |
| element | string | "bars" | Visual representation (bars/step/poly) |
| fill | bool | True | Fill bars/polygons |
| shrink | float | 1 | Scale bar width relative to the bin width |
| kde | bool | False | Add KDE curve |
| kde_kws | dict | None | Parameters for the KDE computation |
| line_kws | dict | None | Parameters for the KDE line (passed to matplotlib) |
| thresh | float | 0 | Bivariate only: cells with a statistic at or below this value are transparent |
| pthresh | float | None | Like thresh, but expressed as a proportion (0-1) of the total |
| pmax | float | None | Bivariate only: proportion (0-1) at which the colormap saturates |
| cbar | bool | False | Add colorbar (bivariate only) |
| cbar_ax | Axes | None | Existing axes to draw the colorbar in |
| cbar_kws | dict | None | Colorbar parameters |
| palette | string/list/dict | None | Color palette |
| hue_order | list | None | Order for hue levels |
| hue_norm | tuple | None | Normalization range for a numeric hue variable |
| color | color | None | Single color for all elements |
| log_scale | bool/number/pair | None | Use a log scale for one or both axes |
| legend | bool | True | Show legend |
| ax | Axes | None | Matplotlib axes to plot on |
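Two rows in the table above, weights and discrete, are easier to understand with numbers. Seaborn's documentation describes discrete=True as defaulting to binwidth=1 with bars centered on each value.

The NumPy sketch below builds bin edges at k - 0.5 and k + 0.5 to show why that centers the bars on integers. It also shows that weights make each bin report a sum of weights instead of a count. The ratings and weights are hypothetical illustration data.

```python
import numpy as np

# Hypothetical integer data, e.g. ratings from 1 to 5
ratings = np.array([1, 2, 2, 3, 3, 3, 4, 4, 5, 5, 5, 5])
# Hypothetical per-observation weights (e.g. survey weights)
weights = np.array([10, 5, 5, 2, 2, 2, 8, 8, 1, 1, 1, 1])

# Discrete-style edges: one bin of width 1 centered on every integer value
edges = np.arange(ratings.min() - 0.5, ratings.max() + 1.5, 1.0)
counts, _ = np.histogram(ratings, bins=edges)
weighted, _ = np.histogram(ratings, bins=edges, weights=weights)
centers = (edges[:-1] + edges[1:]) / 2

print(centers)   # bar centers land on 1, 2, 3, 4, 5
print(counts)    # [1 2 3 2 4]
print(weighted)  # [10 10  6 16  4]
```

Without centered edges, automatic bins can straddle integer values and leave misleading gaps or double-height bars. That is the problem discrete=True is documented to avoid.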
Advanced Customization
Custom Colors and Styling
# Custom colors for each category
custom_palette = {'A': '#FF6B6B', 'B': '#4ECDC4', 'C': '#45B7D1'}
sns.histplot(data=df, x='values', hue='group',
             palette=custom_palette, alpha=0.6,
             edgecolor='black', linewidth=1.5)
plt.title('Custom Colored Histogram')
plt.show()
Edge Colors and Transparency
# Emphasize bin edges
sns.histplot(data=data, bins=20, edgecolor='black',
             linewidth=2, alpha=0.7, color='skyblue')
plt.title('Histogram with Prominent Edges')
plt.show()
Logarithmic Scale
For data spanning multiple orders of magnitude, use logarithmic scales:
# Generate log-normal data
log_data = np.random.lognormal(3, 1, 1000)
fig, axes = plt.subplots(1, 2, figsize=(12, 4))
# Linear scale
sns.histplot(data=log_data, ax=axes[0])
axes[0].set_title('Linear Scale')
# Log scale
sns.histplot(data=log_data, log_scale=True, ax=axes[1])
axes[1].set_title('Log Scale')
plt.tight_layout()
plt.show()
Cumulative Distributions
Cumulative histograms show the running total of observations up to each bin:
# Cumulative histogram
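sns.histplot(data=data, cumulative=True, stat='density',
             element='step', fill=False)
plt.title('Cumulative Distribution')
plt.ylabel('Cumulative Density')
plt.show()
With stat='density', the cumulative curve ends at 1. The y-axis therefore reads as the proportion of observations up to each bin edge, which is useful for determining percentiles and comparing distributions.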
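Reading a percentile off a cumulative plot amounts to finding where the curve crosses a given height. The minimal NumPy sketch below applies that idea to the empirical CDF, the unbinned step function that kind='ecdf' draws. A cumulative histogram is a binned version of this curve, so readings from it are only as precise as the bin width.

```python
import numpy as np

rng = np.random.default_rng(42)
data = rng.normal(100, 15, 1000)

# Empirical CDF: sorted values and the fraction of data at or below each one
sorted_vals = np.sort(data)
cum_prop = np.arange(1, sorted_vals.size + 1) / sorted_vals.size

def value_at_proportion(p):
    """Smallest observed value whose cumulative proportion reaches p."""
    idx = np.searchsorted(cum_prop, p, side="left")
    return sorted_vals[idx]

median_est = value_at_proportion(0.5)
p90_est = value_at_proportion(0.9)
print(f"median ≈ {median_est:.1f}, 90th percentile ≈ {p90_est:.1f}")
```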
Change Visual Element
fig, axes = plt.subplots(1, 3, figsize=(15, 4))
elements = ['bars', 'step', 'poly']
for ax, elem in zip(axes, elements):
    sns.histplot(data=data, element=elem, kde=True, ax=ax)
    ax.set_title(f'element="{elem}"')
plt.tight_layout()
plt.show()
Bivariate Histograms (2D Histograms)
Seaborn can create 2D histograms to visualize the joint distribution of two variables:
# Generate correlated 2D data
np.random.seed(42)
x = np.random.normal(100, 15, 1000)
y = x + np.random.normal(0, 10, 1000)
# 2D histogram
sns.histplot(x=x, y=y, bins=30, cbar=True)
plt.title('2D Histogram (Bivariate Distribution)')
plt.show()
Combine with KDE for Bivariate Analysis
fig, axes = plt.subplots(1, 2, figsize=(12, 5))
# 2D histogram
sns.histplot(x=x, y=y, bins=30, cbar=True, ax=axes[0])
axes[0].set_title('2D Histogram')
# KDE contour plot
sns.kdeplot(x=x, y=y, fill=True, cmap='viridis', ax=axes[1])
axes[1].set_title('2D KDE Plot')
plt.tight_layout()
plt.show()
Bivariate with Hue
# Add category
categories = np.random.choice(['Group 1', 'Group 2'], 1000)
sns.histplot(x=x, y=y, hue=categories, bins=20)
plt.title('2D Histogram with Hue')
plt.show()
Common Mistakes and How to Avoid Them
Mistake 1: Using Too Many or Too Few Bins
Problem: Oversmoothing hides patterns, while too many bins create noise.
fig, axes = plt.subplots(1, 3, figsize=(15, 4))
# Too few bins
sns.histplot(data=data, bins=5, ax=axes[0])
axes[0].set_title('Too Few Bins (5) - Oversmoothed')
# Good number
sns.histplot(data=data, bins=30, ax=axes[1])
axes[1].set_title('Appropriate Bins (30)')
# Too many bins
sns.histplot(data=data, bins=100, ax=axes[2])
axes[2].set_title('Too Many Bins (100) - Noisy')
plt.tight_layout()
plt.show()
Solution: Start with automatic bin selection, then adjust based on your data characteristics. Use Sturges' rule (bins = log2(n) + 1) for roughly normal distributions or the Freedman-Diaconis rule for skewed data.
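The rules of thumb above are simple enough to compute directly. This sketch evaluates Sturges, the square-root rule and Freedman-Diaconis for a sample of 1,000 values. It uses the Freedman-Diaconis formula from NumPy's documentation (bin width = 2 × IQR / n^(1/3)). The Freedman-Diaconis count will vary with the sample.

```python
import numpy as np

rng = np.random.default_rng(42)
data = rng.normal(100, 15, 1000)
n = data.size
data_range = data.max() - data.min()

# Sturges: about log2(n) + 1 bins (assumes roughly normal data)
sturges_bins = int(np.ceil(np.log2(n) + 1))

# Square-root rule: about sqrt(n) bins
sqrt_bins = int(np.ceil(np.sqrt(n)))

# Freedman-Diaconis: bin width = 2 * IQR / n^(1/3), robust to skew and outliers
q75, q25 = np.percentile(data, [75, 25])
fd_width = 2 * (q75 - q25) / n ** (1 / 3)
fd_bins = int(np.ceil(data_range / fd_width))

print(f"Sturges: {sturges_bins}, sqrt: {sqrt_bins}, Freedman-Diaconis: {fd_bins}")
```

Sturges grows only logarithmically with n, so it tends to under-bin large samples. The square-root rule grows much faster.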
Mistake 2: Comparing Distributions with Different Sample Sizes
Problem: Raw counts make it difficult to compare distributions with different total observations.
# Different sample sizes
small_sample = np.random.normal(100, 15, 200)
large_sample = np.random.normal(100, 15, 2000)
df_samples = pd.DataFrame({
'value': np.concatenate([small_sample, large_sample]),
'sample': ['Small (n=200)'] * 200 + ['Large (n=2000)'] * 2000
})
fig, axes = plt.subplots(1, 2, figsize=(12, 4))
# Wrong: using count
sns.histplot(data=df_samples, x='value', hue='sample',
stat='count', ax=axes[0])
axes[0].set_title('Wrong: Count (Hard to Compare)')
# Correct: using density
sns.histplot(data=df_samples, x='value', hue='sample',
stat='density', common_norm=False, ax=axes[1])
axes[1].set_title('Correct: Density (Easy to Compare)')
plt.tight_layout()
plt.show()
Solution: Use stat='density' or stat='probability' and set common_norm=False.
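The reason common_norm=False matters: with the default common_norm=True, seaborn normalizes all hue levels together. Each group's density bars then enclose an area equal to its share of the data rather than 1. The NumPy sketch below reproduces that arithmetic for the 200/2000 split used above. It mirrors the documented behavior rather than seaborn's code.

```python
import numpy as np

rng = np.random.default_rng(1)
small = rng.normal(100, 15, 200)
large = rng.normal(100, 15, 2000)

# Shared bins, as with common_bins=True
edges = np.histogram_bin_edges(np.concatenate([small, large]), bins=30)
widths = np.diff(edges)
n_total = small.size + large.size

def density(values, norm_count):
    """Bin heights of `values`, normalized by `norm_count` observations."""
    counts, _ = np.histogram(values, bins=edges)
    return counts / (norm_count * widths)

# common_norm=True: every group is divided by the combined sample size
area_small_common = (density(small, n_total) * widths).sum()   # 200 / 2200
area_large_common = (density(large, n_total) * widths).sum()   # 2000 / 2200

# common_norm=False: every group is divided by its own sample size
area_small_own = (density(small, small.size) * widths).sum()   # 1
area_large_own = (density(large, large.size) * widths).sum()   # 1

print(f"common_norm=True:  {area_small_common:.3f}, {area_large_common:.3f}")
print(f"common_norm=False: {area_small_own:.3f}, {area_large_own:.3f}")
```

With the shared normalization, the small group's curve is squashed to about a tenth of the large group's height, even though both samples come from the same distribution.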
Mistake 3: Ignoring Overlapping Distributions
Problem: In the default layer mode, fully opaque bars (for example alpha=1.0) cover each other, so overlapping regions become invisible.
fig, axes = plt.subplots(1, 2, figsize=(12, 4))
# Bad: opaque bars hide overlap
sns.histplot(data=df, x='values', hue='group',
alpha=1.0, ax=axes[0])
axes[0].set_title('Bad: Opaque Bars')
# Good: transparency shows overlap
sns.histplot(data=df, x='values', hue='group',
alpha=0.5, ax=axes[1])
axes[1].set_title('Good: Transparent Bars')
plt.tight_layout()
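plt.show()
Solution: Keep the bars partially transparent, or use multiple='dodge' to place bars side-by-side. Seaborn already applies partial transparency when layering hue groups, so avoid overriding it with alpha=1.0; alpha=0.5 is a good starting point.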
Mistake 4: Not Labeling Axes Properly
Problem: Unclear what the y-axis represents, especially when using different stat values.
# Good practice: clear labels
sns.histplot(data=data, stat='density', kde=True)
plt.title('Distribution of Values')
plt.xlabel('Value')
plt.ylabel('Density')
plt.show()
Solution: Always label axes clearly, especially the y-axis when using stat='density' or stat='probability'.
Mistake 5: Using Deprecated distplot()
Problem: Old code uses distplot(), which is deprecated and less flexible.
# Old way (deprecated)
# sns.distplot(data) # Don't use this
# New way (stat='density' keeps distplot's density-scaled y-axis)
sns.histplot(data=data, kde=True, stat='density')
plt.show()
Solution: Migrate to histplot() for histograms or kdeplot() for KDE curves.
Visualize Data Interactively with PyGWalker
While seaborn provides excellent static histogram visualizations, PyGWalker offers an interactive alternative that lets you explore distributions dynamically. PyGWalker transforms your pandas DataFrame into an interactive Tableau-like interface where you can create histograms, adjust binning, and switch between visualization types without writing code.
import pygwalker as pyg
import pandas as pd
import numpy as np
# Create sample data
df = pd.DataFrame({
'values': np.random.normal(100, 15, 1000),
'category': np.random.choice(['A', 'B', 'C'], 1000),
'score': np.random.uniform(0, 100, 1000)
})
# Launch interactive explorer
pyg.walk(df)
PyGWalker advantages for histogram analysis:
- Drag-and-drop interface: Create histograms by dragging variables to shelves
- Dynamic binning: Adjust bin counts interactively with sliders
- Multi-variable exploration: Quickly switch between variables to compare distributions
- Export capabilities: Save insights as images or share interactive reports
- No coding required: Non-technical team members can explore data independently
Visit github.com/Kanaries/pygwalker to get started with interactive data visualization.
Real-World Example: Analyzing Exam Scores
Let's apply histogram techniques to analyze exam score distributions across different classes:
# Generate realistic exam data
np.random.seed(42)
n_students = 500
exam_data = pd.DataFrame({
    'score': np.concatenate([
        np.random.normal(75, 10, 200),  # Class A
        np.random.normal(68, 15, 150),  # Class B
        np.random.normal(82, 8, 150)    # Class C
    ]),
    'class': ['Class A'] * 200 + ['Class B'] * 150 + ['Class C'] * 150,
    'study_hours': np.random.uniform(0, 40, n_students)
})
# Clip scores to valid range
exam_data['score'] = exam_data['score'].clip(0, 100)
# Create comprehensive visualization
fig, axes = plt.subplots(2, 2, figsize=(14, 10))
# Overall distribution with KDE
sns.histplot(data=exam_data, x='score', kde=True,
bins=30, ax=axes[0, 0])
axes[0, 0].set_title('Overall Score Distribution')
axes[0, 0].axvline(exam_data['score'].mean(), color='red',
linestyle='--', label=f'Mean: {exam_data["score"].mean():.1f}')
axes[0, 0].legend()
# Compare classes
sns.histplot(data=exam_data, x='score', hue='class',
stat='density', common_norm=False,
alpha=0.5, bins=25, ax=axes[0, 1])
axes[0, 1].set_title('Score Distribution by Class')
# Stacked view for proportions
sns.histplot(data=exam_data, x='score', hue='class',
multiple='stack', bins=25, ax=axes[1, 0])
axes[1, 0].set_title('Stacked Distribution (Total Counts)')
# Cumulative distributions
sns.histplot(data=exam_data, x='score', hue='class',
stat='density', element='step', fill=False,
cumulative=True, common_norm=False, ax=axes[1, 1])
axes[1, 1].set_title('Cumulative Distribution by Class')
plt.tight_layout()
plt.show()
# Print summary statistics
print(exam_data.groupby('class')['score'].describe())
This example demonstrates:
- Overall distribution analysis with mean reference line
- Density comparison normalized for different class sizes
- Stacked visualization showing contribution of each class
- Cumulative distributions for percentile analysis
FAQ
How do I choose the right number of bins for my histogram?
The optimal number of bins depends on your data size and distribution. Seaborn's automatic bin selection works well for most cases. It uses bins='auto', which applies NumPy's estimator based on the Sturges and Freedman-Diaconis rules. For manual selection, use Sturges' rule, bins = log2(n) + 1, for roughly normal data (about 11 bins for 1,000 samples). For general data, use the square-root rule, bins = sqrt(n) (about 32 bins for 1,000 samples). Experiment with different values and choose the one that reveals patterns without creating excessive noise. Use binwidth instead of bins when you need consistent bin sizes across multiple plots for comparison.
What is the difference between histplot and displot in seaborn?
histplot() is an axes-level function that plots on a specific matplotlib axes object, making it suitable for creating complex multi-plot figures with plt.subplots(). displot() is a figure-level function that creates an entire figure and supports automatic faceting with col and row parameters. Use histplot() when integrating histograms into existing matplotlib figures with multiple subplots. Use displot() for standalone visualizations or when you need to create faceted plots across multiple categorical variables. Both functions support the same core parameters for controlling bins, KDE, hue, and statistics.
Should I use stat='density' or stat='probability' for my histogram?
Use stat='density' when you need the histogram area to equal 1, making it comparable with probability density functions and ideal for overlaying theoretical distributions. Use stat='probability' (or stat='percent') when you want to interpret the y-axis as the proportion of data in each bin, with all bin heights summing to 1 (or 100%). Choose stat='density' for comparing distributions with different sample sizes or when performing statistical analysis. Choose stat='probability' for more intuitive interpretation in presentations or when explaining results to non-technical audiences.
How do I create a histogram with multiple overlapping distributions?
Use the hue parameter to specify a categorical variable that splits your data into groups. Control the overlay behavior with the multiple parameter: use 'layer' (default) for transparent overlapping bars, 'dodge' for side-by-side bars, 'stack' for vertically stacked bars showing totals, or 'fill' for normalized stacked bars showing proportions. Set alpha=0.5 to make overlapping regions visible. When comparing distributions with different sample sizes, always use stat='density' or stat='probability' with common_norm=False to ensure fair comparison. Add KDE curves with kde=True to highlight the overall shape of each distribution.
Why is seaborn's distplot deprecated and what should I use instead?
Seaborn deprecated distplot() in version 0.11.0 because it combined multiple functionalities in a single function with inconsistent parameter naming. The replacement functions provide clearer interfaces and more flexibility:

- histplot() for histograms
- kdeplot() for kernel density estimation
- ecdfplot() for empirical cumulative distribution functions
- rugplot() for rug plots

These functions offer better parameter names, more customization options, native support for hue-based grouping, and consistent behavior with other seaborn functions. For the most common use case, replace sns.distplot(data) with sns.histplot(data, kde=True, stat='density'). Adding stat='density' keeps the density-scaled y-axis that distplot() showed by default when drawing a KDE.
Conclusion
Seaborn histograms provide a powerful and flexible way to visualize data distributions with minimal code. The histplot() function offers fine-grained control over binning, statistics, and grouping, while displot() simplifies faceted visualizations. By mastering parameters like bins, stat, hue, and multiple, you can create publication-quality distribution plots that reveal patterns, outliers, and differences across groups.
Key takeaways:
- Use automatic bin selection for initial exploration, then adjust for clarity
- Apply stat='density' when comparing distributions with different sample sizes
- Leverage hue and multiple parameters to compare multiple distributions effectively
- Add KDE overlays to show smooth distribution shapes
- Choose histplot() for integration with matplotlib subplots and displot() for standalone faceted plots
Whether you're analyzing exam scores, scientific measurements, or business metrics, seaborn histograms help you understand your data distribution and communicate insights effectively. Combine these techniques with interactive tools like PyGWalker for comprehensive exploratory data analysis.