Matplotlib Histogram: The Complete Guide to plt.hist() in Python
You have a dataset with thousands of numeric values -- ages, test scores, response times, sensor readings -- and you need to understand how those values are distributed. Are they clustered around a central point? Skewed toward one end? Do they follow a normal distribution? A scatter plot will not help. A bar chart is designed for categories, not continuous data. What you need is a histogram, and in Python, matplotlib.pyplot.hist() is the standard way to build one.
The problem is that plt.hist() has over a dozen parameters, and the default output often looks plain or misleading. Choosing the wrong number of bins can hide important patterns in your data. Comparing multiple distributions on one chart requires knowing the right combination of options. This guide covers every parameter that matters, with working code examples you can copy directly into your notebook or script.
What Is a Histogram and When Should You Use One?
A histogram divides a range of numeric values into consecutive intervals called bins (equal-width by default) and counts how many data points fall into each bin. The x-axis shows the value range, and the y-axis shows the frequency (count) or density for each bin. Unlike a bar chart, which displays categorical data, a histogram represents the distribution of continuous numerical data.
Use a histogram when you need to:
- See the shape of a distribution (normal, skewed, bimodal, uniform)
- Identify outliers or gaps in data
- Compare the spread of values across groups
- Decide on data transformations before modeling
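To make the binning idea concrete, here is a minimal sketch that sorts a small hand-made dataset into four bins, once with NumPy's `np.histogram` and once by hand. The values and bin edges are illustrative assumptions, not real data.

```python
import numpy as np

# Illustrative values and bin edges (assumptions chosen for readability)
values = np.array([1, 2, 2, 3, 5, 7, 8, 8, 9, 10])
edges = np.array([0, 2.5, 5, 7.5, 10])

# np.histogram returns the per-bin counts plus the edges it used
counts, used_edges = np.histogram(values, bins=edges)

# The same counts by hand: each bin is half-open [low, high),
# except the last bin, which also includes its right edge
manual = []
for i in range(len(edges) - 1):
    low, high = edges[i], edges[i + 1]
    is_last = i == len(edges) - 2
    in_bin = (values >= low) & ((values <= high) if is_last else (values < high))
    manual.append(int(in_bin.sum()))

print(counts)   # [3 1 2 4]
print(manual)   # [3, 1, 2, 4]
```

Note that the value 5 lands in the third bin, not the second, because every bin except the last excludes its right edge.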
Basic plt.hist() Syntax
The simplest histogram requires only one argument: the data array.
import matplotlib.pyplot as plt
import numpy as np
# Generate 1000 normally distributed values
np.random.seed(42)
data = np.random.normal(loc=50, scale=15, size=1000)
plt.hist(data)
plt.title('Basic Histogram')
plt.xlabel('Value')
plt.ylabel('Frequency')
plt.show()
By default, matplotlib divides the data into 10 bins. The function returns three objects: the bin counts, the bin edges, and the patch objects (the drawn rectangles). We will cover those return values in detail later.
Full Signature
plt.hist(x, bins=None, range=None, density=False, weights=None,
         cumulative=False, bottom=None, histtype='bar', align='mid',
         orientation='vertical', rwidth=None, log=False, color=None,
         label=None, stacked=False, edgecolor=None, alpha=None)
Controlling Bins
The bins parameter is the single most important setting in a histogram. Too few bins hide patterns. Too many bins create noise.
Setting a Fixed Number of Bins
fig, axes = plt.subplots(1, 3, figsize=(14, 4))
axes[0].hist(data, bins=5, edgecolor='black')
axes[0].set_title('5 Bins')
axes[1].hist(data, bins=30, edgecolor='black')
axes[1].set_title('30 Bins')
axes[2].hist(data, bins=100, edgecolor='black')
axes[2].set_title('100 Bins')
plt.tight_layout()
plt.show()
With 5 bins, you see only a rough shape. With 100 bins, small sample sizes per bin introduce visual noise. For this dataset of 1,000 points, 30 bins produces a clear picture of the normal distribution.
Custom Bin Edges
Pass a sequence to bins to define exact boundaries:
custom_edges = [0, 20, 35, 50, 65, 80, 100]
plt.hist(data, bins=custom_edges, edgecolor='black', color='steelblue')
plt.title('Histogram with Custom Bin Edges')
plt.xlabel('Value')
plt.ylabel('Frequency')
plt.show()
This is useful when your data has meaningful thresholds -- letter grades, age brackets, or performance tiers.
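One detail worth checking when you pass explicit edges: values outside the first and last edge are not counted in any bin. The sketch below uses `np.histogram`, which computes the same counts that `plt.hist()` draws, to tally the brackets. The bracket labels and the random data are hypothetical, added only for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)
data = rng.normal(loc=50, scale=15, size=1000)

custom_edges = [0, 20, 35, 50, 65, 80, 100]
# Hypothetical bracket names, one per bin
labels = ['0-20', '20-35', '35-50', '50-65', '65-80', '80-100']

counts, _ = np.histogram(data, bins=custom_edges)
for label, count in zip(labels, counts):
    print(f'{label:>7}: {count}')

# Values below 0 or above 100 fall outside every bin and are ignored
n_outside = int(np.sum((data < custom_edges[0]) | (data > custom_edges[-1])))
print(f'Outside the edges: {n_outside}')
```

If the number of ignored values matters for your analysis, widen the outer edges so that they cover the full range of the data.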
Automatic Bin Algorithms
Matplotlib supports several algorithms that calculate the optimal number of bins based on data characteristics:
| Algorithm | bins= Value | Method | Best For |
|---|---|---|---|
| Sturges | 'sturges' | 1 + log2(n) | Small, roughly normal datasets |
| Scott | 'scott' | Based on standard deviation and n | Normal or near-normal data |
| Freedman-Diaconis | 'fd' | Based on IQR and n | Robust to outliers |
| Square Root | 'sqrt' | sqrt(n) | Quick rough estimate |
| Auto | 'auto' | Max of Sturges and FD | General-purpose default |
fig, axes = plt.subplots(1, 3, figsize=(14, 4))
for ax, method in zip(axes, ['sturges', 'scott', 'fd']):
    ax.hist(data, bins=method, edgecolor='black', color='#4C72B0')
    ax.set_title(f'bins="{method}"')
plt.tight_layout()
plt.show()
For most cases, bins='auto' is a solid starting point. Switch to 'fd' when your data contains outliers, since it uses the interquartile range instead of standard deviation.
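If you want to see how many bins each algorithm picks before plotting, NumPy's `np.histogram_bin_edges` accepts the same string names. This is a small sketch on synthetic data (the random generator and seed are assumptions); the exact counts for 'scott', 'fd', and 'auto' depend on your data.

```python
import numpy as np

rng = np.random.default_rng(42)
data = rng.normal(loc=50, scale=15, size=1000)

bin_counts = {}
for method in ['sturges', 'scott', 'fd', 'sqrt', 'auto']:
    edges = np.histogram_bin_edges(data, bins=method)
    bin_counts[method] = len(edges) - 1  # n edges describe n - 1 bins
    print(f'{method:>8}: {bin_counts[method]} bins')
```

For 1,000 points, Sturges depends only on the sample size and gives 11 bins, while the square-root rule gives 32.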
Normalized and Density Histograms
By default, the y-axis shows raw counts. Set density=True to normalize the histogram so that the total area under the bars equals 1. This converts the y-axis from frequency to probability density.
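A quick numerical check of the "area equals 1" claim, sketched with `np.histogram` on synthetic data (the seed and sample are assumptions). It also shows a common point of confusion: the bar heights themselves do not sum to 1.

```python
import numpy as np

rng = np.random.default_rng(42)
data = rng.normal(loc=50, scale=15, size=1000)

density, edges = np.histogram(data, bins=30, density=True)
widths = np.diff(edges)

# Area = sum of (bar height x bar width); the heights alone do not sum to 1
area = float(np.sum(density * widths))
print(f'Total area: {area:.6f}')
print(f'Sum of bar heights: {density.sum():.4f}')
```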
fig, axes = plt.subplots(1, 2, figsize=(12, 4))
axes[0].hist(data, bins=30, edgecolor='black', color='#55A868')
axes[0].set_title('Frequency (default)')
axes[0].set_ylabel('Count')
axes[1].hist(data, bins=30, edgecolor='black', color='#C44E52', density=True)
axes[1].set_title('Density (density=True)')
axes[1].set_ylabel('Probability Density')
plt.tight_layout()
plt.show()
Density normalization is essential when you want to overlay a theoretical distribution curve or compare datasets of different sizes:
from scipy import stats
plt.hist(data, bins=30, density=True, edgecolor='black', color='#55A868', alpha=0.7)
# Overlay the theoretical normal curve
x_range = np.linspace(data.min(), data.max(), 200)
plt.plot(x_range, stats.norm.pdf(x_range, loc=50, scale=15), 'r-', linewidth=2, label='Normal PDF')
plt.legend()
plt.title('Density Histogram with Normal Curve Overlay')
plt.show()
Customizing Appearance
Color, Edge Color, and Transparency
plt.hist(data, bins=30, color='#4C72B0', edgecolor='white', alpha=0.85)
plt.title('Styled Histogram')
plt.xlabel('Value')
plt.ylabel('Frequency')
plt.show()
Histogram Types
The histtype parameter changes the visual style:
| histtype Value | Description |
|---|---|
| 'bar' | Traditional filled bars (default) |
| 'barstacked' | Stacked bars for multiple datasets |
| 'step' | Unfilled line outline |
| 'stepfilled' | Filled area with step outline |
fig, axes = plt.subplots(2, 2, figsize=(10, 8))
types = ['bar', 'barstacked', 'step', 'stepfilled']
for ax, ht in zip(axes.flat, types):
    ax.hist(data, bins=30, histtype=ht, edgecolor='black', color='#4C72B0')
    ax.set_title(f'histtype="{ht}"')
plt.tight_layout()
plt.show()
The 'step' type is particularly useful when overlaying multiple distributions, since unfilled outlines do not obscure each other.
Multiple Histograms on One Plot
Overlapping Histograms
Use alpha (transparency) to layer two or more distributions:
np.random.seed(42)
group_a = np.random.normal(loc=50, scale=10, size=800)
group_b = np.random.normal(loc=65, scale=12, size=800)
plt.hist(group_a, bins=30, alpha=0.6, color='#4C72B0', edgecolor='black', label='Group A')
plt.hist(group_b, bins=30, alpha=0.6, color='#C44E52', edgecolor='black', label='Group B')
plt.legend()
plt.title('Overlapping Histograms')
plt.xlabel('Score')
plt.ylabel('Frequency')
plt.show()
Side-by-Side Histograms
Pass a list of arrays to plot them with grouped bars:
plt.hist([group_a, group_b], bins=20, color=['#4C72B0', '#C44E52'],
         edgecolor='black', label=['Group A', 'Group B'])
plt.legend()
plt.title('Side-by-Side Histograms')
plt.xlabel('Score')
plt.ylabel('Frequency')
plt.show()
When you pass a list of arrays, matplotlib places the bars for each dataset next to each other within each bin.
Stacked Histograms
Set stacked=True to stack one dataset on top of another. This shows both the individual distributions and their combined total.
np.random.seed(42)
freshmen = np.random.normal(loc=68, scale=8, size=500)
sophomores = np.random.normal(loc=72, scale=7, size=400)
juniors = np.random.normal(loc=75, scale=6, size=300)
plt.hist([freshmen, sophomores, juniors], bins=25, stacked=True,
         color=['#4C72B0', '#55A868', '#C44E52'], edgecolor='black',
         label=['Freshmen', 'Sophomores', 'Juniors'])
plt.legend()
plt.title('Stacked Histogram: Exam Scores by Class Year')
plt.xlabel('Score')
plt.ylabel('Frequency')
plt.show()
Stacked histograms work well when you want to show how sub-groups contribute to an overall distribution. However, they become hard to read with more than three or four groups.
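If you also need the numbers behind a stacked plot, the counts returned by the call can be unpacked as sketched below. This assumes the behavior of current matplotlib releases, where the rows returned with `stacked=True` are running totals across the groups; the synthetic data mirrors the example above but uses its own random generator.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(42)
freshmen = rng.normal(loc=68, scale=8, size=500)
sophomores = rng.normal(loc=72, scale=7, size=400)
juniors = rng.normal(loc=75, scale=6, size=300)

fig, ax = plt.subplots()
n, bin_edges, _ = ax.hist([freshmen, sophomores, juniors], bins=25, stacked=True,
                          label=['Freshmen', 'Sophomores', 'Juniors'])

# With stacked=True the returned rows are running totals across groups,
# so the last row holds the combined count per bin
combined = n[-1]
# Differencing the rows recovers each group's own counts
per_group = np.diff(n, axis=0, prepend=0)

print(n.shape)               # (3, 25)
print(int(combined.sum()))   # 1200
print(per_group.sum(axis=1))
```

Call `plt.show()` afterwards to display the figure. The combined total is 1,200 because the shared bin range is computed across all three groups, so no value falls outside it.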
Cumulative Histograms
Set cumulative=True to show how values accumulate from left to right. The last bar reaches the total count (or 1.0 if density=True).
fig, axes = plt.subplots(1, 2, figsize=(12, 4))
axes[0].hist(data, bins=30, cumulative=True, edgecolor='black', color='#DD8452')
axes[0].set_title('Cumulative Histogram (Count)')
axes[0].set_ylabel('Cumulative Count')
axes[1].hist(data, bins=30, cumulative=True, density=True, edgecolor='black', color='#8172B3')
axes[1].set_title('Cumulative Histogram (Density)')
axes[1].set_ylabel('Cumulative Probability')
plt.tight_layout()
plt.show()
Cumulative histograms are useful for answering questions like "What percentage of values fall below 60?" by reading directly from the y-axis.
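The same question can be answered numerically. The following sketch assumes bin edges placed on multiples of 5, so that 60 is exactly a bin edge, and rebuilds the cumulative curve with `np.histogram` and `np.cumsum`; the data and seed are illustrative.

```python
import numpy as np

rng = np.random.default_rng(42)
data = rng.normal(loc=50, scale=15, size=1000)

# Bin edges on multiples of 5 so that 60 is exactly a bin edge
start = np.floor(data.min() / 5) * 5
edges = np.arange(start, data.max() + 5, 5)

counts, edges = np.histogram(data, bins=edges)
# Running share of values, like cumulative=True combined with density=True
cumulative = np.cumsum(counts) / counts.sum()

# cumulative[i] is the share of values below the right edge of bin i
idx = int(np.flatnonzero(np.isclose(edges[1:], 60))[0])
frac_below_60 = float(cumulative[idx])
print(f'Share of values below 60: {frac_below_60:.1%}')
```

When the threshold is not a bin edge, the plot only gives an approximate reading; `np.mean(data < 60)` gives the exact share directly from the raw values.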
Horizontal Histograms
Set orientation='horizontal' to flip the axes. This is helpful when value labels are long or when you want to place the histogram alongside another vertical chart.
plt.hist(data, bins=30, orientation='horizontal', color='#64B5CD', edgecolor='black')
plt.title('Horizontal Histogram')
plt.xlabel('Frequency')
plt.ylabel('Value')
plt.show()
plt.hist() Return Values
plt.hist() returns three values that give you programmatic access to the histogram data:
n, bin_edges, patches = plt.hist(data, bins=20, edgecolor='black', color='#4C72B0')
plt.show()
print(f"Bin counts (n): shape = {n.shape}, first 5 = {n[:5]}")
print(f"Bin edges: shape = {bin_edges.shape}, first 5 = {bin_edges[:5]}")
print(f"Patches: {len(patches)} Rectangle objects")
| Return Value | Type | Description |
|---|---|---|
| n | ndarray | Count (or density) for each bin |
| bin_edges | ndarray | Edge values for each bin (length = len(n) + 1) |
| patches | list of Rectangles | The matplotlib patch objects for each bar |
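As a small sketch of what the first two return values are good for, the code below derives bin centers from the edges, finds the tallest bar, and labels it on the plot. The data, the seed, and the 'peak' label are assumptions made for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(42)
data = rng.normal(loc=50, scale=15, size=1000)

fig, ax = plt.subplots()
n, bin_edges, patches = ax.hist(data, bins=20, edgecolor='black')

# Midpoint of each bin: average of its left and right edge
bin_centers = (bin_edges[:-1] + bin_edges[1:]) / 2

# The tallest bar marks the modal bin
peak = int(np.argmax(n))
print(f'Modal bin: [{bin_edges[peak]:.1f}, {bin_edges[peak + 1]:.1f}) '
      f'with {int(n[peak])} values')

# Label the peak on the plot at the center of its bin
ax.annotate('peak', xy=(bin_centers[peak], n[peak]),
            xytext=(0, 8), textcoords='offset points', ha='center')
```

Call `plt.show()` afterwards to display the annotated figure.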
You can use patches to color individual bars based on their height or position:
n, bin_edges, patches = plt.hist(data, bins=30, edgecolor='black')
# Color bars based on height
for count, patch in zip(n, patches):
    if count > 50:
        patch.set_facecolor('#C44E52')
    else:
        patch.set_facecolor('#4C72B0')
plt.title('Conditional Bar Coloring')
plt.show()
plt.hist() Common Parameters Reference
| Parameter | Type | Description | Default |
|---|---|---|---|
| x | array-like | Input data | Required |
| bins | int, sequence, or str | Number of bins, bin edges, or algorithm name | 10 |
| range | tuple | Lower and upper range of the bins | (x.min(), x.max()) |
| density | bool | Normalize so area equals 1 | False |
| weights | array-like | Weight for each data point | None |
| cumulative | bool | Compute cumulative histogram | False |
| histtype | str | 'bar', 'barstacked', 'step', 'stepfilled' | 'bar' |
| orientation | str | 'vertical' or 'horizontal' | 'vertical' |
| color | color or list | Bar color(s) | None |
| edgecolor | color | Bar edge color | None |
| alpha | float | Transparency (0 to 1) | None |
| label | str | Label for the legend | None |
| stacked | bool | Stack multiple datasets | False |
| log | bool | Logarithmic y-axis | False |
| rwidth | float | Relative width of bars (0 to 1) | None |
| bottom | array-like or scalar | Baseline for each bar | 0 |
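The weights parameter is the one entry in this table that the earlier sections do not demonstrate. One common use, sketched here on synthetic data, is to give every point a weight of 100 / N so the y-axis reads as a percentage of all values rather than a raw count. The axis labels and seed are assumptions.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(42)
data = rng.normal(loc=50, scale=15, size=1000)

# Each point contributes 100 / N, so bar heights add up to 100 percent
weights = np.full(len(data), 100.0 / len(data))

fig, ax = plt.subplots()
n, bin_edges, _ = ax.hist(data, bins=30, weights=weights, edgecolor='black')
ax.set_xlabel('Value')
ax.set_ylabel('Share of values (%)')

print(f'Bar heights sum to {n.sum():.1f}')  # Bar heights sum to 100.0
```

Unlike density=True, this scaling does not depend on bin width, so each bar height can be read directly as "percent of values in this bin".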
plt.hist() vs sns.histplot(): When to Use Which
If you use seaborn alongside matplotlib, you may wonder which histogram function to use. Here is a direct comparison:
| Feature | plt.hist() | sns.histplot() |
|---|---|---|
| Library | matplotlib | seaborn |
| Input types | Array, list, Series | Array, Series, DataFrame column |
| KDE overlay | Manual (scipy needed) | Built-in (kde=True) |
| Default styling | Minimal | Publication-ready |
| Multiple groups | Pass list of arrays | hue parameter |
| Stat options | Count, density | Count, density, frequency, probability, percent |
| Bin algorithms | auto, fd, doane, scott, stone, rice, sturges, sqrt (via NumPy) | auto, fd, doane, scott, stone, rice, sturges, sqrt (via NumPy) |
| Log scale | log=True (log-scaled count axis) | log_scale=True (log-scaled data axis) |
| Categorical axis | Not supported | Supported via hue |
| Performance (large data) | Faster | Slightly slower |
| Customization depth | Full matplotlib API | Seaborn + matplotlib API |
Use plt.hist() when you need full control over every visual element, when working with subplots, or when seaborn is not available. Use sns.histplot() when you want KDE overlays, cleaner default styling, or need to split data by a categorical variable with minimal code.
Create Interactive Histograms with PyGWalker
Static histograms are great for reports and scripts, but during exploratory data analysis you often need to change bins, filter subsets, and switch between chart types rapidly. PyGWalker is an open-source Python library that turns any pandas or polars DataFrame into an interactive, drag-and-drop visualization interface directly inside Jupyter Notebook -- no frontend code required.
pip install pygwalker
import numpy as np
import pandas as pd
import pygwalker as pyg
# Load your dataset into a DataFrame
df = pd.DataFrame({
    'score': np.random.normal(70, 12, 2000),
    'group': np.random.choice(['A', 'B', 'C'], 2000)
})
# Launch the interactive UI
walker = pyg.walk(df)
Once the interface opens, drag score to the x-axis and PyGWalker automatically generates a histogram. You can adjust bin size, split by group using color encoding, switch to density mode, and export the resulting chart -- all without writing additional code. This is especially useful when you need to explore several variables quickly before writing the final matplotlib code for a report.
Frequently Asked Questions
How do I choose the right number of bins for a matplotlib histogram?
Start with bins='auto', which uses the maximum of the Sturges and Freedman-Diaconis methods. For data with outliers, use bins='fd'. For small datasets (under 200 points), bins='sturges' works well. You can also pass an integer and adjust by eye: increase the number if the distribution looks overly smooth, decrease it if the bars look noisy.
What is the difference between density=True and cumulative=True in plt.hist()?
density=True normalizes the histogram so the total area under all bars equals 1, converting the y-axis to probability density. cumulative=True makes each bar represent the sum of all previous bars plus itself. You can combine both: density=True, cumulative=True produces a cumulative distribution function where the last bar reaches 1.0.
How do I overlay two histograms in matplotlib?
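Call plt.hist() twice with the same bin edges and set alpha to a value less than 1 (e.g., 0.5 or 0.6) so both distributions remain visible. Passing the same integer to bins is not enough for an exact comparison, because each call then computes its edges from its own data range; pass one shared sequence of edges to both calls instead. Add label to each call and finish with plt.legend(). Using histtype='step' as an alternative avoids the need for transparency entirely since it draws only outlines.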
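One way to make the bars of both histograms line up bin for bin is to compute a single set of edges from the combined data and pass it to both calls. A minimal sketch, assuming two synthetic groups like the ones used earlier in this guide:

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(42)
group_a = rng.normal(loc=50, scale=10, size=800)
group_b = rng.normal(loc=65, scale=12, size=800)

# One set of edges spanning both groups, so bars line up bin for bin
shared_edges = np.histogram_bin_edges(np.concatenate([group_a, group_b]), bins=30)

fig, ax = plt.subplots()
_, edges_a, _ = ax.hist(group_a, bins=shared_edges, alpha=0.6, label='Group A')
_, edges_b, _ = ax.hist(group_b, bins=shared_edges, alpha=0.6, label='Group B')
ax.legend()
```

With a plain integer such as bins=30 in each call, the two histograms would generally end up with different bin widths, which makes bar heights hard to compare.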
Can plt.hist() handle pandas Series and DataFrame columns directly?
Yes. plt.hist() accepts any array-like input, including pandas Series. You can pass df['column_name'] directly. For plotting from a DataFrame using pandas' built-in method, use df['column_name'].plot.hist(bins=30), which wraps matplotlib under the hood.
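Both routes are sketched below on a hypothetical DataFrame (the column name, the data, and the injected missing value are assumptions). Dropping NaN values before the direct matplotlib call is a precaution, since missing values can leave the automatic bin range undefined.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

rng = np.random.default_rng(42)
# Hypothetical DataFrame with a numeric column that contains a missing value
df = pd.DataFrame({'score': rng.normal(70, 12, 500)})
df.loc[0, 'score'] = np.nan

fig, axes = plt.subplots(1, 2, figsize=(10, 4))

# Direct matplotlib call; dropping NaN first keeps the bin range well defined
clean = df['score'].dropna()
n, _, _ = axes[0].hist(clean, bins=30, edgecolor='black')
axes[0].set_title('ax.hist(Series)')

# pandas wrapper, drawn on the second axes
df['score'].plot.hist(bins=30, ax=axes[1], edgecolor='black')
axes[1].set_title('Series.plot.hist()')
```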
How do I save a matplotlib histogram as an image file?
After calling plt.hist(), use plt.savefig('histogram.png', dpi=150, bbox_inches='tight') before plt.show(). The bbox_inches='tight' parameter prevents labels from being cut off. Supported formats include PNG, PDF, SVG, and EPS.
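The same steps, sketched with the object-oriented interface so the figure being saved is explicit. The file name and data are assumptions; change the extension to .pdf or .svg to switch formats.

```python
import numpy as np
import matplotlib.pyplot as plt
from pathlib import Path

rng = np.random.default_rng(42)
data = rng.normal(loc=50, scale=15, size=1000)

fig, ax = plt.subplots()
ax.hist(data, bins=30, edgecolor='black')
ax.set_xlabel('Value')
ax.set_ylabel('Frequency')

# Save via the Figure object; the file extension selects the output format
out_path = Path('histogram.png')
fig.savefig(out_path, dpi=150, bbox_inches='tight')
plt.close(fig)  # free the figure once it is written

print(f'Saved {out_path} ({out_path.stat().st_size} bytes)')
```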