Matplotlib Histogram: The Complete Guide to plt.hist() in Python

Name: Soren Atelier

Updated on 2/9/2026

You have a dataset with thousands of numeric values -- ages, test scores, response times, sensor readings -- and you need to understand how those values are distributed. Are they clustered around a central point? Skewed toward one end? Do they follow a normal distribution? A scatter plot will not help. A bar chart is designed for categories, not continuous data. What you need is a histogram, and in Python, matplotlib.pyplot.hist() is the standard way to build one.

The problem is that plt.hist() has over a dozen parameters, and the default output often looks plain or misleading. Choosing the wrong number of bins can hide important patterns in your data. Comparing multiple distributions on one chart requires knowing the right combination of options. This guide covers every parameter that matters, with working code examples you can copy directly into your notebook or script.

What Is a Histogram and When Should You Use One?

A histogram divides a range of numeric values into intervals called bins (equal-width by default) and counts how many data points fall into each bin. The x-axis shows the value range, and the y-axis shows the frequency (count) or density for each bin. Unlike a bar chart, which displays categorical data, a histogram represents the distribution of continuous numerical data.

Use a histogram when you need to:

See the shape of a distribution (normal, skewed, bimodal, uniform)
Identify outliers or gaps in data
Compare the spread of values across groups
Decide on data transformations before modeling

For a complementary view, a seaborn boxplot condenses the same data into a five-number summary (median, quartiles, and whiskers, with outliers marked individually) in a more compact form.

Basic plt.hist() Syntax

The simplest histogram requires only one argument: the data array.

import matplotlib.pyplot as plt
import numpy as np
 
# Generate 1000 normally distributed values
np.random.seed(42)
data = np.random.normal(loc=50, scale=15, size=1000)
 
plt.hist(data)
plt.title('Basic Histogram')
plt.xlabel('Value')
plt.ylabel('Frequency')
plt.show()

By default, matplotlib divides the data into 10 bins. The function returns three objects: the bin counts, the bin edges, and the patch objects (the drawn rectangles). We will cover those return values in detail later.

Full Signature

plt.hist(x, bins=None, range=None, density=False, weights=None,
         cumulative=False, bottom=None, histtype='bar', align='mid',
         orientation='vertical', rwidth=None, log=False, color=None,
         label=None, stacked=False, *, data=None, **kwargs)
# edgecolor, alpha, and other patch properties are passed through **kwargs

Controlling Bins

The bins parameter is the single most important setting in a histogram. Too few bins hide patterns. Too many bins create noise.

Setting a Fixed Number of Bins

fig, axes = plt.subplots(1, 3, figsize=(14, 4))
 
axes[0].hist(data, bins=5, edgecolor='black')
axes[0].set_title('5 Bins')
 
axes[1].hist(data, bins=30, edgecolor='black')
axes[1].set_title('30 Bins')
 
axes[2].hist(data, bins=100, edgecolor='black')
axes[2].set_title('100 Bins')
 
plt.tight_layout()
plt.show()

With 5 bins, you see only a rough shape. With 100 bins, each bin holds so few points that random variation shows up as visual noise. For this dataset of 1,000 points, 30 bins produce a clear picture of the normal distribution.

Custom Bin Edges

Pass a sequence to bins to define exact boundaries:

custom_edges = [0, 20, 35, 50, 65, 80, 100]
plt.hist(data, bins=custom_edges, edgecolor='black', color='steelblue')
plt.title('Histogram with Custom Bin Edges')
plt.xlabel('Value')
plt.ylabel('Frequency')
plt.show()

This is useful when your data has meaningful thresholds -- letter grades, age brackets, or performance tiers.
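
To see how custom edges behave, here is a minimal sketch using np.histogram, which plt.hist() relies on for the counting. The data is synthetic and the bracket labels are made up for illustration. Each bin is half-open, [left, right), except the last one, which also includes its right edge. Values outside the outermost edges are not counted at all.

```python
import numpy as np

rng = np.random.default_rng(42)
scores = rng.normal(loc=50, scale=15, size=1000)  # synthetic data

edges = [0, 20, 35, 50, 65, 80, 100]
labels = ['0-20', '20-35', '35-50', '50-65', '65-80', '80-100']  # illustrative names

counts, _ = np.histogram(scores, bins=edges)

for label, count in zip(labels, counts):
    print(f'{label:>7}: {count}')

# Values below 0 or above 100 fall outside every bin and are dropped
in_range = np.count_nonzero((scores >= 0) & (scores <= 100))
print('counted:', counts.sum(), 'of', scores.size)
```

If the counted total is lower than the dataset size, widen the first or last edge so that no values are silently excluded.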

Automatic Bin Algorithms

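Matplotlib also accepts the name of a bin estimator as the bins value. The name is handed to NumPy (numpy.histogram_bin_edges), which derives a bin width from the data. The most commonly used estimators are listed below; 'doane', 'rice', and 'stone' are accepted as well: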

Algorithm	`bins=` Value	Method	Best For
Sturges	`'sturges'`	`1 + log2(n)`	Small, roughly normal datasets
Scott	`'scott'`	Based on standard deviation and n	Normal or near-normal data
Freedman-Diaconis	`'fd'`	Based on IQR and n	Robust to outliers
Square Root	`'sqrt'`	`sqrt(n)`	Quick rough estimate
Auto	`'auto'`	Max of Sturges and FD	General-purpose default

fig, axes = plt.subplots(1, 3, figsize=(14, 4))
 
for ax, method in zip(axes, ['sturges', 'scott', 'fd']):
    ax.hist(data, bins=method, edgecolor='black', color='#4C72B0')
    ax.set_title(f'bins="{method}"')
 
plt.tight_layout()
plt.show()

For most cases, bins='auto' is a solid starting point. Switch to 'fd' when your data contains outliers, since it uses the interquartile range instead of standard deviation.
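
Before plotting, it can help to check how many bins each estimator would actually produce. The sketch below uses np.histogram_bin_edges on synthetic data; the exact counts depend on your data and, for 'auto', possibly on your NumPy version.

```python
import numpy as np

rng = np.random.default_rng(0)
sample = rng.normal(loc=50, scale=15, size=1000)  # synthetic data

bin_counts = {}
for method in ['sturges', 'scott', 'fd', 'sqrt', 'auto']:
    # Edges are what plt.hist(sample, bins=method) would use
    edges = np.histogram_bin_edges(sample, bins=method)
    bin_counts[method] = len(edges) - 1
    print(f'{method:>8}: {bin_counts[method]} bins')
```

If two estimators disagree widely, that is usually a sign of outliers or heavy tails, and plotting both is a cheap way to compare.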

Normalized and Density Histograms

By default, the y-axis shows raw counts. Set density=True to normalize the histogram so that the total area under the bars equals 1. This converts the y-axis from frequency to probability density.

fig, axes = plt.subplots(1, 2, figsize=(12, 4))
 
axes[0].hist(data, bins=30, edgecolor='black', color='#55A868')
axes[0].set_title('Frequency (default)')
axes[0].set_ylabel('Count')
 
axes[1].hist(data, bins=30, edgecolor='black', color='#C44E52', density=True)
axes[1].set_title('Density (density=True)')
axes[1].set_ylabel('Probability Density')
 
plt.tight_layout()
plt.show()
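
A point that is easy to miss: with density=True the bar heights are densities, not probabilities. The probability mass of a bin is its height multiplied by its width. The following sketch, on synthetic data, checks that these products sum to 1.

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(42)
values = rng.normal(loc=50, scale=15, size=1000)  # synthetic data

fig, ax = plt.subplots()
heights, edges, _ = ax.hist(values, bins=30, density=True)

# Area of each bar = height * width; the areas add up to 1
widths = np.diff(edges)
area = np.sum(heights * widths)
print(f'total area: {area:.6f}')
print(f'tallest bar: {heights.max():.4f}')
plt.close(fig)  # replace with plt.show() to view the figure
```

Because of this, individual bars can legitimately exceed 1 when the bins are narrower than one unit.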

Density normalization is essential when you want to overlay a theoretical distribution curve or compare datasets of different sizes:

from scipy import stats
 
plt.hist(data, bins=30, density=True, edgecolor='black', color='#55A868', alpha=0.7)
 
# Overlay the theoretical normal curve
x_range = np.linspace(data.min(), data.max(), 200)
plt.plot(x_range, stats.norm.pdf(x_range, loc=50, scale=15), 'r-', linewidth=2, label='Normal PDF')
plt.legend()
plt.title('Density Histogram with Normal Curve Overlay')
plt.show()

Customizing Appearance

Color, Edge Color, and Transparency

plt.hist(data, bins=30, color='#4C72B0', edgecolor='white', alpha=0.85)
plt.title('Styled Histogram')
plt.xlabel('Value')
plt.ylabel('Frequency')
plt.show()

Histogram Types

The histtype parameter changes the visual style:

`histtype` Value	Description
`'bar'`	Traditional filled bars (default)
`'barstacked'`	Stacked bars for multiple datasets
`'step'`	Unfilled line outline
`'stepfilled'`	Filled area with step outline

fig, axes = plt.subplots(2, 2, figsize=(10, 8))
types = ['bar', 'barstacked', 'step', 'stepfilled']
 
for ax, ht in zip(axes.flat, types):
    ax.hist(data, bins=30, histtype=ht, edgecolor='black', color='#4C72B0')
    ax.set_title(f'histtype="{ht}"')
 
plt.tight_layout()
plt.show()

The 'step' type is particularly useful when overlaying multiple distributions, since unfilled outlines do not obscure each other.

Multiple Histograms on One Plot

Overlapping Histograms

Use alpha (transparency) to layer two or more distributions:

np.random.seed(42)
group_a = np.random.normal(loc=50, scale=10, size=800)
group_b = np.random.normal(loc=65, scale=12, size=800)
 
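# Shared bin edges so both groups are binned identically
shared_bins = np.histogram_bin_edges(np.concatenate([group_a, group_b]), bins=30)
 
plt.hist(group_a, bins=shared_bins, alpha=0.6, color='#4C72B0', edgecolor='black', label='Group A')
plt.hist(group_b, bins=shared_bins, alpha=0.6, color='#C44E52', edgecolor='black', label='Group B')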
plt.legend()
plt.title('Overlapping Histograms')
plt.xlabel('Score')
plt.ylabel('Frequency')
plt.show()
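
One subtlety when overlaying: an integer bins value is resolved against each dataset's own range, so two separate calls with bins=30 generally end up with different edges and bar widths. The sketch below, on synthetic data, checks this and computes one set of edges that spans both groups.

```python
import numpy as np

rng = np.random.default_rng(42)
group_a = rng.normal(loc=50, scale=10, size=800)
group_b = rng.normal(loc=65, scale=12, size=800)

# Same integer, different data ranges -> different edges
edges_a = np.histogram_bin_edges(group_a, bins=30)
edges_b = np.histogram_bin_edges(group_b, bins=30)
print('same edges?', np.allclose(edges_a, edges_b))

# One set of edges spanning both groups
shared = np.histogram_bin_edges(np.concatenate([group_a, group_b]), bins=30)
counts_a, _ = np.histogram(group_a, bins=shared)
counts_b, _ = np.histogram(group_b, bins=shared)
print('shared bin width:', round(shared[1] - shared[0], 3))
```

Passing the shared array as bins to each plt.hist() call keeps the bars aligned, which makes heights directly comparable.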

Side-by-Side Histograms

Pass a list of arrays to plot them with grouped bars:

plt.hist([group_a, group_b], bins=20, color=['#4C72B0', '#C44E52'],
         edgecolor='black', label=['Group A', 'Group B'])
plt.legend()
plt.title('Side-by-Side Histograms')
plt.xlabel('Score')
plt.ylabel('Frequency')
plt.show()

When you pass a list of arrays, matplotlib places the bars for each dataset next to each other within each bin.

Stacked Histograms

Set stacked=True to stack one dataset on top of another. This shows both the individual distributions and their combined total.

np.random.seed(42)
freshmen = np.random.normal(loc=68, scale=8, size=500)
sophomores = np.random.normal(loc=72, scale=7, size=400)
juniors = np.random.normal(loc=75, scale=6, size=300)
 
plt.hist([freshmen, sophomores, juniors], bins=25, stacked=True,
         color=['#4C72B0', '#55A868', '#C44E52'], edgecolor='black',
         label=['Freshmen', 'Sophomores', 'Juniors'])
plt.legend()
plt.title('Stacked Histogram: Exam Scores by Class Year')
plt.xlabel('Score')
plt.ylabel('Frequency')
plt.show()

Stacked histograms work well when you want to show how sub-groups contribute to an overall distribution. However, they become hard to read with more than three or four groups.
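
It is worth knowing what stacked=True returns. In the matplotlib versions this sketch was written against, the returned arrays hold the running totals (the tops of the stacked layers), so the last array is the combined histogram. Treat that as an observation to confirm on your own version. The data is synthetic.

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(42)
freshmen = rng.normal(loc=68, scale=8, size=500)
sophomores = rng.normal(loc=72, scale=7, size=400)
juniors = rng.normal(loc=75, scale=6, size=300)

fig, ax = plt.subplots()
tops, edges, _ = ax.hist([freshmen, sophomores, juniors], bins=25, stacked=True)

# Each row is the top of that layer; the last row is the overall total
print('total in last layer:', int(tops[-1].sum()))

# Recover per-group counts by differencing consecutive layers
per_group = np.diff(tops, axis=0, prepend=0)
print('per-group sizes:', per_group.sum(axis=1).astype(int))
plt.close(fig)  # replace with plt.show() to view the figure
```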

Cumulative Histograms

Set cumulative=True to show how values accumulate from left to right. The last bar reaches the total count (or 1.0 if density=True).

fig, axes = plt.subplots(1, 2, figsize=(12, 4))
 
axes[0].hist(data, bins=30, cumulative=True, edgecolor='black', color='#DD8452')
axes[0].set_title('Cumulative Histogram (Count)')
axes[0].set_ylabel('Cumulative Count')
 
axes[1].hist(data, bins=30, cumulative=True, density=True, edgecolor='black', color='#8172B3')
axes[1].set_title('Cumulative Histogram (Density)')
axes[1].set_ylabel('Cumulative Probability')
 
plt.tight_layout()
plt.show()

Cumulative histograms are useful for answering questions like "What percentage of values fall below 60?" by reading directly from the y-axis.
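
The same answer can be read from the returned values rather than from the chart. The sketch below assumes synthetic data and picks bin edges every 5 units so that 60 is an exact bin boundary; it then compares the histogram reading with a direct calculation.

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(42)
values = rng.normal(loc=50, scale=15, size=1000)  # synthetic data

# Edges every 5 units so that 60 is an exact bin boundary
edges = np.arange(0, 105, 5)
inside = values[(values >= edges[0]) & (values <= edges[-1])]

fig, ax = plt.subplots()
cum, edges_out, _ = ax.hist(inside, bins=edges, cumulative=True, density=True)

# cum[i] is the fraction of values below edges_out[i + 1]
idx = np.searchsorted(edges_out, 60) - 1
print(f'share below 60 (histogram): {cum[idx]:.3f}')
print(f'share below 60 (direct):    {np.mean(inside < 60):.3f}')
plt.close(fig)  # replace with plt.show() to view the figure
```

If the threshold does not coincide with a bin edge, the reading is only accurate to within one bin, so choose edges that include the values you care about.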

Horizontal Histograms

Set orientation='horizontal' to flip the axes. This is helpful when value labels are long or when you want to place the histogram alongside another vertical chart.

plt.hist(data, bins=30, orientation='horizontal', color='#64B5CD', edgecolor='black')
plt.title('Horizontal Histogram')
plt.xlabel('Frequency')
plt.ylabel('Value')
plt.show()

plt.hist() Return Values

plt.hist() returns three values that give you programmatic access to the histogram data:

n, bin_edges, patches = plt.hist(data, bins=20, edgecolor='black', color='#4C72B0')
plt.show()
 
print(f"Bin counts (n): shape = {n.shape}, first 5 = {n[:5]}")
print(f"Bin edges: shape = {bin_edges.shape}, first 5 = {bin_edges[:5]}")
print(f"Patches: {len(patches)} Rectangle objects")

Return Value	Type	Description
`n`	ndarray	Count (or density) for each bin
`bin_edges`	ndarray	Edge values for each bin (length = len(n) + 1)
`patches`	BarContainer (or list of Polygon for the step types)	The matplotlib artists drawn for the bars
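
As a small sketch of what the returned arrays are good for, the following computes bin centers, finds the modal bin, and estimates the mean from the binned data. The data is synthetic.

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(42)
values = rng.normal(loc=50, scale=15, size=1000)  # synthetic data

fig, ax = plt.subplots()
n, bin_edges, patches = ax.hist(values, bins=20)

# Midpoint of each bin: average of its left and right edge
centers = (bin_edges[:-1] + bin_edges[1:]) / 2

# Index of the tallest bar, i.e. the modal bin
peak = int(np.argmax(n))
print(f'modal bin: [{bin_edges[peak]:.1f}, {bin_edges[peak + 1]:.1f}) '
      f'with {int(n[peak])} values')

# Mean estimated from bin centers weighted by counts
binned_mean = np.average(centers, weights=n)
print('estimated mean from bins:', round(binned_mean, 2))
plt.close(fig)  # replace with plt.show() to view the figure
```

Bin centers are also the natural x-values if you want to draw a line or markers on top of the bars.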

You can use patches to color individual bars based on their height or position (for guidance on choosing colors, see the matplotlib colormap reference):

n, bin_edges, patches = plt.hist(data, bins=30, edgecolor='black')
 
# Color bars based on height
for count, patch in zip(n, patches):
    if count > 50:
        patch.set_facecolor('#C44E52')
    else:
        patch.set_facecolor('#4C72B0')
 
plt.title('Conditional Bar Coloring')
plt.show()

plt.hist() Common Parameters Reference

Parameter	Type	Description	Default
`x`	array-like	Input data	Required
`bins`	int, sequence, or str	Number of bins, bin edges, or algorithm name	`None` (uses `rcParams['hist.bins']`, 10 by default)
`range`	tuple	Lower and upper range of the bins	`None` (uses `(x.min(), x.max())`)
`density`	bool	Normalize so area equals 1	`False`
`weights`	array-like	Weight for each data point	`None`
`cumulative`	bool	Compute cumulative histogram	`False`
`histtype`	str	`'bar'`, `'barstacked'`, `'step'`, `'stepfilled'`	`'bar'`
`orientation`	str	`'vertical'` or `'horizontal'`	`'vertical'`
`color`	color or list	Bar color(s)	`None`
`edgecolor`	color	Bar edge color	`None`
`alpha`	float	Transparency (0 to 1)	`None`
`label`	str	Label for the legend	`None`
`stacked`	bool	Stack multiple datasets	`False`
`log`	bool	Logarithmic scale on the count axis	`False`
`rwidth`	float	Relative width of bars (0 to 1)	`None`
`bottom`	array-like or scalar	Baseline for each bar	`None` (treated as 0)
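
Two parameters in the table, weights and range, are not demonstrated elsewhere in this guide. The sketch below, on synthetic data, gives every point a weight of 1/N so that bar heights become proportions, and restricts the bins to a fixed range. Values outside range are ignored, so the heights sum to the share of data inside it.

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(42)
values = rng.normal(loc=50, scale=15, size=1000)  # synthetic data

# Each point contributes 1/N, so bar heights are proportions
weights = np.full(values.shape, 1 / values.size)

fig, ax = plt.subplots()
props, edges, _ = ax.hist(values, bins=20, range=(0, 100), weights=weights)

share_in_range = np.mean((values >= 0) & (values <= 100))
print(f'sum of bar heights: {props.sum():.3f}')
print(f'share of data in range: {share_in_range:.3f}')
plt.close(fig)  # replace with plt.show() to view the figure
```

Unlike density=True, this makes each bar's height the fraction of points in that bin regardless of bin width.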

plt.hist() vs sns.histplot(): When to Use Which

If you use seaborn alongside matplotlib, you may wonder which histogram function to use. Here is a direct comparison:

Feature	`plt.hist()`	`sns.histplot()`
Library	matplotlib	seaborn
Input types	Array, list, Series	Array, Series, DataFrame column
KDE overlay	Manual (scipy needed)	Built-in (`kde=True`)
Default styling	Minimal	Publication-ready
Multiple groups	Pass list of arrays	`hue` parameter
Stat options	Count, density	Count, density, frequency, probability, percent
Bin algorithms	auto, fd, doane, scott, stone, rice, sturges, sqrt (via NumPy)	auto, fd, doane, scott, stone, rice, sturges, sqrt (via NumPy)
Log scale	`log=True`	`log_scale=True`
Categorical axis	Not supported	Supported via `hue`
Performance (large data)	Faster	Slightly slower
Customization depth	Full matplotlib API	Seaborn + matplotlib API

Use plt.hist() when you need full control over every visual element, when working with subplots, or when seaborn is not available. Use sns.histplot() when you want KDE overlays, cleaner default styling, or need to split data by a categorical variable with minimal code.
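
For reference, the "manual" KDE overlay with plt.hist() can be sketched as follows. It assumes scipy is installed and uses scipy.stats.gaussian_kde with its default bandwidth on synthetic data; the histogram must use density=True so both layers share the same y-scale.

```python
import matplotlib.pyplot as plt
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
values = rng.normal(loc=50, scale=15, size=1000)  # synthetic data

kde = stats.gaussian_kde(values)  # default bandwidth (Scott's rule)
grid = np.linspace(values.min(), values.max(), 200)
kde_curve = kde(grid)

fig, ax = plt.subplots()
ax.hist(values, bins=30, density=True, alpha=0.6, edgecolor='black')
ax.plot(grid, kde_curve, linewidth=2, label='KDE')
ax.legend()
plt.close(fig)  # replace with plt.show() to view the figure
```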

Create Interactive Histograms with PyGWalker

Static histograms are great for reports and scripts, but during exploratory data analysis you often need to change bins, filter subsets, and switch between chart types rapidly. PyGWalker is an open-source Python library that turns any pandas or polars DataFrame into an interactive, drag-and-drop visualization interface directly inside Jupyter Notebook -- no frontend code required.

pip install pygwalker

import numpy as np
import pandas as pd
import pygwalker as pyg
 
# Load your dataset into a DataFrame
df = pd.DataFrame({
    'score': np.random.normal(70, 12, 2000),
    'group': np.random.choice(['A', 'B', 'C'], 2000)
})
 
# Launch the interactive UI
walker = pyg.walk(df)

Once the interface opens, drag score to the x-axis and PyGWalker automatically generates a histogram. You can adjust bin size, split by group using color encoding, switch to density mode, and export the resulting chart -- all without writing additional code. This is especially useful when you need to explore several variables quickly before writing the final matplotlib code for a report.

Frequently Asked Questions

How do I choose the right number of bins for a matplotlib histogram?

Start with bins='auto', which uses the maximum of the Sturges and Freedman-Diaconis methods. For data with outliers, use bins='fd'. For small datasets (under 200 points), bins='sturges' works well. You can also pass an integer and adjust by eye: increase the number if the distribution looks overly smooth, decrease it if the bars look noisy.

What is the difference between density=True and cumulative=True in plt.hist()?

density=True normalizes the histogram so the total area under all bars equals 1, converting the y-axis to probability density. cumulative=True makes each bar represent the sum of all previous bars plus itself. You can combine both: density=True, cumulative=True produces a cumulative distribution function where the last bar reaches 1.0.

How do I overlay two histograms in matplotlib?

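Call plt.hist() twice with the same bin edges and set alpha to a value less than 1 (e.g., 0.5 or 0.6) so both distributions remain visible. Note that an integer bins value is resolved separately against each dataset's range, so pass an explicit array of edges (or the same range) to both calls. Add label to each call and finish with plt.legend(). Using histtype='step' as an alternative avoids the need for transparency entirely since it draws only outlines.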

Can plt.hist() handle pandas Series and DataFrame columns directly?

Yes. plt.hist() accepts any array-like input, including pandas Series. You can pass df['column_name'] directly. For plotting from a DataFrame using pandas' built-in method, use df['column_name'].plot.hist(bins=30), which wraps matplotlib under the hood.
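
A short sketch of passing a Series, assuming a hypothetical DataFrame with a score column. Missing values are dropped explicitly here as a conservative step, so the result does not depend on how NaN is handled downstream.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
df = pd.DataFrame({'score': rng.normal(70, 12, 500)})  # hypothetical data
df.loc[::50, 'score'] = np.nan  # inject a few missing values

clean = df['score'].dropna()

fig, ax = plt.subplots()
n, edges, _ = ax.hist(clean, bins=30)
print(f'plotted {int(n.sum())} of {len(df)} rows')
plt.close(fig)  # replace with plt.show() to view the figure
```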

How do I save a matplotlib histogram as an image file?

After calling plt.hist(), use plt.savefig('histogram.png', dpi=150, bbox_inches='tight') before plt.show(). The bbox_inches='tight' parameter prevents labels from being cut off. Supported formats include PNG, PDF, SVG, and EPS. For a full guide on export issues, see matplotlib savefig.
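
A minimal sketch of the save step, using the figure object directly so it does not depend on which figure is current. The temporary directory is only there to keep the example from cluttering your working directory; in practice you would pass your own path.

```python
import os
import tempfile

import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(42)
values = rng.normal(loc=50, scale=15, size=1000)  # synthetic data

fig, ax = plt.subplots()
ax.hist(values, bins=30, edgecolor='black')
ax.set_xlabel('Value')
ax.set_ylabel('Frequency')

# Save before showing or closing the figure
out_dir = tempfile.mkdtemp()
out_path = os.path.join(out_dir, 'histogram.png')
fig.savefig(out_path, dpi=150, bbox_inches='tight')
plt.close(fig)

print('saved', os.path.getsize(out_path), 'bytes to', out_path)
```

Changing the file extension to .pdf or .svg switches the output format without any other changes.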

Related Guides

Seaborn Histogram -- higher-level histogram API with built-in KDE overlay, hue grouping, and cleaner defaults.
Seaborn Boxplot -- visualize distribution summary statistics as a compact alternative to histograms.
Matplotlib Subplots -- arrange multiple histograms in grid layouts for side-by-side comparison.
Matplotlib Colormap -- apply meaningful color schemes when coloring bars by value.
Pandas Value Counts -- quick frequency analysis before deciding on bin edges.
