Matplotlib Histogram: The Complete Guide to plt.hist() in Python

You have a dataset with thousands of numeric values -- ages, test scores, response times, sensor readings -- and you need to understand how those values are distributed. Are they clustered around a central point? Skewed toward one end? Do they follow a normal distribution? A scatter plot will not help. A bar chart is designed for categories, not continuous data. What you need is a histogram, and in Python, matplotlib.pyplot.hist() is the standard way to build one.

The problem is that plt.hist() has over a dozen parameters, and the default output often looks plain or misleading. Choosing the wrong number of bins can hide important patterns in your data. Comparing multiple distributions on one chart requires knowing the right combination of options. This guide covers every parameter that matters, with working code examples you can copy directly into your notebook or script.

What Is a Histogram and When Should You Use One?

A histogram divides a range of numeric values into intervals called bins (usually of equal width) and counts how many data points fall into each bin. The x-axis shows the value range, and the y-axis shows the frequency (count) or density for each bin. Unlike a bar chart, which displays categorical data, a histogram represents the distribution of continuous numerical data.
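To make the counting concrete, here is a minimal sketch using NumPy's np.histogram, which computes bin counts without drawing anything. The ten sample values and the choice of three bins are made up for illustration.

```python
import numpy as np

# Ten hypothetical measurements (illustrative values only)
values = np.array([1, 2, 2, 3, 5, 7, 8, 8, 9, 10])

# Split the range 1..10 into three equal-width bins and count the members
counts, edges = np.histogram(values, bins=3, range=(1, 10))

print(edges)   # bin boundaries
print(counts)  # number of values that fell into each bin
```

Each bin includes its left edge and excludes its right edge, except the last bin, which includes both.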

Use a histogram when you need to:

  • See the shape of a distribution (normal, skewed, bimodal, uniform)
  • Identify outliers or gaps in data
  • Compare the spread of values across groups
  • Decide on data transformations before modeling

Basic plt.hist() Syntax

The simplest histogram requires only one argument: the data array.

import matplotlib.pyplot as plt
import numpy as np
 
# Generate 1000 normally distributed values
np.random.seed(42)
data = np.random.normal(loc=50, scale=15, size=1000)
 
plt.hist(data)
plt.title('Basic Histogram')
plt.xlabel('Value')
plt.ylabel('Frequency')
plt.show()

By default, matplotlib divides the data into 10 bins. The function returns three objects: the bin counts, the bin edges, and the patch objects (the drawn rectangles). We will cover those return values in detail later.

Full Signature

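plt.hist(x, bins=None, range=None, density=False, weights=None,
         cumulative=False, bottom=None, histtype='bar', align='mid',
         orientation='vertical', rwidth=None, log=False, color=None,
         label=None, stacked=False, *, data=None, **kwargs)

Styling options such as edgecolor and alpha are not named parameters; they travel through **kwargs to the patches that draw the bars.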

Controlling Bins

The bins parameter is the single most important setting in a histogram. Too few bins hide patterns. Too many bins create noise.

Setting a Fixed Number of Bins

fig, axes = plt.subplots(1, 3, figsize=(14, 4))
 
axes[0].hist(data, bins=5, edgecolor='black')
axes[0].set_title('5 Bins')
 
axes[1].hist(data, bins=30, edgecolor='black')
axes[1].set_title('30 Bins')
 
axes[2].hist(data, bins=100, edgecolor='black')
axes[2].set_title('100 Bins')
 
plt.tight_layout()
plt.show()

With 5 bins, you see only a rough shape. With 100 bins, small sample sizes per bin introduce visual noise. For this dataset of 1,000 points, 30 bins produces a clear picture of the normal distribution.

Custom Bin Edges

Pass a sequence to bins to define exact boundaries:

custom_edges = [0, 20, 35, 50, 65, 80, 100]
plt.hist(data, bins=custom_edges, edgecolor='black', color='steelblue')
plt.title('Histogram with Custom Bin Edges')
plt.xlabel('Value')
plt.ylabel('Frequency')
plt.show()

This is useful when your data has meaningful thresholds -- letter grades, age brackets, or performance tiers.
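One thing to watch with custom edges is that values outside the first and last edge are not counted at all. The sketch below checks this with np.histogram on the same kind of sample data used above; the variable n_outside is introduced here only for illustration.

```python
import numpy as np

np.random.seed(42)
data = np.random.normal(loc=50, scale=15, size=1000)

custom_edges = [0, 20, 35, 50, 65, 80, 100]
counts, edges = np.histogram(data, bins=custom_edges)

# Values below the first edge or above the last edge land in no bin
n_outside = int(np.sum((data < custom_edges[0]) | (data > custom_edges[-1])))

print(f"Binned: {int(counts.sum())}, outside the edges: {n_outside}")
```

If the number outside is not zero, consider widening the outer edges so the chart does not silently drop data.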

Automatic Bin Algorithms

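Matplotlib accepts the name of a bin-selection rule as a string and passes it to NumPy, which estimates a reasonable bin width from the data. The most common rules are listed below; NumPy also recognizes 'doane', 'rice', and 'stone'.

| Algorithm | bins= Value | Method | Best For |
| --- | --- | --- | --- |
| Sturges | 'sturges' | 1 + log2(n) | Small, roughly normal datasets |
| Scott | 'scott' | Based on standard deviation and n | Normal or near-normal data |
| Freedman-Diaconis | 'fd' | Based on IQR and n | Robust to outliers |
| Square Root | 'sqrt' | sqrt(n) | Quick rough estimate |
| Auto | 'auto' | Max of Sturges and FD | General-purpose default |
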
fig, axes = plt.subplots(1, 3, figsize=(14, 4))
 
for ax, method in zip(axes, ['sturges', 'scott', 'fd']):
    ax.hist(data, bins=method, edgecolor='black', color='#4C72B0')
    ax.set_title(f'bins="{method}"')
 
plt.tight_layout()
plt.show()

For most cases, bins='auto' is a solid starting point. Switch to 'fd' when your data contains outliers, since it uses the interquartile range instead of standard deviation.
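To see how many bins each rule would pick before plotting, one option is to call np.histogram_bin_edges directly. This is a small sketch on the same sample data; the exact counts depend on the data and may vary slightly between NumPy versions.

```python
import numpy as np

np.random.seed(42)
data = np.random.normal(loc=50, scale=15, size=1000)

bin_counts = {}
for method in ['sturges', 'scott', 'fd', 'sqrt', 'auto']:
    # The returned array holds the edges, so there is one fewer bin than edges
    edges = np.histogram_bin_edges(data, bins=method)
    bin_counts[method] = len(edges) - 1
    print(f"{method:>8}: {bin_counts[method]} bins")
```

Comparing the numbers side by side makes it easier to judge whether a rule is over- or under-smoothing for a given dataset.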

Normalized and Density Histograms

By default, the y-axis shows raw counts. Set density=True to normalize the histogram so that the total area under the bars equals 1. This converts the y-axis from frequency to probability density.

fig, axes = plt.subplots(1, 2, figsize=(12, 4))
 
axes[0].hist(data, bins=30, edgecolor='black', color='#55A868')
axes[0].set_title('Frequency (default)')
axes[0].set_ylabel('Count')
 
axes[1].hist(data, bins=30, edgecolor='black', color='#C44E52', density=True)
axes[1].set_title('Density (density=True)')
axes[1].set_ylabel('Probability Density')
 
plt.tight_layout()
plt.show()
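A quick way to confirm what density=True means is to multiply each bar height by its bin width and add up the results. The sketch below does this with np.histogram, which supports the same density option.

```python
import numpy as np

np.random.seed(42)
data = np.random.normal(loc=50, scale=15, size=1000)

density, edges = np.histogram(data, bins=30, density=True)
widths = np.diff(edges)

# Area of each bar is height * width; the areas together should total 1
area = np.sum(density * widths)
print(f"Total area under the bars: {area:.6f}")
```

Note that it is the area, not the bar heights, that sums to 1, so individual heights can exceed 1 when bins are narrower than one unit.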

Density normalization is essential when you want to overlay a theoretical distribution curve or compare datasets of different sizes:

from scipy import stats
 
plt.hist(data, bins=30, density=True, edgecolor='black', color='#55A868', alpha=0.7)
 
# Overlay the theoretical normal curve
x_range = np.linspace(data.min(), data.max(), 200)
plt.plot(x_range, stats.norm.pdf(x_range, loc=50, scale=15), 'r-', linewidth=2, label='Normal PDF')
plt.legend()
plt.title('Density Histogram with Normal Curve Overlay')
plt.show()
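With real data the true mean and standard deviation are usually unknown. A possible variation, sketched here under the assumption that a normal model is appropriate, estimates both from the sample and writes the normal density out with NumPy so that scipy is not required.

```python
import numpy as np
import matplotlib.pyplot as plt

np.random.seed(42)
data = np.random.normal(loc=50, scale=15, size=1000)

# Estimate the parameters from the sample instead of assuming them
mu = data.mean()
sigma = data.std(ddof=1)  # sample standard deviation

x_range = np.linspace(data.min(), data.max(), 200)
# Normal probability density function written out explicitly
pdf = np.exp(-0.5 * ((x_range - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

plt.hist(data, bins=30, density=True, edgecolor='black', color='#55A868', alpha=0.7)
plt.plot(x_range, pdf, 'r-', linewidth=2,
         label=f'Fitted normal (mean={mu:.1f}, sd={sigma:.1f})')
plt.legend()
plt.title('Density Histogram with Fitted Normal Curve')
plt.show()
```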

Customizing Appearance

Color, Edge Color, and Transparency

plt.hist(data, bins=30, color='#4C72B0', edgecolor='white', alpha=0.85)
plt.title('Styled Histogram')
plt.xlabel('Value')
plt.ylabel('Frequency')
plt.show()

Histogram Types

The histtype parameter changes the visual style:

| histtype Value | Description |
| --- | --- |
| 'bar' | Traditional filled bars (default) |
| 'barstacked' | Stacked bars for multiple datasets |
| 'step' | Unfilled line outline |
| 'stepfilled' | Filled area with step outline |

fig, axes = plt.subplots(2, 2, figsize=(10, 8))
types = ['bar', 'barstacked', 'step', 'stepfilled']
 
for ax, ht in zip(axes.flat, types):
    ax.hist(data, bins=30, histtype=ht, edgecolor='black', color='#4C72B0')
    ax.set_title(f'histtype="{ht}"')
 
plt.tight_layout()
plt.show()

The 'step' type is particularly useful when overlaying multiple distributions, since unfilled outlines do not obscure each other.

Multiple Histograms on One Plot

Overlapping Histograms

Use alpha (transparency) to layer two or more distributions:

np.random.seed(42)
group_a = np.random.normal(loc=50, scale=10, size=800)
group_b = np.random.normal(loc=65, scale=12, size=800)
 
plt.hist(group_a, bins=30, alpha=0.6, color='#4C72B0', edgecolor='black', label='Group A')
plt.hist(group_b, bins=30, alpha=0.6, color='#C44E52', edgecolor='black', label='Group B')
plt.legend()
plt.title('Overlapping Histograms')
plt.xlabel('Score')
plt.ylabel('Frequency')
plt.show()
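When each call receives bins=30, matplotlib derives the 30 bins from that group's own minimum and maximum, so the two sets of bars can have different widths and positions. If the bars should line up, one approach is to compute a single set of edges from the combined data and pass it to both calls. The name shared_edges is introduced here for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

np.random.seed(42)
group_a = np.random.normal(loc=50, scale=10, size=800)
group_b = np.random.normal(loc=65, scale=12, size=800)

# One set of 30 bins that spans both groups
shared_edges = np.histogram_bin_edges(np.concatenate([group_a, group_b]), bins=30)

n_a, edges_a, _ = plt.hist(group_a, bins=shared_edges, alpha=0.6,
                           color='#4C72B0', edgecolor='black', label='Group A')
n_b, edges_b, _ = plt.hist(group_b, bins=shared_edges, alpha=0.6,
                           color='#C44E52', edgecolor='black', label='Group B')
plt.legend()
plt.title('Overlapping Histograms with Shared Bins')
plt.xlabel('Score')
plt.ylabel('Frequency')
plt.show()
```

With identical edges, a taller bar in one group can be compared directly with the bar behind it.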

Side-by-Side Histograms

Pass a list of arrays to plot them with grouped bars:

plt.hist([group_a, group_b], bins=20, color=['#4C72B0', '#C44E52'],
         edgecolor='black', label=['Group A', 'Group B'])
plt.legend()
plt.title('Side-by-Side Histograms')
plt.xlabel('Score')
plt.ylabel('Frequency')
plt.show()

When you pass a list of arrays, matplotlib places the bars for each dataset next to each other within each bin.

Stacked Histograms

Set stacked=True to stack one dataset on top of another. This shows both the individual distributions and their combined total.

np.random.seed(42)
freshmen = np.random.normal(loc=68, scale=8, size=500)
sophomores = np.random.normal(loc=72, scale=7, size=400)
juniors = np.random.normal(loc=75, scale=6, size=300)
 
plt.hist([freshmen, sophomores, juniors], bins=25, stacked=True,
         color=['#4C72B0', '#55A868', '#C44E52'], edgecolor='black',
         label=['Freshmen', 'Sophomores', 'Juniors'])
plt.legend()
plt.title('Stacked Histogram: Exam Scores by Class Year')
plt.xlabel('Score')
plt.ylabel('Frequency')
plt.show()

Stacked histograms work well when you want to show how sub-groups contribute to an overall distribution. However, they become hard to read with more than three or four groups.
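The values returned by a stacked histogram reflect the stacking: each row of counts includes the datasets drawn beneath it. The sketch below, which closes the figure because only the numbers are of interest, reads the combined total per bin from the last row.

```python
import numpy as np
import matplotlib.pyplot as plt

np.random.seed(42)
freshmen = np.random.normal(loc=68, scale=8, size=500)
sophomores = np.random.normal(loc=72, scale=7, size=400)
juniors = np.random.normal(loc=75, scale=6, size=300)

n, bin_edges, _ = plt.hist([freshmen, sophomores, juniors], bins=25, stacked=True)
plt.close()  # only the returned numbers are needed here

# n[0] is freshmen alone; n[-1] is the top of the stack (all three groups)
combined = n[-1]
print(f"Freshmen counted: {int(n[0].sum())}, all groups: {int(combined.sum())}")
```

To recover the per-group counts from stacked output, subtract consecutive rows, for example n[1] - n[0] for sophomores.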

Cumulative Histograms

Set cumulative=True to show how values accumulate from left to right. The last bar reaches the total count (or 1.0 if density=True).

fig, axes = plt.subplots(1, 2, figsize=(12, 4))
 
axes[0].hist(data, bins=30, cumulative=True, edgecolor='black', color='#DD8452')
axes[0].set_title('Cumulative Histogram (Count)')
axes[0].set_ylabel('Cumulative Count')
 
axes[1].hist(data, bins=30, cumulative=True, density=True, edgecolor='black', color='#8172B3')
axes[1].set_title('Cumulative Histogram (Density)')
axes[1].set_ylabel('Cumulative Probability')
 
plt.tight_layout()
plt.show()

Cumulative histograms are useful for answering questions like "What percentage of values fall below 60?" by reading directly from the y-axis.
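The same question can be answered numerically. In this sketch the bin edges are chosen as multiples of 5 so that 60 is itself an edge (an assumption made for the example); the cumulative proportion at that edge then matches a direct count of values below 60.

```python
import numpy as np

np.random.seed(42)
data = np.random.normal(loc=50, scale=15, size=1000)

# Edges at multiples of 5 that cover the whole sample
start = np.floor(data.min() / 5) * 5
stop = np.ceil(data.max() / 5) * 5
edges = np.arange(start, stop + 5, 5)

counts, _ = np.histogram(data, bins=edges)
cumulative = np.cumsum(counts) / data.size  # running proportion per bin

# The bin that ends at 60 sits just before the position of 60 in the edges
idx = np.searchsorted(edges, 60)
below_60_from_hist = cumulative[idx - 1]
below_60_direct = np.mean(data < 60)

print(f"From histogram: {below_60_from_hist:.3f}, direct: {below_60_direct:.3f}")
```

When the threshold is not a bin edge, the value read from the chart is only an approximation, so the direct calculation is the safer one to report.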

Horizontal Histograms

Set orientation='horizontal' to flip the axes. This is helpful when value labels are long or when you want to place the histogram alongside another vertical chart.

plt.hist(data, bins=30, orientation='horizontal', color='#64B5CD', edgecolor='black')
plt.title('Horizontal Histogram')
plt.xlabel('Frequency')
plt.ylabel('Value')
plt.show()

plt.hist() Return Values

plt.hist() returns three values that give you programmatic access to the histogram data:

n, bin_edges, patches = plt.hist(data, bins=20, edgecolor='black', color='#4C72B0')
plt.show()
 
print(f"Bin counts (n): shape = {n.shape}, first 5 = {n[:5]}")
print(f"Bin edges: shape = {bin_edges.shape}, first 5 = {bin_edges[:5]}")
print(f"Patches: {len(patches)} Rectangle objects")
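
| Return Value | Type | Description |
| --- | --- | --- |
| n | ndarray | Count (or density) for each bin |
| bin_edges | ndarray | Edge values for each bin (length = len(n) + 1) |
| patches | BarContainer (list of Polygon for step types) | The drawn artists; for bar types, one Rectangle per bar |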

You can use patches to color individual bars based on their height or position:

n, bin_edges, patches = plt.hist(data, bins=30, edgecolor='black')
 
# Color bars based on height
for count, patch in zip(n, patches):
    if count > 50:
        patch.set_facecolor('#C44E52')
    else:
        patch.set_facecolor('#4C72B0')
 
plt.title('Conditional Bar Coloring')
plt.show()
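The returned arrays are also handy for calculations. As a small sketch, the bin centers can be derived from the edges, and the most populated bin located with np.argmax; the names centers and peak are introduced here for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

np.random.seed(42)
data = np.random.normal(loc=50, scale=15, size=1000)

n, bin_edges, patches = plt.hist(data, bins=20, edgecolor='black')
plt.close()  # the figure itself is not needed for this calculation

# Midpoint of each bin: average of its left and right edge
centers = (bin_edges[:-1] + bin_edges[1:]) / 2

# Index of the tallest bar
peak = int(np.argmax(n))
print(f"Most populated bin: {bin_edges[peak]:.1f} to {bin_edges[peak + 1]:.1f} "
      f"(center {centers[peak]:.1f}, {int(n[peak])} values)")
```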

plt.hist() Common Parameters Reference

| Parameter | Type | Description | Default |
| --- | --- | --- | --- |
| x | array-like | Input data | Required |
| bins | int, sequence, or str | Number of bins, bin edges, or algorithm name | None (resolves to 10) |
| range | tuple | Lower and upper range of the bins | None (uses x.min(), x.max()) |
| density | bool | Normalize so area equals 1 | False |
| weights | array-like | Weight for each data point | None |
| cumulative | bool | Compute cumulative histogram | False |
| histtype | str | 'bar', 'barstacked', 'step', 'stepfilled' | 'bar' |
| orientation | str | 'vertical' or 'horizontal' | 'vertical' |
| color | color or list | Bar color(s) | None |
| edgecolor | color | Bar edge color | None |
| alpha | float | Transparency (0 to 1) | None |
| label | str or list of str | Label for the legend | None |
| stacked | bool | Stack multiple datasets | False |
| log | bool | Log scale on the count axis | False |
| rwidth | float | Relative width of bars (0 to 1) | None |
| bottom | array-like or scalar | Baseline for each bar | None (treated as 0) |
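As one illustration of the weights parameter from the table, giving every point a weight of 1/N makes the bar heights sum to 1, so the y-axis reads as the fraction of values per bin. This is a sketch of one common use, not the only purpose of weights.

```python
import numpy as np
import matplotlib.pyplot as plt

np.random.seed(42)
data = np.random.normal(loc=50, scale=15, size=1000)

# Each point contributes 1/N, so the bar heights add up to 1
weights = np.ones_like(data) / data.size

n, bin_edges, patches = plt.hist(data, bins=30, weights=weights,
                                 edgecolor='black', color='#4C72B0')
plt.title('Histogram of Proportions')
plt.xlabel('Value')
plt.ylabel('Fraction of values')
plt.show()
```

This differs from density=True, where the area rather than the sum of heights equals 1.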

plt.hist() vs sns.histplot(): When to Use Which

If you use seaborn alongside matplotlib, you may wonder which histogram function to use. Here is a direct comparison:

| Feature | plt.hist() | sns.histplot() |
| --- | --- | --- |
| Library | matplotlib | seaborn |
| Input types | Array, list, Series | Array, Series, DataFrame column |
| KDE overlay | Manual (scipy needed) | Built-in (kde=True) |
| Default styling | Minimal | Publication-ready |
| Multiple groups | Pass list of arrays | hue parameter |
| Stat options | Count, density | Count, density, frequency, probability, percent |
| Bin algorithms | auto, fd, doane, scott, stone, rice, sturges, sqrt | auto, fd, doane, scott, stone, rice, sturges, sqrt |
| Log scale | log=True (count axis) | log_scale=True (data axis) |
| Categorical axis | Not supported | Supported via hue |
| Performance (large data) | Faster | Slightly slower |
| Customization depth | Full matplotlib API | Seaborn + matplotlib API |

Use plt.hist() when you need full control over every visual element, when working with subplots, or when seaborn is not available. Use sns.histplot() when you want KDE overlays, cleaner default styling, or need to split data by a categorical variable with minimal code.

Create Interactive Histograms with PyGWalker

Static histograms are great for reports and scripts, but during exploratory data analysis you often need to change bins, filter subsets, and switch between chart types rapidly. PyGWalker is an open-source Python library that turns any pandas or polars DataFrame into an interactive, drag-and-drop visualization interface directly inside Jupyter Notebook -- no frontend code required.

pip install pygwalker

import numpy as np
import pandas as pd
import pygwalker as pyg
 
# Load your dataset into a DataFrame
df = pd.DataFrame({
    'score': np.random.normal(70, 12, 2000),
    'group': np.random.choice(['A', 'B', 'C'], 2000)
})
 
# Launch the interactive UI
walker = pyg.walk(df)

Once the interface opens, drag score to the x-axis and PyGWalker automatically generates a histogram. You can adjust bin size, split by group using color encoding, switch to density mode, and export the resulting chart -- all without writing additional code. This is especially useful when you need to explore several variables quickly before writing the final matplotlib code for a report.

Frequently Asked Questions

How do I choose the right number of bins for a matplotlib histogram?

Start with bins='auto', which uses the maximum of the Sturges and Freedman-Diaconis methods. For data with outliers, use bins='fd'. For small datasets (under 200 points), bins='sturges' works well. You can also pass an integer and adjust by eye: increase the number if the distribution looks overly smooth, decrease it if the bars look noisy.

What is the difference between density=True and cumulative=True in plt.hist()?

density=True normalizes the histogram so the total area under all bars equals 1, converting the y-axis to probability density. cumulative=True makes each bar represent the sum of all previous bars plus itself. You can combine both: density=True, cumulative=True produces an empirical cumulative distribution in which the last bar reaches 1.0.

How do I overlay two histograms in matplotlib?

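Call plt.hist() twice and set alpha to a value less than 1 (e.g., 0.5 or 0.6) so both distributions remain visible. Passing the same integer to bins gives both calls the same number of bins but not necessarily the same edges, because the edges are derived from each dataset's own range; pass one shared sequence of edges (or the same range) to both calls when the bars need to line up. Add label to each call and finish with plt.legend(). Using histtype='step' as an alternative avoids the need for transparency entirely since it draws only outlines.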

Can plt.hist() handle pandas Series and DataFrame columns directly?

Yes. plt.hist() accepts any array-like input, including pandas Series. You can pass df['column_name'] directly. For plotting from a DataFrame using pandas' built-in method, use df['column_name'].plot.hist(bins=30), which wraps matplotlib under the hood.
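Real columns often contain missing values. A cautious pattern, sketched below with a made-up DataFrame, is to drop them explicitly before plotting so that the number of plotted values is unambiguous and does not depend on how a given matplotlib version treats NaN.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical DataFrame with one missing score
df = pd.DataFrame({'score': [55.0, 61.5, np.nan, 70.2, 48.9, 66.3, 73.8, 59.4]})

# Remove missing values so every remaining point lands in a bin
scores = df['score'].dropna()

n, bin_edges, _ = plt.hist(scores, bins=5, edgecolor='black')
plt.xlabel('score')
plt.ylabel('Frequency')
plt.title(f'{len(scores)} of {len(df)} rows plotted')
plt.show()
```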

How do I save a matplotlib histogram as an image file?

After calling plt.hist(), use plt.savefig('histogram.png', dpi=150, bbox_inches='tight') before plt.show(). The bbox_inches='tight' parameter prevents labels from being cut off. Supported formats include PNG, PDF, SVG, and EPS.
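The same steps can be written with an explicit Figure object, which makes it clear which figure is being saved. In this sketch the output location (a file named histogram.png in the system temp directory) is only a placeholder.

```python
import os
import tempfile

import numpy as np
import matplotlib.pyplot as plt

np.random.seed(42)
data = np.random.normal(loc=50, scale=15, size=1000)

fig, ax = plt.subplots()
ax.hist(data, bins=30, edgecolor='black', color='#4C72B0')
ax.set_xlabel('Value')
ax.set_ylabel('Frequency')

# Placeholder output path; replace with your own location
out_path = os.path.join(tempfile.gettempdir(), 'histogram.png')
fig.savefig(out_path, dpi=150, bbox_inches='tight')
plt.close(fig)  # free the figure once it has been written

print(f"Saved to {out_path}")
```

The file format is inferred from the extension, so changing .png to .pdf or .svg is enough to switch formats.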
