
Pandas value_counts(): Count Unique Values Like a Pro


You have a column with thousands of entries -- product categories, status codes, survey responses -- and you need to know what values appear, how often, and in what proportion. Writing loops or manual counting logic wastes time and invites bugs. Pandas value_counts() solves this in a single call, but most tutorials only scratch the surface of what it can do.

The method handles far more than simple frequency tables. It bins continuous data into ranges, computes percentage distributions, deals with missing values, and works across entire DataFrames. Yet many analysts stick to the bare default call and miss the parameters that make value_counts() a serious exploratory data analysis (EDA) tool.

This guide covers every parameter, walks through real-world scenarios, and shows how to combine value_counts() with groupby, crosstab, plotting, and other Pandas methods to get clean, actionable frequency analyses.


What value_counts() Returns

value_counts() returns a Series containing counts of unique values, sorted in descending order by default. The index holds the unique values, and the data holds their counts.

import pandas as pd
 
colors = pd.Series(["red", "blue", "red", "green", "blue", "red", "blue", "blue"])
print(colors.value_counts())

Output:

blue     4
red      3
green    1
Name: count, dtype: int64

The result tells you blue appeared 4 times, red 3 times, and green once. Simple, but the real power comes from its parameters.


Basic Usage on a Series

The most common pattern is calling value_counts() on a single DataFrame column.

import pandas as pd
 
df = pd.DataFrame({
    "product": ["Laptop", "Phone", "Tablet", "Phone", "Laptop", "Phone", "Laptop", "Tablet", "Phone", "Laptop"],
    "region": ["East", "West", "East", "East", "West", "East", "East", "West", "West", "East"],
    "revenue": [1200, 800, 450, 850, 1150, 900, 1300, 500, 780, 1100],
})
 
# Count products
print(df["product"].value_counts())

Output:

Laptop    4
Phone     4
Tablet    2
Name: count, dtype: int64

You can also call it directly on the DataFrame index or on the result of any operation that returns a Series.
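For example, you can count index labels directly, or count the values of any derived Series such as a boolean comparison (the small frame below is illustrative):

```python
import pandas as pd

# Illustrative frame with region labels as the index
df = pd.DataFrame(
    {"revenue": [1200, 800, 450, 850, 1150]},
    index=["East", "West", "East", "East", "West"],
)

# Count index labels directly
print(df.index.value_counts())

# Count the result of any Series-producing operation, e.g. a comparison
print((df["revenue"] > 1000).value_counts())
```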


The Complete Parameter Reference

Here is every parameter value_counts() accepts, with a description and default value.

Parameter   Type   Default   Description
normalize   bool   False     Return proportions (0-1) instead of raw counts.
sort        bool   True      Sort by frequency. False keeps the order values first appeared.
ascending   bool   False     Sort in ascending order (least frequent first).
bins        int    None      Group continuous data into equal-width bins (works on numeric Series only).
dropna      bool   True      Exclude NaN values from the count. Set False to include them.

Each parameter is covered with examples in the sections below.


Percentage Distribution with normalize=True

Raw counts are useful, but stakeholders often want percentages. The normalize parameter converts counts into proportions that sum to 1.0.

print(df["product"].value_counts(normalize=True))

Output:

Laptop    0.4
Phone     0.4
Tablet    0.2
Name: proportion, dtype: float64

To display as readable percentages:

pcts = df["product"].value_counts(normalize=True) * 100
print(pcts.round(1))

Output:

Laptop    40.0
Phone     40.0
Tablet    20.0
Name: proportion, dtype: float64

This is one of the fastest ways to build a distribution summary during EDA.
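A handy variation is putting counts and percentages side by side in one table; a quick sketch using the product data from earlier:

```python
import pandas as pd

products = pd.Series(["Laptop", "Phone", "Tablet", "Phone", "Laptop",
                      "Phone", "Laptop", "Tablet", "Phone", "Laptop"])

counts = products.value_counts()

# One DataFrame with raw counts and rounded percentages per value
summary = pd.concat(
    {"count": counts, "pct": (counts / counts.sum() * 100).round(1)},
    axis=1,
)
print(summary)
```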


Controlling Sort Order

By default, value_counts() sorts results from most frequent to least frequent. You can change this behavior with sort and ascending. For more advanced sorting options on DataFrames, see the pandas sort_values guide.

# Ascending order (least frequent first)
print(df["product"].value_counts(ascending=True))

Output:

Tablet    2
Laptop    4
Phone     4
Name: count, dtype: int64

# Preserve original order of appearance
print(df["product"].value_counts(sort=False))

This is helpful when you want to see values in the order they first appear. Note that sort=False does not impose a logical order; for categories with a natural order (e.g., rating scales from 1-5), sort the result by its index instead.
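A quick sketch of that index-sorting pattern, with made-up rating data:

```python
import pandas as pd

ratings = pd.Series([3, 5, 4, 3, 2, 5, 4, 4, 1, 3])

# Sort by the rating value itself (1..5) rather than by frequency
print(ratings.value_counts().sort_index())
```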


Binning Continuous Data with bins

This is one of the most underused features. When you have numeric data -- ages, prices, scores -- the bins parameter groups values into equal-width intervals and counts how many fall into each bin.

ages = pd.Series([22, 35, 28, 45, 19, 52, 38, 41, 29, 33, 27, 60, 48, 31, 24])
 
print(ages.value_counts(bins=4, sort=False))

Output:

(18.959, 29.25]    6
(29.25, 39.5]      4
(39.5, 49.75]      3
(49.75, 60.0]      2
Name: count, dtype: int64

This gives you an instant histogram without any plotting code. Each interval shows how many values fall within that range.

# Combine bins with normalize for percentage distribution
print(ages.value_counts(bins=5, normalize=True).round(3))

For finer control over bin edges, use pd.cut() before calling value_counts():

bins_custom = [0, 18, 30, 45, 65, 100]
labels = ["Under 18", "18-30", "31-45", "46-65", "65+"]
 
age_groups = pd.cut(ages, bins=bins_custom, labels=labels)
print(age_groups.value_counts())

Output:

18-30       6
31-45       6
46-65       3
Under 18    0
65+         0
Name: count, dtype: int64

Handling Missing Values with dropna

By default, value_counts() skips NaN values. Set dropna=False to include them in the count.

responses = pd.Series(["Yes", "No", "Yes", None, "Yes", "No", None, "Maybe", None])
 
# Default behavior: NaN excluded
print(responses.value_counts())

Output:

Yes      3
No       2
Maybe    1
Name: count, dtype: int64

# Include NaN in the count
print(responses.value_counts(dropna=False))

Output:

Yes      3
NaN      3
No       2
Maybe    1
Name: count, dtype: int64

This is critical for data quality checks. If NaN appears as the top value, you know your data has a significant missingness problem that needs handling before analysis. You can then filter rows to isolate or remove these missing entries.
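For instance, you can inspect the missing rows or drop them before recounting (using the responses Series from above):

```python
import pandas as pd

responses = pd.Series(["Yes", "No", "Yes", None, "Yes", "No", None, "Maybe", None])

# Isolate the missing entries for inspection
missing = responses[responses.isna()]
print(len(missing))

# Or drop them and recount
print(responses.dropna().value_counts())
```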


value_counts() on a DataFrame (Pandas 1.1+)

Starting with Pandas 1.1, you can call value_counts() directly on a DataFrame. It counts unique combinations of values across all columns (or a subset).

print(df[["product", "region"]].value_counts())

Output:

product  region
Laptop   East      3
Phone    East      2
         West      2
Tablet   East      1
         West      1
Laptop   West      1
Name: count, dtype: int64

This is essentially a multi-dimensional frequency table. It shows, for example, that 3 Laptops were sold in the East region while only 1 was sold in the West.

# Convert to a tidy DataFrame for further analysis
combo_counts = (
    df[["product", "region"]]
    .value_counts()
    .reset_index(name="count")
)
print(combo_counts)

This gives you a clean DataFrame with product, region, and count columns -- ready for merging, filtering, or plotting.
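For example, you can immediately filter the tidy table to combinations above a count threshold (a sketch using the same df):

```python
import pandas as pd

df = pd.DataFrame({
    "product": ["Laptop", "Phone", "Tablet", "Phone", "Laptop",
                "Phone", "Laptop", "Tablet", "Phone", "Laptop"],
    "region": ["East", "West", "East", "East", "West",
               "East", "East", "West", "West", "East"],
})

combo_counts = df[["product", "region"]].value_counts().reset_index(name="count")

# Keep only product/region pairs that occur more than once
print(combo_counts[combo_counts["count"] > 1])
```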


Counting with groupby + value_counts

When you need frequency counts within groups, combine groupby() with value_counts().

survey = pd.DataFrame({
    "department": ["Sales", "Sales", "Sales", "Engineering", "Engineering", "Engineering", "HR", "HR"],
    "satisfaction": ["High", "Medium", "High", "Low", "High", "Medium", "High", "High"],
})
 
print(survey.groupby("department")["satisfaction"].value_counts())

Output:

department   satisfaction
Engineering  High            1
             Low             1
             Medium          1
HR           High            2
Sales        High            2
             Medium          1
Name: count, dtype: int64

To get proportions within each group (so each department sums to 1.0):

print(survey.groupby("department")["satisfaction"].value_counts(normalize=True))

This pattern is invaluable for comparing distributions across segments -- satisfaction by department, error types by service, purchase categories by customer tier. If you find duplicate entries skewing your counts, consider using drop_duplicates() before running value_counts().
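A sketch of that deduplication step, using hypothetical event data:

```python
import pandas as pd

# Hypothetical event log with repeated (user, action) rows
events = pd.DataFrame({
    "user": ["a", "a", "b", "b", "b", "c"],
    "action": ["click", "click", "click", "view", "view", "view"],
})

# Raw counts are inflated by duplicate rows
print(events["action"].value_counts())

# Count each (user, action) pair only once: how many users performed each action
deduped = events.drop_duplicates(subset=["user", "action"])
print(deduped["action"].value_counts())
```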


Cross-Tabulation with pd.crosstab

For two-dimensional frequency tables, pd.crosstab() is often cleaner than value_counts() on a DataFrame.

print(pd.crosstab(df["product"], df["region"]))

Output:

region   East  West
product
Laptop      3     1
Phone       2     2
Tablet      1     1

Add margins for row and column totals:

print(pd.crosstab(df["product"], df["region"], margins=True))

Add normalization:

print(pd.crosstab(df["product"], df["region"], normalize="index").round(2))

Use value_counts() when you need a flat frequency list. Use pd.crosstab() when you want a matrix view of two variables side by side. See the Pandas Crosstab guide for deeper coverage.


Combining value_counts with Other Methods

Convert to DataFrame with reset_index

counts_df = df["product"].value_counts().reset_index()
counts_df.columns = ["product", "count"]
print(counts_df)

Output:

  product  count
0  Laptop      4
1   Phone      4
2  Tablet      2

Now you have a standard DataFrame you can merge, filter, or export.

Quick Bar Chart with plot

df["product"].value_counts().plot(kind="bar", title="Product Frequency")

Or a horizontal bar chart for long category names:

df["product"].value_counts().plot(kind="barh")

Filter by Count Threshold

# Keep only values that appear at least 3 times
counts = df["product"].value_counts()
frequent = counts[counts >= 3]
print(frequent)

Map Counts Back to the Original DataFrame

df["product_freq"] = df["product"].map(df["product"].value_counts())
print(df[["product", "product_freq"]])

This adds a frequency column to every row, useful for feature engineering in machine learning.
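An equivalent way to build the same column, if you prefer to stay within groupby, is transform (a small sketch):

```python
import pandas as pd

df = pd.DataFrame({"product": ["Laptop", "Phone", "Tablet", "Phone", "Laptop"]})

# transform("count") broadcasts each group's size back to its own rows
df["product_freq"] = df.groupby("product")["product"].transform("count")
print(df)
```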


Real-World Examples

Example 1: Survey Data Analysis

import numpy as np
 
np.random.seed(42)
survey_data = pd.DataFrame({
    "age": np.random.randint(18, 65, 500),
    "rating": np.random.choice(["Excellent", "Good", "Average", "Poor", "Terrible"], 500,
                                p=[0.1, 0.3, 0.35, 0.15, 0.1]),
    "channel": np.random.choice(["Web", "Mobile", "In-Store", None], 500,
                                 p=[0.35, 0.30, 0.25, 0.10]),
})
 
# Rating distribution
print("=== Rating Distribution ===")
print(survey_data["rating"].value_counts())
print()
 
# Channel distribution including missing
print("=== Channel (including unknown) ===")
print(survey_data["channel"].value_counts(dropna=False))
print()
 
# Age distribution in 5 bins
print("=== Age Groups ===")
print(survey_data["age"].value_counts(bins=5, sort=False))

Example 2: Log Analysis

logs = pd.DataFrame({
    "status_code": [200, 200, 404, 200, 500, 301, 200, 404, 200, 500, 200, 200, 301, 200, 403],
    "method": ["GET", "POST", "GET", "GET", "POST", "GET", "GET", "GET", "POST", "GET", "GET", "GET", "POST", "GET", "GET"],
    "endpoint": ["/api/users", "/api/login", "/api/old", "/api/users", "/api/upload",
                 "/old-page", "/api/users", "/api/missing", "/api/login", "/api/crash",
                 "/", "/api/users", "/old-page", "/", "/admin"],
})
 
# Error rate overview
print("=== Status Code Distribution ===")
print(logs["status_code"].value_counts(normalize=True).round(3))
print()
 
# Errors by method
print("=== Error Codes by HTTP Method ===")
error_logs = logs[logs["status_code"] >= 400]
print(error_logs.groupby("method")["status_code"].value_counts())

Example 3: EDA on E-commerce Data

np.random.seed(0)
orders = pd.DataFrame({
    "category": np.random.choice(["Electronics", "Clothing", "Books", "Home", "Sports"], 1000),
    "price": np.random.exponential(50, 1000).round(2),
    "payment": np.random.choice(["Credit Card", "PayPal", "Apple Pay", "Cash"], 1000,
                                 p=[0.45, 0.25, 0.20, 0.10]),
})
 
# Top categories
print(orders["category"].value_counts().head(3))
print()
 
# Price ranges
print(orders["price"].value_counts(bins=6, sort=False))
print()
 
# Category x Payment breakdown
print(pd.crosstab(orders["category"], orders["payment"], normalize="index").round(2))

Performance Tips for Large Datasets

When working with millions of rows, value_counts() is already fast because it leverages optimized C code under the hood. But there are ways to make it even faster.

  • Use Categorical dtype -- converting string columns to pd.Categorical before calling value_counts() can speed things up significantly and reduce memory.
  • Avoid chaining unnecessary operations -- call value_counts() directly instead of .unique() plus manual counting.
  • Use .head(n) on results -- if you only need the top N values, slicing the result avoids unnecessary display overhead.
  • Use sort=False when order doesn't matter -- skipping the sort step saves time on very large result sets.

# Convert to categorical for faster counting
df["product"] = df["product"].astype("category")
print(df["product"].value_counts())

# Only top 5 from a column with thousands of unique values
print(df["product"].value_counts().head(5))

For datasets that exceed memory, consider chunked reading with pd.read_csv(chunksize=...) and aggregating value_counts() results across chunks:

from collections import Counter
 
total_counts = Counter()
for chunk in pd.read_csv("large_file.csv", chunksize=100_000):
    total_counts.update(chunk["category"].value_counts().to_dict())
 
result = pd.Series(total_counts).sort_values(ascending=False)
print(result.head(10))

Visualize Frequency Distributions with PyGWalker

After running value_counts(), you typically want to explore the results visually -- bar charts, pie charts, or interactive filtering across multiple dimensions. Instead of writing separate matplotlib code for each view, you can use PyGWalker, an open-source Python library that turns any Pandas DataFrame into an interactive, Tableau-like visual exploration interface right inside Jupyter Notebook.

import pygwalker as pyg
 
# Get your frequency table as a DataFrame
freq_df = df["product"].value_counts().reset_index()
freq_df.columns = ["product", "count"]
 
# Launch interactive visualization
walker = pyg.walk(freq_df)

With PyGWalker, you can drag product to the x-axis and count to the y-axis to create a bar chart instantly. Switch to a pie chart, add filters, or layer in additional dimensions -- all without writing plotting code. This is especially useful when exploring value_counts() results across multiple categorical columns during EDA.

Try PyGWalker in Google Colab or install with pip install pygwalker.



FAQ

What does value_counts() do in pandas?

value_counts() returns a Series containing the count of each unique value in a column (or Series), sorted from most to least frequent by default. It is the standard way to build frequency tables for categorical or discrete data in Pandas.

How do I get percentages from value_counts()?

Pass normalize=True to value_counts(). This returns proportions (values between 0 and 1) instead of raw counts. Multiply by 100 to convert to percentage format: df["col"].value_counts(normalize=True) * 100.

Can I use value_counts() on a DataFrame?

Yes, since Pandas 1.1. Calling df[["col_a", "col_b"]].value_counts() counts unique combinations of values across the selected columns and returns a multi-index Series sorted by frequency.

How do I count NaN values with value_counts()?

Set dropna=False: df["col"].value_counts(dropna=False). By default, value_counts() excludes NaN values. With dropna=False, missing values appear in the result alongside regular values.

How do I bin numeric data with value_counts()?

Pass an integer to the bins parameter: df["price"].value_counts(bins=5). This groups numeric values into 5 equal-width intervals and counts how many values fall into each bin. For custom bin edges, use pd.cut() first.

What is the difference between value_counts() and groupby().size()?

value_counts() on a single column returns the same result as df.groupby("col").size().sort_values(ascending=False). However, value_counts() is more concise for simple frequency counts. Use groupby() when you need to count within groups or combine counting with other aggregations.
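A quick check of that equivalence, on a small illustrative column:

```python
import pandas as pd

df = pd.DataFrame({"col": ["x", "y", "x", "z", "x", "y"]})

vc = df["col"].value_counts()
gb = df.groupby("col").size().sort_values(ascending=False)

# Both produce the same count per value
print(vc.to_dict())
print(gb.to_dict())
```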

How do I convert value_counts() output to a DataFrame?

Call .reset_index() on the result: df["col"].value_counts().reset_index(). This gives you a DataFrame with two columns: the original values and their counts. You can then rename the columns as needed.


Key Takeaways

  • value_counts() is the fastest way to build frequency tables in Pandas -- one method call replaces loops and manual counting.
  • Use normalize=True for proportions, bins for numeric data, and dropna=False to surface missing values.
  • Since Pandas 1.1, call value_counts() on a DataFrame to count unique row combinations across multiple columns.
  • Combine with groupby() for within-group frequency analysis and pd.crosstab() for two-dimensional frequency matrices.
  • Chain with .reset_index() to get a clean DataFrame and .plot(kind="bar") for quick visualizations.
  • For interactive exploration of frequency results, PyGWalker turns any DataFrame into a drag-and-drop visual interface inside Jupyter.