Pandas value_counts(): Count Unique Values Like a Pro
You have a column with thousands of entries -- product categories, status codes, survey responses -- and you need to know what values appear, how often, and in what proportion. Writing loops or manual counting logic wastes time and invites bugs. Pandas value_counts() solves this in a single call, but most tutorials only scratch the surface of what it can do.
The method handles far more than simple frequency tables. It bins continuous data into ranges, computes percentage distributions, deals with missing values, and works across entire DataFrames. Yet many analysts stick to the bare default call and miss the parameters that make value_counts() a serious exploratory data analysis (EDA) tool.
This guide covers every parameter, walks through real-world scenarios, and shows how to combine value_counts() with groupby, crosstab, plotting, and other Pandas methods to get clean, actionable frequency analyses.
What value_counts() Returns
value_counts() returns a Series containing counts of unique values, sorted in descending order by default. The index holds the unique values, and the data holds their counts.
import pandas as pd
colors = pd.Series(["red", "blue", "red", "green", "blue", "red", "blue", "blue"])
print(colors.value_counts())

Output:
blue 4
red 3
green 1
Name: count, dtype: int64

The result tells you blue appeared 4 times, red 3 times, and green once. Simple, but the real power comes from its parameters.
Basic Usage on a Series
The most common pattern is calling value_counts() on a single DataFrame column.
import pandas as pd
df = pd.DataFrame({
"product": ["Laptop", "Phone", "Tablet", "Phone", "Laptop", "Phone", "Laptop", "Tablet", "Phone", "Laptop"],
"region": ["East", "West", "East", "East", "West", "East", "East", "West", "West", "East"],
"revenue": [1200, 800, 450, 850, 1150, 900, 1300, 500, 780, 1100],
})
# Count products
print(df["product"].value_counts())Output:
Laptop 4
Phone 4
Tablet 2
Name: count, dtype: int64

You can also call it directly on the DataFrame index or on the result of any operation that returns a Series.
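Here is a minimal sketch of both of those cases, reusing the df defined above (the boolean revenue band is purely illustrative):

# Count occurrences of each index label -- handy for spotting duplicate indexes
print(df.index.value_counts().head())

# Count the values of a derived Series, e.g. a boolean revenue band
print((df["revenue"] > 1000).value_counts())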
The Complete Parameter Reference
Here is every parameter value_counts() accepts, with a description and default value.
| Parameter | Type | Default | Description |
|---|---|---|---|
| normalize | bool | False | Return proportions (0-1) instead of raw counts. |
| sort | bool | True | Sort by frequency. False keeps the order values first appeared. |
| ascending | bool | False | Sort in ascending order (least frequent first). |
| bins | int | None | Group continuous data into equal-width bins (works on numeric Series only). |
| dropna | bool | True | Exclude NaN values from the count. Set False to include them. |
Each parameter is covered with examples in the sections below.
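As a quick orientation before the detailed sections, here is a hedged sketch combining several of these parameters in a single call (reusing the product column from the df above; bins is omitted because it only applies to numeric data):

# Proportions instead of counts, least frequent first, missing values kept
print(
    df["product"].value_counts(
        normalize=True,   # return proportions that sum to 1.0
        ascending=True,   # least frequent category first
        dropna=False,     # count NaN as its own category (none in this sample)
    )
)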
Percentage Distribution with normalize=True
Raw counts are useful, but stakeholders often want percentages. The normalize parameter converts counts into proportions that sum to 1.0.
print(df["product"].value_counts(normalize=True))Output:
Laptop 0.4
Phone 0.4
Tablet 0.2
Name: proportion, dtype: float64

To display as readable percentages:
pcts = df["product"].value_counts(normalize=True) * 100
print(pcts.round(1))

Output:
Laptop 40.0
Phone 40.0
Tablet 20.0
Name: proportion, dtype: float64

This is one of the fastest ways to build a distribution summary during EDA.
Controlling Sort Order
By default, value_counts() sorts results from most frequent to least frequent. You can change this behavior with sort and ascending. For more advanced sorting options on DataFrames, see the pandas sort_values guide.
# Ascending order (least frequent first)
print(df["product"].value_counts(ascending=True))Output:
Tablet 2
Laptop 4
Phone 4
Name: count, dtype: int64

# Preserve original order of appearance
print(df["product"].value_counts(sort=False))This is helpful when your categories have a natural order (e.g., rating scales from 1-5) and you want to preserve it.
Binning Continuous Data with bins
The bins parameter is one of the most underused features of value_counts(). When you have numeric data -- ages, prices, scores -- it groups values into equal-width intervals and counts how many fall into each one.
ages = pd.Series([22, 35, 28, 45, 19, 52, 38, 41, 29, 33, 27, 60, 48, 31, 24])
print(ages.value_counts(bins=4, sort=False))

Output:
(18.959, 29.25] 6
(29.25, 39.5] 4
(39.5, 49.75] 3
(49.75, 60.0] 2
Name: count, dtype: int64

This gives you an instant histogram without any plotting code. Each interval shows how many values fall within that range.
# Combine bins with normalize for percentage distribution
print(ages.value_counts(bins=5, normalize=True).round(3))

For finer control over bin edges, use pd.cut() before calling value_counts():
bins_custom = [0, 18, 30, 45, 65, 100]
labels = ["Under 18", "18-30", "31-45", "46-65", "65+"]
age_groups = pd.cut(ages, bins=bins_custom, labels=labels)
print(age_groups.value_counts())

Output:
18-30       6
31-45       6
46-65       3
Under 18    0
65+         0
Name: count, dtype: int64

Handling Missing Values with dropna
By default, value_counts() skips NaN values. Set dropna=False to include them in the count.
responses = pd.Series(["Yes", "No", "Yes", None, "Yes", "No", None, "Maybe", None])
# Default behavior: NaN excluded
print(responses.value_counts())

Output:
Yes 3
No 2
Maybe 1
Name: count, dtype: int64

# Include NaN in the count
print(responses.value_counts(dropna=False))

Output:
Yes 3
NaN 3
No 2
Maybe 1
Name: count, dtype: int64

This is critical for data quality checks. If NaN appears as the top value, you know your data has a significant missingness problem that needs handling before analysis. You can then filter rows to isolate or remove these missing entries.
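A short sketch of that follow-up step on the responses Series from above (isna() and dropna() are the standard tools here):

# Inspect the entries that are missing
print(responses[responses.isna()])

# Or drop them before further analysis
print(responses.dropna().value_counts())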
value_counts() on a DataFrame (Pandas 1.1+)
Starting with Pandas 1.1, you can call value_counts() directly on a DataFrame. It counts unique combinations of values across all columns (or a subset).
print(df[["product", "region"]].value_counts())Output:
product region
Laptop East 3
Phone East 2
West 2
Tablet East 1
West 1
Laptop West 1
Name: count, dtype: int64

This is essentially a multi-dimensional frequency table. It shows, for example, that 3 Laptops were sold in the East region while only 1 was sold in the West.
# Convert to a tidy DataFrame for further analysis
combo_counts = (
df[["product", "region"]]
.value_counts()
.reset_index(name="count")
)
print(combo_counts)

This gives you a clean DataFrame with product, region, and count columns -- ready for merging, filtering, or plotting.
Counting with groupby + value_counts
When you need frequency counts within groups, combine groupby() with value_counts().
survey = pd.DataFrame({
"department": ["Sales", "Sales", "Sales", "Engineering", "Engineering", "Engineering", "HR", "HR"],
"satisfaction": ["High", "Medium", "High", "Low", "High", "Medium", "High", "High"],
})
print(survey.groupby("department")["satisfaction"].value_counts())Output:
department satisfaction
Engineering High 1
Low 1
Medium 1
HR High 2
Sales High 2
Medium 1
Name: count, dtype: int64

To get proportions within each group (so each department sums to 1.0):
print(survey.groupby("department")["satisfaction"].value_counts(normalize=True))This pattern is invaluable for comparing distributions across segments -- satisfaction by department, error types by service, purchase categories by customer tier. If you find duplicate entries skewing your counts, consider using drop_duplicates() before running value_counts().
Cross-Tabulation with pd.crosstab
For two-dimensional frequency tables, pd.crosstab() is often cleaner than value_counts() on a DataFrame.
print(pd.crosstab(df["product"], df["region"]))Output:
region East West
product
Laptop 3 1
Phone 2 2
Tablet 1 1

Add margins for row and column totals:

print(pd.crosstab(df["product"], df["region"], margins=True))

Add normalization:

print(pd.crosstab(df["product"], df["region"], normalize="index").round(2))

Use value_counts() when you need a flat frequency list. Use pd.crosstab() when you want a matrix view of two variables side by side. See the Pandas Crosstab guide for deeper coverage.
Combining value_counts with Other Methods
Convert to DataFrame with reset_index
counts_df = df["product"].value_counts().reset_index()
counts_df.columns = ["product", "count"]
print(counts_df)

Output:
product count
0 Laptop 4
1 Phone 4
2 Tablet 2

Now you have a standard DataFrame you can merge, filter, or export.
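For example, a quick hedged sketch of the export step (the file name is illustrative):

# Write the frequency table to disk for sharing
counts_df.to_csv("product_counts.csv", index=False)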
Quick Bar Chart with plot
df["product"].value_counts().plot(kind="bar", title="Product Frequency")Or a horizontal bar chart for long category names:
df["product"].value_counts().plot(kind="barh")Filter by Count Threshold
# Keep only values that appear at least 3 times
counts = df["product"].value_counts()
frequent = counts[counts >= 3]
print(frequent)

Map Counts Back to the Original DataFrame
df["product_freq"] = df["product"].map(df["product"].value_counts())
print(df[["product", "product_freq"]])This adds a frequency column to every row, useful for feature engineering in machine learning.
Real-World Examples
Example 1: Survey Data Analysis
import numpy as np
np.random.seed(42)
survey_data = pd.DataFrame({
"age": np.random.randint(18, 65, 500),
"rating": np.random.choice(["Excellent", "Good", "Average", "Poor", "Terrible"], 500,
p=[0.1, 0.3, 0.35, 0.15, 0.1]),
"channel": np.random.choice(["Web", "Mobile", "In-Store", None], 500,
p=[0.35, 0.30, 0.25, 0.10]),
})
# Rating distribution
print("=== Rating Distribution ===")
print(survey_data["rating"].value_counts())
print()
# Channel distribution including missing
print("=== Channel (including unknown) ===")
print(survey_data["channel"].value_counts(dropna=False))
print()
# Age distribution in 5 bins
print("=== Age Groups ===")
print(survey_data["age"].value_counts(bins=5, sort=False))Example 2: Log Analysis
logs = pd.DataFrame({
"status_code": [200, 200, 404, 200, 500, 301, 200, 404, 200, 500, 200, 200, 301, 200, 403],
"method": ["GET", "POST", "GET", "GET", "POST", "GET", "GET", "GET", "POST", "GET", "GET", "GET", "POST", "GET", "GET"],
"endpoint": ["/api/users", "/api/login", "/api/old", "/api/users", "/api/upload",
"/old-page", "/api/users", "/api/missing", "/api/login", "/api/crash",
"/", "/api/users", "/old-page", "/", "/admin"],
})
# Error rate overview
print("=== Status Code Distribution ===")
print(logs["status_code"].value_counts(normalize=True).round(3))
print()
# Errors by method
print("=== Error Codes by HTTP Method ===")
error_logs = logs[logs["status_code"] >= 400]
print(error_logs.groupby("method")["status_code"].value_counts())Example 3: EDA on E-commerce Data
np.random.seed(0)
orders = pd.DataFrame({
"category": np.random.choice(["Electronics", "Clothing", "Books", "Home", "Sports"], 1000),
"price": np.random.exponential(50, 1000).round(2),
"payment": np.random.choice(["Credit Card", "PayPal", "Apple Pay", "Cash"], 1000,
p=[0.45, 0.25, 0.20, 0.10]),
})
# Top categories
print(orders["category"].value_counts().head(3))
print()
# Price ranges
print(orders["price"].value_counts(bins=6, sort=False))
print()
# Category x Payment breakdown
print(pd.crosstab(orders["category"], orders["payment"], normalize="index").round(2))Performance Tips for Large Datasets
When working with millions of rows, value_counts() is already fast because it leverages optimized C code under the hood. But there are ways to make it even faster.
| Tip | Explanation |
|---|---|
| Use Categorical dtype | Converting string columns to pd.Categorical before calling value_counts() can speed things up significantly and reduce memory. |
| Avoid chaining unnecessary operations | Call value_counts() directly instead of .unique() + manual counting. |
| Use .head(n) on results | If you only need the top N values, slicing the result avoids unnecessary display overhead. |
| Use sort=False when order doesn't matter | Skipping the sort step saves time on very large result sets. |
# Convert to categorical for faster counting
df["product"] = df["product"].astype("category")
print(df["product"].value_counts())# Only top 5 from a column with thousands of unique values
print(df["product"].value_counts().head(5))For datasets that exceed memory, consider chunked reading with pd.read_csv(chunksize=...) and aggregating value_counts() results across chunks:
from collections import Counter
total_counts = Counter()
for chunk in pd.read_csv("large_file.csv", chunksize=100_000):
total_counts.update(chunk["category"].value_counts().to_dict())
result = pd.Series(total_counts).sort_values(ascending=False)
print(result.head(10))

Visualize Frequency Distributions with PyGWalker
After running value_counts(), you typically want to explore the results visually -- bar charts, pie charts, or interactive filtering across multiple dimensions. Instead of writing separate matplotlib code for each view, you can use PyGWalker, an open-source Python library that turns any Pandas DataFrame into an interactive, Tableau-like visual exploration interface right inside Jupyter Notebook.
import pygwalker as pyg
# Get your frequency table as a DataFrame
freq_df = df["product"].value_counts().reset_index()
freq_df.columns = ["product", "count"]
# Launch interactive visualization
walker = pyg.walk(freq_df)

With PyGWalker, you can drag product to the x-axis and count to the y-axis to create a bar chart instantly. Switch to a pie chart, add filters, or layer in additional dimensions -- all without writing plotting code. This is especially useful when exploring value_counts() results across multiple categorical columns during EDA.
Try PyGWalker in Google Colab or install it locally with pip install pygwalker.
Related Guides
- Pandas GroupBy: Aggregation, Transform, Apply
- Pandas Crosstab
- Pandas Drop Duplicates
- Pandas Filter Rows: Select Data by Condition
- Pandas sort_values(): Sort DataFrames
- Pandas Data Cleaning
FAQ
What does value_counts() do in pandas?
value_counts() returns a Series containing the count of each unique value in a column (or Series), sorted from most to least frequent by default. It is the standard way to build frequency tables for categorical or discrete data in Pandas.
How do I get percentages from value_counts()?
Pass normalize=True to value_counts(). This returns proportions (values between 0 and 1) instead of raw counts. Multiply by 100 to convert to percentage format: df["col"].value_counts(normalize=True) * 100.
Can I use value_counts() on a DataFrame?
Yes, since Pandas 1.1. Calling df[["col_a", "col_b"]].value_counts() counts unique combinations of values across the selected columns and returns a multi-index Series sorted by frequency.
How do I count NaN values with value_counts()?
Set dropna=False: df["col"].value_counts(dropna=False). By default, value_counts() excludes NaN values. With dropna=False, missing values appear in the result alongside regular values.
How do I bin numeric data with value_counts()?
Pass an integer to the bins parameter: df["price"].value_counts(bins=5). This groups numeric values into 5 equal-width intervals and counts how many values fall into each bin. For custom bin edges, use pd.cut() first.
What is the difference between value_counts() and groupby().size()?
value_counts() on a single column returns the same result as df.groupby("col").size().sort_values(ascending=False). However, value_counts() is more concise for simple frequency counts. Use groupby() when you need to count within groups or combine counting with other aggregations.
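A short sketch of that equivalence, reusing the product column from earlier (tie order between equally frequent values may differ between the two):

vc = df["product"].value_counts()
gb = df.groupby("product").size().sort_values(ascending=False)

# Same counts either way; value_counts() is simply the shorter spelling
print(vc)
print(gb)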
How do I convert value_counts() output to a DataFrame?
Call .reset_index() on the result: df["col"].value_counts().reset_index(). This gives you a DataFrame with two columns: the original values and their counts. You can then rename the columns as needed.
Key Takeaways
- value_counts() is the fastest way to build frequency tables in Pandas -- one method call replaces loops and manual counting.
- Use normalize=True for proportions, bins for numeric data, and dropna=False to surface missing values.
- Since Pandas 1.1, call value_counts() on a DataFrame to count unique row combinations across multiple columns.
- Combine with groupby() for within-group frequency analysis and pd.crosstab() for two-dimensional frequency matrices.
- Chain with .reset_index() to get a clean DataFrame and .plot(kind="bar") for quick visualizations.
- For interactive exploration of frequency results, PyGWalker turns any DataFrame into a drag-and-drop visual interface inside Jupyter.