Sorting Pandas DataFrame by Index

Pandas makes it straightforward to reorder large tables of data. In this tutorial, we will discuss one of its fundamental methods: sort_index(). With this method, we can sort a Pandas DataFrame by its index, whether the index labels are numbers or strings. By the end of this tutorial, you will know how to sort a DataFrame or a Series by its index in ascending or descending order.

But before we dive into the sort_index() method, let's talk briefly about what a Pandas DataFrame is.

What is a Pandas DataFrame?

A Pandas DataFrame is a two-dimensional table that has labeled rows and columns. It is similar to a spreadsheet or a SQL table. In a DataFrame, the rows represent observations or records, while the columns represent variables or features.
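As a small illustration, here is a sketch of a DataFrame whose row labels and column labels can be inspected separately. The variable name df_example and the values are made up for this example.

```python
import pandas as pd

# Hypothetical example data: two columns (variables), three rows (records)
df_example = pd.DataFrame(
    {"name": ["John", "Mark", "Sara"], "age": [24, 34, 21]},
    index=["b", "a", "c"],
)

print(df_example.shape)          # (number of rows, number of columns)
print(list(df_example.index))    # row labels
print(list(df_example.columns))  # column labels
```

The row labels printed here are the index that sort_index() works on.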

Pandas is built on top of NumPy, so columns are stored as typed arrays and many operations run as fast, vectorized array operations rather than Python loops. It also provides built-in methods for cleaning, reshaping, and plotting data.

Now that we have a basic understanding of a Pandas DataFrame, let's move on to the sort_index() method.

Sorting Pandas DataFrame by Index

The sort_index() method sorts a Pandas DataFrame by its index. The index holds the row labels of a DataFrame. It plays a role similar to the row numbers in a spreadsheet, but the labels can be numbers, strings, or dates, and they do not have to be in any particular order.

Let's take a look at an example.

import pandas as pd 
 
# create a dictionary 
data = {'name': ['John', 'Mark', 'Sara', 'Anna', 'Paul'],
        'age': [24, 34, 21, 19, 26],
        'city': ['New York', 'Paris', 'London', 'Berlin', 'San Francisco']}
 
# create a DataFrame 
df = pd.DataFrame(data, index=['b', 'a', 'd', 'c', 'e'])
 
# sort the DataFrame by index 
df = df.sort_index()
print(df)

Output:

    name  age           city
a   Mark   34          Paris
b   John   24       New York
c   Anna   19         Berlin
d   Sara   21         London
e   Paul   26  San Francisco

In the above example, we created a dictionary data with three keys: name, age, and city. We then used this dictionary to create a DataFrame df with the specified index.

After creating the DataFrame, we called sort_index() to sort it by its index. By default, sort_index() sorts the row labels in ascending order and returns a new DataFrame, which is why we assign the result back to df.
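The following sketch shows that sort_index() leaves the original DataFrame untouched unless the result is assigned. The names df_unsorted and df_sorted are chosen only for this illustration.

```python
import pandas as pd

# Hypothetical data with an unsorted string index
df_unsorted = pd.DataFrame({"age": [24, 34, 21]}, index=["b", "a", "c"])

# sort_index() returns a new, sorted DataFrame by default
df_sorted = df_unsorted.sort_index()

print(list(df_unsorted.index))  # the original keeps its order
print(list(df_sorted.index))    # the returned DataFrame is sorted
```

If you prefer to modify the DataFrame directly, sort_index() also accepts inplace=True, in which case it returns None instead of a new object.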

If we want to sort the index in descending order, we can pass ascending=False to sort_index().

# sort the DataFrame by index in descending order
df = df.sort_index(ascending=False)
print(df)

Output:

    name  age           city
e   Paul   26  San Francisco
d   Sara   21         London
c   Anna   19         Berlin
b   John   24       New York
a   Mark   34          Paris

As you can see, passing ascending=False sorts the DataFrame by its index in descending order.
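The same call works when the index is numeric. As a minimal sketch with made-up data (the names df_num, by_rows_desc, and by_columns are illustrative), the example below sorts an integer index in descending order and also uses the axis=1 argument to sort the column labels instead of the row labels.

```python
import pandas as pd

# Hypothetical data with an integer index in arbitrary order
df_num = pd.DataFrame(
    {"name": ["John", "Mark", "Sara"], "age": [24, 34, 21]},
    index=[30, 10, 20],
)

# Sort by row labels, largest label first
by_rows_desc = df_num.sort_index(ascending=False)

# Sort by column labels instead of row labels
by_columns = df_num.sort_index(axis=1)

print(by_rows_desc)
print(by_columns)
```

In both cases the values in each row stay together. Only the order of the rows or columns changes.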

Sorting Pandas Series by Index

A Pandas Series is a one-dimensional labeled array. It is similar to a column in a spreadsheet. Like a DataFrame, a Series also has an index.

To sort a Pandas Series by its index, we can use the sort_index() method as well.

import pandas as pd 
 
# create a dictionary 
data = {'name': ['John', 'Mark', 'Sara', 'Anna', 'Paul'],
        'age': [24, 34, 21, 19, 26],
        'city': ['New York', 'Paris', 'London', 'Berlin', 'San Francisco']}
 
# create a DataFrame 
df = pd.DataFrame(data, index=['b', 'a', 'd', 'c', 'e'])
 
# select a Series from the DataFrame
s = df['name']
 
# sort the Series by its index 
s = s.sort_index()
print(s)

Output:

a     Mark
b     John
c     Anna
d     Sara
e     Paul
Name: name, dtype: object

In the above code, we first created a DataFrame df with a specified index. We then selected the name column from the DataFrame as a Series and stored it in s. Finally, we sorted the Series by its index using the sort_index() method, which keeps each value paired with its label.
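A Series does not have to come from a DataFrame. As a small sketch with made-up values (s_example and s_desc are names chosen for this illustration), the ascending=False argument works on a standalone Series in the same way:

```python
import pandas as pd

# Hypothetical Series built directly, with an unsorted string index
s_example = pd.Series([24, 34, 21], index=["b", "a", "c"], name="age")

# Sort by index labels in descending order
s_desc = s_example.sort_index(ascending=False)
print(s_desc)
```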

Conclusion

In this tutorial, we have learned how to use the sort_index() method to sort a Pandas DataFrame or Series by its index, in ascending order by default or in descending order with ascending=False. Keeping data ordered by its index makes it easier to read and to look up rows by label. We hope you found this tutorial helpful and informative.