Using DataFrame.loc to Access and Manipulate Data in Pandas

Accessing and manipulating data are integral to any data science project, and the Pandas library is one of the most popular tools for these tasks. Within Pandas, the loc[] indexer is commonly used to access and filter data in a DataFrame by label or boolean array. In this article, we will dive into the syntax of Pandas DataFrame loc[], walk through examples, and explore its advantages over other selection methods.

Want to quickly create Data Visualizations in Python?

PyGWalker is an open-source Python project that can help speed up the data analysis and visualization workflow directly within Jupyter Notebook-based environments.

PyGWalker turns your Pandas DataFrame (or Polars DataFrame) into a visual UI where you can drag and drop variables to create graphs with ease. Install it from the command line, then pass your DataFrame to pyg.walk():

pip install pygwalker

import pygwalker as pyg
gwalker = pyg.walk(df)

What is Pandas DataFrame loc[]?

Pandas DataFrame loc[] is a label-based indexer for selecting and filtering data in a Pandas DataFrame. It is used with square brackets rather than called like a function, and it accepts a row selector, optionally followed by a column selector (df.loc[rows, columns]). Each selector can be a single label, a list of labels, a slice of labels, a boolean array, or a callable. A single row label returns that row as a Series, while a list, a slice, or a boolean array returns a DataFrame containing the matching rows; with a boolean array, those are the rows where the array is True.
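To make the list of accepted selectors concrete, here is a minimal sketch. It uses a small DataFrame with a hypothetical string index ('a' to 'd') instead of the default integer index, so that labels and positions are clearly different things.

```python
import pandas as pd

# Hypothetical index labels 'a'..'d', chosen only to separate labels from positions
df = pd.DataFrame({'Age': [23, 24, 25, 26],
                   'Name': ['John', 'Mike', 'Sarah', 'Rachel']},
                  index=['a', 'b', 'c', 'd'])

single = df.loc['b']                 # one row label -> Series
several = df.loc[['a', 'c']]         # list of labels -> DataFrame
sliced = df.loc['b':'d']             # label slice, BOTH endpoints included
masked = df.loc[df['Age'] > 24]      # boolean array -> rows where it is True
by_func = df.loc[lambda d: d['Name'].str.len() == 4]  # callable returning a mask
cell = df.loc['c', 'Name']           # row label + column label -> scalar

print(sliced)
```

Note that the label slice 'b':'d' keeps row 'd', unlike a normal Python slice.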

How do you use Pandas DataFrame loc[]?

Accessing rows and columns by label(s) using Pandas DataFrame loc[] is quite straightforward. Here's an example:

import pandas as pd
 
# Creating a sample DataFrame
df = pd.DataFrame({'Age': [23, 24, 25, 26], 
                   'Name': ['John', 'Mike', 'Sarah', 'Rachel'], 
                   'Marks': [85, 90, 80, 95], 
                   'ID': ['A101', 'A102', 'A103', 'A104']})
 
# Accessing a row using the row label
row = df.loc[1]
 
# Accessing multiple rows using a list of row labels
rows = df.loc[[0, 2]]
 
# Accessing a column using the column label
ages = df.loc[:, 'Age']
 
# Accessing multiple columns using a list of column labels
subset = df.loc[:, ['Name', 'Marks']]

In the above example, we have created a sample DataFrame with four rows and four columns and used loc[] to access rows and columns by label. A single row label or a single column label returns a pandas Series, a list of labels returns a DataFrame, and a single row label combined with a single column label returns a scalar value.
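The following sketch rebuilds the same sample DataFrame so it runs on its own and checks which type each form of selection returns. Wrapping a single label in a list is a handy way to keep a DataFrame when you need one.

```python
import pandas as pd

df = pd.DataFrame({'Age': [23, 24, 25, 26],
                   'Name': ['John', 'Mike', 'Sarah', 'Rachel'],
                   'Marks': [85, 90, 80, 95],
                   'ID': ['A101', 'A102', 'A103', 'A104']})

row = df.loc[1]              # Series indexed by the column names
col = df.loc[:, 'Age']       # Series indexed by the row labels
one_row_df = df.loc[[1]]     # a one-element list keeps a DataFrame
value = df.loc[1, 'Marks']   # single row + single column -> scalar

print(type(row).__name__, type(col).__name__, type(one_row_df).__name__)
print(value)
```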

We can also filter rows based on a given condition using loc[]. Here is an example:

# Filter rows based on a condition
filtered_df = df.loc[df['Age'] > 24]

In the above example, we are using loc[] to filter rows where the Age column is greater than 24.
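Conditions can also be combined and paired with a column selection in the same loc[] call. In this sketch each condition is wrapped in parentheses because & and | bind more tightly than comparison operators in Python, and ~ negates a condition.

```python
import pandas as pd

df = pd.DataFrame({'Age': [23, 24, 25, 26],
                   'Name': ['John', 'Mike', 'Sarah', 'Rachel'],
                   'Marks': [85, 90, 80, 95],
                   'ID': ['A101', 'A102', 'A103', 'A104']})

# Rows where Age > 23 AND Marks >= 90, keeping only two columns
top_scorers = df.loc[(df['Age'] > 23) & (df['Marks'] >= 90), ['Name', 'Marks']]

# Rows where Age is NOT greater than 24
younger = df.loc[~(df['Age'] > 24)]

print(top_scorers)
```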

What are the advantages of using Pandas DataFrame loc[]?

One of the major advantages of using Pandas DataFrame loc[] is its label-based indexing. Labels usually carry meaning (an ID, a date, a name) and stay attached to their rows, so label-based indexing is often more intuitive and readable than integer-based indexing, which can make your code more expressive and less error-prone.
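Label-based indexing pays off most when the index carries meaning. As an illustration (making ID the index is our own choice here, not something the earlier example requires), the sketch below uses set_index() so rows can be looked up by student ID.

```python
import pandas as pd

df = pd.DataFrame({'Age': [23, 24, 25, 26],
                   'Name': ['John', 'Mike', 'Sarah', 'Rachel'],
                   'Marks': [85, 90, 80, 95],
                   'ID': ['A101', 'A102', 'A103', 'A104']})

# Use the ID column as the row labels
students = df.set_index('ID')

name = students.loc['A102', 'Name']                     # look up one student by ID
block = students.loc['A102':'A104', ['Name', 'Marks']]  # A102 through A104, inclusive

print(name)
print(block)
```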

Another advantage of loc[] is that you can use it to assign new values to a subset of the DataFrame, selecting the rows and the column to update in a single step.

# Change values for specific rows
df.loc[0:1, 'Age'] = 24

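In the above example, we are using loc[] to set the Age value of the rows labelled 0 and 1 to 24. Unlike ordinary Python slicing, a label slice in loc[] includes both endpoints, so 0:1 selects two rows, which are the first two rows here because the DataFrame uses the default integer index.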
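Assignment also works with a boolean condition as the row selector. In this sketch the Grade column is a hypothetical addition for illustration. Writing df[df['Marks'] >= 90]['Grade'] = 'A' instead would be chained assignment, which may operate on a temporary copy and leave df unchanged; a single loc[] call avoids that.

```python
import pandas as pd

df = pd.DataFrame({'Age': [23, 24, 25, 26],
                   'Name': ['John', 'Mike', 'Sarah', 'Rachel'],
                   'Marks': [85, 90, 80, 95],
                   'ID': ['A101', 'A102', 'A103', 'A104']})

# Hypothetical 'Grade' column: start everyone at 'B'
df['Grade'] = 'B'

# Overwrite only the rows that meet the condition
df.loc[df['Marks'] >= 90, 'Grade'] = 'A'

# Update a value in place for one matching row (Sarah: 80 -> 85)
df.loc[df['Name'] == 'Sarah', 'Marks'] += 5

print(df)
```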

How is Pandas DataFrame loc[] different from Pandas iloc[]?

Pandas iloc[] is similar to loc[], but instead of label-based indexing, iloc[] uses integer-based indexing. Here's an example:

# Access the first row using iloc[]
df.iloc[0]
 
# Access rows and columns using integer position
df.iloc[0:2, 1:3]

In the first example, we are accessing the first row of the DataFrame using iloc[]. In the second example, we use iloc[] to access a subset of the DataFrame using integer positions.
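One practical difference worth checking is how slices end. With the default integer index the two calls below look alike, but loc[] includes the end label while iloc[] excludes the end position, as in standard Python slicing.

```python
import pandas as pd

df = pd.DataFrame({'Age': [23, 24, 25, 26],
                   'Name': ['John', 'Mike', 'Sarah', 'Rachel']})

by_label = df.loc[0:2]      # labels 0, 1 and 2 -> 3 rows
by_position = df.iloc[0:2]  # positions 0 and 1 -> 2 rows (end excluded)

print(len(by_label), len(by_position))  # 3 2
```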

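iloc[] is convenient when you genuinely need positions (for example, the first five rows), but positions are not tied to the data. Performance differences between the two are generally minor; the more important difference is robustness: after the DataFrame is sorted, filtered, or has rows inserted, the same integer position may point to a different row, whereas a label stays attached to its row, so loc[] is more robust to such changes.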
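A short sketch of that effect: after sorting by Marks, label 0 still refers to John, while position 0 now refers to whoever has the highest score.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['John', 'Mike', 'Sarah', 'Rachel'],
                   'Marks': [85, 90, 80, 95]})

# Sorting reorders the rows but keeps each row's original label
sorted_df = df.sort_values('Marks', ascending=False)

by_label = sorted_df.loc[0, 'Name']        # row labelled 0 -> 'John'
by_position = sorted_df['Name'].iloc[0]    # first row after sorting -> 'Rachel'

print(by_label, by_position)  # John Rachel
```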

Can you select/filter rows and columns by names/labels using Pandas DataFrame loc[]?

Yes, you can select/filter rows and columns by labels using Pandas DataFrame loc[]. Here's an example:

# Filter rows using column label and condition
filtered_df = df.loc[df['Name'] == 'Mike']
 
# Access a subset of rows and columns using labels
subset_df = df.loc[0:1, ['Name', 'Age']]

In the first example, we use loc[] with a boolean condition on the Name column to keep only the rows where Name equals 'Mike'. In the second example, we select the rows labelled 0 through 1 together with the Name and Age columns.
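The same ideas extend to matching several values and to filtering columns. This sketch uses Series.isin() to keep rows whose Name is in a list, and a boolean array built from the column names (here, names starting with 'A', an arbitrary rule for illustration) as the column selector.

```python
import pandas as pd

df = pd.DataFrame({'Age': [23, 24, 25, 26],
                   'Name': ['John', 'Mike', 'Sarah', 'Rachel'],
                   'Marks': [85, 90, 80, 95],
                   'ID': ['A101', 'A102', 'A103', 'A104']})

# Rows whose Name is one of several values, with two chosen columns
picked = df.loc[df['Name'].isin(['Mike', 'Sarah']), ['Name', 'Age']]

# A boolean array works on the column axis too: keep columns starting with 'A'
a_cols = df.loc[:, df.columns.str.startswith('A')]

print(picked)
print(a_cols.columns.tolist())  # ['Age']
```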

Conclusion

In this article, we have explored the Pandas DataFrame loc[] indexer, its syntax and examples, and its advantages over other methods. We have shown how loc[] can be used to access and filter data by labels or boolean arrays, and how it differs from integer-based indexing with iloc[]. By using Pandas DataFrame loc[], you can write more expressive and robust code for data analysis and manipulation in Python.