
Dictionary to DataFrame Conversion in Python Pandas


As a data scientist, you spend much of your time working with data, and one of the most common Python data structures for holding it is the dictionary: a collection of key-value pairs in which each key is unique. Pandas is a popular Python library for data analysis that provides powerful tools for data manipulation, and converting a dictionary into a Pandas DataFrame is one of the most common tasks in that workflow. In this blog post, we discuss how to convert a dictionary to a DataFrame in Pandas.

Want to quickly create Data Visualizations in Python?

PyGWalker is an open-source Python project that can help speed up the data analysis and visualization workflow directly within Jupyter Notebook-based environments.

PyGWalker turns your Pandas DataFrame (or Polars DataFrame) into a visual UI where you can drag and drop variables to create graphs with ease. Simply use the following code:

pip install pygwalker
import pygwalker as pyg
gwalker = pyg.walk(df)

You can run PyGWalker right now with these online notebooks:

And, don't forget to give us a ⭐️ on GitHub!

Run PyGWalker in Kaggle Notebook
Run PyGWalker in Google Colab
Give PyGWalker a ⭐️ on GitHub

What is a Dictionary?

In Python, a dictionary is a collection of key-value pairs. Each key is unique and corresponds to a value. Dictionaries are used to store and manipulate data that can be accessed using keys. Dictionaries in Python are defined using curly braces {} and can be nested.
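A quick sketch of the dictionary features described above. The `student` record and its fields are hypothetical examples; the last line also shows that dictionaries keep insertion order (guaranteed since Python 3.7), which matters later because Pandas uses key order for column order.

```python
# A dictionary maps unique keys to values (hypothetical student record)
student = {'name': 'Alice', 'grade': 95}

# Values are looked up by key
print(student['name'])  # Alice

# Assigning to an existing key replaces its value; keys stay unique
student['grade'] = 97
print(len(student))  # 2

# Dictionaries can be nested: a value may itself be a dictionary
student['address'] = {'city': 'Paris', 'zip': '75001'}
print(student['address']['city'])  # Paris

# Since Python 3.7, dictionaries preserve insertion order
print(list(student))  # ['name', 'grade', 'address']
```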

What is a DataFrame?

A DataFrame is a two-dimensional, table-like data structure in Pandas. It consists of labeled rows and columns, and each column can hold data of a different type. DataFrames are an excellent way to analyze data, and Pandas provides a wide range of methods for filtering, aggregating, reshaping, and otherwise manipulating the data they contain.
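To make the "one type per column" point concrete, here is a small sketch. The `attendance` column is a hypothetical addition used only to show a float column alongside an integer one; the exact dtype reported for the text column depends on the pandas version.

```python
import pandas as pd

# 'attendance' is a hypothetical extra column added to show a float dtype
df = pd.DataFrame({
    'name': ['Alice', 'Bob'],
    'grade': [95, 87],
    'attendance': [0.98, 0.91],
})

print(df.shape)  # (2, 3): two rows, three columns
# Each column carries its own dtype: grade is an integer column, attendance
# is a float column, and name is a text column (object or a string dtype,
# depending on the pandas version)
print(df.dtypes)
```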

Converting a Dictionary to a DataFrame

Pandas provides a simple way to convert a dictionary to a DataFrame: the pd.DataFrame.from_dict() class method. It takes a dictionary as its input and returns a DataFrame. By default (orient='columns'), each key in the dictionary becomes a column name, and each value, typically a list, supplies that column's entries, one per row.

Let's consider an example where we have a dictionary containing information about students, their grades, and their subjects:

student_data = {'name': ['Alice', 'Bob', 'Charlie'], 'grade': [95, 87, 92], 'subject': ['Math', 'English', 'Science']}

To convert this dictionary to a DataFrame, we simply use the from_dict() function:

import pandas as pd
 
df = pd.DataFrame.from_dict(student_data)
print(df)

The output of this code snippet will look like this:

      name  grade  subject
0    Alice     95     Math
1      Bob     87  English
2  Charlie     92  Science

As we can see, the dictionary keys (name, grade, and subject) were used as the column names of the resulting DataFrame, and each list of values became the data for its column, with one row per list element.
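For a dictionary of equal-length lists like this one, passing it straight to the pd.DataFrame() constructor should give the same result as from_dict() with its default orient='columns'. The quick check below is a sketch that compares the two and confirms that the column order follows the dictionary's key order.

```python
import pandas as pd

student_data = {'name': ['Alice', 'Bob', 'Charlie'],
                'grade': [95, 87, 92],
                'subject': ['Math', 'English', 'Science']}

df_from_dict = pd.DataFrame.from_dict(student_data)
df_constructor = pd.DataFrame(student_data)

# Both approaches produce identical frames; column order follows key order
print(df_from_dict.equals(df_constructor))  # True
print(list(df_constructor.columns))         # ['name', 'grade', 'subject']
```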

Using the orient parameter

In cases where the dictionary is structured differently, we can use the orient parameter to tell from_dict() how to interpret it. For from_dict(), orient accepts 'columns' (the default), 'index', and, since pandas 1.4, 'tight'. Let's consider an example where we have a dictionary containing lists of different lengths:

data = {'name': ['Alice', 'Bob', 'Charlie'], 'grade': [95, 87], 'subject': ['Math', 'English', 'Science']}

If we try to convert this dictionary to a DataFrame using the default behavior, Pandas raises a ValueError, because every column must have the same number of rows:

df = pd.DataFrame.from_dict(data)
# ValueError: All arrays must be of the same length

To avoid this error, we can set the orient parameter to index. The dictionary keys then become the row labels, each list becomes a row, and shorter lists are padded with missing values:

df = pd.DataFrame.from_dict(data, orient='index')
print(df)

The output of this code snippet will look like this:

             0        1        2
name     Alice      Bob  Charlie
grade       95       87     None
subject   Math  English  Science
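If you would rather keep the keys as columns even though the lists have different lengths, one common workaround (a sketch, not the only option) is to wrap each list in a pd.Series. The Series are aligned on their 0-based index, so positions missing from a shorter list become NaN.

```python
import pandas as pd

data = {'name': ['Alice', 'Bob', 'Charlie'],
        'grade': [95, 87],
        'subject': ['Math', 'English', 'Science']}

# Wrapping each list in a Series lets pandas align them on their index;
# positions missing from a shorter Series are filled with NaN
df = pd.DataFrame({key: pd.Series(values) for key, values in data.items()})
print(df)
```

Note that the grade column is upcast to float64 in this case, because NaN is a floating-point value.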

Using a List of Dictionaries

Another way to create a DataFrame from dictionaries is to pass a list of them to the pd.DataFrame() constructor. In this scenario, each dictionary in the list becomes a row in the resulting DataFrame, and its keys become the column names. Let's consider an example where we have a list of dictionaries representing students and their grades:

student_data = [{'name': 'Alice', 'grade': 95, 'subject': 'Math'},
                {'name': 'Bob', 'grade': 87, 'subject': 'English'},
                {'name': 'Charlie', 'grade': 92, 'subject': 'Science'}]

To convert this list of dictionaries to a DataFrame, we simply use the pd.DataFrame() function:

df = pd.DataFrame(student_data)
print(df)

The output of this code snippet will look like this:

      name  grade  subject
0    Alice     95     Math
1      Bob     87  English
2  Charlie     92  Science

As we can see, the resulting DataFrame is the same as the one created from the dictionary of lists in the first example.
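Records in real data are not always complete. The sketch below uses a hypothetical list in which one dictionary lacks the subject key: the columns become the union of all keys, in order of first appearance, and the missing entry is filled with NaN.

```python
import pandas as pd

# The second record has no 'subject' key (hypothetical incomplete data)
records = [{'name': 'Alice', 'grade': 95, 'subject': 'Math'},
           {'name': 'Bob', 'grade': 87},
           {'name': 'Charlie', 'grade': 92, 'subject': 'Science'}]

df = pd.DataFrame(records)

# Columns are the union of all keys; the missing value becomes NaN
print(df.columns.tolist())            # ['name', 'grade', 'subject']
print(df['subject'].isna().tolist())  # [False, True, False]
```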

Renaming the Columns

By default, the from_dict() function uses the dictionary keys as the column names in the resulting DataFrame. If we want different names, the columns parameter is not the right tool: with the default orient='columns', from_dict() raises a ValueError when columns is passed. Instead, we can build the DataFrame and then rename its columns. For example, if we have a dictionary with keys a, b, and c, but we want to use x, y, and z as the column names, we can do the following:

data = {'a': [1, 2, 3], 'b': [4, 5, 6], 'c': [7, 8, 9]}
# Build the DataFrame first, then map each old column name to a new one
df = pd.DataFrame.from_dict(data).rename(columns={'a': 'x', 'b': 'y', 'c': 'z'})
print(df)

The output of this code snippet will look like this:

   x  y  z
0  1  4  7
1  2  5  8
2  3  6  9
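The columns argument of from_dict() is intended for orient='index', where it names the columns built from each key's list of values. The sketch below shows that legitimate use and, for contrast, the rejected combination with the default orient='columns'.

```python
import pandas as pd

data = {'a': [1, 2, 3], 'b': [4, 5, 6], 'c': [7, 8, 9]}

# With orient='index', each key becomes a row and columns= labels the positions
df = pd.DataFrame.from_dict(data, orient='index', columns=['x', 'y', 'z'])
print(df)
#    x  y  z
# a  1  2  3
# b  4  5  6
# c  7  8  9

# With the default orient='columns', passing columns= raises a ValueError
try:
    pd.DataFrame.from_dict(data, columns=['x', 'y', 'z'])
except ValueError as err:
    print('ValueError:', err)
```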

Using Nested Dictionaries

The from_dict() function can also create a DataFrame from a nested dictionary, where each outer key maps to an inner dictionary with the same set of keys. Consider the following example:

data = {'a': {'x': 1, 'y': 2, 'z': 3}, 'b': {'x': 4, 'y': 5, 'z': 6}, 'c': {'x': 7, 'y': 8, 'z': 9}}

To make the outer keys the row labels and the inner keys the column names, we can set the orient parameter to index:

df = pd.DataFrame.from_dict(data, orient='index')
print(df)

The output of this code snippet will look like this:

   x  y  z
a  1  2  3
b  4  5  6
c  7  8  9
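For comparison, the sketch below loads the same nested dictionary with the default orient='columns', where the outer keys become columns, so the result is the transpose of the table above. It then shows the actual orient='tight' format, which expects a dictionary with 'index', 'columns', 'data', 'index_names', and 'column_names' keys, the layout that DataFrame.to_dict(orient='tight') produces. This assumes pandas 1.4 or later.

```python
import pandas as pd

data = {'a': {'x': 1, 'y': 2, 'z': 3},
        'b': {'x': 4, 'y': 5, 'z': 6},
        'c': {'x': 7, 'y': 8, 'z': 9}}

# Default orient='columns': outer keys -> columns, inner keys -> row labels
by_columns = pd.DataFrame.from_dict(data)
by_index = pd.DataFrame.from_dict(data, orient='index')
print(by_columns.equals(by_index.T))  # True

# orient='tight' (pandas >= 1.4) reads the layout written by to_dict(orient='tight')
tight = by_index.to_dict(orient='tight')
print(sorted(tight))  # ['column_names', 'columns', 'data', 'index', 'index_names']
restored = pd.DataFrame.from_dict(tight, orient='tight')
print(restored.equals(by_index))  # True
```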

Index and Column Names

When converting a dictionary to a DataFrame, we can also set the row labels and choose the column order. from_dict() has no index parameter, and it rejects columns when orient='columns', so for this task we pass the dictionary directly to the pd.DataFrame() constructor. Let's consider the following example:

data = {'name': ['Alice', 'Bob', 'Charlie'], 'grade': [95, 87, 92], 'subject': ['Math', 'English', 'Science']}

df = pd.DataFrame(data, columns=['name', 'subject', 'grade'], index=['student1', 'student2', 'student3'])
print(df)

The output of this code snippet will look like this:

             name  subject  grade
student1    Alice     Math     95
student2      Bob  English     87
student3  Charlie  Science     92

As we can see from this example, the columns parameter selects and orders the columns (names that are not dictionary keys produce columns of missing values rather than renaming anything), and the index parameter supplies the row labels, one per row.
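Because from_dict() has no index argument, another option is to build the DataFrame first and then label the rows, either by assigning a new index or by promoting an existing column with set_index(). A brief sketch, reusing the illustrative student labels from above:

```python
import pandas as pd

data = {'name': ['Alice', 'Bob', 'Charlie'],
        'grade': [95, 87, 92],
        'subject': ['Math', 'English', 'Science']}

df = pd.DataFrame.from_dict(data)

# Option 1: assign new row labels (the length must match the number of rows)
labeled = df.copy()
labeled.index = ['student1', 'student2', 'student3']

# Option 2: use an existing column as the index
by_name = df.set_index('name')

print(labeled.loc['student2', 'name'])  # Bob
print(by_name.loc['Charlie', 'grade'])  # 92
```

set_index() returns a new DataFrame and drops the promoted column from the regular columns, which is usually what you want when the column holds unique identifiers.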

Conclusion

In this blog post, we learned how to convert a dictionary to a DataFrame in Pandas using pd.DataFrame.from_dict() and the pd.DataFrame() constructor. We also saw how the orient parameter controls whether keys become columns or rows, how to handle lists of unequal length, and how to set column names and row labels. Moving data from dictionaries into DataFrames makes it easier to clean, analyze, and prepare that data for downstream tasks such as machine learning. The same idea of building a table from named columns also carries over to other data-frame tools, such as data frames in R.