How to Create Custom Distribution Plots with Seaborn Displot

Data visualization is a crucial aspect of data analysis and machine learning. It allows us to understand complex data sets and draw insights from them. One of the most popular libraries for data visualization in Python is Seaborn, and one of its most powerful tools is the displot function. This tutorial will guide you through the process of creating and customizing distribution plots using the Seaborn displot function in Python.

Seaborn's displot is a versatile function that can create a variety of distribution plots, including histograms, KDE plots, and ECDF plots. It's a flexible and powerful tool that can handle both univariate and bivariate data, making it an essential part of any data analyst's toolkit. Whether you're a seasoned data scientist or a beginner just starting out, understanding how to use displot effectively can significantly enhance your data visualization skills.

What is Displot in Seaborn?

Seaborn's displot is a figure-level function for visualizing the distribution of data. It belongs to Seaborn's distributions module, not the relational module (which covers scatter and line plots). It sets up a FacetGrid and then draws on it with one of the axes-level functions histplot(), kdeplot(), or ecdfplot(), depending on the kind of plot you request.

The basic syntax for displot is as follows:

seaborn.displot(data=None, *, x=None, y=None, hue=None, row=None, col=None, weights=None, kind='hist', rug=False, rug_kws=None, log_scale=None, legend=True, palette=None, hue_order=None, hue_norm=None, color=None, col_wrap=None, row_order=None, col_order=None, height=5, aspect=1, facet_kws=None, **kwargs)

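The displot function takes a number of arguments that let you customize the appearance and behavior of your plots. For example, you can choose the kind of plot with kind ("hist", "kde", or "ecdf"), the variables to plot with x and y, and a variable for color grouping with hue. Everything after data must be passed by keyword. Extra keyword arguments (**kwargs) are forwarded to the underlying histplot(), kdeplot(), or ecdfplot() call, which is how options such as bins or kde reach the plot.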

Difference Between Distplot and Displot

Both distplot and displot visualize data distributions, but they are not the same function. distplot was the main way to draw histograms and KDE plots in earlier versions of Seaborn. It was deprecated in version 0.11.0, and displot (or the axes-level histplot and kdeplot) is now recommended instead.

displot is more flexible than distplot. It handles univariate data and, for histograms and KDE plots, bivariate data as well. It can draw histograms, KDE plots, and ECDF plots, optionally with a rug of individual observations (rug=True). It also returns a FacetGrid, so the row and col parameters can split the data across multiple subplots in a single figure.

Is Seaborn Deprecated?

No, Seaborn is not deprecated. Some individual functions, such as distplot, have been deprecated in recent versions. displot is now the recommended function for distribution plots. It is more flexible than distplot and follows the same figure-level interface as relplot() and catplot().

Seaborn Displot Examples

To better understand how to use displot, let's look at some examples. We'll start by importing the necessary libraries and loading a dataset:

import seaborn as sns
import matplotlib.pyplot as plt
 
# Load the penguins dataset
penguins = sns.load_dataset("penguins")
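load_dataset() downloads its file from Seaborn's online data repository on first use, so it needs network access. If you're working offline, one option is a synthetic DataFrame with the same column names. The helper below, make_fake_penguins, is a hypothetical stand-in for practice only. Its values are random draws around made-up centers, not real penguin measurements, so your plots will not match the real data.

```python
import numpy as np
import pandas as pd

def make_fake_penguins(n=300, seed=0):
    """Hypothetical helper: random data using the penguins column names.

    The numbers are NOT real measurements; they only let examples run offline.
    """
    rng = np.random.default_rng(seed)
    species = rng.choice(["Adelie", "Chinstrap", "Gentoo"], size=n)
    means = {"Adelie": 190.0, "Chinstrap": 196.0, "Gentoo": 217.0}  # made-up centers
    flipper = np.array([means[s] for s in species]) + rng.normal(0, 7, size=n)
    # Made-up linear relation so body mass grows with flipper length
    body_mass = 50 * flipper - 6000 + rng.normal(0, 300, size=n)
    sex = rng.choice(["Male", "Female"], size=n)
    return pd.DataFrame({
        "species": species,
        "sex": sex,
        "flipper_length_mm": flipper.round(),
        "body_mass_g": body_mass.round(-1),
    })

penguins = make_fake_penguins()
print(penguins.head())
```

Unlike the real dataset, this stand-in has no missing values.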

Example 1: Basic Histogram

The simplest use of displot is to create a histogram of a single variable. Here's how you can create a histogram of the flipper_length_mm variable from the penguins dataset:

sns.displot(data=penguins, x="flipper_length_mm")
plt.show()

This creates a basic histogram whose bins are chosen automatically (the default is bins="auto", which lets NumPy pick the bin edges with a reference rule). You can set the number of bins yourself with the bins parameter:

sns.displot(data=penguins, x="flipper_length_mm", bins=20)
plt.show()

Example 2: Histogram with KDE

You can also add a Kernel Density Estimate (KDE) plot to your histogram using the kde parameter:

sns.displot(data=penguins, x="flipper_length_mm", kde=True)
plt.show()

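The KDE curve is a smooth estimate of the underlying probability density: each observation contributes a small Gaussian bump, and the bumps are added up. Unlike a histogram, it doesn't depend on where the bin edges fall, although its shape does depend on the smoothing bandwidth.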
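To make "smooth estimate" concrete, here is a from-scratch Gaussian KDE in NumPy. It is a sketch, not Seaborn's own code, and gaussian_kde_1d is a hypothetical helper. It assumes Gaussian kernels and Scott's rule for the bandwidth, which to our understanding match Seaborn's defaults (bw_method="scott", bw_adjust=1). Seaborn additionally limits the curve to a few bandwidths beyond the data (the cut parameter) and evaluates it on its own grid.

```python
import numpy as np

def gaussian_kde_1d(samples, grid, bw_adjust=1.0):
    """Hypothetical helper: evaluate a 1-D Gaussian KDE at each grid point."""
    samples = np.asarray(samples, dtype=float)
    n = samples.size
    # Scott's rule of thumb: bandwidth = sample std * n**(-1/5), times bw_adjust
    h = samples.std(ddof=1) * n ** (-1 / 5) * bw_adjust
    # Standardized distance from every grid point to every sample
    z = (grid[:, None] - samples[None, :]) / h
    # Each sample contributes a Gaussian bump; average them and rescale by h
    bumps = np.exp(-0.5 * z**2) / np.sqrt(2 * np.pi)
    return bumps.sum(axis=1) / (n * h)

# Synthetic stand-in data (random values, not real penguin measurements)
rng = np.random.default_rng(3)
samples = rng.normal(200, 14, size=300)

grid = np.linspace(samples.min() - 50, samples.max() + 50, 2000)
density = gaussian_kde_1d(samples, grid)
smoother = gaussian_kde_1d(samples, grid, bw_adjust=2.0)

dx = grid[1] - grid[0]
area = density.sum() * dx  # Riemann-sum approximation of the integral
print(f"area under the curve: {area:.4f}")
print(f"peak height: default {density.max():.4f}, doubled bandwidth {smoother.max():.4f}")
```

The area comes out very close to 1, as it should for a density. Doubling the bandwidth flattens the peak, which is the same trade-off bw_adjust controls in Seaborn.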

Example 3: FacetGrid Histogram

One of the most powerful features of displot is its ability to create multiple subplots in a single figure using FacetGrid. You can create a separate subplot for each species of penguin like this:

sns.displot(data=penguins, x="flipper_length_mm", col="species")
plt.show()

This will create a separate histogram for each species of penguin, allowing you to compare the flipper length distributions between species.

Seaborn Displot Customization

Seaborn's displot function provides a variety of options for customizing the appearance of your plots. You can control the color of the plot, the width and range of the bins, the appearance of the KDE curve, and more.

Example 4: Customizing Color and Bins

To change the color of the plot, you can use the color parameter. For example, to create a red histogram, you can do:

sns.displot(data=penguins, x="flipper_length_mm", color="red")
plt.show()

You can also control the width and range of the bins with the binwidth and binrange parameters. For example, to create a histogram with bins 5 mm wide covering 150 to 250 mm, you can do:

sns.displot(data=penguins, x="flipper_length_mm", binwidth=5, binrange=(150, 250))
plt.show()

Example 5: Customizing KDE Plot

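If you add a KDE curve with kde=True, two dictionaries control it. kde_kws goes to the density estimate itself (for example, bw_adjust scales the smoothing bandwidth). line_kws goes to matplotlib when the line is drawn (for example, linewidth). Line properties such as color or lw are not valid in kde_kws and raise a TypeError. The KDE line takes its color from the histogram, so use the color parameter to change both. For example, to draw a green plot with a thicker, less smoothed KDE line, you can do:

sns.displot(data=penguins, x="flipper_length_mm", kde=True, color="green",
            kde_kws={"bw_adjust": 0.5}, line_kws={"linewidth": 3})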
plt.show()

Seaborn Displot with Multiple Columns

displot can also work with more than one column of your DataFrame at a time. You can map a second numeric column to y for a bivariate plot, or a categorical column to hue to split the data into groups. This makes it easier to spot patterns and relationships between variables.

Example 6: Displot with Two Variables

To create a displot with two variables, you can specify both the x and y parameters. For example, to create a bivariate histogram of the flipper_length_mm and body_mass_g variables, you can do:

sns.displot(data=penguins, x="flipper_length_mm", y="body_mass_g")
plt.show()

This will create a 2D histogram where the color intensity represents the number of data points in each bin.

Example 7: Displot with Hue

You can also use the hue parameter to group your data by another variable. For example, to create a histogram of flipper_length_mm grouped by species, you can do:

sns.displot(data=penguins, x="flipper_length_mm", hue="species")
plt.show()

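This draws one histogram per species on the same axes, each in its own color, and adds a legend. By default the histograms are layered over one another, so overlapping regions can hide each other.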

Frequently Asked Questions

  1. What is the displot function in Seaborn?

The displot function in Seaborn is a flexible function designed to visualize the distribution of data. It can create a variety of distribution plots, including histograms, KDE plots, and ECDF plots.

  2. How can I customize the appearance of my displot?

You can customize the appearance of your displot using various parameters. Use color for the color of the plot, binwidth and binrange for the width and range of the bins, kde_kws for the KDE estimate itself (for example bw_adjust), and line_kws for the appearance of the KDE line.

  3. Can I use displot with multiple columns of data?

Yes, displot can handle multiple columns of data. You can specify both the x and y parameters to create a bivariate histogram, or use the hue parameter to group your data by another variable.