Seaborn Visualization

September 06, 2020

import seaborn as sns
import matplotlib.pyplot as plt

Seaborn is a Python data visualization library for creating attractive statistical graphics with concise code, which makes it well suited to exploring data and drawing insights from it. Seaborn is built on top of Matplotlib and improves on it in several ways:

Seaborn provides a more visually appealing plotting style and concise syntax.
Seaborn works directly with Pandas DataFrames, so data loaded from a CSV file can be plotted with little or no reshaping.
Seaborn can summarize DataFrames with many rows of data into aggregated charts, for example by plotting the mean of each group.

If you’re unfamiliar with Pandas, it is a data analysis library for Python that provides easy-to-use data structures (most importantly the DataFrame) for organizing and manipulating datasets so they can be visualized. To get the most out of Seaborn, prepare your data with Pandas first.

In the sections below, we explain how Seaborn works with Pandas and how it turns large datasets into easy-to-understand graphics.

Basic Syntax

Bar plot

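import matplotlib.pyplot as plt
import seaborn as sns

# df is a DataFrame with 'Gender' and 'Mean Satisfaction' columns
sns.barplot(
    data=df,
    x='Gender',
    y='Mean Satisfaction'
)
plt.show()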
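To see the aggregation in action, here is a minimal, self-contained sketch that builds a small DataFrame in place of the survey data. The values are made up, and the column is called "Satisfaction" (an assumption) because each row is a single response rather than a precomputed mean. It also computes the per-group means with Pandas so you can compare them with the bar heights.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical survey data: several responses per gender (illustrative values only)
df = pd.DataFrame({
    "Gender": ["Male", "Male", "Male", "Female", "Female", "Female"],
    "Satisfaction": [3, 4, 5, 4, 5, 5],
})

# Seaborn aggregates all rows in each x category; the default estimator is the mean
ax = sns.barplot(data=df, x="Gender", y="Satisfaction")

# The same numbers computed directly with Pandas, for comparison with the bar heights
means = df.groupby("Gender")["Satisfaction"].mean()
print(means)

plt.show()
```

Each bar's height should match the corresponding Pandas mean (4.0 for Male, about 4.67 for Female).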




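By default, the error bar on each bar is a bootstrapped 95% confidence interval: Seaborn resamples the data with replacement many times (1,000 by default), recomputes the aggregate for each resample, and draws the range that covers the middle 95% of those results. Roughly speaking, this interval means “based on this data, we are fairly confident that the true value of the aggregate lies within this range”.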
In our gradebook example, the confidence interval for the assignments means “if we gave this assignment to many, many students, we’re confident that the mean score on the assignment would be within the range represented by the error bar”.
The confidence interval is a useful error bar because it can be computed for many kinds of aggregate functions, such as the median or mode, not just the mean.
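The resampling idea can be sketched in a few lines of NumPy. This is a simplified percentile bootstrap written for illustration, not Seaborn's own code; the helper name `bootstrap_ci`, its parameters, and the grades are all hypothetical. Because the estimator is just a function argument, the same procedure works for the median as well as the mean.

```python
import numpy as np

def bootstrap_ci(values, estimator=np.mean, n_boot=1000, level=95, seed=0):
    """Hypothetical helper: percentile bootstrap confidence interval for any estimator."""
    rng = np.random.default_rng(seed)
    values = np.asarray(values)
    # Resample the data with replacement and apply the estimator to each resample
    boot_stats = np.array([
        estimator(rng.choice(values, size=len(values), replace=True))
        for _ in range(n_boot)
    ])
    # Keep the middle `level` percent of the bootstrap distribution
    tail = (100 - level) / 2
    return np.percentile(boot_stats, [tail, 100 - tail])

grades = [72, 85, 90, 66, 78, 95, 88, 81]  # hypothetical assignment scores
print("mean:", np.mean(grades), "95% CI:", bootstrap_ci(grades))
print("median:", np.median(grades), "95% CI:", bootstrap_ci(grades, estimator=np.median))
```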
If you’re calculating a mean and would prefer standard deviation error bars, pass the keyword argument ci="sd" to sns.barplot(), so each error bar extends one standard deviation above and below the mean. In Seaborn 0.12 and later, ci is deprecated and the equivalent argument is errorbar="sd". It looks like this:
sns.barplot(data=gradebook, x="name", y="grade", ci="sd")
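What the standard deviation error bar represents can be checked with plain Pandas. The sketch below uses a hypothetical gradebook (made-up names and grades) and computes, per assignment, the mean (the bar height) and the range from one standard deviation below to one above it. It assumes Seaborn uses the sample standard deviation, as Pandas' `.std()` does by default.

```python
import pandas as pd

# Hypothetical gradebook: one row per student per assignment
gradebook = pd.DataFrame({
    "name": ["Assignment 1"] * 4 + ["Assignment 2"] * 4,
    "grade": [70, 80, 90, 100, 60, 65, 70, 75],
})

# Bar height = mean; a "sd" error bar spans mean - std to mean + std
summary = gradebook.groupby("name")["grade"].agg(["mean", "std"])
summary["low"] = summary["mean"] - summary["std"]
summary["high"] = summary["mean"] + summary["std"]
print(summary)
```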
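Seaborn aggregates with the mean by default. To use a different aggregate function, pass it to the estimator keyword. For example, to plot the median, pass in np.median (this requires NumPy):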
import numpy as np

sns.barplot(data=df,
            x="x-values",
            y="y-values",
            estimator=np.median)
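A common reason to switch to the median is an outlier. In this sketch (hypothetical data, with the original's placeholder column names), one large value pulls group B's mean up to 11 while its median stays at 2, so both bars drawn with `estimator=np.median` have the same height.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Hypothetical data with one large outlier (30) in group "B"
df = pd.DataFrame({
    "x-values": ["A", "A", "A", "B", "B", "B"],
    "y-values": [1, 2, 3, 1, 2, 30],
})

# Bar heights are the per-group medians instead of the means
ax = sns.barplot(data=df, x="x-values", y="y-values", estimator=np.median)

# Pandas comparison: the outlier moves B's mean a lot but its median not at all
stats = df.groupby("x-values")["y-values"].agg(["mean", "median"])
print(stats)

plt.show()
```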

Sometimes we’ll want to aggregate our data by multiple columns to visualize nested categorical variables.
For example, consider our hospital survey data. The mean satisfaction seems to depend on Gender, but it might also depend on another column: Age Range.
We can compare the Gender and Age Range factors at once by passing Age Range to the keyword hue, which splits each Gender bar into one colored bar per age range.
sns.barplot(data=df,
            x="Gender",
            y="Response",
            hue="Age Range")
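Here is a runnable version of the hue example with a small, made-up stand-in for the hospital survey (all values hypothetical). The Pandas pivot table shows the same nested means that the bars display: one per (Gender, Age Range) pair.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical survey responses (illustrative values only)
df = pd.DataFrame({
    "Gender":    ["Male", "Male", "Male", "Male", "Female", "Female", "Female", "Female"],
    "Age Range": ["18-35", "18-35", "36+", "36+", "18-35", "18-35", "36+", "36+"],
    "Response":  [3, 5, 4, 4, 5, 5, 2, 4],
})

# One bar per (Gender, Age Range) pair, colored by Age Range
ax = sns.barplot(data=df, x="Gender", y="Response", hue="Age Range")

# The same nested means as a table: rows = Gender, columns = Age Range
table = df.pivot_table(index="Gender", columns="Age Range",
                       values="Response", aggfunc="mean")
print(table)

plt.show()
```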

To review the seaborn workflow:

1. Ingest data from a CSV file to Pandas DataFrame.

df = pd.read_csv('file_name.csv')

2. Call `sns.barplot()` with the desired column names for `x` and `y`, and set `data` equal to your DataFrame.

sns.barplot(data=df, x='X-Values', y='Y-Values')

3. Optionally set the `estimator` and `hue` parameters. For example, `estimator=len` plots the number of rows in each group instead of their mean.

sns.barplot(data=df, x='X-Values', y='Y-Values', estimator=len, hue='Value')

4. Render the plot using `plt.show()`.

plt.show()
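The four steps can be combined into one script. To keep this sketch self-contained, it reads a small, hypothetical CSV from an in-memory string via `io.StringIO` instead of `file_name.csv`; with a real file you would pass its path to `pd.read_csv` as in step 1. With `estimator=len`, each bar's height is the number of rows in that (Gender, Age Range) group.

```python
import io

import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Stand-in for 'file_name.csv' (hypothetical contents) so the sketch runs as-is
csv_text = """Gender,Age Range,Response
Male,18-35,3
Male,18-35,5
Male,36+,4
Female,18-35,5
Female,36+,2
Female,36+,4
Female,36+,1
"""

# 1. Ingest the CSV data into a DataFrame
df = pd.read_csv(io.StringIO(csv_text))

# 2 and 3. Plot with estimator=len (row count per bar) and hue for nesting
ax = sns.barplot(data=df, x="Gender", y="Response", estimator=len, hue="Age Range")

# 4. Render the plot
plt.show()
```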

Plotting Distributions

KDE plot - Kernel density estimate; shows a smoothed version of a dataset's distribution. Use sns.kdeplot().
Box plot - A classic statistical graphic that shows the median, interquartile range, and outliers. Use sns.boxplot().
Violin plot - A combination of a KDE and a box plot. Good for comparing several distributions at once. Use sns.violinplot().
