Python Syntax and Functions Part2 (Summary Statistics)

August 28, 2020

The .agg() method allows you to apply your own custom functions to a DataFrame, as well as apply functions to more than one column of a DataFrame at once, making your aggregations super efficient.

In the custom function for this exercise, "IQR" is short for inter-quartile range, which is the 75th percentile minus the 25th percentile. It's an alternative to standard deviation that is helpful if your data contains

import numpy as np

def iqr(column):

return column.quantile(0.75) - column.quantile(0.25)

# Update to print IQR and median of temperature_c, fuel_price_usd_per_l, & unemployment

print(sales[["temperature_c", "fuel_price_usd_per_l", "unemployment"]].agg([iqr, np.median]))

Calulating Cumulative Statistics

Syntax = df.drop_duplicates

sales_1_1 = sales_1_1.sort_values("date")

# Get the cumulative sum of weekly_sales, add as cum_weekly_sales col

sales_1_1["cum_weekly_sales"] = sales_1_1["weekly_sales"].cumsum()

# Get the cumulative max of weekly_sales, add as cum_max_sales col

sales_1_1["cum_max_sales"] = sales_1_1["weekly_sales"].cummax()

# See the columns you calculated

print(sales_1_1[["date", "weekly_sales", "cum_weekly_sales", "cum_max_sales"]])

Drop Duplicate Syntax in a column

store_types = sales.drop_duplicates(subset=["store", "type"])

print(store_types.head())

# Subset the rows that are holiday weeks and drop duplicate dates

holiday_dates = sales[sales["is_holiday"]].drop_duplicates(subset="date")

# Drop duplicate store/type combinations
store_types = sales.drop_duplicates(subset=["store", "type"])

# Drop duplicate store/department combinations
store_depts = sales.drop_duplicates(subset=["store", "department"])

Counting Values and Proporting Syntax Pandas

store_counts = store_types['type'].value_counts()
print(store_counts)

# Get the proportion of stores of each type
store_props = store_types['type'].value_counts(normalize=True)
print(store_props)

# Count the number of each department number and sort
dept_counts_sorted = store_depts['department'].value_counts(ascending=False)
print(dept_counts_sorted)

# Get the proportion of departments of each number and sort
dept_props_sorted = store_depts['department'].value_counts(normalize=True,ascending=False)
#print(dept_props_sorted)

Aggregating Columns using Group by and agg()

sales_stats = sales.groupby("type")["weekly_sales"].agg([np.min, np.max, np.mean, np.median])

# Print sales_stats
print(sales_stats)

unemp_fuel_stats = sales.groupby("type")[["unemployment", "fuel_price_usd_per_l"]].agg([np.min, np.max, np.mean, np.median])

Search This Blog

Python Data Science Coding Reference

Python Syntax and Functions Part2 (Summary Statistics)

Comments

Post a Comment

Popular posts from this blog

Binomial Test in Python

Slicing and Indexing in Python Pandas