Python Syntax and Functions Part2 (Summary Statistics)
The .agg()
method allows you to apply your own custom functions to a DataFrame, as well as apply functions to more than one column of a DataFrame at once, making your aggregations super efficient.
In the custom function for this exercise, "IQR" is short for inter-quartile range, which is the 75th percentile minus the 25th percentile. It's an alternative to standard deviation that is helpful if your data contains
import numpy as np
def iqr(column):
return column.quantile(0.75) - column.quantile(0.25)
# Update to print IQR and median of temperature_c, fuel_price_usd_per_l, & unemployment
print(sales[["temperature_c", "fuel_price_usd_per_l", "unemployment"]].agg([iqr, np.median]))
Calulating Cumulative Statistics
Syntax = df.drop_duplicates
sales_1_1 = sales_1_1.sort_values("date")
# Get the cumulative sum of weekly_sales, add as cum_weekly_sales col
sales_1_1["cum_weekly_sales"] = sales_1_1["weekly_sales"].cumsum()
# Get the cumulative max of weekly_sales, add as cum_max_sales col
sales_1_1["cum_max_sales"] = sales_1_1["weekly_sales"].cummax()
# See the columns you calculated
print(sales_1_1[["date", "weekly_sales", "cum_weekly_sales", "cum_max_sales"]])
Drop Duplicate Syntax in a column
store_types = sales.drop_duplicates(subset=["store", "type"])
print(store_types.head())
# Subset the rows that are holiday weeks and drop duplicate dates
holiday_dates = sales[sales["is_holiday"]].drop_duplicates(subset="date")
# Drop duplicate store/type combinations
store_types = sales.drop_duplicates(subset=["store", "type"])
# Drop duplicate store/department combinations
store_depts = sales.drop_duplicates(subset=["store", "department"])
Counting Values and Proporting Syntax Pandas
store_counts = store_types['type'].value_counts()
print(store_counts)
# Get the proportion of stores of each type
store_props = store_types['type'].value_counts(normalize=True)
print(store_props)
# Count the number of each department number and sort
dept_counts_sorted = store_depts['department'].value_counts(ascending=False)
print(dept_counts_sorted)
# Get the proportion of departments of each number and sort
dept_props_sorted = store_depts['department'].value_counts(normalize=True,ascending=False)
#print(dept_props_sorted)
Aggregating Columns using Group by and agg()
sales_stats = sales.groupby("type")["weekly_sales"].agg([np.min, np.max, np.mean, np.median])
# Print sales_stats
print(sales_stats)
unemp_fuel_stats = sales.groupby("type")[["unemployment", "fuel_price_usd_per_l"]].agg([np.min, np.max, np.mean, np.median])
Comments
Post a Comment