Python Syntax and Functions Part 2 (Summary Statistics)

The .agg() method allows you to apply your own custom functions to a DataFrame, as well as apply functions to more than one column of a DataFrame at once, making your aggregations super efficient.

In the custom function for this exercise, "IQR" is short for inter-quartile range, which is the 75th percentile minus the 25th percentile. It's an alternative to standard deviation that is helpful if your data contains outliers.

import numpy as np

# Custom IQR function: 75th percentile minus 25th percentile
def iqr(column):
    return column.quantile(0.75) - column.quantile(0.25)


# Print the IQR and median of temperature_c, fuel_price_usd_per_l, and unemployment
# (the string "median" avoids the FutureWarning that recent pandas versions raise for np.median)

print(sales[["temperature_c", "fuel_price_usd_per_l", "unemployment"]].agg([iqr, "median"]))
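To see why the IQR is useful next to the standard deviation, here is a minimal sketch on made-up data. The toy DataFrame and its values are assumptions for illustration only, they are not the sales data used in the exercise.

```python
import pandas as pd

# Hypothetical toy data (not the real sales dataset); 100.0 is a deliberate outlier
toy = pd.DataFrame({
    "temperature_c": [1.0, 2.0, 3.0, 4.0, 100.0],
    "unemployment": [8.0, 8.0, 8.0, 8.0, 8.0],
})

def iqr(column):
    # 75th percentile minus 25th percentile
    return column.quantile(0.75) - column.quantile(0.25)

# Apply a custom function and two built-in aggregations to every column at once
summary = toy.agg([iqr, "median", "std"])
print(summary)
```

The single outlier inflates the standard deviation of temperature_c, while the IQR only looks at the middle half of the values and stays small.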


Calculating Cumulative Statistics

Syntax = df["column"].cumsum(), df["column"].cummax()

# Sort sales_1_1 by date
sales_1_1 = sales_1_1.sort_values("date")


# Get the cumulative sum of weekly_sales, add as cum_weekly_sales col

sales_1_1["cum_weekly_sales"] = sales_1_1["weekly_sales"].cumsum()


# Get the cumulative max of weekly_sales, add as cum_max_sales col

sales_1_1["cum_max_sales"] = sales_1_1["weekly_sales"].cummax()


# See the columns you calculated

print(sales_1_1[["date", "weekly_sales", "cum_weekly_sales", "cum_max_sales"]])
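Cumulative statistics follow the row order, which is why the rows are sorted by date first. Here is a small sketch on made-up numbers; the toy DataFrame is an assumption for illustration, not the real sales_1_1 data.

```python
import pandas as pd

# Hypothetical toy data, deliberately out of date order
toy = pd.DataFrame({
    "date": pd.to_datetime(["2010-02-12", "2010-02-05", "2010-02-19"]),
    "weekly_sales": [30.0, 10.0, 20.0],
})

# Sort by date so the running totals follow time order
toy = toy.sort_values("date")

# Running total and running maximum of weekly_sales
toy["cum_weekly_sales"] = toy["weekly_sales"].cumsum()
toy["cum_max_sales"] = toy["weekly_sales"].cummax()

print(toy)
```

After sorting, the sales read 10, 30, 20, so the running total grows every week while the running maximum stays at 30 once that week has been seen.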


Dropping Duplicates

Syntax = df.drop_duplicates(subset=["col_a", "col_b"])

# Subset the rows that are holiday weeks and drop duplicate dates
holiday_dates = sales[sales["is_holiday"]].drop_duplicates(subset="date")

# Drop duplicate store/type combinations
store_types = sales.drop_duplicates(subset=["store", "type"])
print(store_types.head())

# Drop duplicate store/department combinations
store_depts = sales.drop_duplicates(subset=["store", "department"])
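
As a quick check of what subset does, here is a minimal sketch on a made-up table. The toy rows are assumptions for illustration; only the column names come from the exercise.

```python
import pandas as pd

# Hypothetical toy data: one row per weekly record, so stores and departments repeat
toy = pd.DataFrame({
    "store": [1, 1, 1, 2, 2],
    "type": ["A", "A", "A", "B", "B"],
    "department": [1, 1, 2, 1, 1],
})

# Keep the first row of each store/type combination
toy_store_types = toy.drop_duplicates(subset=["store", "type"])

# Keep the first row of each store/department combination
toy_store_depts = toy.drop_duplicates(subset=["store", "department"])

print(toy_store_types)
print(toy_store_depts)
```

By default drop_duplicates keeps the first occurrence of each combination and leaves the original index labels in place.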

Counting Values and Proportions in Pandas

# Count the number of stores of each type
store_counts = store_types['type'].value_counts()
print(store_counts)

# Get the proportion of stores of each type
store_props = store_types['type'].value_counts(normalize=True)
print(store_props)

# Count the number of each department number and sort
dept_counts_sorted = store_depts['department'].value_counts(ascending=False)
print(dept_counts_sorted)

# Get the proportion of departments of each number and sort
dept_props_sorted = store_depts['department'].value_counts(normalize=True, ascending=False)
print(dept_props_sorted)
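
The difference between counts and proportions is easiest to see on a tiny example. This is a sketch on a made-up Series, assumed for illustration only.

```python
import pandas as pd

# Hypothetical toy data: the type of four stores
types = pd.Series(["A", "A", "A", "B"], name="type")

# Number of stores of each type, largest count first by default
counts = types.value_counts()

# Share of stores of each type; the proportions sum to 1
props = types.value_counts(normalize=True)

print(counts)
print(props)
```

With three type A stores out of four, the count for A is 3 and its proportion is 0.75.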

Aggregating Columns Using groupby() and agg()

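# For each store type, aggregate weekly_sales: get min, max, mean, and median
# (string names avoid the FutureWarning that recent pandas versions raise for np.min, np.max, etc.)
sales_stats = sales.groupby("type")["weekly_sales"].agg(["min", "max", "mean", "median"])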

# Print sales_stats
print(sales_stats)
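
# For each store type, aggregate unemployment and fuel_price_usd_per_l: get min, max, mean, and median
unemp_fuel_stats = sales.groupby("type")[["unemployment", "fuel_price_usd_per_l"]].agg(["min", "max", "mean", "median"])

# Print unemp_fuel_stats
print(unemp_fuel_stats)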
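
Here is a minimal sketch of the same grouped aggregation on made-up numbers, so the output can be checked by hand. The toy DataFrame is an assumption for illustration, not the real sales data.

```python
import pandas as pd

# Hypothetical toy data: two store types with two weekly records each
toy = pd.DataFrame({
    "type": ["A", "A", "B", "B"],
    "weekly_sales": [10.0, 30.0, 5.0, 7.0],
})

# One row per store type, one column per summary statistic
toy_stats = toy.groupby("type")["weekly_sales"].agg(["min", "max", "mean", "median"])

print(toy_stats)
```

The result has the group labels as its index and the statistic names as its columns, so a single value can be read with toy_stats.loc["A", "mean"].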

