A/B test using Python

Calculating KPIs

You're now going to take what you've learned and work through calculating a KPI yourself. Specifically, you'll calculate the average amount paid per purchase within a user's first 28 days using the purchase_data DataFrame from before.

This KPI can provide a sense of the popularity of different in-app purchase price points to users within their first month.


max_purchase_date = current_date - timedelta(days=28)

# Filter to only include users who registered before our max date
purchase_data_filt = purchase_data[purchase_data.reg_date < max_purchase_date]

# Filter to contain only purchases within the first 28 days of registration
purchase_data_filt = purchase_data_filt[(purchase_data_filt.date <= 
                        purchase_data_filt.reg_date + timedelta(days=28))]

# Output the mean price paid per purchase
print(purchase_data_filt.price.mean())

Building on the previous exercise, let's look at the same KPI, average purchase price, and a similar one, median purchase price, within the first 28 days. Additionally, let's look at these metrics not limited to 28 days to compare.

We can calculate these metrics across a set of cohorts and see what differences emerge. This is a useful task as it can help us understand how behaviors vary across cohorts.

Note that in our data the price variable is given in cents.


Use np.where to create an array month1 containing:

  • the price of the purchase purchase, if

    1. the user registration .reg_date occurred at most 28 days ago (i.e. before max_reg_date), and

    2. the date of purchase .date occurred within 28 days of registration date .reg_date;

  • NaN, otherwise.

  • # Set the max registration date to be one month before today
    max_reg_date = current_date - timedelta(days=28)

    # Find the month 1 values
    month1 = np.where((purchase_data.reg_date < max_reg_date) &
                     (purchase_data.date < purchase_data.reg_date + timedelta(days=28)),
                      purchase_data.price, 
                      np.NaN)
                     
    # Update the value in the DataFrame
    purchase_data['month1'] = month1
  • purchase_data_upd = purchase_data.groupby(by=['gender', 'device'], as_index=False) 

    # Aggregate the month1 and price data 
    purchase_summary = purchase_data_upd.agg(
                            {'month1': ['mean', 'median'],
                            'price': ['mean', 'median']})

    # Examine the results 
    print(purchase_summary)

Comments

Popular posts from this blog

Binomial Test in Python

Python Syntax and Functions Part2 (Summary Statistics)

Slicing and Indexing in Python Pandas