Chi Square Test
HYPOTHESIS TESTING
In the last exercise, we looked at data where customers visited a website and either made a purchase or did not make a purchase. What if we also wanted to understand if the probability of making a purchase depends on some other categorical variable, like gender? If we want to understand whether the outcomes of two categorical variables are associated, we should use a Chi Square test. It is useful in situations like:
- An A/B test where half of users were shown a green submit button and the other half were shown a purple submit button. Was one group more likely to click the submit button?
- People under and over age 40 were given a survey asking “Which of the following three products is your favorite?” Did these age groups have significantly different preferences?
In SciPy, you can use the function chi2_contingency to perform a Chi Square test.
The input to chi2_contingency is a contingency table where:
- The columns are each a different condition, such as Interface A vs. Interface B
- The rows represent different outcomes, like “Clicked a Link” vs. “Didn’t Click”
This table can have as many rows and columns as you need.
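To make the table layout concrete, here is a minimal sketch of a 2x2 contingency table passed to chi2_contingency. The counts are hypothetical and made up purely for illustration; the variable names are our own.

```python
from scipy.stats import chi2_contingency

# Hypothetical counts, made up for illustration only.
# Columns are conditions: Interface A, Interface B.
# Rows are outcomes: clicked a link, didn't click.
table = [
    [30, 45],  # clicked a link
    [70, 55],  # didn't click
]

# The result unpacks into the test statistic, the p-value,
# the degrees of freedom, and the counts expected under the null hypothesis.
chi2, pval, dof, expected = chi2_contingency(table)

print("chi-square statistic:", chi2)
print("p-value:", pval)
print("degrees of freedom:", dof)
print("expected counts:")
print(expected)
```

Note that for a 2x2 table like this one, chi2_contingency applies Yates' continuity correction by default. You can pass correction=False if you want the uncorrected statistic.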
Let’s return to the question of whether gender is associated with the probability of a website visitor making a purchase. The null hypothesis is that there is no association between the variables (e.g., men, women, and non-binary people are all equally likely to make a purchase on the website, so gender and purchase status are not associated). If the p-value is below our chosen significance threshold (often 0.05), we reject the null hypothesis and conclude that there is a statistically significant association between the two variables (e.g., men, women, and non-binary people appear to have different probabilities of making a purchase, so gender is associated with purchase status).
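The decision rule above could be sketched as follows. The purchase counts here are hypothetical (they are not from any real dataset), and the names gender_table and significance_threshold are our own choices for the example.

```python
from scipy.stats import chi2_contingency

# Hypothetical counts, made up for illustration only.
# Columns are conditions: men, women, non-binary people.
# Rows are outcomes: made a purchase, did not make a purchase.
gender_table = [
    [20, 40, 10],  # made a purchase
    [80, 60, 40],  # did not make a purchase
]

# Threshold chosen before looking at the data (0.05 is a common choice).
significance_threshold = 0.05

chi2_stat, p_value, deg_freedom, expected_counts = chi2_contingency(gender_table)

# Reject the null hypothesis of "no association" only if the
# p-value falls below the chosen threshold.
reject_null = p_value < significance_threshold

if reject_null:
    print("Reject the null hypothesis: gender appears to be associated with purchase status.")
else:
    print("Fail to reject the null hypothesis: no significant association detected.")
```

A significant result only tells us that some association exists. It does not tell us which groups differ from each other or how large the difference is.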