Hypothesis Testing In Python: A Detailed Tutorial

We are probably living in the most dynamic time in human history. Things around us are changing at a rapid rate, mostly fueled by rapid advancement in the fields of machine learning and artificial intelligence. Businesses now have at their disposal significantly better decision-making tools than they did a decade or so ago.

But with more ways to reach a decision now available, the onus is on engineers and businesses to leverage technology to support rapid decision making.

Consider an e-commerce giant that has revamped its UI and wants to know whether the changes improve the user experience and lead to more users using its platform. A sound way to assess the proposed change is to run an A/B test, i.e. release the new UI to a controlled group of users and draw insights from their experience of it.

A/B testing falls under hypothesis testing, a statistical method for determining whether sample data provide enough evidence to draw conclusions about a wider population.

In this blog we will explore in detail what hypothesis testing is, when and where to use it, how to select the right hypothesis test, and the different types of hypothesis tests.


What is Hypothesis Testing ?

Hypothesis testing is a statistical tool that helps us decide, based on sample data, whether an assumption about a population holds up.

It involves stating an assumption, called the null hypothesis, together with an alternative hypothesis, then using statistics to determine whether the null hypothesis can be rejected. The decision is made against a level of significance; 5%, or 0.05, is a commonly chosen significance level.

A p-value is then derived by running a hypothesis test. If the p-value is less than the level of significance, we reject the null hypothesis; if it is greater, we fail to reject the null hypothesis.

Null Hypothesis, Alternate Hypothesis And P-Value

Let’s understand the terms null hypothesis, alternate hypothesis and p-value in a bit more detail.

Null Hypothesis (H0)

In statistical terms, the null hypothesis represents a statement of no effect or no difference. It is the default position, often referred to as the status quo.

Imagine we have a coin, flipping it will result in either a head or a tail. There is a 50% chance of a head or 50% chance of a tail during each toss of the coin. In this case our null hypothesis is that the coin is fair. We assume there is no bias or trickery going on.

Alternate Hypothesis (H1)

In statistical terms, the alternate hypothesis represents a statement that the test hopes to find evidence for.

The alternate hypothesis always contradicts the null hypothesis. It challenges the status quo, and the test must find sufficient evidence to reject the null hypothesis in its favor.

Sticking to the coin example, our alternate hypothesis could be that the coin is biased, perhaps more likely to produce heads. The suggestion here is that something differs from the assumed fairness.

P-Value

The p-value is a number between 0 and 1. If it’s small (typically less than the chosen level of significance, like 0.05), it suggests that the observed data is inconsistent with the null hypothesis.

The p-value is a measure of evidence against the null hypothesis. It tells you how likely it is to observe data at least as extreme as yours if the null hypothesis is true. A small p-value suggests that what you observed is unlikely to be just a coincidence.

Continuing with the coin example, if we flip the coin many times and consistently get heads, the p-value would be low. It’s like saying, “The chances of getting this many heads by luck are very slim. Something might be up with this coin.”
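The coin example can be sketched in Python with an exact binomial test. This is a minimal illustration, assuming SciPy 1.7 or later (for `scipy.stats.binomtest`); the flip counts are invented for the example.

```python
from scipy.stats import binomtest

# Hypothetical experiment: 100 flips, 65 heads (numbers invented for illustration).
n_flips = 100
n_heads = 65

# H0: the coin is fair (p = 0.5). H1: the coin favours heads (p > 0.5).
result = binomtest(n_heads, n_flips, p=0.5, alternative="greater")

alpha = 0.05
print(f"p-value: {result.pvalue:.4f}")
if result.pvalue < alpha:
    print("Reject H0: this many heads is unlikely under a fair coin.")
else:
    print("Fail to reject H0: no strong evidence of bias.")
```

The `alternative="greater"` argument matches the one-sided alternate hypothesis above; use `"two-sided"` if bias in either direction is of interest.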

When and Where is Hypothesis Testing Needed ?

Hypothesis testing serves as a powerful tool in statistical analysis, offering a structured framework to validate or refute assumptions about populations.

Hypothesis testing plays a pivotal role in various fields. Some of them are summarized below.

Inference and Decision Making

Hypothesis testing guides decision-making processes by providing a formalized method to assess the evidence for or against a particular claim or hypothesis.

Model Evaluation

Data scientists employ hypothesis tests to evaluate the effectiveness of statistical models, ensuring that chosen models accurately represent the underlying data distribution.

A/B Testing in Business and Marketing

Businesses leverage hypothesis testing, particularly in A/B testing scenarios, to assess the impact of changes or interventions on user behavior, conversion rates, and other key metrics.

Quality Control in Manufacturing

Industries employ hypothesis testing to ensure product quality by systematically testing samples to validate production processes and identify potential issues.

Medical Research and Clinical Trials

Hypothesis testing is instrumental in medical research, aiding in the evaluation of new treatments through rigorous testing of hypotheses about treatment efficacy and safety.

What are Type 1 and Type 2 errors in Hypothesis Testing ?

Type 1 and Type 2 errors are concepts associated with hypothesis testing.

Type 1 Error

A Type 1 error occurs when the null hypothesis (H0) is incorrectly rejected when it is actually true. In other words, it’s a false positive or a conclusion that there is an effect or difference when there isn’t.

Suppose a medical researcher is testing a new drug’s effectiveness in treating a certain condition. The null hypothesis (H0) is that the drug has no effect.

A Type 1 error would occur if, based on the sample data, the researcher incorrectly concludes that the drug is effective (rejects H0) when, in reality, it has no effect.

Type 2 Error

A Type 2 error occurs when the null hypothesis (H0) is not rejected when it is actually false. In other words, it’s a false negative or a conclusion that there is no effect or difference when there actually is.

Continuing with the drug example, a Type 2 error would occur if the researcher fails to reject the null hypothesis (H0) and concludes that the drug has no effect when, in reality, it does have a positive effect.

Trade-Off Between Type 1 and Type 2 Errors

Adjusting Significance Level (α)

Researchers can control the likelihood of Type 1 errors by choosing the significance level (α). A lower α reduces the chance of Type 1 errors but may increase the chance of Type 2 errors.

Sample Size

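Increasing the sample size generally reduces the risk of Type 2 errors, because larger samples give the test more power to detect a real effect. The Type 1 error rate is set by the chosen α, so a larger sample does not lower it on its own.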
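A small simulation can make the two error rates concrete. The sketch below is illustrative only: the effect size (0.3 standard deviations), the sample sizes and the helper name `rejection_rate` are assumptions chosen for the example, and the data are drawn from normal distributions.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
alpha = 0.05
n_sims = 2000


def rejection_rate(n, true_diff):
    """Fraction of simulated two-sample t-tests that reject H0 at level alpha."""
    a = rng.normal(0.0, 1.0, size=(n_sims, n))
    b = rng.normal(true_diff, 1.0, size=(n_sims, n))
    p_values = stats.ttest_ind(a, b, axis=1).pvalue
    return float(np.mean(p_values < alpha))


for n in (20, 200):
    # H0 true (no difference): every rejection is a Type 1 error.
    type1 = rejection_rate(n, true_diff=0.0)
    # H0 false (real difference): every non-rejection is a Type 2 error.
    type2 = 1.0 - rejection_rate(n, true_diff=0.3)
    print(f"n={n:>3}  Type 1 rate={type1:.3f}  Type 2 rate={type2:.3f}")
```

In runs like this the Type 1 rate stays close to α at both sample sizes, while the Type 2 rate drops as the sample grows.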

How to remember Type 1 & Type 2 errors ?

Type 1 Error (False Positive): Remember it as “I see an effect that isn’t there.”

Type 2 Error (False Negative): Remember it as “I miss an effect that is there.”

What are the steps involved in Hypothesis Testing ?

Let us now understand the steps involved in hypothesis testing.

Step 1: Defining the Null & Alternative hypothesis

The first step is stating the null hypothesis (H0) and the alternative hypothesis (H1). Defining these helps us set the course for running the test properly as we get a good understanding of the status quo we are trying to support or reject.

Step 2: Choose a Significance Level

Select a significance level (α), typically 0.05, to set the threshold for rejecting the null hypothesis. It is the probability of a Type 1 error we are willing to tolerate, so it should be fixed before the data is examined.

Step 3: Collect and analyze data

Collect a sample from the population of interest.

Step 4: Choose an appropriate test

Select a statistical test based on the nature of the hypothesis (e.g., t-test, chi-square test, ANOVA, etc.) and the characteristics of the data.

Step 5: Calculate the Test Statistic

Compute the test statistic that quantifies the difference between the sample data and what would be expected under the assumption of the null hypothesis.

Step 6: Compare the P-Value to the Significance Level

In this step we compare the p-value returned by our test to the significance level chosen in step 2. Based on whether the p-value exceeds the significance level or not an inference is drawn.

Step 7: Interpret the result

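We conclude the test by interpreting the result: the outcome is either rejecting the null hypothesis or failing to reject it. Failing to reject does not prove the null hypothesis true; it only means the sample did not provide enough evidence against it.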
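The seven steps can be sketched end to end in Python. This is a minimal illustration, assuming a one-sample scenario (a class's average score against a claimed mean of 70) and invented scores; the test used here, `scipy.stats.ttest_1samp`, is chosen for the example rather than prescribed by the steps.

```python
import numpy as np
from scipy import stats

# Step 1: H0: the mean score is 70. H1: the mean score differs from 70.
claimed_mean = 70.0

# Step 2: choose the significance level before looking at the data.
alpha = 0.05

# Step 3: collect a sample (hypothetical scores, invented for illustration).
scores = np.array([72, 68, 75, 71, 69, 74, 77, 70, 73, 76], dtype=float)

# Steps 4 and 5: one group of continuous data, so use a one-sample t-test,
# which returns the test statistic and the p-value.
t_stat, p_value = stats.ttest_1samp(scores, popmean=claimed_mean)

# Steps 6 and 7: compare the p-value with alpha and interpret the outcome.
decision = "reject H0" if p_value < alpha else "fail to reject H0"
print(f"t = {t_stat:.3f}, p = {p_value:.4f} -> {decision}")
```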

How to choose the right hypothesis test ?

Choosing the right hypothesis test ensures that the analysis is appropriate for the data at hand and leads to reliable and valid conclusions. A brief framework for selecting the test is explained below.

Nature Of Data

Consider whether the data is continuous, such as heights, weights and temperatures, or categorical, such as eye color, gender and yes/no responses.

Number Of Groups

Consider whether there is one sample group, such as when testing the average score of a class; two groups, such as when comparing heights of men and women; or more than two groups, such as when comparing test scores across different teaching methods.

Sample Size

Whether the sample size is large or small also affects which test we should choose. A small sample, such as test scores from two schools, warrants a different test than a large sample, such as voters in a national election.

Paired and Unpaired Data

When the data being compared is from the same set of individuals or population it is called paired data, for example blood pressure before and after treatment by a drug. Unpaired data is when the samples are taken from a distinct set of individuals such as an A/B testing campaign by an e-commerce website.

Types Of Variables

The test also depends on the statistic being compared. Comparing means, such as average salaries in two departments, warrants a different test than comparing proportions, such as the proportion of males and females owning a car. Comparing medians, such as the median income of two cities, calls for a different test again.
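As a rough sketch, the framework above can be written as a small lookup function. The helper `suggest_test` and its parameters are hypothetical, and its rules are a deliberate simplification that only covers the tests named in this article.

```python
def suggest_test(data_kind, n_groups, paired=False, normal=True, small_sample=False):
    """Map data characteristics to one of the tests named in this article.

    Hypothetical helper: a simplified rule of thumb, not a complete guide.
    """
    if data_kind == "categorical":
        return "Fisher's exact test" if small_sample else "Chi-square test"
    if data_kind != "continuous":
        raise ValueError("data_kind must be 'continuous' or 'categorical'")
    if n_groups == 2 and paired:
        return "Paired t-test" if normal else "Wilcoxon signed-rank test"
    if n_groups == 2:
        return "Student's or Welch's t-test" if normal else "Mann-Whitney U test"
    if n_groups > 2:
        return "ANOVA" if normal else "Kruskal-Wallis test"
    raise ValueError("no test in this article covers that combination")


# Example: before/after measurements on the same users, not normally distributed.
print(suggest_test("continuous", 2, paired=True, normal=False))
```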

What Are The Different Types Of Hypothesis Tests ?

Student’s T-Test (Unpaired)

Use Case: To compare means of two independent groups.

When to Use: Data is continuous and approximately normally distributed. Variances of the two groups are assumed to be equal.
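A minimal sketch with SciPy, assuming two small independent samples of heights (invented data):

```python
import numpy as np
from scipy import stats

# Hypothetical heights in cm for two independent groups (invented data).
group_a = np.array([170.1, 168.4, 172.3, 169.8, 171.5, 167.9, 173.0, 170.6])
group_b = np.array([165.2, 166.8, 164.1, 167.5, 163.9, 166.0, 165.7, 164.8])

# equal_var=True selects Student's t-test, which pools the two variances.
student = stats.ttest_ind(group_a, group_b, equal_var=True)
print(f"t = {student.statistic:.3f}, p = {student.pvalue:.4g}")
```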

Welch’s T-Test (Unpaired)

Use Case: Similar to the Student’s T-Test but doesn’t assume equal variances.

When to Use: Variances of the two groups are not assumed to be equal. Data is continuous and approximately normally distributed.
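In SciPy the same `ttest_ind` function runs Welch's version when `equal_var=False` is passed. The sketch below assumes invented page-load times in which the second group is visibly more spread out and the group sizes differ.

```python
import numpy as np
from scipy import stats

# Hypothetical page-load times in seconds; group_b has a much larger spread.
group_a = np.array([2.1, 2.3, 2.0, 2.2, 2.4, 2.1, 2.2, 2.3])
group_b = np.array([2.9, 1.8, 3.6, 2.5, 4.1, 1.9, 3.3, 2.7, 3.8, 2.2])

# equal_var=False switches ttest_ind from Student's to Welch's t-test.
welch = stats.ttest_ind(group_a, group_b, equal_var=False)
print(f"t = {welch.statistic:.3f}, p = {welch.pvalue:.4f}")
```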

Paired T-Test

Use Case: To compare means of two related groups.

When to Use: Paired observations (e.g., before and after measurements). Data is continuous and approximately normally distributed.
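A minimal sketch using `scipy.stats.ttest_rel`, assuming invented blood-pressure readings for the same eight patients before and after treatment:

```python
import numpy as np
from scipy import stats

# Hypothetical systolic blood pressure for the same 8 patients (invented data).
before = np.array([142, 138, 150, 145, 139, 148, 152, 141], dtype=float)
after = np.array([135, 136, 144, 140, 137, 141, 147, 138], dtype=float)

# ttest_rel tests whether the mean of the per-patient differences is zero.
paired = stats.ttest_rel(before, after)
print(f"t = {paired.statistic:.3f}, p = {paired.pvalue:.4g}")
```

The two arrays must be in the same patient order, because the test works on the element-wise differences.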

Analysis of Variance (ANOVA – Parametric)

Use Case: To compare means of more than two independent groups.

When to Use: Data is continuous and approximately normally distributed. Homogeneity of variances assumption holds.
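A one-way ANOVA can be sketched with `scipy.stats.f_oneway`. The scores for the three teaching methods below are invented for illustration.

```python
from scipy import stats

# Hypothetical test scores under three teaching methods (invented data).
method_a = [78, 82, 75, 80, 79, 81]
method_b = [85, 88, 84, 90, 86, 87]
method_c = [72, 70, 75, 68, 74, 71]

# f_oneway runs a one-way ANOVA; H0 is that all group means are equal.
f_stat, p_value = stats.f_oneway(method_a, method_b, method_c)
print(f"F = {f_stat:.2f}, p = {p_value:.4g}")
```

A small p-value here only says that at least one group mean differs; it does not say which one.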

Kruskal-Wallis Test (ANOVA – Non-Parametric)

Use Case: Similar to ANOVA but for non-normally distributed data.

When to Use: Data is not normally distributed. Comparing more than two independent groups.
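A minimal sketch with `scipy.stats.kruskal`, assuming invented, right-skewed delivery times for three couriers:

```python
from scipy import stats

# Hypothetical, right-skewed delivery times in days for three couriers.
courier_a = [1, 2, 2, 3, 2, 9]
courier_b = [4, 5, 4, 6, 5, 15]
courier_c = [2, 3, 3, 2, 4, 12]

# kruskal compares the groups using ranks rather than raw values.
h_stat, p_value = stats.kruskal(courier_a, courier_b, courier_c)
print(f"H = {h_stat:.2f}, p = {p_value:.4f}")
```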

Chi-Square Test (Goodness of Fit)

Use Case: To assess independence or goodness of fit for categorical data.

When to Use: Categorical data is involved. Expected frequencies are not too small.
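Both uses can be sketched with SciPy: `chisquare` for goodness of fit and `chi2_contingency` for independence. The die rolls and conversion counts below are invented for illustration.

```python
import numpy as np
from scipy import stats

# Goodness of fit: do 120 hypothetical die rolls match a fair die?
observed_rolls = np.array([18, 22, 16, 25, 19, 20])
gof = stats.chisquare(observed_rolls)  # expected defaults to equal frequencies
print(f"goodness of fit: chi2 = {gof.statistic:.2f}, p = {gof.pvalue:.4f}")

# Independence: is conversion related to the UI variant? (invented counts)
# Columns: converted, not converted. Rows: old UI, new UI.
table = np.array([[120, 880],
                  [150, 850]])
chi2, p_value, dof, expected = stats.chi2_contingency(table)
print(f"independence: chi2 = {chi2:.2f}, p = {p_value:.4f}, dof = {dof}")
```

The `expected` array returned by `chi2_contingency` is a convenient place to check that expected frequencies are not too small.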

Fisher’s Exact Test

Use Case: Similar to the Chi-Square test of independence, but suited to small sample sizes.

When to Use: Small sample sizes in 2×2 contingency tables.
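A minimal sketch with `scipy.stats.fisher_exact`, assuming an invented 2×2 table with small counts:

```python
from scipy import stats

# Hypothetical 2x2 table with small counts (invented data).
# Columns: improved, not improved. Rows: treatment, control.
table = [[8, 2],
         [3, 7]]

# fisher_exact returns the sample odds ratio and an exact p-value.
odds_ratio, p_value = stats.fisher_exact(table, alternative="two-sided")
print(f"odds ratio = {odds_ratio:.2f}, p = {p_value:.4f}")
```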

Mann-Whitney U Test

Use Case: Non-parametric alternative to the unpaired T-Test.

When to Use: Data is not normally distributed. Two independent groups are being compared.
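A minimal sketch with `scipy.stats.mannwhitneyu`, assuming invented session lengths that each contain one large outlier:

```python
from scipy import stats

# Hypothetical session lengths in minutes for two independent user groups.
variant_a = [3.1, 4.5, 2.8, 5.0, 3.7, 21.4, 4.1, 3.3]
variant_b = [5.2, 6.8, 5.9, 7.4, 6.1, 35.0, 6.5, 5.6]

# Two-sided test on ranks, so the outliers have limited influence.
u_stat, p_value = stats.mannwhitneyu(variant_a, variant_b, alternative="two-sided")
print(f"U = {u_stat:.1f}, p = {p_value:.4f}")
```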

Wilcoxon Signed-Rank Test

Use Case: Non-parametric alternative to the paired T-Test.

When to Use: Paired observations. Data is not normally distributed.
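A minimal sketch with `scipy.stats.wilcoxon`, assuming invented task-completion times for the same ten users before and after a change:

```python
from scipy import stats

# Hypothetical task-completion times in seconds for the same 10 users.
before = [41.2, 38.5, 52.1, 47.3, 39.9, 60.4, 44.8, 50.2, 43.6, 48.7]
after = [36.9, 37.6, 45.1, 43.8, 39.3, 49.2, 41.9, 45.5, 41.4, 46.8]

# wilcoxon ranks the paired differences; H0 is that they are symmetric about zero.
w_stat, p_value = stats.wilcoxon(before, after)
print(f"W = {w_stat:.1f}, p = {p_value:.4f}")
```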

Conclusion

In this article we’ve covered in detail what hypothesis testing is and how it helps us use statistics to make informed decisions. We’ve also studied a brief framework for deciding which test to choose for a particular use case, along with the different types of hypothesis tests.