Project: Statistical Hypothesis Testing Using Permutation Tests

Background

In bioinformatics and biology, we often want to know whether an observed difference between two groups is real or could have occurred by chance.

Examples include:

- gene expression levels in treated vs untreated samples
- GC content in two sets of DNA sequences
- protein lengths in different functional categories

In this project, you will implement a permutation test, a general nonparametric statistical test that estimates significance by simulation. Its main assumption is that the observations are exchangeable under the null hypothesis.

Learning Objectives

After completing this project, you should be able to:

- Explain what a statistical hypothesis test is
- Define a null hypothesis and a test statistic
- Implement a permutation-based null model
- Estimate p-values from simulated data
- Interpret statistical results in a biological context

Problem Description

You are given two groups of numerical measurements:

- Group A: measurements from condition A
- Group B: measurements from condition B

Your task is to test whether the observed difference between the two groups is statistically significant, using a permutation test.

You will not use any built-in statistical testing functions. Instead, you will implement the test logic yourself.

Key Idea

The permutation test answers the question:

If there were actually no difference between the two groups, how often would we observe a difference at least as large as the one we see?

To answer this, we:

1. Compute a test statistic on the real data
2. Randomly shuffle group labels many times
3. Recompute the statistic for each shuffle
4. Compare the observed statistic to this null distribution

Definitions

Test Statistic

You will use the difference in means as the test statistic:

$$ T = \bar{x}_A - \bar{x}_B $$

Other statistics are possible, but this one is required.
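
Written out in full, the statistic is the difference of the two sample means. The symbols $n_A$ and $n_B$ for the group sizes, and the indexed values $x_{A,i}$ and $x_{B,j}$, are notation assumed here for illustration:

$$ \bar{x}_A = \frac{1}{n_A} \sum_{i=1}^{n_A} x_{A,i}, \qquad \bar{x}_B = \frac{1}{n_B} \sum_{j=1}^{n_B} x_{B,j}, \qquad T = \bar{x}_A - \bar{x}_B $$

A positive $T$ means that group A has the larger mean, a negative $T$ that group B does.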

Null Hypothesis

The null hypothesis states:

The two groups come from the same underlying distribution.

Under this hypothesis, group labels are arbitrary.

Permutation

A permutation consists of:

- pooling all measurements
- randomly reassigning them into two groups of the original sizes

Input

- Two lists of floating-point numbers:
  - groupA
  - groupB
- An integer N, the number of permutations (e.g. 1000)

Output

- The observed test statistic
- The estimated p-value
- (Optional) the null distribution of test statistics
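
One way to bundle these outputs is an F# record. The sketch below is only a suggestion: the type name `PermutationResult`, its field names, and the sample values are all hypothetical and not prescribed by the project.

```fsharp
// Hypothetical result type; names are illustrative, not required by the project.
type PermutationResult =
    { Observed: float                        // observed test statistic T
      PValue: float                          // estimated p-value
      NullDistribution: float list option }  // permuted statistics, if kept

// Hand-made value that only shows the shape of the record (not real results).
let exampleResult =
    { Observed = 1.0
      PValue = 0.6
      NullDistribution = Some [0.2; -1.5; 1.0; -0.3] }

printfn "%A" exampleResult
```

Using `option` for the null distribution lets callers skip storing N values when only the p-value is needed.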

Starting Tasks

Task 1: Test Statistic

Implement a function that computes the difference in means between two groups.
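
A minimal F# sketch of this task, assuming the groups arrive as `float list` values as described under Input. The function name `meanDifference` and the decision to reject empty groups are assumptions made for illustration.

```fsharp
// Difference in means: T = mean(A) - mean(B).
// List.average throws on an empty list, so empty groups are rejected up front
// with a clearer message.
let meanDifference (groupA: float list) (groupB: float list) : float =
    if List.isEmpty groupA || List.isEmpty groupB then
        invalidArg "groupA/groupB" "both groups must be non-empty"
    List.average groupA - List.average groupB

// Usage with the groups from the Example section.
let observed = meanDifference [5.1; 4.9; 5.3; 5.0] [4.2; 4.4; 4.1; 4.3]
printfn "Observed difference in means: %.3f" observed
```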

Task 2: Permutation Generation

Implement a function that:

- pools the data
- randomly permutes it
- splits it back into two groups of the original sizes
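
One possible way to implement these three steps is a Fisher-Yates shuffle on a copy of the pooled data. In the sketch below, `shuffle` and `permuteGroups` are hypothetical names, and the local mutation inside `shuffle` is a design choice rather than a requirement.

```fsharp
open System

// Fisher-Yates shuffle on a copy, so the caller's array is left untouched.
let shuffle (rng: Random) (values: float[]) : float[] =
    let a = Array.copy values
    for i = a.Length - 1 downto 1 do
        let j = rng.Next(i + 1)   // uniform index in [0, i]
        let tmp = a.[i]
        a.[i] <- a.[j]
        a.[j] <- tmp
    a

// Pool both groups, shuffle, and split back into groups of the original sizes.
let permuteGroups (rng: Random) (groupA: float list) (groupB: float list) : float list * float list =
    let pooled = List.append groupA groupB |> List.toArray
    let permA, permB = shuffle rng pooled |> Array.splitAt (List.length groupA)
    Array.toList permA, Array.toList permB

// Usage: one permutation of the example data (seed 42 is an arbitrary choice).
let newA, newB = permuteGroups (Random(42)) [5.1; 4.9; 5.3; 5.0] [4.2; 4.4; 4.1; 4.3]
printfn "Permuted A: %A" newA
printfn "Permuted B: %A" newB
```

The generator is passed in as a parameter so that a single seeded `Random` can drive every permutation in a run.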

Task 3: Null Distribution

Repeat the permutation step N times and compute the test statistic for each permutation.
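
The repetition can be sketched as follows. This is a self-contained illustration, so it repeats a small shuffle helper; the names `shuffleInPlace` and `nullDistribution` are assumptions, and reshuffling one working array in place is just one of several reasonable designs.

```fsharp
open System

// In-place Fisher-Yates shuffle of a working array.
let shuffleInPlace (rng: Random) (a: float[]) : unit =
    for i = a.Length - 1 downto 1 do
        let j = rng.Next(i + 1)
        let tmp = a.[i]
        a.[i] <- a.[j]
        a.[j] <- tmp

// Returns n permuted values of T = mean(A) - mean(B).
let nullDistribution (rng: Random) (n: int) (groupA: float list) (groupB: float list) : float list =
    let pooled = List.append groupA groupB |> List.toArray
    let nA = List.length groupA
    List.init n (fun _ ->
        // Reshuffling an already shuffled array still gives a uniform permutation.
        shuffleInPlace rng pooled
        let permA, permB = Array.splitAt nA pooled
        Array.average permA - Array.average permB)

// Usage with the example data and N = 1000 (seed 42 is an arbitrary choice).
let nullStats = nullDistribution (Random(42)) 1000 [5.1; 4.9; 5.3; 5.0] [4.2; 4.4; 4.1; 4.3]
printfn "Generated %d permuted statistics" (List.length nullStats)
```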

Task 4: P-value Estimation

Compute the p-value as:

$$ p = \frac{\#\{\, |T_{perm}| \ge |T_{obs}| \,\} + 1}{N + 1} $$

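(The +1 in the numerator and denominator counts the observed statistic as one of the permutations, so the estimated p-value is never zero. The absolute values make this a two-sided test.)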
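
The formula translates almost directly into F#. The sketch below assumes the null statistics are already available as a `float list`; the name `pValue` and the hand-made input are illustrative.

```fsharp
// Two-sided p-value with the +1 correction:
// p = (#{ |T_perm| >= |T_obs| } + 1) / (N + 1)
let pValue (observed: float) (nullStats: float list) : float =
    let extreme =
        nullStats
        |> List.filter (fun t -> abs t >= abs observed)
        |> List.length
    float (extreme + 1) / float (List.length nullStats + 1)

// Hand-checkable input: -1.5 and 1.0 are at least as extreme as 1.0,
// so p = (2 + 1) / (4 + 1) = 0.6.
let p = pValue 1.0 [0.2; -1.5; 1.0; -0.3]
printfn "p = %.1f" p
```

Permuted means are summed in a different order than the observed ones, so a statistic that should tie with the observed value can differ in its last bits. Comparing with a small tolerance is one possible safeguard.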

Task 5: Interpretation

Report whether the result is statistically significant for a given significance level (e.g. $\alpha = 0.05$).
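
A small sketch of the decision step. The rule `p <= alpha` is an assumption about the convention to use (some texts use a strict inequality), and the names `isSignificant` and `report` are hypothetical.

```fsharp
// Decision rule: reject the null hypothesis when p <= alpha (assumed convention).
let isSignificant (alpha: float) (p: float) : bool = p <= alpha

// Human-readable summary of the decision.
let report (alpha: float) (p: float) : string =
    if isSignificant alpha p then
        sprintf "p = %.3f: significant at alpha = %.2f" p alpha
    else
        sprintf "p = %.3f: not significant at alpha = %.2f" p alpha

printfn "%s" (report 0.05 0.030)
printfn "%s" (report 0.05 0.200)
```

A significant result only says that a difference this large is rare when labels are shuffled. It does not show that the effect is large or biologically meaningful, and a non-significant result does not prove that the groups are identical.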

Example

Input

Group A: [5.1; 4.9; 5.3; 5.0]
Group B: [4.2; 4.4; 4.1; 4.3]
N = 1000

Output

Observed difference in means: 0.825
Estimated p-value: approximately 0.03
Result: significant at alpha = 0.05

(Note: the observed difference is deterministic. The estimated p-value varies slightly between runs unless the random seed is fixed.)
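
As a hand check of this example (a worked calculation added for illustration), the observed statistic follows directly from the definition of $T$:

$$ \bar{x}_A = \frac{5.1 + 4.9 + 5.3 + 5.0}{4} = 5.075, \qquad \bar{x}_B = \frac{4.2 + 4.4 + 4.1 + 4.3}{4} = 4.25, \qquad T_{obs} = 5.075 - 4.25 = 0.825 $$

With four values per group there are only $\binom{8}{4} = 70$ distinct ways to split the pooled data. Every value in group A exceeds every value in group B, so only the observed split and its mirror image reach $|T| \ge 0.825$, and the exact two-sided p-value is $2/70 \approx 0.029$. An estimate from N = 1000 random permutations should land close to this value.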

Implementation Notes

- Use the random number generation available in F# through .NET (e.g. System.Random)
- Use immutable data structures unless mutation simplifies the code
- Focus on correctness and clarity
- Make your code reproducible by fixing the random seed
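
The effect of fixing the seed can be checked with a short sketch. It assumes `System.Random` as the generator; the helper `draws` and the seed value 42 are arbitrary choices for illustration.

```fsharp
open System

// Draw five numbers in [0, 100) from a generator created with the given seed.
let draws (seed: int) : int list =
    let rng = Random(seed)
    List.init 5 (fun _ -> rng.Next(100))

// Two generators with the same seed yield the same sequence in the same runtime.
let firstRun = draws 42
let secondRun = draws 42
printfn "Same seed, same sequence: %b" (firstRun = secondRun)
```

Creating one seeded generator and passing it to every function that needs randomness keeps a whole run reproducible. Creating a fresh generator with the same seed for each permutation would produce the same shuffle every time.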

Extension Tasks

- Add a one-sided test
- Allow different test statistics (median difference)
- Visualize the null distribution (histogram)
- Apply the test to a biological dataset
- Implement Benjamini–Hochberg correction for multiple tests
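
For the last extension, the Benjamini-Hochberg procedure can be sketched as below. This is a minimal illustration that returns adjusted p-values in the input order; the function name `benjaminiHochberg` and the sample p-values are hypothetical.

```fsharp
// Benjamini-Hochberg adjusted p-values, returned in the original input order.
let benjaminiHochberg (pValues: float list) : float list =
    let m = List.length pValues
    // Sort ascending by p-value, remembering each original position.
    let sorted = pValues |> List.indexed |> List.sortBy snd
    // Raw adjustment for 1-based rank k: p * m / k.
    let raw = sorted |> List.mapi (fun i (_, p) -> p * float m / float (i + 1))
    // Running minimum from the largest rank downwards, capped at 1.0.
    // scanBack appends the initial state, so only the first m values are kept.
    let adjusted = List.scanBack min raw 1.0 |> List.take m
    // Restore the original order.
    List.zip (List.map fst sorted) adjusted
    |> List.sortBy fst
    |> List.map snd

// Made-up p-values for illustration only.
let adjustedP = benjaminiHochberg [0.01; 0.04; 0.03; 0.005]
printfn "%A" adjustedP
```

For these four values the raw adjustments in rank order are 0.02, 0.02, 0.04 and 0.04, which are already monotone, so the adjusted values in input order are approximately 0.02, 0.04, 0.04 and 0.02.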

Submission

Submit:

- Source code
- Documentation explaining your approach
- One example dataset with expected interpretation
