From bc5ea0610194ef109ba6182d9b94b900908c0693 Mon Sep 17 00:00:00 2001
From: Kim Morrison <kim@tqft.net>
Date: Thu, 21 May 2026 17:06:01 +0000
Subject: [PATCH] feat: add Green-Tao theorem eval problem
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

This PR adds the Green-Tao theorem (§37 of Knill's "Some Fundamental
Theorems in Mathematics") as a new eval problem: the set of primes
contains arbitrarily long arithmetic progressions. Mathlib has
Dirichlet's theorem and Roth's theorem on 3-APs but not Green-Tao,
which has not yet been formalized in any major proof assistant.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---
 LeanEval/NumberTheory/GreenTao.lean | 34 +++++++++++++++++++++++++++++
 manifests/problems.toml             | 11 ++++++++++
 2 files changed, 45 insertions(+)
 create mode 100644 LeanEval/NumberTheory/GreenTao.lean

diff --git a/LeanEval/NumberTheory/GreenTao.lean b/LeanEval/NumberTheory/GreenTao.lean
new file mode 100644
index 0000000..a384485
--- /dev/null
+++ b/LeanEval/NumberTheory/GreenTao.lean
@@ -0,0 +1,34 @@
+import Mathlib
+import EvalTools.Markers
+
+namespace LeanEval
+namespace NumberTheory
+
+/-!
+# Green–Tao theorem
+
+§37 of Oliver Knill's *Some Fundamental Theorems in Mathematics*. The set
+of primes contains arbitrarily long arithmetic progressions: for every `k`
+there exist `a ≥ 0` and `b ≥ 1` such that `a + b · j` is prime for every
+`j < k`.
+
+mathlib has Dirichlet's theorem (`Nat.infinite_setOf_prime_and_modEq`) and
+Roth's theorem on 3-APs (`roth_3ap_theorem_nat`) but **not** the
+Green–Tao theorem. As of 2026 it has not been formalized in any major
+proof assistant (a long-standing open formalization target).
+-/
+
+/-- A set `A ⊆ ℕ` contains **arbitrarily long arithmetic progressions** if
+for every length `k` there is an AP `{a + b·j | j < k}` of common
+difference `b ≥ 1` entirely contained in `A`. -/
+def ContainsArbitraryAPs (A : Set ℕ) : Prop :=
+  ∀ k : ℕ, ∃ a b : ℕ, 1 ≤ b ∧ ∀ j : ℕ, j < k → a + b * j ∈ A
+
+/-- **Green–Tao theorem.** The set of primes contains arbitrarily long
+arithmetic progressions. -/
+@[eval_problem]
+theorem green_tao : ContainsArbitraryAPs {p : ℕ | Nat.Prime p} := by
+  sorry
+
+end NumberTheory
+end LeanEval
diff --git a/manifests/problems.toml b/manifests/problems.toml
index b9e42d1..1d3c2f8 100644
--- a/manifests/problems.toml
+++ b/manifests/problems.toml
@@ -673,3 +673,14 @@ module = "LeanEval.Geometry.Uniformization"
 holes = ["uniformization"]
 submitter = "Junyan Xu"
 source = "John Hamal Hubbard, *Teichmüller theory and applications to geometry, topology, and dynamics. Vol. 1*, Chapter 1."
+
+[[problem]]
+id = "green_tao"
+title = "Green–Tao theorem"
+test = false
+module = "LeanEval.NumberTheory.GreenTao"
+holes = ["green_tao"]
+submitter = "Kim Morrison"
+notes = "§37 of Oliver Knill's 'Some Fundamental Theorems in Mathematics'. The set of primes contains arbitrarily long arithmetic progressions: for every k there exist a ≥ 0 and b ≥ 1 such that a + b·j is prime for every j < k. Mathlib has Dirichlet's theorem (Nat.infinite_setOf_prime_and_modEq) and Roth's theorem on 3-APs (roth_3ap_theorem_nat) but not Green-Tao. As of 2026 the theorem has not been formalized in any major proof assistant (a long-standing open formalization target)."
+source = "B. Green and T. Tao, The primes contain arbitrarily long arithmetic progressions, Ann. of Math. 167 (2008), 481-547 (announced 2004). Listed as §37 in O. Knill, Some Fundamental Theorems in Mathematics (https://people.math.harvard.edu/~knill/graphgeometry/papers/fundamental.pdf)."
+informal_solution = "The proof has three main steps. (i) A transference principle: it suffices to find arbitrarily long APs in any set of positive density in a suitable pseudorandom super-set. (ii) Pseudorandomness of the primes after a `W-trick': removing small prime factors via a Selberg-style sieve makes the indicator function of the primes dominated by a pseudorandom measure ν. (iii) Apply Szemerédi's theorem in the relative form (a Furstenberg-style multiple recurrence / Gowers-uniformity argument) to that pseudorandom set, transferring the AP-rich conclusion back to the primes."