feat: Splay Tree Formalisation #568

Open: AntoineduFresne wants to merge 4 commits into leanprover:main from AntoineduFresne:main

@AntoineduFresne

This PR introduces Splay Trees to CSLib: the algorithmic definitions, the correctness proofs, and the (amortised) complexity analysis.

Design & Architecture: The implementation is partitioned into four modules to isolate dependencies.

Basic: Core definitions (splay, splayUp, descend, Frame). Primitive rotations are upstreamed to BinaryTree.

Correctness:

  • Splaying a binary search tree returns a binary search tree (IsBST_splay).
  • Splaying a binary tree at a key q that it contains returns a binary tree with q at the root (splay_root_of_contains).

Complexity: We formalise the Sleator-Tarjan potential method.

  • Per-operation bound: we prove the amortised cost of a single splay operation is bounded by 3 log_2(n) + 1 (splay_amortized_bound).
  • Sequence bound: we establish the global sequence cost for m operations on an initial tree of size n. The total cost is bounded by m(3log_2(n) + 1) + n log_2(n) (nlogn_cost).
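For readers unfamiliar with the potential method, the two bounds fit together as sketched below. This assumes the standard Sleator-Tarjan potential (the sum of the binary logarithms of all subtree sizes); the symbols $s$, $r$, $\Phi$, $c_i$ and $\hat{c}_i$ are notation chosen for this sketch and need not match the names used in the PR.

```latex
% s(x): number of nodes in the subtree rooted at x;  r(x) = \log_2 s(x)
% \Phi(T) = \sum_{x \in T} r(x), so 0 \le \Phi(T) \le n \log_2 n for a tree with n nodes
\begin{align*}
  \hat{c}_i &= c_i + \Phi(T_i) - \Phi(T_{i-1})
            && \text{(amortised cost of the $i$-th splay)} \\
  \hat{c}_i &\le 3\bigl(r(\mathrm{root}) - r(x)\bigr) + 1 \le 3\log_2 n + 1
            && \text{(per-operation bound)} \\
  \sum_{i=1}^{m} c_i &= \sum_{i=1}^{m} \hat{c}_i + \Phi(T_0) - \Phi(T_m)
            && \text{(telescoping sum)} \\
            &\le m\,(3\log_2 n + 1) + n\log_2 n
            && \text{(since $\Phi(T_0) \le n\log_2 n$ and $\Phi(T_m) \ge 0$)}
\end{align*}
```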

BSTAPI: A user-facing wrapper providing a bundled BST API. Users can splay binary search trees without manually supplying invariant proofs (for example, the proof that splaying a binary search tree yields a binary search tree).

Why Bottom-Up? (Comparison with Top-Down):
There is a complementary top-down implementation available for reference here. This PR utilises a bottom-up approach because it reduces the length of the formalisation:

  • No "Broken" Trees: Top-down splaying partitions the tree into three disconnected pieces (Left, Right, Middle) while searching. This makes tracking the mathematical potential function more difficult, as the potential function φ expects a whole tree. Our bottom-up approach leaves the tree intact—it just records the search path on the way down, and applies local rotations on the way up. The tree is always whole.

  • Odd vs. Even Paths: Splaying works by rotating edges in pairs (e.g., zig-zig). If a path has an odd number of edges, top-down requires asymmetrical edge-case code to handle the leftover rotation while stitching the tree back together. By modelling the path as a list of Frames, our bottom-up approach processes pairs natively via list induction.

  • Search first & Rotate after: Top-down tries to search and restructure at the exact same time. Bottom-up strictly separates the logic: descend purely finds the node, and splayUp purely rotates it. This allows us to prove things about path lengths and node existence completely independently of the rotation proofs.

  • Symmetry Exploitation: The proofs use formalised mirror symmetry (mirror, flip). This allows left/right symmetric double rotations (such as zig-zig vs. zag-zag) to be proven by generic transformations rather than by duplicating each proof for its mirrored case.
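The search-then-rotate split described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code in this PR: the constructor names of `Frame`, the argument order of `descend` and `splayUp`, and the use of `Ord` instead of `LinearOrder` (to avoid a Mathlib import) are all hypothetical.

```lean
/-- Binary tree with the key stored between the two subtrees (in-order layout). -/
inductive BinaryTree (α : Type) where
  | empty : BinaryTree α
  | node (left : BinaryTree α) (key : α) (right : BinaryTree α) : BinaryTree α

/-- Hypothetical frame: one step of the search path, recording what was left behind. -/
inductive Frame (α : Type) where
  | wentLeft (key : α) (right : BinaryTree α) : Frame α
  | wentRight (left : BinaryTree α) (key : α) : Frame α

variable {α : Type}

/-- Pure search: returns the subtree reached and the path, innermost frame first. -/
def descend [Ord α] (q : α) : BinaryTree α → List (Frame α) → BinaryTree α × List (Frame α)
  | .empty, path => (.empty, path)
  | .node l k r, path =>
    match compare q k with
    | .lt => descend q l (.wentLeft k r :: path)
    | .gt => descend q r (.wentRight l k :: path)
    | .eq => (.node l k r, path)

/-- Pure rotation: lifts the node `a x b` to the root, consuming frames two at a time. -/
def splayUp : BinaryTree α → α → BinaryTree α → List (Frame α) → BinaryTree α
  | a, x, b, [] => .node a x b
  -- zig: a single leftover frame (odd path length)
  | a, x, b, [.wentLeft p c] => .node a x (.node b p c)
  | a, x, b, [.wentRight c p] => .node (.node c p a) x b
  -- zig-zig and its mirror image
  | a, x, b, .wentLeft p c :: .wentLeft g d :: rest =>
    splayUp a x (.node b p (.node c g d)) rest
  | a, x, b, .wentRight c p :: .wentRight d g :: rest =>
    splayUp (.node (.node d g c) p a) x b rest
  -- zig-zag and its mirror image
  | a, x, b, .wentRight c p :: .wentLeft g d :: rest =>
    splayUp (.node c p a) x (.node b g d) rest
  | a, x, b, .wentLeft p c :: .wentRight d g :: rest =>
    splayUp (.node d g a) x (.node b p c) rest

/-- Search first, rotate after. On a failed search the last visited node is splayed. -/
def splay [Ord α] (q : α) (t : BinaryTree α) : BinaryTree α :=
  match descend q t [] with
  | (.node a x b, path) => splayUp a x b path
  | (.empty, []) => .empty
  | (.empty, .wentLeft p c :: rest) => splayUp .empty p c rest
  | (.empty, .wentRight c p :: rest) => splayUp c p .empty rest
```

In this sketch the odd-length case is just the singleton-list branch of `splayUp`, and every intermediate value is a whole tree plus a list, which is the property the bullets above rely on.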

Co-authored-by: Anton Kovsharov <antonkov@google.com>
Co-authored-by: Antoine du Fresne von Hohenesche <antoine@du-fresne.ch>
Co-authored-by: Sorrachai Yingchareonthawornchai <sorrachai.cp@gmail.com>

@sorrachai
Collaborator


variable {α : Type}

inductive BinaryTree (α : Type) where
Collaborator

It seems this is the same as Mathlib.Data.Tree.Basic?

Collaborator
@sorrachai, May 19, 2026

There is a stylistic difference.

| node (left : BinaryTree α) (key : α) (right : BinaryTree α)

The mathlib version looks like:

| node (key : α) (left : BinaryTree α) (right : BinaryTree α)

So it is about pre-order vs. in-order representation. When I formalized BST stuff, I found it more convenient to think about it in an in-order version.

Collaborator
@sorrachai, May 19, 2026

For example, writing rotation

def rotateLeft : BinaryTree α → BinaryTree α
| .node a x (.node b y c) => .node (.node a x b) y c
| t => t

It is immediately clear from the code that you perform a left rotation. Having a pre-order version in mathlib does not look good when you write about rotations or double rotations.
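For comparison, the same rotation over a key-first (pre-order) constructor might look like the sketch below. `PreTree` is a hypothetical stand-in with the constructor order quoted above, not Mathlib's actual declaration.

```lean
/-- Hypothetical key-first tree, mirroring the pre-order constructor order. -/
inductive PreTree (α : Type) where
  | nil : PreTree α
  | node (key : α) (left : PreTree α) (right : PreTree α) : PreTree α

variable {α : Type}

/-- Left rotation: the same operation as `rotateLeft` above, with keys written first. -/
def PreTree.rotateLeft : PreTree α → PreTree α
  | .node x a (.node y b c) => .node y (.node x a b) c
  | t => t
```

Here the left-to-right order of `a x b y c` is no longer visible in the pattern, which is the readability point being made.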

Collaborator

I understand your argument, but I don't think having a different order of constructors warrants this duplication.

/-! ### BST Structure -/
section BSTStructure

structure BST (α : Type) [LinearOrder α] where
Collaborator

I'm not convinced by the bundling here.

Collaborator

Can you elaborate? What is bad? What should be a better choice?

Contributor
@Shreyas4991, May 17, 2026

Chris is right. This is an unnecessary bundling. You can operate solely on BinaryTree and insert IsBST as a hypothesis in those theorems that need it.
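A minimal sketch of the two styles under discussion, to make the trade-off concrete. The constructor shapes of `ForallTree` and `IsBST`, the field names of `BST`, and the use of `LT` instead of `LinearOrder` (to keep the example free of Mathlib imports) are assumptions made for illustration only.

```lean
inductive BinaryTree (α : Type) where
  | empty : BinaryTree α
  | node (left : BinaryTree α) (key : α) (right : BinaryTree α) : BinaryTree α

variable {α : Type}

/-- Every key in the tree satisfies `p`. -/
inductive ForallTree (p : α → Prop) : BinaryTree α → Prop
  | empty : ForallTree p .empty
  | node {l : BinaryTree α} {key : α} {r : BinaryTree α} :
      ForallTree p l → p key → ForallTree p r → ForallTree p (.node l key r)

/-- Search-tree invariant stated as a predicate on plain trees. -/
inductive IsBST [LT α] : BinaryTree α → Prop
  | empty : IsBST .empty
  | node {l : BinaryTree α} {key : α} {r : BinaryTree α} :
      ForallTree (· < key) l → ForallTree (key < ·) r → IsBST l → IsBST r →
      IsBST (.node l key r)

/-- Unbundled style: the invariant appears as a hypothesis only where it is needed. -/
theorem IsBST.left [LT α] {l r : BinaryTree α} {k : α} :
    IsBST (.node l k r) → IsBST l
  | .node _ _ hl _ => hl

/-- Bundled style: the tree and its proof travel together (hypothetical field names). -/
structure BST (α : Type) [LT α] where
  tree : BinaryTree α
  isBST : IsBST tree

/-- A bundled value is built from an unbundled tree and proof. -/
def BST.empty [LT α] : BST α := ⟨.empty, .empty⟩
```

In this layout the bundled structure is a thin layer over the unbundled lemmas, so theorems stated against `BinaryTree` and `IsBST` remain usable with or without it.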

Collaborator

Most of the proofs in the PR were based on a separate tree and IsBST property.

We create an API specifically for people who can use BST. In some use cases, it is more convenient to refer to a BST as a type rather than as a single tree with its properties. Sometimes you don't want to carry these properties around all the time.

Collaborator

Please do ask on Zulip if you'd like to hear other opinions, but unbundling this sort of proposition is a well-established best practice. I think this should be a Prop on Mathlib's (nearly identical) definition of trees.

Collaborator

Sure, will do.

Contributor

> Most of the proofs in the PR were based on a separate tree and IsBST property.
>
> We create an API specifically for people who can use BST. In some use cases, it is more convenient to refer to a BST as a type rather than as a single tree with its properties. Sometimes you don't want to carry these properties around all the time.

Consider that a library PR must be built with future re-use in mind, beyond just this PR. First, if you create a tree definition, it can be reused in several places where it need not be a BST. Secondly, this means you can actually work with trees from Mathlib.

Collaborator

These files need to be modules.

Contributor
@Shreyas4991 left a comment

Summary: When writing functional data structures, it is good practice to provide the standard functional data structure API, such as maps and folds, and then API lemmas over them. Also, you are restating an induction principle of BinaryTree, which is redundant.

/-! ### Tree Invariants and BST Properties -/
section Invariants

inductive ForallTree (p : α → Prop) : BinaryTree α → Prop
Contributor

This is just the induction principle of BinaryTree stated in a convoluted way.

Author
@AntoineduFresne, May 18, 2026

I don't think that ForallTree p _ is the induction principle of BinaryTree. It is the tree analogue of List.Forall p _ (for example, see ForallTree_iff_toKeyList in Correctness.lean for the equivalence with the list-based characterisation).

Keeping it makes pattern-matching easier on the tree constructors and goes well with cases/induction tactics in rotation and BST-preservation proofs.

Collaborator

I think the main purpose of List.Forall is to play fun defeq tricks; so I think keeping this is defensible, but perhaps choosing ∀ x ∈ t, p x as the simp-normal form is better than using t.Forall p.

Collaborator

I will keep this definition as Eric suggested.

ForallTree p r →
ForallTree p (.node l key r)

inductive IsBST [LinearOrder α] : BinaryTree α → Prop
Contributor

See the design of Batteries RBMap for defining these kinds of functions. Ideally this should be defined through a fold function. The first step is of course to write the map and fold functions and API lemmas for them. See RBMap in Batteries and lean core for examples.

Collaborator

When you instruct someone to do X, please justify/explain why X is better. In particular, I would appreciate #mwe to illustrate your point. Otherwise, I don't understand why the fold function is better than the ForAll version.

Contributor
@Shreyas4991, May 19, 2026

It is not common practice to offer MWEs in PR reviews. The purpose of MWEs is to present debugging samples in one's own code, not to suggest designs in a review.

Happy to elaborate on folds in the thread.

Briefly, fold functions are a general class of functions that capture the following common recursion pattern on recursive data types: given a binary operation, traverse the data structure recursively and accumulate the results of applying the binary operation. Mathlib is rich in examples of using foldl and foldr. They are derived by defining instances of the 'Traversable' typeclass.

Collaborator
@eric-wieser, May 19, 2026

I don't quite see what you're asking for @Shreyas4991; the most obvious fold function for trees is just the builtin recursion principle provided by inductive. If you're looking for a linear fold that traverses the tree in pre/post/infix order, then I would lean towards skipping it and just writing the toPreList, toPostList, and toInfixList operators and let the user fold on those.
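The list-based alternative mentioned here could be sketched as follows. Only the name `toInfixList` comes from the comment; `size`, `all`, and `sample` are hypothetical helpers added for illustration, and the tree type is restated so the snippet stands alone.

```lean
inductive BinaryTree (α : Type) where
  | empty : BinaryTree α
  | node (left : BinaryTree α) (key : α) (right : BinaryTree α) : BinaryTree α

variable {α : Type}

/-- Keys in left-to-right (in-order) sequence. -/
def BinaryTree.toInfixList : BinaryTree α → List α
  | .empty => []
  | .node l k r => l.toInfixList ++ k :: r.toInfixList

/-- A linear fold is then just a list fold over the traversal. -/
def BinaryTree.size (t : BinaryTree α) : Nat :=
  t.toInfixList.foldl (fun n _ => n + 1) 0

/-- Boolean "for all keys" check, again delegated to the list API. -/
def BinaryTree.all (p : α → Bool) (t : BinaryTree α) : Bool :=
  t.toInfixList.all p

def sample : BinaryTree Nat :=
  .node (.node .empty 1 .empty) 2 (.node .empty 3 .empty)

#eval sample.toInfixList   -- [1, 2, 3]
#eval sample.size          -- 3
#eval sample.all (· < 10)  -- true
```

With this design the existing `List` lemmas carry over to trees through a single traversal function, at the cost of building an intermediate list.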

Contributor
@Shreyas4991, May 19, 2026

Concretely I am asking for Forall and co to be replaced by the fold version of the recursive functions in the linked thread below.

https://leanprover.zulipchat.com/#narrow/channel/513188-CSLib/topic/Splay.20tree.20PR/near/595950421

They all follow the same pattern of fold (except fold has one accumulator parameter which makes defining them with foldl/foldr simpler)

end Invariants


/-! ### Accessor Lemmas for ForallTree -/
Contributor

This entire section should fall out from API lemmas for fold and map.


@AntoineduFresne AntoineduFresne changed the title Splay Tree Formalisation splay tree formalisation May 18, 2026
@AntoineduFresne AntoineduFresne changed the title splay tree formalisation feat:Data Splay Tree Formalisation May 18, 2026
@AntoineduFresne AntoineduFresne changed the title feat:Data Splay Tree Formalisation feat: Splay Tree Formalisation May 18, 2026
Caution: If the search fails, we do not rotate (as currently
defined in splay) the empty leaf and start to rotate from
its ancestor, so the cost is path.length - 1. -/
def splay.cost [LinearOrder α] (t : BinaryTree α) (q : α) : ℝ :=
Collaborator
@eric-wieser, May 19, 2026

Suggested change
- def splay.cost [LinearOrder α] (t : BinaryTree α) (q : α) : ℝ :=
+ def splay.cost [LinearOrder α] (t : BinaryTree α) (q : α) : Nat :=

no need to make this unprintable.

5 participants