🐍 OUROBOROS

Optimal Unified Residual Operations with Bounded Orthogonal Reflection and Spectral Control

Teaching Neural Networks the Art of Forgetting

Overview • Architecture • Installation • Quick Start • Theory • Insights

"The serpent that devours itself to be reborn — features consumed, transformed, and emerged anew."

🌀 Overview

The Problem

Standard residual networks can only add information. They lack the ability to erase, forget, or reflect — leading to residual accumulation where noisy features persist indefinitely.

Our Solution

A geometric residual connection that learns:

🧠 When to remember — preserve critical features
🗑️ When to forget — erase noise and outdated info
🔄 When to transform — flip representations

┌─────────────────────────┐
│   Traditional ResNet    │
│                         │
│  X_{l+1} = X_l + F(X)   │
│                         │
│    ❌ Can only ADD      │
└─────────────────────────┘

┌─────────────────────────┐
│      OUROBOROS          │
│                         │
│  X_{l+1} = A·X + β·k·vᵀ │
│                         │
│ ✅ ADD, ERASE, REFLECT  │
└─────────────────────────┘

OUROBOROS enables neural networks to:

Capability	Description
✨ Selective Forgetting	Surgically erase outdated or noisy information
🔄 Feature Reflection	Model oscillatory and oppositional dynamics
🎯 Spectral Control	Shape layer-wise transitions with precision
⚡ Gradient Stability	Maintain gradient flow with gated identity

🏗️ Architecture

The Delta Operator

At the heart of OUROBOROS lies the Delta Operator — a generalized Householder transformation:

`A(X) = I − β(X) · k(X) · k(X)ᵀ`

Symbol	Name	Description	Range
k(X)	Reflection Direction	Unit vector defining transformation axis	‖k‖ = 1
β(X)	Scalar Gate	Controls transformation intensity	[0, 2]
v(X)	Value Vector	New information to inject	ℝᵈᵛ
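As a rough illustration of the definition above, the operator can be built directly from a direction and a gate value. This is a NumPy sketch rather than the repository's PyTorch code; the helper name `delta_operator` and the `eps` argument are assumptions made for the example.

```python
import numpy as np

def delta_operator(k, beta, eps=1e-6):
    """Build A = I - beta * k k^T after scaling k to (approximately) unit length."""
    k = np.asarray(k, dtype=float)
    k = k / (np.linalg.norm(k) + eps)   # eps guards against division by zero
    return np.eye(k.shape[0]) - beta * np.outer(k, k)

k = np.array([3.0, 4.0])          # unnormalized direction
A = delta_operator(k, beta=1.0)   # beta = 1 projects out the k direction
x = np.array([3.0, 4.0])          # a vector parallel to k

print(A @ x)                      # close to zero: the component along k is erased
```

Because `k kᵀ` is symmetric, `A` is symmetric for every β, which is what makes the spectral statements later in this README straightforward.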

⚡ The Magic of β — One Scalar, Three Transformations

A single learnable scalar dynamically interpolates between three geometric operations:

β Value	Transformation	Eigenvalue along k	Effect
β → 0	Identity	λ = 1	Pass through unchanged
β → 1	Projection	λ = 0	Erase component along k
β → 2	Reflection	λ = -1	Flip direction along k
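The three regimes in the table can be checked on a two-dimensional example. The sketch below is plain NumPy, and `apply_delta` is a hypothetical helper name, not part of the repository.

```python
import numpy as np

def apply_delta(x, k, beta):
    """Apply (I - beta * k k^T) to x without forming the matrix."""
    return x - beta * k * (k @ x)

k = np.array([1.0, 0.0])   # unit direction
x = np.array([2.0, 3.0])   # component 2 along k, component 3 orthogonal to k

identity = apply_delta(x, k, beta=0.0)     # [2, 3]: unchanged
projection = apply_delta(x, k, beta=1.0)   # [0, 3]: component along k erased
reflection = apply_delta(x, k, beta=2.0)   # [-2, 3]: component along k flipped
halfway = apply_delta(x, k, beta=0.5)      # [1, 3]: component along k scaled by 1 - beta

for name, y in [("identity", identity), ("projection", projection),
                ("reflection", reflection), ("halfway", halfway)]:
    print(name, y)
```

In every case the component orthogonal to k is untouched; only the coordinate along k is scaled by 1 − β.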

🔄 Geometric Visualization

Figure: a vector v transformed by the Delta Operator. P(v) is its projection onto the hyperplane (β = 1) and R(v) is its reflection across the hyperplane (β = 2); k is the unit normal of that hyperplane.

🌊 Data Flow

The input X splits into three learnable branches that compute k, β, and v, which combine through the Delta operation with a skip connection.
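The three-branch flow described above might look roughly like the following NumPy sketch. Everything about the branches here is an assumption made for illustration: the mean-pooling of X, the linear maps `W_k`, `w_beta`, `W_v`, and the `2 * sigmoid` parameterization of β are hypothetical choices, not taken from `model/ouroboros.py`.

```python
import numpy as np

rng = np.random.default_rng(0)
d, d_v = 4, 3
X = rng.normal(size=(d, d_v))        # layer input, treated as a d x d_v matrix

# Hypothetical branch weights; a real model would learn these.
W_k = rng.normal(size=(d, d))
w_beta = rng.normal(size=d)
W_v = rng.normal(size=(d_v, d))

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def ouroboros_step(X, eps=1e-6):
    """One Delta update: X + beta * k (v^T - k^T X)."""
    h = X.mean(axis=1)                    # assumed pooled summary of X, shape (d,)
    k = W_k @ h
    k = k / (np.linalg.norm(k) + eps)     # direction branch, approximately unit norm
    beta = 2.0 * sigmoid(w_beta @ h)      # gate branch, strictly inside (0, 2)
    v = W_v @ h                           # value branch, shape (d_v,)
    X_next = X + beta * np.outer(k, v - k @ X)
    return X_next, k, beta, v

X_next, k, beta, v = ouroboros_step(X)
print(X_next.shape, round(float(beta), 3))
```

The skip connection is the leading `X` term; the three branches only decide what to erase and write along the single direction k.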

🏛️ Full Model Architecture

Each OuroborosBlock contains:

RMSNorm → Attention → Ouroboros Residual
RMSNorm → MLP → Ouroboros Residual

🧬 The Ouroboros Residual Block

Core Update Rule — The Delta Rule

X_{l+1} = X_l + β · k · (vᵀ − kᵀ · X_l)
                        ↑         ↑
                     TARGET    CURRENT
               (what to write) (what exists)

This unifies three operations with a single gate:

Operation	Formula	Effect
Erasure	`−β · k · (kᵀ · X)`	Removes component along k
Writing	`+β · k · vᵀ`	Injects new information
Sync	Same `β`	Both scale together
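A quick numerical check, sketched in NumPy with arbitrary random inputs, shows that the Delta Rule above is the same update as the operator form `A·X + β·k·vᵀ`, and that β = 1 overwrites what X stores along k with v.

```python
import numpy as np

rng = np.random.default_rng(0)
d, d_v = 5, 3
X = rng.normal(size=(d, d_v))
k = rng.normal(size=d)
k /= np.linalg.norm(k)               # unit direction
v = rng.normal(size=d_v)
beta = 0.7

# Operator form: A X + beta k v^T, with A = I - beta k k^T
A = np.eye(d) - beta * np.outer(k, k)
operator_form = A @ X + beta * np.outer(k, v)

# Delta form: X + beta k (v^T - k^T X)
delta_form = X + beta * np.outer(k, v - k @ X)

print(np.allclose(operator_form, delta_form))   # True

# With beta = 1 the content read out along k is replaced by v exactly.
X_written = X + 1.0 * np.outer(k, v - k @ X)
print(np.allclose(k @ X_written, v))            # True
```

Expanding the delta form gives `X − β·k·kᵀ·X + β·k·vᵀ`, which is why the erasure and writing terms always share the same gate.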

📐 Mathematical Foundations

Spectral Decomposition Theorem

Theorem: For A = I − β·k·kᵀ where ‖k‖ = 1:

σ(A) = { 1, 1, ..., 1, (1−β) }
         └────┬────┘
         (d−1) times

Property	Formula	Notes
Eigenvalue along k	`λ_k = 1 − β`	Controlled by gate
Eigenvalues in k⊥	`λ = 1`	Multiplicity: d−1
Determinant	`det(A) = 1 − β`	Zero at β=1
Orthogonality	`AᵀA = I`	When β ∈ {0, 2}
Involution	`A² = I`	When β = 2 (trivially also β = 0)
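The properties in the table can be verified numerically. The following is a NumPy sketch with a random unit vector; `delta_operator` is a hypothetical helper name.

```python
import numpy as np

def delta_operator(k, beta):
    """Return A = I - beta * k k^T for a unit vector k."""
    return np.eye(k.shape[0]) - beta * np.outer(k, k)

rng = np.random.default_rng(0)
d = 6
k = rng.normal(size=d)
k /= np.linalg.norm(k)

for beta in (0.0, 0.5, 1.0, 2.0):
    A = delta_operator(k, beta)
    eigvals = np.linalg.eigvalsh(A)               # A is symmetric; values come back ascending
    is_orthogonal = np.allclose(A.T @ A, np.eye(d))
    print(f"beta={beta}: smallest eig={eigvals[0]:+.3f}, "
          f"det={np.linalg.det(A):+.3f}, orthogonal={is_orthogonal}")
```

For each β the smallest eigenvalue is 1 − β, the remaining d − 1 eigenvalues are 1, and the determinant equals their product, 1 − β.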

Why Standard ResNets Are Limited

Property	ResNet	OUROBOROS
Eigenvalues	≈ 1 + ε	∈ [-1, 1]
Negative λ	❌ No	✅ Yes
Singular	❌ No	✅ Yes (β=1)
Data-dependent spectrum	❌ No (fixed)	✅ Yes (via β(X))

📊 Feature Coupling

Key Insight: The geometric coherence term k_i · k_j enables learned feature interactions without explicit cross-attention.

🚀 Installation

# Clone the repository
git clone https://github.com/DivyamTalwar/OUROBOROS.git
cd OUROBOROS

# Install dependencies
pip install "torch>=2.0" "transformers>=4.30" "einops>=0.6"

# Install in development mode
pip install -e .

Requirements

Package	Version	Purpose
Python	≥ 3.8	Runtime
PyTorch	≥ 2.0	Deep learning
Transformers	≥ 4.30	HuggingFace
einops	≥ 0.6	Tensor ops

⚡ Quick Start

Basic Usage

from model.ouroboros import OuroborosModel, OuroborosConfig
import torch

# Configure the model
config = OuroborosConfig(
    vocab_size=50304,
    hidden_size=768,
    num_hidden_layers=12,
    num_attention_heads=6,
    head_dim=128,
)

# Initialize model
model = OuroborosModel(config)
print(f"Parameters: {model.get_num_params():,}")

# Forward pass
input_ids = torch.randint(0, 50304, (2, 512))
labels = input_ids.clone()

logits, loss = model(input_ids, targets=labels)
print(f"Loss: {loss.item():.4f}")

Training Loop

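from torch.optim import AdamW

# Reuses `OuroborosModel` and `config` from Basic Usage.
# `dataloader` is not defined here: supply any iterable that yields dicts
# with 'input_ids' and 'labels' tensors of shape (batch, seq_len).
model = OuroborosModel(config).cuda()
optimizer = AdamW(model.parameters(), lr=3e-4, weight_decay=0.1)

for batch in dataloader:
    input_ids, labels = batch['input_ids'].cuda(), batch['labels'].cuda()

    # The model returns both the logits and the loss when targets are given
    logits, loss = model(input_ids, targets=labels)

    loss.backward()
    optimizer.step()
    optimizer.zero_grad()  # clear gradients before the next batch

    print(f"Loss: {loss.item():.4f}")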

🔧 Configuration Reference

Model Parameters

Parameter	Type	Default	Description
`vocab_size`	int	50304	Vocabulary size
`hidden_size`	int	768	Model dimension
`num_hidden_layers`	int	12	Number of blocks
`num_attention_heads`	int	6	Attention heads
`head_dim`	int	128	Dimension per head
`block_size`	int	1024	Max sequence length

Ouroboros Parameters

Parameter	Type	Default	Description
`ouroboros_k_eps`	float	1e-6	k normalization ε
`ouroboros_beta_init`	float	1.0	Initial β, in [0, 2]
`ouroboros_v_sigmoid`	bool	True	Sigmoid on v
`ouroboros_v_sigmoid_scale`	float	4.0	v scale factor

📁 Project Structure

OUROBOROS/
├── 📄 README.md              # Documentation
├── 📁 assets/                # Images
│   ├── banner.png
│   ├── architecture.png
│   ├── beta_spectrum.png
│   ├── geometric_transform.png
│   ├── dataflow.png
│   ├── model_architecture.png
│   └── feature_coupling.png
└── 📁 model/
    └── ouroboros.py          # Core implementation

💡 Key Insights

Why OUROBOROS Works

Challenge	ResNet	OUROBOROS
Noisy features accumulate	❌ Can only add	✅ Can erase
Oscillatory patterns	❌ No negative λ	✅ λ ∈ [-1, 1]
Feature interference	❌ No filter	✅ Projection
Gradient stability	✅ Identity	✅ Gated identity

Depth-Wise Delta Rule

OUROBOROS applies across layers (depth) the same delta-rule update that DeltaNet applies across time steps:

Time (DeltaNet):     S_t     = A · S_{t-1}  + β · k · vᵀ
Depth (OUROBOROS):   X_{l+1} = A · X_l      + β · k · vᵀ
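To make the depth-wise recurrence concrete, the NumPy sketch below stacks two layers that write to the same direction k and compares the Delta update with a purely additive residual. The layer functions and the fixed k, v values are hypothetical and chosen only for illustration.

```python
import numpy as np

def delta_layer(X, k, v, beta):
    """X_{l+1} = X_l + beta * k (v^T - k^T X_l)."""
    return X + beta * np.outer(k, v - k @ X)

def additive_layer(X, k, v):
    """Plain residual write: X_{l+1} = X_l + k v^T."""
    return X + np.outer(k, v)

d, d_v = 3, 2
k = np.array([1.0, 0.0, 0.0])        # both layers write along the same direction
v1 = np.array([1.0, 1.0])            # written by the first layer
v2 = np.array([5.0, -2.0])           # written by the second layer

X_delta = np.zeros((d, d_v))
X_add = np.zeros((d, d_v))
for v in (v1, v2):
    X_delta = delta_layer(X_delta, k, v, beta=1.0)
    X_add = additive_layer(X_add, k, v)

print(k @ X_delta)   # equals v2: the earlier write was erased
print(k @ X_add)     # equals v1 + v2: the earlier write persists
```

With β = 1 the second layer replaces what the first one stored along k, whereas the additive residual can only accumulate.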

🔬 Advanced Topics

Invertibility

When β ≠ 1, the Delta Operator is invertible:

A⁻¹ = I + (β / (1−β)) · k · kᵀ

At β = 2: A = A⁻¹ (orthogonal involution).
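The closed-form inverse can be confirmed numerically. This is a NumPy sketch; `delta_operator` and `delta_inverse` are hypothetical helper names.

```python
import numpy as np

def delta_operator(k, beta):
    """A = I - beta * k k^T for a unit vector k."""
    return np.eye(len(k)) - beta * np.outer(k, k)

def delta_inverse(k, beta):
    """A^{-1} = I + beta / (1 - beta) * k k^T, defined only for beta != 1."""
    if np.isclose(beta, 1.0):
        raise ValueError("A is a projection at beta = 1 and has no inverse")
    return np.eye(len(k)) + (beta / (1.0 - beta)) * np.outer(k, k)

rng = np.random.default_rng(0)
k = rng.normal(size=4)
k /= np.linalg.norm(k)

for beta in (0.5, 1.5, 2.0):
    A = delta_operator(k, beta)
    A_inv = delta_inverse(k, beta)
    print(beta, np.allclose(A @ A_inv, np.eye(4)))   # True for each beta

# At beta = 2 the operator is its own inverse.
print(np.allclose(delta_inverse(k, 2.0), delta_operator(k, 2.0)))   # True
```

Near β = 1 the factor β / (1 − β) grows without bound, so the inverse becomes ill-conditioned well before it stops existing.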

📝 Citation

@software{ouroboros2025,
  title   = {OUROBOROS: Optimal Unified Residual Operations with 
             Bounded Orthogonal Reflection and Spectral Control},
  year    = {2025},
  url     = {https://github.com/DivyamTalwar/OUROBOROS}
}

🤝 Contributing

1. Fork the repository
2. Create feature branch: git checkout -b feature/amazing
3. Commit changes: git commit -m 'Add feature'
4. Push: git push origin feature/amazing
5. Open a Pull Request

📜 License

Creative Commons Attribution 4.0 International (CC-BY-4.0)

🐍 OUROBOROS

The ancient serpent eating its own tail — a symbol of cyclical transformation.

Features are consumed, transformed, and reborn through each layer.

Built with 💜 for the ML community

