DivyamTalwar/OUROBOROS

OUROBOROS Banner

🐍 OUROBOROS

Optimal Unified Residual Operations with Bounded Orthogonal Reflection and Spectral Control

Teaching Neural Networks the Art of Forgetting

Python 3.8+ · PyTorch · License: CC BY 4.0 · HuggingFace

Overview · Architecture · Installation · Quick Start · Theory · Insights


"The serpent that devours itself to be reborn — features consumed, transformed, and emerged anew."


🌀 Overview

The Problem

Standard residual networks can only add information. They lack the ability to erase, forget, or reflect — leading to residual accumulation where noisy features persist indefinitely.

Our Solution

A geometric residual connection that learns:

  • 🧠 When to remember — preserve critical features
  • 🗑️ When to forget — erase noise and outdated info
  • 🔄 When to transform — flip representations
┌─────────────────────────┐
│   Traditional ResNet    │
│                         │
│  X_{l+1} = X_l + F(X)   │
│                         │
│    ❌ Can only ADD      │
└─────────────────────────┘

┌─────────────────────────┐
│      OUROBOROS          │
│                         │
│  X_{l+1} = A·X + β·k·vᵀ │
│                         │
│ ✅ ADD, ERASE, REFLECT  │
└─────────────────────────┘

OUROBOROS enables neural networks to:

| Capability | Description |
|---|---|
| Selective Forgetting | Surgically erase outdated or noisy information |
| 🔄 Feature Reflection | Model oscillatory and oppositional dynamics |
| 🎯 Spectral Control | Shape layer-wise transitions with precision |
| Gradient Stability | Maintain gradient flow with gated identity |

🏗️ Architecture

The Delta Operator

At the heart of OUROBOROS lies the Delta Operator — a generalized Householder transformation:

A(X) = I − β(X) · k(X) · k(X)ᵀ

| Symbol | Name | Description | Range |
|---|---|---|---|
| k(X) | Reflection Direction | Unit vector defining transformation axis | ‖k‖ = 1 |
| β(X) | Scalar Gate | Controls transformation intensity | [0, 2] |
| v(X) | Value Vector | New information to inject | ℝ^{d_v} |

⚡ The Magic of β — One Scalar, Three Transformations

Beta Spectrum

A single learnable scalar dynamically interpolates between three geometric operations:

| β Value | Transformation | Eigenvalue | Effect |
|---|---|---|---|
| β → 0 | Identity | λ = 1 | Pass through unchanged |
| β → 1 | Projection | λ = 0 | Erase component along k |
| β → 2 | Reflection | λ = -1 | Flip direction along k |
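The three regimes in the table can be checked numerically. The following is a minimal sketch, not the repository's code: `delta_operator` is a hypothetical helper that builds A = I − β·k·kᵀ for a fixed direction and applies it to a single vector.

```python
import torch

def delta_operator(k: torch.Tensor, beta: float) -> torch.Tensor:
    """Return A = I - beta * k k^T for a direction k (normalized here)."""
    k = k / k.norm()
    return torch.eye(k.numel(), dtype=k.dtype) - beta * torch.outer(k, k)

k = torch.tensor([1.0, 0.0])   # direction along the first axis
x = torch.tensor([3.0, 4.0])   # arbitrary input vector

identity = delta_operator(k, 0.0) @ x    # beta = 0: x is unchanged
projected = delta_operator(k, 1.0) @ x   # beta = 1: component along k is removed
reflected = delta_operator(k, 2.0) @ x   # beta = 2: component along k is flipped
```

Only the component of `x` along `k` changes in each case; the orthogonal component (here the second coordinate) passes through untouched.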

🔄 Geometric Visualization

Geometric Transformations

Vector v transformed by Delta Operator. P(v) = projection (β=1), R(v) = reflection (β=2). Vector k = hyperplane normal.


🌊 Data Flow

Data Flow

The input X splits into three learnable branches that compute k, β, and v, which combine through the Delta operation with a skip connection.
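The three-branch data flow described above might look roughly like the sketch below. This is an illustration under stated assumptions, not the implementation in `model/ouroboros.py`: the class name `DeltaResidualSketch`, the use of plain linear layers for each branch, the `2 * sigmoid` gate, and the simplification to a scalar value (d_v = 1, so the hidden state is a vector) are all hypothetical choices.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DeltaResidualSketch(nn.Module):
    """Hypothetical sketch: x_{l+1} = x + beta * k * (v - k^T x), with d_v = 1."""

    def __init__(self, hidden_size: int, k_eps: float = 1e-6):
        super().__init__()
        self.k_proj = nn.Linear(hidden_size, hidden_size)  # direction branch
        self.beta_proj = nn.Linear(hidden_size, 1)         # gate branch
        self.v_proj = nn.Linear(hidden_size, 1)            # value branch
        self.k_eps = k_eps

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Unit-norm direction k(X); eps guards against a zero vector
        k = F.normalize(self.k_proj(x), dim=-1, eps=self.k_eps)
        # Gate beta(X) squashed into (0, 2) (assumed parameterization)
        beta = 2.0 * torch.sigmoid(self.beta_proj(x))
        # Scalar target v(X) to write along k
        v = self.v_proj(x)
        # Current content along k, i.e. k^T x
        current = (k * x).sum(dim=-1, keepdim=True)
        # Skip connection plus the gated erase-and-write correction
        return x + beta * k * (v - current)

block = DeltaResidualSketch(hidden_size=16)
x = torch.randn(2, 5, 16)   # (batch, sequence, hidden)
y = block(x)                # same shape as x
```

By construction, the update `y - x` is always parallel to `k`, so everything orthogonal to the learned direction rides the skip connection unchanged.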


🏛️ Full Model Architecture

Model Architecture

Each OuroborosBlock contains:

  • RMSNorm → Attention → Ouroboros Residual
  • RMSNorm → MLP → Ouroboros Residual

🧬 The Ouroboros Residual Block

Ouroboros Block Architecture

Core Update Rule — The Delta Rule

X_{l+1} = X_l + β · k · (vᵀ − kᵀ · X_l)
                        ↑         ↑
                     TARGET    CURRENT
               (what to write) (what exists)

This unifies three operations with a single gate:

| Operation | Formula | Effect |
|---|---|---|
| Erasure | −β · k · (kᵀ · X) | Removes component along k |
| Writing | +β · k · vᵀ | Injects new information |
| Sync | Same β | Both scale together |
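As a sanity check on the Delta Rule, the sketch below confirms that the "erase plus write" form matches the operator form A·X + β·k·vᵀ, and that β = 1 fully replaces the content stored along k with v. The helper name `delta_step` and the tensor sizes are illustrative assumptions, not part of the repository's API.

```python
import torch

def delta_step(X, k, v, beta):
    """One delta-rule update: X + beta * k (v^T - k^T X), with k normalized."""
    k = k / k.norm()
    current = k @ X                       # k^T X, shape (d_v,)
    return X + beta * torch.outer(k, v - current)

torch.manual_seed(0)
d, d_v = 6, 4
X = torch.randn(d, d_v)
k = torch.randn(d)
v = torch.randn(d_v)
beta = 0.7

# Operator form: A X + beta * k v^T, with A = I - beta * k k^T
k_unit = k / k.norm()
A = torch.eye(d) - beta * torch.outer(k_unit, k_unit)
operator_form = A @ X + beta * torch.outer(k_unit, v)
same = torch.allclose(delta_step(X, k, v, beta), operator_form, atol=1e-6)

# With beta = 1, reading along k after the update returns exactly v
read_back = k_unit @ delta_step(X, k, v, 1.0)
```

The read-back property follows from kᵀk = 1: the erasure term removes kᵀX and the write term puts vᵀ in its place, both scaled by the same β.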

📐 Mathematical Foundations

Spectral Decomposition Theorem

Theorem: For A = I − β·k·kᵀ where ‖k‖ = 1:

σ(A) = { 1, 1, ..., 1, (1−β) }
         └────┬────┘
         (d−1) times

| Property | Formula | Notes |
|---|---|---|
| Eigenvalue along k | λ_k = 1 − β | Controlled by gate |
| Eigenvalues in k⊥ | λ = 1 | Multiplicity: d−1 |
| Determinant | det(A) = 1 − β | Zero at β = 1 |
| Orthogonality | AᵀA = I | When β ∈ {0, 2} |
| Involution | A² = I | When β = 2 |
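The spectral claims can be verified numerically for a random unit direction. This is a small check written for illustration, with the dimension `d = 5` and gate value `beta = 1.5` chosen arbitrarily.

```python
import torch

torch.manual_seed(0)
d, beta = 5, 1.5
k = torch.randn(d, dtype=torch.float64)
k = k / k.norm()                                   # enforce ||k|| = 1
eye = torch.eye(d, dtype=torch.float64)

A = eye - beta * torch.outer(k, k)
eigenvalues = torch.linalg.eigvalsh(A)             # A is symmetric; ascending order
determinant = torch.linalg.det(A)                  # product of eigenvalues: 1 - beta

# At beta = 2 the operator is a Householder reflection, hence orthogonal
R = eye - 2.0 * torch.outer(k, k)
is_orthogonal = torch.allclose(R.T @ R, eye)
```

Here `eigenvalues` holds one entry equal to 1 − β = −0.5 and d − 1 entries equal to 1, matching the theorem.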

Why Standard ResNets Are Limited

| Property | ResNet | OUROBOROS |
|---|---|---|
| Eigenvalues | ≈ 1 + ε | ∈ [-1, 1] |
| Negative λ | ❌ No | ✅ Yes |
| Singular | ❌ No | ✅ Yes (β=1) |
| Data-dependent | ❌ Fixed | ✅ Learnable |

📊 Feature Coupling

Feature Coupling

Key Insight: The geometric coherence term k_i · k_j enables learned feature interactions without explicit cross-attention.
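One place where an inner product of two directions arises is the composition of two Delta Operators from layers i and j. The expansion below is a standard algebraic identity; reading it as the coupling shown in the figure is an assumption, since the figure itself is not reproduced here.

```latex
\begin{aligned}
A_j A_i
  &= \left(I - \beta_j k_j k_j^{\top}\right)\left(I - \beta_i k_i k_i^{\top}\right) \\
  &= I - \beta_i k_i k_i^{\top} - \beta_j k_j k_j^{\top}
     + \beta_i \beta_j \left(k_j^{\top} k_i\right) k_j k_i^{\top}
\end{aligned}
```

The cross term vanishes when k_i and k_j are orthogonal, in which case the two layers act on independent directions; otherwise its strength scales with k_j · k_i.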


🚀 Installation

# Clone the repository
git clone https://github.com/DivyamTalwar/OUROBOROS.git
cd OUROBOROS

# Install dependencies (quote version specs so the shell does not treat ">" as a redirect)
pip install "torch>=2.0" "transformers>=4.30" "einops>=0.6"

# Install in development mode
pip install -e .

Requirements

| Package | Version | Purpose |
|---|---|---|
| Python | ≥ 3.8 | Runtime |
| PyTorch | ≥ 2.0 | Deep learning |
| Transformers | ≥ 4.30 | HuggingFace |
| einops | ≥ 0.6 | Tensor ops |

⚡ Quick Start

Basic Usage

from model.ouroboros import OuroborosModel, OuroborosConfig
import torch

# Configure the model
config = OuroborosConfig(
    vocab_size=50304,
    hidden_size=768,
    num_hidden_layers=12,
    num_attention_heads=6,
    head_dim=128,
)

# Initialize model
model = OuroborosModel(config)
print(f"Parameters: {model.get_num_params():,}")

# Forward pass
input_ids = torch.randint(0, 50304, (2, 512))
labels = input_ids.clone()

logits, loss = model(input_ids, targets=labels)
print(f"Loss: {loss.item():.4f}")

Training Loop

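from torch.optim import AdamW
from model.ouroboros import OuroborosModel

# `config` is the OuroborosConfig from Basic Usage; `dataloader` is assumed to
# yield dicts with 'input_ids' and 'labels' tensors.
model = OuroborosModel(config).cuda()
optimizer = AdamW(model.parameters(), lr=3e-4, weight_decay=0.1)

model.train()
for batch in dataloader:
    input_ids, labels = batch['input_ids'].cuda(), batch['labels'].cuda()

    # Forward pass returns logits and the loss computed against `targets`
    logits, loss = model(input_ids, targets=labels)

    # Backward pass, parameter update, then clear gradients for the next step
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()

    print(f"Loss: {loss.item():.4f}")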

🔧 Configuration Reference

Model Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| vocab_size | int | 50304 | Vocabulary size |
| hidden_size | int | 768 | Model dimension |
| num_hidden_layers | int | 12 | Number of blocks |
| num_attention_heads | int | 6 | Attention heads |
| head_dim | int | 128 | Dimension per head |
| block_size | int | 1024 | Max sequence length |
Ouroboros Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| ouroboros_k_eps | float | 1e-6 | ε used when normalizing k |
| ouroboros_beta_init | float | 1.0 | Initial β, in [0, 2] |
| ouroboros_v_sigmoid | bool | True | Apply sigmoid to v |
| ouroboros_v_sigmoid_scale | float | 4.0 | Scale factor for v |
📁 Project Structure

OUROBOROS/
├── 📄 README.md              # Documentation
├── 📁 assets/                # Images
│   ├── banner.png
│   ├── architecture.png
│   ├── beta_spectrum.png
│   ├── geometric_transform.png
│   ├── dataflow.png
│   ├── model_architecture.png
│   └── feature_coupling.png
└── 📁 model/
    └── ouroboros.py          # Core implementation

💡 Key Insights

Why OUROBOROS Works

| Challenge | ResNet | OUROBOROS |
|---|---|---|
| Noisy features accumulate | ❌ Can only add | ✅ Can erase |
| Oscillatory patterns | ❌ No negative λ | ✅ λ ∈ [-1, 1] |
| Feature interference | ❌ No filter | ✅ Projection |
| Gradient stability | ✅ Identity | ✅ Gated identity |

Depth-Wise Delta Rule

OUROBOROS is the depth-wise dual of time-wise recurrence:

Time (DeltaNet):     S_t     = A · S_{t-1}  + β · k · vᵀ
Depth (OUROBOROS):   X_{l+1} = A · X_l      + β · k · vᵀ

🔬 Advanced Topics

Invertibility

When β ≠ 1, the Delta Operator is invertible:

A⁻¹ = I + (β / (1−β)) · k · kᵀ

At β = 2: A = A⁻¹ (orthogonal involution).
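The closed-form inverse can be checked directly. The sketch below uses a hypothetical helper, `delta_inverse`, that applies the formula above and refuses the singular case β = 1.

```python
import torch

def delta_inverse(k: torch.Tensor, beta: float) -> torch.Tensor:
    """Closed-form inverse I + beta / (1 - beta) * k k^T; undefined at beta = 1."""
    if beta == 1.0:
        raise ValueError("beta = 1 gives a projection, which is not invertible")
    k = k / k.norm()
    scale = beta / (1.0 - beta)
    return torch.eye(k.numel(), dtype=k.dtype) + scale * torch.outer(k, k)

torch.manual_seed(0)
d = 4
k = torch.randn(d, dtype=torch.float64)
k_unit = k / k.norm()
eye = torch.eye(d, dtype=torch.float64)

A = eye - 0.5 * torch.outer(k_unit, k_unit)
product = A @ delta_inverse(k, 0.5)            # recovers the identity matrix

R = eye - 2.0 * torch.outer(k_unit, k_unit)    # beta = 2 reflection
R_inv = delta_inverse(k, 2.0)                  # coincides with R itself
```

Note that the inverse grows without bound as β approaches 1, so values of β close to the projection point are numerically ill-conditioned to invert even though they are technically invertible.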


📝 Citation

@software{ouroboros2025,
  title   = {OUROBOROS: Optimal Unified Residual Operations with 
             Bounded Orthogonal Reflection and Spectral Control},
  year    = {2025},
  url     = {https://github.com/DivyamTalwar/OUROBOROS}
}

🤝 Contributing

  1. Fork the repository
  2. Create feature branch: git checkout -b feature/amazing
  3. Commit changes: git commit -m 'Add feature'
  4. Push: git push origin feature/amazing
  5. Open a Pull Request

📜 License

Creative Commons Attribution 4.0 International (CC-BY-4.0)


🐍 OUROBOROS

The ancient serpent eating its own tail — a symbol of cyclical transformation.

Features are consumed, transformed, and reborn through each layer.


Built with 💜 for the ML community
