🧠 Multi-Level Long-Term Memory Architecture

Solving Context Collapse in LLMs for 1,000+ Turn Conversations

A submission for the NeuroHack Hackathon (February 2026).

🛑 The Problem: "Context Collapse"

Current AI agents hit a glass ceiling in long-term interactions. They suffer from "Context Collapse": they either forget early details (amnesia) or fail once the conversation overflows the context window.

Standard, naive RAG (Retrieval-Augmented Generation) attempts to fix this by dumping raw text chunks into a database. However, retrieving the "top 5 chunks" from 1,000 turns of history leads to overlapping facts, vector crowding, contradictions, and hallucinations.

💡 The Solution: Decoupled 4-Chamber Memory

This is more than an API wrapper: it targets the scaling problem behind long-running conversations. The project introduces a Decoupled 4-Chamber Memory Routing system. Each turn is atomized into granular JSON facts before storage, and each fact is routed to a specialized vector collection, which separates "thinking" from "remembering."

🏛️ Core Architecture

semantic_facts: Immutable truths and core user identity (e.g., Job Title, Skills).
episodic_events: Temporal memory and past project events (e.g., "Troubleshot WSL disk space yesterday").
preferences: Nuanced stylistic likes/dislikes.
recent/working: Short-term conversational cache.
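The routing step can be sketched as follows. This is a minimal illustration, not the repository's code: the `kind` field, the `CHAMBERS` mapping, and `route_facts` are hypothetical names, and the sample facts (other than the two quoted above) are invented for the example. It assumes each atomized fact already carries a label saying which chamber it belongs to.

```python
from collections import defaultdict

# Hypothetical mapping from a fact's "kind" label to the chambers listed above.
CHAMBERS = {
    "semantic": "semantic_facts",
    "episodic": "episodic_events",
    "preference": "preferences",
}
# Assumption: anything unclassified stays in the short-term cache.
DEFAULT_CHAMBER = "recent/working"


def route_facts(facts):
    """Group atomized facts by the chamber they should be stored in."""
    routed = defaultdict(list)
    for fact in facts:
        chamber = CHAMBERS.get(fact.get("kind"), DEFAULT_CHAMBER)
        routed[chamber].append(fact["text"])
    return dict(routed)


facts = [
    {"kind": "semantic", "text": "User is an ML Engineer."},
    {"kind": "episodic", "text": "Troubleshot WSL disk space yesterday."},
    {"kind": "preference", "text": "Prefers concise answers."},
    {"kind": "unknown", "text": "Asked about latency a moment ago."},
]
routed = route_facts(facts)
```

In a real deployment each list would be embedded and written to its own vector collection; the sketch stops at the grouping step because the storage code is not shown in this README.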

The Context Firewall: The system enforces a strict 3,500-character limit on retrieved memory. Because the retrieved context has a fixed upper bound, the memory portion of the prompt does not grow with conversation length, no matter how many thousands of turns have passed.
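One way such a limit could be enforced is greedy packing: take retrieved facts in relevance order and stop before the budget is exceeded. The sketch below assumes the facts arrive already ranked and are joined with newlines; `build_context` and `MAX_CONTEXT_CHARS` are hypothetical names, and the repository may apply the limit differently.

```python
MAX_CONTEXT_CHARS = 3500  # the retrieval limit stated above


def build_context(ranked_facts, limit=MAX_CONTEXT_CHARS):
    """Pack facts (best first) until the next one would exceed the limit."""
    kept, used = [], 0
    for fact in ranked_facts:
        # Count one extra character for the newline that joins facts.
        cost = len(fact) + (1 if kept else 0)
        if used + cost > limit:
            break  # stop here so the kept facts stay in strict rank order
        kept.append(fact)
        used += cost
    return "\n".join(kept)


context = build_context(["a" * 2000, "b" * 1000, "c" * 1000])
```

Here the third fact is dropped because adding it would push the context past 3,500 characters. Stopping at the first fact that does not fit keeps the output in rank order; skipping it and trying smaller facts would use the budget more fully at the cost of that ordering.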

🛠️ Tech Stack

Compute Engine: Python, FastAPI (Async), Groq LPU™ Inference (Llama-3.1-8b-instant)
Vector Engine: Qdrant (HNSW indexing, approximately $O(\log N)$ search)
Embeddings: sentence-transformers (MiniLM-L6-v2)

🚀 Quick Start & Evaluation Guide (For Judges)

This repository contains the production-ready code. Follow these steps to spin up the FastAPI server and evaluate the agent's long-term recall and latency.

1. Environment Setup

Clone the repository and install the required dependencies.

git clone https://github.com/isha822/neurohack_test.git
cd neurohack_test
python3 -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
pip install -r requirements.txt

2. API Setup

Create a .env file in the repository root containing your Groq API key.

echo "GROQ_API_KEY=your_groq_api_key_here" > .env
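Before starting the server, it can help to confirm that the placeholder was actually replaced. The check below is a standalone sketch using only the standard library; `parse_env` and `has_real_key` are hypothetical helpers, and how the application itself loads `.env` is not shown in this README.

```python
def parse_env(text):
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def has_real_key(text, placeholder="your_groq_api_key_here"):
    """Return True if GROQ_API_KEY is set to something other than the placeholder."""
    key = parse_env(text).get("GROQ_API_KEY", "")
    return bool(key) and key != placeholder


# Example on an in-memory string; the value here is a dummy, not a real key.
ok = has_real_key("GROQ_API_KEY=example-value\n")
```

To check the real file, pass its contents, for example `has_real_key(open(".env").read())`.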

3. Starting the server

python3 -m app.main

4. Injecting memories

curl -X POST http://127.0.0.1:8000/chat -H "Content-Type: application/json" -d '{"message": "Hi, I am an ML Engineer and my current project has a strict 500ms latency goal.", "session_id": "JUDGE_TEST"}'
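For scripted evaluation over many turns, the same request can be issued from Python. This sketch uses only the standard library and mirrors the curl command above (same URL, header, and JSON fields); `build_chat_request` and `send_chat` are hypothetical helper names, and the response is returned as raw text because its schema is not documented here.

```python
import json
import urllib.request

CHAT_URL = "http://127.0.0.1:8000/chat"  # endpoint from the curl command above


def build_chat_request(message, session_id):
    """Build the POST request without sending it."""
    body = json.dumps({"message": message, "session_id": session_id}).encode("utf-8")
    return urllib.request.Request(
        CHAT_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_chat(message, session_id, timeout=30):
    """Send one turn and return the raw response body as text."""
    request = build_chat_request(message, session_id)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

With the server running, a follow-up such as `send_chat("What latency goal did I mention?", "JUDGE_TEST")` reuses the same `session_id`, which is one way to probe whether the earlier fact is recalled.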
