Solving Context Collapse in LLMs for 1,000+ Turn Conversations
A submission for the NeuroHack Hackathon (February 2026).
Current AI agents hit a glass ceiling in long-term interactions. They suffer from "Context Collapse": they either forget early details (amnesia) or fail once their context windows overflow.
Standard, naive RAG (Retrieval-Augmented Generation) attempts to fix this by dumping raw text chunks into a database. However, retrieving the "top 5 chunks" from 1,000 turns of history leads to overlapping facts, vector crowding, contradictions, and hallucinations.
This is more than an API wrapper: it targets a scaling problem, namely how to keep recall accurate as conversation history grows. This project introduces a Decoupled 4-Chamber Memory Routing system. By atomizing data into granular JSON facts before storage and routing them to four specialized vector collections, we separate "thinking" from "remembering":
- semantic_facts: Immutable truths and core user identity (e.g., Job Title, Skills).
- episodic_events: Temporal memory and past project events (e.g., "Troubleshot WSL disk space yesterday").
- preferences: Nuanced stylistic likes/dislikes.
- recent/working: Short-term conversational cache.
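The routing step can be sketched as follows. This is a minimal illustration under stated assumptions, not the repository's implementation: the `Fact` fields, the `route` helper, the `recent_working` identifier, and the sample extractor output are all hypothetical, and a plain dictionary stands in for the vector collections.

```python
import json
from dataclasses import dataclass

# Collection names follow the list above; "recent_working" is an assumed
# identifier for the "recent/working" chamber.
CHAMBERS = ("semantic_facts", "episodic_events", "preferences", "recent_working")


@dataclass(frozen=True)
class Fact:
    chamber: str  # which of the four collections the fact belongs to
    text: str     # one atomic statement
    turn: int     # conversation turn the fact came from


def route(facts, store=None):
    """Group atomized facts by chamber.

    Facts with an unrecognized chamber fall back to the short-term cache
    (an assumption made for this sketch).
    """
    if store is None:
        store = {name: [] for name in CHAMBERS}
    for fact in facts:
        target = fact.chamber if fact.chamber in store else "recent_working"
        store[target].append(fact)
    return store


# Hypothetical extractor output for a single user message.
raw = json.loads("""[
  {"chamber": "semantic_facts", "text": "User is an ML Engineer", "turn": 1},
  {"chamber": "episodic_events", "text": "Troubleshot WSL disk space", "turn": 1},
  {"chamber": "preferences", "text": "Prefers concise answers", "turn": 1}
]""")
store = route(Fact(**item) for item in raw)
```

Keeping each fact as a small typed record means a later retrieval step can query one chamber at a time instead of searching a single mixed pool of raw text chunks.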
The Context Firewall: The system enforces a strict 3,500-character limit on retrieved memory. Because this cap is constant, the memory injected into each prompt stays the same size no matter how many thousands of turns have passed, which keeps retrieval from overflowing the model's context window.
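One way to enforce such a budget is a greedy packer over ranked facts. The sketch below is an assumption about how the limit could be applied, not the project's actual code; `build_context`, its parameters, and the skip-if-too-long policy are hypothetical.

```python
MAX_CONTEXT_CHARS = 3500  # retrieval limit stated above


def build_context(ranked_facts, limit=MAX_CONTEXT_CHARS, sep="\n"):
    """Join facts in rank order without exceeding `limit` characters.

    A fact that does not fit is skipped rather than truncated, so a shorter,
    lower-ranked fact may still be included.
    """
    parts = []
    used = 0
    for fact in ranked_facts:
        # The separator only costs characters once something precedes it.
        cost = len(fact) + (len(sep) if parts else 0)
        if used + cost > limit:
            continue
        parts.append(fact)
        used += cost
    return sep.join(parts)


# Three facts of 2,000, 2,000 and 1,000 characters: the second does not fit.
context = build_context(["a" * 2000, "b" * 2000, "c" * 1000])
```

The budget here is counted in characters, as in the text; a character cap is only a proxy for token count, so a deployment that needs an exact token bound would measure with the model's tokenizer instead.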
- Compute Engine: Python, FastAPI (Async), Groq LPU™ Inference (Llama-3.1-8b-instant)
- Vector Engine: Qdrant ($O(\log N)$ HNSW indexing)
- Embeddings: sentence-transformers (MiniLM-L6-v2)
This repository contains the production-ready code. Follow these steps to spin up the FastAPI server and evaluate the agent's long-term recall and latency.
Clone the repository and install the required dependencies.
git clone https://github.com/isha822/neurohack_test.git
cd neurohack_test
python3 -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
pip install -r requirements.txt
Create a .env file containing your Groq API key.
echo "GROQ_API_KEY=your_groq_api_key_here" > .env
Start the FastAPI server.
python3 -m app.main
Send a test message to the chat endpoint.
curl -X POST http://127.0.0.1:8000/chat -H "Content-Type: application/json" -d '{"message": "Hi, I am an ML Engineer and my current project has a strict 500ms latency goal.", "session_id": "JUDGE_TEST"}'
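The same request as the curl command can be sent from Python using only the standard library, which is handy for scripting many turns against one `session_id`. This is a sketch: `build_chat_request` and `send_chat` are hypothetical helper names, and because the response schema is not documented here, the body is returned as raw text.

```python
import json
import urllib.request

CHAT_URL = "http://127.0.0.1:8000/chat"  # endpoint from the curl example


def build_chat_request(message, session_id, url=CHAT_URL):
    """Build a POST request carrying the same JSON fields as the curl command."""
    payload = json.dumps({"message": message, "session_id": session_id})
    return urllib.request.Request(
        url,
        data=payload.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_chat(message, session_id, timeout=30):
    """Send one message and return the raw response body as text."""
    request = build_chat_request(message, session_id)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")


# Building the request needs no running server; send_chat() does.
request = build_chat_request("What is my latency goal?", "JUDGE_TEST")
```

With the server running, `print(send_chat("What is my latency goal?", "JUDGE_TEST"))` sends a follow-up in the same session as the curl example.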