
isha822/neurohack_test

🧠 Multi-Level Long-Term Memory Architecture

Solving Context Collapse in LLMs for 1,000+ Turn Conversations

A submission for the NeuroHack Hackathon (February 2026).

🛑 The Problem: "Context Collapse"

Current AI agents hit a glass ceiling in long-term interactions. They suffer from "Context Collapse": they either forget early details (amnesia) or fail outright once the conversation overflows their context window.

Naive RAG (Retrieval-Augmented Generation) tries to fix this by dumping raw text chunks into a vector database. However, retrieving the "top 5 chunks" from 1,000 turns of history leads to overlapping facts, vector crowding, contradictions, and hallucinations.

💡 The Solution: Decoupled 4-Chamber Memory

We didn't just build an API wrapper; we tackled the underlying scaling problem of keeping recall accurate as history grows without letting the prompt grow with it. This project introduces a Decoupled 4-Chamber Memory Routing system. By atomizing conversation data into granular JSON facts before storage and routing each fact to a specialized vector collection, we separate "thinking" (the LLM call) from "remembering" (storage and retrieval).
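To make "atomizing" concrete, here is a minimal sketch (not the project's actual code) of a validation step that could sit between an extraction LLM and storage. It assumes, hypothetically, that the extractor is prompted to return a JSON array of {"chamber", "text"} objects; Fact and atomize are illustrative names, and the chamber names are taken from the Core Architecture list, with "recent" standing in for recent/working.

```python
import json
from dataclasses import dataclass

# Hypothetical chamber identifiers, based on the Core Architecture list.
CHAMBERS = {"semantic_facts", "episodic_events", "preferences", "recent"}


@dataclass(frozen=True)
class Fact:
    chamber: str  # which memory chamber the fact is routed to
    text: str     # one atomic, self-contained statement


def atomize(llm_output: str) -> list[Fact]:
    """Parse an extractor's JSON output into validated atomic facts.

    Assumes (hypothetically) a JSON array of {"chamber": ..., "text": ...}
    objects. Malformed items are dropped instead of being stored as raw text.
    """
    try:
        items = json.loads(llm_output)
    except json.JSONDecodeError:
        return []  # a failed extraction stores nothing
    if not isinstance(items, list):
        return []
    facts: list[Fact] = []
    for item in items:
        if not isinstance(item, dict):
            continue
        chamber = item.get("chamber")
        text = item.get("text")
        if isinstance(chamber, str) and chamber in CHAMBERS \
                and isinstance(text, str) and text.strip():
            facts.append(Fact(chamber, text.strip()))
    return facts
```

Rejecting malformed output outright, rather than falling back to storing the raw turn, keeps unstructured chunks out of the collections, which is the failure mode described for naive RAG.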

🏛️ Core Architecture

  • semantic_facts: Immutable truths and core user identity (e.g., Job Title, Skills).
  • episodic_events: Temporal memory and past project events (e.g., "Troubleshot WSL disk space yesterday").
  • preferences: Nuanced stylistic likes/dislikes.
  • recent/working: Short-term conversational cache.
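A minimal in-memory sketch of the routing idea, assuming each atomic fact arrives already tagged with its chamber. Plain Python lists stand in for the Qdrant collections and the recent/working cache is modeled as a fixed-size deque; MemoryRouter, working_size, and the "recent" key are hypothetical names, and the exact-duplicate check is only an assumed way of keeping overlapping facts out of the long-term chambers.

```python
from collections import deque

# Long-term chambers from the Core Architecture list.
LONG_TERM = ("semantic_facts", "episodic_events", "preferences")


class MemoryRouter:
    """Hypothetical in-memory stand-in for the four chambers.

    The real system stores vectors in Qdrant collections; lists are used here
    only to show how facts are routed.
    """

    def __init__(self, working_size: int = 10) -> None:
        # Long-term chambers grow with history; each is a separate bucket.
        self.chambers: dict[str, list[str]] = {name: [] for name in LONG_TERM}
        # Short-term cache keeps only the last `working_size` entries.
        self.working: deque[str] = deque(maxlen=working_size)

    def store(self, chamber: str, text: str) -> None:
        if chamber == "recent":
            self.working.append(text)  # oldest entry is evicted automatically
        elif chamber in self.chambers:
            bucket = self.chambers[chamber]
            if text not in bucket:  # skip exact duplicates (assumed policy)
                bucket.append(text)
        else:
            raise ValueError(f"unknown chamber: {chamber!r}")
```

Keeping the working cache bounded means short-term context stays constant-size, while older material survives only in the long-term chambers as distilled facts.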

The Context Firewall: The system enforces a strict 3,500-character limit on retrieved memory. Because this cap does not grow with conversation length, the memory portion of the prompt stays bounded no matter how many thousands of turns have passed, so accumulated history alone can never overflow the LLM's context window.
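One way to enforce such a cap is greedy packing: walk the retrieved memories best-first and keep each one that still fits. This is a sketch under assumptions, not the project's implementation: results are assumed to be ranked already, pack_context is a hypothetical name, and the original does not say whether oversized items are skipped or truncated (this version skips them).

```python
MAX_CONTEXT_CHARS = 3_500  # the retrieval limit stated above


def pack_context(ranked_memories: list[str], limit: int = MAX_CONTEXT_CHARS) -> str:
    """Greedily keep the highest-ranked memories that fit within `limit` chars.

    Assumes `ranked_memories` is sorted best-first (e.g. by vector score).
    Memories are joined with newlines, and separators count toward the limit.
    """
    selected: list[str] = []
    used = 0
    for memory in ranked_memories:
        cost = len(memory) + (1 if selected else 0)  # +1 for the joining newline
        if used + cost > limit:
            continue  # skip an item that does not fit, but try smaller ones
        selected.append(memory)
        used += cost
    return "\n".join(selected)
```

The cap bounds only the memory block; the system prompt and the current user message still add to the total prompt size.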

🛠️ Tech Stack

  • Compute Engine: Python, FastAPI (Async), Groq LPU™ Inference (Llama-3.1-8b-instant)
  • Vector Engine: Qdrant (HNSW indexing; approximate nearest-neighbor search in roughly $O(\log N)$ per query)
  • Embeddings: sentence-transformers (MiniLM-L6-v2)
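For intuition about what the vector engine does for each chamber, here is an exact, brute-force top-k cosine search in plain Python. It is a swapped-in stand-in, not Qdrant: scanning every stored vector costs O(N) per query, which is what Qdrant's HNSW index avoids by searching approximately. cosine and search are hypothetical helpers; real MiniLM-L6-v2 embeddings are 384-dimensional, but toy vectors behave the same way.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(collection: dict[str, list[float]], query: list[float],
           k: int = 3) -> list[tuple[str, float]]:
    """Exact top-k search by cosine similarity over one chamber.

    Brute force is O(N) per query; an HNSW index (as in Qdrant) replaces this
    with an approximate search that scales roughly logarithmically in N.
    """
    scored = [(text, cosine(vec, query)) for text, vec in collection.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```

Because each chamber is searched separately, a query about identity never competes with hundreds of episodic entries for the same top-k slots.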

🚀 Quick Start & Evaluation Guide (For Judges)

This repository contains the production-ready code. Follow these steps to spin up the FastAPI server and evaluate the agent's long-term recall and latency.

1. Environment Setup

Clone the repository and install the required dependencies.

git clone https://github.com/isha822/neurohack_test.git
cd neurohack_test
python3 -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
pip install -r requirements.txt

2. API Key Setup

Create a .env file in the repository root containing your Groq API key:

echo "GROQ_API_KEY=your_groq_api_key_here" > .env

3. Starting the Server

python3 -m app.main

4. Injecting Memories

curl -X POST http://127.0.0.1:8000/chat -H "Content-Type: application/json" -d '{"message": "Hi, I am an ML Engineer and my current project has a strict 500ms latency goal.", "session_id": "JUDGE_TEST"}'
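The same request can be sent from Python with only the standard library, which is handy for scripting many turns in a recall test. This is a sketch: build_chat_request and send are hypothetical helpers, the URL and JSON fields are copied from the curl command, and the response body is returned as raw text because its format is not documented here.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:8000"  # host/port taken from the curl example


def build_chat_request(message: str, session_id: str,
                       base_url: str = BASE_URL) -> urllib.request.Request:
    """Build the same POST /chat request as the curl command."""
    payload = json.dumps({"message": message, "session_id": session_id}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/chat",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 30.0) -> str:
    """Send the request and return the raw response body as text."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

With the server running, a later turn such as send(build_chat_request("What latency goal did I mention?", "JUDGE_TEST")) can probe recall, assuming memories are scoped by session_id.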
