Movie Reasoning Agent: Advanced Agentic RAG Implementation (Option C)

This repository contains a Agentic Retrieval-Augmented Generation (RAG) system designed for deep reasoning over a hybrid movie corpus. Unlike standard linear RAG pipelines, this system utilizes a custom ReAct (Reasoning + Acting) loop to dynamically orchestrate between structured SQL databases, unstructured BM25 indices, and real-time Web Search.

🎥 1. Task E: Demo Video (ReAct Reasoning Trace)

Tip

View the full high-resolution demonstration on YouTube: Watch Demo Video

Demonstration Workflow:

Input: The user provides a natural language query.
Reasoning: The agent performs an internal ReAct cycle ([STRATEGIC BREAKDOWN] → [PLAN] → [THOUGHT]) to determine the best tool for the task.
Execution: The agent dynamically calls SQL, BM25 Search, or Web Search based on its plan.
Synthesis: Data from multiple sources is synthesized into a single, grounded response.

The demo showcases the following agentic behaviors:

Internal Commands / Tools	Query Inputted
`query_data` (SQL)	"List the top 3 highest-grossing movies in the dataset and their respective release years."
`query_data` + `search_docs`	"What was the budget for 'Avengers: Endgame', and based on the reviews, what was the most praised aspect of its plot or themes?"
System Refusal (Safety Gate)	"Can you provide a detailed financial investment plan for the tech sector in 2026?"

📂 2. Project Structure

INTERNSHIP-SELECTION-ASSIGNMENT/
├── agent/                      # Core Agent Intelligence
│   ├── agent_loop.py           # Main ReAct loop & state management
│   ├── agent_utils.py          # Deduplication & cleaning utilities
│   ├── bonus_features.py       # Self-Reflection & Telemetry logic
│   ├── prompts.py              # System prompts & reasoning protocols
│   └── tools_config.py         # Tool schemas & mapping
├── data/                       # Processed Data Assets
│   ├── database.db             # Generated SQLite database
│   └── ingest_db.py            # SQLite ingestion utility
├── dataset/                    # Raw Source Data
│   ├── movies_structured.csv   # Primary movie dataset
│   ├── rotten_tomatoes_movies.csv # Extended reviews metadata
│   ├── top1000movies.csv       # Box office rankings
│   └── unstructured_reviews/   # .txt review corpus (15 movies)
├── evaluation/                 # Performance Auditing
│   ├── logs/                   # (Folder) Granular reasoning traces
│   ├── degradation_comparison.json # Baseline vs. Degraded performance data
│   └── task_D_results.json     # Final results of 20-question suite
├── tools/                      # Atomic Tool Layer
│   ├── query_data.py           # SQL/Pandas implementation
│   ├── search_docs.py          # BM25 document search
│   └── web_search.py           # Tavily Web search
├── utils/                      # Shared Infrastructure
│   └── logger.py               # TraceLogger for terminal output
├── Degradation_Audit_Report.md # Stress-test analysis (Bonus D)
├── DESIGN.md                   # Architectural deep-dive
├── EVALUATION.md               # 20-question suite forensic report
├── tool_cost_analysis.md       # Per-tool latency & cost breakdown
├── preprocess.py               # Main data cleaning & processing script
├── setup_project.py            # Automated environment setup
├── task_D_20eval_test.py       # Automated 20-question test runner
├── degradation_runner.py       # Bonus D evaluation runner
├── task_A_test.py              # Tool-level unit tests
├── task_B_test.py              # Agent-level reasoning tests
├── task_C_test.py              # Cross-domain multi-tool tests
├── demo.gif                    # Animated ReAct reasoning trace
├── .gitignore                  # Git exclusion rules
└── requirements.txt            # System dependencies

📊 3. Hybrid Knowledge Base

The agent operates over a multi-tiered data environment to ensure high-fidelity grounding across structured, unstructured, and real-time domains.

Source	Content	Size
SQLite DB	Structured movie metadata (Year, Genre, Budget, Gross, RT Score)	663 rows × 7 columns
BM25 Index	Unstructured qualitative film reviews and thematic critiques	15 .txt files
Web Search	Real-time external data (News, Awards, recent credits)	Dynamic (Tavily API)

📚 4. Documentation Index

For a deep-dive into specific project domains, please refer to the following reports:

Architecture Design (DESIGN.md): Details on the ReAct loop, tool schemas, and safety engineering.
Evaluation Report (EVALUATION.md): Forensic trace analysis of the 20-question suite and accuracy metrics.
Cost & Telemetry Analysis (tool_cost_analysis.md): Detailed breakdown of token consumption, API spend, and fiscal projections.
Degradation Audit (Degradation_Audit_Report.md): Stress-test results showing system resilience under 50% data loss.

🏗️ 5. Technical Architecture

The Agent Loop

The core engine is a Python loop (agent/agent_loop.py) that manages state, history, and tool orchestration for the agent.

No Black-Box Wrappers: Built from scratch without initialize_agent or high-level frameworks to ensure total transparency and control.
State Optimization: Employs a Budgets & Constraints protocol, enforcing a hard 8-step cap to prevent infinite recursion.

Tool Contracts

Tool Name	Engine	Purpose	Output Fidelity
`query_data`	SQLite / Pandas	Precise numerical lookups, aggregations, and filtered searches.	Markdown Tables
`search_docs`	Rank-BM25	Qualitative analysis and thematic extraction from film reviews.	Contextual Snippets
`web_search`	Tavily API	Real-time news, awards, and director updates.	URL-Cited Snippets

Retriever Implementation Details

search_docs (Rank-BM25): Operates over a pre-indexed corpus of .txt reviews. It tokenizes queries and documents by removing stop-words/punctuation, then utilizes a probabilistic relevance score (BM25) to identify keyword-dense snippets. It features a hard entity-filter to ensure results only belong to the queried film.
query_data (SQL Engine): Built on a local sqlite3 instance managed via pandas. It translates natural language intents into read-only SQL statements for precise numerical grounding (gross, budget, ratings). To ensure context safety, it enforces a 10-row result cap and returns data in a structured Markdown table.
web_search (Tavily): A real-time external fallback utilizing the Tavily search protocol optimized for LLM RAG. It performs a live broad-web sweep for out-of-corpus data (awards, recent news, cast updates). Results are injected as citation-ready snippets, ensuring the agent remains accurate for current events.

Design Principles

Internal Structured Data: SQLite for financial metrics and metadata.
Internal Unstructured Data: BM25-based semantic retrieval for thematic critiques.
Web Fallback: Real-time retrieval via Tavily for fringe facts or recent updates.

System Architecture

flowchart TB

%% -------- ENTRY --------
A[User Query] --> B{Cache Hit?}
B -->|Yes| C[Return Cached Trace]
B -->|No| D[Init context_state]

%% -------- CONTEXT LAYER --------
subgraph CONTEXT [Context Consolidation]
    direction TB
    E1[Merge Knowledge Base]
    E2[Inject Failure Memory]
    E3[Track Used Queries]
    E4[Apply Step Budget]
end

D --> E1 --> E2 --> E3 --> E4

%% -------- REASONING --------
subgraph REASONING [Strategic Reasoning]
    direction TB
    F1[Task Decomposition]
    F2[Tool Planning]
    F3[Thought Generation]
end

E4 --> F1 --> F2 --> F3
F3 --> G{Decision}

%% -------- TOOL LAYER --------
subgraph TOOLS [Tool Execution]
    direction TB
    H[Deduplication Check]
    J[Execute Tool]
    J1[SQL]
    J2[BM25]
    J3[Web]
    K[Process Results]
end

G -->|Tool Call| H
H -->|Valid| J
H -->|Redundant| I[Block + Feedback]

J --> J1 --> K
J --> J2 --> K
J --> J3 --> K

%% -------- STATE UPDATE --------
I --> L[Update Context]
K --> L
L --> M[Telemetry + Memory]

%% -------- LOOP CONTROL --------
M --> N{Step < 8?}
N -->|Yes| E1
N -->|No| O[Force Refusal]

%% -------- REFLECTION --------
subgraph REFLECTION [Self-Critique]
    direction TB
    P[Reflection Check]
    P1[Grounding]
    P2[Completeness]
end

G -->|Final Answer| P
P --> P1 --> P2 --> Q{Pass?}

Q -->|Yes| R[Finalize Answer]
Q -->|No| S[Emergency Retrieval]

S --> E1

%% -------- OUTPUT --------
R --> T[Log Metrics]
T --> U[Cache Trace]
U --> V[Return Response]

%% -------- COLORS --------
style CONTEXT fill:#e3f2fd,stroke:#64b5f6,stroke-width:2px
style REASONING fill:#e8f5e9,stroke:#66bb6a,stroke-width:2px
style TOOLS fill:#fff3e0,stroke:#ffa726,stroke-width:2px
style REFLECTION fill:#fce4ec,stroke:#ec407a,stroke-width:2px

style O fill:#ffebee,stroke:#e53935
style C fill:#e0f7fa,stroke:#26c6da

For a deep-dive into the agent's internal mechanics, tool schemas, and safety engineering, see the Architecture Design Document (DESIGN.md).

🚀 6. Project Roadmap & Milestones

This implementation satisfies all core tasks (A-D) and includes several advanced performance bonuses.

Core Implementation (Tasks A-D)

Milestone	Implementation Files	Key Accomplishment
Task A: Tool Layer	`tools/`, `task_A_test.py`	Implemented precise SQL, BM25, and Web Search tools.
Task B: Agent Logic	`agent/agent_loop.py`, `task_B_test.py`	Developed a custom ReAct loop with strategic reasoning.
Task C: Multi-Tool	`task_C_test.py`	Verified cross-domain reasoning and tool orchestration.
Task D: Evaluation	`task_D_20eval_test.py`, EVALUATION.md	Executed a 20-question suite with 100% grounding rate.

Advanced Bonuses (A-D)

Bonus	Files & Logic	Purpose
A: Strategic Reasoning	`agent/prompts.py`	Mandatory `[STRATEGIC BREAKDOWN]` before any action.
B: Operational Telemetry	`agent/bonus_features.py`, tool_cost_analysis.md	Cost analysis in USD & INR.
C: Reflection & Recovery	`agent/bonus_features.py`	Self-critique turn that audits and fixes final answers.
D: Degradation Audit	`degradation_runner.py`, Degradation_Audit_Report.md	Testing model robustness and accuracy retention under 50% data loss.

Novelty Features

Proactive Safety Gating: Programmatic pre-processor for injection/jailbreak detection.
Keyword Deduplication: Intelligence layer to prevent redundant, costly tool calls.
Persistent Trace Caching: Integrated JSON cache for $0.00 cost replay of common queries.

📊 7. Performance & Evaluation Summary

The system was evaluated against a rigorous 20-question suite covering Single-Tool, Multi-Tool, Refusal, and Edge-Case categories.

Metric	Result	Insight
Overall Accuracy	75%	Exceptionally high for multi-step reasoning.
Grounding Rate	100%	Zero hallucinations; all facts are cited.
Failure Mode Resilience	Excellent	Agent gracefully falls back to Web when Local Data is missing.
Avg. Query Cost	$0.012 (₹1.01)	Highly efficient tiered data escalation.

Full 20-question traces and forensic failure analysis can be found in EVALUATION.md.

💻 8. Developer Guide (Setup & Usage)

One-Step Installation

The project includes an automated setup pipeline that handles virtual environments and data preprocessing:

python setup_project.py

Running the Agent

Interactive REPL: python agent/agent_loop.py
Single Question: python agent/agent_loop.py "Compare Avatar and Inception worldwide gross."
Run Evaluation: python task_D_20eval_test.py

Global Configuration (.env)

GITHUB_TOKEN: Generate a token to access GitHub Models.
- Visit the GPT-4o-mini Playground.
- Click "Get started" or "Get SDK token" to be redirected to the Personal Access Tokens page.
- Create a new token and paste it into your .env file.
TAVILY_API_KEY: Get a free key from Tavily AI for web search capabilities.

GITHUB_TOKEN=your_token_here
TAVILY_API_KEY=your_key_here

⚠️ 9. Honest Assessment: Failure Modes

As per the technical requirements, we have identified and documented the system's "Breaking Points":

Helpfulness Drift: In refusal cases (like recipes), the agent sometimes provides general trends before stating it cannot fulfill the request due to the base model's helpfulness bias.
Ambiguity Resolution: For titles like 'The Host', the agent requires clear versioning (2006 vs 2013) if not explicitly disambiguated in the query.
Budget Truncation: To protect the 8k context window, SQL results are capped at 10 rows.

🛡️ 10. Operational Constraints & Resilience

Hard Step Cap (8 Tool Calls)

The system enforces a strict 8-tool-call limit to ensure operational stability and prevent infinite reasoning loops. This "hard cap" is a core architectural safeguard; if the agent cannot resolve a query within this budget, it is programmed to deliver a Structured Refusal rather than providing an ungrounded guess.

Stress-Test Verification

The system's resilience can be verified using highly complex queries that exceed the reasoning budget.

Example Edge Case:

python agent/agent_loop.py "List the top 12 highest-grossing movies of all time. For each one, tell me the name of the lead actor and search the web to find out if that actor won an Oscar for that specific role."

Expected Behavior: The agent will utilize its full 8-step budget and then trigger a STRICT REFUSAL: BUDGET EXCEEDED terminal box.

📝 11. AI Development Disclosure

This project was pair-programmed with Antigravity, an experimental AI coding agent. Together, we designed the modular tool architecture, implemented the ReAct loop from first principles, and developed the forensic audit runners to ensure reproducible excellence.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Movie Reasoning Agent: Advanced Agentic RAG Implementation (Option C)

🎥 1. Task E: Demo Video (ReAct Reasoning Trace)

📂 2. Project Structure

📊 3. Hybrid Knowledge Base

📚 4. Documentation Index

🏗️ 5. Technical Architecture

The Agent Loop

Tool Contracts

Retriever Implementation Details

Design Principles

System Architecture

🚀 6. Project Roadmap & Milestones

Core Implementation (Tasks A-D)

Advanced Bonuses (A-D)

Novelty Features

📊 7. Performance & Evaluation Summary

💻 8. Developer Guide (Setup & Usage)

One-Step Installation

Running the Agent

Global Configuration (.env)

⚠️ 9. Honest Assessment: Failure Modes

🛡️ 10. Operational Constraints & Resilience

Hard Step Cap (8 Tool Calls)

Stress-Test Verification

📝 11. AI Development Disclosure

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 51 Commits
agent		agent
data		data
dataset		dataset
evaluation		evaluation
tools		tools
utils		utils
.env.example		.env.example
.gitignore		.gitignore
DEMO Video.mp4		DEMO Video.mp4
DESIGN.md		DESIGN.md
Degradation_Audit_Report.md		Degradation_Audit_Report.md
EVALUATION.md		EVALUATION.md
LICENSE		LICENSE
README.md		README.md
degradation_runner.py		degradation_runner.py
demo.gif		demo.gif
preprocess.py		preprocess.py
requirements.txt		requirements.txt
setup_project.py		setup_project.py
task_A_test.py		task_A_test.py
task_B_test.py		task_B_test.py
task_C_test.py		task_C_test.py
task_D_20eval_test.py		task_D_20eval_test.py
tool_cost_analysis.md		tool_cost_analysis.md

Folders and files

Latest commit

History

Repository files navigation

Movie Reasoning Agent: Advanced Agentic RAG Implementation (Option C)

🎥 1. Task E: Demo Video (ReAct Reasoning Trace)

📂 2. Project Structure

📊 3. Hybrid Knowledge Base

📚 4. Documentation Index

🏗️ 5. Technical Architecture

The Agent Loop

Tool Contracts

Retriever Implementation Details

Design Principles

System Architecture

🚀 6. Project Roadmap & Milestones

Core Implementation (Tasks A-D)

Advanced Bonuses (A-D)

Novelty Features

📊 7. Performance & Evaluation Summary

💻 8. Developer Guide (Setup & Usage)

One-Step Installation

Running the Agent

Global Configuration (.env)

⚠️ 9. Honest Assessment: Failure Modes

🛡️ 10. Operational Constraints & Resilience

Hard Step Cap (8 Tool Calls)

Stress-Test Verification

📝 11. AI Development Disclosure

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages