This project extracts a Reddit user's recent posts and comments and generates a detailed user persona with OpenAI's GPT model. The persona covers personality traits, interests, writing style, and political or social leanings, each backed by citations from the user's actual Reddit activity.
✅ Submission for the AI/LLM Engineer Internship at BeyondChats
- praw: Python Reddit API Wrapper
- openai: GPT-based LLM
- tqdm: Progress bar
- python-dotenv: For environment variable loading (optional)
- 🔍 Scrapes a Reddit user's posts & comments
- 🧠 Uses OpenAI's GPT to analyze behavior and generate persona
- 🧾 Citations for each insight using actual Reddit post links
- 💾 Saves each persona to a .txt file in the output/ folder
- ✅ Clean, readable, PEP-8 compliant code
git clone https://github.com/yourusername/reddit-persona-generator.git
cd reddit-persona-generator
pip install -r requirements.txt
Go to https://www.reddit.com/prefs/apps and create a script-type app. Then create a config.py file:
config.py
REDDIT_CLIENT_ID = 'your_client_id'
REDDIT_SECRET = 'your_client_secret'
REDDIT_USER_AGENT = 'user-persona-generator'
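As a rough illustration of how these three values might be consumed, the sketch below maps them onto the keyword arguments that `praw.Reddit` accepts (`client_id`, `client_secret`, `user_agent`). The helper names `reddit_kwargs` and `make_reddit_client` are hypothetical and not taken from this project's code, and a `SimpleNamespace` stands in for `import config` so the snippet runs on its own.

```python
from types import SimpleNamespace


def reddit_kwargs(cfg) -> dict:
    """Map config.py-style constants onto praw.Reddit keyword arguments."""
    return {
        "client_id": cfg.REDDIT_CLIENT_ID,
        "client_secret": cfg.REDDIT_SECRET,
        "user_agent": cfg.REDDIT_USER_AGENT,
    }


def make_reddit_client(cfg):
    """Build a read-only Reddit client (hypothetical helper, needs praw installed)."""
    import praw  # imported lazily so reddit_kwargs() works without praw

    return praw.Reddit(**reddit_kwargs(cfg))


# Stand-in for `import config`; in the project this would be the config module.
example_cfg = SimpleNamespace(
    REDDIT_CLIENT_ID="your_client_id",
    REDDIT_SECRET="your_client_secret",
    REDDIT_USER_AGENT="user-persona-generator",
)

print(sorted(reddit_kwargs(example_cfg)))
```

With real credentials in config.py, `make_reddit_client(config)` would return a read-only client, which is enough for reading public posts and comments.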
Set your OpenAI API key as an environment variable:
# On Windows (PowerShell)
$env:OPENAI_API_KEY="your_openai_key"
# On macOS/Linux
export OPENAI_API_KEY=your_openai_key
Run the script:
python reddit_persona_generator.py
Example:
Enter Reddit profile URL: https://www.reddit.com/user/kojied/
This will:
- Fetch up to 100 posts and 100 comments from that user
- Analyze the text using GPT
- Save a structured persona with citations to output/persona_kojied.txt
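The steps above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the project's actual implementation: the function names (`extract_username`, `fetch_activity`, `output_path`) and the dictionary layout are hypothetical, and the GPT analysis step is omitted. The PRAW calls used (`reddit.redditor(...)`, `.submissions.new(limit=...)`, `.comments.new(limit=...)`) are part of PRAW's public API.

```python
import re
from pathlib import Path

# Matches both /user/<name> and the short /u/<name> profile forms.
_PROFILE_RE = re.compile(r"reddit\.com/(?:user|u)/([A-Za-z0-9_-]+)")


def extract_username(profile_url: str) -> str:
    """Pull the username out of a Reddit profile URL."""
    match = _PROFILE_RE.search(profile_url)
    if match is None:
        raise ValueError(f"Not a Reddit profile URL: {profile_url!r}")
    return match.group(1)


def fetch_activity(reddit, username: str, limit: int = 100) -> list:
    """Collect up to `limit` posts and `limit` comments, newest first.

    `reddit` is expected to be a praw.Reddit instance.
    """
    redditor = reddit.redditor(username)
    items = []
    for post in redditor.submissions.new(limit=limit):
        items.append({
            "kind": "post",
            "text": f"{post.title}\n{post.selftext}".strip(),
            # Permalinks are site-relative, so prefix the host for citations.
            "url": f"https://www.reddit.com{post.permalink}",
        })
    for comment in redditor.comments.new(limit=limit):
        items.append({
            "kind": "comment",
            "text": comment.body,
            "url": f"https://www.reddit.com{comment.permalink}",
        })
    return items


def output_path(username: str, out_dir: str = "output") -> Path:
    """Build the persona file path, e.g. output/persona_<username>.txt."""
    return Path(out_dir) / f"persona_{username}.txt"
```

Keeping the URL alongside each text item is what would let the later analysis step cite a specific post or comment for every insight.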
Sample output files are included:
- output/persona_kojied.txt
- output/persona_Hungry-Move-6603.txt
Each contains:
- Age range
- Interests
- Personality
- Active subreddits
- Post timing
- Citations from posts/comments
This project was developed solely for the BeyondChats internship assignment and is not intended for commercial use. Your team is welcome to evaluate the code, but please do not reuse it unless I am selected.