
yoderj/gmail-maintainer


Gmail Maintenance Tool

A local Python script to scrape Gmail email metadata to aid in deleting emails from recent top senders.

Note: This does NOT reduce image and video backup. For that, see here.

This project is vibe-coded and tested on my own email.

Features

  • Scrapes email metadata: sender, title (subject), size (including attachments), read status
  • Runs locally, but requires a Google Cloud project with the Gmail API enabled and an OAuth consent screen in testing mode, with your email added as a test user.
  • OAuth 2.0 authentication with token caching
  • Exports data to JSON for analysis
  • Moves all emails from selected top senders to Trash.

Setup Instructions

1. Install uv (if not already installed)

# On Windows (PowerShell)
irm https://astral.sh/uv/install.ps1 | iex

# On macOS/Linux
curl -LsSf https://astral.sh/uv/install.sh | sh

2. Install Dependencies

From the gmail_maintenance directory:

uv sync

This will create a virtual environment and install all dependencies defined in pyproject.toml.

3. Create Google Cloud Project and Enable Gmail API

  1. Go to Google Cloud Console
  2. Create a new project (or select an existing one)
  3. Enable the Gmail API:
    • Navigate to "APIs & Services" > "Library"
    • Search for "Gmail API"
    • Click "Enable"

4. Create OAuth 2.0 Credentials

  1. Go to "APIs & Services" > "Credentials"

  2. Click "Create Credentials" > "OAuth 2.0 Client ID"

  3. If prompted, configure the OAuth consent screen:

    • Choose "External" (unless you have a Google Workspace)
    • Fill in required fields (App name, User support email, Developer contact)
    • Save and continue through the scopes (default is fine)
    • On the "Test users" step:
      • Click "+ ADD USERS"
      • Enter your Gmail address (the one you'll use to authenticate)
      • Click "ADD"
      • Click "SAVE AND CONTINUE"
    • Click "BACK TO DASHBOARD"

    Note: If you've already created the OAuth consent screen, you can add test users later:

    • Go to "APIs & Services" > "OAuth consent screen"
    • Scroll down to the "Test users" section
    • Click "+ ADD USERS"
    • Enter your Gmail address and click "ADD"
  4. Create OAuth 2.0 Client ID:

    • Application type: Desktop app
    • Name: "Gmail Maintenance Tool" (or any name)
    • Click "Create"
  5. Download the JSON file

  6. Rename it to credentials.json and place it in the gmail_maintenance directory

5. First Run

  1. Run the script using uv:

    uv run gmail_maintenance.py

    Or activate the virtual environment and run directly:

    # Activate the virtual environment (created by uv sync)
    .venv\Scripts\activate  # Windows
    # or
    source .venv/bin/activate  # macOS/Linux
    
    python gmail_maintenance.py
  2. A browser window will open asking you to sign in and authorize the app

  3. After authorization, a token.pickle file will be created (saved for future runs)
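Token caching like this usually follows one pattern: load the pickled credentials if the file exists, refresh them if they have expired, otherwise run the browser consent flow, then save the result. The sketch below is a minimal, hypothetical version of that pattern (`load_cached_credentials` is not a function from the script). The Google-specific steps are passed in as callables so it runs without Google's libraries. With `google-auth-oauthlib`, `run_flow` would typically wrap `InstalledAppFlow.from_client_secrets_file("credentials.json", SCOPES).run_local_server(port=0)`, and `refresh` would call `creds.refresh(Request())` from `google.auth.transport.requests`; `SCOPES` here is a placeholder.

```python
import os
import pickle
from typing import Any, Callable


def load_cached_credentials(
    token_path: str,
    run_flow: Callable[[], Any],
    refresh: Callable[[Any], None],
) -> Any:
    """Return credentials, reusing a pickled token when possible.

    Hypothetical helper. `run_flow` opens the browser consent flow and
    returns new credentials; `refresh` refreshes expired credentials in place.
    """
    creds = None
    # Reuse the token saved by a previous run, if one exists.
    if os.path.exists(token_path):
        with open(token_path, "rb") as f:
            creds = pickle.load(f)

    if creds is None or not creds.valid:
        if creds is not None and creds.expired and creds.refresh_token:
            # Expired but refreshable: no browser window needed.
            refresh(creds)
        else:
            # No usable token: ask the user to sign in again.
            creds = run_flow()
        # Save the new or refreshed credentials for the next run.
        with open(token_path, "wb") as f:
            pickle.dump(creds, f)
    return creds
```

Deleting token.pickle forces a fresh sign-in on the next run, which is also how a change in requested scopes would be picked up with this pattern.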

Usage

Basic Usage

The first time you run the script, it downloads metadata for your 500 most recent emails from Gmail and summarizes the top 20 senders among those emails, ranked by storage space used.
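As a rough illustration of how such a ranking can be computed, the sketch below totals `size_bytes` per `sender` and keeps the largest `n`. It assumes records shaped like those listed under Data Structure and groups by the exact sender string; `top_senders_by_size` is a hypothetical name, and the script may normalize or group senders differently.

```python
from collections import defaultdict
from typing import Iterable


def top_senders_by_size(emails: Iterable[dict], n: int = 20) -> list[tuple[str, int, int]]:
    """Rank senders by total bytes used (hypothetical helper).

    Returns (sender, total_bytes, email_count) tuples, largest total first.
    """
    totals: dict[str, int] = defaultdict(int)
    counts: dict[str, int] = defaultdict(int)
    for email in emails:
        totals[email["sender"]] += email["size_bytes"]
        counts[email["sender"]] += 1
    # Sort sender names by their total size, descending, and keep the top n.
    ranked = sorted(totals, key=lambda s: totals[s], reverse=True)
    return [(s, totals[s], counts[s]) for s in ranked[:n]]
```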

On future runs, you will be asked if you want to repeat this step. If you choose n, it reloads the results of the previous run and goes immediately to the steps that follow.

If you type all after the summary, it downloads header information for ALL emails from those top 20 senders, giving you a better idea of how much space you would save by deleting them.

If you type delete followed by rank numbers from the list, it prepares to delete all emails from those senders. Before deleting anything, it shows a summary of how many emails will be deleted and asks you to type DELETE and then YES, in all caps.
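The two commands and the two-step confirmation could be handled roughly as follows. This is a hypothetical sketch, not the script's code: `parse_command` and `confirm_deletion` are illustrative names, and accepting commas between rank numbers is an assumption. Passing `ask` as a parameter lets the prompts be tested without a terminal.

```python
from typing import Callable


def parse_command(line: str, max_rank: int) -> tuple[str, list[int]]:
    """Parse 'all' or 'delete 1 3 5' into (action, ranks).

    Hypothetical helper: ranks outside 1..max_rank are ignored, duplicates
    are dropped, and commas are accepted as separators (an assumption).
    """
    parts = line.replace(",", " ").split()
    if not parts:
        return ("none", [])
    action = parts[0].lower()
    if action == "all":
        return ("all", [])
    if action == "delete":
        ranks = {int(p) for p in parts[1:] if p.isdecimal() and 1 <= int(p) <= max_rank}
        return ("delete", sorted(ranks))
    return ("unknown", [])


def confirm_deletion(summary: str, ask: Callable[[str], str] = input) -> bool:
    """Show the summary, then require DELETE and YES exactly, in capitals."""
    print(summary)
    if ask("Type DELETE to continue: ").strip() != "DELETE":
        return False
    return ask("Type YES to confirm: ").strip() == "YES"
```

Keeping the confirmation case-sensitive means a casual "yes" cancels rather than proceeds.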

Emails are not permanently deleted; they are moved to Trash. To free up the storage space, go to gmail.com, open Trash, select everything in it (be sure to click the link that selects all items, showing the full count in Trash, not just the visible page), and click "DELETE FOREVER".

This should help free up Gmail storage.
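For reference, the Gmail API distinguishes `users.messages.trash` (move to Trash, reversible) from `users.messages.delete` (permanent). The sketch below assumes the former, which matches the behavior described above, although the README does not say which call the script makes. `trash_messages` is a hypothetical helper, and `service` is assumed to be a client created with `googleapiclient.discovery.build("gmail", "v1", credentials=creds)`. Trashing requires a modify-capable scope such as `https://www.googleapis.com/auth/gmail.modify`; `gmail.readonly` is not enough.

```python
from typing import Any, Iterable


def trash_messages(service: Any, message_ids: Iterable[str], user_id: str = "me") -> int:
    """Move messages to Trash one at a time and return how many were moved.

    Hypothetical helper; `service` is assumed to be a Gmail API v1 client.
    """
    count = 0
    for msg_id in message_ids:
        # users.messages.trash is reversible; users.messages.delete is not.
        service.users().messages().trash(userId=user_id, id=msg_id).execute()
        count += 1
    return count
```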

Gmail Search Queries

You can use Gmail search syntax in the query parameter:

  • is:unread - Unread emails only
  • is:read - Read emails only
  • from:example@gmail.com - Emails from specific sender
  • older_than:1y - Emails older than 1 year
  • has:attachment - Emails with attachments
  • larger:10M - Emails larger than 10MB
  • Combine queries: is:unread older_than:6m
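These operators are the same ones accepted by the `q` parameter of the Gmail API's `users.messages.list` method. The hypothetical helper below sketches how several senders might be combined into one query: the `from:` terms are OR-ed inside parentheses, and everything else is joined with spaces, which Gmail treats as AND.

```python
from typing import Optional, Sequence


def build_query(senders: Optional[Sequence[str]] = None, *terms: str) -> str:
    """Combine Gmail search operators into one query string.

    Hypothetical helper: several senders are OR-ed inside parentheses;
    the remaining terms are joined with spaces, which Gmail treats as AND.
    """
    parts: list[str] = []
    if senders:
        from_terms = " OR ".join(f"from:{s}" for s in senders)
        parts.append(f"({from_terms})" if len(senders) > 1 else from_terms)
    parts.extend(terms)
    return " ".join(parts)
```

The result could then be passed as, for example, `service.users().messages().list(userId="me", q=query, maxResults=500)`; that call shape is an illustration, not taken from the script.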

Output

The script generates:

  • gmail_data.json - Complete email metadata in JSON format
  • Console summary with statistics

Data Structure

Each email entry contains:

  • id: Gmail message ID
  • sender: Email sender address
  • subject: Email subject line
  • date: Email date
  • size_bytes: Total size in bytes (including attachments)
  • size_mb: Total size in megabytes
  • is_read: Boolean indicating read status
  • thread_id: Gmail thread ID
  • snippet: Email preview snippet
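The Gmail API returns most of these fields directly when a message is fetched with `users.messages.get(..., format="metadata")`: `id`, `threadId`, `snippet`, `sizeEstimate`, `labelIds`, and the headers under `payload.headers`. The sketch below shows one plausible mapping to the entry fields; it is an assumption, not the script's code. In particular, extracting the bare address with `email.utils.parseaddr` and using 1 MB = 1,048,576 bytes are guesses, and `to_entry` is a hypothetical name.

```python
from email.utils import parseaddr
from typing import Any


def to_entry(message: dict[str, Any]) -> dict[str, Any]:
    """Map a Gmail API message resource to the entry fields listed above.

    Hypothetical helper; assumes the message was fetched with
    format="metadata" so that From, Subject and Date are in payload.headers.
    """
    # Header names are matched case-insensitively.
    headers = {
        h["name"].lower(): h["value"]
        for h in message.get("payload", {}).get("headers", [])
    }
    size_bytes = message.get("sizeEstimate", 0)
    return {
        "id": message["id"],
        # Assumption: keep only the address part of "Name <addr>".
        "sender": parseaddr(headers.get("from", ""))[1],
        "subject": headers.get("subject", ""),
        "date": headers.get("date", ""),
        "size_bytes": size_bytes,
        # Assumption: 1 MB = 1024 * 1024 bytes.
        "size_mb": round(size_bytes / (1024 * 1024), 2),
        # Gmail marks unread messages with the UNREAD system label.
        "is_read": "UNREAD" not in message.get("labelIds", []),
        "thread_id": message.get("threadId", ""),
        "snippet": message.get("snippet", ""),
    }
```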

Notes

  • Limited modification: Apart from the delete command, which moves emails to Trash only after you type DELETE and YES, the script only reads email data.
  • Rate limits: Gmail API has rate limits. The script processes emails in batches to avoid hitting limits.
  • Token storage: The token.pickle file stores your authentication token. Keep it secure and don't share it.
  • First run: The first run will open a browser for authentication. Subsequent runs use the saved token.
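The batching mentioned under Rate limits is commonly paired with exponential backoff on rate-limit errors. The sketch below is a generic version, not the script's code: `chunks` and `with_backoff` are hypothetical helpers, and `sleep` is injectable for testing. With `googleapiclient`, rate-limit failures surface as `HttpError`, so `is_retryable` might check `exc.resp.status in (403, 429)`; treat that status list as an assumption to verify against Google's Gmail usage-limits documentation.

```python
import random
import time
from typing import Callable, Iterator, Sequence, TypeVar

T = TypeVar("T")


def chunks(items: Sequence[T], size: int) -> Iterator[Sequence[T]]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def with_backoff(
    call: Callable[[], T],
    is_retryable: Callable[[Exception], bool],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Retry `call` with exponential backoff plus jitter on retryable errors."""
    for attempt in range(max_attempts):
        try:
            return call()
        except Exception as exc:
            # Give up on the last attempt or on errors that are not rate limits.
            if attempt == max_attempts - 1 or not is_retryable(exc):
                raise
            # Wait 1s, 2s, 4s, ... (times base_delay) plus up to 1s of jitter.
            sleep(base_delay * (2 ** attempt) + random.random())
    raise RuntimeError("max_attempts must be at least 1")
```

A batch size around 50 is a conservative choice, since Google advises against very large Gmail batch requests; the script's own batch size may differ.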

Troubleshooting

"credentials.json not found"

  • Make sure you've downloaded the OAuth credentials from Google Cloud Console
  • Rename the file to exactly credentials.json
  • Place it in the same directory as gmail_maintenance.py
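If the error persists, a quick check of the file itself can help. The sketch below assumes a Desktop app client (as created in step 4), whose JSON download has a top-level "installed" key containing `client_id` and `client_secret`; a Web application client uses "web" instead. `check_credentials_file` is a hypothetical helper, not part of the script.

```python
import json
import os


def check_credentials_file(path: str = "credentials.json") -> str:
    """Return a description of the problem, or "" if the file looks usable.

    Hypothetical helper; assumes a Desktop app OAuth client, whose JSON
    has a top-level "installed" section.
    """
    if not os.path.isfile(path):
        return f"{path} not found in {os.path.abspath(os.path.dirname(path) or '.')}"
    try:
        with open(path, encoding="utf-8") as f:
            data = json.load(f)
    except json.JSONDecodeError as exc:
        return f"{path} is not valid JSON: {exc}"
    if "installed" not in data:
        if "web" in data:
            return "This is a Web application client; create a Desktop app client instead."
        return 'Missing the "installed" section expected for a Desktop app client.'
    missing = [k for k in ("client_id", "client_secret") if k not in data["installed"]]
    if missing:
        return "Missing fields: " + ", ".join(missing)
    return ""
```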

"Access blocked" or OAuth errors

  • Make sure you've added your email as a test user in the OAuth consent screen
  • Check that Gmail API is enabled in your Google Cloud project

Rate limit errors

  • Gmail API has daily quotas. If you hit limits, wait 24 hours or request a quota increase in Google Cloud Console

Next Steps

After scraping email data:

  1. Analyze gmail_data.json to identify emails for deletion
  2. Use the delete command described under Usage to move emails from selected senders to Trash
  3. Use the data to create filters and rules for bulk operations
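As one example of step 3, senders whose mail you almost never open are natural candidates for a filter. The sketch below assumes `gmail_data.json` holds a top-level JSON list of entries with the `sender` and `is_read` fields described above; that file layout is an assumption, and `filter_candidates` is a hypothetical helper.

```python
from collections import defaultdict
from typing import Iterable


def filter_candidates(emails: Iterable[dict], min_emails: int = 5) -> list[tuple[str, int, float]]:
    """Rank senders by the fraction of their emails left unread.

    Hypothetical helper. Returns (sender, email_count, unread_fraction)
    for senders with at least `min_emails` messages, highest fraction first.
    """
    total: dict[str, int] = defaultdict(int)
    unread: dict[str, int] = defaultdict(int)
    for e in emails:
        total[e["sender"]] += 1
        if not e["is_read"]:
            unread[e["sender"]] += 1
    rows = [(s, n, unread[s] / n) for s, n in total.items() if n >= min_emails]
    # Ties on the fraction are broken by volume, larger first.
    return sorted(rows, key=lambda r: (r[2], r[1]), reverse=True)
```

For example: `with open("gmail_data.json", encoding="utf-8") as f: rows = filter_candidates(json.load(f))`, with `json` imported.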
