A local Python script to scrape Gmail email metadata to aid in deleting emails from recent top senders.
Note: This does NOT reduce image and video backup. For that, see here.
This project is vibe-coded and tested on my own email.
- Scrapes email metadata: sender, title (subject), size (including attachments), read status
- Runs locally, but requires a Google Cloud project with Gmail API access and an OAuth consent screen that lists your email as a test user.
- OAuth 2.0 authentication with token caching
- Exports data to JSON for analysis
- Allows deletion of all emails from the top email senders.
# On Windows (PowerShell)
irm https://astral.sh/uv/install.ps1 | iex
# On macOS/Linux
curl -LsSf https://astral.sh/uv/install.sh | sh
From the gmail_maintenance directory:
uv sync
This will create a virtual environment and install all dependencies defined in pyproject.toml.
- Go to Google Cloud Console
- Create a new project (or select an existing one)
- Enable the Gmail API:
  - Navigate to "APIs & Services" > "Library"
  - Search for "Gmail API"
  - Click "Enable"
- Go to "APIs & Services" > "Credentials"
- Click "Create Credentials" > "OAuth 2.0 Client ID"
- If prompted, configure the OAuth consent screen:
  - Choose "External" (unless you have a Google Workspace)
  - Fill in required fields (App name, User support email, Developer contact)
  - Save and continue through the scopes (default is fine)
  - On the "Test users" step:
    - Click "+ ADD USERS"
    - Enter your Gmail address (the one you'll use to authenticate)
    - Click "ADD"
    - Click "SAVE AND CONTINUE"
  - Click "BACK TO DASHBOARD"
Note: If you've already created the OAuth consent screen, you can add test users later:
- Go to "APIs & Services" > "OAuth consent screen"
- Scroll down to the "Test users" section
- Click "+ ADD USERS"
- Enter your Gmail address and click "ADD"
- Create OAuth 2.0 Client ID:
  - Application type: Desktop app
  - Name: "Gmail Maintenance Tool" (or any name)
  - Click "Create"
- Download the JSON file
- Rename it to credentials.json and place it in the gmail_maintenance directory
- Run the script using uv:
uv run gmail_maintenance.py
Or activate the virtual environment and run directly:
# Activate the virtual environment (created by uv sync)
.venv\Scripts\activate # Windows
# or
source .venv/bin/activate # macOS/Linux
python gmail_maintenance.py
- A browser window will open asking you to sign in and authorize the app
- After authorization, a token.pickle file will be created (saved for future runs)
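The token caching described above can be sketched as follows. This is a minimal illustration, not the script's actual code: `load_or_create_token` and `run_auth_flow` are hypothetical names, and the sketch does not handle expired tokens.

```python
import pickle
from pathlib import Path
from typing import Any, Callable


def load_or_create_token(token_path: Path, run_auth_flow: Callable[[], Any]) -> Any:
    """Return the cached token if present; otherwise authenticate and cache it.

    `run_auth_flow` is a hypothetical callable standing in for whatever
    opens the browser and returns credentials.
    """
    if token_path.exists():
        # Later runs: reuse the saved token, no browser window needed.
        with token_path.open("rb") as fh:
            return pickle.load(fh)

    # First run: perform the interactive flow, then save the result.
    token = run_auth_flow()
    with token_path.open("wb") as fh:
        pickle.dump(token, fh)
    return token
```

In a real setup, `run_auth_flow` would likely wrap the `InstalledAppFlow` class from google-auth-oauthlib, loading credentials.json and calling `run_local_server`. Only unpickle a token.pickle file that you created yourself.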
The first time you run the script, it will download your 500 most recent emails from Gmail and summarize the top 20 senders by storage space used within those emails.
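As a rough illustration of that summary step, the ranking could be computed like this. The function name `top_senders_by_size` is hypothetical; the `sender` and `size_bytes` keys follow the fields of gmail_data.json, and the script's real logic may differ.

```python
from collections import defaultdict


def top_senders_by_size(emails, limit=20):
    """Rank senders by total bytes used across the given email entries.

    Returns a list of (rank, sender, email_count, total_bytes) tuples.
    """
    totals = defaultdict(int)
    counts = defaultdict(int)
    for email in emails:
        totals[email["sender"]] += email["size_bytes"]
        counts[email["sender"]] += 1

    # Largest total size first; ranks start at 1 to match the printed list.
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    return [
        (rank, sender, counts[sender], size)
        for rank, (sender, size) in enumerate(ranked[:limit], start=1)
    ]
```

Because only the 500 most recent emails are sampled, these totals are a starting point rather than a full accounting of each sender's storage use.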
On future runs, you will be asked if you want to repeat this step. If you choose n, it reloads the results of the previous run and goes immediately to the steps that follow.
If you type all after the summary, it will download header information for ALL emails from those top twenty senders to give you a better idea how much space you will save if you delete them.
If you type delete followed by the rank numbers from the list, the script will prepare to delete all emails from the senders you selected. Before deleting anything, it shows a summary of how many emails will be deleted and asks you to type DELETE and then YES, both in all caps.
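The command parsing and the two-step confirmation might look something like the sketch below. The names `parse_delete_command` and `is_confirmed` are hypothetical, and the accepted input format (space- or comma-separated rank numbers) is an assumption rather than the script's documented behavior.

```python
def parse_delete_command(command, max_rank):
    """Parse input such as 'delete 1 3 5' into a list of rank numbers.

    Returns None if the input is not a delete command; raises ValueError
    for missing, non-numeric, or out-of-range ranks.
    """
    parts = command.replace(",", " ").split()
    if not parts or parts[0].lower() != "delete":
        return None

    ranks = []
    for token in parts[1:]:
        if not token.isdigit():
            raise ValueError(f"not a rank number: {token!r}")
        rank = int(token)
        if not 1 <= rank <= max_rank:
            raise ValueError(f"rank out of range: {rank}")
        if rank not in ranks:  # ignore duplicates, keep first-seen order
            ranks.append(rank)

    if not ranks:
        raise ValueError("no rank numbers given")
    return ranks


def is_confirmed(first_answer, second_answer):
    """Both confirmations must match exactly, in all caps."""
    return first_answer == "DELETE" and second_answer == "YES"
```

Requiring two exact, case-sensitive answers makes it hard to trigger a bulk deletion with a stray keystroke.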
Emails are not permanently deleted; they are moved to Trash. To free up the storage space, go to gmail.com, open Trash, select everything in it (be sure to click the link that specifies the full count of items in Trash), and click the "DELETE FOREVER" link.
This should help to free up Gmail storage.
You can use Gmail search syntax in the query parameter:
- is:unread - Unread emails only
- is:read - Read emails only
- from:example@gmail.com - Emails from specific sender
- older_than:1y - Emails older than 1 year
- has:attachment - Emails with attachments
- larger:10M - Emails larger than 10MB
- Combine queries: is:unread older_than:6m
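Combined queries are just search terms joined by spaces, so they are easy to assemble in code. The helper below is a hypothetical convenience, not part of the script; how the script exposes its query parameter is not shown here.

```python
def build_query(*terms):
    """Join Gmail search terms into one query string, skipping blanks."""
    return " ".join(term.strip() for term in terms if term and term.strip())


# Example: unread emails older than six months that carry attachments.
query = build_query("is:unread", "older_than:6m", "has:attachment")
print(query)
```

Terms joined this way must all match, so each added term narrows the result set.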
The script generates:
- gmail_data.json - Complete email metadata in JSON format
- Console summary with statistics
Each email entry contains:
- id: Gmail message ID
- sender: Email sender address
- subject: Email subject line
- date: Email date
- size_bytes: Total size in bytes (including attachments)
- size_mb: Total size in megabytes
- is_read: Boolean indicating read status
- thread_id: Gmail thread ID
- snippet: Email preview snippet
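For your own analysis, the exported file can be loaded with the standard library. This sketch assumes the top level of gmail_data.json is a list of entries with the fields listed above; that layout is an assumption, so adjust if your file is structured differently. `load_emails` and `large_unread` are hypothetical helper names.

```python
import json
from pathlib import Path


def load_emails(path):
    """Load email entries, assuming the file holds a JSON list of objects."""
    with Path(path).open(encoding="utf-8") as fh:
        data = json.load(fh)
    if not isinstance(data, list):
        raise ValueError("expected a JSON list of email entries")
    return data


def large_unread(emails, min_mb=1.0):
    """Return unread entries at or above min_mb, largest first."""
    matches = [e for e in emails if not e["is_read"] and e["size_mb"] >= min_mb]
    return sorted(matches, key=lambda e: e["size_mb"], reverse=True)
```

For example, `large_unread(load_emails("gmail_data.json"), min_mb=5)` would list unread emails of 5 MB or more.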
- Deletion: The script reads email metadata and, only after you confirm, moves emails from the selected senders to Trash. It does not permanently delete anything.
- Rate limits: Gmail API has rate limits. The script processes emails in batches to avoid hitting limits.
- Token storage: The token.pickle file stores your authentication token. Keep it secure and don't share it.
- First run: The first run will open a browser for authentication. Subsequent runs use the saved token.
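One common way to stay under rate limits is to process IDs in fixed-size batches and retry with exponential backoff. The sketch below is generic and not taken from the script: `chunked` and `call_with_backoff` are hypothetical helpers, and `RuntimeError` stands in for whatever rate-limit error the API client raises.

```python
import time


def chunked(items, size):
    """Yield consecutive slices of `items` with at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def call_with_backoff(func, retries=5, base_delay=1.0, sleep=time.sleep):
    """Call func(); on failure wait base_delay * 2**attempt and retry.

    RuntimeError is a placeholder for a rate-limit error. The last
    failure is re-raised once `retries` attempts are used up.
    """
    for attempt in range(retries):
        try:
            return func()
        except RuntimeError:
            if attempt == retries - 1:
                raise
            sleep(base_delay * 2 ** attempt)
```

Passing `sleep` as a parameter keeps the helper easy to test without real delays.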
- Make sure you've downloaded the OAuth credentials from Google Cloud Console
- Rename the file to exactly credentials.json
- Place it in the same directory as gmail_maintenance.py
- Make sure you've added your email as a test user in the OAuth consent screen
- Check that Gmail API is enabled in your Google Cloud project
- Gmail API has daily quotas. If you hit limits, wait 24 hours or request a quota increase in Google Cloud Console
After scraping email data:
- Analyze gmail_data.json to identify emails for deletion
- Use the script's delete command to move emails from the top senders to Trash
- Use the data to create filters and rules for bulk operations