Gmail Attachments Extractor 📤

Gmail is a great, free email service with many useful features that extend the original vision of electronic mail from the 1960s. One annoyingly missing feature is the ability to delete an attachment from an email without deleting the email itself. This would be particularly useful when you are running out of space in your inbox and have emails whose content is important, but whose attachments have already been downloaded to disk and could be deleted to free up space.

Gmail Attachments Extractor is a simple tool that lets you extract the attachments from emails in your Gmail account.

How does it work?

Gmail Attachments Extractor processes a set of email messages. For each email it processes, it does three things:

  1. It downloads all attachments from the email being processed.
  2. It inserts into the inbox a copy of the email being processed, but without the attachments that it has just downloaded.
  3. It adds the Cleanup [pre] label to the email being processed, and the Cleanup [post] label to the copy that it has just created.

Gmail Attachments Extractor does not modify, delete, or move to Trash any emails, so the result of a run is fully and easily reversible. You run it, then inspect the downloaded files and the emails with the Cleanup [pre] and Cleanup [post] labels to verify that the result is as expected. If you are not happy with the result and want to return to the state before the run, delete all the emails with the Cleanup [post] label, and then delete the Cleanup [pre] and Cleanup [post] labels. If you are happy with the result, you can free up space in your inbox by deleting the original emails (with attachments) and keeping only their copies (without attachments). To do this, delete the emails with the Cleanup [pre] label.

⚠️ WARNING: When deleting all emails with a particular label, make sure you do it with "Conversation view" turned off! When "Conversation view" is turned on, clicking on a label shows all CONVERSATIONS that contain emails with that label. Those conversations may also contain emails without the label you clicked on, so deleting those conversations may lead to data loss. To turn off "Conversation view", click the gear ("Settings") button in the top-right corner of the Gmail page, select "Conversation view off" in the "General" tab, and then click the "Save Changes" button.

How to use it

Step 0

Download the newest release of Gmail Attachments Extractor: the GmailAttachmentsExtractor_v1.0.3.zip file (sha1: decf3d0a2c51be2e3d1b644382d383865fc68b1c). Unpack the archive so that you have the GmailAttachmentsExtractor.jar file (sha1: 19b15db5d3a748947b2e7ebb3445d9d83182a33f).

Step 1

Now you need to generate a credentials.json file with Gmail API OAuth2 credentials. You can generate the file however you like, but if you are unsure how to do this, I've made a visual guide that will take you through the process. Alternatively, you can follow the official Google guides: first the guide to creating a GCP project and enabling the API, and then the guide to creating credentials.

When you have your credentials.json file, put it in the same directory as the GmailAttachmentsExtractor.jar file, open a terminal, change to that directory, and run:

java -jar GmailAttachmentsExtractor.jar --only-check-auth

A browser window should open, where you log in to your Gmail account and allow the app to access it. After you've done that, you should see the following message in the console:

Gmail authorization: OK
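
If you want to script this check, one option is to look for the confirmation line in the program's output. This is only a sketch: it assumes the message is written to standard output with exactly the wording shown above, and auth_ok is a hypothetical helper name.

```shell
# auth_ok: reads program output on stdin and succeeds only if it contains
# the confirmation message. Hypothetical helper, not part of the tool.
auth_ok() {
  grep -F 'Gmail authorization: OK' >/dev/null
}

# Run the real check only when both files from this step are in place.
if [ -f GmailAttachmentsExtractor.jar ] && [ -f credentials.json ]; then
  if java -jar GmailAttachmentsExtractor.jar --only-check-auth | auth_ok; then
    echo "Authorization verified"
  else
    echo "Authorization not confirmed" >&2
  fi
fi
```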

Step 2

You are now ready to run Gmail Attachments Extractor. Run the app like this:

java -jar GmailAttachmentsExtractor.jar [OPTIONS] QUERY_STRING [OUTPUT_DIRECTORY]

There are two parameters that you can pass to the program:

  1. QUERY_STRING - the query that selects the email messages from which attachments will be extracted. It supports the same query format as the Gmail search box. For example, to extract attachments from email messages with the label big-attachments, use the query string label:big-attachments. To extract attachments from email messages that are larger than 30MB and were received before 2014/01/25, use the query string larger:30M before:2014/01/25. You can find more info about Gmail search operators here.

  2. OUTPUT_DIRECTORY - the path to the directory where attachments will be saved. It must be a path to a non-existing directory. Defaults to Gmail Extracted Attachments when not specified.
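
Because OUTPUT_DIRECTORY must not exist yet, a small wrapper can check for that before starting a run. The following is a sketch under assumptions: the directory name attachments-2014 is only an example, and output_dir_is_free is a hypothetical helper.

```shell
# output_dir_is_free DIR: succeeds only if nothing exists at DIR yet.
# Hypothetical helper, not part of the tool.
output_dir_is_free() {
  [ ! -e "$1" ]
}

QUERY='larger:30M before:2014/01/25'   # query string taken from the text above
OUT_DIR='attachments-2014'             # example name; any non-existing path works

if ! output_dir_is_free "$OUT_DIR"; then
  echo "Refusing to run: '$OUT_DIR' already exists" >&2
elif [ -f GmailAttachmentsExtractor.jar ]; then
  # Quote the query so the shell passes it to the program as one argument.
  java -jar GmailAttachmentsExtractor.jar "$QUERY" "$OUT_DIR"
fi
```

Quoting matters here: a query with several operators contains spaces, so without the quotes the shell would split it into separate arguments.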

Step 3

After you are done using Gmail Attachments Extractor, you should take a few actions to make sure your Gmail account remains secure. Go to this page and delete the OAuth2 credentials that you created for use with Gmail Attachments Extractor. If you don't access the Gmail API from any other app, you should also go to this page and click the "DISABLE API" button at the top of the page.

Finally, if you created the OAuth2 credentials in step 1 by following the visual guide, then you also created a Google Cloud Platform project that you no longer need. To delete this GCP project, go to this page, select the project that you created in step 1 from the menu in the top-left corner, and click the "SHUT DOWN" button at the top of the page.

Customize

You can customize some aspects of the program's execution by using options. For example, you can:

  • Specify --min-size 1M option to only extract attachments larger than 1MB
  • Specify --mime-type 'image|video|audio' option to only extract multimedia files
  • Specify --filename '.*\.pdf$' option to only extract attachments with extension .pdf
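
To get a feel for what a --filename pattern would select before running the tool, you can try the expression against a few sample names. This is a rough sketch only: it uses grep -E, whose POSIX extended syntax is not identical to the regular expression syntax of a Java program, and grep matches anywhere in the line, whereas how the tool anchors the pattern is not stated here. For simple patterns such as the .pdf example above, the results should agree. preview_filename_filter is a hypothetical helper.

```shell
# preview_filename_filter REGEX: reads candidate filenames on stdin (one per
# line) and prints those that match REGEX. Hypothetical helper for
# experimenting with patterns; it does not talk to Gmail or to the tool.
preview_filename_filter() {
  grep -E -- "$1"
}

printf '%s\n' report.pdf photo.jpg notes.pdf.txt scan.PDF |
  preview_filename_filter '.*\.pdf$'
# prints: report.pdf
```

Note that the match is case-sensitive in this sketch, so scan.PDF is not selected.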

You can see all the available options by running the program with the --help option:

$ java -jar GmailAttachmentsExtractor.jar --help
Gmail Attachments Extractor vX.X.X
Downloads attachments from Gmail emails, then creates copy of emails but without extracted attachments.
https://github.com/TeWu/GmailAttachmentsExtractor

Usage: java -jar GmailAttachmentsExtractor.jar [OPTIONS] QUERY_STRING [OUTPUT_DIRECTORY]

Parameters:
      QUERY_STRING          Only try to extract attachments from emails that match this query. Supports the same query
                              format as the Gmail search box. For example, "label:big-emails" or "from:someuser@example.
                              com has:attachment larger:5M after:2020/12/31 before:2021/01/25". More info about Gmail
                              search operators: https://support.google.com/mail/answer/7190
      [OUTPUT_DIRECTORY]    Save attachments to this directory. Must be a path to a non-existing directory.
                              Default: Gmail Extracted Attachments

Options:
  -l, --labels-prefix OUTPUT_LABEL_PREFIX
                            Create labels which name start with this prefix, and mark affected emails with them.
                              Default: Cleanup
  -C, --credentials-file CREDENTIALS_FILE
                            Path to file with Gmail API credentials (typically named credentials.json). How to generate
                              this file: https://github.com/TeWu/GmailAttachmentsExtractor#how-to-use-it
                              Default: credentials.json
      --tokens-dir TOKENS_DIR
                            Path to directory, where Gmail API authorization data get stored
                              Default: tokens
      --no-modify-gmail     Only download attachments. Don't modify Gmail (don't create labels, don't insert copies of
                              emails without extracted attachments, etc.).
      --only-check-auth     Only check if authorization information are correct, by trying to access the Gmail account,
                              and exit immediately.
  -h, --help                Show this help message and exit.
  -V, --version             Print version information and exit.

Attachment Filter Options:
      --filename FILENAME_REGEX
                            Extract only attachments with filenames matching this regular expression.
                              Default: .*
      --mime-type MIME_TYPE_PREFIX_REGEX
                            Extract only attachments with mime types matching regular expression
                              '^MIME_TYPE_PREFIX_REGEX.*'.
                              Default: ^.*
      --min-size MIN_SIZE   Don't extract attachment that are smaller than MIN_SIZE. Specify value in bytes or use
                              suffix k, M or G.
                              Default: 0
      --max-size MAX_SIZE   Don't extract attachment that are larger than MAX_SIZE. Specify value in bytes or use
                              suffix k, M or G.
                              Default: 0
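
The options listed in the help output can be combined. For instance, a cautious first run might use --no-modify-gmail so that attachments are downloaded while the Gmail account is left untouched. The wrapper below is a sketch: download_only is a hypothetical name, and the query and the directory name trial-run are only examples.

```shell
# download_only QUERY [OUTPUT_DIRECTORY]
# Hypothetical wrapper: runs the extractor with --no-modify-gmail, so that
# attachments are downloaded but no labels or email copies are created.
# Options go before QUERY_STRING, as in the usage line of the help output.
download_only() {
  java -jar GmailAttachmentsExtractor.jar --no-modify-gmail --min-size 1M "$@"
}

# Only attempt a real run when the JAR is in the current directory.
if [ -f GmailAttachmentsExtractor.jar ]; then
  download_only 'has:attachment larger:5M' 'trial-run'
fi
```

Once the downloaded files look right, the same query can be run again without --no-modify-gmail and with a new output directory.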