Scan Gmail For PII

If an attacker got into your email, what PII could they gather? Your email address and mailing address are probably in many emails, but how about your Social Security number?

Let’s find out.

Download your emails

  1. Log into Gmail
  2. Go to account settings
  3. Find Download or delete your data. It should take you to https://takeout.google.com/?hl=en&pli=1
  4. Deselect all services
  5. Select only Mail
  6. Go to next step
  7. Choose Send download link via email
  8. Export once as Zip with 2GB max file size.
  9. Click create export and wait a bit. Google says this can take a few hours or even a day.

Extract the emails and prepare to scan

Extract the Zip.

There should be one or more .mbox files inside. Mbox is a standard format that stores email messages concatenated in a single plain-text file. PII Crawler understands this format and scans each email and attachment within it separately.
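If you want to peek inside the export before scanning, Python's standard mailbox module can read the same format. The sketch below is not part of PII Crawler; the function name summarize_mbox is made up for illustration. It lists each message's subject and the filenames of its attachments.

```python
import mailbox


def summarize_mbox(path):
    """Return a list of (subject, [attachment filenames]) for an mbox file.

    Hypothetical helper for illustration only; it does not scan for PII.
    """
    summary = []
    # create=False avoids silently creating an empty file if the path is wrong.
    for message in mailbox.mbox(path, create=False):
        subject = str(message.get("Subject", "(no subject)"))
        # walk() visits every MIME part; parts with a filename are attachments.
        attachments = [
            part.get_filename()
            for part in message.walk()
            if part.get_filename() is not None
        ]
        summary.append((subject, attachments))
    return summary
```

Calling summarize_mbox on the extracted .mbox file gives a rough idea of how many messages and attachments the scan will cover.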

Create an exact-match.json file to help PII Crawler find your Social Security number. PII Crawler can find it without this file, but the file helps in certain circumstances.

{
    "me": ["whitcher", "078-05-1120"],
    "me2": ["whitcher", "078051120"]
}

Note: This is no longer a real SSN but there is an interesting story behind it.
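The example above lists the same number twice, once with dashes and once without. A small script can generate both spellings so neither is mistyped. This is a sketch that assumes the file layout is exactly what the example shows (a key mapped to a list of strings); the helper names are made up, and the meaning of the keys "me" and "me2" is not documented here.

```python
import json


def ssn_variants(ssn):
    """Return the dashed and undashed spellings of a 9-digit SSN."""
    digits = "".join(ch for ch in ssn if ch.isdigit())
    if len(digits) != 9:
        raise ValueError("expected a 9-digit SSN")
    dashed = f"{digits[:3]}-{digits[3:5]}-{digits[5:]}"
    return [dashed, digits]


def build_exact_match(last_name, ssn):
    """Build a dict mirroring the two-entry example (layout is an assumption)."""
    dashed, plain = ssn_variants(ssn)
    return {"me": [last_name, dashed], "me2": [last_name, plain]}


def write_exact_match(path, last_name, ssn):
    """Write the dict as JSON, e.g. to exact-match.json."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(build_exact_match(last_name, ssn), fh, indent=4)
```

For instance, write_exact_match("exact-match.json", "whitcher", "078051120") would reproduce the example file from either spelling of the number.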

Scan for PII

./piicrawler scanfile ~/Downloads/gmail-export/All\ mail\ Including\ Spam\ and\ Trash-002.mbox > results.txt

Results in results.txt:

... (many results) ...
  {
    "path": "Loan documents from ******* MORTGAGE CO.::LoanDocs.pdf",
    "mime_type": "application/pdf",
    "csz_clusters": 12,
    "unique_csz_clusters": 2,
    "unique_common_first_names": 2,
    "unique_common_last_names": 10,
    "potential_tax_ids_or_ssns": 2,
    "unique_addresses": 1,
    "matches": {
      "address": [
        "**** W 2nd Street"
      ],
      "csz": [
        "Kalispell MT 59901",
        "*** MT *****"
      ],
      "email": null,
      "ssn": [
        "***-**-****",
        "***-**-****"
      ]
    },
    "parent_path": "/home/me/Downloads/gmail-export/All mail Including Spam and Trash-002.mbox",
    "exact_matches": 1
  },
...

The path format for this result is <subject>::<file>. From these results, you can go to Gmail, search for those subject lines, and delete the emails. Be sure to also empty the Trash folder.
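With many results, pulling out the subject lines by hand gets tedious. The sketch below collects the subjects worth searching for in Gmail. It assumes results.txt as a whole is valid JSON containing a list of objects shaped like the one shown above, which the excerpt does not confirm; the function names are made up for illustration.

```python
import json


def load_records(path):
    """Load results, ASSUMING the file is a JSON list of result objects."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def subjects_to_review(records):
    """Return unique subjects of results with a potential SSN or exact match."""
    subjects = []
    for record in records:
        has_ssn = record.get("potential_tax_ids_or_ssns", 0) > 0
        has_exact = record.get("exact_matches", 0) > 0
        if not (has_ssn or has_exact):
            continue
        # The subject is everything before the first "::" in the path.
        subject = record.get("path", "").split("::", 1)[0]
        if subject and subject not in subjects:
            subjects.append(subject)
    return subjects
```

A subject line that itself contains "::" would be cut short by this split, so treat the output as search hints rather than exact matches.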
