Your first PII scan

Download PII Crawler

Start by downloading a copy of PII Crawler for your OS and CPU type.

I’m currently on a Linux machine so I’m going to first check my CPU type with lscpu which says:

Architecture:             x86_64
...

This means amd64.

I’ll download PII crawler with: wget https://www.piicrawler.com/downloads/piicrawler-linux-amd64.zip

Unzip it with: unzip piicrawler-linux-amd64.zip

Run PII Crawler for the first time

Let’s now run PII Crawler and print out help information: ./piicrawler-linux-amd64 -h. This will give us a short description of the tool and a list of commands we can run.

PII Crawler is a command line interface (CLI) that scans data looking for
Personal Identifiable Information (PII) and other sensitive data for the purpose of:

 * Securing, encrypting, or redacting data
 * Assisting in incident analysis
 * Data Leak Prevention (DLP)
 * Data Compliance (GDPR, CCPA, etc)

 PII Crawler documentation is available at https://piicrawler.com/docs

Usage:
  piicrawler [flags]
  piicrawler [command]

Available Commands:
  completion  Generate the autocompletion script for the specified shell
  enumerate   enumerate files at path
  help        Help about any command
  register    register registers your application and downloads a license file
  report      generate report based on scan findings
  scan        scan scans all files at path recursively
  scanfile    scanfile scans a single file and returns PII found
  serve       start local HTTP server to view reports
  textextract extract text of file
  update      Update to latest version of PII Crawler
  version     Print the version of PII Crawler

Flags:
  -h, --help   help for piicrawler

Use "piicrawler [command] --help" for more information about a command.

In order to scan for PII we will first need to register our copy of PII Crawler.

You can do that with: ./piicrawler-linux-amd64 register:

./piicrawler-linux-amd64 register
Please enter a valid email:
[email protected]
Successfully registered product. license.lic file downloaded to this directory
Keep this file in the same directory as piicrawler to avoid having to re-register in the future.

You may need to verify your email. Check your email and click the verify email link sent to you. Then come back and run ./piicrawler-linux-amd64 register again.

Scan for PII

You are now ready to scan for PII.

To scan for PII choose a starting directory and append it to: ./piicrawler-linux-amd64 scan ~. In this case I’m using my home directory or ~. This will scan all files in my home directory recursively. The home directory is a good place to start as it contains browser cache files and cookies that often contain lots of PII.

It will look like this with file paths being output as it scans:

./piicrawler-linux-amd64 scan ~
Enumerating files at path /home/mark
355517 files found, 355517 unscanned files, 0 files to skip.
Starting file scan...
scanning "/home/mark/.bash_history-00997.tmp"
scanning "/home/mark/.bash_history"
scanning "/home/mark/.bash_history-74572.tmp"
scanning "/home/mark/.cache/chromium/Default/Cache/Cache_Data/005fd83c1618e7be_0"
scanning "/home/mark/.cache/chromium/Default/Cache/Cache_Data/007e1ba7f1c7beab_0"
scanning "/home/mark/.cache/chromium/Default/Cache/Cache_Data/016aa2d1fdf62837_0"
scanning "/home/mark/.cache/chromium/Default/Cache/Cache_Data/01d02c1f1f6f2de2_0"
...

The scan may take several minutes to hours depending on how many files you have an how fast your hardware is. You can stop the scan at any time with Ctrl + C. You can resume the scan by simply running ./piicrawler-linux-amd64 scan ~ again. PII Crawler wont scan files again that have already been scanned. It resumes where it left off. If you want to rescan files, rename or delete piicrawler.db which is the results database.

Show results

Results are saved in piicrawler.db. The easiest way to view them after a scan is complete is with the built-in web server: ./piicrawler-linux-amd64 serve which open a localhost HTTP server on port 8080.

./piicrawler-linux-amd64 serve
Starting HTTP server on :8080
Ctrl + C to stop server

pii-report

💌 Get notified on new features and updates

Only sent when a new version is released. Nothing else.