Start by downloading a copy of PII Crawler for your OS and CPU type.
I’m currently on a Linux machine, so I’ll first check my CPU type with lscpu, which reports:
Architecture: x86_64
...
x86_64 and amd64 are two names for the same architecture, so I need the amd64 build.
I’ll download PII Crawler with: wget https://www.piicrawler.com/downloads/piicrawler-linux-amd64.zip
Unzip it with: unzip piicrawler-linux-amd64.zip
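The download steps above can be sketched as a small script that picks the build from the machine type. This is a minimal sketch, not part of PII Crawler: the function names arch_suffix and download_url are made up for illustration, and the arm64 file name is an assumption, since only the linux-amd64 URL appears in this article.

```shell
#!/bin/sh
# Map a machine name (as printed by `uname -m`) to the suffix used in the
# download file name. Hypothetical helper, not part of PII Crawler.
arch_suffix() {
  case "$1" in
    x86_64|amd64) echo "amd64" ;;
    # Assumption: an ARM build, if one is published, is named "arm64".
    aarch64|arm64) echo "arm64" ;;
    *) echo "unsupported machine type: $1" >&2; return 1 ;;
  esac
}

# Build the Linux download URL for a given suffix.
# Only the amd64 URL is confirmed by the article.
download_url() {
  echo "https://www.piicrawler.com/downloads/piicrawler-linux-$1.zip"
}

# Usage (left commented out so that sourcing this file does not hit the network):
# suffix=$(arch_suffix "$(uname -m)") || exit 1
# wget "$(download_url "$suffix")" && unzip "piicrawler-linux-$suffix.zip"
```

Passing the machine name in as an argument, rather than calling uname inside the function, keeps the mapping easy to check by hand.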
Let’s now run PII Crawler and print out help information, which gives us a short description of the tool and a list of commands we can run: ./piicrawler-linux-amd64 -h
PII Crawler is a command line interface (CLI) that scans data looking for
Personal Identifiable Information (PII) and other sensitive data for the purpose of:
* Securing, encrypting, or redacting data
* Assisting in incident analysis
* Data Leak Prevention (DLP)
* Data Compliance (GDPR, CCPA, etc)
PII Crawler documentation is available at https://piicrawler.com/docs
Usage:
piicrawler [flags]
piicrawler [command]
Available Commands:
completion Generate the autocompletion script for the specified shell
enumerate enumerate files at path
help Help about any command
register register registers your application and downloads a license file
report generate report based on scan findings
scan scan scans all files at path recursively
scanfile scanfile scans a single file and returns PII found
serve start local HTTP server to view reports
textextract extract text of file
update Update to latest version of PII Crawler
version Print the version of PII Crawler
Flags:
-h, --help help for piicrawler
Use "piicrawler [command] --help" for more information about a command.
In order to scan for PII we will first need to register our copy of PII Crawler.
You can do that with the register command:
./piicrawler-linux-amd64 register
Please enter a valid email:
[email protected]
Successfully registered product. license.lic file downloaded to this directory
Keep this file in the same directory as piicrawler to avoid having to re-register in the future.
You may need to verify your email. Check your inbox and click the verification link sent to you, then come back and run ./piicrawler-linux-amd64 register again.
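Since the license file saves you from re-registering, a script can check for it before calling register. The sketch below assumes license.lic sits in the directory you pass in (the current directory by default). needs_registration is a hypothetical helper name, and the check only tests that the file exists, not that the license is valid or that the email was verified.

```shell
#!/bin/sh
# Succeed (exit status 0) when no license.lic is present in the given
# directory, i.e. when registration still looks necessary.
# Hypothetical helper; it does not validate the license contents.
needs_registration() {
  [ ! -f "${1:-.}/license.lic" ]
}

# Usage (commented out because register prompts for an email address):
# if needs_registration .; then
#   ./piicrawler-linux-amd64 register
# fi
```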
You are now ready to scan for PII.
To scan for PII, choose a starting directory and append it to the scan command. In this case I’m using my home directory, ~, so the command is: ./piicrawler-linux-amd64 scan ~
This will scan all files in my home directory recursively. The home directory is a good place to start because it contains browser cache files and cookies, which often hold a lot of PII.
It will look like this with file paths being output as it scans:
./piicrawler-linux-amd64 scan ~
Enumerating files at path /home/mark
355517 files found, 355517 unscanned files, 0 files to skip.
Starting file scan...
scanning "/home/mark/.bash_history-00997.tmp"
scanning "/home/mark/.bash_history"
scanning "/home/mark/.bash_history-74572.tmp"
scanning "/home/mark/.cache/chromium/Default/Cache/Cache_Data/005fd83c1618e7be_0"
scanning "/home/mark/.cache/chromium/Default/Cache/Cache_Data/007e1ba7f1c7beab_0"
scanning "/home/mark/.cache/chromium/Default/Cache/Cache_Data/016aa2d1fdf62837_0"
scanning "/home/mark/.cache/chromium/Default/Cache/Cache_Data/01d02c1f1f6f2de2_0"
...
The scan may take anywhere from several minutes to hours, depending on how many files you have and how fast your hardware is. You can stop the scan at any time with Ctrl + C and resume it by simply running ./piicrawler-linux-amd64 scan ~ again. PII Crawler won’t rescan files that have already been scanned; it resumes where it left off. If you want to rescan files, rename or delete piicrawler.db, which is the results database.
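If you would rather keep old results than delete them, renaming the database with a timestamp is one option. This is a minimal sketch that assumes piicrawler.db lives in the directory you pass in (the current directory by default). archive_results_db and the .bak naming scheme are made up for illustration and are not PII Crawler features.

```shell
#!/bin/sh
# Rename piicrawler.db in the given directory so the next scan starts fresh.
# Prints the path of the backup on success. Hypothetical helper.
archive_results_db() {
  dir="${1:-.}"
  db="$dir/piicrawler.db"
  if [ ! -f "$db" ]; then
    echo "no results database at $db" >&2
    return 1
  fi
  # Timestamp down to the second, e.g. piicrawler.db.20240131-142501.bak
  backup="$db.$(date +%Y%m%d-%H%M%S).bak"
  mv -- "$db" "$backup" && echo "$backup"
}

# Usage (commented out; the scan itself can run for a long time):
# archive_results_db . && ./piicrawler-linux-amd64 scan ~
```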
Results are saved in piicrawler.db. The easiest way to view them after a scan is complete is with the built-in web server. Running ./piicrawler-linux-amd64 serve opens a localhost HTTP server on port 8080:
./piicrawler-linux-amd64 serve
Starting HTTP server on :8080
Ctrl + C to stop server