Results Storage

PII Crawler stores scan results in an SQLite database. SQLite is a single-file database, and most programming languages either ship with support for it or have a widely used library for connecting to it. By default, this file (piicrawler.db) is created in the directory you run PII Crawler from.
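As a minimal sketch of connecting from code, the Python example below opens the results file read-only using the standard-library sqlite3 module, so an inspection script cannot modify scan data by accident. The helper names open_results and count_scanned_files are hypothetical and not part of PII Crawler. The sketch assumes the default piicrawler.db filename and that a file counts as scanned once scan_finished_at is set.

```python
import sqlite3
from pathlib import Path


def open_results(db_path="piicrawler.db"):
    """Open the results database read-only (hypothetical helper)."""
    # Build a file: URI so SQLite's mode=ro option can be used.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    return sqlite3.connect(uri, uri=True)


def count_scanned_files(conn):
    """Count rows whose scan has finished (assumes a NULL finish time means unfinished)."""
    (n,) = conn.execute(
        "SELECT COUNT(*) FROM files WHERE scan_finished_at IS NOT NULL"
    ).fetchone()
    return n
```

Calling open_results() with no argument looks for piicrawler.db in the current working directory. Because the connection is read-only, a missing file raises sqlite3.OperationalError instead of creating an empty database.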

PII Crawler intentionally exposes the application’s database to the user so the user can manually or programmatically:

View the data that is being collected
Run custom queries on the data
Create alerts from findings
Customize which files are scanned
Customize how files are scanned
Customize the behavior of PII Crawler

You can think of this database as an API to PII Crawler.
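To illustrate the "database as an API" idea, here is a hedged Python sketch that customizes which files are scanned by setting the skip column of the files table (described under Schema) for a file that already has a row. The function name set_skip is hypothetical. Two things are assumptions: that it is safe to write to the database while no scan is running, and that only existing rows are updated. This page does not say whether rows may be pre-inserted for files that have never been scanned.

```python
import sqlite3


def set_skip(db_path, file_path, skip=True):
    """Set the skip flag for one recorded file; return True if a row changed.

    Hypothetical helper: it only updates a row that already exists in files.
    """
    conn = sqlite3.connect(db_path)
    try:
        with conn:  # commits on success, rolls back on error
            cur = conn.execute(
                "UPDATE files SET skip = ? WHERE path = ?",
                (1 if skip else 0, file_path),
            )
        return cur.rowcount == 1
    finally:
        conn.close()
```

For example, set_skip("piicrawler.db", "/data/archive/old.csv") would mark that (made-up) path as skipped, and passing skip=False clears the flag again.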

Schema

CREATE TABLE IF NOT EXISTS "files" (
    path TEXT PRIMARY KEY,
    scan_started_at INTEGER,
    scan_finished_at INTEGER,
    size INTEGER,
    extension TEXT,
    mime_type TEXT,
    csz_clusters INTEGER,
    unique_csz_clusters INTEGER,
    unique_common_first_names INTEGER,
    unique_common_last_names INTEGER,
    potential_tax_ids_or_ssns INTEGER,
    text_extracted BOOLEAN DEFAULT 0 NOT NULL,
    unique_common_email_domain_suffixes INTEGER,
    unique_emails INTEGER,
    unique_addresses INTEGER,
    results TEXT,
    skip BOOLEAN DEFAULT 0 NOT NULL
);
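Since scripts depend on these column names, it can help to check the schema of the database you actually have before querying it. The Python sketch below does this with SQLite's PRAGMA table_info. The function name files_columns is hypothetical.

```python
import sqlite3


def files_columns(conn):
    """Return {column_name: declared_type} for the files table.

    Returns an empty dict if the table does not exist.
    """
    rows = conn.execute("PRAGMA table_info(files)").fetchall()
    # Each row is (cid, name, type, notnull, dflt_value, pk).
    return {name: col_type for _cid, name, col_type, _notnull, _default, _pk in rows}
```

A script could compare the returned keys against the columns it relies on (for example "skip" and "potential_tax_ids_or_ssns") and stop early if any are missing.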

Column	Description
path	absolute path to the file
scan_started_at	Unix timestamp of when the scan started
scan_finished_at	Unix timestamp of when the scan finished
size	size of the file in bytes
extension	file extension (ex: .pdf, .csv)
mime_type	detected MIME type of the file (ex: application/json, image/jpeg)
csz_clusters	count of city, state, zip combination matches
unique_csz_clusters	count of unique city, state, zip combination matches
unique_common_first_names	count of unique common first names
unique_common_last_names	count of unique common last names
potential_tax_ids_or_ssns	count of potential SSNs or tax IDs
text_extracted	boolean; true if file parsing, text extraction, or OCR was used
unique_common_email_domain_suffixes	count of common email suffixes found (supplemental to unique_emails)
unique_emails	count of unique full email addresses
unique_addresses	count of unique street addresses with a matching city, state, and zip
results	not yet used
skip	boolean; if true, the file will not be scanned
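The columns above can drive custom queries and alerts. As a hedged Python sketch, the function below lists files whose potential_tax_ids_or_ssns count meets a threshold and derives the scan duration from the two timestamps. The name files_with_potential_ssns and the returned dictionary keys are hypothetical. The sketch also assumes the Unix timestamps are in seconds, which this page does not state.

```python
import sqlite3
from datetime import datetime, timezone


def files_with_potential_ssns(conn, min_count=1):
    """Return findings for files with at least min_count potential SSNs/tax IDs."""
    rows = conn.execute(
        """
        SELECT path, potential_tax_ids_or_ssns, scan_started_at, scan_finished_at
        FROM files
        WHERE potential_tax_ids_or_ssns >= ?
        ORDER BY potential_tax_ids_or_ssns DESC, path
        """,
        (min_count,),
    ).fetchall()
    findings = []
    for path, count, started, finished in rows:
        have_times = started is not None and finished is not None
        findings.append({
            "path": path,
            "count": count,
            # Assumption: timestamps are seconds since the Unix epoch.
            "finished_at": (
                datetime.fromtimestamp(finished, tz=timezone.utc)
                if finished is not None else None
            ),
            "scan_seconds": finished - started if have_times else None,
        })
    return findings
```

Rows where the count is NULL are excluded by the comparison, so files with no recorded count do not trigger an alert. The returned list could be passed to whatever notification mechanism you already use.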
