PII Crawler scan results are stored in an SQLite database. SQLite is a single-file database, and most programming languages have built-in or library support for connecting to it. By default, this file (piicrawler.db) is created in the directory that you run PII Crawler from.
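As a minimal sketch of connecting to this file, the snippet below uses Python's standard `sqlite3` module. The helper name `open_scan_db` and the choice to open the file read-only by default are assumptions made for illustration; they are not part of PII Crawler.

```python
import sqlite3
from pathlib import Path


def open_scan_db(db_path="piicrawler.db", read_only=True):
    """Open the PII Crawler results database (hypothetical helper)."""
    path = Path(db_path).resolve()
    if read_only:
        # An SQLite URI with mode=ro rejects writes and will not
        # create the file if it does not exist yet.
        conn = sqlite3.connect(f"{path.as_uri()}?mode=ro", uri=True)
    else:
        conn = sqlite3.connect(str(path))
    # Rows can then be read by column name, e.g. row["path"].
    conn.row_factory = sqlite3.Row
    return conn
```

Calling `open_scan_db()` from the directory where PII Crawler was run should pick up the default file; pass an explicit path if the database lives elsewhere.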
PII Crawler intentionally exposes the application’s database so that users can work with scan results manually or programmatically.
You can think of this database as an API to PII Crawler.
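One way to treat the database as an API is to ask SQLite which columns a table exposes before querying it. The sketch below assumes an already open `sqlite3` connection; `describe_table` is a hypothetical helper name, while `PRAGMA table_info` is a standard SQLite statement.

```python
import sqlite3


def describe_table(conn, table="files"):
    """Return (column name, declared type) pairs for a table."""
    # PRAGMA arguments cannot be bound as parameters, so the
    # identifier is double-quoted and escaped by hand.
    quoted = '"' + table.replace('"', '""') + '"'
    rows = conn.execute(f"PRAGMA table_info({quoted})").fetchall()
    # Each row is (cid, name, type, notnull, dflt_value, pk).
    return [(row[1], row[2]) for row in rows]
```

Checking the column list this way lets a script fail early if a column it depends on is missing from the database it was pointed at.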
CREATE TABLE IF NOT EXISTS "files" (
path TEXT primary key,
scan_started_at INTEGER,
scan_finished_at INTEGER,
size INTEGER,
extension TEXT,
mime_type TEXT,
csz_clusters INTEGER,
unique_csz_clusters INTEGER,
unique_common_first_names INTEGER,
unique_common_last_names INTEGER,
potential_tax_ids_or_ssns INTEGER,
text_extracted BOOLEAN default 0 NOT NULL,
unique_common_email_domain_suffixes INTEGER,
unique_emails INTEGER,
unique_addresses INTEGER,
results TEXT,
skip BOOLEAN default 0 NOT NULL
);
Column | Description |
---|---|
path | absolute path to file |
scan_started_at | unix timestamp of when the scan started |
scan_finished_at | unix timestamp of when the scan finished |
size | size of file in bytes |
extension | file extension (ex: .pdf, .csv) |
mime_type | detected file mimetype (ex: application/json, image/jpeg) |
csz_clusters | city, state, zip combination matches |
unique_csz_clusters | unique city, state, zip combination matches |
unique_common_first_names | unique common first names |
unique_common_last_names | unique common last names |
potential_tax_ids_or_ssns | potential SSNs or tax IDs |
text_extracted | boolean; true if file parsing, text extraction, or OCR was used |
unique_common_email_domain_suffixes | count of common email suffixes found (supplemental to unique_emails) |
unique_emails | unique full email addresses |
unique_addresses | unique street addresses with a matching city, state, and zip |
results | not yet used |
skip | boolean; if true, the file will not be scanned |
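
The columns above can be combined into simple reports. The following sketch ranks files by `potential_tax_ids_or_ssns` and flips the `skip` flag for a given path. The function names are hypothetical, the timestamps are assumed to be in seconds, and whether PII Crawler honors a `skip` value written while a scan is running is not stated here, so it is safest to treat the update as something to do between scans.

```python
import sqlite3
from datetime import datetime, timezone


def files_with_potential_ssns(conn, minimum=1, limit=20):
    """List the files with the most potential SSN or tax ID matches."""
    rows = conn.execute(
        """
        SELECT path, potential_tax_ids_or_ssns, unique_emails, scan_finished_at
        FROM files
        WHERE potential_tax_ids_or_ssns >= ?
        ORDER BY potential_tax_ids_or_ssns DESC, path
        LIMIT ?
        """,
        (minimum, limit),
    ).fetchall()
    report = []
    for path, ssns, emails, finished in rows:
        # Assumption: scan_finished_at is a Unix timestamp in seconds.
        finished_at = (
            None
            if finished is None
            else datetime.fromtimestamp(finished, tz=timezone.utc)
        )
        report.append(
            {
                "path": path,
                "potential_tax_ids_or_ssns": ssns,
                "unique_emails": emails,
                "scan_finished_at": finished_at,
            }
        )
    return report


def set_skip(conn, path, skip=True):
    """Set or clear the skip flag for one file; returns rows changed."""
    # "with conn" commits on success and rolls back on error.
    with conn:
        cursor = conn.execute(
            "UPDATE files SET skip = ? WHERE path = ?", (int(skip), path)
        )
    return cursor.rowcount
```

`set_skip` needs a writable connection, and it returns 0 when the path is not present in the `files` table, which makes a mistyped path easy to detect.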