Scan your data stores for unencrypted personal data (PII)
- Last names (US)
- Email addresses
- IP addresses (IPv4)
- Street addresses (US)
- Phone numbers
- Credit card numbers
- Social Security numbers (US)
- Dates of birth
- Location data
- OAuth tokens
- MAC addresses
Uses data sampling and naming, and works with compressed files
💥 Zero runtime dependencies and minimal database load
Download the latest version:
You can also install it with Homebrew or Docker.
pdscan elasticsearch+http://user:pass@host:9200
For HTTPS, use elasticsearch+https://.
You can also specify indices.
pdscan elasticsearch+http://user:pass@host:9200/index1,index2
Wildcards are also supported.
pdscan "elasticsearch+http://user:pass@host:9200/index*"

pdscan file://path/to/file.txt
You can also specify a directory.
pdscan file://path/to/directory
For absolute paths, use file:///.
pdscan file:///absolute/path/to/file.txt
For paths relative to your home directory on Mac and Linux, use:
pdscan file://$HOME/file.txt

pdscan mariadb://user:pass@host:3306/dbname

pdscan mongodb://user:pass@host:27017/dbname

pdscan mysql://user:pass@host:3306/dbname

pdscan opensearch+http://user:pass@host:9200
For HTTPS, use opensearch+https://.
You can also specify indices.
pdscan opensearch+http://user:pass@host:9200/index1,index2
Wildcards are also supported.
pdscan "opensearch+http://user:pass@host:9200/index*"

pdscan postgres://user:pass@host:5432/dbname
Always make sure your connection is secure when connecting to a database over a network you don’t fully trust. Your best option is to connect over SSH or a VPN. Another option is to use sslmode=verify-full. If you don’t do this, your database credentials can be compromised.
If your connection doesn’t use SSL, append to the URI:
?sslmode=disable
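
As a minimal sketch of the two sslmode choices above, the shell function below appends the parameter to a connection URI. `with_sslmode` is a hypothetical helper written for this example and is not part of pdscan; it only assumes that an existing query string should be extended with `&` rather than `?`.

```shell
# Hypothetical helper (not part of pdscan): append an sslmode parameter
# to a connection URI, using "&" if the URI already has a query string.
with_sslmode() {
  uri=$1
  mode=$2
  case "$uri" in
    *\?*) printf '%s&sslmode=%s\n' "$uri" "$mode" ;;
    *)    printf '%s?sslmode=%s\n' "$uri" "$mode" ;;
  esac
}

# Verified TLS over a network you don't fully trust
with_sslmode "postgres://user:pass@host:5432/dbname" verify-full

# Connection that doesn't use SSL
with_sslmode "postgres://user:pass@localhost:5432/dbname" disable
```

The result can be passed straight to the scanner, for example `pdscan "$(with_sslmode "postgres://user:pass@host:5432/dbname" verify-full)"`.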
For best sampling, enable the tsm_system_rows extension (ships with Postgres 9.5+).
CREATE EXTENSION tsm_system_rows;

pdscan redis://user:pass@host:6379/db

pdscan s3://bucket/path/to/file.txt
Requires s3:GetObject permission
You can also specify a prefix by ending with a /.
pdscan s3://bucket/path/to/directory/
Requires s3:ListBucket and s3:GetObject permissions
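
The permission rule for S3 can be sketched as a small pre-flight check: a URI ending in `/` is treated as a prefix, anything else as a single object. `s3_required_permissions` is a hypothetical helper for illustration only; it restates the requirements listed here and does not call AWS or pdscan.

```shell
# Hypothetical helper (not part of pdscan): print the permissions this
# README lists for a given S3 URI.
s3_required_permissions() {
  case "$1" in
    s3://*/) echo "s3:ListBucket s3:GetObject" ;;  # prefix (ends with /)
    s3://*)  echo "s3:GetObject" ;;                # single object
    *)       echo "not an S3 URI: $1" >&2; return 1 ;;
  esac
}

s3_required_permissions "s3://bucket/path/to/file.txt"
s3_required_permissions "s3://bucket/path/to/directory/"
```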

pdscan sqlite://path/to/dbname.sqlite3
Not available with prebuilt binaries

pdscan "sqlserver://user:pass@host:1433?database=dbname"

Show the data found
pdscan --show-data
Show low confidence matches
pdscan --show-all
Change the sample size
pdscan --sample-size 50000
Specify the number of processes to use (defaults to 1)
pdscan --processes 4
Scan for only certain types of data
pdscan --only email,phone,location
Scan for all except certain types of data
pdscan --except ip,mac
Specify the minimum number of rows/documents/lines for a match (experimental)
pdscan --min-count 10
Specify a custom pattern (experimental)
pdscan --pattern "\d{16}"
Output newline delimited JSON (experimental)
pdscan --format ndjson

With Homebrew, you can use:
brew install ankane/brew/pdscan

Get the Docker image with:
docker pull ankane/pdscan
And run it with:
docker run -ti ankane/pdscan <connection-uri>
For data stores on the host machine, use host.docker.internal as the hostname
docker run -ti ankane/pdscan "postgres://user@host.docker.internal:5432/dbname?sslmode=disable"
On Linux, this requires
--add-host=host.docker.internal:host-gateway
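
To show where the Linux flag goes, here is a minimal sketch that prints the full docker command for a data store on the host. `pdscan_docker_cmd` is a hypothetical helper, not something shipped with pdscan, and `user` and `dbname` in the URI are placeholders. The flag is placed before the image name, since it is an option to `docker run`.

```shell
# Hypothetical helper (not part of pdscan): print the docker command for
# scanning a data store on the host. The second argument is the OS name
# and defaults to the output of `uname -s`.
pdscan_docker_cmd() {
  uri=$1
  os=${2:-$(uname -s)}
  extra=""
  if [ "$os" = "Linux" ]; then
    # Linux needs the host-gateway mapping for host.docker.internal
    extra="--add-host=host.docker.internal:host-gateway "
  fi
  printf 'docker run -ti %sankane/pdscan "%s"\n' "$extra" "$uri"
}

pdscan_docker_cmd "postgres://user@host.docker.internal:5432/dbname?sslmode=disable" Linux
# prints: docker run -ti --add-host=host.docker.internal:host-gateway ankane/pdscan "postgres://user@host.docker.internal:5432/dbname?sslmode=disable"
```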
For files on the host machine, use:
docker run -ti -v /path/to/files:/data ankane/pdscan file:///data

View the changelog
Everyone is encouraged to help improve this project. Here are a few ways you can help:
- Report bugs
- Fix bugs and submit pull requests
- Write, clarify, or fix documentation
- Suggest or add new features
To get started with development:
git clone https://github.com/ankane/pdscan.git
cd pdscan
make test