18 regex patterns and 22 active probes: how we find leaked secrets
In January 2025, a researcher found valid AWS keys in the client-side JavaScript bundle of a publicly traded company's web application. The keys had been there for fourteen months. They granted full S3 access to a bucket containing 2.3 million customer records. The company had passed two penetration tests during that period.
This is the problem with secrets: they don't look like vulnerabilities. There's no broken authentication flow, no injection point, no malformed input. There's just a string that shouldn't be there, sitting in plain text, waiting for someone to notice. BrokenApp's exposure scanner is designed to notice.
Why secrets leak
Secrets end up in production through three primary vectors, and understanding them matters because each requires a different detection strategy.
Client bundle inclusion
Environment variables prefixed with NEXT_PUBLIC_, REACT_APP_, or VITE_ get compiled into JavaScript bundles. Developers add a secret to .env.local for testing, prefix it for client access, and forget to remove it. The bundler dutifully inlines it into every page load.
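This failure mode is easy to check for before deploying: compare the values in your env file against the built bundle. A minimal sketch (the file paths and the naive dotenv parsing are illustrative assumptions, not part of BrokenApp):

```python
from pathlib import Path

def env_values(env_file: str) -> dict:
    """Naively parse KEY=VALUE lines from a dotenv-style file."""
    values = {}
    for line in Path(env_file).read_text().splitlines():
        line = line.strip()
        if line and not line.startswith("#") and "=" in line:
            key, _, val = line.partition("=")
            values[key.strip()] = val.strip().strip("\"'")
    return values

def leaked_into_bundle(env_file: str, bundle_file: str) -> list:
    """Names of env vars whose literal values appear in the built bundle.

    Values shorter than 8 characters are skipped to avoid noise from
    ports, flags, and other short non-secret settings.
    """
    bundle = Path(bundle_file).read_text()
    return [k for k, v in env_values(env_file).items()
            if len(v) >= 8 and v in bundle]
```

Run against the production build output, this catches the "prefixed it for testing, forgot to remove it" case before the bundle ships.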
Misconfigured .env and config files
Web servers that serve static files from the project root can expose .env, config.yml, .git/config, and similar files. A missing deny rule in nginx or a misconfigured static file handler is all it takes.
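The missing nginx deny rule mentioned above is typically one block; a sketch (adapt to your server layout):

```nginx
# Refuse to serve dotfiles (.env, .git/config, .DS_Store, ...) from the
# web root, while still allowing /.well-known/ for ACME challenges.
location ~ /\.(?!well-known) {
    deny all;
}
```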
Debug and diagnostic endpoints
Frameworks ship with debug tooling: Laravel's Telescope, Django's debug toolbar, Go's expvar handler at /debug/vars, Spring Boot Actuator. These endpoints dump environment variables, database credentials, and internal state. They're meant for development. They end up in production.
Passive scanning: the 18 regex patterns
The first phase of the exposure scanner is passive. BrokenApp downloads every JavaScript bundle, CSS file, HTML page, and JSON response the application serves, then runs 18 regex patterns against the content. Each pattern targets a specific secret format with high specificity — we match the known structure of each provider's key format, not just "long random string."
# Pattern reference
 1. AWS Access Key        AKIA[0-9A-Z]{16}
 2. AWS Secret Key        [0-9a-zA-Z/+]{40}
 3. GCP API Key           AIza[0-9A-Za-z\-_]{35}
 4. GCP Service Acct      "type":"service_account"
 5. GitHub PAT            ghp_[0-9a-zA-Z]{36}
 6. GitHub OAuth          gho_[0-9a-zA-Z]{36}
 7. Stripe Live Key       sk_live_[0-9a-zA-Z]{24,}
 8. Stripe Pub Key        pk_live_[0-9a-zA-Z]{24,}
 9. Slack Token           xox[bpors]-[0-9a-zA-Z-]{10,}
10. Twilio Key            SK[0-9a-fA-F]{32}
11. SendGrid Key          SG\.[0-9A-Za-z\-_]{22,}
12. Mailgun Key           key-[0-9a-zA-Z]{32}
13. Firebase Key          AAAA[A-Za-z0-9_-]{7}:[A-Za-z0-9_-]{140}
14. Private Key           -----BEGIN (RSA|EC|DSA) PRIVATE KEY
15. Hardcoded JWT         eyJ[A-Za-z0-9-_]+\.eyJ[A-Za-z0-9-_]+
16. Heroku API Key        [0-9a-fA-F]{8}-[0-9a-fA-F]{4}...
17. Database URL          (postgres|mysql|mongodb)://[^\s]+@
18. Generic High Entropy  (SECRET|KEY|TOKEN|PASSWORD)=[A-Za-z0-9+/]{20,}
The first 17 patterns are provider-specific, and most have near-zero false positive rates because cloud providers use deterministic key prefixes. (The AWS secret key pattern is the exception: it has no prefix, so matches there lean on the context checks described under false positive handling below.) Pattern 18, the generic high-entropy matcher, is intentionally broader and carries a higher false-positive risk. It exists to catch secrets from providers we don't have a specific pattern for. Findings from pattern 18 are flagged as "needs review" rather than confirmed.
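In sketch form, the passive phase is a dictionary of compiled patterns run over each downloaded asset. The pattern subset, finding names, and the needs-review flag below are illustrative, not BrokenApp's actual internals:

```python
import re

# Illustrative subset of the provider-specific patterns.
PATTERNS = {
    "AWS Access Key": re.compile(r"AKIA[0-9A-Z]{16}"),
    "GitHub PAT": re.compile(r"ghp_[0-9a-zA-Z]{36}"),
    "Stripe Live Key": re.compile(r"sk_live_[0-9a-zA-Z]{24,}"),
}
# Pattern 18: broader on purpose, so its findings are only flagged.
GENERIC = re.compile(r"(SECRET|KEY|TOKEN|PASSWORD)=[A-Za-z0-9+/]{20,}")

def scan(content: str) -> list:
    """Return (name, match, needs_review) tuples for one asset."""
    findings = []
    for name, pattern in PATTERNS.items():
        for m in pattern.finditer(content):
            findings.append((name, m.group(), False))  # confirmed
    for m in GENERIC.finditer(content):
        findings.append(("Generic High Entropy", m.group(), True))  # review
    return findings
```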
Active scanning: the 22 path probes
Passive scanning only finds secrets that appear in content the application already serves. But many secrets hide behind known paths that aren't linked from the application itself. BrokenApp's active scanner makes 22 targeted HTTP requests to paths where secrets commonly live.
# Configuration files
/.env
/.env.production
/.env.local
/config.yml
/config.json
/.git/config
/.git/HEAD
# Debug endpoints
/debug/vars
/debug/pprof
/_debug
/actuator/env
/actuator/configprops
/__debug__/
/telescope/requests
# Server info
/server-status
/server-info
/phpinfo.php
/info.php
/.DS_Store
/wp-config.php.bak
/api/swagger.json
/graphql?query=\{__schema\{types\{name\}\}\}
Each probe is scored by both the HTTP status code and a content-type analysis. A 200 response to /.env that returns text/plain with lines matching KEY=VALUE format is a confirmed exposure. A 200 that returns an HTML error page is a false positive — many servers return 200 with a custom "not found" page rather than a proper 404.
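The scoring logic for the /.env probe can be sketched as follows (the classification labels are assumptions; the status-plus-content-type logic is from the text above):

```python
import re

# A dotenv file is lines of UPPER_SNAKE=value.
ENV_LINE = re.compile(r"^[A-Z_][A-Z0-9_]*=", re.MULTILINE)

def score_env_probe(status: int, content_type: str, body: str) -> str:
    """Classify one response to the /.env probe."""
    if status != 200:
        return "not exposed"
    # Many servers answer 200 with a custom HTML "not found" page, so a
    # 200 alone proves nothing; require plausible KEY=VALUE plain text.
    if "text/html" in content_type:
        return "false positive"
    if ENV_LINE.search(body):
        return "confirmed exposure"
    return "needs review"
```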
How secrets are masked in reports
Here's a design decision we made early: BrokenApp never stores the full secret. When a pattern match fires, the scanner captures enough to confirm the finding — the prefix, the length, and a 4-character suffix — then discards the rest. The scan report shows:
$ brokenapp exposure-scan --url https://app.example.com
CRITICAL AWS Access Key in /static/js/main.a4b2c.js
AKIA████████████████XQWZ
Line 1842, column 23
CRITICAL Stripe Live Key in /static/js/checkout.js
sk_live_████████████████████7mRk
Line 294, column 18
HIGH /.env accessible (200, text/plain, 847 bytes)
Contains: DATABASE_URL, STRIPE_SECRET_KEY, AWS_SECRET_ACCESS_KEY
3 findings | 2 critical | 1 high | 0 medium
This matters for two reasons. First, if the scan report itself leaks (shared in a Slack channel, committed to a repo, attached to a Jira ticket), the full secret isn't compromised. Second, it keeps BrokenApp out of the chain of custody for secrets — we never have them, so there's no risk of us being a secondary exposure vector.
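The redaction itself is simple. A sketch of the scheme (the exact prefix length BrokenApp keeps per provider is an assumption; only this masked form is ever stored):

```python
def mask_secret(secret: str, prefix_len: int = 8, suffix_len: int = 4) -> str:
    """Keep the provider prefix and a 4-character suffix, discard the middle."""
    if len(secret) <= prefix_len + suffix_len:
        return "█" * len(secret)  # too short to mask meaningfully
    hidden = len(secret) - prefix_len - suffix_len
    return secret[:prefix_len] + "█" * hidden + secret[-suffix_len:]
```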
Real-world examples
In our first three months of beta testing, the exposure scanner found secrets in 23% of applications scanned. The most common findings, in order:
- Stripe publishable keys in client bundles (41% of findings) — technically not secret, but indicate the developer may also be leaking secret keys through the same pattern
- Firebase server keys in client JavaScript (18%) — these grant push notification access and should never be client-side
- AWS access keys compiled into static assets (14%) — typically from NEXT_PUBLIC_ or REACT_APP_ env vars that should have been server-only
- Exposed .env files via path probe (11%) — usually on staging environments with less restrictive server configurations
- Spring Boot Actuator /actuator/env returning full environment (8%) — the default configuration exposes everything unless explicitly locked down
- Database connection strings in debug responses (8%) — Laravel Telescope and Django debug toolbar were the most common sources
The median time between secret deployment and discovery was 47 days. The longest-lived secret we found had been exposed for 11 months. In every case, the development team was unaware of the exposure until the scan flagged it.
False positive handling
False positives destroy trust in any scanning tool. If developers learn to ignore findings, real secrets get ignored too. BrokenApp uses three layers to minimize false positives:
1. Structural validation
Each regex pattern matches the known format of the provider's keys, including prefixes, character sets, and length constraints. This eliminates matches on random strings that happen to be the right length.
2. Context analysis
The scanner checks surrounding context. A string matching the AWS key pattern inside a test file, a comment, or a variable named 'example_key' is deprioritized. Strings adjacent to assignment operators with names like 'api_key' or 'secret' are promoted.
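Sketched as a re-ranking pass over the line containing the match (the keyword lists and severity labels are assumptions, not BrokenApp's actual heuristics):

```python
import re

# Signals that the match is a fixture or placeholder, not a live secret.
DOWNGRADE = re.compile(r"example|sample|dummy|placeholder|test", re.I)
# Signals that the match is being assigned to a credential-like name.
PROMOTE = re.compile(r"(api_?key|secret|token|password)\s*[:=]", re.I)

def adjust_severity(line: str, base: str) -> str:
    """Deprioritize fixture-looking matches, promote credential-looking ones."""
    if DOWNGRADE.search(line):
        return "needs review"
    if PROMOTE.search(line):
        return "high"
    return base
```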
3. Entropy scoring
For the generic pattern (pattern 18), BrokenApp computes Shannon entropy over the matched string. Real secrets typically score above 4.5 bits per character; placeholder values like 'your-api-key-here' and runs of repeated characters score well below that threshold and are filtered out.
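Shannon entropy over the matched string is a few lines; a sketch (the 4.5 bits-per-character threshold is the one stated above):

```python
import math
from collections import Counter

def shannon_entropy(s: str) -> float:
    """Shannon entropy of a string, in bits per character."""
    counts = Counter(s)
    n = len(s)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

def passes_entropy_filter(candidate: str, threshold: float = 4.5) -> bool:
    # Placeholders draw on a small, repetitive alphabet and score low;
    # random secrets over base64-ish alphabets score near the ceiling.
    return shannon_entropy(candidate) >= threshold
```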
Across our beta dataset, the provider-specific patterns (1-17) had a combined false positive rate of 2.1%. The generic pattern had a false positive rate of 18%, which is why it's reported separately as "needs review." We're actively tuning the entropy threshold and context heuristics to bring this down, but we'd rather show a few false positives than miss a real secret.
# Run exposure scan with strict mode (patterns 1-17 only)
$ brokenapp exposure-scan --url https://app.example.com --strict
# Include generic pattern with review flags
$ brokenapp exposure-scan --url https://app.example.com --include-generic
# Suppress known false positives with allowlist
$ brokenapp exposure-scan --url https://app.example.com --allowlist .brokenapp/allow.toml
The allowlist file lets you permanently suppress specific findings — for example, if your application intentionally exposes a Stripe publishable key (which is designed to be public). Each allowlist entry requires a reason field, so there's an audit trail for why a finding was dismissed.
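An allowlist entry might look like this (a hypothetical schema; the field names are our assumptions, and only the required reason field comes from the behavior described above):

```toml
# .brokenapp/allow.toml (sketch; BrokenApp's real schema may differ)
[[allow]]
finding = "Stripe Pub Key"
path    = "/static/js/checkout.js"
reason  = "Publishable keys are designed to be embedded client-side"
```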