18 regex patterns and 22 active probes: how we find leaked secrets
In January 2025, a researcher found valid AWS keys in the client-side JavaScript bundle of a publicly traded company's web application. The keys had been there for fourteen months. They granted full S3 access to a bucket containing 2.3 million customer records. The company had passed two penetration tests during that period.
This is the problem with secrets: they don't look like vulnerabilities. There's no broken authentication flow, no injection point, no malformed input. There's just a string that shouldn't be there, sitting in plain text, waiting for someone to notice. BrokenApp's exposure scanner is designed to notice.
Why secrets leak
Secrets end up in production through three primary vectors, and understanding them matters because each requires a different detection strategy.
Client bundle inclusion
Environment variables prefixed with NEXT_PUBLIC_, REACT_APP_, or VITE_ get compiled into JavaScript bundles. Developers add a secret to .env.local for testing, prefix it for client access, and forget to remove it. The bundler dutifully inlines it into every page load.
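This failure mode is easy to check for before deploying: compare the values in your env file against the built bundle. A minimal sketch (the file paths and the naive dotenv parsing are illustrative assumptions, not part of BrokenApp):

```python
from pathlib import Path

def env_values(env_file: str) -> dict:
    """Naively parse KEY=VALUE lines from a dotenv-style file."""
    values = {}
    for line in Path(env_file).read_text().splitlines():
        line = line.strip()
        if line and not line.startswith("#") and "=" in line:
            key, _, val = line.partition("=")
            values[key.strip()] = val.strip().strip("\"'")
    return values

def leaked_into_bundle(env_file: str, bundle_file: str) -> list:
    """Names of env vars whose literal values appear in the built bundle.

    Values shorter than 8 characters are skipped to avoid noise from
    ports, flags, and other short non-secret settings.
    """
    bundle = Path(bundle_file).read_text()
    return [k for k, v in env_values(env_file).items()
            if len(v) >= 8 and v in bundle]
```

Run against the production build output, this catches the "prefixed it for testing, forgot to remove it" case before the bundle ships.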
Misconfigured .env and config files
Web servers that serve static files from the project root can expose .env, config.yml, .git/config, and similar files. A missing deny rule in nginx or a misconfigured static file handler is all it takes.
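The missing nginx deny rule mentioned above is typically one block; a sketch (adapt to your server layout):

```nginx
# Refuse to serve dotfiles (.env, .git/config, .DS_Store, ...) from the
# web root, while still allowing /.well-known/ for ACME challenges.
location ~ /\.(?!well-known) {
    deny all;
}
```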
Debug and diagnostic endpoints
Frameworks ship with debug tooling: Laravel's Telescope, Django's debug toolbar, Go's expvar handler at /debug/vars, Spring Boot Actuator. These endpoints dump environment variables, database credentials, and internal state. They're meant for development. They end up in production.
Passive scanning: the 18 regex patterns
The first phase of the exposure scanner is passive. BrokenApp downloads every JavaScript bundle, CSS file, HTML page, and JSON response the application serves, then runs 18 regex patterns against the content. Each pattern targets a specific secret format with high specificity — we match the known structure of each provider's key format, not just "long random string."
# Pattern reference
 1. AWS Access Key        AKIA[0-9A-Z]{16}
 2. AWS Secret Key        [0-9a-zA-Z/+]{40}
 3. GCP API Key           AIza[0-9A-Za-z\-_]{35}
 4. GCP Service Acct      "type":"service_account"
 5. GitHub PAT            ghp_[0-9a-zA-Z]{36}
 6. GitHub OAuth          gho_[0-9a-zA-Z]{36}
 7. Stripe Live Key       sk_live_[0-9a-zA-Z]{24,}
 8. Stripe Pub Key        pk_live_[0-9a-zA-Z]{24,}
 9. Slack Token           xox[bpors]-[0-9a-zA-Z-]{10,}
10. Twilio Key            SK[0-9a-fA-F]{32}
11. SendGrid Key          SG\.[0-9A-Za-z\-_]{22,}
12. Mailgun Key           key-[0-9a-zA-Z]{32}
13. Firebase Key          AAAA[A-Za-z0-9_-]{7}:[A-Za-z0-9_-]{140}
14. Private Key           -----BEGIN (RSA|EC|DSA) PRIVATE KEY
15. Hardcoded JWT         eyJ[A-Za-z0-9-_]+\.eyJ[A-Za-z0-9-_]+
16. Heroku API Key        [0-9a-fA-F]{8}-[0-9a-fA-F]{4}...
17. Database URL          (postgres|mysql|mongodb)://[^\s]+@
18. Generic High Entropy  (SECRET|KEY|TOKEN|PASSWORD)=[A-Za-z0-9+/]{20,}
The first 17 patterns are provider-specific, and most have near-zero false positive rates because cloud providers use deterministic key prefixes. (The AWS secret key pattern is the exception: it has no prefix, so matches there lean on the context checks described under false positive handling below.) Pattern 18, the generic high-entropy matcher, is intentionally broader and carries a higher false-positive risk. It exists to catch secrets from providers we don't have a specific pattern for. Findings from pattern 18 are flagged as "needs review" rather than confirmed.
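In sketch form, the passive phase is a dictionary of compiled patterns run over each downloaded asset. The pattern subset, finding names, and the needs-review flag below are illustrative, not BrokenApp's actual internals:

```python
import re

# Illustrative subset of the provider-specific patterns.
PATTERNS = {
    "AWS Access Key": re.compile(r"AKIA[0-9A-Z]{16}"),
    "GitHub PAT": re.compile(r"ghp_[0-9a-zA-Z]{36}"),
    "Stripe Live Key": re.compile(r"sk_live_[0-9a-zA-Z]{24,}"),
}
# Pattern 18: broader on purpose, so its findings are only flagged.
GENERIC = re.compile(r"(SECRET|KEY|TOKEN|PASSWORD)=[A-Za-z0-9+/]{20,}")

def scan(content: str) -> list:
    """Return (name, match, needs_review) tuples for one asset."""
    findings = []
    for name, pattern in PATTERNS.items():
        for m in pattern.finditer(content):
            findings.append((name, m.group(), False))  # confirmed
    for m in GENERIC.finditer(content):
        findings.append(("Generic High Entropy", m.group(), True))  # review
    return findings
```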
Active scanning: the 22 path probes
Passive scanning only finds secrets that appear in content the application already serves. But many secrets hide behind known paths that aren't linked from the application itself. BrokenApp's active scanner makes 22 targeted HTTP requests to paths where secrets commonly live.
# Configuration files
/.env
/.env.production
/.env.local
/config.yml
/config.json
/.git/config
/.git/HEAD
# Debug endpoints
/debug/vars
/debug/pprof
/_debug
/actuator/env
/actuator/configprops
/__debug__/
/telescope/requests
# Server info
/server-status
/server-info
/phpinfo.php
/info.php
/.DS_Store
/wp-config.php.bak
/api/swagger.json
/graphql?query=\{__schema\{types\{name\}\}\}
Each probe is scored by both the HTTP status code and a content-type analysis. A 200 response to /.env that returns text/plain with lines matching KEY=VALUE format is a confirmed exposure. A 200 that returns an HTML error page is a false positive — many servers return 200 with a custom "not found" page rather than a proper 404.
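The scoring logic for the /.env probe can be sketched as follows (the classification labels are assumptions; the status-plus-content-type logic is from the text above):

```python
import re

# A dotenv file is lines of UPPER_SNAKE=value.
ENV_LINE = re.compile(r"^[A-Z_][A-Z0-9_]*=", re.MULTILINE)

def score_env_probe(status: int, content_type: str, body: str) -> str:
    """Classify one response to the /.env probe."""
    if status != 200:
        return "not exposed"
    # Many servers answer 200 with a custom HTML "not found" page, so a
    # 200 alone proves nothing; require plausible KEY=VALUE plain text.
    if "text/html" in content_type:
        return "false positive"
    if ENV_LINE.search(body):
        return "confirmed exposure"
    return "needs review"
```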
How secrets are masked in reports
Here's a design decision we made early: BrokenApp never stores the full secret. When a pattern match fires, the scanner captures enough to confirm the finding — the prefix, the length, and a 4-character suffix — then discards the rest. The scan report shows:
$ brokenapp exposure-scan --url https://app.example.com
CRITICAL AWS Access Key in /static/js/main.a4b2c.js
AKIA████████████████XQWZ
Line 1842, column 23
CRITICAL Stripe Live Key in /static/js/checkout.js
sk_live_████████████████████7mRk
Line 294, column 18
HIGH /.env accessible (200, text/plain, 847 bytes)
Contains: DATABASE_URL, STRIPE_SECRET_KEY, AWS_SECRET_ACCESS_KEY
3 findings | 2 critical | 1 high | 0 medium
This matters for two reasons. First, if the scan report itself leaks (shared in a Slack channel, committed to a repo, attached to a Jira ticket), the full secret isn't compromised. Second, it keeps BrokenApp out of the chain of custody for secrets — we never have them, so there's no risk of us being a secondary exposure vector.
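The redaction itself is simple. A sketch of the scheme (the exact prefix length BrokenApp keeps per provider is an assumption; only this masked form is ever stored):

```python
def mask_secret(secret: str, prefix_len: int = 8, suffix_len: int = 4) -> str:
    """Keep the provider prefix and a 4-character suffix, discard the middle."""
    if len(secret) <= prefix_len + suffix_len:
        return "█" * len(secret)  # too short to mask meaningfully
    hidden = len(secret) - prefix_len - suffix_len
    return secret[:prefix_len] + "█" * hidden + secret[-suffix_len:]
```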
Real-world examples
In our first three months of beta testing, the exposure scanner found secrets in 23% of applications scanned. The most common findings, in order:
- Stripe publishable keys in client bundles (41% of findings) — technically not secret, but indicate the developer may also be leaking secret keys through the same pattern
- Firebase server keys in client JavaScript (18%) — these grant push notification access and should never be client-side
- AWS access keys compiled into static assets (14%) — typically from NEXT_PUBLIC_ or REACT_APP_ env vars that should have been server-only
- Exposed .env files via path probe (11%) — usually on staging environments with less restrictive server configurations
- Spring Boot Actuator /actuator/env returning full environment (8%) — the default configuration exposes everything unless explicitly locked down
- Database connection strings in debug responses (8%) — Laravel Telescope and Django debug toolbar were the most common sources
The median time between secret deployment and discovery was 47 days. The longest-lived secret we found had been exposed for 11 months. In every case, the development team was unaware of the exposure until the scan flagged it.
False positive handling
False positives destroy trust in any scanning tool. If developers learn to ignore findings, real secrets get ignored too. BrokenApp uses three layers to minimize false positives:
1. Structural validation
Each regex pattern matches the known format of the provider's keys, including prefixes, character sets, and length constraints. This eliminates matches on random strings that happen to be the right length.
2. Context analysis
The scanner checks surrounding context. A string matching the AWS key pattern inside a test file, a comment, or a variable named 'example_key' is deprioritized. Strings adjacent to assignment operators with names like 'api_key' or 'secret' are promoted.
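Sketched as a re-ranking pass over the line containing the match (the keyword lists and severity labels are assumptions, not BrokenApp's actual heuristics):

```python
import re

# Signals that the match is a fixture or placeholder, not a live secret.
DOWNGRADE = re.compile(r"example|sample|dummy|placeholder|test", re.I)
# Signals that the match is being assigned to a credential-like name.
PROMOTE = re.compile(r"(api_?key|secret|token|password)\s*[:=]", re.I)

def adjust_severity(line: str, base: str) -> str:
    """Deprioritize fixture-looking matches, promote credential-looking ones."""
    if DOWNGRADE.search(line):
        return "needs review"
    if PROMOTE.search(line):
        return "high"
    return base
```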
3. Entropy scoring
For the generic pattern (pattern 18), BrokenApp computes Shannon entropy over the matched string. Real secrets typically score above 4.5 bits per character; placeholder values like 'your-api-key-here' and runs of repeated characters score well below that threshold and are filtered out.
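Shannon entropy over the matched string is a few lines; a sketch (the 4.5 bits-per-character threshold is the one stated above):

```python
import math
from collections import Counter

def shannon_entropy(s: str) -> float:
    """Shannon entropy of a string, in bits per character."""
    counts = Counter(s)
    n = len(s)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

def passes_entropy_filter(candidate: str, threshold: float = 4.5) -> bool:
    # Placeholders draw on a small, repetitive alphabet and score low;
    # random secrets over base64-ish alphabets score near the ceiling.
    return shannon_entropy(candidate) >= threshold
```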
Across our beta dataset, the provider-specific patterns (1-17) had a combined false positive rate of 2.1%. The generic pattern had a false positive rate of 18%, which is why it's reported separately as "needs review." We're actively tuning the entropy threshold and context heuristics to bring this down, but we'd rather show a few false positives than miss a real secret.
# Run exposure scan with strict mode (patterns 1-17 only)
$ brokenapp exposure-scan --url https://app.example.com --strict
# Include generic pattern with review flags
$ brokenapp exposure-scan --url https://app.example.com --include-generic
# Suppress known false positives with allowlist
$ brokenapp exposure-scan --url https://app.example.com --allowlist .brokenapp/allow.toml
The allowlist file lets you permanently suppress specific findings — for example, if your application intentionally exposes a Stripe publishable key (which is designed to be public). Each allowlist entry requires a reason field, so there's an audit trail for why a finding was dismissed.
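An allowlist entry might look like this (a hypothetical schema; the field names are our assumptions, and only the required reason field comes from the behavior described above):

```toml
# .brokenapp/allow.toml (sketch; BrokenApp's real schema may differ)
[[allow]]
finding = "Stripe Pub Key"
path    = "/static/js/checkout.js"
reason  = "Publishable keys are designed to be embedded client-side"
```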