98 changes: 98 additions & 0 deletions .github/copilot-instructions.md
@@ -0,0 +1,98 @@
## Purpose

These instructions help an AI code agent be productive in the detect-secrets repo. Focus on the
core components, local conventions, test/build workflows, and concrete code locations where
changes typically belong.

## Big picture (what this project does)

- detect-secrets scans code for potential secrets, writes the findings to a JSON baseline file,
and provides three CLI entry points: `detect-secrets scan` (create or update a baseline),
`detect-secrets-hook` (pre-commit hook that blocks new secrets), and `detect-secrets audit`
(label baseline results). See `README.md` for usage examples.

## Key components & where to look

- Engine & orchestration: `detect_secrets/core/` — scanning, baseline handling, and plugin init.
- Global settings: `detect_secrets/settings.py` — Settings singleton, `default_settings()` and
`transient_settings()` context managers. Use these when testing or running programmatic scans.
- Plugin interface (`detect_secrets/plugins/base.py`): subclass `RegexBasedDetector` for
regex-driven detectors, or `BasePlugin` for custom logic. `secret_type` and `json()` determine
how a plugin appears in a baseline, so they matter for baseline compatibility.
- Built-in detectors: `detect_secrets/plugins/` — examples of detectors and verification logic
(e.g. `high_entropy_strings.py`, `aws.py`). Use these as canonical examples for new plugins.
- Filters: `detect_secrets/filters/` — filters are referenced by import path or `file://` URL and are
configured via the Settings object (see `settings.configure_filters`).
- Tests & helpers: `testing/` and `tests/` — test fixtures, custom filter/plugin helpers live in
`testing/` and unit tests are in `tests/`.
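
To make the two filter reference forms concrete, here is a minimal sketch that splits a reference
string into its location and function name. `split_filter_reference` and `FilterRef` are
hypothetical names written for illustration; they are not part of detect-secrets, which does its
own resolution in `detect_secrets/settings.py`.

```python
from typing import NamedTuple


class FilterRef(NamedTuple):
    kind: str        # 'import_path' or 'file'
    location: str    # dotted module path, or filesystem path
    function: str    # name of the filter function


def split_filter_reference(ref: str) -> FilterRef:
    """Split a filter reference into (kind, location, function name)."""
    if ref.startswith('file://'):
        # file:///path/to.py::func -> ('/path/to.py', 'func')
        location, sep, function = ref[len('file://'):].rpartition('::')
        if not sep or not function:
            raise ValueError(f'expected file://<path>::<function>, got {ref!r}')
        return FilterRef('file', location, function)

    # dotted.module.path.func -> ('dotted.module.path', 'func')
    location, sep, function = ref.rpartition('.')
    if not sep or not location:
        raise ValueError(f'expected a dotted import path, got {ref!r}')
    return FilterRef('import_path', location, function)


by_import = split_filter_reference('detect_secrets.filters.heuristic.is_sequential_string')
by_file = split_filter_reference('file:///path/to.py::func')
```

Both forms end up as a location plus a function name, which is why a filter can be swapped between
an installed module and a local file without changing how it is configured.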

## Project-specific conventions & patterns

- Settings are global via a cached singleton. Prefer `transient_settings({...})` for isolated
changes: it replaces the cached settings for the duration of the `with` block and restores the
previous configuration on exit.

- Filters are configured by import path strings (e.g.
`detect_secrets.filters.heuristic.is_sequential_string`) or by file URLs `file:///path/to.py::func`.
Filter arguments are supplied by name-based injection: `get_filters()` in
`detect_secrets/settings.py` records each function's parameter names in
`function.injectable_variables`, so name filter parameters after the values they expect to receive.

- Plugins in baselines are stored as a list of dicts with a `name` key (see
`Settings.configure_plugins`) — if you change `secret_type` or plugin class names, note
it will affect baseline compatibility.
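
As a rough illustration of that list-of-dicts shape, the sketch below builds a `plugins_used`
value from plugin class names plus optional per-plugin arguments. `build_plugins_used` is a
hypothetical helper, not a detect-secrets function; the `limit` argument is borrowed from the
README's high-entropy example and is an assumption for any other plugin.

```python
from typing import Any, Dict, List, Optional


def build_plugins_used(
    names: List[str],
    options: Optional[Dict[str, Dict[str, Any]]] = None,
) -> List[Dict[str, Any]]:
    """Build a `plugins_used`-style list: one dict per plugin, keyed by `name`."""
    options = options or {}
    entries = []
    for name in sorted(set(names)):          # stable, de-duplicated order
        entry: Dict[str, Any] = {'name': name}
        entry.update(options.get(name, {}))  # optional per-plugin arguments
        entries.append(entry)
    return entries


config = {
    'plugins_used': build_plugins_used(
        ['Base64HighEntropyString', 'AWSKeyDetector'],
        options={'Base64HighEntropyString': {'limit': 5.0}},
    ),
}
```

A dict shaped like `config` is what `transient_settings(...)` is given elsewhere in this change,
which is why renaming a plugin class invalidates entries that refer to the old `name`.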

## Developer workflows (commands to run)

- Run the test suite (recommended via tox):

      tox  # runs envlist from `tox.ini` (py39..py313, mypy)

  Quick local run without tox:

      python -m pytest --strict tests

- Coverage and CI thresholds (enforced in `tox.ini`):
  - coverage for `tests/*` must be >= 99%
  - coverage for `testing/*` must be 100%
  - coverage for `detect_secrets/*` must be >= 95%

- Run type checking (mypy):

      tox -e mypy

- Pre-commit hooks: configured in `.pre-commit-config.yaml` and installed with `pre-commit install`
or via `tox -e venv`, which creates a `venv` environment and installs the hooks. CI runs
`pre-commit run --all-files`.

## How to add a detector or filter (quick checklist)

1. Add a new detector under `detect_secrets/plugins/` by subclassing `RegexBasedDetector` or
`BasePlugin` (see `plugins/base.py`). Set a stable `secret_type`. For a `RegexBasedDetector`,
define `denylist`; for a `BasePlugin`, implement `analyze_string` (or override `analyze_line`).
2. Add tests under `tests/plugins/` and helper fixtures in `testing/` if needed.
3. Run `python -m pytest tests/plugins` and ensure coverage thresholds remain satisfied.
4. If the detector needs runtime configuration, ensure it can be referenced via the baseline
structure (name + optional args) and documented in `README.md` or `docs/`.
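
The first checklist step can be sketched as follows. `ExampleTokenDetector`, its `secret_type`,
and the `EXMPL_` token format are all made up for illustration. The `try`/`except` fallback is a
tiny stand-in so the snippet runs without detect-secrets installed; it mimics only the one method
used here and is not the library's class.

```python
import re

try:
    from detect_secrets.plugins.base import RegexBasedDetector
except ImportError:
    # Stand-in (assumption): mimics only `analyze_string` over `denylist`.
    class RegexBasedDetector:
        denylist = ()

        def analyze_string(self, string):
            for regex in self.denylist:
                yield from regex.findall(string)


class ExampleTokenDetector(RegexBasedDetector):
    """Flags tokens of the made-up form EXMPL_ followed by 16 hex characters."""

    # Written into baselines, so keep it stable once released.
    secret_type = 'Example Service Token'

    # Each pattern is tried against the scanned string.
    denylist = (
        re.compile(r'\bEXMPL_[0-9a-f]{16}\b'),
    )


detector = ExampleTokenDetector()
found = list(detector.analyze_string('auth=EXMPL_0123456789abcdef ok'))
clean = list(detector.analyze_string('nothing to see here'))
```

A matching test under `tests/plugins/` would assert on positive and negative strings in the same
way `found` and `clean` are computed here.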

## Useful code examples (where to change behavior)

- To temporarily change which plugins run in a script or test, use `transient_settings` in
`detect_secrets/settings.py` (example in README under "More Advanced Configuration").

- Verification hooks live in plugin classes (`verify` / `format_scan_result`) and rely on
`detect_secrets.constants.VerifiedResult` and the `detect_secrets.filters` settings.
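
To show why `transient_settings` is safe to use around a cached singleton, here is a simplified,
standalone model of the pattern. It reuses the names `get_settings` and `transient_settings` for
readability, but it is an assumption-laden sketch and not the code in `detect_secrets/settings.py`;
the real `Settings` object also tracks filters and takes a baseline-style dict.

```python
from contextlib import contextmanager
from functools import lru_cache
from typing import Any, Dict, Iterator


class Settings:
    def __init__(self) -> None:
        self.plugins: Dict[str, Dict[str, Any]] = {}


@lru_cache(maxsize=1)
def get_settings() -> Settings:
    """Return the process-wide Settings instance (cached singleton)."""
    return Settings()


@contextmanager
def transient_settings(plugins: Dict[str, Dict[str, Any]]) -> Iterator[Settings]:
    """Swap in temporary settings, then restore the previous ones."""
    original = dict(get_settings().plugins)   # snapshot the current state
    get_settings.cache_clear()                # drop the cached singleton
    try:
        settings = get_settings()             # fresh instance for this block
        settings.plugins = dict(plugins)
        yield settings
    finally:
        get_settings.cache_clear()            # discard the temporary instance
        get_settings().plugins = original     # restore the snapshot


get_settings().plugins = {'AWSKeyDetector': {}}
with transient_settings({'KeywordDetector': {}}):
    inside = sorted(get_settings().plugins)
after = sorted(get_settings().plugins)
```

Because the restore step sits in `finally`, the previous configuration comes back even if the code
inside the `with` block raises.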

## Integration points & CI

- CI uses GitHub Actions (`.github/workflows/ci.yml`) and enforces tox-based testing and
coverage thresholds. Packaging metadata and bumpversion configuration live in `setup.cfg`.

## What an AI agent should avoid changing

- Do not rename an existing plugin class or change `secret_type` without adding a migration
note: older baselines rely on the name and JSON fields.
- Avoid modifying coverage thresholds or CI logic without explicit reviewer sign-off — these
are intentionally strict.

## If something's unclear, iterate

Leave a concise PR description or comment that references the exact files changed (path +
function/class) and ask the maintainers to confirm the intended behavior.
2 changes: 2 additions & 0 deletions requirements-dev.txt
@@ -5,6 +5,7 @@ cfgv==3.4.0
charset-normalizer==3.3.2
coverage==7.5.3
distlib==0.3.8
fastapi
filelock==3.16.1
flake8==7.1.1
gibberish-detector==0.1.1
@@ -38,5 +39,6 @@ types-requests==2.31.0.20240106
typing-extensions==4.12.2
unidiff==0.7.5
urllib3==2.2.2
uvicorn[standard]
virtualenv==20.26.6
zipp==3.19.2
86 changes: 86 additions & 0 deletions scripts/api_server.py
@@ -0,0 +1,86 @@
from fastapi import FastAPI, Request
from fastapi.responses import JSONResponse, HTMLResponse
from typing import List
import json
from datetime import datetime
from pathlib import Path
from detect_secrets.core.scan import scan_line
from detect_secrets.settings import transient_settings
from detect_secrets.core.plugins.util import get_mapping_from_secret_type_to_class

app = FastAPI()


def get_scan_settings():
    """Configure scan settings with all available detectors"""
    mapping = get_mapping_from_secret_type_to_class()
    return {
        'plugins_used': [
            {'name': cls.__name__}
            for cls in mapping.values()
        ]
    }


@app.get("/")
async def root():
    """Root endpoint that shows API usage instructions"""
    return HTMLResponse("""
    <html>
    <body>
    <h1>Detect-Secrets API</h1>
    <h2>Available Endpoints:</h2>
    <ul>
    <li><b>POST /run_original_scan/</b> - Scan logs for secrets</li>
    </ul>
    <h2>Example Usage:</h2>
    <pre>
    POST /run_original_scan/
    Content-Type: application/json

    {
        "logs": [
            "line 1 to scan",
            "line 2 to scan",
            "..."
        ]
    }
    </pre>
    </body>
    </html>
    """)


@app.post("/run_original_scan/")
async def scan_logs(request: Request):
    data = await request.json()
    logs: List[str] = data.get("logs", [])

    results = []
    # Use transient_settings to ensure all detectors are properly configured
    with transient_settings(get_scan_settings()):
        for lineno, line in enumerate(logs, start=1):
            for secret in scan_line(line):
                results.append({
                    'line_number': lineno,
                    'type': secret.type,
                    'hashed_secret': secret.secret_hash,
                    'secret_value': secret.secret_value,  # Adding this for debugging
                })

    # Create output filename with timestamp
    timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    output_file = Path(f"scan_results_{timestamp}.json")

    # Write the results to the JSON file
    with open(output_file, 'w', encoding='utf-8') as f:
        json.dump({
            "matches": results,
            "scan_time": timestamp,
            "total_lines_scanned": len(logs)
        }, f, indent=2)

    # Return both the results and the file information
    return JSONResponse({
        "matches": results,
        "output_file": str(output_file),
        "scan_time": timestamp,
        "total_lines_scanned": len(logs)
    })
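
A client for the endpoint above could look roughly like the sketch below. It only builds the
request and does not send it; `build_scan_request` is a hypothetical helper, and the base URL
`http://127.0.0.1:8000` is an assumption about where the server is running locally.

```python
import json
import urllib.request
from typing import List


def build_scan_request(
    lines: List[str],
    base_url: str = 'http://127.0.0.1:8000',   # assumed local dev address
) -> urllib.request.Request:
    """Build (but do not send) a POST request for /run_original_scan/."""
    body = json.dumps({'logs': lines}).encode('utf-8')
    return urllib.request.Request(
        url=f'{base_url}/run_original_scan/',
        data=body,
        headers={'Content-Type': 'application/json'},
        method='POST',
    )


request = build_scan_request(['user=alice', 'password=hunter2'])
# To send it against a running server:
#   with urllib.request.urlopen(request) as response:
#       matches = json.load(response)['matches']
```

The payload mirrors the `{"logs": [...]}` shape that `scan_logs` reads with `data.get("logs", [])`.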
109 changes: 109 additions & 0 deletions scripts/run_original_scan.py
@@ -0,0 +1,109 @@
# #!/usr/bin/env python3
# """Run detect-secrets using the repository's original plugin mapping (excluding local additions).

# This script builds the list of plugin classnames from the core mapping and excludes
# `PiiDetector` (our test plugin) to simulate original behavior. It then streams `logs.txt`
# and prints newline-delimited JSON for any matches.
# """
# from __future__ import annotations

# import json
# from typing import Iterable

# from detect_secrets.core.plugins.util import get_mapping_from_secret_type_to_class
# from detect_secrets.settings import transient_settings
# from detect_secrets.core.scan import scan_line


# def main(path: str = 'logs.txt') -> int:
#     mapping = get_mapping_from_secret_type_to_class()

#     # Use all plugins but configure them to reduce false positives
#     plugin_cfg = [
#         {'name': cls.__name__}
#         for cls in mapping.values()
#     ]

#     settings = {
#         'plugins_used': plugin_cfg,
#         'filters_used': [
#             # Exclude sequential strings that look random but aren't secrets
#             {'path': 'detect_secrets.filters.heuristic.is_sequential_string'},
#             # Exclude strings that are likely to be gibberish
#             {'path': 'detect_secrets.filters.heuristic.is_likely_id_string'},
#             # Exclude normal looking file paths
#             {'path': 'detect_secrets.filters.heuristic.is_templated_secret'},
#             # Only flag strings that match common secret patterns
#             {'path': 'detect_secrets.filters.common.is_secret_pattern'},
#         ],
#         # Increase minimum entropy threshold for base64 strings
#         'base64_limit': 5.5,
#         # Increase minimum entropy threshold for hex strings
#         'hex_limit': 3.5,
#     }

#     with transient_settings(settings):
#         with open(path, 'r', encoding='utf-8', errors='replace') as fh:
#             for lineno, line in enumerate(fh, start=1):
#                 for secret in scan_line(line):
#                     out = {
#                         'file': path,
#                         'line_number': lineno,
#                         'type': secret.type,
#                         'hashed_secret': secret.secret_hash,
#                     }
#                     # Avoid printing plaintext secrets by default (privacy)
#                     print(json.dumps(out, ensure_ascii=False))

#     return 0


# if __name__ == '__main__':
#     raise SystemExit(main())


"""Run detect-secrets using the repository's original plugin mapping (excluding local additions).

This script builds the list of plugin classnames from the core mapping and excludes
`PiiDetector` (our test plugin) to simulate original behavior. It then streams `logs.txt`
and prints newline-delimited JSON for any matches.
"""
from __future__ import annotations

import json

from detect_secrets.core.plugins.util import get_mapping_from_secret_type_to_class
from detect_secrets.core.scan import scan_line
from detect_secrets.settings import transient_settings


def main(path: str = 'logs.txt') -> int:
    mapping = get_mapping_from_secret_type_to_class()

    # Use the original plugins as defined by their secret_type mapping, but remove
    # our experimental plugin (if present) so we simulate the original repo behavior.
    plugin_cfg = [
        {'name': cls.__name__}
        for cls in mapping.values()
        if cls.__name__ != 'PiiDetector'
    ]

    with transient_settings({'plugins_used': plugin_cfg}):
        with open(path, 'r', encoding='utf-8', errors='replace') as fh:
            for lineno, line in enumerate(fh, start=1):
                for secret in scan_line(line):
                    # Plaintext secret values are deliberately left out (privacy).
                    out = {
                        'file': path,
                        'line_number': lineno,
                        'type': secret.type,
                        'hashed_secret': secret.secret_hash,
                    }
                    print(json.dumps(out, ensure_ascii=False))

    return 0


if __name__ == '__main__':
    raise SystemExit(main())
86 changes: 86 additions & 0 deletions scripts/scan_logs.py
@@ -0,0 +1,86 @@
#!/usr/bin/env python3
"""Simple wrapper to scan log files line-by-line using detect-secrets' adhoc API.

Usage examples:
    # scan default `log.txt` in repo root and pretty-print results
    python scripts/scan_logs.py

    # scan a specific file and output newline-delimited JSON, showing matched secret values
    python scripts/scan_logs.py /path/to/logfile.log --json --show-secret

    # only enable a specific plugin by class name
    python scripts/scan_logs.py --only PiiDetector
"""
from __future__ import annotations

import argparse
import json
from typing import Iterable

from detect_secrets.core.scan import scan_line
from detect_secrets.settings import default_settings
from detect_secrets.settings import transient_settings


def iter_matches_for_file(path: str, encoding: str = 'utf-8') -> Iterable[dict]:
    """Yield one dict per secret found, scanning the file line by line."""
    with open(path, 'r', encoding=encoding, errors='replace') as fh:
        for lineno, line in enumerate(fh, start=1):
            for secret in scan_line(line):
                yield {
                    'file': path,
                    'line_number': lineno,
                    'type': secret.type,
                    'hashed_secret': secret.secret_hash,
                    'secret_value': secret.secret_value,
                }


def main(argv=None) -> int:
    parser = argparse.ArgumentParser(description='Scan log files for secrets/PII using detect-secrets')
    parser.add_argument('path', nargs='?', default='log.txt', help='Path to log file (default: log.txt)')
    parser.add_argument('--only', action='append', dest='only_plugins', help='Only enable these plugin class names (repeatable)')
    parser.add_argument('--json', action='store_true', dest='as_json', help='Print newline-delimited JSON for each match')
    parser.add_argument('--show-secret', action='store_true', dest='show_secret', help='Include plaintext secret_value in output (off by default)')
    parser.add_argument('--encoding', default='utf-8', help='File encoding (default: utf-8)')

    args = parser.parse_args(argv)

    if args.only_plugins:
        # Restrict the scan to the plugin class names given with --only.
        plugins_cfg = [{'name': name} for name in args.only_plugins]
        settings_ctx = transient_settings({'plugins_used': plugins_cfg})
    else:
        # Freshly created settings have no plugins configured, so enable all
        # registered plugins rather than scanning with none.
        settings_ctx = default_settings()

    # A `with` block restores the previous settings even if the scan raises.
    with settings_ctx:
        for match in iter_matches_for_file(args.path, encoding=args.encoding):
            out = {
                'file': match['file'],
                'line_number': match['line_number'],
                'type': match['type'],
                'hashed_secret': match['hashed_secret'],
            }
            if args.show_secret:
                out['secret_value'] = match['secret_value']

            if args.as_json:
                print(json.dumps(out, ensure_ascii=False))
            else:
                # Pretty print; the value is hidden unless --show-secret was given.
                secret_display = out.get('secret_value', '<hidden>')
                print(f"{out['file']}:{out['line_number']} [{out['type']}] {secret_display}")

    return 0


if __name__ == '__main__':
    raise SystemExit(main())