Yelp · JanaElfeky · Oct 26, 2025 · Oct 29, 2025 · Oct 29, 2025
diff --git a/.github/copilot-instructions.md b/.github/copilot-instructions.md
@@ -0,0 +1,98 @@
+## Purpose
+
+These instructions help an AI code agent be productive in the detect-secrets repo. Focus on the
+core components, local conventions, test/build workflows, and concrete code locations where
+changes typically belong.
+
+## Big picture (what this project does)
+
+- detect-secrets scans code for potential secrets, produces a JSON-compatible baseline, and
+  provides CLI tools: `detect-secrets` (scan), `detect-secrets-hook` (pre-commit blocking), and
+  `detect-secrets audit` (label baseline results). See `README.md` for usage examples.
+
+## Key components & where to look
+
+- Engine & orchestration: `detect_secrets/core/` — scanning, baseline handling, and plugin init.
+- Global settings: `detect_secrets/settings.py` — Settings singleton, `default_settings()` and
+  `transient_settings()` context managers. Use these when testing or running programmatic scans.
+- Plugin interface: `detect_secrets/plugins/base.py`. Subclass `RegexBasedDetector` or
+  `BasePlugin`. `secret_type` and `json()` are important for baseline
+  compatibility.
+- Built-in detectors: `detect_secrets/plugins/` — examples of detectors and verification logic
+  (e.g. `high_entropy_strings.py`, `aws.py`). Use these as canonical examples for new plugins.
+- Filters: `detect_secrets/filters/` — filters are referenced by import path or `file://` URL and are
+  configured via the Settings object (see `settings.configure_filters`).
+- Tests & helpers: `testing/` and `tests/` — test fixtures, custom filter/plugin helpers live in
+  `testing/` and unit tests are in `tests/`.
+
+## Project-specific conventions & patterns
+
+- Settings are global via a cached singleton. Prefer `transient_settings({...})` when running
+  isolated changes so caches are cleared and restored.
+
+- Filters are configured by import path strings (e.g.
+  `detect_secrets.filters.heuristic.is_sequential_string`) or by file URLs `file:///path/to.py::func`.
+  Filters must accept injectable variables — see `settings.get_filters()` which attaches
+  `function.injectable_variables`.
+
+- Plugins in baselines are stored as a list of dicts with a `name` key (see
+  `Settings.configure_plugins`). Changing a `secret_type` or a plugin class name
+  breaks compatibility with existing baselines.
+
+## Developer workflows (commands to run)
+
+- Run the test suite (recommended via tox):
+
+  tox  # runs envlist from `tox.ini` (py39..py313, mypy)
+
+  Quick local run without tox:
+
+  python -m pytest --strict tests
+
+- Coverage and CI thresholds (enforced in `tox.ini`):
+  - coverage for `tests/*` must be >= 99%
+  - coverage for `testing/*` must be 100%
+  - coverage for `detect_secrets/*` must be >= 95%
+
+- Run type checking (mypy):
+
+  tox -e mypy
+
+- Pre-commit hooks: configured in `.pre-commit-config.yaml` and installed with `pre-commit install` or via
+  `tox -e venv`, which creates a `venv` env and installs hooks. CI runs `pre-commit run --all-files`.
+
+## How to add a detector or filter (quick checklist)
+
+1. Add a new detector under `detect_secrets/plugins/` by subclassing `RegexBasedDetector` or
+   `BasePlugin` (see `plugins/base.py`). Set a stable `secret_type` and implement `analyze_string`
+   (or `analyze_line`).
+2. Add tests under `tests/plugins/` and helper fixtures in `testing/` if needed.
+3. Run `python -m pytest tests/plugins` and ensure coverage thresholds remain satisfied.
+4. If the detector needs runtime configuration, ensure it can be referenced via the baseline
+   structure (name + optional args) and documented in `README.md` or `docs/`.
+
+## Useful code examples (where to change behavior)
+
+- To temporarily change which plugins run in a script or test, use `transient_settings` in
+  `detect_secrets/settings.py` (example in README under "More Advanced Configuration").
+
+- Verification hooks live in plugin classes (`verify` / `format_scan_result`) and rely on
+  `detect_secrets.constants.VerifiedResult` and the `detect_secrets.filters` settings.
+
+## Integration points & CI
+
+- CI uses GitHub Actions (`.github/workflows/ci.yml`) and enforces tox-based testing and
+  coverage thresholds. Packaging metadata and bumpversion configuration live in `setup.cfg`.
+
+## What an AI agent should avoid changing
+
+- Do not rename an existing plugin class or change `secret_type` without adding a migration
+  note: older baselines rely on the name and JSON fields.
+- Avoid modifying coverage thresholds or CI logic without explicit reviewer sign-off — these
+  are intentionally strict.
+
+## If something's unclear, iterate
+
+Leave a concise PR description or comment that references the exact files changed
+(path + function/class) and ask the maintainer to confirm the intended
+behavior.
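Review note on the baseline-compatibility bullets in this file: the snippet below is a minimal, standard-library-only sketch of why renaming a plugin class matters. It assumes only what the file states, namely that a baseline stores `plugins_used` as a list of dicts with a `name` key plus optional arguments. The helper `find_unknown_plugins`, the sample baseline fragment, and the renamed class `AwsKeyDetector` are hypothetical and not part of detect-secrets.

```python
import json
from typing import Dict, Iterable, List


def find_unknown_plugins(baseline: Dict, available: Iterable[str]) -> List[str]:
    """Return plugin names recorded in the baseline that no longer resolve to a class."""
    known = set(available)
    return [
        entry['name']
        for entry in baseline.get('plugins_used', [])
        if entry['name'] not in known
    ]


# Hypothetical baseline fragment: each entry is a name plus optional arguments.
baseline = json.loads('''
{
  "plugins_used": [
    {"name": "AWSKeyDetector"},
    {"name": "Base64HighEntropyString", "limit": 4.5}
  ]
}
''')

# Simulate a rename (AWSKeyDetector -> AwsKeyDetector): the old name no longer resolves.
available_after_rename = ['AwsKeyDetector', 'Base64HighEntropyString']
unknown = find_unknown_plugins(baseline, available_after_rename)
print(unknown)
```

A check along these lines could run in a test to catch accidental renames before an older baseline fails to load.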
diff --git a/requirements-dev.txt b/requirements-dev.txt
@@ -5,6 +5,7 @@ cfgv==3.4.0
 charset-normalizer==3.3.2
 coverage==7.5.3
 distlib==0.3.8
+fastapi
 filelock==3.16.1
 flake8==7.1.1
 gibberish-detector==0.1.1
@@ -38,5 +39,6 @@ types-requests==2.31.0.20240106
 typing-extensions==4.12.2
 unidiff==0.7.5
 urllib3==2.2.2
+uvicorn[standard]
 virtualenv==20.26.6
 zipp==3.19.2
diff --git a/scripts/api_server.py b/scripts/api_server.py
@@ -0,0 +1,86 @@
+from fastapi import FastAPI, Request
+from fastapi.responses import JSONResponse, HTMLResponse
+from typing import List
+import json
+from datetime import datetime
+from pathlib import Path
+from detect_secrets.core.scan import scan_line
+from detect_secrets.settings import transient_settings
+from detect_secrets.core.plugins.util import get_mapping_from_secret_type_to_class
+
+app = FastAPI()
+
+def get_scan_settings():
+    """Configure scan settings with all available detectors"""
+    mapping = get_mapping_from_secret_type_to_class()
+    return {
+        'plugins_used': [
+            {'name': cls.__name__}
+            for cls in mapping.values()
+        ]
+    }
+
+@app.get("/")
+async def root():
+    """Root endpoint that shows API usage instructions"""
+    return HTMLResponse("""
+        <html>
+            <body>
+                <h1>Detect-Secrets API</h1>
+                <h2>Available Endpoints:</h2>
+                <ul>
+                    <li><b>POST /run_original_scan/</b> - Scan logs for secrets</li>
+                </ul>
+                <h2>Example Usage:</h2>
+                <pre>
+POST /run_original_scan/
+Content-Type: application/json
+
+{
+    "logs": [
+        "line 1 to scan",
+        "line 2 to scan",
+        "..."
+    ]
+}
+                </pre>
+            </body>
+        </html>
+    """)
+
+@app.post("/run_original_scan/")
+async def scan_logs(request: Request):
+    data = await request.json()
+    logs: List[str] = data.get("logs", [])
+
+    results = []
+    # Use transient_settings to ensure all detectors are properly configured
+    with transient_settings(get_scan_settings()):
+        for lineno, line in enumerate(logs, start=1):
+            for secret in scan_line(line):
+                results.append({
+                    'line_number': lineno,
+                    'type': secret.type,
+                    'hashed_secret': secret.secret_hash,
+                    'secret_value': secret.secret_value,  # plaintext secret; debugging only
+                })
+
+    # Create output filename with timestamp
+    timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
+    output_file = Path(f"scan_results_{timestamp}.json")
+
+    # Write the results to the JSON file
+    with open(output_file, 'w', encoding='utf-8') as f:
+        json.dump({
+            "matches": results,
+            "scan_time": timestamp,
+            "total_lines_scanned": len(logs)
+        }, f, indent=2)
+
+    # Return both the results and the file information
+    return JSONResponse({
+        "matches": results,
+        "output_file": str(output_file),
+        "scan_time": timestamp,
+        "total_lines_scanned": len(logs)
+    })
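Review note on `scan_logs`: the handler places `secret_value` in both the HTTP response and the JSON file written to disk, so plaintext secrets leave the process by default. The snippet below is a sketch of an opt-in redaction step using only the standard library. `redact_matches` and its `include_plaintext` parameter are hypothetical names, and the sample match uses made-up values.

```python
from typing import Dict, List


def redact_matches(matches: List[Dict], include_plaintext: bool = False) -> List[Dict]:
    """Return copies of the match dicts, dropping 'secret_value' unless explicitly requested."""
    if include_plaintext:
        return [dict(m) for m in matches]
    return [
        {key: value for key, value in m.items() if key != 'secret_value'}
        for m in matches
    ]


# Made-up match in the same shape the handler builds.
matches = [
    {
        'line_number': 1,
        'type': 'Example Type',
        'hashed_secret': 'abc123',
        'secret_value': 'hunter2',
    },
]

safe = redact_matches(matches)
print(safe)
```

Applying such a step before both `json.dump` and `JSONResponse` would keep the two outputs consistent; the input list is left unmodified.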
diff --git a/scripts/run_original_scan.py b/scripts/run_original_scan.py
@@ -0,0 +1,109 @@
+# #!/usr/bin/env python3
+# """Run detect-secrets using the repository's original plugin mapping (excluding local additions).
+
+# This script builds the list of plugin classnames from the core mapping and excludes
+# `PiiDetector` (our test plugin) to simulate original behavior. It then streams `logs.txt`
+# and prints newline-delimited JSON for any matches.
+# """
+# from __future__ import annotations
+
+# import json
+# from typing import Iterable
+
+# from detect_secrets.core.plugins.util import get_mapping_from_secret_type_to_class
+# from detect_secrets.settings import transient_settings
+# from detect_secrets.core.scan import scan_line
+
+
+# def main(path: str = 'logs.txt') -> int:
+#     mapping = get_mapping_from_secret_type_to_class()
+
+#     # Use all plugins but configure them to reduce false positives
+#     plugin_cfg = [
+#         {'name': cls.__name__}
+#         for cls in mapping.values()
+#     ]
+
+#     settings = {
+#         'plugins_used': plugin_cfg,
+#         'filters_used': [
+#             # Exclude sequential strings that look random but aren't secrets
+#             {'path': 'detect_secrets.filters.heuristic.is_sequential_string'},
+#             # Exclude strings that are likely to be gibberish
+#             {'path': 'detect_secrets.filters.heuristic.is_likely_id_string'},
+#             # Exclude normal looking file paths
+#             {'path': 'detect_secrets.filters.heuristic.is_templated_secret'},
+#             # Only flag strings that match common secret patterns
+#             {'path': 'detect_secrets.filters.common.is_secret_pattern'},
+#         ],
+#         # Increase minimum entropy threshold for base64 strings
+#         'base64_limit': 5.5,
+#         # Increase minimum entropy threshold for hex strings
+#         'hex_limit': 3.5,
+#     }
+
+#     with transient_settings(settings):
+#         with open(path, 'r', encoding='utf-8', errors='replace') as fh:
+#             for lineno, line in enumerate(fh, start=1):
+#                 for secret in scan_line(line):
+#                     out = {
+#                         'file': path,
+#                         'line_number': lineno,
+#                         'type': secret.type,
+#                         'hashed_secret': secret.secret_hash,
+#                     }
+#                     # Avoid printing plaintext secrets by default (privacy)
+#                     print(json.dumps(out, ensure_ascii=False))
+
+#     return 0
+
+
+# if __name__ == '__main__':
+#     raise SystemExit(main())
+
+
+"""Run detect-secrets using the repository's original plugin mapping (excluding local additions).
+
+This script builds the list of plugin classnames from the core mapping and excludes
+`PiiDetector` (our test plugin) to simulate original behavior. It then streams `logs.txt`
+and prints newline-delimited JSON for any matches.
+"""
+from __future__ import annotations
+
+import json
+from typing import Iterable
+
+from detect_secrets.core.plugins.util import get_mapping_from_secret_type_to_class
+from detect_secrets.settings import transient_settings
+from detect_secrets.core.scan import scan_line
+
+
+def main(path: str = 'logs.txt') -> int:
+    mapping = get_mapping_from_secret_type_to_class()
+
+    # Use the original plugins as defined by their secret_type mapping, but remove
+    # our experimental plugin (if present) so we simulate the original repo behavior.
+    plugin_cfg = [
+        {'name': cls.__name__}
+        for cls in mapping.values()
+        if cls.__name__ != 'PiiDetector'
+    ]
+
+    with transient_settings({'plugins_used': plugin_cfg}):
+        with open(path, 'r', encoding='utf-8', errors='replace') as fh:
+            for lineno, line in enumerate(fh, start=1):
+                for secret in scan_line(line):
+                    out = {
+                        'file': path,
+                        'line_number': lineno,
+                        'type': secret.type,
+                        'hashed_secret': secret.secret_hash,
+                    }
+                    # Avoid printing plaintext secrets by default (privacy)
+                    print(json.dumps(out, ensure_ascii=False))
+
+    return 0
+
+
+if __name__ == '__main__':
+    raise SystemExit(main())
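Review note on the output format: the script prints one JSON object per line (newline-delimited JSON). The snippet below is a sketch of how a consumer might summarise that output, assuming each line carries the `type` key shown in the script. `count_by_type` is a hypothetical helper, and the sample lines are invented for illustration rather than real scan output.

```python
import json
from collections import Counter
from typing import Iterable


def count_by_type(ndjson_lines: Iterable[str]) -> Counter:
    """Count matches per secret type from newline-delimited JSON lines."""
    counts: Counter = Counter()
    for raw in ndjson_lines:
        raw = raw.strip()
        if not raw:
            continue  # tolerate blank lines between records
        counts[json.loads(raw)['type']] += 1
    return counts


# Invented sample records in the shape the script emits.
sample = [
    '{"file": "logs.txt", "line_number": 3, "type": "Example Type A", "hashed_secret": "aaa"}',
    '{"file": "logs.txt", "line_number": 7, "type": "Example Type B", "hashed_secret": "bbb"}',
    '',
    '{"file": "logs.txt", "line_number": 9, "type": "Example Type A", "hashed_secret": "ccc"}',
]

summary = count_by_type(sample)
print(dict(summary))
```

Because each record is a complete JSON document on its own line, the consumer can process the stream incrementally without loading the whole output.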
diff --git a/scripts/scan_logs.py b/scripts/scan_logs.py
@@ -0,0 +1,86 @@
+#!/usr/bin/env python3
+"""Simple wrapper to scan log files line-by-line using detect-secrets' adhoc API.
+
+Usage examples:
+  # scan default `log.txt` in repo root and pretty-print results
+  python scripts/scan_logs.py
+
+  # scan a specific file and output newline-delimited JSON, showing matched secret values
+  python scripts/scan_logs.py /path/to/logfile.log --json --show-secret
+
+  # only enable a specific plugin by class name
+  python scripts/scan_logs.py --only PiiDetector
+"""
+from __future__ import annotations
+
+import argparse
+import json
+import sys
+from typing import Iterable
+
+from detect_secrets.core.scan import scan_line
+from detect_secrets.settings import default_settings
+from detect_secrets.settings import transient_settings
+
+
+def iter_matches_for_file(path: str, encoding: str = 'utf-8') -> Iterable[dict]:
+    with open(path, 'r', encoding=encoding, errors='replace') as fh:
+        for lineno, line in enumerate(fh, start=1):
+            for secret in scan_line(line):
+                yield {
+                    'file': path,
+                    'line_number': lineno,
+                    'type': secret.type,
+                    'hashed_secret': secret.secret_hash,
+                    'secret_value': secret.secret_value,
+                }
+
+
+def main(argv=None) -> int:
+    parser = argparse.ArgumentParser(description='Scan log files for secrets/PII using detect-secrets')
+    parser.add_argument('path', nargs='?', default='log.txt', help='Path to log file (default: log.txt)')
+    parser.add_argument('--only', action='append', dest='only_plugins', help='Only enable these plugin class names (repeatable)')
+    parser.add_argument('--json', action='store_true', dest='as_json', help='Print newline-delimited JSON for each match')
+    parser.add_argument('--show-secret', action='store_true', dest='show_secret', help='Include plaintext secret_value in output (off by default)')
+    parser.add_argument('--encoding', default='utf-8', help='File encoding (default: utf-8)')
+
+    args = parser.parse_args(argv)
+
+    # If the user asked to limit plugins, restrict the scan to those class names.
+    if args.only_plugins:
+        plugins_cfg = [{'name': name} for name in args.only_plugins]
+        settings_ctx = transient_settings({'plugins_used': plugins_cfg})
+    else:
+        settings_ctx = default_settings()  # no --only given: enable the built-in plugins
+
+    try:
+        if settings_ctx is not None:
+            settings_ctx.__enter__()
+
+        # Stream and print, honouring the --encoding option
+        for match in iter_matches_for_file(args.path, args.encoding):
+            out = {
+                'file': match['file'],
+                'line_number': match['line_number'],
+                'type': match['type'],
+                'hashed_secret': match['hashed_secret'],
+            }
+            if args.show_secret:
+                out['secret_value'] = match['secret_value']
+
+            if args.as_json:
+                print(json.dumps(out, ensure_ascii=False))
+            else:
+                # Pretty print
+                secret_display = out.get('secret_value', '<hidden>') if args.show_secret else '<hidden>'
+                print(f"{out['file']}:{out['line_number']} [{out['type']}] {secret_display}")
+
+    finally:
+        if settings_ctx is not None:
+            settings_ctx.__exit__(None, None, None)
+
+    return 0
+
+
+if __name__ == '__main__':
+    raise SystemExit(main())
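Review note on the `try`/`finally` block in `main`: it calls `__enter__` and `__exit__` by hand, and `__exit__(None, None, None)` discards any exception information. A `with` statement covers the same ground, and where a branch needs no settings change, `contextlib.nullcontext()` can stand in as a no-op. The snippet below is a standard-library-only sketch of that pattern. `fake_transient_settings` is a hypothetical stand-in that only records enter and exit events; it is not the detect-secrets function.

```python
from contextlib import contextmanager, nullcontext
from typing import Iterator, List, Optional

events: List[str] = []


@contextmanager
def fake_transient_settings(config: dict) -> Iterator[None]:
    """Hypothetical stand-in for a settings context manager; records enter/exit."""
    events.append(f'enter {config}')
    try:
        yield
    finally:
        events.append('exit')


def run_scan(only_plugins: Optional[List[str]]) -> None:
    # Use the settings context manager when a restriction was requested,
    # otherwise a no-op context manager.
    if only_plugins:
        ctx = fake_transient_settings(
            {'plugins_used': [{'name': name} for name in only_plugins]},
        )
    else:
        ctx = nullcontext()

    with ctx:  # enter/exit are handled by the with statement, also on exceptions
        events.append('scan')


run_scan(['PiiDetector'])
run_scan(None)
print(events)
```

Both branches then share a single `with` block, so the scan loop does not need `is not None` guards around it.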