Stabilize native responses history and add cleaned Exposure Studio#67
Conversation
| finally: | ||
| try: | ||
| lock_path.unlink(missing_ok=True) | ||
| except Exception: |
|
|
||
| import argparse | ||
| import json | ||
| import os |
|
|
||
| import argparse | ||
| import json | ||
| import os |
|
|
||
| import argparse | ||
| import json | ||
| import re |
| from lane_shared import ( | ||
| acquire_lane_lock, | ||
| SIGNAL_DEFINITIONS, | ||
| SignalDefinition, | ||
| env_or_default, | ||
| extract_signals, | ||
| iso_now, | ||
| load_last_good_lane, | ||
| read_json_file, | ||
| resolve_suggested_paths, | ||
| save_last_good_lane, | ||
| sanitize_text, | ||
| split_sentences, | ||
| write_json_file, | ||
| ) |
| try { | ||
| return `key:${crypto | ||
| .createHash('sha256') | ||
| .update(normalizedApiKey) |
Check failure
Code scanning / CodeQL
Use of password hash with insufficient computational effort High
Show autofix suggestion
Hide autofix suggestion
Copilot Autofix
AI about 1 month ago
General approach: Ensure that when a sensitive secret (like an API key) is transformed into a stored/long‑lived identifier, we use a key derivation / password hashing function that is intentionally slow and parameterizable (e.g., crypto.scryptSync or crypto.pbkdf2Sync) rather than a fast hash or HMAC. This both aligns with CodeQL’s expectations and increases resistance to brute‑force attacks if these identifiers and the derivation parameters are ever exposed.
Best concrete fix here:
- Replace the use of
crypto.createHmac('sha256', 'anygpt-native-owner-scope')with a call to a password‑hashing / KDF function such ascrypto.scryptSync, which is built into Node and requires no new dependencies. - Keep behavior as close as possible: we still want a deterministic, pseudonymous, truncated hex string derived from the API key, using the same static string as the salt so that existing scopes remain stable.
- Implementation details:
- Use
crypto.scryptSync(normalizedApiKey, 'anygpt-native-owner-scope', 32)to derive 32 bytes. - Convert to hex and slice to 24 chars as before.
- Wrap this in the existing
try/catchblock; on errors, retain the current fallback ofnormalizedApiKey.slice(0, 8).
- Use
Required code changes (all in apps/api/routes/nativeProviders.ts):
- In
buildNativeResponsesOwnerScope, lines 300–304, replace the HMAC-based derivation withcrypto.scryptSyncas described above. - No new imports are needed because we already import
cryptofrom'node:crypto'.
| @@ -297,11 +297,12 @@ | ||
| : ''; | ||
| if (!normalizedApiKey) return undefined; | ||
| try { | ||
| return `key:${crypto | ||
| .createHmac('sha256', 'anygpt-native-owner-scope') | ||
| .update(normalizedApiKey) | ||
| .digest('hex') | ||
| .slice(0, 24)}`; | ||
| const derivedKey = crypto.scryptSync( | ||
| normalizedApiKey, | ||
| 'anygpt-native-owner-scope', | ||
| 32 | ||
| ); | ||
| return `key:${derivedKey.toString('hex').slice(0, 24)}`; | ||
| } catch { | ||
| return `key:${normalizedApiKey.slice(0, 8)}`; | ||
| } |
| error: buildSafeUpstreamErrorMessage(statusCode, { | ||
| label: `${family} native request`, | ||
| rateLimitMessage: | ||
| 'Rate limit or quota exceeded at the upstream provider. Please retry later.' | ||
| }), |
| request.method === 'GET' && | ||
| (subpath === '/models' || subpath === '/v1/models') | ||
| ) { | ||
| const payload = await buildModelsPayload(request); |
| (subpath === '/models' || subpath === '/v1/models') | ||
| ) { | ||
| const payload = await buildModelsPayload(request); | ||
| applyResponseHeaders(response, payload.headers); |
| completionTokens: usage.completionTokens | ||
| }); | ||
| } | ||
| let responsePayloadForClient = parsedResponse; |
| } | ||
| } | ||
|
|
||
| function estimateTokens(content: unknown): number { |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 0f83366301
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| responsePayloadForClient = rewriteNativeResponsesResponseObject( | ||
| parsedResponse, | ||
| nativeResponsesHistoryContext.proxyResponseId | ||
| ); |
There was a problem hiding this comment.
Map proxy IDs before forwarding /responses/{id} requests
This code rewrites successful native /responses objects to a proxy-owned id, but only previous_response_id in request bodies is translated later; path-based operations never get remapped. In practice, a client that calls responses.retrieve(created.id) (or cancel/input-items endpoints using that returned id) will send the proxy id upstream and receive a not-found error because the upstream only knows its original id. Resolve stored upstream_response_id for path-based response-id routes before forwarding.
Useful? React with 👍 / 👎.
| } else { | ||
| console.log(' 𐄂 OpenRouter compatible routes disabled.'); | ||
| } | ||
| app.use('/native', nativeProviderRouter); |
There was a problem hiding this comment.
Respect ENABLE_ route flags for native passthrough*
The native router is mounted unconditionally, so disabling a family with ENABLE_OPENAI_ROUTES=false (or similar flags) no longer prevents traffic to that family because /native/<family>/... remains available. This bypasses the documented per-router enable/disable controls and can unexpectedly expose providers operators intended to turn off. Gate native family handling behind the same route flags (or an equivalent native-specific toggle).
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Pull request overview
Adds a new “Exposure Studio” (SurfaceScope) demo app and introduces a native provider passthrough router with stabilized /responses history handling, while isolating API test harness data under .tmp/test-data.
Changes:
- Added Exposure Studio (Rust + static frontend) plus Python research lane runners and shared lane utilities.
- Introduced
/native/:family/*passthrough router with proxy-owned Responses API ids, deterministicprevious_response_idhandling, and provider provenance persisted in history entries. - Updated API test harness/scripts to write isolated fixture files under
apps/api/.tmp/test-dataand added SDK/native responses regression coverage.
Reviewed changes
Copilot reviewed 32 out of 35 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| apps/exposure-studio/src/models.rs | Defines JSON-serializable SurfaceScope domain and lane output models. |
| apps/exposure-studio/src/main.rs | Implements Axum backend, JSON loading, enrichment logic, and approved-resource scanning endpoints. |
| apps/exposure-studio/run.sh | Adds a unified runner for Rust app + Python lane modes with venv bootstrapping. |
| apps/exposure-studio/README.md | Documents SurfaceScope purpose, safe-use guardrails, and lane workflows. |
| apps/exposure-studio/PRODUCT_SCOPE.md | Captures product positioning, guardrails, and bounded improvement roadmap. |
| apps/exposure-studio/frontend/styles.css | Adds frontend styling for the SurfaceScope dashboard UI. |
| apps/exposure-studio/frontend/index.html | Adds the SurfaceScope dashboard layout and panels. |
| apps/exposure-studio/frontend/app.js | Implements frontend data fetching and rendering for dashboard panels. |
| apps/exposure-studio/data/assets.json | Adds demo asset inventory fixtures (authorized assets + approved resources). |
| apps/exposure-studio/data/findings.json | Adds demo finding fixtures with workflow metadata (visibility, report status, grace). |
| apps/exposure-studio/data/compliance_sources.json | Adds demo compliance/reference fixtures. |
| apps/exposure-studio/data/testing_methodologies.json | Adds demo testing methodology fixtures. |
| apps/exposure-studio/data/vulnerability_intelligence.json | Adds demo vulnerability intelligence fixtures. |
| apps/exposure-studio/data/research_web_targets.json | Adds website lane seed targets for bounded crawling. |
| apps/exposure-studio/Cargo.toml | Introduces the new surfacescope Rust crate and dependencies. |
| apps/exposure-studio/backend/youtube_research_runner.py | Adds transcript-first YouTube lane runner with retries, caching, and “doctor” mode. |
| apps/exposure-studio/backend/website_research_runner.py | Adds bounded same-origin website crawl lane runner with “doctor” mode. |
| apps/exposure-studio/backend/implementation_lane_runner.py | Adds unified lane merger + health status computation + “doctor” aggregation. |
| apps/exposure-studio/backend/lane_shared.py | Adds shared signal definitions, extraction helpers, and lane lock/snapshot utilities. |
| apps/exposure-studio/backend/requirements-youtube-research.txt | Pins Python deps for YouTube transcription lane. |
| apps/exposure-studio/.lane-secrets.env.example | Provides example env vars for cookies/proxy/retry tuning. |
| apps/exposure-studio/.gitignore | Ignores venv, generated lane outputs, locks/snapshots, and cookie artifacts. |
| apps/api/server.ts | Mounts the new native passthrough router under /native. |
| apps/api/routes/nativeProviders.ts | Implements /native passthrough routing, failover/cooldowns, and responses-history rewriting/persistence. |
| apps/api/modules/responsesHistory.ts | Extends stored history entries with owner scope and provider provenance fields. |
| apps/api/modules/providerIdentity.ts | Adds provider family resolution utilities used by native routing. |
| apps/api/package.json | Moves test harness providers/keys/models files under .tmp/test-data. |
| apps/api/dev/testWs.ts | Reuses a shared default test API key constant. |
| apps/api/dev/testSetup.ts | Writes isolated test fixtures under apps/api/.tmp/test-data and centralizes default test API key. |
| apps/api/dev/testOpenAISdk.ts | Adds native /responses regression coverage (proxy ids + previous_response_id behavior). |
| apps/api/dev/testApi.ts | Uses shared default test API key for API tests. |
| apps/api/dev/runTestServer.sh | Boots test server with .tmp/test-data fixture paths and allows mock providers during test boot. |
| .gitignore | Adds ignores for additional apps/api/.test-data/* fixture files (legacy path). |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| fn extract_html_title(body: &str) -> String { | ||
| let lower = body.to_lowercase(); | ||
| let open = lower.find("<title>"); | ||
| let close = lower.find("</title>"); | ||
| match (open, close) { | ||
| (Some(start), Some(end)) if end > start + 7 => body[start + 7..end].trim().to_string(), | ||
| _ => String::new(), | ||
| } | ||
| } |
There was a problem hiding this comment.
extract_html_title lowercases the entire body and then uses the byte indices from the lowercased string to slice the original body. For non-ASCII input, to_lowercase() can change UTF-8 byte length, which can make the slice indices invalid and panic at runtime. Consider searching in a byte-wise case-insensitive way (ASCII only) or locating the <title> bounds on the original string without reusing indices from a transformed copy.
| for url in candidate_urls_for_resource(resource) { | ||
| match client.get(url.clone()).send().await { | ||
| Ok(response) => { | ||
| final_url = response.url().to_string(); | ||
| http_status = Some(response.status().as_u16()); | ||
| server_header = response | ||
| .headers() | ||
| .get(header::SERVER) | ||
| .and_then(|value| value.to_str().ok()) | ||
| .unwrap_or_default() | ||
| .to_string(); | ||
| let body = response.text().await.unwrap_or_default(); | ||
| page_title = extract_html_title(&body); | ||
| reachability = if http_status == Some(200) { | ||
| "reachable".to_string() | ||
| } else { | ||
| "reachable-with-error".to_string() | ||
| }; | ||
| if page_title.is_empty() { | ||
| notes.push("HTTP response did not include an HTML title tag.".to_string()); | ||
| } | ||
| break; | ||
| } | ||
| Err(error) => { | ||
| notes.push(format!("HTTP probe failed for {url}: {error}")); | ||
| } | ||
| } | ||
| } |
There was a problem hiding this comment.
The scan loop stops on the first HTTP response even when it is a non-200 (e.g. HTTPS returns 404), so protocol fallbacks from candidate_urls_for_resource() (https→http) may never be attempted. If the intent is to find a reachable origin, consider continuing to the next candidate when the status is not successful (or at least not 2xx), while still recording the final attempt details when all candidates fail.
| function renderAssets(assets) { | ||
| return ` | ||
| <ul class="list"> | ||
| ${assets.map((asset) => ` | ||
| <li class="card"> | ||
| <div class="card-row"> | ||
| <strong>${asset.name}</strong> | ||
| <span class="pill pill-${asset.authorization_state}">${asset.authorization_state}</span> | ||
| </div> | ||
| <p>${asset.kind} · ${asset.owner}</p> | ||
| ${asset.resource_scope ? `<p class="detail"><strong>Scope:</strong> ${asset.resource_scope}</p>` : ''} | ||
| ${asset.authorization_basis ? `<p class="detail"><strong>Authorization basis:</strong> ${asset.authorization_basis}</p>` : ''} | ||
| ${asset.authorization_reference ? `<p class="detail"><strong>Authorization proof:</strong> ${asset.authorization_reference}</p>` : ''} | ||
| <p class="muted">${asset.notes}</p> | ||
| </li> | ||
| `).join('')} | ||
| </ul> | ||
| `; |
There was a problem hiding this comment.
The frontend renders API-provided strings via template literals into innerHTML without escaping (e.g. asset.name, asset.notes, etc.). If any of this data can be influenced by users/operators, this is an XSS risk. Prefer creating DOM nodes and setting textContent, or escape interpolated values before inserting HTML.
|
@copilot automatically fix issues codeql found with pr |
- nativeProviders.ts: replace createHash('sha256') with createHmac for
owner scope ID derivation (fixes CodeQL CWE-916 insufficient hash effort)
- app.js: add escapeHtml()/safeHref() helpers and apply throughout all
render functions to prevent XSS via innerHTML template interpolation
- main.rs: fix extract_html_title to use byte-level ASCII lowercasing
(avoids UTF-8 index mismatches from to_lowercase() on non-ASCII input)
- main.rs: fix scan_resource loop to continue on non-2xx responses,
only break on successful 2xx (allows http fallback after https 404)
Agent-Logs-Url: https://github.com/AnyVM-Tech/AnyGPT/sessions/212a346a-4054-4a08-9865-7cb1c6b439f2
Co-authored-by: skullcrushercmd <93234024+skullcrushercmd@users.noreply.github.com>
Fixed all CodeQL and reviewer-flagged issues in commit 248f840:
|
createHash('sha256')with HMAC for owner scope derivation innativeProviders.tsescapeHtml/safeHrefhelpers and apply throughoutapp.jsinexposure-studioextract_html_title: use byte-level ASCII lowercasing instead ofto_lowercase()to avoid UTF-8 index mismatches