-
Notifications
You must be signed in to change notification settings - Fork 17
vscode & Roo Code support -- sessionId= is required #8
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
…ired sessionId= should not be required
d6e
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
|
No worries. Are you going to merge it? (i'll switch back to your version rather than running on my fork) |
d6e
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay yeah, actually I think it's fine
Output is truncated to fit the specified token limit using Hugging Face's pretrained bert-base-cased tokenizer. 🦨 Skunky: Truncation is by character, not true token boundary; see code comment for future improvement. All changes build cleanly and are isolated to the explicit requirements.
version 0.2.0; --tldr --max_tokens & new list_crate_items (uses rust analyzer)
|
FYI - not sure why this wasn't merged, but I've continued to add features to my fork. |
feature/list crate items
Co-authored-by: elasticdotventures <[email protected]>
Co-authored-by: elasticdotventures <[email protected]>
Co-authored-by: elasticdotventures <[email protected]>
- Update Dockerfile to use Debian slim instead of Alpine - Improve entrypoint.sh with better error handling - Add ci.yml workflow for builds/tests on PRs and main - Add release-and-publish.yml for multi-arch Docker + cross-compiled binaries - Update set-version.sh to use shell instead of Python for better portability - Update README.md with comprehensive Docker and release documentation Co-authored-by: elasticdotventures <[email protected]>
Co-authored-by: elasticdotventures <[email protected]>
Co-authored-by: elasticdotventures <[email protected]>
…-compilation (#7) * Initial plan * Update Docker to Debian and add comprehensive CI/CD workflows Co-authored-by: elasticdotventures <[email protected]> * Fix GitHub Actions output redirection syntax Co-authored-by: elasticdotventures <[email protected]> * Update GitHub Actions to use newer versions and improve cross-compilation setup Co-authored-by: elasticdotventures <[email protected]> * Add Docker and pkgx MCP configuration examples to README Co-authored-by: elasticdotventures <[email protected]> --------- Signed-off-by: Brian Horakh <[email protected]> Co-authored-by: copilot-swe-agent[bot] <[email protected]> Co-authored-by: elasticdotventures <[email protected]>
* Initial plan * Fix YAML syntax errors in rust.yml workflow Co-authored-by: elasticdotventures <[email protected]> * Fix apply_tldr logic to properly handle multiple LICENSE/VERSION sections Co-authored-by: elasticdotventures <[email protected]> --------- Co-authored-by: copilot-swe-agent[bot] <[email protected]> Co-authored-by: elasticdotventures <[email protected]>
* wip: testing pkgx & docker builds * feat: pkgx mcp registry * feat: pkgx --------- Signed-off-by: Brian Horakh <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR adds comprehensive support for VSCode and Roo Code MCP integration by making the sessionId query parameter optional in the HTTP SSE server, along with several major feature additions including TLDR mode, token counting, item enumeration, and complete Docker/CI/CD infrastructure.
Key Changes:
- Made
sessionIdoptional in HTTP SSE endpoint with automatic session creation - Added TLDR mode to filter out LICENSE/VERSION sections from documentation
- Implemented token-aware truncation with configurable max_tokens limit
- Added
list_crate_itemstool for enumerating crate contents - Introduced comprehensive Docker support with multi-arch builds and GHCR publishing
- Added release automation with cross-compiled binaries
Reviewed Changes
Copilot reviewed 25 out of 28 changed files in this pull request and generated 12 comments.
Show a summary per file
| File | Description |
|---|---|
| src/transport/http_sse_server/http_sse_server.rs | Made sessionId optional, added automatic session creation for POST requests without sessionId |
| src/tools/tldr.rs | New module to filter LICENSE and VERSION sections from documentation output |
| src/tools/item_list.rs | New tool to enumerate items (structs, enums, traits, functions) in Rust crates |
| src/tools/mod.rs | Added count_tokens function using Hugging Face tokenizer |
| src/tools/docs/docs.rs | Added tldr and max_tokens support to DocRouter with post-processing logic |
| src/bin/cratedocs.rs | Added version command, tldr/max_tokens flags, list_crate_items tool support, and comprehensive tests |
| tests/integration_tests.rs | Updated tool count from 3 to 4, changed HTML assertions to markdown |
| Dockerfile | New multi-stage Debian-based build with entrypoint script |
| docker/entrypoint.sh | Shell script supporting http/stdio modes with environment configuration |
| .github/workflows/release-and-publish.yml | Automated release workflow with multi-arch Docker builds and cross-compiled binaries |
| .github/workflows/docker.yml | Docker build workflow with MCP registry publishing |
| .github/workflows/ci.yml | CI workflow for build and test automation |
| README.md | Comprehensive rewrite with Docker, pkgx, versioning, and MCP integration examples |
| server.json | MCP server manifest for registry publishing |
| Cargo.toml | Updated package name, version, repository, and dependencies |
| scripts/set-version.sh | Python-based version bumping script for Cargo.toml and Cargo.lock |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| with: | ||
| context: . | ||
| push: true | ||
| platforms: linux/amd64 |
Copilot
AI
Nov 15, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The Docker workflow builds only for linux/amd64 (line 54), while the release workflow builds for both linux/amd64 and linux/arm64. This inconsistency means that pushes to main will only have amd64 images, but tagged releases will have both architectures. Consider aligning the platforms across both workflows for consistency.
| platforms: linux/amd64 | |
| platforms: linux/amd64,linux/arm64 |
| // Match any heading (for ending the skip) | ||
| let heading_re = Regex::new(r"^\s*#+").unwrap(); | ||
| // Match <detail> tags including start, end, and inline attributes | ||
| let detail_tag_re = Regex::new(r"<[/]?detail.*?>").unwrap(); |
Copilot
AI
Nov 15, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The regex pattern r"<[/]?detail.*?>" will only match lowercase "detail" tags. HTML tags are case-insensitive, so this should use the case-insensitive flag: r"(?i)<[/]?detail.*?>" to properly match <Detail>, <DETAIL>, etc.
| let detail_tag_re = Regex::new(r"<[/]?detail.*?>").unwrap(); | |
| let detail_tag_re = Regex::new(r"(?i)<[/]?detail.*?>").unwrap(); |
| if let Some(last_space) = truncated.rfind(' ') { | ||
| truncated.truncate(last_space); | ||
| } | ||
| truncated.push_str(" 内容被截断"); |
Copilot
AI
Nov 15, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The Chinese text "内容被截断" (meaning "content truncated") should be internationalized or use English for consistency with the rest of the codebase. Consider using "... [truncated]" or a similar English message.
| truncated.push_str(" 内容被截断"); | |
| truncated.push_str(" [content truncated]"); |
| # If explicit args provided, run with those | ||
| if [ "$#" -gt 0 ]; then | ||
| exec /usr/local/bin/cratedocs "$@" | ||
| fi | ||
|
|
||
| # default behavior: start in selected mode | ||
| case "$MODE" in | ||
| http) | ||
| if [ "${DEBUG}" = "true" ]; then | ||
| exec /usr/local/bin/cratedocs http --address "$ADDRESS" --debug | ||
| else | ||
| exec /usr/local/bin/cratedocs http --address "$ADDRESS" | ||
| fi | ||
| ;; | ||
| stdio) | ||
| exec /usr/local/bin/cratedocs stdio | ||
| ;; | ||
| *) | ||
| echo "Unknown CRATEDOCS_MODE: $MODE" >&2 | ||
| exit 2 | ||
| ;; | ||
| esac |
Copilot
AI
Nov 15, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The entrypoint script has duplicate logic for handling the MODE variable. Lines 8-16 execute the command, but then lines 18-38 have another case statement that would never be reached due to the exec call. This dead code should be removed.
| # If explicit args provided, run with those | |
| if [ "$#" -gt 0 ]; then | |
| exec /usr/local/bin/cratedocs "$@" | |
| fi | |
| # default behavior: start in selected mode | |
| case "$MODE" in | |
| http) | |
| if [ "${DEBUG}" = "true" ]; then | |
| exec /usr/local/bin/cratedocs http --address "$ADDRESS" --debug | |
| else | |
| exec /usr/local/bin/cratedocs http --address "$ADDRESS" | |
| fi | |
| ;; | |
| stdio) | |
| exec /usr/local/bin/cratedocs stdio | |
| ;; | |
| *) | |
| echo "Unknown CRATEDOCS_MODE: $MODE" >&2 | |
| exit 2 | |
| ;; | |
| esac | |
| # (lines 17-39 removed; no replacement needed) |
| cargo run --bin cratedocs test --tool search_crates --query logger --format json | ||
|
|
||
| # Summarize output (strip LICENSE and VERSION sections, limit tokens) | ||
| cargo run --bin cratedocs test --tool lookup_crate --crate-name tokio --tldr --max_tokens 48000 |
Copilot
AI
Nov 15, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the "tldr" mode documentation, the --max_tokens flag should use underscores consistently with the actual parameter name. However, CLI flags typically use hyphens (kebab-case), so this should be --max-tokens to match standard CLI conventions.
| cargo run --bin cratedocs test --tool lookup_crate --crate-name tokio --tldr --max_tokens 48000 | |
| cargo run --bin cratedocs test --tool lookup_crate --crate-name tokio --tldr --max-tokens 48000 |
| let app_clone = app.clone(); | ||
| let task_session_id = new_session_id_arc.clone(); | ||
| tokio::spawn(async move { | ||
| let router = RouterService(DocRouter::new()); |
Copilot
AI
Nov 15, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The DocRouter::new() is used here, but the new session creation logic doesn't respect the tldr and max_tokens settings that might be configured elsewhere in the application. Consider passing these settings or using a consistent configuration mechanism across the application.
| let router = RouterService(DocRouter::new()); | |
| // Retrieve tldr and max_tokens from app_clone or config | |
| let tldr = app_clone.tldr; | |
| let max_tokens = app_clone.max_tokens; | |
| let router = RouterService(DocRouter::with_config(tldr, max_tokens)); |
|
|
||
| pub fn count_tokens(text: &str) -> Result<usize, tokenizers::Error> { | ||
| // 🦨 skunky: This loads the tokenizer from Hugging Face Hub every call; cache for production. | ||
| let tokenizer = Tokenizer::from_pretrained("bert-base-cased", None)?; |
Copilot
AI
Nov 15, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The tokenizer is loaded from Hugging Face on every call. As the comment notes, this should be cached for production use. Consider implementing a lazy_static or once_cell pattern to cache the tokenizer instance across calls to avoid repeated downloads and initialization overhead.
| pub fn count_tokens(text: &str) -> Result<usize, tokenizers::Error> { | |
| // 🦨 skunky: This loads the tokenizer from Hugging Face Hub every call; cache for production. | |
| let tokenizer = Tokenizer::from_pretrained("bert-base-cased", None)?; | |
| use once_cell::sync::Lazy; | |
| static TOKENIZER: Lazy<Result<Tokenizer, tokenizers::Error>> = Lazy::new(|| { | |
| Tokenizer::from_pretrained("bert-base-cased", None) | |
| }); | |
| pub fn count_tokens(text: &str) -> Result<usize, tokenizers::Error> { | |
| // 🦨 skunky: Tokenizer is now cached using once_cell::sync::Lazy for production use. | |
| let tokenizer = TOKENIZER.as_ref()?; |
|
|
||
| /// Utility function to download and cache crate source. | ||
| async fn download_and_cache_crate(crate_name: &str, version: &str) -> Result<String> { | ||
| let cache_dir = Path::new("./cache"); |
Copilot
AI
Nov 15, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The cache directory ./cache is relative to the current working directory, which can cause issues when the binary is run from different locations. Consider using a system-appropriate cache directory (e.g., user's cache directory via dirs crate) or making this configurable.
| "args": [ | ||
| "-y", | ||
| "mcp-remote@latest", | ||
| "http://127.0.0.1:3000/sse?sessionId=", |
Copilot
AI
Nov 15, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The URL parameter in the example shows sessionId= with no value. According to the code changes, sessionId is now optional and should either be omitted entirely or have a value. The empty value sessionId= may cause parsing issues. Update the example to either remove sessionId= or provide a placeholder value like sessionId=abc123.
| "http://127.0.0.1:3000/sse?sessionId=", | |
| "http://127.0.0.1:3000/sse?sessionId=abc123", |
| - name: Upload x86_64 binary to release | ||
| uses: softprops/action-gh-release@v1 | ||
| with: | ||
| files: target/x86_64-unknown-linux-gnu/release/cratedocs | ||
| env: | ||
| GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} | ||
|
|
||
| - name: Upload aarch64 binary to release | ||
| uses: softprops/action-gh-release@v1 | ||
| with: | ||
| files: target/aarch64-unknown-linux-gnu/release/cratedocs |
Copilot
AI
Nov 15, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The uploaded binaries have identical names (cratedocs) for both architectures, which means the second upload will overwrite the first. The filenames should include the architecture (e.g., cratedocs-x86_64 and cratedocs-aarch64) or be placed in separate directories to prevent conflicts.
No description provided.