Skip to content

Conversation

@elasticdotventures
Copy link

No description provided.

d6e
d6e previously approved these changes May 30, 2025
Copy link
Owner

@d6e d6e left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@d6e d6e dismissed their stale review May 30, 2025 22:57

I realized I overlooked several issues

@elasticdotventures
Copy link
Author

No worries. Are you going to merge it? (i'll switch back to your version rather than running on my fork)

Copy link
Owner

@d6e d6e left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay yeah, actually I think it's fine

elasticdotventures and others added 9 commits July 5, 2025 05:45
Output is truncated to fit the specified token limit using Hugging Face's pretrained bert-base-cased tokenizer.
🦨 Skunky: Truncation is by character, not true token boundary; see code comment for future improvement.
All changes build cleanly and are isolated to the explicit requirements.
version 0.2.0; --tldr --max_tokens & new list_crate_items (uses rust analyzer)
@elasticdotventures
Copy link
Author

FYI - not sure why this wasn't merged, but I've continued to add features to my fork.

elasticdotventures and others added 10 commits November 13, 2025 08:25
Co-authored-by: elasticdotventures <[email protected]>
- Update Dockerfile to use Debian slim instead of Alpine
- Improve entrypoint.sh with better error handling
- Add ci.yml workflow for builds/tests on PRs and main
- Add release-and-publish.yml for multi-arch Docker + cross-compiled binaries
- Update set-version.sh to use shell instead of Python for better portability
- Update README.md with comprehensive Docker and release documentation

Co-authored-by: elasticdotventures <[email protected]>
Co-authored-by: elasticdotventures <[email protected]>
…-compilation (#7)

* Initial plan

* Update Docker to Debian and add comprehensive CI/CD workflows

Co-authored-by: elasticdotventures <[email protected]>

* Fix GitHub Actions output redirection syntax

Co-authored-by: elasticdotventures <[email protected]>

* Update GitHub Actions to use newer versions and improve cross-compilation setup

Co-authored-by: elasticdotventures <[email protected]>

* Add Docker and pkgx MCP configuration examples to README

Co-authored-by: elasticdotventures <[email protected]>

---------

Signed-off-by: Brian Horakh <[email protected]>
Co-authored-by: copilot-swe-agent[bot] <[email protected]>
Co-authored-by: elasticdotventures <[email protected]>
* Initial plan

* Fix YAML syntax errors in rust.yml workflow

Co-authored-by: elasticdotventures <[email protected]>

* Fix apply_tldr logic to properly handle multiple LICENSE/VERSION sections

Co-authored-by: elasticdotventures <[email protected]>

---------

Co-authored-by: copilot-swe-agent[bot] <[email protected]>
Co-authored-by: elasticdotventures <[email protected]>
* wip: testing pkgx & docker builds

* feat: pkgx mcp registry

* feat: pkgx

---------

Signed-off-by: Brian Horakh <[email protected]>
Copilot AI review requested due to automatic review settings November 15, 2025 23:09
Copilot finished reviewing on behalf of elasticdotventures November 15, 2025 23:12
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR adds comprehensive support for VSCode and Roo Code MCP integration by making the sessionId query parameter optional in the HTTP SSE server, along with several major feature additions including TLDR mode, token counting, item enumeration, and complete Docker/CI/CD infrastructure.

Key Changes:

  • Made sessionId optional in HTTP SSE endpoint with automatic session creation
  • Added TLDR mode to filter out LICENSE/VERSION sections from documentation
  • Implemented token-aware truncation with configurable max_tokens limit
  • Added list_crate_items tool for enumerating crate contents
  • Introduced comprehensive Docker support with multi-arch builds and GHCR publishing
  • Added release automation with cross-compiled binaries

Reviewed Changes

Copilot reviewed 25 out of 28 changed files in this pull request and generated 12 comments.

Show a summary per file
File Description
src/transport/http_sse_server/http_sse_server.rs Made sessionId optional, added automatic session creation for POST requests without sessionId
src/tools/tldr.rs New module to filter LICENSE and VERSION sections from documentation output
src/tools/item_list.rs New tool to enumerate items (structs, enums, traits, functions) in Rust crates
src/tools/mod.rs Added count_tokens function using Hugging Face tokenizer
src/tools/docs/docs.rs Added tldr and max_tokens support to DocRouter with post-processing logic
src/bin/cratedocs.rs Added version command, tldr/max_tokens flags, list_crate_items tool support, and comprehensive tests
tests/integration_tests.rs Updated tool count from 3 to 4, changed HTML assertions to markdown
Dockerfile New multi-stage Debian-based build with entrypoint script
docker/entrypoint.sh Shell script supporting http/stdio modes with environment configuration
.github/workflows/release-and-publish.yml Automated release workflow with multi-arch Docker builds and cross-compiled binaries
.github/workflows/docker.yml Docker build workflow with MCP registry publishing
.github/workflows/ci.yml CI workflow for build and test automation
README.md Comprehensive rewrite with Docker, pkgx, versioning, and MCP integration examples
server.json MCP server manifest for registry publishing
Cargo.toml Updated package name, version, repository, and dependencies
scripts/set-version.sh Python-based version bumping script for Cargo.toml and Cargo.lock

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

with:
context: .
push: true
platforms: linux/amd64
Copy link

Copilot AI Nov 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Docker workflow builds only for linux/amd64 (line 54), while the release workflow builds for both linux/amd64 and linux/arm64. This inconsistency means that pushes to main will only have amd64 images, but tagged releases will have both architectures. Consider aligning the platforms across both workflows for consistency.

Suggested change
platforms: linux/amd64
platforms: linux/amd64,linux/arm64

Copilot uses AI. Check for mistakes.
// Match any heading (for ending the skip)
let heading_re = Regex::new(r"^\s*#+").unwrap();
// Match <detail> tags including start, end, and inline attributes
let detail_tag_re = Regex::new(r"<[/]?detail.*?>").unwrap();
Copy link

Copilot AI Nov 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The regex pattern r"<[/]?detail.*?>" will only match lowercase "detail" tags. HTML tags are case-insensitive, so this should use the case-insensitive flag: r"(?i)<[/]?detail.*?>" to properly match <Detail>, <DETAIL>, etc.

Suggested change
let detail_tag_re = Regex::new(r"<[/]?detail.*?>").unwrap();
let detail_tag_re = Regex::new(r"(?i)<[/]?detail.*?>").unwrap();

Copilot uses AI. Check for mistakes.
if let Some(last_space) = truncated.rfind(' ') {
truncated.truncate(last_space);
}
truncated.push_str(" 内容被截断");
Copy link

Copilot AI Nov 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Chinese text "内容被截断" (meaning "content truncated") should be internationalized or use English for consistency with the rest of the codebase. Consider using "... [truncated]" or a similar English message.

Suggested change
truncated.push_str(" 内容被截断");
truncated.push_str(" [content truncated]");

Copilot uses AI. Check for mistakes.
Comment on lines +17 to +38
# If explicit args provided, run with those
if [ "$#" -gt 0 ]; then
exec /usr/local/bin/cratedocs "$@"
fi

# default behavior: start in selected mode
case "$MODE" in
http)
if [ "${DEBUG}" = "true" ]; then
exec /usr/local/bin/cratedocs http --address "$ADDRESS" --debug
else
exec /usr/local/bin/cratedocs http --address "$ADDRESS"
fi
;;
stdio)
exec /usr/local/bin/cratedocs stdio
;;
*)
echo "Unknown CRATEDOCS_MODE: $MODE" >&2
exit 2
;;
esac
Copy link

Copilot AI Nov 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The entrypoint script has duplicate logic for handling the MODE variable. Lines 8-16 execute the command, but then lines 18-38 have another case statement that would never be reached due to the exec call. This dead code should be removed.

Suggested change
# If explicit args provided, run with those
if [ "$#" -gt 0 ]; then
exec /usr/local/bin/cratedocs "$@"
fi
# default behavior: start in selected mode
case "$MODE" in
http)
if [ "${DEBUG}" = "true" ]; then
exec /usr/local/bin/cratedocs http --address "$ADDRESS" --debug
else
exec /usr/local/bin/cratedocs http --address "$ADDRESS"
fi
;;
stdio)
exec /usr/local/bin/cratedocs stdio
;;
*)
echo "Unknown CRATEDOCS_MODE: $MODE" >&2
exit 2
;;
esac
# (lines 17-39 removed; no replacement needed)

Copilot uses AI. Check for mistakes.
cargo run --bin cratedocs test --tool search_crates --query logger --format json

# Summarize output (strip LICENSE and VERSION sections, limit tokens)
cargo run --bin cratedocs test --tool lookup_crate --crate-name tokio --tldr --max_tokens 48000
Copy link

Copilot AI Nov 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the "tldr" mode documentation, the --max_tokens flag should use underscores consistently with the actual parameter name. However, CLI flags typically use hyphens (kebab-case), so this should be --max-tokens to match standard CLI conventions.

Suggested change
cargo run --bin cratedocs test --tool lookup_crate --crate-name tokio --tldr --max_tokens 48000
cargo run --bin cratedocs test --tool lookup_crate --crate-name tokio --tldr --max-tokens 48000

Copilot uses AI. Check for mistakes.
let app_clone = app.clone();
let task_session_id = new_session_id_arc.clone();
tokio::spawn(async move {
let router = RouterService(DocRouter::new());
Copy link

Copilot AI Nov 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The DocRouter::new() is used here, but the new session creation logic doesn't respect the tldr and max_tokens settings that might be configured elsewhere in the application. Consider passing these settings or using a consistent configuration mechanism across the application.

Suggested change
let router = RouterService(DocRouter::new());
// Retrieve tldr and max_tokens from app_clone or config
let tldr = app_clone.tldr;
let max_tokens = app_clone.max_tokens;
let router = RouterService(DocRouter::with_config(tldr, max_tokens));

Copilot uses AI. Check for mistakes.
Comment on lines +11 to +14

pub fn count_tokens(text: &str) -> Result<usize, tokenizers::Error> {
// 🦨 skunky: This loads the tokenizer from Hugging Face Hub every call; cache for production.
let tokenizer = Tokenizer::from_pretrained("bert-base-cased", None)?;
Copy link

Copilot AI Nov 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The tokenizer is loaded from Hugging Face on every call. As the comment notes, this should be cached for production use. Consider implementing a lazy_static or once_cell pattern to cache the tokenizer instance across calls to avoid repeated downloads and initialization overhead.

Suggested change
pub fn count_tokens(text: &str) -> Result<usize, tokenizers::Error> {
// 🦨 skunky: This loads the tokenizer from Hugging Face Hub every call; cache for production.
let tokenizer = Tokenizer::from_pretrained("bert-base-cased", None)?;
use once_cell::sync::Lazy;
static TOKENIZER: Lazy<Result<Tokenizer, tokenizers::Error>> = Lazy::new(|| {
Tokenizer::from_pretrained("bert-base-cased", None)
});
pub fn count_tokens(text: &str) -> Result<usize, tokenizers::Error> {
// 🦨 skunky: Tokenizer is now cached using once_cell::sync::Lazy for production use.
let tokenizer = TOKENIZER.as_ref()?;

Copilot uses AI. Check for mistakes.

/// Utility function to download and cache crate source.
async fn download_and_cache_crate(crate_name: &str, version: &str) -> Result<String> {
let cache_dir = Path::new("./cache");
Copy link

Copilot AI Nov 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The cache directory ./cache is relative to the current working directory, which can cause issues when the binary is run from different locations. Consider using a system-appropriate cache directory (e.g., user's cache directory via dirs crate) or making this configurable.

Copilot uses AI. Check for mistakes.
"args": [
"-y",
"mcp-remote@latest",
"http://127.0.0.1:3000/sse?sessionId=",
Copy link

Copilot AI Nov 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The URL parameter in the example shows sessionId= with no value. According to the code changes, sessionId is now optional and should either be omitted entirely or have a value. The empty value sessionId= may cause parsing issues. Update the example to either remove sessionId= or provide a placeholder value like sessionId=abc123.

Suggested change
"http://127.0.0.1:3000/sse?sessionId=",
"http://127.0.0.1:3000/sse?sessionId=abc123",

Copilot uses AI. Check for mistakes.
Comment on lines +85 to +95
- name: Upload x86_64 binary to release
uses: softprops/action-gh-release@v1
with:
files: target/x86_64-unknown-linux-gnu/release/cratedocs
env:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}

- name: Upload aarch64 binary to release
uses: softprops/action-gh-release@v1
with:
files: target/aarch64-unknown-linux-gnu/release/cratedocs
Copy link

Copilot AI Nov 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The uploaded binaries have identical names (cratedocs) for both architectures, which means the second upload will overwrite the first. The filenames should include the architecture (e.g., cratedocs-x86_64 and cratedocs-aarch64) or be placed in separate directories to prevent conflicts.

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants