Skip to content

fix(rerank): filter empty docs before rerank API; concurrent read_batch#2619

Open
njuboy11 wants to merge 1 commit into
volcengine:mainfrom
njuboy11:fix/rerank-empty-doc-and-concurrent-convert
Open

fix(rerank): filter empty docs before rerank API; concurrent read_batch#2619
njuboy11 wants to merge 1 commit into
volcengine:mainfrom
njuboy11:fix/rerank-empty-doc-and-concurrent-convert

Conversation

@njuboy11

Copy link
Copy Markdown

Problem

Two issues causing slow search (up to 8.3s worst-case):

1. Reranker fails when documents contain empty strings

SiliconFlow API silently drops empty documents from results. When one of N documents has an empty string, the API returns N-1 results. The client detects the count mismatch and falls back to vector scores, wasting the API call entirely (approximately 180ms per call, plus worse result quality).

Log evidence:

[OpenAIRerankClient] Unexpected rerank result length: expected=24 actual=23
[HierarchicalRetriever] Invalid rerank result, fallback to vector scores

2. read_batch reads files sequentially

VikingFS.read_batch() loops through URIs sequentially with await self.abstract() for each file. For 24 candidates each with up to 5 relations, that is up to 120 serial file reads.

Fix

openai_rerank.py

  • Filter out empty/blank documents before sending to the rerank API
  • Track original indices with non_empty_indices list
  • Map API results back to original document indices, with 0.0 for empty docs
  • Also add top_n parameter to prevent SiliconFlow default top_n=12 truncation

viking_fs.py

  • Rewrite read_batch() to use asyncio.gather() for concurrent I/O
  • Add Tuple to typing imports

Test Results

Before:

[OV-TIMING] global_search: 4632ms
[OV-TIMING] recursive_search: 334ms
[OV-TIMING] convert: 3385ms
Total worst: approximately 8.3s

After:

[OV-TIMING] global_search: 212ms
[OV-TIMING] recursive_search: 220ms
[OV-TIMING] convert: 4ms
Total: approximately 436ms

Reranker now works correctly - no more "Invalid rerank result" warnings.

…d_batch

1. openai_rerank.py: Filter out empty/blank documents before sending to
   rerank API (SiliconFlow drops empty docs, causing result count mismatch).
   Map results back to original indices with 0.0 score for empty docs.

2. viking_fs.py: Make read_batch truly concurrent with asyncio.gather
   instead of sequential file reads. Added Tuple to typing imports.

These two issues combined could add 3-5s to search latency with
24 candidates (empty doc causes rerank fallback + serial file reads).
@github-actions

Copy link
Copy Markdown

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

⏱️ Estimated effort to review: 2 🔵🔵⚪⚪⚪
🏅 Score: 80
🧪 No relevant tests
🔒 No security concerns identified
✅ No TODO sections
🔀 Multiple PR themes

Sub-PR theme: fix(rerank): filter empty docs and add top_n parameter

Relevant files:

  • openviking/models/rerank/openai_rerank.py

Sub-PR theme: perf: concurrent read_batch with asyncio.gather

Relevant files:

  • openviking/storage/viking_fs.py

⚡ Recommended focus areas for review

Backward Incompatibility

read_batch() now includes URIs that failed to read in the result dict with empty string value, whereas the old implementation omitted them entirely. This changes API contract and may break existing callers that expect only successfully read URIs in the output.

async def read_batch(
    self, uris: List[str], level: str = "l0", ctx: Optional[RequestContext] = None
) -> Dict[str, str]:
    """Batch read content from multiple URIs concurrently."""
    async def _read_one(uri: str) -> Tuple[str, str]:
        try:
            if level == "l0":
                content = await self.abstract(uri, ctx=ctx)
            elif level == "l1":
                content = await self.overview(uri, ctx=ctx)
            else:
                content = ""
            return uri, content
        except Exception:
            return uri, ""

    tasks = [_read_one(uri) for uri in uris]
    gathered = await asyncio.gather(*tasks)
    return {uri: content for uri, content in gathered}
Error Handling Regression

The new read_batch() swallows all exceptions without logging, making it impossible to diagnose why individual URI reads failed. The old implementation at least didn't mask failures completely (by omitting failed URIs).

except Exception:
    return uri, ""

@github-actions

Copy link
Copy Markdown

PR Code Suggestions ✨

No code suggestions found for the PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

Status: Backlog

Development

Successfully merging this pull request may close these issues.

1 participant