Summary
When using the OpenAI-compatible rerank provider, hierarchical retrieval can send empty strings to the rerank endpoint because candidate documents are built from result["abstract"]. If some retrieved candidates have an empty or missing abstract, third-party rerank providers may return an empty results array. OpenViking then treats the entire rerank result as invalid and falls back to vector scores for the whole batch.
This makes rerank appear configured but ineffective for affected searches.
Environment
- OpenViking version: 0.3.22
- Platform: Windows
- Rerank provider config:
    provider = "openai"
- OpenAI-compatible rerank endpoint
- model: Qwen/Qwen3-Reranker-8B
Observed logs
openviking.models.rerank.openai_rerank - WARNING - [OpenAIRerankClient] Unexpected response format: {'id': '...', 'results': [], 'meta': {'tokens': {'input_tokens': 0, 'output_tokens': 0, 'image_tokens': 0}, ...}}
openviking.retrieve.hierarchical_retriever - WARNING - [HierarchicalRetriever] Invalid rerank result, fallback to vector scores
Root cause
In openviking/retrieve/hierarchical_retriever.py, rerank documents are prepared like this in multiple places:
documents = [str(r.get("abstract", "")) for r in results]
query_scores = self._rerank_scores(query, documents, query_scores)
_rerank_scores() only checks whether the list itself is non-empty. It does not filter out empty strings before calling rerank_batch().
If a provider rejects empty documents or returns no scores for them, OpenAIRerankClient.rerank_batch() returns None, and _rerank_scores() falls back for the whole batch.
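The failure mode can be reproduced in isolation with a small sketch. Everything below is illustrative: StubRerankClient is a hypothetical stand-in that assumes a provider returning no results whenever any document is empty, and rerank_scores_current is a simplified model of the behavior described above, not OpenViking's actual implementation.

```python
from typing import List, Optional


class StubRerankClient:
    """Hypothetical provider stub (assumption): yields no scores
    if any input document is empty or whitespace-only."""

    def rerank_batch(self, query: str, documents: List[str]) -> Optional[List[float]]:
        if any(not doc.strip() for doc in documents):
            return None  # models the "Unexpected response format" path
        # Toy relevance score: number of words shared with the query.
        query_words = set(query.lower().split())
        return [float(len(query_words & set(doc.lower().split()))) for doc in documents]


def rerank_scores_current(client, query, documents, fallback_scores):
    """Simplified model of the described behavior: only the list is checked."""
    if not documents:
        return fallback_scores
    scores = client.rerank_batch(query, documents)
    if not scores or len(scores) != len(documents):
        return fallback_scores  # the whole batch falls back
    return [float(s) for s in scores]


# One valid abstract, one empty abstract, one missing abstract.
results = [{"abstract": "rerank provider setup"}, {"abstract": ""}, {}]
documents = [str(r.get("abstract", "")) for r in results]
vector_scores = [0.42, 0.40, 0.38]

current_result = rerank_scores_current(
    StubRerankClient(), "rerank provider", documents, vector_scores
)
print(current_result)
```

Under these assumptions the first candidate never receives a rerank score, even though its abstract is valid, because the two empty strings cause the entire batch to fall back to vector scores.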
Expected behavior
Rerank should be applied to valid non-empty documents. Candidates whose rerank text is empty should keep their vector fallback score, without invalidating the whole batch.
Suggested fix
Filter documents before calling the rerank provider and map returned scores back to their original indices. For example:
rerank_documents = []
rerank_indices = []
for idx, document in enumerate(documents):
    text = document.strip() if isinstance(document, str) else str(document).strip()
    if text:
        rerank_indices.append(idx)
        rerank_documents.append(text)
if not rerank_documents:
    return fallback_scores
scores = self._rerank_client.rerank_batch(query, rerank_documents)
if not scores or len(scores) != len(rerank_documents):
    return fallback_scores
normalized_scores = list(fallback_scores)
for original_index, score in zip(rerank_indices, scores, strict=True):
    if isinstance(score, (int, float)):
        normalized_scores[original_index] = float(score)
return normalized_scores
This keeps the current fallback behavior but avoids a single empty abstract disabling rerank for all valid candidates in the same batch.
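For anyone who wants to check the index mapping outside the codebase, the same idea can be written as a standalone function. This is a sketch under stated assumptions: rerank_non_empty and StubRerankClient are hypothetical names, the stub's length-based score is a toy, and the real method would use self._rerank_client instead of a passed-in client.

```python
from typing import List, Optional, Sequence


class StubRerankClient:
    """Hypothetical provider stub (assumption): yields no scores
    if any input document is empty or whitespace-only."""

    def rerank_batch(self, query: str, documents: List[str]) -> Optional[List[float]]:
        if any(not doc.strip() for doc in documents):
            return None
        return [float(len(doc)) for doc in documents]  # toy score: text length


def rerank_non_empty(
    client, query: str, documents: Sequence[object], fallback_scores: Sequence[float]
) -> List[float]:
    """Rerank only non-empty documents; empty ones keep their fallback score."""
    rerank_documents: List[str] = []
    rerank_indices: List[int] = []
    for idx, document in enumerate(documents):
        text = document.strip() if isinstance(document, str) else str(document).strip()
        if text:
            rerank_indices.append(idx)
            rerank_documents.append(text)

    if not rerank_documents:
        return list(fallback_scores)

    scores = client.rerank_batch(query, rerank_documents)
    if not scores or len(scores) != len(rerank_documents):
        return list(fallback_scores)

    # Start from the fallback scores and overwrite only the reranked positions.
    normalized_scores = list(fallback_scores)
    for original_index, score in zip(rerank_indices, scores):
        if isinstance(score, (int, float)):
            normalized_scores[original_index] = float(score)
    return normalized_scores


client = StubRerankClient()
reranked = rerank_non_empty(
    client, "query", ["alpha beta", "", "  ", "gamma"], [0.1, 0.2, 0.3, 0.4]
)
all_empty = rerank_non_empty(client, "query", ["", " "], [0.5, 0.6])
print(reranked)   # positions 1 and 2 keep their vector scores
print(all_empty)  # nothing to rerank, so the fallback scores are returned
```

One caveat of this approach: rerank scores and vector scores end up mixed in the same list, so whether they are directly comparable depends on how the caller blends or sorts them afterwards.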
Additional note
A manual request to the same OpenAI-compatible rerank endpoint with non-empty documents returns valid results, so the endpoint/model/key are working. The failure appears to be caused by empty candidate text in OpenViking's rerank input.