OpenAI-compatible rerank falls back for whole batch when candidates contain empty abstracts

### Summary

When using the OpenAI-compatible rerank provider, hierarchical retrieval can send empty strings to the rerank endpoint because candidate documents are built from `result["abstract"]`. If some retrieved candidates have an empty or missing `abstract`, third-party rerank providers may return an empty `results` array. OpenViking then treats the entire rerank result as invalid and falls back to vector scores for the whole batch.

This makes rerank appear configured but ineffective for affected searches.

### Environment

- OpenViking version: 0.3.22
- Platform: Windows
- Rerank provider config:
  - `provider = "openai"`
  - OpenAI-compatible rerank endpoint
  - model: Qwen/Qwen3-Reranker-8B

### Observed logs

```text
openviking.models.rerank.openai_rerank - WARNING - [OpenAIRerankClient] Unexpected response format: {'id': '...', 'results': [], 'meta': {'tokens': {'input_tokens': 0, 'output_tokens': 0, 'image_tokens': 0}, ...}}
openviking.retrieve.hierarchical_retriever - WARNING - [HierarchicalRetriever] Invalid rerank result, fallback to vector scores
```
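For context, many OpenAI-compatible rerank endpoints use a Cohere/Jina-style response in which `results` is a list of objects carrying the input `index` and a `relevance_score`, not necessarily in input order. The sketch below assumes that shape; it is not taken from `OpenAIRerankClient`, and `parse_rerank_response` is a hypothetical helper. It shows why the logged response, with `results: []`, yields no usable scores: there is nothing to map back onto the inputs.

```python
from typing import Any, Optional


def parse_rerank_response(response: dict[str, Any], num_documents: int) -> Optional[list[float]]:
    """Map a rerank response back to input order (assumed Cohere/Jina-style shape).

    Assumed shape: {"results": [{"index": int, "relevance_score": float}, ...]}.
    Returns None unless every input document receives a score.
    """
    results = response.get("results")
    if not isinstance(results, list):
        return None

    scores: list[Optional[float]] = [None] * num_documents
    for item in results:
        if not isinstance(item, dict):
            continue
        index = item.get("index")
        score = item.get("relevance_score")
        # Ignore malformed entries and indices outside the input range.
        if not isinstance(index, int) or not 0 <= index < num_documents:
            continue
        if isinstance(score, (int, float)):
            scores[index] = float(score)

    # With `results: []`, as in the logged response, every slot stays unset.
    if any(s is None for s in scores):
        return None
    return [s for s in scores if s is not None]
```

Returning `None` when any slot is missing mirrors the all-or-nothing handling described in this issue; a more lenient parser could return partial scores and let the caller fill the gaps with vector scores.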

### Root cause

In `openviking/retrieve/hierarchical_retriever.py`, rerank documents are prepared like this in multiple places:

```python
documents = [str(r.get("abstract", "")) for r in results]
query_scores = self._rerank_scores(query, documents, query_scores)
```

`_rerank_scores()` only checks whether the list itself is non-empty. It does not filter out empty strings before calling `rerank_batch()`.

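If a provider rejects empty documents or returns no scores for them (as in the logged response above, where `results` is empty), `OpenAIRerankClient.rerank_batch()` logs `Unexpected response format` and returns `None`, and `_rerank_scores()` then falls back to vector scores for the whole batch.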
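The failure mode can be reproduced without a live endpoint. The sketch below is a simplified model, not OpenViking's code: `strict_provider` is a hypothetical stand-in for a provider that returns no scores when any document is blank, and `rerank_all_or_nothing` is a hypothetical function that mimics the current behavior of reranking the unfiltered list and falling back for the whole batch when no scores come back.

```python
from typing import Callable, Optional

RerankFn = Callable[[str, list[str]], Optional[list[float]]]


def strict_provider(query: str, documents: list[str]) -> Optional[list[float]]:
    """Hypothetical provider that returns no scores if any document is blank."""
    if any(not d.strip() for d in documents):
        return None
    words = query.lower().split()
    if not words:
        return [0.0] * len(documents)
    # Toy relevance: fraction of query words found in the document.
    return [sum(w in d.lower() for w in words) / len(words) for d in documents]


def rerank_all_or_nothing(query: str, documents: list[str],
                          fallback_scores: list[float], rerank: RerankFn) -> list[float]:
    """Simplified model of the current behavior: no filtering before reranking."""
    if not documents:
        return fallback_scores
    scores = rerank(query, documents)
    # One blank document makes the provider return nothing, so every
    # candidate falls back to its vector score.
    if not scores or len(scores) != len(documents):
        return fallback_scores
    return scores


# Candidates built the same way as in the snippet above: an empty and a
# missing abstract both become "".
results = [{"abstract": "vector databases"}, {"abstract": ""}, {}]
documents = [str(r.get("abstract", "")) for r in results]
print(documents)  # ['vector databases', '', '']
print(rerank_all_or_nothing("vector search", documents, [0.7, 0.6, 0.5], strict_provider))  # [0.7, 0.6, 0.5]
```

One related detail: if a stored abstract can be `None` rather than absent, `str(r.get("abstract", ""))` produces the literal text `"None"`, which an emptiness check would not catch. Whether OpenViking ever stores `None` there is not established by this issue.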

### Expected behavior

Rerank should be applied to valid non-empty documents. Candidates whose rerank text is empty should keep their vector fallback score, without invalidating the whole batch.

### Suggested fix

Filter documents before calling the rerank provider and map returned scores back to their original indices. For example:

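```python
# Body of _rerank_scores(); parameter names follow the call site and the
# names used in this snippet, and may not match the actual signature.
def _rerank_scores(self, query, documents, fallback_scores):
    # Keep only non-empty texts and remember each one's original position.
    rerank_documents = []
    rerank_indices = []
    for idx, document in enumerate(documents):
        text = document.strip() if isinstance(document, str) else str(document).strip()
        if text:
            rerank_indices.append(idx)
            rerank_documents.append(text)

    # Nothing to rerank: keep the vector scores for every candidate.
    if not rerank_documents:
        return fallback_scores

    scores = self._rerank_client.rerank_batch(query, rerank_documents)

    # Provider failure (None) or a score count that does not match the
    # filtered input: fall back for the whole batch, as before.
    if not scores or len(scores) != len(rerank_documents):
        return fallback_scores

    # Start from the vector scores and overwrite only the reranked slots,
    # so empty-text candidates keep their fallback score.
    # Note: zip(..., strict=True) requires Python 3.10+.
    normalized_scores = list(fallback_scores)
    for original_index, score in zip(rerank_indices, scores, strict=True):
        if isinstance(score, (int, float)):
            normalized_scores[original_index] = float(score)
    return normalized_scores
```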

This keeps the existing whole-batch fallback for genuine provider failures, but prevents a single empty abstract from disabling rerank for every valid candidate in the same batch.
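The index-mapping idea can be checked in isolation. The sketch below restates the suggested fix as a free function, `rerank_non_empty` (a hypothetical name), that takes the rerank call as a parameter so it runs without OpenViking; `strict_provider` is again a hypothetical stand-in for a provider that rejects blank documents.

```python
from typing import Callable, Optional

RerankFn = Callable[[str, list[str]], Optional[list[float]]]


def rerank_non_empty(query: str, documents: list[str],
                     fallback_scores: list[float], rerank: RerankFn) -> list[float]:
    """Rerank only non-blank documents; blank ones keep their fallback score."""
    indices = [i for i, d in enumerate(documents) if d.strip()]
    texts = [documents[i].strip() for i in indices]
    if not texts:
        return list(fallback_scores)

    scores = rerank(query, texts)
    # Provider failure or misaligned output: fall back for the whole batch.
    if not scores or len(scores) != len(texts):
        return list(fallback_scores)

    # Map each score back to the position of the text it belongs to.
    merged = list(fallback_scores)
    for original_index, score in zip(indices, scores):
        merged[original_index] = float(score)
    return merged


def strict_provider(query: str, documents: list[str]) -> Optional[list[float]]:
    """Hypothetical provider that returns no scores if any document is blank."""
    if any(not d.strip() for d in documents):
        return None
    words = query.lower().split()
    if not words:
        return [0.0] * len(documents)
    # Toy relevance: fraction of query words found in the document.
    return [sum(w in d.lower() for w in words) / len(words) for d in documents]


docs = ["vector databases", "", "   ", "rerank models"]
print(rerank_non_empty("vector search", docs, [0.4, 0.3, 0.2, 0.1], strict_provider))
```

In this toy run the result is `[0.5, 0.3, 0.2, 0.0]`: the reranked but irrelevant document scores 0.0 and lands below the two blank candidates, which kept their vector scores. Because reranker scores and vector scores are generally on different scales, a real fix may also need to decide how non-reranked candidates are ordered relative to reranked ones (for example, placing them after all reranked candidates); the suggested fix above does not address this.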

### Additional note

A manual request to the same OpenAI-compatible rerank endpoint with non-empty documents returns valid `results`, so the endpoint, model, and API key are working. The failure therefore appears to be caused by empty candidate text in OpenViking's rerank input.
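To confirm the diagnosis against a live endpoint, the same query can be sent twice, once with an empty document included and once without. The sketch below assumes a Cohere/Jina-style JSON request body (`model`, `query`, `documents`); the URL and key are placeholders, not values from OpenViking's configuration, and `build_rerank_request` and `send` are hypothetical helpers built on the standard library.

```python
import json
import urllib.request

# Placeholders, not taken from OpenViking's config: set these to the
# provider's actual rerank URL, API key, and model name.
RERANK_URL = "https://example.invalid/v1/rerank"
API_KEY = "YOUR_API_KEY"
MODEL = "Qwen/Qwen3-Reranker-8B"


def build_rerank_request(query: str, documents: list[str]) -> urllib.request.Request:
    """Build a POST request with an assumed Cohere/Jina-style JSON body."""
    body = json.dumps({"model": MODEL, "query": query, "documents": documents}).encode("utf-8")
    return urllib.request.Request(
        RERANK_URL,
        data=body,
        headers={"Authorization": f"Bearer {API_KEY}", "Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request) -> dict:
    """Send the request and decode the JSON response body.

    urlopen raises urllib.error.HTTPError for 4xx/5xx responses, which is
    another way a provider might reject empty documents.
    """
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)
```

With the placeholders filled in, comparing `send(build_rerank_request("vector search", ["vector databases", ""])).get("results")` against the same call without the `""` entry should show whether the provider drops the whole batch when a single document is empty, as the logs above suggest.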


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

OpenAI-compatible rerank falls back for whole batch when candidates contain empty abstracts #2330
