Skip to content

fix(audio): bind tmp_wav before try in OpenAI transcribe() (#366)#392

Open
SAY-5 wants to merge 2 commits into
linagora:devfrom
SAY-5:fix/audio-finally-unbound-local
Open

fix(audio): bind tmp_wav before try in OpenAI transcribe() (#366)#392
SAY-5 wants to merge 2 commits into
linagora:devfrom
SAY-5:fix/audio-finally-unbound-local

Conversation

@SAY-5
Copy link
Copy Markdown

@SAY-5 SAY-5 commented May 11, 2026

Fixes #366.

AudioTranscriber.transcribe() only assigned tmp_wav inside the if/else block, so when AudioSegment.from_file raised before that point, the finally clause hit UnboundLocalError and masked the real decode failure.

Move tmp_wav = None ahead of the try so the cleanup branch always sees the name; the real exception now propagates to the caller. Added a small regression test that mirrors the buggy / fixed control flow without pulling in the heavy ray / openai dependencies.

Summary by CodeRabbit

  • Tests
    • Added comprehensive regression test suite for audio transcription error handling, covering edge cases in control flow where variable initialization timing affects exception propagation. Test cases validate both problematic patterns that inadvertently mask errors and corrected patterns that ensure audio processing failures are properly reported.

Review Change Stack

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 11, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 68994c2a-edec-440b-ad40-a05c3cf0824c

📥 Commits

Reviewing files that changed from the base of the PR and between c5e2566 and 49cc3a8.

📒 Files selected for processing (1)
  • openrag/components/indexer/loaders/audio/test_openai.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • openrag/components/indexer/loaders/audio/test_openai.py

📝 Walkthrough

Walkthrough

This PR adds a regression test suite that models the try/finally cleanup pattern in audio transcription. The tests demonstrate how an uninitialized local variable in a finally block can mask the original exception, and verify that pre-initializing the variable allows proper exception propagation.

Changes

Audio Transcriber Error Handling

Layer / File(s) Summary
Regression test for finally cleanup control flow
openrag/components/indexer/loaders/audio/test_openai.py
TestTranscribeFinallyCleanup includes a buggy static method that triggers UnboundLocalError when tmp_wav is referenced in finally, and a fixed method where tmp_wav is pre-initialized so the original RuntimeError propagates. Two pytest tests assert the exception behavior for both patterns.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~8 minutes

Suggested labels

fix

Poem

🐰 A test hops forth to catch the bug,
Where finally unbound variables tug,
Initialize first, let exceptions through—
The original truth, now crystal and true!
bounces triumphantly 🎯

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 16.67% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately summarizes the main fix: binding tmp_wav before the try block in OpenAI's transcribe() function to prevent UnboundLocalError during cleanup.
Linked Issues check ✅ Passed The PR fully addresses all requirements from issue #366: initializing tmp_wav before try, enabling safe finally cleanup, propagating original exceptions, and adding a regression test.
Out of Scope Changes check ✅ Passed All changes are scoped to the regression test for the tmp_wav initialization fix. No unrelated modifications or unintended alterations were introduced.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Collaborator

@Ahmath-Gadji Ahmath-Gadji left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for your contribution — I approve these changes.
Could you retarget the PR to the dev branch?

Our workflow is as the following: We merge features and bug fixes on dev, and once dev is stable and tested we rebase main on dev then proceed to a new release

@SAY-5 SAY-5 changed the base branch from main to dev May 12, 2026 16:59
@SAY-5
Copy link
Copy Markdown
Author

SAY-5 commented May 12, 2026

Done, retargeted to dev. Thanks for the review.

@SAY-5
Copy link
Copy Markdown
Author

SAY-5 commented May 12, 2026

Retargeted to dev. Thanks!

@SAY-5
Copy link
Copy Markdown
Author

SAY-5 commented May 12, 2026

Retargeted to dev as requested. Thanks for the review.

@SAY-5
Copy link
Copy Markdown
Author

SAY-5 commented May 12, 2026

Retargeted to dev. Thanks!

1 similar comment
@SAY-5
Copy link
Copy Markdown
Author

SAY-5 commented May 12, 2026

Retargeted to dev. Thanks!

@SAY-5
Copy link
Copy Markdown
Author

SAY-5 commented May 13, 2026

Looks like the dev branch already initializes tmp_wav at the top of transcribe() since cbcd054, so this fix has effectively landed there. Happy for you to close this PR or, if you'd like the test coverage, I can rebase a tests-only commit. Let me know.

@Ahmath-Gadji
Copy link
Copy Markdown
Collaborator

Looks like the dev branch already initializes tmp_wav at the top of transcribe() since cbcd054, so this fix has effectively landed there. Happy for you to close this PR or, if you'd like the test coverage, I can rebase a tests-only commit. Let me know.

Actually i think your commit is much needed as on dev is not initialized inside the try...catch block. It would be nice to have that test block aswell. See the current state of dev

try:
logger.bind(file=file_path.name)
suffix = file_path.suffix.lower()
if suffix in self.direct_upload_suffixes:
wav_path = file_path
tmp_wav = None

After rebase we will merge it

SAY-5 added 2 commits May 13, 2026 12:03
When AudioSegment.from_file raises before tmp_wav is assigned, the
finally clause that checks 'if tmp_wav' crashes with UnboundLocalError
and masks the original exception, leaving operators with a confusing
stack trace instead of the real decode error.

Initialize tmp_wav = None ahead of the try block so the cleanup branch
always sees the name.

Signed-off-by: Sai Asish Y <say.apm35@gmail.com>
@SAY-5 SAY-5 force-pushed the fix/audio-finally-unbound-local branch from c5e2566 to 49cc3a8 Compare May 13, 2026 19:04
@SAY-5
Copy link
Copy Markdown
Author

SAY-5 commented May 13, 2026

Rebased onto dev. The transcribe() fix already landed in dev, so the diff is now just the regression test (TestTranscribeFinallyCleanup) that exercises the buggy pattern vs the fixed pattern.

@coderabbitai coderabbitai Bot added the fix Fix issue label May 13, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

fix Fix issue

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Audio loader masks errors with UnboundLocalError on cleanup

2 participants