Fix redirect target verification in AsyncUrlSeeder and enhance tests #1622
Summary
Fixed a critical bug in `AsyncUrlSeeder` where `_resolve_head()` was incorrectly returning redirect targets without verifying they were alive. This could cause dead URLs to be treated as valid during URL discovery. #1603
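To make the behavior concrete, here is a minimal sketch of the idea, not the actual `_resolve_head()` code from `crawl4ai`. The `resolve_head` function, the `HeadFn` transport type, and `fake_head` are hypothetical names used only for illustration, and the sketch assumes a single redirect hop and treats any 2xx status as "alive". Only the `verify_redirect_targets` flag name comes from this PR.

```python
import asyncio
from typing import Awaitable, Callable, Optional, Tuple
from urllib.parse import urljoin

# Hypothetical transport: takes a URL, returns (status_code, Location header or None).
HeadFn = Callable[[str], Awaitable[Tuple[int, Optional[str]]]]

REDIRECT_STATUSES = (301, 302, 303, 307, 308)


async def resolve_head(
    url: str, head: HeadFn, verify_redirect_targets: bool = True
) -> Optional[str]:
    """Return a usable URL for `url`, or None if it is considered dead."""
    status, location = await head(url)
    if 200 <= status < 300:
        return url
    if status in REDIRECT_STATUSES and location:
        target = urljoin(url, location)  # Location may be a relative path
        if not verify_redirect_targets:
            # Previous behavior: return the target without checking it.
            return target
        # Fixed behavior: only accept the target if it responds with 2xx.
        target_status, _ = await head(target)
        return target if 200 <= target_status < 300 else None
    return None


async def fake_head(url: str) -> Tuple[int, Optional[str]]:
    # Stand-in responses for illustration only; no network access.
    table = {
        "https://example.com/old": (301, "/gone"),
        "https://example.com/gone": (404, None),
        "https://example.com/moved": (302, "/new"),
        "https://example.com/new": (200, None),
    }
    return table.get(url, (404, None))


if __name__ == "__main__":
    # Redirect to a dead page: rejected when verification is on.
    print(asyncio.run(resolve_head("https://example.com/old", fake_head)))
    # Same URL with verification off: the dead target is returned as-is.
    print(asyncio.run(resolve_head("https://example.com/old", fake_head,
                                   verify_redirect_targets=False)))
```

With verification enabled, a redirect that points at a dead page yields `None`. With it disabled, the dead target is passed through, which is the behavior this PR changes by default.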
Key Changes:
- New `verify_redirect_targets` parameter on `AsyncUrlSeeder.__init__()` (default: `True`)
- `_resolve_head()` now conditionally verifies redirect targets based on the parameter

Backward Compatibility: Existing code continues to work with improved behavior by default. Users can set `verify_redirect_targets=False` to restore the previous behavior if needed.

List of files changed and why
- `crawl4ai/async_url_seeder.py` - Core bug fix and parameter addition
- `tests/test_async_url_seeder.py` - Added unit tests for both verification modes
- `test_scripts/test_async_url_seeder_fixes.py` - Comprehensive demo/test suite for all fixes
- `test_scripts/README.md` - Documentation for test scripts

How Has This Been Tested?
Created comprehensive test suite covering:
- `verify_redirect_targets=False`

Run the test suite with:
`python test_scripts/test_async_url_seeder_fixes.py`

Checklist: