@Ahmed-Tawfik94
Collaborator

Summary

This PR fixes a concurrency bug in AsyncWebCrawler.arun_many() when using managed browsers. Previously, every concurrent crawl task reused the same tab (context.pages[0]), so the tasks interfered with one another and crawls failed. The fix modifies the get_page() method in browser_manager.py to always create a new page for managed browsers instead of reusing context.pages[0].

Fixes #1563

List of files changed and why

  1. crawl4ai/browser_manager.py - Modified the get_page() method to create new pages for managed browsers instead of reusing the first page, which was causing tab contention in concurrent scenarios.

  2. tests/test_cdp_concurrency_compact.py - Created a comprehensive test suite that verifies the concurrency fix works correctly across multiple scenarios including basic arun_many functionality, managed CDP browsers, and various concurrency patterns.
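To illustrate the idea behind the change, here is a minimal sketch rather than the actual code from browser_manager.py. `FakeContext` and `PageProvider` are hypothetical stand-ins invented for this example (they are not Playwright or crawl4ai classes), and the lock mirrors the "page lock" mentioned in the commit message only in spirit.

```python
import asyncio


class FakeContext:
    """Hypothetical stand-in for a browser context; not the Playwright API."""

    def __init__(self):
        self.pages = ["tab-0"]  # a managed browser starts with one open tab

    async def new_page(self):
        index = len(self.pages)  # decided before yielding control
        await asyncio.sleep(0)   # yield, as a real browser round trip would
        page = f"tab-{index}"
        self.pages.append(page)
        return page


class PageProvider:
    """Hypothetical illustration of the two strategies described above."""

    def __init__(self, context):
        self.context = context
        self._page_lock = asyncio.Lock()  # serializes new-page creation

    async def get_page_reusing(self):
        # Old behavior: every caller receives the same first tab.
        return self.context.pages[0]

    async def get_page_fresh(self):
        # New behavior: each caller gets its own tab, created one at a time.
        async with self._page_lock:
            return await self.context.new_page()


async def demo():
    provider = PageProvider(FakeContext())
    reused = await asyncio.gather(*(provider.get_page_reusing() for _ in range(3)))
    fresh = await asyncio.gather(*(provider.get_page_fresh() for _ in range(3)))
    return reused, fresh


reused, fresh = asyncio.run(demo())
```

In this toy model the three "reusing" callers all end up holding `tab-0`, while the three "fresh" callers each hold a distinct tab. The lock matters here because `new_page()` yields between choosing an index and recording the page; whether the real browser context needs the same protection is an assumption of the sketch.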

How Has This Been Tested?

  1. Created and ran a comprehensive test suite with 6 different test scenarios:

    • Basic arun_many functionality test
    • Managed CDP browser test
    • Concurrency verification test
    • Concurrency fix demonstration
    • Before/after behavior comparison
    • Reference pattern test
  2. All tests pass successfully, demonstrating that:

    • Multiple concurrent crawl tasks no longer fight over shared tabs
    • The fix works with both basic and managed browser configurations
    • Backward compatibility is maintained
    • Performance is not negatively impacted
  3. Manual verification of the fix with both basic and managed browser configurations.
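A before/after comparison of this kind could be modeled roughly as follows. This is a simplified simulation, not the test file from this PR: `FakePage`, `crawl`, `run_shared`, `run_isolated`, and the example.com URLs are all hypothetical, and a real shared tab fails in messier ways than this model shows.

```python
import asyncio


class FakePage:
    """Hypothetical tab that remembers only the last URL it navigated to."""

    def __init__(self):
        self.url = None

    async def goto(self, url):
        self.url = url
        await asyncio.sleep(0.01)  # simulated load time; other tasks run here

    async def content(self):
        return f"content of {self.url}"


async def crawl(page, url):
    await page.goto(url)
    return url, await page.content()


async def run_shared(urls):
    page = FakePage()  # one tab shared by every task (old behavior)
    return await asyncio.gather(*(crawl(page, u) for u in urls))


async def run_isolated(urls):
    # One tab per task (new behavior).
    return await asyncio.gather(*(crawl(FakePage(), u) for u in urls))


def mismatches(results):
    # URLs whose returned content belongs to a different URL.
    return [url for url, body in results if body != f"content of {url}"]


urls = [f"https://example.com/{i}" for i in range(4)]
shared_bad = mismatches(asyncio.run(run_shared(urls)))
isolated_bad = mismatches(asyncio.run(run_isolated(urls)))
```

With a shared tab, later navigations overwrite earlier ones, so most tasks read content for the wrong URL; with one tab per task, every result matches its own URL.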

Checklist:

  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have added/updated unit tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

… concurrency

- Modify get_page() to always create new pages instead of reusing existing ones
- Add page lock to serialize new page creation in managed browser context
- Improve subprocess argument formatting and cleanup logging
- Delegate profile-related static methods to BrowserProfiler class
- Enhance startup checks for managed browser processes
- Add comprehensive test suite validating concurrency fix for arun_many with CDP browsers
- Fix proxy flag formatting and deduplicate browser launch args
- Refactor imports and code formatting for clarity and consistency
@prokopis3
Contributor

I have been eagerly waiting for this pull request to land. Great job, guys!

@Ilaiwi

Ilaiwi commented Nov 17, 2025

Hi team, thank you for all the hard work! Is there any way we can help move this PR forward? It fixes a bug that we have been encountering when using CDP.

@ntohidi
Collaborator

ntohidi commented Nov 17, 2025

Hi there! We will include this in our next release in two weeks. Stay tuned! 💜

@ntohidi ntohidi added this to the 2025-NOV-2 milestone Nov 17, 2025