Fix: cdp connections spawn isolated contexts instead of reusing existing profiles #281

Open
Ewal11 wants to merge 1 commit into D4Vinci:main from Ewal11:fix/cdp-default-context

Ewal11 commented May 14, 2026

What does this PR do?

This PR modifies the default behavior of cdp_url connections across all Fetcher variants (Sync and Async, Dynamic and Stealthy).

Currently, when passing a cdp_url to connect to an existing, headful Chrome browser, Scrapling unconditionally calls browser.new_context(). Because of how Playwright operates, this generates a completely isolated, incognito-like browsing context. As a result, the user's active logins, installed extensions, and session cookies from their main profile are not accessible to the Fetcher.

This PR checks whether browser.contexts is non-empty after a successful CDP connection. If a default context is present, it assigns that context to self.context instead of creating a new one.
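A minimal sketch of that check, not the actual diff. It assumes a Playwright-style browser object exposing a `contexts` list and a `new_context()` method; `select_context` and `FakeBrowser` are hypothetical names used only for illustration, and the fake stands in for a real CDP connection so the snippet runs without Chrome.

```python
from typing import Any


def select_context(browser: Any) -> Any:
    """Return the browser's existing default context, or create a new one.

    `browser` is expected to behave like a Playwright Browser: it has a
    `contexts` list and a `new_context()` method.
    """
    existing = browser.contexts
    if existing:
        # After connecting over CDP, Playwright exposes the browser's
        # default context as the first entry of `browser.contexts`.
        return existing[0]
    # Nothing to reuse: fall back to an isolated context, as before.
    return browser.new_context()


class FakeBrowser:
    """Hypothetical stand-in for a Playwright Browser (illustration only)."""

    def __init__(self, contexts):
        self.contexts = list(contexts)

    def new_context(self):
        ctx = "isolated-context"
        self.contexts.append(ctx)
        return ctx
```

With a real connection, the `browser` argument would be whatever `connect_over_cdp` returns for the given cdp_url.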

Why is this needed?

The primary use case for connecting to an active browser via a debug port (cdp_url) is to leverage the existing, authenticated state of that browser (e.g., bypassing logins or utilizing active extensions). Spawning an isolated context defeats the purpose of connecting to a pre-configured user-data directory.

Impact

  • cdp_url connections will now inherit the browser's active tabs, cookies, and extensions.
  • Proxy rotation and standard headless/headful launches are completely unaffected by this change.

Contributor

yetval commented May 14, 2026

Hey @Ewal11, this fix targets a real problem, but it creates more issues than it solves.

First, the PR is against main, but per CONTRIBUTING.md it should target dev.

More importantly, reusing the user’s existing context breaks teardown: close() unconditionally calls self.context.close(), which now closes the user’s real Chrome tabs on every clean exit.

It also silently drops the configured proxy on the reuse path, so cdp_url + proxy can leak the real IP, which contradicts the PR description's claim that proxy rotation is unaffected. The same applies to the other context options: viewport, user_agent, extra_http_headers, color_scheme, and permissions are all skipped when reusing a context, which weakens Stealthy.

This needs an ownership flag, so teardown only closes contexts Scrapling created, and context options are only applied when Scrapling owns the context. Also, no tests were added, and all CDP paths are currently marked # pragma: no cover.
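A rough sketch of the ownership idea, not Scrapling's actual code. `ContextHandle`, `owns_context`, `ignored_options`, and the fake classes are hypothetical names; the only assumption about the real objects is a Playwright-style `contexts` list, `new_context(**options)`, and `context.close()`.

```python
from typing import Any, Dict, Optional


class ContextHandle:
    """Hypothetical wrapper that remembers who created the context."""

    def __init__(self, browser: Any, context_options: Optional[Dict[str, Any]] = None):
        options = dict(context_options or {})
        if browser.contexts:
            # Reuse path: the context belongs to the user's browser.
            self.context = browser.contexts[0]
            self.owns_context = False
            # Options such as proxy or user_agent are not applied here;
            # keep them so the caller can warn or raise instead of
            # dropping them silently.
            self.ignored_options = options
        else:
            # Owned path: we create the context and apply every option.
            self.context = browser.new_context(**options)
            self.owns_context = True
            self.ignored_options = {}

    def close(self) -> None:
        # Only tear down contexts this wrapper created.
        if self.owns_context:
            self.context.close()


class FakeContext:
    """Hypothetical stand-in for a browser context (illustration only)."""

    def __init__(self, **options):
        self.options = options
        self.closed = False

    def close(self):
        self.closed = True


class FakeBrowser:
    """Hypothetical stand-in for a Playwright Browser (illustration only)."""

    def __init__(self, contexts):
        self.contexts = list(contexts)

    def new_context(self, **options):
        ctx = FakeContext(**options)
        self.contexts.append(ctx)
        return ctx
```

On the reuse path, `close()` leaves the user's tabs alone, and a non-empty `ignored_options` gives the caller a place to reject cdp_url + proxy rather than leak the real IP.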

Not a maintainer, this is just to inform you!

