Skip to content

Conversation

@Mantisus
Copy link
Collaborator

@Mantisus Mantisus commented Dec 9, 2025

Description

  • Sets the enqueue_strategy attribute in Request during enqueue_links processing, for correct check of requests that have completed redirection.

Issues

@Mantisus Mantisus self-assigned this Dec 9, 2025
@Mantisus Mantisus added the bug Something isn't working. label Dec 9, 2025
@Pijukatel
Copy link
Collaborator

Pijukatel commented Dec 9, 2025

Hi, I am not sure what is the desired behavior here. You are overriding the strategy that is in the request.

Imagine we have (line from the test you added):

async def request_handler(context: BeautifulSoupCrawlingContext) -> None:
...
await context.enqueue_links(requests=[Request.from_url(target_url, enqueue_strategy='same-hostname')], strategy='same-origin')

What is the desired behavior in such case? With your change, we will override the explicitly set enqueue_strategy from the request.

I am not really sure about the current behavior. I would expect all requests to inherit enqueue_strategy from the request that enqueued them (unless they explicitly pick a different strategy), but we are not doing that. We pick same-hostnameas the default. I would expect context.request.enqueue_strategy to be used as the default.
https://github.com/apify/crawlee-python/blob/master/src/crawlee/crawlers/_basic/_basic_crawler.py#L995

My point is that in this scenario, there are 3 sources of information for the enqueue_strategy and we have to understand how to handle their conflicts and which one of them is relevant in which case:

  • context.request.enqueue_strategy
  • enqueue_strategy (from newly created Request)
  • strategy (from enqueue_links kwarg)

@Mantisus
Copy link
Collaborator Author

Mantisus commented Dec 9, 2025

My point is that in this scenario, there are 3 sources of information for the enqueue_strategy and we have to understand how to handle their conflicts and which one of them is relevant in which case:

Excellent point. Thank you.

  • strategy (from enqueue_links kwarg) - If the user has explicitly specified that all these requests should be set with this strategy. I think, it is priority.
  • Request.enqueue_strategy - I'm not sure, because if these are user-generated requests, then the user can simply call add_requests. If the requests are obtained using extract_links and then passed to enqueue_links with different strategy parameters, then these are uncoordinated calls, and I would give priority to enqueue_links.
  • context.request.enqueue_strategy - This is debatable, perhaps it would be worth doing when calling add_requests with a list of requests without enqueue_links?

@janbuchar @vdusek Perhaps you will have some ideas.

@Pijukatel Pijukatel requested review from vdusek and removed request for Pijukatel December 11, 2025 17:59
@vdusek vdusek changed the title fix: Respect enque_strategy after redirecting for Requests added to the queue using enqueue_links. fix: Respect enque_strategy after redirecting for Requests added to the queue using enqueue_links Dec 11, 2025
Copy link
Collaborator

@vdusek vdusek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just typos, otherwise LGTM

@vdusek vdusek changed the title fix: Respect enque_strategy after redirecting for Requests added to the queue using enqueue_links fix: Respect enqueue_strategy after redirecting for Requests added to the queue using enqueue_links Dec 12, 2025
@vdusek vdusek changed the title fix: Respect enqueue_strategy after redirecting for Requests added to the queue using enqueue_links fix: Respect enqueue_strategy after redirects in enqueue_links Dec 12, 2025
@vdusek vdusek changed the title fix: Respect enqueue_strategy after redirects in enqueue_links fix: Respect enqueue_strategy after redirects in enqueue_links Dec 12, 2025
@vdusek vdusek merged commit 700df91 into apify:master Dec 15, 2025
31 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

enqueue_links does not set the corresponding strategy in the Request.enqueue_strategy attribute

3 participants