A big spider update that takes the crawling framework to the next level 🕷️
Note
Follow us on X for daily tips and tricks
🚀 New Stuff and quality of life changes
- Added a `LinkExtractor` primitive in `scrapling.spiders.LinkExtractor` to pull URLs out of a `Response`. It has many controls (check the docs).
- Added `CrawlSpider` and `CrawlRule` generic spider templates so you no longer have to hand-write the same "follow links matching this pattern" boilerplate. Override `rules()` to return a list of `CrawlRule` objects, each pairing a `LinkExtractor`. (Check the docs)
- Added a `SitemapSpider` template that seeds a crawl directly from sitemap or `robots.txt` URLs. It handles gzip-compressed sitemaps and has many controls and options. URLs are dispatched via the crawl rules as described above for `CrawlSpider`. (Check the docs)
- Adaptive relocation now defaults to a 40% similarity threshold instead of `0` across all methods. This will make the adaptive feature work better. When nothing crosses the threshold, a warning now tells you the top score it did see, so you can lower `percentage` deliberately if needed.
- Updated all browsers and fingerprints. Run `scrapling install --force` again after updating to refresh the browsers and fingerprints.
🐛 Bug Fixes
- Fixed `Fetcher.configure(...)` not applying to per-request calls. Same fix applied to `AsyncFetcher`.
Docs
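To make the adaptive threshold change above concrete, here is a minimal standalone sketch of the behavior the notes describe: candidates scoring below the threshold are dropped, and a warning reports the best score seen when nothing qualifies. This is not Scrapling's implementation. The `relocate` function is hypothetical, `difflib.SequenceMatcher` is only a stand-in for whatever similarity scoring the library uses, and only the `percentage` name and the 40% default come from the notes.

```python
import logging
from difflib import SequenceMatcher

logger = logging.getLogger("adaptive_sketch")


def relocate(target: str, candidates: list[str], percentage: float = 40.0) -> list[str]:
    """Hypothetical helper: keep candidates at least `percentage` similar to `target`."""
    # Score every candidate on a 0-100 scale (difflib is a stand-in metric,
    # not the scoring Scrapling actually uses).
    scored = [
        (SequenceMatcher(None, target, candidate).ratio() * 100, candidate)
        for candidate in candidates
    ]
    matches = [candidate for score, candidate in scored if score >= percentage]
    if not matches and scored:
        # Nothing crossed the threshold: report the best score that was seen,
        # so the caller can decide whether lowering `percentage` makes sense.
        top_score = max(score for score, _ in scored)
        logger.warning(
            "No candidate reached %.0f%%; best score seen was %.1f%%",
            percentage,
            top_score,
        )
    return matches
```

With the default threshold, an unrelated candidate is rejected. Passing `percentage=0` reproduces the old accept-anything behavior, which is why the new default is stricter about what counts as a relocated element.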
🙏 Special thanks to the community for all the continuous testing and feedback
Big shoutout to our Platinum Sponsors