Releases: aborruso/scrape-cli
Release v1.2.0
Bug Fix
- Fixed XPath detection for expressions wrapped in parentheses
- XPath expressions like (//div[@class='coordinate lat'])[1] are now correctly recognized as XPath instead of being incorrectly treated as CSS selectors
- Enhanced the is_xpath function with additional pattern recognition for XPath-specific syntax, including attribute predicates, position predicates, and XPath functions
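The kind of pattern recognition described above can be sketched as follows. This is only an illustration: the name is_xpath_sketch and the exact regular expressions are assumptions, and the real is_xpath function in scrape-cli may use different rules.

```python
import re

# Assumed hint patterns, chosen for illustration only.
XPATH_HINTS = (
    re.compile(r"^\(*/"),    # leading / or //, optionally wrapped in parentheses
    re.compile(r"::"),       # axis syntax such as ancestor::div
    re.compile(r"\[@"),      # attribute predicate such as [@class='x']
    re.compile(r"\[\d+\]"),  # position predicate such as [1]
    re.compile(r"\b(?:last|position|contains|text)\("),  # common XPath functions
)


def is_xpath_sketch(expression: str) -> bool:
    """Return True if the expression matches any XPath-specific hint."""
    expr = expression.strip()
    return any(pattern.search(expr) for pattern in XPATH_HINTS)


print(is_xpath_sketch("(//div[@class='coordinate lat'])[1]"))
print(is_xpath_sketch('a[href*="/talk/"]'))
```

Under these assumed patterns the parenthesized expression is classified as XPath, while the CSS attribute selector is not, because CSS attribute selectors never start a bracket with @ or contain a bare number in brackets.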
Installation
pip install scrape_cli==1.2.0
v1.1.9: CSS Selector Fix
What Changed
🐛 Bug Fix: Fixed CSS selectors being incorrectly identified as XPath
Details
- Fixed is_xpath() function to properly distinguish CSS selectors from XPath expressions
- CSS selectors like a[href*="/talk/"] now work correctly
- Improved selector recognition logic to be more restrictive for XPath detection
Technical Changes
- Updated XPath detection to only recognize expressions starting with / or // or containing ::
- This prevents CSS attribute selectors with square brackets from being misidentified as XPath
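A minimal sketch of the restrictive rule described above, assuming it amounts to a prefix check plus a substring check. The function name is_xpath_v119_sketch is hypothetical and is not the project's actual code.

```python
def is_xpath_v119_sketch(expression: str) -> bool:
    """Treat an expression as XPath only if it starts with / (or //) or contains ::."""
    expr = expression.strip()
    # startswith("/") also covers expressions that start with "//".
    return expr.startswith("/") or "::" in expr


print(is_xpath_v119_sketch("//div[@id='main']"))
print(is_xpath_v119_sketch('a[href*="/talk/"]'))
```

With this rule, square brackets alone no longer trigger XPath handling. One consequence of a check this narrow is that an expression starting with a parenthesis, such as (//div)[1], is not matched. In this sketch, a CSS pseudo-element selector such as p::before would also be matched by the :: check.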
Installation
pip install --upgrade scrape-cli
v1.1.8
What's Changed
Features
- Added text extraction functionality with -t option
  - Extract only text content without HTML tags
  - Automatically excludes text from script and style tags
  - Cleans up whitespace for better readability
  - Particularly useful for LLMs and text processing workflows
  - Can be combined with CSS selectors or XPath expressions for targeted text extraction
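The behavior listed above (drop tags, skip script and style content, collapse whitespace) can be approximated with the Python standard library. This is a sketch under assumptions: the class TextOnly and the function extract_text are hypothetical names, and scrape-cli's own implementation may work differently.

```python
import re
from html.parser import HTMLParser


class TextOnly(HTMLParser):
    """Collect text nodes, ignoring anything inside <script> or <style>."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # greater than 0 while inside a skipped element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)


def extract_text(html: str) -> str:
    """Return visible text with runs of whitespace collapsed to single spaces."""
    parser = TextOnly()
    parser.feed(html)
    parser.close()
    # Joining with a space keeps words from adjacent elements apart.
    return re.sub(r"\s+", " ", " ".join(parser.chunks)).strip()


sample = (
    "<html><head><style>p{color:red}</style></head>"
    "<body><p>Hello   <b>world</b></p><script>var x=1;</script></body></html>"
)
print(extract_text(sample))  # prints: Hello world
```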
v1.1.7
What's Changed
Features
- Improved XPath detection with support for complex expressions:
  - Added support for predicates and square brackets
  - Added support for XPath functions (last(), position(), contains(), text())
  - Added support for XPath axes and attributes
  - Better handling of complex XPath expressions
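To illustrate the kinds of expressions listed above, here is a small sketch using Python's standard xml.etree.ElementTree module. ElementTree implements only a subset of XPath, so it is used here purely to show position predicates, last(), and attribute predicates; contains() and text() are not available in it. The sample markup is made up for the example.

```python
import xml.etree.ElementTree as ET

# Made-up sample document for illustration.
doc = ET.fromstring(
    "<ul><li class='a'>one</li><li>two</li><li class='b'>three</li></ul>"
)

first = [li.text for li in doc.findall("./li[1]")]               # position predicate
last = [li.text for li in doc.findall("./li[last()]")]           # last() function
by_class = [li.text for li in doc.findall("./li[@class='b']")]   # attribute predicate
with_class = [li.text for li in doc.findall("./li[@class]")]     # attribute presence

print(first, last, by_class, with_class)
```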