fix(amazon): Amazon Business URL + hash-based pagination #71
arminfauland wants to merge 1 commit into Disane87:main
The previous config used `your-orders/orders` which only shows a subset
of orders for Amazon Business accounts. The correct URL is
`/gp/css/order-history`.
The `?startIndex=N` URL parameter is ignored by the React SPA. Pagination
is hash-based: `#time/YYYY/pagination/N/`. This fix navigates directly
via hash, removing the need for any dropdown interaction.
Additional fixes:
- extractedYears selector updated for actual option values ("2024" not "year-2024")
- $startsWith (not a JSONata built-in) replaced with $substring
- maxPages=0 now treated as "no limit" (same as empty)
- shouldBreak disabled when explicit targetYear is set (historical runs)
- yearDisplayText computed before Navigate to year action
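The first four fixes above can be sketched in Python as follows. This is a rough illustration only, not the JSONata expressions from the actual config: the function names are hypothetical, the rule that only plain four-digit values get the `year-` prefix is an assumption, and `reached_known_orders` is a stand-in for whatever condition normally triggers `shouldBreak`, which this PR does not show.

```python
from typing import Optional


def normalize_year(option_value: str) -> str:
    """Map a dropdown option value such as "2024" to the internal "year-2024" form."""
    # Prefix check by slicing, in the spirit of a $substring comparison
    # (JSONata has no $startsWith).
    if option_value[:5] == "year-":
        return option_value
    # Assumption: only plain four-digit values are years; anything else passes through.
    if len(option_value) == 4 and option_value.isdigit():
        return "year-" + option_value
    return option_value


def page_limit(max_pages: Optional[int]) -> Optional[int]:
    """Return None ("no limit") when max_pages is empty or 0."""
    return max_pages if max_pages else None


def should_break(reached_known_orders: bool, target_year: Optional[str]) -> bool:
    """Allow an early stop only when no explicit target year is set."""
    if target_year:
        # Historical runs must process all pages.
        return False
    return reached_known_orders
```

For example, `page_limit(0)` and `page_limit(None)` both yield `None`, and `should_break(True, "2024")` stays `False` so a historical run is never cut short.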
Tested with Amazon Business DE: 14 pages / 136 orders / 198 PDFs for 2024
👋 Thanks for opening your first pull request! We're excited to have you contribute to Scrape Dojo.
A maintainer will review this shortly. Please make sure:
- ✅ Your code follows the project's code style
- ✅ You've tested your changes locally
- ✅ All existing tests still pass
Check out our Contributing Guide for more info!
Review Feedback: Compatibility with Standard Amazon (Consumer) Accounts
Great fix for Amazon Business! 🎉 The hash-based pagination and switch to [...]
However, we have concerns about whether this fix also works with regular Amazon consumer accounts (non-Business).
Specific Points to Verify
[...]
Request
Could you test this fix with a regular Amazon consumer account (non-Business) as well? We unfortunately lack experience with the React SPA version of the order history, and we want to make sure there's no regression for existing standard users. If it turns out that standard and Business accounts get different page versions, a separate [...]
Thanks! 🙏
This PR has been automatically marked as stale because it has not had recent activity. It will be closed in 14 days if no further activity occurs.
This PR was closed because it has been stale for 14 days with no activity. Feel free to reopen if still relevant.
Problem
The existing Amazon config had two critical bugs for Amazon Business accounts:
- Wrong URL: `your-orders/orders` only shows a subset of orders. Amazon Business uses `/gp/css/order-history`. The old URL missed roughly 60% of orders (6 pages found vs. 14 actual pages).
- Broken pagination: `?startIndex=N` is ignored by the React SPA. The page always showed the same first 10 orders regardless of `startIndex`.
Fix
Navigate directly via the hash-based routing that the React SPA uses natively: `#time/YYYY/pagination/N/`.
No dropdown interaction needed — year and page are fully encoded in the URL hash.
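As a minimal sketch of how such a target URL could be assembled, assuming the hash is appended to the `/gp/css/order-history` path: the function name and the `base_url` parameter are hypothetical, and the page number `N` is passed through unchanged because the PR does not state whether counting starts at 0 or 1.

```python
def order_history_url(base_url: str, year: int, page: int) -> str:
    """Build an order-history URL with year and page encoded in the hash."""
    # Hash pattern from the PR description: #time/YYYY/pagination/N/
    return f"{base_url.rstrip('/')}/gp/css/order-history#time/{year}/pagination/{page}/"


# Example with an assumed marketplace host:
print(order_history_url("https://www.amazon.de", 2024, 3))
```

Because the fragment is never sent to the server, navigating to a URL like this relies on the SPA's client-side router reading the hash, which matches the behaviour described in this PR.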
Additional fixes
- `extractedYears` selector: option values are `"2024"`, not `"year-2024"`, in `#timeFilterDropdown`
- `extractedYearsNormalized`: normalizes to the internal `year-XXXX` format via `$substring` (not `$startsWith`, which is not a JSONata built-in)
- `maxPages=0` treated as "no limit" (same as empty)
- `shouldBreak` disabled when an explicit `targetYear` is set, since historical runs must process all pages
- `yearDisplayText` moved before the `Navigate to year` action (it was computed after, causing an empty hash)
Tested
Amazon Business DE account — 2024: 14 pages / 136 orders / 198 PDFs successfully downloaded to Paperless-NGX consume directory.
Previously (broken): 6 pages / ~60 orders / same 10 orders repeated on every "page".