fix(amazon): Amazon Business URL + hash-based pagination #71

Closed
arminfauland wants to merge 1 commit into Disane87:main from arminfauland:fix/amazon-business-pagination

@arminfauland

Problem

The existing Amazon config had two critical bugs for Amazon Business accounts:

  1. Wrong URL: your-orders/orders only shows a subset of orders, missing ~60% of them (6 pages found vs. 14 actual pages). Amazon Business uses /gp/css/order-history instead.

  2. Broken pagination: ?startIndex=N is ignored by the React SPA. The page always showed the same first 10 orders regardless of startIndex.

Fix

Navigate directly via hash-based routing which the React SPA uses natively:

https://www.amazon.de/gp/css/order-history#time/2024/pagination/2/

No dropdown interaction needed — year and page are fully encoded in the URL hash.
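
For illustration, a minimal sketch of how such a URL could be assembled. The helper name `buildOrderHistoryUrl` is hypothetical and not part of the config in this PR; the base URL and hash layout are taken from the example above, and the sketch assumes page numbering starts at 1.

```typescript
// Hypothetical helper, not part of the actual config in this PR.
// Base URL and hash layout are copied from the example URL above.
const ORDER_HISTORY_BASE = "https://www.amazon.de/gp/css/order-history";

function buildOrderHistoryUrl(year: number, page: number): string {
  // Assumption: pages are 1-based; the PR only shows "pagination/2".
  if (!Number.isInteger(year) || !Number.isInteger(page) || page < 1) {
    throw new RangeError(`invalid year/page: ${year}/${page}`);
  }
  // Year and page are both encoded in the hash fragment.
  return `${ORDER_HISTORY_BASE}#time/${year}/pagination/${page}/`;
}
```

Calling `buildOrderHistoryUrl(2024, 2)` would yield the URL shown above.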

Additional fixes

  • extractedYears selector: option values are "2024" not "year-2024" in #timeFilterDropdown
  • extractedYearsNormalized: normalizes to the internal year-XXXX format via $substring ($startsWith is not a JSONata built-in, so it cannot be used here)
  • maxPages=0 treated as "no limit" (same as empty)
  • shouldBreak disabled when explicit targetYear is set — historical runs must process all pages
  • yearDisplayText moved before Navigate to year action (was computed after, causing empty hash)
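
The intent behind three of these fixes can be sketched roughly as follows. This is TypeScript for illustration only; the actual config uses JSONata expressions, and the function names here (`normalizeYear`, `effectiveMaxPages`, `mayBreakEarly`) are hypothetical.

```typescript
// Hypothetical helpers mirroring the behaviours listed above.
// The real config expresses these in JSONata; names are illustrative.

// "2024" -> "year-2024"; values already in the internal form pass through.
function normalizeYear(optionValue: string): string {
  const v = optionValue.trim();
  // Prefix check via substring, analogous to using $substring in JSONata.
  return v.substring(0, 5) === "year-" ? v : `year-${v}`;
}

// An empty value or 0 both mean "no limit".
function effectiveMaxPages(maxPages?: number | null): number {
  return maxPages === undefined || maxPages === null || maxPages === 0
    ? Number.POSITIVE_INFINITY
    : maxPages;
}

// Early exit is only allowed when no explicit target year was requested,
// so historical runs process every page.
function mayBreakEarly(shouldBreak: boolean, targetYear?: string): boolean {
  return shouldBreak && !targetYear;
}
```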

Tested

Amazon Business DE account — 2024: 14 pages / 136 orders / 198 PDFs successfully downloaded to Paperless-NGX consume directory.

Previously (broken): 6 pages / ~60 orders / same 10 orders repeated on every "page".

The previous config used `your-orders/orders` which only shows a subset
of orders for Amazon Business accounts. The correct URL is
`/gp/css/order-history`.

The `?startIndex=N` URL parameter is ignored by the React SPA. Pagination
is hash-based: `#time/YYYY/pagination/N/`. This fix navigates directly
via hash, removing the need for any dropdown interaction.

Additional fixes:
- extractedYears selector updated for actual option values ("2024" not "year-2024")
- $startsWith (not a JSONata built-in) replaced with $substring
- maxPages=0 now treated as "no limit" (same as empty)
- shouldBreak disabled when explicit targetYear is set (historical runs)
- yearDisplayText computed before Navigate to year action

Tested with Amazon Business DE: 14 pages / 136 orders / 198 PDFs for 2024

@github-actions github-actions Bot left a comment

👋 Thanks for opening your first pull request! We're excited to have you contribute to Scrape Dojo.

A maintainer will review this shortly. Please make sure:

  • ✅ Your code follows the project's code style
  • ✅ You've tested your changes locally
  • ✅ All existing tests still pass

Check out our Contributing Guide for more info!

@Disane87
Owner

Review Feedback: Compatibility with Standard Amazon (Consumer) Accounts

Great fix for Amazon Business! 🎉 The hash-based pagination and switch to /gp/css/order-history solve the problem cleanly.

However, we have concerns about whether this fix also works with regular Amazon consumer accounts (non-Business):

Specific Points to Verify

  1. Hash-based pagination (#time/2024/pagination/2/): Standard consumer accounts may still use the server-rendered order history, where hash routing is ignored and ?startIndex=N is the correct approach. Can you verify that pagination also works on a regular consumer account?

  2. clickByText: "Rechnung": This is DE-locale-specific. On EN accounts or .com domains, this would need to be "Invoice". Does this work on your account because it's set to DE, or have you tested with other locales as well?

  3. #timeFilterDropdown vs. #time-filter: The fallback selector for #time-filter is included — good. But have you been able to verify whether standard accounts use the React dropdown or the classic <select>?
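
To make point 2 concrete, one possible shape for a locale-aware label lookup is sketched below. It is an assumption for discussion, not existing Scrape Dojo behaviour: `INVOICE_LABELS` and `invoiceLabel` are hypothetical names, and only the two labels mentioned above are included.

```typescript
// Hypothetical lookup; only the two labels named in this review are listed.
const INVOICE_LABELS: Record<string, string> = {
  de: "Rechnung",
  en: "Invoice",
};

function invoiceLabel(locale: string): string {
  const label = INVOICE_LABELS[locale.toLowerCase()];
  if (label === undefined) {
    // Fail loudly instead of silently clicking the wrong text.
    throw new Error(`no invoice label configured for locale "${locale}"`);
  }
  return label;
}
```

The returned string would then be passed to clickByText in place of the hard-coded "Rechnung".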

Request

Could you test this fix with a regular Amazon consumer account (non-Business) as well? We unfortunately lack experience with the React SPA version of the order history, and we want to make sure there's no regression for existing standard users.

If it turns out that standard and Business accounts get different page versions, a separate amazon-business.jsonc config would be a clean alternative.

Thanks! 🙏

@github-actions

This PR has been automatically marked as stale because it has not had recent activity. It will be closed in 14 days if no further activity occurs.

@github-actions github-actions Bot added the "stale" label (Marked as stale due to inactivity) on Apr 11, 2026
@github-actions

This PR was closed because it has been stale for 14 days with no activity. Feel free to reopen if still relevant.

@github-actions github-actions Bot closed this Apr 25, 2026