docudigger - ARCHIVED

This project has been archived and is no longer maintained.

The successor is Scrape Dojo — a much more powerful, flexible, and feature-rich web scraping platform.

Why Scrape Dojo?

Scrape Dojo is the next evolution of docudigger. While docudigger was limited to Amazon invoice scraping, Scrape Dojo is a full-featured, self-hosted web scraping & browser automation platform.

Key Features of Scrape Dojo

Declarative JSON/JSONC Workflows — Define scrapes as code, no more writing Puppeteer scripts manually
25+ Built-in Actions — Navigate, click, type, extract, loop, download, screenshot, and more
Universal Scraping — Not limited to Amazon; scrape any website with customizable workflows
Cron Scheduling & Webhooks — Automate scrapes with cron patterns, webhooks, or startup triggers
Handlebars + JSONata Templates — Dynamic templates and powerful data transformations
Encrypted Secrets — AES-256-CBC at-rest encryption for credentials
Real-time Monitoring — SSE-powered live execution tracking with a modern Angular UI
Authentication & SSO — JWT, OIDC/SSO, MFA/TOTP, API keys
Multi-Database Support — SQLite (default), MySQL, PostgreSQL
Docker-Ready — Easy deployment with Docker Compose
Modern Tech Stack — Built with NestJS, Angular, Puppeteer, TypeScript, and Nx
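
As an illustration of the "Encrypted Secrets" item above, here is a minimal sketch of AES-256-CBC encryption at rest using Node's built-in crypto module. It is not taken from the Scrape Dojo codebase: the helper names (deriveKey, encryptSecret, decryptSecret), the scrypt key derivation, and the "iv:ciphertext" hex storage format are all assumptions made for this example.

```typescript
import { createCipheriv, createDecipheriv, randomBytes, scryptSync } from "node:crypto";

// Hypothetical helpers for illustration only; Scrape Dojo's real implementation may differ.
const ALGORITHM = "aes-256-cbc";

// Derive a 32-byte key from a passphrase. Using scrypt here is an assumption.
function deriveKey(passphrase: string, salt: Buffer): Buffer {
  return scryptSync(passphrase, salt, 32);
}

// Encrypt a secret and return "ivHex:ciphertextHex" (an assumed storage format).
function encryptSecret(plaintext: string, key: Buffer): string {
  const iv = randomBytes(16); // CBC needs a fresh 16-byte IV for every message
  const cipher = createCipheriv(ALGORITHM, key, iv);
  const ciphertext = Buffer.concat([cipher.update(plaintext, "utf8"), cipher.final()]);
  return `${iv.toString("hex")}:${ciphertext.toString("hex")}`;
}

// Reverse of encryptSecret: split off the IV, then decrypt the remainder.
function decryptSecret(stored: string, key: Buffer): string {
  const [ivHex, dataHex] = stored.split(":");
  const decipher = createDecipheriv(ALGORITHM, key, Buffer.from(ivHex, "hex"));
  const plaintext = Buffer.concat([decipher.update(Buffer.from(dataHex, "hex")), decipher.final()]);
  return plaintext.toString("utf8");
}

// Usage: the stored value is safe to persist as long as the key is kept elsewhere.
const exampleKey = deriveKey("example-passphrase", randomBytes(16));
const storedValue = encryptSecret("my-shop-password", exampleKey);
const recovered = decryptSecret(storedValue, exampleKey);
```

CBC mode on its own provides confidentiality but not integrity, so a design along these lines would typically pair it with a MAC or rely on other safeguards against tampering.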

Get Started with Scrape Dojo

docker compose up -d

Full documentation: scrape-dojo.com

Original Project

docudigger was a document scraper that automatically downloaded invoices as PDFs, which is useful for tax records or for feeding a document management system (DMS). It supported Amazon invoice scraping via a CLI or Docker.

Author

Marco Franke

License

MIT

