From b8ad7be9f21ded245e8440e8173f5bd10a067793 Mon Sep 17 00:00:00 2001
From: Zachary Roth <zach.accounts@pm.me>
Date: Mon, 17 Nov 2025 14:36:14 -0800
Subject: [PATCH] Add docpull
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

docpull extracts documentation from websites and converts it into clean, AI-ready Markdown files. It provides smart content extraction, async parallel fetching, JavaScript rendering support, and structured Markdown output, which makes it useful for building knowledge bases and training datasets.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
---
 README.md | 1 +
 1 file changed, 1 insertion(+)

diff --git a/README.md b/README.md
index 6e22d7757..32a6dbecc 100644
--- a/README.md
+++ b/README.md
@@ -1135,6 +1135,7 @@ Inspired by [awesome-php](https://github.com/ziadoz/awesome-php).
 *Libraries for extracting web contents.*
 
 * [html2text](https://github.com/Alir3z4/html2text) - Convert HTML to Markdown-formatted text.
+* [docpull](https://github.com/raintree-technology/docpull) - Extracts documentation from websites and converts it into clean, AI-ready Markdown files.
 * [lassie](https://github.com/michaelhelmick/lassie) - Web Content Retrieval for Humans.
 * [micawber](https://github.com/coleifer/micawber) - A small library for extracting rich content from URLs.
 * [newspaper](https://github.com/codelucas/newspaper) - News extraction, article extraction and content curation in Python.