From b8ad7be9f21ded245e8440e8173f5bd10a067793 Mon Sep 17 00:00:00 2001
From: Zachary Roth
Date: Mon, 17 Nov 2025 14:36:14 -0800
Subject: [PATCH] Add docpull
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

docpull is a tool that extracts documentation from websites and converts it into clean, AI-ready Markdown files. It provides smart content extraction, async parallel fetching, JavaScript rendering support, and structured Markdown output - making it valuable for building knowledge bases and training datasets.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude
---
 README.md | 1 +
 1 file changed, 1 insertion(+)

diff --git a/README.md b/README.md
index 6e22d7757..32a6dbecc 100644
--- a/README.md
+++ b/README.md
@@ -1135,6 +1135,7 @@ Inspired by [awesome-php](https://github.com/ziadoz/awesome-php).
 *Libraries for extracting web contents.*
 
 * [html2text](https://github.com/Alir3z4/html2text) - Convert HTML to Markdown-formatted text.
+* [docpull](https://github.com/raintree-technology/docpull) - Extracts documentation from websites and converts it into clean, AI-ready Markdown files.
 * [lassie](https://github.com/michaelhelmick/lassie) - Web Content Retrieval for Humans.
 * [micawber](https://github.com/coleifer/micawber) - A small library for extracting rich content from URLs.
 * [newspaper](https://github.com/codelucas/newspaper) - News extraction, article extraction and content curation in Python.