diff --git a/ocr/arabic/python/general/extract-text-from-pdf-ocr-pdf-with-python/_index.md b/ocr/arabic/python/general/extract-text-from-pdf-ocr-pdf-with-python/_index.md
new file mode 100644
index 000000000..e51f19ee5
--- /dev/null
+++ b/ocr/arabic/python/general/extract-text-from-pdf-ocr-pdf-with-python/_index.md
@@ -0,0 +1,218 @@
+---
+category: general
+date: 2026-04-29
+description: Extract text from a PDF file using Aspose OCR in Python. Learn batch OCR processing of PDF
+  files, converting scanned PDF text, and handling low-confidence pages.
+draft: false
+keywords:
+- extract text from pdf
+- ocr pdf with python
+- convert scanned pdf text
+- batch ocr pdf processing
+language: ar
+og_description: Extract text from a PDF file using Aspose OCR in Python. This guide covers
+  batch OCR processing of PDF files, converting scanned PDF text, and handling
+  low-confidence results.
+og_title: Extract Text from PDF – Optical Character Recognition on PDF with Python
+tags:
+- OCR
+- Python
+- PDF processing
+title: Extract Text from PDF – OCR PDF with Python
+url: /ar/python/general/extract-text-from-pdf-ocr-pdf-with-python/
+---
+
+{{< blocks/products/pf/main-wrap-class >}}
+{{< blocks/products/pf/main-container >}}
+{{< blocks/products/pf/tutorial-page-section >}}
+
+# Extract Text from PDF – OCR PDF with Python
+
+Have you ever needed to **extract text from a PDF** only to find that the file is just a scanned image? You are not alone: many developers hit this wall when they try to turn PDF files into searchable data. The good news is that with Aspose OCR for Python you can convert scanned PDF text in just a few lines, and even run **batch OCR PDF processing** when you have dozens of files to handle.
+
+In this tutorial we walk through the whole workflow: setting up the library, running OCR on a single PDF, scaling up to a batch, and handling low-confidence pages so you know when manual review is needed. By the end you will have a ready-to-run script that extracts text from any scanned PDF, and you will understand the reason behind each step.
+
+## What You Need
+
+Before we start, make sure you have the following:
+
+- Python 3.8 or later (the code uses f-strings, so 3.6+ works, but 3.8+ is recommended)
+- An Aspose OCR for Python license or a free trial key (available from the Aspose website)
+- A folder containing one or more scanned PDF files you want to process
+- A modest amount of disk space for the generated *.txt* reports
+
+That is all: no heavy external dependencies and no OpenCV gymnastics. The Aspose OCR engine does the heavy lifting for you.
+
+## Setting Up the Environment
+
+First, install the Aspose OCR package from PyPI:
+
+```bash
+pip install aspose-ocr
+```
+
+If you have a license file (`Aspose.OCR.lic`), place it in the project root and activate it as follows:
+
+```python
+# Activate Aspose OCR license (optional but removes trial watermark)
+from aspose.ocr import License
+
+license = License()
+license.set_license("Aspose.OCR.lic")
+```
+
+> **Pro tip:** Keep the license file out of version control; add it to `.gitignore` to avoid exposing it by accident.
+
+## Running OCR on a Single PDF
+
+Now let's extract text from a single scanned PDF. The basic steps are:
+
+1. Create an `OcrEngine` object.
+2. Point it at the PDF file.
+3. Retrieve an `OcrResult` for each page.
+4. Write the text output to disk.
+5. Dispose of the engine to free native resources.
+
+Here is the complete runnable script:
+
+```python
+# Step 1: Import the OCR engine and create an instance
+from aspose.ocr import OcrEngine
+
+# Optional: activate license (see previous section)
+# from aspose.ocr import License
+# License().set_license("Aspose.OCR.lic")
+
+ocr_engine = OcrEngine()
+
+# Step 2: Specify the PDF file to be processed
+pdf_file_path = r"YOUR_DIRECTORY/input.pdf"
+
+# Step 3: Extract OCR results – one OcrResult object per page
+ocr_pages = ocr_engine.extract_from_pdf(pdf_file_path)
+
+# Step 4: Iterate through the results, show confidence, flag low‑confidence pages, and save the plain text
+for ocr_page in ocr_pages:
+    print(f"Page {ocr_page.page_number}: confidence {ocr_page.confidence:.2%}")
+
+    # If confidence is below 80 %, flag it for manual review
+    if ocr_page.confidence < 0.80:
+        print(" Low confidence – consider manual review")
+
+    # Build the output filename
+    output_txt = f"YOUR_DIRECTORY/Report_page{ocr_page.page_number}.txt"
+    with open(output_txt, "w", encoding="utf-8") as f:
+        f.write(ocr_page.text)
+
+# Step 5: Release resources used by the engine
+ocr_engine.dispose()
+```
+
+**What you will see:** For each page the script prints something like `Page 1: confidence 97.45%`. If a page falls below the 80% threshold, a warning tells you that OCR may have missed some characters.
+
+### Why Does This Work?
+
+- **`OcrEngine`** is the gateway to the native Aspose OCR library; it handles everything from image processing to character recognition.
+- **`extract_from_pdf`** automatically rasterizes each PDF page, so you do not need to convert the PDF to images yourself.
+- **Confidence scores** let you automate quality checks, which is crucial when you process legal or medical documents where accuracy matters.
+
+## Batch OCR PDF Processing with Python
+
+Most real-world projects deal with more than one file. Let's extend the single-file script into a **batch OCR PDF processing pipeline** that walks a directory, processes every PDF, and stores the results in a matching subfolder.
+
+```python
+from pathlib import Path
+from aspose.ocr import OcrEngine
+
+def ocr_pdf_file(pdf_path: Path, output_dir: Path, engine: OcrEngine, confidence_threshold: float = 0.80):
+    """Extract text from a single PDF and write per‑page .txt files."""
+    ocr_pages = engine.extract_from_pdf(str(pdf_path))
+    for page in ocr_pages:
+        print(f"{pdf_path.name} – Page {page.page_number}: {page.confidence:.2%}")
+        if page.confidence < confidence_threshold:
+            print("  ⚠️ Low confidence – manual review may be needed")
+
+        out_file = output_dir / f"{pdf_path.stem}_page{page.page_number}.txt"
+        out_file.write_text(page.text, encoding="utf-8")
+
+def batch_ocr_pdf(input_folder: str, output_folder: str):
+    """Iterate over all PDFs in input_folder and run OCR."""
+    input_path = Path(input_folder)
+    output_path = Path(output_folder)
+    output_path.mkdir(parents=True, exist_ok=True)
+
+    pdf_files = list(input_path.glob("*.pdf"))
+    if not pdf_files:
+        print("🚫 No PDF files found in the specified folder.")
+        return
+
+    # Create the engine only once there is work to do
+    engine = OcrEngine()
+    try:
+        for pdf_file in pdf_files:
+            # Create a sub‑folder for each PDF to keep pages organized
+            pdf_out_dir = output_path / pdf_file.stem
+            pdf_out_dir.mkdir(exist_ok=True)
+            ocr_pdf_file(pdf_file, pdf_out_dir, engine)
+    finally:
+        # Clean up native resources even if a file fails
+        engine.dispose()
+    print("✅ Batch OCR completed.")
+
+# Example usage:
+if __name__ == "__main__":
+    batch_ocr_pdf("YOUR_DIRECTORY/input_pdfs", "YOUR_DIRECTORY/ocr_output")
+```
+
+#### How Does This Help?
+
+- **Scalability:** The function walks the folder once and creates a dedicated output folder for each PDF. This keeps things organized when you have dozens of documents.
+- **Reusability:** `ocr_pdf_file` can be called from other scripts (such as a web service) because it receives the engine, paths, and threshold as arguments instead of relying on global state.
+- **Error handling:** The script prints a friendly message if the input directory is empty, which saves you from a silent failure.
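
The folder layout described above can be previewed without an OCR engine. The following is a minimal sketch that uses only `pathlib`; `plan_output_dirs` is a hypothetical helper name (not part of Aspose OCR), and the case-insensitive extension match is an added assumption rather than something the tutorial's `glob("*.pdf")` call does.

```python
from pathlib import Path
from typing import Dict


def plan_output_dirs(input_folder: str, output_folder: str) -> Dict[Path, Path]:
    """Map each PDF in input_folder to the sub-folder its pages would go to.

    Hypothetical helper for illustration only: it calls no OCR API and
    creates no directories.
    """
    input_path = Path(input_folder)
    output_path = Path(output_folder)
    plan: Dict[Path, Path] = {}
    # sorted() gives a stable order; suffix.lower() also catches "SCAN.PDF"
    for entry in sorted(input_path.iterdir()):
        if entry.is_file() and entry.suffix.lower() == ".pdf":
            plan[entry] = output_path / entry.stem
    return plan
```

Running a dry plan like this before the real batch may help catch misnamed files or an empty input folder early, before any engine is created.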
+
+## Converting Scanned PDF Text – Handling Edge Cases
+
+While the code above works for most PDF files, you may run into a few quirks:
+
+| Case | Why it happens | How to mitigate |
+|-----------|----------------|-----------------|
+| **Encrypted PDFs** | The file is password protected. | Pass the password to `extract_from_pdf(pdf_path, password="yourPwd")`. |
+| **Multilingual documents** | Aspose OCR defaults to English. | Set `ocr_engine.language = "spa"` for Spanish, or supply a list for mixed languages. |
+| **Very large PDFs (>500 pages)** | Memory usage climbs because every page is loaded into memory. | Process the PDF in chunks using `engine.extract_from_pdf(pdf_path, start_page=1, end_page=100)` inside a loop. |
+| **Poor scan quality** | Low DPI or heavy noise reduces confidence. | Preprocess the image via `engine.image_preprocessing = True` or raise the DPI with `engine.dpi = 300`. |
+
+> **Caution:** Enabling image preprocessing can noticeably increase CPU time. If you run a nightly batch, schedule enough time or run a separate worker.
+
+## Verifying the Output
+
+After the script finishes, you will find a folder structure similar to:
+
+```
+ocr_output/
+├─ invoice_2023/
+│  ├─ invoice_2023_page1.txt
+│  ├─ invoice_2023_page2.txt
+│  └─ …
+└─ contract_A/
+   ├─ contract_A_page1.txt
+   └─ …
+```
+
+Open any `.txt` file; you should see clean UTF-8 text that mirrors the original scanned content. If you notice garbled characters, double-check the PDF's language settings and make sure the appropriate font packs are installed on the machine.
+
+## Cleaning Up Resources
+
+Aspose OCR relies on native DLLs, so it is essential to call `engine.dispose()` as soon as you are done. Forgetting this step can lead to memory leaks, especially in long-running batch jobs.
+
+```python
+# Always the last line of your script
+engine.dispose()
+```
+
+## Full End-to-End Example
+
+To bring everything together, here is a single example
+
+{{< /blocks/products/pf/tutorial-page-section >}}
+{{< /blocks/products/pf/main-container >}}
+{{< /blocks/products/pf/main-wrap-class >}}
+{{< blocks/products/products-backtop-button >}}
\ No newline at end of file
diff --git a/ocr/arabic/python/general/how-to-recognize-handwriting-in-python-full-tutorial/_index.md b/ocr/arabic/python/general/how-to-recognize-handwriting-in-python-full-tutorial/_index.md
new file mode 100644
index 000000000..7119d7d1f
--- /dev/null
+++ b/ocr/arabic/python/general/how-to-recognize-handwriting-in-python-full-tutorial/_index.md
@@ -0,0 +1,277 @@
+---
+category: general
+date: 2026-04-29
+description: Learn how to recognize handwriting in Python using Aspose OCR. This
+  step-by-step guide shows how to extract handwritten text efficiently.
+draft: false
+keywords:
+- how to recognize handwriting
+- extract handwritten text
+- handwritten text recognition python
+- handwritten ocr tutorial python
+language: ar
+og_description: How do you recognize handwriting in Python? Follow this complete guide to extract
+  handwritten text using Aspose OCR, with code, tips, and edge-case handling.
+og_title: How to Recognize Handwriting in Python – Full Guide
+tags:
+- OCR
+- Python
+- HandwritingRecognition
+title: How to Recognize Handwriting in Python – Full Guide
+url: /ar/python/general/how-to-recognize-handwriting-in-python-full-tutorial/
+---
+
+{{< blocks/products/pf/main-wrap-class >}}
+{{< blocks/products/pf/main-container >}}
+{{< blocks/products/pf/tutorial-page-section >}}
+
+# How to Recognize Handwriting in Python – Full Guide
+
+Have you ever needed to **recognize handwriting** in a Python project but were not sure where to start? You are not alone: developers constantly ask, "Can I extract text from a scanned note?" The good news is that modern OCR libraries make this very easy.
+In this guide we walk through **how to recognize handwriting** using Aspose OCR, and you will also learn how to **extract handwritten text** reliably.
+
+We cover everything from installing the library to tuning confidence thresholds for messy cursive writing. By the end you will have a runnable script that prints the extracted text and an overall confidence score, ideal for note-taking apps, archiving tools, or simply satisfying curiosity. No prior OCR experience is needed; basic Python knowledge is enough.
+
+---
+
+## What You Need
+
+- **Python 3.9+** (the latest stable release is the best choice)
+- **Aspose.OCR for Python via .NET** – install with `pip install aspose-ocr`
+- A **handwritten image** (JPEG/PNG) that you want to process
+- Optional: a virtual environment to keep dependencies tidy
+
+If you have these ready, let's begin.
+
+![Example of how to recognize handwriting](/images/handwritten-sample.jpg "Example of how to recognize handwriting")
+
+*(Alt text: "Example of how to recognize handwriting showing a scanned handwritten note")*
+
+---
+
+## Step 1 – Install and Import the Aspose OCR Classes
+
+First, we need the OCR engine itself. Aspose provides a clean API that separates printed-text recognition from handwriting mode.
+
+```python
+# Install the package (run this in your terminal if you haven't already)
+# pip install aspose-ocr
+
+# Import the required OCR classes
+from aspose.ocr import OcrEngine, HandwritingMode
+```
+
+*Why this matters:* Importing `HandwritingMode` lets us tell the engine that we are doing **handwritten text recognition** rather than reading printed text, which greatly improves accuracy on cursive writing.
+
+---
+
+## Step 2 – Create and Configure the OCR Engine
+
+Now we create an `OcrEngine` instance and switch it to handwriting mode. You can also adjust the confidence threshold; lower values accept shakier writing, and higher values require cleaner input.
+
+```python
+# Step 2: Initialize the OCR engine for handwritten text
+ocr_engine = OcrEngine()
+ocr_engine.set_recognition_mode(HandwritingMode.HANDWRITTEN)
+
+# Optional: tighten the confidence threshold for cursive scripts
+# Default is 0.5; 0.65 works well for most notebooks
+ocr_engine.set_handwriting_confidence_threshold(0.65)
+```
+
+*Pro tip:* Notes scanned at 300 DPI or higher usually give better results. For low-resolution images, consider upscaling them with Pillow before passing them to the engine.
+
+---
+
+## Step 3 – Set the Image Path
+
+Make sure the file path points to the image you want to process. Relative paths work fine, but absolute paths avoid "file not found" surprises.
+
+```python
+# Step 3: Point to your handwritten image
+input_image_path = "YOUR_DIRECTORY/handwritten_sample.jpg"
+```
+
+*Common pitfall:* Forgetting to escape backslashes on Windows (`C:\\folder\\image.jpg`). Using raw strings (`r"C:\folder\image.jpg"`) sidesteps the problem.
+
+---
+
+## Step 4 – Run Recognition and Collect the Results
+
+The `recognize` method does the heavy lifting. It returns an object with `.text` and `.confidence` properties.
+
+```python
+# Step 4: Run OCR and get the result
+result = ocr_engine.recognize(input_image_path)
+
+# Display the extracted text and confidence
+print("Hand‑written extraction:")
+print(result.text)
+print("Overall confidence:", result.confidence)
+```
+
+**Expected output (example):**
+
+```
+Hand‑written extraction:
+Meeting notes:
+- Discuss quarterly goals
+- Review budget allocations
+- Assign action items
+Overall confidence: 0.78
+```
+
+If confidence drops below 0.5, you may need to clean the image (remove shadows, increase contrast) or lower the threshold from Step 2.
+
+---
+
+## Step 5 – Clean Up Resources
+
+Aspose OCR holds native resources; calling `dispose()` releases them and prevents memory leaks, especially when you process many images in a loop.
+
+```python
+# Step 5: Release the engine resources
+ocr_engine.dispose()
+```
+
+*Why dispose?* In long-running services (such as a Flask API that accepts uploads), forgetting to release resources can quickly eat up system memory.
+
+---
+
+## The Full Script – One-Click Run
+
+Putting it all together, here is a standalone script you can copy, paste, and run.
+
+```python
+# -*- coding: utf-8 -*-
+"""
+Handwritten OCR Tutorial – how to recognize handwriting in Python
+Author: Your Name
+Date: 2026-04-29
+"""
+
+# Install the library first if you haven't already:
+# pip install aspose-ocr
+
+from aspose.ocr import OcrEngine, HandwritingMode
+
+def recognize_handwriting(image_path: str, confidence_threshold: float = 0.65):
+    """
+    Recognizes handwritten text from an image using Aspose OCR.
+
+    Args:
+        image_path (str): Path to the handwritten image file.
+        confidence_threshold (float): Minimum confidence to accept a word.
+
+    Returns:
+        tuple: (extracted_text (str), overall_confidence (float))
+    """
+    # Initialize engine in handwritten mode
+    engine = OcrEngine()
+    try:
+        engine.set_recognition_mode(HandwritingMode.HANDWRITTEN)
+        engine.set_handwriting_confidence_threshold(confidence_threshold)
+
+        # Perform recognition
+        result = engine.recognize(image_path)
+
+        # Capture output
+        return result.text, result.confidence
+    finally:
+        # Clean up even if recognition fails
+        engine.dispose()
+
+if __name__ == "__main__":
+    # Replace with your actual file location
+    img_path = "YOUR_DIRECTORY/handwritten_sample.jpg"
+
+    text, conf = recognize_handwriting(img_path)
+
+    print("Hand‑written extraction:")
+    print(text)
+    print("Overall confidence:", conf)
+```
+
+Save this as `handwritten_ocr.py` and run it with `python handwritten_ocr.py`. If everything is set up correctly, you will see the extracted text printed in the console.
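
Because `recognize_handwriting` returns a `(text, confidence)` tuple, results from several images can be triaged with plain Python. The following is a minimal sketch under stated assumptions: `triage_results` and `fake_recognize` are hypothetical names, the recognizer is passed in as a callable so the logic can be tried with a stub, and the 0.65 cut-off simply mirrors the threshold used in this tutorial.

```python
from typing import Callable, Dict, Iterable, List, Tuple


def triage_results(
    image_paths: Iterable[str],
    recognize: Callable[[str], Tuple[str, float]],
    min_confidence: float = 0.65,
) -> Tuple[Dict[str, str], List[str]]:
    """Split images into accepted text and paths that need manual review.

    `recognize` is any callable returning (text, confidence); in this
    tutorial that role would be played by recognize_handwriting.
    """
    accepted: Dict[str, str] = {}
    needs_review: List[str] = []
    for path in image_paths:
        text, confidence = recognize(path)
        if confidence >= min_confidence:
            accepted[path] = text
        else:
            needs_review.append(path)
    return accepted, needs_review


def fake_recognize(path: str) -> Tuple[str, float]:
    """Stub recognizer with canned values, used only to exercise the triage logic."""
    canned = {
        "clear.jpg": ("Meeting notes", 0.91),
        "smudged.jpg": ("M??ting", 0.42),
    }
    return canned[path]
```

Keeping the low-confidence paths in a separate list, rather than silently dropping them, leaves room for a manual review pass later.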
+
+---
+
+## Handling Edge Cases and Common Variations
+
+### Low-Contrast Images
+If the background bleeds into the ink, boost the contrast first:
+
+```python
+from PIL import Image, ImageEnhance
+
+img = Image.open(input_image_path)
+enhancer = ImageEnhance.Contrast(img)
+high_contrast = enhancer.enhance(2.0)  # 2× contrast
+high_contrast.save("temp_high_contrast.jpg")
+result = ocr_engine.recognize("temp_high_contrast.jpg")
+```
+
+### Skewed Notes
+A tilted notebook page can hurt recognition. Use OpenCV to correct the skew:
+
+```python
+import numpy as np
+import cv2
+
+def deskew(image_path):
+    # Load as grayscale; non-zero pixels are treated as foreground,
+    # so invert or threshold light-background scans first
+    img = cv2.imread(image_path, 0)
+    coords = np.column_stack(np.where(img > 0))
+    angle = cv2.minAreaRect(coords)[-1]
+    if angle < -45:
+        angle = -(90 + angle)
+    else:
+        angle = -angle
+    (h, w) = img.shape[:2]
+    M = cv2.getRotationMatrix2D((w // 2, h // 2), angle, 1.0)
+    corrected = cv2.warpAffine(img, M, (w, h), flags=cv2.INTER_CUBIC, borderMode=cv2.BORDER_REPLICATE)
+    cv2.imwrite("deskewed.jpg", corrected)
+
+deskew(input_image_path)
+result = ocr_engine.recognize("deskewed.jpg")
+```
+
+### Multi-Page PDFs
+Aspose OCR can also process PDF pages, but you need to convert each page to an image first (for example with `pdf2image`). Then loop over the images using the same `recognize_handwriting` function.
+
+---
+
+## Pro Tips for Better **Extract Handwritten Text** Results
+
+- **DPI matters:** Aim for 300 DPI or higher when scanning.
+- **Avoid colored backgrounds:** Pure white or light gray backgrounds give the cleanest output.
+- **Batch processing:** Wrap the function in a `for` loop and log each page's confidence; discard results that fall below the threshold to keep quality high.
+- **Language support:** Aspose OCR supports several languages; set `engine.set_language("en")` to optimize for English only.
+
+---
+
+## Frequently Asked Questions
+
+**Does this work on Linux?**
+Yes. Aspose OCR ships with native binaries for Windows, macOS, and Linux. Just install the pip package and you are ready to go.
+
+**What if my handwriting is very cursive?**
+Try lowering the confidence threshold (`0.5` or even `0.4`). Keep in mind that this may add more noise, so post-process the output (for example with a spell checker) if needed.
+
+**Can I use this in a web service?**
+Certainly. The `recognize_handwriting` function is stateless, which makes it a good fit for Flask or FastAPI endpoints. It creates and disposes of its own engine on every call; if you switch to a shared engine, remember to call `dispose()` after each request or use a context manager.
+
+---
+
+## Conclusion
+
+We have covered **how to recognize handwriting** in Python from start to finish, showing how to **extract handwritten text**, tune confidence settings, and handle common problems such as low contrast or skewed pages. The full script above is ready to run, and the modular function makes it easy to plug into larger projects, whether you are building a note-taking app, digitizing archives, or just experimenting with **handwritten OCR in Python**.
+
+Going forward, you might explore **handwritten text recognition in Python** for multilingual notes, or combine OCR with natural language processing to summarize meeting minutes automatically. The sky is the limit: try it out and let your code bring the scribbles to life.
+
+Happy coding, and feel free to ask questions in the comments!
+
+{{< /blocks/products/pf/tutorial-page-section >}}
+{{< /blocks/products/pf/main-container >}}
+{{< /blocks/products/pf/main-wrap-class >}}
+{{< blocks/products/products-backtop-button >}}
\ No newline at end of file
diff --git a/ocr/arabic/python/general/how-to-run-ocr-with-aspose-hugging-face-complete-guide/_index.md b/ocr/arabic/python/general/how-to-run-ocr-with-aspose-hugging-face-complete-guide/_index.md
new file mode 100644
index 000000000..4c2638ff3
--- /dev/null
+++ b/ocr/arabic/python/general/how-to-run-ocr-with-aspose-hugging-face-complete-guide/_index.md
@@ -0,0 +1,180 @@
+---
+category: general
+date: 2026-04-29
+description: Learn how to run OCR on your scans, use a Hugging Face model automatically,
+  and recognize text from scans with Aspose OCR in minutes.
+draft: false
+keywords:
+- how to run OCR
+- use hugging face model
+- recognize text from scans
+- download model automatically
+language: ar
+og_description: How to run OCR on scans with Aspose OCR, download a model from Hugging
+  Face automatically, and get clean, punctuated text.
+og_title: How to Run OCR with Aspose and Hugging Face – Full Guide
+tags:
+- OCR
+- Aspose
+- Hugging Face
+- Python
+title: How to Run OCR with Aspose and Hugging Face – Full Guide
+url: /ar/python/general/how-to-run-ocr-with-aspose-hugging-face-complete-guide/
+---
+
+{{< blocks/products/pf/main-wrap-class >}}
+{{< blocks/products/pf/main-container >}}
+{{< blocks/products/pf/tutorial-page-section >}}
+
+# How to Run OCR with Aspose & Hugging Face – Full Guide
+
+Have you ever wondered **how to run OCR** on a pile of scanned documents without spending hours tweaking settings? You are not alone. In many projects, developers need to **recognize text from scans** quickly, but struggle with downloading models and post-processing the extracted text.
+
+Good news: this tutorial shows a ready-to-run solution that **uses a Hugging Face model**, downloads it automatically, and adds punctuation so the output reads as if a human wrote it. By the end you will have a script that processes every image in a folder and produces a clean `.txt` file next to each scan.
+
+## What You Need
+
+- Python 3.8+ (the code uses f-strings, which require Python 3.6 at minimum)
+- The `aspose-ocr` package (install via `pip install aspose-ocr`)
+- An internet connection for the first model download
+- A folder containing image scans (`.png`, `.jpg`, or `.tif`)
+
+That is all: no extra executables and no manual model tweaking. Let's begin.
+
+![Example of running OCR](https://example.com/ocr-demo.png "Example of running OCR")
+
+## Step 1: Import the Aspose OCR Classes and Set Up the Environment
+
+We start by pulling in the necessary classes from the Aspose OCR library. Importing everything up front keeps the script organized and makes missing dependencies easy to spot.
+
+```python
+# Step 1: Import Aspose OCR classes
+import os
+from aspose.ocr import OcrEngine, AsposeAI, AsposeAIModelConfig
+```
+
+*Why this matters*: `OcrEngine` does the heavy lifting, while `AsposeAI` lets us plug in a large language model for smarter post-processing. If you skip the import, the rest of the code will not run, so do not forget it.
+
+## Step 2: Configure a GPU-Aware Hugging Face Model
+
+Now we tell Aspose where to fetch the model from and how many layers to run on the GPU. The `allow_auto_download="true"` flag **downloads the model automatically** for you.
+
+```python
+# Step 2: Configure a GPU‑aware AI model (replace with your own model folder)
+model_config = AsposeAIModelConfig(
+    allow_auto_download="true",
+    hugging_face_repo_id="Qwen/Qwen2.5-3B-Instruct-GGUF",
+    hugging_face_quantization="int8",
+    gpu_layers=40,                     # use GPU for faster inference
+    directory_model_path=r"YOUR_DIRECTORY/models"
+)
+```
+
+> **Pro tip**: If you do not have a GPU, set `gpu_layers=0`. The model falls back to the CPU, which is slower but still works.
+
+### Why Choose a Hugging Face Model?
+
+Hugging Face hosts a huge collection of ready-to-use models. By pointing to `Qwen/Qwen2.5-3B-Instruct-GGUF` you get a compact, instruction-tuned model that can add punctuation, fix spacing, and even repair small OCR errors. This is the essence of **using a Hugging Face model** in practice.
+
+## Step 3: Initialize the AI Engine and Enable Punctuation Post-Processing
+
+The AI engine is not just for fancy chat; here we attach a *punctuation adder* that cleans up the raw OCR output.
+
+```python
+# Step 3: Initialise the AI engine and enable punctuation post‑processing
+ai_engine = AsposeAI()
+ai_engine.set_post_processor("punctuation_adder", {})
+```
+
+*What is happening?* The call to `set_post_processor` registers a built-in processor that runs after the OCR engine finishes. It takes the raw string and adds commas, periods, and capital letters in the right places, which makes the final text more readable.
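
To make the idea of a post-processing hook concrete, here is a toy, rule-based stand-in written in plain Python. It is not how `punctuation_adder` or the language model works internally; `tidy_sentence` is a hypothetical function that only collapses whitespace, capitalizes the first character, and appends a period when no end punctuation is present.

```python
def tidy_sentence(raw: str) -> str:
    """Toy post-processor: normalize spaces, capitalize, end with a period.

    Illustration only; a model-based post-processor does far more.
    """
    text = " ".join(raw.split())          # collapse runs of whitespace
    if not text:
        return text
    text = text[0].upper() + text[1:]     # capitalize the first character only
    if text[-1] not in ".!?":
        text += "."                       # add a period if nothing ends the sentence
    return text
```

The point of the sketch is the shape of the step (raw string in, cleaned string out), which is the same contract a real post-processor fulfils after OCR has finished.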
+
+## Step 4: Create the OCR Engine and Attach the AI Engine
+
+Attaching the AI engine to the OCR engine gives us a single object that can read characters and polish the result in one go.
+
+```python
+# Step 4: Create the OCR engine and attach the AI engine
+ocr_engine = OcrEngine()
+ocr_engine.set_ai_engine(ai_engine)
+```
+
+If you skip this step, OCR still works, but you lose the punctuation enhancement and the output appears as an unpunctuated run of words.
+
+## Step 5: Process Every Image in a Folder
+
+This is the core of the tutorial. We loop over every image, run OCR, apply the post-processor, and write the polished text to a matching `.txt` file.
+
+```python
+# Step 5: Run OCR on each image in a folder, post‑process the result, and save the text
+scans_folder = r"YOUR_DIRECTORY/scans"
+for image_file in os.listdir(scans_folder):
+    # Filter only supported image types
+    if not image_file.lower().endswith(('.png', '.jpg', '.tif')):
+        continue
+
+    image_path = os.path.join(scans_folder, image_file)
+
+    # Recognise text from the image
+    ocr_result = ocr_engine.recognize(image_path)
+
+    # Apply the punctuation post‑processor
+    ocr_result = ocr_engine.run_postprocessor(ocr_result)
+
+    # Show a brief confidence summary
+    print(f"{image_file} – confidence {ocr_result.confidence:.2%}")
+
+    # Save the cleaned text next to the source image
+    txt_path = image_path + ".txt"
+    with open(txt_path, "w", encoding="utf-8") as txt_file:
+        txt_file.write(ocr_result.text)
+```
+
+### What to Expect
+
+Running the script prints something like:
+
+```
+invoice_001.png – confidence 96.73%
+receipt_2024.tif – confidence 94.12%
+```
+
+Each line shows the confidence score (a quick sanity check), and the script creates files such as `invoice_001.png.txt`, `receipt_2024.tif.txt`, and so on, containing punctuated, human-readable text.
+
+### Edge Cases and Alternatives
+
+- **Non-English scans**: Change `hugging_face_repo_id` to a multilingual model (example: `microsoft/Multilingual-LLM-GGUF`).
+- **Large batches**: Wrap the loop in a `concurrent.futures.ThreadPoolExecutor` for parallel processing, but watch the GPU memory limits.
+- **Custom post-processing**: Replace `"punctuation_adder"` with your own script if you need custom cleanup (example: stripping invoice numbers).
+
+## Step 6: Clean Up Resources
+
+When the job is done, releasing resources prevents memory leaks, which matters most if you run this inside a long-lived service.
+
+```python
+# Step 6: Release resources
+ai_engine.free_resources()
+ocr_engine.dispose()
+```
+
+Skipping this step can leave GPU memory reserved, which will disrupt later runs.
+
+## Recap: How to Run OCR from Start to Finish
+
+In just a few lines, we showed **how to run OCR** on a folder of scans, **use a Hugging Face model** that downloads itself on first use, and **recognize text from scans** with punctuation added automatically. The full script is ready to copy, adjust the paths, and run.
+
+## Next Steps and Related Topics
+
+- **Batch post-processing**: Explore `ocr_engine.run_batch_postprocessor` for faster batch handling.
+- **Alternative models**: Try the `openai/whisper` family if you need speech-to-text alongside OCR.
+- **Database integration**: Store the extracted text in SQLite or Elasticsearch for a searchable archive.
+
+Feel free to experiment: swap the model, adjust `gpu_layers`, or add your own post-processor. The flexibility of Aspose OCR combined with the Hugging Face model hub makes this a solid foundation for any document digitization project.
+
+---
+
+*Happy coding!
+If you run into any problem, leave a comment below or check the Aspose OCR documentation for more configuration options.*
+
+{{< /blocks/products/pf/tutorial-page-section >}}
+{{< /blocks/products/pf/main-container >}}
+{{< /blocks/products/pf/main-wrap-class >}}
+{{< blocks/products/products-backtop-button >}}
\ No newline at end of file
diff --git a/ocr/arabic/python/general/perform-ocr-on-image-with-python-complete-guide/_index.md b/ocr/arabic/python/general/perform-ocr-on-image-with-python-complete-guide/_index.md
new file mode 100644
index 000000000..333fc98a0
--- /dev/null
+++ b/ocr/arabic/python/general/perform-ocr-on-image-with-python-complete-guide/_index.md
@@ -0,0 +1,190 @@
+---
+category: general
+date: 2026-04-29
+description: Perform optical character recognition (OCR) on an image with Python,
+  download a HuggingFace model automatically, and release GPU memory efficiently while cleaning OCR text.
+draft: false
+keywords:
+- perform OCR on image
+- download HuggingFace model python
+- release GPU memory python
+- clean OCR text python
+language: ar
+og_description: Learn how to perform optical character recognition (OCR) on an image with
+  Python, download a HuggingFace model automatically, clean the text, and release GPU memory.
+og_title: Perform OCR on an Image with Python – Step-by-Step Guide
+tags:
+- OCR
+- Python
+- Aspose
+- HuggingFace
+title: Perform OCR on an Image with Python – Full Guide
+url: /ar/python/general/perform-ocr-on-image-with-python-complete-guide/
+---
+
+{{< blocks/products/pf/main-wrap-class >}}
+{{< blocks/products/pf/main-container >}}
+{{< blocks/products/pf/tutorial-page-section >}}
+
+# Perform OCR on an Image with Python – Full Guide
+
+Have you ever needed to **perform OCR on image** files but struggled with the model download step or with cleaning up GPU memory? You are not alone: many developers hit this problem the first time they try to combine optical character recognition with large language models.
+
+In this tutorial we walk through a single end-to-end solution that **downloads a HuggingFace model in Python**, runs Aspose OCR, cleans the raw output, and finally **releases GPU memory** so it can be reclaimed. By the end you will have a ready-to-run script that turns a scanned PNG image into polished, searchable text.
+
+> **What you get:** A complete runnable code sample, explanations of why each step matters, tips for avoiding common mistakes, and a look at how to adapt the pipeline to your own projects.
+
+---
+
+## What You Need
+
+- Python 3.9 or later (the example was tested on 3.11)
+- The `aspose-ocr` package (install via `pip install aspose-ocr`)
+- An internet connection for the **HuggingFace model download** step
+- A CUDA-compatible GPU if you want accelerated performance (optional but recommended)
+
+No additional system dependencies are required; the Aspose OCR engine bundles everything you need.
+
+![Example of performing OCR on an image](image.png "Example of performing OCR on an image with Aspose OCR and an LLM post-processor")
+
+*Image alt text: "perform OCR on image – Aspose OCR output before and after AI cleaning"*
+
+## Perform OCR on an Image – Step-by-Step Overview
+
+Below we break the workflow into logical parts. Each part has its own heading, so AI assistants can jump straight to the part you care about and search engines can index the relevant keywords.
+
+### 1. Download a HuggingFace Model in Python
+
+The first thing to do is fetch a language model that will act as a post-processor for the raw OCR output. Aspose OCR ships with a helper class called `AsposeAI` that can pull a model automatically from the HuggingFace hub.
+
+```python
+from aspose.ocr import AsposeAI, AsposeAIModelConfig, OcrEngine
+
+# Configure the model – it will auto‑download the first time you run it
+model_config = AsposeAIModelConfig(
+    allow_auto_download="true",                # <-- enables auto‑download
+    hugging_face_repo_id="Qwen/Qwen2.5-3B-Instruct-GGUF",
+    hugging_face_quantization="int8",          # smaller footprint on GPU
+    gpu_layers=20,                             # how many layers stay on GPU
+    directory_model_path=r"YOUR_DIRECTORY"     # where the model files live
+)
+```
+
+**Why this matters:**
+- **Automatic HuggingFace download**: you avoid handling zip files or token authentication by hand.
+- Using `int8` quantization shrinks the model to roughly a quarter of its 32-bit floating-point size, which is crucial when you later need to **release GPU memory**.
+
+> **Pro tip:** Keep `directory_model_path` on an SSD for faster load times.
+
+### 2. Initialize the AI Helper and Enable Spell Checking
+
+Now we create an `AsposeAI` instance and attach a spell corrector as a post-processor. This is where the **OCR text cleanup** begins.
+
+```python
+# Initialise the AI helper
+ai_helper = AsposeAI()
+ai_helper.set_post_processor(
+    processor="spell_corrector",
+    custom_settings={"max_edits": 2}   # allows up to two character edits per word
+)
+```
+
+**Explanation:**
+The spell corrector examines each token from the OCR engine and suggests corrections limited by `max_edits`. This small touch can turn "rec0gn1tion" into "recognition" without needing a huge language model.
+
+### 3. Attach the AI Helper to the OCR Engine
+
+Aspose introduced a new method in version 23.4 that lets you connect an AI engine directly to the OCR pipeline.
+
+```python
+# Initialise the OCR engine and attach the AI helper
+ocr_engine = OcrEngine()
+ocr_engine.set_ai_engine(ai_helper)   # new in v23.4
+```
+
+**Why we do this:**
+By attaching the AI helper early, the OCR engine can optionally use the model for on-the-fly improvements (such as layout detection). It also keeps the code tidy, with no need for separate post-processing loops.
+
+### 4. 
تنفيذ OCR على الصورة الممسوحة ضوئيًا + +هذه هي الخطوة الأساسية التي **تنفّذ OCR على الصورة** فعليًا. استبدل `YOUR_DIRECTORY/input.png` بالمسار إلى مسحك الخاص. + +```python +image_path = r"YOUR_DIRECTORY/input.png" +ocr_result = ocr_engine.recognize(image_path) + +print("Raw OCR text:") +print(ocr_result.text) +``` + +قد يحتوي المخرج الخام النموذجي على فواصل أسطر في أماكن غير مناسبة، أو أحرف تم التعرف عليها بشكل خاطئ، أو رموز عشوائية. لهذا نحتاج إلى الخطوة التالية. + +**المخرجات الخام المتوقعة (مثال):** + +``` +Th1s 1s 4n ex4mpl3 0f r4w OCR t3xt. +It c0ntains numb3rs 123 and s0me m1stakes. +``` + +### 5. تنظيف نص OCR في Python باستخدام معالج ما بعد الذكاء الاصطناعي + +الآن نسمح للذكاء الاصطناعي بتنظيف الفوضى. هذا هو جوهر عملية **تنظيف نص OCR في Python**. + +```python +cleaned_result = ocr_engine.run_postprocessor(ocr_result) + +print("\nAI‑enhanced text:") +print(cleaned_result.text) +``` + +**النتيجة التي ستراها:** + +``` +This is an example of raw OCR text. +It contains numbers 123 and some mistakes. +``` + +لاحظ كيف قام مصحح الإملاء بإصلاح “Th1s” → “This” وتحويل “4n” إلى “an”. كما يقوم النموذج بتطبيع المسافات، وهو ما يُعد نقطة ألم شائعة عندما تقوم لاحقًا بإدخال النص في خطوط أنابيب NLP اللاحقة. + +### 6. تحرير ذاكرة GPU في Python – خطوات التنظيف + +عند الانتهاء، من الممارسات الجيدة تحرير موارد GPU، خاصة إذا كنت تشغل مهام OCR متعددة في خدمة طويلة الأمد. + +```python +# Release resources – crucial for GPU memory +ai_helper.free_resources() +ocr_engine.dispose() +``` + +**ما يحدث خلف الكواليس:** +`free_resources()` يزيل النموذج من GPU، معيدًا الذاكرة إلى برنامج تشغيل CUDA. `dispose()` يغلق المخازن الداخلية لمحرك OCR. تخطي هذه الاستدعاءات قد يؤدي إلى أخطاء نفاد الذاكرة بعد عدد قليل فقط من الصور. + +> **تذكر:** إذا كنت تخطط لمعالجة دفعات في حلقة، استدعِ عملية التنظيف بعد كل دفعة أو أعد استخدام نفس `ai_helper` دون تحريره حتى النهاية. 
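
يمكن رسم فكرة التنظيف المضمون المذكورة أعلاه على النحو التالي. هذا مجرد مخطط مبسط: الصنف `FakeEngine` والدالتان `managed` و`process_batch` أسماء افتراضية لأغراض التوضيح وليست جزءًا من Aspose OCR، ويُفترض أن تضع مكان `release()` الاستدعاءات الواردة في الخطوة 6 (`free_resources()` و`dispose()`). الفكرة هي استخدام `try/finally` عبر مدير سياق حتى تُحرَّر الموارد حتى لو حدث استثناء أثناء المعالجة.

```python
from contextlib import contextmanager


class FakeEngine:
    """Hypothetical stand-in for an object that holds native or GPU resources."""

    def __init__(self, name):
        self.name = name
        self.released = False

    def release(self):
        # In the tutorial this role is played by free_resources() / dispose()
        self.released = True


@contextmanager
def managed(resource):
    """Yield the resource and always release it, even if the body raises."""
    try:
        yield resource
    finally:
        resource.release()


def process_batch(items, engine):
    """Pretend to process items; an empty item simulates a failing input."""
    results = []
    for item in items:
        if not item:
            raise ValueError("empty input")
        results.append(f"{engine.name}:{item}")
    return results


# Normal run: the whole batch succeeds, then the resource is released
ok_engine = FakeEngine("ocr")
with managed(ok_engine) as eng:
    ok_results = process_batch(["a.png", "b.png"], eng)

# Failing run: the error propagates, but cleanup has already happened
bad_engine = FakeEngine("ocr")
try:
    with managed(bad_engine) as eng:
        process_batch(["a.png", ""], eng)
except ValueError:
    pass
```

بهذا الأسلوب يُنفَّذ التحرير مرة واحدة عند الخروج من كتلة `with`، سواء اكتملت الدفعة أو فشلت في منتصفها.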
+ +## إضافي: تعديل خط الأنابيب لسيناريوهات مختلفة + +### تعديل تقليل الدقة للنموذج + +إذا كان لديك GPU قوي (مثل RTX 4090) وتريد دقة أعلى، غيّر `hugging_face_quantization` إلى `"fp16"` وزد `gpu_layers` إلى `30`. سيستهلك هذا المزيد من الذاكرة، لذا ستحتاج إلى **تحرير ذاكرة GPU** بانتظام أكبر بعد كل دفعة. + +### استخدام مدقق إملائي مخصص + +يمكنك استبدال `spell_corrector` المدمج بمعالج لاحق مخصص يقوم بتصحيحات خاصة بالمجال (مثل المصطلحات الطبية). فقط نفّذ الواجهة المطلوبة ومرّر اسمه إلى `set_post_processor`. + +### معالجة دفعات متعددة من الصور + +ضع خطوات OCR داخل حلقة `for`، اجمع `cleaned_result.text` في قائمة، واستدعِ `ai_helper.free_resources()` فقط بعد الحلقة إذا كان لديك ذاكرة GPU كافية. هذا يقلل من العبء الناتج عن تحميل النموذج مرارًا. + +## الخلاصة + +لقد أظهرنا لك الآن كيفية **تنفيذ OCR على صورة** في Python، و**تنزيل نموذج HuggingFace** تلقائيًا، و**تنظيف نص OCR**، و**تحرير ذاكرة GPU** بأمان عند الانتهاء. السكريبت الكامل جاهز للنسخ واللصق، والشروحات تمنحك الثقة لتكييفه مع مشاريع أكبر. + +ما الخطوات التالية؟ جرّب استبدال نموذج Qwen 2.5 بنسخة أكبر من LLaMA، أو جرّب معالجات لاحقة مختلفة، أو ادمج المخرجات المنقحة في فهرس Elasticsearch قابل للبحث. الاحتمالات لا حصر لها، والآن لديك أساس قوي للانطلاق. + +برمجة سعيدة، ولتكن خطوط أنابيب OCR الخاصة بك دائمًا نظيفة وصديقة للذاكرة! 
+ +{{< /blocks/products/pf/tutorial-page-section >}} +{{< /blocks/products/pf/main-container >}} +{{< /blocks/products/pf/main-wrap-class >}} +{{< blocks/products/products-backtop-button >}} \ No newline at end of file diff --git a/ocr/chinese/python/general/extract-text-from-pdf-ocr-pdf-with-python/_index.md b/ocr/chinese/python/general/extract-text-from-pdf-ocr-pdf-with-python/_index.md new file mode 100644 index 000000000..7a12da222 --- /dev/null +++ b/ocr/chinese/python/general/extract-text-from-pdf-ocr-pdf-with-python/_index.md @@ -0,0 +1,215 @@ +--- +category: general +date: 2026-04-29 +description: 使用 Aspose OCR 在 Python 中提取 PDF 文本。了解批量 OCR PDF 处理,转换扫描 PDF 文本,并处理低置信度页面。 +draft: false +keywords: +- extract text from pdf +- ocr pdf with python +- convert scanned pdf text +- batch ocr pdf processing +language: zh +og_description: 使用 Aspose OCR 在 Python 中提取 PDF 文本。本指南展示批量 OCR PDF 处理、将扫描的 PDF 转换为文本,以及处理低置信度结果。 +og_title: 从 PDF 中提取文本 – 使用 Python 进行 OCR +tags: +- OCR +- Python +- PDF processing +title: 从 PDF 中提取文本 – 使用 Python 进行 OCR +url: /zh/python/general/extract-text-from-pdf-ocr-pdf-with-python/ +--- + +{{< blocks/products/pf/main-wrap-class >}} +{{< blocks/products/pf/main-container >}} +{{< blocks/products/pf/tutorial-page-section >}} + +# 从 PDF 中提取文本 – 使用 Python 进行 OCR PDF + +是否曾经需要 **从 PDF 中提取文本**,但文件只是扫描的图像?你并不孤单——很多开发者在尝试将 PDF 转换为可搜索数据时都会遇到这个难题。好消息是:使用 Aspose OCR for Python,你可以在几行代码内转换扫描的 PDF 文本,甚至在处理数十个文件时进行 **批量 OCR PDF 处理**。 + +在本教程中,我们将完整演示工作流:设置库、对单个 PDF 进行 OCR、扩展为批量处理,并处理低置信度页面,以便在需要人工审查时及时发现。完成后,你将拥有一个可直接运行的脚本,能够从任何扫描的 PDF 中提取文本,并且了解每一步背后的原理。 + +## 您需要的条件 + +在开始之前,请确保你具备: + +- Python 3.8 或更高版本(代码使用 f‑strings,3.6+ 可运行,但推荐 3.8+) +- Aspose OCR for Python 许可证或免费试用密钥(可从 Aspose 官网获取) +- 一个或多个待处理的扫描 PDF 所在文件夹 +- 用于生成 *.txt* 报告的适量磁盘空间 + +就这些——无需繁重的外部依赖,也不需要 OpenCV 之类的技巧。Aspose OCR 引擎会为你完成所有繁重工作。 + +## 设置环境 + +首先,从 PyPI 安装 Aspose OCR 包: + +```bash +pip install aspose-ocr +``` + +如果你有许可证文件 (`Aspose.OCR.lic`),请将其放在项目根目录,并按如下方式激活: + +```python +# Activate 
Aspose OCR license (optional but removes trial watermark) +from aspose.ocr import License + +license = License() +license.set_license("Aspose.OCR.lic") +``` + +> **小贴士:** 将许可证文件排除在版本控制之外;将其加入 `.gitignore`,以避免意外泄露。 + +## 对单个 PDF 执行 OCR + +现在让我们从单个扫描 PDF 中提取文本。核心步骤如下: + +1. 创建 `OcrEngine` 实例。 +2. 指定 PDF 文件路径。 +3. 为每一页获取 `OcrResult`。 +4. 将纯文本输出写入磁盘。 +5. 释放引擎以清理本地资源。 + +以下是完整、可运行的脚本: + +```python +# Step 1: Import the OCR engine and create an instance +from aspose.ocr import OcrEngine + +# Optional: activate license (see previous section) +# from aspose.ocr import License +# License().set_license("Aspose.OCR.lic") + +ocr_engine = OcrEngine() + +# Step 2: Specify the PDF file to be processed +pdf_file_path = r"YOUR_DIRECTORY/input.pdf" + +# Step 3: Extract OCR results – one OcrResult object per page +ocr_pages = ocr_engine.extract_from_pdf(pdf_file_path) + +# Step 4: Iterate through the results, show confidence, flag low‑confidence pages, and save the plain text +for ocr_page in ocr_pages: + print(f"Page {ocr_page.page_number}: confidence {ocr_page.confidence:.2%}") + + # If confidence is below 80 %, flag it for manual review + if ocr_page.confidence < 0.80: + print(" Low confidence – consider manual review") + + # Build the output filename + output_txt = f"YOUR_DIRECTORY/Report_page{ocr_page.page_number}.txt" + with open(output_txt, "w", encoding="utf-8") as f: + f.write(ocr_page.text) + +# Step 5: Release resources used by the engine +ocr_engine.dispose() +``` + +**您将看到:** 脚本会为每页打印类似 `Page 1: confidence 97.45%` 的信息。如果某页的置信度低于 80 %,会出现警告,提示 OCR 可能遗漏了字符。 + +### 为什么这样有效 + +- **`OcrEngine`** 是通向本地 Aspose OCR 库的入口,负责从图像预处理到字符识别的全部工作。 +- **`extract_from_pdf`** 会自动将每页 PDF 栅格化,无需自行将 PDF 转为图像。 +- **置信度分数** 让你能够自动化质量检查——在处理法律或医疗文档等对准确性要求极高的场景时尤为关键。 + +## 使用 Python 批量 OCR PDF 处理 + +大多数真实项目都会涉及多个文件。下面我们把单文件脚本扩展为 **批量 OCR PDF 处理** 流程,遍历目录、处理每个 PDF,并将结果存入对应的子文件夹。 + +```python +import os +from pathlib import Path +from aspose.ocr import OcrEngine + +def ocr_pdf_file(pdf_path: Path, 
output_dir: Path, engine: OcrEngine, confidence_threshold: float = 0.80): + """Extract text from a single PDF and write per‑page .txt files.""" + ocr_pages = engine.extract_from_pdf(str(pdf_path)) + for page in ocr_pages: + print(f"{pdf_path.name} – Page {page.page_number}: {page.confidence:.2%}") + if page.confidence < confidence_threshold: + print(" ⚠️ Low confidence – manual review may be needed") + + out_file = output_dir / f"{pdf_path.stem}_page{page.page_number}.txt" + out_file.write_text(page.text, encoding="utf-8") + +def batch_ocr_pdf(input_folder: str, output_folder: str): + """Iterate over all PDFs in input_folder and run OCR.""" + engine = OcrEngine() + input_path = Path(input_folder) + output_path = Path(output_folder) + output_path.mkdir(parents=True, exist_ok=True) + + pdf_files = list(input_path.glob("*.pdf")) + if not pdf_files: + print("🚫 No PDF files found in the specified folder.") + return + + for pdf_file in pdf_files: + # Create a sub‑folder for each PDF to keep pages organized + pdf_out_dir = output_path / pdf_file.stem + pdf_out_dir.mkdir(exist_ok=True) + ocr_pdf_file(pdf_file, pdf_out_dir, engine) + + # Clean up native resources + engine.dispose() + print("✅ Batch OCR completed.") + +# Example usage: +if __name__ == "__main__": + batch_ocr_pdf("YOUR_DIRECTORY/input_pdfs", "YOUR_DIRECTORY/ocr_output") +``` + +#### 此功能的帮助 + +- **可扩展性:** 该函数一次遍历文件夹,为每个 PDF 创建专属输出子文件夹。当文档数量达到数十甚至上百时,文件结构依然整洁。 +- **可复用性:** `ocr_pdf_file` 可以被其他脚本(例如 Web 服务)调用,因为它是纯函数。 +- **错误处理:** 若输入文件夹为空,脚本会打印友好提示,避免静默失败。 + +## 转换扫描 PDF 文本 – 处理边缘情况 + +虽然上述代码能适用于大多数 PDF,但仍可能遇到以下特殊情况: + +| 情况 | 产生原因 | 解决方案 | +|-----------|----------------|-----------------| +| **加密 PDF** | PDF 受密码保护。 | 使用 `extract_from_pdf(pdf_path, password="yourPwd")` 传入密码。 | +| **多语言文档** | Aspose OCR 默认使用英文。 | 设置 `ocr_engine.language = "spa"` 以识别西班牙语,或提供语言列表以处理混合语言。 | +| **超大 PDF(>500 页)** | 每页加载到内存导致内存占用激增。 | 使用 `engine.extract_from_pdf(pdf_path, start_page=1, end_page=100)` 分块处理,并在循环中调用。 | +| **扫描质量差** | 
DPI 过低或噪点过多导致置信度下降。 | 开启 `engine.image_preprocessing = True` 进行图像预处理,或通过 `engine.dpi = 300` 提高 DPI。 | + +> **注意:** 开启图像预处理会显著增加 CPU 耗时。如果你在夜间运行批处理,请预留足够时间或启用独立的工作节点。 + +## 验证输出 + +脚本执行完毕后,你会看到类似以下的文件夹结构: + +``` +ocr_output/ +├─ invoice_2023/ +│ ├─ invoice_2023_page1.txt +│ ├─ invoice_2023_page2.txt +│ └─ … +└─ contract_A/ + ├─ contract_A_page1.txt + └─ … +``` + +打开任意 `.txt` 文件,你应该能看到干净的 UTF‑8 编码文本,内容与原始扫描文档相匹配。如果出现乱码,请再次检查 PDF 的语言设置,并确保机器上已安装相应的字体包。 + +## 清理资源 + +Aspose OCR 依赖本地 DLL,完成后务必调用 `engine.dispose()` 释放资源。忘记此步骤会导致内存泄漏,尤其在长时间运行的批处理任务中更为严重。 + +```python +# Always the last line of your script +engine.dispose() +``` + +## 完整端到端示例 + +把所有内容组合在一起,下面是一个完整的示例脚本: + +{{< /blocks/products/pf/tutorial-page-section >}} +{{< /blocks/products/pf/main-container >}} +{{< /blocks/products/pf/main-wrap-class >}} +{{< blocks/products/products-backtop-button >}} \ No newline at end of file diff --git a/ocr/chinese/python/general/how-to-recognize-handwriting-in-python-full-tutorial/_index.md b/ocr/chinese/python/general/how-to-recognize-handwriting-in-python-full-tutorial/_index.md new file mode 100644 index 000000000..119b36a32 --- /dev/null +++ b/ocr/chinese/python/general/how-to-recognize-handwriting-in-python-full-tutorial/_index.md @@ -0,0 +1,275 @@ +--- +category: general +date: 2026-04-29 +description: 学习如何使用 Aspose OCR 在 Python 中识别手写文字。本分步指南展示了如何高效提取手写文本。 +draft: false +keywords: +- how to recognize handwriting +- extract handwritten text +- handwritten text recognition python +- handwritten ocr tutorial python +language: zh +og_description: 如何在 Python 中识别手写文字?请跟随本完整指南,使用 Aspose OCR 提取手写文本,包含代码、技巧和边缘情况处理。 +og_title: 如何在 Python 中识别手写体 – 完整教程 +tags: +- OCR +- Python +- HandwritingRecognition +title: 如何在 Python 中识别手写文字 – 完整教程 +url: /zh/python/general/how-to-recognize-handwriting-in-python-full-tutorial/ +--- + +{{< blocks/products/pf/main-wrap-class >}} +{{< blocks/products/pf/main-container >}} +{{< blocks/products/pf/tutorial-page-section >}} + +# 如何在 Python 中识别手写文字 – 
完整教程 + +是否曾在 Python 项目中需要 **如何识别手写文字** 却不知从何入手?你并不孤单——开发者们常常问:“我能从扫描的笔记中提取文字吗?”好消息是,现代 OCR 库让这变得轻而易举。在本指南中,我们将演示使用 Aspose OCR **如何识别手写文字**,并教你可靠地 **提取手写文本**。 + +我们会覆盖从安装库到为那些凌乱的连笔字调整置信度阈值的全部内容。结束时,你将拥有一个可运行的脚本,能够打印提取的文本以及整体置信度分数——非常适合记笔记应用、档案工具,或仅仅满足好奇心。无需任何 OCR 经验;只要具备基础的 Python 知识即可。 + +--- + +## 你需要准备的东西 + +- **Python 3.9+**(最新稳定版效果最佳) +- **Aspose.OCR for Python via .NET** – 使用 `pip install aspose-ocr` 安装 +- 一张 **手写图像**(JPEG/PNG),即你想要处理的文件 +- 可选:用于管理依赖的虚拟环境 + +如果这些都已准备好,下面开始吧。 + +![How to recognize handwriting example](/images/handwritten-sample.jpg "How to recognize handwriting example") + +*(Alt text: “how to recognize handwriting example showing a scanned handwritten note”)* + +--- + +## 第一步 – 安装并导入 Aspose OCR 类 + +首先,需要 OCR 引擎本身。Aspose 提供了一个简洁的 API,将印刷体识别与手写模式分离。 + +```python +# Install the package (run this in your terminal if you haven't already) +# pip install aspose-ocr + +# Import the required OCR classes +from aspose.ocr import OcrEngine, HandwritingMode +``` + +*为什么重要:* 导入 `HandwritingMode` 可以让引擎知道我们在进行 **handwritten text recognition python**,而不是印刷体识别,从而显著提升连笔字的准确率。 + +--- + +## 第二步 – 创建并配置 OCR 引擎 + +现在实例化一个 `OcrEngine` 并切换到手写模式。你还可以调整置信度阈值;较低的值接受抖动的书写,较高的值则要求更清晰的输入。 + +```python +# Step 2: Initialize the OCR engine for handwritten text +ocr_engine = OcrEngine() +ocr_engine.set_recognition_mode(HandwritingMode.HANDWRITTEN) + +# Optional: tighten the confidence threshold for cursive scripts +# Default is 0.5; 0.65 works well for most notebooks +ocr_engine.set_handwriting_confidence_threshold(0.65) +``` + +*小技巧:* 如果你的笔记以 300 DPI 或更高分辨率扫描,通常会得到更好的分数。对于低分辨率图像,考虑在送入引擎前使用 Pillow 进行放大。 + +--- + +## 第三步 – 准备图像路径 + +确保文件路径指向你想要处理的图像。相对路径可以使用,但绝对路径可以避免 “文件未找到” 的意外。 + +```python +# Step 3: Point to your handwritten image +input_image_path = "YOUR_DIRECTORY/handwritten_sample.jpg" +``` + +*常见坑点:* 在 Windows 上忘记转义反斜杠 (`C:\\folder\\image.jpg`)。使用原始字符串 (`r"C:\folder\image.jpg"`) 可以规避此问题。 + +--- + +## 第四步 – 运行识别并获取结果 + +`recognize` 方法负责核心工作。它返回一个对象,包含 `.text` 和 
`.confidence` 属性。 + +```python +# Step 4: Run OCR and get the result +result = ocr_engine.recognize(input_image_path) + +# Display the extracted text and confidence +print("Hand‑written extraction:") +print(result.text) +print("Overall confidence:", result.confidence) +``` + +**预期输出(示例):** + +``` +Hand‑written extraction: +Meeting notes: +- Discuss quarterly goals +- Review budget allocations +- Assign action items +Overall confidence: 0.78 +``` + +如果置信度低于 0.5,可能需要清理图像(去除阴影、提升对比度)或在步骤 2 中降低阈值。 + +--- + +## 第五步 – 清理资源 + +Aspose OCR 会占用本地资源;调用 `dispose()` 可以释放它们,防止在循环处理大量图像时出现内存泄漏。 + +```python +# Step 5: Release the engine resources +ocr_engine.dispose() +``` + +*为什么要 dispose?* 在长时间运行的服务中(例如接受上传的 Flask API),忘记释放资源会迅速耗尽系统内存。 + +--- + +## 完整脚本 – 一键运行 + +把所有内容整合在一起,下面是一个可直接复制粘贴并执行的自包含脚本。 + +```python +# -*- coding: utf-8 -*- +""" +Handwritten OCR Tutorial – how to recognize handwriting in Python +Author: Your Name +Date: 2026-04-29 +""" + +# Install the library first if you haven't already: +# pip install aspose-ocr + +from aspose.ocr import OcrEngine, HandwritingMode + +def recognize_handwriting(image_path: str, confidence_threshold: float = 0.65): + """ + Recognizes handwritten text from an image using Aspose OCR. + + Args: + image_path (str): Path to the handwritten image file. + confidence_threshold (float): Minimum confidence to accept a word. 
+ + Returns: + tuple: (extracted_text (str), overall_confidence (float)) + """ + # Initialize engine in handwritten mode + engine = OcrEngine() + engine.set_recognition_mode(HandwritingMode.HANDWRITTEN) + engine.set_handwriting_confidence_threshold(confidence_threshold) + + # Perform recognition + result = engine.recognize(image_path) + + # Capture output + extracted = result.text + confidence = result.confidence + + # Clean up + engine.dispose() + return extracted, confidence + +if __name__ == "__main__": + # Replace with your actual file location + img_path = "YOUR_DIRECTORY/handwritten_sample.jpg" + + text, conf = recognize_handwriting(img_path) + + print("Hand‑written extraction:") + print(text) + print("Overall confidence:", conf) +``` + +将其保存为 `handwritten_ocr.py`,然后运行 `python handwritten_ocr.py`。如果一切配置正确,你将在控制台看到提取的文本。 + +--- + +## 处理边缘情况和常见变体 + +### 低对比度图像 +如果背景与墨水混在一起,先提升对比度: + +```python +from PIL import Image, ImageEnhance + +img = Image.open(input_image_path) +enhancer = ImageEnhance.Contrast(img) +high_contrast = enhancer.enhance(2.0) # 2× contrast +high_contrast.save("temp_high_contrast.jpg") +result = ocr_engine.recognize("temp_high_contrast.jpg") +``` + +### 旋转的笔记 +倾斜的笔记本页面会影响识别。使用 OpenCV 和 NumPy 进行去倾斜: + +```python +import numpy as np +import cv2 + +def deskew(image_path): + img = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE) + # Binarise so that ink becomes white foreground on a black background + thresh = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY_INV | cv2.THRESH_OTSU)[1] + # minAreaRect expects 32-bit points, so convert the pixel coordinates + coords = np.column_stack(np.where(thresh > 0)).astype(np.float32) + # Note: the angle range returned by minAreaRect differs between OpenCV versions + angle = cv2.minAreaRect(coords)[-1] + if angle < -45: + angle = -(90 + angle) + else: + angle = -angle + (h, w) = img.shape[:2] + M = cv2.getRotationMatrix2D((w // 2, h // 2), angle, 1.0) + corrected = cv2.warpAffine(img, M, (w, h), flags=cv2.INTER_CUBIC, 
borderMode=cv2.BORDER_REPLICATE) + cv2.imwrite("deskewed.jpg", corrected) + +deskew(input_image_path) +result = ocr_engine.recognize("deskewed.jpg") +``` + +### 多页 PDF +Aspose OCR 也能处理 PDF 页面,但需要先将每页转换为图像(例如使用 `pdf2image`),然后使用相同的 `recognize_handwriting` 函数遍历这些图像。 + +--- + +## 提升**手写文本提取**效果的专业技巧 + +- **DPI 很重要:** 
扫描时尽量达到 300 DPI 或更高。 +- **避免彩色背景:** 纯白或浅灰背景能得到最干净的输出。 +- **批量处理:** 将函数包装在 `for` 循环中,记录每页的置信度;对低于阈值的结果进行丢弃,以保持高质量。 +- **语言支持:** Aspose OCR 支持多语言;使用 `engine.set_language("en")` 可针对英文进行优化。 + +--- + +## 常见问题 + +**这在 Linux 上能用吗?** +可以——Aspose OCR 为 Windows、macOS 和 Linux 都提供了本地二进制文件。只需安装 pip 包即可使用。 + +**如果我的手写体极度连笔怎么办?** +尝试降低置信度阈值(如 `0.5` 甚至 `0.4`)。请注意,这可能会引入更多噪声,必要时对输出进行后处理(例如拼写检查)。 + +**可以在 Web 服务中使用吗?** +完全可以。`recognize_handwriting` 函数是无状态的,非常适合用于 Flask 或 FastAPI 接口。只需在每次请求后调用 `dispose()`,或使用上下文管理器。 + +--- + +## 结论 + +我们已经从头到尾展示了在 Python 中 **如何识别手写文字**,教会你 **提取手写文本**、调整置信度设置,并处理低对比度或旋转页面等常见问题。上面的完整脚本已可直接运行,模块化的函数也便于集成到更大的项目中——无论是构建记笔记应用、数字化档案,还是仅仅尝试 **handwritten ocr tutorial python** 技术。 + +接下来,你可以探索针对多语言笔记的 **handwritten text recognition python**,或将 OCR 与自然语言处理结合,实现会议纪要的自动摘要。可能性无限——快去尝试,让你的代码为涂鸦赋予生命吧。 + +祝编码愉快,欢迎在评论区留下你的问题! + +{{< /blocks/products/pf/tutorial-page-section >}} +{{< /blocks/products/pf/main-container >}} +{{< /blocks/products/pf/main-wrap-class >}} +{{< blocks/products/products-backtop-button >}} \ No newline at end of file diff --git a/ocr/chinese/python/general/how-to-run-ocr-with-aspose-hugging-face-complete-guide/_index.md b/ocr/chinese/python/general/how-to-run-ocr-with-aspose-hugging-face-complete-guide/_index.md new file mode 100644 index 000000000..60c87aa09 --- /dev/null +++ b/ocr/chinese/python/general/how-to-run-ocr-with-aspose-hugging-face-complete-guide/_index.md @@ -0,0 +1,178 @@ +--- +category: general +date: 2026-04-29 +description: 了解如何对扫描件运行 OCR,自动使用 Hugging Face 模型,并在几分钟内使用 Aspose OCR 识别扫描件中的文本。 +draft: false +keywords: +- how to run OCR +- use hugging face model +- recognize text from scans +- download model automatically +language: zh +og_description: 如何使用 Aspose OCR 对扫描件进行 OCR,自动下载 Hugging Face 模型,并获得干净且带标点的文本。 +og_title: 如何使用 Aspose 与 Hugging Face 进行 OCR – 完整指南 +tags: +- OCR +- Aspose +- Hugging Face +- Python +title: 如何使用 Aspose 与 Hugging Face 进行 OCR – 完整指南 +url: 
/zh/python/general/how-to-run-ocr-with-aspose-hugging-face-complete-guide/ +--- + +{{< blocks/products/pf/main-wrap-class >}} +{{< blocks/products/pf/main-container >}} +{{< blocks/products/pf/tutorial-page-section >}} + +# 如何使用 Aspose 与 Hugging Face 运行 OCR – 完整指南 + +是否曾想过 **如何在一堆扫描文档上运行 OCR**,却不想花费数小时去调参?你并不孤单。在许多项目中,开发者需要 **快速识别扫描文本**,但常常在模型下载和后处理上卡壳。 + +好消息:本教程提供了一个开箱即用的解决方案,**使用 Hugging Face 模型**,自动下载,并添加标点,使输出看起来像人类撰写。完成后,你将拥有一个脚本,能够处理文件夹中的每张图片,并在每个扫描文件旁生成干净的 `.txt` 文件。 + +## 你需要准备的环境 + +- Python 3.8+(代码使用 f‑strings,旧版本不兼容) +- `aspose-ocr` 包(通过 `pip install aspose-ocr` 安装) +- 首次下载模型时需要网络连接 +- 一个存放图像扫描件的文件夹(`.png`、`.jpg` 或 `.tif`) + +就这些——无需额外二进制文件,无需手动模型配置。让我们开始吧。 + +![运行 OCR 示例](https://example.com/ocr-demo.png "运行 OCR 示例") + +## 第一步:导入 Aspose OCR 类并设置环境 + +我们首先从 Aspose OCR 库中导入所需的类。提前导入所有内容可以让脚本保持整洁,也便于快速发现缺失的依赖。 + +```python +# Step 1: Import Aspose OCR classes +import os +from aspose.ocr import OcrEngine, AsposeAI, AsposeAIModelConfig +``` + +*为什么重要*:`OcrEngine` 承担核心识别工作,而 `AsposeAI` 让我们能够接入大型语言模型进行更智能的后处理。如果缺少导入,后面的代码根本无法编译——一定要记得这一步。 + +## 第二步:配置支持 GPU 的 Hugging Face 模型 + +现在告诉 Aspose 从哪里获取模型,以及有多少层将在 GPU 上运行。`allow_auto_download="true"` 标志会为你 **自动下载模型**。 + +```python +# Step 2: Configure a GPU‑aware AI model (replace with your own model folder) +model_config = AsposeAIModelConfig( + allow_auto_download="true", + hugging_face_repo_id="Qwen/Qwen2.5-3B-Instruct-GGUF", + hugging_face_quantization="int8", + gpu_layers=40, # use GPU for faster inference + directory_model_path=r"YOUR_DIRECTORY/models" +) +``` + +> **小技巧**:如果没有 GPU,设置 `gpu_layers=0`。模型会回退到 CPU,速度会慢一些,但仍能正常工作。 + +### 为什么选择 Hugging Face 模型? 
+ +Hugging Face 提供了海量即用的 LLM。指向 `Qwen/Qwen2.5-3B-Instruct-GGUF`,即可获得一个体积小、指令微调的模型,能够添加标点、纠正空格,甚至修正轻微的 OCR 错误。这正是 **使用 Hugging Face 模型** 的实际价值所在。 + +## 第三步:初始化 AI 引擎并启用标点后处理 + +AI 引擎不仅用于聊天——这里我们挂载一个 *标点添加器*,对原始 OCR 输出进行清理。 + +```python +# Step 3: Initialise the AI engine and enable punctuation post‑processing +ai_engine = AsposeAI() +ai_engine.set_post_processor("punctuation_adder", {}) +``` + +*发生了什么*?`set_post_processor` 调用注册了内置的后处理器,在 OCR 引擎完成后运行。它会在适当的位置插入逗号、句号和大写字母,使最终文本更易阅读。 + +## 第四步:创建 OCR 引擎并关联 AI 引擎 + +将 AI 引擎与 OCR 引擎连接后,我们得到一个既能识别字符又能润色结果的统一对象。 + +```python +# Step 4: Create the OCR engine and attach the AI engine +ocr_engine = OcrEngine() +ocr_engine.set_ai_engine(ai_engine) +``` + +如果跳过此步骤,OCR 仍能工作,但会失去标点提升——输出将会是一连串没有间隔的词语。 + +## 第五步:处理文件夹中的每张图片 + +下面是本教程的核心。我们遍历文件夹中的每张图片,执行 OCR,应用后处理器,并将清理后的文本写入同名的 `.txt` 文件。 + +```python +# Step 5: Run OCR on each image in a folder, post‑process the result, and save the text +scans_folder = r"YOUR_DIRECTORY/scans" +for image_file in os.listdir(scans_folder): + # Filter only supported image types + if not image_file.lower().endswith(('.png', '.jpg', '.tif')): + continue + + image_path = os.path.join(scans_folder, image_file) + + # Recognise text from the image + ocr_result = ocr_engine.recognize(image_path) + + # Apply the punctuation post‑processor + ocr_result = ocr_engine.run_postprocessor(ocr_result) + + # Show a brief confidence summary + print(f"{image_file} – confidence {ocr_result.confidence:.2%}") + + # Save the cleaned text next to the source image + txt_path = image_path + ".txt" + with open(txt_path, "w", encoding="utf-8") as txt_file: + txt_file.write(ocr_result.text) +``` + +### 预期结果 + +运行脚本后会打印类似如下信息: + +``` +invoice_001.png – confidence 96.73% +receipt_2024.tif – confidence 94.12% +``` + +每行会显示置信度分数(快速健康检查),并生成 `invoice_001.png.txt`、`receipt_2024.tif.txt` 等文件,里面包含已加标点、可读性强的文本。 + +### 边缘情况与变体 + +- **非英文扫描**:将 `hugging_face_repo_id` 切换为多语言模型(例如 `microsoft/Multilingual-LLM-GGUF`)。 +- **大批量处理**:将循环包装在 
`concurrent.futures.ThreadPoolExecutor` 中实现并行处理,但需注意 GPU 内存限制。 +- **自定义后处理**:如果需要领域特定的清理(如去除发票号码),可将 `"punctuation_adder"` 替换为自己的脚本。 + +## 第六步:清理资源 + +任务完成后,释放资源可以防止内存泄漏,尤其在长时间运行的服务中尤为重要。 + +```python +# Step 6: Release resources +ai_engine.free_resources() +ocr_engine.dispose() +``` + +忽视此步骤可能导致 GPU 内存残留,进而影响后续运行。 + +## 小结:端到端运行 OCR 的完整流程 + +仅用几行代码,我们展示了 **如何在文件夹的扫描件上运行 OCR**,**使用首次自动下载的 Hugging Face 模型**,并 **在识别文本时自动添加标点**。完整脚本已准备好复制粘贴,修改路径后即可执行。 + +## 后续步骤与相关主题 + +- **批量后处理**:探索 `ocr_engine.run_batch_postprocessor` 以实现更快的批量处理。 +- **替代模型**:如果需要语音转文字,可尝试 `openai/whisper` 系列模型。 +- **与数据库集成**:将提取的文本存入 SQLite 或 Elasticsearch,实现可搜索的文档档案。 + +尽情实验——更换模型、调节 `gpu_layers`,或添加自定义后处理器。Aspose OCR 与 Hugging Face 模型中心的组合,为任何文档数字化项目提供了灵活且强大的基础。 + +--- + +*祝编码愉快!如果遇到问题,欢迎在下方留言或查阅 Aspose OCR 文档获取更深入的配置选项。* + +{{< /blocks/products/pf/tutorial-page-section >}} +{{< /blocks/products/pf/main-container >}} +{{< /blocks/products/pf/main-wrap-class >}} +{{< blocks/products/products-backtop-button >}} \ No newline at end of file diff --git a/ocr/chinese/python/general/perform-ocr-on-image-with-python-complete-guide/_index.md b/ocr/chinese/python/general/perform-ocr-on-image-with-python-complete-guide/_index.md new file mode 100644 index 000000000..2ef35d08b --- /dev/null +++ b/ocr/chinese/python/general/perform-ocr-on-image-with-python-complete-guide/_index.md @@ -0,0 +1,206 @@ +--- +category: general +date: 2026-04-29 +description: 使用 Python 对图像进行 OCR,自动下载 HuggingFace 模型,并在清理 OCR 文本的同时高效释放 GPU 内存。 +draft: false +keywords: +- perform OCR on image +- download HuggingFace model python +- release GPU memory python +- clean OCR text python +language: zh +og_description: 学习如何在 Python 中对图像进行 OCR,自动下载 HuggingFace 模型,清理文本并释放 GPU 内存。 +og_title: 使用 Python 对图像进行 OCR – 步骤指南 +tags: +- OCR +- Python +- Aspose +- HuggingFace +title: 使用Python对图像进行OCR – 完整指南 +url: /zh/python/general/perform-ocr-on-image-with-python-complete-guide/ +--- + +{{< blocks/products/pf/main-wrap-class >}} +{{< 
blocks/products/pf/main-container >}} +{{< blocks/products/pf/tutorial-page-section >}} + +# 使用 Python 对图像执行 OCR – 完整指南 + +是否曾经需要**对图像执行 OCR**但在模型下载或 GPU 内存清理阶段卡住了?你并不是唯一遇到这种情况的人——许多开发者在首次尝试将光学字符识别与大型语言模型结合时都会碰到这个难题。 + +在本教程中,我们将演示一个完整的端到端方案,**在 Python 中下载 HuggingFace 模型**、运行 Aspose OCR、清理原始输出,最后**释放 Python 可以回收的 GPU 内存**。完成后,你将拥有一个可直接运行的脚本,能够把扫描的 PNG 转换为精炼、可搜索的文本。 + +> **你将获得:** 完整可运行的代码示例、每一步意义的解释、避免常见陷阱的技巧,以及如何为自己的项目微调流水线的简要概览。 + +--- + +## 你需要的环境 + +- Python 3.9 或更高(示例在 3.11 上测试) +- `aspose-ocr` 包(通过 `pip install aspose-ocr` 安装) +- 用于**下载 HuggingFace 模型 python**步骤的网络连接 +- 如果想要加速,可选的 CUDA 兼容 GPU(推荐但非必需) + +无需额外的系统级依赖;Aspose OCR 引擎已经打包了所有必需的组件。 + +--- + +![perform OCR on image example](image.png "Example of performing OCR on image with Aspose OCR and an LLM post‑processor") + +*图片替代文字:“perform OCR on image – Aspose OCR 输出前后对比(AI 清理后)”* + +--- + +## 对图像执行 OCR – 步骤概览 + +下面我们将工作流拆分为若干逻辑块。每个块都有独立标题,便于 AI 助手快速定位感兴趣的部分,也方便搜索引擎抓取相关关键词。 + +### 1. 在 Python 中下载 HuggingFace 模型 + +首先需要获取一个语言模型,用作原始 OCR 输出的后处理器。Aspose OCR 附带了一个名为 `AsposeAI` 的帮助类,能够自动从 HuggingFace Hub 拉取模型。 + +```python +import aspose.ocr as aocr +from aspose.ocr import AsposeAI, AsposeAIModelConfig, OcrEngine + +# Configure the model – it will auto‑download the first time you run it +model_config = AsposeAIModelConfig( + allow_auto_download="true", # <-- enables auto‑download + hugging_face_repo_id="Qwen/Qwen2.5-3B-Instruct-GGUF", + hugging_face_quantization="int8", # smaller footprint on GPU + gpu_layers=20, # how many layers stay on GPU + directory_model_path=r"YOUR_DIRECTORY" # where the model files live +) +``` + +**为什么重要:** +- **download HuggingFace model python** – 免去手动处理 zip 包或令牌认证的麻烦。 +- 使用 `int8` 量化可将模型体积压缩至原始的约四分之一,这在后续**release GPU memory python**时至关重要。 + +> **小技巧:** 将 `directory_model_path` 放在 SSD 上可获得更快的加载速度。 + +--- + +### 2. 
初始化 AI 助手并启用拼写检查 + +接下来创建 `AsposeAI` 实例并挂载拼写纠正后处理器。这里就是**清理 OCR 文本**的起点。 + +```python +# Initialise the AI helper +ai_helper = AsposeAI() +ai_helper.set_post_processor( + processor="spell_corrector", + custom_settings={"max_edits": 2} # allows up to two character edits per word +) +``` + +**解释:** +拼写纠正器会检查 OCR 引擎输出的每个 token,并在 `max_edits` 限制内给出修改建议。这个小调整能把 “rec0gn1tion” 纠正为 “recognition”,而无需使用重量级语言模型。 + +--- + +### 3. 将 AI 助手接入 OCR 引擎 + +Aspose 在 23.4 版本中新增了一个方法,允许直接将 AI 引擎插入 OCR 流程。 + +```python +# Initialise the OCR engine and attach the AI helper +ocr_engine = OcrEngine() +ocr_engine.set_ai_engine(ai_helper) # new in v23.4 +``` + +**为什么这样做:** +提前接入 AI 助手后,OCR 引擎可以在运行时使用模型进行即时改进(例如布局检测)。同时代码更简洁,后续无需额外的后处理循环。 + +--- + +### 4. 对扫描图像执行 OCR + +下面是实际**对图像执行 OCR**的核心步骤。将 `YOUR_DIRECTORY/input.png` 替换为你自己的扫描文件路径。 + +```python +image_path = r"YOUR_DIRECTORY/input.png" +ocr_result = ocr_engine.recognize(image_path) + +print("Raw OCR text:") +print(ocr_result.text) +``` + +典型的原始输出可能会出现奇怪的换行、误识别字符或多余符号。这也是我们需要后续步骤的原因。 + +**示例原始输出:** + +``` +Th1s 1s 4n ex4mpl3 0f r4w OCR t3xt. +It c0ntains numb3rs 123 and s0me m1stakes. +``` + +--- + +### 5. 使用 AI 后处理器清理 OCR 文本 + +现在让 AI 来清理这些乱七八糟的内容。这是**清理 OCR 文本**过程的核心。 + +```python +cleaned_result = ocr_engine.run_postprocessor(ocr_result) + +print("\nAI‑enhanced text:") +print(cleaned_result.text) +``` + +**你将看到的结果:** + +``` +This is an example of raw OCR text. +It contains numbers 123 and some mistakes. +``` + +可以看到拼写纠正器把 “Th1s” 修正为 “This”,并把 “4n” 修正为 “an”。模型还会统一空格,这在后续将文本喂入下游 NLP 流水线时常常是个痛点。 + +--- + +### 6. 
在 Python 中释放 GPU 内存 – 清理步骤 + +完成后,最好释放 GPU 资源,尤其是在长时间运行的服务中处理多次 OCR 任务时。 + +```python +# Release resources – crucial for GPU memory +ai_helper.free_resources() +ocr_engine.dispose() +``` + +**内部发生的事情:** +`free_resources()` 会将模型从 GPU 中卸载,归还显存给 CUDA 驱动。`dispose()` 则关闭 OCR 引擎内部的缓冲区。若省略这些调用,处理少量图像后就可能出现显存不足的错误。 + +> **记住:** 如果你计划在循环中批量处理图像,请在每个批次后调用清理,或在整个循环结束后统一释放 `ai_helper`,以避免频繁的显存分配/释放。 + +--- + +## 进阶:针对不同场景微调流水线 + +### 调整模型量化方式 + +如果你拥有强大的 GPU(例如 RTX 4090),想要更高的精度,可以将 `hugging_face_quantization` 改为 `"fp16"` 并将 `gpu_layers` 提升至 `30`。这会占用更多显存,因此需要在每个批次后更积极地**release GPU memory python**。 + +### 使用自定义拼写检查器 + +你可以用自定义的后处理器替换内置的 `spell_corrector`,实现领域专用的纠正(例如医学术语)。只需实现相应接口并将其名称传给 `set_post_processor` 即可。 + +### 批量处理多张图像 + +将 OCR 步骤包装在 `for` 循环中,收集 `cleaned_result.text` 到列表里;如果显存充足,可在循环结束后统一调用 `ai_helper.free_resources()`,从而减少模型重复加载的开销。 + +--- + +## 结论 + +我们已经演示了如何在 Python 中**对图像执行 OCR**,自动**下载 HuggingFace 模型**、**清理 OCR 文本**,并在完成后安全**释放 GPU 内存**。完整脚本已准备好直接复制粘贴,配套的解释帮助你自信地将其扩展到更大的项目中。 + +下一步?尝试将 Qwen 2.5 模型换成更大的 LLaMA 变体,实验不同的后处理器,或将清理后的输出集成到可搜索的 Elasticsearch 索引中。可能性无限,而你已经拥有了坚实的基础。 + +祝编码愉快,愿你的 OCR 流水线始终保持干净且显存友好! + +{{< /blocks/products/pf/tutorial-page-section >}} +{{< /blocks/products/pf/main-container >}} +{{< /blocks/products/pf/main-wrap-class >}} +{{< blocks/products/products-backtop-button >}} \ No newline at end of file diff --git a/ocr/czech/python/general/extract-text-from-pdf-ocr-pdf-with-python/_index.md b/ocr/czech/python/general/extract-text-from-pdf-ocr-pdf-with-python/_index.md new file mode 100644 index 000000000..ef7f8ab46 --- /dev/null +++ b/ocr/czech/python/general/extract-text-from-pdf-ocr-pdf-with-python/_index.md @@ -0,0 +1,219 @@ +--- +category: general +date: 2026-04-29 +description: Extrahujte text z PDF pomocí Aspose OCR v Pythonu. Naučte se hromadné + zpracování PDF pomocí OCR, převádějte text ze skenovaných PDF a řešte stránky s + nízkou úrovní spolehlivosti. 
+draft: false +keywords: +- extract text from pdf +- ocr pdf with python +- convert scanned pdf text +- batch ocr pdf processing +language: cs +og_description: Extrahujte text z PDF pomocí Aspose OCR v Pythonu. Tento průvodce + ukazuje hromadné zpracování OCR PDF, převod naskenovaného textu v PDF a práci s + výsledky s nízkou důvěryhodností. +og_title: Extrahovat text z PDF – OCR PDF pomocí Pythonu +tags: +- OCR +- Python +- PDF processing +title: Extrahovat text z PDF – OCR PDF pomocí Pythonu +url: /cs/python/general/extract-text-from-pdf-ocr-pdf-with-python/ +--- + +{{< blocks/products/pf/main-wrap-class >}} +{{< blocks/products/pf/main-container >}} +{{< blocks/products/pf/tutorial-page-section >}} + +# Extrahování textu z PDF – OCR PDF s Pythonem + +Už jste někdy potřebovali **extrahovat text z PDF**, ale soubor je jen naskenovaný obrázek? Nejste v tom sami — mnoho vývojářů narazí na tuto překážku, když se snaží převést PDF na prohledávatelná data. Dobrá zpráva? S Aspose OCR pro Python můžete převést text naskenovaného PDF během několika řádků a dokonce spustit **batch OCR PDF processing**, když máte desítky souborů ke zpracování. + +V tomto tutoriálu projdeme celým pracovním postupem: nastavení knihovny, spuštění OCR na jednom PDF, škálování na dávku a řešení stránek s nízkou důvěrou, abyste věděli, kdy je potřeba ruční revize. Na konci budete mít připravený skript, který extrahuje text z libovolného naskenovaného PDF, a pochopíte, proč se každý krok provádí. + +## Co budete potřebovat + +Než se ponoříme dál, ujistěte se, že máte: + +- Python 3.8 nebo novější (kód používá f‑stringy, takže 3.6+ funguje, ale doporučujeme 3.8+). +- Licence Aspose OCR pro Python nebo klíč pro bezplatnou zkušební verzi (můžete jej získat na webu Aspose). +- Složka s jedním nebo více naskenovanými PDF, které chcete zpracovat. +- Mírné množství místa na disku pro vygenerované *.txt* zprávy. + +A to je vše — žádné těžké externí závislosti, žádné gymnastiky s OpenCV. 
Engine Aspose OCR udělá těžkou práci za vás. + +## Nastavení prostředí + +Nejprve nainstalujte balíček Aspose OCR z PyPI: + +```bash +pip install aspose-ocr +``` + +Pokud máte soubor licence (`Aspose.OCR.lic`), umístěte jej do kořenového adresáře projektu a aktivujte jej takto: + +```python +# Activate Aspose OCR license (optional but removes trial watermark) +from aspose.ocr import License + +license = License() +license.set_license("Aspose.OCR.lic") +``` + +> **Tip:** Uchovávejte soubor licence mimo verzovací systém; přidejte jej do `.gitignore`, abyste předešli neúmyslnému zveřejnění. + +## Provádění OCR na jednom PDF + +Nyní extrahujme text z jednoho naskenovaného PDF. Hlavní kroky jsou: + +1. Vytvořte instanci `OcrEngine`. +2. Nasmerujte ji na PDF soubor. +3. Získejte `OcrResult` pro každou stránku. +4. Zapište výstup plain‑textu na disk. +5. Uvolněte engine, aby se uvolnily nativní zdroje. + +Zde je kompletní spustitelný skript: + +```python +# Step 1: Import the OCR engine and create an instance +from aspose.ocr import OcrEngine + +# Optional: activate license (see previous section) +# from aspose.ocr import License +# License().set_license("Aspose.OCR.lic") + +ocr_engine = OcrEngine() + +# Step 2: Specify the PDF file to be processed +pdf_file_path = r"YOUR_DIRECTORY/input.pdf" + +# Step 3: Extract OCR results – one OcrResult object per page +ocr_pages = ocr_engine.extract_from_pdf(pdf_file_path) + +# Step 4: Iterate through the results, show confidence, flag low‑confidence pages, and save the plain text +for ocr_page in ocr_pages: + print(f"Page {ocr_page.page_number}: confidence {ocr_page.confidence:.2%}") + + # If confidence is below 80 %, flag it for manual review + if ocr_page.confidence < 0.80: + print(" Low confidence – consider manual review") + + # Build the output filename + output_txt = f"YOUR_DIRECTORY/Report_page{ocr_page.page_number}.txt" + with open(output_txt, "w", encoding="utf-8") as f: + f.write(ocr_page.text) + +# Step 5: Release 
resources used by the engine +ocr_engine.dispose() +``` + +**Co uvidíte:** Pro každou stránku skript vypíše něco jako `Page 1: confidence 97.45%`. Pokud stránka spadne pod práh 80 %, objeví se varování, které vás upozorní, že OCR mohlo některé znaky vynechat. + +### Proč to funguje + +- **`OcrEngine`** je vstupní brána do nativní knihovny Aspose OCR; zajišťuje vše od předzpracování obrazu po rozpoznávání znaků. +- **`extract_from_pdf`** automaticky rasterizuje každou stránku PDF, takže nemusíte PDF převádět na obrázky sami. +- **Confidence scores** vám umožňují automatizovat kontrolu kvality — kritické při zpracování právních nebo zdravotnických dokumentů, kde je přesnost důležitá. + +## Dávkové zpracování OCR PDF s Pythonem + +Většina reálných projektů zahrnuje více než jeden soubor. Rozšíříme skript pro jeden soubor na **batch OCR PDF processing** pipeline, která prochází adresář, zpracuje každé PDF a uloží výsledky do odpovídající podadresáře. + +```python +import os +from pathlib import Path +from aspose.ocr import OcrEngine + +def ocr_pdf_file(pdf_path: Path, output_dir: Path, engine: OcrEngine, confidence_threshold: float = 0.80): + """Extract text from a single PDF and write per‑page .txt files.""" + ocr_pages = engine.extract_from_pdf(str(pdf_path)) + for page in ocr_pages: + print(f"{pdf_path.name} – Page {page.page_number}: {page.confidence:.2%}") + if page.confidence < confidence_threshold: + print(" ⚠️ Low confidence – manual review may be needed") + + out_file = output_dir / f"{pdf_path.stem}_page{page.page_number}.txt" + out_file.write_text(page.text, encoding="utf-8") + +def batch_ocr_pdf(input_folder: str, output_folder: str): + """Iterate over all PDFs in input_folder and run OCR.""" + engine = OcrEngine() + input_path = Path(input_folder) + output_path = Path(output_folder) + output_path.mkdir(parents=True, exist_ok=True) + + pdf_files = list(input_path.glob("*.pdf")) + if not pdf_files: + print("🚫 No PDF files found in the specified folder.") + 
return + + for pdf_file in pdf_files: + # Create a sub‑folder for each PDF to keep pages organized + pdf_out_dir = output_path / pdf_file.stem + pdf_out_dir.mkdir(exist_ok=True) + ocr_pdf_file(pdf_file, pdf_out_dir, engine) + + # Clean up native resources + engine.dispose() + print("✅ Batch OCR completed.") + +# Example usage: +if __name__ == "__main__": + batch_ocr_pdf("YOUR_DIRECTORY/input_pdfs", "YOUR_DIRECTORY/ocr_output") +``` + +#### Jak to pomáhá + +- **Scalability:** Funkce projde složku jednou a vytvoří dedikovaný výstupní podadresář pro každé PDF. To udržuje pořádek, když máte desítky dokumentů. +- **Reusability:** `ocr_pdf_file` může být volána z jiných skriptů (např. webové služby), protože je čistou funkcí. +- **Error handling:** Skript vypíše přátelskou zprávu, pokud je vstupní složka prázdná, čímž vás ochrání před tichým selháním. + +## Převod textu z naskenovaného PDF – Řešení okrajových případů + +I když výše uvedený kód funguje pro většinu PDF, můžete narazit na několik zvláštností: + +| Situace | Proč k tomu dochází | Jak to zmírnit | +|-----------|----------------|-----------------| +| **Šifrovaná PDF** | PDF je chráněno heslem. | Předejte heslo do `extract_from_pdf(pdf_path, password="yourPwd")`. | +| **Vícejazyčné dokumenty** | Aspose OCR má výchozí jazyk angličtinu. | Nastavte `ocr_engine.language = "spa"` pro španělštinu, nebo poskytněte seznam pro smíšené jazyky. | +| **Velmi velká PDF (>500 stránek)** | Spotřeba paměti stoupá, protože každá stránka je načtena do RAM. | Zpracovávejte PDF po částech pomocí `engine.extract_from_pdf(pdf_path, start_page=1, end_page=100)` a cyklu. | +| **Špatná kvalita skenu** | Nízké DPI nebo silný šum snižuje důvěru. | Předzpracujte PDF pomocí `engine.image_preprocessing = True` nebo zvyšte DPI pomocí `engine.dpi = 300`. | + +> **Pozor:** Zapnutí předzpracování obrazu může výrazně zvýšit čas CPU. Pokud spouštíte noční dávku, naplánujte dostatek času nebo spusťte samostatného pracovníka. 
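Zpracování velmi velkých PDF po částech, zmíněné v tabulce výše, lze načrtnout pomocí malého generátoru rozsahů stránek. Jde o minimální náčrt v čistém Pythonu: název `page_ranges` je pouze ilustrační a to, zda `extract_from_pdf` ve vaší verzi knihovny skutečně přijímá parametry `start_page` a `end_page`, je předpoklad, který si ověřte v dokumentaci Aspose OCR.

```python
from typing import Iterator, Tuple


def page_ranges(total_pages: int, chunk_size: int = 100) -> Iterator[Tuple[int, int]]:
    """Yield 1-based, inclusive (start_page, end_page) pairs covering total_pages.

    Hypothetical helper for illustration; it is not part of Aspose OCR.
    """
    if total_pages < 0 or chunk_size < 1:
        raise ValueError("total_pages must be >= 0 and chunk_size must be >= 1")
    for start in range(1, total_pages + 1, chunk_size):
        # The last chunk is clipped to the real page count
        yield start, min(start + chunk_size - 1, total_pages)


print(list(page_ranges(250)))  # [(1, 100), (101, 200), (201, 250)]
```

Každou dvojici pak můžete v cyklu předat volání OCR; poslední část je automaticky zkrácena na skutečný počet stránek.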
+ +## Ověření výstupu + +Po dokončení skriptu najdete strukturu složek podobnou: + +``` +ocr_output/ +├─ invoice_2023/ +│ ├─ invoice_2023_page1.txt +│ ├─ invoice_2023_page2.txt +│ └─ … +└─ contract_A/ + ├─ contract_A_page1.txt + └─ … +``` + +Otevřete libovolný soubor `.txt`; měli byste vidět čistý text kódovaný v UTF‑8, který odráží původní naskenovaný obsah. Pokud zaznamenáte poškozené znaky, zkontrolujte nastavení jazyka PDF a ujistěte se, že jsou na stroji nainstalovány správné balíčky fontů. + +## Vyčištění zdrojů + +Aspose OCR se spoléhá na nativní DLL, takže je nezbytné po dokončení zavolat `engine.dispose()`. Zapomenutí tohoto kroku může vést k únikům paměti, zejména u dlouho běžících dávkových úloh. + +```python +# Always the last line of your script +engine.dispose() +``` + +## Kompletní end‑to‑end příklad + +Spojením všeho dohromady, zde je jeden + +{{< /blocks/products/pf/tutorial-page-section >}} +{{< /blocks/products/pf/main-container >}} +{{< /blocks/products/pf/main-wrap-class >}} +{{< blocks/products/products-backtop-button >}} \ No newline at end of file diff --git a/ocr/czech/python/general/how-to-recognize-handwriting-in-python-full-tutorial/_index.md b/ocr/czech/python/general/how-to-recognize-handwriting-in-python-full-tutorial/_index.md new file mode 100644 index 000000000..dc7895ea7 --- /dev/null +++ b/ocr/czech/python/general/how-to-recognize-handwriting-in-python-full-tutorial/_index.md @@ -0,0 +1,281 @@ +--- +category: general +date: 2026-04-29 +description: Naučte se rozpoznávat rukopis v Pythonu pomocí Aspose OCR. Tento průvodce + krok za krokem ukazuje, jak efektivně extrahovat rukopisný text. +draft: false +keywords: +- how to recognize handwriting +- extract handwritten text +- handwritten text recognition python +- handwritten ocr tutorial python +language: cs +og_description: Jak rozpoznat rukopis v Pythonu? 
Přečtěte si tento kompletní návod + na extrakci ručně psaného textu pomocí Aspose OCR, včetně kódu, tipů a řešení okrajových + případů. +og_title: Jak rozpoznat rukopis v Pythonu – kompletní tutoriál +tags: +- OCR +- Python +- HandwritingRecognition +title: Jak rozpoznat rukopis v Pythonu – kompletní tutoriál +url: /cs/python/general/how-to-recognize-handwriting-in-python-full-tutorial/ +--- + +{{< blocks/products/pf/main-wrap-class >}} +{{< blocks/products/pf/main-container >}} +{{< blocks/products/pf/tutorial-page-section >}} + +# Jak rozpoznat ruční psaní v Pythonu – kompletní tutoriál + +Už jste někdy potřebovali **jak rozpoznat ruční psaní** v Python projektu, ale nebyli jste si jisti, kde začít? Nejste sami — vývojáři se často ptají: „Mohu získat text ze skenované poznámky?“ Dobrou zprávou je, že moderní OCR knihovny to dělají hračkou. V tomto průvodci si projdeme **jak rozpoznat ruční psaní** pomocí Aspose OCR a také se naučíte **spolehlivě extrahovat ručně psaný text**. + +Probereme vše od instalace knihovny až po ladění prahových hodnot důvěry pro ty nepořádné kurzívní skripty. Na konci budete mít spustitelný skript, který vytiskne extrahovaný text a celkové skóre důvěry — ideální pro aplikace na pořizování poznámek, archivní nástroje nebo jen pro uspokojení zvědavosti. Předchozí zkušenost s OCR není vyžadována; stačí základní znalost Pythonu. + +--- + +## Co budete potřebovat + +- **Python 3.9+** (nejnovější stabilní verze funguje nejlépe) +- **Aspose.OCR for Python via .NET** – nainstalujte pomocí `pip install aspose-ocr` +- **Obrázek s ručním písmem** (JPEG/PNG), který chcete zpracovat +- Volitelné: virtuální prostředí pro udržení závislostí v pořádku + +Pokud máte tyto položky připravené, pojďme na to. 
+ +![příklad rozpoznání ručního psaní](/images/handwritten-sample.jpg "příklad rozpoznání ručního psaní") + +*(Alt text: “příklad rozpoznání ručního psaní ukazující skenovanou ručně psanou poznámku”)* + +--- + +## Krok 1 – Instalace a import tříd Aspose OCR + +Nejprve potřebujeme samotný OCR engine. Aspose poskytuje čisté API, které odděluje rozpoznávání tištěného textu od režimu ručního psaní. + +```python +# Install the package (run this in your terminal if you haven't already) +# pip install aspose-ocr + +# Import the required OCR classes +from aspose.ocr import OcrEngine, HandwritingMode +``` + +*Proč je to důležité:* Importování `HandwritingMode` nám umožňuje říct enginu, že pracujeme s **rozpoznáváním ručně psaného textu v Pythonu** místo tištěného textu, což dramaticky zvyšuje přesnost u kurzívních tahů. + +--- + +## Krok 2 – Vytvoření a konfigurace OCR enginu + +Nyní vytvoříme instanci `OcrEngine` a přepneme ji do režimu ručního psaní. Můžete také upravit prahovou hodnotu důvěry; nižší hodnoty přijímají nejisté psaní, vyšší hodnoty vyžadují čistší vstup. + +```python +# Step 2: Initialize the OCR engine for handwritten text +ocr_engine = OcrEngine() +ocr_engine.set_recognition_mode(HandwritingMode.HANDWRITTEN) + +# Optional: tighten the confidence threshold for cursive scripts +# Default is 0.5; 0.65 works well for most notebooks +ocr_engine.set_handwriting_confidence_threshold(0.65) +``` + +*Tip:* Pokud jsou vaše poznámky skenovány s 300 DPI nebo vyšším, obvykle získáte lepší skóre. U nízkorozlišovacích obrázků zvažte zvětšení pomocí Pillow před předáním enginu. + +--- + +## Krok 3 – Připravte cestu k obrázku + +Ujistěte se, že cesta k souboru ukazuje na obrázek, který chcete zpracovat. Relativní cesty fungují, ale absolutní cesty zabraňují překvapením typu „soubor nenalezen“. 
+ +```python +# Step 3: Point to your handwritten image +input_image_path = "YOUR_DIRECTORY/handwritten_sample.jpg" +``` + +*Častá chyba:* Zapomenout uniknout zpětné lomítka ve Windows (`C:\\folder\\image.jpg`). Použití raw řetězců (`r"C:\folder\image.jpg"`) tento problém obejde. + +--- + +## Krok 4 – Spusťte rozpoznávání a zachyťte výsledky + +Metoda `recognize` vykonává těžkou práci. Vrací objekt s vlastnostmi `.text` a `.confidence`. + +```python +# Step 4: Run OCR and get the result +result = ocr_engine.recognize(input_image_path) + +# Display the extracted text and confidence +print("Hand‑written extraction:") +print(result.text) +print("Overall confidence:", result.confidence) +``` + +**Očekávaný výstup (příklad):** + +``` +Hand‑written extraction: +Meeting notes: +- Discuss quarterly goals +- Review budget allocations +- Assign action items +Overall confidence: 0.78 +``` + +Pokud důvěra klesne pod 0,5, možná budete muset obrázek vyčistit (odstranit stíny, zvýšit kontrast) nebo snížit práh v Kroku 2. + +--- + +## Krok 5 – Vyčištění zdrojů + +Aspose OCR drží nativní zdroje; volání `dispose()` je uvolní a zabrání únikům paměti, zejména při zpracování mnoha obrázků ve smyčce. + +```python +# Step 5: Release the engine resources +ocr_engine.dispose() +``` + +*Proč dispose?* V dlouho běžících službách (např. Flask API přijímající nahrávky) zapomenutí uvolnit zdroje může rychle vyčerpat paměť systému. + +--- + +## Kompletní skript – jedním kliknutím + +Spojením všeho dohromady zde máte samostatný skript, který můžete zkopírovat a spustit. 
+ +```python +# -*- coding: utf-8 -*- +""" +Handwritten OCR Tutorial – how to recognize handwriting in Python +Author: Your Name +Date: 2026-04-29 +""" + +# Install the library first if you haven't already: +# pip install aspose-ocr + +from aspose.ocr import OcrEngine, HandwritingMode + +def recognize_handwriting(image_path: str, confidence_threshold: float = 0.65): + """ + Recognizes handwritten text from an image using Aspose OCR. + + Args: + image_path (str): Path to the handwritten image file. + confidence_threshold (float): Minimum confidence to accept a word. + + Returns: + tuple: (extracted_text (str), overall_confidence (float)) + """ + # Initialize engine in handwritten mode + engine = OcrEngine() + engine.set_recognition_mode(HandwritingMode.HANDWRITTEN) + engine.set_handwriting_confidence_threshold(confidence_threshold) + + # Perform recognition + result = engine.recognize(image_path) + + # Capture output + extracted = result.text + confidence = result.confidence + + # Clean up + engine.dispose() + return extracted, confidence + +if __name__ == "__main__": + # Replace with your actual file location + img_path = "YOUR_DIRECTORY/handwritten_sample.jpg" + + text, conf = recognize_handwriting(img_path) + + print("Hand‑written extraction:") + print(text) + print("Overall confidence:", conf) +``` + +Uložte jej jako `handwritten_ocr.py` a spusťte `python handwritten_ocr.py`. Pokud je vše správně nastaveno, uvidíte extrahovaný text vytištěný v konzoli. + +--- + +## Řešení okrajových případů a běžných variant + +### Obrázky s nízkým kontrastem + +Pokud se pozadí mísí s inkoustem, nejprve zvýšte kontrast: + +```python +from PIL import Image, ImageEnhance + +img = Image.open(input_image_path) +enhancer = ImageEnhance.Contrast(img) +high_contrast = enhancer.enhance(2.0) # 2× contrast +high_contrast.save("temp_high_contrast.jpg") +result = ocr_engine.recognize("temp_high_contrast.jpg") +``` + +### Otočené poznámky + +Šikmá stránka zápisníku může rozpoznávání narušit. 
Použijte OpenCV (`pip install opencv-python numpy`) k vyrovnání: + +```python +import numpy as np +import cv2 + +def deskew(image_path): + gray = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE) + # Invert and binarize so that ink pixels are non-zero (assumes dark ink on light paper) + binary = cv2.threshold(cv2.bitwise_not(gray), 0, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)[1] + coords = np.column_stack(np.where(binary > 0)).astype(np.float32) + # Note: the angle range of minAreaRect differs between OpenCV versions; verify the sign on a test scan + angle = cv2.minAreaRect(coords)[-1] + if angle < -45: + angle = -(90 + angle) + else: + angle = -angle + (h, w) = gray.shape[:2] + M = cv2.getRotationMatrix2D((w // 2, h // 2), angle, 1.0) + corrected = cv2.warpAffine(gray, M, (w, h), flags=cv2.INTER_CUBIC, borderMode=cv2.BORDER_REPLICATE) + cv2.imwrite("deskewed.jpg", corrected) + +deskew(input_image_path) +result = ocr_engine.recognize("deskewed.jpg") +``` + +### Vícestránkové PDF + +Aspose OCR také dokáže zpracovat PDF stránky, ale nejprve musíte každou stránku převést na obrázek (např. pomocí `pdf2image`). Pak projděte obrázky ve smyčce pomocí stejné funkce `recognize_handwriting`. + +--- + +## Tipy pro lepší výsledky **Extract Handwritten Text** + +- **DPI je důležité:** Při skenování cílte na 300 DPI nebo vyšší. +- **Vyhněte se barevným pozadím:** Čistě bílá nebo světle šedá poskytuje nejčistší výstup. +- **Dávkové zpracování:** Zabalte funkci do `for` smyčky a zaznamenejte důvěru každé stránky; odstraňte výsledky pod prahem, aby kvalita zůstala vysoká. +- **Podpora jazyků:** Aspose OCR podporuje více jazyků; nastavte `engine.set_language("en")` pro optimalizaci pouze pro angličtinu. + +--- + +## Často kladené otázky + +**Funguje to na Linuxu?** +Ano – Aspose OCR je dodáván s nativními binárními soubory pro Windows, macOS a Linux. Stačí nainstalovat pip balíček a jste připraveni. 
+ +**Co když je moje ručně psané písmo extrémně kurzívní?** +Zkuste snížit prahovou hodnotu důvěry (`0.5` nebo i `0.4`). Mějte na paměti, že to může zavést více šumu, takže výstup případně dále zpracujte (např. kontrolou pravopisu). + +**Mohu to použít ve webové službě?** +Určitě. Funkce `recognize_handwriting` je bezstavová, což ji činí ideální pro endpointy ve Flask nebo FastAPI. 
Jen nezapomeňte po každém požadavku zavolat `dispose()` nebo použít kontextový manažer. + +--- + +## Závěr + +Přehledně jsme pokryli **jak rozpoznat ruční psaní** v Pythonu od začátku až po konec, ukázali vám, jak **extrahovat ručně psaný text**, ladit nastavení důvěry a řešit běžné problémy jako nízký kontrast nebo otočené stránky. Kompletní skript výše je připraven k spuštění a modulární funkce usnadňuje integraci do větších projektů — ať už vytváříte aplikaci pro poznámky, digitalizujete archivy nebo jen experimentujete s **handwritten ocr tutorial python** technikami. + +Dále můžete prozkoumat **handwritten text recognition python** pro vícejazykové poznámky, nebo kombinovat OCR s analýzou přirozeného jazyka pro automatické shrnutí zápisů ze schůzek. Možnosti jsou neomezené — vyzkoušejte to a nechte svůj kód oživit skici. + +Šťastné kódování a neváhejte položit své otázky v komentářích! + +{{< /blocks/products/pf/tutorial-page-section >}} +{{< /blocks/products/pf/main-container >}} +{{< /blocks/products/pf/main-wrap-class >}} +{{< blocks/products/products-backtop-button >}} \ No newline at end of file diff --git a/ocr/czech/python/general/how-to-run-ocr-with-aspose-hugging-face-complete-guide/_index.md b/ocr/czech/python/general/how-to-run-ocr-with-aspose-hugging-face-complete-guide/_index.md new file mode 100644 index 000000000..37d1378c2 --- /dev/null +++ b/ocr/czech/python/general/how-to-run-ocr-with-aspose-hugging-face-complete-guide/_index.md @@ -0,0 +1,180 @@ +--- +category: general +date: 2026-04-29 +description: Naučte se, jak spustit OCR na svých skenech, automaticky použít model + Hugging Face a rozpoznat text ze skenů pomocí Aspose OCR během několika minut. 
+draft: false +keywords: +- how to run OCR +- use hugging face model +- recognize text from scans +- download model automatically +language: cs +og_description: Jak spustit OCR na skenovaných dokumentech pomocí Aspose OCR, automaticky + stáhnout model z Hugging Face a získat čistý, interpunkčně správný text. +og_title: Jak spustit OCR s Aspose a Hugging Face – Kompletní průvodce +tags: +- OCR +- Aspose +- Hugging Face +- Python +title: Jak spustit OCR s Aspose a Hugging Face – kompletní průvodce +url: /cs/python/general/how-to-run-ocr-with-aspose-hugging-face-complete-guide/ +--- + +{{< blocks/products/pf/main-wrap-class >}} +{{< blocks/products/pf/main-container >}} +{{< blocks/products/pf/tutorial-page-section >}} + +# Jak spustit OCR s Aspose & Hugging Face – Kompletní průvodce + +Už jste se někdy zamysleli nad **tím, jak spustit OCR** na hromadě naskenovaných dokumentů, aniž byste museli strávit hodiny laděním nastavení? Nejste v tom sami. V mnoha projektech vývojáři potřebují rychle **rozpoznat text ze skenů**, ale narážejí na stahování modelů a následné zpracování. + +Dobrá zpráva: tento tutoriál vám ukáže připravené řešení, které **používá model Hugging Face**, automaticky jej stáhne a přidá interpunkci, takže výstup vypadá, jako by jej napsal člověk. Na konci budete mít skript, který zpracuje každý obrázek ve složce a vytvoří čistý `.txt` soubor vedle každého skenu. + +## Co budete potřebovat + +- Python 3.8+ (kód používá f‑stringy, takže technicky stačí 3.6+, ale doporučujeme 3.8+) +- Balíček `aspose-ocr` (nainstalujte pomocí `pip install aspose-ocr`) +- Přístup k internetu pro první stažení modelu +- Složka s obrazovými skeny (`.png`, `.jpg` nebo `.tif`) + +To je vše: žádné extra binární soubory, žádné ruční manipulace s modelem. Pojďme na to. + +![how to run OCR example](https://example.com/ocr-demo.png "how to run OCR example") + +## Krok 1: Importujte třídy Aspose OCR a nastavte prostředí + +Začínáme načtením potřebných tříd z knihovny Aspose OCR. 
Importování všeho najednou udržuje skript přehledný a usnadňuje odhalení chybějících závislostí. + +```python +# Step 1: Import Aspose OCR classes +import os +from aspose.ocr import OcrEngine, AsposeAI, AsposeAIModelConfig +``` + +*Proč je to důležité*: `OcrEngine` provádí těžkou práci, zatímco `AsposeAI` nám umožňuje připojit velký jazykový model pro chytřejší post‑processing. Pokud import vynecháte, zbytek kódu se ani neskompiluje—takže na to nezapomeňte. + +## Krok 2: Nakonfigurujte GPU‑vědomý model Hugging Face + +Nyní říkáme Aspose, kde má model stáhnout a kolik vrstev má běžet na GPU. Příznak `allow_auto_download="true"` provádí část **automatického stažení modelu** za vás. + +```python +# Step 2: Configure a GPU‑aware AI model (replace with your own model folder) +model_config = AsposeAIModelConfig( + allow_auto_download="true", + hugging_face_repo_id="Qwen/Qwen2.5-3B-Instruct-GGUF", + hugging_face_quantization="int8", + gpu_layers=40, # use GPU for faster inference + directory_model_path=r"YOUR_DIRECTORY/models" +) +``` + +> **Tip**: Pokud nemáte GPU, nastavte `gpu_layers=0`. Model přejde na CPU, což je pomalejší, ale stále funguje. + +### Proč zvolit model Hugging Face? + +Hugging Face hostuje obrovskou sbírku připravených LLM. Odkazem na `Qwen/Qwen2.5-3B-Instruct-GGUF` získáte kompaktní model optimalizovaný pro instrukce, který dokáže přidat interpunkci, opravit mezery a dokonce opravit drobné OCR chyby. To je podstata **použití modelu Hugging Face** v praxi. + +## Krok 3: Inicializujte AI engine a povolte post‑processing interpunkce + +AI engine není jen pro pokročilý chat—zde připojujeme *přidávač interpunkce*, který vyčistí surový OCR výstup. + +```python +# Step 3: Initialise the AI engine and enable punctuation post‑processing +ai_engine = AsposeAI() +ai_engine.set_post_processor("punctuation_adder", {}) +``` + +*Co se děje?* Volání `set_post_processor` zaregistruje vestavěný post‑processor, který se spustí po dokončení OCR engine. 
Vezme surový řetězec a vloží čárky, tečky a velká písmena tam, kde patří, čímž učiní konečný text mnohem čitelnějším. + +## Krok 4: Vytvořte OCR engine a připojte AI engine + +Propojení AI engine s OCR engine nám poskytuje jeden objekt, který dokáže jak číst znaky, tak vylepšit výsledek. + +```python +# Step 4: Create the OCR engine and attach the AI engine +ocr_engine = OcrEngine() +ocr_engine.set_ai_engine(ai_engine) +``` + +Pokud tento krok vynecháte, OCR bude stále fungovat, ale ztratíte přídavek interpunkce—výstup bude vypadat jako proud slov. + +## Krok 5: Zpracujte každý obrázek ve složce + +Toto je jádro tutoriálu. Procházíme každý obrázek, spustíme OCR, aplikujeme post‑processor a zapíšeme vyčištěný text do souboru `.txt` vedle skenu. + +```python +# Step 5: Run OCR on each image in a folder, post‑process the result, and save the text +scans_folder = r"YOUR_DIRECTORY/scans" +for image_file in os.listdir(scans_folder): + # Filter only supported image types + if not image_file.lower().endswith(('.png', '.jpg', '.tif')): + continue + + image_path = os.path.join(scans_folder, image_file) + + # Recognise text from the image + ocr_result = ocr_engine.recognize(image_path) + + # Apply the punctuation post‑processor + ocr_result = ocr_engine.run_postprocessor(ocr_result) + + # Show a brief confidence summary + print(f"{image_file} – confidence {ocr_result.confidence:.2%}") + + # Save the cleaned text next to the source image + txt_path = image_path + ".txt" + with open(txt_path, "w", encoding="utf-8") as txt_file: + txt_file.write(ocr_result.text) +``` + +### Co očekávat + +Spuštění skriptu vypíše něco jako: + +``` +invoice_001.png – confidence 96.73% +receipt_2024.tif – confidence 94.12% +``` + +Každý řádek vám ukáže skóre důvěry (rychlá kontrola) a vytvoří soubory jako `invoice_001.png.txt`, `receipt_2024.tif.txt` atd., obsahující interpunkčně upravený, čitelný text. 
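Pojmenování výstupních souborů popsané výše lze oddělit do malé pomocné funkce. Následující náčrt používá jen standardní knihovnu; názvy `is_supported_scan` a `text_path_for` jsou hypotetické a nejsou součástí Aspose OCR.

```python
from pathlib import Path

# Same extensions the loop above filters on
SUPPORTED_SUFFIXES = {".png", ".jpg", ".tif"}


def is_supported_scan(path: Path) -> bool:
    """Return True when the file extension is one of the supported image types."""
    return path.suffix.lower() in SUPPORTED_SUFFIXES


def text_path_for(image_path: Path) -> Path:
    """Build the output path by appending ".txt" to the full image file name."""
    return image_path.with_name(image_path.name + ".txt")


print(text_path_for(Path("scans/invoice_001.png")).name)  # invoice_001.png.txt
```

Oddělení těchto dvou kroků usnadňuje jejich otestování bez spuštění samotného OCR.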
+ +### Okrajové případy a varianty + +- **Skeny v jiných jazycích**: Přepněte `hugging_face_repo_id` na vícejazykový model (např. `microsoft/Multilingual-LLM-GGUF`). +- **Velké dávky**: Zabalte smyčku do `concurrent.futures.ThreadPoolExecutor` pro paralelní zpracování, ale dbejte na limity paměti GPU. +- **Vlastní post‑processing**: Nahraďte `"punctuation_adder"` svým vlastním skriptem, pokud potřebujete doménově specifické čištění (např. odstraňování čísel faktur). + +## Krok 6: Uvolněte zdroje + +Když úloha skončí, uvolnění zdrojů zabraňuje únikům paměti, což je zvláště důležité, pokud běžíte toto v dlouhodobé službě. + +```python +# Step 6: Release resources +ai_engine.free_resources() +ocr_engine.dispose() +``` + +Opomenutí tohoto kroku může zanechat paměť GPU obsazenou, což by zmařilo následné spuštění. + +## Shrnutí: Jak spustit OCR od začátku do konce + +V několika řádcích jsme ukázali **jak spustit OCR** na složce skenů, **použít model Hugging Face**, který se při prvním spuštění sám stáhne, a **rozpoznat text ze skenů** s automaticky přidanou interpunkcí. Kompletní skript je připraven ke zkopírování, úpravě cest a spuštění. + +## Další kroky a související témata + +- **Dávkové post‑processing**: Prozkoumejte `ocr_engine.run_batch_postprocessor` pro ještě rychlejší hromadné zpracování. +- **Alternativní modely**: Vyzkoušejte rodinu `openai/whisper`, pokud potřebujete převod řeči na text vedle OCR. +- **Integrace s databázemi**: Uložte extrahovaný text do SQLite nebo Elasticsearch pro prohledávatelné archivy. + +Neváhejte experimentovat—vyměňte model, upravte `gpu_layers` nebo přidejte vlastní post‑processor. Flexibilita Aspose OCR v kombinaci s hubem modelů Hugging Face dělá z tohoto řešení univerzální základ pro jakýkoli projekt digitalizace dokumentů. + +--- + +*Šťastné kódování! 
Pokud narazíte na problém, zanechte komentář níže nebo si prohlédněte dokumentaci Aspose OCR pro podrobnější možnosti konfigurace.* + +{{< /blocks/products/pf/tutorial-page-section >}} +{{< /blocks/products/pf/main-container >}} +{{< /blocks/products/pf/main-wrap-class >}} +{{< blocks/products/products-backtop-button >}} \ No newline at end of file diff --git a/ocr/czech/python/general/perform-ocr-on-image-with-python-complete-guide/_index.md b/ocr/czech/python/general/perform-ocr-on-image-with-python-complete-guide/_index.md new file mode 100644 index 000000000..851be5831 --- /dev/null +++ b/ocr/czech/python/general/perform-ocr-on-image-with-python-complete-guide/_index.md @@ -0,0 +1,206 @@ +--- +category: general +date: 2026-04-29 +description: Provést OCR na obrázku pomocí Pythonu, automaticky stáhnout model z HuggingFace + a efektivně uvolnit GPU paměť při čištění OCR textu. +draft: false +keywords: +- perform OCR on image +- download HuggingFace model python +- release GPU memory python +- clean OCR text python +language: cs +og_description: Naučte se, jak provádět OCR na obrázku v Pythonu, automaticky stáhnout + model z HuggingFace, vyčistit text a uvolnit paměť GPU. +og_title: Provést OCR na obrázku pomocí Pythonu – krok za krokem průvodce +tags: +- OCR +- Python +- Aspose +- HuggingFace +title: Proveďte OCR na obrázku pomocí Pythonu – kompletní průvodce +url: /cs/python/general/perform-ocr-on-image-with-python-complete-guide/ +--- + +{{< blocks/products/pf/main-wrap-class >}} +{{< blocks/products/pf/main-container >}} +{{< blocks/products/pf/tutorial-page-section >}} + +# Proveďte OCR na obrázku pomocí Pythonu – Kompletní průvodce + +Už jste někdy potřebovali **perform OCR on image** soubory, ale uvízli jste ve stáhnutí modelu nebo při úklidu GPU paměti? Nejste v tom sami – mnoho vývojářů narazí na tuto překážku, když poprvé zkusí spojit optické rozpoznávání znaků s velkými jazykovými modely. 
+ +V tomto tutoriálu projdeme jedním, end‑to‑end řešením, které **downloads a HuggingFace model in Python**, spustí Aspose OCR, vyčistí surový výstup a nakonec **releases GPU memory Python** může získat zpět. Na konci budete mít připravený skript, který ze skenovaného PNG vytvoří upravený, prohledávatelný text. + +> **Co získáte:** kompletní, spustitelný ukázkový kód, vysvětlení, proč je každý krok důležitý, tipy, jak se vyhnout běžným úskalím, a pohled na to, jak si pipeline přizpůsobit pro vlastní projekty. + +--- + +## Co budete potřebovat + +- Python 3.9 nebo novější (příklad byl testován na 3.11) +- `aspose-ocr` balíček (instalujte pomocí `pip install aspose-ocr`) +- Internetové připojení pro krok **download HuggingFace model python** +- GPU kompatibilní s CUDA, pokud chcete zvýšit rychlost (volitelné, ale doporučené) + +Žádné další systémové závislosti nejsou vyžadovány; Aspose OCR engine obsahuje vše, co potřebujete. + +![provedení OCR na obrázku příklad](image.png "Příklad provádění OCR na obrázku s Aspose OCR a post‑procesorem LLM") + +*Text obrázku: “perform OCR on image – Aspose OCR output before and after AI cleaning”* + +--- + +## Provedení OCR na obrázku – Přehled krok za krokem + +Níže rozdělujeme workflow do logických částí. Každá část má vlastní nadpis, takže AI asistenti mohou rychle přejít na část, která vás zajímá, a vyhledávače mohou indexovat relevantní klíčová slova. + +### 1. Stažení modelu HuggingFace v Pythonu + +První věc, kterou musíme udělat, je stáhnout jazykový model, který bude fungovat jako post‑processor pro surový OCR výstup. Aspose OCR přichází s pomocnou třídou `AsposeAI`, která může automaticky stáhnout model z HuggingFace hubu. 
+ +```python +import aspose.ocr as aocr +from aspose.ocr import AsposeAI, AsposeAIModelConfig, OcrEngine + +# Configure the model – it will auto‑download the first time you run it +model_config = AsposeAIModelConfig( + allow_auto_download="true", # <-- enables auto‑download + hugging_face_repo_id="Qwen/Qwen2.5-3B-Instruct-GGUF", + hugging_face_quantization="int8", # smaller footprint on GPU + gpu_layers=20, # how many layers stay on GPU + directory_model_path=r"YOUR_DIRECTORY" # where the model files live +) +``` + +**Proč je to důležité:** +- **download HuggingFace model python** – vyhnete se ručnímu zpracování zip souborů nebo autentizaci tokenu. +- Použití kvantizace `int8` zmenší model přibližně na čtvrtinu původní velikosti, což je klíčové, když později potřebujete **release GPU memory python**. + +> **Tip:** Uchovávejte `directory_model_path` na SSD pro rychlejší načítání. + +--- + +### 2. Inicializace AI Helper a povolení kontroly pravopisu + +Nyní vytvoříme instanci `AsposeAI` a připojíme post‑processor pro opravu pravopisu. Tady začíná magie **clean OCR text python**. + +```python +# Initialise the AI helper +ai_helper = AsposeAI() +ai_helper.set_post_processor( + processor="spell_corrector", + custom_settings={"max_edits": 2} # allows up to two character edits per word +) +``` + +**Vysvětlení:** +Kontrola pravopisu prozkoumá každý token z OCR enginu a navrhne úpravy omezené parametrem `max_edits`. Tento malý zásah může změnit “rec0gn1tion” na “recognition” bez těžkopádného jazykového modelu. + +--- + +### 3. Připojení AI Helper k OCR enginu + +Aspose zavedl v verzi 23.4 novou metodu, která umožňuje přímo zapojit AI engine do OCR pipeline. + +```python +# Initialise the OCR engine and attach the AI helper +ocr_engine = OcrEngine() +ocr_engine.set_ai_engine(ai_helper) # new in v23.4 +``` + +**Proč to děláme:** +Propojením AI helpera brzy může OCR engine volitelně použít model pro vylepšení za běhu (např. detekce rozložení). 
Navíc udržuje kód přehledný – není potřeba samostatných smyček pro post‑processing později. + +--- + +### 4. Provedení OCR na naskenovaném obrázku + +Toto je hlavní krok, ve kterém se skutečně provede OCR obrázku (**perform OCR on image**). Nahraďte `YOUR_DIRECTORY/input.png` cestou k vašemu vlastnímu skenu. + +```python +image_path = r"YOUR_DIRECTORY/input.png" +ocr_result = ocr_engine.recognize(image_path) + +print("Raw OCR text:") +print(ocr_result.text) +``` + +Typický surový výstup může obsahovat zalomení řádků na podivných místech, špatně rozpoznané znaky nebo cizí symboly. Proto potřebujeme další krok. + +**Očekávaný surový výstup (příklad):** + +``` +Th1s 1s 4n ex4mpl3 0f r4w OCR t3xt. +It c0ntains numb3rs 123 and s0me m1stakes. +``` + +--- + +### 5. Vyčištění OCR textu v Pythonu pomocí AI post‑procesoru + +Nyní necháme AI uklidit ten nepořádek. To je jádro procesu **clean OCR text python**. + +```python +cleaned_result = ocr_engine.run_postprocessor(ocr_result) + +print("\nAI‑enhanced text:") +print(cleaned_result.text) +``` + +**Výsledek, který uvidíte:** + +``` +This is an example of raw OCR text. +It contains numbers 123 and some mistakes. +``` + +Všimněte si, jak kontrola pravopisu opravila “Th1s” → “This” a “4n” → “an”. Model také normalizuje mezery, což je často bolestivý bod, když později předáváte text do downstream NLP pipeline. + +--- + +### 6. Uvolnění GPU paměti v Pythonu – Kroky úklidu + +Když jste hotovi, je dobré uvolnit GPU zdroje, zejména pokud spouštíte více OCR úloh v dlouho běžící službě. + +```python +# Release resources – crucial for GPU memory +ai_helper.free_resources() +ocr_engine.dispose() +``` + +**Co se děje pod kapotou:** +`free_resources()` odstraňuje model z GPU a vrací paměť zpět CUDA driveru. `dispose()` uvolňuje interní buffery OCR enginu. Vynechání těchto volání může vést k chybám out‑of‑memory už po několika obrázcích. 
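Aby se úklid provedl i v případě, že OCR skončí výjimkou, lze volání `dispose()` zabalit do kontextového manažeru. Jde o náčrt nad standardní knihovnou: pomocník `disposing` i třída `FakeEngine` jsou hypotetické a slouží jen k tomu, aby byl příklad spustitelný bez nainstalovaného Aspose OCR. Předpokládá se pouze to, že skutečný engine má metodu `dispose()`, jak ukazuje kód výše.

```python
from contextlib import contextmanager


@contextmanager
def disposing(resource):
    """Yield the resource and call its dispose() on exit, even if the body raises."""
    try:
        yield resource
    finally:
        resource.dispose()


class FakeEngine:
    """Stand-in for an OCR engine so the sketch runs without Aspose OCR installed."""

    def __init__(self):
        self.disposed = False

    def dispose(self):
        self.disposed = True


engine = FakeEngine()
try:
    with disposing(engine):
        # Simulate a failure in the middle of processing
        raise RuntimeError("simulated OCR failure")
except RuntimeError:
    pass

print(engine.disposed)  # True
```

Se skutečným enginem by zápis vypadal stejně, jen by se místo `FakeEngine()` předala jeho instance.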
+ +> **Pamatujte:** Pokud plánujete zpracovávat dávky v cyklu, volejte úklid po každé dávce nebo znovu použijte stejný `ai_helper` bez uvolnění až do úplného konce. + +--- + +## Bonus: Úprava pipeline pro různé scénáře + +### Úprava kvantizace modelu + +Pokud máte výkonný GPU (např. RTX 4090) a chcete vyšší přesnost, změňte `hugging_face_quantization` na `"fp16"` a navýšte `gpu_layers` na `30`. To spotřebuje více paměti, takže budete muset **release GPU memory python** agresivněji po každé dávce. + +### Použití vlastního kontroloru pravopisu + +Můžete nahradit vestavěný `spell_corrector` vlastním post‑processorem, který provádí doménově specifické opravy (např. medicínskou terminologii). Stačí implementovat požadované rozhraní a předat jeho název do `set_post_processor`. + +### Dávkové zpracování více obrázků + +Zabalte OCR kroky do `for` smyčky, sbírejte `cleaned_result.text` do seznamu a volejte `ai_helper.free_resources()` až po smyčce, pokud máte dostatek GPU RAM. Tím snížíte režii opakovaného načítání modelu. + +--- + +## Závěr + +Právě jsme vám ukázali, jak **perform OCR on image** soubory v Pythonu, automaticky **download a HuggingFace model**, **clean OCR text** a bezpečně **release GPU memory**, když jste hotovi. Kompletní skript je připravený ke zkopírování a vložení a vysvětlení vám dávají jistotu, že jej můžete přizpůsobit větším projektům. + +Další kroky? Zkuste vyměnit model Qwen 2.5 za větší variantu LLaMA, experimentujte s různými post‑processory nebo integrujte vyčištěný výstup do prohledávatelného Elasticsearch indexu. Možnosti jsou neomezené a nyní máte solidní základ, na kterém můžete stavět. + +Šťastné kódování a ať jsou vaše OCR pipeline vždy čisté a paměťově přátelské! 
+ +{{< /blocks/products/pf/tutorial-page-section >}} +{{< /blocks/products/pf/main-container >}} +{{< /blocks/products/pf/main-wrap-class >}} +{{< blocks/products/products-backtop-button >}} \ No newline at end of file diff --git a/ocr/dutch/python/general/extract-text-from-pdf-ocr-pdf-with-python/_index.md b/ocr/dutch/python/general/extract-text-from-pdf-ocr-pdf-with-python/_index.md new file mode 100644 index 000000000..294b8d61b --- /dev/null +++ b/ocr/dutch/python/general/extract-text-from-pdf-ocr-pdf-with-python/_index.md @@ -0,0 +1,216 @@ +--- +category: general +date: 2026-04-29 +description: Haal tekst uit PDF met Aspose OCR in Python. Leer batch‑OCR‑PDF‑verwerking, + converteer gescande PDF‑tekst en verwerk pagina’s met een lage vertrouwensscore. +draft: false +keywords: +- extract text from pdf +- ocr pdf with python +- convert scanned pdf text +- batch ocr pdf processing +language: nl +og_description: Tekst extraheren uit PDF met Aspose OCR in Python. Deze gids laat + batch‑OCR PDF‑verwerking zien, het converteren van gescande PDF‑tekst en het omgaan + met resultaten met een lage vertrouwensscore. +og_title: Tekst uit PDF extraheren – OCR PDF met Python +tags: +- OCR +- Python +- PDF processing +title: Tekst extraheren uit PDF – OCR PDF met Python +url: /nl/python/general/extract-text-from-pdf-ocr-pdf-with-python/ +--- + +{{< blocks/products/pf/main-wrap-class >}} +{{< blocks/products/pf/main-container >}} +{{< blocks/products/pf/tutorial-page-section >}} + +# Tekst extraheren uit PDF – OCR PDF met Python + +Heb je ooit **tekst uit PDF** moeten extraheren, maar is het bestand slechts een gescande afbeelding? Je bent niet de enige—veel ontwikkelaars lopen tegen die muur aan wanneer ze PDF's willen omzetten naar doorzoekbare data. Het goede nieuws? Met Aspose OCR voor Python kun je gescande PDF‑tekst in een paar regels converteren, en zelfs **batch OCR PDF verwerking** uitvoeren wanneer je tientallen bestanden moet verwerken. 
+ +In deze tutorial lopen we de volledige workflow door: de bibliotheek installeren, OCR uitvoeren op een enkele PDF, opschalen naar een batch, en omgaan met pagina's met lage vertrouwensscore zodat je weet wanneer een handmatige controle nodig is. Aan het einde heb je een kant‑klaar script dat tekst uit elke gescande PDF extraheert, en begrijp je de reden achter elke stap. + +## Wat je nodig hebt + +- Python 3.8 of nieuwer (de code gebruikt f‑strings, dus 3.6+ werkt, maar 3.8+ wordt aanbevolen) +- Een Aspose OCR voor Python‑licentie of een gratis proeflicentiesleutel (je kunt er één krijgen op de Aspose‑website) +- Een map met één of meer gescande PDF‑bestanden die je wilt verwerken +- Een bescheiden hoeveelheid schijfruimte voor de gegenereerde *.txt*‑rapporten + +Dat is alles—geen zware externe afhankelijkheden, geen OpenCV‑gymnastiek. De Aspose OCR‑engine doet het zware werk voor je. + +## De omgeving instellen + +Eerst installeer je het Aspose OCR‑pakket van PyPI: + +```bash +pip install aspose-ocr +``` + +Als je een licentiebestand (`Aspose.OCR.lic`) hebt, plaats het dan in de root van je project en activeer het als volgt: + +```python +# Activate Aspose OCR license (optional but removes trial watermark) +from aspose.ocr import License + +license = License() +license.set_license("Aspose.OCR.lic") +``` + +> **Pro tip:** Houd het licentiebestand buiten versiebeheer; voeg het toe aan `.gitignore` om onbedoelde blootstelling te voorkomen. + +## OCR uitvoeren op een enkele PDF + +Laten we nu tekst extraheren uit een enkele gescande PDF. De kernstappen zijn: + +1. Maak een `OcrEngine`‑instantie. +2. Verwijs deze naar het PDF‑bestand. +3. Haal een `OcrResult` op voor elke pagina. +4. Schrijf de platte‑tekstuitvoer naar schijf. +5. Vernietig de engine om native bronnen vrij te geven. 
+ +Hier is het volledige, uitvoerbare script: + +```python +# Step 1: Import the OCR engine and create an instance +from aspose.ocr import OcrEngine + +# Optional: activate license (see previous section) +# from aspose.ocr import License +# License().set_license("Aspose.OCR.lic") + +ocr_engine = OcrEngine() + +# Step 2: Specify the PDF file to be processed +pdf_file_path = r"YOUR_DIRECTORY/input.pdf" + +# Step 3: Extract OCR results – one OcrResult object per page +ocr_pages = ocr_engine.extract_from_pdf(pdf_file_path) + +# Step 4: Iterate through the results, show confidence, flag low‑confidence pages, and save the plain text +for ocr_page in ocr_pages: + print(f"Page {ocr_page.page_number}: confidence {ocr_page.confidence:.2%}") + + # If confidence is below 80 %, flag it for manual review + if ocr_page.confidence < 0.80: + print(" Low confidence – consider manual review") + + # Build the output filename + output_txt = f"YOUR_DIRECTORY/Report_page{ocr_page.page_number}.txt" + with open(output_txt, "w", encoding="utf-8") as f: + f.write(ocr_page.text) + +# Step 5: Release resources used by the engine +ocr_engine.dispose() +``` + +**Wat je zult zien:** Voor elke pagina print het script iets als `Page 1: confidence 97.45%`. Als een pagina onder de 80 % drempel valt, verschijnt er een waarschuwing, zodat je weet dat de OCR mogelijk tekens heeft gemist. + +### Waarom dit werkt + +- **`OcrEngine`** is de toegangspoort tot de native Aspose OCR‑bibliotheek; hij behandelt alles van beeldvoorbewerking tot tekenherkenning. +- **`extract_from_pdf`** rastert automatisch elke PDF‑pagina, zodat je de PDF niet zelf naar afbeeldingen hoeft te converteren. +- **Vertrouwensscores** stellen je in staat kwaliteitscontroles te automatiseren—cruciaal wanneer je juridische of medische documenten verwerkt waar nauwkeurigheid van belang is. + +## Batch OCR PDF verwerking met Python + +De meeste projecten in de praktijk omvatten meer dan één bestand. 
Laten we het script voor één bestand uitbreiden naar een **batch OCR PDF verwerking**‑pipeline die een map doorloopt, elke PDF verwerkt en de resultaten opslaat in een bijbehorende sub‑map. + +```python +import os +from pathlib import Path +from aspose.ocr import OcrEngine + +def ocr_pdf_file(pdf_path: Path, output_dir: Path, engine: OcrEngine, confidence_threshold: float = 0.80): + """Extract text from a single PDF and write per‑page .txt files.""" + ocr_pages = engine.extract_from_pdf(str(pdf_path)) + for page in ocr_pages: + print(f"{pdf_path.name} – Page {page.page_number}: {page.confidence:.2%}") + if page.confidence < confidence_threshold: + print(" ⚠️ Low confidence – manual review may be needed") + + out_file = output_dir / f"{pdf_path.stem}_page{page.page_number}.txt" + out_file.write_text(page.text, encoding="utf-8") + +def batch_ocr_pdf(input_folder: str, output_folder: str): + """Iterate over all PDFs in input_folder and run OCR.""" + engine = OcrEngine() + input_path = Path(input_folder) + output_path = Path(output_folder) + output_path.mkdir(parents=True, exist_ok=True) + + pdf_files = list(input_path.glob("*.pdf")) + if not pdf_files: + print("🚫 No PDF files found in the specified folder.") + return + + for pdf_file in pdf_files: + # Create a sub‑folder for each PDF to keep pages organized + pdf_out_dir = output_path / pdf_file.stem + pdf_out_dir.mkdir(exist_ok=True) + ocr_pdf_file(pdf_file, pdf_out_dir, engine) + + # Clean up native resources + engine.dispose() + print("✅ Batch OCR completed.") + +# Example usage: +if __name__ == "__main__": + batch_ocr_pdf("YOUR_DIRECTORY/input_pdfs", "YOUR_DIRECTORY/ocr_output") +``` + +#### Hoe dit helpt + +- **Schaalbaarheid:** De functie doorloopt de map één keer en maakt een dedicated output‑sub‑map voor elke PDF. Dit houdt alles overzichtelijk wanneer je tientallen documenten hebt. +- **Herbruikbaarheid:** `ocr_pdf_file` kan worden aangeroepen vanuit andere scripts (bijv. 
een webservice) omdat het een pure functie is. +- **Foutafhandeling:** Het script print een vriendelijke melding als de invoermap leeg is, zodat je niet stilletjes faalt. + +## Gescande PDF-tekst converteren – Randgevallen afhandelen + +Hoewel de bovenstaande code voor de meeste PDF's werkt, kun je tegen enkele eigenaardigheden aanlopen: + +| Situatie | Waarom het gebeurt | Hoe te mitigeren | +|----------|--------------------|------------------| +| **Versleutelde PDF's** | De PDF is met een wachtwoord beveiligd. | Geef het wachtwoord door aan `extract_from_pdf(pdf_path, password="yourPwd")`. | +| **Meertalige documenten** | Aspose OCR gaat standaard uit van Engels. | Stel `ocr_engine.language = "spa"` in voor Spaans, of geef een lijst op voor gemengde talen. | +| **Zeer grote PDF's (>500 pagina's)** | Het geheugenverbruik stijgt omdat elke pagina in RAM wordt geladen. | Verwerk de PDF in delen met `engine.extract_from_pdf(pdf_path, start_page=1, end_page=100)` en herhaal in een lus. | +| **Slechte scankwaliteit** | Lage DPI of veel ruis verlaagt de vertrouwensscore. | Pre‑process de PDF met `engine.image_preprocessing = True` of verhoog de DPI via `engine.dpi = 300`. | + +> **Let op:** Het inschakelen van beeldvoorbewerking kan de CPU‑tijd merkbaar verhogen. Als je een nachtelijke batch draait, plan dan voldoende tijd in of start een aparte worker. + +## De output verifiëren + +Na afloop van het script vind je een mapstructuur zoals: + +``` +ocr_output/ +├─ invoice_2023/ +│ ├─ invoice_2023_page1.txt +│ ├─ invoice_2023_page2.txt +│ └─ … +└─ contract_A/ + ├─ contract_A_page1.txt + └─ … +``` + +Open een willekeurig `.txt`‑bestand; je zou schone, UTF‑8‑gecodeerde tekst moeten zien die de oorspronkelijke gescande inhoud weerspiegelt. Als je onleesbare tekens opmerkt, controleer dan de taalinstellingen van de PDF en zorg dat de juiste lettertype‑pakketten op de machine zijn geïnstalleerd. 
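The manual spot check described above can be partly automated. The sketch below walks an output folder and reports `.txt` files that are empty, are not valid UTF-8, or contain the Unicode replacement character. It uses only the standard library. The function name `check_ocr_output` and the demo file contents are illustrative assumptions, not part of the tutorial's script.

```python
from pathlib import Path
import tempfile


def check_ocr_output(root):
    """Return (ok, problems) for every .txt file found under root."""
    ok, problems = [], []
    for txt in sorted(Path(root).rglob("*.txt")):
        try:
            text = txt.read_bytes().decode("utf-8")
        except UnicodeDecodeError:
            problems.append((txt.name, "not valid UTF-8"))
            continue
        if not text.strip():
            problems.append((txt.name, "empty"))
        elif "\ufffd" in text:
            problems.append((txt.name, "contains replacement characters"))
        else:
            ok.append(txt.name)
    return ok, problems


# Demo on a throwaway folder that mimics the layout shown above
with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp) / "invoice_2023"
    out.mkdir()
    (out / "invoice_2023_page1.txt").write_text("Total due: 120 EUR", encoding="utf-8")
    (out / "invoice_2023_page2.txt").write_text("   ", encoding="utf-8")
    (out / "invoice_2023_page3.txt").write_bytes(b"\xff\xfe broken")
    ok, problems = check_ocr_output(tmp)

print(ok)
print(problems)
```

To check a real run, point `check_ocr_output` at your own `ocr_output` folder; files listed in `problems` are candidates for a second OCR pass or manual review.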
+ +## Resources opruimen + +Aspose OCR maakt gebruik van native DLL's, dus het is essentieel om `engine.dispose()` aan te roepen zodra je klaar bent. Het vergeten van deze stap kan leiden tot geheugenlekken, vooral bij langdurige batch‑taken. + +```python +# Always the last line of your script +engine.dispose() +``` + +## Volledig end‑to‑end voorbeeld + +Alles samengevoegd, hier is een enkele + +{{< /blocks/products/pf/tutorial-page-section >}} +{{< /blocks/products/pf/main-container >}} +{{< /blocks/products/pf/main-wrap-class >}} +{{< blocks/products/products-backtop-button >}} \ No newline at end of file diff --git a/ocr/dutch/python/general/how-to-recognize-handwriting-in-python-full-tutorial/_index.md b/ocr/dutch/python/general/how-to-recognize-handwriting-in-python-full-tutorial/_index.md new file mode 100644 index 000000000..b29c66948 --- /dev/null +++ b/ocr/dutch/python/general/how-to-recognize-handwriting-in-python-full-tutorial/_index.md @@ -0,0 +1,277 @@ +--- +category: general +date: 2026-04-29 +description: Leer hoe je handschrift herkent in Python met Aspose OCR. Deze stapsgewijze + gids laat zien hoe je handgeschreven tekst efficiënt kunt extraheren. +draft: false +keywords: +- how to recognize handwriting +- extract handwritten text +- handwritten text recognition python +- handwritten ocr tutorial python +language: nl +og_description: Hoe herken je handschrift in Python? Volg deze volledige gids om handgeschreven + tekst te extraheren met Aspose OCR, inclusief code, tips en afhandeling van randgevallen. 
+og_title: Hoe handgeschreven tekst te herkennen in Python – volledige tutorial +tags: +- OCR +- Python +- HandwritingRecognition +title: Hoe Handgeschreven Tekst te Herkennen in Python – Volledige Tutorial +url: /nl/python/general/how-to-recognize-handwriting-in-python-full-tutorial/ +--- + +{{< blocks/products/pf/main-wrap-class >}} +{{< blocks/products/pf/main-container >}} +{{< blocks/products/pf/tutorial-page-section >}} + +# Hoe Handgeschreven Tekst Herkennen in Python – Volledige Tutorial + +Heb je ooit **hoe handgeschreven tekst te herkennen** nodig gehad in een Python‑project, maar wist je niet waar te beginnen? Je bent niet de enige—ontwikkelaars vragen voortdurend: “Kan ik tekst uit een gescande notitie halen?” Het goede nieuws is dat moderne OCR‑bibliotheken dit een fluitje van een cent maken. In deze gids lopen we stap voor stap door **hoe handgeschreven tekst te herkennen** met Aspose OCR, en leer je ook **handgeschreven tekst extraheren** op een betrouwbare manier. + +We behandelen alles, van het installeren van de bibliotheek tot het afstemmen van confidence‑drempels voor die rommelige cursieve scripts. Aan het einde heb je een uitvoerbaar script dat de geëxtraheerde tekst en een algemene confidence‑score afdrukt—perfect voor notitie‑apps, archiverings‑tools, of gewoon uit nieuwsgierigheid. Ervaring met OCR is niet vereist; basiskennis van Python is voldoende. + +--- + +## Wat je nodig hebt + +- **Python 3.9+** (de nieuwste stabiele versie werkt het beste) +- **Aspose.OCR for Python via .NET** – installeer met `pip install aspose-ocr` +- Een **handgeschreven afbeelding** (JPEG/PNG) die je wilt verwerken +- Optioneel: een virtuele omgeving om afhankelijkheden netjes te houden + +Als je deze items klaar hebt, laten we dan beginnen. 
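Before moving on, the checklist above can be verified with a small script. This is a hedged sketch: `check_prerequisites` is a hypothetical helper, and the module name `aspose.ocr` is taken from the import statements used later in this tutorial. The script only checks that the module can be located; it does not check a license or a specific package version.

```python
import importlib.util
import sys


def check_prerequisites(min_version=(3, 9), module="aspose.ocr"):
    """Return a list of human-readable problems; an empty list means ready."""
    problems = []
    if sys.version_info < min_version:
        found_version = sys.version.split()[0]
        problems.append(
            f"Python {min_version[0]}.{min_version[1]}+ required, found {found_version}"
        )
    try:
        found = importlib.util.find_spec(module) is not None
    except ModuleNotFoundError:
        # Raised when the parent package (here: `aspose`) is not installed
        found = False
    if not found:
        problems.append(f"module '{module}' not importable; try: pip install aspose-ocr")
    return problems


for problem in check_prerequisites():
    print("Missing prerequisite:", problem)
```

An empty result means the interpreter is new enough and the package can be found. The image file itself still has to be supplied by you.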
+ +![Voorbeeld van handgeschreven herkenning](/images/handwritten-sample.jpg "Voorbeeld van handgeschreven herkenning") + +*(Alt‑tekst: “voorbeeld van hoe handgeschreven tekst te herkennen, toont een gescande handgeschreven notitie”)* + +--- + +## Stap 1 – Installeer en Importeer Aspose OCR‑klassen + +Allereerst hebben we de OCR‑engine zelf nodig. Aspose biedt een nette API die herkenning van afgedrukte tekst scheidt van handgeschreven modus. + +```python +# Install the package (run this in your terminal if you haven't already) +# pip install aspose-ocr + +# Import the required OCR classes +from aspose.ocr import OcrEngine, HandwritingMode +``` + +*Waarom dit belangrijk is:* Het importeren van `HandwritingMode` laat ons de engine vertellen dat we **handgeschreven tekstherkenning python** doen in plaats van afgedrukte tekst, wat de nauwkeurigheid voor cursieve streken aanzienlijk verbetert. + +--- + +## Stap 2 – Maak en Configureer de OCR‑Engine + +Nu maken we een `OcrEngine`‑instantie aan en schakelen we over naar handgeschreven modus. Je kunt ook de confidence‑drempel aanpassen; lagere waarden accepteren wankele handschrift, hogere waarden eisen schonere invoer. + +```python +# Step 2: Initialize the OCR engine for handwritten text +ocr_engine = OcrEngine() +ocr_engine.set_recognition_mode(HandwritingMode.HANDWRITTEN) + +# Optional: tighten the confidence threshold for cursive scripts +# Default is 0.5; 0.65 works well for most notebooks +ocr_engine.set_handwriting_confidence_threshold(0.65) +``` + +*Pro‑tip:* Als je notities scant op 300 DPI of hoger, krijg je meestal een betere score. Voor afbeeldingen met lage resolutie kun je overwegen ze eerst op te schalen met Pillow voordat je ze aan de engine geeft. + +--- + +## Stap 3 – Bereid het Afbeeldingspad voor + +Zorg ervoor dat het bestandspad naar de afbeelding wijst die je wilt verwerken. Relatieve paden werken prima, maar absolute paden voorkomen “bestand niet gevonden” verrassingen. 
+ +```python +# Step 3: Point to your handwritten image +input_image_path = "YOUR_DIRECTORY/handwritten_sample.jpg" +``` + +*Veelvoorkomende valkuil:* Het vergeten te escapen van backslashes op Windows (`C:\\folder\\image.jpg`). Het gebruik van raw strings (`r"C:\folder\image.jpg"`) omzeilt dat probleem. + +--- + +## Stap 4 – Voer de Herkenning uit en Leg Resultaten Vast + +De `recognize`‑methode doet het zware werk. Het retourneert een object met de eigenschappen `.text` en `.confidence`. + +```python +# Step 4: Run OCR and get the result +result = ocr_engine.recognize(input_image_path) + +# Display the extracted text and confidence +print("Hand‑written extraction:") +print(result.text) +print("Overall confidence:", result.confidence) +``` + +**Verwachte output (voorbeeld):** + +``` +Hand‑written extraction: +Meeting notes: +- Discuss quarterly goals +- Review budget allocations +- Assign action items +Overall confidence: 0.78 +``` + +Als de confidence onder 0,5 daalt, moet je mogelijk de afbeelding opschonen (schaduwen verwijderen, contrast verhogen) of de drempel in Stap 2 verlagen. + +--- + +## Stap 5 – Ruim Resources op + +Aspose OCR houdt native resources vast; het aanroepen van `dispose()` maakt ze vrij en voorkomt geheugenlekken, vooral bij het verwerken van veel afbeeldingen in een lus. + +```python +# Step 5: Release the engine resources +ocr_engine.dispose() +``` + +*Waarom dispose?* In langdurige services (bijv. een Flask‑API die uploads accepteert) kan het vergeten vrij te geven van resources snel het systeemgeheugen uitputten. + +--- + +## Volledig Script – Eén‑Klik Uitvoering + +Alles samengevoegd, hier is een zelfstandige script die je kunt kopiëren‑plakken en uitvoeren. 
+ +```python +# -*- coding: utf-8 -*- +""" +Handwritten OCR Tutorial – how to recognize handwriting in Python +Author: Your Name +Date: 2026-04-29 +""" + +# Install the library first if you haven't already: +# pip install aspose-ocr + +from aspose.ocr import OcrEngine, HandwritingMode + +def recognize_handwriting(image_path: str, confidence_threshold: float = 0.65): + """ + Recognizes handwritten text from an image using Aspose OCR. + + Args: + image_path (str): Path to the handwritten image file. + confidence_threshold (float): Minimum confidence to accept a word. + + Returns: + tuple: (extracted_text (str), overall_confidence (float)) + """ + # Initialize engine in handwritten mode + engine = OcrEngine() + engine.set_recognition_mode(HandwritingMode.HANDWRITTEN) + engine.set_handwriting_confidence_threshold(confidence_threshold) + + # Perform recognition + result = engine.recognize(image_path) + + # Capture output + extracted = result.text + confidence = result.confidence + + # Clean up + engine.dispose() + return extracted, confidence + +if __name__ == "__main__": + # Replace with your actual file location + img_path = "YOUR_DIRECTORY/handwritten_sample.jpg" + + text, conf = recognize_handwriting(img_path) + + print("Hand‑written extraction:") + print(text) + print("Overall confidence:", conf) +``` + +Sla dit op als `handwritten_ocr.py` en voer `python handwritten_ocr.py` uit. Als alles correct is ingesteld, zie je de geëxtraheerde tekst in de console verschijnen. 
+
+---
+
+## Handling edge cases and common variations
+
+### Low-contrast images
+If the background bleeds into the ink, boost the contrast first:
+
+```python
+from PIL import Image, ImageEnhance
+
+# Convert to RGB so the result can always be saved as JPEG
+img = Image.open(input_image_path).convert("RGB")
+enhancer = ImageEnhance.Contrast(img)
+high_contrast = enhancer.enhance(2.0)   # 2× contrast
+high_contrast.save("temp_high_contrast.jpg")
+result = ocr_engine.recognize("temp_high_contrast.jpg")
+```
+
+### Rotated notes
+A skewed notebook page can throw off recognition. Use OpenCV and NumPy to deskew it:
+
+```python
+import cv2
+import numpy as np
+
+def deskew(image_path):
+    # Load as grayscale and binarize so that ink pixels become the foreground
+    img = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
+    _, binary = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY_INV | cv2.THRESH_OTSU)
+    # minAreaRect expects float32 (or int32) points
+    coords = np.column_stack(np.where(binary > 0)).astype(np.float32)
+    angle = cv2.minAreaRect(coords)[-1]
+    # The angle convention differs between OpenCV versions; verify the sign on a test image
+    if angle < -45:
+        angle = -(90 + angle)
+    else:
+        angle = -angle
+    (h, w) = img.shape[:2]
+    M = cv2.getRotationMatrix2D((w // 2, h // 2), angle, 1.0)
+    corrected = cv2.warpAffine(img, M, (w, h), flags=cv2.INTER_CUBIC, borderMode=cv2.BORDER_REPLICATE)
+    cv2.imwrite("deskewed.jpg", corrected)
+
+deskew(input_image_path)
+result = ocr_engine.recognize("deskewed.jpg")
+```
+
+### Multi-page PDFs
+Aspose OCR can also process PDF pages, but you first need to convert each page to an image (e.g. with `pdf2image`). Then loop over the images with the same `recognize_handwriting` function.
+
+---
+
+## Pro tips for better **extract handwritten text** results
+
+- **DPI matters:** Aim for 300 DPI or higher when scanning.
+- **Avoid colored backgrounds:** Pure white or light gray gives the cleanest output.
+- **Batch processing:** Wrap the function in a `for` loop and log each page's confidence; discard results below a threshold to keep quality high.
+- **Language support:** Aspose OCR supports multiple languages; set `engine.set_language("en")` to optimize for English only.
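The batch-processing tip above can be sketched as follows. The recognizer is passed in as a callable that returns `(text, confidence)`, which matches the return value of `recognize_handwriting` in the full script. `batch_recognize` and `fake_recognize` are hypothetical names introduced only for this illustration; the fake recognizer and its scores exist so the sketch runs without Aspose OCR installed.

```python
from typing import Callable, Iterable, List, Tuple


def batch_recognize(
    paths: Iterable[str],
    recognize: Callable[[str], Tuple[str, float]],
    min_confidence: float = 0.5,
) -> Tuple[List[Tuple[str, str, float]], List[Tuple[str, float]]]:
    """Run the recognizer on each path and split results by overall confidence."""
    kept, rejected = [], []
    for path in paths:
        text, confidence = recognize(path)
        print(f"{path}: confidence {confidence:.2f}")
        if confidence >= min_confidence:
            kept.append((path, text, confidence))
        else:
            rejected.append((path, confidence))
    return kept, rejected


# Hypothetical stand-in for recognize_handwriting, with made-up scores
def fake_recognize(path: str) -> Tuple[str, float]:
    scores = {"page1.jpg": 0.82, "page2.jpg": 0.31}
    return f"text of {path}", scores[path]


kept, rejected = batch_recognize(["page1.jpg", "page2.jpg"], fake_recognize)
print(kept)
print(rejected)
```

In a real run you would pass `recognize_handwriting` instead of `fake_recognize`. Keeping the rejected list, rather than silently dropping those pages, makes it easy to queue them for rescanning or manual review.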
+
+---
+
+## Frequently asked questions
+
+**Does this work on Linux?**
+Yes. Aspose OCR ships with native binaries for Windows, macOS, and Linux. Just install the pip package and you are ready to go.
+
+**What if my handwriting is extremely cursive?**
+Try lowering the confidence threshold (`0.5` or even `0.4`). Keep in mind that this can introduce more noise, so consider post-processing the output (e.g. with a spell checker).
+
+**Can I use this in a web service?**
+Absolutely. The `recognize_handwriting` function is stateless, which makes it a good fit for Flask or FastAPI endpoints. The function calls `dispose()` itself before returning; if you create the engine elsewhere, remember to call `dispose()` after each request or wrap the engine in a context manager.
+
+---
+
+## Conclusion
+
+We have covered **how to recognize handwriting** in Python from start to finish, shown how to **extract handwritten text**, tune the confidence settings, and deal with common pitfalls such as low contrast or rotated pages. The full script above is ready to run, and the modular function makes it easy to integrate into larger projects, whether you are building a note-taking app, digitizing archives, or simply experimenting with **handwritten ocr tutorial python** techniques.
+
+Next step: explore **handwritten text recognition python** for multilingual notes, or combine OCR with natural-language processing to summarize meeting notes automatically. The possibilities are endless, so give it a try and let your code bring scribbles to life.
+
+Happy coding, and feel free to leave your questions in the comments!
+ +{{< /blocks/products/pf/tutorial-page-section >}} +{{< /blocks/products/pf/main-container >}} +{{< /blocks/products/pf/main-wrap-class >}} +{{< blocks/products/products-backtop-button >}} \ No newline at end of file diff --git a/ocr/dutch/python/general/how-to-run-ocr-with-aspose-hugging-face-complete-guide/_index.md b/ocr/dutch/python/general/how-to-run-ocr-with-aspose-hugging-face-complete-guide/_index.md new file mode 100644 index 000000000..803e1ef4c --- /dev/null +++ b/ocr/dutch/python/general/how-to-run-ocr-with-aspose-hugging-face-complete-guide/_index.md @@ -0,0 +1,180 @@ +--- +category: general +date: 2026-04-29 +description: Leer hoe je OCR op je scans uitvoert, automatisch een Hugging Face‑model + gebruikt en tekst uit scans herkent met Aspose OCR in enkele minuten. +draft: false +keywords: +- how to run OCR +- use hugging face model +- recognize text from scans +- download model automatically +language: nl +og_description: Hoe OCR op scans uit te voeren met Aspose OCR, automatisch een Hugging + Face‑model te downloaden en schone, gepuncteerde tekst te krijgen. +og_title: Hoe OCR uit te voeren met Aspose & Hugging Face – Complete gids +tags: +- OCR +- Aspose +- Hugging Face +- Python +title: Hoe OCR uit te voeren met Aspose & Hugging Face – Complete gids +url: /nl/python/general/how-to-run-ocr-with-aspose-hugging-face-complete-guide/ +--- + +{{< blocks/products/pf/main-wrap-class >}} +{{< blocks/products/pf/main-container >}} +{{< blocks/products/pf/tutorial-page-section >}} + +# Hoe OCR uit te voeren met Aspose & Hugging Face – Complete Gids + +Heb je je ooit afgevraagd **hoe je OCR kunt uitvoeren** op een stapel gescande documenten zonder uren te besteden aan het aanpassen van instellingen? Je bent niet de enige. In veel projecten moeten ontwikkelaars **tekst uit scans herkennen** snel, maar ze stuiten op modeldownloads en post‑processing. 
+
+Good news: this tutorial shows a ready-made solution that **uses a Hugging Face model**, downloads it automatically, and adds punctuation so the output reads as if a person wrote it. By the end you will have a script that processes every image in a folder and places a clean `.txt` file next to each scan.
+
+## What you need
+
+- Python 3.8+ (the code uses f-strings, so versions older than 3.6 will not work at all)
+- The `aspose-ocr` package (install it with `pip install aspose-ocr`)
+- Internet access for the first model download
+- A folder of image scans (`.png`, `.jpg`, or `.tif`)
+
+That is all: no extra binaries, no manual model wrangling. Let's dive in.
+
+![example of running OCR](https://example.com/ocr-demo.png "example of running OCR")
+
+## Step 1: Import the Aspose OCR classes and set up the environment
+
+We start by pulling in the required classes from the Aspose OCR library. Importing everything up front keeps the script tidy and makes missing dependencies easy to spot.
+
+```python
+# Step 1: Import Aspose OCR classes
+import os
+from aspose.ocr import OcrEngine, AsposeAI, AsposeAIModelConfig
+```
+
+*Why this matters*: `OcrEngine` does the heavy lifting, while `AsposeAI` lets us plug in a large language model for smarter post-processing. If you skip the imports, the rest of the code fails with a `NameError`, so do not leave them out.
+
+## Step 2: Configure a GPU-aware Hugging Face model
+
+Now we tell Aspose where to fetch the model and how many layers should run on the GPU. The `allow_auto_download="true"` flag takes care of **downloading the model automatically**.
+ +```python +# Step 2: Configure a GPU‑aware AI model (replace with your own model folder) +model_config = AsposeAIModelConfig( + allow_auto_download="true", + hugging_face_repo_id="Qwen/Qwen2.5-3B-Instruct-GGUF", + hugging_face_quantization="int8", + gpu_layers=40, # use GPU for faster inference + directory_model_path=r"YOUR_DIRECTORY/models" +) +``` + +> **Pro tip**: Als je geen GPU hebt, stel `gpu_layers=0`. Het model schakelt dan over naar CPU, wat langzamer is maar nog steeds werkt. + +### Waarom een Hugging Face Model Kiezen? + +Hugging Face host een enorme collectie kant‑klare LLM‑s. Door te wijzen naar `Qwen/Qwen2.5-3B-Instruct-GGUF` krijg je een compact, instruction‑tuned model dat interpunctie kan toevoegen, spatiëring kan corrigeren en zelfs kleine OCR‑fouten kan herstellen. Dit is de kern van **use hugging face model** in de praktijk. + +## Stap 3: Initialise de AI Engine en Schakel Punctuatie Post‑Processing In + +De AI‑engine is niet alleen voor fancy chat—hier koppelen we een *punctuation adder* die ruwe OCR‑output opschoont. + +```python +# Step 3: Initialise the AI engine and enable punctuation post‑processing +ai_engine = AsposeAI() +ai_engine.set_post_processor("punctuation_adder", {}) +``` + +*Wat gebeurt er?* De aanroep `set_post_processor` registreert een ingebouwde post‑processor die wordt uitgevoerd nadat de OCR‑engine klaar is. Het neemt de ruwe string en voegt komma’s, punten en hoofdletters toe waar ze horen, waardoor de uiteindelijke tekst veel leesbaarder wordt. + +## Stap 4: Maak de OCR Engine en Koppel de AI Engine + +Het koppelen van de AI‑engine aan de OCR‑engine geeft ons één object dat zowel tekens kan lezen als het resultaat kan polijsten. 
+ +```python +# Step 4: Create the OCR engine and attach the AI engine +ocr_engine = OcrEngine() +ocr_engine.set_ai_engine(ai_engine) +``` + +Als je deze stap overslaat, werkt de OCR nog steeds, maar verlies je de interpunctie‑boost—dus de output ziet eruit als een aaneenschakeling van woorden. + +## Stap 5: Verwerk Elke Afbeelding in een Map + +Hier is het hart van de tutorial. We lopen door elke afbeelding, voeren OCR uit, passen de post‑processor toe, en schrijven de opgeschoonde tekst naar een naast‑liggende `.txt`‑file. + +```python +# Step 5: Run OCR on each image in a folder, post‑process the result, and save the text +scans_folder = r"YOUR_DIRECTORY/scans" +for image_file in os.listdir(scans_folder): + # Filter only supported image types + if not image_file.lower().endswith(('.png', '.jpg', '.tif')): + continue + + image_path = os.path.join(scans_folder, image_file) + + # Recognise text from the image + ocr_result = ocr_engine.recognize(image_path) + + # Apply the punctuation post‑processor + ocr_result = ocr_engine.run_postprocessor(ocr_result) + + # Show a brief confidence summary + print(f"{image_file} – confidence {ocr_result.confidence:.2%}") + + # Save the cleaned text next to the source image + txt_path = image_path + ".txt" + with open(txt_path, "w", encoding="utf-8") as txt_file: + txt_file.write(ocr_result.text) +``` + +### Wat te Verwachten + +Het uitvoeren van het script geeft iets als: + +``` +invoice_001.png – confidence 96.73% +receipt_2024.tif – confidence 94.12% +``` + +Elke regel toont de confidence‑score (een snelle health‑check) en maakt `invoice_001.png.txt`, `receipt_2024.tif.txt`, enz. aan, met gepunctueerde, menselijk leesbare tekst. + +### Randgevallen & Variaties + +- **Non‑English scans**: Wissel `hugging_face_repo_id` naar een meertalige model (bijv. `microsoft/Multilingual-LLM-GGUF`). +- **Large batches**: Wikkel de lus in een `concurrent.futures.ThreadPoolExecutor` voor parallelle verwerking, maar let op GPU‑geheugenlimieten. 
+- **Custom post‑processing**: Vervang `"punctuation_adder"` door je eigen script als je domeinspecifieke opschoning nodig hebt (bijv. het verwijderen van factuurnummers). + +## Stap 6: Ruim Resources Op + +Wanneer de taak klaar is, voorkomt het vrijgeven van resources geheugenlekken, vooral belangrijk als je dit binnen een langdurige service draait. + +```python +# Step 6: Release resources +ai_engine.free_resources() +ocr_engine.dispose() +``` + +Het negeren van deze stap kan GPU‑geheugen laten hangen, wat volgende runs saboteert. + +## Samenvatting: Hoe OCR End‑to‑End Uitvoeren + +In slechts een handvol regels hebben we laten zien **hoe je OCR kunt uitvoeren** op een map scans, **een Hugging Face‑model gebruiken** dat zichzelf de eerste keer downloadt, en **tekst uit scans herkennen** met automatisch toegevoegde interpunctie. Het volledige script staat klaar om te kopiëren, je paden aan te passen en uit te voeren. + +## Volgende Stappen & Gerelateerde Onderwerpen + +- **Batch post‑processing**: Verken `ocr_engine.run_batch_postprocessor` voor nog snellere bulkafhandeling. +- **Alternative models**: Probeer de `openai/whisper`‑familie als je spraak‑naar‑tekst naast OCR nodig hebt. +- **Integration with databases**: Sla de geëxtraheerde tekst op in SQLite of Elasticsearch voor doorzoekbare archieven. + +Voel je vrij om te experimenteren—verwissel het model, pas `gpu_layers` aan, of voeg je eigen post‑processor toe. De flexibiliteit van Aspose OCR gecombineerd met de modelhub van Hugging Face maakt dit een veelzijdige basis voor elk document‑digitaliseringsproject. + +--- + +*Happy coding! 
If you hit a snag, drop a comment below or check the Aspose OCR docs for deeper configuration options.* + +{{< /blocks/products/pf/tutorial-page-section >}} +{{< /blocks/products/pf/main-container >}} +{{< /blocks/products/pf/main-wrap-class >}} +{{< blocks/products/products-backtop-button >}} \ No newline at end of file diff --git a/ocr/dutch/python/general/perform-ocr-on-image-with-python-complete-guide/_index.md b/ocr/dutch/python/general/perform-ocr-on-image-with-python-complete-guide/_index.md new file mode 100644 index 000000000..766edc373 --- /dev/null +++ b/ocr/dutch/python/general/perform-ocr-on-image-with-python-complete-guide/_index.md @@ -0,0 +1,208 @@ +--- +category: general +date: 2026-04-29 +description: Voer OCR uit op een afbeelding met Python, download automatisch een HuggingFace‑model + en maak GPU‑geheugen efficiënt vrij terwijl je de OCR‑tekst opschoont. +draft: false +keywords: +- perform OCR on image +- download HuggingFace model python +- release GPU memory python +- clean OCR text python +language: nl +og_description: Leer hoe je OCR op een afbeelding uitvoert in Python, automatisch + een HuggingFace‑model downloadt, de tekst opschoont en GPU‑geheugen vrijmaakt. +og_title: Voer OCR uit op afbeelding met Python – Stapsgewijze handleiding +tags: +- OCR +- Python +- Aspose +- HuggingFace +title: OCR uitvoeren op afbeelding met Python – Complete gids +url: /nl/python/general/perform-ocr-on-image-with-python-complete-guide/ +--- + +{{< blocks/products/pf/main-wrap-class >}} +{{< blocks/products/pf/main-container >}} +{{< blocks/products/pf/tutorial-page-section >}} + +# Voer OCR uit op afbeelding met Python – Complete gids + +Heb je ooit **OCR uitvoeren op afbeelding** bestanden nodig gehad, maar liep je vast bij het downloaden van het model of het opruimen van GPU‑geheugen? Je bent niet de enige—veel ontwikkelaars lopen tegen die muur aan wanneer ze voor het eerst proberen optische tekenherkenning te combineren met grote taalmodellen. 
+ +In deze tutorial lopen we een enkele, end‑to‑end oplossing door die **downloads a HuggingFace model in Python**, Aspose OCR uitvoert, de ruwe output opschoont, en uiteindelijk **releases GPU memory Python** kan terugwinnen. Aan het einde heb je een kant‑klaar script dat een gescande PNG omzet in gepolijste, doorzoekbare tekst. + +> **Wat je krijgt:** een compleet, uitvoerbaar code‑voorbeeld, uitleg over waarom elke stap belangrijk is, tips om veelvoorkomende valkuilen te vermijden, en een kijkje hoe je de pipeline kunt aanpassen voor je eigen projecten. + +--- + +## Wat je nodig hebt + +- Python 3.9 of nieuwer (het voorbeeld is getest op 3.11) +- `aspose-ocr` pakket (installeren via `pip install aspose-ocr`) +- Een internetverbinding voor de **download HuggingFace model python** stap +- Een CUDA‑compatibele GPU als je de snelheidsboost wilt (optioneel maar aanbevolen) + +Er zijn geen extra systeem‑niveau afhankelijkheden nodig; de Aspose OCR engine bevat alles wat je nodig hebt. + +--- + +![Voorbeeld van OCR op afbeelding](image.png "Voorbeeld van OCR uitvoeren op afbeelding met Aspose OCR en een LLM post‑processor") + +*Afbeeldings‑alt‑tekst: “perform OCR on image – Aspose OCR output vóór en na AI‑opschoning”* + +--- + +## OCR uitvoeren op afbeelding – Stapsgewijze overzicht + +Hieronder splitsen we de workflow op in logische delen. Elk deel heeft zijn eigen kop, zodat AI‑assistenten snel kunnen springen naar het gedeelte waarin je geïnteresseerd bent, en zoekmachines de relevante trefwoorden kunnen indexeren. + +### 1. HuggingFace‑model downloaden in Python + +Het eerste wat we moeten doen is een taalmodel ophalen dat dient als post‑processor voor de ruwe OCR‑output. Aspose OCR wordt geleverd met een helper‑klasse genaamd `AsposeAI` die automatisch een model van de HuggingFace hub kan ophalen. 
+ +```python +import aspose.ocr as aocr +from aspose.ocr import AsposeAI, AsposeAIModelConfig, OcrEngine + +# Configure the model – it will auto‑download the first time you run it +model_config = AsposeAIModelConfig( + allow_auto_download="true", # <-- enables auto‑download + hugging_face_repo_id="Qwen/Qwen2.5-3B-Instruct-GGUF", + hugging_face_quantization="int8", # smaller footprint on GPU + gpu_layers=20, # how many layers stay on GPU + directory_model_path=r"YOUR_DIRECTORY" # where the model files live +) +``` + +**Waarom dit belangrijk is:** +- **download HuggingFace model python** – je vermijdt handmatig zip‑bestanden of token‑authenticatie te behandelen. +- Het gebruik van `int8` kwantisatie verkleint het model tot ongeveer een kwart van de oorspronkelijke grootte, wat cruciaal is wanneer je later **release GPU memory python** moet uitvoeren. + +> **Pro tip:** Houd `directory_model_path` op een SSD voor snellere laadtijden. + +--- + +### 2. Initialise de AI‑helper en schakel spell‑checking in + +Nu maken we een `AsposeAI`‑instantie aan en koppelen we een spell‑corrector post‑processor. Hier begint de **clean OCR text python** magie. + +```python +# Initialise the AI helper +ai_helper = AsposeAI() +ai_helper.set_post_processor( + processor="spell_corrector", + custom_settings={"max_edits": 2} # allows up to two character edits per word +) +``` + +**Uitleg:** +De spell‑corrector onderzoekt elk token van de OCR‑engine en stelt bewerkingen voor die beperkt zijn door `max_edits`. Deze kleine aanpassing kan “rec0gn1tion” omzetten in “recognition” zonder een zware taalmodel. + +--- + +### 3. Koppel de AI‑helper aan de OCR‑engine + +Aspose introduceerde een nieuwe methode in versie 23.4 waarmee je een AI‑engine direct in de OCR‑pipeline kunt pluggen. 
+ +```python +# Initialise the OCR engine and attach the AI helper +ocr_engine = OcrEngine() +ocr_engine.set_ai_engine(ai_helper) # new in v23.4 +``` + +**Waarom we het doen:** +Door de AI‑helper vroeg aan te sluiten, kan de OCR‑engine optioneel het model gebruiken voor realtime verbeteringen (bijv. lay‑outdetectie). Het houdt ook de code overzichtelijk—geen aparte post‑processing lussen later. + +--- + +### 4. OCR uitvoeren op de gescande afbeelding + +Hier is de kernstap die daadwerkelijk **perform OCR on image** bestanden uitvoert. Vervang `YOUR_DIRECTORY/input.png` door het pad naar je eigen scan. + +```python +image_path = r"YOUR_DIRECTORY/input.png" +ocr_result = ocr_engine.recognize(image_path) + +print("Raw OCR text:") +print(ocr_result.text) +``` + +Typische ruwe output kan lijnbreuken op vreemde plaatsen bevatten, verkeerd herkende tekens, of vreemde symbolen. Daarom hebben we de volgende stap nodig. + +**Verwachte ruwe output (voorbeeld):** + +``` +Th1s 1s 4n ex4mpl3 0f r4w OCR t3xt. +It c0ntains numb3rs 123 and s0me m1stakes. +``` + +--- + +### 5. OCR‑tekst opschonen in Python met de AI post‑processor + +Nu laten we de AI de rommel opruimen. Dit is het hart van het **clean OCR text python** proces. + +```python +cleaned_result = ocr_engine.run_postprocessor(ocr_result) + +print("\nAI‑enhanced text:") +print(cleaned_result.text) +``` + +**Resultaat dat je zult zien:** + +``` +This is an example of raw OCR text. +It contains numbers 123 and some mistakes. +``` + +Let op hoe de spell‑corrector “Th1s” → “This” heeft gecorrigeerd en het vreemde “4n” heeft verwijderd. Het model normaliseert ook spaties, wat vaak een pijnpunt is wanneer je de tekst later in downstream NLP‑pipelines stopt. + +--- + +### 6. GPU‑geheugen vrijgeven in Python – Opruimstappen + +Wanneer je klaar bent, is het goede praktijk om GPU‑bronnen vrij te geven, vooral als je meerdere OCR‑taken uitvoert in een langdurige service. 
+ +```python +# Release resources – crucial for GPU memory +ai_helper.free_resources() +ocr_engine.dispose() +``` + +**Wat er onder de motorkap gebeurt:** +`free_resources()` laadt het model van de GPU af, waardoor het geheugen terugkeert naar de CUDA‑driver. `dispose()` sluit de interne buffers van de OCR‑engine af. Het overslaan van deze aanroepen kan leiden tot out‑of‑memory fouten na slechts een handvol afbeeldingen. + +> **Onthoud:** Als je van plan bent om batches in een lus te verwerken, roep dan de opruiming aan na elke batch of hergebruik dezelfde `ai_helper` zonder deze vrij te geven tot het einde. + +--- + +## Bonus: De pipeline aanpassen voor verschillende scenario's + +### Modelkwantisatie aanpassen + +Als je een krachtige GPU hebt (bijv. RTX 4090) en hogere nauwkeurigheid wilt, wijzig `hugging_face_quantization` naar `"fp16"` en verhoog `gpu_layers` naar `30`. Dit zal meer geheugen verbruiken, dus je moet **release GPU memory python** agressiever uitvoeren na elke batch. + +### Een aangepaste spell‑checker gebruiken + +Je kunt de ingebouwde `spell_corrector` vervangen door een aangepaste post‑processor die domeinspecifieke correcties uitvoert (bijv. medische terminologie). Implementeer gewoon de vereiste interface en geef de naam door aan `set_post_processor`. + +### Batchverwerking van meerdere afbeeldingen + +Plaats de OCR‑stappen in een `for`‑lus, verzamel `cleaned_result.text` in een lijst, en roep `ai_helper.free_resources()` pas aan na de lus als je voldoende GPU‑RAM hebt. Dit vermindert de overhead van het herhaaldelijk laden van het model. + +--- + +## Conclusie + +We hebben je net laten zien hoe je **perform OCR on image** bestanden in Python kunt **download a HuggingFace model**, **clean OCR text**, en veilig **release GPU memory** kunt uitvoeren wanneer je klaar bent. Het volledige script is klaar om te kopiëren‑plakken, en de uitleg geeft je het vertrouwen om het aan te passen voor grotere projecten. + +Volgende stappen? 
Probeer het Qwen 2.5‑model te vervangen door een grotere LLaMA‑variant, experimenteer met verschillende post‑processors, of integreer de opgeschoonde output in een doorzoekbare Elasticsearch‑index. De mogelijkheden zijn eindeloos, en je hebt nu een solide basis om op te bouwen. + +Veel programmeerplezier, en moge je OCR‑pipelines altijd schoon en geheugen‑vriendelijk zijn! + +{{< /blocks/products/pf/tutorial-page-section >}} +{{< /blocks/products/pf/main-container >}} +{{< /blocks/products/pf/main-wrap-class >}} +{{< blocks/products/products-backtop-button >}} \ No newline at end of file diff --git a/ocr/english/python/general/extract-text-from-pdf-ocr-pdf-with-python/_index.md b/ocr/english/python/general/extract-text-from-pdf-ocr-pdf-with-python/_index.md new file mode 100644 index 000000000..e9b2c362c --- /dev/null +++ b/ocr/english/python/general/extract-text-from-pdf-ocr-pdf-with-python/_index.md @@ -0,0 +1,218 @@ +--- +category: general +date: 2026-04-29 +description: Extract text from PDF using Aspose OCR in Python. Learn batch OCR PDF + processing, convert scanned PDF text, and handle low‑confidence pages. +draft: false +keywords: +- extract text from pdf +- ocr pdf with python +- convert scanned pdf text +- batch ocr pdf processing +language: en +og_description: Extract text from PDF with Aspose OCR in Python. This guide shows + batch OCR PDF processing, converting scanned PDF text, and handling low‑confidence + results. +og_title: Extract Text from PDF – OCR PDF with Python +tags: +- OCR +- Python +- PDF processing +title: Extract Text from PDF – OCR PDF with Python +url: /python/general/extract-text-from-pdf-ocr-pdf-with-python/ +--- + +{{< blocks/products/pf/main-wrap-class >}} +{{< blocks/products/pf/main-container >}} +{{< blocks/products/pf/tutorial-page-section >}} + +# Extract Text from PDF – OCR PDF with Python + +Ever needed to **extract text from PDF** but the file is just a scanned image? 
You're not alone—many developers hit that wall when trying to turn PDFs into searchable data. The good news? With Aspose OCR for Python you can convert scanned PDF text in a few lines, and even run **batch OCR PDF processing** when you have dozens of files to handle. + +In this tutorial we’ll walk through the whole workflow: setting up the library, running OCR on a single PDF, scaling to a batch, and dealing with low‑confidence pages so you know when a manual review is required. By the end you’ll have a ready‑to‑run script that extracts text from any scanned PDF, and you’ll understand the why behind each step. + +## What You’ll Need + +Before we dive in, make sure you have: + +- Python 3.8 or newer (the code uses f‑strings, so 3.6+ works, but 3.8+ is recommended) +- An Aspose OCR for Python license or a free trial key (you can get one from the Aspose website) +- A folder with one or more scanned PDFs you want to process +- A modest amount of disk space for the generated *.txt* reports + +That’s it—no heavy external dependencies, no OpenCV gymnastics. The Aspose OCR engine does the heavy lifting for you. + +## Setting Up the Environment + +First, install the Aspose OCR package from PyPI: + +```bash +pip install aspose-ocr +``` + +If you have a license file (`Aspose.OCR.lic`), place it in your project root and activate it like so: + +```python +# Activate Aspose OCR license (optional but removes trial watermark) +from aspose.ocr import License + +license = License() +license.set_license("Aspose.OCR.lic") +``` + +> **Pro tip:** Keep the license file out of version control; add it to `.gitignore` to avoid accidental exposure. + +## Performing OCR on a Single PDF + +Now let’s extract text from a single scanned PDF. The core steps are: + +1. Create an `OcrEngine` instance. +2. Point it at the PDF file. +3. Retrieve an `OcrResult` for each page. +4. Write the plain‑text output to disk. +5. Dispose of the engine to free native resources. 
+ +Here’s the full, runnable script: + +```python +# Step 1: Import the OCR engine and create an instance +from aspose.ocr import OcrEngine + +# Optional: activate license (see previous section) +# from aspose.ocr import License +# License().set_license("Aspose.OCR.lic") + +ocr_engine = OcrEngine() + +# Step 2: Specify the PDF file to be processed +pdf_file_path = r"YOUR_DIRECTORY/input.pdf" + +# Step 3: Extract OCR results – one OcrResult object per page +ocr_pages = ocr_engine.extract_from_pdf(pdf_file_path) + +# Step 4: Iterate through the results, show confidence, flag low‑confidence pages, and save the plain text +for ocr_page in ocr_pages: + print(f"Page {ocr_page.page_number}: confidence {ocr_page.confidence:.2%}") + + # If confidence is below 80 %, flag it for manual review + if ocr_page.confidence < 0.80: + print(" Low confidence – consider manual review") + + # Build the output filename + output_txt = f"YOUR_DIRECTORY/Report_page{ocr_page.page_number}.txt" + with open(output_txt, "w", encoding="utf-8") as f: + f.write(ocr_page.text) + +# Step 5: Release resources used by the engine +ocr_engine.dispose() +``` + +**What you’ll see:** For each page the script prints something like `Page 1: confidence 97.45%`. If a page falls under the 80 % threshold, a warning appears, letting you know that the OCR might have missed characters. + +### Why This Works + +- **`OcrEngine`** is the gateway to the native Aspose OCR library; it handles everything from image preprocessing to character recognition. +- **`extract_from_pdf`** automatically rasterizes each PDF page, so you don’t need to convert the PDF to images yourself. +- **Confidence scores** let you automate quality checks—critical when you’re processing legal or medical documents where accuracy matters. + +## Batch OCR PDF Processing with Python + +Most real‑world projects involve more than one file. 
Let’s extend the single‑file script to a **batch OCR PDF processing** pipeline that walks through a directory, processes each PDF, and stores the results in a matching sub‑folder. + +```python +import os +from pathlib import Path +from aspose.ocr import OcrEngine + +def ocr_pdf_file(pdf_path: Path, output_dir: Path, engine: OcrEngine, confidence_threshold: float = 0.80): + """Extract text from a single PDF and write per‑page .txt files.""" + ocr_pages = engine.extract_from_pdf(str(pdf_path)) + for page in ocr_pages: + print(f"{pdf_path.name} – Page {page.page_number}: {page.confidence:.2%}") + if page.confidence < confidence_threshold: + print(" ⚠️ Low confidence – manual review may be needed") + + out_file = output_dir / f"{pdf_path.stem}_page{page.page_number}.txt" + out_file.write_text(page.text, encoding="utf-8") + +def batch_ocr_pdf(input_folder: str, output_folder: str): + """Iterate over all PDFs in input_folder and run OCR.""" + engine = OcrEngine() + input_path = Path(input_folder) + output_path = Path(output_folder) + output_path.mkdir(parents=True, exist_ok=True) + + pdf_files = list(input_path.glob("*.pdf")) + if not pdf_files: + print("🚫 No PDF files found in the specified folder.") + return + + for pdf_file in pdf_files: + # Create a sub‑folder for each PDF to keep pages organized + pdf_out_dir = output_path / pdf_file.stem + pdf_out_dir.mkdir(exist_ok=True) + ocr_pdf_file(pdf_file, pdf_out_dir, engine) + + # Clean up native resources + engine.dispose() + print("✅ Batch OCR completed.") + +# Example usage: +if __name__ == "__main__": + batch_ocr_pdf("YOUR_DIRECTORY/input_pdfs", "YOUR_DIRECTORY/ocr_output") +``` + +#### How This Helps + +- **Scalability:** The function walks the folder once, creating a dedicated output sub‑folder for each PDF. This keeps things tidy when you have dozens of documents. +- **Reusability:** `ocr_pdf_file` can be called from other scripts (e.g., a web service) because it’s a pure function. 
+- **Error handling:** The script prints a friendly message if the input folder is empty, saving you from a silent failure. + +## Converting Scanned PDF Text – Handling Edge Cases + +While the code above works for most PDFs, you might run into a few quirks: + +| Situation | Why It Happens | How to Mitigate | +|-----------|----------------|-----------------| +| **Encrypted PDFs** | The PDF is password‑protected. | Pass the password to `extract_from_pdf(pdf_path, password="yourPwd")`. | +| **Multi‑language documents** | Aspose OCR defaults to English. | Set `ocr_engine.language = "spa"` for Spanish, or provide a list for mixed languages. | +| **Very large PDFs (>500 pages)** | Memory usage spikes because each page is loaded into RAM. | Process the PDF in chunks using `engine.extract_from_pdf(pdf_path, start_page=1, end_page=100)` and loop. | +| **Poor scan quality** | Low DPI or heavy noise reduces confidence. | Pre‑process the PDF with `engine.image_preprocessing = True` or increase the DPI via `engine.dpi = 300`. | + +> **Watch out:** Turning on image preprocessing can increase CPU time noticeably. If you’re running a nightly batch, schedule enough time or spin up a separate worker. + +## Verifying the Output + +After the script finishes, you’ll find a folder structure like: + +``` +ocr_output/ +├─ invoice_2023/ +│ ├─ invoice_2023_page1.txt +│ ├─ invoice_2023_page2.txt +│ └─ … +└─ contract_A/ + ├─ contract_A_page1.txt + └─ … +``` + +Open any `.txt` file; you should see clean, UTF‑8 encoded text that mirrors the original scanned content. If you notice garbled characters, double‑check the PDF’s language settings and ensure the correct font packs are installed on the machine. + +## Cleaning Up Resources + +Aspose OCR relies on native DLLs, so it’s essential to call `engine.dispose()` once you’re done. Forgetting this step can lead to memory leaks, especially in long‑running batch jobs. 
+ +```python +# Always the last line of your script +engine.dispose() +``` + +## Full End‑to‑End Example + +Putting everything together, here’s a single + +{{< /blocks/products/pf/tutorial-page-section >}} +{{< /blocks/products/pf/main-container >}} +{{< /blocks/products/pf/main-wrap-class >}} +{{< blocks/products/products-backtop-button >}} \ No newline at end of file diff --git a/ocr/english/python/general/how-to-recognize-handwriting-in-python-full-tutorial/_index.md b/ocr/english/python/general/how-to-recognize-handwriting-in-python-full-tutorial/_index.md new file mode 100644 index 000000000..197c0e5c8 --- /dev/null +++ b/ocr/english/python/general/how-to-recognize-handwriting-in-python-full-tutorial/_index.md @@ -0,0 +1,277 @@ +--- +category: general +date: 2026-04-29 +description: Learn how to recognize handwriting in Python with Aspose OCR. This step‑by‑step + guide shows how to extract handwritten text efficiently. +draft: false +keywords: +- how to recognize handwriting +- extract handwritten text +- handwritten text recognition python +- handwritten ocr tutorial python +language: en +og_description: How to recognize handwriting in Python? Follow this complete guide + to extract handwritten text using Aspose OCR, with code, tips, and edge‑case handling. +og_title: How to Recognize Handwriting in Python – Full Tutorial +tags: +- OCR +- Python +- HandwritingRecognition +title: How to Recognize Handwriting in Python – Full Tutorial +url: /python/general/how-to-recognize-handwriting-in-python-full-tutorial/ +--- + +{{< blocks/products/pf/main-wrap-class >}} +{{< blocks/products/pf/main-container >}} +{{< blocks/products/pf/tutorial-page-section >}} + +# How to Recognize Handwriting in Python – Full Tutorial + +Ever needed **how to recognize handwriting** in a Python project but weren’t sure where to start? You’re not alone—developers constantly ask, “Can I pull text out of a scanned note?” The good news is that modern OCR libraries make this a piece of cake. 
In this guide we’ll walk through **how to recognize handwriting** using Aspose OCR, and you’ll also learn to **extract handwritten text** reliably. + +We’ll cover everything from installing the library to tweaking confidence thresholds for those messy cursive scripts. By the end you’ll have a runnable script that prints the extracted text and an overall confidence score—perfect for note‑taking apps, archival tools, or just satisfying curiosity. No prior OCR experience is required; basic Python knowledge is enough. + +--- + +## What You’ll Need + +- **Python 3.9+** (the latest stable version works best) +- **Aspose.OCR for Python via .NET** – install with `pip install aspose-ocr` +- A **handwritten image** (JPEG/PNG) you want to process +- Optional: a virtual environment to keep dependencies tidy + +If you’ve got these items ready, let’s dive in. + +![How to recognize handwriting example](/images/handwritten-sample.jpg "How to recognize handwriting example") + +*(Alt text: “how to recognize handwriting example showing a scanned handwritten note”)* + +--- + +## Step 1 – Install and Import Aspose OCR Classes + +First things first, we need the OCR engine itself. Aspose provides a clean API that separates printed‑text recognition from handwritten mode. + +```python +# Install the package (run this in your terminal if you haven't already) +# pip install aspose-ocr + +# Import the required OCR classes +from aspose.ocr import OcrEngine, HandwritingMode +``` + +*Why this matters:* Importing `HandwritingMode` lets us tell the engine we’re dealing with **handwritten text recognition python** rather than printed text, which dramatically improves accuracy for cursive strokes. + +--- + +## Step 2 – Create and Configure the OCR Engine + +Now we spin up an `OcrEngine` instance and switch it to handwritten mode. You can also adjust the confidence threshold; lower values accept shaky writing, higher values demand cleaner input. 
+ +```python +# Step 2: Initialize the OCR engine for handwritten text +ocr_engine = OcrEngine() +ocr_engine.set_recognition_mode(HandwritingMode.HANDWRITTEN) + +# Optional: tighten the confidence threshold for cursive scripts +# Default is 0.5; 0.65 works well for most notebooks +ocr_engine.set_handwriting_confidence_threshold(0.65) +``` + +*Pro tip:* If your notes are scanned at 300 DPI or higher, you’ll usually get a better score. For low‑resolution images, consider up‑scaling with Pillow before feeding them to the engine. + +--- + +## Step 3 – Prepare the Image Path + +Make sure the file path points to the image you want to process. Relative paths work fine, but absolute paths avoid “file not found” surprises. + +```python +# Step 3: Point to your handwritten image +input_image_path = "YOUR_DIRECTORY/handwritten_sample.jpg" +``` + +*Common pitfall:* Forgetting to escape backslashes on Windows (`C:\\folder\\image.jpg`). Using raw strings (`r"C:\folder\image.jpg"`) sidesteps that issue. + +--- + +## Step 4 – Run the Recognition and Capture Results + +The `recognize` method does the heavy lifting. It returns an object with `.text` and `.confidence` properties. + +```python +# Step 4: Run OCR and get the result +result = ocr_engine.recognize(input_image_path) + +# Display the extracted text and confidence +print("Hand‑written extraction:") +print(result.text) +print("Overall confidence:", result.confidence) +``` + +**Expected output (example):** + +``` +Hand‑written extraction: +Meeting notes: +- Discuss quarterly goals +- Review budget allocations +- Assign action items +Overall confidence: 0.78 +``` + +If the confidence drops below 0.5, you might need to clean the image (remove shadows, increase contrast) or lower the threshold in Step 2. + +--- + +## Step 5 – Clean Up Resources + +Aspose OCR holds native resources; calling `dispose()` releases them and prevents memory leaks, especially when processing many images in a loop. 
+ +```python +# Step 5: Release the engine resources +ocr_engine.dispose() +``` + +*Why dispose?* In long‑running services (e.g., a Flask API that accepts uploads), forgetting to free resources can quickly exhaust system memory. + +--- + +## Full Script – One‑Click Run + +Putting everything together, here’s a self‑contained script you can copy‑paste and execute. + +```python +# -*- coding: utf-8 -*- +""" +Handwritten OCR Tutorial – how to recognize handwriting in Python +Author: Your Name +Date: 2026-04-29 +""" + +# Install the library first if you haven't already: +# pip install aspose-ocr + +from aspose.ocr import OcrEngine, HandwritingMode + +def recognize_handwriting(image_path: str, confidence_threshold: float = 0.65): + """ + Recognizes handwritten text from an image using Aspose OCR. + + Args: + image_path (str): Path to the handwritten image file. + confidence_threshold (float): Minimum confidence to accept a word. + + Returns: + tuple: (extracted_text (str), overall_confidence (float)) + """ + # Initialize engine in handwritten mode + engine = OcrEngine() + engine.set_recognition_mode(HandwritingMode.HANDWRITTEN) + engine.set_handwriting_confidence_threshold(confidence_threshold) + + # Perform recognition + result = engine.recognize(image_path) + + # Capture output + extracted = result.text + confidence = result.confidence + + # Clean up + engine.dispose() + return extracted, confidence + +if __name__ == "__main__": + # Replace with your actual file location + img_path = "YOUR_DIRECTORY/handwritten_sample.jpg" + + text, conf = recognize_handwriting(img_path) + + print("Hand‑written extraction:") + print(text) + print("Overall confidence:", conf) +``` + +Save this as `handwritten_ocr.py` and run `python handwritten_ocr.py`. If everything is set up correctly, you’ll see the extracted text printed to the console. 
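The hard-coded `img_path` in the script works for a first run, but a missing file is the most common surprise. As a minimal sketch (not part of the original script), the path could be taken from the command line and checked before any OCR work starts. `parse_image_path` is a hypothetical helper name, and only the standard library is assumed:

```python
import argparse
from pathlib import Path


def parse_image_path(argv=None) -> Path:
    """Read the image path from the command line and fail fast if it is missing.

    Hypothetical helper for illustration; it does not touch the OCR engine.
    """
    parser = argparse.ArgumentParser(description="Recognize handwriting in an image")
    parser.add_argument("image", type=Path, help="path to a JPEG or PNG scan")
    args = parser.parse_args(argv)

    # Stop with a readable usage error instead of a failure deep inside the engine
    if not args.image.is_file():
        parser.error(f"image not found: {args.image}")

    # Absolute paths avoid surprises when the working directory changes
    return args.image.resolve()
```

In the script's `__main__` block, `img_path = str(parse_image_path())` could then replace the hard-coded string, so the run becomes `python handwritten_ocr.py path/to/scan.jpg`.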
+ +--- + +## Handling Edge Cases and Common Variations + +### Low‑Contrast Images +If the background bleeds into the ink, boost contrast first: + +```python +from PIL import Image, ImageEnhance + +img = Image.open(input_image_path) +enhancer = ImageEnhance.Contrast(img) +high_contrast = enhancer.enhance(2.0) # 2× contrast +high_contrast.save("temp_high_contrast.jpg") +result = ocr_engine.recognize("temp_high_contrast.jpg") +``` + +### Rotated Notes +A slanted notebook page can throw off recognition. Use OpenCV and NumPy to deskew (install with `pip install opencv-python numpy`): + +```python +import numpy as np +import cv2 + +def deskew(image_path): + # Load as grayscale and isolate ink pixels (dark strokes on light paper) + img = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE) + ink = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY_INV | cv2.THRESH_OTSU)[1] + coords = np.column_stack(np.where(ink > 0)).astype(np.float32) + angle = cv2.minAreaRect(coords)[-1] + # The normalization below assumes minAreaRect reports angles in [-90, 0); + # newer OpenCV releases may use a different range, so check the sign on a test image + if angle < -45: + angle = -(90 + angle) + else: + angle = -angle + (h, w) = img.shape[:2] + M = cv2.getRotationMatrix2D((w // 2, h // 2), angle, 1.0) + corrected = cv2.warpAffine(img, M, (w, h), flags=cv2.INTER_CUBIC, borderMode=cv2.BORDER_REPLICATE) + cv2.imwrite("deskewed.jpg", corrected) + +deskew(input_image_path) +result = ocr_engine.recognize("deskewed.jpg") +``` + +### Multi‑Page PDFs +Aspose OCR can also handle PDF pages, but you need to convert each page to an image first (e.g., using `pdf2image`). Then loop through the images with the same `recognize_handwriting` function. + +--- + +## Pro Tips to **Extract Handwritten Text** More Reliably + +- **DPI matters:** Aim for 300 DPI or higher when scanning. +- **Avoid colored backgrounds:** Pure white or light gray yields the cleanest output. +- **Batch processing:** Wrap the function in a `for` loop and log each page’s confidence; discard results below a threshold to keep quality high. +- **Language support:** Aspose OCR supports multiple languages; set `engine.set_language("en")` for English‑only optimization. + +--- + +## Frequently Asked Questions + +**Does this work on Linux?** +Yes: Aspose OCR ships with native binaries for Windows, macOS, and Linux. 
Just install the pip package and you’re good to go. + +**What if my handwriting is extremely cursive?** +Try lowering the confidence threshold (`0.5` or even `0.4`). Keep in mind this may introduce more noise, so post‑process the output (e.g., spell‑check) if needed. + +**Can I use this in a web service?** +Absolutely. The `recognize_handwriting` function creates and disposes its own engine on every call, so it keeps no state between requests and fits Flask or FastAPI endpoints well. One caveat: if `recognize` raises, the `dispose()` call is skipped, so wrap the recognition in `try`/`finally` (or a context manager) when running it in a long‑lived service. + +--- + +## Conclusion + +We’ve covered **how to recognize handwriting** in Python from start to finish, showing you how to **extract handwritten text**, tweak confidence settings, and handle common pitfalls like low contrast or rotated pages. The complete script above is ready to run, and the modular function makes it easy to integrate into larger projects, whether you’re building a note‑taking app, digitizing archives, or just experimenting with **handwritten OCR in Python**. + +Next up, you might explore **handwritten text recognition in Python** for multilingual notes, or combine OCR with natural‑language processing to auto‑summarize meeting minutes. Give it a try and let your code bring those scribbles to life. + +Happy coding, and feel free to drop your questions in the comments! 
+ +{{< /blocks/products/pf/tutorial-page-section >}} +{{< /blocks/products/pf/main-container >}} +{{< /blocks/products/pf/main-wrap-class >}} +{{< blocks/products/products-backtop-button >}} \ No newline at end of file diff --git a/ocr/english/python/general/how-to-run-ocr-with-aspose-hugging-face-complete-guide/_index.md b/ocr/english/python/general/how-to-run-ocr-with-aspose-hugging-face-complete-guide/_index.md new file mode 100644 index 000000000..8ea7136e6 --- /dev/null +++ b/ocr/english/python/general/how-to-run-ocr-with-aspose-hugging-face-complete-guide/_index.md @@ -0,0 +1,180 @@ +--- +category: general +date: 2026-04-29 +description: Learn how to run OCR on your scans, use Hugging Face model automatically, + and recognize text from scans with Aspose OCR in minutes. +draft: false +keywords: +- how to run OCR +- use hugging face model +- recognize text from scans +- download model automatically +language: en +og_description: How to run OCR on scans using Aspose OCR, automatically download a + Hugging Face model, and get clean, punctuated text. +og_title: How to Run OCR with Aspose & Hugging Face – Complete Guide +tags: +- OCR +- Aspose +- Hugging Face +- Python +title: How to Run OCR with Aspose & Hugging Face – Complete Guide +url: /python/general/how-to-run-ocr-with-aspose-hugging-face-complete-guide/ +--- + +{{< blocks/products/pf/main-wrap-class >}} +{{< blocks/products/pf/main-container >}} +{{< blocks/products/pf/tutorial-page-section >}} + +# How to Run OCR with Aspose & Hugging Face – Complete Guide + +Ever wondered **how to run OCR** on a pile of scanned documents without spending hours tweaking settings? You're not alone. In many projects, developers need to **recognize text from scans** quickly, yet they stumble over model downloads and post‑processing. + +Good news: this tutorial shows you a ready‑to‑run solution that **uses a Hugging Face model**, automatically pulls it down, and adds punctuation so the output reads like a human wrote it. 
By the end, you'll have a script that processes every image in a folder and drops a clean `.txt` file beside each scan. + +## What You’ll Need + +- Python 3.8+ (the code uses f‑strings, which need at least Python 3.6) +- `aspose-ocr` package (install via `pip install aspose-ocr`) +- Internet access for the first‑time model download +- A folder of image scans (`.png`, `.jpg`, or `.tif`) + +That’s it: no extra binaries, no manual model fiddling. Let’s dive in. + +![how to run OCR example](https://example.com/ocr-demo.png "how to run OCR example") + +## Step 1: Import Aspose OCR Classes & Set Up the Environment + +We start by pulling the necessary classes from the Aspose OCR library. Importing everything up front keeps the script tidy and makes it easy to spot missing dependencies. + +```python +# Step 1: Import Aspose OCR classes +import os +from aspose.ocr import OcrEngine, AsposeAI, AsposeAIModelConfig +``` + +*Why this matters*: `OcrEngine` does the heavy lifting, while `AsposeAI` lets us plug in a large language model for smarter post‑processing. If you skip the import, the first line that uses one of these classes fails with a `NameError`, so don’t forget it. + +## Step 2: Configure a GPU‑Aware Hugging Face Model + +Now we tell Aspose where to fetch the model and how many layers should run on the GPU. The `allow_auto_download="true"` flag tells Aspose to **download the model automatically** the first time the script runs. + +```python +# Step 2: Configure a GPU‑aware AI model (replace with your own model folder) +model_config = AsposeAIModelConfig( + allow_auto_download="true", + hugging_face_repo_id="Qwen/Qwen2.5-3B-Instruct-GGUF", + hugging_face_quantization="int8", + gpu_layers=40, # use GPU for faster inference + directory_model_path=r"YOUR_DIRECTORY/models" +) +``` + +> **Pro tip**: If you don’t have a GPU, set `gpu_layers=0`. The model will fall back to CPU, which is slower but still works. + +### Why Choose a Hugging Face Model? + +Hugging Face hosts a massive collection of ready‑to‑use LLMs. 
By pointing to `Qwen/Qwen2.5-3B-Instruct-GGUF`, you get a compact, instruction‑tuned model that can add punctuation, correct spacing, and even fix minor OCR errors. This is the essence of **use hugging face model** in practice. + +## Step 3: Initialise the AI Engine and Enable Punctuation Post‑Processing + +The AI engine isn’t just for fancy chat—here we attach a *punctuation adder* that cleans up raw OCR output. + +```python +# Step 3: Initialise the AI engine and enable punctuation post‑processing +ai_engine = AsposeAI() +ai_engine.set_post_processor("punctuation_adder", {}) +``` + +*What’s happening?* The `set_post_processor` call registers a built‑in post‑processor that runs after the OCR engine finishes. It takes the raw string and inserts commas, periods, and capital letters where they belong, making the final text far more readable. + +## Step 4: Create the OCR Engine and Attach the AI Engine + +Connecting the AI engine to the OCR engine gives us a single object that can both read characters and polish the result. + +```python +# Step 4: Create the OCR engine and attach the AI engine +ocr_engine = OcrEngine() +ocr_engine.set_ai_engine(ai_engine) +``` + +If you skip this step, the OCR will still work, but you’ll lose the punctuation boost—so the output will look like a stream of words. + +## Step 5: Process Every Image in a Folder + +Here’s the heart of the tutorial. We loop over each image, run OCR, apply the post‑processor, and write the cleaned text to a side‑by‑side `.txt` file. 
+ +```python +# Step 5: Run OCR on each image in a folder, post‑process the result, and save the text +scans_folder = r"YOUR_DIRECTORY/scans" +for image_file in os.listdir(scans_folder): + # Filter only supported image types + if not image_file.lower().endswith(('.png', '.jpg', '.tif')): + continue + + image_path = os.path.join(scans_folder, image_file) + + # Recognise text from the image + ocr_result = ocr_engine.recognize(image_path) + + # Apply the punctuation post‑processor + ocr_result = ocr_engine.run_postprocessor(ocr_result) + + # Show a brief confidence summary + print(f"{image_file} – confidence {ocr_result.confidence:.2%}") + + # Save the cleaned text next to the source image + txt_path = image_path + ".txt" + with open(txt_path, "w", encoding="utf-8") as txt_file: + txt_file.write(ocr_result.text) +``` + +### What to Expect + +Running the script prints something like: + +``` +invoice_001.png – confidence 96.73% +receipt_2024.tif – confidence 94.12% +``` + +Each line tells you the confidence score (a quick health check) and creates `invoice_001.png.txt`, `receipt_2024.tif.txt`, etc., containing punctuated, human‑readable text. + +### Edge Cases & Variations + +- **Non‑English scans**: Switch the `hugging_face_repo_id` to a multilingual model (e.g., `microsoft/Multilingual-LLM-GGUF`). +- **Large batches**: Wrap the loop in a `concurrent.futures.ThreadPoolExecutor` for parallel processing, but be mindful of GPU memory limits. +- **Custom post‑processing**: Replace `"punctuation_adder"` with your own script if you need domain‑specific cleanup (e.g., removing invoice numbers). + +## Step 6: Clean Up Resources + +When the job finishes, freeing resources prevents memory leaks, especially important if you’re running this inside a long‑lived service. + +```python +# Step 6: Release resources +ai_engine.free_resources() +ocr_engine.dispose() +``` + +Neglecting this step can leave GPU memory hanging, which would sabotage subsequent runs. 
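The clean-up in Step 6 only runs if nothing fails earlier in the script. A minimal sketch of making it exception-safe follows, assuming only that the engines expose the `free_resources()` and `dispose()` methods used in this tutorial. `released` and `FakeEngine` are hypothetical names introduced here so the snippet runs without Aspose installed.

```python
from contextlib import contextmanager


@contextmanager
def released(resource, method_name):
    """Yield `resource`, then call its clean-up method even if the body raises."""
    try:
        yield resource
    finally:
        getattr(resource, method_name)()


class FakeEngine:
    """Hypothetical stand-in for an engine; records whether clean-up ran."""

    def __init__(self):
        self.closed = False

    def dispose(self):
        self.closed = True


engine = FakeEngine()
try:
    with released(engine, "dispose"):
        # Simulate a failure in the middle of the OCR loop.
        raise RuntimeError("simulated OCR failure")
except RuntimeError:
    pass

# The clean-up method ran despite the exception.
print(engine.closed)
```

With the real objects, the Step 5 loop could sit inside `with released(ocr_engine, "dispose"), released(ai_engine, "free_resources"):`. Context managers exit in reverse order, so this keeps the same release order as Step 6.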
+ +## Recap: How to Run OCR End‑to‑End + +In just a handful of lines, we’ve shown **how to run OCR** on a folder of scans, **use a Hugging Face model** that downloads itself the first time, and **recognize text from scans** with punctuation added automatically. The complete script is ready to copy‑paste, adjust your paths, and execute. + +## Next Steps & Related Topics + +- **Batch post‑processing**: Explore `ocr_engine.run_batch_postprocessor` for even faster bulk handling. +- **Alternative models**: Try the `openai/whisper` family if you need speech‑to‑text alongside OCR. +- **Integration with databases**: Store the extracted text in SQLite or Elasticsearch for searchable archives. + +Feel free to experiment—swap the model, tweak `gpu_layers`, or add your own post‑processor. The flexibility of Aspose OCR combined with Hugging Face’s model hub makes this a versatile foundation for any document‑digitization project. + +--- + +*Happy coding! If you hit a snag, drop a comment below or check the Aspose OCR docs for deeper configuration options.* + +{{< /blocks/products/pf/tutorial-page-section >}} +{{< /blocks/products/pf/main-container >}} +{{< /blocks/products/pf/main-wrap-class >}} +{{< blocks/products/products-backtop-button >}} \ No newline at end of file diff --git a/ocr/english/python/general/perform-ocr-on-image-with-python-complete-guide/_index.md b/ocr/english/python/general/perform-ocr-on-image-with-python-complete-guide/_index.md new file mode 100644 index 000000000..9353f2387 --- /dev/null +++ b/ocr/english/python/general/perform-ocr-on-image-with-python-complete-guide/_index.md @@ -0,0 +1,208 @@ +--- +category: general +date: 2026-04-29 +description: Perform OCR on image using Python, auto‑download a HuggingFace model + and release GPU memory efficiently while cleaning OCR text. 
+draft: false +keywords: +- perform OCR on image +- download HuggingFace model python +- release GPU memory python +- clean OCR text python +language: en +og_description: Learn how to perform OCR on image in Python, automatically download + a HuggingFace model, clean the text and free GPU memory. +og_title: Perform OCR on Image with Python – Step‑by‑Step Guide +tags: +- OCR +- Python +- Aspose +- HuggingFace +title: Perform OCR on Image with Python – Complete Guide +url: /python/general/perform-ocr-on-image-with-python-complete-guide/ +--- + +{{< blocks/products/pf/main-wrap-class >}} +{{< blocks/products/pf/main-container >}} +{{< blocks/products/pf/tutorial-page-section >}} + +# Perform OCR on Image with Python – Complete Guide + +Ever needed to **perform OCR on image** files but got stuck at the model‑download or GPU‑memory cleanup stage? You're not the only one—many developers hit that wall when they first try to combine optical character recognition with large language models. + +In this tutorial we’ll walk through a single, end‑to‑end solution that **downloads a HuggingFace model in Python**, runs Aspose OCR, cleans the raw output, and finally **releases GPU memory Python** can reclaim. By the end you’ll have a ready‑to‑run script that turns a scanned PNG into polished, searchable text. + +> **What you’ll get:** a complete, runnable code sample, explanations of why each step matters, tips for avoiding common pitfalls, and a glimpse at how to tweak the pipeline for your own projects. + +--- + +## What You’ll Need + +- Python 3.9 or newer (the example was tested on 3.11) +- `aspose-ocr` package (install via `pip install aspose-ocr`) +- An internet connection for the **download HuggingFace model python** step +- A CUDA‑compatible GPU if you want the speed boost (optional but recommended) + +No extra system‑level dependencies are required; the Aspose OCR engine bundles everything you need. 
+ +--- + +![perform OCR on image example](image.png "Example of performing OCR on image with Aspose OCR and an LLM post‑processor") + +*Image alt text: “perform OCR on image – Aspose OCR output before and after AI cleaning”* + +--- + +## Perform OCR on Image – Step‑by‑Step Overview + +Below we break the workflow into logical chunks. Each chunk has its own heading, so AI assistants can quickly jump to the part you’re interested in, and search engines can index the relevant keywords. + +### 1. Download HuggingFace Model in Python + +The first thing we have to do is fetch a language model that will act as a post‑processor for the raw OCR output. Aspose OCR ships with a helper class called `AsposeAI` that can automatically pull a model from the HuggingFace hub. + +```python +import aspose.ocr as aocr +from aspose.ocr import AsposeAI, AsposeAIModelConfig, OcrEngine + +# Configure the model – it will auto‑download the first time you run it +model_config = AsposeAIModelConfig( + allow_auto_download="true", # <-- enables auto‑download + hugging_face_repo_id="Qwen/Qwen2.5-3B-Instruct-GGUF", + hugging_face_quantization="int8", # smaller footprint on GPU + gpu_layers=20, # how many layers stay on GPU + directory_model_path=r"YOUR_DIRECTORY" # where the model files live +) +``` + +**Why this matters:** +- **download HuggingFace model python** – you avoid manually handling zip files or token authentication. +- Using `int8` quantization shrinks the model to roughly a quarter of its original size, which is crucial when you later need to **release GPU memory python**. + +> **Pro tip:** Keep `directory_model_path` on an SSD for faster load times. + +--- + +### 2. Initialise the AI Helper and Enable Spell‑Checking + +Now we create an `AsposeAI` instance and attach a spell‑corrector post‑processor. This is where the **clean OCR text python** magic begins. 
+ +```python +# Initialise the AI helper +ai_helper = AsposeAI() +ai_helper.set_post_processor( + processor="spell_corrector", + custom_settings={"max_edits": 2} # allows up to two character edits per word +) +``` + +**Explanation:** +The spell‑corrector examines each token from the OCR engine and suggests edits limited by `max_edits`. This tiny tweak can turn “rec0gn1tion” into “recognition” without a heavyweight language model. + +--- + +### 3. Hook the AI Helper into the OCR Engine + +Aspose introduced a new method in version 23.4 that lets you plug an AI engine directly into the OCR pipeline. + +```python +# Initialise the OCR engine and attach the AI helper +ocr_engine = OcrEngine() +ocr_engine.set_ai_engine(ai_helper) # new in v23.4 +``` + +**Why we do it:** +By wiring the AI helper early, the OCR engine can optionally use the model for on‑the‑fly improvements (e.g., layout detection). It also keeps the code tidy—no need for separate post‑processing loops later. + +--- + +### 4. Perform OCR on the Scanned Image + +Here’s the core step that actually **perform OCR on image** files. Replace `YOUR_DIRECTORY/input.png` with the path to your own scan. + +```python +image_path = r"YOUR_DIRECTORY/input.png" +ocr_result = ocr_engine.recognize(image_path) + +print("Raw OCR text:") +print(ocr_result.text) +``` + +Typical raw output might contain line breaks in odd places, mis‑recognized characters, or stray symbols. That’s why we need the next step. + +**Expected raw output (example):** + +``` +Th1s 1s 4n ex4mpl3 0f r4w OCR t3xt. +It c0ntains numb3rs 123 and s0me m1stakes. +``` + +--- + +### 5. Clean OCR Text in Python with the AI Post‑Processor + +Now we let the AI clean up the mess. This is the heart of the **clean OCR text python** process. + +```python +cleaned_result = ocr_engine.run_postprocessor(ocr_result) + +print("\nAI‑enhanced text:") +print(cleaned_result.text) +``` + +**Result you’ll see:** + +``` +This is an example of raw OCR text. 
+It contains numbers 123 and some mistakes. +``` + +Notice how the spell‑corrector turned “Th1s” into “This” and “4n” into “an”. The model also normalises spacing, which is often a pain point when you later feed the text into downstream NLP pipelines. + +--- + +### 6. Release GPU Memory in Python – Clean‑up Steps + +When you’re done, it’s good practice to free GPU resources, especially if you’re running multiple OCR jobs in a long‑running service. + +```python +# Release resources – crucial for GPU memory +ai_helper.free_resources() +ocr_engine.dispose() +``` + +**What happens under the hood:** +`free_resources()` unloads the model from GPU, returning the memory to the CUDA driver. `dispose()` shuts down the OCR engine’s internal buffers. Skipping these calls can lead to out‑of‑memory errors after just a handful of images. + +> **Remember:** If you plan to process batches in a loop, call the clean‑up after each batch or reuse the same `ai_helper` without freeing it until the very end. + +--- + +## Bonus: Tweaking the Pipeline for Different Scenarios + +### Adjusting Model Quantization + +If you have a powerful GPU (e.g., RTX 4090) and want higher accuracy, change `hugging_face_quantization` to `"fp16"` and bump `gpu_layers` to `30`. This will consume more memory, so you’ll need to **release GPU memory** more aggressively after each batch. + +### Using a Custom Spell‑Checker + +You can swap out the built‑in `spell_corrector` for a custom post‑processor that does domain‑specific corrections (e.g., medical terminology). Just implement the required interface and pass its name to `set_post_processor`. + +### Batch Processing Multiple Images + +Wrap the OCR steps in a `for` loop, collect `cleaned_result.text` into a list, and call `ai_helper.free_resources()` only after the loop if you have enough GPU RAM. This reduces the overhead of repeatedly loading the model. 
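The two batching strategies mentioned in this section (release after each batch, or once at the very end) can be sketched without touching the Aspose API. In this hedged example, `batched`, `run_batches`, `fake_recognize`, and `fake_release` are hypothetical names. In a real pipeline the callables would wrap the recognize, post-process, and `free_resources()` calls shown earlier, and whether the model reloads transparently after a release is something to verify against the library.

```python
from itertools import islice


def batched(items, size):
    """Yield successive lists of at most `size` items."""
    iterator = iter(items)
    while chunk := list(islice(iterator, size)):
        yield chunk


def run_batches(image_paths, recognize, release, batch_size=2):
    """Recognize every path and call `release` once per finished batch."""
    texts = []
    for batch in batched(image_paths, batch_size):
        try:
            texts.extend(recognize(path) for path in batch)
        finally:
            # Runs even if recognition of this batch fails.
            release()
    return texts


# Hypothetical stand-ins so the sketch runs without a GPU or Aspose installed.
release_calls = []


def fake_recognize(path):
    return f"text from {path}"


def fake_release():
    release_calls.append("released")


texts = run_batches(["a.png", "b.png", "c.png"], fake_recognize, fake_release)
print(texts)
print(len(release_calls))
```

Setting `batch_size` to the total number of images gives the "free only after the loop" variant, which trades higher peak memory for fewer releases.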
+ +--- + +## Conclusion + +We’ve just shown you how to **perform OCR on image** files in Python, automatically **download a HuggingFace model**, **clean OCR text**, and safely **release GPU memory** when you’re done. The complete script is ready to copy‑paste, and the explanations give you the confidence to adapt it to larger projects. + +Next steps? Try swapping the Qwen 2.5 model for a larger LLaMA variant, experiment with different post‑processors, or integrate the cleaned output into a searchable Elasticsearch index. The possibilities are endless, and you now have a solid foundation to build on. + +Happy coding, and may your OCR pipelines be ever‑clean and memory‑friendly! + +{{< /blocks/products/pf/tutorial-page-section >}} +{{< /blocks/products/pf/main-container >}} +{{< /blocks/products/pf/main-wrap-class >}} +{{< blocks/products/products-backtop-button >}} \ No newline at end of file diff --git a/ocr/french/python/general/extract-text-from-pdf-ocr-pdf-with-python/_index.md b/ocr/french/python/general/extract-text-from-pdf-ocr-pdf-with-python/_index.md new file mode 100644 index 000000000..3596dfb9e --- /dev/null +++ b/ocr/french/python/general/extract-text-from-pdf-ocr-pdf-with-python/_index.md @@ -0,0 +1,217 @@ +--- +category: general +date: 2026-04-29 +description: Extraire du texte d’un PDF avec Aspose OCR en Python. Apprenez le traitement + OCR par lots des PDF, convertissez le texte des PDF numérisés et gérez les pages + à faible confiance. +draft: false +keywords: +- extract text from pdf +- ocr pdf with python +- convert scanned pdf text +- batch ocr pdf processing +language: fr +og_description: Extrayez du texte d’un PDF avec Aspose OCR en Python. Ce guide montre + le traitement OCR par lots des PDF, la conversion du texte des PDF numérisés et + la gestion des résultats à faible confiance. 
+og_title: Extraire du texte d’un PDF – OCR PDF avec Python +tags: +- OCR +- Python +- PDF processing +title: Extraire le texte d'un PDF – OCR PDF avec Python +url: /fr/python/general/extract-text-from-pdf-ocr-pdf-with-python/ +--- + +{{< blocks/products/pf/main-wrap-class >}} +{{< blocks/products/pf/main-container >}} +{{< blocks/products/pf/tutorial-page-section >}} + +# Extraire du texte d'un PDF – OCR PDF avec Python + +Vous avez déjà eu besoin d'**extraire du texte d'un PDF** mais le fichier n'est qu'une image numérisée ? Vous n'êtes pas seul—de nombreux développeurs rencontrent ce problème lorsqu'ils essaient de transformer des PDF en données recherchables. La bonne nouvelle ? Avec Aspose OCR for Python, vous pouvez convertir le texte d'un PDF numérisé en quelques lignes, et même exécuter le **traitement OCR PDF par lots** lorsque vous avez des dizaines de fichiers à gérer. + +Dans ce tutoriel, nous parcourrons l’ensemble du flux de travail : configuration de la bibliothèque, exécution de l'OCR sur un PDF unique, passage à un traitement par lots, et gestion des pages à faible confiance afin de savoir quand une révision manuelle est nécessaire. À la fin, vous disposerez d'un script prêt à l'emploi qui extrait le texte de n'importe quel PDF numérisé, et vous comprendrez les raisons derrière chaque étape. + +## Ce dont vous avez besoin + +- Python 3.8 ou plus récent (le code utilise des f‑strings, donc 3.6+ fonctionne, mais 3.8+ est recommandé) +- Une licence Aspose OCR for Python ou une clé d'essai gratuite (vous pouvez en obtenir une sur le site web d'Aspose) +- Un dossier contenant un ou plusieurs PDF numérisés que vous souhaitez traiter +- Une quantité modeste d'espace disque pour les rapports *.txt* générés + +C’est tout—pas de dépendances externes lourdes, pas de gymnastique OpenCV. Le moteur Aspose OCR fait le travail lourd pour vous. 
+ +## Configuration de l'environnement + +Tout d'abord, installez le package Aspose OCR depuis PyPI : + +```bash +pip install aspose-ocr +``` + +Si vous avez un fichier de licence (`Aspose.OCR.lic`), placez-le à la racine de votre projet et activez-le comme suit : + +```python +# Activate Aspose OCR license (optional but removes trial watermark) +from aspose.ocr import License + +license = License() +license.set_license("Aspose.OCR.lic") +``` + +> **Astuce :** Gardez le fichier de licence hors du contrôle de version ; ajoutez‑le à `.gitignore` pour éviter toute exposition accidentelle. + +## Effectuer l'OCR sur un PDF unique + +Examinons maintenant comment extraire du texte d'un PDF numérisé unique. Les étapes principales sont : + +1. Créez une instance `OcrEngine`. +2. Pointez‑la vers le fichier PDF. +3. Récupérez un `OcrResult` pour chaque page. +4. Écrivez la sortie en texte brut sur le disque. +5. Libérez le moteur pour libérer les ressources natives. + +Voici le script complet et exécutable : + +```python +# Step 1: Import the OCR engine and create an instance +from aspose.ocr import OcrEngine + +# Optional: activate license (see previous section) +# from aspose.ocr import License +# License().set_license("Aspose.OCR.lic") + +ocr_engine = OcrEngine() + +# Step 2: Specify the PDF file to be processed +pdf_file_path = r"YOUR_DIRECTORY/input.pdf" + +# Step 3: Extract OCR results – one OcrResult object per page +ocr_pages = ocr_engine.extract_from_pdf(pdf_file_path) + +# Step 4: Iterate through the results, show confidence, flag low‑confidence pages, and save the plain text +for ocr_page in ocr_pages: + print(f"Page {ocr_page.page_number}: confidence {ocr_page.confidence:.2%}") + + # If confidence is below 80 %, flag it for manual review + if ocr_page.confidence < 0.80: + print(" Low confidence – consider manual review") + + # Build the output filename + output_txt = f"YOUR_DIRECTORY/Report_page{ocr_page.page_number}.txt" + with open(output_txt, "w", 
encoding="utf-8") as f: + f.write(ocr_page.text) + +# Step 5: Release resources used by the engine +ocr_engine.dispose() +``` + +**Ce que vous verrez :** Pour chaque page, le script affiche quelque chose comme `Page 1: confidence 97.45%`. Si une page est en dessous du seuil de 80 %, un avertissement apparaît, vous indiquant que l'OCR a peut‑être manqué des caractères. + +### Pourquoi cela fonctionne + +- **`OcrEngine`** est la porte d'accès à la bibliothèque native Aspose OCR ; il gère tout, du prétraitement d'image à la reconnaissance de caractères. +- **`extract_from_pdf`** rasterise automatiquement chaque page du PDF, vous n'avez donc pas besoin de convertir le PDF en images vous‑même. +- **Confidence scores** vous permettent d'automatiser les contrôles de qualité—crucial lorsque vous traitez des documents juridiques ou médicaux où la précision est importante. + +## Traitement OCR PDF par lots avec Python + +La plupart des projets réels impliquent plus d'un fichier. Étendons le script mono‑fichier à un pipeline de **traitement OCR PDF par lots** qui parcourt un répertoire, traite chaque PDF et stocke les résultats dans un sous‑dossier correspondant. 
+ +```python +import os +from pathlib import Path +from aspose.ocr import OcrEngine + +def ocr_pdf_file(pdf_path: Path, output_dir: Path, engine: OcrEngine, confidence_threshold: float = 0.80): + """Extract text from a single PDF and write per‑page .txt files.""" + ocr_pages = engine.extract_from_pdf(str(pdf_path)) + for page in ocr_pages: + print(f"{pdf_path.name} – Page {page.page_number}: {page.confidence:.2%}") + if page.confidence < confidence_threshold: + print(" ⚠️ Low confidence – manual review may be needed") + + out_file = output_dir / f"{pdf_path.stem}_page{page.page_number}.txt" + out_file.write_text(page.text, encoding="utf-8") + +def batch_ocr_pdf(input_folder: str, output_folder: str): + """Iterate over all PDFs in input_folder and run OCR.""" + engine = OcrEngine() + input_path = Path(input_folder) + output_path = Path(output_folder) + output_path.mkdir(parents=True, exist_ok=True) + + pdf_files = list(input_path.glob("*.pdf")) + if not pdf_files: + print("🚫 No PDF files found in the specified folder.") + return + + for pdf_file in pdf_files: + # Create a sub‑folder for each PDF to keep pages organized + pdf_out_dir = output_path / pdf_file.stem + pdf_out_dir.mkdir(exist_ok=True) + ocr_pdf_file(pdf_file, pdf_out_dir, engine) + + # Clean up native resources + engine.dispose() + print("✅ Batch OCR completed.") + +# Example usage: +if __name__ == "__main__": + batch_ocr_pdf("YOUR_DIRECTORY/input_pdfs", "YOUR_DIRECTORY/ocr_output") +``` + +#### Comment cela aide + +- **Scalabilité :** La fonction parcourt le dossier une fois, créant un sous‑dossier de sortie dédié pour chaque PDF. Cela maintient l'ordre lorsque vous avez des dizaines de documents. +- **Réutilisabilité :** `ocr_pdf_file` peut être appelé depuis d'autres scripts (par ex., un service web) car c'est une fonction pure. +- **Gestion des erreurs :** Le script affiche un message convivial si le dossier d'entrée est vide, vous évitant ainsi un échec silencieux. 
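The batch function above prints low-confidence warnings to the console, where they are easy to lose in a long run. A minimal sketch of collecting them into a CSV review list follows. `PageResult` is a hypothetical stand-in that mirrors the `page_number`, `confidence`, and `text` attributes the tutorial reads from each page, and `low_confidence_rows` is an illustrative helper, not part of Aspose OCR.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class PageResult:
    """Hypothetical stand-in for one page of OCR output."""
    page_number: int
    confidence: float
    text: str


def low_confidence_rows(pdf_name, pages, threshold=0.80):
    """Return (file, page, confidence) rows for pages strictly below the threshold."""
    return [
        (pdf_name, page.page_number, f"{page.confidence:.2%}")
        for page in pages
        if page.confidence < threshold
    ]


pages = [
    PageResult(1, 0.97, "clean page"),
    PageResult(2, 0.61, "blurry page"),
    PageResult(3, 0.80, "borderline page"),
]
rows = low_confidence_rows("invoice_2023.pdf", pages)

# Write the review list to an in-memory CSV; swap in open(...) for a real file.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["file", "page", "confidence"])
writer.writerows(rows)
print(buffer.getvalue())
```

A page that sits exactly on the threshold is not flagged, which matches the `<` comparison used in `ocr_pdf_file`.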
+ +## Conversion du texte d'un PDF numérisé – Gestion des cas limites + +Bien que le code ci‑dessus fonctionne pour la plupart des PDF, vous pourriez rencontrer quelques particularités : + +| Situation | Pourquoi cela se produit | Comment atténuer | +|-----------|--------------------------|------------------| +| **PDF chiffrés** | Le PDF est protégé par un mot de passe. | Passez le mot de passe à `extract_from_pdf(pdf_path, password="yourPwd")`. | +| **Documents multilingues** | Aspose OCR utilise l'anglais par défaut. | Définissez `ocr_engine.language = "spa"` pour l'espagnol, ou fournissez une liste pour des langues mixtes. | +| **PDF très volumineux (>500 pages)** | L'utilisation de la mémoire augmente car chaque page est chargée en RAM. | Traitez le PDF par morceaux en utilisant `engine.extract_from_pdf(pdf_path, start_page=1, end_page=100)` et bouclez. | +| **Qualité de numérisation médiocre** | Une faible résolution DPI ou un bruit important réduit la confiance. | Pré‑traitez le PDF avec `engine.image_preprocessing = True` ou augmentez le DPI via `engine.dpi = 300`. | + +> **Attention :** Activer le prétraitement d'image peut augmenter sensiblement le temps CPU. Si vous exécutez un lot nocturne, prévoyez suffisamment de temps ou lancez un travailleur séparé. + +## Vérification de la sortie + +Après l'exécution du script, vous trouverez une structure de dossiers similaire à : + +``` +ocr_output/ +├─ invoice_2023/ +│ ├─ invoice_2023_page1.txt +│ ├─ invoice_2023_page2.txt +│ └─ … +└─ contract_A/ + ├─ contract_A_page1.txt + └─ … +``` + +Ouvrez n'importe quel fichier `.txt` ; vous devriez voir du texte propre, encodé en UTF‑8, qui reflète le contenu numérisé original. Si vous remarquez des caractères illisibles, revérifiez les paramètres de langue du PDF et assurez‑vous que les packs de polices appropriés sont installés sur la machine. 
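For the very large PDF row in the edge-case table, the chunking arithmetic can be sketched on its own. This is a minimal sketch: `page_ranges` is a hypothetical helper, and the `start_page` and `end_page` arguments it would feed are taken from the table rather than verified against the Aspose API.

```python
def page_ranges(total_pages, chunk_size=100):
    """Yield inclusive, 1-based (start_page, end_page) pairs covering every page."""
    for start in range(1, total_pages + 1, chunk_size):
        yield start, min(start + chunk_size - 1, total_pages)


# A 250-page document split into chunks of at most 100 pages.
ranges = list(page_ranges(250))
print(ranges)
```

Each pair could then be passed to one extraction call, so only one chunk of pages is held in memory at a time.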
+ +## Nettoyage des ressources + +Aspose OCR repose sur des DLL natives, il est donc essentiel d'appeler `engine.dispose()` une fois terminé. Oublier cette étape peut entraîner des fuites de mémoire, surtout dans des traitements par lots de longue durée. + +```python +# Always the last line of your script +engine.dispose() +``` + +## Exemple complet de bout en bout + +En réunissant tout, voici un exemple complet + +{{< /blocks/products/pf/tutorial-page-section >}} +{{< /blocks/products/pf/main-container >}} +{{< /blocks/products/pf/main-wrap-class >}} +{{< blocks/products/products-backtop-button >}} \ No newline at end of file diff --git a/ocr/french/python/general/how-to-recognize-handwriting-in-python-full-tutorial/_index.md b/ocr/french/python/general/how-to-recognize-handwriting-in-python-full-tutorial/_index.md new file mode 100644 index 000000000..a904dd7a6 --- /dev/null +++ b/ocr/french/python/general/how-to-recognize-handwriting-in-python-full-tutorial/_index.md @@ -0,0 +1,278 @@ +--- +category: general +date: 2026-04-29 +description: Apprenez à reconnaître l'écriture manuscrite en Python avec Aspose OCR. + Ce guide étape par étape montre comment extraire efficacement le texte manuscrit. +draft: false +keywords: +- how to recognize handwriting +- extract handwritten text +- handwritten text recognition python +- handwritten ocr tutorial python +language: fr +og_description: Comment reconnaître l'écriture manuscrite en Python ? Suivez ce guide + complet pour extraire le texte manuscrit à l’aide d’Aspose OCR, avec du code, des + astuces et la gestion des cas limites. 
+og_title: Comment reconnaître l'écriture manuscrite en Python – Tutoriel complet +tags: +- OCR +- Python +- HandwritingRecognition +title: Comment reconnaître l'écriture manuscrite en Python – Tutoriel complet +url: /fr/python/general/how-to-recognize-handwriting-in-python-full-tutorial/ +--- + +{{< blocks/products/pf/main-wrap-class >}} +{{< blocks/products/pf/main-container >}} +{{< blocks/products/pf/tutorial-page-section >}} + +# Comment reconnaître l'écriture manuscrite en Python – Tutoriel complet + +Vous avez déjà eu besoin de **comment reconnaître l'écriture manuscrite** dans un projet Python mais vous ne saviez pas par où commencer ? Vous n'êtes pas seul — les développeurs demandent constamment « Puis‑je extraire du texte d'une note numérisée ? ». La bonne nouvelle, c’est que les bibliothèques OCR modernes rendent cela très simple. Dans ce guide, nous allons parcourir **comment reconnaître l'écriture manuscrite** en utilisant Aspose OCR, et vous apprendrez également à **extraire du texte manuscrit** de façon fiable. + +Nous couvrirons tout, de l'installation de la bibliothèque à l'ajustement des seuils de confiance pour les scripts cursifs désordonnés. À la fin, vous disposerez d’un script exécutable qui affiche le texte extrait et un score de confiance global — parfait pour les applications de prise de notes, les outils d’archivage ou simplement pour satisfaire votre curiosité. Aucune expérience préalable en OCR n’est requise ; des connaissances de base en Python suffisent. + +--- + +## Ce dont vous aurez besoin + +- **Python 3.9+** (la dernière version stable fonctionne le mieux) +- **Aspose.OCR for Python via .NET** – installez avec `pip install aspose-ocr` +- Une **image manuscrite** (JPEG/PNG) que vous souhaitez traiter +- Optionnel : un environnement virtuel pour garder les dépendances propres + +Si vous avez ces éléments prêts, plongeons‑y. 
+ +![Exemple de reconnaissance d'écriture manuscrite](/images/handwritten-sample.jpg "Exemple de reconnaissance d'écriture manuscrite") + +*(Texte alternatif : « exemple de reconnaissance d'écriture manuscrite montrant une note manuscrite numérisée »)* + +--- + +## Étape 1 – Installer et importer les classes Aspose OCR + +Tout d’abord, nous avons besoin du moteur OCR lui‑même. Aspose fournit une API claire qui sépare la reconnaissance de texte imprimé du mode manuscrit. + +```python +# Install the package (run this in your terminal if you haven't already) +# pip install aspose-ocr + +# Import the required OCR classes +from aspose.ocr import OcrEngine, HandwritingMode +``` + +*Pourquoi c’est important :* L’importation de `HandwritingMode` permet d’indiquer au moteur que nous traitons de la **reconnaissance de texte manuscrit python** plutôt que du texte imprimé, ce qui améliore considérablement la précision pour les traits cursifs. + +--- + +## Étape 2 – Créer et configurer le moteur OCR + +Nous créons maintenant une instance `OcrEngine` et la passons en mode manuscrit. Vous pouvez également ajuster le seuil de confiance ; des valeurs plus basses acceptent une écriture tremblante, des valeurs plus hautes exigent une entrée plus nette. + +```python +# Step 2: Initialize the OCR engine for handwritten text +ocr_engine = OcrEngine() +ocr_engine.set_recognition_mode(HandwritingMode.HANDWRITTEN) + +# Optional: tighten the confidence threshold for cursive scripts +# Default is 0.5; 0.65 works well for most notebooks +ocr_engine.set_handwriting_confidence_threshold(0.65) +``` + +*Astuce :* Si vos notes sont numérisées à 300 DPI ou plus, vous obtiendrez généralement un meilleur score. Pour les images à basse résolution, envisagez de les agrandir avec Pillow avant de les transmettre au moteur. + +--- + +## Étape 3 – Préparer le chemin de l’image + +Assurez‑vous que le chemin de fichier pointe vers l’image que vous voulez traiter. 
Les chemins relatifs fonctionnent bien, mais les chemins absolus évitent les surprises « fichier introuvable ». + +```python +# Step 3: Point to your handwritten image +input_image_path = "YOUR_DIRECTORY/handwritten_sample.jpg" +``` + +*Écueil fréquent :* Oublier d’échapper les antislashs sous Windows (`C:\\folder\\image.jpg`). Utiliser des chaînes brutes (`r"C:\folder\image.jpg"`) contourne ce problème. + +--- + +## Étape 4 – Exécuter la reconnaissance et capturer les résultats + +La méthode `recognize` fait le gros du travail. Elle renvoie un objet avec les propriétés `.text` et `.confidence`. + +```python +# Step 4: Run OCR and get the result +result = ocr_engine.recognize(input_image_path) + +# Display the extracted text and confidence +print("Hand‑written extraction:") +print(result.text) +print("Overall confidence:", result.confidence) +``` + +**Sortie attendue (exemple) :** + +``` +Hand‑written extraction: +Meeting notes: +- Discuss quarterly goals +- Review budget allocations +- Assign action items +Overall confidence: 0.78 +``` + +Si la confiance chute en dessous de 0,5, il peut être nécessaire de nettoyer l’image (supprimer les ombres, augmenter le contraste) ou de baisser le seuil à l’étape 2. + +--- + +## Étape 5 – Nettoyer les ressources + +Aspose OCR conserve des ressources natives ; appeler `dispose()` les libère et empêche les fuites de mémoire, surtout lors du traitement de nombreuses images dans une boucle. + +```python +# Step 5: Release the engine resources +ocr_engine.dispose() +``` + +*Pourquoi disposer ?* Dans les services de longue durée (par ex. une API Flask qui accepte des téléchargements), oublier de libérer les ressources peut rapidement épuiser la mémoire du système. + +--- + +## Script complet – Exécution en un clic + +En rassemblant le tout, voici un script autonome que vous pouvez copier‑coller et exécuter. 
+ +```python +# -*- coding: utf-8 -*- +""" +Handwritten OCR Tutorial – how to recognize handwriting in Python +Author: Your Name +Date: 2026-04-29 +""" + +# Install the library first if you haven't already: +# pip install aspose-ocr + +from aspose.ocr import OcrEngine, HandwritingMode + +def recognize_handwriting(image_path: str, confidence_threshold: float = 0.65): + """ + Recognizes handwritten text from an image using Aspose OCR. + + Args: + image_path (str): Path to the handwritten image file. + confidence_threshold (float): Minimum confidence to accept a word. + + Returns: + tuple: (extracted_text (str), overall_confidence (float)) + """ + # Initialize engine in handwritten mode + engine = OcrEngine() + engine.set_recognition_mode(HandwritingMode.HANDWRITTEN) + engine.set_handwriting_confidence_threshold(confidence_threshold) + + # Perform recognition + result = engine.recognize(image_path) + + # Capture output + extracted = result.text + confidence = result.confidence + + # Clean up + engine.dispose() + return extracted, confidence + +if __name__ == "__main__": + # Replace with your actual file location + img_path = "YOUR_DIRECTORY/handwritten_sample.jpg" + + text, conf = recognize_handwriting(img_path) + + print("Hand‑written extraction:") + print(text) + print("Overall confidence:", conf) +``` + +Enregistrez‑le sous le nom `handwritten_ocr.py` et lancez `python handwritten_ocr.py`. Si tout est correctement configuré, le texte extrait s’affichera dans la console. 
+
+---
+
+## Handling edge cases and common variations
+
+### Low-contrast images
+If the background blends into the ink, boost the contrast first:
+
+```python
+from PIL import Image, ImageEnhance
+
+img = Image.open(input_image_path)
+enhancer = ImageEnhance.Contrast(img)
+high_contrast = enhancer.enhance(2.0)  # 2× contrast
+high_contrast.save("temp_high_contrast.jpg")
+result = ocr_engine.recognize("temp_high_contrast.jpg")
+```
+
+### Skewed notes
+A tilted notebook page can throw recognition off. Use OpenCV and NumPy (`pip install opencv-python numpy`) to straighten it:
+
+```python
+import numpy as np
+import cv2
+
+def deskew(image_path):
+    img = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
+    # Invert and binarize so ink pixels are non-zero on a black background
+    _, binary = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY_INV | cv2.THRESH_OTSU)
+    # minAreaRect expects float32 (or int32) points
+    coords = np.column_stack(np.where(binary > 0)).astype(np.float32)
+    angle = cv2.minAreaRect(coords)[-1]
+    # The angle convention of minAreaRect differs between OpenCV releases;
+    # verify the direction of the correction on a sample page.
+    if angle < -45:
+        angle = -(90 + angle)
+    else:
+        angle = -angle
+    (h, w) = img.shape[:2]
+    M = cv2.getRotationMatrix2D((w // 2, h // 2), angle, 1.0)
+    corrected = cv2.warpAffine(img, M, (w, h), flags=cv2.INTER_CUBIC, borderMode=cv2.BORDER_REPLICATE)
+    cv2.imwrite("deskewed.jpg", corrected)
+
+deskew(input_image_path)
+result = ocr_engine.recognize("deskewed.jpg")
+```
+
+### Multi-page PDFs
+Aspose OCR can also handle PDF pages, but you first need to convert each page to an image (for example, with `pdf2image`). Then loop over the images with the same `recognize_handwriting` function.
+
+---
+
+## Pro tips for better **handwritten text extraction** results
+
+- **DPI matters:** Aim for 300 DPI or higher when scanning.
+- **Avoid colored backgrounds:** Pure white or light gray gives the cleanest output.
+- **Batch processing:** Wrap the function in a `for` loop and log the confidence of each page; reject results below a threshold to keep quality high.
+- **Language support:** Aspose OCR supports multiple languages; set `engine.set_language("en")` to optimize for English only.
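
The batch-processing tip above can be sketched without tying it to any particular OCR call. In the example below the recognizer is passed in as a plain callable that returns `(text, confidence)`, which is the shape `recognize_handwriting` returns in this tutorial; the name `triage_batch` and the default cutoff of 0.5 are assumptions made for illustration.

```python
from typing import Callable, Dict, Iterable, List, Tuple

# Any callable that maps an image path to (text, confidence)
Recognizer = Callable[[str], Tuple[str, float]]


def triage_batch(
    paths: Iterable[str],
    recognize: Recognizer,
    min_confidence: float = 0.5,
) -> Tuple[Dict[str, str], List[Tuple[str, float]]]:
    """Run recognize on every path and split the results by confidence."""
    accepted: Dict[str, str] = {}
    rejected: List[Tuple[str, float]] = []
    for path in paths:
        text, confidence = recognize(path)
        # Log the confidence of each page, as the tip suggests
        print(f"{path}: confidence {confidence:.2f}")
        if confidence >= min_confidence:
            accepted[path] = text
        else:
            rejected.append((path, confidence))
    return accepted, rejected
```

Keeping the rejected list, rather than silently dropping low-confidence pages, makes it easy to route those files to image cleanup or manual review later.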
+
+---
+
+## Frequently asked questions
+
+**Does this work on Linux?**
+Yes. Aspose OCR ships native binaries for Windows, macOS, and Linux. Install the pip package and you are ready to go.
+
+**What if my handwriting is extremely cursive?**
+Try lowering the confidence threshold (`0.5` or even `0.4`). Keep in mind that this can introduce more noise, so post-process the output (for example, with spell checking) if needed.
+
+**Can I use it in a web service?**
+Absolutely. The `recognize_handwriting` function is stateless, which makes it a good fit for Flask or FastAPI endpoints. If you manage an engine yourself instead, remember to call `dispose()` after each request or use a context manager.
+
+---
+
+## Conclusion
+
+We have covered **how to recognize handwriting** in Python from start to finish, showing how to **extract handwritten text**, tune confidence settings, and handle common problems such as low contrast or skewed pages. The full script above is ready to run, and the modular function makes it easy to drop into larger projects, whether you are building a note-taking app, digitizing archives, or simply experimenting with handwritten OCR in Python.
+
+Next, you could explore **handwritten text recognition in Python** for multilingual notes, or combine OCR with natural language processing to summarize meeting minutes automatically. The sky is the limit: give it a try and let your code bring scribbles to life.
+
+Happy coding, and feel free to ask your questions in the comments!
+
+{{< /blocks/products/pf/tutorial-page-section >}}
+{{< /blocks/products/pf/main-container >}}
+{{< /blocks/products/pf/main-wrap-class >}}
+{{< blocks/products/products-backtop-button >}}
\ No newline at end of file
diff --git a/ocr/french/python/general/how-to-run-ocr-with-aspose-hugging-face-complete-guide/_index.md b/ocr/french/python/general/how-to-run-ocr-with-aspose-hugging-face-complete-guide/_index.md
new file mode 100644
index 000000000..6e65e22ab
--- /dev/null
+++ b/ocr/french/python/general/how-to-run-ocr-with-aspose-hugging-face-complete-guide/_index.md
@@ -0,0 +1,182 @@
+---
+category: general
+date: 2026-04-29
+description: Learn how to run optical character recognition on your scans,
+  use a Hugging Face model automatically, and recognize text from
+  scans with Aspose OCR in minutes.
+draft: false
+keywords:
+- how to run OCR
+- use hugging face model
+- recognize text from scans
+- download model automatically
+language: fr
+og_description: How to run optical character recognition on scans
+  with Aspose OCR, download a Hugging Face model automatically, and get
+  clean, punctuated text.
+og_title: How to Run OCR with Aspose and Hugging Face – Complete Guide
+tags:
+- OCR
+- Aspose
+- Hugging Face
+- Python
+title: How to Run OCR with Aspose and Hugging Face – Complete Guide
+url: /fr/python/general/how-to-run-ocr-with-aspose-hugging-face-complete-guide/
+---
+
+{{< blocks/products/pf/main-wrap-class >}}
+{{< blocks/products/pf/main-container >}}
+{{< blocks/products/pf/tutorial-page-section >}}
+
+# How to Run OCR with Aspose & Hugging Face – Complete Guide
+
+Ever wondered **how to run OCR** on a stack of scanned documents without spending hours tweaking settings? You are not alone.
In many projects, developers need to **recognize text from scans** quickly, but they get stuck on model downloads and post-processing.
+
+The good news: this tutorial shows a ready-to-use solution that **uses a Hugging Face model**, downloads it automatically, and adds punctuation so the output reads like text written by a person. By the end, you will have a script that processes every image in a folder and creates a clean `.txt` file next to each scan.
+
+## What you need
+
+- Python 3.8+ (the code uses f-strings, which require 3.6 or later; 3.8+ is recommended)
+- The `aspose-ocr` package (install it with `pip install aspose-ocr`)
+- Internet access for the first model download
+- A folder containing image scans (`.png`, `.jpg`, or `.tif`)
+
+That's it: no extra binaries, no manual model handling. Let's dive in.
+
+![OCR run example](https://example.com/ocr-demo.png "OCR run example")
+
+## Step 1: Import the Aspose OCR classes & set up the environment
+
+We start by pulling the required classes from the Aspose OCR library. Importing everything up front keeps the script tidy and makes missing dependencies easy to spot.
+
+```python
+# Step 1: Import Aspose OCR classes
+import os
+from aspose.ocr import OcrEngine, AsposeAI, AsposeAIModelConfig
+```
+
+*Why it matters*: `OcrEngine` does the heavy lifting, while `AsposeAI` lets us plug in a large language model for smarter post-processing. If you leave out the import, the rest of the script fails with a `NameError` as soon as it references these classes.
+
+## Step 2: Configure a GPU-aware Hugging Face model
+
+We now tell Aspose where to fetch the model and how many layers should run on the GPU.
The `allow_auto_download="true"` flag takes care of **downloading the model automatically** for you.
+
+```python
+# Step 2: Configure a GPU‑aware AI model (replace with your own model folder)
+model_config = AsposeAIModelConfig(
+    allow_auto_download="true",
+    hugging_face_repo_id="Qwen/Qwen2.5-3B-Instruct-GGUF",
+    hugging_face_quantization="int8",
+    gpu_layers=40,                     # use GPU for faster inference
+    directory_model_path=r"YOUR_DIRECTORY/models"
+)
+```
+
+> **Tip**: If you do not have a GPU, set `gpu_layers=0`. The model then falls back to the CPU, which is slower but still works.
+
+### Why choose a Hugging Face model?
+
+Hugging Face hosts a large collection of ready-to-use LLMs. Pointing at `Qwen/Qwen2.5-3B-Instruct-GGUF` gives you a compact, instruction-tuned model that can add punctuation, fix spacing, and even repair minor OCR errors. That is what **using a Hugging Face model** looks like in practice.
+
+## Step 3: Initialize the AI engine and enable punctuation post-processing
+
+The AI engine is not just for fancy chat; here we attach a *punctuation adder* that cleans up the raw OCR output.
+
+```python
+# Step 3: Initialise the AI engine and enable punctuation post‑processing
+ai_engine = AsposeAI()
+ai_engine.set_post_processor("punctuation_adder", {})
+```
+
+*What is happening?* The `set_post_processor` call registers a built-in post-processor that runs after the OCR engine finishes. It takes the raw string and inserts commas, periods, and capital letters in the right places, which makes the final text much more readable.
+
+## Step 4: Create the OCR engine and attach the AI engine
+
+Connecting the AI engine to the OCR engine gives us a single object that can both read the characters and polish the result.
+
+```python
+# Step 4: Create the OCR engine and attach the AI engine
+ocr_engine = OcrEngine()
+ocr_engine.set_ai_engine(ai_engine)
+```
+
+If you skip this step, OCR still works, but you lose the punctuation pass, and the result reads like an unbroken stream of words.
+
+## Step 5: Process every image in a folder
+
+This is the heart of the tutorial. We loop over each image, run OCR, apply the post-processor, and write the cleaned text to a matching `.txt` file.
+
+```python
+# Step 5: Run OCR on each image in a folder, post‑process the result, and save the text
+scans_folder = r"YOUR_DIRECTORY/scans"
+for image_file in os.listdir(scans_folder):
+    # Filter only supported image types
+    if not image_file.lower().endswith(('.png', '.jpg', '.tif')):
+        continue
+
+    image_path = os.path.join(scans_folder, image_file)
+
+    # Recognise text from the image
+    ocr_result = ocr_engine.recognize(image_path)
+
+    # Apply the punctuation post‑processor
+    ocr_result = ocr_engine.run_postprocessor(ocr_result)
+
+    # Show a brief confidence summary
+    print(f"{image_file} – confidence {ocr_result.confidence:.2%}")
+
+    # Save the cleaned text next to the source image
+    txt_path = image_path + ".txt"
+    with open(txt_path, "w", encoding="utf-8") as txt_file:
+        txt_file.write(ocr_result.text)
+```
+
+### What to expect
+
+Running the script prints something like:
+
+```
+invoice_001.png – confidence 96.73%
+receipt_2024.tif – confidence 94.12%
+```
+
+Each line reports the confidence score (a quick sanity check), and the script creates `invoice_001.png.txt`, `receipt_2024.tif.txt`, and so on, containing punctuated, human-readable text.
+
+### Edge cases & variations
+
+- **Non-English scans**: switch `hugging_face_repo_id` to a multilingual model (the tutorial's example ID is `microsoft/Multilingual-LLM-GGUF`; confirm that the repository exists on the Hub before relying on it).
+- **Large batches**: wrap the loop in a `concurrent.futures.ThreadPoolExecutor` for parallel processing, but watch the GPU memory limits.
+- **Custom post-processing**: replace `"punctuation_adder"` with your own script if you need domain-specific cleanup (for example, stripping invoice numbers).
+
+## Step 6: Clean up resources
+
+When the job finishes, releasing resources avoids memory leaks, which is crucial if you run this inside a long-running service.
+
+```python
+# Step 6: Release resources
+ai_engine.free_resources()
+ocr_engine.dispose()
+```
+
+Skipping this step can leave GPU memory occupied, which would interfere with subsequent runs.
+
+## Recap: How to run OCR end to end
+
+In just a few lines, we showed **how to run OCR** on a folder of scans, **use a Hugging Face model** that downloads itself on first use, and **recognize text from scans** with punctuation added automatically. The snippets above can be copied into one script; adjust the paths and run it.
+
+## Next steps & related topics
+
+- **Batch post-processing**: explore `ocr_engine.run_batch_postprocessor` for even faster bulk processing.
+- **Alternative models**: try the `openai/whisper` family if you need speech recognition in addition to OCR.
+- **Database integration**: store the extracted text in SQLite or Elasticsearch for searchable archives.
+
+Feel free to experiment: switch models, adjust `gpu_layers`, or add your own post-processor. The flexibility of Aspose OCR combined with the Hugging Face model hub makes a versatile foundation for any document digitization project.
+
+---
+
+*Happy coding!
If you run into a problem, leave a comment below or check the Aspose OCR documentation for more advanced configuration options.*
+
+{{< /blocks/products/pf/tutorial-page-section >}}
+{{< /blocks/products/pf/main-container >}}
+{{< /blocks/products/pf/main-wrap-class >}}
+{{< blocks/products/products-backtop-button >}}
\ No newline at end of file
diff --git a/ocr/french/python/general/perform-ocr-on-image-with-python-complete-guide/_index.md b/ocr/french/python/general/perform-ocr-on-image-with-python-complete-guide/_index.md
new file mode 100644
index 000000000..bd8cdc543
--- /dev/null
+++ b/ocr/french/python/general/perform-ocr-on-image-with-python-complete-guide/_index.md
@@ -0,0 +1,191 @@
+---
+category: general
+date: 2026-04-29
+description: Perform OCR on an image with Python, download a HuggingFace model
+  automatically, and release GPU memory efficiently while cleaning up the
+  OCR text.
+draft: false
+keywords:
+- perform OCR on image
+- download HuggingFace model python
+- release GPU memory python
+- clean OCR text python
+language: fr
+og_description: Learn how to perform OCR on an image in Python, download
+  a HuggingFace model automatically, clean the text, and release GPU
+  memory.
+og_title: Perform Optical Character Recognition (OCR) on an Image with
+  Python – Step-by-Step Guide
+tags:
+- OCR
+- Python
+- Aspose
+- HuggingFace
+title: Perform OCR on an Image with Python – Complete Guide
+url: /fr/python/general/perform-ocr-on-image-with-python-complete-guide/
+---
+
+{{< blocks/products/pf/main-wrap-class >}}
+{{< blocks/products/pf/main-container >}}
+{{< blocks/products/pf/tutorial-page-section >}}
+
+# Perform Optical Character Recognition (OCR) on an Image with Python – Complete Guide
+
+Ever needed to **perform OCR on an image** but got stuck at the model download or the GPU memory cleanup? You are not the only one; many developers hit this problem the first time they combine optical character recognition with large language models.
+
+In this tutorial, we walk through a single end-to-end solution that **downloads a HuggingFace model in Python**, runs Aspose OCR, cleans the raw output, and finally **releases the GPU memory** it used. By the end, you will have a ready-to-use script that turns a scanned PNG into polished, searchable text.
+
+> **What you get:** complete, runnable example code, explanations of why each step matters, tips for avoiding common pitfalls, and a look at how to adapt the pipeline to your own projects.
+
+## What you need
+
+- Python 3.9 or newer (the example was tested on 3.11)
+- The `aspose-ocr` package (install with `pip install aspose-ocr`)
+- An Internet connection for the model download step
+- A CUDA-capable GPU if you want the speed-up (optional but recommended)
+
+No additional system dependencies are required; the Aspose OCR engine bundles everything you need.
+
+![perform OCR on image example](image.png "Example of performing OCR on image with Aspose OCR and an LLM post‑processor")
+
+*Image alt text: "perform OCR on image – Aspose OCR output before and after AI cleanup"*
+
+## Perform OCR on an image – Step-by-step overview
+
+Below, we break the workflow into logical sections. Each section has its own heading, so readers can jump straight to the part they need and search engines can index the relevant keywords.
+
+### 1. Download the HuggingFace model in Python
+
+The first thing to do is fetch a language model that will act as the post-processor for the raw OCR output. Aspose OCR ships with a helper class called `AsposeAI` that can download a model from the HuggingFace hub automatically.
+
+```python
+import aspose.ocr as aocr
+from aspose.ocr import AsposeAI, AsposeAIModelConfig, OcrEngine
+
+# Configure the model – it will auto‑download the first time you run it
+model_config = AsposeAIModelConfig(
+    allow_auto_download="true",                     # <-- enables auto‑download
+    hugging_face_repo_id="Qwen/Qwen2.5-3B-Instruct-GGUF",
+    hugging_face_quantization="int8",               # smaller footprint on GPU
+    gpu_layers=20,                                 # how many layers stay on GPU
+    directory_model_path=r"YOUR_DIRECTORY"          # where the model files live
+)
+```
+
+**Why it matters:**
+- **Automatic download** – you avoid handling zip files or token authentication by hand.
+- `int8` quantization shrinks the model to roughly a quarter of its full-precision (32-bit) size, which matters when you later need to **release GPU memory**.
+
+> **Tip:** Keep `directory_model_path` on an SSD for faster load times.
+
+### 2.
Initialize the AI helper and enable spell correction
+
+We now create an `AsposeAI` instance and attach a spell-corrector post-processor to it. This is where the **OCR text cleanup** begins.
+
+```python
+# Initialise the AI helper
+ai_helper = AsposeAI()
+ai_helper.set_post_processor(
+    processor="spell_corrector",
+    custom_settings={"max_edits": 2}   # allows up to two character edits per word
+)
+```
+
+**Explanation:**
+The spell corrector examines each token from the OCR engine and proposes changes bounded by `max_edits`. This small step can turn "rec0gn1tion" into "recognition" without resorting to a heavyweight language model.
+
+### 3. Connect the AI helper to the OCR engine
+
+Aspose introduced a new method in version 23.4 that lets you plug an AI engine directly into the OCR pipeline.
+
+```python
+# Initialise the OCR engine and attach the AI helper
+ocr_engine = OcrEngine()
+ocr_engine.set_ai_engine(ai_helper)   # new in v23.4
+```
+
+**Why do it:**
+With the AI helper attached from the start, the OCR engine can optionally use the model for on-the-fly improvements (for example, layout detection). It also keeps the code clean, with no separate post-processing loops later on.
+
+### 4. Perform OCR on the scanned image
+
+This is the main step, the one that **actually performs OCR on image files**. Replace `YOUR_DIRECTORY/input.png` with the path to your own scan.
+
+```python
+image_path = r"YOUR_DIRECTORY/input.png"
+ocr_result = ocr_engine.recognize(image_path)
+
+print("Raw OCR text:")
+print(ocr_result.text)
+```
+
+Typical raw output can contain line breaks in odd places, misrecognized characters, or stray symbols. That is why we need the next step.
+
+**Expected raw output (example):**
+
+```
+Th1s 1s 4n ex4mpl3 0f r4w OCR t3xt.
+It c0ntains numb3rs 123 and s0me m1stakes.
+```
+
+### 5. Clean the OCR text in Python with the AI post-processor
+
+We now let the AI clean up the mess. This is the core of the **OCR text cleanup** process.
+
+```python
+cleaned_result = ocr_engine.run_postprocessor(ocr_result)
+
+print("\nAI‑enhanced text:")
+print(cleaned_result.text)
+```
+
+**The result you will see:**
+
+```
+This is an example of raw OCR text.
+It contains numbers 123 and some mistakes.
+```
+
+Notice how the spell corrector turned "Th1s" into "This" and "4n" into "an". The model also normalizes spacing, which is often a pain point when you later feed the text into downstream NLP pipelines.
+
+### 6. Release GPU memory in Python – Cleanup steps
+
+When you are done, it is good practice to release GPU resources, especially if you run several OCR jobs inside a long-running service.
+
+```python
+# Release resources – crucial for GPU memory
+ai_helper.free_resources()
+ocr_engine.dispose()
+```
+
+**What happens behind the scenes:**
+`free_resources()` unloads the model from the GPU and returns the memory to the CUDA driver. `dispose()` closes the OCR engine's internal buffers. Skipping these calls can lead to out-of-memory errors after only a few images.
+
+> **Reminder:** If you plan to process batches in a loop, call the cleanup after each batch, or reuse the same `ai_helper` and release it only at the very end.
+
+## Bonus: Tuning the pipeline for different scenarios
+
+### Adjusting model quantization
+
+If you have a powerful GPU (for example, an RTX 4090) and want higher accuracy, change `hugging_face_quantization` to `"fp16"` and raise `gpu_layers` to `30`. This uses more memory, so you will need to **release GPU memory** more aggressively after each batch.
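
One way to make sure cleanup runs after every batch, even when an image fails partway through, is to wrap the batch in a small context manager. The sketch below uses only the standard library; `released_afterwards` is a hypothetical name, and it accepts any zero-argument callables, so it assumes nothing about the Aspose API beyond the two cleanup calls the tutorial already shows.

```python
from contextlib import ExitStack, contextmanager


@contextmanager
def released_afterwards(*cleanups):
    """Yield, then run every cleanup callable in the order given, even on error."""
    with ExitStack() as stack:
        # ExitStack runs callbacks last-in, first-out, so register them reversed
        for cleanup in reversed(cleanups):
            stack.callback(cleanup)
        yield


# Intended use with the objects from this tutorial (not executed here):
#
# with released_afterwards(ai_helper.free_resources, ocr_engine.dispose):
#     for image_path in batch:
#         result = ocr_engine.recognize(image_path)
```

Because the callbacks are registered on an `ExitStack`, an exception raised inside the `with` block still triggers both cleanups before it propagates.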
+
+### Using a custom spell corrector
+
+You can replace the built-in `spell_corrector` with a custom post-processor that applies domain-specific corrections (for example, medical terminology). Implement the required interface and pass its name to `set_post_processor`.
+
+### Batch processing multiple images
+
+Wrap the OCR steps in a `for` loop, collect `cleaned_result.text` in a list, and call `ai_helper.free_resources()` only after the loop if you have enough GPU RAM. This reduces the overhead of loading the model repeatedly.
+
+## Conclusion
+
+We have shown how to **perform OCR on image files** in Python, download a **HuggingFace model** automatically, **clean the OCR text**, and safely release **GPU memory** when you are done. The snippets are ready to copy and paste, and the explanations should give you the confidence to adapt them to larger projects.
+
+Next steps? Try swapping the Qwen 2.5 model for a larger LLaMA variant, experiment with different post-processors, or feed the cleaned output into a searchable Elasticsearch index. You now have a solid foundation to build on.
+
+Happy coding, and may your OCR pipelines always stay clean and memory-efficient!
+
+{{< /blocks/products/pf/tutorial-page-section >}}
+{{< /blocks/products/pf/main-container >}}
+{{< /blocks/products/pf/main-wrap-class >}}
+{{< blocks/products/products-backtop-button >}}
\ No newline at end of file
diff --git a/ocr/german/python/general/extract-text-from-pdf-ocr-pdf-with-python/_index.md b/ocr/german/python/general/extract-text-from-pdf-ocr-pdf-with-python/_index.md
new file mode 100644
index 000000000..755eb9318
--- /dev/null
+++ b/ocr/german/python/general/extract-text-from-pdf-ocr-pdf-with-python/_index.md
@@ -0,0 +1,219 @@
+---
+category: general
+date: 2026-04-29
+description: Extract text from PDFs with Aspose OCR in Python. Learn
+  how to run batch OCR PDF processing, convert scanned PDF text,
+  and handle low-confidence pages.
+draft: false
+keywords:
+- extract text from pdf
+- ocr pdf with python
+- convert scanned pdf text
+- batch ocr pdf processing
+language: de
+og_description: Extract text from PDF with Aspose OCR in Python. This guide
+  covers batch OCR PDF processing, converting scanned PDF text,
+  and dealing with low-confidence results.
+og_title: Extract Text from PDF – OCR PDF with Python
+tags:
+- OCR
+- Python
+- PDF processing
+title: Extract Text from PDF – OCR PDF with Python
+url: /de/python/general/extract-text-from-pdf-ocr-pdf-with-python/
+---
+
+{{< blocks/products/pf/main-wrap-class >}}
+{{< blocks/products/pf/main-container >}}
+{{< blocks/products/pf/tutorial-page-section >}}
+
+# Extract Text from PDF – OCR PDF with Python
+
+Have you ever needed to **extract text from a PDF**, only to find that the file is just a scanned image? You are not alone; many developers hit this wall when they try to turn PDFs into searchable data. The good news?
With Aspose OCR for Python you can convert scanned PDF text in a few lines, and even run **batch OCR PDF processing** when you have dozens of files to handle.
+
+In this tutorial we walk through the whole workflow: setting up the library, running OCR on a single PDF, scaling to a batch, and handling low-confidence pages so you know when a manual review is needed. By the end you will have a ready-to-run script that extracts text from any scanned PDF, and you will understand the "why" behind each step.
+
+## What you need
+
+Before we start, make sure you have the following:
+
+- Python 3.8 or newer (the code uses f-strings, so 3.6+ works, but 3.8+ is recommended)
+- An Aspose OCR for Python license or a free trial key (available from the Aspose website)
+- A folder with one or more scanned PDFs you want to process
+- A modest amount of disk space for the generated *.txt* reports
+
+That's it: no heavy external dependencies, no OpenCV gymnastics. The Aspose OCR engine does the heavy lifting for you.
+
+## Setting up the environment
+
+First, install the Aspose OCR package from PyPI:
+
+```bash
+pip install aspose-ocr
+```
+
+If you have a license file (`Aspose.OCR.lic`), place it in the project root and activate it as follows:
+
+```python
+# Activate Aspose OCR license (optional but removes trial watermark)
+from aspose.ocr import License
+
+license = License()
+license.set_license("Aspose.OCR.lic")
+```
+
+> **Pro tip:** Keep the license file out of version control; add it to `.gitignore` to avoid accidental exposure.
+
+## Running OCR on a single PDF
+
+Now let's extract text from a single scanned PDF. The core steps are:
+
+1.
Create an `OcrEngine` instance.
+2. Point it at the PDF file.
+3. Retrieve an `OcrResult` for each page.
+4. Write the plain-text output to disk.
+5. Dispose of the engine to release native resources.
+
+Here is the complete, runnable script:
+
+```python
+# Step 1: Import the OCR engine and create an instance
+from aspose.ocr import OcrEngine
+
+# Optional: activate license (see previous section)
+# from aspose.ocr import License
+# License().set_license("Aspose.OCR.lic")
+
+ocr_engine = OcrEngine()
+
+# Step 2: Specify the PDF file to be processed
+pdf_file_path = r"YOUR_DIRECTORY/input.pdf"
+
+# Step 3: Extract OCR results – one OcrResult object per page
+ocr_pages = ocr_engine.extract_from_pdf(pdf_file_path)
+
+# Step 4: Iterate through the results, show confidence, flag low‑confidence pages, and save the plain text
+for ocr_page in ocr_pages:
+    print(f"Page {ocr_page.page_number}: confidence {ocr_page.confidence:.2%}")
+
+    # If confidence is below 80 %, flag it for manual review
+    if ocr_page.confidence < 0.80:
+        print("  Low confidence – consider manual review")
+
+    # Build the output filename
+    output_txt = f"YOUR_DIRECTORY/Report_page{ocr_page.page_number}.txt"
+    with open(output_txt, "w", encoding="utf-8") as f:
+        f.write(ocr_page.text)
+
+# Step 5: Release resources used by the engine
+ocr_engine.dispose()
+```
+
+**What you will see:** For each page, the script prints something like `Page 1: confidence 97.45%`. If a page falls below the 80 % threshold, a warning tells you that OCR may have missed characters.
+
+### Why this works
+
+- **`OcrEngine`** is the gateway to the native Aspose OCR library; it handles everything from image preprocessing to character recognition.
+- **`extract_from_pdf`** rasterizes each PDF page automatically, so you do not have to convert the PDF to images yourself.
+- **Confidence scores** enable automated quality checks, which is essential when you process legal or medical documents where accuracy matters.
+
+## Batch OCR PDF processing with Python
+
+Most real-world projects involve more than one file. We extend the single-file script into a **batch OCR PDF processing** pipeline that walks a directory, processes each PDF, and stores the results in a matching subfolder.
+
+```python
+from pathlib import Path
+from aspose.ocr import OcrEngine
+
+def ocr_pdf_file(pdf_path: Path, output_dir: Path, engine: OcrEngine, confidence_threshold: float = 0.80):
+    """Extract text from a single PDF and write per‑page .txt files."""
+    ocr_pages = engine.extract_from_pdf(str(pdf_path))
+    for page in ocr_pages:
+        print(f"{pdf_path.name} – Page {page.page_number}: {page.confidence:.2%}")
+        if page.confidence < confidence_threshold:
+            print("  ⚠️ Low confidence – manual review may be needed")
+
+        out_file = output_dir / f"{pdf_path.stem}_page{page.page_number}.txt"
+        out_file.write_text(page.text, encoding="utf-8")
+
+def batch_ocr_pdf(input_folder: str, output_folder: str):
+    """Iterate over all PDFs in input_folder and run OCR."""
+    input_path = Path(input_folder)
+    output_path = Path(output_folder)
+    output_path.mkdir(parents=True, exist_ok=True)
+
+    pdf_files = list(input_path.glob("*.pdf"))
+    if not pdf_files:
+        print("🚫 No PDF files found in the specified folder.")
+        return
+
+    # Create the engine only once there is work to do
+    engine = OcrEngine()
+    try:
+        for pdf_file in pdf_files:
+            # Create a sub‑folder for each PDF to keep pages organized
+            pdf_out_dir = output_path / pdf_file.stem
+            pdf_out_dir.mkdir(exist_ok=True)
+            ocr_pdf_file(pdf_file, pdf_out_dir, engine)
+    finally:
+        # Clean up native resources, even if a file fails
+        engine.dispose()
+    print("✅ Batch OCR completed.")
+
+# Example usage:
+if __name__ == "__main__":
+    batch_ocr_pdf("YOUR_DIRECTORY/input_pdfs", "YOUR_DIRECTORY/ocr_output")
+```
+
+#### How this helps
+
+-
**Scalability:** The function walks the folder once and creates a dedicated output subfolder for each PDF. That keeps things tidy when you have dozens of documents.
+- **Reusability:** `ocr_pdf_file` can be called from other scripts (for example, a web service) because it receives the engine and both paths as arguments.
+- **Error handling:** The script prints a friendly message when the input folder contains no PDFs, so it never fails silently.
+
+## Converting scanned PDF text – Handling special cases
+
+While the code above works for most PDFs, you may run into a few quirks:
+
+| Situation | Why it happens | How to mitigate it |
+|-----------|----------------|--------------------|
+| **Encrypted PDFs** | The PDF is password-protected. | Pass the password to `extract_from_pdf(pdf_path, password="yourPwd")`. |
+| **Multilingual documents** | Aspose OCR defaults to English. | Set `ocr_engine.language = "spa"` for Spanish, or provide a list for mixed languages. |
+| **Very large PDFs (>500 pages)** | Memory usage grows because every page is loaded into RAM. | Process the PDF in parts with `engine.extract_from_pdf(pdf_path, start_page=1, end_page=100)` and a loop. |
+| **Poor scan quality** | Low DPI or heavy noise reduces confidence. | Preprocess the PDF with `engine.image_preprocessing = True`, or raise the DPI via `engine.dpi = 300`. |
+
+> **Caution:** Turning on image preprocessing can noticeably increase CPU time. If you run a nightly batch job, budget enough time or start a separate worker.
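
For the "very large PDFs" row, the loop mostly comes down to generating page windows. The helper below is a minimal standard-library sketch; `page_ranges` is a hypothetical name, and the `start_page` / `end_page` keyword arguments mentioned in the comment are taken from the table above and have not been verified against the Aspose API.

```python
from typing import Iterator, Tuple


def page_ranges(total_pages: int, chunk_size: int = 100) -> Iterator[Tuple[int, int]]:
    """Yield 1-based, inclusive (start_page, end_page) pairs covering total_pages."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    for start in range(1, total_pages + 1, chunk_size):
        # The last window is clipped to the final page
        yield start, min(start + chunk_size - 1, total_pages)


# Each pair could then be passed to the chunked call from the table, e.g.
# engine.extract_from_pdf(pdf_path, start_page=start, end_page=end)
print(list(page_ranges(250, 100)))  # [(1, 100), (101, 200), (201, 250)]
```

Generating the windows separately from the OCR call keeps the chunk size easy to tune and makes the boundary logic testable without a PDF at hand.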
+ +## Ausgabe überprüfen + +Nach Abschluss des Skripts finden Sie eine Ordnerstruktur wie: + +``` +ocr_output/ +├─ invoice_2023/ +│ ├─ invoice_2023_page1.txt +│ ├─ invoice_2023_page2.txt +│ └─ … +└─ contract_A/ + ├─ contract_A_page1.txt + └─ … +``` + +Öffnen Sie eine beliebige `.txt`‑Datei; Sie sollten sauberen, UTF‑8‑kodierten Text sehen, der den ursprünglichen gescannten Inhalt widerspiegelt. Wenn Sie verzerrte Zeichen bemerken, überprüfen Sie die Spracheinstellungen des PDFs und stellen Sie sicher, dass die richtigen Schriftpakete auf dem Rechner installiert sind. + +## Ressourcen bereinigen + +Aspose OCR nutzt native DLLs, daher ist es wichtig, `engine.dispose()` aufzurufen, sobald Sie fertig sind. Das Vergessen dieses Schrittes kann zu Speicherlecks führen, besonders bei lang laufenden Batch‑Jobs. + +```python +# Always the last line of your script +engine.dispose() +``` + +## Vollständiges End‑zu‑End‑Beispiel + +Alles zusammengeführt, hier ein einzelnes + +{{< /blocks/products/pf/tutorial-page-section >}} +{{< /blocks/products/pf/main-container >}} +{{< /blocks/products/pf/main-wrap-class >}} +{{< blocks/products/products-backtop-button >}} \ No newline at end of file diff --git a/ocr/german/python/general/how-to-recognize-handwriting-in-python-full-tutorial/_index.md b/ocr/german/python/general/how-to-recognize-handwriting-in-python-full-tutorial/_index.md new file mode 100644 index 000000000..4b2acbfb3 --- /dev/null +++ b/ocr/german/python/general/how-to-recognize-handwriting-in-python-full-tutorial/_index.md @@ -0,0 +1,279 @@ +--- +category: general +date: 2026-04-29 +description: Erfahren Sie, wie Sie Handschrift in Python mit Aspose OCR erkennen. + Diese Schritt‑für‑Schritt‑Anleitung zeigt, wie Sie handgeschriebenen Text effizient + extrahieren. 
+draft: false +keywords: +- how to recognize handwriting +- extract handwritten text +- handwritten text recognition python +- handwritten ocr tutorial python +language: de +og_description: Wie erkennt man Handschrift in Python? Folgen Sie diesem umfassenden + Leitfaden, um handgeschriebenen Text mit Aspose OCR zu extrahieren, inklusive Code, + Tipps und dem Umgang mit Randfällen. +og_title: Wie man Handschrift in Python erkennt – Vollständiges Tutorial +tags: +- OCR +- Python +- HandwritingRecognition +title: Wie man Handschrift in Python erkennt – Vollständiges Tutorial +url: /de/python/general/how-to-recognize-handwriting-in-python-full-tutorial/ +--- + +{{< blocks/products/pf/main-wrap-class >}} +{{< blocks/products/pf/main-container >}} +{{< blocks/products/pf/tutorial-page-section >}} + +# Wie man Handschrift in Python erkennt – Vollständiges Tutorial + +Haben Sie schon einmal **wie man Handschrift erkennt** in einem Python‑Projekt gebraucht, wussten aber nicht, wo Sie anfangen sollen? Sie sind nicht allein – Entwickler fragen ständig: „Kann ich Text aus einer gescannten Notiz extrahieren?“ Die gute Nachricht: Moderne OCR‑Bibliotheken machen das zum Kinderspiel. In diesem Leitfaden zeigen wir Ihnen **wie man Handschrift erkennt** mit Aspose OCR, und Sie lernen außerdem, **handgeschriebenen Text zuverlässig zu extrahieren**. + +Wir decken alles ab, von der Installation der Bibliothek bis zum Anpassen von Confidence‑Schwellenwerten für unordentliche Kurrentschriften. Am Ende haben Sie ein ausführbares Skript, das den extrahierten Text und einen Gesamtscore ausgibt – perfekt für Notiz‑Apps, Archivierungs‑Tools oder einfach aus Neugier. Vorherige OCR‑Erfahrung ist nicht nötig; Grundkenntnisse in Python reichen aus. 
+ +--- + +## Was Sie benötigen + +- **Python 3.9+** (die neueste stabile Version funktioniert am besten) +- **Aspose.OCR for Python via .NET** – installieren Sie es mit `pip install aspose-ocr` +- Ein **handgeschriebenes Bild** (JPEG/PNG), das Sie verarbeiten möchten +- Optional: eine virtuelle Umgebung, um Abhängigkeiten sauber zu halten + +Wenn Sie diese Punkte bereit haben, legen wir los. + +![Beispiel für Handschriftenerkennung](/images/handwritten-sample.jpg "Beispiel für Handschriftenerkennung") + +*(Alt‑Text: “Beispiel für Handschriftenerkennung, das eine gescannte handgeschriebene Notiz zeigt”)* + +--- + +## Schritt 1 – Installieren und Importieren der Aspose OCR Klassen + +Zuerst benötigen wir die OCR‑Engine selbst. Aspose bietet eine klare API, die die Erkennung von Drucktext vom Handschriftmodus trennt. + +```python +# Install the package (run this in your terminal if you haven't already) +# pip install aspose-ocr + +# Import the required OCR classes +from aspose.ocr import OcrEngine, HandwritingMode +``` + +*Warum das wichtig ist:* Mit `HandwritingMode` teilen wir der Engine im nächsten Schritt mit, dass wir **handwritten text recognition python** durchführen wollen und nicht Drucktext erkennen, was die Genauigkeit bei Schreibschrift erheblich steigert. + +--- + +## Schritt 2 – Erstellen und Konfigurieren der OCR‑Engine + +Jetzt erzeugen wir eine `OcrEngine`‑Instanz und schalten sie in den Handschriftmodus. Sie können außerdem den Confidence‑Schwellenwert anpassen; niedrigere Werte akzeptieren wackelige Schrift, höhere Werte verlangen sauberere Eingaben. 
+ +```python +# Step 2: Initialize the OCR engine for handwritten text +ocr_engine = OcrEngine() +ocr_engine.set_recognition_mode(HandwritingMode.HANDWRITTEN) + +# Optional: tighten the confidence threshold for cursive scripts +# Default is 0.5; 0.65 works well for most notebooks +ocr_engine.set_handwriting_confidence_threshold(0.65) +``` + +*Pro‑Tipp:* Wenn Ihre Notizen mit 300 DPI oder höher gescannt wurden, erhalten Sie in der Regel ein besseres Ergebnis. Bei niedrig aufgelösten Bildern sollten Sie vor dem Einspeisen in die Engine mit Pillow hochskalieren. + +--- + +## Schritt 3 – Bildpfad vorbereiten + +Stellen Sie sicher, dass der Dateipfad auf das Bild zeigt, das Sie verarbeiten wollen. Relative Pfade funktionieren, aber absolute Pfade vermeiden „Datei nicht gefunden“-Überraschungen. + +```python +# Step 3: Point to your handwritten image +input_image_path = "YOUR_DIRECTORY/handwritten_sample.jpg" +``` + +*Häufiges Stolperfeld:* Das Vergessen, Backslashes unter Windows zu escapen (`C:\\folder\\image.jpg`). Die Verwendung von rohen Strings (`r"C:\folder\image.jpg"`) umgeht dieses Problem. + +--- + +## Schritt 4 – Erkennung ausführen und Ergebnisse erfassen + +Die Methode `recognize` erledigt die schwere Arbeit. Sie gibt ein Objekt mit den Eigenschaften `.text` und `.confidence` zurück. + +```python +# Step 4: Run OCR and get the result +result = ocr_engine.recognize(input_image_path) + +# Display the extracted text and confidence +print("Hand‑written extraction:") +print(result.text) +print("Overall confidence:", result.confidence) +``` + +**Erwartete Ausgabe (Beispiel):** + +``` +Hand‑written extraction: +Meeting notes: +- Discuss quarterly goals +- Review budget allocations +- Assign action items +Overall confidence: 0.78 +``` + +Fällt die Confidence unter 0,5, müssen Sie das Bild eventuell säubern (Schatten entfernen, Kontrast erhöhen) oder den Schwellenwert in Schritt 2 senken. 
+ +--- + +## Schritt 5 – Ressourcen aufräumen + +Aspose OCR hält native Ressourcen; das Aufrufen von `dispose()` gibt sie frei und verhindert Speicherlecks, besonders beim Verarbeiten vieler Bilder in einer Schleife. + +```python +# Step 5: Release the engine resources +ocr_engine.dispose() +``` + +*Warum dispose?* In langlaufenden Diensten (z. B. einer Flask‑API, die Uploads entgegennimmt) kann das Vergessen, Ressourcen freizugeben, schnell den Systemspeicher erschöpfen. + +--- + +## Vollständiges Skript – Ein‑Klick‑Ausführung + +Alles zusammengeführt, hier ein eigenständiges Skript, das Sie kopieren und ausführen können. + +```python +# -*- coding: utf-8 -*- +""" +Handwritten OCR Tutorial – how to recognize handwriting in Python +Author: Your Name +Date: 2026-04-29 +""" + +# Install the library first if you haven't already: +# pip install aspose-ocr + +from aspose.ocr import OcrEngine, HandwritingMode + +def recognize_handwriting(image_path: str, confidence_threshold: float = 0.65): + """ + Recognizes handwritten text from an image using Aspose OCR. + + Args: + image_path (str): Path to the handwritten image file. + confidence_threshold (float): Minimum confidence to accept a word. 
+ + Returns: + tuple: (extracted_text (str), overall_confidence (float)) + """ + # Initialize engine in handwritten mode + engine = OcrEngine() + engine.set_recognition_mode(HandwritingMode.HANDWRITTEN) + engine.set_handwriting_confidence_threshold(confidence_threshold) + + # Perform recognition + result = engine.recognize(image_path) + + # Capture output + extracted = result.text + confidence = result.confidence + + # Clean up + engine.dispose() + return extracted, confidence + +if __name__ == "__main__": + # Replace with your actual file location + img_path = "YOUR_DIRECTORY/handwritten_sample.jpg" + + text, conf = recognize_handwriting(img_path) + + print("Hand‑written extraction:") + print(text) + print("Overall confidence:", conf) +``` + +Speichern Sie dies als `handwritten_ocr.py` und führen Sie `python handwritten_ocr.py` aus. Wenn alles korrekt eingerichtet ist, sehen Sie den extrahierten Text in der Konsole. + +--- + +## Umgang mit Randfällen und gängigen Variationen + +### Bilder mit geringem Kontrast +Wenn der Hintergrund in die Tinte übergeht, erhöhen Sie zuerst den Kontrast: + +```python +from PIL import Image, ImageEnhance + +img = Image.open(input_image_path) +enhancer = ImageEnhance.Contrast(img) +high_contrast = enhancer.enhance(2.0) # 2× contrast +high_contrast.save("temp_high_contrast.jpg") +result = ocr_engine.recognize("temp_high_contrast.jpg") +``` + +### Gedrehte Notizen +Eine schiefe Notizenseite kann die Erkennung stören. 
Nutzen Sie OpenCV und NumPy zum Entzerren (`pip install opencv-python numpy`): + +```python +import numpy as np +import cv2 + +def deskew(image_path): + # Load the scan as a grayscale image + img = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE) + # Invert and binarize so that ink pixels are > 0 and the paper background is 0 + thresh = cv2.threshold(cv2.bitwise_not(img), 0, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)[1] + coords = np.column_stack(np.where(thresh > 0)) + # Note: the angle convention of minAreaRect may differ between OpenCV versions; verify the sign on a test scan + angle = cv2.minAreaRect(coords)[-1] + if angle < -45: + angle = -(90 + angle) + else: + angle = -angle + (h, w) = img.shape[:2] + M = cv2.getRotationMatrix2D((w // 2, h // 2), angle, 1.0) + corrected = cv2.warpAffine(img, M, (w, h), flags=cv2.INTER_CUBIC, borderMode=cv2.BORDER_REPLICATE) + cv2.imwrite("deskewed.jpg", corrected) + +deskew(input_image_path) +result = ocr_engine.recognize("deskewed.jpg") +``` + +### Mehrseitige PDFs +Um mit diesem Skript auch PDF‑Seiten zu verarbeiten, konvertieren Sie jede Seite zuerst in ein Bild (z. B. mit `pdf2image`). Dann iterieren Sie über die Bilder mit derselben `recognize_handwriting`‑Funktion. + +--- + +## Pro‑Tipps für bessere **Extract Handwritten Text**‑Ergebnisse + +- **DPI ist entscheidend:** Zielwert 300 DPI oder höher beim Scannen. +- **Vermeiden Sie farbige Hintergründe:** Reines Weiß oder helles Grau liefert das sauberste Ergebnis. +- **Batch‑Verarbeitung:** Packen Sie die Funktion in eine `for`‑Schleife und protokollieren Sie die Confidence jeder Seite; verwerfen Sie Ergebnisse unter einem Schwellenwert, um die Qualität hoch zu halten. +- **Sprachunterstützung:** Aspose OCR unterstützt mehrere Sprachen; setzen Sie `engine.set_language("en")` für eine Optimierung nur für Englisch. + +--- + +## Häufig gestellte Fragen + +**Funktioniert das unter Linux?** +Ja – Aspose OCR wird mit nativen Binaries für Windows, macOS und Linux geliefert. Installieren Sie einfach das Pip‑Paket und Sie sind startklar. + +**Was, wenn meine Handschrift extrem kursiv ist?** +Versuchen Sie, den Confidence‑Schwellenwert zu senken (`0.5` oder sogar `0.4`). Beachten Sie, dass dadurch mehr Rauschen entstehen kann; führen Sie daher ggf. eine Nachbearbeitung (z. B. Rechtschreibprüfung) durch. 
+ +**Kann ich das in einem Web‑Service nutzen?** +Absolut. Die Funktion `recognize_handwriting` ist zustandslos und eignet sich perfekt für Flask‑ oder FastAPI‑Endpoints. Denken Sie nur daran, nach jeder Anfrage `dispose()` aufzurufen oder einen Context‑Manager zu verwenden. + +--- + +## Fazit + +Wir haben **wie man Handschrift in Python erkennt** von Anfang bis Ende behandelt, Ihnen gezeigt, wie Sie **handgeschriebenen Text extrahieren**, Confidence‑Einstellungen anpassen und gängige Stolperfallen wie niedrigen Kontrast oder schiefe Seiten bewältigen. Das komplette Skript oben ist sofort einsatzbereit, und die modulare Funktion lässt sich leicht in größere Projekte integrieren – egal, ob Sie eine Notiz‑App bauen, Archive digitalisieren oder einfach mit **handwritten ocr tutorial python** experimentieren. + +Als Nächstes könnten Sie **handwritten text recognition python** für mehrsprachige Notizen erkunden oder OCR mit Natural‑Language‑Processing kombinieren, um Meeting‑Protokolle automatisch zusammenzufassen. Der Himmel ist die Grenze – probieren Sie es aus und lassen Sie Ihren Code den Kritzeleien Leben einhauchen. + +Viel Spaß beim Coden, und hinterlassen Sie gern Ihre Fragen in den Kommentaren! 
+ +{{< /blocks/products/pf/tutorial-page-section >}} +{{< /blocks/products/pf/main-container >}} +{{< /blocks/products/pf/main-wrap-class >}} +{{< blocks/products/products-backtop-button >}} \ No newline at end of file diff --git a/ocr/german/python/general/how-to-run-ocr-with-aspose-hugging-face-complete-guide/_index.md b/ocr/german/python/general/how-to-run-ocr-with-aspose-hugging-face-complete-guide/_index.md new file mode 100644 index 000000000..eedf5c3dd --- /dev/null +++ b/ocr/german/python/general/how-to-run-ocr-with-aspose-hugging-face-complete-guide/_index.md @@ -0,0 +1,180 @@ +--- +category: general +date: 2026-04-29 +description: Erfahren Sie, wie Sie OCR auf Ihren Scans ausführen, das Hugging‑Face‑Modell + automatisch nutzen und Text aus Scans mit Aspose OCR in wenigen Minuten erkennen. +draft: false +keywords: +- how to run OCR +- use hugging face model +- recognize text from scans +- download model automatically +language: de +og_description: Wie man OCR auf Scans mit Aspose OCR ausführt, automatisch ein Hugging + Face‑Modell herunterlädt und sauberen, punktuierten Text erhält. +og_title: Wie man OCR mit Aspose & Hugging Face ausführt – Komplettanleitung +tags: +- OCR +- Aspose +- Hugging Face +- Python +title: Wie man OCR mit Aspose & Hugging Face ausführt – Vollständiger Leitfaden +url: /de/python/general/how-to-run-ocr-with-aspose-hugging-face-complete-guide/ +--- + +{{< blocks/products/pf/main-wrap-class >}} +{{< blocks/products/pf/main-container >}} +{{< blocks/products/pf/tutorial-page-section >}} + +# Wie man OCR mit Aspose & Hugging Face ausführt – Komplettanleitung + +Haben Sie sich schon einmal gefragt, **wie man OCR** auf einem Stapel gescannter Dokumente ausführt, ohne Stunden mit Feineinstellungen zu verbringen? Sie sind nicht allein. In vielen Projekten müssen Entwickler **Text aus Scans erkennen** und das schnell, doch sie stolpern über Modell‑Downloads und Nachbearbeitung. 
+ +Gute Neuigkeiten: Dieses Tutorial zeigt Ihnen eine sofort einsatzbereite Lösung, die **ein Hugging Face‑Modell verwendet**, es automatisch herunterlädt und Interpunktion hinzufügt, sodass die Ausgabe wie von einem Menschen geschrieben wirkt. Am Ende haben Sie ein Skript, das jedes Bild in einem Ordner verarbeitet und eine saubere `.txt`‑Datei neben jedem Scan ablegt. + +## Was Sie benötigen + +- Python 3.8+ (der Code verwendet f‑Strings, ältere Versionen reichen nicht) +- `aspose-ocr`‑Paket (Installation via `pip install aspose-ocr`) +- Internetzugang für den erstmaligen Modell‑Download +- Ein Ordner mit Bild‑Scans (`.png`, `.jpg` oder `.tif`) + +Das war’s – keine zusätzlichen Binärdateien, kein manuelles Modell‑Tuning. Lassen Sie uns loslegen. + +![wie man OCR ausführt Beispiel](https://example.com/ocr-demo.png "wie man OCR ausführt Beispiel") + +## Schritt 1: Aspose OCR‑Klassen importieren & Umgebung einrichten + +Wir beginnen damit, die notwendigen Klassen aus der Aspose OCR‑Bibliothek zu holen. Alles zu Beginn zu importieren hält das Skript übersichtlich und macht fehlende Abhängigkeiten leicht erkennbar. + +```python +# Step 1: Import Aspose OCR classes +import os +from aspose.ocr import OcrEngine, AsposeAI, AsposeAIModelConfig +``` + +*Warum das wichtig ist*: `OcrEngine` übernimmt die schwere Arbeit, während `AsposeAI` uns ermöglicht, ein großes Sprachmodell für eine intelligentere Nachbearbeitung anzuschließen. Ohne diesen Import schlägt der Rest des Codes mit einem `NameError` fehl. + +## Schritt 2: Ein GPU‑bewusstes Hugging Face‑Modell konfigurieren + +Jetzt teilen wir Aspose mit, wohin das Modell heruntergeladen werden soll und wie viele Schichten auf der GPU laufen sollen. Das Flag `allow_auto_download="true"` übernimmt für Sie den **automatischen Modell‑Download**. 
+ +```python +# Step 2: Configure a GPU‑aware AI model (replace with your own model folder) +model_config = AsposeAIModelConfig( + allow_auto_download="true", + hugging_face_repo_id="Qwen/Qwen2.5-3B-Instruct-GGUF", + hugging_face_quantization="int8", + gpu_layers=40, # use GPU for faster inference + directory_model_path=r"YOUR_DIRECTORY/models" +) +``` + +> **Pro‑Tipp**: Wenn Sie keine GPU haben, setzen Sie `gpu_layers=0`. Das Modell fällt dann auf die CPU zurück, was langsamer ist, aber trotzdem funktioniert. + +### Warum ein Hugging Face‑Modell wählen? + +Hugging Face hostet eine riesige Sammlung sofort einsetzbarer LLMs. Durch den Verweis auf `Qwen/Qwen2.5-3B-Instruct-GGUF` erhalten Sie ein kompaktes, instruktions‑feinabgestimmtes Modell, das Interpunktion hinzufügen, Abstände korrigieren und sogar kleinere OCR‑Fehler beheben kann. Das ist die praktische Umsetzung von **use hugging face model**. + +## Schritt 3: KI‑Engine initialisieren und Interpunktions‑Nachbearbeitung aktivieren + +Die KI‑Engine ist nicht nur für schicke Chats gedacht – hier hängen wir einen *Interpunktions‑Adder* an, der die rohe OCR‑Ausgabe säubert. + +```python +# Step 3: Initialise the AI engine and enable punctuation post‑processing +ai_engine = AsposeAI() +ai_engine.set_post_processor("punctuation_adder", {}) +``` + +*Was passiert?* Der Aufruf `set_post_processor` registriert einen eingebauten Nachbearbeiter, der nach Abschluss der OCR‑Engine läuft. Er nimmt den rohen String und fügt Kommas, Punkte und Großbuchstaben an den richtigen Stellen ein, sodass der finale Text deutlich lesbarer wird. + +## Schritt 4: OCR‑Engine erstellen und KI‑Engine anhängen + +Die Verknüpfung von KI‑Engine und OCR‑Engine liefert ein einzelnes Objekt, das sowohl Zeichen lesen als auch das Ergebnis veredeln kann. 
+ +```python +# Step 4: Create the OCR engine and attach the AI engine +ocr_engine = OcrEngine() +ocr_engine.set_ai_engine(ai_engine) +``` + +Wenn Sie diesen Schritt überspringen, funktioniert die OCR weiterhin, aber Sie verlieren den Interpunktions‑Boost – die Ausgabe sieht dann aus wie ein Strom von Wörtern. + +## Schritt 5: Jede Bilddatei in einem Ordner verarbeiten + +Hier kommt das Herzstück des Tutorials. Wir iterieren über jedes Bild, führen OCR aus, wenden die Nachbearbeitung an und schreiben den bereinigten Text in eine nebeneinander liegende `.txt`‑Datei. + +```python +# Step 5: Run OCR on each image in a folder, post‑process the result, and save the text +scans_folder = r"YOUR_DIRECTORY/scans" +for image_file in os.listdir(scans_folder): + # Filter only supported image types + if not image_file.lower().endswith(('.png', '.jpg', '.tif')): + continue + + image_path = os.path.join(scans_folder, image_file) + + # Recognise text from the image + ocr_result = ocr_engine.recognize(image_path) + + # Apply the punctuation post‑processor + ocr_result = ocr_engine.run_postprocessor(ocr_result) + + # Show a brief confidence summary + print(f"{image_file} – confidence {ocr_result.confidence:.2%}") + + # Save the cleaned text next to the source image + txt_path = image_path + ".txt" + with open(txt_path, "w", encoding="utf-8") as txt_file: + txt_file.write(ocr_result.text) +``` + +### Was Sie erwarten können + +Beim Ausführen des Skripts erscheint etwa Folgendes: + +``` +invoice_001.png – confidence 96.73% +receipt_2024.tif – confidence 94.12% +``` + +Jede Zeile gibt den Vertrauens‑Score aus (ein schneller Gesundheits‑Check) und erstellt `invoice_001.png.txt`, `receipt_2024.tif.txt` usw., die interpunktierten, menschenlesbaren Text enthalten. + +### Sonderfälle & Variationen + +- **Scans in anderen Sprachen**: Ändern Sie `hugging_face_repo_id` zu einem mehrsprachigen Modell (z. B. `microsoft/Multilingual-LLM-GGUF`). 
+- **Große Stapel**: Verpacken Sie die Schleife in einen `concurrent.futures.ThreadPoolExecutor` für parallele Verarbeitung, achten Sie jedoch auf GPU‑Speichergrenzen. +- **Eigene Nachbearbeitung**: Ersetzen Sie `"punctuation_adder"` durch Ihr eigenes Skript, wenn Sie domänenspezifische Bereinigungen benötigen (z. B. das Entfernen von Rechnungsnummern). + +## Schritt 6: Ressourcen aufräumen + +Wenn der Job beendet ist, verhindert das Freigeben von Ressourcen Speicher‑Leaks – besonders wichtig, wenn Sie das in einem langlebigen Service ausführen. + +```python +# Step 6: Release resources +ai_engine.free_resources() +ocr_engine.dispose() +``` + +Das Ignorieren dieses Schrittes kann GPU‑Speicher belegen und nachfolgende Läufe sabotieren. + +## Zusammenfassung: OCR End‑to‑End ausführen + +In nur wenigen Zeilen haben wir gezeigt, **wie man OCR** auf einem Ordner mit Scans ausführt, **ein Hugging Face‑Modell** nutzt, das sich beim ersten Mal selbst herunterlädt, und **Text aus Scans erkennt** mit automatisch hinzugefügter Interpunktion. Das komplette Skript ist bereit zum Kopieren, Pfade anpassen und Ausführen. + +## Nächste Schritte & verwandte Themen + +- **Batch‑Nachbearbeitung**: Erkunden Sie `ocr_engine.run_batch_postprocessor` für noch schnellere Massenverarbeitung. +- **Alternative Modelle**: Probieren Sie die `openai/whisper`‑Familie, wenn Sie neben OCR auch Sprache‑zu‑Text benötigen. +- **Integration mit Datenbanken**: Speichern Sie den extrahierten Text in SQLite oder Elasticsearch für durchsuchbare Archive. + +Experimentieren Sie gern – tauschen Sie das Modell aus, passen Sie `gpu_layers` an oder fügen Sie Ihre eigene Nachbearbeitung hinzu. Die Flexibilität von Aspose OCR kombiniert mit dem Modell‑Hub von Hugging Face bildet ein vielseitiges Fundament für jedes Dokument‑Digitalisierungsprojekt. + +--- + +*Viel Spaß beim Coden! 
Wenn Sie auf ein Problem stoßen, hinterlassen Sie einen Kommentar unten oder werfen Sie einen Blick in die Aspose OCR‑Dokumentation für weiterführende Konfigurationsoptionen.* + +{{< /blocks/products/pf/tutorial-page-section >}} +{{< /blocks/products/pf/main-container >}} +{{< /blocks/products/pf/main-wrap-class >}} +{{< blocks/products/products-backtop-button >}} \ No newline at end of file diff --git a/ocr/german/python/general/perform-ocr-on-image-with-python-complete-guide/_index.md b/ocr/german/python/general/perform-ocr-on-image-with-python-complete-guide/_index.md new file mode 100644 index 000000000..51db810f3 --- /dev/null +++ b/ocr/german/python/general/perform-ocr-on-image-with-python-complete-guide/_index.md @@ -0,0 +1,188 @@ +--- +category: general +date: 2026-04-29 +description: Führe OCR auf einem Bild mit Python durch, lade automatisch ein HuggingFace‑Modell + herunter und gib GPU‑Speicher effizient frei, während du den OCR‑Text bereinigst. +draft: false +keywords: +- perform OCR on image +- download HuggingFace model python +- release GPU memory python +- clean OCR text python +language: de +og_description: Erfahren Sie, wie Sie OCR auf Bildern in Python durchführen, ein HuggingFace‑Modell + automatisch herunterladen, den Text bereinigen und GPU‑Speicher freigeben. +og_title: OCR auf Bild mit Python durchführen – Schritt‑für‑Schritt‑Anleitung +tags: +- OCR +- Python +- Aspose +- HuggingFace +title: OCR auf Bild mit Python durchführen – Komplettanleitung +url: /de/python/general/perform-ocr-on-image-with-python-complete-guide/ +--- + +{{< blocks/products/pf/main-wrap-class >}} +{{< blocks/products/pf/main-container >}} +{{< blocks/products/pf/tutorial-page-section >}} + +# OCR auf Bild mit Python – Komplettleitfaden + +Haben Sie jemals **OCR auf Bild** Dateien durchführen müssen, sind aber beim Modell‑Download oder der GPU‑Speicher‑Bereinigung hängen geblieben? 
Sie sind nicht allein – viele Entwickler stoßen an diese Grenze, wenn sie erstmals optische Zeichenerkennung mit großen Sprachmodellen kombinieren. + +In diesem Tutorial führen wir Sie durch eine einzige, durchgängige Lösung, die **ein HuggingFace‑Modell in Python** herunterlädt, Aspose OCR ausführt, die Rohausgabe bereinigt und schließlich **GPU‑Speicher freigibt, den Python zurückgewinnen kann**. Am Ende haben Sie ein einsatzbereites Skript, das ein gescanntes PNG in gepflegten, durchsuchbaren Text verwandelt. + +> **Was Sie erhalten:** ein vollständiges, ausführbares Code‑Beispiel, Erklärungen, warum jeder Schritt wichtig ist, Tipps zur Vermeidung häufiger Fallstricke und einen Einblick, wie Sie die Pipeline für Ihre eigenen Projekte anpassen können. + +## Was Sie benötigen + +- Python 3.9 oder neuer (das Beispiel wurde mit 3.11 getestet) +- `aspose-ocr`‑Paket (Installation via `pip install aspose-ocr`) +- Eine Internetverbindung für den **download HuggingFace model python**‑Schritt +- Eine CUDA‑kompatible GPU, wenn Sie den Geschwindigkeitsvorteil nutzen wollen (optional aber empfohlen) + +Es werden keine zusätzlichen System‑Abhängigkeiten benötigt; die Aspose‑OCR‑Engine enthält alles, was Sie benötigen. + +![Beispiel für OCR auf Bild](image.png "Beispiel für die Durchführung von OCR auf einem Bild mit Aspose OCR und einem LLM‑Post‑Processor") + +*Bild‑Alt‑Text: “perform OCR on image – Aspose OCR output before and after AI cleaning”* + +## OCR auf Bild – Schritt‑für‑Schritt‑Übersicht + +Im Folgenden teilen wir den Arbeitsablauf in logische Abschnitte auf. Jeder Abschnitt hat seine eigene Überschrift, sodass KI‑Assistenten schnell zu dem Teil springen können, der Sie interessiert, und Suchmaschinen die relevanten Schlüsselwörter indexieren können. + +### 1. HuggingFace‑Modell in Python herunterladen + +Das Erste, was wir tun müssen, ist ein Sprachmodell zu holen, das als Post‑Processor für die Roh‑OCR‑Ausgabe dient. 
Aspose OCR liefert eine Hilfsklasse namens `AsposeAI`, die automatisch ein Modell vom HuggingFace‑Hub herunterladen kann. + +```python +import aspose.ocr as aocr +from aspose.ocr import AsposeAI, AsposeAIModelConfig, OcrEngine + +# Configure the model – it will auto‑download the first time you run it +model_config = AsposeAIModelConfig( + allow_auto_download="true", # <-- enables auto‑download + hugging_face_repo_id="Qwen/Qwen2.5-3B-Instruct-GGUF", + hugging_face_quantization="int8", # smaller footprint on GPU + gpu_layers=20, # how many layers stay on GPU + directory_model_path=r"YOUR_DIRECTORY" # where the model files live +) +``` + +**Warum das wichtig ist:** +- **download HuggingFace model python** – Sie vermeiden das manuelle Handling von Zip‑Dateien oder Token‑Authentifizierung. +- Die Verwendung von `int8`‑Quantisierung verkleinert das Modell auf etwa ein Viertel seiner ursprünglichen Größe, was entscheidend ist, wenn Sie später **release GPU memory python** müssen. + +> **Pro‑Tipp:** Halten Sie `directory_model_path` auf einer SSD für schnellere Ladezeiten. + +### 2. KI‑Hilfsklasse initialisieren und Rechtschreibprüfung aktivieren + +Jetzt erstellen wir eine `AsposeAI`‑Instanz und hängen einen Rechtschreib‑Korrektor‑Post‑Processor an. Hier beginnt die **clean OCR text python**‑Magie. + +```python +# Initialise the AI helper +ai_helper = AsposeAI() +ai_helper.set_post_processor( + processor="spell_corrector", + custom_settings={"max_edits": 2} # allows up to two character edits per word +) +``` + +**Erklärung:** +Der Rechtschreib‑Korrektor prüft jedes Token der OCR‑Engine und schlägt Änderungen vor, die durch `max_edits` begrenzt sind. Diese kleine Anpassung kann “rec0gn1tion” in “recognition” umwandeln, ohne ein schweres Sprachmodell zu benötigen. + +### 3. KI‑Hilfsklasse in die OCR‑Engine einbinden + +Aspose hat in Version 23.4 eine neue Methode eingeführt, mit der Sie eine KI‑Engine direkt in die OCR‑Pipeline einbinden können. 
+ +```python +# Initialise the OCR engine and attach the AI helper +ocr_engine = OcrEngine() +ocr_engine.set_ai_engine(ai_helper) # new in v23.4 +``` + +**Warum wir das tun:** +Durch das frühe Anschließen der KI‑Hilfsklasse kann die OCR‑Engine das Modell optional für sofortige Verbesserungen (z. B. Layout‑Erkennung) nutzen. Es hält den Code außerdem übersichtlich – später sind keine separaten Post‑Processing‑Schleifen mehr nötig. + +### 4. OCR auf dem gescannten Bild ausführen + +Hier ist der zentrale Schritt, der die OCR auf dem Bild (**perform OCR on image**) tatsächlich ausführt. Ersetzen Sie `YOUR_DIRECTORY/input.png` durch den Pfad zu Ihrem eigenen Scan. + +```python +image_path = r"YOUR_DIRECTORY/input.png" +ocr_result = ocr_engine.recognize(image_path) + +print("Raw OCR text:") +print(ocr_result.text) +``` + +Typische Rohausgaben können Zeilenumbrüche an merkwürdigen Stellen, falsch erkannte Zeichen oder fremde Symbole enthalten. Deshalb benötigen wir den nächsten Schritt. + +**Erwartete Rohausgabe (Beispiel):** + +``` +Th1s 1s 4n ex4mpl3 0f r4w OCR t3xt. +It c0ntains numb3rs 123 and s0me m1stakes. +``` + +### 5. OCR‑Text in Python mit dem KI‑Post‑Processor bereinigen + +Jetzt lassen wir die KI das Durcheinander bereinigen. Das ist das Herzstück des **clean OCR text python**‑Prozesses. + +```python +cleaned_result = ocr_engine.run_postprocessor(ocr_result) + +print("\nAI‑enhanced text:") +print(cleaned_result.text) +``` + +**Mögliches Ergebnis (Beispiel):** + +``` +This is an example of raw OCR text. +It contains numbers 123 and some mistakes. +``` + +Beachten Sie, wie der Rechtschreib‑Korrektor “Th1s” zu “This” und “4n” zu “an” korrigiert hat. Das Modell normalisiert außerdem die Abstände, was häufig ein Problem darstellt, wenn Sie den Text später in nachgelagerte NLP‑Pipelines einspeisen. + +### 6. 
GPU‑Speicher in Python freigeben – Aufräum‑Schritte + +Wenn Sie fertig sind, ist es gute Praxis, GPU‑Ressourcen freizugeben, besonders wenn Sie mehrere OCR‑Aufgaben in einem langfristig laufenden Service ausführen. + +```python +# Release resources – crucial for GPU memory +ai_helper.free_resources() +ocr_engine.dispose() +``` + +**Was im Hintergrund passiert:** +`free_resources()` entlädt das Modell von der GPU und gibt den Speicher an den CUDA‑Treiber zurück. `dispose()` schließt die internen Puffer der OCR‑Engine. Das Überspringen dieser Aufrufe kann nach nur wenigen Bildern zu Out‑of‑Memory‑Fehlern führen. + +> **Denken Sie daran:** Wenn Sie planen, Stapel in einer Schleife zu verarbeiten, rufen Sie das Aufräumen nach jedem Batch auf oder verwenden Sie denselben `ai_helper` erneut, ohne ihn bis zum endgültigen Abschluss freizugeben. + +## Bonus: Anpassung der Pipeline für verschiedene Szenarien + +### Modell‑Quantisierung anpassen + +Wenn Sie eine leistungsstarke GPU (z. B. RTX 4090) besitzen und höhere Genauigkeit wünschen, ändern Sie `hugging_face_quantization` zu `"fp16"` und erhöhen `gpu_layers` auf `30`. Dies verbraucht mehr Speicher, sodass Sie **release GPU memory python** nach jedem Batch aggressiver durchführen müssen. + +### Einen benutzerdefinierten Rechtschreibprüfer verwenden + +Sie können den integrierten `spell_corrector` gegen einen benutzerdefinierten Post‑Processor austauschen, der domänenspezifische Korrekturen vornimmt (z. B. medizinische Terminologie). Implementieren Sie einfach die erforderliche Schnittstelle und übergeben Sie dessen Namen an `set_post_processor`. + +### Batch‑Verarbeitung mehrerer Bilder + +Packen Sie die OCR‑Schritte in eine `for`‑Schleife, sammeln Sie `cleaned_result.text` in einer Liste und rufen Sie `ai_helper.free_resources()` erst nach der Schleife auf, wenn Sie genügend GPU‑RAM haben. Das reduziert den Aufwand, das Modell wiederholt zu laden. 
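Das Muster aus dem Abschnitt zur Batch‑Verarbeitung (viele Bilder verarbeiten, Ressourcen erst ganz am Ende freigeben) lässt sich unabhängig von der OCR‑Bibliothek skizzieren. In der folgenden Skizze sind `process_batch`, `recognize` und `cleanup` frei gewählte Namen und keine Aspose‑API; die eigentlichen OCR‑ und Freigabe‑Aufrufe werden als Funktionen übergeben.

```python
from typing import Callable, Iterable, List


def process_batch(
    image_paths: Iterable[str],
    recognize: Callable[[str], str],
    cleanup: Callable[[], None],
) -> List[str]:
    """Run recognize() on every path and call cleanup() exactly once at the end."""
    texts: List[str] = []
    try:
        for path in image_paths:
            texts.append(recognize(path))
    finally:
        # Runs even if recognize() raises, so resources are not left allocated
        cleanup()
    return texts


if __name__ == "__main__":
    # Stand-ins for the real OCR and cleanup calls (assumptions for illustration)
    released = []
    result = process_batch(
        ["a.png", "b.png"],
        recognize=lambda path: f"text from {path}",
        cleanup=lambda: released.append(True),
    )
    print(result, released)
```

Mit den Namen aus diesem Tutorial würde `recognize` den Aufruf von `ocr_engine.recognize` samt Post‑Processing kapseln und `cleanup` die Aufrufe `ai_helper.free_resources()` und `ocr_engine.dispose()`. Das `try`/`finally` sorgt dafür, dass die Freigabe auch dann erfolgt, wenn ein einzelnes Bild einen Fehler auslöst.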
+ +## Fazit + +Wir haben Ihnen gerade gezeigt, wie Sie **perform OCR on image** Dateien in Python ausführen, automatisch ein **HuggingFace‑Modell herunterladen**, **OCR‑Text bereinigen** und sicher **GPU‑Speicher freigeben**, wenn Sie fertig sind. Das vollständige Skript ist bereit zum Kopieren‑Einfügen, und die Erklärungen geben Ihnen das Vertrauen, es an größere Projekte anzupassen. + +Nächste Schritte? Versuchen Sie, das Qwen 2.5‑Modell gegen eine größere LLaMA‑Variante auszutauschen, experimentieren Sie mit verschiedenen Post‑Processor‑Varianten oder integrieren Sie die bereinigte Ausgabe in einen durchsuchbaren Elasticsearch‑Index. Die Möglichkeiten sind endlos, und Sie haben nun eine solide Grundlage zum Weiterbauen. + +Viel Spaß beim Coden, und mögen Ihre OCR‑Pipelines stets sauber und speichereffizient sein! + +{{< /blocks/products/pf/tutorial-page-section >}} +{{< /blocks/products/pf/main-container >}} +{{< /blocks/products/pf/main-wrap-class >}} +{{< blocks/products/products-backtop-button >}} \ No newline at end of file diff --git a/ocr/greek/python/general/extract-text-from-pdf-ocr-pdf-with-python/_index.md b/ocr/greek/python/general/extract-text-from-pdf-ocr-pdf-with-python/_index.md new file mode 100644 index 000000000..321ed2413 --- /dev/null +++ b/ocr/greek/python/general/extract-text-from-pdf-ocr-pdf-with-python/_index.md @@ -0,0 +1,219 @@ +--- +category: general +date: 2026-04-29 +description: Εξαγωγή κειμένου από PDF χρησιμοποιώντας το Aspose OCR σε Python. Μάθετε + την επεξεργασία PDF με batch OCR, μετατρέψτε το κείμενο σκαναρισμένων PDF και διαχειριστείτε + σελίδες χαμηλής εμπιστοσύνης. +draft: false +keywords: +- extract text from pdf +- ocr pdf with python +- convert scanned pdf text +- batch ocr pdf processing +language: el +og_description: Εξαγωγή κειμένου από PDF με το Aspose OCR σε Python. Αυτός ο οδηγός + δείχνει επεξεργασία PDF με OCR σε παρτίδες, μετατροπή κειμένου από σαρωμένα PDF + και διαχείριση αποτελεσμάτων χαμηλής εμπιστοσύνης. 
+og_title: Extract Text from PDF – OCR PDF with Python
+tags:
+- OCR
+- Python
+- PDF processing
+title: Extract Text from PDF – OCR PDF with Python
+url: /el/python/general/extract-text-from-pdf-ocr-pdf-with-python/
+---
+
+{{< blocks/products/pf/main-wrap-class >}}
+{{< blocks/products/pf/main-container >}}
+{{< blocks/products/pf/tutorial-page-section >}}
+
+# Extract Text from PDF – OCR PDF with Python
+
+Have you ever needed to **extract text from a PDF** when the file is just a scanned image? You are not alone: many developers hit this problem when trying to turn PDFs into searchable data. The good news? With Aspose OCR for Python you can convert scanned PDF text in a few lines of code, and even run **batch OCR PDF processing** when you have dozens of files to handle.
+
+In this tutorial we walk through the whole workflow: setting up the library, running OCR on a single PDF, scaling up to a batch, and handling low‑confidence pages so you know when manual review is needed. By the end you will have a ready‑to‑run script that extracts text from any scanned PDF, and you will understand the "why" behind each step.
+
+## What You'll Need
+
+Before we start, make sure you have:
+
+- Python 3.8 or later (the code uses f‑strings, so 3.6+ works, but 3.8+ is recommended).
+- An Aspose OCR for Python license or a free trial key (available from the Aspose website).
+- A folder with one or more scanned PDFs you want to process.
+- A modest amount of disk space for the generated *.txt* reports.
+
+That's all: no heavy external dependencies and no OpenCV gymnastics. The Aspose OCR engine does the hard work for you.
+
+## Setting Up the Environment
+
+First, install the Aspose OCR package from PyPI:
+
+```bash
+pip install aspose-ocr
+```
+
+If you have a license file (`Aspose.OCR.lic`), place it in your project root and activate it like this:
+
+```python
+# Activate Aspose OCR license (optional but removes trial watermark)
+from aspose.ocr import License
+
+license = License()
+license.set_license("Aspose.OCR.lic")
+```
+
+> **Pro tip:** Keep the license file out of version control; add it to `.gitignore` to avoid exposing it by accident.
+
+## Running OCR on a Single PDF
+
+Now let's extract text from a single scanned PDF. The basic steps are:
+
+1. Create an `OcrEngine` instance.
+2. Point it at the PDF file.
+3. Retrieve one `OcrResult` per page.
+4. Write the plain‑text output to disk.
+5. Dispose of the engine to free native resources.
+
+Here is the complete, runnable script:
+
+```python
+# Step 1: Import the OCR engine and create an instance
+from aspose.ocr import OcrEngine
+
+# Optional: activate license (see previous section)
+# from aspose.ocr import License
+# License().set_license("Aspose.OCR.lic")
+
+ocr_engine = OcrEngine()
+
+# Step 2: Specify the PDF file to be processed
+pdf_file_path = r"YOUR_DIRECTORY/input.pdf"
+
+# Step 3: Extract OCR results – one OcrResult object per page
+ocr_pages = ocr_engine.extract_from_pdf(pdf_file_path)
+
+# Step 4: Iterate through the results, show confidence, flag low‑confidence pages, and save the plain text
+for ocr_page in ocr_pages:
+    print(f"Page {ocr_page.page_number}: confidence {ocr_page.confidence:.2%}")
+
+    # If confidence is below 80 %, flag it for manual review
+    if ocr_page.confidence < 0.80:
+        print("  Low confidence – consider manual review")
+
+    # Build the output filename
+    output_txt = f"YOUR_DIRECTORY/Report_page{ocr_page.page_number}.txt"
+    with open(output_txt, "w", encoding="utf-8") as f:
+        f.write(ocr_page.text)
+
+# Step 5: Release resources used by the engine
+ocr_engine.dispose()
+```
+
+**What you'll see:** For each page the script prints something like `Page 1: confidence 97.45%`. If a page falls below the 80 % threshold, a warning appears to tell you that the OCR may have missed characters.
+
+### Why This Works
+
+- `OcrEngine` is the gateway to the native Aspose OCR library; it handles everything from image preprocessing to character recognition.
+- `extract_from_pdf` rasterizes each PDF page automatically, so you do not need to convert the PDF to images yourself.
+- Confidence scores let you automate quality checks, which is critical when processing legal or medical documents where accuracy matters.
+
+## Batch OCR PDF Processing with Python
+
+Most real‑world projects involve more than one file. Let's extend the single‑file script into a **batch OCR PDF processing** pipeline that walks a folder, processes each PDF, and saves the results in a matching sub‑folder.
+
+```python
+from pathlib import Path
+from aspose.ocr import OcrEngine
+
+def ocr_pdf_file(pdf_path: Path, output_dir: Path, engine: OcrEngine, confidence_threshold: float = 0.80):
+    """Extract text from a single PDF and write per‑page .txt files."""
+    ocr_pages = engine.extract_from_pdf(str(pdf_path))
+    for page in ocr_pages:
+        print(f"{pdf_path.name} – Page {page.page_number}: {page.confidence:.2%}")
+        if page.confidence < confidence_threshold:
+            print("  ⚠️ Low confidence – manual review may be needed")
+
+        out_file = output_dir / f"{pdf_path.stem}_page{page.page_number}.txt"
+        out_file.write_text(page.text, encoding="utf-8")
+
+def batch_ocr_pdf(input_folder: str, output_folder: str):
+    """Iterate over all PDFs in input_folder and run OCR."""
+    input_path = Path(input_folder)
+    output_path = Path(output_folder)
+    output_path.mkdir(parents=True, exist_ok=True)
+
+    pdf_files = list(input_path.glob("*.pdf"))
+    if not pdf_files:
+        print("🚫 No PDF files found in the specified folder.")
+        return
+
+    engine = OcrEngine()
+    try:
+        for pdf_file in pdf_files:
+            # Create a sub‑folder for each PDF to keep pages organized
+            pdf_out_dir = output_path / pdf_file.stem
+            pdf_out_dir.mkdir(exist_ok=True)
+            ocr_pdf_file(pdf_file, pdf_out_dir, engine)
+    finally:
+        # Clean up native resources, even if one of the PDFs fails
+        engine.dispose()
+    print("✅ Batch OCR completed.")
+
+# Example usage:
+if __name__ == "__main__":
+    batch_ocr_pdf("YOUR_DIRECTORY/input_pdfs", "YOUR_DIRECTORY/ocr_output")
+```
+
+#### How This Helps
+
+- **Scalability:** The function walks the folder once and creates a dedicated output sub‑folder for each PDF. This keeps things organized when you have dozens of documents.
+- **Reusability:** `ocr_pdf_file` can be called from other scripts (e.g., a web service) because it receives the engine, paths, and threshold as arguments instead of relying on globals.
+- **Error handling:** The script prints a friendly message if the input folder is empty, which avoids a silent failure.
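The confidence check in the batch script can also be pulled out into a small, testable helper. The sketch below is an assumption‑laden illustration: `PageResult` is a hypothetical stand‑in that only mirrors the three attributes the tutorial reads (`page_number`, `confidence`, `text`), and `split_by_confidence` is not part of any library.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass
class PageResult:
    """Hypothetical stand-in mirroring the attributes the tutorial reads."""
    page_number: int
    confidence: float
    text: str


def split_by_confidence(
    pages: Iterable[PageResult], threshold: float = 0.80
) -> Tuple[List[PageResult], List[PageResult]]:
    """Return (accepted, needs_review); pages strictly below the threshold need review."""
    accepted: List[PageResult] = []
    needs_review: List[PageResult] = []
    for page in pages:
        if page.confidence < threshold:
            needs_review.append(page)
        else:
            accepted.append(page)
    return accepted, needs_review
```

Keeping the triage separate from the OCR call makes it easy to write the `needs_review` page numbers to a report file or a ticket queue without touching the engine code.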
+
+## Converting Scanned PDF Text – Handling Edge Cases
+
+Although the code above works for most PDFs, you may run into a few quirks:
+
+| Situation | Why It Happens | How to Mitigate |
+|-----------|----------------|-----------------|
+| **Encrypted PDFs** | The PDF is password protected. | Pass the password to `extract_from_pdf(pdf_path, password="yourPwd")`. |
+| **Multi‑language documents** | Aspose OCR defaults to English. | Set `ocr_engine.language = "spa"` for Spanish, or supply a list for mixed languages. |
+| **Very large PDFs (>500 pages)** | Memory use grows because every page is loaded into RAM. | Process the PDF in chunks using `engine.extract_from_pdf(pdf_path, start_page=1, end_page=100)` and repeat. |
+| **Poor scan quality** | Low DPI or heavy noise reduces confidence. | Pre‑process the PDF with `engine.image_preprocessing = True` or raise the DPI via `engine.dpi = 300`. |
+
+> **Watch out:** Enabling image preprocessing can increase CPU time significantly. If you run a nightly batch, schedule enough time or start a separate worker.
+
+## Verifying the Result
+
+After the script finishes, you will find a folder structure like this:
+
+```
+ocr_output/
+├─ invoice_2023/
+│  ├─ invoice_2023_page1.txt
+│  ├─ invoice_2023_page2.txt
+│  └─ …
+└─ contract_A/
+   ├─ contract_A_page1.txt
+   └─ …
+```
+
+Open any `.txt` file; you should see plain UTF‑8 text that mirrors the original scanned content. If you notice garbled characters, double‑check the language settings for the PDF and make sure the correct font packs are installed on the machine.
+
+## Cleaning Up Resources
+
+Aspose OCR relies on native DLLs, so it is essential to call `engine.dispose()` once you are done. Skipping this step can lead to memory leaks, especially in long‑running batch jobs.
+
+```python
+# Always the last line of your script
+engine.dispose()
+```
+
+## Complete End‑to‑End Example
+
+Putting it all together, here is a single
+
+{{< /blocks/products/pf/tutorial-page-section >}}
+{{< /blocks/products/pf/main-container >}}
+{{< /blocks/products/pf/main-wrap-class >}}
+{{< blocks/products/products-backtop-button >}}
\ No newline at end of file
diff --git a/ocr/greek/python/general/how-to-recognize-handwriting-in-python-full-tutorial/_index.md b/ocr/greek/python/general/how-to-recognize-handwriting-in-python-full-tutorial/_index.md
new file mode 100644
index 000000000..709b16d0d
--- /dev/null
+++ b/ocr/greek/python/general/how-to-recognize-handwriting-in-python-full-tutorial/_index.md
@@ -0,0 +1,278 @@
+---
+category: general
+date: 2026-04-29
+description: Learn how to recognize handwriting in Python with Aspose OCR.
+  This step‑by‑step guide shows how to extract handwritten text efficiently.
+draft: false
+keywords:
+- how to recognize handwriting
+- extract handwritten text
+- handwritten text recognition python
+- handwritten ocr tutorial python
+language: el
+og_description: How do you recognize handwriting in Python? Follow this
+  complete guide to extracting handwritten text with Aspose OCR,
+  with code, tips, and edge‑case handling.
+og_title: How to Recognize Handwriting in Python – Complete Guide
+tags:
+- OCR
+- Python
+- HandwritingRecognition
+title: How to Recognize Handwriting in Python – Complete Guide
+url: /el/python/general/how-to-recognize-handwriting-in-python-full-tutorial/
+---
+
+{{< blocks/products/pf/main-wrap-class >}}
+{{< blocks/products/pf/main-container >}}
+{{< blocks/products/pf/tutorial-page-section >}}
+
+# How to Recognize Handwriting in Python – Complete Guide
+
+Have you ever needed to work out **how to recognize handwriting** in a Python project but did not know where to start? You are not alone: developers keep asking, "Can I extract text from a scanned note?" The good news is that modern OCR libraries make this straightforward. In this guide we walk through **how to recognize handwriting** using Aspose OCR, and you will also learn to **extract handwritten text** reliably.
+
+We cover everything from installing the library to tuning confidence thresholds for messy cursive writing. By the end you will have a runnable script that prints the extracted text and an overall confidence score, which is handy for note‑taking apps, archiving tools, or plain curiosity. No prior OCR experience is required; basic Python knowledge is enough.
+
+---
+
+## What You'll Need
+
+- **Python 3.9+** (the latest stable release works best)
+- **Aspose.OCR for Python via .NET** – install with `pip install aspose-ocr`
+- A **handwritten image** (JPEG/PNG) that you want to process
+- Optional: a virtual environment to keep dependencies tidy
+
+If you have these ready, let's dive in.
+
+![Handwriting recognition example](/images/handwritten-sample.jpg "Handwriting recognition example")
+
+*(Alt text: "handwriting recognition example showing a scanned handwritten note")*
+
+---
+
+## Step 1 – Install and Import the Aspose OCR Classes
+
+First of all, we need the OCR engine. Aspose provides a clean API that separates printed‑text recognition from handwriting mode.
+
+```python
+# Install the package (run this in your terminal if you haven't already)
+# pip install aspose-ocr
+
+# Import the required OCR classes
+from aspose.ocr import OcrEngine, HandwritingMode
+```
+
+*Why it matters:* Importing `HandwritingMode` lets us tell the engine that we are doing **handwritten text recognition python** rather than printed text, which improves accuracy considerably for cursive letters.
+
+---
+
+## Step 2 – Create and Configure the OCR Engine
+
+Now we create an `OcrEngine` instance and switch it to handwriting mode. You can also tune the confidence threshold: lower values accept shaky handwriting, higher values demand cleaner input.
+
+```python
+# Step 2: Initialize the OCR engine for handwritten text
+ocr_engine = OcrEngine()
+ocr_engine.set_recognition_mode(HandwritingMode.HANDWRITTEN)
+
+# Optional: tighten the confidence threshold for cursive scripts
+# Default is 0.5; 0.65 works well for most notebooks
+ocr_engine.set_handwriting_confidence_threshold(0.65)
+```
+
+*Tip:* If your notes are scanned at 300 DPI or more, you will usually get a better score. For low‑resolution images, consider upscaling with Pillow before passing them to the engine.
+
+---
+
+## Step 3 – Prepare the Image Path
+
+Make sure the file path points to the image you want to process. Relative paths work fine, but absolute paths avoid "file not found" surprises.
+
+```python
+# Step 3: Point to your handwritten image
+input_image_path = "YOUR_DIRECTORY/handwritten_sample.jpg"
+```
+
+*Common mistake:* Forgetting to escape backslashes on Windows (`C:\\folder\\image.jpg`). Using raw strings (`r"C:\folder\image.jpg"`) sidesteps the problem.
+
+---
+
+## Step 4 – Run Recognition and Collect the Results
+
+The `recognize` method does the heavy lifting. It returns an object with `.text` and `.confidence` properties.
+
+```python
+# Step 4: Run OCR and get the result
+result = ocr_engine.recognize(input_image_path)
+
+# Display the extracted text and confidence
+print("Hand‑written extraction:")
+print(result.text)
+print("Overall confidence:", result.confidence)
+```
+
+**Expected output (example):**
+
+```
+Hand‑written extraction:
+Meeting notes:
+- Discuss quarterly goals
+- Review budget allocations
+- Assign action items
+Overall confidence: 0.78
+```
+
+If the confidence falls below 0.5, you may need to clean up the image (remove shadows, increase contrast) or lower the threshold from Step 2.
+
+---
+
+## Step 5 – Clean Up Resources
+
+Aspose OCR holds native resources; calling `dispose()` releases them and prevents memory leaks, especially when you process many images in a loop.
+
+```python
+# Step 5: Release the engine resources
+ocr_engine.dispose()
+```
+
+*Why dispose?* In long‑running services (e.g., a Flask API that accepts uploads), failing to release resources can quickly exhaust system memory.
+
+---
+
+## Full Script – One‑Click Run
+
+Putting it all together, here is a self‑contained script you can copy, paste, and run.
+
+```python
+# -*- coding: utf-8 -*-
+"""
+Handwritten OCR Tutorial – how to recognize handwriting in Python
+Author: Your Name
+Date: 2026-04-29
+"""
+
+# Install the library first if you haven't already:
+# pip install aspose-ocr
+
+from aspose.ocr import OcrEngine, HandwritingMode
+
+def recognize_handwriting(image_path: str, confidence_threshold: float = 0.65):
+    """
+    Recognizes handwritten text from an image using Aspose OCR.
+
+    Args:
+        image_path (str): Path to the handwritten image file.
+        confidence_threshold (float): Minimum confidence to accept a word.
+
+    Returns:
+        tuple: (extracted_text (str), overall_confidence (float))
+    """
+    # Initialize engine in handwritten mode
+    engine = OcrEngine()
+    try:
+        engine.set_recognition_mode(HandwritingMode.HANDWRITTEN)
+        engine.set_handwriting_confidence_threshold(confidence_threshold)
+
+        # Perform recognition
+        result = engine.recognize(image_path)
+
+        # Capture output
+        return result.text, result.confidence
+    finally:
+        # Clean up, even if recognition raises
+        engine.dispose()
+
+if __name__ == "__main__":
+    # Replace with your actual file location
+    img_path = "YOUR_DIRECTORY/handwritten_sample.jpg"
+
+    text, conf = recognize_handwriting(img_path)
+
+    print("Hand‑written extraction:")
+    print(text)
+    print("Overall confidence:", conf)
+```
+
+Save it as `handwritten_ocr.py` and run `python handwritten_ocr.py`. If everything is set up correctly, you will see the extracted text printed to the console.
+
+---
+
+## Handling Edge Cases and Common Variations
+
+### Low‑Contrast Images
+If the background bleeds into the ink, boost the contrast first:
+
+```python
+from PIL import Image, ImageEnhance
+
+img = Image.open(input_image_path)
+enhancer = ImageEnhance.Contrast(img)
+high_contrast = enhancer.enhance(2.0)  # 2× contrast
+high_contrast.save("temp_high_contrast.jpg")
+result = ocr_engine.recognize("temp_high_contrast.jpg")
+```
+
+### Rotated Notes
+A tilted notebook page can throw off recognition. Use OpenCV (the `opencv-python` package, together with NumPy) to deskew it:
+
+```python
+import numpy as np
+import cv2
+
+def deskew(image_path):
+    img = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
+    # Binarize with ink as white so only text pixels drive the skew estimate
+    _, ink = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY_INV | cv2.THRESH_OTSU)
+    coords = np.column_stack(np.where(ink > 0))
+    # Angle handling below assumes the [-90, 0) range of older OpenCV releases
+    angle = cv2.minAreaRect(coords)[-1]
+    if angle < -45:
+        angle = -(90 + angle)
+    else:
+        angle = -angle
+    (h, w) = img.shape[:2]
+    M = cv2.getRotationMatrix2D((w // 2, h // 2), angle, 1.0)
+    corrected = cv2.warpAffine(img, M, (w, h), flags=cv2.INTER_CUBIC, borderMode=cv2.BORDER_REPLICATE)
+    cv2.imwrite("deskewed.jpg", corrected)
+
+deskew(input_image_path)
+result = ocr_engine.recognize("deskewed.jpg")
+```
+
+### Multi‑Page PDFs
+Aspose OCR can also handle PDF pages, but you first need to convert each page to an image (e.g., using `pdf2image`). Then loop over the images with the same `recognize_handwriting` function.
+
+---
+
+## Tips for Better **Extract Handwritten Text** Results
+
+- **DPI matters:** Aim for 300 DPI or more when scanning.
+- **Avoid colored backgrounds:** Plain white or light gray gives the cleanest result.
+- **Batch processing:** Wrap the function in a `for` loop and log the confidence of each page; discard results below a threshold to keep quality high.
+- **Language support:** Aspose OCR supports many languages; set `engine.set_language("en")` to optimize for English only.
+
+---
+
+## Frequently Asked Questions
+
+**Does this work on Linux?**
+Yes. Aspose OCR ships native binaries for Windows, macOS, and Linux. Install the pip package and you are ready to go.
+
+**What if my handwriting is extremely cursive?**
+Try lowering the confidence threshold (`0.5` or even `0.4`). Keep in mind that this can introduce more noise, so post‑process the result (e.g., with a spell checker) if needed.
+
+**Can I use this in a web service?**
+Yes. The `recognize_handwriting` function is stateless, which makes it a good fit for Flask or FastAPI endpoints. It creates and disposes of its own engine on every call; if you share one engine across requests instead, remember to call `dispose()` when you are finished with it, or wrap it in a context manager.
+
+---
+
+## Conclusion
+
+We covered **how to recognize handwriting** in Python from start to finish, showing how to **extract handwritten text**, tune the confidence settings, and deal with common problems such as low contrast or rotated pages. The full script above is ready to run, and its single‑function design makes it easy to embed in larger projects, whether you are building a note‑taking app, digitizing archives, or experimenting with **handwritten ocr tutorial python** techniques.
+
+As a next step, you could explore **handwritten text recognition python** for multilingual notes, or combine OCR with natural language processing to summarize meeting minutes automatically. Give it a try and let your code bring handwritten notes to life.
+
+Happy coding, and feel free to leave your questions in the comments!
+
+{{< /blocks/products/pf/tutorial-page-section >}}
+{{< /blocks/products/pf/main-container >}}
+{{< /blocks/products/pf/main-wrap-class >}}
+{{< blocks/products/products-backtop-button >}}
\ No newline at end of file
diff --git a/ocr/greek/python/general/how-to-run-ocr-with-aspose-hugging-face-complete-guide/_index.md b/ocr/greek/python/general/how-to-run-ocr-with-aspose-hugging-face-complete-guide/_index.md
new file mode 100644
index 000000000..cee7c9c23
--- /dev/null
+++ b/ocr/greek/python/general/how-to-run-ocr-with-aspose-hugging-face-complete-guide/_index.md
@@ -0,0 +1,181 @@
+---
+category: general
+date: 2026-04-29
+description: Learn how to run OCR on your scans, use a Hugging Face model
+  automatically, and recognize text from scans with Aspose OCR
+  in a few minutes.
+draft: false
+keywords:
+- how to run OCR
+- use hugging face model
+- recognize text from scans
+- download model automatically
+language: el
+og_description: How to run OCR on scans using Aspose OCR, download a
+  Hugging Face model automatically, and get clean, punctuated text.
+og_title: How to Run OCR with Aspose & Hugging Face – Complete Guide
+tags:
+- OCR
+- Aspose
+- Hugging Face
+- Python
+title: How to Run OCR with Aspose & Hugging Face – Complete Guide
+url: /el/python/general/how-to-run-ocr-with-aspose-hugging-face-complete-guide/
+---
+
+{{< blocks/products/pf/main-wrap-class >}}
+{{< blocks/products/pf/main-container >}}
+{{< blocks/products/pf/tutorial-page-section >}}
+
+# How to Run OCR with Aspose & Hugging Face – Complete Guide
+
+Have you ever wondered **how to run OCR** on a stack of scanned documents without spending hours tuning parameters? You are not alone. In many projects, developers need **to recognize text from scans** quickly, but struggle with downloading models and post‑processing the results.
+
+Good news: this tutorial shows a ready‑made solution that **uses a Hugging Face model**, downloads it automatically, and adds punctuation so the output reads as if a person had written it. By the end, you will have a script that processes every image in a folder and creates a clean `.txt` file next to each scan.
+
+## What You'll Need
+
+- Python 3.8+ (the code uses f‑strings, which require Python 3.6 or later; 3.8+ is recommended)
+- The `aspose-ocr` package (install with `pip install aspose-ocr`)
+- Internet access to download the model the first time
+- A folder of scanned images (`.png`, `.jpg`, or `.tif`)
+
+That's all: no extra binaries and no manual model setup. Let's dive in.
+
+![OCR run example](https://example.com/ocr-demo.png "OCR run example")
+
+## Step 1: Import the Aspose OCR Classes & Set Up the Environment
+
+We start by loading the required classes from the Aspose OCR library. Importing everything up front keeps the script tidy and makes missing dependencies easy to spot.
+
+```python
+# Step 1: Import Aspose OCR classes
+import os
+from aspose.ocr import OcrEngine, AsposeAI, AsposeAIModelConfig
+```
+
+*Why it matters*: `OcrEngine` does the heavy lifting, while `AsposeAI` lets us plug in a large language model for smarter post‑processing. If you skip the import, the rest of the code fails with a `NameError`, so do not forget it.
+
+## Step 2: Configure the Hugging Face Model with GPU Support
+
+Now we tell Aspose where to download the model and how many layers should run on the GPU. The `allow_auto_download="true"` flag takes care of **downloading the model automatically** for you.
+
+```python
+# Step 2: Configure a GPU‑aware AI model (replace with your own model folder)
+model_config = AsposeAIModelConfig(
+    allow_auto_download="true",
+    hugging_face_repo_id="Qwen/Qwen2.5-3B-Instruct-GGUF",
+    hugging_face_quantization="int8",
+    gpu_layers=40,  # use GPU for faster inference
+    directory_model_path=r"YOUR_DIRECTORY/models"
+)
+```
+
+> **Tip**: If you do not have a GPU, set `gpu_layers=0`. The model falls back to the CPU, which is slower but works.
+
+### Why Choose a Hugging Face Model?
+
+Hugging Face hosts a huge collection of ready‑made LLMs. By pointing at `Qwen/Qwen2.5-3B-Instruct-GGUF`, you get a compact, instruction‑tuned model that can add punctuation, fix spacing, and even correct minor OCR errors. That is what **using a hugging face model** means in practice.
+
+## Step 3: Initialize the AI Engine and Enable Punctuation Post‑Processing
+
+The AI engine is not only for chat; here we add a *punctuation adder* that cleans up the raw OCR result.
+
+```python
+# Step 3: Initialise the AI engine and enable punctuation post‑processing
+ai_engine = AsposeAI()
+ai_engine.set_post_processor("punctuation_adder", {})
+```
+
+*What is happening?* The `set_post_processor` call registers a built‑in post‑processor that runs after the OCR engine finishes. It takes the raw text and inserts commas, periods, and capital letters where needed, which makes the final text much more readable.
+
+## Step 4: Create the OCR Engine and Attach the AI Engine
+
+Attaching the AI engine to the OCR engine gives us a single object that can both read characters and polish the result.
+
+```python
+# Step 4: Create the OCR engine and attach the AI engine
+ocr_engine = OcrEngine()
+ocr_engine.set_ai_engine(ai_engine)
+```
+
+If you skip this step, OCR still works, but you lose the punctuation benefit, so the output looks like one long chain of words.
+
+## Step 5: Process Every Image in a Folder
+
+This is the heart of the tutorial. We iterate over each image, run OCR, apply the post‑processor, and write the cleaned text to a sidecar `.txt` file.
+
+```python
+# Step 5: Run OCR on each image in a folder, post‑process the result, and save the text
+scans_folder = r"YOUR_DIRECTORY/scans"
+for image_file in os.listdir(scans_folder):
+    # Filter only supported image types
+    if not image_file.lower().endswith(('.png', '.jpg', '.tif')):
+        continue
+
+    image_path = os.path.join(scans_folder, image_file)
+
+    # Recognise text from the image
+    ocr_result = ocr_engine.recognize(image_path)
+
+    # Apply the punctuation post‑processor
+    ocr_result = ocr_engine.run_postprocessor(ocr_result)
+
+    # Show a brief confidence summary
+    print(f"{image_file} – confidence {ocr_result.confidence:.2%}")
+
+    # Save the cleaned text next to the source image
+    txt_path = image_path + ".txt"
+    with open(txt_path, "w", encoding="utf-8") as txt_file:
+        txt_file.write(ocr_result.text)
+```
+
+### What to Expect
+
+Running the script prints something like:
+
+```
+invoice_001.png – confidence 96.73%
+receipt_2024.tif – confidence 94.12%
+```
+
+Each line shows the confidence score (a quick health check), and the script creates `invoice_001.png.txt`, `receipt_2024.tif.txt`, and so on, containing punctuated, human‑readable text.
+
+### Edge Cases & Variations
+
+- **Non‑English scans**: Change `hugging_face_repo_id` to a multilingual model (e.g., `microsoft/Multilingual-LLM-GGUF`).
+- **Large batches**: Wrap the loop in a `concurrent.futures.ThreadPoolExecutor` for parallel processing, but watch the GPU memory limits.
+- **Custom post‑processing**: Replace `"punctuation_adder"` with your own script if you need specialized cleanup (e.g., stripping invoice numbers).
+
+## Step 6: Clean Up Resources
+
+When the job is finished, releasing resources prevents memory leaks, which is especially important if you run this inside a long‑running service.
+
+```python
+# Step 6: Release resources
+ai_engine.free_resources()
+ocr_engine.dispose()
+```
+
+Skipping this step can leave GPU memory locked, which will block subsequent runs.
+
+## Recap: How to Run OCR End to End
+
+In just a few lines, we showed **how to run OCR** on a folder of scans, **use a Hugging Face model** that is downloaded automatically the first time, and **recognize text from scans** with punctuation added automatically. The full script is ready to copy and paste: adjust the paths and run it.
+
+## Next Steps & Related Topics
+
+- **Bulk post‑processing**: Explore `ocr_engine.run_batch_postprocessor` for even faster bulk handling.
+- **Alternative models**: Try the `openai/whisper` family if you need speech‑to‑text alongside OCR.
+- **Database integration**: Store the extracted text in SQLite or Elasticsearch for searchable archives.
+
+Feel free to experiment: swap the model, tune `gpu_layers`, or add your own post‑processor. The flexibility of Aspose OCR combined with the Hugging Face model hub makes this solution a versatile base for any document digitization project.
+
+---
+
+*Happy coding! If you run into a problem, leave a comment below or check the Aspose OCR documentation for more advanced configuration options.*
+
+{{< /blocks/products/pf/tutorial-page-section >}}
+{{< /blocks/products/pf/main-container >}}
+{{< /blocks/products/pf/main-wrap-class >}}
+{{< blocks/products/products-backtop-button >}}
\ No newline at end of file
diff --git a/ocr/greek/python/general/perform-ocr-on-image-with-python-complete-guide/_index.md b/ocr/greek/python/general/perform-ocr-on-image-with-python-complete-guide/_index.md
new file mode 100644
index 000000000..c665b19bb
--- /dev/null
+++ b/ocr/greek/python/general/perform-ocr-on-image-with-python-complete-guide/_index.md
@@ -0,0 +1,208 @@
+---
+category: general
+date: 2026-04-29
+description: Perform OCR on image using Python, download a HuggingFace model
+  automatically, and release GPU memory efficiently while cleaning the OCR text.
+draft: false
+keywords:
+- perform OCR on image
+- download HuggingFace model python
+- release GPU memory python
+- clean OCR text python
+language: el
+og_description: Learn how to perform OCR on image with Python, automatically download
+  a HuggingFace model, clean the text, and free GPU memory.
+og_title: Perform OCR on Image with Python – Step‑by‑Step Guide
+tags:
+- OCR
+- Python
+- Aspose
+- HuggingFace
+title: Perform OCR on Image with Python – Complete Guide
+url: /el/python/general/perform-ocr-on-image-with-python-complete-guide/
+---
+
+{{< blocks/products/pf/main-wrap-class >}}
+{{< blocks/products/pf/main-container >}}
+{{< blocks/products/pf/tutorial-page-section >}}
+
+# Perform OCR on Image with Python – Complete Guide
+
+Have you ever needed to **perform OCR on image** files but got stuck at the model download or GPU memory cleanup stage? You are not the only one: many developers hit this obstacle when they first try to combine optical character recognition with large language models.
+ +Σε αυτό το tutorial θα περάσουμε βήμα‑βήμα από μια ολοκληρωμένη λύση που **κατεβάζει ένα μοντέλο HuggingFace σε Python**, εκτελεί Aspose OCR, καθαρίζει την ακατέργαστη έξοδο, και τελικά **απελευθερώνει τη μνήμη GPU** που η Python μπορεί να ανακτήσει. Στο τέλος θα έχετε ένα έτοιμο προς εκτέλεση script που μετατρέπει ένα σαρωμένο PNG σε επεξεργασμένο, αναζητήσιμο κείμενο. + +> **Τι θα λάβετε:** ένα πλήρες, εκτελέσιμο δείγμα κώδικα, εξηγήσεις για το γιατί κάθε βήμα είναι σημαντικό, συμβουλές για την αποφυγή κοινών παγίδων, και μια ματιά στο πώς να προσαρμόσετε τη ροή εργασίας για τα δικά σας έργα. + +--- + +## Τι Θα Χρειαστείτε + +- Python 3.9 ή νεότερη (το παράδειγμα δοκιμάστηκε σε 3.11) +- Πακέτο `aspose-ocr` (εγκατάσταση μέσω `pip install aspose-ocr`) +- Σύνδεση στο διαδίκτυο για το βήμα **download HuggingFace model python** +- GPU συμβατό με CUDA αν θέλετε την επιτάχυνση (προαιρετικό αλλά συνιστάται) + +Δεν απαιτούνται επιπλέον εξαρτήσεις σε επίπεδο συστήματος· η μηχανή OCR της Aspose περιλαμβάνει όλα όσα χρειάζεστε. + +--- + +![παράδειγμα εκτέλεσης OCR σε εικόνα](image.png "Παράδειγμα εκτέλεσης OCR σε εικόνα με Aspose OCR και έναν επεξεργαστή LLM") + +*Image alt text: “εκτέλεση OCR σε εικόνα – Έξοδος Aspose OCR πριν και μετά τον καθαρισμό AI”* + +--- + +## Εκτέλεση OCR σε Εικόνα – Επισκόπηση Βήμα‑προς‑Βήμα + +Παρακάτω χωρίζουμε τη ροή εργασίας σε λογικά τμήματα. Κάθε τμήμα έχει τη δική του επικεφαλίδα, ώστε οι βοηθοί AI να μπορούν γρήγορα να μεταβούν στο τμήμα που σας ενδιαφέρει, και οι μηχανές αναζήτησης να ευρετηριάσουν τις σχετικές λέξεις‑κλειδιά. + +### 1. Download HuggingFace Model in Python + +Το πρώτο που πρέπει να κάνουμε είναι να κατεβάσουμε ένα μοντέλο γλώσσας που θα λειτουργήσει ως post‑processor για την ακατέργαστη έξοδο OCR. Η Aspose OCR παρέχει μια βοηθητική κλάση που ονομάζεται `AsposeAI` η οποία μπορεί αυτόματα να τραβήξει ένα μοντέλο από το HuggingFace hub. 
+ +```python +import aspose.ocr as aocr +from aspose.ocr import AsposeAI, AsposeAIModelConfig, OcrEngine + +# Configure the model – it will auto‑download the first time you run it +model_config = AsposeAIModelConfig( + allow_auto_download="true", # <-- enables auto‑download + hugging_face_repo_id="Qwen/Qwen2.5-3B-Instruct-GGUF", + hugging_face_quantization="int8", # smaller footprint on GPU + gpu_layers=20, # how many layers stay on GPU + directory_model_path=r"YOUR_DIRECTORY" # where the model files live +) +``` + +**Γιατί είναι σημαντικό:** +- **download HuggingFace model python** – αποφεύγετε τη χειροκίνητη διαχείριση αρχείων zip ή την αυθεντικοποίηση με token. +- Η χρήση ποσοτικοποίησης `int8` μειώνει το μέγεθος του μοντέλου περίπου στο ένα τέταρτο του αρχικού, κάτι κρίσιμο όταν αργότερα χρειαστεί να **release GPU memory python**. + +> **Pro tip:** Κρατήστε το `directory_model_path` σε SSD για ταχύτερους χρόνους φόρτωσης. + +--- + +### 2. Initialise the AI Helper and Enable Spell‑Checking + +Τώρα δημιουργούμε ένα στιγμιότυπο `AsposeAI` και προσθέτουμε έναν post‑processor ελέγχου ορθογραφίας. Εδώ αρχίζει η μαγεία του **clean OCR text python**. + +```python +# Initialise the AI helper +ai_helper = AsposeAI() +ai_helper.set_post_processor( + processor="spell_corrector", + custom_settings={"max_edits": 2} # allows up to two character edits per word +) +``` + +**Εξήγηση:** +Ο ελεγκτής ορθογραφίας εξετάζει κάθε token από τη μηχανή OCR και προτείνει διορθώσεις περιορισμένες από το `max_edits`. Αυτή η μικρή βελτίωση μπορεί να μετατρέψει το “rec0gn1tion” σε “recognition” χωρίς τη χρήση βαρύ μοντέλου γλώσσας. + +--- + +### 3. Hook the AI Helper into the OCR Engine + +Η Aspose εισήγαγε μια νέα μέθοδο στην έκδοση 23.4 που επιτρέπει την ενσωμάτωση μιας AI μηχανής απευθείας στην αλυσίδα OCR. 
+ +```python +# Initialise the OCR engine and attach the AI helper +ocr_engine = OcrEngine() +ocr_engine.set_ai_engine(ai_helper) # new in v23.4 +``` + +**Γιατί το κάνουμε:** +Συνδέοντας τον AI helper νωρίς, η μηχανή OCR μπορεί προαιρετικά να χρησιμοποιήσει το μοντέλο για βελτιώσεις σε πραγματικό χρόνο (π.χ., ανίχνευση διάταξης). Επιπλέον, διατηρεί τον κώδικα καθαρό: δεν χρειάζονται ξεχωριστοί βρόχοι post‑processing αργότερα. + +--- + +### 4. Perform OCR on the Scanned Image + +Αυτό είναι το κεντρικό βήμα που πραγματικά **perform OCR on image** αρχεία. Αντικαταστήστε το `YOUR_DIRECTORY/input.png` με τη διαδρομή του δικού σας σαρωμένου αρχείου. + +```python +image_path = r"YOUR_DIRECTORY/input.png" +ocr_result = ocr_engine.recognize(image_path) + +print("Raw OCR text:") +print(ocr_result.text) +``` + +Η τυπική ακατέργαστη έξοδος μπορεί να περιέχει αλλαγές γραμμής σε περίεργα σημεία, λανθασμένους χαρακτήρες ή άσχετα σύμβολα. Γι’ αυτό χρειάζεται το επόμενο βήμα. + +**Αναμενόμενη ακατέργαστη έξοδος (παράδειγμα):** + +``` +Th1s 1s 4n ex4mpl3 0f r4w OCR t3xt. +It c0ntains numb3rs 123 and s0me m1stakes. +``` + +--- + +### 5. Clean OCR Text in Python with the AI Post‑Processor + +Τώρα αφήνουμε το AI να καθαρίσει το χάος. Αυτό είναι η καρδιά της διαδικασίας **clean OCR text python**. + +```python +cleaned_result = ocr_engine.run_postprocessor(ocr_result) + +print("\nAI‑enhanced text:") +print(cleaned_result.text) +``` + +**Αποτέλεσμα που θα δείτε:** + +``` +This is an example of raw OCR text. +It contains numbers 123 and some mistakes. +``` + +Παρατηρήστε πώς ο ελεγκτής ορθογραφίας διόρθωσε το “Th1s” → “This” και το “4n” → “an”, ενώ άφησε ανέγγιχτο τον πραγματικό αριθμό 123. Το μοντέλο επίσης ομαλοποιεί τα κενά, κάτι που συχνά αποτελεί πρόβλημα όταν αργότερα τροφοδοτείτε το κείμενο σε downstream pipelines NLP. + +--- + +### 6. Release GPU Memory in Python – Clean‑up Steps + +Όταν τελειώσετε, είναι καλή πρακτική να ελευθερώσετε τους πόρους GPU, ειδικά αν τρέχετε πολλαπλές εργασίες OCR σε μια μακροχρόνια υπηρεσία. 
+ +```python +# Release resources – crucial for GPU memory +ai_helper.free_resources() +ocr_engine.dispose() +``` + +**Τι συμβαίνει στο παρασκήνιο:** +Η `free_resources()` αποφορτώνει το μοντέλο από το GPU, επιστρέφοντας τη μνήμη στον οδηγό CUDA. Η `dispose()` αποδεσμεύει τους εσωτερικούς πόρους της μηχανής OCR. Η παράλειψη αυτών των κλήσεων μπορεί να οδηγήσει σε σφάλματα έλλειψης μνήμης μετά από λίγες μόνο εικόνες. + +> **Θυμηθείτε:** Αν σκοπεύετε να επεξεργαστείτε παρτίδες σε βρόχο, καλέστε το clean‑up μετά από κάθε παρτίδα ή επαναχρησιμοποιήστε το ίδιο `ai_helper` χωρίς ελευθέρωση μέχρι το πολύ τέλος. + +--- + +## Bonus: Προσαρμογή της Ροής Εργασίας για Διαφορετικά Σενάρια + +### Ρύθμιση Ποσοτικοποίησης Μοντέλου + +Αν διαθέτετε ισχυρό GPU (π.χ., RTX 4090) και θέλετε μεγαλύτερη ακρίβεια, αλλάξτε το `hugging_face_quantization` σε `"fp16"` και αυξήστε το `gpu_layers` σε `30`. Αυτό θα καταναλώσει περισσότερη μνήμη, οπότε θα χρειαστεί να **release GPU memory python** πιο επιθετικά μετά από κάθε παρτίδα. + +### Χρήση Προσαρμοσμένου Ελεγκτή Ορθογραφίας + +Μπορείτε να αντικαταστήσετε το ενσωματωμένο `spell_corrector` με έναν προσαρμοσμένο post‑processor που κάνει διορθώσεις ειδικές για το πεδίο σας (π.χ., ιατρική ορολογία). Απλώς υλοποιήστε την απαιτούμενη διεπαφή και περάστε το όνομά του στο `set_post_processor`. + +### Επεξεργασία Πολλών Εικόνων σε Παρτίδες + +Τυλίξτε τα βήματα OCR μέσα σε έναν βρόχο `for`, συγκεντρώστε τα `cleaned_result.text` σε λίστα, και καλέστε το `ai_helper.free_resources()` μόνο μετά το βρόχο αν έχετε αρκετή μνήμη GPU. Αυτό αποφεύγει το κόστος της επαναλαμβανόμενης φόρτωσης του μοντέλου. + +--- + +## Συμπέρασμα + +Σας δείξαμε πώς να **perform OCR on image** αρχεία σε Python, να **κατεβάσετε αυτόματα ένα μοντέλο HuggingFace**, να **καθαρίσετε το OCR κείμενο**, και να **απελευθερώσετε τη μνήμη GPU** με ασφάλεια όταν τελειώσετε. 
Το πλήρες script είναι έτοιμο για αντιγραφή‑επικόλληση, και οι εξηγήσεις σας δίνουν την αυτοπεποίθηση να το προσαρμόσετε σε μεγαλύτερα έργα. + +Τι ακολουθεί; Δοκιμάστε να αντικαταστήσετε το μοντέλο Qwen 2.5 με μια μεγαλύτερη παραλλαγή LLaMA, πειραματιστείτε με διαφορετικούς post‑processors, ή ενσωματώστε την καθαρισμένη έξοδο σε έναν αναζητήσιμο δείκτη Elasticsearch. Οι δυνατότητες είναι ατελείωτες, και τώρα έχετε μια σταθερή βάση για να χτίσετε πάνω της. + +Καλό coding, και εύχομαι οι OCR αλυσίδες σας να είναι πάντα καθαρές και φιλικές προς τη μνήμη! + +{{< /blocks/products/pf/tutorial-page-section >}} +{{< /blocks/products/pf/main-container >}} +{{< /blocks/products/pf/main-wrap-class >}} +{{< blocks/products/products-backtop-button >}} \ No newline at end of file diff --git a/ocr/hindi/python/general/extract-text-from-pdf-ocr-pdf-with-python/_index.md b/ocr/hindi/python/general/extract-text-from-pdf-ocr-pdf-with-python/_index.md new file mode 100644 index 000000000..262bc551f --- /dev/null +++ b/ocr/hindi/python/general/extract-text-from-pdf-ocr-pdf-with-python/_index.md @@ -0,0 +1,217 @@ +--- +category: general +date: 2026-04-29 +description: Aspose OCR का उपयोग करके Python में PDF से टेक्स्ट निकालें। बैच OCR PDF + प्रोसेसिंग सीखें, स्कैन किए गए PDF टेक्स्ट को बदलें, और कम‑विश्वास वाले पृष्ठों + को संभालें। +draft: false +keywords: +- extract text from pdf +- ocr pdf with python +- convert scanned pdf text +- batch ocr pdf processing +language: hi +og_description: Python में Aspose OCR के साथ PDF से टेक्स्ट निकालें। यह गाइड बैच OCR + PDF प्रोसेसिंग, स्कैन किए गए PDF टेक्स्ट को बदलना, और कम‑विश्वास परिणामों को संभालना + दिखाता है। +og_title: PDF से टेक्स्ट निकालें – Python के साथ OCR PDF +tags: +- OCR +- Python +- PDF processing +title: PDF से टेक्स्ट निकालें – Python के साथ OCR PDF +url: /hi/python/general/extract-text-from-pdf-ocr-pdf-with-python/ +--- + +{{< blocks/products/pf/main-wrap-class >}} +{{< blocks/products/pf/main-container >}} +{{< 
blocks/products/pf/tutorial-page-section >}} + +# PDF से टेक्स्ट निकालें – Python के साथ OCR PDF + +क्या आपको कभी **PDF से टेक्स्ट निकालने** की ज़रूरत पड़ी है लेकिन फ़ाइल सिर्फ एक स्कैन की हुई इमेज है? आप अकेले नहीं हैं—कई डेवलपर्स इस समस्या का सामना करते हैं जब वे PDFs को सर्चेबल डेटा में बदलने की कोशिश करते हैं। अच्छी खबर? Aspose OCR for Python के साथ आप कुछ लाइनों में स्कैन किए गए PDF टेक्स्ट को कन्वर्ट कर सकते हैं, और जब आपके पास दर्जनों फ़ाइलें हों तो **batch OCR PDF processing** भी चला सकते हैं। + +इस ट्यूटोरियल में हम पूरे वर्कफ़्लो को कवर करेंगे: लाइब्रेरी सेटअप, एकल PDF पर OCR चलाना, बैच में स्केल करना, और लो‑कन्फिडेंस पेज़ेस को हैंडल करना ताकि आपको पता चल सके कब मैन्युअल रिव्यू की ज़रूरत है। अंत तक आपके पास एक तैयार‑टू‑रन स्क्रिप्ट होगी जो किसी भी स्कैन किए गए PDF से टेक्स्ट निकालती है, और आप प्रत्येक स्टेप के पीछे का कारण समझ पाएँगे। + +## आपको क्या चाहिए + +- Python 3.8 या नया (कोड f‑strings का उपयोग करता है, इसलिए 3.6+ काम करता है, लेकिन 3.8+ की सलाह दी जाती है) +- Aspose OCR for Python लाइसेंस या एक फ्री ट्रायल की (आप इसे Aspose वेबसाइट से प्राप्त कर सकते हैं) +- एक फ़ोल्डर जिसमें एक या अधिक स्कैन किए गए PDFs हों जिन्हें आप प्रोसेस करना चाहते हैं +- जनरेट किए गए *.txt* रिपोर्टों के लिए पर्याप्त डिस्क स्पेस + +बस इतना ही—कोई भारी बाहरी डिपेंडेंसी नहीं, कोई OpenCV जिम्नास्टिक नहीं। Aspose OCR इंजन आपके लिए भारी काम संभालता है। + +## पर्यावरण सेटअप + +First, install the Aspose OCR package from PyPI: + +```bash +pip install aspose-ocr +``` + +If you have a license file (`Aspose.OCR.lic`), place it in your project root and activate it like so: + +```python +# Activate Aspose OCR license (optional but removes trial watermark) +from aspose.ocr import License + +license = License() +license.set_license("Aspose.OCR.lic") +``` + +> **Pro tip:** लाइसेंस फ़ाइल को संस्करण नियंत्रण से बाहर रखें; इसे `.gitignore` में जोड़ें ताकि आकस्मिक एक्सपोज़र से बचा जा सके। + +## एकल PDF पर OCR करना + +अब चलिए एक स्कैन किए गए PDF से टेक्स्ट निकालते हैं। मुख्य चरण हैं: + +1. 
`OcrEngine` का एक इंस्टेंस बनाएं। +2. इसे PDF फ़ाइल की ओर इंगित करें। +3. प्रत्येक पेज के लिए `OcrResult` प्राप्त करें। +4. सादा‑टेक्स्ट आउटपुट को डिस्क पर लिखें। +5. इंजन को डिस्पोज़ करें ताकि नेटिव रिसोर्सेज़ मुक्त हो सकें। + +Here’s the full, runnable script: + +```python +# Step 1: Import the OCR engine and create an instance +from aspose.ocr import OcrEngine + +# Optional: activate license (see previous section) +# from aspose.ocr import License +# License().set_license("Aspose.OCR.lic") + +ocr_engine = OcrEngine() + +# Step 2: Specify the PDF file to be processed +pdf_file_path = r"YOUR_DIRECTORY/input.pdf" + +# Step 3: Extract OCR results – one OcrResult object per page +ocr_pages = ocr_engine.extract_from_pdf(pdf_file_path) + +# Step 4: Iterate through the results, show confidence, flag low‑confidence pages, and save the plain text +for ocr_page in ocr_pages: + print(f"Page {ocr_page.page_number}: confidence {ocr_page.confidence:.2%}") + + # If confidence is below 80 %, flag it for manual review + if ocr_page.confidence < 0.80: + print(" Low confidence – consider manual review") + + # Build the output filename + output_txt = f"YOUR_DIRECTORY/Report_page{ocr_page.page_number}.txt" + with open(output_txt, "w", encoding="utf-8") as f: + f.write(ocr_page.text) + +# Step 5: Release resources used by the engine +ocr_engine.dispose() +``` + +**What you’ll see:** For each page the script prints something like `Page 1: confidence 97.45%`. If a page falls under the 80 % threshold, a warning appears, letting you know that the OCR might have missed characters. 
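The confidence check in the script above can be tried on its own, without the OCR engine. The sketch below is only an illustration: `PageResult`, `format_confidence`, and `flag_low_confidence` are hypothetical names, not part of Aspose OCR. It assumes just what the script assumes, namely that each page carries a page number and a confidence expressed as a fraction between 0 and 1.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PageResult:
    """Hypothetical stand-in for one OCR page result (not an Aspose class)."""
    page_number: int
    confidence: float  # assumed to be a fraction between 0.0 and 1.0
    text: str = ""


def format_confidence(page: PageResult) -> str:
    """Build the same summary line the tutorial script prints for a page."""
    return f"Page {page.page_number}: confidence {page.confidence:.2%}"


def flag_low_confidence(pages: List[PageResult], threshold: float = 0.80) -> List[int]:
    """Return the page numbers whose confidence is strictly below the threshold."""
    return [p.page_number for p in pages if p.confidence < threshold]


if __name__ == "__main__":
    # Made-up sample values, used only to exercise the helpers
    sample = [PageResult(1, 0.9745), PageResult(2, 0.62), PageResult(3, 0.80)]
    for page in sample:
        print(format_confidence(page))
    print("Needs manual review:", flag_low_confidence(sample))
```

Because the comparison is a strict `<`, a page sitting exactly at the threshold is not flagged. If borderline pages should also go to manual review, switching to `<=` or raising the threshold slightly would be the change to make.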
+ +### यह क्यों काम करता है + +- **`OcrEngine`** नेटिव Aspose OCR लाइब्रेरी का गेटवे है; यह इमेज प्री‑प्रोसेसिंग से लेकर कैरेक्टर रिकग्निशन तक सब कुछ संभालता है। +- **`extract_from_pdf`** प्रत्येक PDF पेज को स्वचालित रूप से रास्टराइज़ करता है, इसलिए आपको PDF को इमेज में बदलने की ज़रूरत नहीं है। +- **Confidence स्कोर** आपको क्वालिटी चेक्स ऑटोमेट करने देते हैं—क़ानूनी या मेडिकल दस्तावेज़ों को प्रोसेस करते समय जहाँ सटीकता महत्वपूर्ण है, यह आवश्यक है। + +## Python के साथ बैच OCR PDF प्रोसेसिंग + +अधिकांश रियल‑वर्ल्ड प्रोजेक्ट्स में एक से अधिक फ़ाइलें होती हैं। चलिए सिंगल‑फ़ाइल स्क्रिप्ट को **batch OCR PDF processing** पाइपलाइन में विस्तारित करते हैं जो एक डायरेक्टरी को ट्रैवर्स करती है, प्रत्येक PDF को प्रोसेस करती है, और परिणामों को मिलते‑जुलते सब‑फ़ोल्डर में स्टोर करती है। + +```python +import os +from pathlib import Path +from aspose.ocr import OcrEngine + +def ocr_pdf_file(pdf_path: Path, output_dir: Path, engine: OcrEngine, confidence_threshold: float = 0.80): + """Extract text from a single PDF and write per‑page .txt files.""" + ocr_pages = engine.extract_from_pdf(str(pdf_path)) + for page in ocr_pages: + print(f"{pdf_path.name} – Page {page.page_number}: {page.confidence:.2%}") + if page.confidence < confidence_threshold: + print(" ⚠️ Low confidence – manual review may be needed") + + out_file = output_dir / f"{pdf_path.stem}_page{page.page_number}.txt" + out_file.write_text(page.text, encoding="utf-8") + +def batch_ocr_pdf(input_folder: str, output_folder: str): + """Iterate over all PDFs in input_folder and run OCR.""" + engine = OcrEngine() + input_path = Path(input_folder) + output_path = Path(output_folder) + output_path.mkdir(parents=True, exist_ok=True) + + pdf_files = list(input_path.glob("*.pdf")) + if not pdf_files: + print("🚫 No PDF files found in the specified folder.") + return + + for pdf_file in pdf_files: + # Create a sub‑folder for each PDF to keep pages organized + pdf_out_dir = output_path / pdf_file.stem + pdf_out_dir.mkdir(exist_ok=True) + 
ocr_pdf_file(pdf_file, pdf_out_dir, engine) + + # Clean up native resources + engine.dispose() + print("✅ Batch OCR completed.") + +# Example usage: +if __name__ == "__main__": + batch_ocr_pdf("YOUR_DIRECTORY/input_pdfs", "YOUR_DIRECTORY/ocr_output") +``` + +#### यह कैसे मदद करता है + +- **Scalability:** फ़ंक्शन फ़ोल्डर को एक बार ट्रैवर्स करता है, प्रत्येक PDF के लिए एक समर्पित आउटपुट सब‑फ़ोल्डर बनाता है। यह तब चीज़ों को व्यवस्थित रखता है जब आपके पास दर्जनों दस्तावेज़ हों। +- **Reusability:** `ocr_pdf_file` को अन्य स्क्रिप्ट्स (जैसे, वेब सर्विस) से कॉल किया जा सकता है क्योंकि यह एक शुद्ध फ़ंक्शन है। +- **Error handling:** यदि इनपुट फ़ोल्डर खाली है तो स्क्रिप्ट एक दोस्ताना संदेश प्रिंट करती है, जिससे आप साइलेंट फेल्योर से बचते हैं। + +## स्कैन किए गए PDF टेक्स्ट को कन्वर्ट करना – एज केस हैंडलिंग + +जबकि ऊपर दिया गया कोड अधिकांश PDFs के लिए काम करता है, आपको कुछ क्विर्क्स का सामना हो सकता है: + +| स्थिति | क्यों होता है | कैसे निपटें | +|-----------|----------------|-----------------| +| **Encrypted PDFs** | PDF पासवर्ड‑प्रोटेक्टेड है। | पासवर्ड को `extract_from_pdf(pdf_path, password="yourPwd")` में पास करें। | +| **Multi‑language documents** | Aspose OCR डिफ़ॉल्ट रूप से अंग्रेज़ी पर सेट है। | `ocr_engine.language = "spa"` सेट करें स्पेनिश के लिए, या मिश्रित भाषाओं के लिए सूची प्रदान करें। | +| **Very large PDFs (>500 pages)** | प्रत्येक पेज RAM में लोड होने के कारण मेमोरी उपयोग बढ़ जाता है। | `engine.extract_from_pdf(pdf_path, start_page=1, end_page=100)` का उपयोग करके PDF को चंक्स में प्रोसेस करें और लूप करें। | +| **Poor scan quality** | कम DPI या अधिक शोर से confidence कम हो जाता है। | `engine.image_preprocessing = True` के साथ PDF को प्री‑प्रोसेस करें या DPI को `engine.dpi = 300` से बढ़ाएँ। | + +> **Watch out:** इमेज प्री‑प्रोसेसिंग को ऑन करने से CPU टाइम में उल्लेखनीय वृद्धि हो सकती है। यदि आप नाइटली बैच चला रहे हैं, तो पर्याप्त समय शेड्यूल करें या एक अलग वर्कर स्पिन अप करें। + +## आउटपुट की जाँच + +After the script finishes, you’ll find a folder structure 
like: + +``` +ocr_output/ +├─ invoice_2023/ +│ ├─ invoice_2023_page1.txt +│ ├─ invoice_2023_page2.txt +│ └─ … +└─ contract_A/ + ├─ contract_A_page1.txt + └─ … +``` + +किसी भी `.txt` फ़ाइल को खोलें; आपको साफ़, UTF‑8 एन्कोडेड टेक्स्ट दिखना चाहिए जो मूल स्कैन किए गए कंटेंट को प्रतिबिंबित करता है। यदि आपको गड़बड़ अक्षर दिखें, तो PDF की भाषा सेटिंग्स दोबारा जांचें और सुनिश्चित करें कि मशीन पर सही फ़ॉन्ट पैक्स इंस्टॉल हैं। + +## रिसोर्सेज़ को क्लीन अप करना + +Aspose OCR नेटिव DLLs पर निर्भर करता है, इसलिए काम खत्म होने पर `engine.dispose()` को कॉल करना आवश्यक है। इस स्टेप को भूलने से मेमोरी लीक्स हो सकते हैं, खासकर लंबे‑चलने वाले बैच जॉब्स में। + +```python +# Always the last line of your script +engine.dispose() +``` + +## पूर्ण एंड‑टू‑एंड उदाहरण + +Putting everything together, here’s a single + +{{< /blocks/products/pf/tutorial-page-section >}} +{{< /blocks/products/pf/main-container >}} +{{< /blocks/products/pf/main-wrap-class >}} +{{< blocks/products/products-backtop-button >}} \ No newline at end of file diff --git a/ocr/hindi/python/general/how-to-recognize-handwriting-in-python-full-tutorial/_index.md b/ocr/hindi/python/general/how-to-recognize-handwriting-in-python-full-tutorial/_index.md new file mode 100644 index 000000000..2fb0f0d22 --- /dev/null +++ b/ocr/hindi/python/general/how-to-recognize-handwriting-in-python-full-tutorial/_index.md @@ -0,0 +1,278 @@ +--- +category: general +date: 2026-04-29 +description: Aspose OCR के साथ Python में हस्तलेख को पहचानना सीखें। यह चरण‑दर‑चरण + गाइड दिखाता है कि कैसे कुशलतापूर्वक हस्तलेखित पाठ निकाला जाए। +draft: false +keywords: +- how to recognize handwriting +- extract handwritten text +- handwritten text recognition python +- handwritten ocr tutorial python +language: hi +og_description: Python में हस्तलेख को कैसे पहचानें? 
Aspose OCR का उपयोग करके हस्तलेखित + पाठ निकालने के लिए इस पूर्ण गाइड का पालन करें, जिसमें कोड, टिप्स और किनारी मामलों + का समाधान शामिल है। +og_title: Python में हस्तलेखन कैसे पहचानें – पूर्ण ट्यूटोरियल +tags: +- OCR +- Python +- HandwritingRecognition +title: Python में हस्तलेख पहचान कैसे करें – पूर्ण ट्यूटोरियल +url: /hi/python/general/how-to-recognize-handwriting-in-python-full-tutorial/ +--- + +{{< blocks/products/pf/main-wrap-class >}} +{{< blocks/products/pf/main-container >}} +{{< blocks/products/pf/tutorial-page-section >}} + +# Python में हस्तलेख पहचान कैसे करें – पूर्ण ट्यूटोरियल + +क्या आपको कभी **हस्तलेख पहचान कैसे करें** Python प्रोजेक्ट में चाहिए था लेकिन शुरुआत नहीं पता थी? आप अकेले नहीं हैं—डेवलपर्स लगातार पूछते हैं, “क्या मैं स्कैन किए हुए नोट से टेक्स्ट निकाल सकता हूँ?” अच्छी खबर यह है कि आधुनिक OCR लाइब्रेरीज़ इसे बहुत आसान बना देती हैं। इस गाइड में हम **हस्तलेख पहचान कैसे करें** Aspose OCR का उपयोग करके दिखाएंगे, और आप **हस्तलेखित टेक्स्ट निकालना** भी विश्वसनीय रूप से सीखेंगे। + +हम लाइब्रेरी को इंस्टॉल करने से लेकर उन गंदे कर्सिव स्क्रिप्ट्स के लिए confidence थ्रेशोल्ड को ट्यून करने तक सब कुछ कवर करेंगे। अंत तक आपके पास एक runnable स्क्रिप्ट होगी जो निकाले गए टेक्स्ट और कुल confidence स्कोर को प्रिंट करेगी—नोट‑टेकिंग ऐप्स, आर्काइव टूल्स, या सिर्फ जिज्ञासा को संतुष्ट करने के लिए एकदम सही। कोई पूर्व OCR अनुभव आवश्यक नहीं; बुनियादी Python ज्ञान पर्याप्त है। + +--- + +## आपको क्या चाहिए + +- **Python 3.9+** (सबसे नया स्थिर संस्करण सबसे अच्छा काम करता है) +- **Aspose.OCR for Python via .NET** – `pip install aspose-ocr` से इंस्टॉल करें +- एक **हस्तलेखित इमेज** (JPEG/PNG) जिसे आप प्रोसेस करना चाहते हैं +- वैकल्पिक: निर्भरताओं को साफ़ रखने के लिए एक वर्चुअल एनवायरनमेंट + +यदि आपके पास ये सब तैयार हैं, तो चलिए शुरू करते हैं। + +![हस्तलेख पहचान का उदाहरण](/images/handwritten-sample.jpg "हस्तलेख पहचान का उदाहरण") + +*(Alt text: “हस्तलेख पहचान का उदाहरण जिसमें स्कैन किया हुआ हस्तलेखित नोट दिखाया गया है”)* + +--- + +## चरण 1 – Aspose OCR क्लासेज़ को 
इंस्टॉल और इम्पोर्ट करें + +सबसे पहले, हमें OCR इंजन की जरूरत है। Aspose एक साफ़ API प्रदान करता है जो प्रिंटेड‑टेक्स्ट पहचान को हस्तलेख मोड से अलग करता है। + +```python +# Install the package (run this in your terminal if you haven't already) +# pip install aspose-ocr + +# Import the required OCR classes +from aspose.ocr import OcrEngine, HandwritingMode +``` + +*यह क्यों महत्वपूर्ण है:* `HandwritingMode` को इम्पोर्ट करने से हम इंजन को बता सकते हैं कि हम **हस्तलेखित टेक्स्ट पहचान python** कर रहे हैं, न कि प्रिंटेड टेक्स्ट, जिससे कर्सिव स्ट्रोक्स की सटीकता काफी बढ़ जाती है। + +--- + +## चरण 2 – OCR इंजन बनाएं और कॉन्फ़िगर करें + +अब हम एक `OcrEngine` इंस्टेंस बनाते हैं और उसे हस्तलेख मोड में स्विच करते हैं। आप confidence थ्रेशोल्ड भी समायोजित कर सकते हैं; कम मान शेकी लिखावट को स्वीकार करेंगे, उच्च मान साफ़ इनपुट की मांग करेंगे। + +```python +# Step 2: Initialize the OCR engine for handwritten text +ocr_engine = OcrEngine() +ocr_engine.set_recognition_mode(HandwritingMode.HANDWRITTEN) + +# Optional: tighten the confidence threshold for cursive scripts +# Default is 0.5; 0.65 works well for most notebooks +ocr_engine.set_handwriting_confidence_threshold(0.65) +``` + +*प्रो टिप:* यदि आपके नोट्स 300 DPI या उससे अधिक पर स्कैन किए गए हैं, तो आमतौर पर आपको बेहतर स्कोर मिलेगा। लो‑रेज़ोल्यूशन इमेज़ के लिए, Pillow से अप‑स्केल करने पर विचार करें इससे पहले कि आप उन्हें इंजन को दें। + +--- + +## चरण 3 – इमेज पाथ तैयार करें + +सुनिश्चित करें कि फ़ाइल पाथ उस इमेज की ओर इशारा करता है जिसे आप प्रोसेस करना चाहते हैं। रिलेटिव पाथ ठीक काम करते हैं, लेकिन एब्सोल्यूट पाथ “फ़ाइल नहीं मिली” जैसी समस्याओं से बचाते हैं। + +```python +# Step 3: Point to your handwritten image +input_image_path = "YOUR_DIRECTORY/handwritten_sample.jpg" +``` + +*सामान्य गलती:* Windows पर बैकस्लैश एस्केप करना भूल जाना (`C:\\folder\\image.jpg`)। रॉ स्ट्रिंग्स (`r"C:\folder\image.jpg"`) इस समस्या से बचाती हैं। + +--- + +## चरण 4 – पहचान चलाएँ और परिणाम कैप्चर करें + +`recognize` मेथड भारी काम करता है। यह एक 
ऑब्जेक्ट रिटर्न करता है जिसमें `.text` और `.confidence` प्रॉपर्टीज़ होती हैं। + +```python +# Step 4: Run OCR and get the result +result = ocr_engine.recognize(input_image_path) + +# Display the extracted text and confidence +print("Hand‑written extraction:") +print(result.text) +print("Overall confidence:", result.confidence) +``` + +**अपेक्षित आउटपुट (उदाहरण):** + +``` +Hand‑written extraction: +Meeting notes: +- Discuss quarterly goals +- Review budget allocations +- Assign action items +Overall confidence: 0.78 +``` + +यदि confidence 0.5 से नीचे गिर जाता है, तो आपको इमेज को साफ़ करना पड़ सकता है (शैडो हटाएँ, कंट्रास्ट बढ़ाएँ) या चरण 2 में थ्रेशोल्ड को कम करें। + +--- + +## चरण 5 – रिसोर्सेज़ को क्लीन अप करें + +Aspose OCR नेटिव रिसोर्सेज़ रखता है; `dispose()` कॉल करने से वे रिलीज़ हो जाते हैं और मेमोरी लीक्स से बचा जा सकता है, विशेषकर जब आप लूप में कई इमेज़ प्रोसेस कर रहे हों। + +```python +# Step 5: Release the engine resources +ocr_engine.dispose() +``` + +*क्यों dispose?* लंबी‑चलने वाली सर्विसेज़ (जैसे Flask API जो अपलोड स्वीकार करता है) में रिसोर्सेज़ को फ्री न करने से सिस्टम मेमोरी जल्दी खत्म हो सकती है। + +--- + +## पूर्ण स्क्रिप्ट – एक‑क्लिक रन + +सब कुछ मिलाकर, यहाँ एक सेल्फ‑कंटेन्ड स्क्रिप्ट है जिसे आप कॉपी‑पेस्ट करके चला सकते हैं। + +```python +# -*- coding: utf-8 -*- +""" +Handwritten OCR Tutorial – how to recognize handwriting in Python +Author: Your Name +Date: 2026-04-29 +""" + +# Install the library first if you haven't already: +# pip install aspose-ocr + +from aspose.ocr import OcrEngine, HandwritingMode + +def recognize_handwriting(image_path: str, confidence_threshold: float = 0.65): + """ + Recognizes handwritten text from an image using Aspose OCR. + + Args: + image_path (str): Path to the handwritten image file. + confidence_threshold (float): Minimum confidence to accept a word. 
+ + Returns: + tuple: (extracted_text (str), overall_confidence (float)) + """ + # Initialize engine in handwritten mode + engine = OcrEngine() + engine.set_recognition_mode(HandwritingMode.HANDWRITTEN) + engine.set_handwriting_confidence_threshold(confidence_threshold) + + # Perform recognition + result = engine.recognize(image_path) + + # Capture output + extracted = result.text + confidence = result.confidence + + # Clean up + engine.dispose() + return extracted, confidence + +if __name__ == "__main__": + # Replace with your actual file location + img_path = "YOUR_DIRECTORY/handwritten_sample.jpg" + + text, conf = recognize_handwriting(img_path) + + print("Hand‑written extraction:") + print(text) + print("Overall confidence:", conf) +``` + +इसे `handwritten_ocr.py` के रूप में सेव करें और `python handwritten_ocr.py` चलाएँ। यदि सब कुछ सही ढंग से सेट है, तो आप कंसोल में निकाला गया टेक्स्ट देखेंगे। + +--- + +## एज केस और सामान्य वैरिएशन्स को हैंडल करना + +### लो‑कॉन्ट्रास्ट इमेज़ +यदि बैकग्राउंड इंक में मिल रहा है, तो पहले कंट्रास्ट बढ़ाएँ: + +```python +from PIL import Image, ImageEnhance + +img = Image.open(input_image_path) +enhancer = ImageEnhance.Contrast(img) +high_contrast = enhancer.enhance(2.0) # 2× contrast +high_contrast.save("temp_high_contrast.jpg") +result = ocr_engine.recognize("temp_high_contrast.jpg") +``` + +### घुमा हुआ नोट +एक तिरछा नोटबुक पेज पहचान को बिगाड़ सकता है। OpenCV और NumPy से डेस्क्यू करें (`pip install opencv-python numpy`): + +```python +import numpy as np +import cv2 + +def deskew(image_path): + img = cv2.imread(image_path, 0) + coords = np.column_stack(np.where(img > 0)) + angle = cv2.minAreaRect(coords)[-1] + if angle < -45: + angle = -(90 + angle) + else: + angle = -angle + (h, w) = img.shape[:2] + M = cv2.getRotationMatrix2D((w // 2, h // 2), angle, 1.0) + corrected = cv2.warpAffine(img, M, (w, h), flags=cv2.INTER_CUBIC, borderMode=cv2.BORDER_REPLICATE) + cv2.imwrite("deskewed.jpg", corrected) + +deskew(input_image_path) +result = 
ocr_engine.recognize("deskewed.jpg") +``` + +### मल्टी‑पेज PDFs +Aspose OCR PDF पेजेज़ को भी हैंडल कर सकता है, लेकिन आपको पहले प्रत्येक पेज को इमेज़ में बदलना होगा (जैसे `pdf2image` का उपयोग करके)। फिर उसी `recognize_handwriting` फ़ंक्शन के साथ इमेज़ लूप करें। + +--- + +## बेहतर **हस्तलेखित टेक्स्ट निकालना** परिणामों के लिए प्रो टिप्स + +- **DPI मायने रखता है:** स्कैन करते समय 300 DPI या उससे अधिक लक्ष्य रखें। +- **रंगीन बैकग्राउंड से बचें:** शुद्ध सफ़ेद या हल्का ग्रे सबसे साफ़ आउटपुट देता है। +- **बैच प्रोसेसिंग:** फ़ंक्शन को `for` लूप में रैप करें और प्रत्येक पेज की confidence लॉग करें; थ्रेशोल्ड से नीचे के परिणामों को डिस्कार्ड करें ताकि क्वालिटी हाई रहे। +- **भाषा समर्थन:** Aspose OCR कई भाषाओं को सपोर्ट करता है; अंग्रेज़ी‑केवल ऑप्टिमाइज़ेशन के लिए `engine.set_language("en")` सेट करें। + +--- + +## अक्सर पूछे जाने वाले प्रश्न + +**क्या यह Linux पर काम करता है?** +हां—Aspose OCR Windows, macOS, और Linux के लिए नेटिव बाइनरीज़ के साथ आता है। सिर्फ pip पैकेज इंस्टॉल करें और आप तैयार हैं। + +**अगर मेरी हस्तलेख बहुत कर्सिव है तो क्या करें?** +confidence थ्रेशोल्ड को कम करें (`0.5` या यहाँ तक कि `0.4`)। ध्यान रखें कि इससे अधिक नॉइज़ आ सकता है, इसलिए आउटपुट को पोस्ट‑प्रोसेस (जैसे स्पेल‑चेक) करें। + +**क्या मैं इसे वेब सर्विस में इस्तेमाल कर सकता हूँ?** +बिल्कुल। `recognize_handwriting` फ़ंक्शन स्टेटलेस है, इसलिए Flask या FastAPI एंडपॉइंट्स के लिए एकदम उपयुक्त है। बस प्रत्येक रिक्वेस्ट के बाद `dispose()` कॉल करें या कंटेक्स्ट मैनेजर इस्तेमाल करें। + +--- + +## निष्कर्ष + +हमने **Python में हस्तलेख पहचान** को शुरू से अंत तक कवर किया, दिखाया कि **हस्तलेखित टेक्स्ट निकालना** कैसे किया जाता है, confidence सेटिंग्स को कैसे ट्यून किया जाता है, और लो‑कॉन्ट्रास्ट या घुमा पेज जैसे सामान्य समस्याओं को कैसे संभाला जाता है। ऊपर दिया गया पूर्ण स्क्रिप्ट रन करने के लिए तैयार है, और मॉड्यूलर फ़ंक्शन इसे बड़े प्रोजेक्ट्स में इंटीग्रेट करना आसान बनाता है—चाहे आप नोट‑टेकिंग ऐप बना रहे हों, आर्काइव्स को डिजिटलाइज़ कर रहे हों, या बस **हस्तलेख ocr ट्यूटोरियल python** तकनीकों के साथ प्रयोग 
कर रहे हों। + +आगे आप **हस्तलेखित टेक्स्ट पहचान python** को मल्टी‑लैंग्वेज नोट्स के लिए एक्सप्लोर कर सकते हैं, या OCR को नेचुरल‑लैंग्वेज प्रोसेसिंग के साथ जोड़कर मीटिंग मिनट्स को ऑटो‑समरीज़ कर सकते हैं। संभावनाएँ असीम हैं—एक बार ट्राय करें और अपने कोड को स्क्रिबल्स को जीवन दें। + +हैप्पी कोडिंग, और अपने प्रश्न कमेंट्स में पूछने में संकोच न करें! + +{{< /blocks/products/pf/tutorial-page-section >}} +{{< /blocks/products/pf/main-container >}} +{{< /blocks/products/pf/main-wrap-class >}} +{{< blocks/products/products-backtop-button >}} \ No newline at end of file diff --git a/ocr/hindi/python/general/how-to-run-ocr-with-aspose-hugging-face-complete-guide/_index.md b/ocr/hindi/python/general/how-to-run-ocr-with-aspose-hugging-face-complete-guide/_index.md new file mode 100644 index 000000000..79de4e938 --- /dev/null +++ b/ocr/hindi/python/general/how-to-run-ocr-with-aspose-hugging-face-complete-guide/_index.md @@ -0,0 +1,180 @@ +--- +category: general +date: 2026-04-29 +description: जानिए कैसे अपने स्कैन पर OCR चलाएँ, Hugging Face मॉडल को स्वचालित रूप + से उपयोग करें, और Aspose OCR के साथ मिनटों में स्कैन से टेक्स्ट पहचानें। +draft: false +keywords: +- how to run OCR +- use hugging face model +- recognize text from scans +- download model automatically +language: hi +og_description: Aspose OCR का उपयोग करके स्कैन पर OCR कैसे चलाएँ, स्वचालित रूप से + Hugging Face मॉडल डाउनलोड करें, और साफ़, विराम चिह्नों वाला टेक्स्ट प्राप्त करें। +og_title: Aspose और Hugging Face के साथ OCR कैसे चलाएँ – पूर्ण गाइड +tags: +- OCR +- Aspose +- Hugging Face +- Python +title: Aspose और Hugging Face के साथ OCR कैसे चलाएँ – पूर्ण मार्गदर्शिका +url: /hi/python/general/how-to-run-ocr-with-aspose-hugging-face-complete-guide/ +--- + +{{< blocks/products/pf/main-wrap-class >}} +{{< blocks/products/pf/main-container >}} +{{< blocks/products/pf/tutorial-page-section >}} + +# Aspose और Hugging Face के साथ OCR कैसे चलाएँ – पूर्ण गाइड + +क्या आपने कभी सोचा है कि सेटिंग्स को घंटों तक समायोजित किए बिना 
स्कैन किए गए दस्तावेज़ों के ढेर पर **OCR कैसे चलाएँ**? आप अकेले नहीं हैं। कई प्रोजेक्ट्स में, डेवलपर्स को जल्दी से **स्कैन से टेक्स्ट पहचानना** होता है, फिर भी वे मॉडल डाउनलोड और पोस्ट‑प्रोसेसिंग में फँस जाते हैं। + +अच्छी खबर: यह ट्यूटोरियल आपको एक तैयार‑से‑चलाने वाला समाधान दिखाता है जो **Hugging Face मॉडल का उपयोग करता है**, इसे स्वचालित रूप से डाउनलोड करता है, और विराम चिह्न जोड़ता है ताकि आउटपुट ऐसा लगे जैसे इंसान ने लिखा हो। अंत तक, आपके पास एक स्क्रिप्ट होगी जो फ़ोल्डर में प्रत्येक इमेज को प्रोसेस करती है और प्रत्येक स्कैन के बगल में एक साफ़ `.txt` फ़ाइल रख देती है। + +## आपको क्या चाहिए + +- Python 3.8+ (कोड f‑strings का उपयोग करता है, इसलिए पुराने संस्करण काम नहीं करेंगे) +- `aspose-ocr` पैकेज (इंस्टॉल करने के लिए `pip install aspose-ocr`) +- पहली बार मॉडल डाउनलोड के लिए इंटरनेट एक्सेस +- इमेज स्कैन का फ़ोल्डर (`.png`, `.jpg`, या `.tif`) + +बस इतना ही—कोई अतिरिक्त बाइनरी नहीं, कोई मैन्युअल मॉडल सेटिंग नहीं। चलिए शुरू करते हैं। + +![how to run OCR example](https://example.com/ocr-demo.png "how to run OCR example") + +## चरण 1: Aspose OCR क्लासेस इम्पोर्ट करें और वातावरण सेट अप करें + +हम Aspose OCR लाइब्रेरी से आवश्यक क्लासेस को इम्पोर्ट करके शुरू करते हैं। सभी चीज़ों को पहले से इम्पोर्ट करने से स्क्रिप्ट साफ़ रहती है और लापता डिपेंडेंसीज़ को ढूँढ़ना आसान हो जाता है। + +```python +# Step 1: Import Aspose OCR classes +import os +from aspose.ocr import OcrEngine, AsposeAI, AsposeAIModelConfig +``` + +*क्यों यह महत्वपूर्ण है*: `OcrEngine` भारी काम करता है, जबकि `AsposeAI` हमें बड़े भाषा मॉडल को प्लग‑इन करके smarter post‑processing करने देता है। यदि आप इम्पोर्ट छोड़ देते हैं, तो बाकी कोड भी कंपाइल नहीं होगा—इसलिए इसे मत भूलें। + +## चरण 2: GPU‑सजग Hugging Face मॉडल कॉन्फ़िगर करें + +अब हम Aspose को बताते हैं कि मॉडल कहाँ से लाना है और कितनी लेयर्स GPU पर चलनी चाहिए। `allow_auto_download="true"` फ़्लैग आपके लिए **मॉडल को स्वचालित रूप से डाउनलोड** करता है। + +```python +# Step 2: Configure a GPU‑aware AI model (replace with your own model folder) +model_config = 
AsposeAIModelConfig( + allow_auto_download="true", + hugging_face_repo_id="Qwen/Qwen2.5-3B-Instruct-GGUF", + hugging_face_quantization="int8", + gpu_layers=40, # use GPU for faster inference + directory_model_path=r"YOUR_DIRECTORY/models" +) +``` + +> **प्रो टिप**: यदि आपके पास GPU नहीं है, तो `gpu_layers=0` सेट करें। मॉडल CPU पर फॉल बैक हो जाएगा, जो धीमा है लेकिन फिर भी काम करता है। + +### Hugging Face मॉडल क्यों चुनें? + +Hugging Face एक विशाल संग्रह में तैयार‑से‑उपयोग LLMs रखता है। `Qwen/Qwen2.5-3B-Instruct-GGUF` की ओर इशारा करके, आपको एक कॉम्पैक्ट, इंस्ट्रक्शन‑ट्यून्ड मॉडल मिलता है जो विराम चिह्न जोड़ सकता है, स्पेसिंग सही कर सकता है, और छोटे OCR त्रुटियों को भी ठीक कर सकता है। यह व्यावहारिक रूप से **use hugging face model** का सार है। + +## चरण 3: AI इंजन को इनिशियलाइज़ करें और विराम चिह्न पोस्ट‑प्रोसेसिंग सक्षम करें + +AI इंजन सिर्फ फैंसी चैट के लिए नहीं है—यहाँ हम एक *punctuation adder* जोड़ते हैं जो कच्चे OCR आउटपुट को साफ़ करता है। + +```python +# Step 3: Initialise the AI engine and enable punctuation post‑processing +ai_engine = AsposeAI() +ai_engine.set_post_processor("punctuation_adder", {}) +``` + +*क्या हो रहा है?* `set_post_processor` कॉल एक बिल्ट‑इन पोस्ट‑प्रोसेसर को रजिस्टर करता है जो OCR इंजन समाप्त होने के बाद चलता है। यह कच्ची स्ट्रिंग लेता है और जहाँ‑जहाँ आवश्यक हो, कॉमा, पीरियड और बड़े अक्षर डालता है, जिससे अंतिम टेक्स्ट बहुत अधिक पठनीय बन जाता है। + +## चरण 4: OCR इंजन बनाएं और AI इंजन को अटैच करें + +AI इंजन को OCR इंजन से जोड़ने से हमें एक ही ऑब्जेक्ट मिलता है जो अक्षरों को पढ़ सकता है और परिणाम को पॉलिश भी कर सकता है। + +```python +# Step 4: Create the OCR engine and attach the AI engine +ocr_engine = OcrEngine() +ocr_engine.set_ai_engine(ai_engine) +``` + +यदि आप इस चरण को छोड़ देते हैं, तो OCR फिर भी काम करेगा, लेकिन आपको विराम चिह्न का बूस्ट नहीं मिलेगा—इसलिए आउटपुट शब्दों की धारा जैसा दिखेगा। + +## चरण 5: फ़ोल्डर में प्रत्येक इमेज को प्रोसेस करें + +यह ट्यूटोरियल का मुख्य भाग है। हम प्रत्येक इमेज पर लूप करते हैं, OCR चलाते हैं, 
पोस्ट‑प्रोसेसर लागू करते हैं, और साफ़ किया हुआ टेक्स्ट साइड‑बाय‑साइड `.txt` फ़ाइल में लिखते हैं। + +```python +# Step 5: Run OCR on each image in a folder, post‑process the result, and save the text +scans_folder = r"YOUR_DIRECTORY/scans" +for image_file in os.listdir(scans_folder): + # Filter only supported image types + if not image_file.lower().endswith(('.png', '.jpg', '.tif')): + continue + + image_path = os.path.join(scans_folder, image_file) + + # Recognise text from the image + ocr_result = ocr_engine.recognize(image_path) + + # Apply the punctuation post‑processor + ocr_result = ocr_engine.run_postprocessor(ocr_result) + + # Show a brief confidence summary + print(f"{image_file} – confidence {ocr_result.confidence:.2%}") + + # Save the cleaned text next to the source image + txt_path = image_path + ".txt" + with open(txt_path, "w", encoding="utf-8") as txt_file: + txt_file.write(ocr_result.text) +``` + +### क्या उम्मीद करें + +स्क्रिप्ट चलाने पर कुछ इस तरह का आउटपुट मिलता है: + +``` +invoice_001.png – confidence 96.73% +receipt_2024.tif – confidence 94.12% +``` + +प्रत्येक लाइन आपको confidence स्कोर बताती है (एक त्वरित हेल्थ चेक) और `invoice_001.png.txt`, `receipt_2024.tif.txt` आदि बनाती है, जिसमें विराम चिह्नित, मानव‑पठनीय टेक्स्ट होता है। + +### एज केस और वैरिएशन्स + +- **Non‑English scans**: `hugging_face_repo_id` को एक मल्टीलेंग्वेज मॉडल (जैसे, `microsoft/Multilingual-LLM-GGUF`) में बदलें। +- **Large batches**: लूप को `concurrent.futures.ThreadPoolExecutor` में रैप करें ताकि पैरलल प्रोसेसिंग हो, लेकिन GPU मेमोरी लिमिट का ध्यान रखें। +- **Custom post‑processing**: यदि आपको डोमेन‑स्पेसिफिक क्लीनअप चाहिए (जैसे, इनवॉइस नंबर हटाना), तो `"punctuation_adder"` को अपनी स्क्रिप्ट से बदलें। + +## चरण 6: संसाधनों को साफ़ करें + +जब जॉब समाप्त हो जाता है, संसाधनों को मुक्त करने से मेमोरी लीक रोकता है, विशेषकर यदि आप इसे एक लंबी अवधि की सर्विस के अंदर चला रहे हैं। + +```python +# Step 6: Release resources +ai_engine.free_resources() +ocr_engine.dispose() +``` + +इस 
चरण को नजरअंदाज करने से GPU मेमोरी लटक सकती है, जो बाद की रन को बाधित कर देगा। + +## पुनरावलोकन: OCR को एंड‑टू‑एंड कैसे चलाएँ + +सिर्फ कुछ लाइनों में, हमने दिखाया है **OCR कैसे चलाएँ** स्कैन के फ़ोल्डर पर, **Hugging Face मॉडल का उपयोग करें** जो पहली बार खुद डाउनलोड हो जाता है, और **स्कैन से टेक्स्ट पहचानें** जिसमें स्वचालित रूप से विराम चिह्न जोड़े जाते हैं। पूरा स्क्रिप्ट कॉपी‑पेस्ट, अपने पाथ्स समायोजित करने और चलाने के लिए तैयार है। + +## अगले कदम और संबंधित विषय + +- **Batch post‑processing**: तेज़ बुल्क हैंडलिंग के लिए `ocr_engine.run_batch_postprocessor` को एक्सप्लोर करें। +- **Alternative models**: यदि आपको OCR के साथ स्पीच‑टू‑टेक्स्ट चाहिए तो `openai/whisper` फ़ैमिली आज़माएँ। +- **Integration with databases**: निकाले गए टेक्स्ट को SQLite या Elasticsearch में स्टोर करें ताकि सर्चेबल आर्काइव बन सके। + +बिल्कुल प्रयोग करें—मॉडल बदलें, `gpu_layers` को ट्यून करें, या अपना पोस्ट‑प्रोसेसर जोड़ें। Aspose OCR की लचीलापन और Hugging Face के मॉडल हब का संयोजन इसे किसी भी दस्तावेज़‑डिजिटलीकरण प्रोजेक्ट के लिए एक बहुमुखी आधार बनाता है। + +--- + +*हैप्पी कोडिंग! 
यदि आपको कोई समस्या आती है, तो नीचे कमेंट करें या गहरी कॉन्फ़िगरेशन विकल्पों के लिए Aspose OCR दस्तावेज़ देखें।* + +{{< /blocks/products/pf/tutorial-page-section >}} +{{< /blocks/products/pf/main-container >}} +{{< /blocks/products/pf/main-wrap-class >}} +{{< blocks/products/products-backtop-button >}} \ No newline at end of file diff --git a/ocr/hindi/python/general/perform-ocr-on-image-with-python-complete-guide/_index.md b/ocr/hindi/python/general/perform-ocr-on-image-with-python-complete-guide/_index.md new file mode 100644 index 000000000..e87b70bb5 --- /dev/null +++ b/ocr/hindi/python/general/perform-ocr-on-image-with-python-complete-guide/_index.md @@ -0,0 +1,207 @@ +--- +category: general +date: 2026-04-29 +description: Python का उपयोग करके छवि पर OCR करें, HuggingFace मॉडल को स्वचालित रूप + से डाउनलोड करें, और OCR टेक्स्ट को साफ़ करते हुए GPU मेमोरी को कुशलतापूर्वक मुक्त + करें। +draft: false +keywords: +- perform OCR on image +- download HuggingFace model python +- release GPU memory python +- clean OCR text python +language: hi +og_description: Python में इमेज पर OCR कैसे करें, HuggingFace मॉडल को स्वचालित रूप + से डाउनलोड करना, टेक्स्ट को साफ़ करना और GPU मेमोरी को मुक्त करना सीखें। +og_title: Python के साथ इमेज पर OCR करें – चरण-दर-चरण गाइड +tags: +- OCR +- Python +- Aspose +- HuggingFace +title: Python के साथ इमेज पर OCR करें – पूर्ण गाइड +url: /hi/python/general/perform-ocr-on-image-with-python-complete-guide/ +--- + +{{< blocks/products/pf/main-wrap-class >}} +{{< blocks/products/pf/main-container >}} +{{< blocks/products/pf/tutorial-page-section >}} + +# Python के साथ इमेज पर OCR करें – पूर्ण गाइड + +क्या आपको कभी **perform OCR on image** फ़ाइलों पर OCR करना पड़ा लेकिन मॉडल‑डाउनलोड या GPU‑मेमोरी क्लीन‑अप चरण पर अटक गए? 
आप अकेले नहीं हैं—कई डेवलपर्स को यह समस्या पहली बार ऑप्टिकल कैरेक्टर रिकग्निशन को बड़े लैंग्वेज मॉडल के साथ मिलाते समय आती है। + +इस ट्यूटोरियल में हम एक सिंगल, एंड‑टू‑एंड समाधान के माध्यम से चलेंगे जो **downloads a HuggingFace model in Python**, Aspose OCR चलाता है, रॉ आउटपुट को साफ़ करता है, और अंत में **releases GPU memory Python** को पुनः प्राप्त करने देता है। अंत तक आपके पास एक तैयार‑टू‑रन स्क्रिप्ट होगी जो स्कैन की गई PNG को पॉलिश्ड, सर्चेबल टेक्स्ट में बदल देती है। + +> **What you’ll get:** एक पूर्ण, रन करने योग्य कोड सैंपल, प्रत्येक स्टेप के महत्व की व्याख्याएँ, सामान्य पिटफ़ॉल्स से बचने के टिप्स, और आपके प्रोजेक्ट्स के लिए पाइपलाइन को कैसे ट्यून करें, इसका एक झलक। + +--- + +## आपको क्या चाहिए + +- Python 3.9 या नया (उदाहरण 3.11 पर परीक्षण किया गया था) +- `aspose-ocr` पैकेज (`pip install aspose-ocr` के माध्यम से स्थापित करें) +- एक इंटरनेट कनेक्शन **download HuggingFace model python** चरण के लिए +- एक CUDA‑संगत GPU यदि आप गति वृद्धि चाहते हैं (वैकल्पिक लेकिन अनुशंसित) + +कोई अतिरिक्त सिस्टम‑लेवल डिपेंडेंसीज़ आवश्यक नहीं हैं; Aspose OCR इंजन सब कुछ बंडल करता है जो आपको चाहिए। + +![perform OCR on image example](image.png "Example of performing OCR on image with Aspose OCR and an LLM post‑processor") + +*Image alt text: “perform OCR on image – Aspose OCR आउटपुट AI सफ़ाई से पहले और बाद में”* + +--- + +## इमेज पर OCR करें – चरण‑दर‑चरण अवलोकन + +नीचे हम वर्कफ़्लो को लॉजिकल चंक्स में विभाजित करते हैं। प्रत्येक चंक का अपना हेडिंग है, ताकि AI असिस्टेंट्स जल्दी से उस भाग पर जा सकें जिसमें आप रुचि रखते हैं, और सर्च इंजन संबंधित कीवर्ड्स को इंडेक्स कर सकें। + +### 1. 
Python में HuggingFace मॉडल डाउनलोड करें + +पहला काम हमें एक लैंग्वेज मॉडल फ़ेच करना है जो रॉ OCR आउटपुट के पोस्ट‑प्रोसेसर के रूप में कार्य करेगा। Aspose OCR एक हेल्पर क्लास `AsposeAI` के साथ आता है जो HuggingFace हब से मॉडल को ऑटोमैटिकली पुल कर सकता है। + +```python +import aspose.ocr as aocr +from aspose.ocr import AsposeAI, AsposeAIModelConfig, OcrEngine + +# Configure the model – it will auto‑download the first time you run it +model_config = AsposeAIModelConfig( + allow_auto_download="true", # <-- enables auto‑download + hugging_face_repo_id="Qwen/Qwen2.5-3B-Instruct-GGUF", + hugging_face_quantization="int8", # smaller footprint on GPU + gpu_layers=20, # how many layers stay on GPU + directory_model_path=r"YOUR_DIRECTORY" # where the model files live +) +``` + +**Why this matters:** +- **download HuggingFace model python** – आप ज़िप फ़ाइलों या टोकन ऑथेंटिकेशन को मैन्युअली हैंडल करने से बचते हैं। +- `int8` क्वांटाइज़ेशन मॉडल को उसके मूल आकार के लगभग एक चौथाई तक छोटा कर देता है, जो तब महत्वपूर्ण होता है जब आपको बाद में **release GPU memory python** की आवश्यकता होती है। + +> **Pro tip:** तेज़ लोड टाइम्स के लिए `directory_model_path` को SSD पर रखें। + +--- + +### 2. AI हेल्पर को इनिशियलाइज़ करें और स्पेल‑चेकिंग सक्षम करें + +अब हम एक `AsposeAI` इंस्टेंस बनाते हैं और एक स्पेल‑करेक्टर पोस्ट‑प्रोसेसर अटैच करते हैं। यही वह जगह है जहाँ **clean OCR text python** जादू शुरू होता है। + +```python +# Initialise the AI helper +ai_helper = AsposeAI() +ai_helper.set_post_processor( + processor="spell_corrector", + custom_settings={"max_edits": 2} # allows up to two character edits per word +) +``` + +**Explanation:** +स्पेल‑करेक्टर OCR इंजन से प्रत्येक टोकन की जाँच करता है और `max_edits` द्वारा सीमित एडिट्स सुझाता है। यह छोटा बदलाव “rec0gn1tion” को “recognition” में बदल सकता है बिना भारी लैंग्वेज मॉडल के। + +--- + +### 3. 
AI हेल्पर को OCR इंजन में जोड़ें + +Aspose ने संस्करण 23.4 में एक नया मेथड पेश किया है जो आपको AI इंजन को सीधे OCR पाइपलाइन में प्लग करने देता है। + +```python +# Initialise the OCR engine and attach the AI helper +ocr_engine = OcrEngine() +ocr_engine.set_ai_engine(ai_helper) # new in v23.4 +``` + +**Why we do it:** +AI हेल्पर को शुरुआती चरण में वायर करके, OCR इंजन वैकल्पिक रूप से मॉडल का उपयोग ऑन‑द‑फ्लाई सुधारों (जैसे लेआउट डिटेक्शन) के लिए कर सकता है। यह कोड को साफ़ रखता है, क्योंकि बाद में अलग पोस्ट‑प्रोसेसिंग लूप की ज़रूरत नहीं रहती। + +--- + +### 4. स्कैन की गई इमेज पर OCR करें + +यह वह कोर स्टेप है जो वास्तव में **perform OCR on image** फ़ाइलों को प्रोसेस करता है। `YOUR_DIRECTORY/input.png` को अपने स्कैन के पाथ से बदलें। + +```python +image_path = r"YOUR_DIRECTORY/input.png" +ocr_result = ocr_engine.recognize(image_path) + +print("Raw OCR text:") +print(ocr_result.text) +``` + +आम तौर पर रॉ आउटपुट में अजीब जगहों पर लाइन ब्रेक, गलत पहचान वाले कैरेक्टर्स, या स्ट्रे सिम्बॉल्स हो सकते हैं। इसलिए हमें अगले स्टेप की ज़रूरत है। + +**Expected raw output (example):** + +``` +Th1s 1s 4n ex4mpl3 0f r4w OCR t3xt. +It c0ntains numb3rs 123 and s0me m1stakes. +``` + +--- + +### 5. AI पोस्ट‑प्रोसेसर के साथ Python में OCR टेक्स्ट साफ़ करें + +अब हम AI को गड़बड़ी साफ़ करने देते हैं। यह **clean OCR text python** प्रक्रिया का दिल है। + +```python +cleaned_result = ocr_engine.run_postprocessor(ocr_result) + +print("\nAI‑enhanced text:") +print(cleaned_result.text) +``` + +**Result you’ll see:** + +``` +This is an example of raw OCR text. +It contains numbers 123 and some mistakes. +``` + +ध्यान दें कि स्पेल‑करेक्टर ने “Th1s” को “This” और “4n” को “an” में सुधार दिया, जबकि वास्तविक संख्या “123” को जस का तस छोड़ दिया। मॉडल स्पेसिंग को भी नॉर्मलाइज़ करता है, जो अक्सर तब समस्या बन जाता है जब आप टेक्स्ट को डाउनस्ट्रीम NLP पाइपलाइन में फीड करते हैं। + +--- + +### 6. 
Python में GPU मेमोरी रिलीज़ करें – क्लीन‑अप स्टेप्स + +जब आप काम खत्म कर लें, तो GPU रिसोर्सेज़ को फ्री करना एक अच्छी प्रैक्टिस है, विशेषकर यदि आप एक लोंग‑रनिंग सर्विस में कई OCR जॉब्स चला रहे हैं। + +```python +# Release resources – crucial for GPU memory +ai_helper.free_resources() +ocr_engine.dispose() +``` + +**What happens under the hood:** +`free_resources()` मॉडल को GPU से अनलोड कर देता है, मेमोरी को CUDA ड्राइवर को वापस देता है। `dispose()` OCR इंजन के इंटरनल बफ़र्स को शट डाउन करता है। इन कॉल्स को स्किप करने से कुछ ही इमेजेज़ के बाद आउट‑ऑफ़‑मेमोरी एरर हो सकता है। + +> **Remember:** यदि आप लूप में बैच प्रोसेस करने की योजना बनाते हैं, तो प्रत्येक बैच के बाद क्लीन‑अप कॉल करें या अंत तक वही `ai_helper` री‑यूज़ करें और तब तक फ्री न करें। + +--- + +## बोनस: विभिन्न परिदृश्यों के लिए पाइपलाइन को ट्यून करना + +### मॉडल क्वांटाइज़ेशन समायोजित करना + +यदि आपके पास एक पावरफ़ुल GPU (जैसे RTX 4090) है और आप उच्च सटीकता चाहते हैं, तो `hugging_face_quantization` को `"fp16"` में बदलें और `gpu_layers` को `30` तक बढ़ाएँ। इससे मेमोरी अधिक खपत होगी, इसलिए आपको प्रत्येक बैच के बाद **release GPU memory python** अधिक आक्रामक रूप से करना पड़ेगा। + +### कस्टम स्पेल‑चेकर का उपयोग करना + +आप बिल्ट‑इन `spell_corrector` को एक कस्टम पोस्ट‑प्रोसेसर से बदल सकते हैं जो डोमेन‑स्पेसिफिक करेक्शन करता है (जैसे मेडिकल टर्मिनोलॉजी)। बस आवश्यक इंटरफ़ेस इम्प्लीमेंट करें और उसका नाम `set_post_processor` को पास करें। + +### कई इमेज की बैच प्रोसेसिंग + +OCR स्टेप्स को एक `for` लूप में रैप करें, `cleaned_result.text` को एक लिस्ट में कलेक्ट करें, और यदि आपके पास पर्याप्त GPU RAM है तो लूप के बाद ही `ai_helper.free_resources()` कॉल करें। इससे मॉडल को बार‑बार लोड करने का ओवरहेड कम हो जाता है। + +--- + +## निष्कर्ष + +हमने अभी दिखाया कि कैसे **perform OCR on image** फ़ाइलों को Python में किया जाए, स्वचालित रूप से **download a HuggingFace model**, **clean OCR text**, और काम खत्म होने पर सुरक्षित रूप से **release GPU memory** किया जाए। पूरा स्क्रिप्ट कॉपी‑पेस्ट करने के लिए तैयार है, और व्याख्याएँ आपको बड़े 
प्रोजेक्ट्स में इसे अनुकूलित करने का आत्मविश्वास देती हैं। + +अगले कदम? Qwen 2.5 मॉडल को एक बड़े LLaMA वैरिएंट से बदलें, विभिन्न पोस्ट‑प्रोसेसर के साथ प्रयोग करें, या साफ़ किए गए आउटपुट को सर्चेबल Elasticsearch इंडेक्स में इंटीग्रेट करें। संभावनाएँ अनंत हैं, और अब आपके पास एक ठोस आधार है जिस पर आप निर्माण कर सकते हैं। + +कोडिंग का आनंद लें, और आपकी OCR पाइपलाइन हमेशा साफ़ और मेमोरी‑फ्रेंडली रहे! + +{{< /blocks/products/pf/tutorial-page-section >}} +{{< /blocks/products/pf/main-container >}} +{{< /blocks/products/pf/main-wrap-class >}} +{{< blocks/products/products-backtop-button >}} \ No newline at end of file diff --git a/ocr/hongkong/python/general/extract-text-from-pdf-ocr-pdf-with-python/_index.md b/ocr/hongkong/python/general/extract-text-from-pdf-ocr-pdf-with-python/_index.md new file mode 100644 index 000000000..2413c8628 --- /dev/null +++ b/ocr/hongkong/python/general/extract-text-from-pdf-ocr-pdf-with-python/_index.md @@ -0,0 +1,215 @@ +--- +category: general +date: 2026-04-29 +description: 使用 Aspose OCR 於 Python 從 PDF 提取文字。了解批次 OCR PDF 處理、將掃描 PDF 轉換為文字,並處理低可信度頁面。 +draft: false +keywords: +- extract text from pdf +- ocr pdf with python +- convert scanned pdf text +- batch ocr pdf processing +language: zh-hant +og_description: 使用 Aspose OCR 於 Python 從 PDF 提取文字。本指南示範批次 OCR PDF 處理、將掃描 PDF 轉換為文字,以及處理低信心結果。 +og_title: 從 PDF 提取文字 – 使用 Python 進行 PDF OCR +tags: +- OCR +- Python +- PDF processing +title: 從 PDF 提取文字 – 使用 Python 進行 OCR +url: /zh-hant/python/general/extract-text-from-pdf-ocr-pdf-with-python/ +--- + +{{< blocks/products/pf/main-wrap-class >}} +{{< blocks/products/pf/main-container >}} +{{< blocks/products/pf/tutorial-page-section >}} + +# 從 PDF 提取文字 – 使用 Python 進行 OCR PDF + +有沒有曾經需要 **從 PDF 提取文字**,但檔案只是掃描圖像?你並不孤單——許多開發者在嘗試將 PDF 轉換成可搜尋的資料時都會碰到這個問題。好消息是?使用 Aspose OCR for Python,你可以用幾行程式碼將掃描的 PDF 文字轉換,甚至在有數十個檔案需要處理時執行 **批次 OCR PDF 處理**。 + +在本教學中,我們將逐步說明完整工作流程:設定函式庫、在單一 PDF 上執行 OCR、擴展為批次處理,以及處理低信心分頁讓你知道何時需要人工檢查。完成後,你將擁有一個可直接執行的腳本,能從任何掃描 PDF 
中提取文字,並了解每一步背後的原因。 + +## 您需要的條件 + +在開始之前,請確保你已具備: + +- Python 3.8 或更新版本(程式碼使用 f‑strings,所以 3.6+ 也可運作,但建議使用 3.8+) +- Aspose OCR for Python 授權或免費試用金鑰(可從 Aspose 官方網站取得) +- 一個包含一個或多個欲處理掃描 PDF 的資料夾 +- 足夠的磁碟空間以存放產生的 *.txt* 報告 + +就這樣——不需要繁重的外部相依套件,也不需要 OpenCV 的複雜操作。Aspose OCR 引擎會為你完成繁重的工作。 + +## 設定環境 + +首先,從 PyPI 安裝 Aspose OCR 套件: + +```bash +pip install aspose-ocr +``` + +如果你有授權檔案 (`Aspose.OCR.lic`),請將它放在專案根目錄,並依以下方式啟用: + +```python +# Activate Aspose OCR license (optional but removes trial watermark) +from aspose.ocr import License + +license = License() +license.set_license("Aspose.OCR.lic") +``` + +> **專業提示:** 請將授權檔案排除於版本控制之外;將其加入 `.gitignore` 以避免意外外洩。 + +## 在單一 PDF 上執行 OCR + +現在讓我們從單一掃描 PDF 中提取文字。核心步驟如下: + +1. 建立一個 `OcrEngine` 實例。 +2. 指向 PDF 檔案。 +3. 為每一頁取得 `OcrResult`。 +4. 將純文字輸出寫入磁碟。 +5. 釋放引擎以釋放原生資源。 + +以下是完整、可執行的腳本: + +```python +# Step 1: Import the OCR engine and create an instance +from aspose.ocr import OcrEngine + +# Optional: activate license (see previous section) +# from aspose.ocr import License +# License().set_license("Aspose.OCR.lic") + +ocr_engine = OcrEngine() + +# Step 2: Specify the PDF file to be processed +pdf_file_path = r"YOUR_DIRECTORY/input.pdf" + +# Step 3: Extract OCR results – one OcrResult object per page +ocr_pages = ocr_engine.extract_from_pdf(pdf_file_path) + +# Step 4: Iterate through the results, show confidence, flag low‑confidence pages, and save the plain text +for ocr_page in ocr_pages: + print(f"Page {ocr_page.page_number}: confidence {ocr_page.confidence:.2%}") + + # If confidence is below 80 %, flag it for manual review + if ocr_page.confidence < 0.80: + print(" Low confidence – consider manual review") + + # Build the output filename + output_txt = f"YOUR_DIRECTORY/Report_page{ocr_page.page_number}.txt" + with open(output_txt, "w", encoding="utf-8") as f: + f.write(ocr_page.text) + +# Step 5: Release resources used by the engine +ocr_engine.dispose() +``` + +**你會看到的結果:** 每一頁腳本會印出類似 `Page 1: confidence 97.45%` 的訊息。若某頁的信心分數低於 80 
% 閾值,會顯示警告,提醒你 OCR 可能遺漏了字元。 + +### 為什麼這樣可行 + +- **`OcrEngine`** 是連接原生 Aspose OCR 函式庫的入口;它負責從影像前處理到字元辨識的全部工作。 +- **`extract_from_pdf`** 會自動將每頁 PDF 光柵化,因此你不必自行將 PDF 轉成影像。 +- **信心分數** 讓你能自動化品質檢查,在處理法律或醫療文件等對準確度要求極高的情境時尤為重要。 + +## 使用 Python 進行批次 OCR PDF 處理 + +大多數實務專案會處理多於一個檔案。讓我們將單檔腳本擴充為 **批次 OCR PDF 處理** 管線,遍歷目錄、處理每個 PDF,並將結果存入相對應的子資料夾。 + +```python +from pathlib import Path +from aspose.ocr import OcrEngine + +def ocr_pdf_file(pdf_path: Path, output_dir: Path, engine: OcrEngine, confidence_threshold: float = 0.80): + """Extract text from a single PDF and write per‑page .txt files.""" + ocr_pages = engine.extract_from_pdf(str(pdf_path)) + for page in ocr_pages: + print(f"{pdf_path.name} – Page {page.page_number}: {page.confidence:.2%}") + if page.confidence < confidence_threshold: + print(" ⚠️ Low confidence – manual review may be needed") + + out_file = output_dir / f"{pdf_path.stem}_page{page.page_number}.txt" + out_file.write_text(page.text, encoding="utf-8") + +def batch_ocr_pdf(input_folder: str, output_folder: str): + """Iterate over all PDFs in input_folder and run OCR.""" + input_path = Path(input_folder) + output_path = Path(output_folder) + output_path.mkdir(parents=True, exist_ok=True) + + pdf_files = list(input_path.glob("*.pdf")) + if not pdf_files: + print("🚫 No PDF files found in the specified folder.") + return + + # Create the engine only once there is work to do, so the early return above leaks nothing + engine = OcrEngine() + for pdf_file in pdf_files: + # Create a sub‑folder for each PDF to keep pages organized + pdf_out_dir = output_path / pdf_file.stem + pdf_out_dir.mkdir(exist_ok=True) + ocr_pdf_file(pdf_file, pdf_out_dir, engine) + + # Clean up native resources + engine.dispose() + print("✅ Batch OCR completed.") + +# Example usage: +if __name__ == "__main__": + batch_ocr_pdf("YOUR_DIRECTORY/input_pdfs", "YOUR_DIRECTORY/ocr_output") +``` + +#### 此方式的好處 + +- **可擴充性:** 此函式一次走訪資料夾,為每個 PDF 建立專屬的輸出子資料夾。當文件數量達到數十甚至上百時,能保持結構整潔。 +- **可重用性:** `ocr_pdf_file` 可被其他腳本(例如 Web 服務)呼叫,因為引擎與輸出目錄都由參數傳入,不依賴任何全域狀態。 +- **錯誤處理:** 若輸入資料夾為空,腳本會印出友善訊息,避免靜默失敗。 + +## 
轉換掃描 PDF 文字 – 處理邊緣情況 + +雖然上述程式碼能處理大多數 PDF,但你可能會遇到以下幾種特殊情況: + +| 情況 | 發生原因 | 緩解方式 | +|-----------|----------------|-----------------| +| **加密的 PDF** | PDF 受密碼保護。 | 將密碼傳遞給 `extract_from_pdf(pdf_path, password="yourPwd")`。 | +| **多語言文件** | Aspose OCR 預設為英文。 | 設定 `ocr_engine.language = "spa"` 以使用西班牙文,或提供語言清單以處理混合語言。 | +| **非常大的 PDF(>500 頁)** | 因為每頁都載入至記憶體,導致記憶體使用量激增。 | 使用 `engine.extract_from_pdf(pdf_path, start_page=1, end_page=100)` 分段處理 PDF,並在迴圈中執行。 | +| **掃描品質差** | 低 DPI 或大量噪點會降低信心分數。 | 使用 `engine.image_preprocessing = True` 進行前處理,或透過 `engine.dpi = 300` 提高 DPI。 | + +> **注意:** 開啟影像前處理會顯著增加 CPU 耗時。如果你在執行夜間批次,請安排足夠時間或啟動獨立的工作者。 + +## 驗證輸出結果 + +腳本執行完畢後,你會看到類似以下的資料夾結構: + +``` +ocr_output/ +├─ invoice_2023/ +│ ├─ invoice_2023_page1.txt +│ ├─ invoice_2023_page2.txt +│ └─ … +└─ contract_A/ + ├─ contract_A_page1.txt + └─ … +``` + +打開任一 `.txt` 檔案,你應該會看到乾淨的 UTF‑8 編碼文字,與原始掃描內容相符。若出現亂碼,請再次確認 PDF 的語言設定,並確保機器已安裝正確的字型套件。 + +## 清理資源 + +Aspose OCR 依賴原生 DLL,完成工作後務必呼叫 `engine.dispose()`。遺忘此步驟會導致記憶體泄漏,尤其在長時間執行的批次作業中更為嚴重。 + +```python +# Always the last line of your script +engine.dispose() +``` + +## 完整端對端範例 + +將所有步驟整合起來,以下是一個單一 + +{{< /blocks/products/pf/tutorial-page-section >}} +{{< /blocks/products/pf/main-container >}} +{{< /blocks/products/pf/main-wrap-class >}} +{{< blocks/products/products-backtop-button >}} \ No newline at end of file diff --git a/ocr/hongkong/python/general/how-to-recognize-handwriting-in-python-full-tutorial/_index.md b/ocr/hongkong/python/general/how-to-recognize-handwriting-in-python-full-tutorial/_index.md new file mode 100644 index 000000000..53f891311 --- /dev/null +++ b/ocr/hongkong/python/general/how-to-recognize-handwriting-in-python-full-tutorial/_index.md @@ -0,0 +1,275 @@ +--- +category: general +date: 2026-04-29 +description: 學習如何使用 Aspose OCR 在 Python 中辨識手寫文字。本分步指南展示如何高效提取手寫文字。 +draft: false +keywords: +- how to recognize handwriting +- extract handwritten text +- handwritten text recognition python +- handwritten ocr tutorial python +language: 
zh-hant +og_description: 如何在 Python 中辨識手寫文字?跟隨本完整指南,使用 Aspose OCR 提取手寫文本,內含程式碼、技巧與邊緣案例處理。 +og_title: 如何在 Python 中辨識手寫字 – 完整教學 +tags: +- OCR +- Python +- HandwritingRecognition +title: 如何在 Python 中辨識手寫字 – 完整教學 +url: /zh-hant/python/general/how-to-recognize-handwriting-in-python-full-tutorial/ +--- + +{{< blocks/products/pf/main-wrap-class >}} +{{< blocks/products/pf/main-container >}} +{{< blocks/products/pf/tutorial-page-section >}} + +# 如何在 Python 中辨識手寫文字 – 完整教學 + +有沒有在 Python 專案中需要 **how to recognize handwriting** 卻不知從何下手?你並不孤單——開發者常常會問:「我能從掃描的筆記中擷取文字嗎?」好消息是,現代的 OCR 函式庫讓這件事變得輕而易舉。在本指南中,我們將示範如何使用 Aspose OCR 來 **how to recognize handwriting**,同時教你如何可靠地 **extract handwritten text**。 + +我們會從安裝函式庫說起,直到微調信心門檻以應對雜亂的手寫文字。完成後,你將擁有一個可執行的腳本,能印出擷取的文字與整體信心分數——非常適合筆記應用、檔案保存工具,或單純滿足好奇心。無需事先的 OCR 經驗;只要具備基本的 Python 知識即可。 + +--- + +## 你需要的環境 + +- **Python 3.9+**(最新的穩定版效果最佳) +- **Aspose.OCR for Python via .NET** – 使用 `pip install aspose-ocr` 安裝 +- 一張 **handwritten image**(JPEG/PNG)你想處理的手寫圖片 +- 可選:使用虛擬環境以保持相依套件整潔 + +如果你已備妥上述項目,讓我們開始吧。 + +![手寫辨識範例](/images/handwritten-sample.jpg "手寫辨識範例") + +(替代文字:“how to recognize handwriting example showing a scanned handwritten note”) + +--- + +## 步驟 1 – 安裝並匯入 Aspose OCR 類別 + +首先,我們需要 OCR 引擎本身。Aspose 提供了簡潔的 API,能將印刷文字辨識與手寫模式分開。 + +```python +# Install the package (run this in your terminal if you haven't already) +# pip install aspose-ocr + +# Import the required OCR classes +from aspose.ocr import OcrEngine, HandwritingMode +``` + +*為什麼這很重要:* 匯入 `HandwritingMode` 讓我們告訴引擎我們正在處理 **handwritten text recognition python** 而非印刷文字,這能大幅提升對手寫筆劃的辨識準確度。 + +--- + +## 步驟 2 – 建立並設定 OCR 引擎 + +現在我們建立一個 `OcrEngine` 實例,並切換至手寫模式。你也可以調整信心門檻;較低的數值接受抖動的筆跡,較高的數值則要求較乾淨的輸入。 + +```python +# Step 2: Initialize the OCR engine for handwritten text +ocr_engine = OcrEngine() +ocr_engine.set_recognition_mode(HandwritingMode.HANDWRITTEN) + +# Optional: tighten the confidence threshold for cursive scripts +# Default is 0.5; 0.65 works well for most notebooks 
+ocr_engine.set_handwriting_confidence_threshold(0.65) +``` + +*小技巧:* 若你的筆記以 300 DPI 或更高解析度掃描,通常會得到較好的分數。對於低解析度的圖片,考慮先使用 Pillow 進行放大,再送入引擎。 + +--- + +## 步驟 3 – 準備圖片路徑 + +確保檔案路徑指向你要處理的圖片。相對路徑可正常使用,但絕對路徑可避免「找不到檔案」的意外。 + +```python +# Step 3: Point to your handwritten image +input_image_path = "YOUR_DIRECTORY/handwritten_sample.jpg" +``` + +*常見陷阱:* 在 Windows 上忘記對反斜線進行跳脫 (`C:\\folder\\image.jpg`)。使用原始字串 (`r"C:\folder\image.jpg"`) 可避免此問題。 + +--- + +## 步驟 4 – 執行辨識並取得結果 + +`recognize` 方法負責主要運算。它會回傳一個包含 `.text` 與 `.confidence` 屬性的物件。 + +```python +# Step 4: Run OCR and get the result +result = ocr_engine.recognize(input_image_path) + +# Display the extracted text and confidence +print("Hand‑written extraction:") +print(result.text) +print("Overall confidence:", result.confidence) +``` + +**預期輸出(範例):** + +``` +Hand‑written extraction: +Meeting notes: +- Discuss quarterly goals +- Review budget allocations +- Assign action items +Overall confidence: 0.78 +``` + +如果信心分數低於 0.5,可能需要清理圖片(去除陰影、提升對比)或在步驟 2 中降低門檻。 + +--- + +## 步驟 5 – 清理資源 + +Aspose OCR 會佔用原生資源;呼叫 `dispose()` 可釋放這些資源,防止記憶體泄漏,特別是在迴圈中處理大量圖片時。 + +```python +# Step 5: Release the engine resources +ocr_engine.dispose() +``` + +*為什麼要 dispose?* 在長時間執行的服務(例如接受上傳的 Flask API)中,若忘記釋放資源,系統記憶體會很快被耗盡。 + +--- + +## 完整腳本 – 一鍵執行 + +將所有步驟整合在一起,以下是一個可直接複製貼上執行的獨立腳本。 + +```python +# -*- coding: utf-8 -*- +""" +Handwritten OCR Tutorial – how to recognize handwriting in Python +Author: Your Name +Date: 2026-04-29 +""" + +# Install the library first if you haven't already: +# pip install aspose-ocr + +from aspose.ocr import OcrEngine, HandwritingMode + +def recognize_handwriting(image_path: str, confidence_threshold: float = 0.65): + """ + Recognizes handwritten text from an image using Aspose OCR. + + Args: + image_path (str): Path to the handwritten image file. + confidence_threshold (float): Minimum confidence to accept a word. 
+ + Returns: + tuple: (extracted_text (str), overall_confidence (float)) + """ + # Initialize engine in handwritten mode + engine = OcrEngine() + engine.set_recognition_mode(HandwritingMode.HANDWRITTEN) + engine.set_handwriting_confidence_threshold(confidence_threshold) + + # Perform recognition + result = engine.recognize(image_path) + + # Capture output + extracted = result.text + confidence = result.confidence + + # Clean up + engine.dispose() + return extracted, confidence + +if __name__ == "__main__": + # Replace with your actual file location + img_path = "YOUR_DIRECTORY/handwritten_sample.jpg" + + text, conf = recognize_handwriting(img_path) + + print("Hand‑written extraction:") + print(text) + print("Overall confidence:", conf) +``` + +將此檔案儲存為 `handwritten_ocr.py`,然後執行 `python handwritten_ocr.py`。若環境設定正確,將會在主控台看到擷取的文字。 + +--- + +## 處理邊緣案例與常見變化 + +### 低對比度圖片 +如果背景與墨水混合,請先使用 Pillow 提升對比度: + +```python +from PIL import Image, ImageEnhance + +img = Image.open(input_image_path) +enhancer = ImageEnhance.Contrast(img) +high_contrast = enhancer.enhance(2.0) # 2× contrast +high_contrast.save("temp_high_contrast.jpg") +result = ocr_engine.recognize("temp_high_contrast.jpg") +``` + +### 旋轉的筆記 +傾斜的筆記本頁面會影響辨識。可使用 OpenCV 搭配 NumPy 進行去斜校正(需另外安裝 `opencv-python` 與 `numpy`): + +```python +import numpy as np +import cv2 + +def deskew(image_path): + img = cv2.imread(image_path, 0) + coords = np.column_stack(np.where(img > 0)) + angle = cv2.minAreaRect(coords)[-1] + if angle < -45: + angle = -(90 + angle) + else: + angle = -angle + (h, w) = img.shape[:2] + M = cv2.getRotationMatrix2D((w // 2, h // 2), angle, 1.0) + corrected = cv2.warpAffine(img, M, (w, h), flags=cv2.INTER_CUBIC, borderMode=cv2.BORDER_REPLICATE) + cv2.imwrite("deskewed.jpg", corrected) + +deskew(input_image_path) +result = ocr_engine.recognize("deskewed.jpg") +``` + +### 多頁 PDF +Aspose OCR 也能處理 PDF 頁面,但必須先將每頁轉為圖片(例如使用 `pdf2image`)。之後使用相同的 `recognize_handwriting` 函式對圖片逐一迴圈處理。 + +--- + +## 提升 **Extract Handwritten Text** 效果的專業技巧 + +- **DPI 
重要性:** 掃描時目標為 300 DPI 或更高。 +- **避免彩色背景:** 純白或淡灰色可產生最乾淨的輸出。 +- **批次處理:** 將函式包在 `for` 迴圈中,記錄每頁的信心分數;將低於門檻的結果捨棄,以維持高品質。 +- **語言支援:** Aspose OCR 支援多種語言;若僅使用英文,可設定 `engine.set_language("en")` 以最佳化。 + +--- + +## 常見問答 + +**這在 Linux 上可用嗎?** +是的——Aspose OCR 隨附 Windows、macOS 與 Linux 的原生二進位檔。只要安裝 pip 套件即可使用。 + +**如果我的手寫字非常草寫呢?** +嘗試降低信心門檻(例如 `0.5` 或甚至 `0.4`)。請注意這可能會產生更多雜訊,必要時可對輸出進行後處理(例如拼寫檢查)。 + +**我可以在網路服務中使用嗎?** +當然可以。`recognize_handwriting` 函式是無狀態的,非常適合用於 Flask 或 FastAPI 的端點。只要記得在每個請求後呼叫 `dispose()`,或使用 context manager。 + +--- + +## 結論 + +我們已完整說明如何在 Python 中 **how to recognize handwriting**,示範如何 **extract handwritten text**、調整信心設定,並處理低對比度或旋轉頁面等常見問題。上方的完整腳本已可直接執行,且模組化的函式便於整合至更大型的專案——無論是開發筆記應用、數位化檔案,或僅是嘗試 **handwritten ocr tutorial python** 技術。 + +接下來,你可以探索 **handwritten text recognition python** 以支援多語言筆記,或結合 OCR 與自然語言處理自動摘要會議紀要。可能性無限——快試試看,讓程式為手寫塗鴉注入生命。 + +祝程式開發順利,歡迎在留言區提出任何問題! + +{{< /blocks/products/pf/tutorial-page-section >}} +{{< /blocks/products/pf/main-container >}} +{{< /blocks/products/pf/main-wrap-class >}} +{{< blocks/products/products-backtop-button >}} \ No newline at end of file diff --git a/ocr/hongkong/python/general/how-to-run-ocr-with-aspose-hugging-face-complete-guide/_index.md b/ocr/hongkong/python/general/how-to-run-ocr-with-aspose-hugging-face-complete-guide/_index.md new file mode 100644 index 000000000..50793b475 --- /dev/null +++ b/ocr/hongkong/python/general/how-to-run-ocr-with-aspose-hugging-face-complete-guide/_index.md @@ -0,0 +1,178 @@ +--- +category: general +date: 2026-04-29 +description: 學習如何對掃描檔執行 OCR,自動使用 Hugging Face 模型,並在數分鐘內使用 Aspose OCR 辨識掃描文字。 +draft: false +keywords: +- how to run OCR +- use hugging face model +- recognize text from scans +- download model automatically +language: zh-hant +og_description: 如何使用 Aspose OCR 於掃描檔執行 OCR,自動下載 Hugging Face 模型,並取得乾淨且有標點的文字。 +og_title: 如何使用 Aspose 與 Hugging Face 進行 OCR – 完整指南 +tags: +- OCR +- Aspose +- Hugging Face +- Python +title: 如何使用 Aspose 與 Hugging Face 執行 OCR – 完整指南 +url: 
/zh-hant/python/general/how-to-run-ocr-with-aspose-hugging-face-complete-guide/ +--- + +{{< blocks/products/pf/main-wrap-class >}} +{{< blocks/products/pf/main-container >}} +{{< blocks/products/pf/tutorial-page-section >}} + +# 如何使用 Aspose 與 Hugging Face 執行 OCR – 完整指南 + +有沒有想過 **如何在一堆掃描文件上執行 OCR**,卻不需要花上數小時調整設定?你並不孤單。在許多專案中,開發者需要快速 **從掃描中辨識文字**,卻常因模型下載與後處理卡關。 + +好消息:本教學示範一個即開即用的解決方案,**使用 Hugging Face 模型**,自動下載,並加入標點符號,使輸出看起來像是人類寫的。完成後,你將擁有一支腳本,能處理資料夾內的每張圖片,並在每個掃描檔旁產生乾淨的 `.txt` 檔案。 + +## 需要的環境 + +- Python 3.8+(程式碼使用 f‑string,此語法自 Python 3.6 起提供;建議使用 3.8+) +- `aspose-ocr` 套件(透過 `pip install aspose-ocr` 安裝) +- 首次下載模型時需要網路連線 +- 一個放置影像掃描檔的資料夾(`.png`、`.jpg` 或 `.tif`) + +就這些:不需要額外的二進位檔,也不需要手動調整模型。現在就開始吧。 + +![如何執行 OCR 示例](https://example.com/ocr-demo.png "如何執行 OCR 示例") + +## 步驟 1:匯入 Aspose OCR 類別並設定環境 + +我們先從 Aspose OCR 函式庫中匯入必要的類別。一次匯入全部可以讓腳本保持整潔,也方便快速發現缺少的相依性。 + +```python +# Step 1: Import Aspose OCR classes +import os +from aspose.ocr import OcrEngine, AsposeAI, AsposeAIModelConfig +``` + +*為什麼這很重要*:`OcrEngine` 負責主要的文字辨識工作,而 `AsposeAI` 讓我們能接入大型語言模型,以進行更智慧的後處理。如果省略匯入,後續程式碼在第一次使用這些類別時就會因 `NameError` 而中止,千萬別忘了。 + +## 步驟 2:設定支援 GPU 的 Hugging Face 模型 + +現在告訴 Aspose 從哪裡取得模型,以及有多少層要在 GPU 上執行。`allow_auto_download="true"` 旗標會自動為你 **下載模型**。 + +```python +# Step 2: Configure a GPU‑aware AI model (replace with your own model folder) +model_config = AsposeAIModelConfig( + allow_auto_download="true", + hugging_face_repo_id="Qwen/Qwen2.5-3B-Instruct-GGUF", + hugging_face_quantization="int8", + gpu_layers=40, # use GPU for faster inference + directory_model_path=r"YOUR_DIRECTORY/models" +) +``` + +> **小技巧**:若沒有 GPU,請將 `gpu_layers` 設為 `0`。模型會回退到 CPU,速度較慢但仍可正常運作。 + +### 為什麼選擇 Hugging Face 模型? 
+ +Hugging Face 提供大量即用的 LLM。指向 `Qwen/Qwen2.5-3B-Instruct-GGUF` 後,你會得到一個小型、指令微調的模型,能加入標點、校正空格,甚至修正少量 OCR 錯誤。這正是 **使用 Hugging Face 模型** 的實際效益。 + +## 步驟 3:初始化 AI 引擎並啟用標點後處理 + +AI 引擎不只是用來聊天——在這裡我們掛上 *標點添加器*,把原始 OCR 輸出清理乾淨。 + +```python +# Step 3: Initialise the AI engine and enable punctuation post‑processing +ai_engine = AsposeAI() +ai_engine.set_post_processor("punctuation_adder", {}) +``` + +*發生了什麼事?* `set_post_processor` 會註冊內建的後處理器,於 OCR 引擎完成後執行。它會把原始字串插入逗號、句號與大寫字母,使最終文字更易閱讀。 + +## 步驟 4:建立 OCR 引擎並掛上 AI 引擎 + +將 AI 引擎連結到 OCR 引擎,我們就得到一個同時能讀取字元與潤飾結果的單一物件。 + +```python +# Step 4: Create the OCR engine and attach the AI engine +ocr_engine = OcrEngine() +ocr_engine.set_ai_engine(ai_engine) +``` + +如果跳過此步驟,OCR 仍能運作,但會失去標點增強——輸出會變成一長串沒有標點的文字。 + +## 步驟 5:處理資料夾內的每張影像 + +以下是本教學的核心。我們會遍歷每張影像、執行 OCR、套用後處理,最後把清理過的文字寫入同名的 `.txt` 檔案。 + +```python +# Step 5: Run OCR on each image in a folder, post‑process the result, and save the text +scans_folder = r"YOUR_DIRECTORY/scans" +for image_file in os.listdir(scans_folder): + # Filter only supported image types + if not image_file.lower().endswith(('.png', '.jpg', '.tif')): + continue + + image_path = os.path.join(scans_folder, image_file) + + # Recognise text from the image + ocr_result = ocr_engine.recognize(image_path) + + # Apply the punctuation post‑processor + ocr_result = ocr_engine.run_postprocessor(ocr_result) + + # Show a brief confidence summary + print(f"{image_file} – confidence {ocr_result.confidence:.2%}") + + # Save the cleaned text next to the source image + txt_path = image_path + ".txt" + with open(txt_path, "w", encoding="utf-8") as txt_file: + txt_file.write(ocr_result.text) +``` + +### 預期結果 + +執行腳本時會印出類似以下的資訊: + +``` +invoice_001.png – confidence 96.73% +receipt_2024.tif – confidence 94.12% +``` + +每一行會顯示信心分數(快速健康檢查),並產生 `invoice_001.png.txt`、`receipt_2024.tif.txt` 等檔案,內含已加標點、可供人閱讀的文字。 + +### 邊緣案例與變化 + +- **非英文掃描**:將 `hugging_face_repo_id` 改為多語言模型(例如 `microsoft/Multilingual-LLM-GGUF`)。 +- **大量批次**:將迴圈包在 
`concurrent.futures.ThreadPoolExecutor` 中以平行處理,但需留意 GPU 記憶體上限。 +- **自訂後處理**:若需要領域特定的清理(例如移除發票號碼),可將 `"punctuation_adder"` 換成自訂腳本。 + +## 步驟 6:釋放資源 + +工作結束後釋放資源可防止記憶體洩漏,特別是當你在長時間服務中執行此程式時。 + +```python +# Step 6: Release resources +ai_engine.free_resources() +ocr_engine.dispose() +``` + +忽略此步驟可能會讓 GPU 記憶體卡住,進而影響後續執行。 + +## 小結:完整的 OCR 流程 + +只需幾行程式碼,我們就示範了 **如何在資料夾內的掃描文件上執行 OCR**、**使用會自動下載的 Hugging Face 模型**,以及 **自動加入標點的文字辨識**。完整腳本已可直接複製、調整路徑後執行。 + +## 後續步驟與相關主題 + +- **批次後處理**:探索 `ocr_engine.run_batch_postprocessor` 以加速大量處理。 +- **替代模型**:若需要語音轉文字,可嘗試 `openai/whisper` 系列。 +- **與資料庫整合**:將擷取的文字存入 SQLite 或 Elasticsearch,打造可搜尋的檔案庫。 + +盡情實驗吧:換模型、調整 `gpu_layers`,或加入自訂後處理器。Aspose OCR 結合 Hugging Face 模型中心的彈性,讓它成為任何文件數位化專案的多功能基礎。 + +--- + +*祝開發順利!若遇到問題,歡迎在下方留言或查閱 Aspose OCR 文件,了解更深入的設定選項。* + +{{< /blocks/products/pf/tutorial-page-section >}} +{{< /blocks/products/pf/main-container >}} +{{< /blocks/products/pf/main-wrap-class >}} +{{< blocks/products/products-backtop-button >}} \ No newline at end of file diff --git a/ocr/hongkong/python/general/perform-ocr-on-image-with-python-complete-guide/_index.md b/ocr/hongkong/python/general/perform-ocr-on-image-with-python-complete-guide/_index.md new file mode 100644 index 000000000..564649536 --- /dev/null +++ b/ocr/hongkong/python/general/perform-ocr-on-image-with-python-complete-guide/_index.md @@ -0,0 +1,204 @@ +--- +category: general +date: 2026-04-29 +description: 使用 Python 對圖像執行 OCR,自動下載 HuggingFace 模型,並在清理 OCR 文字的同時有效釋放 GPU 記憶體。 +draft: false +keywords: +- perform OCR on image +- download HuggingFace model python +- release GPU memory python +- clean OCR text python +language: zh-hant +og_description: 學習如何在 Python 中對影像執行 OCR、自動下載 HuggingFace 模型、清理文字並釋放 GPU 記憶體。 +og_title: 使用 Python 對圖像執行 OCR – 逐步指南 +tags: +- OCR +- Python +- Aspose +- HuggingFace +title: 使用 Python 進行圖像 OCR – 完整指南 +url: /zh-hant/python/general/perform-ocr-on-image-with-python-complete-guide/ +--- + +{{< blocks/products/pf/main-wrap-class >}} +{{< 
blocks/products/pf/main-container >}} +{{< blocks/products/pf/tutorial-page-section >}} + +# 在 Python 中對圖像執行 OCR – 完整指南 + +是否曾需要**對圖像執行 OCR**但在模型下載或 GPU 記憶體清理階段卡住?你並非唯一——許多開發者在首次嘗試將光學字符識別與大型語言模型結合時都會碰到這個問題。 + +在本教學中,我們將逐步說明一個完整的端到端解決方案,該方案**在 Python 中下載 HuggingFace 模型**、執行 Aspose OCR、清理原始輸出,最後**釋放 Python 可回收的 GPU 記憶體**。完成後,你將擁有一個可直接執行的腳本,將掃描的 PNG 轉換為精緻且可搜尋的文字。 + +> **你將獲得:** 完整、可執行的程式碼範例、每個步驟重要性的說明、避免常見陷阱的技巧,以及如何為自己的專案微調管線的概覽。 + +--- + +## 需求環境 + +- Python 3.9 或更新版本(範例在 3.11 上測試) +- `aspose-ocr` 套件(透過 `pip install aspose-ocr` 安裝) +- 需要網路連線以下載 **在 Python 中下載 HuggingFace 模型** 步驟 +- 如果想要加速,請使用相容 CUDA 的 GPU(非必須但建議) + +不需要額外的系統層級相依性;Aspose OCR 引擎已捆綁所有必要的元件。 + +![對圖像執行 OCR 範例](image.png "使用 Aspose OCR 與 LLM 後處理器執行 OCR 的範例") + +*圖片說明文字:“對圖像執行 OCR – Aspose OCR 在 AI 清理前後的輸出”* + +--- + +## 執行 OCR 步驟概覽 + +以下我們將工作流程拆分為邏輯區塊。每個區塊都有自己的標題,讓 AI 助手能快速跳至你感興趣的部分,搜尋引擎也能索引相關關鍵字。 + +### 1. 在 Python 中下載 HuggingFace 模型 + +我們首先需要取得一個語言模型,作為原始 OCR 輸出的後處理器。Aspose OCR 附帶一個名為 `AsposeAI` 的輔助類別,能自動從 HuggingFace hub 下載模型。 + +```python +import aspose.ocr as aocr +from aspose.ocr import AsposeAI, AsposeAIModelConfig, OcrEngine + +# Configure the model – it will auto‑download the first time you run it +model_config = AsposeAIModelConfig( + allow_auto_download="true", # <-- enables auto‑download + hugging_face_repo_id="Qwen/Qwen2.5-3B-Instruct-GGUF", + hugging_face_quantization="int8", # smaller footprint on GPU + gpu_layers=20, # how many layers stay on GPU + directory_model_path=r"YOUR_DIRECTORY" # where the model files live +) +``` + +**為什麼這很重要:** +- **在 Python 中下載 HuggingFace 模型** – 你可以避免手動處理 zip 檔或令牌驗證。 +- 使用 `int8` 量化可將模型縮減至原始大小約四分之一,這在之後需要**釋放 Python GPU 記憶體**時至關重要。 + +> **專業提示:** 將 `directory_model_path` 放在 SSD 上以加快載入速度。 + +--- + +### 2. 
初始化 AI 輔助工具並啟用拼寫檢查 + +現在我們建立 `AsposeAI` 實例並附加拼寫校正後處理器。這就是 **在 Python 中清理 OCR 文字** 魔法的開始。 + +```python +# Initialise the AI helper +ai_helper = AsposeAI() +ai_helper.set_post_processor( + processor="spell_corrector", + custom_settings={"max_edits": 2} # allows up to two character edits per word +) +``` + +**說明:** +拼寫校正器會檢查 OCR 引擎的每個 token,並根據 `max_edits` 提出編輯建議。這個小調整即可將 “rec0gn1tion” 轉為 “recognition”,而不需要大型語言模型。 + +--- + +### 3. 將 AI 輔助工具掛接至 OCR 引擎 + +Aspose 在 23.4 版中引入了一個新方法,允許你直接將 AI 引擎插入 OCR 流程。 + +```python +# Initialise the OCR engine and attach the AI helper +ocr_engine = OcrEngine() +ocr_engine.set_ai_engine(ai_helper) # new in v23.4 +``` + +**為什麼這麼做:** +提前接入 AI 輔助工具後,OCR 引擎可以選擇性地使用模型進行即時改進(例如版面偵測)。同時保持程式碼整潔,之後不需要額外的後處理迴圈。 + +--- + +### 4. 對掃描圖像執行 OCR + +以下是實際**對圖像執行 OCR**的核心步驟。將 `YOUR_DIRECTORY/input.png` 替換為你自己的掃描檔案路徑。 + +```python +image_path = r"YOUR_DIRECTORY/input.png" +ocr_result = ocr_engine.recognize(image_path) + +print("Raw OCR text:") +print(ocr_result.text) +``` + +典型的原始輸出可能在奇怪的位置出現換行、錯誤辨識的字元或雜散符號。這就是為什麼需要下一步。 + +**預期的原始輸出(範例):** + +``` +Th1s 1s 4n ex4mpl3 0f r4w OCR t3xt. +It c0ntains numb3rs 123 and s0me m1stakes. +``` + +--- + +### 5. 使用 AI 後處理器在 Python 中清理 OCR 文字 + +現在讓 AI 清理這些雜訊。這就是 **在 Python 中清理 OCR 文字** 流程的核心。 + +```python +cleaned_result = ocr_engine.run_postprocessor(ocr_result) + +print("\nAI‑enhanced text:") +print(cleaned_result.text) +``` + +**你將看到的結果:** + +``` +This is an example of raw OCR text. +It contains numbers 123 and some mistakes. +``` + +請注意拼寫校正器將 “Th1s” 修正為 “This”,並將 “4n” 還原為 “an”。模型同時會正規化空格,這在之後將文字輸入下游 NLP 流程時常是個痛點。 + +--- + +### 6. 
在 Python 中釋放 GPU 記憶體 – 清理步驟 + +完成後,釋放 GPU 資源是良好實踐,特別是當你在長時間服務中執行多個 OCR 任務時。 + +```python +# Release resources – crucial for GPU memory +ai_helper.free_resources() +ocr_engine.dispose() +``` + +**背後發生的事:** +`free_resources()` 從 GPU 卸載模型,將記憶體歸還給 CUDA 驅動程式。`dispose()` 關閉 OCR 引擎的內部緩衝區。若省略這些呼叫,僅在處理少量圖像後就可能出現記憶體不足錯誤。 + +> **請記住:** 若打算在迴圈中批次處理,請在每個批次後呼叫清理,或在最後才釋放 `ai_helper`,以避免過早釋放。 + +--- + +## 加分項:為不同情境微調管線 + +### 調整模型量化 + +如果你擁有強大的 GPU(例如 RTX 4090)且想要更高的準確度,請將 `hugging_face_quantization` 改為 `"fp16"`,並將 `gpu_layers` 提升至 `30`。這會消耗更多記憶體,因此需要在每個批次後更積極地**釋放 Python GPU 記憶體**。 + +### 使用自訂拼寫檢查器 + +你可以將內建的 `spell_corrector` 替換為自訂的後處理器,以執行領域特定的校正(例如醫學術語)。只需實作所需介面,並將其名稱傳遞給 `set_post_processor` 即可。 + +### 批次處理多張圖像 + +將 OCR 步驟包在 `for` 迴圈中,將 `cleaned_result.text` 收集到列表,若 GPU 記憶體足夠,則僅在迴圈結束後呼叫 `ai_helper.free_resources()`。這可減少重複載入模型的開銷。 + +--- + +## 結論 + +我們剛剛示範了如何在 Python 中**對圖像執行 OCR**,自動**下載 HuggingFace 模型**、**清理 OCR 文字**,並在完成後安全**釋放 GPU 記憶體**。完整腳本已可直接複製貼上,說明則提供了將其套用至更大型專案的信心。 + +接下來的步驟?嘗試將 Qwen 2.5 模型換成更大的 LLaMA 變種,實驗不同的後處理器,或將清理後的輸出整合至可搜尋的 Elasticsearch 索引。可能性無窮,而你已擁有堅實的基礎可供構建。 + +祝編程愉快,願你的 OCR 管線永遠乾淨且記憶體友好! + +{{< /blocks/products/pf/tutorial-page-section >}} +{{< /blocks/products/pf/main-container >}} +{{< /blocks/products/pf/main-wrap-class >}} +{{< blocks/products/products-backtop-button >}} \ No newline at end of file diff --git a/ocr/hungarian/python/general/extract-text-from-pdf-ocr-pdf-with-python/_index.md b/ocr/hungarian/python/general/extract-text-from-pdf-ocr-pdf-with-python/_index.md new file mode 100644 index 000000000..e7153f8c5 --- /dev/null +++ b/ocr/hungarian/python/general/extract-text-from-pdf-ocr-pdf-with-python/_index.md @@ -0,0 +1,219 @@ +--- +category: general +date: 2026-04-29 +description: Szöveg kinyerése PDF‑ből az Aspose OCR használatával Pythonban. Tanulja + meg a kötegelt OCR PDF feldolgozást, a beolvasott PDF szöveg konvertálását, és az + alacsony megbízhatóságú oldalak kezelését. 
+draft: false +keywords: +- extract text from pdf +- ocr pdf with python +- convert scanned pdf text +- batch ocr pdf processing +language: hu +og_description: Szöveg kinyerése PDF‑ből az Aspose OCR‑rel Pythonban. Ez az útmutató + bemutatja a kötegelt OCR PDF‑feldolgozást, a beolvasott PDF‑szöveg konvertálását + és az alacsony bizalomú eredmények kezelését. +og_title: Szöveg kinyerése PDF-ből – OCR PDF Python-nal +tags: +- OCR +- Python +- PDF processing +title: Szöveg kinyerése PDF-ből – OCR PDF Python-nal +url: /hu/python/general/extract-text-from-pdf-ocr-pdf-with-python/ +--- + +{{< blocks/products/pf/main-wrap-class >}} +{{< blocks/products/pf/main-container >}} +{{< blocks/products/pf/tutorial-page-section >}} + +# Szöveg kinyerése PDF-ből – OCR PDF Pythonban + +Valaha szükséged volt **szöveg kinyerésére PDF-ből**, de a fájl csak egy beolvasott kép? Nem vagy egyedül – sok fejlesztő ütközik ebbe a falba, amikor a PDF-eket kereshető adatokká szeretnék alakítani. A jó hír? Az Aspose OCR for Python segítségével néhány sorban átalakíthatod a beolvasott PDF szövegét, és akár **csoportos OCR PDF feldolgozást** is futtathatsz, ha tucatnyi fájlt kell kezelned. + +Ebben az útmutatóban végigvezetünk a teljes munkafolyamaton: a könyvtár beállítása, OCR futtatása egyetlen PDF-en, a folyamat skálázása csoportos feldolgozásra, valamint az alacsony bizalomú oldalakkal való foglalkozás, hogy tudd, mikor szükséges a kézi felülvizsgálat. A végére egy kész‑futtatható szkriptet kapsz, amely bármely beolvasott PDF-ből kinyeri a szöveget, és megérted az egyes lépések okát. 
+ +## Szükséged lesz + +Mielőtt belemerülnénk, győződj meg róla, hogy rendelkezel: + +- Python 3.8 vagy újabb (a kód f‑stringeket használ, így a 3.6+ működik, de a 3.8+ ajánlott) +- Aspose OCR for Python licenc vagy egy ingyenes próbaverzió kulcs (a Aspose weboldaláról szerezhető be) +- Egy mappa egy vagy több beolvasott PDF-fájllal, amelyet feldolgozni szeretnél +- Mérsékelt mennyiségű lemezterület a generált *.txt* jelentésekhez + +Ennyi—nincs nehéz külső függőség, nincs OpenCV akrobácia. Az Aspose OCR motor elvégzi a nehéz munkát helyetted. + +## A környezet beállítása + +Először telepítsd az Aspose OCR csomagot a PyPI‑ról: + +```bash +pip install aspose-ocr +``` + +Ha rendelkezel licencfájllal (`Aspose.OCR.lic`), helyezd a projekt gyökerébe, és aktiváld a következő módon: + +```python +# Activate Aspose OCR license (optional but removes trial watermark) +from aspose.ocr import License + +license = License() +license.set_license("Aspose.OCR.lic") +``` + +> **Pro tipp:** Tartsd a licencfájlt a verziókezelésen kívül; add hozzá a `.gitignore`‑hoz, hogy elkerüld a véletlen kiszivárgást. + +## OCR végrehajtása egyetlen PDF-en + +Most vonjunk ki szöveget egyetlen beolvasott PDF-ből. A fő lépések a következők: + +1. Hozz létre egy `OcrEngine` példányt. +2. Add meg a PDF fájlt. +3. Szerezd meg az `OcrResult`‑ot minden oldalhoz. +4. Írd a sima szöveges kimenetet a lemezre. +5. Felszabadítsd a motort a natív erőforrások felszabadításához. 
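
A fenti 3. és 4. lépés logikája a könyvtár nélkül is kipróbálható. Az alábbi vázlat feltételezi, hogy minden oldaleredmény rendelkezik `page_number`, `confidence` és `text` mezővel; a `PageResult` osztály és a `write_pages` függvény csak szemléltető, kitalált név, nem az Aspose API része.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass
class PageResult:
    """Hypothetical stand-in for a per-page OCR result (not an Aspose class)."""
    page_number: int
    confidence: float  # assumed to lie between 0.0 and 1.0
    text: str


def write_pages(pages, output_dir, threshold=0.80):
    """Write one .txt file per page; return the page numbers below the threshold."""
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    low_confidence = []
    for page in pages:
        # One file per page, named after the page number
        out_file = output_dir / f"Report_page{page.page_number}.txt"
        out_file.write_text(page.text, encoding="utf-8")
        # Collect pages that probably need a manual review
        if page.confidence < threshold:
            low_confidence.append(page.page_number)
    return low_confidence
```

A visszaadott lista azokat az oldalszámokat tartalmazza, amelyeket érdemes kézzel átnézni; a 0,80-as küszöb csak kiindulási érték.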
+ +Itt a teljes, futtatható szkript: + +```python +# Step 1: Import the OCR engine and create an instance +from aspose.ocr import OcrEngine + +# Optional: activate license (see previous section) +# from aspose.ocr import License +# License().set_license("Aspose.OCR.lic") + +ocr_engine = OcrEngine() + +# Step 2: Specify the PDF file to be processed +pdf_file_path = r"YOUR_DIRECTORY/input.pdf" + +# Step 3: Extract OCR results – one OcrResult object per page +ocr_pages = ocr_engine.extract_from_pdf(pdf_file_path) + +# Step 4: Iterate through the results, show confidence, flag low‑confidence pages, and save the plain text +for ocr_page in ocr_pages: + print(f"Page {ocr_page.page_number}: confidence {ocr_page.confidence:.2%}") + + # If confidence is below 80 %, flag it for manual review + if ocr_page.confidence < 0.80: + print(" Low confidence – consider manual review") + + # Build the output filename + output_txt = f"YOUR_DIRECTORY/Report_page{ocr_page.page_number}.txt" + with open(output_txt, "w", encoding="utf-8") as f: + f.write(ocr_page.text) + +# Step 5: Release resources used by the engine +ocr_engine.dispose() +``` + +**Ami látható:** Minden oldalnál a szkript valami ilyesmit ír ki: `Page 1: confidence 97.45%`. Ha egy oldal a 80 % küszöb alá esik, figyelmeztetés jelenik meg, jelezve, hogy az OCR esetleg kihagyott karaktereket. + +### Miért működik ez + +- **`OcrEngine`** a natív Aspose OCR könyvtár kapuja; mindent kezel a képelőfeldolgozástól a karakterfelismerésig. +- **`extract_from_pdf`** automatikusan rasterizálja a PDF minden oldalát, így neked nem kell a PDF-et képekké konvertálni. +- **A bizalmi pontszámok** lehetővé teszik a minőség-ellenőrzés automatizálását – kritikus, ha jogi vagy orvosi dokumentumokat dolgozol fel, ahol a pontosság számít. + +## Csoportos OCR PDF feldolgozás Pythonban + +A legtöbb valós projekt több fájlt is érint. 
Bővítsük a egyfájlos szkriptet egy **csoportos OCR PDF feldolgozó** csővezetékké, amely bejár egy könyvtárat, feldolgozza minden PDF-et, és az eredményeket egy megfelelő alkönyvtárba menti. + +```python +import os +from pathlib import Path +from aspose.ocr import OcrEngine + +def ocr_pdf_file(pdf_path: Path, output_dir: Path, engine: OcrEngine, confidence_threshold: float = 0.80): + """Extract text from a single PDF and write per‑page .txt files.""" + ocr_pages = engine.extract_from_pdf(str(pdf_path)) + for page in ocr_pages: + print(f"{pdf_path.name} – Page {page.page_number}: {page.confidence:.2%}") + if page.confidence < confidence_threshold: + print(" ⚠️ Low confidence – manual review may be needed") + + out_file = output_dir / f"{pdf_path.stem}_page{page.page_number}.txt" + out_file.write_text(page.text, encoding="utf-8") + +def batch_ocr_pdf(input_folder: str, output_folder: str): + """Iterate over all PDFs in input_folder and run OCR.""" + engine = OcrEngine() + input_path = Path(input_folder) + output_path = Path(output_folder) + output_path.mkdir(parents=True, exist_ok=True) + + pdf_files = list(input_path.glob("*.pdf")) + if not pdf_files: + print("🚫 No PDF files found in the specified folder.") + return + + for pdf_file in pdf_files: + # Create a sub‑folder for each PDF to keep pages organized + pdf_out_dir = output_path / pdf_file.stem + pdf_out_dir.mkdir(exist_ok=True) + ocr_pdf_file(pdf_file, pdf_out_dir, engine) + + # Clean up native resources + engine.dispose() + print("✅ Batch OCR completed.") + +# Example usage: +if __name__ == "__main__": + batch_ocr_pdf("YOUR_DIRECTORY/input_pdfs", "YOUR_DIRECTORY/ocr_output") +``` + +#### Hogyan segít ez + +- **Skálázhatóság:** A függvény egyszer bejárja a mappát, minden PDF-hez létrehoz egy dedikált kimeneti alkönyvtárat. Ez rendezetten tartja a dolgokat, ha tucatnyi dokumentumod van. +- **Újrahasználhatóság:** A `ocr_pdf_file` más szkriptekből (pl. egy webszolgáltatásból) is meghívható, mivel tiszta függvény. 
+- **Hibakezelés:** A szkript barátságos üzenetet ír ki, ha a bemeneti mappa üres, így elkerülve a csendes hibát. + +## Beolvasott PDF szöveg konvertálása – Szélsőséges esetek kezelése + +Bár a fenti kód a legtöbb PDF-re működik, előfordulhatnak néhány furcsaságok: + +| Helyzet | Miért fordul elő | Hogyan lehet enyhíteni | +|-----------|----------------|-----------------| +| **Titkosított PDF-ek** | A PDF jelszóval védett. | Add meg a jelszót a `extract_from_pdf(pdf_path, password="yourPwd")` hívásban. | +| **Többnyelvű dokumentumok** | Az Aspose OCR alapértelmezés szerint angol. | Állítsd be `ocr_engine.language = "spa"` spanyolhoz, vagy adj meg egy listát vegyes nyelvekhez. | +| **Nagyon nagy PDF-ek (>500 oldal)** | A memóriahasználat megugrik, mivel minden oldal RAM-ba töltődik. | A PDF-et darabokban dolgozd fel a `engine.extract_from_pdf(pdf_path, start_page=1, end_page=100)` használatával, és ismételd a ciklust. | +| **Rossz minőségű beolvasás** | Alacsony DPI vagy erős zaj csökkenti a bizalmat. | Előfeldolgozd a PDF-et a `engine.image_preprocessing = True` beállítással, vagy növeld a DPI-t a `engine.dpi = 300` segítségével. | + +> **Figyelem:** A képelőfeldolgozás bekapcsolása jelentősen növelheti a CPU időt. Ha éjszakai batch-et futtatsz, ütemezz elegendő időt, vagy indíts egy külön munkavégzőt. + +## A kimenet ellenőrzése + +A szkript befejezése után egy ilyen mappaszerkezetet találsz: + +``` +ocr_output/ +├─ invoice_2023/ +│ ├─ invoice_2023_page1.txt +│ ├─ invoice_2023_page2.txt +│ └─ … +└─ contract_A/ + ├─ contract_A_page1.txt + └─ … +``` + +Nyiss meg egy `.txt` fájlt; tiszta, UTF‑8 kódolású szöveget kell látnod, amely tükrözi az eredeti beolvasott tartalmat. Ha torz karaktereket látsz, ellenőrizd a PDF nyelvi beállításait, és győződj meg róla, hogy a megfelelő betűkészletek telepítve vannak a gépen. + +## Erőforrások tisztítása + +Az Aspose OCR natív DLL-ekre támaszkodik, ezért elengedhetetlen a `engine.dispose()` meghívása a munka befejezése után. 
Ennek elhagyása memória szivárgáshoz vezethet, különösen hosszú ideig futó batch feladatoknál. + +```python +# Always the last line of your script +engine.dispose() +``` + +## Teljes vég‑től‑végig példa + +Mindent összevonva, itt egy egyetlen + +{{< /blocks/products/pf/tutorial-page-section >}} +{{< /blocks/products/pf/main-container >}} +{{< /blocks/products/pf/main-wrap-class >}} +{{< blocks/products/products-backtop-button >}} \ No newline at end of file diff --git a/ocr/hungarian/python/general/how-to-recognize-handwriting-in-python-full-tutorial/_index.md b/ocr/hungarian/python/general/how-to-recognize-handwriting-in-python-full-tutorial/_index.md new file mode 100644 index 000000000..e759c56b0 --- /dev/null +++ b/ocr/hungarian/python/general/how-to-recognize-handwriting-in-python-full-tutorial/_index.md @@ -0,0 +1,279 @@ +--- +category: general +date: 2026-04-29 +description: Tanulja meg, hogyan ismerje fel a kézírást Pythonban az Aspose OCR-rel. + Ez a lépésről‑lépésre útmutató bemutatja, hogyan lehet hatékonyan kinyerni a kézírásos + szöveget. +draft: false +keywords: +- how to recognize handwriting +- extract handwritten text +- handwritten text recognition python +- handwritten ocr tutorial python +language: hu +og_description: Hogyan ismerjük fel a kézírást Pythonban? Kövesd ezt a teljes útmutatót + a kézírásos szöveg kinyeréséhez az Aspose OCR használatával, kóddal, tippekkel és + szélsőséges esetek kezelésével. 
+og_title: Hogyan ismerjünk fel kézírást Pythonban – Teljes útmutató +tags: +- OCR +- Python +- HandwritingRecognition +title: Hogyan ismerjünk fel kézírást Pythonban – Teljes útmutató +url: /hu/python/general/how-to-recognize-handwriting-in-python-full-tutorial/ +--- + +{{< blocks/products/pf/main-wrap-class >}} +{{< blocks/products/pf/main-container >}} +{{< blocks/products/pf/tutorial-page-section >}} + +# Hogyan ismerjünk fel kézírást Pythonban – Teljes útmutató + +Valaha szükséged volt **hogyan ismerjünk fel kézírást** egy Python projektben, de nem tudtad, hol kezdj? Nem vagy egyedül – a fejlesztők gyakran kérdezik: „Kivonhatok szöveget egy beolvasott jegyzetből?” A jó hír, hogy a modern OCR könyvtárak ezt gyerekjátékra változtatják. Ebben az útmutatóban végigvezetünk a **kézírás felismerésének** folyamatán az Aspose OCR használatával, és megtanulod, hogyan **vonj ki kézírásos szöveget** megbízhatóan. + +Mindent lefedünk a könyvtár telepítésétől a megbízhatósági küszöbök finomhangolásáig a kusza, folyó írásokhoz. A végére egy futtatható szkriptet kapsz, amely kiírja a kinyert szöveget és egy általános megbízhatósági pontszámot – tökéletes jegyzetkészítő alkalmazásokhoz, archiváló eszközökhöz, vagy egyszerűen csak a kíváncsiság kielégítéséhez. Korábbi OCR tapasztalat nem szükséges; elegendő az alap Python tudás. + +--- + +## Amire szükséged lesz + +- **Python 3.9+** (a legújabb stabil verzió a legjobb) +- **Aspose.OCR for Python via .NET** – telepítsd a `pip install aspose-ocr` paranccsal +- Egy **kézírásos kép** (JPEG/PNG), amelyet fel szeretnél dolgozni +- Opcionálisan: egy virtuális környezet a függőségek rendezett tartásához + +Ha ezek megvannak, merüljünk el benne. + +![Kézírás felismerésének példája](/images/handwritten-sample.jpg "Kézírás felismerésének példája") + +*(Alt szöveg: “kézírás felismerésének példája, amely egy beolvasott kézírásos jegyzetet mutat”)* + +--- + +## 1. 
lépés – Aspose OCR osztályok telepítése és importálása + +Először is szükségünk van magára az OCR motorra. Az Aspose egy tiszta API-t biztosít, amely elválasztja a nyomtatott szöveg felismerését a kézírásos módtól. + +```python +# Install the package (run this in your terminal if you haven't already) +# pip install aspose-ocr + +# Import the required OCR classes +from aspose.ocr import OcrEngine, HandwritingMode +``` + +*Miért fontos:* A `HandwritingMode` importálásával jelezhetjük a motor számára, hogy **kézírásos szövegfelismerés python**-ról van szó, nem nyomtatott szövegről, ami drámaian javítja a pontosságot a folyó írások esetén. + +--- + +## 2. lépés – OCR motor létrehozása és konfigurálása + +Most elindítunk egy `OcrEngine` példányt, és átváltjuk kézírásos módra. A megbízhatósági küszöböt is beállíthatod; az alacsonyabb értékek engedélyezik a remegő írást, a magasabb értékek tisztább bemenetet igényelnek. + +```python +# Step 2: Initialize the OCR engine for handwritten text +ocr_engine = OcrEngine() +ocr_engine.set_recognition_mode(HandwritingMode.HANDWRITTEN) + +# Optional: tighten the confidence threshold for cursive scripts +# Default is 0.5; 0.65 works well for most notebooks +ocr_engine.set_handwriting_confidence_threshold(0.65) +``` + +*Pro tipp:* Ha a jegyzeteid 300 DPI vagy magasabb felbontásban vannak beolvasva, általában jobb pontszámot kapsz. Alacsony felbontású képek esetén fontold meg a Pillow‑al történő felméretezést, mielőtt a motorba adod őket. + +--- + +## 3. lépés – Kép útvonalának előkészítése + +Győződj meg róla, hogy a fájl útvonal a feldolgozni kívánt képre mutat. Relatív útvonalak is működnek, de az abszolút útvonalak elkerülik a „fájl nem található” meglepetéseket. + +```python +# Step 3: Point to your handwritten image +input_image_path = "YOUR_DIRECTORY/handwritten_sample.jpg" +``` + +*Gyakori hibapont:* Elfelejteni a visszaperjelek escape‑elését Windows-on (`C:\\folder\\image.jpg`). 
A raw stringek használata (`r"C:\folder\image.jpg"`) megkerüli ezt a problémát. + +--- + +## 4. lépés – Felismerés futtatása és eredmények rögzítése + +A `recognize` metódus végzi a nehéz munkát. Egy objektumot ad vissza, amelynek `.text` és `.confidence` tulajdonságai vannak. + +```python +# Step 4: Run OCR and get the result +result = ocr_engine.recognize(input_image_path) + +# Display the extracted text and confidence +print("Hand‑written extraction:") +print(result.text) +print("Overall confidence:", result.confidence) +``` + +**Várható kimenet (példa):** + +``` +Hand‑written extraction: +Meeting notes: +- Discuss quarterly goals +- Review budget allocations +- Assign action items +Overall confidence: 0.78 +``` + +Ha a megbízhatóság 0,5 alá esik, előfordulhat, hogy tisztítani kell a képet (árnyékok eltávolítása, kontraszt növelése) vagy alacsonyabb küszöböt kell beállítani a 2. lépésben. + +--- + +## 5. lépés – Erőforrások felszabadítása + +Az Aspose OCR natív erőforrásokat tart fenn; a `dispose()` meghívása felszabadítja ezeket és megakadályozza a memória szivárgást, különösen ha sok képet dolgozol fel egy ciklusban. + +```python +# Step 5: Release the engine resources +ocr_engine.dispose() +``` + +*Miért kell felszabadítani?* Hosszú ideig futó szolgáltatásokban (például egy Flask API, amely feltöltéseket fogad) az erőforrások felszabadításának elhagyása gyorsan kimerítheti a rendszer memóriáját. + +--- + +## Teljes szkript – Egy kattintásos futtatás + +Mindent összevonva, itt egy önálló szkript, amelyet másolhatsz és futtathatsz. + +```python +# -*- coding: utf-8 -*- +""" +Handwritten OCR Tutorial – how to recognize handwriting in Python +Author: Your Name +Date: 2026-04-29 +""" + +# Install the library first if you haven't already: +# pip install aspose-ocr + +from aspose.ocr import OcrEngine, HandwritingMode + +def recognize_handwriting(image_path: str, confidence_threshold: float = 0.65): + """ + Recognizes handwritten text from an image using Aspose OCR. 
+ + Args: + image_path (str): Path to the handwritten image file. + confidence_threshold (float): Minimum confidence to accept a word. + + Returns: + tuple: (extracted_text (str), overall_confidence (float)) + """ + # Initialize engine in handwritten mode + engine = OcrEngine() + engine.set_recognition_mode(HandwritingMode.HANDWRITTEN) + engine.set_handwriting_confidence_threshold(confidence_threshold) + + # Perform recognition + result = engine.recognize(image_path) + + # Capture output + extracted = result.text + confidence = result.confidence + + # Clean up + engine.dispose() + return extracted, confidence + +if __name__ == "__main__": + # Replace with your actual file location + img_path = "YOUR_DIRECTORY/handwritten_sample.jpg" + + text, conf = recognize_handwriting(img_path) + + print("Hand‑written extraction:") + print(text) + print("Overall confidence:", conf) +``` + +Mentsd el `handwritten_ocr.py` néven, és futtasd a `python handwritten_ocr.py` parancsot. Ha minden helyesen van beállítva, a kinyert szöveg megjelenik a konzolon. + +--- + +## Szélsőséges esetek és gyakori variációk kezelése + +### Alacsony kontrasztú képek +Ha a háttér átjár a tintába, először növeld a kontrasztot: + +```python +from PIL import Image, ImageEnhance + +img = Image.open(input_image_path) +enhancer = ImageEnhance.Contrast(img) +high_contrast = enhancer.enhance(2.0) # 2× contrast +high_contrast.save("temp_high_contrast.jpg") +result = ocr_engine.recognize("temp_high_contrast.jpg") +``` + +### Elforgatott jegyzetek +A ferde füzetoldal megzavarhatja a felismerést. 
Használd az OpenCV‑t a kiegyenesítéshez: + +```python +from PIL import Image +import numpy as np +import cv2 + +def deskew(image_path): + img = cv2.imread(image_path, 0) + coords = np.column_stack(np.where(img > 0)) + angle = cv2.minAreaRect(coords)[-1] + if angle < -45: + angle = -(90 + angle) + else: + angle = -angle + (h, w) = img.shape[:2] + M = cv2.getRotationMatrix2D((w // 2, h // 2), angle, 1.0) + corrected = cv2.warpAffine(img, M, (w, h), flags=cv2.INTER_CUBIC, borderMode=cv2.BORDER_REPLICATE) + cv2.imwrite("deskewed.jpg", corrected) + +deskew(input_image_path) +result = ocr_engine.recognize("deskewed.jpg") +``` + +### Többoldalas PDF-ek +Az Aspose OCR képes PDF oldalakat is kezelni, de előbb minden oldalt képpé kell konvertálni (például a `pdf2image` használatával). Ezután a képeken ugyanazzal a `recognize_handwriting` függvénnyel iterálhatsz. + +--- + +## Pro tippek a jobb **Extract Handwritten Text** eredményekhez + +- **A DPI számít:** A beolvasáskor célozz 300 DPI vagy magasabb felbontást. +- **Kerüld a színes háttereket:** A tiszta fehér vagy világosszürke a legjobb kimenetet adja. +- **Kötegelt feldolgozás:** Csomagold a függvényt egy `for` ciklusba, és naplózd minden oldal megbízhatóságát; a küszöbnél alacsonyabb eredményeket dobd el a magas minőség fenntartása érdekében. +- **Nyelvtámogatás:** Az Aspose OCR több nyelvet támogat; állítsd be `engine.set_language("en")`-t az angolra optimalizáláshoz. + +--- + +## Gyakran ismételt kérdések + +**Működik ez Linuxon?** +Igen – az Aspose OCR natív binárisokkal érkezik Windows, macOS és Linux számára. Csak telepítsd a pip csomagot, és már használhatod. + +**Mi van, ha a kézírásom rendkívül folyó?** +Próbáld meg alacsonyabbra állítani a megbízhatósági küszöböt (`0.5` vagy akár `0.4`). Vedd figyelembe, hogy ez több zajt hozhat, ezért szükség esetén utófeldolgozd a kimenetet (például helyesírás-ellenőrzéssel). + +**Használhatom ezt webszolgáltatásban?** +Természetesen. 
A `recognize_handwriting` függvény állapotmentes, így tökéletes Flask vagy FastAPI végpontokhoz. Ne felejtsd el minden kérés után meghívni a `dispose()`-t, vagy használj context managert. + +--- + +## Összegzés + +Áttekintettük, hogyan **ismerhetünk fel kézírást** Pythonban a kezdetektől a végéig, megmutatva, hogyan **vonhatsz ki kézírásos szöveget**, hogyan állíthatod be a megbízhatósági paramétereket, és hogyan kezelheted a gyakori buktatókat, mint az alacsony kontraszt vagy az elforgatott oldalak. A fenti teljes szkript készen áll a futtatásra, és a moduláris függvény könnyű integrációt tesz lehetővé nagyobb projektekbe – legyen szó jegyzetkészítő alkalmazásról, archívumok digitalizálásáról vagy egyszerűen csak a **handwritten ocr tutorial python** technikák kísérletezéséről. + +A következő lépésben felfedezheted a **handwritten text recognition python** lehetőségeit többnyelvű jegyzetekhez, vagy kombinálhatod az OCR-t természetes nyelvfeldolgozással, hogy automatikusan összefoglalja a megbeszélések jegyzeteit. A lehetőségek végtelenek – próbáld ki, és hagyd, hogy a kódod életre keltse a firkókat. + +Boldog kódolást, és nyugodtan tedd fel kérdéseidet a megjegyzésekben! 
+ +{{< /blocks/products/pf/tutorial-page-section >}} +{{< /blocks/products/pf/main-container >}} +{{< /blocks/products/pf/main-wrap-class >}} +{{< blocks/products/products-backtop-button >}} \ No newline at end of file diff --git a/ocr/hungarian/python/general/how-to-run-ocr-with-aspose-hugging-face-complete-guide/_index.md b/ocr/hungarian/python/general/how-to-run-ocr-with-aspose-hugging-face-complete-guide/_index.md new file mode 100644 index 000000000..464099438 --- /dev/null +++ b/ocr/hungarian/python/general/how-to-run-ocr-with-aspose-hugging-face-complete-guide/_index.md @@ -0,0 +1,183 @@ +--- +category: general +date: 2026-04-29 +description: Tanulja meg, hogyan futtathat OCR-t a szkenneken, automatikusan használhatja + a Hugging Face modellt, és percek alatt felismerheti a szkennelésből származó szöveget + az Aspose OCR-rel. +draft: false +keywords: +- how to run OCR +- use hugging face model +- recognize text from scans +- download model automatically +language: hu +og_description: Hogyan futtassunk OCR-t beolvasott képeken az Aspose OCR használatával, + automatikusan töltsünk le egy Hugging Face modellt, és kapjunk tiszta, írásjelekkel + ellátott szöveget. +og_title: Hogyan futtassunk OCR-t az Aspose és a Hugging Face segítségével – Teljes + útmutató +tags: +- OCR +- Aspose +- Hugging Face +- Python +title: Hogyan futtassunk OCR-t az Aspose és a Hugging Face segítségével – Teljes útmutató +url: /hu/python/general/how-to-run-ocr-with-aspose-hugging-face-complete-guide/ +--- + +{{< blocks/products/pf/main-wrap-class >}} +{{< blocks/products/pf/main-container >}} +{{< blocks/products/pf/tutorial-page-section >}} + +# Hogyan futtassunk OCR-t az Aspose & Hugging Face segítségével – Teljes útmutató + +Gondolkodtál már azon, **hogyan futtassunk OCR-t** egy halom beolvasott dokumentumon anélkül, hogy órákat töltenél a beállítások finomhangolásával? Nem vagy egyedül. 
Sok projektben a fejlesztőknek gyorsan kell **szöveget felismerni a beolvasott képekről**, de elakadnak a modellletöltésnél és az utófeldolgozásnál. + +Jó hír: ez a tutorial egy azonnal futtatható megoldást mutat be, amely **Hugging Face modellt használ**, automatikusan letölti, és írásjeleket ad hozzá, hogy a kimenet úgy hangozzon, mintha egy ember írta volna. A végére egy szkriptet kapsz, amely egy mappában lévő minden képet feldolgoz, és egy tiszta `.txt` fájlt helyez el a beolvasott fájl mellé. + +## Amire szükséged lesz + +- Python 3.8+ (a kód f‑stringeket használ, amelyekhez legalább 3.6 kell; a 3.8+ ajánlott) +- `aspose-ocr` csomag (telepítsd a `pip install aspose-ocr` paranccsal) +- Internetkapcsolat az első modellletöltéshez +- Egy mappa képes beolvasott fájlokkal (`.png`, `.jpg`, vagy `.tif`) + +Ennyi: nincsenek extra binárisok, nincs kézi modellkezelés. Merüljünk el benne. + +![how to run OCR example](https://example.com/ocr-demo.png "how to run OCR example") + +## 1. lépés: Aspose OCR osztályok importálása és a környezet beállítása + +Először importáljuk a szükséges osztályokat az Aspose OCR könyvtárból. Ha mindent előre importálsz, a szkript rendezett marad, és könnyebb észrevenni a hiányzó függőségeket. + +```python +# Step 1: Import Aspose OCR classes +import os +from aspose.ocr import OcrEngine, AsposeAI, AsposeAIModelConfig +``` + +*Miért fontos*: `OcrEngine` végzi a nehéz munkát, míg `AsposeAI` lehetővé teszi, hogy egy nagy nyelvi modellt csatlakoztassunk az intelligensebb utófeldolgozáshoz. Ha kihagyod az importálást, a kód többi része `NameError` hibával leáll, ezért ne felejtsd el. + +## 2. lépés: GPU‑tudatos Hugging Face modell konfigurálása + +Most megadjuk az Aspose-nak, hogy honnan töltse le a modellt, és hány réteg fusson a GPU-n. Az `allow_auto_download="true"` jelző automatikusan **letölti a modellt** helyetted. 
+ +```python +# Step 2: Configure a GPU‑aware AI model (replace with your own model folder) +model_config = AsposeAIModelConfig( + allow_auto_download="true", + hugging_face_repo_id="Qwen/Qwen2.5-3B-Instruct-GGUF", + hugging_face_quantization="int8", + gpu_layers=40, # use GPU for faster inference + directory_model_path=r"YOUR_DIRECTORY/models" +) +``` + +> **Pro tipp**: Ha nincs GPU-d, állítsd be a `gpu_layers=0` értéket. A modell CPU-ra vált vissza, ami lassabb, de még mindig működik. + +### Miért válassz Hugging Face modellt? + +A Hugging Face hatalmas gyűjteményt kínál azonnal használható LLM-ekből. A `Qwen/Qwen2.5-3B-Instruct-GGUF` hivatkozásával egy kompakt, instrukcióra finomhangolt modellt kapsz, amely képes írásjeleket hozzáadni, a szóközöket javítani, sőt kisebb OCR hibákat is kijavítani. Ez a **use hugging face model** gyakorlati lényege. + +## 3. lépés: AI motor inicializálása és írásjel utófeldolgozás engedélyezése + +Az AI motor nem csak a csinos csevegéshez való—itt egy *írásjel hozzáadó* csatolásával tisztítjuk meg a nyers OCR kimenetet. + +```python +# Step 3: Initialise the AI engine and enable punctuation post‑processing +ai_engine = AsposeAI() +ai_engine.set_post_processor("punctuation_adder", {}) +``` + +*Mi történik?* A `set_post_processor` hívás regisztrál egy beépített utófeldolgozót, amely az OCR motor befejezése után fut. A nyers szöveget vesszőkkel, pontokkal és nagybetűkkel egészíti ki a megfelelő helyeken, így a végső szöveg sokkal olvashatóbb lesz. + +## 4. lépés: OCR motor létrehozása és az AI motor csatolása + +Az AI motor OCR motorhoz való csatlakoztatása egyetlen objektumot ad, amely képes karaktereket olvasni és a végeredményt finomítani. + +```python +# Step 4: Create the OCR engine and attach the AI engine +ocr_engine = OcrEngine() +ocr_engine.set_ai_engine(ai_engine) +``` + +Ha kihagyod ezt a lépést, az OCR még mindig működni fog, de elveszíted az írásjel javítást – így a kimenet egy szavakból álló folyamban fog megjelenni. 
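
A bemeneti mappa tartalma az OCR motortól függetlenül is ellenőrizhető. Az alábbi vázlat csak a szabványos könyvtárat használja; a `collect_scans` név és a `SUPPORTED_SUFFIXES` halmaz feltételezés (az előfeltételek között felsorolt `.png`, `.jpg` és `.tif` kiterjesztések alapján), nem az Aspose API része.

```python
from pathlib import Path

# Assumed set of supported scan formats, taken from the prerequisites list
SUPPORTED_SUFFIXES = {".png", ".jpg", ".tif"}


def collect_scans(folder):
    """Return (image_path, txt_path) pairs for each supported image, sorted by path."""
    pairs = []
    for entry in sorted(Path(folder).iterdir()):
        # Skip sub-folders and files with other extensions (case-insensitive match)
        if entry.is_file() and entry.suffix.lower() in SUPPORTED_SUFFIXES:
            # The text file keeps the full image name and appends ".txt"
            pairs.append((entry, entry.with_name(entry.name + ".txt")))
    return pairs
```

Így még a modell betöltése előtt kiderül, ha a mappa üres, vagy csak nem támogatott fájlokat tartalmaz.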
+
+## 5. lépés: Minden kép feldolgozása egy mappában
+
+Itt van a tutorial szíve. Végig iterálunk minden képen, futtatjuk az OCR-t, alkalmazzuk az utófeldolgozót, és a megtisztított szöveget egy mellékelt `.txt` fájlba írjuk.
+
+```python
+# Step 5: Run OCR on each image in a folder, post‑process the result, and save the text
+scans_folder = r"YOUR_DIRECTORY/scans"
+for image_file in os.listdir(scans_folder):
+    # Filter only supported image types
+    if not image_file.lower().endswith(('.png', '.jpg', '.tif')):
+        continue
+
+    image_path = os.path.join(scans_folder, image_file)
+
+    # Recognise text from the image
+    ocr_result = ocr_engine.recognize(image_path)
+
+    # Apply the punctuation post‑processor
+    ocr_result = ocr_engine.run_postprocessor(ocr_result)
+
+    # Show a brief confidence summary
+    print(f"{image_file} – confidence {ocr_result.confidence:.2%}")
+
+    # Save the cleaned text next to the source image
+    txt_path = image_path + ".txt"
+    with open(txt_path, "w", encoding="utf-8") as txt_file:
+        txt_file.write(ocr_result.text)
+```
+
+### Mire számíthatsz
+
+A szkript futtatása valami ilyesmit ír ki:
+
+```
+invoice_001.png – confidence 96.73%
+receipt_2024.tif – confidence 94.12%
+```
+
+Minden sor megmutatja a megbízhatósági pontszámot (egy gyors állapotellenőrzés), a szkript pedig létrehozza az `invoice_001.png.txt`, `receipt_2024.tif.txt` stb. fájlokat, amelyek írásjelezett, ember által olvasható szöveget tartalmaznak.
+
+### Szélsőséges esetek és variációk
+
+- **Nem‑angol beolvasott képek**: Cseréld a `hugging_face_repo_id`-t egy többnyelvű modellre (pl. `microsoft/Multilingual-LLM-GGUF`).
+- **Nagy kötegek**: Tedd a ciklust egy `concurrent.futures.ThreadPoolExecutor`-be a párhuzamos feldolgozáshoz, de vedd figyelembe a GPU memória korlátait.
+- **Egyedi utófeldolgozás**: Cseréld a `"punctuation_adder"`-t a saját scriptedre, ha domain‑specifikus tisztításra van szükség (pl. számlaszámok eltávolítása).
+
+## 6. 
lépés: Erőforrások felszabadítása + +Amikor a feladat befejeződik, az erőforrások felszabadítása megakadályozza a memória szivárgásokat, ami különösen fontos, ha ezt egy hosszú élettartamú szolgáltatásban futtatod. + +```python +# Step 6: Release resources +ai_engine.free_resources() +ocr_engine.dispose() +``` + +Ennek a lépésnek a mellőzése GPU memória maradványt hagyhat, ami megzavarhatja a későbbi futásokat. + +## Összefoglalás: Hogyan futtassunk OCR-t vég‑től‑végig + +Csak néhány sorban bemutattuk, **hogyan futtassunk OCR-t** egy beolvasott fájlok mappáján, **használjunk Hugging Face modellt**, amely az első futtatáskor magát letölti, és **szöveget ismerjen fel a beolvasott képekről** automatikusan hozzáadott írásjelekkel. A teljes szkript készen áll a másolás‑beillesztésre, az útvonalak módosítására és a futtatásra. + +## Következő lépések és kapcsolódó témák + +- **Kötegelt utófeldolgozás**: Fedezd fel a `ocr_engine.run_batch_postprocessor`-t a még gyorsabb tömeges kezeléshez. +- **Alternatív modellek**: Próbáld ki a `openai/whisper` családot, ha beszédből‑szöveg átalakításra is szükséged van az OCR mellett. +- **Integráció adatbázisokkal**: Tárold a kinyert szöveget SQLite vagy Elasticsearch adatbázisban, hogy kereshető archívumot hozz létre. + +Nyugodtan kísérletezz—cseréld ki a modellt, módosítsd a `gpu_layers`-t, vagy adj hozzá saját utófeldolgozót. Az Aspose OCR rugalmassága a Hugging Face modellközponttal együtt egy sokoldalú alapot biztosít bármely dokumentum‑digitalizációs projekthez. + +--- + +*Boldog kódolást! 
Ha elakadsz, hagyj egy megjegyzést alább, vagy nézd meg az Aspose OCR dokumentációt a mélyebb konfigurációs beállításokért.* + +{{< /blocks/products/pf/tutorial-page-section >}} +{{< /blocks/products/pf/main-container >}} +{{< /blocks/products/pf/main-wrap-class >}} +{{< blocks/products/products-backtop-button >}} \ No newline at end of file diff --git a/ocr/hungarian/python/general/perform-ocr-on-image-with-python-complete-guide/_index.md b/ocr/hungarian/python/general/perform-ocr-on-image-with-python-complete-guide/_index.md new file mode 100644 index 000000000..1b98fe38c --- /dev/null +++ b/ocr/hungarian/python/general/perform-ocr-on-image-with-python-complete-guide/_index.md @@ -0,0 +1,209 @@ +--- +category: general +date: 2026-04-29 +description: OCR végrehajtása képen Python segítségével, a HuggingFace modell automatikus + letöltése és a GPU memória hatékony felszabadítása az OCR szöveg tisztítása közben. +draft: false +keywords: +- perform OCR on image +- download HuggingFace model python +- release GPU memory python +- clean OCR text python +language: hu +og_description: Tanulja meg, hogyan végezzen OCR-t képen Pythonban, automatikusan + töltse le a HuggingFace modellt, tisztítsa meg a szöveget és szabadítsa fel a GPU + memóriát. +og_title: OCR végrehajtása képen Python segítségével – Lépésről lépésre útmutató +tags: +- OCR +- Python +- Aspose +- HuggingFace +title: OCR végrehajtása képen Python segítségével – Teljes útmutató +url: /hu/python/general/perform-ocr-on-image-with-python-complete-guide/ +--- + +{{< blocks/products/pf/main-wrap-class >}} +{{< blocks/products/pf/main-container >}} +{{< blocks/products/pf/tutorial-page-section >}} + +# Kép OCR végrehajtása Pythonban – Teljes útmutató + +Valaha is szükséged volt **perform OCR on image** fájlok feldolgozására, de elakadtál a modell letöltése vagy a GPU‑memória tisztítási lépésnél? 
Nem vagy egyedül – sok fejlesztő ütközik ebbe a falba, amikor először próbálja kombinálni az optikai karakterfelismerést (OCR) nagy nyelvi modellekkel. + +Ebben az útmutatóban egyetlen, vég‑től‑végig megoldáson keresztül vezetünk végig, amely **downloads a HuggingFace model in Python**, futtatja az Aspose OCR‑t, megtisztítja a nyers kimenetet, és végül **releases GPU memory Python**-t felszabadítja. A végére egy kész‑futásra kész szkriptet kapsz, amely egy beolvasott PNG‑t átalakít kifinomult, kereshető szöveggé. + +> **Amit kapsz:** egy teljes, futtatható kódminta, magyarázatok arra, hogy miért fontos minden lépés, tippek a gyakori hibák elkerüléséhez, és egy pillantás arra, hogyan lehet a csővezetéket a saját projektjeidhez igazítani. + +--- + +## Amire szükséged lesz + +- Python 3.9 vagy újabb (a példa 3.11‑en lett tesztelve) +- `aspose-ocr` csomag (telepítés: `pip install aspose-ocr`) +- Internetkapcsolat a **download HuggingFace model python** lépéshez +- CUDA‑kompatibilis GPU, ha a sebességnövekedést szeretnéd (opcionális, de ajánlott) + +Nem szükséges további rendszer‑szintű függőség; az Aspose OCR motor mindent tartalmaz, amire szükséged van. + +--- + +![perform OCR on image example](image.png "Example of performing OCR on image with Aspose OCR and an LLM post‑processor") + +*Image alt text: “perform OCR on image – Aspose OCR output before and after AI cleaning”* + +--- + +## Kép OCR végrehajtása – Lépésről‑lépésre áttekintés + +Az alábbiakban a munkafolyamatot logikai részekre bontjuk. Minden résznek saját címe van, így az AI asszisztensek gyorsan a számodra érdekes részre ugorhatnak, és a keresőmotorok indexelhetik a releváns kulcsszavakat. + +### 1. HuggingFace modell letöltése Pythonban + +Az első dolog, amit meg kell tennünk, egy nyelvi modell lekérése, amely a nyers OCR kimenet utánfeldolgozójaként működik. Az Aspose OCR egy `AsposeAI` nevű segédosztállyal érkezik, amely automatikusan letöltheti a modellt a HuggingFace hub‑ról. 
+ +```python +import aspose.ocr as aocr +from aspose.ocr import AsposeAI, AsposeAIModelConfig, OcrEngine + +# Configure the model – it will auto‑download the first time you run it +model_config = AsposeAIModelConfig( + allow_auto_download="true", # <-- enables auto‑download + hugging_face_repo_id="Qwen/Qwen2.5-3B-Instruct-GGUF", + hugging_face_quantization="int8", # smaller footprint on GPU + gpu_layers=20, # how many layers stay on GPU + directory_model_path=r"YOUR_DIRECTORY" # where the model files live +) +``` + +**Miért fontos ez:** +- **download HuggingFace model python** – elkerülöd a zip fájlok vagy token hitelesítés manuális kezelését. +- Az `int8` kvantálás a modellt körülbelül a negyedére csökkenti, ami kulcsfontosságú, amikor később **release GPU memory python**-t kell végrehajtani. + +> **Pro tip:** Tartsd a `directory_model_path`-t SSD‑n a gyorsabb betöltési idő érdekében. + +--- + +### 2. AI segéd inicializálása és helyesírás‑ellenőrzés engedélyezése + +Most létrehozunk egy `AsposeAI` példányt, és csatolunk egy helyesírás‑javító utánfeldolgozót. Itt kezdődik a **clean OCR text python** varázslat. + +```python +# Initialise the AI helper +ai_helper = AsposeAI() +ai_helper.set_post_processor( + processor="spell_corrector", + custom_settings={"max_edits": 2} # allows up to two character edits per word +) +``` + +**Magyarázat:** +A helyesírás‑javító megvizsgálja az OCR motor minden tokenjét, és `max_edits` által korlátozott szerkesztéseket javasol. Ez a kis finomítás a “rec0gn1tion” szót “recognition”‑re változtathatja anélkül, hogy nehéz nyelvi modellt kellene használnod. + +--- + +### 3. AI segéd csatlakoztatása az OCR motorhoz + +Az Aspose a 23.4‑es verzióban új módszert vezetett be, amely lehetővé teszi, hogy egy AI motort közvetlenül az OCR csővezetékbe csatlakoztass. 
+
+```python
+# Initialise the OCR engine and attach the AI helper
+ocr_engine = OcrEngine()
+ocr_engine.set_ai_engine(ai_helper)  # new in v23.4
+```
+
+**Miért csináljuk:**
+Az AI segéd korai bekötésével az OCR motor opcionálisan használhatja a modellt valós‑időben történő javításokra (pl. elrendezés felismerés). Emellett a kódot rendezetté teszi – nincs szükség külön utófeldolgozó ciklusokra később.
+
+---
+
+### 4. OCR végrehajtása a beolvasott képen
+
+Itt van a központi lépés, amely ténylegesen **perform OCR on image** fájlok feldolgozását végzi. Cseréld le a `YOUR_DIRECTORY/input.png`-t a saját beolvasott képed elérési útjára.
+
+```python
+image_path = r"YOUR_DIRECTORY/input.png"
+ocr_result = ocr_engine.recognize(image_path)
+
+print("Raw OCR text:")
+print(ocr_result.text)
+```
+
+A tipikus nyers kimenet furcsa helyeken tartalmazhat sortöréseket, félreolvasott karaktereket vagy idegen szimbólumokat. Ezért van szükség a következő lépésre.
+
+**Várható nyers kimenet (példa):**
+
+```
+Th1s 1s 4n ex4mpl3 0f r4w OCR t3xt.
+It c0ntains numb3rs 123 and s0me m1stakes.
+```
+
+---
+
+### 5. OCR szöveg tisztítása Pythonban az AI utánfeldolgozóval
+
+Most hagyjuk, hogy az AI rendbe tegye a rendetlenséget. Ez a **clean OCR text python** folyamat szíve.
+
+```python
+cleaned_result = ocr_engine.run_postprocessor(ocr_result)
+
+print("\nAI‑enhanced text:")
+print(cleaned_result.text)
+```
+
+**Az eredmény, amit látsz:**
+
+```
+This is an example of raw OCR text.
+It contains numbers 123 and some mistakes.
+```
+
+Vedd észre, hogy a helyesírás‑javító kijavította a “Th1s” → “This” szót, a “4n”-ből pedig “an” lett. A modell emellett normalizálja a szóközöket, ami gyakran problémát jelent, amikor később a szöveget downstream NLP csővezetékekbe táplálod.
+
+---
+
+### 6. 
GPU memória felszabadítása Pythonban – Tisztítási lépések + +Amikor végeztél, jó gyakorlat a GPU erőforrások felszabadítása, különösen ha több OCR feladatot futtatsz egy hosszú‑távú szolgáltatásban. + +```python +# Release resources – crucial for GPU memory +ai_helper.free_resources() +ocr_engine.dispose() +``` + +**Mi történik a háttérben:** +`free_resources()` eltávolítja a modellt a GPU‑ról, visszaadva a memóriát a CUDA drivernek. `dispose()` leállítja az OCR motor belső puffereit. Ezeknek a hívásoknak a kihagyása memória‑hiány hibához vezethet már néhány kép után. + +> **Ne feledd:** Ha kötegelt feldolgozást tervezel egy ciklusban, hívd meg a tisztítást minden köteg után, vagy használd újra ugyanazt a `ai_helper`-t anélkül, hogy felszabadítanád a végéig. + +--- + +## Bónusz: A csővezeték finomhangolása különböző helyzetekhez + +### Modell kvantálásának beállítása + +Ha erős GPU-val rendelkezel (pl. RTX 4090) és nagyobb pontosságot szeretnél, állítsd a `hugging_face_quantization` értékét `"fp16"`-re, és növeld a `gpu_layers`-t `30`-ra. Ez több memóriát fog fogyasztani, ezért **release GPU memory python**-t agresszívebben kell alkalmaznod minden köteg után. + +### Egyedi helyesírás‑ellenőrző használata + +Kicserélheted a beépített `spell_corrector`-t egy egyedi utánfeldolgozóra, amely domain‑specifikus javításokat végez (pl. orvosi terminológia). Csak valósítsd meg a szükséges interfészt, és add át a nevét a `set_post_processor`-nek. + +### Tömeges feldolgozás több képen + +Tedd az OCR lépéseket egy `for` ciklusba, gyűjtsd a `cleaned_result.text`-et egy listába, és hívd meg a `ai_helper.free_resources()`-t csak a ciklus után, ha elegendő GPU RAM-od van. Ez csökkenti a modell többszöri betöltésének terhelését. + +--- + +## Következtetés + +Most megmutattuk, hogyan **perform OCR on image** fájlokat dolgozhatsz fel Pythonban, automatikusan **download a HuggingFace model**-t, **clean OCR text**-et, és biztonságosan **release GPU memory**-t, amikor befejezted. 
A teljes szkript készen áll a másolás‑beillesztésre, és a magyarázatok bizalmat adnak ahhoz, hogy nagyobb projektekhez is adaptáld. + +Következő lépések? Próbáld megcserélni a Qwen 2.5 modellt egy nagyobb LLaMA változatra, kísérletezz különböző utánfeldolgozókkal, vagy integráld a megtisztított kimenetet egy kereshető Elasticsearch indexbe. A lehetőségek végtelenek, és most már egy szilárd alapod van a további fejlesztéshez. + +Boldog kódolást, és legyenek az OCR csővezetékeid mindig tiszták és memória‑kímélőek! + +{{< /blocks/products/pf/tutorial-page-section >}} +{{< /blocks/products/pf/main-container >}} +{{< /blocks/products/pf/main-wrap-class >}} +{{< blocks/products/products-backtop-button >}} \ No newline at end of file diff --git a/ocr/indonesian/python/general/extract-text-from-pdf-ocr-pdf-with-python/_index.md b/ocr/indonesian/python/general/extract-text-from-pdf-ocr-pdf-with-python/_index.md new file mode 100644 index 000000000..686bf3c3d --- /dev/null +++ b/ocr/indonesian/python/general/extract-text-from-pdf-ocr-pdf-with-python/_index.md @@ -0,0 +1,219 @@ +--- +category: general +date: 2026-04-29 +description: Ekstrak teks dari PDF menggunakan Aspose OCR di Python. Pelajari pemrosesan + PDF OCR batch, konversi teks PDF yang dipindai, dan tangani halaman dengan kepercayaan + rendah. +draft: false +keywords: +- extract text from pdf +- ocr pdf with python +- convert scanned pdf text +- batch ocr pdf processing +language: id +og_description: Ekstrak teks dari PDF dengan Aspose OCR di Python. Panduan ini menunjukkan + pemrosesan OCR PDF secara batch, mengonversi teks PDF yang dipindai, dan menangani + hasil dengan kepercayaan rendah. 
+og_title: Ekstrak Teks dari PDF – OCR PDF dengan Python +tags: +- OCR +- Python +- PDF processing +title: Ekstrak Teks dari PDF – OCR PDF dengan Python +url: /id/python/general/extract-text-from-pdf-ocr-pdf-with-python/ +--- + +{{< blocks/products/pf/main-wrap-class >}} +{{< blocks/products/pf/main-container >}} +{{< blocks/products/pf/tutorial-page-section >}} + +# Ekstrak Teks dari PDF – OCR PDF dengan Python + +Pernah perlu **mengekstrak teks dari PDF** tetapi file tersebut hanya berupa gambar yang dipindai? Anda tidak sendirian—banyak pengembang mengalami hal yang sama saat mencoba mengubah PDF menjadi data yang dapat dicari. Kabar baiknya? Dengan Aspose OCR untuk Python Anda dapat mengonversi teks PDF yang dipindai dalam beberapa baris kode, bahkan menjalankan **pemrosesan batch OCR PDF** ketika Anda memiliki puluhan file untuk diproses. + +Dalam tutorial ini kami akan membahas seluruh alur kerja: menyiapkan pustaka, menjalankan OCR pada satu PDF, memperluas ke batch, dan menangani halaman dengan kepercayaan rendah sehingga Anda tahu kapan diperlukan tinjauan manual. Pada akhir tutorial Anda akan memiliki skrip siap‑jalankan yang mengekstrak teks dari PDF yang dipindai apa pun, dan Anda akan memahami alasan di balik setiap langkah. + +## Apa yang Anda Butuhkan + +- Python 3.8 atau lebih baru (kode menggunakan f‑strings, jadi 3.6+ dapat bekerja, tetapi 3.8+ disarankan) +- Lisensi Aspose OCR untuk Python atau kunci percobaan gratis (Anda dapat mendapatkannya dari situs web Aspose) +- Sebuah folder dengan satu atau lebih PDF yang dipindai yang ingin Anda proses +- Sebuah ruang disk yang cukup untuk laporan *.txt* yang dihasilkan + +Itu saja—tidak ada ketergantungan eksternal yang berat, tidak ada akrobatik OpenCV. Mesin OCR Aspose melakukan pekerjaan berat untuk Anda. 
+ +## Menyiapkan Lingkungan + +Pertama, instal paket Aspose OCR dari PyPI: + +```bash +pip install aspose-ocr +``` + +Jika Anda memiliki file lisensi (`Aspose.OCR.lic`), letakkan di root proyek Anda dan aktifkan seperti berikut: + +```python +# Activate Aspose OCR license (optional but removes trial watermark) +from aspose.ocr import License + +license = License() +license.set_license("Aspose.OCR.lic") +``` + +> **Pro tip:** Simpan file lisensi di luar kontrol versi; tambahkan ke `.gitignore` untuk menghindari paparan tidak sengaja. + +## Melakukan OCR pada Satu PDF + +Sekarang mari kita ekstrak teks dari satu PDF yang dipindai. Langkah‑langkah inti adalah: + +1. Buat instance `OcrEngine`. +2. Arahkan ke file PDF. +3. Dapatkan `OcrResult` untuk setiap halaman. +4. Tulis output teks biasa ke disk. +5. Hapus (dispose) engine untuk membebaskan sumber daya native. + +Berikut adalah skrip lengkap yang dapat dijalankan: + +```python +# Step 1: Import the OCR engine and create an instance +from aspose.ocr import OcrEngine + +# Optional: activate license (see previous section) +# from aspose.ocr import License +# License().set_license("Aspose.OCR.lic") + +ocr_engine = OcrEngine() + +# Step 2: Specify the PDF file to be processed +pdf_file_path = r"YOUR_DIRECTORY/input.pdf" + +# Step 3: Extract OCR results – one OcrResult object per page +ocr_pages = ocr_engine.extract_from_pdf(pdf_file_path) + +# Step 4: Iterate through the results, show confidence, flag low‑confidence pages, and save the plain text +for ocr_page in ocr_pages: + print(f"Page {ocr_page.page_number}: confidence {ocr_page.confidence:.2%}") + + # If confidence is below 80 %, flag it for manual review + if ocr_page.confidence < 0.80: + print(" Low confidence – consider manual review") + + # Build the output filename + output_txt = f"YOUR_DIRECTORY/Report_page{ocr_page.page_number}.txt" + with open(output_txt, "w", encoding="utf-8") as f: + f.write(ocr_page.text) + +# Step 5: Release resources used by the engine 
+ocr_engine.dispose() +``` + +**Apa yang akan Anda lihat:** Untuk setiap halaman skrip mencetak sesuatu seperti `Page 1: confidence 97.45%`. Jika sebuah halaman berada di bawah ambang 80 %, sebuah peringatan muncul, memberi tahu Anda bahwa OCR mungkin telah melewatkan karakter. + +### Mengapa Ini Berfungsi + +- **`OcrEngine`** adalah gerbang ke pustaka Aspose OCR native; ia menangani segala hal mulai dari pra‑pemrosesan gambar hingga pengenalan karakter. +- **`extract_from_pdf`** secara otomatis meraster setiap halaman PDF, sehingga Anda tidak perlu mengonversi PDF menjadi gambar secara manual. +- **Skor kepercayaan** memungkinkan Anda mengotomatisasi pemeriksaan kualitas—penting ketika Anda memproses dokumen hukum atau medis di mana akurasi sangat penting. + +## Pemrosesan Batch OCR PDF dengan Python + +Sebagian besar proyek dunia nyata melibatkan lebih dari satu file. Mari kita perpanjang skrip satu‑file menjadi pipeline **pemrosesan batch OCR PDF** yang menelusuri sebuah direktori, memproses setiap PDF, dan menyimpan hasilnya di sub‑folder yang sesuai. 
+ +Berikut kode contoh: + +```python +import os +from pathlib import Path +from aspose.ocr import OcrEngine + +def ocr_pdf_file(pdf_path: Path, output_dir: Path, engine: OcrEngine, confidence_threshold: float = 0.80): + """Extract text from a single PDF and write per‑page .txt files.""" + ocr_pages = engine.extract_from_pdf(str(pdf_path)) + for page in ocr_pages: + print(f"{pdf_path.name} – Page {page.page_number}: {page.confidence:.2%}") + if page.confidence < confidence_threshold: + print(" ⚠️ Low confidence – manual review may be needed") + + out_file = output_dir / f"{pdf_path.stem}_page{page.page_number}.txt" + out_file.write_text(page.text, encoding="utf-8") + +def batch_ocr_pdf(input_folder: str, output_folder: str): + """Iterate over all PDFs in input_folder and run OCR.""" + engine = OcrEngine() + input_path = Path(input_folder) + output_path = Path(output_folder) + output_path.mkdir(parents=True, exist_ok=True) + + pdf_files = list(input_path.glob("*.pdf")) + if not pdf_files: + print("🚫 No PDF files found in the specified folder.") + return + + for pdf_file in pdf_files: + # Create a sub‑folder for each PDF to keep pages organized + pdf_out_dir = output_path / pdf_file.stem + pdf_out_dir.mkdir(exist_ok=True) + ocr_pdf_file(pdf_file, pdf_out_dir, engine) + + # Clean up native resources + engine.dispose() + print("✅ Batch OCR completed.") + +# Example usage: +if __name__ == "__main__": + batch_ocr_pdf("YOUR_DIRECTORY/input_pdfs", "YOUR_DIRECTORY/ocr_output") +``` + +#### Bagaimana Ini Membantu + +- **Skalabilitas:** Fungsi ini menelusuri folder sekali, membuat sub‑folder output khusus untuk setiap PDF. Ini menjaga keteraturan ketika Anda memiliki puluhan dokumen. +- **Dapat digunakan kembali:** `ocr_pdf_file` dapat dipanggil dari skrip lain (mis., layanan web) karena merupakan fungsi murni. +- **Penanganan error:** Skrip mencetak pesan ramah jika folder input kosong, menyelamatkan Anda dari kegagalan diam. 
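The error-handling point above covers only the empty-folder case. In a longer batch it is often also useful to keep going when a single PDF fails. The sketch below illustrates that idea under stated assumptions: `run_batch` is a hypothetical helper (not part of Aspose OCR), and `process_one` stands in for whatever per-file function the pipeline uses, for example a wrapper around the tutorial's `ocr_pdf_file`.

```python
from pathlib import Path
from typing import Callable, Dict, List


def run_batch(
    pdf_files: List[Path],
    process_one: Callable[[Path], None],
) -> Dict[str, List[str]]:
    """Run `process_one` on every file and collect failures instead of aborting.

    Hypothetical helper: `process_one` is any callable that processes a single
    PDF and raises an exception when something goes wrong.
    """
    report: Dict[str, List[str]] = {"ok": [], "failed": []}
    for pdf_file in pdf_files:
        try:
            process_one(pdf_file)
        except Exception as exc:  # broad on purpose: one bad file must not stop the batch
            report["failed"].append(f"{pdf_file.name}: {exc}")
        else:
            report["ok"].append(pdf_file.name)
    return report


if __name__ == "__main__":
    def fake_ocr(path: Path) -> None:
        # Stand-in for the real per-file OCR call; fails for one file on purpose.
        if path.name == "bad.pdf":
            raise ValueError("corrupt")

    summary = run_batch([Path("a.pdf"), Path("bad.pdf"), Path("c.pdf")], fake_ocr)
    print(summary)
```

Inside `batch_ocr_pdf`, the per-file callable could be something like `lambda p: ocr_pdf_file(p, output_path / p.stem, engine)`, with the returned report printed or logged once the loop finishes.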
+ +## Mengonversi Teks PDF yang Dipindai – Menangani Kasus Pinggir + +Meskipun kode di atas berfungsi untuk kebanyakan PDF, Anda mungkin menemui beberapa keanehan: + +| Situation | Why It Happens | How to Mitigate | +|-----------|----------------|-----------------| +| **PDF terenkripsi** | PDF dilindungi kata sandi. | Berikan kata sandi ke `extract_from_pdf(pdf_path, password="yourPwd")`. | +| **Dokumen multi‑bahasa** | Aspose OCR secara default ke bahasa Inggris. | Atur `ocr_engine.language = "spa"` untuk bahasa Spanyol, atau berikan daftar untuk bahasa campuran. | +| **PDF sangat besar (>500 halaman)** | Penggunaan memori melonjak karena setiap halaman dimuat ke RAM. | Proses PDF dalam potongan menggunakan `engine.extract_from_pdf(pdf_path, start_page=1, end_page=100)` dan lakukan loop. | +| **Kualitas pemindaian buruk** | DPI rendah atau banyak noise mengurangi kepercayaan. | Pra‑proses PDF dengan `engine.image_preprocessing = True` atau tingkatkan DPI melalui `engine.dpi = 300`. | + +> **Perhatian:** Mengaktifkan pra‑pemrosesan gambar dapat meningkatkan waktu CPU secara signifikan. Jika Anda menjalankan batch malam, jadwalkan cukup waktu atau jalankan pekerja terpisah. + +## Memverifikasi Output + +Setelah skrip selesai, Anda akan menemukan struktur folder seperti berikut: + +``` +ocr_output/ +├─ invoice_2023/ +│ ├─ invoice_2023_page1.txt +│ ├─ invoice_2023_page2.txt +│ └─ … +└─ contract_A/ + ├─ contract_A_page1.txt + └─ … +``` + +Buka file `.txt` apa pun; Anda akan melihat teks bersih ber‑encoding UTF‑8 yang mencerminkan konten yang dipindai asli. Jika Anda melihat karakter yang kacau, periksa kembali pengaturan bahasa PDF dan pastikan paket font yang tepat terpasang di mesin. + +## Membersihkan Sumber Daya + +Aspose OCR bergantung pada DLL native, jadi penting untuk memanggil `engine.dispose()` setelah selesai. Lupa langkah ini dapat menyebabkan kebocoran memori, terutama pada pekerjaan batch yang berjalan lama. 
+ +```python +# Always the last line of your script +engine.dispose() +``` + +## Contoh End‑to‑End Lengkap + +Menggabungkan semuanya, berikut contoh lengkap satu + +{{< /blocks/products/pf/tutorial-page-section >}} +{{< /blocks/products/pf/main-container >}} +{{< /blocks/products/pf/main-wrap-class >}} +{{< blocks/products/products-backtop-button >}} \ No newline at end of file diff --git a/ocr/indonesian/python/general/how-to-recognize-handwriting-in-python-full-tutorial/_index.md b/ocr/indonesian/python/general/how-to-recognize-handwriting-in-python-full-tutorial/_index.md new file mode 100644 index 000000000..9c239e234 --- /dev/null +++ b/ocr/indonesian/python/general/how-to-recognize-handwriting-in-python-full-tutorial/_index.md @@ -0,0 +1,277 @@ +--- +category: general +date: 2026-04-29 +description: Pelajari cara mengenali tulisan tangan di Python dengan Aspose OCR. Panduan + langkah demi langkah ini menunjukkan cara mengekstrak teks tulisan tangan secara + efisien. +draft: false +keywords: +- how to recognize handwriting +- extract handwritten text +- handwritten text recognition python +- handwritten ocr tutorial python +language: id +og_description: Bagaimana cara mengenali tulisan tangan di Python? Ikuti panduan lengkap + ini untuk mengekstrak teks tulisan tangan menggunakan Aspose OCR, dengan kode, tips, + dan penanganan kasus khusus. +og_title: Cara Mengenali Tulisan Tangan di Python – Tutorial Lengkap +tags: +- OCR +- Python +- HandwritingRecognition +title: Cara Mengenali Tulisan Tangan di Python – Tutorial Lengkap +url: /id/python/general/how-to-recognize-handwriting-in-python-full-tutorial/ +--- + +{{< blocks/products/pf/main-wrap-class >}} +{{< blocks/products/pf/main-container >}} +{{< blocks/products/pf/tutorial-page-section >}} + +# Cara Mengenali Tulisan Tangan di Python – Tutorial Lengkap + +Pernah membutuhkan **cara mengenali tulisan tangan** dalam proyek Python tetapi tidak yakin harus mulai dari mana? 
Anda tidak sendirian—para pengembang terus bertanya, “Bisakah saya mengambil teks dari catatan yang dipindai?” Kabar baiknya, perpustakaan OCR modern membuat ini sangat mudah. Dalam panduan ini kami akan membahas **cara mengenali tulisan tangan** menggunakan Aspose OCR, dan Anda juga akan belajar **mengekstrak teks tulisan tangan** secara andal. + +Kami akan membahas semuanya mulai dari menginstal perpustakaan hingga menyesuaikan ambang kepercayaan untuk skrip kursif yang berantakan. Pada akhir tutorial Anda akan memiliki skrip yang dapat dijalankan yang mencetak teks yang diekstrak dan skor kepercayaan keseluruhan—sempurna untuk aplikasi pencatatan, alat arsip, atau sekadar memuaskan rasa ingin tahu. Tidak diperlukan pengalaman OCR sebelumnya; pengetahuan dasar Python sudah cukup. + +--- + +## Apa yang Anda Butuhkan + +- **Python 3.9+** (versi stabil terbaru paling cocok) +- **Aspose.OCR for Python via .NET** – instal dengan `pip install aspose-ocr` +- Sebuah **gambar tulisan tangan** (JPEG/PNG) yang ingin Anda proses +- Opsional: lingkungan virtual untuk menjaga ketergantungan tetap rapi + +![Contoh cara mengenali tulisan tangan](/images/handwritten-sample.jpg "Contoh cara mengenali tulisan tangan") + +*(Teks alt: “contoh cara mengenali tulisan tangan menampilkan catatan tulisan tangan yang dipindai”)* + +--- + +## Langkah 1 – Instal dan Impor Kelas Aspose OCR + +Pertama-tama, kita membutuhkan mesin OCR itu sendiri. Aspose menyediakan API yang bersih yang memisahkan pengenalan teks cetak dari mode tulisan tangan. + +```python +# Install the package (run this in your terminal if you haven't already) +# pip install aspose-ocr + +# Import the required OCR classes +from aspose.ocr import OcrEngine, HandwritingMode +``` + +*Mengapa ini penting:* Mengimpor `HandwritingMode` memungkinkan kita memberi tahu mesin bahwa kita sedang menangani **pengenalan teks tulisan tangan python** bukan teks cetak, yang secara dramatis meningkatkan akurasi untuk goresan kursif. 
+ +--- + +## Langkah 2 – Buat dan Konfigurasikan Mesin OCR + +Sekarang kita membuat instance `OcrEngine` dan mengalihkannya ke mode tulisan tangan. Anda juga dapat menyesuaikan ambang kepercayaan; nilai lebih rendah menerima tulisan yang goyah, nilai lebih tinggi menuntut input yang lebih bersih. + +```python +# Step 2: Initialize the OCR engine for handwritten text +ocr_engine = OcrEngine() +ocr_engine.set_recognition_mode(HandwritingMode.HANDWRITTEN) + +# Optional: tighten the confidence threshold for cursive scripts +# Default is 0.5; 0.65 works well for most notebooks +ocr_engine.set_handwriting_confidence_threshold(0.65) +``` + +*Tips profesional:* Jika catatan Anda dipindai pada 300 DPI atau lebih, biasanya Anda akan mendapatkan skor yang lebih baik. Untuk gambar beresolusi rendah, pertimbangkan untuk memperbesar dengan Pillow sebelum memberi mereka ke mesin. + +--- + +## Langkah 3 – Siapkan Jalur Gambar + +Pastikan jalur file mengarah ke gambar yang ingin Anda proses. Jalur relatif berfungsi baik, tetapi jalur absolut menghindari kejutan “file tidak ditemukan”. + +```python +# Step 3: Point to your handwritten image +input_image_path = "YOUR_DIRECTORY/handwritten_sample.jpg" +``` + +*Kesalahan umum:* Lupa meng-escape backslash pada Windows (`C:\\folder\\image.jpg`). Menggunakan string mentah (`r"C:\folder\image.jpg"`) mengatasi masalah tersebut. + +--- + +## Langkah 4 – Jalankan Pengenalan dan Tangkap Hasil + +Metode `recognize` melakukan pekerjaan berat. Ia mengembalikan objek dengan properti `.text` dan `.confidence`. 
+ +```python +# Step 4: Run OCR and get the result +result = ocr_engine.recognize(input_image_path) + +# Display the extracted text and confidence +print("Hand‑written extraction:") +print(result.text) +print("Overall confidence:", result.confidence) +``` + +**Output yang diharapkan (contoh):** + +``` +Hand‑written extraction: +Meeting notes: +- Discuss quarterly goals +- Review budget allocations +- Assign action items +Overall confidence: 0.78 +``` + +Jika kepercayaan turun di bawah 0.5, Anda mungkin perlu membersihkan gambar (menghilangkan bayangan, meningkatkan kontras) atau menurunkan ambang pada Langkah 2. + +--- + +## Langkah 5 – Bersihkan Sumber Daya + +Aspose OCR menyimpan sumber daya native; memanggil `dispose()` melepaskannya dan mencegah kebocoran memori, terutama saat memproses banyak gambar dalam loop. + +```python +# Step 5: Release the engine resources +ocr_engine.dispose() +``` + +*Mengapa dispose?* Dalam layanan yang berjalan lama (mis., API Flask yang menerima unggahan), melupakan untuk membebaskan sumber daya dapat dengan cepat menghabiskan memori sistem. + +--- + +## Skrip Lengkap – Jalankan Sekali‑Klik + +Menggabungkan semuanya, berikut skrip mandiri yang dapat Anda salin‑tempel dan jalankan. + +```python +# -*- coding: utf-8 -*- +""" +Handwritten OCR Tutorial – how to recognize handwriting in Python +Author: Your Name +Date: 2026-04-29 +""" + +# Install the library first if you haven't already: +# pip install aspose-ocr + +from aspose.ocr import OcrEngine, HandwritingMode + +def recognize_handwriting(image_path: str, confidence_threshold: float = 0.65): + """ + Recognizes handwritten text from an image using Aspose OCR. + + Args: + image_path (str): Path to the handwritten image file. + confidence_threshold (float): Minimum confidence to accept a word. 
+ + Returns: + tuple: (extracted_text (str), overall_confidence (float)) + """ + # Initialize engine in handwritten mode + engine = OcrEngine() + engine.set_recognition_mode(HandwritingMode.HANDWRITTEN) + engine.set_handwriting_confidence_threshold(confidence_threshold) + + # Perform recognition + result = engine.recognize(image_path) + + # Capture output + extracted = result.text + confidence = result.confidence + + # Clean up + engine.dispose() + return extracted, confidence + +if __name__ == "__main__": + # Replace with your actual file location + img_path = "YOUR_DIRECTORY/handwritten_sample.jpg" + + text, conf = recognize_handwriting(img_path) + + print("Hand‑written extraction:") + print(text) + print("Overall confidence:", conf) +``` + +Simpan ini sebagai `handwritten_ocr.py` dan jalankan `python handwritten_ocr.py`. Jika semuanya sudah diatur dengan benar, Anda akan melihat teks yang diekstrak dicetak ke konsol. + +--- + +## Menangani Kasus Tepi dan Variasi Umum + +### Gambar dengan Kontras Rendah +Jika latar belakang menyatu dengan tinta, tingkatkan kontras terlebih dahulu: + +```python +from PIL import Image, ImageEnhance + +img = Image.open(input_image_path) +enhancer = ImageEnhance.Contrast(img) +high_contrast = enhancer.enhance(2.0) # 2× contrast +high_contrast.save("temp_high_contrast.jpg") +result = ocr_engine.recognize("temp_high_contrast.jpg") +``` + +### Catatan yang Diputar +Halaman notebook yang miring dapat mengacaukan pengenalan. 
Gunakan OpenCV dan NumPy untuk mengoreksi kemiringan:
+
+```python
+import cv2
+import numpy as np
+
+def deskew(image_path):
+    img = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
+    # Invert + Otsu threshold so that dark ink becomes the non-zero foreground
+    _, ink = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY_INV | cv2.THRESH_OTSU)
+    coords = np.column_stack(np.where(ink > 0)).astype(np.float32)
+    # Note: the angle convention of minAreaRect differs between OpenCV versions;
+    # verify the rotation direction on a sample image before batch use
+    angle = cv2.minAreaRect(coords)[-1]
+    if angle < -45:
+        angle = -(90 + angle)
+    else:
+        angle = -angle
+    (h, w) = img.shape[:2]
+    M = cv2.getRotationMatrix2D((w // 2, h // 2), angle, 1.0)
+    corrected = cv2.warpAffine(img, M, (w, h), flags=cv2.INTER_CUBIC, borderMode=cv2.BORDER_REPLICATE)
+    cv2.imwrite("deskewed.jpg", corrected)
+
+deskew(input_image_path)
+result = ocr_engine.recognize("deskewed.jpg")
+```
+
+### PDF Multi‑Halaman
+Aspose OCR juga dapat menangani halaman PDF, tetapi Anda harus mengonversi setiap halaman menjadi gambar terlebih dahulu (mis., menggunakan `pdf2image`). Kemudian lakukan loop melalui gambar dengan fungsi `recognize_handwriting` yang sama.
+
+---
+
+## Tips Pro untuk Hasil **Extract Handwritten Text** yang Lebih Baik
+
+- **DPI penting:** Targetkan 300 DPI atau lebih saat memindai.
+- **Hindari latar belakang berwarna:** Putih murni atau abu‑abu terang menghasilkan output paling bersih.
+- **Pemrosesan batch:** Bungkus fungsi dalam loop `for` dan catat kepercayaan setiap halaman; buang hasil di bawah ambang untuk menjaga kualitas tinggi.
+- **Dukungan bahasa:** Aspose OCR mendukung banyak bahasa; set `engine.set_language("en")` untuk optimasi hanya bahasa Inggris.
+
+---
+
+## Pertanyaan yang Sering Diajukan
+
+**Apakah ini bekerja di Linux?**
+Ya, Aspose OCR dilengkapi dengan binary native untuk Windows, macOS, dan Linux. Cukup instal paket pip dan Anda siap.
+
+**Bagaimana jika tulisan tangan saya sangat kursif?**
+Coba turunkan ambang kepercayaan (`0.5` atau bahkan `0.4`). Ingat bahwa ini dapat menambah noise, jadi lakukan pasca‑proses pada output (mis., pemeriksaan ejaan) jika diperlukan.
+
+**Bisakah saya menggunakan ini dalam layanan web?**
+Tentu saja.
Fungsi `recognize_handwriting` bersifat stateless, menjadikannya sempurna untuk endpoint Flask atau FastAPI. Hanya ingat untuk memanggil `dispose()` setelah setiap permintaan atau gunakan context manager. + +--- + +## Kesimpulan + +Kami telah membahas **cara mengenali tulisan tangan** di Python dari awal hingga akhir, menunjukkan cara **mengekstrak teks tulisan tangan**, menyesuaikan pengaturan kepercayaan, dan menangani jebakan umum seperti kontras rendah atau halaman yang diputar. Skrip lengkap di atas siap dijalankan, dan fungsi modular memudahkan integrasi ke proyek yang lebih besar—apakah Anda membangun aplikasi pencatatan, mendigitalkan arsip, atau hanya bereksperimen dengan teknik **handwritten ocr tutorial python**. + +Selanjutnya, Anda mungkin ingin menjelajahi **handwritten text recognition python** untuk catatan multibahasa, atau menggabungkan OCR dengan pemrosesan bahasa alami untuk secara otomatis merangkum notulen rapat. Tidak ada batasan—cobalah dan biarkan kode Anda memberi kehidupan pada coretan. + +Selamat coding, dan silakan tinggalkan pertanyaan Anda di komentar! + +{{< /blocks/products/pf/tutorial-page-section >}} +{{< /blocks/products/pf/main-container >}} +{{< /blocks/products/pf/main-wrap-class >}} +{{< blocks/products/products-backtop-button >}} \ No newline at end of file diff --git a/ocr/indonesian/python/general/how-to-run-ocr-with-aspose-hugging-face-complete-guide/_index.md b/ocr/indonesian/python/general/how-to-run-ocr-with-aspose-hugging-face-complete-guide/_index.md new file mode 100644 index 000000000..bcfdb8ee1 --- /dev/null +++ b/ocr/indonesian/python/general/how-to-run-ocr-with-aspose-hugging-face-complete-guide/_index.md @@ -0,0 +1,181 @@ +--- +category: general +date: 2026-04-29 +description: Pelajari cara menjalankan OCR pada pemindaian Anda, gunakan model Hugging + Face secara otomatis, dan kenali teks dari pemindaian dengan Aspose OCR dalam hitungan + menit. 
+draft: false +keywords: +- how to run OCR +- use hugging face model +- recognize text from scans +- download model automatically +language: id +og_description: Cara menjalankan OCR pada pemindaian menggunakan Aspose OCR, secara + otomatis mengunduh model Hugging Face, dan mendapatkan teks yang bersih serta berpunctuation. +og_title: Cara Menjalankan OCR dengan Aspose & Hugging Face – Panduan Lengkap +tags: +- OCR +- Aspose +- Hugging Face +- Python +title: Cara Menjalankan OCR dengan Aspose & Hugging Face – Panduan Lengkap +url: /id/python/general/how-to-run-ocr-with-aspose-hugging-face-complete-guide/ +--- + +{{< blocks/products/pf/main-wrap-class >}} +{{< blocks/products/pf/main-container >}} +{{< blocks/products/pf/tutorial-page-section >}} + +# Cara Menjalankan OCR dengan Aspose & Hugging Face – Panduan Lengkap + +Pernah bertanya-tanya **bagaimana cara menjalankan OCR** pada tumpukan dokumen yang dipindai tanpa menghabiskan berjam-jam mengatur pengaturan? Anda tidak sendirian. Dalam banyak proyek, pengembang perlu **mengenali teks dari pemindaian** dengan cepat, namun mereka terhambat oleh unduhan model dan proses pasca‑pemrosesan. + +Kabar baik: tutorial ini menunjukkan solusi siap‑jalankan yang **menggunakan model Hugging Face**, secara otomatis mengunduhnya, dan menambahkan tanda baca sehingga outputnya terbaca seperti ditulis manusia. Pada akhir tutorial, Anda akan memiliki skrip yang memproses setiap gambar dalam sebuah folder dan menaruh file `.txt` bersih di samping setiap pemindaian. + +## Apa yang Anda Butuhkan + +- Python 3.8+ (kode menggunakan f‑strings, jadi versi yang lebih lama tidak akan cukup) +- `aspose-ocr` package (pasang via `pip install aspose-ocr`) +- Akses internet untuk unduhan model pertama kali +- Folder berisi pemindaian gambar (`.png`, `.jpg`, atau `.tif`) + +Itu saja—tidak ada binari tambahan, tidak ada penyesuaian model manual. Mari kita mulai. 
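Sebelum melanjutkan, prasyarat di atas dapat diperiksa secara otomatis. Sketsa berikut hanya memakai pustaka standar; fungsi `check_prerequisites` adalah nama hipotetis yang diperkenalkan di sini (bukan bagian dari Aspose OCR), dan nama modul `aspose.ocr` diasumsikan sama dengan yang diimpor pada langkah-langkah tutorial ini.

```python
import importlib.util
import sys


def check_prerequisites(min_version=(3, 8), module_name="aspose.ocr"):
    """Return a list of human-readable problems; an empty list means ready.

    Hypothetical helper for illustration; not part of the Aspose OCR package.
    """
    problems = []

    # Compare the running interpreter against the minimum version
    if sys.version_info < min_version:
        problems.append(f"Python {min_version[0]}.{min_version[1]}+ required")

    # find_spec raises ImportError when a parent package is missing
    try:
        found = importlib.util.find_spec(module_name) is not None
    except (ImportError, ValueError):
        found = False
    if not found:
        problems.append(f"module '{module_name}' not found")

    return problems


if __name__ == "__main__":
    for problem in check_prerequisites():
        print("Missing:", problem)
```

Daftar kosong berarti lingkungan tampaknya siap; jika tidak, setiap entri menjelaskan apa yang masih kurang.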
+
+![contoh cara menjalankan OCR](https://example.com/ocr-demo.png "contoh cara menjalankan OCR")
+
+## Langkah 1: Impor Kelas Aspose OCR & Siapkan Lingkungan
+
+Kita mulai dengan mengambil kelas yang diperlukan dari pustaka Aspose OCR. Mengimpor semuanya di awal membuat skrip rapi dan memudahkan menemukan ketergantungan yang hilang.
+
+```python
+# Step 1: Import Aspose OCR classes
+import os
+from aspose.ocr import OcrEngine, AsposeAI, AsposeAIModelConfig
+```
+
+*Mengapa ini penting*: `OcrEngine` melakukan pekerjaan berat, sementara `AsposeAI` memungkinkan kita menyambungkan model bahasa besar untuk post‑processing yang lebih cerdas. Jika Anda melewatkan impor, sisa kode akan gagal dengan `NameError` saat dijalankan, jadi jangan lupa.
+
+## Langkah 2: Konfigurasikan Model Hugging Face yang Sadar GPU
+
+Sekarang kita memberi tahu Aspose dari mana mengambil model dan berapa banyak lapisan yang harus dijalankan di GPU. Flag `allow_auto_download="true"` menangani bagian **mengunduh model secara otomatis** untuk Anda.
+
+```python
+# Step 2: Configure a GPU‑aware AI model (replace with your own model folder)
+model_config = AsposeAIModelConfig(
+    allow_auto_download="true",
+    hugging_face_repo_id="Qwen/Qwen2.5-3B-Instruct-GGUF",
+    hugging_face_quantization="int8",
+    gpu_layers=40,  # use GPU for faster inference
+    directory_model_path=r"YOUR_DIRECTORY/models"
+)
+```
+
+> **Tip pro**: Jika Anda tidak memiliki GPU, setel `gpu_layers=0`. Model akan beralih ke CPU, yang lebih lambat tetapi tetap berfungsi.
+
+### Mengapa Memilih Model Hugging Face?
+
+Hugging Face menyimpan koleksi besar LLM siap‑pakai. Dengan menunjuk ke `Qwen/Qwen2.5-3B-Instruct-GGUF`, Anda mendapatkan model yang kompak dan disetel instruksi yang dapat menambahkan tanda baca, memperbaiki spasi, dan bahkan memperbaiki kesalahan OCR minor. Inilah esensi **menggunakan model hugging face** dalam praktik.
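Sebagai sketsa pelengkap untuk tip tentang `gpu_layers`, nilainya dapat dipilih saat runtime alih-alih ditulis tetap. Fungsi `pick_gpu_layers` di bawah ini bersifat hipotetis (bukan bagian dari API Aspose) dan mengasumsikan bahwa keberadaan utilitas `nvidia-smi` di `PATH` menandakan GPU NVIDIA yang dapat dipakai; asumsi ini mungkin tidak berlaku di semua sistem.

```python
import shutil
from typing import Optional


def pick_gpu_layers(preferred: int = 40, has_gpu: Optional[bool] = None) -> int:
    """Return `preferred` when a GPU seems available, otherwise 0 (CPU only).

    Hypothetical helper, not part of the Aspose API. When `has_gpu` is None,
    GPU presence is guessed from `nvidia-smi` being on PATH (an assumption).
    """
    if preferred < 0:
        raise ValueError("preferred must be non-negative")
    if has_gpu is None:
        has_gpu = shutil.which("nvidia-smi") is not None
    return preferred if has_gpu else 0


if __name__ == "__main__":
    print("gpu_layers =", pick_gpu_layers())
```

Hasilnya kemudian dapat diteruskan sebagai argumen `gpu_layers` ketika konfigurasi model dibuat; parameter `has_gpu` memungkinkan pilihan itu dipaksa secara eksplisit, misalnya saat pengujian.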
+ +## Langkah 3: Inisialisasi Mesin AI dan Aktifkan Post‑Processing Tanda Baca + +Mesin AI bukan hanya untuk obrolan mewah—di sini kami menambahkan *penambah tanda baca* yang membersihkan output OCR mentah. + +```python +# Step 3: Initialise the AI engine and enable punctuation post‑processing +ai_engine = AsposeAI() +ai_engine.set_post_processor("punctuation_adder", {}) +``` + +*Apa yang terjadi?* Panggilan `set_post_processor` mendaftarkan post‑processor bawaan yang dijalankan setelah mesin OCR selesai. Ia mengambil string mentah dan menyisipkan koma, titik, serta huruf kapital di tempat yang tepat, membuat teks akhir jauh lebih mudah dibaca. + +## Langkah 4: Buat Mesin OCR dan Lampirkan Mesin AI + +Menghubungkan mesin AI ke mesin OCR memberi kita satu objek yang dapat membaca karakter sekaligus memoles hasilnya. + +```python +# Step 4: Create the OCR engine and attach the AI engine +ocr_engine = OcrEngine() +ocr_engine.set_ai_engine(ai_engine) +``` + +Jika Anda melewatkan langkah ini, OCR tetap akan berfungsi, tetapi Anda akan kehilangan peningkatan tanda baca—sehingga output akan terlihat seperti rangkaian kata. + +## Langkah 5: Proses Setiap Gambar dalam Folder + +Berikut inti tutorial. Kami melakukan loop pada setiap gambar, menjalankan OCR, menerapkan post‑processor, dan menulis teks yang sudah dibersihkan ke file `.txt` berdampingan. 
+ +```python +# Step 5: Run OCR on each image in a folder, post‑process the result, and save the text +scans_folder = r"YOUR_DIRECTORY/scans" +for image_file in os.listdir(scans_folder): + # Filter only supported image types + if not image_file.lower().endswith(('.png', '.jpg', '.tif')): + continue + + image_path = os.path.join(scans_folder, image_file) + + # Recognise text from the image + ocr_result = ocr_engine.recognize(image_path) + + # Apply the punctuation post‑processor + ocr_result = ocr_engine.run_postprocessor(ocr_result) + + # Show a brief confidence summary + print(f"{image_file} – confidence {ocr_result.confidence:.2%}") + + # Save the cleaned text next to the source image + txt_path = image_path + ".txt" + with open(txt_path, "w", encoding="utf-8") as txt_file: + txt_file.write(ocr_result.text) +``` + +### Apa yang Diharapkan + +Menjalankan skrip akan mencetak sesuatu seperti: + +``` +invoice_001.png – confidence 96.73% +receipt_2024.tif – confidence 94.12% +``` + +Setiap baris memberi Anda skor kepercayaan (pemeriksaan cepat) dan membuat `invoice_001.png.txt`, `receipt_2024.tif.txt`, dll., yang berisi teks berpunctuation, dapat dibaca manusia. + +### Kasus Pinggir & Variasi + +- **Pemindaian non‑Inggris**: Ganti `hugging_face_repo_id` ke model multibahasa (mis., `microsoft/Multilingual-LLM-GGUF`). +- **Batch besar**: Bungkus loop dalam `concurrent.futures.ThreadPoolExecutor` untuk pemrosesan paralel, tetapi perhatikan batas memori GPU. +- **Post‑processing kustom**: Ganti `"punctuation_adder"` dengan skrip Anda sendiri jika Anda memerlukan pembersihan khusus domain (mis., menghapus nomor faktur). + +## Langkah 6: Bersihkan Sumber Daya + +Saat pekerjaan selesai, membebaskan sumber daya mencegah kebocoran memori, terutama penting jika Anda menjalankan ini dalam layanan yang berjalan lama. 
+
+```python
+# Step 6: Release resources
+ai_engine.free_resources()
+ocr_engine.dispose()
+```
+
+Mengabaikan langkah ini dapat membuat memori GPU tetap teralokasi, sehingga proses berikutnya bisa kehabisan memori.
+
+## Ringkasan: Cara Menjalankan OCR End‑to‑End
+
+Hanya dalam beberapa baris, kami telah menunjukkan **cara menjalankan OCR** pada folder pemindaian, **menggunakan model Hugging Face** yang terunduh otomatis saat pertama kali dijalankan, dan **mengenali teks dari pemindaian** dengan tanda baca yang ditambahkan secara otomatis. Skrip lengkap siap disalin‑tempel: sesuaikan jalur Anda, lalu jalankan.
+
+## Langkah Selanjutnya & Topik Terkait
+
+- **Post‑processing batch**: Jelajahi `ocr_engine.run_batch_postprocessor` untuk penanganan massal yang lebih cepat.
+- **Model alternatif**: Coba keluarga `openai/whisper` jika Anda membutuhkan speech‑to‑text bersamaan dengan OCR.
+- **Integrasi dengan basis data**: Simpan teks yang diekstrak di SQLite atau Elasticsearch untuk arsip yang dapat dicari.
+
+Silakan bereksperimen: ganti model, ubah `gpu_layers`, atau tambahkan post‑processor Anda sendiri. Fleksibilitas Aspose OCR yang digabungkan dengan hub model Hugging Face menjadikan ini dasar yang serbaguna untuk proyek digitalisasi dokumen apa pun.
+
+---
+
+*Selamat coding!
Jika Anda mengalami kendala, tinggalkan komentar di bawah atau periksa dokumentasi Aspose OCR untuk opsi konfigurasi yang lebih mendalam.* + +{{< /blocks/products/pf/tutorial-page-section >}} +{{< /blocks/products/pf/main-container >}} +{{< /blocks/products/pf/main-wrap-class >}} +{{< blocks/products/products-backtop-button >}} \ No newline at end of file diff --git a/ocr/indonesian/python/general/perform-ocr-on-image-with-python-complete-guide/_index.md b/ocr/indonesian/python/general/perform-ocr-on-image-with-python-complete-guide/_index.md new file mode 100644 index 000000000..773dcae15 --- /dev/null +++ b/ocr/indonesian/python/general/perform-ocr-on-image-with-python-complete-guide/_index.md @@ -0,0 +1,188 @@ +--- +category: general +date: 2026-04-29 +description: Lakukan OCR pada gambar menggunakan Python, unduh otomatis model HuggingFace, + dan bebaskan memori GPU secara efisien sambil membersihkan teks OCR. +draft: false +keywords: +- perform OCR on image +- download HuggingFace model python +- release GPU memory python +- clean OCR text python +language: id +og_description: Pelajari cara melakukan OCR pada gambar di Python, secara otomatis + mengunduh model HuggingFace, membersihkan teks, dan membebaskan memori GPU. +og_title: Lakukan OCR pada Gambar dengan Python – Panduan Langkah demi Langkah +tags: +- OCR +- Python +- Aspose +- HuggingFace +title: Lakukan OCR pada Gambar dengan Python – Panduan Lengkap +url: /id/python/general/perform-ocr-on-image-with-python-complete-guide/ +--- + +{{< blocks/products/pf/main-wrap-class >}} +{{< blocks/products/pf/main-container >}} +{{< blocks/products/pf/tutorial-page-section >}} + +# Lakukan OCR pada Gambar dengan Python – Panduan Lengkap + +Pernah membutuhkan untuk **perform OCR on image** file tetapi terjebak pada tahap pengunduhan model atau pembersihan memori GPU? 
Anda bukan satu-satunya: banyak pengembang mengalami hal yang sama ketika pertama kali mencoba menggabungkan optical character recognition dengan large language models.
+
+Dalam tutorial ini kami akan menjelaskan satu solusi end‑to‑end yang mengunduh model HuggingFace di Python (**download HuggingFace model python**), menjalankan Aspose OCR, membersihkan output mentah, dan akhirnya membebaskan memori GPU (**release GPU memory python**). Pada akhir tutorial Anda akan memiliki skrip siap‑jalankan yang mengubah PNG yang dipindai menjadi teks yang rapi dan dapat dicari.
+
+> **Apa yang akan Anda dapatkan:** contoh kode lengkap yang dapat dijalankan, penjelasan mengapa setiap langkah penting, tips untuk menghindari jebakan umum, dan sekilas tentang cara menyesuaikan pipeline untuk proyek Anda sendiri.
+
+## Apa yang Anda Butuhkan
+
+- Python 3.9 atau lebih baru (contoh diuji pada 3.11)
+- Paket `aspose-ocr` (instal dengan `pip install aspose-ocr`)
+- Koneksi internet untuk langkah **download HuggingFace model python**
+- GPU yang kompatibel dengan CUDA jika Anda menginginkan peningkatan kecepatan (opsional tetapi disarankan)
+
+Tidak ada dependensi tingkat sistem tambahan yang diperlukan; mesin Aspose OCR menyertakan semua yang Anda butuhkan.
+
+![contoh melakukan OCR pada gambar](image.png "Contoh melakukan OCR pada gambar dengan Aspose OCR dan post‑processor LLM")
+
+*Image alt text: “perform OCR on image – Output Aspose OCR sebelum dan sesudah pembersihan AI”*
+
+## Lakukan OCR pada Gambar – Ikhtisar Langkah‑per‑Langkah
+
+Di bawah ini kami membagi alur kerja menjadi bagian‑bagian logis. Setiap bagian memiliki judulnya sendiri, sehingga asisten AI dapat dengan cepat melompat ke bagian yang Anda minati, dan mesin pencari dapat mengindeks kata kunci yang relevan.
+
+### 1. Unduh Model HuggingFace dengan Python
+
+Hal pertama yang harus kita lakukan adalah mengambil model bahasa yang akan berfungsi sebagai post‑processor untuk output OCR mentah.
Aspose OCR dilengkapi dengan kelas pembantu bernama `AsposeAI` yang dapat secara otomatis mengambil model dari hub HuggingFace. + +```python +import aspose.ocr as aocr +from aspose.ocr import AsposeAI, AsposeAIModelConfig, OcrEngine + +# Configure the model – it will auto‑download the first time you run it +model_config = AsposeAIModelConfig( + allow_auto_download="true", # <-- enables auto‑download + hugging_face_repo_id="Qwen/Qwen2.5-3B-Instruct-GGUF", + hugging_face_quantization="int8", # smaller footprint on GPU + gpu_layers=20, # how many layers stay on GPU + directory_model_path=r"YOUR_DIRECTORY" # where the model files live +) +``` + +**Mengapa ini penting:** +- **download HuggingFace model python** – Anda menghindari penanganan file zip secara manual atau otentikasi token. +- Menggunakan kuantisasi `int8` memperkecil model menjadi kira‑kira seperempat ukuran aslinya, yang penting ketika Anda kemudian perlu **release GPU memory python**. + +> **Pro tip:** Simpan `directory_model_path` di SSD untuk waktu pemuatan yang lebih cepat. + +### 2. Inisialisasi AI Helper dan Aktifkan Pemeriksaan Ejaan + +Sekarang kami membuat instance `AsposeAI` dan melampirkan post‑processor spell‑corrector. Di sinilah keajaiban **clean OCR text python** dimulai. + +```python +# Initialise the AI helper +ai_helper = AsposeAI() +ai_helper.set_post_processor( + processor="spell_corrector", + custom_settings={"max_edits": 2} # allows up to two character edits per word +) +``` + +**Penjelasan:** +Spell‑corrector memeriksa setiap token dari mesin OCR dan menyarankan edit yang dibatasi oleh `max_edits`. Penyesuaian kecil ini dapat mengubah “rec0gn1tion” menjadi “recognition” tanpa model bahasa yang berat. + +### 3. Sambungkan AI Helper ke Mesin OCR + +Aspose memperkenalkan metode baru pada versi 23.4 yang memungkinkan Anda menyambungkan mesin AI langsung ke pipeline OCR. 
+
+```python
+# Initialise the OCR engine and attach the AI helper
+ocr_engine = OcrEngine()
+ocr_engine.set_ai_engine(ai_helper)  # new in v23.4
+```
+
+**Mengapa kami melakukannya:**
+Dengan menghubungkan AI helper lebih awal, mesin OCR dapat secara opsional menggunakan model untuk perbaikan secara langsung (mis., deteksi tata letak). Ini juga membuat kode tetap rapi: tidak perlu loop post‑processing terpisah nanti.
+
+### 4. Lakukan OCR pada Gambar yang Dipindai
+
+Berikut langkah inti yang benar-benar menjalankan OCR pada berkas gambar (**perform OCR on image**). Ganti `YOUR_DIRECTORY/input.png` dengan path ke pemindaian Anda sendiri.
+
+```python
+image_path = r"YOUR_DIRECTORY/input.png"
+ocr_result = ocr_engine.recognize(image_path)
+
+print("Raw OCR text:")
+print(ocr_result.text)
+```
+
+Output mentah sering berisi pemisah baris di tempat yang aneh, karakter yang salah dikenali, atau simbol yang tidak diinginkan. Itulah mengapa kita memerlukan langkah berikutnya.
+
+**Output mentah yang diharapkan (contoh):**
+
+```
+Th1s 1s 4n ex4mpl3 0f r4w OCR t3xt.
+It c0ntains numb3rs 123 and s0me m1stakes.
+```
+
+### 5. Bersihkan Teks OCR di Python dengan AI Post‑Processor
+
+Sekarang kami membiarkan AI membersihkan kekacauan. Ini adalah inti dari proses **clean OCR text python**.
+
+```python
+cleaned_result = ocr_engine.run_postprocessor(ocr_result)
+
+print("\nAI‑enhanced text:")
+print(cleaned_result.text)
+```
+
+**Hasil yang akan Anda lihat:**
+
+```
+This is an example of raw OCR text.
+It contains numbers 123 and some mistakes.
+```
+
+Perhatikan bagaimana spell‑corrector memperbaiki “Th1s” → “This” dan “4n” → “an”, sementara angka yang sah seperti “123” dibiarkan apa adanya. Model juga menormalkan spasi, yang sering menjadi masalah ketika Anda kemudian memasukkan teks ke dalam pipeline NLP hilir.
+
+### 6. Lepaskan Memori GPU di Python – Langkah Pembersihan
+
+Setelah selesai, sebaiknya bebaskan sumber daya GPU, terutama jika Anda menjalankan banyak pekerjaan OCR dalam layanan yang berjalan lama.
+ +```python +# Release resources – crucial for GPU memory +ai_helper.free_resources() +ocr_engine.dispose() +``` + +**Apa yang terjadi di balik layar:** +`free_resources()` membongkar model dari GPU, mengembalikan memori ke driver CUDA. `dispose()` menutup buffer internal mesin OCR. Melewatkan pemanggilan ini dapat menyebabkan error out‑of‑memory setelah hanya beberapa gambar. + +> **Ingat:** Jika Anda berencana memproses batch dalam loop, panggil pembersihan setelah setiap batch atau gunakan kembali `ai_helper` yang sama tanpa membebaskannya sampai akhir. + +## Bonus: Menyesuaikan Pipeline untuk Berbagai Skenario + +### Menyesuaikan Kuantisasi Model + +Jika Anda memiliki GPU yang kuat (mis., RTX 4090) dan menginginkan akurasi lebih tinggi, ubah `hugging_face_quantization` menjadi `"fp16"` dan tingkatkan `gpu_layers` menjadi `30`. Ini akan mengonsumsi lebih banyak memori, jadi Anda perlu **release GPU memory python** lebih agresif setelah setiap batch. + +### Menggunakan Spell‑Checker Kustom + +Anda dapat mengganti `spell_corrector` bawaan dengan post‑processor kustom yang melakukan koreksi spesifik domain (mis., terminologi medis). Cukup implementasikan antarmuka yang diperlukan dan berikan namanya ke `set_post_processor`. + +### Pemrosesan Batch Banyak Gambar + +Bungkus langkah OCR dalam loop `for`, kumpulkan `cleaned_result.text` ke dalam list, dan panggil `ai_helper.free_resources()` hanya setelah loop jika Anda memiliki cukup RAM GPU. Ini mengurangi overhead pemuatan model berulang kali. + +## Kesimpulan + +Kami baru saja menunjukkan cara **perform OCR on image** file di Python, secara otomatis **download a HuggingFace model**, **clean OCR text**, dan dengan aman **release GPU memory** saat selesai. Skrip lengkap siap untuk disalin‑tempel, dan penjelasannya memberi Anda kepercayaan untuk menyesuaikannya dengan proyek yang lebih besar. + +Langkah selanjutnya? 
Coba ganti model Qwen 2.5 dengan varian LLaMA yang lebih besar, bereksperimen dengan post‑processor yang berbeda, atau integrasikan output yang dibersihkan ke dalam indeks Elasticsearch yang dapat dicari. Kemungkinannya tak terbatas, dan Anda kini memiliki fondasi yang kuat untuk dibangun. + +Selamat coding, semoga pipeline OCR Anda selalu bersih dan ramah memori! + +{{< /blocks/products/pf/tutorial-page-section >}} +{{< /blocks/products/pf/main-container >}} +{{< /blocks/products/pf/main-wrap-class >}} +{{< blocks/products/products-backtop-button >}} \ No newline at end of file diff --git a/ocr/italian/python/general/extract-text-from-pdf-ocr-pdf-with-python/_index.md b/ocr/italian/python/general/extract-text-from-pdf-ocr-pdf-with-python/_index.md new file mode 100644 index 000000000..d623939ed --- /dev/null +++ b/ocr/italian/python/general/extract-text-from-pdf-ocr-pdf-with-python/_index.md @@ -0,0 +1,219 @@ +--- +category: general +date: 2026-04-29 +description: Estrai testo da PDF usando Aspose OCR in Python. Impara l'elaborazione + OCR di PDF in batch, converti il testo dei PDF scansionati e gestisci le pagine + a bassa confidenza. +draft: false +keywords: +- extract text from pdf +- ocr pdf with python +- convert scanned pdf text +- batch ocr pdf processing +language: it +og_description: Estrai il testo da PDF con Aspose OCR in Python. Questa guida mostra + l'elaborazione OCR batch di PDF, la conversione del testo di PDF scansionati e la + gestione dei risultati a bassa confidenza. 
+og_title: Estrai testo da PDF – OCR PDF con Python +tags: +- OCR +- Python +- PDF processing +title: Estrai testo da PDF – OCR PDF con Python +url: /it/python/general/extract-text-from-pdf-ocr-pdf-with-python/ +--- + +{{< blocks/products/pf/main-wrap-class >}} +{{< blocks/products/pf/main-container >}} +{{< blocks/products/pf/tutorial-page-section >}} + +# Estrarre testo da PDF – OCR PDF con Python + +Hai mai dovuto **estrarre testo da un PDF** ma il file è solo un’immagine scannerizzata? Non sei solo: molti sviluppatori si trovano di fronte a questo ostacolo quando cercano di trasformare i PDF in dati ricercabili. La buona notizia? Con Aspose OCR per Python puoi convertire il testo di PDF scannerizzati in poche righe di codice, e persino eseguire **elaborazione batch OCR PDF** quando hai dozzine di file da gestire. + +In questo tutorial percorreremo l’intero flusso di lavoro: configurare la libreria, eseguire l’OCR su un singolo PDF, scalare a un batch e gestire le pagine a bassa confidenza così saprai quando è necessaria una revisione manuale. Alla fine avrai uno script pronto all’uso che estrae testo da qualsiasi PDF scannerizzato e comprenderai il perché di ogni passaggio. + +## Cosa ti servirà + +Prima di iniziare, assicurati di avere: + +- Python 3.8 o superiore (il codice usa le f‑string, quindi 3.6+ funziona, ma 3.8+ è consigliato) +- Una licenza Aspose OCR per Python o una chiave di prova gratuita (puoi ottenerla dal sito Aspose) +- Una cartella con uno o più PDF scannerizzati da elaborare +- Una quantità modesta di spazio su disco per i report *.txt* generati + +Tutto qui—nessuna dipendenza esterna pesante, nessun esercizio con OpenCV. Il motore OCR di Aspose fa il lavoro pesante per te. 
+ +## Configurare l’ambiente + +Per prima cosa, installa il pacchetto Aspose OCR da PyPI: + +```bash +pip install aspose-ocr +``` + +Se disponi di un file di licenza (`Aspose.OCR.lic`), posizionalo nella radice del progetto e attivalo così: + +```python +# Activate Aspose OCR license (optional but removes trial watermark) +from aspose.ocr import License + +license = License() +license.set_license("Aspose.OCR.lic") +``` + +> **Pro tip:** Tieni il file di licenza fuori dal controllo di versione; aggiungilo a `.gitignore` per evitare esposizioni accidentali. + +## Eseguire l’OCR su un singolo PDF + +Ora estraiamo il testo da un singolo PDF scannerizzato. I passaggi fondamentali sono: + +1. Creare un’istanza di `OcrEngine`. +2. Puntarla al file PDF. +3. Recuperare un `OcrResult` per ogni pagina. +4. Scrivere l’output di testo semplice su disco. +5. Rilasciare il motore per liberare le risorse native. + +Ecco lo script completo, pronto per l’esecuzione: + +```python +# Step 1: Import the OCR engine and create an instance +from aspose.ocr import OcrEngine + +# Optional: activate license (see previous section) +# from aspose.ocr import License +# License().set_license("Aspose.OCR.lic") + +ocr_engine = OcrEngine() + +# Step 2: Specify the PDF file to be processed +pdf_file_path = r"YOUR_DIRECTORY/input.pdf" + +# Step 3: Extract OCR results – one OcrResult object per page +ocr_pages = ocr_engine.extract_from_pdf(pdf_file_path) + +# Step 4: Iterate through the results, show confidence, flag low‑confidence pages, and save the plain text +for ocr_page in ocr_pages: + print(f"Page {ocr_page.page_number}: confidence {ocr_page.confidence:.2%}") + + # If confidence is below 80 %, flag it for manual review + if ocr_page.confidence < 0.80: + print(" Low confidence – consider manual review") + + # Build the output filename + output_txt = f"YOUR_DIRECTORY/Report_page{ocr_page.page_number}.txt" + with open(output_txt, "w", encoding="utf-8") as f: + f.write(ocr_page.text) + +# Step 5: 
Release resources used by the engine +ocr_engine.dispose() +``` + +**Cosa vedrai:** Per ogni pagina lo script stampa qualcosa del tipo `Page 1: confidence 97.45%`. Se una pagina scende sotto la soglia dell’80 %, appare un avviso, indicandoti che l’OCR potrebbe aver perso dei caratteri. + +### Perché funziona + +- **`OcrEngine`** è il gateway verso la libreria nativa Aspose OCR; gestisce tutto, dalla pre‑elaborazione delle immagini al riconoscimento dei caratteri. +- **`extract_from_pdf`** rasterizza automaticamente ogni pagina del PDF, quindi non è necessario convertire il PDF in immagini manualmente. +- **I punteggi di confidenza** ti permettono di automatizzare i controlli di qualità—critico quando elabori documenti legali o medici dove la precisione è fondamentale. + +## Elaborazione batch OCR PDF con Python + +La maggior parte dei progetti reali coinvolge più di un file. Estendiamo lo script singolo a una pipeline di **elaborazione batch OCR PDF** che attraversa una directory, elabora ogni PDF e salva i risultati in una sottocartella corrispondente. 
+
+```python
+from pathlib import Path
+from aspose.ocr import OcrEngine
+
+def ocr_pdf_file(pdf_path: Path, output_dir: Path, engine: OcrEngine, confidence_threshold: float = 0.80):
+    """Extract text from a single PDF and write per‑page .txt files."""
+    ocr_pages = engine.extract_from_pdf(str(pdf_path))
+    for page in ocr_pages:
+        print(f"{pdf_path.name} – Page {page.page_number}: {page.confidence:.2%}")
+        if page.confidence < confidence_threshold:
+            print(" ⚠️ Low confidence – manual review may be needed")
+
+        out_file = output_dir / f"{pdf_path.stem}_page{page.page_number}.txt"
+        out_file.write_text(page.text, encoding="utf-8")
+
+def batch_ocr_pdf(input_folder: str, output_folder: str):
+    """Iterate over all PDFs in input_folder and run OCR."""
+    input_path = Path(input_folder)
+    output_path = Path(output_folder)
+    output_path.mkdir(parents=True, exist_ok=True)
+
+    pdf_files = sorted(input_path.glob("*.pdf"))
+    if not pdf_files:
+        print("🚫 No PDF files found in the specified folder.")
+        return
+
+    # Create the engine only when there is work to do
+    engine = OcrEngine()
+    try:
+        for pdf_file in pdf_files:
+            # Create a sub‑folder for each PDF to keep pages organized
+            pdf_out_dir = output_path / pdf_file.stem
+            pdf_out_dir.mkdir(exist_ok=True)
+            ocr_pdf_file(pdf_file, pdf_out_dir, engine)
+    finally:
+        # Clean up native resources even if a file fails
+        engine.dispose()
+    print("✅ Batch OCR completed.")
+
+# Example usage:
+if __name__ == "__main__":
+    batch_ocr_pdf("YOUR_DIRECTORY/input_pdfs", "YOUR_DIRECTORY/ocr_output")
+```
+
+#### Come aiuta
+
+- **Scalabilità:** La funzione attraversa la cartella una sola volta, creando una sottocartella di output dedicata per ogni PDF. Questo mantiene tutto ordinato quando hai decine di documenti.
+- **Riutilizzabilità:** `ocr_pdf_file` può essere chiamata da altri script (ad es., un servizio web) perché riceve motore, percorsi e soglia come argomenti e non dipende da stato globale.
+- **Gestione degli errori:** Lo script stampa un messaggio amichevole se la cartella di input è vuota, evitando un fallimento silenzioso, e il blocco `try`/`finally` rilascia il motore anche se l’elaborazione di un file fallisce.
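Come schizzo complementare, i punteggi di confidenza stampati dal batch possono essere raccolti in un elenco di revisione ordinato. L'esempio seguente usa solo la libreria standard; la struttura `PageScore` e la funzione `pages_to_review` sono nomi ipotetici introdotti qui, non parte dell'API Aspose, e i valori di esempio sono inventati a scopo illustrativo.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class PageScore:
    """Confidence of one OCR'd page (hypothetical container, not an Aspose type)."""
    pdf_name: str
    page_number: int
    confidence: float  # expected in the range 0.0 to 1.0


def pages_to_review(scores: Iterable[PageScore], threshold: float = 0.80) -> List[PageScore]:
    """Return the pages below `threshold`, lowest confidence first."""
    flagged = [s for s in scores if s.confidence < threshold]
    return sorted(flagged, key=lambda s: s.confidence)


if __name__ == "__main__":
    # Made-up sample values, for illustration only
    sample = [
        PageScore("invoice_2023.pdf", 1, 0.97),
        PageScore("invoice_2023.pdf", 2, 0.62),
        PageScore("contract_A.pdf", 1, 0.79),
    ]
    for page in pages_to_review(sample):
        print(f"{page.pdf_name} page {page.page_number}: {page.confidence:.0%}")
```

Ordinare per confidenza crescente mette in cima le pagine più dubbie, così la revisione manuale può partire da quelle.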
+ +## Conversione del testo PDF scannerizzato – Gestione dei casi limite + +Sebbene il codice sopra funzioni per la maggior parte dei PDF, potresti incontrare alcune particolarità: + +| Situazione | Perché accade | Come mitigare | +|-----------|----------------|-----------------| +| **PDF criptati** | Il PDF è protetto da password. | Passa la password a `extract_from_pdf(pdf_path, password="yourPwd")`. | +| **Documenti multilingua** | Aspose OCR usa l’inglese per impostazione predefinita. | Imposta `ocr_engine.language = "spa"` per lo spagnolo, o fornisci una lista per lingue miste. | +| **PDF molto grandi (>500 pagine)** | L’uso di memoria aumenta perché ogni pagina viene caricata in RAM. | Elabora il PDF a blocchi usando `engine.extract_from_pdf(pdf_path, start_page=1, end_page=100)` e itera. | +| **Scansioni di scarsa qualità** | DPI basso o molto rumore riducono la confidenza. | Pre‑elabora il PDF con `engine.image_preprocessing = True` o aumenta il DPI tramite `engine.dpi = 300`. | + +> **Attenzione:** Attivare la pre‑elaborazione delle immagini può aumentare notevolmente il tempo CPU. Se esegui un batch notturno, programma abbastanza tempo o avvia un worker separato. + +## Verificare l’output + +Al termine dello script, troverai una struttura di cartelle simile a: + +``` +ocr_output/ +├─ invoice_2023/ +│ ├─ invoice_2023_page1.txt +│ ├─ invoice_2023_page2.txt +│ └─ … +└─ contract_A/ + ├─ contract_A_page1.txt + └─ … +``` + +Apri qualsiasi file `.txt`; dovresti vedere testo pulito, codificato in UTF‑8, che rispecchia il contenuto originale scannerizzato. Se noti caratteri illeggibili, ricontrolla le impostazioni di lingua del PDF e assicurati che i pacchetti di font corretti siano installati sulla macchina. + +## Pulizia delle risorse + +Aspose OCR si basa su DLL native, quindi è fondamentale chiamare `engine.dispose()` una volta terminato. Dimenticare questo passaggio può provocare perdite di memoria, soprattutto in job batch a lunga esecuzione. 
+ +```python +# Always the last line of your script +engine.dispose() +``` + +## Esempio completo end‑to‑end + +Mettendo tutto insieme, ecco un singolo + +{{< /blocks/products/pf/tutorial-page-section >}} +{{< /blocks/products/pf/main-container >}} +{{< /blocks/products/pf/main-wrap-class >}} +{{< blocks/products/products-backtop-button >}} \ No newline at end of file diff --git a/ocr/italian/python/general/how-to-recognize-handwriting-in-python-full-tutorial/_index.md b/ocr/italian/python/general/how-to-recognize-handwriting-in-python-full-tutorial/_index.md new file mode 100644 index 000000000..56f248d63 --- /dev/null +++ b/ocr/italian/python/general/how-to-recognize-handwriting-in-python-full-tutorial/_index.md @@ -0,0 +1,278 @@ +--- +category: general +date: 2026-04-29 +description: Impara a riconoscere la scrittura a mano in Python con Aspose OCR. Questa + guida passo passo mostra come estrarre il testo scritto a mano in modo efficiente. +draft: false +keywords: +- how to recognize handwriting +- extract handwritten text +- handwritten text recognition python +- handwritten ocr tutorial python +language: it +og_description: Come riconoscere la scrittura a mano in Python? Segui questa guida + completa per estrarre il testo scritto a mano usando Aspose OCR, con codice, consigli + e gestione dei casi limite. +og_title: Come riconoscere la scrittura a mano in Python – Tutorial completo +tags: +- OCR +- Python +- HandwritingRecognition +title: Come riconoscere la scrittura a mano in Python – Tutorial completo +url: /it/python/general/how-to-recognize-handwriting-in-python-full-tutorial/ +--- + +{{< blocks/products/pf/main-wrap-class >}} +{{< blocks/products/pf/main-container >}} +{{< blocks/products/pf/tutorial-page-section >}} + +# Come riconoscere la scrittura a mano in Python – Tutorial completo + +Hai mai avuto bisogno di **come riconoscere la scrittura a mano** in un progetto Python ma non sapevi da dove cominciare? 
Non sei solo—gli sviluppatori chiedono continuamente, “Posso estrarre testo da una nota scansionata?” La buona notizia è che le moderne librerie OCR rendono tutto un gioco da ragazzi. In questa guida vedremo **come riconoscere la scrittura a mano** usando Aspose OCR, e imparerai anche a **estrarre testo scritto a mano** in modo affidabile. + +Copriremo tutto, dall'installazione della libreria alla regolazione delle soglie di confidenza per quelle script cursive disordinate. Alla fine avrai uno script eseguibile che stampa il testo estratto e un punteggio di confidenza complessivo—perfetto per app di presa appunti, strumenti di archiviazione o semplicemente per soddisfare la curiosità. Non è necessaria alcuna esperienza pregressa con l'OCR; basta una conoscenza di base di Python. + +--- + +## Di cosa avrai bisogno + +- **Python 3.9+** (l'ultima versione stabile funziona al meglio) +- **Aspose.OCR for Python via .NET** – installa con `pip install aspose-ocr` +- Un'**immagine scritta a mano** (JPEG/PNG) che desideri elaborare +- Opzionale: un ambiente virtuale per tenere ordinate le dipendenze + +Se hai già questi elementi pronti, immergiamoci. + +![Esempio di riconoscimento della scrittura a mano](/images/handwritten-sample.jpg "Esempio di riconoscimento della scrittura a mano") + +*(Testo alternativo: “esempio di riconoscimento della scrittura a mano che mostra una nota scansionata a mano”)* + +--- + +## Passo 1 – Installa e importa le classi Aspose OCR + +Prima di tutto, ci serve il motore OCR stesso. Aspose fornisce un'API pulita che separa il riconoscimento del testo stampato dalla modalità scritta a mano. 
+ +```python +# Install the package (run this in your terminal if you haven't already) +# pip install aspose-ocr + +# Import the required OCR classes +from aspose.ocr import OcrEngine, HandwritingMode +``` + +*Perché è importante:* Importare `HandwritingMode` ci permette di indicare al motore che stiamo gestendo **handwritten text recognition python** anziché testo stampato, migliorando drasticamente la precisione per i tratti corsivi. + +--- + +## Passo 2 – Crea e configura il motore OCR + +Ora creiamo un'istanza di `OcrEngine` e la impostiamo in modalità scritta a mano. Puoi anche regolare la soglia di confidenza; valori più bassi accettano scrittura traballante, valori più alti richiedono input più pulito. + +```python +# Step 2: Initialize the OCR engine for handwritten text +ocr_engine = OcrEngine() +ocr_engine.set_recognition_mode(HandwritingMode.HANDWRITTEN) + +# Optional: tighten the confidence threshold for cursive scripts +# Default is 0.5; 0.65 works well for most notebooks +ocr_engine.set_handwriting_confidence_threshold(0.65) +``` + +*Consiglio professionale:* Se le tue note sono scansionate a 300 DPI o più, otterrai di solito un punteggio migliore. Per immagini a bassa risoluzione, considera di ingrandirle con Pillow prima di passarle al motore. + +--- + +## Passo 3 – Prepara il percorso dell'immagine + +Assicurati che il percorso del file punti all'immagine che vuoi elaborare. I percorsi relativi funzionano bene, ma i percorsi assoluti evitano sorprese del tipo “file non trovato”. + +```python +# Step 3: Point to your handwritten image +input_image_path = "YOUR_DIRECTORY/handwritten_sample.jpg" +``` + +*Errore comune:* Dimenticare di escapare le barre rovesciate su Windows (`C:\\folder\\image.jpg`). Usare stringhe raw (`r"C:\folder\image.jpg"`) elimina questo problema. + +--- + +## Passo 4 – Esegui il riconoscimento e cattura i risultati + +Il metodo `recognize` fa il lavoro pesante. Restituisce un oggetto con le proprietà `.text` e `.confidence`. 
+ +```python +# Step 4: Run OCR and get the result +result = ocr_engine.recognize(input_image_path) + +# Display the extracted text and confidence +print("Hand‑written extraction:") +print(result.text) +print("Overall confidence:", result.confidence) +``` + +**Output previsto (esempio):** + +``` +Hand‑written extraction: +Meeting notes: +- Discuss quarterly goals +- Review budget allocations +- Assign action items +Overall confidence: 0.78 +``` + +Se la confidenza scende sotto 0,5, potresti dover pulire l'immagine (rimuovere ombre, aumentare il contrasto) o abbassare la soglia nel Passo 2. + +--- + +## Passo 5 – Pulisci le risorse + +Aspose OCR mantiene risorse native; chiamare `dispose()` le rilascia e previene perdite di memoria, specialmente quando si elaborano molte immagini in un ciclo. + +```python +# Step 5: Release the engine resources +ocr_engine.dispose() +``` + +*Perché dispose?* Nei servizi a lunga esecuzione (ad esempio un'API Flask che accetta upload), dimenticare di liberare le risorse può esaurire rapidamente la memoria di sistema. + +--- + +## Script completo – Esecuzione con un click + +Mettendo tutto insieme, ecco uno script autonomo che puoi copiare‑incollare ed eseguire. + +```python +# -*- coding: utf-8 -*- +""" +Handwritten OCR Tutorial – how to recognize handwriting in Python +Author: Your Name +Date: 2026-04-29 +""" + +# Install the library first if you haven't already: +# pip install aspose-ocr + +from aspose.ocr import OcrEngine, HandwritingMode + +def recognize_handwriting(image_path: str, confidence_threshold: float = 0.65): + """ + Recognizes handwritten text from an image using Aspose OCR. + + Args: + image_path (str): Path to the handwritten image file. + confidence_threshold (float): Minimum confidence to accept a word. 
+ + Returns: + tuple: (extracted_text (str), overall_confidence (float)) + """ + # Initialize engine in handwritten mode + engine = OcrEngine() + engine.set_recognition_mode(HandwritingMode.HANDWRITTEN) + engine.set_handwriting_confidence_threshold(confidence_threshold) + + # Perform recognition + result = engine.recognize(image_path) + + # Capture output + extracted = result.text + confidence = result.confidence + + # Clean up + engine.dispose() + return extracted, confidence + +if __name__ == "__main__": + # Replace with your actual file location + img_path = "YOUR_DIRECTORY/handwritten_sample.jpg" + + text, conf = recognize_handwriting(img_path) + + print("Hand‑written extraction:") + print(text) + print("Overall confidence:", conf) +``` + +Salva questo file come `handwritten_ocr.py` ed esegui `python handwritten_ocr.py`. Se tutto è configurato correttamente, vedrai il testo estratto stampato sulla console. + +--- + +## Gestione dei casi limite e delle variazioni comuni + +### Immagini a basso contrasto +Se lo sfondo si mescola con l'inchiostro, aumenta prima il contrasto: + +```python +from PIL import Image, ImageEnhance + +img = Image.open(input_image_path) +enhancer = ImageEnhance.Contrast(img) +high_contrast = enhancer.enhance(2.0) # 2× contrast +high_contrast.save("temp_high_contrast.jpg") +result = ocr_engine.recognize("temp_high_contrast.jpg") +``` + +### Note ruotate +Una pagina di taccuino inclinata può compromettere il riconoscimento. 
Usa OpenCV e NumPy per correggere l'inclinazione: + +```python +import numpy as np +import cv2 + +def deskew(image_path): + img = cv2.imread(image_path, 0) + # minAreaRect expects float32 (or int32) points + coords = np.column_stack(np.where(img > 0)).astype(np.float32) + angle = cv2.minAreaRect(coords)[-1] + if angle < -45: + angle = -(90 + angle) + else: + angle = -angle + (h, w) = img.shape[:2] + M = cv2.getRotationMatrix2D((w // 2, h // 2), angle, 1.0) + corrected = cv2.warpAffine(img, M, (w, h), flags=cv2.INTER_CUBIC, borderMode=cv2.BORDER_REPLICATE) + cv2.imwrite("deskewed.jpg", corrected) + +deskew(input_image_path) +result = ocr_engine.recognize("deskewed.jpg") +``` + +### PDF multi‑pagina +Aspose OCR può gestire anche pagine PDF, ma è necessario convertire prima ogni pagina in un'immagine (ad esempio usando `pdf2image`). Poi itera le immagini con la stessa funzione `recognize_handwriting`. + +--- + +## Consigli professionali per estrarre meglio il testo scritto a mano + +- **Il DPI conta:** Punta a 300 DPI o più quando scansioni. +- **Evita sfondi colorati:** Il bianco puro o il grigio chiaro producono l'output più pulito. +- **Elaborazione batch:** Avvolgi la funzione in un ciclo `for` e registra la confidenza di ogni pagina; scarta i risultati al di sotto di una soglia per mantenere alta la qualità. +- **Supporto linguistico:** Aspose OCR supporta più lingue; imposta `engine.set_language("en")` per ottimizzare solo l'inglese. + +--- + +## Domande frequenti + +**Questo funziona su Linux?** +Sì: Aspose OCR include binari nativi per Windows, macOS e Linux. Basta installare il pacchetto pip e sei pronto all'uso. + +**E se la mia scrittura è estremamente corsiva?** +Prova ad abbassare la soglia di confidenza (`0.5` o anche `0.4`). Tieni presente che ciò può introdurre più rumore, quindi potresti dover post‑processare l'output (ad esempio con un correttore ortografico). + +**Posso usarlo in un servizio web?** +Assolutamente. 
La funzione `recognize_handwriting` è senza stato, rendendola perfetta per endpoint Flask o FastAPI. Ricorda solo di chiamare `dispose()` dopo ogni richiesta o di usare un context manager. + +--- + +## Conclusione + +Abbiamo coperto **come riconoscere la scrittura a mano** in Python dall'inizio alla fine, mostrandoti come **estrarre testo scritto a mano**, regolare le impostazioni di confidenza e gestire ostacoli comuni come basso contrasto o pagine ruotate. Lo script completo sopra è pronto per l'esecuzione, e la funzione modulare lo rende facile da integrare in progetti più grandi—che tu stia costruendo un'app per prendere appunti, digitalizzare archivi o semplicemente sperimentare con tecniche di **handwritten ocr tutorial python**. + +Come prossimo passo, potresti esplorare **handwritten text recognition python** per note multilingue, o combinare OCR con elaborazione del linguaggio naturale per riassumere automaticamente i verbali delle riunioni. Il cielo è il limite—provalo e lascia che il tuo codice dia vita ai graffi. + +Buon coding, e sentiti libero di lasciare le tue domande nei commenti! + +{{< /blocks/products/pf/tutorial-page-section >}} +{{< /blocks/products/pf/main-container >}} +{{< /blocks/products/pf/main-wrap-class >}} +{{< blocks/products/products-backtop-button >}} \ No newline at end of file diff --git a/ocr/italian/python/general/how-to-run-ocr-with-aspose-hugging-face-complete-guide/_index.md b/ocr/italian/python/general/how-to-run-ocr-with-aspose-hugging-face-complete-guide/_index.md new file mode 100644 index 000000000..564993994 --- /dev/null +++ b/ocr/italian/python/general/how-to-run-ocr-with-aspose-hugging-face-complete-guide/_index.md @@ -0,0 +1,181 @@ +--- +category: general +date: 2026-04-29 +description: Scopri come eseguire l'OCR sulle tue scansioni, utilizzare automaticamente + il modello Hugging Face e riconoscere il testo dalle scansioni con Aspose OCR in + pochi minuti. 
+draft: false +keywords: +- how to run OCR +- use hugging face model +- recognize text from scans +- download model automatically +language: it +og_description: Come eseguire l'OCR su scansioni usando Aspose OCR, scaricare automaticamente + un modello Hugging Face e ottenere testo pulito e con punteggiatura. +og_title: Come eseguire OCR con Aspose e Hugging Face – Guida completa +tags: +- OCR +- Aspose +- Hugging Face +- Python +title: Come eseguire OCR con Aspose e Hugging Face – Guida completa +url: /it/python/general/how-to-run-ocr-with-aspose-hugging-face-complete-guide/ +--- + +{{< blocks/products/pf/main-wrap-class >}} +{{< blocks/products/pf/main-container >}} +{{< blocks/products/pf/tutorial-page-section >}} + +# Come eseguire OCR con Aspose & Hugging Face – Guida completa + +Ti sei mai chiesto **come eseguire OCR** su una pila di documenti scansionati senza passare ore a regolare le impostazioni? Non sei il solo. In molti progetti, gli sviluppatori devono **riconoscere il testo dalle scansioni** rapidamente, ma si imbattono in download dei modelli e post‑processing. + +Buone notizie: questo tutorial ti mostra una soluzione pronta all'uso che **utilizza un modello Hugging Face**, lo scarica automaticamente e aggiunge la punteggiatura in modo che l'output sembri scritto da un umano. Alla fine avrai uno script che elabora ogni immagine in una cartella e genera un file `.txt` pulito accanto a ciascuna scansione. + +## Cosa ti serve + +- Python 3.8+ (il codice usa le f‑string, quindi le versioni più vecchie non vanno bene) +- Pacchetto `aspose-ocr` (installalo con `pip install aspose-ocr`) +- Accesso a Internet per il download del modello al primo avvio +- Una cartella di scansioni immagine (`.png`, `.jpg` o `.tif`) + +Tutto qui—nessun binario extra, nessuna manipolazione manuale del modello. Iniziamo. 
+ +![esempio di come eseguire OCR](https://example.com/ocr-demo.png "esempio di come eseguire OCR") + +## Passo 1: Importa le classi Aspose OCR e configura l'ambiente + +Iniziamo importando le classi necessarie dalla libreria Aspose OCR. Importare tutto all'inizio mantiene lo script ordinato e facilita l'individuazione di dipendenze mancanti. + +```python +# Step 1: Import Aspose OCR classes +import os +from aspose.ocr import OcrEngine, AsposeAI, AsposeAIModelConfig +``` + +*Perché è importante*: `OcrEngine` fa il lavoro pesante, mentre `AsposeAI` ci permette di collegare un modello di linguaggio di grandi dimensioni per un post‑processing più intelligente. Se salti l'import, il resto del codice fallirà subito con un `NameError`, quindi non dimenticarlo. + +## Passo 2: Configura un modello Hugging Face consapevole della GPU + +Ora indichiamo ad Aspose dove scaricare il modello e quanti layer devono essere eseguiti sulla GPU. Il flag `allow_auto_download="true"` gestisce automaticamente il **download del modello** per te. + +```python +# Step 2: Configure a GPU‑aware AI model (replace with your own model folder) +model_config = AsposeAIModelConfig( + allow_auto_download="true", + hugging_face_repo_id="Qwen/Qwen2.5-3B-Instruct-GGUF", + hugging_face_quantization="int8", + gpu_layers=40, # use GPU for faster inference + directory_model_path=r"YOUR_DIRECTORY/models" +) +``` + +> **Consiglio professionale**: se non hai una GPU, imposta `gpu_layers=0`. Il modello ricadrà sulla CPU, più lenta ma comunque funzionante. + +### Perché scegliere un modello Hugging Face? + +Hugging Face ospita una collezione enorme di LLM pronti all'uso. Puntando a `Qwen/Qwen2.5-3B-Instruct-GGUF`, ottieni un modello compatto, ottimizzato per seguire istruzioni, che può aggiungere punteggiatura, correggere spaziature e persino sistemare piccoli errori OCR. Questa è l'essenza dell'**uso di un modello Hugging Face** nella pratica. 
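La scelta di `gpu_layers` descritta sopra si può automatizzare. Quello che segue è solo uno schizzo: `pick_gpu_layers` e il suo parametro `probe` sono nomi ipotetici che non appartengono ad Aspose OCR, e la presenza di `nvidia-smi` nel PATH è un'euristica, non una garanzia che la GPU sia davvero utilizzabile dalla libreria.

```python
import shutil
from typing import Callable, Optional


def pick_gpu_layers(
    requested: int = 40,
    probe: Callable[[str], Optional[str]] = shutil.which,
) -> int:
    """Return `requested` when an NVIDIA driver tool is on PATH, else 0 (CPU only)."""
    # Heuristic only (assumption): finding `nvidia-smi` suggests a CUDA setup,
    # it does not prove that the OCR library can actually use the GPU.
    has_nvidia_tool = probe("nvidia-smi") is not None
    return requested if has_nvidia_tool else 0
```

Il valore restituito si potrebbe poi passare come `gpu_layers=pick_gpu_layers(40)` nella configurazione del modello; il parametro `probe` esiste solo per rendere la funzione verificabile senza una GPU.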
+ +## Passo 3: Inizializza il motore AI e abilita il post‑processing di punteggiatura + +Il motore AI non serve solo per chat sofisticate—qui colleghiamo un *aggiuntore di punteggiatura* che pulisce l'output grezzo dell'OCR. + +```python +# Step 3: Initialise the AI engine and enable punctuation post‑processing +ai_engine = AsposeAI() +ai_engine.set_post_processor("punctuation_adder", {}) +``` + +*Cosa succede?* La chiamata `set_post_processor` registra un post‑processor integrato che viene eseguito dopo che il motore OCR ha terminato. Prende la stringa grezza e inserisce virgole, punti e lettere maiuscole dove necessario, rendendo il testo finale molto più leggibile. + +## Passo 4: Crea il motore OCR e collega il motore AI + +Collegare il motore AI al motore OCR ci fornisce un unico oggetto capace sia di leggere i caratteri sia di rifinire il risultato. + +```python +# Step 4: Create the OCR engine and attach the AI engine +ocr_engine = OcrEngine() +ocr_engine.set_ai_engine(ai_engine) +``` + +Se salti questo passo, l'OCR funzionerà comunque, ma perderai il miglioramento della punteggiatura—quindi l'output sembrerà un flusso di parole senza separazioni. + +## Passo 5: Elabora ogni immagine in una cartella + +Ecco il cuore del tutorial. Iteriamo su ciascuna immagine, eseguiamo l'OCR, applichiamo il post‑processor e scriviamo il testo pulito in un file `.txt` affiancato. 
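Il ciclo appena descritto ha bisogno, per ogni scansione, del percorso dell'immagine e di quello del file `.txt` affiancato. Uno schizzo minimale con la sola libreria standard, in cui `iter_scans` e `SUPPORTED` sono nomi ipotetici e l'elenco delle estensioni è un'ipotesi ripresa dai prerequisiti:

```python
from pathlib import Path
from typing import Iterator, Tuple

SUPPORTED = {".png", ".jpg", ".tif"}  # assumption: the extensions listed in the prerequisites


def iter_scans(folder: str) -> Iterator[Tuple[Path, Path]]:
    """Yield (image_path, txt_path) pairs for supported images, in name order."""
    for path in sorted(Path(folder).iterdir()):
        # Skip sub-folders and anything that is not a supported image
        if path.is_file() and path.suffix.lower() in SUPPORTED:
            # The sidecar file keeps the full image name: scan.png -> scan.png.txt
            yield path, path.with_name(path.name + ".txt")
```

Rispetto a `os.listdir`, l'ordinamento rende l'elaborazione ripetibile e il controllo `is_file()` evita di trattare come immagine una cartella che termina con un'estensione supportata.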
+ +```python +# Step 5: Run OCR on each image in a folder, post‑process the result, and save the text +scans_folder = r"YOUR_DIRECTORY/scans" +for image_file in os.listdir(scans_folder): + # Filter only supported image types + if not image_file.lower().endswith(('.png', '.jpg', '.tif')): + continue + + image_path = os.path.join(scans_folder, image_file) + + # Recognise text from the image + ocr_result = ocr_engine.recognize(image_path) + + # Apply the punctuation post‑processor + ocr_result = ocr_engine.run_postprocessor(ocr_result) + + # Show a brief confidence summary + print(f"{image_file} – confidence {ocr_result.confidence:.2%}") + + # Save the cleaned text next to the source image + txt_path = image_path + ".txt" + with open(txt_path, "w", encoding="utf-8") as txt_file: + txt_file.write(ocr_result.text) +``` + +### Cosa aspettarsi + +L'esecuzione dello script stampa qualcosa di simile a: + +``` +invoice_001.png – confidence 96.73% +receipt_2024.tif – confidence 94.12% +``` + +Ogni riga indica il punteggio di confidenza (un rapido controllo di salute) e crea file come `invoice_001.png.txt`, `receipt_2024.tif.txt`, ecc., contenenti testo punteggiato e leggibile da un umano. + +### Casi limite e variazioni + +- **Scansioni non inglesi**: cambia `hugging_face_repo_id` con un modello multilingue (ad es., `microsoft/Multilingual-LLM-GGUF`). +- **Lotti grandi**: avvolgi il ciclo in un `concurrent.futures.ThreadPoolExecutor` per l'elaborazione parallela, ma fai attenzione ai limiti di memoria della GPU. +- **Post‑processing personalizzato**: sostituisci `"punctuation_adder"` con il tuo script se hai bisogno di una pulizia specifica per dominio (ad es., rimuovere numeri di fattura). + +## Passo 6: Pulisci le risorse + +Quando il lavoro termina, liberare le risorse evita perdite di memoria, soprattutto importante se lo esegui all'interno di un servizio a lunga durata. 
+ +```python +# Step 6: Release resources +ai_engine.free_resources() +ocr_engine.dispose() +``` + +Trascurare questo passo può lasciare memoria GPU occupata, sabotando esecuzioni successive. + +## Riepilogo: Come eseguire OCR end‑to‑end + +In poche righe di codice, abbiamo mostrato **come eseguire OCR** su una cartella di scansioni, **usare un modello Hugging Face** che si scarica da solo al primo avvio, e **riconoscere il testo dalle scansioni** con punteggiatura aggiunta automaticamente. Lo script completo è pronto per il copia‑incolla, per aggiustare i percorsi e per l'esecuzione. + +## Prossimi passi e argomenti correlati + +- **Post‑processing batch**: esplora `ocr_engine.run_batch_postprocessor` per una gestione di massa ancora più veloce. +- **Modelli alternativi**: prova la famiglia `openai/whisper` se ti serve anche speech‑to‑text insieme all'OCR. +- **Integrazione con database**: memorizza il testo estratto in SQLite o Elasticsearch per archivi ricercabili. + +Sentiti libero di sperimentare—cambia modello, regola `gpu_layers` o aggiungi il tuo post‑processor. La flessibilità di Aspose OCR combinata con l'hub di modelli di Hugging Face rende questa base versatile per qualsiasi progetto di digitalizzazione documentale. + +--- + +*Buon coding! 
Se incontri problemi, lascia un commento qui sotto o consulta la documentazione di Aspose OCR per opzioni di configurazione più approfondite.* + +{{< /blocks/products/pf/tutorial-page-section >}} +{{< /blocks/products/pf/main-container >}} +{{< /blocks/products/pf/main-wrap-class >}} +{{< blocks/products/products-backtop-button >}} \ No newline at end of file diff --git a/ocr/italian/python/general/perform-ocr-on-image-with-python-complete-guide/_index.md b/ocr/italian/python/general/perform-ocr-on-image-with-python-complete-guide/_index.md new file mode 100644 index 000000000..9df48c208 --- /dev/null +++ b/ocr/italian/python/general/perform-ocr-on-image-with-python-complete-guide/_index.md @@ -0,0 +1,209 @@ +--- +category: general +date: 2026-04-29 +description: Esegui OCR su un'immagine usando Python, scarica automaticamente un modello + HuggingFace e rilascia la memoria GPU in modo efficiente mentre pulisci il testo + OCR. +draft: false +keywords: +- perform OCR on image +- download HuggingFace model python +- release GPU memory python +- clean OCR text python +language: it +og_description: Scopri come eseguire OCR su un'immagine in Python, scaricare automaticamente + un modello HuggingFace, pulire il testo e liberare la memoria GPU. +og_title: Esegui OCR su un'immagine con Python – Guida passo passo +tags: +- OCR +- Python +- Aspose +- HuggingFace +title: Esegui OCR su immagine con Python – Guida completa +url: /it/python/general/perform-ocr-on-image-with-python-complete-guide/ +--- + +{{< blocks/products/pf/main-wrap-class >}} +{{< blocks/products/pf/main-container >}} +{{< blocks/products/pf/tutorial-page-section >}} + +# Esegui OCR su Immagine con Python – Guida Completa + +Ti è mai capitato di dover **eseguire l'OCR su un'immagine** e di restare bloccato nella fase di download del modello o di pulizia della memoria GPU? 
Non sei l'unico: molti sviluppatori incontrano questo ostacolo quando provano per la prima volta a combinare il riconoscimento ottico dei caratteri con grandi modelli linguistici. + +In questo tutorial percorreremo una soluzione unica, end‑to‑end, che **scarica un modello HuggingFace in Python**, esegue Aspose OCR, pulisce l'output grezzo e infine **rilascia la memoria GPU** in modo che Python possa recuperarla. Alla fine avrai uno script pronto all'uso che trasforma un PNG scansionato in testo rifinito e ricercabile. + +> **Cosa otterrai:** un esempio di codice completo ed eseguibile, spiegazioni sul perché ogni passaggio è importante, consigli per evitare gli errori più comuni e uno sguardo su come personalizzare la pipeline per i tuoi progetti. + +--- + +## Cosa ti serve + +- Python 3.9 o più recente (l'esempio è stato testato su 3.11) +- pacchetto `aspose-ocr` (installalo via `pip install aspose-ocr`) +- Una connessione internet per scaricare il modello HuggingFace al primo avvio +- Una GPU compatibile CUDA se desideri il boost di velocità (opzionale ma consigliato) + +Non sono richieste dipendenze di sistema aggiuntive; il motore Aspose OCR include tutto il necessario. + +--- + +![perform OCR on image example](image.png "Esempio di eseguire OCR su immagine con Aspose OCR e un post‑processore LLM") + +*Testo alternativo dell'immagine: “eseguire OCR su immagine – output di Aspose OCR prima e dopo la pulizia AI”* + +--- + +## Eseguire OCR su Immagine – Panoramica Passo‑per‑Passo + +Di seguito suddividiamo il flusso di lavoro in blocchi logici. Ogni blocco ha la propria intestazione, così puoi saltare rapidamente alla parte che ti interessa. + +### 1. Scaricare il Modello HuggingFace in Python + +La prima cosa da fare è recuperare un modello linguistico che fungerà da post‑processore per l'output grezzo dell'OCR. 
Aspose OCR include una classe helper chiamata `AsposeAI` che può scaricare automaticamente un modello dal hub HuggingFace. + +```python +import aspose.ocr as aocr +from aspose.ocr import AsposeAI, AsposeAIModelConfig, OcrEngine + +# Configure the model – it will auto‑download the first time you run it +model_config = AsposeAIModelConfig( + allow_auto_download="true", # <-- enables auto‑download + hugging_face_repo_id="Qwen/Qwen2.5-3B-Instruct-GGUF", + hugging_face_quantization="int8", # smaller footprint on GPU + gpu_layers=20, # how many layers stay on GPU + directory_model_path=r"YOUR_DIRECTORY" # where the model files live +) +``` + +**Perché è importante:** +- **download HuggingFace model python** – eviti di gestire manualmente file zip o l'autenticazione con token. +- L'uso della quantizzazione `int8` riduce il modello a circa un quarto della sua dimensione originale, il che è cruciale quando in seguito devi **release GPU memory python**. + +> **Consiglio pro:** Mantieni `directory_model_path` su un SSD per tempi di caricamento più rapidi. + +--- + +### 2. Inizializzare l'Ai Helper e Abilitare il Controllo Ortografico + +Ora creiamo un'istanza `AsposeAI` e colleghiamo un post‑processore correttore ortografico. È qui che inizia la magia del **clean OCR text python**. + +```python +# Initialise the AI helper +ai_helper = AsposeAI() +ai_helper.set_post_processor( + processor="spell_corrector", + custom_settings={"max_edits": 2} # allows up to two character edits per word +) +``` + +**Spiegazione:** +Il correttore ortografico esamina ogni token del motore OCR e suggerisce modifiche limitate da `max_edits`. Questa piccola modifica può trasformare “rec0gn1tion” in “recognition” senza un modello linguistico pesante. + +--- + +### 3. Collegare l'Ai Helper al Motore OCR + +Aspose ha introdotto un nuovo metodo nella versione 23.4 che consente di collegare direttamente un motore AI alla pipeline OCR. 
+ +```python +# Initialise the OCR engine and attach the AI helper +ocr_engine = OcrEngine() +ocr_engine.set_ai_engine(ai_helper) # new in v23.4 +``` + +**Perché lo facciamo:** +Collegando l'helper AI in anticipo, il motore OCR può opzionalmente utilizzare il modello per miglioramenti in tempo reale (ad esempio, rilevamento del layout). Mantiene anche il codice ordinato: non è necessario creare loop di post‑processing separati in seguito. + +--- + +### 4. Eseguire OCR sull'Immagine Scansionata + +Ecco il passaggio centrale, quello che esegue effettivamente l'OCR sull'immagine. Sostituisci `YOUR_DIRECTORY/input.png` con il percorso della tua scansione. + +```python +image_path = r"YOUR_DIRECTORY/input.png" +ocr_result = ocr_engine.recognize(image_path) + +print("Raw OCR text:") +print(ocr_result.text) +``` + +L'output grezzo tipico può contenere interruzioni di riga in posizioni strane, caratteri riconosciuti erroneamente o simboli estranei. Ecco perché è necessario il passaggio successivo. + +**Output grezzo previsto (esempio):** + +``` +Th1s 1s 4n ex4mpl3 0f r4w OCR t3xt. +It c0ntains numb3rs 123 and s0me m1stakes. +``` + +--- + +### 5. Pulire il Testo OCR in Python con il Post‑Processor AI + +Ora lasciamo che l'AI pulisca il caos. Questo è il cuore della pulizia del testo OCR. + +```python +cleaned_result = ocr_engine.run_postprocessor(ocr_result) + +print("\nAI‑enhanced text:") +print(cleaned_result.text) +``` + +**Risultato previsto (esempio):** + +``` +This is an example of raw OCR text. +It contains numbers 123 and some mistakes. +``` + +Nota come il correttore ortografico abbia corretto “Th1s” → “This” e “4n” → “an”. Il modello normalizza anche gli spazi, spesso un punto critico quando in seguito inserisci il testo in pipeline NLP. + +--- + +### 6. Rilasciare la Memoria GPU in Python – Passaggi di Pulizia + +Quando hai finito, è buona pratica liberare le risorse GPU, specialmente se esegui più job OCR in un servizio a lunga esecuzione. 
+ +```python +# Release resources – crucial for GPU memory +ai_helper.free_resources() +ocr_engine.dispose() +``` + +**Cosa succede dietro le quinte:** +`free_resources()` scarica il modello dalla GPU, restituendo la memoria al driver CUDA. `dispose()` chiude i buffer interni del motore OCR. Saltare queste chiamate può provocare errori di out‑of‑memory dopo solo poche immagini. + +> **Ricorda:** Se prevedi di elaborare batch in un ciclo, chiama la pulizia dopo ogni batch o riutilizza lo stesso `ai_helper` senza liberarlo fino alla fine. + +--- + +## Bonus: Personalizzare la Pipeline per Scenari Differenti + +### Regolare la Quantizzazione del Modello + +Se disponi di una GPU potente (ad esempio, RTX 4090) e desideri maggiore accuratezza, cambia `hugging_face_quantization` in `"fp16"` e aumenta `gpu_layers` a `30`. Questo consumerà più memoria, quindi dovrai **release GPU memory python** in modo più aggressivo dopo ogni batch. + +### Utilizzare un Correttore Ortografico Personalizzato + +Puoi sostituire il `spell_corrector` integrato con un post‑processore personalizzato che esegue correzioni specifiche per dominio (ad esempio, terminologia medica). Basta implementare l'interfaccia richiesta e passare il suo nome a `set_post_processor`. + +### Elaborazione in Batch di Più Immagini + +Racchiudi i passaggi OCR in un ciclo `for`, raccogli `cleaned_result.text` in una lista e chiama `ai_helper.free_resources()` solo dopo il ciclo se disponi di sufficiente RAM GPU. Questo riduce l'overhead del caricamento ripetuto del modello. + +--- + +## Conclusione + +Ti abbiamo appena mostrato come **perform OCR on image** file in Python, scaricare automaticamente un **HuggingFace model**, **clean OCR text**, e rilasciare in modo sicuro **GPU memory** quando hai finito. Lo script completo è pronto per il copia‑incolla, e le spiegazioni ti danno la fiducia per adattarlo a progetti più grandi. + +Prossimi passi? 
Prova a sostituire il modello Qwen 2.5 con una variante LLaMA più grande, sperimenta con diversi post‑processori, o integra l'output pulito in un indice Elasticsearch ricercabile. Le possibilità sono infinite, e ora hai una solida base su cui costruire. + +Buon coding, e che le tue pipeline OCR siano sempre pulite e amiche della memoria! + +{{< /blocks/products/pf/tutorial-page-section >}} +{{< /blocks/products/pf/main-container >}} +{{< /blocks/products/pf/main-wrap-class >}} +{{< blocks/products/products-backtop-button >}} \ No newline at end of file diff --git a/ocr/japanese/python/general/extract-text-from-pdf-ocr-pdf-with-python/_index.md b/ocr/japanese/python/general/extract-text-from-pdf-ocr-pdf-with-python/_index.md new file mode 100644 index 000000000..c2a569f99 --- /dev/null +++ b/ocr/japanese/python/general/extract-text-from-pdf-ocr-pdf-with-python/_index.md @@ -0,0 +1,217 @@ +--- +category: general +date: 2026-04-29 +description: Aspose OCR を使用して Python で PDF からテキストを抽出します。バッチ OCR PDF 処理を学び、スキャンされた + PDF のテキストを変換し、信頼度の低いページを処理します。 +draft: false +keywords: +- extract text from pdf +- ocr pdf with python +- convert scanned pdf text +- batch ocr pdf processing +language: ja +og_description: Aspose OCR を使用して Python で PDF からテキストを抽出します。このガイドでは、バッチ OCR PDF 処理、スキャンされた + PDF のテキスト変換、低信頼度結果の処理方法を示します。 +og_title: PDFからテキストを抽出 – PythonでPDFをOCR +tags: +- OCR +- Python +- PDF processing +title: PDFからテキストを抽出 – PythonでPDFをOCR +url: /ja/python/general/extract-text-from-pdf-ocr-pdf-with-python/ +--- + +{{< blocks/products/pf/main-wrap-class >}} +{{< blocks/products/pf/main-container >}} +{{< blocks/products/pf/tutorial-page-section >}} + +# PDFからテキストを抽出 – PythonでOCR PDF + +PDFから**テキストを抽出**したいが、ファイルが単なるスキャン画像だったことはありませんか? 
あなただけではありません。多くの開発者がPDFを検索可能なデータに変換しようとしてこの壁にぶつかります。良いニュースは、Aspose OCR for Python を使えば、数行のコードでスキャンしたPDFのテキストを変換でき、さらに**バッチOCR PDF処理**を数十ファイルに対して実行できることです。 + +このチュートリアルでは、ライブラリのセットアップ、単一PDFでのOCR実行、バッチへのスケーリング、そして低信頼度ページの処理(手動レビューが必要なときが分かるように)という全体のワークフローを順に解説します。最後まで読むと、任意のスキャンPDFからテキストを抽出できる実行可能なスクリプトが手に入り、各ステップの背景も理解できるようになります。 + +## 必要なもの + +始める前に、次のものが揃っていることを確認してください。 + +- Python 3.8 以上(コードはf文字列を使用しているため、3.6以降でも動作しますが、3.8以上を推奨します) +- Aspose OCR for Python のライセンスまたは無料トライアルキー(Aspose のウェブサイトから取得できます) +- 処理したいスキャンPDFが1つ以上入っているフォルダー +- 生成される *.txt* レポート用の適度なディスク容量 + +以上です。重い外部依存関係は不要で、OpenCVの操作も必要ありません。Aspose OCR エンジンが重い処理をすべて担ってくれます。 + +## 環境設定 + +まず、PyPI から Aspose OCR パッケージをインストールします。 + +```bash +pip install aspose-ocr +``` + +ライセンスファイル(`Aspose.OCR.lic`)を持っている場合は、プロジェクトのルートに配置し、以下のように有効化します。 + +```python +# Activate Aspose OCR license (optional but removes trial watermark) +from aspose.ocr import License + +license = License() +license.set_license("Aspose.OCR.lic") +``` + +> **プロのコツ:** ライセンスファイルはバージョン管理に含めないでください。`.gitignore` に追加して、誤って公開されるのを防ぎましょう。 + +## 単一PDFでのOCR実行 + +それでは、単一のスキャンPDFからテキストを抽出しましょう。主な手順は次の通りです。 + +1. `OcrEngine` のインスタンスを作成します。 +2. 対象のPDFファイルを指定します。 +3. 各ページの `OcrResult` を取得します。 +4. プレーンテキストの出力をディスクに書き込みます。 +5. 
エンジンを破棄してネイティブリソースを解放します。 + +以下が完全な実行可能スクリプトです。 + +```python +# Step 1: Import the OCR engine and create an instance +from aspose.ocr import OcrEngine + +# Optional: activate license (see previous section) +# from aspose.ocr import License +# License().set_license("Aspose.OCR.lic") + +ocr_engine = OcrEngine() + +# Step 2: Specify the PDF file to be processed +pdf_file_path = r"YOUR_DIRECTORY/input.pdf" + +# Step 3: Extract OCR results – one OcrResult object per page +ocr_pages = ocr_engine.extract_from_pdf(pdf_file_path) + +# Step 4: Iterate through the results, show confidence, flag low‑confidence pages, and save the plain text +for ocr_page in ocr_pages: + print(f"Page {ocr_page.page_number}: confidence {ocr_page.confidence:.2%}") + + # If confidence is below 80 %, flag it for manual review + if ocr_page.confidence < 0.80: + print(" Low confidence – consider manual review") + + # Build the output filename + output_txt = f"YOUR_DIRECTORY/Report_page{ocr_page.page_number}.txt" + with open(output_txt, "w", encoding="utf-8") as f: + f.write(ocr_page.text) + +# Step 5: Release resources used by the engine +ocr_engine.dispose() +``` + +**実行結果の例:** 各ページでスクリプトは `Page 1: confidence 97.45%` のように出力します。信頼度が80 %未満のページがあると、警告が表示され、OCR が文字を見逃した可能性があることが分かります。 + +### なぜこれが機能するのか + +- **`OcrEngine`** はネイティブな Aspose OCR ライブラリへのゲートウェイで、画像前処理から文字認識まで全てを処理します。 +- **`extract_from_pdf`** は各PDFページを自動的にラスタライズするため、PDF を画像に変換する手間が不要です。 +- **信頼度スコア** を利用すれば品質チェックを自動化できます。正確さが重要な法務や医療文書の処理に特に有用です。 + +## PythonでバッチOCR PDF処理 + +実務では複数のファイルを扱うことがほとんどです。単一ファイル用スクリプトを拡張し、ディレクトリ内を走査して各PDFを処理し、結果を対応するサブフォルダーに保存する **バッチOCR PDF処理** パイプラインを作りましょう。 + +```python +import os +from pathlib import Path +from aspose.ocr import OcrEngine + +def ocr_pdf_file(pdf_path: Path, output_dir: Path, engine: OcrEngine, confidence_threshold: float = 0.80): + """Extract text from a single PDF and write per‑page .txt files.""" + ocr_pages = engine.extract_from_pdf(str(pdf_path)) + for page in ocr_pages: + 
print(f"{pdf_path.name} – Page {page.page_number}: {page.confidence:.2%}") + if page.confidence < confidence_threshold: + print(" ⚠️ Low confidence – manual review may be needed") + + out_file = output_dir / f"{pdf_path.stem}_page{page.page_number}.txt" + out_file.write_text(page.text, encoding="utf-8") + +def batch_ocr_pdf(input_folder: str, output_folder: str): + """Iterate over all PDFs in input_folder and run OCR.""" + input_path = Path(input_folder) + output_path = Path(output_folder) + output_path.mkdir(parents=True, exist_ok=True) + + pdf_files = list(input_path.glob("*.pdf")) + if not pdf_files: + print("🚫 No PDF files found in the specified folder.") + return + + # Create the engine only after the empty-folder check, so the early return cannot leak it + engine = OcrEngine() + + for pdf_file in pdf_files: + # Create a sub‑folder for each PDF to keep pages organized + pdf_out_dir = output_path / pdf_file.stem + pdf_out_dir.mkdir(exist_ok=True) + ocr_pdf_file(pdf_file, pdf_out_dir, engine) + + # Clean up native resources + engine.dispose() + print("✅ Batch OCR completed.") + +# Example usage: +if __name__ == "__main__": + batch_ocr_pdf("YOUR_DIRECTORY/input_pdfs", "YOUR_DIRECTORY/ocr_output") +``` + +#### これが役立つ理由 + +- **スケーラビリティ:** 関数はフォルダーを一度走査し、各PDFごとに専用の出力サブフォルダーを作成します。多数の文書がある場合でも整理された状態を保てます。 +- **再利用性:** `ocr_pdf_file` は独立した関数なので、他のスクリプト(例: Webサービス)からも呼び出せます。 +- **エラーハンドリング:** 入力フォルダーに PDF が無い場合、スクリプトはその旨のメッセージを表示して終了するため、何も出力されないまま失敗することはありません。 + +## スキャンPDFテキストの変換 – エッジケースの処理 + +上記のコードは**ほとんどのPDF**で動作しますが、いくつかの特殊ケースに遭遇することがあります。 + +| 状況 | 発生理由 | 対策 | +|-----------|----------------|-----------------| +| **暗号化PDF** | PDFがパスワードで保護されています。 | `extract_from_pdf(pdf_path, password="yourPwd")` にパスワードを渡してください。 | +| **多言語文書** | Aspose OCR のデフォルト言語は英語です。 | スペイン語の場合は `ocr_engine.language = "spa"` と設定するか、混在言語の場合はリストで指定してください。 | +| **非常に大きなPDF(>500ページ)** | 各ページがRAMに読み込まれるため、メモリ使用量が急増します。 | `engine.extract_from_pdf(pdf_path, start_page=1, end_page=100)` などでPDFを分割して処理し、ループさせます。 | +| **スキャン品質が低い** | DPIが低い、またはノイズが多いと信頼度が低下します。 | `engine.image_preprocessing = True` 
で前処理を行うか、`engine.dpi = 300` でDPIを上げてください。 | + +> **注意:** 画像前処理を有効にすると CPU 時間が顕著に増加します。夜間バッチを実行する場合は、十分な時間を確保するか、別のワーカーを立ち上げてください。 + +## 出力の検証 + +スクリプトが完了すると、以下のようなフォルダー構造が作成されます。 + +``` +ocr_output/ +├─ invoice_2023/ +│ ├─ invoice_2023_page1.txt +│ ├─ invoice_2023_page2.txt +│ └─ … +└─ contract_A/ + ├─ contract_A_page1.txt + └─ … +``` + +任意の `.txt` ファイルを開くと、元のスキャン内容を反映したクリーンな UTF‑8 エンコードのテキストが表示されます。文字化けが見られる場合は、PDF の言語設定を再確認し、マシンに適切なフォントパックがインストールされているか確認してください。 + +## リソースのクリーンアップ + +Aspose OCR はネイティブ DLL に依存しているため、完了後に `engine.dispose()` を呼び出すことが重要です。この手順を忘れると、特に長時間実行するバッチジョブでメモリリークが発生する可能性があります。 + +```python +# Always the last line of your script +engine.dispose() +``` + +## 完全なエンドツーエンド例 + +すべてをまとめると、以下は単一の + +{{< /blocks/products/pf/tutorial-page-section >}} +{{< /blocks/products/pf/main-container >}} +{{< /blocks/products/pf/main-wrap-class >}} +{{< blocks/products/products-backtop-button >}} \ No newline at end of file diff --git a/ocr/japanese/python/general/how-to-recognize-handwriting-in-python-full-tutorial/_index.md b/ocr/japanese/python/general/how-to-recognize-handwriting-in-python-full-tutorial/_index.md new file mode 100644 index 000000000..a3a06bfe9 --- /dev/null +++ b/ocr/japanese/python/general/how-to-recognize-handwriting-in-python-full-tutorial/_index.md @@ -0,0 +1,275 @@ +--- +category: general +date: 2026-04-29 +description: Aspose OCR を使用して Python で手書き文字を認識する方法を学びましょう。このステップバイステップガイドでは、手書きテキストを効率的に抽出する方法を示します。 +draft: false +keywords: +- how to recognize handwriting +- extract handwritten text +- handwritten text recognition python +- handwritten ocr tutorial python +language: ja +og_description: Pythonで手書き文字を認識する方法は?Aspose OCRを使用して手書きテキストを抽出する完全ガイドを、コード、ヒント、エッジケースの対処法と共にご覧ください。 +og_title: Pythonで手書き文字を認識する方法 – 完全チュートリアル +tags: +- OCR +- Python +- HandwritingRecognition +title: Pythonで手書き文字を認識する方法 – 完全チュートリアル +url: /ja/python/general/how-to-recognize-handwriting-in-python-full-tutorial/ +--- + +{{< blocks/products/pf/main-wrap-class >}} +{{< 
blocks/products/pf/main-container >}} +{{< blocks/products/pf/tutorial-page-section >}} + +# Pythonで手書き文字を認識する方法 – 完全チュートリアル + +Pythonプロジェクトで **手書き文字を認識する方法** が必要だったけど、どこから始めればいいかわからないことはありませんか? あなたは一人ではありません—開発者は常に「スキャンしたメモからテキストを抽出できるか?」と質問します。朗報は、最新のOCRライブラリのおかげでこれがとても簡単になることです。このガイドでは Aspose OCR を使って **手書き文字を認識する方法** を解説し、**手書きテキストを抽出** する方法も確実に学べます。 + +インストールから、乱れた筆記体スクリプト向けに信頼度閾値を調整する方法まで、すべてをカバーします。最後まで読めば、抽出したテキストと全体の信頼度スコアを出力する実行可能なスクリプトが手に入ります—ノート取りアプリ、アーカイブツール、あるいは単なる好奇心の満足に最適です。OCRの事前経験は不要で、基本的なPythonの知識があれば十分です。 + +--- + +## 必要なもの + +- **Python 3.9+**(最新の安定版が最適です) +- **Aspose.OCR for Python via .NET** – `pip install aspose-ocr` でインストール +- 処理したい **手書き画像**(JPEG/PNG) +- 任意:依存関係を整理するための仮想環境 + +これらが揃ったら、さっそく始めましょう。 + +![手書き文字認識の例](/images/handwritten-sample.jpg "手書き文字認識の例") + +*(Alt text: “手書き文字認識の例(スキャンした手書きメモを示す)”)* + +--- + +## ステップ 1 – Aspose OCR クラスのインストールとインポート + +まず最初に、OCRエンジンそのものが必要です。Aspose は印刷文字認識と手書きモードを分離したクリーンな API を提供しています。 + +```python +# Install the package (run this in your terminal if you haven't already) +# pip install aspose-ocr + +# Import the required OCR classes +from aspose.ocr import OcrEngine, HandwritingMode +``` + +*Why this matters:* `HandwritingMode` をインポートすることで、エンジンに **handwritten text recognition python** を扱っていることを伝え、印刷文字ではなく手書き文字の認識精度が大幅に向上します。 + +--- + +## ステップ 2 – OCRエンジンの作成と設定 + +`OcrEngine` インスタンスを作成し、手書きモードに切り替えます。信頼度閾値も調整可能です。低い値は揺らいだ文字を受け入れ、高い値はきれいな入力を要求します。 + +```python +# Step 2: Initialize the OCR engine for handwritten text +ocr_engine = OcrEngine() +ocr_engine.set_recognition_mode(HandwritingMode.HANDWRITTEN) + +# Optional: tighten the confidence threshold for cursive scripts +# Default is 0.5; 0.65 works well for most notebooks +ocr_engine.set_handwriting_confidence_threshold(0.65) +``` + +*Pro tip:* メモが 300 DPI 以上でスキャンされていれば、通常はスコアが向上します。低解像度画像の場合は、エンジンに渡す前に Pillow でアップスケールすることを検討してください。 + +--- + +## ステップ 3 – 画像パスの準備 + +処理したい画像へのファイルパスが正しく指していることを確認してください。相対パスでも問題ありませんが、絶対パスを使うと「ファイルが見つからない」エラーを防げます。 + 
+```python +# Step 3: Point to your handwritten image +input_image_path = "YOUR_DIRECTORY/handwritten_sample.jpg" +``` + +*Common pitfall:* Windows でバックスラッシュをエスケープし忘れること(`C:\\folder\\image.jpg`)。生文字列(`r"C:\folder\image.jpg"`)を使うとこの問題を回避できます。 + +--- + +## ステップ 4 – 認識を実行し結果を取得 + +`recognize` メソッドが本格的な処理を行います。返されるオブジェクトには `.text` と `.confidence` プロパティがあります。 + +```python +# Step 4: Run OCR and get the result +result = ocr_engine.recognize(input_image_path) + +# Display the extracted text and confidence +print("Hand‑written extraction:") +print(result.text) +print("Overall confidence:", result.confidence) +``` + +**Expected output (example):** + +``` +Hand‑written extraction: +Meeting notes: +- Discuss quarterly goals +- Review budget allocations +- Assign action items +Overall confidence: 0.78 +``` + +信頼度が 0.5 未満に下がった場合は、画像をクリーニング(影の除去、コントラスト上げ)するか、ステップ 2で閾値を下げる必要があります。 + +--- + +## ステップ 5 – リソースのクリーンアップ + +Aspose OCR はネイティブリソースを保持します。`dispose()` を呼び出すことでそれらを解放し、特にループで多数の画像を処理する際のメモリリークを防げます。 + +```python +# Step 5: Release the engine resources +ocr_engine.dispose() +``` + +*Why dispose?* 長時間稼働するサービス(例:アップロードを受け付ける Flask API)では、リソース解放を忘れるとシステムメモリがすぐに枯渇します。 + +--- + +## 完全スクリプト – ワンクリック実行 + +すべてをまとめた、コピー&ペーストで実行できる自己完結型スクリプトです。 + +```python +# -*- coding: utf-8 -*- +""" +Handwritten OCR Tutorial – how to recognize handwriting in Python +Author: Your Name +Date: 2026-04-29 +""" + +# Install the library first if you haven't already: +# pip install aspose-ocr + +from aspose.ocr import OcrEngine, HandwritingMode + +def recognize_handwriting(image_path: str, confidence_threshold: float = 0.65): + """ + Recognizes handwritten text from an image using Aspose OCR. + + Args: + image_path (str): Path to the handwritten image file. + confidence_threshold (float): Minimum confidence to accept a word. 
+ + Returns: + tuple: (extracted_text (str), overall_confidence (float)) + """ + # Initialize engine in handwritten mode + engine = OcrEngine() + engine.set_recognition_mode(HandwritingMode.HANDWRITTEN) + engine.set_handwriting_confidence_threshold(confidence_threshold) + + # Perform recognition + result = engine.recognize(image_path) + + # Capture output + extracted = result.text + confidence = result.confidence + + # Clean up + engine.dispose() + return extracted, confidence + +if __name__ == "__main__": + # Replace with your actual file location + img_path = "YOUR_DIRECTORY/handwritten_sample.jpg" + + text, conf = recognize_handwriting(img_path) + + print("Hand‑written extraction:") + print(text) + print("Overall confidence:", conf) +``` + +`handwritten_ocr.py` として保存し、`python handwritten_ocr.py` を実行してください。環境が正しく設定されていれば、抽出されたテキストがコンソールに表示されます。 + +--- + +## エッジケースと一般的なバリエーションの処理 + +### 低コントラスト画像 +インクと背景の区別がつきにくい場合は、まずコントラストを上げます: + +```python +from PIL import Image, ImageEnhance + +img = Image.open(input_image_path) +enhancer = ImageEnhance.Contrast(img) +high_contrast = enhancer.enhance(2.0) # 2× contrast +high_contrast.save("temp_high_contrast.jpg") +result = ocr_engine.recognize("temp_high_contrast.jpg") +``` + +### 回転したメモ +斜めに撮影されたノートページは認識精度を下げることがあります。OpenCV と NumPy(`pip install opencv-python numpy`)を使ってデスキュー(傾き補正)してください: + +```python +import numpy as np +import cv2 + +def deskew(image_path): + img = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE) + # Invert and binarize so that ink pixels (not the white page) are non-zero + thresh = cv2.threshold(cv2.bitwise_not(img), 0, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)[1] + coords = np.column_stack(np.where(thresh > 0)).astype(np.float32) + # Note: the angle range returned by cv2.minAreaRect differs between OpenCV versions; verify the sign on a sample page + angle = cv2.minAreaRect(coords)[-1] + if angle < -45: + angle = -(90 + angle) + else: + angle = -angle + (h, w) = img.shape[:2] + M = cv2.getRotationMatrix2D((w // 2, h // 2), angle, 1.0) + corrected = cv2.warpAffine(img, M, (w, h), flags=cv2.INTER_CUBIC, borderMode=cv2.BORDER_REPLICATE) + cv2.imwrite("deskewed.jpg", corrected) + +deskew(input_image_path) +result = ocr_engine.recognize("deskewed.jpg") +``` + +### マルチページPDF +Aspose OCR は PDF ページも扱えますが、まず各ページを画像に変換する必要があります(例:`pdf2image` を使用)。変換後は同じ 
`recognize_handwriting` 関数で画像をループ処理します。 + +--- + +## **手書きテキスト抽出** の結果を向上させるプロのコツ + +- **DPI matters:** スキャン時は 300 DPI 以上を目指しましょう。 +- **Avoid colored backgrounds:** 純白または薄いグレーが最もクリーンな出力を得られます。 +- **Batch processing:** 関数を `for` ループでラップし、各ページの信頼度を記録。閾値以下の結果は破棄して品質を保ちます。 +- **Language support:** Aspose OCR は複数言語に対応しています。英語のみ最適化したい場合は `engine.set_language("en")` を設定してください。 + +--- + +## よくある質問 + +**Does this work on Linux?** +はい—Aspose OCR は Windows、macOS、Linux 用のネイティブバイナリを同梱しています。pip パッケージをインストールすればすぐに使用可能です。 + +**What if my handwriting is extremely cursive?** +信頼度閾値を下げてみてください(`0.5` あるいは `0.4`)。ただし、ノイズが増える可能性があるため、必要に応じて出力を後処理(例:スペルチェック)してください。 + +**Can I use this in a web service?** +もちろんです。`recognize_handwriting` 関数はステートレスなので、Flask や FastAPI のエンドポイントに最適です。各リクエスト後に `dispose()` を呼び出すか、コンテキストマネージャを使用することを忘れないでください。 + +--- + +## 結論 + +Python における **手書き文字を認識する方法** を最初から最後まで網羅し、**手書きテキストを抽出** する手順、信頼度設定の調整、低コントラストや回転ページといった一般的な落とし穴への対処法を示しました。上記の完全スクリプトはすぐに実行可能で、モジュラー化された関数はノート取りアプリの構築、アーカイブのデジタル化、あるいは **handwritten ocr tutorial python** の実験など、より大規模なプロジェクトへの統合を容易にします。 + +次のステップとして、マルチリンガルなメモ向けに **handwritten text recognition python** を探求したり、OCR と自然言語処理を組み合わせて会議議事録を自動要約したりすることが考えられます。可能性は無限です—ぜひ試してみて、コードで落書きに命を吹き込みましょう。 + +Happy coding, and feel free to drop your questions in the comments! 
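
The batch-processing tip above (loop over pages, log each confidence, set aside results below the threshold) can be sketched roughly as follows. This is a minimal illustration, not part of the Aspose API: `split_by_confidence` and `fake_recognizer` are hypothetical names, and the recognizer is passed in as a plain callable so that any function returning `(text, confidence)`, such as the tutorial's `recognize_handwriting`, could be plugged in.

```python
from typing import Callable, Iterable, List, Tuple

# Hypothetical recognizer signature: image path -> (text, confidence in [0, 1]).
Recognizer = Callable[[str], Tuple[str, float]]
Record = Tuple[str, str, float]


def split_by_confidence(
    image_paths: Iterable[str],
    recognize: Recognizer,
    threshold: float = 0.65,
) -> Tuple[List[Record], List[Record]]:
    """Run `recognize` on every path and separate the results at `threshold`."""
    accepted: List[Record] = []
    flagged: List[Record] = []
    for path in image_paths:
        text, confidence = recognize(path)
        record = (path, text, confidence)
        if confidence >= threshold:
            accepted.append(record)
        else:
            flagged.append(record)
    return accepted, flagged


def fake_recognizer(path: str) -> Tuple[str, float]:
    """Stand-in for a real OCR call; returns canned scores for the demo."""
    scores = {"clean.jpg": 0.91, "smudged.jpg": 0.42}
    return f"text from {path}", scores.get(path, 0.0)


if __name__ == "__main__":
    kept, review = split_by_confidence(["clean.jpg", "smudged.jpg"], fake_recognizer)
    for path, _, confidence in kept:
        print(f"kept    {path}: {confidence:.2f}")
    for path, _, confidence in review:
        print(f"flagged {path}: {confidence:.2f}")
```

Keeping the flagged records instead of silently dropping them makes it easier to route those pages to manual review later.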
+ +{{< /blocks/products/pf/tutorial-page-section >}} +{{< /blocks/products/pf/main-container >}} +{{< /blocks/products/pf/main-wrap-class >}} +{{< blocks/products/products-backtop-button >}} \ No newline at end of file diff --git a/ocr/japanese/python/general/how-to-run-ocr-with-aspose-hugging-face-complete-guide/_index.md b/ocr/japanese/python/general/how-to-run-ocr-with-aspose-hugging-face-complete-guide/_index.md new file mode 100644 index 000000000..f1d569cad --- /dev/null +++ b/ocr/japanese/python/general/how-to-run-ocr-with-aspose-hugging-face-complete-guide/_index.md @@ -0,0 +1,178 @@ +--- +category: general +date: 2026-04-29 +description: スキャンにOCRを実行し、Hugging Faceモデルを自動的に使用し、Aspose OCRで数分でスキャンからテキストを認識する方法を学びましょう。 +draft: false +keywords: +- how to run OCR +- use hugging face model +- recognize text from scans +- download model automatically +language: ja +og_description: Aspose OCR を使用してスキャン画像で OCR を実行し、Hugging Face のモデルを自動的にダウンロードして、きれいで句読点付きのテキストを取得する方法。 +og_title: Aspose と Hugging Face を使用した OCR の実行方法 – 完全ガイド +tags: +- OCR +- Aspose +- Hugging Face +- Python +title: Aspose と Hugging Face で OCR を実行する方法 – 完全ガイド +url: /ja/python/general/how-to-run-ocr-with-aspose-hugging-face-complete-guide/ +--- + +{{< blocks/products/pf/main-wrap-class >}} +{{< blocks/products/pf/main-container >}} +{{< blocks/products/pf/tutorial-page-section >}} + +# Aspose と Hugging Face で OCR を実行する方法 – 完全ガイド + +スキャンした文書の山から **OCR を実行** したいのに、設定に何時間も費やすのはもううんざりですか? 
多くのプロジェクトで、開発者は **スキャンからテキストを認識** したいのに、モデルのダウンロードや後処理でつまずきます。 + +朗報です:このチュートリアルでは、**Hugging Face のモデル** を使用し、モデルを自動で取得し、句読点を付加して人が書いたかのように出力する、すぐに使えるソリューションを紹介します。最後まで読めば、フォルダー内のすべての画像を処理し、各スキャンの横にクリーンな `.txt` ファイルを生成するスクリプトが手に入ります。 + +## 必要なもの + +- Python 3.8+(コードは f‑strings を使用しているため、古いバージョンは不可) +- `aspose-ocr` パッケージ(`pip install aspose-ocr` でインストール) +- 初回モデルダウンロードのためのインターネット接続 +- 画像スキャンが入ったフォルダー(`.png`, `.jpg`, または `.tif`) + +以上です—余計なバイナリや手動でのモデル設定は不要です。さっそく始めましょう。 + +![OCR 実行例](https://example.com/ocr-demo.png "OCR 実行例") + +## 手順 1: Aspose OCR クラスをインポートし環境を設定 + +まず Aspose OCR ライブラリから必要なクラスを取得します。最初にすべてインポートしておくと、スクリプトがすっきりし、依存関係の抜けもすぐに分かります。 + +```python +# Step 1: Import Aspose OCR classes +import os +from aspose.ocr import OcrEngine, AsposeAI, AsposeAIModelConfig +``` + +*重要ポイント*: `OcrEngine` が本体の処理を担い、`AsposeAI` が大規模言語モデルを組み込んで高度な後処理を可能にします。インポートを忘れるとコードはコンパイルすらできませんので、必ず記述してください。 + +## 手順 2: GPU 対応の Hugging Face モデルを設定 + +次に Aspose にモデルの取得先と、GPU 上で実行するレイヤー数を指示します。`allow_auto_download="true"` フラグが **モデルを自動でダウンロード** してくれます。 + +```python +# Step 2: Configure a GPU‑aware AI model (replace with your own model folder) +model_config = AsposeAIModelConfig( + allow_auto_download="true", + hugging_face_repo_id="Qwen/Qwen2.5-3B-Instruct-GGUF", + hugging_face_quantization="int8", + gpu_layers=40, # use GPU for faster inference + directory_model_path=r"YOUR_DIRECTORY/models" +) +``` + +> **プロのコツ**: GPU が無い場合は `gpu_layers=0` に設定してください。モデルは CPU にフォールバックし、遅くなりますが動作します。 + +### なぜ Hugging Face モデルを選ぶのか? + +Hugging Face にはすぐに使える LLM が大量に揃っています。`Qwen/Qwen2.5-3B-Instruct-GGUF` を指定すれば、コンパクトで指示に従うチューニング済みモデルが手に入り、句読点付加やスペース修正、軽微な OCR エラーの補正が可能です。これが実務で **use hugging face model** する本質です。 + +## 手順 3: AI エンジンを初期化し句読点後処理を有効化 + +AI エンジンはチャットだけのものではありません—ここでは *句読点付加* 機能を組み込み、OCR の生データをきれいにします。 + +```python +# Step 3: Initialise the AI engine and enable punctuation post‑processing +ai_engine = AsposeAI() +ai_engine.set_post_processor("punctuation_adder", {}) +``` + +*何が起きているか*? 
`set_post_processor` 呼び出しで組み込みの後処理器を登録します。OCR エンジンが完了した後に生文字列にカンマや句点、適切な大文字を挿入し、最終テキストをはるかに読みやすくします。 + +## 手順 4: OCR エンジンを作成し AI エンジンを結合 + +AI エンジンを OCR エンジンに接続すると、文字認識と結果の磨き上げを同時に行える単一オブジェクトが得られます。 + +```python +# Step 4: Create the OCR engine and attach the AI engine +ocr_engine = OcrEngine() +ocr_engine.set_ai_engine(ai_engine) +``` + +このステップを省くと OCR は動作しますが、句読点の強化が失われ、出力は単語が連続しただけのものになります。 + +## 手順 5: フォルダー内のすべての画像を処理 + +チュートリアルの核心です。各画像をループで回し、OCR を実行し、後処理を適用し、クリーンなテキストを同名の `.txt` ファイルに書き出します。 + +```python +# Step 5: Run OCR on each image in a folder, post‑process the result, and save the text +scans_folder = r"YOUR_DIRECTORY/scans" +for image_file in os.listdir(scans_folder): + # Filter only supported image types + if not image_file.lower().endswith(('.png', '.jpg', '.tif')): + continue + + image_path = os.path.join(scans_folder, image_file) + + # Recognise text from the image + ocr_result = ocr_engine.recognize(image_path) + + # Apply the punctuation post‑processor + ocr_result = ocr_engine.run_postprocessor(ocr_result) + + # Show a brief confidence summary + print(f"{image_file} – confidence {ocr_result.confidence:.2%}") + + # Save the cleaned text next to the source image + txt_path = image_path + ".txt" + with open(txt_path, "w", encoding="utf-8") as txt_file: + txt_file.write(ocr_result.text) +``` + +### 期待される結果 + +スクリプト実行時に次のような出力が表示されます: + +``` +invoice_001.png – confidence 96.73% +receipt_2024.tif – confidence 94.12% +``` + +各行は信頼度スコア(簡易ヘルスチェック)を示し、`invoice_001.png.txt`、`receipt_2024.tif.txt` など、句読点が付いた人間が読めるテキストファイルが生成されます。 + +### エッジケースとバリエーション + +- **非英語スキャン**: `hugging_face_repo_id` を多言語モデル(例: `microsoft/Multilingual-LLM-GGUF`)に変更してください。 +- **大量バッチ**: ループを `concurrent.futures.ThreadPoolExecutor` でラップして並列処理できますが、GPU メモリ上限に注意してください。 +- **カスタム後処理**: ドメイン固有のクリーンアップが必要な場合は `"punctuation_adder"` を独自スクリプトに置き換えます(例: 請求書番号の除去)。 + +## 手順 6: リソースをクリーンアップ + +ジョブが完了したらリソースを解放し、メモリリークを防ぎます。特に長時間稼働するサービス内で実行する場合は重要です。 + +```python +# Step 6: Release resources 
+ai_engine.free_resources() +ocr_engine.dispose() +``` + +このステップを怠ると GPU メモリが残り、次回の実行が失敗する原因になります。 + +## まとめ: OCR をエンドツーエンドで実行する方法 + +数行のコードで、フォルダー内のスキャンに対して **OCR を実行** し、**初回実行時に自動ダウンロードされる Hugging Face モデル** を使用し、**句読点付きでテキストを認識** できるようにしました。完成したスクリプトをコピーしてパスを調整すればすぐに実行可能です。 + +## 次のステップと関連トピック + +- **バッチ後処理**: `ocr_engine.run_batch_postprocessor` を使ってさらに高速な大量処理を検討してください。 +- **代替モデル**: OCR と併せて音声認識が必要なら `openai/whisper` 系列を試してみましょう。 +- **データベース連携**: 抽出したテキストを SQLite や Elasticsearch に保存し、検索可能なアーカイブを構築できます。 + +自由に実験してください—モデルを入れ替えたり、`gpu_layers` を調整したり、独自の後処理を追加したり。Aspose OCR と Hugging Face のモデルハブを組み合わせることで、あらゆる文書デジタル化プロジェクトの柔軟な基盤が手に入ります。 + +--- + +*Happy coding! If you hit a snag, drop a comment below or check the Aspose OCR docs for deeper configuration options.* + +{{< /blocks/products/pf/tutorial-page-section >}} +{{< /blocks/products/pf/main-container >}} +{{< /blocks/products/pf/main-wrap-class >}} +{{< blocks/products/products-backtop-button >}} \ No newline at end of file diff --git a/ocr/japanese/python/general/perform-ocr-on-image-with-python-complete-guide/_index.md b/ocr/japanese/python/general/perform-ocr-on-image-with-python-complete-guide/_index.md new file mode 100644 index 000000000..73f28fe2f --- /dev/null +++ b/ocr/japanese/python/general/perform-ocr-on-image-with-python-complete-guide/_index.md @@ -0,0 +1,205 @@ +--- +category: general +date: 2026-04-29 +description: Python を使用して画像の OCR を実行し、HuggingFace のモデルを自動ダウンロードし、OCR テキストをクリーンアップしながら + GPU メモリを効率的に解放する。 +draft: false +keywords: +- perform OCR on image +- download HuggingFace model python +- release GPU memory python +- clean OCR text python +language: ja +og_description: Pythonで画像のOCRを実行し、HuggingFaceモデルを自動でダウンロードし、テキストをクリーンアップし、GPUメモリを解放する方法を学びましょう。 +og_title: Pythonで画像のOCRを実行する – ステップバイステップガイド +tags: +- OCR +- Python +- Aspose +- HuggingFace +title: Pythonで画像のOCRを実行する – 完全ガイド +url: /ja/python/general/perform-ocr-on-image-with-python-complete-guide/ +--- + +{{< 
blocks/products/pf/main-wrap-class >}} +{{< blocks/products/pf/main-container >}} +{{< blocks/products/pf/tutorial-page-section >}} + +# Pythonで画像のOCRを実行する – 完全ガイド + +画像ファイルで **perform OCR on image** を実行しようとして、モデルのダウンロードや GPU メモリのクリーンアップ段階で行き詰まったことはありませんか? あなただけではありません—光学文字認識と大規模言語モデルを組み合わせようとしたとき、多くの開発者が同じ壁にぶつかります。 + +このチュートリアルでは、**downloads a HuggingFace model in Python** を行い、Aspose OCR を実行し、出力をクリーンアップし、最後に **releases GPU memory Python** を解放する、シングルでエンドツーエンドのソリューションを順に解説します。最後まで読むと、スキャンした PNG を洗練された検索可能なテキストに変換する、すぐに実行できるスクリプトが手に入ります。 + +> **What you’ll get:** 完全な実行可能コードサンプル、各ステップが重要な理由の解説、一般的な落とし穴を回避するためのヒント、そして自分のプロジェクト向けにパイプラインを調整する方法の一端。 + +--- + +## 必要なもの + +- Python 3.9 以上(例は 3.11 でテスト) +- `aspose-ocr` パッケージ(`pip install aspose-ocr` でインストール) +- **download HuggingFace model python** 手順のためのインターネット接続 +- 速度向上を望む場合の CUDA 対応 GPU(オプションだが推奨) + +追加のシステムレベルの依存関係は不要です;Aspose OCR エンジンが必要なものをすべてバンドルしています。 + +![画像で OCR を実行する例](image.png "Aspose OCR と LLM ポストプロセッサを使用した画像 OCR 実行例") + +*画像の代替テキスト: “perform OCR on image – Aspose OCR output before and after AI cleaning”* + +--- + +## 画像で OCR を実行する – ステップバイステップ概要 + +以下では、ワークフローを論理的なチャンクに分割します。各チャンクは独自の見出しを持つため、AI アシスタントは関心のある部分へすぐにジャンプでき、検索エンジンは関連キーワードをインデックスできます。 + +### 1. 
Pythonで HuggingFace モデルをダウンロード + +最初に行うべきことは、OCR の生データのポストプロセッサとして機能する言語モデルを取得することです。Aspose OCR には `AsposeAI` というヘルパークラスが同梱されており、HuggingFace ハブからモデルを自動的に取得できます。 + +```python +import aspose.ocr as aocr +from aspose.ocr import AsposeAI, AsposeAIModelConfig, OcrEngine + +# Configure the model – it will auto‑download the first time you run it +model_config = AsposeAIModelConfig( + allow_auto_download="true", # <-- enables auto‑download + hugging_face_repo_id="Qwen/Qwen2.5-3B-Instruct-GGUF", + hugging_face_quantization="int8", # smaller footprint on GPU + gpu_layers=20, # how many layers stay on GPU + directory_model_path=r"YOUR_DIRECTORY" # where the model files live +) +``` + +**Why this matters:** +- **download HuggingFace model python** – zip ファイルやトークン認証を手動で処理する必要がなくなります。 +- `int8` 量子化を使用すると、モデルサイズが元の約 1/4 に縮小され、後で **release GPU memory python** が必要になる際に重要です。 + +> **Pro tip:** `directory_model_path` を SSD に置くとロード時間が速くなります。 + +--- + +### 2. AI ヘルパーを初期化し、スペルチェックを有効化 + +ここで `AsposeAI` インスタンスを作成し、スペル補正ポストプロセッサを添付します。ここから **clean OCR text python** の魔法が始まります。 + +```python +# Initialise the AI helper +ai_helper = AsposeAI() +ai_helper.set_post_processor( + processor="spell_corrector", + custom_settings={"max_edits": 2} # allows up to two character edits per word +) +``` + +**Explanation:** +スペル補正は OCR エンジンからの各トークンを検査し、`max_edits` で制限された編集案を提示します。この小さな調整だけで、重い言語モデルを使わずに “rec0gn1tion” を “recognition” に変換できます。 + +--- + +### 3. AI ヘルパーを OCR エンジンにフック + +Aspose はバージョン 23.4 で新しいメソッドを導入し、AI エンジンを OCR パイプラインに直接組み込めるようにしました。 + +```python +# Initialise the OCR engine and attach the AI helper +ocr_engine = OcrEngine() +ocr_engine.set_ai_engine(ai_helper) # new in v23.4 +``` + +**Why we do it:** +AI ヘルパーを早期に接続することで、OCR エンジンは必要に応じてモデルをオンザフライでの改善(例:レイアウト検出)に使用できます。また、コードがすっきりし、後で別個のポストプロセッシングループを作成する必要がなくなります。 + +--- + +### 4. 
スキャン画像で OCR を実行 + +ここが、画像ファイルに対して実際に OCR を実行する(**perform OCR on image**)核心ステップです。`YOUR_DIRECTORY/input.png` を自分のスキャン画像のパスに置き換えてください。 + +```python +image_path = r"YOUR_DIRECTORY/input.png" +ocr_result = ocr_engine.recognize(image_path) + +print("Raw OCR text:") +print(ocr_result.text) +``` + +典型的な生出力には、奇妙な位置での改行、誤認識文字、または余分な記号が含まれることがあります。そこで次のステップが必要になります。 + +**期待される生出力(例):** + +``` +Th1s 1s 4n ex4mpl3 0f r4w OCR t3xt. +It c0ntains numb3rs 123 and s0me m1stakes. +``` + +--- + +### 5. AI ポストプロセッサで Python の OCR テキストをクリーンアップ + +ここで AI に乱れたテキストをクリーンアップさせます。これが **clean OCR text python** プロセスの核心です。 + +```python +cleaned_result = ocr_engine.run_postprocessor(ocr_result) + +print("\nAI‑enhanced text:") +print(cleaned_result.text) +``` + +**結果:** + +``` +This is an example of raw OCR text. +It contains numbers 123 and some mistakes. +``` + +スペル補正が “Th1s” を “This” に、“4n” を “an” に修正したことに注目してください。モデルはスペースも正規化するため、テキストを下流の NLP パイプラインに渡す際によく起きる問題も避けられます。 + +--- + +### 6. Pythonで GPU メモリを解放 – クリーンアップ手順 + +作業が完了したら、特に長時間稼働するサービスで複数の OCR ジョブを実行している場合は、GPU リソースを解放することがベストプラクティスです。 + +```python +# Release resources – crucial for GPU memory +ai_helper.free_resources() +ocr_engine.dispose() +``` + +**内部で何が起きるか:** +`free_resources()` はモデルを GPU からアンロードし、メモリを CUDA ドライバに返します。`dispose()` は OCR エンジンの内部バッファをシャットダウンします。これらの呼び出しを省略すると、数枚の画像だけでメモリ不足エラーが発生する可能性があります。 + +> **Remember:** ループでバッチ処理を行う場合、各バッチの後にクリーンアップを呼び出すか、最後まで解放せずに同じ `ai_helper` を再利用してください。 + +--- + +## ボーナス: シナリオ別パイプライン調整 + +### モデル量子化の調整 + +強力な GPU(例: RTX 4090)を持ち、より高い精度を求める場合は、`hugging_face_quantization` を `"fp16"` に変更し、`gpu_layers` を `30` に増やします。これによりメモリ使用量が増えるため、各バッチ後に **release GPU memory python** をより積極的に行う必要があります。 + +### カスタムスペルチェッカーの使用 + +組み込みの `spell_corrector` を、ドメイン固有の補正(例: 医療用語)を行うカスタムポストプロセッサに置き換えることができます。必要なインターフェースを実装し、その名前を `set_post_processor` に渡すだけです。 + +### 複数画像のバッチ処理 + +OCR 手順を `for` ループで囲み、`cleaned_result.text` をリストに収集し、GPU RAM が十分であればループ後にのみ `ai_helper.free_resources()` を呼び出します。これにより、モデルを繰り返しロードするオーバーヘッドが削減されます。 + +--- + 
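
The batch idea described above (reuse one loaded model across the loop and release it once at the end) can be sketched as below. This is only an outline under assumptions: `run_batch` is a hypothetical helper, and `recognize` and `cleanup` are placeholders for whatever recognition and resource-release calls your pipeline actually uses. The `try`/`finally` ensures cleanup runs exactly once even if one image fails.

```python
from typing import Callable, Dict, Iterable


def run_batch(
    image_paths: Iterable[str],
    recognize: Callable[[str], str],
    cleanup: Callable[[], None],
) -> Dict[str, str]:
    """Recognize every image with shared resources, then clean up once."""
    results: Dict[str, str] = {}
    try:
        for path in image_paths:
            # The same engine/model is reused for every image in the batch.
            results[path] = recognize(path)
    finally:
        # Runs once, whether the loop finished or an image raised an error.
        cleanup()
    return results


if __name__ == "__main__":
    cleanup_calls = []

    def fake_recognize(path: str) -> str:
        """Stand-in for OCR plus post-processing on one image."""
        return f"cleaned text of {path}"

    def fake_cleanup() -> None:
        """Stand-in for releasing the model and the engine."""
        cleanup_calls.append("released")

    texts = run_batch(["a.png", "b.png"], fake_recognize, fake_cleanup)
    print(texts)
    print("cleanup calls:", len(cleanup_calls))
```

Releasing once per batch trades a longer-lived memory footprint for fewer model reloads; on a GPU with little headroom, releasing after each smaller batch may be the safer choice.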
+## 結論 + +ここでは、Python で **perform OCR on image** ファイルを実行し、**download a HuggingFace model** を自動化し、**clean OCR text** を行い、完了時に安全に **release GPU memory** する方法を示しました。完全なスクリプトはコピー&ペースト可能で、解説により大規模プロジェクトへの適用自信が得られます。 + +次のステップは? Qwen 2.5 モデルをより大きな LLaMA バリアントに置き換えてみたり、異なるポストプロセッサを試したり、クリーンアップされた出力を検索可能な Elasticsearch インデックスに統合したりしてください。可能性は無限で、今や堅実な基盤が整いました。 + +コーディングを楽しんで、OCR パイプラインが常にクリーンでメモリに優しいものになりますように! + +{{< /blocks/products/pf/tutorial-page-section >}} +{{< /blocks/products/pf/main-container >}} +{{< /blocks/products/pf/main-wrap-class >}} +{{< blocks/products/products-backtop-button >}} \ No newline at end of file diff --git a/ocr/korean/python/general/extract-text-from-pdf-ocr-pdf-with-python/_index.md b/ocr/korean/python/general/extract-text-from-pdf-ocr-pdf-with-python/_index.md new file mode 100644 index 000000000..a56314b08 --- /dev/null +++ b/ocr/korean/python/general/extract-text-from-pdf-ocr-pdf-with-python/_index.md @@ -0,0 +1,217 @@ +--- +category: general +date: 2026-04-29 +description: Python에서 Aspose OCR을 사용하여 PDF에서 텍스트를 추출합니다. 배치 OCR PDF 처리 방법을 배우고, 스캔된 + PDF 텍스트를 변환하며, 신뢰도가 낮은 페이지를 처리합니다. +draft: false +keywords: +- extract text from pdf +- ocr pdf with python +- convert scanned pdf text +- batch ocr pdf processing +language: ko +og_description: Python에서 Aspose OCR을 사용하여 PDF에서 텍스트를 추출합니다. 이 가이드는 배치 OCR PDF 처리, + 스캔된 PDF 텍스트 변환 및 낮은 신뢰도 결과 처리를 보여줍니다. +og_title: PDF에서 텍스트 추출 – Python으로 PDF OCR +tags: +- OCR +- Python +- PDF processing +title: PDF에서 텍스트 추출 – Python으로 PDF OCR +url: /ko/python/general/extract-text-from-pdf-ocr-pdf-with-python/ +--- + +{{< blocks/products/pf/main-wrap-class >}} +{{< blocks/products/pf/main-container >}} +{{< blocks/products/pf/tutorial-page-section >}} + +# PDF에서 텍스트 추출 – Python으로 OCR PDF + +PDF에서 **텍스트를 추출**해야 하는데 파일이 스캔된 이미지일 때가 있나요? 혼자가 아닙니다—많은 개발자들이 PDF를 검색 가능한 데이터로 변환하려다 이 문제에 부딪힙니다. 좋은 소식은? Aspose OCR for Python을 사용하면 몇 줄의 코드로 스캔된 PDF 텍스트를 변환할 수 있고, 파일이 수십 개일 때는 **배치 OCR PDF 처리**도 실행할 수 있습니다. 
+ +이 튜토리얼에서는 전체 워크플로우를 단계별로 살펴보겠습니다: 라이브러리 설정, 단일 PDF에 대한 OCR 실행, 배치로 확장, 그리고 낮은 신뢰도 페이지를 처리하여 언제 수동 검토가 필요한지 알 수 있도록 합니다. 마지막까지 진행하면 스캔된 PDF에서 텍스트를 추출하는 실행 가능한 스크립트를 얻을 수 있으며, 각 단계의 이유도 이해하게 됩니다. + +## 필요 사항 + +시작하기 전에 다음을 준비하세요: + +- Python 3.8 이상 (코드가 f‑strings를 사용하므로 3.6+에서도 동작하지만, 3.8+을 권장합니다) +- Aspose OCR for Python 라이선스 또는 무료 체험 키 (Aspose 웹사이트에서 얻을 수 있습니다) +- 처리하려는 하나 이상의 스캔된 PDF가 들어 있는 폴더 +- 생성된 *.txt* 보고서를 저장할 충분한 디스크 공간 + +그게 전부입니다—무거운 외부 종속성도 없고, OpenCV 같은 복잡한 설정도 필요 없습니다. Aspose OCR 엔진이 무거운 작업을 대신 수행합니다. + +## 환경 설정 + +먼저 PyPI에서 Aspose OCR 패키지를 설치합니다: + +```bash +pip install aspose-ocr +``` + +라이선스 파일(`Aspose.OCR.lic`)이 있다면 프로젝트 루트에 두고 다음과 같이 활성화합니다: + +```python +# Activate Aspose OCR license (optional but removes trial watermark) +from aspose.ocr import License + +license = License() +license.set_license("Aspose.OCR.lic") +``` + +> **Pro tip:** 라이선스 파일을 버전 관리에서 제외하세요; `.gitignore`에 추가하여 실수로 노출되는 것을 방지합니다. + +## 단일 PDF에서 OCR 수행 + +이제 단일 스캔된 PDF에서 텍스트를 추출해 보겠습니다. 핵심 단계는 다음과 같습니다: + +1. `OcrEngine` 인스턴스를 생성합니다. +2. PDF 파일을 지정합니다. +3. 각 페이지에 대해 `OcrResult`를 가져옵니다. +4. 평문 텍스트 출력을 디스크에 저장합니다. +5. 엔진을 해제하여 네이티브 리소스를 해제합니다. 
+ +전체 실행 가능한 스크립트는 다음과 같습니다: + +```python +# Step 1: Import the OCR engine and create an instance +from aspose.ocr import OcrEngine + +# Optional: activate license (see previous section) +# from aspose.ocr import License +# License().set_license("Aspose.OCR.lic") + +ocr_engine = OcrEngine() + +# Step 2: Specify the PDF file to be processed +pdf_file_path = r"YOUR_DIRECTORY/input.pdf" + +# Step 3: Extract OCR results – one OcrResult object per page +ocr_pages = ocr_engine.extract_from_pdf(pdf_file_path) + +# Step 4: Iterate through the results, show confidence, flag low‑confidence pages, and save the plain text +for ocr_page in ocr_pages: + print(f"Page {ocr_page.page_number}: confidence {ocr_page.confidence:.2%}") + + # If confidence is below 80 %, flag it for manual review + if ocr_page.confidence < 0.80: + print(" Low confidence – consider manual review") + + # Build the output filename + output_txt = f"YOUR_DIRECTORY/Report_page{ocr_page.page_number}.txt" + with open(output_txt, "w", encoding="utf-8") as f: + f.write(ocr_page.text) + +# Step 5: Release resources used by the engine +ocr_engine.dispose() +``` + +**예상 출력:** 각 페이지마다 스크립트가 `Page 1: confidence 97.45%`와 같은 정보를 출력합니다. 페이지 신뢰도가 80 % 이하이면 경고가 표시되어 OCR이 문자를 놓쳤을 가능성을 알려줍니다. + +### 왜 이렇게 동작하나요 + +- **`OcrEngine`** 은 네이티브 Aspose OCR 라이브러리의 진입점으로, 이미지 전처리부터 문자 인식까지 모든 작업을 처리합니다. +- **`extract_from_pdf`** 은 각 PDF 페이지를 자동으로 래스터화하므로 직접 이미지를 변환할 필요가 없습니다. +- **Confidence scores** 를 활용하면 품질 검사를 자동화할 수 있습니다—법률 문서나 의료 문서처럼 정확도가 중요한 경우에 필수적입니다. + +## Python으로 배치 OCR PDF 처리 + +실제 프로젝트에서는 파일이 하나 이상인 경우가 대부분입니다. 단일 파일 스크립트를 **배치 OCR PDF 처리** 파이프라인으로 확장해 보겠습니다. 이 파이프라인은 디렉터리를 순회하면서 각 PDF를 처리하고 결과를 일치하는 하위 폴더에 저장합니다. 
+ +```python +import os +from pathlib import Path +from aspose.ocr import OcrEngine + +def ocr_pdf_file(pdf_path: Path, output_dir: Path, engine: OcrEngine, confidence_threshold: float = 0.80): + """Extract text from a single PDF and write per‑page .txt files.""" + ocr_pages = engine.extract_from_pdf(str(pdf_path)) + for page in ocr_pages: + print(f"{pdf_path.name} – Page {page.page_number}: {page.confidence:.2%}") + if page.confidence < confidence_threshold: + print(" ⚠️ Low confidence – manual review may be needed") + + out_file = output_dir / f"{pdf_path.stem}_page{page.page_number}.txt" + out_file.write_text(page.text, encoding="utf-8") + +def batch_ocr_pdf(input_folder: str, output_folder: str): + """Iterate over all PDFs in input_folder and run OCR.""" + input_path = Path(input_folder) + output_path = Path(output_folder) + output_path.mkdir(parents=True, exist_ok=True) + + pdf_files = list(input_path.glob("*.pdf")) + if not pdf_files: + print("🚫 No PDF files found in the specified folder.") + return + + # Create the engine only after the empty-folder check, so the early return cannot leak it + engine = OcrEngine() + + for pdf_file in pdf_files: + # Create a sub‑folder for each PDF to keep pages organized + pdf_out_dir = output_path / pdf_file.stem + pdf_out_dir.mkdir(exist_ok=True) + ocr_pdf_file(pdf_file, pdf_out_dir, engine) + + # Clean up native resources + engine.dispose() + print("✅ Batch OCR completed.") + +# Example usage: +if __name__ == "__main__": + batch_ocr_pdf("YOUR_DIRECTORY/input_pdfs", "YOUR_DIRECTORY/ocr_output") +``` + +#### 이점 + +- **Scalability:** 함수가 폴더를 한 번만 순회하면서 각 PDF마다 전용 출력 하위 폴더를 생성합니다. 수십 개의 문서가 있을 때도 정리가 깔끔합니다. +- **Reusability:** `ocr_pdf_file` 은 독립적인 함수이므로 다른 스크립트(예: 웹 서비스)에서도 호출할 수 있습니다. +- **Error handling:** 입력 폴더에 PDF가 없으면 안내 메시지를 출력하고 종료하므로, 아무 출력 없이 실패하는 상황을 방지합니다. + +## 스캔된 PDF 텍스트 변환 – 엣지 케이스 처리 + +위 코드는 대부분의 PDF에서 잘 동작하지만, 몇 가지 특수 상황에 부딪힐 수 있습니다: + +| Situation | Why It Happens | How to Mitigate | +|-----------|----------------|-----------------| +| **Encrypted PDFs** | PDF가 비밀번호로 보호되어 있습니다. 
| `extract_from_pdf(pdf_path, password="yourPwd")` 로 비밀번호를 전달합니다. | +| **Multi‑language documents** | Aspose OCR 기본값이 영어이기 때문입니다. | `ocr_engine.language = "spa"` 로 스페인어를 지정하거나, 혼합 언어의 경우 리스트를 제공합니다. | +| **Very large PDFs (>500 pages)** | 각 페이지가 RAM에 로드되면서 메모리 사용량이 급증합니다. | `engine.extract_from_pdf(pdf_path, start_page=1, end_page=100)` 와 같이 청크 단위로 처리하고 루프를 돌립니다. | +| **Poor scan quality** | 낮은 DPI 또는 노이즈가 많아 신뢰도가 떨어집니다. | `engine.image_preprocessing = True` 로 전처리를 활성화하거나 `engine.dpi = 300` 으로 DPI를 높입니다. | + +> **Watch out:** 이미지 전처리를 켜면 CPU 사용 시간이 눈에 띄게 증가할 수 있습니다. 야간 배치를 실행한다면 충분한 시간을 예약하거나 별도의 워커를 띄워 운영하세요. + +## 출력 확인 + +스크립트가 완료되면 다음과 같은 폴더 구조를 확인할 수 있습니다: + +``` +ocr_output/ +├─ invoice_2023/ +│ ├─ invoice_2023_page1.txt +│ ├─ invoice_2023_page2.txt +│ └─ … +└─ contract_A/ + ├─ contract_A_page1.txt + └─ … +``` + +`.txt` 파일을 열면 원본 스캔 내용과 일치하는 깨끗한 UTF‑8 인코딩 텍스트가 표시됩니다. 문자 깨짐이 보이면 PDF의 언어 설정을 다시 확인하고, 머신에 올바른 폰트 팩이 설치되어 있는지 점검하세요. + +## 리소스 정리 + +Aspose OCR은 네이티브 DLL에 의존하므로 작업이 끝난 뒤 `engine.dispose()` 를 호출해 주는 것이 중요합니다. 이 단계를 놓치면 특히 장시간 실행되는 배치 작업에서 메모리 누수가 발생할 수 있습니다. + +```python +# Always the last line of your script +engine.dispose() +``` + +## 전체 엔드‑투‑엔드 예제 + +모든 내용을 하나로 합치면, 다음과 같은 단일 + +{{< /blocks/products/pf/tutorial-page-section >}} +{{< /blocks/products/pf/main-container >}} +{{< /blocks/products/pf/main-wrap-class >}} +{{< blocks/products/products-backtop-button >}} \ No newline at end of file diff --git a/ocr/korean/python/general/how-to-recognize-handwriting-in-python-full-tutorial/_index.md b/ocr/korean/python/general/how-to-recognize-handwriting-in-python-full-tutorial/_index.md new file mode 100644 index 000000000..e23622544 --- /dev/null +++ b/ocr/korean/python/general/how-to-recognize-handwriting-in-python-full-tutorial/_index.md @@ -0,0 +1,277 @@ +--- +category: general +date: 2026-04-29 +description: Aspose OCR을 사용하여 Python에서 손글씨를 인식하는 방법을 배워보세요. 이 단계별 가이드는 손글씨 텍스트를 효율적으로 + 추출하는 방법을 보여줍니다. 
+draft: false +keywords: +- how to recognize handwriting +- extract handwritten text +- handwritten text recognition python +- handwritten ocr tutorial python +language: ko +og_description: Python에서 손글씨를 인식하는 방법은? Aspose OCR을 사용해 손글씨 텍스트를 추출하는 완전 가이드를 따라가세요. + 코드, 팁, 그리고 엣지 케이스 처리까지 포함됩니다. +og_title: Python에서 손글씨 인식하는 방법 – 전체 튜토리얼 +tags: +- OCR +- Python +- HandwritingRecognition +title: Python에서 손글씨 인식하는 방법 – 전체 튜토리얼 +url: /ko/python/general/how-to-recognize-handwriting-in-python-full-tutorial/ +--- + +{{< blocks/products/pf/main-wrap-class >}} +{{< blocks/products/pf/main-container >}} +{{< blocks/products/pf/tutorial-page-section >}} + +# 파이썬에서 손글씨 인식하기 – 전체 튜토리얼 + +파이썬 프로젝트에서 **손글씨를 인식하는 방법**이 필요했지만 어디서 시작해야 할지 몰랐던 적이 있나요? 당신만 그런 것이 아닙니다—개발자들은 “스캔한 메모에서 텍스트를 추출할 수 있을까?” 라는 질문을 자주 합니다. 좋은 소식은 최신 OCR 라이브러리 덕분에 이 작업이 아주 쉬워졌다는 것입니다. 이 가이드에서는 Aspose OCR을 사용해 **손글씨를 인식하는 방법**을 단계별로 살펴보고, **손글씨 텍스트를 추출**하는 방법도 배울 수 있습니다. + +설치부터 흐릿한 필기체에 대한 신뢰도 임계값 조정까지 모두 다룹니다. 최종적으로 추출된 텍스트와 전체 신뢰도 점수를 출력하는 실행 가능한 스크립트를 얻을 수 있으니, 메모 앱, 아카이브 도구, 혹은 단순히 호기심을 만족시키는 데도 안성맞춤입니다. OCR 경험이 없어도 괜찮습니다; 기본적인 파이썬 지식만 있으면 됩니다. + +--- + +## 준비물 + +- **Python 3.9+** (최신 안정 버전 권장) +- **Aspose.OCR for Python via .NET** – `pip install aspose-ocr` 로 설치 +- 처리하고 싶은 **손글씨 이미지** (JPEG/PNG) +- 선택 사항: 의존성을 깔끔하게 관리할 수 있는 가상 환경 + +위 항목들을 모두 준비했다면, 바로 시작해봅시다. + +![How to recognize handwriting example](/images/handwritten-sample.jpg "How to recognize handwriting example") + +*(Alt text: “how to recognize handwriting example showing a scanned handwritten note”)* + +--- + +## Step 1 – Install and Import Aspose OCR Classes + +먼저 OCR 엔진 자체를 가져와야 합니다. Aspose는 인쇄 텍스트 인식과 손글씨 모드를 구분하는 깔끔한 API를 제공합니다. 
+
+```python
+# Install the package (run this in your terminal if you haven't already)
+# pip install aspose-ocr
+
+# Import the required OCR classes
+from aspose.ocr import OcrEngine, HandwritingMode
+```
+
+*Why it matters:* Importing `HandwritingMode` lets us tell the engine that we are recognizing handwritten rather than printed text, which improves accuracy on cursive strokes.
+
+---
+
+## Step 2 – Create and Configure the OCR Engine
+
+Now create an `OcrEngine` instance and switch it to handwriting mode. You can also adjust the confidence threshold: lower values tolerate shaky handwriting, while higher values demand cleaner input.
+
+```python
+# Step 2: Initialize the OCR engine for handwritten text
+ocr_engine = OcrEngine()
+ocr_engine.set_recognition_mode(HandwritingMode.HANDWRITTEN)
+
+# Optional: tighten the confidence threshold for cursive scripts
+# Default is 0.5; 0.65 works well for most notebooks
+ocr_engine.set_handwriting_confidence_threshold(0.65)
+```
+
+*Pro tip:* Scans at 300 DPI or higher usually score better. For low-resolution images, consider upscaling with Pillow before passing them to the engine.
+
+---
+
+## Step 3 – Prepare the Image Path
+
+Make sure the path to the image you want to process is correct. Relative paths work, but an absolute path avoids "file not found" errors.
+
+```python
+# Step 3: Point to your handwritten image
+input_image_path = "YOUR_DIRECTORY/handwritten_sample.jpg"
+```
+
+*Common pitfall:* Unescaped backslashes in a Windows path cause errors. Either escape them (`C:\\folder\\image.jpg`) or use a raw string (`r"C:\folder\image.jpg"`).
+
+---
+
+## Step 4 – Run the Recognition and Capture Results
+
+The `recognize` method does the core work. The returned object exposes `.text` and `.confidence` attributes.
+
+```python
+# Step 4: Run OCR and get the result
+result = ocr_engine.recognize(input_image_path)
+
+# Display the extracted text and confidence
+print("Hand‑written extraction:")
+print(result.text)
+print("Overall confidence:", result.confidence)
+```
+
+**Expected output (example):**
+
+```
+Hand‑written extraction:
+Meeting notes:
+- Discuss quarterly goals
+- Review budget allocations
+- Assign action items
+Overall confidence: 0.78
+```
+
+If the confidence falls to 0.5 or below, you may need to clean up the image (remove shadows, increase contrast) or lower the threshold in Step 2.
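
The confidence check from Step 4 can be sketched as a small helper. This is a minimal sketch rather than part of the Aspose API: `Recognition` is a hypothetical stand-in for the result object (only its `.text` and `.confidence` attributes are assumed), `needs_cleanup` and `triage` are illustrative names, and the `0.5` cut-off is simply the value mentioned in the text.

```python
from dataclasses import dataclass


@dataclass
class Recognition:
    """Hypothetical stand-in for the Step 4 result (only .text and .confidence)."""
    text: str
    confidence: float


def needs_cleanup(result, minimum: float = 0.5) -> bool:
    """Return True when the overall confidence is at or below `minimum`."""
    return result.confidence <= minimum


def triage(result, minimum: float = 0.5) -> str:
    """Describe what to do with a recognition result."""
    if needs_cleanup(result, minimum):
        return "retry: clean up the image or lower the Step 2 threshold"
    return "accept"


if __name__ == "__main__":
    sample = Recognition(text="Meeting notes", confidence=0.78)
    print(triage(sample))
```

Keeping the decision in one function makes it easy to change the cut-off later without touching the recognition code.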
+
+---
+
+## Step 5 – Clean Up Resources
+
+Aspose OCR holds native resources, so call `dispose()` to release them and prevent memory leaks. This matters most when you process many images in a loop.
+
+```python
+# Step 5: Release the engine resources
+ocr_engine.dispose()
+```
+
+*Why dispose?* In a long-running service, such as a Flask API that accepts uploads, unreleased resources can quickly exhaust system memory.
+
+---
+
+## Full Script – One‑Click Run
+
+Below is the complete script with all the pieces combined. Copy, paste, and run.
+
+```python
+# -*- coding: utf-8 -*-
+"""
+Handwritten OCR Tutorial – how to recognize handwriting in Python
+Author: Your Name
+Date: 2026-04-29
+"""
+
+# Install the library first if you haven't already:
+# pip install aspose-ocr
+
+from aspose.ocr import OcrEngine, HandwritingMode
+
+def recognize_handwriting(image_path: str, confidence_threshold: float = 0.65):
+    """
+    Recognizes handwritten text from an image using Aspose OCR.
+
+    Args:
+        image_path (str): Path to the handwritten image file.
+        confidence_threshold (float): Minimum confidence to accept a word.
+
+    Returns:
+        tuple: (extracted_text (str), overall_confidence (float))
+    """
+    # Initialize engine in handwritten mode
+    engine = OcrEngine()
+    try:
+        engine.set_recognition_mode(HandwritingMode.HANDWRITTEN)
+        engine.set_handwriting_confidence_threshold(confidence_threshold)
+
+        # Perform recognition and capture the output
+        result = engine.recognize(image_path)
+        return result.text, result.confidence
+    finally:
+        # Clean up, even if recognition raised an error
+        engine.dispose()
+
+if __name__ == "__main__":
+    # Replace with your actual file location
+    img_path = "YOUR_DIRECTORY/handwritten_sample.jpg"
+
+    text, conf = recognize_handwriting(img_path)
+
+    print("Hand‑written extraction:")
+    print(text)
+    print("Overall confidence:", conf)
+```
+
+Save it as `handwritten_ocr.py` and run `python handwritten_ocr.py`. If everything is set up correctly, the extracted text is printed to the console.
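
To run the script over a whole folder of scans, the function can be wrapped in a loop. The following is a minimal sketch under stated assumptions: `recognize` is any callable that takes a path and returns `(text, confidence)`, the same shape as `recognize_handwriting` above. `recognize_folder`, its parameters, and the `0.5` cut-off are illustrative names and values, not Aspose API.

```python
from pathlib import Path
from typing import Callable, Dict, Tuple


def recognize_folder(
    folder: str,
    recognize: Callable[[str], Tuple[str, float]],
    minimum: float = 0.5,
    patterns: Tuple[str, ...] = ("*.jpg", "*.jpeg", "*.png"),
) -> Dict[str, str]:
    """Run `recognize` on every matching image and keep confident results.

    Returns a mapping of file name to extracted text for results whose
    confidence is at least `minimum`; everything else is only logged.
    """
    accepted: Dict[str, str] = {}
    for pattern in patterns:
        for image in sorted(Path(folder).glob(pattern)):
            text, confidence = recognize(str(image))
            # Log every file so low-confidence scans can be reviewed later
            print(f"{image.name}: confidence {confidence:.2f}")
            if confidence >= minimum:
                accepted[image.name] = text
    return accepted
```

With the script above saved in the same module, a call could look like `recognize_folder("YOUR_DIRECTORY/scans", recognize_handwriting)`. Passing the recognizer in as a parameter also makes the loop easy to test with a fake function.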
+
+---
+
+## Handling Edge Cases and Common Variations
+
+### Low‑Contrast Images
+If the ink blends into the background, boost the contrast first:
+
+```python
+from PIL import Image, ImageEnhance
+
+img = Image.open(input_image_path)
+enhancer = ImageEnhance.Contrast(img)
+high_contrast = enhancer.enhance(2.0)  # 2× contrast
+high_contrast.save("temp_high_contrast.jpg")
+result = ocr_engine.recognize("temp_high_contrast.jpg")
+```
+
+### Rotated Notes
+A skewed notebook page can lower the recognition rate. Deskew the image with OpenCV before running OCR:
+
+```python
+import cv2
+import numpy as np
+
+def deskew(image_path):
+    # Load as grayscale and binarize so that ink pixels are non-zero
+    img = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
+    _, ink = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY_INV | cv2.THRESH_OTSU)
+    # minAreaRect expects int32 or float32 points
+    coords = np.column_stack(np.where(ink > 0)).astype(np.float32)
+    angle = cv2.minAreaRect(coords)[-1]
+    # The angle convention differs between OpenCV releases; verify the sign on a sample scan
+    if angle < -45:
+        angle = -(90 + angle)
+    else:
+        angle = -angle
+    (h, w) = img.shape[:2]
+    M = cv2.getRotationMatrix2D((w // 2, h // 2), angle, 1.0)
+    corrected = cv2.warpAffine(img, M, (w, h), flags=cv2.INTER_CUBIC, borderMode=cv2.BORDER_REPLICATE)
+    cv2.imwrite("deskewed.jpg", corrected)
+
+deskew(input_image_path)
+result = ocr_engine.recognize("deskewed.jpg")
+```
+
+### Multi‑Page PDFs
+Aspose OCR can also handle PDF pages, but each page must first be converted to an image (for example with `pdf2image`). After saving the pages as image files, apply the same `recognize_handwriting` function to each one.
+
+---
+
+## Pro Tips for Better **Handwritten Text Extraction** Results
+
+- **DPI matters:** Aim for 300 DPI or higher when scanning.
+- **Avoid colored backgrounds:** Pure white or light gray gives the cleanest results.
+- **Batch processing:** Wrap the function in a `for` loop and log the confidence of each page; discard results below a chosen threshold to maintain quality.
+- **Language support:** Aspose OCR supports multiple languages. To optimize for English only, set `engine.set_language("en")`.
+
+---
+
+## Frequently Asked Questions
+
+**Does this work on Linux?**
+Yes. Aspose OCR ships native binaries for Windows, macOS, and Linux. Installing the pip package is all you need.
+
+**What if the handwriting is heavily connected cursive?**
+Try lowering the confidence threshold (to `0.5` or even `0.4`). This can let more noise through, so apply post-processing such as a spell check to the output if needed.
+
+**Can I use this in a web service?**
+Certainly. The `recognize_handwriting` function keeps no state, so it fits well in a Flask or FastAPI endpoint.
각 요청 후 `dispose()` 를 호출하거나 컨텍스트 매니저를 사용해 리소스를 자동 해제하세요. + +--- + +## Conclusion + +파이썬에서 **손글씨를 인식하는 방법**을 처음부터 끝까지 살펴보았습니다. 이제 **손글씨 텍스트를 추출**하고, 신뢰도 설정을 조정하며, 저대비 이미지나 회전된 페이지와 같은 일반적인 문제들을 해결하는 방법을 알게 되었습니다. 위의 완전한 스크립트는 바로 실행할 수 있으며, 모듈화된 함수 덕분에 노트‑테이킹 앱, 아카이브 디지털화, 혹은 **handwritten ocr tutorial python** 실험 등 다양한 프로젝트에 손쉽게 통합할 수 있습니다. + +다음 단계로는 다국어 손글씨 인식이나 OCR 결과를 자연어 처리와 결합해 회의록을 자동 요약하는 작업을 시도해볼 수 있습니다. 가능성은 무한합니다—코드로 필기를 살아 움직이게 해보세요. + +행복한 코딩 되시고, 질문이 있으면 댓글로 남겨 주세요! + +{{< /blocks/products/pf/tutorial-page-section >}} +{{< /blocks/products/pf/main-container >}} +{{< /blocks/products/pf/main-wrap-class >}} +{{< blocks/products/products-backtop-button >}} \ No newline at end of file diff --git a/ocr/korean/python/general/how-to-run-ocr-with-aspose-hugging-face-complete-guide/_index.md b/ocr/korean/python/general/how-to-run-ocr-with-aspose-hugging-face-complete-guide/_index.md new file mode 100644 index 000000000..85a6fa21b --- /dev/null +++ b/ocr/korean/python/general/how-to-run-ocr-with-aspose-hugging-face-complete-guide/_index.md @@ -0,0 +1,180 @@ +--- +category: general +date: 2026-04-29 +description: 스캔에 OCR을 적용하고, Hugging Face 모델을 자동으로 사용하며, Aspose OCR을 이용해 스캔에서 텍스트를 + 몇 분 만에 인식하는 방법을 배워보세요. +draft: false +keywords: +- how to run OCR +- use hugging face model +- recognize text from scans +- download model automatically +language: ko +og_description: Aspose OCR을 사용하여 스캔에 OCR을 실행하고, Hugging Face 모델을 자동으로 다운로드하며, 깔끔하고 + 구두점이 포함된 텍스트를 얻는 방법. +og_title: Aspose와 Hugging Face를 사용한 OCR 실행 방법 – 완전 가이드 +tags: +- OCR +- Aspose +- Hugging Face +- Python +title: Aspose와 Hugging Face로 OCR 실행하기 – 완전 가이드 +url: /ko/python/general/how-to-run-ocr-with-aspose-hugging-face-complete-guide/ +--- + +{{< blocks/products/pf/main-wrap-class >}} +{{< blocks/products/pf/main-container >}} +{{< blocks/products/pf/tutorial-page-section >}} + +# Aspose & Hugging Face로 OCR 실행하기 – 완전 가이드 + +스캔한 문서가 한 무더기 쌓여 있을 때 **OCR을 어떻게 실행**해야 하는지 고민해 본 적 있나요? 설정을 조정하느라 시간을 허비하고 계신가요? 
많은 프로젝트에서 개발자는 **스캔에서 텍스트를 인식**해야 하지만 모델 다운로드와 후처리에서 어려움을 겪습니다. + +좋은 소식: 이 튜토리얼에서는 **Hugging Face 모델**을 사용하고 자동으로 다운로드하며, 구두점을 추가해 사람이 쓴 것처럼 읽히는 출력물을 만드는 즉시 실행 가능한 솔루션을 보여줍니다. 끝까지 따라오시면 폴더에 있는 모든 이미지에 대해 처리하고 각 스캔 옆에 깔끔한 `.txt` 파일을 생성하는 스크립트를 얻게 됩니다. + +## 준비물 + +- Python 3.8+ (코드가 f‑strings를 사용하므로 구버전은 동작하지 않음) +- `aspose-ocr` 패키지 (`pip install aspose-ocr` 로 설치) +- 최초 모델 다운로드를 위한 인터넷 연결 +- 이미지 스캔 폴더 (`.png`, `.jpg`, 또는 `.tif`) + +그게 전부—추가 바이너리나 수동 모델 설정이 필요 없습니다. 바로 시작해 보세요. + +![OCR 실행 예시](https://example.com/ocr-demo.png "OCR 실행 예시") + +## 1단계: Aspose OCR 클래스 가져오기 및 환경 설정 + +먼저 Aspose OCR 라이브러리에서 필요한 클래스를 가져옵니다. 모든 것을 앞에 임포트하면 스크립트가 깔끔해지고 누락된 의존성을 쉽게 확인할 수 있습니다. + +```python +# Step 1: Import Aspose OCR classes +import os +from aspose.ocr import OcrEngine, AsposeAI, AsposeAIModelConfig +``` + +*왜 중요한가*: `OcrEngine`이 핵심 작업을 수행하고, `AsposeAI`는 더 똑똑한 후처리를 위해 대형 언어 모델을 연결합니다. 임포트를 빼먹으면 나머지 코드가 컴파일조차 되지 않으니 꼭 포함하세요. + +## 2단계: GPU 인식 Hugging Face 모델 구성 + +이제 Aspose에 모델을 어디서 가져올지와 GPU에서 실행할 레이어 수를 알려줍니다. `allow_auto_download="true"` 플래그가 **모델을 자동으로 다운로드**하도록 해줍니다. + +```python +# Step 2: Configure a GPU‑aware AI model (replace with your own model folder) +model_config = AsposeAIModelConfig( + allow_auto_download="true", + hugging_face_repo_id="Qwen/Qwen2.5-3B-Instruct-GGUF", + hugging_face_quantization="int8", + gpu_layers=40, # use GPU for faster inference + directory_model_path=r"YOUR_DIRECTORY/models" +) +``` + +> **전문가 팁**: GPU가 없으면 `gpu_layers=0`으로 설정하세요. 모델이 CPU로 전환되며, 속도는 느리지만 여전히 작동합니다. + +### 왜 Hugging Face 모델을 선택하나요? + +Hugging Face는 방대한 즉시 사용 가능한 LLM 컬렉션을 제공합니다. `Qwen/Qwen2.5-3B-Instruct-GGUF`를 지정하면 구두점 추가, 공백 교정, 작은 OCR 오류 수정까지 가능한 컴팩트하고 instruction‑tuned 모델을 얻을 수 있습니다. 이것이 **use hugging face model**을 실제로 적용하는 핵심입니다. + +## 3단계: AI 엔진 초기화 및 구두점 후처리 활성화 + +AI 엔진은 단순 채팅용이 아니라 *구두점 추가기*를 연결해 원시 OCR 출력물을 정리합니다. 
+ +```python +# Step 3: Initialise the AI engine and enable punctuation post‑processing +ai_engine = AsposeAI() +ai_engine.set_post_processor("punctuation_adder", {}) +``` + +*무슨 일이 일어나나요?* `set_post_processor` 호출은 OCR 엔진이 끝난 뒤 실행되는 내장 후처리를 등록합니다. 원시 문자열에 쉼표, 마침표, 대문자를 적절히 삽입해 최종 텍스트를 훨씬 읽기 쉽게 만들어 줍니다. + +## 4단계: OCR 엔진 생성 및 AI 엔진 연결 + +AI 엔진을 OCR 엔진에 연결하면 문자 인식과 결과 정제를 한 객체에서 수행할 수 있습니다. + +```python +# Step 4: Create the OCR engine and attach the AI engine +ocr_engine = OcrEngine() +ocr_engine.set_ai_engine(ai_engine) +``` + +이 단계를 건너뛰면 OCR 자체는 동작하지만 구두점 보강이 없어 출력이 단어들의 흐름처럼 보입니다. + +## 5단계: 폴더 내 모든 이미지 처리 + +튜토리얼의 핵심 부분입니다. 각 이미지를 순회하면서 OCR을 실행하고, 후처리를 적용한 뒤, 정제된 텍스트를 같은 위치에 `.txt` 파일로 저장합니다. + +```python +# Step 5: Run OCR on each image in a folder, post‑process the result, and save the text +scans_folder = r"YOUR_DIRECTORY/scans" +for image_file in os.listdir(scans_folder): + # Filter only supported image types + if not image_file.lower().endswith(('.png', '.jpg', '.tif')): + continue + + image_path = os.path.join(scans_folder, image_file) + + # Recognise text from the image + ocr_result = ocr_engine.recognize(image_path) + + # Apply the punctuation post‑processor + ocr_result = ocr_engine.run_postprocessor(ocr_result) + + # Show a brief confidence summary + print(f"{image_file} – confidence {ocr_result.confidence:.2%}") + + # Save the cleaned text next to the source image + txt_path = image_path + ".txt" + with open(txt_path, "w", encoding="utf-8") as txt_file: + txt_file.write(ocr_result.text) +``` + +### 기대 결과 + +스크립트를 실행하면 다음과 같은 출력이 표시됩니다: + +``` +invoice_001.png – confidence 96.73% +receipt_2024.tif – confidence 94.12% +``` + +각 라인은 신뢰도 점수를 보여주며 `invoice_001.png.txt`, `receipt_2024.tif.txt` 등 구두점이 추가된 사람이 읽을 수 있는 텍스트 파일을 생성합니다. + +### 엣지 케이스 및 변형 + +- **비영어 스캔**: `hugging_face_repo_id`를 다국어 모델(예: `microsoft/Multilingual-LLM-GGUF`)로 바꾸세요. +- **대용량 배치**: 루프를 `concurrent.futures.ThreadPoolExecutor` 로 감싸 병렬 처리하지만 GPU 메모리 한계에 유의하세요. 
+- **맞춤형 후처리**: 도메인에 특화된 정리가 필요하면 `"punctuation_adder"`를 직접 만든 스크립트로 교체하세요(예: 청구서 번호 제거). + +## 6단계: 리소스 정리 + +작업이 끝나면 리소스를 해제해 메모리 누수를 방지해야 합니다. 특히 장시간 실행 서비스에서는 중요합니다. + +```python +# Step 6: Release resources +ai_engine.free_resources() +ocr_engine.dispose() +``` + +이 단계를 놓치면 GPU 메모리가 남아 다음 실행을 방해할 수 있습니다. + +## 요약: OCR 엔드‑투‑엔드 실행 방법 + +몇 줄의 코드만으로 **폴더에 있는 스캔에 OCR을 실행**하고, **첫 실행 시 자동으로 다운로드되는 Hugging Face 모델**을 사용하며, **구두점이 자동으로 추가된 텍스트 인식**을 구현했습니다. 완전한 스크립트를 복사‑붙여넣고 경로만 조정하면 바로 실행할 수 있습니다. + +## 다음 단계 및 관련 주제 + +- **배치 후처리**: `ocr_engine.run_batch_postprocessor` 로 대량 처리 속도를 더욱 높이세요. +- **대체 모델**: OCR과 함께 음성‑텍스트 변환이 필요하면 `openai/whisper` 계열을 시도해 보세요. +- **데이터베이스 연동**: 추출된 텍스트를 SQLite 또는 Elasticsearch에 저장해 검색 가능한 아카이브를 구축하세요. + +모델을 교체하거나 `gpu_layers`를 조정하고, 직접 후처리기를 추가해 보세요. Aspose OCR과 Hugging Face 모델 허브의 조합은 어떤 문서 디지털화 프로젝트에도 유연한 기반을 제공합니다. + +--- + +*코딩 즐겁게! 문제가 발생하면 아래 댓글을 남기거나 Aspose OCR 문서를 참고해 더 깊은 설정 옵션을 확인하세요.* + +{{< /blocks/products/pf/tutorial-page-section >}} +{{< /blocks/products/pf/main-container >}} +{{< /blocks/products/pf/main-wrap-class >}} +{{< blocks/products/products-backtop-button >}} \ No newline at end of file diff --git a/ocr/korean/python/general/perform-ocr-on-image-with-python-complete-guide/_index.md b/ocr/korean/python/general/perform-ocr-on-image-with-python-complete-guide/_index.md new file mode 100644 index 000000000..99447a9ed --- /dev/null +++ b/ocr/korean/python/general/perform-ocr-on-image-with-python-complete-guide/_index.md @@ -0,0 +1,206 @@ +--- +category: general +date: 2026-04-29 +description: Python을 사용해 이미지에서 OCR을 수행하고, HuggingFace 모델을 자동으로 다운로드하며, OCR 텍스트를 정리하면서 + GPU 메모리를 효율적으로 해제합니다. +draft: false +keywords: +- perform OCR on image +- download HuggingFace model python +- release GPU memory python +- clean OCR text python +language: ko +og_description: Python에서 이미지에 대한 OCR을 수행하는 방법을 배우고, HuggingFace 모델을 자동으로 다운로드하며, 텍스트를 + 정리하고 GPU 메모리를 해제하세요. 
+og_title: Python으로 이미지 OCR 수행 – 단계별 가이드 +tags: +- OCR +- Python +- Aspose +- HuggingFace +title: Python으로 이미지 OCR 수행 – 완전 가이드 +url: /ko/python/general/perform-ocr-on-image-with-python-complete-guide/ +--- + +{{< blocks/products/pf/main-wrap-class >}} +{{< blocks/products/pf/main-container >}} +{{< blocks/products/pf/tutorial-page-section >}} + +# Python으로 이미지에서 OCR 수행 – 완전 가이드 + +이미지 파일에서 **perform OCR on image** 를 수행해야 했지만 모델 다운로드나 GPU 메모리 정리 단계에서 막힌 적이 있나요? 당신만 그런 것이 아닙니다—많은 개발자들이 처음으로 광학 문자 인식과 대형 언어 모델을 결합하려 할 때 이 장벽에 부딪힙니다. + +이 튜토리얼에서는 **downloads a HuggingFace model in Python** 를 수행하고 Aspose OCR을 실행하며 원시 출력을 정리하고, 마지막으로 **releases GPU memory Python** 가 회수할 수 있도록 하는 단일 엔드‑투‑엔드 솔루션을 단계별로 살펴봅니다. 끝까지 따라오면 스캔한 PNG를 깔끔하고 검색 가능한 텍스트로 변환하는 실행 가능한 스크립트를 얻게 됩니다. + +> **What you’ll get:** 완전한 실행 가능한 코드 샘플, 각 단계가 중요한 이유에 대한 설명, 흔히 발생하는 함정을 피하는 팁, 그리고 파이프라인을 자체 프로젝트에 맞게 조정하는 방법에 대한 통찰을 제공합니다. + +--- + +## 필요 사항 + +- Python 3.9 이상 (예제는 3.11에서 테스트됨) +- `aspose-ocr` 패키지 (`pip install aspose-ocr` 로 설치) +- **download HuggingFace model python** 단계에 필요한 인터넷 연결 +- 속도 향상이 필요하다면 CUDA 호환 GPU (선택 사항이지만 권장) + +추가적인 시스템 수준 의존성은 필요하지 않습니다; Aspose OCR 엔진이 필요한 모든 것을 포함하고 있습니다. + +![perform OCR on image example](image.png "Example of performing OCR on image with Aspose OCR and an LLM post‑processor") + +*Image alt text: “perform OCR on image – Aspose OCR 출력 전후 AI 정리”* + +--- + +## 이미지에서 OCR 수행 – 단계별 개요 + +아래에서는 작업 흐름을 논리적인 청크로 나눕니다. 각 청크마다 자체 헤딩이 있어 AI 어시스턴트가 관심 있는 부분으로 빠르게 이동할 수 있고, 검색 엔진도 관련 키워드를 색인할 수 있습니다. + +### 1. Python에서 HuggingFace 모델 다운로드 + +먼저 해야 할 일은 원시 OCR 출력에 대한 포스트‑프로세서 역할을 할 언어 모델을 가져오는 것입니다. Aspose OCR은 `AsposeAI` 라는 헬퍼 클래스를 제공하며, 이를 통해 HuggingFace 허브에서 모델을 자동으로 가져올 수 있습니다. 
+ +```python +import aspose.ocr as aocr +from aspose.ocr import AsposeAI, AsposeAIModelConfig, OcrEngine + +# Configure the model – it will auto‑download the first time you run it +model_config = AsposeAIModelConfig( + allow_auto_download="true", # <-- enables auto‑download + hugging_face_repo_id="Qwen/Qwen2.5-3B-Instruct-GGUF", + hugging_face_quantization="int8", # smaller footprint on GPU + gpu_layers=20, # how many layers stay on GPU + directory_model_path=r"YOUR_DIRECTORY" # where the model files live +) +``` + +**Why this matters:** +- **download HuggingFace model python** – zip 파일을 직접 다루거나 토큰 인증을 수동으로 처리할 필요가 없습니다. +- `int8` 양자화를 사용하면 모델 크기가 원래의 약 ¼ 수준으로 축소되어 나중에 **release GPU memory python** 가 필요할 때 매우 중요합니다. + +> **Pro tip:** `directory_model_path` 를 SSD에 두면 로드 속도가 빨라집니다. + +--- + +### 2. AI 헬퍼 초기화 및 맞춤법 검사 활성화 + +이제 `AsposeAI` 인스턴스를 생성하고 맞춤법 교정 포스트‑프로세서를 연결합니다. 여기서 **clean OCR text python** 마법이 시작됩니다. + +```python +# Initialise the AI helper +ai_helper = AsposeAI() +ai_helper.set_post_processor( + processor="spell_corrector", + custom_settings={"max_edits": 2} # allows up to two character edits per word +) +``` + +**Explanation:** +맞춤법 교정기는 OCR 엔진에서 나온 각 토큰을 검사하고 `max_edits` 로 제한된 편집을 제안합니다. 이 작은 조정만으로도 “rec0gn1tion”을 “recognition”으로 바꿀 수 있어 무거운 언어 모델 없이도 정확도를 높일 수 있습니다. + +--- + +### 3. AI 헬퍼를 OCR 엔진에 연결 + +Aspose는 버전 23.4에서 새로운 메서드를 도입했으며, 이를 통해 AI 엔진을 OCR 파이프라인에 직접 플러그인할 수 있습니다. + +```python +# Initialise the OCR engine and attach the AI helper +ocr_engine = OcrEngine() +ocr_engine.set_ai_engine(ai_helper) # new in v23.4 +``` + +**Why we do it:** +AI 헬퍼를 일찍 연결하면 OCR 엔진이 필요에 따라 모델을 실시간으로 활용해 레이아웃 감지 등 개선을 수행할 수 있습니다. 또한 별도의 포스트‑프로세싱 루프가 필요 없어 코드가 깔끔해집니다. + +--- + +### 4. 스캔 이미지에서 OCR 수행 + +다음은 실제로 **perform OCR on image** 파일을 처리하는 핵심 단계입니다. `YOUR_DIRECTORY/input.png` 를 자신의 스캔 파일 경로로 바꾸세요. 
+
+```python
+image_path = r"YOUR_DIRECTORY/input.png"
+ocr_result = ocr_engine.recognize(image_path)
+
+print("Raw OCR text:")
+print(ocr_result.text)
+```
+
+A typical raw result may contain line breaks in odd places, misrecognized characters, or stray symbols, which is why the next step is needed.
+
+**Expected raw output (example):**
+
+```
+Th1s 1s 4n ex4mpl3 0f r4w OCR t3xt.
+It c0ntains numb3rs 123 and s0me m1stakes.
+```
+
+---
+
+### 5. Clean the OCR Text in Python with the AI Post‑Processor
+
+Now let the AI tidy up the mess. This is the core of the **clean OCR text** step.
+
+```python
+cleaned_result = ocr_engine.run_postprocessor(ocr_result)
+
+print("\nAI‑enhanced text:")
+print(cleaned_result.text)
+```
+
+**Result you’ll see:**
+
+```
+This is an example of raw OCR text.
+It contains numbers 123 and some mistakes.
+```
+
+Notice that the spell corrector fixed “Th1s” to “This” and “4n” to “an”, while the genuine number “123” was left untouched. The model also normalizes whitespace, which removes a common nuisance when you feed the text into a downstream NLP pipeline.
+
+---
+
+### 6. Release GPU Memory in Python – Cleanup Step
+
+Once the work is done, it is good practice to release GPU resources. This is especially important when a long-running service performs many OCR jobs.
+
+```python
+# Release resources – crucial for GPU memory
+ai_helper.free_resources()
+ocr_engine.dispose()
+```
+
+**What happens under the hood:**
+`free_resources()` unloads the model from the GPU and returns the memory to the CUDA driver. `dispose()` shuts down the OCR engine's internal buffers. If you skip these calls, you can hit out-of-memory errors after processing only a few images.
+
+> **Remember:** If you plan to process batches in a loop, either call the cleanup functions after each batch, or reuse the same `ai_helper` throughout and release it only at the very end.
+
+---
+
+## Bonus: Adapting the Pipeline to Different Scenarios
+
+### Adjusting Model Quantization
+
+If you have a powerful GPU (for example an RTX 4090) and want higher accuracy, change `hugging_face_quantization` to `"fp16"` and raise `gpu_layers` to `30`. Memory usage increases in this configuration, so **release GPU memory** more aggressively after each batch.
+
+### Using a Custom Spell Checker
+
+You can replace the built-in `spell_corrector` with a custom post-processor that performs domain-specific corrections (for example, medical terminology). Implement the required interface and pass its name to `set_post_processor`.
+
+### Batch Processing Multiple Images
+
+Wrap the OCR step in a `for` loop and collect each `cleaned_result.text` in a list. If you have enough GPU RAM, call `ai_helper.free_resources()` only after the loop has finished.
이렇게 하면 모델을 반복적으로 로드하는 오버헤드를 줄일 수 있습니다. + +--- + +## 결론 + +우리는 Python에서 **perform OCR on image** 파일을 처리하고, 자동으로 **download a HuggingFace model** 을 수행하며, **clean OCR text** 를 정리하고, 작업이 끝난 뒤 안전하게 **release GPU memory** 를 해제하는 방법을 보여주었습니다. 완전한 스크립트는 바로 복사‑붙여넣기 할 수 있으며, 설명을 통해 더 큰 프로젝트에 적용할 자신감을 얻을 수 있습니다. + +다음 단계는? Qwen 2.5 모델을 더 큰 LLaMA 변형으로 교체해 보거나, 다양한 포스트‑프로세서를 실험하거나, 정리된 출력을 검색 가능한 Elasticsearch 인덱스로 통합해 보세요. 가능성은 무궁무진하며, 이제 탄탄한 기반을 갖추었습니다. + +행복한 코딩 되세요, 그리고 여러분의 OCR 파이프라인이 언제나 깨끗하고 메모리 친화적이길 바랍니다! + +{{< /blocks/products/pf/tutorial-page-section >}} +{{< /blocks/products/pf/main-container >}} +{{< /blocks/products/pf/main-wrap-class >}} +{{< blocks/products/products-backtop-button >}} \ No newline at end of file diff --git a/ocr/polish/python/general/extract-text-from-pdf-ocr-pdf-with-python/_index.md b/ocr/polish/python/general/extract-text-from-pdf-ocr-pdf-with-python/_index.md new file mode 100644 index 000000000..198b1f77a --- /dev/null +++ b/ocr/polish/python/general/extract-text-from-pdf-ocr-pdf-with-python/_index.md @@ -0,0 +1,219 @@ +--- +category: general +date: 2026-04-29 +description: Wyodrębnij tekst z PDF przy użyciu Aspose OCR w Pythonie. Dowiedz się, + jak przetwarzać PDF w trybie wsadowym OCR, konwertować tekst zeskanowanego PDF oraz + obsługiwać strony o niskim poziomie pewności. +draft: false +keywords: +- extract text from pdf +- ocr pdf with python +- convert scanned pdf text +- batch ocr pdf processing +language: pl +og_description: Wyodrębnij tekst z PDF przy użyciu Aspose OCR w Pythonie. Ten przewodnik + pokazuje przetwarzanie PDF w trybie wsadowym OCR, konwertowanie zeskanowanego tekstu + PDF oraz obsługę wyników o niskim poziomie pewności. 
+og_title: Wyodrębnij tekst z PDF – OCR PDF w Pythonie +tags: +- OCR +- Python +- PDF processing +title: Wyodrębnij tekst z PDF – OCR PDF w Pythonie +url: /pl/python/general/extract-text-from-pdf-ocr-pdf-with-python/ +--- + +{{< blocks/products/pf/main-wrap-class >}} +{{< blocks/products/pf/main-container >}} +{{< blocks/products/pf/tutorial-page-section >}} + +# Wyodrębnianie tekstu z PDF – OCR PDF w Pythonie + +Kiedykolwiek potrzebowałeś **wyodrębnić tekst z PDF**, a plik był jedynie zeskanowanym obrazem? Nie jesteś sam — wielu programistów napotyka ten problem, próbując przekształcić PDF‑y w dane przeszukiwalne. Dobra wiadomość? Dzięki Aspose OCR for Python możesz konwertować zeskanowany tekst PDF w kilku linijkach kodu, a nawet uruchomić **przetwarzanie wsadowe OCR PDF**, gdy masz dziesiątki plików do obsłużenia. + +W tym tutorialu przejdziemy przez cały proces: skonfigurujemy bibliotekę, uruchomimy OCR na pojedynczym PDF, skalujemy do trybu wsadowego i zajmiemy się stronami o niskim poziomie pewności, abyś wiedział, kiedy wymagana jest ręczna weryfikacja. Po zakończeniu będziesz mieć gotowy skrypt, który wyodrębnia tekst z dowolnego zeskanowanego PDF, oraz zrozumiesz, dlaczego wykonujemy poszczególne kroki. + +## Co będzie potrzebne + +Zanim zaczniemy, upewnij się, że masz: + +- Python 3.8 lub nowszy (kod używa f‑stringów, więc 3.6+ działa, ale zalecane jest 3.8+) +- Licencję Aspose OCR for Python lub klucz darmowej wersji próbnej (można go pobrać ze strony Aspose) +- Folder z jednym lub większą liczbą zeskanowanych PDF‑ów, które chcesz przetworzyć +- Umiarkowaną ilość miejsca na dysku na generowane raporty *.txt* + +To wszystko — bez ciężkich zewnętrznych zależności, bez akrobacji OpenCV. Silnik Aspose OCR wykona ciężką pracę za Ciebie. 
+ +## Konfiguracja środowiska + +Najpierw zainstaluj pakiet Aspose OCR z PyPI: + +```bash +pip install aspose-ocr +``` + +Jeśli masz plik licencyjny (`Aspose.OCR.lic`), umieść go w katalogu głównym projektu i aktywuj w następujący sposób: + +```python +# Activate Aspose OCR license (optional but removes trial watermark) +from aspose.ocr import License + +license = License() +license.set_license("Aspose.OCR.lic") +``` + +> **Pro tip:** Trzymaj plik licencji poza systemem kontroli wersji; dodaj go do `.gitignore`, aby uniknąć przypadkowego ujawnienia. + +## Przeprowadzanie OCR na pojedynczym PDF + +Teraz wyodrębnijmy tekst z jednego zeskanowanego PDF. Główne kroki to: + +1. Utwórz instancję `OcrEngine`. +2. Wskaż plik PDF. +3. Pobierz `OcrResult` dla każdej strony. +4. Zapisz wynikowy tekst do pliku. +5. Zwolnij zasoby natywne, wywołując `dispose`. + +Oto pełny, gotowy do uruchomienia skrypt: + +```python +# Step 1: Import the OCR engine and create an instance +from aspose.ocr import OcrEngine + +# Optional: activate license (see previous section) +# from aspose.ocr import License +# License().set_license("Aspose.OCR.lic") + +ocr_engine = OcrEngine() + +# Step 2: Specify the PDF file to be processed +pdf_file_path = r"YOUR_DIRECTORY/input.pdf" + +# Step 3: Extract OCR results – one OcrResult object per page +ocr_pages = ocr_engine.extract_from_pdf(pdf_file_path) + +# Step 4: Iterate through the results, show confidence, flag low‑confidence pages, and save the plain text +for ocr_page in ocr_pages: + print(f"Page {ocr_page.page_number}: confidence {ocr_page.confidence:.2%}") + + # If confidence is below 80 %, flag it for manual review + if ocr_page.confidence < 0.80: + print(" Low confidence – consider manual review") + + # Build the output filename + output_txt = f"YOUR_DIRECTORY/Report_page{ocr_page.page_number}.txt" + with open(output_txt, "w", encoding="utf-8") as f: + f.write(ocr_page.text) + +# Step 5: Release resources used by the engine +ocr_engine.dispose() 
+``` + +**Co zobaczysz:** Dla każdej strony skrypt wypisze coś w stylu `Page 1: confidence 97.45%`. Jeśli strona spadnie poniżej progu 80 %, pojawi się ostrzeżenie, informujące, że OCR mógł pominąć znaki. + +### Dlaczego to działa + +- **`OcrEngine`** to brama do natywnej biblioteki Aspose OCR; obsługuje wszystko, od wstępnego przetwarzania obrazu po rozpoznawanie znaków. +- **`extract_from_pdf`** automatycznie rasteryzuje każdą stronę PDF, więc nie musisz samodzielnie konwertować PDF‑ów na obrazy. +- **Wyniki pewności** pozwalają automatyzować kontrole jakości — co jest kluczowe przy przetwarzaniu dokumentów prawnych lub medycznych, gdzie dokładność ma znaczenie. + +## Przetwarzanie wsadowe OCR PDF w Pythonie + +Większość projektów w praktyce obejmuje więcej niż jeden plik. Rozszerzmy skrypt jednoplikowy do **przetwarzania wsadowego OCR PDF**, które przeszukuje katalog, przetwarza każdy PDF i zapisuje wyniki w odpowiadającym podkatalogu. + +```python +import os +from pathlib import Path +from aspose.ocr import OcrEngine + +def ocr_pdf_file(pdf_path: Path, output_dir: Path, engine: OcrEngine, confidence_threshold: float = 0.80): + """Extract text from a single PDF and write per‑page .txt files.""" + ocr_pages = engine.extract_from_pdf(str(pdf_path)) + for page in ocr_pages: + print(f"{pdf_path.name} – Page {page.page_number}: {page.confidence:.2%}") + if page.confidence < confidence_threshold: + print(" ⚠️ Low confidence – manual review may be needed") + + out_file = output_dir / f"{pdf_path.stem}_page{page.page_number}.txt" + out_file.write_text(page.text, encoding="utf-8") + +def batch_ocr_pdf(input_folder: str, output_folder: str): + """Iterate over all PDFs in input_folder and run OCR.""" + engine = OcrEngine() + input_path = Path(input_folder) + output_path = Path(output_folder) + output_path.mkdir(parents=True, exist_ok=True) + + pdf_files = list(input_path.glob("*.pdf")) + if not pdf_files: + print("🚫 No PDF files found in the specified folder.") + return + 
+ for pdf_file in pdf_files: + # Create a sub‑folder for each PDF to keep pages organized + pdf_out_dir = output_path / pdf_file.stem + pdf_out_dir.mkdir(exist_ok=True) + ocr_pdf_file(pdf_file, pdf_out_dir, engine) + + # Clean up native resources + engine.dispose() + print("✅ Batch OCR completed.") + +# Example usage: +if __name__ == "__main__": + batch_ocr_pdf("YOUR_DIRECTORY/input_pdfs", "YOUR_DIRECTORY/ocr_output") +``` + +#### Jak to pomaga + +- **Skalowalność:** Funkcja przeszukuje folder raz, tworząc dedykowany podkatalog wyjściowy dla każdego PDF. Dzięki temu porządek pozostaje zachowany, nawet przy dziesiątkach dokumentów. +- **Wielokrotne użycie:** `ocr_pdf_file` może być wywoływana z innych skryptów (np. usługi webowej), ponieważ jest czystą funkcją. +- **Obsługa błędów:** Skrypt wypisuje przyjazny komunikat, jeśli folder wejściowy jest pusty, co zapobiega cichej awarii. + +## Konwersja tekstu ze zeskanowanego PDF — obsługa przypadków brzegowych + +Choć powyższy kod działa dla większości PDF‑ów, możesz napotkać kilka specyficznych sytuacji: + +| Sytuacja | Dlaczego się pojawia | Jak złagodzić | +|----------|----------------------|---------------| +| **Zaszyfrowane PDF‑y** | PDF jest zabezpieczony hasłem. | Przekaż hasło do `extract_from_pdf(pdf_path, password="yourPwd")`. | +| **Dokumenty wielojęzyczne** | Aspose OCR domyślnie używa języka angielskiego. | Ustaw `ocr_engine.language = "spa"` dla hiszpańskiego lub podaj listę języków dla mieszanych dokumentów. | +| **Bardzo duże PDF‑y (>500 stron)** | Zużycie pamięci rośnie, ponieważ każda strona jest ładowana do RAM. | Przetwarzaj PDF w partiach używając `engine.extract_from_pdf(pdf_path, start_page=1, end_page=100)` i iteruj. | +| **Słaba jakość skanu** | Niska rozdzielczość DPI lub duży szum obniża pewność. | Włącz wstępne przetwarzanie obrazu `engine.image_preprocessing = True` lub zwiększ DPI poprzez `engine.dpi = 300`. 
| + +> **Uwaga:** Włączenie wstępnego przetwarzania obrazu może zauważalnie wydłużyć czas CPU. Jeśli uruchamiasz nocny batch, zaplanuj wystarczająco dużo czasu lub uruchom osobny worker. + +## Weryfikacja wyników + +Po zakończeniu skryptu znajdziesz strukturę katalogów podobną do tej: + +``` +ocr_output/ +├─ invoice_2023/ +│ ├─ invoice_2023_page1.txt +│ ├─ invoice_2023_page2.txt +│ └─ … +└─ contract_A/ + ├─ contract_A_page1.txt + └─ … +``` + +Otwórz dowolny plik `.txt`; powinieneś zobaczyć czysty, zakodowany w UTF‑8 tekst odzwierciedlający oryginalną zeskanowaną treść. Jeśli zauważysz nieczytelne znaki, sprawdź ustawienia języka PDF oraz upewnij się, że na maszynie zainstalowane są odpowiednie pakiety czcionek. + +## Czyszczenie zasobów + +Aspose OCR korzysta z natywnych DLL‑ów, dlatego ważne jest wywołanie `engine.dispose()` po zakończeniu pracy. Zapomnienie tego kroku może prowadzić do wycieków pamięci, szczególnie w długotrwale działających zadaniach wsadowych. + +```python +# Always the last line of your script +engine.dispose() +``` + +## Pełny przykład od początku do końca + +Łącząc wszystkie elementy, oto kompletny skrypt: + +{{< /blocks/products/pf/tutorial-page-section >}} +{{< /blocks/products/pf/main-container >}} +{{< /blocks/products/pf/main-wrap-class >}} +{{< blocks/products/products-backtop-button >}} \ No newline at end of file diff --git a/ocr/polish/python/general/how-to-recognize-handwriting-in-python-full-tutorial/_index.md b/ocr/polish/python/general/how-to-recognize-handwriting-in-python-full-tutorial/_index.md new file mode 100644 index 000000000..b39dfff67 --- /dev/null +++ b/ocr/polish/python/general/how-to-recognize-handwriting-in-python-full-tutorial/_index.md @@ -0,0 +1,279 @@ +--- +category: general +date: 2026-04-29 +description: Dowiedz się, jak rozpoznawać odręczne pismo w Pythonie przy użyciu Aspose + OCR. Ten przewodnik krok po kroku pokazuje, jak efektywnie wyodrębniać odręczny + tekst. 
+draft: false +keywords: +- how to recognize handwriting +- extract handwritten text +- handwritten text recognition python +- handwritten ocr tutorial python +language: pl +og_description: How do you recognize handwriting in Python? Follow this complete + guide to extract handwritten text with Aspose OCR, including + code, tips, and edge-case handling. +og_title: How to Recognize Handwriting in Python – Full Tutorial +tags: +- OCR +- Python +- HandwritingRecognition +title: How to Recognize Handwriting in Python – Full Tutorial +url: /pl/python/general/how-to-recognize-handwriting-in-python-full-tutorial/ +--- + +{{< blocks/products/pf/main-wrap-class >}} +{{< blocks/products/pf/main-container >}} +{{< blocks/products/pf/tutorial-page-section >}} + +# How to Recognize Handwriting in Python – Full Tutorial + +Have you ever needed to **recognize handwriting** in a Python project but didn't know where to start? You're not alone: developers often ask, "Can I extract text from a scanned note?" The good news is that modern OCR libraries make this straightforward. In this guide we walk through **how to recognize handwriting** with Aspose OCR, and you will learn how to **extract handwritten text** reliably. + +We cover everything from installing the library to tuning confidence thresholds for messy cursive writing. By the end you will have a ready-to-run script that prints the extracted text and an overall confidence score, useful for note-taking apps, archiving tools, or simple curiosity. No prior OCR experience is required; basic Python knowledge is enough. 
+ +--- + +## What you will need + +- **Python 3.9+** (ideally the latest stable release) +- **Aspose.OCR for Python via .NET**, installed with `pip install aspose-ocr` +- An image of **handwriting** (JPEG/PNG) that you want to process +- Optional: a virtual environment to keep dependencies tidy + +If you already have these, let's dive in. + +![Handwriting recognition example](/images/handwritten-sample.jpg "Handwriting recognition example") + +*(Alt text: "handwriting recognition example showing a scanned handwritten note")* + +--- + +## Step 1 – Install and import the Aspose OCR classes + +First we need the OCR engine itself. Aspose provides a clean API that separates printed-text recognition from handwriting mode. + +```python +# Install the package (run this in your terminal if you haven't already) +# pip install aspose-ocr + +# Import the required OCR classes +from aspose.ocr import OcrEngine, HandwritingMode +``` + +*Why this matters:* Importing `HandwritingMode` lets us tell the engine that we are doing **handwritten text recognition in Python** rather than reading printed text, which significantly improves accuracy on cursive strokes. + +--- + +## Step 2 – Create and configure the OCR engine + +Now we create an `OcrEngine` instance and switch it to handwriting mode. You can also adjust the confidence threshold; lower values accept less certain handwriting, higher values require cleaner input. + +```python +# Step 2: Initialize the OCR engine for handwritten text +ocr_engine = OcrEngine() +ocr_engine.set_recognition_mode(HandwritingMode.HANDWRITTEN) + +# Optional: tighten the confidence threshold for cursive scripts +# Default is 0.5; 0.65 works well for most notebooks +ocr_engine.set_handwriting_confidence_threshold(0.65) +``` + +*Tip:* Notes scanned at 300 DPI or higher usually give better results. 
For low-resolution images, consider upscaling them with Pillow before passing them to the engine. + +--- + +## Step 3 – Prepare the image path + +Make sure the file path points to the image you want to process. Relative paths work, but absolute paths avoid "file not found" surprises. + +```python +# Step 3: Point to your handwritten image +input_image_path = "YOUR_DIRECTORY/handwritten_sample.jpg" +``` + +*Common pitfall:* Forgetting to escape backslashes on Windows (`C:\\folder\\image.jpg`). Using raw strings (`r"C:\folder\image.jpg"`) solves this. + +--- + +## Step 4 – Run recognition and read the results + +The `recognize` method does the heavy lifting. It returns an object with `.text` and `.confidence` properties. + +```python +# Step 4: Run OCR and get the result +result = ocr_engine.recognize(input_image_path) + +# Display the extracted text and confidence +print("Hand‑written extraction:") +print(result.text) +print("Overall confidence:", result.confidence) +``` + +**Expected output (example):** + +``` +Hand‑written extraction: +Meeting notes: +- Discuss quarterly goals +- Review budget allocations +- Assign action items +Overall confidence: 0.78 +``` + +If the confidence falls below 0.5, you may need to clean up the image (remove shadows, increase contrast) or lower the threshold from Step 2. + +--- + +## Step 5 – Release resources + +Aspose OCR holds native resources; calling `dispose()` releases them and prevents memory leaks, especially when processing many images in a loop. + +```python +# Step 5: Release the engine resources +ocr_engine.dispose() +``` + +*Why dispose?* In long-running services (for example, a Flask API that accepts uploads), forgetting to release resources can quickly exhaust system memory. 
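
One way to make sure `dispose()` always runs, even when recognition raises an exception, is to wrap the engine in a context manager. The sketch below is a minimal illustration and not part of the Aspose API: `FakeEngine` is a hypothetical stand-in so the example runs without the library, and `managed_engine` is an assumed helper name. Only the `dispose()` call mirrors the tutorial.

```python
from contextlib import contextmanager


class FakeEngine:
    """Hypothetical stand-in for an OCR engine; not part of Aspose OCR."""

    def __init__(self):
        self.disposed = False

    def recognize(self, image_path):
        if not image_path:
            raise ValueError("image_path must not be empty")
        return f"text from {image_path}"

    def dispose(self):
        self.disposed = True


@contextmanager
def managed_engine(factory):
    """Create an engine and guarantee dispose() runs when the block exits."""
    engine = factory()
    try:
        yield engine
    finally:
        # Runs on normal exit and when the body raises
        engine.dispose()


if __name__ == "__main__":
    with managed_engine(FakeEngine) as engine:
        print(engine.recognize("note.jpg"))
    print("disposed:", engine.disposed)
```

In a request handler you would pass the real engine constructor in place of `FakeEngine`, so each request gets cleaned up whether or not it succeeds.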
+ +--- + +## Full script – ready to run + +Putting everything together, here is a self-contained script you can copy and run. + +```python +# -*- coding: utf-8 -*- +""" +Handwritten OCR Tutorial – how to recognize handwriting in Python +Author: Your Name +Date: 2026-04-29 +""" + +# Install the library first if you haven't already: +# pip install aspose-ocr + +from aspose.ocr import OcrEngine, HandwritingMode + +def recognize_handwriting(image_path: str, confidence_threshold: float = 0.65): + """ + Recognizes handwritten text from an image using Aspose OCR. + + Args: + image_path (str): Path to the handwritten image file. + confidence_threshold (float): Minimum confidence to accept a word. + + Returns: + tuple: (extracted_text (str), overall_confidence (float)) + """ + # Initialize engine in handwritten mode + engine = OcrEngine() + engine.set_recognition_mode(HandwritingMode.HANDWRITTEN) + engine.set_handwriting_confidence_threshold(confidence_threshold) + + # Perform recognition + result = engine.recognize(image_path) + + # Capture output + extracted = result.text + confidence = result.confidence + + # Clean up + engine.dispose() + return extracted, confidence + +if __name__ == "__main__": + # Replace with your actual file location + img_path = "YOUR_DIRECTORY/handwritten_sample.jpg" + + text, conf = recognize_handwriting(img_path) + + print("Hand‑written extraction:") + print(text) + print("Overall confidence:", conf) +``` + +Save it as `handwritten_ocr.py` and run `python handwritten_ocr.py`. If everything is configured correctly, you will see the extracted text printed to the console. 
+ +--- + +## Handling edge cases and common variations + +### Low-contrast images +If the background blends with the ink, increase the contrast first: + +```python +from PIL import Image, ImageEnhance + +img = Image.open(input_image_path) +enhancer = ImageEnhance.Contrast(img) +high_contrast = enhancer.enhance(2.0) # 2× contrast +high_contrast.save("temp_high_contrast.jpg") +result = ocr_engine.recognize("temp_high_contrast.jpg") +``` + +### Skewed notes +A tilted notebook page can hurt recognition. Use OpenCV to deskew it: + +```python +import numpy as np +import cv2 + +def deskew(image_path): + # Load as grayscale, then invert and threshold so ink pixels are non-zero + gray = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE) + thresh = cv2.threshold(cv2.bitwise_not(gray), 0, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)[1] + # minAreaRect expects int32 or float32 points + coords = np.column_stack(np.where(thresh > 0)).astype(np.float32) + angle = cv2.minAreaRect(coords)[-1] + # Note: the angle convention of minAreaRect differs between OpenCV versions + if angle < -45: + angle = -(90 + angle) + else: + angle = -angle + (h, w) = gray.shape[:2] + M = cv2.getRotationMatrix2D((w // 2, h // 2), angle, 1.0) + corrected = cv2.warpAffine(gray, M, (w, h), flags=cv2.INTER_CUBIC, borderMode=cv2.BORDER_REPLICATE) + cv2.imwrite("deskewed.jpg", corrected) + +deskew(input_image_path) +result = ocr_engine.recognize("deskewed.jpg") +``` + +### Multi-page PDFs +This workflow can also be applied to PDFs, but you first need to convert each page to an image (for example with `pdf2image`). Then iterate over the images using the same `recognize_handwriting` function. + +--- + +## Pro tips for better results when you **extract handwritten text** + +- **DPI matters:** Aim for 300 DPI or higher when scanning. +- **Avoid colored backgrounds:** Plain white or light gray gives the cleanest result. +- **Batch processing:** Wrap the function in a `for` loop and log each page's confidence; reject results below a threshold to keep quality high. +- **Language support:** Aspose OCR supports many languages; set `engine.set_language("en")` to optimize for English. 
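
The batch-processing tip can be sketched as follows. This is a minimal illustration under stated assumptions: `split_by_confidence` and `PageResult` are hypothetical names, and `recognize` is any callable that returns a `(text, confidence)` tuple, which is the shape `recognize_handwriting` returns in the full script earlier in this tutorial.

```python
from typing import NamedTuple


class PageResult(NamedTuple):
    path: str
    text: str
    confidence: float


def split_by_confidence(paths, recognize, threshold=0.65):
    """Run recognize on each path, log its confidence, and split the results."""
    accepted, rejected = [], []
    for path in paths:
        text, confidence = recognize(path)
        print(f"{path}: confidence {confidence:.2f}")
        result = PageResult(path, text, confidence)
        # Results below the threshold are set aside rather than silently dropped
        (accepted if confidence >= threshold else rejected).append(result)
    return accepted, rejected


if __name__ == "__main__":
    # Canned values stand in for real OCR output
    canned = {"page1.jpg": ("Meeting notes", 0.78), "page2.jpg": ("M33ting n0tes", 0.41)}
    good, bad = split_by_confidence(canned, canned.__getitem__)
    print(len(good), "accepted,", len(bad), "rejected")
```

Keeping the rejected results in their own list makes it easy to queue those pages for rescanning or manual review.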
+ +--- + +## Frequently asked questions + +**Does this work on Linux?** +Yes. Aspose OCR ships native binaries for Windows, macOS, and Linux. Install the pip package and you are ready to go. + +**What if my handwriting is very cursive?** +Try lowering the confidence threshold (`0.5` or even `0.4`). Keep in mind that this can introduce more noise, so it is worth post-processing the result (for example, with a spell check). + +**Can I use this in a web service?** +Yes. The `recognize_handwriting` function is stateless, which makes it a good fit for Flask or FastAPI endpoints. Just remember to call `dispose()` after each request or use a context manager. + +--- + +## Conclusion + +We covered **how to recognize handwriting** in Python from start to finish, showing how to **extract handwritten text**, tune confidence settings, and deal with common problems such as low contrast or skewed pages. The full script above is ready to run, and the modular function makes it easy to integrate into larger projects, whether you are building a note-taking app, digitizing archives, or simply experimenting with handwritten OCR in Python. + +Next, you could explore handwritten text recognition in Python for multilingual notes, or combine OCR with natural language processing to summarize meeting minutes automatically. Try it out and let your code bring handwritten notes to life. + +Happy coding, and feel free to ask questions in the comments! 
+ +{{< /blocks/products/pf/tutorial-page-section >}} +{{< /blocks/products/pf/main-container >}} +{{< /blocks/products/pf/main-wrap-class >}} +{{< blocks/products/products-backtop-button >}} \ No newline at end of file diff --git a/ocr/polish/python/general/how-to-run-ocr-with-aspose-hugging-face-complete-guide/_index.md b/ocr/polish/python/general/how-to-run-ocr-with-aspose-hugging-face-complete-guide/_index.md new file mode 100644 index 000000000..74346aa36 --- /dev/null +++ b/ocr/polish/python/general/how-to-run-ocr-with-aspose-hugging-face-complete-guide/_index.md @@ -0,0 +1,181 @@ +--- +category: general +date: 2026-04-29 +description: Learn how to run OCR on your scans, automatically use + a Hugging Face model, and recognize text from scans with Aspose OCR in a few + minutes. +draft: false +keywords: +- how to run OCR +- use hugging face model +- recognize text from scans +- download model automatically +language: pl +og_description: How to run OCR on scans with Aspose OCR, automatically + download a model from Hugging Face, and get clean, correctly punctuated text. +og_title: How to Run OCR with Aspose and Hugging Face – Complete Guide +tags: +- OCR +- Aspose +- Hugging Face +- Python +title: How to Run OCR with Aspose and Hugging Face – Complete Guide +url: /pl/python/general/how-to-run-ocr-with-aspose-hugging-face-complete-guide/ +--- + +{{< blocks/products/pf/main-wrap-class >}} +{{< blocks/products/pf/main-container >}} +{{< blocks/products/pf/tutorial-page-section >}} + +# How to Run OCR with Aspose and Hugging Face – Complete Guide + +Have you ever wondered **how to run OCR** on a stack of scanned documents without spending hours tuning settings? You're not alone. In many projects developers need to **recognize text from scans** quickly but run into problems with model downloads and post-processing. 
+ +Good news: this guide shows a ready-made solution that **uses a Hugging Face model**, downloads it automatically, and adds punctuation so the output reads as if a person had typed it. When you are done, you will have a script that processes every image in a folder and writes a clean `.txt` file next to each scan. + +## What you will need + +- Python 3.8+ (the code uses f‑strings, which require at least Python 3.6) +- The `aspose-ocr` package (install it with `pip install aspose-ocr`) +- Internet access for the one-time model download +- A folder of scanned images (`.png`, `.jpg`, or `.tif`) + +That's all: no extra binaries and no manual model handling. Let's dive in. + +![example of running OCR](https://example.com/ocr-demo.png "example of running OCR") + +## Step 1: Import the Aspose OCR classes and set up the environment + +We start by importing the required classes from the Aspose OCR library. Importing everything up front keeps the script tidy and makes missing dependencies easy to spot. + +```python +# Step 1: Import Aspose OCR classes +import os +from aspose.ocr import OcrEngine, AsposeAI, AsposeAIModelConfig +``` + +*Why this matters*: `OcrEngine` does the heavy lifting, while `AsposeAI` lets you plug in a large language model for smarter post-processing. If you skip the import, the rest of the code fails with a `NameError`, so don't leave it out. + +## Step 2: Configure a Hugging Face model with GPU support + +Now we tell Aspose where to get the model and how many layers should run on the GPU. The `allow_auto_download="true"` flag **downloads the model automatically** for you. 
+ +```python +# Step 2: Configure a GPU‑aware AI model (replace with your own model folder) +model_config = AsposeAIModelConfig( + allow_auto_download="true", + hugging_face_repo_id="Qwen/Qwen2.5-3B-Instruct-GGUF", + hugging_face_quantization="int8", + gpu_layers=40, # use GPU for faster inference + directory_model_path=r"YOUR_DIRECTORY/models" +) +``` + +> **Tip**: If you don't have a GPU, set `gpu_layers=0`. The model falls back to the CPU, which is slower but still works. + +### Why choose a Hugging Face model? + +Hugging Face hosts a large collection of ready-to-use LLMs. By pointing to `Qwen/Qwen2.5-3B-Instruct-GGUF`, you get a compact, instruction-tuned model that can add punctuation, fix spacing, and even repair minor OCR errors. That is what **using a Hugging Face model** means in practice. + +## Step 3: Initialize the AI engine and enable punctuation post-processing + +The AI engine is not only for chat; here we plug in a *punctuation adder* that cleans up the raw OCR output. + +```python +# Step 3: Initialise the AI engine and enable punctuation post‑processing +ai_engine = AsposeAI() +ai_engine.set_post_processor("punctuation_adder", {}) +``` + +*What is happening?* The `set_post_processor` call registers a built-in post-processor that runs after the OCR engine finishes. It takes the raw string and inserts commas, periods, and capital letters in the right places, which makes the final text much more readable. + +## Step 4: Create the OCR engine and attach the AI engine + +Attaching the AI engine to the OCR engine gives us a single object that can both read characters and polish the result. + +```python +# Step 4: Create the OCR engine and attach the AI engine +ocr_engine = OcrEngine() +ocr_engine.set_ai_engine(ai_engine) +``` + +If you skip this step, OCR still works but you lose the added punctuation, so the output reads like an unbroken run of words. 
+ +## Step 5: Process every image in the folder + +This is the core of the guide. We iterate over each image, run OCR, apply the post-processor, and save the cleaned text in a `.txt` file next to it. + +```python +# Step 5: Run OCR on each image in a folder, post‑process the result, and save the text +scans_folder = r"YOUR_DIRECTORY/scans" +for image_file in os.listdir(scans_folder): + # Filter only supported image types + if not image_file.lower().endswith(('.png', '.jpg', '.tif')): + continue + + image_path = os.path.join(scans_folder, image_file) + + # Recognise text from the image + ocr_result = ocr_engine.recognize(image_path) + + # Apply the punctuation post‑processor + ocr_result = ocr_engine.run_postprocessor(ocr_result) + + # Show a brief confidence summary + print(f"{image_file} – confidence {ocr_result.confidence:.2%}") + + # Save the cleaned text next to the source image + txt_path = image_path + ".txt" + with open(txt_path, "w", encoding="utf-8") as txt_file: + txt_file.write(ocr_result.text) +``` + +### What to expect + +Running the script prints something like: + +``` +invoice_001.png – confidence 96.73% +receipt_2024.tif – confidence 94.12% +``` + +Each line reports a confidence score (a quick quality check), and the script creates files such as `invoice_001.png.txt`, `receipt_2024.tif.txt`, and so on, containing punctuated, human-readable text. + +### Edge cases and variations + +- **Scans in languages other than English**: Change `hugging_face_repo_id` to a multilingual model (e.g. `microsoft/Multilingual-LLM-GGUF`). +- **Large batches**: Wrap the loop in a `concurrent.futures.ThreadPoolExecutor` for parallel processing, but keep GPU memory limits in mind. +- **Custom post-processing**: Replace `"punctuation_adder"` with your own script if you need domain-specific cleanup (for example, removing invoice numbers). 
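
The "large batches" variation can be sketched with the standard library alone. This is a hedged illustration, not a drop-in replacement for the loop above: `supported_scans` and `process_in_parallel` are hypothetical helper names, and `process_one` stands in for whatever function runs OCR on a single file. The tutorial does not say whether one engine instance can be shared across threads, so the sketch assumes `process_one` is safe to call concurrently.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

SUPPORTED_SUFFIXES = {".png", ".jpg", ".tif"}


def supported_scans(names):
    """Keep only file names whose extension is a supported image type."""
    return [n for n in names if Path(n).suffix.lower() in SUPPORTED_SUFFIXES]


def process_in_parallel(names, process_one, max_workers=2):
    """Run process_one on each name in a thread pool; results keep input order."""
    names = list(names)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in the same order as the inputs
        return dict(zip(names, pool.map(process_one, names)))


if __name__ == "__main__":
    scans = supported_scans(["invoice_001.png", "notes.txt", "receipt_2024.TIF"])
    # The lambda stands in for the real OCR call
    print(process_in_parallel(scans, lambda name: f"text of {name}"))
```

A small `max_workers` value is a cautious default when the work is GPU-bound, since more threads do not add GPU memory.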
+ +## Step 6: Release resources + +When the job finishes, releasing resources prevents memory leaks, which matters most if you run this in a long-running service. + +```python +# Step 6: Release resources +ai_engine.free_resources() +ocr_engine.dispose() +``` + +Skipping this step can leave GPU memory allocated, which gets in the way of subsequent runs. + +## Summary: How to run OCR from start to finish + +In just a few lines we showed **how to run OCR** on a folder of scans, **use a Hugging Face model** that downloads itself on first run, and **recognize text from scans** with punctuation added automatically. The full script is ready to copy; adjust the paths and run it. + +## Next steps and related topics + +- **Batch processing**: Look into `ocr_engine.run_batch_postprocessor` for faster handling of large volumes. +- **Alternative models**: Try the `openai/whisper` family if you need speech-to-text alongside OCR. +- **Database integration**: Store the extracted text in SQLite or Elasticsearch to build searchable archives. + +Feel free to experiment: swap the model, adjust `gpu_layers`, or add your own post-processor. The flexibility of Aspose OCR combined with the Hugging Face model hub makes a versatile base for any document digitization project. + +--- + +*Happy coding! 
If you run into a problem, leave a comment below or check the Aspose OCR documentation for more advanced configuration options.* + +{{< /blocks/products/pf/tutorial-page-section >}} +{{< /blocks/products/pf/main-container >}} +{{< /blocks/products/pf/main-wrap-class >}} +{{< blocks/products/products-backtop-button >}} \ No newline at end of file diff --git a/ocr/polish/python/general/perform-ocr-on-image-with-python-complete-guide/_index.md b/ocr/polish/python/general/perform-ocr-on-image-with-python-complete-guide/_index.md new file mode 100644 index 000000000..d2fda4671 --- /dev/null +++ b/ocr/polish/python/general/perform-ocr-on-image-with-python-complete-guide/_index.md @@ -0,0 +1,209 @@ +--- +category: general +date: 2026-04-29 +description: Perform OCR on an image with Python, automatically download a model + from HuggingFace, and release GPU memory efficiently while cleaning up OCR text. +draft: false +keywords: +- perform OCR on image +- download HuggingFace model python +- release GPU memory python +- clean OCR text python +language: pl +og_description: Learn how to perform OCR on an image in Python, automatically + download a model from HuggingFace, clean the text, and release GPU memory. +og_title: Perform OCR on an Image with Python – Step-by-Step Guide +tags: +- OCR +- Python +- Aspose +- HuggingFace +title: Perform OCR on an Image in Python – Complete Guide +url: /pl/python/general/perform-ocr-on-image-with-python-complete-guide/ +--- + +{{< blocks/products/pf/main-wrap-class >}} +{{< blocks/products/pf/main-container >}} +{{< blocks/products/pf/tutorial-page-section >}} + +# Perform OCR on an Image in Python – Complete Guide + +Have you ever needed to **perform OCR on an image** and got stuck downloading the model or freeing GPU memory? You're not alone: many developers hit this problem the first time they combine optical character recognition with large language models. 
+ +In this tutorial we walk through a single, end-to-end solution that **downloads a HuggingFace model in Python**, runs Aspose OCR, cleans up the raw output, and finally **releases GPU memory**. When you are done, you will have a ready-to-run script that turns a scanned PNG into polished, searchable text. + +> **What you get:** a complete working code example, explanations of why each step matters, tips for avoiding common pitfalls, and an overview of ways to adapt the pipeline to your own projects. + +--- + +## What you will need + +- Python 3.9 or newer (the example was tested on 3.11) +- The `aspose-ocr` package (install with `pip install aspose-ocr`) +- An internet connection for the model download step +- A CUDA-compatible GPU if you want faster processing (optional but recommended) + +No additional system dependencies are required; the Aspose OCR engine bundles everything it needs. + +--- + +![perform OCR on image example](image.png "Example of performing OCR on image with Aspose OCR and an LLM post‑processor") + +*Image alt text: "perform OCR on image – Aspose OCR output before and after AI cleaning"* + +--- + +## Perform OCR on an image – step-by-step overview + +Below we break the workflow into logical parts. Each part has its own heading, so readers, AI assistants, and search engines can quickly find the step they are interested in. + +### 1. Download the HuggingFace model in Python + +The first thing to do is fetch the language model that will act as a post-processor for the raw OCR output. Aspose OCR provides a helper class, `AsposeAI`, that can download a model from a HuggingFace repository automatically. 
+ +```python +import aspose.ocr as aocr +from aspose.ocr import AsposeAI, AsposeAIModelConfig, OcrEngine + +# Configure the model – it will auto‑download the first time you run it +model_config = AsposeAIModelConfig( + allow_auto_download="true", # <-- enables auto‑download + hugging_face_repo_id="Qwen/Qwen2.5-3B-Instruct-GGUF", + hugging_face_quantization="int8", # smaller footprint on GPU + gpu_layers=20, # how many layers stay on GPU + directory_model_path=r"YOUR_DIRECTORY" # where the model files live +) +``` + +**Why this matters:** +- Auto-download means you avoid handling zip files and token authentication by hand. +- `int8` quantization shrinks the model to roughly a quarter of its original size, which keeps the GPU memory footprint low. + +> **Tip:** Keep `directory_model_path` on an SSD to speed up loading. + +--- + +### 2. Initialize the AI helper and enable spell checking + +Now we create an `AsposeAI` instance and attach a spell-correction post-processor. This is where cleaning the OCR text begins. + +```python +# Initialise the AI helper +ai_helper = AsposeAI() +ai_helper.set_post_processor( + processor="spell_corrector", + custom_settings={"max_edits": 2} # allows up to two character edits per word +) +``` + +**Explanation:** +The spell corrector examines each token from the OCR engine and suggests corrections limited by the `max_edits` parameter. This small change can turn "rec0gn1tion" into "recognition" without needing a heavy language model. + +--- + +### 3. Attach the AI helper to the OCR engine + +Aspose version 23.4 introduced a new method that lets you attach the AI engine directly to the OCR pipeline. 
+ +```python +# Initialise the OCR engine and attach the AI helper +ocr_engine = OcrEngine() +ocr_engine.set_ai_engine(ai_helper) # new in v23.4 +``` + +**Why we do this:** +With the AI helper attached early, the OCR engine can optionally use the model during recognition (for example, for layout detection). It also keeps the code tidy, because you don't need separate post-processing loops later. + +--- + +### 4. Perform OCR on the scanned image + +This is the key step that actually **performs OCR on the image** file. Replace `YOUR_DIRECTORY/input.png` with the path to your own scan. + +```python +image_path = r"YOUR_DIRECTORY/input.png" +ocr_result = ocr_engine.recognize(image_path) + +print("Raw OCR text:") +print(ocr_result.text) +``` + +Typical raw output can contain unexpected line breaks, misrecognized characters, or stray symbols. That is why the next step is needed. + +**Sample raw output (example):** + +``` +Th1s 1s 4n ex4mpl3 0f r4w OCR t3xt. +It c0ntains numb3rs 123 and s0me m1stakes. +``` + +--- + +### 5. Clean the OCR text in Python with the AI post-processor + +Now we let the AI tidy up. This is the core of the text-cleaning process. + +```python +cleaned_result = ocr_engine.run_postprocessor(ocr_result) + +print("\nAI‑enhanced text:") +print(cleaned_result.text) +``` + +**The result you will see:** + +``` +This is an example of raw OCR text. +It contains numbers 123 and some mistakes. +``` + +Note that the spell corrector fixed "Th1s" → "This" and "4n" → "an". The model also normalizes spacing, which is often a problem when you later feed the text into downstream NLP pipelines. + +--- + +### 6. Release GPU memory in Python – cleanup steps + +When you are finished, it is worth releasing GPU resources, especially if you run many OCR jobs in a long-running service. 
+ +```python +# Release resources – crucial for GPU memory +ai_helper.free_resources() +ocr_engine.dispose() +``` + +**What happens under the hood:** +`free_resources()` unloads the model from the GPU and returns the memory to the CUDA driver. `dispose()` releases the OCR engine's internal buffers. Skipping these calls can lead to out-of-memory errors after only a few images. + +> **Remember:** If you plan to process batches in a loop, either run the cleanup after each batch or reuse the same `ai_helper` and release it only at the end. + +--- + +## Bonus: Adapting the pipeline to different scenarios + +### Adjusting model quantization + +If you have a powerful GPU (for example, an RTX 4090) and need higher accuracy, change `hugging_face_quantization` to `"fp16"` and raise `gpu_layers` to `30`. This increases memory use, so you will need to release GPU memory more aggressively after each batch. + +### Using your own spell corrector + +You can replace the built-in `spell_corrector` with your own post-processor that applies domain-specific corrections (for example, medical terminology). Implement the required interface and pass its name to `set_post_processor`. + +### Batch-processing multiple images + +Put the OCR steps in a `for` loop, collect `cleaned_result.text` in a list, and call `ai_helper.free_resources()` only after the loop finishes if you have enough GPU memory. This reduces the overhead of loading the model repeatedly. + +--- + +## Summary + +We showed how to **perform OCR on an image** in Python, automatically **download a HuggingFace model**, **clean the OCR text**, and safely **release GPU memory** when you are done. The full script is ready to copy and paste, and the explanations should give you the confidence to adapt it to larger projects. + +What next? 
Try swapping the Qwen 2.5 model for a larger LLaMA variant, experiment with different post-processors, or feed the cleaned output into a searchable Elasticsearch index. You now have a solid foundation to build on. + +Happy coding, and may your OCR pipelines always be clean and memory-friendly! + +{{< /blocks/products/pf/tutorial-page-section >}} +{{< /blocks/products/pf/main-container >}} +{{< /blocks/products/pf/main-wrap-class >}} +{{< blocks/products/products-backtop-button >}} \ No newline at end of file diff --git a/ocr/portuguese/python/general/extract-text-from-pdf-ocr-pdf-with-python/_index.md b/ocr/portuguese/python/general/extract-text-from-pdf-ocr-pdf-with-python/_index.md new file mode 100644 index 000000000..541881f10 --- /dev/null +++ b/ocr/portuguese/python/general/extract-text-from-pdf-ocr-pdf-with-python/_index.md @@ -0,0 +1,217 @@ +--- +category: general +date: 2026-04-29 +description: Extract text from PDF using Aspose OCR in Python. Learn batch + OCR PDF processing, convert scanned PDF text, and handle low-confidence + pages. +draft: false +keywords: +- extract text from pdf +- ocr pdf with python +- convert scanned pdf text +- batch ocr pdf processing +language: pt +og_description: Extract text from PDF with Aspose OCR in Python. This guide covers + batch OCR PDF processing, converting scanned PDF text, and + handling low-confidence results. +og_title: Extract Text from PDF – OCR PDF with Python +tags: +- OCR +- Python +- PDF processing +title: Extract Text from PDF – OCR PDF with Python +url: /pt/python/general/extract-text-from-pdf-ocr-pdf-with-python/ +--- + +{{< blocks/products/pf/main-wrap-class >}} +{{< blocks/products/pf/main-container >}} +{{< blocks/products/pf/tutorial-page-section >}} + +# Extract Text from PDF – OCR PDF with Python + +Have you ever needed to **extract text from PDF** when the file is just a scanned image? 
You're not alone: many developers hit this wall when trying to turn PDFs into searchable data. The good news is that with Aspose OCR for Python you can convert scanned PDF text in a few lines, and even run **batch OCR PDF processing** when you have dozens of files to handle. + +In this tutorial we walk through the whole workflow: setting up the library, running OCR on a single PDF, scaling up to a batch, and handling low-confidence pages so you know when a manual review is needed. By the end you will have a ready-to-run script that extracts text from any scanned PDF, and you will understand the reason for each step. + +## What you will need + +- Python 3.8 or newer (the code uses f‑strings, so 3.6+ works, but 3.8+ is recommended) +- An Aspose OCR for Python license or a free trial key (you can get one from the Aspose website) +- A folder with one or more scanned PDFs that you want to process +- A moderate amount of disk space for the generated *.txt* reports + +That's it: no heavy external dependencies and no OpenCV gymnastics. The Aspose OCR engine does the heavy lifting for you. + +## Setting up the environment + +First, install the Aspose OCR package from PyPI: + +```bash +pip install aspose-ocr +``` + +If you have a license file (`Aspose.OCR.lic`), put it in the root of your project and activate it like this: + +```python +# Activate Aspose OCR license (optional but removes trial watermark) +from aspose.ocr import License + +license = License() +license.set_license("Aspose.OCR.lic") +``` + +> **Pro tip:** Keep the license file out of version control; add it to `.gitignore` to avoid accidental exposure. + +## Running OCR on a single PDF + +Now let's extract text from a single scanned PDF. The main steps are: + +1. Create an `OcrEngine` instance. +2. Point it at the PDF file. +3. Retrieve one `OcrResult` per page. 
+4. Grave a saída de texto simples no disco. +5. Descarte o engine para liberar recursos nativos. + +Aqui está o script completo e executável: + +```python +# Step 1: Import the OCR engine and create an instance +from aspose.ocr import OcrEngine + +# Optional: activate license (see previous section) +# from aspose.ocr import License +# License().set_license("Aspose.OCR.lic") + +ocr_engine = OcrEngine() + +# Step 2: Specify the PDF file to be processed +pdf_file_path = r"YOUR_DIRECTORY/input.pdf" + +# Step 3: Extract OCR results – one OcrResult object per page +ocr_pages = ocr_engine.extract_from_pdf(pdf_file_path) + +# Step 4: Iterate through the results, show confidence, flag low‑confidence pages, and save the plain text +for ocr_page in ocr_pages: + print(f"Page {ocr_page.page_number}: confidence {ocr_page.confidence:.2%}") + + # If confidence is below 80 %, flag it for manual review + if ocr_page.confidence < 0.80: + print(" Low confidence – consider manual review") + + # Build the output filename + output_txt = f"YOUR_DIRECTORY/Report_page{ocr_page.page_number}.txt" + with open(output_txt, "w", encoding="utf-8") as f: + f.write(ocr_page.text) + +# Step 5: Release resources used by the engine +ocr_engine.dispose() +``` + +**O que você verá:** Para cada página o script imprime algo como `Page 1: confidence 97.45%`. Se uma página ficar abaixo do limite de 80 %, um aviso aparece, informando que o OCR pode ter perdido caracteres. + +### Por que isso funciona + +- **`OcrEngine`** é o gateway para a biblioteca nativa Aspose OCR; ele lida com tudo, desde o pré‑processamento de imagens até o reconhecimento de caracteres. +- **`extract_from_pdf`** rasteriza automaticamente cada página do PDF, então você não precisa converter o PDF em imagens manualmente. +- **Pontuações de confiança** permitem automatizar verificações de qualidade—crítico quando você está processando documentos legais ou médicos onde a precisão importa. 
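
The confidence check described above can be factored into a small helper that separates pages needing manual review from pages that pass. This is a minimal sketch, assuming each page result exposes `page_number`, `confidence`, and `text` as in the script above; `PageResult` and `split_by_confidence` are hypothetical names for illustration and are not part of Aspose OCR.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass
class PageResult:
    """Hypothetical stand-in for one OCR page result (not an Aspose class)."""
    page_number: int
    confidence: float  # expected range: 0.0 to 1.0
    text: str


def split_by_confidence(
    pages: Iterable[PageResult], threshold: float = 0.80
) -> Tuple[List[PageResult], List[PageResult]]:
    """Return (accepted, needs_review) using the same cutoff as the script."""
    accepted: List[PageResult] = []
    needs_review: List[PageResult] = []
    for page in pages:
        # Pages strictly below the threshold are queued for manual review.
        if page.confidence < threshold:
            needs_review.append(page)
        else:
            accepted.append(page)
    return accepted, needs_review


sample = [
    PageResult(1, 0.97, "Invoice 2023"),
    PageResult(2, 0.62, "Tot4l: 1O0"),
    PageResult(3, 0.80, "Thank you"),
]
accepted, needs_review = split_by_confidence(sample)
print([p.page_number for p in accepted])      # pages at or above the cutoff
print([p.page_number for p in needs_review])  # pages to re-check by hand
```

Keeping the cutoff as a parameter makes it easy to tighten for sensitive documents without touching the loop.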
+ +## Processamento em Lote de OCR PDF com Python + +A maioria dos projetos do mundo real envolve mais de um arquivo. Vamos estender o script de arquivo único para um **processamento em lote de OCR PDF** que percorre um diretório, processa cada PDF e armazena os resultados em uma sub‑pasta correspondente. + +```python +import os +from pathlib import Path +from aspose.ocr import OcrEngine + +def ocr_pdf_file(pdf_path: Path, output_dir: Path, engine: OcrEngine, confidence_threshold: float = 0.80): + """Extract text from a single PDF and write per‑page .txt files.""" + ocr_pages = engine.extract_from_pdf(str(pdf_path)) + for page in ocr_pages: + print(f"{pdf_path.name} – Page {page.page_number}: {page.confidence:.2%}") + if page.confidence < confidence_threshold: + print(" ⚠️ Low confidence – manual review may be needed") + + out_file = output_dir / f"{pdf_path.stem}_page{page.page_number}.txt" + out_file.write_text(page.text, encoding="utf-8") + +def batch_ocr_pdf(input_folder: str, output_folder: str): + """Iterate over all PDFs in input_folder and run OCR.""" + engine = OcrEngine() + input_path = Path(input_folder) + output_path = Path(output_folder) + output_path.mkdir(parents=True, exist_ok=True) + + pdf_files = list(input_path.glob("*.pdf")) + if not pdf_files: + print("🚫 No PDF files found in the specified folder.") + return + + for pdf_file in pdf_files: + # Create a sub‑folder for each PDF to keep pages organized + pdf_out_dir = output_path / pdf_file.stem + pdf_out_dir.mkdir(exist_ok=True) + ocr_pdf_file(pdf_file, pdf_out_dir, engine) + + # Clean up native resources + engine.dispose() + print("✅ Batch OCR completed.") + +# Example usage: +if __name__ == "__main__": + batch_ocr_pdf("YOUR_DIRECTORY/input_pdfs", "YOUR_DIRECTORY/ocr_output") +``` + +#### Como isso ajuda + +- **Escalabilidade:** A função percorre a pasta uma vez, criando uma sub‑pasta de saída dedicada para cada PDF. Isso mantém tudo organizado quando você tem dezenas de documentos. 
+- **Reutilização:** `ocr_pdf_file` pode ser chamada a partir de outros scripts (por exemplo, um serviço web) porque é uma função pura. +- **Tratamento de erros:** O script imprime uma mensagem amigável se a pasta de entrada estiver vazia, evitando falhas silenciosas. + +## Convertendo Texto de PDF Escaneado – Lidando com Casos Limite + +Embora o código acima funcione para a maioria dos PDFs, você pode encontrar algumas particularidades: + +| Situação | Por que acontece | Como mitigar | +|-----------|------------------|--------------| +| **PDFs criptografados** | O PDF está protegido por senha. | Passe a senha para `extract_from_pdf(pdf_path, password="yourPwd")`. | +| **Documentos multilíngues** | Aspose OCR tem como padrão o inglês. | Defina `ocr_engine.language = "spa"` para espanhol, ou forneça uma lista para idiomas mistos. | +| **PDFs muito grandes (>500 páginas)** | O uso de memória aumenta porque cada página é carregada na RAM. | Processe o PDF em blocos usando `engine.extract_from_pdf(pdf_path, start_page=1, end_page=100)` e faça loop. | +| **Qualidade de escaneamento ruim** | DPI baixo ou muito ruído reduz a confiança. | Pré‑procese o PDF com `engine.image_preprocessing = True` ou aumente o DPI via `engine.dpi = 300`. | + +> **Cuidado:** Ativar o pré‑processamento de imagem pode aumentar o tempo de CPU de forma perceptível. Se você estiver executando um lote noturno, agende tempo suficiente ou inicie um worker separado. + +## Verificando a Saída + +Depois que o script terminar, você encontrará uma estrutura de pastas como: + +``` +ocr_output/ +├─ invoice_2023/ +│ ├─ invoice_2023_page1.txt +│ ├─ invoice_2023_page2.txt +│ └─ … +└─ contract_A/ + ├─ contract_A_page1.txt + └─ … +``` + +Abra qualquer arquivo `.txt`; você deverá ver texto limpo, codificado em UTF‑8, que espelha o conteúdo escaneado original. Se notar caracteres estranhos, verifique as configurações de idioma do PDF e assegure‑se de que os pacotes de fontes corretos estejam instalados na máquina. 
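
Opening every file by hand does not scale, so the check above can be partly automated. The sketch below only looks for `.txt` files that are empty or not valid UTF-8; `find_suspect_outputs` is a hypothetical helper name, and the demo folder is a throwaway copy of the layout shown above rather than real OCR output.

```python
import tempfile
from pathlib import Path
from typing import List


def find_suspect_outputs(output_root: Path) -> List[Path]:
    """Return .txt files that are empty or cannot be decoded as UTF-8."""
    suspects: List[Path] = []
    for txt_file in sorted(output_root.rglob("*.txt")):
        try:
            content = txt_file.read_text(encoding="utf-8")
        except UnicodeDecodeError:
            suspects.append(txt_file)
            continue
        if not content.strip():
            suspects.append(txt_file)
    return suspects


# Demo on a temporary folder that mimics the output structure.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "ocr_output"
    pdf_dir = root / "invoice_2023"
    pdf_dir.mkdir(parents=True)
    (pdf_dir / "invoice_2023_page1.txt").write_text("Total: 100", encoding="utf-8")
    (pdf_dir / "invoice_2023_page2.txt").write_text("   ", encoding="utf-8")
    suspect_names = [p.name for p in find_suspect_outputs(root)]

print(suspect_names)
```

A check like this will not catch misrecognized words, so it complements the confidence scores rather than replacing them.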
+ +## Liberando Recursos + +Aspose OCR depende de DLLs nativas, portanto é essencial chamar `engine.dispose()` assim que terminar. Esquecer esse passo pode causar vazamentos de memória, especialmente em jobs de lote de longa duração. + +```python +# Always the last line of your script +engine.dispose() +``` + +## Exemplo Completo de ponta a ponta + +Juntando tudo, aqui está um único + +{{< /blocks/products/pf/tutorial-page-section >}} +{{< /blocks/products/pf/main-container >}} +{{< /blocks/products/pf/main-wrap-class >}} +{{< blocks/products/products-backtop-button >}} \ No newline at end of file diff --git a/ocr/portuguese/python/general/how-to-recognize-handwriting-in-python-full-tutorial/_index.md b/ocr/portuguese/python/general/how-to-recognize-handwriting-in-python-full-tutorial/_index.md new file mode 100644 index 000000000..bd6f074b7 --- /dev/null +++ b/ocr/portuguese/python/general/how-to-recognize-handwriting-in-python-full-tutorial/_index.md @@ -0,0 +1,278 @@ +--- +category: general +date: 2026-04-29 +description: Aprenda a reconhecer escrita à mão em Python com o Aspose OCR. Este guia + passo a passo mostra como extrair texto manuscrito de forma eficiente. +draft: false +keywords: +- how to recognize handwriting +- extract handwritten text +- handwritten text recognition python +- handwritten ocr tutorial python +language: pt +og_description: Como reconhecer escrita à mão em Python? Siga este guia completo para + extrair texto manuscrito usando o Aspose OCR, com código, dicas e tratamento de + casos extremos. 
+og_title: Como Reconhecer a Caligrafia em Python – Tutorial Completo +tags: +- OCR +- Python +- HandwritingRecognition +title: Como reconhecer escrita à mão em Python – Tutorial completo +url: /pt/python/general/how-to-recognize-handwriting-in-python-full-tutorial/ +--- + +{{< blocks/products/pf/main-wrap-class >}} +{{< blocks/products/pf/main-container >}} +{{< blocks/products/pf/tutorial-page-section >}} + +# Como Reconhecer Escrita Manual em Python – Tutorial Completo + +Já precisou **como reconhecer escrita manual** em um projeto Python, mas não sabia por onde começar? Você não está sozinho—desenvolvedores perguntam constantemente: “Posso extrair texto de uma nota escaneada?” A boa notícia é que as bibliotecas modernas de OCR tornam isso muito fácil. Neste guia, vamos percorrer **como reconhecer escrita manual** usando Aspose OCR, e você também aprenderá a **extrair texto manuscrito** de forma confiável. + +Cobriremos tudo, desde a instalação da biblioteca até o ajuste dos limites de confiança para aqueles scripts cursivos bagunçados. Ao final, você terá um script executável que imprime o texto extraído e uma pontuação geral de confiança—perfeito para apps de anotações, ferramentas de arquivamento ou simplesmente para saciar a curiosidade. Não é necessário ter experiência prévia com OCR; conhecimento básico de Python basta. + +--- + +## O Que Você Vai Precisar + +- **Python 3.9+** (a versão estável mais recente funciona melhor) +- **Aspose.OCR for Python via .NET** – instale com `pip install aspose-ocr` +- Uma **imagem manuscrita** (JPEG/PNG) que você deseja processar +- Opcional: um ambiente virtual para manter as dependências organizadas + +Se você já tem esses itens prontos, vamos começar. 
+ +![How to recognize handwriting example](/images/handwritten-sample.jpg "Exemplo de como reconhecer escrita manual") + +*(Texto alternativo: “exemplo de como reconhecer escrita manual mostrando uma nota manuscrita escaneada”)* + +--- + +## Etapa 1 – Instalar e Importar as Classes do Aspose OCR + +Primeiro de tudo, precisamos do próprio motor de OCR. A Aspose fornece uma API limpa que separa o reconhecimento de texto impresso do modo manuscrito. + +```python +# Install the package (run this in your terminal if you haven't already) +# pip install aspose-ocr + +# Import the required OCR classes +from aspose.ocr import OcrEngine, HandwritingMode +``` + +*Por que isso importa:* Importar `HandwritingMode` permite dizer ao motor que estamos lidando com **handwritten text recognition python** em vez de texto impresso, o que melhora drasticamente a precisão para traços cursivos. + +--- + +## Etapa 2 – Criar e Configurar o Motor OCR + +Agora criamos uma instância de `OcrEngine` e a configuramos para o modo manuscrito. Você também pode ajustar o limiar de confiança; valores mais baixos aceitam escrita trêmula, valores mais altos exigem entrada mais limpa. + +```python +# Step 2: Initialize the OCR engine for handwritten text +ocr_engine = OcrEngine() +ocr_engine.set_recognition_mode(HandwritingMode.HANDWRITTEN) + +# Optional: tighten the confidence threshold for cursive scripts +# Default is 0.5; 0.65 works well for most notebooks +ocr_engine.set_handwriting_confidence_threshold(0.65) +``` + +*Dica profissional:* Se suas notas forem escaneadas a 300 DPI ou mais, geralmente você obtém uma pontuação melhor. Para imagens de baixa resolução, considere aumentar a escala com Pillow antes de enviá‑las ao motor. + +--- + +## Etapa 3 – Preparar o Caminho da Imagem + +Certifique‑se de que o caminho do arquivo aponta para a imagem que você deseja processar. Caminhos relativos funcionam bem, mas caminhos absolutos evitam surpresas de “arquivo não encontrado”. 
+ +```python +# Step 3: Point to your handwritten image +input_image_path = "YOUR_DIRECTORY/handwritten_sample.jpg" +``` + +*Armadilha comum:* Esquecer de escapar as barras invertidas no Windows (`C:\\folder\\image.jpg`). Usar strings brutas (`r"C:\folder\image.jpg"`) contorna esse problema. + +--- + +## Etapa 4 – Executar o Reconhecimento e Capturar os Resultados + +O método `recognize` faz o trabalho pesado. Ele retorna um objeto com as propriedades `.text` e `.confidence`. + +```python +# Step 4: Run OCR and get the result +result = ocr_engine.recognize(input_image_path) + +# Display the extracted text and confidence +print("Hand‑written extraction:") +print(result.text) +print("Overall confidence:", result.confidence) +``` + +**Saída esperada (exemplo):** + +``` +Hand‑written extraction: +Meeting notes: +- Discuss quarterly goals +- Review budget allocations +- Assign action items +Overall confidence: 0.78 +``` + +Se a confiança cair abaixo de 0,5, pode ser necessário limpar a imagem (remover sombras, aumentar contraste) ou reduzir o limiar na Etapa 2. + +--- + +## Etapa 5 – Liberar Recursos + +O Aspose OCR mantém recursos nativos; chamar `dispose()` os libera e evita vazamentos de memória, especialmente ao processar muitas imagens em um loop. + +```python +# Step 5: Release the engine resources +ocr_engine.dispose() +``` + +*Por que liberar?* Em serviços de longa duração (por exemplo, uma API Flask que aceita uploads), esquecer de liberar recursos pode esgotar rapidamente a memória do sistema. + +--- + +## Script Completo – Execução com Um Clique + +Juntando tudo, aqui está um script autocontido que você pode copiar‑colar e executar. 
+ +```python +# -*- coding: utf-8 -*- +""" +Handwritten OCR Tutorial – how to recognize handwriting in Python +Author: Your Name +Date: 2026-04-29 +""" + +# Install the library first if you haven't already: +# pip install aspose-ocr + +from aspose.ocr import OcrEngine, HandwritingMode + +def recognize_handwriting(image_path: str, confidence_threshold: float = 0.65): + """ + Recognizes handwritten text from an image using Aspose OCR. + + Args: + image_path (str): Path to the handwritten image file. + confidence_threshold (float): Minimum confidence to accept a word. + + Returns: + tuple: (extracted_text (str), overall_confidence (float)) + """ + # Initialize engine in handwritten mode + engine = OcrEngine() + engine.set_recognition_mode(HandwritingMode.HANDWRITTEN) + engine.set_handwriting_confidence_threshold(confidence_threshold) + + # Perform recognition + result = engine.recognize(image_path) + + # Capture output + extracted = result.text + confidence = result.confidence + + # Clean up + engine.dispose() + return extracted, confidence + +if __name__ == "__main__": + # Replace with your actual file location + img_path = "YOUR_DIRECTORY/handwritten_sample.jpg" + + text, conf = recognize_handwriting(img_path) + + print("Hand‑written extraction:") + print(text) + print("Overall confidence:", conf) +``` + +Salve como `handwritten_ocr.py` e execute `python handwritten_ocr.py`. Se tudo estiver configurado corretamente, você verá o texto extraído impresso no console. 
+
+---
+
+## Handling Edge Cases and Common Variations
+
+### Low-Contrast Images
+If the ink blends into the background, increase the contrast first:
+
+```python
+from PIL import Image, ImageEnhance
+
+img = Image.open(input_image_path)
+enhancer = ImageEnhance.Contrast(img)
+high_contrast = enhancer.enhance(2.0)   # 2× contrast
+high_contrast.save("temp_high_contrast.jpg")
+result = ocr_engine.recognize("temp_high_contrast.jpg")
+```
+
+### Rotated Notes
+A tilted notebook page can throw off recognition. Use OpenCV to correct the skew:
+
+```python
+import cv2
+import numpy as np
+
+def deskew(image_path):
+    img = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
+    if img is None:
+        raise FileNotFoundError(image_path)
+    # Invert and threshold so the ink (not the paper) is non-zero
+    ink = cv2.threshold(cv2.bitwise_not(img), 0, 255,
+                        cv2.THRESH_BINARY | cv2.THRESH_OTSU)[1]
+    # minAreaRect expects int32 or float32 points
+    coords = np.column_stack(np.where(ink > 0)).astype(np.float32)
+    angle = cv2.minAreaRect(coords)[-1]
+    # The angle range returned by minAreaRect differs between OpenCV
+    # releases; verify the sign on a known sample before relying on it.
+    if angle < -45:
+        angle = -(90 + angle)
+    else:
+        angle = -angle
+    (h, w) = img.shape[:2]
+    M = cv2.getRotationMatrix2D((w // 2, h // 2), angle, 1.0)
+    corrected = cv2.warpAffine(img, M, (w, h), flags=cv2.INTER_CUBIC,
+                               borderMode=cv2.BORDER_REPLICATE)
+    cv2.imwrite("deskewed.jpg", corrected)
+
+deskew(input_image_path)
+result = ocr_engine.recognize("deskewed.jpg")
+```
+
+### Multi-Page PDFs
+To run this workflow on a PDF, convert each page to an image first (for example, with `pdf2image`), then loop over the images with the same `recognize_handwriting` function.
+
+---
+
+## Pro Tips for Better Results When Extracting Handwritten Text
+
+- **DPI matters:** Aim for 300 DPI or higher when scanning.
+- **Avoid colored backgrounds:** Pure white or light gray produces the cleanest output.
+- **Batch processing:** Wrap the function in a `for` loop and log the confidence of each page; discard results below a threshold to keep quality high.
+- **Language support:** Aspose OCR supports multiple languages; set `engine.set_language("en")` to optimize for English only.
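
The batch-processing tip in the list above can be sketched as a loop that logs each confidence score and keeps only results at or above a cutoff. This is a minimal sketch under assumptions: `recognize` is any callable that returns `(text, confidence)`, such as the `recognize_handwriting` function from the full script; `batch_recognize` and `fake_recognize` are hypothetical names used only so the example runs without the OCR library.

```python
import logging
from pathlib import Path
from typing import Callable, Dict, Iterable, Tuple

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("handwriting_batch")

# Any function that maps an image path to (text, confidence).
Recognizer = Callable[[str], Tuple[str, float]]


def batch_recognize(
    image_paths: Iterable[Path],
    recognize: Recognizer,
    min_confidence: float = 0.5,
) -> Dict[str, str]:
    """Run `recognize` on each image and keep results at or above the cutoff."""
    kept: Dict[str, str] = {}
    for path in image_paths:
        text, confidence = recognize(str(path))
        log.info("%s: confidence %.2f", path.name, confidence)
        if confidence >= min_confidence:
            kept[path.name] = text
        else:
            log.warning("%s dropped (below %.2f)", path.name, min_confidence)
    return kept


def fake_recognize(image_path: str) -> Tuple[str, float]:
    """Hypothetical stand-in that returns canned scores instead of running OCR."""
    scores = {"clean.jpg": 0.82, "smudged.jpg": 0.31}
    name = Path(image_path).name
    return f"text from {name}", scores.get(name, 0.0)


results = batch_recognize(
    [Path("notes/clean.jpg"), Path("notes/smudged.jpg")], fake_recognize
)
print(sorted(results))
```

In a real run you would pass `recognize_handwriting` and a list such as `sorted(Path("scans").glob("*.jpg"))` instead of the stand-ins.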
+ +--- + +## Perguntas Frequentes + +**Isso funciona no Linux?** +Sim—o Aspose OCR vem com binários nativos para Windows, macOS e Linux. Basta instalar o pacote pip e está tudo pronto. + +**E se minha escrita for extremamente cursiva?** +Tente baixar o limiar de confiança (`0.5` ou até `0.4`). Lembre‑se de que isso pode introduzir mais ruído, então pós‑procese a saída (por exemplo, correção ortográfica) se necessário. + +**Posso usar isso em um serviço web?** +Com certeza. A função `recognize_handwriting` é sem estado, tornando‑a perfeita para endpoints Flask ou FastAPI. Apenas lembre‑se de chamar `dispose()` após cada requisição ou usar um gerenciador de contexto. + +--- + +## Conclusão + +Cobremos **como reconhecer escrita manual** em Python do início ao fim, mostrando como **extrair texto manuscrito**, ajustar configurações de confiança e lidar com armadilhas comuns como baixo contraste ou páginas rotacionadas. O script completo acima está pronto para ser executado, e a função modular facilita a integração em projetos maiores—seja construindo um app de anotações, digitalizando arquivos ou apenas experimentando técnicas de **handwritten ocr tutorial python**. + +Em seguida, você pode explorar **handwritten text recognition python** para notas multilíngues, ou combinar OCR com processamento de linguagem natural para resumir automaticamente atas de reunião. O céu é o limite—experimente e deixe seu código dar vida aos rabiscos. + +Feliz codificação, e sinta‑se à vontade para deixar suas dúvidas nos comentários! 
+ +{{< /blocks/products/pf/tutorial-page-section >}} +{{< /blocks/products/pf/main-container >}} +{{< /blocks/products/pf/main-wrap-class >}} +{{< blocks/products/products-backtop-button >}} \ No newline at end of file diff --git a/ocr/portuguese/python/general/how-to-run-ocr-with-aspose-hugging-face-complete-guide/_index.md b/ocr/portuguese/python/general/how-to-run-ocr-with-aspose-hugging-face-complete-guide/_index.md new file mode 100644 index 000000000..c8ac82efc --- /dev/null +++ b/ocr/portuguese/python/general/how-to-run-ocr-with-aspose-hugging-face-complete-guide/_index.md @@ -0,0 +1,180 @@ +--- +category: general +date: 2026-04-29 +description: Aprenda a executar OCR nas suas digitalizações, usar o modelo Hugging + Face automaticamente e reconhecer texto das digitalizações com o Aspose OCR em minutos. +draft: false +keywords: +- how to run OCR +- use hugging face model +- recognize text from scans +- download model automatically +language: pt +og_description: Como executar OCR em digitalizações usando Aspose OCR, baixar automaticamente + um modelo do Hugging Face e obter texto limpo e pontuado. +og_title: Como executar OCR com Aspose e Hugging Face – Guia completo +tags: +- OCR +- Aspose +- Hugging Face +- Python +title: Como Executar OCR com Aspose e Hugging Face – Guia Completo +url: /pt/python/general/how-to-run-ocr-with-aspose-hugging-face-complete-guide/ +--- + +{{< blocks/products/pf/main-wrap-class >}} +{{< blocks/products/pf/main-container >}} +{{< blocks/products/pf/tutorial-page-section >}} + +# Como Executar OCR com Aspose & Hugging Face – Guia Completo + +Já se perguntou **como executar OCR** em uma pilha de documentos escaneados sem passar horas ajustando configurações? Você não está sozinho. Em muitos projetos, os desenvolvedores precisam **reconhecer texto de escaneamentos** rapidamente, mas tropeçam em downloads de modelos e no pós‑processamento. 
+
+Good news: this tutorial shows a ready-to-use solution that **uses a Hugging Face model**, downloads it automatically, and adds punctuation so the output reads as if a person had typed it. By the end, you will have a script that processes every image in a folder and writes a clean `.txt` file next to each scan.
+
+## What You Need
+
+- Python 3.8+ (the code uses f-strings, which require Python 3.6 or later; 3.8+ is recommended)
+- The `aspose-ocr` package (install it with `pip install aspose-ocr`)
+- Internet access for the first-time model download
+- A folder of scanned images (`.png`, `.jpg`, or `.tif`)
+
+That's it: no extra binaries and no manual model tweaking. Let's dive in.
+
+![how to run OCR example](https://example.com/ocr-demo.png "how to run OCR example")
+
+## Step 1: Import the Aspose OCR Classes & Set Up the Environment
+
+We start by importing the classes we need from the Aspose OCR library. Importing everything up front keeps the script tidy and makes missing dependencies easy to spot.
+
+```python
+# Step 1: Import Aspose OCR classes
+import os
+from aspose.ocr import OcrEngine, AsposeAI, AsposeAIModelConfig
+```
+
+*Why this matters*: `OcrEngine` does the heavy lifting, while `AsposeAI` lets us plug in a large language model for smarter post-processing. If you skip the import, the script fails with a `NameError` as soon as the first of these classes is referenced, so don't leave it out.
+
+## Step 2: Configure a GPU-Aware Hugging Face Model
+
+Now we tell Aspose where to fetch the model and how many layers should run on the GPU. The `allow_auto_download="true"` parameter takes care of **downloading the model automatically**.
+ +```python +# Step 2: Configure a GPU‑aware AI model (replace with your own model folder) +model_config = AsposeAIModelConfig( + allow_auto_download="true", + hugging_face_repo_id="Qwen/Qwen2.5-3B-Instruct-GGUF", + hugging_face_quantization="int8", + gpu_layers=40, # use GPU for faster inference + directory_model_path=r"YOUR_DIRECTORY/models" +) +``` + +> **Dica de especialista**: Se você não tem uma GPU, defina `gpu_layers=0`. O modelo usará a CPU, que é mais lenta mas ainda funciona. + +### Por que Escolher um Modelo Hugging Face? + +Hugging Face hospeda uma enorme coleção de LLMs prontos‑para‑usar. Ao apontar para `Qwen/Qwen2.5-3B-Instruct-GGUF`, você obtém um modelo compacto, ajustado por instruções, que pode adicionar pontuação, corrigir espaçamento e até corrigir pequenos erros de OCR. Essa é a essência de **usar modelo hugging face** na prática. + +## Etapa 3: Inicializar o Motor de IA e Habilitar o Pós‑Processamento de Pontuação + +O motor de IA não serve apenas para chats sofisticados—aqui anexamos um *adicionador de pontuação* que limpa a saída bruta do OCR. + +```python +# Step 3: Initialise the AI engine and enable punctuation post‑processing +ai_engine = AsposeAI() +ai_engine.set_post_processor("punctuation_adder", {}) +``` + +*O que está acontecendo?* A chamada `set_post_processor` registra um pós‑processador embutido que roda após o OCR terminar. Ele recebe a string bruta e insere vírgulas, pontos e letras maiúsculas onde cabem, tornando o texto final muito mais legível. + +## Etapa 4: Criar o Motor OCR e Anexar o Motor de IA + +Conectar o motor de IA ao motor OCR nos fornece um único objeto que pode ler caracteres e aprimorar o resultado. + +```python +# Step 4: Create the OCR engine and attach the AI engine +ocr_engine = OcrEngine() +ocr_engine.set_ai_engine(ai_engine) +``` + +Se você pular esta etapa, o OCR ainda funcionará, mas perderá o aprimoramento de pontuação—então a saída parecerá um fluxo de palavras. 
+ +## Etapa 5: Processar Cada Imagem em uma Pasta + +Aqui está o coração do tutorial. Percorremos cada imagem, executamos OCR, aplicamos o pós‑processador e gravamos o texto limpo em um arquivo `.txt` ao lado. + +```python +# Step 5: Run OCR on each image in a folder, post‑process the result, and save the text +scans_folder = r"YOUR_DIRECTORY/scans" +for image_file in os.listdir(scans_folder): + # Filter only supported image types + if not image_file.lower().endswith(('.png', '.jpg', '.tif')): + continue + + image_path = os.path.join(scans_folder, image_file) + + # Recognise text from the image + ocr_result = ocr_engine.recognize(image_path) + + # Apply the punctuation post‑processor + ocr_result = ocr_engine.run_postprocessor(ocr_result) + + # Show a brief confidence summary + print(f"{image_file} – confidence {ocr_result.confidence:.2%}") + + # Save the cleaned text next to the source image + txt_path = image_path + ".txt" + with open(txt_path, "w", encoding="utf-8") as txt_file: + txt_file.write(ocr_result.text) +``` + +### O que Esperar + +Executar o script imprime algo como: + +``` +invoice_001.png – confidence 96.73% +receipt_2024.tif – confidence 94.12% +``` + +Cada linha informa a pontuação de confiança (uma verificação rápida) e cria `invoice_001.png.txt`, `receipt_2024.tif.txt`, etc., contendo texto pontuado e legível por humanos. + +### Casos Limite & Variações + +- **Escaneamentos não‑ingleses**: Troque o `hugging_face_repo_id` para um modelo multilíngue (ex.: `microsoft/Multilingual-LLM-GGUF`). +- **Grandes lotes**: Envolva o loop em um `concurrent.futures.ThreadPoolExecutor` para processamento paralelo, mas fique atento aos limites de memória da GPU. +- **Pós‑processamento customizado**: Substitua `"punctuation_adder"` pelo seu próprio script se precisar de limpeza específica de domínio (ex.: remover números de fatura). 
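
The parallel-batch suggestion in the list above can be sketched with `concurrent.futures.ThreadPoolExecutor`. This is a sketch under assumptions: `ocr_one` is a hypothetical placeholder for the per-image work (recognize, post-process, save), and this guide does not state whether a single OCR engine instance can be shared safely across threads, so confirm that before parallelizing a real engine.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import Dict, List

# Same extensions the loop above filters on.
SUPPORTED = (".png", ".jpg", ".tif")


def ocr_one(image_path: Path) -> str:
    """Hypothetical placeholder: a real version would run OCR on the image."""
    return f"text of {image_path.name}"


def ocr_folder_parallel(image_paths: List[Path], max_workers: int = 4) -> Dict[str, str]:
    """Process supported images concurrently and map file name to text."""
    targets = [p for p in image_paths if p.suffix.lower() in SUPPORTED]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() yields results in the same order as `targets`.
        texts = list(pool.map(ocr_one, targets))
    return {p.name: t for p, t in zip(targets, texts)}


files = [
    Path("scans/invoice_001.png"),
    Path("scans/notes.docx"),        # skipped: unsupported extension
    Path("scans/receipt_2024.tif"),
]
result = ocr_folder_parallel(files)
print(result)
```

When the work is GPU-bound, a small `max_workers` value is the safer starting point, since every concurrent task competes for the same GPU memory.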
+ +## Etapa 6: Limpar Recursos + +Quando o trabalho termina, liberar recursos evita vazamentos de memória, especialmente importante se você estiver executando isso dentro de um serviço de longa duração. + +```python +# Step 6: Release resources +ai_engine.free_resources() +ocr_engine.dispose() +``` + +Negligenciar esta etapa pode deixar memória da GPU ocupada, o que sabotaria execuções subsequentes. + +## Recapitulação: Como Executar OCR de Ponta a Ponta + +Em apenas algumas linhas, demonstramos **como executar OCR** em uma pasta de escaneamentos, **usar um modelo Hugging Face** que se baixa automaticamente na primeira vez, e **reconhecer texto de escaneamentos** com pontuação adicionada automaticamente. O script completo está pronto para copiar‑colar, ajustar seus caminhos e executar. + +## Próximos Passos & Tópicos Relacionados + +- **Pós‑processamento em lote**: Explore `ocr_engine.run_batch_postprocessor` para um manuseio em massa ainda mais rápido. +- **Modelos alternativos**: Experimente a família `openai/whisper` se precisar de speech‑to‑text junto ao OCR. +- **Integração com bancos de dados**: Armazene o texto extraído no SQLite ou Elasticsearch para arquivos pesquisáveis. + +Sinta‑se à vontade para experimentar—troque o modelo, ajuste `gpu_layers`, ou adicione seu próprio pós‑processador. A flexibilidade do Aspose OCR combinada com o hub de modelos da Hugging Face torna isso uma base versátil para qualquer projeto de digitalização de documentos. + +--- + +*Feliz codificação! 
Se encontrar algum problema, deixe um comentário abaixo ou consulte a documentação do Aspose OCR para opções de configuração mais avançadas.* + +{{< /blocks/products/pf/tutorial-page-section >}} +{{< /blocks/products/pf/main-container >}} +{{< /blocks/products/pf/main-wrap-class >}} +{{< blocks/products/products-backtop-button >}} \ No newline at end of file diff --git a/ocr/portuguese/python/general/perform-ocr-on-image-with-python-complete-guide/_index.md b/ocr/portuguese/python/general/perform-ocr-on-image-with-python-complete-guide/_index.md new file mode 100644 index 000000000..b5d3ef66e --- /dev/null +++ b/ocr/portuguese/python/general/perform-ocr-on-image-with-python-complete-guide/_index.md @@ -0,0 +1,209 @@ +--- +category: general +date: 2026-04-29 +description: Realizar OCR em imagem usando Python, baixar automaticamente um modelo + do HuggingFace e liberar a memória da GPU de forma eficiente enquanto limpa o texto + OCR. +draft: false +keywords: +- perform OCR on image +- download HuggingFace model python +- release GPU memory python +- clean OCR text python +language: pt +og_description: Aprenda a fazer OCR em imagens com Python, baixe automaticamente um + modelo do HuggingFace, limpe o texto e libere a memória da GPU. +og_title: Realize OCR em Imagem com Python – Guia Passo a Passo +tags: +- OCR +- Python +- Aspose +- HuggingFace +title: Realizar OCR em Imagem com Python – Guia Completo +url: /pt/python/general/perform-ocr-on-image-with-python-complete-guide/ +--- + +{{< blocks/products/pf/main-wrap-class >}} +{{< blocks/products/pf/main-container >}} +{{< blocks/products/pf/tutorial-page-section >}} + +# Realizar OCR em Imagem com Python – Guia Completo + +Já precisou **perform OCR on image** arquivos mas ficou preso na etapa de download do modelo ou limpeza da memória da GPU? Você não é o único—muitos desenvolvedores encontram esse obstáculo quando tentam combinar reconhecimento óptico de caracteres com grandes modelos de linguagem. 
+
+In this tutorial, we walk through a single end-to-end solution that downloads a HuggingFace model from Python, runs Aspose OCR, cleans the raw output, and finally releases the GPU memory it used. By the end, you will have a ready-to-run script that turns a scanned PNG into polished, searchable text.
+
+> **What you will get:** a complete, runnable code example, explanations of why each step matters, tips for avoiding common pitfalls, and an overview of how to adapt the pipeline to your own projects.
+
+---
+
+## What You Will Need
+
+- Python 3.9 or later (the example was tested on 3.11)
+- The `aspose-ocr` package (install it with `pip install aspose-ocr`)
+- An internet connection for the model download step
+- A CUDA-capable GPU if you want the speed-up (optional, but recommended)
+
+No other system-level dependencies are required; the Aspose OCR engine bundles what it needs.
+
+---
+
+![perform OCR on image example](image.png "Example of performing OCR on image with Aspose OCR and an LLM post‑processor")
+
+*Image alt text: “perform OCR on image – Aspose OCR output before and after AI cleanup”*
+
+---
+
+## Perform OCR on an Image – Step-by-Step Overview
+
+Below, the workflow is split into logical blocks. Each block has its own heading so you can jump straight to the part you need.
+
+### 1. Download HuggingFace Model in Python
+
+The first thing we need is a language model that will act as a post-processor for the raw OCR output. Aspose OCR ships with a helper class called `AsposeAI` that can download a model from the HuggingFace hub automatically.
+
+```python
+from aspose.ocr import AsposeAI, AsposeAIModelConfig, OcrEngine
+
+# Configure the model: it auto-downloads the first time you run it
+model_config = AsposeAIModelConfig(
+    allow_auto_download="true",                 # <-- enables auto-download
+    hugging_face_repo_id="Qwen/Qwen2.5-3B-Instruct-GGUF",
+    hugging_face_quantization="int8",          # smaller footprint on GPU
+    gpu_layers=20,                             # how many layers stay on GPU
+    directory_model_path=r"YOUR_DIRECTORY"     # where the model files live
+)
+```
+
+**Why this matters:**
+- Auto-download means you do not have to handle zip archives or token authentication by hand.
+- `int8` quantization shrinks the model to roughly a quarter of the size of 32-bit weights (half the size of 16-bit weights), which leaves more GPU memory free for everything else.
+
+> **Pro tip:** Keep `directory_model_path` on an SSD for faster load times.
+
+---
+
+### 2. Initialize the AI Helper and Enable Spell Checking
+
+Now we create an `AsposeAI` instance and attach a spell-corrector post-processor. This is where cleaning the OCR text begins.
+
+```python
+# Initialise the AI helper
+# Note: as written, `model_config` from step 1 is never handed to the helper;
+# see the Aspose OCR API reference for how to supply it.
+ai_helper = AsposeAI()
+ai_helper.set_post_processor(
+    processor="spell_corrector",
+    custom_settings={"max_edits": 2}   # allows up to two character edits per word
+)
+```
+
+**Explanation:**
+The spell corrector examines each token produced by the OCR engine and suggests corrections limited by `max_edits`. This small adjustment can turn “rec0gn1tion” into “recognition” without a heavyweight language model.
+
+---
+
+### 3. Connect the AI Helper to the OCR Engine
+
+Aspose introduced a method in version 23.4 that lets you attach an AI engine directly to the OCR pipeline.
+
+```python
+# Initialise the OCR engine and attach the AI helper
+ocr_engine = OcrEngine()
+ocr_engine.set_ai_engine(ai_helper)   # new in v23.4
+```
+
+**Why we do this:**
+By attaching the AI helper early, the OCR engine can optionally use the model for on-the-fly improvements (for example, layout detection). It also keeps the code tidy: there is no need for separate post-processing loops later.
+
+---
+
+### 4. Perform OCR on the Scanned Image
+
+This is the core step that actually runs OCR on the image file. Replace `YOUR_DIRECTORY/input.png` with the path to your own scan.
+
+```python
+image_path = r"YOUR_DIRECTORY/input.png"
+ocr_result = ocr_engine.recognize(image_path)
+
+print("Raw OCR text:")
+print(ocr_result.text)
+```
+
+Typical raw output may contain line breaks in odd places, misrecognized characters, or stray symbols. That is why we need the next step.
+
+**Expected raw output (example):**
+
+```
+Th1s 1s 4n ex4mpl3 0f r4w OCR t3xt.
+It c0ntains numb3rs 123 and s0me m1stakes.
+```
+
+---
+
+### 5. Clean OCR Text in Python with the AI Post‑Processor
+
+Now we let the AI clean up the mess. This is the heart of the OCR text cleanup.
+
+```python
+cleaned_result = ocr_engine.run_postprocessor(ocr_result)
+
+print("\nAI‑enhanced text:")
+print(cleaned_result.text)
+```
+
+**The result you'll see:**
+
+```
+This is an example of raw OCR text.
+It contains numbers 123 and some mistakes.
+```
+
+Notice how the spell corrector fixed “Th1s” → “This” and turned the stray “4n” into “an”. The model also normalizes spacing, which is often a pain point when you later feed the text into downstream NLP pipelines.
+
+---
+
+### 6. Release GPU Memory in Python – Cleanup Steps
+
+When you are done, it is good practice to free GPU resources, especially if you run multiple OCR jobs in a long-lived service.
+
+```python
+# Release resources – crucial for GPU memory
+ai_helper.free_resources()
+ocr_engine.dispose()
+```
+
+**What happens behind the scenes:**
+`free_resources()` unloads the model from the GPU, returning the memory to the CUDA driver. `dispose()` tears down the OCR engine's internal buffers. Skipping these calls can lead to out-of-memory errors after just a few images.
+
+> **Remember:** If you plan to process batches in a loop, call the cleanup after each batch, or reuse the same `ai_helper` without freeing it until the very end.
+
+---
+
+## Bonus: Tuning the Pipeline for Different Scenarios
+
+### Adjusting Model Quantization
+
+If you have a powerful GPU (for example, an RTX 4090) and want higher accuracy, change `hugging_face_quantization` to `"fp16"` and raise `gpu_layers` to `30`. This consumes more memory, so you will need to release GPU memory more aggressively after each batch.
+
+### Using a Custom Spell Checker
+
+You can replace the built-in `spell_corrector` with a custom post-processor that makes domain-specific corrections (for example, medical terminology). Implement the required interface and pass its name to `set_post_processor`.
+
+### Batch Processing Multiple Images
+
+Wrap the OCR steps in a `for` loop, append `cleaned_result.text` to a list, and call `ai_helper.free_resources()` only after the loop if you have enough GPU RAM. This reduces the overhead of loading the model repeatedly.
+
+---
+
+## Conclusion
+
+We have shown how to **perform OCR on image** files in Python, automatically **download a HuggingFace model**, **clean OCR text**, and safely **release GPU memory** when you are done. The complete script is ready to copy and paste, and the explanations should give you the confidence to adapt it to larger projects.
+
+Next steps? Try swapping the Qwen 2.5 model for a larger LLaMA variant, experiment with different post-processors, or feed the cleaned output into a searchable Elasticsearch index. The possibilities are endless, and you now have a solid foundation to build on.
+
+Happy coding, and may your OCR pipelines always stay clean and memory-friendly!
+
+{{< /blocks/products/pf/tutorial-page-section >}}
+{{< /blocks/products/pf/main-container >}}
+{{< /blocks/products/pf/main-wrap-class >}}
+{{< blocks/products/products-backtop-button >}}
\ No newline at end of file
diff --git a/ocr/russian/python/general/extract-text-from-pdf-ocr-pdf-with-python/_index.md b/ocr/russian/python/general/extract-text-from-pdf-ocr-pdf-with-python/_index.md
new file mode 100644
index 000000000..b338a8184
--- /dev/null
+++ b/ocr/russian/python/general/extract-text-from-pdf-ocr-pdf-with-python/_index.md
@@ -0,0 +1,219 @@
+---
+category: general
+date: 2026-04-29
+description: Extract text from PDF with Aspose OCR in Python. Learn about batch OCR
+  processing of PDFs, converting text from scanned PDFs, and handling low-confidence
+  pages.
+draft: false
+keywords:
+- extract text from pdf
+- ocr pdf with python
+- convert scanned pdf text
+- batch ocr pdf processing
+language: ru
+og_description: Extract text from PDF with Aspose OCR in Python. This guide demonstrates
+  batch OCR processing of PDFs, converting the text of scanned PDFs, and handling
+  low-confidence results.
+og_title: Extract Text from PDF – OCR PDF with Python
+tags:
+- OCR
+- Python
+- PDF processing
+title: Extract Text from PDF – OCR PDF with Python
+url: /ru/python/general/extract-text-from-pdf-ocr-pdf-with-python/
+---
+
+{{< blocks/products/pf/main-wrap-class >}}
+{{< blocks/products/pf/main-container >}}
+{{< blocks/products/pf/tutorial-page-section >}}
+
+# Extract Text from PDF – OCR PDF with Python
+
+Ever needed to **extract text from a PDF**, only to find the file is just a scanned image? You're not alone: many developers hit this wall when trying to turn PDFs into searchable data. The good news? With Aspose OCR for Python you can convert scanned PDF text in a few lines of code, and even run **batch OCR PDF processing** when you have dozens of files.
+
+In this guide we'll walk through the whole process: setting up the library, running OCR on a single PDF, scaling up to batch processing, and handling low-confidence pages so you know when a manual review is needed. By the end you'll have a ready-to-run script that extracts text from any scanned PDF, and you'll understand the reason for each step.
+
+## What You'll Need
+
+Before you start, make sure you have:
+
+- Python 3.8 or newer (the code uses f‑strings, so it runs on 3.6+, but 3.8+ is recommended)
+- An Aspose OCR for Python license or a free trial key (available from the Aspose website)
+- A folder with one or more scanned PDFs to process
+- A moderate amount of disk space for the generated *.txt* reports
+
+That's it: no heavy external dependencies and no OpenCV tricks. The Aspose OCR engine does the heavy lifting for you.
+
+## Setting Up the Environment
+
+First, install the Aspose OCR package from PyPI:
+
+```bash
+pip install aspose-ocr
+```
+
+If you have a license file (`Aspose.OCR.lic`), place it in the project root and activate it like this:
+
+```python
+# Activate Aspose OCR license (optional but removes trial watermark)
+from aspose.ocr import License
+
+license = License()
+license.set_license("Aspose.OCR.lic")
+```
+
+> **Tip:** Keep the license file out of version control; add it to `.gitignore` to avoid accidental exposure.
+
+## Running OCR on a Single PDF
+
+Now let's extract the text from one scanned PDF. The main steps are:
+
+1. Create an `OcrEngine` instance.
+2. Point it at the PDF file.
+3. Get an `OcrResult` for each page.
+4. Write the resulting plain text to disk.
+5. Dispose of the engine to free native resources.
+
+Here is the complete, ready-to-run script:
+
+```python
+# Step 1: Import the OCR engine and create an instance
+from aspose.ocr import OcrEngine
+
+# Optional: activate license (see previous section)
+# from aspose.ocr import License
+# License().set_license("Aspose.OCR.lic")
+
+ocr_engine = OcrEngine()
+
+# Step 2: Specify the PDF file to be processed
+pdf_file_path = r"YOUR_DIRECTORY/input.pdf"
+
+# Step 3: Extract OCR results – one OcrResult object per page
+ocr_pages = ocr_engine.extract_from_pdf(pdf_file_path)
+
+# Step 4: Iterate through the results, show confidence, flag low‑confidence pages, and save the plain text
+for ocr_page in ocr_pages:
+    print(f"Page {ocr_page.page_number}: confidence {ocr_page.confidence:.2%}")
+
+    # If confidence is below 80 %, flag it for manual review
+    if ocr_page.confidence < 0.80:
+        print(" Low confidence – consider manual review")
+
+    # Build the output filename
+    output_txt = f"YOUR_DIRECTORY/Report_page{ocr_page.page_number}.txt"
+    with open(output_txt, "w", encoding="utf-8") as f:
+        f.write(ocr_page.text)
+
+# Step 5: Release resources used by the engine
+ocr_engine.dispose()
+```
+
+**What you'll see:** For each page the script prints something like `Page 1: confidence 97.45%`. If a page's confidence is below 80 %, a warning appears, indicating that OCR may have missed characters.
+
+### Why This Works
+
+- **`OcrEngine`** is the gateway to the native Aspose OCR library; it handles everything from image preprocessing to character recognition.
+- **`extract_from_pdf`** rasterizes each PDF page automatically, so you don't have to convert the PDF to images yourself.
+- **Confidence scores** let you automate quality checks, which is critical for legal or medical documents where accuracy matters.
+
+## Batch OCR PDF Processing with Python
+
+Most real projects deal with more than one file. Let's extend the single-file script into a **batch OCR PDF processing** script that walks a directory, processes each PDF, and saves the results in a matching subfolder.
+
+```python
+from pathlib import Path
+from aspose.ocr import OcrEngine
+
+def ocr_pdf_file(pdf_path: Path, output_dir: Path, engine: OcrEngine, confidence_threshold: float = 0.80):
+    """Extract text from a single PDF and write per‑page .txt files."""
+    ocr_pages = engine.extract_from_pdf(str(pdf_path))
+    for page in ocr_pages:
+        print(f"{pdf_path.name} – Page {page.page_number}: {page.confidence:.2%}")
+        if page.confidence < confidence_threshold:
+            print(" ⚠️ Low confidence – manual review may be needed")
+
+        out_file = output_dir / f"{pdf_path.stem}_page{page.page_number}.txt"
+        out_file.write_text(page.text, encoding="utf-8")
+
+def batch_ocr_pdf(input_folder: str, output_folder: str):
+    """Iterate over all PDFs in input_folder and run OCR."""
+    input_path = Path(input_folder)
+    output_path = Path(output_folder)
+    output_path.mkdir(parents=True, exist_ok=True)
+
+    pdf_files = list(input_path.glob("*.pdf"))
+    if not pdf_files:
+        print("🚫 No PDF files found in the specified folder.")
+        return
+
+    # Create the engine only when there is work to do
+    engine = OcrEngine()
+    try:
+        for pdf_file in pdf_files:
+            # Create a sub‑folder for each PDF to keep pages organized
+            pdf_out_dir = output_path / pdf_file.stem
+            pdf_out_dir.mkdir(exist_ok=True)
+            ocr_pdf_file(pdf_file, pdf_out_dir, engine)
+    finally:
+        # Clean up native resources even if one of the PDFs fails
+        engine.dispose()
+    print("✅ Batch OCR completed.")
+
+# Example usage:
+if __name__ == "__main__":
+    batch_ocr_pdf("YOUR_DIRECTORY/input_pdfs", "YOUR_DIRECTORY/ocr_output")
+```
+
+#### How This Helps
+
+- **Scalability:** The function walks the folder once and creates a separate output subfolder for each PDF. That keeps things manageable when you have dozens of documents.
+- **Reuse:** `ocr_pdf_file` can be called from other scripts (for example, a web service) because it is self-contained and receives the engine as a parameter.
+- **Error handling:** The script prints a friendly message if the input folder is empty, which spares you from silent failures.
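
The batch script prints a warning for each low-confidence page as it goes. For long runs it can help to collect the flagged pages and review them at the end. Here is a minimal sketch, assuming you have gathered `(page_number, confidence)` pairs from the page objects shown above; the helper name `pages_needing_review` and the sample scores are illustrative and not part of Aspose OCR.

```python
from typing import Iterable, List, Tuple

def pages_needing_review(
    page_scores: Iterable[Tuple[int, float]],
    confidence_threshold: float = 0.80,
) -> List[int]:
    """Return the page numbers whose confidence is below the threshold."""
    return [
        number
        for number, confidence in page_scores
        if confidence < confidence_threshold
    ]

# Hypothetical scores for a three-page document: (page_number, confidence)
scores = [(1, 0.97), (2, 0.74), (3, 0.80)]
print(pages_needing_review(scores))  # prints [2]
```

A page that sits exactly on the threshold is not flagged, which matches the strict `<` comparison used in the batch script.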
+
+## Converting Scanned PDF Text – Handling Edge Cases
+
+Although the code above works for most PDFs, you may run into a few quirks:
+
+| Situation | Why it happens | How to mitigate |
+|-----------|----------------|-----------------|
+| **Encrypted PDFs** | The PDF is password-protected. | Pass the password to `extract_from_pdf(pdf_path, password="yourPwd")`. |
+| **Multi-language documents** | Aspose OCR defaults to English. | Set `ocr_engine.language = "spa"` for Spanish, or supply a list for mixed languages. |
+| **Very large PDFs (>500 pages)** | Memory use spikes because every page is loaded into RAM. | Process the PDF in chunks with `engine.extract_from_pdf(pdf_path, start_page=1, end_page=100)` in a loop. |
+| **Poor scan quality** | Low DPI or heavy noise reduces confidence. | Preprocess the PDF with `engine.image_preprocessing = True` or raise the DPI with `engine.dpi = 300`. |
+
+> **Warning:** Enabling image preprocessing can noticeably increase CPU time. If you run a nightly batch, budget enough time or run it on a separate worker.
+
+## Verifying the Output
+
+After the script finishes, you will see a folder structure like this:
+
+```
+ocr_output/
+├─ invoice_2023/
+│  ├─ invoice_2023_page1.txt
+│  ├─ invoice_2023_page2.txt
+│  └─ …
+└─ contract_A/
+   ├─ contract_A_page1.txt
+   └─ …
+```
+
+Open any `.txt` file; you should see clean UTF‑8 text matching the original scanned content. If you notice garbled characters, check the PDF's language settings and make sure the required font packages are installed on the machine.
+
+## Cleaning Up Resources
+
+Aspose OCR uses native DLLs, so it is important to call `engine.dispose()` when you are finished. Skipping this step can cause memory leaks, especially in long-running batch jobs.
+
+```python
+# Always the last line of your script
+engine.dispose()
+```
+
+## Complete End-to-End Example
+
+Putting it all together, here is one
+
+{{< /blocks/products/pf/tutorial-page-section >}}
+{{< /blocks/products/pf/main-container >}}
+{{< /blocks/products/pf/main-wrap-class >}}
+{{< blocks/products/products-backtop-button >}}
\ No newline at end of file
diff --git a/ocr/russian/python/general/how-to-recognize-handwriting-in-python-full-tutorial/_index.md b/ocr/russian/python/general/how-to-recognize-handwriting-in-python-full-tutorial/_index.md
new file mode 100644
index 000000000..4a7049e39
--- /dev/null
+++ b/ocr/russian/python/general/how-to-recognize-handwriting-in-python-full-tutorial/_index.md
@@ -0,0 +1,278 @@
+---
+category: general
+date: 2026-04-29
+description: Learn how to recognize handwriting in Python with Aspose OCR. This step-by-step
+  guide shows how to extract handwritten text efficiently.
+draft: false
+keywords:
+- how to recognize handwriting
+- extract handwritten text
+- handwritten text recognition python
+- handwritten ocr tutorial python
+language: ru
+og_description: How do you recognize handwriting in Python? Follow this complete guide
+  to extracting handwritten text with Aspose OCR, including code, tips, and edge-case
+  handling.
+og_title: How to Recognize Handwriting in Python – Complete Tutorial
+tags:
+- OCR
+- Python
+- HandwritingRecognition
+title: How to Recognize Handwriting in Python – Complete Tutorial
+url: /ru/python/general/how-to-recognize-handwriting-in-python-full-tutorial/
+---
+
+{{< blocks/products/pf/main-wrap-class >}}
+{{< blocks/products/pf/main-container >}}
+{{< blocks/products/pf/tutorial-page-section >}}
+
+# How to Recognize Handwriting in Python – Complete Tutorial
+
+Ever needed to **recognize handwriting** in a Python project but didn't know where to start? You're not alone: developers constantly ask, “Can I extract text from a scanned note?” The good news is that modern OCR libraries make this straightforward. In this guide we'll walk through **how to recognize handwriting** with Aspose OCR, and you'll also learn to **extract handwritten text** reliably.
+
+We'll cover everything from installing the library to tuning confidence thresholds for messy cursive writing. By the end you'll have a runnable script that prints the extracted text and an overall confidence score, ideal for note-taking apps, archival tools, or plain curiosity. No prior OCR experience is required; basic Python knowledge is enough.
+
+---
+
+## What You'll Need
+
+- **Python 3.9+** (the latest stable release works best)
+- **Aspose.OCR for Python via .NET**, installed with `pip install aspose-ocr`
+- **A handwritten image** (JPEG/PNG) that you want to process
+- Optional: a virtual environment to keep dependencies tidy
+
+If everything is ready, let's begin.
+
+![How to recognize handwriting example](/images/handwritten-sample.jpg "How to recognize handwriting example")
+
+*(Alt text: “handwriting recognition example showing a scanned handwritten note”)*
+
+---
+
+## Step 1 – Install and Import the Aspose OCR Classes
+
+First we need the OCR engine itself. Aspose provides a clean API that separates printed-text recognition from handwriting mode.
+
+```python
+# Install the package (run this in your terminal if you haven't already)
+# pip install aspose-ocr
+
+# Import the required OCR classes
+from aspose.ocr import OcrEngine, HandwritingMode
+```
+
+*Why this matters:* Importing `HandwritingMode` lets us tell the engine that we are dealing with handwritten rather than printed text, which significantly improves accuracy on cursive strokes.
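
Before running the remaining steps, it can be handy to confirm that the package is actually importable in the active environment. This is a minimal sketch using only the standard library; the helper name `is_importable` is illustrative, and the module name `aspose.ocr` is simply taken from the import statement above.

```python
import importlib.util

def is_importable(module_name: str) -> bool:
    """Return True if Python can locate the named module."""
    try:
        # For dotted names, find_spec imports the parent package first
        return importlib.util.find_spec(module_name) is not None
    except ModuleNotFoundError:
        # Raised when a parent package (for example "aspose") is missing
        return False

if not is_importable("aspose.ocr"):
    print("aspose.ocr not found; run: pip install aspose-ocr")
```

Running the check inside the same virtual environment as your script helps catch the common case where the package was installed for a different interpreter.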
+
+---
+
+## Step 2 – Create and Configure the OCR Engine
+
+Now we create an `OcrEngine` instance and switch it into handwriting mode. You can also adjust the confidence threshold: lower values accept messier handwriting, higher values demand cleaner input.
+
+```python
+# Step 2: Initialize the OCR engine for handwritten text
+ocr_engine = OcrEngine()
+ocr_engine.set_recognition_mode(HandwritingMode.HANDWRITTEN)
+
+# Optional: tighten the confidence threshold for cursive scripts
+# Default is 0.5; 0.65 works well for most notebooks
+ocr_engine.set_handwriting_confidence_threshold(0.65)
+```
+
+*Pro tip:* Notes scanned at 300 DPI or higher usually get a better score. For low-resolution images, consider upscaling with Pillow before passing them to the engine.
+
+---
+
+## Step 3 – Prepare the Image Path
+
+Make sure the file path points to the image you want to process. Relative paths work fine, but absolute paths spare you “file not found” surprises.
+
+```python
+# Step 3: Point to your handwritten image
+input_image_path = "YOUR_DIRECTORY/handwritten_sample.jpg"
+```
+
+*Common mistake:* Forgetting …
+
+---
+
+## Step 4 – Run Recognition and Get the Results
+
+The `recognize` method does all the heavy lifting. It returns an object with `.text` and `.confidence` properties.
+
+```python
+# Step 4: Run OCR and get the result
+result = ocr_engine.recognize(input_image_path)
+
+# Display the extracted text and confidence
+print("Hand‑written extraction:")
+print(result.text)
+print("Overall confidence:", result.confidence)
+```
+
+**Expected output (example):**
+
+```
+Hand‑written extraction:
+Meeting notes:
+- Discuss quarterly goals
+- Review budget allocations
+- Assign action items
+Overall confidence: 0.78
+```
+
+If the confidence drops below 0.5, you may need to clean up the image (remove shadows, boost contrast) or lower the threshold from Step 2.
+
+---
+
+## Step 5 – Clean Up Resources
+
+Aspose OCR holds native resources; calling `dispose()` releases them and prevents memory leaks, especially when you process many images in a loop.
+
+```python
+# Step 5: Release the engine resources
+ocr_engine.dispose()
+```
+
+*Why dispose?* In long-running services (for example, a Flask API that accepts uploads), forgetting to release resources can quickly exhaust system memory.
+
+---
+
+## Full Script – One-Click Run
+
+Putting it all together, here is a self-contained script you can copy, paste, and run.
+
+```python
+# -*- coding: utf-8 -*-
+"""
+Handwritten OCR Tutorial – how to recognize handwriting in Python
+Author: Your Name
+Date: 2026-04-29
+"""
+
+# Install the library first if you haven't already:
+# pip install aspose-ocr
+
+from aspose.ocr import OcrEngine, HandwritingMode
+
+def recognize_handwriting(image_path: str, confidence_threshold: float = 0.65):
+    """
+    Recognizes handwritten text from an image using Aspose OCR.
+
+    Args:
+        image_path (str): Path to the handwritten image file.
+        confidence_threshold (float): Minimum confidence to accept a word.
+
+    Returns:
+        tuple: (extracted_text (str), overall_confidence (float))
+    """
+    # Initialize engine in handwritten mode
+    engine = OcrEngine()
+    try:
+        engine.set_recognition_mode(HandwritingMode.HANDWRITTEN)
+        engine.set_handwriting_confidence_threshold(confidence_threshold)
+
+        # Perform recognition
+        result = engine.recognize(image_path)
+
+        # Capture output
+        return result.text, result.confidence
+    finally:
+        # Clean up, even if recognition raises an exception
+        engine.dispose()
+
+if __name__ == "__main__":
+    # Replace with your actual file location
+    img_path = "YOUR_DIRECTORY/handwritten_sample.jpg"
+
+    text, conf = recognize_handwriting(img_path)
+
+    print("Hand‑written extraction:")
+    print(text)
+    print("Overall confidence:", conf)
+```
+
+Save the file as `handwritten_ocr.py` and run `python handwritten_ocr.py`. If everything is set up correctly, the extracted text is printed to the console.
+
+---
+
+## Handling Edge Cases and Common Variations
+
+### Low-Contrast Images
+If the background bleeds into the ink, increase the contrast first:
+
+```python
+from PIL import Image, ImageEnhance
+
+img = Image.open(input_image_path)
+enhancer = ImageEnhance.Contrast(img)
+high_contrast = enhancer.enhance(2.0)  # 2× contrast
+high_contrast.save("temp_high_contrast.jpg")
+result = ocr_engine.recognize("temp_high_contrast.jpg")
+```
+
+### Rotated Notes
+A tilted notebook page can throw off recognition. Use OpenCV (the `opencv-python` and `numpy` packages) to deskew it:
+
+```python
+import numpy as np
+import cv2
+
+def deskew(image_path):
+    img = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
+    # Invert and binarize so that ink pixels, not the paper, are non-zero
+    ink = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY_INV | cv2.THRESH_OTSU)[1]
+    coords = np.column_stack(np.where(ink > 0)).astype(np.float32)
+    angle = cv2.minAreaRect(coords)[-1]
+    # The angle range returned by minAreaRect may differ between OpenCV
+    # versions, so check the direction of the correction on a test image
+    if angle < -45:
+        angle = -(90 + angle)
+    else:
+        angle = -angle
+    (h, w) = img.shape[:2]
+    M = cv2.getRotationMatrix2D((w // 2, h // 2), angle, 1.0)
+    corrected = cv2.warpAffine(img, M, (w, h), flags=cv2.INTER_CUBIC, borderMode=cv2.BORDER_REPLICATE)
+    cv2.imwrite("deskewed.jpg", corrected)
+
+deskew(input_image_path)
+result = ocr_engine.recognize("deskewed.jpg")
+```
+
+### Multi-Page PDFs
+Aspose OCR can also work with PDF pages, but you first need to convert each page to an image (for example, with `pdf2image`). Then loop over the images with the same `recognize_handwriting` function.
+
+---
+
+## Tips for Extracting Handwritten Text More Reliably
+
+- **DPI matters:** Aim for 300 DPI or higher when scanning.
+- **Avoid colored backgrounds:** A plain white or light-gray background gives the best results.
+- **Batch processing:** Wrap the function in a `for` loop and log each page's confidence; discard results below the threshold to keep quality high.
+- **Language support:** Aspose OCR supports several languages; set `engine.set_language("en")` to optimize for English.
+
+---
+
+## Frequently Asked Questions
+
+**Does this work on Linux?**
+Yes. Aspose OCR ships with native binaries for Windows, macOS, and Linux. Install the pip package and you are ready to go.
+
+**What if my handwriting is very cursive?**
+Try lowering the confidence threshold (`0.5` or even `0.4`). Keep in mind that this can add noise, so post-process the output if needed (for example, with a spell checker).
+
+**Can I use this in a web service?**
+Absolutely. The `recognize_handwriting` function is stateless, which makes it a good fit for Flask or FastAPI endpoints. Just make sure `dispose()` runs after every request (the function above already does this) or wrap the engine in a context manager.
+
+---
+
+## Conclusion
+
+We have covered **how to recognize handwriting** in Python from start to finish: how to **extract handwritten text**, tune confidence thresholds, and deal with typical problems such as low contrast or rotated pages. The full script above is ready to run, and the modular function makes it easy to integrate into larger projects, whether that is a note-taking app, an archive digitization effort, or just an experiment.
+
+From here you can explore handwritten text recognition for multilingual notes, or combine OCR with natural language processing to summarize meeting minutes automatically. The possibilities are endless, so give it a try and let your code bring handwritten scribbles to life.
+
+Happy coding, and feel free to ask questions in the comments!
+
+{{< /blocks/products/pf/tutorial-page-section >}}
+{{< /blocks/products/pf/main-container >}}
+{{< /blocks/products/pf/main-wrap-class >}}
+{{< blocks/products/products-backtop-button >}}
\ No newline at end of file
diff --git a/ocr/russian/python/general/how-to-run-ocr-with-aspose-hugging-face-complete-guide/_index.md b/ocr/russian/python/general/how-to-run-ocr-with-aspose-hugging-face-complete-guide/_index.md
new file mode 100644
index 000000000..b01fa3dfe
--- /dev/null
+++ b/ocr/russian/python/general/how-to-run-ocr-with-aspose-hugging-face-complete-guide/_index.md
@@ -0,0 +1,181 @@
+---
+category: general
+date: 2026-04-29
+description: Learn how to run OCR on your scans, automatically use a Hugging Face
+  model, and recognize text from scans with Aspose OCR in a matter of
+  minutes.
+draft: false
+keywords:
+- how to run OCR
+- use hugging face model
+- recognize text from scans
+- download model automatically
+language: ru
+og_description: How to run OCR on scans with Aspose OCR, automatically download a
+  Hugging Face model, and get clean, punctuated text.
+og_title: How to Run OCR with Aspose and Hugging Face – Complete Guide
+tags:
+- OCR
+- Aspose
+- Hugging Face
+- Python
+title: How to Run OCR with Aspose and Hugging Face – Complete Guide
+url: /ru/python/general/how-to-run-ocr-with-aspose-hugging-face-complete-guide/
+---
+
+{{< blocks/products/pf/main-wrap-class >}}
+{{< blocks/products/pf/main-container >}}
+{{< blocks/products/pf/tutorial-page-section >}}
+
+# How to Run OCR with Aspose & Hugging Face – Complete Guide
+
+Ever wondered **how to run OCR** on a pile of scanned documents without spending hours on setup? You're not alone. In many projects developers need to **recognize text from scans** quickly, but get stuck on model downloads and post-processing.
+
+The good news: this tutorial shows a ready-to-run solution that **uses a Hugging Face model**, downloads it automatically, and adds punctuation so the output reads as if a person wrote it. By the end you'll have a script that processes every image in a folder and saves a clean `.txt` file next to each scan.
+
+## What You'll Need
+
+- Python 3.8+ (the code uses f‑strings, which require Python 3.6 or later)
+- The `aspose-ocr` package (install with `pip install aspose-ocr`)
+- Internet access for the first model download
+- A folder of scanned images (`.png`, `.jpg`, or `.tif`)
+
+That's it: no extra binaries and no manual model setup. Let's get started.
+
+![how to run OCR example](https://example.com/ocr-demo.png "how to run OCR example")
+
+## Step 1: Import the Aspose OCR Classes and Set Up the Environment
+
+We start by importing the required classes from the Aspose OCR library. Importing everything up front keeps the script tidy and makes missing dependencies easy to spot.
+
+```python
+# Step 1: Import Aspose OCR classes
+import os
+from aspose.ocr import OcrEngine, AsposeAI, AsposeAIModelConfig
+```
+
+*Why this matters*: `OcrEngine` does the core work, while `AsposeAI` lets you plug in a large language model for smarter post-processing. If you skip the import, the rest of the code fails with a `NameError` as soon as it touches one of these classes, so don't leave it out.
+
+## Step 2: Configure the Hugging Face Model with GPU Support
+
+Now we tell Aspose where to fetch the model from and how many layers should run on the GPU. The `allow_auto_download="true"` flag is what **downloads the model automatically**.
+
+```python
+# Step 2: Configure a GPU‑aware AI model (replace with your own model folder)
+model_config = AsposeAIModelConfig(
+    allow_auto_download="true",
+    hugging_face_repo_id="Qwen/Qwen2.5-3B-Instruct-GGUF",
+    hugging_face_quantization="int8",
+    gpu_layers=40,                    # use GPU for faster inference
+    directory_model_path=r"YOUR_DIRECTORY/models"
+)
+```
+
+> **Tip**: If you don't have a GPU, set `gpu_layers=0`. The model falls back to the CPU, which is slower but still works.
+
+### Why Choose a Hugging Face Model?
+
+Hugging Face hosts a huge collection of ready-to-use LLMs. By pointing at `Qwen/Qwen2.5-3B-Instruct-GGUF`, you get a compact, instruction-tuned model that can add punctuation, fix spacing, and even repair minor OCR errors. That is what **using a Hugging Face model** looks like in practice.
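
The tip about setting `gpu_layers=0` on machines without a GPU can be wrapped in a small helper. This is a rough sketch, assuming that finding `nvidia-smi` on the `PATH` is a good enough signal that an NVIDIA GPU is available; both that heuristic and the helper name `choose_gpu_layers` are illustrative assumptions, not something Aspose OCR provides.

```python
import shutil
from typing import Optional

def choose_gpu_layers(requested_layers: int, gpu_available: Optional[bool] = None) -> int:
    """Return requested_layers when a GPU seems available, otherwise 0 (CPU only)."""
    if gpu_available is None:
        # Heuristic: NVIDIA drivers normally install the nvidia-smi tool
        gpu_available = shutil.which("nvidia-smi") is not None
    return requested_layers if gpu_available else 0

# Pass the result as gpu_layers when building the model configuration
print(choose_gpu_layers(40, gpu_available=False))  # prints 0
print(choose_gpu_layers(40, gpu_available=True))   # prints 40
```

Passing `gpu_available` explicitly keeps the helper easy to test and lets you override the heuristic on machines where it guesses wrong.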
+ +## Шаг 3: Инициализация AI‑движка и включение пост‑обработки пунктуации + +AI‑движок предназначен не только для продвинутого чата — здесь мы подключаем *добавление пунктуации*, которое очищает сырые результаты OCR. + +```python +# Step 3: Initialise the AI engine and enable punctuation post‑processing +ai_engine = AsposeAI() +ai_engine.set_post_processor("punctuation_adder", {}) +``` + +*Что происходит?* Вызов `set_post_processor` регистрирует встроенный пост‑процессор, который запускается после завершения работы OCR‑движка. Он берёт сырую строку и вставляет запятые, точки и заглавные буквы там, где это нужно, делая конечный текст гораздо более читаемым. + +## Шаг 4: Создание OCR‑движка и привязка AI‑движка + +Подключение AI‑движка к OCR‑движку даёт нам один объект, который может как распознавать символы, так и улучшать результат. + +```python +# Step 4: Create the OCR engine and attach the AI engine +ocr_engine = OcrEngine() +ocr_engine.set_ai_engine(ai_engine) +``` + +Если пропустить этот шаг, OCR всё равно будет работать, но вы потеряете улучшение пунктуации — поэтому вывод будет выглядеть как поток слов. + +## Шаг 5: Обработка всех изображений в папке + +Это сердце учебника. Мы проходим по каждому изображению, запускаем OCR, применяем пост‑процессор и записываем очищенный текст в соседний файл `.txt`. 
+ +```python +# Step 5: Run OCR on each image in a folder, post‑process the result, and save the text +scans_folder = r"YOUR_DIRECTORY/scans" +for image_file in os.listdir(scans_folder): + # Filter only supported image types + if not image_file.lower().endswith(('.png', '.jpg', '.tif')): + continue + + image_path = os.path.join(scans_folder, image_file) + + # Recognise text from the image + ocr_result = ocr_engine.recognize(image_path) + + # Apply the punctuation post‑processor + ocr_result = ocr_engine.run_postprocessor(ocr_result) + + # Show a brief confidence summary + print(f"{image_file} – confidence {ocr_result.confidence:.2%}") + + # Save the cleaned text next to the source image + txt_path = image_path + ".txt" + with open(txt_path, "w", encoding="utf-8") as txt_file: + txt_file.write(ocr_result.text) +``` + +### Что ожидать + +Запуск скрипта выводит что‑то вроде: + +``` +invoice_001.png – confidence 96.73% +receipt_2024.tif – confidence 94.12% +``` + +Каждая строка показывает оценку уверенности (быстрая проверка состояния) и создаёт файлы `invoice_001.png.txt`, `receipt_2024.tif.txt` и т.д., содержащие пунктуацию и читаемый человеком текст. + +### Особые случаи и варианты + +- **Сканы не на английском**: Переключите `hugging_face_repo_id` на многоязычную модель (например, `microsoft/Multilingual-LLM-GGUF`). +- **Большие партии**: Оберните цикл в `concurrent.futures.ThreadPoolExecutor` для параллельной обработки, но учитывайте ограничения памяти GPU. +- **Пользовательская пост‑обработка**: Замените `"punctuation_adder"` своим скриптом, если нужна очистка, специфичная для домена (например, удаление номеров счетов). + +## Шаг 6: Очистка ресурсов + +Когда задача завершается, освобождение ресурсов предотвращает утечки памяти, что особенно важно, если вы запускаете это в длительно работающем сервисе. 
+ +```python +# Step 6: Release resources +ai_engine.free_resources() +ocr_engine.dispose() +``` + +Если пропустить этот шаг, память GPU может остаться занятой, что помешает последующим запускам. + +## Итоги: Как выполнить OCR от начала до конца + +Всего в нескольких строках мы продемонстрировали **как выполнить OCR** на папке со сканами, **использовать модель Hugging Face**, которая скачивается при первом запуске, и **распознавать текст со сканов** с автоматическим добавлением пунктуации. Полный скрипт готов к копированию, настройке ваших путей и запуску. + +## Следующие шаги и связанные темы + +- **Пакетная пост‑обработка**: Изучите `ocr_engine.run_batch_postprocessor` для ещё более быстрой массовой обработки. +- **Альтернативные модели**: Попробуйте семейство `openai/whisper`, если вам нужен speech‑to‑text вместе с OCR. +- **Интеграция с базами данных**: Сохраняйте извлечённый текст в SQLite или Elasticsearch для поисковых архивов. + +Не стесняйтесь экспериментировать: меняйте модель, настраивайте `gpu_layers` или добавляйте свой пост‑процессор. Гибкость Aspose OCR в сочетании с модельным хабом Hugging Face делает это универсальной основой для любого проекта по оцифровке документов. + +--- + +*Счастливого кодинга! 
Если возникнут проблемы, оставьте комментарий ниже или проверьте документацию Aspose OCR для более глубоких настроек.* + +{{< /blocks/products/pf/tutorial-page-section >}} +{{< /blocks/products/pf/main-container >}} +{{< /blocks/products/pf/main-wrap-class >}} +{{< blocks/products/products-backtop-button >}} \ No newline at end of file diff --git a/ocr/russian/python/general/perform-ocr-on-image-with-python-complete-guide/_index.md b/ocr/russian/python/general/perform-ocr-on-image-with-python-complete-guide/_index.md new file mode 100644 index 000000000..66a89aa83 --- /dev/null +++ b/ocr/russian/python/general/perform-ocr-on-image-with-python-complete-guide/_index.md @@ -0,0 +1,191 @@ +--- +category: general +date: 2026-04-29 +description: Выполнить OCR изображения с помощью Python, автоматически загрузить модель + HuggingFace и эффективно освободить память GPU, одновременно очищая полученный текст. +draft: false +keywords: +- perform OCR on image +- download HuggingFace model python +- release GPU memory python +- clean OCR text python +language: ru +og_description: Узнайте, как выполнять OCR на изображении в Python, автоматически + загружать модель HuggingFace, очищать текст и освобождать память GPU. +og_title: Выполните OCR изображения с помощью Python — пошаговое руководство +tags: +- OCR +- Python +- Aspose +- HuggingFace +title: Выполнить OCR на изображении с помощью Python — Полное руководство +url: /ru/python/general/perform-ocr-on-image-with-python-complete-guide/ +--- + +{{< blocks/products/pf/main-wrap-class >}} +{{< blocks/products/pf/main-container >}} +{{< blocks/products/pf/tutorial-page-section >}} + +# Выполнение OCR на изображении с помощью Python – Полное руководство + +Когда‑нибудь вам нужно было **perform OCR on image** файлы, но вы застряли на этапе загрузки модели или очистки памяти GPU? Вы не одиноки — многие разработчики сталкиваются с этим, когда впервые пытаются объединить оптическое распознавание символов с большими языковыми моделями. 
+ +В этом руководстве мы разберём единое сквозное решение, которое автоматически загружает модель HuggingFace из Python (**download HuggingFace model python**), запускает Aspose OCR, очищает необработанный вывод и в конце освобождает память GPU (**release GPU memory python**). К концу у вас будет готовый к запуску скрипт, который преобразует отсканированный PNG в аккуратный, пригодный для индексации текст. + +> **Что вы получите:** полный исполняемый пример кода, объяснения, почему важен каждый шаг, советы, как избежать распространённых ошибок, и представление о том, как настроить конвейер для собственных проектов. + +--- + +## Что понадобится + +- Python 3.9 или новее (пример проверялся на 3.11) +- Пакет `aspose-ocr` (установка: `pip install aspose-ocr`) +- Интернет‑соединение для шага **download HuggingFace model python** +- GPU, совместимый с CUDA, если вы хотите ускорить процесс (необязательно, но рекомендуется) + +Дополнительные системные зависимости не требуются; движок Aspose OCR уже включает всё необходимое. + +![perform OCR on image example](image.png "Пример выполнения OCR на изображении с помощью Aspose OCR и пост‑процессора LLM") +*Image alt text: “perform OCR on image – вывод Aspose OCR до и после очистки ИИ”* + +## Выполнение OCR на изображении – пошаговый обзор + +Ниже рабочий процесс разбит на логические части. У каждой части свой заголовок, чтобы к нужному шагу было легко вернуться. + +### 1. Загрузка модели HuggingFace в Python + +Сначала нужно получить языковую модель, которая будет выступать в роли пост‑процессора для необработанного вывода OCR. Aspose OCR поставляется со вспомогательным классом `AsposeAI`, который может автоматически загрузить модель из HuggingFace hub. 
+ +```python +import aspose.ocr as aocr +from aspose.ocr import AsposeAI, AsposeAIModelConfig, OcrEngine + +# Configure the model – it will auto‑download the first time you run it +model_config = AsposeAIModelConfig( + allow_auto_download="true", # <-- enables auto‑download + hugging_face_repo_id="Qwen/Qwen2.5-3B-Instruct-GGUF", + hugging_face_quantization="int8", # smaller footprint on GPU + gpu_layers=20, # how many layers stay on GPU + directory_model_path=r"YOUR_DIRECTORY" # where the model files live +) +``` + +**Why this matters:** +- **download HuggingFace model python** – вы избегаете ручного обращения с zip‑файлами или аутентификации токенов. +- Использование квантизации `int8` уменьшает модель примерно до четверти её исходного размера, что критично, когда позже нужно **release GPU memory python**. + +> **Pro tip:** Храните `directory_model_path` на SSD для более быстрого времени загрузки. + +--- + +### 2. Инициализация AI‑помощника и включение проверки орфографии + +Теперь мы создаём экземпляр `AsposeAI` и подключаем пост‑процессор‑корректор орфографии. Здесь начинается магия **clean OCR text python**. + +```python +# Initialise the AI helper +ai_helper = AsposeAI() +ai_helper.set_post_processor( + processor="spell_corrector", + custom_settings={"max_edits": 2} # allows up to two character edits per word +) +``` + +**Explanation:** +Корректор орфографии проверяет каждый токен, полученный от OCR‑движка, и предлагает исправления, ограниченные параметром `max_edits`. Эта небольшая настройка может превратить “rec0gn1tion” в “recognition” без тяжёлой языковой модели. + +### 3. Подключение AI‑помощника к OCR‑движку + +Aspose представила новый метод в версии 23.4, который позволяет напрямую подключить AI‑движок к OCR‑конвейеру. 
+ +```python +# Initialise the OCR engine and attach the AI helper +ocr_engine = OcrEngine() +ocr_engine.set_ai_engine(ai_helper) # new in v23.4 +``` + +**Зачем это нужно:** +Если подключить AI‑помощник на раннем этапе, OCR‑движок сможет при желании использовать модель для улучшений «на лету» (например, для обнаружения макета). Кроме того, код становится чище: отдельные циклы пост‑обработки позже не нужны. + +### 4. Выполнение OCR на отсканированном изображении + +Это основной шаг, на котором и выполняется OCR изображения (**perform OCR on image**). Замените `YOUR_DIRECTORY/input.png` на путь к вашему собственному скану. + +```python +image_path = r"YOUR_DIRECTORY/input.png" +ocr_result = ocr_engine.recognize(image_path) + +print("Raw OCR text:") +print(ocr_result.text) +``` + +Типичный необработанный вывод может содержать переносы строк в странных местах, неверно распознанные символы или лишние знаки. Поэтому нужен следующий шаг. + +**Ожидаемый необработанный вывод (пример):** + +``` +Th1s 1s 4n ex4mpl3 0f r4w OCR t3xt. +It c0ntains numb3rs 123 and s0me m1stakes. +``` + +### 5. Очистка OCR‑текста в Python с помощью AI‑пост‑процессора + +Теперь мы позволяем AI очистить полученный беспорядок. Это сердце процесса **clean OCR text python**. + +```python +cleaned_result = ocr_engine.run_postprocessor(ocr_result) + +print("\nAI‑enhanced text:") +print(cleaned_result.text) +``` + +**Результат, который вы увидите:** + +``` +This is an example of raw OCR text. +It contains numbers 123 and some mistakes. +``` + +Обратите внимание, как корректор орфографии исправил “Th1s” → “This” и “4n” → “an”, сохранив настоящее число 123. Модель также нормализует пробелы, которые часто мешают при дальнейшей обработке текста в NLP‑конвейерах. + +### 6. Освобождение памяти GPU в Python – шаги очистки + +Когда работа завершена, рекомендуется освободить ресурсы GPU, особенно если вы запускаете несколько OCR‑задач в длительно работающем сервисе. 
+ +```python +# Release resources – crucial for GPU memory +ai_helper.free_resources() +ocr_engine.dispose() +``` + +**Что происходит под капотом:** +`free_resources()` выгружает модель с GPU, возвращая память драйверу CUDA. `dispose()` закрывает внутренние буферы OCR‑движка. Пропуск этих вызовов может привести к ошибкам out‑of‑memory уже после нескольких изображений. + +> **Remember:** Если планируете обрабатывать пакеты в цикле, вызывайте очистку после каждого пакета или переиспользуйте один и тот же `ai_helper`, не освобождая его до самого конца. + +## Бонус: Настройка конвейера для разных сценариев + +### Регулировка квантизации модели + +Если у вас мощный GPU (например, RTX 4090) и требуется более высокая точность, измените `hugging_face_quantization` на `"fp16"` и увеличьте `gpu_layers` до `30`. Это потребует больше памяти, поэтому вам придётся **release GPU memory python** более агрессивно после каждого пакета. + +### Использование пользовательского проверяющего орфографию + +Вы можете заменить встроенный `spell_corrector` на пользовательский пост‑процессор, выполняющий доменно‑специфические исправления (например, медицинскую терминологию). Просто реализуйте требуемый интерфейс и передайте его имя в `set_post_processor`. + +### Пакетная обработка нескольких изображений + +Обёрните шаги OCR в цикл `for`, собирайте `cleaned_result.text` в список и вызывайте `ai_helper.free_resources()` только после завершения цикла, если у вас достаточно GPU‑памяти. Это уменьшит накладные расходы на повторную загрузку модели. + +## Заключение + +Мы только что показали, как **perform OCR on image** файлы в Python, автоматически **download a HuggingFace model**, **clean OCR text** и безопасно **release GPU memory**, когда работа завершена. Полный скрипт готов к копированию и вставке, а объяснения дают уверенность в адаптации его к более крупным проектам. + +Следующие шаги? 
Попробуйте заменить модель Qwen 2.5 на более крупный вариант LLaMA, поэкспериментируйте с разными пост‑процессорами или интегрируйте очищенный вывод в индексируемый Elasticsearch. Возможностей бесконечно много, и теперь у вас есть прочная основа для дальнейшего развития. + +Счастливого кодинга, и пусть ваши OCR‑конвейеры всегда остаются чистыми и дружелюбными к памяти! + +{{< /blocks/products/pf/tutorial-page-section >}} +{{< /blocks/products/pf/main-container >}} +{{< /blocks/products/pf/main-wrap-class >}} +{{< blocks/products/products-backtop-button >}} \ No newline at end of file diff --git a/ocr/spanish/python/general/extract-text-from-pdf-ocr-pdf-with-python/_index.md b/ocr/spanish/python/general/extract-text-from-pdf-ocr-pdf-with-python/_index.md new file mode 100644 index 000000000..2b290dc9f --- /dev/null +++ b/ocr/spanish/python/general/extract-text-from-pdf-ocr-pdf-with-python/_index.md @@ -0,0 +1,219 @@ +--- +category: general +date: 2026-04-29 +description: Extrae texto de PDF usando Aspose OCR en Python. Aprende el procesamiento + por lotes de OCR en PDF, convierte texto de PDF escaneado y maneja páginas de baja + confianza. +draft: false +keywords: +- extract text from pdf +- ocr pdf with python +- convert scanned pdf text +- batch ocr pdf processing +language: es +og_description: Extrae texto de PDF con Aspose OCR en Python. Esta guía muestra el + procesamiento por lotes de OCR en PDF, la conversión de texto de PDF escaneado y + el manejo de resultados de baja confianza. 
+og_title: Extraer texto de PDF – OCR de PDF con Python +tags: +- OCR +- Python +- PDF processing +title: Extraer texto de PDF – OCR PDF con Python +url: /es/python/general/extract-text-from-pdf-ocr-pdf-with-python/ +--- + +{{< blocks/products/pf/main-wrap-class >}} +{{< blocks/products/pf/main-container >}} +{{< blocks/products/pf/tutorial-page-section >}} + +# Extraer texto de PDF – OCR PDF con Python + +¿Alguna vez necesitaste **extraer texto de PDF** pero el archivo es solo una imagen escaneada? No estás solo—muchos desarrolladores se topan con ese obstáculo al intentar convertir PDFs en datos buscables. ¿La buena noticia? Con Aspose OCR para Python puedes convertir texto de PDF escaneado en unas pocas líneas, e incluso ejecutar **procesamiento por lotes de OCR en PDF** cuando tienes docenas de archivos que manejar. + +En este tutorial recorreremos todo el flujo de trabajo: configurar la biblioteca, ejecutar OCR en un PDF único, escalar a un lote y manejar páginas con baja confianza para que sepas cuándo se requiere una revisión manual. Al final tendrás un script listo para ejecutar que extrae texto de cualquier PDF escaneado, y comprenderás el porqué de cada paso. + +## Lo que necesitarás + +Antes de sumergirnos, asegúrate de contar con: + +- Python 3.8 o superior (el código usa f‑strings, por lo que 3.6+ funciona, pero se recomienda 3.8+) +- Una licencia de Aspose OCR para Python o una clave de prueba gratuita (puedes obtener una en el sitio web de Aspose) +- Una carpeta con uno o más PDFs escaneados que deseas procesar +- Una cantidad moderada de espacio en disco para los informes *.txt* generados + +Eso es todo—sin dependencias externas pesadas, sin trucos de OpenCV. El motor Aspose OCR hace el trabajo pesado por ti. 
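Antes de lanzar el OCR conviene comprobar los dos últimos requisitos de la lista: que la carpeta contenga PDFs y que haya espacio libre en disco. Lo siguiente es solo un boceto con la biblioteca estándar; la función `preflight`, su parámetro `min_free_mb` y el valor de 100 MB son supuestos ilustrativos, no parte de Aspose OCR.

```python
import shutil
from pathlib import Path


def preflight(input_folder, min_free_mb=100):
    """Count the PDFs in `input_folder` and check that enough disk space is free."""
    folder = Path(input_folder)
    if not folder.is_dir():
        raise FileNotFoundError(f"Input folder not found: {folder}")

    # Case-insensitive match so that both report.pdf and REPORT.PDF are counted
    pdf_count = sum(
        1 for entry in folder.iterdir()
        if entry.is_file() and entry.suffix.lower() == ".pdf"
    )

    # disk_usage reports bytes; convert the free space to whole megabytes
    free_mb = shutil.disk_usage(folder).free // (1024 * 1024)

    return {
        "pdf_count": pdf_count,
        "free_mb": free_mb,
        "ready": pdf_count > 0 and free_mb >= min_free_mb,
    }


if __name__ == "__main__":
    print(preflight("."))
```

Si `ready` es `False`, el diccionario indica cuál de las dos condiciones falló sin necesidad de iniciar el motor OCR.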
+ +## Configurando el entorno + +Primero, instala el paquete Aspose OCR desde PyPI: + +```bash +pip install aspose-ocr +``` + +Si tienes un archivo de licencia (`Aspose.OCR.lic`), colócalo en la raíz de tu proyecto y actívalo de la siguiente manera: + +```python +# Activate Aspose OCR license (optional but removes trial watermark) +from aspose.ocr import License + +license = License() +license.set_license("Aspose.OCR.lic") +``` + +> **Consejo profesional:** Mantén el archivo de licencia fuera del control de versiones; añádelo a `.gitignore` para evitar su exposición accidental. + +## Realizando OCR en un PDF único + +Ahora extraigamos texto de un PDF escaneado único. Los pasos principales son: + +1. Crear una instancia de `OcrEngine`. +2. Apuntarlo al archivo PDF. +3. Obtener un `OcrResult` para cada página. +4. Escribir la salida de texto plano en disco. +5. Liberar el motor para liberar recursos nativos. + +Aquí tienes el script completo y ejecutable: + +```python +# Step 1: Import the OCR engine and create an instance +from aspose.ocr import OcrEngine + +# Optional: activate license (see previous section) +# from aspose.ocr import License +# License().set_license("Aspose.OCR.lic") + +ocr_engine = OcrEngine() + +# Step 2: Specify the PDF file to be processed +pdf_file_path = r"YOUR_DIRECTORY/input.pdf" + +# Step 3: Extract OCR results – one OcrResult object per page +ocr_pages = ocr_engine.extract_from_pdf(pdf_file_path) + +# Step 4: Iterate through the results, show confidence, flag low‑confidence pages, and save the plain text +for ocr_page in ocr_pages: + print(f"Page {ocr_page.page_number}: confidence {ocr_page.confidence:.2%}") + + # If confidence is below 80 %, flag it for manual review + if ocr_page.confidence < 0.80: + print(" Low confidence – consider manual review") + + # Build the output filename + output_txt = f"YOUR_DIRECTORY/Report_page{ocr_page.page_number}.txt" + with open(output_txt, "w", encoding="utf-8") as f: + f.write(ocr_page.text) + +# Step 
5: Release resources used by the engine +ocr_engine.dispose() +``` + +**Lo que verás:** Para cada página el script imprime algo como `Page 1: confidence 97.45%`. Si una página está por debajo del umbral del 80 %, aparece una advertencia, indicándote que el OCR podría haber omitido caracteres. + +### Por qué funciona esto + +- `OcrEngine` es la puerta de enlace a la biblioteca nativa Aspose OCR; maneja todo, desde el preprocesamiento de imágenes hasta el reconocimiento de caracteres. +- `extract_from_pdf` rasteriza automáticamente cada página del PDF, por lo que no necesitas convertir el PDF a imágenes tú mismo. +- Los puntajes de confianza te permiten automatizar verificaciones de calidad—crítico cuando procesas documentos legales o médicos donde la precisión es importante. + +## Procesamiento por lotes de OCR en PDF con Python + +La mayoría de los proyectos del mundo real involucran más de un archivo. Extendamos el script de un solo archivo a una canalización de **procesamiento por lotes de OCR en PDF** que recorra un directorio, procese cada PDF y almacene los resultados en una subcarpeta correspondiente. 
+ +```python +import os +from pathlib import Path +from aspose.ocr import OcrEngine + +def ocr_pdf_file(pdf_path: Path, output_dir: Path, engine: OcrEngine, confidence_threshold: float = 0.80): + """Extract text from a single PDF and write per‑page .txt files.""" + ocr_pages = engine.extract_from_pdf(str(pdf_path)) + for page in ocr_pages: + print(f"{pdf_path.name} – Page {page.page_number}: {page.confidence:.2%}") + if page.confidence < confidence_threshold: + print(" ⚠️ Low confidence – manual review may be needed") + + out_file = output_dir / f"{pdf_path.stem}_page{page.page_number}.txt" + out_file.write_text(page.text, encoding="utf-8") + +def batch_ocr_pdf(input_folder: str, output_folder: str): + """Iterate over all PDFs in input_folder and run OCR.""" + engine = OcrEngine() + input_path = Path(input_folder) + output_path = Path(output_folder) + output_path.mkdir(parents=True, exist_ok=True) + + pdf_files = list(input_path.glob("*.pdf")) + if not pdf_files: + print("🚫 No PDF files found in the specified folder.") + return + + for pdf_file in pdf_files: + # Create a sub‑folder for each PDF to keep pages organized + pdf_out_dir = output_path / pdf_file.stem + pdf_out_dir.mkdir(exist_ok=True) + ocr_pdf_file(pdf_file, pdf_out_dir, engine) + + # Clean up native resources + engine.dispose() + print("✅ Batch OCR completed.") + +# Example usage: +if __name__ == "__main__": + batch_ocr_pdf("YOUR_DIRECTORY/input_pdfs", "YOUR_DIRECTORY/ocr_output") +``` + +#### Cómo ayuda esto + +- **Escalabilidad:** La función recorre la carpeta una vez, creando una subcarpeta de salida dedicada para cada PDF. Esto mantiene todo ordenado cuando tienes docenas de documentos. +- **Reusabilidad:** `ocr_pdf_file` puede ser llamado desde otros scripts (p. ej., un servicio web) porque es una función pura. +- **Manejo de errores:** El script imprime un mensaje amigable si la carpeta de entrada está vacía, evitándote un fallo silencioso. 
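Para no depender de los avisos en consola, las páginas de baja confianza pueden reunirse en una cola de revisión. Boceto mínimo bajo supuestos: `PageResult` es una clase hipotética que imita los campos `page_number` y `confidence` usados arriba (no es una clase de Aspose), y el formato CSV es una elección arbitraria.

```python
import csv
from dataclasses import dataclass


@dataclass
class PageResult:
    """Hypothetical stand-in for one OCR page result (not an Aspose class)."""
    source: str
    page_number: int
    confidence: float


def flag_low_confidence(pages, threshold=0.80):
    """Return the pages whose confidence is below `threshold`, worst first."""
    flagged = [page for page in pages if page.confidence < threshold]
    return sorted(flagged, key=lambda page: page.confidence)


def write_review_csv(pages, csv_path):
    """Write the given pages to a CSV file that a reviewer can open in a spreadsheet."""
    with open(csv_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["source", "page_number", "confidence"])
        for page in pages:
            writer.writerow([page.source, page.page_number, f"{page.confidence:.4f}"])


if __name__ == "__main__":
    sample = [
        PageResult("invoice_2023.pdf", 1, 0.97),
        PageResult("invoice_2023.pdf", 2, 0.62),
        PageResult("contract_A.pdf", 1, 0.79),
    ]
    for page in flag_low_confidence(sample):
        print(page.source, page.page_number, page.confidence)
```

El umbral por defecto coincide con el 80 % que usa el script; ordenar de peor a mejor hace que la revisión manual empiece por las páginas más dudosas.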
+ +## Conversión de texto de PDF escaneado – Manejo de casos límite + +Aunque el código anterior funciona para la mayoría de los PDFs, podrías encontrar algunos inconvenientes: + +| Situación | Por qué ocurre | Cómo mitigar | +|-----------|----------------|--------------| +| **PDFs encriptados** | El PDF está protegido con contraseña. | Pasa la contraseña a `extract_from_pdf(pdf_path, password="yourPwd")`. | +| **Documentos multilingües** | Aspose OCR usa inglés por defecto. | Establece `ocr_engine.language = "spa"` para español, o proporciona una lista para idiomas mixtos. | +| **PDFs muy grandes (>500 páginas)** | El uso de memoria se dispara porque cada página se carga en RAM. | Procesa el PDF en bloques usando `engine.extract_from_pdf(pdf_path, start_page=1, end_page=100)` y repite en un bucle. | +| **Calidad de escaneo pobre** | Baja DPI o mucho ruido reduce la confianza. | Pre‑procesa el PDF con `engine.image_preprocessing = True` o aumenta la DPI mediante `engine.dpi = 300`. | + +> **¡Cuidado!** Activar el preprocesamiento de imágenes puede aumentar notablemente el tiempo de CPU. Si ejecutas un lote nocturno, programa suficiente tiempo o lanza un trabajador separado. + +## Verificando la salida + +Después de que el script termine, encontrarás una estructura de carpetas como: + +``` +ocr_output/ +├─ invoice_2023/ +│ ├─ invoice_2023_page1.txt +│ ├─ invoice_2023_page2.txt +│ └─ … +└─ contract_A/ + ├─ contract_A_page1.txt + └─ … +``` + +Abre cualquier archivo `.txt`; deberías ver texto limpio codificado en UTF‑8 que refleja el contenido escaneado original. Si notas caracteres distorsionados, verifica la configuración de idioma del PDF y asegúrate de que los paquetes de fuentes correctos estén instalados en la máquina. + +## Liberando recursos + +Aspose OCR depende de DLLs nativas, por lo que es esencial llamar a `engine.dispose()` una vez que hayas terminado. 
Olvidar este paso puede provocar fugas de memoria, especialmente en trabajos por lotes de larga duración. + +```python +# Always the last line of your script +engine.dispose() +``` + +## Ejemplo completo de extremo a extremo + +Juntando todo, aquí tienes un único + +{{< /blocks/products/pf/tutorial-page-section >}} +{{< /blocks/products/pf/main-container >}} +{{< /blocks/products/pf/main-wrap-class >}} +{{< blocks/products/products-backtop-button >}} \ No newline at end of file diff --git a/ocr/spanish/python/general/how-to-recognize-handwriting-in-python-full-tutorial/_index.md b/ocr/spanish/python/general/how-to-recognize-handwriting-in-python-full-tutorial/_index.md new file mode 100644 index 000000000..940f88967 --- /dev/null +++ b/ocr/spanish/python/general/how-to-recognize-handwriting-in-python-full-tutorial/_index.md @@ -0,0 +1,278 @@ +--- +category: general +date: 2026-04-29 +description: Aprende a reconocer escritura a mano en Python con Aspose OCR. Esta guía + paso a paso muestra cómo extraer texto manuscrito de manera eficiente. +draft: false +keywords: +- how to recognize handwriting +- extract handwritten text +- handwritten text recognition python +- handwritten ocr tutorial python +language: es +og_description: ¿Cómo reconocer la escritura a mano en Python? Sigue esta guía completa + para extraer texto manuscrito usando Aspose OCR, con código, consejos y manejo de + casos límite. 
+og_title: Cómo reconocer la escritura a mano en Python – Tutorial completo +tags: +- OCR +- Python +- HandwritingRecognition +title: Cómo reconocer la escritura a mano en Python – Tutorial completo +url: /es/python/general/how-to-recognize-handwriting-in-python-full-tutorial/ +--- + +{{< blocks/products/pf/main-wrap-class >}} +{{< blocks/products/pf/main-container >}} +{{< blocks/products/pf/tutorial-page-section >}} + +# Cómo reconocer escritura a mano en Python – Tutorial completo + +¿Alguna vez necesitaste saber **cómo reconocer escritura a mano** en un proyecto Python pero no sabías por dónde empezar? No estás solo: los desarrolladores preguntan constantemente, “¿Puedo extraer texto de una nota escaneada?” La buena noticia es que las bibliotecas OCR modernas hacen que esto sea pan comido. En esta guía veremos **cómo reconocer escritura a mano** usando Aspose OCR, y también aprenderás a **extraer texto manuscrito** de forma fiable. + +Cubriremos todo, desde la instalación de la biblioteca hasta el ajuste de los umbrales de confianza para la letra cursiva desordenada. Al final tendrás un script ejecutable que imprime el texto extraído y una puntuación de confianza global, perfecto para aplicaciones de toma de notas, herramientas de archivo o simplemente para saciar la curiosidad. No se requiere experiencia previa en OCR; con conocimientos básicos de Python basta. + +--- + +## Lo que necesitarás + +- **Python 3.9+** (la última versión estable funciona mejor) +- **Aspose.OCR for Python via .NET** – instala con `pip install aspose-ocr` +- Una **imagen manuscrita** (JPEG/PNG) que quieras procesar +- Opcional: un entorno virtual para mantener ordenadas las dependencias + +Si ya tienes estos elementos listos, vamos a sumergirnos. 
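Si optaste por el entorno virtual de la lista anterior, puedes confirmar desde Python que el intérprete activo pertenece a él antes de instalar nada. Es un boceto con la biblioteca estándar; el nombre `in_virtualenv` es inventado para el ejemplo.

```python
import sys


def in_virtualenv():
    """Return True when the interpreter runs inside a venv-style virtual environment."""
    # Inside a venv, sys.prefix points at the environment while sys.base_prefix
    # still points at the base Python installation
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


if __name__ == "__main__":
    print("Virtual environment active" if in_virtualenv() else "Using the system interpreter")
```

La comprobación cubre los entornos creados con `python -m venv`; otras herramientas de aislamiento pueden requerir una verificación distinta.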
+ +![Ejemplo de cómo reconocer escritura a mano](/images/handwritten-sample.jpg "Ejemplo de cómo reconocer escritura a mano") + +*(Texto alternativo: “ejemplo de cómo reconocer escritura a mano mostrando una nota manuscrita escaneada”)* + +--- + +## Paso 1 – Instalar e Importar Clases de Aspose OCR + +Primero lo primero, necesitamos el motor OCR en sí. Aspose ofrece una API limpia que separa el reconocimiento de texto impreso del modo manuscrito. + +```python +# Install the package (run this in your terminal if you haven't already) +# pip install aspose-ocr + +# Import the required OCR classes +from aspose.ocr import OcrEngine, HandwritingMode +``` + +*Por qué es importante:* Importar `HandwritingMode` nos permite indicar al motor que estamos tratando con **reconocimiento de texto manuscrito python** en lugar de texto impreso, lo que mejora drásticamente la precisión para trazos cursivos. + +--- + +## Paso 2 – Crear y Configurar el Motor OCR + +Ahora creamos una instancia de `OcrEngine` y la cambiamos al modo manuscrito. También puedes ajustar el umbral de confianza; valores más bajos aceptan escritura temblorosa, valores más altos exigen una entrada más limpia. + +```python +# Step 2: Initialize the OCR engine for handwritten text +ocr_engine = OcrEngine() +ocr_engine.set_recognition_mode(HandwritingMode.HANDWRITTEN) + +# Optional: tighten the confidence threshold for cursive scripts +# Default is 0.5; 0.65 works well for most notebooks +ocr_engine.set_handwriting_confidence_threshold(0.65) +``` + +*Consejo profesional:* Si tus notas se escanean a 300 DPI o más, normalmente obtendrás una mejor puntuación. Para imágenes de baja resolución, considera escalar con Pillow antes de enviarlas al motor. + +--- + +## Paso 3 – Preparar la Ruta de la Imagen + +Asegúrate de que la ruta del archivo apunte a la imagen que deseas procesar. Las rutas relativas funcionan bien, pero las rutas absolutas evitan sorpresas de “archivo no encontrado”. 
+ +```python +# Step 3: Point to your handwritten image +input_image_path = "YOUR_DIRECTORY/handwritten_sample.jpg" +``` + +*Trampa común:* Olvidar escapar las barras invertidas en Windows (`C:\\folder\\image.jpg`). Usar cadenas crudas (`r"C:\folder\image.jpg"`) evita ese problema. + +--- + +## Paso 4 – Ejecutar el Reconocimiento y Capturar Resultados + +El método `recognize` hace el trabajo pesado. Devuelve un objeto con las propiedades `.text` y `.confidence`. + +```python +# Step 4: Run OCR and get the result +result = ocr_engine.recognize(input_image_path) + +# Display the extracted text and confidence +print("Hand‑written extraction:") +print(result.text) +print("Overall confidence:", result.confidence) +``` + +**Salida esperada (ejemplo):** + +``` +Hand‑written extraction: +Meeting notes: +- Discuss quarterly goals +- Review budget allocations +- Assign action items +Overall confidence: 0.78 +``` + +Si la confianza cae por debajo de 0.5, quizá necesites limpiar la imagen (eliminar sombras, aumentar contraste) o bajar el umbral en el Paso 2. + +--- + +## Paso 5 – Liberar Recursos + +Aspose OCR mantiene recursos nativos; llamar a `dispose()` los libera y previene fugas de memoria, especialmente al procesar muchas imágenes en un bucle. + +```python +# Step 5: Release the engine resources +ocr_engine.dispose() +``` + +*¿Por qué disponer?* En servicios de larga duración (p. ej., una API Flask que acepta cargas), olvidar liberar recursos puede agotar rápidamente la memoria del sistema. + +--- + +## Script completo – Ejecución con un clic + +Juntando todo, aquí tienes un script autónomo que puedes copiar‑pegar y ejecutar. 
+ +```python +# -*- coding: utf-8 -*- +""" +Handwritten OCR Tutorial – how to recognize handwriting in Python +Author: Your Name +Date: 2026-04-29 +""" + +# Install the library first if you haven't already: +# pip install aspose-ocr + +from aspose.ocr import OcrEngine, HandwritingMode + +def recognize_handwriting(image_path: str, confidence_threshold: float = 0.65): + """ + Recognizes handwritten text from an image using Aspose OCR. + + Args: + image_path (str): Path to the handwritten image file. + confidence_threshold (float): Minimum confidence to accept a word. + + Returns: + tuple: (extracted_text (str), overall_confidence (float)) + """ + # Initialize engine in handwritten mode + engine = OcrEngine() + engine.set_recognition_mode(HandwritingMode.HANDWRITTEN) + engine.set_handwriting_confidence_threshold(confidence_threshold) + + # Perform recognition + result = engine.recognize(image_path) + + # Capture output + extracted = result.text + confidence = result.confidence + + # Clean up + engine.dispose() + return extracted, confidence + +if __name__ == "__main__": + # Replace with your actual file location + img_path = "YOUR_DIRECTORY/handwritten_sample.jpg" + + text, conf = recognize_handwriting(img_path) + + print("Hand‑written extraction:") + print(text) + print("Overall confidence:", conf) +``` + +Guárdalo como `handwritten_ocr.py` y ejecuta `python handwritten_ocr.py`. Si todo está configurado correctamente, verás el texto extraído impreso en la consola. 
+
+---
+
+## Handling Edge Cases and Common Variations
+
+### Low-contrast images
+If the background blends into the ink, boost the contrast first:
+
+```python
+from PIL import Image, ImageEnhance
+
+img = Image.open(input_image_path)
+enhancer = ImageEnhance.Contrast(img)
+high_contrast = enhancer.enhance(2.0)  # 2× contrast
+high_contrast.save("temp_high_contrast.jpg")
+result = ocr_engine.recognize("temp_high_contrast.jpg")
+```
+
+### Rotated notes
+A tilted notebook page can hurt recognition. The snippet below uses OpenCV and NumPy (`pip install opencv-python numpy`) to correct the skew:
+
+```python
+import cv2
+import numpy as np
+
+def deskew(image_path):
+    img = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
+    # Invert and binarize so the ink, not the paper, is the foreground
+    ink = cv2.threshold(cv2.bitwise_not(img), 0, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)[1]
+    coords = np.column_stack(np.where(ink > 0)).astype(np.float32)
+    # The angle convention of minAreaRect differs between OpenCV versions; verify the sign on yours
+    angle = cv2.minAreaRect(coords)[-1]
+    if angle < -45:
+        angle = -(90 + angle)
+    else:
+        angle = -angle
+    (h, w) = img.shape[:2]
+    M = cv2.getRotationMatrix2D((w // 2, h // 2), angle, 1.0)
+    corrected = cv2.warpAffine(img, M, (w, h), flags=cv2.INTER_CUBIC, borderMode=cv2.BORDER_REPLICATE)
+    cv2.imwrite("deskewed.jpg", corrected)
+
+deskew(input_image_path)
+result = ocr_engine.recognize("deskewed.jpg")
+```
+
+### Multi-page PDFs
+Aspose OCR can also handle PDF pages, but you first need to convert each page to an image (e.g., using `pdf2image`). Then loop over the images with the same `recognize_handwriting` function.
+
+---
+
+## Pro Tips for Better **Extract Handwritten Text** Results
+
+- **DPI matters:** Aim for 300 DPI or higher when scanning.
+- **Avoid colored backgrounds:** Pure white or light gray produces the cleanest output.
+- **Batch processing:** Wrap the function in a `for` loop and log the confidence of each page; discard results below a threshold to keep quality high.
+- **Language support:** Aspose OCR supports several languages; set `engine.set_language("en")` to optimize for English only.
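
The batch-processing tip above can be sketched as a small loop that logs each confidence and separates accepted results from the ones that need another look. This is an illustration under stated assumptions, not the library's own batch API: `batch_recognize` and `fake_recognizer` are hypothetical names, and the recognizer is passed in as a callable returning `(text, confidence)`, the same shape `recognize_handwriting` returns in this tutorial.

```python
from typing import Callable, Iterable, List, Tuple

# A recognizer takes an image path and returns (text, confidence).
Recognizer = Callable[[str], Tuple[str, float]]


def batch_recognize(
    paths: Iterable[str],
    recognizer: Recognizer,
    min_confidence: float = 0.5,
) -> Tuple[List[Tuple[str, str]], List[str]]:
    """Run `recognizer` on every path and split results by confidence.

    Returns (accepted, rejected): accepted holds (path, text) pairs whose
    confidence is at least `min_confidence`; rejected holds the other paths.
    """
    accepted: List[Tuple[str, str]] = []
    rejected: List[str] = []
    for path in paths:
        text, confidence = recognizer(path)
        print(f"{path}: confidence {confidence:.2f}")
        if confidence >= min_confidence:
            accepted.append((path, text))
        else:
            rejected.append(path)
    return accepted, rejected


def fake_recognizer(path: str) -> Tuple[str, float]:
    """Hypothetical stub with fixed scores so the sketch runs without Aspose OCR."""
    scores = {"a.jpg": 0.9, "b.jpg": 0.3}
    return f"text from {path}", scores.get(path, 0.0)
```

In a real run you would pass `recognize_handwriting` in place of the stub; keeping the rejected paths makes it easy to re-scan or preprocess only the pages that fell short.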
+
+---
+
+## Frequently Asked Questions
+
+**Does this work on Linux?**
+Yes. Aspose OCR ships native binaries for Windows, macOS, and Linux. Just install the pip package and you are set.
+
+**What if my handwriting is extremely cursive?**
+Try lowering the confidence threshold (`0.5` or even `0.4`). Keep in mind that this can introduce more noise, so post-process the output (e.g., with spell checking) if needed.
+
+**Can I use this in a web service?**
+Sure. The `recognize_handwriting` function is stateless, which makes it a good fit for Flask or FastAPI endpoints. Just remember to call `dispose()` after each request or use a context manager.
+
+---
+
+## Conclusion
+
+We have covered **how to recognize handwriting** in Python from start to finish, showing you how to **extract handwritten text**, tune the confidence settings, and handle common pitfalls such as low contrast or rotated pages. The full script above is ready to run, and the modular function makes it easy to integrate into larger projects, whether you are building a note-taking app, digitizing archives, or simply experimenting with **handwritten OCR in Python**.
+
+Next, you could explore **handwritten text recognition in Python** for multilingual notes, or combine OCR with natural language processing to summarize meeting minutes automatically. The sky is the limit: try it out and let your code bring the scribbles to life.
+
+Happy coding, and feel free to leave your questions in the comments!
+
+{{< /blocks/products/pf/tutorial-page-section >}}
+{{< /blocks/products/pf/main-container >}}
+{{< /blocks/products/pf/main-wrap-class >}}
+{{< blocks/products/products-backtop-button >}}
\ No newline at end of file
diff --git a/ocr/spanish/python/general/how-to-run-ocr-with-aspose-hugging-face-complete-guide/_index.md b/ocr/spanish/python/general/how-to-run-ocr-with-aspose-hugging-face-complete-guide/_index.md
new file mode 100644
index 000000000..0996fce5b
--- /dev/null
+++ b/ocr/spanish/python/general/how-to-run-ocr-with-aspose-hugging-face-complete-guide/_index.md
@@ -0,0 +1,180 @@
+---
+category: general
+date: 2026-04-29
+description: Learn how to run OCR on your scans, use a Hugging Face model automatically,
+  and recognize text from scans with Aspose OCR in minutes.
+draft: false
+keywords:
+- how to run OCR
+- use hugging face model
+- recognize text from scans
+- download model automatically
+language: es
+og_description: How to run OCR on scans using Aspose OCR, download a Hugging Face model
+  automatically, and get clean, punctuated text.
+og_title: How to Run OCR with Aspose and Hugging Face – Complete Guide
+tags:
+- OCR
+- Aspose
+- Hugging Face
+- Python
+title: How to Run OCR with Aspose and Hugging Face – Complete Guide
+url: /es/python/general/how-to-run-ocr-with-aspose-hugging-face-complete-guide/
+---
+
+{{< blocks/products/pf/main-wrap-class >}}
+{{< blocks/products/pf/main-container >}}
+{{< blocks/products/pf/tutorial-page-section >}}
+
+# How to Run OCR with Aspose and Hugging Face – Complete Guide
+
+Have you ever wondered **how to run OCR** on a stack of scanned documents without spending hours tweaking settings? You are not alone. In many projects, developers need to **recognize text from scans** quickly, but they stumble over model downloads and post-processing.
+
+Good news: this tutorial shows a ready-to-run solution that **uses a Hugging Face model**, downloads it automatically, and adds punctuation so the output reads as if a human had written it. By the end, you will have a script that processes every image in a folder and writes a clean `.txt` file next to each scan.
+
+## What You'll Need
+
+- Python 3.8+ (the code uses f-strings, which require at least 3.6; 3.8+ is recommended)
+- The `aspose-ocr` package (install via `pip install aspose-ocr`)
+- Internet access for the first-time model download
+- A folder of image scans (`.png`, `.jpg`, or `.tif`)
+
+That's it: no extra binaries, no manual model handling. Let's dive in.
+
+![example of how to run OCR](https://example.com/ocr-demo.png "example of how to run OCR")
+
+## Step 1: Import the Aspose OCR Classes and Set Up the Environment
+
+We start by pulling in the required classes from the Aspose OCR library. Importing everything at the top keeps the script tidy and makes missing dependencies easy to spot.
+
+```python
+# Step 1: Import Aspose OCR classes
+import os
+from aspose.ocr import OcrEngine, AsposeAI, AsposeAIModelConfig
+```
+
+*Why it matters*: `OcrEngine` does the heavy lifting, while `AsposeAI` lets us plug in a large language model for smarter post-processing. If you skip the import, the rest of the code fails with a `NameError`, so don't forget it.
+
+## Step 2: Configure a GPU-Aware Hugging Face Model
+
+Now we tell Aspose where to get the model and how many layers should run on the GPU. The `allow_auto_download="true"` flag takes care of **downloading the model automatically** for you.
+
+```python
+# Step 2: Configure a GPU‑aware AI model (replace with your own model folder)
+model_config = AsposeAIModelConfig(
+    allow_auto_download="true",
+    hugging_face_repo_id="Qwen/Qwen2.5-3B-Instruct-GGUF",
+    hugging_face_quantization="int8",
+    gpu_layers=40,  # use GPU for faster inference
+    directory_model_path=r"YOUR_DIRECTORY/models"
+)
+```
+
+> **Pro tip**: If you don't have a GPU, set `gpu_layers=0`. The model falls back to the CPU, which is slower but still works.
+
+### Why choose a Hugging Face model?
+
+Hugging Face hosts a huge collection of ready-to-use LLMs. By pointing to `Qwen/Qwen2.5-3B-Instruct-GGUF`, you get a compact, instruction-tuned model that can add punctuation, fix spacing, and even repair minor OCR errors. This is the essence of **using a Hugging Face model** in practice.
+
+## Step 3: Initialise the AI Engine and Enable Punctuation Post-Processing
+
+The AI engine is not just for fancy chats. Here we attach a *punctuation adder* that cleans up the raw OCR output.
+
+```python
+# Step 3: Initialise the AI engine and enable punctuation post‑processing
+ai_engine = AsposeAI()
+ai_engine.set_post_processor("punctuation_adder", {})
+```
+
+*What's happening?* The `set_post_processor` call registers a built-in post-processor that runs after the OCR engine finishes. It takes the raw string and inserts commas, periods, and capital letters where they belong, making the final text much more readable.
+
+## Step 4: Create the OCR Engine and Attach the AI Engine
+
+Attaching the AI engine to the OCR engine gives us a single object that can both read characters and polish the result.
+
+```python
+# Step 4: Create the OCR engine and attach the AI engine
+ocr_engine = OcrEngine()
+ocr_engine.set_ai_engine(ai_engine)
+```
+
+If you skip this step, OCR still works, but you lose the punctuation boost, so the output looks like a stream of words.
+
+## Step 5: Process Every Image in a Folder
+
+This is the heart of the tutorial. We loop over each image, run OCR, apply the post-processor, and write the clean text to a `.txt` file next to it.
+
+```python
+# Step 5: Run OCR on each image in a folder, post‑process the result, and save the text
+scans_folder = r"YOUR_DIRECTORY/scans"
+for image_file in os.listdir(scans_folder):
+    # Filter only supported image types
+    if not image_file.lower().endswith(('.png', '.jpg', '.tif')):
+        continue
+
+    image_path = os.path.join(scans_folder, image_file)
+
+    # Recognise text from the image
+    ocr_result = ocr_engine.recognize(image_path)
+
+    # Apply the punctuation post‑processor
+    ocr_result = ocr_engine.run_postprocessor(ocr_result)
+
+    # Show a brief confidence summary
+    print(f"{image_file} – confidence {ocr_result.confidence:.2%}")
+
+    # Save the cleaned text next to the source image
+    txt_path = image_path + ".txt"
+    with open(txt_path, "w", encoding="utf-8") as txt_file:
+        txt_file.write(ocr_result.text)
+```
+
+### What to expect
+
+Running the script prints something like:
+
+```
+invoice_001.png – confidence 96.73%
+receipt_2024.tif – confidence 94.12%
+```
+
+Each line shows the confidence score (a quick sanity check), and the script creates `invoice_001.png.txt`, `receipt_2024.tif.txt`, and so on, containing punctuated, human-readable text.
+
+### Edge cases and variations
+
+- **Non-English scans**: Change `hugging_face_repo_id` to a multilingual model (e.g., `microsoft/Multilingual-LLM-GGUF`).
+- **Large batches**: Wrap the loop in a `concurrent.futures.ThreadPoolExecutor` for parallel processing, but keep the GPU memory limits in mind.
+- **Custom post-processing**: Replace `"punctuation_adder"` with your own script if you need domain-specific cleanup (e.g., stripping invoice numbers).
+
+## Step 6: Release Resources
+
+When the job is done, releasing resources prevents memory leaks, which is especially important if you run this inside a long-running service.
+
+```python
+# Step 6: Release resources
+ai_engine.free_resources()
+ocr_engine.dispose()
+```
+
+Skipping this step can leave GPU memory allocated, which may cause later runs to fail.
+
+## Recap: How to Run OCR End to End
+
+In just a few lines, we have shown **how to run OCR** on a folder of scans, **use a Hugging Face model** that is downloaded only the first time, and **recognize text from scans** with punctuation added automatically. The full script is ready for you to copy, adjust the paths, and run.
+
+## Next Steps and Related Topics
+
+- **Batch post-processing**: Explore `ocr_engine.run_batch_postprocessor` for even faster bulk handling.
+- **Alternative models**: Try the `openai/whisper` family if you need speech-to-text alongside OCR.
+- **Database integration**: Store the extracted text in SQLite or Elasticsearch for searchable archives.
+
+Feel free to experiment: swap the model, adjust `gpu_layers`, or add your own post-processor. The flexibility of Aspose OCR combined with the Hugging Face model hub makes this a versatile foundation for any document digitization project.
+
+---
+
+*Happy coding! 
If you run into a problem, leave a comment below or check the Aspose OCR documentation for more advanced configuration options.*
+
+{{< /blocks/products/pf/tutorial-page-section >}}
+{{< /blocks/products/pf/main-container >}}
+{{< /blocks/products/pf/main-wrap-class >}}
+{{< blocks/products/products-backtop-button >}}
\ No newline at end of file
diff --git a/ocr/spanish/python/general/perform-ocr-on-image-with-python-complete-guide/_index.md b/ocr/spanish/python/general/perform-ocr-on-image-with-python-complete-guide/_index.md
new file mode 100644
index 000000000..abc80269d
--- /dev/null
+++ b/ocr/spanish/python/general/perform-ocr-on-image-with-python-complete-guide/_index.md
@@ -0,0 +1,209 @@
+---
+category: general
+date: 2026-04-29
+description: Perform OCR on an image using Python, download a HuggingFace model
+  automatically, and release GPU memory efficiently while cleaning up
+  the OCR text.
+draft: false
+keywords:
+- perform OCR on image
+- download HuggingFace model python
+- release GPU memory python
+- clean OCR text python
+language: es
+og_description: Learn how to perform OCR on an image with Python, download a HuggingFace
+  model automatically, clean the text, and release GPU memory.
+og_title: Perform OCR on an Image with Python – Step-by-Step Guide
+tags:
+- OCR
+- Python
+- Aspose
+- HuggingFace
+title: Perform OCR on an Image with Python – Complete Guide
+url: /es/python/general/perform-ocr-on-image-with-python-complete-guide/
+---
+
+{{< blocks/products/pf/main-wrap-class >}}
+{{< blocks/products/pf/main-container >}}
+{{< blocks/products/pf/tutorial-page-section >}}
+
+# Perform OCR on an Image with Python – Complete Guide
+
+Have you ever needed to **perform OCR on image** files but got stuck at the model download or the GPU memory cleanup stage? 
You are not the only one: many developers hit that wall when they try to combine optical character recognition with large language models.
+
+In this tutorial we walk through a single end-to-end solution that **downloads a HuggingFace model from Python**, runs Aspose OCR, cleans up the raw output, and finally **releases the GPU memory** the process was holding. By the end you will have a ready-to-run script that turns a scanned PNG into polished, searchable text.
+
+> **What you get:** a complete, runnable code sample, explanations of why each step matters, tips for avoiding common mistakes, and a look at how to adapt the pipeline to your own projects.
+
+---
+
+## What You'll Need
+
+- Python 3.9 or later (the example was tested on 3.11)
+- The `aspose-ocr` package (install via `pip install aspose-ocr`)
+- An internet connection for the **HuggingFace model download** step
+- A CUDA-capable GPU if you want the speed-up (optional but recommended)
+
+No additional system-level dependencies are required; the Aspose OCR engine bundles everything you need.
+
+---
+
+![example of performing OCR on an image](image.png "Example of performing OCR on an image with Aspose OCR and an LLM post-processor")
+
+*Image alt text: "perform OCR on image – Aspose OCR output before and after AI cleanup"*
+
+---
+
+## Perform OCR on an Image – Step-by-Step Overview
+
+Below we split the workflow into logical blocks. Each block has its own heading, so AI assistants can jump straight to the part you care about and search engines can index the relevant keywords.
+
+### 1. Download a HuggingFace Model in Python
+
+The first thing we need is a language model that will act as a post-processor for the raw OCR output. 
Aspose OCR includes a helper class called `AsposeAI` that can download a model from the HuggingFace hub automatically.
+
+```python
+import aspose.ocr as aocr
+from aspose.ocr import AsposeAI, AsposeAIModelConfig, OcrEngine
+
+# Configure the model – it will auto‑download the first time you run it
+model_config = AsposeAIModelConfig(
+    allow_auto_download="true",  # <-- enables auto‑download
+    hugging_face_repo_id="Qwen/Qwen2.5-3B-Instruct-GGUF",
+    hugging_face_quantization="int8",  # smaller footprint on GPU
+    gpu_layers=20,  # how many layers stay on GPU
+    directory_model_path=r"YOUR_DIRECTORY"  # where the model files live
+)
+```
+
+**Why it matters:**
+- **Downloading the HuggingFace model from Python** means you avoid handling zip files or token authentication by hand.
+- Using `int8` quantization shrinks the model to roughly a quarter of its 32-bit size, which matters later when you need to **release GPU memory**.
+
+> **Pro tip:** Keep `directory_model_path` on an SSD for faster load times.
+
+---
+
+### 2. Initialise the AI Helper and Enable Spell Correction
+
+Now we create an `AsposeAI` instance and attach a spell-corrector post-processor. This is where the **clean OCR text** part begins.
+
+```python
+# Initialise the AI helper
+ai_helper = AsposeAI()
+ai_helper.set_post_processor(
+    processor="spell_corrector",
+    custom_settings={"max_edits": 2}  # allows up to two character edits per word
+)
+```
+
+**Explanation:**
+The spell corrector examines each token from the OCR engine and suggests edits bounded by `max_edits`. This small setting can turn "rec0gn1tion" into "recognition" without needing a heavyweight language model.
+
+---
+
+### 3. Attach the AI Helper to the OCR Engine
+
+Aspose introduced a new method in version 23.4 that lets you plug an AI engine directly into the OCR pipeline.
+
+```python
+# Initialise the OCR engine and attach the AI helper
+ocr_engine = OcrEngine()
+ocr_engine.set_ai_engine(ai_helper)  # new in v23.4
+```
+
+**Why we do this:**
+By attaching the AI helper early, the OCR engine can optionally use the model for on-the-fly improvements (e.g., layout detection). It also keeps the code tidy, since no separate post-processing loops are needed later.
+
+---
+
+### 4. Perform OCR on the Scanned Image
+
+This is the core step that actually **performs OCR on image** files. Replace `YOUR_DIRECTORY/input.png` with the path to your own scan.
+
+```python
+image_path = r"YOUR_DIRECTORY/input.png"
+ocr_result = ocr_engine.recognize(image_path)
+
+print("Raw OCR text:")
+print(ocr_result.text)
+```
+
+Typical raw output may contain line breaks in odd places, misrecognized characters, or stray symbols. That is why we need the next step.
+
+**Expected raw output (example):**
+
+```
+Th1s 1s 4n ex4mpl3 0f r4w OCR t3xt.
+It c0ntains numb3rs 123 and s0me m1stakes.
+```
+
+---
+
+### 5. Clean the OCR Text in Python with the AI Post-Processor
+
+Now we let the AI clean up the mess. This is the heart of the **clean OCR text** process.
+
+```python
+cleaned_result = ocr_engine.run_postprocessor(ocr_result)
+
+print("\nAI‑enhanced text:")
+print(cleaned_result.text)
+```
+
+**The result you will see:**
+
+```
+This is an example of raw OCR text.
+It contains numbers 123 and some mistakes.
+```
+
+Notice how the spell corrector fixed "Th1s" → "This" and turned the stray "4n" into "an". The model also normalizes whitespace, which is often a pain point when you later feed the text into downstream NLP pipelines.
+
+---
+
+### 6. Release GPU Memory in Python – Cleanup Steps
+
+When you are done, it is good practice to release the GPU resources, especially if you run multiple OCR jobs in a long-running service.
+
+```python
+# Release resources – crucial for GPU memory
+ai_helper.free_resources()
+ocr_engine.dispose()
+```
+
+**What happens under the hood:**
+`free_resources()` unloads the model from the GPU, returning the memory to the CUDA driver. `dispose()` closes the OCR engine's internal buffers. Skipping these calls can lead to out-of-memory errors after only a handful of images.
+
+> **Remember:** If you plan to process batches in a loop, either call the cleanup after each batch or reuse the same `ai_helper` without releasing it until the end.
+
+---
+
+## Bonus: Tuning the Pipeline for Different Scenarios
+
+### Adjusting the Model Quantization
+
+If you have a powerful GPU (e.g., an RTX 4090) and want higher accuracy, change `hugging_face_quantization` to `"fp16"` and raise `gpu_layers` to `30`. This consumes more memory, so you will need to **release GPU memory** more aggressively after each batch.
+
+### Using a Custom Spell Corrector
+
+You can replace the built-in `spell_corrector` with a custom post-processor that makes domain-specific corrections (e.g., medical terminology). Simply implement the required interface and pass its name to `set_post_processor`.
+
+### Batch Processing Multiple Images
+
+Wrap the OCR steps in a `for` loop, collect `cleaned_result.text` in a list, and call `ai_helper.free_resources()` only after the loop if you have enough GPU RAM. This avoids the overhead of loading the model repeatedly.
+
+---
+
+## Conclusion
+
+We have just shown you how to **perform OCR on image** files in Python, download a **HuggingFace model** automatically, **clean the OCR text**, and safely release the **GPU memory** when you are done. The full script is ready to copy and paste, and the explanations give you the confidence to adapt it to larger projects.
+
+Next steps? 
Try swapping the Qwen 2.5 model for a larger LLaMA variant, experiment with different post-processors, or feed the clean output into a searchable Elasticsearch index. The possibilities are endless, and you now have a solid foundation to build on.
+
+Happy coding, and may your OCR pipelines always be clean and memory-friendly!
+
+{{< /blocks/products/pf/tutorial-page-section >}}
+{{< /blocks/products/pf/main-container >}}
+{{< /blocks/products/pf/main-wrap-class >}}
+{{< blocks/products/products-backtop-button >}}
\ No newline at end of file
diff --git a/ocr/swedish/python/general/extract-text-from-pdf-ocr-pdf-with-python/_index.md b/ocr/swedish/python/general/extract-text-from-pdf-ocr-pdf-with-python/_index.md
new file mode 100644
index 000000000..27844c158
--- /dev/null
+++ b/ocr/swedish/python/general/extract-text-from-pdf-ocr-pdf-with-python/_index.md
@@ -0,0 +1,216 @@
+---
+category: general
+date: 2026-04-29
+description: Extract text from PDF with Aspose OCR in Python. Learn batch OCR PDF processing,
+  how to convert scanned PDF text, and how to handle low-confidence pages.
+draft: false
+keywords:
+- extract text from pdf
+- ocr pdf with python
+- convert scanned pdf text
+- batch ocr pdf processing
+language: sv
+og_description: Extract text from PDF with Aspose OCR in Python. This guide shows
+  batch OCR PDF processing, converting scanned PDF text, and handling low-confidence
+  results.
+og_title: Extract Text from PDF – OCR PDF with Python
+tags:
+- OCR
+- Python
+- PDF processing
+title: Extract Text from PDF – OCR PDF with Python
+url: /sv/python/general/extract-text-from-pdf-ocr-pdf-with-python/
+---
+
+{{< blocks/products/pf/main-wrap-class >}}
+{{< blocks/products/pf/main-container >}}
+{{< blocks/products/pf/tutorial-page-section >}}
+
+# Extract Text from PDF – OCR PDF with Python
+
+Have you ever needed to **extract text from PDF** but the file is just a scanned image? 
You are not alone: many developers run into this when they try to make PDF files searchable. The good news? With Aspose OCR for Python you can convert scanned PDF text in a few lines, and even run **batch OCR PDF processing** when you have dozens of files to handle.
+
+In this tutorial we walk through the whole workflow: installing the library, running OCR on a single PDF, scaling up to a batch, and handling low-confidence pages so you know when a manual review is needed. By the end you will have a finished script that extracts text from any scanned PDF, and you will understand why each step is needed.
+
+## What You'll Need
+
+- Python 3.8 or newer (the code uses f-strings, so 3.6+ works, but 3.8+ is recommended)
+- An Aspose OCR for Python license or a free trial key (you can get one from the Aspose website)
+- A folder with one or more scanned PDF files you want to process
+- A moderate amount of disk space for the generated *.txt* reports
+
+That's all: no heavy external dependencies, no OpenCV gymnastics. The Aspose OCR engine handles the heavy lifting for you.
+
+## Setting Up the Environment
+
+First, install the Aspose OCR package from PyPI:
+
+```bash
+pip install aspose-ocr
+```
+
+If you have a license file (`Aspose.OCR.lic`), place it in the project root and activate it like this:
+
+```python
+# Activate Aspose OCR license (optional but removes trial watermark)
+from aspose.ocr import License
+
+license = License()
+license.set_license("Aspose.OCR.lic")
+```
+
+> **Pro tip:** Keep the license file out of version control; add it to `.gitignore` to avoid accidental exposure.
+
+## Performing OCR on a Single PDF
+
+Now let's extract text from a single scanned PDF. The basic steps are:
+
+1. Create an `OcrEngine` instance.
+2. Point it at the PDF file.
+3. Retrieve an `OcrResult` for each page.
+4. Write the text output to disk.
+5. Dispose of the engine to release native resources.
+
+Here is the complete, runnable script:
+
+```python
+# Step 1: Import the OCR engine and create an instance
+from aspose.ocr import OcrEngine
+
+# Optional: activate license (see previous section)
+# from aspose.ocr import License
+# License().set_license("Aspose.OCR.lic")
+
+ocr_engine = OcrEngine()
+
+# Step 2: Specify the PDF file to be processed
+pdf_file_path = r"YOUR_DIRECTORY/input.pdf"
+
+# Step 3: Extract OCR results – one OcrResult object per page
+ocr_pages = ocr_engine.extract_from_pdf(pdf_file_path)
+
+# Step 4: Iterate through the results, show confidence, flag low‑confidence pages, and save the plain text
+for ocr_page in ocr_pages:
+    print(f"Page {ocr_page.page_number}: confidence {ocr_page.confidence:.2%}")
+
+    # If confidence is below 80 %, flag it for manual review
+    if ocr_page.confidence < 0.80:
+        print(" Low confidence – consider manual review")
+
+    # Build the output filename
+    output_txt = f"YOUR_DIRECTORY/Report_page{ocr_page.page_number}.txt"
+    with open(output_txt, "w", encoding="utf-8") as f:
+        f.write(ocr_page.text)
+
+# Step 5: Release resources used by the engine
+ocr_engine.dispose()
+```
+
+**What you will see:** For each page the script prints something like `Page 1: confidence 97.45%`. If a page falls below the 80 % threshold, a warning tells you that OCR may have missed characters.
+
+### Why this works
+
+- **`OcrEngine`** is the gateway to the native Aspose OCR library; it handles everything from image preprocessing to character recognition.
+- **`extract_from_pdf`** rasterizes each PDF page automatically, so you don't have to convert the PDF to images yourself.
+- **Confidence scores** let you automate quality checks, which is critical when you process legal or medical documents where accuracy matters.
+
+## Batch OCR PDF Processing with Python
+
+Most real-world projects involve more than one file. 
Let's extend the single-file script into a **batch OCR PDF processing** pipeline that walks a directory, processes each PDF, and stores the results in a matching subdirectory.
+
+```python
+import os
+from pathlib import Path
+from aspose.ocr import OcrEngine
+
+def ocr_pdf_file(pdf_path: Path, output_dir: Path, engine: OcrEngine, confidence_threshold: float = 0.80):
+    """Extract text from a single PDF and write per‑page .txt files."""
+    ocr_pages = engine.extract_from_pdf(str(pdf_path))
+    for page in ocr_pages:
+        print(f"{pdf_path.name} – Page {page.page_number}: {page.confidence:.2%}")
+        if page.confidence < confidence_threshold:
+            print(" ⚠️ Low confidence – manual review may be needed")
+
+        out_file = output_dir / f"{pdf_path.stem}_page{page.page_number}.txt"
+        out_file.write_text(page.text, encoding="utf-8")
+
+def batch_ocr_pdf(input_folder: str, output_folder: str):
+    """Iterate over all PDFs in input_folder and run OCR."""
+    engine = OcrEngine()
+    input_path = Path(input_folder)
+    output_path = Path(output_folder)
+    output_path.mkdir(parents=True, exist_ok=True)
+
+    pdf_files = list(input_path.glob("*.pdf"))
+    if not pdf_files:
+        print("🚫 No PDF files found in the specified folder.")
+        return
+
+    for pdf_file in pdf_files:
+        # Create a sub‑folder for each PDF to keep pages organized
+        pdf_out_dir = output_path / pdf_file.stem
+        pdf_out_dir.mkdir(exist_ok=True)
+        ocr_pdf_file(pdf_file, pdf_out_dir, engine)
+
+    # Clean up native resources
+    engine.dispose()
+    print("✅ Batch OCR completed.")
+
+# Example usage:
+if __name__ == "__main__":
+    batch_ocr_pdf("YOUR_DIRECTORY/input_pdfs", "YOUR_DIRECTORY/ocr_output")
+```
+
+#### How this helps
+
+- **Scalability:** The function walks the folder once and creates a dedicated output subdirectory for each PDF. This keeps things organized when you have dozens of documents.
+- **Reusability:** `ocr_pdf_file` can be called from other scripts (e.g., a web service) because it is a self-contained function.
+- **Error handling:** The script prints a friendly message if the input folder is empty, which saves you from a silent failure.
+
+## Convert Scanned PDF Text – Handling Edge Cases
+
+Although the code above works for most PDF files, you may run into a few quirks:
+
+| Situation | Why it happens | How to mitigate |
+|-----------|----------------|-----------------|
+| **Encrypted PDFs** | The PDF is password-protected. | Pass the password to `extract_from_pdf(pdf_path, password="yourPwd")`. |
+| **Multilingual documents** | Aspose OCR defaults to English. | Set `ocr_engine.language = "spa"` for Spanish, or supply a list for mixed languages. |
+| **Very large PDFs (>500 pages)** | Memory usage spikes because every page is loaded into RAM. | Process the PDF in chunks with `engine.extract_from_pdf(pdf_path, start_page=1, end_page=100)` and loop. |
+| **Poor scan quality** | Low DPI or heavy noise reduces confidence. | Preprocess the PDF with `engine.image_preprocessing = True` or raise the DPI via `engine.dpi = 300`. |
+
+> **Note:** Turning on image preprocessing can noticeably increase CPU time. If you run a nightly batch, schedule enough time or start a separate worker.
+
+## Verifying the Output
+
+After the script finishes you will find a folder structure like this:
+
+```
+ocr_output/
+├─ invoice_2023/
+│  ├─ invoice_2023_page1.txt
+│  ├─ invoice_2023_page2.txt
+│  └─ …
+└─ contract_A/
+   ├─ contract_A_page1.txt
+   └─ …
+```
+
+Open any `.txt` file; you should see clean, UTF-8 encoded text that mirrors the original scanned content. If you notice garbled characters, double-check the PDF's language settings and make sure the right font packages are installed on the machine.
+
+## Cleaning Up Resources
+
+Aspose OCR relies on native DLLs, so it is important to call `engine.dispose()` when you are done. Forgetting this step can lead to memory leaks, especially in long-running batch jobs.
+ +```python +# Always the last line of your script +engine.dispose() +``` + +## Fullt end‑to‑end‑exempel + +Genom att samla allt, här är en enskild + +{{< /blocks/products/pf/tutorial-page-section >}} +{{< /blocks/products/pf/main-container >}} +{{< /blocks/products/pf/main-wrap-class >}} +{{< blocks/products/products-backtop-button >}} \ No newline at end of file diff --git a/ocr/swedish/python/general/how-to-recognize-handwriting-in-python-full-tutorial/_index.md b/ocr/swedish/python/general/how-to-recognize-handwriting-in-python-full-tutorial/_index.md new file mode 100644 index 000000000..c2485dcaa --- /dev/null +++ b/ocr/swedish/python/general/how-to-recognize-handwriting-in-python-full-tutorial/_index.md @@ -0,0 +1,278 @@ +--- +category: general +date: 2026-04-29 +description: Lär dig hur du känner igen handskrift i Python med Aspose OCR. Denna + steg‑för‑steg‑guide visar hur du effektivt extraherar handskriven text. +draft: false +keywords: +- how to recognize handwriting +- extract handwritten text +- handwritten text recognition python +- handwritten ocr tutorial python +language: sv +og_description: Hur känner du igen handskrift i Python? Följ den här kompletta guiden + för att extrahera handskriven text med Aspose OCR, med kod, tips och hantering av + kantfall. +og_title: Hur man känner igen handskrift i Python – Fullständig handledning +tags: +- OCR +- Python +- HandwritingRecognition +title: Hur man känner igen handskrift i Python – Fullständig handledning +url: /sv/python/general/how-to-recognize-handwriting-in-python-full-tutorial/ +--- + +{{< blocks/products/pf/main-wrap-class >}} +{{< blocks/products/pf/main-container >}} +{{< blocks/products/pf/tutorial-page-section >}} + +# Hur man känner igen handskrift i Python – Fullständig handledning + +Har du någonsin behövt **hur man känner igen handskrift** i ett Python‑projekt men inte vetat var du ska börja? 
Du är inte ensam—utvecklare frågar ständigt: “Kan jag dra ut text från en skannad anteckning?” Den goda nyheten är att moderna OCR‑bibliotek gör detta till en barnlek. I den här guiden går vi igenom **hur man känner igen handskrift** med Aspose OCR, och du får även lära dig att **extrahera handskriven text** på ett pålitligt sätt. + +Vi täcker allt från att installera biblioteket till att justera förtroendesgränser för de där röriga kursiva skripten. I slutet har du ett körbart skript som skriver ut den extraherade texten och ett totalt förtroende‑värde—perfekt för antecknings‑appar, arkiveringsverktyg eller bara för att stilla nyfikenheten. Ingen tidigare OCR‑erfarenhet krävs; grundläggande kunskaper i Python räcker. + +--- + +## Vad du behöver + +- **Python 3.9+** (den senaste stabila versionen fungerar bäst) +- **Aspose.OCR for Python via .NET** – installera med `pip install aspose-ocr` +- En **handskriven bild** (JPEG/PNG) som du vill bearbeta +- Valfritt: en virtuell miljö för att hålla beroenden organiserade + +Om du har dessa saker redo, låt oss dyka in. + +![Exempel på hur man känner igen handskrift](/images/handwritten-sample.jpg "Exempel på hur man känner igen handskrift") + +*(Alt‑text: “exempel på hur man känner igen handskrift som visar en skannad handskriven anteckning”)* + +--- + +## Steg 1 – Installera och importera Aspose OCR‑klasser + +Först och främst behöver vi OCR‑motorn själv. Aspose erbjuder ett rent API som separerar tryckt‑textigenkänning från handskriftsläge. + +```python +# Install the package (run this in your terminal if you haven't already) +# pip install aspose-ocr + +# Import the required OCR classes +from aspose.ocr import OcrEngine, HandwritingMode +``` + +*Varför detta är viktigt:* Att importera `HandwritingMode` låter oss tala om för motorn att vi arbetar med **handwritten text recognition python** snarare än tryckt text, vilket dramatiskt förbättrar noggrannheten för kurviga streck. 
+ +--- + +## Steg 2 – Skapa och konfigurera OCR‑motorn + +Nu skapar vi en `OcrEngine`‑instans och byter till handskriftsläge. Du kan också justera förtroendesgränsen; lägre värden accepterar skakig skrift, högre värden kräver renare inmatning. + +```python +# Step 2: Initialize the OCR engine for handwritten text +ocr_engine = OcrEngine() +ocr_engine.set_recognition_mode(HandwritingMode.HANDWRITTEN) + +# Optional: tighten the confidence threshold for cursive scripts +# Default is 0.5; 0.65 works well for most notebooks +ocr_engine.set_handwriting_confidence_threshold(0.65) +``` + +*Proffstips:* Om dina anteckningar är skannade med 300 DPI eller högre får du vanligtvis ett bättre resultat. För lågupplösta bilder, överväg att skala upp med Pillow innan du skickar dem till motorn. + +--- + +## Steg 3 – Förbered bildens sökväg + +Se till att filvägen pekar på bilden du vill bearbeta. Relativa sökvägar fungerar bra, men absoluta sökvägar undviker “filen hittades inte”‑överraskningar. + +```python +# Step 3: Point to your handwritten image +input_image_path = "YOUR_DIRECTORY/handwritten_sample.jpg" +``` + +*Vanligt fallgropp:* Att glömma att escape‑a bakåtsnedstreck på Windows (`C:\\folder\\image.jpg`). Att använda råa strängar (`r"C:\folder\image.jpg"`) kringgår problemet. + +--- + +## Steg 4 – Kör igenkänningen och fånga resultaten + +`recognize`‑metoden gör det tunga arbetet. Den returnerar ett objekt med egenskaperna `.text` och `.confidence`. 
+ +```python +# Step 4: Run OCR and get the result +result = ocr_engine.recognize(input_image_path) + +# Display the extracted text and confidence +print("Hand‑written extraction:") +print(result.text) +print("Overall confidence:", result.confidence) +``` + +**Förväntad utskrift (exempel):** + +``` +Hand‑written extraction: +Meeting notes: +- Discuss quarterly goals +- Review budget allocations +- Assign action items +Overall confidence: 0.78 +``` + +Om förtroendet sjunker under 0,5 kan du behöva rengöra bilden (ta bort skuggor, öka kontrast) eller sänka tröskeln i Steg 2. + +--- + +## Steg 5 – Rensa upp resurser + +Aspose OCR håller på inhemska resurser; ett anrop till `dispose()` frigör dem och förhindrar minnesläckor, särskilt när du bearbetar många bilder i en loop. + +```python +# Step 5: Release the engine resources +ocr_engine.dispose() +``` + +*Varför dispose?* I långlivade tjänster (t.ex. ett Flask‑API som tar emot uppladdningar) kan glömska att frigöra resurser snabbt tömma systemets minne. + +--- + +## Fullt skript – Ett‑klick‑körning + +När allt är sammansatt, här är ett självständigt skript som du kan kopiera‑klistra in och köra. + +```python +# -*- coding: utf-8 -*- +""" +Handwritten OCR Tutorial – how to recognize handwriting in Python +Author: Your Name +Date: 2026-04-29 +""" + +# Install the library first if you haven't already: +# pip install aspose-ocr + +from aspose.ocr import OcrEngine, HandwritingMode + +def recognize_handwriting(image_path: str, confidence_threshold: float = 0.65): + """ + Recognizes handwritten text from an image using Aspose OCR. + + Args: + image_path (str): Path to the handwritten image file. + confidence_threshold (float): Minimum confidence to accept a word. 
+ + Returns: + tuple: (extracted_text (str), overall_confidence (float)) + """ + # Initialize engine in handwritten mode + engine = OcrEngine() + engine.set_recognition_mode(HandwritingMode.HANDWRITTEN) + engine.set_handwriting_confidence_threshold(confidence_threshold) + + # Perform recognition + result = engine.recognize(image_path) + + # Capture output + extracted = result.text + confidence = result.confidence + + # Clean up + engine.dispose() + return extracted, confidence + +if __name__ == "__main__": + # Replace with your actual file location + img_path = "YOUR_DIRECTORY/handwritten_sample.jpg" + + text, conf = recognize_handwriting(img_path) + + print("Hand‑written extraction:") + print(text) + print("Overall confidence:", conf) +``` + +Spara detta som `handwritten_ocr.py` och kör `python handwritten_ocr.py`. Om allt är korrekt installerat ser du den extraherade texten skriven i konsolen. + +--- + +## Hantera kantfall och vanliga variationer + +### Låg‑kontrastbilder +Om bakgrunden blöder in i bläcket, öka kontrasten först: + +```python +from PIL import Image, ImageEnhance + +img = Image.open(input_image_path) +enhancer = ImageEnhance.Contrast(img) +high_contrast = enhancer.enhance(2.0) # 2× contrast +high_contrast.save("temp_high_contrast.jpg") +result = ocr_engine.recognize("temp_high_contrast.jpg") +``` + +### Rotera anteckningar +En sned notebook‑sida kan störa igenkänningen. 
Använd OpenCV och NumPy för att räta upp:
+
+```python
+import numpy as np
+import cv2
+
+def deskew(image_path):
+    # Load as grayscale; cv2.imread returns None if the file cannot be read
+    img = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
+    if img is None:
+        raise FileNotFoundError(image_path)
+    # Binarize with inverted Otsu so that ink pixels become non‑zero;
+    # without this step a white background would dominate the estimate
+    _, binary = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY_INV | cv2.THRESH_OTSU)
+    coords = np.column_stack(np.where(binary > 0)).astype(np.float32)
+    angle = cv2.minAreaRect(coords)[-1]
+    # NOTE: the angle convention of minAreaRect differs between OpenCV
+    # versions, so verify the rotation direction on a sample image
+    if angle < -45:
+        angle = -(90 + angle)
+    else:
+        angle = -angle
+    (h, w) = img.shape[:2]
+    M = cv2.getRotationMatrix2D((w // 2, h // 2), angle, 1.0)
+    corrected = cv2.warpAffine(img, M, (w, h), flags=cv2.INTER_CUBIC, borderMode=cv2.BORDER_REPLICATE)
+    cv2.imwrite("deskewed.jpg", corrected)
+
+deskew(input_image_path)
+result = ocr_engine.recognize("deskewed.jpg")
+```
+
+### Fler‑sidiga PDF‑filer
+Aspose OCR kan också hantera PDF‑sidor, men du måste först konvertera varje sida till en bild (t.ex. med `pdf2image`). Loopa sedan igenom bilderna med samma `recognize_handwriting`‑funktion.
+
+---
+
+## Proffstips för bättre **Extract Handwritten Text**‑resultat
+
+- **DPI är viktigt:** Sikta på 300 DPI eller högre när du skannar.
+- **Undvik färgade bakgrunder:** Ren vit eller ljusgrå ger den renaste utskriften.
+- **Batch‑bearbetning:** Lägg funktionen i en `for`‑loop och logga varje sidas förtroende; släng resultat under en tröskel för att hålla hög kvalitet.
+- **Språkstöd:** Aspose OCR stödjer flera språk; sätt `engine.set_language("en")` för engelsk‑optimering.
+
+---
+
+## Vanliga frågor
+
+**Fungerar detta på Linux?**
+Ja, Aspose OCR levereras med plattformsspecifika (native) binärer för Windows, macOS och Linux. Installera bara pip‑paketet så är du klar.
+
+**Vad händer om min handstil är extremt kursiv?**
+Försök att sänka förtroendetröskeln (`0.5` eller till och med `0.4`). Tänk på att detta kan introducera mer brus, så efterbehandla gärna utskriften (t.ex. stavningskontroll) om det behövs.
+
+**Kan jag använda detta i en webbtjänst?**
+Absolut. `recognize_handwriting`‑funktionen är stateless, vilket gör den perfekt för Flask‑ eller FastAPI‑endpoints. 
Kom bara ihåg att anropa `dispose()` efter varje begäran eller använd en context manager. + +--- + +## Slutsats + +Vi har gått igenom **hur man känner igen handskrift** i Python från början till slut, visat hur du **extraherar handskriven text**, justerar förtroendeinställningar och hanterar vanliga fallgropar som låg kontrast eller roterade sidor. Det kompletta skriptet ovan är redo att köras, och den modulära funktionen gör det enkelt att integrera i större projekt—oavsett om du bygger en anteckningsapp, digitaliserar arkiv eller bara experimenterar med **handwritten ocr tutorial python**‑tekniker. + +Nästa steg kan vara att utforska **handwritten text recognition python** för flerspråkiga anteckningar, eller kombinera OCR med naturlig språkbehandling för att automatiskt sammanfatta mötesprotokoll. Himlen är gränsen—ge det ett försök och låt din kod ge liv åt klotter. + +Lycka till med kodandet, och tveka inte att lämna dina frågor i kommentarerna! + +{{< /blocks/products/pf/tutorial-page-section >}} +{{< /blocks/products/pf/main-container >}} +{{< /blocks/products/pf/main-wrap-class >}} +{{< blocks/products/products-backtop-button >}} \ No newline at end of file diff --git a/ocr/swedish/python/general/how-to-run-ocr-with-aspose-hugging-face-complete-guide/_index.md b/ocr/swedish/python/general/how-to-run-ocr-with-aspose-hugging-face-complete-guide/_index.md new file mode 100644 index 000000000..575c153b5 --- /dev/null +++ b/ocr/swedish/python/general/how-to-run-ocr-with-aspose-hugging-face-complete-guide/_index.md @@ -0,0 +1,180 @@ +--- +category: general +date: 2026-04-29 +description: Lär dig hur du kör OCR på dina skanningar, använder Hugging Face-modellen + automatiskt och känner igen text från skanningar med Aspose OCR på några minuter. 
+draft: false +keywords: +- how to run OCR +- use hugging face model +- recognize text from scans +- download model automatically +language: sv +og_description: Hur man kör OCR på skanningar med Aspose OCR, automatiskt laddar ner + en Hugging Face-modell och får ren, punktuerad text. +og_title: Hur du kör OCR med Aspose och Hugging Face – Komplett guide +tags: +- OCR +- Aspose +- Hugging Face +- Python +title: Hur man kör OCR med Aspose och Hugging Face – Komplett guide +url: /sv/python/general/how-to-run-ocr-with-aspose-hugging-face-complete-guide/ +--- + +{{< blocks/products/pf/main-wrap-class >}} +{{< blocks/products/pf/main-container >}} +{{< blocks/products/pf/tutorial-page-section >}} + +# Hur man kör OCR med Aspose & Hugging Face – Komplett guide + +Har du någonsin undrat **hur man kör OCR** på en hög med skannade dokument utan att spendera timmar på att justera inställningarna? Du är inte ensam. I många projekt måste utvecklare **igenkänna text från skanningar** snabbt, men de stöter på problem med modellnedladdningar och efterbehandling. + +Good news: den här handledningen visar dig en färdig‑till‑körning‑lösning som **använder en Hugging Face-modell**, automatiskt hämtar den och lägger till interpunktion så att resultatet läses som om en människa skrivit det. I slutet har du ett skript som bearbetar varje bild i en mapp och skapar en ren `.txt`‑fil bredvid varje skanning. + +## Vad du behöver + +- Python 3.8+ (koden använder f‑strings, så äldre versioner räcker inte) +- `aspose-ocr`-paketet (installera via `pip install aspose-ocr`) +- Internetåtkomst för den första modellnedladdningen +- En mapp med bildskanningar (`.png`, `.jpg` eller `.tif`) + +Det är allt—inga extra binärer, ingen manuell modellhantering. Låt oss dyka in. + +![exempel på hur man kör OCR](https://example.com/ocr-demo.png "exempel på hur man kör OCR") + +## Steg 1: Importera Aspose OCR-klasser & konfigurera miljön + +We start by pulling the necessary classes from the Aspose OCR library. 
Importing everything up front keeps the script tidy and makes it easy to spot missing dependencies.
+
+```python
+# Step 1: Import Aspose OCR classes
+import os
+from aspose.ocr import OcrEngine, AsposeAI, AsposeAIModelConfig
+```
+
+*Varför detta är viktigt*: `OcrEngine` gör det tunga arbetet, medan `AsposeAI` låter oss ansluta en stor språkmodell för smartare efterbehandling. Om du hoppar över importen misslyckas resten av koden med ett `NameError` så fort klasserna används, så glöm inte den.
+
+## Steg 2: Konfigurera en GPU‑medveten Hugging Face-modell
+
+Now we tell Aspose where to fetch the model and how many layers should run on the GPU. The `allow_auto_download="true"` flag does the **download model automatically** part for you.
+
+```python
+# Step 2: Configure a GPU‑aware AI model (replace with your own model folder)
+model_config = AsposeAIModelConfig(
+    allow_auto_download="true",
+    hugging_face_repo_id="Qwen/Qwen2.5-3B-Instruct-GGUF",
+    hugging_face_quantization="int8",
+    gpu_layers=40,  # use GPU for faster inference
+    directory_model_path=r"YOUR_DIRECTORY/models"
+)
+```
+
+> **Proffstips**: Om du inte har en GPU, sätt `gpu_layers=0`. Modellen faller då tillbaka till CPU, vilket är långsammare men fortfarande fungerar.
+
+### Varför välja en Hugging Face-modell?
+
+Hugging Face har en enorm samling av färdiga LLM:er. Genom att peka på `Qwen/Qwen2.5-3B-Instruct-GGUF` får du en kompakt, instruktion‑optimerad modell som kan lägga till interpunktion, korrigera avstånd och till och med fixa mindre OCR‑fel. Detta är kärnan i **use hugging face model** i praktiken.
+
+## Steg 3: Initiera AI-motorn och aktivera interpunktion efter‑behandling
+
+The AI engine isn’t just for fancy chat; here we attach a *punctuation adder* that cleans up raw OCR output. 
+ +```python +# Step 3: Initialise the AI engine and enable punctuation post‑processing +ai_engine = AsposeAI() +ai_engine.set_post_processor("punctuation_adder", {}) +``` + +*Vad händer?* Anropet `set_post_processor` registrerar en inbyggd efterprocessor som körs efter att OCR‑motorn är klar. Den tar den råa strängen och sätter in kommatecken, punkter och stora bokstäver där de hör hemma, vilket gör den slutliga texten mycket mer läsbar. + +## Steg 4: Skapa OCR-motorn och anslut AI-motorn + +Connecting the AI engine to the OCR engine gives us a single object that can both read characters and polish the result. + +```python +# Step 4: Create the OCR engine and attach the AI engine +ocr_engine = OcrEngine() +ocr_engine.set_ai_engine(ai_engine) +``` + +If you skip this step, the OCR will still work, but you’ll lose the punctuation boost—so the output will look like a stream of words. + +## Steg 5: Bearbeta varje bild i en mapp + +Here’s the heart of the tutorial. We loop over each image, run OCR, apply the post‑processor, and write the cleaned text to a side‑by‑side `.txt` file. 
+ +```python +# Step 5: Run OCR on each image in a folder, post‑process the result, and save the text +scans_folder = r"YOUR_DIRECTORY/scans" +for image_file in os.listdir(scans_folder): + # Filter only supported image types + if not image_file.lower().endswith(('.png', '.jpg', '.tif')): + continue + + image_path = os.path.join(scans_folder, image_file) + + # Recognise text from the image + ocr_result = ocr_engine.recognize(image_path) + + # Apply the punctuation post‑processor + ocr_result = ocr_engine.run_postprocessor(ocr_result) + + # Show a brief confidence summary + print(f"{image_file} – confidence {ocr_result.confidence:.2%}") + + # Save the cleaned text next to the source image + txt_path = image_path + ".txt" + with open(txt_path, "w", encoding="utf-8") as txt_file: + txt_file.write(ocr_result.text) +``` + +### Vad du kan förvänta dig + +Running the script prints something like: + +``` +invoice_001.png – confidence 96.73% +receipt_2024.tif – confidence 94.12% +``` + +Each line tells you the confidence score (a quick health check) and creates `invoice_001.png.txt`, `receipt_2024.tif.txt`, etc., containing punctuated, human‑readable text. + +### Kantfall & variationer + +- **Icke‑engelska skanningar**: Byt `hugging_face_repo_id` till en flerspråkig modell (t.ex. `microsoft/Multilingual-LLM-GGUF`). +- **Stora batcher**: Inslå loopen i en `concurrent.futures.ThreadPoolExecutor` för parallell bearbetning, men var medveten om GPU‑minnesgränser. +- **Anpassad efterbehandling**: Ersätt `"punctuation_adder"` med ditt eget skript om du behöver domänspecifik rensning (t.ex. ta bort fakturanummer). + +## Steg 6: Rensa upp resurser + +When the job finishes, freeing resources prevents memory leaks, especially important if you’re running this inside a long‑lived service. + +```python +# Step 6: Release resources +ai_engine.free_resources() +ocr_engine.dispose() +``` + +Neglecting this step can leave GPU memory hanging, which would sabotage subsequent runs. 
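One way to make sure both cleanup calls from Step 6 still run when a scan fails mid-loop is to register them up front with `contextlib.ExitStack`. The sketch below only illustrates the pattern: `FakeEngine`, `release`, and `process_all` are hypothetical stand-ins introduced here so the snippet runs without Aspose installed, and they are not part of any real library.

```python
from contextlib import ExitStack


class FakeEngine:
    """Hypothetical stand-in for an engine object (not a real library class)."""

    def __init__(self, name, log):
        self.name = name
        self.log = log

    def release(self):
        # Record the release so the order can be inspected afterwards
        self.log.append(f"released {self.name}")


def process_all(items, log):
    """Process items and guarantee that both engines are released."""
    with ExitStack() as stack:
        ocr = FakeEngine("ocr", log)
        stack.callback(ocr.release)  # registered first, so it runs last
        ai = FakeEngine("ai", log)
        stack.callback(ai.release)   # registered last, so it runs first
        for item in items:
            if item is None:
                raise ValueError("unreadable scan")
            log.append(f"processed {item}")


log = []
try:
    process_all(["a.png", None, "b.png"], log)
except ValueError:
    log.append("batch aborted")
print(log)
```

Callbacks run in reverse registration order, which matches the order shown in Step 6 (AI engine first, OCR engine second). In the guide's own script the equivalent would presumably be registering `ocr_engine.dispose` and `ai_engine.free_resources` right after each object is created.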
+ +## Sammanfattning: Så kör du OCR från början till slut + +In just a handful of lines, we’ve shown **how to run OCR** on a folder of scans, **use a Hugging Face model** that downloads itself the first time, and **recognize text from scans** with punctuation added automatically. The complete script is ready to copy‑paste, adjust your paths, and execute. + +## Nästa steg & relaterade ämnen + +- **Batch‑efterbehandling**: Utforska `ocr_engine.run_batch_postprocessor` för ännu snabbare massbearbetning. +- **Alternativa modeller**: Prova `openai/whisper`‑familjen om du behöver tal‑till‑text tillsammans med OCR. +- **Integration med databaser**: Spara den extraherade texten i SQLite eller Elasticsearch för sökbara arkiv. + +Feel free to experiment—swap the model, tweak `gpu_layers`, or add your own post‑processor. The flexibility of Aspose OCR combined with Hugging Face’s model hub makes this a versatile foundation for any document‑digitization project. + +--- + +*Lycka till med kodandet! Om du stöter på problem, lämna en kommentar nedanför eller kolla Aspose OCR‑dokumentationen för djupare konfigurationsalternativ.* + +{{< /blocks/products/pf/tutorial-page-section >}} +{{< /blocks/products/pf/main-container >}} +{{< /blocks/products/pf/main-wrap-class >}} +{{< blocks/products/products-backtop-button >}} \ No newline at end of file diff --git a/ocr/swedish/python/general/perform-ocr-on-image-with-python-complete-guide/_index.md b/ocr/swedish/python/general/perform-ocr-on-image-with-python-complete-guide/_index.md new file mode 100644 index 000000000..78bb35c00 --- /dev/null +++ b/ocr/swedish/python/general/perform-ocr-on-image-with-python-complete-guide/_index.md @@ -0,0 +1,208 @@ +--- +category: general +date: 2026-04-29 +description: Utför OCR på bild med Python, automatiskt ladda ner en HuggingFace-modell + och frigör GPU-minne effektivt samtidigt som OCR‑texten rengörs. 
+draft: false +keywords: +- perform OCR on image +- download HuggingFace model python +- release GPU memory python +- clean OCR text python +language: sv +og_description: Lär dig hur du utför OCR på en bild i Python, automatiskt laddar ner + en HuggingFace-modell, rensar texten och frigör GPU‑minne. +og_title: Utför OCR på bild med Python – Steg‑för‑steg guide +tags: +- OCR +- Python +- Aspose +- HuggingFace +title: Utför OCR på bild med Python – Komplett guide +url: /sv/python/general/perform-ocr-on-image-with-python-complete-guide/ +--- + +{{< blocks/products/pf/main-wrap-class >}} +{{< blocks/products/pf/main-container >}} +{{< blocks/products/pf/tutorial-page-section >}} + +# Utför OCR på bild med Python – Komplett guide + +Har du någonsin behövt **perform OCR on image** filer men fastnat vid modellnedladdning eller GPU‑minnesrensning? Du är inte ensam—många utvecklare stöter på den muren när de först försöker kombinera optisk teckenigenkänning med stora språkmodeller. + +I den här handledningen går vi igenom en enda, end‑to‑end‑lösning som **downloads a HuggingFace model in Python**, kör Aspose OCR, rensar den råa utskriften och slutligen **releases GPU memory Python** kan återta. I slutet har du ett färdigt skript som omvandlar en skannad PNG till polerad, sökbar text. + +> **What you’ll get:** ett komplett, körbart kodexempel, förklaringar till varför varje steg är viktigt, tips för att undvika vanliga fallgropar, och en inblick i hur du kan justera pipeline:n för dina egna projekt. + +--- + +## Vad du behöver + +- Python 3.9 eller nyare (exemplet testades på 3.11) +- `aspose-ocr`‑paket (installera via `pip install aspose-ocr`) +- En internetanslutning för steget **download HuggingFace model python** +- Ett CUDA‑kompatibelt GPU om du vill ha hastighetsökning (valfritt men rekommenderas) + +Inga extra system‑nivåberoenden krävs; Aspose OCR‑motorn samlar allt du behöver. 
+ +--- + +![exempel på att utföra OCR på bild](image.png "Exempel på att utföra OCR på bild med Aspose OCR och en LLM‑post‑processor") + +*Bildtext: “perform OCR on image – Aspose OCR output before and after AI cleaning”* + +--- + +## Utför OCR på bild – Steg‑för‑steg‑översikt + +Nedan delar vi upp arbetsflödet i logiska delar. Varje del har sin egen rubrik, så AI‑assistenter kan snabbt hoppa till den del du är intresserad av, och sökmotorer kan indexera de relevanta nyckelorden. + +### 1. Ladda ner HuggingFace‑modell i Python + +Det första vi måste göra är att hämta en språkmodell som kommer att fungera som post‑processor för den råa OCR‑utdata. Aspose OCR levereras med en hjälparklass som heter `AsposeAI` och som automatiskt kan hämta en modell från HuggingFace‑hubben. + +```python +import aspose.ocr as aocr +from aspose.ocr import AsposeAI, AsposeAIModelConfig, OcrEngine + +# Configure the model – it will auto‑download the first time you run it +model_config = AsposeAIModelConfig( + allow_auto_download="true", # <-- enables auto‑download + hugging_face_repo_id="Qwen/Qwen2.5-3B-Instruct-GGUF", + hugging_face_quantization="int8", # smaller footprint on GPU + gpu_layers=20, # how many layers stay on GPU + directory_model_path=r"YOUR_DIRECTORY" # where the model files live +) +``` + +**Varför detta är viktigt:** +- **download HuggingFace model python** – du undviker att manuellt hantera zip‑filer eller token‑autentisering. +- Att använda `int8`‑kvantisering minskar modellen till ungefär en fjärdedel av sin ursprungliga storlek, vilket är avgörande när du senare behöver **release GPU memory python**. + +> **Pro tip:** Behåll `directory_model_path` på en SSD för snabbare laddningstider. + +--- + +### 2. Initiera AI‑hjälpen och aktivera stavningskontroll + +Nu skapar vi en `AsposeAI`‑instans och fäster en stavningskorrigerande post‑processor. Här börjar magin med **clean OCR text python**. 
+ +```python +# Initialise the AI helper +ai_helper = AsposeAI() +ai_helper.set_post_processor( + processor="spell_corrector", + custom_settings={"max_edits": 2} # allows up to two character edits per word +) +``` + +**Förklaring:** +Stavningskorrigeraren granskar varje token från OCR‑motorn och föreslår ändringar begränsade av `max_edits`. Denna lilla justering kan förvandla “rec0gn1tion” till “recognition” utan en tung språkmodell. + +--- + +### 3. Koppla AI‑hjälpen till OCR‑motorn + +Aspose introducerade en ny metod i version 23.4 som låter dig plugga in en AI‑motor direkt i OCR‑pipeline:n. + +```python +# Initialise the OCR engine and attach the AI helper +ocr_engine = OcrEngine() +ocr_engine.set_ai_engine(ai_helper) # new in v23.4 +``` + +**Varför vi gör det:** +Genom att ansluta AI‑hjälpen tidigt kan OCR‑motorn eventuellt använda modellen för förbättringar i realtid (t.ex. layoutdetektering). Det håller också koden ren – ingen separat post‑processingsloop behövs senare. + +--- + +### 4. Utför OCR på den skannade bilden + +Här är kärnsteg som faktiskt **perform OCR on image** filer. Ersätt `YOUR_DIRECTORY/input.png` med sökvägen till din egen skanning. + +```python +image_path = r"YOUR_DIRECTORY/input.png" +ocr_result = ocr_engine.recognize(image_path) + +print("Raw OCR text:") +print(ocr_result.text) +``` + +Typisk råutdata kan innehålla radbrytningar på konstiga ställen, felaktigt igenkända tecken eller lösa symboler. Därför behövs nästa steg. + +**Förväntad råutdata (exempel):** + +``` +Th1s 1s 4n ex4mpl3 0f r4w OCR t3xt. +It c0ntains numb3rs 123 and s0me m1stakes. +``` + +--- + +### 5. Rensa OCR‑text i Python med AI‑post‑processorn + +Nu låter vi AI:n städa upp röran. Detta är hjärtat i processen **clean OCR text python**. + +```python +cleaned_result = ocr_engine.run_postprocessor(ocr_result) + +print("\nAI‑enhanced text:") +print(cleaned_result.text) +``` + +**Resultat du kommer att se:** + +``` +This is an example of raw OCR text. 
+It contains numbers 123 and some mistakes.
+```
+
+Lägg märke till hur stavningskorrigeraren rättade “Th1s” till “This” och “4n” till “an”. Modellen normaliserar också mellanrum, vilket ofta är ett problem när du senare matar in texten i nedströms‑NLP‑pipeline:n.
+
+---
+
+### 6. Frigör GPU‑minne i Python – Rengöringssteg
+
+När du är klar är det god praxis att frigöra GPU‑resurser, särskilt om du kör flera OCR‑jobb i en långlivad tjänst.
+
+```python
+# Release resources – crucial for GPU memory
+ai_helper.free_resources()
+ocr_engine.dispose()
+```
+
+**Vad som händer under huven:**
+`free_resources()` laddar ur modellen från GPU:n och återlämnar minnet till CUDA‑drivrutinen. `dispose()` stänger ner OCR‑motorns interna buffertar. Att hoppa över dessa anrop kan leda till minnesbrist efter bara ett fåtal bilder.
+
+> **Remember:** Om du planerar att bearbeta batchar i en loop, anropa rengöringen efter varje batch eller återanvänd samma `ai_helper` utan att frigöra den förrän i slutet.
+
+---
+
+## Bonus: Anpassa pipeline:n för olika scenarier
+
+### Justera modellkvantisering
+
+Om du har en kraftfull GPU (t.ex. RTX 4090) och vill ha högre noggrannhet, ändra `hugging_face_quantization` till `"fp16"` och öka `gpu_layers` till `30`. Detta förbrukar mer minne, så du behöver frigöra GPU‑minnet oftare, till exempel efter varje batch.
+
+### Använda en anpassad stavningskontroll
+
+Du kan byta ut den inbyggda `spell_corrector` mot en egen post‑processor som gör domänspecifika korrigeringar (t.ex. medicinsk terminologi). Implementera bara det erforderliga gränssnittet och skicka dess namn till `set_post_processor`.
+
+### Batch‑bearbetning av flera bilder
+
+Packa OCR‑stegen i en `for`‑loop, samla `cleaned_result.text` i en lista och anropa `ai_helper.free_resources()` först efter loopen om du har tillräckligt med GPU‑RAM. Detta minskar overheaden av att ladda modellen upprepade gånger. 
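The batch idea described above can be sketched roughly as follows. The names `process_batch`, `recognize`, `clean`, and `release` are hypothetical and introduced only for illustration; in a real script they would wrap whichever OCR and post-processing calls you actually use. The point is just the shape: load once, loop over the images, and release once in `finally`.

```python
from typing import Callable, Iterable, List


def process_batch(
    image_paths: Iterable[str],
    recognize: Callable[[str], str],
    clean: Callable[[str], str],
    release: Callable[[], None],
) -> List[str]:
    """Run recognize + clean on every path, then release resources once."""
    cleaned_texts = []
    try:
        for path in image_paths:
            raw = recognize(path)
            cleaned_texts.append(clean(raw))
    finally:
        # Runs exactly once: after the last image, or on the first error
        release()
    return cleaned_texts


# Demo with dummy callables standing in for the OCR engine and AI helper
released = []
texts = process_batch(
    ["page1.png", "page2.png"],
    recognize=lambda p: f"  raw text from {p}  ",
    clean=str.strip,
    release=lambda: released.append(True),
)
print(texts)
print(len(released))
```

Passing the three operations in as callables keeps the loop independent of any particular OCR library, which also makes it easy to test with dummies as shown here.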
+ +--- + +## Slutsats + +Vi har just visat dig hur du **perform OCR on image** filer i Python, automatiskt **downloads a HuggingFace model**, **clean OCR text**, och säkert **releases GPU memory** när du är klar. Det kompletta skriptet är redo att kopieras och klistras in, och förklaringarna ger dig självförtroendet att anpassa det till större projekt. + +Nästa steg? Prova att byta ut Qwen 2.5‑modellen mot en större LLaMA‑variant, experimentera med olika post‑processorer, eller integrera den rensade utdata i ett sökbart Elasticsearch‑index. Möjligheterna är oändliga, och du har nu en solid grund att bygga vidare på. + +Lycka till med kodandet, och må dina OCR‑pipeline:n alltid vara rena och minnesvänliga! + +{{< /blocks/products/pf/tutorial-page-section >}} +{{< /blocks/products/pf/main-container >}} +{{< /blocks/products/pf/main-wrap-class >}} +{{< blocks/products/products-backtop-button >}} \ No newline at end of file diff --git a/ocr/thai/python/general/extract-text-from-pdf-ocr-pdf-with-python/_index.md b/ocr/thai/python/general/extract-text-from-pdf-ocr-pdf-with-python/_index.md new file mode 100644 index 000000000..921c6fb0b --- /dev/null +++ b/ocr/thai/python/general/extract-text-from-pdf-ocr-pdf-with-python/_index.md @@ -0,0 +1,217 @@ +--- +category: general +date: 2026-04-29 +description: ดึงข้อความจาก PDF ด้วย Aspose OCR ใน Python. เรียนรู้การประมวลผล OCR + PDF แบบชุด, แปลงข้อความจาก PDF ที่สแกน, และจัดการหน้าที่มีความมั่นใจต่ำ. +draft: false +keywords: +- extract text from pdf +- ocr pdf with python +- convert scanned pdf text +- batch ocr pdf processing +language: th +og_description: ดึงข้อความจาก PDF ด้วย Aspose OCR ใน Python คู่มือนี้แสดงการประมวลผล + OCR PDF แบบชุด การแปลงข้อความจาก PDF ที่สแกน และการจัดการผลลัพธ์ที่มีความมั่นใจต่ำ. 
+og_title: ดึงข้อความจาก PDF – OCR PDF ด้วย Python +tags: +- OCR +- Python +- PDF processing +title: ดึงข้อความจาก PDF – OCR PDF ด้วย Python +url: /th/python/general/extract-text-from-pdf-ocr-pdf-with-python/ +--- + +{{< blocks/products/pf/main-wrap-class >}} +{{< blocks/products/pf/main-container >}} +{{< blocks/products/pf/tutorial-page-section >}} + +# ดึงข้อความจาก PDF – OCR PDF ด้วย Python + +เคยต้อง **ดึงข้อความจาก PDF** แต่ไฟล์เป็นแค่ภาพสแกนหรือไม่? คุณไม่ได้เป็นคนเดียว—นักพัฒนาหลายคนเจออุปสรรคนี้เมื่อต้องแปลง PDF ให้เป็นข้อมูลที่ค้นหาได้ ข่าวดีคือ? ด้วย Aspose OCR for Python คุณสามารถแปลงข้อความจาก PDF สแกนได้ในไม่กี่บรรทัด และยังสามารถทำ **การประมวลผล OCR PDF แบบแบตช์** เมื่อคุณมีหลายสิบไฟล์ต้องจัดการ + +ในบทเรียนนี้เราจะเดินผ่านขั้นตอนทั้งหมด: ตั้งค่าห้องสมุด, รัน OCR บน PDF เดียว, ขยายเป็นแบตช์, และจัดการกับหน้าที่ความเชื่อมั่นต่ำเพื่อให้คุณทราบเมื่อจำเป็นต้องตรวจสอบด้วยตนเอง เมื่อจบคุณจะได้สคริปต์พร้อมรันที่ดึงข้อความจาก PDF สแกนใด ๆ และคุณจะเข้าใจเหตุผลของแต่ละขั้นตอน + +## สิ่งที่คุณต้องมี + +ก่อนที่เราจะเริ่มลงมือ ตรวจสอบให้แน่ใจว่าคุณมี: + +- Python 3.8 หรือใหม่กว่า (โค้ดใช้ f‑strings ดังนั้น 3.6+ ทำงานได้ แต่แนะนำ 3.8+) +- ใบอนุญาต Aspose OCR for Python หรือคีย์ทดลองฟรี (คุณสามารถรับได้จากเว็บไซต์ Aspose) +- โฟลเดอร์ที่มี PDF สแกนหนึ่งไฟล์หรือหลายไฟล์ที่ต้องการประมวลผล +- พื้นที่ดิสก์เพียงพอสำหรับไฟล์รายงาน *.txt* ที่สร้างขึ้น + +เท่านี้—ไม่มีการพึ่งพาไลบรารีภายนอกหนัก ๆ, ไม่มีการทำงานกับ OpenCV. 
เอนจิน OCR ของ Aspose จะทำงานหนักให้คุณ + +## ตั้งค่าสภาพแวดล้อม + +แรกสุด ให้ติดตั้งแพ็กเกจ Aspose OCR จาก PyPI: + +```bash +pip install aspose-ocr +``` + +หากคุณมีไฟล์ใบอนุญาต (`Aspose.OCR.lic`) ให้วางไว้ที่รากของโปรเจกต์และเปิดใช้งานดังนี้: + +```python +# Activate Aspose OCR license (optional but removes trial watermark) +from aspose.ocr import License + +license = License() +license.set_license("Aspose.OCR.lic") +``` + +> **เคล็ดลับ:** อย่าวางไฟล์ใบอนุญาตไว้ในระบบควบคุมเวอร์ชัน; เพิ่มไฟล์นี้ใน `.gitignore` เพื่อหลีกเลี่ยงการเปิดเผยโดยบังเอิญ + +## ทำ OCR บน PDF เดียว + +ตอนนี้มาดึงข้อความจาก PDF สแกนไฟล์เดียว ขั้นตอนหลักคือ: + +1. สร้างอินสแตนซ์ `OcrEngine` +2. ชี้ไปที่ไฟล์ PDF +3. ดึง `OcrResult` สำหรับแต่ละหน้า +4. เขียนผลลัพธ์เป็นข้อความธรรมดาไปยังดิสก์ +5. ปิดการทำงานของเอนจินเพื่อปล่อยทรัพยากรเนทีฟ + +นี่คือสคริปต์เต็มที่สามารถรันได้: + +```python +# Step 1: Import the OCR engine and create an instance +from aspose.ocr import OcrEngine + +# Optional: activate license (see previous section) +# from aspose.ocr import License +# License().set_license("Aspose.OCR.lic") + +ocr_engine = OcrEngine() + +# Step 2: Specify the PDF file to be processed +pdf_file_path = r"YOUR_DIRECTORY/input.pdf" + +# Step 3: Extract OCR results – one OcrResult object per page +ocr_pages = ocr_engine.extract_from_pdf(pdf_file_path) + +# Step 4: Iterate through the results, show confidence, flag low‑confidence pages, and save the plain text +for ocr_page in ocr_pages: + print(f"Page {ocr_page.page_number}: confidence {ocr_page.confidence:.2%}") + + # If confidence is below 80 %, flag it for manual review + if ocr_page.confidence < 0.80: + print(" Low confidence – consider manual review") + + # Build the output filename + output_txt = f"YOUR_DIRECTORY/Report_page{ocr_page.page_number}.txt" + with open(output_txt, "w", encoding="utf-8") as f: + f.write(ocr_page.text) + +# Step 5: Release resources used by the engine +ocr_engine.dispose() +``` + +**สิ่งที่คุณจะเห็น:** สำหรับแต่ละหน้า 
สคริปต์จะแสดงข้อความเช่น `Page 1: confidence 97.45%`. หากหน้าหนึ่งมีความเชื่อมั่นต่ำกว่า 80 % จะมีคำเตือนปรากฏ เพื่อบอกว่าการ OCR อาจพลาดอักขระบางตัว + +### ทำไมวิธีนี้ถึงได้ผล + +- **`OcrEngine`** เป็นประตูสู่ไลบรารี Aspose OCR เนทีฟ; มันจัดการทุกอย่างตั้งแต่การเตรียมภาพล่วงหน้าไปจนถึงการจดจำอักขระ +- **`extract_from_pdf`** จะทำการแรสเตอร์แต่ละหน้าของ PDF โดยอัตโนมัติ ไม่ต้องแปลง PDF เป็นภาพด้วยตนเอง +- **คะแนนความเชื่อมั่น** ช่วยให้คุณทำการตรวจสอบคุณภาพอัตโนมัติ—สำคัญมากเมื่อคุณประมวลผลเอกสารทางกฎหมายหรือการแพทย์ที่ต้องการความแม่นยำสูง + +## การประมวลผล OCR PDF แบบแบตช์ด้วย Python + +โครงการในโลกจริงส่วนใหญ่ต้องจัดการไฟล์มากกว่าหนึ่งไฟล์ ให้ขยายสคริปต์ไฟล์เดียวเป็น **pipeline การประมวลผล OCR PDF แบบแบตช์** ที่เดินผ่านไดเรกทอรี, ประมวลผลแต่ละ PDF, และเก็บผลลัพธ์ในโฟลเดอร์ย่อยที่สอดคล้องกัน + +```python +import os +from pathlib import Path +from aspose.ocr import OcrEngine + +def ocr_pdf_file(pdf_path: Path, output_dir: Path, engine: OcrEngine, confidence_threshold: float = 0.80): + """Extract text from a single PDF and write per‑page .txt files.""" + ocr_pages = engine.extract_from_pdf(str(pdf_path)) + for page in ocr_pages: + print(f"{pdf_path.name} – Page {page.page_number}: {page.confidence:.2%}") + if page.confidence < confidence_threshold: + print(" ⚠️ Low confidence – manual review may be needed") + + out_file = output_dir / f"{pdf_path.stem}_page{page.page_number}.txt" + out_file.write_text(page.text, encoding="utf-8") + +def batch_ocr_pdf(input_folder: str, output_folder: str): + """Iterate over all PDFs in input_folder and run OCR.""" + engine = OcrEngine() + input_path = Path(input_folder) + output_path = Path(output_folder) + output_path.mkdir(parents=True, exist_ok=True) + + pdf_files = list(input_path.glob("*.pdf")) + if not pdf_files: + print("🚫 No PDF files found in the specified folder.") + return + + for pdf_file in pdf_files: + # Create a sub‑folder for each PDF to keep pages organized + pdf_out_dir = output_path / pdf_file.stem + 
pdf_out_dir.mkdir(exist_ok=True) + ocr_pdf_file(pdf_file, pdf_out_dir, engine) + + # Clean up native resources + engine.dispose() + print("✅ Batch OCR completed.") + +# Example usage: +if __name__ == "__main__": + batch_ocr_pdf("YOUR_DIRECTORY/input_pdfs", "YOUR_DIRECTORY/ocr_output") +``` + +#### วิธีที่ช่วยได้ + +- **ความสามารถขยาย:** ฟังก์ชันเดินผ่านโฟลเดอร์ครั้งเดียว สร้างโฟลเดอร์ย่อยสำหรับผลลัพธ์ของแต่ละ PDF ทำให้จัดการได้เป็นระเบียบเมื่อมีเอกสารหลายสิบไฟล์ +- **การนำกลับมาใช้ใหม่:** `ocr_pdf_file` สามารถเรียกจากสคริปต์อื่น (เช่น เว็บเซอร์วิส) ได้ เนื่องจากรับเอนจิน พาธ และค่าขีดจำกัดทั้งหมดผ่านอาร์กิวเมนต์โดยไม่พึ่งพาตัวแปรโกลบอล (แต่ยังพิมพ์ข้อความและเขียนไฟล์ จึงไม่ใช่ฟังก์ชันบริสุทธิ์) +- **การจัดการข้อผิดพลาด:** สคริปต์จะแสดงข้อความเป็นมิตรหากโฟลเดอร์อินพุตว่างเปล่า ช่วยหลีกเลี่ยงการล้มเหลวแบบเงียบ + +## แปลงข้อความจาก PDF สแกน – จัดการกรณีขอบ + +แม้โค้ดข้างต้นจะทำงานได้กับ PDF ส่วนใหญ่ คุณอาจเจอกรณีพิเศษบางอย่าง: + +| สถานการณ์ | สาเหตุ | วิธีแก้ | +|-----------|--------|----------| +| **PDF ที่เข้ารหัส** | PDF มีการป้องกันด้วยรหัสผ่าน | ส่งรหัสผ่านไปยัง `extract_from_pdf(pdf_path, password="yourPwd")` | +| **เอกสารหลายภาษา** | Aspose OCR ตั้งค่าเริ่มต้นเป็นภาษาอังกฤษ | ตั้งค่า `ocr_engine.language = "spa"` สำหรับภาษาสเปน หรือระบุรายการภาษาสำหรับหลายภาษา | +| **PDF ขนาดใหญ่มาก (>500 หน้า)** | การใช้หน่วยความจำพุ่งสูงเพราะโหลดทุกหน้าเข้าสู่ RAM | ประมวลผล PDF เป็นชิ้นส่วนโดยใช้ `engine.extract_from_pdf(pdf_path, start_page=1, end_page=100)` แล้ววนลูป | +| **คุณภาพสแกนแย่** | DPI ต่ำหรือมีสัญญาณรบกวนมากทำให้ความเชื่อมั่นลดลง | ทำการพรี‑โปรเซสภาพด้วย `engine.image_preprocessing = True` หรือเพิ่ม DPI ผ่าน `engine.dpi = 300` | + +> **ระวัง:** การเปิดใช้งานการพรี‑โปรเซสภาพอาจทำให้เวลา CPU เพิ่มขึ้นอย่างเห็นได้ชัด หากคุณรันแบตช์ทุกคืน ควรจัดสรรเวลาให้เพียงพอหรือใช้ worker แยกออกมา + +## ตรวจสอบผลลัพธ์ + +หลังสคริปต์ทำงานเสร็จ คุณจะพบโครงสร้างโฟลเดอร์เช่น: + +``` +ocr_output/ +├─ invoice_2023/ +│ ├─ invoice_2023_page1.txt +│ ├─ invoice_2023_page2.txt +│ └─ … +└─ contract_A/ + ├─ contract_A_page1.txt + └─ … +``` + +เปิดไฟล์ `.txt` ใดก็ได้; คุณควรเห็นข้อความที่เป็น UTF‑8 
สะอาดตรงกับเนื้อหาที่สแกนไว้เดิม หากพบอักขระแปลก ๆ ให้ตรวจสอบการตั้งค่าภาษาใน PDF และตรวจสอบว่ามีการติดตั้งแพ็คฟอนต์ที่ถูกต้องบนเครื่อง + +## ทำความสะอาดทรัพยากร + +Aspose OCR พึ่งพา DLL เนทีฟ ดังนั้นจึงจำเป็นต้องเรียก `engine.dispose()` หลังจากใช้งานเสร็จ การลืมขั้นตอนนี้อาจทำให้เกิดการรั่วของหน่วยความจำ โดยเฉพาะในงานแบตช์ที่รันเป็นเวลานาน + +```python +# Always the last line of your script +engine.dispose() +``` + +## ตัวอย่างครบวงจรจากต้นจนจบ + +รวมทุกอย่างเข้าด้วยกัน นี่คือตัวอย่างสำหรับไฟล์เดียว + +{{< /blocks/products/pf/tutorial-page-section >}} +{{< /blocks/products/pf/main-container >}} +{{< /blocks/products/pf/main-wrap-class >}} +{{< blocks/products/products-backtop-button >}} \ No newline at end of file diff --git a/ocr/thai/python/general/how-to-recognize-handwriting-in-python-full-tutorial/_index.md b/ocr/thai/python/general/how-to-recognize-handwriting-in-python-full-tutorial/_index.md new file mode 100644 index 000000000..0132d93cb --- /dev/null +++ b/ocr/thai/python/general/how-to-recognize-handwriting-in-python-full-tutorial/_index.md @@ -0,0 +1,276 @@ +--- +category: general +date: 2026-04-29 +description: เรียนรู้วิธีจดจำลายมือใน Python ด้วย Aspose OCR คู่มือขั้นตอนต่อขั้นตอนนี้แสดงวิธีสกัดข้อความลายมืออย่างมีประสิทธิภาพ. +draft: false +keywords: +- how to recognize handwriting +- extract handwritten text +- handwritten text recognition python +- handwritten ocr tutorial python +language: th +og_description: วิธีจดจำลายมือใน Python? ติดตามคู่มือฉบับสมบูรณ์นี้เพื่อสกัดข้อความลายมือโดยใช้ + Aspose OCR พร้อมโค้ด เคล็ดลับ และการจัดการกรณีขอบเขต. 
+og_title: วิธีจดจำลายมือใน Python – บทเรียนเต็ม +tags: +- OCR +- Python +- HandwritingRecognition +title: วิธีจดจำลายมือใน Python – บทเรียนเต็ม +url: /th/python/general/how-to-recognize-handwriting-in-python-full-tutorial/ +--- + +{{< blocks/products/pf/main-wrap-class >}} +{{< blocks/products/pf/main-container >}} +{{< blocks/products/pf/tutorial-page-section >}} + +# วิธีการจดจำลายมือใน Python – บทเรียนเต็ม + +เคยต้องการ **วิธีการจดจำลายมือ** ในโปรเจกต์ Python แต่ไม่รู้จะเริ่มจากตรงไหนหรือไม่? คุณไม่ได้อยู่คนเดียว—นักพัฒนามักถามว่า “ฉันสามารถดึงข้อความจากโน้ตสแกนได้หรือไม่?” ข่าวดีคือไลบรารี OCR สมัยใหม่ทำให้เรื่องนี้ง่ายมาก ในคู่มือนี้เราจะอธิบาย **วิธีการจดจำลายมือ** ด้วย Aspose OCR และคุณจะได้เรียนรู้วิธี **ดึงข้อความลายมือ** อย่างแม่นยำ + +เราจะครอบคลุมทุกอย่างตั้งแต่การติดตั้งไลบรารีจนถึงการปรับค่าขีดจำกัดความเชื่อมั่นสำหรับสคริปต์ลายมือที่ยุ่งยาก สุดท้ายคุณจะได้สคริปต์ที่รันได้ซึ่งพิมพ์ข้อความที่ดึงออกมาและคะแนนความเชื่อมั่นรวม—เหมาะสำหรับแอปจดบันทึก, เครื่องมือจัดเก็บเอกสาร, หรือแค่ความอยากรู้อยากเห็น ไม่จำเป็นต้องมีประสบการณ์ OCR มาก่อน; ความรู้พื้นฐาน Python เพียงเล็กน้อยก็พอ + +--- + +## สิ่งที่คุณต้องมี + +- **Python 3.9+** (เวอร์ชันล่าสุดที่เสถียรที่สุดทำงานได้ดีที่สุด) +- **Aspose.OCR for Python via .NET** – ติดตั้งด้วย `pip install aspose-ocr` +- **ภาพลายมือ** (JPEG/PNG) ที่คุณต้องการประมวลผล +- ตัวเลือก: สภาพแวดล้อมเสมือน (virtual environment) เพื่อจัดการ dependencies ให้เป็นระเบียบ + +ถ้าคุณเตรียมสิ่งเหล่านี้พร้อมแล้ว, มาเริ่มกันเลย + +![ตัวอย่างการจดจำลายมือ](/images/handwritten-sample.jpg "ตัวอย่างการจดจำลายมือ") + +*(ข้อความแทนภาพ: “ตัวอย่างการจดจำลายมือที่แสดงโน้ตสแกนลายมือ”)* + +--- + +## ขั้นตอนที่ 1 – ติดตั้งและนำเข้าคลาส Aspose OCR + +ก่อนอื่นเราต้องมีเอนจิน OCR เอง Aspose มี API ที่สะอาดและแยกการจดจำข้อความพิมพ์จากโหมดลายมือ + +```python +# Install the package (run this in your terminal if you haven't already) +# pip install aspose-ocr + +# Import the required OCR classes +from aspose.ocr import OcrEngine, HandwritingMode +``` + 
+*ทำไมจึงสำคัญ:* การนำเข้า `HandwritingMode` ทำให้เราบอกเอนจินว่าเรากำลังทำ **handwritten text recognition python** ไม่ใช่ข้อความพิมพ์ ซึ่งช่วยเพิ่มความแม่นยำอย่างมากสำหรับเส้นลายมือ + +--- + +## ขั้นตอนที่ 2 – สร้างและกำหนดค่า OCR Engine + +ต่อไปเราจะสร้างอินสแตนซ์ `OcrEngine` แล้วสลับเป็นโหมดลายมือ คุณยังสามารถปรับค่าขีดจำกัดความเชื่อมั่นได้; ค่าต่ำรับลายมือที่สั่นไหว, ค่าสูงต้องการอินพุตที่คมชัด + +```python +# Step 2: Initialize the OCR engine for handwritten text +ocr_engine = OcrEngine() +ocr_engine.set_recognition_mode(HandwritingMode.HANDWRITTEN) + +# Optional: tighten the confidence threshold for cursive scripts +# Default is 0.5; 0.65 works well for most notebooks +ocr_engine.set_handwriting_confidence_threshold(0.65) +``` + +*เคล็ดลับ:* หากโน้ตของคุณสแกนที่ 300 DPI หรือสูงกว่า, คุณมักจะได้คะแนนที่ดีกว่า สำหรับภาพความละเอียดต่ำ, พิจารณาอัปสเกลด้วย Pillow ก่อนส่งให้เอนจิน + +--- + +## ขั้นตอนที่ 3 – เตรียมเส้นทางไฟล์ภาพ + +ตรวจสอบให้แน่ใจว่าเส้นทางไฟล์ชี้ไปยังภาพที่ต้องการประมวลผล เส้นทางแบบ relative ทำงานได้ดี, แต่เส้นทางแบบ absolute จะหลีกเลี่ยงข้อผิดพลาด “file not found” + +```python +# Step 3: Point to your handwritten image +input_image_path = "YOUR_DIRECTORY/handwritten_sample.jpg" +``` + +*ข้อผิดพลาดทั่วไป:* ลืม escape backslashes บน Windows (`C:\\folder\\image.jpg`). 
ใช้ raw strings (`r"C:\folder\image.jpg"`) จะช่วยแก้ปัญหานี้ + +--- + +## ขั้นตอนที่ 4 – รันการจดจำและเก็บผลลัพธ์ + +เมธอด `recognize` ทำงานหนักทั้งหมด มันคืนอ็อบเจกต์ที่มีคุณสมบัติ `.text` และ `.confidence` + +```python +# Step 4: Run OCR and get the result +result = ocr_engine.recognize(input_image_path) + +# Display the extracted text and confidence +print("Hand‑written extraction:") +print(result.text) +print("Overall confidence:", result.confidence) +``` + +**ผลลัพธ์ที่คาดหวัง (ตัวอย่าง):** + +``` +Hand‑written extraction: +Meeting notes: +- Discuss quarterly goals +- Review budget allocations +- Assign action items +Overall confidence: 0.78 +``` + +ถ้าความเชื่อมั่นต่ำกว่า 0.5, คุณอาจต้องทำความสะอาดภาพ (ลบเงา, เพิ่มคอนทราสต์) หรือปรับค่าขีดจำกัดในขั้นตอนที่ 2 + +--- + +## ขั้นตอนที่ 5 – ทำความสะอาดทรัพยากร + +Aspose OCR เก็บทรัพยากรเนทีฟ; การเรียก `dispose()` จะปล่อยทรัพยากรเหล่านั้นและป้องกัน memory leak, โดยเฉพาะเมื่อประมวลผลหลายภาพในลูป + +```python +# Step 5: Release the engine resources +ocr_engine.dispose() +``` + +*ทำไมต้อง dispose?* ในบริการที่ทำงานต่อเนื่อง (เช่น Flask API ที่รับอัปโหลด) การลืมปล่อยทรัพยากรอาจทำให้หน่วยความจำเต็มอย่างรวดเร็ว + +--- + +## สคริปต์เต็ม – รันครั้งเดียว + +รวมทุกอย่างเข้าด้วยกัน, นี่คือสคริปต์อิสระที่คุณสามารถคัดลอกและรันได้ + +```python +# -*- coding: utf-8 -*- +""" +Handwritten OCR Tutorial – how to recognize handwriting in Python +Author: Your Name +Date: 2026-04-29 +""" + +# Install the library first if you haven't already: +# pip install aspose-ocr + +from aspose.ocr import OcrEngine, HandwritingMode + +def recognize_handwriting(image_path: str, confidence_threshold: float = 0.65): + """ + Recognizes handwritten text from an image using Aspose OCR. + + Args: + image_path (str): Path to the handwritten image file. + confidence_threshold (float): Minimum confidence to accept a word. 
+ + Returns: + tuple: (extracted_text (str), overall_confidence (float)) + """ + # Initialize engine in handwritten mode + engine = OcrEngine() + engine.set_recognition_mode(HandwritingMode.HANDWRITTEN) + engine.set_handwriting_confidence_threshold(confidence_threshold) + + # Perform recognition + result = engine.recognize(image_path) + + # Capture output + extracted = result.text + confidence = result.confidence + + # Clean up + engine.dispose() + return extracted, confidence + +if __name__ == "__main__": + # Replace with your actual file location + img_path = "YOUR_DIRECTORY/handwritten_sample.jpg" + + text, conf = recognize_handwriting(img_path) + + print("Hand‑written extraction:") + print(text) + print("Overall confidence:", conf) +``` + +บันทึกเป็น `handwritten_ocr.py` แล้วรัน `python handwritten_ocr.py` หากทุกอย่างตั้งค่าอย่างถูกต้อง, คุณจะเห็นข้อความที่ดึงออกมาปรากฏบนคอนโซล + +--- + +## การจัดการกรณีขอบและความแปรผันทั่วไป + +### ภาพที่คอนทราสต์ต่ำ +ถ้าพื้นหลังสีผสมกับหมึก, ให้เพิ่มคอนทราสต์ก่อน: + +```python +from PIL import Image, ImageEnhance + +img = Image.open(input_image_path) +enhancer = ImageEnhance.Contrast(img) +high_contrast = enhancer.enhance(2.0) # 2× contrast +high_contrast.save("temp_high_contrast.jpg") +result = ocr_engine.recognize("temp_high_contrast.jpg") +``` + +### โน้ตที่หมุนเอียง +หน้าหนังสือบันทึกที่เอียงอาจทำให้การจดจำผิดพลาด ใช้ OpenCV ร่วมกับ NumPy เพื่อแก้ไขการเอียง (ต้องติดตั้ง `opencv-python` และ `numpy` เพิ่มเติม): + +```python +import numpy as np +import cv2 + +def deskew(image_path): + img = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE) + # Invert + Otsu threshold so ink pixels (not the white background) become non-zero + thresh = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY_INV | cv2.THRESH_OTSU)[1] + # minAreaRect expects 32-bit points, so cast the (row, col) coordinates + coords = np.column_stack(np.where(thresh > 0)).astype(np.float32) + angle = cv2.minAreaRect(coords)[-1] + if angle < -45: + angle = -(90 + angle) + else: + angle = -angle + (h, w) = img.shape[:2] + M = cv2.getRotationMatrix2D((w // 2, h // 2), angle, 1.0) + corrected = cv2.warpAffine(img, M, (w, h), flags=cv2.INTER_CUBIC, borderMode=cv2.BORDER_REPLICATE) + cv2.imwrite("deskewed.jpg", corrected) + +deskew(input_image_path) +result = ocr_engine.recognize("deskewed.jpg") +``` + 
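ตรรกะการปรับมุมในฟังก์ชัน `deskew` ข้างต้นสามารถแยกออกมาทดสอบได้โดยไม่ต้องใช้ภาพจริง โครงร่างด้านล่างสมมติว่า `cv2.minAreaRect` คืนค่ามุมในช่วง [-90, 0) ตามที่โค้ดข้างต้นคาดไว้ OpenCV บางเวอร์ชันอาจใช้ช่วงมุมที่ต่างออกไป จึงควรตรวจสอบกับเวอร์ชันที่คุณติดตั้ง ชื่อ `normalize_skew_angle` เป็นชื่อสมมติเพื่อการอธิบายเท่านั้น ไม่ได้เป็นส่วนหนึ่งของ OpenCV หรือ Aspose OCR

```python
def normalize_skew_angle(raw_angle: float) -> float:
    """Map a minAreaRect-style angle to the rotation (in degrees) passed to the warp.

    Assumes raw_angle lies in [-90, 0), as the deskew() snippet above expects.
    Hypothetical helper for illustration; not part of OpenCV or Aspose OCR.
    """
    if raw_angle < -45:
        # Angles below -45 are treated as a small skew in the opposite direction
        return -(90 + raw_angle)
    # Otherwise simply rotate back by the measured angle
    return -raw_angle


if __name__ == "__main__":
    for raw in (-2.0, -44.0, -88.0):
        print(f"raw {raw:6.1f} -> rotate by {normalize_skew_angle(raw):5.1f} degrees")
```

หากมุมที่ได้จากภาพของคุณอยู่นอกช่วงที่สมมติไว้ ผลลัพธ์ของฟังก์ชันนี้อาจหมุนภาพผิดทิศ จึงควรลองกับภาพตัวอย่างก่อนนำไปใช้ในงานแบตช์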
+### PDF หลายหน้า +Aspose OCR สามารถจัดการกับหน้า PDF ได้เช่นกัน, แต่ต้องแปลงแต่ละหน้าเป็นภาพก่อน (เช่น ใช้ `pdf2image`). จากนั้นวนลูปภาพด้วยฟังก์ชัน `recognize_handwriting` เดียวกัน + +--- + +## เคล็ดลับระดับมืออาชีพสำหรับผลลัพธ์ **Extract Handwritten Text** ที่ดียิ่งขึ้น + +- **DPI มีความสำคัญ:** ควรสแกนที่ 300 DPI หรือสูงกว่า +- **หลีกเลี่ยงพื้นหลังสี:** สีขาวหรือสีเทาอ่อนให้ผลลัพธ์ที่สะอาดที่สุด +- **การประมวลผลเป็นชุด:** ห่อฟังก์ชันใน `for` loop และบันทึกความเชื่อมั่นของแต่ละหน้า; กรองผลลัพธ์ที่ต่ำกว่าขีดจำกัดเพื่อรักษาคุณภาพ +- **การสนับสนุนภาษา:** Aspose OCR รองรับหลายภาษา; ตั้งค่า `engine.set_language("en")` เพื่อเพิ่มประสิทธิภาพสำหรับภาษาอังกฤษเท่านั้น + +--- + +## คำถามที่พบบ่อย + +**ทำงานบน Linux ได้หรือไม่?** +ได้—Aspose OCR มาพร้อมไบนารีเนทีฟสำหรับ Windows, macOS, และ Linux เพียงติดตั้งแพคเกจ pip แล้วคุณก็พร้อมใช้งาน + +**ถ้าลายมือของฉันเป็นลายมือที่ค่อนข้างเชื่อมต่อกันมากล่ะ?** +ลองลดค่าขีดจำกัดความเชื่อมั่น (`0.5` หรือแม้แต่ `0.4`). ต้องระวังว่าการทำเช่นนี้อาจทำให้เกิดเสียงรบกวนมากขึ้น, ดังนั้นอาจต้องทำ post‑process เช่น ตรวจสอบการสะกดคำ + +**สามารถใช้ในบริการเว็บได้หรือไม่?** +แน่นอน ฟังก์ชัน `recognize_handwriting` ไม่มีสถานะ (stateless) ทำให้เหมาะกับ endpoint ของ Flask หรือ FastAPI เพียงจำไว้ว่าให้เรียก `dispose()` หลังแต่ละคำขอหรือใช้ context manager + +--- + +## สรุป + +เราได้อธิบาย **วิธีการจดจำลายมือ** ใน Python ตั้งแต่ต้นจนจบ, แสดงวิธี **ดึงข้อความลายมือ**, ปรับค่าความเชื่อมั่น, และจัดการกับปัญหาทั่วไปเช่นคอนทราสต์ต่ำหรือหน้าที่หมุนเอียง สคริปต์เต็มที่ให้ไว้พร้อมรัน, และฟังก์ชันโมดูลาร์ทำให้การรวมเข้ากับโปรเจกต์ใหญ่เป็นเรื่องง่าย—ไม่ว่าจะเป็นแอปจดบันทึก, การดิจิไทซ์เอกสารเก่า, หรือการทดลองกับเทคนิค **handwritten ocr tutorial python** + +ต่อไปคุณอาจสำรวจ **handwritten text recognition python** สำหรับโน้ตหลายภาษา, หรือผสาน OCR กับการประมวลผลภาษาธรรมชาติเพื่อสรุปบันทึกการประชุมโดยอัตโนมัติ ไม่จำกัดอะไร—ลองทำดูและให้โค้ดของคุณทำให้รอยเขียนกลายเป็นข้อมูล + +ขอให้สนุกกับการเขียนโค้ด, และอย่าลังเลที่จะฝากคำถามในคอมเมนต์! 
+ +{{< /blocks/products/pf/tutorial-page-section >}} +{{< /blocks/products/pf/main-container >}} +{{< /blocks/products/pf/main-wrap-class >}} +{{< blocks/products/products-backtop-button >}} \ No newline at end of file diff --git a/ocr/thai/python/general/how-to-run-ocr-with-aspose-hugging-face-complete-guide/_index.md b/ocr/thai/python/general/how-to-run-ocr-with-aspose-hugging-face-complete-guide/_index.md new file mode 100644 index 000000000..e263c9dd3 --- /dev/null +++ b/ocr/thai/python/general/how-to-run-ocr-with-aspose-hugging-face-complete-guide/_index.md @@ -0,0 +1,180 @@ +--- +category: general +date: 2026-04-29 +description: เรียนรู้วิธีทำ OCR บนสแกนของคุณ ใช้โมเดล Hugging Face อัตโนมัติ และจดจำข้อความจากสแกนด้วย + Aspose OCR ในไม่กี่นาที. +draft: false +keywords: +- how to run OCR +- use hugging face model +- recognize text from scans +- download model automatically +language: th +og_description: วิธีทำ OCR บนสแกนด้วย Aspose OCR ดาวน์โหลดโมเดลจาก Hugging Face อัตโนมัติ + และได้ข้อความที่สะอาดพร้อมเครื่องหมายวรรคตอน +og_title: วิธีใช้งาน OCR ด้วย Aspose & Hugging Face – คู่มือฉบับสมบูรณ์ +tags: +- OCR +- Aspose +- Hugging Face +- Python +title: วิธีใช้งาน OCR ด้วย Aspose & Hugging Face – คู่มือฉบับสมบูรณ์ +url: /th/python/general/how-to-run-ocr-with-aspose-hugging-face-complete-guide/ +--- + +{{< blocks/products/pf/main-wrap-class >}} +{{< blocks/products/pf/main-container >}} +{{< blocks/products/pf/tutorial-page-section >}} + +# วิธีรัน OCR ด้วย Aspose & Hugging Face – คู่มือเต็ม + +เคยสงสัย **วิธีรัน OCR** บนกองเอกสารที่สแกนโดยไม่ต้องใช้เวลาหลายชั่วโมงปรับตั้งค่าหรือไม่? 
คุณไม่ได้เป็นคนเดียว ในหลายโครงการ นักพัฒนาต้องการ **จดจำข้อความจากการสแกน** อย่างรวดเร็ว แต่กลับเจอปัญหาในการดาวน์โหลดโมเดลและการประมวลผลต่อเนื่อง + +ข่าวดี: บทแนะนำนี้จะแสดงวิธีแก้ปัญหาที่พร้อมใช้งานซึ่ง **ใช้โมเดล Hugging Face**, ดึงโมเดลโดยอัตโนมัติ และเพิ่มเครื่องหมายวรรคตอนเพื่อให้ผลลัพธ์อ่านเหมือนคนเขียน สุดท้ายคุณจะได้สคริปต์ที่ประมวลผลรูปภาพทุกไฟล์ในโฟลเดอร์และสร้างไฟล์ `.txt` ที่สะอาดข้างๆ การสแกนแต่ละไฟล์ + +## สิ่งที่คุณต้องการ + +- Python 3.8+ (โค้ดใช้ f‑strings ดังนั้นเวอร์ชันเก่าจะไม่ทำงาน) +- `aspose-ocr` package (ติดตั้งโดยใช้ `pip install aspose-ocr`) +- การเข้าถึงอินเทอร์เน็ตสำหรับการดาวน์โหลดโมเดลครั้งแรก +- โฟลเดอร์ของภาพสแกน (`.png`, `.jpg`, หรือ `.tif`) + +เท่านี้—ไม่มีไบนารีเพิ่มเติม ไม่มีการจัดการโมเดลด้วยตนเอง มาเริ่มกันเลย + +![how to run OCR example](https://example.com/ocr-demo.png "how to run OCR example") + +## ขั้นตอนที่ 1: นำเข้า Aspose OCR Classes & ตั้งค่าสภาพแวดล้อม + +เราเริ่มโดยดึงคลาสที่จำเป็นจากไลบรารี Aspose OCR การนำเข้าทั้งหมดตั้งแต่ต้นทำให้สคริปต์เป็นระเบียบและง่ายต่อการตรวจหาการพึ่งพาที่ขาดหาย + +```python +# Step 1: Import Aspose OCR classes +import os +from aspose.ocr import OcrEngine, AsposeAI, AsposeAIModelConfig +``` + +*ทำไมเรื่องนี้สำคัญ*: `OcrEngine` ทำงานหนัก ส่วน `AsposeAI` ให้เราต่อโมเดลภาษาใหญ่เพื่อการประมวลผลต่อเนื่องที่ฉลาดขึ้น หากคุณละเว้นการนำเข้า โค้ดส่วนที่เหลือจะไม่คอมไพล์เลย—ดังนั้นอย่าลืมทำ + +## ขั้นตอนที่ 2: กำหนดค่าโมเดล Hugging Face ที่รองรับ GPU + +ตอนนี้เราบอก Aspose ว่าจะดึงโมเดลจากที่ไหนและต้องใช้เลเยอร์บน GPU กี่ชั้น ธง `allow_auto_download="true"` ทำหน้าที่ **ดาวน์โหลดโมเดลโดยอัตโนมัติ** ให้คุณ + +```python +# Step 2: Configure a GPU‑aware AI model (replace with your own model folder) +model_config = AsposeAIModelConfig( + allow_auto_download="true", + hugging_face_repo_id="Qwen/Qwen2.5-3B-Instruct-GGUF", + hugging_face_quantization="int8", + gpu_layers=40, # use GPU for faster inference + directory_model_path=r"YOUR_DIRECTORY/models" +) +``` + +> **เคล็ดลับ**: หากคุณไม่มี GPU ให้ตั้งค่า 
`gpu_layers=0` โมเดลจะย้อนกลับไปใช้ CPU ซึ่งช้ากว่าแต่ยังทำงานได้ + +### ทำไมต้องเลือกโมเดล Hugging Face? + +Hugging Face มีคอลเลกชันขนาดใหญ่ของ LLM ที่พร้อมใช้ โดยการชี้ไปที่ `Qwen/Qwen2.5-3B-Instruct-GGUF` คุณจะได้โมเดลขนาดกะทัดรัดที่ปรับตามคำสั่งซึ่งสามารถเพิ่มเครื่องหมายวรรคตอน แก้ไขการเว้นวรรค และแม้กระทั่งแก้ไขข้อผิดพลาด OCR เล็กน้อย นี่คือแก่นของการ **ใช้โมเดล hugging face** ในการปฏิบัติ + +## ขั้นตอนที่ 3: เริ่มต้น AI Engine และเปิดใช้งานการประมวลผลต่อเนื่องเพื่อเพิ่มเครื่องหมายวรรคตอน + +AI engine ไม่ได้มีไว้เพียงการสนทนาที่หรูหรา—ที่นี่เราติดตั้ง *punctuation adder* เพื่อทำความสะอาดผลลัพธ์ OCR ดิบ + +```python +# Step 3: Initialise the AI engine and enable punctuation post‑processing +ai_engine = AsposeAI() +ai_engine.set_post_processor("punctuation_adder", {}) +``` + +*เกิดอะไรขึ้น?* การเรียก `set_post_processor` ลงทะเบียน post‑processor ในตัวที่ทำงานหลังจาก OCR engine เสร็จ มันรับสตริงดิบและแทรกเครื่องหมายจุลภาค จุด และตัวอักษรพิมพ์ใหญ่ในตำแหน่งที่ควรจะเป็น ทำให้ข้อความสุดท้ายอ่านง่ายขึ้นมาก + +## ขั้นตอนที่ 4: สร้าง OCR Engine และเชื่อมต่อ AI Engine + +การเชื่อมต่อ AI engine กับ OCR engine ทำให้เราได้อ็อบเจ็กต์เดียวที่สามารถอ่านอักขระและทำให้ผลลัพธ์เรียบเนียนขึ้น + +```python +# Step 4: Create the OCR engine and attach the AI engine +ocr_engine = OcrEngine() +ocr_engine.set_ai_engine(ai_engine) +``` + +หากคุณข้ามขั้นตอนนี้ OCR จะยังทำงานได้ แต่คุณจะเสียการเพิ่มเครื่องหมายวรรคตอน—ทำให้ผลลัพธ์ดูเหมือนสตรีมของคำต่อเนื่อง + +## ขั้นตอนที่ 5: ประมวลผลทุกภาพในโฟลเดอร์ + +นี่คือหัวใจของบทแนะนำ เราวนลูปผ่านแต่ละภาพ รัน OCR ใช้ post‑processor แล้วเขียนข้อความที่ทำความสะอาดแล้วลงไฟล์ `.txt` ข้างๆ + +```python +# Step 5: Run OCR on each image in a folder, post‑process the result, and save the text +scans_folder = r"YOUR_DIRECTORY/scans" +for image_file in os.listdir(scans_folder): + # Filter only supported image types + if not image_file.lower().endswith(('.png', '.jpg', '.tif')): + continue + + image_path = os.path.join(scans_folder, image_file) + + # Recognise 
text from the image + ocr_result = ocr_engine.recognize(image_path) + + # Apply the punctuation post‑processor + ocr_result = ocr_engine.run_postprocessor(ocr_result) + + # Show a brief confidence summary + print(f"{image_file} – confidence {ocr_result.confidence:.2%}") + + # Save the cleaned text next to the source image + txt_path = image_path + ".txt" + with open(txt_path, "w", encoding="utf-8") as txt_file: + txt_file.write(ocr_result.text) +``` + +### สิ่งที่คาดว่าจะได้ + +การรันสคริปต์จะแสดงผลประมาณ: + +``` +invoice_001.png – confidence 96.73% +receipt_2024.tif – confidence 94.12% +``` + +แต่ละบรรทัดบอกคะแนนความเชื่อมั่น (การตรวจสอบสุขภาพอย่างรวดเร็ว) และสร้างไฟล์ `invoice_001.png.txt`, `receipt_2024.tif.txt` เป็นต้น ที่มีข้อความที่มีเครื่องหมายวรรคตอนและอ่านง่ายเหมือนมนุษย์ + +### กรณีขอบและการปรับเปลี่ยน + +- **การสแกนที่ไม่ใช่ภาษาอังกฤษ**: เปลี่ยน `hugging_face_repo_id` ไปยังโมเดลหลายภาษา (เช่น `microsoft/Multilingual-LLM-GGUF`) +- **แบชขนาดใหญ่**: ห่อวงวนลูปด้วย `concurrent.futures.ThreadPoolExecutor` เพื่อประมวลผลแบบขนาน แต่ต้องระวังขีดจำกัดหน่วยความจำ GPU +- **การประมวลผลต่อเนื่องแบบกำหนดเอง**: แทนที่ `"punctuation_adder"` ด้วยสคริปต์ของคุณเองหากต้องการทำความสะอาดเฉพาะโดเมน (เช่น การลบหมายเลขใบแจ้งหนี้) + +## ขั้นตอนที่ 6: ทำความสะอาดทรัพยากร + +เมื่องานเสร็จ การปล่อยทรัพยากรช่วยป้องกันการรั่วไหลของหน่วยความจำ ซึ่งสำคัญอย่างยิ่งหากคุณรันนี้ในบริการที่ทำงานต่อเนื่อง + +```python +# Step 6: Release resources +ai_engine.free_resources() +ocr_engine.dispose() +``` + +การละเลยขั้นตอนนี้อาจทำให้หน่วยความจำ GPU ค้างอยู่ ซึ่งจะทำให้การรันครั้งต่อไปล้มเหลว + +## สรุป: วิธีรัน OCR ตั้งแต่ต้นจนจบ + +ในไม่กี่บรรทัด เราได้แสดง **วิธีรัน OCR** บนโฟลเดอร์ของการสแกน, **ใช้โมเดล Hugging Face** ที่ดาวน์โหลดตัวเองครั้งแรก, และ **จดจำข้อความจากการสแกน** พร้อมเครื่องหมายวรรคตอนที่เพิ่มโดยอัตโนมัติ สคริปต์เต็มพร้อมคัดลอก‑วาง ปรับเส้นทางของคุณ แล้วรันได้เลย + +## ขั้นตอนต่อไปและหัวข้อที่เกี่ยวข้อง + +- **การประมวลผลต่อเนื่องเป็นชุด**: สำรวจ 
`ocr_engine.run_batch_postprocessor` เพื่อการจัดการแบบกลุ่มที่เร็วขึ้น +- **โมเดลทางเลือก**: ลองใช้ตระกูล `openai/whisper` หากคุณต้องการ speech‑to‑text ควบคู่กับ OCR +- **การรวมกับฐานข้อมูล**: เก็บข้อความที่สกัดออกใน SQLite หรือ Elasticsearch เพื่อทำคลังข้อมูลที่ค้นหาได้ + +อย่าลังเลที่จะทดลอง—เปลี่ยนโมเดล ปรับ `gpu_layers` หรือเพิ่ม post‑processor ของคุณเอง ความยืดหยุ่นของ Aspose OCR ร่วมกับศูนย์โมเดลของ Hugging Face ทำให้เป็นพื้นฐานที่หลากหลายสำหรับโครงการดิจิไทซ์เอกสารใด ๆ + +--- + +*ขอให้สนุกกับการเขียนโค้ด! หากเจออุปสรรคใด ๆ ฝากคอมเมนต์ด้านล่างหรือดูเอกสาร Aspose OCR เพื่อดูตัวเลือกการตั้งค่าที่ลึกขึ้น.* + +{{< /blocks/products/pf/tutorial-page-section >}} +{{< /blocks/products/pf/main-container >}} +{{< /blocks/products/pf/main-wrap-class >}} +{{< blocks/products/products-backtop-button >}} \ No newline at end of file diff --git a/ocr/thai/python/general/perform-ocr-on-image-with-python-complete-guide/_index.md b/ocr/thai/python/general/perform-ocr-on-image-with-python-complete-guide/_index.md new file mode 100644 index 000000000..717e0929f --- /dev/null +++ b/ocr/thai/python/general/perform-ocr-on-image-with-python-complete-guide/_index.md @@ -0,0 +1,198 @@ +--- +category: general +date: 2026-04-29 +description: ทำการ OCR บนภาพโดยใช้ Python, ดาวน์โหลดโมเดล HuggingFace อัตโนมัติและปล่อยหน่วยความจำ + GPU อย่างมีประสิทธิภาพขณะทำความสะอาดข้อความ OCR. 
+draft: false +keywords: +- perform OCR on image +- download HuggingFace model python +- release GPU memory python +- clean OCR text python +language: th +og_description: เรียนรู้วิธีทำ OCR บนรูปภาพด้วย Python ดาวน์โหลดโมเดล HuggingFace + อัตโนมัติ ทำความสะอาดข้อความและคืนหน่วยความจำ GPU +og_title: ทำ OCR บนภาพด้วย Python – คู่มือขั้นตอนโดยละเอียด +tags: +- OCR +- Python +- Aspose +- HuggingFace +title: ทำ OCR บนภาพด้วย Python – คู่มือฉบับสมบูรณ์ +url: /th/python/general/perform-ocr-on-image-with-python-complete-guide/ +--- + +{{< blocks/products/pf/main-wrap-class >}} +{{< blocks/products/pf/main-container >}} +{{< blocks/products/pf/tutorial-page-section >}} + +# ทำ OCR บนรูปภาพด้วย Python – คู่มือฉบับสมบูรณ์ + +เคยต้องการ **perform OCR on image** ไฟล์แต่ติดขัดที่ขั้นตอนดาวน์โหลดโมเดลหรือการทำความสะอาดหน่วยความจำ GPU หรือไม่? คุณไม่ได้เป็นคนเดียว—นักพัฒนาจำนวนมากเจออุปสรรคนี้เมื่อลองผสานการจดจำอักขระด้วยแสง (OCR) กับโมเดลภาษาใหญ่เป็นครั้งแรก + +ในบทแนะนำนี้ เราจะพาคุณผ่านโซลูชันแบบต้นจนจบที่ **downloads a HuggingFace model in Python**, รัน Aspose OCR, ทำความสะอาดผลลัพธ์ดิบ, และสุดท้าย **releases GPU memory Python** ที่ Python สามารถกู้คืนได้. เมื่อจบคุณจะได้สคริปต์พร้อมใช้งานที่แปลงไฟล์ PNG สแกนเป็นข้อความที่เรียบหรูและค้นหาได้ + +> **What you’ll get:** ตัวอย่างโค้ดที่สมบูรณ์และรันได้, คำอธิบายว่าทำไมแต่ละขั้นตอนจึงสำคัญ, เคล็ดลับเพื่อหลีกเลี่ยงข้อผิดพลาดทั่วไป, และมุมมองว่าคุณจะปรับแต่ง pipeline สำหรับโครงการของคุณอย่างไร + +## สิ่งที่คุณต้องการ + +- Python 3.9 หรือใหม่กว่า (ตัวอย่างทดสอบบน 3.11) +- `aspose-ocr` package (ติดตั้งโดยใช้ `pip install aspose-ocr`) +- การเชื่อมต่ออินเทอร์เน็ตสำหรับขั้นตอน **download HuggingFace model python** +- GPU ที่รองรับ CUDA หากคุณต้องการความเร็วเพิ่ม (ไม่บังคับแต่แนะนำ) + +ไม่ต้องการการพึ่งพาระดับระบบเพิ่มเติม; Aspose OCR engine มีทุกอย่างที่คุณต้องการรวมอยู่แล้ว. 
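ก่อนเริ่มขั้นตอนถัดไป คุณอาจตรวจสอบข้อกำหนดข้างต้นด้วยสคริปต์สั้น ๆ ตัวอย่างด้านล่างเป็นเพียงโครงร่าง: `check_prerequisites` เป็นชื่อสมมติที่ไม่ได้มาจากไลบรารี Aspose และตรวจเพียงเวอร์ชัน Python กับการ import โมดูล `aspose.ocr` เท่านั้น (ไม่ได้ตรวจ GPU หรือการเชื่อมต่ออินเทอร์เน็ต)

```python
import importlib.util
import sys


def check_prerequisites(min_version=(3, 9), module_name="aspose.ocr"):
    """Return a list of human-readable problems; an empty list means all checks passed.

    Hypothetical helper for illustration only; not part of the Aspose OCR API.
    """
    problems = []

    # The tutorial asks for Python 3.9 or newer
    if sys.version_info < min_version:
        required = ".".join(str(part) for part in min_version)
        problems.append(f"Python {required}+ is required, found {sys.version.split()[0]}")

    # find_spec raises ModuleNotFoundError when the parent package is missing,
    # so treat that the same as "not installed"
    try:
        spec = importlib.util.find_spec(module_name)
    except ModuleNotFoundError:
        spec = None
    if spec is None:
        problems.append(f"Module '{module_name}' is not importable (try: pip install aspose-ocr)")

    return problems


if __name__ == "__main__":
    for problem in check_prerequisites():
        print("Missing prerequisite:", problem)
```

หากสคริปต์ไม่พิมพ์อะไรออกมา แสดงว่าผ่านการตรวจสอบทั้งสองข้อ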
+ +![perform OCR on image example](image.png "Example of performing OCR on image with Aspose OCR and an LLM post‑processor") + +*Image alt text: “perform OCR on image – ผลลัพธ์ Aspose OCR ก่อนและหลังการทำความสะอาดด้วย AI”* + +## Perform OCR on Image – ภาพรวมขั้นตอนทีละขั้นตอน + +ด้านล่างเราจะแบ่ง workflow เป็นส่วนที่มีตรรกะแต่ละส่วนมีหัวข้อของตนเอง เพื่อให้ผู้ช่วย AI สามารถกระโดดไปยังส่วนที่คุณสนใจได้อย่างรวดเร็วและเครื่องมือค้นหาสามารถทำดัชนีคำสำคัญที่เกี่ยวข้อง + +### 1. ดาวน์โหลด HuggingFace Model ใน Python + +สิ่งแรกที่เราต้องทำคือดึงโมเดลภาษาเพื่อทำหน้าที่เป็น post‑processor สำหรับผลลัพธ์ OCR ดิบ Aspose OCR มาพร้อมคลาสช่วยเหลือชื่อ `AsposeAI` ที่สามารถดึงโมเดลจาก HuggingFace hub ได้โดยอัตโนมัติ + +```python +import aspose.ocr as aocr +from aspose.ocr import AsposeAI, AsposeAIModelConfig, OcrEngine + +# Configure the model – it will auto‑download the first time you run it +model_config = AsposeAIModelConfig( + allow_auto_download="true", # <-- enables auto‑download + hugging_face_repo_id="Qwen/Qwen2.5-3B-Instruct-GGUF", + hugging_face_quantization="int8", # smaller footprint on GPU + gpu_layers=20, # how many layers stay on GPU + directory_model_path=r"YOUR_DIRECTORY" # where the model files live +) +``` + +**ทำไมเรื่องนี้ถึงสำคัญ:** +- **download HuggingFace model python** – คุณหลีกเลี่ยงการจัดการไฟล์ zip หรือการยืนยันโทเคนด้วยตนเอง. +- การใช้การควอนติฟาย `int8` ทำให้โมเดลลดขนาดลงเหลือประมาณหนึ่งในสี่ของขนาดเดิม ซึ่งสำคัญเมื่อคุณต้อง **release GPU memory python** ในภายหลัง. + +> **Pro tip:** เก็บ `directory_model_path` ไว้บน SSD เพื่อเวลาโหลดที่เร็วขึ้น. + +--- + +### 2. Initialise the AI Helper and Enable Spell‑Checking + +ตอนนี้เราสร้างอินสแตนซ์ `AsposeAI` และแนบ post‑processor ตัวแก้ไขการสะกด. 
นี่คือจุดเริ่มต้นของเวทมนตร์ **clean OCR text python** + +```python +# Initialise the AI helper +ai_helper = AsposeAI() +ai_helper.set_post_processor( + processor="spell_corrector", + custom_settings={"max_edits": 2} # allows up to two character edits per word +) +``` + +**คำอธิบาย:** +Spell‑corrector ตรวจสอบแต่ละโทเคนจาก OCR engine และเสนอการแก้ไขที่จำกัดโดย `max_edits`. การปรับเล็ก ๆ นี้สามารถเปลี่ยน “rec0gn1tion” ให้เป็น “recognition” ได้โดยไม่ต้องใช้โมเดลภาษาใหญ่ + +--- + +### 3. Hook the AI Helper into the OCR Engine + +Aspose แนะนำเมธอดใหม่ในเวอร์ชัน 23.4 ที่ให้คุณต่อ AI engine เข้ากับ pipeline ของ OCR ได้โดยตรง + +```python +# Initialise the OCR engine and attach the AI helper +ocr_engine = OcrEngine() +ocr_engine.set_ai_engine(ai_helper) # new in v23.4 +``` + +**ทำไมเราถึงทำเช่นนี้:** +การเชื่อม AI helper ตั้งแต่ต้นทำให้ OCR engine สามารถใช้โมเดลเพื่อปรับปรุงผลลัพธ์แบบเรียลไทม์ (เช่น การตรวจจับเลย์เอาต์) และช่วยให้โค้ดดูเรียบร้อย—ไม่ต้องสร้างลูป post‑processing แยกต่างหากภายหลัง + +--- + +### 4. Perform OCR on the Scanned Image + +นี่คือขั้นตอนหลักที่จริง ๆ แล้ว **perform OCR on image** ไฟล์. แทนที่ `YOUR_DIRECTORY/input.png` ด้วยพาธของสแกนของคุณเอง + +```python +image_path = r"YOUR_DIRECTORY/input.png" +ocr_result = ocr_engine.recognize(image_path) + +print("Raw OCR text:") +print(ocr_result.text) +``` + +ผลลัพธ์ดิบที่ได้มักจะมีการขึ้นบรรทัดใหม่ในตำแหน่งแปลก ๆ, ตัวอักษรที่อ่านผิด, หรือสัญลักษณ์รบกวน. นั่นคือเหตุผลที่เราต้องทำขั้นตอนต่อไป + +**ผลลัพธ์ดิบที่คาดหวัง (ตัวอย่าง):** + +``` +Th1s 1s 4n ex4mpl3 0f r4w OCR t3xt. +It c0ntains numb3rs 123 and s0me m1stakes. +``` + +--- + +### 5. Clean OCR Text in Python with the AI Post‑Processor + +ตอนนี้ให้ AI ทำความสะอาดข้อความที่สกปรก. นี้คือหัวใจของกระบวนการ **clean OCR text python** + +```python +cleaned_result = ocr_engine.run_postprocessor(ocr_result) + +print("\nAI‑enhanced text:") +print(cleaned_result.text) +``` + +**ผลลัพธ์ที่คุณจะเห็น:** + +``` +This is an example of raw OCR text. 
+It contains numbers 123 and some mistakes. +``` + +สังเกตว่า spell‑corrector แก้ “Th1s” → “This” และ “4n” → “an” โมเดลยังทำการปรับระยะห่างของคำให้เป็นมาตรฐาน ซึ่งมักเป็นปัญหาเมื่อคุณนำข้อความไปใช้ใน pipeline NLP ต่อไป + +--- + +### 6. Release GPU Memory in Python – Clean‑up Steps + +เมื่อทำงานเสร็จแล้ว ควรปล่อยทรัพยากร GPU ให้ว่างเพื่อหลีกเลี่ยงการใช้หน่วยความจำเกินกำหนด, โดยเฉพาะอย่างยิ่งหากคุณรัน OCR หลายภาพต่อเนื่อง + +```python +# Release resources – crucial for GPU memory +ai_helper.free_resources() +ocr_engine.dispose() +``` + +**สิ่งที่เกิดขึ้นเบื้องหลัง:** +`free_resources()` จะ unload โมเดลออกจาก GPU, คืนหน่วยความจำให้กับไดรเวอร์ CUDA. `dispose()` ปิดบัฟเฟอร์ภายในของ OCR engine. การข้ามขั้นตอนเหล่านี้อาจทำให้เกิด error out‑of‑memory หลังจากประมวลผลเพียงไม่กี่ภาพ + +> **Remember:** หากคุณวางแผนประมวลผลเป็นชุดในลูป, ให้เรียก clean‑up หลังแต่ละชุดหรือใช้ `ai_helper` เดียวกันจนจบกระบวนการทั้งหมด + +## Bonus: ปรับแต่ง Pipeline สำหรับสถานการณ์ต่างๆ + +### Adjusting Model Quantization + +หากคุณมี GPU ที่ทรงพลัง (เช่น RTX 4090) และต้องการความแม่นยำสูงขึ้น, เปลี่ยน `hugging_face_quantization` เป็น `"fp16"` และเพิ่ม `gpu_layers` เป็น `30`. การตั้งค่านี้จะใช้หน่วยความจำมากขึ้น, ดังนั้นคุณจะต้อง **release GPU memory python** อย่างเข้มข้นหลังแต่ละชุด + +### Using a Custom Spell‑Checker + +คุณสามารถสลับ `spell_corrector` ที่มาพร้อมกับระบบเป็น post‑processor ที่กำหนดเองสำหรับการแก้ไขเฉพาะโดเมน (เช่น คำศัพท์ทางการแพทย์). เพียงแค่ implement อินเทอร์เฟซที่ต้องการและส่งชื่อคลาสให้ `set_post_processor` + +### Batch Processing Multiple Images + +ห่อขั้นตอน OCR ไว้ในลูป `for`, เก็บ `cleaned_result.text` ลงในลิสต์, และเรียก `ai_helper.free_resources()` หลังลูปเท่านั้นหากมี RAM GPU เพียงพอ. วิธีนี้ช่วยลดค่าโอเวอร์เฮดจากการโหลดโมเดลซ้ำหลายครั้ง + +## Conclusion + +เราได้สาธิตวิธี **perform OCR on image** ด้วย Python, ดาวน์โหลด HuggingFace model อัตโนมัติ, ทำความสะอาดข้อความ OCR, และปล่อยหน่วยความจำ GPU อย่างปลอดภัยเมื่อเสร็จ. 
สคริปต์เต็มพร้อมคัดลอก‑วาง, และคำอธิบายช่วยให้คุณมั่นใจในการปรับใช้กับโครงการขนาดใหญ่ต่อไป + +ขั้นตอนต่อไป? ลองเปลี่ยนโมเดล Qwen 2.5 เป็นเวอร์ชัน LLaMA ที่ใหญ่กว่า, ทดลอง post‑processor ต่าง ๆ, หรือผสานผลลัพธ์ที่ทำความสะอาดแล้วเข้าสู่ดัชนี Elasticsearch ที่ค้นหาได้. ความเป็นไปได้ไม่มีที่สิ้นสุด, และคุณมีพื้นฐานที่มั่นคงเพื่อสร้างต่อ + +Happy coding, and may your OCR pipelines be ever‑clean and memory‑friendly! + +{{< /blocks/products/pf/tutorial-page-section >}} +{{< /blocks/products/pf/main-container >}} +{{< /blocks/products/pf/main-wrap-class >}} +{{< blocks/products/products-backtop-button >}} \ No newline at end of file diff --git a/ocr/turkish/python/general/extract-text-from-pdf-ocr-pdf-with-python/_index.md b/ocr/turkish/python/general/extract-text-from-pdf-ocr-pdf-with-python/_index.md new file mode 100644 index 000000000..8efac7530 --- /dev/null +++ b/ocr/turkish/python/general/extract-text-from-pdf-ocr-pdf-with-python/_index.md @@ -0,0 +1,219 @@ +--- +category: general +date: 2026-04-29 +description: Python'da Aspose OCR kullanarak PDF'den metin çıkarın. Toplu OCR PDF + işleme, taranmış PDF metnini dönüştürme ve düşük güvenilirlikli sayfaları yönetmeyi + öğrenin. +draft: false +keywords: +- extract text from pdf +- ocr pdf with python +- convert scanned pdf text +- batch ocr pdf processing +language: tr +og_description: Python'da Aspose OCR ile PDF'den metin çıkarın. Bu kılavuz, toplu + OCR PDF işleme, taranmış PDF metnini dönüştürme ve düşük güvenilirlik sonuçlarını + ele almayı gösterir. 
+og_title: Extract Text from PDF – OCR PDF with Python
+tags:
+- OCR
+- Python
+- PDF processing
+title: Extract Text from PDF – OCR PDF with Python
+url: /tr/python/general/extract-text-from-pdf-ocr-pdf-with-python/
+---
+
+{{< blocks/products/pf/main-wrap-class >}}
+{{< blocks/products/pf/main-container >}}
+{{< blocks/products/pf/tutorial-page-section >}}
+
+# Extract Text from PDF – OCR PDF with Python
+
+Have you ever needed to **extract text from a PDF** only to find that the file is just a scanned image? You are not alone: many developers hit this obstacle when they try to turn PDFs into searchable data. The good news? With Aspose OCR for Python you can convert scanned PDF text in a few lines, and even run **batch OCR PDF processing** when you have dozens of files.
+
+In this tutorial we walk through the whole workflow: setting up the library, running OCR on a single PDF, scaling up to a batch, and dealing with low-confidence pages so you know when manual review is needed. By the end you will have a ready-to-run script that extracts text from scanned PDFs, and you will understand the reason behind each step.
+
+## What You Need
+
+Before you start, make sure you have the following:
+
+- Python 3.8 or newer (the code uses f-strings, so 3.6+ works, but 3.8+ is recommended)
+- An Aspose OCR for Python license or a free trial key (available from the Aspose website)
+- A folder containing one or more scanned PDFs you want to process
+- A reasonable amount of disk space for the generated *.txt* reports
+
+That's it: no heavy external dependencies and no extra libraries such as OpenCV. The Aspose OCR engine does the work for you.
+
+## Setting Up the Environment
+
+First, install the Aspose OCR package from PyPI:
+
+```bash
+pip install aspose-ocr
+```
+
+If you have a license file (`Aspose.OCR.lic`), put it in the project root and activate it like this:
+
+```python
+# Activate Aspose OCR license (optional but removes trial watermark)
+from aspose.ocr import License
+
+license = License()
+license.set_license("Aspose.OCR.lic")
+```
+
+> **Tip:** Keep the license file out of version control; add it to `.gitignore` so it is not exposed by accident.
+
+## Running OCR on a Single PDF
+
+Now let's extract text from a single scanned PDF. The basic steps are:
+
+1. Create an `OcrEngine` instance.
+2. Point it at the PDF file.
+3. Retrieve one `OcrResult` per page.
+4. Write the plain-text output to disk.
+5. Dispose of the engine to free native resources.
+
+The complete script is shown below:
+
+```python
+# Step 1: Import the OCR engine and create an instance
+from aspose.ocr import OcrEngine
+
+# Optional: activate license (see previous section)
+# from aspose.ocr import License
+# License().set_license("Aspose.OCR.lic")
+
+ocr_engine = OcrEngine()
+
+# Step 2: Specify the PDF file to be processed
+pdf_file_path = r"YOUR_DIRECTORY/input.pdf"
+
+# Step 3: Extract OCR results – one OcrResult object per page
+ocr_pages = ocr_engine.extract_from_pdf(pdf_file_path)
+
+# Step 4: Iterate through the results, show confidence, flag low‑confidence pages, and save the plain text
+for ocr_page in ocr_pages:
+    print(f"Page {ocr_page.page_number}: confidence {ocr_page.confidence:.2%}")
+
+    # If confidence is below 80 %, flag it for manual review
+    if ocr_page.confidence < 0.80:
+        print(" Low confidence – consider manual review")
+
+    # Build the output filename
+    output_txt = f"YOUR_DIRECTORY/Report_page{ocr_page.page_number}.txt"
+    with open(output_txt, "w", encoding="utf-8") as f:
+        f.write(ocr_page.text)
+
+# Step 5: Release resources used by the engine
+ocr_engine.dispose()
+```
+
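The 80% check from Step 4 can also be pulled out into a small function, which makes the threshold easy to test without the OCR engine. The sketch below is only an illustration under stated assumptions: `PageResult` and `pages_needing_review` are hypothetical names, and `PageResult` is a stand-in that carries the same `page_number`, `confidence`, and `text` attributes the script reads. It is not an Aspose class.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class PageResult:
    """Hypothetical stand-in for a per-page OCR result (not an Aspose class)."""
    page_number: int
    confidence: float  # expected range: 0.0 to 1.0
    text: str


def pages_needing_review(pages: Iterable[PageResult], threshold: float = 0.80) -> List[int]:
    """Return the page numbers whose confidence is strictly below threshold."""
    return [page.page_number for page in pages if page.confidence < threshold]


if __name__ == "__main__":
    sample = [
        PageResult(1, 0.9745, "clean page"),
        PageResult(2, 0.62, "smudged page"),
        PageResult(3, 0.80, "borderline page"),
    ]
    # Page 3 sits exactly on the threshold, so only page 2 is flagged.
    print(pages_needing_review(sample))
```

Because the comparison is strict, a page at exactly 80% is not flagged, which matches the `<` test in the script above.
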
+**What you will see:** For each page the script prints a line such as `Page 1: confidence 97.45%`. If a page falls below the 80% threshold, a warning is printed to signal that OCR may have missed characters.
+
+### Why This Works
+
+- **`OcrEngine`** provides access to the native Aspose OCR library; it handles image preprocessing and character recognition itself.
+- **`extract_from_pdf`** rasterizes each PDF page automatically, so you do not have to convert the PDF into separate images first.
+- **Confidence scores** let you automate quality checks, which matters most for legal or medical documents where accuracy is critical.
+
+## Batch OCR PDF Processing with Python
+
+Real-world projects usually involve more than one file. Let's extend the single-file script into a **batch OCR PDF processing** pipeline that walks a folder, processes each PDF, and saves the results in a matching sub-folder.
+
+```python
+from pathlib import Path
+from aspose.ocr import OcrEngine
+
+def ocr_pdf_file(pdf_path: Path, output_dir: Path, engine: OcrEngine, confidence_threshold: float = 0.80):
+    """Extract text from a single PDF and write per‑page .txt files."""
+    ocr_pages = engine.extract_from_pdf(str(pdf_path))
+    for page in ocr_pages:
+        print(f"{pdf_path.name} – Page {page.page_number}: {page.confidence:.2%}")
+        if page.confidence < confidence_threshold:
+            print(" ⚠️ Low confidence – manual review may be needed")
+
+        out_file = output_dir / f"{pdf_path.stem}_page{page.page_number}.txt"
+        out_file.write_text(page.text, encoding="utf-8")
+
+def batch_ocr_pdf(input_folder: str, output_folder: str):
+    """Iterate over all PDFs in input_folder and run OCR."""
+    input_path = Path(input_folder)
+    output_path = Path(output_folder)
+    output_path.mkdir(parents=True, exist_ok=True)
+
+    pdf_files = list(input_path.glob("*.pdf"))
+    if not pdf_files:
+        print("🚫 No PDF files found in the specified folder.")
+        return
+
+    # Create the engine only when there is work to do
+    engine = OcrEngine()
+    try:
+        for pdf_file in pdf_files:
+            # Create a sub‑folder for each PDF to keep pages organized
+            pdf_out_dir = output_path / pdf_file.stem
+            pdf_out_dir.mkdir(exist_ok=True)
+            ocr_pdf_file(pdf_file, pdf_out_dir, engine)
+    finally:
+        # Clean up native resources, even if one of the files fails
+        engine.dispose()
+    print("✅ Batch OCR completed.")
+
+# Example usage:
+if __name__ == "__main__":
+    batch_ocr_pdf("YOUR_DIRECTORY/input_pdfs", "YOUR_DIRECTORY/ocr_output")
+```
+
+#### How This Helps
+
+- **Scalability:** The function walks the folder once and creates a separate output sub-folder for each PDF, which keeps things organized when you handle dozens of documents.
+- **Reusability:** `ocr_pdf_file` can be called from other scripts (e.g. a web service) because it receives the engine and paths as arguments and keeps no global state.
+- **Error handling:** If the input folder is empty, the function prints a friendly message instead of failing silently.
+
+## Converting Scanned PDF Text – Handling Edge Cases
+
+The code above works for most PDFs, but you may run into a few quirks:
+
+| Situation | Why It Happens | How to Fix It |
+|-----------|----------------|-----------------|
+| **Encrypted PDFs** | The PDF is password-protected. | Pass the password with `extract_from_pdf(pdf_path, password="yourPwd")`. |
+| **Multilingual documents** | Aspose OCR defaults to English. | Set Spanish with `ocr_engine.language = "spa"`, or supply a list for mixed languages. |
+| **Very large PDFs (>500 pages)** | Memory use grows because every page is loaded into RAM. | Process the file in chunks, e.g. `engine.extract_from_pdf(pdf_path, start_page=1, end_page=100)`, inside a loop. |
+| **Poor scan quality** | Low DPI or heavy noise lowers confidence. | Enable preprocessing with `engine.image_preprocessing = True`, or raise the DPI with `engine.dpi = 300`. |
+
+> **Caution:** Enabling image preprocessing can increase CPU time noticeably. If you run a nightly batch job, allow enough time or start a separate worker.
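
The chunked approach suggested for very large PDFs can be sketched as a small helper that produces page ranges. This is a minimal sketch under stated assumptions, not part of the Aspose API: `page_ranges` is a hypothetical helper, and the `start_page` / `end_page` parameters in the commented-out call come from the table above and have not been verified against the library.

```python
from typing import Iterator, Tuple


def page_ranges(total_pages: int, chunk_size: int = 100) -> Iterator[Tuple[int, int]]:
    """Yield 1-based, inclusive (start_page, end_page) pairs covering total_pages."""
    if total_pages < 0:
        raise ValueError("total_pages must not be negative")
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    for start in range(1, total_pages + 1, chunk_size):
        # The last chunk may be shorter than chunk_size.
        yield start, min(start + chunk_size - 1, total_pages)


if __name__ == "__main__":
    # A 250-page PDF split into chunks of 100 pages.
    for start_page, end_page in page_ranges(250, chunk_size=100):
        # Hypothetical call, assuming the page-range parameters from the table exist:
        # engine.extract_from_pdf(pdf_path, start_page=start_page, end_page=end_page)
        print(f"pages {start_page}-{end_page}")
```

Keeping the range arithmetic in its own function means the off-by-one cases (the short last chunk, an empty document) can be checked without loading a PDF.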
+
+## Verifying the Output
+
+When the script finishes you will see a folder structure like this:
+
+```
+ocr_output/
+├─ invoice_2023/
+│  ├─ invoice_2023_page1.txt
+│  ├─ invoice_2023_page2.txt
+│  └─ …
+└─ contract_A/
+   ├─ contract_A_page1.txt
+   └─ …
+```
+
+Open any `.txt` file; you should see clean, UTF-8 encoded text that mirrors the original scanned content. If you notice garbled characters, check the PDF's language settings and that the right font packs are installed on the machine.
+
+## Cleaning Up Resources
+
+Aspose OCR relies on native libraries, so calling `engine.dispose()` when you are done is essential. Skipping this step can lead to memory leaks, especially in long-running batch jobs.
+
+```python
+# Always the last line of your script
+engine.dispose()
+```
+
+## Full End-to-End Example
+
+Putting it all together, here is a single
+
+{{< /blocks/products/pf/tutorial-page-section >}}
+{{< /blocks/products/pf/main-container >}}
+{{< /blocks/products/pf/main-wrap-class >}}
+{{< blocks/products/products-backtop-button >}}
\ No newline at end of file
diff --git a/ocr/turkish/python/general/how-to-recognize-handwriting-in-python-full-tutorial/_index.md b/ocr/turkish/python/general/how-to-recognize-handwriting-in-python-full-tutorial/_index.md
new file mode 100644
index 000000000..a9a482f93
--- /dev/null
+++ b/ocr/turkish/python/general/how-to-recognize-handwriting-in-python-full-tutorial/_index.md
@@ -0,0 +1,278 @@
+---
+category: general
+date: 2026-04-29
+description: Learn how to recognize handwriting in Python with Aspose OCR. This step-by-step
+  guide shows how to extract handwritten text efficiently.
+draft: false
+keywords:
+- how to recognize handwriting
+- extract handwritten text
+- handwritten text recognition python
+- handwritten ocr tutorial python
+language: tr
+og_description: How do you recognize handwriting in Python? Follow this comprehensive
+  guide with code, tips, and edge-case handling for extracting handwritten text
+  with Aspose OCR.
+og_title: How to Recognize Handwriting in Python – Full Tutorial
+tags:
+- OCR
+- Python
+- HandwritingRecognition
+title: How to Recognize Handwriting in Python – Full Tutorial
+url: /tr/python/general/how-to-recognize-handwriting-in-python-full-tutorial/
+---
+
+{{< blocks/products/pf/main-wrap-class >}}
+{{< blocks/products/pf/main-container >}}
+{{< blocks/products/pf/tutorial-page-section >}}
+
+# How to Recognize Handwriting in Python – Full Tutorial
+
+Have you ever needed to **recognize handwriting** in a Python project but did not know where to start? You are not alone: developers keep asking, "Can I extract text from a scanned note?" The good news is that modern OCR libraries make this much easier. In this guide we show step by step how to **recognize handwriting** with Aspose OCR, and you will learn how to get reliable results when you **extract handwritten text**.
+
+We cover everything from installing the library to tuning confidence thresholds for messy handwriting. By the end you will have a runnable script that prints the extracted text and an overall confidence score, which suits note-taking apps, archiving tools, or plain curiosity. No prior OCR experience is required; basic Python knowledge is enough.
+
+---
+
+## What You Need
+
+- **Python 3.9+** (the latest stable release is best)
+- **Aspose.OCR for Python via .NET**, installed with `pip install aspose-ocr`
+- A **handwritten image** (JPEG/PNG) you want to process
+- Optional: a virtual environment to keep dependencies tidy
+
+If you have these, let's begin.
+
+![Handwriting recognition example](/images/handwritten-sample.jpg "Handwriting recognition example")
+
+*(Alt text: "handwriting recognition example showing a scanned handwritten note")*
+
+---
+
+## Step 1 – Install and Import the Aspose OCR Classes
+
+First we need the OCR engine. Aspose offers a clean API that separates printed-text recognition from handwriting mode.
+
+```python
+# Install the package (run this in your terminal if you haven't already)
+# pip install aspose-ocr
+
+# Import the required OCR classes
+from aspose.ocr import OcrEngine, HandwritingMode
+```
+
+*Why it matters:* Importing `HandwritingMode` lets us tell the engine that it is dealing with handwritten text, which greatly improves accuracy on cursive letters.
+
+---
+
+## Step 2 – Create and Configure the OCR Engine
+
+Now we create an `OcrEngine` instance and switch it to handwriting mode. You can also adjust the confidence threshold: lower values accept shaky writing, higher values demand cleaner input.
+
+```python
+# Step 2: Initialize the OCR engine for handwritten text
+ocr_engine = OcrEngine()
+ocr_engine.set_recognition_mode(HandwritingMode.HANDWRITTEN)
+
+# Optional: tighten the confidence threshold for cursive scripts
+# Default is 0.5; 0.65 works well for most notebooks
+ocr_engine.set_handwriting_confidence_threshold(0.65)
+```
+
+*Tip:* Notes scanned at 300 DPI or higher usually score better. For low-resolution images, consider upscaling with Pillow before passing them to the engine.
+
+---
+
+## Step 3 – Prepare the Image Path
+
+Make sure the path to the image you want to process is correct. Relative paths work, but absolute paths help avoid "file not found" errors.
+
+```python
+# Step 3: Point to your handwritten image
+input_image_path = "YOUR_DIRECTORY/handwritten_sample.jpg"
+```
+
+*Common pitfall:* On Windows, backslashes in a normal string must be escaped (`C:\\folder\\image.jpg`). A raw string (`r"C:\folder\image.jpg"`) avoids the problem.
+
+---
+
+## Step 4 – Run Recognition and Capture the Results
+
+The `recognize` method does the heavy lifting. It returns an object with `.text` and `.confidence` properties.
+
+```python
+# Step 4: Run OCR and get the result
+result = ocr_engine.recognize(input_image_path)
+
+# Display the extracted text and confidence
+print("Hand‑written extraction:")
+print(result.text)
+print("Overall confidence:", result.confidence)
+```
+
+**Expected output (example):**
+
+```
+Hand‑written extraction:
+Meeting notes:
+- Discuss quarterly goals
+- Review budget allocations
+- Assign action items
+Overall confidence: 0.78
+```
+
+If the confidence score drops below 0.5, you may need to clean up the image (remove shadows, increase contrast) or lower the threshold from Step 2.
+
+---
+
+## Step 5 – Clean Up Resources
+
+Aspose OCR holds native resources; calling `dispose()` frees them and prevents memory leaks, especially if you process many images in a loop.
+
+```python
+# Step 5: Release the engine resources
+ocr_engine.dispose()
+```
+
+*Why dispose?* In long-running services (e.g. a Flask API that accepts uploads), forgetting to release resources can quickly exhaust system memory.
+
+---
+
+## Full Script – Run in One Go
+
+Putting it all together, here is a self-contained script you can copy, paste, and run.
+
+```python
+# -*- coding: utf-8 -*-
+"""
+Handwritten OCR Tutorial – how to recognize handwriting in Python
+Author: Your Name
+Date: 2026-04-29
+"""
+
+# Install the library first if you haven't already:
+# pip install aspose-ocr
+
+from aspose.ocr import OcrEngine, HandwritingMode
+
+def recognize_handwriting(image_path: str, confidence_threshold: float = 0.65):
+    """
+    Recognizes handwritten text from an image using Aspose OCR.
+
+    Args:
+        image_path (str): Path to the handwritten image file.
+        confidence_threshold (float): Minimum confidence to accept a word.
+
+    Returns:
+        tuple: (extracted_text (str), overall_confidence (float))
+    """
+    # Initialize engine in handwritten mode
+    engine = OcrEngine()
+    try:
+        engine.set_recognition_mode(HandwritingMode.HANDWRITTEN)
+        engine.set_handwriting_confidence_threshold(confidence_threshold)
+
+        # Perform recognition
+        result = engine.recognize(image_path)
+
+        # Capture output
+        return result.text, result.confidence
+    finally:
+        # Clean up, even if recognition raised an exception
+        engine.dispose()
+
+if __name__ == "__main__":
+    # Replace with your actual file location
+    img_path = "YOUR_DIRECTORY/handwritten_sample.jpg"
+
+    text, conf = recognize_handwriting(img_path)
+
+    print("Hand‑written extraction:")
+    print(text)
+    print("Overall confidence:", conf)
+```
+
+Save this file as `handwritten_ocr.py` and run `python handwritten_ocr.py`. If everything is set up correctly, the extracted text is printed to the console.
+
+---
+
+## Edge Cases and Common Variations
+
+### Low-Contrast Images
+If the background blends into the ink, boost the contrast first:
+
+```python
+from PIL import Image, ImageEnhance
+
+# Convert to RGB so that images with an alpha channel can be saved as JPEG
+img = Image.open(input_image_path).convert("RGB")
+enhancer = ImageEnhance.Contrast(img)
+high_contrast = enhancer.enhance(2.0)  # 2× contrast
+high_contrast.save("temp_high_contrast.jpg")
+result = ocr_engine.recognize("temp_high_contrast.jpg")
+```
+
+### Rotated Notes
+A skewed notebook page can hurt recognition. Correct the skew with OpenCV and NumPy (`pip install opencv-python numpy`):
+
+```python
+import cv2
+import numpy as np
+
+def deskew(image_path, output_path="deskewed.jpg"):
+    img = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
+    if img is None:
+        raise FileNotFoundError(image_path)
+    # Invert and threshold so that ink pixels, not the paper, are non-zero
+    ink = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY_INV | cv2.THRESH_OTSU)[1]
+    coords = np.column_stack(np.where(ink > 0)).astype(np.float32)
+    angle = cv2.minAreaRect(coords)[-1]
+    # NOTE: the angle convention of minAreaRect differs between OpenCV
+    # versions; check the rotation direction on a test image.
+    if angle < -45:
+        angle = -(90 + angle)
+    else:
+        angle = -angle
+    (h, w) = img.shape[:2]
+    M = cv2.getRotationMatrix2D((w // 2, h // 2), angle, 1.0)
+    corrected = cv2.warpAffine(img, M, (w, h), flags=cv2.INTER_CUBIC, borderMode=cv2.BORDER_REPLICATE)
+    cv2.imwrite(output_path, corrected)
+
+deskew(input_image_path)
+result = ocr_engine.recognize("deskewed.jpg")
+```
+
+### Multi-Page PDFs
+Aspose OCR can process PDF pages too, but you first need to convert each page to an image (e.g. with `pdf2image`). Then loop over the images with the same `recognize_handwriting` function.
+
+---
+
+## Pro Tips for Better **Handwritten Text Extraction** Results
+
+- **DPI matters:** Aim for 300 DPI or higher when scanning.
+- **Avoid colored backgrounds:** Pure white or light gray gives the cleanest output.
+- **Batch processing:** Wrap the function in a `for` loop and log each page's confidence score; discard results below your threshold to keep quality high.
+- **Language support:** Aspose OCR supports multiple languages; for English only, set `engine.set_language("en")`.
+
+---
+
+## Frequently Asked Questions
+
+**Does this work on Linux?**
+Yes. Aspose OCR ships native binaries for Windows, macOS, and Linux, so installing the pip package is enough.
+
+**What if my handwriting is very cursive?**
+Try lowering the confidence threshold (`0.5` or even `0.4`). Keep in mind that this can let in more noise; post-process the output (e.g. with a spell checker) if needed.
+
+**Can I use this in a web service?**
+Absolutely. The `recognize_handwriting` function is stateless, which makes it a good fit for Flask or FastAPI endpoints. It creates and disposes of its own engine on every call; if you manage an engine yourself, call `dispose()` after every request or wrap it in a context manager.
+
+---
+
+## Conclusion
+
+We covered the whole process of **recognizing handwriting** in Python: how to **extract handwritten text**, fine-tune the confidence settings, and handle common problems such as low contrast or rotated pages. The full script above is ready to run, and its modular function fits easily into larger projects such as note-taking apps, archive digitization, or your own handwritten OCR experiments.
+
+From here you could explore handwriting recognition for multilingual notes, or combine OCR with natural language processing to summarize meeting minutes automatically. Experiment, and let your code bring the scribbles to life.
+
+Happy coding, and feel free to leave your questions in the comments!
+
+{{< /blocks/products/pf/tutorial-page-section >}}
+{{< /blocks/products/pf/main-container >}}
+{{< /blocks/products/pf/main-wrap-class >}}
+{{< blocks/products/products-backtop-button >}}
\ No newline at end of file
diff --git a/ocr/turkish/python/general/how-to-run-ocr-with-aspose-hugging-face-complete-guide/_index.md b/ocr/turkish/python/general/how-to-run-ocr-with-aspose-hugging-face-complete-guide/_index.md
new file mode 100644
index 000000000..d1e9e3400
--- /dev/null
+++ b/ocr/turkish/python/general/how-to-run-ocr-with-aspose-hugging-face-complete-guide/_index.md
@@ -0,0 +1,181 @@
+---
+category: general
+date: 2026-04-29
+description: Learn how to run OCR on your scanned files, use a Hugging Face model
+  automatically, and recognize text from scans in minutes with
+  Aspose OCR.
+draft: false
+keywords:
+- how to run OCR
+- use hugging face model
+- recognize text from scans
+- download model automatically
+language: tr
+og_description: How to run OCR on scans with Aspose OCR, download a Hugging Face
+  model automatically, and get clean, punctuated text.
+og_title: How to Run OCR with Aspose and Hugging Face – Complete Guide
+tags:
+- OCR
+- Aspose
+- Hugging Face
+- Python
+title: How to Run OCR with Aspose and Hugging Face – Complete Guide
+url: /tr/python/general/how-to-run-ocr-with-aspose-hugging-face-complete-guide/
+---
+
+{{< blocks/products/pf/main-wrap-class >}}
+{{< blocks/products/pf/main-container >}}
+{{< blocks/products/pf/tutorial-page-section >}}
+
+# How to Run OCR with Aspose & Hugging Face – Complete Guide
+
+Have you ever wondered **how to run OCR** on a stack of scanned documents without spending hours on configuration? You are not alone. In many projects, developers want to **recognize text from scans** quickly but get stuck on model downloads and post-processing.
+
+The good news: this tutorial shows a ready-made solution that **uses a Hugging Face model**, downloads it automatically, and adds punctuation so the output reads like human-written text. By the end you will have a script that processes every image in a folder and leaves a clean `.txt` file next to each scan.
+
+## What You Need
+
+- Python 3.8+ (the code uses f-strings, which require Python 3.6 or later; 3.8+ is recommended)
+- The `aspose-ocr` package (installed with `pip install aspose-ocr`)
+- Internet access for the first model download
+- A folder of scanned images (`.png`, `.jpg`, or `.tif`)
+
+That's it: no extra binaries and no manual model setup. Let's get started.
+
+![how to run OCR example](https://example.com/ocr-demo.png "how to run OCR example")
+
+## Step 1: Import the Aspose OCR Classes and Prepare the Environment
+
+We start by importing the required classes from the Aspose OCR library. Importing everything up front keeps the script tidy and makes missing dependencies easy to spot.
+
+```python
+# Step 1: Import Aspose OCR classes
+import os
+from aspose.ocr import OcrEngine, AsposeAI, AsposeAIModelConfig
+```
+
+*Why it matters*: `OcrEngine` does the heavy lifting, while `AsposeAI` lets us attach a large language model for smarter post-processing. If you skip the import, the script stops with a `NameError` as soon as it reaches the first Aspose class, so do not leave it out.
+
+## Step 2: Configure a GPU‑Aware Hugging Face Model
+
+Now we tell Aspose where to fetch the model from and how many layers to run on the GPU. The `allow_auto_download="true"` flag takes care of **downloading the model automatically**.
+
+```python
+# Step 2: Configure a GPU‑aware AI model (replace with your own model folder)
+model_config = AsposeAIModelConfig(
+    allow_auto_download="true",
+    hugging_face_repo_id="Qwen/Qwen2.5-3B-Instruct-GGUF",
+    hugging_face_quantization="int8",
+    gpu_layers=40,  # use GPU for faster inference
+    directory_model_path=r"YOUR_DIRECTORY/models"
+)
+```
+
+> **Pro tip**: If you have no GPU, set `gpu_layers=0`. The model falls back to the CPU; it is slower but still works.
+
+### Why Choose a Hugging Face Model?
+
+Hugging Face hosts a large collection of ready-to-use LLMs. By pointing at `Qwen/Qwen2.5-3B-Instruct-GGUF`, you get a compact, instruction-tuned model that can add punctuation, fix spacing, and even repair small OCR errors. That is what **using a Hugging Face model** means in practice.
+
+## Step 3: Initialise the AI Engine and Enable Punctuation Post‑Processing
+
+The AI engine is not just for chat: here we attach a *punctuation adder* that cleans up the raw OCR output.
+
+```python
+# Step 3: Initialise the AI engine and enable punctuation post‑processing
+ai_engine = AsposeAI()
+ai_engine.set_post_processor("punctuation_adder", {})
+```
+
+*What is happening?* The `set_post_processor` call registers a built-in post-processor that runs after the OCR engine finishes. It takes the raw string and adds commas, periods, and capital letters where needed, which makes the final text much more readable. Note that this snippet creates `AsposeAI()` without passing the `model_config` from Step 2; check the Aspose OCR documentation for how the configuration is attached to the engine.
+
+## Step 4: Create the OCR Engine and Attach the AI Engine
+
+Attaching the AI engine to the OCR engine gives us a single object that can both read the characters and polish the result.
+
+```python
+# Step 4: Create the OCR engine and attach the AI engine
+ocr_engine = OcrEngine()
+ocr_engine.set_ai_engine(ai_engine)
+```
+
+If you skip this step, OCR still works, but you lose punctuation support and the output reads like a run of unpunctuated words.
+
+## Step 5: Process Every Image in the Folder
+
+This is the heart of the tutorial. We loop over each image, run OCR, apply the post-processor, and write the clean text to a `.txt` file next to the source image.
+
+```python
+# Step 5: Run OCR on each image in a folder, post‑process the result, and save the text
+scans_folder = r"YOUR_DIRECTORY/scans"
+for image_file in os.listdir(scans_folder):
+    # Filter only supported image types
+    if not image_file.lower().endswith(('.png', '.jpg', '.tif')):
+        continue
+
+    image_path = os.path.join(scans_folder, image_file)
+
+    # Recognise text from the image
+    ocr_result = ocr_engine.recognize(image_path)
+
+    # Apply the punctuation post‑processor
+    ocr_result = ocr_engine.run_postprocessor(ocr_result)
+
+    # Show a brief confidence summary
+    print(f"{image_file} – confidence {ocr_result.confidence:.2%}")
+
+    # Save the cleaned text next to the source image
+    txt_path = image_path + ".txt"
+    with open(txt_path, "w", encoding="utf-8") as txt_file:
+        txt_file.write(ocr_result.text)
+```
+
+### Expected Result
+
+Running the script produces output similar to the following:
+
+```
+invoice_001.png – confidence 96.73%
+receipt_2024.tif – confidence 94.12%
+```
+
+Each line shows the confidence score (a quick sanity check), and the script creates files such as `invoice_001.png.txt` and `receipt_2024.tif.txt` that contain punctuated, human-readable text.
+
+### Edge Cases & Variations
+
+- **Non-English scans**: Change `hugging_face_repo_id` to a multilingual model (e.g. `microsoft/Multilingual-LLM-GGUF`; confirm the exact repository ID on the Hugging Face hub before using it).
+- **Large batches**: Run the loop body in a `concurrent.futures.ThreadPoolExecutor` to process images in parallel, but watch the GPU memory limits.
+- **Custom post-processing**: Replace `"punctuation_adder"` with your own script when you need domain-specific clean-up (e.g. stripping invoice numbers).
+
+## Step 6: Clean Up Resources
+
+Releasing resources when the work is done prevents memory leaks, which matters most when the code runs inside a long-lived service.
+
+```python
+# Step 6: Release resources
+ai_engine.free_resources()
+ocr_engine.dispose()
+```
+
+Skipping this step leaves GPU memory occupied and can cause later runs to fail.
+
+## Recap: Running OCR End to End
+
+With just a few lines of code we showed how to **run OCR** on a folder of scans, download **a Hugging Face model** automatically on the first run, and **recognize text from scans** with punctuation added. The full script is ready to copy, paste, adjust the paths, and run.
+
+## Next Steps & Related Topics
+
+- **Batch post-processing**: Explore `ocr_engine.run_batch_postprocessor` for faster bulk processing.
+- **Alternative models**: If you need speech-to-text alongside OCR, try the `openai/whisper` family.
+- **Database integration**: Store the extracted text in SQLite or Elasticsearch to build searchable archives.
+
+Feel free to experiment: swap the model, adjust `gpu_layers`, or add your own post-processor. The flexibility of Aspose OCR combined with the variety of the Hugging Face model hub gives you a versatile base for document digitization projects.
+
+---
+
+*Happy coding! If you run into a problem, leave a comment below or consult the Aspose OCR documentation for deeper configuration options.*
+
+{{< /blocks/products/pf/tutorial-page-section >}}
+{{< /blocks/products/pf/main-container >}}
+{{< /blocks/products/pf/main-wrap-class >}}
+{{< blocks/products/products-backtop-button >}}
\ No newline at end of file
diff --git a/ocr/turkish/python/general/perform-ocr-on-image-with-python-complete-guide/_index.md b/ocr/turkish/python/general/perform-ocr-on-image-with-python-complete-guide/_index.md
new file mode 100644
index 000000000..041eb2809
--- /dev/null
+++ b/ocr/turkish/python/general/perform-ocr-on-image-with-python-complete-guide/_index.md
@@ -0,0 +1,189 @@
+---
+category: general
+date: 2026-04-29
+description: Perform OCR on an image with Python, download a HuggingFace model automatically,
+  and release GPU memory efficiently while cleaning the OCR text.
+draft: false
+keywords:
+- perform OCR on image
+- download HuggingFace model python
+- release GPU memory python
+- clean OCR text python
+language: tr
+og_description: Learn how to perform OCR on an image in Python, download a HuggingFace
+  model automatically, clean the text, and free GPU
+  memory.
+og_title: Perform OCR on Image with Python – Step-by-Step Guide
+tags:
+- OCR
+- Python
+- Aspose
+- HuggingFace
+title: Perform OCR on Image with Python – Complete Guide
+url: /tr/python/general/perform-ocr-on-image-with-python-complete-guide/
+---
+
+{{< blocks/products/pf/main-wrap-class >}}
+{{< blocks/products/pf/main-container >}}
+{{< blocks/products/pf/tutorial-page-section >}}
+
+# Perform OCR on Image with Python – Complete Guide
+
+Have you ever needed to **perform OCR on an image** and got stuck on downloading the model or cleaning up GPU memory? You are not alone: many developers hit this wall the first time they combine optical character recognition with large language models.
+
+In this tutorial we walk through a single end-to-end solution that **downloads a HuggingFace model in Python**, runs Aspose OCR, cleans the raw output, and finally **releases GPU memory**. By the end you will have a ready-to-run script that turns a scanned PNG into tidy, searchable text.
+
+> **What you get:** a complete, runnable code sample, explanations of why each step matters, tips for avoiding common mistakes, and a look at how to adapt the pipeline to your own projects.
+
+## Requirements
+
+- Python 3.9 or later (the example was tested on 3.11)
+- The `aspose-ocr` package (installed with `pip install aspose-ocr`)
+- An internet connection for the HuggingFace model download step
+- A CUDA-capable GPU if you want the speed-up (optional but recommended)
+
+No additional system-level dependencies are required; the Aspose OCR engine bundles everything it needs.
+
+![perform OCR on image example](image.png "Example of performing OCR on image with Aspose OCR and an LLM post‑processor")
+
+*Image alt text: "perform OCR on image: Aspose OCR output before and after AI clean-up"*
+
+## Perform OCR on Image – Step-by-Step Overview
+
+Below we split the workflow into logical parts. Each part has its own heading, so you can jump straight to the step you need.
+
+### 1. Download a HuggingFace Model in Python
+
+The first thing we need is a language model that acts as a post-processor for the raw OCR output. Aspose OCR ships with a helper class called `AsposeAI` that can pull a model from the HuggingFace hub automatically.
+ +```python +import aspose.ocr as aocr +from aspose.ocr import AsposeAI, AsposeAIModelConfig, OcrEngine + +# Configure the model – it will auto‑download the first time you run it +model_config = AsposeAIModelConfig( + allow_auto_download="true", # <-- enables auto‑download + hugging_face_repo_id="Qwen/Qwen2.5-3B-Instruct-GGUF", + hugging_face_quantization="int8", # smaller footprint on GPU + gpu_layers=20, # how many layers stay on GPU + directory_model_path=r"YOUR_DIRECTORY" # where the model files live +) +``` + +**Neden Önemli:** +- **download HuggingFace model python** – zip dosyalarını veya token kimlik doğrulamasını manuel olarak yönetmek zorunda kalmazsınız. +- `int8` kuantizasyonu kullanmak, modeli orijinal boyutunun yaklaşık dörtte birine küçültür; bu, daha sonra **release GPU memory python** gerektiğinde kritik öneme sahiptir. + +> **İpucu:** `directory_model_path`'i daha hızlı yükleme süreleri için bir SSD'de tutun. + +### 2. AI Yardımcısını Başlat ve Yazım Denetimini Etkinleştir + +Şimdi bir `AsposeAI` örneği oluşturup bir yazım‑düzeltici post‑işlemci ekliyoruz. İşte **clean OCR text python** sihrinin başladığı yer. + +```python +# Initialise the AI helper +ai_helper = AsposeAI() +ai_helper.set_post_processor( + processor="spell_corrector", + custom_settings={"max_edits": 2} # allows up to two character edits per word +) +``` + +**Açıklama:** +Yazım‑düzeltici, OCR motorundan gelen her token'ı inceler ve `max_edits` ile sınırlı düzenlemeler önerir. Bu küçük ayar, ağır bir dil modeli olmadan “rec0gn1tion”ı “recognition”a dönüştürebilir. + +### 3. AI Yardımcısını OCR Motoruna Bağla + +Aspose, sürüm 23.4'te AI motorunu doğrudan OCR pipeline'ına bağlamanızı sağlayan yeni bir yöntem tanıttı. 
+ +```python +# Initialise the OCR engine and attach the AI helper +ocr_engine = OcrEngine() +ocr_engine.set_ai_engine(ai_helper) # new in v23.4 +``` + +**Neden Bunu Yapıyoruz:** +AI yardımcısını erken bağlayarak, OCR motoru modeli isteğe bağlı olarak anlık iyileştirmeler için (ör. düzen algılama) kullanabilir. Ayrıca kodu düzenli tutar; sonradan ayrı post‑işlem döngülerine ihtiyaç kalmaz. + +### 4. Taranmış Görüntüde OCR Yap + +İşte asıl adım: **görüntü üzerinde OCR gerçekleştirme**. `YOUR_DIRECTORY/input.png` ifadesini kendi taramanızın yolu ile değiştirin. + +```python +image_path = r"YOUR_DIRECTORY/input.png" +ocr_result = ocr_engine.recognize(image_path) + +print("Raw OCR text:") +print(ocr_result.text) +``` + +Tipik ham çıktı, garip yerlerde satır sonları, hatalı tanınan karakterler veya gereksiz semboller içerebilir. Bu yüzden bir sonraki adıma ihtiyacımız var. + +**Beklenen ham çıktı (örnek):** + +``` +Th1s 1s 4n ex4mpl3 0f r4w OCR t3xt. +It c0ntains numb3rs 123 and s0me m1stakes. +``` + +### 5. AI Post‑İşlemcisi ile Python’da OCR Metnini Temizle + +Şimdi AI'ye karışıklığı temizletiyoruz. Bu, **clean OCR text python** sürecinin kalbidir. + +```python +cleaned_result = ocr_engine.run_postprocessor(ocr_result) + +print("\nAI‑enhanced text:") +print(cleaned_result.text) +``` + +**Göreceğiniz sonuç:** + +``` +This is an example of raw OCR text. +It contains numbers 123 and some mistakes. +``` + +Yazım‑düzelticinin “Th1s” ifadesini “This”, “4n” ifadesini ise “an” olarak düzelttiğine dikkat edin. Model ayrıca boşlukları normalleştirir; düzensiz boşluklar, metni sonraki NLP pipeline'larına beslerken sıkça sorun yaratır. + +### 6. Python’da GPU Belleğini Serbest Bırak – Temizleme Adımları + +İşiniz bittiğinde, özellikle uzun süreli bir hizmette birden fazla OCR işi çalıştırıyorsanız GPU kaynaklarını serbest bırakmak iyi bir uygulamadır. 
+ +```python +# Release resources – crucial for GPU memory +ai_helper.free_resources() +ocr_engine.dispose() +``` + +**Arka planda ne olur:** +`free_resources()` modeli GPU'dan kaldırır, belleği CUDA sürücüsüne geri verir. `dispose()` OCR motorunun iç tamponlarını kapatır. Bu çağrıları atlamak, sadece birkaç görüntüden sonra bellek yetersizliği hatalarına yol açabilir. + +> **Unutmayın:** Bir döngüde toplu işlem yapmayı planlıyorsanız, her batch'ten sonra temizleme işlemini çağırın veya `ai_helper`'ı sonuna kadar serbest bırakmadan yeniden kullanın. + +## Bonus: Farklı Senaryolar İçin Pipeline’ı Ayarlama + +### Model Kuantizasyonunu Ayarlama + +Güçlü bir GPU'nuz (ör. RTX 4090) varsa ve daha yüksek doğruluk istiyorsanız, `hugging_face_quantization` değerini `"fp16"` olarak değiştirin ve `gpu_layers`'ı `30`'a yükseltin. Bu daha fazla bellek tüketir, bu yüzden her batch'ten sonra **release GPU memory python** daha agresif bir şekilde yapmanız gerekir. + +### Özel Bir Yazım‑Denetleyici Kullanma + +Yerleşik `spell_corrector`'ı, alan‑spesifik düzeltmeler yapan (ör. tıbbi terminoloji) özel bir post‑işlemci ile değiştirebilirsiniz. Gerekli arayüzü uygulayın ve adını `set_post_processor`'a iletin. + +### Birden Çok Görüntüyü Toplu İşleme + +OCR adımlarını bir `for` döngüsü içinde sarın, `cleaned_result.text`'i bir listeye toplayın ve yeterli GPU RAM'ınız varsa döngüden sonra sadece `ai_helper.free_resources()`'ı çağırın. Bu, modeli tekrar tekrar yüklemenin getirdiği yükü azaltır. + +## Sonuç + +Python’da **görüntü üzerinde OCR gerçekleştirme** dosyalarını nasıl **download a HuggingFace model** otomatik olarak **clean OCR text** yapıp, işiniz bittiğinde güvenli bir şekilde **release GPU memory** serbest bırakacağınızı gösterdik. Tam script kopyala‑yapıştır için hazır ve açıklamalar, onu daha büyük projelere uyarlama konusunda size güven verir. + +Sonraki adımlar? 
Qwen 2.5 modelini daha büyük bir LLaMA varyantıyla değiştirmeyi deneyin, farklı post‑işlemcilerle deney yapın veya temizlenmiş çıktıyı aranabilir bir Elasticsearch indeksine entegre edin. Olanaklar sonsuz ve artık üzerine inşa edebileceğiniz sağlam bir temele sahipsiniz. + +Kodlamaktan keyif alın, ve OCR pipeline'larınız her zaman temiz ve bellek dostu olsun! + +{{< /blocks/products/pf/tutorial-page-section >}} +{{< /blocks/products/pf/main-container >}} +{{< /blocks/products/pf/main-wrap-class >}} +{{< blocks/products/products-backtop-button >}} \ No newline at end of file diff --git a/ocr/vietnamese/python/general/extract-text-from-pdf-ocr-pdf-with-python/_index.md b/ocr/vietnamese/python/general/extract-text-from-pdf-ocr-pdf-with-python/_index.md new file mode 100644 index 000000000..fcd3cd662 --- /dev/null +++ b/ocr/vietnamese/python/general/extract-text-from-pdf-ocr-pdf-with-python/_index.md @@ -0,0 +1,219 @@ +--- +category: general +date: 2026-04-29 +description: Trích xuất văn bản từ PDF bằng Aspose OCR trong Python. Tìm hiểu xử lý + OCR PDF hàng loạt, chuyển đổi văn bản PDF đã quét và xử lý các trang có độ tin cậy + thấp. +draft: false +keywords: +- extract text from pdf +- ocr pdf with python +- convert scanned pdf text +- batch ocr pdf processing +language: vi +og_description: Trích xuất văn bản từ PDF bằng Aspose OCR trong Python. Hướng dẫn + này trình bày xử lý OCR PDF hàng loạt, chuyển đổi văn bản PDF đã quét và xử lý kết + quả có độ tin cậy thấp. +og_title: Trích xuất văn bản từ PDF – OCR PDF bằng Python +tags: +- OCR +- Python +- PDF processing +title: Trích xuất văn bản từ PDF – OCR PDF bằng Python +url: /vi/python/general/extract-text-from-pdf-ocr-pdf-with-python/ +--- + +{{< blocks/products/pf/main-wrap-class >}} +{{< blocks/products/pf/main-container >}} +{{< blocks/products/pf/tutorial-page-section >}} + +# Trích xuất văn bản từ PDF – OCR PDF với Python + +Bạn đã bao giờ cần **trích xuất văn bản từ PDF** nhưng tệp chỉ là hình ảnh đã được quét? 
Bạn không đơn độc—nhiều nhà phát triển gặp phải rào cản này khi muốn chuyển PDF thành dữ liệu có thể tìm kiếm. Tin tốt là gì? Với Aspose OCR cho Python, bạn có thể chuyển đổi văn bản PDF đã quét chỉ trong vài dòng code, và thậm chí thực hiện **xử lý OCR PDF hàng loạt** khi có hàng chục tệp cần xử lý. + +Trong hướng dẫn này, chúng ta sẽ đi qua toàn bộ quy trình: cài đặt thư viện, chạy OCR trên một PDF đơn, mở rộng thành batch, và xử lý các trang có độ tin cậy thấp để bạn biết khi nào cần kiểm tra thủ công. Khi kết thúc, bạn sẽ có một script sẵn sàng chạy để trích xuất văn bản từ bất kỳ PDF đã quét nào, và bạn sẽ hiểu lý do đằng sau mỗi bước. + +## Những gì bạn cần + +Trước khi bắt đầu, hãy chắc chắn rằng bạn có: + +- Python 3.8 hoặc mới hơn (code sử dụng f‑strings, vì vậy 3.6+ cũng hoạt động, nhưng khuyến nghị 3.8+) +- Giấy phép Aspose OCR cho Python hoặc khóa dùng thử miễn phí (bạn có thể lấy từ trang web Aspose) +- Một thư mục chứa một hoặc nhiều PDF đã quét mà bạn muốn xử lý +- Một lượng không gian đĩa vừa đủ cho các báo cáo *.txt* được tạo ra + +Đó là tất cả—không cần phụ thuộc bên ngoài nặng, không cần OpenCV. Engine OCR của Aspose sẽ thực hiện phần việc nặng cho bạn. + +## Cài đặt môi trường + +Đầu tiên, cài đặt gói Aspose OCR từ PyPI: + +```bash +pip install aspose-ocr +``` + +Nếu bạn có tệp giấy phép (`Aspose.OCR.lic`), đặt nó vào thư mục gốc dự án và kích hoạt như sau: + +```python +# Activate Aspose OCR license (optional but removes trial watermark) +from aspose.ocr import License + +license = License() +license.set_license("Aspose.OCR.lic") +``` + +> **Mẹo chuyên nghiệp:** Giữ tệp giấy phép ra khỏi hệ thống kiểm soát phiên bản; thêm nó vào `.gitignore` để tránh lộ ngoài ý muốn. + +## Thực hiện OCR trên một PDF đơn + +Bây giờ hãy trích xuất văn bản từ một PDF đã quét duy nhất. Các bước chính là: + +1. Tạo một thể hiện `OcrEngine`. +2. Chỉ định tệp PDF cho nó. +3. Lấy một `OcrResult` cho mỗi trang. +4. Ghi kết quả văn bản thuần vào đĩa. +5. 
Giải phóng engine để giải phóng tài nguyên gốc. + +Dưới đây là script đầy đủ, có thể chạy ngay: + +```python +# Step 1: Import the OCR engine and create an instance +from aspose.ocr import OcrEngine + +# Optional: activate license (see previous section) +# from aspose.ocr import License +# License().set_license("Aspose.OCR.lic") + +ocr_engine = OcrEngine() + +# Step 2: Specify the PDF file to be processed +pdf_file_path = r"YOUR_DIRECTORY/input.pdf" + +# Step 3: Extract OCR results – one OcrResult object per page +ocr_pages = ocr_engine.extract_from_pdf(pdf_file_path) + +# Step 4: Iterate through the results, show confidence, flag low‑confidence pages, and save the plain text +for ocr_page in ocr_pages: + print(f"Page {ocr_page.page_number}: confidence {ocr_page.confidence:.2%}") + + # If confidence is below 80 %, flag it for manual review + if ocr_page.confidence < 0.80: + print(" Low confidence – consider manual review") + + # Build the output filename + output_txt = f"YOUR_DIRECTORY/Report_page{ocr_page.page_number}.txt" + with open(output_txt, "w", encoding="utf-8") as f: + f.write(ocr_page.text) + +# Step 5: Release resources used by the engine +ocr_engine.dispose() +``` + +**Bạn sẽ thấy gì:** Đối với mỗi trang, script sẽ in ra một dòng như `Page 1: confidence 97.45%`. Nếu một trang có độ tin cậy dưới ngưỡng 80 %, một cảnh báo sẽ xuất hiện, cho bạn biết OCR có thể đã bỏ sót ký tự. + +### Tại sao cách này hoạt động + +- **`OcrEngine`** là cổng vào thư viện OCR gốc của Aspose; nó xử lý mọi thứ từ tiền xử lý hình ảnh đến nhận dạng ký tự. +- **`extract_from_pdf`** tự động raster hoá mỗi trang PDF, vì vậy bạn không cần tự chuyển PDF sang hình ảnh. +- **Điểm tin cậy** cho phép bạn tự động kiểm tra chất lượng—rất quan trọng khi xử lý tài liệu pháp lý hoặc y tế, nơi độ chính xác là yếu tố then chốt. + +## Xử lý OCR PDF hàng loạt với Python + +Hầu hết các dự án thực tế liên quan đến hơn một tệp. 
Hãy mở rộng script đơn thành một **pipeline xử lý OCR PDF hàng loạt** đi qua một thư mục, xử lý mỗi PDF và lưu kết quả vào một thư mục con tương ứng. + +```python +import os +from pathlib import Path +from aspose.ocr import OcrEngine + +def ocr_pdf_file(pdf_path: Path, output_dir: Path, engine: OcrEngine, confidence_threshold: float = 0.80): + """Extract text from a single PDF and write per‑page .txt files.""" + ocr_pages = engine.extract_from_pdf(str(pdf_path)) + for page in ocr_pages: + print(f"{pdf_path.name} – Page {page.page_number}: {page.confidence:.2%}") + if page.confidence < confidence_threshold: + print(" ⚠️ Low confidence – manual review may be needed") + + out_file = output_dir / f"{pdf_path.stem}_page{page.page_number}.txt" + out_file.write_text(page.text, encoding="utf-8") + +def batch_ocr_pdf(input_folder: str, output_folder: str): + """Iterate over all PDFs in input_folder and run OCR.""" + engine = OcrEngine() + input_path = Path(input_folder) + output_path = Path(output_folder) + output_path.mkdir(parents=True, exist_ok=True) + + pdf_files = list(input_path.glob("*.pdf")) + if not pdf_files: + print("🚫 No PDF files found in the specified folder.") + return + + for pdf_file in pdf_files: + # Create a sub‑folder for each PDF to keep pages organized + pdf_out_dir = output_path / pdf_file.stem + pdf_out_dir.mkdir(exist_ok=True) + ocr_pdf_file(pdf_file, pdf_out_dir, engine) + + # Clean up native resources + engine.dispose() + print("✅ Batch OCR completed.") + +# Example usage: +if __name__ == "__main__": + batch_ocr_pdf("YOUR_DIRECTORY/input_pdfs", "YOUR_DIRECTORY/ocr_output") +``` + +#### Lợi ích của cách này + +- **Khả năng mở rộng:** Hàm duyệt thư mục một lần, tạo thư mục đầu ra riêng cho mỗi PDF. Giúp giữ mọi thứ gọn gàng khi bạn có hàng chục tài liệu. +- **Tái sử dụng:** `ocr_pdf_file` có thể được gọi từ các script khác (ví dụ, một dịch vụ web) vì nó là một hàm thuần. 
+- **Xử lý lỗi:** Script in ra thông báo thân thiện nếu thư mục đầu vào rỗng, tránh việc thất bại im lặng. + +## Chuyển đổi văn bản PDF đã quét – Xử lý các trường hợp đặc biệt + +Mặc dù code trên hoạt động với hầu hết các PDF, bạn có thể gặp một số tình huống đặc biệt: + +| Tình huống | Nguyên nhân | Cách khắc phục | +|-----------|-------------|----------------| +| **PDF được mã hoá** | PDF được bảo vệ bằng mật khẩu. | Truyền mật khẩu vào `extract_from_pdf(pdf_path, password="yourPwd")`. | +| **Tài liệu đa ngôn ngữ** | Aspose OCR mặc định là tiếng Anh. | Đặt `ocr_engine.language = "spa"` cho tiếng Tây Ban Nha, hoặc cung cấp danh sách cho các ngôn ngữ hỗn hợp. | +| **PDF rất lớn (>500 trang)** | Tiêu thụ bộ nhớ tăng vì mỗi trang được tải vào RAM. | Xử lý PDF theo từng khối bằng `engine.extract_from_pdf(pdf_path, start_page=1, end_page=100)` và lặp lại. | +| **Chất lượng quét kém** | DPI thấp hoặc nhiễu mạnh làm giảm độ tin cậy. | Tiền xử lý PDF với `engine.image_preprocessing = True` hoặc tăng DPI bằng `engine.dpi = 300`. | + +> **Lưu ý:** Bật tiền xử lý hình ảnh có thể làm tăng thời gian CPU đáng kể. Nếu bạn chạy batch vào ban đêm, hãy lên lịch đủ thời gian hoặc khởi động một worker riêng. + +## Kiểm tra kết quả + +Sau khi script hoàn thành, bạn sẽ thấy cấu trúc thư mục như sau: + +``` +ocr_output/ +├─ invoice_2023/ +│ ├─ invoice_2023_page1.txt +│ ├─ invoice_2023_page2.txt +│ └─ … +└─ contract_A/ + ├─ contract_A_page1.txt + └─ … +``` + +Mở bất kỳ tệp `.txt` nào; bạn sẽ thấy văn bản sạch, mã hoá UTF‑8, phản ánh nội dung đã quét gốc. Nếu gặp ký tự lộn xộn, hãy kiểm tra lại cài đặt ngôn ngữ của PDF và đảm bảo các gói phông chữ phù hợp đã được cài trên máy. + +## Dọn dẹp tài nguyên + +Aspose OCR dựa vào các DLL gốc, vì vậy cần gọi `engine.dispose()` một khi công việc xong. Bỏ qua bước này có thể gây rò rỉ bộ nhớ, đặc biệt trong các job batch chạy lâu. 
+ +```python +# Always the last line of your script +engine.dispose() +``` + +## Ví dụ hoàn chỉnh từ đầu đến cuối + +Kết hợp mọi thứ lại, dưới đây là một script duy nhất + +{{< /blocks/products/pf/tutorial-page-section >}} +{{< /blocks/products/pf/main-container >}} +{{< /blocks/products/pf/main-wrap-class >}} +{{< blocks/products/products-backtop-button >}} \ No newline at end of file diff --git a/ocr/vietnamese/python/general/how-to-recognize-handwriting-in-python-full-tutorial/_index.md b/ocr/vietnamese/python/general/how-to-recognize-handwriting-in-python-full-tutorial/_index.md new file mode 100644 index 000000000..e8b96ab49 --- /dev/null +++ b/ocr/vietnamese/python/general/how-to-recognize-handwriting-in-python-full-tutorial/_index.md @@ -0,0 +1,278 @@ +--- +category: general +date: 2026-04-29 +description: Tìm hiểu cách nhận dạng chữ viết tay trong Python với Aspose OCR. Hướng + dẫn từng bước này cho thấy cách trích xuất văn bản viết tay một cách hiệu quả. +draft: false +keywords: +- how to recognize handwriting +- extract handwritten text +- handwritten text recognition python +- handwritten ocr tutorial python +language: vi +og_description: Làm thế nào để nhận dạng chữ viết tay trong Python? Theo dõi hướng + dẫn đầy đủ này để trích xuất văn bản viết tay bằng Aspose OCR, kèm mã nguồn, mẹo + và xử lý các trường hợp đặc biệt. +og_title: Cách Nhận Dạng Chữ Viết Tay trong Python – Hướng Dẫn Đầy Đủ +tags: +- OCR +- Python +- HandwritingRecognition +title: Cách Nhận Dạng Chữ Viết Tay trong Python – Hướng Dẫn Toàn Diện +url: /vi/python/general/how-to-recognize-handwriting-in-python-full-tutorial/ +--- + +{{< blocks/products/pf/main-wrap-class >}} +{{< blocks/products/pf/main-container >}} +{{< blocks/products/pf/tutorial-page-section >}} + +# Cách Nhận Diện Văn Bản Viết Tay trong Python – Hướng Dẫn Đầy Đủ + +Bạn đã bao giờ cần **cách nhận diện văn bản viết tay** trong một dự án Python nhưng không biết bắt đầu từ đâu? 
Bạn không đơn độc—các nhà phát triển thường hỏi: “Liệu tôi có thể trích xuất văn bản từ một ghi chú đã quét không?” Tin tốt là các thư viện OCR hiện đại đã biến việc này thành chuyện đơn giản. Trong hướng dẫn này, chúng ta sẽ đi qua **cách nhận diện văn bản viết tay** bằng Aspose OCR, và bạn cũng sẽ học cách **trích xuất văn bản viết tay** một cách đáng tin cậy. + +Chúng ta sẽ bao phủ mọi thứ từ cài đặt thư viện đến việc điều chỉnh ngưỡng độ tin cậy cho những nét chữ xoáy lộn. Khi hoàn thành, bạn sẽ có một script có thể chạy được, in ra văn bản đã trích xuất và điểm độ tin cậy tổng thể—hoàn hảo cho các ứng dụng ghi chú, công cụ lưu trữ, hoặc chỉ đơn giản là thỏa mãn sự tò mò. Không cần kinh nghiệm OCR trước; chỉ cần kiến thức cơ bản về Python là đủ. + +--- + +## Những Gì Bạn Cần Chuẩn Bị + +- **Python 3.9+** (phiên bản ổn định mới nhất hoạt động tốt nhất) +- **Aspose.OCR for Python via .NET** – cài đặt bằng `pip install aspose-ocr` +- Một **hình ảnh viết tay** (JPEG/PNG) mà bạn muốn xử lý +- Tùy chọn: môi trường ảo để giữ các phụ thuộc gọn gàng + +Nếu bạn đã có những mục trên, hãy bắt đầu. + +![Ví dụ cách nhận diện văn bản viết tay](/images/handwritten-sample.jpg "Ví dụ cách nhận diện văn bản viết tay") + +*(Văn bản thay thế: “ví dụ cách nhận diện văn bản viết tay hiển thị một ghi chú viết tay đã quét”)* + +--- + +## Bước 1 – Cài Đặt và Nhập Các Lớp Aspose OCR + +Đầu tiên, chúng ta cần chính engine OCR. Aspose cung cấp một API sạch sẽ, tách biệt việc nhận diện văn bản in ra khỏi chế độ viết tay. + +```python +# Install the package (run this in your terminal if you haven't already) +# pip install aspose-ocr + +# Import the required OCR classes +from aspose.ocr import OcrEngine, HandwritingMode +``` + +*Lý do quan trọng:* Nhập `HandwritingMode` cho phép chúng ta thông báo cho engine rằng chúng ta đang thực hiện **handwritten text recognition python** thay vì văn bản in, điều này cải thiện đáng kể độ chính xác cho các nét chữ xoáy. 
+ +--- + +## Bước 2 – Tạo và Cấu Hình Engine OCR + +Bây giờ chúng ta khởi tạo một thể hiện `OcrEngine` và chuyển nó sang chế độ viết tay. Bạn cũng có thể điều chỉnh ngưỡng độ tin cậy; giá trị thấp hơn chấp nhận chữ viết lắc lư, giá trị cao hơn yêu cầu đầu vào sạch hơn. + +```python +# Step 2: Initialize the OCR engine for handwritten text +ocr_engine = OcrEngine() +ocr_engine.set_recognition_mode(HandwritingMode.HANDWRITTEN) + +# Optional: tighten the confidence threshold for cursive scripts +# Default is 0.5; 0.65 works well for most notebooks +ocr_engine.set_handwriting_confidence_threshold(0.65) +``` + +*Mẹo chuyên nghiệp:* Nếu ghi chú của bạn được quét ở 300 DPI hoặc cao hơn, thường sẽ nhận được điểm cao hơn. Đối với ảnh độ phân giải thấp, hãy cân nhắc tăng kích thước bằng Pillow trước khi đưa vào engine. + +--- + +## Bước 3 – Chuẩn Bị Đường Dẫn Ảnh + +Đảm bảo đường dẫn file trỏ tới hình ảnh bạn muốn xử lý. Đường dẫn tương đối hoạt động tốt, nhưng đường dẫn tuyệt đối tránh được các lỗi “file not found”. + +```python +# Step 3: Point to your handwritten image +input_image_path = "YOUR_DIRECTORY/handwritten_sample.jpg" +``` + +*Nhầm lẫn thường gặp:* Quên escape dấu gạch chéo ngược trên Windows (`C:\\folder\\image.jpg`). Sử dụng raw string (`r"C:\folder\image.jpg"`) sẽ giải quyết vấn đề này. + +--- + +## Bước 4 – Chạy Nhận Diện và Ghi Nhận Kết Quả + +Phương thức `recognize` thực hiện công việc nặng. Nó trả về một đối tượng có các thuộc tính `.text` và `.confidence`. 
+ +```python +# Step 4: Run OCR and get the result +result = ocr_engine.recognize(input_image_path) + +# Display the extracted text and confidence +print("Hand‑written extraction:") +print(result.text) +print("Overall confidence:", result.confidence) +``` + +**Kết quả mong đợi (ví dụ):** + +``` +Hand‑written extraction: +Meeting notes: +- Discuss quarterly goals +- Review budget allocations +- Assign action items +Overall confidence: 0.78 +``` + +Nếu độ tin cậy giảm xuống dưới 0.5, bạn có thể cần làm sạch ảnh (loại bỏ bóng, tăng độ tương phản) hoặc hạ ngưỡng trong Bước 2. + +--- + +## Bước 5 – Dọn Dẹp Tài Nguyên + +Aspose OCR giữ các tài nguyên gốc; gọi `dispose()` giải phóng chúng và ngăn ngừa rò rỉ bộ nhớ, đặc biệt khi xử lý nhiều ảnh trong một vòng lặp. + +```python +# Step 5: Release the engine resources +ocr_engine.dispose() +``` + +*Tại sao phải dispose?* Trong các dịch vụ chạy lâu (ví dụ, một API Flask nhận tải lên), quên giải phóng tài nguyên có thể nhanh chóng làm cạn kiệt bộ nhớ hệ thống. + +--- + +## Script Đầy Đủ – Chạy Một Lần + +Kết hợp mọi thứ lại, đây là một script tự chứa mà bạn có thể sao chép‑dán và thực thi. + +```python +# -*- coding: utf-8 -*- +""" +Handwritten OCR Tutorial – how to recognize handwriting in Python +Author: Your Name +Date: 2026-04-29 +""" + +# Install the library first if you haven't already: +# pip install aspose-ocr + +from aspose.ocr import OcrEngine, HandwritingMode + +def recognize_handwriting(image_path: str, confidence_threshold: float = 0.65): + """ + Recognizes handwritten text from an image using Aspose OCR. + + Args: + image_path (str): Path to the handwritten image file. + confidence_threshold (float): Minimum confidence to accept a word. 
+ + Returns: + tuple: (extracted_text (str), overall_confidence (float)) + """ + # Initialize engine in handwritten mode + engine = OcrEngine() + engine.set_recognition_mode(HandwritingMode.HANDWRITTEN) + engine.set_handwriting_confidence_threshold(confidence_threshold) + + # Perform recognition + result = engine.recognize(image_path) + + # Capture output + extracted = result.text + confidence = result.confidence + + # Clean up + engine.dispose() + return extracted, confidence + +if __name__ == "__main__": + # Replace with your actual file location + img_path = "YOUR_DIRECTORY/handwritten_sample.jpg" + + text, conf = recognize_handwriting(img_path) + + print("Hand‑written extraction:") + print(text) + print("Overall confidence:", conf) +``` + +Lưu lại dưới tên `handwritten_ocr.py` và chạy `python handwritten_ocr.py`. Nếu mọi thứ đã được thiết lập đúng, bạn sẽ thấy văn bản đã trích xuất được in ra console. + +--- + +## Xử Lý Các Trường Hợp Cạnh và Các Biến Thể Thông Thường + +### Ảnh Độ Tương Phản Thấp +Nếu nền lẫn vào mực, hãy tăng độ tương phản trước: + +```python +from PIL import Image, ImageEnhance + +img = Image.open(input_image_path) +enhancer = ImageEnhance.Contrast(img) +high_contrast = enhancer.enhance(2.0) # 2× contrast +high_contrast.save("temp_high_contrast.jpg") +result = ocr_engine.recognize("temp_high_contrast.jpg") +``` + +### Ghi Chú Bị Xoay +Một trang sổ tay nghiêng có thể làm giảm độ nhận diện. 
Dùng OpenCV và NumPy để chỉnh góc: + +```python +import numpy as np +import cv2 + +def deskew(image_path): + # Load as grayscale, then binarize so that only ink pixels are non-zero + img = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE) + thresh = cv2.threshold(cv2.bitwise_not(img), 0, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)[1] + coords = np.column_stack(np.where(thresh > 0)).astype(np.float32) + # The angle convention of minAreaRect varies between OpenCV versions; verify on a sample image + angle = cv2.minAreaRect(coords)[-1] + if angle < -45: + angle = -(90 + angle) + else: + angle = -angle + (h, w) = img.shape[:2] + M = cv2.getRotationMatrix2D((w // 2, h // 2), angle, 1.0) + corrected = cv2.warpAffine(img, M, (w, h), flags=cv2.INTER_CUBIC, borderMode=cv2.BORDER_REPLICATE) + cv2.imwrite("deskewed.jpg", corrected) + +deskew(input_image_path) +result = ocr_engine.recognize("deskewed.jpg") +``` + +### PDF Đa Trang +Aspose OCR cũng có thể xử lý các trang PDF, nhưng bạn cần chuyển mỗi trang thành ảnh trước (ví dụ, dùng `pdf2image`). Sau đó lặp qua các ảnh bằng cùng một hàm `recognize_handwriting`. + +--- + +## Mẹo Chuyên Nghiệp Để Có Kết Quả **Extract Handwritten Text** Tốt Hơn + +- **DPI quan trọng:** Nhắm tới 300 DPI hoặc cao hơn khi quét. +- **Tránh nền màu:** Nền trắng thuần hoặc xám nhạt cho kết quả sạch nhất. +- **Xử lý hàng loạt:** Đặt hàm trong một vòng `for` và ghi lại độ tin cậy của mỗi trang; loại bỏ các kết quả dưới ngưỡng để duy trì chất lượng. +- **Hỗ trợ ngôn ngữ:** Aspose OCR hỗ trợ nhiều ngôn ngữ; đặt `engine.set_language("en")` để tối ưu cho tiếng Anh. + +--- + +## Câu Hỏi Thường Gặp + +**Có hoạt động trên Linux không?** +Có, Aspose OCR đi kèm các binary gốc cho Windows, macOS và Linux. Chỉ cần cài đặt gói pip và bạn đã sẵn sàng. + +**Nếu chữ viết tay của tôi cực kỳ xoáy thì sao?** +Hãy thử hạ ngưỡng độ tin cậy (`0.5` hoặc thậm chí `0.4`). Lưu ý rằng điều này có thể tạo ra nhiều nhiễu, vì vậy hãy xử lý hậu kỳ (ví dụ, kiểm tra chính tả) nếu cần. + +**Có thể dùng trong dịch vụ web không?** +Chắc chắn. Hàm `recognize_handwriting` không giữ trạng thái, rất phù hợp cho các endpoint Flask hoặc FastAPI. Chỉ cần nhớ gọi `dispose()` sau mỗi yêu cầu hoặc dùng context manager. 
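
The context-manager suggestion in the last answer can be sketched with the standard library's `contextlib`. This is a minimal sketch, not Aspose's API: `DummyEngine` and `managed_engine` are hypothetical names used only for illustration, and the sole assumption about a real engine is that it exposes a `dispose()` method as used throughout this tutorial.

```python
from contextlib import contextmanager


class DummyEngine:
    """Hypothetical stand-in for an OCR engine; not part of any real library."""

    def __init__(self):
        self.disposed = False

    def recognize(self, image_path):
        # A real engine would run OCR here; the stand-in just echoes the path.
        if self.disposed:
            raise RuntimeError("engine already disposed")
        return f"text from {image_path}"

    def dispose(self):
        self.disposed = True


@contextmanager
def managed_engine(factory):
    """Create an engine via `factory` and always call dispose() on exit."""
    engine = factory()
    try:
        yield engine
    finally:
        # Runs on normal exit and when the body raises an exception.
        engine.dispose()


with managed_engine(DummyEngine) as engine:
    text = engine.recognize("note.jpg")
```

In a web handler, passing the real engine class (or a small factory function) instead of `DummyEngine` would release the engine after each request, even if recognition fails partway through.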
+ +--- + +## Kết Luận + +Chúng ta đã đi qua **cách nhận diện văn bản viết tay** trong Python từ đầu đến cuối, cho bạn biết cách **trích xuất văn bản viết tay**, điều chỉnh cài đặt độ tin cậy, và xử lý các vấn đề thường gặp như độ tương phản thấp hoặc trang bị xoay. Script hoàn chỉnh ở trên đã sẵn sàng chạy, và hàm mô-đun giúp bạn dễ dàng tích hợp vào các dự án lớn hơn—dù bạn đang xây dựng ứng dụng ghi chú, số hoá tài liệu, hay chỉ đơn giản là thử nghiệm các kỹ thuật **handwritten ocr tutorial python**. + +Tiếp theo, bạn có thể khám phá **handwritten text recognition python** cho các ghi chú đa ngôn ngữ, hoặc kết hợp OCR với xử lý ngôn ngữ tự nhiên để tự động tóm tắt biên bản họp. Không gì là không thể—hãy thử và để mã của bạn thổi hồn vào những nét bút. + +Chúc lập trình vui vẻ, và đừng ngại để lại câu hỏi trong phần bình luận! + +{{< /blocks/products/pf/tutorial-page-section >}} +{{< /blocks/products/pf/main-container >}} +{{< /blocks/products/pf/main-wrap-class >}} +{{< blocks/products/products-backtop-button >}} \ No newline at end of file diff --git a/ocr/vietnamese/python/general/how-to-run-ocr-with-aspose-hugging-face-complete-guide/_index.md b/ocr/vietnamese/python/general/how-to-run-ocr-with-aspose-hugging-face-complete-guide/_index.md new file mode 100644 index 000000000..2d133d85c --- /dev/null +++ b/ocr/vietnamese/python/general/how-to-run-ocr-with-aspose-hugging-face-complete-guide/_index.md @@ -0,0 +1,180 @@ +--- +category: general +date: 2026-04-29 +description: Tìm hiểu cách chạy OCR trên các bản quét của bạn, sử dụng mô hình Hugging + Face tự động và nhận dạng văn bản từ các bản quét bằng Aspose OCR trong vài phút. +draft: false +keywords: +- how to run OCR +- use hugging face model +- recognize text from scans +- download model automatically +language: vi +og_description: Cách chạy OCR trên các bản quét bằng Aspose OCR, tự động tải xuống + mô hình Hugging Face và nhận văn bản sạch, có dấu câu. 
+og_title: Cách chạy OCR với Aspose & Hugging Face – Hướng dẫn đầy đủ +tags: +- OCR +- Aspose +- Hugging Face +- Python +title: Cách chạy OCR với Aspose & Hugging Face – Hướng dẫn đầy đủ +url: /vi/python/general/how-to-run-ocr-with-aspose-hugging-face-complete-guide/ +--- + +{{< blocks/products/pf/main-wrap-class >}} +{{< blocks/products/pf/main-container >}} +{{< blocks/products/pf/tutorial-page-section >}} + +# Hướng Dẫn Chạy OCR với Aspose & Hugging Face – Toàn Bộ Quy Trình + +Bạn đã bao giờ tự hỏi **cách chạy OCR** trên một đống tài liệu đã quét mà không phải tốn hàng giờ chỉnh sửa cài đặt chưa? Bạn không phải là người duy nhất. Trong nhiều dự án, các nhà phát triển cần **nhận dạng văn bản từ ảnh quét** nhanh chóng, nhưng lại gặp khó khăn với việc tải mô hình và xử lý hậu kỳ. + +Tin tốt: hướng dẫn này sẽ cho bạn một giải pháp đã sẵn sàng chạy, **sử dụng mô hình Hugging Face**, tự động tải về và thêm dấu câu để kết quả trông như được con người viết. Khi kết thúc, bạn sẽ có một script xử lý mọi hình ảnh trong một thư mục và tạo file `.txt` sạch sẽ bên cạnh mỗi ảnh quét. + +## Những Gì Bạn Cần Chuẩn Bị + +- Python 3.8+ (code dùng f‑strings, các phiên bản cũ hơn sẽ không hoạt động) +- Gói `aspose-ocr` (cài đặt bằng `pip install aspose-ocr`) +- Kết nối Internet để tải mô hình lần đầu tiên +- Một thư mục chứa các ảnh quét (`.png`, `.jpg`, hoặc `.tif`) + +Đó là tất cả—không cần binary bổ sung, không cần can thiệp mô hình thủ công. Hãy bắt đầu. + +![how to run OCR example](https://example.com/ocr-demo.png "hướng dẫn chạy OCR") + +## Bước 1: Nhập Các Lớp Aspose OCR & Thiết Lập Môi Trường + +Chúng ta bắt đầu bằng cách kéo các lớp cần thiết từ thư viện Aspose OCR. Nhập toàn bộ ở đầu giúp script gọn gàng và dễ phát hiện các phụ thuộc còn thiếu. 
+ +```python +# Step 1: Import Aspose OCR classes +import os +from aspose.ocr import OcrEngine, AsposeAI, AsposeAIModelConfig +``` + +*Lý do quan trọng*: `OcrEngine` thực hiện công việc nặng, trong khi `AsposeAI` cho phép chúng ta gắn một mô hình ngôn ngữ lớn để xử lý hậu kỳ thông minh hơn. Nếu bỏ qua việc nhập, phần còn lại của code sẽ không biên dịch—vì vậy đừng quên. + +## Bước 2: Cấu Hình Mô Hình Hugging Face Có Hỗ Trợ GPU + +Bây giờ chúng ta chỉ định cho Aspose nơi tải mô hình và số lớp sẽ chạy trên GPU. Cờ `allow_auto_download="true"` thực hiện việc **tự động tải mô hình** cho bạn. + +```python +# Step 2: Configure a GPU‑aware AI model (replace with your own model folder) +model_config = AsposeAIModelConfig( + allow_auto_download="true", + hugging_face_repo_id="Qwen/Qwen2.5-3B-Instruct-GGUF", + hugging_face_quantization="int8", + gpu_layers=40, # use GPU for faster inference + directory_model_path=r"YOUR_DIRECTORY/models" +) +``` + +> **Mẹo chuyên nghiệp**: Nếu bạn không có GPU, đặt `gpu_layers=0`. Mô hình sẽ chuyển sang CPU, chậm hơn nhưng vẫn hoạt động. + +### Tại Sao Chọn Mô Hình Hugging Face? + +Hugging Face lưu trữ một bộ sưu tập khổng lồ các LLM đã sẵn sàng dùng. Khi trỏ tới `Qwen/Qwen2.5-3B-Instruct-GGUF`, bạn nhận được một mô hình gọn nhẹ, được tinh chỉnh để thực hiện lệnh, có khả năng thêm dấu câu, chỉnh sửa khoảng cách và thậm chí sửa các lỗi OCR nhỏ. Đây chính là bản chất của **use hugging face model** trong thực tế. + +## Bước 3: Khởi Tạo AI Engine và Bật Xử Lý Hậu Kỳ Dấu Câu + +AI engine không chỉ để chat—ở đây chúng ta gắn một *punctuation adder* để làm sạch đầu ra OCR thô. + +```python +# Step 3: Initialise the AI engine and enable punctuation post‑processing +ai_engine = AsposeAI() +ai_engine.set_post_processor("punctuation_adder", {}) +``` + +*Điều gì đang diễn ra?* Lệnh `set_post_processor` đăng ký một bộ xử lý hậu kỳ tích hợp, chạy sau khi engine OCR hoàn thành. 
Nó nhận chuỗi thô và chèn dấu phẩy, dấu chấm và chữ hoa ở những vị trí thích hợp, làm cho văn bản cuối cùng dễ đọc hơn rất nhiều. + +## Bước 4: Tạo OCR Engine và Gắn AI Engine + +Kết nối AI engine với OCR engine cho chúng ta một đối tượng duy nhất có thể đọc ký tự và tinh chỉnh kết quả. + +```python +# Step 4: Create the OCR engine and attach the AI engine +ocr_engine = OcrEngine() +ocr_engine.set_ai_engine(ai_engine) +``` + +Nếu bỏ qua bước này, OCR vẫn hoạt động, nhưng bạn sẽ mất phần tăng cường dấu câu—kết quả sẽ giống như một dải các từ liền nhau. + +## Bước 5: Xử Lý Mọi Ảnh Trong Thư Mục + +Đây là phần cốt lõi của hướng dẫn. Chúng ta lặp qua từng ảnh, chạy OCR, áp dụng bộ xử lý hậu kỳ, và ghi văn bản đã làm sạch vào file `.txt` bên cạnh. + +```python +# Step 5: Run OCR on each image in a folder, post‑process the result, and save the text +scans_folder = r"YOUR_DIRECTORY/scans" +for image_file in os.listdir(scans_folder): + # Filter only supported image types + if not image_file.lower().endswith(('.png', '.jpg', '.tif')): + continue + + image_path = os.path.join(scans_folder, image_file) + + # Recognise text from the image + ocr_result = ocr_engine.recognize(image_path) + + # Apply the punctuation post‑processor + ocr_result = ocr_engine.run_postprocessor(ocr_result) + + # Show a brief confidence summary + print(f"{image_file} – confidence {ocr_result.confidence:.2%}") + + # Save the cleaned text next to the source image + txt_path = image_path + ".txt" + with open(txt_path, "w", encoding="utf-8") as txt_file: + txt_file.write(ocr_result.text) +``` + +### Những Gì Bạn Có Thể Mong Đợi + +Chạy script sẽ in ra thứ gì đó như: + +``` +invoice_001.png – confidence 96.73% +receipt_2024.tif – confidence 94.12% +``` + +Mỗi dòng cho bạn biết điểm confidence (kiểm tra nhanh sức khỏe) và tạo các file `invoice_001.png.txt`, `receipt_2024.tif.txt`, v.v., chứa văn bản có dấu câu, dễ đọc cho con người. 
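
The file filter in the loop above can also be written with `pathlib`. The sketch below is an illustration under assumptions: `find_scans` and `text_path_for` are hypothetical helper names, and the extra `.jpeg` and `.tiff` suffixes are an addition that the original script does not include. It only selects files and builds output paths; it does not call any OCR API.

```python
from pathlib import Path

# Assumption: suffixes to treat as scans; extend or trim this set as needed.
SUPPORTED_SUFFIXES = {".png", ".jpg", ".jpeg", ".tif", ".tiff"}


def find_scans(folder):
    """Return the supported image files in `folder`, sorted by path."""
    return sorted(
        p for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() in SUPPORTED_SUFFIXES
    )


def text_path_for(image_path):
    """Mirror the tutorial's naming scheme: scan.png -> scan.png.txt."""
    image_path = Path(image_path)
    return image_path.with_name(image_path.name + ".txt")
```

Comparing the lowercased suffix against a set keeps files such as `SCAN.PNG` in the batch and skips sub-folders and unrelated files like previously written `.txt` outputs.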
+
+### Edge Cases & Variations
+
+- **Non-English images**: Switch `hugging_face_repo_id` to a multilingual model (for example `microsoft/Multilingual-LLM-GGUF`).
+- **Large batches**: Wrap the loop in `concurrent.futures.ThreadPoolExecutor` for parallel processing, but keep an eye on GPU memory limits.
+- **Custom post-processing**: Replace `"punctuation_adder"` with your own script if you need domain-specific clean-up (for example, stripping invoice numbers).
+
+## Step 6: Clean Up Resources
+
+When the job is done, releasing resources prevents memory leaks, which matters most when the script runs inside a long-lived service.
+
+```python
+# Step 6: Release resources
+ai_engine.free_resources()
+ocr_engine.dispose()
+```
+
+Skipping this step can leave GPU memory allocated and affect subsequent runs.
+
+## Recap: How to Run OCR End to End
+
+In just a few lines we have shown **how to run OCR** on a folder of scanned images, **use a Hugging Face model** that downloads itself on first use, and **recognise text from scanned images** with punctuation added automatically. The complete script is ready to copy: adjust the paths and run it.
+
+## Next Steps & Related Topics
+
+- **Batch post-processing**: Explore `ocr_engine.run_batch_postprocessor` for faster bulk handling.
+- **Alternative models**: Try the `openai/whisper` family if you need speech-to-text alongside OCR.
+- **Database integration**: Store the extracted text in SQLite or Elasticsearch to build a searchable archive.
+
+Feel free to experiment: swap the model, adjust `gpu_layers`, or add your own post-processor. The flexibility of Aspose OCR combined with the Hugging Face model hub makes a versatile foundation for any document digitisation project.
+
+---
+
+*Happy coding! If you get stuck, leave a comment below or consult the Aspose OCR documentation for deeper configuration options.*
+
+{{< /blocks/products/pf/tutorial-page-section >}}
+{{< /blocks/products/pf/main-container >}}
+{{< /blocks/products/pf/main-wrap-class >}}
+{{< blocks/products/products-backtop-button >}}
\ No newline at end of file
diff --git a/ocr/vietnamese/python/general/perform-ocr-on-image-with-python-complete-guide/_index.md b/ocr/vietnamese/python/general/perform-ocr-on-image-with-python-complete-guide/_index.md
new file mode 100644
index 000000000..32c63dc4b
--- /dev/null
+++ b/ocr/vietnamese/python/general/perform-ocr-on-image-with-python-complete-guide/_index.md
@@ -0,0 +1,208 @@
+---
+category: general
+date: 2026-04-29
+description: Perform OCR on an image with Python, automatically download a HuggingFace
+  model, and release GPU memory efficiently while cleaning the OCR text.
+draft: false
+keywords:
+- perform OCR on image
+- download HuggingFace model python
+- release GPU memory python
+- clean OCR text python
+language: vi
+og_description: Learn how to perform OCR on an image in Python, automatically download
+  a HuggingFace model, clean the text, and release GPU memory.
+og_title: Perform OCR on an Image with Python – Step-by-Step Guide
+tags:
+- OCR
+- Python
+- Aspose
+- HuggingFace
+title: Perform OCR on an Image with Python – Complete Guide
+url: /vi/python/general/perform-ocr-on-image-with-python-complete-guide/
+---
+
+{{< blocks/products/pf/main-wrap-class >}}
+{{< blocks/products/pf/main-container >}}
+{{< blocks/products/pf/tutorial-page-section >}}
+
+# Perform OCR on an Image with Python – Complete Guide
+
+Have you ever needed to **perform OCR on an image** but got stuck on downloading the model or cleaning up GPU memory? You are not alone: many developers hit this wall the first time they combine optical character recognition with large language models.
+
+In this guide we walk through an end-to-end solution that **downloads a HuggingFace model in Python**, runs Aspose OCR, cleans the raw output, and finally **releases GPU memory in Python**. By the end you will have a ready-to-run script that turns a scanned PNG file into clean, searchable text.
+
+> **What you'll get:** a complete, runnable code sample, an explanation of why each step matters, tips for avoiding common pitfalls, and an overview of how to customise the pipeline for your own project.
+
+---
+
+## What You Need
+
+- Python 3.9 or newer (the example was tested on 3.11)
+- The `aspose-ocr` package (install it with `pip install aspose-ocr`)
+- An internet connection to **download the HuggingFace model in Python**
+- A CUDA-capable GPU if you want acceleration (optional but recommended)
+
+No additional system dependencies are required; the Aspose OCR engine bundles everything you need.
+
+---
+
+![perform OCR on image example](image.png "Example of performing OCR on an image with Aspose OCR and an LLM post-processor")
+
+*Image alt text: “perform OCR on image – Aspose OCR result before and after AI clean-up”*
+
+---
+
+## Perform OCR on an Image – Step Overview
+
+Below, the process is broken into logical parts. Each part has its own heading, so AI assistants can jump straight to the section you care about and search engines can index the relevant keywords.
+
+### 1. Download HuggingFace Model in Python
+
+The first thing we need is a language model that acts as the post-processor for the raw OCR output. Aspose OCR provides a helper class called `AsposeAI` that can pull the model from the HuggingFace hub automatically.
+
+```python
+from aspose.ocr import AsposeAI, AsposeAIModelConfig, OcrEngine
+
+# Configure the model; it auto-downloads the first time you run it
+model_config = AsposeAIModelConfig(
+    allow_auto_download="true",                 # <-- enables auto-download
+    hugging_face_repo_id="Qwen/Qwen2.5-3B-Instruct-GGUF",
+    hugging_face_quantization="int8",           # smaller footprint on GPU
+    gpu_layers=20,                              # how many layers stay on GPU
+    directory_model_path=r"YOUR_DIRECTORY"      # where the model files live
+)
+```
+
+**Why this matters:**
+- **download HuggingFace model python**: you avoid handling zip files or token authentication yourself.
+- `int8` quantisation shrinks the model to roughly a quarter of its full-precision (32-bit) size, which matters later when you need to **release GPU memory in Python**.
+
+> **Pro tip:** Put `directory_model_path` on an SSD for faster load times.
+
+---
+
+### 2. Initialise the AI Helper and Enable Spell‑Checking
+
+Now we create an `AsposeAI` instance and attach a spell-checking post-processor. This is where the **clean OCR text python** work begins.
+
+```python
+# Initialise the AI helper
+ai_helper = AsposeAI()
+ai_helper.set_post_processor(
+    processor="spell_corrector",
+    custom_settings={"max_edits": 2}    # allows up to two character edits per word
+)
+```
+
+**Explanation:**
+The spell checker examines each token from the OCR engine and proposes corrections bounded by `max_edits`. With a limit of two edits it can turn “rec0gn1tion” into “recognition” (two substituted characters) without needing a heavyweight language model.
+
+---
+
+### 3. Hook the AI Helper into the OCR Engine
+
+Aspose introduced a new method in version 23.4 that lets you attach an AI engine directly to the OCR pipeline.
+
+```python
+# Initialise the OCR engine and attach the AI helper
+ocr_engine = OcrEngine()
+ocr_engine.set_ai_engine(ai_helper)   # new in v23.4
+```
+
+**Why we do it this way:**
+By connecting the AI helper early, the OCR engine can optionally use the model to improve results during processing (for example, layout detection). It also keeps the code tidy, since no separate post-processing loop is needed later.
+
+---
+
+### 4. Perform OCR on the Scanned Image
+
+This is the core step that actually **performs OCR on the image**. Replace `YOUR_DIRECTORY/input.png` with the path to your scan.
+
+```python
+image_path = r"YOUR_DIRECTORY/input.png"
+ocr_result = ocr_engine.recognize(image_path)
+
+print("Raw OCR text:")
+print(ocr_result.text)
+```
+
+Raw output often contains oddly placed line breaks, misrecognised characters, or stray symbols, which is why the next step is needed.
+
+**Expected raw output (example):**
+
+```
+Th1s 1s 4n ex4mpl3 0f r4w OCR t3xt.
+It c0ntains numb3rs 123 and s0me m1stakes.
+```
+
+---
+
+### 5. Clean OCR Text in Python with the AI Post‑Processor
+
+Now we let the AI tidy up the mess. This is the heart of the **clean OCR text python** workflow.
+
+```python
+cleaned_result = ocr_engine.run_postprocessor(ocr_result)
+
+print("\nAI-enhanced text:")
+print(cleaned_result.text)
+```
+
+**The result you will see:**
+
+```
+This is an example of raw OCR text.
+It contains numbers 123 and some mistakes.
+```
+
+Notice how the spell checker corrected “Th1s” → “This” and “4n” → “an” while leaving the genuine number 123 untouched. The model also normalises whitespace, a common nuisance when you feed the text into downstream NLP pipelines.
+
+---
+
+### 6. Release GPU Memory in Python – Clean‑up Steps
+
+When the work is done, release GPU resources, especially if you run many OCR jobs inside a long-lived service.
+
+```python
+# Release resources; crucial for GPU memory
+ai_helper.free_resources()
+ocr_engine.dispose()
+```
+
+**What happens behind the scenes:**
+`free_resources()` unloads the model from the GPU and returns the memory to the CUDA driver. `dispose()` releases the OCR engine's internal buffers. Skipping these calls can lead to out-of-memory errors after only a few images.
+
+> **Remember:** If you plan to process batches in a loop, either call the clean-up after each batch or reuse the same `ai_helper` and release it only at the very end.
+
+---
+
+## Bonus: Tuning the Pipeline for Different Scenarios
+
+### Adjusting Model Quantisation
+
+If you have a powerful GPU (for example, an RTX 4090) and want higher accuracy, change `hugging_face_quantization` to `"fp16"` and raise `gpu_layers` to `30`. This uses more memory, so you will need to **release GPU memory in Python** more aggressively after each batch.
+
+### Using a Custom Spell Checker
+
+You can replace the built-in `spell_corrector` with a custom post-processor for domain-specific corrections (for example, medical terminology). Implement the required interface and pass its name to `set_post_processor`.
+
+### Batch Processing Multiple Images
+
+Wrap the OCR steps in a `for` loop, collect each `cleaned_result.text` in a list, and call `ai_helper.free_resources()` only after the loop if you have enough GPU memory. This avoids the overhead of reloading the model for every image.
+
+---
+
+## Conclusion
+
+We have just shown how to **perform OCR on image** files in Python, automatically **download a HuggingFace model**, **clean OCR text**, and safely **release GPU memory** when the work is done. The complete script is ready to copy and paste, and the explanations should give you the confidence to adapt it to larger projects.
+
+Next steps? Try swapping the Qwen 2.5 model for a larger LLaMA variant, experiment with different post-processors, or feed the cleaned output into a searchable Elasticsearch index. The possibilities are wide open, and you now have a solid foundation to build on.
+
+Happy coding, and may your OCR pipelines stay clean and memory-friendly!
+
+{{< /blocks/products/pf/tutorial-page-section >}}
+{{< /blocks/products/pf/main-container >}}
+{{< /blocks/products/pf/main-wrap-class >}}
+{{< blocks/products/products-backtop-button >}}
\ No newline at end of file