Open source OCR models for Indic Languages
This repository contains links to OCR models for popular Indian languages, developed as part of the Anuvaad project.
Please reach out to [email protected] with any questions about the clarification, interpretation, or usage of the linked models.
The models below are trained using Tesseract-OCR.
| Language | Model |
|---|---|
| Hindi | anuvaad_hin.traineddata |
| Bengali | anuvaad_ben.traineddata |
| Kannada | anuvaad_kan.traineddata |
| Malayalam | anuvaad_mal.traineddata |
| Marathi | anuvaad_mar.traineddata |
| Odia | anuvaad_ori.traineddata |
| Tamil | anuvaad_tam.traineddata |
| Telugu | anuvaad_tel.traineddata |
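Here is a minimal sketch of running one of these models from Python. It assumes a recent `tesseract` executable is on `PATH` and that the chosen `.traineddata` file has been downloaded into a local directory. The sketch relies on Tesseract's convention that the `-l` language name is the model filename without its `.traineddata` extension. The helper names (`lang_name`, `tesseract_command`, `run_ocr`) and the example paths are hypothetical and are not part of this repository.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List, Optional


def lang_name(model_file: str) -> str:
    """Tesseract's -l value is the .traineddata filename without its extension."""
    path = Path(model_file)
    if path.suffix != ".traineddata":
        raise ValueError(f"expected a .traineddata file, got {model_file!r}")
    return path.stem


def tesseract_command(image_path: str, model_file: str, tessdata_dir: str,
                      psm: Optional[int] = None) -> List[str]:
    """Build a tesseract CLI call that prints the recognised text to stdout."""
    cmd = ["tesseract", image_path, "stdout",
           "--tessdata-dir", tessdata_dir,  # folder holding the downloaded model
           "-l", lang_name(model_file)]
    if psm is not None:
        # Page segmentation mode, e.g. 6 = assume a single uniform block of text.
        cmd += ["--psm", str(psm)]
    return cmd


def run_ocr(image_path: str, model_file: str, tessdata_dir: str,
            psm: Optional[int] = None) -> str:
    """Run tesseract and return its stdout (requires tesseract on PATH)."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract executable not found on PATH")
    cmd = tesseract_command(image_path, model_file, tessdata_dir, psm)
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout
```

For example, `run_ocr("page.png", "anuvaad_hin.traineddata", "/opt/tessdata", psm=6)` would return the recognised Hindi text. In that call, `/opt/tessdata` is an assumed folder that holds the downloaded file. Calling the CLI through `subprocess` keeps the sketch free of third-party Python packages. The `pytesseract` package is a common alternative wrapper.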
Models for scene text (text captured in natural images rather than scanned documents):
| Language | Model |
|---|---|
| Hindi | anuvad_hin_scene_text_real.traineddata |
| Tamil | anuvad_tam_scene_text_real.traineddata |
| Scene-Text Judgement Line Detection V1 | scene_text_judgement_line_detection_v1_model.pth |
The layout models below are trained using Layout Parser (Detectron2).
| Task | Model |
|---|---|
| Anuvaad Judgement Line Detection | anuvaad_line_v1.pth |
| Anuvaad Scene-Text Line Detection | scene_text_judgement_line_detection_v1_model.pth |
| Anuvaad Judgement Layout | model_final.pth |
| Anuvaad Table Layout | judgement_prima_table_layout_modelv3.pth |
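Here is a sketch of loading one of the layout models with LayoutParser's `Detectron2LayoutModel`. It assumes `layoutparser` and Detectron2 are installed. This repository lists only the weights files, so two of the arguments passed here are assumptions: the Detectron2 config file (`config_path`) and the class-index-to-label mapping (`label_map`). Both must match whatever the models were actually trained with. The task keys, the helper names, and the 0.5 score threshold are also illustrative choices.

```python
from pathlib import Path
from typing import Any, Dict

# Task -> weights file, taken from the layout-model table (keys are hypothetical).
LAYOUT_MODELS: Dict[str, str] = {
    "judgement_line_detection": "anuvaad_line_v1.pth",
    "scene_text_line_detection": "scene_text_judgement_line_detection_v1_model.pth",
    "judgement_layout": "model_final.pth",
    "table_layout": "judgement_prima_table_layout_modelv3.pth",
}


def layout_model_kwargs(task: str, weights_dir: str, config_path: str,
                        label_map: Dict[int, str],
                        score_thresh: float = 0.5) -> Dict[str, Any]:
    """Keyword arguments for layoutparser.Detectron2LayoutModel.

    config_path and label_map are not published with these weights; they are
    assumptions that must match the training setup.
    """
    if task not in LAYOUT_MODELS:
        raise KeyError(f"unknown task {task!r}; choose from {sorted(LAYOUT_MODELS)}")
    return {
        "config_path": config_path,
        "model_path": str(Path(weights_dir) / LAYOUT_MODELS[task]),
        "label_map": label_map,
        # Detectron2 option: discard detections scoring below this threshold.
        "extra_config": ["MODEL.ROI_HEADS.SCORE_THRESH_TEST", score_thresh],
    }


def load_layout_model(**kwargs: Any):
    """Instantiate the model; needs layoutparser with the Detectron2 backend."""
    import layoutparser as lp  # lazy import so the helper above works without it
    return lp.Detectron2LayoutModel(**kwargs)
```

For instance, you could call `model = load_layout_model(**layout_model_kwargs("table_layout", "weights", "table_config.yaml", {0: "Table"}))`. Running `model.detect(image)` on an RGB image array should then return a `Layout` of detected blocks. The config filename and the single `Table` label are hypothetical.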