Optical Character Recognition (OCR) is the process of extracting text from an image. The main purpose of an OCR is to produce editable documents from existing paper documents or image files. Developing an OCR requires a significant number of algorithms, and it basically works in two phases: character detection and word detection. In a more sophisticated approach, an OCR also performs sentence detection to preserve a document's structure. Researchers have put considerable effort into developing a Bengali OCR, but none of the resulting systems is completely error free. To address this issue, the latest version, 3.03, of the Tesseract OCR engine for the Windows operating system is used to develop an OCR for the Bengali language. Moreover, 18110 characters and 2617 words are used to build the OCR's library.
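As a rough illustration of how the Tesseract engine can be pointed at Bengali text, here is a minimal sketch in Python that wraps the Tesseract command-line tool. It is not the training or recognition pipeline used in this work. The helper names `build_tesseract_command` and `recognize` are hypothetical, and the sketch assumes a `tesseract` executable on the PATH with Bengali language data installed under the code `ben`.

```python
import shutil
import subprocess
from pathlib import Path


def build_tesseract_command(image_path, output_base, lang="ben"):
    """Build the argument list for a Tesseract CLI call.

    Hypothetical helper: follows the form `tesseract <image> <outputbase> -l <lang>`.
    "ben" is assumed to be the installed Bengali language data.
    """
    return ["tesseract", str(image_path), str(output_base), "-l", lang]


def recognize(image_path, output_base, lang="ben"):
    """Run Tesseract on an image and return the recognized text.

    Returns None when no `tesseract` executable is found on the PATH.
    """
    if shutil.which("tesseract") is None:
        return None
    subprocess.run(build_tesseract_command(image_path, output_base, lang), check=True)
    # Tesseract writes plain-text output to <outputbase>.txt.
    return Path(f"{output_base}.txt").read_text(encoding="utf-8")
```

Calling `recognize("page.png", "page_out")` would write `page_out.txt` and return its contents, where `page.png` is a placeholder file name. Recognition quality depends entirely on the language data in use.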