Tuesday, 6 November 2018

Extract text from an image using Pytesseract on Windows

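On Windows, the Tesseract OCR engine has to be installed separately. An installer is available at https://github.com/UB-Mannheim/tesseract/wiki. After installing, add a new environment variable named tesseract with the value C:\Program Files (x86)\Tesseract-OCR\tesseract.exe (adjust the path if Tesseract was installed somewhere else).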

Then we need to install the Python wrapper: pip install pytesseract. The full code below also uses OpenCV, which can be installed with pip install opencv-python.

In some cases (for example, if the environment variable was not added correctly) we also need the following line of code:

pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files (x86)\Tesseract-OCR\tesseract.exe'


This tells pytesseract where to find the Tesseract executable.
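If you are not sure whether the hard-coded path is needed on a given machine, a small check like the one below can help. It is only a sketch using the standard library: find_tesseract and DEFAULT_TESSERACT are illustrative names, not part of pytesseract, and the fallback is simply the install location assumed in this post.

```python
import os
import shutil

# Install location assumed in this post; change it if Tesseract lives elsewhere.
DEFAULT_TESSERACT = r'C:\Program Files (x86)\Tesseract-OCR\tesseract.exe'


def find_tesseract(fallback=DEFAULT_TESSERACT):
    """Return a path to the Tesseract executable, or None if none is found."""
    # First look for an executable named "tesseract" on the PATH.
    on_path = shutil.which('tesseract')
    if on_path:
        return on_path
    # Otherwise fall back to the assumed install location, if that file exists.
    if os.path.isfile(fallback):
        return fallback
    return None


if __name__ == '__main__':
    print(find_tesseract())
```

If the function returns a path, that value can be assigned to pytesseract.pytesseract.tesseract_cmd. If it returns None, Tesseract is probably not installed, or it is installed in a different folder.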



Full Code

       
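'''
Download and install Tesseract from https://github.com/UB-Mannheim/tesseract/wiki
'''

import cv2
import pytesseract

# Point pytesseract at the Tesseract executable (needed if it is not on the PATH)
pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files (x86)\Tesseract-OCR\tesseract.exe'

# Read the input image as grayscale (flag 0)
frame1 = cv2.imread('poc.jpg', 0)
if frame1 is None:
    raise FileNotFoundError('Could not read poc.jpg')

# Save the grayscale image that is passed to the OCR engine
cv2.imwrite('ocr.jpg', frame1)

# Run OCR and print the recognised text
text = pytesseract.image_to_string(frame1)
print(text)

# Show the image under a fixed window title (the OCR text may be empty or multi-line)
cv2.imshow('OCR input', frame1)
cv2.waitKey(0)
cv2.destroyAllWindows()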
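The string returned by image_to_string often contains blank lines, trailing spaces and sometimes a trailing form feed character. As an optional extra step, a helper along these lines could tidy it up before printing. This is a minimal sketch; clean_ocr_text is a made-up name and is not part of pytesseract.

```python
def clean_ocr_text(raw):
    """Drop empty lines and trailing whitespace from OCR output."""
    # splitlines() also splits on the form feed character ('\x0c').
    lines = (line.rstrip() for line in raw.splitlines())
    # Keep only lines that still contain something after stripping.
    return '\n'.join(line for line in lines if line.strip())


if __name__ == '__main__':
    sample = 'Hello  \n\n  World\n\x0c'
    print(clean_ocr_text(sample))
```

In the full code above, it would be used as print(clean_ocr_text(text)).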



Input Image

Output





