Text Extraction with EasyOCR
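EasyOCR is a Python library for recognizing text in images such as photos and scanned documents, with support for more than 80 languages. It is a standalone OCR engine built on PyTorch that uses deep-learning models for text detection and recognition; it is not a wrapper around Tesseract. EasyOCR works on images, so PDF files have to be converted to page images before it can read them.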
To use EasyOCR, you will need to install it first. You can do this by running the following command:
pip install easyocr
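Once you have EasyOCR installed, you can extract text from an image with readtext. EasyOCR does not read PDF files directly (it has no readpdf method), so each page has to be rendered to an image first. The code below assumes the separate pdf2image package for that step (pip install pdf2image, which also requires Poppler to be installed); any other PDF-to-image tool would work as well:
import easyocr
import numpy as np
from pdf2image import convert_from_path  # Assumption: third-party package, not part of EasyOCR

reader = easyocr.Reader(['en'])  # Set English as the language

# Extract text from an image; each result is (bounding box, text, confidence)
image = '/path/to/image.jpg'
results = reader.readtext(image)
text = ' '.join(t for _, t, _ in results)

# Extract text from a PDF file: render each page to an image, then run OCR on it
pdf_file = '/path/to/document.pdf'
pages = convert_from_path(pdf_file)  # One PIL image per page
pdf_text = [reader.readtext(np.array(page), detail=0) for page in pages]  # detail=0 returns text only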
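By default, readtext returns a list of detections, each holding the bounding box corners, the recognized text, and a confidence score between 0 and 1. The sketch below shows one way to keep only the confident detections and join them into a single string. The helper name confident_text, the 0.5 threshold, and the sample data are all illustrative assumptions, not part of EasyOCR.

```python
from typing import List, Sequence, Tuple

# One readtext() entry: (bounding box corner points, recognized text, confidence 0..1)
Detection = Tuple[Sequence[Sequence[float]], str, float]


def confident_text(results: List[Detection], min_confidence: float = 0.5) -> str:
    """Join the text of detections whose confidence meets the threshold."""
    kept = [text for _box, text, conf in results if conf >= min_confidence]
    return " ".join(kept)


# Hypothetical sample shaped like readtext() output; the values are made up.
sample = [
    ([[10, 10], [120, 10], [120, 40], [10, 40]], "Invoice", 0.98),
    ([[130, 10], [200, 10], [200, 40], [130, 40]], "#1O24", 0.31),
    ([[10, 50], [90, 50], [90, 80], [10, 80]], "Total", 0.87),
]

print(confident_text(sample))  # prints "Invoice Total"
```

With a real reader, the same helper could be called as confident_text(reader.readtext(image)). A suitable threshold depends on the image quality, so it is worth checking a few results by hand before settling on a value.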
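EasyOCR can also recognize text in several languages at once: pass the language codes you want when creating the Reader object. Not every combination of languages can be used together, but English is compatible with all of them, and languages that share a script usually work with each other. For example: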
reader = easyocr.Reader(['en', 'fr', 'de']) # Set English, French, and German as the languages
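Creating a Reader loads the detection and recognition models into memory, so it is usually worth building one reader per language combination and reusing it across images. The following is a minimal sketch of that idea; ReaderCache and its factory parameter are hypothetical names introduced here for illustration, and only the easyocr.Reader call comes from the library itself.

```python
from typing import Any, Callable, Dict, Iterable, Tuple


def _easyocr_factory(langs: Tuple[str, ...]) -> Any:
    # Imported lazily so this module loads even where easyocr is not installed.
    import easyocr
    return easyocr.Reader(list(langs))


class ReaderCache:
    """Hypothetical helper: keeps one reader per language combination."""

    def __init__(self, factory: Callable[[Tuple[str, ...]], Any] = _easyocr_factory):
        self._factory = factory
        self._readers: Dict[Tuple[str, ...], Any] = {}

    def get(self, langs: Iterable[str]) -> Any:
        # The key preserves the order given, so ['en', 'fr'] and ['fr', 'en']
        # are treated as different combinations in this sketch.
        key = tuple(langs)
        if key not in self._readers:
            self._readers[key] = self._factory(key)
        return self._readers[key]
```

A script could then create cache = ReaderCache() once and call cache.get(['en', 'fr', 'de']) wherever a reader is needed; the models are loaded only on the first call for each combination.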
You can find more information about EasyOCR and its usage in the documentation: https://pypi.org/project/easyocr/