Welcome to a tutorial on how to convert an image to text using OCR in Python. So you are working on a project that needs to “extract” text from an image? A common solution is called Optical Character Recognition, and here are some possible ways to do it in Python. Read on!
DOWNLOAD & NOTES
Here is the download link to the example code, so you don’t have to copy-paste everything.
EXAMPLE CODE DOWNLOAD
Just click on “download zip” or do a git clone. I have released it under the MIT license, so feel free to build on top of it or use it in your own project.
PYTHON IMAGE TO TEXT WITH OCR
All right, let us now get into the examples of converting images to text in Python using OCR.
QUICK SETUP
The “usual stuff”:
- Create a virtual environment: virtualenv venv (or python -m venv venv without installing virtualenv).
- Activate it: venv\Scripts\activate (Windows) or source venv/bin/activate (Linux/Mac).
- Install the required libraries: pip install flask
- For those who are new, the default Flask folders are:
  - static: Public files (JS/CSS/images/videos/audio).
  - templates: HTML pages.
SOLUTION 1) TESSERACT
1A) DOWNLOAD & INSTALL TESSERACT
There is a popular open-source OCR library called Tesseract, but unfortunately, it has no native Python port. Don’t worry though, we can still use it from Python. First, install it:
- For the experts, here is the Tesseract GitHub page, if you want to download and compile it yourself.
- Otherwise, the easy way is to download and install one of their pre-built versions.
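Once Tesseract is installed, it helps to confirm that Python can actually find it and that the language you plan to use (eng in the examples below) is installed. The sketch below is a minimal check, assuming Tesseract is either on the PATH or in the default Windows location used in 1B; the names TESSERACT, parse_list_langs, tesseract_version, and installed_languages are hypothetical helpers, not part of Tesseract. It relies only on the real --version and --list-langs command line options.

```python
import shutil
import subprocess

# Assumption: Tesseract is on the PATH; otherwise fall back to the default
# Windows install location. Change this to your own path if needed.
TESSERACT = shutil.which("tesseract") or "C:/Program Files/Tesseract-OCR/tesseract.exe"


def parse_list_langs(output: str) -> list[str]:
    """Turn the text printed by `tesseract --list-langs` into language codes."""
    langs = []
    for line in output.splitlines():
        line = line.strip()
        # Skip blank lines and the header line, which looks something like:
        # List of available languages in "/usr/share/tessdata/" (2):
        if not line or line.startswith("List of available languages"):
            continue
        langs.append(line)
    return langs


def tesseract_version(tes: str = TESSERACT) -> str:
    """Return the first line of `tesseract --version`, e.g. the version string."""
    res = subprocess.run([tes, "--version"], capture_output=True, text=True, check=True)
    # Some older Tesseract versions may print this to stderr instead of stdout.
    lines = (res.stdout or res.stderr).strip().splitlines()
    return lines[0] if lines else ""


def installed_languages(tes: str = TESSERACT) -> list[str]:
    """Run `tesseract --list-langs` and return the installed language codes."""
    res = subprocess.run([tes, "--list-langs"], capture_output=True, text=True, check=True)
    # Fall back to stderr in case an older version prints the list there.
    return parse_list_langs(res.stdout or res.stderr)
```

If installed_languages() raises FileNotFoundError, Python cannot find the executable and TESSERACT needs to point at your install. If the list does not include the language code you want, the matching language data still has to be installed.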
1B) RUN THE TESSERACT COMMAND LINE FROM PYTHON
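# (A) LOAD SUBPROCESS MODULE & SETTINGS - CHANGE TO YOUR OWN!
import subprocess
tes = "C:/Program Files/Tesseract-OCR/tesseract.exe"
img = "demo.png"
lang = "eng"
# (B) RUN TESSERACT COMMAND
# Passing the arguments as a list works on Windows, Linux, and Mac,
# and handles paths with spaces without extra quoting.
# The lone "-" tells Tesseract to print the text instead of saving a file.
cmd = [tes, img, "-", "-l", lang]
res = subprocess.run(cmd, stdout=subprocess.PIPE, check=True)
# (C) GET TEXT
txt = res.stdout.decode("utf-8")
# @TODO - WHATEVER YOU NEED WITH THE TEXT
print(txt)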
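How to “gel” Tesseract and Python together:
- (A) Load the subprocess module, then set the path to Tesseract, the image file, and the language.
- (B) Run PATH/TO/TESSERACT IMAGE.FILE - -l eng in the command line. The lone - tells Tesseract to print the result to standard output instead of saving it to a file.
- (C) Capture the command line output and decode it into a string.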
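If you need to OCR many images, the script above can be wrapped into a reusable function. This is a minimal sketch, assuming Tesseract 4 or later; build_tesseract_cmd and image_to_text are hypothetical names. It uses two real Tesseract options: -l accepts several languages joined with + (for example eng+fra), and --psm sets the page segmentation mode (6 means “assume a single uniform block of text”).

```python
import shutil
import subprocess
from typing import Iterable, Optional, Union


def build_tesseract_cmd(tes: str, img: str,
                        langs: Union[str, Iterable[str]] = ("eng",),
                        psm: Optional[int] = None) -> list[str]:
    """Build the argument list for: tesseract IMG - -l LANGS [--psm N]."""
    if isinstance(langs, str):
        # A single code like "eng"; without this, "+".join would split the letters.
        langs = [langs]
    cmd = [tes, img, "-", "-l", "+".join(langs)]
    if psm is not None:
        cmd += ["--psm", str(psm)]
    return cmd


def image_to_text(img: str, langs: Union[str, Iterable[str]] = ("eng",),
                  psm: Optional[int] = None, tes: Optional[str] = None) -> str:
    """Run Tesseract on IMG and return the recognized text as a string."""
    # Assumption: Tesseract is on the PATH unless a full path is passed in.
    tes = tes or shutil.which("tesseract")
    if tes is None:
        raise FileNotFoundError("Tesseract not found - pass tes= with the full path")
    res = subprocess.run(build_tesseract_cmd(tes, img, langs, psm),
                         stdout=subprocess.PIPE, stderr=subprocess.PIPE, check=True)
    return res.stdout.decode("utf-8")
```

For example, image_to_text("demo.png", ["eng", "fra"], psm=6) would OCR an English and French page as one text block, provided both language packs are installed. Keeping the command building separate makes it easy to check the exact arguments without running Tesseract.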
SOLUTION 2) TESSERACT JS
2A) HTML PAGE
<!-- (A) FILE SELECTOR -->
<input type="file" id="select" accept="image/png, image/gif, image/webp, image/jpeg">
<!-- (B) LOAD TESSERACT -->
<!-- https://cdnjs.com/libraries/tesseract.js -->
<script src="https://cdnjs.cloudflare.com/ajax/libs/tesseract.js/4.0.6/tesseract.min.js"></script>
<!-- (C) INIT -->
<script>
window.addEventListener("load", async () => {
  // (C1) GET HTML FILE SELECTOR
  const hSel = document.getElementById("select");
  // (C2) CREATE ENGLISH WORKER
  const worker = await Tesseract.createWorker();
  await worker.loadLanguage("eng");
  await worker.initialize("eng");
  // (C3) ON FILE SELECT
  hSel.onchange = async () => {
    // (C3-1) IMAGE TO TEXT
    const res = await worker.recognize(hSel.files[0]);
    // (C3-2) UPLOAD TO SERVER
    let data = new FormData();
    data.append("text", res.data.text);
    fetch("/save", { method:"post", body:data })
    .then(res => res.text())
    .then(txt => console.log(txt))
    .catch(err => console.error(err));
  };
});
</script>
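If you cannot install anything on the server, here is an alternative. Tesseract does not have a “Python version”, but someone did manage to port it to JavaScript and WebAssembly as Tesseract.js. The page above lets the user pick an image, runs OCR on it in the browser, then posts the recognized text to /save.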
2B) FLASK HTTP SERVER
# (A) INIT
# (A1) LOAD MODULES
from flask import Flask, render_template, request
# (A2) FLASK SETTINGS + INIT
HOST_NAME = "localhost"
HOST_PORT = 80  # port 80 may need admin/root rights; use 8080 if it fails
app = Flask(__name__)
# app.debug = True
# (B) VIEWS
# (B1) "LANDING PAGE" - 2A-tesseract-js.html goes in the templates folder
@app.route("/")
def index():
    return render_template("2A-tesseract-js.html")
# (B2) SAVE CONVERTED TEXT
@app.route("/save", methods=["POST"])
def txt():
    # .get() avoids a KeyError (HTTP 500) if the "text" field is missing
    text = request.form.get("text", "")
    # @TODO - WHATEVER YOU NEED WITH THE TEXT
    print(text)
    return "OK"
# (C) START
if __name__ == "__main__":
    app.run(HOST_NAME, HOST_PORT)
TesseractJS is client-side, so how does it work with Python? This is unfortunately a little bit roundabout:
- Create a simple HTTP server with Flask, and serve the above Tesseract page at http://localhost.
- The Tesseract page runs OCR in the browser, then sends the result to http://localhost/save.
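To test the /save endpoint without opening the browser, you can post a “text” form field from Python, just like the FormData sent by the page in 2A. This is a minimal sketch using only the standard library, assuming the Flask server above is running at http://localhost; SAVE_URL, build_save_request, and send_text are hypothetical names. It sends the field URL-encoded rather than as multipart form data, which Flask’s request.form also parses.

```python
import urllib.parse
import urllib.request

# Assumption: the Flask server from 2B is running on http://localhost (port 80).
SAVE_URL = "http://localhost/save"


def build_save_request(text: str, url: str = SAVE_URL) -> urllib.request.Request:
    """Build a POST request carrying a "text" form field, like the 2A page does."""
    body = urllib.parse.urlencode({"text": text}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


def send_text(text: str, url: str = SAVE_URL) -> str:
    """Send TEXT to the /save endpoint and return the server's reply."""
    with urllib.request.urlopen(build_save_request(text, url), timeout=10) as res:
        return res.read().decode("utf-8")
```

With the server running, send_text("Hello World") should return “OK” and the Flask console should print the text.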
SOLUTION 3) GOOGLE CLOUD VISION
If all else fails, the final alternative is to use an online image-to-text recognition service, and Google Cloud Vision is a good option. At the time of writing, the first 1000 units (one unit per image processed) each month are free.
- Sign up with Google Cloud first.
- Follow their instructions:
  - Create a new project in the Google Cloud console.
  - Enable the Vision API.
  - Install the Google Cloud CLI.
- All the code samples are on the instructions page, and also on GitHub.
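For reference, a text detection call with the official Python client looks roughly like the sketch below. It assumes the google-cloud-vision package (version 2 or later) is installed with pip install google-cloud-vision, and that credentials are already set up, for example with gcloud auth application-default login. detect_text is a hypothetical name; the library calls follow Google’s published samples, but check their instructions page for the current version.

```python
def detect_text(path: str) -> str:
    """Send a local image to Cloud Vision text detection and return the text."""
    # Imported inside the function so this file still loads without the library.
    from google.cloud import vision

    client = vision.ImageAnnotatorClient()
    with open(path, "rb") as f:
        image = vision.Image(content=f.read())
    response = client.text_detection(image=image)
    if response.error.message:
        raise RuntimeError(f"Cloud Vision error: {response.error.message}")
    # The first annotation holds the full detected text; the rest are single words.
    annotations = response.text_annotations
    return annotations[0].description if annotations else ""
```

For dense documents such as scanned pages, client.document_text_detection(image=image) may give better results than text_detection.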
EXTRAS
That’s all for the tutorial, and here is a small section on some extras and links that may be useful to you.
LINKS & REFERENCES
- Tesseract – GitHub
- TesseractJS – GitHub
- Google Cloud Vision
- Javascript OCR Image To Text – Code Boxx
THE END
Thank you for reading, and we have come to the end. I hope that it has helped you to better understand how to do OCR in Python, and if you want to share anything with this guide, please feel free to comment below. Good luck and happy coding!