Welcome to a tutorial on converting an image to text in PHP using Optical Character Recognition (OCR). So you are working on a project to “extract text” from an image? Sadly, there does not seem to be a proper free OCR library written in PHP at the time of writing. But there are still ways to do that – Read on!
DOWNLOAD & NOTES
Here is the download link to the example code, so you don’t have to copy-paste everything.
EXAMPLE CODE DOWNLOAD
Just click on “download zip” or do a git clone. I have released it under the MIT license, so feel free to build on top of it or use it in your own project.
PHP IMAGE TO TEXT
All right, let us now get into the possible ways to convert images to text in PHP.
SOLUTION 1) TESSERACT
1A) INSTALL TESSERACT
There are no proper PHP OCR libraries, but there is a good open-source OCR engine that has been around for a long time: Tesseract. Now, the installation differs from platform to platform. You can download the source code from GitHub and compile it yourself, but the easy way is to use a pre-built version, such as an installer on Windows or your package manager on Linux and Mac.
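Since the install location differs between platforms, a quick check from PHP can confirm that the binary is where you think it is. This is only a minimal sketch: `findTesseract()` is a made-up helper, and the candidate paths are assumed common locations, so adjust them to your own system.

```php
<?php
// Hypothetical helper - returns the first candidate path that exists and is executable.
function findTesseract(array $candidates): ?string {
  foreach ($candidates as $path) {
    if (is_file($path) && is_executable($path)) { return $path; }
  }
  return null;
}

// Assumed install locations - change these to match your own setup.
$tes = findTesseract([
  "C:/Program Files/Tesseract-OCR/tesseract.exe",
  "/usr/bin/tesseract",
  "/usr/local/bin/tesseract",
  "/opt/homebrew/bin/tesseract",
]);
echo $tes === null ? "Tesseract not found" : "Found Tesseract at $tes";
```

If nothing is found, the install probably went to a different folder, or the web server user is not allowed to run the file.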
1B) PHP RUN TESSERACT IN SHELL
<?php
// (A) SETTINGS - CHANGE TO YOUR OWN!
$tes = "C:/Program Files/Tesseract-OCR/tesseract.exe"; // path to tesseract
$img = __DIR__ . DIRECTORY_SEPARATOR . "test.png"; // image to read
$lang = "eng"; // language
// (B) RUN TESSERACT IN TERMINAL
// https://github.com/tesseract-ocr/tesseract
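// escapeshellarg() keeps paths with spaces or odd characters in one piece
$cmd = escapeshellarg($tes) . " " . escapeshellarg($img) . " - -l " . escapeshellarg($lang);
$result = shell_exec($cmd);
// (C) OPTIONAL - STRIP LINE BREAKS
// shell_exec() may return null on failure, so cast to string first
$result = str_replace(["\r\n", "\n"], " ", (string) $result);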
echo $result;
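After you have installed Tesseract, simply run PATH/TO/TESSERACT PATH/TO/IMAGE - -l eng in the command line (or terminal) and get the results. The lone - tells Tesseract to print the text to the standard output instead of writing it to a file, which is how shell_exec() picks it up.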
P.S. Check out the Tesseract documentation for the full list of options and languages. Links below in the extras section.
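If you want a little more error checking than the script above, here is one possible way to wrap the same command in a function. Take it as a sketch only: `buildTesseractCommand()` and `imageToText()` are names made up for this example, and the language pattern is an assumption based on codes such as `eng` or `eng+deu`.

```php
<?php
// Hypothetical helper - builds the Tesseract command with every part shell-escaped.
function buildTesseractCommand(string $tes, string $img, string $lang = "eng"): string {
  // Assumed pattern for language codes like "eng", "chi_sim" or "eng+deu".
  if (!preg_match('/^[A-Za-z0-9_+]+$/', $lang)) {
    throw new InvalidArgumentException("Invalid language: $lang");
  }
  return escapeshellarg($tes) . " " . escapeshellarg($img) . " - -l " . escapeshellarg($lang);
}

// Hypothetical helper - runs Tesseract and returns the text, or throws on failure.
function imageToText(string $tes, string $img, string $lang = "eng"): string {
  if (!is_file($img)) { throw new RuntimeException("Image not found: $img"); }
  // exec() collects the output lines and also gives us the exit code.
  exec(buildTesseractCommand($tes, $img, $lang), $lines, $code);
  if ($code !== 0) {
    throw new RuntimeException("Tesseract exited with code $code");
  }
  return trim(implode("\n", $lines));
}

// Usage - same settings as the script above, change to your own.
try {
  echo imageToText("C:/Program Files/Tesseract-OCR/tesseract.exe", __DIR__ . "/test.png");
} catch (Exception $e) {
  echo "OCR failed: " . $e->getMessage();
}
```

Using exec() instead of shell_exec() is a design choice here. It exposes the exit code, so a missing binary or an unreadable image shows up as an error instead of an empty string.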
SOLUTION 2) TESSERACT JS
2A) THE HTML
<!-- (A) FILE SELECTOR -->
<input type="file" id="select" accept="image/png, image/gif, image/webp, image/jpeg">
<!-- (B) LOAD TESSERACT -->
<!-- https://cdnjs.com/libraries/tesseract.js -->
<script src="https://cdnjs.cloudflare.com/ajax/libs/tesseract.js/4.0.6/tesseract.min.js"></script>
<!-- (C) INIT -->
<script>
window.addEventListener("load", async () => {
// (C1) GET HTML FILE SELECTOR
const hSel = document.getElementById("select");
// (C2) CREATE ENGLISH WORKER
const worker = await Tesseract.createWorker();
await worker.loadLanguage("eng");
await worker.initialize("eng");
// (C3) ON FILE SELECT
hSel.onchange = async () => {
// (C3-1) IMAGE TO TEXT
const res = await worker.recognize(hSel.files[0]);
// (C3-2) UPLOAD TO SERVER
let data = new FormData();
data.append("text", res.data.text);
fetch("2b-save.php", { method:"post", body:data })
.then(res => res.text())
.then(txt => console.log(txt))
.catch(err => console.error(err));
};
});
</script>
Here’s the weird part: Tesseract does not have a PHP port, but it does have a JavaScript port that runs in the browser using WebAssembly, Tesseract.js. Once again, download and host it on your own or load it from a CDN. Create a simple webpage, use Tesseract.js to convert the image to text in the browser, then upload the result to your PHP script.
2B) THE PHP
<?php
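// (A) STOP IF NO TEXT WAS SENT
if (!isset($_POST["text"])) { exit("No text received"); }
// (B) SAVE THE TEXT
print_r($_POST);
file_put_contents("demo.txt", $_POST["text"]);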
This is just a dummy demo. Do whatever you need in your own project – Save to database, save to CSV, generate PDF, etc…
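OCR output tends to come with stray spaces and empty lines, so you may want to tidy it up before saving. Below is a rough sketch of how that could look. `cleanOcrText()` and `saveOcrText()` are hypothetical helpers for this example, not part of the download.

```php
<?php
// Hypothetical helper - tidies raw OCR text before it is stored.
function cleanOcrText(string $raw): string {
  $text = str_replace(["\r\n", "\r"], "\n", $raw);   // normalize line endings
  $text = preg_replace('/[ \t]+/', " ", $text);      // collapse runs of spaces and tabs
  $text = preg_replace('/\n{3,}/', "\n\n", $text);   // keep at most one empty line in a row
  return trim($text);
}

// Hypothetical helper - validates the posted text and appends it to a file.
function saveOcrText(array $post, string $file): string {
  $raw = $post["text"] ?? null;
  if (!is_string($raw) || trim($raw) === "") {
    return "No text received";
  }
  $ok = file_put_contents($file, cleanOcrText($raw) . "\n", FILE_APPEND | LOCK_EX);
  return $ok === false ? "Could not save" : "Saved";
}

// Usage - the Javascript above will log this reply in the console.
echo saveOcrText($_POST, __DIR__ . "/demo.txt");
```

Passing `$_POST` in as a parameter keeps the function easy to try out on its own with a plain array.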
SOLUTION 3) GOOGLE VISION OCR
If you cannot install anything on a shared server and don’t want to use the JavaScript solution, the only option left is to use an online service. Google Cloud Vision offers OCR, with the first 1000 requests per month free at the time of writing. The setup is pretty painful though:
- Sign up with Google Cloud if you have not already done so.
- Follow their instructions.
- Create a new project in Google Cloud console.
- Enable Vision API.
- Install Google Cloud CLI.
- Download the PHP library, and read through their documentation for examples.
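Once the project and the Vision API are set up, the actual request is not too bad. The sketch below does not use the PHP library mentioned above. It calls the Vision REST endpoint (`images:annotate`) directly, and assumes that you have created an API key for the project and that `allow_url_fopen` and OpenSSL are enabled in PHP. All three function names are made up for this example.

```php
<?php
// Hypothetical helper - builds the JSON body for a TEXT_DETECTION request.
function buildVisionRequest(string $imageBytes): string {
  return json_encode([
    "requests" => [[
      "image" => ["content" => base64_encode($imageBytes)],
      "features" => [["type" => "TEXT_DETECTION"]],
    ]],
  ]);
}

// Hypothetical helper - pulls the recognized text out of the decoded JSON response.
function extractVisionText(array $response): string {
  $first = $response["responses"][0] ?? [];
  $error = $response["error"] ?? $first["error"] ?? null;
  if ($error !== null) {
    throw new RuntimeException($error["message"] ?? "Vision API error");
  }
  return $first["fullTextAnnotation"]["text"]
    ?? $first["textAnnotations"][0]["description"]
    ?? "";
}

// Hypothetical helper - sends the image to the REST endpoint with an API key.
function visionOcr(string $imgPath, string $apiKey): string {
  $ctx = stream_context_create(["http" => [
    "method" => "POST",
    "header" => "Content-Type: application/json",
    "content" => buildVisionRequest(file_get_contents($imgPath)),
    "ignore_errors" => true, // still read the reply body on HTTP errors
  ]]);
  $url = "https://vision.googleapis.com/v1/images:annotate?key=" . urlencode($apiKey);
  $json = file_get_contents($url, false, $ctx);
  if ($json === false) { throw new RuntimeException("Request failed"); }
  return extractVisionText(json_decode($json, true) ?? []);
}

// Usage - the key is a placeholder, use your own.
// echo visionOcr(__DIR__ . "/test.png", "YOUR-API-KEY");
```

Splitting the request builder and the response parser from the network call is a deliberate choice. Both can be checked with plain arrays before you spend any of the free quota.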
EXTRAS
That’s all for the tutorial, and here is a small section on some extras and links that may be useful to you.
LINKS & REFERENCES
- Tesseract – GitHub
- Tesseract Documentation – GitHub
- Javascript OCR Image To Text – Code Boxx
- Tesseract JS
- Google Cloud Vision
THE END
Thank you for reading, and we have come to the end. I hope that it has helped you to better understand, and if you have anything to share about this guide, please feel free to comment below. Good luck and happy coding!