PHP Image To Text OCR (Simple Examples)

Welcome to a tutorial on converting an image to text in PHP using Optical Character Recognition (OCR). So you are working on a project that needs to extract text from an image? Sadly, there doesn’t seem to be a proper free PHP OCR library at the time of writing. But there are still ways to get it done. Read on!

Download & Notes

PHP Image To Text

Extras

The End

DOWNLOAD & NOTES

Here is the download link to the example code, so you don’t have to copy-paste everything.

EXAMPLE CODE DOWNLOAD

Source code on GitHub Gist

Just click on “download zip” or do a git clone. I have released it under the MIT license, so feel free to build on top of it or use it in your own project.


PHP IMAGE TO TEXT

All right, let us now get into the possible ways to convert images to text in PHP.

SOLUTION 1) TESSERACT

1A) INSTALL TESSERACT

There are no proper PHP OCR libraries, but there is a good open-source OCR engine that has been around for a long time: Tesseract. How you install it depends on your platform. You can download the source code from GitHub and compile it yourself, or take the easy route and install one of the pre-built versions.

1B) PHP RUN TESSERACT IN SHELL

1-tesseract.php

<?php
// (A) SETTINGS - CHANGE TO YOUR OWN!
$tes = "C:/Program Files/Tesseract-OCR/tesseract.exe"; // path to tesseract
$img = __DIR__ . DIRECTORY_SEPARATOR . "test.png"; // image to read
$lang = "eng"; // language

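// (B) RUN TESSERACT IN TERMINAL
// https://github.com/tesseract-ocr/tesseract
// escapeshellarg() quotes each part, so paths with spaces still work
// "-" as the output name sends the text to standard output
$cmd = escapeshellarg($tes) . " " . escapeshellarg($img) . " - -l " . escapeshellarg($lang);
$result = shell_exec($cmd);
if (!is_string($result)) { exit("Failed to run Tesseract"); }

// (C) OPTIONAL - STRIP LINE BREAKS
$result = str_replace(["\r\n", "\n"], " ", $result);
echo $result;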

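After you have installed Tesseract, simply run PATH/TO/TESSERACT PATH/TO/IMAGE - -l eng in the command line (or terminal) to get the results. The lone - tells Tesseract to print the text to standard output instead of writing it to a file, and -l sets the language. The PHP script above does exactly the same thing through shell_exec().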

P.S. Check out the Tesseract documentation for the full list of options and languages. Links below in the extras section.
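If the shell call is needed in more than one place, it can be wrapped in a small function that also checks whether Tesseract actually succeeded. The following is only a minimal sketch and not part of the example download. The names ocr_command() and ocr_image() are made up for illustration, while escapeshellarg() and exec() are PHP built-ins.

```php
<?php
// Sketch only: ocr_command() and ocr_image() are hypothetical helper names.

// Build the command line, quoting every part that comes from a variable.
function ocr_command(string $bin, string $img, string $lang = "eng"): string {
  // "-" as the output name tells Tesseract to print to standard output.
  return escapeshellarg($bin) . " " . escapeshellarg($img)
       . " - -l " . escapeshellarg($lang);
}

// Run Tesseract and return the recognized text, or throw on failure.
function ocr_image(string $bin, string $img, string $lang = "eng"): string {
  if (!is_file($img)) {
    throw new RuntimeException("Image not found: $img");
  }
  // exec() collects the output lines and the exit code of the command.
  exec(ocr_command($bin, $img, $lang), $lines, $code);
  if ($code !== 0) {
    throw new RuntimeException("Tesseract exited with code $code");
  }
  return trim(implode("\n", $lines));
}
```

With the settings from 1-tesseract.php, a call would look like ocr_image($tes, $img, $lang). Wrap it in try-catch so a missing image or a failed run does not end up as a blank page.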

SOLUTION 2) TESSERACT JS

2A) THE HTML

2a-tesseract-js.html

<!-- (A) FILE SELECTOR -->
<input type="file" id="select" accept="image/png, image/gif, image/webp, image/jpeg">

<!-- (B) LOAD TESSERACT -->
<!-- https://cdnjs.com/libraries/tesseract.js -->
<script src="https://cdnjs.cloudflare.com/ajax/libs/tesseract.js/4.0.6/tesseract.min.js"></script>

<!-- (C) INIT -->
<script>
window.addEventListener("load", async () => {
  // (C1) GET HTML FILE SELECTOR
  const hSel = document.getElementById("select");

  // (C2) CREATE ENGLISH WORKER
  const worker = await Tesseract.createWorker();
  await worker.loadLanguage("eng");
  await worker.initialize("eng");

  // (C3) ON FILE SELECT
  hSel.onchange = async () => {
    // (C3-1) IMAGE TO TEXT
    if (!hSel.files.length) { return; } // nothing selected
    const res = await worker.recognize(hSel.files[0]);

    // (C3-2) UPLOAD TO SERVER
    let data = new FormData();
    data.append("text", res.data.text);
    fetch("2b-save.php", { method:"post", body:data })
    .then(res => res.text())
    .then(txt => console.log(txt))
    .catch(err => console.error(err));
  };
});
</script>

Here’s the weird part: Tesseract has no PHP port, but it does have a WebAssembly version, Tesseract.js, that runs in the browser. Once again, download and host it on your own server or load it from a CDN. Create a simple webpage, use Tesseract.js to convert the image to text in the browser, then upload the result to your PHP script.

2B) THE PHP

2b-save.php

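<?php
// Stop early if the text field is missing
if (!isset($_POST["text"]) || !is_string($_POST["text"])) { exit("No text received"); }
print_r($_POST);
file_put_contents("demo.txt", $_POST["text"]);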

This is just a dummy demo. Do whatever you need in your own project – Save to database, save to CSV, generate PDF, etc…
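For a real project, the save script should probably not trust whatever the browser sends. Here is a rough sketch of a more defensive version, assuming you want one text file per upload. The function name save_ocr_text(), the 100000-byte limit, and the "ocr-" file name prefix are all made up for this example.

```php
<?php
// Sketch only: save_ocr_text() is a hypothetical helper, not a PHP built-in.

// Validate the posted text and write it to a new file inside $dir.
function save_ocr_text(array $post, string $dir, int $maxBytes = 100000): string {
  $text = $post["text"] ?? null;
  if (!is_string($text) || trim($text) === "") {
    throw new InvalidArgumentException("No text received");
  }
  if (strlen($text) > $maxBytes) {
    throw new InvalidArgumentException("Text is too long");
  }
  // The server picks the file name, so the client never controls the path.
  $file = $dir . DIRECTORY_SEPARATOR . "ocr-" . bin2hex(random_bytes(8)) . ".txt";
  if (file_put_contents($file, $text, LOCK_EX) === false) {
    throw new RuntimeException("Could not write $file");
  }
  return $file;
}

// Only act on real POST requests (skipped when run from the command line).
if (($_SERVER["REQUEST_METHOD"] ?? "") === "POST") {
  try {
    echo "Saved as " . basename(save_ocr_text($_POST, __DIR__));
  } catch (Exception $e) {
    http_response_code(400);
    echo $e->getMessage();
  }
}
```

Keeping the validation in a function makes it easy to swap the file write for a database insert later on.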

SOLUTION 3) GOOGLE VISION OCR

If you cannot install anything on a shared server and don’t want to use the JavaScript solution, the only option left is an online service. Google Cloud Vision offers OCR, with the first 1000 requests per month free at the time of writing. The setup is pretty painful though:

Sign up with Google Cloud if you have not already done so.
Follow their setup instructions:
- Create a new project in the Google Cloud console.
- Enable the Vision API.
- Install the Google Cloud CLI.
Download the PHP library, and read through their documentation for examples.
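To give a rough idea of what a request looks like, here is a sketch that skips the PHP library and calls the Vision REST endpoint (images:annotate) directly. It assumes you have created an API key for a project with the Vision API enabled. The function names are made up for this example, so treat it as a starting point and check the official documentation before relying on it.

```php
<?php
// Sketch only: the function names are hypothetical. The endpoint and JSON
// fields follow the public Vision REST API (images:annotate).

// Build the JSON body for a TEXT_DETECTION request.
function vision_request_body(string $imageBytes): string {
  return json_encode(["requests" => [[
    "image" => ["content" => base64_encode($imageBytes)],
    "features" => [["type" => "TEXT_DETECTION"]],
  ]]]);
}

// Pull the recognized text out of a decoded response, or "" if none.
function vision_extract_text(array $response): string {
  return $response["responses"][0]["fullTextAnnotation"]["text"] ?? "";
}

// Send the image to the API. $apiKey is assumed to be a valid API key.
function vision_ocr(string $imgPath, string $apiKey): string {
  $ctx = stream_context_create(["http" => [
    "method" => "POST",
    "header" => "Content-Type: application/json",
    "content" => vision_request_body(file_get_contents($imgPath)),
    "ignore_errors" => true, // still read the body on HTTP errors
  ]]);
  $url = "https://vision.googleapis.com/v1/images:annotate?key=" . urlencode($apiKey);
  $raw = file_get_contents($url, false, $ctx);
  $json = is_string($raw) ? json_decode($raw, true) : null;
  if (!is_array($json)) {
    throw new RuntimeException("Request failed");
  }
  if (isset($json["error"])) {
    throw new RuntimeException($json["error"]["message"] ?? "API error");
  }
  return vision_extract_text($json);
}
```

A call such as vision_ocr(__DIR__ . "/test.png", "YOUR-API-KEY") should then return the text found in the image. The official PHP library handles authentication differently, so this sketch is not a drop-in replacement for it.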

EXTRAS

That’s all for the tutorial, and here is a small section on some extras and links that may be useful to you.

LINKS & REFERENCES

Tesseract – GitHub
Tesseract Documentation – GitHub
Javascript OCR Image To Text – Code Boxx
Tesseract JS
Google Cloud Vision

THE END

Thank you for reading, and we have come to the end. I hope this guide has helped you to better understand how to do OCR in PHP, and if you have anything to share about it, please feel free to comment below. Good luck and happy coding!