Quickly Deploy Powerful Cross-Platform Machine Learning OCR Technology

Are you familiar with the concept of OCR? Wouldn’t it be nice to be able to easily convert images of typed, handwritten or printed text into machine-encoded text? Take a look at the two images below. With just a few lines of code we will make our Windows, Mac, Android or iOS application able to “read” those texts! Whether the source is a scanned document, a photo of a document or the text on signs and billboards in a landscape photo, this process of extracting text from images is called Optical Character Recognition, or Optical Character Reader (OCR).

We can easily use Google OCR machine-learning AI in our Delphi applications

“Text Detection” is the part of the Vision API that we can use to detect and extract information about multiple pieces of text in an image. For the text it detects, Google returns a textAnnotations list giving the words it identified together with their bounding boxes, as well as the structural hierarchy of the OCR-detected text.

Quickly deploy AI vision with an API in your programs

Google Cloud’s Vision API offers powerful pre-trained machine learning models that you can easily use in your desktop and mobile applications through REST or RPC API calls. Let’s say you want your application to detect objects, locations, activities, animal species or products. Maybe you want to detect not only faces but also their emotions, or you need to read printed or handwritten text. All this and much more can be done for free (up to the first 1000 units/month per feature) or at affordable prices that scale with your usage, with no upfront commitments.

How do I get my RAD Studio Delphi applications to detect text in images with an API?

We can easily set up the REST client library in RAD Studio and Delphi to take advantage of Google Cloud’s Vision API and empower our desktop and mobile applications. If the request is successful, the server returns a 200 OK HTTP status code and the response in JSON format.

Our RAD Studio and Delphi applications can call the API in two ways. To run the detection on a local image file, send the contents of the file as a base64-encoded string in the body of the request. Alternatively, point the request at an image file located in Google Cloud Storage or on the Web, in which case there is no need to send the contents of the image file in the body of the request.
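
To make the two options concrete, here is a minimal, language-neutral sketch in Python of the JSON body each one produces. The field names (content, source.imageUri, features) follow Google’s images:annotate request format; the helper function names, the placeholder bytes and the bucket URI are made up for illustration. A Delphi application would send the same JSON through its REST components.

```python
import base64
import json


def image_from_bytes(data: bytes) -> dict:
    """Inline image: the raw bytes travel as a base64 string in `content`."""
    return {"content": base64.b64encode(data).decode("ascii")}


def image_from_uri(uri: str) -> dict:
    """Remote image: only the location (gs:// or http(s)://) is sent."""
    return {"source": {"imageUri": uri}}


def annotate_body(image: dict) -> dict:
    """Wrap one image in a request body that asks for TEXT_DETECTION."""
    return {"requests": [{"image": image,
                          "features": [{"type": "TEXT_DETECTION"}]}]}


# Placeholder bytes stand in for open("photo.jpg", "rb").read().
local_body = annotate_body(image_from_bytes(b"placeholder image bytes"))
# Hypothetical bucket and object name, used only to show the shape.
remote_body = annotate_body(image_from_uri("gs://my-bucket/sign.jpg"))

print(json.dumps(local_body, indent=2))
print(json.dumps(remote_body, indent=2))
```

The remote form keeps the request small, while the inline form works for files that only exist on the device, such as a photo just taken with the phone camera.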

How do I set up the Google Cloud Vision Text Detection API?

Make sure you refer to the Text Detection section of the Google Cloud Vision API documentation (https://cloud.google.com/vision/docs/ocr) and also to Document Text Detection, which is optimized for dense text and handwriting (https://cloud.google.com/vision/docs/pdf). In broad terms, this is what you need to do on Google’s side:

  • Visit https://cloud.google.com/vision and log in with your Google account
  • Create or select a Google Cloud Platform (GCP) project
  • Enable the Vision API for that project
  • Enable billing for that project
  • Create an API key credential

How do I call the Google Vision API Text Detection endpoint?

Now all we need to do is call the API URL with an HTTP POST, passing a JSON request body with the feature type TEXT_DETECTION and the source set to the link of the image we want to analyze. You can do that using the REST client libraries available in several programming languages, and a quick start guide is available in Google’s documentation (https://cloud.google.com/vision/docs/quickstart-client-libraries).
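
As a rough sketch of that call outside Delphi, the Python below builds the POST request using only the standard library. It assumes the public endpoint https://vision.googleapis.com/v1/images:annotate with the API key passed as the key query parameter; the function names are ours, and the API key and image link are placeholders. Nothing is sent until send() is called.

```python
import json
import urllib.parse
import urllib.request

# Assumed public REST endpoint for image annotation.
ENDPOINT = "https://vision.googleapis.com/v1/images:annotate"


def build_request(api_key: str, image_uri: str,
                  max_results: int = 10) -> urllib.request.Request:
    """Build (but do not send) the POST request for TEXT_DETECTION."""
    body = {
        "requests": [{
            "image": {"source": {"imageUri": image_uri}},
            # maxResults mirrors the field in the sample app; Google's
            # Feature reference suggests it may be ignored for
            # TEXT_DETECTION, so treat it as optional.
            "features": [{"type": "TEXT_DETECTION",
                          "maxResults": max_results}],
        }]
    }
    url = ENDPOINT + "?" + urllib.parse.urlencode({"key": api_key})
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request) -> dict:
    """Send the request and decode the JSON response.

    urlopen raises urllib.error.HTTPError for non-2xx status codes,
    so reaching the return statement means the call succeeded.
    """
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


# Placeholder key and image link; replace both before calling send().
request = build_request("YOUR_API_KEY", "https://example.com/plate.jpg")
print(request.get_method(), request.full_url)
```

Keeping the build step separate from the send step makes it easy to inspect the exact URL and body before spending any of the free monthly units.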

In fact, at the bottom of the Google Cloud Vision documentation guide (https://cloud.google.com/vision/docs/ocr) there is a Try This API option that allows you to post the JSON request body as shown below and get the JSON response as follows.

What does the Google Vision API Text Detection endpoint return to my application?

After the call, the main result is a list in which each entry has a description field containing the extracted text and a bounding polygon showing where in the image the text was found. You can use the polygon information to draw a box on top of the image and highlight the text. Below is the result we got when using the request above with the link for the image containing the Mercedes car plate number. Now go ahead and try it with the image containing the handwritten note available at this link.
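
To show how the description and the bounding polygon can be turned into something drawable, here is a small Python sketch. The SAMPLE value is hand-written to imitate the shape of a textAnnotations response; it is not real API output, and the coordinates are invented. The sketch also assumes that a vertex may omit x or y when the value is 0, so it reads them with a default.

```python
import json

# Hand-written illustration of the response shape, NOT real API output.
SAMPLE = json.loads("""
{"responses": [{"textAnnotations": [
  {"locale": "en", "description": "HELLO WORLD",
   "boundingPoly": {"vertices": [{"x": 10, "y": 20}, {"x": 210, "y": 22},
                                 {"x": 209, "y": 62}, {"x": 9, "y": 60}]}},
  {"description": "HELLO",
   "boundingPoly": {"vertices": [{"x": 10, "y": 20}, {"x": 100, "y": 21},
                                 {"x": 100, "y": 61}, {"x": 9, "y": 60}]}}
]}]}
""")


def bounding_rect(annotation: dict) -> tuple:
    """Return (left, top, right, bottom) enclosing the polygon vertices."""
    vertices = annotation["boundingPoly"]["vertices"]
    xs = [v.get("x", 0) for v in vertices]  # missing x is treated as 0
    ys = [v.get("y", 0) for v in vertices]  # missing y is treated as 0
    return min(xs), min(ys), max(xs), max(ys)


def extract_text(response: dict) -> list:
    """List (description, rectangle) pairs for the first image."""
    annotations = response["responses"][0].get("textAnnotations", [])
    return [(a["description"], bounding_rect(a)) for a in annotations]


for text, rect in extract_text(SAMPLE):
    print(text, rect)
```

The polygon is not always axis-aligned, because text in a photo can be tilted. Taking the minimum and maximum of the vertices gives the smallest upright rectangle that still covers it, which is usually enough for a highlight.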

How do I connect my applications to the Google Cloud Vision Text Detection API?

Once you have followed the basic steps to set up the Text Detection API on Google’s side, go to the Console, open the Credentials menu item, click the Create Credentials button and add an API key. Copy this key, as we will need it later.

RAD Studio Delphi and C++Builder make it very easy to connect to APIs, because you can use the REST Debugger to automatically create the REST components and paste them into your app.

In Delphi, three components do all the work of making the API call: TRESTClient, TRESTRequest, and TRESTResponse. Once you have connected successfully in the REST Debugger and copied and pasted the components, you will notice that the API URL is set in the BaseURL of TRESTClient. On the TRESTRequest component you will see that the request type is set to rmPOST, the ContentType is set to ctAPPLICATION_JSON, and that it contains one request body for the POST.

Start RAD Studio Delphi and, on the main menu, click Tools > REST Debugger. Configure the REST Debugger by setting the content type to application/json and adding the POST URL, the JSON request body and the API key you created. Once you click the Send Request button you should see the JSON response, just as we demonstrated above.

Check the video below for more details on how to configure and test Google Cloud Vision API text detection and other features using the REST Debugger.

How do I build a Windows desktop or Android/iOS mobile device application using the Google Cloud Vision API Text Detection?

Now that you have successfully configured and tested your API calls in the REST Debugger, just click the Copy Components button, go back to Delphi, create a new application project and paste the components onto your main form.

Very simple code is added to a TButton OnClick event to make sure everything is configured correctly, and voilà! In five minutes we have made our very first call to the Google Vision API, and we can receive a JSON response for whatever image we want to run Text Detection on. Please note that on the TRESTResponse component the RootElement is set to ‘responses[0].textAnnotations’. This means that the ‘textAnnotations’ element in the JSON is specifically selected to be pulled into the in-memory table (TFDMemTable).
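
To picture what selecting a root element does, here is a small Python analogy. It is not how the Delphi components are implemented; it only mimics the idea of walking a path such as responses[0].textAnnotations and turning the selected list into flat rows, one per detected text. The select_root and to_rows helpers and the sample document are made up for illustration.

```python
import re


def select_root(document, path: str):
    """Walk a path like 'responses[0].textAnnotations' through parsed JSON."""
    node = document
    for part in path.split("."):
        match = re.fullmatch(r"(\w+)(?:\[(\d+)\])?", part)
        if match is None:
            raise ValueError(f"unsupported path segment: {part!r}")
        node = node[match.group(1)]             # object member lookup
        if match.group(2) is not None:
            node = node[int(match.group(2))]    # optional array index
    return node


def to_rows(annotations: list) -> list:
    """Flatten each annotation into a row with simple scalar columns."""
    return [{"locale": a.get("locale", ""),
             "description": a.get("description", "")}
            for a in annotations]


# Hand-written sample in the shape of the API response, not real output.
document = {"responses": [{"textAnnotations": [
    {"locale": "en", "description": "HELLO WORLD"},
    {"description": "HELLO"},
    {"description": "WORLD"},
]}]}

for row in to_rows(select_root(document, "responses[0].textAnnotations")):
    print(row)
```

Choosing the root element this way is what lets a grid bind directly to the list of detected texts instead of to the whole response envelope.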

The sample application features a TEdit where you paste the link to the image you want to analyze, another TEdit for the maxResults parameter, a TMemo to display the JSON results of the REST API call, and a TStringGrid to navigate and display the data in tabular form, demonstrating how easily the JSON response integrates with a TFDMemTable component. When the button is clicked, the image is analyzed and the application presents the response JSON both as text and as data in a grid. Now you have everything you need to work with the response data and make your application process the information in the way that best suits your needs!

Detect text in your programs using an AI API

In this blog post we’ve seen how to sign up for the Google Cloud Vision API in order to perform Text Detection on images. We’ve seen how to use the RAD Studio REST Debugger to connect to the endpoint and copy the resulting components into a real application. And finally we’ve seen how easy and fast it is to use RAD Studio Delphi to create a real Windows (and Linux and macOS and Android and iOS) application that connects to the Google Cloud Vision API, runs Text Detection image analysis and returns the result as a memory dataset ready for you to iterate over!

Head over to the following link to download the example source code for the desktop and mobile Google Cloud Vision API Text Detection REST demo: https://github.com/checkdigits/google_text_detection_api_delphi_example