Detecting Objects in Images Using the Google Cloud Vision API

Let's take a look at the image below. Let's try to think of two or three objects we can see and focus on what catches our attention the most. If we write them down, let's see if Google can guess them right.

Using Google's cloud-based vision API with Delphi

Google’s cloud-based vision API – making sense of what we see and much more

Google Cloud’s Vision API offers powerful pre-trained machine learning models that you can easily use on your desktop and mobile applications through REST or RPC API methods calls. Lets say you want your application to detect objects, locations, activities, animal species, and products. Or maybe you want not only to detect faces but also their emotion expressed on the faces. Or perhaps you have the need to read printed or handwritten text. All of this and much more is possible to be done for free (up to first 1000 units/month per feature) or at very affordable prices and it’s scalable too with no upfront commitments.

Object localization

“Object Localization” is a feature of the Vision API that we can use to detect and extract information about multiple objects in an image. For each object detected, the following elements are returned:

  • A textual description: what is it, in plain human language?
  • A confidence score: how certain is the API of what it has detected?
  • Normalized vertices (values in the range [0,1]) for the bounding polygon around the object: where is the object in the image?
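As an illustrative sketch, a single entry in the `localizedObjectAnnotations` array of the response carries exactly those three elements (the values shown here are made up):

```json
{
  "name": "Umbrella",
  "score": 0.92,
  "boundingPoly": {
    "normalizedVertices": [
      { "x": 0.11, "y": 0.20 },
      { "x": 0.38, "y": 0.20 },
      { "x": 0.38, "y": 0.54 },
      { "x": 0.11, "y": 0.54 }
    ]
  }
}
```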

Using RAD Studio Delphi to control the Google Cloud Vision API

We can use RAD Studio and Delphi to easily set up the REST client library and take advantage of Google Cloud's Vision API in our desktop and mobile applications. If the request is successful, the server returns a 200 OK HTTP status code and the response in JSON format.

Our RAD Studio and Delphi applications will be able to call the API and perform detection either on a local image file, by sending the contents of the image file as a base64-encoded string in the body of the request, or on an image file located in Google Cloud Storage or on the web, without needing to send the image contents in the request body at all.

How do I set up the Google Cloud Vision Object Localization API?

Make sure you refer to the Object Localization section of the Google Cloud Vision API documentation, but in general terms this is what you need to do on Google's side:

  • Visit the Google Cloud Console and log in with your Google account
  • Create or select a Google Cloud Platform (GCP) project
  • Enable the Vision API for that project
  • Enable billing for that project
  • Create an API key credential

How do I call Google Vision API Object Localization endpoint?

Now all we need to do is call the API URL via an HTTP POST, passing a JSON request body with the feature type OBJECT_LOCALIZATION and the source link to the image we want to analyze. You can do that using the REST client libraries available for several programming languages; a quick-start guide is available in Google's documentation.
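With an API key in hand, the request goes to the Vision API's images:annotate endpoint (https://vision.googleapis.com/v1/images:annotate?key=YOUR_API_KEY). A minimal request body might look like the sketch below; the image URL is a placeholder, and maxResults is optional:

```json
{
  "requests": [
    {
      "image": {
        "source": { "imageUri": "https://example.com/beach.jpg" }
      },
      "features": [
        { "type": "OBJECT_LOCALIZATION", "maxResults": 10 }
      ]
    }
  ]
}
```

For a local image file, the `"source"` object is replaced by a `"content"` field holding the base64-encoded image bytes.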

At the bottom of the Google Cloud Vision documentation guide there is a “Try this API” option which allows you to post the JSON request body and inspect the JSON response.

What does the Google Vision API Object Localization endpoint return?

After the call, the result will be a list containing each object's description, a confidence score (which ranges from 0, no confidence, to 1, very high confidence), and a bounding polygon showing where in the image the object was found. You can use the polygon information to draw a rectangle on top of the image and highlight the objects, so the final result would be something like the image shown below.
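As a rough sketch of how the normalized [0,1] vertices map back to pixel coordinates for that highlight rectangle, assuming a VCL application and a hypothetical helper procedure (the name and parameters are mine, not from the API):

```pascal
// Outline one detected object on a TCanvas. AMinX..AMaxY are the extremes
// of the boundingPoly's normalizedVertices, each in the range [0,1].
procedure DrawObjectBox(ACanvas: TCanvas; AImageWidth, AImageHeight: Integer;
  AMinX, AMinY, AMaxX, AMaxY: Double);
begin
  ACanvas.Brush.Style := bsClear;   // outline only, keep the image visible
  ACanvas.Pen.Color := clRed;
  ACanvas.Pen.Width := 2;
  ACanvas.Rectangle(
    Round(AMinX * AImageWidth), Round(AMinY * AImageHeight),
    Round(AMaxX * AImageWidth), Round(AMaxY * AImageHeight));
end;
```

Scaling by the image's pixel width and height is all that is needed because the API already normalizes the vertices to the image dimensions.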

For this specific image I was expecting Google to return additional objects such as purse/bag and flip-flops/shoes. But as far as I could see, the API focuses on the most prominent and relevant objects in the overall scene.

What is the difference between the Google Cloud Vision label detection and object localization APIs?

In a previous article we went through Google's Cloud Vision Label Detection feature, and at this point it is worth checking the differences between the two. The Label Detection feature is far more broad-ranging and can identify general objects, locations, activities, animal species, products, and more, but it returns no bounding-polygon localization information.

If we feed this very same image into the Label Detection feature, the result is “Water, Sky, Umbrella, People on beach, Blue”. As you can see, it includes activities, colors, and more general information about the things we can find in the image. In some ways it's a more complete description of the image.

The list of detected objects is capped at the maximum configured via the maxResults parameter in the request JSON. It's the perfect starting point for creating a simple “spot the objects” game.

Using Google's cloud-based vision API with Delphi - detected labels

How do I connect my applications to Google Cloud Vision Object Localization API?

Once you have followed the basic steps to set up the Object Localization API on Google's side, go to the Console and, under the Credentials menu item, click the “Create Credentials” button and add an API key. Copy this key, as we will need it later.

Using Google's cloud-based vision API with Delphi - configuring your API

RAD Studio makes it easy to call the Google Cloud Vision API with REST

RAD Studio Delphi and C++Builder make it very easy to connect to APIs, as you can use the REST Debugger to automatically create the REST components and paste them into your app.

In Delphi, all the work of making the API call is done using three components: TRESTClient, TRESTRequest, and TRESTResponse. Once you connect successfully with the REST Debugger and copy and paste the components, you will notice that the API URL is set in the BaseURL of the TRESTClient. On the TRESTRequest component you will see that the request type is set to rmPOST, the ContentType is set to ctAPPLICATION_JSON, and that it contains one request body for the POST.

Run RAD Studio Delphi and, on the main menu, click Tools > REST Debugger. Configure the REST Debugger as follows: set the content type to application/json, and add the POST URL, the JSON request body, and the API key you created. Once you click the “Send Request” button you should see the JSON response, just like we demonstrated above.

How do I build a Windows desktop or Android/iOS mobile device application using the Google Cloud Vision API Object Localization?

Now that you have successfully configured and tested your API calls in the REST Debugger, just click the “Copy Components” button, go back to Delphi, and create a new application project. Then paste the components onto your main form.

Add some very simple code to a TButton OnClick event to make sure everything is configured correctly, and it's done! In around five minutes we have made our very first call to the Google Vision API, and we are able to receive a JSON response for whatever images we want to perform Object Localization on. Please note that on the TRESTResponse component the RootElement is set to ‘responses[0].localizedObjectAnnotations’. This means that the ‘localizedObjectAnnotations’ element of the JSON is specifically selected to be pulled into the in-memory table (TFDMemTable).
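A minimal sketch of what that OnClick handler might look like; the component names follow the REST Debugger defaults, while edtImageURL and memoResponse are hypothetical control names for this example:

```pascal
procedure TForm1.btnAnalyzeClick(Sender: TObject);
begin
  // Build the request body with the image link typed by the user.
  RESTRequest1.ClearBody;
  RESTRequest1.AddBody(
    '{"requests":[{"image":{"source":{"imageUri":"' + edtImageURL.Text + '"}},' +
    '"features":[{"type":"OBJECT_LOCALIZATION","maxResults":10}]}]}',
    ctAPPLICATION_JSON);

  RESTRequest1.Execute;  // synchronous POST to the Vision API

  // Show the raw JSON; a TRESTResponseDataSetAdapter wired to the
  // TRESTResponse fills the TFDMemTable from RootElement automatically.
  memoResponse.Lines.Text := RESTResponse1.Content;
end;
```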

Example code showing how to call the Google Cloud Vision API in Delphi

The sample application features a TEdit as a place to paste the link to the image you want to analyze, another TEdit for the maxResults parameter, a TMemo to display the JSON results of the REST API call, and a TStringGrid component to navigate and display the data in tabular form. All this demonstrates how easily the JSON response integrates with a TFDMemTable component. When the button is clicked, the image is analyzed and the application presents the response JSON both as text and as data in a grid. Now you have everything you need to consume the response data and make your application process the information in the way that best suits your needs!

Using Google's cloud-based vision API with Delphi - showing the REST results in the IDE

Summary of what we learned in this article

In this blog post we've seen how to sign up for the Google Cloud Vision API in order to perform Object Localization on images. We've seen how to use the RAD Studio REST Debugger to connect to the endpoint and copy that setup into a real application. And finally, we've seen how easy and fast it is to use RAD Studio Delphi to create a real Windows (and Linux, macOS, Android, and iOS) application which connects to the Google Cloud Vision API, executes Object Localization image analysis, and returns a memory dataset ready for you to iterate over!

You can download the full example code for the desktop and mobile Google Cloud Vision API Object Localization REST demo here:

Are you ready to make your programs understand real world images? It’s easy with RAD Studio Delphi!