How Real Estate Sites Use Image Recognition For A Better Visual Search Experience

In today’s digital world, real estate shopping has moved almost exclusively online. Real estate listing portals, such as Zillow or Trulia, have ushered in an era in which visual search is the future of buying or selling your home online. In fact, they’ve become so popular that many of the top real estate portals are managing over one million new images per day: sellers and realtors want to present their homes in the best possible light, and users demand to see every nook and cranny before actually visiting a property.

To keep pace with the onslaught of new property listings and unprecedented traffic from potential home buyers, real estate portals have been desperately looking for ways to manage their visual assets, i.e., the photos and videos of properties uploaded by users.

Understanding The Challenge


You have to realize that, along with the benefits of user-generated content (e.g., access to millions upon millions of uploaded photos for virtual property tours), real estate portals face the seemingly unmanageable task of collecting, organizing and displaying that content in an efficient, easy-to-navigate manner.

First, as a real estate portal, how will you go about sorting through all of those images? Well, you have a few options:

  1. Do nothing - Also known as the “Hide My Head In The Sand And Hope The Problem Goes Away” approach. Sadly, this is the most common solution currently employed by real estate portals. These platforms will die a slow death as users abandon them in search of a better user experience.
  2. Do it in-house - That’s right. Ask your team to sift through millions of photos per day in order to properly classify them for quality, accuracy, copyrights and illegal content. I’m sure that will go over well at your next team meeting.
  3. Crowdsource it - Perhaps a percentage of your users will follow your instructions well enough to “self-classify” the images, but this hinders the user experience and creates an unnecessary barrier to attracting new home sellers to your platform. Alternatively, you could outsource it, but this gets expensive fast and the process is painfully slow.
  4. Teach a machine to do it! Ding, ding, ding. We have a winner, folks. With advances in image recognition, computer vision and deep learning techniques, machines are getting smarter at understanding visual images.
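To make option 4 a little more concrete, here is a minimal sketch of how a portal might triage each upload once a model has returned per-image scores. The score names (`quality`, `copyright_risk`, `explicit`) and the thresholds are hypothetical placeholders, not part of any specific API. The idea is that confident cases are handled automatically and only the uncertain middle band is sent to a human.

```python
from dataclasses import dataclass


@dataclass
class ImageScores:
    """Hypothetical per-image scores in [0, 1] from some image classifier."""
    quality: float         # higher = better-looking photo
    copyright_risk: float  # higher = more likely watermarked or copied
    explicit: float        # higher = more likely illegal or inappropriate


# Hypothetical thresholds; a real portal would tune these on labeled data.
REJECT_AT = 0.9
REVIEW_AT = 0.5
MIN_QUALITY = 0.3


def triage(scores: ImageScores) -> str:
    """Return 'reject', 'review', or 'accept' for one uploaded image."""
    risk = max(scores.copyright_risk, scores.explicit)
    if risk >= REJECT_AT:
        return "reject"  # confidently problematic: block automatically
    if risk >= REVIEW_AT or scores.quality < MIN_QUALITY:
        return "review"  # uncertain or low quality: route to a human
    return "accept"      # confidently fine: publish without manual work


if __name__ == "__main__":
    uploads = {
        "kitchen.jpg": ImageScores(quality=0.8, copyright_risk=0.1, explicit=0.0),
        "blurry.jpg": ImageScores(quality=0.2, copyright_risk=0.1, explicit=0.0),
        "watermarked.jpg": ImageScores(quality=0.7, copyright_risk=0.95, explicit=0.0),
    }
    for name, s in uploads.items():
        print(name, triage(s))
```

With these placeholder thresholds, the sharp kitchen photo is accepted, the blurry one goes to review, and the watermarked one is rejected, so human effort is spent only on the middle band.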

Given the sheer volume of images that real estate listing sites are receiving, crowdsourcing the organization of your visual content is a non-starter. It would simply take too long to be relevant. It’s for this reason that we help real estate listing portals develop their own customized image recognition capabilities. Here’s how we do it.

Enabling Visual Browsing

We provide an already trained real estate vision API. Thanks to our ‘plug and play’ approach, a portal can start using it in a matter of minutes with a single RESTful API call. Several ready-made image recognition models cover almost all needs, including different country-specific requirements. However, when a portal has different needs or serves a niche market, we can customize our ConvNet accordingly, creating and training a tailored algorithm.
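As a rough illustration of what a single RESTful call can look like, the sketch below builds a JSON POST request with Python's standard library and parses a JSON response into tags. The endpoint URL, the `image_url` and `model` fields, and the response shape are all hypothetical; the real field names would come from the API's own documentation. The request is only built here, not sent, so the example runs offline.

```python
import json
import urllib.request

# Hypothetical endpoint, used for illustration only.
API_URL = "https://api.example.com/vision/real-estate/predict"


def build_request(image_url: str, model: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a JSON POST asking a vision API to tag one image."""
    payload = json.dumps({"image_url": image_url, "model": model}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=payload,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",  # hypothetical auth scheme
        },
        method="POST",
    )


def parse_tags(body: str, min_confidence: float = 0.5) -> list[str]:
    """Extract tag names above a confidence cutoff from a hypothetical JSON response."""
    data = json.loads(body)
    return [t["label"] for t in data.get("tags", []) if t["confidence"] >= min_confidence]


# Sending the request would look like:
#   body = urllib.request.urlopen(req).read().decode("utf-8")
#   tags = parse_tags(body)
```

For example, a response body of `{"tags": [{"label": "kitchen", "confidence": 0.9}, {"label": "pool", "confidence": 0.2}]}` yields `["kitchen"]` with the default cutoff.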

In a perfect world, our image recognition model would visually detect every feature we collected in our list of relevant real estate terms and phrases, including specific scenes (e.g., a backyard swimming pool), objects (e.g., a fireplace), materials (e.g., tile floors), and so on. While our convolutional neural network has already been trained on massive amounts of annotated data for the real estate sector, user photos and videos often portray scenes cluttered with various objects and materials, which makes accurate detection difficult “out of the box”.
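Because one cluttered photo can legitimately show a scene, several objects and several materials at once, this kind of detection is commonly framed as multi-label classification: each label gets its own independent probability (a sigmoid per label) instead of a softmax that forces a single winner. The sketch below decodes hypothetical per-label scores (logits) into grouped tags; the vocabulary and the 0.5 threshold are illustrative assumptions, not our model's actual label set.

```python
import math

# Hypothetical label vocabulary, grouped the way the text describes.
VOCABULARY = {
    "scene": ["kitchen", "living_room", "backyard_pool"],
    "object": ["fireplace", "island", "ceiling_fan"],
    "material": ["tile_floor", "hardwood_floor", "granite_countertop"],
}


def sigmoid(x: float) -> float:
    """Numerically stable logistic function (avoids overflow for large |x|)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def decode_multilabel(logits: dict[str, float], threshold: float = 0.5) -> dict[str, list[str]]:
    """Turn raw per-label logits into detected labels, grouped by type.

    Each label is scored independently, so a single photo can carry a scene,
    several objects and several materials at the same time.
    """
    detected: dict[str, list[str]] = {group: [] for group in VOCABULARY}
    for group, labels in VOCABULARY.items():
        for label in labels:
            if label in logits and sigmoid(logits[label]) >= threshold:
                detected[group].append(label)
    return detected
```

For a living-room photo with hypothetical logits `{"living_room": 2.0, "fireplace": 1.5, "hardwood_floor": 0.3, "kitchen": -2.0}`, the decoder returns one scene, one object and one material rather than a single "best" label.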

Fortunately, the vast majority of images can be correctly categorized into a fixed set of scenes, objects and features. We then use our proprietary deep learning framework to train our ConvNets to learn and adapt to your specific real estate portal in order to fill in any blanks. Training our visual models for your portal is an intensive process with human oversight and takes approximately 1-3 weeks. Once the calibration is complete, our image recognition can automatically annotate any relevant characteristics within its visual field with human-like accuracy!
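The training framework mentioned above is proprietary and is not described here. As a generic illustration of one common way to adapt a pretrained ConvNet to a new domain, the sketch below keeps the network frozen and fits only a small logistic-regression "head" on its embeddings for a single portal-specific tag. The embeddings, the tag and the hyperparameters are hypothetical; real fine-tuning would normally use a deep learning framework and far more data.

```python
import math
import random


def predict_proba(w: list[float], b: float, x: list[float]) -> float:
    """Probability that embedding x carries the tag, given weights w and bias b."""
    z = sum(wj * xj for wj, xj in zip(w, x)) + b
    # Numerically stable sigmoid: avoids math.exp overflow for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_head(features: list[list[float]], labels: list[int],
               epochs: int = 200, lr: float = 0.5, seed: int = 0):
    """Fit a logistic-regression head on frozen embeddings with full-batch gradient descent.

    features: equal-length vectors, e.g. hypothetical penultimate-layer ConvNet outputs
    labels:   0/1 values for one portal-specific tag, e.g. "open kitchen"
    """
    rng = random.Random(seed)
    dim = len(features[0])
    w = [rng.uniform(-0.01, 0.01) for _ in range(dim)]
    b = 0.0
    n = len(features)
    for _ in range(epochs):
        grad_w = [0.0] * dim
        grad_b = 0.0
        for x, y in zip(features, labels):
            err = predict_proba(w, b, x) - y  # derivative of log-loss w.r.t. the logit
            for j in range(dim):
                grad_w[j] += err * x[j]
            grad_b += err
        # One gradient step on the mean log-loss over the whole batch.
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b
```

Only the head's few parameters are learned, which is why this style of adaptation needs comparatively little labeled portal data; the human oversight mentioned above would typically go into producing and checking those labels.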

We now have all of the required elements to enable visual browsing on your real estate portal. Predetermined real estate vocabulary from your property descriptions can be combined with our ConvNet visual model to automatically generate searchable visual labels for every single photo on your listing portal as they are being uploaded by your users.
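A minimal sketch of that upload-time step, assuming the model returns a list of label strings per photo and the portal keeps a fixed vocabulary mined from its descriptions (both hypothetical): labels that belong to the vocabulary go into an inverted index mapping each term to the photos that show it, so searches become fast set lookups.

```python
from collections import defaultdict


class LabelIndex:
    """Inverted index from vocabulary term to the set of photo IDs showing it."""

    def __init__(self, vocabulary):
        self.vocabulary = set(vocabulary)
        self.postings = defaultdict(set)

    def add_photo(self, photo_id, visual_labels):
        """Called as each photo is uploaded; keeps only labels in the portal vocabulary."""
        for label in visual_labels:
            if label in self.vocabulary:
                self.postings[label].add(photo_id)

    def search(self, *terms):
        """Return the IDs of photos tagged with every requested term."""
        sets = [self.postings.get(t, set()) for t in terms]
        return set.intersection(*sets) if sets else set()


# Hypothetical vocabulary mined from existing property descriptions.
index = LabelIndex({"open kitchen", "hardwood floors", "fireplace", "swimming pool"})
index.add_photo("p1", ["open kitchen", "hardwood floors", "granite"])
index.add_photo("p2", ["open kitchen"])
```

Here `index.search("open kitchen", "hardwood floors")` returns `{"p1"}`, while `"granite"` is never indexed because it is outside the vocabulary.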

Optimizing The Search Experience


You may be asking yourself why this is so important. Can’t we just search the written descriptions? Well, you could, but we’re living in a visual world, and I am a visual girl. People are scrolling through properties on their smartphones and tablets during their lunch break or while stuck in traffic. They don’t type! In other words, nobody’s searching for their next home on Craigslist anymore. We NEED visual search to improve user engagement.

Real estate photos and videos come in a wide variety of quality and content. It’s critical for your users to be presented with only the most relevant and attractive images at the beginning of every single search they run on your platform.
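One simple way to put the most relevant and attractive images first is to blend a per-photo quality score with how many of the user's search terms the photo's labels match. This is an illustrative heuristic, not the ranking used by any particular portal; the `quality` score and the 0.4 weight are assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Photo:
    photo_id: str
    quality: float                            # hypothetical attractiveness score in [0, 1]
    labels: set = field(default_factory=set)  # visual labels from the recognition model


def rank_photos(photos, query_terms, quality_weight=0.4):
    """Order photos so those matching the search and looking best come first.

    relevance = fraction of query terms found in a photo's labels
    score     = (1 - quality_weight) * relevance + quality_weight * quality
    """
    terms = set(query_terms)

    def score(p: Photo) -> float:
        relevance = len(terms & p.labels) / len(terms) if terms else 0.0
        return (1 - quality_weight) * relevance + quality_weight * p.quality

    return sorted(photos, key=score, reverse=True)
```

With this weighting, a mediocre photo that shows both an open kitchen and hardwood floors outranks a gorgeous bathroom shot for that search, while an empty search falls back to ordering purely by quality.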

By leveraging our image recognition process, users are now able to access a rich database of characteristics for every property they search for within the portal, information that was otherwise inaccessible. For example, say you wanted to see every property featuring an “open kitchen layout” with “hardwood floors” in a certain zip code of San Francisco, CA. Rather than your team manually assigning those values for filters and Boolean search, users could simply type in “open kitchen with hardwood floors in the Marina or Russian Hill”. Your algorithm has already tagged the images of any properties matching that natural language description, and each match is then verified against both the related property descriptions AND the images associated with those listings.
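The example query above can be sketched with plain phrase matching. The feature and neighborhood vocabularies below are hypothetical and tiny, and a production system would use real language understanding rather than substring checks. What the sketch does show is the double check described in the text: a listing matches only if its photos carry each requested tag AND its description mentions it.

```python
import re
from dataclasses import dataclass, field

# Hypothetical vocabularies; a real portal would use far larger curated lists.
FEATURE_TERMS = {"open kitchen", "hardwood floors", "fireplace", "swimming pool"}
NEIGHBORHOODS = {"marina", "russian hill", "mission"}


def normalize(text: str) -> str:
    """Lowercase, keep only letter runs, and pad with spaces for whole-phrase matching."""
    return " " + " ".join(re.findall(r"[a-z]+", text.lower())) + " "


@dataclass
class Listing:
    listing_id: str
    neighborhood: str
    description: str
    image_tags: set = field(default_factory=set)  # visual labels across the listing's photos


def parse_query(query: str):
    """Pull known feature phrases and neighborhood names out of free text."""
    text = normalize(query)
    features = {t for t in FEATURE_TERMS if f" {t} " in text}
    places = {n for n in NEIGHBORHOODS if f" {n} " in text}
    return features, places


def search(listings, query: str):
    """Return IDs of listings whose photos AND description confirm every requested feature."""
    features, places = parse_query(query)
    matches = []
    for listing in listings:
        if places and listing.neighborhood.lower() not in places:
            continue
        desc = normalize(listing.description)
        if all(f in listing.image_tags and f" {f} " in desc for f in features):
            matches.append(listing.listing_id)
    return matches
```

For the query “open kitchen with hardwood floors in the Marina or Russian Hill”, a Russian Hill listing whose text mentions hardwood floors but whose photos show none is filtered out, which is exactly the kind of mismatch the image check is meant to catch.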

These are just a few examples of how our image recognition API is improving the search experience for millions of users in the real estate industry. We’ll be exploring this concept much further in future articles, as we continue to innovate our machine learning techniques and extract even more useful information from our clients’ photos and videos.
