Semantic Image Understanding

Peter Krogh
Fri May 08 2020

In 1984, Apple unveiled the Macintosh computer, which unleashed a desktop-publishing (DTP) and word-processing revolution. Tools that had previously been used only by a small number of trained professionals were suddenly in the hands of nearly everyone and soon became essential to many jobs and to the general functioning of society. Mobile phones are doing the same thing with visual media.

It’s hard to imagine, but it took 20 years from the start of the DTP revolution until full drive indexing came to your computer. (You know, that thing you take for granted, where you type a bit of text and every document on your computer containing that text shows up in a list?) In the interim, there was no good way to file and find specific documents other than by file and folder names. It was clunky and time-consuming, and it was very easy to lose important stuff.

We are at a similar point in the development of photographic speech. We’re experiencing a flood of new files to manage, but the tools to store, tag, and find them lag far behind. In large part, this is because we don’t have a good notion of the semantics of images.

What are image semantics?

Semantics is loosely defined as the study of meaning in a language. As we think about speaking the language of imagery, we will need a more formal notion of content, context, and meaning. That notion needs to account for the following elements:

  • Denotative elements - This is the who, what, when, where, and why of an image’s subject matter. Many of the mature metadata tools have focused on this, starting with IPTC long before the digital photo revolution. The stock photography industry also advanced this work, since it had an economic reason to develop better ways to tag and search vast image collections for sales and licensing. AI tools are now driving it forward.
  • Object graph - In a language spoken with objects, the object’s path, its proliferation, and its connections become a deeply important part of understanding the meaning and importance of the image.
  • Creator knowledge and intent - It is often essential to know the photographer’s intent in order to fully understand the meaning or importance of an image. Was an image captured (and shared) to show a specific thing? Was this a good thing or a bad thing? Visual media can hold a lot of information, and it can be really helpful to know which part the creator intended you to pay attention to.
  • Viewer perspective - You can’t determine meaning without determining the relationship between the image and the person viewing it. The denotative information and the object graph help determine whether an object has meaning to me, and that meaning may differ from its meaning to others, depending on my personal graph or my cultural perspective.

Informatics and discovery

Image semantics falls under the rubric of informatics: the study of the interaction between people and information systems. Ultimately, we need a way to sift through images to find the ones that suit our needs. Sometimes this will be easy. But as your needs become more complex, as your collection grows, and as you seek to use visual media from other collections, the semantics problem becomes both harder and more important.

There are several structural approaches to the discovery problem:

  • Simple search and filtering - The familiar tools we use to search our own collections will continue to be important. If you know the date an image was taken, a simple filter may be the easiest way to find it. Computational tagging services will improve search and filtering, which will help as collections expand.
  • Searching within identity-aware services - When you search with Google, the search is informed by what Google knows about you. This might be your current location, which helps surface locally relevant results. Siri and Google know far more about you than your location and can, for instance, guess whether you mean “horses” or “cars” when you search for “racing.”
  • Intelligent local agents - We may also see intelligent search capabilities that run locally on private collections and can take the person searching into account, with that knowledge held by the library owner rather than locked away in a social media platform or giant web service.
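To make the first two approaches a little more concrete, here is a minimal Python sketch. It assumes a hypothetical in-memory catalog in which each image carries a capture date and a set of keyword tags (some of which might come from a computational tagging service), plus a hypothetical set of the searcher’s interests used to rerank results, roughly the way an identity-aware service might decide between “horses” and “cars” for “racing.” The names `ImageRecord`, `search`, and `rank_for_user` are illustrative only and are not any real product’s API.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import date


@dataclass
class ImageRecord:
    # Hypothetical catalog entry: file name, capture date, and keyword tags.
    filename: str
    date_taken: date
    tags: set[str] = field(default_factory=set)


def search(catalog, keyword=None, start=None, end=None):
    """Simple search and filtering: optional keyword match plus optional date range."""
    results = []
    for img in catalog:
        if keyword is not None and keyword not in img.tags:
            continue
        if start is not None and img.date_taken < start:
            continue
        if end is not None and img.date_taken > end:
            continue
        results.append(img)
    return results


def rank_for_user(results, interests):
    """Hypothetical identity-aware reranking: images sharing more tags with the
    searcher's interests come first. sorted() is stable (even with reverse=True),
    so ties keep their original order."""
    return sorted(results, key=lambda img: len(img.tags & interests), reverse=True)


catalog = [
    ImageRecord("IMG_0001.jpg", date(2019, 5, 4), {"racing", "horses", "derby"}),
    ImageRecord("IMG_0002.jpg", date(2019, 5, 26), {"racing", "cars", "speedway"}),
    ImageRecord("IMG_0003.jpg", date(2020, 1, 1), {"fireworks", "night"}),
]

# The keyword alone is ambiguous: both racing images match.
hits = search(catalog, keyword="racing")

# Knowing something about the searcher resolves the ambiguity.
for img in rank_for_user(hits, interests={"horses"}):
    print(img.filename)
```

Swapping the interests to `{"cars"}` puts the speedway shot first. The tag-overlap score is only a stand-in for the much richer signals a real service would use; because the profile here is just a local variable, the same idea could in principle run as a local agent without sending anything to a web service.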

Image semantics is a young field with a lot of growing to do. The exact path is uncertain, but the field is sure to grow, because the problem, and the value of solving it, keeps growing. Using new tools for visual semantics will require that media be collected, preserved, and kept accessible.

Next week, we’ll take a look at the media library ecosystem - what your tools need to accomplish and how to evaluate them.
