We are devoting this entire month to metadata. (Yes, we know it’s technically still June, but every day feels like Groundhog Day.) We will explore how metadata works, what you can do with it, and how to create a great metadata strategy. Let’s get going.
Metadata is a term that generally refers to data about data. This ranges from the mundane (the format of a file or where it’s located) to the sublime (what kinds of concepts a photo helps illustrate or the structure of complex narratives).
You can use metadata to describe just about any characteristic of a media item: where it was created, how much you like it, specific subject matter, usage rights, and more.
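To make this concrete, here is a minimal sketch in Python of the kinds of characteristics described above, attached to a single image. The field names and values are invented for illustration, not drawn from any particular standard.

```python
# A hypothetical metadata record for one photo. The field names and
# values are illustrative only.
photo_metadata = {
    "format": "JPEG",                      # technical: file format
    "location": "Paris, France",           # descriptive: where it was created
    "rating": 4,                           # subjective: how much you like it
    "keywords": ["bridge", "sunset"],      # subject matter
    "usage_rights": "editorial use only",  # rights
}

# Any characteristic, once recorded, can be queried like any other data.
high_rated = photo_metadata["rating"] >= 4
print(high_rated)
```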
An unruly race to the future
Ideally, every piece of software would be able to see, make use of, and preserve all metadata.
Unfortunately, that’s not the case. And no, that’s not the result of a conspiracy, or even, for the most part, a turf war—it’s just an unruly race to the future. We’ll investigate the issues related to handling metadata a bit later.
A place to store knowledge
You can attach tags, such as keywords, to your images so you will know what a picture is about. Tags can therefore help you find photos inside a library. But tags do something else as well: they can tell you something about a photo and its subject matter. So tags are also a place to store knowledge about the events the media depict. A centralized media library, therefore, can be a wonderful place to store knowledge about people, places, and events, while at the same time storing visual depictions of those events.
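As a sketch of how tags make a library searchable, here is a tiny inverted index in Python. The filenames and keywords are invented.

```python
# Keywords attached to each image in a hypothetical library.
library = {
    "img_001.jpg": {"wedding", "grandma", "chicago"},
    "img_002.jpg": {"wedding", "cake"},
    "img_003.jpg": {"chicago", "skyline"},
}

def find_by_tag(tag):
    """Return every image whose keyword set contains the tag."""
    return sorted(name for name, tags in library.items() if tag in tags)

print(find_by_tag("chicago"))  # ['img_001.jpg', 'img_003.jpg']
```

The same keyword set serves both purposes described above: it retrieves the photos, and it records a fact about the event they depict.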
The metadata attached to your media collection is growing in importance and value. The images themselves are rich vessels of data that are independently useful. The sharing graph of an image can represent human networks as photos are shared, liked, and re-shared. And a gigantic wealth of information can be inferred from dates, locations, and machine-learning analysis of images.
Incorporating computational tagging
The discussion about metadata used to be centered entirely on some well-established and standardized tools for classifying images, like the Dublin Core and IPTC standards. These tools continue to exist--and they are important to master as part of your collection management--but the advent of computational tagging has fundamentally changed the metadata landscape, and will continue to do so.
- Traditional metadata schemas are slowly evolving in response to new needs. They continue to be the gold-standard method to customize and lock down the important facts about your image collection.
- Computational tagging in many different flavors is automating an increasing share of the metadata workload.
- Connectivity between systems is becoming a very important tool for making sense of image collections. The methods for integrating your collection with other databases and knowledge bases are expanding at a rapid clip.
- Black box tagging and search leverages the full power of machine learning services. Adobe, Amazon, Apple, and Google all provide black box environments that continually improve as they learn more about you and your images.
Before we dive into details, let me flesh out these points a little more. This looks like a pretty big shift from the way things worked in the past. And it is (or soon will be). However, like most changes to media workflow, it mostly builds on the past rather than replacing it.
The most important metadata that you’re likely to have is still going to be the tagging and curation done by you, rather than by some deep-learning robot. The photographer and/or the collection manager knows a thousand things that are going to be very difficult (or impossible) for an AI service to know. Why was the photo made? What is the backstory of the person, event, place, or object shown? Why is any of this relevant to me, the collection, and the collection’s stakeholders?
For the foreseeable future, it’s going to be important for professional and institutional collections (as well as many enthusiast collections) to include metadata that reflects the priorities and intent of the creators, users, and collection managers. The best way to preserve and leverage this information is to save it in traditional metadata structures.
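As a hedged illustration of what "traditional metadata structures" can hold, the sketch below uses a few real Dublin Core element names (title, creator, subject, description, rights); the values are invented.

```python
# The element names below are real Dublin Core elements; the values
# are invented for illustration.
dc_record = {
    "dc:title": "Opening night at the gallery",
    "dc:creator": "Jane Photographer",
    "dc:subject": ["gallery opening", "Smith family"],
    "dc:description": "Why the photo was made and who appears in it: "
                      "context only a human curator knows.",
    "dc:rights": "Copyright Jane Photographer. Editorial use only.",
}

# Because the element names are standardized, any Dublin Core-aware
# tool can read the curator's intent without guessing at field meanings.
for element, value in dc_record.items():
    print(element, "=>", value)
```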
Computational tagging and linked data services can make your life easier. Rather than as a replacement for human tagging, though, think of them as tools for human-assisted tagging. Some of this will be automatic, like in-camera GPS tagging. Some of it will require more intervention, like training face recognition to identify certain people.
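One way to picture human-assisted tagging is below: machine suggestions above a confidence threshold are proposed, but only human-confirmed tags join the record, alongside tags the human adds outright. The tag sets, scores, and threshold are all invented.

```python
# Machine-suggested tags with confidence scores (invented values).
suggestions = {"dog": 0.97, "beach": 0.91, "frisbee": 0.42}

# Only high-confidence suggestions are surfaced for review.
proposed = {tag for tag, conf in suggestions.items() if conf >= 0.8}

human_confirmed = {"dog", "beach"}          # curator accepted both proposals
human_added = {"rex", "family vacation"}    # knowledge no model could infer

# The final record keeps only confirmed machine tags plus human additions.
final_tags = (proposed & human_confirmed) | human_added
print(sorted(final_tags))  # ['beach', 'dog', 'family vacation', 'rex']
```

The design point is that automation narrows the workload (the curator reviews two proposals instead of typing everything), while the human remains the authority on what the record finally says.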
We’re on the cusp of some very interesting capabilities for data linking. Some of it is already happening in certain places, like institutional collections where images are linked to a database of contracts that spell out usage rights. Other linking is still more experimental, like the ability to link to external knowledge bases such as Wikipedia. But even when these links are easy to make, it will still be up to the collection manager to decide which services to link to, and how to make it work.
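Linking a local tag to an external knowledge base can be as simple as storing a stable identifier alongside it. In this sketch, the record layout is invented, but Q90 is the real Wikidata identifier for Paris.

```python
# A local tag paired with a stable external identifier. Q90 is the
# real Wikidata ID for Paris; the record layout itself is invented.
linked_tag = {
    "label": "Paris",
    "wikidata_id": "Q90",
    "uri": "https://www.wikidata.org/wiki/Q90",
}

# The identifier disambiguates the tag (Paris, France vs. Paris, Texas)
# and lets any system that understands Wikidata enrich the record later.
print(linked_tag["label"], "->", linked_tag["uri"])
```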
The black box systems are ones that attempt to do all the tagging automatically. This is a combination of visual recognition that can create tags, as well as intelligent search agents that can guess what’s important to you. These have a good bit of road to travel before they can replace your human-managed tagging and curation, but they are improving fast.
With that context in mind, the coming posts will examine the way metadata works.