Where does metadata live?

Peter Krogh

Wed Jul 01 2020

The metadata for a media file can live in one of several places. This raises the question of where metadata should live. We’ll move on to that a little later.

Embedded in the file

Most media file types can store metadata directly in the file itself. This is typically done in the file header, which is a block of storage that is separate from the actual media payload. Storing metadata in the file header allows programs to easily read, change or make use of the metadata without decoding the entire file. This makes the metadata both accessible and portable.
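As a rough illustration of reading metadata without decoding the media payload, here is a minimal Python sketch that walks the header segments of a JPEG and reports which ones carry Exif or XMP data. It stops at the Start of Scan marker, so the compressed image data is never touched. The function name and the synthetic sample bytes are assumptions made for this example, and the parser is deliberately simplified (it does not handle padding bytes or every marker type found in real files).

```python
import struct

EXIF_ID = b"Exif\x00\x00"
XMP_ID = b"http://ns.adobe.com/xap/1.0/\x00"


def find_metadata_segments(data: bytes):
    """Return (label, body_size) for each APPn segment in a JPEG header.

    Simplified sketch: walks marker segments after the Start of Image
    marker and stops at Start of Scan, where the image payload begins.
    """
    if data[:2] != b"\xff\xd8":  # Start of Image marker
        raise ValueError("not a JPEG stream")
    found = []
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError("expected a marker at offset %d" % pos)
        marker = data[pos + 1]
        if marker == 0xDA:  # Start of Scan: payload follows, stop here
            break
        # The 2-byte big-endian length counts itself but not the marker.
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        body = data[pos + 4:pos + 2 + length]
        if 0xE0 <= marker <= 0xEF:  # APP0..APP15 application segments
            if body.startswith(EXIF_ID):
                label = "Exif"
            elif body.startswith(XMP_ID):
                label = "XMP"
            else:
                label = "other"
            found.append((label, len(body)))
        pos += 2 + length
    return found


def _segment(marker: int, body: bytes) -> bytes:
    """Build one marker segment for the synthetic sample below."""
    return b"\xff" + bytes([marker]) + struct.pack(">H", len(body) + 2) + body


# Synthetic bytes for illustration only, not a viewable image.
sample = (
    b"\xff\xd8"
    + _segment(0xE1, EXIF_ID + b"\x00" * 8)
    + _segment(0xE1, XMP_ID + b"<x/>")
    + _segment(0xDB, b"\x00" * 4)
    + b"\xff\xda"
    + b"compressed image payload"
)
segments = find_metadata_segments(sample)
```

In practice you would hand this job to a maintained metadata library; the sketch only shows why header storage makes metadata cheap to reach.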

Embedding metadata in a file is one of the easiest ways to keep the tags discoverable as the file moves from place to place and between applications. However, most file types offer no way to permanently lock embedded metadata, so it can be stripped out as the file is passed around.

In a sidecar file

Metadata can also live in a text file that is stored alongside the media file. This may be saved as one sidecar per media file, or as one sidecar file per folder of files. Sidecar files are typically used when the media file is a proprietary file type that is not fully documented. In these cases, writing metadata into the file header may inadvertently corrupt the file. The safest approach, therefore, is to write the metadata to a separate file.

Sidecar files may also be used when the application’s designers want to support a strict read-only workflow for source files. This may be the case even when it is safe to write back to files.

Sidecar text files made by Adobe software (and compatible software) have the extension .xmp. Despite the custom file extension, these are just regular text files containing XML.
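Because an XMP sidecar is plain XML text, it can be read with ordinary tools. The following Python sketch derives a sidecar path for a media file and pulls keywords out of a sidecar's dc:subject list using only the standard library. The helper names, the sample file name, and the sample keywords are made up for illustration, and the "same base name plus .xmp" naming is an assumption about the one-sidecar-per-file case; some applications name their sidecars differently.

```python
import xml.etree.ElementTree as ET
from pathlib import Path

# Namespace URIs used by XMP packets for the elements read below.
NS = {
    "x": "adobe:ns:meta/",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def sidecar_path(media_path):
    """Assumed naming: IMG_0001.CR2 -> IMG_0001.xmp in the same folder."""
    return Path(media_path).with_suffix(".xmp")


def read_keywords(xmp_text):
    """Return the keywords stored in the dc:subject bag of an XMP document."""
    root = ET.fromstring(xmp_text)
    return [li.text for li in root.findall(".//dc:subject/rdf:Bag/rdf:li", NS)]


# A tiny hand-written sidecar for illustration.
SAMPLE_XMP = """<x:xmpmeta xmlns:x="adobe:ns:meta/">
 <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <rdf:Description rdf:about="" xmlns:dc="http://purl.org/dc/elements/1.1/">
   <dc:subject>
    <rdf:Bag>
     <rdf:li>harbor</rdf:li>
     <rdf:li>sunrise</rdf:li>
    </rdf:Bag>
   </dc:subject>
  </rdf:Description>
 </rdf:RDF>
</x:xmpmeta>"""

keywords = read_keywords(SAMPLE_XMP)
sidecar = sidecar_path("shoot/IMG_0001.CR2")
```

Note that the source file is never opened for writing here, which is the point of a read-only sidecar workflow.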

In a library database

Metadata can live in a library database. Most library software will read and store embedded metadata when the file is first indexed by the software. New tags created in the library will live in the database, e.g., a keyword added in Lightroom. It may be possible to write some or all of the metadata from the database back to the file or to the sidecar file.

Keeping the “master” copy of your metadata in a database has several advantages. Centralization makes it easier to search and organize with metadata, and to ensure consistent use of tags. Centralizing all your metadata also makes it easier to create a comprehensive backup of the tagging and curation you do to your media files. It’s much easier to back up a catalog than to back up many thousands of individual files.

Note that library software will probably encounter some embedded metadata that it cannot understand. If the library database is not designed to work with a specific namespace, metadata in that namespace will usually be ignored. Given the large and expanding number of schemas, this should be expected for at least some metadata.
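The indexing behavior described above can be sketched as a simple split by namespace prefix. This is a hypothetical illustration, not how any particular library product works: the set of supported prefixes, the function name, and the sample tags are all assumptions.

```python
# Hypothetical: the namespaces this imaginary library knows how to index.
SUPPORTED_NAMESPACES = {"dc", "xmp", "photoshop"}


def split_tags(tags):
    """Split 'prefix:name' tags into those the library understands and the rest."""
    understood, not_understood = {}, {}
    for key, value in tags.items():
        prefix = key.split(":", 1)[0]
        if prefix in SUPPORTED_NAMESPACES:
            understood[key] = value
        else:
            not_understood[key] = value
    return understood, not_understood


# Sample embedded tags, invented for this example.
embedded = {
    "dc:title": "Harbor at sunrise",
    "xmp:Rating": "4",
    "vendorx:FilmSimulation": "Classic",
}
indexed, skipped = split_tags(embedded)
```

What happens to the skipped tags is a design decision: a library can drop them from its index entirely, or keep them unparsed so they are not lost.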

In the cloud

A cloud service should behave in much the same way as a library database. Metadata that is understood should be parsed into the cloud database so that it can be used. Metadata that is not understood should be ignored but not destroyed. In most cases, this is accomplished by storing the tags in a BLOB.

In a BLOB

Yet another example of computer nerd humor, a Binary Large Object (BLOB) is a big chunk of digital data that is stored in a database without being parsed into fields. Typically, BLOBs are used to store information that the database software does not understand, such as tags from unsupported schemas. Storing the information as a BLOB preserves the information and allows it to be passed on to derivative files.

Saving unsupported metadata as a BLOB can allow the cloud service or database to offer some level of support for unknown schemas and namespaces. The data sitting in a BLOB may be accessible through integration with outside services. So my cloud service may not understand all the GPS metadata, but if all GPS tags are sitting in a BLOB, another service may be able to search and make use of the unsupported GPS tags. Because BLOBs contain unparsed data, they can be pretty slow to access. Still, slow probably beats nonexistent.
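To make the BLOB idea concrete, here is a small Python sketch using the standard library's sqlite3 module. Understood metadata goes into a regular column, and everything else is stored as one unparsed BLOB. The table layout, function names, sample records, and the choice of JSON as the serialization are all assumptions for this example; a real service might store the original metadata packet byte for byte instead.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE assets (filename TEXT PRIMARY KEY, title TEXT, unparsed BLOB)"
)


def store(filename, title, unsupported_tags):
    """Index the title as a field; keep unsupported tags as one opaque BLOB."""
    blob = json.dumps(unsupported_tags, sort_keys=True).encode("utf-8")
    conn.execute("INSERT INTO assets VALUES (?, ?, ?)", (filename, title, blob))


def find_by_unparsed_tag(key):
    """Search inside the BLOBs. Every row must be fetched and decoded,
    because the database itself cannot see into the unparsed data."""
    matches = []
    rows = conn.execute("SELECT filename, unparsed FROM assets ORDER BY filename")
    for filename, blob in rows:
        if key in json.loads(blob.decode("utf-8")):
            matches.append(filename)
    return matches


# Sample records, invented for illustration.
store("a.jpg", "Harbor", {"exif:GPSLatitude": "10.1", "exif:GPSLongitude": "20.1"})
store("b.jpg", "Studio", {})

with_gps = find_by_unparsed_tag("exif:GPSLatitude")
by_title = conn.execute(
    "SELECT filename FROM assets WHERE title = ?", ("Harbor",)
).fetchall()
```

The title lookup can use the database's own query engine, while the tag lookup has to decode every BLOB in turn. That difference is where the slowness comes from.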

In a project

If you use creative project software like InDesign or Premiere Pro, there’s probably some valuable metadata stored inside the project file itself. At minimum, the project file contains information about which source files are used and how each media file is used within the project, e.g., which page an image appears on. It may be possible to extract this information from the project file, or the metadata may only be useful inside the project software itself.

Linked databases

The information about your media may also live in a linked database. For instance, you may be able to link the GPS tags in an image to more information about the location where a photo was taken. This should be an area of pretty rapid growth as more databases and services link together through APIs.
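As a hedged sketch of what such a link might look like, the Python below matches an image's GPS coordinates to the nearest entry in a small location table using the haversine great-circle distance. The table stands in for an external database or API; its entries, the coordinates, and the function names are all invented for this example.

```python
import math

# Hypothetical stand-in for an external location database.
PLACES = [
    {"name": "Place A", "lat": 10.0, "lon": 20.0, "notes": "Old harbor district"},
    {"name": "Place B", "lat": 10.5, "lon": 20.5, "notes": "Hillside lookout"},
]


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two latitude/longitude points."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (
        math.sin(dphi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    )
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


def nearest_place(lat, lon, places=PLACES):
    """Return the table entry closest to the given coordinates."""
    return min(places, key=lambda p: haversine_km(lat, lon, p["lat"], p["lon"]))


# GPS tags as they might be read from an image (sample values).
linked = nearest_place(10.1, 20.1)
```

The image itself only needs to carry the coordinates; the richer description lives in the linked table and can be updated without touching the file.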