
Format Structures

Peter Krogh

Wed Jun 17 2020

In today’s post, we dive into the structure of image formats, looking at their purposes, specifications and standardization.

Formats are a way to stack up the bits in a file. These bits represent the components of a digital object, and the format specifies how to save all of that data so that different software programs can understand it. There are two traditional functions that a format can specify, and a third one I’m introducing here.

Container formats - All the formats we will be discussing are, to some degree, container formats. This means that the format itself specifies a way to save multiple components into a single file. The exact way to encode the media object inside the container may be very tightly controlled or it may be only loosely specified.
Encoding formats - Some media formats also outline very specific ways to encode media. The JPEG format, for instance, provides a container for a few different components, but it primarily describes the options you must use to encode the media. To illustrate this (and make things a little more complicated), JPEG encoding can also be used by other formats; TIFF and DNG files, for example, can store JPEG-compressed image data.
Workflow formats - I’m coining this term to describe formats that go beyond what we typically understand as container or encoding formats. These formats are half media object and half project file. They can contain multiple images and other media, as well as program-specific instructions like a project file. They can also contain the final output of the optimized image(s). In most cases, these additional capabilities are laid out more specifically than what you get with a general container format.
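To make the container idea concrete: a JPEG file is a sequence of marker segments (APP0 and APP1 for JFIF and Exif/XMP metadata, DQT for quantization tables, SOF for the frame header, SOS for the start of the compressed scan). Each segment begins with 0xFF and a marker byte, and most are followed by a two-byte big-endian length. The Python sketch below walks those segments up to the start of the scan. It is a minimal illustration assuming a well-formed baseline file; the function name is hypothetical, and it ignores fill bytes and other edge cases a real parser must handle.

```python
import struct

# Names for a few common JPEG markers (a partial table, not exhaustive).
MARKER_NAMES = {
    0xE0: "APP0", 0xE1: "APP1", 0xDB: "DQT",
    0xC0: "SOF0", 0xC2: "SOF2", 0xC4: "DHT", 0xDA: "SOS",
}

def list_jpeg_segments(data: bytes) -> list[tuple[str, int]]:
    """Return (marker name, payload length) for each segment before the scan data."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG: missing SOI marker")
    segments = [("SOI", 0)]  # start-of-image marker has no length field
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError(f"expected a marker at offset {pos}")
        marker = data[pos + 1]
        # The length is big-endian and counts its own two bytes.
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        name = MARKER_NAMES.get(marker, f"0x{marker:02X}")
        segments.append((name, length - 2))
        if marker == 0xDA:
            # Entropy-coded image data follows SOS; stop walking here.
            break
        pos += 2 + length  # skip marker bytes plus the whole segment
    return segments
```

Reading a real file would look like `list_jpeg_segments(open("photo.jpg", "rb").read())`, where `photo.jpg` is a placeholder. The metadata (APP segments) and the encoding parameters (DQT, SOF, DHT) sit side by side in the same container.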

A bit of each

Most formats for visual media specify both container and encoding methods, to varying degrees.

Formats without container features cannot contain the components that are essential to modern media objects, except for very lightweight uses. For instance, GIF is used as a totally stripped-down format for moving images. But it has almost no practical support for metadata, so provenance, rights and other information cannot reliably ride along with the file.
Formats without any published encoding guidelines are typically too fragile and application-specific to be used outside of a single program.

As we examine specific formats, we will be able to see the balance between these two types of structure.

Format specification

Many formats have some kind of written specification, which allows everyone to understand how the format is structured. Files built by following the specification are compliant and therefore behave predictably. The specification tells software developers where each component of a file lives and how it is structured.
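One small thing nearly every specification pins down is the file signature (the "magic number") at the very start of the file, which lets software identify a format without trusting the file extension. Here is a minimal Python sketch using the published signatures for a handful of formats. The function name is hypothetical, and the list of HEIF brands is partial.

```python
def sniff_format(header: bytes) -> str:
    """Guess an image format from the first bytes of a file (a partial sketch)."""
    if header.startswith(b"\xff\xd8\xff"):
        return "JPEG"
    if header.startswith(b"\x89PNG\r\n\x1a\n"):
        return "PNG"
    if header[:6] in (b"GIF87a", b"GIF89a"):
        return "GIF"
    if header[:4] in (b"II*\x00", b"MM\x00*"):
        # TIFF header in little- or big-endian byte order; DNG and many
        # camera raw formats also start this way.
        return "TIFF-based"
    if header[4:8] == b"ftyp":
        # ISO base media file format box; the brand that follows tells
        # HEIF apart from MP4 and other members of the family.
        if header[8:12] in (b"heic", b"heix", b"mif1", b"msf1"):
            return "HEIF"
        return "ISO BMFF (other brand)"
    return "unknown"
```

In practice you would pass in the first dozen or so bytes of a file. Note that "TIFF-based" is as far as the signature alone can take you; telling a DNG from a plain TIFF requires reading the tags inside.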

There is no universal way to write these specifications. Most will have some type of narrative that helps you understand the purpose of the format and how to make use of it. Some specifications are extremely detailed, with many dozens of individual components that are all precisely spelled out. Others are quite loose, outlining the components without providing a high level of detail.

Some formats do not have a published specification. Many camera raw formats, for instance, are not publicly documented. To work with these formats, software developers have to reverse-engineer the precise details of how each one is structured. Developers also use SDKs and other libraries to work with these files. As discussed below, undocumented formats are at greater risk of becoming unreadable in the future than well-documented ones.

Links to some format specifications

If you’d like to geek out and actually read some format specifications, I’ve linked to some of them below. While they can be hard to understand, especially at first, some parts will be clear even to people without computer science degrees. (You might start with HEIF, since it’s less intimidating and more purpose-driven.)

The DNG specification is quite comprehensive and forward-looking. It is currently in its fifth version.
HEIF is much less mature, and the specification reflects that. But if you read the first part of the Technical Information page, you can see the objectives that the format is designed to address. It’s also useful to look at the examples on the GitHub pages.
The TIFF specification is a comprehensive and well-written document - for 1992, the last time it was officially updated.
Video - If you really want to make your brain hurt, take a look at the MPEG-4 video spec. You can get an understanding of why video is so much harder than still images to standardize.

Components: Specified or freelance?

As we look at file formats, we will see there are a couple of ways to accommodate the need for complex image components. The TIFF format, for instance, has an extremely flexible structure allowing for the inclusion of nearly any kind of image component a software developer can dream up.
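TIFF gets that flexibility from its tag structure. After an 8-byte header (a byte-order mark, the number 42, and an offset), the file holds Image File Directories (IFDs), each a list of 12-byte entries giving a tag number, a field type, a count, and a value or an offset to the value. Tag numbers from 32768 up are set aside for private tags, which is how so many extra components (including the DNG tags) end up inside TIFF-based files. Below is a minimal Python sketch that lists the entries in the first IFD. The function name and the small tag table are illustrative, and it skips value decoding, later IFDs and SubIFDs.

```python
import struct

# A few TIFF tag numbers; anything else is reported as "unlisted".
TAG_NAMES = {256: "ImageWidth", 257: "ImageLength", 258: "BitsPerSample",
             259: "Compression", 262: "PhotometricInterpretation",
             273: "StripOffsets", 306: "DateTime", 50706: "DNGVersion"}

def list_ifd0_tags(data: bytes) -> list[tuple[int, str, int, int]]:
    """Return (tag, name, field type, count) for each entry in the first IFD."""
    if data[:2] == b"II":
        endian = "<"  # little-endian ("Intel" order)
    elif data[:2] == b"MM":
        endian = ">"  # big-endian ("Motorola" order)
    else:
        raise ValueError("not a TIFF: bad byte-order mark")
    magic, ifd_offset = struct.unpack(endian + "HI", data[2:8])
    if magic != 42:
        raise ValueError("not a TIFF: magic number is not 42")
    (entry_count,) = struct.unpack(endian + "H", data[ifd_offset:ifd_offset + 2])
    entries = []
    for i in range(entry_count):
        start = ifd_offset + 2 + 12 * i  # each IFD entry is 12 bytes long
        tag, field_type, count = struct.unpack(endian + "HHI", data[start:start + 8])
        entries.append((tag, TAG_NAMES.get(tag, "unlisted"), field_type, count))
    return entries
```

Nothing in this loop cares what a tag means, which is exactly why TIFF can carry almost anything: a reader that does not recognize a tag can simply skip it.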

In later posts, we will see a different approach when we discuss DNG. Instead of an “anything goes” approach, DNG sets out some very specific requirements for image components and how to structure them within the file. (It also allows for flexibility, but lays out where and how to accomplish it.) The more formalized approach was made necessary by the unruly proliferation of raw file variants and the prospect of many of them becoming unreadable.
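Part of what “very specific requirements” means in practice is that DNG names tags a file must carry, rather than leaving readers to guess. As an illustration, here is a Python sketch that checks two IFD 0 tags the DNG specification treats as required: DNGVersion (tag 50706, four bytes such as 1, 4, 0, 0) and UniqueCameraModel (tag 50708). It assumes the tags have already been decoded by some TIFF reader into a dictionary; that input shape and the function name are hypothetical, and a real validator checks far more than this.

```python
# Tag numbers from the DNG specification.
DNG_VERSION = 50706          # DNGVersion: four bytes, e.g. 1, 4, 0, 0
UNIQUE_CAMERA_MODEL = 50708  # UniqueCameraModel: ASCII camera model name

def missing_dng_basics(ifd0: dict) -> list[str]:
    """Report problems with two required DNG tags in an IFD 0.

    `ifd0` maps tag numbers to already-decoded values (an assumed input
    format). Only a small subset of the specification is checked.
    """
    problems = []
    version = ifd0.get(DNG_VERSION)
    if version is None:
        problems.append("DNGVersion is missing")
    elif len(version) != 4 or version[0] != 1:
        problems.append(f"DNGVersion {version!r} is not a 4-byte 1.x.x.x value")
    if not ifd0.get(UNIQUE_CAMERA_MODEL):
        problems.append("UniqueCameraModel is missing or empty")
    return problems
```

An empty list means these two checks passed. The point of the sketch is the design choice: with a formal specification, “is this a valid file?” becomes a question a program can answer.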

I’m convinced that the more formalized approach to file formats as shown in DNG is the right way to go, at least for raw files. With more than 600 variants (and counting) available already, it’s essential to bring some order to the process.

In the next post, we will look at SDKs and libraries. These tools allow developers to properly access features supported by a format without having to write new code.