Thursday, March 18, 2010

Video vs. Raw : Log vs. Lin.

As our world keeps replacing reality with 1's and 0's it's a good idea to get everyone on the same page when we discuss certain words and/or processes. I've found through experience that just because we have gotten accustomed to using terms like RAW and S-Log does not mean we actually understand the choice we're about to make. Here is a conversational rather than technical explanation to clear the air.

When it comes to digital acquisition we have three specific choices that we can make on-set which will directly determine the quality of our image and our ability to make changes to it in post-production. Do we record Raw, Log, or straight HD Video?

First we must decide in what form we want to capture our image. We have two choices here: Raw or Video. Raw is a digital camera sensor's unprocessed linear luminance values. Because this is just data, a Raw file will not include any white balance adjustment, color and saturation adjustments, gamma correction, or debayering (if the sensor sits behind a Bayer color filter array). Simply stated: a Raw file is NOT an image. It is just the luminance values from every pixel on the sensor. It is data: 1's and 0's. A digital Raw file is analogous to a 35mm negative. Though you can interpret an image by looking at a film negative, you must color time, print from the neg to a release stock, and finally project the print to view your image as intended. It is the same with digital Raw files. You can interpret an image from the coded luminance values in a Raw file but you won’t actually see your image until it has been processed through either a camera’s firmware (as in the Sony F35, which is a Video camera, not a Raw camera) or post-production software (like Speedgrade DI or RedAlert).

Once a Raw file is processed to create an image it ceases to be Raw and instead becomes Video. Raw is data, Video is the image. RED R3D files are your Raw files, but once you open an R3D in RedAlert it becomes an HD Video clip. You will never be able to actually view your Raw file as an image. If you could, it would look something like this:


The reason that Raw is preferable as a capture choice over HD Video is that any processing manipulation that is done to your image during post will be done directly from the source data (each pixel’s luminance values). During the Digital Intermediate process any change to the Raw file simply transcodes the original values into the newly “timed” values while also adding other mathematical transfer functions to the data set such as gamma curves and gain reduction to achieve your overall “look”. This is often referred to as Rendering. If you decide to change your “look” all of your changes will be rendered from the original Raw file and not your previous rendered “look” thereby maintaining the true values found in the Raw file. This is why color-timing Raw files is considered “non-destructive” to your original captured image.

Here's our Raw file after color timing:


Color-timing from an HD Video source is destructive. Any changes made to your image are “burnt in” to your new “look”. Once these changes are made they cannot be unmade. There is no reset button. This is why some cameras are capable of recording in a log video mode. By recording in log the camera attempts to stretch and squeeze the image to capture as closely as possible each pixel’s luminance values within the context of an HD Video signal. But more on that in just a bit.
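To put some numbers on "destructive" vs. "non-destructive", here is a minimal Python sketch. It is a toy model, not how any real grading software works: the function names `apply_gain`, `grade_baked`, and `render_from_source` are made up for illustration, a "look" is reduced to a simple gain, and everything lives in 8 bits.

```python
import math

def apply_gain(value, gain, bit_depth=8):
    """Scale one code value, then round and clip it to the legal range."""
    max_code = 2 ** bit_depth - 1
    return min(max_code, max(0, round(value * gain)))

def grade_baked(source, gains, bit_depth=8):
    """Video-style grading: every change is burnt in before the next one."""
    value = source
    for gain in gains:
        value = apply_gain(value, gain, bit_depth)
    return value

def render_from_source(source, gains, bit_depth=8):
    """Raw-style grading: combine the changes, then render once from the source."""
    return apply_gain(source, math.prod(gains), bit_depth)

original = 200              # a bright pixel
look_changes = [2.0, 0.5]   # push it up a stop, then change your mind

print("baked:", grade_baked(original, look_changes))
print("re-rendered:", render_from_source(original, look_changes))
```

The baked version clips at 255 on the first change, so pulling it back down lands at roughly half the original value. The re-rendered version never touches the source, so the pixel comes back as 200.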

A great way to think of Raw vs. Video is to imagine constructing a house. A Raw file is your blueprint. Though the overall shape and size of your new home is set you still retain the ability to make a wide range of choices. From adding closets to raising the ceiling height any changes can be redrafted and visualized before construction begins. With Video you assume the role of the foreman. Not only has the final blueprint been drafted, but the foundation has been laid and the house has been framed. Though you still have the ability to make changes, every window you add or room you extend requires physically cutting into your home's frame and either throwing away or adding material. Any change, big or small, will fundamentally weaken your home's foundation. Too many radical revisions can bring the whole structure right down into a pile of junk. Then again, a bad artistic eye can be just as fatal...

Okay, let's take a breath. Much better. Now for something completely different: what is the difference between a linear (though used commonly, the proper technical term would be "gamma-corrected") and log video signal? First off, it would help to briefly explain how a digital sensor "sees" light vs. how our eye "sees" light. The distinction is between the physical process the camera sensor is using to interpret light in a scene and the fundamentally different process that human vision uses. All CCD and CMOS sensors "see" luminance in linear form while human vision has a roughly logarithmic response.

So what exactly is lin and log?

In digital photography we are fundamentally concerned with brightness (luminance) in a scene that needs to be converted into a coded value (dependent on bit-depth) of video signal strength (sometimes represented in millivolts: mV) in order to reproduce an image. To make it simple we can say that a digital camera will assign a number to a specific amount of brightness in a scene and that number will be output as voltage. On-set we can view the intensity of this voltage by running our video signal through a waveform monitor and noting its IRE value. A digital camera's ability to interpret variations in light intensity within a scene is directly related to its bit-depth. The bigger the bit-depth the more luminance values a camera can discern. An 8-bit camera can discern 256 intensity values per pixel per color channel (RGB). A 10-bit camera can discern 1024 values. A 12-bit: 4096. And a 14-bit sensor: 16,384. It's easy to see why bit-depth has a huge role in a camera's dynamic range.

A digital camera encodes these luminance values linearly. That is, for every discrete step of difference in luma, the camera will output an equal step of difference in voltage or video signal.
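For anyone who wants to check the arithmetic, here is a small Python sketch of bit-depth and linear encoding. The helper names `code_values` and `linear_encode` are my own, and scene luminance is assumed to be normalized so that 0.0 is black and 1.0 is the brightest value the sensor can hold.

```python
def code_values(bit_depth):
    """How many distinct values one channel can hold at this bit-depth."""
    return 2 ** bit_depth

def linear_encode(luminance, bit_depth):
    """Map normalized luminance (0.0 to 1.0) to a code value in equal steps."""
    if not 0.0 <= luminance <= 1.0:
        raise ValueError("luminance must be between 0.0 and 1.0")
    return round(luminance * (code_values(bit_depth) - 1))

for bits in (8, 10, 12, 14):
    print(f"{bits}-bit: {code_values(bits):,} values per channel")

# Equal steps in light produce equal steps in code value.
print([linear_encode(lum, 10) for lum in (0.0, 0.25, 0.5, 0.75, 1.0)])
```

Notice that going from 0.25 to 0.5 costs the same number of code values as going from 0.5 to 0.75, even though the first jump is a full stop and the second is not.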

The human eye is sensitive to relative, not discrete, steps of difference in luma. For example, during a full moon your eye will have no problem in discerning your immediate surroundings. If you were to light a bonfire the relative illumination coming from the flames would certainly overpower the moonlight. Conversely, if you were to light that same bonfire during high noon you would be hard pressed to notice any discernible increase in illumination. This is why we use f-stops (the doubling or halving of light) to interpret changes in exposure.

What we can learn from the difference between linear and logarithmic responses to luminance is that a linear approach will be able to discern more discrete values in the highlights of an image while a logarithmic approach can discern more subtleties in the shadows. This is because a digital camera only has a finite number of bits in which to store a scene's dynamic range and most of those bits are used up to capture the brightest portions of the scene. Art Adams over at ProVideo Coalition did the math for us with a 14-bit sensor that has 16,384 discrete luminance values to distribute per color channel (RGB):




As Art points out: "The first four stops of dynamic range get an enormous number of storage bits--and that’s just about when we hit middle gray. As we get toward the bottom of the dynamic range there are fewer steps to record each change in brightness, and we’re also a lot closer to the noise floor." Well said.
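The shape of that distribution is easy to reproduce. The Python sketch below is an idealized calculation of my own, not Art's actual table: it assumes a perfectly linear 14-bit channel where each stop down holds half the light, and therefore half the code values, of the stop above it. The function name `linear_values_per_stop` is just for illustration.

```python
def linear_values_per_stop(bit_depth=14, stops=10):
    """Count the code values that land in each stop, brightest stop first."""
    top = 2 ** bit_depth
    rows = []
    for stop in range(1, stops + 1):
        upper = top // 2 ** (stop - 1)   # code value at the top of this stop
        lower = top // 2 ** stop         # code value at the bottom of this stop
        rows.append((stop, upper - lower))
    return rows

rows = linear_values_per_stop()
for stop, count in rows:
    print(f"stop {stop:2d} below clip: {count:5d} values")

top_four = sum(count for _, count in rows[:4])
print(f"top four stops: {top_four} of {2 ** 14} values")
```

Under these assumptions the brightest stop alone eats 8,192 values, the top four stops eat 15,360, and the tenth stop down is left with 16.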

Because our eyes respond to relative changes in luminance we don't perceive much difference in highlights. If the sun beaming down on Death Valley is hot and bright at 11:00am it's not going to appear hotter and brighter at noon. It's just still gonna be sweltering. But in low-luminance environments our eyes will give greater weight to slight changes in brightness. Turn the lights off at home at night and your eyes will soon adjust to begin making out your surroundings. Ambient streetlight, moonlight, and the soft green glow coming from your charging iPhone will all contribute significantly to your visual perception.

Now let's take a look at a scene captured with a digital sensor. The linear pixel values when processed into a video stream but without any gamma correction applied will look something like this:


Notice the extreme contrast and lack of shadow detail. Since most of the sensor's bits are used to capture highlight detail the mid-tones and shadows appear almost black when rendered linearly (or literally). This is because there simply aren't the same fine incremental differences of data in the shadow areas as there are in the highlights. Think of a 50-cent Ace Comb with fine teeth on one end and large on the other.

This brings to mind a scene from one of the great comedy masterpieces...



Alright, in order to view our scene as intended the linear light values need to be gamma corrected. That is, the sensor's linear output needs to have a non-linear transfer function applied (strictly speaking a power function rather than a true logarithm) in order to stretch out the shadows and mid-tones while pushing back all the highlights. We're basically taking a straight line and making it into a curve by adding relative brightness to the areas of the image that need it most to appear correct to our eye. A gamma of 0.45 has become the standard as it roughly approximates our eye's own non-linear response.
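Here is what that curve does to a few values, as a minimal Python sketch. It assumes a pure power function with an exponent of 0.45 and luminance normalized from 0.0 to 1.0; broadcast standards such as Rec. 709 add a short straight segment near black that is left out here, and the name `gamma_encode` is my own.

```python
def gamma_encode(linear, gamma=0.45):
    """Apply a simple power-function gamma to normalized linear light."""
    if not 0.0 <= linear <= 1.0:
        raise ValueError("linear value must be between 0.0 and 1.0")
    return linear ** gamma

# Shadows and mid-tones get lifted, black and white stay where they are.
for linear in (0.0, 0.01, 0.05, 0.18, 0.5, 1.0):
    print(f"linear {linear:4.2f} -> gamma corrected {gamma_encode(linear):.3f}")
```

An 18% gray card that sat way down at 0.18 on the linear scale ends up a little under halfway up the signal after correction, which is much closer to where our eye expects middle gray to live.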

So here's our image with gamma correction applied:


Much better. This, of course, brings us to our next question: what is log in an HD video camera and why should we care?

The S-log record mode in a Sony F35 is simply a unique gamma correction. The difference is that log mode is gamma correction for post-production purposes while every other gamma correction is intended to deliver an image with the proper contrast for viewing. The F35 allows you to select a standard gamma correction (Rec. 709), a number of hypergammas (variations on Rec. 709 that will lift the mids or pull back the highlights), S-log for post-production, or you can create your own gamma curve.

The reason that recording in log mode is so beneficial during post production is that instead of providing an image with proper contrast for immediate viewing, log seeks to preserve as much of the fine differences in an image's dynamic range as possible for later manipulation in post. Those differences are shades of gray. This is why a log HD video signal always seems to appear washed out and flat. Once you take all those shades of gray and crush them in post to make deep blacks or burn them out to create specular highlights you inherently lose your image's original dynamic range, but you gain a gorgeous, contrasty, and sharp image.
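To see why log holds on to those shades of gray, here is a Python sketch of a generic log curve. To be clear, this is NOT Sony's S-log formula: it is a textbook base-2 log that I'm assuming covers 10 stops, with a made-up function name `log_encode`, just to show the principle.

```python
import math

def log_encode(linear, stops=10):
    """Map normalized linear light to 0.0-1.0 so every stop gets equal room."""
    floor = 2.0 ** -stops                    # darkest value the curve covers
    clamped = min(max(linear, floor), 1.0)   # keep log2 away from zero
    return (math.log2(clamped) + stops) / stops

# Walk down from clip one stop at a time (each step is half the light).
for stop in range(0, 6):
    linear = 0.5 ** stop
    code = round(log_encode(linear) * 1023)  # as a 10-bit code value
    print(f"{stop} stops below clip: linear {linear:.4f} -> 10-bit code {code}")
```

With this curve every stop gets the same one-tenth slice of the signal, roughly 100 code values each in 10 bits, instead of the top stop hogging half of everything. That even spread is also why the picture looks flat until it is graded.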

Here's 10 stops from a log image:


And here's the remaining 5 stops from the final color time:


Let's finish this off with the pros and cons of our three choices. Raw gives us the best image quality and dynamic range because our image is built directly from each pixel's discrete luminance values. Because a Raw file has not been processed or compressed (unless you're a RED ONE owner), shooting Raw produces the largest amount of data and will require the largest storage capacity for all of the files generated. Raw data also usually requires intensive post-production manipulation to arrive at your final image. This process can be expensive and time-consuming.

HD Video can give us a fantastic image with the understanding that post-production manipulation is limited. It is best understood that what is seen on-set during the shooting day is what will be shown to an audience later. An experienced D.I.T. and colorist can work with the DP to provide the final visual look of the project during recording. This can mean very little to no post-production color-timing and can easily shave thousands off a budget. Of course, any radical deviation from the set look during post can severely degrade your image. A Video stream can also be easily compressed to tape (HDCAM SR) or through a video codec to cut down a project's archival requirements.

HD Video captured in a log record mode is sort of the best of both worlds. Though you are recording a video image that can still be tweaked and tuned by the DP and D.I.T. the image recorded makes the best use of the sensor's dynamic range. Your footage also will not need to be rendered as it is already a usable image. However, your images will need to be manipulated in post-production to arrive at your project's final look. This can easily add expense and time to a project's turn-around.

Most HD cameras on the market today have the ability to record in either Raw or Video modes but not both. There is really only one that has a proven track record of being able to record Raw, log, and straight gamma-corrected HD video. And that's the Arri D-21.

I did a commercial recently for Texas Energy. The production company was small and wanted to ingest the footage immediately after wrap, cut on Final Cut Pro, and deliver the spot within a few days. Using the D-21 in a 4:2:2 HD video mode recording to HDCAM SR was the obvious choice. This allowed for a great image on-set that could be approved on the spot, an in-house edit and color-time, and a super-fast turn-around.

"Lie to Me", the television series on Fox, uses D-21's in log mode to get the most out of their images while still keeping their post-production budget and episodic turn-around time in check.

And if you're David Fincher, I'm sure you'll insist on using the D-21 to record straight Raw files to an S-2 Digimag or Codex Recorder. Because Justin Timberlake and Tobey Maguire wouldn't have it any other way...

Hey, did I mention I'm looking forward to Arri's new line of digital cameras?