In a post on the Google AI Blog, Google software engineer Bartlomiej Wronski and computational imaging lead scientist Peyman Milanfar have laid out how they created the new Super Res Zoom technology inside the Pixel 3 and Pixel 3 XL.
Over the past year or so, several smartphone manufacturers have added multiple cameras to their phones with 2x or even 3x optical zoom lenses. Google, however, has taken a different path, deciding instead to stick with a single main camera in its new Pixel 3 models and implementing a new feature it is calling Super Res Zoom.
Unlike conventional digital zoom, Super Res Zoom technology isn’t simply upscaling a crop from a single image. Instead, the technology merges many slightly offset frames to create a higher resolution image. Google claims the end results are roughly on par with 2x optical zoom lenses on other smartphones.
The Google engineers use the photographer’s hand motion – and the resulting movement between the individual frames of a burst – to their advantage. After optical stabilization removes the larger movements (5-20 pixels), the remaining high-frequency motion from hand tremor naturally shifts the image on the sensor by just a few pixels. Since any given shift is unlikely to be an exact multiple of the pixel pitch, scene detail can be localized with sub-pixel precision, provided you interpolate between pixels when synthesizing the super resolution image.
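The core idea can be sketched with a toy 1-D example (an illustration of the principle only, not Google’s actual pipeline): a detailed signal sampled by a coarse sensor at several known sub-pixel offsets can be reassembled on a finer grid.

```python
import numpy as np

# Toy 1-D illustration (not Google's algorithm): a fine-grained signal is
# sampled by a coarse sensor at several sub-pixel offsets, mimicking hand
# tremor. Interleaving the offset samples recovers the finer grid.
fine = np.sin(np.linspace(0, 4 * np.pi, 32))   # the "scene" at 4x detail
offsets = [0, 1, 2, 3]                         # sub-pixel shifts, in fine-grid units
coarse_frames = [fine[o::4] for o in offsets]  # each frame: 8 coarse samples

# Merge: place each frame's samples back at its known sub-pixel position.
recovered = np.empty(32)
for o, frame in zip(offsets, coarse_frames):
    recovered[o::4] = frame

assert np.allclose(recovered, fine)  # all 32 fine samples recovered
```

In reality the offsets are random, noisy, and non-integer, which is why the actual algorithm interpolates rather than interleaves – but the toy shows why sub-pixel motion carries extra information at all.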
When the device is mounted on a tripod or otherwise stabilized, natural hand motion is simulated by slightly moving the camera’s OIS module between shots.
The pictures from a burst – of up to 15 frames on the Pixel 3 – are aligned on a base grid of higher resolution than that of each individual frame. First, a reference frame is chosen, and all other frames are then aligned relative to it with sub-pixel precision. This leads to increased detail – albeit ultimately limited by the lens’s resolving power – and cleaner images, since frame averaging reduces noise. For objects that have moved relative to the reference frame, the software only merges information from the other frames if it has confidently found the correct corresponding feature, thus avoiding ghosting.
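A heavily simplified sketch of that align-and-merge loop might look like the following. Whole-pixel shifts, wrap-around boundaries, and brute-force correlation search are simplifying assumptions made here; the real pipeline aligns with sub-pixel precision on a finer grid and includes the robustness logic described above.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy align-and-merge sketch (assumptions, not Google's code): integer-pixel
# shifts, circular boundaries, brute-force correlation-based alignment.
ref = rng.random((16, 16))              # clean reference frame of the burst
true_shifts = [(0, 0), (1, 2), (3, 1)]  # hand-tremor offsets (known only in this demo)
burst = [np.roll(ref, s, axis=(0, 1)) + rng.normal(0, 0.05, ref.shape)
         for s in true_shifts]

def estimate_shift(frame, reference, search=4):
    """Find the (dy, dx) that best aligns `frame` to `reference` by correlation."""
    best, best_score = (0, 0), -np.inf
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            score = np.sum(np.roll(frame, (-dy, -dx), axis=(0, 1)) * reference)
            if score > best_score:
                best, best_score = (dy, dx), score
    return best

# Align every frame to the reference, then average to suppress noise.
aligned = [np.roll(f, tuple(-c for c in estimate_shift(f, ref)), axis=(0, 1))
           for f in burst]
merged = np.mean(aligned, axis=0)

# The merged result is closer to the clean scene than any single noisy frame.
assert np.abs(merged - ref).mean() < np.abs(aligned[0] - ref).mean()
```

The final assertion captures the noise benefit the article describes: averaging N aligned frames reduces noise by roughly the square root of N.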
Left: crop of a 7x zoomed image on the Pixel 2 (digital zoom). Right: crop of a Super Res Zoom image on the Pixel 3. Note not only the increase in detail, but also the decrease in noise due to frame averaging and not having to demosaic.
As a bonus, there is no longer any need to demosaic, resulting in even more image detail and less noise. With enough frames in a burst, any given scene element will have fallen on a red, a green, and a blue pixel of the image sensor. After alignment, R, G, and B information is therefore available for every scene element, removing the need for demosaicing.
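That claim is easy to sanity-check on a toy Bayer mosaic (a deliberately simplified model with idealized whole-pixel shifts): on an RGGB pattern, four one-pixel offsets are enough for every pixel site to be observed through red, green, and blue filters.

```python
import numpy as np

# Toy check of the demosaic-free claim: tile a 2x2 RGGB Bayer pattern into a
# small "sensor", shift it by four one-pixel offsets, and record which color
# filter each pixel site has been seen through.
bayer = np.array([["R", "G"],
                  ["G", "B"]])
tile = np.tile(bayer, (2, 2))  # 4x4 sensor mosaic

seen = [[set() for _ in range(4)] for _ in range(4)]
for dy, dx in [(0, 0), (0, 1), (1, 0), (1, 1)]:  # frame-to-frame shifts
    shifted = np.roll(tile, (dy, dx), axis=(0, 1))
    for y in range(4):
        for x in range(4):
            seen[y][x].add(shifted[y, x])

# Every pixel site has now been sampled through R, G, and B filters.
assert all(seen[y][x] == {"R", "G", "B"} for y in range(4) for x in range(4))
```

Real hand tremor does not deliver such tidy offsets, which is why the technique needs enough frames (up to 15) for the shifts to cover the mosaic statistically.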
Furthermore, Google’s merge algorithm takes edges in the image into account and adapts accordingly, merging pixels along the direction of edges rather than across them. This approach provides a good trade-off between increased resolution and noise suppression, and avoids the artifacts that less sophisticated techniques introduce (see the dots and increased perception of noise in the ‘Dynamic Pixel Shift’ crop on the right, here).
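Why merging along rather than across edges matters can be shown with a minimal anisotropic-averaging sketch. This illustrates the principle only; the fixed 3-tap averages below are an assumption of this demo, whereas Google’s method adapts its kernel weights per pixel.

```python
import numpy as np

rng = np.random.default_rng(1)

# A noisy vertical step edge: averaging ALONG the edge (vertically) denoises
# without blurring the step, while averaging ACROSS it (horizontally) mixes
# the two sides and smears the edge.
edge = np.zeros((32, 32))
edge[:, 16:] = 1.0
noisy = edge + rng.normal(0, 0.1, edge.shape)

# Along the edge: vertical neighbors lie on the same side of the step.
along = (np.roll(noisy, 1, axis=0) + noisy + np.roll(noisy, -1, axis=0)) / 3
# Across the edge: horizontal neighbors straddle the step near column 16.
across = (np.roll(noisy, 1, axis=1) + noisy + np.roll(noisy, -1, axis=1)) / 3

err_along = np.abs(along - edge).mean()
err_across = np.abs(across - edge).mean()
assert err_along < err_across  # edge-aligned averaging preserves more detail
```

Both directions suppress noise equally; the difference in error comes entirely from the smeared edge, which is exactly the artifact edge-adaptive merging is designed to avoid.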
One might initially think, ‘wouldn’t it be easier to just put an optical 2x zoom in the phone?’, but perhaps that isn’t the right question to ask. Super resolution can increase the resolution of even the standard camera without zoom*, as well as at every zoom level in between a wide and a tele module. And any technique that makes a single camera better will make multi-camera approaches that much better. Imagine a smartphone with three or four lens modules that lets you zoom smoothly across all focal lengths, with super resolution ensuring that focal lengths in between those of the lens modules remain detailed.
*For now, the Pixel 3 does not use super resolution for standard 1x shots; currently, for performance reasons, Super Res only kicks in at 1.2x zoom and above.