Delivering new methods and codecs for working with large-scale data, particularly VR and HDR, is a high priority for the media and entertainment industry. As MPEG begins work on a successor to HEVC, we take a look at the hyper-efficient compression technologies being developed for streaming immersive media.
The sheer volume and complexity of video coming down the track, not least with the imminent opportunity of super-speed 5G mobile networks, makes efficient data processing essential if live-streamed VR and other ultra-resolution, low-latency media applications are to fly.
Arguably, standards bodies like MPEG have never been busier. The 30-year-old institution has drafted and released an average of six standards a year since launch, and it only succeeds for the industry if it stays way ahead of the game.
Ericsson Media Solutions’ Principal Technologist Tony Jones says: “Compression efficiency is one of the primary tools for providing new or better services, minimising the distribution costs, or a combination of the two.”
That’s why work on developing a means of handling large-scale data is so urgent. Chief among these efforts is a successor to the current video streaming standard HEVC. The Joint Video Experts Team (JVET), a collaborative team formed by MPEG and ITU-T Study Group 16’s VCEG, has started work on Versatile Video Coding (VVC), which is promised, like MPEG-2, MPEG-4 and HEVC before it, to be 50% more efficient than its predecessor.
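To make the generational claim concrete, here is a minimal sketch of how a 50% gain per generation compounds. It assumes that "50% more efficient" means half the bitrate at equal perceived quality, and it reads "MPEG-4" as MPEG-4 AVC; both are interpretations rather than figures from MPEG. The function name `relative_bitrates` is hypothetical.

```python
def relative_bitrates(generations, gain_per_generation=0.5):
    """Return each codec's bitrate relative to the first one, at equal quality.

    Assumption: every generation cuts the required bitrate by
    `gain_per_generation` (0.5 = half the bits for the same quality).
    """
    rates = {}
    rate = 1.0
    for name in generations:
        rates[name] = rate
        rate *= 1.0 - gain_per_generation
    return rates


# Codec chain as named in the article; the labels are illustrative.
chain = ["MPEG-2", "MPEG-4 AVC", "HEVC", "VVC"]

for codec, rate in relative_bitrates(chain).items():
    print(f"{codec}: {rate:.3f}x the MPEG-2 bitrate")
```

Under that assumption, a stream that needed 16 Mbit/s in MPEG-2 would need roughly 2 Mbit/s in VVC. Real-world gains vary with content and encoder maturity.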
Spokesperson for MPEG Christian Timmerer says: “The goal of VVC is to provide significant improvements in compression performance over the existing HEVC standard and to be completed in 2020.”
Timmerer, who is Associate Professor at Austria’s Klagenfurt University and Head of Research at codec vendor Bitmovin, adds: “The main target applications and services include — but are not limited to — 360-degree and high-dynamic-range (HDR) videos.”
According to MPEG, initial proposals for VVC have demonstrated “particular effectiveness” on ultra-high definition (UHD) video test material. It predicts compression efficiency gains “well-beyond the targeted 50% for the final standard”.
VVC would therefore join an increasingly crowded market for OTT streaming, which includes the current most frequently used codecs AVC, VP9 and HEVC, and the newcomer AV1.
Bitmovin has just published comparison tests of these codecs which suggest that AV1 (which, like VP9 but unlike AVC and HEVC, is royalty free) is able to outperform HEVC by up to 40%.
However, the company is of the opinion that multiple codec standards can exist side by side. Indeed, the company has stated this is “mostly necessary”, in order to stream to a wide range of devices and platforms, adding that “the support of multiple video codecs is confirmed with the appearance of VVC.”
An important aspect of VVC is for encoding to be more focused on specific regions of a 360-degree frame where most of the relevant image activity is happening and which the majority of users will watch.
Timmerer says: “VVC is still in its infancy but we might see companies making announcements in this direction at IBC.”
Enter JPEG XS
Whereas MPEG standards are typically used for storage, delivery, and consumption by end users, the work of JPEG has historically centred on still images. It has, however, just delivered a new codec for video production and streaming.
JPEG XS is open-source and goes against the grain of historic codec development by having a compression ratio of 6:1, which is actually lower than that of standard JPEG (10:1).
École Polytechnique Fédérale De Lausanne (EPFL) Professor Touradj Ebrahimi says: “For the first time in the history of image coding, we are compressing less in order to better preserve quality, and we are making the process faster while using less energy.”
Ebrahimi, who led JPEG XS development at EPFL, adds: “We want to be smarter in how we do things. The idea is to use less resources and use them more wisely. This is a real paradigm shift.”
JPEG XS is an evolution of the TICO codec (SMPTE RDD 35), itself based on JPEG2000 and now widely accepted for transporting video over IP workflows using SMPTE 2110.
IntoPix, the Belgian firm behind TICO, also helped design JPEG XS.
IntoPix Director of Marketing & Sales Jean-Baptiste Lorent feels it will be most useful for workflows “wherever uncompressed video is currently used”.
“A new codec is necessary to handle ever increasing data volumes due to increasing resolutions, higher frame rates, 360-degree capture and higher quality pixels,” adds Lorent.
JPEG XS is intended to address uses where low complexity and low latency are necessary, but reasonably high bandwidths can be used, for example, UHD at around 2 Gbit/s vs uncompressed at 12 Gbit/s.
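The bandwidth figures above can be checked with a few lines of arithmetic. The sketch below is illustrative only: the function names are hypothetical, and the UHD parameters (3840x2160, 60 frames per second, 10-bit 4:2:2 sampling at 20 bits per pixel) are assumptions that the article does not state.

```python
def compression_ratio(uncompressed_gbps, compressed_gbps):
    """Ratio of uncompressed to compressed bitrate (6.0 means 6:1)."""
    return uncompressed_gbps / compressed_gbps


def active_video_gbps(width, height, fps, bits_per_pixel):
    """Bitrate of the active picture only, in Gbit/s (no blanking or audio)."""
    return width * height * fps * bits_per_pixel / 1e9


# Figures quoted in the article: about 12 Gbit/s uncompressed, about 2 Gbit/s compressed.
print(f"{compression_ratio(12.0, 2.0):.0f}:1")

# Assumed UHD format: 3840x2160, 60 fps, 10-bit 4:2:2 (20 bits per pixel).
print(f"{active_video_gbps(3840, 2160, 60, 20):.2f} Gbit/s active picture")
```

The quoted figures give 6:1, in line with the compression ratio cited for JPEG XS. Under the assumed format the active picture alone comes to just under 10 Gbit/s, so the 12 Gbit/s figure is plausibly a nominal link rate rather than pure picture data.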
Tony Jones says: “JPEG XS is an intra-coding technique. That is, no temporal prediction is performed. This results in much lower bit rate efficiency than compression standards such as AVC and HEVC, but in turn offers extremely low latency.
“There are a wide range of potential professional applications, including studio use, remote production and other instances where latency is critical, but where high bandwidth connections are still available,” adds Jones.
It is likely to be suited to 4K and 8K, in particular for production and editing (both live and file based), though its profile includes handling 10K.
“Light compression, such as JPEG XS, is a realistic technique to keep bandwidths, file sizes and file transfer times under control for high-quality assets, where the quality needs to be virtually indistinguishable from the uncompressed quality,” says Jones. “JPEG XS is also useful for keeping the latency well below one video frame.”
Jean-Baptiste Lorent is of the opinion that such a low latency, low compression and high efficiency codec is ideal for streaming video via Wi-Fi and 5G and will later assist the operation of drones and self-driving cars – technologies where long latency represents a danger for humans.
According to Fraunhofer IIS – developer of a JPEG XS software plugin for Adobe Premiere Pro CC – the codec is optimised for use with mezzanine (very light) compression when high-quality image data has to be transferred via limited bandwidth or processed with limited computing resources.
Under standardisation by ISO, JPEG XS will likely be ratified by the end of 2018 with the first products, including cameras, due shortly after.
Omnidirectional VR to the home
MPEG is also addressing delivery into the home of immersive media, for example 360 video and VR.
In both cases, according to Ericsson’s Tony Jones, there is an extremely stringent motion-to-photon requirement – the responsiveness of the display to any change in head position must be extremely low latency.
Jones says: “For 360 video, the rendering is performed locally from either the entire 360 image or a suitably sized portion of it, whereas for true VR, the scene itself must be created based on those head movements. If the scene creation can be performed locally, such as in a games console, then the requirements are not too challenging. If, on the other hand, the rendering is performed remotely and needs to be delivered without an excessive bit rate demand, then there are significant challenges to achieve that at the same time as meeting the motion-to-photon requirements.”
A broad initiative that may help is MPEG-I. It’s at various stages of development; while the first part of the scheme, which defines systems, audio and video parameters, is due for publication soon, other parts exist largely in outline.
VVC is part of MPEG-I, as is a related Immersive Audio Coding scheme, though this is still at the architecture level. However, the most intriguing phase of MPEG-I is Omnidirectional Media Format (OMAF). The first version targets 360-degree video compression in HEVC and is complete.
Timmerer says: “OMAF enables many optimisations but it may take some time until widely adopted, if at all, as it basically has a major impact on encoding, streaming, decoding, and rendering.”
A second version (OMAFv2), to be drafted by October, will target 3DoF+, an advance which includes ‘motion parallax’ to allow a viewer to also ‘watch behind objects’. To put it another way, OMAF is addressing potential holographic displays.
Later versions of OMAF will also address ‘omnidirectional 6 Degrees of Freedom (6DoF) for social VR’ and even the ‘dense representation of light fields’. Timmerer describes social VR as cases which “enable VR content to be consumed in a social environment, either within the same geographic context”, for example in the same room, or “with different geographic context” – different rooms and countries.
Other aspects of MPEG-I examine point cloud compression. This form of depth information can be used to produce three dimensional or holographic scenes.
“This is in its hot phase of core experiments for various coding tools,” says Timmerer. The results are set for incorporation into a working draft.
According to Timmerer, there is no relation between VVC and OMAF although that might change in the future (perhaps 2020).
“I expect OMAFv2 will be completed earlier than VVC and therefore OMAFv2 will still rely on HEVC,” he says. “This is my current estimation.”
Publication of OMAF version 1 is in the hands of ISO, but the final draft international standard can be used now. “Basic use cases could be deployed already,” says Timmerer. “I’m pretty sure there will be some demos at IBC. It’s a bit tricky though. Devices are not yet [aware of] OMAF.”
Compression for holograms
There’s yet another layer, a scheme that specifically addresses compression of massive data recorded as a light field. While it is part of MPEG-I, there also seems to be some divergence on the approach.
Streaming a ‘true native’ light field would require broadband speeds of 500Gbps up to 1Tbps. That’s according to estimates by Jon Karafin, CEO at holographic display developer Light Field Lab.
However, Karafin adds: “That’s never going to get into homes in our lifetime.”
Being able to work with so much data, let alone transmit it, requires serious compression. A group at MPEG is drafting a means of enabling the “interchange of content for authoring and rendering rich immersive experiences”.
It goes under the snappily titled Hybrid Natural/Synthetic Scene data container (HNSS).
According to MPEG, HNSS should provide a means to support “scenes that obey the natural flows of light, energy propagation and physical kinematic operations”.
Timmerer says the group is working on scene descriptions in MPEG-I, “which will study existing formats and tools and whether they can be used within MPEG-I.”
In fact, the activity is being led under the MPEG banner by CableLabs – a think tank funded by the cable industry – with input from OTOY and Light Field Lab among others.
The approach differs from conventional video compression techniques by looking to create 3D models of a scene by trapping texture, geometry and other volumetric data then wrapping it in a ‘media container’.
Not everyone is convinced that a media container is the right approach.
MIT holographic expert V. Michael Bove says: “There isn’t a universally agreed on best practice yet. I expect that will be taken care of. It’s not an insoluble problem.”
Karafin points out that the concept is already familiar to the entertainment industry. The DCP (Digital Cinema Package) is commonly used to store and convey digital files for cinema audio, image, and data streams.