Phil Hippensteel on Latency

Today, we want to focus on the meaning and importance of the term latency. Latency is also sometimes called delay. Most experts define latency in video as the time between when the camera captures a scene and when that scene is displayed. Since the focus of our series is on IP, we will consider latency at the camera, in the network, and at the play out device.

At the Camera End

When the camera captures a scene, it operates at a fixed frame rate, generally 25 or 30 frames per second. So, immediately, a delay of one frame interval is introduced: 40 ms at 25 frames per second, or about 33 ms at 30 frames per second. Sometimes the image needs to be processed to achieve the correct resolution or format. This is also a source of latency. If the stream is to be transported and simultaneously recorded, it may need to be copied, which takes time as well.
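
The capture delay is one frame interval, which is simply the reciprocal of the frame rate. Here is a minimal Python sketch of that arithmetic; the function names are illustrative only and don't come from any particular library.

```python
def frame_interval_ms(fps: float) -> float:
    """Time between successive frames, in milliseconds."""
    return 1000.0 / fps


def frames_to_ms(frames: float, fps: float) -> float:
    """Convert a delay expressed in whole or partial frames into milliseconds."""
    return frames * frame_interval_ms(fps)


for fps in (25, 30):
    print(f"{fps} fps -> {frame_interval_ms(fps):.1f} ms per frame")
```

At 25 frames per second this gives 40.0 ms, and at 30 frames per second about 33.3 ms. Any later stage that holds a whole frame can be converted to milliseconds the same way with `frames_to_ms`.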

If the camera is preparing the video for transport across the IP network, the next step will often be to compress the video. While modern compression algorithms are extremely efficient, they still take time. The higher the resolution, the more aggressive the compression may need to be, adding further latency. Finally, at each step in which the frame is processed, it is buffered in memory. FIFO (first in, first out) buffers require time for the data to be written and consume more time when the data is read back out.
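
To see how buffering adds up, here is a rough Python model of a frame passing through a chain of FIFO buffers. It assumes a simple sequential, store-and-forward behavior: the whole frame is written, any data already ahead of it drains, and then the frame itself is read out. Real hardware often overlaps writing and reading, so treat this as a worst-case sketch. The stage names and rates are hypothetical placeholders, not measurements.

```python
from dataclasses import dataclass


@dataclass
class FifoStage:
    """One hypothetical processing step that stores a frame in a FIFO buffer."""
    name: str
    write_bytes_per_s: float     # rate at which the frame is written into memory
    read_bytes_per_s: float      # rate at which data is read back out
    queued_ahead_bytes: int = 0  # data already waiting in the FIFO ahead of this frame

    def delay_s(self, frame_bytes: int) -> float:
        # Sequential model: write the whole frame, then drain everything ahead
        # of it, then read the frame itself out (first in, first out).
        write_time = frame_bytes / self.write_bytes_per_s
        read_time = (self.queued_ahead_bytes + frame_bytes) / self.read_bytes_per_s
        return write_time + read_time


def pipeline_delay_s(stages: list[FifoStage], frame_bytes: int) -> float:
    """Total buffering delay as a frame passes through every stage in turn."""
    return sum(stage.delay_s(frame_bytes) for stage in stages)


# Placeholder rates chosen only to illustrate the arithmetic.
stages = [
    FifoStage("scaler", write_bytes_per_s=400e6, read_bytes_per_s=400e6),
    FifoStage("encoder input", write_bytes_per_s=400e6, read_bytes_per_s=200e6,
              queued_ahead_bytes=3_110_400),
]
raw_frame = 1920 * 1080 * 3 // 2  # one 8-bit 4:2:0 1080p frame: 3,110,400 bytes
print(f"buffering delay: {pipeline_delay_s(stages, raw_frame) * 1000:.1f} ms")
```

The point of the model is that every additional buffer, and every frame already waiting in one, adds its own write and read time to the total.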

Alternatively, the camera may be responsible only for capturing the video. In this case, a separate encoder is used. The encoder takes care of image processing and compression, and it contains the same kinds of buffers described above, each of which adds some latency.

Regardless of which device captures and compresses the video, it must now be placed in a format that can be transported in IP packets. In the industry, the most common format is the MPEG-2 transport stream. Although it was created as part of the MPEG-2 standard, it can carry uncompressed video or video compressed by a wide range of codecs. It is a packet structure, not a codec. We'll cover its structure in much more detail in a future newsletter. For our present purposes, the compressed stream is separated into blocks of 184 bytes, and a four-byte header is added to each block. Seven of the resulting 188-byte packets fit nicely into one IP packet for transport across the network: 7 × 188 = 1,316 bytes, which with the 8-byte UDP header and the 20-byte IPv4 header comes to 1,344 bytes, under the usual 1,500-byte Ethernet limit, while an eighth packet would push the total to 1,532 bytes. This process is often called packetization.
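
The packetization step can be sketched in a few lines of Python. The four-byte header below follows the transport stream layout (a 0x47 sync byte, a payload-start flag, a 13-bit packet identifier or PID, and a 4-bit continuity counter), but the sketch makes several simplifying assumptions: the PID value 0x100 is hypothetical, the whole input is treated as a single unit that starts in the first packet, and a short final block is padded with 0xFF bytes, whereas a real multiplexer would use adaptation-field stuffing.

```python
TS_PACKET_SIZE = 188
TS_HEADER_SIZE = 4
TS_PAYLOAD_SIZE = TS_PACKET_SIZE - TS_HEADER_SIZE  # 184 bytes
PACKETS_PER_DATAGRAM = 7  # 7 * 188 = 1,316 bytes per UDP payload


def ts_header(pid: int, continuity: int, payload_start: bool) -> bytes:
    """Build a 4-byte transport stream header with no adaptation field."""
    return bytes([
        0x47,                                                     # sync byte
        (0x40 if payload_start else 0x00) | ((pid >> 8) & 0x1F),  # start flag + PID high 5 bits
        pid & 0xFF,                                               # PID low 8 bits
        0x10 | (continuity & 0x0F),                               # payload only + 4-bit counter
    ])


def packetize(stream: bytes, pid: int = 0x100) -> list[bytes]:
    """Split a compressed stream into 184-byte blocks and prepend a header to each."""
    packets = []
    for n, offset in enumerate(range(0, len(stream), TS_PAYLOAD_SIZE)):
        chunk = stream[offset:offset + TS_PAYLOAD_SIZE]
        # Simplification: pad a short final block with 0xFF bytes.
        chunk = chunk.ljust(TS_PAYLOAD_SIZE, b"\xff")
        packets.append(ts_header(pid, n, payload_start=(n == 0)) + chunk)
    return packets


def to_datagrams(packets: list[bytes]) -> list[bytes]:
    """Group packets seven at a time; each group becomes one UDP/IP packet."""
    return [b"".join(packets[i:i + PACKETS_PER_DATAGRAM])
            for i in range(0, len(packets), PACKETS_PER_DATAGRAM)]


packets = packetize(bytes(1840))  # 10 blocks of 184 bytes
print([len(d) for d in to_datagrams(packets)])
```

Ten 188-byte packets produce one full 1,316-byte datagram and one partial datagram holding the remaining three packets.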

Another factor can add delay before the IP packets are ready to be sent. Without going into detail, not all frames are compressed in the same manner. Some frames are compressed so that they can be decompressed independently, just like a JPEG picture. They are called I-frames. Others may be compressed by comparing them with the I-frames and recording only the differences. As a result, the encoder may need to store each I-frame until the frames derived from it have been determined. That's more delay.
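
One common case of this holding delay can be made concrete, under an assumption the article doesn't spell out: the encoder uses B-frames (bidirectionally predicted frames), which refer to the next I- or P-frame as well as earlier ones. Such a frame cannot be encoded until that later reference frame has been captured. The Python sketch below counts, for each frame in a display-order pattern, how many frame intervals the encoder must wait for that reference. It counts only the capture wait and ignores encoding time itself.

```python
def reorder_wait_frames(gop: str) -> list[int]:
    """For each frame in display order, count the frame intervals the encoder
    must wait before the frame's references have all been captured.

    'I' and 'P' frames depend only on earlier frames, so they wait 0.
    A 'B' frame also references the next I or P frame, so it is held
    until that frame arrives.
    """
    waits = []
    for i, frame_type in enumerate(gop):
        if frame_type not in "IPB":
            raise ValueError(f"unknown frame type {frame_type!r}")
        if frame_type in "IP":
            waits.append(0)
            continue
        # Find the next I or P frame. If none is left in this pattern,
        # assume the next group of pictures starts immediately afterwards.
        j = i + 1
        while j < len(gop) and gop[j] not in "IP":
            j += 1
        waits.append(j - i)
    return waits


frame_ms = 1000.0 / 30  # 30 frames per second
waits = reorder_wait_frames("IBBPBBP")
print(waits, f"worst case: {max(waits) * frame_ms:.1f} ms")
```

With two B-frames between reference frames, the worst case is two frame intervals, about 67 ms at 30 frames per second, on top of every other source of latency. Patterns that use only I- and P-frames avoid this particular wait.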

In the Network

In an IP network, latency is defined as the time from when a packet enters the network until it arrives at the destination device. At each switch, router and firewall, IP packets are buffered as part of that device's relaying operation. In particular, routers and firewalls must store a packet completely and read it back from memory in order to send it. This is because they must modify parts of the IP header, namely the hop count and the header checksum. In addition, buffers are used to smooth the flow of traffic in a network that is often bursty. Recent research has revealed that we may have made these buffers far larger than they need to be, a problem often called bufferbloat. The result is a slowing of some forms of traffic that use TCP (Transmission Control Protocol). Adaptive bit rate video uses TCP. TCP requires acknowledgement of the packets sent, so the delay that is critical here is the RTT (round-trip time), the time from when a packet is sent until it is acknowledged. Because a TCP sender may have only a limited amount of unacknowledged data in flight at once, TCP throughput decreases when the RTT increases.
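
The link between oversized buffers and slower TCP can be shown with two textbook relationships: a packet waits behind whatever is already queued (queued bits divided by link rate), and a window-limited TCP sender can deliver at most one window of data per RTT. The Python sketch below combines them. The path RTT, link rate and queue size are hypothetical, and the 65,535-byte window is the largest TCP window without the window scale option. Real TCP throughput also depends on congestion control and packet loss, so this is only an upper bound.

```python
def queuing_delay_s(queued_bytes: int, link_bps: float) -> float:
    """Time a newly arrived packet waits behind data already in a router queue."""
    return queued_bytes * 8 / link_bps


def window_limited_throughput_bps(window_bytes: int, rtt_s: float) -> float:
    """Upper bound on TCP throughput: at most one window of data per RTT."""
    return window_bytes * 8 / rtt_s


window = 65_535    # bytes; largest window without window scaling
base_rtt = 0.020   # seconds; hypothetical RTT with empty queues
link = 100e6       # bits per second; hypothetical bottleneck link

for queued in (0, 250_000):  # bytes sitting in the bottleneck buffer
    rtt = base_rtt + queuing_delay_s(queued, link)
    mbps = window_limited_throughput_bps(window, rtt) / 1e6
    print(f"queue {queued:>7} B: RTT {rtt * 1000:.0f} ms, at most {mbps:.1f} Mb/s")
```

A quarter of a megabyte of queued data doubles the RTT from 20 ms to 40 ms in this example, which halves the throughput ceiling from about 26.2 Mb/s to about 13.1 Mb/s.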

Over the last decade, the most widely used forms of IP video have used UDP (User Datagram Protocol), which doesn't require acknowledgment that the data was delivered. With UDP, the critical delay factor tends to be the variability in delay, called jitter. A jitter buffer is placed in the receiving decoder to absorb this erratic delivery rate. An increase in network latency will not decrease UDP throughput. However, the jitter buffer can empty or overfill, causing video data to be dropped.
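
Here is a simplified Python model of a jitter buffer. The receiver plays each packet out at the first packet's arrival time, plus the packet's original send spacing, plus a fixed buffer depth. Any packet that arrives after its playout time is effectively lost. The model assumes the send spacing is known to the receiver (in practice it would come from timestamps carried in the stream), uses made-up arrival times, and covers only the buffer running empty, not overfilling.

```python
def late_packets(send_ms: list[float], arrive_ms: list[float], buffer_ms: float) -> list[int]:
    """Return indices of packets that arrive after their scheduled playout time.

    Packet i is scheduled for
        first arrival + (send time of i - send time of first) + buffer depth,
    so packets keep their original spacing, shifted later by the buffer depth.
    """
    start = arrive_ms[0] + buffer_ms
    late = []
    for i, (sent, arrived) in enumerate(zip(send_ms, arrive_ms)):
        playout = start + (sent - send_ms[0])
        if arrived > playout:
            late.append(i)  # too late to play: this video data is dropped
    return late


send = [0, 10, 20, 30, 40]       # sender emits a packet every 10 ms
arrive = [50, 65, 70, 95, 92]    # hypothetical arrivals with jitter (and one reordering)
for depth in (0, 5, 20):
    print(f"{depth:>2} ms buffer -> late packets: {late_packets(send, arrive, depth)}")
```

With no buffer, three of the five packets miss their playout time. A 5 ms buffer loses one, and a 20 ms buffer loses none. The trade-off is that every millisecond of buffer depth is also a millisecond of added latency.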

At the Play Out Device

The device responsible for receiving the IP packets and recovering the video must remove the IP and UDP headers. It must receive the transport stream until it has one or more complete video frames, and then pass the signal to the screen. The device may be a decoder, a set-top box, a computer, a smart TV or a phone. Whatever the device, each of the steps mentioned at the encoding end has a corresponding process at the receiving end, with its own associated latency.
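
Putting the pieces together, the latency defined at the start of this article, from capture to display, is the sum of the delays at the camera, in the network and at the play out device. The Python sketch below simply tallies such a budget. The stage names and every number except the one-frame capture delay are placeholders for illustration, not measurements of any real system.

```python
def end_to_end_latency_ms(stages: dict[str, float]) -> float:
    """Capture-to-display latency as the sum of every stage's delay."""
    return sum(stages.values())


# Placeholder values for illustration only; substitute your own measurements.
budget = {
    "capture (one frame at 30 fps)": 1000 / 30,
    "processing and compression": 50.0,
    "packetization": 1.0,
    "network transit": 20.0,
    "jitter buffer": 30.0,
    "decode and display": 50.0,
}
total = end_to_end_latency_ms(budget)
for stage, ms in budget.items():
    print(f"{stage:<32} {ms:6.1f} ms ({ms / total:5.1%})")
print(f"{'total':<32} {total:6.1f} ms")
```

Laying the budget out this way makes it easy to see which stage dominates, and therefore where reducing latency is worth the effort.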