

IT trends: Video Over IP

A look at the industry division over network protocols.


Nov 1, 2005 12:00 PM
By Brent Harshbarger


Am I going crazy? Why does it seem that whenever you try to discuss video over IP, everyone’s eyes glaze over and the thought that immediately comes to mind is, “We have a failure to communicate.”

Photo courtesy of Tandberg

One answer is that each segment of the industry has a different concept of what video over IP means. The Internet, broadcasting, conferencing, and production segments all have different requirements for transporting video over a network, and several technologies are involved. There is the actual audio and video content, but what format is it in? What compression technology is employed, and how is it transported? To make matters worse, there is a prevailing assumption that if a device connects to a Cat-5 cable, then 1) it has an Ethernet connection, and 2) any Ethernet device will work on the Internet.

While each segment uses audio and video, bandwidth limitations mean that each one selects codecs based on the bandwidth and quality its users require. For example, consider what each group regards as the standard video codec. Ask a web-centric person which video codec to use, and you might hear VC-1. Ask a videoconferencing or presentation systems person, and you would hear H.263. A broadcast engineer would probably answer MPEG-2. The audio codecs these same groups name would include MP3 and AC3, among others. You would get just as many different answers about what transport and control mean.

Traditional Internet/Web Protocol Stack


So which part of video over IP is the IP part? Actually, video over IP has less to do with IP than with the other protocols used in networking. But before we can discuss the similarities and differences between these protocols and technologies, we need a working understanding of the OSI (Open Systems Interconnection) model.

The OSI model is the framework on which networking technology is built, and it is broken into seven layers, or levels, of communication protocols. The framework starts from the electrical connection of a network and works its way up to the user’s application. From bottom to top, the seven layers are the physical, datalink, network, transport, session, presentation, and application layers. Each layer defines how it communicates with its peer layer on the other system and with the layers directly above and below it.
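To make the layering idea concrete, here is a toy Python sketch of encapsulation: on the way down, each layer wraps what it receives from the layer above in its own header, and on the way up, the matching layer on the receiving system strips that header off. The bracketed headers are invented placeholders, not the wire format of any real protocol.

```python
# Toy model of OSI-style encapsulation. Layer names follow the OSI model;
# the bracketed "headers" are invented placeholders, not real wire formats.
OSI_LAYERS = [
    "physical", "datalink", "network", "transport",
    "session", "presentation", "application",
]

def send_down(payload: str) -> str:
    """Wrap the payload in one placeholder header per layer, top to bottom."""
    data = payload
    # Walk from the application layer down to the datalink layer; the
    # physical layer only transmits bits, so it adds no header here.
    for layer in reversed(OSI_LAYERS[1:]):
        data = f"[{layer}]{data}"
    return data

def receive_up(frame: str) -> str:
    """Strip the headers bottom to top, as each peer layer reads its own."""
    data = frame
    for layer in OSI_LAYERS[1:]:
        header = f"[{layer}]"
        if not data.startswith(header):
            raise ValueError(f"expected {header} header")
        data = data[len(header):]
    return data
```

Note that the outermost header on the wire belongs to the lowest layer, which is why the receiver processes headers in the reverse order the sender added them.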

Multimedia Protocol Stack


The physical layer includes the physical network connection devices and protocols such as cables, plugs, switches, and electrical standards.

The datalink layer builds on the physical connection; Ethernet, for example, operates at this layer, turning a raw physical connection into an Ethernet link. This layer provides framing for data transport units, defines how the link is shared among multiple connected devices, and supplies addresses for the devices on each link.
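As a rough illustration of framing and link addressing, the following Python sketch packs an Ethernet II header: destination address, source address, and a two-byte EtherType identifying the payload. It assumes the network hardware adds the preamble and frame check sequence, and it ignores the padding real frames need to reach the minimum payload size; the MAC addresses used with it are made up.

```python
import struct

ETHERTYPE_IPV4 = 0x0800  # EtherType value that marks an IPv4 payload

def mac_to_bytes(mac: str) -> bytes:
    """Convert 'aa:bb:cc:dd:ee:ff' to 6 raw bytes."""
    parts = mac.split(":")
    if len(parts) != 6:
        raise ValueError("MAC address must have six octets")
    return bytes(int(p, 16) for p in parts)

def build_ethernet_frame(dst: str, src: str, ethertype: int, payload: bytes) -> bytes:
    # Ethernet II header: 6-byte destination, 6-byte source, 2-byte EtherType,
    # in network (big-endian) byte order. Preamble, FCS, and minimum-length
    # padding are omitted; hardware and drivers normally handle them.
    header = struct.pack("!6s6sH", mac_to_bytes(dst), mac_to_bytes(src), ethertype)
    return header + payload

def parse_ethernet_header(frame: bytes):
    """Return (destination MAC, source MAC, EtherType) from a frame."""
    dst, src, ethertype = struct.unpack("!6s6sH", frame[:14])
    return dst.hex(":"), src.hex(":"), ethertype
```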

The network layer connects links, unifying them into a single network. It provides addressing and routing of messages through the network. It may also provide control of congestion in the switches, prioritization of certain messages, and so on. A network layer device processes messages received from one link and dispatches them to another, using routing information exchanged with its peers at the far ends of those links.
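Routing can be sketched with Python’s standard ipaddress module. The routing table below is hypothetical; the lookup applies longest-prefix matching, the usual rule IP routers use to choose among overlapping routes.

```python
import ipaddress

# Hypothetical routing table: (destination prefix, next-hop router) pairs.
ROUTES = [
    (ipaddress.ip_network("0.0.0.0/0"), "192.0.2.1"),     # default route
    (ipaddress.ip_network("10.0.0.0/8"), "10.255.0.1"),
    (ipaddress.ip_network("10.1.0.0/16"), "10.1.255.1"),
]

def next_hop(destination: str) -> str:
    addr = ipaddress.ip_address(destination)
    # Collect every route whose prefix contains the destination address...
    matches = [(net, hop) for net, hop in ROUTES if addr in net]
    # ...and forward along the most specific one (the longest prefix).
    _net, hop = max(matches, key=lambda m: m[0].prefixlen)
    return hop
```

The default route matches every IPv4 address, so a lookup always finds at least one candidate.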

The transport layer is the first end-to-end layer. It takes responsibility for delivery of messages from one system to another, using the services provided by the network layer. This responsibility includes providing reliability and flow control if they are needed by the session layer and not provided by the network layer.
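The reliability half of that job can be sketched with sequence numbers and cumulative acknowledgments, the approach TCP uses. This simplified Python version numbers whole segments rather than bytes (TCP actually counts bytes), and the function names are hypothetical.

```python
def cumulative_ack(received: set[int], start: int = 0) -> int:
    """Return the next sequence number the receiver is still waiting for."""
    expected = start
    # Everything before 'expected' has arrived, so it can be acknowledged.
    while expected in received:
        expected += 1
    return expected

def missing(received: set[int], start: int, end: int) -> list[int]:
    """List sequence numbers in [start, end) that have not arrived yet."""
    return [n for n in range(start, end) if n not in received]
```

If segments 0, 1, 2, 4, and 5 arrive, the receiver can only acknowledge up to segment 2; the sender must retransmit segment 3 before the acknowledgment moves on, which is exactly the delay the article later cites as the reason media streams favor UDP.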

IETF Multimedia Protocol Stack

The session layer manages transport connections in a fashion meaningful to the application. Examples include the Hypertext Transfer Protocol (HTTP) used to retrieve web pages and the management of the control and data channels in the File Transfer Protocol (FTP).
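HTTP requests are plain text. This Python sketch builds a minimal HTTP/1.1 GET request of the kind a browser sends to retrieve a page; the host and path are placeholders, and in practice the bytes would be written to a TCP connection.

```python
def build_http_get(host: str, path: str = "/") -> bytes:
    # An HTTP/1.1 request is a request line, header lines, and a blank line.
    # Every line ends with CRLF, as the HTTP specification requires.
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",
        "Connection: close",
        "",  # empty entries produce the blank line that ends the headers
        "",
    ]
    return "\r\n".join(lines).encode("ascii")
```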

The presentation layer describes the format of the data conveyed by the lower layers. Examples include the Hypertext Markup Language (HTML) used to describe the presentation of the web pages and more mundane issues such as the differences between text and binary transfers in FTP.
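The text-versus-binary distinction in FTP comes down to line endings: an ASCII-mode transfer sends text with CRLF line endings (the NVT-ASCII convention in RFC 959), while binary (image) mode sends the bytes untouched. This Python sketch assumes the sending system stores text with Unix-style LF endings; the function names are hypothetical.

```python
def ftp_ascii_encode(data: bytes) -> bytes:
    """ASCII-mode transfer: convert local LF line endings to CRLF."""
    # Normalize existing CRLF pairs first so they are not doubled.
    return data.replace(b"\r\n", b"\n").replace(b"\n", b"\r\n")

def ftp_binary_encode(data: bytes) -> bytes:
    """Binary (image) mode transfer: send the bytes unchanged."""
    return data
```

Running a binary file through the ASCII conversion shows why the mode matters: the PNG file signature `b"\x89PNG\r\n\x1a\n"` comes out altered, corrupting the file.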

The application layer includes the applications themselves (web browsers, for example). See Chart 1 on p. 34 for the typical model of many traditional web technologies.

ITU Teleconferencing Protocol Stack


Two technological camps have done significant work on video over IP: the Internet Engineering Task Force (IETF) and the International Telecommunication Union (ITU). The different ways these two camps apply the OSI model to video over IP have caused the greatest confusion between them. The biggest difference between the IETF and the ITU lies in their original design concepts: the IETF provided a client/server model for using media, while the ITU built on telecommunications standards based around ISDN.

The primary differences lie in the technologies and protocols of the presentation and session layers. The Multimedia Protocol Stack (Chart 2 on p. 34) shows all of the protocols normally used for audio and video on a network. First, compare the layers and protocols of the Multimedia Protocol Stack with those of the Traditional Internet/Web Protocol Stack on p. 34.

Comparison of SIP and H.323 Protocols

Next, compare the IETF and ITU protocol stacks. The key difference between the two is that the ITU uses the H.323 protocol suite and the IETF uses the Session Initiation Protocol (SIP). There has been much discussion as to which is better, but we will not debate either case in this article. H.323 has changed considerably over the years to better support networking technologies. For example, earlier versions supported only the Transmission Control Protocol (TCP) for call signaling, while version 3 and later support both TCP and the User Datagram Protocol (UDP).

The IETF and ITU both use IP, and both systems can use TCP or UDP (as noted above). TCP guarantees delivery of data, but the acknowledgments and retransmissions that provide that guarantee take time. For real-time audio and video, the typical choice is therefore UDP, which sends a stream of data packets without reporting their delivery status, eliminating that delay.
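The practical difference shows up in the socket API. In this Python sketch, UDP needs no connection setup: the sender simply hands a datagram to the operating system and gets no report of whether it arrived. A TCP exchange, by contrast, would need a connect/accept handshake before any data moved. The loopback address, timeout, and payload are arbitrary choices for the example.

```python
import socket

def udp_loopback_demo(payload: bytes) -> bytes:
    receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        # Port 0 asks the OS for any free port on the loopback interface.
        receiver.bind(("127.0.0.1", 0))
        receiver.settimeout(2.0)
        # No handshake and no acknowledgment: sendto() returns once the
        # datagram is handed to the OS, whether or not it is ever delivered.
        sender.sendto(payload, receiver.getsockname())
        data, _addr = receiver.recvfrom(2048)
        return data
    finally:
        sender.close()
        receiver.close()
```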

Within the H.323 suite, two protocols, H.225 and H.245, perform the functions that SIP competes with: they determine how connections are set up, controlled, and disconnected. Again, the key difference is that SIP works on a client-server model, while the H-series protocols are more akin to connection-oriented telephony. H.225 is, if you will, an H.323 sub-layer that connects to the LAN interface. Chart 5 on p. 34 shows this comparison.
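SIP’s Internet roots show in its messages, which are plain text in the style of HTTP. The Python sketch below assembles a minimal SIP INVITE (the request that sets up a call, per RFC 3261) carrying an SDP body that offers one H.263 video stream over RTP. The addresses are placeholder values from documentation ranges, the function name is hypothetical, and a real user agent would also handle responses, authentication, and retransmission.

```python
import uuid

def build_sip_invite(caller: str, callee: str, local_ip: str, rtp_port: int) -> str:
    # SDP body offering one H.263 video stream over RTP.
    # Payload type 34 is the static RTP/AVP assignment for H.263.
    sdp = "\r\n".join([
        "v=0",
        f"o=- 0 0 IN IP4 {local_ip}",
        "s=-",
        f"c=IN IP4 {local_ip}",
        "t=0 0",
        f"m=video {rtp_port} RTP/AVP 34",
        "a=rtpmap:34 H263/90000",
    ]) + "\r\n"
    # RFC 3261 branch IDs begin with the "z9hG4bK" magic cookie.
    branch = "z9hG4bK" + uuid.uuid4().hex[:12]
    headers = [
        f"INVITE sip:{callee} SIP/2.0",
        f"Via: SIP/2.0/UDP {local_ip}:5060;branch={branch}",
        "Max-Forwards: 70",
        f"To: <sip:{callee}>",
        f"From: <sip:{caller}>;tag={uuid.uuid4().hex[:8]}",
        f"Call-ID: {uuid.uuid4().hex}@{local_ip}",
        "CSeq: 1 INVITE",
        f"Contact: <sip:{caller.split('@')[0]}@{local_ip}>",
        "Content-Type: application/sdp",
        f"Content-Length: {len(sdp.encode('ascii'))}",
    ]
    # A blank line separates the headers from the message body.
    return "\r\n".join(headers) + "\r\n\r\n" + sdp
```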

The common element for both is the media protocol, the Real-time Transport Protocol (RTP), but depending on the segment of the industry you come from, the audio and video carried within it will differ. Traditional videoconferencing will use H.263, and other video-oriented segments will probably use some version of MPEG.
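Every RTP packet starts with a fixed 12-byte header (RFC 3550) carrying a payload type that identifies the codec, a sequence number, a timestamp, and a source identifier. Because UDP reports nothing about delivery, the sequence number is what lets a receiver notice lost packets. This Python sketch packs and parses that header; payload type 34 is the static assignment for H.263 in RFC 3551, and the helper names are hypothetical.

```python
import struct

RTP_VERSION = 2
PT_H263 = 34  # static RTP payload type for H.263 video

def build_rtp_header(seq: int, timestamp: int, ssrc: int,
                     payload_type: int = PT_H263, marker: bool = False) -> bytes:
    # Fixed 12-byte header: no padding, no extension, no contributing sources.
    byte0 = RTP_VERSION << 6                       # V=2, P=0, X=0, CC=0
    byte1 = (int(marker) << 7) | (payload_type & 0x7F)
    return struct.pack("!BBHII", byte0, byte1, seq & 0xFFFF,
                       timestamp & 0xFFFFFFFF, ssrc & 0xFFFFFFFF)

def parse_rtp_header(packet: bytes) -> dict:
    byte0, byte1, seq, timestamp, ssrc = struct.unpack("!BBHII", packet[:12])
    return {
        "version": byte0 >> 6,
        "marker": bool(byte1 >> 7),
        "payload_type": byte1 & 0x7F,
        "seq": seq,
        "timestamp": timestamp,
        "ssrc": ssrc,
    }

def packets_lost(prev_seq: int, seq: int) -> int:
    """Count packets missing between two arrivals, allowing 16-bit wraparound.

    Assumes in-order arrival; a real receiver must also handle reordering
    and duplicate packets.
    """
    return (seq - prev_seq - 1) & 0xFFFF
```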


With its roots in the Internet, SIP is being adopted in more areas that would traditionally have used H.323. The marketplace is demanding convergence of communication tools, whether PC, PDA, phone, or wireless device, and SIP is bringing these technologies together.

This trend has taken hold in the video arena as well. Content providers place great importance on being able to transport the same content across a network and display it on whatever device is at hand. The Moving Picture Experts Group (MPEG), the ITU, and others have worked together to provide a scalable video codec. It has several names so that all the groups get credit for it: you might know it as MPEG-4 Part 10, AVC, or H.264. This is another example of the communication groups working together to provide a single solution.

With SIP and H.264 working together, you will soon see video content that scales from HDTV down to 3G (third-generation mobile) devices. Eventually, all audio and video segments will be speaking the same language, which will lead to better-integrated systems.

Brent Harshbarger has worked for Peavey in the development of MediaMatrix. He can be reached
