
Some years ago, I got into a “disagreement” with a media archivist about whether a system for capturing video from tape needed genlock. The archivist had worked at a TV station and insisted that video signals had “timing” which meant that everything must be locked to a common sync source. This was partly correct, within a particular context, but reflected a basic misunderstanding about how video sync actually works (and why it’s rarely needed in a capture or editing system).
What is sync, and why does it matter? First off, consider analog audio—a continuous signal of varying amplitude and frequency. Analog audio does not have, and does not need, any kind of sync. Yes, there is tempo and meter in music, but that’s the content of the signal. The sound waves created by instruments or voices are continuous analog phenomena, as are the electrical signals generated by microphones or electronic instruments.
Video, on the other hand, is not a continuous function. What we perceive as images and motion are actually rows of pixels, and complete frames, repeated at high speed. Some kind of synchronization is necessary to define where these discrete elements begin and end, and to make multiple video sources intermix cleanly.
Analog video carries its own sync information in the horizontal and vertical blanking intervals in between active picture (Fig. 1). These specifically defined pulses tell cameras and video monitors where picture lines and frames occur, and also provide a reference for color information, among other things. Measuring and adjusting sync and blanking was once a common activity for TV engineers.
A digital video signal consists of a stream of digital data, but it is still necessary to define the start and end of picture lines and frames. In SDI, for example, this is done with unique data words such as TRS (timing reference signal), SAV (start of active video), and EAV (end of active video).
More fundamentally, a device receiving digital video must be able to understand the spacing of the voltage transitions that represent bits, and where the data words start and end. This requires a clock signal to establish a reference. Most digital video bitstreams are self-clocking, meaning that the bitstream is constructed to contain clock information that can be retrieved by the receiver, which then synchronizes its own clock to the incoming signals. Digital audio signals also need clock information to accurately recover the bitstream. Most digital audio bitstreams, such as AES3, are self-clocking.
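To make the TRS idea concrete, here is a minimal Python sketch that scans a list of 10-bit SD-SDI words for the 0x3FF, 0x000, 0x000 preamble and decodes the flag bits of the XYZ word that follows (bit 8 = field, bit 7 = vertical blanking, bit 6 = EAV vs. SAV, per the SD-SDI convention). Real deserializers do this in hardware on the live bitstream; the function name and list-of-words format here are purely illustrative.

```python
def find_trs(words):
    """Scan 10-bit SDI words for TRS sequences (0x3FF, 0x000, 0x000, XYZ)
    and decode the XYZ flag bits."""
    events = []
    for i in range(len(words) - 3):
        if words[i] == 0x3FF and words[i + 1] == 0x000 and words[i + 2] == 0x000:
            xyz = words[i + 3]
            f = (xyz >> 8) & 1  # field bit
            v = (xyz >> 7) & 1  # vertical blanking bit
            h = (xyz >> 6) & 1  # 1 = EAV, 0 = SAV
            events.append((i, "EAV" if h else "SAV", f, v))
    return events
```

For example, the XYZ word 0x274 decodes as an EAV in the active picture of field 1, while 0x200 decodes as the matching SAV.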
One of the key problems in moving digital signals over wires or air is distortion of the bitstream. Loss of voltage level, rounded high-frequency edges, and smearing of the bit transitions (jitter) can make the signal difficult to capture and decode. Hence, the great attention paid to cable types, distance, connector termination, etc. when building networks or AV systems.
Processing devices that reclock the signal, like a reclocking distribution amplifier, will push out a fresh bitstream that is a “clean” copy of what is received. This is useful to refresh the signal when transporting over long distances. Non-reclocking devices may amplify the signal, but do not create a clean copy.
So we are dealing with two distinct types of sync information: Video sync that defines picture lines and frames (whether analog or digital), and clock signals for digital bitstreams.
Reference & Genlock
In television jargon, reference is a signal, usually from a sync generator, that establishes when in time frames of video occur, within a particular system or facility. Generator lock, or genlock, simply means that a piece of equipment receives a reference signal and makes its internal clock run in sync with that reference.
Going back to analog production, in order to cut and dissolve cleanly between multiple cameras and other sources on a video switcher, those sources must be timed to the switcher inputs. This means they are genlocked to the same reference as the switcher, and their output signals reach the switcher at the same time, as measured by the location of the sync pulses in the signal.
Digital video production switchers generally allow a greater range of input timing, but non-synchronous (unlocked) sources can still produce glitches, such as the picture offset vertically in the frame. Some readers may not even know that this can happen because many switchers now include internal frame synchronizers (on each input or assignable) that compensate for non-sync inputs. This is quite handy, but the price paid is a frame of video delay for any source using a frame sync. That’s one reason it is still useful to genlock sources when possible (Fig. 2).
A video device that is not genlocked is said to be free-running, or on internal sync, which is fine in most situations. For example, a camera going to a file recorder or streaming encoder does not need genlock, and that is true of most AV systems. Systems that might need genlock include those for live production, theatrical shows, and theme park attractions, where synchronized playback or switching is involved.
Note that cutting between non-synchronous signals, as on a video routing switcher, may cause a momentary glitch or pause if the destination device needs to re-lock itself. In many situations that is acceptable, even expected. However, if cuts between sources must be clean, the sources and the routing switcher should be genlocked to a common sync source so that the router knows where to make the switch cleanly.
Another common use for genlock is to synchronize cameras with video displays that those cameras are aimed at. This was often necessary in the days of CRT televisions to prevent the appearance of dark bars rolling vertically through the frame. Then it almost disappeared due to the advent of non-CRT displays and more sophisticated imager scanning.
Now it’s back in use because of direct-view LEDs, in particular on “virtual volume” sets for cinematic production. DVLEDs not only have line and frame scanning, but the LED brightness is modulated by switching on and off at very high speed. The possible need to genlock cameras and displays is an extra complication, but some advanced virtual production systems take advantage of this by synchronizing cameras and displays into “time slots” so that different cameras can “see” different onscreen images simultaneously.
It’s important to note that genlock is not needed for recording. In order to accurately capture a video signal, the recording device must lock to the incoming signal (whether SDI, HDMI, or analog). If the recorder does have a reference input, it’s there to genlock the output when playing back into a downstream device like a switcher. The reference does not, and should not, affect the recording.
But wait a minute, if you record in-camera on a memory card, and also genlock the camera, you’re genlocking a recording device! Well, kind of. What you’re really doing is locking the camera’s internal sync generator so that when the sensor makes video frames, they happen to occur in sync with the genlock source, as will the camera’s live output. Those frames are then stored on the memory card. What is being genlocked is the source of video frames, irrespective of the in-cam recording.
Moving back to audio, synchronization is an issue when bringing digital audio sources into a mixer for the same reasons as with video. The mixer needs to know where the data bits are in time, and they all need to line up. For a single AES source going to a mixer, telling the mixer that the incoming source is the clock master is usually sufficient.
Using multiple AES sources means feeding wordclock to all the devices so that, again, they are locked to a common generator (or the output from one particular device). In some cases, different equipment may use different sync types (blackburst, tri-level sync, wordclock), and that is usually okay as long as all sync sources come from the same generator or there is a “chain” of genlock between devices.
Analog Sync Signals
The actual sources and types of signals used for clocking and genlocking equipment come in various flavors. One way to define them is by whether they are analog or digital. Another way is by the precision or granularity of their clock components.
In the video world, analog blackburst has been, and arguably still is, the most universal video sync signal in use. The name derives from the components of analog composite video sync, which include pulses for horizontal and vertical synchronization of the picture, a “packet” of 3.58 MHz sine wave known as colorburst, for color reference, and video at black level. Figure 1 shows the horizontal blanking portion between video lines. There are also pulses in vertical blanking between frames. Together, these provide the necessary information to lock other devices and appear on screen as black (for purists, we’re not getting into NTSC video “setup” level here).
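The arithmetic behind those numbers is tidy and worth seeing once. NTSC defines the color subcarrier as exactly 315/88 MHz, with 227.5 subcarrier cycles per line and 525 lines per frame; a few lines of Python using exact fractions (purely for illustration) recover the familiar 3.58 MHz, 15.734 kHz, and 29.97 fps figures:

```python
from fractions import Fraction

subcarrier = Fraction(315_000_000, 88)      # color subcarrier: ~3.579545 MHz
line_rate = subcarrier / Fraction(455, 2)   # 227.5 subcarrier cycles per line
frame_rate = line_rate / 525                # 525 lines per interlaced frame

print(float(subcarrier))  # 3579545.4545...
print(float(line_rate))   # 15734.2657... Hz
print(float(frame_rate))  # 29.97002997... fps
```

Note that the frame rate comes out as exactly 30000/1001, which is where all the “29.97” bookkeeping in video (and drop-frame timecode) originates.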
Just about any device I’ve encountered that accepts a genlock input on BNC will lock to blackburst. The circuits that generate and receive blackburst were long ago built into chips and chip sets that are readily usable, so it’s quite ubiquitous. For that matter, since blackburst is just a composite video signal, with picture content at video black level, most devices will also lock to any stable composite video. The picture content is stripped out, so only the sync pulses and burst are used.
But the sync pulses used in analog standard definition were not sufficient for analog HD, which required tri-level sync. TLS does away with colorburst, which served no purpose in HD, and adds more sync pulses at different voltage levels to increase precision. For the short time that HD video was analog, TLS served the same purpose as composite sync in analog SD, indicating the lines and frames being scanned.
HD-SDI (and everything since) uses the same data word framework as SD-SDI, so analog sync is replaced by SAV, EAV, etc. in the bitstream. But tri-level sync can still be used to genlock devices and is sometimes considered preferable to blackburst because of its finer precision. Early in the transition to HD, some equipment could only accept TLS for genlock, and much ado was made about sync generators that could produce TLS. I even recall that some early HD equipment would not work at all unless it received a TLS signal. Over time, most gear became able to accept TLS or blackburst, and sometimes SDI video, as a reference source. In practice, I have not encountered any problems using analog black in HD systems.
Remember that blackburst and TLS are used to genlock digital equipment, but are themselves analog signals. They will not pass through distribution amps, routing switchers, or other equipment designed for SDI video. Wordclock is also an analog signal, basically a square wave. It is handled like video, using coaxial cable and BNC connectors. There are some variations as well, such as AES3 signals with no audio payload, only clock (aka AES11 or DARS).
In the audio production world, a lot of attention is paid to digital audio clocking, with master sync generators, high-resolution wordclocks, and other considerations to reduce the potential for jitter in clock and digital audio signals. This is of particular concern with audio sample rates like 96 and 192 kHz. I don’t have a strong opinion on this subject because I’ve never tested how audio actually sounds using different resolution wordclocks. Not a topic to get into here.
As an aside, there is “black” in digital video, but that has nothing to do with sync or genlock. It’s the darkest on-screen image content. For production systems, I usually install a sync generator that produces both SDI colorbars and SDI black, and include them as sources on the SDI router, which is quite useful.
It’s also worth mentioning at this point that SMPTE timecode was never intended as a synchronizing signal. Timecode was developed to numerically keep track of video frames and should be locked to video when used this way. In fact, in the analog days, it was possible to inadvertently produce a videotape on which the timecode did not align with video frame boundaries, causing editing confusion.
Nor was timecode intended as a speed-resolving method, though it can be used this way for synchronizing multiple audio tape machines, or syncing audio tape to video, since analog audio (as mentioned at the top of this article) has no inherent “sync” content.
Timecode generated within a camera or other video source will be locked to the video. In a multi-cam recording situation, it’s generally acceptable to jam-sync the timecode of all cameras from one source at the start and let them free-run. That way, their internal recordings will be correct, and it’s easy enough to adjust in editing for any minor time slippage between cameras. Another commonly used option is a “lock” box that generates both timecode and sync for each camera; these boxes can be networked together to keep the timecode of all cameras matched.
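The frame-counting nature of timecode is easy to see in code. As a simplified sketch (non-drop-frame only; real drop-frame timecode skips frame numbers 00 and 01 at most minute boundaries and needs extra logic), converting HH:MM:SS:FF to an absolute frame count and back is plain arithmetic:

```python
def tc_to_frames(tc, fps=30):
    """Convert non-drop-frame HH:MM:SS:FF timecode to a frame count."""
    hh, mm, ss, ff = (int(x) for x in tc.split(":"))
    return ((hh * 60 + mm) * 60 + ss) * fps + ff

def frames_to_tc(n, fps=30):
    """Convert a frame count back to non-drop-frame HH:MM:SS:FF timecode."""
    ff = n % fps
    n //= fps
    ss = n % 60
    n //= 60
    mm = n % 60
    hh = n // 60
    return f"{hh:02d}:{mm:02d}:{ss:02d}:{ff:02d}"

print(tc_to_frames("01:00:00:00"))  # 108000
```

Editing systems do essentially this bookkeeping when conforming multi-cam material, which is why matched (or at least jam-synced) timecode across cameras makes alignment so much easier.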
Digital Sync
AV operating on networks is an entirely different animal than either baseband analog or digital video and audio. In this environment, the signal “on the wire” has no resemblance to the actual payload, which has been abstracted into data words using the IP networking stack and protocols. Not only that, Ethernet, which is what we’re mostly talking about, was not designed to carry time-dependent data, so the delivery of data packets (to use the term generically) is not predictable or deterministic.
But those packets still need to be sent and reassembled into audio and video, with the picture and sound in their original relationships of waveforms, pixels, lines, frames, etc. Enter Precision Time Protocol, also known as IEEE 1588, which can synchronize device clocks with accuracy down to nanoseconds for networks handling real-time data, including some types of AV.
A key point here is the “real-time” aspect, in which streams of audio and/or video need to stay in sync, such as for multichannel audio, audio/video lip-sync, or combining in a video switcher. In addition, AV-over-IP standards such as SMPTE 2110 are designed for uncompressed payloads, which means higher data rates, which means greater need for precision clocking.
Without getting deeply into how PTP works, there are two key concepts. First, network switches provide the clock fabric and can have one (or more) of several roles. In a network with one switch, the switch is the Leader clock, and devices that receive clock messages are Ordinary Follower clocks. In larger networks, there can be Grandmaster, Boundary, Transparent, and Ordinary (leader or follower) clocks arranged in a hierarchy. How clocks are designated depends on the network architecture and algorithms within PTP.
Secondly, because every transmission through a network has some latency, if a follower clock simply locked itself to a leader clock, it would immediately be behind. PTP is designed so that follower clocks can calculate the delay and offset necessary to keep themselves in sync with the leader in their network segment by exchanging specific timestamp messages on a regular basis.
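That timestamp exchange reduces to two equations. If t1 is when the leader sends a Sync message, t2 when the follower receives it, t3 when the follower sends a Delay_Req, and t4 when the leader receives it, then, assuming the path delay is the same in both directions, the follower’s clock offset and the mean path delay fall out directly. A sketch with illustrative nanosecond values:

```python
def ptp_offset_and_delay(t1, t2, t3, t4):
    """PTP delay request-response math (assumes a symmetric network path).

    t1: leader sends Sync         t2: follower receives Sync
    t3: follower sends Delay_Req  t4: leader receives Delay_Req
    """
    offset = ((t2 - t1) - (t4 - t3)) / 2  # follower clock minus leader clock
    delay = ((t2 - t1) + (t4 - t3)) / 2   # mean one-way path delay
    return offset, delay

# Follower running 500 ns ahead of the leader, 100 ns one-way path delay:
print(ptp_offset_and_delay(0, 600, 1000, 600))  # (500.0, 100.0)
```

Having computed the offset, the follower steers its local clock to cancel it out, and the exchange repeats regularly so the correction tracks drift over time.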
PTP has gone through several revisions and is part of AV-over-IP protocols such as Dante, AES67, Ravenna, AVB, SMPTE 2110, and others. Generally speaking, networks carrying these signals should have switches with PTP implemented in hardware, which is also advisable for computer NICs and other devices. Here is one of many helpful general descriptions of PTP online from 2021, but the fundamentals remain relevant:
https://www.rs-online.com/designspark/an-introduction-to-ieee-1588-precision-time-protocol
Needless to say, not every “over-IP” application uses or needs PTP. Simply sending a video source to some monitors over a network can be done using a variety of protocols without precision sync. In this case, absolute timing is not critical; nobody will care about the few milliseconds it takes to buffer the data and produce video on the screen. If audio is embedded in the video source, it should stay in correct sync with picture.
In more extensive systems, or for live production, a protocol like NDI manages to reassemble and process data with a relatively small margin of timing difference without PTP. If sources are being combined in a switcher or software production platform (TriCaster, vMix, etc.), they will be synchronized at that point, with one or more frames of latency inherent in the process.
Lastly, a word about PTP terminology. For obvious reasons of cultural sensitivity the terms master and slave are being replaced over time by leader/follower or other options in various contexts. I understand that the IEEE recommends timeTransmitter and timeReceiver in the case of PTP, but I preferred leader/follower for this article.