Technology Mythology

I worry a fair bit about bad ideas and misconceptions that get passed on, in full or in part, and then become “common knowledge.” Eventually it’s easier (and less risky) to repeat what we think must be right, even if we don’t truly understand why. But ideas that are taken out of context, misunderstood, or repeated incorrectly may end up partially or entirely wrong.

We deal with technologies that have origins in different industries and disciplines, have evolved for different reasons, and are ultimately constrained by the laws of physics. Statements about how something is “supposed to be” need context and confirmation. Here are a few examples.

Reference, Genlock and Timing

Before file-based recording and non-linear editing, when video was captured and edited on linear tape, the way to edit a show was to synchronize the source tape machines with frame accuracy. Effects like wipes and dissolves were done in real-time through a video production switcher.

Similarly, live production, like a news or sports broadcast, required syncing cameras and tape machines, and sending signals through a switcher. If the production involved taking feeds from remote locations those had to be synchronized as well. Live production has changed in many ways, but syncing sources is often still necessary.

In television jargon, reference is a signal, usually from a sync generator, that establishes when in time frames of video occur, within a particular system or facility. A signal like wordclock does the same for digital audio. Generator lock, or genlock, simply means that a piece of equipment receives a reference signal and makes its internal clocks run in sync with that reference. Timing, in this context, means that the outputs of genlocked devices produce video frames in sync with each other, and are matched to a certain tolerance (typically within microseconds for video) when they reach a switcher. Fig. 1

In the days of analog production, reference, genlock, and timing were constantly discussed, and video engineers became accustomed to providing reference to everything in a facility as a matter of “best practice” because it often mattered. This has left a residue of sorts within the industry, where those concepts are sometimes discussed without any context for their actual usage. But if you examine what’s going on in a given system, reference may be doing nothing.

For starters, genlock is not used by recording devices. That is, for the purpose of accurately capturing a video signal, the recording device must lock to the incoming signal (whether SDI, HDMI or analog). If the recorder does have a reference input, it is there to genlock the output when playing back into a downstream device like a switcher. The reference does not, and should not, affect the recording.

Secondly, a system with only one video source, or where frame-accurate switching is not needed, won’t benefit from genlock. (1) In that same context, there is no “timing” unless multiple devices are being fed to a production switcher or similar device. Yes, there is plenty of timing information within video signals, but that’s inherent in the nature of video. It is not the same as needing to time-align sources into a production switcher. In fact, even that practice became mostly unnecessary long ago because digital production switchers can accept sources that are not perfectly timed. So “video timing” is just not a thing anymore.

Furthermore, it is now common to incorporate video frame synchronizers within all kinds of equipment, including production switchers. In that case it may not even be necessary to genlock cameras and other sources to get clean switching. I imagine that many users of popular small production switchers don’t even realize that genlock was once a necessity.

For the record, when I design systems for live production, I still look for cameras with a reference input, and use it when possible, even if the switcher has frame syncs. To some extent this is a bit of “insurance” so I don’t have to worry if the system changes and genlock might matter. Plus, using the frame syncs in a switcher to lock asynchronous sources automatically adds a frame of video delay, which can lead to lip-sync errors (this may happen anyway, depending on the switcher design).

Interestingly, the introduction of virtual sets using direct-view LED video walls has made genlock important in a different way. It may be critical to lock the cameras and the wall processor together to avoid visual artifacts and ensure that scene elements appear correctly. In this scenario the reference is synchronizing the scanning of the displays and the camera sensors (a practice that went away with the retirement of CRT monitors).

Lastly on this topic, new schemes for reference and timing, like Precision Time Protocol (PTP), have been developed for use with packet-based transport such as SMPTE ST2110. In many cases these are not optional because they are directly concerned with reconstructing video from data packets, which is an entirely different purpose than described above.

(1) Side note: In some cases old equipment, such as a timebase corrector, may benefit from receiving good reference because its internal sync generator is no longer accurate, even if it is being used in a single-source role (such as videotape capture).

Latency and Delays

Latency is a basic fact of the world. It takes some amount of time to do any physical, electronic, or software process. Manufacturer claims of “zero latency” are just marketing mythology. But some latency might truly be insignificant for the particular application. This can be important for events that take place in real time, like live broadcasts or meetings with remote participants. And figuring out how much latency is acceptable means accounting for all the places that delays will occur.

I learned this the hard way some years ago when trying to send real-time video over the internet. The internet part was on a private network between private data centers, so the network latency was insignificant in video terms. What proved problematic was the delay for encoding and decoding the video at both ends! Fortunately processing power, and compression, have improved significantly since that experiment.

In this article I’m more concerned with what does not cause latency, though people may think it does. With audio, delays above about five milliseconds are long enough to cause localization errors, comb filtering, and eventually discrete echoes. But real-time audio processing equipment does not produce delays that long (unless it’s intentional). Put a mic through preamps, compressors, mixers, and whatever else you want, and there is no effective latency. And certainly not from wire and cable. OMG, I have heard people suggest this!
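The five-millisecond figure also explains why digital audio gear usually stays out of trouble: its delay comes mainly from I/O buffering, and the latency of one buffer is simply its size divided by the sample rate. A quick Python sketch (the buffer sizes below are illustrative, not from any particular device):

```python
def buffer_latency_ms(buffer_samples: int, sample_rate_hz: int) -> float:
    """Delay, in milliseconds, contributed by one audio I/O buffer."""
    return buffer_samples / sample_rate_hz * 1000.0

# Common buffer sizes at a 48 kHz sample rate.
for samples in (32, 64, 128, 256):
    ms = buffer_latency_ms(samples, 48000)
    print(f"{samples:4d} samples @ 48 kHz -> {ms:.2f} ms")
```

Note that a 256-sample buffer already works out to just over 5 ms, which is one reason low-latency monitoring paths are configured with small buffers.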

Video frequencies are orders of magnitude higher than audio, so cable delay actually matters for video. This was an issue in facilities that were concerned with source timing (see above), but not so much anymore. For today’s purposes, video latency usually comes in whole frames, which are approximately 1/30 of a second (33 ms) for 30 fps video, or 1/60 of a second (17 ms) for 60 fps video.
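Those frame periods follow directly from the frame rate, as a minimal Python sketch shows (the 59.94 fps entry is the NTSC-derived rate, included only for completeness):

```python
def frame_period_ms(fps: float) -> float:
    """Duration of one video frame, in milliseconds."""
    return 1000.0 / fps

print(f"30 fps    -> {frame_period_ms(30):.1f} ms per frame")  # ~33.3 ms
print(f"60 fps    -> {frame_period_ms(60):.1f} ms per frame")  # ~16.7 ms
print(f"59.94 fps -> {frame_period_ms(60000 / 1001):.2f} ms per frame")
```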

Once again, small converters and processors that use discrete electronic components, or ASIC chips, do not contribute any significant delay. This includes devices like distribution amplifiers (splitters), A/D and D/A converters, up/down converters, audio embedders and disembedders, fiber converters, extenders, and other utility devices, as well as most kinds of (matrix) routing switchers.

What does cause real video delay are devices with frame buffers, like capture cards, and those with a lot of complex processing. Some software-based production switchers (vMix, Wirecast, etc.) may introduce as much as 12 frames of delay between input and final output. That’s just the nature of the beast.

An important point about latency, of course, is that something can only be “late” in relation to something else. That 12-frame delay won’t matter to a stream viewer at home, since there is nothing to compare with (and the stream is likely 20-30 seconds behind real time anyway). But that much delay could be a problem for other uses, such as feeding the program back to a remote guest. And if audio is monitored ahead of the production switcher, the apparent lip-sync will be way off.

Video displays themselves may have one or more frames of inherent delay due to processing, which isn’t usually a problem. But when audio is separate, as in production and AV systems, a frame of delay in the display, plus a couple frames from a multiviewer, can become noticeable. It’s often important to have a monitoring point in the system where audio and video are known to be aligned, so that real lipsync problems in the program can be detected. This might require a monitor intended for broadcast and production, which are designed to minimize internal delay. Fig. 2
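Totaling a monitoring chain like the one above makes the lip-sync risk concrete. The device names and per-device frame counts below are hypothetical, chosen only to match the rough figures in the text:

```python
FPS = 30
FRAME_MS = 1000.0 / FPS  # ~33.3 ms per frame at 30 fps

# Hypothetical per-device delays, in frames, along the video path.
video_chain = {
    "software production switcher": 12,
    "multiviewer": 2,
    "display processing": 1,
}

total_frames = sum(video_chain.values())
total_ms = total_frames * FRAME_MS
print(f"Video path delay: {total_frames} frames ({total_ms:.0f} ms)")

# If audio is monitored ahead of the switcher, it carries none of that
# delay, so the apparent lip-sync error is the full video-path total.
print(f"Apparent lip-sync offset: {total_ms:.0f} ms")
```

Half a second of offset is unmistakable, which is why a monitoring point where audio and video are known to be aligned matters so much.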

Grounding and Shielding

Over my 40 years in this business I have observed, read about, and tried various schemes that were “best practices” or automatically assumed when designing the electrical and electronic parts of systems. Some people sink ground rods into the soil outside the building. We may lay big copper bus bars under the floor and put them in racks. We may tie racks together with star washers, or isolate equipment from racks with insulators. We might use “isolated ground” power receptacles with separate ground wires. We might lift the shields on audio cables at the output end. Or the input end. Or add a capacitor somewhere. We might even build a symmetrical power distribution system, with 60VAC on either side of ground.

This is a worst-case Technology Mythology scenario: Some good information getting mixed with bad information, passed along by various sources, possibly corrupted in the process, eventually adopted as gospel by some practitioners, rejected by others, and rarely fact-checked against the reality of physics. The result can be a lot of expense and trouble to follow procedures that may or may not do any good.

One thing that needs no argument is safety. Electrical grounds (and what electricians call bonding) are an important part of power system design to prevent hazards from lightning strikes, circuit overloads, wiring errors, and malfunctioning equipment. This is what the National Electrical Code is about, and the grounding schemes we use in AV systems must not subvert that safety. The NEC has adapted over the years to accommodate special scenarios in AV and broadcast (though a given electrician may need convincing).

Fortunately, digital systems are less susceptible to the kind of noise problems that were common with analog signals. There is still noise induced in wires, and errant voltages on ground conductors, but these don’t cause much trouble unless they make the digital data unrecoverable at the receiving device. In general, as long as the voltage transitions are clear enough to retrieve 1s and 0s, the signal will work.

Most “grounding” schemes arose in the world of analog signals, and where analog audio is in use there’s still the potential for ground-related noise, much of which can be mitigated by the use of balanced (differential) connections, as found on pro equipment. Here I dutifully refer everyone to the excellent papers and presentations by industry hero Bill Whitlock.

The bottom line on this topic is the need to ask why. Someone wants to put in isolated-ground power for the AV system… okay, why? What is the supposed value, and can it be demonstrated? We should lift all the audio grounds at one end… why? If the rationale cannot be clearly explained, it’s suspicious.

The same goes for the other topics presented here. We must genlock every device in the system! Really? Why? What does that achieve in this situation? All these HDMI converters are delaying the video too much! Have you read the specs? Is that actually possible? The cure for Technology Mythology is skepticism and education.

