News | December 8, 2000

Audio to video delay solutions within the broadcast system

When there are no visual clues in the picture to determine synchronization, such as a voice-over, it is difficult for an operator to determine a "lip-sync" error.

By Tom Tucker, Product Marketing Manager, Tektronix, Inc.

Since the dawn of time, our brains have been accustomed to seeing an event happen before we hear the sound of the event. This is because sound travels at about 1,100 ft/sec and light at almost 984 million ft/sec. When the sound is heard before the accompanying visual is seen, the viewer becomes disturbed because the situation does not obey the laws of physics and seems unnatural.

Audio-to-video (A/V) delay is introduced within the broadcast network whenever audio and video signals are processed separately. Processing high-bandwidth digital video signals can take several fields to produce an output signal, whereas audio has a significantly lower bandwidth than video and takes less processing time to produce an output. It is necessary to take the processing time of both audio and video signals into account in the design of the A/V infrastructure and to insert fixed delays in the audio path to remove the audio-ahead-of-video condition.
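As a rough illustration of how such a fixed audio delay might be sized, the sketch below converts a video processing latency given in fields into milliseconds and subtracts the audio processing latency. The function name, the example latencies, and the choice to express audio latency in milliseconds are assumptions made for illustration; only the 525-line and 625-line standards come from the article.

```python
# Field rates for the two video standards discussed in this article.
FIELD_RATE_HZ = {
    "525": 60000 / 1001,  # about 59.94 fields per second
    "625": 50.0,          # 50 fields per second
}


def audio_delay_ms(video_fields, audio_latency_ms, standard):
    """Return the fixed audio delay (ms) needed so audio is not ahead of video.

    video_fields     -- video processing latency, in fields (hypothetical input)
    audio_latency_ms -- audio processing latency, in milliseconds (hypothetical input)
    standard         -- "525" or "625"
    """
    video_latency_ms = video_fields * 1000.0 / FIELD_RATE_HZ[standard]
    # Added delay can only hold audio back, so a negative result is clamped to zero.
    return max(0.0, video_latency_ms - audio_latency_ms)


# Hypothetical processor: 3 fields of video latency, 2 ms of audio latency.
for std in ("525", "625"):
    print(std, round(audio_delay_ms(3, 2.0, std), 2), "ms")
```

The clamp to zero reflects the point made above: a fixed delay in the audio path can remove an audio-ahead condition, but it cannot fix audio that is already late.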

The increasing complexity of routing, distribution, and digital signal processing of multiple channels of video and audio signals has caused increasing problems in maintaining audio-to-video synchronization within the system. Small unnoticeable audio-to-video delay errors within parts of the system accumulate throughout the entire system to eventually produce a noticeable error at the end of the distribution channel. It is therefore important to monitor and measure audio-to-video synchronization at several points within the system.
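The accumulation effect can be sketched with a running sum. The per-stage skew values and the 20 ms threshold below are purely illustrative assumptions, not figures from this article; the point is only that individually small errors can add up to a noticeable one partway through the chain.

```python
from itertools import accumulate


def first_noticeable_stage(stage_skews_ms, threshold_ms):
    """Return (stage index, cumulative skew) where the running total first
    exceeds the threshold in magnitude, or None if it never does.

    Positive skew means audio leads video; negative means audio lags.
    """
    for index, total in enumerate(accumulate(stage_skews_ms)):
        if abs(total) > threshold_ms:
            return index, total
    return None


# Hypothetical skew added by each stage of a distribution chain, in ms.
stages = [5.0, 8.0, 4.0, 10.0, 6.0]
THRESHOLD_MS = 20.0  # illustrative noticeability threshold, an assumption

print(first_noticeable_stage(stages, THRESHOLD_MS))
```

In this made-up chain no single stage exceeds the threshold, yet the running total does, which is why measuring at several points is more informative than measuring only at the end.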

Typically, a skilled operator watching and listening to the program material has to determine whether an error is present. To do this, the operator must look for visual clues within the picture to determine whether the sound corresponds to the picture and whether the two are synchronous. The most obvious method is to listen to a person speaking, watching the lip movements to verify audio-to-video "lip-sync." When there are no visual clues in the picture to determine synchronization, such as a voice-over, it is difficult for an operator to determine whether there is, or is not, a "lip-sync" error.

Tektronix has developed a digital watermarking technology that allows an audio signature to be embedded invisibly within the picture as a synchronizing reference between audio and video timing. This permanent signature is carried by the video throughout the distribution of the signal and is relatively immune to compression, archiving, cropping, ADC and DAC conversions, and other manipulations. At a point in the network where audio and video are combined, the embedded audio signature can be extracted from the video and compared to a new audio signature produced from the audio signal at this part of the system. The two audio signatures are compared and the amount of audio-to-video delay is calculated. The measurement can then be used by an internal audio delay to correct for audio advance conditions present within the signal.


Figure 1. Audio-video delay correction system

Figure 1 illustrates the process of digital watermarking. The Tektronix AVDC100 Audio-to-Video Delay Corrector accepts a 525-line (NTSC standard) or 625-line (PAL standard) Serial Digital Interface (SDI) signal at its input, with the audio supplied either as embedded audio or as external AES/EBU digital audio.

An audio signature, similar to the envelope file generated by a WAV file program, characterizes the audio input signal. This envelope is digitized and added to the video signal by means of the Tektronix digital watermark technology that hides the data within the picture. The watermark can be thought of as a pseudo-random pattern that is added to a full frame picture. The pattern is modulated into the picture dependent on the scene to make the pattern invisible to the viewer.
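To make the idea of an envelope-style signature concrete, here is a minimal sketch that reduces audio samples to one peak value per block and then digitizes each value. The block size, the use of peak magnitude, and the 16 quantization levels are all assumptions chosen for illustration; the article does not describe how the AVDC100 actually computes or encodes its signature.

```python
def envelope_signature(samples, block_size):
    """Reduce audio samples to one peak-magnitude value per full block."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    return [
        max(abs(s) for s in samples[start:start + block_size])
        for start in range(0, len(samples) - block_size + 1, block_size)
    ]


def quantize(envelope, levels=16, full_scale=1.0):
    """Digitize each envelope value to an integer code in 0..levels-1."""
    return [min(levels - 1, int(v / full_scale * levels)) for v in envelope]


# Hypothetical audio samples normalized to the range -1.0..1.0.
audio = [0.0, 0.5, -0.25, 0.1, -1.0, 0.2, 0.0, 0.0]
envelope = envelope_signature(audio, block_size=4)
print(envelope, quantize(envelope))
```

A signature like this is far smaller than the audio it describes, which is what makes it plausible to carry as hidden data inside the picture.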

The synchronizing reference is now embedded within the video signal and processed with the video signal as it is distributed through the network, subjecting it to the same delays as the video on which it is carried. At some point in the network, the separately processed audio and video signals will be combined. At that point, another AVDC100 is used to decode and measure any audio-to-video synchronization error by extracting the audio signature from the SDI video signal and comparing it to a newly generated audio signature at the measurement point. A correlation process takes the two audio signatures and calculates the measurement of A/V delay. This value is then displayed on the device, which allows the user to automatically correct for audio advance conditions by using the internal audio delay. The final result is a properly re-timed audio and video output.
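The correlation step can be illustrated with a simple cross-correlation search over candidate lags. This is a generic sketch, not the AVDC100's algorithm: the function names, the synthetic signature, the lag search range, and the 4 ms block duration are assumptions made for the example.

```python
import random


def remove_mean(values):
    """Subtract the average so the correlation responds to shape, not level."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def estimate_offset(reference, measured, max_lag):
    """Return the lag (in signature blocks) that best aligns the two signatures.

    reference -- signature extracted from the video watermark
    measured  -- signature freshly generated from the audio
    Negative lag means the audio is ahead of the video; positive means it is late.
    """
    ref = remove_mean(reference)
    meas = remove_mean(measured)
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i, r in enumerate(ref):
            j = i + lag
            if 0 <= j < len(meas):
                score += r * meas[j]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def correction_ms(lag_blocks, block_ms):
    """Audio delay to insert; only an audio-advance (negative lag) is correctable."""
    return max(0, -lag_blocks) * block_ms


# Synthetic signature standing in for real data; audio arrives 7 blocks early.
rng = random.Random(0)
reference = [rng.random() for _ in range(200)]
measured = reference[7:] + [0.5] * 7

lag = estimate_offset(reference, measured, max_lag=20)
print(lag, correction_ms(lag, block_ms=4.0))
```

The sign convention mirrors the text: adding delay to the audio path can pull early audio back into line, so `correction_ms` returns zero when the audio is already late.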

The AVDC100 can also be used to insert a content ID within the watermark process as part of the data carried within the picture. The user can program a specific set of characters to be used for identification of the program material. At various points in the network, a watermark decoder can extract this information and display it on the AVDC100 as a scrolling message.

The audio-to-video delay correction system is a point-to-multipoint system. Because of this, it is important to perform the watermarking process at originating points within the network where no audio-to-video delay exists, to ensure correct timing throughout the rest of the system. The originating material should be watermarked to provide the desired audio-to-video reference; otherwise, embedding a watermark where a significant audio-to-video delay error is already present builds that error into the reference. However, the system can correct any further error that accumulates downstream of the watermarking point.

How to use the AVDC100


Figure 2. Basic systems diagram of Program origination

The originating video production takes place at either a remote location or within a television studio. It is at this point that audio and video are properly timed and the video should be watermarked. An AVDC100 is set to Watermark Encode Mode to add the audio signature data to the video signal. There are three watermark levels within the AVDC100. These levels are various intensities of the watermark pattern and are used when it is desirable to boost the watermark level in the presence of high video compression or noise reduction which tends to remove watermark energy.

Figure 2 shows a typical transmission of a video production from a remote location to the television network over a 270 Mb/s fiber optic link. The transmission link is a high-quality system and does not degrade the signal. Watermark Level 1 should be used for contribution-quality video in the origination phase of the program.

The processing of the audio and video signals within the transmission channel could have introduced delay within the system. By setting an AVDC100 to Watermark Decode Mode at a point within the television network where the audio and video signals are combined, the propagation delay between audio and video through the system can be measured and corrected.

In certain systems, the audio signal could be embedded in the SDI signal within the transmission path. Normally, audio embedding equipment is not a source of audio-to-video delay. However, in many cases it is still desirable to monitor and correct for audio-to-video delay that could have occurred within the program distribution chain, caused either by equipment that processes video with embedded audio or by repeated audio de-embedding and re-embedding cycles. In this case, the user should choose to watermark the video before the initial audio embedding process, because the AVDC100 is a point-to-point system and makes measurements between these points. If the watermark encoding happens after initial audio-to-video delay errors have occurred, whether caused by the embedding process or by other types of equipment, that delay will not be included in the measurement, since the audio signature reference was placed in the video after this processing occurred.

The user could choose to watermark encode the SDI video signal using embedded audio as the reference. The AVDC100 supports embedded audio processing. The user can select any pair (1+2 to 15+16) of the 16 audio channels with which to watermark the SDI video signal. Note that when using embedded audio, the AVDC100 watermark decoders located within the television network need to select the appropriate embedded audio signals in order to correlate the same selected pair of audio channels used by the watermark encoder. Therefore, the television network needs to standardize on the two channels it will use for the audio signature. If the signal will pass through several different broadcaster networks, the user could use the watermark ID data to carry information regarding the embedded audio channels being used for watermarking.

The AVDC100 can be used to decode the watermark with the appropriate embedded audio channel contained within the SDI signal. The audio-to-video delay measurement can then be made by selecting the appropriate embedded audio channels to use for comparison of the audio signature. The two selected audio channels will be re-timed to the video signal and output as an AES signal.

Audio-to-video delay correction does not occur for all the SDI embedded audio channels. If the user wishes to correct all the embedded audio channels, an external audio de-embedder and delay unit is required. However, the AVDC100 can be used to drive the delay of the external devices, providing a solution to controlling audio delay on multi-channel audio sources.
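The channel-pair bookkeeping described above can be sketched as follows. The ID text layout ("PROGRAM CHa+b") and all function names are hypothetical, invented here to show how a content ID string might tell a downstream decoder which pair the encoder used; the article does not define any such format.

```python
def channel_pairs(total_channels=16):
    """List the selectable embedded-audio pairs: (1, 2), (3, 4), ... (15, 16)."""
    return [(ch, ch + 1) for ch in range(1, total_channels, 2)]


def make_content_id(program, pair):
    """Build a hypothetical ID string that records the watermarked pair."""
    if pair not in channel_pairs():
        raise ValueError(f"not a selectable pair: {pair}")
    return f"{program} CH{pair[0]}+{pair[1]}"


def pair_from_content_id(content_id):
    """Recover the channel pair from the hypothetical ID string."""
    label = content_id.rsplit(" CH", 1)[1]
    first, second = label.split("+")
    return int(first), int(second)


def decoder_matches(encoder_pair, decoder_pair):
    """The decoder must correlate against the same pair the encoder used."""
    return encoder_pair == decoder_pair


content_id = make_content_id("NEWS AT SIX", (3, 4))
print(content_id, decoder_matches(pair_from_content_id(content_id), (3, 4)))
```

Carrying the pair in the ID is only useful when the signal crosses networks that have not agreed on a pair; within a single network, standardizing on one pair, as recommended above, is simpler.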

Author's bio
Tom Tucker is a product-marketing manager for Tektronix, Inc.'s Video Business Unit in Beaverton, OR. Tucker studied electronics and engineering at Portland Community College and has been instrumental in the development and marketing of automatic video measurement products for most of his 22 years at Tektronix. Tucker has also been responsible for business development in the Pacific Rim and Latin regions and has authored several articles in major trade journals.