The previous article focused on multistandard VCR and TV ensembles and VCR/converters costing up to $2500. Those machines were fine for prosumers, institutions, and consumers who occasionally play a tape from Uncle Swen. This month, we'll study the big guys costing eight kilobucks or more, serving the broadcast and other discriminating high-end markets. Reviewing briefly, three things happen when you convert from one TV standard to another:

1. The number of scan lines in the picture has to change (625 to 525 or vice versa).

2. The number of pictures per second has to change (25 fps to 30 or vice versa).

3. The color encoding method has to be changed. There are presently five different systems for making color.

We'll examine how the standards converter delicately excises 100 lines of a PAL or SECAM picture without the viewer missing them. Going in the opposite direction, we'll explore how a 525-line NTSC picture has to have 100 lines added before it can become PAL or SECAM. Changing from 25 frames per second to 30, and vice versa, presents similar challenges. Somehow the converter has to manufacture 5 extra frames per second for NTSC, or dispose of 5 frames per second when converting to PAL or SECAM.

When a TV picture is standing still, no one notices the added or missing frames, but when something moves across the scene or the camera pans, you suddenly see a jiggling (called judder) or a smearing, a softening of the moving parts of the picture. There are lots of ways to judge a standards converter, but motion handling is the factor that separates the men from the toys. Older or cheaper converters that simply drop or repeat frames yield juddery pictures with the right number of frames per second. One step up are converters that use interfield interpolation, creating new pictures by averaging adjacent ones. They may also average scan lines together to create additional lines. Either way, averaging smears the picture.

One solution, adaptive motion interpolation (also called motion adaptive), involves tricking the eye. Your eyes aren't very sensitive to the sharpness of moving objects, so the machines fudge on these hard-to-convert parts. How elegantly they fudge determines the cost and complexity of the converter. Through adaptive motion interpolation, some models determine which parts of the picture are stationary and process those parts in the least obtrusive way, letting that part of the image pass to the viewer with high resolution. The fast-moving parts of the picture, however, go through motion detectors and sometimes receive bandwidth reduction, making them fuzzy but less jumpy-looking.
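To see why dropping or repeating frames judders while averaging smears, here is a toy sketch in Python (all names are hypothetical, not taken from any converter). It converts a one-line "picture" of a single bright dot, moving one pixel per frame, from 25 to 30 fps, either by repeating the nearest earlier frame or by blending the two frames that bracket each new instant.

```python
def convert_frame_rate(frames, src_fps, dst_fps, method="repeat"):
    """Resample a list of frames from src_fps to dst_fps.

    method="repeat": show the nearest earlier input frame, so frames are
    dropped or repeated (motion judders).
    method="blend": average the two input frames that bracket each output
    instant, weighted by distance in time (moving objects smear).
    """
    n_out = len(frames) * dst_fps // src_fps
    out = []
    for k in range(n_out):
        # Output frame k falls at input position i + frac; integer math keeps it exact.
        i, rem = divmod(k * src_fps, dst_fps)
        frac = rem / dst_fps
        a = frames[i]
        b = frames[min(i + 1, len(frames) - 1)]
        if method == "repeat":
            out.append(list(a))
        elif method == "blend":
            out.append([(1 - frac) * x + frac * y for x, y in zip(a, b)])
        else:
            raise ValueError(f"unknown method: {method}")
    return out


def moving_dot(n_frames, width=32):
    """Toy scene: one bright pixel moving one pixel to the right per frame."""
    return [[1.0 if x == t else 0.0 for x in range(width)] for t in range(n_frames)]


frames = moving_dot(25)                                  # one second of 25 fps video
repeated = convert_frame_rate(frames, 25, 30, "repeat")
blended = convert_frame_rate(frames, 25, 30, "blend")
```

In the repeat version, 5 of the 30 output frames are exact copies of their predecessors, so the dot stalls five times a second (judder). In the blend version, an in-between frame such as output frame 1 shows two dim dots instead of one sharp one (smear).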

For such converters to work, they must have a mechanism (some call it an algorithm) for judging which parts of the picture are in motion and what to do about them. Motion adaptive and linear adaptive compensation techniques are older technologies, being replaced by motion vector methods. A motion adaptive system looks at 4 TV lines and, if it sees no motion, gives each line an equal contribution to making a new, fifth line. But if motion is detected, it gives more weight to the lines nearest the one being created. The weighting factor depends on the amount of motion detected. A lot of motion forces the system to weight the 2 nearest lines heavily, a process that unfortunately decreases the vertical resolution. Linear adaptive interpolation creates a weighting factor that applies to a whole field. It yields better resolution but handles motion less well.

Motion prediction and compensation systems - You'll only care about the following technobabble if you plan to buy a standards converter. It describes the finer points of what to look for in motion compensation schemes.
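The motion adaptive 4-line weighting described above can be sketched as follows. The specific weights (equal quarters when still, all the weight on the two nearest lines at full motion, a straight blend in between) and the crude frame-difference motion detector are illustrative assumptions, not any manufacturer's design; the function names are hypothetical.

```python
def motion_amount(prev_line, curr_line, full_scale=0.25):
    """Crude motion detector (an assumption): mean absolute difference between
    the same line in two successive frames, scaled so that a difference of
    `full_scale` or more counts as full motion (1.0)."""
    diff = sum(abs(a - b) for a, b in zip(prev_line, curr_line)) / len(curr_line)
    return min(diff / full_scale, 1.0)


def motion_adaptive_line(lines, motion):
    """Create a new line from four neighboring lines.

    lines: four input lines, top to bottom; the new line falls between
    lines[1] and lines[2].
    motion: 0.0 (still) .. 1.0 (lots of motion).
    """
    still = (0.25, 0.25, 0.25, 0.25)   # no motion: equal contributions
    moving = (0.0, 0.5, 0.5, 0.0)      # heavy motion: only the two nearest lines
    m = min(max(motion, 0.0), 1.0)
    weights = [(1 - m) * s + m * v for s, v in zip(still, moving)]
    width = len(lines[0])
    return [sum(w * line[x] for w, line in zip(weights, lines)) for x in range(width)]


lines = [[0.0], [4.0], [8.0], [0.0]]
print(motion_adaptive_line(lines, 0.0))   # [3.0]  still picture
print(motion_adaptive_line(lines, 1.0))   # [6.0]  full motion
```

With full motion the new line simply follows its two nearest neighbors, which is exactly where the vertical resolution is given up.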

Spatio-temporal gradients (also called pel-recursive gradients) are one method of tracking moving objects. Moving objects tend to have a blurred edge in their direction of motion. The faster the movement, the longer and more gradual this blurred "gradient." Because the edges of objects are often poorly defined, this calculation is made over several fields to improve the accuracy of the estimate. The process is relatively simple (translation: less expensive) and yields good resolution, but it gets confused by some images. Lag trails from tube cameras, image enhancement, noise reduction, and digital data compression all baffle the box.
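A minimal one-dimensional sketch of the gradient idea, assuming the brightness along a scan line simply slides sideways between fields. The ratio of the change over time to the spatial slope of a soft edge gives its speed, and averaging over several field pairs steadies the estimate. This is a generic least-squares gradient estimator with hypothetical names, not a particular converter's algorithm.

```python
def clamp01(v):
    return min(max(v, 0.0), 1.0)


def gradient_velocity(prev, curr):
    """Estimate horizontal motion (pixels per field) of a soft edge along one line.

    Assumes curr[x] ~ prev[x - v]. For small v that gives
    dI/dt ~ -v * dI/dx, solved in the least-squares sense over the line.
    """
    num = den = 0.0
    for x in range(1, len(prev) - 1):
        ix = (prev[x + 1] - prev[x - 1]) / 2.0   # spatial gradient (central difference)
        it = curr[x] - prev[x]                   # temporal gradient (field difference)
        num += ix * it
        den += ix * ix
    return -num / den if den else 0.0


def velocity_over_fields(fields):
    """Average the pairwise estimates over several fields to steady the result."""
    estimates = [gradient_velocity(a, b) for a, b in zip(fields, fields[1:])]
    return sum(estimates) / len(estimates)


def ramp_edge(start, width=16, length=32):
    """A blurred edge: dark before `start`, rising linearly to bright over `width` pixels."""
    return [clamp01((x - start) / width) for x in range(length)]


fields = [ramp_edge(8 + 2 * t) for t in range(4)]   # edge moving 2 pixels per field
estimate = velocity_over_fields(fields)
```

For this edge the estimate comes out at about 1.94 rather than exactly 2, because the linear approximation breaks down at the ends of the ramp; gradient methods are estimates, not measurements.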

Cuts and complex moves such as rotations also confound the converter.

Block matching is a popular method of motion detection and compensation. It involves dividing the image into tiny blocks, about 8 pixels by 8 lines each, "memorizing" the content of each block, and, in the following field, trying to see where each one went. Blocks that stay the same receive relatively little processing. For each block that changed, the converter searches in every direction from where the block was, looking for a match. It looks farther and farther out, and when the block is "matched," the converter calculates how far the block moved and in which direction, storing this data momentarily as a motion vector. This vector can be used to calculate where the block is likely to be in a subsequent field or in an inserted field. Then, instead of averaging fields (creating mush), the converter "pastes" the moving object in the calculated place on the newly invented field.

The process is fairly inexpensive for low-resolution applications, but it becomes complex (expensive) for high-resolution video. Because of the computing power needed to search in many directions (64 directions for an 8 x 8 block) and over great distances (sometimes 32 pixels), tradeoffs are necessary. The system can detect fast-moving objects at the cost of losing small objects (fine details), or it can track small objects while losing fast-moving ones. One of block matching's weird artifacts is that sometimes the edges of an object get confused with its background and the whole works gets transported: the golf ball, when struck, carries a vestige of grass background with it (though I've seen plenty of real grass accompany my own golf balls).
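The search-cost tradeoff is easy to put in numbers. This sketch assumes a naive exhaustive search that tries every offset within the reach in both directions, one comparison per pixel per try, over a field of roughly 720 by 240 active pixels (an assumed size); real converters prune the search, so treat the figures as an upper bound for this naive approach. The function name is hypothetical.

```python
def full_search_cost(block=8, max_shift=32, width=720, height=240):
    """Pixel comparisons for exhaustive block matching over one field (assumed sizes).

    Every block is tried at every offset from -max_shift to +max_shift,
    horizontally and vertically, comparing block * block pixels each time.
    """
    positions = (2 * max_shift + 1) ** 2           # candidate offsets per block
    per_block = positions * block * block          # pixel comparisons per block
    blocks = (width // block) * (height // block)  # blocks in one field
    return positions, per_block, blocks * per_block


for reach in (4, 32):
    positions, per_block, per_field = full_search_cost(max_shift=reach)
    print(f"reach {reach:2d}: {positions} positions, {per_block} per block, {per_field} per field")
```

With a 32-pixel reach, each 8 x 8 block has 4,225 candidate positions (270,400 pixel comparisons), about 730 million comparisons per field, or some 44 billion per second at 60 fields per second. Shrinking the reach to 4 pixels (an 8 x 8 block in a 16 x 16 search area) cuts the per-block work to 81 positions and 5,184 comparisons.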

Phase correlation, a third algorithm, is highly complex, but it offers excellent resolution, works with quick motion, has few artifacts, and doesn't get confused easily. The video signal undergoes a mathematical Fourier transform, which represents each field as a spectrum of waves, and the spectra are compared from one field to the next. The phase differences between the spectra reveal the motion: objects in motion are represented mathematically by peaks in a three-dimensional wave pattern inside the machine. The pattern would look like rolling plains dotted with steep mountain peaks. It is computationally efficient for the computer to track just the mountain peaks (moving parts) and assign motion vectors to just those parts of the picture. The process is expensive, but elegant. Snell & Wilcox uses phase correlation in its $230,000 Alchemist PhC model standards converter.
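Phase correlation can be sketched in one dimension with nothing but Python's standard library. This is a simplified illustration on a single scan line, not Snell & Wilcox's implementation (practical systems work on two-dimensional areas of the picture and use fast Fourier transforms); the names are hypothetical. Each line is transformed, the phase difference between the two spectra is kept while the magnitudes are discarded, and the inverse transform yields a correlation "surface" whose peak sits at the displacement.

```python
import cmath


def dft(x, inverse=False):
    """Plain discrete Fourier transform (O(n^2); fine for a short scan line)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [sum(x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n) for t in range(n))
           for k in range(n)]
    return [v / n for v in out] if inverse else out


def phase_correlate(prev_line, curr_line, eps=1e-12):
    """Return (shift, surface): how far curr_line's content moved relative to prev_line.

    The shift is circular and reported in the range -n/2 .. n/2.
    """
    A, B = dft(prev_line), dft(curr_line)
    cross = []
    for fa, fb in zip(A, B):
        c = fb * fa.conjugate()                      # phase difference of the spectra
        mag = abs(c)
        cross.append(c / mag if mag > eps else 0j)   # keep phase, discard magnitude
    surface = [v.real for v in dft(cross, inverse=True)]
    n = len(surface)
    shift = max(range(n), key=surface.__getitem__)   # the highest "mountain peak"
    if shift > n // 2:
        shift -= n                                   # large positive shifts are moves left
    return shift, surface


line = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8, 9, 7, 9, 3]
moved = line[-3:] + line[:-3]          # same line, content moved 3 pixels right
shift, surface = phase_correlate(line, moved)
```

Because only the phase is kept, the peak stays sharp whatever the brightness or contrast of the picture. Several objects moving in different ways would produce several peaks, the mountains in the description above.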

Other features in standards converters - The more bits allotted to each pixel, the more steps there will be in its gray scale and the smoother (less posterized) the image will look. Seven bits yields 2^7 (2 to the 7th power) = 128 shades of gray. Eight bits gives 2^8 = 256 shades. The last 7-bit models disappeared a year ago, yielding to the preferred 8-bit converters, which produce twice as many shades of gray in the digitized image. A few models offer 10 bits (2^10 = 1024 shades).

Some standards converters track motion over 2 fields (one frame, or 1/30 second), while others track 4 and sometimes 6 fields. The greater the number of fields tracked, the more accurate the motion predictions. Machines, like weather forecasters, make mistakes; the more data they have, the fewer the errors. It takes, of course, a lot more computer memory to store 4 or 6 fields, and a lot of computer horsepower to make all the calculations between fields. Ergo, the outlandish prices of the more accurate machines.

Similarly, there are "2-line," "3-line," and "4-line" converters. A 2-line converter creates a new line by comparing two adjacent lines. A 4-line model compares 4 lines and, depending on the motion in the picture, weights the average of the 4 in different ways when calculating the new line. Naturally, the 4-line method is more accurate, but it employs more calculations and costs more. Upgradability is another feature found in "industrial strength" standards converters.
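The gray-scale arithmetic in this section can be checked with a small sketch (hypothetical function names): quantize a brightness level into 2-to-the-bit-depth codes, then count how many distinct shades survive on a gentle gradient.

```python
def quantize(level, bits):
    """Map a brightness level in [0.0, 1.0] to one of 2**bits digital codes."""
    codes = 2 ** bits
    return min(int(level * codes), codes - 1)   # level 1.0 would overflow, so clamp


def shades_and_step(bits):
    """Number of gray shades, and one step as a percentage of full scale."""
    codes = 2 ** bits
    return codes, 100.0 / codes


# A gentle gradient covering 5% of full scale, like a softly lit wall.
ramp = [0.50 + 0.05 * i / 999 for i in range(1000)]
for bits in (7, 8, 10):
    shades, step = shades_and_step(bits)
    used = len({quantize(v, bits) for v in ramp})
    print(f"{bits} bits: {shades} shades, {step:.3f}% per step, {used} codes on the gradient")
```

The 5% gradient lands on only 7 codes at 7 bits but 13 at 8 bits (and 52 at 10 bits), which is why the extra bit shows up as less posterization in smooth areas of the picture.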

The newest motion detection and compensation technologies are changing so fast that they're most appropriately sold as options to the main box. Put another way, you can buy a converter with all the normal features plus a simple-minded motion handler (like motion adaptive or linear adaptive). Then you buy the latest option to supercharge the machine with motion vector processors. And when the technology advances, you cash in your children's savings bonds and upgrade the options.

Film-to-NTSC-to-PAL or SECAM - Get ready for a whole new ballgame. "Natural" motion is one thing to predict; artificial "film" motion is quite another.

Converting movie film to tape in Europe is easy. Movie film runs at 24 frames per second; television there runs at 25. By running the projectors about 4% faster than normal (25/24 is about 1.042), the film rate matches the video rate, making film-to-tape transfers a snap. But something is still fishy here. Where "natural" motion would show a changing scene in each video field, film converted to tape yields 2 fields of film picture 1, both captured at the same instant, then 2 fields of picture 2, and so on. This motion isn't smooth; it's jumpy. But at least it's regular (probably eats lots of fiber).

In the U.S.A., the process is far more complex. If we sped the film up to our 30 frames per second, the Minute Waltz would last only 48 seconds and everyone would sound like they'd been breathing helium. Our film-to-tape transfers require a more involved technique. Using a system called 3:2 pulldown, we display film picture number 1 for 3 TV fields (3/60 of a second total), then move to picture number 2, which we display for 2 TV fields (2/60 of a second). The next picture we show for 3 fields, the next for 2 fields, and so on. At the end of a second, after 24 movie pictures have been played, 12 have been converted into 24 TV fields (2 times 12), and the remaining 12 have been converted into 36 TV fields (3 times 12). The sum is 60 TV fields, just right for video in the U.S. (the actual number comes to 59.94 fields per second, but that's another article). If you projected a movie in a theater so that one picture stayed on the screen for 2/60 of a second and the next for 3/60 of a second, followed by 2/60, then 3/60, etc., you would see slightly choppy motion. It wouldn't be as smooth as if the pictures were played evenly at 24 per second. Our American TV-eyes have grown used to this slight jerkiness. (Maybe it was from years of watching Jerry Lewis movies.)
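The 3:2 cadence can be written out field by field. In this hypothetical sketch, `pulldown_cadence` lists which film frame each TV field shows, and `screen_times` gives how long each frame stays on screen, using exact fractions and the nominal 60 fields per second (not 59.94).

```python
from fractions import Fraction

PAL_SPEEDUP = Fraction(25, 24)   # European transfers: film run about 4.2% fast


def pulldown_cadence(n_frames, pattern=(3, 2)):
    """Which film frame each TV field shows under 3:2 pulldown."""
    fields = []
    for frame in range(n_frames):
        fields.extend([frame] * pattern[frame % len(pattern)])
    return fields


def screen_times(n_frames, field_rate=60, pattern=(3, 2)):
    """How long each film frame stays on screen, in seconds, as exact fractions."""
    return [Fraction(pattern[i % len(pattern)], field_rate) for i in range(n_frames)]


print(pulldown_cadence(4))         # [0, 0, 0, 1, 1, 2, 2, 2, 3, 3]
print(len(pulldown_cadence(24)))   # 60
print(screen_times(2))             # [Fraction(1, 20), Fraction(1, 30)]
```

Twenty-four frames produce exactly 60 fields, but the on-screen times alternate between 1/20 and 1/30 of a second instead of a steady 1/24, which is the slight jerkiness described above.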

Interestingly, most of us can tell immediately when a TV picture is shown live or shot directly on videotape, as opposed to shot on film. We've grown so accustomed to the jerkiness that perfectly smooth motion startles us. Before proceeding much further (and triggering letters to the editor), I should note that professional American film shot for video is nowadays imaged at 30 frames per second, yielding a simple conversion process with minimal jumpiness. Converting 24 fps film to 30 fps video yields jerkiness. Converting that 30 fps NTSC video to 25 fps PAL/SECAM video adds more jiggle, making the image too stroboscopic to enjoy.

Normal standards converters equipped with motion processing get confused when the motion is irregular. Current temporal interpolation techniques operate by continually sampling four adjacent fields in weighted ratios. If the fields do not have a continuous temporal flow, as in the case of 3:2 pulldown, the results are appalling. In 1987 there was an outcry in Britain when the TV series "Dallas" was aired. The program had previously been shot, edited, and distributed on 35mm film, playable everywhere and easily transferred to any video standard.

In a cost-cutting move, Lorimar, the producer of "Dallas," began shooting on film and transferring the footage to tape for editing and distribution. The 1" tape then had to be standards converted to PAL, and that's where the skit hit the fans, so to speak. "Dallas" gourmets, accustomed to a diet of sharp images and smooth motion, were appalled by the smear and judder. The solution was to employ a special standards converter that took the film jumpiness into account and compensated for it. Snell & Wilcox of Hampshire, England developed the Digital Electronic Film Transfer (DEFT) system, which fixes this problem with a box costing only $500,000 (make a wish; Santa is only a month away). The DEFT system first digitally stores a number of sequential fields to analyze the motion, and it identifies the 3:2 pulldown sequence by looking for identical fields. Because DEFT knows which fields are "true fields" (not duplicates of earlier fields), it knows which ones to motion process. The resulting image is almost indistinguishable from the original film.
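This is not Snell & Wilcox's DEFT implementation, only a hypothetical sketch of the general idea of spotting 3:2 pulldown by looking for identical fields. With a 3:2 cadence, the third field of every 3-field film frame repeats the field two positions earlier (same parity, same film picture), so comparing each field with the one two back finds the repeats; the remaining "true fields" are the ones worth motion processing. The threshold is an assumption; real footage would need one tuned to noise.

```python
def pulldown_fields(frames):
    """Split film frames into interlaced TV fields with a 3:2 cadence.

    Each frame is a list of scan-line values; a field takes every other
    line, and field parity (top/bottom) alternates from field to field.
    """
    fields, parity = [], 0
    for i, frame in enumerate(frames):
        for _ in range(3 if i % 2 == 0 else 2):
            fields.append(frame[parity::2])
            parity ^= 1
    return fields


def repeated_fields(fields, threshold=0.5):
    """Indices of fields that duplicate the field two positions earlier.

    Fields two apart have the same parity, so a near-zero difference marks
    the third field of a 3-field film frame: a repeat, not new motion.
    """
    repeats = []
    for n in range(2, len(fields)):
        diff = sum(abs(a - b) for a, b in zip(fields[n], fields[n - 2])) / len(fields[n])
        if diff < threshold:
            repeats.append(n)
    return repeats


def true_fields(fields, threshold=0.5):
    """Drop the repeats, leaving only the fields worth motion processing."""
    skip = set(repeated_fields(fields, threshold))
    return [f for n, f in enumerate(fields) if n not in skip]


# One second of synthetic film: 24 frames of 8 lines, each frame visibly different.
film = [[10 * i + line for line in range(8)] for i in range(24)]
fields = pulldown_fields(film)
```

On this synthetic second of film, the sketch finds 12 repeated fields out of 60, leaving 48 true fields, two per film frame.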

What's on the market? - CEL makes an $8000 4-field unidirectional (NTSC to PAL or vice versa) converter, the P255-10. It has 4 other versions costing $18,000 to $60,000 with a P180 option which automatically delays the audio to keep it matched to the picture. Its Tetra plus model includes vector and waveform monitoring.

I.den offers its $10,000 IP-450, a TBC and eight-bit converter with motion compensation that works in the component, Y/C, composite, and RGB domains.

Video International makes and markets the DTC-1504, an $18,000 motion adaptive 4-field, 4-line component converter with eight bits and 4:2:2 processing similar to D1. Although the processing inside the machine is digital, the 1504 lacks direct digital inputs and outputs. For that, you need to jump to the $150,000 DTC-4500, which comes complete with motion vector processing, a color processor, a TBC, and other features. Back to earth, Video International also makes a $6000 model, the DTC-1004. It is composite only, has no TBC, and converts in one direction only (NTSC to PAL or PAL to NTSC, but not both ways in one unit). It is a 4-field, 4-line motion adaptive converter with an 8-bit 4:2:2 processor inside. The simple box is especially appropriate for cable headend use, where it sits all day doing the same job.

AVS Broadcast Inc. offers the Cyrus, a 10-bit converter with parallel or serial digital inputs and outputs, good for D1, D2, D3, and beyond. While the signal is in the digital domain, you can adjust color, black balance, gain, and other aspects of it. It costs $97,000 with simple motion adaptive circuitry, upgradable to the more complex motion vector processing. Snell & Wilcox's Alchemist and Gazelle use 10 bits and phase correlation. The Gazelle can create a steady picture from tape shuttling forward or backward at high speed.

Who needs conversion? - Considering the cost of converters, it seems that a multistandard VCR or VCR/TV would serve the occasional player of foreign tapes. If, however, you're duplicating 1000 copies of something for your overseas office, you might as well convert the tape to their standard and be done with the confusion.

If you're planning to buy a standards converter, Paul McGoldrick of Snell & Wilcox suggests the classic eyeball test: convert a videotape of jockeys bouncing up and down as they race horses past a picket fence in front of a cheering crowd. That scene should confuse almost any converter.



Detecting motion - One popular way to detect motion is through block matching. It works this way: the entire TV picture is divided into blocks, perhaps 8 pixels wide by 8 lines tall. On the next field, the same blocks are sampled again. If a block didn't change from one field to the next, it is assumed that there was no motion there; no further processing occurs (other than duplicating or removing fields as needed in the conversion process). Say, for a moment, that one of the blocks did change (indicating motion). The computer would look into its previous field and set up a search area, perhaps 16 pixels by 16 lines in size. The computer would then compare the 8 by 8 measurement block with an 8 by 8 block in the search area. If it didn't match, the computer would move the search block one pixel to the side and compare again. If there was still no match, it would move the block to the side again and eventually down one line. With each comparison, the computer stores a score indicating how many of the pixels in the measurement block exactly matched the pixels in the test block. Eventually, all the possible positions for the 8 x 8 block in the 16 x 16 search area are tried. Most likely, one of the tests will show a "perfect" match, a fingerprint of the original measurement block. Next, the computer calculates how far the block moved from its previous field. This number becomes the motion vector for that particular block.
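The full search just described can be sketched in a few lines of Python. One deliberate substitution: instead of counting exactly matching pixels, this sketch scores each try by the sum of absolute differences (SAD), a common alternative that still gives 0 for a perfect match but degrades gracefully when noise prevents exact matches. All names are hypothetical.

```python
import random


def sad(cur, ref, cy, cx, ry, rx, size):
    """Sum of absolute differences between a block of `cur` and a block of `ref`."""
    return sum(abs(cur[cy + y][cx + x] - ref[ry + y][rx + x])
               for y in range(size) for x in range(size))


def block_match(cur, ref, by, bx, size=8, search=4):
    """Full search for the size x size block at (by, bx) of the current field.

    Tries every offset up to `search` pixels each way in the previous field
    `ref` (search=4 with size=8 is the 16 x 16 search area in the text).
    Returns ((dy, dx), score): how far the block moved, and its SAD (0 = perfect).
    """
    h, w = len(ref), len(ref[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            ry, rx = by - dy, bx - dx   # where the block would have been last field
            if ry < 0 or rx < 0 or ry + size > h or rx + size > w:
                continue                # candidate falls off the picture
            score = sad(cur, ref, by, bx, ry, rx, size)
            if best is None or score < best[1]:
                best = ((dy, dx), score)
    return best


def shifted(field, dy, dx, fill=0):
    """Copy of `field` with its content moved down dy lines and right dx pixels."""
    h, w = len(field), len(field[0])
    return [[field[y - dy][x - dx] if 0 <= y - dy < h and 0 <= x - dx < w else fill
             for x in range(w)] for y in range(h)]


rng = random.Random(1)
prev = [[rng.randrange(256) for _ in range(24)] for _ in range(24)]
cur = shifted(prev, 2, 3)        # whole picture moved 2 lines down, 3 pixels right
vector, score = block_match(cur, prev, 8, 8)
```

Matching the block at line 8, pixel 8 of the shifted test field should report a motion vector of (2, 3) with a score of 0.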

The process is repeated for each of the other blocks where motion has been detected.

Using motion vectors to interpolate motion - The motion vector is used to forecast where the block is likely to be in the following field. The process is then repeated field after field to refine the motion vector. When it comes time to "invent" a new field, the computer simply copies all the stagnant blocks onto the new field. Then it uses the motion vector to determine where moving objects would be and "pastes" a duplicate of the measurement block in the calculated space.
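Here is a hypothetical sketch of inventing a field by copying and pasting blocks. It starts from the previous field (the stagnant blocks are the same in both), uncovers each moving block's old location using the current field's pixels, and then pastes the block where its motion vector places it at time `alpha` (0 is the previous field, 1 the current one). Occlusion and overlapping blocks are ignored.

```python
def interpolate_field(prev, cur, moving_blocks, alpha=0.5, size=8):
    """Invent a field between `prev` and `cur` by pasting moving blocks.

    moving_blocks: list of (by, bx, dy, dx): a block's top-left corner in
    `cur` and its motion vector, as a block matcher would report them.
    """
    new = [row[:] for row in prev]   # stagnant blocks: copy unchanged
    h, w = len(new), len(new[0])

    def put(ty, tx, src, sy, sx):
        # Copy a size x size block from src (at sy, sx) into new (at ty, tx), clipped.
        for y in range(size):
            for x in range(size):
                if 0 <= ty + y < h and 0 <= tx + x < w and 0 <= sy + y < h and 0 <= sx + x < w:
                    new[ty + y][tx + x] = src[sy + y][sx + x]

    for by, bx, dy, dx in moving_blocks:
        oy, ox = by - dy, bx - dx
        put(oy, ox, cur, oy, ox)     # uncover the old spot with the current field's pixels
    for by, bx, dy, dx in moving_blocks:
        py = round(by - (1 - alpha) * dy)
        px = round(bx - (1 - alpha) * dx)
        put(py, px, cur, by, bx)     # paste the block where the vector puts it at alpha
    return new


def scene(obj_x, h=16, w=24):
    """Test field: an 8 x 8 object of brightness 9 at lines 4-11, starting at pixel obj_x."""
    return [[9 if 4 <= y < 12 and obj_x <= x < obj_x + 8 else 0 for x in range(w)]
            for y in range(h)]


prev, cur = scene(4), scene(8)
mid = interpolate_field(prev, cur, [(4, 8, 0, 4)])
```

The object that moves from pixel 4 to pixel 8 lands whole and sharp at pixel 6 in the field invented halfway between, rather than as the two half-bright ghosts that field averaging would produce.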

Problems with block matching - If an object is moving quickly and the search area is only 16 pixels by 16 lines, the object can move beyond the search area, so a match will never be found. But if you make the search area larger, you give the computer too many places to test; the exhaustive search exceeds the computer's ability. One way to make the search area larger is to create fatter pixels. This can be done by taking quads of 4 pixels (2 by 2) and blurring them together into jumbo pixels. Thus the 8 x 8 measurement block can cover 4 times the area using the same number of calculations. The search area, likewise, can become 4 times as large. The search area is now large enough to capture a fast-moving object, but another problem occurs: when you blur 4 pixels into 1 big pixel, you lose resolution. The "fingerprint" becomes less defined, and the computer can no longer recognize and track small objects. Thus you have a tradeoff: you can either track fast objects or track small objects, but not both (unless you beef up the computing power).
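The jumbo-pixel tradeoff shows up in a tiny sketch (hypothetical names). Averaging each 2 x 2 quad halves the width and height, so an 8 x 8 block of jumbo pixels covers 16 x 16 original pixels for the same number of comparisons; but a one-pixel object is diluted to a quarter of its brightness, and moving it within its quad doesn't change the blurred field at all.

```python
def jumbo_pixels(field):
    """Blur each 2 x 2 quad of pixels into one 'jumbo' pixel (their average)."""
    return [[(field[y][x] + field[y][x + 1] + field[y + 1][x] + field[y + 1][x + 1]) / 4
             for x in range(0, len(field[0]) - 1, 2)]
            for y in range(0, len(field) - 1, 2)]


def dot(y, x, size=16, value=255):
    """A field containing a single bright pixel: a 'small object'."""
    return [[value if (r, c) == (y, x) else 0 for c in range(size)] for r in range(size)]


before, after = dot(2, 2), dot(3, 3)   # the dot moves one pixel diagonally
coarse_before, coarse_after = jumbo_pixels(before), jumbo_pixels(after)
```

The 255-level dot becomes a 63.75-level jumbo pixel, and the dot at line 2, pixel 2 and the dot at line 3, pixel 3 produce identical jumbo fields: that motion is invisible to the coarse search.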

The block matching algorithm has another flaw: when the computer pastes a copy of the measurement block onto the invented field, it pastes the entire block. If the moving object is small, occupying only some of the pixels inside the block, the computer moves this small object along with the unnecessary surrounding pixels that complete the block. This way, small moving objects carry part of their surroundings with them, creating an undesirable artifact (e.g., the grass around the golf ball).

Nevertheless, block matching with motion vectors is a very popular approach to motion handling in upper-midrange professional standards converters.


AVS, Inc. 100 Stonehurst Court Northvale, NJ 07647 201-767-1200

Cel Electronics Inc. 4550 West 109 Street Ste. 140 Overland Park, KS 66211 913-345-0925

I.den Videotronics Corp. 9620 Chesapeake Drive #204 San Diego, CA 92123 619-492-9239

Snell & Wilcox Inc. 2454 Embarcadero Palo Alto, CA 94303 415-856-2930

Video International Development Inc. 65-16 Brook Avenue Deer Park, NY 11729 516-243-5414

Vistek Americas 1900 Embarcadero Road Suite 209 Palo Alto, CA 94303