The Challenges Of DVD Authoring

Panos Nasiopoulos

Rabab K. Ward
Electrical Engineering, University of British Columbia, Vancouver, BC, Canada
Masato Otsuka
DAIKIN US Comtec Laboratories, Novato, CA, USA


ABSTRACT: The evolution of DVD promises that telecomputers will find their way into our living rooms, linked to a large flat screen able to display HDTV, Standard Definition TV, video games, interactive movies from Digital Versatile Discs (DVD), Internet content, video telephony and computer graphics. For the first time, Hollywood studios and consumer electronics companies have formed an alliance to support DVD technology, which is critical for both sides. In this paper we address the complex process of DVD authoring.

INTRODUCTION: One of the most significant technological achievements in the consumer and entertainment industries is the development of the Digital Versatile Disc (DVD). DVD is the first union of emerging technologies, bringing together computers, consumer electronics and entertainment. As a result, we are witnessing the generation of an entirely new infrastructure that is reshaping the world of entertainment. DVD is a lot more than just a storage medium. It is a new multi-purpose technology that will affect both the entertainment and computer worlds. For consumers, this is the first digital medium that offers studio video and audio quality combined with unprecedented interactivity at a very low cost. For the PC multimedia side, DVD is the first video distribution medium designed for very high data rates. Its interactivity and distribution format have the potential to revolutionize the entertainment software industry. This paper describes the complex process of DVD authoring.

DVD: AN OVERVIEW

STORAGE: As a storage medium, DVD can hold from 4.7 GB of digital data in the single-sided, single-layer format to 17 GB in the double-sided, dual-layer format [1,2]. This increase in capacity (up to 25 times that of CDs) is achieved by introducing a shorter-wavelength laser beam, a dual focusing mechanism that allows the use of two layers per side, a smaller pit size and tighter spirals. Furthermore, DVD discs offer ten times the data rate of the CD, opening the way to numerous new real-time applications. While storage capacity is very important, it is DVD's other capabilities that make this technology so attractive.

HOLLYWOOD: As an entertainment product, DVD satisfies the goals established by the Hollywood Digital Video Disc Advisory Committee, delivering extraordinary picture and sound quality. DVD takes advantage of a two-pass variable bit rate MPEG-2 video encoding process to offer a superb picture quality comparable to D-1, the studio production standard. To make it more exciting, this is the first medium to introduce a number of viewing formats such as the 4:3 TV screen format, the 16:9 HDTV screen format and the 20:9 letterbox format [1,2]. Combine this picture experience with Dolby's AC-3 5.1 channel surround sound (or MPEG-2 7.1 for Europe) and you have reproduced video and audio quality that rivals that of a theater. In addition, this technology allows the use of 8 different languages, 32 subtitles, different camera angles and video-clip paths including interviews with producers and actors. The viewer can choose the camera angles and language, switching seamlessly from one to another, scan forwards and backwards and play slow motion. Parents are given the option to lock out versions of the movie which range from the director's cut, to R-rated, to PG-13.

HYBRID DVD-INTERNET: But it is in the PC world that DVD will have its biggest impact. The first read-only DVD drives are expected to offer over 7 times the storage capacity of the current CD, and will also be able to play DVD movie titles and existing music CDs. DVD also supports interactive adjuncts to traditional PC content. Embedded navigation such as web browsing can be added to video, enhancing the user's experience with hybrid content. A DVD-based PC application that combines DVD's interactive performance and rich video and audio capabilities with the Internet would offer a wealth of opportunities. For example, it is possible to produce a DVD-based department store catalogue that offers an interactive showcase of all the departments' merchandise, complete with audio and video. Playing the disc automatically connects the user with the store via the Internet, allowing the consumer to get current prices, order merchandise, communicate with a personal shopper or pay bills. Such a service is not viable today over the Internet alone because of the Internet's low bandwidth. A similar hybrid application allows DVD-based courses and encyclopedias (which devour huge amounts of space with text, pictures, video, sound and animations) to remain up-to-date by cross-referencing a constantly updated web site.

DVD AUTHORING: Authoring is generally defined as the process of preparing content, encoding video and audio, and creating the final DVD image. In the case of DVD, authoring is a complex process since it involves the laying out of multiple audio tracks and a video track, generation of sub-titles, menu pages, parental lock-out features, interactive functions such as program search, time search, seamless play, and pause, and finally editing of video and audio. Since authoring is always performed along with encoding and disc formatting, it is, in many cases, referred to as the entire DVD pre-mastering process.

PREPARATION OF MATERIALS: The first step in authoring is the collection of materials. These materials include video, audio, still images, and sub-pictures. DVD's video source format is the CCIR-601 studio format compressed to MPEG-2 format. The frame rate is 29.97 f/s for NTSC sources (North America) and 25 f/s for PAL/SECAM sources (Europe). The maximum allowable bit rate is 9.8 Mbps. Audio includes the surround track and up to 8 different language tracks for each title. All language tracks must be compared for level, mix, and equalization so that seamless switching between languages can be achieved. Still images are used to provide break points in the title, so that search functions and other interactive functions can be implemented. The preparation of still images includes identification of the breakpoints in the video and definition of the time duration for each image. Sub-pictures are bitmaps that are overlaid on top of the video. They include menus, sub-titles, graphics, and simple animation. Once created, their start and stop times must be defined in order to be synchronized with the associated video and audio elements. Up to a maximum of 32 sub-picture bit-streams are allowed in a title.
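The limits quoted above lend themselves to a simple check before encoding starts. The following Python sketch is illustrative only: the `TitleMaterials` class, its field names and the `check_materials` function are hypothetical, and the numeric limits are simply the ones stated in this section.

```python
from dataclasses import dataclass
from typing import List

# Limits quoted in the text; the names are illustrative, not taken from the DVD specification.
FRAME_RATES = {"NTSC": 29.97, "PAL/SECAM": 25.0}
MAX_VIDEO_MBPS = 9.8
MAX_LANGUAGE_TRACKS = 8
MAX_SUBPICTURE_STREAMS = 32


@dataclass
class TitleMaterials:
    tv_system: str           # "NTSC" or "PAL/SECAM"
    frame_rate: float        # frames per second of the video source
    peak_video_mbps: float   # highest video bit rate planned for the title
    language_tracks: int
    subpicture_streams: int


def check_materials(m: TitleMaterials) -> List[str]:
    """Return a list of problems; an empty list means no quoted limit is exceeded."""
    problems = []
    expected = FRAME_RATES.get(m.tv_system)
    if expected is None:
        problems.append(f"unknown TV system: {m.tv_system}")
    elif abs(m.frame_rate - expected) > 1e-6:
        problems.append(f"{m.tv_system} sources use {expected} f/s, got {m.frame_rate}")
    if m.peak_video_mbps > MAX_VIDEO_MBPS:
        problems.append(f"video bit rate {m.peak_video_mbps} Mbps exceeds {MAX_VIDEO_MBPS} Mbps")
    if m.language_tracks > MAX_LANGUAGE_TRACKS:
        problems.append(f"{m.language_tracks} language tracks exceed {MAX_LANGUAGE_TRACKS}")
    if m.subpicture_streams > MAX_SUBPICTURE_STREAMS:
        problems.append(f"{m.subpicture_streams} sub-picture streams exceed {MAX_SUBPICTURE_STREAMS}")
    return problems
```

For instance, `check_materials(TitleMaterials("NTSC", 29.97, 9.8, 8, 32))` returns an empty list, since every value sits exactly at its quoted limit.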

TECHNIQUES AND PARAMETERS: A good understanding of how the various elements will be used in constructing the title is the key to intelligent parameter determination, which involves tradeoffs between picture quality, length of program, number and quality of audio channels, number of subtitles, and level of interactivity. The following is a list of some of the basic parameters that must be determined for a DVD title [3]:

  • the number of audio channels
  • the number of language versions
  • the number of sub-picture elements
  • the number of breakpoints in the video
  • the number and the levels of rated versions of the title
  • the number of still images used at each breakpoint
  • the type of parental lock outs
  • the type of director's cuts
  • the audio encoding techniques
  • the format used for still images

A single-layer single-sided DVD disc can store 2 hours and 13 minutes of video compressed at a nominal average bit rate of 3.5 Mbps combined with 3 languages encoded using AC-3 5.1 channels and 4 additional languages encoded as sub-titles [4]. The maximum program rate (i.e., video + audio + sub-pictures) is specified to be 10.08 Mbps. Given the disc capacity, the overall quality depends on determining the different trade-offs between several parameters. For example, Table 1 shows the average storage requirements for a DVD title with the following parameters:

  • Audio tracks encoded using Dolby AC-3 5.1
  • 4 unique languages supported
  • 4 sub-picture streams supported
  • "G" rated version has a total run length of 100 minutes
  • "PG" rated version has an additional 4 minute run length
  • 2 previews; each has a run length of 3 minutes
  • 4 trailers; each has a run length of 2.5 minutes

Note that 4% of the total disc capacity is always reserved for backup of the program control data and for additional information that is added after editing. The total run length is 120 minutes, resulting in the average bit rate of 3.43 Mbps.
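The arithmetic behind this example can be reproduced in a few lines. The Python sketch below is only a worked calculation under the figures given in Table 1 (0.384 Mbps per language track, 0.01 Mbps per sub-picture stream, 4% reserved, a 4.7 GB disc counted as 4700 MB); the function and constant names are assumptions introduced for illustration.

```python
DISC_MB = 4700.0          # 4.7 GB single-layer, single-sided disc (1 MB = 10**6 bytes)
RESERVED_FRACTION = 0.04  # share reserved for control-data backup and later additions


def stream_storage_mb(streams, minutes, mbps_per_stream):
    """Storage in MB for a number of constant-rate streams of equal length."""
    return streams * minutes * 60 * mbps_per_stream / 8


def average_video_mbps(run_minutes, audio_tracks=4, audio_mbps=0.384,
                       subpicture_streams=4, subpicture_mbps=0.01):
    """Average video bit rate left once audio, sub-pictures and the reserve are budgeted."""
    audio_mb = stream_storage_mb(audio_tracks, run_minutes, audio_mbps)
    subpicture_mb = stream_storage_mb(subpicture_streams, run_minutes, subpicture_mbps)
    reserved_mb = DISC_MB * RESERVED_FRACTION
    video_mb = DISC_MB - audio_mb - subpicture_mb - reserved_mb
    return video_mb * 8 / (run_minutes * 60)


# "G" feature + "PG" extra footage + 2 previews + 4 trailers, all in minutes
run_minutes = 100 + 4 + 2 * 3 + 4 * 2.5
```

With these inputs `run_minutes` evaluates to 120, and `average_video_mbps(run_minutes)` comes out at roughly 3.44 Mbps before rounding, in line with the 3.43 Mbps quoted above.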

Video Encoding

DVD takes advantage of the MPEG-2 compression technology to achieve picture quality comparable to that of D-1, the CCIR-601 TV studio production standard. MPEG-2 is a flexible and scalable compression scheme which can produce bit rates that range from 1 to 40 Mbps. As implemented for DVD, MPEG-2 encoding is a two-pass process. During the first pass, the encoder scans the video source, detects scene changes and determines the optimal bit rates for each frame. During the second pass, higher bit rates are assigned to complex frames and sequences with more activity and lower bit rates to "simple" frames. The two-pass process aims at the best possible picture quality for the given video clip and disc storage capacity. For video material originating from film, inverse telecine may be used to improve the compression performance. The reason is that film uses 24 f/s, a rate that is converted to the 30 f/s required by the NTSC standard. This conversion process is known as telecine and involves duplication of frames at regular intervals. Inverse telecine removes the duplicated frames, thus allowing more bandwidth to be allocated to the video.
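The idea behind the second pass can be illustrated with a small allocation routine. This Python sketch is not the rate control of any actual MPEG-2 encoder: it assumes the first pass has produced one positive complexity score per equal-length segment (a hypothetical input), spreads a target average bit rate in proportion to those scores, and caps any segment at the 9.8 Mbps video limit mentioned earlier, handing the excess to the remaining segments.

```python
from typing import List


def allocate_bit_rates(complexity: List[float], average_mbps: float,
                       max_mbps: float = 9.8) -> List[float]:
    """Distribute a target average rate over segments in proportion to complexity."""
    if any(c <= 0 for c in complexity):
        raise ValueError("complexity scores must be positive")
    if average_mbps > max_mbps:
        raise ValueError("target average exceeds the per-segment maximum")

    rates = [0.0] * len(complexity)
    budget = average_mbps * len(complexity)   # total rate to hand out
    open_idx = list(range(len(complexity)))   # segments not yet capped

    while open_idx:
        total = sum(complexity[i] for i in open_idx)
        share = {i: budget * complexity[i] / total for i in open_idx}
        capped = [i for i in open_idx if share[i] > max_mbps]
        if not capped:
            for i in open_idx:
                rates[i] = share[i]
            break
        # Pin over-budget segments at the maximum and redistribute what is left.
        for i in capped:
            rates[i] = max_mbps
            budget -= max_mbps
        open_idx = [i for i in open_idx if i not in capped]
    return rates
```

For example, `allocate_bit_rates([1, 1, 1, 9], 3.5)` pins the busy fourth segment at 9.8 Mbps and gives each quiet segment about 1.4 Mbps, so the average over the four segments stays at the 3.5 Mbps target.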

Audio Encoding

Movies released in North America and Japan can carry Dolby's AC-3 stereo or 5.1 audio which offers 5 surround channels plus a low frequency (sub-woofer) channel. For movies released in Europe, AC-3 is replaced by MPEG-2 stereo or 7.1 surround sound. In addition, as an option to AC-3 and MPEG-2 audio, DVD enables producers to choose uncompressed 16-bit linear PCM stereo sound with Dolby Pro Logic encoding. Table 2 shows audio encoding options as well as the specified sampling frequency rates, bit and transfer rates and number of channels supported by each option.
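For the uncompressed option, the transfer rate follows directly from the sampling frequency, the sample size and the number of channels, so it is easy to see which combinations fit under the linear PCM maximum listed in Table 2. The following Python sketch is a worked calculation only; the function names are illustrative assumptions, and the 6.144 Mbps ceiling is the figure from Table 2.

```python
MAX_PCM_MBPS = 6.144  # linear PCM transfer-rate maximum from Table 2


def pcm_rate_mbps(sampling_hz: int, bits_per_sample: int, channels: int) -> float:
    """Raw linear PCM bit rate in Mbps (1 Mbps = 10**6 bits per second)."""
    return sampling_hz * bits_per_sample * channels / 1_000_000


def fits_pcm_limit(sampling_hz: int, bits_per_sample: int, channels: int) -> bool:
    """True if the combination stays within the Table 2 maximum (small float tolerance)."""
    return pcm_rate_mbps(sampling_hz, bits_per_sample, channels) <= MAX_PCM_MBPS + 1e-9
```

Under this calculation, 16-bit stereo at 48 kHz needs 1.536 Mbps, eight 16-bit channels at 48 kHz use exactly the 6.144 Mbps maximum, and eight 24-bit channels at 96 kHz would far exceed it.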

Sub-Picture Encoding

Sub-pictures are run-length compressed bitmaps using 2 bits/pixel and 4 colors out of a 16-color palette. The sub-picture size is 62 KB per GOP/cell with 32 KB allocated for control data. Applications may vary from simple text (sub-titles) to menus to still images used for presentation effects. Pixels are categorized as foreground, background, emphasis-1 and emphasis-2. The still picture format must be a standard image format such as TIFF, GIF, or BMP. MPEG is used to encode still images which are then incorporated into the video stream.
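Run-length compression of a 2 bits/pixel bitmap can be pictured with a toy example. The Python sketch below shows generic run-length coding of one row of 2-bit pixel codes; it does not reproduce the actual DVD sub-picture bitstream layout, and the assignment of codes 0 to 3 to the four pixel categories is an assumption made for illustration.

```python
from typing import List, Tuple

# Assumed mapping of 2-bit codes to the four pixel categories named in the text.
PIXEL_TYPES = ("background", "foreground", "emphasis-1", "emphasis-2")


def rle_encode(row: List[int]) -> List[Tuple[int, int]]:
    """Collapse a row of 2-bit pixel codes into (run_length, code) pairs."""
    runs: List[Tuple[int, int]] = []
    for code in row:
        if not 0 <= code <= 3:
            raise ValueError("pixel codes must fit in 2 bits (0..3)")
        if runs and runs[-1][1] == code:
            # Same code as the previous pixel: extend the current run.
            runs[-1] = (runs[-1][0] + 1, code)
        else:
            runs.append((1, code))
    return runs


def rle_decode(runs: List[Tuple[int, int]]) -> List[int]:
    """Expand (run_length, code) pairs back into the original row."""
    row: List[int] = []
    for length, code in runs:
        row.extend([code] * length)
    return row
```

Sub-title and menu bitmaps consist mostly of long background runs, which is why this kind of coding suits them: a row such as `[0, 0, 0, 1, 1, 0, 2, 2, 2, 2]` collapses to four runs, and `rle_decode` restores it exactly.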

Putting it Together

After preparing the different "segments" of a DVD title, a multiplexing process links everything together and defines the program flow of the DVD title. This final step specifies how each of the media elements will be presented to the user and how the user can interact with the program. Program flow specifications are translated to navigation commands that are, in turn, incorporated into program cells and program chains. A cell consists of a navigation command and all the video and audio data associated with a GOP. The navigation command (button) defines the playback behavior of the corresponding cell and consists of one instruction, or a combination of at most three, from the following list [4]:

  • GoTo -- branch between commands
  • Link -- transfer within the same domain
  • Jump -- transfer between domains
  • Compare -- recognition of parameter value
  • SetSystem -- player system setting
  • Set -- calculate GPRM values

A sequence of cells and cell commands (navigation commands) forms a program (PG). A program usually corresponds to one scene. Programs and video objects (nominally a GOP) form a program chain (PGC). A program chain is separated into the control information (PGCI) and the video object (VOB). PGCI acts as an address table pointing to cells, thus defining the playback order of programs. The Part of Title (PTT) helps to construct multiple versions of the same title. A DVD title can have one or multiple program chains. Interactive functions such as PTT searches, director's cuts, and parental lock-outs can be achieved by creating the title as a multi-PGC title, with different director's cuts and different rated versions on different program chains.
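The relationship between cells, programs and program chains can be pictured as a small data structure. The Python sketch below is a conceptual model only, not the on-disc format: the class names, the integer cell identifiers and the example "G"/"PG" title are all hypothetical. It shows how two rated versions can share most cells while living on different program chains.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Program:
    """A program (roughly one scene): an ordered group of cells."""
    name: str
    cell_ids: List[int]


@dataclass
class ProgramChain:
    """PGCI-like address table: the ordered programs of one version of the title."""
    label: str
    programs: List[Program] = field(default_factory=list)

    def playback_order(self) -> List[int]:
        """Flatten the chain into the sequence of cells the player would present."""
        return [cell for program in self.programs for cell in program.cell_ids]


# Hypothetical title: the PG version adds one scene that the G version skips.
opening = Program("opening", [1, 2])
extra_scene = Program("extra scene", [3])
ending = Program("ending", [4, 5])

title: Dict[str, ProgramChain] = {
    "G": ProgramChain("G", [opening, ending]),
    "PG": ProgramChain("PG", [opening, extra_scene, ending]),
}
```

Here `title["G"].playback_order()` yields cells 1, 2, 4, 5, while the "PG" chain plays 1 through 5; the shared cells are stored once and only the address tables differ.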

Simulation and Verification

After all the media elements and control information are multiplexed into one stream, simulation testing is to be performed. The stream must guarantee that audio, video, and sub-pictures are synchronized; otherwise, the content must be re-edited or re-encoded. Besides synchronization, interactive functions may also be simulated and verified.

  • [1] DVD Format, TOSHIBA, DVD Forum, April 1996.
  • [2] DVD Presentation Data Specifications, VICTOR Company of Japan Ltd., DVD Forum, April 1996.
  • [3] C. Fogg, DVD Technical Notes, July 1996.
  • [4] Interactive Functions, HITACHI Ltd., DVD Forum, April 1996.

Table 1. Storage Requirements for Each Media Element in Average-Bit-Rate Calculation Example

Media Element          Total Length       Average Bit Rate               Total Storage Requirements
4 Language Tracks      120 minutes        0.384 Mbps per language        4*120*60*0.384 Mbps/8 = 1382 MB
4 Sub-picture streams  120 minutes        0.01 Mbps per language         4*120*60*0.01 Mbps/8 = 36 MB
Reserved               4% of 4.7 Gbytes                                  188 MB
SUBTOTAL                                                                 1606 MB
Video                  120 minutes        3094/(120*60)*8 = 3.43 Mbps    3094 MB

Table 2. Audio Data Specifications


                    Linear PCM         Dolby AC-3       MPEG Audio
Sampling Frequency  48 kHz, 96 kHz     48 kHz           48 kHz
Number of Bits      16/20/24 bits      compressed       compressed
Transfer Rate       max. 6.144 Mbps    max. 448 kbps    max. 640 kbps
Number of Channels  max. 8             max. 5.1         max. 7.1