OVERVIEW OF
VIDEO ON DEMAND
SYSTEMS
SCOPE
INTRODUCTION
THE INITIATIVE FOR WORLDWIDE MULTIMEDIA TELECONFERENCING AND VIDEO SERVER STANDARDS
NEW BUSINESS IMPERATIVES
STARTING WITH STANDARDS
TWO STANDARDS, ONE GOAL
STANDARDS FIRST
SUMMARY
CONTENT PREPARATION:
REQUIREMENTS:
CODECs/Compression
Object Oriented Database Management Systems
Encoding Verification
SUMMARY
VIDEO SERVER
REQUIREMENTS
LIMITATIONS
PRODUCTS
DISTRIBUTION NETWORK:
LAN TYPES
PROTOCOLS
WAN TYPES
CLIENT INTERFACES
RETRIEVAL INTERFACE
VIEWER REQUIREMENTS
PRODUCTS
HARDWARE MINIMUMS
SUMMARY
DEFINITIONS
A
C
D
E
F
G
H
I
J
L
M
N
O
P
S
T
BIBLIOGRAPHY:
MULTIMEDIA:
WEB Sites:
Hard Copy References:
WANS/GOV:
WEB Sites:
Hard Copy References:
ODBMS:
WEB Sites:
Hard Copy References:
MPEG:
WEB Sites:
Hard Copy References:
LANS:
WEB Sites:
TOPICS FOR FUTURE MEETINGS:
THE ATM ADAPTION LAYER
ATM STANDARDS
ISDN-B
BROADBAND WAN IMPLEMENTATION
VIDEO CONFERENCING
ODBMS
VIDEO ENCODING/DECODING STANDARDS
SCOPE
Video on demand has evolved as a major implementation problem for network integrators.
Clients want the ability to retrieve and view stored video files asynchronously at near
broadcast quality, on a local host. Some problems integrators face to achieve this goal
include: video content preparation, server storage, network throughput, latency, client
interfaces, quality of service, and cost. This paper addresses the design considerations
for a private video on demand implementation.
INTRODUCTION
The Initiative for Worldwide Multimedia Teleconferencing and Video Server Standards
The market for multipoint multimedia teleconferencing and video server equipment is
poised for explosive growth. The technology for this necessary and much-anticipated
business tool has been in development for years. By the turn of the century,
teleconferences that include any combination of video, audio, data, and graphics will be
standard business practice.
Compliance with teleconferencing standards will create compatible solutions from
competing manufacturers, feeding the market with a variety of products that work together
as smoothly as standard telephone products do today. Specifically, with the adoption of
International Telecommunication Union (ITU) recommendations T.120, H.320 and H.261,
multimedia teleconferencing equipment manufacturers, developers, and service providers
will have a basic established connectivity protocol upon which they can build products,
applications, and services that will change the face of business communications.
New Business Imperatives
Video on Demand systems are increasingly required by commercial, industrial,
governmental and military organizations to retrieve past information in order to prepare
for and anticipate future events. This preparation and anticipation can be crucial to the
survival of these organizations because of the key role of the individuals or groups being
monitored. It is this monitoring and collection of data that allows these organizations
to make informed decisions and to take appropriate action in response to current events.
Multipoint multimedia teleconferencing and video servers offer the required solution. As
defined here, teleconferencing involves a user-specified mix of traditional voice, motion video, and
still-image information in the same session. The images can be documents, spreadsheets,
simple hand-written drawings, highly-detailed color schematics, photographs or video
clips. Participants can access the same image at the same time, including any changes or
comments on that image that are entered by other participants. Video servers allow users
to view stored video files of specific events, conferences, news clips and important
information in near real time.
The benefits are obvious. Instead of relying on a text interpretation of a video clip, all
interested parties can access the original material. Little is left to verbal
interpretation since all users have access to the original video. In the case of video
clips, a person's actions, verbal tones, mannerisms and reactions to events around them
can be viewed and interpreted. Increased productivity, reduced cost, and reduced travel
time are the primary benefits, while proprietary technology and solutions are cited as the
primary inhibitors to the use of video on demand products and services.
Starting with Standards
While multimedia teleconferencing and video servers promise to revolutionize vital
everyday corporate tasks such as project management, training, and communication between
geographically-dispersed teams, it is clear that standards-based solutions are a
prerequisite for volume deployment. Standards ensure that end-users are not tied to any
one supplier's proprietary technology. They also optimize capital investment in new
technologies and prevent the creation of de facto communication islands, where products
manufactured by different suppliers do not interoperate with each other or do not
communicate over the same type of networks.
When adopted and adhered to by equipment suppliers and service providers alike, standards
represent the most effective and rational market-making mechanism available. ISDN, fax,
X.25, and GSM are a few obvious examples of standards-based technologies. Without
internationally-accepted standards and the corresponding ability to interoperate, the
services based on these technologies would almost certainly languish as simple
curiosities.
Interoperability is particularly important in multipoint operation, where more than two
sites communicate. A proprietary solution might suffice if two end users want to
communicate only with each other; however, this limited type of communication is rare in
today's business world. In typical business communications, multiple sites, multiple
networks, and multiple users have communications equipment from multiple manufacturers,
requiring the support of industry standards to be able to work together. This
interoperability is also critically important when a video server may be transmitting
data across a WAN to multiple users, in multiple sites.
Perhaps the most important effect of standards is that they protect the end users'
investments. A customer purchasing a standards-based system can rely on not only the
current interoperability of his equipment but also the prospect of future upgrades. In
the end, standards foster the growth of the market by encouraging consumer purchases.
They also encourage multiple manufacturers and service providers to develop competing and
complementary solutions and services.
Two Standards, One Goal
Fortunately, standards for multimedia teleconferencing are at hand. Working within the
United Nations-sanctioned ITU's Telecommunications Standardization Sector, two goals have
been achieved: the T.120 audiographics standards and the H.320 videotelephony standards.
T.120, H.320 and H.261 are "umbrella" standards that encompass the major aspects of the
multimedia communications standards set. The T.120 series governs the audiographic
portion of the H.320 series and operates either within H.320 or by itself.
Ratification of the core T.120 series of standards is complete. These recommendations
specify how to use a set of infrastructure protocols to efficiently and reliably
distribute files and graphical information in a multipoint multimedia meeting. The T.120
series consists of two major components. The first addresses interoperability at the
application level, and includes T.126 and T.127. The second component includes three
infrastructure components: T.122/T.125, T.124, and T.123.
The H.320 standards were ratified in 1990, but work continues to encompass connectivity
across LAN-WAN gateways. The existing H.320 umbrella covers several general types of
standards that govern video, audio, control, and system components. With many businesses
using LANs to connect their PCs, the pressure is on to add videoconferencing to those
networks. Since the H.320 standards currently address interoperability of video
conferencing equipment across digital WANs, it is a logical and necessary step to expand
the standards to address LAN connectivity issues. As the work to expand H.320 continues,
it remains the accepted standard.
Both the T.120 and the H.320 series of standards will be improved upon and extended to
cover networks and provide new functionality. This work will maintain interoperability
with the existing standards.
Standards First
Standards as complex and universal as the H.320 and T.120 series need a coordination
point for the interim steps a proposal takes on its way to becoming a standard. The
International Multimedia Teleconferencing Consortium (IMTC) is an international group of
more than 60 industry-leading companies working to
complement the efforts of the ITU-T with an emphasis on assisting the industry to bring
standards-based products successfully to the market. Its goals include promoting open
standards, educating the end user and the industry on the value of standards compliance
and applications of new technologies, and providing a forum for the discussion and
development of new standards. The IMTC is approved as an ITU-T liaison, and interfaces
with the ITU-T by participating in standards discussion and development, feeding
information and findings into the appropriate ITU-T Study Groups.
The Standards First initiative encourages multimedia equipment manufacturers to start
with compliance to at least the H.320, T.120, and H.261 standards described above. Further
standards compliance is recommended but optional, and manufacturers will still have the
ability to differentiate their products with proprietary features, creating Standards
Plus products. Compliance to the minimum H.320/T.120 standards will ensure a basic level
of connectivity across equipment from all participating manufacturers.
Summary
Standards have played an important part in the establishment and growth of several
consumer and telecommunications markets. By creating a basic commonality, they ensure
compatibility among products from different manufacturers, thereby encouraging companies
to produce varying solutions and end users to purchase products without fear of
obsolescence or incompatibility.
The work of both the IMTC and the ITU-T represents an orchestrated effort to promote a
basic connectivity protocol that will encourage the growth of the multimedia
telecommunications market. The Standards First initiative, which has been accepted by
several industry leading companies, requires a minimum of H.320, H.261 and T.120
compliance to establish that basic connectivity. Manufacturers are then able to build on
the basic compliance by adding features to their products, creating Standards Plus
equipment. By ensuring interoperability among equipment from competing manufacturers,
developers, and service providers, Standards First ensures that a customer's initial
investment is protected and future system upgrades are possible.
Content Preparation:
The first step in a VOD system is the entry of video information. The possible sources
of video information in a large-scale (government) VOD system include recorded and live
video, scanned images, and images collected by EO, IR, and SAR sensors. Recorded video is
the primary concern of this paper. Since latency and jitter do not affect imagery data
types, they will be noted but not expanded upon. Live video is the primary concern of
video conferencing, but its requirements do overlap with those of recorded (VOD) video.
REQUIREMENTS:
Recorded video must be digitized and compressed as early as possible in the VOD
architecture to minimize the system storage requirements. The Moving Picture Experts
Group (MPEG) of the ISO developed the MPEG-1 and MPEG-2 standards for video compression.
With MPEG-1, a 50-to-1 compression ratio is typical. MPEG-1 can encode images at up to
4K x 4K x 60 frames/sec. MPEG-2 was optimized for digital compression of TV and supports
rates up to 16K x 16K x 30 frames/sec, but 1920 x 1080 x 30 frames/sec is considered
broadcast quality (MPEG-2, Hewlett Packard pub. 5963-7511E). MPEG-2 offers a more
efficient means to code interlaced video signals such as those which originate from
electronic cameras (Chadd Frogg 8/95).
CODECs/Compression
CODECs encode video into, and decode it from, digital format. The CODEC must be
configured to encode the information at the desired end resolution. If the end user
requires broadcast quality video, the CODEC must support that level of quality. The CODEC
should also be compatible with the desired data throughput rate of the Content
Preparation element. (This can of course be overcome with sufficient buffering.) Several
CODECs output information in a form which is directly compatible with distribution HW.
Some are designed to output information in DS3, ATM OC3, or Fibre Channel. The Pacific
Bell "Cinema of the Future" project utilizes an HDTV CODEC. The analog HDTV signal is
digitized and compressed to a DS3 rate (44.736 Mbps) by Alcatel's 1741 CODEC. The CODEC
applies a Discrete Cosine Transform (DCT) hybrid compression algorithm with compensation
for video motion. Though the precise algorithm performed by the 1741 is proprietary, the
following is an overview of the process: Pixel groups called blocks are translated into
frequency information using the DCT (similar to a Fourier transform). Next, a
quantization step drops the least significant bits of information. These coefficients
are then "entropy-encoded" into variable bit length codes. This digital information, now
1/50 of its original size, can be passed on to an output mechanism (HW or SW driver).
This is of course just a quick overview; the process for encoding information has been
fairly well documented by the ISO.
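As an illustration of the transform and quantization steps outlined above, the following
Python sketch applies an orthonormal 8 x 8 DCT to a block of pixels and then quantizes the
coefficients with a single uniform step. It is not the proprietary 1741 algorithm: the
8 x 8 block size, the step size, and all function names are assumptions chosen for
illustration, and motion compensation and entropy coding are left out.

```python
import math

N = 8  # assumed block edge length (8 x 8 pixel blocks)

def dct_2d(block):
    """Return the orthonormal 2-D DCT-II of an N x N block (list of lists)."""
    def alpha(k):
        # Normalization factor that makes the transform orthonormal.
        return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)

    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            total = 0.0
            for x in range(N):
                for y in range(N):
                    total += (block[x][y]
                              * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                              * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[u][v] = alpha(u) * alpha(v) * total
    return out

def quantize(coeffs, step):
    """Uniform quantization: divide by a step and round, discarding fine detail."""
    return [[int(round(c / step)) for c in row] for row in coeffs]

def count_nonzero(quantized):
    """Count the coefficients that would still need to be entropy-coded."""
    return sum(1 for row in quantized for c in row if c != 0)

# A flat block: every pixel has the same value, so all energy lands in the DC term.
flat = [[100] * N for _ in range(N)]
print(count_nonzero(quantize(dct_2d(flat), 16)))
```

For a flat block only the DC coefficient survives quantization, which hints at why smooth
image regions compress far better than rapidly changing ones.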
Object Oriented Database Management Systems
In order to set up a searchable database of these MPEG objects, several companies are
introducing Object Oriented Database Management Systems (ODBMS). These systems can be
coupled with either the Media Server element or the Content Preparation element of the
VOD system. It would be ideal if all ODBMS spoke the same language so that information
could be exchanged between databases. A common query language would be advantageous, but
established standards such as SQL do not adequately address video objects. Illustra has
added object-oriented extensions onto ANSI SQL. These extensions are then used to
create "DATABLADES" which provide image handling and manipulation capabilities. Since
this architecture uses SQL, it is more likely that third-party front-end authoring
software will be compatible with Illustra (Interoperability, 10/95).
Encoding Verification
If the VOD server is seen as a central library of video files, with multiple users
archiving files and other users retrieving files, the requirement for format standards is
evident. There is then also a requirement to verify that these format standards are
being met. This verification usually falls upon the Content Preparation element of a VOD
system. The natural metaphor is that of a publisher ensuring that a book is legible
and free of grammatical errors before releasing it to the public. (This paper would
probably be caught by such a publisher.) Auditing compressed video information is
not as straightforward. A particular video stream can flow through an MPEG-2 encoder
without incident while a second stream will bog down the system (possibly inducing
errors). Rapidly changing backgrounds, as in sports coverage, can cause problems. The
MPEG-2 standard is complex, and it takes more than just an astute systems engineer to
ensure that the designers of the encoders have not interpreted the MPEG standard
differently from the decoder designers. Hewlett Packard suggests that the industry
needs to consider testability as a primary requirement of VOD systems. One way to
resolve encoding concerns could be to create standardized tests that carefully verify the
implementation of the MPEG standard. Bit error rate testers can test transport layers,
and traditional data analysis tools can also be used to build new test tools for MPEG. It
should be no surprise that testability is the last area of standardization for the VOD
marketplace.
Summary
Preparing video information for VOD archiving has reached the point where developers are
able to concentrate on accelerating the compression phase. The compression techniques
are relatively well documented. The industry is now addressing how to implement them
faster: HW vs. SW, digitizing cameras vs. DSP cards. Most experts agree that even
though today's workstations have the processing power to perform the MPEG compression, it
is usually more efficient to perform as much processing in HW (such as dedicated video
cards) as possible. This is not always the case in multimedia applications where the end
product (due to BW limitations) is not really broadcast quality. The quality of imagery
the user expects is also a major consideration in selecting a content preparation element.
If the user cannot take advantage of a high-resolution 2K x 2K image, or if the BW of the
distribution network is limited, then a high-resolution MPEG-2 CODEC might not be
justified. If the CODEC implements the "spatial scalability" capability of MPEG-2, then
the encoder provides the video in a two-part format. This lets low-resolution decoders
extract a basic video signal, while more capable decoders, with additional processing,
can provide a high-resolution picture.
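The two-part idea can be shown with a deliberately small Python sketch. It is a toy
one-dimensional analogue, not the MPEG-2 algorithm (which works on two-dimensional
pictures with motion-compensated prediction): a row of pixel values is split into a
half-resolution base layer and an enhancement layer of corrections. The function names
and the pair-averaging scheme are assumptions made for illustration.

```python
def encode_layers(samples):
    """Split an even-length row of pixels into (base, enhancement) layers."""
    # Base layer: one value per pair of pixels (half resolution).
    base = [(samples[i] + samples[i + 1]) // 2 for i in range(0, len(samples), 2)]
    # Enhancement layer: what must be added to the upsampled base to get the original.
    upsampled = [b for b in base for _ in range(2)]
    enhancement = [s - u for s, u in zip(samples, upsampled)]
    return base, enhancement

def decode_low(base):
    """A simple decoder uses the base layer alone."""
    return list(base)

def decode_high(base, enhancement):
    """A more capable decoder adds the enhancement layer to the upsampled base."""
    upsampled = [b for b in base for _ in range(2)]
    return [u + e for u, e in zip(upsampled, enhancement)]

row = [10, 12, 20, 26, 7, 7, 0, 255]
base, enhancement = encode_layers(row)
print(decode_low(base))                        # half-resolution picture
print(decode_high(base, enhancement) == row)   # full-resolution picture recovered
```

The low-resolution decoder never needs to read the enhancement layer, so a
bandwidth-limited client could be sent the base layer only.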
Video Server
Requirements
Once the content is uploaded to the video server in the content preparation phase and
registered appropriately in the database, it becomes available to the end user. In
order for this data to be available and viewable by the end user, the server should have
at least a RAID 5 SCSI controller, 4GB hard drives spinning at 7200 RPM, and a high-speed
network interface. The server should support delivery of approximately 28 hours of video
compressed with MPEG-2 at 4.0 Mbps, or 96 hours compressed with MPEG-1, of 30-fps,
640-by-480 pixel video on demand, which equates to a minimum of 50 GB of hard disk space.
The server should employ RAM to buffer the data being read from the disk drives to
ensure a smoother transfer of the video to the end user. A minimum of 256MB is
recommended. The server should be able to handle MPEG-2 and MPEG-1 in NTSC, PAL or SECAM
video formats and be able to meet broadcast and cable requirements for on-air program
applications and video caching.
Compression Method *   Storage Required (Mb)    Storage Required (Mb)    Total Capacity
                       per 30-Second Clip       per 60-Second Clip       52GB HDD Holds
MPEG-1 @ 1.2 Mbps      36                       72                       96.3 Hours
MPEG-2 @ 4 Mbps        120                      240                      28.8 Hours
* Assuming the standard compression ratio per method type.
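The figures in the table follow from simple arithmetic on the stream bit rate, which the
short Python sketch below reproduces. It assumes decimal units (1 GB = 1000 MB, 8 bits per
byte) and a constant bit rate; the function names are illustrative only.

```python
def clip_size_megabits(bitrate_mbps, seconds):
    """Storage needed for a clip, in megabits, at a constant bit rate."""
    return bitrate_mbps * seconds

def hours_on_disk(capacity_gb, bitrate_mbps):
    """Hours of video that fit on a disk of the given capacity (decimal GB assumed)."""
    capacity_megabits = capacity_gb * 1000 * 8
    return capacity_megabits / bitrate_mbps / 3600

for name, rate in (("MPEG-1", 1.2), ("MPEG-2", 4.0)):
    print(name,
          clip_size_megabits(rate, 30),
          clip_size_megabits(rate, 60),
          round(hours_on_disk(52, rate), 1))
```

Under these assumptions the MPEG-2 capacity works out to roughly 28.9 hours, which the
table appears to truncate to 28.8.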
Limitations
There are several major limitations that must be addressed in order to understand why
the above requirements are imposed.
1) Storage--There appears to currently be a storage limitation on video servers because
of retrieval and transmission time associated with video. Multiple servers will be
needed to store and retrieve from large archives of video information. These servers
should be distributed remotely to maximize local retrieval and viewing while minimizing
WAN traffic.
2) Data stream--In order to view video information with a minimum of latency and
without jitter, the data stream needs to be constant and uninterrupted (with the exception
of some buffering as necessary). There are several forms of buffering:
a) Media stream storage on hard disk.
b) Caching at the transmit buffer.
c) Network transit latency and buffers, which may be viewed as another buffer.
d) Buffering at the receive end of a sufficient amount of the media stream to maintain a
continuous stream for display and suitable synchronization with the transmit end.
3) Concurrent users--The video server should be limited to 100 concurrent users in
order to ensure that each user is able to access the requested data as expeditiously as
possible.
4) Network bandwidth size--The network bandwidth needs to be directly proportional to the
number of simultaneous video streams. The bandwidth of the system is effectively limited
by the bandwidth / transmission capabilities originating at the server.
5) Latency--Although hard to determine, latency should be no more than 2 seconds for a
video file retrieved locally and no more than 10 seconds for a video file retrieved over
the WAN from a remote site.
6) ODBMS
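Limitations 2 and 4 lend themselves to back-of-the-envelope checks. The Python sketch
below estimates how many constant-rate streams a server's outbound link can carry, and how
much data a client would need to hold in its receive buffer to ride out a delivery stall.
The function names are hypothetical, protocol overhead is ignored, and the example link
speed and stall duration are assumptions, not measured values.

```python
import math

def max_streams(server_link_mbps, stream_mbps):
    """Upper bound on simultaneous streams the server's outbound link can carry."""
    return math.floor(server_link_mbps / stream_mbps)

def prebuffer_bytes(stream_mbps, stall_seconds):
    """Bytes a client must buffer to keep playing through a stall of the given length."""
    return int(stream_mbps * 1_000_000 * stall_seconds / 8)

# Example: a 155 Mbps server link feeding 4 Mbps MPEG-2 streams.
print(max_streams(155, 4))
# Example: buffer needed to absorb a 2-second stall on a 4 Mbps stream.
print(prebuffer_bytes(4, 2))
```

In practice the usable stream count will be lower than this bound, since disk throughput
and protocol overhead also consume capacity.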
Products
Several products that are currently being marketed as video servers are:
1) The Network Connection, M2V Video Server:
a) 120 simultaneous 1.2 Mbps MPEG-1 video streams
b) 112GB, RAID 5 storage.
c) In excess of 200 Hours MPEG-1, and 60 Hours MPEG-2.
d) Supports JPEG, M-JPEG, DVI, AVS, AVI, Wavelet, Indeo and other video formats.
e) Supports Ethernet, Token Ring, FDDI and ATM.
2) Micropolis Corp, AV Server:
a) 16 MPEG-2 video decoder boards with 4 channels per board, for 64 channels at 6 Mbps
per channel.
b) 252GB, RAID storage.
c) In excess of 120 hours MPEG-2
d) Supports only MPEG-2
3) Sun Microsystems, Media Center 1000E Video Server:
a) 63GB, RAID4 storage.
b) In excess of 32 Hours MPEG-2, and 81 Hours MPEG-1
c) Supports MPEG-1 and MPEG-2
d) Supports ATM and Fast-Ethernet
Distribution Network:
Video on Demand (VOD) requires predictability and continuity of traffic flow to ensure
real-time flow of information. MPEG-1 and MPEG-2 (as described above) require an effective
BW of 1.5 - 4 Mbits/sec. Multiplying this "media stream" BW requirement by the number of
clients will give a rough estimate of the effective distribution network's bandwidth. The
Common Imagery Ground/Surface System (CIGSS) 1 Handbook suggests the following steps to
size and specify the LAN technology used for image dissemination systems:
1. Approximate the system usage profile by estimating the amounts of image, video and
text handling that will be required.
2. Convert the amount of images, video and text to be processed into average effective
data rates. Raw data transferred directly to an archive (our video server) and near
real-time processed imagery should be estimated separately. The bandwidth
requirements can be combined later if needed.
3. Adjust the calculated rate for growth. The growth factor should be at least 50%.
4. Add a fraction (about 0.3 to 0.4) of the peak capacity to the growth-adjusted rate for
interprocessor communications.
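The sizing steps above can be turned into a small calculation. The Python sketch below is
one possible reading of them, not a procedure taken from the handbook itself: it treats
"peak capacity" as a separately supplied peak data rate, and the function names and default
values (50% growth, a 0.3 interprocessor fraction) are assumptions for illustration.

```python
def stream_load_mbps(clients, stream_mbps):
    """Steps 1-2: average effective rate for a number of constant-rate video clients."""
    return clients * stream_mbps

def required_lan_mbps(avg_rate_mbps, peak_capacity_mbps, growth=0.5, ipc_fraction=0.3):
    """Steps 3-4: apply the growth factor, then add the interprocessor allowance."""
    growth_adjusted = avg_rate_mbps * (1 + growth)
    return growth_adjusted + ipc_fraction * peak_capacity_mbps

# Example: 20 clients viewing 1.5 Mbps MPEG-1 streams, with an assumed 40 Mbps peak.
average = stream_load_mbps(20, 1.5)
print(average, required_lan_mbps(average, 40.0))
```

With these example numbers the estimate comes to roughly 57 Mbps, which already exceeds
what a shared 10 Mbps segment could supply.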
Updating heritage networks to this new BW requirement can incur substantial costs. The
cost of implementing a hi-speed network varies depending on the network architecture.
LAN Types
Several LAN architectures are being used in "trial" VOD systems. ATM, FDDI token ring
and even variations of the Ethernet standard can provide the required 10-100 Mbps BW.
A version of Ethernet called switched Ethernet can provide up to 10 Mbps to all clients.
Since this is a switched architecture, the full 10 Mbps can be available to each client.
This architecture provides the quickest, most cost-effective method of upgrading legacy
systems since it does not require an upgrade of existing 10BaseT wiring. A voice-grade
Ethernet, 100VG-AnyLAN, can also be implemented in a VOD system. This architecture,
however, will require some cable upgrades from CAT 3 to CAT 5. Ethernet 100VG is expected
to "top out" at 100 Mbps; no further upgrades are foreseen.
Token ring networks have been implemented in a few VOD trial systems. FDDI can be set up
to provide 100 Mbps, and because of the token-ring architecture, the network can specify
BW for each client. A simulated system, described in the Sept '95 edition of Multimedia
Systems, would be capable of handling 60 simultaneous MPEG-1 video streams. The video
server (486DX), not the 100-Mbit/sec token ring, limited the system size. This is of
course a small system, and due to the "shared" nature of a token ring FDDI architecture,
it should not be implemented for larger (1000+) systems.
ATM provides the highest BW and is probably the most expensive network solution. ATM
provides the proper class of service for video on demand applications. ATM connections
running at OC3 rates (155 Mbps) are currently priced at approx. $300-$500. ATM is not a
"shared" topology; BW is not dependent on the number of users. In fact, as the number
of users on an ATM net is increased, the effective BW of the ATM network increases. ATM
can have hundreds of services operating simultaneously: voice, video, LAN and ISDN.
These services can all be guaranteed and assured that they won't interfere with each
other. The LAN marketplace is currently providing 155 Mbps products. Some of the ATM
Forum leaders (such as FORE Systems) are also providing 622 Mbps (OC12) network interface
cards (NICs). The problem is that ATM is a relatively new protocol. Several companies
have come together to form the ATM Forum to help standardize the architecture. For most
network application software, the cell-based ATM layer is not an appropriate interface.
The ATM adaptation layer (AAL) was designed to bridge the gap between the ATM layer and
the application requirements. The Forum's efforts have been very successful at the lower
ATM adaptation layers, but some interoperability issues still exist. The American ATM
Forum has standardized on ATM AAL 5 to map MPEG-2 for transport, while the European ETSI
has chosen AAL2. These inconsistencies affect the transport of multimedia only through
ATM LANs.
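To give a feel for what mapping MPEG-2 onto AAL 5 costs in overhead, the Python sketch
below counts the ATM cells needed to carry a payload. The sizes used (53-byte cells with
48 bytes of payload, an 8-byte AAL 5 trailer, 188-byte MPEG-2 transport packets) are
standard figures that this paper does not itself state, and grouping two transport
packets per AAL 5 unit is an assumption made here for illustration. Function names are
hypothetical.

```python
ATM_CELL_BYTES = 53        # 5-byte header + 48-byte payload
ATM_PAYLOAD_BYTES = 48
AAL5_TRAILER_BYTES = 8     # trailer appended to each AAL 5 data unit
MPEG2_TS_PACKET_BYTES = 188

def aal5_cells(payload_bytes):
    """Cells needed for one AAL 5 data unit (payload + trailer, padded to whole cells)."""
    total = payload_bytes + AAL5_TRAILER_BYTES
    return -(-total // ATM_PAYLOAD_BYTES)  # integer ceiling division

def aal5_efficiency(payload_bytes):
    """Fraction of the bytes on the wire that are useful payload."""
    return payload_bytes / (aal5_cells(payload_bytes) * ATM_CELL_BYTES)

for packets in (1, 2):
    size = packets * MPEG2_TS_PACKET_BYTES
    print(packets, aal5_cells(size), round(aal5_efficiency(size), 3))
```

Two transport packets plus the trailer come to exactly 384 bytes, which fills eight cells
with no padding; a single packet would need five cells and waste most of the last one.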
Protocols
There are several transport protocols that can be implemented for audio-video
applications: TCP, UDP, SONET, TCP/IP Resource Reservation Protocol (RSVP) and IPX/SPX.
Due to the effective data rate necessary to support VOD, protocols that minimize
client/server interaction are preferable, except in cases where an overabundance of
network bandwidth exists. In ATM nets supporting mostly non-VOD applications,
retransmission of lost or corrupt packets will not be possible. For example, if
cells are lost, the Fore Systems AVA Real-time Display SW uses pixel tiles from a previous
frame. In a typical VOD system without error correction, QOS degrades in direct
proportion to the network/LAN BER (Bit Error Rate). VOD systems which provide error
correction as part of the network protocol have to be designed to allow for the latency
created by their error-correcting protocols. (DSS currently implements interleaving,
Reed-Solomon and Viterbi decoding.) QOS trade-offs can be quantified and analyzed (see
"QOS control in GRAMS for ATM LAN", IEEE Journal of Selected Areas in Communications, by
Joseph Hui).
Networking, DBMS and server companies have been adapting upper-layer protocols to VOD
processes. Oracle Media Net utilizes a "sliding window" protocol. The sliding window
protocol is a well-established methodology for ensuring transmission over lossy data
links. Media Net monitors the response between client and server, lengthens the response
checking time to the point of error and then backs off. (This process theoretically
diminishes disruptive latencies.) Novell developed the Novell Embedded System
Technology (NEST) and NetWare to run over IPX/SPX protocols. The Novell implementation
provides prioritization for video users. Flow control from the client to the server does
not yet exist (Interoperability, 10/95).
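The sliding window idea can be sketched with a simplified Go-Back-N sender, written here
in Python. This is a generic textbook variant and not Oracle's implementation: the
function name, the round-based timing model (send a full window, then act on the
cumulative acknowledgement), and the assumption that a listed frame is lost only on its
first transmission are all illustrative.

```python
def go_back_n(num_frames, window, lost_once):
    """Simulate a Go-Back-N sender and return the frame numbers in the order sent.

    lost_once: frame numbers whose first transmission is lost in transit.
    """
    pending_loss = set(lost_once)
    sent_log = []
    base = 0  # oldest frame not yet acknowledged
    while base < num_frames:
        window_end = min(base + window, num_frames)
        delivered_up_to = base
        in_order = True  # the receiver only accepts frames in sequence
        for seq in range(base, window_end):
            sent_log.append(seq)
            if seq in pending_loss:
                pending_loss.discard(seq)  # the retransmission will get through
                in_order = False           # later frames in this window are discarded
            elif in_order:
                delivered_up_to = seq + 1
        base = delivered_up_to  # cumulative acknowledgement slides the window
    return sent_log

# Five frames, window of three, frame 1 lost the first time it is sent.
print(go_back_n(5, 3, {1}))
```

In this run frames 1 and 2 are each sent twice, which shows the price of retransmission:
every lost frame costs up to a full window of repeated traffic and the delay that goes
with it.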
WAN Types
Distributing VOD information outside the LAN requires either a very high bandwidth WAN
with guaranteed availability, or substantial buffering and latency allowances at the
client in order to ensure and maintain a constant display of data. When many people
think of information distribution over a WAN, sourced by many different servers, to many
isolated users, the Internet naturally comes to mind. The Internet was used by the
National Information Infrastructure (NII) workshop as a model for the delivery of video
services. This conference of commercial organizations, in addition to supporting HDTV
and DSS, is interested in providing VOD services to "all Americans". The Internet was
seen as a good first attempt for distributing information. The Internet is inexpensive,
requires no gatekeepers, provides search utilities and has several proven Human Machine
Interfaces (HMIs). Unfortunately the Internet is also bandwidth limited, provides
insufficient traffic control, security, and directories, and offers no guaranteed
delivery functions. The Internet may not be the solution to the VOD distribution problem,
but it will expedite the development of an open architecture commercial VOD WAN.
Commercial enterprises have been considering hybrid fiber/coaxial cable as one possible
solution. This implementation, also referred to as "fiber to the curb", requires a partial
upgrade to existing telephone distribution infrastructures. Signals are transmitted over
fiber to a neighborhood distribution (gateway) point. The signals are then either
converted to RF and sent to the user (home) via coax, or converted to a lower data rate
network interface and sent on to the home. The RF implementation requires a "Set-Top
Box" for decoding the RF; the latter could be a PC implementation. ISDN-B, the
broadband version of ISDN, will probably evolve as the leading WAN technology.
Narrowband ISDN is already an accepted method of providing the higher serial data rates
necessary for minimal quality multimedia applications, like teleconferencing. True
motion picture quality VOD implementations will require the Mbps data rates that should
be provided by ISDN-B.
The DOD has also been interested in the distribution of video and imagery across WANs.
The Defense Airborne Reconnaissance Office (DARO) has developed the Common Imagery