MPEG-4: A Flexible Coding Standard for the Emerging Mobile Multimedia Applications

Luis Ducla Soares, Fernando Pereira
Instituto Superior Técnico - Instituto de Telecomunicações
E-mail: Luis.Soares@lx.it.pt, Fernando.Pereira@lx.it.pt

ABSTRACT

This paper analyses the relevance and performance of the emerging MPEG-4 audiovisual coding standard for emerging mobile multimedia applications. Some results will be presented for one of the MPEG-4 profiles targeting mobile scenarios.

I. INTRODUCTION

In the last few years, many new digital multimedia communication services and devices, such as videotelephony, videoconference, and digital television, have appeared in the market. These applications span a large variety of bitrates, ranging from the very low bitrates of current mobile communications to the very high bitrates of high quality television, and are covered by the available video coding standards, such as H.261, H.263, MPEG-1, and MPEG-2. However, these digital “frame-based” standards use the same video data model as the available analog services, i.e. a sequence of rectangular images formed by a certain number of pixels, which are encoded by exploiting their statistical properties.

In parallel, some convergence between the Telecommunications, Information Technology (IT) and Entertainment sectors is taking place, in the sense that a clear distinction between the corresponding service models no longer exists [1]. This basically means that similar problems have to be dealt with by the corresponding technical communities. Among the common emerging user requirements that need to be satisfied are content-based interactivity, allowing the user a deeper interaction with the information made available; the integration of natural and synthetic content; and universal access, which is especially interesting for the mobile communications community since mobile networks bring the most critical requirements.
In order to avoid the emergence of incompatible solutions for the same problem, standards that address similar needs in the three converging worlds mentioned above are needed. This is where the emerging MPEG-4 standard, now being finalized by ISO MPEG, intends to play a major role. The main objective of MPEG-4 is to address the problem of audiovisual data coding, including the new content-based requirements, proposing a common solution to the three converging worlds.

This new coding standard uses a representation model based on the understanding of a scene as a composition of (semantically) relevant objects: the content. As a consequence, object-based video coding schemes can also be called content-based video coding schemes, since the semantics associated with the objects in the scene now plays a major role in defining the object-based structure used to represent visual data. This approach allows new and improved functionalities in terms of coding efficiency, universal access, and interactivity since, for the first time, the content is not only selectively processed but also independently accessible. In terms of video coding efficiency, MPEG-4 has been tested for bitrates in the kbps to Mbps range, providing acceptable quality for the typical low bitrates used in wireless communications, such as in PCS and IMT-2000. Although MPEG-4 addresses audio and video coding, this paper concentrates on video coding functionalities and performance.

In the context of mobile environments, the MPEG-4 universal access requirement, understood as “audiovisual information shall be accessible from anywhere in any way”, plays a major role. The universal access requirement is a consequence of the growing variety of networks used, including the mobile ones, which are becoming more and more important.
Since it is well known that some of these networks have critical bandwidth and channel error characteristics, which have to be taken into account, there has been strong pressure to study new error protection, detection and concealment techniques as an essential part of emerging object-based video coding architectures. Object-based scalability is another technology playing an important role in the provision of universal access, since it makes it possible to accommodate different transmission resources, as well as decoding on processors with varying processing power, by sending to each receiver only the most adequate information in terms of content (more or fewer objects), SNR, and spatial/temporal resolution.

The MPEG-4 basic coding principles and the associated functionalities described above show that this new coding standard can be very flexible in adapting to different transmission and decoding conditions, such as different bitrates and different error conditions. This makes MPEG-4 the ideal, and timely, audiovisual coding technology to implement a whole new range of multimedia applications over wireless channels, such as PCS (GSM and Q-CDMA) and IMT-2000/UMTS.

II. MPEG-4 VIDEO CODING

In the context of the MPEG-4 video coding framework [2], each scene is structured as a composition of (semantically) relevant, arbitrarily shaped 2D objects, the Video Objects (VOs), coded using separate elementary bitstreams, one per object, plus another one for the composition information. For each object, shape, texture and motion information is coded. While shape coding is considered for the first time in the context of a video coding standard, motion and texture coding have long been part of the available standards, using hybrid coding schemes based on the Discrete Cosine Transform (DCT) and motion compensation.

0-7803-4872-9/98/$10.00 © 1998 IEEE
Since this new standard considers the scene to be composed of various objects, the (semantically) meaningful objects may be not only independently coded but also independently accessed, hence allowing a high level of interactivity between the end-user and the available information.

Figure 1 - Simplified architecture of an MPEG-4 system [1]

If a scene segmentation is not available or not useful for the considered application, e.g. very simple mobile video communication, it is still possible to encode the whole scene as a single rectangular VO, going back to a frame-based situation similar to the one considered in previous standards. However, the current outlook for future mobile multimedia terminals suggests that simple (real-time) segmentation solutions will soon be available in mobile terminals, for applications such as videotelephony and remote surveillance.

The segmentation methods are not standardized, giving the industry the possibility to evolve and compete in this area. MPEG-4 only provides the necessary tools to efficiently encode the various objects in a scene, as well as the composition information, regardless of the technology used to extract/identify the objects. This non-normative approach, also used for the encoding process, allows the standard to integrate new improvements in segmentation technology quite easily, thus increasing its longevity.

In order to better understand the MPEG-4 video coding solution, a brief description of this technology is presented here. The high level syntax used in the MPEG-4 Visual standard consists of four hierarchically organized classes [2]:

• Video Session - Each Video Session (VS) is made up of one or more Video Objects (VO), corresponding to the various objects in the scene.
• Video Object - Each one of these VOs can have several scalability layers (spatial, temporal, or SNR), corresponding to different Video Object Layers (VOL).
• Video Object Layer - Each VOL consists of an ordered sequence of snapshots in time, called Video Object Planes (VOP).
• Video Object Plane - Each VOP is a snapshot in time of a VO for a certain VOL.

This way, one VS can have several VOs (VO0, VO1, ..., which are the objects in the scene), each one of these VOs having several layers (VOL0, VOL1, ...), which are sequences in time of several VOPs (VOP0, VOP1, ...). Finally, each one of these VOPs is characterized by its shape, motion and texture, which have to be encoded. These encoding techniques are briefly described in the following paragraphs.

Even though MPEG-4 is an object-based standard, texture coding is still block-based and somewhat similar to the previous video coding standards. The first step in encoding an arbitrarily shaped VO is finding a rectangular bounding box that completely contains the object to be encoded. This bounding box, corresponding to a VO, is then divided into blocks of 16x16 pixels called macroblocks (MB), which are encoded one by one. Each MB consists of four 8x8 luminance blocks and one 8x8 block for each chrominance component (sub-sampled in both directions), giving a total of 6 blocks per MB.

First the shape information is encoded. Each MB is analyzed and classified into one of three possible classes: transparent (MB outside the object but inside the bounding box), opaque (MB completely inside the object) or border (MB over the border). The border MBs are the ones for which real shape coding is required. The shape coding algorithm used is called Content-based Arithmetic Encoding (CAE) [2].

The motion information is encoded by means of motion vectors. Each MB can have either one or four motion vectors. When four motion vectors are used, each one of them is associated with an 8x8 block within the MB, instead of the whole 16x16 MB.
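The macroblock classification used for shape coding can be illustrated with a minimal sketch. It assumes the object shape is available as a binary mask (1 = inside the object) that already covers the bounding box, with dimensions that are multiples of 16; the function name `classify_macroblocks` and the mask representation are illustrative choices, not part of the standard or of the authors' codec.

```python
MB_SIZE = 16  # macroblock size in pixels, as described in the text


def classify_macroblocks(mask, mb_size=MB_SIZE):
    """Label each macroblock of a binary shape mask.

    mask is a list of rows of 0/1 values covering the bounding box
    (assumption: both dimensions are multiples of mb_size).
    Returns a grid of labels: "transparent", "opaque" or "border".
    """
    height, width = len(mask), len(mask[0])
    if height % mb_size or width % mb_size:
        raise ValueError("mask dimensions must be multiples of the MB size")

    labels = []
    for top in range(0, height, mb_size):
        row = []
        for left in range(0, width, mb_size):
            # Count the object pixels that fall inside this macroblock.
            inside = sum(
                mask[y][x]
                for y in range(top, top + mb_size)
                for x in range(left, left + mb_size)
            )
            if inside == 0:
                row.append("transparent")  # inside the box, outside the object
            elif inside == mb_size * mb_size:
                row.append("opaque")       # completely inside the object
            else:
                row.append("border")       # only these need real shape coding
        labels.append(row)
    return labels


# Hypothetical 32x32 bounding box: the object fills the top 16 rows,
# columns 0 to 23.
example_mask = [[1 if (x < 24 and y < 16) else 0 for x in range(32)]
                for y in range(32)]
example_labels = classify_macroblocks(example_mask)
```

In this example the top-left macroblock is opaque, the top-right one is a border macroblock, and the bottom row is transparent, so only one of the four macroblocks would need actual shape coding.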
Each motion vector tells the decoder which block (8x8 or 16x16) of pixels in the previous VOP is closest to the current one and will therefore be used for the prediction of the texture. The decoder uses the motion vectors for motion compensation. The process by which the motion vectors are estimated is not standardized.

The texture data can be encoded in two modes: intra and inter. If intra coding is chosen, a given MB is encoded by itself (with no temporal prediction), only exploiting the spatial redundancy. If inter coding is performed, temporal prediction and motion compensation are used to exploit the temporal redundancy, and the differences between the current MB and its prediction are encoded. Both the absolute texture values (intra coding) and the differential texture values (inter coding) are then encoded using the DCT (applied to the six 8x8 blocks in each MB). The DCT coefficients are afterwards encoded by run-length coding and variable length coding (VLC). Instead of regular VLC, reversible VLC can be used to code texture, if error resilience is a requirement [2].

The information for shape, motion, and texture is multiplexed at the MB level, which means that for each MB the shape information is sent, then the motion data, and finally the texture data. This type of multiplexing is called the combined mode. This is opposed to sending the shape, motion, and texture data multiplexed at the VOP level, which would mean sending first all the shape data for all the MBs in the VOP, then the motion vectors for all the MBs, and finally all the texture data.

A. Error Resilience Video Coding Tools

Since some networks, such as the mobile ones, have very critical error characteristics, MPEG has devoted a lot of effort to specifying error resilient coding tools.

Figure 2 - Video packet structure with data partitioning: Resync. Marker | MBA | QP | HEC | Optional Fields | Shape & Motion Data | Motion Marker | Texture Data | Resync. Marker
The main idea behind error resilient source coding in MPEG-4 is to split the VOP information into independent resynchronization packets in order to avoid error propagation from one packet to the other. To allow a more evenly distributed protection, the packet size was chosen to be approximately constant in terms of the number of bits, instead of being constant in terms of the number of MBs included. This solution requires the encoder to track the number of bits per packet, in order to start a new packet immediately after the end of the MB where the chosen threshold was exceeded. This guarantees that the information corresponding to a given MB will not be split between two packets. The packets are separated by a resynchronization marker followed by some fields with important information to make the packets totally independent from each other.

Based on this resynchronization packet approach, MPEG-4 defines two error resilient video coding modes:

• Combined Mode (without data partitioning) - This is the basic error resilient mode, where the data is organized in the same way as in the normal (non error resilient) combined mode, with the exception that the data is divided into packets according to the rules described above.
• Combined Mode with Data Partitioning - This mode increases the error resilience capabilities of the simple combined mode by re-ordering the data inside the packet (see Figure 2). This way the shape and motion data are separated from the texture data by a marker, thus allowing one to be recovered even if the other is lost. If reversible VLCs are used for the texture data and an error is detected in the texture data, the decoder can skip all the bits up to the next resynchronization marker and resume decoding in the backward direction, thus recovering a lot of information that would otherwise be lost. This mode is the most robust, and thus it will be the one evaluated in the remainder of this paper.
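The packetization rule described above (close a packet right after the MB at which the bit count exceeds a threshold) can be sketched as follows. This is only an illustration under simplifying assumptions: the per-MB bit counts are made-up values, the bits spent on the resynchronization marker and header fields are ignored, and `packetize` is a hypothetical helper name rather than anything defined by MPEG-4.

```python
def packetize(mb_bits, threshold_bits):
    """Group macroblocks into video packets of roughly constant bit size.

    mb_bits holds the coded size in bits of each MB, in coding order.
    A packet is closed immediately after the MB whose bits push the running
    total above threshold_bits, so an MB is never split between two packets.
    Returns a list of packets, each a list of MB indices.
    """
    packets = []
    current = []       # MB indices in the packet being built
    bits_so_far = 0    # bits accumulated in the packet being built

    for index, size in enumerate(mb_bits):
        current.append(index)
        bits_so_far += size
        if bits_so_far > threshold_bits:
            # Threshold exceeded: finish this packet after the current MB.
            packets.append(current)
            current = []
            bits_so_far = 0

    if current:
        packets.append(current)  # last, possibly shorter, packet
    return packets


# Illustrative MB sizes (hypothetical) with the 480-bit packet size
# that the tests in this paper use for the PCS case.
example_packets = packetize([200, 150, 180, 90, 400, 60], threshold_bits=480)
```

With these values the first packet closes after the third MB (530 bits), the second after the fifth (490 bits), and the last MB forms a short trailing packet. Packets therefore slightly overshoot the threshold, which is why the text describes the size as only approximately constant.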
Since the whole architecture of the system is object-based, the different objects can be prioritized according to their importance in the scene, notably by coding the most important objects with higher quality and by making them more error robust. This appears to be especially interesting for networks where bandwidth and error conditions are very critical, such as mobile networks.

B. MPEG-4 Profiling: The Mobile View

Since MPEG-4 includes coding tools addressing many requirements, whose simultaneous use is seldom necessary in the context of real applications, relevant clusters of tools addressing a certain class of applications have been identified to guide terminal implementation: the MPEG-4 profiles. Profiles guarantee interoperability while minimizing complexity. Among the MPEG-4 profiles for natural video coding, two appear more relevant for mobile communications: the Simple Profile and the Core Profile. Both profiles include all the MPEG-4 error resilient video coding tools, making them suitable for mobile networks. The main difference between these two profiles is that while the Simple Profile only supports rectangular objects, the Core Profile supports 2D arbitrarily shaped objects (and thus needs to know how to decode MPEG-4 shapes). This means that while the Simple Profile is more adequate for very simple applications, such as mobile videotelephony and remote surveillance, to be implemented in less powerful processors (the first terminals are already being planned), the Core Profile addresses more complex applications, such as mobile broadcasting, where (binary) arbitrary shapes are important, notably to provide some content-based user interaction.

III. A MOBILE MULTIPLEXING SOLUTION

In order to show the world the video error resilience capabilities of MPEG-4, a set of formal verification tests was scheduled [3].
This first round of subjective verification tests was intended to address wireless applications, and two mobile network cases were chosen for it: PCS and IMT-2000. The Video Simple Profile was evaluated in these tests. To test situations as close to real ones as possible, it was decided to simulate a complete (coding-decoding) system, with the various blocks implemented by independent parties. The authors of this paper participated in the video encoding and decoding part of the system.

The model used for the complete system corresponds to a typical mobile multimedia communication system, and consists of the following four layers [3]:

• Application Layer - MPEG-4 Video Encoder/Decoder and MPEG-4 Audio Encoder/Decoder.
• Sync Layer - A component of the MPEG-4 Systems Layer [5].
• TransMux Layer - H.223/Annex B [4], which is a mobile extension of the multiplex specified for the H.324 ITU-T standard multimedia terminal.
• Physical Layer - 10 ms burst errors, which is a typical mobile channel error condition.

In the application layer, elementary video and audio streams are generated. Three testing cases were envisioned: one PCS channel with 32 kbps, and two IMT-2000 channels, one with 128 kbps and the other with 384 kbps. Among the tools used in the video encoder are: intra coding (I-mode), inter coding (P-mode), reversible VLC, video packet resynchronization, data partitioning and adaptive intra refreshment (AIR). AIR is a technique that refreshes the quality of the sequence using the intra coding mode, thus avoiding error propagation for very long periods [2]. The video packet size used was 480 bits for the PCS case, and 1440 bits for the IMT-2000 case.

In the Sync Layer (SL) [5], the elementary streams are packaged into SL-Packetized Streams. Each SL-Packet includes timing and numbering information that allows the different elementary streams to be decoded and subsequently composed. For these tests, each video resynchronization packet was mapped to one SL-Packet.

After the Sync Layer, MPEG-4 specifies an optional FlexMux Layer, which is a simple multiplexing tool that addresses the specific MPEG-4 needs of low delay and low overhead. However, in these tests this layer was not used.

The TransMux Layer, which corresponds to the main multiplexing/demultiplexing tool, is not standardized by MPEG-4, since this is left to the bodies in charge of the channel related standards. Instead, MPEG-4 specifies an interface to the multiplexer/demultiplexer, assuming that a large number of adequate delivery mechanisms exist below this interface. Since mobile multimedia communications were the main target here, ITU’s mobile multiplexing standard, H.223/Annex B, was chosen. More details regarding the Sync Layer and the multiplexer configuration can be found in [3].

As far as the Physical Layer is concerned, a typical mobile transmission channel was chosen, with 10 ms burst errors and two average bit error rates: 10^-3 (critical) and 10^-4 (typical). Results will be shown here for the critical case.

Table 1 - Average PSNR

                        “Overtime” (20 kbps)    “Artbit” (376 kbps)
Error Free (dB)         32.89                   36.00
Critical Errors (dB)    32.30                   35.47

IV. CONCEALMENT FOR ERROR RESILIENT DECODING

Usually three types of tasks have to be accomplished at the decoder to reach high performance in terms of error robustness:

• Error Detection - The decoder tries to detect if errors have occurred.
• Error Localization - The decoder tries to find where exactly the detected error has occurred.
• Error Concealment - The decoder tries to hide the effect of the detected errors.

Since all error concealment tasks to be performed at the decoder are non-normative, the description presented here refers to the MPEG-4 video decoder implementation made by the authors of this paper. In terms of error detection, syntactic and semantic inconsistencies have been checked in order to detect if errors have occurred.
For error localization, reversible decoding (using RVLC) has been used in order to minimize the amount of texture data lost. Finally, for error concealment, a technique that takes advantage of the data partitioning has been used: if the partition with the motion data is lost, the entire packet is discarded and replaced with the corresponding data in the previous frame; if the error is found in the texture partition, then reversible decoding is started. In the next section, some results will be shown for the coding and multiplexing conditions described in the previous sections.

V. RESULTS AND FINAL REMARKS

In this section, some results are provided for the two testing cases described above, PCS and IMT-2000, using the MPEG-4 video codec (Simple Profile) implemented by the authors of this paper. Due to the lack of space, results are only provided for two sequences and bitrates (see Figure 3). The first sequence (“Overtime”) is a typical videophone sequence, with 1350 frames, in QCIF format, at 7.5 Hz, and could be used for a PCS-type communication at 32 kbps: 20 kbps are given to video, 8 kbps to audio, and the remaining 4 kbps are used for multiplex data. The second sequence (“Artbit”) is a typical multimedia sequence, including natural and synthetic content, with 2250 frames, in CIF format, at 15 Hz, and could be used for mobile multimedia broadcasting in the context of an IMT-2000 communication at 384 kbps: 376 kbps for video, 8 kbps for audio, and the remaining 4 kbps for multiplex data.

Figure 3 - Sample images of the sequences “Overtime” and “Artbit”

In Table 1, the average PSNR figures are shown, for both sequences, in the error free case and in the critical error case. It can be seen that, on average, even in the presence of severe errors, the PSNR only drops about 0.5 dB. In Figures 4 and 5, the PSNR evolution in time is shown for the whole sequences. It can be seen that some errors cause severe drops in the PSNR.
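For reference, the PSNR values discussed here can be reproduced in principle with the conventional definition for 8-bit samples, PSNR = 10 log10(255^2 / MSE). The sketch below assumes that definition and operates on flat lists of sample values; the paper does not state which components were measured or how frames were averaged, so the function `psnr` is an illustration rather than the authors' measurement procedure.

```python
import math


def psnr(reference, decoded, peak=255):
    """Peak signal-to-noise ratio in dB between two equally sized sample lists.

    Assumes the conventional definition 10 * log10(peak^2 / MSE), with
    peak = 255 for 8-bit samples. Returns infinity for identical inputs.
    """
    if not reference or len(reference) != len(decoded):
        raise ValueError("inputs must be non-empty and of equal length")

    # Mean squared error between original and decoded samples.
    mse = sum((r - d) ** 2 for r, d in zip(reference, decoded)) / len(reference)
    if mse == 0:
        return math.inf
    return 10 * math.log10(peak ** 2 / mse)


# An error of one grey level on every sample gives MSE = 1.
example_psnr = psnr([10, 20, 30, 40], [11, 21, 29, 41])
```

Because the measure is logarithmic in the mean squared error, a drop of about 0.5 dB such as the one in Table 1 corresponds to an increase in MSE of roughly 12%.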
However, these errors are not subjectively very annoying, since they last a very short time and the image shown is carefully concealed with previously available information (this means that no very strange data is typically shown to the user). It can also be seen that the PSNR recovers very quickly from errors, which is mainly due to the AIR intra coding refreshing technique.

Figure 4 - PSNR evolution for the “Overtime” sequence - 20 kbps (PSNR versus frame number, frames 0 to 1350; curves: error free and critical error condition)

Figure 5 - PSNR evolution for the “Artbit” sequence - 376 kbps (PSNR versus frame number, frames 0 to 2250; curves: error free and critical error condition)

Although PSNR figures have been used here to give an idea of the quality performance of MPEG-4 error resilient coding, it is acknowledged that this parameter does not faithfully reflect subjective quality. For this reason, the MPEG-4 verification tests will be done using a new subjective testing method, called Double Stimulus Continuous Quality Evaluation (DSCQE), specially designed to measure error resilience performance.

The results presented in this paper show that the emerging MPEG-4 standard is able to provide video with acceptable quality over mobile networks. Moreover, MPEG-4 supports new content-based functionalities which may significantly increase the range of applications to be provided in mobile environments (under evaluation by the authors). In conclusion, MPEG-4 can be the coding solution for mobile multimedia, and it should not take very long before the first products are available in the market.

ACKNOWLEDGEMENTS

The authors acknowledge the support of the European Commission under the ACTS project MoMuSys. L. Ducla Soares acknowledges the support of PRAXIS XXI through his Ph.D. scholarship.
REFERENCES

[1] R. Koenen, F. Pereira, L. Chiariglione, “MPEG-4: Context and Objectives”, Image Communication Journal: MPEG-4 Special Issue, vol. 9, no. 4, May 1997
[2] MPEG Video & SNHC, “Coding of Audio-visual Objects: Visual - ISO/IEC 14496-2”, Doc. ISO/IEC JTC1/SC29/WG11 N2202, MPEG Tokyo Meeting, March 1998, http://drogo.cselt.it/mpeg/public
[3] Test and Video Subgroup, “Work Plan for Formal Verification Tests on Video Error Resilience”, Doc. ISO/IEC JTC1/SC29/WG11 N2061, San Jose MPEG Meeting, February 1998
[4] ITU-T Draft Recommendation H.223/Annex B - Multiplex protocol for low bit rate mobile multimedia communication over moderate error-prone channels
[5] MPEG Systems, “Coding of Audio-visual Objects: Systems - ISO/IEC 14496-1”, Doc. ISO/IEC JTC1/SC29/WG11 N2201, MPEG Tokyo Meeting, March 1998, http://drogo.cselt.it/mpeg/public