WO2008148930A1 - System and method for storing multiparty video conferencing presentations - Google Patents

System and method for storing multiparty video conferencing presentations

Info

Publication number
WO2008148930A1
Authority
WO
WIPO (PCT)
Prior art keywords
track
file
presentation
multiparty
split
Prior art date
Application number
PCT/FI2008/000061
Other languages
French (fr)
Inventor
Ye-Kui Wang
Miska Hannuksela
Original Assignee
Nokia Corporation
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nokia Corporation filed Critical Nokia Corporation
Publication of WO2008148930A1 publication Critical patent/WO2008148930A1/en

Links

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L12/00Data switching networks
    • H04L12/02Details
    • H04L12/16Arrangements for providing special services to substations
    • H04L12/18Arrangements for providing special services to substations for broadcast or conference, e.g. multicast
    • H04L12/1813Arrangements for providing special services to substations for broadcast or conference, e.g. multicast for computer conferences, e.g. chat rooms
    • H04L12/1831Tracking arrangements for later retrieval, e.g. recording contents, participants activities or behavior, network status
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00Television systems
    • H04N7/14Systems for two-way working
    • H04N7/141Systems for two-way working between two video terminals, e.g. videophone
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04MTELEPHONIC COMMUNICATION
    • H04M2201/00Electronic components, circuits, software, systems or apparatus used in telephone systems
    • H04M2201/50Telephonic communication in combination with video communication
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04MTELEPHONIC COMMUNICATION
    • H04M3/00Automatic or semi-automatic exchanges
    • H04M3/42Systems providing special services or facilities to subscribers
    • H04M3/56Arrangements for connecting several subscribers to a common circuit, i.e. affording conference facilities
    • H04M3/567Multimedia conference systems
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00Television systems
    • H04N7/14Systems for two-way working
    • H04N7/15Conference systems
    • H04N7/155Conference systems involving storage of or access to video conference sessions

Definitions

  • the present invention relates generally to video conferencing presentations and video call presentations. More particularly, the present invention relates to the storage of video conferencing presentations and video call presentations in files for local playback or transmission.
  • Background of the Invention This section is intended to provide a background or context to the invention that is recited in the claims.
  • the description herein may include concepts that could be pursued, but are not necessarily ones that have been previously conceived or pursued. Therefore, unless otherwise indicated herein, what is described in this section is not prior art to the description and claims in this application and is not admitted to be prior art by inclusion in this section.
  • the file format is an important element in the chain of multimedia content production, manipulation, transmission and consumption. There is a difference between the coding format and the file format.
  • the coding format relates to the action of a specific coding algorithm that codes the content information into a bitstream.
  • the file format comprises a mechanism for organizing the generated bitstream in such way that it can be accessed for local decoding and playback, transferred as a file, or streamed, all utilizing a variety of storage and transport architectures.
  • the file format can be used to facilitate the interchange and editing of the media. For example, many streaming applications require a pre-encoded bitstream on a server to be accompanied by metadata (stored in "hint-tracks") that assists the server in streaming the video to the client. Examples of hint-track metadata include timing information, indications of synchronization points, and packetization hints. This information is used to reduce the operational load of the server and to maximize the end-user experience.
  • Available media file format standards include the International Organization for Standardization (ISO) base media file format (ISO/International Electrotechnical Commission (IEC) 14496-12) (also referred to as the ISO file format in short), the Moving Picture Experts Group (MPEG)-4 file format (ISO/IEC 14496-14), the advanced video coding (AVC) file format (ISO/IEC 14496-15) and the 3rd Generation Partnership Project (3GPP) file format (3GPP TS 26.244). Efforts are also underway in MPEG for the development of the scalable video coding (SVC) file format, which is expected to become an amendment to the AVC file format.
  • The ISO file format is the basis for derivation of all the above-identified file formats (excluding the ISO file format itself).
  • each file contains exactly one movie box.
  • the movie box may contain one or more tracks, and each track resides in one track box.
  • For the presentation of one media type, typically one track is selected. It is possible for there to be more than one track storing information of a certain media type.
  • a subset of these tracks may form an alternate track group, wherein each track is independently decodable and can be selected for playback.
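The box hierarchy just described (a movie box containing one or more track boxes) can be illustrated with a minimal parser sketch. This is not part of the patent: it handles only the basic 32-bit box size and flat iteration, and ignores 64-bit sizes and deeper nesting.

```python
import struct

def iter_boxes(data, offset=0, end=None):
    """Yield (box_type, payload) for consecutive ISO base media file format boxes."""
    end = len(data) if end is None else end
    while offset + 8 <= end:
        size, box_type = struct.unpack(">I4s", data[offset:offset + 8])
        if size < 8:  # size values 0 and 1 (to-end / 64-bit) omitted in this sketch
            break
        yield box_type.decode("ascii"), data[offset + 8:offset + size]
        offset += size

def list_track_boxes(moov_payload):
    """Return the payloads of all 'trak' boxes inside a movie box payload."""
    return [payload for box_type, payload in iter_boxes(moov_payload) if box_type == "trak"]
```

A movie box payload containing two 'trak' boxes would thus yield two entries, one per track.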
  • In multiparty conferencing, receivers typically display videos from a selected subset of participants in split-screen windows; an arrangement of the display of decoded video is illustrated in Figure 4.
  • A multipoint control unit (MCU) may transcode the incoming video streams of the selected subset of participants into one video stream, which contains all the video content from the selected subset of participants.
  • Alternatively, the MCU can simply forward the incoming video streams of the selected subset of participants to the receivers, after which each video stream is decoded individually.
  • Receivers may want to store multiparty conferencing presentations for future use.
  • However, the current file format designs do not support the storage of presentations of multiparty video conferences if the MCU forwards streams to participants.
  • A receiver may store the video streams to be displayed in separate video tracks according to existing file format designs, e.g., the ISO base media file format.
  • However, in that case, a player that takes the file as input has no way of knowing which video tracks should be decoded and how to display the respective video tracks.
  • FIG. 1 is a representation of a generic multimedia communications system for use with various embodiments of the present invention
  • Figure 2 is a perspective view of an electronic device that can be used in conjunction with the implementation of various embodiments of the present invention
  • Figure 3 is a schematic representation of the circuitry which may be included in the electronic device of figure 1.
  • Figure 4 is a schematic representation of an arrangement of multi-picture display.
  • Figure 5 is a flow-diagram of generating a media container file according to one embodiment of the file format design.
  • Figure 6 is a flow-diagram of generating a media container file according to another embodiment of the file format design.
  • Figure 7 is a flow-diagram of generating a media container file according to yet another embodiment of the file format design.
  • Various embodiments provide a file format design that supports the storage of multiparty video conferencing presentations. This support is enabled via the inclusion of indications of which tracks belong to a multiparty conference presentation, as well as indications of how to display the decoded video streams in a split-screen. With this arrangement, a player is capable of playing back a recorded multiparty video conferencing presentation in exactly the same manner as it was presented during the actual conference.
  • the file format design also supports the storage of other types of presentations that require the use of simultaneous, multiple independently-decodable video tracks.
  • Various embodiments involve providing indications of which tracks belong to a multiparty conference presentation.
  • Fig. 5 is a flow-diagram of a method of generating a media container file according to the present embodiment of the file format design.
  • The container file format comprises indications of which tracks belong to a multiparty conference presentation.
  • a new track reference of type 'mpcp' is defined.
  • Any video track that belongs to a multiparty conference presentation contains a TrackReferenceTypeBox of type 'mpcp' (i.e., with reference type equal to 'mpcp').
  • The track ID of each other track belonging to the same multiparty conference presentation is equal to one of the track IDs present in the TrackReferenceTypeBox of type 'mpcp'.
  • With this embodiment, a file reader can obtain the information regarding which tracks belong to a multiparty conference presentation by checking all of the tracks. In the event that more than one track containing a TrackReferenceTypeBox of type 'mpcp' forms an alternate track group, only one of them is selected for playback.
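As an illustration of this embodiment, a TrackReferenceTypeBox of type 'mpcp' could be serialized as below. This is a sketch following the standard ISO box layout (32-bit size, four-character type, then an array of 32-bit track IDs, wrapped in a 'tref' container box); the helper function names are our own, not from the patent.

```python
import struct

def track_reference_type_box(ref_type, track_ids):
    """Serialize a TrackReferenceTypeBox: size, 4-char type, then 32-bit track IDs."""
    payload = b"".join(struct.pack(">I", tid) for tid in track_ids)
    return struct.pack(">I4s", 8 + len(payload), ref_type) + payload

def tref_box(reference_boxes):
    """Wrap one or more reference-type boxes in a TrackReferenceBox ('tref')."""
    body = b"".join(reference_boxes)
    return struct.pack(">I4s", 8 + len(body), b"tref") + body

# A track in a three-party presentation referencing the two other tracks:
mpcp = track_reference_type_box(b"mpcp", [2, 3])
tref = tref_box([mpcp])
```

A reader scanning each track's 'tref' box for an 'mpcp' entry can then collect the referenced track IDs to reconstruct the presentation group.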
  • a multiparty_presentation value equal to 1 specifies that the presentation stored in this file is a multiparty presentation that requires more than one video track to be simultaneously decoded and displayed.
  • Figure 6 is a flow-diagram of a method of generating a media container file according to the present embodiment of the file format design.
  • The container file format comprises indications of which tracks are to be simultaneously displayed.
  • a new box is defined and contained in the movie box for the file.
  • The MovieHeaderBox is also changed as follows, such that one of the reserved bits is used to indicate whether the presentation contained in the file is a multiparty conference presentation.
  • This new box, referred to as the track relation box, is defined as follows:
  • version is an integer that specifies the version of this box (equal to 0 in this instance)
  • flags is a 24-bit integer with flags.
  • bit 0 is the least significant bit
  • bit 1 is the second least significant bit
  • When bit 0 is equal to 1, this indicates that information of multiparty presentation track groups is present in this box. When bit 0 is equal to 0, this indicates that information of multiparty presentation track groups is not present in this box.
  • number_multiparty_presentation_groups indicates the number of multiparty presentation track groups that are signaled.
  • multiparty_presentation_group_id indicates the identifier of the multiparty presentation track group that is signaled.
  • number_tracks_in_group indicates the number of tracks in the multiparty presentation track group that is signaled.
  • multiparty_presentation_track_id indicates the track id of the track in the multiparty presentation track group that is signaled.
  • number_switch_groups indicates the number of switching track groups that are signaled.
  • switch_group_id indicates the identifier of the i-th switching track group that is signaled. The value does not equal 0. For any track associated with a switch_group_id, if a track selection box is present, then switch_group is equal to switch_group_id. For any track having a track selection box present, if alternate_group is not equal to 0, then the track is associated with a switch_group_id.
  • number_tracks_in_switch_group indicates the number of tracks in the i-th switch track group that is signaled.
  • switch_track_id indicates the track ID of the j-th track in the i-th switch track group that is signaled.
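The multiparty-presentation fields listed above could be serialized as in the following sketch. The 'trel' four-character code, the FullBox header layout, and the 32-bit widths for the counts and identifiers are assumptions made for illustration; the text names the track relation box and its fields but does not fix these encoding details here.

```python
import struct

def track_relation_box(presentation_groups):
    """Sketch of a track relation box carrying multiparty presentation track
    groups. presentation_groups is a list of (group_id, [track_ids]) tuples.
    The 'trel' code and field widths are illustrative assumptions."""
    flags = 0x000001  # bit 0 set: multiparty presentation group info present
    body = struct.pack(">B3s", 0, flags.to_bytes(3, "big"))   # version, flags
    body += struct.pack(">I", len(presentation_groups))       # number_multiparty_presentation_groups
    for group_id, track_ids in presentation_groups:
        body += struct.pack(">I", group_id)                   # multiparty_presentation_group_id
        body += struct.pack(">I", len(track_ids))             # number_tracks_in_group
        for tid in track_ids:
            body += struct.pack(">I", tid)                    # multiparty_presentation_track_id
    return struct.pack(">I4s", 8 + len(body), b"trel") + body
```

One group containing tracks 1, 2, and 3 would serialize to a 36-byte box under these assumptions.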
  • This further embodiment supports the signaling of multiple groups of tracks, where each group forms a multiparty presentation. This is useful when there exists some alternate groups, but no track of one alternate group is appropriate for playing at the same time with any track of another alternate group, for example.
  • Figure 4 illustrates an arrangement for the display of decoded video streams of a multiparty presentation in a presentation window 400.
  • 401, 402, 403 and 404 each represent decoded video streams from separate multiparty presentation tracks and are characterized by their position in window 400.
  • Figure 7 is a flow-diagram of a method of generating a media container file according to the present embodiment of the file format design.
  • The container file format comprises indications as to how the decoded video streams should be displayed in a window.
  • the container file comprises indications of the positions within a window where a track is to be displayed.
  • A new sample grouping of type 'sswp' is defined to specify the split-screen window position where each sample of a track should be displayed.
  • Each track that belongs to a multiparty presentation includes a SampleToGroupBox with grouping_type equal to 'sswp' and a SampleGroupDescriptionBox with grouping_type equal to 'sswp'.
  • The SampleToGroupBox maps each sample to a split-screen window position sample group, and each sample group typically contains multiple samples.
  • An SswpSampleGroupEntry as defined below is included in the SampleGroupDescriptionBox to document the position in the split-screen window where each sample of the corresponding sample group should be displayed:

        aligned(8) class SswpSampleGroupEntry() extends VisualSampleGroupEntry('sswp') {
            unsigned int(8) sswp_x;
            unsigned int(8) sswp_y;
        }
  • sswp_x specifies the horizontal coordinate of the split-screen window where samples of the corresponding sample group should be displayed.
  • The top-left split-screen window has sswp_x equal to 0.
  • sswp_y specifies the vertical coordinate of the split-screen window where samples of the corresponding sample group should be displayed.
  • The top-left split-screen window has sswp_y equal to 0.
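Assuming the sswp_x/sswp_y values index equally sized sub-windows of a grid (the grid dimensions themselves are an assumption of this sketch, not signaled by the sample group), a player could map them to pixel rectangles as follows:

```python
def split_screen_rect(sswp_x, sswp_y, cols, rows, window_w, window_h):
    """Map split-screen window coordinates to an (x, y, w, h) pixel rectangle,
    assuming the presentation window is divided into an equal cols x rows grid."""
    cell_w, cell_h = window_w // cols, window_h // rows
    return (sswp_x * cell_w, sswp_y * cell_h, cell_w, cell_h)

# 2x2 split screen in a 640x480 window; top-left has sswp_x == sswp_y == 0:
assert split_screen_rect(0, 0, 2, 2, 640, 480) == (0, 0, 320, 240)
assert split_screen_rect(1, 1, 2, 2, 640, 480) == (320, 240, 320, 240)
```

With this convention, the four streams of Figure 4 occupy the four quadrants of window 400.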
  • a new box is included in each track to signal the coordinates in the split-screen window for each segment of decoding time or composition (i.e., display) time.
  • If, during a certain period of time, the decoded video of only one participant is displayed, the MCU does not transmit the videos of the other participants. Consequently, for that time period, the invisible tracks would have edit lists, and the player can know based upon these lists which track it should display, preferably scaled to fill the entire screen in one embodiment.
  • Figure 1 is a graphical representation of a generic multimedia communication system within which various embodiments of the present invention may be implemented. As shown in figure 1, a data source 100 provides a source signal in an analog, uncompressed digital, or compressed digital format, or any combination of these formats.
  • An encoder 110 encodes the source signal into a coded media bitstream. It should be noted that a bitstream to be decoded can be received directly or indirectly from a remote device located within virtually any type of network. Additionally, the bitstream can be received from local hardware or software.
  • the encoder 110 may be capable of encoding more than one media type, such as audio and video, or more than one encoder 110 may be required to code different media types of the source signal.
  • The encoder 110 may also receive synthetically produced input, such as graphics and text, or it may be capable of producing coded bitstreams of synthetic media. In the following, only processing of one coded media bitstream of one media type is considered to simplify the description.
  • typically real-time broadcast services comprise several streams (typically at least one audio, video and text sub-titling stream).
  • the system may include many encoders, but in figure 1 only one encoder 110 is represented to simplify the description without a lack of generality.
  • While the text and examples contained herein may specifically describe an encoding process, one skilled in the art would understand that the same concepts and principles also apply to the corresponding decoding process and vice versa.
  • the coded media bitstream is transferred to a storage 120.
  • the storage 120 may comprise any type of mass memory to store the coded media bitstream.
  • The format of the coded media bitstream in the storage 120 may be an elementary self-contained bitstream format, or one or more coded media bitstreams may be encapsulated into a container file. Some systems operate "live", i.e., omit storage and transfer the coded media bitstream from the encoder 110 directly to the sender 130.
  • the coded media bitstream is then transferred to the sender 130, also referred to as the server, on a need basis.
  • the format used in the transmission may be an elementary self-contained bitstream format, a packet stream format, or one or more coded media bitstreams may be encapsulated into a container file.
  • the encoder 110, the storage 120, and the server 130 may reside in the same physical device or they may be included in separate devices.
  • the encoder 110 and server 130 may operate with live real-time content, in which case the coded media bitstream is typically not stored permanently, but rather buffered for small periods of time in the content encoder 110 and/or in the server 130 to smooth out variations in processing delay, transfer delay, and coded media bitrate.
  • the server 130 sends the coded media bitstream using a communication protocol stack.
  • The stack may include but is not limited to real-time transport protocol (RTP), user datagram protocol (UDP), and internet protocol (IP).
  • The server 130 encapsulates the coded media bitstream into packets.
  • When RTP is used, the server 130 encapsulates the coded media bitstream into RTP packets according to an RTP payload format. Typically, each media type has a dedicated RTP payload format.
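As a rough illustration of this encapsulation step, the fixed 12-byte RTP header (RFC 3550, version 2, no CSRCs or extensions) can be packed as follows; the payload-format-specific packetization that would normally precede this is omitted:

```python
import struct

def rtp_packet(payload, seq, timestamp, ssrc, payload_type, marker=False):
    """Build a minimal RTP packet: the fixed 12-byte header followed by the
    coded media payload. Version=2, no padding, no extension, zero CSRCs."""
    byte0 = 2 << 6                                    # V=2, P=0, X=0, CC=0
    byte1 = (int(marker) << 7) | (payload_type & 0x7F)
    header = struct.pack(">BBHII", byte0, byte1,
                         seq & 0xFFFF, timestamp & 0xFFFFFFFF, ssrc & 0xFFFFFFFF)
    return header + payload
```

A sender would increment the sequence number per packet and advance the timestamp according to the media clock rate (e.g., 90 kHz for video).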
  • a system may contain more than one server 130, but for the sake of simplicity, the following description only considers one server 130.
  • the server 130 may or may not be connected to a gateway 140 through a communication network.
  • The gateway 140 may perform different types of functions, such as translation of a packet stream according to one communication protocol stack to another communication protocol stack, merging and forking of data streams, and manipulation of data streams according to the downlink and/or receiver capabilities, such as controlling the bit rate of the forwarded stream according to prevailing downlink network conditions.
  • Examples of gateways 140 include MCUs, gateways between circuit-switched and packet-switched video telephony, push-to-talk over cellular (PoC) servers, IP encapsulators in digital video broadcasting-handheld (DVB-H) systems, or set-top boxes that forward broadcast transmissions locally to home wireless networks.
  • The gateway 140 is called an RTP mixer or an RTP translator and typically acts as an endpoint of an RTP connection.
  • the system includes one or more receivers 150, typically capable of receiving, demodulating, and de-capsulating the transmitted signal into a coded media bitstream.
  • the coded media bitstream is transferred to a recording storage 155.
  • the recording storage 155 may comprise any type of mass memory to store the coded media bitstream.
  • the recording storage 155 may alternatively or additively comprise computation memory, such as random access memory.
  • the format of the coded media bitstream in the recording storage 155 may be an elementary self-contained bitstream format, or one or more coded media bitstreams may be encapsulated into a container file.
  • a container file is typically used and the receiver 150 comprises or is attached to a container file generator producing a container file from input streams.
  • Some systems operate "live", i.e., omit the recording storage 155 and transfer the coded media bitstream from the receiver 150 directly to the decoder 160.
  • The most recent part of the recorded stream, e.g., the most recent 10-minute excerpt of the recorded stream, is maintained in the recording storage 155, while any earlier recorded data is discarded from the recording storage 155.
  • the coded media bitstream is transferred from the recording storage 155 to the decoder 160. If there are many coded media bitstreams, such as an audio stream and a video stream, associated with each other and encapsulated into a container file, a file parser (not shown in the figure) is used to decapsulate each coded media bitstream from the container file.
  • the recording storage 155 or a decoder 160 may comprise the file parser, or the file parser is attached to either recording storage 155 or the decoder 160.
  • the coded media bitstream is typically processed further by a decoder 160, whose output is one or more uncompressed media streams.
  • a renderer 170 may reproduce the uncompressed media streams with a loudspeaker or a display, for example.
  • the receiver 150, recording storage 155, decoder 160, and renderer 170 may reside in the same physical device or they may be included in separate devices.
  • Communication devices of the present invention may communicate using various transmission technologies including, but not limited to, code division multiple access (CDMA), global system for mobile communications (GSM), universal mobile telecommunications system (UMTS), time division multiple access (TDMA), frequency division multiple access (FDMA), transmission control protocol/internet protocol (TCP/IP), short messaging service (SMS), multimedia messaging service (MMS), e-mail, instant messaging service (IMS), Bluetooth, IEEE 802.11, etc.
  • a communication device may communicate using various media including, but not limited to, radio, infrared, laser, cable connection, and the like.
  • FIGS 2 and 3 show one representative mobile device 12 within which the present invention may be implemented. It should be understood, however, that the present invention is not intended to be limited to one particular type of electronic device.
  • The mobile device 12 of figures 2 and 3 includes a housing 30, a display 32 in the form of a liquid crystal display, a keypad 34, a microphone 36, an ear-piece 38, a battery 40, an infrared port 42, an antenna 44, a smart card 46 in the form of a UICC according to one embodiment of the invention, a card reader 48, radio interface circuitry 52, codec circuitry 54, a controller 56 and a memory 58.
  • Individual circuits and elements are all of a type well known in the art, for example in the Nokia range of mobile telephones.

Abstract

A format design that supports the storage of multiparty video conferencing presentations and other types of presentations that require the use of simultaneous, multiple independently-decodable video tracks. This support is enabled via the inclusion of indications of which tracks belong to a multiparty conference presentation, as well as indications of how to display the decoded video streams in a split-screen. With this arrangement, a player is capable of playing back a recorded multiparty video conferencing presentation in exactly the same manner as it was presented during the actual conference.

Description

SYSTEM AND METHOD FOR STORING MULTIPARTY VIDEO CONFERENCING PRESENTATIONS
Field of the Invention The present invention relates generally to video conferencing presentations and video call presentations. More particularly, the present invention relates to the storage of video conferencing presentations and video call presentations in files for local playback or transmission. Background of the Invention This section is intended to provide a background or context to the invention that is recited in the claims. The description herein may include concepts that could be pursued, but are not necessarily ones that have been previously conceived or pursued. Therefore, unless otherwise indicated herein, what is described in this section is not prior art to the description and claims in this application and is not admitted to be prior art by inclusion in this section.
The file format is an important element in the chain of multimedia content production, manipulation, transmission and consumption. There is a difference between the coding format and the file format. The coding format relates to the action of a specific coding algorithm that codes the content information into a bitstream. In contrast, the file format comprises a mechanism for organizing the generated bitstream in such way that it can be accessed for local decoding and playback, transferred as a file, or streamed, all utilizing a variety of storage and transport architectures. Additionally, the file format can be used to facilitate the interchange and editing of the media. For example, many streaming applications require a pre-encoded bitstream on a server to be accompanied by metadata (stored in "hint-tracks") that assists the server in streaming the video to the client. Examples of hint-track metadata include timing information, indications of synchronization points, and packetization hints. This information is used to reduce the operational load of the server and to maximize the end-user experience.
Available media file format standards include the International Organization for Standardization (ISO) base media file format (ISO/International Electrotechnical Commission (IEC) 14496-12) (also referred to as the ISO file format in short), the Moving Picture Experts Group (MPEG)-4 file format (ISO/IEC 14496-14), the advanced video coding (AVC) file format (ISO/IEC 14496-15) and the 3rd Generation Partnership Project (3GPP) file format (3GPP TS 26.244). Efforts are also underway in MPEG for the development of the scalable video coding (SVC) file format, which is expected to become an amendment to the AVC file format. The ISO file format is the basis for derivation of all the above-identified file formats (excluding the ISO file format itself). These file formats (including the ISO file format itself) are referred to as the ISO family of file formats. According to the ISO family of file formats, each file contains exactly one movie box. The movie box may contain one or more tracks, and each track resides in one track box. For the presentation of one media type, typically one track is selected. It is possible for there to be more than one track storing information of a certain media type. A subset of these tracks may form an alternate track group, wherein each track is independently decodable and can be selected for playback.
In multiparty conferencing, receivers typically display videos from a selected subset of participants in split-screen windows; an arrangement of the display of decoded video is illustrated in Figure 4. A multipoint control unit (MCU) may transcode the incoming video streams of the selected subset of participants into one video stream, which contains all the video content from the selected subset of participants. Alternatively, the MCU can simply forward the incoming video streams of the selected subset of participants to the receivers, after which each video stream is decoded individually.
Receivers may want to store multiparty conferencing presentations for future use. However, the current file format designs do not support the storage of presentations of multiparty video conferences if the MCU forwards streams to participants. A receiver may store the video streams to be displayed in separate video tracks according to existing file format designs, e.g., the ISO base media file format. However, in that case, a player that takes the file as input has no way of knowing which video tracks should be decoded and how to display the respective video tracks. Summary of the Invention
Various embodiments provide a file format design that supports the storage of multiparty video conferencing presentations. This support is enabled via the inclusion of indications of which tracks belong to a multiparty conference presentation, as well as indications of how to display the decoded video streams in a split-screen. With this arrangement, a player is capable of playing back a recorded multiparty video conferencing presentation in exactly the same manner as it was presented during the actual conference. The file format design also supports the storage of other types of presentations that require the use of simultaneous, multiple independently-decodable video tracks. These and other advantages and features of the invention, together with the organization and manner of operation thereof, will become apparent from the following detailed description when taken in conjunction with the accompanying drawings, wherein like elements have like numerals throughout the several drawings described below. Brief Description of the Drawings Figure 1 is a representation of a generic multimedia communications system for use with various embodiments of the present invention;
Figure 2 is a perspective view of an electronic device that can be used in conjunction with the implementation of various embodiments of the present invention; and Figure 3 is a schematic representation of the circuitry which may be included in the electronic device of figure 1.
Figure 4 is a schematic representation of an arrangement of multi-picture display. Figure 5 is a flow-diagram of generating a media container file according to one embodiment of the file format design. Figure 6 is a flow-diagram of generating a media container file according to another embodiment of the file format design.
Figure 7 is a flow-diagram of generating a media container file according to yet another embodiment of the file format design. Detailed Description of Various Embodiments Various embodiments provide a file format design that supports the storage of multiparty video conferencing presentations. This support is enabled via the inclusion of indications of which tracks belong to a multiparty conference presentation, as well as indications of how to display the decoded video streams in a split-screen. With this arrangement, a player is capable of playing back a recorded multiparty video conferencing presentation in exactly the same manner as it was presented during the actual conference. The file format design also supports the storage of other types of presentations that require the use of simultaneous, multiple independently-decodable video tracks. Various embodiments involve providing indications of which tracks belong to a multiparty conference presentation. Fig. 5 is a flow-diagram of a method of generating a media container file according to the present embodiment of the file format design. The container file format comprises indications of which tracks belong to a multiparty conference presentation. In one embodiment, a new track reference of type 'mpcp' is defined. According to this embodiment, any video track that belongs to a multiparty conference presentation contains a TrackReferenceTypeBox of type 'mpcp' (i.e., with reference type equal to 'mpcp'). The track ID of each other track belonging to the same multiparty conference presentation is equal to one of the track IDs present in the TrackReferenceTypeBox of type 'mpcp'. With this embodiment, a file reader can obtain the information regarding which tracks belong to a multiparty conference presentation by checking all of the tracks. In the event that more than one track containing a TrackReferenceTypeBox of type 'mpcp' forms an alternate track group, only one of them is selected for playback.
In another embodiment, the MovieHeaderBox is changed as follows, such that one of the reserved bits is used to indicate whether the presentation contained in the file is a multiparty conference presentation:

aligned(8) class MovieHeaderBox extends FullBox('mvhd', version, 0) {
   if (version==1) {
      unsigned int(64) creation_time;
      unsigned int(64) modification_time;
      unsigned int(32) timescale;
      unsigned int(64) duration;
   } else { // version==0
      unsigned int(32) creation_time;
      unsigned int(32) modification_time;
      unsigned int(32) timescale;
      unsigned int(32) duration;
   }
   template int(32) rate = 0x00010000;  // typically 1.0
   template int(16) volume = 0x0100;    // typically, full volume
   bit(1) multiparty_presentation;
   const bit(15) reserved = 0;
   const unsigned int(32)[2] reserved = 0;
   template int(32)[9] matrix = { 0x00010000,0,0,0,0x00010000,0,0,0,0x40000000 };
      // unity matrix
   bit(32)[6] pre_defined = 0;
   unsigned int(32) next_track_ID;
}

A multiparty_presentation value equal to 1 specifies that the presentation stored in this file is a multiparty presentation that requires more than one video track to be simultaneously decoded and displayed. Figure 6 is a flow diagram of a method of generating a media container file according to the present embodiment of the file format design. The container file format comprises indications of which tracks are to be simultaneously displayed. With this embodiment, when multiparty_presentation is equal to 1, it is known that all of the tracks belong to the multiparty presentation. In the event that a number of tracks form an alternate track group, then only one of them is selected for playback. This embodiment can be applied in combination with the previous embodiment discussed above, such that when multiparty_presentation is equal to 0, the first embodiment applies.
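The bit allocation above places multiparty_presentation as the most significant bit of the two bytes that follow the 'volume' field of 'mvhd'. A minimal sketch of reading that flag (an assumption for illustration, not a normative parser) could be:

```python
import struct

# Sketch: extracting the proposed multiparty_presentation flag from
# the 2 bytes holding bit(1) multiparty_presentation followed by
# const bit(15) reserved, as in the modified MovieHeaderBox above.

def parse_multiparty_flag(reserved_bytes):
    (word,) = struct.unpack('>H', reserved_bytes)  # big-endian 16-bit field
    return (word >> 15) & 1                        # most significant bit

print(parse_multiparty_flag(b'\x80\x00'))  # → 1 (multiparty presentation)
print(parse_multiparty_flag(b'\x00\x00'))  # → 0
```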
In yet another embodiment, a new box is defined and contained in the Movie Box for the file. The MovieHeaderBox is also changed as described above, such that one of the reserved bits is used to indicate whether the presentation contained in the file is a multiparty conference presentation. This new box, referred to as the Track Relation Box, is defined as follows:
Box type: 'trel'
Container: Movie Box ('moov')
Mandatory: No
Quantity: Zero or one

This box specifies the relationship between tracks. Syntax for use in implementing this embodiment is as follows, for example:

aligned(8) class TrackRelationBox extends FullBox('trel', version = 0, flags) {
   int i, j;
   if ((flags & 0x000001) == 1) {
      unsigned int(16) num_multiparty_presentation_groups;
      for (i=0; i<num_multiparty_presentation_groups; i++) {
         unsigned int(16) multiparty_presentation_group_id;
         unsigned int(16) num_tracks_in_group;
         for (j=0; j<num_tracks_in_group; j++)
            unsigned int(32) multiparty_presentation_track_id;
      }
   }
}
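The payload layout of the syntax above (after the FullBox header) can be parsed as sketched below. This is an illustrative reading of the proposed 'trel' box, assuming big-endian fields and that only flags bit 0 is in use:

```python
import struct

# Sketch: parsing the proposed TrackRelationBox payload into a map of
# multiparty_presentation_group_id -> list of track_ids.

def parse_trel_payload(data, flags):
    groups = {}
    offset = 0
    if flags & 0x000001:
        (num_groups,) = struct.unpack_from('>H', data, offset)
        offset += 2
        for _ in range(num_groups):
            group_id, num_tracks = struct.unpack_from('>HH', data, offset)
            offset += 4
            track_ids = list(struct.unpack_from('>%dI' % num_tracks, data, offset))
            offset += 4 * num_tracks
            groups[group_id] = track_ids
    return groups

# One group (id 1) containing tracks 1 and 2:
payload = struct.pack('>HHH2I', 1, 1, 2, 1, 2)
print(parse_trel_payload(payload, 0x000001))  # → {1: [1, 2]}
```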
Relevant semantics for the syntax delineated above are as follows. "version" is an integer that specifies the version of this box (equal to 0 in this instance). "flags" is a 24-bit integer with flags. The following bits are defined, where bit 0 is the least significant bit, bit 1 is the second least significant bit, and so on. When bit 0 is equal to 1, this indicates that information of multiparty presentation track groups is present in this box. When bit 0 is equal to 0, this indicates that information of multiparty presentation track groups is not present in this box.
"num_multiparty_presentation_groups" indicates the number of multiparty presentation track groups that are signaled. "multiparty_presentation_group_id" indicates the identifier of the multiparty presentation track group that is signaled. "num_tracks_in_group" indicates the number of tracks in the multiparty presentation track group that is signaled.
"multiparty_presentation_track_id" indicates the track_id of the track in the multiparty presentation track group that is signaled. "num_switch_groups" indicates the number of switching track groups that are signaled. "switch_group_id" indicates the identifier of the i-th switching track group that is signaled. The value does not equal 0. For any track associated with a switch_group_id, if a track selection box is present, then switch_group is equal to switch_group_id. For any track having a track selection box present, if alternate_group is not equal to 0, then the track is associated with a switch_group_id. "num_tracks_in_switch_group" indicates the number of tracks in the i-th switch track group that is signaled. "switch_track_id" indicates the track_id of the j-th track in the i-th switch track group that is signaled.
This further embodiment supports the signaling of multiple groups of tracks, where each group forms a multiparty presentation. This is useful when, for example, there exist several alternate groups, but no track of one alternate group is appropriate for playing at the same time as any track of another alternate group.
Various embodiments also involve the providing of indications as to how the decoded video streams should be displayed in a split screen. Figure 4 illustrates an arrangement for display of decoded video streams from a plurality of multiparty presentations in a presentation window 400. For example, 401, 402, 403 and 404 each represent decoded video streams from separate multiparty presentation tracks and are characterized by their position in window 400. Figure 7 is a flow diagram of a method of generating a media container file according to the present embodiment of the file format design. The container file format comprises indications as to how the decoded video streams should be displayed in a window. For example, the container file comprises indications of the positions within a window where a track is to be displayed. In one particular embodiment, a new sample grouping of type 'sswp' is defined to specify the split-screen window position where each sample of a track should be displayed. In this arrangement, each track that belongs to a multiparty presentation includes a SampleToGroupBox with grouping type equal to 'sswp' and a SampleGroupDescriptionBox with grouping type equal to 'sswp'. The SampleToGroupBox maps each sample to a split-screen window position sample group, and each sample group typically contains multiple samples. For each split-screen window position sample group, a SswpSampleGroupEntry as defined below is included in the SampleGroupDescriptionBox to document the position in the split-screen window where each sample of the corresponding sample group should be displayed:

aligned(8) class SswpSampleGroupEntry() extends VisualSampleGroupEntry('sswp') {
   unsigned int(8) sswp_x;
   unsigned int(8) sswp_y;
}

In the above, sswp_x specifies the horizontal coordinate of the split-screen window where samples of the corresponding sample group should be displayed. In this embodiment, the top-left split-screen window has sswp_x equal to 0. sswp_y specifies the vertical coordinate of the split-screen window where samples of the corresponding sample group should be displayed. In this embodiment, the top-left split-screen window has sswp_y equal to 0.
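Resolving a sample's on-screen placement from its 'sswp' coordinates can be sketched as below. The grid-cell layout (equal-sized cells addressed from the top-left) is an assumption made for illustration, matching the four-stream arrangement of Figure 4:

```python
# Sketch: mapping 'sswp' split-screen grid coordinates (top-left cell
# is (0, 0)) to a pixel rectangle (left, top, width, height). The
# cell dimensions are hypothetical player-chosen values.

def window_rect(sswp_x, sswp_y, cell_w, cell_h):
    return (sswp_x * cell_w, sswp_y * cell_h, cell_w, cell_h)

# A 2x2 split screen of 320x240 cells:
for pos in [(0, 0), (1, 0), (0, 1), (1, 1)]:
    print(pos, '->', window_rect(*pos, 320, 240))
# e.g. (1, 1) -> (320, 240, 320, 240)
```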
In another embodiment, a new box is included in each track to signal the coordinates in the split-screen window for each segment of decoding time or composition (i.e., display) time. With the above arrangements, if at some time period the decoded video of only one participant is displayed, then the MCU does not transmit the videos of other participants. Consequently, for that time period, those invisible tracks would have edit lists, and the player can know based upon these lists which track it should display, preferably scaled into the entire screen in one embodiment.

Figure 1 is a graphical representation of a generic multimedia communication system within which various embodiments of the present invention may be implemented. As shown in Figure 1, a data source 100 provides a source signal in an analog, uncompressed digital, or compressed digital format, or any combination of these formats. An encoder 110 encodes the source signal into a coded media bitstream. It should be noted that a bitstream to be decoded can be received directly or indirectly from a remote device located within virtually any type of network. Additionally, the bitstream can be received from local hardware or software. The encoder 110 may be capable of encoding more than one media type, such as audio and video, or more than one encoder 110 may be required to code different media types of the source signal. The encoder 110 may also get synthetically produced input, such as graphics and text, or it may be capable of producing coded bitstreams of synthetic media. In the following, only processing of one coded media bitstream of one media type is considered to simplify the description. It should be noted, however, that typically real-time broadcast services comprise several streams (typically at least one audio, video and text sub-titling stream).
It should also be noted that the system may include many encoders, but in Figure 1 only one encoder 110 is represented to simplify the description without a lack of generality. It should be further understood that, although text and examples contained herein may specifically describe an encoding process, one skilled in the art would understand that the same concepts and principles also apply to the corresponding decoding process and vice versa.
The coded media bitstream is transferred to a storage 120. The storage 120 may comprise any type of mass memory to store the coded media bitstream. The format of the coded media bitstream in the storage 120 may be an elementary self-contained bitstream format, or one or more coded media bitstreams may be encapsulated into a container file. Some systems operate "live", i.e., omit storage and transfer the coded media bitstream from the encoder 110 directly to the sender 130. The coded media bitstream is then transferred to the sender 130, also referred to as the server, on an as-needed basis. The format used in the transmission may be an elementary self-contained bitstream format, a packet stream format, or one or more coded media bitstreams may be encapsulated into a container file. The encoder 110, the storage 120, and the server 130 may reside in the same physical device or they may be included in separate devices. The encoder 110 and server 130 may operate with live real-time content, in which case the coded media bitstream is typically not stored permanently, but rather buffered for small periods of time in the content encoder 110 and/or in the server 130 to smooth out variations in processing delay, transfer delay, and coded media bitrate.
The server 130 sends the coded media bitstream using a communication protocol stack. The stack may include but is not limited to real-time transport protocol (RTP), user datagram protocol (UDP), and internet protocol (IP). When the communication protocol stack is packet-oriented, the server 130 encapsulates the coded media bitstream into packets. For example, when RTP is used, the server 130 encapsulates the coded media bitstream into RTP packets according to an RTP payload format. Typically, each media type has a dedicated RTP payload format. It should be again noted that a system may contain more than one server 130, but for the sake of simplicity, the following description only considers one server 130.
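The packetization step can be illustrated minimally as follows: prepending a fixed RTP header (per RFC 3550, with no CSRC entries or extensions) to one coded payload. The payload-format details per media type are out of scope, and the field values shown are arbitrary examples:

```python
import struct

# Sketch: building one minimal RTP packet as the server 130 might,
# for a single coded media payload. Values are illustrative only.

def rtp_packet(payload, payload_type, seq, timestamp, ssrc, marker=0):
    byte0 = 2 << 6                                  # version 2, no padding/extension, CC=0
    byte1 = (marker << 7) | (payload_type & 0x7F)   # marker bit + payload type
    header = struct.pack('>BBHII', byte0, byte1,
                         seq & 0xFFFF, timestamp & 0xFFFFFFFF, ssrc)
    return header + payload

pkt = rtp_packet(b'\x00\x01\x02', payload_type=96, seq=1, timestamp=90000, ssrc=0x1234)
print(len(pkt))  # → 15 (12-byte header + 3-byte payload)
```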
The server 130 may or may not be connected to a gateway 140 through a communication network. The gateway 140 may perform different types of functions, such as translation of a packet stream according to one communication protocol stack to another communication protocol stack, merging and forking of data streams, and manipulation of data streams according to the downlink and/or receiver capabilities, such as controlling the bit rate of the forwarded stream according to prevailing downlink network conditions. Examples of gateways 140 include MCUs, gateways between circuit-switched and packet-switched video telephony, push-to-talk over cellular (PoC) servers, IP encapsulators in digital video broadcasting-handheld (DVB-H) systems, or set-top boxes that forward broadcast transmissions locally to home wireless networks. When RTP is used, the gateway 140 is called an RTP mixer or an RTP translator and typically acts as an endpoint of an RTP connection.
The system includes one or more receivers 150, typically capable of receiving, demodulating, and de-capsulating the transmitted signal into a coded media bitstream. The coded media bitstream is transferred to a recording storage 155. The recording storage 155 may comprise any type of mass memory to store the coded media bitstream. The recording storage 155 may alternatively or additionally comprise computation memory, such as random access memory. The format of the coded media bitstream in the recording storage 155 may be an elementary self-contained bitstream format, or one or more coded media bitstreams may be encapsulated into a container file. If there are many coded media bitstreams, such as an audio stream and a video stream, associated with each other, a container file is typically used and the receiver 150 comprises or is attached to a container file generator producing a container file from input streams. Some systems operate "live," i.e., omit the recording storage 155 and transfer the coded media bitstream from the receiver 150 directly to the decoder 160. In some systems, only the most recent part of the recorded stream, e.g., the most recent 10-minute excerpt of the recorded stream, is maintained in the recording storage 155, while any earlier recorded data is discarded from the recording storage 155.
The coded media bitstream is transferred from the recording storage 155 to the decoder 160. If there are many coded media bitstreams, such as an audio stream and a video stream, associated with each other and encapsulated into a container file, a file parser (not shown in the figure) is used to decapsulate each coded media bitstream from the container file. The recording storage 155 or a decoder 160 may comprise the file parser, or the file parser is attached to either recording storage 155 or the decoder 160. The coded media bitstream is typically processed further by a decoder 160, whose output is one or more uncompressed media streams. Finally, a renderer 170 may reproduce the uncompressed media streams with a loudspeaker or a display, for example. The receiver 150, recording storage 155, decoder 160, and renderer 170 may reside in the same physical device or they may be included in separate devices.
Communication devices of the present invention may communicate using various transmission technologies including, but not limited to, code division multiple access (CDMA), global system for mobile communications (GSM), universal mobile telecommunications system (UMTS), time division multiple access (TDMA), frequency division multiple access (FDMA), transmission control protocol/internet protocol (TCP/IP), short messaging service (SMS), multimedia messaging service (MMS), e-mail, instant messaging service (IMS), Bluetooth, IEEE 802.11, etc. A communication device may communicate using various media including, but not limited to, radio, infrared, laser, cable connection, and the like.
Figures 2 and 3 show one representative mobile device 12 within which the present invention may be implemented. It should be understood, however, that the present invention is not intended to be limited to one particular type of electronic device. The mobile device 12 of Figures 2 and 3 includes a housing 30, a display 32 in the form of a liquid crystal display, a keypad 34, a microphone 36, an ear-piece 38, a battery 40, an infrared port 42, an antenna 44, a smart card 46 in the form of a UICC according to one embodiment of the invention, a card reader 48, radio interface circuitry 52, codec circuitry 54, a controller 56 and a memory 58. Individual circuits and elements are all of a type well known in the art, for example in the Nokia range of mobile telephones.
The various embodiments described herein are described in the general context of method steps or processes, which may be implemented in one embodiment by a computer program product, embodied in a computer-readable medium, including computer-executable instructions, such as program code, executed by computers in networked environments. Generally, program modules may include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. Computer-executable instructions, associated data structures, and program modules represent examples of program code for executing steps of the methods disclosed herein. The particular sequence of such executable instructions or associated data structures represents examples of corresponding acts for implementing the functions described in such steps or processes. Software and web implementations of various embodiments can be accomplished with standard programming techniques with rule-based logic and other logic to accomplish various database searching steps or processes, correlation steps or processes, comparison steps or processes and decision steps or processes. It should be noted that the words "component" and "module," as used herein and in the following claims, are intended to encompass implementations using one or more lines of software code, and/or hardware implementations, and/or equipment for receiving manual inputs.
The foregoing description of embodiments has been presented for purposes of illustration and description. The foregoing description is not intended to be exhaustive or to limit embodiments of the present invention to the precise form disclosed, and modifications and variations are possible in light of the above teachings or may be acquired from practice of various embodiments of the present invention. The embodiments discussed herein were chosen and described in order to explain the principles and the nature of various embodiments of the present invention and their practical application, to enable one skilled in the art to utilize the present invention in various embodiments and with various modifications as are suited to the particular use contemplated.

Claims

What is claimed is:
1. A method, comprising: storing in a media container file a plurality of tracks; and including in the file at least one first indication regarding which of the plurality of tracks belong to a multiparty conferencing presentation.
2. The method of claim 1, wherein the including of the at least one first indication comprises including a signifier in a movie header box for the file indicating whether a presentation contained in the file is a multiparty conference presentation.
3. The method of claim 1, wherein the including of the at least one first indication comprises including a signifier in a movie box for the file indicating relationships among tracks in the file.
4. The method of claim 1, further comprising including in the file at least one second indication regarding, for each track that belongs to the multiparty conferencing presentation, where the track should be displayed in a split-screen window.
5. The method of claim 4, wherein the including of the at least one second indication comprises, for each track that belongs to the multiparty conferencing presentation, including a signifier with the track, the signifier signaling coordinates within the split-screen window where each sample of the track is to be displayed.
6. The method of claim 5, wherein the coordinates within the split-screen window vary for different samples in a track.
7. A computer program product, embodied in a computer-readable medium, comprising computer code configured to perform the processes of claim 1.
8. An apparatus, comprising: a processor; and a memory unit communicatively connected to the processor and including: computer code for storing in a media container file a plurality of tracks; and computer code for including in the file at least one first indication regarding which of the plurality of tracks belong to a multiparty conferencing presentation.
9. The apparatus of claim 8, wherein the including of the at least one first indication comprises including a signifier in a movie header box for the file indicating whether a presentation contained in the file is a multiparty conference presentation.
10. The apparatus of claim 8, wherein the including of the at least one first indication comprises including a signifier in a movie box for the file indicating relationships among tracks in the file.
11. The apparatus of claim 8, wherein the memory unit further comprises computer code for including in the file at least one second indication regarding, for each track that belongs to the multiparty conferencing presentation, where the track should be displayed in a split-screen window.
12. The apparatus of claim 11, wherein the including of the at least one second indication comprises, for each track that belongs to the multiparty conferencing presentation, including a signifier with the track, the signifier signaling coordinates within the split-screen window where each sample of the track is to be displayed.
13. The apparatus of claim 12, wherein the coordinates within the split-screen window vary for different samples in a track.
14. An apparatus, comprising: means for storing in a media container file a plurality of tracks; and means for including in the file at least one first indication regarding which of the plurality of tracks belong to a multiparty conferencing presentation.
15. The apparatus of claim 14, further comprising means for including in the file at least one second indication regarding, for each track that belongs to the multiparty conferencing presentation, where the track should be displayed in a split-screen window.
16. A method, comprising: identifying at least one of a plurality of tracks within a file as belonging to a multiparty conferencing presentation based upon at least one first indication in the file; and rendering each track identified as belonging to the multiparty conferencing presentation.
17. The method of claim 16, wherein the at least one first indication comprises a signifier in a movie header box for the file indicating that a presentation contained in the file is a multiparty conference presentation.
18. The method of claim 16, wherein the at least one first indication comprises a signifier in a movie box for the file indicating relationships among tracks in the file.
19. The method of claim 16, further comprising: for each track that belongs to the multiparty conferencing presentation, determining where the track should be displayed in a split-screen window based upon at least one second indication contained in the file, wherein each track identified as belonging to the multiparty conferencing presentation is rendered in a location in the split-screen window based on the at least one second indication.
20. The method of claim 19, wherein the at least one second indication comprises, for each track that belongs to the multiparty conferencing presentation, a signifier associated with the track, the signifier signaling coordinates within the split-screen window where each sample of the track is to be displayed.
21. The method of claim 20, wherein the coordinates within the split-screen window vary for different samples in a track.
22. A computer program product, embodied in a computer-readable medium, comprising computer code configured to perform the processes of claim 16.
23. An apparatus, comprising: a processor; and a memory unit communicatively connected to the processor and including: computer code for identifying at least one of a plurality of tracks within a file as belonging to a multiparty conferencing presentation based upon at least one first indication in the file; and computer code for rendering each track identified as belonging to the multiparty conferencing presentation.
24. The apparatus of claim 23, wherein the at least one first indication comprises a signifier in a movie header box for the file indicating that a presentation contained in the file is a multiparty conference presentation.
25. The apparatus of claim 23, wherein the at least one first indication comprises a signifier in a movie box for the file indicating relationships among tracks in the file.
26. The apparatus of claim 23, wherein the memory unit further comprises: computer code for, for each track that belongs to the multiparty conferencing presentation, determining where the track should be displayed in a split-screen window based upon at least one second indication contained in the file, wherein each track identified as belonging to the multiparty conferencing presentation is rendered in a location in the split-screen window based on the at least one second indication.
27. The apparatus of claim 26, wherein the at least one second indication comprises, for each track that belongs to the multiparty conferencing presentation, a signifier associated with the track, the signifier signaling coordinates within the split- screen window where each sample of the track is to be displayed.
28. The apparatus of claim 27, wherein the coordinates within the split-screen window vary for different samples in a track.
29. An apparatus, comprising: means for identifying at least one of a plurality of tracks within a file as belonging to a multiparty conferencing presentation based upon at least one first indication in the file; and means for rendering each track identified as belonging to the multiparty conferencing presentation.
30. The apparatus of claim 29, further comprising: means for, for each track that belongs to the multiparty conferencing presentation, determining where the track should be displayed in a split-screen window based upon at least one second indication contained in the file, wherein each track identified as belonging to the multiparty conferencing presentation is rendered in a location in the split-screen window based on the at least one second indication.
PCT/FI2008/000061 2007-06-08 2008-06-05 System and method for storing multiparty video conferencing presentations WO2008148930A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US94300907P 2007-06-08 2007-06-08
US60/943,009 2007-06-08

Publications (1)

Publication Number Publication Date
WO2008148930A1 true WO2008148930A1 (en) 2008-12-11

Family

ID=40093230

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/FI2008/000061 WO2008148930A1 (en) 2007-06-08 2008-06-05 System and method for storing multiparty video conferencing presentations

Country Status (1)

Country Link
WO (1) WO2008148930A1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2010040898A1 (en) * 2008-10-08 2010-04-15 Nokia Corporation System and method for storing multi-source multimedia presentations
EP2501129A1 (en) * 2009-12-07 2012-09-19 Huawei Technologies Co., Ltd. Implementation method, apparatus and system for computer-supported telecommunications applications

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5915091A (en) * 1993-10-01 1999-06-22 Collaboration Properties, Inc. Synchronization in video conferencing
US6119147A (en) * 1998-07-28 2000-09-12 Fuji Xerox Co., Ltd. Method and system for computer-mediated, multi-modal, asynchronous meetings in a virtual space
WO2003107622A1 * 2002-06-13 2003-12-24 Nice Systems Ltd. A method of recording a multimedia conference
US20060047674A1 (en) * 2004-09-01 2006-03-02 Mohammed Zubair Visharam Method and apparatus for supporting storage of multiple camera views
US20060098086A1 (en) * 2004-11-09 2006-05-11 Nokia Corporation Transmission control in multiparty conference
US20060146124A1 (en) * 2004-12-17 2006-07-06 Andrew Pepperell Video conference recorder
US7170886B1 (en) * 2001-04-26 2007-01-30 Cisco Technology, Inc. Devices, methods and software for generating indexing metatags in real time for a stream of digitally stored voice data

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5915091A (en) * 1993-10-01 1999-06-22 Collaboration Properties, Inc. Synchronization in video conferencing
US6119147A (en) * 1998-07-28 2000-09-12 Fuji Xerox Co., Ltd. Method and system for computer-mediated, multi-modal, asynchronous meetings in a virtual space
US7170886B1 (en) * 2001-04-26 2007-01-30 Cisco Technology, Inc. Devices, methods and software for generating indexing metatags in real time for a stream of digitally stored voice data
WO2003107622A1 * 2002-06-13 2003-12-24 Nice Systems Ltd. A method of recording a multimedia conference
US20060047674A1 (en) * 2004-09-01 2006-03-02 Mohammed Zubair Visharam Method and apparatus for supporting storage of multiple camera views
US20060098086A1 (en) * 2004-11-09 2006-05-11 Nokia Corporation Transmission control in multiparty conference
US20060146124A1 (en) * 2004-12-17 2006-07-06 Andrew Pepperell Video conference recorder

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2010040898A1 (en) * 2008-10-08 2010-04-15 Nokia Corporation System and method for storing multi-source multimedia presentations
US9357274B2 (en) 2008-10-08 2016-05-31 Nokia Technologies Oy System and method for storing multi-source multimedia presentations
EP2501129A1 (en) * 2009-12-07 2012-09-19 Huawei Technologies Co., Ltd. Implementation method, apparatus and system for computer-supported telecommunications applications
EP2501129A4 (en) * 2009-12-07 2012-12-19 Huawei Tech Co Ltd Implementation method, apparatus and system for computer-supported telecommunications applications

Similar Documents

Publication Publication Date Title
US8365060B2 (en) System and method for indicating track relationships in media files
CA2661578C (en) System and method for indicating track relationships in media files
US9992555B2 (en) Signaling random access points for streaming video data
US9357274B2 (en) System and method for storing multi-source multimedia presentations
EP2314072B1 (en) Track and track-subset grouping for multi view video decoding.
US8774284B2 (en) Signaling of multiple decoding times in media files
CN103069828A (en) Providing sequence data sets for streaming video data
CN103081488A (en) Signaling video samples for trick mode video representations
US7711718B2 (en) System and method for using multiple meta boxes in the ISO base media file format
WO2008148930A1 (en) System and method for storing multiparty video conferencing presentations
AU2012202346B2 (en) System and method for indicating track relationships in media files
US11863767B2 (en) Transporting HEIF-formatted images over real-time transport protocol
WO2022213034A1 (en) Transporting heif-formatted images over real-time transport protocol including overlay images
CN117099375A (en) Transmitting HEIF formatted images via real-time transport protocol

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 08761615

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 08761615

Country of ref document: EP

Kind code of ref document: A1