EP1312188A1 - Audio data processing - Google Patents

Audio data processing

Info

Publication number
EP1312188A1
Authority
EP
European Patent Office
Prior art keywords
audio
network
audio data
streams
data streams
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
EP01960879A
Other languages
German (de)
French (fr)
Other versions
EP1312188B1 (en)
Inventor
Milena Radenkovic
Christopher Michael Greenhalgh
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
British Telecommunications PLC
Original Assignee
British Telecommunications PLC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by British Telecommunications PLC
Priority to EP01960879A
Publication of EP1312188A1
Application granted
Publication of EP1312188B1
Anticipated expiration
Current legal status: Expired - Lifetime

Classifications

    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04L - TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 12/00 - Data switching networks
    • H04L 12/64 - Hybrid switching systems
    • H04L 12/6418 - Hybrid transport
    • H04L 2012/6445 - Admission control
    • H04L 2012/6456 - Channel and bandwidth allocation
    • H04L 2012/6481 - Speech, voice
    • H04L 2012/6486 - Signalling Protocols
    • H04L 2012/6497 - Feedback to the source

Definitions

  • This invention relates to audio data processing and in particular concerns a method, system and software application program for a real-time audio service for processing audio data streams transmitted over a communications network.
  • The Internet standard Real Time Protocol (RTP) (IETF RFC 1889) is the standard packet format used for continuous media traffic, such as audio streams, on the Internet.
  • RTP includes sophisticated algorithms to control the amount of management traffic placed on the network, but assumes that audio traffic will not be a problem: "For example, in any audio conference the data [audio] traffic is inherently self-limiting because only one or two people will speak at a time ..." (RFC 1889, section 6.1).
  • The multicast Internet backbone (MBone), which provides the Internet's wide-area multicast capabilities, does not actively support simultaneous speakers. The MBone guidelines for use and resources assume that each audio session will not have more than one active speaker at a time.
  • Speech is the primary communication medium for on-line social interaction and so real-time audio services are vital for these new applications.
  • speech is arguably the most critical aspect of any real-time collaborative application.
  • problems with other aspects of collaborative systems such as video, shared tools, or 3D graphics, can be resolved or compensated for by speech.
  • Audio services are known in which peer-to-peer architectures provide each user with an independent audio stream from each respective speaker. By receiving independent audio streams, each listener can create their own personal audio mix. The mix can be tailored to the user's own audio equipment capabilities; the audio streams can be spatialised according to associated speaker locations within a shared virtual space; and the streams can be individually controlled, for example allowing the volume of particular speakers to be raised or lowered according to their relative importance.
  • Peer-to-Peer audio services can be implemented using unicast or multicast protocols.
  • This congestion causes speech to become delayed (due both to direct delay and also jitter) and increasingly broken and disjointed (due to audio data packets being lost in transit).
  • the peer-to-peer approach is also very demanding in terms of the processing to be done by each listener's terminal, which must be capable of receiving, decoding, processing, mixing and playing out all of the available audio streams.
  • User terminals may become overloaded, causing problems similar to an overloaded network.
  • Audio services which use total mixing are also known.
  • each audio stream is sent to a central audio server that mixes the audio streams into a single stream that is then redistributed to each listener.
  • This approach requires that each listener and surrounding network handles only one audio stream.
  • Total mixing prevents each listener from creating their own personal mix and the central server becomes a potential bottleneck in the system, since it and its nearby network still have to receive, mix and distribute n audio streams.
  • Other drawbacks associated with total mixing include increased delay due to additional processing, reduced audio fidelity, additional hardware requirements and management complexity. In this respect total mixing is only appropriate for relatively simple applications where resources are limited.
  • Total mixing is not appropriate for high fidelity applications such as home entertainment and tele-conferencing where more resources are generally available and where audio spatialisation is more important, that is to say where separate audio streams are required.
  • An example of total mixing is described by one of the present inventors in "Inhabited TV: Multimedia Broadcasting from Large Scale Collaborative Virtual World", Facta Universitatis, Ser. Electronics and Energetics, 13(1) IEEE, ISBN 0-7803-5678-X.
  • Another example of an audio service that supports many simultaneous speakers is described in "Diamond Park and Spline: A Social Virtual Reality System With 3D Animation, Spoken Interaction, and Runtime Extendibility", PRESENCE, 6(4), 461-481, 1997 MIT.
  • This paper describes a system which allows users with low-bandwidth connections to access an audio-graphical multi-user virtual world.
  • one or more low-bandwidth users connect to a specialised access server that interfaces to a main system which is configured for peer-to-peer multicast audio.
  • the access server or servers deal with and mix all of the available audio streams.
  • a method of processing audio data streams transmitted in a communications network comprising the steps of:- i) receiving a plurality of audio data streams transmitted from one or more audio data stream transmitters distributed in the network; and, ii) processing data relating to at least one respective network resource parameter to determine respective network resources available for subsequent communication of said audio data streams to at least one respective audio data stream receiver in the network; and, iii) comparing said available resources with respective network resource requirements necessary for communicating said audio streams to at least one respective audio data stream receiver in the network; and, iv) determining whether to mix selected audio data streams prior to transmission in response to said comparison.
  • The term "network resources" refers to any component in a communications network necessary for communicating audio data streams to potential recipients, including but not limited to network communication links, network audio processors or mixers, audio stream transmitters and receivers, and user terminals.
  • the above method enables audio mixing decisions to be made dynamically in response to changing network conditions or application requirements.
  • the number of data streams transmitted can be controlled so that network traffic can be optimised according to available network resources.
  • This aspect of the invention is particularly relevant for dynamic virtual environment applications involving varying numbers of active participants engaged in various activities and running over dynamic networks where congestion and delay may change rapidly.
  • the method enables mixing decisions to be made that adapt to changing network conditions and application requirements so as to optimise the conflicting requirements of audio quality and traffic management.
  • the method further comprises the step of:- v) processing two or more audio streams in response to said comparison to provide at least one mixed audio data stream for subsequent transmission in said network.
  • the traffic introduced into the network can be reduced.
  • the amount of processing required by neighbouring data stream receivers can also be reduced.
  • Mixing enables two or more selected audio data streams to be combined to reduce overall network congestion without significantly affecting the intelligibility of the mixed audio streams when received by a user, in a similar way that stereo audio signals can be combined for playback on a non-stereo output device such as a hand-held radio receiver having a single loudspeaker.
  • steps ii) and iii) comprise the steps of:- determining a current value for the or each respective network resource parameter; and, comparing the or each respective current resource parameter value with a respective minimum resource threshold value necessary for communicating said unmixed audio data streams to the or each respective receiver.
  • said minimum resource threshold value is determined according to at least one pre-defined quality of service parameter.
  • said network is a packet switched network and said pre-defined quality of service parameter is defined by a maximum packet loss rate.
  • a minimum threshold value may be determined according to an acceptable packet loss rate associated with a codec used to encode respective audio streams. For instance, a maximum acceptable packet loss rate for an audio codec may be 15%.
  • one network resource parameter relates to available network bandwidth for transmission of said audio data streams to the or each respective receiver. In this way mixing decisions can be determined according to the bandwidth available for transmitting the audio streams to a next audio stream receiver in the network. This provides for efficient use of network bandwidth and readily enables a maximum number of mixed or unmixed audio data streams to be transmitted without causing congestion in the network.
  • said available bandwidth capacity is determined by user specific quality of service requirements.
  • bandwidth resources can be allocated or reserved for use according to user specified quality of service requirements.
  • the allocation or reservation of bandwidth may be controlled by different charging tariffs associated with the quality of service required.
  • A user may specify a quality of service requirement of, say, 3 x 64 kb/s audio channels, in which case selected audio streams will be mixed by the network when more than three separate audio streams are to be transmitted.
  • available bandwidth may be considered as allocated or reserved bandwidth.
  • one network resource parameter relates to receiver processing characteristics.
  • mixing decisions can be determined according to the characteristics of respective receivers. In this way separate audio streams being sent to a receiver having a low processing capability or capacity can be mixed in the network so that the number of audio streams the receiver receives is reduced.
  • said audio data streams are selected for mixing according to predetermined criteria. This enables selection criteria to be used to determine which of the received audio data streams should be mixed.
  • said audio streams are mixed according to audio stream content. For instance, in a virtual environment one or more audio streams may be more significant in terms of audio content than others and the loss of audio spatialisation experienced by a recipient will be less if the less significant audio streams are mixed in preference to the more significant ones.
  • said audio streams are mixed according to recipient requirements.
  • mixing can be determined by the recipient's own requirements, for instance the extent of audio spatialisation required.
  • said audio streams are mixed according to respective audio stream sources.
  • audio streams from related sources can be mixed, for instance it may be desirable to mix all audio streams associated with a particular group of participants in an audio conference or virtual environment.
  • audio streams are mixed according to receiver capabilities.
  • audio streams may be mixed according to the capabilities of the receiver.
  • a receiver may comprise a full 3-D audio system capable of recreating fully spatialised studio quality audio where mixing considerations are important for recreating spatialised audio.
  • a receiver may comprise a simple stereo audio system where mixing considerations are less important.
  • a system for processing audio data streams transmitted in a communications network comprising:- i) a receiver for receiving a plurality of audio data streams transmitted from one or more audio data stream transmitters distributed in the network; and, ii) a processor for processing data relating to at least one respective network resource parameter to determine respective network resources available for subsequent transmission of said audio data streams in the network; and, iii) a comparator for comparing said available resources with respective network resource requirements necessary for transmission of said audio streams to at least one respective audio data stream receiver in the network; and, iv) determining means for determining whether to mix selected audio data streams prior to transmission in response to said comparison.
  • Figure 1 is a schematic representation of a network used for implementing an embodiment of the present invention;
  • Figure 2 is a schematic representation of a logical network topology for part of the network of Figure 1;
  • Figure 3 shows a modular block diagram of an audio data processor for processing audio data streams transmitted in the network of Figures 1 and 2;
  • Figure 4 is a flowchart showing steps involved in implementing an embodiment of the invention;
  • Figure 5 is a schematic representation of a distributed processing system for simulating network conditions;
  • Figure 6 is a schematic representation of one implementation of the distributed processing system of Figure 5 used for determining network performance characteristics;
  • Figure 7 is a graphical representation of simulated network performance characteristics showing packet loss rates for different audio distribution strategies; and, Figure 8 is a graphical representation similar to Figure 7 showing the different audio stream distribution characteristics for different audio distribution strategies.
  • An example of an IP communications network for implementing one arrangement of the invention is shown in Figure 1.
  • a plurality of user terminals 100 are connected to the Internet 102 via respective Internet access servers 106 which are each connected to an audio processor 104.
  • the audio processors 104 are each capable of mixing a plurality of audio data streams received over the network.
  • The audio data streams are typically transmitted using the Internet standard data stream transfer protocol Real Time Protocol (RTP) and underlying multicast transport protocols, although unicast could also be used.
  • the network of Figure 1 can be considered to comprise a plurality of audio data stream sources 200, audio data stream sinks 202 and audio mixer 204 type components.
  • each terminal 100 and audio processor 104 may comprise at least one source, sink or mixer component and the network of Figure 1 may comprise a plurality of these components linked together by audio data streams 206.
  • a simple logical network topology is shown in which the source and sink components are positioned at the terminal nodes of the network and the mixer components are positioned at the non-terminal nodes.
  • each source 200 transmits one or more audio data streams to a respective parent mixer node 204.
  • the mixer nodes may then forward the received audio data stream or streams directly to the other connected nodes, mix selected ones of the received audio data streams and then forward the respective mixed data streams, or perform a combination of these two actions.
  • Each sink eventually receives each of the data streams, either in the original un-mixed state or as part of a new mixed stream.
  • each mixer component may additionally create multiple mixes from arbitrary subsets of the received data streams and transmit these mixes instead of or in addition to the audio data streams it would otherwise forward to other nodes in the network. For example, each mixer component may select to mix only some of its incoming audio data streams and forward the resultant partial mix on to other connected nodes.
  • each audio processor 104 comprises an audio data stream receiver 302 for receiving audio data streams transmitted over the IP network, an audio mixer component 304 for mixing selected audio data streams and an audio data stream transmitter 306 for forwarding mixed and unmixed audio data streams to other audio processors 104 or terminals 100 in the IP network.
  • The audio data stream receiver 302 and transmitter 306 comprise software components of the type implemented in audio applications such as Real Inc's Real Player or Microsoft Inc's Media Player, for example IP sockets, packet buffers, packet assemblers etc.
  • The audio mixer component 304 is provided for mixing two or more audio data streams received at the processor 104.
  • the audio mixer component is arranged to average selected audio data streams to create a single combined data stream comprising the same number of bits as each unmixed data stream.
  • Each audio processor is further provided with a congestion monitor 308 for monitoring congestion on the respective communication transmission links connecting the audio processor to other parts of the network.
  • the congestion monitor 308 utilises Real Time Control Protocol (RTCP) control messages or the like received from respective audio stream receivers in the network to determine congestion levels on respective transmission links. These control messages provide the audio processors with information relating to data stream congestion on respective neighbouring transmission links and audio processors.
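As a rough illustration of how such a congestion monitor might read loss information (a sketch under my own assumptions, not code from the patent), the following extracts the 8-bit "fraction lost" field from the first report block of an RTCP Receiver Report as laid out in RFC 1889:

```python
def fraction_lost(rtcp_packet: bytes) -> float:
    """Fraction of packets lost since the previous report, read from the
    first report block of an RTCP Receiver Report (RFC 1889): a 4-byte
    header and 4-byte reporter SSRC precede the report blocks, so byte 12
    holds 'fraction lost' as an 8-bit fixed-point value (x/256)."""
    RR_PACKET_TYPE = 201
    if len(rtcp_packet) < 13 or rtcp_packet[1] != RR_PACKET_TYPE:
        raise ValueError("not an RTCP receiver report with a report block")
    return rtcp_packet[12] / 256.0
```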
  • Selected audio processors, typically those on the edge of the network connecting terminals 100 to the Internet, are also provided with a database 310 containing data relating to recipient terminal equipment characteristics (including audio data processing and playback capabilities, and network connection types and speeds), other recipient-specific data (including current tariff data for determining an appropriate quality of service to be provided), and user-specific audio data stream mixing profiles comprising user-defined mixing preferences and other data relating to user-specific mixing policies. Mixing may occur at many of the audio processor mixer nodes in the network of Figure 1.
  • When an audio processor mixes selected audio streams, the number of audio data streams to be transmitted to neighbouring audio processor or terminal nodes is reduced, which also reduces the amount of processing required at the neighbouring nodes. Distributing the audio processor mixer nodes throughout the network as described enables the overall audio mixing task to be shared and mixing bottlenecks to be avoided. Distributed processing not only provides for scalability, say in terms of the maximum number of simultaneous speakers allowed in an audio conference, but also enables audio processors to monitor and respond to changing local network conditions in heterogeneous environments such as the Internet. Since mixing reduces the quality of the audio streams being mixed, the audio processors are arranged to limit the amount of mixing they perform.
  • Mixing decisions are made dynamically by the audio data processors according to software program logic stored in audio processor memory and executed by the audio processors.
  • network conditions such as available bandwidth, packet loss rate and delay are monitored by the congestion monitor 308 in step 400 to determine current values for respective network resource parameters associated with transmission links and neighbouring processor and terminal nodes located in the network.
  • Current values for variable network resource parameters, including but not limited to available bandwidth on respective transmission links and the processing capacity and delay of respective processor nodes, are stored in the database 310 in step 402.
  • Current values for other, more stable network resource parameters, including but not limited to terminal node processing capability and capacity, user-specific mixing preferences and quality of service requirements, are also stored in the processor databases 310.
  • The more stable resource parameters may be monitored in step 400 in the same way as the variable resource parameters, or by periodically polling the resources for current parameter values.
  • In step 404 the receiver 302 monitors relevant unicast or multicast communication channels for incoming audio data streams and determines in step 406 whether any streams are being received at the audio processor node. If audio data streams are being received, the audio processor proceeds to step 408, where an appropriate algorithm determines the network resources necessary for transmitting the received data streams to the next relevant node or nodes in the network. If no audio data streams are being received, monitoring continues and control passes back to step 400. In step 408 the network resources necessary for forwarding all the received data streams to each respective node in the relevant network distribution tree are determined, and in step 410 the respective available resources for transmitting the data streams are determined.
  • In step 412 the current values of the relevant network resource parameters are compared with the respective resource requirements necessary for communicating the received audio streams to the respective next network nodes comprising audio data stream receivers. If the available resources match those required for subsequent communication of the data streams, all the data streams are transmitted onwards in step 420 to the relevant next nodes. However, if the audio processors determine in step 412 that there are insufficient resources, mixing of selected streams occurs. For instance, if there are only three communications channels available on a particular transmission link and four separate data streams are received, at least one pair of data streams is mixed prior to subsequent transmission by the respective audio processor. Similarly, if a recipient terminal node is only capable of processing two data streams simultaneously, only two sets of mixed streams are provided for transmission to that terminal.
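A minimal sketch of this comparison step (my own illustration, assuming streams arrive ordered least-important first and that one outgoing channel carries one stream):

```python
def plan_transmission(streams, available_channels):
    """Step 412 sketch: forward everything when capacity suffices; otherwise
    fold the least important streams into one mix so the outgoing stream
    count fits the available channels (assumes available_channels >= 1)."""
    if len(streams) <= available_channels:
        return streams, []            # sufficient resources: no mixing needed
    keep = available_channels - 1     # one channel is reserved for the mix
    return streams[len(streams) - keep:], streams[: len(streams) - keep]

# Four incoming streams but only three channels: one pair is mixed.
forwarded, to_mix = plan_transmission(["s1", "s2", "s3", "s4"], 3)
assert forwarded == ["s3", "s4"] and to_mix == ["s1", "s2"]
```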
  • If a terminal node is designated as having a predetermined quality of service, as defined by a user-selected tariff, there may be insufficient bandwidth allocated for forwarding all the streams without mixing, so that mixing will occur even if the network has sufficient bandwidth resources available on the relevant link or links to the user terminal node.
  • The difference between the available and required resources is also determined, so that the number of streams to be mixed can be established.
  • The audio data stream processors then select appropriate data streams to be mixed in step 416.
  • the selected streams are mixed in step 418 and then transmitted by the appropriate audio processor transmitter 306 in step 420. Audio data stream selection in step 416 may be based on any number of considerations relevant to network, application or user requirements, for example.
  • Some real-time CSCW applications assign participants to different roles within an event. For example, early experiments in inhabited television differentiated between performers, inhabitants and viewers. Performers are part of the core content of an on-line TV show, whereas inhabitants are active within the virtual world but typically receive a broadcast mix created by a director. These roles are complemented by differences in the technologies used to access the real-time event. Performers typically use professional studio-quality equipment, with fully spatialised 3D audio. Inhabitants may use commodity PCs, equipped with headphones. Viewers, on the other hand, may use conventional television sets, equipped with multiple loudspeaker surround sound audio systems. Roles, or so-called "layers of participation", can determine mixing policy.
  • CSCW applications may also benefit from defining layers of participation and using these to prioritise audio sources for mixing.
  • Another consideration for mixing may concern the roles of listeners or recipients. Mixing can also be prioritised according to listener requirements. Roles or layers of participation can also define the different ways in which listeners take part in an event, although many participants will be both speakers and listeners in an event. For example, an active inhabitant may benefit from fully spatialised audio that provides clues to support navigation and conversation management. A passive viewer with a surround-sound audio system may benefit from a mix that clearly separates the key performers, but where their accurate location in the world is less important. In the case of inhabitants it may be important to maintain the separation of streams from nearby participants, whereas for viewers it may be appropriate to maintain the separation of key performers only.
  • a further mixing consideration may concern the grouping of audio sources.
  • CSCW applications often group participants in some pre-determined way. It is often appropriate to mix audio streams from one coherent group. For example, avatars in a CVE may have gathered together to form definable and separate groups. Audio streams from each group could be mixed to form a single stream that could be spatialised to the average position of the group as a whole in the CVE.
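As a sketch of this per-group policy (my own illustration, assuming aligned 16-bit PCM frames and 3-D avatar positions as inputs):

```python
def mix_group(frames, positions):
    """Combine one coherent group's aligned PCM frames into a single frame
    by averaging samples, and place the result at the group's average
    (centroid) position in the CVE for spatialisation."""
    n = len(frames)
    mixed = [sum(samples) // n for samples in zip(*frames)]
    centroid = tuple(sum(axis) / n for axis in zip(*positions))
    return mixed, centroid

# Two avatars standing apart: their mix is spatialised midway between them.
mixed, position = mix_group([[100, -200], [300, 400]], [(0, 0, 0), (4, 0, 2)])
assert position == (2.0, 0.0, 1.0)
```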
  • some CSCW applications calculate levels of mutual awareness among participants which may provide a more dynamic basis for mixing respective audio streams.
  • Another mixing consideration may concern voice characteristics. The timbre of voices or other audio sources may be useful for determining which streams to mix. For example, it may be appropriate to mix a high and a low pitch voice into a single stream so that a listener can readily separate them when hearing the mixed stream.
  • Patterns of activity within a multiple speaker environment may also determine mixing decisions. For example, audio data streams from participants whose speech rarely overlaps could be mixed together.
  • Mixing may depend upon aspects of the available communication network, including its topology (i.e. shape and structure), underlying bandwidth, regional variations, or transitory congestion. Mixing decisions may also depend on the available computing resources, for example the number and capability of available mixer components, how many are positioned within the network, and how heavily loaded they are. Mixing decisions may also consider the current and past states of the system. For example, the transition from one choice of mixed streams to another may be noticeable to users, and potentially undesirable.
  • In a networked football game, for example, audio priority will be higher for the referee than for the other players.
  • The system will therefore avoid mixing the referee's audio stream with the respective players' audio streams unless this becomes absolutely necessary due to network resource limitations.
  • Audio streams from a crowd of spectators will have a lower priority, since each spectator does not need to be heard individually. It may be sufficient to mix all the streams from the same "stand" or group of collocated spectators and spatialise only the resulting stream at the receiver. In this way mixing is based on the roles of the speakers, that is to say, audio streams from more important speakers are forwarded while those from less important speakers are mixed.
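A sketch of this role-based selection under assumed stream metadata (the "role" and "stand" fields are illustrative, not defined by the patent):

```python
from collections import defaultdict

def split_by_role(streams):
    """Forward referee and player streams individually; batch spectator
    streams per stand, each batch destined for a single mixed stream that
    is spatialised once at the receiver."""
    forwarded, stands = [], defaultdict(list)
    for s in streams:
        if s["role"] in ("referee", "player"):
            forwarded.append(s)            # important speakers stay separate
        else:
            stands[s["stand"]].append(s)   # spectators are mixed per stand
    return forwarded, dict(stands)
```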
  • Alternatively, mixing policy may be based on the collocation and mutual awareness of the participants. For example, participants may want to receive separate audio streams from other collocated participants, or from more important participants of whom they are more aware than of others. The remainder of the group can be mixed together or divided into smaller groups which are mixed separately, similar to the different stands in the football game example.
  • In an on-line lecture, the lecturer's and demonstrators' audio streams may be forwarded so that the other participants can process the individual streams on receipt, whereas the respective students' audio streams will be mixed.
  • the present inventors have implemented and tested the invention in a distributed processing system simulator 500 shown in Figure 5.
  • In the distributed processing system, two end user systems 502a and 502b are shown on a first local area network 506 and two end user systems 502c and 502d on a second local area network 512.
  • the system 500 may comprise any number of end user systems 502 depending on the networks being simulated.
  • Each end user system comprises a respective virtual world client 514 for accessing a shared virtual world generated by a virtual world server 516 on LAN 512.
  • Each end user system is also provided with a local audio server 518 that is interfaced to respective audio hardware (not shown) so that users can speak to each other within the virtual world environment.
  • Each client 514 controls the local audio server 518 for the respective end system and uses information in the virtual world to determine how the audio server should transmit, receive and process audio streams, for example according to the positions of other users in the virtual world.
  • each user's audio server 518 sends an audio data stream directly to all the other audio servers in the system using underlying unicast or multicast protocols.
  • An audio processor 104 is provided on each LAN for mixing selected data streams received from the connected audio servers.
  • the audio processors are both controlled directly by the virtual world server 516 and are connected together by means of a WAN simulator 520.
  • the local audio servers 518 and audio processors 104 together define an audio distribution tree as shown by the dashed lines 522 in Figure 5.
  • Each audio processor is capable of receiving audio data streams from the audio servers on its respective LAN and transmitting these streams to the remote audio processor on the other LAN.
  • The audio processors are arranged to adapt to changing network conditions. For example, instead of three separate audio streams being forwarded from, say, end user systems 502a, 502b and 502c to 502d, the audio processors 104 can mix respective data streams so that end user systems 502c and 502d each receive a single stream comprising a mix of the streams from end user systems 502a and 502b, plus a separate audio stream from each other.
  • Two quantifiable aspects of audio quality were considered: the level of packet loss experienced and the degree of audio stream spatialisation, that is, the number of separate audio streams delivered to an end system. These two criteria were chosen because they both relate to the end user's perceived experience of the system and can also be readily determined from measurements of the system 500, for example the number of packets being sent per second.
  • The first measure, the level of packet loss experienced, is the primary determinant of whether a network audio stream will be intelligible to the user and therefore of any use at all.
  • Figure 6 shows a system 600 configured from the system of Figure 5 for evaluating the effect of network congestion on audio quality.
  • The LANs 506 and 512 are assumed to be generally congestion-free, high-bandwidth networks connected via a lower-bandwidth shared WAN 520 which is prone to congestion.
  • Six end user systems 502 are provided on LAN 506 for simulating network usage.
  • a single end user system is provided on LAN 512.
  • a WAN simulation tool is provided for simulating network delays and bandwidth restrictions for a limited bandwidth WAN connection.
  • An additional application 602 is provided for introducing controlled levels of competing traffic onto the simulated WAN connection 520 in order to create network congestion.
  • All packets on the system 600 are monitored and analysed to classify the packets and to measure the number of audio streams in transit and the amount of competing traffic. Packet loss experienced by the audio streams in transit is measured by matching the number of packets leaving LAN 506 to those arriving on LAN 512.
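The loss measurement itself reduces to a packet-count comparison; a one-function illustration (the names are mine):

```python
def measured_loss_rate(packets_leaving_lan506, packets_arriving_lan512):
    """Packet loss over the simulated WAN, measured by matching the count
    of audio packets leaving LAN 506 against the count arriving on LAN 512."""
    sent = packets_leaving_lan506
    return 0.0 if sent == 0 else (sent - packets_arriving_lan512) / sent
```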
  • Figure 7 shows the effect that increasing levels of congestion has on the packet loss rate experienced for each of the three audio distribution strategies.
  • the peer-to-peer approach (line 700 in the graph of Figure 7) experiences increasing levels of packet loss as competing traffic increases.
  • The packet loss rate exceeds 15% at 210 kbit/s of additional traffic.
  • Full mixing (line 702 in the drawing) uses the minimum bandwidth throughout, and only starts to experience congestion when the competing traffic reaches 490 kbit/s.
  • Distributed partial mixing (line 704 in the drawing) gives higher loss rates than full mixing, but much lower rates than all-forwarding peer-to-peer, and maintains its loss rate below 15% even with 490 kbit/s of competing traffic (as for full mixing).
  • Figure 8 shows the number of separate audio streams being transmitted to a listener on the end user system on LAN 512.
  • As shown by line 800 in the graph of Figure 8, six streams are always transmitted by LAN 506; however, none of these arrive in any useful form when competing traffic levels exceed 210 kbit/s.
  • For total mixing (line 802 in Figure 8), one stream is always sent.
  • Dynamic mixing (line 804 in Figure 8) lies between these two extremes. With no congestion, six distinct streams are transmitted over the WAN connection from LAN 506 to LAN 512. As competing traffic, and hence congestion, increases, dynamic mixing reduces the number of distinct streams by mixing more audio streams together. When competing traffic levels reach 490 kbit/s, dynamic mixing falls back to total mixing, with only a single stream sent over the WAN.
  • The distributed partial mixing approach has the following distinctive benefits: it is adaptive, reacting to network congestion in a way that peer-to-peer systems cannot; it supports dynamic load balancing between different distributed components of the audio service; it readily supports heterogeneous networks and different end user terminal capabilities; and it adapts to varying application requirements.

Abstract

The invention provides a method, software program and system for processing audio data streams (206) transmitted in a communications network. In one aspect of the invention a method of processing audio data streams comprises the steps of: i) (404) receiving a plurality of audio data streams (206) transmitted from one or more audio data stream transmitters (306) distributed in the network; ii) (408) processing data relating to at least one respective network resource parameter to determine respective network resources available for subsequent communication of the audio data streams to at least one respective audio data stream receiver (302); iii) (412) comparing available resources with respective network resource requirements necessary for communicating the audio streams to at least one respective audio data stream receiver; and, iv) (412, 414, 416) determining whether to mix selected audio data streams prior to transmission in response to the comparison (412). The invention allows network resources to be matched to network resource requirements so that audio quality is optimised when simultaneous real-time audio data streams are transmitted in a communications network.

Description

AUDIO DATA PROCESSING
This invention relates to audio data processing and in particular concerns a method, system and software application program for a real-time audio service for processing audio data streams transmitted over a communications network.
Developers of real-time audio services have for some time recognised the need for scalability in terms of the number of simultaneous listeners to an audio stream. Applications such as video on demand, on-line lectures and Internet radio require an audio stream to be broadcast from a single source to potentially many listeners. Various techniques have been proposed for reducing the bandwidth required for such broadcasts, most notably network multicasting, including Internet multicasting. By comparison, however, support for many simultaneous speakers has not been fully developed. Some telephony and CSCW (Computer Supported Co-operative Work) applications such as audio and video-conferencing do support several simultaneous speakers, but the focus has been mostly on small groups. In addition, where the possibility of simultaneous speaking has been acknowledged, applications usually prevent it through the use of floor control and "push-to-talk" concepts that enforce turn-taking and ensure that only one participant speaks at a time.
Communication protocols often discount the possibility of multiple simultaneous speakers. The Internet standard Real Time Protocol (RTP) (IETF RFC 1889) is the standard packet format used for continuous media traffic, such as audio streams, on the Internet. RTP includes sophisticated algorithms to control the amount of management traffic placed on the network, but assumes that audio traffic will not be a problem: "For example, in any audio conference the data [audio] traffic is inherently self-limiting because only one or two people will speak at a time ..." (RFC 1889, section 6.1). Similarly, the multicast Internet backbone (MBone), which provides the Internet's wide-area multicast capabilities, does not actively support simultaneous speakers. The MBone guidelines for use and resources assume that each audio session will not have more than one active speaker at a time. Telecommunication applications are now emerging that support on-line events involving large groups of people. For example, collaborative virtual environments (CVEs) support interactive social events such as multi-player games and inhabited television for large on-line communities. Speech is the primary communication medium for on-line social interaction and so real-time audio services are vital for these new applications. Indeed, speech is arguably the most critical aspect of any real-time collaborative application. Research has shown that problems with other aspects of collaborative systems, such as video, shared tools, or 3D graphics, can be resolved or compensated for by speech.
Analysis of patterns of audio activity in CVE applications, for example virtual teleconferencing, has revealed significant periods when several participants are simultaneously generating audio traffic. In relatively focused applications such as teleconferencing, audio activity is best approximated by a model of people transmitting audio at random, rather than deliberately avoiding overlapping speech. Indeed, overlapping audio, including speech and other sounds, is likely to be a basic requirement for many CSCW applications. Unfortunately, there are many disadvantages associated with real-time audio services that allow many simultaneous speakers. Real-time audio is bandwidth intensive, particularly for applications that support large numbers of users. In addition, packet loss due to network congestion can severely reduce the intelligibility of received audio streams.
Providing an audio service for many simultaneous speakers is a significant problem since each speaker independently introduces a new audio stream that has to be accommodated by the network and that also has to be processed by each recipient's audio receiver. While multicasting reduces the bandwidth required to distribute a single audio stream to many listeners, it does not address the problem of many simultaneous speakers. Audio services are known in which peer-to-peer architectures provide each user with an independent audio stream from each respective speaker. By receiving independent audio streams, each listener can create their own personal audio mix. The mix can be tailored to the user's own audio equipment capabilities; the audio streams can be spatialised according to associated speaker locations within a shared virtual space; and the streams can be individually controlled, for example allowing the volume of particular speakers to be raised or lowered according to their relative importance. Peer-to-peer audio services can be implemented using unicast or multicast protocols.
The peer-to-peer approach is very demanding in terms of network resources, particularly bandwidth, and can easily flood the network with traffic. With underlying unicast protocols, which are typically used today for wide-area communication, the resulting number of audio streams is of the order n², where n is the number of simultaneous users, that is to say for the case in which all users are sending audio data simultaneously. With underlying multicast protocols, which are still experimental over wide-area networks, this reduces to the order n, but no lower. With conventional networks such as the Internet, once the traffic in any part of the network exceeds the local capacity, that part of the network becomes congested. This congestion causes speech to become delayed (due both to direct delay and to jitter) and increasingly broken and disjointed (due to audio data packets being lost in transit). The peer-to-peer approach is also very demanding in terms of the processing to be done by each listener's terminal, which must be capable of receiving, decoding, processing, mixing and playing out all of the available audio streams. User terminals may become overloaded, causing problems similar to an overloaded network.
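To make the scaling concrete, a small illustration (my own, not from the patent) of the stream counts when all n users speak at once:

```python
def simultaneous_streams(n, multicast=False):
    """Streams the network must carry when all n users speak at once: with
    unicast each speaker sends a copy to the other n - 1 peers (order n
    squared); with multicast each speaker introduces a single stream."""
    return n if multicast else n * (n - 1)

assert simultaneous_streams(10) == 90                  # unicast: order n squared
assert simultaneous_streams(10, multicast=True) == 10  # multicast: order n
```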
Audio services which use total mixing are also known. In the total mixing approach each audio stream is sent to a central audio server that mixes the audio streams into a single stream that is then redistributed to each listener. This approach requires that each listener and surrounding network handles only one audio stream. Total mixing, however, prevents each listener from creating their own personal mix, and the central server becomes a potential bottleneck in the system, since it and its nearby network still have to receive, mix and distribute n audio streams. Other drawbacks associated with total mixing include increased delay due to additional processing, reduced audio fidelity, additional hardware requirements and management complexity. In this respect total mixing is only appropriate for relatively simple applications where resources are limited. Total mixing is not appropriate for high fidelity applications such as home entertainment and tele-conferencing where more resources are generally available and where audio spatialisation is more important, that is to say where separate audio streams are required. An example of total mixing is described by one of the present inventors in "Inhabited TV: Multimedia Broadcasting from Large Scale Collaborative Virtual World", Facta Universitatis, Ser. Electronics and Energetics, 13(1) IEEE, ISBN 0-7803-5678-X. Another example of an audio service that supports many simultaneous speakers is described in "Diamond Park and Spline: A Social Virtual Reality System With 3D Animation, Spoken Interaction, and Runtime Extendibility", PRESENCE, 6(4), 461-481, 1997 MIT. This paper describes a system which allows users with low-bandwidth connections to access an audio-graphical multi-user virtual world. In this approach one or more low-bandwidth users connect to a specialised access server that interfaces to a main system which is configured for peer-to-peer multicast audio. The access server or servers deal with and mix all of the available audio streams. According to an aspect of the present invention there is provided a method of processing audio data streams transmitted in a communications network; said method comprising the steps of:- i) receiving a plurality of audio data streams transmitted from one or more audio data stream transmitters distributed in the network; and, ii) processing data relating to at least one respective network resource parameter to determine respective network resources available for subsequent communication of said audio data streams to at least one respective audio data stream receiver in the network; and, iii) comparing said available resources with respective network resource requirements necessary for communicating said audio streams to at least one respective audio data stream receiver in the network; and, iv) determining whether to mix selected audio data streams prior to transmission in response to said comparison.
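Paraphrased as a pseudo-Python skeleton (the collaborator objects and their method names are hypothetical stand-ins, not an API defined by the patent), the four claimed steps might look like this:

```python
def process_audio_streams(receiver, monitor, mixer, transmitter):
    """Skeleton of the claimed method: (i) receive the streams, (ii) find
    the resources available for onward communication, (iii) compare them
    with what forwarding everything unmixed would require, and (iv) decide
    whether to mix selected streams before transmission."""
    streams = receiver.receive_streams()                 # step i
    available = monitor.available_resources()            # step ii
    required = monitor.required_resources(streams)       # step iii
    if available < required:                             # step iv
        streams = mixer.mix_selected(streams, available)
    transmitter.send(streams)
```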
The term "network resources" used herein refers to any component in a communications network necessary for communicating audio data streams to potential recipients, including but not limited to network communication links, network audio processors or mixers, audio stream transmitters and receivers and user terminals, for example.
By comparing available network resources with network resource requirements necessary for communicating audio streams the above method enables audio mixing decisions to be made dynamically in response to changing network conditions or application requirements. In this way the number of data streams transmitted can be controlled so that network traffic can be optimised according to available network resources. This aspect of the invention is particularly relevant for dynamic virtual environment applications involving varying numbers of active participants engaged in various activities and running over dynamic networks where congestion and delay may change rapidly. The method enables mixing decisions to be made that adapt to changing network conditions and application requirements so as to optimise the conflicting requirements of audio quality and traffic management.
Preferably the method further comprises the step of:- v) processing two or more audio streams in response to said comparison to provide at least one mixed audio data stream for subsequent transmission in said network. Thus, if a decision is made to mix selected audio data streams, the traffic introduced into the network can be reduced. The amount of processing required by neighbouring data stream receivers can also be reduced. Mixing enables two or more selected audio data streams to be combined to reduce overall network congestion without significantly affecting the intelligibility of the mixed audio streams when received by a user, in a similar way that stereo audio signals can be combined for playback on a non-stereo output device such as a hand-held radio receiver having a single loudspeaker.
Conveniently, steps ii) and iii) comprise the steps of:- determining a current value for the or each respective network resource parameter; and, comparing the or each respective current resource parameter value with a respective minimum resource threshold value necessary for communicating said unmixed audio data streams to the or each respective receiver. In this way the available network resources can be compared with pre-determined minimum resource requirements necessary for transmitting or processing all the unmixed data streams.
In preferred embodiments, said minimum resource threshold value is determined according to at least one pre-defined quality of service parameter. In this way mixing decisions can be made according to pre-determined quality of service requirements. Preferably, said network is a packet switched network and said pre-defined quality of service parameter is defined by a maximum packet loss rate. Thus, a minimum threshold value may be determined according to an acceptable packet loss rate associated with a codec used to encode respective audio streams. For instance, a maximum acceptable packet loss rate for an audio codec may be 15%. Conveniently, one network resource parameter relates to available network bandwidth for transmission of said audio data streams to the or each respective receiver. In this way mixing decisions can be determined according to the bandwidth available for transmitting the audio streams to a next audio stream receiver in the network. This provides for efficient use of network bandwidth and readily enables a maximum number of mixed or unmixed audio data streams to be transmitted without causing congestion in the network.
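A minimal sketch of this threshold test, treating the 15% figure quoted above as an assumed codec tolerance:

```python
CODEC_MAX_LOSS = 0.15   # illustrative maximum acceptable packet loss rate

def unmixed_streams_viable(predicted_loss_rate, threshold=CODEC_MAX_LOSS):
    """Compare the current resource parameter (here, predicted packet loss)
    against the minimum-resource threshold below which the unmixed audio
    streams remain intelligible; mixing is triggered when this test fails."""
    return predicted_loss_rate <= threshold
```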
Preferably, said available bandwidth capacity is determined by user specific quality of service requirements. In this way bandwidth resources can be allocated or reserved for use according to user specified quality of service requirements. The allocation or reservation of bandwidth may be controlled by different charging tariffs associated with the quality of service required. In this way a user may specify a quality of service requirement of, say, 3 x 64 kb/s audio channels, in which case selected audio streams will be mixed by the network when more than three separate audio streams are to be transmitted by the network. In this way available bandwidth may be considered as allocated or reserved bandwidth. Conveniently, one network resource parameter relates to receiver processing characteristics. Thus, mixing decisions can be determined according to the characteristics of respective receivers. In this way separate audio streams being sent to a receiver having a low processing capability or capacity can be mixed in the network so that the number of audio streams the receiver receives is reduced.
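Under the 3 x 64 kb/s example, the number of offered streams to fold into a single mix could be computed as below (the per-stream bit rate and the fold-into-one-mix policy are my assumptions):

```python
def streams_to_fold(n_streams, reserved_kbps=3 * 64, stream_kbps=64):
    """How many offered streams must be folded into one mixed stream so the
    total transmitted fits the user's reserved bandwidth allocation."""
    channels = reserved_kbps // stream_kbps
    excess = n_streams - channels
    # Folding k + 1 streams into one mix frees k channels.
    return excess + 1 if excess > 0 else 0

assert streams_to_fold(3) == 0   # fits the three-channel reservation unmixed
assert streams_to_fold(5) == 3   # three streams become one mix; three sent
```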
In preferred embodiments, said audio data streams are selected for mixing according to predetermined criteria. This enables selection criteria to be used to determine which of the received audio data streams should be mixed.
In one way, said audio streams are mixed according to audio stream content. For instance, in a virtual environment one or more audio streams may be more significant in terms of audio content than others and the loss of audio spatialisation experienced by a recipient will be less if the less significant audio streams are mixed in preference to the more significant ones.
In another way, said audio streams are mixed according to recipient requirements. In this way mixing can be determined by the recipient's own requirements, for instance the extent of audio spatialisation required.
In a further way, said audio streams are mixed according to respective audio stream sources. In this way audio streams from related sources can be mixed, for instance it may be desirable to mix all audio streams associated with a particular group of participants in an audio conference or virtual environment.
In a yet further way, said audio streams are mixed according to receiver capabilities. In this way audio streams may be mixed according to the capabilities of the receiver. For example, a receiver may comprise a full 3-D audio system capable of recreating fully spatialised studio quality audio where mixing considerations are important for recreating spatialised audio. Alternatively a receiver may comprise a simple stereo audio system where mixing considerations are less important.
According to another aspect of the invention there is provided a software program arranged to process audio data streams according to the above mentioned method.
According to a further aspect of the invention there is provided a system for processing audio data streams transmitted in a communications network; said system comprising:- i) a receiver for receiving a plurality of audio data streams transmitted from one or more audio data stream transmitters distributed in the network; and, ii) a processor for processing data relating to at least one respective network resource parameter to determine respective network resources available for subsequent transmission of said audio data streams in the network; and, iii) a comparator for comparing said available resources with respective network resource requirements necessary for transmission of said audio streams to at least one respective audio data stream receiver in the network; and, iv) determining means for determining whether to mix selected audio data streams prior to transmission in response to said comparison.
The invention will now be described by way of example only with reference to the accompanying drawings in which:-
Figure 1 is a schematic representation of a network used for implementing an embodiment of the present invention;
Figure 2 is a schematic representation of a logical network topology for part of the network of Figure 1; Figure 3 shows a modular block diagram of an audio data processor for processing audio data streams transmitted in the network of Figures 1 and 2;
Figure 4 is a flowchart showing steps involved in implementing an embodiment of the invention;
Figure 5 is a schematic representation of a distributed processing system for simulating network conditions;
Figure 6 is a schematic representation of one implementation of the distributed processing system of Figure 5 used for determining network performance characteristics;
Figure 7 is a graphical representation of simulated network performance characteristics showing packet loss rates for different audio distribution strategies; and, Figure 8 is a graphical representation similar to Figure 7 showing the different audio stream distribution characteristics for different audio distribution strategies.
An example of an IP communications network for implementing one arrangement of the invention is shown in Figure 1. In Figure 1 a plurality of user terminals 100 are connected to the Internet 102 via respective Internet access servers 106 which are each connected to an audio processor 104. The audio processors 104 are each capable of mixing a plurality of audio data streams received over the network. Although only two audio processors 104 are shown in the network of Figure 1, in practice any number of audio processors may be provided and distributed throughout the network for receiving, mixing or transmitting respective audio data streams. The audio data streams are typically transmitted using the Internet standard data stream transfer protocol Real Time Protocol (RTP) and underlying multicast transport protocols, although unicast could also be used.
Referring to Figure 2, logically the network of Figure 1 can be considered to comprise a plurality of audio data stream sources 200, audio data stream sinks 202 and audio mixer 204 type components. Thus, each terminal 100 and audio processor 104 may comprise at least one source, sink or mixer component, and the network of Figure 1 may comprise a plurality of these components linked together by audio data streams 206. In Figure 2 a simple logical network topology is shown in which the source and sink components are positioned at the terminal nodes of the network and the mixer components are positioned at the non-terminal nodes. In the network of Figure 2 each source 200 transmits one or more audio data streams to a respective parent mixer node 204. The mixer nodes may then forward the received audio data stream or streams directly to the other connected nodes, mix selected ones of the received audio data streams and then forward the respective mixed data streams, or perform a combination of these two actions. Each sink eventually receives each of the data streams, either in the original un-mixed state or as part of a new mixed stream. As will be described in greater detail later, each mixer component may additionally create multiple mixes from arbitrary subsets of the received data streams and transmit these mixes instead of, or in addition to, the audio data streams it would otherwise forward to other nodes in the network. For example, each mixer component may select to mix only some of its incoming audio data streams and forward the resultant partial mix on to other connected nodes.
In the arrangement of Figure 3 the audio data stream processors are integrated with selected IP multicast enabled routers 300 distributed in the network of Figure 1. Each audio processor 104 comprises an audio data stream receiver 302 for receiving audio data streams transmitted over the IP network, an audio mixer component 304 for mixing selected audio data streams and an audio data stream transmitter 306 for forwarding mixed and unmixed audio data streams to other audio processors 104 or terminals 100 in the IP network. The audio data stream receiver 302 and transmitter 306 comprise software components of the type implemented in audio applications such as Real Inc's RealPlayer or Microsoft Inc's Media Player, for example IP sockets, packet buffers, packet assemblers etc. The audio mixer component 304 is provided for mixing two or more audio data streams received at the processor 104. The audio mixer component is arranged to average selected audio data streams to create a single combined data stream comprising the same number of bits as each unmixed data stream. Each audio processor is further provided with a congestion monitor 308 for monitoring congestion on the respective communication transmission links connecting the audio processor to other parts of the network. In one arrangement the congestion monitor 308 utilises Real Time Control Protocol (RTCP) control messages or the like, received from respective audio stream receivers in the network, to determine congestion levels on respective transmission links. These control messages provide the audio processors with information relating to data stream congestion on respective neighbouring transmission links and audio processors. Selected audio processors, typically those on the edge of the network connecting terminals 100 to the Internet, are also provided with a database 310 containing data relating to recipient terminal equipment characteristics (including audio data processing and playback capabilities, and network connection types and speeds), other recipient-specific data (including current tariff data for determining an appropriate quality of service to be provided), and user-specific audio data stream mixing profiles comprising user-defined mixing preferences and other data relating to user-specific mixing policies.
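By way of illustration only, the averaging operation described above might be sketched as follows (a minimal Python sketch assuming signed 8-bit linear samples; the patent does not prescribe any particular sample format or implementation, and all names here are hypothetical):

```python
import array

def mix_streams(streams):
    """Average several equal-length blocks of signed 8-bit samples into a
    single block of the same length, so that the mixed stream occupies the
    same number of bits as each unmixed stream."""
    assert streams and all(len(s) == len(streams[0]) for s in streams)
    n = len(streams)
    return array.array('b', (sum(col) // n for col in zip(*streams)))

# Two 4-sample streams are averaged into one 4-sample stream.
a = array.array('b', [10, 20, 30, 40])
b = array.array('b', [-10, 0, 10, 20])
print(list(mix_streams([a, b])))  # [0, 10, 20, 30]
```

Averaging rather than summing keeps the mixed samples within the original sample range, which is why the combined stream comprises no more bits than any single unmixed stream.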
Mixing may occur at many of the audio processor mixer nodes in the network of Figure 1. When an audio processor mixes selected audio streams, the number of audio data streams to be transmitted to neighbouring audio processor or terminal nodes is reduced, and this also reduces the amount of processing required at the neighbouring nodes. Distributing the audio processor mixer nodes throughout the network as described enables the overall audio mixing task to be shared and mixing bottlenecks to be avoided. Distributed processing not only provides for scalability, say in terms of the maximum number of simultaneous speakers allowed in an audio conference, but also enables audio processors to monitor and respond to changing local network conditions in heterogeneous environments such as the Internet. Since mixing reduces the quality of the audio streams being mixed, the audio processors are arranged to limit the amount of mixing they perform.
Mixing decisions are made dynamically by the audio data processors according to software program logic stored in audio processor memory and executed by the audio processors.
Referring now to the flowchart of Figure 4, network conditions such as available bandwidth, packet loss rate and delay are monitored by the congestion monitor 308 in step 400 to determine current values for respective network resource parameters associated with transmission links and neighbouring processor and terminal nodes located in the network. Current values for variable network resource parameters, including but not limited to available bandwidth on respective transmission links and the processing capacity and delay of respective processor nodes, are stored in the database 310 in step 402. Current values for other, more stable network resource parameters, including but not limited to terminal node processing capability and capacity, user specific mixing preferences and quality of service requirements, are also stored in the processor databases 310. The more stable resource parameters may be monitored in step 400 in the same way as the variable resource parameters, or by periodically polling the resources for current parameter values.
In step 404 the receiver 302 monitors relevant unicast or multicast communication channels for incoming audio data streams and determines in step 406 whether any streams are being received at the audio processor node. If audio data streams are being received, the audio processor proceeds to step 408, where an appropriate algorithm determines the network resources necessary for transmitting the received data streams to the next relevant node or nodes in the network. If audio data streams are not being received, monitoring continues and control passes back to step 400. In step 408 the network resources necessary for forwarding all the received data streams to each respective node in the relevant network distribution tree are determined, and in step 410 the respective available resources for transmitting the data streams are determined. In step 412 the current values of the relevant network resource parameters are compared with the respective resource requirements necessary for communicating the received audio streams to the respective next network nodes comprising audio data stream receivers. If the available resources match those required for subsequent communication of the data streams, all the data streams are transmitted onwards in step 420 to the relevant next nodes. However, if the audio processors determine in step 412 that there are insufficient resources, mixing of selected streams occurs. For instance, if there are only three communications channels available on a particular transmission link and four separate data streams are received, at least one pair of data streams is mixed prior to subsequent transmission by the respective audio processor. Similarly, if a recipient terminal node is only capable of processing two data streams simultaneously, only two sets of mixed streams are provided for transmission to that terminal. Further, if a terminal node is designated as having a predetermined quality of service as defined by a user-selected tariff, there may be insufficient bandwidth allocated for forwarding all the streams without mixing, so that mixing will occur even if the network has sufficient bandwidth resources available on the relevant link or links to the user terminal node. In step 414 the difference between available and required resources is determined so that the number of streams to be mixed can be determined. The audio data stream processors select appropriate data streams to be mixed in step 416. The selected streams are mixed in step 418 and then transmitted by the appropriate audio processor transmitter 306 in step 420. Audio data stream selection in step 416 may be based on any number of considerations relevant to network, application or user requirements, for example.
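The decision logic of steps 412 to 418 may be illustrated with a short sketch. The following Python fragment is a hypothetical illustration only (the Stream type, its priority field and the plan_transmission function are invented for this example): it forwards every stream when resources suffice, and otherwise folds the lowest-priority streams into a single mix so that the number of outgoing streams fits the available resources.

```python
from dataclasses import dataclass

@dataclass
class Stream:
    name: str
    priority: int  # lower value = mixed first

def plan_transmission(incoming, available_slots):
    """Sketch of steps 412 to 418: forward all streams if resources allow,
    otherwise mix just enough low-priority streams to fit the link."""
    if len(incoming) <= available_slots:
        return [[s] for s in incoming]             # step 420: forward all
    surplus = len(incoming) - available_slots + 1  # step 414: size of the mix
    by_priority = sorted(incoming, key=lambda s: s.priority)
    mix_group, forwarded = by_priority[:surplus], by_priority[surplus:]
    return [mix_group] + [[s] for s in forwarded]  # each inner list is one outgoing stream

# Four streams but only three channels: the two lowest-priority streams
# are grouped into one mix and the remaining two are forwarded unmixed.
streams = [Stream("referee", 9), Stream("p1", 1), Stream("p2", 1), Stream("p3", 2)]
for out in plan_transmission(streams, 3):
    print([s.name for s in out])  # ['p1', 'p2'], then ['p3'], then ['referee']
```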
The following discussion concerns mixing considerations that may be relevant in CSCW (computer-supported cooperative work) environments which support many simultaneous speakers.
One consideration for mixing may concern the roles of speakers. Some real-time CSCW applications assign participants to different roles within an event. For example, early experiments in inhabited television differentiated between performers, inhabitants and viewers. Performers are part of the core content of an on-line TV show, whereas inhabitants are active within the virtual world but typically receive a broadcast mix created by a director. These roles are complemented by differences in the technologies used to access the real-time event. Performers typically use professional studio-quality equipment, with fully spatialised 3D audio. Inhabitants may use commodity PCs, equipped with headphones. Viewers, on the other hand, may use conventional television sets equipped with multiple-loudspeaker surround sound audio systems. Roles or so-called "layers of participation" can determine mixing policy. For instance, it may be appropriate to ensure that performers are heard with the maximum possible audio quality. Thus, as network congestion increases, the audio streams for inhabitants might be mixed together first, with the performers' streams being kept separate for as long as possible. Other CSCW applications may also benefit from defining layers of participation and using these to prioritise audio sources for mixing.
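As a purely illustrative sketch of such a policy (the role names and weights below are hypothetical and not prescribed by the invention), roles might be mapped to numeric priorities, with lower values becoming mixing candidates first as congestion rises:

```python
# Hypothetical "layers of participation" weights: streams with lower weights
# are folded into mixes first, so performers stay separate for as long as possible.
ROLE_PRIORITY = {"inhabitant": 0, "viewer": 1, "performer": 2}

def mixing_candidates(sources):
    """sources: list of (name, role) pairs, returned in the order in which
    they would be mixed together as network resources become scarce."""
    return sorted(sources, key=lambda s: ROLE_PRIORITY[s[1]])

print(mixing_candidates([("alice", "performer"), ("bob", "inhabitant"),
                         ("carol", "viewer")]))
# [('bob', 'inhabitant'), ('carol', 'viewer'), ('alice', 'performer')]
```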
Another consideration for mixing may concern the roles of listeners or recipients. Mixing can also be prioritised according to listener requirements. Roles or layers of participation can also define the different ways in which listeners take part in an event, although many participants will be both speakers and listeners in an event. For example, an active inhabitant may benefit from fully spatialised audio that provides cues to support navigation and conversation management. A passive viewer with a surround-sound audio system may benefit from a mix that clearly separates the key performers, but where their accurate location in the world is less important. In the case of inhabitants it may be important to maintain the separation of streams from nearby participants, whereas for viewers it may be appropriate to maintain the separation of key performers only.
A further mixing consideration may concern the grouping of audio sources. CSCW applications often group participants in some pre-determined way, and it is often appropriate to mix audio streams from one coherent group. For example, avatars in a CVE (collaborative virtual environment) may have gathered together to form definable and separate groups. Audio streams from each group could be mixed to form a single stream that could be spatialised to the average position of the group as a whole in the CVE (see the sketch below). In addition, some CSCW applications calculate levels of mutual awareness among participants, which may provide a more dynamic basis for mixing respective audio streams.

Another mixing consideration may concern voice characteristics. The timbre of voices or other audio sources may be useful for determining which streams to mix. For example, it may be appropriate to mix a high and a low pitch voice into a single stream so that a listener can readily separate them when hearing the mixed stream.
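Returning to the grouping consideration above, the average position at which a group's mixed stream could be spatialised might be computed as follows (a minimal sketch; the patent leaves the spatialisation method open and the function name is hypothetical):

```python
def group_mix_position(member_positions):
    """Centroid of a group's (x, y, z) positions in the virtual world; the
    group's single mixed stream would be spatialised at this point."""
    n = len(member_positions)
    return tuple(sum(axis) / n for axis in zip(*member_positions))

# Three gathered avatars collapse to one mixed stream placed at their centroid.
print(group_mix_position([(0, 0, 0), (2, 0, 0), (1, 3, 0)]))  # (1.0, 1.0, 0.0)
```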
Patterns of activity within a multiple speaker environment may also determine mixing decisions. For example, audio data streams from participants whose speech rarely overlaps could be mixed together.
A number of more practical concerns may also affect mixing decisions. Mixing may depend upon aspects of the available communication network, including its topology (i.e. shape and structure), underlying bandwidth, regional variations, or transitory congestion. Mixing decisions may also depend on the available computing resources, for example the number and capability of available mixer components, how many are positioned within the network, and how heavily loaded they are. Mixing decisions may also consider the current and past states of the system. For example, the transition from one choice of mixed streams to another may be noticeable to users, and potentially undesirable.
It is clear from the above discussion that the process of selecting data streams for optimal mixing is a complex task and will very often be application specific. For instance, different applications may have different mixing requirements. In particular, applications may have different ways of assigning priorities to audio streams to determine the order in which streams are gradually mixed together as network resources become scarce; low-priority streams will be mixed before higher-priority streams. These requirements may also vary between different phases of the same application as a session progresses. In particular, priorities may change as participants take on different roles or move to different locations. For example, a virtual football game with a crowd (audience) will not have the same mixing requirements as a virtual shopping mall, a virtual education application (on-line demonstration, lecture etc.) or an on-line television drama in a virtual world.
In a virtual football game, for example, audio priority will be higher for, say, the referee than for the other players. In this respect, the system will avoid mixing the referee's audio stream with the respective players' audio streams unless this becomes absolutely necessary due to network resource limitations. On the other hand, audio streams from a crowd of spectators will have a lower priority, since each spectator will not need to be heard individually. It may be sufficient to mix all the streams from the same "stand" or group of collocated spectators and spatialise only the resulting stream at the receiver. In this way mixing is based on the roles of the speakers; that is to say, audio streams from more important speakers are forwarded and those from less important speakers are mixed.
In a virtual shopping mall, mixing policy may be based on the collocation and mutual awareness of the participants. For example, participants may want to receive separate audio streams from other collocated participants, or from other more important participants of whom they are more aware than others. The remainder of the group can be mixed together, or divided into smaller groups which are mixed separately, similar to the different stands in the football game example.
In a virtual lecture environment there is likely to be one lecturer, perhaps a few demonstrators and many mutually aware students. In this environment mixing can be based on speaker roles. For instance, the lecturer's and demonstrators' audio streams may be forwarded, so that the other participants can process the individual streams on receipt, whereas the respective students' audio streams will be mixed.

The present inventors have implemented and tested the invention in a distributed processing system simulator 500 shown in Figure 5. In the distributed processing system two end user systems 502a and 502b are shown on a first local area network 506 and two end user systems 502c and 502d on a second local area network 512. The system 500 may comprise any number of end user systems 502 depending on the networks being simulated. Each end user system comprises a respective virtual world client 514 for accessing a shared virtual world generated by a virtual world server 516 on LAN 512. Each end user system is also provided with a local audio server 518 that is interfaced to respective audio hardware (not shown) so that users can speak to each other within the virtual world environment. Each client 514 controls the local audio server 518 for the respective end system and uses information in the virtual world to determine how the audio server should transmit, receive and process audio streams, for example according to the positions of other users in the virtual world.
For peer-to-peer audio, each user's audio server 518 sends an audio data stream directly to all the other audio servers in the system using underlying unicast or multicast protocols. An audio processor 104 is provided on each LAN for mixing selected data streams received from the connected audio servers. The audio processors are both controlled directly by the virtual world server 516 and are connected together by means of a WAN simulator 520. The local audio servers 518 and audio processors 104 together define an audio distribution tree, as shown by the dashed lines 522 in Figure 5. Each audio processor is capable of receiving audio data streams from the audio servers on its respective LAN and transmitting these streams to the remote audio processor on the other LAN. The audio processors are arranged to adapt to changing network conditions so that, instead of three separate audio streams being forwarded from, say, end user systems 502a, 502b and 502c to 502d, the audio processors 104 can mix respective data streams so that end user systems 502c and 502d each receive a single stream comprising a mix of the streams from end user systems 502a and 502b, together with a separate audio stream from each other, for example.
The inventors evaluated the effectiveness of dynamic mixing by investigating the effect of network congestion on audio quality. Two quantifiable aspects of audio quality were considered: the level of packet loss experienced, and the degree of audio stream spatialisation, that is, the number of separate audio streams delivered to an end system. These two criteria were chosen since they both relate to the end user's perceived experience of the system and can also be readily determined from measurements of the system 500, for example the number of packets being sent per second. The first measure, the level of packet loss experienced, is the primary determinant of whether a network audio stream will be intelligible to the user and therefore of any use at all. Audio codecs that encode audio streams in 40 ms to 80 ms packets and utilise silence substitution for packet loss recovery typically become unintelligible if 15% or more of the packets are lost during transmission. Other factors such as delay or jitter are of secondary importance by comparison. The second measure, the degree of spatialisation, is the primary distinguishing feature between the peer-to-peer, fully mixed and partially mixed approaches. For instance, research has shown that spatialised audio is a key factor in providing users with a sense of presence in a virtual environment.
Figure 6 shows a system 600, configured from the system of Figure 5, for evaluating the effect of network congestion on audio quality. In Figure 6 the LANs 506 and 512 are assumed to be generally congestion-free, high-bandwidth networks connected via a lower-bandwidth shared WAN 520 which is prone to congestion. Six end user systems 502 are provided on LAN 506 for simulating network usage. A single end user system is provided on LAN 512. A WAN simulation tool is provided for simulating network delays and bandwidth restrictions for a limited-bandwidth WAN connection. An additional application 602 is provided for introducing controlled levels of competing traffic onto the simulated WAN connection 520 in order to create network congestion.
All packets on the system 600 are monitored and analysed to classify the packets and to measure the number of audio streams in transit and the amount of competing traffic. Packet loss experienced by the audio streams in transit is measured by matching the number of packets leaving LAN 506 to those arriving on LAN 512.
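Measured in this way, the loss rate is simply the fraction of packets that fail to cross the WAN, as the following sketch shows (the function name is hypothetical):

```python
def loss_rate(packets_sent, packets_received):
    """Fraction of audio packets lost in transit, measured by matching the
    count leaving LAN 506 against the count arriving on LAN 512."""
    return (packets_sent - packets_received) / packets_sent if packets_sent else 0.0

# 1000 packets leave LAN 506 but only 820 arrive on LAN 512: 18% loss,
# above the roughly 15% level at which codecs relying on silence
# substitution become unintelligible.
print(loss_rate(1000, 820))  # 0.18
```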
The following strategy was used to evaluate the system of Figure 6:-
• Six simulated users on LAN 506 continuously sending audio data to a single user on LAN 512.
• A virtual WAN bandwidth limit of 500 Kbits/s, corresponding to just over seven audio streams for 8 kHz, 8-bit, u-law, mono audio encoding (see the arithmetic sketch after this list), and a WAN buffer size of 250 Kbits.
• Eight levels of competing (congestion-inducing) traffic: 0, 70, 140, 210, 280, 350, 420, and 490 Kbits/s.
• Three audio distribution strategies: forward all audio streams without mixing (equivalent to peer-to-peer multicast), mix all audio streams before forwarding (equivalent to total mixing at LAN 506), and mix a dynamic subset of audio streams (partial mixing).
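The bandwidth figure quoted in the second bullet follows from simple arithmetic, shown here as a small sketch:

```python
# One stream: 8,000 samples/s x 8 bits per u-law sample = 64 Kbits/s,
# so a 500 Kbits/s WAN carries 500 / 64 = 7.8125 streams: "just over seven".
stream_kbits_per_s = 8_000 * 8 / 1_000
print(stream_kbits_per_s)        # 64.0
print(500 / stream_kbits_per_s)  # 7.8125
```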
In the final distribution strategy dynamic or partial mixing was used to keep the packet loss rate below 15% whilst maintaining the maximum number of separate audio streams.
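One way to realise this strategy is a simple feedback loop that trades separate streams against measured loss. The following sketch is a hypothetical illustration of such a controller, not the implementation used in the experiments:

```python
LOSS_LIMIT = 0.15  # intelligibility threshold used in the evaluation

def adjust_separate_streams(current, measured_loss, total_streams):
    """One control step: fold another stream into the mix when loss exceeds
    the limit; try to restore a separate stream once the link has clearly
    recovered."""
    if measured_loss > LOSS_LIMIT and current > 1:
        return current - 1                    # mix more aggressively
    if measured_loss < LOSS_LIMIT / 2 and current < total_streams:
        return current + 1                    # relax towards separate streams
    return current
```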
The experimental results are shown in Figures 7 and 8. Figure 7 shows the effect that increasing levels of congestion have on the packet loss rate experienced for each of the three audio distribution strategies. The peer-to-peer approach (line 700 in the graph of Figure 7) experiences increasing levels of packet loss as competing traffic increases; the packet loss rate exceeds 15% at 210 Kbits/s of additional traffic. Full mixing (line 702 in the drawing) uses the minimum bandwidth throughout, and only starts to experience congestion when the competing traffic reaches 490 Kbits/s. Distributed partial mixing (line 704 in the drawing) gives higher loss rates than full mixing, but much lower rates than all-forwarding peer-to-peer, and maintains its loss rate below 15% even with 490 Kbits/s of competing traffic (as for full mixing).
Figure 8 shows the number of separate audio streams being transmitted to a listener on the end user system on LAN 512. For all-forwarding (line 800 in the graph of Figure 8), six streams are always transmitted by LAN 506; however, none of these arrive in any useful form when competing traffic levels exceed 210 Kbits/s. For total mixing (line 802 in Figure 8), one stream is always sent. Dynamic mixing (line 804 in Figure 8) lies between these two extremes. With no congestion, six distinct streams are transmitted over the WAN connection from LAN 506 to LAN 512. As competing traffic, and hence congestion, increases, dynamic mixing reduces the number of distinct streams by mixing more audio streams together. When competing traffic levels reach 490 Kbits/s, dynamic mixing falls back to total mixing, with only a single stream sent over the WAN.
The above described investigation demonstrates that dynamic or distributed partial mixing combines the benefits of both peer-to-peer and total mixing audio services. With sufficient bandwidth, the system operates like a peer-to-peer system, delivering independent audio streams to each listener, giving maximum individual flexibility and control over what users hear. As bandwidth becomes restricted the distributed partial mixing scheme moves incrementally towards a totally mixed (minimum bandwidth) service, thereby preserving a useful level of audio communication under a wide range of network conditions.
More generally, the distributed partial mixing approach has the following distinctive benefits: it is adaptive, reacting to network congestion in a way that peer-to-peer systems cannot; it supports dynamic load balancing between different distributed components of the audio service; it readily supports heterogeneous networks and different end user terminal capabilities; and it is adaptive to varying application requirements.

Claims

1. A method of processing audio data streams transmitted in a communications network; said method comprising the steps of:- i) receiving a plurality of audio data streams transmitted from one or more audio data stream transmitters distributed in the network; and, ii) processing data relating to at least one respective network resource parameter to determine respective network resources available for subsequent communication of said audio data streams to at least one respective audio data stream receiver in the network; and, iii) comparing said available resources with respective network resource requirements necessary for communicating said audio streams to at least one respective audio data stream receiver in the network; and, iv) determining whether to mix selected audio data streams prior to transmission in response to said comparison.
2. A method according to claim 1 further comprising the step of:- v) processing two or more audio streams in response to said comparison to provide at least one mixed audio data stream for subsequent transmission in said network.
3. A method according to claim 1 or claim 2 wherein steps ii) and iii) comprise the steps of:- determining a current value for the or each respective network resource parameter; and comparing the or each respective current resource parameter value with a respective minimum resource threshold value necessary for communicating said unmixed audio data streams to the or each respective receiver.
4. A method according to claim 3 wherein said minimum resource threshold value is determined according to at least one pre-defined quality of service parameter.
5. A method according to claim 4 wherein said network is a packet switched network and said pre-defined quality of service parameter is defined by a maximum packet loss rate.
6. A method according to any preceding claim wherein one network resource parameter relates to available network bandwidth for transmission of said audio data streams to the or each respective receiver.
7. A method according to claim 6 wherein said available bandwidth capacity is determined by user specific quality of service requirements.
8. A method according to any preceding claim wherein one network resource parameter relates to receiver processing characteristics.
9. A method according to any preceding claim wherein said audio data streams are selected for mixing according to predetermined criteria.
10. A method according to claim 9 wherein said audio streams are mixed according to respective audio stream content.
11. A method according to claim 9 wherein said audio streams are mixed according to respective recipient requirements.
12. A method according to claim 9 wherein said audio streams are mixed according to respective audio stream sources.
13. A method according to claim 9 wherein said audio streams are mixed according to respective receiver audio data stream processing capabilities.
14. A software program for processing audio data streams transmitted in a communications network; said program being arranged to:- i) receive a plurality of audio data streams transmitted from one or more audio data stream transmitters distributed in the network; and, ii) process data relating to at least one respective network resource parameter to determine respective network resources available for subsequent transmission of said audio data streams in the network; and, iii) compare said available resources with respective network resource requirements necessary for transmission of said audio streams to at least one respective audio data stream receiver in the network; and, iv) determine whether to mix selected audio data streams prior to transmission in response to said comparison.
15. A system for processing audio data streams transmitted in a communications network; said system comprising:- i) a receiver for receiving a plurality of audio data streams transmitted from one or more audio data stream transmitters distributed in the network; and, ii) a processor for processing data relating to at least one respective network resource parameter to determine respective network resources available for subsequent transmission of said audio data streams in the network; and, iii) a comparator for comparing said available resources with respective network resource requirements necessary for transmission of said audio streams to at least one respective audio data stream receiver in the network; and, iv) determining means for determining whether to mix selected audio data streams prior to transmission in response to said comparison.
EP01960879A, filed 2001-08-09 with priority date 2000-08-25, "Audio data processing", status: Expired - Lifetime, granted as EP1312188B1 (en).

Priority Applications (1)

EP01960879A: EP1312188B1 (en), "Audio data processing", priority date 2000-08-25, filing date 2001-08-09.

Applications Claiming Priority (4)

EP00307348: priority application, filed 2000-08-25.
PCT/GB2001/003595: WO2002017579A1 (en), "Audio data processing", filed 2001-08-09.
EP01960879A: EP1312188B1 (en), "Audio data processing", filed 2001-08-09.

Publications (2)

EP1312188A1: published 2003-05-21.
EP1312188B1: granted 2007-09-26.

Family

ID: 8173224

Family Applications (1)

EP01960879A: EP1312188B1 (en), "Audio data processing", priority date 2000-08-25, filing date 2001-08-09.

Country Status (6)

US: US20030182001A1 (en)
EP: EP1312188B1 (en)
AU: AU8227201A (en)
CA: CA2419151C (en)
DE: DE60130665T2 (en)
WO: WO2002017579A1 (en)


Also Published As

Publication number Publication date
DE60130665T2 (en) 2008-06-26
WO2002017579A1 (en) 2002-02-28
US20030182001A1 (en) 2003-09-25
WO2002017579A9 (en) 2007-03-08
CA2419151A1 (en) 2002-02-28
CA2419151C (en) 2009-09-08
DE60130665D1 (en) 2007-11-08
EP1312188B1 (en) 2007-09-26
AU2001282272B2 (en) 2007-02-15
AU8227201A (en) 2002-03-04


Legal Events

• PUAI: public reference made under Article 153(3) EPC to a published international application that has entered the European phase (original code: 0009012).
• 17P: request for examination filed. Effective date: 2003-02-11.
• AK: designated contracting states: AT BE CH CY DE DK ES FI FR GB GR IE IT LI LU MC NL PT SE TR.
• RBV: designated contracting states (corrected): DE FR GB.
• GRAP: despatch of communication of intention to grant a patent (original code: EPIDOSNIGR1).
• GRAS: grant fee paid (original code: EPIDOSNIGR3).
• GRAA: (expected) grant (original code: 0009210). Kind code of ref document: B1. Designated states: DE FR GB.
• REG: reference to a national code: GB, legal event code FG4D.
• REF: corresponds to document DE 60130665, date of ref document 2007-11-08, kind code P.
• ET: FR, translation filed.
• PLBE / STAA / 26N: no opposition filed within time limit (original code: 0009261). Effective date: 2008-06-27.
• REG: FR, legal event code PLFP, renewal fees paid for years 16, 17 and 18.
• PGFP: annual fees paid to national offices: DE (payment date 2020-07-21, year of fee payment 20), FR (payment date 2020-07-21, year of fee payment 20), GB (payment date 2020-07-22, year of fee payment 20).
• REG: DE, legal event code R071, ref document 60130665.
• REG: GB, legal event code PE20, expiry date 2021-08-08.
• PG25: lapsed in a contracting state: GB, lapse because of expiration of protection. Effective date: 2021-08-08.