CN100459696C

CN100459696C - Audio mixed processing method and processor

Info

Publication number: CN100459696C
Application number: CNB2006100629521A
Authority: CN
Inventors: 梁丽燕
Original assignee: Huawei Technologies Co Ltd
Current assignee: Huawei Technologies Co Ltd
Priority date: 2006-09-29
Filing date: 2006-09-29
Publication date: 2009-02-04
Anticipated expiration: 2026-09-29
Also published as: CN1941891A

Abstract

The method comprises: when the terminal with the maximum volume changes, performing encoding control separately on the audio signals output to the maximum-volume terminals before and after the change. The invention also discloses an audio mixing processing device comprising a decoder, an audio mixing module, an encoder and an encoder switching module.

Description

Audio mixing processing method and device

Technical field

The present invention relates to the field of audio signal processing, and in particular to an audio mixing processing method and device.

Background technology

As video conferencing becomes more widely used, the demand on the processing resources of the multipoint control unit (MCU) of a video conferencing system keeps growing. Given limited network bandwidth and the requirement that audio quality not be degraded, reducing the resources spent on audio processing makes it easier to support high-quality audio and video protocols, or lets the same audio processing resources serve more audio connections. In the MCU audio mixing of a traditional video conference, most connected terminals hear the same sound in many situations, so these terminals can be processed together rather than individually. This leaves considerable room for saving audio processing resources.

In a traditional video conference, as shown in Figure 1, the MCU processes the audio and video media so that the participating terminals can hear and see one another. The audio part mainly implements mixing among the connected conference sites: every site in the conference can hear the speaking sites, and the speaking sites can also hear one another, which achieves the purpose of remote communication.

Existing solution one:

Audio processing mainly comprises three parts: decoding, audio mixing and encoding. Decoding performs audio decoding for all connected conference sites in order to obtain the raw audio data of every site. Audio mixing first computes and compares the envelopes of the sites' data to find the sites that are speaking (here taken to be the three loudest parties, meaning that only the three parties whose terminals capture the highest speech volume can be heard by the other sites). It then mixes the sound of these three loudest sites: the audio data of all three is superimposed and sent to every other site in the conference, so that every other site hears all three loudest sites; and the data of each pair among the three loudest sites is superimposed and sent to the remaining one, so that each of the three loudest parties hears the other two. Encoding encodes the decoded and mixed audio data for each site and outputs it to that site.

As shown in Figure 2, suppose the conference has terminals 1, 2, 3, 4, 5...N and the three loudest parties are terminals 1, 2 and 3. During audio processing, the data received from all terminals is decoded first. Then, in audio mixing, the envelopes of all sites are computed and compared to obtain the three loudest terminals 1, 2 and 3. The data output to terminal 1 is therefore the superposition of the data of sites 2 and 3, the data output to terminal 2 is the superposition of the data of sites 1 and 3, the data output to terminal 3 is the superposition of the data of sites 1 and 2, and the other terminals receive the superimposed data of terminals 1, 2 and 3. If at the next moment the three loudest parties become terminals 2, 3 and 5, then terminal 2 hears terminals 3 and 5, terminal 3 hears terminals 2 and 5, terminal 5 hears terminals 2 and 3, and the other terminals hear terminals 2, 3 and 5; other cases follow by analogy. Finally, the encoding part encodes the data for each terminal and outputs it to the corresponding terminal. This completes the mixing function for one conference.
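The mixing rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the text does not define how the envelope is calculated, so the sketch assumes mean absolute amplitude over a frame, and the names `envelope`, `loudest`, `superimpose` and `mix_all`, as well as the 16-bit clipping, are hypothetical and not taken from the source.

```python
from typing import Dict, List

Frame = List[int]  # one frame of decoded PCM samples


def envelope(frame: Frame) -> float:
    """Loudness estimate: mean absolute amplitude (an assumed definition)."""
    return sum(abs(s) for s in frame) / len(frame)


def loudest(frames: Dict[int, Frame], k: int = 3) -> List[int]:
    """Return the ids of the k loudest terminals, in ascending id order."""
    ranked = sorted(frames, key=lambda t: envelope(frames[t]), reverse=True)
    return sorted(ranked[:k])


def superimpose(frames: Dict[int, Frame], sources: List[int]) -> Frame:
    """Sum the source frames sample by sample, clipping to the 16-bit range."""
    length = len(next(iter(frames.values())))
    mixed = [sum(frames[t][i] for t in sources) for i in range(length)]
    return [max(-32768, min(32767, s)) for s in mixed]


def mix_all(frames: Dict[int, Frame], k: int = 3) -> Dict[int, Frame]:
    """Each terminal hears the k loudest terminals, excluding itself."""
    top = loudest(frames, k)
    return {t: superimpose(frames, [s for s in top if s != t]) for t in frames}


# Five terminals; 1, 2 and 3 are speaking, 4 and 5 are nearly silent.
frames = {1: [900, -900], 2: [800, -800], 3: [700, -700], 4: [10, -10], 5: [5, -5]}
out = mix_all(frames)
print(loudest(frames), out[1], out[4])
```

Terminal 1 receives the sum of terminals 2 and 3 only, while terminals 4 and 5 receive the identical three-way sum. That identical output is what the later solutions exploit.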

The shortcoming of existing solution one:

In the above technique, the sites that speak in a conference are in many cases relatively fixed, particularly in large conferences, so most terminals hear the sound of the same three loudest parties most of the time. If each terminal is nevertheless allocated its own encoder to encode and output the same data, the number of encoders required equals the number N of connected terminals. When N is large, this wastes resources and increases cost.

Existing solution two:

Existing solution two improves on existing solution one. Its core idea is to merge encoders that perform identical processing wherever possible, so that resource utilization is maximized. As shown in Figure 3, suppose terminals 1, 2 and 3 are the three loudest parties and remain so for a period of time (assumed to be more than 2 s). Terminals 4, 5...N then all need the superimposed audio data of terminals 1, 2 and 3, so a single encoder that encodes this superimposed data can serve the output to terminals 4, 5...N. Three further encoders encode the data output to the three loudest parties, terminals 1, 2 and 3, respectively. That is, encoders C1, C2 and C3 encode for the three loudest sites in the conference, and encoder C4 encodes for all sites other than the three loudest. In this case 1+3=4 encoders are needed, and when the number N of connected terminals is large, this scheme saves a large share of the resources.
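One way to see where the figure of 4 encoders comes from is to group terminals by the set of sources they hear: terminals with the same source set can share one encoder. The sketch below is illustrative only; `source_set` and `shared_encoders` are hypothetical names, and N = 10 is chosen arbitrarily.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, List, Set


def source_set(terminal: int, top: Set[int]) -> FrozenSet[int]:
    """Terminals whose audio is superimposed for `terminal` (never itself)."""
    return frozenset(top - {terminal})


def shared_encoders(terminals: List[int], top: Set[int]) -> Dict[FrozenSet[int], List[int]]:
    """Group terminals by identical source set; one encoder per group."""
    groups: Dict[FrozenSet[int], List[int]] = defaultdict(list)
    for t in terminals:
        groups[source_set(t, top)].append(t)
    return dict(groups)


terminals = list(range(1, 11))  # N = 10 connected terminals
groups = shared_encoders(terminals, {1, 2, 3})
print(len(terminals), "encoders without sharing,", len(groups), "with sharing")
```

With three loudest parties the number of groups stays at 4 however large N becomes, whereas the per-terminal scheme of existing solution one grows linearly with N.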

The shortcoming of existing solution two:

The above assumes that the three loudest parties stay the same. If the speaking sites change during the conference, the three loudest parties in the corresponding mixing process change as well. For example, if the three loudest parties change to terminals 1, 4 and 5, then terminal 1 is sent the superimposed audio data of terminals 4 and 5, terminal 4 that of terminals 1 and 5, terminal 5 that of terminals 1 and 4, and the other terminals that of terminals 1, 4 and 5.

However, when a change in the three loudest parties causes the encoder that serves a terminal to be switched, the sound quality suffers, because an encoder's state depends on the frames it has previously encoded, and a direct switch breaks this continuity. The sound heard at the terminal therefore degrades. For example, in Figure 3 above, the three loudest parties change from 1, 2, 3 to 1, 4, 5. For terminal 2, the data sent to it was originally encoded by encoder C2; after the switch to 1, 4, 5, the data sent to terminal 2 is encoded by encoder C4, so the sound heard at terminal 2 degrades for a period of time around the switch. The same problem occurs for terminals 3, 4 and 5.
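The effect of a direct switch can be illustrated with a toy predictive codec. This is a deliberately simplified, hypothetical model and not the codec used by any real conferencing system: `ToyEncoder` sends each sample as a residual against half of the previous sample, and `ToyDecoder` reconstructs correctly only while its own memory matches the encoder's.

```python
class ToyEncoder:
    """Hypothetical predictive encoder: residual against half the previous sample."""

    def __init__(self) -> None:
        self.prev = 0.0  # state carried from sample to sample

    def encode(self, x: float) -> float:
        residual = x - 0.5 * self.prev
        self.prev = x
        return residual


class ToyDecoder:
    """Matching decoder; exact only while its state tracks the encoder's."""

    def __init__(self) -> None:
        self.prev = 0.0

    def decode(self, residual: float) -> float:
        y = residual + 0.5 * self.prev
        self.prev = y
        return y


c2, c4, terminal2 = ToyEncoder(), ToyEncoder(), ToyDecoder()

# Before the switch: C2 encodes terminal 2's own mix; C4 encodes a different mix.
for own_mix, shared_mix in [(64.0, 16.0), (128.0, 32.0)]:
    assert terminal2.decode(c2.encode(own_mix)) == own_mix
    c4.encode(shared_mix)

# After the switch: terminal 2 is fed from C4, whose state it never tracked.
errors = []
for x in [32.0, 32.0, 32.0, 32.0]:
    errors.append(terminal2.decode(c4.encode(x)) - x)
print(errors)
```

In this toy model the reconstruction error is large immediately after the switch and halves with every sample, which mirrors the transient degradation described above. Real codecs carry far richer state, so the actual behaviour will differ in detail.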

Summary of the invention

To solve the above problems, the invention provides an audio mixing processing method and device that avoid the degradation of the sound heard at the terminals when the three loudest parties in a conference change.

The audio mixing processing method provided by the invention comprises: when the maximum-volume terminals change, performing encoding control separately on the audio signals output to the maximum-volume terminals before and after the change.

The separate encoding control comprises allocating an independent encoder to each of the maximum-volume terminals before and after the change to encode its audio signal.

The invention further comprises merging encoders that perform the same processing after the maximum-volume terminals have remained unchanged for longer than a certain threshold.

Here, the same processing means having identical input and output signals.

The maximum-volume terminals are the terminal or terminals corresponding to the one or more strongest audio signals input to the multipoint control unit (MCU).

The audio mixing processing device disclosed by the invention comprises a decoder, an audio mixing module, an encoder and an encoder switching module, wherein:

Decoder: performs audio decoding on the received audio to obtain the raw audio data;

Audio mixing module: computes the envelope of the audio data produced by the decoder and mixes the several loudest parties;

Encoder: encodes the audio data after mixing;

Encoder switching module: controls the number of encoders performing the encoding and the process of switching between them.

The number of encoders and the switching process are controlled as follows: when the maximum-volume terminals change, an independent encoder is allocated to each of the maximum-volume terminals before and after the change to encode the audio signal output to it; after a certain time, encoders that perform the same processing are merged. In addition, when the maximum-volume terminals change, the encoder switching module controls the exchange of information between the encoders corresponding to those terminals, so that the encoder information and state remain continuous.

With the invention, when the maximum-volume terminals change, the audio data of the connected terminals is decoded and mixed, an independent encoder is allocated to each of the maximum-volume terminals before and after the change, and the audio signal output to each is encoded under this control and sent to the corresponding terminal. Voice quality is thus guaranteed while the number of encoders is kept under control.

Description of drawings

Fig. 1 is a networking diagram of a video conference;

Fig. 2 is a schematic diagram of the audio processing;

Fig. 3 is a schematic diagram of audio processing with merged encoders;

Fig. 4 is a block diagram of the audio processing system of the invention;

Fig. 5 is a flow chart of the audio processing in an embodiment of the invention.

Embodiment

The core idea of the invention is to merge encoders that perform identical processing wherever possible while, when the several loudest parties change, controlling the number of encoders that process the audio signals sent to the terminals and the process of switching between them, so that the audio output quality at the terminals is guaranteed while encoders are saved.

The audio mixing system provided by the invention decodes the data received from the terminals, mixes it, and then outputs the audio signals after encoding them with the controlled encoders. As shown in Figure 4, the system comprises a decoding module, an audio mixing module, an encoding module and an encoder switching module, wherein:

Decoding module: performs audio decoding on the received audio to obtain the raw audio data;

Audio mixing module: computes the envelope of the audio data and mixes the several loudest parties;

Encoding module: encodes the audio data after mixing;

Encoder switching module: controls the number of encoders serving the terminals and the process of switching between them.

In the encoder switching method adopted by the invention, when the maximum-volume terminals change, encoding control is performed separately on the audio signals output to the maximum-volume terminals before and after the change. After the new state has been held for a period of time, encoders that perform identical processing are merged.

The invention is described in detail below with a specific embodiment, as shown in Figure 5:

Suppose a conference has sites 1, 2, 3, 4, 5, 6, 7, 8, 9 and 10. At a certain moment the three loudest terminals are 1, 2 and 3; at the next moment the three loudest terminals are 1, 5 and 6, and this state is held for more than 2 s.

While the three loudest parties are terminals 1, 2 and 3, four encoders are used. Three of them serve the three loudest parties: encoder C1 is allocated to terminal 1 and encodes the superimposed data of terminals 2 and 3; encoder C2 is allocated to terminal 2 and encodes the superimposed data of terminals 1 and 3; encoder C3 is allocated to terminal 3 and encodes the superimposed data of terminals 1 and 2. The fourth serves the terminals other than the three loudest: terminals 4, 5, 6, 7, 8, 9 and 10 share encoder C4, which encodes the superimposed data of terminals 1, 2 and 3.

When the three loudest parties change to 1, 5 and 6, terminals 5 and 6, which newly join the mix, are each allocated an additional encoder, C5 and C6 respectively, to keep the encoders continuous. At the same time, the information of encoder C4, which previously served terminals 5 and 6, is copied to C5 and C6, so that the encoding information and encoding state for terminals 5 and 6 remain continuous. As for terminals 2 and 3, the data sent to them is now the same as the data sent to terminals 4, 7, 8, 9 and 10, but to reduce the effect of encoder switching on the sound, their encoders are retained for the time being. The other terminals 4, 7, 8, 9 and 10 continue to use encoder C4, and terminal 1 continues to use its original encoder C1. Thus, when the three loudest parties change to terminals 1, 5 and 6, six encoders are in use in total.
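The allocation step at the moment of change might be sketched as follows. The `Encoder` class, its `state` dictionary and the helper names are hypothetical stand-ins; a real codec's internal memory would be copied through whatever mechanism that codec provides.

```python
import copy
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class Encoder:
    name: str
    state: Dict[str, float] = field(default_factory=dict)  # stand-in for codec memory


def next_name(assign: Dict[int, Encoder]) -> str:
    """Pick the lowest unused name C1, C2, ... (naming is illustrative only)."""
    used = {e.name for e in assign.values()}
    n = 1
    while f"C{n}" in used:
        n += 1
    return f"C{n}"


def on_top_change(assign: Dict[int, Encoder], shared: Encoder, new_top: Set[int]) -> Dict[int, Encoder]:
    """Give each new loudest terminal its own encoder; keep all other assignments."""
    updated = dict(assign)
    for t in sorted(new_top):
        if updated[t] is shared:
            # Newly loud terminal: clone the shared encoder's state so that
            # the stream this terminal has been decoding stays continuous.
            updated[t] = Encoder(next_name(updated), copy.deepcopy(shared.state))
    return updated


# Before the change: C1..C3 serve terminals 1..3, C4 is shared by terminals 4..10.
c4 = Encoder("C4", {"prev": 0.25})
assign = {1: Encoder("C1"), 2: Encoder("C2"), 3: Encoder("C3")}
assign.update({t: c4 for t in range(4, 11)})

after = on_top_change(assign, c4, {1, 5, 6})
names = sorted({e.name for e in after.values()})
print(names)
```

Terminals 2 and 3 keep C2 and C3 even though their mix now equals the shared one, so six encoders are live after the change. Terminals that already have an independent encoder, such as terminal 1, are left untouched.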

If the state in which the three loudest parties are terminals 1, 5 and 6 is held for more than 2 s (2 s is an assumed value, chosen so that encoder switching affects the sound as little as possible), then for terminals 2 and 3 the data encoded by encoders C2 and C3 is the same as the data encoded by encoder C4. After this period (2 s) of synchronization, the states of C2 and C3 can be regarded as essentially consistent with the state of C4. Encoders C2 and C3 can therefore be reclaimed, and the data encoded by C4 is sent to terminals 2 and 3 as well. That is, terminals 2, 3, 4, 7, 8, 9 and 10 share encoder C4, each of the three loudest terminals uses its own encoder, and the number of encoders returns to 4.
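The merge step can be sketched as a function of how long the current set of loudest terminals has been held. The name `merge_after_hold` and the representation of encoders as plain strings are assumptions made for brevity; the 2 s threshold is the value assumed in the embodiment.

```python
from typing import Dict, Set

HOLD_SECONDS = 2.0  # threshold from the embodiment; a tunable parameter in this sketch


def merge_after_hold(assign: Dict[int, str], top: Set[int], shared: str, held_for: float) -> Dict[int, str]:
    """Move non-loudest terminals back to the shared encoder once the hold time is reached."""
    if held_for < HOLD_SECONDS:
        return dict(assign)  # too early: encoder states may not have converged yet
    return {t: (enc if t in top else shared) for t, enc in assign.items()}


# State right after the change from {1, 2, 3} to {1, 5, 6}: six encoders in use.
assign = {1: "C1", 2: "C2", 3: "C3", 5: "C5", 6: "C6"}
assign.update({t: "C4" for t in (4, 7, 8, 9, 10)})

early = merge_after_hold(assign, {1, 5, 6}, "C4", held_for=0.5)
late = merge_after_hold(assign, {1, 5, 6}, "C4", held_for=2.0)
print(len(set(early.values())), len(set(late.values())))
```

Before the threshold the six encoders are kept; once it is reached, terminals 2 and 3 fall back to C4 and the count drops to four. The sketch relies on the elapsed time alone, on the assumption that C2 and C3 have been encoding the same data as C4 throughout the hold period.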

If the three loudest terminals change again within the 2 s, a new encoder is allocated to any of the new three loudest terminals that did not previously have an independent encoder; otherwise no new encoder is needed. For the encoder of a terminal outside the three loudest, if the data it encodes is the same as the data encoded by the shared encoder C4 and this has lasted for 2 s or more, the terminal's encoder can be reclaimed and the terminal is served with the data encoded by the shared encoder C4. Other cases follow by analogy.

In summary, the audio mixing system in a video conference can be divided into decoding, audio mixing, encoder switching and encoding. After the audio data of the connected terminals is decoded and mixed, the data to be encoded is output according to the encoder switching method described above, and this data is then encoded and sent to the corresponding terminals. Voice quality is guaranteed while the number of encoders is kept under control.

The above is only a preferred embodiment of the invention, but the scope of protection of the invention is not limited to it. Any variation or replacement that a person skilled in the art could readily conceive within the technical scope disclosed by the invention shall fall within the scope of protection of the invention. The scope of protection of the invention shall therefore be determined by the scope of protection of the claims.

Claims

1. An audio mixing processing method, characterized in that, when the maximum-volume terminals change, an independent encoder is allocated to each of the maximum-volume terminals before and after the change, and encoding control is performed independently on the audio signals output to the maximum-volume terminals before and after the change;

after the maximum-volume terminals have remained unchanged for longer than a certain threshold, encoders that perform the same processing are merged.

2. The method according to claim 1, characterized in that the same processing has identical input and output signals.

3. The method according to claim 1, characterized in that the maximum-volume terminals are the terminal or terminals corresponding to the one or more strongest audio signals input to the multipoint control unit (MCU).

4. An audio mixing processing device, characterized in that the device comprises a decoder, an audio mixing module, an encoder and an encoder switching module, wherein:

Encoder: encodes the audio data after mixing;

Encoder switching module: when the maximum-volume terminals change, allocates an independent encoder to each of the maximum-volume terminals before and after the change to encode the audio signal output to it, and, after the maximum-volume terminals have remained unchanged for longer than a certain threshold, merges encoders that perform the same processing.

5. The device according to claim 4, characterized in that the same processing has identical input and output signals.

6. The device according to claim 4, characterized in that, when the maximum-volume terminals change, the encoder switching module controls the exchange of information between the encoders corresponding to those terminals, so that the encoder information and state remain continuous.