WO2016150235A1 - Method and device for webrtc p2p audio and video call - Google Patents

Method and device for WebRTC P2P audio and video call

Info

Publication number
WO2016150235A1
Authority
WO
WIPO (PCT)
Prior art keywords
webrtc
subtitle
translation
server
subtitles
Prior art date
Application number
PCT/CN2016/070377
Other languages
French (fr)
Chinese (zh)
Inventor
巫妍
Original Assignee
ZTE Corporation (中兴通讯股份有限公司)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by ZTE Corporation (中兴通讯股份有限公司)
Publication of WO2016150235A1

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 7/00: Television systems
    • H04N 7/14: Systems for two-way working

Definitions

  • The invention relates to the field of WebRTC point-to-point (P2P) audio and video call technology, and in particular to a method for WebRTC P2P audio and video calls, a WebRTC server, and a WebRTC client.
  • HTML5: Hyper Text Markup Language 5
  • WebRTC: Web Real-Time Communication
  • The ultimate goal of the WebRTC project is to enable web developers to quickly and easily develop rich browser-based real-time multimedia applications (on browsers such as Chrome and Firefox) without having to download and install any plug-ins. Web developers need not concern themselves with multimedia digital signal processing; the same effect can be achieved by writing a simple JavaScript program.
  • The W3C (World Wide Web Consortium) and other organizations are responsible for formulating the JavaScript (JS) standard API (Application Programming Interface). WebRTC also hopes to build a platform for robust real-time communication between Internet browsers, forming a good ecosystem for developers and browser vendors.
  • WebRTC technology has become part of the HTML5 standard, and as the WebRTC standard has matured, various applications based on WebRTC technology have appeared on the market. These applications are developed with web technology, and because browser vendors have gradually added WebRTC support, applications developed with WebRTC can run on any PC or mobile terminal with a WebRTC-capable browser. This trend has greatly reduced development difficulty and the work of maintaining multiple terminals and versions.
  • Typical application scenarios for WebRTC technology and standards are point-to-point calls, multi-party video conferencing, customer service centers, and distance education. That is, a browser application developed with WebRTC technology can acquire the microphone and camera, share the screen, and transmit streaming media in real-time communication, so that users can converse in real time in the browser.
  • However, the effect and experience of multi-party audio/video conference calls in browsers developed with the WebRTC standard interface still need improvement. Because the screen windows of a multi-party conference are relatively small, it is difficult to judge who is speaking, and conference speech records can only be saved as recordings; subtitles cannot be saved. Moreover, when participants use different languages, displaying subtitles would help bridge the language barrier and improve the user experience.
  • The technical problem to be solved by the present invention is to provide a WebRTC point-to-point audio and video call method, a WebRTC server, and a WebRTC client, so as to implement calls across language barriers.
  • A method for web real-time communication (WebRTC) point-to-point audio and video calls comprises:
  • after receiving a subtitle request message or a translated-subtitle request message from a first WebRTC client, the WebRTC server sends the subtitle request message or the translated-subtitle request message to one or more target WebRTC clients;
  • after receiving the subtitles or translated subtitles returned by the target WebRTC clients, the WebRTC server sends the subtitles or translated subtitles to the first WebRTC client in real time.
  • the subtitle request message includes: a translation source language, a translation target language, and a translation return type
  • the translation return type includes a text translation and/or a speech translation
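To make the message fields above concrete, here is a minimal sketch of what such a subtitle / translated-subtitle request message could look like as a JavaScript object. The field names and the wire format are illustrative assumptions, not taken from the patent.

```javascript
// Hypothetical builder for the subtitle request message described above.
// All field names are assumptions made for illustration.
function buildSubtitleRequest({ targets, sourceLang, targetLang, returnType }) {
  if (!['text', 'speech', 'text+speech'].includes(returnType)) {
    throw new Error('returnType must be text, speech, or text+speech');
  }
  return {
    // a plain subtitle request has no target language; a translated-subtitle
    // request carries the translation target language
    type: targetLang ? 'translated-subtitle-request' : 'subtitle-request',
    targets,                              // one or more target WebRTC client ids
    translationSourceLanguage: sourceLang,
    translationTargetLanguage: targetLang,
    translationReturnType: returnType,    // text and/or speech translation
  };
}

const req = buildSubtitleRequest({
  targets: ['userB'],
  sourceLang: 'en',
  targetLang: 'zh',
  returnType: 'text',
});
```

A client would serialize such an object and send it over the signaling channel; the signaling server only needs `targets` to route it.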
  • A web real-time communication (WebRTC) server includes a first transmission module and a second transmission module, wherein
  • the first transmission module is configured to: after receiving a subtitle request message or a translated-subtitle request message from a first WebRTC client, send the subtitle request message or the translated-subtitle request message to one or more target WebRTC clients;
  • the second transmission module is configured to: after receiving the subtitles or translated subtitles returned by the one or more target WebRTC clients, send the subtitles or translated subtitles to the first WebRTC client in real time.
  • the subtitle request message includes: a translation source language, a translation target language, and a translation return type
  • the translation return type includes a text translation and/or a speech translation
  • A method for web real-time communication (WebRTC) point-to-point audio and video calls comprises:
  • the WebRTC client sends, to the WebRTC server, a subtitle request message or a translated-subtitle request message directed at one or more target WebRTC clients;
  • after receiving the subtitles or translated subtitles returned by the WebRTC server, the WebRTC client displays them in the video frame of the corresponding target WebRTC client.
  • the subtitle request message includes: a translation source language, a translation target language, and a translation return type
  • the translation return type includes a text translation and/or a speech translation
  • the method further includes:
  • the WebRTC client saves the subtitle or the translated subtitle.
  • A WebRTC client includes a sending module and a display module, wherein
  • the sending module is configured to: send, to the WebRTC server, a subtitle request message or a translated-subtitle request message directed at one or more target WebRTC clients;
  • the display module is configured to: after receiving the subtitles or translated subtitles returned by the WebRTC server, display them in the video frame of the corresponding target WebRTC client.
  • the client further includes a save module, wherein
  • the saving module is configured to: save the subtitle or the translated subtitle.
  • A method for web real-time communication (WebRTC) point-to-point audio and video calls comprises:
  • after receiving a subtitle request message from the WebRTC server, the WebRTC client sends its own audio to a voice analysis subtitle server;
  • after receiving the subtitles returned by the voice analysis subtitle server, the WebRTC client returns the subtitles to the WebRTC server.
  • The step in which the WebRTC client receives the subtitles returned by the voice analysis subtitle server and returns them to the WebRTC server includes:
  • after receiving the subtitles returned by the voice analysis subtitle server, the WebRTC client sends a subtitle request to a translation server, where the subtitle request includes the subtitles, a translation source language, and a translation target language;
  • after receiving the translated subtitles returned by the translation server, the WebRTC client sends the translated subtitles to the WebRTC server.
  • the subtitle request further includes: a translation return type, where the translation return type includes a voice translation;
  • the method further includes: after receiving the translated audio returned by the translation server, the WebRTC client puts the translated audio into the real-time video stream and sends it, through a pre-established media channel, to the WebRTC client that requested the subtitles.
  • a WebRTC client includes: a first transmission module and a second transmission module, wherein
  • the first transmission module is configured to: after receiving the subtitle request message of the WebRTC server, send the audio to the voice analysis subtitle server;
  • the second transmission module is configured to: return the subtitles to the WebRTC server after receiving the subtitles returned by the speech analysis subtitle server.
  • The second transmission module is configured to return the subtitles to the WebRTC server, after receiving the subtitles returned by the voice analysis subtitle server, in the following manner:
  • after receiving the subtitles returned by the voice analysis subtitle server, sending a subtitle request to the translation server, where the subtitle request includes the subtitles, a translation source language, and a translation target language;
  • after receiving the translated subtitles returned by the translation server, sending the translated subtitles to the WebRTC server.
  • the subtitle request further includes: a translation return type, the translation return type including a voice translation;
  • the WebRTC client further includes a third transmission module, wherein
  • the third transmission module is configured to: after receiving the translated audio returned by the translation server, put the translated audio into the real-time video stream and send it, through a pre-established media channel, to the WebRTC client requesting subtitles.
  • The method for WebRTC point-to-point audio and video calls, the WebRTC server, and the WebRTC client provided by the embodiments of the present invention enable users to cross language barriers and make calls more conveniently.
  • The speaker's speech is automatically parsed and displayed as subtitles, translated subtitles, or translated audio. Users can easily determine who is speaking and identify the content of the speech without having to find the speaker among multiple video windows.
  • FIG. 1 is a functional block diagram of a related-art WebRTC server;
  • FIG. 2 is a flowchart of establishing a two-party call using WebRTC technology in the related art;
  • FIG. 3 is a flowchart of requesting subtitles in a WebRTC P2P (peer-to-peer) two-party call according to Embodiment 1 of the present invention;
  • FIG. 4 is a flowchart of requesting translated subtitles in a WebRTC P2P two-party call according to Embodiment 2 of the present invention;
  • FIG. 5 is a schematic diagram of the P2P media channels established when WebRTC sets up a P2P three-party conference;
  • FIG. 6 is a flowchart of requesting subtitles in a WebRTC P2P three-party conference according to Embodiment 3 of the present invention;
  • FIG. 7 is a flowchart of requesting translated subtitles/translated audio in a WebRTC P2P three-party conference according to Embodiment 4 of the present invention;
  • FIG. 8 is a schematic diagram of a WebRTC server according to an embodiment of the present invention;
  • FIG. 9 is a schematic diagram of a WebRTC client as the subtitle-requesting side according to an embodiment of the present invention;
  • FIG. 10 is a schematic diagram of a target WebRTC client according to an embodiment of the present invention.
  • The WebRTC server includes:
  • Web server: provides the WebRTC web service. The user accesses the web server from a browser application client to obtain the WebRTC service; that is, the user opens the application by accessing the web server function module of the WebRTC server through the browser.
  • The service deployed on the web server complies with the relevant WebRTC standards; through the standard WebRTC JS in the browser, the user can register, establish audio calls, establish multi-party video calls, and use other functions.
  • The web server can also include application management functions outside the standard, such as user information maintenance and friend management.
  • Signaling server: used for signaling interaction when WebRTC establishes a connection.
  • Media processing module: used for processing media, including segmenting the real-time media stream, sending it to the external subtitle server and translation server, and integrating the returned subtitles or audio into the audio/video stream of the real-time conversation.
  • Conference control module: controls the conference in a WebRTC conference, including creating a conference, exiting a conference, joining conference members, and conference-host control.
  • Firewall traversal server: used for firewall traversal in WebRTC audio/video conferences and calls.
  • The firewall traversal function module enables application developers on the WebRTC browser side to use standard interfaces to obtain firewall traversal information.
  • This function module can be deployed on the WebRTC server or elsewhere.
  • The WebRTC client refers to the browser-side application deployed at the address the user accesses through the browser; the user accesses the web server on the WebRTC server through the WebRTC client.
  • The WebRTC P2P audio/video conference or call realized by the device enables users to hold calls and meetings in multiple languages in real time, providing synchronized subtitle translation of the audio/video stream or direct translation into speech, so that users can cross language barriers and communicate more conveniently.
  • The WebRTC P2P audio/video conference/call application mainly has the following features: 1. users can view subtitles of the other party's speech in real time in the conference or call; 2. users can choose a translation target language, and the system translates the other party's language into one they understand and displays the translated subtitles; 3. users can choose a translation target language, and the system translates the other party's speech into the target language, displaying the translated subtitles while playing the translated speech.
  • FIG. 2 is a flow diagram of implementing a point-to-point call using WebRTC technology. This flowchart covers the core functions of the various functional modules in the WebRTC server during a WebRTC point-to-point call.
  • In the flowchart, user A represents user A's browser and client application. The client application uses the web service provided by the web server function module deployed on the WebRTC server; user A opens the application by opening an address in the browser.
  • As shown in FIG. 2, this process includes the following steps:
  • Step 201: User A requests firewall traversal information from the firewall traversal server, which returns traversal information to user A;
  • Step 202: User A sends a media call request to the signaling server in the WebRTC server;
  • Step 203: The signaling server forwards A's media call request to user B;
  • Step 204: User B requests firewall traversal information from the firewall traversal server, which returns traversal information to user B;
  • Step 205: User B sends a response to the signaling server;
  • Step 206: The media connection between user A and user B is established, and A and B can make a point-to-point call over the media link.
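The offer/relay/answer round trip of steps 202-205 can be sketched as a tiny in-memory simulation. A real deployment would use WebSockets and `RTCPeerConnection`; here the signaling server and clients are plain objects (all names are illustrative) so the message routing itself is visible.

```javascript
// Minimal simulation of the signaling flow in steps 202-205.
class SignalingServer {
  constructor() { this.clients = new Map(); }
  register(id, client) { this.clients.set(id, client); }
  relay(from, to, message) {
    // Steps 203 / 205: the signaling server forwards requests and answers.
    this.clients.get(to).onSignal({ from, ...message });
  }
}

class Client {
  constructor(id, server) {
    this.id = id; this.server = server; this.log = [];
    server.register(id, this);
  }
  call(peerId) {                       // Step 202: send media call request
    this.server.relay(this.id, peerId, { kind: 'offer' });
  }
  onSignal(msg) {
    this.log.push(msg.kind);
    if (msg.kind === 'offer') {        // Step 205: answer the incoming call
      this.server.relay(this.id, msg.from, { kind: 'answer' });
    }
  }
}

const server = new SignalingServer();
const userA = new Client('A', server);
const userB = new Client('B', server);
userA.call('B');   // after this exchange, A holds B's answer (step 206 follows)
```

In the real protocol the `offer`/`answer` payloads would carry SDP descriptions and the clients would also exchange the firewall traversal (ICE) information from steps 201 and 204.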
  • The above steps are the process of making a point-to-point call in the browser using the WebRTC protocol, and are also the typical process by which existing WebRTC implementations realize point-to-point calls.
  • The improvement made by the embodiments of the present invention to the WebRTC P2P video call procedure comes mainly after the two parties' P2P media channel or data channel has been established; establishing the media channel is a standard WebRTC process and a precondition of the embodiments. After the parties' P2P media channel is established, subtitles or translated subtitles can still be requested through the signaling server of the WebRTC server, which is the point of the present invention.
  • the embodiment of the invention provides a WebRTC point-to-point audio and video call method and a WebRTC server and a WebRTC client, so that the user can cross the language barrier and make the call more convenient.
  • The speaker's speech is automatically parsed and displayed as subtitles, and the user can easily determine who is speaking without having to find the speaker among multiple video windows.
  • This system architecture also provides full multi-language subtitle translation and speech translation.
  • Subtitle translation means performing speech analysis on a speaking user and translating the resulting real-time speech text into subtitles in the requested target language.
  • Speech translation means performing speech analysis on a speaking user to form text, translating that text into subtitles in the requested target language, and then converting the subtitles into corresponding audio in the requested language.
  • The method of the embodiments of the present invention can perform speech analysis on a speaking member's speech, form text, and display it as subtitles. Further, the parsed text can be translated to display subtitles in the translation target language; further still, the text in the target language can be converted into speech, with the converted audio stream synthesized into the video stream so that speech in the translation target language is played directly.
  • FIG. 3 shows the operation of requesting subtitles during a WebRTC P2P call. It is assumed that user A and user B have established media channels according to the process of FIG. 2 or through the WebRTC application itself, and that the media channel supports a normal P2P video call. This embodiment describes the flow of user A requesting user B's subtitles during a P2P video call.
  • Step 301: User A sends a subtitle request message to the signaling server of the WebRTC server;
  • Step 302: The WebRTC signaling server sends the subtitle request message to user B;
  • Step 303: After receiving the subtitle request, user B sends its own audio to the voice analysis subtitle server;
  • Step 304: The voice analysis subtitle server parses the audio into subtitles and returns them to user B;
  • Step 305: User B returns the subtitles to the WebRTC signaling server;
  • Step 306: The WebRTC signaling server returns the subtitles to user A, and user A's browser displays B's subtitles in B's video frame.
  • The voice analysis subtitle server is an external server and is not part of the inventive content of the present invention.
  • Its main function is to analyze audio in real time, parsing speech into subtitles and returning them.
  • The user's browser-side client must send the audio portion of the video stream to the voice analysis subtitle server in real time so that the speech can be parsed in real time; the rules of audio segmentation are decided by the browser-side client according to the user's speaking habits and voice pauses.
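The pause-based segmentation rule mentioned above can be sketched as a pure function over audio samples: the client cuts the outgoing audio at silences before sending each chunk to the speech-analysis server. The thresholds and the function itself are illustrative assumptions, not the patent's algorithm.

```javascript
// Sketch of pause-based audio segmentation: close a segment whenever
// minPauseLen consecutive samples fall below silenceLevel.
function segmentOnPauses(samples, { silenceLevel = 0.05, minPauseLen = 3 } = {}) {
  const segments = [];
  let current = [];
  let quiet = 0;
  for (const s of samples) {
    if (Math.abs(s) < silenceLevel) {
      quiet += 1;
      if (quiet >= minPauseLen && current.length > 0) {
        segments.push(current);      // pause long enough: close the segment
        current = [];
      }
    } else {
      quiet = 0;
      current.push(s);
    }
  }
  if (current.length > 0) segments.push(current);
  return segments;
}

// Two bursts of speech separated by a three-sample pause yield two segments.
const segments = segmentOnPauses([0.5, 0.6, 0, 0, 0, 0.4, 0.3]);
```

A real client would apply the same idea to `AudioBuffer` frames (or use voice-activity detection) rather than raw sample arrays.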
  • The flow of this embodiment covers user A requesting user B's subtitles; B can also request A's subtitles at the same time, and the process is the same.
  • Whether subtitles are displayed by default or only on request is set by the WebRTC application itself based on the basic principle of this process.
  • Embodiment 2 is a flow for requesting subtitle translation.
  • Compared with Embodiment 1, the flow in Embodiment 2 adds a step after speech analysis has parsed out the subtitles: the parsed text is sent to an external translation server, which translates the subtitles.
  • FIG. 4 shows the steps of requesting translation of text subtitles in Embodiment 2, wherein:
  • Step 401: User A sends a subtitle request message to the signaling server of the WebRTC server and specifies a translation target language. Assume B speaks English and A wants B's subtitles translated into Chinese and displayed;
  • Step 402: The WebRTC signaling server sends the subtitle request message to user B, where the request message includes a translation source language, a translation target language, and a translation return type (assumed here to be text translation or speech translation);
  • Step 403: After receiving the subtitle request, user B sends its own audio to the voice analysis subtitle server;
  • Step 404: The voice analysis subtitle server parses the audio into subtitles and returns them to user B;
  • Step 405: User B sends a subtitle request to the translation server;
  • the request contains the parsed subtitles, the translation source language, the translation target language, and the translation return type;
  • Step 406a: The translation server returns the translated subtitles to user B according to the translation request;
  • Step 407a: User B returns the translated subtitles to the WebRTC signaling server;
  • Step 408a: The WebRTC signaling server returns the translated subtitles to user A, and user A's browser displays B's translated subtitles in B's video frame;
  • Step 406b: The translation server returns the translated subtitles and audio to user B according to the translation request; user B puts the translated audio into the real-time video stream and sends the video with the translated audio to user A through the media channel;
  • Step 407b: User B returns the translated subtitles to the WebRTC signaling server;
  • Step 408b: The WebRTC signaling server returns the translated subtitles to user A, and user A's browser displays B's translated subtitles in B's video frame.
  • The external translation server selects different operational flows based on the return-type parameter in the request.
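The 406a/406b branch on the return type can be sketched as follows. The translation and text-to-speech steps are stand-in stubs (the real external servers are outside the patent's scope), and all names are illustrative.

```javascript
// Sketch of the translation server's branching on translation return type:
// 'text' returns subtitles only (406a); 'speech' or 'text+speech' also
// returns synthesized audio in the target language (406b).
function handleTranslationRequest({ subtitle, sourceLang, targetLang, returnType }) {
  // Stub translation: a real server would call a machine-translation engine.
  const translated = `[${sourceLang}->${targetLang}] ${subtitle}`;
  const response = { translatedSubtitle: translated };
  if (returnType === 'speech' || returnType === 'text+speech') {
    // Stub text-to-speech: a real server would synthesize target-language audio.
    response.translatedAudio = { lang: targetLang, text: translated };
  }
  return response;
}
```

User B would forward `translatedSubtitle` over the signaling channel and, when present, inject `translatedAudio` into the media stream.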
  • FIG. 5 is a schematic diagram of a three-party P2P call after the media channels have been established.
  • In Embodiment 3, on the basis of FIG. 5, i.e., after the P2P media channel connections have been completed in WebRTC, the processes of subtitle parsing, subtitle translation, and audio translation are added, so that users can cross language barriers during a three-party WebRTC P2P call.
  • FIG. 6 shows the flow of realizing subtitle parsing after WebRTC has completed the P2P media channel connections.
  • Preconditions: user A, user B, and user C have logged in to the WebRTC video conferencing system and established a three-party P2P call; media channels have been established between A, B, and C, while the signaling channel is still handled by the WebRTC signaling server.
  • This embodiment assumes that A requests the subtitles of B and C.
  • Step 601: User A requests the subtitles of user B and user C from the WebRTC signaling server;
  • Step 602: The WebRTC signaling server sends a subtitle request to user C;
  • Step 603: User C sends its own speech audio to the external speech analysis subtitle server and requests subtitle parsing;
  • Step 604: The voice analysis subtitle server returns the subtitles parsed from the speech to C;
  • Step 605: User C returns the real-time subtitles to the WebRTC signaling server;
  • Step 606: The WebRTC signaling server sends a subtitle request to user B;
  • Step 607: User B sends its own speech audio to the external speech analysis subtitle server and requests subtitle parsing;
  • Step 608: The voice analysis subtitle server returns the subtitles parsed from the speech to B;
  • Step 609: User B returns the real-time subtitles to the WebRTC signaling server;
  • Step 610: Upon receiving the subtitles of users B and C, the WebRTC signaling server sends them to user A in real time, and user A displays the subtitles in the video dialog boxes of users B and C according to the returned results.
  • Steps 602 to 605 and steps 606 to 609 can be performed simultaneously; that is, when the WebRTC signaling server receives the subtitle request, it can initiate subtitle requests to users B and C at the same time. While users B and C are speaking, the subtitles are returned to the WebRTC signaling server in real time according to the speech, and the WebRTC signaling server forwards the subtitles to user A as they arrive.
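The concurrent fan-out described above can be sketched with `Promise.all`: once A's single request arrives, the signaling server issues the subtitle requests to B and C side by side rather than sequentially. `requestSubtitle` is a hypothetical stub standing in for the request/response round trip with each target client.

```javascript
// Sketch of steps 602-605 and 606-609 running concurrently: one request
// per target client, all issued at once, results gathered together.
async function fanOutSubtitleRequests(targets, requestSubtitle) {
  const results = await Promise.all(
    targets.map(async (t) => ({ user: t, subtitle: await requestSubtitle(t) }))
  );
  return results; // the server forwards each result on to user A
}

// Hypothetical stand-in for the round trip to a target client.
const fakeRequest = async (user) => `subtitle from ${user}`;
const pending = fanOutSubtitleRequests(['B', 'C'], fakeRequest);
```

In the real system the server would not wait for all targets before forwarding; each subtitle would be pushed to user A the moment it arrives, as the text above notes.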
  • Similarly, B or C can also initiate a subtitle request to the WebRTC signaling server, and the conference can also be set to automatically add subtitles for each user.
  • The browser-side application only needs the user to trigger a subtitle request; the speech analysis subtitle server returns the subtitles, which are then sent to the WebRTC signaling server, and the WebRTC signaling server performs subtitle distribution.
  • Step 701: User A requests the translated subtitles of user B and user C from the WebRTC signaling server;
  • Step 702: The WebRTC signaling server sends a subtitle translation request to user C;
  • Step 703: User C sends its own speech audio to the external speech analysis subtitle server and requests subtitle parsing;
  • Step 704: The voice analysis subtitle server returns the subtitles parsed from the speech to C;
  • Step 705: User C initiates a subtitle request to the external translation server, where the request includes the parsed subtitles, the translation source language, the translation target language, and the translation return type;
  • here the translation return type is text translation;
  • Step 706: The translation server returns the translated subtitles to user C according to the translation request;
  • Step 707: User C returns the translated subtitles to the WebRTC signaling server;
  • Step 708: The WebRTC signaling server sends a subtitle translation request to user B;
  • Step 709: User B sends its own speech audio to the external speech analysis subtitle server and requests subtitle parsing;
  • Step 710: The voice analysis subtitle server returns the subtitles parsed from the speech to B;
  • Step 711: User B initiates a subtitle request to the external translation server, where the request includes the parsed subtitles, the translation source language, the translation target language, and the translation return type;
  • here the translation return type is text translation;
  • Step 712: The translation server returns the translated subtitles to user B according to the translation request;
  • Step 713: User B returns the translated subtitles to the WebRTC signaling server;
  • Step 714: The WebRTC signaling server returns the translated subtitles of B and C to user A.
  • Steps 702 to 707 and steps 708 to 713 can be performed simultaneously; that is, when the WebRTC signaling server receives the subtitle request, it can initiate subtitle requests to users B and C at the same time. While users B and C are speaking, the translated subtitles are returned to the WebRTC signaling server in real time according to the speech, and the WebRTC signaling server transmits them to user A as they arrive, so the subtitles of B or C are displayed in real time.
  • The request only needs to be sent once; the subtitle messages are then returned in real time according to the design of the application. That is, A only needs to request subtitles once; B sends its own audio segments to the external speech analysis subtitle server and the external translation server during the call, and subtitles, translated subtitles, or translated audio are returned according to the segmentation of the speech.
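The "request once, receive continuously" pattern above can be sketched as a simple subscription: A subscribes a single time, and every parsed speech segment afterwards triggers a delivery. Class and method names are illustrative assumptions.

```javascript
// Sketch of the one-request / many-deliveries pattern: after a single
// subscribe (A's subtitle request), each parsed speech segment from the
// speaker is pushed to all subscribers in real time.
class SubtitleFeed {
  constructor() { this.subscribers = []; }
  subscribe(onSubtitle) {            // A's single subtitle request
    this.subscribers.push(onSubtitle);
  }
  push(speaker, subtitle) {          // called once per parsed speech segment
    for (const cb of this.subscribers) cb({ speaker, subtitle });
  }
}

const feed = new SubtitleFeed();
const received = [];
feed.subscribe((s) => received.push(s));   // request sent once
feed.push('B', 'hello');                   // first speech segment
feed.push('B', 'how are you');             // next segment, no new request
```

In the real system the `push` side would be driven by the subtitle/translation servers returning results for each audio segment.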
  • This embodiment assumes that A requests the translated audio and subtitles of B and C. Assume the language used by A is Chinese and the language used by users B and C is English; user A wants the conference speech of B and C translated in the video conference.
  • The flowchart of this embodiment is also shown in FIG. 7 and includes the following steps:
  • Step 801: User A requests the translated subtitles of user B and user C from the WebRTC signaling server;
  • Step 802: The WebRTC signaling server sends a subtitle translation request to user C;
  • Step 803: User C sends its own speech audio to the external speech analysis subtitle server and requests subtitle parsing;
  • Step 804: The voice analysis subtitle server returns the subtitles parsed from the speech to C;
  • Step 805: User C initiates a subtitle request to the external translation server, where the request includes the parsed subtitles, the translation source language, the translation target language, and the translation return type;
  • here the translation return type is text and speech translation;
  • Step 806: The translation server returns the subtitles and the translated audio to user C according to the translation request;
  • Step 807: User C replaces the audio in the related video stream with the translated audio and returns the translated subtitles to the WebRTC signaling server;
  • Step 808: The WebRTC signaling server sends a subtitle translation request to user B;
  • Step 809: User B sends its own speech audio to the external speech analysis subtitle server and requests subtitle parsing;
  • Step 810: The voice analysis subtitle server returns the subtitles parsed from the speech to B;
  • Step 811: User B initiates a subtitle request to the external translation server, where the request includes the parsed subtitles, the translation source language, the translation target language, and the translation return type;
  • here the translation return type is text and speech translation;
  • Step 812: The translation server returns the translated subtitles and audio to User B according to the translation request.
  • Step 813: User B replaces the audio in the related video stream with the translated audio, sends the video and translated audio to User A through the media channel, and returns the translated subtitles to the webrtc signaling server.
  • Step 814: The webrtc signaling server returns the translated subtitles of B and C to User A. The browser application of User A displays the translated subtitles of B in B's video frame and the translated subtitles of C in C's video frame.
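The signaling exchange of steps 801-814 can be sketched as plain message construction and routing. This is only an illustrative sketch: the field names (`type`, `from`, `targets`, `srcLang`, `dstLang`, `returnType`) are assumptions, not part of the webrtc standard or of this embodiment's actual wire format.

```javascript
// A minimal sketch of the signaling messages in steps 801-814.
// All field names are illustrative assumptions.

// Step 801: User A asks the signaling server for translated subtitles of B and C.
function buildTranslationRequest(from, targets, srcLang, dstLang) {
  return {
    type: 'translated-subtitle-request',
    from,                          // requesting client, e.g. 'A'
    targets,                       // target clients, e.g. ['B', 'C']
    srcLang,                       // language spoken by the targets
    dstLang,                       // language the requester reads
    returnType: ['text', 'speech'] // text + speech translation, as in step 805
  };
}

// Step 814: the signaling server hands A the translated subtitles keyed by
// sender, so the browser can render each one in the matching video frame.
function collectForDisplay(translatedSubtitles) {
  const bySender = {};
  for (const item of translatedSubtitles) {
    bySender[item.from] = item.text;
  }
  return bySender;
}

const request = buildTranslationRequest('A', ['B', 'C'], 'en', 'zh');
const frames = collectForDisplay([
  { from: 'B', text: '大家好' },
  { from: 'C', text: '早上好' }
]);
console.log(request.targets, frames);
```

Keying the returned subtitles by sender matches step 814, where A's browser must place each translated subtitle in the correct peer's video frame.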
  • The embodiment of the present invention provides a WebRTC point-to-point audio and video call method that uses webrtc technology to implement speech parsing in video calls and video conferences, generating subtitles, translated subtitles, and translated audio.
  • Session members of a webrtc video conference can view the real-time subtitles of the conference speaker in the conference video window.
  • Speech parsing and speech translation can also be performed in a webrtc point-to-point audio and video call: the speech is parsed into text subtitles displayed in the user's video call window, or translated into speech in another language and synthesized into the original video stream.
  • The translated text can also be saved as conference minutes.
  • The embodiment of the present invention allows users who speak different languages in a call or conference to request subtitle translation or speech translation, and the conference content can be saved as conference minutes in the form of dialog text.
  • FIG. 8 is a schematic diagram of a WebRTC server according to an embodiment of the present invention. As shown in FIG. 8, the WebRTC server of this embodiment includes:
  • the first transmission module 801, configured to: after receiving a subtitle request message or translated-subtitle request message from a first WebRTC client, send the subtitle request message or translated-subtitle request message to one or more target WebRTC clients;
  • the second transmission module 802, configured to: after receiving subtitles or translated subtitles returned by a target WebRTC client, send the subtitles or translated subtitles to the first WebRTC client in real time.
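The relay behavior of the two transmission modules can be sketched as follows. The `SignalingServer` class, its method names, and the injected `send` callback are assumptions for illustration; the real transport (e.g. WebSockets) is omitted.

```javascript
// Illustrative relay logic for the two transmission modules of FIG. 8.
class SignalingServer {
  constructor(send) {
    this.send = send; // send(clientId, message): assumed transport callback
  }

  // First transmission module 801: forward a subtitle request or
  // translated-subtitle request from the first client to each target.
  onSubtitleRequest(firstClientId, targetIds, request) {
    for (const target of targetIds) {
      this.send(target, { ...request, from: firstClientId });
    }
  }

  // Second transmission module 802: relay subtitles returned by a target
  // client back to the requesting client in real time.
  onSubtitleReturned(targetId, firstClientId, subtitle) {
    this.send(firstClientId, { type: 'subtitle', from: targetId, subtitle });
  }
}

// Usage: record what the server would send over the wire.
const sent = [];
const server = new SignalingServer((to, msg) => sent.push({ to, msg }));
server.onSubtitleRequest('A', ['B', 'C'], { type: 'subtitle-request' });
server.onSubtitleReturned('B', 'A', 'hello everyone');
console.log(sent.length); // 3: two requests out, one subtitle back
```

The server only relays text; the heavy media work stays on the clients and external servers, which matches the architecture described in the embodiments.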
  • FIG. 9 is a schematic diagram of a WebRTC client according to an embodiment of the present invention.
  • The WebRTC client can act as the subtitle-requesting party.
  • The WebRTC client in this embodiment includes:
  • the sending module 901, configured to: send to the WebRTC server a subtitle request message or translated-subtitle request message requesting subtitles from one or more target WebRTC clients;
  • the display module 902, configured to: after receiving the subtitles or translated subtitles returned by the WebRTC server, display the subtitles or translated subtitles in the video frame of the corresponding target WebRTC client.
  • The WebRTC client further includes:
  • the saving module 903, configured to: save the subtitles or translated subtitles.
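The requesting-side modules 901-903 could be sketched as below. The class and its internal maps are illustrative stand-ins; a real client would update DOM elements next to each peer's video frame rather than a plain object.

```javascript
// Sketch of the subtitle-requesting WebRTC client of FIG. 9.
class RequestingClient {
  constructor(sendToServer) {
    this.sendToServer = sendToServer; // assumed signaling transport
    this.videoFrames = {};            // targetId -> subtitle currently shown
    this.minutes = [];                // dialog-text conference minutes
  }

  // Sending module 901: request subtitles (or translated subtitles when a
  // target language is given) for one or more target clients.
  requestSubtitles(targets, opts = {}) {
    const type = opts.dstLang ? 'translated-subtitle-request' : 'subtitle-request';
    this.sendToServer({ type, targets, ...opts });
  }

  // Display module 902: show a returned subtitle in the sender's video frame.
  onSubtitle({ from, subtitle }) {
    this.videoFrames[from] = subtitle;
    this.saveSubtitle(from, subtitle);
  }

  // Saving module 903: keep subtitles as conference minutes.
  saveSubtitle(from, subtitle) {
    this.minutes.push(`${from}: ${subtitle}`);
  }
}

const out = [];
const client = new RequestingClient(msg => out.push(msg));
client.requestSubtitles(['B'], { srcLang: 'en', dstLang: 'zh' });
client.onSubtitle({ from: 'B', subtitle: '你好' });
console.log(out[0].type, client.videoFrames.B, client.minutes[0]);
```

Saving each displayed subtitle line as `sender: text` gives the dialog-text conference minutes mentioned in the embodiments.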
  • FIG. 10 is a schematic diagram of a WebRTC client according to an embodiment of the present invention.
  • The WebRTC client can act as a target client.
  • The WebRTC client in this embodiment includes:
  • the first transmission module 1001, configured to: after receiving a subtitle request message from the WebRTC server, send the client's own audio to the speech analysis subtitle server;
  • the second transmission module 1002, configured to: after receiving the subtitles returned by the speech analysis subtitle server, return the subtitles to the WebRTC server.
  • The second transmission module 1002 may be specifically configured to: after receiving the subtitles returned by the speech analysis subtitle server, send a translated-subtitle request to the translation server, the translated-subtitle request including: the subtitles, a translation source language, and a translation target language; and after receiving the translated subtitles returned by the translation server, send the translated subtitles to the WebRTC server.
  • The translated-subtitle request may further include a translation return type, the translation return type including speech translation; and the WebRTC client further includes:
  • the third transmission module 1003, configured to: after receiving the translated audio returned by the translation server, put the translated audio into the real-time video stream and send it through a pre-established media channel to the WebRTC client that requested the translated subtitles.
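The target-client pipeline of modules 1001-1003 can be sketched as follows. The external speech analysis subtitle server and translation server are modeled as injected async functions (`parse`, `translate`), and the server/media-channel outputs as callbacks; all of these interfaces are assumptions for illustration.

```javascript
// Sketch of the target client of FIG. 10, with external servers stubbed
// as async functions whose interfaces are assumed.
class TargetClient {
  constructor({ parse, translate, toServer, toMediaChannel }) {
    this.parse = parse;                   // audio -> subtitle text
    this.translate = translate;           // request -> { text, audio }
    this.toServer = toServer;             // return subtitles to the webrtc server
    this.toMediaChannel = toMediaChannel; // push translated audio to the peer
  }

  async onSubtitleRequest(audio, { srcLang, dstLang, returnType = [] } = {}) {
    // Module 1001: send own audio to the speech analysis subtitle server.
    const subtitle = await this.parse(audio);

    // Plain subtitle request: return the parsed subtitle directly.
    if (!dstLang) {
      this.toServer({ subtitle });
      return;
    }

    // Module 1002: request translation, then return the translated subtitle.
    const result = await this.translate({ subtitle, srcLang, dstLang, returnType });
    this.toServer({ subtitle: result.text });

    // Module 1003: if speech translation was requested, put the translated
    // audio into the real-time stream over the pre-established media channel.
    if (returnType.includes('speech') && result.audio) {
      this.toMediaChannel(result.audio);
    }
  }
}
```

Note that only the translated subtitle text travels back through the signaling server, while translated audio goes over the already-established P2P media channel, mirroring steps 805-813 of the embodiment.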
  • The embodiment of the invention further discloses a computer program comprising program instructions which, when executed by a server, enable the server to perform any of the above server-side web real-time communication WebRTC point-to-point audio and video call methods.
  • The embodiment of the invention also discloses a carrier carrying the computer program.
  • The embodiment of the invention further discloses a computer program comprising program instructions which, when executed by a client acting as the subtitle-requesting party, enable that client to perform any of the above client-side WebRTC point-to-point audio and video call methods for the subtitle-requesting party.
  • The embodiment of the invention also discloses a carrier carrying the computer program.
  • The embodiment of the invention further discloses a computer program comprising program instructions which, when executed by a target client, enable that client to perform any of the above target-client-side web real-time communication WebRTC point-to-point audio and video call methods.
  • The embodiment of the invention also discloses a carrier carrying the computer program.
  • The WebRTC point-to-point audio and video call method, WebRTC server, and WebRTC client provided by the embodiments of the invention enable users to cross language barriers and converse more conveniently.
  • The speaker's speech is automatically parsed and displayed as subtitles, translated subtitles, or translated audio. Users can easily determine who is speaking and understand the speech content without searching for the speaker in multiple video windows. Therefore, the present invention has strong industrial applicability.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)
  • Telephonic Communication Services (AREA)
  • Machine Translation (AREA)

Abstract

A method for a WebRTC point-to-point audio and video call, a WebRTC server, and a WebRTC client, enabling a user to overcome language barriers and converse more conveniently. In a multi-user video conference, the speech of a speaker is automatically parsed and displayed as subtitles, translated subtitles, or translated audio, so the user can easily determine who is speaking and understand the speech content without searching for the speaker in multiple video windows.

Description

Method and device for WebRTC P2P audio and video call

Technical Field

The invention relates to the field of WebRTC P2P audio and video call technology, and in particular to a method for WebRTC P2P audio and video calls, a WebRTC server, and a WebRTC client.

Background
With the development of the World Wide Web and the mobile Internet, HTML5 (Hyper Text Markup Language 5) has in recent years become a hot spot pursued by both the market and standards bodies. As a new direction in the development of network technology, an important core technology of HTML5 is WebRTC. WebRTC (Web Real-Time Communication) implements web-based video conferencing, with the goal of achieving real-time communications capability through simple JavaScript provided by the browser.

The ultimate goal of the WebRTC project is to enable web developers to quickly and easily develop rich real-time multimedia applications based on browsers (such as Chrome, Firefox, ...) without downloading or installing any plug-ins. Web developers do not need to concern themselves with the digital signal processing of multimedia; the functionality can be realized by writing a simple JavaScript program. Organizations such as the W3C (World Wide Web Consortium) are responsible for formulating the standard JavaScript (JS) APIs (Application Programming Interfaces). WebRTC also aims to build a platform for robust real-time communication among multiple Internet browsers, forming a healthy ecosystem for developers and browser vendors.

WebRTC technology has become part of the HTML5 standard, and as the WebRTC standard matures, various applications based on WebRTC technology have gradually appeared on the market. These applications are developed with web technology, and because browser vendors have gradually added support for webrtc, applications developed with webrtc can run on various PC or mobile terminals whose browsers support webrtc. This trend greatly reduces development difficulty, and the workload of maintaining multiple terminals and multiple versions is also greatly reduced.

With the development of web technologies, more and more applications are being developed using HTML5. As an important part of the HTML5 standard, WebRTC implements real-time communication between browsers, and more and more browser vendors, led by Chrome, have announced support for the webrtc standard.

Typical application scenarios for webrtc technology and standards are point-to-point calls, multi-party video conferencing, customer service centers, and distance education. That is, a browser application developed with webrtc technology can implement real-time communication functions such as acquiring the microphone, screen sharing, acquiring the camera, and streaming media transmission, so that users can make real-time calls in the browser. However, the experience of multi-party audio and video conferences in browsers developed with the standard webrtc interfaces still needs further improvement: in a multi-party conference the screen windows are small, making it hard to tell who is speaking; conference speech can only be saved as a recording, and subtitles cannot be saved; and when the participants speak different languages, the language barrier requires displayed subtitles to properly improve the user experience.
Summary of the Invention

The technical problem to be solved by the present invention is to provide a WebRTC point-to-point audio and video call method, a WebRTC server, and a WebRTC client, so as to enable calls across language barriers.

In order to solve the above technical problem, the following technical solutions are adopted:

A method for web real-time communication (WebRTC) point-to-point audio and video calls, comprising:

after receiving a subtitle request message or a translated-subtitle request message from a first WebRTC client, a WebRTC server sends the subtitle request message or translated-subtitle request message to one or more target WebRTC clients;

after receiving subtitles or translated subtitles returned by one or more of the target WebRTC clients, the WebRTC server sends the subtitles or translated subtitles to the first WebRTC client in real time.

Optionally, the translated-subtitle request message includes: a translation source language, a translation target language, and a translation return type, where the translation return type includes text translation and/or speech translation.
A web real-time communication WebRTC server, comprising a first transmission module and a second transmission module, wherein

the first transmission module is configured to: after receiving a subtitle request message or translated-subtitle request message from a first WebRTC client, send the subtitle request message or translated-subtitle request message to one or more target WebRTC clients;

the second transmission module is configured to: after receiving subtitles or translated subtitles returned by one or more of the target WebRTC clients, send the subtitles or translated subtitles to the first WebRTC client in real time.

Optionally, the translated-subtitle request message includes: a translation source language, a translation target language, and a translation return type, where the translation return type includes text translation and/or speech translation.
A method for web real-time communication (WebRTC) point-to-point audio and video calls, comprising:

a WebRTC client sends to a WebRTC server a subtitle request message or translated-subtitle request message requesting subtitles from one or more target WebRTC clients;

after receiving the subtitles or translated subtitles returned by the WebRTC server, the WebRTC client displays the subtitles or translated subtitles in the video frame of the corresponding target WebRTC client.

Optionally, the translated-subtitle request message includes: a translation source language, a translation target language, and a translation return type, where the translation return type includes text translation and/or speech translation.

Optionally, the method further includes:

the WebRTC client saves the subtitles or translated subtitles.
A WebRTC client, comprising a sending module and a display module, wherein

the sending module is configured to: send to a WebRTC server a subtitle request message or translated-subtitle request message requesting subtitles from one or more target WebRTC clients;

the display module is configured to: after receiving the subtitles or translated subtitles returned by the WebRTC server, display the subtitles or translated subtitles in the video frame of the corresponding target WebRTC client.

Optionally, the client further includes a saving module, wherein

the saving module is configured to: save the subtitles or translated subtitles.
A method for web real-time communication (WebRTC) point-to-point audio and video calls, comprising:

after receiving a subtitle request message from a WebRTC server, a WebRTC client sends its own audio to a speech analysis subtitle server;

after receiving the subtitles returned by the speech analysis subtitle server, the WebRTC client returns the subtitles to the WebRTC server.

Optionally, the step of returning the subtitles to the WebRTC server after receiving the subtitles returned by the speech analysis subtitle server includes:

after receiving the subtitles returned by the speech analysis subtitle server, the WebRTC client sends a translated-subtitle request to a translation server, the translated-subtitle request including: the subtitles, a translation source language, and a translation target language;

after receiving the translated subtitles returned by the translation server, the WebRTC client sends the translated subtitles to the WebRTC server.

Optionally, the translated-subtitle request further includes a translation return type, the translation return type including speech translation;

the method further includes: after receiving the translated audio returned by the translation server, the WebRTC client puts the translated audio into the real-time video stream and sends it through a pre-established media channel to the WebRTC client that requested the translated subtitles.
A WebRTC client, comprising a first transmission module and a second transmission module, wherein

the first transmission module is configured to: after receiving a translated-subtitle request message from a WebRTC server, send the client's own audio to a speech analysis subtitle server;

the second transmission module is configured to: after receiving the subtitles returned by the speech analysis subtitle server, return the subtitles to the WebRTC server.

Optionally, the second transmission module is configured to return the subtitles to the WebRTC server after receiving them from the speech analysis subtitle server as follows:

after receiving the subtitles returned by the speech analysis subtitle server, send a translated-subtitle request to a translation server, the translated-subtitle request including: the subtitles, a translation source language, and a translation target language;

after receiving the translated subtitles returned by the translation server, send the translated subtitles to the WebRTC server.

Optionally, the translated-subtitle request further includes a translation return type, the translation return type including speech translation;

the WebRTC client further includes a third transmission module, wherein

the third transmission module is configured to: after receiving the translated audio returned by the translation server, put the translated audio into the real-time video stream and send it through a pre-established media channel to the WebRTC client that requested the translated subtitles.
In summary, the WebRTC point-to-point audio and video call method, WebRTC server, and WebRTC client provided by the embodiments of the present invention enable users to cross language barriers and converse more conveniently. In a multi-person video conference, the speaker's speech is automatically parsed and displayed as subtitles, translated subtitles, or translated audio, so users can easily tell who is speaking and understand the speech content without searching for the speaker in multiple video windows.
Brief Description of the Drawings

FIG. 1 is a functional block diagram of a related-art webrtc server;

FIG. 2 is a related-art flowchart of establishing a two-party call using webrtc technology;

FIG. 3 is a flowchart of requesting subtitles when webrtc establishes a P2P (peer-to-peer) two-party call according to Embodiment 1 of the present invention;

FIG. 4 is a flowchart of requesting translated subtitles when webrtc establishes a P2P two-party call according to Embodiment 2 of the present invention;

FIG. 5 is a schematic diagram of the P2P media channels already established when webrtc sets up a P2P three-party conference;

FIG. 6 is a flowchart of requesting subtitles when webrtc establishes a P2P three-party conference according to Embodiment 3 of the present invention;

FIG. 7 is a flowchart of requesting translated subtitles/translated audio when webrtc establishes a P2P three-party conference according to Embodiment 4 of the present invention;

FIG. 8 is a schematic diagram of a WebRTC server according to an embodiment of the present invention;

FIG. 9 is a schematic diagram of a WebRTC client acting as the subtitle-requesting party according to an embodiment of the present invention;

FIG. 10 is a schematic diagram of a target WebRTC client according to an embodiment of the present invention.
Preferred Embodiments of the Invention

The embodiments of the present invention are described in detail below with reference to the accompanying drawings. It should be noted that, where there is no conflict, the embodiments of the present application and the features in the embodiments may be combined with each other arbitrarily.

The following is an overview of the subject matter described in detail herein. This overview is not intended to limit the scope of the claims.
FIG. 1 is a functional block diagram of a related-art webrtc server. The webrtc server includes:

Web server: provides the webrtc web service; the user accesses this web server from a browser app (application) client to obtain the webrtc service.

The user opens the application by accessing the web server functional module of the webrtc server through a browser. The service deployed on the web server complies with the relevant webrtc standards, and in the browser the user can register, establish audio calls, establish multi-party video calls, and so on through standard webrtc JS. The web server may also include application management functions outside the standard, such as user information maintenance and friend management.

Signaling server: performs signaling interaction when webrtc establishes a connection.

Media processing module: processes media, including segmenting real-time media streams and sending the segments to the external subtitle server and translation server, and integrating returned subtitles or audio into the audio and video stream of the real-time conversation.

Conference control module: controls conferences in the webrtc conference, including creating a conference, exiting a conference, adding conference members, and conference host control.

Firewall traversal server: provides firewall traversal for webrtc audio/video conferences and audio/video calls.

The firewall traversal functional module allows application developers on the webrtc browser side to use standard interfaces to obtain firewall traversal information. This functional module can be deployed on the webrtc server or elsewhere.
The webrtc client refers to the browser-side application deployed at the address the user accesses through the browser; the user accesses the web server on the webrtc server through the webrtc client.

In a webrtc application, both the client-side JavaScript code in the browser and the server code on the web server must conform to the webrtc standard when establishing audio and video communication.

On the application side, using JavaScript code to control the browser's access to the webrtc service on the webrtc server is a typical feature of webrtc technology. This means the browser takes on more of the work: browser vendors must provide the necessary functions to support webrtc, so that JavaScript code running in the browser can invoke the signaling and media interaction needed in a video call through the browser's unified standard interfaces. This greatly simplifies the browser services developers provide, shielding the underlying media and signaling so that only simple JavaScript calls are needed. Webrtc technology is therefore a trend both now and in the future. With the development of mobile terminals, more and more browsers, mobile browsers, and mobile WebKit builds will support webrtc, which makes application development easier and adaptation to multiple terminals more convenient.

The webrtc P2P audio/video conference and call implemented with this device allows users to converse in multiple languages in real time, providing real-time synchronized subtitle translation of the audio/video streams or direct translation into speech. Users can thus cross language barriers and communicate more conveniently during calls and conferences.

The webrtc P2P audio/video conference/call application has the following main features: 1. users in an audio/video conference or call can view subtitles of the other party's speech in real time; 2. users can select a translation target language, and the system translates the other party's language into a language they understand and displays the translated subtitles; 3. users can select a translation target language, and the system translates the other party's language into the target language, displays the translated subtitles, and simultaneously plays the speech in the translated language.
FIG. 2 is a flowchart of implementing a point-to-point call using webrtc technology. It covers the core functions of each functional module in the webrtc server during a webrtc point-to-point call. In the flowchart, User A represents User A's browser and the user's client application. The client application is actually the web service provided by the web server functional module deployed on the webrtc server; User A opens this application by opening an address in the browser. As shown in FIG. 2, the process includes the following steps:

Step 201: User A requests firewall traversal information from the firewall traversal server, and the firewall returns the information used for traversal to User A.

Step 202: User A sends a media call request to the signaling server in the webrtc server.

Step 203: The signaling server sends A's media call request to User B.

Step 204: User B requests firewall traversal information from the firewall traversal server, and the firewall returns the information used for traversal to User B.

Step 205: User B sends a response to the signaling server.

Step 206: The media connection between User A and User B is established, and A and B can make a point-to-point call through this media link.
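The relay pattern of steps 201-206 can be sketched as an offer/answer exchange through the signaling server. In a real client this would use the browser's RTCPeerConnection and an ICE/STUN service; here both are stubbed so that only the message flow of the figure is shown, and `relay(clientId, message)` is an assumed stand-in for the signaling server.

```javascript
// Simplified model of the FIG. 2 call setup; ICE info and the transport
// are stubs, and relay(clientId, message) stands in for the signaling server.
function setupCall(relay, callerId, calleeId) {
  // Steps 201/204: each side fetches firewall traversal info (stubbed ICE).
  const iceInfo = (side) => ({ side, candidates: ['stub-candidate'] });

  // Steps 202/203: the caller's media call request is relayed to the callee.
  relay(calleeId, { type: 'offer', from: callerId, ice: iceInfo(callerId) });

  // Step 205: the callee's answer is relayed back to the caller.
  relay(callerId, { type: 'answer', from: calleeId, ice: iceInfo(calleeId) });

  // Step 206: with offer, answer, and traversal info exchanged, the P2P
  // media channel is considered established.
  return 'media-channel-established';
}

const wire = [];
const status = setupCall((to, msg) => wire.push({ to, msg }), 'A', 'B');
console.log(status, wire.map(m => m.msg.type));
```

Once this exchange completes, media flows peer to peer; the signaling server is only needed again for control messages, such as the subtitle requests introduced by the embodiments.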
以上步骤是使用webrtc的协议在浏览器中进行点对点呼叫的流程。该流程也是现有的webrtc实现点对点呼叫使用的一个典型流程。The above steps are the process of making a point-to-point call in the browser using the webrtc protocol. This process is also a typical process used by existing webrtcs to implement point-to-point calls.
本发明实施例对相关技术的webrtcP2P视频通话的流程的改进主要是,在双方的P2P媒体通道或数据通道建立完毕之后,这一过程是webrtc建立媒体通道的标准流程,是本发明实施例的前置条件。在通话方建立了P2P的媒体通道后,仍可通过webrtc server的信令服务器来请求字幕或请求翻译字幕,是本发明的发明点所在。The improvement of the related procedure of the webrtcP2P video call in the embodiment of the present invention is mainly after the establishment of the P2P media channel or the data channel of the two parties, the process is a standard process for the webrtc to establish a media channel, which is before the embodiment of the present invention. Set the condition. After the P2P media channel is established by the party, the subtitle or request for subtitles can still be requested through the signaling server of the webrtc server, which is the invention of the present invention.
本发明实施例提供一种WebRTC点对点音视频通话的方法及WebRTC服务器与WebRTC客户端,使用户可以跨越语言的障碍,更方便的进行通话。在多人视频会议中,发言人将自动解析和显示字幕,用户可以轻松判断谁正在发言,而不需要在多个视频窗口中寻找发言人。并且,当语言不通的时候,这种系统架构也提供了完整的多语言字幕翻译和语音翻译的功能。字幕翻译指的是,对某个正在发言的用户进行语音分析形成文本后,根据实时的发言文本将字幕翻译为请求翻译的语言。语音翻译指的是对某个正在发言的用户进行语音分析形成文本后,根据实时的发言文本将字幕翻译为请求翻译的语 言的相应字幕,并将该字幕转化为请求翻译的语言的相应的音频播放出来。The embodiment of the invention provides a WebRTC point-to-point audio and video call method and a WebRTC server and a WebRTC client, so that the user can cross the language barrier and make the call more convenient. In a multi-person video conference, the speaker will automatically parse and display the subtitles, and the user can easily determine who is speaking without having to find a speaker in multiple video windows. And, when the language is not available, this system architecture also provides full multi-language subtitle translation and speech translation. Subtitle translation refers to the translation of subtitles into the language of the requested translation based on the real-time speech text after speech analysis is performed on a user who is speaking. Voice translation refers to the speech analysis of a user who is speaking to form a text, and then translate the subtitle into the language of the request translation according to the real-time speech text. The corresponding subtitles of the words, and the subtitles are converted into corresponding audios of the language requesting translation.
本发明实施例的方法能够将发言的会议成员的语音进行语音解析，形成文本并显示字幕，进一步的，也可以对解析出来的文本进行翻译，显示翻译目标语言的字幕，进一步的，也可以对翻译目标语言的文本进行语音转换，将转换后的音频流合成到视频流中，直接播放翻译目标语言的语音。The method of the embodiments of the present invention can perform speech analysis on the voice of a speaking conference member to form text and display subtitles; further, the parsed text can be translated so that subtitles in the translation target language are displayed; further still, the text in the translation target language can be converted to speech, and the converted audio stream can be synthesized into the video stream so that speech in the translation target language is played directly.
对于字幕和字幕翻译，有三种典型的应用场景，1，用户A请求用户B的字幕，2，用户A请求用户B的翻译字幕，3，用户A请求用户B的翻译语音。For subtitles and subtitle translation, there are three typical application scenarios: 1. User A requests User B's subtitles; 2. User A requests User B's translated subtitles; 3. User A requests User B's translated speech.
下面的实施例将对这几种应用场景进行详细的描述。The following embodiments will describe these application scenarios in detail.
实施例1Embodiment 1
图3是webrtc双方P2P通话时请求字幕的操作图。假设用户A和用户B已经按照图2的流程或者WEBRTC应用本身的流程建立了媒体通道，已经可以使用媒体通道进行正常的P2P视频通话了。本实施例描述了P2P视频通话过程中用户A请求用户B的字幕的流程图。FIG. 3 is an operation diagram of requesting subtitles during a two-party WebRTC P2P call. It is assumed that User A and User B have already established a media channel according to the procedure of FIG. 2 or the WebRTC application's own procedure, and can already use the media channel for a normal P2P video call. This embodiment describes the flow in which User A requests User B's subtitles during a P2P video call.
步骤301,用户A向webrtc server的信令服务器发送字幕请求消息;Step 301: User A sends a subtitle request message to a signaling server of the webrtc server.
步骤302,webrtc信令服务器向用户B发送字幕请求消息;Step 302: The webrtc signaling server sends a subtitle request message to the user B.
步骤303,用户B收到字幕请求后,将自己的音频发送给语音分析字幕服务器;Step 303, after receiving the subtitle request, the user B sends its own audio to the voice analysis subtitle server;
步骤304,语音分析字幕服务器将音频解析为字幕,将字幕返回给用户B;Step 304, the voice analysis subtitle server parses the audio into subtitles, and returns the subtitles to the user B;
步骤305,用户B将字幕返回给webrtc信令服务器;Step 305, user B returns the subtitle to the webrtc signaling server;
步骤306,webrtc信令服务器将字幕返回给用户A,用户A的浏览器将收到的B的字幕显示在B的视频框中。Step 306, the webrtc signaling server returns the subtitle to the user A, and the browser of the user A displays the received subtitle of B in the video frame of B.
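Steps 301–306 amount to a simple relay through the signaling server. The following sketch models that relay as pure routing logic; the message field names (`type`, `requester`, `target`, `speaker`, `text`) and helper names are illustrative assumptions of this sketch, not definitions from this document.

```typescript
// Hypothetical message shapes for the embodiment-1 relay (steps 301-306).
interface SubtitleRequest { type: "subtitle-request"; requester: string; target: string; }
interface SubtitleReply { type: "subtitle"; speaker: string; text: string; }

// Steps 301-302: the signaling server forwards A's request to target B unchanged.
function forwardRequest(req: SubtitleRequest): { to: string; msg: SubtitleRequest } {
  return { to: req.target, msg: req };
}

// Steps 305-306: the server routes B's parsed subtitle back to the requester,
// tagged with the speaker so A's browser can show it in B's video window.
function routeSubtitle(req: SubtitleRequest, text: string): { to: string; msg: SubtitleReply } {
  return { to: req.requester, msg: { type: "subtitle", speaker: req.target, text } };
}
```

Steps 303–304 (the target client exchanging its audio for subtitles with the external speech analysis subtitle server) happen between the two routing functions and are not modelled here.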
其中，语音分析字幕服务器为外部服务器，不是本发明的发明内容。语音分析字幕服务器的主要功能是根据音频实时进行分析，将语音解析为字幕后返回。在本实施例中，用户的浏览器侧client须将视频流中的音频部分实时分段发送给语音分析字幕服务器来实时解析语音，音频分段发送的规则由浏览器侧的client根据用户习惯和语音停顿来决定。 The speech analysis subtitle server is an external server and is not part of the inventive content of the present invention. Its main function is to analyze audio in real time and to parse the speech into subtitles and return them. In this embodiment, the user's browser-side client must segment the audio portion of the video stream and send the segments to the speech analysis subtitle server in real time so that the speech is parsed in real time; the audio segmentation rules are decided by the browser-side client according to the user's speaking habits and speech pauses.
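The pause-based segmentation rule described above might look like the following sketch. The energy threshold, the minimum pause length, and the representation of a chunk as a single energy value are illustrative assumptions; a real client would tap the WebRTC audio track, which is not modelled here.

```typescript
// Split a stream of audio chunks into utterance segments at speech pauses.
// Each chunk is abstracted to its energy level; a run of low-energy chunks
// is treated as a pause that closes the current segment.
function segmentByPauses(
  energies: number[],
  silenceThreshold = 0.05, // assumed tuning constants
  minPauseChunks = 3
): number[][] {
  const segments: number[][] = [];
  let current: number[] = [];
  let silentRun = 0;
  for (const e of energies) {
    if (e < silenceThreshold) {
      silentRun++;
      if (silentRun >= minPauseChunks && current.length > 0) {
        segments.push(current); // pause detected: flush the finished segment
        current = [];
      }
    } else {
      silentRun = 0;
      current.push(e);
    }
  }
  if (current.length > 0) segments.push(current); // flush the trailing segment
  return segments;
}
```

Each returned segment would then be sent to the speech analysis subtitle server as one request.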
本实施例的流程是用户A请求用户B的字幕的流程，同样的，B也可以同时请求A的字幕，流程相同。对于双方视频通话时默认为都需要显示字幕的情形，只需要webrtc应用本身使用本流程的基本原理来设置是否请求字幕即可。The flow of this embodiment is that of User A requesting User B's subtitles; likewise, B can simultaneously request A's subtitles by the same flow. For the case where subtitles are displayed for both parties by default during a video call, the WebRTC application itself only needs to apply the basic principle of this flow to set whether subtitles are requested.
实施例2为请求翻译字幕的流程。与实施例1相比，实施例2中的流程在语音分析解析出字幕后多了一个步骤，该步骤就是将解析出来的文字每句发给外部翻译服务器，由外部翻译服务器对字幕进行翻译并返回文字翻译字幕或者翻译后的语言的语音音频。图4就是实施例2请求翻译文字字幕的步骤图。其中，Embodiment 2 is a flow for requesting translated subtitles. Compared with Embodiment 1, the flow of Embodiment 2 has one extra step after speech analysis has parsed out the subtitles: each parsed sentence is sent to an external translation server, which translates the subtitles and returns either translated text subtitles or speech audio in the translated language. FIG. 4 is a step diagram of requesting translated text subtitles in Embodiment 2. Therein:
步骤401，用户A向webrtc server的信令服务器发送翻译字幕请求消息，并指定翻译的目标语言，假设B使用语言为英语，A希望B的字幕被翻译为中文并显示出来；Step 401: User A sends a translated-subtitle request message to the signaling server of the WebRTC server and specifies the translation target language; assume B speaks English and A wants B's subtitles to be translated into Chinese and displayed;
步骤402,webrtc信令服务器向用户B发送字幕请求消息,该请求消息包含翻译源语言、翻译目标语言、翻译返回类型(翻译返回类型假设为文字翻译或语音翻译);Step 402: The webrtc signaling server sends a subtitle request message to the user B, where the request message includes a translation source language, a translation target language, and a translation return type (the translation return type is assumed to be a text translation or a speech translation);
步骤403,用户B收到字幕请求后,将自己的音频发送给语音分析字幕服务器;Step 403, after receiving the subtitle request, the user B sends its own audio to the voice analysis subtitle server;
步骤404,语音分析字幕服务器将音频解析为字幕,将字幕返回给用户B;Step 404, the voice analysis subtitle server parses the audio into subtitles, and returns the subtitles to the user B;
步骤405,用户B发送翻译字幕请求到翻译服务器。该请求包含了解析后的字幕,翻译源语言,翻译目标语言,翻译返回类型;In step 405, user B sends a subtitle request to the translation server. The request contains parsed subtitles, translation source language, translation target language, translation return type;
假设翻译请求的参数翻译返回类型设置为文字翻译，那么执行以下步骤：If the translation return type parameter of the translation request is set to text translation, the following steps are performed:
步骤406a,翻译服务器根据翻译请求,将翻译字幕返回给用户B;Step 406a, the translation server returns the translated subtitles to the user B according to the translation request;
步骤407a,用户B将翻译字幕返回给webrtc信令服务器;Step 407a, user B returns the translated subtitles to the webrtc signaling server;
步骤408a,webrtc信令服务器将翻译字幕返回给用户A,用户A的浏览器将收到的B的字幕显示在B的视频框中;Step 408a, the webrtc signaling server returns the translated subtitles to the user A, and the browser of the user A displays the received subtitles of the B in the video frame of the B;
假设翻译请求的参数翻译返回类型设置为语音翻译，那么执行以下步骤：If the translation return type parameter of the translation request is set to speech translation, the following steps are performed:
步骤406b，翻译服务器根据翻译请求，将翻译后的字幕和音频返回给用户B。用户B将翻译后的音频放到实时的视频流中，通过媒体通道将视频和翻译后的音频发送给用户A；Step 406b: The translation server returns the translated subtitles and audio to User B according to the translation request. User B puts the translated audio into the real-time video stream and sends the video with the translated audio to User A through the media channel;
步骤407b,用户B将翻译字幕返回给webrtc信令服务器;Step 407b, user B returns the translated subtitles to the webrtc signaling server;
步骤408b,webrtc信令服务器将翻译字幕返回给用户A,用户A的浏览器将收到的B的翻译字幕显示在B的视频框中。In step 408b, the webrtc signaling server returns the translated subtitles to the user A, and the browser of the user A displays the translated subtitles of the received B in the video frame of B.
对于不同的翻译类型的请求,外部的翻译服务器会根据请求中的返回类型参数而选择不同的操作流程。For requests of different translation types, the external translation server selects different operational flows based on the return type parameters in the request.
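The branch on the return-type parameter (steps 406a vs. 406b) can be sketched as a dispatch on the translation request. The request/response field names and the `translateText`/`synthesize` placeholders are assumptions of this sketch and stand in for the external translation engine, which this document does not specify.

```typescript
// Hypothetical request/response shapes for the external translation server.
type TranslationReturnType = "text" | "speech";

interface TranslationRequest {
  subtitle: string;       // parsed subtitle from the speech analysis server
  sourceLang: string;     // e.g. "en"
  targetLang: string;     // e.g. "zh"
  returnType: TranslationReturnType;
}

interface TranslationResponse {
  translatedSubtitle: string;
  translatedAudio?: Uint8Array; // only present for speech translation (step 406b)
}

// Text translation returns subtitles only (step 406a); speech translation
// returns the translated subtitles plus synthesized audio (step 406b).
function handleTranslation(
  req: TranslationRequest,
  translateText: (text: string, from: string, to: string) => string,
  synthesize: (text: string, lang: string) => Uint8Array
): TranslationResponse {
  const translatedSubtitle = translateText(req.subtitle, req.sourceLang, req.targetLang);
  if (req.returnType === "text") return { translatedSubtitle };
  return { translatedSubtitle, translatedAudio: synthesize(translatedSubtitle, req.targetLang) };
}
```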
图5是三方P2P通话建立了媒体通道之后的示意图。本发明实施例在webrtc已经完成了P2P的媒体通道连接，也就是在完成了图5的基础上，增加了字幕解析、翻译字幕、翻译音频的流程，使得用户在三方webrtc P2P通话的时候可以跨越语言的障碍，实现字幕解析、语言翻译、语音翻译。FIG. 5 is a schematic diagram of a three-party P2P call after the media channels have been established. On the basis that WebRTC has already completed the P2P media channel connections, i.e., on the basis of FIG. 5, the embodiments of the present invention add flows for subtitle parsing, subtitle translation, and audio translation, so that users in a three-party WebRTC P2P call can cross language barriers and obtain subtitle parsing, language translation, and speech translation.
实施例3,图6显示了webrtc已经完成了P2P的媒体通道连接之后实现字幕解析的流程。Embodiment 3, FIG. 6 shows a flow of realizing subtitle parsing after webrtc has completed the media channel connection of P2P.
前置条件：用户A，用户B和用户C已经使用WEBRTC视频会议系统进行了登陆并建立了三方P2P通话，A、B和C之间已经建立了媒体通道。信令通道仍然通过webrtc的信令服务器来进行命令操作。Preconditions: User A, User B, and User C have logged in using the WebRTC video conferencing system and established a three-party P2P call; media channels have been established among A, B, and C. Command operations are still performed over the signaling channel through the WebRTC signaling server.
本实施例假设A请求B和C的发言字幕。This embodiment assumes that A requests the subtitles of B and C.
步骤601、用户A向webrtc信令服务器请求用户B和用户C的字幕;Step 601: User A requests subtitles of user B and user C to the webrtc signaling server.
步骤602、webrtc信令服务器向用户C发出字幕请求;Step 602: The webrtc signaling server sends a subtitle request to the user C.
步骤603、用户C向外部的语音分析字幕服务器发送自己的发言音频,请求字幕解析;Step 603: The user C sends its own speech audio to the external speech analysis subtitle server, and requests subtitle analysis.
步骤604、语音分析字幕服务器向C返回语音解析出来的字幕;Step 604: The voice analysis subtitle server returns the subtitles parsed by the voice to C;
步骤605、用户C向webrtc信令服务器返回实时字幕;Step 605: User C returns real-time subtitles to the webrtc signaling server.
步骤606、webrtc信令服务器向用户B发出字幕请求;Step 606: The webrtc signaling server sends a subtitle request to the user B.
步骤607、用户B向外部的语音分析字幕服务器发送自己的发言音频,请求字幕解析;Step 607: User B sends its own speech audio to the external speech analysis subtitle server, and requests subtitle analysis.
步骤608、语音分析字幕服务器向B返回语音解析出来的字幕;Step 608: The voice analysis subtitle server returns the subtitles parsed by the voice to B;
步骤609、用户B向webrtc信令服务器返回实时字幕; Step 609: User B returns real-time subtitles to the webrtc signaling server.
步骤610、webrtc信令服务器在接收到用户B和C的字幕时将实时将字幕发送给用户A，用户A根据返回结果将字幕显示在用户B和C的视频对话框中。Step 610: Upon receiving the subtitles of Users B and C, the WebRTC signaling server sends them to User A in real time, and User A displays the subtitles in the video windows of Users B and C according to the returned results.
对于以上流程，步骤602～步骤605和步骤606～步骤609可以同时进行，也就是说，当webrtc信令服务器收到字幕请求的时候可以同时向用户B和C发起字幕请求，用户B和C在进行发言时根据发言的情形实时的将字幕返回给webrtc信令服务器，webrtc信令服务器收到字幕就实时将字幕发送给用户A。In the above flow, steps 602–605 and steps 606–609 may be performed simultaneously; that is, when the WebRTC signaling server receives the subtitle request, it can initiate subtitle requests to Users B and C at the same time. While speaking, Users B and C return subtitles to the WebRTC signaling server in real time according to their speech, and the signaling server forwards each subtitle to User A as soon as it is received.
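The parallelism described above (steps 602–605 running alongside 606–609) can be sketched with concurrent requests. `requestSubtitle` abstracts the whole per-target exchange (request, speech analysis, reply) and `deliverToRequester` abstracts forwarding to user A; both names are assumptions of this sketch.

```typescript
// The signaling server fans the request out to all targets at once and
// forwards each subtitle to the requester as soon as it arrives, rather
// than handling B and C sequentially.
async function fanOutSubtitleRequests(
  targets: string[],
  requestSubtitle: (user: string) => Promise<string>,
  deliverToRequester: (speaker: string, text: string) => void
): Promise<void> {
  await Promise.all(
    targets.map(async (user) => {
      const text = await requestSubtitle(user);
      deliverToRequester(user, text); // delivered per speaker, as soon as ready
    })
  );
}
```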
同理,当用户B需要请求字幕时也可以向webrtc信令服务器发起字幕请求,当用户C需要请求字幕时也可以向webrtc信令服务器发起字幕请求。Similarly, when the user B needs to request the subtitle, the subtitle request can also be initiated to the webrtc signaling server. When the user C needs to request the subtitle, the subtitle request can also be initiated to the webrtc signaling server.
会议也可以设置为自动为每个用户添加字幕，这种情形下，只需要用户侧的浏览器端应用向语音分析字幕服务器发起字幕请求，获取到字幕后发给webrtc信令服务器并由webrtc信令服务器进行字幕分发即可。The conference can also be set to add subtitles for every user automatically. In this case, the browser-side application on the user side only needs to send a subtitle request to the speech analysis subtitle server, obtain the subtitles, and forward them to the WebRTC signaling server, which then distributes the subtitles.
实施例4：Embodiment 4:
本实施例假设用户A请求B和C的翻译字幕。This embodiment assumes that User A requests the translated subtitles of B and C.
步骤701、用户A向webrtc信令服务器请求用户B和用户C的翻译字幕;Step 701: User A requests translation subtitles of user B and user C to the webrtc signaling server.
步骤702、webrtc信令服务器向用户C发出请求翻译字幕的请求;Step 702: The webrtc signaling server sends a request to the user C to request subtitle translation;
步骤703、用户C向外部的语音分析字幕服务器发送自己的发言音频,请求字幕解析;Step 703: The user C sends its own speech audio to the external speech analysis subtitle server, and requests subtitle analysis.
步骤704、语音分析字幕服务器向C返回语音解析出来的字幕;Step 704: The voice analysis subtitle server returns a subtitle that is parsed by the voice to C.
步骤705、用户C向外部功能模块翻译服务器发起翻译字幕请求，该请求包含了解析后的字幕、翻译源语言、翻译目标语言、翻译返回类型。本实施例中假设翻译返回类型为文字翻译；Step 705: User C initiates a translated-subtitle request to the external translation server; the request includes the parsed subtitles, the translation source language, the translation target language, and the translation return type. In this embodiment, the translation return type is assumed to be text translation;
步骤706、翻译服务器根据翻译请求,将翻译字幕返回给用户C;Step 706, the translation server returns the translated subtitles to the user C according to the translation request;
步骤707、用户C将翻译字幕返回给webrtc信令服务器;Step 707: User C returns the translated subtitles to the webrtc signaling server.
步骤708、webrtc信令服务器向用户B发出请求翻译字幕的请求;Step 708: The webrtc signaling server sends a request to the user B to request subtitle translation;
步骤709、用户B向外部的语音分析字幕服务器发送自己的发言音频, 请求字幕解析;Step 709: User B sends its own speech audio to an external speech analysis subtitle server. Request subtitle parsing;
步骤710、语音分析字幕服务器向B返回语音解析出来的字幕;Step 710: The voice analysis subtitle server returns the subtitles parsed by the voice to B.
步骤711、用户B向外部功能模块翻译服务器发起翻译字幕请求，该请求包含了解析后的字幕、翻译源语言、翻译目标语言、翻译返回类型。本实施例中假设翻译返回类型为文字翻译。Step 711: User B initiates a translated-subtitle request to the external translation server; the request includes the parsed subtitles, the translation source language, the translation target language, and the translation return type. In this embodiment, the translation return type is assumed to be text translation.
步骤712、翻译服务器根据翻译请求,将翻译字幕返回给用户B;Step 712, the translation server returns the translated subtitles to the user B according to the translation request;
步骤713、用户B将翻译字幕返回给webrtc信令服务器;Step 713: User B returns the translated subtitle to the webrtc signaling server.
步骤714、WEBRTC信令服务器向用户A返回B和C的翻译字幕。Step 714: The WebRTC signaling server returns the translated subtitles of B and C to User A.
对于以上流程，步骤702～步骤707和步骤708～步骤713可以同时进行，也就是说，当webrtc信令服务器收到字幕请求的时候可以同时向用户B和C发起字幕请求，用户B和C在进行发言时根据发言的情形实时的将翻译字幕返回给webrtc信令服务器，webrtc信令服务器收到字幕就实时将字幕发送给用户A。A收到后实时的显示B或C的字幕。In the above flow, steps 702–707 and steps 708–713 may be performed simultaneously; that is, when the WebRTC signaling server receives the subtitle request, it can initiate requests to Users B and C at the same time. While speaking, Users B and C return translated subtitles to the WebRTC signaling server in real time according to their speech, and the signaling server forwards each subtitle to User A as soon as it is received. Upon receipt, A displays B's or C's subtitles in real time.
对于请求字幕的流程来说，请求只需要发送一次，但是，返回的字幕消息则实时的根据应用的设计来进行返回。也就是说，A只需要请求一次字幕，作为用户B，收到A的请求后，B会在通话过程中将自己的音频分段发送给外部的语音分析字幕服务器和外部的翻译服务器，然后根据发言情况分段将字幕或翻译字幕或翻译音频返回。In the subtitle-request flow, the request only needs to be sent once, but the returned subtitle messages are returned continuously in real time according to the application's design. That is, A only needs to request subtitles once; after receiving A's request, User B sends its audio in segments to the external speech analysis subtitle server and the external translation server throughout the call, and then returns subtitles, translated subtitles, or translated audio segment by segment according to the speech.
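The "one request, many returns" behaviour described above can be sketched as a per-segment pipeline on the target client. `parse` and `translate` stand in for the external speech analysis subtitle server and translation server, and audio segments are abstracted to strings; all of this is an assumption for illustration.

```typescript
// After a single subtitle request is received, the target client keeps
// producing results segment by segment for the rest of the call.
function* subtitleStream(
  audioSegments: string[],                 // abstracted audio pieces
  parse: (audio: string) => string,        // speech analysis: audio -> text
  translate?: (text: string) => string     // optional translation step
): Generator<string> {
  for (const segment of audioSegments) {
    const subtitle = parse(segment);
    yield translate ? translate(subtitle) : subtitle; // returned per segment
  }
}
```

Each yielded value corresponds to one subtitle message sent back to the signaling server for the single original request.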
实施例5：Embodiment 5:
本实施例假设A请求B和C的翻译音频及字幕。假设A使用的语言是中文，用户B和用户C使用的语言是英文，用户A希望在视频会议中对B和C的会议语音进行翻译。本实施例的流程图也如图7所示，包括以下步骤：This embodiment assumes that A requests the translated audio and subtitles of B and C. Assume that A's language is Chinese while Users B and C speak English, and User A wants the conference speech of B and C to be translated in the video conference. The flowchart of this embodiment is also shown in FIG. 7 and includes the following steps:
步骤801、用户A向webrtc信令服务器请求用户B和用户C的翻译字幕。Step 801: User A requests translation subtitles of User B and User C to the webrtc signaling server.
步骤802、webrtc信令服务器向用户C发出请求翻译字幕的请求;Step 802: The webrtc signaling server sends a request to the user C to request subtitle translation;
步骤803、用户C向外部的语音分析字幕服务器发送自己的发言音频,请求字幕解析;Step 803: User C sends its own speech audio to the external speech analysis subtitle server, and requests subtitle analysis.
步骤804、语音分析字幕服务器向C返回语音解析出来的字幕; Step 804: The voice analysis subtitle server returns the subtitles parsed by the voice to the C;
步骤805、用户C向外部功能模块翻译服务器发起翻译字幕请求，该请求包含了解析后的字幕、翻译源语言、翻译目标语言、翻译返回类型。本实施例中假设翻译返回类型为文字及语音翻译。Step 805: User C initiates a translated-subtitle request to the external translation server; the request includes the parsed subtitles, the translation source language, the translation target language, and the translation return type. In this embodiment, the translation return type is assumed to be text and speech translation.
步骤806、翻译服务器根据翻译请求,将翻译字幕和翻译音频返回给用户C;Step 806, the translation server returns the subtitles and the translated audio to the user C according to the translation request;
步骤807、用户C将翻译音频替换到相关的视频流中，同时将翻译字幕返回给webrtc信令服务器；Step 807: User C replaces the audio in the related video stream with the translated audio, and at the same time returns the translated subtitles to the WebRTC signaling server;
步骤808、webrtc信令服务器向用户B发出请求翻译字幕的请求;Step 808: The webrtc signaling server sends a request to the user B to request subtitle translation;
步骤809、用户B向外部的语音分析字幕服务器发送自己的发言音频,请求字幕解析;Step 809: User B sends its own speech audio to the external speech analysis subtitle server, and requests subtitle analysis.
步骤810、语音分析字幕服务器向B返回语音解析出来的字幕;Step 810: The voice analysis subtitle server returns the subtitles parsed by the voice to B;
步骤811、用户B向外部功能模块翻译服务器发起翻译字幕请求，该请求包含了解析后的字幕、翻译源语言、翻译目标语言、翻译返回类型。本实施例中假设翻译返回类型为文字及语音翻译。Step 811: User B initiates a translated-subtitle request to the external translation server; the request includes the parsed subtitles, the translation source language, the translation target language, and the translation return type. In this embodiment, the translation return type is assumed to be text and speech translation.
步骤812、翻译服务器根据翻译请求,将翻译后的字幕和音频返回给用户B。用户B将翻译后的音频放到实时的视频流中通过媒体通道将视频和翻译后的音频发送给用户A。Step 812: The translation server returns the translated subtitles and audio to the user B according to the translation request. User B puts the translated audio into the real-time video stream and sends the video and the translated audio to User A through the media channel.
步骤813、用户B将翻译音频替换到相关的视频流中,用户B将翻译字幕返回给webrtc信令服务器;Step 813, user B replaces the translated audio into the related video stream, and user B returns the translated subtitle to the webrtc signaling server;
步骤814、webrtc信令服务器将B和C的翻译字幕返回给用户A，用户A的浏览器应用根据收到的字幕将B的翻译字幕显示在B的视频框中，将收到的用户C的翻译字幕显示在C的视频框中。Step 814: The WebRTC signaling server returns the translated subtitles of B and C to User A; according to the received subtitles, User A's browser application displays B's translated subtitles in B's video window and C's translated subtitles in C's video window.
本发明实施例提供的WebRTC点对点音视频通话的方法，使用webrtc技术实现视频通话和视频会议中的语音解析并且生成字幕、翻译字幕、翻译音频。通过本系统，webrtc视频会议的会话成员可以在会议视频窗口中查看会议发言人的实时字幕。通过本系统，在webrtc的点对点音视频通话中也可以完成语音解析和语音翻译，并将翻译后的语音解析为文本字幕显示在用户的视频通话窗口上，或者将翻译后的语音解析为其他语言的语音并合成到原有视频流中。翻译出来的语言文本也可以作为会议纪要内容保存起来。本发明实施例可以让使用不同语言进行通话或会议的用户请求字幕翻译或语音翻译，并可以将会议内容以对话文本的方式保存为会议纪要。The WebRTC point-to-point audio and video call method provided by the embodiments of the present invention uses WebRTC technology to perform speech parsing in video calls and video conferences and to generate subtitles, translated subtitles, and translated audio. With this system, session members of a WebRTC video conference can view real-time subtitles of the conference speaker in the conference video window. Speech parsing and speech translation can also be completed in WebRTC point-to-point audio and video calls: the translated speech is rendered as text subtitles displayed in the user's video call window, or converted into speech in another language and synthesized into the original video stream. The translated text can also be saved as meeting minutes. The embodiments of the present invention allow users who speak different languages in a call or conference to request subtitle translation or speech translation, and can save the conference content in dialog-text form as meeting minutes.
图8为本发明实施例的WebRTC服务器的示意图,如图8所示,本实施例的WebRTC服务器包括:FIG. 8 is a schematic diagram of a WebRTC server according to an embodiment of the present invention. As shown in FIG. 8, the WebRTC server of this embodiment includes:
第一传输模块801,设置成:接收到第一WebRTC客户端的字幕请求消息或翻译字幕请求消息后,将所述字幕请求消息或翻译字幕请求消息发送给一个或多个目标WebRTC客户端;The first transmission module 801 is configured to: after receiving the subtitle request message or the subtitle request message of the first WebRTC client, send the subtitle request message or the subtitle request message to one or more target WebRTC clients;
第二传输模块802,设置成:接收到所述目标WebRTC客户端返回的字幕或翻译后的字幕后,实时地将所述字幕或翻译后的字幕发送给所述第一WebRTC客户端。The second transmission module 802 is configured to: after receiving the subtitle or the translated subtitle returned by the target WebRTC client, send the subtitle or the translated subtitle to the first WebRTC client in real time.
图9为本发明实施例的WebRTC客户端的示意图,该WebRTC客户端可以作为请求字幕方,如图9所示,本实施例的WebRTC客户端包括:FIG. 9 is a schematic diagram of a WebRTC client according to an embodiment of the present invention. The WebRTC client can be used as a requesting subtitle. As shown in FIG. 9, the WebRTC client in this embodiment includes:
发送模块901,设置成:向WebRTC服务器发送请求一个或多个目标WebRTC客户端的字幕请求消息或翻译字幕请求消息;The sending module 901 is configured to: send a subtitle request message or a subtitle request message requesting one or more target WebRTC clients to the WebRTC server;
显示模块902，设置成：接收到所述WebRTC服务器返回的字幕或翻译字幕后，将所述字幕或翻译字幕显示在对应的目标WebRTC客户端的视频框中。The display module 902 is configured to: after receiving the subtitles or translated subtitles returned by the WebRTC server, display the subtitles or translated subtitles in the video frame of the corresponding target WebRTC client.
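The display module's behaviour (placing each received subtitle in the matching target client's video window) can be sketched as a per-speaker subtitle slot. The class and method names are illustrative assumptions, and the actual DOM rendering inside the video frame is omitted.

```typescript
// Sketch of display module 902: keep one subtitle slot per remote user and
// overwrite it when a new subtitle for that user arrives, so the text always
// shows in the matching video window.
class SubtitleDisplay {
  private slots = new Map<string, string>();

  onSubtitle(speaker: string, text: string): void {
    this.slots.set(speaker, text); // overwrite with the latest line
  }

  currentSubtitle(speaker: string): string {
    return this.slots.get(speaker) ?? ""; // empty if that user has no subtitle yet
  }
}
```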
在一优选实施例中,所述WebRTC客户端还包括:In a preferred embodiment, the WebRTC client further includes:
保存模块903，设置成：保存所述字幕或所述翻译字幕。The saving module 903 is configured to: save the subtitles or the translated subtitles.
图10为本发明一实施例的WebRTC客户端的示意图,该WebRTC客户端可以作为目标客户端,如图10所示,本实施例的WebRTC客户端包括:FIG. 10 is a schematic diagram of a WebRTC client according to an embodiment of the present invention. The WebRTC client can be used as a target client. As shown in FIG. 10, the WebRTC client in this embodiment includes:
第一传输模块1001,设置成:接收到WebRTC服务器的字幕请求消息后,将自己的音频发送给语音分析字幕服务器;The first transmission module 1001 is configured to: after receiving the subtitle request message of the WebRTC server, send the audio to the voice analysis subtitle server;
第二传输模块1002,设置成:接收到所述语音分析字幕服务器返回的字幕后将所述字幕返回给所述WebRTC服务器。The second transmission module 1002 is configured to: after receiving the subtitle returned by the speech analysis subtitle server, return the subtitle to the WebRTC server.
在一优选实施例中，所述第二传输模块1002，具体设置成：接收到所述语音分析字幕服务器返回的字幕后，向翻译服务器发送翻译字幕请求，所述翻译字幕请求包括：所述字幕、翻译源语言、翻译目标语言；接收到所述翻译服务器返回的翻译后的字幕后，将翻译后的字幕发送给所述WebRTC服务器。In a preferred embodiment, the second transmission module 1002 is specifically configured to: after receiving the subtitles returned by the speech analysis subtitle server, send a translated-subtitle request to the translation server, where the translated-subtitle request includes the subtitles, the translation source language, and the translation target language; and after receiving the translated subtitles returned by the translation server, send the translated subtitles to the WebRTC server.
在一优选实施例中,所述翻译字幕请求还包括:翻译返回类型,所述翻译返回类型包括语音翻译;所述WebRTC客户端还包括:In a preferred embodiment, the subtitle request further includes: a translation return type, the translation return type includes a voice translation; and the WebRTC client further includes:
第三传输模块1003，设置成：接收到所述翻译服务器返回的翻译后的音频后，将翻译后的音频放到实时的视频流中，通过预先建立的媒体通道发送给请求翻译字幕的WebRTC客户端。The third transmission module 1003 is configured to: after receiving the translated audio returned by the translation server, put the translated audio into the real-time video stream and send it, through the pre-established media channel, to the WebRTC client that requested the translated subtitles.
本发明实施例还公开了一种计算机程序，包括程序指令，当该程序指令被服务器执行时，使得该服务器可执行上述任意的服务器侧的网页实时通信WebRTC点对点音视频通话的方法。An embodiment of the present invention further discloses a computer program comprising program instructions which, when executed by a server, cause the server to perform any of the above server-side methods for a web real-time communication (WebRTC) point-to-point audio and video call.
本发明实施例还公开了一种载有所述的计算机程序的载体。The embodiment of the invention also discloses a carrier carrying the computer program.
本发明实施例还公开了一种计算机程序，包括程序指令，当该程序指令被作为请求字幕方的客户端执行时，使得该客户端可执行上述任意的作为请求字幕方的客户端侧的网页实时通信WebRTC点对点音视频通话的方法。An embodiment of the present invention further discloses a computer program comprising program instructions which, when executed by a client acting as the subtitle-requesting party, cause that client to perform any of the above methods for a WebRTC point-to-point audio and video call on the side of the client acting as the subtitle-requesting party.
本发明实施例还公开了一种载有所述的计算机程序的载体。The embodiment of the invention also discloses a carrier carrying the computer program.
本发明实施例还公开了一种计算机程序，包括程序指令，当该程序指令被目标客户端执行时，使得该目标客户端可执行上述任意的目标客户端侧的网页实时通信WebRTC点对点音视频通话的方法。An embodiment of the present invention further discloses a computer program comprising program instructions which, when executed by a target client, cause the target client to perform any of the above target-client-side methods for a WebRTC point-to-point audio and video call.
本发明实施例还公开了一种载有所述的计算机程序的载体。The embodiment of the invention also discloses a carrier carrying the computer program.
在阅读并理解了附图和详细描述后,可以明白其他方面。Other aspects will be apparent upon reading and understanding the drawings and detailed description.
本领域普通技术人员可以理解上述方法中的全部或部分步骤可通过程序来指令相关硬件完成，所述程序可以存储于计算机可读存储介质中，如只读存储器、磁盘或光盘等。可选地，上述实施例的全部或部分步骤也可以使用一个或多个集成电路来实现。相应地，上述实施例中的各模块/单元可以采用硬件的形式实现，也可以采用软件功能模块的形式实现。本发明不限制于任何特定形式的硬件和软件的结合。A person of ordinary skill in the art will appreciate that all or some of the steps of the above methods may be completed by a program instructing the relevant hardware, and the program may be stored in a computer-readable storage medium such as a read-only memory, a magnetic disk, or an optical disc. Optionally, all or some of the steps of the above embodiments may also be implemented using one or more integrated circuits. Accordingly, each module/unit in the above embodiments may be implemented in the form of hardware or in the form of a software functional module. The present invention is not limited to any particular combination of hardware and software.
以上仅为本发明的优选实施例，当然，本发明还可有其他多种实施例，在不背离本发明精神及其实质的情况下，熟悉本领域的技术人员当可根据本发明作出各种相应的改变和变形，但这些相应的改变和变形都应属于本发明所附的权利要求的保护范围。The above are only preferred embodiments of the present invention. Of course, the present invention may also have various other embodiments, and those skilled in the art may make various corresponding changes and variations according to the present invention without departing from its spirit and essence; all such corresponding changes and variations shall fall within the protection scope of the claims appended to the present invention.
工业实用性Industrial applicability
本发明实施例提供的一种WebRTC点对点音视频通话的方法及WebRTC服务器与WebRTC客户端，使用户可以跨越语言的障碍，更方便的进行通话。在多人视频会议中，发言人将自动解析和显示字幕、翻译字幕或翻译音频，用户可以轻松判断谁正在发言和识别发言内容，而不需要在多个视频窗口中寻找发言人。因此本发明具有很强的工业实用性。 The WebRTC point-to-point audio and video call method, WebRTC server, and WebRTC client provided by the embodiments of the present invention enable users to cross language barriers and communicate more conveniently. In a multi-party video conference, a speaker's speech is automatically parsed and displayed as subtitles, translated subtitles, or translated audio, so users can easily tell who is speaking and understand the speech content without searching for the speaker among multiple video windows. Therefore, the present invention has strong industrial applicability.

Claims (15)

  1. 一种网页实时通信WebRTC点对点音视频通话的方法,包括:A method for webpage real-time communication WebRTC point-to-point audio and video call, comprising:
    WebRTC服务器接收到第一WebRTC客户端的字幕请求消息或翻译字幕请求消息后,将所述字幕请求消息或翻译字幕请求消息发送给一个或多个目标WebRTC客户端;After receiving the subtitle request message or the subtitle request message of the first WebRTC client, the WebRTC server sends the subtitle request message or the subtitle request message to one or more target WebRTC clients;
    所述WebRTC服务器接收到一个或多个所述目标WebRTC客户端返回的字幕或翻译后的字幕后,实时地将所述字幕或翻译后的字幕发送给所述第一WebRTC客户端。After receiving the subtitles or translated subtitles returned by the target WebRTC client, the WebRTC server sends the subtitles or the translated subtitles to the first WebRTC client in real time.
  2. 如权利要求1所述的WebRTC点对点音视频通话的方法,其中The method of WebRTC point-to-point audio and video call according to claim 1, wherein
    所述翻译字幕请求消息包括:翻译源语言、翻译目标语言以及翻译返回类型,所述翻译返回类型包括文字翻译和/或语音翻译。The subtitle request message includes a translation source language, a translation target language, and a translation return type, and the translation return type includes text translation and/or speech translation.
  3. 一种网页实时通信WebRTC服务器,包括:第一传输模块和第二传输模块,其中A webpage real-time communication WebRTC server includes: a first transmission module and a second transmission module, wherein
    所述第一传输模块设置成:接收到第一WebRTC客户端的字幕请求消息或翻译字幕请求消息后,将所述字幕请求消息或翻译字幕请求消息发送给一个或多个目标WebRTC客户端;The first transmission module is configured to: after receiving the subtitle request message or the subtitle request message of the first WebRTC client, send the subtitle request message or the subtitle request message to one or more target WebRTC clients;
    所述第二传输模块设置成:接收到一个或多个所述目标WebRTC客户端返回的字幕或翻译后的字幕后,实时地将所述字幕或翻译后的字幕发送给所述第一WebRTC客户端。The second transmission module is configured to: after receiving the subtitle or the translated subtitle returned by the one or more target WebRTC clients, send the subtitle or the translated subtitle to the first WebRTC client in real time. end.
  4. 如权利要求3所述的WebRTC服务器,其中The WebRTC server of claim 3, wherein
    所述翻译字幕请求消息包括:翻译源语言、翻译目标语言以及翻译返回类型,所述翻译返回类型包括文字翻译和/或语音翻译。The subtitle request message includes a translation source language, a translation target language, and a translation return type, and the translation return type includes text translation and/or speech translation.
  5. 一种网页实时通信WebRTC点对点音视频通话的方法,包括:A method for webpage real-time communication WebRTC point-to-point audio and video call, comprising:
    WebRTC客户端向WebRTC服务器发送请求一个或多个目标WebRTC客户端的字幕请求消息或翻译字幕请求消息;The WebRTC client sends a subtitle request message or a subtitle request message requesting one or more target WebRTC clients to the WebRTC server;
所述WebRTC客户端接收到所述WebRTC服务器返回的字幕或翻译字幕后，将所述字幕或翻译字幕显示在相应的目标WebRTC客户端的视频框中。 After receiving the subtitles or translated subtitles returned by the WebRTC server, the WebRTC client displays the subtitles or translated subtitles in the video frame of the corresponding target WebRTC client.
  6. 如权利要求5所述的WebRTC点对点音视频通话的方法,其中A method of WebRTC point-to-point audio and video calling as claimed in claim 5, wherein
    所述翻译字幕请求消息包括:翻译源语言、翻译目标语言以及翻译返回类型,所述翻译返回类型包括文字翻译和/或语音翻译。The subtitle request message includes a translation source language, a translation target language, and a translation return type, and the translation return type includes text translation and/or speech translation.
  7. 如权利要求5或6所述的WebRTC点对点音视频通话的方法,该方法还包括:The method of the WebRTC point-to-point audio and video call according to claim 5 or 6, the method further comprising:
    所述WebRTC客户端保存所述字幕或所述翻译字幕。The WebRTC client saves the subtitle or the translated subtitle.
  8. A WebRTC client, comprising a sending module and a display module, wherein
    the sending module is configured to send, to a WebRTC server, a subtitle request message or a translation subtitle request message directed at one or more target WebRTC clients; and
    the display module is configured to: after receiving the subtitles or translated subtitles returned by the WebRTC server, display the subtitles or translated subtitles in the video frame of the corresponding target WebRTC client.
  9. The WebRTC client of claim 8, further comprising a saving module, wherein
    the saving module is configured to save the subtitles or the translated subtitles.
  10. A method for a Web Real-Time Communication (WebRTC) peer-to-peer audio and video call, comprising:
    after receiving a subtitle request message from a WebRTC server, a WebRTC client sending its own audio to a speech-analysis subtitle server; and
    after receiving the subtitles returned by the speech-analysis subtitle server, the WebRTC client returning the subtitles to the WebRTC server.
  11. The method for a WebRTC peer-to-peer audio and video call of claim 10, wherein
    the step of the WebRTC client returning the subtitles to the WebRTC server after receiving the subtitles returned by the speech-analysis subtitle server comprises:
    after receiving the subtitles returned by the speech-analysis subtitle server, the WebRTC client sending a translation subtitle request to a translation server, the translation subtitle request including the subtitles, a translation source language, and a translation target language; and
    after receiving the translated subtitles returned by the translation server, the WebRTC client sending the translated subtitles to the WebRTC server.
  12. The method for a WebRTC peer-to-peer audio and video call of claim 11, wherein
    the translation subtitle request further includes a translation return type, the translation return type including speech translation; and
    the method further comprises: after receiving the translated audio returned by the translation server, the WebRTC client placing the translated audio into the real-time video stream and sending it, over a pre-established media channel, to the WebRTC client that requested the translated subtitles.
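The target-client flow of claims 10 and 11 (own audio → speech-analysis subtitle server → optional translation server → WebRTC server) can be sketched as a small async pipeline. The three transports are injected as functions so the routing logic stays independent of any concrete protocol; `recognize`, `translate`, and `toWebrtcServer` are hypothetical names, not interfaces defined by the patent:

```javascript
// Sketch of the target-client flow in claims 10-11. The speech-analysis
// subtitle server, the translation server, and the WebRTC signaling server
// are represented by injected async functions on `io`.
async function handleSubtitleRequest(audioChunk, request, io) {
  // Claim 10: forward the client's own audio to the speech-analysis
  // subtitle server and wait for recognized subtitles.
  const subtitles = await io.recognize(audioChunk);

  if (!request.translate) {
    // Plain subtitle request: return the recognized text to the WebRTC server.
    await io.toWebrtcServer({ kind: 'subtitle', text: subtitles });
    return subtitles;
  }

  // Claim 11: send a translation subtitle request carrying the subtitles,
  // the translation source language, and the translation target language.
  const translated = await io.translate({
    text: subtitles,
    sourceLang: request.sourceLang,
    targetLang: request.targetLang,
  });
  await io.toWebrtcServer({ kind: 'translated-subtitle', text: translated });
  return translated;
}
```

Keeping the transports injectable also makes the flow straightforward to exercise with stubs, which is how a real client implementation could be unit-tested without standing up the three servers.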
  13. A WebRTC client, comprising a first transmission module and a second transmission module, wherein
    the first transmission module is configured to: after receiving a translation subtitle request message from a WebRTC server, send the client's own audio to a speech-analysis subtitle server; and
    the second transmission module is configured to: after receiving the subtitles returned by the speech-analysis subtitle server, return the subtitles to the WebRTC server.
  14. The WebRTC client of claim 13, wherein
    the second transmission module is configured to receive the subtitles returned by the speech-analysis subtitle server and return the subtitles to the WebRTC server in the following manner:
    after receiving the subtitles returned by the speech-analysis subtitle server, sending a translation subtitle request to a translation server, the translation subtitle request including the subtitles, a translation source language, and a translation target language; and
    after receiving the translated subtitles returned by the translation server, sending the translated subtitles to the WebRTC server.
  15. The WebRTC client of claim 14, wherein
    the translation subtitle request further includes a translation return type, the translation return type including speech translation; and
    the WebRTC client further comprises a third transmission module, wherein
    the third transmission module is configured to: after receiving the translated audio returned by the translation server, place the translated audio into the real-time video stream and send it, over a pre-established media channel, to the WebRTC client that requested the translated subtitles.
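Claims 12 and 15 distinguish two return paths: translated text goes back to the WebRTC server over signaling, while translated audio rides the pre-established media channel. A minimal sketch of that dispatch is below; `sendSignaling` and `injectIntoStream` are placeholders for the concrete transports, which the claims leave unspecified:

```javascript
// Sketch of the third-transmission-module dispatch in claim 15: a "speech"
// result is pushed into the already-established real-time media channel,
// while a text result is returned to the WebRTC server as signaling.
function routeTranslationResult(result, { sendSignaling, injectIntoStream }) {
  if (result.returnType === 'speech') {
    // Claim 15: translated audio travels over the pre-established media channel.
    injectIntoStream(result.audio);
    return 'media-channel';
  }
  // Text translations go back to the WebRTC server as signaling messages.
  sendSignaling({ kind: 'translated-subtitle', text: result.text });
  return 'signaling';
}
```

In a browser, one plausible (but not patent-mandated) implementation of `injectIntoStream` is to swap the outgoing audio track with `RTCRtpSender.replaceTrack()` on the existing peer connection, which avoids renegotiating the call.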
PCT/CN2016/070377 2015-03-26 2016-01-07 Method and device for webrtc p2p audio and video call WO2016150235A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201510136472.4 2015-03-26
CN201510136472.4A CN104780335B (en) 2015-03-26 2015-03-26 WebRTC P2P audio and video call method and device

Publications (1)

Publication Number Publication Date
WO2016150235A1 true WO2016150235A1 (en) 2016-09-29

Family

ID=53621547

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2016/070377 WO2016150235A1 (en) 2015-03-26 2016-01-07 Method and device for webrtc p2p audio and video call

Country Status (2)

Country Link
CN (1) CN104780335B (en)
WO (1) WO2016150235A1 (en)

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106919562A (en) * 2017-04-28 2017-07-04 深圳市大乘科技股份有限公司 A kind of real-time translation system, method and device
CN111970473A (en) * 2020-08-19 2020-11-20 彩讯科技股份有限公司 Method, device, equipment and storage medium for realizing synchronous display of double video streams
CN112203040A (en) * 2020-11-06 2021-01-08 通号通信信息集团有限公司 Railway emergency communication method and system based on communication conference
CN112435690A (en) * 2019-08-08 2021-03-02 百度在线网络技术(北京)有限公司 Duplex Bluetooth translation processing method and device, computer equipment and storage medium
CN112672099A (en) * 2020-12-31 2021-04-16 深圳市潮流网络技术有限公司 Subtitle data generation and presentation method, device, computing equipment and storage medium
CN112822557A (en) * 2019-11-15 2021-05-18 中移物联网有限公司 Information processing method, information processing device, electronic equipment and computer readable storage medium
CN113014849A (en) * 2021-02-23 2021-06-22 中电海康集团有限公司 Driving training video call system and method based on Web RTC
CN113473238A (en) * 2020-04-29 2021-10-01 海信集团有限公司 Intelligent device and simultaneous interpretation method during video call
CN117439976A (en) * 2023-12-13 2024-01-23 深圳大数信科技术有限公司 Audio and video call system based on WebRTC

Families Citing this family (21)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104780335B (en) * 2015-03-26 2021-06-22 中兴通讯股份有限公司 WebRTC P2P audio and video call method and device
US9374536B1 (en) 2015-11-12 2016-06-21 Captioncall, Llc Video captioning communication system, devices and related methods for captioning during a real-time video communication session
US9525830B1 (en) 2015-11-12 2016-12-20 Captioncall Llc Captioning communication systems
CN105743889B (en) * 2016-01-27 2019-05-17 福建星网智慧科技股份有限公司 A kind of method and system for realizing multi-party audio call based on webrtc
CN107707868B (en) * 2016-08-08 2020-09-25 中国电信股份有限公司 Video conference joining method, multi-access conference server and video conference system
CN109274634B (en) * 2017-07-18 2021-06-11 腾讯科技(深圳)有限公司 Multimedia communication method and device, and storage medium
CN109309802A (en) * 2017-07-27 2019-02-05 中兴通讯股份有限公司 Management method, server and the computer readable storage medium of video interactive
CN107277646A (en) * 2017-08-08 2017-10-20 四川长虹电器股份有限公司 A kind of captions configuration system of audio and video resources
CN107682657B (en) * 2017-09-13 2020-11-10 中山市华南理工大学现代产业技术研究院 WebRTC-based multi-user voice video call method and system
CN108829688A (en) * 2018-06-21 2018-11-16 北京密境和风科技有限公司 Implementation method and device across languages interaction
CN109688364A (en) * 2018-08-21 2019-04-26 平安科技(深圳)有限公司 Video-meeting method, device, server and storage medium
CN110418099B (en) * 2018-08-30 2021-08-31 腾讯科技(深圳)有限公司 Audio and video processing method and device and storage medium
CN110876033B (en) * 2018-08-30 2021-08-31 腾讯科技(深圳)有限公司 Audio and video processing method and device and storage medium
CN109104586B (en) * 2018-10-08 2021-05-07 北京小鱼在家科技有限公司 Special effect adding method and device, video call equipment and storage medium
CN109688363A (en) * 2018-12-31 2019-04-26 深圳爱为移动科技有限公司 The method and system of private chat in the multilingual real-time video group in multiple terminals
CN110415706A (en) * 2019-08-08 2019-11-05 常州市小先信息技术有限公司 A kind of technology and its application of superimposed subtitle real-time in video calling
CN112584078B (en) * 2019-09-27 2022-03-18 深圳市万普拉斯科技有限公司 Video call method, video call device, computer equipment and storage medium
CN112825551B (en) * 2019-11-21 2023-05-26 中国科学院沈阳计算技术研究所有限公司 Video conference important content prompting and transferring storage method and system
CN111654658B (en) * 2020-06-17 2022-04-15 平安科技(深圳)有限公司 Audio and video call processing method and system, coder and decoder and storage device
CN115314660A (en) * 2021-05-07 2022-11-08 阿里巴巴新加坡控股有限公司 Processing method and device for audio and video conference
CN114915616B (en) * 2022-03-16 2024-04-02 青岛希望鸟科技有限公司 Program synchronous communication method based on client real-time communication and client

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2002163400A (en) * 2000-11-28 2002-06-07 Mitsuaki Arita Language conversion mediating method, language conversion mediation processor and computer readable recording medium
CN101542462A (en) * 2007-05-16 2009-09-23 莫卡有限公司 Establishing and translating within multilingual group messaging sessions using multiple messaging protocols
CN102209227A (en) * 2010-03-30 2011-10-05 宝利通公司 Method and system for adding translation in a videoconference
CN102572532A (en) * 2010-12-14 2012-07-11 洪煌炳 TV caption relay translation system based on cable TV network
US20140157113A1 (en) * 2012-11-30 2014-06-05 Ricoh Co., Ltd. System and Method for Translating Content between Devices
CN104025079A (en) * 2011-09-09 2014-09-03 谷歌公司 User interface for translation webpage
CN104780335A (en) * 2015-03-26 2015-07-15 中兴通讯股份有限公司 Method and device for WebRTC P2P (web real-time communication peer-to-peer) audio and video call

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101931779A (en) * 2009-06-23 2010-12-29 中兴通讯股份有限公司 Video telephone and communication method thereof
CN101697581B (en) * 2009-10-26 2012-11-21 华为终端有限公司 Method, device and system for supporting simultaneous interpretation video conference

Cited By (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106919562A (en) * 2017-04-28 2017-07-04 深圳市大乘科技股份有限公司 A kind of real-time translation system, method and device
CN106919562B (en) * 2017-04-28 2024-01-05 深圳市大乘科技股份有限公司 Real-time translation system, method and device
CN112435690A (en) * 2019-08-08 2021-03-02 百度在线网络技术(北京)有限公司 Duplex Bluetooth translation processing method and device, computer equipment and storage medium
CN112822557A (en) * 2019-11-15 2021-05-18 中移物联网有限公司 Information processing method, information processing device, electronic equipment and computer readable storage medium
CN113473238A (en) * 2020-04-29 2021-10-01 海信集团有限公司 Intelligent device and simultaneous interpretation method during video call
CN111970473A (en) * 2020-08-19 2020-11-20 彩讯科技股份有限公司 Method, device, equipment and storage medium for realizing synchronous display of double video streams
CN112203040A (en) * 2020-11-06 2021-01-08 通号通信信息集团有限公司 Railway emergency communication method and system based on communication conference
CN112203040B (en) * 2020-11-06 2023-01-13 通号通信信息集团有限公司 Railway emergency communication method and system based on communication conference
CN112672099B (en) * 2020-12-31 2023-11-17 深圳市潮流网络技术有限公司 Subtitle data generating and presenting method, device, computing equipment and storage medium
CN112672099A (en) * 2020-12-31 2021-04-16 深圳市潮流网络技术有限公司 Subtitle data generation and presentation method, device, computing equipment and storage medium
CN113014849A (en) * 2021-02-23 2021-06-22 中电海康集团有限公司 Driving training video call system and method based on Web RTC
CN113014849B (en) * 2021-02-23 2023-03-14 中电海康集团有限公司 Driving training video call system and method based on Web RTC
CN117439976A (en) * 2023-12-13 2024-01-23 深圳大数信科技术有限公司 Audio and video call system based on WebRTC
CN117439976B (en) * 2023-12-13 2024-03-26 深圳大数信科技术有限公司 Audio and video call system based on WebRTC

Also Published As

Publication number Publication date
CN104780335A (en) 2015-07-15
CN104780335B (en) 2021-06-22

Similar Documents

Publication Publication Date Title
WO2016150235A1 (en) Method and device for webrtc p2p audio and video call
US10276064B2 (en) Method and system for adjusting user speech in a communication session
US10142459B2 (en) Method and system for managing multimedia accessiblity
TWI440346B (en) Open architecture based domain dependent real time multi-lingual communication service
US8400489B2 (en) Method of controlling a video conference
US9232049B2 (en) Quality of experience determination for multi-party VoIP conference calls that account for focus degradation effects
US10896298B2 (en) Systems and methods for configuring an automatic translation of sign language in a video conference
WO2020124725A1 (en) Audio and video pushing method and audio and video stream pushing client based on webrtc protocol
US20080075095A1 (en) Method and system for network communication
US20150373081A1 (en) Method of sharing browsing on a web page displayed by a web browser
Fowdur et al. Performance analysis of webrtc and sip-based audio and video communication systems
Singh et al. Developing WebRTC-based team apps with a cross-platform mobile framework
KR102545276B1 (en) Communication terminal based group call security apparatus and method
WO2022203891A1 (en) Method and system for integrating video content in a video conference session
Davies et al. Evaluating two approaches for browser-based real-time multimedia communication
Wang et al. A design of multimedia conferencing system based on WebRTC Technology
Kullberg Implementing remote customer service api using webrtc and jitsi sdk
JP4990718B2 (en) Media stream processing system, media stream processing method, component realization apparatus
CN117729188B (en) Water affair video acquisition system and method based on WebRTC
US11811843B2 (en) Supporting quality of service for media communications
US11522934B1 (en) Media provider shim for virtual events
US20240146560A1 (en) Participant Audio Stream Modification Within A Conference
Al-Khayyat et al. Peer-to-peer media streaming with HTML5
WO2023087925A1 (en) Telecommunication method, electronic device, and storage medium
US20240078339A1 (en) Anonymized videoconferencing

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 16767612

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 16767612

Country of ref document: EP

Kind code of ref document: A1