WO2016150235A1 - Method and device for webrtc p2p audio and video call - Google Patents

Method and device for WebRTC P2P audio and video call

Info

Publication number
WO2016150235A1
Authority
WO
WIPO (PCT)
Prior art keywords
webrtc
subtitle
translation
server
subtitles
Prior art date
Application number
PCT/CN2016/070377
Other languages
French (fr)
Chinese (zh)
Inventor
巫妍
Original Assignee
ZTE Corporation (中兴通讯股份有限公司)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by ZTE Corporation (中兴通讯股份有限公司)
Publication of WO2016150235A1

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 7/00: Television systems
    • H04N 7/14: Systems for two-way working

Definitions

  • The invention relates to the field of WebRTC point-to-point (P2P) audio and video call technology, and in particular to a method for WebRTC P2P audio and video calls, a WebRTC server, and a WebRTC client.
  • HTML5: Hyper Text Markup Language 5
  • WebRTC: Web Real-Time Communication
  • The ultimate goal of the WebRTC project is to enable web developers to quickly and easily develop rich browser-based real-time multimedia applications (on browsers such as Chrome and Firefox) without having to download and install any plug-ins. Web developers need not concern themselves with multimedia digital signal processing; the same effect can be achieved by writing a simple JavaScript program.
  • The W3C (World Wide Web Consortium) and other organizations are responsible for formulating the JavaScript (JS) standard API (Application Programming Interface). WebRTC also hopes to build a platform for robust real-time communication between Internet browsers, forming a good ecosystem for developers and browser vendors.
  • WebRTC technology has become part of the HTML5 standard, and as the WebRTC standard has matured, various applications based on WebRTC technology have appeared on the market. These applications are developed with web technology, and because browser vendors have gradually added WebRTC support, applications developed with WebRTC can run on any PC or mobile terminal with a WebRTC-capable browser. This trend has greatly reduced development difficulty and the work of maintaining multiple terminals and versions.
  • Typical application scenarios for WebRTC technology and standards are point-to-point calls, multi-party video conferencing, customer service centers, and distance education. That is, a browser application developed with WebRTC technology can acquire the microphone and camera, share the screen, and transmit streaming media in real-time communication, so that users can converse in real time in the browser.
  • However, the effect and experience of multi-party audio/video conference calls in browsers developed with the WebRTC standard interface still need improvement. Because the screen windows of a multi-party conference are relatively small, it is difficult to judge who is speaking, and conference speech records can only be saved as recordings; subtitles cannot be saved. Moreover, when participants use different languages, displaying subtitles would help bridge the language barrier and improve the user experience.
  • The technical problem to be solved by the present invention is to provide a WebRTC point-to-point audio and video call method, a WebRTC server, and a WebRTC client, so as to implement calls across language barriers.
  • A method for web real-time communication (WebRTC) point-to-point audio and video calls comprises:
  • after receiving a subtitle request message or a translated-subtitle request message from a first WebRTC client, the WebRTC server sends the subtitle request message or the translated-subtitle request message to one or more target WebRTC clients;
  • after receiving the subtitles or translated subtitles returned by the target WebRTC clients, the WebRTC server sends the subtitles or translated subtitles to the first WebRTC client in real time.
  • the subtitle request message includes: a translation source language, a translation target language, and a translation return type
  • the translation return type includes a text translation and/or a speech translation
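To make the message fields above concrete, here is a minimal sketch of what such a subtitle / translated-subtitle request message could look like as a JavaScript object. The field names and the wire format are illustrative assumptions, not taken from the patent.

```javascript
// Hypothetical builder for the subtitle request message described above.
// All field names are assumptions made for illustration.
function buildSubtitleRequest({ targets, sourceLang, targetLang, returnType }) {
  if (!['text', 'speech', 'text+speech'].includes(returnType)) {
    throw new Error('returnType must be text, speech, or text+speech');
  }
  return {
    // a plain subtitle request has no target language; a translated-subtitle
    // request carries the translation target language
    type: targetLang ? 'translated-subtitle-request' : 'subtitle-request',
    targets,                              // one or more target WebRTC client ids
    translationSourceLanguage: sourceLang,
    translationTargetLanguage: targetLang,
    translationReturnType: returnType,    // text and/or speech translation
  };
}

const req = buildSubtitleRequest({
  targets: ['userB'],
  sourceLang: 'en',
  targetLang: 'zh',
  returnType: 'text',
});
```

A client would serialize such an object and send it over the signaling channel; the signaling server only needs `targets` to route it.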
  • A web real-time communication (WebRTC) server includes a first transmission module and a second transmission module, wherein
  • the first transmission module is configured to: after receiving a subtitle request message or a translated-subtitle request message from a first WebRTC client, send the subtitle request message or the translated-subtitle request message to one or more target WebRTC clients;
  • the second transmission module is configured to: after receiving the subtitles or translated subtitles returned by the one or more target WebRTC clients, send the subtitles or translated subtitles to the first WebRTC client in real time.
  • the subtitle request message includes: a translation source language, a translation target language, and a translation return type
  • the translation return type includes a text translation and/or a speech translation
  • A method for web real-time communication (WebRTC) point-to-point audio and video calls comprises:
  • the WebRTC client sends, to the WebRTC server, a subtitle request message or a translated-subtitle request message directed at one or more target WebRTC clients;
  • after receiving the subtitles or translated subtitles returned by the WebRTC server, the WebRTC client displays them in the video frame of the corresponding target WebRTC client.
  • the subtitle request message includes: a translation source language, a translation target language, and a translation return type
  • the translation return type includes a text translation and/or a speech translation
  • the method further includes:
  • the WebRTC client saves the subtitle or the translated subtitle.
  • A WebRTC client includes a sending module and a display module, wherein
  • the sending module is configured to: send, to the WebRTC server, a subtitle request message or a translated-subtitle request message directed at one or more target WebRTC clients;
  • the display module is configured to: after receiving the subtitles or translated subtitles returned by the WebRTC server, display them in the video frame of the corresponding target WebRTC client.
  • the client further includes a save module, wherein
  • the saving module is configured to: save the subtitle or the translated subtitle.
  • A method for web real-time communication (WebRTC) point-to-point audio and video calls comprises:
  • after receiving a subtitle request message from the WebRTC server, the WebRTC client sends its own audio to a voice analysis subtitle server;
  • after receiving the subtitles returned by the voice analysis subtitle server, the WebRTC client returns the subtitles to the WebRTC server.
  • The step in which the WebRTC client receives the subtitles returned by the voice analysis subtitle server and returns them to the WebRTC server includes:
  • after receiving the subtitles returned by the voice analysis subtitle server, the WebRTC client sends a subtitle request to a translation server, where the subtitle request includes the subtitles, a translation source language, and a translation target language;
  • after receiving the translated subtitles returned by the translation server, the WebRTC client sends the translated subtitles to the WebRTC server.
  • the subtitle request further includes: a translation return type, where the translation return type includes a voice translation;
  • the method further includes: after receiving the translated audio returned by the translation server, the WebRTC client puts the translated audio into the real-time video stream and sends it, through a pre-established media channel, to the WebRTC client that requested the subtitles.
  • a WebRTC client includes: a first transmission module and a second transmission module, wherein
  • the first transmission module is configured to: after receiving the subtitle request message of the WebRTC server, send the audio to the voice analysis subtitle server;
  • the second transmission module is configured to: return the subtitles to the WebRTC server after receiving the subtitles returned by the speech analysis subtitle server.
  • The second transmission module is configured to return the subtitles to the WebRTC server, after receiving the subtitles returned by the voice analysis subtitle server, in the following manner:
  • after receiving the subtitles returned by the voice analysis subtitle server, sending a subtitle request to the translation server, where the subtitle request includes the subtitles, a translation source language, and a translation target language;
  • after receiving the translated subtitles returned by the translation server, sending the translated subtitles to the WebRTC server.
  • the subtitle request further includes: a translation return type, the translation return type including a voice translation;
  • the WebRTC client further includes a third transmission module, wherein
  • the third transmission module is configured to: after receiving the translated audio returned by the translation server, put the translated audio into the real-time video stream and send it, through a pre-established media channel, to the WebRTC client requesting subtitles.
  • The method for WebRTC point-to-point audio and video calls, the WebRTC server, and the WebRTC client provided by the embodiments of the present invention enable users to cross language barriers and make calls more conveniently.
  • The speaker's speech is automatically parsed and displayed as subtitles, translated subtitles, or translated audio. Users can easily determine who is speaking and identify the content of the speech without having to find the speaker among multiple video windows.
  • FIG. 1 is a functional block diagram of a related-art WebRTC server;
  • FIG. 2 is a flowchart of establishing a two-party call using WebRTC technology in the related art;
  • FIG. 3 is a flowchart of requesting subtitles in a WebRTC P2P (peer-to-peer) two-party call according to Embodiment 1 of the present invention;
  • FIG. 4 is a flowchart of requesting translated subtitles in a WebRTC P2P two-party call according to Embodiment 2 of the present invention;
  • FIG. 5 is a schematic diagram of the P2P media channels established when WebRTC sets up a P2P three-party conference;
  • FIG. 6 is a flowchart of requesting subtitles in a WebRTC P2P three-party conference according to Embodiment 3 of the present invention;
  • FIG. 7 is a flowchart of requesting translated subtitles/translated audio in a WebRTC P2P three-party conference according to Embodiment 4 of the present invention;
  • FIG. 8 is a schematic diagram of a WebRTC server according to an embodiment of the present invention;
  • FIG. 9 is a schematic diagram of a WebRTC client as the subtitle-requesting side according to an embodiment of the present invention;
  • FIG. 10 is a schematic diagram of a target WebRTC client according to an embodiment of the present invention.
  • The WebRTC server includes:
  • Web server: provides the WebRTC web service. The user accesses the web server from a browser application client to obtain the WebRTC service; that is, the user opens the application by accessing the web server function module of the WebRTC server through the browser.
  • The service deployed on the web server complies with the relevant WebRTC standards; through the standard WebRTC JS in the browser, the user can register, establish audio calls, establish multi-party video calls, and use other functions.
  • The web server can also include application management functions outside the standard, such as user information maintenance and friend management.
  • Signaling server: used for signaling interaction when WebRTC establishes a connection.
  • Media processing module: used for processing media, including segmenting the real-time media stream, sending it to the external subtitle server and translation server, and integrating the returned subtitles or audio into the audio/video stream of the real-time conversation.
  • Conference control module: controls the conference in a WebRTC conference, including creating a conference, exiting a conference, joining conference members, and conference-host control.
  • Firewall traversal server: used for firewall traversal in WebRTC audio/video conferences and calls.
  • The firewall traversal function module enables application developers on the WebRTC browser side to use standard interfaces to obtain firewall traversal information.
  • This function module can be deployed on the WebRTC server or elsewhere.
  • The WebRTC client refers to the browser-side application deployed at the address the user accesses through the browser; the user accesses the web server on the WebRTC server through the WebRTC client.
  • The WebRTC P2P audio/video conference or call realized by the device enables users to hold calls and meetings in multiple languages in real time, providing synchronized subtitle translation of the audio/video stream or direct translation into speech, so that users can cross language barriers and communicate more conveniently.
  • The WebRTC P2P audio/video conference/call application mainly has the following features: 1. users can view subtitles of the other party's speech in real time in the conference or call; 2. users can choose a translation target language, and the system translates the other party's language into one they understand and displays the translated subtitles; 3. users can choose a translation target language, and the system translates the other party's speech into the target language, displaying the translated subtitles while playing the translated speech.
  • FIG. 2 is a flow diagram of implementing a point-to-point call using WebRTC technology. This flowchart covers the core functions of the various functional modules in the WebRTC server during a WebRTC point-to-point call.
  • In the flowchart, user A represents user A's browser and client application. The client application uses the web service provided by the web server function module deployed on the WebRTC server; user A opens the application by opening an address in the browser.
  • As shown in FIG. 2, this process includes the following steps:
  • Step 201: User A requests firewall traversal information from the firewall traversal server, which returns traversal information to user A;
  • Step 202: User A sends a media call request to the signaling server in the WebRTC server;
  • Step 203: The signaling server forwards A's media call request to user B;
  • Step 204: User B requests firewall traversal information from the firewall traversal server, which returns traversal information to user B;
  • Step 205: User B sends a response to the signaling server;
  • Step 206: The media connection between user A and user B is established, and A and B can make a point-to-point call over the media link.
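The offer/relay/answer round trip of steps 202-205 can be sketched as a tiny in-memory simulation. A real deployment would use WebSockets and `RTCPeerConnection`; here the signaling server and clients are plain objects (all names are illustrative) so the message routing itself is visible.

```javascript
// Minimal simulation of the signaling flow in steps 202-205.
class SignalingServer {
  constructor() { this.clients = new Map(); }
  register(id, client) { this.clients.set(id, client); }
  relay(from, to, message) {
    // Steps 203 / 205: the signaling server forwards requests and answers.
    this.clients.get(to).onSignal({ from, ...message });
  }
}

class Client {
  constructor(id, server) {
    this.id = id; this.server = server; this.log = [];
    server.register(id, this);
  }
  call(peerId) {                       // Step 202: send media call request
    this.server.relay(this.id, peerId, { kind: 'offer' });
  }
  onSignal(msg) {
    this.log.push(msg.kind);
    if (msg.kind === 'offer') {        // Step 205: answer the incoming call
      this.server.relay(this.id, msg.from, { kind: 'answer' });
    }
  }
}

const server = new SignalingServer();
const userA = new Client('A', server);
const userB = new Client('B', server);
userA.call('B');   // after this exchange, A holds B's answer (step 206 follows)
```

In the real protocol the `offer`/`answer` payloads would carry SDP descriptions and the clients would also exchange the firewall traversal (ICE) information from steps 201 and 204.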
  • The above steps are the process of making a point-to-point call in the browser using the WebRTC protocol, and are also the typical process by which existing WebRTC implementations realize point-to-point calls.
  • The improvement made by the embodiments of the present invention to the WebRTC P2P video call procedure comes mainly after the two parties' P2P media channel or data channel has been established; establishing the media channel is a standard WebRTC process and a precondition of the embodiments. After the parties' P2P media channel is established, subtitles or translated subtitles can still be requested through the signaling server of the WebRTC server, which is the point of the present invention.
  • the embodiment of the invention provides a WebRTC point-to-point audio and video call method and a WebRTC server and a WebRTC client, so that the user can cross the language barrier and make the call more convenient.
  • The speaker's speech is automatically parsed and displayed as subtitles, and the user can easily determine who is speaking without having to find the speaker among multiple video windows.
  • This system architecture also provides full multi-language subtitle translation and speech translation.
  • Subtitle translation means performing speech analysis on a speaking user and translating the resulting real-time speech text into subtitles in the requested target language.
  • Speech translation means performing speech analysis on a speaking user to form text, translating that text into subtitles in the requested target language, and then converting the subtitles into corresponding audio in the requested language.
  • The method of the embodiments of the present invention can perform speech analysis on a speaking member's speech, form text, and display it as subtitles. Further, the parsed text can be translated to display subtitles in the translation target language; further still, the text in the target language can be converted into speech, with the converted audio stream synthesized into the video stream so that speech in the translation target language is played directly.
  • FIG. 3 shows the operation of requesting subtitles during a WebRTC P2P call. It is assumed that user A and user B have established media channels according to the process of FIG. 2 or through the WebRTC application itself, and that the media channel supports a normal P2P video call. This embodiment describes the flow of user A requesting user B's subtitles during a P2P video call.
  • Step 301: User A sends a subtitle request message to the signaling server of the WebRTC server;
  • Step 302: The WebRTC signaling server sends the subtitle request message to user B;
  • Step 303: After receiving the subtitle request, user B sends its own audio to the voice analysis subtitle server;
  • Step 304: The voice analysis subtitle server parses the audio into subtitles and returns them to user B;
  • Step 305: User B returns the subtitles to the WebRTC signaling server;
  • Step 306: The WebRTC signaling server returns the subtitles to user A, and user A's browser displays B's subtitles in B's video frame.
  • The voice analysis subtitle server is an external server and is not part of the inventive content of the present invention.
  • Its main function is to analyze audio in real time, parsing speech into subtitles and returning them.
  • The user's browser-side client must send the audio portion of the video stream to the voice analysis subtitle server in real time so that the speech can be parsed in real time; the rules of audio segmentation are decided by the browser-side client according to the user's speaking habits and voice pauses.
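The pause-based segmentation rule mentioned above can be sketched as a pure function over audio samples: the client cuts the outgoing audio at silences before sending each chunk to the speech-analysis server. The thresholds and the function itself are illustrative assumptions, not the patent's algorithm.

```javascript
// Sketch of pause-based audio segmentation: close a segment whenever
// minPauseLen consecutive samples fall below silenceLevel.
function segmentOnPauses(samples, { silenceLevel = 0.05, minPauseLen = 3 } = {}) {
  const segments = [];
  let current = [];
  let quiet = 0;
  for (const s of samples) {
    if (Math.abs(s) < silenceLevel) {
      quiet += 1;
      if (quiet >= minPauseLen && current.length > 0) {
        segments.push(current);      // pause long enough: close the segment
        current = [];
      }
    } else {
      quiet = 0;
      current.push(s);
    }
  }
  if (current.length > 0) segments.push(current);
  return segments;
}

// Two bursts of speech separated by a three-sample pause yield two segments.
const segments = segmentOnPauses([0.5, 0.6, 0, 0, 0, 0.4, 0.3]);
```

A real client would apply the same idea to `AudioBuffer` frames (or use voice-activity detection) rather than raw sample arrays.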
  • The flow of this embodiment covers user A requesting user B's subtitles; B can also request A's subtitles at the same time, and the process is the same.
  • Whether subtitles are displayed by default or only on request is set by the WebRTC application itself based on the basic principle of this process.
  • Embodiment 2 is a flow for requesting subtitle translation.
  • Compared with Embodiment 1, the flow in Embodiment 2 adds a step after speech analysis has parsed out the subtitles: the parsed text is sent to an external translation server, which translates the subtitles.
  • FIG. 4 shows the steps of requesting translation of text subtitles in Embodiment 2, wherein:
  • Step 401: User A sends a subtitle request message to the signaling server of the WebRTC server and specifies a translation target language. Assume B speaks English and A wants B's subtitles translated into Chinese and displayed;
  • Step 402: The WebRTC signaling server sends the subtitle request message to user B, where the request message includes a translation source language, a translation target language, and a translation return type (assumed here to be text translation or speech translation);
  • Step 403: After receiving the subtitle request, user B sends its own audio to the voice analysis subtitle server;
  • Step 404: The voice analysis subtitle server parses the audio into subtitles and returns them to user B;
  • Step 405: User B sends a subtitle request to the translation server;
  • the request contains the parsed subtitles, the translation source language, the translation target language, and the translation return type;
  • Step 406a: The translation server returns the translated subtitles to user B according to the translation request;
  • Step 407a: User B returns the translated subtitles to the WebRTC signaling server;
  • Step 408a: The WebRTC signaling server returns the translated subtitles to user A, and user A's browser displays B's translated subtitles in B's video frame;
  • Step 406b: The translation server returns the translated subtitles and audio to user B according to the translation request; user B puts the translated audio into the real-time video stream and sends the video with the translated audio to user A through the media channel;
  • Step 407b: User B returns the translated subtitles to the WebRTC signaling server;
  • Step 408b: The WebRTC signaling server returns the translated subtitles to user A, and user A's browser displays B's translated subtitles in B's video frame.
  • The external translation server selects different operational flows based on the return-type parameter in the request.
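The 406a/406b branch on the return type can be sketched as follows. The translation and text-to-speech steps are stand-in stubs (the real external servers are outside the patent's scope), and all names are illustrative.

```javascript
// Sketch of the translation server's branching on translation return type:
// 'text' returns subtitles only (406a); 'speech' or 'text+speech' also
// returns synthesized audio in the target language (406b).
function handleTranslationRequest({ subtitle, sourceLang, targetLang, returnType }) {
  // Stub translation: a real server would call a machine-translation engine.
  const translated = `[${sourceLang}->${targetLang}] ${subtitle}`;
  const response = { translatedSubtitle: translated };
  if (returnType === 'speech' || returnType === 'text+speech') {
    // Stub text-to-speech: a real server would synthesize target-language audio.
    response.translatedAudio = { lang: targetLang, text: translated };
  }
  return response;
}
```

User B would forward `translatedSubtitle` over the signaling channel and, when present, inject `translatedAudio` into the media stream.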
  • FIG. 5 is a schematic diagram of a three-party P2P call after the media channels have been established.
  • In Embodiment 3, on the basis of FIG. 5, i.e., after the P2P media channel connections have been completed in WebRTC, the processes of subtitle parsing, subtitle translation, and audio translation are added, so that users can cross language barriers during a three-party WebRTC P2P call.
  • FIG. 6 shows the flow of realizing subtitle parsing after WebRTC has completed the P2P media channel connections.
  • Preconditions: user A, user B, and user C have logged in to the WebRTC video conferencing system and established a three-party P2P call; media channels have been established between A, B, and C, while the signaling channel is still handled by the WebRTC signaling server.
  • This embodiment assumes that A requests the subtitles of B and C.
  • Step 601: User A requests the subtitles of user B and user C from the WebRTC signaling server;
  • Step 602: The WebRTC signaling server sends a subtitle request to user C;
  • Step 603: User C sends its own speech audio to the external speech analysis subtitle server and requests subtitle parsing;
  • Step 604: The voice analysis subtitle server returns the subtitles parsed from the speech to C;
  • Step 605: User C returns the real-time subtitles to the WebRTC signaling server;
  • Step 606: The WebRTC signaling server sends a subtitle request to user B;
  • Step 607: User B sends its own speech audio to the external speech analysis subtitle server and requests subtitle parsing;
  • Step 608: The voice analysis subtitle server returns the subtitles parsed from the speech to B;
  • Step 609: User B returns the real-time subtitles to the WebRTC signaling server;
  • Step 610: Upon receiving the subtitles of users B and C, the WebRTC signaling server sends them to user A in real time, and user A displays the subtitles in the video dialog boxes of users B and C according to the returned results.
  • Steps 602 to 605 and steps 606 to 609 can be performed simultaneously; that is, when the WebRTC signaling server receives the subtitle request, it can initiate subtitle requests to users B and C at the same time. While users B and C are speaking, the subtitles are returned to the WebRTC signaling server in real time according to the speech, and the WebRTC signaling server forwards the subtitles to user A as they arrive.
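The concurrent fan-out described above can be sketched with `Promise.all`: once A's single request arrives, the signaling server issues the subtitle requests to B and C side by side rather than sequentially. `requestSubtitle` is a hypothetical stub standing in for the request/response round trip with each target client.

```javascript
// Sketch of steps 602-605 and 606-609 running concurrently: one request
// per target client, all issued at once, results gathered together.
async function fanOutSubtitleRequests(targets, requestSubtitle) {
  const results = await Promise.all(
    targets.map(async (t) => ({ user: t, subtitle: await requestSubtitle(t) }))
  );
  return results; // the server forwards each result on to user A
}

// Hypothetical stand-in for the round trip to a target client.
const fakeRequest = async (user) => `subtitle from ${user}`;
const pending = fanOutSubtitleRequests(['B', 'C'], fakeRequest);
```

In the real system the server would not wait for all targets before forwarding; each subtitle would be pushed to user A the moment it arrives, as the text above notes.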
  • Similarly, B or C can also initiate a subtitle request to the WebRTC signaling server, and the conference can also be set to automatically add subtitles for each user.
  • The browser-side application only needs the user to trigger a subtitle request; the speech analysis subtitle server returns the subtitles, which are then sent to the WebRTC signaling server, and the WebRTC signaling server performs subtitle distribution.
  • Step 701: User A requests the translated subtitles of user B and user C from the WebRTC signaling server;
  • Step 702: The WebRTC signaling server sends a subtitle translation request to user C;
  • Step 703: User C sends its own speech audio to the external speech analysis subtitle server and requests subtitle parsing;
  • Step 704: The voice analysis subtitle server returns the subtitles parsed from the speech to C;
  • Step 705: User C initiates a subtitle request to the external translation server, where the request includes the parsed subtitles, the translation source language, the translation target language, and the translation return type;
  • here the translation return type is text translation;
  • Step 706: The translation server returns the translated subtitles to user C according to the translation request;
  • Step 707: User C returns the translated subtitles to the WebRTC signaling server;
  • Step 708: The WebRTC signaling server sends a subtitle translation request to user B;
  • Step 709: User B sends its own speech audio to the external speech analysis subtitle server and requests subtitle parsing;
  • Step 710: The voice analysis subtitle server returns the subtitles parsed from the speech to B;
  • Step 711: User B initiates a subtitle request to the external translation server, where the request includes the parsed subtitles, the translation source language, the translation target language, and the translation return type;
  • here the translation return type is text translation;
  • Step 712: The translation server returns the translated subtitles to user B according to the translation request;
  • Step 713: User B returns the translated subtitles to the WebRTC signaling server;
  • Step 714: The WebRTC signaling server returns the translated subtitles of B and C to user A.
  • Steps 702 to 707 and steps 708 to 713 can be performed simultaneously; that is, when the WebRTC signaling server receives the subtitle request, it can initiate subtitle requests to users B and C at the same time. While users B and C are speaking, the translated subtitles are returned to the WebRTC signaling server in real time according to the speech, and the WebRTC signaling server transmits them to user A as they arrive, so the subtitles of B or C are displayed in real time.
  • The request only needs to be sent once; the subtitle messages are then returned in real time according to the design of the application. That is, A only needs to request subtitles once; B sends its own audio segments to the external speech analysis subtitle server and the external translation server during the call, and subtitles, translated subtitles, or translated audio are returned according to the segmentation of the speech.
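The "request once, receive continuously" pattern above can be sketched as a simple subscription: A subscribes a single time, and every parsed speech segment afterwards triggers a delivery. Class and method names are illustrative assumptions.

```javascript
// Sketch of the one-request / many-deliveries pattern: after a single
// subscribe (A's subtitle request), each parsed speech segment from the
// speaker is pushed to all subscribers in real time.
class SubtitleFeed {
  constructor() { this.subscribers = []; }
  subscribe(onSubtitle) {            // A's single subtitle request
    this.subscribers.push(onSubtitle);
  }
  push(speaker, subtitle) {          // called once per parsed speech segment
    for (const cb of this.subscribers) cb({ speaker, subtitle });
  }
}

const feed = new SubtitleFeed();
const received = [];
feed.subscribe((s) => received.push(s));   // request sent once
feed.push('B', 'hello');                   // first speech segment
feed.push('B', 'how are you');             // next segment, no new request
```

In the real system the `push` side would be driven by the subtitle/translation servers returning results for each audio segment.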
  • This embodiment assumes that A requests the translated audio and subtitles of B and C. Assume the language used by A is Chinese and the language used by users B and C is English; user A wants the conference speech of B and C translated in the video conference.
  • The flowchart of this embodiment is also shown in FIG. 7 and includes the following steps:
  • Step 801: User A requests the translated subtitles of user B and user C from the WebRTC signaling server;
  • Step 802: The WebRTC signaling server sends a subtitle translation request to user C;
  • Step 803: User C sends its own speech audio to the external speech analysis subtitle server and requests subtitle parsing;
  • Step 804: The voice analysis subtitle server returns the subtitles parsed from the speech to C;
  • Step 805: User C initiates a subtitle request to the external translation server, where the request includes the parsed subtitles, the translation source language, the translation target language, and the translation return type;
  • here the translation return type is text and speech translation;
  • Step 806: The translation server returns the subtitles and the translated audio to user C according to the translation request;
  • Step 807: User C replaces the audio in the related video stream with the translated audio and returns the translated subtitles to the WebRTC signaling server;
  • Step 808: The WebRTC signaling server sends a subtitle translation request to user B;
  • Step 809: User B sends its own speech audio to the external speech analysis subtitle server and requests subtitle parsing;
  • Step 810: The voice analysis subtitle server returns the subtitles parsed from the speech to B;
  • Step 811: User B initiates a subtitle request to the external translation server, where the request includes the parsed subtitles, the translation source language, the translation target language, and the translation return type;
  • here the translation return type is text and speech translation;
  • Step 812: The translation server returns the translated subtitles and audio to User B according to the translation request.
  • Step 813: User B replaces the audio in the related video stream with the translated audio, sends the video and translated audio to User A through the media channel, and returns the translated subtitles to the webrtc signaling server.
  • Step 814: The webrtc signaling server returns the translated subtitles of B and C to User A. The browser application of User A displays the translated subtitles of B in B's video frame and the translated subtitles of C in C's video frame.
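The signaling exchange of steps 801-814 can be sketched as plain message construction and routing. This is only an illustrative sketch: the field names (`type`, `from`, `targets`, `srcLang`, `dstLang`, `returnType`) are assumptions, not part of the webrtc standard or of this embodiment's actual wire format.

```javascript
// A minimal sketch of the signaling messages in steps 801-814.
// All field names are illustrative assumptions.

// Step 801: User A asks the signaling server for translated subtitles of B and C.
function buildTranslationRequest(from, targets, srcLang, dstLang) {
  return {
    type: 'translated-subtitle-request',
    from,                          // requesting client, e.g. 'A'
    targets,                       // target clients, e.g. ['B', 'C']
    srcLang,                       // language spoken by the targets
    dstLang,                       // language the requester reads
    returnType: ['text', 'speech'] // text + speech translation, as in step 805
  };
}

// Step 814: the signaling server hands A the translated subtitles keyed by
// sender, so the browser can render each one in the matching video frame.
function collectForDisplay(translatedSubtitles) {
  const bySender = {};
  for (const item of translatedSubtitles) {
    bySender[item.from] = item.text;
  }
  return bySender;
}

const request = buildTranslationRequest('A', ['B', 'C'], 'en', 'zh');
const frames = collectForDisplay([
  { from: 'B', text: '大家好' },
  { from: 'C', text: '早上好' }
]);
console.log(request.targets, frames);
```

Keying the returned subtitles by sender matches step 814, where A's browser must place each translated subtitle in the correct peer's video frame.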
  • The embodiment of the present invention provides a WebRTC point-to-point audio and video call method that uses webrtc technology to implement speech parsing in video calls and video conferences, generating subtitles, translated subtitles, and translated audio.
  • Session members of a webrtc video conference can view the real-time subtitles of the conference speaker in the conference video window.
  • Speech parsing and speech translation can also be performed in a webrtc point-to-point audio and video call: the speech is parsed into text subtitles displayed in the user's video call window, or translated into speech in another language and synthesized into the original video stream.
  • The translated text can also be saved as conference minutes.
  • The embodiment of the present invention allows users who speak different languages in a call or conference to request subtitle translation or speech translation, and the conference content can be saved as conference minutes in the form of dialog text.
  • FIG. 8 is a schematic diagram of a WebRTC server according to an embodiment of the present invention. As shown in FIG. 8, the WebRTC server of this embodiment includes:
  • the first transmission module 801, configured to: after receiving a subtitle request message or translated-subtitle request message from a first WebRTC client, send the subtitle request message or translated-subtitle request message to one or more target WebRTC clients;
  • the second transmission module 802, configured to: after receiving subtitles or translated subtitles returned by a target WebRTC client, send the subtitles or translated subtitles to the first WebRTC client in real time.
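The relay behavior of the two transmission modules can be sketched as follows. The `SignalingServer` class, its method names, and the injected `send` callback are assumptions for illustration; the real transport (e.g. WebSockets) is omitted.

```javascript
// Illustrative relay logic for the two transmission modules of FIG. 8.
class SignalingServer {
  constructor(send) {
    this.send = send; // send(clientId, message): assumed transport callback
  }

  // First transmission module 801: forward a subtitle request or
  // translated-subtitle request from the first client to each target.
  onSubtitleRequest(firstClientId, targetIds, request) {
    for (const target of targetIds) {
      this.send(target, { ...request, from: firstClientId });
    }
  }

  // Second transmission module 802: relay subtitles returned by a target
  // client back to the requesting client in real time.
  onSubtitleReturned(targetId, firstClientId, subtitle) {
    this.send(firstClientId, { type: 'subtitle', from: targetId, subtitle });
  }
}

// Usage: record what the server would send over the wire.
const sent = [];
const server = new SignalingServer((to, msg) => sent.push({ to, msg }));
server.onSubtitleRequest('A', ['B', 'C'], { type: 'subtitle-request' });
server.onSubtitleReturned('B', 'A', 'hello everyone');
console.log(sent.length); // 3: two requests out, one subtitle back
```

The server only relays text; the heavy media work stays on the clients and external servers, which matches the architecture described in the embodiments.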
  • FIG. 9 is a schematic diagram of a WebRTC client according to an embodiment of the present invention.
  • The WebRTC client can act as the subtitle-requesting party.
  • The WebRTC client in this embodiment includes:
  • the sending module 901, configured to: send to the WebRTC server a subtitle request message or translated-subtitle request message requesting subtitles from one or more target WebRTC clients;
  • the display module 902, configured to: after receiving the subtitles or translated subtitles returned by the WebRTC server, display the subtitles or translated subtitles in the video frame of the corresponding target WebRTC client.
  • The WebRTC client further includes:
  • the saving module 903, configured to: save the subtitles or translated subtitles.
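The requesting-side modules 901-903 could be sketched as below. The class and its internal maps are illustrative stand-ins; a real client would update DOM elements next to each peer's video frame rather than a plain object.

```javascript
// Sketch of the subtitle-requesting WebRTC client of FIG. 9.
class RequestingClient {
  constructor(sendToServer) {
    this.sendToServer = sendToServer; // assumed signaling transport
    this.videoFrames = {};            // targetId -> subtitle currently shown
    this.minutes = [];                // dialog-text conference minutes
  }

  // Sending module 901: request subtitles (or translated subtitles when a
  // target language is given) for one or more target clients.
  requestSubtitles(targets, opts = {}) {
    const type = opts.dstLang ? 'translated-subtitle-request' : 'subtitle-request';
    this.sendToServer({ type, targets, ...opts });
  }

  // Display module 902: show a returned subtitle in the sender's video frame.
  onSubtitle({ from, subtitle }) {
    this.videoFrames[from] = subtitle;
    this.saveSubtitle(from, subtitle);
  }

  // Saving module 903: keep subtitles as conference minutes.
  saveSubtitle(from, subtitle) {
    this.minutes.push(`${from}: ${subtitle}`);
  }
}

const out = [];
const client = new RequestingClient(msg => out.push(msg));
client.requestSubtitles(['B'], { srcLang: 'en', dstLang: 'zh' });
client.onSubtitle({ from: 'B', subtitle: '你好' });
console.log(out[0].type, client.videoFrames.B, client.minutes[0]);
```

Saving each displayed subtitle line as `sender: text` gives the dialog-text conference minutes mentioned in the embodiments.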
  • FIG. 10 is a schematic diagram of a WebRTC client according to an embodiment of the present invention.
  • The WebRTC client can act as a target client.
  • The WebRTC client in this embodiment includes:
  • the first transmission module 1001, configured to: after receiving a subtitle request message from the WebRTC server, send the client's own audio to the speech analysis subtitle server;
  • the second transmission module 1002, configured to: after receiving the subtitles returned by the speech analysis subtitle server, return the subtitles to the WebRTC server.
  • The second transmission module 1002 may be specifically configured to: after receiving the subtitles returned by the speech analysis subtitle server, send a translated-subtitle request to the translation server, the translated-subtitle request including: the subtitles, a translation source language, and a translation target language; and after receiving the translated subtitles returned by the translation server, send the translated subtitles to the WebRTC server.
  • The translated-subtitle request may further include a translation return type, the translation return type including speech translation; and the WebRTC client further includes:
  • the third transmission module 1003, configured to: after receiving the translated audio returned by the translation server, put the translated audio into the real-time video stream and send it through a pre-established media channel to the WebRTC client that requested the translated subtitles.
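The target-client pipeline of modules 1001-1003 can be sketched as follows. The external speech analysis subtitle server and translation server are modeled as injected async functions (`parse`, `translate`), and the server/media-channel outputs as callbacks; all of these interfaces are assumptions for illustration.

```javascript
// Sketch of the target client of FIG. 10, with external servers stubbed
// as async functions whose interfaces are assumed.
class TargetClient {
  constructor({ parse, translate, toServer, toMediaChannel }) {
    this.parse = parse;                   // audio -> subtitle text
    this.translate = translate;           // request -> { text, audio }
    this.toServer = toServer;             // return subtitles to the webrtc server
    this.toMediaChannel = toMediaChannel; // push translated audio to the peer
  }

  async onSubtitleRequest(audio, { srcLang, dstLang, returnType = [] } = {}) {
    // Module 1001: send own audio to the speech analysis subtitle server.
    const subtitle = await this.parse(audio);

    // Plain subtitle request: return the parsed subtitle directly.
    if (!dstLang) {
      this.toServer({ subtitle });
      return;
    }

    // Module 1002: request translation, then return the translated subtitle.
    const result = await this.translate({ subtitle, srcLang, dstLang, returnType });
    this.toServer({ subtitle: result.text });

    // Module 1003: if speech translation was requested, put the translated
    // audio into the real-time stream over the pre-established media channel.
    if (returnType.includes('speech') && result.audio) {
      this.toMediaChannel(result.audio);
    }
  }
}
```

Note that only the translated subtitle text travels back through the signaling server, while translated audio goes over the already-established P2P media channel, mirroring steps 805-813 of the embodiment.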
  • The embodiment of the invention further discloses a computer program comprising program instructions which, when executed by a server, enable the server to perform any of the above server-side web real-time communication WebRTC point-to-point audio and video call methods.
  • The embodiment of the invention also discloses a carrier carrying the computer program.
  • The embodiment of the invention further discloses a computer program comprising program instructions which, when executed by a client acting as the subtitle-requesting party, enable that client to perform any of the above client-side WebRTC point-to-point audio and video call methods for the subtitle-requesting party.
  • The embodiment of the invention also discloses a carrier carrying the computer program.
  • The embodiment of the invention further discloses a computer program comprising program instructions which, when executed by a target client, enable that client to perform any of the above target-client-side web real-time communication WebRTC point-to-point audio and video call methods.
  • The embodiment of the invention also discloses a carrier carrying the computer program.
  • The WebRTC point-to-point audio and video call method, WebRTC server, and WebRTC client provided by the embodiments of the invention enable users to cross language barriers and converse more conveniently.
  • The speaker's speech is automatically parsed and displayed as subtitles, translated subtitles, or translated audio. Users can easily determine who is speaking and understand the speech content without searching for the speaker in multiple video windows. Therefore, the present invention has strong industrial applicability.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)
  • Telephonic Communication Services (AREA)
  • Machine Translation (AREA)

Abstract

A method for a WebRTC point-to-point audio and video call, a WebRTC server, and a WebRTC client, enabling a user to overcome language barriers and converse more conveniently. In a multi-user video conference, the speech of a speaker is automatically parsed and displayed as subtitles, translated subtitles, or translated audio, so the user can easily determine who is speaking and understand the speech content without searching for the speaker in multiple video windows.

Description

Method and device for WebRTC P2P audio and video call

Technical Field

The invention relates to the field of WebRTC P2P audio and video call technology, and in particular to a method for WebRTC P2P audio and video calls, a WebRTC server, and a WebRTC client.

Background
With the development of the World Wide Web and the mobile Internet, HTML5 (Hyper Text Markup Language 5) has in recent years become a hot spot pursued by both the market and standards bodies. As a new direction in the development of network technology, an important core technology of HTML5 is WebRTC. WebRTC (Web Real-Time Communication) implements web-based video conferencing, with the goal of achieving real-time communications capability through simple JavaScript provided by the browser.

The ultimate goal of the WebRTC project is to enable web developers to quickly and easily develop rich real-time multimedia applications based on browsers (such as Chrome, Firefox, ...) without downloading or installing any plug-ins. Web developers do not need to concern themselves with the digital signal processing of multimedia; the functionality can be realized by writing a simple JavaScript program. Organizations such as the W3C (World Wide Web Consortium) are responsible for formulating the standard JavaScript (JS) APIs (Application Programming Interfaces). WebRTC also aims to build a platform for robust real-time communication among multiple Internet browsers, forming a healthy ecosystem for developers and browser vendors.

WebRTC technology has become part of the HTML5 standard, and as the WebRTC standard matures, various applications based on WebRTC technology have gradually appeared on the market. These applications are developed with web technology, and because browser vendors have gradually added support for webrtc, applications developed with webrtc can run on various PC or mobile terminals whose browsers support webrtc. This trend greatly reduces development difficulty, and the workload of maintaining multiple terminals and multiple versions is also greatly reduced.

With the development of web technologies, more and more applications are being developed using HTML5. As an important part of the HTML5 standard, WebRTC implements real-time communication between browsers, and more and more browser vendors, led by Chrome, have announced support for the webrtc standard.

Typical application scenarios for webrtc technology and standards are point-to-point calls, multi-party video conferencing, customer service centers, and distance education. That is, a browser application developed with webrtc technology can implement real-time communication functions such as acquiring the microphone, screen sharing, acquiring the camera, and streaming media transmission, so that users can make real-time calls in the browser. However, the experience of multi-party audio and video conferences in browsers developed with the standard webrtc interfaces still needs further improvement: in a multi-party conference the screen windows are small, making it hard to tell who is speaking; conference speech can only be saved as a recording, and subtitles cannot be saved; and when the participants speak different languages, the language barrier requires displayed subtitles to properly improve the user experience.
Summary of the Invention

The technical problem to be solved by the present invention is to provide a WebRTC point-to-point audio and video call method, a WebRTC server, and a WebRTC client, so as to enable calls across language barriers.

In order to solve the above technical problem, the following technical solutions are adopted:

A method for web real-time communication (WebRTC) point-to-point audio and video calls, comprising:

after receiving a subtitle request message or a translated-subtitle request message from a first WebRTC client, a WebRTC server sends the subtitle request message or translated-subtitle request message to one or more target WebRTC clients;

after receiving subtitles or translated subtitles returned by one or more of the target WebRTC clients, the WebRTC server sends the subtitles or translated subtitles to the first WebRTC client in real time.

Optionally, the translated-subtitle request message includes: a translation source language, a translation target language, and a translation return type, where the translation return type includes text translation and/or speech translation.
A web real-time communication WebRTC server, comprising a first transmission module and a second transmission module, wherein

the first transmission module is configured to: after receiving a subtitle request message or translated-subtitle request message from a first WebRTC client, send the subtitle request message or translated-subtitle request message to one or more target WebRTC clients;

the second transmission module is configured to: after receiving subtitles or translated subtitles returned by one or more of the target WebRTC clients, send the subtitles or translated subtitles to the first WebRTC client in real time.

Optionally, the translated-subtitle request message includes: a translation source language, a translation target language, and a translation return type, where the translation return type includes text translation and/or speech translation.
A method for web real-time communication (WebRTC) point-to-point audio and video calls, comprising:

a WebRTC client sends to a WebRTC server a subtitle request message or translated-subtitle request message requesting subtitles from one or more target WebRTC clients;

after receiving the subtitles or translated subtitles returned by the WebRTC server, the WebRTC client displays the subtitles or translated subtitles in the video frame of the corresponding target WebRTC client.

Optionally, the translated-subtitle request message includes: a translation source language, a translation target language, and a translation return type, where the translation return type includes text translation and/or speech translation.

Optionally, the method further includes:

the WebRTC client saves the subtitles or translated subtitles.
A WebRTC client, comprising a sending module and a display module, wherein

the sending module is configured to: send to a WebRTC server a subtitle request message or translated-subtitle request message requesting subtitles from one or more target WebRTC clients;

the display module is configured to: after receiving the subtitles or translated subtitles returned by the WebRTC server, display the subtitles or translated subtitles in the video frame of the corresponding target WebRTC client.

Optionally, the client further includes a saving module, wherein

the saving module is configured to: save the subtitles or translated subtitles.
A method for web real-time communication (WebRTC) point-to-point audio and video calls, comprising:

after receiving a subtitle request message from a WebRTC server, a WebRTC client sends its own audio to a speech analysis subtitle server;

after receiving the subtitles returned by the speech analysis subtitle server, the WebRTC client returns the subtitles to the WebRTC server.

Optionally, the step of returning the subtitles to the WebRTC server after receiving the subtitles returned by the speech analysis subtitle server includes:

after receiving the subtitles returned by the speech analysis subtitle server, the WebRTC client sends a translated-subtitle request to a translation server, the translated-subtitle request including: the subtitles, a translation source language, and a translation target language;

after receiving the translated subtitles returned by the translation server, the WebRTC client sends the translated subtitles to the WebRTC server.

Optionally, the translated-subtitle request further includes a translation return type, the translation return type including speech translation;

the method further includes: after receiving the translated audio returned by the translation server, the WebRTC client puts the translated audio into the real-time video stream and sends it through a pre-established media channel to the WebRTC client that requested the translated subtitles.
A WebRTC client, comprising a first transmission module and a second transmission module, wherein

the first transmission module is configured to: after receiving a translated-subtitle request message from a WebRTC server, send the client's own audio to a speech analysis subtitle server;

the second transmission module is configured to: after receiving the subtitles returned by the speech analysis subtitle server, return the subtitles to the WebRTC server.

Optionally, the second transmission module is configured to return the subtitles to the WebRTC server after receiving them from the speech analysis subtitle server as follows:

after receiving the subtitles returned by the speech analysis subtitle server, send a translated-subtitle request to a translation server, the translated-subtitle request including: the subtitles, a translation source language, and a translation target language;

after receiving the translated subtitles returned by the translation server, send the translated subtitles to the WebRTC server.

Optionally, the translated-subtitle request further includes a translation return type, the translation return type including speech translation;

the WebRTC client further includes a third transmission module, wherein

the third transmission module is configured to: after receiving the translated audio returned by the translation server, put the translated audio into the real-time video stream and send it through a pre-established media channel to the WebRTC client that requested the translated subtitles.
In summary, the WebRTC point-to-point audio and video call method, WebRTC server, and WebRTC client provided by the embodiments of the present invention enable users to cross language barriers and converse more conveniently. In a multi-person video conference, the speaker's speech is automatically parsed and displayed as subtitles, translated subtitles, or translated audio, so users can easily tell who is speaking and understand the speech content without searching for the speaker in multiple video windows.
Brief Description of the Drawings

FIG. 1 is a functional block diagram of a related-art webrtc server;

FIG. 2 is a related-art flowchart of establishing a two-party call using webrtc technology;

FIG. 3 is a flowchart of requesting subtitles when webrtc establishes a P2P (peer-to-peer) two-party call according to Embodiment 1 of the present invention;

FIG. 4 is a flowchart of requesting translated subtitles when webrtc establishes a P2P two-party call according to Embodiment 2 of the present invention;

FIG. 5 is a schematic diagram of the P2P media channels already established when webrtc sets up a P2P three-party conference;

FIG. 6 is a flowchart of requesting subtitles when webrtc establishes a P2P three-party conference according to Embodiment 3 of the present invention;

FIG. 7 is a flowchart of requesting translated subtitles/translated audio when webrtc establishes a P2P three-party conference according to Embodiment 4 of the present invention;

FIG. 8 is a schematic diagram of a WebRTC server according to an embodiment of the present invention;

FIG. 9 is a schematic diagram of a WebRTC client acting as the subtitle-requesting party according to an embodiment of the present invention;

FIG. 10 is a schematic diagram of a target WebRTC client according to an embodiment of the present invention.
Preferred Embodiments of the Invention

The embodiments of the present invention are described in detail below with reference to the accompanying drawings. It should be noted that, where there is no conflict, the embodiments of the present application and the features in the embodiments may be combined with each other arbitrarily.

The following is an overview of the subject matter described in detail herein. This overview is not intended to limit the scope of the claims.
FIG. 1 is a functional block diagram of a related-art webrtc server. The webrtc server includes:

Web server: provides the webrtc web service; the user accesses this web server from a browser app (application) client to obtain the webrtc service.

The user opens the application by accessing the web server functional module of the webrtc server through a browser. The service deployed on the web server complies with the relevant webrtc standards, and in the browser the user can register, establish audio calls, establish multi-party video calls, and so on through standard webrtc JS. The web server may also include application management functions outside the standard, such as user information maintenance and friend management.

Signaling server: performs signaling interaction when webrtc establishes a connection.

Media processing module: processes media, including segmenting real-time media streams and sending the segments to the external subtitle server and translation server, and integrating returned subtitles or audio into the audio and video stream of the real-time conversation.

Conference control module: controls conferences in the webrtc conference, including creating a conference, exiting a conference, adding conference members, and conference host control.

Firewall traversal server: provides firewall traversal for webrtc audio/video conferences and audio/video calls.

The firewall traversal functional module allows application developers on the webrtc browser side to use standard interfaces to obtain firewall traversal information. This functional module can be deployed on the webrtc server or elsewhere.
The webrtc client refers to the browser-side application deployed at the address the user accesses through the browser; the user accesses the web server on the webrtc server through the webrtc client.

In a webrtc application, both the client-side JavaScript code in the browser and the server code on the web server must conform to the webrtc standard when establishing audio and video communication.

On the application side, using JavaScript code to control the browser's access to the webrtc service on the webrtc server is a typical feature of webrtc technology. This means the browser takes on more of the work: browser vendors must provide the necessary functions to support webrtc, so that JavaScript code running in the browser can invoke the signaling and media interaction needed in a video call through the browser's unified standard interfaces. This greatly simplifies the browser services developers provide, shielding the underlying media and signaling so that only simple JavaScript calls are needed. Webrtc technology is therefore a trend both now and in the future. With the development of mobile terminals, more and more browsers, mobile browsers, and mobile WebKit builds will support webrtc, which makes application development easier and adaptation to multiple terminals more convenient.

The webrtc P2P audio/video conference and call implemented with this device allows users to converse in multiple languages in real time, providing real-time synchronized subtitle translation of the audio/video streams or direct translation into speech. Users can thus cross language barriers and communicate more conveniently during calls and conferences.

The webrtc P2P audio/video conference/call application has the following main features: 1. users in an audio/video conference or call can view subtitles of the other party's speech in real time; 2. users can select a translation target language, and the system translates the other party's language into a language they understand and displays the translated subtitles; 3. users can select a translation target language, and the system translates the other party's language into the target language, displays the translated subtitles, and simultaneously plays the speech in the translated language.
FIG. 2 is a flowchart of implementing a point-to-point call using webrtc technology. It covers the core functions of each functional module in the webrtc server during a webrtc point-to-point call. In the flowchart, User A represents User A's browser and the user's client application. The client application is actually the web service provided by the web server functional module deployed on the webrtc server; User A opens this application by opening an address in the browser. As shown in FIG. 2, the process includes the following steps:

Step 201: User A requests firewall traversal information from the firewall traversal server, and the firewall returns the information used for traversal to User A.

Step 202: User A sends a media call request to the signaling server in the webrtc server.

Step 203: The signaling server sends A's media call request to User B.

Step 204: User B requests firewall traversal information from the firewall traversal server, and the firewall returns the information used for traversal to User B.

Step 205: User B sends a response to the signaling server.

Step 206: The media connection between User A and User B is established, and A and B can make a point-to-point call through this media link.
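The relay pattern of steps 201-206 can be sketched as an offer/answer exchange through the signaling server. In a real client this would use the browser's RTCPeerConnection and an ICE/STUN service; here both are stubbed so that only the message flow of the figure is shown, and `relay(clientId, message)` is an assumed stand-in for the signaling server.

```javascript
// Simplified model of the FIG. 2 call setup; ICE info and the transport
// are stubs, and relay(clientId, message) stands in for the signaling server.
function setupCall(relay, callerId, calleeId) {
  // Steps 201/204: each side fetches firewall traversal info (stubbed ICE).
  const iceInfo = (side) => ({ side, candidates: ['stub-candidate'] });

  // Steps 202/203: the caller's media call request is relayed to the callee.
  relay(calleeId, { type: 'offer', from: callerId, ice: iceInfo(callerId) });

  // Step 205: the callee's answer is relayed back to the caller.
  relay(callerId, { type: 'answer', from: calleeId, ice: iceInfo(calleeId) });

  // Step 206: with offer, answer, and traversal info exchanged, the P2P
  // media channel is considered established.
  return 'media-channel-established';
}

const wire = [];
const status = setupCall((to, msg) => wire.push({ to, msg }), 'A', 'B');
console.log(status, wire.map(m => m.msg.type));
```

Once this exchange completes, media flows peer to peer; the signaling server is only needed again for control messages, such as the subtitle requests introduced by the embodiments.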
以上步骤是使用webrtc的协议在浏览器中进行点对点呼叫的流程。该流程也是现有的webrtc实现点对点呼叫使用的一个典型流程。The above steps are the process of making a point-to-point call in the browser using the webrtc protocol. This process is also a typical process used by existing webrtcs to implement point-to-point calls.
本发明实施例对相关技术的webrtcP2P视频通话的流程的改进主要是,在双方的P2P媒体通道或数据通道建立完毕之后,这一过程是webrtc建立媒体通道的标准流程,是本发明实施例的前置条件。在通话方建立了P2P的媒体通道后,仍可通过webrtc server的信令服务器来请求字幕或请求翻译字幕,是本发明的发明点所在。The improvement of the related procedure of the webrtcP2P video call in the embodiment of the present invention is mainly after the establishment of the P2P media channel or the data channel of the two parties, the process is a standard process for the webrtc to establish a media channel, which is before the embodiment of the present invention. Set the condition. After the P2P media channel is established by the party, the subtitle or request for subtitles can still be requested through the signaling server of the webrtc server, which is the invention of the present invention.
本发明实施例提供一种WebRTC点对点音视频通话的方法及WebRTC服务器与WebRTC客户端,使用户可以跨越语言的障碍,更方便的进行通话。在多人视频会议中,发言人将自动解析和显示字幕,用户可以轻松判断谁正在发言,而不需要在多个视频窗口中寻找发言人。并且,当语言不通的时候,这种系统架构也提供了完整的多语言字幕翻译和语音翻译的功能。字幕翻译指的是,对某个正在发言的用户进行语音分析形成文本后,根据实时的发言文本将字幕翻译为请求翻译的语言。语音翻译指的是对某个正在发言的用户进行语音分析形成文本后,根据实时的发言文本将字幕翻译为请求翻译的语 言的相应字幕,并将该字幕转化为请求翻译的语言的相应的音频播放出来。The embodiment of the invention provides a WebRTC point-to-point audio and video call method and a WebRTC server and a WebRTC client, so that the user can cross the language barrier and make the call more convenient. In a multi-person video conference, the speaker will automatically parse and display the subtitles, and the user can easily determine who is speaking without having to find a speaker in multiple video windows. And, when the language is not available, this system architecture also provides full multi-language subtitle translation and speech translation. Subtitle translation refers to the translation of subtitles into the language of the requested translation based on the real-time speech text after speech analysis is performed on a user who is speaking. Voice translation refers to the speech analysis of a user who is speaking to form a text, and then translate the subtitle into the language of the request translation according to the real-time speech text. The corresponding subtitles of the words, and the subtitles are converted into corresponding audios of the language requesting translation.
本发明实施例的方法能够将发言的会议成员的语音进行语音解析，形成文本并显示字幕，进一步的，也可以对解析出来的文本进行翻译，显示翻译目标语言的字幕，进一步的，也可以对翻译目标语言的文本进行语音转换，将转换后的音频流合成到视频流中，直接播放翻译目标语言的语音。The method of the embodiments of the present invention can perform speech analysis on the voice of a speaking conference member to form text and display subtitles; further, the parsed text can be translated so that subtitles in the translation target language are displayed; further still, the text in the translation target language can be converted to speech, and the converted audio stream can be synthesized into the video stream so that speech in the translation target language is played directly.
对于字幕和字幕翻译，有三种典型的应用场景，1，用户A请求用户B的字幕，2，用户A请求用户B的翻译字幕，3，用户A请求用户B的翻译语音。For subtitles and subtitle translation, there are three typical application scenarios: 1. User A requests User B's subtitles; 2. User A requests User B's translated subtitles; 3. User A requests User B's translated speech.
下面的实施例将对这几种应用场景进行详细的描述。The following embodiments will describe these application scenarios in detail.
实施例1Embodiment 1
图3是webrtc双方P2P通话时请求字幕的操作图。假设用户A和用户B已经按照图2的流程或者WEBRTC应用本身的流程建立了媒体通道，已经可以使用媒体通道进行正常的P2P视频通话了。本实施例描述了P2P视频通话过程中用户A请求用户B的字幕的流程图。FIG. 3 is an operation diagram of requesting subtitles during a two-party WebRTC P2P call. It is assumed that User A and User B have already established a media channel according to the procedure of FIG. 2 or the WebRTC application's own procedure, and can already use the media channel for a normal P2P video call. This embodiment describes the flow in which User A requests User B's subtitles during a P2P video call.
步骤301,用户A向webrtc server的信令服务器发送字幕请求消息;Step 301: User A sends a subtitle request message to a signaling server of the webrtc server.
步骤302,webrtc信令服务器向用户B发送字幕请求消息;Step 302: The webrtc signaling server sends a subtitle request message to the user B.
步骤303,用户B收到字幕请求后,将自己的音频发送给语音分析字幕服务器;Step 303, after receiving the subtitle request, the user B sends its own audio to the voice analysis subtitle server;
步骤304,语音分析字幕服务器将音频解析为字幕,将字幕返回给用户B;Step 304, the voice analysis subtitle server parses the audio into subtitles, and returns the subtitles to the user B;
步骤305,用户B将字幕返回给webrtc信令服务器;Step 305, user B returns the subtitle to the webrtc signaling server;
步骤306,webrtc信令服务器将字幕返回给用户A,用户A的浏览器将收到的B的字幕显示在B的视频框中。Step 306, the webrtc signaling server returns the subtitle to the user A, and the browser of the user A displays the received subtitle of B in the video frame of B.
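Steps 301–306 amount to a simple relay through the signaling server. The following sketch models that relay as pure routing logic; the message field names (`type`, `requester`, `target`, `speaker`, `text`) and helper names are illustrative assumptions of this sketch, not definitions from this document.

```typescript
// Hypothetical message shapes for the embodiment-1 relay (steps 301-306).
interface SubtitleRequest { type: "subtitle-request"; requester: string; target: string; }
interface SubtitleReply { type: "subtitle"; speaker: string; text: string; }

// Steps 301-302: the signaling server forwards A's request to target B unchanged.
function forwardRequest(req: SubtitleRequest): { to: string; msg: SubtitleRequest } {
  return { to: req.target, msg: req };
}

// Steps 305-306: the server routes B's parsed subtitle back to the requester,
// tagged with the speaker so A's browser can show it in B's video window.
function routeSubtitle(req: SubtitleRequest, text: string): { to: string; msg: SubtitleReply } {
  return { to: req.requester, msg: { type: "subtitle", speaker: req.target, text } };
}
```

Steps 303–304 (the target client exchanging its audio for subtitles with the external speech analysis subtitle server) happen between the two routing functions and are not modelled here.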
其中，语音分析字幕服务器为外部服务器，不是本发明的发明内容。语音分析字幕服务器的主要功能是根据音频实时进行分析，将语音解析为字幕后返回。在本实施例中，用户的浏览器侧client须将视频流中的音频部分实时分段发送给语音分析字幕服务器来实时解析语音，音频分段发送的规则由浏览器侧的client根据用户习惯和语音停顿来决定。 The speech analysis subtitle server is an external server and is not part of the inventive content of the present invention. Its main function is to analyze audio in real time and to parse the speech into subtitles and return them. In this embodiment, the user's browser-side client must segment the audio portion of the video stream and send the segments to the speech analysis subtitle server in real time so that the speech is parsed in real time; the audio segmentation rules are decided by the browser-side client according to the user's speaking habits and speech pauses.
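The pause-based segmentation rule described above might look like the following sketch. The energy threshold, the minimum pause length, and the representation of a chunk as a single energy value are illustrative assumptions; a real client would tap the WebRTC audio track, which is not modelled here.

```typescript
// Split a stream of audio chunks into utterance segments at speech pauses.
// Each chunk is abstracted to its energy level; a run of low-energy chunks
// is treated as a pause that closes the current segment.
function segmentByPauses(
  energies: number[],
  silenceThreshold = 0.05, // assumed tuning constants
  minPauseChunks = 3
): number[][] {
  const segments: number[][] = [];
  let current: number[] = [];
  let silentRun = 0;
  for (const e of energies) {
    if (e < silenceThreshold) {
      silentRun++;
      if (silentRun >= minPauseChunks && current.length > 0) {
        segments.push(current); // pause detected: flush the finished segment
        current = [];
      }
    } else {
      silentRun = 0;
      current.push(e);
    }
  }
  if (current.length > 0) segments.push(current); // flush the trailing segment
  return segments;
}
```

Each returned segment would then be sent to the speech analysis subtitle server as one request.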
本实施例的流程是用户A请求用户B的字幕的流程，同样的，B也可以同时请求A的字幕，流程相同。对于双方视频通话时默认为都需要显示字幕的情形，只需要webrtc应用本身使用本流程的基本原理来设置是否请求字幕即可。The flow of this embodiment is that of User A requesting User B's subtitles; likewise, B can simultaneously request A's subtitles by the same flow. For the case where subtitles are displayed for both parties by default during a video call, the WebRTC application itself only needs to apply the basic principle of this flow to set whether subtitles are requested.
实施例2为请求翻译字幕的流程。与实施例1相比，实施例2中的流程在语音分析解析出字幕后多了一个步骤，该步骤就是将解析出来的文字每句发给外部翻译服务器，由外部翻译服务器对字幕进行翻译并返回文字翻译字幕或者翻译后的语言的语音音频。图4就是实施例2请求翻译文字字幕的步骤图。其中，Embodiment 2 is a flow for requesting translated subtitles. Compared with Embodiment 1, the flow of Embodiment 2 has one extra step after speech analysis has parsed out the subtitles: each parsed sentence is sent to an external translation server, which translates the subtitles and returns either translated text subtitles or speech audio in the translated language. FIG. 4 is a step diagram of requesting translated text subtitles in Embodiment 2. Therein:
步骤401，用户A向webrtc server的信令服务器发送翻译字幕请求消息，并指定翻译的目标语言，假设B使用语言为英语，A希望B的字幕被翻译为中文并显示出来；Step 401: User A sends a translated-subtitle request message to the signaling server of the WebRTC server and specifies the translation target language; assume B speaks English and A wants B's subtitles to be translated into Chinese and displayed;
步骤402,webrtc信令服务器向用户B发送字幕请求消息,该请求消息包含翻译源语言、翻译目标语言、翻译返回类型(翻译返回类型假设为文字翻译或语音翻译);Step 402: The webrtc signaling server sends a subtitle request message to the user B, where the request message includes a translation source language, a translation target language, and a translation return type (the translation return type is assumed to be a text translation or a speech translation);
步骤403,用户B收到字幕请求后,将自己的音频发送给语音分析字幕服务器;Step 403, after receiving the subtitle request, the user B sends its own audio to the voice analysis subtitle server;
步骤404,语音分析字幕服务器将音频解析为字幕,将字幕返回给用户B;Step 404, the voice analysis subtitle server parses the audio into subtitles, and returns the subtitles to the user B;
步骤405,用户B发送翻译字幕请求到翻译服务器。该请求包含了解析后的字幕,翻译源语言,翻译目标语言,翻译返回类型;In step 405, user B sends a subtitle request to the translation server. The request contains parsed subtitles, translation source language, translation target language, translation return type;
假设翻译请求的参数翻译返回类型设置为文字翻译，那么执行以下步骤：If the translation return type parameter of the translation request is set to text translation, the following steps are performed:
步骤406a,翻译服务器根据翻译请求,将翻译字幕返回给用户B;Step 406a, the translation server returns the translated subtitles to the user B according to the translation request;
步骤407a,用户B将翻译字幕返回给webrtc信令服务器;Step 407a, user B returns the translated subtitles to the webrtc signaling server;
步骤408a,webrtc信令服务器将翻译字幕返回给用户A,用户A的浏览器将收到的B的字幕显示在B的视频框中;Step 408a, the webrtc signaling server returns the translated subtitles to the user A, and the browser of the user A displays the received subtitles of the B in the video frame of the B;
假设翻译请求的参数翻译返回类型设置为语音翻译，那么执行以下步骤：If the translation return type parameter of the translation request is set to speech translation, the following steps are performed:
步骤406b，翻译服务器根据翻译请求，将翻译后的字幕和音频返回给用户B。用户B将翻译后的音频放到实时的视频流中，通过媒体通道将视频和翻译后的音频发送给用户A；Step 406b: The translation server returns the translated subtitles and audio to User B according to the translation request. User B puts the translated audio into the real-time video stream and sends the video with the translated audio to User A through the media channel;
步骤407b,用户B将翻译字幕返回给webrtc信令服务器;Step 407b, user B returns the translated subtitles to the webrtc signaling server;
步骤408b,webrtc信令服务器将翻译字幕返回给用户A,用户A的浏览器将收到的B的翻译字幕显示在B的视频框中。In step 408b, the webrtc signaling server returns the translated subtitles to the user A, and the browser of the user A displays the translated subtitles of the received B in the video frame of B.
对于不同的翻译类型的请求,外部的翻译服务器会根据请求中的返回类型参数而选择不同的操作流程。For requests of different translation types, the external translation server selects different operational flows based on the return type parameters in the request.
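The branch on the return-type parameter (steps 406a vs. 406b) can be sketched as a dispatch on the translation request. The request/response field names and the `translateText`/`synthesize` placeholders are assumptions of this sketch and stand in for the external translation engine, which this document does not specify.

```typescript
// Hypothetical request/response shapes for the external translation server.
type TranslationReturnType = "text" | "speech";

interface TranslationRequest {
  subtitle: string;       // parsed subtitle from the speech analysis server
  sourceLang: string;     // e.g. "en"
  targetLang: string;     // e.g. "zh"
  returnType: TranslationReturnType;
}

interface TranslationResponse {
  translatedSubtitle: string;
  translatedAudio?: Uint8Array; // only present for speech translation (step 406b)
}

// Text translation returns subtitles only (step 406a); speech translation
// returns the translated subtitles plus synthesized audio (step 406b).
function handleTranslation(
  req: TranslationRequest,
  translateText: (text: string, from: string, to: string) => string,
  synthesize: (text: string, lang: string) => Uint8Array
): TranslationResponse {
  const translatedSubtitle = translateText(req.subtitle, req.sourceLang, req.targetLang);
  if (req.returnType === "text") return { translatedSubtitle };
  return { translatedSubtitle, translatedAudio: synthesize(translatedSubtitle, req.targetLang) };
}
```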
图5是三方P2P通话建立了媒体通道之后的示意图。本发明实施例在webrtc已经完成了P2P的媒体通道连接，也就是在完成了图5的基础上，增加了字幕解析、翻译字幕、翻译音频的流程，使得用户在三方webrtc P2P通话的时候可以跨越语言的障碍，实现字幕解析、语言翻译、语音翻译。FIG. 5 is a schematic diagram of a three-party P2P call after the media channels have been established. On the basis that WebRTC has already completed the P2P media channel connections, i.e., on the basis of FIG. 5, the embodiments of the present invention add flows for subtitle parsing, subtitle translation, and audio translation, so that users in a three-party WebRTC P2P call can cross language barriers and obtain subtitle parsing, language translation, and speech translation.
实施例3,图6显示了webrtc已经完成了P2P的媒体通道连接之后实现字幕解析的流程。Embodiment 3, FIG. 6 shows a flow of realizing subtitle parsing after webrtc has completed the media channel connection of P2P.
前置条件：用户A，用户B和用户C已经使用WEBRTC视频会议系统进行了登陆并建立了三方P2P通话，A、B和C之间已经建立了媒体通道。信令通道仍然通过webrtc的信令服务器来进行命令操作。Preconditions: User A, User B, and User C have logged in using the WebRTC video conferencing system and established a three-party P2P call; media channels have been established among A, B, and C. Command operations are still performed over the signaling channel through the WebRTC signaling server.
本实施例假设A请求B和C的发言字幕。This embodiment assumes that A requests the subtitles of B and C.
步骤601、用户A向webrtc信令服务器请求用户B和用户C的字幕;Step 601: User A requests subtitles of user B and user C to the webrtc signaling server.
步骤602、webrtc信令服务器向用户C发出字幕请求;Step 602: The webrtc signaling server sends a subtitle request to the user C.
步骤603、用户C向外部的语音分析字幕服务器发送自己的发言音频,请求字幕解析;Step 603: The user C sends its own speech audio to the external speech analysis subtitle server, and requests subtitle analysis.
步骤604、语音分析字幕服务器向C返回语音解析出来的字幕;Step 604: The voice analysis subtitle server returns the subtitles parsed by the voice to C;
步骤605、用户C向webrtc信令服务器返回实时字幕;Step 605: User C returns real-time subtitles to the webrtc signaling server.
步骤606、webrtc信令服务器向用户B发出字幕请求;Step 606: The webrtc signaling server sends a subtitle request to the user B.
步骤607、用户B向外部的语音分析字幕服务器发送自己的发言音频,请求字幕解析;Step 607: User B sends its own speech audio to the external speech analysis subtitle server, and requests subtitle analysis.
步骤608、语音分析字幕服务器向B返回语音解析出来的字幕;Step 608: The voice analysis subtitle server returns the subtitles parsed by the voice to B;
步骤609、用户B向webrtc信令服务器返回实时字幕; Step 609: User B returns real-time subtitles to the webrtc signaling server.
步骤610、webrtc信令服务器在接收到用户B和C的字幕时将实时将字幕发送给用户A，用户A根据返回结果将字幕显示在用户B和C的视频对话框中。Step 610: Upon receiving the subtitles of Users B and C, the WebRTC signaling server sends them to User A in real time, and User A displays the subtitles in the video windows of Users B and C according to the returned results.
对于以上流程，步骤602～步骤605和步骤606～步骤609可以同时进行，也就是说，当webrtc信令服务器收到字幕请求的时候可以同时向用户B和C发起字幕请求，用户B和C在进行发言时根据发言的情形实时的将字幕返回给webrtc信令服务器，webrtc信令服务器收到字幕就实时将字幕发送给用户A。In the above flow, steps 602–605 and steps 606–609 may be performed simultaneously; that is, when the WebRTC signaling server receives the subtitle request, it can initiate subtitle requests to Users B and C at the same time. While speaking, Users B and C return subtitles to the WebRTC signaling server in real time according to their speech, and the signaling server forwards each subtitle to User A as soon as it is received.
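The parallelism described above (steps 602–605 running alongside 606–609) can be sketched with concurrent requests. `requestSubtitle` abstracts the whole per-target exchange (request, speech analysis, reply) and `deliverToRequester` abstracts forwarding to user A; both names are assumptions of this sketch.

```typescript
// The signaling server fans the request out to all targets at once and
// forwards each subtitle to the requester as soon as it arrives, rather
// than handling B and C sequentially.
async function fanOutSubtitleRequests(
  targets: string[],
  requestSubtitle: (user: string) => Promise<string>,
  deliverToRequester: (speaker: string, text: string) => void
): Promise<void> {
  await Promise.all(
    targets.map(async (user) => {
      const text = await requestSubtitle(user);
      deliverToRequester(user, text); // delivered per speaker, as soon as ready
    })
  );
}
```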
同理,当用户B需要请求字幕时也可以向webrtc信令服务器发起字幕请求,当用户C需要请求字幕时也可以向webrtc信令服务器发起字幕请求。Similarly, when the user B needs to request the subtitle, the subtitle request can also be initiated to the webrtc signaling server. When the user C needs to request the subtitle, the subtitle request can also be initiated to the webrtc signaling server.
会议也可以设置为自动为每个用户添加字幕，这种情形下，只需要用户侧的浏览器端应用向语音分析字幕服务器发起字幕请求，获取到字幕后发给webrtc信令服务器并由webrtc信令服务器进行字幕分发即可。The conference can also be set to add subtitles for every user automatically. In this case, the browser-side application on the user side only needs to send a subtitle request to the speech analysis subtitle server, obtain the subtitles, and forward them to the WebRTC signaling server, which then distributes the subtitles.
实施例4：Embodiment 4:
本实施例假设用户A请求B和C的翻译字幕。This embodiment assumes that User A requests the translated subtitles of B and C.
步骤701、用户A向webrtc信令服务器请求用户B和用户C的翻译字幕;Step 701: User A requests translation subtitles of user B and user C to the webrtc signaling server.
步骤702、webrtc信令服务器向用户C发出请求翻译字幕的请求;Step 702: The webrtc signaling server sends a request to the user C to request subtitle translation;
步骤703、用户C向外部的语音分析字幕服务器发送自己的发言音频,请求字幕解析;Step 703: The user C sends its own speech audio to the external speech analysis subtitle server, and requests subtitle analysis.
步骤704、语音分析字幕服务器向C返回语音解析出来的字幕;Step 704: The voice analysis subtitle server returns a subtitle that is parsed by the voice to C.
步骤705、用户C向外部功能模块翻译服务器发起翻译字幕请求，该请求包含了解析后的字幕、翻译源语言、翻译目标语言、翻译返回类型。本实施例中假设翻译返回类型为文字翻译；Step 705: User C initiates a translated-subtitle request to the external translation server; the request includes the parsed subtitles, the translation source language, the translation target language, and the translation return type. In this embodiment, the translation return type is assumed to be text translation;
步骤706、翻译服务器根据翻译请求,将翻译字幕返回给用户C;Step 706, the translation server returns the translated subtitles to the user C according to the translation request;
步骤707、用户C将翻译字幕返回给webrtc信令服务器;Step 707: User C returns the translated subtitles to the webrtc signaling server.
步骤708、webrtc信令服务器向用户B发出请求翻译字幕的请求;Step 708: The webrtc signaling server sends a request to the user B to request subtitle translation;
步骤709、用户B向外部的语音分析字幕服务器发送自己的发言音频, 请求字幕解析;Step 709: User B sends its own speech audio to an external speech analysis subtitle server. Request subtitle parsing;
步骤710、语音分析字幕服务器向B返回语音解析出来的字幕;Step 710: The voice analysis subtitle server returns the subtitles parsed by the voice to B.
步骤711、用户B向外部功能模块翻译服务器发起翻译字幕请求，该请求包含了解析后的字幕、翻译源语言、翻译目标语言、翻译返回类型。本实施例中假设翻译返回类型为文字翻译。Step 711: User B initiates a translated-subtitle request to the external translation server; the request includes the parsed subtitles, the translation source language, the translation target language, and the translation return type. In this embodiment, the translation return type is assumed to be text translation.
步骤712、翻译服务器根据翻译请求,将翻译字幕返回给用户B;Step 712, the translation server returns the translated subtitles to the user B according to the translation request;
步骤713、用户B将翻译字幕返回给webrtc信令服务器;Step 713: User B returns the translated subtitle to the webrtc signaling server.
步骤714、WEBRTC信令服务器向用户A返回B和C的翻译字幕。Step 714: The WebRTC signaling server returns the translated subtitles of B and C to User A.
对于以上流程，步骤702～步骤707和步骤708～步骤713可以同时进行，也就是说，当webrtc信令服务器收到字幕请求的时候可以同时向用户B和C发起字幕请求，用户B和C在进行发言时根据发言的情形实时的将翻译字幕返回给webrtc信令服务器，webrtc信令服务器收到字幕就实时将字幕发送给用户A。A收到后实时的显示B或C的字幕。In the above flow, steps 702–707 and steps 708–713 may be performed simultaneously; that is, when the WebRTC signaling server receives the subtitle request, it can initiate requests to Users B and C at the same time. While speaking, Users B and C return translated subtitles to the WebRTC signaling server in real time according to their speech, and the signaling server forwards each subtitle to User A as soon as it is received. Upon receipt, A displays B's or C's subtitles in real time.
对于请求字幕的流程来说，请求只需要发送一次，但是，返回的字幕消息则实时的根据应用的设计来进行返回。也就是说，A只需要请求一次字幕，作为用户B，收到A的请求后，B会在通话过程中将自己的音频分段发送给外部的语音分析字幕服务器和外部的翻译服务器，然后根据发言情况分段将字幕或翻译字幕或翻译音频返回。In the subtitle-request flow, the request only needs to be sent once, but the returned subtitle messages are returned continuously in real time according to the application's design. That is, A only needs to request subtitles once; after receiving A's request, User B sends its audio in segments to the external speech analysis subtitle server and the external translation server throughout the call, and then returns subtitles, translated subtitles, or translated audio segment by segment according to the speech.
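The "one request, many returns" behaviour described above can be sketched as a per-segment pipeline on the target client. `parse` and `translate` stand in for the external speech analysis subtitle server and translation server, and audio segments are abstracted to strings; all of this is an assumption for illustration.

```typescript
// After a single subtitle request is received, the target client keeps
// producing results segment by segment for the rest of the call.
function* subtitleStream(
  audioSegments: string[],                 // abstracted audio pieces
  parse: (audio: string) => string,        // speech analysis: audio -> text
  translate?: (text: string) => string     // optional translation step
): Generator<string> {
  for (const segment of audioSegments) {
    const subtitle = parse(segment);
    yield translate ? translate(subtitle) : subtitle; // returned per segment
  }
}
```

Each yielded value corresponds to one subtitle message sent back to the signaling server for the single original request.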
实施例5：Embodiment 5:
本实施例假设A请求B和C的翻译音频及字幕。假设A使用的语言是中文，用户B和用户C使用的语言是英文，用户A希望在视频会议中对B和C的会议语音进行翻译。本实施例的流程图也如图7所示，包括以下步骤：This embodiment assumes that A requests the translated audio and subtitles of B and C. Assume that A's language is Chinese while Users B and C speak English, and User A wants the conference speech of B and C to be translated in the video conference. The flowchart of this embodiment is also shown in FIG. 7 and includes the following steps:
步骤801、用户A向webrtc信令服务器请求用户B和用户C的翻译字幕。Step 801: User A requests translation subtitles of User B and User C to the webrtc signaling server.
步骤802、webrtc信令服务器向用户C发出请求翻译字幕的请求;Step 802: The webrtc signaling server sends a request to the user C to request subtitle translation;
步骤803、用户C向外部的语音分析字幕服务器发送自己的发言音频,请求字幕解析;Step 803: User C sends its own speech audio to the external speech analysis subtitle server, and requests subtitle analysis.
步骤804、语音分析字幕服务器向C返回语音解析出来的字幕; Step 804: The voice analysis subtitle server returns the subtitles parsed by the voice to the C;
步骤805、用户C向外部功能模块翻译服务器发起翻译字幕请求，该请求包含了解析后的字幕、翻译源语言、翻译目标语言、翻译返回类型。本实施例中假设翻译返回类型为文字及语音翻译。Step 805: User C initiates a translated-subtitle request to the external translation server; the request includes the parsed subtitles, the translation source language, the translation target language, and the translation return type. In this embodiment, the translation return type is assumed to be text and speech translation.
步骤806、翻译服务器根据翻译请求,将翻译字幕和翻译音频返回给用户C;Step 806, the translation server returns the subtitles and the translated audio to the user C according to the translation request;
步骤807、用户C将翻译音频替换到相关的视频流中，同时将翻译字幕返回给webrtc信令服务器；Step 807: User C replaces the audio in the related video stream with the translated audio, and at the same time returns the translated subtitles to the WebRTC signaling server;
步骤808、webrtc信令服务器向用户B发出请求翻译字幕的请求;Step 808: The webrtc signaling server sends a request to the user B to request subtitle translation;
步骤809、用户B向外部的语音分析字幕服务器发送自己的发言音频,请求字幕解析;Step 809: User B sends its own speech audio to the external speech analysis subtitle server, and requests subtitle analysis.
步骤810、语音分析字幕服务器向B返回语音解析出来的字幕;Step 810: The voice analysis subtitle server returns the subtitles parsed by the voice to B;
步骤811、用户B向外部功能模块翻译服务器发起翻译字幕请求，该请求包含了解析后的字幕、翻译源语言、翻译目标语言、翻译返回类型。本实施例中假设翻译返回类型为文字及语音翻译。Step 811: User B initiates a translated-subtitle request to the external translation server; the request includes the parsed subtitles, the translation source language, the translation target language, and the translation return type. In this embodiment, the translation return type is assumed to be text and speech translation.
步骤812、翻译服务器根据翻译请求,将翻译后的字幕和音频返回给用户B。用户B将翻译后的音频放到实时的视频流中通过媒体通道将视频和翻译后的音频发送给用户A。Step 812: The translation server returns the translated subtitles and audio to the user B according to the translation request. User B puts the translated audio into the real-time video stream and sends the video and the translated audio to User A through the media channel.
步骤813、用户B将翻译音频替换到相关的视频流中,用户B将翻译字幕返回给webrtc信令服务器;Step 813, user B replaces the translated audio into the related video stream, and user B returns the translated subtitle to the webrtc signaling server;
步骤814、webrtc信令服务器将B和C的翻译字幕返回给用户A，用户A的浏览器应用根据收到的字幕将B的翻译字幕显示在B的视频框中，将收到的用户C的翻译字幕显示在C的视频框中。Step 814: The WebRTC signaling server returns the translated subtitles of B and C to User A; according to the received subtitles, User A's browser application displays B's translated subtitles in B's video window and C's translated subtitles in C's video window.
本发明实施例提供的WebRTC点对点音视频通话的方法，使用webrtc技术实现视频通话和视频会议中的语音解析并且生成字幕、翻译字幕、翻译音频。通过本系统，webrtc视频会议的会话成员可以在会议视频窗口中查看会议发言人的实时字幕。通过本系统，在webrtc的点对点音视频通话中也可以完成语音解析和语音翻译，并将翻译后的语音解析为文本字幕显示在用户的视频通话窗口上，或者将翻译后的语音解析为其他语言的语音并合成到原有视频流中。翻译出来的语言文本也可以作为会议纪要内容保存起来。本发明实施例可以让使用不同语言进行通话或会议的用户请求字幕翻译或语音翻译，并可以将会议内容以对话文本的方式保存为会议纪要。The WebRTC point-to-point audio and video call method provided by the embodiments of the present invention uses WebRTC technology to perform speech parsing in video calls and video conferences and to generate subtitles, translated subtitles, and translated audio. With this system, session members of a WebRTC video conference can view real-time subtitles of the conference speaker in the conference video window. Speech parsing and speech translation can also be completed in WebRTC point-to-point audio and video calls: the translated speech is rendered as text subtitles displayed in the user's video call window, or converted into speech in another language and synthesized into the original video stream. The translated text can also be saved as meeting minutes. The embodiments of the present invention allow users who speak different languages in a call or conference to request subtitle translation or speech translation, and can save the conference content in dialog-text form as meeting minutes.
图8为本发明实施例的WebRTC服务器的示意图,如图8所示,本实施例的WebRTC服务器包括:FIG. 8 is a schematic diagram of a WebRTC server according to an embodiment of the present invention. As shown in FIG. 8, the WebRTC server of this embodiment includes:
第一传输模块801,设置成:接收到第一WebRTC客户端的字幕请求消息或翻译字幕请求消息后,将所述字幕请求消息或翻译字幕请求消息发送给一个或多个目标WebRTC客户端;The first transmission module 801 is configured to: after receiving the subtitle request message or the subtitle request message of the first WebRTC client, send the subtitle request message or the subtitle request message to one or more target WebRTC clients;
第二传输模块802,设置成:接收到所述目标WebRTC客户端返回的字幕或翻译后的字幕后,实时地将所述字幕或翻译后的字幕发送给所述第一WebRTC客户端。The second transmission module 802 is configured to: after receiving the subtitle or the translated subtitle returned by the target WebRTC client, send the subtitle or the translated subtitle to the first WebRTC client in real time.
图9为本发明实施例的WebRTC客户端的示意图,该WebRTC客户端可以作为请求字幕方,如图9所示,本实施例的WebRTC客户端包括:FIG. 9 is a schematic diagram of a WebRTC client according to an embodiment of the present invention. The WebRTC client can be used as a requesting subtitle. As shown in FIG. 9, the WebRTC client in this embodiment includes:
发送模块901,设置成:向WebRTC服务器发送请求一个或多个目标WebRTC客户端的字幕请求消息或翻译字幕请求消息;The sending module 901 is configured to: send a subtitle request message or a subtitle request message requesting one or more target WebRTC clients to the WebRTC server;
显示模块902，设置成：接收到所述WebRTC服务器返回的字幕或翻译字幕后，将所述字幕或翻译字幕显示在对应的目标WebRTC客户端的视频框中。The display module 902 is configured to: after receiving the subtitles or translated subtitles returned by the WebRTC server, display the subtitles or translated subtitles in the video frame of the corresponding target WebRTC client.
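The display module's behaviour (placing each received subtitle in the matching target client's video window) can be sketched as a per-speaker subtitle slot. The class and method names are illustrative assumptions, and the actual DOM rendering inside the video frame is omitted.

```typescript
// Sketch of display module 902: keep one subtitle slot per remote user and
// overwrite it when a new subtitle for that user arrives, so the text always
// shows in the matching video window.
class SubtitleDisplay {
  private slots = new Map<string, string>();

  onSubtitle(speaker: string, text: string): void {
    this.slots.set(speaker, text); // overwrite with the latest line
  }

  currentSubtitle(speaker: string): string {
    return this.slots.get(speaker) ?? ""; // empty if that user has no subtitle yet
  }
}
```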
在一优选实施例中,所述WebRTC客户端还包括:In a preferred embodiment, the WebRTC client further includes:
保存模块903，设置成：保存所述字幕或所述翻译字幕。The saving module 903 is configured to: save the subtitles or the translated subtitles.
图10为本发明一实施例的WebRTC客户端的示意图,该WebRTC客户端可以作为目标客户端,如图10所示,本实施例的WebRTC客户端包括:FIG. 10 is a schematic diagram of a WebRTC client according to an embodiment of the present invention. The WebRTC client can be used as a target client. As shown in FIG. 10, the WebRTC client in this embodiment includes:
第一传输模块1001,设置成:接收到WebRTC服务器的字幕请求消息后,将自己的音频发送给语音分析字幕服务器;The first transmission module 1001 is configured to: after receiving the subtitle request message of the WebRTC server, send the audio to the voice analysis subtitle server;
第二传输模块1002,设置成:接收到所述语音分析字幕服务器返回的字幕后将所述字幕返回给所述WebRTC服务器。The second transmission module 1002 is configured to: after receiving the subtitle returned by the speech analysis subtitle server, return the subtitle to the WebRTC server.
在一优选实施例中，所述第二传输模块1002，具体设置成：接收到所述语音分析字幕服务器返回的字幕后，向翻译服务器发送翻译字幕请求，所述翻译字幕请求包括：所述字幕、翻译源语言、翻译目标语言；接收到所述翻译服务器返回的翻译后的字幕后，将翻译后的字幕发送给所述WebRTC服务器。In a preferred embodiment, the second transmission module 1002 is specifically configured to: after receiving the subtitles returned by the speech analysis subtitle server, send a translated-subtitle request to the translation server, where the translated-subtitle request includes the subtitles, the translation source language, and the translation target language; and after receiving the translated subtitles returned by the translation server, send the translated subtitles to the WebRTC server.
在一优选实施例中,所述翻译字幕请求还包括:翻译返回类型,所述翻译返回类型包括语音翻译;所述WebRTC客户端还包括:In a preferred embodiment, the subtitle request further includes: a translation return type, the translation return type includes a voice translation; and the WebRTC client further includes:
第三传输模块1003，设置成：接收到所述翻译服务器返回的翻译后的音频后，将翻译后的音频放到实时的视频流中，通过预先建立的媒体通道发送给请求翻译字幕的WebRTC客户端。The third transmission module 1003 is configured to: after receiving the translated audio returned by the translation server, put the translated audio into the real-time video stream and send it, through the pre-established media channel, to the WebRTC client that requested the translated subtitles.
本发明实施例还公开了一种计算机程序，包括程序指令，当该程序指令被服务器执行时，使得该服务器可执行上述任意的服务器侧的网页实时通信WebRTC点对点音视频通话的方法。An embodiment of the present invention further discloses a computer program comprising program instructions which, when executed by a server, cause the server to perform any of the above server-side methods for a web real-time communication (WebRTC) point-to-point audio and video call.
本发明实施例还公开了一种载有所述的计算机程序的载体。The embodiment of the invention also discloses a carrier carrying the computer program.
本发明实施例还公开了一种计算机程序，包括程序指令，当该程序指令被作为请求字幕方的客户端执行时，使得该客户端可执行上述任意的作为请求字幕方的客户端侧的网页实时通信WebRTC点对点音视频通话的方法。An embodiment of the present invention further discloses a computer program comprising program instructions which, when executed by a client acting as the subtitle-requesting party, cause that client to perform any of the above methods for a WebRTC point-to-point audio and video call on the side of the client acting as the subtitle-requesting party.
本发明实施例还公开了一种载有所述的计算机程序的载体。The embodiment of the invention also discloses a carrier carrying the computer program.
本发明实施例还公开了一种计算机程序，包括程序指令，当该程序指令被目标客户端执行时，使得该目标客户端可执行上述任意的目标客户端侧的网页实时通信WebRTC点对点音视频通话的方法。An embodiment of the present invention further discloses a computer program comprising program instructions which, when executed by a target client, cause the target client to perform any of the above target-client-side methods for a WebRTC point-to-point audio and video call.
本发明实施例还公开了一种载有所述的计算机程序的载体。The embodiment of the invention also discloses a carrier carrying the computer program.
在阅读并理解了附图和详细描述后,可以明白其他方面。Other aspects will be apparent upon reading and understanding the drawings and detailed description.
本领域普通技术人员可以理解上述方法中的全部或部分步骤可通过程序来指令相关硬件完成，所述程序可以存储于计算机可读存储介质中，如只读存储器、磁盘或光盘等。可选地，上述实施例的全部或部分步骤也可以使用一个或多个集成电路来实现。相应地，上述实施例中的各模块/单元可以采用硬件的形式实现，也可以采用软件功能模块的形式实现。本发明不限制于任何特定形式的硬件和软件的结合。A person of ordinary skill in the art will appreciate that all or some of the steps of the above methods may be completed by a program instructing the relevant hardware, and the program may be stored in a computer-readable storage medium such as a read-only memory, a magnetic disk, or an optical disc. Optionally, all or some of the steps of the above embodiments may also be implemented using one or more integrated circuits. Accordingly, each module/unit in the above embodiments may be implemented in the form of hardware or in the form of a software functional module. The present invention is not limited to any particular combination of hardware and software.
以上仅为本发明的优选实施例，当然，本发明还可有其他多种实施例，在不背离本发明精神及其实质的情况下，熟悉本领域的技术人员当可根据本发明作出各种相应的改变和变形，但这些相应的改变和变形都应属于本发明所附的权利要求的保护范围。The above are only preferred embodiments of the present invention. Of course, the present invention may also have various other embodiments, and those skilled in the art may make various corresponding changes and variations according to the present invention without departing from its spirit and essence; all such corresponding changes and variations shall fall within the protection scope of the claims appended to the present invention.
工业实用性Industrial applicability
本发明实施例提供的一种WebRTC点对点音视频通话的方法及WebRTC服务器与WebRTC客户端，使用户可以跨越语言的障碍，更方便的进行通话。在多人视频会议中，发言人将自动解析和显示字幕、翻译字幕或翻译音频，用户可以轻松判断谁正在发言和识别发言内容，而不需要在多个视频窗口中寻找发言人。因此本发明具有很强的工业实用性。 The WebRTC point-to-point audio and video call method, WebRTC server, and WebRTC client provided by the embodiments of the present invention enable users to cross language barriers and communicate more conveniently. In a multi-party video conference, a speaker's speech is automatically parsed and displayed as subtitles, translated subtitles, or translated audio, so users can easily tell who is speaking and understand the speech content without searching for the speaker among multiple video windows. Therefore, the present invention has strong industrial applicability.

Claims (15)

  1. 一种网页实时通信WebRTC点对点音视频通话的方法,包括:A method for webpage real-time communication WebRTC point-to-point audio and video call, comprising:
    WebRTC服务器接收到第一WebRTC客户端的字幕请求消息或翻译字幕请求消息后,将所述字幕请求消息或翻译字幕请求消息发送给一个或多个目标WebRTC客户端;After receiving the subtitle request message or the subtitle request message of the first WebRTC client, the WebRTC server sends the subtitle request message or the subtitle request message to one or more target WebRTC clients;
    所述WebRTC服务器接收到一个或多个所述目标WebRTC客户端返回的字幕或翻译后的字幕后,实时地将所述字幕或翻译后的字幕发送给所述第一WebRTC客户端。After receiving the subtitles or translated subtitles returned by the target WebRTC client, the WebRTC server sends the subtitles or the translated subtitles to the first WebRTC client in real time.
  2. 如权利要求1所述的WebRTC点对点音视频通话的方法,其中The method of WebRTC point-to-point audio and video call according to claim 1, wherein
    所述翻译字幕请求消息包括:翻译源语言、翻译目标语言以及翻译返回类型,所述翻译返回类型包括文字翻译和/或语音翻译。The subtitle request message includes a translation source language, a translation target language, and a translation return type, and the translation return type includes text translation and/or speech translation.
  3. 一种网页实时通信WebRTC服务器,包括:第一传输模块和第二传输模块,其中A webpage real-time communication WebRTC server includes: a first transmission module and a second transmission module, wherein
    所述第一传输模块设置成:接收到第一WebRTC客户端的字幕请求消息或翻译字幕请求消息后,将所述字幕请求消息或翻译字幕请求消息发送给一个或多个目标WebRTC客户端;The first transmission module is configured to: after receiving the subtitle request message or the subtitle request message of the first WebRTC client, send the subtitle request message or the subtitle request message to one or more target WebRTC clients;
    所述第二传输模块设置成:接收到一个或多个所述目标WebRTC客户端返回的字幕或翻译后的字幕后,实时地将所述字幕或翻译后的字幕发送给所述第一WebRTC客户端。The second transmission module is configured to: after receiving the subtitle or the translated subtitle returned by the one or more target WebRTC clients, send the subtitle or the translated subtitle to the first WebRTC client in real time. end.
  4. 如权利要求3所述的WebRTC服务器,其中The WebRTC server of claim 3, wherein
    所述翻译字幕请求消息包括:翻译源语言、翻译目标语言以及翻译返回类型,所述翻译返回类型包括文字翻译和/或语音翻译。The subtitle request message includes a translation source language, a translation target language, and a translation return type, and the translation return type includes text translation and/or speech translation.
  5. 一种网页实时通信WebRTC点对点音视频通话的方法,包括:A method for webpage real-time communication WebRTC point-to-point audio and video call, comprising:
    WebRTC客户端向WebRTC服务器发送请求一个或多个目标WebRTC客户端的字幕请求消息或翻译字幕请求消息;The WebRTC client sends a subtitle request message or a subtitle request message requesting one or more target WebRTC clients to the WebRTC server;
所述WebRTC客户端接收到所述WebRTC服务器返回的字幕或翻译字幕后，将所述字幕或翻译字幕显示在相应的目标WebRTC客户端的视频框中。 After receiving the subtitles or translated subtitles returned by the WebRTC server, the WebRTC client displays the subtitles or translated subtitles in the video frame of the corresponding target WebRTC client.
  6. 如权利要求5所述的WebRTC点对点音视频通话的方法,其中A method of WebRTC point-to-point audio and video calling as claimed in claim 5, wherein
    所述翻译字幕请求消息包括:翻译源语言、翻译目标语言以及翻译返回类型,所述翻译返回类型包括文字翻译和/或语音翻译。The subtitle request message includes a translation source language, a translation target language, and a translation return type, and the translation return type includes text translation and/or speech translation.
  7. 如权利要求5或6所述的WebRTC点对点音视频通话的方法,该方法还包括:The method of the WebRTC point-to-point audio and video call according to claim 5 or 6, the method further comprising:
    所述WebRTC客户端保存所述字幕或所述翻译字幕。The WebRTC client saves the subtitle or the translated subtitle.
  8. A WebRTC client, comprising a sending module and a display module, wherein
    the sending module is configured to send, to a WebRTC server, a subtitle request message or a translation subtitle request message directed at one or more target WebRTC clients; and
    the display module is configured to: after receiving the subtitles or translated subtitles returned by the WebRTC server, display the subtitles or translated subtitles in the video frame of the corresponding target WebRTC client.
  9. The WebRTC client of claim 8, further comprising a saving module, wherein
    the saving module is configured to save the subtitles or the translated subtitles.
  10. A method for a Web Real-Time Communication (WebRTC) peer-to-peer audio and video call, comprising:
    after receiving a subtitle request message from a WebRTC server, a WebRTC client sending its own audio to a speech-analysis subtitle server; and
    after receiving the subtitles returned by the speech-analysis subtitle server, the WebRTC client returning the subtitles to the WebRTC server.
  11. The method for a WebRTC peer-to-peer audio and video call of claim 10, wherein
    the step of the WebRTC client returning the subtitles to the WebRTC server after receiving the subtitles returned by the speech-analysis subtitle server comprises:
    after receiving the subtitles returned by the speech-analysis subtitle server, the WebRTC client sending a translation subtitle request to a translation server, the translation subtitle request including the subtitles, a translation source language, and a translation target language; and
    after receiving the translated subtitles returned by the translation server, the WebRTC client sending the translated subtitles to the WebRTC server.
  12. The method for a WebRTC peer-to-peer audio and video call of claim 11, wherein
    the translation subtitle request further includes a translation return type, the translation return type including speech translation; and
    the method further comprises: after receiving the translated audio returned by the translation server, the WebRTC client placing the translated audio into the real-time video stream and sending it, over a pre-established media channel, to the WebRTC client that requested the translated subtitles.
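The target-client flow of claims 10 and 11 (own audio → speech-analysis subtitle server → optional translation server → WebRTC server) can be sketched as a small async pipeline. The three transports are injected as functions so the routing logic stays independent of any concrete protocol; `recognize`, `translate`, and `toWebrtcServer` are hypothetical names, not interfaces defined by the patent:

```javascript
// Sketch of the target-client flow in claims 10-11. The speech-analysis
// subtitle server, the translation server, and the WebRTC signaling server
// are represented by injected async functions on `io`.
async function handleSubtitleRequest(audioChunk, request, io) {
  // Claim 10: forward the client's own audio to the speech-analysis
  // subtitle server and wait for recognized subtitles.
  const subtitles = await io.recognize(audioChunk);

  if (!request.translate) {
    // Plain subtitle request: return the recognized text to the WebRTC server.
    await io.toWebrtcServer({ kind: 'subtitle', text: subtitles });
    return subtitles;
  }

  // Claim 11: send a translation subtitle request carrying the subtitles,
  // the translation source language, and the translation target language.
  const translated = await io.translate({
    text: subtitles,
    sourceLang: request.sourceLang,
    targetLang: request.targetLang,
  });
  await io.toWebrtcServer({ kind: 'translated-subtitle', text: translated });
  return translated;
}
```

Keeping the transports injectable also makes the flow straightforward to exercise with stubs, which is how a real client implementation could be unit-tested without standing up the three servers.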
  13. A WebRTC client, comprising a first transmission module and a second transmission module, wherein
    the first transmission module is configured to: after receiving a translation subtitle request message from a WebRTC server, send the client's own audio to a speech-analysis subtitle server; and
    the second transmission module is configured to: after receiving the subtitles returned by the speech-analysis subtitle server, return the subtitles to the WebRTC server.
  14. The WebRTC client of claim 13, wherein
    the second transmission module is configured to receive the subtitles returned by the speech-analysis subtitle server and return the subtitles to the WebRTC server in the following manner:
    after receiving the subtitles returned by the speech-analysis subtitle server, sending a translation subtitle request to a translation server, the translation subtitle request including the subtitles, a translation source language, and a translation target language; and
    after receiving the translated subtitles returned by the translation server, sending the translated subtitles to the WebRTC server.
  15. The WebRTC client of claim 14, wherein
    the translation subtitle request further includes a translation return type, the translation return type including speech translation; and
    the WebRTC client further comprises a third transmission module, wherein
    the third transmission module is configured to: after receiving the translated audio returned by the translation server, place the translated audio into the real-time video stream and send it, over a pre-established media channel, to the WebRTC client that requested the translated subtitles.
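Claims 12 and 15 distinguish two return paths: translated text goes back to the WebRTC server over signaling, while translated audio rides the pre-established media channel. A minimal sketch of that dispatch is below; `sendSignaling` and `injectIntoStream` are placeholders for the concrete transports, which the claims leave unspecified:

```javascript
// Sketch of the third-transmission-module dispatch in claim 15: a "speech"
// result is pushed into the already-established real-time media channel,
// while a text result is returned to the WebRTC server as signaling.
function routeTranslationResult(result, { sendSignaling, injectIntoStream }) {
  if (result.returnType === 'speech') {
    // Claim 15: translated audio travels over the pre-established media channel.
    injectIntoStream(result.audio);
    return 'media-channel';
  }
  // Text translations go back to the WebRTC server as signaling messages.
  sendSignaling({ kind: 'translated-subtitle', text: result.text });
  return 'signaling';
}
```

In a browser, one plausible (but not patent-mandated) implementation of `injectIntoStream` is to swap the outgoing audio track with `RTCRtpSender.replaceTrack()` on the existing peer connection, which avoids renegotiating the call.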
PCT/CN2016/070377 2015-03-26 2016-01-07 Method and device for webrtc p2p audio and video call WO2016150235A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201510136472.4 2015-03-26
CN201510136472.4A CN104780335B (en) 2015-03-26 2015-03-26 WebRTC P2P audio and video call method and device

Publications (1)

Publication Number Publication Date
WO2016150235A1 true WO2016150235A1 (en) 2016-09-29

Family

ID=53621547

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2016/070377 WO2016150235A1 (en) 2015-03-26 2016-01-07 Method and device for webrtc p2p audio and video call

Country Status (2)

Country Link
CN (1) CN104780335B (en)
WO (1) WO2016150235A1 (en)

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106919562A (en) * 2017-04-28 2017-07-04 深圳市大乘科技股份有限公司 A kind of real-time translation system, method and device
CN111970473A (en) * 2020-08-19 2020-11-20 彩讯科技股份有限公司 Method, device, equipment and storage medium for realizing synchronous display of double video streams
CN112203040A (en) * 2020-11-06 2021-01-08 通号通信信息集团有限公司 Railway emergency communication method and system based on communication conference
CN112435690A (en) * 2019-08-08 2021-03-02 百度在线网络技术(北京)有限公司 Duplex Bluetooth translation processing method and device, computer equipment and storage medium
CN112672099A (en) * 2020-12-31 2021-04-16 深圳市潮流网络技术有限公司 Subtitle data generation and presentation method, device, computing equipment and storage medium
CN112822557A (en) * 2019-11-15 2021-05-18 中移物联网有限公司 Information processing method, information processing device, electronic equipment and computer readable storage medium
CN113014849A (en) * 2021-02-23 2021-06-22 中电海康集团有限公司 Driving training video call system and method based on Web RTC
CN113473238A (en) * 2020-04-29 2021-10-01 海信集团有限公司 Intelligent device and simultaneous interpretation method during video call
CN117439976A (en) * 2023-12-13 2024-01-23 深圳大数信科技术有限公司 Audio and video call system based on WebRTC

Families Citing this family (21)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104780335B (en) * 2015-03-26 2021-06-22 中兴通讯股份有限公司 WebRTC P2P audio and video call method and device
US9374536B1 (en) 2015-11-12 2016-06-21 Captioncall, Llc Video captioning communication system, devices and related methods for captioning during a real-time video communication session
US9525830B1 (en) 2015-11-12 2016-12-20 Captioncall Llc Captioning communication systems
CN105743889B (en) * 2016-01-27 2019-05-17 福建星网智慧科技股份有限公司 A kind of method and system for realizing multi-party audio call based on webrtc
CN107707868B (en) * 2016-08-08 2020-09-25 中国电信股份有限公司 Video conference joining method, multi-access conference server and video conference system
CN109274634B (en) * 2017-07-18 2021-06-11 腾讯科技(深圳)有限公司 Multimedia communication method and device, and storage medium
CN109309802A (en) * 2017-07-27 2019-02-05 中兴通讯股份有限公司 Management method, server and the computer readable storage medium of video interactive
CN107277646A (en) * 2017-08-08 2017-10-20 四川长虹电器股份有限公司 A kind of captions configuration system of audio and video resources
CN107682657B (en) * 2017-09-13 2020-11-10 中山市华南理工大学现代产业技术研究院 WebRTC-based multi-user voice video call method and system
CN108829688A (en) * 2018-06-21 2018-11-16 北京密境和风科技有限公司 Implementation method and device across languages interaction
CN109688364A (en) * 2018-08-21 2019-04-26 平安科技(深圳)有限公司 Video-meeting method, device, server and storage medium
CN110418099B (en) * 2018-08-30 2021-08-31 腾讯科技(深圳)有限公司 Audio and video processing method and device and storage medium
CN110876033B (en) * 2018-08-30 2021-08-31 腾讯科技(深圳)有限公司 Audio and video processing method and device and storage medium
CN109104586B (en) * 2018-10-08 2021-05-07 北京小鱼在家科技有限公司 Special effect adding method and device, video call equipment and storage medium
CN109688363A (en) * 2018-12-31 2019-04-26 深圳爱为移动科技有限公司 The method and system of private chat in the multilingual real-time video group in multiple terminals
CN110415706A (en) * 2019-08-08 2019-11-05 常州市小先信息技术有限公司 A kind of technology and its application of superimposed subtitle real-time in video calling
CN112584078B (en) * 2019-09-27 2022-03-18 深圳市万普拉斯科技有限公司 Video call method, video call device, computer equipment and storage medium
CN112825551B (en) * 2019-11-21 2023-05-26 中国科学院沈阳计算技术研究所有限公司 Video conference important content prompting and transferring storage method and system
CN111654658B (en) * 2020-06-17 2022-04-15 平安科技(深圳)有限公司 Audio and video call processing method and system, coder and decoder and storage device
CN115314660A (en) * 2021-05-07 2022-11-08 阿里巴巴新加坡控股有限公司 Processing method and device for audio and video conference
CN114915616B (en) * 2022-03-16 2024-04-02 青岛希望鸟科技有限公司 Program synchronous communication method based on client real-time communication and client

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2002163400A (en) * 2000-11-28 2002-06-07 Mitsuaki Arita Language conversion mediating method, language conversion mediation processor and computer readable recording medium
CN101542462A (en) * 2007-05-16 2009-09-23 莫卡有限公司 Establishing and translating within multilingual group messaging sessions using multiple messaging protocols
CN102209227A (en) * 2010-03-30 2011-10-05 宝利通公司 Method and system for adding translation in a videoconference
CN102572532A (en) * 2010-12-14 2012-07-11 洪煌炳 TV caption relay translation system based on cable TV network
US20140157113A1 (en) * 2012-11-30 2014-06-05 Ricoh Co., Ltd. System and Method for Translating Content between Devices
CN104025079A (en) * 2011-09-09 2014-09-03 谷歌公司 User interface for translation webpage
CN104780335A (en) * 2015-03-26 2015-07-15 中兴通讯股份有限公司 Method and device for WebRTC P2P (web real-time communication peer-to-peer) audio and video call

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101931779A (en) * 2009-06-23 2010-12-29 中兴通讯股份有限公司 Video telephone and communication method thereof
CN101697581B (en) * 2009-10-26 2012-11-21 华为终端有限公司 Method, device and system for supporting simultaneous interpretation video conference

Cited By (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106919562A (en) * 2017-04-28 2017-07-04 深圳市大乘科技股份有限公司 A kind of real-time translation system, method and device
CN106919562B (en) * 2017-04-28 2024-01-05 深圳市大乘科技股份有限公司 Real-time translation system, method and device
CN112435690A (en) * 2019-08-08 2021-03-02 百度在线网络技术(北京)有限公司 Duplex Bluetooth translation processing method and device, computer equipment and storage medium
CN112822557A (en) * 2019-11-15 2021-05-18 中移物联网有限公司 Information processing method, information processing device, electronic equipment and computer readable storage medium
CN113473238A (en) * 2020-04-29 2021-10-01 海信集团有限公司 Intelligent device and simultaneous interpretation method during video call
CN111970473A (en) * 2020-08-19 2020-11-20 彩讯科技股份有限公司 Method, device, equipment and storage medium for realizing synchronous display of double video streams
CN112203040A (en) * 2020-11-06 2021-01-08 通号通信信息集团有限公司 Railway emergency communication method and system based on communication conference
CN112203040B (en) * 2020-11-06 2023-01-13 通号通信信息集团有限公司 Railway emergency communication method and system based on communication conference
CN112672099B (en) * 2020-12-31 2023-11-17 深圳市潮流网络技术有限公司 Subtitle data generating and presenting method, device, computing equipment and storage medium
CN112672099A (en) * 2020-12-31 2021-04-16 深圳市潮流网络技术有限公司 Subtitle data generation and presentation method, device, computing equipment and storage medium
CN113014849A (en) * 2021-02-23 2021-06-22 中电海康集团有限公司 Driving training video call system and method based on Web RTC
CN113014849B (en) * 2021-02-23 2023-03-14 中电海康集团有限公司 Driving training video call system and method based on Web RTC
CN117439976A (en) * 2023-12-13 2024-01-23 深圳大数信科技术有限公司 Audio and video call system based on WebRTC
CN117439976B (en) * 2023-12-13 2024-03-26 深圳大数信科技术有限公司 Audio and video call system based on WebRTC

Also Published As

Publication number Publication date
CN104780335A (en) 2015-07-15
CN104780335B (en) 2021-06-22

Similar Documents

Publication Publication Date Title
WO2016150235A1 (en) Method and device for webrtc p2p audio and video call
US10276064B2 (en) Method and system for adjusting user speech in a communication session
US10142459B2 (en) Method and system for managing multimedia accessiblity
TWI440346B (en) Open architecture based domain dependent real time multi-lingual communication service
US8400489B2 (en) Method of controlling a video conference
US9232049B2 (en) Quality of experience determination for multi-party VoIP conference calls that account for focus degradation effects
US10896298B2 (en) Systems and methods for configuring an automatic translation of sign language in a video conference
WO2020124725A1 (en) Audio and video pushing method and audio and video stream pushing client based on webrtc protocol
US20080075095A1 (en) Method and system for network communication
US20150373081A1 (en) Method of sharing browsing on a web page displayed by a web browser
Fowdur et al. Performance analysis of webrtc and sip-based audio and video communication systems
Singh et al. Developing WebRTC-based team apps with a cross-platform mobile framework
KR102545276B1 (en) Communication terminal based group call security apparatus and method
WO2022203891A1 (en) Method and system for integrating video content in a video conference session
Davies et al. Evaluating two approaches for browser-based real-time multimedia communication
Wang et al. A design of multimedia conferencing system based on WebRTC Technology
Kullberg Implementing remote customer service api using webrtc and jitsi sdk
JP4990718B2 (en) Media stream processing system, media stream processing method, component realization apparatus
CN117729188B (en) Water affair video acquisition system and method based on WebRTC
US11811843B2 (en) Supporting quality of service for media communications
US11522934B1 (en) Media provider shim for virtual events
US20240146560A1 (en) Participant Audio Stream Modification Within A Conference
Al-Khayyat et al. Peer-to-peer media streaming with HTML5
WO2023087925A1 (en) Telecommunication method, electronic device, and storage medium
US20240078339A1 (en) Anonymized videoconferencing

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 16767612

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 16767612

Country of ref document: EP

Kind code of ref document: A1