WO2013062509A1 - Applying geometric correction to a media stream - Google Patents

Applying geometric correction to a media stream

Info

Publication number
WO2013062509A1
Authority
WO
WIPO (PCT)
Prior art keywords
participant
representation
geometric
meeting
geometric correction
Prior art date
Application number
PCT/US2011/057452
Other languages
French (fr)
Inventor
Mark E. Gorzynski
Michael D. Derocher
Original Assignee
Hewlett-Packard Development Company, L.P.
Priority date
Filing date
Publication date
Application filed by Hewlett-Packard Development Company, L.P.
Priority to PCT/US2011/057452
Publication of WO2013062509A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00: Geometric image transformation in the plane of the image
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00: Television systems
    • H04N7/14: Systems for two-way working
    • H04N7/15: Conference systems

Definitions

  • Remote conferencing systems allow for collaboration between people at different locations. These systems allow participants to interact with one another through the use of audio and/or video equipment that provides audio and/or video communications.
  • Fig. 1 illustrates various example images of participants in a meeting, to be displayed using techniques according to some implementations.
  • Fig. 2 is a flow diagram of a process according to some implementations.
  • Fig. 3 is a block diagram of a conference system according to some implementations.
  • Fig. 4 illustrates a composite image after application of geometric correction, in accordance with some implementations.
  • Fig. 5 illustrates various parameters that are part of meeting geometric metadata used for application of geometric correction according to some implementations.
  • Figs. 6A-6D illustrate example standard display frames for use in application of geometric correction according to some implementations.
  • Fig. 7 illustrates an example application of geometric correction according to some implementations.
  • Fig. 8 illustrates application of geometric correction of an image, according to some implementations.
  • media streams of participants at different conference sites are displayed at each of multiple endpoints involved in the meeting.
  • Examples of the communications network include the Internet, a local area network, a wide area network, and so forth.
  • communications network or “network” can refer to a single network or multiple networks.
  • a “media stream” can refer to content for depicting respective
  • a "resource" can include any type of visual material that can be displayed during the meeting, where the visual material can include a document (e.g. word processing document, presentation slides, spreadsheet document, etc.), an image, a movie, or any other visual material that is not a participant of the meeting.
  • a "meeting" can refer to a session established over a communications network for exchanging content among multiple conference sites.
  • the media streams of participants at different conference sites are combined for display at a particular endpoint.
  • the combining of the media streams from the multiple sites produces a "multi-point stream" that includes media streams from the multiple conference sites.
  • the multi-point stream provides an immersive view of the media streams from the conference sites to produce, from the perspective of a participant, a sense of being with other participants in a virtual room.
  • the sense of being in a single virtual room with participants from multiple conference sites can depend on the degree to which representations (e.g. images) of participants from the multiple sites match each other, and on other characteristics (including grouping of representations of participants, colors in the representations, and geometries of the representations).
  • the media streams of participants at different conference sites can be kept separate when displayed at a particular endpoint. Techniques or mechanisms according to various implementations are applicable in either of the scenarios discussed above.
  • Audio/video data may refer to audio data, or video data, or both audio and video data.
  • An endpoint can include one or multiple cameras for capturing one or multiple images of a participant or multiple participants.
  • the cameras at different endpoints can be associated with different positions relative to respective
  • a first participant can use a desktop or notebook computer as the endpoint, where the desktop or notebook computer includes a webcam to capture the image of the first participant.
  • a higher quality camera located in an office can be used to capture the image of a second participant.
  • multiple participants can be located in a conference room, with the camera positioned relatively far away from these participants, as compared to the first and second participants.
  • Examples of different images (102, 104, 106, 108, 110, and 112) captured by different endpoints are shown in Fig. 1.
  • the images 102, 104, 106, 108, 110, and 112 have varying geometries; as a result, when these images are displayed at an endpoint, the individual images shown by the display device at that endpoint can appear inconsistent with one another.
  • a first participant (represented by image 102) using her computer can sit relatively close to the computer's display device (and thus to the attached webcam). This can result in display of a relatively large face, and the image can be distorted due to poor image resolution provided by the webcam.
  • the conference room represented by image 106
  • the participants sit relatively far away from the camera, such that the faces of these participants are relatively small, with possibly wide-angle distortion present.
  • the images 102 and 106 are in turn quite different from the image 104, which can be captured by a high-resolution camera (having better resolution than the webcam for the image 102, for example) that is placed an intermediate distance from the respective participant.
  • the image 104 can have relatively high quality, and can have a better geometrical layout as compared to those of the images 102 and 106.
  • the participants in images 108, 110, and 112 are at relatively good distances from the respective cameras.
  • the participants that are in the images 104, 108, 110, and 112 can possibly have an advantage in the meeting, since their relatively superior images would enhance their non-verbal communications as compared with other participants in the images 102 and 106.
  • geometric correction can be applied to correct representations (e.g. images) of participants in media streams received from conference sites of a meeting established over a communications network.
  • Such geometric correction of a representation may involve modification of a geometric aspect (or multiple geometric aspects) of the representation, including, for example, modification of a size of a feature of a participant, modification of a distance from a side of the participant in the representation to a corresponding side of a frame containing the representation of the participant, a size or amount of background behind the participant, a size of the frame containing the representation, and so forth.
  • the geometric correction can be based on meeting geometric metadata that is applicable to all the media streams of the meeting.
  • the meeting geometric metadata can be provided by a meeting controller (e.g., a multi-point control unit (MCU)), which is the controller used for establishing and controlling a meeting established over a communications network.
  • the meeting geometric metadata can define a geometrical ruleset (including one or multiple rules) relating to parameters that control geometries of images of participants in the media streams.
  • Examples of the parameters can include one or some combination of the following: eye height from the bottom of a display frame containing the image of the participant(s); head/face size (size of the head or face of a participant); space between a top, bottom, or side of the display frame and an image of a participant; an aspect ratio (which defines a ratio of width and height of a display frame including an image of participant(s)); and so forth.
  • a "head/face size" is noted as referring to the size of the head or face of a participant; in the ensuing discussion, reference is made to a "head size," which is intended to refer to either the size of the head of a participant, or to the size of the face of a participant.
  • the foregoing parameters represent examples of geometric aspects of images that can be modified by the geometric correction based on the geometrical ruleset. There can be other or alternative examples of parameters of meeting geometric metadata, which are discussed further below in connection with Fig. 5.
  • the geometrical ruleset of the meeting geometric metadata can define standard display frames for containing images of participants. Examples of standard display frames are discussed further below in connection with Figs. 6A-6D and 7. Depending upon the number of participants (e.g. one participant, two participants, etc.) represented by an image within a display frame, the respective standard display frame can specify the head size of each such participant relative to the border of the display frame, and can specify an eye height of each participant (as measured from the bottom of the display frame).
  • the standard display frames are also associated with specific frame aspect ratios, as well as other parameters.
  • Selecting a particular one of multiple standard display frames to display an image of participant(s) at a given conference site is an example of modifying a geometric aspect of the image— selecting one of multiple standard display frames effectively selects an aspect ratio of the display frame to use for displaying an image of participant(s).
  • the geometrical ruleset of the meeting geometric metadata can also specify rules relating to how a multi-participant image is to be scaled.
  • a first rule can specify that the image of the multiple participants is to be scaled according to the largest head size identified in the image.
  • the head sizes of the other participants in the multi-participant image would be scaled also by X.
  • a second rule of the geometrical ruleset can specify that none of the participants should be cut off after scaling or cropping.
  • application of the first rule may be incompatible with application of the second rule; in such cases, priorities can be assigned to the respective rules to define which rule takes precedence. For example, if the first rule has higher priority than the second rule, then scaling of the multi-participant image may cause at least one of the participants to disappear or be partially removed in the image. However, if the second rule has a higher priority than the first rule, then the head sizes of the participants may be kept smaller to avoid removing any participant from the image. In other examples, the priorities assigned to the respective rules can be changed. As yet other examples, to avoid removing a participant from the image, an endpoint can locally zoom into portions of an image that contain faces of participants.
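The priority trade-off between the two scaling rules can be illustrated with a minimal Python sketch. The function name `choose_scale`, its parameters, and the idea of expressing the second rule as a simple width cap are assumptions made for illustration; the source does not specify how the rules are evaluated.

```python
def choose_scale(head_heights, group_width, frame_width, target_head,
                 first_rule_wins):
    """Return one scale factor for a multi-participant image.

    All names and units (pixels) are hypothetical.
    """
    # First rule: scale so the largest detected head reaches the target size.
    # Every other head in the image is scaled by the same factor.
    scale = target_head / max(head_heights)
    if first_rule_wins:
        # Higher-priority first rule: participants may end up cut off.
        return scale
    # Second rule has priority: cap the scale so the whole group of
    # participants still fits across the frame width.
    max_fit_scale = frame_width / group_width
    return min(scale, max_fit_scale)
```

With heads of 40 and 50 pixels, a target head size of 100, a group 600 pixels wide, and an 800-pixel frame, the first rule alone asks for a factor of 2.0, while giving the second rule priority limits the factor to 800/600 so that nobody is cropped out.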
  • the geometrical ruleset of the meeting geometric metadata can also include other or alternative rules in other implementations, such as rules that pertain to how various parameters as noted above are to be modified when applying geometric correction.
  • the eye height and head size of the image of a first participant in a first media stream can be made consistent with the eye height and head size of the image of a second participant in a second media stream.
  • Making the eye heights "consistent" may involve bringing the eye heights to within some predefined difference of one another.
  • making the head sizes "consistent" may involve bringing the head sizes to within some predefined difference of one another.
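One possible reading of "within some predefined difference of one another" is that the spread between the smallest and largest value stays under a threshold. The sketch below assumes that reading; the function name and the pixel units are hypothetical.

```python
def is_consistent(values, max_difference):
    """True when all values lie within max_difference of one another."""
    return max(values) - min(values) <= max_difference


# Eye heights (in pixels from the bottom of each display frame) for three
# participant images, checked against a hypothetical 20-pixel tolerance.
eye_heights = [300, 310, 295]
eye_heights_ok = is_consistent(eye_heights, 20)
```

The same check could be applied to head sizes with its own tolerance.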
  • geometric correction can also be based on a local geometric rule that is associated with a display device of a given endpoint involved in the meeting.
  • a local geometric rule applicable to the relatively small display device of a notebook computer may be different from the local geometric rule that is applicable to a studio conference system that has one or multiple relatively large display devices.
  • the local geometric rule applicable to a small handheld device such as a personal digital assistant (PDA) or smartphone, can differ from the local geometric rules applicable to display devices associated with computers and studio conference systems.
  • the local geometric rule for a notebook computer with a relatively small display device can specify use of a larger scale (to provide larger head sizes) for a participant image with tighter cropping applied. Cropping an image of a participant refers to removing a background surrounding the image(s) of the participant(s).
  • the local geometric rule for a relatively large display device can specify a different scale for participant images, and can relax cropping rules (a greater amount of background can be shown, for example).
  • a "local geometric rule" refers to a geometric rule applicable to a particular endpoint (but not to another endpoint) that defines at least one specification associated with images of participants to be presented in the corresponding display device.
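A local geometric rule could be represented as a small per-endpoint record. The sketch below is only an illustration: the class `LocalGeometricRule`, its fields, the 20-inch threshold, and the numeric values are all assumptions, chosen so that a small display gets a larger head scale and tighter cropping than a large one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LocalGeometricRule:
    head_scale: float               # multiplier applied to the head size
    max_background_fraction: float  # share of the frame left as background


def local_rule_for(display_diagonal_inches):
    """Pick a hypothetical local rule from the display size."""
    if display_diagonal_inches < 20:
        # Small display (e.g. notebook): larger heads, tighter cropping.
        return LocalGeometricRule(head_scale=1.25, max_background_fraction=0.2)
    # Large display (e.g. studio system): relaxed cropping, more background.
    return LocalGeometricRule(head_scale=1.0, max_background_fraction=0.5)
```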
  • Fig. 2 is a flow diagram of a process of performing geometric correction, in accordance with some implementations.
  • the process of Fig. 2 can be performed by an endpoint, or alternatively, by a meeting controller (e.g. an MCU) or by another device.
  • the process receives (at 202) at least one media stream from at least one of multiple conference sites involved in a meeting established over a communications network.
  • the received media stream contains an image of a participant (or participants) in the meeting.
  • the media stream can be received over a network from the remote conference site, or the media stream can be passed through an intermediate device, such as a meeting controller, before reaching the receiving endpoint.
  • the process next applies (at 204) geometric correction to the image in the media stream.
  • the geometric correction is based on meeting geometric metadata and a local geometric rule, as discussed above.
  • Application of the geometric correction to the media stream causes the modification of the image of the participant(s) in the received media stream.
  • the application of the geometric correction can be performed at one or multiple devices (e.g. at a receiving endpoint or at a combination of the receiving endpoint and a meeting controller).
  • Fig. 2 refers to application of geometric correction to the image of a media stream
  • the application of geometric correction can alternatively be made to images in multiple corresponding media streams that are to be displayed at an endpoint.
  • the application of geometric correction to images that are to be displayed at an endpoint allows for coordination of images of the participants such that consistent views of the participants can be maintained in the composite image— in other words, the images of the participants from multiple media streams are made to be geometrically more consistent with each other.
  • geometric correction of images can include modification of one or multiple parameters noted above, including, as examples, eye height from the bottom of a display frame containing the image of the participant(s); head/face size; space between a top, bottom, or side of the display frame and an image of a participant; an aspect ratio (which can be based on selection of a standard display frame); and so forth.
  • eye heights and head sizes of participants in multiple images can be made consistent. To do so, techniques or mechanisms according to some implementations are able to detect locations of heads and eyes in images, such that alignment of head sizes and/or eye heights can be made possible.
  • the foregoing are examples of geometric correction using the meeting geometric metadata.
  • the geometric correction of images can include modification using a local geometric rule applicable to a particular endpoint, as noted above.
  • Fig. 3 is a block diagram illustrating an example remote conferencing system that includes N (N > 1) conference sites (conference site A, conference site B, and conference site N are shown in Fig. 3). Each conference site has a respective endpoint 302, 304, or 306, as shown in Fig. 3.
  • the endpoints at the respective conference sites can communicate with one another directly over a network 308, such as the Internet or other network. Alternatively, the endpoints can communicate with each other using one or multiple intermediate devices, including a meeting controller 310 (sometimes referred to as a multipoint control unit or MCU).
  • Each endpoint 302, 304, or 306 is connected to a respective display device (or multiple display devices) 312, 314, or 316, as well as to a respective camera (or multiple cameras) 318, 320, or 322.
  • the display devices are used to display media streams of respective participants of a meeting established using the remote conferencing system of Fig. 3.
  • the cameras are used to capture video streams of the participants.
  • a particular endpoint can be connected to multiple display devices and/or cameras (such as multiple cameras to capture participants at a site from different angles).
  • the endpoints 302, 304, and 306 can also include respective audio capture equipment to capture audio streams. Additionally, the endpoints can include respective equipment to allow for collaboration of resources, such as documents and so forth. The endpoints 302, 304, and 306 are able to exchange audio/video streams and resources with each other over the network 308.
  • the endpoint 302 includes a display layout module 324 executable on one or multiple processors 328.
  • the processor(s) 328 is (are) connected to a storage medium 330.
  • the display layout module 324 is able to generate a multi-point image (that combines media streams from multiple conference sites) for display in the attached display device (or display devices) 312.
  • the media streams from multiple conference sites can be displayed as separate media streams in the display device(s) attached to the endpoint 302.
  • the display layout module 324 includes a geometric correction module 326 that is able to apply geometric correction as discussed herein, based on meeting geometric metadata 332 and a local geometric rule 334 stored in the storage medium 330.
  • the geometric correction module 326 can be separate from the display layout module 324.
  • the meeting geometric metadata 332 can be received from the meeting controller 310 (or from another source).
  • the other endpoints 304 and 306 can include similar components as the endpoint 302.
  • a geometric correction module 336 can instead be included in another device, such as the meeting controller 310 (or another device). As shown in Fig. 3, the geometric correction module 336 in the meeting controller 310 is executable on processor(s) 340. The geometric correction module 336 can apply geometric correction as discussed herein, based on meeting geometric metadata 344 and a local geometric rule 346 stored in a storage medium 342 of the meeting controller 310.
  • Fig. 4 shows an example composite image 400 of individual images received in media streams from corresponding conference sites of a meeting, after geometric correction has been applied based on meeting geometric metadata and a local geometric rule associated with a given display device used to display the multipoint image 400 shown in Fig. 4.
  • the composite image 400 is displayed by the given display device— other display devices of endpoints involved in the meeting can display other composite images.
  • the various images of Fig. 4 can be maintained in separate media streams.
  • a corrected individual image 102A is based on application of geometric correction to the image 102 shown in Fig. 1.
  • a corrected image 104A is based on application of geometric correction to the image 104 of Fig. 1.
  • a corrected image 106A is based on application of geometric correction to the image 106 of Fig. 1.
  • the corrected images 108A, 110A, and 112A are based on respective applications of geometric corrections to corresponding images 108, 110, and 112 of Fig. 1.
  • the images 108A and 112A may be the same as respective images 108 and 112 (in other words, the images 108A and 112A have not been modified from respective images 108 and 112).
  • each of images 102A, 104A, and 110A represents just one respective participant.
  • each of images 108A and 112A represents two participants, and the image 106A represents more than two participants.
  • In the corrected image 102A, the image 102 (Fig. 1) that was captured with a notebook computer webcam, for example, has been rescaled and positioned at a consistent eye height, as compared to the other individual images in the composite image 400. Also, the rescaling of the image 102 has resulted in reducing the head size of the participant in the corrected image 102A to be consistent with head sizes of participants in several other images in the composite image 400 (e.g. images 108A, 104A, 110A, 112A). As a result of reducing the size of the image 102, padding has been provided in the corrected image 102A, where the padding is in the form of the dark border around the image 102A.
  • the geometric correction of the image 102 is based on the meeting geometric metadata (applicable to multiple endpoints involved in a meeting) and a local geometric rule applicable to just the particular endpoint that is displaying the composite image 400.
  • the meeting geometric metadata can include eye height, head/face size; space between a top, bottom, or side of the display frame and an image of a participant; and an aspect ratio.
  • the meeting geometric metadata can have a geometrical ruleset specifying that eye heights and head/face sizes of different images of participants are to be maintained consistent where possible. (An example of where head/face sizes of participants cannot be made consistent with other images is in the image 106A having seven participants).
  • head/face detection and eye detection can be performed using any of various techniques.
  • the geometrical ruleset can also specify that the space between a top, bottom, or side of the display frame and an image of a participant be within some predefined range. Additionally, for a one-participant image, the geometrical ruleset can specify that a one-participant standard frame be used that has a corresponding aspect ratio. Note that in Fig. 4, the one-participant images 102A, 104A, and 110A have aspect ratios.
  • the image 102A can also have been geometrically corrected based on application of a local geometric rule for the endpoint that is displaying the composite image 400.
  • the local geometric rule for a notebook computer with a relatively small display device can specify use of a larger scale (to provide larger head sizes) for a participant image with tighter cropping applied.
  • the local geometric rule for a relatively large display device can specify a different scale for participant images, and can relax cropping rules (a greater amount of background can be shown, for example).
  • Geometric correction based on the meeting geometric metadata and the local geometric rule can also be applied to the other images of the composite image 400.
  • Rescaling has been applied to the image 106 of Fig. 1 to increase the head sizes of the participants in the corrected image 106A (these participants are in the conference room).
  • An average of the eye heights of the respective participants in the conference room depicted in the corrected image 106A is maintained to be consistent with the eye heights of the image 112A.
  • cropping has been applied to remove certain background content in the image 106.
  • the aspect ratio of the corrected image 106A is modified from the aspect ratio of the image 106—the modified aspect ratio allows the corrected image 106A to display more of the participants in image 106A.
  • the image 104 (Fig. 1) has been cropped to reduce an amount of background information in the corrected image 104A. Also, aspect ratio modification has been performed such that the corrected image 104A is more tightly focused on the participant in image 104A. The width of the image 110A has been reduced as compared to the image 110 in Fig. 1.
  • a captured image can be a panoramic image made up of multiple sub-images, where each sub-image can depict one or multiple participants. Geometric correction can be applied to individual ones of the sub-images.
  • Fig. 5 illustrates various example parameters in meeting geometric metadata. Note that in other examples, other types of meeting geometric metadata can be used. The parameters are illustrated in the context of a display frame 500 (containing an image of two participants) that is displayed by a display device 501. These parameters can be modified according to the geometrical ruleset defined by the meeting geometric metadata.
  • the meeting geometric metadata includes a participant eye height 502, which measures the height of the eye of a participant from a bottom 514 of the display frame 500 that contains the image of the participant.
  • the meeting geometric metadata also includes a head room 504, which defines the distance from the top of a head of a participant to a top 506 of the display frame 500.
  • the meeting geometric metadata further includes a side room 508, which defines the distance between a side 510 of the display frame 500 and the side of the nearest participant.
  • the side room 508 applies to either the left or right side.
  • the meeting geometric metadata can also include an image-to-bottom height 512, which defines the distance from a bottom surface 513 (e.g. virtual floor or other predefined surface) to the bottom 514 of the display frame 500.
  • an eye-to-bottom height 516 defines the distance between the bottom surface 513 and the eye height of a participant.
  • An image table height 518 defines a height of a virtual table 519 from the bottom (514) of the display frame 500 to a top of the virtual table 519.
  • a display frame height 520 defines the height of the display frame 500, and a display frame width 522 defines the width of the display frame 500. The ratio between the height 520 and width 522 is the aspect ratio of the display frame 500.
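The Fig. 5 parameters lend themselves to a simple record type. The following Python sketch groups them in one dataclass; the class name, field names, units (pixels), and the choice to express the aspect ratio as width divided by height are assumptions for illustration, with the patent's reference numerals noted in comments.

```python
from dataclasses import dataclass


@dataclass
class DisplayFrameGeometry:
    eye_height: float              # 502: eye to bottom 514 of the frame
    head_room: float               # 504: top of head to top 506 of the frame
    side_room: float               # 508: frame side 510 to nearest participant
    image_to_bottom_height: float  # 512: bottom surface 513 to frame bottom 514
    eye_to_bottom_height: float    # 516: bottom surface 513 to eye height
    image_table_height: float      # 518: frame bottom 514 to top of table 519
    frame_height: float            # 520
    frame_width: float             # 522

    @property
    def aspect_ratio(self):
        # Assumed convention: width divided by height.
        return self.frame_width / self.frame_height


frame_500 = DisplayFrameGeometry(
    eye_height=560, head_room=120, side_room=200,
    image_to_bottom_height=300, eye_to_bottom_height=860,
    image_table_height=250, frame_height=900, frame_width=1600,
)
```

The numeric values are placeholders, not figures taken from the patent.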
  • the scaling of the various parameters in the meeting geometric metadata can be standardized in terms of content, including participant(s) and other object(s), to be displayed in a display frame.
  • the scaling of the various parameters can be based on a number of participants and other objects that can fit in a display frame.
  • Figs. 6A-6D illustrate various standard display frames that define different scalings to be applied.
  • the display frame 602 is an example of a display frame, of a particular size, for depicting two participants.
  • Fig. 6B shows a standard three-participant display frame 604, which depicts the images of three participants within the frame of the particular size.
  • Fig. 6C shows a standard four-person display frame 606, which depicts four persons within the frame of the particular size.
  • Fig. 6D shows a standard six-person display frame 608, which is a frame of an enlarged width to depict six participants (two of the six participants are represented by respective chairs) within the display frame 608.
  • a standard display frame can also be defined by the meeting geometric metadata for presenting a single participant, which is discussed in connection with Fig. 7 below.
  • the geometrical ruleset of the meeting geometric metadata can specify selection of a particular standard display frame based on the number of participants to be presented by the frame. This affects the aspect ratio of the frame containing the image of the participant(s), which is an example of a geometric correction that is to be applied.
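Selecting a standard display frame by participant count can be sketched as a lookup over the frame capacities mentioned in connection with Figs. 6A-6D and 7 (one, two, three, four, and six participants). The function name, the "smallest frame that fits" policy, and the fallback to the largest frame for bigger groups are assumptions, not rules stated in the source.

```python
# Capacities of the standard display frames described in the text.
STANDARD_FRAME_CAPACITIES = (1, 2, 3, 4, 6)


def select_standard_frame(participant_count):
    """Return the capacity of the smallest standard frame that fits."""
    if participant_count < 1:
        raise ValueError("at least one participant is required")
    for capacity in STANDARD_FRAME_CAPACITIES:
        if participant_count <= capacity:
            return capacity
    # Assumed fallback: groups larger than six use the widest frame.
    return STANDARD_FRAME_CAPACITIES[-1]
```

Under this policy, an image with five participants would use the six-person frame, which in turn fixes the aspect ratio of the frame.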
  • Fig. 7 shows an example of geometric correction for a media stream that contains an image of a single participant.
  • face detection 702 is first performed in an image 700. Face detection can be based on face detection features of cameras or other subsystems. After detecting the face of the participant in the image 700, a standard display frame 704 is identified— in this case, the identified standard display frame 704 is the standard display frame for presenting a single participant. The standard display frame 704 is provided around the image of the participant. Note that the placement of the standard display frame 704 in the image 700 can be based on various parameters of the meeting geometric metadata, including eye height, side room, and so forth. The eye height is specified to be consistent with other images of other participants.
  • cropping and scaling are performed to apply the geometric correction, to produce a cropped image 706.
  • the cropping causes background content of the original image 700 to be removed.
  • padding can also be performed, such as to add pixels to an image if the image turns out to be too small to fit into a standard display frame.
  • a similar procedure can be applied in the case where there are multiple participants in an image.
  • face detection of multiple participants in the image is performed, and a standard display frame for presenting the multiple participants is identified.
  • Cropping, scaling, and/or padding can then be performed, similar to the procedure of Fig. 7, to produce a corrected image.
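The placement, cropping, and padding steps can be sketched as pure geometry once a face has been detected. In the Python sketch below, the `Rect` type, the function `plan_crop`, and the parameters `head_fraction`, `eye_fraction`, and `aspect` are hypothetical stand-ins for values that the meeting geometric metadata would supply; image coordinates are assumed to have y increasing downward, and face detection itself is out of scope.

```python
from dataclasses import dataclass


@dataclass
class Rect:
    left: float
    top: float
    width: float
    height: float


def plan_crop(face, eye_y, image_w, image_h,
              head_fraction=0.5, eye_fraction=0.5, aspect=1.0):
    """Place a standard frame around a detected face.

    Returns the frame in source-image coordinates plus the padding needed
    on each side where the frame extends beyond the source image.
    """
    # Size the frame so the head fills head_fraction of the frame height.
    frame_h = face.height / head_fraction
    frame_w = frame_h * aspect  # aspect is width / height
    # Center the frame horizontally on the face.
    left = face.left + face.width / 2 - frame_w / 2
    # Put the eyes eye_fraction of the frame height above the frame bottom.
    bottom = eye_y + eye_fraction * frame_h
    top = bottom - frame_h
    padding = {
        "left": max(0.0, -left),
        "top": max(0.0, -top),
        "right": max(0.0, left + frame_w - image_w),
        "bottom": max(0.0, bottom - image_h),
    }
    return Rect(left, top, frame_w, frame_h), padding


frame, padding = plan_crop(Rect(300, 100, 200, 200), eye_y=180,
                           image_w=800, image_h=600)
```

Content outside the returned frame would be cropped away, any nonzero padding entry indicates pixels to add, and scaling to the display size would follow as a uniform resize of the frame.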
  • the geometric correction can be applied in a send pipeline or in a receive pipeline.
  • the "send pipeline" refers to equipment of the system for sending a media stream to recipient equipment.
  • the "send pipeline" can include a source endpoint (that has captured a media stream for communication in a meeting); alternatively, the "send pipeline" can include an intermediate device, such as the meeting controller 310 of Fig. 3 or some other intermediate device.
  • a "receive pipeline" refers to equipment to receive a media stream.
  • the "receive pipeline" can refer to an endpoint or to an intermediate device.
  • the camera 318 at a source endpoint (e.g. endpoint 302) captures an original image 804 (of a participant or multiple participants).
  • geometric correction is applied (at 806) in the send pipeline, and the corrected image is communicated (at 808) by the send pipeline for receipt by a receive pipeline. Applying geometric correction in the send pipeline may reduce the burden placed on receiving endpoints.
  • the original image 804 can be communicated (at 810) by the send pipeline to the receive pipeline, which then applies (at 812) geometric correction to the original image 804.
  • the receiving endpoint (304 or 306) can generate (at 814) a composite image, which includes the corrected image and image(s) of other participant(s) in the meeting.
  • the composite image is then displayed (at 816).
  • the geometric correction can be applied in a single processing system (e.g. a computer) or in multiple processing systems. Distribution of geometric correction across multiple processing systems allows for more efficient use of resources of a conference system.
  • the location of a face or operations to be performed (for application of geometric correction) can be tagged with tag information.
  • a downstream processing system can use the tag information for application of geometric correction at the downstream processing system.
  • the geometric correction can be performed on-demand (on-demand mode of geometric correction).
  • in on-demand mode, an image can be passed uncorrected to an endpoint. If an end user determines that the layout is unsatisfactory, then the user can use a user interface to submit a command to perform a geometric correction. The endpoint and/or the meeting controller can then coordinate the exchange of meeting geometric metadata and a local geometric rule to perform application of the geometric correction.
  • in on-demand mode, usage of processing resources for purposes of performing geometric correction can be reduced, since the geometric correction is not performed unless requested.
  • geometric correction can be applied at regular steps in a process, such as during meeting startup or during meeting changes (e.g. when conference sites are added or dropped).
  • geometric correction is not continuously applied, but rather, is performed in response to discrete events.
  • a dynamic system can be used in which geometric correction is continually applied.
  • regular sampling of face positions is performed (where the regular sampling can be performed on a periodic basis) to determine when a particular image is out of tolerance and thus correction is triggered. If participants are moving around, the changes can be triggered to apply geometric correction as the participants move around.
  • an endpoint can include a camera that is adjustable; for example, a zoom setting of the camera can be adjusted (to zoom into or away from a target object), the pan and/or tilt motion of the camera can be adjusted, and so forth. In such cases, in addition to application of geometric correction, one or multiple settings of the camera can be adjusted.
  • modules discussed above can be in the form of machine-readable instructions that can be loaded for execution on a processor or processors (e.g. 328 or 340 in Fig. 3).
  • a processor can include a microprocessor, microcontroller, processor module or subsystem, programmable integrated circuit, programmable gate array, or another control or computing device.
  • Data and instructions are stored in respective storage devices, which are implemented as one or more computer-readable or machine-readable storage media.
  • the storage media include different forms of memory, including dynamic or static random access memories (DRAMs or SRAMs), erasable and programmable read-only memories (EPROMs), electrically erasable and programmable read-only memories (EEPROMs), and flash memories; magnetic disks such as fixed, floppy and removable disks; other magnetic media including tape; optical media such as compact disks (CDs) or digital video disks (DVDs); or other types of storage devices.
  • the instructions discussed above can be provided on one computer-readable or machine-readable storage medium, or alternatively, can be provided on multiple computer-readable or machine-readable storage media distributed in a large system having possibly plural nodes.
  • Such computer-readable or machine-readable storage medium or media is (are) considered to be part of an article (or article of manufacture).
  • An article or article of manufacture can refer to any manufactured single component or multiple components.
  • the storage medium or media can be located either in the machine running the machine-readable instructions, or located at a remote site from which machine-readable instructions can

Abstract

A media stream from a conference site involved in a meeting established over a communications network is received, where the media stream contains a representation of at least one participant of the meeting. Geometric correction is applied to the representation of the at least one participant based on metadata.

Description

APPLYING GEOMETRIC CORRECTION TO A MEDIA STREAM
Background
[0001] Remote conferencing systems allow for collaboration between people at different locations. These systems allow participants to interact with one another through the use of audio and/or video equipment that provides audio and/or video communications.
Brief Description Of The Drawings
[0002] Some implementations are described with respect to the following figures:
Fig. 1 illustrates various example images of participants in a meeting, to be displayed using techniques according to some implementations;
Fig. 2 is a flow diagram of a process according to some implementations;
Fig. 3 is a block diagram of a conference system according to some implementations;
Fig. 4 illustrates a composite image after application of geometric correction, in accordance with some implementations;
Fig. 5 illustrates various parameters that are part of meeting geometric metadata used for application of geometric correction according to some implementations;
Figs. 6A-6D illustrate example standard display frames for use in application of geometric correction according to some implementations;
Fig. 7 illustrates an example application of geometric correction according to some implementations; and
Fig. 8 illustrates application of geometric correction of an image, according to some implementations.
Detailed Description
[0003] In a meeting established over a communications network, media streams of participants at different conference sites are displayed at each of multiple endpoints involved in the meeting. Examples of the communications network include the Internet, a local area network, a wide area network, and so forth. Note that "communications network" or "network" can refer to a single network or multiple networks. A "media stream" can refer to content for depicting respective participant(s) or resource(s), where the media stream can contain audio/video data, an image, and so forth. A "resource" can include any type of visual material that can be displayed during the meeting, where the visual material can include a document (e.g. word processing document, presentation slides, spreadsheet document, etc.), an image, a movie, or any other visual material that is not a participant of the meeting. A "meeting" can refer to a session established over a communications network for exchanging content among multiple conference sites.
[0004] In some examples, the media streams of participants at different conference sites are combined for display at a particular endpoint. The combining of the media streams from the multiple sites produces a "multi-point stream" that includes media streams from the multiple conference sites. The multi-point stream provides an immersive view of the media streams from the conference sites to produce, from the perspective of a participant, a sense of being with other participants in a virtual room. The sense of being in a single virtual room with participants from multiple conference sites can depend on the degree to which representations (e.g. images) of participants from the multiple sites match each other, and on other characteristics (including grouping of representations of participants, colors in the representations, and geometries of the representations). Alternatively, the media streams of participants at different conference sites can be kept separate when displayed at a particular endpoint. Techniques or mechanisms according to various implementations are applicable in either of the scenarios discussed above.
[0005] The endpoints involved in a meeting can include different types of audio/video equipment for capturing audio/video that is to be transmitted over the communications network to other conference sites involved in the meeting. "Audio/video data" may refer to audio data, or video data, or both audio and video data.
[0006] An endpoint can include one or multiple cameras for capturing one or multiple images of a participant or multiple participants. The cameras at different endpoints can be associated with different positions relative to respective participants (e.g. different distances between cameras and participants). Also, there can be different types of cameras that capture images of different characteristics (some images may be of higher quality or resolution than other images). For example, a first participant can use a desktop or notebook computer as the endpoint, where the desktop or notebook computer includes a webcam to capture the image of the first participant. At another conference site, a higher quality camera located in an office can be used to capture the image of a second participant. At a third conference site, multiple participants can be located in a conference room, with the camera positioned relatively far away from these participants, as compared to the first and second participants.
[0007] Examples of different images (102, 104, 106, 108, 110, and 112) captured by different endpoints are shown in Fig. 1. As shown in Fig. 1, the images 102, 104, 106, 108, 110, and 112 have varying geometries; as a result, when these images (102, 104, 106, 108, 110, and 112) are displayed at an endpoint, the result can be inconsistent individual images that are displayed by the respective display device at each endpoint.
[0008] A first participant (represented by image 102) using her computer can sit relatively close to the computer's display device (and thus to the attached webcam). This can result in display of a relatively large face, and the image can be distorted due to poor image resolution provided by the webcam. In the conference room (represented by image 106), on the other hand, the participants sit relatively far away from the camera, such that the faces of these participants are relatively small, with possibly wide-angle distortion present. The images 102 and 106 are in turn quite different from the image 104, which can be captured by a high-resolution camera (having better resolution than the webcam for the image 102, for example) that is placed an intermediate distance from the respective participant. As a result, the image 104 can have relatively high quality, and can have a better geometrical layout as compared to those of the images 102 and 106. Like image 104, the participants in images 108, 110, and 112 are at relatively good distances from the respective cameras. The participants that are in the images 104, 108, 110, and 112 can possibly have an advantage in the meeting, since their relatively superior images would enhance their non-verbal communications as compared with other participants in the images 102 and 106.
[0009] In accordance with some implementations, geometric correction can be applied to correct representations (e.g. images) of participants in media streams received from conference sites of a meeting established over a communications network. Such geometric correction of a representation may involve modification of a geometric aspect (or multiple geometric aspects) of the representation, including, for example, modification of a size of a feature of a participant, modification of a distance from a side of the participant in the representation to a corresponding side of a frame containing the representation of the participant, a size or amount of background behind the participant, a size of the frame containing the representation, and so forth. In the ensuing discussion, reference is made to applying geometric correction to images of participants— however, note that geometric correction can be applied to other representations of participants, such as avatars and the like.
[0010] The geometric correction can be based on meeting geometric metadata that is applicable to all the media streams of the meeting. In some implementations, the meeting geometric metadata can be provided by a meeting controller (e.g., a multi-point control unit (MCU)), which is the controller used for establishing and controlling a meeting established over a communications network. The meeting geometric metadata can define a geometrical ruleset (including one or multiple rules) relating to parameters that control geometries of images of participants in the media streams. Examples of the parameters can include one or some combination of the following: eye height from the bottom of a display frame containing the image of the participant(s); head/face size (size of the head or face of a participant); space between a top, bottom, or side of the display frame and an image of a participant; an aspect ratio (which defines a ratio of width and height of a display frame including an image of participant(s)); and so forth. In the foregoing, a "head/face size" is noted as referring to the size of the head or face of a participant; in the ensuing discussion, reference is made to a "head size," which is intended to refer to either the size of the head of a participant, or to the size of the face of a participant. The foregoing parameters represent examples of geometric aspects of images that can be modified by the geometric correction based on the geometrical ruleset. There can be other or alternative examples of parameters of meeting geometric metadata, which are discussed further below in connection with Fig. 5.
[0011] The geometrical ruleset of the meeting geometric metadata can define standard display frames for containing images of participants. Examples of standard display frames are discussed further below in connection with Figs. 6A-6D and 7. Depending upon the number of participants (e.g. one participant, two participants, etc.) represented by an image within a display frame, the respective standard display frame can specify the head size of each such participant relative to the border of the display frame, and can specify an eye height of each participant (as measured from the bottom of the display frame). The standard display frames are also associated with specific frame aspect ratios, as well as other parameters. Selecting a particular one of multiple standard display frames to display an image of participant(s) at a given conference site is an example of modifying a geometric aspect of the image: selecting one of multiple standard display frames effectively selects an aspect ratio of the display frame to use for displaying an image of participant(s).
[0012] The geometrical ruleset of the meeting geometric metadata can also specify rules relating to how a multi-participant image is to be scaled. A first rule can specify that the image of the multiple participants is to be scaled according to the largest head size identified in the image. Thus, for example, if the system determines that the head size of a particular participant is to be scaled by a factor of X (where X is some number), then the head sizes of the other participants in the multi-participant image would be scaled also by X.
[0013] A second rule of the geometrical ruleset can specify that none of the participants should be cut off after scaling or cropping. In some cases, application of the first rule may be incompatible with application of the second rule; in such cases, priorities can be assigned to the respective rules to define which rule takes precedence. For example, if the first rule has higher priority than the second rule, then scaling of the multi-participant image may cause at least one of the participants to disappear or be partially removed in the image. However, if the second rule has a higher priority than the first rule, then the head sizes of the participants may be maintained smaller to avoid removing any participant from the image. In other examples, the priorities assigned to the respective rules can be changed. As yet other examples, to avoid removing a participant from the image, an endpoint can locally zoom into portions of an image that contain faces of participants.
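The interplay of the first and second rules can be sketched as follows. This is a minimal illustration, not the method of the described implementations: the `Head` type, the `choose_scale` function, and the use of head widths and a horizontal frame width are all hypothetical simplifications.

```python
from dataclasses import dataclass

@dataclass
class Head:
    x: float      # horizontal center of the detected head, in source pixels (hypothetical)
    width: float  # detected head width, in source pixels (hypothetical)

def choose_scale(heads, frame_width, target_head_width, keep_all_has_priority):
    """Return a single scale factor X applied to every participant in the image."""
    # First rule: scale according to the largest head size found in the image.
    x_first = target_head_width / max(h.width for h in heads)
    if not keep_all_has_priority:
        # First rule wins; some participants may end up cut off.
        return x_first
    # Second rule: the span covering all heads must still fit inside the frame.
    left = min(h.x - h.width / 2 for h in heads)
    right = max(h.x + h.width / 2 for h in heads)
    x_second = frame_width / (right - left)
    # Second rule wins; heads may stay smaller than the target size.
    return min(x_first, x_second)
```

With the second rule prioritized, the returned factor never exceeds the one that keeps every detected head inside the frame; swapping the priority simply returns the factor derived from the largest head.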
[0014] The geometrical ruleset of the meeting geometric metadata can also include other or alternative rules in other implementations, such as rules that pertain to how various parameters as noted above are to be modified when applying geometric correction. As examples, the eye height and head size of the image of a first participant in a first media stream can be made consistent with the eye height and head size of the image of a second participant in a second media stream. Making the eye heights "consistent" may involve making the eye heights to be within some predefined difference of one another. Similarly, making the head sizes "consistent" may involve making the head sizes to be within some predefined difference of one another.
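One way to read "within some predefined difference" is as a tolerance band. The sketch below assumes that reading; the function names and the idea of shifting an image just far enough to reach the band are illustrative assumptions, not details taken from the text.

```python
def is_consistent(values, max_difference):
    """True if all values (eye heights or head sizes) lie within max_difference of one another."""
    return max(values) - min(values) <= max_difference

def eye_height_shift(eye_height, reference, max_difference):
    """Vertical shift that brings eye_height within max_difference of reference.

    Returns 0 when the eye height is already inside the tolerance band.
    """
    delta = eye_height - reference
    if abs(delta) <= max_difference:
        return 0
    if delta > 0:
        # Too high: move down to the upper edge of the band.
        return (reference + max_difference) - eye_height
    # Too low: move up to the lower edge of the band.
    return (reference - max_difference) - eye_height
```

The same `is_consistent` check applies unchanged to head sizes, since both notions of consistency are defined in the same way.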
[0015] In addition to applying geometric correction based on the meeting geometric metadata that are applicable to multiple media streams of a particular meeting, geometric correction can also be based on a local geometric rule that is associated with a display device of a given endpoint involved in the meeting. Different endpoints can have different local geometric rules. For example, a local geometric rule applicable to the relatively small display device of a notebook computer may be different from the local geometric rule that is applicable to a studio conference system that has one or multiple relatively large display devices. Yet further, the local geometric rule applicable to a small handheld device, such as a personal digital assistant (PDA) or smartphone, can differ from the local geometric rules applicable to display devices associated with computers and studio conference systems.
[0016] The local geometric rule for a notebook computer with a relatively small display device can specify use of a larger scale (to provide larger head sizes) for a participant image with tighter cropping applied. Cropping an image of a participant refers to removing a background surrounding the image(s) of the participant(s). On the other hand, the local geometric rule for a relatively large display device can specify a different scale for participant images, and can relax cropping rules (a greater amount of background can be shown, for example). There can be other examples of local geometric rules. More generally, a "local geometric rule" refers to a geometric rule applicable to a particular endpoint (but not to another endpoint) that defines at least one specification associated with images of participants to be presented in the corresponding display device.
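A local geometric rule could be represented as a small per-device record, as in the sketch below. The field names, device classes, and every numeric value are hypothetical placeholders chosen only to show a smaller display using a larger head scale and tighter cropping.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class LocalGeometricRule:
    head_scale: float      # multiplier applied to the meeting-wide head size (hypothetical)
    max_background: float  # largest fraction of the frame left as background (hypothetical)

# Illustrative values only; the text does not specify any numbers.
LOCAL_RULES = {
    "handheld": LocalGeometricRule(head_scale=1.5, max_background=0.05),
    "notebook": LocalGeometricRule(head_scale=1.25, max_background=0.10),
    "studio": LocalGeometricRule(head_scale=1.0, max_background=0.30),
}

def effective_head_size(meeting_head_size, device_class):
    """Combine the meeting-wide head size with the endpoint's local rule."""
    rule = LOCAL_RULES[device_class]
    return meeting_head_size * rule.head_scale
```

Keeping the rule separate from the meeting geometric metadata mirrors the split in the text: the metadata is shared by all media streams, while the rule belongs to one endpoint's display device.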
[0017] Fig. 2 is a flow diagram of a process of performing geometric correction, in accordance with some implementations. The process of Fig. 2 can be performed by an endpoint, or alternatively, by a meeting controller (e.g. an MCU) or by another device. The process receives (at 202) at least one media stream from at least one of multiple conference sites involved in a meeting established over a communications network. The received media stream contains an image of a participant (or participants) in the meeting. In examples where the media stream is received by a receiving endpoint, the media stream can be received over a network from the remote conference site, or the media stream can be passed through an intermediate device, such as a meeting controller, before reaching the receiving endpoint.
[0018] The process next applies (at 204) geometric correction to the image in the media stream. The geometric correction is based on meeting geometric metadata and a local geometric rule, as discussed above. Application of the geometric correction to the media stream causes the modification of the image of the participant(s) in the received media stream. The application of the geometric correction can be performed at one or multiple devices (e.g. at a receiving endpoint or at a combination of the receiving endpoint and a meeting controller).
[0019] Although Fig. 2 refers to application of geometric correction to the image of a media stream, note that the application of geometric correction can alternatively be made to images in multiple corresponding media streams that are to be displayed at an endpoint. The application of geometric correction to images that are to be displayed at an endpoint allows for coordination of images of the participants such that consistent views of the participants can be maintained in the composite image— in other words, the images of the participants from multiple media streams are made to be geometrically more consistent with each other.
[0020] As noted above, geometric correction of images can include modification of one or multiple parameters noted above, including, as examples, eye height from the bottom of a display frame containing the image of the participant(s); head/face size; space between a top, bottom, or side of the display frame and an image of a participant; an aspect ratio (which can be based on selection of a standard display frame); and so forth. As noted above, the eye heights and head sizes of participants in multiple images can be made consistent. To do so, techniques or mechanisms according to some implementations are able to detect locations of heads and eyes in images, such that alignment of head sizes and/or eye heights can be made possible. The foregoing are examples of geometric correction using the meeting geometric metadata.
[0021] In addition, the geometric correction of images can include modification using a local geometric rule applicable to a particular endpoint, as noted above.
[0022] Fig. 3 is a block diagram illustrating an example remote conferencing system that includes N (N > 1) conference sites (conference site A, conference site B, conference site N shown in Fig. 3). Each conference site has a respective endpoint 302, 304, and 306, as shown in Fig. 3. The endpoints at the respective conference sites can communicate with one another directly over a network 308, such as the Internet or other network. Alternatively, the endpoints can communicate with each other using one or multiple intermediate devices, including a meeting controller 310 (sometimes referred to as a multipoint control unit or MCU).
[0023] Each endpoint 302, 304, or 306 is connected to a respective display device (or multiple display devices) 312, 314, or 316, as well as to a respective camera (or multiple cameras) 318, 320, or 322. The display devices are used to display media streams of respective participants of a meeting established using the remote conferencing system of Fig. 3. The cameras are used to capture video streams of the participants. In alternative examples, a particular endpoint can be connected to multiple display devices and/or cameras (such as multiple cameras to capture participants at a site from different angles).
[0024] Although not shown, the endpoints 302, 304, and 306 can also include respective audio capture equipment to capture audio streams. Additionally, the endpoints can include respective equipment to allow for collaboration of resources, such as documents and so forth. The endpoints 302, 304, and 306 are able to exchange audio/video streams and resources with each other over the network 308.
[0025] As shown in Fig. 3, the endpoint 302 includes a display layout module 324 executable on one or multiple processors 328. The processor(s) 328 is (are) connected to a storage medium 330. The display layout module 324 is able to generate a multi-point image (that combines media streams from multiple conference sites) for display in the attached display device (or display devices) 312. Alternatively, instead of generating a multi-point image, the media streams from multiple conference sites can be displayed as separate media streams in the display device(s) attached to the endpoint 302. The display layout module 324 includes a geometric correction module 326 that is able to apply geometric correction as discussed herein, based on meeting geometric metadata 332 and a local geometric rule 334 stored in the storage medium 330. In other examples, the geometric correction module 326 can be separate from the display layout module 324. The meeting geometric metadata 332 can be received from the meeting controller 310 (or from another source).
[0026] The other endpoints 304 and 306 can include similar components as the endpoint 302.
[0027] Instead of including the geometric correction module 326 in each endpoint, a geometric correction module 336 can instead be included in another device, such as the meeting controller 310 (or another device). As shown in Fig. 3, the geometric correction module 336 in the meeting controller 310 is executable on processor(s) 340. The geometric correction module 336 can apply geometric correction as discussed herein, based on meeting geometric metadata 344 and a local geometric rule 346 stored in a storage medium 342 of the meeting controller 310.
[0028] Fig. 4 shows an example composite image 400 of individual images received in media streams from corresponding conference sites of a meeting, after geometric correction has been applied based on meeting geometric metadata and a local geometric rule associated with a given display device used to display the multipoint image 400 shown in Fig. 4. Note that the composite image 400 is displayed by the given display device— other display devices of endpoints involved in the meeting can display other composite images. In different examples, instead of forming the composite image 400, the various images of Fig. 4 can be maintained in separate media streams.
[0029] A corrected individual image 102A is based on application of geometric correction to the image 102 shown in Fig. 1. A corrected image 104A is based on application of geometric correction to the image 104 of Fig. 1. A corrected image 106A is based on application of geometric correction to the image 106 of Fig. 1. Similarly, the corrected images 108A, 110A, and 112A are based on respective applications of geometric corrections to corresponding images 108, 110, and 112 of Fig. 1. Note that relatively small corrections have been applied to the images 108A and 112A; in other examples, the images 108A and 112A may be the same as respective images 108 and 112 (in other words, the images 108A and 112A have not been modified from respective images 108 and 112). Note that each of images 102A, 104A, and 110A represents just one respective participant. On the other hand, each of images 108A and 112A represents two participants, and the image 106A represents more than two participants.
[0030] In the corrected image 102A, the image 102 (Fig. 1) that was captured with a notebook computer webcam, for example, has been rescaled and positioned at a consistent eye height, as compared to the other individual images in the composite image 400. Also, the rescaling of the image 102 has resulted in reducing the head size of the participant in the corrected image 102A to be consistent with head sizes of participants in several other images in the composite image 400 (e.g. images 108A, 104A, 110A, 112A). As a result of reducing the size of the image 102, padding has been provided in the corrected image 102A, where the padding is in the form of the dark border around the image 102A.
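The rescale-then-pad behavior described for image 102A can be illustrated with simple arithmetic. The sketch below is an assumption-laden example: the function name, the use of head height as the scaling reference, and the even split of padding between opposite sides are all hypothetical.

```python
def scale_and_pad(image_w, image_h, head_h, target_head_h, frame_w, frame_h):
    """Scale an image so its head height matches the target, then pad up to the frame size.

    Returns (scale, (new_w, new_h), (pad_left, pad_right, pad_top, pad_bottom)).
    """
    scale = target_head_h / head_h
    new_w, new_h = round(image_w * scale), round(image_h * scale)
    # Padding is needed only where the scaled image is smaller than the frame.
    pad_x = max(0, frame_w - new_w)
    pad_y = max(0, frame_h - new_h)
    padding = (pad_x // 2, pad_x - pad_x // 2, pad_y // 2, pad_y - pad_y // 2)
    return scale, (new_w, new_h), padding
```

For example, a 640x480 webcam image whose head is twice the target height shrinks to 320x240, and a 480x360 frame then needs 80 pixels of border on the left and right and 60 on the top and bottom.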
[0031] The geometric correction of the image 102 is based on the meeting geometric metadata (applicable to multiple endpoints involved in a meeting) and a local geometric rule applicable to just the particular endpoint that is displaying the composite image 400. As noted above, the meeting geometric metadata can include eye height; head/face size; space between a top, bottom, or side of the display frame and an image of a participant; and an aspect ratio. The meeting geometric metadata can have a geometrical ruleset specifying that eye heights and head/face sizes of different images of participants are to be maintained consistent where possible. (An example of where head/face sizes of participants cannot be made consistent with other images is in the image 106A having seven participants.) To achieve corrections of head/face sizes and eye heights, head/face detection and eye detection can be performed using any of various techniques. The geometrical ruleset can also specify that the space between a top, bottom, or side of the display frame and an image of a participant be within some predefined range. Additionally, for a one-participant image, the geometrical ruleset can specify that a one-participant standard frame be used that has a corresponding aspect ratio. Note that in Fig. 4, the one-participant images 102A, 104A, and 110A have aspect ratios.
[0032] In addition, the image 102A can also have been geometrically corrected based on application of a local geometric rule for the endpoint that is displaying the composite image 400. The local geometric rule for a notebook computer with a relatively small display device can specify use of a larger scale (to provide larger head sizes) for a participant image with tighter cropping applied. On the other hand, the local geometric rule for a relatively large display device can specify a different scale for participant images, and can relax cropping rules (a greater amount of background can be shown, for example).
[0033] Geometric correction based on the meeting geometric metadata and the local geometric rule can also be applied to the other images of the composite image 400.
[0034] Rescaling has been applied to the image 106 of Fig. 1 to increase the head sizes of the participants in the corrected image 106A (these participants are in the conference room). An average eye height of the eye heights of respective participants in the conference room depicted in the corrected image 106A is maintained to be consistent with the eye heights of the image 112A. Also, cropping has been applied to remove certain background content in the image 106. Additionally, the aspect ratio of the corrected image 106A is modified from the aspect ratio of the image 106; the modified aspect ratio allows the corrected image 106A to display more of the participants in image 106A.
[0035] The image 104 (Fig. 1) has been cropped to reduce an amount of background information in the corrected image 104A. Also, aspect ratio modification has been performed such that the corrected image 104A is more tightly focused on the participant in image 104A. The width of the image 110A has been reduced as compared to the image 110 in Fig. 1.
[0036] Although reference has been made to images captured by discrete conference sites, in alternative implementations, a captured image can be a panoramic image made up of multiple sub-images, where each sub-image can depict one or multiple participants. Geometric correction can be applied to individual ones of the sub-images.
[0037] Fig. 5 illustrates various example parameters in meeting geometric metadata. Note that in other examples, other types of meeting geometric metadata can be used. The parameters are illustrated in the context of a display frame 500 (containing an image of two participants) that is displayed by a display device 501 . These parameters can be modified according to the geometrical ruleset defined by the meeting geometric metadata.
[0038] The meeting geometric metadata includes a participant eye height 502, which measures the height of the eye of a participant from a bottom 514 of the display frame 500 that contains the image of the participant.
[0039] The meeting geometric metadata also includes a head room 504, which defines the distance from the top of a head of a participant to a top 506 of the display frame 500. The meeting geometric metadata further includes a side room 508, which defines the distance between a side 510 of the display frame 500 and the side of the nearest participant. The side room 508 applies to either the left or right side.
[0040] The meeting geometric metadata can also include an image-to-bottom height 512, which defines the distance from a bottom surface 513 (e.g. virtual floor or other predefined surface) to the bottom 514 of the display frame 500. In addition, an eye-to-bottom height 516 defines the distance between the bottom surface 513 and the eye height of a participant. An image table height 518 defines a height of a virtual table 519 from the bottom (514) of the display frame 500 to a top of the virtual table 519. A display frame height 520 defines the height of the display frame 500, and a display frame width 522 defines the width of the display frame 500. The ratio between the height 520 and width 522 is the aspect ratio of the display frame 500.
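The parameters of Fig. 5 can be gathered into a single record. The following is a minimal sketch only, not a structure defined by this disclosure: the class name `FrameGeometry`, the field names, and the use of one common unit (e.g. pixels) are assumptions made for illustration. The sketch also assumes that the bottom surface 513 lies below the bottom 514 of the display frame, so that the eye-to-bottom height 516 is the sum of 512 and 502, and it expresses the aspect ratio as width divided by height.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FrameGeometry:
    """Hypothetical container for the Fig. 5 parameters (one common unit)."""

    eye_height: float       # 502: eye to bottom 514 of the display frame
    head_room: float        # 504: top of head to top 506 of the display frame
    side_room: float        # 508: frame side 510 to the nearest participant
    image_to_bottom: float  # 512: bottom surface 513 to frame bottom 514
    table_height: float     # 518: frame bottom 514 to top of virtual table 519
    frame_height: float     # 520
    frame_width: float      # 522

    @property
    def eye_to_bottom(self) -> float:
        # 516, assuming the bottom surface 513 is below the frame bottom 514.
        return self.image_to_bottom + self.eye_height

    @property
    def aspect_ratio(self) -> float:
        # Assumed orientation: width 522 divided by height 520.
        return self.frame_width / self.frame_height


geometry = FrameGeometry(
    eye_height=432, head_room=80, side_room=60, image_to_bottom=300,
    table_height=200, frame_height=720, frame_width=1280,
)
```

Keeping the derived values as properties means they cannot drift out of step with the measured parameters they depend on.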
[0041] The scaling of the various parameters in the meeting geometric metadata can be standardized in terms of content, including participant(s) and other object(s), to be displayed in a display frame. For example, the scaling of the various parameters can be based on a number of participants and other objects that can fit in a display frame. Figs. 6A-6D illustrate various standard display frames that define different scalings to be applied. The display frame 602 (Fig. 6A) is an example of a display frame, of a particular size, for depicting two participants. Fig. 6B shows a standard three-participant display frame 604, which depicts the images of three participants within the frame of the particular size. Fig. 6C shows a standard four-person display frame 606, which depicts four persons within the frame of the particular size.
[0042] Fig. 6D shows a standard six-person display frame 608, which is a frame of an enlarged width to depict six participants (two of the six participants are represented by respective chairs) within the display frame 608.
[0043] A standard display frame can also be defined by the meeting geometric metadata for presenting a single participant, which is discussed in connection with Fig. 7 below. As noted above, the geometrical ruleset of the meeting geometric metadata can specify selection of a particular standard display frame based on the number of participants to be presented by the frame. This affects the aspect ratio of the frame containing the image of the participant(s), which is an example of a geometric correction that is to be applied.
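As an illustration of selecting a standard display frame by participant count, the following sketch picks the smallest frame that can hold a given number of participants. The table `STANDARD_FRAMES`, the pixel dimensions, and the function name `select_standard_frame` are hypothetical; the disclosure states only that the two-, three-, and four-participant frames share a particular size and that the six-participant frame is wider.

```python
from typing import NamedTuple


class StandardFrame(NamedTuple):
    capacity: int  # number of participants the frame is meant to depict
    width: int     # hypothetical pixel width
    height: int    # hypothetical pixel height


# Sorted by capacity. All dimensions are illustrative assumptions.
STANDARD_FRAMES = [
    StandardFrame(1, 640, 720),
    StandardFrame(2, 1280, 720),
    StandardFrame(3, 1280, 720),
    StandardFrame(4, 1280, 720),
    StandardFrame(6, 1920, 720),  # enlarged width, as in Fig. 6D
]


def select_standard_frame(participants: int) -> StandardFrame:
    """Return the smallest standard frame that fits the participants."""
    if participants < 1:
        raise ValueError("at least one participant is required")
    for frame in STANDARD_FRAMES:
        if frame.capacity >= participants:
            return frame
    raise ValueError("no standard frame is large enough")
```

With this table, five participants would be placed in the six-participant frame, which changes the aspect ratio of the resulting image.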
[0044] Fig. 7 shows an example of geometric correction for a media stream that contains an image of a single participant. According to Fig. 7, face detection (702) is first performed in an image 700. Face detection can be based on face detection features of cameras or other subsystems. After detecting the face of the participant in the image 700, a standard display frame 704 is identified; in this case, the identified standard display frame 704 is the standard display frame for presenting a single participant. The standard display frame 704 is provided around the image of the participant. Note that the placement of the standard display frame 704 in the image 700 can be based on various parameters of the meeting geometric metadata, including eye height, side room, and so forth. The eye height is specified to be consistent with other images of other participants. Also, the side room is selected to be within a predefined range, for example. Next, cropping and scaling are performed to apply the geometric correction, to produce a cropped image 706. As depicted in Fig. 7, the cropping causes background content of the original image 700 to be removed. Although not shown in Fig. 7, padding can also be performed, such as to add pixels to an image if the image turns out to be too small to fit into a standard display frame.
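The placement, cropping, scaling, and padding steps of Fig. 7 can be sketched as plain arithmetic on a detected face. This is one possible formulation under stated assumptions, not the procedure of the disclosure: the face detector is assumed to supply an eye position and a head height in source pixels, image coordinates are assumed to have the origin at the top left with y increasing downward, the participant is centered horizontally, and the names `plan_crop` and `padding_needed` are hypothetical.

```python
from typing import NamedTuple, Tuple


class Rect(NamedTuple):
    left: float
    top: float
    width: float
    height: float


def plan_crop(eye_x: float, eye_y: float, head_h: float,
              frame_w: float, frame_h: float,
              target_eye_height: float,
              target_head_h: float) -> Tuple[Rect, float]:
    """Return the source-image crop rectangle and the scale factor."""
    # Scale so the detected head reaches the target head size.
    scale = target_head_h / head_h
    # The crop, once scaled, must fill the standard display frame.
    crop_w = frame_w / scale
    crop_h = frame_h / scale
    # Center the participant horizontally (an assumption of this sketch).
    left = eye_x - crop_w / 2
    # Put the eye target_eye_height above the frame bottom after scaling.
    bottom = eye_y + target_eye_height / scale
    return Rect(left, bottom - crop_h, crop_w, crop_h), scale


def padding_needed(crop: Rect, image_w: float,
                   image_h: float) -> Tuple[float, float, float, float]:
    """Source pixels to add on (left, top, right, bottom) of the image."""
    return (
        max(0.0, -crop.left),
        max(0.0, -crop.top),
        max(0.0, crop.left + crop.width - image_w),
        max(0.0, crop.top + crop.height - image_h),
    )
```

A crop rectangle that extends past the source image corresponds to the padding case: the missing region has to be filled with added pixels before scaling.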
[0045] A similar procedure can be applied in the case where there are multiple participants in an image. In such case, face detection of multiple participants in the image is performed, and a standard display frame for presenting the multiple participants is identified. Cropping, scaling, and/or padding can then be performed, similar to the procedure of Fig. 7, to produce a corrected image.
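For the multiple-participant case, a sketch of how the detected faces might be combined before a standard display frame is chosen is shown below. The `Face` record, the function `group_extent`, and the use of a mean eye position are illustrative assumptions; horizontal positions and the eye position are in source-image pixels.

```python
from typing import NamedTuple, Sequence, Tuple


class Face(NamedTuple):
    left: float   # left edge of the detected face
    right: float  # right edge of the detected face
    eye_y: float  # vertical eye position


def group_extent(faces: Sequence[Face],
                 side_room: float) -> Tuple[float, float, float]:
    """Return (left, right, mean_eye_y) covering every detected face."""
    if not faces:
        raise ValueError("no faces detected")
    # Leave the requested side room beyond the outermost participants.
    left = min(f.left for f in faces) - side_room
    right = max(f.right for f in faces) + side_room
    # A single eye line for the group, taken here as the mean.
    mean_eye_y = sum(f.eye_y for f in faces) / len(faces)
    return left, right, mean_eye_y
```

The number of faces would then drive the choice of standard display frame, and the returned extent and eye line would position that frame in the image.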
[0046] Application of geometric correction can be performed at different locations. For example, as shown in Fig. 8, the application of the geometric correction can be applied in a send pipeline or in a receive pipeline. The "send pipeline" refers to equipment of the system for sending a media stream to recipient equipment. For example, the "send pipeline" can include a source endpoint (that has captured a media stream for communication in a meeting); alternatively, the "send pipeline" can include an intermediate device, such as the meeting controller 310 of Fig. 3 or some other intermediate device.
[0047] A "receive pipeline" refers to equipment to receive a media stream. The "receive pipeline" can refer to an endpoint or to an intermediate device.
[0048] As shown in Fig. 8, the camera 318, at a source endpoint (e.g. endpoint 302), captures an original image 804 (of a participant or multiple participants).
According to a first technique, geometric correction is applied (at 806) in the send pipeline, and the corrected image is communicated (at 808) by the send pipeline for receipt by a receive pipeline. Applying geometric correction in the send pipeline may reduce the burden placed on receiving endpoints.
[0049] According to a second technique, the original image 804 can be communicated (at 810) by the send pipeline to the receive pipeline, which then applies (at 812) geometric correction to the original image 804.
[0050] Using the corrected image (corrected at 806 or 812), the receiving endpoint (304 or 306) can generate (at 814) a composite image, which includes the corrected image and image(s) of other participant(s) in the meeting. The composite image is then displayed (at 816).
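The two techniques of Fig. 8 differ only in where the correction runs. The following sketch makes that explicit; the names `CorrectionSite`, `send_pipeline`, `receive_pipeline`, and `composite` are hypothetical, and images are represented by arbitrary objects passed to a caller-supplied `correct` function.

```python
from enum import Enum
from typing import Any, Callable, List


class CorrectionSite(Enum):
    SEND = "send"        # first technique: correct at 806, send at 808
    RECEIVE = "receive"  # second technique: send at 810, correct at 812


def send_pipeline(image: Any, site: CorrectionSite,
                  correct: Callable[[Any], Any]) -> Any:
    # Correct before transmission only when the send side is responsible.
    return correct(image) if site is CorrectionSite.SEND else image


def receive_pipeline(image: Any, site: CorrectionSite,
                     correct: Callable[[Any], Any]) -> Any:
    # Correct after receipt only when the receive side is responsible.
    return correct(image) if site is CorrectionSite.RECEIVE else image


def composite(corrected: Any, others: List[Any]) -> List[Any]:
    # Stand-in for step 814: combine with images of other participants.
    return [corrected, *others]
```

Whichever site is chosen, the correction is applied exactly once, so the composite image is the same in both cases; only the processing load moves between sender and receiver.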
[0051] The geometric correction can be applied in a single processing system (e.g. a computer) or in multiple processing systems. Distribution of geometric correction across multiple processing systems allows for more efficient use of resources of a conference system. In some examples, within a media stream, the location of a face or operations to be performed (for application of geometric correction) can be tagged with tag information. A downstream processing system can use the tag information for application of geometric correction at the downstream processing system.
[0052] In some examples, the geometric correction can be performed on-demand (on-demand mode of geometric correction). With the on-demand mode, an image can be passed uncorrected to an endpoint. If an end user determines that the layout is unsatisfactory, then the user can use a user interface to submit a command to perform a geometric correction. The endpoint and/or the meeting controller can then coordinate the exchange of meeting geometric metadata and a local geometric rule to perform application of the geometric correction. In on-demand mode, usage of processing resources for purposes of performing geometric correction can be reduced, since the geometric correction is not performed unless requested.
[0053] Alternatively, geometric correction can be applied at regular steps in a process, such as during meeting startup or during meeting changes (e.g. when conference sites are added or dropped). In such alternative implementations, geometric correction is not continuously applied, but rather, is performed in response to discrete events.
[0054] As yet a further alternative, a dynamic system can be used in which geometric correction is continually applied. In such implementations, regular sampling of face positions is performed (where the regular sampling can be performed on a periodic basis) to determine when a particular image is out of tolerance and thus correction is triggered. If participants are moving around, the changes can be triggered to apply geometric correction as the participants move around.
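The three triggering approaches (on-demand, discrete events, and continual sampling) can be compared in a short sketch. The names `Mode`, `out_of_tolerance`, and `should_correct` are hypothetical, and the tolerance test on eye height is only one example of how "out of tolerance" might be evaluated.

```python
from enum import Enum
from typing import Optional


class Mode(Enum):
    ON_DEMAND = "on_demand"  # correct only when a user requests it
    EVENT = "event"          # correct at startup or when sites change
    DYNAMIC = "dynamic"      # correct when sampled positions drift


def out_of_tolerance(observed: float, target: float,
                     tolerance: float) -> bool:
    return abs(observed - target) > tolerance


def should_correct(mode: Mode, *, user_requested: bool = False,
                   meeting_event: bool = False,
                   observed_eye_height: Optional[float] = None,
                   target_eye_height: Optional[float] = None,
                   tolerance: float = 0.0) -> bool:
    """Decide whether geometric correction is applied for this sample."""
    if mode is Mode.ON_DEMAND:
        return user_requested
    if mode is Mode.EVENT:
        return meeting_event
    # Dynamic mode: without a measurement there is nothing to compare.
    if observed_eye_height is None or target_eye_height is None:
        return False
    return out_of_tolerance(observed_eye_height, target_eye_height, tolerance)
```

In dynamic mode this check would run on each periodic sample of face positions, so a participant who moves far enough from the target eye height triggers a new correction.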
[0055] In some implementations, an endpoint can include a camera that is adjustable; for example, a zoom setting of the camera can be adjusted (to zoom into or away from a target object), the pan and/or tilt motion of the camera can be adjusted, and so forth. In such cases, in addition to application of geometric correction, one or multiple settings of the camera can be adjusted.
[0056] Various modules discussed above, including modules 324, 326, and 336 in Fig. 3, can be in the form of machine-readable instructions that can be loaded for execution on a processor or processors (e.g. 328 or 340 in Fig. 3). A processor can include a microprocessor, microcontroller, processor module or subsystem, programmable integrated circuit, programmable gate array, or another control or computing device.
[0057] Data and instructions are stored in respective storage devices, which are implemented as one or more computer-readable or machine-readable storage media. The storage media include different forms of memory including semiconductor memory devices such as dynamic or static random access memories (DRAMs or SRAMs), erasable and programmable read-only memories (EPROMs), electrically erasable and programmable read-only memories (EEPROMs) and flash memories; magnetic disks such as fixed, floppy and removable disks; other magnetic media including tape; optical media such as compact disks (CDs) or digital video disks (DVDs); or other types of storage devices. Note that the instructions discussed above can be provided on one computer-readable or machine-readable storage medium, or alternatively, can be provided on multiple computer-readable or machine-readable storage media distributed in a large system having possibly plural nodes. Such computer-readable or machine-readable storage medium or media is (are) considered to be part of an article (or article of manufacture). An article or article of manufacture can refer to any manufactured single component or multiple components. The storage medium or media can be located either in the machine running the machine-readable instructions, or located at a remote site from which machine-readable instructions can be downloaded over a network for execution.
[0058] In the foregoing description, numerous details are set forth to provide an understanding of the subject disclosed herein. However, implementations may be practiced without some or all of these details. Other implementations may include modifications and variations from the details discussed above. It is intended that the appended claims cover such modifications and variations.

Claims

What is claimed is:
1. A method of a system having a processor, comprising:
receiving a first media stream from a first conference site involved in a meeting established over a communications network, wherein the first media stream contains a representation of at least one participant of the meeting; and
applying geometric correction to the representation of the at least one participant based on:
metadata including a geometric ruleset relating to at least one parameter that affects a geometry of a representation of a participant, wherein the metadata is applicable to plural media streams from conference sites involved in the meeting; and
a local geometric rule associated with a display device for displaying the media stream.
2. The method of claim 1, further comprising:
displaying the first media stream after application of the geometric correction with at least a second media stream to cause display of representations of participants in the first media stream and the second media stream.
3. The method of claim 2, wherein application of the geometric correction causes the representation of the at least one participant in the first media stream to be geometrically consistent with a representation of at least another participant in the second media stream.
4. The method of claim 1, wherein the at least one parameter that controls the geometry of a representation of a participant comprises a head size of a participant.
5. The method of claim 1, wherein the at least one parameter that affects the geometry of a representation of a participant comprises an eye height of a participant.
6. The method of claim 1, wherein the geometric ruleset specifies standard display frames each for displaying a respective different number of participants.
7. The method of claim 1, wherein applying the geometric correction comprises at least one selected from the group consisting of: scaling a size of the representation of the at least one participant, cropping background content around the at least one participant, changing an aspect ratio of a display frame containing the representation of the at least one participant, and padding the representation of the at least one participant.
8. The method of claim 1, further comprising:
defining a display frame around the representation of the at least one participant based on a location of a face of the at least one participant,
wherein applying the geometric correction uses the display frame.
9. The method of claim 1, wherein the geometric ruleset includes a plurality of rules and associated priorities, and wherein applying the geometric correction considers the priorities of the plurality of rules.
10. A system comprising:
at least one processor to:
receive a media stream from a conference site involved in a meeting established over a communications network, wherein the media stream contains a representation of at least one participant of the meeting; and
apply geometric correction to the representation of the at least one participant based on meeting geometric metadata that includes an eye height and a head size.
11. The system of claim 10, wherein the geometric correction is to cause the eye height and the head size of the at least one participant to be consistent with a corresponding eye height and head size of another representation of another participant in another media stream.
12. The system of claim 10, wherein the application of the geometric correction is in response to a predefined event or a request.
13. The system of claim 10, wherein the application of the geometric correction is in response to movement of a participant in the meeting.
14. The system of claim 10, wherein the application of the geometric correction comprises at least one selected from the group consisting of: scaling a size of the representation of the at least one participant, cropping background content around the at least one participant, changing an aspect ratio of a display frame containing the representation of the at least one participant, and padding the representation of the at least one participant.
15. An article comprising at least one machine-readable storage medium storing instructions that upon execution cause a system to:
receive media streams from respective conference sites involved in a meeting established over a communications network, wherein each of the media streams contains a representation of at least one corresponding participant of the meeting; and
apply geometric correction to the representations of the participants based on:
metadata including a geometric ruleset relating to at least one parameter that affects a geometry of a representation of a participant, wherein the metadata is applicable to the media streams from the conference sites; and
a local geometric rule associated with a display device for displaying the media streams.
PCT/US2011/057452 2011-10-24 2011-10-24 Applying geometric correction to a media stream WO2013062509A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/US2011/057452 WO2013062509A1 (en) 2011-10-24 2011-10-24 Applying geometric correction to a media stream

Publications (1)

Publication Number Publication Date
WO2013062509A1 true WO2013062509A1 (en) 2013-05-02

Family

ID=48168186

Country Status (1)

Country Link
WO (1) WO2013062509A1 (en)

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 11874770

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 11874770

Country of ref document: EP

Kind code of ref document: A1