US8665333B1 - Method and system for optimizing the observation and annotation of complex human behavior from video sources - Google Patents

Method and system for optimizing the observation and annotation of complex human behavior from video sources

Info

Publication number
US8665333B1
Authority
US
United States
Prior art keywords
events
video stream
behavior
capturing images
time
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active, expires
Application number
US12/011,385
Inventor
Rajeev Sharma
Satish Mummareddy
Emilio Schapira
Namsoon Jung
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Motorola Solutions Inc
Original Assignee
VideoMining Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by VideoMining Corp filed Critical VideoMining Corp
Priority to US12/011,385 priority Critical patent/US8665333B1/en
Assigned to VIDEOMINING CORPORATION. ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: MUMMAREDDY, SATISH; SCHAPIRA, EMILIO; SHARMA, RAJEEV; JUNG, NAMSOON
Application granted granted Critical
Publication of US8665333B1 publication Critical patent/US8665333B1/en
Assigned to PEARSON, CHARLES C., JR; BENTZ, RICHARD E.; PAPSON, MICHAEL G.; MESSIAH COLLEGE; WEIDNER, DEAN A.; POOLE, ROBERT E.; BRENNAN, MICHAEL; AGAMEMNON HOLDINGS; SEIG TRUST #1 (PHILIP H. SEIG, TRUSTEE); PARMER, GEORGE A.; STRUTHERS, RICHARD K.; SCHIANO, ANTHONY J. SECURITY INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: VIDEOMINING CORPORATION
Assigned to 9051147 CANADA INC. ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: VIDEOMINING CORPORATION
Assigned to VIDEOMINING CORPORATION. RELEASE BY SECURED PARTY (SEE DOCUMENT FOR DETAILS). Assignors: AGAMEMNON HOLDINGS; BENTZ, RICHARD E.; BRENNAN, MICHAEL; MESSIAH COLLEGE; PAPSON, MICHAEL G.; PARMER, GEORGE A.; PEARSON, CHARLES C., JR.; POOLE, ROBERT E.; SCHIANO, ANTHONY J.; SEIG TRUST #1 (PHILIP H. SEIG, TRUSTEE); STRUTHERS, RICHARD K.; WEIDNER, DEAN A.
Assigned to VIDEO MINING CORPORATION. RELEASE BY SECURED PARTY (SEE DOCUMENT FOR DETAILS). Assignors: AGAMEMNON HOLDINGS; BENTZ, RICHARD E.; BRENNER A/K/A MICHAEL BRENNAN, MICHAEL A.; MESSIAH COLLEGE; PAPSON, MICHAEL G.; PARMER, GEORGE A.; PEARSON, CHARLES C., JR; POOLE, ROBERT E.; SCHIANO, ANTHONY J.; SEIG TRUST #1; STRUTHERS, RICHARD K.; WEIDNER, DEAN A.
Assigned to HSBC BANK CANADA. SECURITY INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: CANADA INC.
Assigned to AVIGILON PATENT HOLDING 1 CORPORATION. CHANGE OF NAME (SEE DOCUMENT FOR DETAILS). Assignors: 9051147 CANADA INC.
Assigned to AVIGILON PATENT HOLDING 1 CORPORATION. RELEASE BY SECURED PARTY (SEE DOCUMENT FOR DETAILS). Assignors: HSBC BANK CANADA
Assigned to MOTOROLA SOLUTIONS, INC. NUNC PRO TUNC ASSIGNMENT (SEE DOCUMENT FOR DETAILS). Assignors: AVIGILON PATENT HOLDING 1 CORPORATION
Legal status: Active (current)
Expiration: Adjusted

Classifications

    • G - PHYSICS
    • G08 - SIGNALLING
    • G08B - SIGNALLING OR CALLING SYSTEMS; ORDER TELEGRAPHS; ALARM SYSTEMS
    • G08B 13/00 - Burglar, theft or intruder alarms
    • G08B 13/18 - Actuation by interference with heat, light, or radiation of shorter wavelength; Actuation by intruding sources of heat, light, or radiation of shorter wavelength
    • G08B 13/189 - Actuation by interference with heat, light, or radiation of shorter wavelength; Actuation by intruding sources of heat, light, or radiation of shorter wavelength using passive radiation detection systems
    • G08B 13/194 - Actuation by interference with heat, light, or radiation of shorter wavelength; Actuation by intruding sources of heat, light, or radiation of shorter wavelength using passive radiation detection systems using image scanning and comparing systems
    • G08B 13/196 - Actuation by interference with heat, light, or radiation of shorter wavelength; Actuation by intruding sources of heat, light, or radiation of shorter wavelength using passive radiation detection systems using image scanning and comparing systems using television cameras
    • G08B 13/19602 - Image analysis to detect motion of the intruder, e.g. by frame subtraction
    • G08B 13/19613 - Recognition of a predetermined image pattern or behaviour pattern indicating theft or intrusion
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 - Scenes; Scene-specific elements
    • G06V 20/50 - Context or environment of the image
    • G06V 20/52 - Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • G - PHYSICS
    • G08 - SIGNALLING
    • G08B - SIGNALLING OR CALLING SYSTEMS; ORDER TELEGRAPHS; ALARM SYSTEMS
    • G08B 13/00 - Burglar, theft or intruder alarms
    • G08B 13/18 - Actuation by interference with heat, light, or radiation of shorter wavelength; Actuation by intruding sources of heat, light, or radiation of shorter wavelength
    • G08B 13/189 - Actuation by interference with heat, light, or radiation of shorter wavelength; Actuation by intruding sources of heat, light, or radiation of shorter wavelength using passive radiation detection systems
    • G08B 13/194 - Actuation by interference with heat, light, or radiation of shorter wavelength; Actuation by intruding sources of heat, light, or radiation of shorter wavelength using passive radiation detection systems using image scanning and comparing systems
    • G08B 13/196 - Actuation by interference with heat, light, or radiation of shorter wavelength; Actuation by intruding sources of heat, light, or radiation of shorter wavelength using passive radiation detection systems using image scanning and comparing systems using television cameras
    • G08B 13/19639 - Details of the system layout
    • G08B 13/19641 - Multiple cameras having overlapping views on a single scene
    • G08B 13/19643 - Multiple cameras having overlapping views on a single scene wherein the cameras play different roles, e.g. different resolution, different camera type, master-slave camera

Definitions

  • the present invention is a method and system for automatically detecting predefined events based on the behavior of people in a first video stream from a first means for capturing images in a physical space, accessing a synchronized second video stream from a second means for capturing images that is positioned to observe the people more closely using the timestamps associated with the detected events from the first video stream, and enabling an annotator to annotate each of the events with more labels using an annotation tool.
  • U.S. Pat. Appl. Pub. No. 2003/0058339 of Trajkovic, et al. (hereinafter Trajkovic) disclosed a method for detecting an event through repetitive patterns of human behavior. Trajkovic learned multi-dimensional feature data from the repetitive patterns of human behavior and computed a probability density function (PDF) from the data. Then, a method for the PDF analysis, such as Gaussian or clustering techniques, was used to identify the repetitive patterns of behavior and unusual behavior through the variance of the Gaussian distribution or cluster.
  • Although Trajkovic can model a repetitive behavior through the PDF analysis, Trajkovic is clearly foreign to the event detection for the aggregate of non-repetitive behaviors, such as the shopper traffic in a physical space. Trajkovic did not address the challenges in the event detection based on customers' behaviors in a video in a retail environment, such as the non-repetitive behaviors, and is therefore clearly foreign to the challenges that can be found in a retail environment.
  • U.S. Pat. Appl. Pub. No. 2006/0053342 of Bazakos, et al. (hereinafter Bazakos) disclosed a method for unsupervised learning of events in a video.
  • Bazakos disclosed a method of creating a feature vector of a related object in a video by grouping clusters of points together within a feature space and storing the feature vector in an event library. Then, the behavioral analysis engine in Bazakos determined whether an event had occurred by comparing features contained within a feature vector in a specific instance against the feature vectors in the event library.
  • Bazakos is primarily related to surveillance rather than to event detection based on customers' behaviors in a video.
  • U.S. Pat. Appl. Pub. No. 2005/0286774 of Porikli disclosed a method for event detection in a video using approximate estimates of the aggregated affinity matrix and clustering and scoring of the matrix.
  • Porikli constructed the affinity matrix based on a set of frame-based and object-based statistical features, such as trajectories, histograms, and Hidden Markov Models of feature speed, orientation, location, size, and aspect ratio, extracted from the video.
  • The step of receiving the user input via input devices makes Sorensen 1 (U.S. Pat. Appl. Pub. No. 2006/0010028 of Sorensen) inefficient for handling a large amount of video data in a large shopping environment with a relatively complicated store layout, especially over a long period of time.
  • The manual input by a human operator/user cannot efficiently track all of the shoppers in such cases, partially due to the possibility of human errors caused by tiredness and boredom.
  • The manual input approach is also much less scalable as the number of shopping environments to handle for the behavior analysis increases. Therefore, an automated event detection approach is needed.
  • The present invention utilizes an automated event detection approach for detecting predefined events from the customers' shopping interaction in a physical space.
  • Although U.S. Pat. Appl. Pub. No. 2002/0178085 of Sorensen, now U.S. Pat. No. 7,006,982, (hereinafter Sorensen 2) disclosed a usage of a tracking device and store sensors in a plurality of tracking systems primarily based on the wireless technology, such as the RFID, Sorensen 2 is clearly foreign to the concept of applying computer vision based tracking algorithms to the field of understanding customers' shopping behaviors and movements. In Sorensen 2, each transmitter was typically attached to a hand-held or push-type cart. Therefore, Sorensen 2 cannot distinguish the behaviors of multiple shoppers using one cart from the behavior of a single shopper also using one cart.
  • Although Sorensen 2 disclosed that the transmitter may be attached directly to a shopper, via a clip or other form of customer surrogate, in order to correctly track the shopper when the person is shopping without a cart, this would not be practical due to the additional cumbersome step imposed on the shopper, not to mention the inefficiency of managing the transmitter for each individual shopper.
  • The present invention can embrace any type of automatic wireless sensor for the detection of the predefined events. However, in a preferred embodiment, the present invention primarily utilizes the computer vision based automated approach for the detection of the predefined events. The computer vision based event detection helps the present invention overcome the obstacles mentioned above.
  • With regard to the temporal behavior of customers, U.S. Pat. Appl. Pub. No. 2003/0002712 of Steenburgh, et al. (hereinafter Steenburgh) disclosed a relevant exemplary prior art.
  • Steenburgh disclosed a method for measuring dwell time of an object, particularly a customer in a retail store, which enters and exits an environment, by tracking the object and matching the entry signature of the object to the exit signature of the object, in order to find out how long people spend in retail stores.
  • the modeling and analysis of activity of interest can be used as the exemplary way to detect predefined events.
  • U.S. Pat. Appl. Pub. No. 2002/0085092 of Choi, et al. (hereinafter Choi) disclosed a method for modeling an activity of a human body using optical flow vectors from a video and the probability distribution of the feature vectors derived from the optical flow. Choi modeled a plurality of states using the probability distribution of the feature vectors and expressed the activity based on the state transitions.
  • U.S. Pat. Appl. Pub. No. 2003/0053659 of Pavlidis, et al. (hereinafter Pavlidis) disclosed a method for moving object assessment, including an object path of one or more moving objects in a search area, using a plurality of imaging devices and segmentation by background subtraction.
  • In Pavlidis, the term “object” included customers, and Pavlidis also included itinerary statistics of customers in a department store. However, Pavlidis was primarily related to monitoring a search area for surveillance.
  • U.S. Pat. Appl. Pub. No. 2004/0113933 of Guler disclosed a method for automatic detection of split and merge events from video streams in a surveillance environment.
  • Guler considered split and merge behaviors as key common simple behavior components in order to analyze high level activities of interest in a surveillance application, which are also used to understand the relationships among multiple objects not just individual behavior.
  • Guler used adaptive background subtraction to detect the objects in a video scene, and the objects were tracked to identify the split and merge behaviors.
  • To understand the split and merge behavior-based high level events, Guler used a Hidden Markov Model (HMM).
  • U.S. Pat. Appl. Pub. No. 2004/0120581 of Ozer, et al. (hereinafter Ozer) disclosed a method for identifying activity of customers for a marketing purpose or activity of objects in a surveillance area by comparing the detected objects with the graphs from a database.
  • Ozer tracked the movement of different object parts and combined them to high-level activity semantics, using several Hidden Markov Models (HMMs) and a distance classifier.
  • U.S. Pat. No. 6,741,973 of Dove, et al. (hereinafter Dove) disclosed a model of generating customer behavior in a transaction environment.
  • Although Dove disclosed video cameras in a real bank branch as a way to observe the human behavior, Dove is clearly foreign to the concept of automatic event detection based on the customers' behaviors from visual information of the customers in other types of physical space, such as the shopping path tracking and analysis in a retail environment, for the sake of annotating the customers' behaviors.
  • Computer vision algorithms have been shown to be an effective means for detecting and tracking people. These algorithms also have been shown to be effective in analyzing the behavior of people in the view of the means for capturing images. This allows the possibility of connecting the visual information from a scene to the behavior analysis of customers and predefined event detection.
  • the present invention provides a novel approach for annotating the customers' behaviors utilizing the information from the automatic behavior analysis of customers and predefined event detection. Any reliable automatic behavior analysis in the prior art may be used for the predefined event detection in the present invention.
  • Computer vision algorithms have been shown to be an effective means for analyzing the demographic information of people in the view of the means for capturing images.
  • There have been prior attempts for recognizing the demographic category of a person by processing the facial image using various approaches in the computer vision technologies, such as a machine learning approach.
  • U.S. Pat. No. 6,990,217 of Moghaddam, et al. (hereinafter Moghaddam) disclosed a method to employ Support Vector Machine to classify images of faces according to gender by training the images, including images of male and female faces; determining a plurality of support vectors from the training images for identifying a hyperplane for the gender decision; and reducing the resolution of the training images and the test image by sub-sampling before supplying the images to the Support Vector Machine.
  • U.S. Pat. Appl. Pub. No. 2003/0110038 of Sharma, et al. disclosed a computer software system for multi-modal human gender classification, comprising: a first-mode classifier classifying first-mode data pertaining to male and female subjects according to gender, and rendering a first-mode gender-decision for each male and female subject; a second-mode classifier classifying second-mode data pertaining to male and female subjects according to gender, and rendering a second-mode gender-decision for each male and female subject; and a fusion classifier integrating the individual gender decisions obtained from said first-mode classifier and said second-mode classifier, and outputting a joint gender decision for each of said male and female subjects.
  • the face tracking algorithm has been designed and tuned to improve the classification accuracy; the facial geometry correction step improves both the tracking and the individual face classification accuracy, and the tracking further improves the accuracy of the classification of gender and ethnicity over the course of visibly tracked faces by combining the individual face classification scores.
  • the present invention detects the predefined events based on the demographic information of people in another exemplary embodiment.
  • the invention automatically and unobtrusively analyzes the customers' demographic information without involving any hassle to customers or operators of feeding the information manually, utilizing the novel demographic analysis approaches in the prior arts.
  • the present invention utilizes the event detection by the automatic behavior analysis and demographic analysis in a first video stream to synchronize the same event in another second video stream and allows an annotator to annotate the synchronized event through an annotation tool.
  • the manual annotation data in the present invention can be used for various market analysis applications, such as measuring deeper insights for customers' shopping behavior analysis in a retail store, media effectiveness measurement, and traffic analysis.
  • the present invention is a method and system for optimizing the observation and annotation of predefined events by enabling the automatic detection of predefined events based on the behavior of people in a first video stream from a first means for capturing images in a physical space and the annotation for each of the events by an annotator utilizing an annotation tool.
  • the present invention captures a plurality of input images of the persons by a plurality of first means for capturing images and processes the plurality of input images in order to detect the predefined events based on the behavior analysis of the people in an exemplary embodiment.
  • The dwell time of the people in a specific location of the physical space can be used as one of the exemplary criteria for defining the targeted behavior. Examples of the temporal targeted behavior can comprise passerby behavior and engaged shopper behavior, based on the dwell time measurement and comparison against predefined thresholds.
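
As an illustration only (not part of the patent disclosure), the following minimal sketch shows how a measured dwell time could be mapped to such temporal behavior labels; the threshold value and label names are assumptions.

```python
def classify_by_dwell_time(dwell_seconds, engaged_threshold=10.0):
    """Label a tracked person's behavior from dwell time in a location.

    The 10-second threshold and the label names are hypothetical examples;
    a deployment would tune them per location and business question.
    """
    return "engaged shopper" if dwell_seconds >= engaged_threshold else "passerby"
```
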
  • the processes are based on a novel usage of a plurality of computer vision technologies to analyze the human behavior from the plurality of input images.
  • the method leverages the strengths of the technologies in the present invention and processes to deliver a new level of access to the behaviors and visual characteristics of people in the physical space.
  • the automatic event detection in the present invention can also be triggered by the other visual characteristics and segmentation of people in the physical space, such as the demographics, in another exemplary embodiment. Therefore, it is another objective of the present invention to process the first video stream in order to detect demographics of the people in the field of view of the first means for capturing images automatically and generate time-stamped lists of events based on the automatically detected demographics of the people for the predefined event detection.
  • An exemplary embodiment of the present invention can be applied to a retail space application, and it can provide demographic segmentation of the shoppers by gender and age group in this particular application domain.
  • the shopping behavior of each demographic group can be analyzed to obtain segment-specific insights. Understanding segment-based shopper behavior for a specific business goal in the retail space can help to develop effective customer-centric strategies to increase the basket size and loyalty of the highest-opportunity segments.
  • the present invention utilizes a plurality of first means for capturing images and a plurality of second means for capturing images in a preferred embodiment.
  • the first means for capturing images can be an overhead top-down camera
  • the second means for capturing images can be a camera that is positioned to observe the people more closely for analyzing a specific event.
  • the present invention can also utilize different types of sensors for the automatic event detection.
  • the present invention can utilize wireless sensor based tracking for the automatic event detection or a door sensor to trigger an event.
  • the wireless sensor can include, but is not limited to, an RFID and means for using the RFID.
  • the present invention generates time-stamped lists of events based on the automatically detected predefined events. Then, it can access a synchronized second video stream from a second means for capturing images that is positioned to observe the people more closely, using the timestamps associated with the detected events from the first video stream. Using the timestamps and the time-stamped lists of events, the present invention can access the corresponding sub-streams for the events in the synchronized second video stream.
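
A minimal sketch of this timestamp-based access, assuming the second video stream is a recorded file whose start time and frame rate are known; the function and variable names are illustrative, not from the patent.

```python
import cv2

def read_event_substream(video_path, stream_start_ts, fps,
                         event_start_ts, event_end_ts):
    """Return the frames of the synchronized second stream covering one event.

    All timestamps are seconds on the shared, synchronized clock.
    """
    first = max(int((event_start_ts - stream_start_ts) * fps), 0)
    last = max(int((event_end_ts - stream_start_ts) * fps), first)
    cap = cv2.VideoCapture(video_path)
    cap.set(cv2.CAP_PROP_POS_FRAMES, first)  # seek to the event's first frame
    frames = []
    for _ in range(last - first):
        ok, frame = cap.read()
        if not ok:
            break
        frames.append(frame)
    cap.release()
    return frames
```
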
  • a time-server can be used in order to maintain a synchronized time in the network of means for control and processing in the present invention.
  • the present invention can enable an annotator to manually annotate each of the synchronized events in the corresponding sub-streams for the events in the synchronized second video stream, with a plurality of labels, using a tool.
  • the annotation tool can comprise a user interface for the annotation.
  • Examples of the user interface can comprise a digital annotation tool or an analog annotation tool.
  • the user interface allows users to mark time-based annotations describing more complex behavioral issues, which may not be detected by using a fully automated method and require human identification. Examples of the more complex behavioral issues can comprise expressions of the people.
  • the tool can further comprise a graphical user interface for the annotation to further make the analysis more efficient.
  • the graphical user interface can be used to browse the video streams based on the timestamps of the events, such as the beginning and end time.
  • the physical space may be a retail space, and the people may be customers or shoppers in the retail space in the description of the invention.
  • the solution in the present invention can help the owner of the particular embodiment to have in-depth understanding of shopper behavior.
  • the annotation can be utilized for more quantitative and deeper behavior analysis about the interaction of people with commercial products in the retail space.
  • the present invention can also generate statistical reports by aggregating the annotated events.
  • Although the disclosed method may be described in the context of a retail space, the present invention can be applied to any physical space, and the application area of the present invention is not limited to the retail space.
  • the present invention can utilize a rule-based logic module for the synchronization between the first video stream and the second video stream. This enables dynamic rule application, where the synchronization can be adjusted based on the rules defined in the module, rather than the synchronization relying on an ad-hoc solution or static hard-code.
  • FIG. 1 is an overview of a preferred embodiment of the invention, where the present invention detects predefined events in a first video stream from a top-down first means for capturing images and generates time-stamped lists of events, which are used to access the corresponding sub-streams for the events in a synchronized second video stream from a second means for capturing images for the annotation of the events.
  • FIG. 2 is an overview of another exemplary embodiment of the invention, where the present invention uses a different type of sensor for detecting the predefined events.
  • FIG. 3 shows an exemplary scene of the annotation process by an annotator for the synchronized view of the events, using an exemplary annotation tool.
  • FIG. 4 shows an exemplary annotation tool in the present invention.
  • FIG. 5 shows an exemplary synchronization architecture in an exemplary network of a plurality of means for control and processing in the present invention, where the network consists of a plurality of first means for control and processing and a plurality of second means for control and processing, which communicate with each other to synchronize the time-stamped lists of events among a plurality of video streams for the detected events.
  • FIG. 6 shows overall processes of an exemplary embodiment of the present invention, comprising the automatic event detection in a first video stream, the synchronization of the event in a corresponding second video stream, and the annotation of the detected event in the synchronized second video stream.
  • FIG. 7 shows detailed exemplary processes of predefined event detection, based on the behavior analysis of the people, in an exemplary automatic event detection module in the present invention.
  • FIG. 8 shows detailed exemplary processes of automatic detection of predefined events in another exemplary embodiment of the present invention, where the predefined event detection also uses the segmentation information of the people, such as demographics, in an exemplary automatic event detection module.
  • FIG. 1 is an overview of a preferred embodiment of the invention, where the present invention detects predefined events in a first video stream from a top-down first means for capturing images 101 and generates time-stamped lists of events, which are used to access the corresponding sub-streams for the events in a synchronized second video stream from a second means for capturing images 102 for the annotation of the events.
  • the processes in the present invention are based on a novel usage of a plurality of computer vision technologies to analyze the human behavior from the plurality of input images.
  • the method leverages the strengths of the technologies in the present invention and processes to deliver a new level of access to the behaviors and visual characteristics of people in the physical space.
  • the present invention captures a plurality of input images of the people in a physical space 130 by a plurality of first means for capturing images 101 and processes the plurality of input images in order to detect the predefined events based on the behavior analysis of the people in the physical space.
  • the behavior analysis and the following automatic event detection can be based on the spatial and temporal attributes of the person tracking in the field of view of a first means for capturing images 101 .
  • an exemplary “event detection 1” 251 can comprise the automatically measured spatial and temporal attributes about the detected event, such as the time “Ti” when the event occurred and the location “(Xi, Yi)” of the event, the assigned event identification “EID1”, and the event type “ET1” of the specific event.
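
For clarity, here is a hypothetical sketch of such an event record and the time-stamped list it belongs to; the fields mirror the attributes named above, but the data structure itself is an assumption.

```python
from dataclasses import dataclass

@dataclass
class DetectedEvent:
    timestamp: float    # Ti: time the event occurred (synchronized clock)
    location: tuple     # (Xi, Yi) in the first camera's field of view
    event_id: str       # e.g. "EID1"
    event_type: str     # e.g. "ET1", the predefined event type

# The time-stamped list of events is kept ordered by timestamp.
events = [DetectedEvent(1201712345.2, (310, 145), "EID1", "ET1")]
events.sort(key=lambda e: e.timestamp)
```
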
  • the dwell time of the people in a specific location of the physical space can be used as one of the criteria for defining the targeted behavior.
  • Examples of the temporal targeted behavior can comprise passerby behavior and engaged shopper behavior, based on the dwell time measurement and comparison against predefined thresholds.
  • the present invention can utilize a plurality of first means for capturing images 101 and a plurality of second means for capturing images 102 in a preferred embodiment.
  • the first means for capturing images 101 can be an overhead top-down camera
  • the second means for capturing images 102 can be a camera that is positioned to observe the people more closely for analyzing a specific event.
  • the present invention generates time-stamped lists of events based on the automatically detected predefined events. Then, it can access a synchronized second video stream from a second means for capturing images 102 that is positioned to observe the people more closely, using the timestamps associated with the detected events from the first video stream. Using the timestamps and the time-stamped lists of events, the present invention can access the corresponding sub-streams for the events in the synchronized second video stream.
  • the physical space may be a retail space, and the people may be customers or shoppers in the retail space in the description of the invention.
  • the solution in the present invention can help the owner of the particular embodiment to have an in-depth understanding of shopper behavior.
  • the annotation can be utilized for more quantitative and deeper behavior analysis about the interaction of people with commercial products in the retail space.
  • the present invention can also generate statistical reports by aggregating the annotated events.
  • Although the disclosed method may be described in the context of a retail space, the present invention can be applied to any physical space, and the application area of the present invention is not limited to the retail space.
  • FIG. 2 is an overview of another exemplary embodiment of the invention, where the present invention uses a different type of sensor for detecting the predefined events.
  • the automatic behavior analysis of people is the preferred method for detecting the predefined event in the present invention.
  • the automatic event detection can also be triggered by the other visual characteristics and segmentation of people in the physical space, such as the demographics, in another exemplary embodiment.
  • the present invention can process the first video stream in order to detect the demographics of the people in the field of view of the first means for capturing images automatically and generate time-stamped lists of events based on the automatically detected demographics of the people for the predefined event detection.
  • an exemplary “event detection 2” 252 can comprise the automatically measured spatial and temporal attributes about the detected event, such as the time “Tj” when the event occurred and the location “(Xj, Yj)” of the event, the assigned event identification “EID2”, and the event type “ET2” of the specific event.
  • another exemplary “event detection 3” 253 can comprise the automatically measured spatial and temporal attributes about the detected event, such as the time “Tk” when the event occurred and the location “(Xk, Yk)” of the event, the assigned event identification “EID3”, and the event type “ET3” of the specific event.
  • the event types can be defined in association with the automatic demographic measurement, respectively.
  • the present invention can provide demographic segmentation of the shoppers by gender and age group in this particular application domain.
  • the shopping behavior of each demographic group can be analyzed to obtain segment-specific insights. Understanding segment-based shopper behavior for a specific business goal in the retail space can help to develop effective customer-centric strategies to increase the basket size and loyalty of the highest-opportunity segments.
  • the present invention can utilize a plurality of first means for capturing images 101 and a plurality of second means for capturing images 102 in a preferred embodiment.
  • the first means for capturing images 101 can be an overhead top-down camera
  • the second means for capturing images 102 can be a camera that is positioned to observe the people more closely for analyzing a specific event.
  • the present invention can also utilize different types of sensors for a different type of automatic event detection, such as a wireless sensor, a door sensor 116 , or other types of sensors in an electronic article surveillance (EAS) system.
  • a wireless sensor can include, but is not limited to, an RFID and means for using the RFID.
  • a sequence of the RFID proximity detection can be used to provide tracking information of the people.
  • the present invention can use a door sensor 116 to trigger a different type of event, such as an anti-theft alarm event.
  • FIG. 3 shows an exemplary scene of the annotation 280 process by an annotator for the synchronized view of the events using an exemplary annotation tool 160 .
  • the present invention can enable an annotator to manually annotate each of the synchronized events in the corresponding sub-streams for the events in the synchronized second video stream 172 , with a plurality of labels, using an annotation tool 160 .
  • the annotation tool 160 can comprise a user interface for the annotation.
  • Examples of the user interface can comprise a digital annotation tool or an analog annotation tool.
  • the user interface allows users to mark time-based annotations describing more complex behavioral issues, which may not be detected by using a fully automated method and require human identification. Examples of the more complex behavioral issues can comprise expressions of the people.
  • the present invention detects an exemplary event, “event detection 1 ” 251 , in a first video stream 171 , and then the annotator can use the annotation tool 160 to find the corresponding synchronized event in a second video stream 172 , utilizing the attributes in the exemplary “event detection 1 ” 251 .
  • the annotator can also use the annotation tool 160 to watch and annotate the synchronized event in a second video stream 172 by accessing the synchronized view of the event 265 in the annotation tool 160 .
  • the present invention can also display the top-down event detection view from the first video stream 171 on a means for playing output 103 .
  • FIG. 4 shows an exemplary annotation tool 160 in the present invention.
  • the annotation tool 160 can further comprise a graphical user interface 162 for the annotation to further make the analysis more efficient as shown in FIG. 4 .
  • the graphical user interface 162 can be used to browse the video streams based on the timestamps of the events, such as the beginning and end time.
  • the exemplary graphical user interface 162 can comprise event selection 176 , video stream selection 177 , event timeline selection 178 , and other facilitating interface capabilities.
  • the annotator can browse through time-stamped lists of events, automatically generated by the present invention, and select a synchronized second video stream among a plurality of available second video streams, using the video stream selection 177 . After a second video stream, relevant to the target event for annotation, is selected, the annotator can quickly and efficiently access the corresponding sub-streams for the event in the synchronized second video stream, using the timestamps for the detected events.
  • FIG. 5 shows an exemplary synchronization architecture in an exemplary network of a plurality of means for control and processing in the present invention, where the network consists of a plurality of first means for control and processing 107 and a plurality of second means for control and processing 108 , which communicate with each other to synchronize the time-stamped lists of events among a plurality of video streams for the detected events.
  • the present invention generates time-stamped lists of events based on the automatically detected predefined events. Then, it can access a synchronized second video stream from a second means for capturing images that is positioned to observe the people more closely, using the timestamps associated with the detected events from the first video stream. Using the timestamps and the time-stamped lists of events, the present invention can access the corresponding sub-streams for the events in the synchronized second video stream.
  • the utilization of the automatic event detection and the synchronization efficiently help the annotation process by reducing the amount of video data and the time needed to handle it, and by allowing the annotator to focus on the events of interest according to the predefined rules for the automatically detected events.
  • a time-server 109 can be used in order to maintain a synchronized time in the network of means for control and processing in the present invention.
  • the exemplary network of a plurality of means for control and processing can consist of a plurality of first means for control and processing 107 and a plurality of second means for control and processing 108 .
  • a first means for control and processing 107 can act as a server and a plurality of second means for control and processing 108 can act as clients.
  • the server can run its own local clock or be connected to a global time-server 109 for the synchronization utilizing a time synchronization protocol, such as the Network Time Protocol (NTP).
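
A simplified, NTP-style sketch of how a client's offset from the time-server could be estimated; a real deployment would use an actual NTP client, and the callable name here is an assumption.

```python
import time

def estimate_clock_offset(request_server_time):
    """Estimate this machine's offset from the time-server.

    request_server_time is any callable returning the server's current time
    in seconds (a hypothetical stand-in for an NTP exchange).
    """
    t0 = time.time()
    server_time = request_server_time()
    t1 = time.time()
    # Assume the server replied halfway through the round trip.
    return server_time - (t0 + (t1 - t0) / 2.0)

# Event timestamps from each means for control and processing can then be
# corrected by its estimated offset before events are matched across streams.
```
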
  • the number of means for capturing images per means for control and processing varies, depending on the system configuration in the physical space.
  • each means for control and processing knows the location and the identification of each of its associated plurality of means for capturing images and the area covered by the means for capturing images. Therefore, when an event is detected by a top-down first means for capturing images 101 at a location, its associated first means for control and processing 107 can correctly find the corresponding second means for capturing images 102 close to the specific location, through communicating with the second means for control and processing, associated with the corresponding second means for capturing images 102 .
  • the present invention when an event is detected by the “first means for capturing images at location L1” 110 , the present invention can correctly find the corresponding event and sub-streams from the “second means for capturing images at location L1” 112 . Likewise, the present invention can correlate the events between the “first means for capturing images at location Ln” 111 and the “second means for capturing images at location Ln” 113 for the location Ln, using their location and identification information.
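
A minimal sketch of the location-to-camera lookup implied here; the registry contents and identifiers are assumptions for illustration.

```python
# Hypothetical registry: which close-up (second) camera and which second means
# for control and processing cover each monitored location.
SECOND_CAMERAS_BY_LOCATION = {
    "L1": {"camera_id": "second-cam-L1", "processor": "second-mcp-1"},
    "Ln": {"camera_id": "second-cam-Ln", "processor": "second-mcp-n"},
}

def find_second_camera(event_location):
    """Return the second camera (and its controller) covering an event location."""
    return SECOND_CAMERAS_BY_LOCATION.get(event_location)
```
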
  • the present invention can utilize a rule-based logic module for the synchronization among a plurality of the first video streams and a plurality of the second video streams.
  • the annotator can select and utilize any of the plurality of the second video streams from their associated second means for capturing images.
  • the rule-based logic module can also further help the annotator by providing more information about the detected event and synchronization, based on the predefined rules in the module.
  • the logic module can provide priority information among the plurality of second video streams according to the predefined rules for the order, relevance, and specific needs at the specific location in the physical space.
  • the rule-based logic module can also enable a dynamic rule application, where the synchronization can be adjusted dynamically based on the rules defined in the module, rather than the synchronization relying on an ad-hoc solution or static hard-code.
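
As a sketch only, one way such a rule-based logic module could rank candidate second video streams for the annotator; the rule table and event type names are assumptions.

```python
# Hypothetical priority rules: preferred second streams per event type.
PRIORITY_RULES = {
    "shelf_interaction": ["second-cam-shelf", "second-cam-aisle"],
    "checkout": ["second-cam-register"],
}

def prioritize_streams(event_type, available_streams):
    """Order available second video streams by the predefined rules."""
    preferred = PRIORITY_RULES.get(event_type, [])
    ranked = [s for s in preferred if s in available_streams]
    ranked += [s for s in available_streams if s not in ranked]
    return ranked
```
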
  • FIG. 6 shows overall processes of an exemplary embodiment of the present invention, comprising the automatic event detection 255 in a first video stream 171 , the synchronization 260 of the event in a corresponding second video stream, and the annotation 280 of the detected event in the synchronized second video stream.
  • the present invention processes a generation of lists of events 256 , based on the “automatic event detection” 255 in a first video stream 171 , from a first means for capturing images 101 .
  • an annotator can use the information in the generated events, such as the timestamp, the location of the corresponding second means for capturing images and the corresponding second means for control and processing, and their identifications, to find 272 and access 273 a synchronized second video stream, among a plurality of available second video streams, i.e. “second video stream 1” 173 , “second video stream 2” 174 , and “second video stream N” 175 , from the corresponding second means for capturing images that are positioned to observe the people more closely, utilizing an annotation tool.
  • the annotator further uses the detailed information for the target event, such as the start and end timestamps of the event, to access the relevant sub-streams in the synchronized second video stream for the final annotation 280 of the specific event, based on the domain specific parameters 282 .
  • FIG. 7 shows detailed exemplary processes of predefined event detection based on the behavior analysis of the people in an exemplary “automatic event detection” 255 module in the present invention.
  • the present invention detects 710 and tracks 714 a person in a physical space for the path analysis 470, and the information in the path analysis 470, such as the sequence of coordinates and temporal attributes, is used for the behavior analysis 480 of the person.
  • the present invention can utilize any reliable video-based tracking method for people in the prior art with regard to the behavior analysis.
  • U.S. Pat. No. 7,974,869 of Sharma, et al. (hereinafter Sharma869) disclosed an exemplary process of video-based tracking and behavior analysis for a single customer or a group of customers using multiple means for capturing images, based on the spatial and temporal attributes of the person tracking.
  • FIG. 20 and FIG. 21 in Sharma869 show exemplary spatio-temporal primitives for modeling human-object behavior and exemplary shopping interaction levels that are observed to produce the behavioral analysis in a physical space.
  • the behavior recognition can be achieved via spatio-temporal analysis of tracks, using geometry and pattern recognition techniques.
  • The approach for defining and detecting spatio-temporal relations specific to the retail enterprise domain, followed by a Bayesian Belief propagation approach to modeling primitive behaviors specific to the retail domain (an exemplary site of a media network in Sharma869), can be applied to any physical space.
  • the exemplary primitive behaviors comprised categories of “customer moves towards object”, “customer doesn't walk towards object”, “customer velocity reduces”, “customer velocity increases”, “customer stands in front of object”, and “customer walks away from object”, and these primitive behaviors were combined to model predefined complex behaviors. Then the behaviors of the people were analyzed based on the model. Walkthrough history, the time spent in a certain area within a physical space, frequency pattern, relational pattern, and special event pattern can also be used as the exemplary attributes for the behavior analysis.
  • the exemplary shopping interaction levels in Sharma869 can be regarded as an exemplary higher level of complex behaviors in a target physical space, especially in a retail space, which are observed to produce the behavioral analysis in the context of the present invention.
  • Sharma869 defined the exemplary shopping interaction levels based on the spatio-temporal relations, which are “passing by”, “noticing”, “stopping”, from “engaging 1” to “engaging P-1”, and “purchase”. They are labeled as “level 1” interaction, “level 2” interaction, “level 3” interaction, from “level 4” interaction to “level P-1” interaction, and “level p” interaction, respectively, where multiple engaging levels are also considered.
  • the shopping interaction level can be measured based on the temporal attribute of the person tracking for the customer in regard to the combination of the primitive behaviors. For example, if there is no change in velocity, the present invention can measure the customer's interaction level as a passer-by level at a particular category. If the stopping time is greater than a threshold, such as T1 seconds, then the present invention can measure the customer's interaction level as a level 4 interaction. Likewise, the temporal attribute of the person tracking can match the time value to the corresponding interaction levels, based on the predefined thresholds and rules.
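
A hypothetical sketch of this threshold-based mapping; the numeric threshold and the particular levels returned are illustrative, not Sharma869's actual rules.

```python
def interaction_level(velocity_change, stopping_time_seconds, t1=5.0):
    """Map tracking attributes to a shopping interaction level (illustrative)."""
    if abs(velocity_change) < 1e-3:
        return "level 1: passing by"
    if stopping_time_seconds > t1:
        return "level 4: engaging"
    return "level 3: stopping"
```
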
  • the present invention can detect 250 the predefined events and generate a list of the detected events 256 .
  • FIG. 8 shows detailed exemplary processes of automatic detection of predefined events in another exemplary embodiment of the present invention, where the predefined event detection also uses the segmentation information of the people, such as demographics, in an exemplary automatic event detection module.
  • the present invention can process the event detection 250 based on the behavior analysis of the people in a physical space and generate a list of detected events 256 as described in regards to FIG. 7 .
  • the computer vision based automatic segmentation 241 of the people on a video can also be used as one of the criteria to define certain types of events.
  • Automatic demographic classification 814 can be used as an exemplary segmentation of the people.
  • the present invention can process segmentation 241 of the customer, such as the demographic classification 814 , based on the images of the people in a first video stream 171 and use the segmentation 241 information to detect the predefined events based on the segmentation criteria.
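
A minimal sketch of detecting a predefined event from segmentation criteria, assuming the demographic labels have already been produced by the classifier; the segment-to-event table and field names are hypothetical.

```python
# Hypothetical mapping from demographic segment to a predefined event type.
SEGMENT_EVENTS = {
    ("female", "senior"): "ET2",
    ("male", "young adult"): "ET3",
}

def detect_segmentation_event(track, gender, age_group):
    """Emit an event record when a tracked person matches a segment of interest."""
    event_type = SEGMENT_EVENTS.get((gender, age_group))
    if event_type is None:
        return None
    return {"timestamp": track["last_seen"],
            "location": track["position"],
            "event_type": event_type}
```
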
  • the present invention can utilize any reliable demographic composition measurement method in the prior art as an exemplary video-based segmentation of the customers.
  • U.S. Provisional Patent Application No. 60/808,283 of Sharma, et al. disclosed an exemplary demographic composition measurement based on gender and ethnicity.
  • Age is also another attribute that Sharma 60/808,283 can measure.
  • Automatic event detection based on the segmentation of the people in a physical space can provide unique benefits to the annotator and the owner of a particular embodiment of the present invention.
  • the detailed annotation labels can be efficiently organized based on the predefined segmentation criteria in the events.
  • Detailed annotation labels per demographic group can be very useful market analysis data in an exemplary embodiment of the present invention.

Abstract

The present invention is a method and system for optimizing the observation and annotation of complex human behavior from video sources by automatically detecting predefined events based on the behavior of people in a first video stream from a first means for capturing images in a physical space, accessing a synchronized second video stream from a second means for capturing images that is positioned to observe the people more closely using the timestamps associated with the detected events from the first video stream, and enabling an annotator to annotate each of the events with more labels using a tool. The present invention captures a plurality of input images of the persons by a plurality of means for capturing images and processes the plurality of input images in order to detect the predefined events based on the behavior in an exemplary embodiment. The processes are based on a novel usage of a plurality of computer vision technologies to analyze the human behavior from the plurality of input images. The physical space may be a retail space, and the people may be customers in the retail space.

Description

CROSS-REFERENCE TO RELATED APPLICATIONS
This application claims the benefit of U.S. Provisional Patent Application No. 60/898,311, filed Jan. 30, 2007.
FEDERALLY SPONSORED RESEARCH
Not Applicable
SEQUENCE LISTING OR PROGRAM
Not Applicable
BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention is a method and system for automatically detecting predefined events based on the behavior of people in a first video stream from a first means for capturing images in a physical space, accessing a synchronized second video stream from a second means for capturing images that is positioned to observe the people more closely using the timestamps associated with the detected events from the first video stream, and enabling an annotator to annotate each of the events with more labels using an annotation tool.
2. Background of the Invention
Event Detection based on Shoppers' Behavior Analysis
There have been earlier attempts for event detection based on customers' behaviors in a video.
U.S. Pat. Appl. Pub. No. 2003/0058339 of Trajkovic, et al. (hereinafter Trajkovic) disclosed a method for detecting an event through repetitive patterns of human behavior. Trajkovic learned multi-dimensional feature data from the repetitive patterns of human behavior and computed a probability density function (PDF) from the data. Then, a method for the PDF analysis, such as Gaussian or clustering techniques, was used to identify the repetitive patterns of behavior and unusual behavior through the variance of the Gaussian distribution or cluster.
Although Trajkovic can model a repetitive behavior through the PDF analysis, Trajkovic is clearly foreign to the event detection for the aggregate of non-repetitive behaviors, such as the shopper traffic in a physical space. Trajkovic did not disclose the challenges in the event detection based on customers' behaviors in a video in a retail environment, such as the non-repetitive behaviors. Therefore, Trajkovic is clearly foreign to the challenges that can be found in a retail environment.
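
For readers unfamiliar with this class of technique, the following is a minimal, hypothetical sketch of flagging unusual behavior by fitting a Gaussian to previously learned feature vectors and thresholding the variance-normalized distance; it is not Trajkovic's implementation, and the feature extraction itself is assumed.

```python
import numpy as np

def fit_behavior_model(feature_vectors):
    """Fit a diagonal Gaussian to multi-dimensional behavior features."""
    data = np.asarray(feature_vectors, dtype=float)
    return data.mean(axis=0), data.var(axis=0) + 1e-6

def is_unusual(feature_vector, mean, var, threshold=3.0):
    """Flag behavior far from the learned distribution (in standard deviations)."""
    z = np.abs((np.asarray(feature_vector, dtype=float) - mean) / np.sqrt(var))
    return bool(np.any(z > threshold))

history = [[1.0, 0.5], [1.1, 0.4], [0.9, 0.6], [1.05, 0.55]]  # learned features
mean, var = fit_behavior_model(history)
print(is_unusual([3.0, 2.0], mean, var))  # True: candidate unusual behavior
```
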
U.S. Pat. Appl. Pub. No. 2006/0053342 of Bazakos, et al. (hereinafter Bazakos) disclosed a method for unsupervised learning of events in a video. Bazakos disclosed a method of creating a feature vector of a related object in a video by grouping clusters of points together within a feature space and storing the feature vector in an event library. Then, the behavioral analysis engine in Bazakos determined whether an event had occurred by comparing features contained within a feature vector in a specific instance against the feature vectors in the event library. Bazakos is primarily related to surveillance rather than to event detection based on customers' behaviors in a video.
U.S. Pat. Appl. Pub. No. 2005/0286774 of Porikli disclosed a method for event detection in a video using approximate estimates of the aggregated affinity matrix and clustering and scoring of the matrix. Porikli constructed the affinity matrix based on a set of frame-based and object-based statistical features, such as trajectories, histograms, and Hidden Markov Models of feature speed, orientation, location, size, and aspect ratio, extracted from the video.
Shoppers' Behavior Analysis
There have been earlier attempts for understanding customers' shopping behaviors captured in a video in a targeted environment, such as in a retail store, using cameras.
U.S. Pat. Appl. Pub. No. 2006/0010028 of Sorensen (hereinafter Sorensen 1) disclosed a method for tracking shopper movements and behavior in a shopping environment using a video. In Sorensen 1, a user indicated a series of screen locations in a display at which the shopper appeared in the video, and the series of screen locations were translated to store map coordinates.
The step of receiving the user input via input devices, such as a pointing device or keyboard, makes Sorensen 1 inefficient for handling a large amount of video data in a large shopping environment with a relatively complicated store layout, especially over a long period of time. The manual input by a human operator/user cannot efficiently track all of the shoppers in such cases, partially due to the possibility of human errors caused by tiredness and boredom. The manual input approach is also much less scalable as the number of shopping environments to handle for the behavior analysis increases. Therefore, an automated event detection approach is needed. The present invention utilizes an automated event detection approach for detecting predefined events from the customers' shopping interaction in a physical space.
Although U.S. Pat. Appl. Pub. No. 2002/0178085 of Sorensen, now U.S. Pat. No. 7,006,982, (hereinafter Sorensen 2) disclosed a usage of a tracking device and store sensors in a plurality of tracking systems primarily based on the wireless technology, such as the RFID, Sorensen 2 is clearly foreign to the concept of applying computer vision based tracking algorithms to the field of understanding customers' shopping behaviors and movements. In Sorensen 2, each transmitter was typically attached to a hand-held or push-type cart. Therefore, Sorensen 2 cannot distinguish the behaviors of multiple shoppers using one cart from the behavior of a single shopper also using one cart. Although Sorensen 2 disclosed that the transmitter may be attached directly to a shopper, via a clip or other form of customer surrogate, in order to correctly track the shopper when the person is shopping without a cart, this would not be practical due to the additional cumbersome step imposed on the shopper, not to mention the inefficiency of managing the transmitter for each individual shopper.
The present invention can embrace any type of automatic wireless sensors for the detection of the predefined events. However, in a preferred embodiment, the present invention primarily utilizes the computer vision based automated approach for the detection of the predefined events. The computer vision based event detection helps the present invention to overcome the obstacles mentioned above.
With regard to the temporal behavior of customers, U.S. Pat. Appl. Pub. No. 2003/0002712 of Steenburgh, et al. (hereinafter Steenburgh) disclosed a relevant exemplary prior art. Steenburgh disclosed a method for measuring dwell time of an object, particularly a customer in a retail store, which enters and exits an environment, by tracking the object and matching the entry signature of the object to the exit signature of the object, in order to find out how long people spend in retail stores.
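
A minimal, hypothetical sketch of the dwell-time idea: match each exit observation to the nearest stored entry signature and take the time difference. The signature representation and the greedy matching are assumptions, not Steenburgh's method.

```python
import numpy as np

def dwell_times(entries, exits):
    """entries/exits: lists of (timestamp_seconds, signature_vector)."""
    times = []
    remaining = list(entries)
    for exit_time, exit_sig in exits:
        if not remaining:
            break
        dists = [np.linalg.norm(np.asarray(exit_sig) - np.asarray(sig))
                 for _, sig in remaining]
        entry_time, _ = remaining.pop(int(np.argmin(dists)))
        times.append(exit_time - entry_time)
    return times
```
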
The modeling and analysis of activity of interest can be used as the exemplary way to detect predefined events.
U.S. Pat. Appl. Pub. No. 2002/0085092 of Choi, et al. (hereinafter Choi) disclosed a method for modeling an activity of a human body using optical flow vectors from a video and the probability distribution of the feature vectors derived from the optical flow. Choi modeled a plurality of states using the probability distribution of the feature vectors and expressed the activity based on the state transitions.
Other Application Areas
There have been earlier attempts for activity analysis in various other areas than understanding customers' shopping behaviors, such as the surveillance and security applications. The following prior arts are not restricted to the application area for understanding customers' shopping behaviors in a physical space, but they disclosed methods for object activity modeling and analysis for the human body, using a video, in general.
Surveillance Application
U.S. Pat. Appl. Pub. No. 2003/0053659 of Pavlidis, et al. (hereinafter Pavlidis) disclosed a method for moving object assessment, including an object path of one or more moving objects in a search area, using a plurality of imaging devices and segmentation by background subtraction. In Pavlidis, the term “object” included customers, and Pavlidis also included itinerary statistics of customers in a department store. However, Pavlidis was primarily related to monitoring a search area for surveillance.
U.S. Pat. Appl. Pub. No. 2004/0113933 of Guler disclosed a method for automatic detection of split and merge events from video streams in a surveillance environment. Guler considered split and merge behaviors as key common simple behavior components in order to analyze high level activities of interest in a surveillance application, which are also used to understand the relationships among multiple objects not just individual behavior. Guler used adaptive background subtraction to detect the objects in a video scene, and the objects were tracked to identify the split and merge behaviors. To understand the split and merge behavior-based high level events, Guler used a Hidden Markov Model (HMM).
U.S. Pat. Appl. Pub. No. 2004/0120581 of Ozer, et al. (hereinafter Ozer) disclosed a method for identifying activity of customers for a marketing purpose or activity of objects in a surveillance area, by comparing the detected objects with the graphs from a database. Ozer tracked the movement of different object parts and combined them to high-level activity semantics, using several Hidden Markov Models (HMMs) and a distance classifier.
Transaction Application
U.S. Pat. No. 6,741,973 of Dove, et al. (hereinafter Dove) disclosed a model of generating customer behavior in a transaction environment. Although Dove disclosed video cameras in a real bank branch as a way to observe human behavior, Dove is clearly foreign to the concept of automatic event detection based on visual information of the customers' behaviors in other types of physical space, such as shopping path tracking and analysis in a retail environment, for the sake of annotating the customers' behaviors.
Computer vision algorithms have been shown to be an effective means for detecting and tracking people. These algorithms also have been shown to be effective in analyzing the behavior of people in the view of the means for capturing images. This allows the possibility of connecting the visual information from a scene to the behavior analysis of customers and predefined event detection.
Therefore, it is an objective of the present invention to provide a novel approach for annotating the customers' behaviors utilizing the information from the automatic behavior analysis of customers and predefined event detection. Any reliable automatic behavior analysis in the prior art may be used for the predefined event detection in the present invention. However, it is another objective of the present invention to provide a novel solution that solves the aforementioned problems found in the prior arts for the automatic event detection, such as the cumbersome attachment of devices to the customers, by automatically and unobtrusively analyzing the customers' behaviors without involving any hassle of requiring the customers to carry any cumbersome device.
Demographics
Computer vision algorithms have been shown to be an effective means for analyzing the demographic information of people in the view of the means for capturing images. Thus, there have been prior attempts for recognizing the demographic category of a person by processing the facial image using various approaches in the computer vision technologies, such as a machine learning approach.
U.S. Pat. No. 6,990,217 of Moghaddam, et al. (hereinafter Moghaddam) disclosed a method that employs a Support Vector Machine to classify images of faces according to gender by training on images of male and female faces; determining a plurality of support vectors from the training images for identifying a hyperplane for the gender decision; and reducing the resolution of the training images and the test image by sub-sampling before supplying the images to the Support Vector Machine.
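For illustration only, the following is a minimal sketch of the general approach described above: a support vector machine trained on sub-sampled (reduced-resolution) face images to render a binary gender decision. The image size, label encoding, and kernel choice are assumptions made for this sketch and are not taken from Moghaddam.

```python
# Hedged sketch: SVM-based gender classification on sub-sampled face images.
# Sizes, labels, and the kernel are illustrative assumptions, not Moghaddam's implementation.
import numpy as np
from sklearn.svm import SVC

def sub_sample(face, size=12):
    """Reduce a square grayscale face crop (assumed at least size x size) to
    a size x size image by block averaging, then flatten it to a feature vector."""
    h, w = face.shape
    ys = np.linspace(0, h, size + 1, dtype=int)
    xs = np.linspace(0, w, size + 1, dtype=int)
    out = np.empty((size, size))
    for i in range(size):
        for j in range(size):
            out[i, j] = face[ys[i]:ys[i + 1], xs[j]:xs[j + 1]].mean()
    return out.ravel()

def train_gender_classifier(face_images, gender_labels):
    """face_images: grayscale arrays; gender_labels: 0 = female, 1 = male (assumed encoding)."""
    features = np.stack([sub_sample(f) for f in face_images])
    classifier = SVC(kernel="rbf")   # support vectors define the decision hyperplane
    classifier.fit(features, gender_labels)
    return classifier
```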
U.S. Pat. Appl. Pub. No. 20030110038 of Sharma, et al. (hereinafter Sharma 20030110038) disclosed a computer software system for multi-modal human gender classification, comprising: a first-mode classifier classifying first-mode data pertaining to male and female subjects according to gender, and rendering a first-mode gender-decision for each male and female subject; a second-mode classifier classifying second-mode data pertaining to male and female subjects according to gender, and rendering a second-mode gender-decision for each male and female subject; and a fusion classifier integrating the individual gender decisions obtained from said first-mode classifier and said second-mode classifier, and outputting a joint gender decision for each of said male and female subjects.
Moghaddam and Sharma 20030110038, for the demographics classification mentioned above, aim to classify a certain class of demographics profile, such as gender only, based on the image signature of faces. U.S. Provisional Pat. Appl. No. 60/808,283 of Sharma, et al. (hereinafter Sharma 60/808,283) is a much more comprehensive solution, where the automated system captures video frames, detects customer faces in the frames, tracks the faces individually, corrects the pose of the faces, and finally classifies the demographics profiles of the customers, both gender and ethnicity. In Sharma 60/808,283, the face tracking algorithm has been designed and tuned to improve the classification accuracy; the facial geometry correction step improves both the tracking and the individual face classification accuracy, and the tracking further improves the accuracy of the classification of gender and ethnicity over the course of visibly tracked faces by combining the individual face classification scores.
Therefore, it is another objective of the present invention to detect the predefined events based on the demographic information of people in another exemplary embodiment. The invention automatically and unobtrusively analyzes the customers' demographic information without involving any hassle to customers or operators of feeding the information manually, utilizing the novel demographic analysis approaches in the prior arts.
The present invention utilizes the event detection by the automatic behavior analysis and demographic analysis in a first video stream to synchronize the same event in another second video stream and allows an annotator to annotate the synchronized event through an annotation tool. The manual annotation data in the present invention can be used for various market analysis applications, such as measuring deeper insights for customers' shopping behavior analysis in a retail store, media effectiveness measurement, and traffic analysis.
SUMMARY
The present invention is a method and system for optimizing the observation and annotation of predefined events by enabling the automatic detection of predefined events based on the behavior of people in a first video stream from a first means for capturing images in a physical space and the annotation for each of the events by an annotator utilizing an annotation tool.
It is an objective of the present invention to efficiently handle complex human behavior from video sources utilizing a plurality of computer vision technologies, such as person detection and tracking, and the annotation tool in a preferred embodiment. The present invention captures a plurality of input images of the persons by a plurality of first means for capturing images and processes the plurality of input images in order to detect the predefined events based on the behavior analysis of the people in an exemplary embodiment. The dwell time of the people in a specific location of the physical space can be used as one of the exemplary criteria for defining the targeted behavior. Examples of the temporal targeted behavior can comprise passerby behavior and engaged shopper behavior, based on the dwell time measurement and comparison against predefined thresholds.
The processes are based on a novel usage of a plurality of computer vision technologies to analyze the human behavior from the plurality of input images. The method leverages the strengths of the technologies in the present invention and processes to deliver a new level of access to the behaviors and visual characteristics of people in the physical space.
Although automatic behavior analysis of people is the primary method for the predefined event detection in the present invention, the automatic event detection in the present invention can also be triggered by the other visual characteristics and segmentation of people in the physical space, such as the demographics, in another exemplary embodiment. Therefore, it is another objective of the present invention to process the first video stream in order to detect demographics of the people in the field of view of the first means for capturing images automatically and generate time-stamped lists of events based on the automatically detected demographics of the people for the predefined event detection.
An exemplary embodiment of the present invention can be applied to a retail space application, and it can provide demographic segmentation of the shoppers by gender and age group in this particular application domain. In this exemplary embodiment, the shopping behavior of each demographic group can be analyzed to obtain segment-specific insights. Understanding segment-based shopper behavior for a specific business goal in the retail space can help to develop effective customer-centric strategies to increase the basket size and loyalty of the highest-opportunity segments.
The present invention utilizes a plurality of first means for capturing images and a plurality of second means for capturing images in a preferred embodiment. The first means for capturing images can be an overhead top-down camera, and the second means for capturing images can be a camera that is positioned to observe the people more closely for analyzing a specific event.
In another embodiment, the present invention can also utilize different types of sensors for the automatic event detection. For example, the present invention can utilize a wireless sensor based tracking for the automatic event detection or a door sensor to trigger an event. Examples of the wireless sensor can include, but are not limited to, a RFID and means for using the RFID.
The present invention generates time-stamped lists of events based on the automatically detected predefined events. Then, it can access a synchronized second video stream from a second means for capturing images that is positioned to observe the people more closely, using the timestamps associated with the detected events from the first video stream. Using the timestamps and the time-stamped lists of events, the present invention can access the corresponding sub-streams for the events in the synchronized second video stream.
It is another objective of the present invention that the utilization of the automatic event detection and the synchronization efficiently helps the annotation process by reducing the amount of video data and time to handle, and by allowing the annotator to focus more on the events of interest according to the predefined rules for the automatically detected events. A time-server can be used in order to maintain a synchronized time in the network of means for control and processing in the present invention.
The present invention can enable an annotator to manually annotate each of the synchronized events in the corresponding sub-streams for the events in the synchronized second video stream, with a plurality of labels, using an annotation tool.
The annotation tool can comprise a user interface for the annotation. Examples of the user interface can comprise a digital annotation tool or an analog annotation tool. The user interface allows users to mark time-based annotations describing more complex behavioral issues, which may not be detected by using a fully automated method and require human identification. Examples of the more complex behavioral issues can comprise expressions of the people.
The tool can further comprise a graphical user interface for the annotation to further make the analysis more efficient. The graphical user interface can be used to browse the video streams based on the timestamps of the events, such as the beginning and end time.
The physical space may be a retail space, and the people may be customers or shoppers in the retail space in the description of the invention. In an exemplary embodiment for a retail space, the solution in the present invention can help the owner of the particular embodiment to have in-depth understanding of shopper behavior. The annotation can be utilized for more quantitative and deeper behavior analysis about the interaction of people with commercial products in the retail space. The present invention can also generate statistical reports by aggregating the annotated events.
However, although the disclosed method may be described in the context of a retail space, the present invention can be applied to any physical space, and the application area of the present invention is not limited to the retail space.
In another exemplary embodiment, the present invention can utilize a rule-based logic module for the synchronization between the first video stream and the second video stream. This enables dynamic rule application, where the synchronization can be adjusted based on the rules defined in the module, rather than relying on an ad-hoc solution or static hard-coding.
DRAWINGS Figures
FIG. 1 is an overview of a preferred embodiment of the invention, where the present invention detects predefined events in a first video stream from a top-down first means for capturing images and generates time-stamped lists of events, which are used to access the corresponding sub-streams for the events in a synchronized second video stream from a second means for capturing images for the annotation of the events.
FIG. 2 is an overview of another exemplary embodiment of the invention, where the present invention uses a different type of sensor for detecting the predefined events.
FIG. 3 shows an exemplary scene of the annotation process by an annotator for the synchronized view of the events, using an exemplary annotation tool.
FIG. 4 shows an exemplary annotation tool in the present invention.
FIG. 5 shows an exemplary synchronization architecture in an exemplary network of a plurality of means for control and processing in the present invention, where the network consists of a plurality of first means for control and processing and a plurality of second means for control and processing, which communicate with each other to synchronize the time-stamped lists of events among a plurality of video streams for the detected events.
FIG. 6 shows overall processes of an exemplary embodiment of the present invention, comprising the automatic event detection in a first video stream, the synchronization of the event in a corresponding second video stream, and the annotation of the detected event in the synchronized second video stream.
FIG. 7 shows detailed exemplary processes of predefined event detection, based on the behavior analysis of the people, in an exemplary automatic event detection module in the present invention.
FIG. 8 shows detailed exemplary processes of automatic detection of predefined events in another exemplary embodiment of the present invention, where the predefined event detection also uses the segmentation information of the people, such as demographics, in an exemplary automatic event detection module.
DETAILED DESCRIPTION OF THE INVENTION
FIG. 1 is an overview of a preferred embodiment of the invention, where the present invention detects predefined events in a first video stream from a top-down first means for capturing images 101 and generates time-stamped lists of events, which are used to access the corresponding sub-streams for the events in a synchronized second video stream from a second means for capturing images 102 for the annotation of the events.
The processes in the present invention are based on a novel usage of a plurality of computer vision technologies to analyze the human behavior from the plurality of input images. The method leverages the strengths of the technologies in the present invention and processes to deliver a new level of access to the behaviors and visual characteristics of people in the physical space.
In the exemplary embodiment shown in FIG. 1, the present invention captures a plurality of input images of the people in a physical space 130 by a plurality of first means for capturing images 101 and processes the plurality of input images in order to detect the predefined events based on the behavior analysis of the people in the physical space.
The behavior analysis and the following automatic event detection can be based on the spatial and temporal attributes of the person tracking in the field of view of a first means for capturing images 101. For example, when a person stays in a specific region of interest in the physical space for more than a predefined time threshold, it can be decided that an event occurred. In the exemplary embodiment shown in FIG. 1, an exemplary “event detection 1” 251 can comprise the automatically measured spatial and temporal attributes about the detected event, such as the time “Ti” when the event occurred and the location “(Xi, Yi)” of the event, the assigned event identification “EID1”, and the event type “ET1” of the specific event.
As noted above, the dwell time of the people in a specific location of the physical space can be used as one of the criteria for defining the targeted behavior. Examples of the temporal targeted behavior can comprise passerby behavior and engaged shopper behavior, based on the dwell time measurement and comparison against predefined thresholds.
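As a concrete illustration of this dwell-time criterion, the following is a minimal sketch, under assumed data structures and thresholds, of how a tracked path inside a region of interest could be turned into a time-stamped event record of the kind denoted "event detection 1" (time Ti, location (Xi, Yi), event identification, and event type). The track format, region format, and thresholds are assumptions, not fixed parameters of the invention.

```python
# Hedged sketch: dwell-time based event detection over a tracked path.
from dataclasses import dataclass

@dataclass
class DetectedEvent:
    event_id: str     # e.g. "EID1"
    event_type: str   # e.g. "passerby" or "engaged_shopper"
    time: float       # Ti: timestamp at which the event is decided
    x: float          # Xi
    y: float          # Yi

PASSERBY_MAX_DWELL = 5.0    # seconds (assumed threshold)
ENGAGED_MIN_DWELL = 30.0    # seconds (assumed threshold)

def detect_dwell_event(track, roi, next_id):
    """track: list of (t, x, y) samples for one person; roi: (xmin, ymin, xmax, ymax)."""
    inside = [(t, x, y) for (t, x, y) in track
              if roi[0] <= x <= roi[2] and roi[1] <= y <= roi[3]]
    if not inside:
        return None
    dwell = inside[-1][0] - inside[0][0]   # time spent in the region of interest
    t, x, y = inside[-1]
    if dwell >= ENGAGED_MIN_DWELL:
        return DetectedEvent(f"EID{next_id}", "engaged_shopper", t, x, y)
    if dwell <= PASSERBY_MAX_DWELL:
        return DetectedEvent(f"EID{next_id}", "passerby", t, x, y)
    return None
```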
As shown in the exemplary embodiment in FIG. 1, the present invention can utilize a plurality of first means for capturing images 101 and a plurality of second means for capturing images 102 in a preferred embodiment. The first means for capturing images 101 can be an overhead top-down camera, and the second means for capturing images 102 can be a camera that is positioned to observe the people more closely for analyzing a specific event.
The present invention generates time-stamped lists of events based on the automatically detected predefined events. Then, it can access a synchronized second video stream from a second means for capturing images 102 that is positioned to observe the people more closely, using the timestamps associated with the detected events from the first video stream. Using the timestamps and the time-stamped lists of events, the present invention can access the corresponding sub-streams for the events in the synchronized second video stream.
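The following is a minimal sketch of how the timestamps in the generated event list could be used to cut the corresponding sub-stream out of the synchronized second video stream. The frames_between accessor and the pre-roll and post-roll margins are assumptions made for illustration, not a defined interface of the invention.

```python
# Hedged sketch: accessing event sub-streams in a synchronized second video stream.
def event_substream(second_stream, event_time, pre_roll=2.0, post_roll=10.0):
    """second_stream: a video accessor assumed to expose frames_between(t0, t1)
    on the same synchronized time base as the first video stream."""
    return second_stream.frames_between(event_time - pre_roll, event_time + post_roll)

def substreams_for_events(second_stream, events):
    """events: iterable of (event_id, event_time) pairs from the time-stamped list."""
    return {event_id: event_substream(second_stream, t) for event_id, t in events}
```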
The physical space may be a retail space, and the people may be customers or shoppers in the retail space in the description of the invention. In an exemplary embodiment for a retail space, the solution in the present invention can help the owner of the particular embodiment to have an in-depth understanding of shopper behavior. The annotation can be utilized for more quantitative and deeper behavior analysis about the interaction of people with commercial products in the retail space. The present invention can also generate statistical reports by aggregating the annotated events.
However, although the disclosed method may be described in the context of a retail space, the present invention can be applied to any physical space, and the application area of the present invention is not limited to the retail space.
FIG. 2 is an overview of another exemplary embodiment of the invention, where the present invention uses a different type of sensor for detecting the predefined events.
In an exemplary embodiment, the automatic behavior analysis of people is the preferred method for detecting the predefined event in the present invention. However, the automatic event detection can also be triggered by the other visual characteristics and segmentation of people in the physical space, such as the demographics, in another exemplary embodiment. In this exemplary embodiment, the present invention can process the first video stream in order to detect the demographics of the people in the field of view of the first means for capturing images automatically and generate time-stamped lists of events based on the automatically detected demographics of the people for the predefined event detection.
In the exemplary embodiment shown in FIG. 2, the present invention can measure the demographics of the people at a specific region of interest, such as the entrance and exit of a physical space 130 using a plurality of first means for capturing images 101. In the exemplary embodiment, an exemplary “event detection 2” 252 can comprise the automatically measured spatial and temporal attributes about the detected event, such as the time “Tj” when the event occurred and the location “(Xj, Yj)” of the event, the assigned event identification “EID2”, and the event type “ET2” of the specific event. Likewise, another exemplary “event detection 3” 253 can comprise the automatically measured spatial and temporal attributes about the detected event, such as the time “Tk” when the event occurred and the location “(Xk, Yk)” of the event, the assigned event identification “EID3”, and the event type “ET3” of the specific event. In these exemplary event detections, the event types can be defined in association with the automatic demographic measurement, respectively.
In an exemplary embodiment applied to a retail space, the present invention can provide demographic segmentation of the shoppers by gender and age group in this particular application domain. In this exemplary embodiment, the shopping behavior of each demographic group can be analyzed to obtain segment-specific insights. Understanding segment-based shopper behavior for a specific business goal in the retail space can help to develop effective customer-centric strategies to increase the basket size and loyalty of the highest-opportunity segments.
As shown in FIG. 2, the present invention can utilize a plurality of first means for capturing images 101 and a plurality of second means for capturing images 102 in a preferred embodiment. The first means for capturing images 101 can be an overhead top-down camera, and the second means for capturing images 102 can be a camera that is positioned to observe the people more closely for analyzing a specific event.
However, in the exemplary embodiment, the present invention can also utilize different types of sensors for a different type of automatic event detection, such as a wireless sensor, a door sensor 116, or other types of sensors in an electronic article surveillance (EAS) system. Examples of the wireless sensor can include, but are not limited to, a RFID and means for using the RFID. A sequence of the RFID proximity detection can be used to provide tracking information of the people. In the exemplary embodiment shown in FIG. 2, the present invention can use a door sensor 116 to trigger a different type of event, such as an anti-theft alarm event.
FIG. 3 shows an exemplary scene of the annotation 280 process by an annotator for the synchronized view of the events using an exemplary annotation tool 160.
The present invention can enable an annotator to manually annotate each of the synchronized events in the corresponding sub-streams for the events in the synchronized second video stream 172, with a plurality of labels, using an annotation tool 160.
The annotation tool 160 can comprise a user interface for the annotation. Examples of the user interface can comprise a digital annotation tool or an analog annotation tool. The user interface allows users to mark time-based annotations describing more complex behavioral issues, which may not be detected by using a fully automated method and require human identification. Examples of the more complex behavioral issues can comprise expressions of the people.
In the exemplary embodiment shown in FIG. 3, the present invention detects an exemplary event, “event detection 1251, in a first video stream 171, and then the annotator can use the annotation tool 160 to find the corresponding synchronized event in a second video stream 172, utilizing the attributes in the exemplary “event detection 1251. The annotator can also use the annotation tool 160 to watch and annotate the synchronized event in a second video stream 172 by accessing the synchronized view of the event 265 in the annotation tool 160. The present invention can also display the top-down event detection view from the first video stream 171 on a means for playing output 103.
FIG. 4 shows an exemplary annotation tool 160 in the present invention.
The annotation tool 160 can further comprise a graphical user interface 162 for the annotation to further make the analysis more efficient as shown in FIG. 4. The graphical user interface 162 can be used to browse the video streams based on the timestamps of the events, such as the beginning and end time.
In the exemplary embodiment shown in FIG. 4, the exemplary graphical user interface 162 can comprise event selection 176, video stream selection 177, event timeline selection 178, and other facilitating interface capabilities. Using the event selection 176, the annotator can browse through time-stamped lists of events, automatically generated by the present invention, and select a synchronized second video stream among a plurality of available second video streams, using the video stream selection 177. After a second video stream, relevant to the target event for annotation, is selected, the annotator can quickly and efficiently access the corresponding sub-streams for the event in the synchronized second video stream, using the timestamps for the detected events.
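A minimal sketch of the data such an annotation workflow could manipulate is given below; it assumes an in-memory session object, with illustrative field names, that mirrors the event selection, video stream selection, and time-based labeling just described.

```python
# Hedged sketch: a session object behind the annotation tool's graphical user interface.
from dataclasses import dataclass, field

@dataclass
class Annotation:
    event_id: str
    stream_id: str
    begin_time: float
    end_time: float
    labels: list = field(default_factory=list)   # e.g. ["picked up product", "read label"]

class AnnotationSession:
    def __init__(self, event_list, second_streams):
        self.event_list = event_list              # time-stamped list of detected events
        self.second_streams = second_streams      # stream_id -> second video stream accessor
        self.annotations = []

    def annotate(self, event_id, stream_id, begin_time, end_time, labels):
        # The annotator browses the event list, selects a synchronized second
        # video stream, watches the sub-stream, and records the labels.
        record = Annotation(event_id, stream_id, begin_time, end_time, list(labels))
        self.annotations.append(record)
        return record
```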
FIG. 5 shows an exemplary synchronization architecture in an exemplary network of a plurality of means for control and processing in the present invention, where the network consists of a plurality of first means for control and processing 107 and a plurality of second means for control and processing 108, which communicate with each other to synchronize the time-stamped lists of events among a plurality of video streams for the detected events.
The present invention generates time-stamped lists of events based on the automatically detected predefined events. Then, it can access a synchronized second video stream from a second means for capturing images that is positioned to observe the people more closely, using the timestamps associated with the detected events from the first video stream. Using the timestamps and the time-stamped lists of events, the present invention can access the corresponding sub-streams for the events in the synchronized second video stream.
The utilization of the automatic event detection and the synchronization efficiently helps the annotation process by reducing the amount of video data and time to handle, and by allowing the annotator to focus more on the events of interest according to the predefined rules for the automatically detected events.
A time-server 109 can be used in order to maintain a synchronized time in the network of means for control and processing in the present invention.
In the exemplary embodiment shown in FIG. 5, the exemplary network of a plurality of means for control and processing can consist of a plurality of first means for control and processing 107 and a plurality of second means for control and processing 108. In this exemplary embodiment, a first means for control and processing 107 can act as a server and a plurality of second means for control and processing 108 can act as clients. The server can run its own local clock or be connected to a global time-server 109 for the synchronization utilizing a time synchronization protocol, such as the Network Time Protocol (NTP).
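A minimal sketch of keeping event timestamps on the shared time base follows; it assumes each means for control and processing can query the time-server for a clock offset (for example, through an NTP client) and applies that offset when stamping events. The offset query itself is left as a placeholder rather than a full NTP implementation.

```python
# Hedged sketch: stamping events on a clock synchronized to the time-server.
import time

class SynchronizedClock:
    def __init__(self, query_server_offset):
        # query_server_offset() is assumed to return (server_time - local_time)
        # in seconds, e.g. as reported by an NTP client library.
        self._query = query_server_offset
        self.offset = query_server_offset()

    def resync(self):
        self.offset = self._query()

    def now(self):
        return time.time() + self.offset   # timestamp on the shared time base
```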
The number of means for capturing images per means for control and processing varies, depending on the system configuration in the physical space. However, each means for control and processing knows the location and the identification of each of its associated plurality of means for capturing images and the area covered by the means for capturing images. Therefore, when an event is detected by a top-down first means for capturing images 101 at a location, its associated first means for control and processing 107 can correctly find the corresponding second means for capturing images 102 close to the specific location, through communicating with the second means for control and processing associated with the corresponding second means for capturing images 102.
In the exemplary embodiment shown in FIG. 5, when an event is detected by the “first means for capturing images at location L1” 110, the present invention can correctly find the corresponding event and sub-streams from the “second means for capturing images at location L1” 112. Likewise, the present invention can correlate the events between the “first means for capturing images at location Ln” 111 and the “second means for capturing images at location Ln” 113 for the location Ln, using their location and identification information.
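The location-based correlation just described could be realized with a simple registry, sketched below under assumed identifiers; each first means for control and processing would consult such a mapping to reach the second means for capturing images covering the event location.

```python
# Hedged sketch: mapping event locations to their corresponding second cameras.
class CameraRegistry:
    def __init__(self):
        # location_id -> (second camera id, address of its means for control and processing)
        self._by_location = {}

    def register(self, location_id, second_camera_id, processor_address):
        self._by_location[location_id] = (second_camera_id, processor_address)

    def second_camera_for(self, location_id):
        return self._by_location.get(location_id)

# Illustrative use with assumed names:
# registry = CameraRegistry()
# registry.register("L1", "second_cam_L1", "processor-2a:9000")
# camera_id, processor = registry.second_camera_for("L1")
```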
In another exemplary embodiment, the present invention can utilize a rule-based logic module for the synchronization among a plurality of the first video streams and a plurality of the second video streams. For example, when there are multiple second means for capturing images in the vicinity of a single detected event, the annotator can select and utilize any of the plurality of the second video streams from their associated second means for capturing images. However, the rule-based logic module can also further help the annotator by providing more information about the detected event and synchronization, based on the predefined rules in the module. For example, the logic module can provide priority information among the plurality of second video streams according to the predefined rules for the order, relevance, and specific needs at the specific location in the physical space.
The rule-based logic module can also enable dynamic rule application, where the synchronization can be adjusted dynamically based on the rules defined in the module, rather than relying on an ad-hoc solution or static hard-coding.
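A minimal sketch of such a rule-based logic module is given below; the rule set, scoring scheme, and candidate attributes are assumptions chosen only to illustrate how predefined rules could rank several second video streams covering the same event.

```python
# Hedged sketch: rule-based prioritization of candidate second video streams.
def rank_candidate_streams(candidates, event, rules):
    """candidates: list of dicts such as {'stream_id': ..., 'distance': ..., 'view_type': ...};
    rules: list of (predicate, points) pairs evaluated against (candidate, event)."""
    def score(candidate):
        return sum(points for predicate, points in rules if predicate(candidate, event))
    return sorted(candidates, key=score, reverse=True)

# Illustrative rules (assumed): prefer close-up views and cameras nearest the event.
example_rules = [
    (lambda candidate, event: candidate.get("view_type") == "close_up", 10),
    (lambda candidate, event: candidate.get("distance", float("inf")) < 3.0, 5),
]
```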
FIG. 6 shows overall processes of an exemplary embodiment of the present invention, comprising the automatic event detection 255 in a first video stream 171, the synchronization 260 of the event in a corresponding second video stream, and the annotation 280 of the detected event in the synchronized second video stream.
In the exemplary embodiment shown in FIG. 6, the present invention processes a generation of lists of events 256, based on the “automatic event detection” 255 in a first video stream 171, from a first means for capturing images 101. Then, an annotator can use the information in the generated events, such as the timestamp, the location of the corresponding second means for capturing images and the corresponding second means for control and processing, and their identifications, to find 272 and access 273 a synchronized second video stream, among a plurality of available second video streams, i.e. “second video stream 1” 173, “second video stream 2” 174, and “second video stream N” 175, from the corresponding second means for capturing images that are positioned to observe the people more closely, utilizing an annotation tool.
The annotator further uses the detailed information for the target event, such as the start and end timestamps of the event, to access the relevant sub-streams in the synchronized second video stream for the final annotation 280 of the specific event, based on the domain specific parameters 282.
FIG. 7 shows detailed exemplary processes of predefined event detection based on the behavior analysis of the people in an exemplary “automatic event detection” 255 module in the present invention.
As shown in the particular embodiment in FIG. 7, the present invention detects 710 and tracks 714 a person in a physical space for the path analysis 470, and the information in the path analysis 470, such as the sequence of coordinates and temporal attributes, is used for the behavior analysis 480 of the person.
The present invention can utilize any reliable video-based tracking method for people in the prior art with regard to the behavior analysis.
U.S. Pat. No. 7,974,869 of Sharma, et al. (hereinafter Sharma869) disclosed an exemplary process of video-based tracking and behavior analysis for a single customer or a group of customers using multiple means for capturing images, based on the spatial and temporal attributes of the person tracking.
FIG. 20 and FIG. 21 in Sharma869 show exemplary spatio-temporal primitives for modeling human-object behavior and exemplary shopping interaction levels that are observed to produce the behavioral analysis in a physical space.
As described in Sharma869, the behavior recognition can be achieved via spatio-temporal analysis of tracks, using geometry and pattern recognition techniques. The approach of defining and detecting spatio-temporal relations specific to the retail enterprise domain, followed by a Bayesian belief propagation approach to modeling primitive behaviors in that domain, which Sharma869 treated as an exemplary site of a media network, can be applied to any physical space.
In Sharma869, the exemplary primitive behaviors comprised categories of “customer moves towards object”, “customer doesn't walk towards object”, “customer velocity reduces”, “customer velocity increases”, “customer stands in front of object”, and “customer walks away from object”, and these primitive behaviors were combined to model predefined complex behaviors. Then the behaviors of the people were analyzed based on the model. Walkthrough history, the time spent in a certain area within a physical space, frequency pattern, relational pattern, and special event pattern can also be used as the exemplary attributes for the behavior analysis.
The exemplary shopping interaction levels in Sharma869 can be regarded as an exemplary higher level of complex behaviors in a target physical space, especially in a retail space, which are observed to produce the behavioral analysis in the context of the present invention.
Sharma869 defined the exemplary shopping interaction levels based on the spatio-temporal relations, which are “passing by”, “noticing”, “stopping”, from “engaging 1” to “engaging P-1”, and “purchase”. They are labeled as “level 1” interaction, “level 2” interaction, “level 3” interaction, from “level 4” interaction to “level P-1” interaction, and “level p” interaction, respectively, where multiple engaging levels are also considered.
The shopping interaction level can be measured based on the temporal attribute of the person tracking for the customer in regard to the combination of the primitive behaviors. For example, if there is no change in velocity, the present invention can measure the customer's interaction level as a passer-by level at a particular category. If the stopping time is greater than a threshold, such as T1 seconds, then the present invention can measure the customer's interaction level as a level 4 interaction. Likewise, the temporal attribute of the person tracking can match the time value to the corresponding interaction level, based on the predefined thresholds and rules.
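For illustration, the mapping from stopping time to interaction level could be coded as below; the level names follow the discussion above, but the numeric thresholds are assumptions made for this sketch and are not taken from Sharma869.

```python
# Hedged sketch: mapping measured stopping time to shopping interaction levels.
INTERACTION_THRESHOLDS = [
    (0.0,  "level 1: passing by"),
    (2.0,  "level 2: noticing"),
    (5.0,  "level 3: stopping"),
    (15.0, "level 4: engaging"),
]

def interaction_level(stopping_time_seconds, velocity_changed=True):
    if not velocity_changed:
        return "level 1: passing by"   # no change in velocity -> passer-by
    level = INTERACTION_THRESHOLDS[0][1]
    for threshold, name in INTERACTION_THRESHOLDS:
        if stopping_time_seconds >= threshold:
            level = name               # highest threshold reached wins
    return level
```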
Utilizing the exemplary method for behavior analysis based on the spatio-temporal primitives and model for the interaction levels, such as the shopping interaction levels in a retail space, based on the path analysis 470 of the people in a physical space, the present invention can detect 250 the predefined events and generate a list of the detected events 256.
FIG. 8 shows detailed exemplary processes of automatic detection of predefined events in another exemplary embodiment of the present invention, where the predefined event detection also uses the segmentation information of the people, such as demographics, in an exemplary automatic event detection module.
In the exemplary embodiment of the automatic event detection 255 shown in FIG. 8, the present invention can process the event detection 250 based on the behavior analysis of the people in a physical space and generate a list of detected events 256 as described in regards to FIG. 7.
As shown in FIG. 8, the computer vision based automatic segmentation 241 of the people on a video can also be used as one of the criteria to define certain types of events. Automatic demographic classification 814 can be used as an exemplary segmentation of the people.
In the exemplary embodiment shown in FIG. 8, the present invention can process segmentation 241 of the customer, such as the demographic classification 814, based on the images of the people in a first video stream 171 and use the segmentation 241 information to detect the predefined events based on the segmentation criteria.
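A minimal sketch of segmentation-triggered event detection follows; it assumes that a demographic classifier has already labeled each tracked person and that the predefined events are keyed to (gender, age group) segments. Field names and segment encodings are illustrative only.

```python
# Hedged sketch: generating events from automatically classified demographic segments.
def demographic_events(classified_people, target_segments, start_id=1):
    """classified_people: iterable of dicts such as
    {'time': t, 'x': x, 'y': y, 'gender': 'female', 'age_group': 'senior'}.
    target_segments: set of (gender, age_group) pairs that define events."""
    events = []
    next_id = start_id
    for person in classified_people:
        segment = (person.get("gender"), person.get("age_group"))
        if segment in target_segments:
            events.append({
                "event_id": f"EID{next_id}",
                "event_type": f"segment:{segment[0]}_{segment[1]}",
                "time": person["time"],
                "x": person["x"],
                "y": person["y"],
            })
            next_id += 1
    return events
```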
The present invention can utilize any reliable demographic composition measurement method in the prior art as an exemplary video-based segmentation of the customers. For example, the above-mentioned U.S. Provisional Pat. Appl. No. 60/808,283 of Sharma, et al. (Sharma 60/808,283) disclosed an exemplary demographic composition measurement based on gender and ethnicity. Age is another attribute that Sharma 60/808,283 can measure.
Automatic event detection based on the segmentation of the people in a physical space, such as a retail space, can provide unique benefits to the annotator and the owner of a particular embodiment of the present invention. For example, the detailed annotation labels can be efficiently organized based on the predefined segmentation criteria in the events. Detailed annotation labels per demographic group can provide very useful market analysis data in an exemplary embodiment of the present invention.
While the above description contains much specificity, these should not be construed as limitations on the scope of the invention, but as exemplifications of the presently preferred embodiments thereof. Many other ramifications and variations are possible within the teachings of the invention. Thus, the scope of the invention should be determined by the appended claims and their legal equivalents, and not by the examples given.

Claims (20)

What is claimed is:
1. A method for efficiently annotating behavior and characteristics of a person or a plurality of persons in a first video stream and a second video stream in a physical space, comprising the following steps of:
a) capturing the first video stream of the person or the plurality of persons by a first means for capturing images,
b) processing the first video stream in order to track and detect predefined behavior and demographics of the person or the plurality of persons in a field of view of the first means for capturing images automatically using at least a means for control and processing that executes computer vision algorithms on the first video stream,
c) providing demographic segmentation of the person or the plurality of persons to create a plurality of demographic groups,
d) analyzing the behavior based on spatio-temporal primitives and a model for interaction levels of the person or the plurality of persons, wherein the behavior of each demographic group is analyzed to obtain segment-specific insights,
e) generating time-stamped lists of events based on the automatically detected predefined behavior and demographics, using a time server,
f) using the time-stamped lists of events and timestamps of events in the time-stamped lists of events to access at least a corresponding sub-stream for the events in the second video stream from a second means for capturing images,
g) manually annotating each of the events with a plurality of labels for a synchronized annotation using a user interface, and
h) utilizing the annotation for quantitative behavior analysis about interaction of the person or the plurality of persons with a plurality of commercial products in the physical space,
wherein the first video stream is synchronized with the second video stream, and whereby the user interface allows users to mark time-based annotations describing complex behavioral issues, including expressions of the person or the plurality of persons.
2. The method according to claim 1, wherein the method further comprises a step of utilizing a plurality of first means for capturing images,
wherein the plurality of first means for capturing images are synchronized with the second means for capturing images by using the time-stamped lists of events and the timestamps of events, and
whereby the first means for capturing images comprises an overhead top-down camera.
3. The method according to claim 1, wherein the method further comprises a step of utilizing a plurality of second means for capturing images,
whereby the second means for capturing images comprises a camera that is positioned to observe the person or the plurality of persons more closely for analyzing a specific event.
4. The method according to claim 1, wherein the method further comprises a step of using a graphical user interface for the annotation, whereby the graphical user interface is used to browse the first video stream and the second video stream based on the timestamps of the events, comprising beginning time and end time of the events.
5. The method according to claim 1, wherein the method further comprises a step of utilizing the time-stamped lists of events and the synchronized annotation to reduce time required to analyze a large amount of the first video stream and the second video stream.
6. The method according to claim 1, wherein the method further comprises a step of utilizing dwell time of the person or the plurality of persons to detect the predefined behavior,
whereby the predefined behavior based on the dwell time comprises passerby behavior and engaged shopper behavior.
7. The method according to claim 1, wherein the method further comprises a step of generating statistical reports by aggregating the annotated events.
8. The method according to claim 1, wherein the method further comprises a step of utilizing at least a wireless sensor and a wireless sensor based tracking to track and detect predefined behavior,
whereby the wireless sensor comprises a RFID and means for using the RFID.
9. The method according to claim 1, wherein the method further comprises a step of utilizing door sensors to detect predefined behavior.
10. The method according to claim 1, wherein the method further comprises a step of utilizing a rule-based logic module to synchronize the first video stream with the second video stream,
wherein the synchronization is adjusted dynamically based on rules defined in the rule-based logic module.
11. A system for efficiently annotating behavior and characteristics of a person or a plurality of persons in a first video stream and a second video stream in a physical space, comprising:
a) at least a first means for capturing images that captures the first video stream of the person or the plurality of persons,
b) at least a first means for control and processing that executes computer vision algorithms on the first video stream, performing the following steps of:
processing the first video stream in order to track and detect predefined behavior and demographics of the person or the plurality of persons in a field of view of the first means for capturing images automatically,
providing demographic segmentation of the person or the plurality of persons to create a plurality of demographic groups,
analyzing the behavior based on spatio-temporal primitives and a model for interaction levels of the person or the plurality of persons, wherein the behavior of each demographic group is analyzed to obtain segment-specific insights, and
generating time-stamped lists of events based on the automatically detected predefined behavior and demographics, using a time server, and
c) an annotation tool for using the time-stamped lists of events and timestamps of events in the time-stamped lists of events to access at least a corresponding sub-stream for the events in the second video stream from a second means for capturing images, and
for annotating each of the events with a plurality of labels for a synchronized annotation including a user interface for the annotation,
wherein the first video stream is synchronized with the second video stream,
wherein the annotation is utilized for quantitative behavior analysis about interaction of the person or the plurality of persons with a plurality of commercial products in the physical space, and
whereby the user interface allows users to mark time-based annotations describing complex behavioral issues, including expressions of the person or the plurality of persons.
12. The system according to claim 11, wherein the system further comprises at least an overhead top-down camera that is connected to the first means for control and processing,
wherein the overhead top-down camera is synchronized with the second means for capturing images by using the time-stamped lists of events and the timestamps of events.
13. The system according to claim 11, wherein the system further comprises a plurality of second means for capturing images,
whereby the second means for capturing images comprises a camera that is positioned to observe the person or the plurality of persons more closely for analyzing a specific event.
14. The system according to claim 11, wherein the system further comprises a graphical user interface for the annotation,
whereby the graphical user interface is used to browse the first video stream and the second video stream based on the timestamps of the events, comprising beginning time and end time of the events.
15. The system according to claim 11, wherein the system further comprises a means for control and processing for utilizing the time-stamped lists of events and the synchronized annotation to reduce time required to analyze a large amount of the first video stream and the second video stream.
16. The system according to claim 11, wherein the system further comprises a means for control and processing for utilizing dwell time of the person or the plurality of persons to detect the predefined behavior,
whereby the predefined behavior based on the dwell time comprises passerby behavior and engaged shopper behavior.
17. The system according to claim 11, wherein the system further comprises a means for control and processing for generating statistical reports by aggregating the annotated events.
18. The system according to claim 11, wherein the system further comprises at least a wireless sensor and a means for control and processing that utilizes a wireless sensor based tracking to track and detect predefined behavior,
whereby the wireless sensor comprises a RFID and means for using the RFID.
19. The system according to claim 11, wherein the system further comprises a means for control and processing for utilizing door sensors to detect predefined behavior.
20. The system according to claim 11, wherein the system further comprises a means for control and processing for utilizing a rule-based logic module to synchronize the first video stream with the second video stream, wherein the synchronization is adjusted dynamically based on rules defined in the rule-based logic module.
US12/011,385 2007-01-30 2008-01-25 Method and system for optimizing the observation and annotation of complex human behavior from video sources Active 2032-05-23 US8665333B1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US12/011,385 US8665333B1 (en) 2007-01-30 2008-01-25 Method and system for optimizing the observation and annotation of complex human behavior from video sources

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US89831107P 2007-01-30 2007-01-30
US12/011,385 US8665333B1 (en) 2007-01-30 2008-01-25 Method and system for optimizing the observation and annotation of complex human behavior from video sources

Publications (1)

Publication Number Publication Date
US8665333B1 true US8665333B1 (en) 2014-03-04

Family

ID=50158782

Family Applications (1)

Application Number Title Priority Date Filing Date
US12/011,385 Active 2032-05-23 US8665333B1 (en) 2007-01-30 2008-01-25 Method and system for optimizing the observation and annotation of complex human behavior from video sources

Country Status (1)

Country Link
US (1) US8665333B1 (en)


Patent Citations (40)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5600368A (en) * 1994-11-09 1997-02-04 Microsoft Corporation Interactive television system and method for viewer control of multiple camera viewpoints in broadcast programming
US5745036A (en) * 1996-09-12 1998-04-28 Checkpoint Systems, Inc. Electronic article security system for store which uses intelligent security tags and transaction data
US6741973B1 (en) 1997-04-04 2004-05-25 Ncr Corporation Consumer model
US6597391B2 (en) * 1997-09-17 2003-07-22 Sony United Kingdom Limited Security system
US7536706B1 (en) * 1998-08-24 2009-05-19 Sharp Laboratories Of America, Inc. Information enhanced audio video encoding system
US6430357B1 (en) * 1998-09-22 2002-08-06 Ati International Srl Text data extraction system for interleaved video data streams
US20030108223A1 (en) * 1998-10-22 2003-06-12 Prokoski Francine J. Method and apparatus for aligning and comparing images of the face and body from different imagers
US6990217B1 (en) * 1999-11-22 2006-01-24 Mitsubishi Electric Research Labs. Inc. Gender classification with support vector machines
US6531963B1 (en) * 2000-01-18 2003-03-11 Jan Bengtsson Method for monitoring the movements of individuals in and around buildings, rooms and the like
US20010049826A1 (en) * 2000-01-19 2001-12-06 Itzhak Wilf Method of searching video channels by content
US20040078809A1 (en) * 2000-05-19 2004-04-22 Jonathan Drazin Targeted advertising system
US20070055563A1 (en) * 2000-08-29 2007-03-08 Godsey Ronald G System and methods for tracking consumers in a store environment
US7796162B2 (en) * 2000-10-26 2010-09-14 Front Row Technologies, Llc Providing multiple synchronized camera views for broadcast from a live venue activity to remote viewers
US20040032495A1 (en) * 2000-10-26 2004-02-19 Ortiz Luis M. Providing multiple synchronized camera views for broadcast from a live venue activity to remote viewers
US20020085092A1 (en) 2000-11-14 2002-07-04 Samsung Electronics Co., Ltd. Object activity modeling method
US20040131254A1 (en) * 2000-11-24 2004-07-08 Yiqing Liang System and method for object identification and behavior characterization using video analysis
US20020161804A1 (en) * 2001-04-26 2002-10-31 Patrick Chiu Internet-based system for multimedia meeting minutes
US20020178085A1 (en) * 2001-05-15 2002-11-28 Herb Sorensen Purchase selection behavior analysis system and method
US7006982B2 (en) 2001-05-15 2006-02-28 Sorensen Associates Inc. Purchase selection behavior analysis system and method utilizing a visibility measure
US20030053659A1 (en) * 2001-06-29 2003-03-20 Honeywell International Inc. Moving object assessment system and method
US20030002712A1 (en) * 2001-07-02 2003-01-02 Malcolm Steenburgh Method and apparatus for measuring dwell time of objects in an environment
US20030058339A1 (en) 2001-09-27 2003-03-27 Koninklijke Philips Electronics N.V. Method and apparatus for detecting an event based on patterns of behavior
US20030110038A1 (en) 2001-10-16 2003-06-12 Rajeev Sharma Multi-modal gender classification using support vector machines (SVMs)
US20040161133A1 (en) * 2002-02-06 2004-08-19 Avishai Elazar System and method for video content analysis-based detection, surveillance and alarm management
US20040120581A1 (en) 2002-08-27 2004-06-24 Ozer I. Burak Method and apparatus for automated video activity analysis
US20040113933A1 (en) 2002-10-08 2004-06-17 Northrop Grumman Corporation Split and merge behavior analysis and understanding using Hidden Markov Models
US20060010028A1 (en) * 2003-11-14 2006-01-12 Herb Sorensen Video shopper tracking system and method
US20050286774A1 (en) 2004-06-28 2005-12-29 Porikli Fatih M Usual event detection in a video using object and frame features
US20060010030A1 (en) * 2004-07-09 2006-01-12 Sorensen Associates Inc System and method for modeling shopping behavior
US20060023073A1 (en) * 2004-07-27 2006-02-02 Microsoft Corporation System and method for interactive multi-view video
US20060047674A1 (en) * 2004-09-01 2006-03-02 Mohammed Zubair Visharam Method and apparatus for supporting storage of multiple camera views
US20060053342A1 (en) 2004-09-09 2006-03-09 Bazakos Michael E Unsupervised learning of events in a video sequence
US20100002082A1 (en) * 2005-03-25 2010-01-07 Buehler Christopher J Intelligent camera selection and object tracking
US20060239645A1 (en) * 2005-03-31 2006-10-26 Honeywell International Inc. Event packaged video sequence
WO2006106496A1 (en) * 2005-04-03 2006-10-12 Nice Systems Ltd. Apparatus and methods for the semi-automatic tracking and examining of an object or an event in a monitored site
US20070208263A1 (en) * 2006-03-01 2007-09-06 Michael Sasha John Systems and methods of medical monitoring according to patient state
US20070250901A1 (en) * 2006-03-30 2007-10-25 Mcintire John P Method and apparatus for annotating media streams
US7623755B2 (en) * 2006-08-17 2009-11-24 Adobe Systems Incorporated Techniques for positioning audio and video clips
US8295597B1 (en) * 2007-03-14 2012-10-23 Videomining Corporation Method and system for segmenting people in a physical space based on automatic behavior analysis
US20110261172A1 (en) * 2008-04-17 2011-10-27 Terry Robert L Stereoscopic viewer

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
U.S. Appl. No. 60/808,283, Sharma, et al.
U.S. Appl. No. 60/846,014, Sharma, et al.

Cited By (90)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10590693B2 (en) * 2008-07-18 2020-03-17 Robert Osann, Jr. Moving door system synchronized with pedestrians passing there-through
US20170241185A1 (en) * 2008-07-18 2017-08-24 Robert Osann, Jr. Moving door system synchronized with pedestrians passing there-through
US11326387B2 (en) * 2008-07-18 2022-05-10 Robert Osann, Jr. Automatic access control devices and clusters thereof
US20120120201A1 (en) * 2010-07-26 2012-05-17 Matthew Ward Method of integrating ad hoc camera networks in interactive mesh systems
US11107122B2 (en) * 2011-08-01 2021-08-31 Verizon Patent and Licensing Inc. Targeted advertisement content presentation methods and systems
US20130036011A1 (en) * 2011-08-01 2013-02-07 Verizon Patent And Licensing, Inc. Targeted Advertisement Content Presentation Methods and Systems
US20130050496A1 (en) * 2011-08-25 2013-02-28 Electronics & Telecommunications Research Institute Security monitoring method and apparatus using augmented reality
US10070100B2 (en) * 2011-09-22 2018-09-04 Philips Lighting Holding B.V. Imaging service using outdoor lighting networks
US20140198216A1 (en) * 2011-09-22 2014-07-17 Koninklijke Philips N.V. Imaging service using outdoor lighting networks
US20130159460A1 (en) * 2011-12-16 2013-06-20 Mindshare Networks, Inc. Harnessing naturally occurring characteristics of social networks
US9137295B2 (en) * 2011-12-16 2015-09-15 Mindshare Networks Determining audience engagement levels with presentations and providing content based on the engagement levels
US9210300B2 (en) * 2011-12-19 2015-12-08 Nec Corporation Time synchronization information computation device for synchronizing a plurality of videos, time synchronization information computation method for synchronizing a plurality of videos and time synchronization information computation program for synchronizing a plurality of videos
US20140313413A1 (en) * 2011-12-19 2014-10-23 Nec Corporation Time synchronization information computation device, time synchronization information computation method and time synchronization information computation program
US9213902B2 (en) * 2011-12-21 2015-12-15 Universite Pierre Et Marie Curie (Paris 6) Method of estimating optical flow on the basis of an asynchronous light sensor
US20140363049A1 (en) * 2011-12-21 2014-12-11 Universite Pierre Et Marie Curie (Paris 6) Method of estimating optical flow on the basis of an asynchronous light sensor
US11170329B1 (en) 2012-05-17 2021-11-09 Catalina Marketing Corporation System and method of initiating in-trip audits in a self-checkout system
US10387817B2 (en) * 2012-05-17 2019-08-20 Catalina Marketing Corporation System and method of initiating in-trip audits in a self-checkout system
US20220058538A1 (en) * 2012-05-17 2022-02-24 Catalina Marketing Corporation System and method of initiating in-trip audits in a self-checkout system
US20130311230A1 (en) * 2012-05-17 2013-11-21 Catalina Marketing Corporation System and method of initiating in-trip audits in a self-checkout system
US20140063262A1 (en) * 2012-08-31 2014-03-06 Ncr Corporation Techniques for checkout security using video surveillance
US9311645B2 (en) * 2012-08-31 2016-04-12 Ncr Corporation Techniques for checkout security using video surveillance
US20160078286A1 (en) * 2013-04-26 2016-03-17 Nec Corporation Monitoring device, monitoring method and monitoring program
US9946921B2 (en) * 2013-04-26 2018-04-17 Nec Corporation Monitoring device, monitoring method and monitoring program
US11889009B2 (en) 2013-07-26 2024-01-30 Skybell Technologies Ip, Llc Doorbell communication and electrical systems
US11651665B2 (en) 2013-07-26 2023-05-16 Skybell Technologies Ip, Llc Doorbell communities
US11140253B2 (en) 2013-07-26 2021-10-05 Skybell Technologies Ip, Llc Doorbell communication and electrical systems
US11132877B2 (en) 2013-07-26 2021-09-28 Skybell Technologies Ip, Llc Doorbell communities
US11102027B2 (en) 2013-07-26 2021-08-24 Skybell Technologies Ip, Llc Doorbell communication systems and methods
US10204467B2 (en) 2013-07-26 2019-02-12 SkyBell Technologies, Inc. Smart lock systems and methods
US9736284B2 (en) 2013-07-26 2017-08-15 SkyBell Technologies, Inc. Doorbell communication and electrical systems
US11362853B2 (en) 2013-07-26 2022-06-14 Skybell Technologies Ip, Llc Doorbell communication systems and methods
US11386730B2 (en) 2013-07-26 2022-07-12 Skybell Technologies Ip, Llc Smart lock systems and methods
US10733823B2 (en) 2013-07-26 2020-08-04 Skybell Technologies Ip, Llc Garage door communication systems and methods
US10665072B1 (en) * 2013-11-12 2020-05-26 Kuna Systems Corporation Sensor to characterize the behavior of a visitor or a notable event
US9799183B2 (en) * 2013-12-06 2017-10-24 SkyBell Technologies, Inc. Doorbell package detection systems and methods
US9743049B2 (en) 2013-12-06 2017-08-22 SkyBell Technologies, Inc. Doorbell communication systems and methods
US9508239B1 (en) 2013-12-06 2016-11-29 SkyBell Technologies, Inc. Doorbell package detection systems and methods
US10462097B2 (en) * 2013-12-16 2019-10-29 Inbubbles Inc. Space time region based communications
US11140120B2 (en) * 2013-12-16 2021-10-05 Inbubbles Inc. Space time region based communications
US11615430B1 (en) * 2014-02-05 2023-03-28 Videomining Corporation Method and system for measuring in-store location effectiveness based on shopper response and behavior analysis
US11288606B2 (en) 2014-02-14 2022-03-29 Bby Solutions, Inc. Wireless customer and labor management optimization in retail settings
US10572843B2 (en) * 2014-02-14 2020-02-25 Bby Solutions, Inc. Wireless customer and labor management optimization in retail settings
US10649634B2 (en) * 2014-06-06 2020-05-12 International Business Machines Corporation Indexing and annotating a usability test recording
US20150356062A1 (en) * 2014-06-06 2015-12-10 International Business Machines Corporation Indexing and annotating a usability test recording
US11184589B2 (en) 2014-06-23 2021-11-23 Skybell Technologies Ip, Llc Doorbell communication systems and methods
US11343473B2 (en) 2014-06-23 2022-05-24 Skybell Technologies Ip, Llc Doorbell communication systems and methods
US9769435B2 (en) 2014-08-11 2017-09-19 SkyBell Technologies, Inc. Monitoring systems and methods
US20160078302A1 (en) * 2014-09-11 2016-03-17 Iomniscient Pty Ltd. Image management system
US9892325B2 (en) * 2014-09-11 2018-02-13 Iomniscient Pty Ltd Image management system
US10044519B2 (en) 2015-01-05 2018-08-07 SkyBell Technologies, Inc. Doorbell communication systems and methods
US9997036B2 (en) 2015-02-17 2018-06-12 SkyBell Technologies, Inc. Power outlet cameras
US11228739B2 (en) 2015-03-07 2022-01-18 Skybell Technologies Ip, Llc Garage door communication systems and methods
US11871155B2 (en) 2015-03-07 2024-01-09 Skybell Technologies Ip, Llc Garage door communication systems and methods
US11388373B2 (en) 2015-03-07 2022-07-12 Skybell Technologies Ip, Llc Garage door communication systems and methods
US11575537B2 (en) 2015-03-27 2023-02-07 Skybell Technologies Ip, Llc Doorbell communication systems and methods
US11381686B2 (en) 2015-04-13 2022-07-05 Skybell Technologies Ip, Llc Power outlet cameras
EP3311334A4 (en) * 2015-06-18 2019-08-07 Wizr Cloud platform with multi camera synchronization
JP2018534826A (en) * 2015-09-23 2018-11-22 Nokia Technologies Oy Select video content
US11354683B1 (en) 2015-12-30 2022-06-07 Videomining Corporation Method and system for creating anonymous shopper panel using multi-modal sensor fusion
US10489660B2 (en) 2016-01-21 2019-11-26 Wizr Llc Video processing with object identification
EP3405889A4 (en) * 2016-01-21 2019-08-28 Wizr LLC Cloud platform with multi camera synchronization
US11361641B2 (en) 2016-01-27 2022-06-14 Skybell Technologies Ip, Llc Doorbell package detection systems and methods
US10262331B1 (en) 2016-01-29 2019-04-16 Videomining Corporation Cross-channel in-store shopper behavior analysis
US10963893B1 (en) 2016-02-23 2021-03-30 Videomining Corporation Personalized decision tree based on in-store behavior analysis
US10387896B1 (en) 2016-04-27 2019-08-20 Videomining Corporation At-shelf brand strength tracking and decision analytics
US10354262B1 (en) 2016-06-02 2019-07-16 Videomining Corporation Brand-switching analysis using longitudinal tracking of at-shelf shopper behavior
US11238401B1 (en) 2017-03-27 2022-02-01 Amazon Technologies, Inc. Identifying user-item interactions in an automated facility
US11887051B1 (en) 2017-03-27 2024-01-30 Amazon Technologies, Inc. Identifying user-item interactions in an automated facility
US11494729B1 (en) * 2017-03-27 2022-11-08 Amazon Technologies, Inc. Identifying user-item interactions in an automated facility
US11087271B1 (en) 2017-03-27 2021-08-10 Amazon Technologies, Inc. Identifying user-item interactions in an automated facility
US11810436B2 (en) 2017-09-18 2023-11-07 Skybell Technologies Ip, Llc Outdoor security systems and methods
US10909825B2 (en) 2017-09-18 2021-02-02 Skybell Technologies Ip, Llc Outdoor security systems and methods
US11651668B2 (en) 2017-10-20 2023-05-16 Skybell Technologies Ip, Llc Doorbell communities
US11210531B2 (en) * 2018-08-20 2021-12-28 Canon Kabushiki Kaisha Information processing apparatus for presenting location to be observed, and method of the same
US11074790B2 (en) 2019-08-24 2021-07-27 Skybell Technologies Ip, Llc Doorbell communication systems and methods
US11854376B2 (en) 2019-08-24 2023-12-26 Skybell Technologies Ip, Llc Doorbell communication systems and methods
US11380091B2 (en) 2019-10-25 2022-07-05 7-Eleven, Inc. System and method for populating a virtual shopping cart based on a verification of algorithmic determinations of items selected during a shopping session in a physical store
US11017229B2 (en) 2019-10-25 2021-05-25 7-Eleven, Inc. System and method for selectively verifying algorithmically populated shopping carts
US11023728B1 (en) 2019-10-25 2021-06-01 7-Eleven, Inc. Machine learning algorithm trained to identify algorithmically populated shopping carts as candidates for verification
US11151388B2 (en) 2019-10-25 2021-10-19 7-Eleven, Inc. Customer-based video feed
US11386647B2 (en) 2019-10-25 2022-07-12 7-Eleven, Inc. System and method for processing a refund request arising from a shopping session in a cashierless store
US11475656B2 (en) 2019-10-25 2022-10-18 7-Eleven, Inc. System and method for selectively verifying algorithmically populated shopping carts
US11475674B2 (en) 2019-10-25 2022-10-18 7-Eleven, Inc. Customer-based video feed
US10922555B1 (en) * 2019-10-25 2021-02-16 7-Eleven, Inc. Customer-based video feed
US11475657B2 (en) 2019-10-25 2022-10-18 7-Eleven, Inc. Machine learning algorithm trained to identify algorithmically populated shopping carts as candidates for verification
US20220264053A1 (en) * 2019-10-30 2022-08-18 Beijing Bytedance Network Technology Co., Ltd. Video processing method and device, terminal, and storage medium
US11374808B2 (en) 2020-05-29 2022-06-28 Corning Research & Development Corporation Automated logging of patching operations via mixed reality based labeling
US11295135B2 (en) * 2020-05-29 2022-04-05 Corning Research & Development Corporation Asset tracking of communication equipment via mixed reality based labeling
US11682214B2 (en) * 2021-10-05 2023-06-20 Motorola Solutions, Inc. Method, system and computer program product for reducing learning time for a newly installed camera
US20230103735A1 (en) * 2021-10-05 2023-04-06 Motorola Solutions, Inc. Method, system and computer program product for reducing learning time for a newly installed camera

Similar Documents

Publication Publication Date Title
US8665333B1 (en) Method and system for optimizing the observation and annotation of complex human behavior from video sources
US8295597B1 (en) Method and system for segmenting people in a physical space based on automatic behavior analysis
US11669979B2 (en) Method of searching data to identify images of an object captured by a camera system
US7957565B1 (en) Method and system for recognizing employees in a physical space based on automatic behavior analysis
US7930204B1 (en) Method and system for narrowcasting based on automatic analysis of customer behavior in a retail store
US8009863B1 (en) Method and system for analyzing shopping behavior using multiple sensor tracking
US9740977B1 (en) Method and system for recognizing the intentions of shoppers in retail aisles based on their trajectories
US7974869B1 (en) Method and system for automatically measuring and forecasting the behavioral characterization of customers to help customize programming contents in a media network
Ge et al. Vision-based analysis of small groups in pedestrian crowds
US8351647B2 (en) Automatic detection and aggregation of demographics and behavior of people
US8189926B2 (en) Method and system for automatically analyzing categories in a physical space based on the visual characterization of people
US8219438B1 (en) Method and system for measuring shopper response to products based on behavior and facial expression
Makris et al. Learning semantic scene models from observing activity in visual surveillance
US7987111B1 (en) Method and system for characterizing physical retail spaces by determining the demographic composition of people in the physical retail spaces utilizing video image analysis
US9471832B2 (en) Human activity determination from video
US20170169297A1 (en) Computer-vision-based group identification
Popa et al. Analysis of shopping behavior based on surveillance system
Benezeth et al. Abnormality detection using low-level co-occurring events
Popa et al. Semantic assessment of shopping behavior using trajectories, shopping related actions, and context information
Ferryman et al. Robust abandoned object detection integrating wide area visual surveillance and social context
Liu et al. Customer behavior classification using surveillance camera for marketing
US11004093B1 (en) Method and system for detecting shopping groups based on trajectory dynamics
US11615430B1 (en) Method and system for measuring in-store location effectiveness based on shopper response and behavior analysis
Huang et al. Unsupervised pedestrian re-identification for loitering detection
CN112347907A (en) 4S store visitor behavior analysis system based on Reid and face recognition technology

Legal Events

Date Code Title Description
AS Assignment

Owner name: VIDEOMINING CORPORATION, PENNSYLVANIA

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:SHARMA, RAJEEV;MUMMAREDDY, SATISH;SCHAPIRA, EMILIO;AND OTHERS;SIGNING DATES FROM 20080417 TO 20080505;REEL/FRAME:021067/0728

STCF Information on status: patent grant

Free format text: PATENTED CASE

AS Assignment

Owner name: SEIG TRUST #1 (PHILIP H. SEIG, TRUSTEE), PENNSYLVANIA

Free format text: SECURITY INTEREST;ASSIGNOR:VIDEOMINING CORPORATION;REEL/FRAME:033860/0257

Effective date: 20130531

Owner name: WEIDNER, DEAN A., PENNSYLVANIA

Free format text: SECURITY INTEREST;ASSIGNOR:VIDEOMINING CORPORATION;REEL/FRAME:033860/0257

Effective date: 20130531

Owner name: PARMER, GEORGE A., PENNSYLVANIA

Free format text: SECURITY INTEREST;ASSIGNOR:VIDEOMINING CORPORATION;REEL/FRAME:033860/0257

Effective date: 20130531

Owner name: PAPSON, MICHAEL G., PENNSYLVANIA

Free format text: SECURITY INTEREST;ASSIGNOR:VIDEOMINING CORPORATION;REEL/FRAME:033860/0257

Effective date: 20130531

Owner name: AGAMEMNON HOLDINGS, PENNSYLVANIA

Free format text: SECURITY INTEREST;ASSIGNOR:VIDEOMINING CORPORATION;REEL/FRAME:033860/0257

Effective date: 20130531

Owner name: PEARSON, CHARLES C., JR, PENNSYLVANIA

Free format text: SECURITY INTEREST;ASSIGNOR:VIDEOMINING CORPORATION;REEL/FRAME:033860/0257

Effective date: 20130531

Owner name: SCHIANO, ANTHONY J., PENNSYLVANIA

Free format text: SECURITY INTEREST;ASSIGNOR:VIDEOMINING CORPORATION;REEL/FRAME:033860/0257

Effective date: 20130531

Owner name: BRENNAN, MICHAEL, PENNSYLVANIA

Free format text: SECURITY INTEREST;ASSIGNOR:VIDEOMINING CORPORATION;REEL/FRAME:033860/0257

Effective date: 20130531

Owner name: MESSIAH COLLEGE, PENNSYLVANIA

Free format text: SECURITY INTEREST;ASSIGNOR:VIDEOMINING CORPORATION;REEL/FRAME:033860/0257

Effective date: 20130531

Owner name: SEIG TRUST #1 (PHILIP H. SEIG, TRUSTEE), PENNSYLVANIA

Free format text: SECURITY INTEREST;ASSIGNOR:VIDEOMINING CORPORATION;REEL/FRAME:033860/0257

Effective date: 20130531

Owner name: STRUTHERS, RICHARD K., PENNSYLVANIA

Free format text: SECURITY INTEREST;ASSIGNOR:VIDEOMINING CORPORATION;REEL/FRAME:033860/0257

Effective date: 20130531

Owner name: BENTZ, RICHARD E., PENNSYLVANIA

Free format text: SECURITY INTEREST;ASSIGNOR:VIDEOMINING CORPORATION;REEL/FRAME:033860/0257

Effective date: 20130531

Owner name: POOLE, ROBERT E., PENNSYLVANIA

Free format text: SECURITY INTEREST;ASSIGNOR:VIDEOMINING CORPORATION;REEL/FRAME:033860/0257

Effective date: 20130531

FEPP Fee payment procedure

Free format text: PAT HOLDER NO LONGER CLAIMS SMALL ENTITY STATUS, ENTITY STATUS SET TO UNDISCOUNTED (ORIGINAL EVENT CODE: STOL); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

AS Assignment

Owner name: 9051147 CANADA INC., CANADA

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:VIDEOMINING CORPORATION;REEL/FRAME:034881/0556

Effective date: 20150116

AS Assignment

Owner name: VIDEOMINING CORPORATION, PENNSYLVANIA

Free format text: RELEASE BY SECURED PARTY;ASSIGNORS:STRUTHERS, RICHARD K.;SEIG TRUST #1 (PHILIP H. SEIG, TRUSTEE);SCHIANO, ANTHONY J.;AND OTHERS;REEL/FRAME:034967/0881

Effective date: 20150116

AS Assignment

Owner name: VIDEO MINING CORPORATION, PENNSYLVANIA

Free format text: RELEASE BY SECURED PARTY;ASSIGNORS:PARMER, GEORGE A.;PEARSON, CHARLES C., JR;WEIDNER, DEAN A.;AND OTHERS;REEL/FRAME:035039/0632

Effective date: 20150116

AS Assignment

Owner name: HSBC BANK CANADA, CANADA

Free format text: SECURITY INTEREST;ASSIGNOR:CANADA INC.;REEL/FRAME:035387/0176

Effective date: 20150407

AS Assignment

Owner name: AVIGILON PATENT HOLDING 1 CORPORATION, BRITISH COLUMBIA

Free format text: CHANGE OF NAME;ASSIGNOR:9051147 CANADA INC.;REEL/FRAME:040886/0579

Effective date: 20151120

MAFP Maintenance fee payment

Free format text: PAYMENT OF MAINTENANCE FEE, 4TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1551)

Year of fee payment: 4

AS Assignment

Owner name: AVIGILON PATENT HOLDING 1 CORPORATION, CANADA

Free format text: RELEASE BY SECURED PARTY;ASSIGNOR:HSBC BANK CANADA;REEL/FRAME:046895/0803

Effective date: 20180813

MAFP Maintenance fee payment

Free format text: PAYMENT OF MAINTENANCE FEE, 8TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1552); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

Year of fee payment: 8

AS Assignment

Owner name: MOTOROLA SOLUTIONS, INC., ILLINOIS

Free format text: NUNC PRO TUNC ASSIGNMENT;ASSIGNOR:AVIGILON PATENT HOLDING 1 CORPORATION;REEL/FRAME:062034/0176

Effective date: 20220411