WO2012027891A1 - Video analytics for security systems and methods - Google Patents


Info

Publication number
WO2012027891A1
Authority
WO
WIPO (PCT)
Prior art keywords
video
video analytics
analytics
messages
information
Application number
PCT/CN2010/076555
Other languages
French (fr)
Inventor
Fang Shi
Changsong Qi
Ming Jin
Keqiang Dai
Original Assignee
Intersil Americas Inc.
Application filed by Intersil Americas Inc. filed Critical Intersil Americas Inc.
Priority to PCT/CN2010/076555 priority Critical patent/WO2012027891A1/en
Priority to CN201080061991.4A priority patent/CN102726042B/en
Priority to US13/225,269 priority patent/US8824554B2/en
Priority to US13/225,222 priority patent/US20120057629A1/en
Priority to US13/225,238 priority patent/US20120057640A1/en
Priority to US13/225,202 priority patent/US20120057633A1/en
Publication of WO2012027891A1 publication Critical patent/WO2012027891A1/en
Priority to US14/472,313 priority patent/US9609348B2/en


Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00Television systems
    • H04N7/18Closed-circuit television [CCTV] systems, i.e. systems in which the video signal is not broadcast

Abstract

Video processing, encoding and decoding systems are described. A processor receives video frames representative of a sequence of images captured by a video sensor, and the video frames are encoded according to a desired video encoding standard. A video analytics processor receives video analytics metadata generated by the video encoder from the sequence of images and produces video analytics messages for transmission to a client device, which performs client-side video analytics processing. The video analytics metadata may comprise pixel domain video analytics information obtained directly from an analog-to-digital front end or directly from an encoding engine as the engine is performing compression.

Description

VIDEO ANALYTICS FOR SECURITY SYSTEMS AND METHODS
Cross-Reference to Related Applications
[0001] The present Application is related to concurrently filed applications entitled "Video Classification Systems and Methods," "Rho-Domain Metrics" and "Systems And Methods for Video Content Analysis," which are expressly incorporated by reference herein.
BRIEF DESCRIPTION OF THE DRAWINGS
[0002] Fig. 1 is a block schematic illustrating a simplified example of a video security surveillance analytics architecture according to certain aspects of the invention.
[0003] Fig. 2 is a block schematic depicting an example of a video analytics engine according to certain aspects of the invention.
[0004] Fig. 3 depicts an example of H.264 standards-defined bitstream syntax.
[0005] Fig. 4A is an image that includes both foreground and background objects.
[0006] Fig. 4B is the image of 4A from which foreground objects have been extracted using techniques according to certain aspects of the invention.
[0007] Figs. 5A and 5B are images illustrating virtual line counting according to certain aspects of the invention.
[0008] Fig. 6 is a simplified block schematic illustrating a processing system employed in certain embodiments of the invention.
DETAILED DESCRIPTION OF THE INVENTION
[0009] Embodiments of the present invention will now be described in detail with reference to the drawings, which are provided as illustrative examples so as to enable those skilled in the art to practice the invention. Notably, the figures and examples below are not meant to limit the scope of the present invention to a single embodiment, but other embodiments are possible by way of interchange of some or all of the described or illustrated elements. Wherever convenient, the same reference numbers will be used throughout the drawings to refer to same or like parts. Where certain elements of these embodiments can be partially or fully implemented using known components, only those portions of such known components that are necessary for an understanding of the present invention will be described, and detailed descriptions of other portions of such known components will be omitted so as not to obscure the invention. In the present specification, an embodiment showing a singular component should not be considered limiting; rather, the invention is intended to encompass other embodiments including a plurality of the same component, and vice-versa, unless explicitly stated otherwise herein.
Moreover, applicants do not intend for any term in the specification or claims to be ascribed an uncommon or special meaning unless explicitly set forth as such.
Further, the present invention encompasses present and future known equivalents to the components referred to herein by way of illustration.
[0010] Certain embodiments of the invention comprise systems having an architecture consistent with certain aspects of the invention and that are operable to perform video analytics for security applications. A simplified example of a video security surveillance analytics architecture is shown in Fig. 1 . In the example, the system is partitioned into server 10 and client 12 elements. The terms server and client are used here to include hardware and software systems, apparatus and other components that perform types of functions that can be attributed to server side and client side operations. It will be appreciated that certain elements may be provided on either or both server 10 and client 12 sides and that at least some client and server functionality may be committed to hardware components such as application specific integrated circuits, sequencers, custom logic devices as needed, typically to improve one or more of efficiency, reliability, processing speed and security.
[0011] On server side 10, a video sensor 100 can be configured to capture information representative of a sequence of images, including video data, and passes the information to a video encoder module 102 adapted for use in embodiments of the invention. One example of such a video encoder module 102 is the TW5864 from Intersil Techwell Inc., which can be adapted and/or configured to generate video analytics meta-data ("VAMD") 103. In certain embodiments, the video encoder 102 can typically be configured to generate compressed video bitstreams that may comply with industry standards and/or may be generated according to a proprietary specification. The video encoder 102 is typically configurable to produce video analytics meta-data. VAMD 103 may comprise pixel domain video analytics information, such as information obtained directly from an analog-to-digital ("A/D") front end and/or from an encoding engine as the engine is performing compression. VAMD 103 may also comprise block-based video analytics information, such as macroblock ("MB") level information (a macroblock being a 16x16 pixel block), including motion vectors, MB-type and/or the number of non-zero coefficients, etc.
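The VAMD fields enumerated above can be pictured as a small container type. The sketch below is illustrative only; the field names and units are assumptions for exposition, not the actual metadata layout produced by the TW5864 or any particular encoder:

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class MacroblockVAMD:
    """Per-macroblock analytics metadata (one 16x16 pixel block)."""
    mb_x: int                       # MB column index within the frame
    mb_y: int                       # MB row index within the frame
    motion_vector: Tuple[int, int]  # (dx, dy), e.g. in quarter-pel units
    mb_type: str                    # e.g. "I", "P_SKIP", "P_16x16"
    nonzero_coeffs: int             # count of non-zero transform coefficients

@dataclass
class FrameVAMD:
    """VAMD for one frame: pixel-domain flags plus MB-level records."""
    frame_number: int
    motion_flag: bool               # pixel-domain flag from the A/D front end
    macroblocks: List[MacroblockVAMD] = field(default_factory=list)
```

A downstream video analytics engine would consume a stream of such `FrameVAMD` records rather than decoded pixels.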
[0012] Video analytics engine ("VAE") 104 can be configured to receive the VAMD 103 and to process the VAMD 103 using one or more video analytics algorithms based on application requirements. VAE 104 can generate useful video analytics results, such as a background model, motion alarms, virtual line detections, electronic image stabilization parameters, etc. A more detailed example of a VAE 104 is shown in Fig. 2. Video analytics results can comprise video analytics messages ("VAM") that may be categorized into a global VAM class and a local VAM class. Global VAM includes video analytics messages applicable to a group of pictures, such as background frames, foreground object segmentation descriptors, camera parameters, coordinates and indices of predefined motion alarm regions, virtual lines, etc. Local VAM can be defined as localized VAM applied to a specific individual video frame, and can include global motion vectors of a current frame, motion alarm region alarm status of the current frame, virtual line counting results, object tracking parameters, camera moving parameters, and so on.
[0013] In certain embodiments, an encoder generated video bitstream, VAMD 103 and VAE generated VAM are packed together as a layered structure into a network bitstream following a predefined package format. The network bitstream can be sent through a network to the client side of the system. The network bitstream may be stored locally and/or on a server or a remote storage device for future playback and/or dissemination.
[0014] Fig. 3 depicts an example of an H.264 standards-defined bitstream syntax, in which VAM and VAMD 103 can be packed into a supplemental enhancement information ("SEI") network abstraction layer ("NAL") package unit. Following the SPS, PPS and IDR NALs, a Global VA ("GVA") SEI NAL can be inserted into the network bitstream. The GVA NAL may include the global video analytics messages for a corresponding group of pictures, a pointer to the first local VA SEI NAL location within the group of pictures, and a pointer to the next GVA NAL, and may include an indication of the range of frames to which the GVA is applicable. Following each individual frame that is associated with a VAM or a VAMD 103, a local VA ("LVA") SEI NAL is inserted right after the frame's payload NAL. The LVA can comprise local VAM, VAMD 103 information and a pointer to the location of the next frame that has an LVA SEI NAL. The amount of VAMD 103 packed into the LVA NAL depends on network bandwidth conditions and the complexity of the user's VA requirements. For example, if sufficient network bandwidth is available, additional VAMD 103 can be packed. The VAMD 103 can be used by client side video analytics systems and may simplify and/or optimize performance of certain functions. When network bandwidth is limited, less VAMD 103 may be sent to meet the network bandwidth constraints. Fig. 3 illustrates a bitstream format for the H.264 standard, but it will be appreciated that the principles involved may be applied in other video standards implementations.
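One plausible way to carry VAM in an SEI NAL unit, as the paragraph above describes, is H.264's generic user_data_unregistered SEI payload (type 5). The patent does not specify a payload type or identifying UUID, so both are assumptions in this sketch; the payload size coding, emulation prevention and NAL header do follow the standard's rules:

```python
def ff_code(value: int) -> bytes:
    """SEI payload type/size coding: emit 0xFF per 255, then the remainder."""
    out = bytearray()
    while value >= 255:
        out.append(0xFF)
        value -= 255
    out.append(value)
    return bytes(out)

def emulation_prevent(rbsp: bytes) -> bytes:
    """Insert an emulation-prevention 0x03 after any 0x00 0x00 pair
    followed by a byte in 0x00..0x03, so payload never mimics a start code."""
    out, zeros = bytearray(), 0
    for b in rbsp:
        if zeros >= 2 and b <= 3:
            out.append(0x03)
            zeros = 0
        out.append(b)
        zeros = zeros + 1 if b == 0 else 0
    return bytes(out)

VA_UUID = bytes(16)  # placeholder UUID identifying this VAM payload (assumption)

def pack_vam_sei(vam_payload: bytes) -> bytes:
    """Wrap serialized VAM bytes in a user_data_unregistered (type 5) SEI NAL
    unit, prefixed with a 4-byte Annex B start code and SEI NAL header (0x06)."""
    body = VA_UUID + vam_payload
    rbsp = ff_code(5) + ff_code(len(body)) + body + b"\x80"  # rbsp trailing bits
    return b"\x00\x00\x00\x01\x06" + emulation_prevent(rbsp)
```

A GVA or LVA message serialized to bytes could be passed to `pack_vam_sei` and spliced into the bitstream at the positions the paragraph describes.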
[0015] According to certain aspects of the invention, the advantages of a layered video analytics system architecture can include facilitating and/or enabling a balanced partition of video analytics at multiple layers. These layers may include server and client layers, pixel domain layers and motion domain layers. For example, global VA messages such as background frames, segmented object descriptors and camera parameters can enable cost-efficient yet sophisticated video analytics on the receiver side for many advanced intelligent video applications. The VAM enables a level of video analytics efficiency, in terms of computational complexity and analytic accuracy, that would otherwise be difficult or impossible to achieve.
[0016] In certain embodiments of the invention, the client side receives a network bitstream sent from the server side and separates the compressed video bitstream, the VAMD 103 and the VAM from the network bitstream. Video analytics techniques may then be applied as appropriate for the application at hand. For example, analytics may include background extraction, motion tracking, object detection, etc., and the analytics may be selected based on speed requirements, efficiency objectives, and on the available VAM and VAMD 103.
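On the client side, separating the embedded analytics from the video amounts to splitting the Annex B byte stream into NAL units and filtering on the SEI NAL type (6). A minimal sketch, assuming 3- or 4-byte start codes and well-formed input:

```python
def split_nal_units(stream: bytes):
    """Split an H.264 Annex B byte stream on 00 00 01 / 00 00 00 01 start codes."""
    units, i = [], 0
    while True:
        j = stream.find(b"\x00\x00\x01", i)
        if j < 0:
            break
        start = j + 3
        k = stream.find(b"\x00\x00\x01", start)
        # If the next start code is the 4-byte form, its leading 0x00 is not payload.
        end = len(stream) if k < 0 else (k - 1 if stream[k - 1] == 0 else k)
        units.append(stream[start:end])
        i = start
    return units

def demux(stream: bytes):
    """Separate SEI NALs (carrying VAM/VAMD) from the video NALs.
    The low 5 bits of the first NAL byte are nal_unit_type; SEI is type 6."""
    sei, video = [], []
    for nal in split_nal_units(stream):
        (sei if nal and nal[0] & 0x1F == 6 else video).append(nal)
    return sei, video
```

The `sei` list would then be parsed for GVA/LVA messages while `video` feeds the normal decoder path.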
[0017] In certain embodiments, VAMD 103 can comprise any video encoding intermediate data such as MB-type, motion vectors, non-zero coefficients (as per the H.264 standard), quantization parameters, DC or AC information, the motion estimation metric sum of absolute differences ("SAD"), etc. The VAMD 103 can also comprise any other useful information, such as the motionFlag information generated in an analog-to-digital front end module of the kind found, for example, in the TW5864 device referenced above. VAMD is typically processed in VAE 104 to generate more advanced intelligent video information that may include, for example, motion indexing, background extraction, object segmentation, motion detection, virtual line detection, object counting, motion tracking and speed estimation.
[0018] Certain advantages may accrue from the video analytics system architecture and the layered video analytics information embedded in network bitstreams according to certain aspects of the invention.
[0019] Certain embodiments provide greatly improved video analytics efficiency on the client side. In one example of a video analytics system according to certain aspects of the invention, VAE 104 processes the encoder feedback VAMD 103 and generates various useful video analytics information that may be embedded in the network bitstream. This embedded, layered VAM gives users direct access to the video analytics messages of interest and allows the VAM to be used with limited or no additional processing. In one example, no additional processing would be necessary to access the motion frames, the number of objects passing a virtual line, object moving speed and classification, etc. Information related to object tracking may be obtained by limited additional processing related to the motion of the identified object, and information related to electronic image stabilization may be obtained by limited additional processing based on the global motion information of the VAM. Accordingly, client side VA efficiency can be optimized and performance can be greatly improved, consequently enabling processing of an increased number of channels.
[0020] Certain embodiments enable operation of high-accuracy video analytics applications on the client side. According to certain aspects of the invention, client side video analytics may be performed using information generated on the server side. Without VAM embedded in the bitstream, client side video analytics processing would have to rely on video reconstructed from the decoded bitstream. The decoded bitstream typically lacks some of the detailed information of the original video content, which may be discarded or lost in the video compression process. Consequently, video analytics performed solely on the client side cannot preserve the accuracy that can be obtained on the server side, where VAMD is generated from the original video content. Loss of accuracy in analytics limited to the client side can manifest, for example, in the computed geometric center of an object, in object segmentation, etc. Therefore, embedded VAM enables high accuracy video analytics from a whole-system point of view.
[0021] Certain embodiments of the invention enable fast video indexing, searching and other applications. In particular, embedded, layered VAM in the network bitstream enables fast video indexing, video searching, video classification and other applications on the client side. For instance, the motion detection information, object indexing, foreground and background partition, human detection and human behavior classification information of the VAM can simplify client-side and/or downstream tasks that include, for example, video indexing, classification and fast searching in the client. Without VAM, a client generally needs vast computational power to process the video data and to rebuild the required video analytics information for a variety of applications including the above-listed applications. It will be appreciated that not all VAM can be accurately reconstructed from the video bitstream and that certain applications cannot be performed when VAM is not available; examples include human behavioral analysis applications.
[0022] Certain embodiments of the invention enable increased server/client algorithm complexity, partitioning of computational capability and balancing of network bandwidth. In certain embodiments, a video analytics system architecture is provided in which video analytics can be partitioned between server and client sides based on network bandwidth availability, server and client computational capability and the complexity of the video analytics. In one example, in response to low network bandwidth conditions, the system can embed more condensed VAM in the network bitstream after processing by the VAE. The VAM can include a motion frame index, an object index, and so on. After extracting the VAM from the bitstream, the client can utilize the VAM to assist further video analytics processing. VAMD can be embedded into the network bitstream with limited or no processing by the VAE when computational power is limited on the server side. Power may be limited on the server side when, for example, the server side system is embodied in a digital video recorder ("DVR") or network video recorder ("NVR"). Certain embodiments may use client side systems to process the embedded VAMD and accomplish the desired video analytics functions. In some embodiments, more video analytics functions can be partitioned and/or assigned to the server side when, for example, the client side needs to monitor/process multiple channels simultaneously. It will be appreciated, therefore, that a balanced video analytics system can be achieved for a variety of system configurations.
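The partitioning policy described above might be sketched as a simple selection function; the layer names and numeric thresholds below are purely illustrative assumptions, not values from the patent:

```python
def select_vamd_layers(bandwidth_kbps: float, server_cpu_idle: float) -> list:
    """Choose which analytics layers to embed in the network bitstream,
    trading link bandwidth against client-side computation.
    All layer names and thresholds are illustrative assumptions."""
    # Condensed VAM (post-VAE results) is small, so it is always sent.
    layers = ["motion_frame_index", "object_index"]
    if bandwidth_kbps > 2000:
        # Spare bandwidth: also embed raw MB-level VAMD for the client.
        layers += ["motion_vectors", "mb_types"]
    if bandwidth_kbps > 4000:
        layers += ["nonzero_coeff_counts", "sad_metrics"]
    if server_cpu_idle < 0.2:
        # Little server headroom (e.g. an embedded DVR/NVR): skip VAE
        # post-processing and push raw VAMD for the client to analyze.
        layers.append("raw_vamd_passthrough")
    return layers
```

A real system would likely re-evaluate such a policy continuously as measured bandwidth and load change.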
Examples
[0023] Certain embodiments provide electronic image stabilization ("EIS") capabilities 220. EIS 220 finds wide application in video security. A currently captured video frame is processed with reference to one or more previously reconstructed reference frames to generate a global motion vector 202 for the current frame; the global motion vector is then used on the client side to compensate the reconstructed image, reducing or eliminating image instability or shaking.
[0024] In a conventional pixel domain EIS algorithm, the current and previous reference frames are fetched, a block based or grey-level histogram based matching algorithm is applied to obtain local motion vectors, and the local motion vectors are processed to generate a pixel domain global motion vector. The drawbacks of the conventional approach include the high computational cost associated with the matching algorithm used to generate local motion vectors and the very high memory bandwidth required to fetch both the current reconstructed frame and the previous reference frames.
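A conventional pixel-domain full-search matcher of the kind described above can be sketched as follows; the cost it illustrates (a full block comparison at every candidate offset) is exactly the expense the VAMD-based approach avoids:

```python
def sad(block_a, block_b):
    """Sum of absolute differences between two equal-size blocks."""
    return sum(abs(a - b)
               for row_a, row_b in zip(block_a, block_b)
               for a, b in zip(row_a, row_b))

def block_match(cur, ref, bx, by, bs=16, search=8):
    """Full-search block matching: find the (dx, dy) motion vector that
    minimizes SAD for the bs x bs block at (bx, by). cur/ref are 2-D lists
    of luma samples; cost is O(search^2 * bs^2) per block."""
    h, w = len(ref), len(ref[0])
    cur_blk = [row[bx:bx + bs] for row in cur[by:by + bs]]
    best = (0, 0, float("inf"))
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            x, y = bx + dx, by + dy
            if 0 <= x <= w - bs and 0 <= y <= h - bs:
                ref_blk = [row[x:x + bs] for row in ref[y:y + bs]]
                cost = sad(cur_blk, ref_blk)
                if cost < best[2]:
                    best = (dx, dy, cost)
    return best[:2]
```

Running this for every macroblock of every frame is what makes pixel-domain EIS expensive in both computation and memory traffic.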
[0025] In certain embodiments of the invention, the video encoding engine can generate VAMD 103 including block-based motion vectors, MB-type, etc., as a byproduct of video compression processing. The VAMD 103 is fed into VAE 104, which can simply process the VAMD 103 information to generate a global motion vector 202 as VAM. The VAM is then embedded into the network bitstream for transmission to the client side, typically over a network. A client processor can parse the network bitstream, extract the global motion information for each frame and apply global motion compensation to accomplish EIS 220.
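One common way to reduce the encoder's per-MB motion vectors to a single global motion vector, consistent with the paragraph above, is a component-wise median, which is robust to the outlier vectors contributed by independently moving foreground objects. The median choice is an illustrative assumption; the patent does not mandate a particular estimator:

```python
from statistics import median

def global_motion_vector(mb_motion_vectors):
    """Estimate the frame's global (camera) motion as the component-wise
    median of the macroblock motion vectors taken from the VAMD."""
    if not mb_motion_vectors:
        return (0, 0)
    return (median(v[0] for v in mb_motion_vectors),
            median(v[1] for v in mb_motion_vectors))

def stabilize_offset(gmv, accumulated):
    """EIS step on the client: accumulate global motion across frames and
    return the display offset (the negated accumulation) that cancels shake,
    along with the new accumulated motion."""
    acc = (accumulated[0] + gmv[0], accumulated[1] + gmv[1])
    return (-acc[0], -acc[1]), acc
```

Because the motion vectors already exist as a compression byproduct, this costs a single pass over the MB records instead of any pixel-domain matching.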
Video background modeling
[0026] Certain embodiments of the invention comprise a video background modeling feature that can construct or reconstruct a background image 222, which can provide highly desirable information for use in a wide variety of video surveillance applications, including motion detection, object segmentation, abandoned object detection, etc. Conventional pixel domain background extraction algorithms operate on a statistical model of co-located pixel values across multiple frames. For example, a Gaussian model is used to model the co-located pixels of N continuous frames and to select the mathematically most likely pixel value as the background pixel. If a video frame's height is denoted H, its width W, and N continuous frames are required to satisfy the statistical model, then a total of W*H*N pixels must be processed to generate a background frame.
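The conventional per-pixel scheme can be sketched as below; a per-pixel mode stands in here for the full Gaussian model (an assumption made to keep the sketch short), but the O(W*H*N) cost the paragraph describes is the same:

```python
from collections import Counter

def background_pixelwise(frames):
    """Conventional pixel-domain extraction: for each co-located pixel
    position across N frames, pick the most frequent value as background
    (a crude stand-in for selecting the most likely value under a
    Gaussian model). frames is a list of H x W 2-D lists; cost is O(W*H*N)."""
    h, w = len(frames[0]), len(frames[0][0])
    return [[Counter(f[y][x] for f in frames).most_common(1)[0][0]
             for x in range(w)]
            for y in range(h)]
```

Every pixel of every frame must be touched, which is the workload the MB-based approach of the next paragraph reduces by a factor of 256.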
[0027] In certain embodiments, MB-based VAMD 103 is used to generate the background information rather than pixel-based information. According to certain aspects of the invention, the volume of information processed from VAMD is typically only 1/256 of the volume of pixel-based information, since each 16x16 macroblock yields one record in place of 256 pixels. In one example, MB-based motion vector and non-zero-count information can be used to distinguish background from foreground moving objects. Fig. 4A shows an original image with background and foreground objects, and Fig. 4B shows a typical background extracted by processing VAMD 103.
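An MB-level background classifier consistent with the paragraph above might vote each macroblock as background whenever its motion vector and non-zero coefficient count stay small across the observed frames. The input layout and thresholds are assumptions for illustration:

```python
def background_macroblocks(frame_vamds, mb_w, mb_h, mv_thresh=1, nnz_thresh=4):
    """Classify each MB as background when, in a majority of frames, its
    motion vector is near zero and it has few non-zero coefficients.
    frame_vamds: one dict per frame, {(mb_x, mb_y): ((mvx, mvy), nnz)}.
    One decision per 16x16 MB instead of per pixel, hence 1/256 the data."""
    votes = [[0] * mb_w for _ in range(mb_h)]
    for vamd in frame_vamds:
        for (x, y), (mv, nnz) in vamd.items():
            if abs(mv[0]) <= mv_thresh and abs(mv[1]) <= mv_thresh and nnz <= nnz_thresh:
                votes[y][x] += 1
    n = len(frame_vamds)
    return [[votes[y][x] > n // 2 for x in range(mb_w)] for y in range(mb_h)]
```

The resulting boolean map marks which 16x16 regions belong to the background model; a background image could then be assembled from decoded samples of only those regions.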
[0028] Certain embodiments of the invention provide systems and methods for motion detection 200 and virtual line counting 201. A motion detector 200 can be used to automatically detect motion of objects including humans, animals and/or vehicles entering predefined regions of interest. Virtual line detection and counting module 201 can detect a moving object that crosses an invisible line defined by user configuration and can count the number of objects crossing the line, as illustrated in Figs. 5A and 5B. The virtual line can be based on actual lines in the image and can be a delineation of an area defined by a polygon, circle, ellipse or irregular area. In some embodiments, the number of objects crossing one or more lines can be recorded as an absolute number and/or as a statistical frequency, and an alarm may be generated to indicate any line crossing, a threshold frequency or absolute number of crossings, and/or an absence of crossings within a predetermined time. In certain embodiments, motion detection 200 and virtual line counting 201 can be achieved by processing one or more MB-based VAMDs. Information such as motion alarms and object counts across a virtual line can be packed as VAM and transmitted to the client side. Motion indexing, object counting and similar customized applications can easily be achieved by extracting the VAM with simple processing. It will be appreciated that configuration information may be provided from the client side to the server side as a form of feedback, using packed information as a basis for resetting lines, areas of interest and so on.
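Virtual line crossing detection can be reduced to a segment-intersection test between an object centroid's inter-frame displacement and the configured line. A minimal sketch, ignoring the degenerate collinear cases:

```python
def _side(p, a, b):
    """Sign of the cross product: which side of line a->b point p lies on."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])

def crossed_line(prev_pos, cur_pos, line_a, line_b):
    """True when an object's centroid moved from one side of the virtual
    line segment (line_a, line_b) to the other between two frames."""
    d1, d2 = _side(prev_pos, line_a, line_b), _side(cur_pos, line_a, line_b)
    d3, d4 = _side(line_a, prev_pos, cur_pos), _side(line_b, prev_pos, cur_pos)
    return (d1 > 0) != (d2 > 0) and (d3 > 0) != (d4 > 0)

class VirtualLineCounter:
    """Counts object centroid trajectories crossing a user-configured line."""
    def __init__(self, a, b):
        self.a, self.b, self.count = a, b, 0

    def update(self, prev_pos, cur_pos):
        if crossed_line(prev_pos, cur_pos, self.a, self.b):
            self.count += 1
        return self.count
```

In the architecture above, this test would run in the VAE against centroids derived from MB-based VAMD, and only the resulting count would be packed as VAM.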
[0029] Certain embodiments of the invention provide improved object tracking within a sequence of video frames using VAMD 103. Certain embodiments can facilitate client side measurement of speed of motion of objects and can assist in identifying directions of movement. Furthermore, VAMD 103 can provide useful information related to video mosaics 221.
System Description
[0030] Turning now to Fig. 6, certain embodiments of the invention employ a processing system that includes at least one computing system 60 deployed to perform certain of the steps described above. Computing system 60 may be a commercially available system that executes commercially available operating systems such as Microsoft Windows®, UNIX or a variant thereof, Linux, a real-time operating system and/or a proprietary operating system. The architecture of the computing system may be adapted, configured and/or designed for integration in the processing system, for embedding in one or more of an image capture system, communications device and/or graphics processing system. In one example, computing system 60 comprises a bus 602 and/or other mechanisms for communicating between processors, whether those processors are integral to the computing system 60 (e.g. 604, 605) or located in different, perhaps physically separated computing systems 60. Typically, processor 604 and/or 605 comprises a CISC or RISC computing processor and/or one or more digital signal processors. In some embodiments, processor 604 and/or 605 may be embodied in a custom device and/or may perform as a configurable sequencer. Device drivers 603 may provide output signals used to control internal and external components and to communicate between processors 604 and 605.
[0031] Computing system 60 also typically comprises memory 606 that may include one or more of random access memory ("RAM"), static memory, cache, flash memory and any other suitable type of storage device that can be coupled to bus 602. Memory 606 can be used for storing instructions and data that can cause one or more of processors 604 and 605 to perform a desired process. Main memory 606 may be used for storing transient and/or temporary data such as variables and intermediate information generated and/or used during execution of the instructions by processor 604 or 605. Computing system 60 also typically comprises non-volatile storage such as read only memory ("ROM") 608, flash memory, memory cards or the like; non-volatile storage may be connected to the bus 602, but may equally be connected using a high-speed universal serial bus (USB), Firewire or other such bus that is coupled to bus 602. Non-volatile storage can be used for storing configuration and other information, including instructions executed by processors 604 and/or 605. Non-volatile storage may also include mass storage device 610, such as a magnetic disk, optical disk or flash disk, that may be directly or indirectly coupled to bus 602 and used for storing instructions to be executed by processors 604 and/or 605, as well as other information.
[0032] In some embodiments, computing system 60 may be communicatively coupled to a display system 612, such as an LCD flat panel display, including touch panel displays, electroluminescent display, plasma display, cathode ray tube or other display device that can be configured and adapted to receive and display information to a user of computing system 60. Typically, device drivers 603 can include a display driver, graphics adapter and/or other modules that maintain a digital representation of a display and convert the digital representation to a signal for driving a display system 612.
Display system 612 may also include logic and software to generate a display from a signal provided by computing system 60. In that regard, display 612 may be provided as a remote terminal or in a session on a different computing system 60. An input device 614 is generally provided locally or through a remote system and typically provides for alphanumeric input as well as cursor control 616 input, such as a mouse, a trackball, etc. It will be appreciated that input and output can be provided to a wireless device such as a PDA, a tablet computer or other system suitably equipped to display the images and provide user input.
[0033] According to one embodiment of the invention, portions of the described invention may be performed by computing system 60. Processor 604 executes one or more sequences of instructions. For example, such instructions may be stored in main memory 606, having been received from a computer-readable medium such as storage device 610. Execution of the sequences of instructions contained in main memory 606 causes processor 604 to perform process steps according to certain aspects of the invention. In certain embodiments, functionality may be provided by embedded computing systems that perform specific functions wherein the embedded systems employ a customized combination of hardware and software to perform a set of predefined tasks. Thus, embodiments of the invention are not limited to any specific combination of hardware circuitry and software.
[0034] The term "computer-readable medium" is used to define any medium that can store and provide instructions and other data to processor 604 and/or 605, particularly where the instructions are to be executed by processor 604 and/or 605 and/or another peripheral of the processing system. Such a medium can include non-volatile storage, volatile storage and transmission media. Non-volatile storage may be embodied on media such as optical or magnetic disks, including DVD, CD-ROM and BluRay. Storage may be provided locally and in physical proximity to processors 604 and 605, or remotely, typically by use of a network connection. Non-volatile storage may be removable from computing system 60, as in the example of BluRay, DVD or CD storage, or memory cards or sticks that can be easily connected to or disconnected from a computer using a standard interface, including USB, etc. Thus, computer-readable media can include floppy disks, flexible disks, hard disks, magnetic tape, any other magnetic medium, CD-ROMs, DVDs, BluRay, any other optical medium, punch cards, paper tape, any other physical medium with patterns of holes, RAM, PROM, EPROM, FLASH/EEPROM, any other memory chip or cartridge, or any other medium from which a computer can read.
[0035] Transmission media can be used to connect elements of the processing system and/or components of computing system 60. Such media can include twisted pair wiring, coaxial cables, copper wire and fiber optics. Transmission media can also include wireless media such as radio, acoustic and light waves. In particular, radio frequency (RF), fiber optic and infrared (IR) data communications may be used.
[0036] Various forms of computer readable media may participate in providing instructions and data for execution by processor 604 and/or 605. For example, the instructions may initially be retrieved from a magnetic disk of a remote computer and transmitted over a network or modem to computing system 60. The instructions may optionally be stored in a different storage or a different part of storage prior to or during execution.
[0037] Computing system 60 may include a communication interface 618 that provides two-way data communication over a network 620 that can include a local network 622, a wide area network or some combination of the two. For example, an integrated services digital network (ISDN) may be used in combination with a local area network (LAN). In another example, a LAN may include a wireless link. Network link 620 typically provides data communication through one or more networks to other data devices. For example, network link 620 may provide a connection through local network 622 to a host computer 624 or to a wide area network such as the Internet 628. Local network 622 and Internet 628 may both use electrical, electromagnetic or optical signals that carry digital data streams.
[0038] Computing system 60 can use one or more networks to send messages and data, including program code and other information. In the Internet example, a server 630 might transmit a requested code for an application program through Internet 628 and may receive in response a downloaded application that provides or augments functional modules such as those described in the examples above. The received code may be executed by processor 604 and/or 605.
[0039] Additional Descriptions of Certain Aspects of the Invention
[0040] The foregoing descriptions of the invention are intended to be illustrative and not limiting. For example, those skilled in the art will appreciate that the invention can be practiced with various combinations of the functionalities and capabilities described above, and can include fewer or additional components than described above. Certain additional aspects and features of the invention are further set forth below, and can be obtained using the functionalities and components described in more detail above, as will be appreciated by those skilled in the art after being taught by the present disclosure.
[0041] Certain embodiments of the invention provide video processing systems and methods. Some of these embodiments comprise a processor configured to receive video frames representative of a sequence of images captured by a video sensor. Some of these embodiments comprise a video encoder operative to encode the video frames according to a desired video encoding standard. Some of these embodiments comprise a video analytics processor that receives video analytics metadata generated by the video encoder from the sequence of images. In some of these embodiments, the video analytics processor is configurable to produce video analytics messages for transmission to a client device. In some of these
embodiments, the video analytics messages are used for client side video analytics processing.
[0042] In some of these embodiments, the video analytics metadata comprise pixel domain video analytics information. In some of these embodiments, the pixel domain video analytics information includes information received directly from an analog-to-digital front end. In some of these embodiments, the pixel domain video analytics information includes information received directly from an encoding engine as the engine is performing compression. In some of these embodiments, the video analytics messages include information related to one or more of a background model, a motion alarm, a virtual line detection and electronic image stabilization parameters. In some of these embodiments, the video analytics messages comprise video analytics messages related to a group of images, including messages related to one or more of a background frame, a foreground object segmentation descriptor, a camera parameter, a virtual line and a predefined motion alarm region.
[0043] In some of these embodiments, the video analytics messages comprise video analytics messages related to an individual video frame, including messages related to one or more of a global motion vector, a motion alarm region alarm status, a virtual line count, an object tracking parameter and a camera motion parameter. In some of these embodiments, the video analytics messages are transmitted to the client device in a layered-structure network bitstream comprising an encoder-generated video bitstream and a portion of the video analytics metadata. In some of these embodiments, the video analytics messages and the portion of the video analytics metadata are transmitted in a supplemental enhancement information network abstraction layer package unit of an H.264 bitstream.
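The H.264 mechanism referenced above can be sketched concretely. The standard's "user data unregistered" SEI payload (payloadType 5) carries a 16-byte UUID followed by arbitrary bytes, which makes it a natural container for analytics messages. The sketch below is a simplification under stated assumptions: the UUID is a placeholder, the message encoding (JSON) is illustrative, and emulation-prevention byte insertion is omitted even though a conforming bitstream requires it.

```python
import json

# 16-byte UUID identifying the (hypothetical) analytics payload format.
ANALYTICS_UUID = bytes(16)  # illustrative placeholder, not a registered UUID


def sei_user_data(message: dict) -> bytes:
    """Build an H.264 SEI NAL unit (user_data_unregistered, payloadType 5)
    carrying one analytics message."""
    payload = ANALYTICS_UUID + json.dumps(message).encode()
    out = bytearray([0x06])      # NAL header: nal_ref_idc 0, nal_unit_type 6 (SEI)
    out.append(5)                # payloadType 5: user data unregistered
    size = len(payload)
    while size >= 255:           # payloadSize coded as 0xFF chunks + remainder
        out.append(255)
        size -= 255
    out.append(size)
    out += payload
    out.append(0x80)             # rbsp_trailing_bits (stop bit + alignment)
    return bytes(out)


nal = sei_user_data({"frame": 7, "virtual_line_count": 2})
```

Because SEI NAL units are optional to decode, a standard H.264 player simply skips them, while an analytics-aware client can extract the messages — which is what makes the layered-bitstream approach backward compatible.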
[0044] Certain embodiments of the invention provide video decoding systems and methods. Some of these embodiments comprise a decoder configured to extract a video frame and one or more video analytics messages from a network bitstream. In some of these embodiments, the video analytics messages provide information related to characteristics of the video frame. Some of these embodiments comprise one or more video processors configured to produce video analytics metadata related to the video frame based on content of the video frame and the video analytics messages.
[0045] In some of these embodiments, the video analytics metadata comprise pixel domain video analytics information received directly from an analog-to-digital front end. In some of these embodiments, the video analytics metadata comprise pixel domain video analytics information received directly from an encoding engine as the engine was performing compression. In some of these embodiments, the video analytics messages comprise video analytics messages related to a plurality of video frames, including messages related to one or more of a background frame, a foreground object segmentation descriptor, a camera parameter, a virtual line and a predefined motion alarm region. In some of these embodiments, the video analytics messages comprise video analytics messages related to an individual video frame, including messages related to one or more of a global motion vector, a motion alarm region alarm status, a virtual line count, an object tracking parameter and a camera motion parameter.
[0046] In some of these embodiments, the video analytics messages are received in a supplemental enhancement information network abstraction layer package unit of an H.264 bitstream. In some of these embodiments, the video analytics messages are received in a supplemental enhancement information network abstraction layer package unit of an H.264 bitstream together with a portion of the pixel domain video analytics information. In some of these embodiments, the one or more video processors are configured to produce a global motion vector. In some of these embodiments, the one or more video processors provide electronic image stabilization based on the video analytics messages. In some of these embodiments, the one or more video processors extract a background image for a plurality of video frames based on the video analytics messages. In some of these embodiments, the one or more video processors use the video analytics messages to monitor objects crossing a virtual line in a plurality of video frames.
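The client-side virtual-line use case above can be sketched as follows. The message field names (`virtual_lines`, the line identifier `"gate_a"`) are assumptions for illustration, not defined by the patent: the decoder surfaces one analytics message per frame, and a monitor accumulates the reported crossing counts across a plurality of frames.

```python
def count_crossings(messages, line_id):
    """Sum the per-frame crossing counts reported for one virtual line."""
    return sum(m.get("virtual_lines", {}).get(line_id, 0) for m in messages)


# Per-frame analytics messages as a decoder might surface them.
decoded = [
    {"frame": 0, "virtual_lines": {"gate_a": 1}},
    {"frame": 1, "virtual_lines": {"gate_a": 0}},
    {"frame": 2, "virtual_lines": {"gate_a": 2}},
]
total = count_crossings(decoded, "gate_a")
```

Note that the client never touches pixel data here: the counting work was done at the encoder, and the client only aggregates messages, which is the division of labor the embodiments describe.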
[0047] Although the present invention has been described with reference to specific exemplary embodiments, it will be evident to one of ordinary skill in the art that various modifications and changes may be made to these embodiments without departing from the broader spirit and scope of the invention. Accordingly, the specification and drawings are to be regarded in an illustrative rather than a restrictive sense.

Claims

WHAT IS CLAIMED IS:
1. A video processing system comprising:
a processor configured to receive video frames representative of a sequence of images captured by a video sensor;
a video encoder operative to encode the video frames according to a desired video encoding standard;
a video analytics processor that receives video analytics metadata generated by the video encoder from the sequence of images, wherein the video analytics processor is configurable to produce video analytics messages for transmission to a client device, wherein the video analytics messages are used for client side video analytics processing.
2. The video processing system of claim 1, wherein the video analytics metadata comprise pixel domain video analytics information.
3. The video processing system of claim 2, wherein the pixel domain video analytics information includes information received directly from an analog-to-digital front end.
4. The video processing system of claim 2, wherein the pixel domain video analytics information includes information received directly from an encoding engine as the engine is performing compression.
5. The video processing system of any of claims 1-4, wherein the video analytics messages include information related to one or more of a background model, a motion alarm, a virtual line detection and electronic image stabilization parameters.
6. The video processing system of any of claims 1-4, wherein the video analytics messages comprise video analytics messages related to a group of images, including messages related to one or more of a background frame, a foreground object segmentation descriptor, a camera parameter, a virtual line and a predefined motion alarm region.
7. The video processing system of any of claims 1-4, wherein the video analytics messages comprise video analytics messages related to an individual video frame, including messages related to one or more of a global motion vector, a motion alarm region alarm status, a virtual line count, an object tracking parameter and a camera motion parameter.
8. The video processing system of any of claims 1-7, wherein the video analytics messages are transmitted to the client device in a layered-structure network bitstream comprising an encoder-generated video bitstream and a portion of the video analytics metadata.
9. The video processing system of claim 8, wherein the video analytics messages and the portion of the video analytics metadata are transmitted in a supplemental enhancement information network abstraction layer package unit of an H.264 bitstream.
10. A video decoding system comprising:
a decoder configured to extract a video frame and one or more video analytics messages from a network bitstream, wherein the video analytics messages provide information related to characteristics of the video frame;
one or more video processors configured to produce video analytics metadata related to the video frame based on content of the video frame and the video analytics messages.
11. The video decoding system of claim 10, wherein the video analytics metadata comprise pixel domain video analytics information received directly from an analog-to-digital front end.
12. The video decoding system of claim 10, wherein the video analytics metadata comprise pixel domain video analytics information received directly from an encoding engine as the engine was performing compression.
13. The video decoding system of any of claims 10-12, wherein the video analytics messages comprise video analytics messages related to a plurality of video frames, including messages related to one or more of a background frame, a foreground object segmentation descriptor, a camera parameter, a virtual line and a predefined motion alarm region.
14. The video decoding system of any of claims 10-12, wherein the video analytics messages comprise video analytics messages related to an individual video frame, including messages related to one or more of a global motion vector, a motion alarm region alarm status, a virtual line count, an object tracking parameter and a camera motion parameter.
15. The video decoding system of any of claims 10-14, wherein the video analytics messages are received in a supplemental enhancement information network abstraction layer package unit of an H.264 bitstream.
16. The video decoding system of claim 11 or claim 12, wherein the video analytics messages are received in a supplemental enhancement information network abstraction layer package unit of an H.264 bitstream together with a portion of the pixel domain video analytics information.
17. The video decoding system of any of claims 10-16, wherein the one or more video processors are configured to produce a global motion vector.
18. The video decoding system of any of claims 10-17, wherein the one or more video processors provide electronic image stabilization based on the video analytics messages.
19. The video decoding system of any of claims 10-18, wherein the one or more video processors extract a background image for a plurality of video frames based on the video analytics messages.
20. The video decoding system of any of claims 10-19, wherein the one or more video processors use the video analytics messages to monitor objects crossing a virtual line in a plurality of video frames.
PCT/CN2010/076555 2010-09-02 2010-09-02 Video analytics for security systems and methods WO2012027891A1 (en)

Priority Applications (7)

Application Number Priority Date Filing Date Title
PCT/CN2010/076555 WO2012027891A1 (en) 2010-09-02 2010-09-02 Video analytics for security systems and methods
CN201080061991.4A CN102726042B (en) 2010-09-02 2010-09-02 Processing system for video and video decoding system
US13/225,269 US8824554B2 (en) 2010-09-02 2011-09-02 Systems and methods for video content analysis
US13/225,222 US20120057629A1 (en) 2010-09-02 2011-09-02 Rho-domain Metrics
US13/225,238 US20120057640A1 (en) 2010-09-02 2011-09-02 Video Analytics for Security Systems and Methods
US13/225,202 US20120057633A1 (en) 2010-09-02 2011-09-02 Video Classification Systems and Methods
US14/472,313 US9609348B2 (en) 2010-09-02 2014-08-28 Systems and methods for video content analysis

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2010/076555 WO2012027891A1 (en) 2010-09-02 2010-09-02 Video analytics for security systems and methods

Publications (1)

Publication Number Publication Date
WO2012027891A1 true WO2012027891A1 (en) 2012-03-08

Family

ID=45772080

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2010/076555 WO2012027891A1 (en) 2010-09-02 2010-09-02 Video analytics for security systems and methods

Country Status (2)

Country Link
CN (1) CN102726042B (en)
WO (1) WO2012027891A1 (en)


Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103544806B (en) * 2013-10-31 2016-01-06 江苏物联网研究发展中心 The valuable cargo haulage vehicle monitoring early-warning system of line rule of stumbling based on video
WO2017007945A1 (en) * 2015-07-08 2017-01-12 Cloud Crowding Corp System and method for secure transmission of signals from a camera
CN106658225B (en) * 2016-10-31 2019-11-26 日立楼宇技术(广州)有限公司 The setting of Video Expansion code and video broadcasting method and system

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1418012A (en) * 2001-10-30 2003-05-14 松下电器产业株式会社 Video data Sending/receiving method and video monitoring system
CN101098469A (en) * 2006-06-30 2008-01-02 索尼株式会社 Image processing system, server for the same, and image processing method
WO2008046243A1 (en) * 2006-10-16 2008-04-24 Thomson Licensing Method and device for encoding a data stream, method and device for decoding a data stream, video indexing system and image retrieval system
CN101448145A (en) * 2008-12-26 2009-06-03 北京中星微电子有限公司 IP camera, video monitor system and signal processing method of IP camera

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030163477A1 (en) * 2002-02-25 2003-08-28 Visharam Mohammed Zubair Method and apparatus for supporting advanced coding formats in media files
CN101325689A (en) * 2007-06-16 2008-12-17 翰华信息科技(厦门)有限公司 System and method for monitoring mobile phone remote video
US8872940B2 (en) * 2008-03-03 2014-10-28 Videoiq, Inc. Content aware storage of video data

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2014047027A1 (en) * 2012-09-20 2014-03-27 Motorola Mobility Llc Distribution and use of video statistics for cloud-based video encoding
US9491494B2 (en) 2012-09-20 2016-11-08 Google Technology Holdings LLC Distribution and use of video statistics for cloud-based video encoding
CN105282526A (en) * 2015-12-01 2016-01-27 北京时代拓灵科技有限公司 Panorama video stitching method and system
WO2020127157A1 (en) * 2018-12-21 2020-06-25 Koninklijke Kpn N.V. Network-based assistance for receiver processing of video data
US11910034B2 (en) 2018-12-21 2024-02-20 Koninklijke Kpn N.V. Network-based assistance for receiver processing of video data

Also Published As

Publication number Publication date
CN102726042B (en) 2016-04-27
CN102726042A (en) 2012-10-10

Similar Documents

Publication Publication Date Title
US20120057640A1 (en) Video Analytics for Security Systems and Methods
US20130216135A1 (en) Visual search system architectures based on compressed or compact descriptors
US20210203997A1 (en) Hybrid video and feature coding and decoding
Ding et al. Advances in video compression system using deep neural network: A review and case studies
Babu et al. A survey on compressed domain video analysis techniques
US10951903B2 (en) Video analytics encoding for improved efficiency of video processing and compression
Wang et al. Towards analysis-friendly face representation with scalable feature and texture compression
US10373458B2 (en) Automatic threat detection based on video frame delta information in compressed video streams
CN101389029B (en) Method and apparatus for video image encoding and retrieval
CN111598026A (en) Action recognition method, device, equipment and storage medium
WO2012027891A1 (en) Video analytics for security systems and methods
CN116803079A (en) Scalable coding of video and related features
CN111131825A (en) Video processing method and related device
WO2003045070A1 (en) Feature extraction and detection of events and temporal variations in activity in video sequences
US10445613B2 (en) Method, apparatus, and computer readable device for encoding and decoding of images using pairs of descriptors and orientation histograms representing their respective points of interest
WO2023005740A1 (en) Image encoding, decoding, reconstruction, and analysis methods, system, and electronic device
CN103051891B (en) The method and apparatus for determining the saliency value of the block for the video frame that block prediction encodes in data flow
CN112383778B (en) Video coding method and device and decoding method and device
US10051281B2 (en) Video coding system with efficient processing of zooming transitions in video
Zhai et al. Object detection methods on compressed domain videos: An overview, comparative analysis, and new directions
US11164328B2 (en) Object region detection method, object region detection apparatus, and non-transitory computer-readable medium thereof
US20130223525A1 (en) Pixel patch collection for prediction in video coding system
CN111542858B (en) Dynamic image analysis device, system, method, and storage medium
WO2012027893A1 (en) Systems and methods for video content analysis
Gibbon et al. Distributed processing for big data video analytics

Legal Events

Date Code Title Description
WWE Wipo information: entry into national phase

Ref document number: 201080061991.4

Country of ref document: CN

121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 10856578

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 10856578

Country of ref document: EP

Kind code of ref document: A1