CN104317386A - Action recognition method of posture sequence finite-state machine - Google Patents

Action recognition method of posture sequence finite-state machine

Info

Publication number
CN104317386A
Authority
CN
China
Prior art keywords
action
limbs
user
point
state machine
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201410293405.9A
Other languages
Chinese (zh)
Other versions
CN104317386B (en)
Inventor
吴亚东
林水强
张红英
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Southwest University of Science and Technology
Original Assignee
Southwest University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Southwest University of Science and Technology filed Critical Southwest University of Science and Technology
Priority to CN201410293405.9A priority Critical patent/CN104317386B/en
Publication of CN104317386A publication Critical patent/CN104317386A/en
Application granted granted Critical
Publication of CN104317386B publication Critical patent/CN104317386B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06V — IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 — Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20 — Movements or behaviour, e.g. gesture recognition
    • G06V40/23 — Recognition of whole body movements, e.g. for sport training

Abstract

The invention discloses an action recognition method based on a posture-sequence finite-state machine. The method comprises the following steps: first, coordinate transformation is performed on the limb node data obtained by a Kinect sensor, the transformed data are measured with a uniform spatial mesh model, and a limb node coordinate system is established; then, by defining limb node feature vectors, a predefined limb-action node motion sequence is sampled and analyzed; finally, a limb action trajectory regular expression is established from the limb node motion trajectory, and a posture-sequence finite-state machine is constructed to realize fast recognition of predefined actions. Experimental results show that the method has good extensibility and generality: the recognition accuracy for 17 predefined limb actions is above 94%, the recognition feedback time is shorter than 0.1 s, and the requirements of somatosensory interactive applications can be met.

Description

Action recognition method based on a posture-sequence finite state machine
Technical field
The present invention relates to human-computer interaction technology, and in particular to an action recognition method based on a posture-sequence finite state machine.
Background technology
In the field of human-computer interaction, action recognition is a prerequisite for somatosensory interaction, and action recognition and behavior understanding have gradually become research hotspots in the field [1-3]. To achieve effective interaction, the different interactive actions — including limb motions, gestures, and static postures — must be defined and recognized [4]. In recent years, action recognition applications based on Kinect somatosensory technology have developed rapidly. Although such applications can effectively track human motion trajectories [5-7], the recognized actions are relatively simple and the recognition methods are difficult to extend [8-10], so an action recognition model with extensibility and generality remains to be developed.
At present there are many Kinect-based action recognition methods, such as event triggering, template matching, and machine learning. The flexible action and articulated skeleton toolkit (FAAST) described in [11] is a somatosensory middleware between the Kinect toolkit and the application program; it mainly recognizes actions through event triggers such as angle, distance, and speed. The method has low computational cost and high real-time performance and accuracy, but event triggering itself is limited and continuous actions are difficult to recognize. Ellis et al. [12] investigated the trade-off between recognition accuracy and latency, determining key-frame instances from action data sequences to derive action templates, but an action template preserves only the form and model of one class of behavior and ignores variation. Wang et al. [13] presented an actionlet ensemble method that classifies subsets of joint points with a high recognition rate, but it operates on pre-segmented data streams and cannot perform online recognition on unsegmented streams. Zhao et al. [14] proposed a structured streaming skeletons (SSS) feature matching method that builds a feature dictionary and gesture models by offline training, assigns a label to each frame of an unknown action data stream, and predicts the action type online by extracting SSS features; it effectively addresses erroneous segmentation and the shortcomings of template matching and can perform online recognition on unsegmented streams, but the computation is complex, the recognition feedback time is unstable, each action requires its own feature dictionary library, extending the recognized action types requires collecting large amounts of action data for offline training, and recognition of a specific action is tightly coupled to the training set.
List of references:
[1] Yu Tao. Kinect Application Development in Practice: Dialogue with the Machine in the Most Natural Way [M]. Beijing: China Machine Press, 2012: 46-47 (in Chinese)
[2] Wang J, Xu Z J. STV-based video feature processing for action recognition [J]. Signal Processing, 2013, 93(8): 2151-2168
[3] Xu Guangyou, Cao Yuanyuan. Action recognition and activity understanding: a review [J]. Journal of Image and Graphics, 2009, 14(2): 189-195 (in Chinese)
[4] Van den Bergh M, Carton D, De Nijs R, et al. Real-time 3D hand gesture interaction with a robot for understanding directions from humans [C]// Proceedings of Robot and Human Interactive Communication. Los Alamitos: IEEE Computer Society Press, 2011: 357-362
[5] Zhang Q S, Song X, Shao X W, et al. Unsupervised skeleton extraction and motion capture from 3D deformable matching [J]. Neurocomputing, 2013, 100: 170-182
[6] Shotton J, Sharp T, Kipman A, et al. Real-time human pose recognition in parts from single depth images [J]. Communications of the ACM, 2013, 56(1): 116-124
[7] El-laithy R A, Huang J, Yeh M. Study on the use of Microsoft Kinect for robotics applications [C]// Proceedings of Position Location and Navigation Symposium. Los Alamitos: IEEE Computer Society Press, 2012: 1280-1288
[8] Oikonomidis I, Kyriazis N, Argyros A. Efficient model-based 3D tracking of hand articulations using Kinect [C]// Proceedings of the 22nd British Machine Vision Conference. BMVA Press, 2011: 1-11
[9] Shen Shihong, Li Weiqing. Research on Kinect-based gesture recognition system [C]// Proceedings of the 8th Harmonious Human Machine Environment Conference (HHME2012), CHCI. Beijing: Tsinghua University Press, 2012: 55-62 (in Chinese)
[10] Soltani F, Eskandari F, Golestan S. Developing a gesture-based game for deaf/mute people using Microsoft Kinect [C]// Proceedings of the 2012 Sixth International Conference on Complex, Intelligent and Software Intensive Systems (CISIS). Los Alamitos: IEEE Computer Society Press, 2012: 491-495
[11] Suma E A, Krum D M, Lange B, et al. Adapting user interfaces for gestural interaction with the flexible action and articulated skeleton toolkit [J]. Computers & Graphics, 2013, 37(3): 193-201
[12] Ellis C, Masood S Z, Tappen M F, et al. Exploring the trade-off between accuracy and observational latency in action recognition [J]. International Journal of Computer Vision, 2013, 101(3): 420-436
[13] Wang J, Liu Z, Wu Y, et al. Mining actionlet ensemble for action recognition with depth cameras [C]// Proceedings of Computer Vision and Pattern Recognition (CVPR). Los Alamitos: IEEE Computer Society Press, 2012: 1290-1297
[14] Zhao X, Li X, Pang C, et al. Online human gesture recognition from motion data streams [C]// Proceedings of the 21st ACM International Conference on Multimedia. New York: ACM Press, 2013: 23-32
[15] Biswas K K, Basu S K. Gesture recognition using Microsoft Kinect [C]// Proceedings of the 2011 5th International Conference on Automation, Robotics and Applications (ICARA). Los Alamitos: IEEE Computer Society Press, 2011: 100-103
[16] Zhang Yi, Zhang Shuo, Luo Yuan, et al. Gesture track recognition based on Kinect depth image information and its applications [J]. Application Research of Computers, 2012, 29(9): 3547-3550 (in Chinese)
[17] Chaquet J M, Carmona E J, Fernandez-Caballero A. A survey of video datasets for human action and activity recognition [J]. Computer Vision and Image Understanding, 2013, 117(6): 633-659
Summary of the invention
The technical problem to be solved by the present invention is to provide an action recognition method based on a posture-sequence finite state machine that addresses the deficiencies of the prior art.
The technical scheme of the present invention is as follows:
An action recognition method based on a posture-sequence finite state machine: first, the limb node data obtained by a somatosensory interactive device are transformed into a user-centered limb node coordinate system; then, by defining limb node feature vectors, the limb action sequence is sampled and analyzed, predefined motion trajectory regular expressions are established, and a posture-sequence finite state machine is constructed, thereby parsing and recognizing predefined limb actions.
In the described method, the limb node coordinate system is established by defining the user space coordinate system as follows: the positive x-axis points in the user's right-hand direction, the positive y-axis points directly above the head, the positive z-axis points directly toward the front of the interaction device, and the origin is the center between the two shoulders. The transformation between a point P(x, y, z) in the Kinect space coordinate system oxyz and the corresponding point P'(x', y', z') in the user space coordinate system o'x'y'z' is given by formula (1).
(x', y', z', 1) = (x, y, z, 1) \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & -1 & 0 \\ -x_0 & -y_0 & z_0 & 1 \end{pmatrix} \begin{pmatrix} \cos\theta & 0 & -\sin\theta & 0 \\ 0 & 1 & 0 & 0 \\ \sin\theta & 0 & \cos\theta & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}    (1)
In formula (1), O'(x_0, y_0, z_0) is the origin of the user space coordinate system o'x'y'z', and θ is the rotation angle of the user relative to the sensor's xoy plane, θ = arctan((x_R − x_L)/(z_R − z_L)), where x_R > x_L and −45° < θ < +45°;
in the user coordinate system the metric unit is a cubic unit volume cell; for users of different heights, the proportional relationship between height and limb length must be considered so that limb actions are described in a unified way; after the coordinate transformation of formula (1), a user-specific spatial grid model is established in the user coordinate system, and the grid model is divided into w^3 three-dimensional cubic cells, where w is the number of partitions per dimension, here w = 11;
in the user coordinate system, the grid in each of the three dimensions is partitioned proportionally about the origin: the positive-to-negative ratio is 1:1 along the x-axis, 3:8 along the y-axis, and 6:5 along the z-axis; by computing the unit cell edge length d, action types are described in a unified way; the grid edge length is defined from the user's relative height, d = h/(w − 1), where h is the relative height of the current user in the user coordinate system and w is the number of partitions per dimension;
after the 3D grid partition model is established, any region in the user coordinate system can be described in terms of cubic grid cells, ensuring that an independent user-centered limb node coordinate system is always established and thereby minimizing individual differences between users.
In the described method, a limb node feature vector comprises the joint point spatial motion vectors, the joint point motion time interval, and the joint point spatial distance; the limb node feature vector V is defined by formula (2);
V(T, k) = \left[ \bigcup_{i=0}^{s-1} \overrightarrow{J_k^i J_k^{i+1}},\ \Delta t_k^s,\ |P_m P_n| \right]    (2)
In formula (2), T denotes the action type; k (0 ≤ k ≤ 19) is the joint point index; i (i = 0, 1, ..., s) is the current sampling frame; s is the end frame at which the corresponding joint point reaches the next specified sampling point; \overrightarrow{J_k^i J_k^{i+1}} is the spatial motion vector of joint point k from the current sampling frame i to the next frame i+1; J_k^i is the spatial coordinate point (x_k^i, y_k^i, z_k^i) of joint point k at frame i; Δt_k^s is the time interval during which joint point k moves along its trajectory from coordinate point J_k^0 to coordinate point J_k^s; and |P_m P_n| is the spatial distance between specified human joint points, used as the proportional verification quantity in the grid model;
a spatial motion vector \overrightarrow{J_k^i J_k^{i+1}} is defined for each joint point to compute the motion direction and trajectory of the limb node; the transfer duration of each sampling step of an action is described by the time interval Δt_k^s = t_k^s − t_k^0, where t_k^0 and t_k^s are the times of the starting and ending sampling frames of joint point k in each group; from formula (2), with J_k^i = (x_k^i, y_k^i, z_k^i) and J_k^{i+1} = (x_k^{i+1}, y_k^{i+1}, z_k^{i+1}), the spatial motion vector \overrightarrow{J_k^i J_k^{i+1}} is expressed as:
\overrightarrow{J_k^i J_k^{i+1}} = (x_k^{i+1} - x_k^i,\ y_k^{i+1} - y_k^i,\ z_k^{i+1} - z_k^i)
In |P_m P_n|, P_m and P_n are the joint points at the two ends of a human limb segment; m and n are the starting and ending index numbers of the joint point set, where m < n; (x_j, y_j, z_j) is the spatial coordinate of the corresponding joint point of the limb segment; and j (m ≤ j ≤ n − 1) is the index variable of the corresponding joint point in the calculation; the fixed spatial distance between limb joint points is then computed as follows:
|P_m P_n| = \sum_{j=m}^{n-1} \sqrt{(x_j - x_{j+1})^2 + (y_j - y_{j+1})^2 + (z_j - z_{j+1})^2}
Various interactive actions can be defined from the limb node feature vector parameters defined above. Action types are classified according to the body part and motion characteristics involved; the limb node feature vectors of three representative actions comprise the feature vector of the right-leg side kick, the feature vector of drawing a circle with the right hand, and the feature vector of the horizontal two-hand spread;
according to formula (2), the limb node feature vector V(T, k) indicating the three representative actions is defined; when joint point k reaches the next specified sampling point, the end frame s is determined and the current feature vector is analyzed as the input parameter of the state transition function; the sampling frame i is then reset to zero and the next end frame is awaited, with the analysis and reset repeating until the last sampling point;
1) for the right-leg side kick, the generic feature data of the right foot joint point (k = 19) are extracted to define the limb node feature vector, where L is the leg length;
2) for the right-hand circle-drawing action, the generic feature data of the right hand joint point (k = 11) are extracted to define the limb node feature vector, where D is the arm length;
3) for the horizontal two-hand spread, the generic feature data of the left/right hand joint points (k = 7, 11) are extracted to define the limb node feature vector;
by analogy, limb node feature vectors are defined for the other limb actions in the same way, and the posture-sequence finite state machine then analyzes the limb node feature vectors to realize action recognition.
In the described posture-sequence finite state machine construction, the posture-sequence finite state machine Λ is defined by the five-tuple of formula (3);
Λ = (S, Σ, δ, s_0, F)    (3)
In formula (3), S is the state set {s_0, s_1, ..., s_n, f_0, f_1}, describing each specific posture state of an action; Σ is the input alphabet of limb node feature vector sets and constraint parameters, in which the symbol "¬" denotes logical negation; δ is the transition function, defined as S × Σ → S, indicating that the posture-sequence finite state machine transfers from the current state to a successor state; s_0 is the initial state; and F = {f_0, f_1} is the final state set, whose elements represent the recognition-success state and the recognition-invalid state respectively;
in the alphabet Σ, the variable u denotes the set of all limb node feature vectors V corresponding to a given action type; a feature vector represents the discrete point-domain rule of the motion trajectory in the spatial grid, and the trajectory regular expression of an action can be constructed from the point-domain rules;
the path constraint p = {xyz | x ∈ [x_min, x_max], y ∈ [y_min, y_max], z ∈ [z_min, z_max]} restricts the range of the key points of a specific posture; whenever the predefined path domain is exceeded, i.e. ¬p is true, the state is marked invalid;
the timestamp t ∈ [t_start, t_end] defines the time required for an action to transfer from the current state to a successor state; if a state of the action does not transfer to a subsequent valid state within the specified time, i.e. ¬t is true, the machine jumps to the invalid state;
each action consists of several typical static postures, each static posture corresponding to a defined state quantity; each state quantity is computed in the spatial grid from the key point characteristics, and the state transfers of an action must satisfy the path constraint p and the timestamp t, so that the action type is recognized and the user's interaction intention is understood; each attribute and each transfer step of the posture-sequence finite state machine can be described by the five-tuple. Operation of the posture-sequence finite state machine: from the initial state s_0, the predefined action reaches the first valid state s_1; if the posture at the next moment is still within the predefined range, the subsequent valid state s_k is reached, and so on, until the success state f_0 is reached, i.e. the action is recognized successfully; from the initial state or any valid state, if the behavior exceeds the path constraint or the timestamp range, the action sequence is directly marked invalid, i.e. recognition fails. After reaching any final state, the current posture-sequence finite state machine has finished running and is reinitialized to recognize the next group of limb actions.
The present invention proposes an action recognition method based on a posture-sequence finite state machine that achieves fast recognition of predefined actions. The method describes limb action characteristics with limb node feature vectors, samples and analyzes predefined limb action sequences, establishes limb action trajectory regular expressions, and constructs a posture-sequence finite state machine to realize limb action recognition. The method can describe any action or gesture completely, requires no offline training or learning, has strong generality and extensibility, achieves high recognition accuracy for both simple and continuous actions, and offers good real-time performance, satisfying the requirements of somatosensory interactive applications.
The main advantages of the method are: 1) high action recognition accuracy — in three repeated sampling tests on 30 users of different heights and body shapes, the recognition accuracy exceeded 94%; 2) fast recognition feedback — the measured recognition feedback time was between 0.060 s and 0.096 s, i.e. less than 0.1 s; 3) no need to collect large amounts of action data for offline training when extending the action types — only the trajectory regular expression of the new action needs to be defined, giving strong generality and extensibility; 4) the proposed posture-sequence finite state machine model defines initial and final states and analyzes limb node feature vectors in real time, so unsegmented action data streams can be processed; 5) the posture-sequence finite state machine fits a continuous action trajectory with a discrete posture sequence and is therefore suitable for recognizing both simple and continuous actions. However, the method is not strong in robustness: any state that fails to satisfy the predefined rules during recognition is regarded as invalid, so posture-sequence recognition is rather sensitive and users need to perform actions as normatively as possible within their personal style.
Accompanying drawing explanation
Fig. 1 is the framework of the posture-sequence finite state machine action recognition method;
Fig. 2 shows the spatial coordinate transformation: a, the user space coordinate system; b, a top view of the user rotation in the Kinect coordinate system;
Fig. 3 is a cross-sectional schematic of the spatial grid partition in the xoy plane;
Fig. 4 shows the feature representation in the user space coordinate system;
Fig. 5 shows the limb node feature vectors of the three representative actions: a, leg action — right-leg side kick; b, one-hand action — right hand drawing a circle; c, two-hand action — horizontal two-hand spread;
Fig. 6 is the posture-sequence finite state machine prototype;
Fig. 7 is a schematic of partial action trajectories;
Fig. 8 is the posture-sequence finite state machine of the actions;
Fig. 9 shows the action recognition results: a, leg actions; b, left/right hand drawing a circle; c, left/right hand raised upward; d, left/right hand pressed downward; e and f, both hands pushing obliquely; g, horizontal two-hand spread; h, horizontal two-hand contraction; i, horizontal left/right hand slide;
Embodiment
The present invention is described in detail below in conjunction with specific embodiments.
To overcome the shortcomings of previous body action recognition methods, such as poor extensibility and low recognition efficiency, the present invention proposes an action recognition method based on a posture-sequence finite state machine. A specific limb action can be regarded as a group of motion sequences composed of multiple postures on a timeline, i.e. it can be described by a posture sequence. The proposed method mainly uses limb node feature vectors to describe limb action characteristics; by sampling and analyzing predefined limb action sequences, the trajectory regular expressions of the limb actions are established, and the posture-sequence finite state machine is constructed from the trajectory regular expressions, thereby analyzing and recognizing limb actions.
1 Action recognition method based on a posture-sequence finite state machine
The framework of the method is shown in Figure 1. First, the limb node data obtained by the somatosensory interactive device are transformed into a user-centered limb node coordinate system; then, by defining limb node feature vectors, the limb action sequence is sampled and analyzed, predefined motion trajectory regular expressions are established, and the posture-sequence finite state machine is constructed, thereby parsing and recognizing predefined limb actions.
1.1 Establishing the limb node coordinate system
To minimize individual differences between users, the spatial description of user actions must be transformed from the device space coordinate system to the user space coordinate system, establishing interactive action features that match the user's individual attributes. The present invention defines the user space coordinate system as follows: the positive x-axis points in the user's right-hand direction, the positive y-axis points directly above the head, the positive z-axis points directly toward the front of the interaction device, and the origin is the center between the two shoulders. During action recognition, the frontal direction of the user's body is not necessarily orthogonal to the interactive device plane; therefore, the acquired limb node data must be transformed to establish the user limb node coordinate system. The spatial coordinate transformation is shown in Figure 2: Fig. 2a depicts the user space coordinate system, where O' is the origin of the user space coordinate system o'x'y'z'; Fig. 2b depicts a top view of the user rotating about the y-axis in the Kinect space coordinate system, where L(x_L, z_L) is the projected point of the user's left shoulder in the Kinect space coordinate system, R(x_R, z_R) is the projected point of the right shoulder, and θ is the rotation angle of the user relative to the device's xoy plane (−45° < θ < +45°).
Because the acquired limb node data are mirror-symmetric [15], the transformation between a point P(x, y, z) in the Kinect space coordinate system oxyz and the corresponding point P'(x', y', z') in the user space coordinate system o'x'y'z' is given by formula (1).
(x', y', z', 1) = (x, y, z, 1) \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & -1 & 0 \\ -x_0 & -y_0 & z_0 & 1 \end{pmatrix} \begin{pmatrix} \cos\theta & 0 & -\sin\theta & 0 \\ 0 & 1 & 0 & 0 \\ \sin\theta & 0 & \cos\theta & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}    (1)
In formula (1), O'(x_0, y_0, z_0) is the origin of the user space coordinate system o'x'y'z', and θ is the rotation angle of the user relative to the sensor's xoy plane, θ = arctan((x_R − x_L)/(z_R − z_L)), where x_R > x_L and −45° < θ < +45°.
In the user coordinate system the metric unit is a cubic unit volume cell; for users of different heights, the proportional relationship between height and limb length must be considered so that limb actions are described in a unified way. After the coordinate transformation of formula (1), a user-specific spatial grid model is established in the user coordinate system. In the present invention the grid model is divided into w^3 three-dimensional cubic cells (w is the number of partitions per dimension, with the empirical value w = 11); the xoy-plane cross section of the spatial grid is shown in Figure 3.
As shown in Figure 3, in the user coordinate system the grid in each of the three dimensions is partitioned proportionally about the origin: the positive-to-negative ratio is 1:1 along the x-axis, 3:8 along the y-axis, and 6:5 along the z-axis. By computing the unit cell edge length d, action types are described in a unified way; the present invention defines the grid edge length from the user's relative height, d = h/(w − 1), where h is the relative height of the current user in the user coordinate system and w is the number of partitions per dimension.
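To make the transformation of formula (1) and the grid construction concrete, the following Python sketch maps a Kinect-space point into the user space coordinate system and computes the unit cell edge length. It is a minimal illustration under stated assumptions: the shoulder-projection inputs, the function names, and the data layout are chosen for the example and are not the patented implementation.

```python
import numpy as np

def to_user_coords(p, origin, left_shoulder, right_shoulder):
    """Map a Kinect-space point p = (x, y, z) into the user space coordinate
    system defined by the shoulder-center origin and the facing angle
    theta = arctan((x_R - x_L) / (z_R - z_L)), following formula (1)."""
    x0, y0, z0 = origin
    xl, zl = left_shoulder
    xr, zr = right_shoulder
    theta = np.arctan2(xr - xl, zr - zl)      # assumed to lie in (-45 deg, +45 deg)

    # Translation to the shoulder-center origin plus z mirror (the Kinect data
    # are mirror-symmetric), written as a 4x4 homogeneous matrix.
    t = np.array([[1,   0,   0, 0],
                  [0,   1,   0, 0],
                  [0,   0,  -1, 0],
                  [-x0, -y0, z0, 1]], dtype=float)
    # Rotation about the y-axis by the user's facing angle theta.
    r = np.array([[np.cos(theta), 0, -np.sin(theta), 0],
                  [0,             1,  0,             0],
                  [np.sin(theta), 0,  np.cos(theta), 0],
                  [0,             0,  0,             1]], dtype=float)
    return (np.array([*p, 1.0]) @ t @ r)[:3]

def grid_cell_size(relative_height, w=11):
    """Unit cell edge length d = h / (w - 1) of the w^3 user-specific grid."""
    return relative_height / (w - 1)
```

In such a sketch, every joint of a skeleton frame would first be passed through to_user_coords, and grid_cell_size(h) would then fix the cell size used to discretize the trajectories.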
After the 3D grid partition model is established, any region in the user coordinate system can be described in terms of cubic grid cells, ensuring that an independent user-centered limb node coordinate system is always established and thereby minimizing individual differences between users.
1.2 Definition of limb node feature vectors
An action at a given moment is a static posture, while the motion sequence of one or more human joint points in space is a dynamic behavior [16]. Before recognizing actions, the generic feature data must be described in the user space coordinate system; they generally include the three-dimensional coordinates of the relevant joint points, the joint point spatial motion vectors, and the spatial distances between joint points. The description of limb action characteristics is shown in Figure 4.
The present invention defines limb node feature vectors to describe the motion feature data; by computing and analyzing the limb node feature vector parameters, the dynamic sequence formed by combining multiple given postures — i.e. the limb action — is recognized. A limb node feature vector comprises the joint point spatial motion vectors, the joint point motion time interval, and the joint point spatial distance; the limb node feature vector V is defined by formula (2).
V(T, k) = \left[ \bigcup_{i=0}^{s-1} \overrightarrow{J_k^i J_k^{i+1}},\ \Delta t_k^s,\ |P_m P_n| \right]    (2)
In formula (2), T denotes the action type; k (0 ≤ k ≤ 19) is the joint point index; i (i = 0, 1, ..., s) is the current sampling frame; s is the end frame at which the corresponding joint point reaches the next specified sampling point; \overrightarrow{J_k^i J_k^{i+1}} is the spatial motion vector of joint point k from the current sampling frame i to the next frame i+1; J_k^i is the spatial coordinate point (x_k^i, y_k^i, z_k^i) of joint point k at frame i; Δt_k^s is the time interval during which joint point k moves along its trajectory from coordinate point J_k^0 to coordinate point J_k^s; and |P_m P_n| is the spatial distance between specified human joint points, used as the proportional verification quantity in the grid model.
A spatial motion vector \overrightarrow{J_k^i J_k^{i+1}} is defined for each joint point to compute the motion direction and trajectory of the limb node; the transfer duration of each sampling step of an action is described by the time interval Δt_k^s = t_k^s − t_k^0, where t_k^0 and t_k^s are the times of the starting and ending sampling frames of joint point k in each group. From formula (2), with J_k^i = (x_k^i, y_k^i, z_k^i) and J_k^{i+1} = (x_k^{i+1}, y_k^{i+1}, z_k^{i+1}), the spatial motion vector \overrightarrow{J_k^i J_k^{i+1}} is expressed as:
\overrightarrow{J_k^i J_k^{i+1}} = (x_k^{i+1} - x_k^i,\ y_k^{i+1} - y_k^i,\ z_k^{i+1} - z_k^i)
In |P_m P_n|, P_m and P_n are the joint points at the two ends of a human limb segment; m and n are the starting and ending index numbers of the joint point set, where m < n; (x_j, y_j, z_j) is the spatial coordinate of the corresponding joint point of the limb segment; and j (m ≤ j ≤ n − 1) is the index variable of the corresponding joint point in the calculation. The fixed spatial distance between limb joint points is then computed as follows:
|P_m P_n| = \sum_{j=m}^{n-1} \sqrt{(x_j - x_{j+1})^2 + (y_j - y_{j+1})^2 + (z_j - z_{j+1})^2}
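As a concrete reading of formula (2), the sketch below computes the three components of a limb node feature vector — the per-frame motion vectors, the elapsed time Δt, and the limb segment length |P_m P_n| — for one sampled joint trajectory. It is illustrative only; the data layout (lists of (x, y, z) tuples plus per-frame timestamps) is an assumption, not the format used by the patented system.

```python
import numpy as np

def motion_vectors(track):
    """Spatial motion vectors J_k^i -> J_k^{i+1} along a sampled joint track,
    where track is a sequence of (x, y, z) points from frame 0 to frame s."""
    pts = np.asarray(track, dtype=float)
    return pts[1:] - pts[:-1]

def elapsed_time(timestamps):
    """Delta t_k^s = t_k^s - t_k^0 for one sampling group."""
    return timestamps[-1] - timestamps[0]

def segment_length(joints):
    """|P_m P_n|: summed Euclidean length of a limb segment given its chain of
    joint coordinates P_m, ..., P_n (used as the scale check in the grid)."""
    pts = np.asarray(joints, dtype=float)
    return float(np.sum(np.linalg.norm(pts[1:] - pts[:-1], axis=1)))
```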
Various interactive actions can be defined from the limb node feature vector parameters defined above. Action types are classified according to the body part and motion characteristics involved; the limb node feature vectors of three representative actions are shown in Figure 5: Fig. 5a is the feature vector schematic of the right-leg side kick, Fig. 5b is the feature vector schematic of the right hand drawing a circle, and Fig. 5c is the feature vector schematic of the horizontal two-hand spread.
According to formula (2), the limb node feature vector V(T, k) indicating the three representative actions is defined. When joint point k reaches the next specified sampling point, the end frame s is determined and the current feature vector is analyzed as the input parameter of the state transition function; the sampling frame i is then reset to zero and the next end frame is awaited, with the analysis and reset repeating until the last sampling point.
1) For the right-leg side kick in Fig. 5a, the generic feature data of the right foot joint point (k = 19) are extracted to define the limb node feature vector, where L is the leg length.
2) For the right-hand circle-drawing action in Fig. 5b, the generic feature data of the right hand joint point (k = 11) are extracted to define the limb node feature vector, where D is the arm length.
3) For the horizontal two-hand spread in Fig. 5c, the generic feature data of the left/right hand joint points (k = 7, 11) are extracted to define the limb node feature vector.
By analogy, limb node feature vectors are defined for the other limb actions in the same way, and the posture-sequence finite state machine then analyzes the limb node feature vectors to realize action recognition.
1.3 Construction of the posture-sequence finite state machine
Natural human interactive actions are diverse and variable [17], so a general and efficient recognition method is needed. Each action is formed by the continuous motion trajectories of the corresponding limb joint points; a continuous motion trajectory can be fitted by discrete key points, each key point corresponding to a specific posture state, and by recognizing the transfer process between the states the action can be judged. Based on this idea, the present invention proposes the posture-sequence finite state machine method to recognize predefined limb actions. A posture sequence is a group of motion sequences that describes an action by multiple postures on a timeline; the posture-sequence finite state machine describes the finite states of each action and the transfer process between them. The invention defines the posture-sequence finite state machine Λ, whose five-tuple representation is given by formula (3).
Λ = (S, Σ, δ, s_0, F)    (3)
In formula (3), S is the state set {s_0, s_1, ..., s_n, f_0, f_1}, describing each specific posture state of an action; Σ is the input alphabet of limb node feature vector sets and constraint parameters, in which the symbol "¬" denotes logical negation; δ is the transition function, defined as S × Σ → S, indicating that the posture-sequence finite state machine transfers from the current state to a successor state; s_0 is the initial state; and F = {f_0, f_1} is the final state set, whose elements represent the recognition-success state and the recognition-invalid state respectively.
In the alphabet Σ, the variable u denotes the set of all limb node feature vectors V corresponding to a given action type; a feature vector represents the discrete point-domain rule of the motion trajectory in the spatial grid, and the trajectory regular expression of an action can be constructed from the point-domain rules.
The path constraint p = {xyz | x ∈ [x_min, x_max], y ∈ [y_min, y_max], z ∈ [z_min, z_max]} restricts the range of the key points of a specific posture; whenever the predefined path domain is exceeded, i.e. ¬p is true, the state is marked invalid.
The timestamp t ∈ [t_start, t_end] defines the time required for an action to transfer from the current state to a successor state; if a state of the action does not transfer to a subsequent valid state within the specified time, i.e. ¬t is true, the machine jumps to the invalid state.
Each action consists of several typical static postures, each static posture corresponding to a defined state quantity; each state quantity is computed in the spatial grid from the key point characteristics, and the state transfers of an action must satisfy the path constraint p and the timestamp t, so that the action type is recognized and the user's interaction intention is understood. Each attribute and each transfer step of the posture-sequence finite state machine can be described by the five-tuple; the state graph model of its operation is shown in Figure 6.
From the initial state s_0, the predefined action reaches the first valid state s_1; if the posture at the next moment is still within the predefined range, the subsequent valid state s_k is reached, and so on, until the success state f_0 is reached, i.e. the action is recognized successfully. From the initial state or any valid state, if the behavior exceeds the path constraint or the timestamp range, the action sequence is directly marked invalid, i.e. recognition fails. After reaching any final state, the current posture-sequence finite state machine has finished running and is reinitialized to recognize the next group of limb actions. The state-transition Table 1 can be obtained from the state diagram of the posture-sequence finite state machine.
Table 1 State-transition table of the posture-sequence finite state machine
In the table, k, x = 0, 1, 2, ..., n, with k ≠ k + x ≤ n, where n is the total number of intermediate valid states required to recognize an action; the point-domain rule accepted by this posture-sequence finite state machine is constructed from the action's trajectory regular expression, as described below.
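The five-tuple, the path constraint p, and the timestamp window t described above can be sketched in code. The following Python class is a minimal illustration under stated assumptions — dictionary-based transitions, axis-aligned path bounds per state, and calls made only at sampled key points are simplifications chosen for the example, not the exact construction of the invention.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

Point = Tuple[float, float, float]

@dataclass
class PostureFSM:
    """Posture-sequence finite state machine Lambda = (S, Sigma, delta, s0, F)."""
    delta: Dict[Tuple[str, str], str]            # (state, point-domain symbol) -> next state
    path_bounds: Dict[str, Tuple[Point, Point]]  # per-state (min, max) corners of path constraint p
    max_step_time: float                         # timestamp window allowed per transfer
    state: str = "s0"
    last_time: float = 0.0

    def step(self, symbol: str, key_point: Point, t: float) -> str:
        """Consume one sampled posture; return the new state ('f0' = success, 'f1' = invalid)."""
        lo, hi = self.path_bounds.get(self.state, ((-1e9,) * 3, (1e9,) * 3))
        outside = any(c < a or c > b for c, a, b in zip(key_point, lo, hi))
        too_slow = (t - self.last_time) > self.max_step_time
        if outside or too_slow or (self.state, symbol) not in self.delta:
            self.state = "f1"                    # path or timestamp violated: sequence invalid
        else:
            self.state = self.delta[(self.state, symbol)]
            self.last_time = t
        return self.state
```

A right-leg side kick, for instance, could be encoded as delta = {("s0", "a1"): "s1", ("s1", "c1"): "f0"} with suitable bounds, mirroring the point-domain string a_1 c_1 given in Table 2 below.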
As defined by formula (3), the posture-sequence finite state machine is used to describe the actions of Section 1.2. The input alphabet is Σ = {a_i, b_i, c_i, d_i, e_i, f_i, g_i, h_i, m_i}, where i = 0, 1; each letter variable describes the spatial range — the point domain — associated with a sampling point in the spatial grid model. A motion trajectory can be fitted with multiple point domains, and a point-domain string constitutes a discretized description of the motion trajectory. As shown in Figure 7, subscript i = 0 denotes a point domain in the negative-x half space and subscript i = 1 denotes a point domain in the positive-x half space.
For the three representative actions given in Section 1.2 — the right-leg side kick, the right hand drawing a circle, and the horizontal two-hand spread — partial action trajectories are illustrated in Figure 7, and Table 2 gives their feature-vector point-domain strings.
Table 2 Feature-vector point-domain strings of the three representative actions
Representative action | Feature-vector point-domain string
Right-leg side kick | a_1 c_1
Right hand drawing a circle | d_1 e_0 f_1 g_1 h_1
Horizontal two-hand spread | (d_0 ∧ d_1)(e_0 ∧ e_1)(h_0 ∧ h_1)
The trajectory regular expression of an action can be extracted from its feature-vector point-domain string. In the alphabet Σ, Σ_{i(1−i)} = Σ_i ∧ Σ_{1−i}, where i = 0, 1, indicating that the spatial point domains symmetric about the yoz plane hold simultaneously. The adjusted and simplified trajectory regular expression of the actions is given by formula (4).
R = a_i c_i \mid d_i e_{1-i} f_i g_i h_i \mid d_{i(1-i)} e_{i(1-i)} h_{i(1-i)}    (4)
The finite state machine diagram of the three representative actions and their symmetric counterparts is derived from the trajectory regular expression (4) and shown in Figure 8, in which the invalid state f_1 is omitted; s_0 is the initial state, s_k is a transitional valid state, and the f_0 state represents the success state that accepts the point-domain string. From the initial state or any valid state, if the action exceeds the path constraint or the timestamp range, it is directly marked as the invalid state f_1, i.e. recognition fails. After reaching any final state, the posture-sequence finite state machine of the current action has finished running and is reinitialized to recognize the next group of action behaviors.
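Since a point-domain string is simply a sequence of symbols, a trajectory regular expression such as formula (4) can be checked with an ordinary string regular expression once each point domain is mapped to a character. The encoding below (upper case for subscript 1, lower case for subscript 0) and the classify helper are assumptions made purely for illustration.

```python
import re

# Assumed single-character encoding of the point domains a1, c1, d0/d1, e0/e1, ...
# Upper case encodes subscript 1 (positive-x side), lower case subscript 0.
TRACK_RE = re.compile(
    r"^(AC"                 # right-leg side kick: a1 c1
    r"|DeFGH"               # right hand draws a circle: d1 e0 f1 g1 h1
    r"|(dD)(eE)(hH)"        # horizontal two-hand spread: (d0^d1)(e0^e1)(h0^h1)
    r")$"
)

def classify(point_domain_string: str) -> bool:
    """True if the discretized trajectory matches one of the predefined actions."""
    return TRACK_RE.match(point_domain_string) is not None

# classify("DeFGH")  -> True  (right-hand circle)
# classify("dDeEhH") -> True  (two-hand spread)
```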
The action set must be optimized synchronously while the posture-sequence finite state machine is running; the optimization algorithm proceeds as follows (a sketch of this pruning loop is given after the list):
1) Initialize the set of all possible actions {T} = {"right-leg side kick", "right hand drawing a circle", "horizontal two-hand spread", ...}, each specific action corresponding to several point-domain string expressions;
2) During operation of the posture-sequence finite state machine, after each state transfer step all impossible actions are excluded from the set and all still-possible actions are retained;
3) When a final state is reached: if it is the invalid state, nothing is output and the process restarts; if it is the acceptable success state, the single remaining element of the current set is the action type, which is output, and the process jumps to step 1) to continue recognizing cyclically.
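A minimal sketch of this pruning loop follows. It assumes one PostureFSM instance per candidate action (as in the earlier sketch) and hypothetical sample tuples, and only illustrates how the candidate set shrinks after each transfer step.

```python
def recognize(candidates, samples):
    """candidates: dict mapping an action name to its PostureFSM instance.
    samples: iterable of (symbol, key_point, timestamp) tuples from the stream.
    Returns the recognized action name, or None if the sequence is invalid."""
    active = dict(candidates)                   # step 1: every predefined action is possible
    for symbol, key_point, t in samples:
        for name in list(active):
            state = active[name].step(symbol, key_point, t)
            if state == "f1":                   # step 2: exclude actions that became impossible
                del active[name]
            elif state == "f0":                 # step 3: accepted -> output the action type
                return name
        if not active:                          # every candidate reached the invalid state
            return None
    return None
```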
Finally, the semantics and purposes of the interactive actions are defined by the user, who assigns new meanings to the recognized action types — for example, lifting the left or right leg represents scene roaming, sliding the left or right hand represents page turning, and the horizontal two-hand spread represents raising a curtain — thereby realizing somatosensory interactive applications.
2 Experimental results and analysis
2.1 Experimental tests and results
Seventeen limb actions were recognized with the posture-sequence finite state machine model and algorithm proposed by the present invention, and the tests were run on a Windows 7 x64 system with an Intel Xeon X3440 CPU (2.53 GHz) and 4 GB of memory. The limb action definitions are given in Table 3; the action categories are divided into leg actions, one-hand actions, and two-hand actions according to the differences in motion characteristics and body parts.
Table 3 Limb action definition list
Figure 9 illustrates the dynamic recognition process of the 17 predefined limb actions; in the figure, green in front of the body indicates intermediate states of the recognition process and red points indicate the recognition success state.
Experimental tests were conducted on 30 volunteers of different heights and body shapes; each subject performed three repeated sampling tests for each limb action, giving 1530 action instances in total. The confusion matrix of the action recognition test results is given in Table 4, where None indicates that no action was detected.
Table 4 Confusion matrix of action recognition test results
T A B C D E F G H I J K L M N O P Q None
A 100 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
B 0.0 100 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
C 0.0 0.0 98.9 0.0 1.1 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
D 0.0 0.0 0.0 98.9 1.1 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
E 0.0 0.0 0.0 0.0 94.4 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 5.6
F 0.0 0.0 0.0 0.0 0.0 97.8 0.0 1.1 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.1
G 0.0 0.0 0.0 0.0 0.0 0.0 97.8 0.0 1.1 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.1
H 0.0 0.0 0.0 0.0 0.0 0.0 0.0 100 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
I 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 100 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
J 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 100 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
K 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 100 0.0 0.0 0.0 0.0 0.0 0.0 0.0
L 0.0 0.0 0.0 0.0 0.0 0.0 0.0 2.2 0.0 0.0 0.0 96.7 0.0 0.0 0.0 0.0 0.0 1.1
M 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 3.3 0.0 0.0 0.0 95.6 0.0 0.0 0.0 0.0 1.1
N 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 96.7 0.0 0.0 0.0 3.3
O 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 94.4 0.0 0.0 5.6
P 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 2.2 95.6 0.0 2.2
Q 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 98.9 1.1
The test results show that the recognition rate of every action exceeds 94%, most actions reach 100%, and the average recognition rate reaches 98%, which satisfies the requirements of somatosensory interactive applications. The two-foot jump (E) cannot be recognized when there is no obvious take-off amplitude; the left/right hand circle-drawing actions (L, M) may be misrecognized as the left/right hand raised upward (H, I); the actions of pushing both hands obliquely upward or downward (N, O) may occasionally go unrecognized; and the horizontal two-hand spread (P) may be mistaken for both hands pushing obliquely (O). On the whole, however, the proposed method recognizes the above limb actions well, with a measured recognition feedback time between 0.060 s and 0.096 s, which satisfies real-time interaction requirements.
2.2 Experimental comparison and analysis
Table 5 compares human action recognition methods. The multi-instance learning method [12], the actionlet ensemble method [13], and the SSS feature matching method [14] use the MSR-Action3D dataset, whereas the test data set used here consists of the 1530 action instances collected from 30 actual users. In the experiments, a recognition accuracy below 0.6 is rated "low", between 0.6 and 0.8 "medium", and between 0.8 and 1.0 "high".
Table 5 Comparison of human action recognition methods
The multi-instance learning method [12] determines key-frame instances from action data sequences to derive action templates, but an action template preserves only the form and model of one class of behavior and ignores variation, so its robustness and real-time performance are moderate. The actionlet ensemble method [13] classifies subsets of joint points and achieves a high recognition rate, but it operates on pre-segmented data streams and cannot perform online recognition on unsegmented streams; although its robustness is high, its computation is complex and its real-time performance is poor. The SSS feature matching method [14] builds a feature dictionary and gesture models by offline training, assigns a label to each frame of an unknown action data stream, and predicts the action type online by extracting SSS features; it can perform online recognition on unsegmented streams with high robustness, but the computation is complex and the recognition feedback time is unstable (−1.5 s means recognition 1.5 s early, +1.5 s means recognition delayed by 1.5 s). These three methods all rely on machine learning and template matching; such algorithms require a feature dictionary library for each action, and extending the recognized action types requires collecting large amounts of action data for offline training, with a high coupling between specific-action recognition and the training set, so their extensibility is moderate. The FAAST action recognition method [11] uses event triggers such as angle, distance, and speed; it has low computational cost, good real-time performance, strong extensibility, and high accuracy for defined simple actions, but the event-trigger technique itself has limitations, its robustness is low, and continuous actions are difficult to recognize.
It should be understood that those of ordinary skill in the art can make improvements or modifications based on the above description, and all such improvements and modifications shall fall within the protection scope of the appended claims of the present invention.

Claims (4)

1. An action recognition method based on a posture-sequence finite state machine, characterized in that: first, the limb node data obtained by a somatosensory interactive device are transformed into a user-centered limb node coordinate system; then, by defining limb node feature vectors, the limb action sequence is sampled and analyzed, predefined motion trajectory regular expressions are established, and a posture-sequence finite state machine is constructed, thereby parsing and recognizing predefined limb actions.
2. The posture-sequence finite state machine action recognition method according to claim 1, characterized in that the limb node coordinate system is established by defining the user space coordinate system as follows: the positive x-axis points in the user's right-hand direction, the positive y-axis points directly above the head, the positive z-axis points directly toward the front of the interaction device, and the origin is the center between the two shoulders; the transformation between a point P(x, y, z) in the Kinect space coordinate system oxyz and the corresponding point P'(x', y', z') in the user space coordinate system o'x'y'z' is given by formula (1).
(x', y', z', 1) = (x, y, z, 1) \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & -1 & 0 \\ -x_0 & -y_0 & z_0 & 1 \end{pmatrix} \begin{pmatrix} \cos\theta & 0 & -\sin\theta & 0 \\ 0 & 1 & 0 & 0 \\ \sin\theta & 0 & \cos\theta & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}    (1)
In formula (1), O'(x_0, y_0, z_0) is the origin of the user space coordinate system o'x'y'z', and θ is the rotation angle of the user relative to the sensor's xoy plane, θ = arctan((x_R − x_L)/(z_R − z_L)), where x_R > x_L and −45° < θ < +45°;
in the user coordinate system the metric unit is a cubic unit volume cell; for users of different heights, the proportional relationship between height and limb length must be considered so that limb actions are described in a unified way; after the coordinate transformation of formula (1), a user-specific spatial grid model is established in the user coordinate system, and the grid model is divided into w^3 three-dimensional cubic cells, where w is the number of partitions per dimension, with value w = 11;
in the user coordinate system, the grid in each of the three dimensions is partitioned proportionally about the origin: the positive-to-negative ratio is 1:1 along the x-axis, 3:8 along the y-axis, and 6:5 along the z-axis; by computing the unit cell edge length d, action types are described in a unified way; the grid edge length is defined from the user's relative height, d = h/(w − 1), where h is the relative height of the current user in the user coordinate system and w is the number of partitions per dimension;
after the 3D grid partition model is established, any region in the user coordinate system can be described in terms of cubic grid cells, ensuring that an independent user-centered limb node coordinate system is always established and thereby minimizing individual differences between users.
3. The posture-sequence finite state machine action recognition method according to claim 1, characterized in that, in the limb node feature vector definition, a limb node feature vector comprises the joint point spatial motion vectors, the joint point motion time interval, and the joint point spatial distance; the limb node feature vector V is defined by formula (2);
V(T, k) = \left[ \bigcup_{i=0}^{s-1} \overrightarrow{J_k^i J_k^{i+1}},\ \Delta t_k^s,\ |P_m P_n| \right]    (2)
In formula (2), T denotes the action type; k (0 ≤ k ≤ 19) is the joint point index; i (i = 0, 1, ..., s) is the current sampling frame; s is the end frame at which the corresponding joint point reaches the next specified sampling point; \overrightarrow{J_k^i J_k^{i+1}} is the spatial motion vector of joint point k from the current sampling frame i to the next frame i+1; J_k^i is the spatial coordinate point (x_k^i, y_k^i, z_k^i) of joint point k at frame i; Δt_k^s is the time interval during which joint point k moves along its trajectory from coordinate point J_k^0 to coordinate point J_k^s; and |P_m P_n| is the spatial distance between specified human joint points, used as the proportional verification quantity in the grid model;
a spatial motion vector \overrightarrow{J_k^i J_k^{i+1}} is defined for each joint point to compute the motion direction and trajectory of the limb node; the transfer duration of each sampling step of an action is described by the time interval Δt_k^s = t_k^s − t_k^0, where t_k^0 and t_k^s are the times of the starting and ending sampling frames of joint point k in each group; from formula (2), with J_k^i = (x_k^i, y_k^i, z_k^i) and J_k^{i+1} = (x_k^{i+1}, y_k^{i+1}, z_k^{i+1}), the spatial motion vector \overrightarrow{J_k^i J_k^{i+1}} is expressed as:
\overrightarrow{J_k^i J_k^{i+1}} = (x_k^{i+1} - x_k^i,\ y_k^{i+1} - y_k^i,\ z_k^{i+1} - z_k^i)
In |P_m P_n|, P_m and P_n are the joint points at the two ends of a human limb segment; m and n are the starting and ending index numbers of the joint point set, where m < n; (x_j, y_j, z_j) is the spatial coordinate of the corresponding joint point of the limb segment; and j (m ≤ j ≤ n − 1) is the index variable of the corresponding joint point in the calculation; the fixed spatial distance between limb joint points is then computed as follows:
|P_m P_n| = \sum_{j=m}^{n-1} \sqrt{(x_j - x_{j+1})^2 + (y_j - y_{j+1})^2 + (z_j - z_{j+1})^2}
Various interactive actions can be defined from the limb node feature vector parameters defined above; action types are classified according to the body part and motion characteristics involved, and the limb node feature vectors of three representative actions comprise the feature vector of the right-leg side kick, the feature vector of drawing a circle with the right hand, and the feature vector of the horizontal two-hand spread;
according to formula (2), the limb node feature vector V(T, k) indicating the three representative actions is defined; when joint point k reaches the next specified sampling point, the end frame s is determined and the current feature vector is analyzed as the input parameter of the state transition function; the sampling frame i is then reset to zero and the next end frame is awaited, with the analysis and reset repeating until the last sampling point;
1) for the right-leg side kick, the generic feature data of the right foot joint point (k = 19) are extracted to define the limb node feature vector, where L is the leg length;
2) for the right-hand circle-drawing action, the generic feature data of the right hand joint point (k = 11) are extracted to define the limb node feature vector, where D is the arm length;
3) for the horizontal two-hand spread, the generic feature data of the left/right hand joint points (k = 7, 11) are extracted to define the limb node feature vector;
by analogy, limb node feature vectors are defined for the other limb actions in the same way, and the posture-sequence finite state machine then analyzes the limb node feature vectors to realize action recognition.
4. The posture sequence finite state machine action recognition method according to claim 1, characterized in that, in the construction of the posture sequence finite state machine, a posture sequence finite state machine $\Lambda$ is defined, whose five-tuple representation is given by formula (3);
$$\Lambda = (S, \Sigma, \delta, s_0, F) \qquad (3)$$
In formula (3), $S$ is the state set $\{s_0, s_1, \ldots, s_n, f_0, f_1\}$, describing each specific posture state of an action; $\Sigma$ is the input alphabet consisting of the limb-node feature-vector set and the restriction parameters, in which the symbol "$\lnot$" denotes logical negation; $\delta$ is the transfer function, defined as $S \times \Sigma \rightarrow S$, which moves the posture sequence finite state machine from the current state to a successor state; $s_0$ is the initial state; and $F = \{f_0, f_1\}$ is the final-state set, whose two elements denote the recognition-success state and the recognition-invalid state respectively;
In the alphabet $\Sigma$, the variable $u$ denotes the set of all limb-node feature vectors $V$ corresponding to a given action type; a feature vector expresses the discrete drop-domain rule of the motion trajectory within the spatial grid, and the trajectory regular expression of an action can be constructed from these drop-domain rules;
The path restriction $p = \{(x, y, z) \mid x \in [x_{\min}, x_{\max}],\ y \in [y_{\min}, y_{\max}],\ z \in [z_{\min}, z_{\max}]\}$ constrains the range of the key points of a specific posture; whenever the key point leaves the predefined path domain, that is, $\lnot p$ is true, the state is marked as invalid;
The timestamp $t \in [t_{start}, t_{end}]$ defines the time allowed for the action to transfer from the current state to a successor state; if a state of the action does not transfer to the next valid state within the specified time, that is, $\lnot t$ is true, the machine jumps to the invalid state;
Each action is composed of several typical static postures, and each static posture corresponds to a defined state quantity; every state quantity is computed in the spatial grid from the key-point characteristics, and a state transfer of the action must satisfy the path restriction $p$ and the timestamp $t$, so that the action type is recognized and the user's interaction intention is understood. The attribute characteristics and every transfer step of the posture sequence finite state machine are described by the five-tuple. The machine operates as follows: starting from the initial state $s_0$, the first valid state $s_1$ is reached according to the predefined action; if the posture at the next moment is still within the predefined range, the subsequent valid state $s_k$ is reached, and so on, until the final success state $f_0$ is reached, meaning the action is recognized successfully. From the initial state or any valid state, if the behaviour exceeds the path restriction or the timestamp range, the sequence of actions is marked directly as the invalid state, meaning recognition fails. After any final state is reached, the current posture sequence finite state machine has finished running and is reinitialized to recognize the next group of limb actions.
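Read operationally, the five-tuple and the $p$ / $t$ restrictions map onto a small driver object. The sketch below is one possible reading under stated assumptions: the posture predicates for $s_1 \ldots s_n$, the path test $p$, and the per-state time window are supplied by the caller and are not taken from the patent.

```python
import time

class PostureSequenceFSM:
    """Minimal sketch of the posture sequence finite state machine
    Λ = (S, Σ, δ, s0, F) described above.  The concrete predicates are
    application-defined assumptions, not the patent's definitions."""

    def __init__(self, stage_predicates, path_ok, max_stage_seconds):
        self.stages = stage_predicates        # ordered checks for s1..sn
        self.path_ok = path_ok                # path restriction p
        self.max_seconds = max_stage_seconds  # timestamp window t (simplified)
        self.reset()

    def reset(self):
        self.state = 0                        # initial state s0
        self.stage_start = time.monotonic()
        self.result = None                    # None, "success" (f0) or "invalid" (f1)

    def step(self, feature_vector, key_point):
        if self.result is not None:
            return self.result
        # ¬p: key point left the predefined path domain -> invalid state f1
        if not self.path_ok(key_point):
            self.result = "invalid"
        # ¬t: no transfer to the next valid state in time -> invalid state f1
        elif time.monotonic() - self.stage_start > self.max_seconds:
            self.result = "invalid"
        # u: feature vector satisfies the next posture state -> advance
        elif self.stages[self.state](feature_vector):
            self.state += 1
            self.stage_start = time.monotonic()
            if self.state == len(self.stages):   # final success state f0
                self.result = "success"
        return self.result
```

After either final state is reached, calling reset() would begin recognition of the next group of limb actions, mirroring the reinitialization described above.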
CN201410293405.9A 2014-06-25 2014-06-25 A kind of posture sequence finite state machine action identification method Active CN104317386B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410293405.9A CN104317386B (en) 2014-06-25 2014-06-25 A kind of posture sequence finite state machine action identification method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410293405.9A CN104317386B (en) 2014-06-25 2014-06-25 A kind of posture sequence finite state machine action identification method

Publications (2)

Publication Number Publication Date
CN104317386A true CN104317386A (en) 2015-01-28
CN104317386B CN104317386B (en) 2017-08-04

Family

ID=52372625

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410293405.9A Active CN104317386B (en) 2014-06-25 2014-06-25 A kind of posture sequence finite state machine action identification method

Country Status (1)

Country Link
CN (1) CN104317386B (en)

Cited By (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104765454A (en) * 2015-04-02 2015-07-08 吉林大学 Human muscle movement perception based menu selection method for human-computer interaction interface
CN105512621A (en) * 2015-11-30 2016-04-20 华南理工大学 Kinect-based badminton motion guidance system
CN105534528A (en) * 2015-12-08 2016-05-04 杭州电子科技大学 Non-contact physical test system and test method based on somatosensory recognition
CN106445138A (en) * 2016-09-21 2017-02-22 中国农业大学 Human body posture feature extracting method based on 3D joint point coordinates
CN106485055A (en) * 2016-09-22 2017-03-08 吉林大学 A kind of old type ii diabetes patient moving training system based on Kinect sensor
CN106600626A (en) * 2016-11-01 2017-04-26 中国科学院计算技术研究所 Three-dimensional human body movement capturing method and system
CN106682594A (en) * 2016-12-13 2017-05-17 中国科学院软件研究所 Posture and motion identification method based on dynamic grid coding
CN107080940A (en) * 2017-03-07 2017-08-22 中国农业大学 Body feeling interaction conversion method and device based on depth camera Kinect
CN107203271A (en) * 2017-06-08 2017-09-26 华南理工大学 Both hands recognition methods based on multi-sensor fusion technology
CN107679522A (en) * 2017-10-31 2018-02-09 内江师范学院 Action identification method based on multithread LSTM
CN108475114A (en) * 2015-12-31 2018-08-31 微软技术许可有限责任公司 Feedback for subject poses tracker
CN108475113A (en) * 2015-12-31 2018-08-31 微软技术许可有限责任公司 Use the detection of the hand gestures of posture language centrifugal pump
CN108542021A (en) * 2018-03-18 2018-09-18 江苏特力威信息系统有限公司 A kind of gym suit and limbs measurement method and device based on vitta identification
CN108961867A (en) * 2018-08-06 2018-12-07 南京南奕亭文化传媒有限公司 A kind of digital video interactive based on preschool education
CN110532874A (en) * 2019-07-23 2019-12-03 深圳大学 A kind of generation method, storage medium and the electronic equipment of thingness identification model
CN112101242A (en) * 2020-09-17 2020-12-18 四川轻化工大学 Body action recognition method based on posture sequence state chain
CN112560817A (en) * 2021-02-22 2021-03-26 西南交通大学 Human body action recognition method and device, electronic equipment and storage medium
CN112788390A (en) * 2020-12-25 2021-05-11 深圳市优必选科技股份有限公司 Control method, device, equipment and storage medium based on human-computer interaction
CN114093024A (en) * 2021-09-24 2022-02-25 张哲为 Human body action recognition method, device, equipment and storage medium

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7036094B1 (en) * 1998-08-10 2006-04-25 Cybernet Systems Corporation Behavior recognition system
US6552729B1 (en) * 1999-01-08 2003-04-22 California Institute Of Technology Automatic generation of animation of synthetic characters
CN102500094A (en) * 2011-10-28 2012-06-20 北京航空航天大学 Kinect-based action training method
CN103489000A (en) * 2013-09-18 2014-01-01 柳州市博源环科科技有限公司 Achieving method of human movement recognition training system

Cited By (31)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104765454A (en) * 2015-04-02 2015-07-08 吉林大学 Human muscle movement perception based menu selection method for human-computer interaction interface
CN105512621A (en) * 2015-11-30 2016-04-20 华南理工大学 Kinect-based badminton motion guidance system
CN105512621B (en) * 2015-11-30 2019-04-09 华南理工大学 A kind of shuttlecock action director's system based on Kinect
CN105534528B (en) * 2015-12-08 2018-03-23 杭州电子科技大学 A kind of contactless physical fitness test system and method for testing based on somatosensory recognition
CN105534528A (en) * 2015-12-08 2016-05-04 杭州电子科技大学 Non-contact physical test system and test method based on somatosensory recognition
US11410464B2 (en) 2015-12-31 2022-08-09 Microsoft Technology Licensing, Llc Detection of hand gestures using gesture language discrete values
CN108475114A (en) * 2015-12-31 2018-08-31 微软技术许可有限责任公司 Feedback for subject poses tracker
CN108475113B (en) * 2015-12-31 2021-04-27 微软技术许可有限责任公司 Method, system, and medium for detecting hand gestures of a user
CN113093909A (en) * 2015-12-31 2021-07-09 微软技术许可有限责任公司 Detection of hand gestures using gesture language discrete values
CN108475113A (en) * 2015-12-31 2018-08-31 微软技术许可有限责任公司 Use the detection of the hand gestures of posture language centrifugal pump
CN108475114B (en) * 2015-12-31 2021-04-02 微软技术许可有限责任公司 Feedback for object pose tracker
CN106445138A (en) * 2016-09-21 2017-02-22 中国农业大学 Human body posture feature extracting method based on 3D joint point coordinates
CN106485055A (en) * 2016-09-22 2017-03-08 吉林大学 A kind of old type ii diabetes patient moving training system based on Kinect sensor
CN106600626A (en) * 2016-11-01 2017-04-26 中国科学院计算技术研究所 Three-dimensional human body movement capturing method and system
CN106600626B (en) * 2016-11-01 2020-07-31 中国科学院计算技术研究所 Three-dimensional human motion capture method and system
CN106682594A (en) * 2016-12-13 2017-05-17 中国科学院软件研究所 Posture and motion identification method based on dynamic grid coding
CN107080940A (en) * 2017-03-07 2017-08-22 中国农业大学 Body feeling interaction conversion method and device based on depth camera Kinect
CN107203271B (en) * 2017-06-08 2020-11-24 华南理工大学 Double-hand recognition method based on multi-sensor fusion technology
CN107203271A (en) * 2017-06-08 2017-09-26 华南理工大学 Both hands recognition methods based on multi-sensor fusion technology
CN107679522A (en) * 2017-10-31 2018-02-09 内江师范学院 Action identification method based on multithread LSTM
CN107679522B (en) * 2017-10-31 2020-10-13 内江师范学院 Multi-stream LSTM-based action identification method
CN108542021A (en) * 2018-03-18 2018-09-18 江苏特力威信息系统有限公司 A kind of gym suit and limbs measurement method and device based on vitta identification
CN108961867A (en) * 2018-08-06 2018-12-07 南京南奕亭文化传媒有限公司 A kind of digital video interactive based on preschool education
CN110532874A (en) * 2019-07-23 2019-12-03 深圳大学 A kind of generation method, storage medium and the electronic equipment of thingness identification model
CN110532874B (en) * 2019-07-23 2022-11-11 深圳大学 Object attribute recognition model generation method, storage medium and electronic device
CN112101242A (en) * 2020-09-17 2020-12-18 四川轻化工大学 Body action recognition method based on posture sequence state chain
CN112788390A (en) * 2020-12-25 2021-05-11 深圳市优必选科技股份有限公司 Control method, device, equipment and storage medium based on human-computer interaction
CN112788390B (en) * 2020-12-25 2023-05-23 深圳市优必选科技股份有限公司 Control method, device, equipment and storage medium based on man-machine interaction
CN112560817A (en) * 2021-02-22 2021-03-26 西南交通大学 Human body action recognition method and device, electronic equipment and storage medium
CN112560817B (en) * 2021-02-22 2021-07-06 西南交通大学 Human body action recognition method and device, electronic equipment and storage medium
CN114093024A (en) * 2021-09-24 2022-02-25 张哲为 Human body action recognition method, device, equipment and storage medium

Also Published As

Publication number Publication date
CN104317386B (en) 2017-08-04

Similar Documents

Publication Publication Date Title
CN104317386A (en) Action recognition method of posture sequence finite-state machine
Kumar et al. A multimodal framework for sensor based sign language recognition
Ohn-Bar et al. Joint angles similarities and HOG2 for action recognition
Amor et al. Action recognition using rate-invariant analysis of skeletal shape trajectories
Raheja et al. Robust gesture recognition using Kinect: A comparison between DTW and HMM
Qian et al. Developing a gesture based remote human-robot interaction system using kinect
Gu et al. Human gesture recognition through a kinect sensor
CN103246891B (en) A kind of Chinese Sign Language recognition methods based on Kinect
Barros et al. A dynamic gesture recognition and prediction system using the convexity approach
Fang et al. Dynamic gesture recognition using inertial sensors-based data gloves
Linqin et al. Dynamic hand gesture recognition using RGB-D data for natural human-computer interaction
CN111444488A (en) Identity authentication method based on dynamic gesture
CN113378770A (en) Gesture recognition method, device, equipment, storage medium and program product
Xu et al. Robust hand gesture recognition based on RGB-D Data for natural human–computer interaction
Trigueiros et al. Generic system for human-computer gesture interaction
Jin et al. Human interaction recognition based on transformation of spatial semantics
Niranjani et al. System application control based on Hand gesture using Deep learning
Trigueiros et al. Vision-based gesture recognition system for human-computer interaction
Thomas et al. A comprehensive review on vision based hand gesture recognition technology
Kim et al. Dynamic arm gesture recognition using spherical angle features and hidden markov models
Saykol et al. Posture labeling based gesture classification for Turkish sign language using depth values
Dhamanskar et al. Human computer interaction using hand gestures and voice
Li et al. Feature Point Matching for Human-Computer Interaction Multi-Feature Gesture Recognition Based on Virtual Reality VR Technology
Lu et al. Dynamic hand gesture recognition using HMM-BPNN model
Heickal et al. Real-time 3D full body motion gesture recognition

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant