DE102012025564A1

DE102012025564A1 - Device for recognizing three-dimensional gestures to control e.g. smart phone, has Hidden Markov model (HMM) which executes elementary object positions or movements to identify positioning motion sequences

Info

Publication number: DE102012025564A1
Application number: DE201210025564
Authority: DE
Inventors: Bernd Burchard
Original assignee: Elmos Semiconductor SE
Current assignee: Elmos Semiconductor SE
Priority date: 2012-05-23
Filing date: 2012-05-23
Publication date: 2013-11-28

Abstract

The device has an HMM which is used as a hand model or finger model to execute elementary object positions or movements for identifying positioning motion sequences. A mechanically controllable joint of the HMM model is pivoted about up to three rotational degrees of freedom. An actuator of the HMM model is shifted along one translational degree of freedom. Independent claims are included for the following: (1) a method for transmission of three-dimensional gesture information; and (2) a system for recognizing three-dimensional gestures.

Description

Introduction

The continuous acquisition of gestures, together with their parameterization and assignment parameters, is an important basic function in many applications, especially mobile ones, such as the control of computers, robots, mobile phones and smartphones.

A major problem with many gesture recognition systems is that they rely on processing image signals. This requires cameras and complex image-processing equipment, which consume considerable resources such as computing power, memory and electrical energy. In a mobile phone, however, exactly these resources are very limited. This applies in particular to power consumption, storage space and the lifetime of the processing units.

Alternative systems have lower sensitivity, which degrades recognition performance. Furthermore, these systems are not application-independent, so gesture databases have to be recorded anew, which is laborious and therefore very expensive. In addition, most systems are sensitive to ambient light and other disturbances.

Finally, most application systems have additional sensors or sensing capabilities that should be combined with the operation of a gesture recognition system.

Object of the invention

The object of the invention is to provide a permanently available, speaker-independent and device-independent gesture interface for human-machine interfaces.

To this end, it solves the problem of improving the portability of gesture databases by means of a calibration method and a suitable device. The purpose of the device is to reproduce standard gestures and a standard gesture language repeatably anywhere in the world.

For this purpose, it uses a method according to claims 1 to 6 and a device according to claims 8 to 12.

Description of the basic idea of the invention

The description of the invention is divided into the following parts:

- The actual detection system, in particular an optical measuring system with multidimensional feedback, which is not claimed here and serves only as one example of a suitable calibratable gesture recognition system.
- A smartphone or robot suitable for use with the above detection system.
- An exemplary calibratable processing method and recognition system for determining a gesture recognition result, which the recognition system can carry out fully automatically.
- The method according to the invention, using a device according to the invention, for adapting the detection system and the associated databases to a specific hardware platform.
- A method for recording the device-independent and speaker-independent gesture databases, which is claimed here.
- The basic structure, claimed here, of an associated machine-representable gesture language, using a device according to the invention, for platform-independent control of a smartphone and similar applications in which a recognition system is to be influenced by gestures.

These are described in the following sections.

Exemplary calibratable recognition system

A first exemplary measuring system consists primarily of a typically triangulation-capable optical measuring system, typically with a transmitter system and a receiver system, or with a combined transmitter/receiver system that can switch very quickly between transmission and reception phases. Each transmitter system can consist of several transmitters. Likewise, each receiver system can consist of several receivers. The same applies to the combined transmitter/receiver systems.

To illustrate the mode of operation, the device is first explained using a system with only one transmitter and one receiver, with the aid of the schematic Figure 1. It should be noted that all figures in this document contain only those details that a skilled person needs in order to understand and reproduce the inventive idea.

The exemplary device consists of at least one first transmitter (3) that transmits into a first transmission path consisting of the transmission sections (4) and (6). This transmission path (4, 6) can be divided, for example by an object, into said first transmission section (4) and the second transmission section (6). Further serial and parallel divisions into further serial and parallel transmission sections are conceivable. (This is equivalent to a unidirectional transmission network with one input and one output.)

At the end of the first transmission section (4) there is an interaction region (2) in which the stream of physical quantities (1), for example the location, orientation and surface properties of an object such as a hand, interacts (5) with the signal stream of the transmitter (3) emerging from the first transmission section (4). This interaction (5) changes the properties of the transmission signal of the transmitter (3), for example its amplitude, phase or spectrum. The object thus imprints traces of its properties on the transmitter signal, and these traces are later used to draw conclusions about the object.

After this modification (5), the signal of the transmitter (3) typically passes through the second transmission section (6) and is picked up by a receiver or sensor (7).

There remains the problem of independence from environmental influences. For this purpose, the received signal at the receiver (7) is typically topped up, while still in the physical medium, by a compensation signal so that an almost constant total signal results. To do this, a compensation transmitter (9) sends a corresponding compensation signal over a second transmission path; in the receiver (7) this signal is superimposed on the signal of the transmitter (3) after its passage through the transmission sections (4, 6). For completeness, it should be mentioned that the compensation signal can either be generated in the medium, here for example as light, or be fed directly into the amplifier line as a signal, here for example as an electrical current signal that emulates the photocurrent of a photodiode.

For this purpose, a regulator (8) (labeled "Controller") drives a compensation transmitter (9), which likewise feeds a signal into the receiver (7) via a predefined, typically separate transmission path (10) that is essentially unaffected by the stream of physical influences (1). There the two signals are superimposed, as stated, linearly or nonlinearly. Ideally, however, the receiver (7) is a linear receiver in which the signal of the transmitter (3) (after modification (5)) and the signal of the compensation transmitter (9) are superimposed linearly. If the receiver (7) is not linear, a component proportional to the product of the two superimposed signals is added. Although this case can be handled in the control design, it is not explained further here. A person skilled in the art can, however, readily modify the controller described here so that such nonlinearities are taken into account.
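The compensation principle described above can be illustrated with a minimal sketch, assuming a linear receiver, a unit-amplitude transmitter and a simple integrating controller. The function name, the loop gain and the target level are illustrative assumptions and are not taken from the patent text.

```python
def run_compensation_loop(channel_gains, target=1.0, loop_gain=0.5):
    """Hold the receiver level near `target` by adjusting a compensation signal.

    channel_gains: assumed transmission factor of the path (4, 6) per time step;
                   it changes when an object enters the interaction region (2).
    Returns the compensation amplitude chosen by the controller at each step.
    """
    compensation = 0.0
    history = []
    for gain in channel_gains:
        # Linear receiver (7): transmitter contribution plus compensation signal.
        received = gain * 1.0 + compensation
        # Integrating controller: push the sum back toward the constant target.
        error = target - received
        compensation += loop_gain * error
        history.append(compensation)
    return history


if __name__ == "__main__":
    # The object changes the path gain from 0.3 to 0.5 halfway through.
    trace = run_compensation_loop([0.3] * 60 + [0.5] * 60)
    print(round(trace[59], 6), round(trace[-1], 6))
```

In this simplified model the compensation amplitude settles at the target level minus the path contribution, so a change caused by the object shows up in the controller state while the receiver level stays nearly constant.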

The controller (8) also generates the transmit signal S5 of the transmitter (3). When generating the transmit signal S5, it is important that S5 is orthogonal to the transmit signals of other controllers with respect to the scalar product carried out in this controller (8). This applies if there are further controllers in the system.
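Why orthogonality matters can be shown with a small sketch. It assumes, purely for illustration, that the scalar product is a sample-wise sum over one signal period and that two controllers use Walsh-like sequences of +1 and -1; the sequences and function names are hypothetical.

```python
def scalar_product(a, b):
    """Sample-wise scalar product over one signal period."""
    return sum(x * y for x, y in zip(a, b))


# Hypothetical transmit sequences of two controllers (one period, 8 samples).
S5_CONTROLLER_A = [1, -1, 1, -1, 1, -1, 1, -1]
S5_CONTROLLER_B = [1, 1, -1, -1, 1, 1, -1, -1]


def demodulate(received, own_signal):
    """Recover the amplitude of `own_signal` contained in `received`."""
    return scalar_product(received, own_signal) / scalar_product(own_signal, own_signal)


if __name__ == "__main__":
    # The receiver sees a mixture of both transmitters with different path gains.
    mixture = [0.4 * a + 0.9 * b for a, b in zip(S5_CONTROLLER_A, S5_CONTROLLER_B)]
    print(scalar_product(S5_CONTROLLER_A, S5_CONTROLLER_B))
    print(demodulate(mixture, S5_CONTROLLER_A), demodulate(mixture, S5_CONTROLLER_B))
```

Because the scalar product of the two sequences is zero, each controller recovers only its own path gain from the mixture, and the other transmitter does not disturb the measurement.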

The controller (8) delivers a vectorial output signal (24), which is analyzed in the following. Its dimension and content are chosen so that gesture recognition is optimized.

Various methods, and the devices or system components suited to them, can be used both for analyzing the vectorial signal (24) output by the controller (8) and for synthesizing the test signal.

The result of this analysis, under the control of the controller (8), is a set of parameters that can be supplemented by further parameters obtained, for example, from optional additional sensors (37), touchpads, switches, pushbuttons and other measuring systems (37). The result is a so-called quantization vector whose components, the measured parameters, will generally not be completely independent of one another. Each parameter on its own usually has too little selectivity for precise gesture discrimination in complex contexts. By forming one or more such quantization vectors from the continuous stream of analog physical parameter values (24) at typically regular time intervals through the physical interface (23) (see Figure 1) by suitable means, a multidimensional parameter data stream (24) that is quantized in time and value is produced.

In a first exemplary processing step, this multidimensional signal (24), in the form of one or more streams of quantization vectors, is divided into individual frames of defined length, filtered, normalized, then orthogonalized and, where appropriate, suitably warped by a nonlinear mapping, for example by taking logarithms and cepstral analysis. Derivatives of the values generated in this way are also formed here. This procedure has long been known as feature extraction (11) in pattern recognition, specifically in speech recognition and material recognition. The possibilities are manifold and can be found in the literature. Finally, the multidimensional quantization vector is multiplied by a so-called LDA matrix.
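The processing chain described above can be sketched as follows. This is a deliberately reduced illustration, assuming non-overlapping frames, a per-frame mean as the only filter, a logarithm as the nonlinear mapping and a first difference as the derivative; orthogonalization and cepstral analysis are omitted. All function names are hypothetical, and the LDA matrix is simply passed in.

```python
import math


def split_into_frames(stream, frame_len):
    """Cut a list of quantization vectors into non-overlapping frames."""
    return [stream[i:i + frame_len]
            for i in range(0, len(stream) - frame_len + 1, frame_len)]


def frame_features(frame):
    """Per-dimension mean of one frame, compressed with a logarithm."""
    dim = len(frame[0])
    means = [sum(vec[d] for vec in frame) / len(frame) for d in range(dim)]
    # log1p is invertible for non-negative inputs and maps 0 to 0.
    return [math.log1p(m) for m in means]


def add_deltas(features):
    """Append the frame-to-frame difference (a discrete derivative)."""
    out, prev = [], features[0]
    for cur in features:
        out.append(cur + [c - p for c, p in zip(cur, prev)])
        prev = cur
    return out


def apply_matrix(matrix, vector):
    """Matrix-vector product, used here for the LDA multiplication."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def extract_features(stream, frame_len, lda_matrix):
    """Frame, compress, differentiate, then project with the LDA matrix."""
    frames = split_into_frames(stream, frame_len)
    feats = add_deltas([frame_features(f) for f in frames])
    return [apply_matrix(lda_matrix, f) for f in feats]
```

Each output vector has as many components as the LDA matrix has rows, so the matrix also determines the dimension of the feature space used by the recognizer.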

The recognition step that now follows can be carried out, for example, using one of the following methods:

a) a neural network,
b) an HMM recognizer, or
c) a Petri net.

First, the HMM recognizer (Figure 1) is described:
With the aid of said predefined LDA matrix (14), the data stream (38) modified in this way is mapped from the multidimensional input parameter space onto a new parameter space in which its selectivity is maximal. The components of the resulting transformed feature vectors are not chosen according to real physical or other parameters, but according to maximum significance, which yields said maximum selectivity.

As a rule, the LDA matrix has been computed offline beforehand by the training (17) on the basis of example data streams from the database (18) containing known gesture data sets. In the case of device-independent gesture recognition, the method now departs from the standard procedure in one decisive step (Figure 2):
If care is taken that all elements of the feature extraction (11) consist of at least locally invertible functions, deviations in geometry and the like can be accounted for in the form of an approximately linear transformation function.

Therefore, once the LDA matrix (14) has been determined for a geometric arrangement of transmitters and receivers in a well-defined application, it can be adapted to the geometry of the specific device by simple matrix multiplication with a device/application matrix (26) (Figure 2). This yields a device-independent and speaker-independent gesture recognizer that can easily be moved from device to device, for example from a type of game console to a smartphone.
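The adaptation by matrix multiplication can be sketched as follows. The sketch assumes that the device/application matrix (26) maps vectors measured on the new device into the coordinates of the reference device, so that it is applied before the LDA matrix (14); the multiplication order and all numeric values are assumptions chosen for illustration.

```python
def matmul(a, b):
    """Product of two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def apply_matrix(matrix, vector):
    """Matrix-vector product."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


# Hypothetical reference LDA matrix (14): 3 input parameters -> 2 features.
LDA_REFERENCE = [[1, 0, -1],
                 [0, 2, 1]]

# Hypothetical device/application matrix (26) for a new sensor geometry.
DEVICE_MATRIX = [[2, 0, 0],
                 [0, 1, 0],
                 [0, 0, 4]]

# Combined once, offline; at run time only one multiplication is needed.
LDA_ADAPTED = matmul(LDA_REFERENCE, DEVICE_MATRIX)

if __name__ == "__main__":
    measured_on_new_device = [1, 2, 3]
    two_step = apply_matrix(LDA_REFERENCE,
                            apply_matrix(DEVICE_MATRIX, measured_on_new_device))
    one_step = apply_matrix(LDA_ADAPTED, measured_on_new_device)
    print(two_step, one_step)
```

Precomputing the product means the recognizer on the new device pays no extra run-time cost compared with the reference device.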

This matrix (26) can be determined, for example, by having a number of test users perform predefined, standardized test gestures for the specific gesture recognition application. Since the gestures are known, a simple linear transformation between the new vectors and the vectors of the database (18) previously recorded on another device can then be computed by a corresponding training tool (25).
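One way such a training tool could compute the linear transformation is an ordinary least-squares fit over paired vectors of the same known gestures. The patent text does not specify the fitting method, so the approach below, including the normal-equation solution and all function names, is an assumption made for illustration.

```python
def transpose(m):
    """Swap rows and columns of a matrix given as a list of rows."""
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    """Product of two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def invert(m):
    """Gauss-Jordan inversion with partial pivoting for a small square matrix."""
    n = len(m)
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("test gestures do not span the parameter space")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def fit_device_matrix(new_vectors, reference_vectors):
    """Least-squares M with reference ~ M * new for every paired vector.

    Both arguments are lists of vectors; entry i of each list belongs to the
    same known test gesture. Solves M = Y X^T (X X^T)^-1 with samples as columns.
    """
    x = transpose(new_vectors)
    y = transpose(reference_vectors)
    return matmul(matmul(y, new_vectors), invert(matmul(x, new_vectors)))
```

The fit needs at least as many linearly independent test gestures as the vectors have dimensions; otherwise the inversion raises an error.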

In order to make the test gestures reproducible worldwide, it makes sense to have them performed in a standardized manner by suitable devices (e.g. motorized dolls or robots) (Figure 9). This is the core of the invention claimed here. As a result, these artificial gestures can be compared exactly with the stored prototypes of these robot-based reference gestures. The recognition result can be quantified and used for measurement, test, verification and calibration purposes, which is also claimed. This is explained with reference to Figure 11. It shows an exemplary mobile phone (33) above which a model hand (32) is located. This model hand can be, for example, the hand of a mannequin. The hand is moved by a segmented robot arm (35). Between the segments (36), whose length is (translationally) variable, there are joints (35) that permit rotation about one, two or three rotational degrees of freedom. It is useful for translation to be possible in three spatial directions. Figure 9 does not show exactly how the hand is moved, that is, by which motor, but only the principle by which the standard gestures (normalization gestures) are recorded. If the size of the hand (32) and its shape, color, etc. are standardized, and if its orientation and distance to the mobile phone (33), the lighting, the reflective properties of the hand itself and of the enclosure in which the entire device and the mobile phone (33) are located, and the movement, indicated here by a swipe movement (34), are defined, then the gesture recognition result of such a defined movement (34) can be predicted and thus compared with the result actually obtained.

The result of this calculation is an application-specific LDA matrix (26) (see also Figure 2). The advantage is that the data stream (38) leaving the feature extraction is speaker-independent and largely device-independent. Thus, with the aid of a standardized mechanical device, a gesture recognizer can be recalibrated when the hardware platform is changed.

The main economic advantage of this method for transferring a gesture database from one device type to another is that the expensive recording of the gesture databases with real human gesture performers has to be carried out only once. These costs can amount to up to 1 million euros per application.

The other corresponding methods are known from speech and material recognition. At the same time, the prototypes are computed from these example data streams in the database (18), in the coordinates of the new parameter space, and stored in a prototype database (15). In addition to these statistical data, this database can also contain instructions for a computer system specifying what is to happen when the respective prototype is recognized successfully or when recognition fails. As a rule, the computer system will be the one (e.g. a mobile phone) whose human-machine interface (referred to below as HMI) is to be provided by gesture recognition based on the method according to the invention.

The pattern vectors (38) output by the feature extraction (11) are now compared in the emission calculation (12) with these previously stored, i.e. learned, gesture prototypes (15), for example by computing the Euclidean distance between a quantization vector in the coordinates of the new parameter space and all of these previously stored prototypes (15). At least two recognition tasks are performed:

1. Does the detected quantization vector (38) correspond to one of the pre-stored quantization vector prototypes (= previously known gestures) or not, and with what probability and reliability?
2. If it is one of the already stored gestures, which one is it, and with what probability and reliability?

To accomplish the first task, dummy prototypes are usually also stored in the prototype database (15); these should cover, as far as possible, all parasitic parameter combinations that occur during operation. Said prototypes are stored in a database (15), which is also referred to below as the code book (15).

The minimum known Euclidean distance between two quantization vectors of two different elementary-gesture prototypes, in the new coordinates, is also stored in halved form in the database/code book (15). If the distance to a prototype from the code book falls below this half minimum distance, that prototype is considered recognized. From that point on, it can be ruled out that distances to other prototypes of the code book (15), calculated in the course of a continued search, could be any smaller.

The minimum Euclidean distance is calculated according to the following formula:
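Judging from the symbol definitions that follow, the distance appears to take the usual Euclidean form below. This is a reconstruction from those definitions rather than a verbatim formula, and the explicit minimum over Cb_cnt is an assumption:

```latex
\mathrm{Dist}_{FV\_CbE} = \min_{Cb\_cnt} \sqrt{\sum_{dim\_cnt = 1}^{dim} \left( FV_{dim\_cnt} - Cb_{Cb\_cnt,\,dim\_cnt} \right)^{2}}
```

Because the square root is monotonic, comparing squared distances would select the same code-book entry.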

Here, dim_cnt stands for the dimension index, which runs up to the maximum dimension dim of the feature vector (38).

FV_{dim_cnt} stands for the component of the feature vector (38) with index dim_cnt.

Cb_cnt stands for the number of the entry in the code book (15), i.e. for the elementary gesture corresponding to Cb_cnt.

Cb_{Cb_cnt,dim_cnt} accordingly stands for the component with index dim_cnt of the prototypical code-book feature-vector entry assigned to the elementary gesture corresponding to Cb_cnt.

Dist_{FV_CbE} therefore stands for the minimum Euclidean distance obtained. During the search for the smallest Euclidean distance, the number Cb_cnt that produces the smallest distance is retained.
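The search described by these definitions can be sketched in Python as follows. This is a minimal illustration, not the assembler code of the description; the function name `nearest_codebook_entry` and the sample code book are hypothetical.

```python
import math

def nearest_codebook_entry(fv, codebook):
    """Return (cb_cnt, dist) for the code-book entry closest to fv.

    fv is the feature vector; codebook is a list of prototype vectors
    of the same dimension. Both are illustrative assumptions.
    """
    best_cnt, best_dist = -1, math.inf
    for cb_cnt, entry in enumerate(codebook):
        # Sum of squared component differences over all dimensions.
        sq = 0.0
        for dim_cnt in range(len(fv)):
            diff = fv[dim_cnt] - entry[dim_cnt]
            sq += diff * diff
        dist = math.sqrt(sq)
        # Remember the entry number that produces the smallest distance.
        if dist < best_dist:
            best_cnt, best_dist = cb_cnt, dist
    return best_cnt, best_dist

# Hypothetical two-dimensional code book with three prototypes.
codebook = [(0.0, 0.0), (3.0, 4.0), (10.0, 0.0)]
index, dist = nearest_codebook_entry((3.0, 3.0), codebook)
print(index, dist)  # 1 1.0
```

The inner loop over dim_cnt and the outer loop over Cb_cnt are what make the effort grow with the product of code-book size and vector dimension.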

For clarity, an exemplary assembler code is given:

The confidence measure for a correct recognition is derived from the scatter of the underlying base data streams for a prototype and from the distance of the quantization vector from their centroid.

Fig. 10 illustrates various recognition cases. For simplicity, the illustration uses a two-dimensional feature vector so that the method can be shown on a two-dimensional sheet of paper. In reality, the feature vectors (38) are typically multidimensional.

The centroids of various prototypes (41, 42, 43, 44) are drawn in. As described above, half the minimum distance between these prototypes can be stored in the code book (15). This would then be a global parameter that applies equally to all prototypes. However, a decision based on the minimum distance presupposes that the scatters of the elementary-gesture prototypes (41, 42, 43, 44) are more or less identical. This is the case if the feature extraction is optimal. It would correspond to a circle around each prototype with a radius specific to the gesture database.

In reality, however, this can rarely be achieved. Recognition performance can therefore be improved if the scatter width is stored individually for the prototype of each elementary gesture. This would correspond to a circle around each prototype with a prototype-specific radius. The disadvantage is an increase in the required computing power.

Recognition performance can be improved further if the scatter region of each elementary-gesture prototype is modeled by an ellipse. For this, instead of the radius used before, the principal-axis diameters of the scatter ellipse and its tilt relative to the coordinate system must be stored. The disadvantage is a further, massive increase in the required computing power and memory.

Of course, the calculation can be made even more elaborate, but as a rule this only increases the effort massively without significantly improving recognition performance.

Compact recognizers for mobile devices will therefore typically fall back on the simplest of the options described.

The position of the gesture vectors determined by the emission calculation can vary widely. For example, such a gesture vector (46) may lie too far from every gesture prototype. The distance threshold for this decision can be, for example, half the minimum prototype distance. It is also possible that the scatter regions of two prototypes (43, 42) overlap and that a determined gesture vector (45) lies in the overlap region. In this case, a hypothesis list would contain both prototypes with different probabilities, because the distances differ.

In the best case, the determined gesture vector (48) lies within the scatter region (threshold ellipsoid) (47) of a single prototype (41), which is then reliably recognized.
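The three cases (rejected, ambiguous, uniquely recognized) can be illustrated with a small Python sketch. It assumes the intermediate variant with a circular scatter region and a prototype-specific radius; the coordinates, radii and the function name `hypothesis_list` are hypothetical, and the keys merely borrow the reference numerals of the prototypes.

```python
import math

def hypothesis_list(vec, prototypes):
    """Return [(name, distance), ...] for every prototype whose scatter
    circle contains vec, sorted with the nearest prototype first.

    prototypes maps a name to (centroid, radius); all values are
    illustrative assumptions.
    """
    hits = []
    for name, (centroid, radius) in prototypes.items():
        d = math.dist(vec, centroid)
        if d <= radius:
            hits.append((name, d))
    return sorted(hits, key=lambda h: h[1])

prototypes = {
    "41": ((0.0, 0.0), 1.0),
    "42": ((5.0, 0.0), 1.5),
    "43": ((7.0, 0.0), 1.5),
}

unique = hypothesis_list((0.2, 0.1), prototypes)       # inside 41 only
ambiguous = hypothesis_list((6.2, 0.0), prototypes)    # overlap of 42 and 43
rejected = hypothesis_list((20.0, 20.0), prototypes)   # outside every region

print([name for name, _ in unique])     # ['41']
print([name for name, _ in ambiguous])  # ['43', '42']
print(rejected)                         # []
```

An empty list corresponds to a vector that is too far from every prototype, while a list with several entries corresponds to the overlap case, where the distances rank the hypotheses.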

To model the scatter region of a single gesture more accurately, it is conceivable to represent it by several prototypes with associated scatter regions. Several prototypes can thus represent the same gesture. The danger is that, because the probability of a gesture is divided among several such sub-prototypes, the probability of each individual sub-prototype can become smaller than that of another gesture whose probability was lower than that of the original gesture. This other gesture may then prevail incorrectly.
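A small numeric illustration of this effect, with made-up probabilities: a gesture with probability 0.6 split evenly over two sub-prototypes loses against a competing gesture with probability 0.4 if only individual prototypes are compared. Summing the sub-prototype probabilities per gesture, shown here as one possible remedy that is not taken from the description, restores the expected ranking. The gesture names and the function `best_gesture` are hypothetical.

```python
from collections import defaultdict

def best_gesture(sub_scores, merge):
    """Pick the winning gesture from (gesture, probability) pairs.

    With merge=False each (sub-)prototype competes on its own;
    with merge=True the probabilities of sub-prototypes that belong
    to the same gesture are summed first.
    """
    if not merge:
        return max(sub_scores, key=lambda s: s[1])[0]
    totals = defaultdict(float)
    for gesture, p in sub_scores:
        totals[gesture] += p
    return max(totals, key=totals.get)

# Hypothetical scores: "swipe" is split into two sub-prototypes.
scores = [("swipe", 0.3), ("swipe", 0.3), ("tap", 0.4)]

print(best_gesture(scores, merge=False))  # tap
print(best_gesture(scores, merge=True))   # swipe
```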

A major problem is thus the computing power that must be made available to recognize the elementary gestures reliably. This is discussed in somewhat more detail below:
A crucial point is that the computational effort increases with Cb_anz · dim.

In a non-optimized HMM recognizer, about 8 assembler instructions must be executed to process one vector component.

The number A_Abst of assembler steps needed to calculate the distance between a single code-book entry (CbE) and a single feature vector (FV) is approximately: A_Abst = FV_Dimension · 8 + 8

This gives the number A_CB of assembler steps for determining the code-book entry with the smallest distance: A_CB = Cb_anz · A_Abst + 4 = Cb_anz · (FV_Dimension · 8 + 8) + 4

For a medium-sized HMM recognizer with 50000 CbE (code-book entries = Cb_anz) and 24 feature-vector dimensions (= FV_Dimension), the number of steps is: 50000 · (24 · 8 + 8) + 4 ≈ 10 million operations per feature vector (24)

Even at a relatively low sampling rate of 8 kHz = 8000 FV per second (feature vectors (24) per second), this already requires a computing power of about 80 GIPS (80 billion instructions per second), since 8000 · 10 million = 80 billion.
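The arithmetic of this example can be checked with a few lines of Python. The function names are hypothetical; the constants (8 steps per component, 8 steps of overhead per entry, 4 steps per search) are the approximate figures given above.

```python
def steps_per_distance(fv_dimension, steps_per_component=8, overhead=8):
    """A_Abst: assembler steps for one code-book entry."""
    return fv_dimension * steps_per_component + overhead

def steps_per_search(cb_anz, fv_dimension):
    """A_CB: assembler steps for a full scan of the code book."""
    return cb_anz * steps_per_distance(fv_dimension) + 4

per_vector = steps_per_search(50_000, 24)
per_second = per_vector * 8_000  # 8000 feature vectors per second

print(per_vector)  # 10000004
print(per_second)  # 80000032000
```

The product of 8000 feature vectors per second and roughly 10 million steps per vector is about 8 · 10^10 instructions per second.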

Particularly for energy-autonomous and/or mobile systems, even this simple example leads to a resource consumption that is not acceptable.

In an optimized HMM recognizer, as already mentioned, the smallest code-book distance between two prototypes is therefore precomputed and stored in the code book (15). The advantage is that the search can be aborted as soon as a distance is found that is smaller than half of this minimum code-book distance. This halves the mean search time. Further optimization is possible if the code book is sorted by the statistical frequency of the elementary gestures. This ensures that the most frequent elementary gestures are found immediately, which shortens the computing time further.
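A minimal Python sketch of the early-exit search, under the assumption that the code book is a plain list of prototype vectors: by the triangle inequality, once a prototype lies closer than half the minimum prototype spacing, no other prototype can be closer, so the scan may stop. The helper names and the sample data are hypothetical.

```python
import math
from itertools import combinations

def half_min_prototype_distance(codebook):
    """Precompute half the smallest distance between any two prototypes."""
    return min(math.dist(a, b) for a, b in combinations(codebook, 2)) / 2.0

def search_with_early_exit(fv, codebook, threshold):
    """Return (cb_cnt, dist, visited); stop once dist < threshold."""
    best_cnt, best_dist, visited = -1, math.inf, 0
    for cb_cnt, entry in enumerate(codebook):
        visited += 1
        dist = math.dist(fv, entry)
        if dist < best_dist:
            best_cnt, best_dist = cb_cnt, dist
        if dist < threshold:
            break  # no remaining entry can be closer than this one
    return best_cnt, best_dist, visited

# Hypothetical code book; in practice it would be sorted by gesture frequency.
codebook = [(0.0, 0.0), (4.0, 0.0), (0.0, 4.0), (4.0, 4.0)]
threshold = half_min_prototype_distance(codebook)

print(threshold)                                                    # 2.0
print(search_with_early_exit((0.5, 0.5), codebook, threshold)[2])   # 1
print(search_with_early_exit((3.9, 4.2), codebook, threshold)[2])   # 4
```

The number of visited entries depends on where the matching prototype sits in the list, which is why sorting by frequency of occurrence shortens the average search.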

For an HMM recognizer optimized in this way, the computing power requirement is as follows:
Again, 8 steps are needed to process one vector component. The number of steps A_Abst for calculating the distance between a code-book entry (CbE) and the feature vector (FV) is again: A_Abst = FV_Dimension · 8 + 8

With the optimization, the number of steps A_CB for determining the code-book entry with the smallest distance is slightly higher: A_CB = Cb_anz · (A_Abst + 2) + 4 = Cb_anz · (FV_Dimension · 8 + 10) + 4

The two additional assembler instructions are needed to check whether the distance is below half the smallest distance between code-book entries.

Furthermore, the number Cb_anz of code-book entries is limited to 4000 or fewer for mobile and energy-autonomous applications.

In addition, the number of feature vectors per second is reduced by lowering the sampling rate.

This is illustrated by a simple example:
The medium-sized HMM recognizer mentioned above is now operated with a code book containing just under a tenth of the entries, e.g. 4000 CbE, and still with 24 FV dimensions.

The number of steps is now 4000 · (24 · 8 + 10) + 4 = 808004 operations per feature vector

The feature-vector rate is lowered to 100 FV per second (feature vectors per second), extracted from 10 ms parameter windows of, for example, 80 samples each, and the search is aborted as soon as the distance falls below half the smallest distance between code-book entries. With suitable code-book sorting, this at least halves the effort.

The required computing power then drops to < 33 DSP Mips (33 million operations per second). In practice, code-book sorting leads to even lower computing power requirements of, for example, 30 Mips. This makes the system real-time capable and allows it to be integrated into a single IC.
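For orientation, the step count of the optimized example can be recomputed as follows. The function name is hypothetical, and the factor of two is the "at least a halving" stated above; any saving beyond that would depend on how well the code-book sorting matches the actual gesture statistics.

```python
def optimized_steps_per_search(cb_anz, fv_dimension):
    """A_CB with the threshold check: 8 steps per component, 8 steps of
    overhead and 2 extra steps per entry, plus 4 steps per search."""
    return cb_anz * (fv_dimension * 8 + 10) + 4

per_vector = optimized_steps_per_search(4_000, 24)
full_scan_mips = per_vector * 100 / 1e6  # 100 feature vectors per second
halved_mips = full_scan_mips / 2         # search aborted halfway on average

print(per_vector)                # 808004
print(round(full_scan_mips, 1))  # 80.8
print(round(halved_mips, 1))     # 40.4
```

On these assumptions a full scan costs about 81 Mips and a halved search about 40 Mips, so the lower figures quoted above would rely on the additional gain from code-book sorting.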

The search space can be restricted by preselection. A prerequisite for this is a uniform distribution of the data, i.e. the centroids of the quadrants lie at the geometric quadrant centers.

The necessary reduction of the code book (15) has advantages and disadvantages:
Reducing the code book (15) increases both the false acceptance rate (FAR), i.e. the proportion of wrong gestures that are accepted as gestures, and the false rejection rate (FRR), i.e. the proportion of correctly executed gestures that are not recognized.

On the other hand, it reduces the resource requirements (computing power, chip area, memory, power consumption, etc.).

In addition, the history, i.e. the previously recognized gestures, can be taken into account when forming hypotheses. A suitable model for this is the so-called hidden Markov model (HMM).

For each gesture prototype, a confidence measure and a distance to the measured quantization vector can thus be derived. The system can use these directly (21) or process them further. It is also useful to output a hypothesis list for the recognized prototypes of each frame, containing, for example, the ten most probable gesture prototypes of that frame together with the respective probability and reliability of the recognition.

Together with these data (21), instructions or executable code correlated with the respective prototype can also be output to the said host computer system (e.g. a mobile telephone); these may, for example, have been stored in the prototype database (21) or elsewhere beforehand.

Since not every temporal and spatial gesture sequence is meaningful, the temporal succession of the hypothesis lists of consecutive frames can be evaluated.

The task here is to find the path of elementary gestures through the successive hypothesis lists that has the highest probability.

Here again, at least two recognitions are performed:

1. Is the most probable elementary-gesture sequence one of the already stored sequence prototypes or not, and with what probability and reliability?
2. If it is one of the already stored sequences, which one is it, and with what probability and reliability?

For this purpose, a sequence prototype can either be entered fully automatically into a sequence lexicon (16) by a training program (20), or manually via a type-in tool (19) that allows these sequence prototypes to be entered from the keyboard. With suitable standardization, it is also conceivable to download individual gesture sequences, for example from the Internet.

By means of a Viterbi decoder (13), the most probable of the predefined sequences for a series of elementary gestures, or more precisely of the recognized gesture prototypes, can be determined from the succession of hypothesis lists. This holds in particular even if individual elementary gestures were recognized incorrectly because of measurement errors. In this respect, it is very useful for the Viterbi sequence recognition (13) to take over successions of gesture hypothesis lists (39) from the emission calculation (12), as described above. The result is the elementary-gesture sequence (22) recognized as most probable or, analogously to the emission calculation, a sequence hypothesis list. Here too, instructions to the said computer unit, for example a mobile telephone, specifying what is to happen can be linked to the gesture sequence recognized as most probable, either in the sequence database (16) or elsewhere. Where appropriate, a warning message is issued to the user, for example.

Finally, we consider the functional software components of the gesture-sequence recognition engine. In Fig. 1 it is entered as the Viterbi search (13). This search (13) accesses the sequence lexicon (16). The sequence lexicon (16) is fed on the one hand by a training tool (20) and on the other hand by a tool (19) with which these elementary-gesture sequences can be defined by textual input. The possibility of a download has already been mentioned.

The sequence recognition (19) is based on the hidden Markov model. The model is built from various states. In the example shown in Fig. 23, these states are symbolized by numbered circles, numbered from 1 to 6. Transitions exist between the states. In Fig. 23, these transitions are denoted by the letter a and two indices i, j. The first index i denotes the number of the source node, the second index j the number of the destination node. Besides the transitions between two different nodes, there are also transitions $a_{ii}$ or $a_{jj}$ that lead back to the starting node. In addition, there are transitions that make it possible to skip nodes. From the sequence, there results in each case a probability of actually observing a k-th observable $b_k$. This yields sequences of observables that are observed with precomputable probabilities $b_k$.

It is important that every hidden Markov model consists of unobservable states $q^i$. Between two states $q^i$ and $q^j$ there is a transition probability $a_{ij}$. The probability p of the transition from $q^i$ to $q^j$ can thus be written as: $p(q_n^j \mid q_{n-1}^i) \equiv a_{ij}$

Here, n stands for a discrete point in time. The transition thus takes place between step n with state $q^i$ and step n + 1 with state $q^j$.

The emission distribution $b_i(Ge)$ depends on the state $q^i$. As already explained, this is the probability of observing the elementary gesture Ge (the observable) when the system (hidden Markov model) is in state $q^i$: $p(Ge \mid q^i) \equiv b_i(Ge)$

To be able to start the system, the initial states must be defined. This is done by a probability vector $\pi_i$. It then follows that a state $q^i$ is an initial state with probability $\pi_i$: $p(q_1^i) \equiv \pi_i$
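The three parameter sets introduced so far (transition probabilities $a_{ij}$, emission distributions $b_i(Ge)$ and the initial vector $\pi_i$) could be held in a small data structure such as the following Python sketch. The class name `GestureHMM`, the gesture labels and all numbers are hypothetical; the example uses a left-to-right topology with self-transitions and one skip transition, in the spirit of the model described above.

```python
from dataclasses import dataclass

@dataclass
class GestureHMM:
    a: list   # a[i][j]: probability of moving from state i to state j
    b: list   # b[i][ge]: probability of emitting gesture ge in state i
    pi: list  # pi[i]: probability that state i is the initial state

    def is_consistent(self, tol=1e-9):
        """Check that every probability distribution sums to one."""
        rows = list(self.a) + [list(e.values()) for e in self.b] + [self.pi]
        return all(abs(sum(row) - 1.0) <= tol for row in rows)

# Hypothetical three-state model over two elementary gestures.
model = GestureHMM(
    a=[[0.6, 0.3, 0.1],   # stay, advance, skip one state
       [0.0, 0.7, 0.3],
       [0.0, 0.0, 1.0]],
    b=[{"left": 0.9, "up": 0.1},
       {"left": 0.2, "up": 0.8},
       {"left": 0.5, "up": 0.5}],
    pi=[1.0, 0.0, 0.0],
)

print(model.is_consistent())  # True
```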

It is important that a new model must be created for each gesture sequence. In a model M, the observation probability is to be determined for an observation sequence of elementary gestures $\vec{Ge} = (Ge_1, Ge_2, \ldots, Ge_N)$.

This corresponds to a state sequence that is not directly observable: $\vec{Q} = (q_1, q_2, \ldots, q_N)$

The probability p, which depends on the model M, the state sequence Q and the observation sequence Ge, of observing this state sequence Q is:

This gives the probability of a state sequence $\vec{Q} = (q_1, q_2, \ldots, q_N)$ in the model M:

This gives the probability of recognizing a gesture word, i.e. a gesture sequence:

The most probable word model (gesture sequence) for the observed emission Ge is determined by this summation of the individual probabilities over all possible paths $Q_k$ that lead to the observed gesture sequence Ge.
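In the standard hidden Markov formulation, which the three preceding statements appear to follow, these probabilities take the form below. The notation is an assumption built from the symbols $a_{ij}$, $b_i(Ge)$ and $\pi_i$ defined earlier, with $q_n$ denoting the state occupied at step n:

```latex
\begin{align}
p(\vec{Ge} \mid \vec{Q}, M) &= \prod_{n=1}^{N} b_{q_n}(Ge_n) \\
p(\vec{Q} \mid M) &= \pi_{q_1} \prod_{n=2}^{N} a_{q_{n-1} q_n} \\
p(\vec{Ge} \mid M) &= \sum_{k} p(\vec{Ge} \mid \vec{Q}_k, M)\; p(\vec{Q}_k \mid M)
\end{align}
```

The last line is the sum over all paths $Q_k$; the number of such paths grows exponentially with the sequence length N, which is why a direct evaluation is rarely practical.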

Summing over all possible paths Q is problematic because of the computational effort it can require. As a rule, the summation is therefore terminated very early. Some recognizers use only the most probable path $Q_k$. This is discussed below.

The calculation is carried out recursively. The probability $a_n(i)$ of observing the system in state $q^i$ at time n can be calculated as follows:

Hierbei wird über alle S möglichen Pfade summiert, die in den Zustand qⁱ⁺¹ hineinführenThis sums over all S possible paths that lead into the state q ^{i + 1}
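The recursive computation can be sketched in Python as the standard forward algorithm, in which the list `alpha` plays the role of $a_n(i)$. The function name, the table layout (`INITIAL`, `TRANSITION`, `EMISSION`) and the two-state numbers are illustrative assumptions, not values from this document.

```python
def forward_probability(initial, transition, emission, observations):
    """Return p(observations | model), summed over all state paths."""
    num_states = len(initial)
    # alpha[i]: probability of the observations so far, ending in state i.
    alpha = [initial[i] * emission[i][observations[0]] for i in range(num_states)]
    for symbol in observations[1:]:
        # Sum over every predecessor state i that leads into state j.
        alpha = [
            sum(alpha[i] * transition[i][j] for i in range(num_states))
            * emission[j][symbol]
            for j in range(num_states)
        ]
    return sum(alpha)


# Hypothetical two-state model with two observation symbols (0 and 1).
INITIAL = [0.6, 0.4]
TRANSITION = [[0.7, 0.3], [0.4, 0.6]]
EMISSION = [[0.5, 0.5], [0.1, 0.9]]

probability = forward_probability(INITIAL, TRANSITION, EMISSION, [0, 1, 1])
```

Each step costs on the order of the squared number of states, so the total effort grows linearly with the sequence length instead of exponentially.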

It is assumed here that the overall probability of reaching the state $q^{1}_{n+i}$ is dominated by the best path. The sum can then be simplified with little error.

Tracing back from the last state then yields the best path.
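A minimal Python sketch of this best-path search is the standard Viterbi algorithm, in which the sum over predecessor states is replaced by a maximum and the best path is recovered by backtracking. It works with logarithms, so the path probability becomes a sum, and it assumes that no probability in the tables is zero. The function name, table layout and numbers are illustrative assumptions, not values from this document.

```python
import math


def viterbi(initial, transition, emission, observations):
    """Return (best state path, log probability of that path)."""
    num_states = len(initial)
    score = [
        math.log(initial[i]) + math.log(emission[i][observations[0]])
        for i in range(num_states)
    ]
    backpointers = []
    for symbol in observations[1:]:
        new_score, pointers = [], []
        for j in range(num_states):
            # Keep only the best predecessor instead of summing over all.
            best_i = max(
                range(num_states),
                key=lambda i: score[i] + math.log(transition[i][j]),
            )
            pointers.append(best_i)
            new_score.append(
                score[best_i]
                + math.log(transition[best_i][j])
                + math.log(emission[j][symbol])
            )
        score = new_score
        backpointers.append(pointers)
    # Trace back from the best final state.
    last = max(range(num_states), key=lambda i: score[i])
    path = [last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, score[last]


# Hypothetical two-state model with two observation symbols (0 and 1).
INITIAL = [0.6, 0.4]
TRANSITION = [[0.7, 0.3], [0.4, 0.6]]
EMISSION = [[0.5, 0.5], [0.1, 0.9]]

best_path, log_probability = viterbi(INITIAL, TRANSITION, EMISSION, [0, 1, 1])
```

For this toy model the best path is `[0, 1, 1]` with probability 0.04374.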

The probability of this path is a product. A logarithmic calculation therefore reduces the problem to a pure summation. The probability of recognizing a word, which amounts to recognizing a model $M_j$, corresponds to determining the most probable word model for the observed emission X. This determination is now made exclusively via the best possible path $Q_{best}$.

It thus becomes:

It is of particular importance that the Prototype Book (15) contains only elementary gestures.

FIGS. 12 to 22 show some of the possible exemplary elementary gestures above a sketched smart phone as an exemplary device. These gestures can be performed with a mechanical device according to the invention.

The figures show gestures that are performed with a model hand by the device according to the invention. These can of course also be combined into two-handed gestures of two model hands. It is certainly also conceivable to perform these gestures with models of other body parts. For example, the circling movement of the model hand above the smart phone could correspond to a circling movement of a model foot in front of a car trunk. The recognition system would then point downward and, on recognizing this simulated foot gesture in combination with other factors, such as the proximity of the correct car key, would open the tailgate.

For simplification, the gestures that the device according to the invention can perform can be broken down into different hierarchy levels. First, there are the orientations of the model hand or of the object. Then there are the transitions between these orientations, which are typically rotations. Next come the positions in space and their transitions, which correspond to hand or object trajectories in the space above the device. The device according to the invention performs these positionings, orientations and movements.

Let us first turn to the elementary object orientations. Like the movements, these are stored in the Prototype Book (15). As a first approximation, the following orientations of a model hand of the device according to the invention can be specified, for example. For an object, a corresponding basic orientation must be defined first so that each analogous orientation can be defined.

a. Palm of the model hand, or underside of the object, facing the recognizer
b. Model hand seen from the side, with the little finger of the model hand toward the recognizer (object seen from the right side)
c. Back of the model hand, or top of the object, facing the recognizer
d. Model hand seen from the side, with the thumb of the model hand toward the recognizer (object seen from the left side)
e. Model hand or object oriented in one of the following directions:
   i. north
   ii. north-east
   iii. east
   iv. south-east
   v. south
   vi. south-west
   vii. west
   viii. north-west
f. Model hand tilted with the tip pointing down (object tilted with its front down)
g. Model hand tilted with the tip pointing up (object tilted with its front up)
h. Model hand vertical with the tip pointing up (object vertically upward)
i. Model hand vertical with the tip pointing down (object vertically downward)
j. Model hand bent (object rotated)
k. To the right
l. To the left
m. Fingers of the model hand spread
n. Number of spread fingers of the model hand: 1, 2, 3, 4 or 5
o. One finger (in particular the index finger) of the model hand pointing forward

and so on. For the last items j, k, l and m, applicability to another object depends strongly on that object's capabilities. For an object, entirely different properties may become important. If several objects or model hands are present, combinations of these elementary orientations are also possible. The above list of course also applies to objects other than model hands, for example model feet or other signaling aids. This also applies to the following sections.

The distinguishability of the orientations ultimately depends only on the number of transmitters $T_{\mathrm{Anz}}$ and the number of receivers $S_{\mathrm{Anz}}$. The number of parameters $P_{\mathrm{Anz}}$ that can be extracted without derivation is then proportional to the product of the two: $P_{\mathrm{Anz}} = T_{\mathrm{Anz}} \cdot S_{\mathrm{Anz}}$

However, this requires correspondingly extensive closed-loop control.

The elementary gestures performed by a model hand or an object can be divided into

• those without an object (model hand),
• those with an object (model hand) but without movement (e.g. no gesture, holding still),
• changes in the number of objects (model hands) in front of the recognizer (none, one, two), typically with a spatial direction (entry from the left, right, top or bottom, and corresponding intermediate values such as diagonally from above),
• those in which the object (the model hand) itself is rotated to the left or to the right about one of the three spatial axes (FIGS. 12, 13, 14), in particular reorientations of the object or model hand (e.g. edge, palm or back of the model hand toward the detector),
• those in which the object (the model hand) performs a circular, elliptical or hyperbolic movement, to the left or to the right, along a coordinate direction of a polar coordinate system, a spherical coordinate system or another orthogonal coordinate system (FIGS. 21 and 22 show circling movements; the third axis orientation corresponding to FIG. 21 is not shown because only the smart phone would be rotated; a restriction to curve segments is of course conceivable),
• those in which the object (the model hand) is moved translationally forward or backward in one of the three spatial directions (x, y, z and diagonal intermediate values) (FIGS. 18, 19, 20),
• those in which the structure of the model hand (of the object) is changed (extending one, two or more fingers, e.g. counting, FIG. 15, where FIG. 15 omits the numbers 4 and 5 for simplicity; or spreading or closing the fingers, FIG. 17; or opening or closing the model hand, FIG. 16),
• combinations of these gestures or movements (e.g. a circling movement combined with opening and closing a model hand, with the model hand always closed at one pole of the circle and open at the other).

For every gesture there is always an inverse gesture, which undoes the result of a previously performed movement in the geometric sense. For example, the inverse of a clockwise circular movement is the counterclockwise circular movement, the inverse of closing the fingers is opening the fingers of the model hand (FIG. 17), and so on.

It is important that combinations of these gestures are conceivable. For example, opening and closing the fingers can take place, with respect to the line of movement of the fingers, in six spatial directions (x, y, z, -x, -y, -z). The hand itself can be oriented in six directions (x, y, z, -x, -y, -z). (Diagonal intermediate orientations are of course also possible.) In total, the basic elementary gesture can therefore be performed in 36 configurations, which can be transformed into one another by rotations. However, not all gestures are equally easy to recognize. When the fingers of the model hand open in the Z direction, the lower finger shadows the upper one. A person skilled in the art will therefore typically not use such gestures.
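The count of 36 configurations can be reproduced with a short Python enumeration. The filter for poorly recognizable gestures is an illustrative assumption: it treats every finger movement along +z or -z as shadowed, which the text states only for the Z direction in general.

```python
from itertools import product

# The six signed axis directions listed in the text.
DIRECTIONS = ("x", "y", "z", "-x", "-y", "-z")

# Each configuration pairs a hand orientation with a finger movement direction.
configurations = list(product(DIRECTIONS, DIRECTIONS))


def is_shadowed(configuration):
    """Assumption: finger movement along +z or -z is shadowed."""
    _hand_orientation, finger_direction = configuration
    return finger_direction in ("z", "-z")


usable = [c for c in configurations if not is_shadowed(c)]
print(len(configurations), len(usable))  # prints "36 24"
```

Under this assumption, 24 of the 36 configurations remain as candidates for a gesture vocabulary.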

An elementary gesture sequence "wipe in X direction" (FIG. 19) can thus be composed, for example, of the following actions:

a) Elementary gesture "idle" = no object
b) Start of entry (= model hand object appears at the edge of the measuring range)
c) Entry process (= object essentially does not change its size or distance but performs a translational movement in the X direction, i.e. it appears at different locations at different times)
d) Holding still (= object essentially does not change its size or distance and performs no translational movement in the X or Y direction, i.e. it appears at the same location at different times) (= end of entry process)
e) Wiping (= object essentially does not change its size or distance but performs a translational movement in the X direction, i.e. it appears at different locations at different times)
f) Holding still (= object essentially does not change its size or distance and performs no translational movement in the X or Y direction, i.e. it appears at the same location at different times) (= end of wiping)
g) Exit process (= object essentially does not change its size or distance but performs a translational movement in the X direction, i.e. it appears at different locations at different times)
h) End of exit (= model hand object disappears at the opposite edge of the measuring range)
i) Elementary gesture "idle" = no object
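The steps a) to i) can be written down as a gesture word, that is, as an ordered tuple of elementary gesture labels. The following Python sketch is only a simplified stand-in for the HMM-Viterbi recognizer: it collapses repeated per-frame labels and compares the result exactly. All label names are hypothetical. Entry, wiping and exit share the same observable description in the text, so a real recognizer has to tell them apart by their position in the sequence.

```python
from itertools import groupby

# Hypothetical labels for steps a) to i) of the "wipe in X direction" word.
WIPE_X = (
    "idle", "entry_start", "entry", "hold",
    "wipe", "hold", "exit", "exit_end", "idle",
)


def collapse_repeats(labels):
    """Merge runs of identical consecutive labels into a single label."""
    return tuple(label for label, _run in groupby(labels))


def matches_word(frame_labels, word):
    """Exact comparison; a real recognizer would score hypotheses instead."""
    return collapse_repeats(frame_labels) == tuple(word)


# Per-frame labels as a classifier might emit them (invented data).
frames = (
    ["idle"] * 3 + ["entry_start"] + ["entry"] * 2 + ["hold"] * 2
    + ["wipe"] * 4 + ["hold"] + ["exit"] * 2 + ["exit_end"] + ["idle"] * 2
)
is_wipe = matches_word(frames, WIPE_X)
```

Exact matching fails as soon as a single frame is mislabeled, which is one reason a probabilistic recognizer is preferable for noisy sensor data.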

A "gesture word" defined in this way, composed of the elementary gestures described above, can be recognized with high accuracy by the combined HMM-Viterbi recognizer (13) described above.

Two recognition scenarios are possible in principle: single gesture word recognition (FIG. 24) and continuous gesture word recognition (FIG. 25).

In single gesture word recognition, another event prompts the control system to start gesture recognition. This can be, for example, the press of a button. The advantage of this design is that computing power, and thus energy, is saved, which can be beneficial for mobile devices.

In continuous gesture word recognition (FIG. 25), the gesture recognizer is permanently active. It is in at least one idle state, which corresponds to the permanent recognition of the gesture "no gesture". Experience shows that this gesture is represented by a fairly large set of different prototypes. In addition, the gesture recognizer can be put into a special power-saving mode. For example, it is conceivable that an energy-autonomous device is powered by a solar cell and the actual gesture recognizer is switched off. The energy yield of the solar cell can be used as a sensor signal (37, 1). The recognizer then jumps from the "sleep" state to the "wake up" state. This means that the transmitters and the rest of the system are started, which increases power consumption massively. If no permitted elementary gesture is found, the system returns to "sleep" mode. If, however, a permitted elementary gesture is found (in FIG. 25 these are the gestures "no gesture" and "bcd"), the system changes to the corresponding state. If the system remains in the "no gesture" state for too long, it returns to the "sleep" state to save energy. As long as gestures that follow one another within a specified period are performed, the system always returns to the "no gesture" state. In many cases it makes sense to always conclude gesture words with a special "end gesture" state, which terminates the gesture words in a system-specific way.
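The state changes described for continuous recognition can be sketched as a small state machine in Python. The class name, the method names and the step-counting timeout are assumptions made for illustration. The handling of an unrecognized gesture outside the "wake up" state (treated like "no gesture") is also an assumption, since the text does not specify it.

```python
class RecognizerPowerStates:
    """Sketch of the sleep / wake up / no gesture state handling."""

    def __init__(self, allowed_gestures, idle_limit):
        self.allowed = set(allowed_gestures)
        self.idle_limit = idle_limit  # tolerated consecutive "no gesture" results
        self.state = "sleep"
        self.idle_steps = 0

    def on_sensor_signal(self):
        """E.g. a change in the solar cell's energy yield wakes the system."""
        if self.state == "sleep":
            self.state = "wake up"
            self.idle_steps = 0
        return self.state

    def on_recognition(self, gesture):
        """Feed one recognition result (a gesture name or None)."""
        if self.state == "sleep":
            return self.state  # recognizer is switched off
        if gesture not in self.allowed:
            if self.state == "wake up":
                self.state = "sleep"  # nothing permitted found after waking
                return self.state
            gesture = "no gesture"  # assumption: treat as idle
        if gesture == "no gesture":
            self.idle_steps += 1
            if self.idle_steps > self.idle_limit:
                self.state = "sleep"  # idle for too long: save energy
            else:
                self.state = "no gesture"
        else:
            self.idle_steps = 0
            self.state = gesture
        return self.state


machine = RecognizerPowerStates({"no gesture", "bcd"}, idle_limit=2)
machine.on_sensor_signal()                    # "sleep" -> "wake up"
state_after_bcd = machine.on_recognition("bcd")  # -> "bcd"
```

A time-based timeout would replace the step counter in a real system; counting recognition results simply keeps the sketch deterministic.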

The gesture words whose recognition was described above can now be combined into more complex "gesture sentences".

It is expedient to group gesture words into gesture word classes. Exemplary classes are those that group gesture words denoting, for example,

• subjects,
• objects (e.g. things and persons),
• properties of these objects,
• changes in these properties,
• properties of the changes themselves,
• originators of these changes, or
• relationships of these objects to one another.

The order of the gesture words can be defined as a function of the gesture word class. This enables further downstream HMM-Viterbi recognizers to correctly recognize the content of a gesture message defined in this way, a gesture sentence, even if the combined HMM-Viterbi recognizer described above has recognized individual gesture words incorrectly. Because the process runs analogously to the recognition of elementary gesture sequences (gesture words) on the basis of a hypothesis list (29), it is not explained further and, for simplicity, is not shown in the figures.

A gesture grammar defined in this way allows considerably improved transmission of compact command sequences, similar to a gesture language.

To make such a gesture language efficient, it makes sense to design the transitions between gesture words so that the end of the preceding gesture and the beginning of the following gesture can be matched to one another.

Gesture words can contain elementary gestures and elementary gesture sequences that express dependencies on other gesture words or other information, such as temporal references (past, future, etc.). An elementary gesture sequence can also consist of just one elementary gesture.

At this point, reference is made to the manifold possibilities of linguistics. It is important that these gestures can be performed by the device according to the invention.

It is of particular importance that the gestures performed also have consequences and bring something about. As a rule, such gestures will be used to control a computer, which can also be the computer of a robot. Because the applications are usually not created by the manufacturer of the device, it makes sense for the gestures or gesture sequences to be assigned, or assignable, to text strings or other symbol strings. This creates a link between the gestures or gesture sequences on the one hand and actions of the actuators of the computer and/or robot system on the other. Such a link is expediently transmitted by means of a data transfer protocol, for example an HTTP or XML protocol. This transmission requires standardization of the gestures by means of the method according to the invention and the device according to the invention.
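One way such a link between gesture sequences and text strings could be serialized is sketched below with Python's standard `xml.etree.ElementTree` module. The element names (`gestureBindings`, `binding`, `gesture`), the `command` attribute and the example bindings are hypothetical; the document does not define a schema, and the transport (e.g. HTTP) is left out.

```python
import xml.etree.ElementTree as ET


def bindings_to_xml(bindings):
    """Serialize {gesture sequence (tuple): command string} to an XML string."""
    root = ET.Element("gestureBindings")
    for gesture_sequence, command in bindings.items():
        binding = ET.SubElement(root, "binding", {"command": command})
        for name in gesture_sequence:
            ET.SubElement(binding, "gesture").text = name
    return ET.tostring(root, encoding="unicode")


def xml_to_bindings(document):
    """Parse the XML string back into the same dictionary form."""
    root = ET.fromstring(document)
    return {
        tuple(g.text for g in binding.findall("gesture")): binding.get("command")
        for binding in root.findall("binding")
    }


# Invented example bindings: gesture sequences mapped to command strings.
BINDINGS = {
    ("wipe_x",): "next_page",
    ("circle_cw", "hand_close"): "volume_up",
}

document = bindings_to_xml(BINDINGS)
restored = xml_to_bindings(document)
```

Because the application only sees the command strings, the same gesture vocabulary can be rebound without changing the recognizer.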

When a neural network recognizer (3) is used, a neural network (27) is fed with the quantization vectors (38) determined by the feature extraction (11). The neural network (27) must, however, first have been conditioned using sample data (18) and a training procedure (28). Here, too, feedback of the recognition result (29) or (38) into the controller (8) is conceivable. (8)

Neural network recognizers are considerably more compact and therefore easier to integrate than HMM recognizers. However, their ability to distinguish gestures is limited by internally occurring quantization noise, and they are more sensitive to interfering signals. It is therefore not advisable to try to distinguish more than 20 gestures, including the parasitic cases, with a neural network. In addition, neural network recognizers generally yield speaker-dependent and device-dependent results. They therefore have to be trained by the respective user, which severely limits their usability.

This type of recognizer system is state of the art and has long been known for pattern recognition.

To supply the system with a particularly efficient set of feature vectors, however, it makes sense to drive the transmitter (3) and the compensation transmitter (9) with an optimal set of signals. This idea of feeding a recognizer result back into the physical interface (23) is new.

For this purpose, the Viterbi recognizer (13) can use the sequence lexicon (16) to output a list of the probable gestures (Gn) that will follow next. This information can then be used to control the transmitters (3) in such a way that the vectors (24) obtained enable optimal gesture discrimination. (7)

This method will be explained using a specific example:
FIG. 4 schematically shows, by way of example, a mobile device (mobile phone) with eight combinations of transmitter diode (H1–H8), receiver diode (D1–D8) and compensation diode (K1–K8).

Der Transmitter sendet nach Erstellung der Prognose der wahrscheinlichsten nachfolgenden Gesten ein zeitliches und ggf. auch räumliches Muster, das zur Unterscheidung der wahrscheinlich zu erwartenden Alternativen am optimalsten geeignet ist.After generating the prediction of the most likely subsequent gestures, the transmitter sends a temporal and possibly spatial pattern that is most suitable for distinguishing the likely alternatives to be expected.

This pattern can be precomputed with the aid of the database (18).

Figs. 27 to 58 show all possible activity patterns of the various transmitters of the exemplary mobile phone from Fig. 4.

It is clear to the person skilled in the art that, provided the activity patterns can be added linearly, the patterns can form a Hilbert space. Orthogonalizing the patterns therefore yields simple basis patterns from which all other patterns can be composed.

In the simplest case, these elementary activity patterns can be represented by the activity of a single diode.
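The orthogonalization of activity patterns can be illustrated with a minimal Python sketch. It assumes that each activity pattern is encoded as a list of drive levels, one per transmitter diode, and that the ordinary dot product serves as the inner product. The encoding, the function names and the sample values are illustrative assumptions and are not taken from the disclosure. Gram-Schmidt is used here as one possible orthogonalization method.

```python
from typing import List

Pattern = List[float]  # assumed encoding: one drive level per transmitter diode


def dot(a: Pattern, b: Pattern) -> float:
    """Ordinary dot product, used here as the inner product of two patterns."""
    return sum(x * y for x, y in zip(a, b))


def orthogonalize(patterns: List[Pattern], eps: float = 1e-9) -> List[Pattern]:
    """Gram-Schmidt: return an orthonormal basis spanning the given patterns."""
    basis: List[Pattern] = []
    for p in patterns:
        residual = list(p)
        # Remove the components that are already covered by the basis.
        for b in basis:
            c = dot(residual, b)
            residual = [x - c * y for x, y in zip(residual, b)]
        norm = dot(residual, residual) ** 0.5
        # Patterns that add no new direction are dropped.
        if norm > eps:
            basis.append([x / norm for x in residual])
    return basis


# Three activity patterns over four diodes (illustrative values only).
patterns = [
    [1.0, 1.0, 0.0, 0.0],
    [0.0, 1.0, 1.0, 0.0],
    [1.0, 2.0, 1.0, 0.0],  # sum of the first two, so it adds no new direction
]
basis = orthogonalize(patterns)
```

With these sample values the third pattern is a linear combination of the first two, so only two basis patterns remain.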

Irradiation with an optimized transmission pattern maximizes the significance of the quantization vectors.

The last problem is the derivation of the optimal transmission sequence.

For this purpose, depending on the elementary gesture (Gn) that the Viterbi recognizer (13) has determined to be the most probable next one, the controller (8) retrieves from the database (18) the optimal stimulation pattern for verifying this gesture. Consequently, when this database (18) is created, a recognition performance must already be stored for every conceivable pattern. This can be done, for example, by means of a matrix (LDA_B).

Finally, the control algorithm of the physical interface (23), which links the transmitter (the light-emitting diode H_i), the compensation transmitter (the light-emitting diode K_i) and the sensor (the photodiode D_i), will be illustrated with reference to Fig. 5:
Depending on the desired transmission pattern, each of the transmitters H₁ to H₈ sends a signal into a respective transmission channel (I1_i) to the object (O) to be measured, here typically a hand. There the light is reflected and transmitted via a respective further transmission channel (I3_j) to a respective receiver diode D₁ to D₈.

A controller C_ij is provided for each pairing of receiving diode D_i and transmitting diode H_j. Since this would entail a large hardware outlay, it makes sense in our example to provide only eight such controllers C_ij and to operate only one transmitting diode H_j at a time. This means that the transmitting diodes H_i are switched in time-division multiplex. The corresponding feature vectors are therefore typically buffered, assembled into a feature vector (24) that in this example is eight times wider, and passed to the feature extraction at, in this example, one eighth of the original feature-vector rate.

Each receiving diode D_i is assigned a signal generator G_i. This generator produces a signal that is orthogonal to the signals of the other generators. The orthogonality is explained in more detail below.

The signal of the generator G_i assigned to the receiver diode D_i is split by an orthogonalization unit into a short initial pulse S5_oi and a trailing signal S5_di such that their sum would again yield the signal S5_i and such that these two signals are orthogonal with respect to the scalar product explained later.
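One simple way to satisfy both conditions, given here only as a sketch under stated assumptions, is to split the sampled generator signal in time: the first few samples form the initial pulse and the remainder forms the trailing signal. Because the two parts never overlap, their sample-wise product is zero everywhere. The function names, the sampled representation and the pulse length are hypothetical; the description does not specify how the orthogonalization unit is implemented.

```python
from typing import List, Tuple


def split_signal(s5: List[float], pulse_len: int) -> Tuple[List[float], List[float]]:
    """Split one period of S5_i into an initial pulse and a trailing signal.

    Assumption: the split is purely temporal, so the parts do not overlap.
    """
    s5_o = [x if n < pulse_len else 0.0 for n, x in enumerate(s5)]
    s5_d = [x if n >= pulse_len else 0.0 for n, x in enumerate(s5)]
    return s5_o, s5_d


def scalar_product(a: List[float], b: List[float]) -> float:
    """Sample-wise multiplication followed by averaging over one period."""
    return sum(x * y for x, y in zip(a, b)) / len(a)


# One period of a square wave as a stand-in for the generator signal S5_i.
s5 = [1.0] * 8 + [0.0] * 8
s5_o, s5_d = split_signal(s5, pulse_len=2)

# The sum of the two parts reproduces the original signal.
recombined = [o + d for o, d in zip(s5_o, s5_d)]
# The two parts are orthogonal with respect to the scalar product.
overlap = scalar_product(s5_o, s5_d)
```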

The signal of the receiving diode D_i is amplified by a preamplifier and multiplied in the multiplier M1_ij by the signal S5_di to form the signal S10_dij. The signal is then filtered in the filter F1_ij. The preceding multiplication in the multiplier M1_ij and this filtering in the filter F1_ij define the scalar product of the output signal of the receiver D1_j with the signal S5_di. Wherever this description refers to the orthogonality of further signals, this means that two such signals yield zero when scalar-multiplied with one another in this way.
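The scalar product defined by multiplication and filtering can be sketched in Python as follows. The sketch assumes sampled signals and models the filter as a plain average over one measurement window, which is only one crude form of low-pass filter. The square-wave generator signals and the mixing coefficients are illustrative assumptions.

```python
from typing import List


def square_wave(n_samples: int, half_period: int) -> List[float]:
    """Bipolar square wave with the given half period (in samples)."""
    return [1.0 if (n // half_period) % 2 == 0 else -1.0 for n in range(n_samples)]


def scalar_product(a: List[float], b: List[float]) -> float:
    """Multiply sample by sample, then average (assumed stand-in for the filter)."""
    product = [x * y for x, y in zip(a, b)]
    return sum(product) / len(product)


N = 16
s5_i = square_wave(N, 4)  # generator signal of one controller
s5_k = square_wave(N, 2)  # generator signal of another controller

# Received signal containing both generator signals with different weights.
received = [0.3 * a + 0.7 * b for a, b in zip(s5_i, s5_k)]

cross = scalar_product(s5_i, s5_k)        # zero: the signals are orthogonal
part_i = scalar_product(received, s5_i)   # recovers the weight of s5_i
part_k = scalar_product(received, s5_k)   # recovers the weight of s5_k
```

Because the two generator signals are orthogonal, each scalar product picks out only the contribution of its own generator from the mixed received signal.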

The filter output signal is then amplified once more by the amplifier V1_ij to form the signal S4_dij. The inverse transformation is performed by multiplying the signal S4_dij again by the signal S5_di to form the signal S6_dij.

The same operation is carried out with the other signal S5_oi. In an analogous manner, this yields the signal S6_oij.

To obtain the compensation signal of the compensation diode K_1j, the signals S6_dij and S6_oij are summed over all generator signals i and, after addition of a suitable offset B1_j, supplied to the compensation diode K_1j.
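The formation of the compensation signal from the demodulated values can be sketched as a single open-loop pass. The sketch assumes sampled signals, an averaging filter and a common gain for all amplifiers. It does not model the closed-loop dynamics or the sign conventions of Fig. 5, and all names and numeric values are hypothetical.

```python
from typing import List, Tuple


def demodulate(received: List[float], gate: List[float], gain: float) -> float:
    """Multiply by the gate signal, average (filter) and amplify: yields an S4 value."""
    return gain * sum(r * g for r, g in zip(received, gate)) / len(gate)


def compensation_signal(
    received: List[float],
    gates: List[Tuple[List[float], List[float]]],
    gain: float,
    offset: float,
) -> List[float]:
    """Sum the re-modulated signals S6_d and S6_o over all generators, then add the offset.

    gates holds one (s5_d, s5_o) pair per generator signal i.
    """
    s6 = [0.0] * len(received)
    for s5_d, s5_o in gates:
        s4_d = demodulate(received, s5_d, gain)
        s4_o = demodulate(received, s5_o, gain)
        for t in range(len(s6)):
            # Inverse transformation: multiply each S4 value by its gate signal again.
            s6[t] += s4_d * s5_d[t] + s4_o * s5_o[t]
    return [x + offset for x in s6]


# A single generator with non-overlapping gate signals (illustrative values).
s5_o = [1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
s5_d = [0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0]
received = [0.0, 0.0, 0.4, 0.4, 0.0, 0.0, 0.0, 0.0]

s6_j = compensation_signal(received, [(s5_d, s5_o)], gain=4.0, offset=0.1)
```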

In this example, each controller C_ij outputs two values: A_ij and De_ij. Together with the seven further pairs from the other seven active controllers, these form a 16-dimensional analog vector in this example. Since, owing to the time-division multiplex, only one transmitter H_i is active at any time, eight of these vectors arise within one measurement cycle in which every transmitter H_i is active once. As described, these are buffered and assembled into a 128-dimensional feature vector (24). This 128-dimensional feature vector (24) forms the input signal for the feature extraction (11). Since there are generally several transmitters H_i and several sensors D1_j, the number of controllers C_ij contributing to a compensation signal S6_j can be greater than one. This is indicated in Fig. 5 by the signal of the sensor D1_j being routed downward out of the figure while, on the other side, an analogous signal is fed from below into the summation point with the signal S6o_ij. This is to be understood as meaning that the next controller C_kj, with k ≠ i, is connected there. This controller C_kj is then, of course, operated with the signal S5_k in the same way as the controller C_ij. So that the two controllers C_ij and C_kj do not interfere with one another, the signals S5_i and S5_k must be orthogonal with respect to the scalar product formed by multiplication in the multiplier M1_ij (or M1_kj) and subsequent filtering in the filter F1_ij (or F1_kj).
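The buffering and assembly of the time-multiplexed controller outputs can be sketched as follows. The sketch assumes eight transmitters and eight sensors as in the example, and a simple concatenation order (transmitter by transmitter, then sensor by sensor, amplitude before delay). The ordering, the function names and the dummy values are assumptions made for illustration only.

```python
from typing import List, Tuple

N_TRANSMITTERS = 8  # transmitters H_i, operated one at a time
N_SENSORS = 8       # sensors D_j, one active controller each


def frame_for_transmitter(controller_outputs: List[Tuple[float, float]]) -> List[float]:
    """Flatten the (A_ij, De_ij) pairs of the eight active controllers into 16 values."""
    frame: List[float] = []
    for amplitude, delay in controller_outputs:
        frame.extend([amplitude, delay])
    return frame


def assemble_feature_vector(frames: List[List[float]]) -> List[float]:
    """Concatenate the buffered frames of one measurement cycle."""
    if len(frames) != N_TRANSMITTERS:
        raise ValueError("one frame per transmitter is required")
    vector: List[float] = []
    for frame in frames:
        vector.extend(frame)
    return vector


# Dummy controller outputs: amplitude encodes i, delay encodes j.
frames = [
    frame_for_transmitter([(float(i), float(j)) for j in range(N_SENSORS)])
    for i in range(N_TRANSMITTERS)
]
feature_vector = assemble_feature_vector(frames)
```

Each frame holds 16 values and one measurement cycle yields 8 frames, which gives the 128 dimensions stated above.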

The compensation by the compensation transmitters K_j takes place in the medium and is therefore largely independent of problems such as sensor contamination and aging. However, the compensation transmitter is always required. In the case of an optical system with an LED as compensation transmitter K_j, this is not strictly necessary, however. It is also conceivable to construct the system as shown in Fig. 6. Here a base current supply of the photodiode D1_i is drawn, which was not shown in Fig. 5. It is provided by the resistor R. The compensation transmitter is the current source K_j drawn in Fig. 6. It emulates the compensation luminous flux I2_j of Fig. 5 by forcing through the resistor R a current that corresponds to the electric current generated in the photosensor D1_j by the luminous flux I2_j in Fig. 5.

Since the input of the sensors D_j should always lie at the same operating point, experience shows that it is useful here to define, for one i, a signal S5_i that is a constant signal. In this case the value of the compensation current K_i is a DC value that changes only when the DC value of the signal received by D_j changes. In optical systems this typically yields a constant-light compensation. The associated transmitter H_i for this DC value is therefore typically not operated and can be omitted.

With regard to compensation, the two systems are thus equivalent. However, the system in Fig. 5 has the advantage that the compensation signal I2_j and the measurement signal I1_i, I3_j are subject to the same influences and are therefore typically affected in the same way. The system of Fig. 5 is therefore technically better; that of Fig. 6 is the less expensive one, since it has one transmitter fewer.

In view of what has been said so far, it is clear that a measuring system according to the invention can be constructed in vector form. This is shown in Fig. 9.

The generator bank G generates a vectorial signal S5 whose components are the already familiar signals S5_i. From these, a logic L and a vectorial delay unit Δt form the vectorial signal S5o. Its components are the already familiar signals S5d_i and S5o_i.

Where necessary, the vectorial generator signal S5 is provided with a vectorial offset B1 and fed to the transmitter bank H. In this sense, the eight transmitters of Fig. 4 can be regarded as the physical representation of a vector. The elements of this transmitter vector H are the already familiar transmitters H_i. Each transmitter H_i of the transmitter vector H sends a signal into a transmission channel I1_i. These I1_i in turn form a vectorial transmission channel I1, which leads to at least one object O. (The case of several objects is not discussed here, since a person skilled in the art can readily work it out.) From this object O, one transmission channel I3_j leads to each sensor D1_j. The I3_j form the vectorial transmission channel I3. The D1_j form the vectorial sensor bank D1. Its vectorial output signal is amplified and multiplied in the multiplier M by every signal of the vectorial signal S5o. It is easy to see that precisely at this point the required effort grows very sharply. A reduction to a subset of the multiplications and/or time-division multiplexing therefore seems appropriate here. In this respect the operation is not a mathematically ideal multiplication. The result is the vectorial signal S10, whose components are the already familiar signals S10o_ij and S10d_ij. Each of these is filtered separately in the vectorial filter bank F, which is composed of the familiar filters F1_ij and F2_ij. Each of the filtered signals is then amplified in a vectorial amplifier V, which in turn is composed of the familiar amplifiers V1_ij and V2_ij from Fig. 5. The vectorial output signal S4 of the vectorial amplifier bank V is composed of the signals S4o_ij and S4d_ij from Fig. 5. Each element S4o_ij and S4d_ij of the signal vector S4 is then multiplied again by the associated signals S5d_i and S5o_i. The resulting signals S6d_ij and S6o_ij are then summed over all i to form the respective component S6_j of the vectorial signal S6. Where required, the vectorial bias signal B2 is added to it to form the vectorial compensation signal S3. This drives the vectorial compensation transmitter bank K, which radiates back into the vectorial sensor bank D via the vectorial transmission link I2. As already discussed, the system is chosen so that the fluctuations of the signal over the transmission link I3 are balanced out by the compensation.

The recognition result (S4) is passed as a data stream to the feature extraction FE. This produces a feature vector pFV whose selectivity has not yet been maximized. The selectivity of this vector is maximized by multiplication by the LDA matrix (LDA_F) (14). The result is the feature vector FV (38). From it, the emission computation EC (12) generates the hypothesis list HL (39). With the aid of the sequence lexicon SL (16), the Viterbi search VS (13) produces from this the list of recognized gesture sequences G (22). At the same time, a prediction Gn of the next expected gesture is produced. This vector is modified, for example, by multiplication by the illumination matrix (LDA_B) (40) and used to modify the generator signals S5. In Fig. 9 this is done by signal-wise multiplication of the vector of generator signals by the vector thus obtained.
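The feedback path from the predicted gesture to the generator signals can be sketched as a matrix-vector product followed by a signal-wise scaling. The sketch assumes that Gn is a vector with one entry per elementary gesture and that the illumination matrix has one row per transmitter. The dimensions, the matrix entries and the function names are purely illustrative assumptions, since the description does not give concrete values.

```python
from typing import List


def mat_vec(matrix: List[List[float]], vector: List[float]) -> List[float]:
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def modulate_generators(s5: List[List[float]], weights: List[float]) -> List[List[float]]:
    """Scale each generator signal S5_i (a list of samples) by its weight."""
    return [[w * x for x in signal] for signal, w in zip(s5, weights)]


# Hypothetical illumination matrix: 4 transmitters (rows) x 3 elementary gestures (columns).
lda_b = [
    [1.0, 0.0, 0.5],
    [0.0, 1.0, 0.5],
    [1.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]

gn = [0.0, 1.0, 0.0]           # predicted next gesture, here as a one-hot vector
weights = mat_vec(lda_b, gn)   # one drive weight per transmitter

# Four identical generator signals as placeholders for S5_i.
s5 = [[1.0, -1.0, 1.0, -1.0] for _ in range(4)]
s5_modified = modulate_generators(s5, weights)
```

In this example only the second and third transmitters remain active for the predicted gesture, while the others are switched off by a weight of zero.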

Advantages of the invention

The invention enables gesture databases to be acquired independently of the device and transferred to virtually any hardware platform. This massively reduces design costs. In particular, standardization of gestures becomes possible.

It is particularly suitable for mobile systems and Internet use.

Figures

Fig. 1: Exemplary functional sequence of speaker-independent gesture or gesture-sequence recognition with HMM recognition

Fig. 2: Exemplary functional sequence of speaker-independent and device-independent gesture or gesture-sequence recognition with HMM recognition

Fig. 3: Exemplary functional sequence of speaker-dependent gesture or gesture-sequence recognition with a neural-network recognizer

Fig. 4: Positioning of the transmitters (in particular transmitter diodes) H_i, the receivers (in particular receiver diodes) D_j and the compensation transmitters (in particular compensation diodes) K_j, using the example of a mobile phone with, by way of example, eight transmitters, compensation transmitters and receivers

Fig. 5: Exemplary control loop C_ij for the compensation control of the compensation transmitter K_j

Fig. 6: Exemplary control loop C_ij for the compensation control of the compensation transmitter K_j, in which the compensation transmitter is a current source

Fig. 7: Exemplary functional sequence of speaker-independent and device-independent gesture or gesture-sequence recognition with HMM recognition, with the transmission pattern of the transmitters influenced as a function of the recognition result

Fig. 8: Exemplary functional sequence of speaker-dependent gesture or gesture-sequence recognition with a neural-network recognizer, with the transmission pattern of the transmitters influenced as a function of the recognition result

Fig. 9: Exemplary vectorial overall control loop for the compensation control of the compensation transmitters K and the adaptation of the transmission pattern of the transmitters T to the recognition result

Fig. 10: Diagram illustrating the emission computation

Fig. 11: Schematic illustration of the principle of a mechanically automated calibration device for the reproducible execution of standardized gestures

Fig. 12: Rotation of the hand about the longitudinal axis

Fig. 13: Rotation of the hand about the axis perpendicular to the palm

Fig. 14: Rotation of the hand about the axis in the palm transverse to the longitudinal axis

Fig. 15: Extending a different number of fingers (0, 1, 2, 3); 4 and 5 fingers are not drawn for the sake of clarity

Fig. 16: Changing the hand shape from a fist to a flat hand

Fig. 17: Spreading the fingers

Fig. 18: Moving the hand up and down (z movement)

Fig. 19: Translational movement across the device (x movement)

Fig. 20: Translational movement parallel to the device (y movement)

Fig. 21: Circular movement about the y-axis above the device (the x-axis would be analogous)

Fig. 22: Circular movement about the z-axis above the device

Fig. 23: HMM model

Fig. 24: Single-word recognition (individual models)

Fig. 25: Continuous gesture recognition

Figs. 26–57: Exemplary possible activity patterns

List of reference signs

No. Designation
1 Physical value stream / stream of physical quantities
2 Interaction region
3 First transmitter
4 First transmission path
5 Interaction or modification
6 Second transmission path
7 Sensor or receiver
8 Controller
9 Compensation transmitter
10 Transmission path from the compensation transmitter (9) to the sensor (7)
11 Feature extraction
12 Emission computation
13 Viterbi search / Viterbi decoder
14 LDA matrix
15 Prototype book (code book or CB)
16 Sequence lexicon
17 Training tool for creating the LDA matrix (14) and the prototype book (15)
18 Database of previously recorded gesture feature vectors of known elementary gestures in known elementary gesture sequences from a statistical selection of gesture speakers in a statistical selection of gesture situations, preferably including standard gestures and standard gesture sequences for recalibrating the gesture recognizer when the physical interface (23) is changed
19 Type-in tool for manual entry of elementary gesture sequences (gestures) to be recognized
20 Online training tool for entering new elementary gesture sequences (gestures) via the physical interface (23) (decomposition into known elementary gestures)
21 Recognized gesture parameters
22 Recognized gesture sequences
23 Physical interface
24 Stream of physical values from the physical interface (23). The feature extraction (11) converts it into a stream of gesture feature vectors.
25 Application-specific training
26 Application-specific LDA matrix (changed whenever the physical interface (23) is changed)
27 Neural network
28 Training tool for the neural network
29 Gesture parameters recognized with the neural network
30 Exemplary mobile phone
31 Screen
32 Exemplary artificial robot hand for defining and presenting a standard gesture
33 Exemplary device
34 Exemplary elementary gesture
35 Exemplary joints with three rotational degrees of freedom
36 Exemplary arms, each with one translational degree of freedom (length)
37 Signals from other sensors and measuring systems
38 Modified feature-vector data stream
39 Elementary-gesture hypothesis list
40 Illumination matrix (LDA_B)
41 Prototype 1
42 Prototype 2
43 Prototype 3
44 Prototype 4
45 Gesture not precisely identified
46 Unregistered gesture
47 Threshold ellipsoid
48 Recognized gesture
A_ij Amplitude level of the radiation from the transmitter H_i into the receiver D_j
B1 Constant vectorial signal that is added to the vectorial signal S5. The sum feeds the transmitter bank H.
B2 Constant vectorial signal that is added to the vectorial signal S6. The sum S3 feeds the compensation transmitter bank K.
CB Code book (prototype database (15))
CBE Code book entry (= entry in the prototype database (15))
C_ij Controller for regulating the signal of the compensation diode K_j as a function of the transmitter H_i. Each transmitter must emit its own signal S5_i that is orthogonal to all other signals S5_i.
D1 Sensor 1 (photodiode 1), j = 1
D2 Sensor 2 (photodiode 2), j = 2
D3 Sensor 3 (photodiode 3), j = 3
D4 Sensor 4 (photodiode 4), j = 4
D5 Sensor 5 (photodiode 5), j = 5
D6 Sensor 6 (photodiode 6), j = 6
D7 Sensor 7 (photodiode 7), j = 7
D8 Sensor 8 (photodiode 8), j = 8
D_j Sensor j (photodiode j)
De_ij Delay level of the radiation from the transmitter H_i into the receiver D_j
DSP Digital signal processor
Δt Vectorial delay unit. The logic L uses it to generate the vectorial signal S5o from the vectorial signal S5.
F Vectorial filter bank consisting of all filters F1_ij and F2_ij
F1_ij Filter 1 of the controller C_ij
F2_ij Filter 2 of the controller C_ij
FV Feature vector (38)
G Vectorial generator bank
GS Recognized gesture sequence (22)
Gn Expected next gesture (vector)
H1 Transmitter 1 / transmitting diode 1, i = 1
H2 Transmitter 2 / transmitting diode 2, i = 2
H3 Transmitter 3 / transmitting diode 3, i = 3
H4 Transmitter 4 / transmitting diode 4, i = 4
H5 Transmitter 5 / transmitting diode 5, i = 5
H6 Transmitter 6 / transmitting diode 6, i = 6
H7 Transmitter 7 / transmitting diode 7, i = 7
H8 Transmitter 8 / transmitting diode 8, i = 8
H_i Transmitter i / transmitting diode i
HL Elementary-gesture hypothesis list (39)
I1 Vectorial first transmission section from the vectorial transmitter bank H to the object O
I2 Vectorial second transmission section from the vectorial compensation transmitter bank K to the object O
I3 Vectorial third transmission section from the object O to the vectorial sensor bank D
I1_i Erste Übertragunsgsteilstrecke vom der Transmitte H_i zum Objekt O. I2_j Zweite Übertragunsgsteilstrecke vom Kompensationstransmitter K_j zum Objekt O. I3_j Dritte Übertragunsgsteilstrecke vom Objekt O zum Sensor D_j. K1 Kompensationstransmitter 1 (Kompensations-LED 1) j = 1 K2 Kompensationstransmitter 2 (Kompensations-LED 2) j = 2 K3 Kompensationstransmitter 3 (Kompensations-LED 3) j = 3 K4 Kompensationstransmitter 4 (Kompensations-LED 4) j = 4 K5 Kompensationstransmitter 5 (Kompensations-LED 5) j = 5 K6 Kompensationstransmitter 6 (Kompensations-LED 6) j = 6 K7 Kompensationstransmitter 7 (Kompensations-LED 7) j = 7 K8 Kompensationstransmitter 8 (Kompensations-LED 8) j = 8 Kj Kompensationstransmitter j (Kompensations-LED j)j L Die Logik L erzeugt mit Hilfe der Verzögerungseinheit Δt das vektorielle Signal S5o aus dem vektoriellen Signal S5. LDA_F LDA Matrix (14) LDA_B Illumination Matrix (40) M1_ij Multiplexer 1 des Controllers C_ij M2_ij Multiplexer 2 des Controllers C_ij O Objekt pFV Feature-Vektor (beispielsweise nach Logarithmierung, Framing und Filterung etc.) vor Multiplikation mit der LDA Matrix (14) S3 Vektorielles Sendesignal für die vektorielle Kompensationstransmitterbank K. Das Signal entsteht durch vektorielle Addition des Vektors B2 mit S6. S3_j Sendesignal für den Kompensationstransmitter K_j. Das Signal entsteht durch Addition des Signals B2_j mit S6_j. S4d_ij Amplitudenpegel der Einstrahlung des Senders H_i in den Empfänger D_j S4o_ij Verzögerungspegel der Einstrahlung des Senders H_i in den Empfänger D_j S5 Vektorielles Sendesignal bestehend aus den Signalen S5_i S5o Vektorielles Signal bestehend aus allen Signalen S5o_i und S5d_i S5o_i Torsignal zur Empfangs-Verzögerungsmessung des Signals S5_i des Senders H_i S5d_i Torsignal zur Empfangs-Amplitudenmessung des Signals S5_i des Senders H_i S5_i Sendesignal des Senders H₁. Hierbei dieses Signal S5_i zu allen anderen Signalen S5_i orthogonales sein. 
S6 Vektorielles Signal bestehend aus allen Signalen S6o_i und S6d_i S6o_i Rücktransformierte der Empfangs-Verzögerungsmessung des Signals S5_i des Senders H_i S6d_i Rücktransformierte der Empfangs-Amplitudenmessung des Signals S5_i des Senders H_i S6_j Kompensationsvorsignal zur Ansteuerung des Kompensationstransmitters K_j Auf dieses Signal wird ggf. noch eine Konstante B1_j aufaddiert. S10 Vektorielles Ausgangssignal des Multiplexers M bestehend aus allen Signalen S10d_ij und S10o S10d_ij Ausgangssignal des Multiplexers M1_ij des Controllers C_ij S10o_ij Ausgangssignal des Multiplexers M2_ij des Controllers C_ij SL Sequence Lexicon (16) V Verstärkerbank bestehend aus allen Verstärkern V1_ij und V2_ij V1_ij Verstärker 1 des Controllers C_ij V2_ij Verstärker 2 des Controllers C_ij 26 - 57 : exemplary possible activity patterns list of labels No. description 1 Physical value stream / stream of physical quantities 2 Interaction region / area of interaction 3 First transmitter / first transmitter 4 First transmission line 5 Interaction or modification 6 Second transmission line 7 Sensor or receiver 8th Controller / controller 9 Compensation Transmitter / Compensation Transmitter 10 Transmission path from the compensation transmitter ( 9 ) to the sensor ( 7 ) 11 Feature Extraction 12 Emission Computation / Emission Calculation 13 Viterbi Search / Viterbi decoder 14 LDA matrix 15 Prototype Book (Code Book or CB) 16 Sequence Lexicon / Sequence Dictionary 17 Training tool for creating the LDA matrix ( 14 ) and the prototype book ( 15 ) 18 Data base of previously recorded gesture feature vectors of previously known elementary gestures in previously known elementary gesture sequences of a statistical selection of gesture speakers in a statistical selection of gesture situations, preferably including standard gestures and standard gesture sequences for recalibration of the gesture recognizer when changing the physical interface ( 23 ) 19 Type-in tool for manual entry of 
elementary gesture sequences (gestures) to be recognized 20 Online training tool for entering new elementary gesture sequences (gestures via the physical interface ( 23 ) (Decomposition into previously known elementary gestures) 21 Detected gesture parameters 22 Detected gesture sequences 23 Physical Interface / Physical Interface 24 Stream of physical values from the physical interface ( 23 ) This is used in the Feature Extraction ( 11 ) turned into a stream of gestures feature vectors. 25 Application specific training 26 Application-specific LDA matrix (This is changed whenever the physical interface ( 23 ) will be changed.) 27 Neural network 28 Training tool for the neural network 29 Gestural parameters detected with the neural network 30 Exemplary mobile phone 31 screen 32 Exemplary Artificial Robot Hand for defining and displaying a standard gesture 33 Exemplary device 34 Exemplary elementary gesture 35 Exemplary joints with three rotational degrees of freedom 36 Exemplary arms, each with a translatory degree of freedom (length) 37 Signals from other sensors and measuring systems 38 Modified feature vector data stream 39 Elementary gesture hypothesis list 40 Illumination Matrix (LDA_B) 41 Prototype 1 42 Prototype 2 43 Prototype 3 44 Prototype 4 45 Not precisely identified gesture 46 Unregistered gesture 47 Threshold Ellipsoid 48 Detected gesture A _ij Amplitude level of the radiation of the transmitter H _i in the receiver D _j B1 Constant vectorial signal added to the vectorial signal S5. With the addition result, the transmitter bank H is fed. B2 Constant vectorial signal added to the vectorial signal S6. With the addition result S3, the compensation transmitter bank K is fed. CB Code Book (prototype database ( 15 )) CBE Code Book entry (= entry in the prototype database ( 15 )) C _ij Controller for controlling the signal of the compensation diode K _j as a function of the transmitter H _i . 
Each transmitter must emit its own S5 _i signal which is orthogonal to all other signals S5 _i . K1 Compensation transmitter 1 (compensation LED 1) j = 1 K2 Compensation transmitter 2 (compensation LED 2) j = 2 K3 Compensation Transmitter 3 (compensation LED 3) j = 3 K4 Compensation Transmitter 4 (compensation LED 4) j = 4 K5 Compensation Transmitter 5 (compensation LED 5) j = 5 K6 Compensation transmitter 6 (compensation LED 6) j = 6 K7 Compensation Transmitter 7 (compensation LED 7) j = 7 K8 Compensation Transmitter 8 (Compensation LED 8) j = 8 D1 Sensor 1 (photodiode 1) j = 1 D2 Sensor 2 (photodiode 2) j = 2 D3 Sensor 3 (photodiode 3) j = 3 D4 Sensor 4 (photodiode 4) j = 4 D5 Sensor 5 (photodiode 5) j = 5 D6 Sensor 6 (photodiode 6) j = 6 D7 Sensor 7 (photodiode 7) j = 7 D8 Sensor 8 (photodiode 8) j = 8 di Sensor j (photodiode j) De _ij Delay level of the radiation of the transmitter H _i in the receiver D _j DSP Digital signal processor .delta.t Vectorial delay unit. This serves the logic L for generating the vectorial signal S5o from the vectorial signal S5 F Vectorial filter bank consisting of all filters F1 _ij and F2 _ij F1 _ij Filter 1 of the controller C _ij F2 _ij Filter 2 of the controller C _ij FV Feature vector ( 38 ) G Vectorial generator bank. 
GS Detected gesture sequence ( 22 ) gn Expected next gesture (vector) H1 Transmitter 1 / Transmitter 1 / Transmitting diode 1 i = 1 H2 Transmitter 2 / Transmitter 2 / Transmitting diode 2 i = 2 H3 Transmitter 3 / Transmitter 3 / Transmitting diode 3 i = 3 H4 Transmitter 4 / Transmitter 4 / Transmitting diode 4 i = 4 H5 Transmitter 5 / Transmitter 5 / Transmitting diode 5 i = 5 H6 Transmitter 6 / Transmitter 6 / Transmitting diode 6 i = 6 H7 Transmitter 7 / Transmitter 7 / Transmitting diode 7 i = 7 H8 Transmitter 8 / Transmitter 8 / Transmitting diode 8 i = 8 Hi Transmitter i / Transmitter i / Transmitting diode ii = i HL Elementary Gesture Hypothesis List ( 39 ) I1 Vectorial first transmission segment from the vectorial transmission bank H to the object O. I2 Vectorial second transmission segment from the vectorial compensation transmission center K to the object O. I3 Vectorial third transmission segment from object O to vector sensor tank D. I1 _i First transmission line from the transmitter H _i to the object O. I2 _j Second transmission section from the compensation transmitter K _j to the object O. I3 _j Third transmission path from object O to sensor D _j . K1 Compensation transmitter 1 (compensation LED 1) j = 1 K2 Compensation transmitter 2 (compensation LED 2) j = 2 K3 Compensation Transmitter 3 (compensation LED 3) j = 3 K4 Compensation Transmitter 4 (compensation LED 4) j = 4 K5 Compensation Transmitter 5 (compensation LED 5) j = 5 K6 Compensation transmitter 6 (compensation LED 6) j = 6 K7 Compensation Transmitter 7 (compensation LED 7) j = 7 K8 Compensation Transmitter 8 (Compensation LED 8) j = 8 kj Compensation Transmitter j (compensation LED j) j L The logic L generates the vectorial signal S5o from the vectorial signal S5 with the aid of the delay unit Δt. 
LDA_F LDA matrix ( 14 ) LDA_B Illumination Matrix ( 40 ) M1 _ij Multiplexer 1 of the controller C _ij M2 _ij Multiplexer 2 of the controller C _ij O object pFV Feature vector (for example after logarithmization, framing and filtering, etc.) before multiplication with the LDA matrix ( 14 ) S3 Vectorial transmission signal for the vector compensation transistor bank K. The signal is produced by vectorial addition of the vector B2 with S6. S3 _j Transmission signal for the compensation transmitter K _j . The signal is produced by adding the signal B2 _j to S6 _j . S4d _ij Amplitude level of the radiation of the transmitter H _i in the receiver D _j S4o _ij Delay level of the radiation of the transmitter H _i in the receiver D _j S5 Vectorial transmission signal consisting of the signals S5 _i S5O Vectorial signal consisting of all signals S5o _i and S5d _i S5o _i Gate signal for receiving delay measurement of the signal S5 _{i of} the transmitter H _i S5d _i Gate signal for reception amplitude measurement of the signal S5 _{i of} the transmitter H _i S5 _i Transmission signal of the transmitter H ₁ . In this case, this signal S5 _i to be orthogonal to all other signals S5 _i . S6 Vectorial signal consisting of all signals S6o _i and S6d _i S6o _i Inverse transformed the receive delay measurement of the signal S5 _{i of} the transmitter H _i S6d _i Inverse transformed the received amplitude measurement of the signal S5 _{i of} the transmitter H _i S6 _j Compensation leading signal for controlling the compensation transmitter K _j A constant B1 _j may be added to this signal. 
S10 Vectorial output signal of the multiplexer M consisting of all signals S10d _ij and S10o S10d _ij Output signal of the multiplexer M1 _{ij of} the controller C _ij S10o _ij Output signal of the multiplexer M2 _{ij of} the controller C _ij SL Sequence Lexicon ( 16 ) V Amplifier bank consisting of all amplifiers V1 _ij and V2 _ij V1 _ij Amplifier 1 of the controller C _ij V2 _ij Amplifier 2 of the controller C _ij
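
The list above requires each transmit signal S5_i to be orthogonal to all other signals S5_i. A minimal sketch of why this matters follows. It assumes Walsh-Hadamard codes as the orthogonal signals and a single receiver that sees the weighted sum of all transmitters. The choice of code family, the function names and the numeric values are illustrative assumptions and are not taken from the source.

```python
def hadamard(n):
    """Return an n x n Hadamard matrix (n a power of two) as lists of +1/-1."""
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of two")
    h = [[1]]
    while len(h) < n:
        # Sylvester construction: [[H, H], [H, -H]]
        h = [row + row for row in h] + [row + [-x for x in row] for row in h]
    return h


def received_signal(codes, amplitudes):
    """Superpose all transmit signals at one receiver, each scaled by its amplitude level."""
    length = len(codes[0])
    return [sum(a * c[t] for a, c in zip(amplitudes, codes)) for t in range(length)]


def demodulate(rx, codes):
    """Estimate each amplitude level by a scalar product of rx with the matching code."""
    length = len(rx)
    return [sum(r * c[t] for t, r in enumerate(rx)) / length for c in codes]


# Hypothetical setup: four transmitters use rows 1..4 of an 8 x 8 Hadamard matrix.
# Row 0 is constant and is skipped so that a constant offset cancels out.
CODES = hadamard(8)[1:5]
AMPLITUDES = [0.5, 0.0, 0.25, 1.0]  # assumed amplitude levels for one receiver
# The added 3.0 models constant ambient light (an assumption of this sketch).
RX = [x + 3.0 for x in received_signal(CODES, AMPLITUDES)]
ESTIMATES = demodulate(RX, CODES)
```

Because the codes are mutually orthogonal and each sums to zero, the scalar product isolates one transmitter's contribution and suppresses both the other transmitters and the constant offset.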

Claims

Device for measuring a recognition system for detecting three-dimensional gestures, characterized in that it is capable of performing elementary object positionings or movements, or positioning sequences or movement sequences.

Device according to claim 1, characterized in that it has a model body (32), in particular a model hand or a model finger.

Device according to claim 2, characterized in that the model hand has, or can assume, a shape corresponding to one or more of Figures 12 to 22.

Device according to one or more of claims 1 to 3, characterized in that
- a model body (32) can be pivoted, with the aid of at least one mechanically controllable joint (35), about at least one of the three possible rotational degrees of freedom (axes) (34), or
- a model body (32) can be displaced, by means of at least one mechanically controllable arm or actuator (36), along at least one translational degree of freedom.
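
The combination of pivoting joints (35) and length-adjustable arms (36) can be illustrated with a small forward-kinematics sketch. It assumes a serial chain in which each joint is described by three rotation angles and each arm extends along its local x axis. The rotation order, the axis convention and all names are assumptions made for illustration and are not specified in the source.

```python
import math


def rotation_matrix(rx, ry, rz):
    """Rotation about x, then y, then z (radians), returned as R = Rz * Ry * Rx."""
    cx, sx = math.cos(rx), math.sin(rx)
    cy, sy = math.cos(ry), math.sin(ry)
    cz, sz = math.cos(rz), math.sin(rz)
    return [
        [cz * cy, cz * sy * sx - sz * cx, cz * sy * cx + sz * sx],
        [sz * cy, sz * sy * sx + cz * cx, sz * sy * cx - cz * sx],
        [-sy, cy * sx, cy * cx],
    ]


def mat_mul(a, b):
    """Product of two 3 x 3 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def mat_vec(a, v):
    """Product of a 3 x 3 matrix and a 3-vector."""
    return [sum(a[i][k] * v[k] for k in range(3)) for i in range(3)]


def tip_position(segments):
    """Return the tip position of a chain of (joint angles, arm length) segments.

    segments: list of ((rx, ry, rz), length); each joint rotates relative to the
    previous segment, and each arm extends along its own local x axis.
    """
    orientation = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    position = [0.0, 0.0, 0.0]
    for angles, length in segments:
        orientation = mat_mul(orientation, rotation_matrix(*angles))
        step = mat_vec(orientation, [length, 0.0, 0.0])
        position = [p + s for p, s in zip(position, step)]
    return position


# Hypothetical two-segment model finger: the first joint turns 90 degrees about z,
# the second joint turns back by 90 degrees, so the tip ends near (1, 2, 0).
TIP = tip_position([((0.0, 0.0, math.pi / 2), 2.0), ((0.0, 0.0, -math.pi / 2), 1.0)])
```

Changing an angle corresponds to pivoting a joint, and changing a length corresponds to displacing an arm along its translational degree of freedom.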

Method for the three-dimensional mechanical transmission of information composed of a temporal sequence of distinguishable elementary object orientations of a mechanical signaling means according to one of the preceding claims, characterized in that statistical data on the elementary positions of said signaling means are stored in a prototype book of a gesture recognition system.
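
Storing statistical data in a prototype book can be sketched as follows. The sketch assumes that each elementary position is summarized by a per-dimension mean and variance of recorded feature vectors, and that a vector is rejected when it lies outside a threshold ellipsoid around every prototype. The diagonal-variance model, the threshold value, the labels and the function names are assumptions for illustration only.

```python
from statistics import mean, pvariance


def build_prototype_book(recordings):
    """Map each label to (means, variances) computed per feature dimension.

    recordings: dict of label -> list of feature vectors of equal length.
    """
    book = {}
    for label, vectors in recordings.items():
        columns = list(zip(*vectors))
        means = [mean(col) for col in columns]
        # A small floor keeps the distance finite if a dimension never varied.
        variances = [max(pvariance(col), 1e-9) for col in columns]
        book[label] = (means, variances)
    return book


def classify(book, vector, threshold=9.0):
    """Return the nearest prototype label, or None if no prototype is close enough.

    The distance is the variance-normalized squared distance, so the acceptance
    region around each prototype is an axis-aligned ellipsoid.
    """
    best_label, best_dist = None, float("inf")
    for label, (means, variances) in book.items():
        dist = sum((x - m) ** 2 / v for x, m, v in zip(vector, means, variances))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label if best_dist <= threshold else None


# Hypothetical two-dimensional feature vectors for two elementary orientations.
RECORDINGS = {
    "palm_to_recognizer": [[1.0, 0.0], [1.2, 0.2], [0.8, -0.2]],
    "back_of_hand_to_recognizer": [[5.0, 5.0], [5.2, 4.8], [4.8, 5.2]],
}
BOOK = build_prototype_book(RECORDINGS)
```

A vector that falls outside every ellipsoid is returned as `None`, which corresponds to an unregistered gesture in the terminology of the reference sign list.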

A method according to claim 5, characterized in that the elementary object orientations of a mechanical signaling means used are one or more of the following elementary object orientations for one or each of two model hands as a mechanical signaling means:
a. The palm of the model hand facing the recognizer
b. The model hand seen from the side, with the little finger toward the recognizer
c. The back of the model hand facing the recognizer
d. The model hand seen from the side, with the model thumb toward the recognizer
e. The model hand oriented in one of the following directions:
   i. North
   ii. North-east
   iii. East
   iv. South-east
   v. South
   vi. South-west
   vii. West
   viii. North-west
f. The model hand tilted with the tip down
g. The model hand tilted with the tip up
h. The model hand vertical with the tip up
i. The model hand vertical with the tip down
j. The model hand bent
   i. To the right
   ii. To the left
k. The model fingers of the model hand spread
l. The number of spread model fingers of the model hand being 1, 2, 3, 4 or 5
m. A model finger (in particular a model index finger) of the model hand pointing forward
n. Combinations of the above single-hand elementary orientations by two model hands into two-hand orientations

A method according to claim 6, characterized in that a temporal information sequence is composed of elementary, distinguishable object positionings and movements.

A method according to claim 7, characterized in that the elementary, distinguishable object positionings and movements are one or more of the following prototypes, or that the following prototypes are part of such elementary, distinguishable object positionings and movements:
a. No signaling (idle) or no object
b. Entry of a mechanical signaling means from the left, right, top or bottom, or from intermediate directions, in particular diagonally, or another change in the number of detected mechanical signaling means
c. A mechanical signaling means remaining stationary (no movement)
d. Linear translational movements of a mechanical signaling means, in particular in the x, y and z directions and/or their intermediate directions
e. Movements of a mechanical signaling means, in particular circular, elliptical or hyperbolic movements, along a coordinate direction of an imaginary, in particular orthogonal, coordinate system above the recognizer, in particular a polar, spherical or cylindrical coordinate system or another orthogonal coordinate system
f. Rotations of a mechanical signaling means or transitions between object orientations according to claim 5
g. Modifications of the structure of a mechanical signaling means, in particular opening or closing a model hand, spreading the model fingers of a model hand, or extending one, two or more model fingers forward
h. Combinations of these elementary object positionings and movements or elementary signalings

Method according to one or more of the preceding claims, characterized in that it defines sequences of elementary object positionings and movements, wherein a sequence consists of at least one elementary object positioning and/or movement, and assigns meaning classes to these sequences, in particular in that one of the following meaning classes is used:
a. Subjects
b. Objects (in particular things and persons)
c. Properties of these objects or subjects
d. Changes in the properties
e. Properties of the changes themselves
f. Causes of these changes
g. Relationships of objects or subjects among themselves
h. Properties of the relationships of these objects with each other

Characterized in that
- at least one text string or at least one other symbol chain is assigned, or can be assigned, to at least one elementary object positioning or movement or positioning sequence or movement sequence, and
- this elementary object positioning or movement or positioning sequence or movement sequence can be assigned to one or more of the elementary object positionings or movements or positioning sequences or movement sequences of claims 5 to 9.
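
The assignment of text strings and meaning classes to sequences of elementary gestures can be sketched as a small lexicon lookup. The sketch uses a greedy longest-match segmentation, which is a deliberate simplification and not the Viterbi search named in the reference sign list. The gesture names, text strings and lexicon entries are hypothetical.

```python
# Hypothetical sequence lexicon: tuple of elementary gestures -> (text string, meaning class).
LEXICON = {
    ("enter_left", "move_right", "exit"): ("next page", "change"),
    ("enter_left", "hold"): ("select", "object"),
    ("hold",): ("pause", "property"),
}


def decode(stream, lexicon):
    """Segment a stream of elementary gestures into lexicon entries.

    At each position the longest matching sequence wins. An elementary gesture
    that starts no known sequence is skipped.
    """
    max_len = max(len(key) for key in lexicon)
    out, i = [], 0
    while i < len(stream):
        for n in range(min(max_len, len(stream) - i), 0, -1):
            key = tuple(stream[i:i + n])
            if key in lexicon:
                out.append(lexicon[key])
                i += n
                break
        else:
            i += 1  # no sequence starts here
    return out


STREAM = ["enter_left", "move_right", "exit", "idle", "hold"]
SYMBOLS = decode(STREAM, LEXICON)
```

A probabilistic decoder would weigh competing segmentations instead of committing to the longest match, but the mapping from sequence to symbol chain stays the same.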

A method of transmitting gesture sequences according to claim 10, characterized in that the links thus defined between the elementary object positionings or movements or positioning sequences or movement sequences on the one hand, and actions of a computer and/or robot system or its actuators on the other hand, are transmitted to the computer system of the computer and/or robot system by a protocol, in particular an HTTP or XML protocol.
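
A minimal sketch of transmitting such links in an XML format is given below, using Python's standard `xml.etree.ElementTree` module. The source names only "an HTTP or XML protocol", so the element names, the attribute name and the example link are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET


def links_to_xml(links):
    """Serialize (gesture sequence, action) links into an XML string.

    links: list of (list of elementary gesture names, action name).
    The element and attribute names are hypothetical.
    """
    root = ET.Element("gesture-links")
    for sequence, action in links:
        link = ET.SubElement(root, "link", {"action": action})
        for name in sequence:
            ET.SubElement(link, "elementary-gesture").text = name
    return ET.tostring(root, encoding="unicode")


def xml_to_links(text):
    """Parse the XML string back into a list of (gesture sequence, action) links."""
    root = ET.fromstring(text)
    return [
        ([g.text for g in link.findall("elementary-gesture")], link.get("action"))
        for link in root.findall("link")
    ]


# Hypothetical link: entering from the left and holding triggers an unlock action.
LINKS = [(["enter_left", "hold"], "unlock_screen")]
PAYLOAD = links_to_xml(LINKS)
RESTORED = xml_to_links(PAYLOAD)
```

The resulting string could be sent as the body of an HTTP request, and the receiving computer or robot system would parse it to rebuild the same links.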

System, in particular a robot, computer or smartphone, characterized in that
- it has a device or a recognition system capable of mechanical signaling according to one or more of the preceding claims, and
- at least one actuator or action can be controlled by means of at least the elementary object positionings or movements or positioning sequences or movement sequences according to one or more of the preceding claims that are generated by a device according to one or more of claims 1 to 4.

Gesture recognition system, characterized in that it
- is operated with data determined by means of a method according to one of claims 5 to 9, or
- is operated with data transmitted to the system by the method of claim 10 or 11, or
- is operated with data generated by means of a device according to one or more of claims 1 to 4, or
- was tested, verified or calibrated with the aid of a device according to one or more of claims 1 to 4 in the production process or in the design process.