Wednesday, February 25, 2009

Specifying Gestures by Example

by Rubine, D.


Summary:


Rubine's article presents an application framework called Gesture Recognizers Automated in a Novel Direct Manipulation Architecture, GRANDMA for short, for creating gesture recognizers from examples and manipulating them. After motivating the need for such a framework by the goal of building "small, fast and accurate recognizers", Rubine describes various aspects of GRANDMA through an example application: GDP, a gesture-based drawing program.

Rubine is interested in creating recognizers for single-stroke gestures, since this avoids the segmentation problem inherent in multi-stroke gestures. Here, a single-stroke gesture begins when a mouse button is pressed, continues as the mouse moves to draw the gesture, and ends when the button is released or after a certain amount of time has passed. GDP provides a click-and-drag interface for creating the gestures to be recognized: the user adds a gesture to GDP simply by adding a new class in the interface and then providing examples for training. According to the author, 15 examples per class are sufficient to reflect variance in size and/or orientation. The semantics of the gesture are specified after the class is created, so that the gesture can be used for editing and manipulation.
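To make the representation concrete, here is a minimal sketch in Python of a single-stroke gesture as a series of time-stamped points, along with two of the paper's thirteen features (the stroke's total length and its duration). The names and types are illustrative, not taken from Rubine's code.

```python
import math
from dataclasses import dataclass
from typing import List

@dataclass
class Point:
    x: float
    y: float
    t: float  # timestamp

# A single-stroke gesture is just the points sampled between the
# button-press that starts it and the release (or timeout) that ends it.
Gesture = List[Point]

def path_length(g: Gesture) -> float:
    """Total distance traveled by the stroke; one of the paper's features."""
    return sum(math.hypot(b.x - a.x, b.y - a.y) for a, b in zip(g, g[1:]))

def duration(g: Gesture) -> float:
    """Elapsed time of the stroke; another of the paper's features."""
    return g[-1].t - g[0].t
```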

After creating the gesture classes to be recognized, the author presents a statistical method for recognition. In this formulation, a gesture is nothing but a series of time-stamped points, and the problem is to determine, from a set of gesture classes specified by examples, the class to which an input gesture belongs. Statistical gesture recognition consists of extracting a vector of features from the input and classifying that vector as one of the possible gesture classes. The author empirically chooses 13 features that can be computed efficiently at run time. A linear evaluation function is used to classify input gestures: for each class, the evaluation is the dot product of the input's feature vector with that class's weight vector (plus a per-class constant), and the class with the maximum evaluation determines the result, concluding the recognition. The weight vectors are computed from the per-class mean feature vectors and the inverse of a common covariance matrix averaged over the classes. Rejection occurs when the evaluation results of different classes are too close to each other. Evaluation of this system showed over 98% accuracy with 15 classes and 15 or more examples provided per class; even across different combinations of the number of classes and the number of training examples, accuracy did not fall below 90%.
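The following is a minimal sketch of this training and classification recipe in Python with NumPy: per-class mean feature vectors, a pooled covariance matrix, weight vectors from its inverse, and classification by the maximum linear evaluation. The rejection threshold, the use of a pseudo-inverse, and all the names here are my assumptions rather than the paper's exact formulation.

```python
import numpy as np

def train(examples_by_class):
    """examples_by_class: dict mapping class name to an
    (n_examples, n_features) array of feature vectors.
    Returns per-class (constant, weight-vector) pairs."""
    means = {c: X.mean(axis=0) for c, X in examples_by_class.items()}
    d = len(next(iter(means.values())))

    # Pool the per-class scatter into one common covariance estimate.
    scatter = np.zeros((d, d))
    n_total = 0
    for c, X in examples_by_class.items():
        centered = X - means[c]
        scatter += centered.T @ centered
        n_total += len(X)
    cov = scatter / (n_total - len(examples_by_class))

    cov_inv = np.linalg.pinv(cov)  # pseudo-inverse in case cov is singular
    weights = {}
    for c, mu in means.items():
        w = cov_inv @ mu        # weight vector for class c
        w0 = -0.5 * (w @ mu)    # per-class constant term
        weights[c] = (w0, w)
    return weights

def classify(weights, x, reject_below=0.95):
    """Evaluate every class's linear function on feature vector x;
    return the best class, or None to reject an ambiguous input."""
    scores = {c: w0 + w @ x for c, (w0, w) in weights.items()}
    best = max(scores, key=scores.get)
    # Estimate the probability that `best` is correct and reject when
    # the runner-up evaluations are too close (threshold assumed).
    v = np.array(list(scores.values()))
    p_best = 1.0 / np.exp(v - scores[best]).sum()
    return best if p_best >= reject_below else None
```

With around 15 examples per class, as the paper suggests, a classifier of this form is cheap enough to train and evaluate interactively, which fits the goal of small, fast recognizers.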

The author mentions two extensions to the framework. The first is eager recognition, in which the system recognizes an input gesture as soon as its ambiguity is eliminated, without waiting for the stroke to end. The other is multi-finger recognition, which applies single-stroke recognition to each of the individual strokes created by the fingers and then combines the results to classify the multi-finger gesture.
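As a loose illustration of the eager-recognition idea, one could re-run the classifier on the growing prefix of the stroke and commit as soon as the rejection test passes. Rubine's actual extension trains a separate classifier to decide when a partial gesture is still ambiguous, so the sketch below (reusing the classify function above, with an assumed extract_features helper) is only an approximation.

```python
def recognize_eagerly(weights, extract_features, point_stream):
    """Classify the growing prefix after each new point and commit as
    soon as the result stops being rejected as ambiguous."""
    points = []
    for p in point_stream:
        points.append(p)
        if len(points) < 3:
            continue  # too few points for stable features
        result = classify(weights, extract_features(points))
        if result is not None:
            return result, points  # committed before the stroke ended
    return None, points  # stroke ended while still ambiguous
```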


Discussion:


The framework GRANDMA presented in this article struck me with its simplicity, despite the fact that almost every detail of its feature evaluation functions and classification method is explained in the paper. Adding a new gesture class together with its semantics, and training it with very few examples, seemed trivial. There are, however, some issues related to drawing constraints that caught my attention. A simple example of this is the "x" deletion gesture. It is odd that the starting point of the gesture determines which previously drawn object is deleted, rather than the intersection point of the "x" being placed on top of it. This may have been done for efficiency in storing information and in computing the features. Moreover, drawing some gestures with a single stroke does not seem natural, and as the approach expands to new domains, a single-stroke gesture may bear little resemblance to the action it is designed to represent. This would give the user an unnatural feeling, even considering that users are expected to familiarize themselves with a program before using it.

Another part that made me think was the author's way of selecting features. Since the selection was empirical and the results were satisfying, the paper does not discuss the matter extensively. However, extracting features and selecting among them is an important part of recognition, and it deserves more attention.


Citation:


Rubine, D. 1991. Specifying gestures by example. In Proceedings of the 18th Annual Conference on Computer Graphics and Interactive Techniques (SIGGRAPH '91). ACM Press, New York, NY, 329-337. DOI: http://doi.acm.org/10.1145/122718.122753
