by Sezgin, T. M. and Davis, R.
Summary:
At the beginning of the article, Sezgin states that the aim of the paper is to propose a novel approach to sketch recognition that takes into account its incremental and interactive nature. Sketching is incremental because strokes are drawn one at a time, and interactive because there is two-way communication between the user and the computer.
He then provides the terminology for the paper. Informally, a sketch is a messy hand drawing; formally, it is a sequence of strokes in which the x, y coordinates and the time t of each point are captured. He asserts that stroke ordering is important in recognition, since a user's preferred order of strokes defines his or her sketching style. He then describes the three steps of the recognition process: segmentation, which groups together the strokes that belong to the same object; classification, which determines which object class each stroke group belongs to; and labeling, which, as the name suggests, assigns labels to the components of the recognized objects.
The author identifies the problem with current sketching systems as treating all users with the same recognition procedure and therefore ignoring their sketching styles. He suggests that an approach that captures these sketching styles enables efficient sketch recognition with polynomial time requirements.
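To make the formal definition above concrete, here is a minimal sketch of how such data might be represented. The class and method names (`Point`, `Stroke`, `Sketch`, `from_strokes`) are my own assumptions for illustration and do not come from the paper.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Point:
    x: float
    y: float
    t: float  # capture time of this sample


@dataclass
class Stroke:
    points: List[Point]  # samples between pen-down and pen-up

    @property
    def start_time(self) -> float:
        return self.points[0].t


@dataclass
class Sketch:
    strokes: List[Stroke]  # kept in drawing order

    @classmethod
    def from_strokes(cls, strokes: List[Stroke]) -> "Sketch":
        # Stroke order carries the user's style, so sort by pen-down time.
        return cls(sorted(strokes, key=lambda s: s.start_time))

    def num_points(self) -> int:
        return sum(len(s.points) for s in self.strokes)
```

Keeping the strokes sorted by time preserves the ordering information that the paper relies on to capture a user's sketching style.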
A completed user study indicated that people tend to sketch in a stylized way and that they keep their styles across different sketches. Based on this observation, Sezgin proposes using HMMs (hidden Markov models) to model different sketching styles. He gives an overview of HMMs and then designs HMMs with fixed-length and variable-length training data, to account for the different observation lengths that arise when the same object is drawn in different styles. In the fixed-length design, after the HMMs have been trained with the Baum-Welch method to capture different sketching styles, computing the probability of every model presents a challenge: the sketch would have to be pre-segmented. To overcome this, Sezgin proposes a dynamic programming approach that casts segmentation as a shortest path problem. After segmentation, classification is done by computing the probability of the observations under every model with the Forward algorithm and choosing the model, and thus the object class, with the highest likelihood.
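The two recognition steps for the fixed-length design can be sketched roughly as follows. This is not the paper's implementation: it assumes each stroke has already been encoded as a discrete integer symbol, that the HMM parameters come from a separate training step, and that each model covers a fixed number of observations. The names `HMM`, `forward_log_likelihood`, `classify`, and `segment` are hypothetical. Maximizing the summed log-likelihood over segmentations is the same as finding a shortest path when edge weights are negative log-likelihoods.

```python
import math
from dataclasses import dataclass

import numpy as np


@dataclass
class HMM:
    start: np.ndarray  # (S,) initial state probabilities
    trans: np.ndarray  # (S, S) state transition probabilities
    emit: np.ndarray   # (S, V) emission probabilities per discrete symbol
    length: int        # fixed observation length this model covers (assumed)


def forward_log_likelihood(obs, hmm):
    """Return log P(obs | hmm) using the scaled Forward algorithm."""
    alpha = hmm.start * hmm.emit[:, obs[0]]
    log_lik = 0.0
    for t in range(len(obs)):
        if t > 0:
            alpha = (alpha @ hmm.trans) * hmm.emit[:, obs[t]]
        scale = alpha.sum()
        if scale == 0.0:
            return -math.inf  # the model cannot produce this sequence
        log_lik += math.log(scale)
        alpha = alpha / scale  # rescale to avoid numerical underflow
    return log_lik


def classify(obs, models):
    """Pick the model name with the highest likelihood for one segment."""
    return max(models, key=lambda name: forward_log_likelihood(obs, models[name]))


def segment(obs, models):
    """Split obs into segments maximizing the total log-likelihood.

    best[i] is the best score for obs[:i]; each model contributes an edge
    from i - length to i, so this is a shortest path over a DAG.
    """
    n = len(obs)
    best = [-math.inf] * (n + 1)
    back = [None] * (n + 1)
    best[0] = 0.0
    for end in range(1, n + 1):
        for name, hmm in models.items():
            begin = end - hmm.length
            if begin < 0 or best[begin] == -math.inf:
                continue
            score = best[begin] + forward_log_likelihood(obs[begin:end], hmm)
            if score > best[end]:
                best[end], back[end] = score, (begin, name)
    if n > 0 and back[n] is None:
        raise ValueError("no segmentation covers the whole sequence")
    segments, end = [], n
    while end > 0:  # follow back-pointers from the end of the sketch
        begin, name = back[end]
        segments.append((begin, end, name))
        end = begin
    return segments[::-1]
```

Since every position is visited once and each model is scored once per position, the running time is polynomial in the number of observations, which matches the efficiency claim in the summary.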
In the design with variable-length training data, Sezgin gathers all training examples of an object, which together embody all sketching styles, and trains one HMM for each object class. After training, the probabilities of the ending states are also estimated to help the recognition process. Segmentation is again achieved in the same way as with fixed-length training data.
The evaluation of these two frameworks showed that, as discussed in the paper, recognition accuracy increases slightly with variable-length training data. The author attributes this to those HMMs being trained with more data.
Discussion:
In his '05 paper, Sezgin presents many insights that seem to have had a strong effect on sketch recognition research. The idea of taking the incremental and interactive nature of sketching into account, using it to design HMMs, and achieving polynomial time requirements for recognition is very impressive. The contributions of this paper are substantial; however, its most important parts are the discussion and future work. The author seems well aware of the drawbacks of the proposed systems, e.g. handling single-stroke objects, and shows where his research is headed.
There are some phenomena, though, that he does not cover in the paper. It is not clear whether the proposed systems handle the recognition of filled-in or overtraced objects, both of which are very common in sketching. Another concern is whether, when a new domain needs to be added, the system would have to be designed all over again for that specific domain, or whether an ordinary user could do this simply by providing training examples. Lastly, the system could add correctly recognized objects to the training set to increase recognition accuracy.
Citation:
Sezgin, T. M. and Davis, R. HMM-Based Efficient Sketch Recognition. In Proceedings of the International Conference on Intelligent User Interfaces (IUI’05), January 10–13, 2005, San Diego, California, USA. ACM Press.