Richard K. Furuta
Center for the Study of Digital Libraries &
Department of Computer Science
Texas A&M University
College Station, Texas
At first glance, the concept of building 3D models of ASL signs in VRML is simple and intuitive. However, discovering how to expand a small vocabulary of static handshapes in VRML, such as the one built by Geitz et al., into a larger vocabulary is not trivial. In this paper, we present techniques for systematically solving the problem.
Our work is based on the assumption that ASL signs can be described in terms of the linguistic features of the signing hands (handshapes, positions, orientations, and movements), and that each feature has only a small number of possible configurations. This assumption is based on Stokoe, Casterline, and Croneberg's sign writing system. Since each movement has a starting point and an ending point, the problem of describing a dynamic sign can be reduced to selecting static gestures as key frames and then animating between them to form the sign. Facial expression and body language are not considered in this work.
In this paper we describe the implementation of a Web-accessible interface for ASL-based gestures. We started with a simple 3D hand model and added specifications for all the degrees of freedom (DOFs) of the hand in order to construct static gestures. In the interface, the DOFs are set by manipulating a control panel; finger bends, wrist bends, and the location and orientation of the hand model can be adjusted one by one.
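In VRML terms, each such DOF can be realized as the rotation field of a DEF'd Transform node, which the control panel updates through the node's set_rotation eventIn. The following is a minimal sketch of a single joint, with illustrative names and dimensions of our own, not the system's actual node structure:

    #VRML V2.0 utf8
    # One DOF: the proximal bend of the index finger, a rotation about the x axis.
    # A control panel would send new angles to IndexProximal.set_rotation.
    DEF IndexProximal Transform {
      rotation 1 0 0 0.5               # bend by 0.5 radians
      children [
        Transform {
          translation 0 -0.2 0         # hang the bone below the joint pivot
          children [
            Shape {
              appearance Appearance { material Material { } }
              geometry Cylinder { radius 0.07 height 0.4 }
            }
          ]
        }
      ]
    }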
Next, we used the key-frame animation facility provided by VRML 2.0 to compose dynamic gestures, specifying pre-selected static gestures as key frames. The Web site provides a proof-of-concept prototype that demonstrates this by creating VRML files: users can input a string of letters and numbers and see the sequence animated in the ASL manual alphabet in a dynamically generated VRML world.
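For example, bending one joint between key frames can be expressed with a TimeSensor driving an OrientationInterpolator. This is a minimal sketch with names and values of our own, not the prototype's actual structure:

    #VRML V2.0 utf8
    DEF Joint Transform {
      children [ Shape { geometry Cylinder { radius 0.07 height 0.4 } } ]
    }
    DEF Clock TimeSensor { cycleInterval 1.0 loop TRUE }
    DEF Bend OrientationInterpolator {
      key      [ 0.0, 0.5, 1.0 ]                   # key-frame times
      keyValue [ 1 0 0 0, 1 0 0 1.2, 1 0 0 0 ]     # straight, bent, straight
    }
    ROUTE Clock.fraction_changed TO Bend.set_fraction
    ROUTE Bend.value_changed TO Joint.set_rotation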
In the remainder of the paper, we introduce the hand model first, followed by the control panel. We then explain animation and fingerspelling. Finally, we discuss the possibility of compiling an ASL dictionary in VRML.
The hand model consists of entities for a palm, a forearm, and three bones for each finger and the thumb; each entity is a scaled Cylinder with a scaled Sphere attached to each of its two ends (Cylinder and Sphere are VRML geometry nodes). To simplify the implementation, we ignore the flesh patch between the index finger and thumb, which makes the hand model a bit less realistic. Because the model is so simple, the file size stays small (about 9 KB).
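A single bone entity along these lines might look as follows; the dimensions are illustrative, not the model's actual values:

    #VRML V2.0 utf8
    # One bone: a Cylinder with a Sphere attached to each end.
    Group {
      children [
        Shape { geometry Cylinder { radius 0.07 height 0.4 } }
        Transform {
          translation 0 0.2 0          # top end of the cylinder
          children [ Shape { geometry Sphere { radius 0.08 } } ]
        }
        Transform {
          translation 0 -0.2 0         # bottom end of the cylinder
          children [ Shape { geometry Sphere { radius 0.08 } } ]
        }
      ]
    }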
The major DOFs of the hand can be characterized as follows: each finger can perform distal bending, middle bending, proximal bending, and proximal deviation, and the thumb is similar, yielding 20 DOFs for the fingers and thumb. In addition, the wrist can yaw and pitch, and the forearm has 6 DOFs (x, y, and z translation plus yaw, pitch, and roll). The final count is 28 DOFs for the complete hand model.
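Each rotational DOF maps naturally onto a Transform in a nested chain, so that bending the proximal joint carries the middle and distal bones along. A sketch of two of the joints for one finger, with illustrative names (the deviation DOF would be a second rotation at the proximal joint):

    #VRML V2.0 utf8
    # Two of a finger's rotational DOFs as a nested Transform chain;
    # rotating F1Proximal carries F1Middle and its bone along with it.
    DEF F1Proximal Transform {
      rotation 1 0 0 0                 # proximal bend (about the x axis)
      children [
        Shape { geometry Cylinder { radius 0.07 height 0.4 } }
        DEF F1Middle Transform {
          translation 0 0.35 0         # place the next joint past the bone
          rotation 1 0 0 0             # middle bend
          children [
            Shape { geometry Cylinder { radius 0.06 height 0.3 } }
          ]
        }
      ]
    }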
The letter Z in the manual alphabet is a gesture in which the index finger draws a "Z"; four handshapes were used as key frames to create the dynamic gesture for Z. The letter J uses the little finger to draw a "J"; three handshapes were used for the J gesture.
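The movement component of such a letter can be sketched as a PositionInterpolator whose four key frames trace the "Z" strokes, with the hand's translation standing in for the fingertip trajectory (coordinates are illustrative):

    #VRML V2.0 utf8
    DEF Hand Transform {
      children [ Shape { geometry Sphere { radius 0.05 } } ]   # stand-in for the hand
    }
    DEF Clock TimeSensor { cycleInterval 2.0 loop TRUE }
    DEF ZPath PositionInterpolator {
      key      [ 0.0, 0.33, 0.66, 1.0 ]
      keyValue [ -0.1  0.1 0,   0.1  0.1 0,     # top stroke
                 -0.1 -0.1 0,   0.1 -0.1 0 ]    # diagonal, then bottom stroke
    }
    ROUTE Clock.fraction_changed TO ZPath.set_fraction
    ROUTE ZPath.value_changed TO Hand.set_translation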
The VRML files for individual letters and numbers are pre-generated and archived for retrieval, while the VRML files for string input are created on the fly by our CGI script, GestureMaker. When two adjacent handshapes are identical, we insert the same handshape again between them, displaced slightly backward, to imitate the repeated handshapes of doubled letters in fingerspelling.
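The inserted in-between frame can be sketched as an extra interpolator key that repeats the pose but pulls the hand back slightly; the values are illustrative, and the interpolator would be routed to the hand's translation as in the earlier sketches:

    #VRML V2.0 utf8
    # Keys generated for a doubled letter, e.g. the two Ls in "HELLO":
    # the middle key repeats the handshape's position, dipped backward in z.
    DEF RepeatPath PositionInterpolator {
      key      [ 0.0, 0.5, 1.0 ]
      keyValue [ 0 0 0,  0 0 -0.05,  0 0 0 ]
    }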
One obvious problem of the system is the collision and penetration of fingers and thumb for some string inputs, say, moving from the letter A to the letter B. In such cases we need to insert one or more key-frame handshapes to guide the path of movement. We are still working on general algorithms for finding the key frames to insert in these cases.
We also need more user feedback on the system; our evaluation so far has not been exhaustive. A user familiar with ASL reported that the speed of our fingerspelling is slow compared to real-world fingerspelling. This will be improved in the next version by allowing users to adjust the pace of the fingerspelling.
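In VRML 2.0 terms, the pace maps directly onto the driving TimeSensor's cycleInterval field, so exposing it as a user setting is straightforward; a fragment with an illustrative value:

    DEF Clock TimeSensor {
      cycleInterval 0.6    # seconds per letter; smaller values speed up the spelling
      loop TRUE
    }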
The most direct method for acquiring static and dynamic gestures is from a hand-tracking device; the disadvantage is that each gesture has to be recorded separately. The next point in the spectrum is to record key static gestures in ASL from the device and to create dynamic gestures by animating among the static ones. With this method we gain a degree of automation. Static gestures can also be created purely by manipulating the control panel, which gives us a higher degree of automation because gestures can be reused.
Furthermore, we can break handshapes down into sub-handshape elements, assemble handshapes from these elements, and make minor adjustments to the DOFs as needed. This yields an even higher degree of automation, and will be implemented in the next version of the system. Here is an example of the sign "I love you" expressed in sub-handshape elements:
    I love you :: F1 (index), F4 (little): straight; F2 (middle), F3 (ring): fully curved; F0 (thumb): outward; PALM (orientation): superior-anterior.

We will explore the possibility of expressing an ASL sign as a string of tokens, where the set of tokens represents sub-handshape elements based on Stokoe's sign writing system. To complete this kind of system, more factors have to be considered; most importantly, the second hand as well as the face and body of the signer must be included in the VRML world. Since an ASL dictionary using the sign writing system is available, we can follow it and translate its symbols into our tokens to compile an ASL dictionary in VRML. Our Web site demonstrating fingerspelling and GestureMaker is at http://www.csdl.tamu.edu/~su/asl/.
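As a closing illustration, such a token string could drive a hand built as a VRML PROTO whose fields expose the sub-handshape elements. The interface below is hypothetical (our own field names, with the body stubbed out), not an implemented part of the system:

    #VRML V2.0 utf8
    PROTO Hand [
      field SFFloat    f1   0.0        # index bend, radians (0 = straight)
      field SFFloat    f2   0.0        # middle
      field SFFloat    f3   0.0        # ring
      field SFFloat    f4   0.0        # little
      field SFFloat    f0   0.0        # thumb (negative = outward)
      field SFRotation palm 0 0 1 0    # palm orientation
    ] {
      Group { }    # body elided; it would build the joint chains sketched earlier
    }
    # "I love you" expressed through the tokens:
    Hand {
      f1 0.0  f4 0.0                   # index and little straight
      f2 1.57 f3 1.57                  # middle and ring fully curved
      f0 -0.8                          # thumb outward
      palm 1 0 0 0.79                  # superior-anterior orientation
    }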