VRML-based Representations of ASL Fingerspelling on the World-Wide Web

ASSETS '98, Marina del Rey, California
Copyright ACM
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.

S. Augustine Su
Department of Computer Science
University of Maryland
College Park, Maryland
su@cs.umd.edu

Richard K. Furuta
Center for the Study of Digital Libraries &
Department of Computer Science
Texas A&M University
College Station, Texas
furuta@cs.tamu.edu

1. ABSTRACT

Virtual Reality Modeling Language (VRML) is an effective tool for documenting sign language on the World-Wide Web. In this paper, we present techniques for enlarging the vocabulary of ASL signs encoded in VRML 2.0 for educational purposes. As a proof of concept for gesture making, we present a Web site that demonstrates the application of a 3D hand model to fingerspelling the ASL manual alphabet and numbers.

1.1 Keywords

American Sign Language, Virtual Reality Modeling Language, World Wide Web, hand gestures

2. INTRODUCTION

Virtual Reality Modeling Language (VRML) [8] is a standard language for describing 3D computer graphics models on the World-Wide Web. Since 3D models in VRML provide more effective visual documentation of sign language than traditional 2D media such as drawings and video [2], constructing ASL signs in VRML not only helps students learn ASL but also provides universal access to the learning material.

At first glance, the concept of building 3D models of ASL signs in VRML is simple and intuitive. However, discovering how to expand a small vocabulary of static handshapes in VRML, such as the one built by Geitz et al. [2], into a larger vocabulary is not trivial. In this paper, we present techniques for solving this problem systematically.

Our work is based on the assumption that ASL signs can be described in terms of the linguistic features of the signing hands (handshapes, positions, orientations, and movements), and that each feature has only a small number of possible configurations. This assumption is based on Stokoe, Casterline, and Croneberg's sign writing system [7]. Since each movement has a starting point and an ending point, the problem of describing a dynamic sign can be reduced to selecting static gestures as key frames and then animating between them to form the sign. Facial expression and body language are not considered in this work.

In this paper we describe the implementation of a Web-accessible interface for ASL-based gestures. We started with a simple 3D hand model and added a specification for all the degrees of freedom (DOFs) of the hand to construct static gestures. In the interface, the hand's DOFs are set by manipulating a control panel; finger bends, wrist bends, and the location and orientation of the hand model can be adjusted one by one.

Next, we used the key-frame animation facilities of VRML 2.0 to compose dynamic gestures by specifying pre-selected static gestures as key frames. Our Web site provides a proof-of-concept prototype that creates VRML files demonstrating this: users can input a string of letters and numbers and see the sequence fingerspelled in the ASL manual alphabet in a dynamically generated VRML world.

In this paper, the hand model is introduced first, followed by the control panel. Then, animation and fingerspelling are explained. Finally, we discuss the possibility of compiling an ASL dictionary in VRML.

3. HAND MODEL

The first step in representing signs in VRML is to construct a 3D hand model. To keep the file size as small as possible, we decided to preserve the degrees of freedom of the hand but to ignore other details, such as texture.

The hand model consists of entities for a palm, a forearm, and three bones for each finger and the thumb; each entity is made of a scaled Cylinder plus two scaled Spheres attached to its two ends. Cylinder and Sphere are VRML geometry nodes. To simplify the implementation, we ignore the web of flesh between the index finger and thumb, which makes the hand model slightly less realistic. Because the model is so simple, the file size stays small (about 9 KB).
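
As an illustration, the following VRML 2.0 fragment sketches one such bone entity; the node names, color, and dimensions are ours for illustration, not the values used in the actual model.

  #VRML V2.0 utf8
  # One bone entity: a Cylinder shaft capped by a Sphere at each end.
  # Node names, color, and dimensions are illustrative only.
  DEF ProximalBone Transform {
    children [
      Shape {                            # shaft of the bone
        appearance DEF SKIN Appearance {
          material Material { diffuseColor 0.9 0.75 0.6 }
        }
        geometry Cylinder { height 3.0 radius 0.4 }
      }
      Transform {                        # knuckle at the lower end
        translation 0 -1.5 0
        children Shape {
          appearance USE SKIN
          geometry Sphere { radius 0.45 }
        }
      }
      Transform {                        # knuckle at the upper end
        translation 0 1.5 0
        children Shape {
          appearance USE SKIN
          geometry Sphere { radius 0.45 }
        }
      }
    ]
  }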

The major DOFs of the hand can be characterized as follows [5]: each finger can perform distal bending, middle bending, and proximal bending, as well as proximal deviation; the thumb is similar. In total, this yields 20 DOFs for the four fingers and the thumb. The wrist adds yaw and pitch, and the forearm contributes 6 DOFs: translation in x, y, and z plus yaw, pitch, and roll. The final count is 28 DOFs for the complete hand model.

4. CONTROL PANEL

We provide a control panel as the means of specifying the hand's DOFs. Figure 1 is a snapshot of the control panel, built in VRML. Each control wheel adjusts one DOF; each control ball adjusts two DOFs at the same time. There is one exception: the distal and middle bendings of each finger (2 DOFs) are governed by a single 1-DOF control wheel, since the angle of distal bending is approximately two thirds that of middle bending in natural gestures [4]. The number of controllable DOFs on the control panel is therefore 24 (one DOF for each of the four fingers is deducted from the original 28). For ease of learning, the wheels and balls are arranged to correspond to the positions on the hand where the DOFs they control take effect.


Figure 1. A control panel for making handshapes (right-handed).
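
A minimal sketch of this distal-middle coupling, assuming a Script node and placeholder joint names of our own invention, might look as follows.

  #VRML V2.0 utf8
  # Sketch: one wheel value drives both joints, with the distal angle
  # set to two thirds of the middle angle.  Names are illustrative.
  DEF MiddleJoint Transform { children [ ] }   # placeholder joints
  DEF DistalJoint Transform { children [ ] }
  DEF Coupler Script {
    eventIn  SFFloat    set_bend        # radians, from the control wheel
    eventOut SFRotation middle_changed
    eventOut SFRotation distal_changed
    url "vrmlscript:
      function set_bend(angle) {
        // bend about the x axis; distal follows at 2/3 of the angle
        middle_changed = new SFRotation(1, 0, 0, angle);
        distal_changed = new SFRotation(1, 0, 0, angle * 2.0 / 3.0);
      }"
  }
  ROUTE Coupler.middle_changed TO MiddleJoint.set_rotation
  ROUTE Coupler.distal_changed TO DistalJoint.set_rotation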

VRML Script nodes capture the amount of mouse drag on the control wheels and balls and set the translation and rotation fields of the Transform nodes via VRMLScript, a subset of JavaScript implemented only in Cosmo Player [1]. After finishing a handshape, we can click a control button on the panel to send the DOF values to our remote CGI script, GestureMaker, which replies with a new VRML file carrying the handshape. This is how we produced all the static gestures for the ASL manual alphabet and the numbers zero through nine. In specifying them, we arbitrarily chose one of the possible variants for each sign [3]; however, it would be straightforward to specify additional variants as well.
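
The following fragment sketches how one control wheel and the send button might be wired together; the sensor choices, the GestureMaker URL, and the query-string format are our assumptions, as the actual interface details are not given here.

  #VRML V2.0 utf8
  # Sketch: a CylinderSensor turns mouse drags on the wheel into a
  # rotation, a Script keeps the latest bend angle, and clicking the
  # button sends the DOF value to the CGI script.  The URL and query
  # format are hypothetical.
  DEF Wheel Transform {
    children [
      DEF WheelSensor CylinderSensor { }
      Shape { geometry Cylinder { height 0.2 radius 1.0 } }
    ]
  }
  DEF Button Transform {
    translation 3 0 0
    children [
      DEF SendButton TouchSensor { }
      Shape { geometry Box { size 1 0.5 0.2 } }
    ]
  }
  DEF Panel Script {
    eventIn SFRotation set_wheel         # from the wheel sensor
    eventIn SFBool     send              # from the send button
    field   SFFloat    bend 0.0
    url "vrmlscript:
      function set_wheel(rot) {
        bend = rot.angle;                // keep the latest wheel angle
      }
      function send(active) {
        if (active) return;              // act on button release
        // hypothetical query string carrying a single DOF value
        Browser.loadURL(
          new MFString('http://example.edu/cgi-bin/GestureMaker?f1=' + bend),
          new MFString());
      }"
  }
  ROUTE WheelSensor.rotation_changed TO Wheel.set_rotation
  ROUTE WheelSensor.rotation_changed TO Panel.set_wheel
  ROUTE SendButton.isActive TO Panel.send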

5. ANIMATION

Given static handshapes, dynamic gestures can be created with the key-frame animation of VRML 2.0. We use VRML's PositionInterpolator nodes to linearly interpolate values for the translation fields and its OrientationInterpolator nodes to linearly interpolate values for the rotation fields. The DOF values for a sequence of key-frame handshapes are passed to the GestureMaker script, which returns a VRML file for a dynamic gesture that interpolates among the handshapes.
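
As a minimal sketch, assuming a single joint and illustrative key values, the animation generated for one DOF might take the following form.

  #VRML V2.0 utf8
  # Sketch: a TimeSensor drives a PositionInterpolator for the hand's
  # location and an OrientationInterpolator for one joint's rotation.
  # The key frames correspond to two pre-selected handshapes; all
  # values are illustrative.
  DEF Hand Transform { children [ ] }    # hand geometry omitted
  DEF Joint Transform { children [ ] }   # one finger joint, omitted
  DEF Clock TimeSensor { cycleInterval 2.0 loop TRUE }
  DEF HandPath PositionInterpolator {
    key      [ 0.0, 1.0 ]
    keyValue [ 0 0 0,  0 0.5 0 ]    # hand rises between key frames
  }
  DEF JointBend OrientationInterpolator {
    key      [ 0.0, 1.0 ]
    keyValue [ 1 0 0 0.0,           # key frame 1: finger straight
               1 0 0 1.57 ]         # key frame 2: finger bent 90 degrees
  }
  ROUTE Clock.fraction_changed TO HandPath.set_fraction
  ROUTE Clock.fraction_changed TO JointBend.set_fraction
  ROUTE HandPath.value_changed TO Hand.set_translation
  ROUTE JointBend.value_changed TO Joint.set_rotation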

The letter Z in the manual alphabet is a gesture that draws a "Z" with the index finger; four handshapes were used as key frames to create the dynamic gesture for Z. The letter J draws a "J" with the little finger; three handshapes were used for the J gesture.

6. FINGERSPELLING

After creating dynamic gestures for J and Z, we moved a step further and created animations among letters and numbers. Figure 2 is a snapshot of our Web site, which demonstrates right-handed fingerspelling of the ASL manual alphabet and one-digit numbers in VRML. Users can choose to see VRML files for individual letters and numbers, or they can input a string of letters and numbers and watch a fingerspelling animation of the sequence.

The VRML files for letters and numbers are archived, ready for retrieval. The VRML files for string input, however, are created on the fly by our CGI script, GestureMaker. When two adjacent handshapes are identical, we insert the same handshape between them, moved slightly backward, to imitate the repeated handshapes of real fingerspelling.
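
The following fragment sketches this trick for a doubled letter, with illustrative distances and timing: the hand holds the handshape but pulls back slightly between the two occurrences.

  #VRML V2.0 utf8
  # Sketch: between two identical letters the hand keeps its handshape
  # but moves slightly backward and returns.  Values are illustrative.
  DEF Hand Transform { children [ ] }    # hand geometry omitted
  DEF Clock TimeSensor { cycleInterval 1.5 loop TRUE }
  DEF Backoff PositionInterpolator {
    key      [ 0.0, 0.5, 1.0 ]
    keyValue [ 0 0 0,               # first occurrence of the letter
               0 0 -0.3,            # same handshape, moved back
               0 0 0 ]              # second occurrence
  }
  ROUTE Clock.fraction_changed TO Backoff.set_fraction
  ROUTE Backoff.value_changed TO Hand.set_translation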

One obvious problem with the system is the collision and penetration of the fingers and thumb for some string inputs, for example when moving from the letter A to the letter B. In such cases, we need to insert one or more key-frame handshapes to guide the path of the movement. We are still working on general algorithms to find the key frames to insert for these cases.


Figure 2. ASL fingerspelling in VRML (right-handed).

We also need more user feedback on our system; evaluation so far has not been exhaustive. A user familiar with ASL reported that the speed of our fingerspelling is slow compared to real-world fingerspelling. The next version will improve this by allowing users to adjust the pace of fingerspelling.

7. FUTURE WORK AND CONCLUSION

In this paper we have described techniques for constructing hand gestures in VRML 2.0. Combined carefully, they yield a spectrum of methods for creating VRML gestures that trade straightforwardness against automation.

The most direct method of acquiring static and dynamic gestures is from a hand-tracking device; the disadvantage is that each gesture has to be recorded separately. Next in the spectrum is to record key static gestures in ASL from the device and to create dynamic gestures by animating among the static ones, which gains a degree of automation. Static gestures can also be created purely by manipulating the control panel; this gives a higher degree of automation still, since gestures can be reused.

Furthermore, we can break handshapes down into sub-handshape elements [6], assemble handshapes from these elements, and make minor adjustments to the DOFs if needed. This gains an even higher degree of automation, and will be implemented in the next version of the system. Here is an example of the sign "I love you" expressed in sub-handshape elements:

 I love you ::   F1 (index), F4 (little):  straight;
                 F2 (middle), F3 (ring):   fully curved;
                 F0 (thumb):               outward;
                 PALM (orientation):       superior-anterior.
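
One way such a specification might expand into DOF values is sketched below, using one representative joint rotation per element; the element-to-angle mapping is our illustration, not a published table.

  #VRML V2.0 utf8
  # Sketch: the "I love you" elements expanded into joint rotations
  # (one representative DOF per element; angles are illustrative,
  # and the hand geometry is omitted).
  DEF F1_Proximal Transform { rotation 1 0 0 0.0 }    # index:  straight
  DEF F2_Proximal Transform { rotation 1 0 0 1.57 }   # middle: fully curved
  DEF F3_Proximal Transform { rotation 1 0 0 1.57 }   # ring:   fully curved
  DEF F4_Proximal Transform { rotation 1 0 0 0.0 }    # little: straight
  DEF F0_Joint    Transform { rotation 0 0 1 -0.8 }   # thumb:  outward
  DEF PALM        Transform { rotation 1 0 0 -0.78 }  # superior-anterior
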
We will explore the possibility of expressing an ASL sign as a string of tokens, with the set of tokens representing sub-handshape elements based on Stokoe's sign writing system. To complete this kind of system, more factors have to be considered; most importantly, the second hand, as well as the face and body of the signer, must be included in the VRML world. Since an ASL dictionary using the sign writing system is available [7], we can follow it and translate its symbols into our tokens to compile an ASL dictionary in VRML.

Our Web site demonstrating fingerspelling and GestureMaker is at http://www.csdl.tamu.edu/~su/asl/.

8. ACKNOWLEDGMENTS

This material is based in part on work supported by the Texas Advanced Research Program under Grant Number 999903-230.

9. REFERENCES

  1. Cosmo Player, a VRML browser and plug-in; see http://cosmo.sgi.com/.

  2. Geitz, S., Hansen, T. and Maher, S. Computer generated 3-Dimensional models of manual alphabet handshapes for the World Wide Web, in Proceedings of ASSETS '96 (Vancouver, Canada, April 1996), ACM Press, 27-31. http://holodeck.gsfc.nasa.gov/sign/sign.html

  3. Riekehof, L.L. The Joy of Signing, 2nd ed. Gospel Publishing House, Springfield, MO, 1987.

  4. Rijpkema, H. and Girard, M. Computer animation of knowledge-based human grasping, in Proceedings of SIGGRAPH '91 (July 1991), ACM Press, 339-348.

  5. Spence, A.P. and Mason, E.B. Human Anatomy and Physiology, 3rd ed. Benjamin Cummings Pub. Co., 1987.

  6. Su, S.A. and Furuta, R. A logical hand device in virtual environments, in Virtual Reality Software & Technology: Proceedings of the VRST'94 Conference (Singapore, August 1994). World Scientific Publishing Co., Singapore, 33-42.

  7. Stokoe, W.C., Casterline, D.C., and Croneberg, C.G. A Dictionary of American Sign Language on Linguistic Principles. Linstok Press, Silver Spring, MD, 1976.

  8. VRML Repository; see http://www.sdsc.edu/vrml/.