Integration of Signed and Verbal Communication: South African Sign Language Recognition, Animation and Translation
What is Sign Language?
Signed Languages (SL) are used for communication by Deaf people around the world. Signed languages are fully-fledged languages that are as complex as any spoken language and are functionally capable of expressing the entire range of human experience that spoken languages can express. They have phonological, morphological, syntactic and semantic levels of representation. What distinguishes signed languages is their production through the medium of space, using the hands, face, head and upper torso. Facial expressions, mouth patterns, eyebrow movements, body inclination, eye gaze and head movements all convey grammatical and/or phonological information. These non-manual features are essential for the full expression of any signed language.
There is no universal signed language: signed languages arise and develop, like spoken languages, through community use. The signed language used in one country will be clearly distinguishable from the signed language used in another. Hence, developments in signed language technology will need to be adapted for each signed language: although the underlying technology remains the same, the languages differ significantly.
South African Sign Language (SASL) is believed to exhibit an underlying common syntactic and morphological base, but with a high degree of lexical diversity. It is not a collection of signed languages each bound to a separate spoken language, e.g. Afrikaans Sign Language or Zulu Sign Language, but one distinct language with regional variation, similar to that which is found in spoken languages.
SASL is used by between 500,000 and 1,600,000 South Africans. It has been included in the South African constitution and is the accepted language of instruction for Deaf students.
The objective of this research programme is to create the technology needed to build a system that will assist Deaf people in interacting with a world that assumes that everybody can hear. The proposed system is outlined below.
Overview of the proposed SASL translation system
SASL-to-English
- Step 1 – Video of a person using SASL needs to be recorded. This can be done using a PC-based system, such as a webcam, but to enhance mobility, cellular phones will primarily be used.
- Step 2 – Computer vision needs to be used to extract semantic information from the video. Both manual and non-manual gestures need to be extracted.
- Step 3 – Machine translation of SASL to English. Though this is not a current focus of the project, a system will be built to collect SASL information to facilitate this process in future.
- Step 4 – Text-to-speech synthesis of the translated text. Existing text-to-speech systems, such as Festival, will be used to perform this task, as sketched below.
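To make the flow of these four steps concrete, here is a minimal sketch of the SASL-to-English chain. All of the stage functions are hypothetical placeholders rather than project code; only the final stage illustrates one plausible hand-off to an existing engine, calling Festival's command-line text-to-speech mode.

import subprocess
import tempfile

def capture_video(source):
    # Step 1: grab frames from a webcam or a cellular phone camera (placeholder).
    raise NotImplementedError

def extract_gesture_features(frames):
    # Step 2: computer vision extracts manual and non-manual features (placeholder).
    raise NotImplementedError

def translate_sasl_to_english(features):
    # Step 3: machine translation of the recognised signs into English text (placeholder).
    raise NotImplementedError

def speak(text):
    # Step 4: hand the translated text to an existing TTS engine, here via Festival's
    # command-line interface, which reads and speaks the contents of a text file.
    with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as f:
        f.write(text)
    subprocess.run(["festival", "--tts", f.name], check=True)

def sasl_to_english(source):
    speak(translate_sasl_to_english(extract_gesture_features(capture_video(source))))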
English-to-SASL
- Step 1 – Audio of a person speaking English needs to be recorded. This can be done using PC-based systems, but to enhance mobility, cellular phones will primarily be used.
- Step 2 – Speech recognition needs to be performed. Existing speech recognition systems, such as Sphinx, will be used to perform this task.
- Step 3 – Machine translation of English to SASL. Though this is not a current focus of the project, a system will be built to collect SASL information to facilitate this process in future.
- Step 4 – Render an avatar signing the translated SASL, as sketched below. Rendering systems for both PC and mobile platforms will be developed.
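In the reverse direction the chain ends at an avatar renderer instead of a speech synthesiser. The sketch below is again purely illustrative: the gloss data structure and the stage functions are assumptions made for this example, not the project's actual formats or code, but they show why the non-manual features described earlier must accompany each sign all the way to the renderer.

from dataclasses import dataclass

@dataclass
class SignGloss:
    # One sign in the translated sequence, carrying the non-manual features the
    # avatar must reproduce alongside the manual gesture (illustrative fields only).
    gloss: str                 # e.g. "HELLO"
    eyebrows: str = "neutral"  # non-manual: eyebrow position
    mouth: str = ""            # non-manual: mouth pattern
    duration_ms: int = 600

def recognise_speech(audio):
    # Step 2: off-the-shelf speech recognition, e.g. a Sphinx-based decoder (placeholder).
    raise NotImplementedError

def translate_to_glosses(text):
    # Step 3: machine translation from English into a SASL gloss sequence (placeholder).
    raise NotImplementedError

def render_avatar(glosses):
    # Step 4: drive the signing avatar on a PC or mobile renderer (placeholder).
    raise NotImplementedError

def english_to_sasl(audio):
    render_avatar(translate_to_glosses(recognise_speech(audio)))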
Machine translation, speech recognition and speech synthesis are active research areas with applications beyond sign languages. To keep this project focused, these fields will only be investigated to the level needed to allow a successful transition between the steps outlined above.
For an overview of current trends, the state of the art and active participants in the field of machine translation, see the NIST Machine Translation Evaluation official results (http://www.nist.gov/speech/tests/mt/doc/).
For an overview of the field of speech recognition and synthesis, see Cole, R. A., Mariani, J., Uszkoreit, H., Zaenen, A. and Zue, V. (1995), Survey of the State of the Art in Human Language Technology, Center for Spoken Language Understanding (CSLU), Carnegie Mellon University, Pittsburgh, PA. Also see the Festival and Sphinx projects.
This project is therefore focused on developing computer vision technologies specifically for extracting semantic information about SASL from video footage, as well as on developing an avatar system for rendering SASL.
In order to achieve this objective, this programme will develop, use and extend technologies that allow for the accurate capture and analysis of sign language gestures. It will also develop, use and extend the technologies needed to render sign language.
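As a concrete, if simplified, illustration of the kind of low-level vision step that gesture capture involves, the snippet below segments skin-coloured regions (typically the two hands and the face) from a video frame with OpenCV. The fixed HSV thresholds are rough illustrative defaults; this is a generic textbook technique, not the project's own tracking algorithm.

import cv2
import numpy as np

def skin_mask(frame_bgr):
    # Return a binary mask of skin-coloured pixels using fixed HSV thresholds.
    # The threshold values are rough defaults and would need tuning per camera and lighting.
    hsv = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2HSV)
    lower = np.array([0, 40, 60], dtype=np.uint8)
    upper = np.array([25, 180, 255], dtype=np.uint8)
    mask = cv2.inRange(hsv, lower, upper)
    # Morphological opening and closing clean the mask into solid blobs.
    kernel = np.ones((5, 5), np.uint8)
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)
    return cv2.morphologyEx(mask, cv2.MORPH_CLOSE, kernel)

def largest_blobs(mask, n=3):
    # Return the n largest contours, typically the two hands and the face
    # (uses the OpenCV 4 findContours return signature).
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    return sorted(contours, key=cv2.contourArea, reverse=True)[:n]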
To date, this project has developed a digital SASL phrasebook. User interaction is through a cellular phone and the processing happens on a server. The phrasebook performs whole-sign recognition of manual gestures. It can be used for remote phone-to-phone communication or on a single device. The system also has a phrase lookup feature.
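Whole-sign recognition of the kind the phrasebook performs is often illustrated with template matching over feature trajectories, for example comparing an observed sign against stored phrase templates with dynamic time warping (DTW). The sketch below shows that common approach for illustration only; it is not a description of the phrasebook's actual recogniser. In practice the templates would be feature trajectories extracted from recorded reference signs.

import numpy as np

def dtw_distance(a, b):
    # Dynamic time warping distance between two feature trajectories of shape (T, D).
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(a[i - 1] - b[j - 1])
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return float(cost[n, m])

def recognise_sign(observation, templates):
    # Return the phrase label whose stored template trajectory is closest to the observation.
    return min(templates, key=lambda label: dtw_distance(observation, templates[label]))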
Successful completion of this proposed research programme will:
1. improve the quality of life for the Deaf community by widening their access to information services,
2. improve their access to careers and thereby improve their socio-economic situation,
3. empower the Deaf community to contribute to the development of new technologies,
4. advance the state of the art in human-computer interaction,
5. improve the competitiveness of South Africa in the global market as more and more service providers move toward the integration of the Deaf community, and
6. create opportunities for developing the output of this programme into commercially viable products.
Today, assistance to the Deaf community is offered in the form of real-time interpretation of news programmes on television and text-to-voice and voice-to-text translation for telephony. However, such services are costly, and the expected increase in information services makes such human-in-the-loop translation systems cumbersome and infeasible. Automatic speech-to-sign-language and sign-language-to-speech interpretation are therefore desirable.
Videos
demonstrating the iSign prototype
demonstrating the PhoneReader prototype
demonstrating bimanual hand tracking
demonstrating robust bimanual hand tracking in extreme environments
demonstrating an integrated sign language system
demonstrating robust facial expression recognition
demonstrating faster upper body pose recognition and estimation using CUDA
James Connan
Department of Computer Science
University of the Western Cape
Bellville, South Africa