PORTFOLIO

Dylan Pallickara

Poudre High School, Fort Collins, Colorado

Contents

  • Computer Science Research

  • Creative Writing: Poetry

Computer-Assisted Recognition of American Sign Language
With Wireframes

Several efforts have targeted ASL recognition directly from raw images. However, depending on the data used to train the models, such methods may be subject to biases stemming from variations in skin tone, texture, and finger thickness that are not sufficiently captured in the datasets (John, Sherif 2022). The ASL recognition approach described here involves two phases: in the first, wireframes of hands are extracted; in the second, the ASL sign is classified based on the wireframe.

The primary dataset, comprising images of ASL hand gestures, is from Kaggle [Akash, 2017]. These images were transformed to extract the wireframe associated with each gesture. Each image was first passed through TensorFlow's hand-landmarking functionality, which yields a set of 20 coordinates per image. A wireframe image (rendered over a black background) was then generated from these coordinates, as depicted in Figure 2. The generated wireframe attenuates background interference and noise; distilling each image into a simple wireframe also reconciles differences in skin tone, finger thickness, and skin texture.
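The rendering step can be sketched as follows. This is a minimal illustration, not the project's actual code: it assumes the landmarker follows MediaPipe's 21-landmark hand topology (the `HAND_CONNECTIONS` edge list below), takes hypothetical normalized (x, y) coordinates as input, and rasterizes the skeleton onto a black canvas with plain NumPy.

```python
import numpy as np

# Hand-skeleton edges in MediaPipe's 21-landmark convention (an assumption
# about the landmarker's topology, not taken from the document).
HAND_CONNECTIONS = [
    (0, 1), (1, 2), (2, 3), (3, 4),          # thumb
    (0, 5), (5, 6), (6, 7), (7, 8),          # index finger
    (5, 9), (9, 10), (10, 11), (11, 12),     # middle finger
    (9, 13), (13, 14), (14, 15), (15, 16),   # ring finger
    (13, 17), (17, 18), (18, 19), (19, 20),  # pinky
    (0, 17),                                 # palm edge
]

def draw_line(img, p0, p1):
    """Rasterize a white segment from p0 to p1 (row, col) by dense sampling."""
    n = int(max(abs(p1[0] - p0[0]), abs(p1[1] - p0[1]))) + 1
    for t in np.linspace(0.0, 1.0, 2 * n):
        r = int(round(p0[0] + t * (p1[0] - p0[0])))
        c = int(round(p0[1] + t * (p1[1] - p0[1])))
        if 0 <= r < img.shape[0] and 0 <= c < img.shape[1]:
            img[r, c] = 255

def render_wireframe(landmarks, size=128):
    """landmarks: sequence of (x, y) pairs normalized to [0, 1].
    Returns a size x size uint8 image: white skeleton on a black background."""
    img = np.zeros((size, size), dtype=np.uint8)
    pts = np.clip(np.asarray(landmarks, dtype=float) * (size - 1), 0, size - 1)
    for a, b in HAND_CONNECTIONS:
        # landmark (x, y) -> image (row, col)
        draw_line(img, pts[a][::-1], pts[b][::-1])
    return img
```

Because the classifier only ever sees this black-and-white skeleton, any pixel-level variation in skin tone or texture is discarded before training, which is the bias-mitigation argument made above.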

The wireframe extraction process was performed for 3,000 images per ASL sign to construct a curated dataset of wireframe images. This curated ASL dataset covered the digits 0 through 9 as well as every letter of the alphabet except "J" and "Z", which involve motion and are therefore not amenable to classification from still images. The curated ASL wireframe dataset was then used to train a deep network, a type of convolutional neural network: an 8-layer Keras Sequential model, adapted and trained using TensorFlow. The model includes four convolutional layers (two with 32 feature maps and two with 64 feature maps, each using a 3x3 kernel), two max-pooling layers, and a flatten layer. Unlike the original implementation, a fully connected layer with 128 units was used. Dropout regularization was applied at a rate of 25%, except before the last layer, where the rate was set to 50%. The curated dataset was partitioned into training and validation sets using an 80:20 split. Accuracy and loss were profiled at the end of training; the model achieved 94% accuracy.

Bibliography

Akash. ASL Alphabet: image data set for alphabets in the American Sign Language. Kaggle, 2017. https://www.kaggle.com/datasets/grassknoted/asl-alphabet
