PORTFOLIO

Dylan Pallickara

Poudre High School, Fort Collins, Colorado

Contents

  • Computer Science Research

  • Creative Writing: Poetry

Computer-Assisted Recognition of American Sign Language with Raw Images

As part of my 10th grade International Baccalaureate (IB) personal project, I attempted to detect American Sign Language (ASL) gestures in real time and translate the signs into text. A key goal of the IB personal project is to assess students’ approaches to learning skills for self-management, research, communication, critical and creative thinking, and collaboration.

The objective of my IB project was to teach a computer to see and comprehend ASL, bridging the communication gap between signers and verbal communicators. Without software-based assistance, bridging this barrier is possible only with human translators. The drawback of human-mediated translation is not just cost but also scale: there would need to be roughly as many translators as there are signers. A computer program paired with an affordable camera is cost-effective and significantly reduces the number of human translators needed.

I built an application that recognizes 24 different ASL gestures and translates them into text. As required by the IB program, I worked solo on this project in my 10th grade.

I found a dataset of ASL images on Kaggle [Akash, 2017]. Rather than sift through the images of ASL gestures pixel by pixel, codify rules by hand, and enforce them in code, I figured using machine learning to learn those rules would be far more effective. Google’s open-source TensorFlow supports several model architectures. I used MobileNetV2, one of the several models Google makes available in TensorFlow’s model garden. MobileNetV2 is a convolutional neural network designed for mobile and embedded vision applications; it uses depthwise separable convolutions to build compact deep neural networks, and I chose it precisely because of that suitability for mobile and embedded devices.
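To illustrate the transfer-learning setup this describes, here is a minimal sketch using TensorFlow’s Keras API. The data directory path, image size, batch size, and epoch count are illustrative assumptions, not my exact training script; the only project-specific fact is the 24 output classes.

    import tensorflow as tf

    # Assumed layout: one folder per gesture under data/asl_train/ (24 classes).
    train_ds = tf.keras.utils.image_dataset_from_directory(
        "data/asl_train", image_size=(224, 224), batch_size=32)

    # MobileNetV2 pretrained on ImageNet; freeze it and train only a new head.
    base = tf.keras.applications.MobileNetV2(
        input_shape=(224, 224, 3), include_top=False, weights="imagenet")
    base.trainable = False

    model = tf.keras.Sequential([
        tf.keras.layers.Rescaling(1.0 / 127.5, offset=-1),  # MobileNetV2 expects inputs in [-1, 1]
        base,
        tf.keras.layers.GlobalAveragePooling2D(),
        tf.keras.layers.Dense(24, activation="softmax"),  # one output per gesture
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    model.fit(train_ds, epochs=5)

Freezing the pretrained base and training only the small classification head is what makes transfer learning cheap: the ImageNet weights already encode useful visual features, so only the final dense layer has to learn the gesture labels.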

This was my first attempt at fitting models to high-dimensional data. For a while it seemed I was operating in the realm of Murphy’s Law: there was a lot of trial and error, and most of the time things simply did not work. But as I kept working, my errors became much more rewarding; they started to point to new paths for experimentation. I learned how to capture images using the OpenCV library. I also created a virtual environment to isolate the project’s software changes from the rest of my computer. And I learned, firsthand, the benefits of transfer learning and weight-matrix initialization: I could start from a model trained on one task and adapt it to a completely different objective.
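As a sketch of the capture-and-classify loop, the snippet below uses OpenCV to grab webcam frames and feed them to the trained model. The saved-model filename is a placeholder, and the label list assumes the 24 static ASL letters (the alphabet minus J and Z, which require motion); my actual artifacts may differ.

    import cv2
    import numpy as np
    import tensorflow as tf

    # Hypothetical filename for the model saved after training.
    model = tf.keras.models.load_model("asl_mobilenetv2.keras")
    # Assumption: the 24 static letters, in training-folder order.
    labels = list("ABCDEFGHIKLMNOPQRSTUVWXY")

    cap = cv2.VideoCapture(0)  # default webcam
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        # Resize to the model's input size and convert BGR (OpenCV) to RGB.
        img = cv2.cvtColor(cv2.resize(frame, (224, 224)), cv2.COLOR_BGR2RGB)
        probs = model.predict(np.expand_dims(img, 0), verbose=0)[0]
        text = labels[int(np.argmax(probs))]
        # Overlay the predicted letter on the live video feed.
        cv2.putText(frame, text, (10, 30),
                    cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), 2)
        cv2.imshow("ASL", frame)
        if cv2.waitKey(1) & 0xFF == ord("q"):
            break
    cap.release()
    cv2.destroyAllWindows()

Because the rescaling layer lives inside the model, raw 0–255 pixel values from the camera can be passed in directly.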

Bibliography

Akash. ASL Alphabet: Image data set for alphabets in the American Sign Language. Kaggle, 2017. https://www.kaggle.com/datasets/grassknoted/asl-alphabet
