PORTFOLIO
Dylan Pallickara
Poudre High School, Fort Collins, Colorado
​
​
Contents
- Computer Science Research
- Creative Writing: Poetry
​
Computer-Assisted Recognition of American Sign Language
About one million people in the US are deaf, and another 10 million have hearing disabilities [Mitchell, 2006]. Nine out of ten children who are born deaf are born to hearing parents [NIDCD, 2023]. The inability to communicate with others feeds isolation and withdrawal from social situations. Non-verbal communication using American Sign Language (ASL) is the primary mechanism by which those who are deaf, and increasingly those with hearing disabilities, communicate. For those who are deaf, signing has benefits such as improved cognitive development, greater social engagement, and better access to information [Foggetti, 2023].
​
The ability to communicate in and decipher ASL is key to engaging socially and forming relationships with peers. However, when those who are verbal cannot comprehend ASL, communication is limited. Because the average American interacts with about 12 others over the course of a day [Zhaoyang, 2018], most interactions that individuals who are deaf have are likely to be with those who are verbal. The communication problem stems not from those who sign in ASL, but rather from those who cannot decipher it: roughly 333 million individuals nationwide.
​
Over the past two years, I have worked on computer-assisted recognition of ASL signs and, more recently, on extensions that support instruction in ASL acquisition by identifying subtle errors in signing. Given the visual nature of ASL, the methods I explored rely on images. An alternative approach explored by others involves sensors [Zhou, 2020]; these require the ASL signer to wear gloves equipped with sensors, which can be an onerous, expensive burden. Relying on images also allows non-signers to use their cellphone cameras; since over 97% of the US population owns one [Pew Research, 2023], the barriers are reduced for everyone. I was also able to find an ASL dataset with a large number of images [Akash, 2017]. The availability of this corpus is what drew me to AI/ML (Artificial Intelligence/Machine Learning) methods, which have shown demonstrable promise in learning from data. The AI/ML methods I have experimented with include Deep Neural Networks (DNNs) and Random Forests.
Challenges
I had to circumvent a few challenges to “train” models. All training and data processing had to be done on my personal laptop, a MacBook Pro. DNNs are computationally (and data) hungry; the bigger they are, the hungrier they get. Training models from scratch was infeasible: calibrating a DNN's large number of parameters entailed data and computational demands that were impossible to satisfy. That set me off on a journey. I learned about transfer learning, which allows a model trained on one task to be repurposed for a new, related task, and I leveraged open-source models that had previously been trained and calibrated for other tasks.
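The sketch below illustrates the transfer-learning setup described above: a pre-trained base network is frozen and only a small, newly attached classification head is trained. The specific base model (MobileNetV2), the input size, and the class count are illustrative assumptions, not the exact configuration used in my projects.

```python
# Minimal transfer-learning sketch in TensorFlow/Keras.
# Assumptions: MobileNetV2 as the open-source base model, 224x224 inputs, and 34 classes
# (digits 0-9 plus the static letters); the actual configuration may differ.
import tensorflow as tf

NUM_CLASSES = 34
IMG_SIZE = (224, 224)

# Reuse a network pre-trained on ImageNet and drop its original classification head.
base = tf.keras.applications.MobileNetV2(
    input_shape=IMG_SIZE + (3,), include_top=False, weights="imagenet")
base.trainable = False  # freeze the pre-trained weights; only the new head is trained

model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(train_ds, validation_data=val_ds, epochs=5)  # train_ds/val_ds: tf.data datasets
```

Freezing the base keeps the number of trainable parameters small enough to fit within a laptop's memory and compute budget.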
Being limited to my personal laptop for all experimentation informed several of my choices. Rather than increase model complexity (and the accompanying data requirements), I decided to explore a more data-centric approach. I focused not so much on data volumes as on qualitative aspects of the data, such as reduced dimensionality and improved signal-to-noise in the individual training examples. I have found that smaller, well-curated datasets not only simplify the model structure but also reduce bias and improve accessibility, all while increasing model accuracy. Finally, data curation allows the models to be untethered from complex DNNs: they can instead be based on traditional machine learning algorithms that are often substantially lighter weight, with fewer parameters to tune.
​
​
Broad Description of the Work
My work in ASL can be broadly categorized into four chronologically progressive projects. Each built on lessons from the earlier ones, and the failures and deficiencies of each project served as the north star for the next.
​
​
ASL Project I: ASL Recognition with Raw Images
Much of the effort here involved data wrangling, gaining experience installing the TensorFlow deep learning framework, identifying a base model, and getting a functional end-to-end system working. The accuracy of the ASL detection was poor. [READ MORE]
​
​
ASL Project II: ASL Recognition with Wireframe
The next approach targeted distilling the ASL dataset further. This was accomplished by transforming images of ASL hand signs into wireframe images that encapsulate the joint structure of the metacarpals (palm) and the phalanges (distal, intermediate, and proximal) that comprise the fingers. Crucially, this representation whittles away aspects such as skin tone, texture, and finger thickness, and is effective in reducing bias: the model would be just as performant with skin tones and textures that were excluded from the original dataset. Just as importantly, this model was 94% accurate in detection. [READ MORE]
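As a rough illustration of the image-to-wireframe step, the sketch below locates the 21 hand joints and re-draws only the skeleton on a blank canvas. The portfolio does not name the landmark library used, so MediaPipe Hands and the helper name `to_wireframe` are assumptions for illustration.

```python
# Sketch of converting an ASL hand image into a wireframe (assumption: MediaPipe Hands).
import cv2
import mediapipe as mp
import numpy as np

mp_hands = mp.solutions.hands
mp_draw = mp.solutions.drawing_utils

def to_wireframe(image_bgr):
    """Return a black image containing only the 21-joint hand skeleton."""
    with mp_hands.Hands(static_image_mode=True, max_num_hands=1) as hands:
        results = hands.process(cv2.cvtColor(image_bgr, cv2.COLOR_BGR2RGB))
    wireframe = np.zeros_like(image_bgr)  # skin tone, texture, and background are discarded
    if results.multi_hand_landmarks:
        mp_draw.draw_landmarks(wireframe,
                               results.multi_hand_landmarks[0],
                               mp_hands.HAND_CONNECTIONS)
    return wireframe
```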
​
​
ASL Project III: ASL Recognition with Joint Angles
Was there a way to further distill the information in the wireframe image? The wireframe image was transformed into a set of joint angles. Using angles allows the detection to reconcile variations in the size of the palm (metacarpals) or the length of the fingers (phalanges). A consequence of this input distillation was that the classification model (now based on Random Forests) was less complex and much more interpretable. The model was also 97% accurate. [READ MORE]
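A minimal sketch of this angle-based pipeline is shown below: each joint angle is computed from three landmark positions, and the resulting angle vector is fed to a Random Forest. The landmark layout, the choice of scikit-learn, and the hyperparameters are assumptions for illustration.

```python
# Sketch: joint angles from 3-D hand landmarks, classified with a Random Forest.
# Assumptions: a (21, 3) landmark array per image and scikit-learn's RandomForestClassifier.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def joint_angle(a, b, c):
    """Angle in degrees at joint b, formed by the segments b->a and b->c."""
    v1, v2 = a - b, c - b
    cosine = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2) + 1e-9)
    return np.degrees(np.arccos(np.clip(cosine, -1.0, 1.0)))

def angles_from_landmarks(landmarks, triples):
    """landmarks: (21, 3) array; triples: (a, b, c) landmark indices defining each joint."""
    return np.array([joint_angle(landmarks[a], landmarks[b], landmarks[c])
                     for a, b, c in triples])

# X holds one row of joint angles per training image, y the corresponding sign labels.
# clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
# prediction = clf.predict(angles_from_landmarks(new_landmarks, triples).reshape(1, -1))
```

Because angles do not depend on the absolute size or position of the hand, the same feature vector works across signers with different palm and finger dimensions.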
​
​
ASL Project IV: ASL Recognition with Instructional Assistance
ASL Instructional Assistance: With a trove of ASL signs and their distilled representations, this project targeted ASL acquisition. To be useful, any instructional setting must identify not just what is wrong, but also why and how it is wrong, in a timely fashion. Targeted feedback identifies areas of improvement and, when combined with deliberate practice, can lead to mastery of signs. When a user attempts an ASL sign, the software first tries to detect that sign. Next, it identifies the key angular differences in the attempt. The model currently overlays the user's attempt with the canonical representation of that sign (based on averaging the joint angles in the reference dataset) and identifies angular differences. [READ MORE]
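The comparison step can be sketched as follows: the user's angle vector is subtracted from the canonical (averaged) angle vector for the detected sign, and the joints with the largest deviations are reported. The function name, joint names, and the 15-degree tolerance are illustrative assumptions.

```python
# Sketch of angular feedback against a canonical sign (threshold and names are illustrative).
import numpy as np

def angular_feedback(user_angles, canonical_angles, joint_names, tolerance_deg=15.0):
    """List the joints whose angles deviate most from the canonical sign."""
    report = []
    for name, user, canon in zip(joint_names, user_angles, canonical_angles):
        diff = user - canon
        if abs(diff) > tolerance_deg:
            direction = "larger" if diff > 0 else "smaller"
            report.append(f"{name}: {abs(diff):.0f} degrees {direction} than the canonical angle")
    return report or ["Attempt matches the canonical sign within tolerance."]
```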
​
Working with Prof. Sarath Sreedharan: ASL detection (based on wireframes and angles) and ASL instructional assistance were carried out under the mentorship of Prof. Sarath Sreedharan in the Computer Science Department at CSU, as part of an extended internship. I am presently the only student (graduate or undergraduate) doing work on ASL with Prof. Sreedharan.
​
​
What did the poet in me discover? My work with ASL was a petri dish for how networks “learn”. Each input subtly adjusts the neural network's matrix of synaptic weights; each input alters the weights (or mutates the DNA) of the deep network and leaves a trace of itself for posterity. I also saw how large language models (LLMs) exploit the creative work of artists by “scraping” content. Companies such as Google and OpenAI are transparent about the structure of their networks, but the data used to train those networks is a closely guarded secret.
​
I feel that the discourse around AI, with its emphasis on singularity and sentience, is disconnected from reality. The harms, be they perpetuated bias, copyright violations, or the exploitation of creative work, are all here now, with little recourse to redress them. This exploitation of creative work will be perpetuated because newer versions of AI models use the weights of earlier versions as their starting point. I want to research methods that force AI models to divulge the works they have “scraped” and the biases and inequities they are inadvertently perpetuating.
​
​
ASL Scope
In my projects, ASL sign detection is restricted to the numbers 0 through 9 and every letter of the alphabet except “J” and “Z”, which are not amenable to classification using still images. The code and datasets have been made publicly available via GitHub and Kaggle, respectively [LINKS]. I am unfamiliar with the mobile app development process, so the models have not yet been made available in a mobile app, a prerequisite for broader traction.
​
Peer-Reviewed Research Publication and Computer Science Competition
A student research poster based on these research activities has been accepted at the 38th Annual AAAI Conference on Artificial Intelligence.
​
Dylan Pallickara and Sarath Sreedharan. A Wireframe-based Approach for Classifying and Acquiring Proficiency in the American Sign Language. The 38th Annual AAAI Conference on Artificial Intelligence, Vancouver, Canada, 2024. Abstract
​
I participated in the Google International ASL Fingerspelling Competition in August 2023. My software placed 992nd out of 19,596 entries; the prize pool totaled $200,000.
​
​
Bibliography
Bantupalli, Kshitij, and Ying Xie. "American sign language recognition using deep learning and computer vision." 2018 IEEE International Conference on Big Data (Big Data).
Fernanda Foggetti. The Benefits of Sign Language for Children with Hearing Loss. Available at: https://www.handtalk.me/en/blog/the-benefits-of-sign-language-for-children-with-hearing-loss/#:~:text=Communicating%20in%20Sign%20Language%20has,for%20children%20and%20their%20development . Accessed 2023.
John, J., and Sherif, B. 2022. Hand Landmark-Based Sign Language Recognition Using Deep Learning. In Machine Learning and Autonomous Systems: Proceedings of ICMLAS 2021 (pp. 147-157). Singapore: Springer Nature.
Mitchell RE. How many deaf people are there in the United States? Estimates from the Survey of Income and Program Participation. J Deaf Stud Deaf Educ. 2006 Winter;11(1):112-9. doi: 10.1093/deafed/enj004. Epub 2005 Sep 21. PMID: 16177267.
National Institute on Deafness and Other Communication Disorders (NIDCD). Quick Statistics About Hearing. Available at: https://www.nidcd.nih.gov/health/statistics/quick-statistics-hearing. Accessed 2023.
Pangestu, Y.; Heryadi, Y.; Suparta, W.; Arifin, Y. 2022. The Deep Learning Approach For American Sign Language Detection. 2022 IEEE Creative Communication and Innovative Technology (ICCIT), Tangerang, Indonesia, 2022, pp. 1-5. doi: 10.1109/ICCIT55355.2022.10118626.
Pew Research Center. Mobile Fact Sheet. The vast majority of Americans – 97% – now own a cellphone of some kind. Available at: https://www.pewresearch.org/internet/fact-sheet/mobile/#:~:text=The%20vast%20majority%20of%20Americans,a%20cellphone%20of%20some%20kind. Accessed 2023.
Ravikiran, J.; Kavi Mahesh; Suhas Mahishi, R.; Dheeraj, S. Sudheender; and Nitin V. Pujari. 2009. Finger detection for sign language recognition. Paper presented at the International MultiConference of Engineers and Computer Scientists 2009, Vol I, IMECS 2009, Hong Kong, March 18-20.
Zhaoyang R, Sliwinski MJ, Martire LM, Smyth JM. Age differences in adults' daily social interactions: An ecological momentary assessment study. Psychol Aging. 2018 Jun;33(4):607-618. doi: 10.1037/pag0000242. Epub 2018 Apr 30. PMID: 29708385; PMCID: PMC6113687.
Zhou, Z., Chen, K., Li, X. et al. Sign-to-speech translation using machine-learning-assisted stretchable sensor arrays. Nat Electron 3, 571–578 (2020). https://doi.org/10.1038/s41928-020-0428-6
​