Year: 2025 | Month: December | Volume 12 | Issue 2
Whispering in Sylheti Language
Swapnajeet Das
Amit Kumar Roy
Bijay Kumar Singh and Bipul Syam Purkayastha
DOI:10.30954/2348-7437.2.2025.6
Abstract:
Automatic Speech Recognition (ASR) converts spoken language to text, enabling voice-activated features, accessibility, and real-time translation. For low-resource languages, ASR opens the door to digital inclusion, heritage conservation, and community-based data collection in areas that previously had few such resources. We present an ASR system for the Sylheti language built on OpenAI's Whisper model. Sylheti, spoken in India and Bangladesh, is a low-resource language with few significant digital and linguistic resources, which makes ASR development challenging. To fill this gap, a hybrid dataset was constructed by combining a custom Sylheti corpus with the Bengali corpus of the Common Voice dataset. The audio was pre-processed by removing noise, trimming recordings, and aligning them with their written transcriptions. The Whisper small model, a transformer-based encoder-decoder architecture trained for multilingual transcription, was then fine-tuned on the combined dataset. The trained model was evaluated using the Word Error Rate (WER) metric and achieved a WER of 66.8%, a promising baseline for transcribing Sylheti speech given the scarcity of data. The system was deployed through an interactive Gradio interface for live transcription. These results confirm Whisper's adaptability to low-resource languages and its potential to support linguistic inclusivity, cultural continuity, and accessibility.
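The WER metric mentioned above is the word-level edit distance between the reference transcription and the model's hypothesis, normalized by the reference length. The paper's actual evaluation pipeline is not shown here; the following is only a minimal illustrative sketch of how WER is computed.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: (substitutions + insertions + deletions) / reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # Dynamic-programming table for Levenshtein distance over words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + cost) # substitution (or match)
    return d[len(ref)][len(hyp)] / len(ref)

# Two deleted words out of a six-word reference -> WER = 2/6
print(round(wer("the cat sat on the mat", "the cat sat mat"), 3))
```

In practice, evaluation toolkits such as `jiwer` compute the same quantity; a WER of 0.668 (66.8%) means roughly two word errors for every three reference words.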