Kang, Ruoxin (2025) Streaming Speech Recognition for Smart Glasses: A Fine-tuning Approach Based on Pre-trained FastConformer. Master thesis, Voice Technology (VT).
PDF: RuoxinKangthesis.pdf (400kB)
Abstract
Real-time automatic speech recognition (ASR) on wearable devices such as smart glasses is a key technology for accessible communication, particularly for hearing-impaired users. However, current streaming ASR systems still face major challenges when operating under strict latency constraints, especially in the noisy, dynamic environments where multi-sensor smart glasses are most useful. This study investigates domain-adaptive fine-tuning of a pre-trained cache-aware FastConformer model to improve the latency-accuracy trade-off and speaker attribution performance in multi-channel streaming ASR. The study builds on the CHiME-8 Task 3 first baseline system, retaining its architecture while applying two advanced, widely used fine-tuning strategies, cosine annealing learning rate scheduling and layer-wise learning rate decay (LLRD), to optimize fine-tuning on the in-domain MMCSG dataset. Conversational speech in wearable contexts presents distinct acoustic and streaming challenges due to the mobility of the device and the presence of multi-channel microphone arrays. Results demonstrate consistent improvements over the baseline across all evaluated latency thresholds, achieving an approximately 10% relative word error rate (WER) reduction without increasing model complexity or violating streaming constraints. These findings highlight the effectiveness of advanced fine-tuning strategies for adapting pre-trained ASR models to realistic wearable applications, paving the way for more accurate and responsive streaming ASR systems that support accessible, real-world communication.
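As a rough illustration of how the two fine-tuning strategies named in the abstract can be combined, the sketch below assigns each encoder layer its own peak learning rate (LLRD) and scales all of them by a shared cosine annealing factor. It is a minimal sketch in plain Python, not the thesis implementation: the function names and the values for `base_lr`, `num_layers`, `decay`, and `total_steps` are hypothetical, since the abstract does not state the hyperparameters used.

```python
import math


def layerwise_lrs(base_lr, num_layers, decay):
    """Peak learning rate per layer, index 0 = bottom (closest to the input).

    The top layer receives base_lr; each layer below it is scaled by
    one more factor of `decay`, so lower layers change more slowly.
    """
    return [base_lr * decay ** (num_layers - 1 - i) for i in range(num_layers)]


def cosine_factor(step, total_steps):
    """Cosine annealing multiplier, falling from 1 at step 0 to 0 at total_steps."""
    progress = min(step, total_steps) / total_steps
    return 0.5 * (1.0 + math.cos(math.pi * progress))


def lrs_at_step(step, total_steps, base_lr, num_layers, decay):
    """Learning rate of every layer at a given training step."""
    factor = cosine_factor(step, total_steps)
    return [lr * factor for lr in layerwise_lrs(base_lr, num_layers, decay)]


if __name__ == "__main__":
    # Hypothetical settings, chosen only to make the output easy to read.
    for step in (0, 50, 100):
        print(step, lrs_at_step(step, total_steps=100, base_lr=1e-4,
                                num_layers=4, decay=0.5))
```

In a deep learning framework the same idea is usually expressed by giving each layer its own optimizer parameter group and attaching a cosine scheduler to the optimizer; the lists above correspond to the per-group learning rates such a setup would produce.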
Item Type: | Thesis (Master) |
---|---|
Name supervisor: | Nayak, S. |
Date Deposited: | 16 Jun 2025 11:13 |
Last Modified: | 16 Jun 2025 11:13 |
URI: | https://campus-fryslan.studenttheses.ub.rug.nl/id/eprint/663 |