An AI model developed by Seoul National University Bundang Hospital (SNUBH) can now detect stress with up to 77.5 percent accuracy—based entirely on a person’s voice.
Trained on samples from more than 100 Korean full-time workers, the deep learning system flags stress by analyzing subtle non-verbal cues such as tone, pitch, and breathing rhythm. The findings, published in Psychiatry Investigation, describe one of the first biosignal-validated, voice-based stress models built specifically for a Korean population.
The research team, led by Professor Kim Jeong-hyun of SNUBH's Department of Public Health Medical Services and supported by SNU's Institute of New Media and Communications, used ECAPA-TDNN, a deep learning architecture originally designed for speaker recognition. Participants recorded their voices before and after undergoing a standardized stress-inducing protocol, the Socially Evaluated Cold Pressor Test, which involves immersing a hand in ice water while being observed.
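ECAPA-TDNN is best known for producing fixed-length "embeddings" that summarize how a voice sounds, independent of what is said. As a rough illustration of how such an encoder can be repurposed for stress detection, and not a reconstruction of the team's actual pipeline, the sketch below pulls a pretrained ECAPA-TDNN model from the open-source SpeechBrain toolkit and feeds its embeddings to a simple classifier; the file names, labels, and classifier choice are all hypothetical.

```python
import torch
import torchaudio
from speechbrain.pretrained import EncoderClassifier   # pip install speechbrain
from sklearn.linear_model import LogisticRegression

# Pretrained ECAPA-TDNN speaker encoder (trained on VoxCeleb), used here only
# to turn each recording into a fixed-length embedding.
encoder = EncoderClassifier.from_hparams(
    source="speechbrain/spkrec-ecapa-voxceleb",
    savedir="pretrained_ecapa",
)

def embed(wav_path: str) -> torch.Tensor:
    """Return one ECAPA-TDNN embedding for one audio file."""
    signal, sr = torchaudio.load(wav_path)                       # (channels, samples)
    signal = torchaudio.functional.resample(signal, sr, 16000)   # encoder expects 16 kHz
    with torch.no_grad():
        return encoder.encode_batch(signal.mean(dim=0, keepdim=True)).squeeze()

# Hypothetical recordings labeled 0 = baseline, 1 = post-stressor; in the study,
# labels came from cortisol and distress-thermometer responses, not file names.
paths, labels = ["pre_stress_01.wav", "post_stress_01.wav"], [0, 1]
X = torch.stack([embed(p) for p in paths]).numpy()
clf = LogisticRegression(max_iter=1000).fit(X, labels)
```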
To confirm that stress had actually been induced, the study cross-checked the AI's prediction scores against biological and self-reported markers: salivary cortisol levels and distress thermometer scores. Only data from participants who showed a measurable stress response were used to train and validate the model.
Compared with baseline architectures such as convolutional neural networks and conformers, ECAPA-TDNN consistently delivered higher performance, especially on free-form speech. The model was trained on 95 subjects and tested on a separate group of 20, correctly identifying stress in 70 percent of them.
Instead of focusing on what people said, the model zeroed in on how they said it, capturing stress-related shifts in vocal tension, rhythm, and tempo. Because it relies only on non-linguistic features, the researchers noted, the system avoids common sources of bias tied to language fluency, education level, or cultural background. They added that all processing took place locally on the device, keeping privacy risks low.
The study was supported by SK Telecom and conducted at both SNUBH and Boramae Medical Center. A total of 115 participants read a neutral essay aloud and answered casual prompts about their daily lives. Audio recordings were segmented into overlapping four-second chunks and converted into Mel spectrograms, a common feature representation in voice-based AI.
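The paper's exact preprocessing settings beyond the four-second window are not detailed here, but the conversion step itself is standard. A minimal sketch of this chunking and log-Mel conversion using the open-source librosa library follows; the two-second overlap, 16 kHz sample rate, 80 Mel bands, and the make_mel_segments helper are illustrative assumptions rather than values from the study.

```python
import numpy as np
import librosa   # pip install librosa

def make_mel_segments(path, seg_seconds=4.0, hop_seconds=2.0, sr=16000, n_mels=80):
    """Cut a recording into overlapping fixed-length chunks and convert each
    chunk into a log-scaled Mel spectrogram (shape: n_mels x time frames)."""
    y, _ = librosa.load(path, sr=sr, mono=True)
    seg_len, hop_len = int(seg_seconds * sr), int(hop_seconds * sr)

    mels = []
    for start in range(0, len(y) - seg_len + 1, hop_len):
        chunk = y[start:start + seg_len]
        spec = librosa.feature.melspectrogram(y=chunk, sr=sr, n_mels=n_mels)
        mels.append(librosa.power_to_db(spec, ref=np.max))
    return np.stack(mels)   # (num_chunks, n_mels, frames); assumes the clip is at least 4 s long

# Example: features = make_mel_segments("participant_001.wav")
```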
The technology has not been commercialized yet, but the team believes it could eventually power real-time stress monitoring in consumer devices. Future iterations may integrate additional biometric inputs, such as heart rate variability or skin conductance, to further boost accuracy, Professor Kim said in a statement.
