Article Audio Player
paused
Experiments on creating an *audio markup* in text-to-speech
Notes
In this project, over a couple of attempts, I tried to build an app that reads web articles aloud.
The concept was not new: browsers could already read pages aloud, but the voice quality was poor and the lack of background playback was annoying. Other apps did it well, like Reader or, more recently, ElevenReader from ElevenLabs.
Still, the idea sparked my interest, as I closely follow the evolution of open-source TTS technology. I used Kokoro TTS to build several forms of the service - the latest version was a native Expo app.
But the interesting bit was experimenting with an audio markup. When we write structured text, we use formatting and positioning cues (markup) to add paratextual meaning alongside it. Titles, paragraphs, quote marks, bold and italic - all of these shape the meaning of the words themselves, and our understanding of the content as a whole. Once the text is read aloud, however, we lose this signal.
The approach I tried was to use different voices, sounds, and pauses to carry the meaning of the markup. Titles and quotes were read by a female-sounding voice, paragraphs by a male one. Links, stripped of their URLs, played a bell sound right after their label to alert listeners that there was more behind it. Paragraphs were followed by a set pause, and list items by a shorter one. I even switched to a similar but slightly different tone for bold or italic text.
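To make the mapping concrete, here is a minimal TypeScript sketch of how markup could be turned into an audio plan before synthesis. It is an illustration under assumptions, not the app's actual code: the `Block` and `Segment` types, the `toSegments` function, the voice identifiers, and the pause lengths are all hypothetical, and the tone swap for bold and italic is left out for brevity.

```typescript
// Hypothetical block types extracted from an article's markup.
type Block =
  | { kind: "title" | "paragraph" | "quote" | "listItem"; text: string }
  | { kind: "link"; text: string; url: string };

// One unit of the audio plan: speech, a sound cue, or silence.
type Segment =
  | { type: "speech"; voice: string; text: string }
  | { type: "cue"; sound: string }
  | { type: "pause"; ms: number };

// Placeholder voice identifiers (assumption), not real Kokoro voice names.
const VOICES = { female: "voice_female", male: "voice_male" } as const;

// Assumed pause lengths in milliseconds; the real values would be tuned by ear.
const PAUSE_MS = { title: 900, paragraph: 600, quote: 600, listItem: 300, link: 0 } as const;

function toSegments(blocks: Block[]): Segment[] {
  const segments: Segment[] = [];
  for (const block of blocks) {
    // Titles and quotes use the female-sounding voice, everything else the male one.
    const voice =
      block.kind === "title" || block.kind === "quote" ? VOICES.female : VOICES.male;
    // Only the label is spoken; a link's URL is dropped.
    segments.push({ type: "speech", voice, text: block.text });
    // A bell right after a link's label signals that there is more behind it.
    if (block.kind === "link") segments.push({ type: "cue", sound: "bell" });
    // Longer silence after paragraphs, shorter after list items, none after links.
    const ms = PAUSE_MS[block.kind];
    if (ms > 0) segments.push({ type: "pause", ms });
  }
  return segments;
}

// Usage: a title followed by a link.
const plan = toSegments([
  { kind: "title", text: "Audio markup" },
  { kind: "link", text: "project page", url: "https://example.com" },
]);
console.log(plan);
```

Each speech segment would then be sent to the TTS engine with its voice, while cues and pauses are played or inserted by the audio player. Keeping the plan as plain data makes it easy to adjust voices and timings without touching the synthesis code.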
Advanced TTS will probably be able, one day, to carry all those nuances as a skilled public reader does naturally.