C2LC-243 | Fluid Project Issues Archive

Metadata

Source: C2LC-243
Type: Task
Priority: N/A
Status: Done
Resolution: N/A
Assignee: N/A
Reporter
Created: 2020-10-22T10:57:26.548-0400
Updated: 2021-02-10T16:57:10.967-0500
Versions: N/A
Fixed Versions: Coding Env 0.6
Component: N/A

Description

Attachments

Comments

Tony Atkins [RtF] commented 2020-12-09T08:45:28.860-0500

Up to this point, we have used recorded TTS utterances for our announcements. We explored using the voices built into OS X, but the EULA for OS X (for example, Big Sur) expressly forbids commercial use of the voices. I have explored a range of alternatives, and considered others.

One option is simply to use a different voice, one that we have a license for. I have a voice that's based on recordings of my own voice, and tried that.

I also looked for a TTS engine of acceptable quality that we can use commercially, which we could simply use with our current approach of generating announcements. As TTS has been a research subject for CMU and others for many years, there is a range of options. Unfortunately, many of the solutions with friendly licensing terms lack the polish of modern commercial efforts, and are (in my opinion) of unusably low quality. Of the products in active development with acceptable license terms, the only one I felt was high enough quality to share with the group was Mimic, the TTS engine used by Mycroft.ai.

I also looked at the voices provided by Windows 10, since their EULA does not prohibit us from using them commercially.

I have also given thought to other approaches that I wanted to put before the group.
1. We could record our own spoken announcements.
2. We could choose to convert our current announcements into something that screen readers would be able to announce. One approach is to hide an aria-live region off screen and update that with the text to be announced.
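
The second option above (an off-screen aria-live region) could be sketched roughly as follows. This is a minimal illustration, not the project's implementation: the names `createLiveAnnouncer`, `RegionLike`, and `DocumentLike` are hypothetical, the off-screen CSS is one common choice among several, and the announcement text in the usage comment is made up.

```typescript
// Minimal structural types (assumptions) so the sketch compiles without the DOM lib.
interface RegionLike {
  textContent: string | null;
  style: { cssText: string };
  setAttribute(name: string, value: string): void;
}
interface DocumentLike {
  createElement(tagName: string): RegionLike;
  body: { appendChild(node: RegionLike): unknown };
}

/** Creates an off-screen aria-live region and returns a function that updates its text. */
function createLiveAnnouncer(
  doc: DocumentLike,
  politeness: "polite" | "assertive" = "polite"
): (text: string) => void {
  const region = doc.createElement("div");
  region.setAttribute("aria-live", politeness);
  // Position the region off screen instead of using display:none,
  // which would hide it from screen readers as well.
  region.style.cssText =
    "position:absolute;left:-10000px;width:1px;height:1px;overflow:hidden;";
  doc.body.appendChild(region);
  return (text: string): void => {
    // Screen readers announce changes to the text of a live region.
    region.textContent = text;
  };
}

// Hypothetical usage in a browser:
//   const announce = createLiveAnnouncer(document as unknown as DocumentLike);
//   announce("Forward 1 added to the program");
```

With this approach the voice, language, and speech rate would come from the user's own screen reader, so no audio files would be needed for announcements. Users without a screen reader would hear nothing.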

I have attached a few screen recordings for comparison:

1. The status quo.
2. "Allison", the US English TTS voice included with OS X.
3. Model Talker, a voice created based on recordings of my voice.
4. Mimic, an open source TTS engine with compatible licensing terms.
5. "Linda", one of the Windows SAPI voices for Canadian English.

Simon Bates commented 2020-12-10T14:42:40.957-0500

Thanks very much Tony. Have you looked at availability of non-English voices in your research? I think we will want a French version of the announcements.

Another possibility would be to try out the browser support for text-to-speech: https://developer.mozilla.org/en-US/docs/Web/API/Web_Speech_API

If I remember correctly, Justin and Alan have done quite a lot of work here and may be able to advise regarding suitability, browser support, and issues and such.

Tony Atkins [RtF] commented 2020-12-11T04:56:28.107-0500

Simon Bates, Windows 10 also has Canadian French voices.

I will look into the Web Speech API; I think it could be a better option. The results would be high quality, and I suspect we'd mitigate at least some of the load time issues, as we'd only be loading music and sounds rather than the announcements. Update: I checked, and the announcements take up 14% of the volume of sounds downloaded.

Tony Atkins [RtF] commented 2020-12-11T08:54:35.043-0500

Justin gave me some good input; it really does seem like we can and should pivot to the Web Speech API for announcements. I'm thinking of another bit of feature detection for the very small number of (mostly mobile) browsers that lack support, and for JSDom. Then we'd just switch from using Sampler components to directly announcing things. Interestingly, it seems like this might also help when someone is adding a bunch of stuff: we can avoid having the sounds appear to overlap, which I tried but couldn't factor away with the Tone.js approach.
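
As a rough sketch of the feature detection and direct announcement described above (not the code that was eventually merged): `createSpeechAnnouncer`, `SpeechEnv`, the `interrupt` flag, and the sample announcement text are all hypothetical. The only Web Speech API members relied on are `speechSynthesis.speak()`, `speechSynthesis.cancel()`, the `SpeechSynthesisUtterance` constructor, and its `lang` property.

```typescript
// Minimal structural types (assumptions) so the sketch compiles without the DOM lib.
interface UtteranceLike {
  lang: string;
}
interface SynthLike {
  speak(utterance: UtteranceLike): void;
  cancel(): void;
}
interface SpeechEnv {
  speechSynthesis?: SynthLike;
  SpeechSynthesisUtterance?: new (text: string) => UtteranceLike;
}

/**
 * Returns a function that speaks text, or null when the environment
 * has no speech synthesis support (for example JSDom).
 */
function createSpeechAnnouncer(
  env: SpeechEnv,
  lang: string = "en-CA",
  interrupt: boolean = false
): ((text: string) => void) | null {
  const synth = env.speechSynthesis;
  const Utterance = env.SpeechSynthesisUtterance;
  if (!synth || !Utterance) {
    return null; // Feature detection failed; the caller can skip announcements.
  }
  return (text: string): void => {
    if (interrupt) {
      // Optionally drop anything still queued so stale announcements are skipped.
      synth.cancel();
    }
    const utterance = new Utterance(text);
    utterance.lang = lang;
    // speak() appends to the synthesizer's queue, so utterances play one
    // after another instead of overlapping.
    synth.speak(utterance);
  };
}

// In a browser, pass the global object; without speech support this yields null.
const speechAnnounce = createSpeechAnnouncer(globalThis as unknown as SpeechEnv);
if (speechAnnounce) {
  speechAnnounce("forward 1");
}
```

Passing "fr-CA" as the language would request a Canadian French voice, although which voices are actually available depends on the browser and operating system.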