Getting Alexa to Speak Pirate - Phonetics and Interjections

Pirate Monkeyness Alexa skill

My go-to project for testing out any new platform or service is either a pirate translator or a pirate insult generator. The first version was written as an OS X Dashboard widget in 2006. Since then, it has been rewritten as a Yahoo widget, a Google gadget, an iOS app (twice), JavaScript in a web page, an API as a Perl CGI, an API in PHP, and an API using AWS Lambda and Node.

The latest incarnation is as an Amazon Alexa skill (submission pending)! That's right. You can now ask your Amazon Echo to insult you like a pirate!

When I wrote it, I knew the Alexa voice wouldn't speak with a pirate-sounding accent, but I had no idea how far off it would be. Not only does she pleasantly recite something that should be yelled with spittle flying off your lips, but her pronunciation of pirate words is atrocious. Apparently, the Alexa Skills team (the group at Amazon that makes it possible for developers to extend Alexa) didn't plan on her speaking like a pirate at all. Go figure.

Fortunately, the Alexa Skills team did build in some pretty nifty ways of controlling how Alexa pronounces words by using Speech Synthesis Markup Language (SSML). It won't get Alexa talking with a drunken slur, but it will let you tweak her pronunciation into something that sounds more natural. In the plain text you send to be turned into speech, you can include tags to indicate how a word or phrase should be spoken. For instance,

<say-as interpret-as="spell-out">hello</say-as>

will spell out the word hello instead of speaking it. When the text is displayed in the Alexa app, it can show the contents of the tag without showing the tag itself.
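
To get markup like this spoken by a skill, the text goes inside a speak element and is returned as SSML rather than plain text. Here is a minimal sketch in Python, assuming a backend that builds the response JSON by hand. The function name build_ssml_response is made up for illustration and is not part of any Alexa library.

```python
import json


def build_ssml_response(ssml_body: str, end_session: bool = True) -> dict:
    """Wrap an SSML fragment in <speak> and return an Alexa-style response dict.

    build_ssml_response is a hypothetical helper, not an official API.
    """
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {
                "type": "SSML",  # tells Alexa to parse the tags instead of reading them
                "ssml": f"<speak>{ssml_body}</speak>",
            },
            "shouldEndSession": end_session,
        },
    }


if __name__ == "__main__":
    body = '<say-as interpret-as="spell-out">hello</say-as>'
    print(json.dumps(build_ssml_response(body), indent=2))
```

The caller is responsible for passing in well-formed markup; this sketch does no validation.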

Fancier still, the "w" tag lets you indicate which pronunciation of a word would be appropriate. In "I read a book yesterday" or "I read a book when I want to relax", read is pronounced two different ways. If Alexa doesn't figure it out on her own, you can use the "w" tag to clue her in on which one to use: the present-tense verb (VB) or the past-tense verb (VBD).

I <w role="ivona:VB">read</w> a book when I want to relax.
I <w role="ivona:VBD">read</w> a book yesterday.

Other options include NN for noun or SENSE_1 to pick the alternate pronunciation of a word, as in "bass", the fish, instead of "bass", the musical clef.
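
If you are generating sentences in code, a tiny helper keeps the role tags from cluttering your strings. This is only a sketch: with_role is a hypothetical name, and the role values are simply the ones shown above.

```python
from xml.sax.saxutils import escape, quoteattr


def with_role(word: str, role: str) -> str:
    """Wrap a word in a <w> tag with the given role, e.g. 'ivona:VBD'.

    with_role is a hypothetical helper; it does not check that the role is
    one Alexa actually supports.
    """
    # quoteattr adds the surrounding quotes and escapes the attribute value.
    return f"<w role={quoteattr(role)}>{escape(word)}</w>"


sentence = f"I {with_role('read', 'ivona:VBD')} a book yesterday."
print(sentence)
# prints: I <w role="ivona:VBD">read</w> a book yesterday.
```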

For my use in pirate insults, I needed even more control since I was giving her non-standard words that aren't quite English. For instance, thar as in "Thar she blows" gets pronounced by Alexa with a soft th (a voiceless dental fricative) like in thing when it should be a hard th (voiced dental fricative) like in there. Enter the phoneme tag! It's as simple as spelling out the pronunciation in the International Phonetic Alphabet (IPA).

<phoneme alphabet="ipa" ph="ðɑː(ɹ)">Thar</phoneme> she blows!

Ok. Maybe it's not so simple since it looks like gibberish; it doesn't even use Roman characters! I wasted a lot of time trying to figure out exactly which characters to use, with little luck, until I found a quick and easy trick for getting the pronunciation I wanted. Wiktionary! I would look up a word like there and copy the first character of the IPA pronunciation, which has the "th" I want, then look up far and copy the rest of the IPA pronunciation. I didn't actually have to understand IPA much at all. It does help to know that the apostrophe-like mark (ˈ) indicates stress and is not a pronounced character, and that it shows up as the first character in a lot of words.
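
Once you have a few of these worked out, it is handy to keep them in a lookup table and let code insert the tags. The sketch below assumes a small hand-built dictionary of pirate words; PIRATE_IPA, phoneme, and add_pronunciations are names I made up for illustration, and the only entry is the "thar" example from above, pieced together from there and far.

```python
import re
from xml.sax.saxutils import escape, quoteattr

# Hypothetical pronunciation table: lowercase pirate word -> IPA string.
# "thar" = the "th" copied from "there" plus the rest copied from "far".
PIRATE_IPA = {
    "thar": "ð" + "ɑː(ɹ)",
}


def phoneme(word: str, ipa: str) -> str:
    """Wrap a word in a phoneme tag using the IPA alphabet."""
    return f'<phoneme alphabet="ipa" ph={quoteattr(ipa)}>{escape(word)}</phoneme>'


def add_pronunciations(text: str) -> str:
    """Escape plain text, then tag every word found in PIRATE_IPA."""
    safe = escape(text)  # protect &, < and > before adding our own tags

    def swap(match):
        word = match.group(0)
        ipa = PIRATE_IPA.get(word.lower())  # case-insensitive lookup
        return phoneme(word, ipa) if ipa else word

    # Only runs of ASCII letters are treated as words in this sketch.
    return re.sub(r"[A-Za-z]+", swap, safe)


print(add_pronunciations("Thar she blows!"))
# prints: <phoneme alphabet="ipa" ph="ðɑː(ɹ)">Thar</phoneme> she blows!
```

The original capitalization is kept inside the tag, so the text shown in the Alexa app still reads naturally.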

Armed with the phoneme tag, I happily tweaked pronunciations in ways no one else will ever understand or care about. That was an improvement, but she still sounded awkward. She pronounced things better, but she didn't do it with feeling. At this point, I wouldn't have expected more, but there was one more surprise from the Alexa team. Last month, they introduced interjections, alternate recordings of many popular interjections that have some emotion to them! That's right. I could now get Alexa to wake up and give me an Argh like she means it.

<say-as interpret-as="interjection">Argh!</say-as>
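
Interjections mix freely with the other tags in a single response. As a rough sketch, assuming the fragments are assembled by hand, the following joins an interjection and a phoneme-tagged word into one SSML string. The helper names interjection and speak are hypothetical.

```python
from xml.sax.saxutils import escape


def interjection(word: str) -> str:
    """Mark a word so Alexa uses her expressive recording, if one exists."""
    return f'<say-as interpret-as="interjection">{escape(word)}</say-as>'


def speak(*parts: str) -> str:
    """Join already-marked-up fragments into one SSML document."""
    return "<speak>" + " ".join(parts) + "</speak>"


# The phoneme fragment is the "Thar" example from earlier in the post.
thar = '<phoneme alphabet="ipa" ph="ðɑː(ɹ)">Thar</phoneme>'
ssml = speak(interjection("Argh!"), thar, "she blows!")
print(ssml)
```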

Granted, the interjection list is primarily non-pirate, but there are three notable exceptions: ahoy, argh, and blast. It's not much for speaking pirate, but it's a good step in the right direction, and the list will hopefully expand in the future. Maybe we'll eventually see a good avast or yo ho ho. Take a look at the Alexa interjections page. It is a fun way to waste some time listening to all her new interjections. Argh!
