Converting text to human-like speech

You can convert text to human-like speech and add the resulting .mp3 audio object to your title. Select from dozens of male and female voices that produce native-sounding speech for languages such as Mandarin Chinese, Italian, Brazilian Portuguese, and Spanish. You can enter the text as plain text or format it using the Speech Synthesis Markup Language (SSML). SSML is an XML-based markup language for speech synthesis applications that provides additional control over how the speech is generated. For example, with SSML you can insert a long pause or change the speech rate or pitch.
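
As a rough illustration of what SSML-formatted text looks like, the following Python sketch builds a short SSML document that contains a two-second pause and a slower speech rate. The function name build_ssml and the sample sentences are hypothetical and are not part of the Publisher; the <speak>, <break>, and <prosody> tags come from the supported tag list in this article.

```python
import xml.etree.ElementTree as ET


def build_ssml(before: str, after: str, pause: str = "2s", rate: str = "slow") -> str:
    """Return SSML that speaks `before`, pauses, then speaks `after` at `rate`.

    Hypothetical helper for illustration only.
    """
    speak = ET.Element("speak")          # root element that marks SSML text
    speak.text = before
    ET.SubElement(speak, "break", {"time": pause})            # the pause
    prosody = ET.SubElement(speak, "prosody", {"rate": rate})  # speech rate
    prosody.text = after
    return ET.tostring(speak, encoding="unicode")


ssml = build_ssml("Welcome to the course.", "Please read each page carefully.")
print(ssml)
```

The printed markup is the kind of text you would enter on the SSML tab rather than the Plain Text tab.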

An Internet connection is required.

An .mp3 file can be generated with a sample rate of 8000 Hz, 16000 Hz, 22050 Hz, or 24000 Hz.

The Publisher's SSML support is powered by Amazon's Text-to-Speech (TTS) cloud service called Amazon Polly. Amazon Polly uses a subset of the SSML markup tags that are defined by Speech Synthesis Markup Language (SSML) Version 1.1, W3C Recommendation.
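
For readers curious about the underlying service, the sketch below assembles the parameters that a direct Amazon Polly SynthesizeSpeech request accepts (OutputFormat, SampleRate, Text, TextType, VoiceId). How the Publisher calls Amazon Polly internally is not described in this article, so treat this as an assumption-laden illustration only. The helper name polly_request and its defaults are hypothetical.

```python
# Sample rates this article lists for .mp3 output. Polly takes them as strings.
MP3_SAMPLE_RATES = ("8000", "16000", "22050", "24000")


def polly_request(text: str, voice_id: str = "Ivy",
                  sample_rate: int = 22050, ssml: bool = False) -> dict:
    """Build keyword arguments for a Polly SynthesizeSpeech call (hypothetical helper)."""
    rate = str(sample_rate)
    if rate not in MP3_SAMPLE_RATES:
        raise ValueError(f"unsupported mp3 sample rate: {sample_rate}")
    return {
        "OutputFormat": "mp3",
        "SampleRate": rate,
        "Text": text,
        "TextType": "ssml" if ssml else "text",
        "VoiceId": voice_id,
    }


params = polly_request("<speak>Hello.</speak>", ssml=True)
# With the boto3 SDK installed and AWS credentials configured, these could be
# passed on as: boto3.client("polly").synthesize_speech(**params)
print(params)
```

The sketch only builds the request, so it runs without an Internet connection. The actual synthesis call requires one.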

Amazon Polly supports the following SSML tags:

Adding a Pause (<break>)
Emphasizing Words (<emphasis>)
Specifying Another Language for Specific Words (<lang>)
Placing a Custom Tag in Your Text (<mark>)
Adding a Pause Between Paragraphs (<p>)
Using Phonetic Pronunciation (<phoneme>)
Controlling Volume, Speaking Rate, and Pitch (<prosody>)
Setting a Maximum Duration for Synthesized Speech (<prosody amazon:max-duration>)
Adding a Pause Between Sentences (<s>)
Controlling How Special Types of Words Are Spoken (<say-as>)
Identifying SSML-Enhanced Text (<speak>)
Pronouncing Acronyms and Abbreviations (<sub>)
Improving Pronunciation by Specifying Parts of Speech (<w>)
Adding the Sound of Breathing (<amazon:auto-breaths>)
Adding Dynamic Range Compression (<amazon:effect name="drc">)
Speaking Softly (<amazon:effect phonation="soft">)
Controlling Timbre (<amazon:effect vocal-tract-length>)
Whispering (<amazon:effect name="whispered">)

Unsupported SSML tags in input text generate errors. For details about converting text to speech, see Voices in Amazon Polly and SSML Tags Supported by Amazon Polly.
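
Because unsupported tags generate errors, it can help to check SSML text against the supported list before you generate audio. The following Python sketch is one possible pre-check, not a feature of the Publisher or of Amazon Polly. The function name unsupported_tags is hypothetical, and the scan is a simple pattern match on tag names rather than full XML validation.

```python
import re

# Tag names taken from the supported list in this article.
SUPPORTED_TAGS = {
    "speak", "break", "emphasis", "lang", "mark", "p", "phoneme", "prosody",
    "s", "say-as", "sub", "w", "amazon:auto-breaths", "amazon:effect",
}

# Matches the name of an opening or closing tag, e.g. <break ...> or </prosody>.
TAG_NAME = re.compile(r"</?([A-Za-z][\w:.-]*)")


def unsupported_tags(ssml: str) -> list[str]:
    """Return tag names not in SUPPORTED_TAGS, in order of first appearance."""
    found = []
    for name in TAG_NAME.findall(ssml):
        if name not in SUPPORTED_TAGS and name not in found:
            found.append(name)
    return found


sample = '<speak>Hi <voice name="x">there</voice><break time="1s"/></speak>'
print(unsupported_tags(sample))  # the <voice> tag is not on the list above
```

An empty result means every tag name in the text appears in the list. It does not guarantee that the attributes or nesting are valid.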

You can preview the voice with the current text before you add the new audio object to your title. The audio object is added at your current location in the title.

To add human-like speech that is converted from text:

In the Title Explorer, select the location in which you want to add the audio.
Do one of the following:

On the Insert ribbon, select the drop-down list under Audio in the Add Media group and select Text to Speech.
On the Tools ribbon, select Text to Speech in the Create New group.

The Text to Speech dialog box opens.

In the Options box, use the Language pull-down list to select the language of the speech and use the Voice pull-down list to select the type of voice for the language of the speech. For example, to select a female voice in US English, select English, US and Ivy, Female. Select Add Closed Captions to automatically generate a WebVTT Closed Caption file with the resulting audio.

Text edits you make directly on the resulting WebVTT file will not sync back to the entered text in the Text to Speech dialog box.

Select Preview to preview the language and voice configuration you selected.

To convert plain text, enter the text on the Plain Text tab in the box on the right side of the Text to Speech dialog box. If you intend to format the text with SSML, use the SSML tab.

Text edits you make on the SSML tab will not sync to the Plain Text tab, and vice versa.

Select Generate Audio.

The .mp3 audio object is added to the title. If Add Closed Captions was selected, the WebVTT file is also added to the title and associated with the audio.

To make any changes to the Text to Speech WebVTT captions file, use the Text to Speech tool.
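
For orientation, a WebVTT captions file is plain text: a WEBVTT header followed by cues, each with a start and end time and the caption text. The Python sketch below shows that general shape. The cue timings and sentences are invented for illustration, the function names are hypothetical, and the file the Publisher generates may be laid out differently.

```python
def vtt_timestamp(seconds: float) -> str:
    """Format a time in seconds as a WebVTT timestamp (hh:mm:ss.ttt)."""
    millis = round(seconds * 1000)
    hours, rem = divmod(millis, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def build_webvtt(cues: list[tuple[float, float, str]]) -> str:
    """Build WebVTT text from (start_seconds, end_seconds, caption) cues."""
    lines = ["WEBVTT", ""]  # required header, then a blank line
    for start, end, text in cues:
        lines.append(f"{vtt_timestamp(start)} --> {vtt_timestamp(end)}")
        lines.append(text)
        lines.append("")    # a blank line ends each cue
    return "\n".join(lines)


# Invented example cues; real timings come from the generated audio.
captions = build_webvtt([
    (0.0, 2.5, "Welcome to the course."),
    (2.5, 6.0, "Please read each page carefully."),
])
print(captions)
```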

This article was last reviewed in October 2020. The software may have changed since the last review. Please visit our Release Notes to learn more about version updates.