Text to Speech (JA)

Don't have a recording? Write one instead! SoundMadeSeen's text-to-speech turns a written script into natural-sounding audio, complete with an automatic transcription so it's ready for subtitles and video the moment it's generated.

Creating a script

Go to Text to Speech in the left side navigation menu. Give your script a name, type your first line, and pick a voice. Then click Create script to open the full script editor, where you build your script line by line.

For each line you can set:

The text: up to 4,096 characters per line.
The voice: every line can use a different voice, which makes multi-speaker scripts (interviews, dialogues, ads with two presenters) easy. Lines are colour-coded by speaker so you can see the conversation at a glance.
A pause at the end of the line: none, 1, 3, or 5 seconds.
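
If you draft scripts outside the editor before pasting them in, the per-line settings above can be checked ahead of time. The following is a minimal Python sketch, not part of SoundMadeSeen: the names `ScriptLine` and `validate_line` are hypothetical, and it assumes the 4,096-character limit is counted the way Python's `len()` counts characters, which the editor may do differently.

```python
from dataclasses import dataclass
from typing import List

MAX_CHARS_PER_LINE = 4096        # per-line text limit stated above
ALLOWED_PAUSES = (0, 1, 3, 5)    # seconds; 0 stands for "none"


@dataclass
class ScriptLine:
    """Hypothetical container for one drafted script line."""
    text: str
    voice: str
    pause_seconds: int = 0


def validate_line(line: ScriptLine) -> List[str]:
    """Return a list of problems; an empty list means the line fits the limits."""
    problems = []
    if len(line.text) > MAX_CHARS_PER_LINE:
        problems.append(
            f"text is {len(line.text)} characters; the limit is {MAX_CHARS_PER_LINE}"
        )
    if line.pause_seconds not in ALLOWED_PAUSES:
        problems.append(f"pause must be one of {ALLOWED_PAUSES} seconds")
    return problems


if __name__ == "__main__":
    draft = ScriptLine(text="Welcome to the show.", voice="Narrator", pause_seconds=3)
    print(validate_line(draft))  # an empty list means no problems were found
```

The editor remains the authority on what it accepts; a check like this only catches over-long lines or unsupported pause values early.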

You can edit or delete any line later, and rename or delete the whole script from the script page.

Choosing a voice

Click the voice selector on any line to browse the voice library. Each voice has a short description (gender, accent, character) and a preview button so you can hear a sample before committing. Use the search box to filter by name or description.

Your own cloned voices appear in the same list, alongside the stock voices.

Emotion tags

Voices powered by our most expressive engine support inline emotion tags. When you've selected one of these voices, a tag toolbar appears above the text box. Click to insert tags like `[excited]`, `[whispers]`, `[laughs]`, `[sighs]`, or `[pauses]` directly into your text, and the voice will perform them.
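
When tags are typed by hand rather than inserted from the toolbar, a typo such as `[exited]` is easy to miss. The sketch below, in Python, lists the bracketed tags in a draft and flags any that are not among the five examples named above. It is an illustration only: `find_tags` and `unfamiliar_tags` are hypothetical helpers, and the toolbar may offer more tags than this short list, so a flagged tag is something to double-check rather than an error.

```python
import re

# Only the example tags named in this article; the toolbar may offer others.
EXAMPLE_TAGS = {"excited", "whispers", "laughs", "sighs", "pauses"}

# Matches text inside a single pair of square brackets, e.g. "[laughs]".
TAG_PATTERN = re.compile(r"\[([^\[\]]+)\]")


def find_tags(text):
    """Return the tag names found in the text, in order of appearance."""
    return TAG_PATTERN.findall(text)


def unfamiliar_tags(text):
    """Return tags that are not in the example list (possible typos)."""
    return [tag for tag in find_tags(text) if tag.strip().lower() not in EXAMPLE_TAGS]


if __name__ == "__main__":
    draft = "[excited] We did it! [laughs] I can't believe it. [exited]"
    print(find_tags(draft))
    print(unfamiliar_tags(draft))
```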

Generating the audio

Choose a transcription language from the dropdown in the header (it defaults to your interface language). This is the language used to transcribe the finished audio.
Click Generate and confirm. Generation takes a few minutes; you'll see live progress as each line is processed.
When it's done, you're taken straight to the new audio file. Its transcription is already prepared, so you can start designing a video right away.

Note: Our usage policy requires you to disclose the use of AI-generated speech wherever you share your videos.

Translating a script

Click the translate (globe) button on a script page to translate the whole script into another language - over 20 languages are available. Translation creates a new copy of your script in the target language, leaving the original untouched, so you can generate localised versions of the same audio.

Translation uses content tokens (the same credits as other AI features); the dialog shows the estimated cost and your remaining balance before you confirm.

Characters and limits

Text-to-speech usage is measured in characters. The counter at the bottom of the script editor shows how many characters your script uses and how many you have remaining on your plan. If you run out, you can buy additional text-to-speech credits from the prompt in the editor. Once the purchase completes, the editor unlocks automatically.
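
To get a rough idea of whether a draft fits your allowance before you paste it in, you can total the characters yourself. This Python sketch assumes every character in every line counts, including spaces and any emotion tags. That is an assumption, not a documented rule, and the function names are hypothetical, so treat the counter in the script editor as the authoritative figure.

```python
def script_characters(lines):
    """Total characters across all lines (assumes every character counts)."""
    return sum(len(line) for line in lines)


def remaining_after(lines, plan_remaining):
    """Characters left on the plan after this script; negative means over the allowance."""
    return plan_remaining - script_characters(lines)


if __name__ == "__main__":
    draft = ["Hello there.", "Hi!"]
    print(script_characters(draft))      # 12 + 3 characters
    print(remaining_after(draft, 1000))  # allowance left after this draft
```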
