Text to Speech
Don't have a recording? Write one instead! SoundMadeSeen's text-to-speech turns a written script into natural-sounding audio, complete with an automatic transcription so it's ready for subtitles and video the moment it's generated.
Creating a script
Go to Text to Speech in the left-hand navigation menu. Give your script a name, type your first line, and pick a voice. Then click Create script to open the full script editor, where you build your script line by line.
For each line you can set:
- The text: up to 4,096 characters per line.
- The voice: every line can use a different voice, which makes multi-speaker scripts (interviews, dialogues, ads with two presenters) easy. Lines are colour-coded by speaker so you can see the conversation at a glance.
- A pause at the end of the line: none, 1, 3, or 5 seconds.
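If you draft scripts outside the editor (in a spreadsheet or a text file, say), you can check the per-line limits above before pasting lines in. The Python sketch below is illustrative only: `ScriptLine`, `validate_line`, and the voice names are hypothetical and not part of SoundMadeSeen. It also assumes the 4,096-character limit is counted in Unicode characters, which may not match exactly how the editor counts.

```python
from dataclasses import dataclass

# Limits taken from the list above; counting by Unicode characters is an assumption.
MAX_CHARS_PER_LINE = 4096
ALLOWED_PAUSES = (0, 1, 3, 5)  # seconds; 0 means "none"


@dataclass
class ScriptLine:
    """One line of a script drafted offline (hypothetical structure)."""
    text: str
    voice: str
    pause_seconds: int = 0


def validate_line(line: ScriptLine) -> list[str]:
    """Return a list of problems with the line; an empty list means it looks OK."""
    problems = []
    if not line.text.strip():
        problems.append("text is empty")
    if len(line.text) > MAX_CHARS_PER_LINE:
        problems.append(
            f"text has {len(line.text)} characters (limit {MAX_CHARS_PER_LINE})"
        )
    if line.pause_seconds not in ALLOWED_PAUSES:
        problems.append(f"pause must be one of {ALLOWED_PAUSES}")
    return problems


if __name__ == "__main__":
    script = [
        ScriptLine("Welcome back to the show!", voice="Host A", pause_seconds=1),
        ScriptLine("Thanks for having me.", voice="Guest B", pause_seconds=2),
    ]
    for number, line in enumerate(script, start=1):
        for problem in validate_line(line):
            # prints: Line 2: pause must be one of (0, 1, 3, 5)
            print(f"Line {number}: {problem}")
```

Keeping the voice on each line, rather than on the script as a whole, mirrors how the editor lets every line use a different speaker.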
You can edit or delete any line later, and rename or delete the whole script from the script page.
Choosing a voice
Click the voice selector on any line to browse the voice library. Each voice has a short description (gender, accent, character) and a preview button so you can hear a sample before committing. Use the search box to filter by name or description.
Your own cloned voices appear in the same list, alongside the stock voices.
Emotion tags
Voices powered by our most expressive engine support inline emotion tags. When you've selected one of these voices, a tag toolbar appears above the text box: click to insert tags like `[excited]`, `[whispers]`, `[laughs]`, `[sighs]`, or `[pauses]` directly into your text, and the voice will perform them.
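Because tags are plain bracketed words inside a line's text, you can scan a draft for them before pasting it in. The Python sketch below lists the tags in a line and flags any that aren't in a hypothetical `KNOWN_TAGS` set, which is built only from the examples named above. This guide doesn't give the full list of tags the engine understands, so treat that set as an assumption and use the toolbar for the real options.

```python
import re

# Only the tags named in this guide; the engine may support others (assumption).
KNOWN_TAGS = {"excited", "whispers", "laughs", "sighs", "pauses"}

# A tag is a run of letters, spaces or hyphens between square brackets, e.g. [laughs].
TAG_PATTERN = re.compile(r"\[([a-zA-Z][a-zA-Z -]*)\]")


def find_tags(text: str) -> list[str]:
    """Return the tags used in a line, in order of appearance, lowercased."""
    return [match.lower() for match in TAG_PATTERN.findall(text)]


def unknown_tags(text: str) -> list[str]:
    """Return tags that are not in KNOWN_TAGS (possible typos)."""
    return [tag for tag in find_tags(text) if tag not in KNOWN_TAGS]


if __name__ == "__main__":
    line = "[excited] We did it! [laughs] I can't believe it. [whisper] Don't tell anyone."
    print(find_tags(line))     # ['excited', 'laughs', 'whisper']
    print(unknown_tags(line))  # ['whisper']
```

In the example, `[whisper]` is flagged because the toolbar example is `[whispers]`. Inserting tags from the toolbar avoids this kind of slip.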
Generating the audio
- Choose a transcription language from the dropdown in the header (it defaults to your interface language). This is the language used to transcribe the finished audio.
- Click Generate and confirm. Generation takes a few minutes; you'll see live progress as each line is processed.
- When it's done, you're taken straight to the new audio file, with its transcription already prepared and ready for designing a video.
Note: Our usage policy requires you to disclose the use of AI-generated speech wherever you share your videos.
Translating a script
Click the translate (globe) button on a script page to translate the whole script into another language (over 20 languages are available). Translation creates a new copy of your script in the target language and leaves the original untouched, so you can generate localised versions of the same audio.
Translation uses content tokens (the same credits as other AI features); the dialog shows the estimated cost and your remaining balance before you confirm.
Characters and limits
Text-to-speech usage is measured in characters. The counter at the bottom of the script editor shows how many characters your script uses and how many you have remaining on your plan. If you run out, you can buy additional text-to-speech credits from the prompt in the editor. Once the purchase completes, the editor unlocks automatically.
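The counter is simple arithmetic. It adds up the characters in every line and compares the total with what's left on your plan. Here is a Python sketch of that check. It assumes that every character of a line's text counts once, emotion tags included, that pauses cost nothing, and that generating uses the script's full count. The editor's exact counting rules aren't stated here, and `remaining_on_plan` is a hypothetical input you'd read off the counter yourself.

```python
def characters_used(lines: list[str]) -> int:
    """Total characters across all script lines (assumes tags count as text)."""
    return sum(len(text) for text in lines)


def check_allowance(lines: list[str], remaining_on_plan: int) -> tuple[int, int]:
    """Return (characters used, characters left after generating).

    A negative second value means the script needs more characters than remain.
    """
    used = characters_used(lines)
    return used, remaining_on_plan - used


if __name__ == "__main__":
    script = ["Hello and welcome.", "[excited] Today we have a great guest!"]
    used, left = check_allowance(script, remaining_on_plan=50)
    print(used, left)  # prints: 56 -6
```

A negative result here corresponds to the point where the editor would prompt you to buy additional text-to-speech credits.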