How to Use

Text to Speech

The main page is where you convert written text into spoken audio. Audio streams in real time as it generates, so you don't have to wait for the full file before you start listening.

Entering Text

Type or paste your text into the large input area on the left side of the page. The counter below the box shows how many characters you've used out of the 2,500-character limit.

You can type directly, paste from your clipboard, or edit the pre-filled example text.
Each generation is limited to 2,500 characters. For longer content, split the text into sections and generate each one separately.
Punctuation matters. Commas create short pauses; periods create longer ones. Use them intentionally to shape the rhythm of the speech.
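
If you prepare long scripts outside the app, the splitting step can be automated. The following is a minimal Python sketch, not part of the app itself: it assumes you want to break text at sentence endings so that each piece stays within the 2,500-character limit. The function name `split_text` and the sentence-boundary rule are illustrative assumptions.

```python
import re

MAX_CHARS = 2500  # per-generation limit stated in this guide


def split_text(text: str, limit: int = MAX_CHARS) -> list[str]:
    """Split text into chunks of at most `limit` characters, preferring sentence ends."""
    # Break after '.', '!' or '?' when followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        # A single sentence longer than the limit is cut at the limit.
        while len(sentence) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:limit])
            sentence = sentence[limit:]
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= limit:
            current = candidate
        else:
            # Adding this sentence would exceed the limit, so start a new chunk.
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each returned chunk can be pasted into the input area and generated on its own. Passing a smaller `limit` (for example 500) produces shorter sections.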

Choosing a Voice

The voice selector is on the right side of the page. Click the dropdown to see all available voices. The default voice is Alba.

Built-in voices include Alba, Anna, Eponine, Fantine, George, and Jean, each with a distinct tone and style.
Click the small play button next to any voice name to preview it before selecting.
Community voices and your own cloned voices also appear in this list once selected from their respective pages. An account is required to use them.

No account is needed to use the built-in voices. Sign up to unlock community voices and voice cloning.

Generating Audio

Once you've entered your text and chosen a voice, click Generate audio. The audio player will appear below the form and start playing as the audio streams in.

The speaker icon button next to Generate audio toggles auto-play on or off. When enabled, audio starts playing automatically as it streams.
The X button cancels an in-progress generation at any time.
Once generation is complete, a Download button appears in the output section. Click it to save the audio as a WAV file.
A status bar below the Generate audio button shows real-time progress during generation.

Advanced Settings

Click Advanced Settings at the bottom of the main card to expand fine-grained controls over how audio is generated. These are optional; the defaults work well for most use cases.

Temperature (default 0.70)

Controls how random or expressive the output is. Lower values (closer to 0.1) produce consistent, neutral speech. Higher values (up to 2.0) add more variation and expressiveness but can occasionally sound unstable.

LSD Decode Steps (default 1)

The number of decoder iterations. Higher values improve audio quality but increase generation time. Start at 1 for speed; increase to 3–5 for higher-fidelity output.

EOS Threshold (default -4.0)

Controls when the model decides speech has ended. More negative values (e.g. -8.0) produce longer audio before stopping. Less negative values (closer to 0) cause earlier cutoffs. Adjust if audio ends too abruptly or runs too long.

Noise Clamp (optional)

Caps the noise magnitude during generation to reduce audio artifacts. Leave blank to disable. Try values between 1.0 and 3.0 if you hear distortion or crackling in the output.

Frames After EOS (optional)

Adds extra audio frames after the model detects the end of speech. Useful if the last word or syllable gets cut off. Try values between 8 and 24 if audio ends too abruptly.

Playback Speed (default 1.0x)

Adjusts the playback rate of the audio player. This only affects how fast the audio plays back in the browser; it does not change the generated file. The range is 0.5x (slower) to 2.0x (faster).

Logged-in users can save their preferred settings with Save Preferences and restore defaults with Reset to Defaults.
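
To keep track of setting combinations outside the app, it can help to write them down in a structured form. The sketch below is one way to do that in Python. The class name, field names, and the `problems` helper are hypothetical and do not correspond to any documented API of the app. The defaults and ranges are taken from the descriptions above; treating 0.1 as a hard lower bound for temperature is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class GenerationSettings:
    """Hypothetical record of the advanced settings; names are illustrative."""
    temperature: float = 0.70
    lsd_decode_steps: int = 1
    eos_threshold: float = -4.0
    noise_clamp: Optional[float] = None     # None mirrors leaving the field blank
    frames_after_eos: Optional[int] = None  # None mirrors leaving the field blank
    playback_speed: float = 1.0

    def problems(self) -> list[str]:
        """Return warnings for values outside the ranges described in this guide."""
        issues = []
        if not 0.1 <= self.temperature <= 2.0:
            issues.append("temperature should be between 0.1 and 2.0")
        if self.lsd_decode_steps < 1:
            issues.append("decode steps should be at least 1")
        if self.noise_clamp is not None and self.noise_clamp <= 0:
            issues.append("noise clamp should be positive when set")
        if self.frames_after_eos is not None and self.frames_after_eos < 0:
            issues.append("frames after EOS should not be negative")
        if not 0.5 <= self.playback_speed <= 2.0:
            issues.append("playback speed should be between 0.5 and 2.0")
        return issues


# Example: a slower, higher-fidelity preset that pads the end of the audio.
high_quality = GenerationSettings(lsd_decode_steps=4, noise_clamp=2.0, frames_after_eos=16)
```

Calling `high_quality.problems()` returns an empty list when every value falls inside the documented ranges.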

Voice Cloning

Voice cloning lets you upload a short audio sample and use it as a reference for all future generations. The model captures the speaker's tone, pace, and character from the sample and applies it to any text you generate. Voice cloning requires an account.

How to upload a voice

1. Go to My Voices

Navigate to My Voices from the navbar. You can store up to 12 voices per account. The page also shows stats for your public voices: likes, uses, and plays from the community.

2. Click Upload

Click the Upload button in the top right of the voices list. A modal will open with the upload form.

3. Fill in the voice details

The upload form has several fields:

Avatar (optional). A profile image for the voice. Click the circle on the left to upload a JPEG, PNG, WebP, or GIF (max 5 MB).
Voice Name (required). Give it a clear, descriptive name so you can identify it later (e.g. "My Podcast Voice").
Description (optional). Describe the voice's character or intended use. Helpful if you share it publicly.
Tags (required, select 2 to 5). Choose from: Calm, Energetic, Deep, Bright, Warm, Authoritative, Soft, Narrative, Conversational, Other. Tags help others find your voice if you make it public.
Gender (required). Select Male, Female, or Other. Used for filtering on the Community Voices page.
Audio File (required). Any common audio format is accepted. The final sample must be between 10 and 30 seconds long; a longer recording can be trimmed in the next step. Files larger than 10 MB are rejected.

4. Trim if needed

If your audio file is longer than 30 seconds, a trim tool opens automatically. Drag the start and end handles on the waveform to select a 10–30 second clip. Use the Preview button to listen to your selection before confirming.

5. Submit and use

Click Upload Voice. Once saved, the voice appears in your library and immediately becomes available in the voice dropdown on the TTS page.
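
Before uploading, you can check a recording against the limits listed for the Audio File field. The following Python sketch is only an illustration and is not part of the app. It uses the standard-library `wave` module, so it reads WAV files only, and it interprets 10 MB as 10 × 1024 × 1024 bytes, which is an assumption. The function names are hypothetical.

```python
import os
import wave

MAX_BYTES = 10 * 1024 * 1024          # 10 MB limit (byte interpretation is an assumption)
MIN_SECONDS, MAX_SECONDS = 10.0, 30.0  # sample length window from this guide


def check_limits(duration_s: float, size_bytes: int) -> list[str]:
    """Compare a duration and file size with the documented upload limits."""
    notes = []
    if size_bytes > MAX_BYTES:
        notes.append("file is larger than 10 MB and would be rejected")
    if duration_s < MIN_SECONDS:
        notes.append("sample is shorter than 10 seconds")
    elif duration_s > MAX_SECONDS:
        notes.append("sample is longer than 30 seconds and will need trimming")
    return notes


def check_wav(path: str) -> list[str]:
    """Read the duration of a WAV file and check it with check_limits."""
    with wave.open(path, "rb") as wav:
        duration = wav.getnframes() / wav.getframerate()
    return check_limits(duration, os.path.getsize(path))
```

An empty list from `check_wav("sample.wav")` means the file fits both limits; `sample.wav` here is a placeholder file name.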

Managing your voices

Making a voice public

Each voice card has a toggle to make it public or private. Public voices appear on the Community Voices page for anyone to browse and use. Private voices are only visible to you.

Deleting a voice

Click the delete button on any voice card to permanently remove it. This also removes the audio file from the server. Deletion cannot be undone, and any generations that used this voice will no longer be able to reference it.

Voice limit

Each account can hold up to 12 voices. The counter next to "Your Voices" shows how many you've used. Delete unused voices to free up slots.

Voice stats

The stats bar at the top of the page shows aggregate totals for your public voices: how many likes, uses (times selected for a generation), and plays (times the sample was previewed) they've received from the community.

Tips for a good sample

Record in a quiet room

Avoid fans, air conditioning, traffic, or any ambient noise. Even low-level hum degrades clone quality.

Single speaker only

The sample must contain only one voice. Music, sound effects, or other speakers in the background will confuse the model.

Speak naturally

Use a consistent, natural pace. Avoid whispering, shouting, or exaggerated emotion. The model will try to replicate whatever style it hears.

Trim silence from the edges

Remove any silence or breath sounds at the start and end of the clip. Keep only the spoken content within the 10–30 second window.

Denoise before uploading

Run your recording through a noise-removal tool such as Adobe Podcast Enhance before uploading. A cleaner sample generally produces a better clone.
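
Trimming silence from the edges, as suggested above, can also be done with a short script. The sketch below is a simple amplitude-threshold approach in Python and is not how the app processes audio. It assumes a 16-bit mono WAV file, and the default threshold of 500 is an arbitrary starting point you would tune by ear. Quiet breath sounds may fall above or below the threshold, so listen to the result.

```python
import array
import sys
import wave


def trim_silence(samples: list[int], threshold: int = 500) -> list[int]:
    """Drop leading and trailing samples whose absolute amplitude is below threshold."""
    loud = [i for i, s in enumerate(samples) if abs(s) >= threshold]
    if not loud:
        return []  # nothing reaches the threshold
    return samples[loud[0]: loud[-1] + 1]


def trim_wav(src: str, dst: str, threshold: int = 500) -> float:
    """Trim a 16-bit mono WAV file and return the new duration in seconds."""
    with wave.open(src, "rb") as wav:
        params = wav.getparams()
        if params.nchannels != 1 or params.sampwidth != 2:
            raise ValueError("this sketch handles 16-bit mono WAV only")
        pcm = array.array("h")
        pcm.frombytes(wav.readframes(params.nframes))
    if sys.byteorder == "big":
        pcm.byteswap()  # WAV data is little-endian
    trimmed = array.array("h", trim_silence(list(pcm), threshold))
    if sys.byteorder == "big":
        trimmed.byteswap()
    with wave.open(dst, "wb") as out:
        out.setnchannels(1)
        out.setsampwidth(2)
        out.setframerate(params.framerate)
        out.writeframes(trimmed.tobytes())
    return len(trimmed) / params.framerate
```

The returned duration tells you whether the trimmed clip still fits the 10–30 second window.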

Voice cloning requires an account. The audio sample is stored securely and only used as a reference for your generations. You can delete it at any time from the My Voices page.

Community

The Community page is a masonry feed of audio generations shared publicly by users. Browse, listen, interact, and share your own work with everyone on the platform.

Browsing the feed

The feed loads automatically and adds more cards as you scroll down. Each card shows the text that was generated, the audio player, the voice used, and the creator's username with a timestamp.

Playing audio

Each card has a built-in audio player. Press play to listen directly in the browser. The play count on the card increments the first time you play each generation in a session.

Voice badge

The badge in the top-right of each card shows which voice was used. Blue badges are built-in preset voices (e.g. Alba, George). Purple badges are community or cloned voice samples. If the voice had a cover image uploaded, it appears as a circular avatar on the card.

Tag pills

Colored tag pills below the audio player show the tags associated with the voice used in that generation (e.g. Calm, Narrative, Deep). These come from the voice sample's metadata.

Gender pill

A small colored pill next to the voice badge shows the gender of the voice: blue for Male, pink for Female, yellow for Other. The pill appears only when the voice has a gender set.

Sorting and filtering

Use the controls at the top of the page to narrow down what you see. All active filters are combined, and the feed reloads as soon as you change one.

Sort By

Latest: newest generations first. Default view.
Most Downloaded: sorted by how many times the audio has been downloaded.
Most Liked: sorted by total likes received.

Time Range

Combine with any sort to limit results to a specific window: Today, This Week, This Month, This Year, or All Time.

For example, Most Liked combined with This Week shows the most-liked generations from the past 7 days.

Filter by Tag

Click any tag pill (Calm, Energetic, Deep, Bright, Warm, Authoritative, Soft, Narrative, Conversational, Other) to show only generations that used a voice with that tag. Click the same tag again or the Clear link to remove the filter. Only one tag can be active at a time.

Filter by Gender

Click Male, Female, or Other to show only generations that used a voice of that gender. Click the same button again to deselect. The gender filter can be combined with a tag filter.
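
The way the sort, time range, tag, and gender controls combine can be pictured as one query over the feed. The Python sketch below models that logic for illustration only. The `Generation` fields and the `query_feed` function are hypothetical and are not the app's implementation. Treating Today as the last 24 hours, This Month as 30 days, and This Year as 365 days is an assumption.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Generation:
    """Hypothetical shape of a feed item; field names are illustrative."""
    text: str
    created: datetime
    likes: int
    downloads: int
    tags: tuple[str, ...]
    gender: Optional[str]


# Time windows (assumed to be rolling periods ending now).
WINDOWS = {
    "today": timedelta(days=1),
    "week": timedelta(days=7),
    "month": timedelta(days=30),
    "year": timedelta(days=365),
    "all": None,
}

SORT_KEYS = {
    "latest": lambda g: g.created,
    "most_downloaded": lambda g: g.downloads,
    "most_liked": lambda g: g.likes,
}


def query_feed(items, now, sort="latest", window="all", tag=None, gender=None):
    """Apply every active filter together, then sort in descending order."""
    span = WINDOWS[window]
    kept = [
        g for g in items
        if (span is None or now - g.created <= span)
        and (tag is None or tag in g.tags)          # only one tag can be active
        and (gender is None or g.gender == gender)
    ]
    return sorted(kept, key=SORT_KEYS[sort], reverse=True)
```

For instance, `query_feed(feed, now, sort="most_liked", window="week")` corresponds to choosing Most Liked with This Week.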

Interacting with generations

Logged-in users can interact with any generation card. Guest users can see the counts but cannot take actions.

Like

Click the heart icon to like a generation. The icon fills red and the count updates immediately. Click again to unlike. Likes are visible on the creator's profile stats.

Save

Click the bookmark icon to save a generation to your collection. The icon fills blue when saved. Saved generations are accessible from your profile. Click again to unsave.

Download

Click the download icon (bottom-right of the card) to save the audio as a WAV file. Each download increments the download counter shown on the card.

Report

Click the flag icon to report a generation. A modal opens where you select a reason: Spam, Inappropriate Content, Copyright Violation, Harassment, or Other. You can add optional details before submitting. You can report each generation only once.

Viewing other users

Click on any username or avatar in a generation card to open a small popover menu with two options:

View Profile

Opens the user's public profile page, where you can see all their public generations, their saved voices, and their activity stats.

Send Message

Opens the chat bubble to start a direct message conversation with that user. Messaging requires an account. Note that the popover menu does not appear on your own cards.

Sharing your generations

Generations are private by default. To share one publicly on the community feed, you need to mark it as public. There are two ways to do this:

1. From the History page

Go to History, find the generation you want to share, and toggle its visibility to public. It will appear in the community feed immediately.

2. From your Profile

Your profile page lists all your generations. You can toggle public/private visibility for each one from there as well.

Browsing and listening are available to everyone. Liking, saving, downloading, reporting, and messaging require an account.

Community Voices

The Community Voices page is a library of voice samples shared publicly by other users. You can browse, preview, and use any of them for your own generations.

Browsing and previewing

Filter voices by gender or tags to find the right fit for your content.
Click the play button on any voice card to preview the sample audio before using it.
Click Use this voice to select it and be taken to the TTS page with that voice pre-loaded.
Like voices to save them for quick access later. You can also report voices that violate community guidelines.

An account is required to use community voices. Browsing and previewing are available to everyone.

History

Every audio generation you make while logged in is saved to your History page. You can replay, download, or delete past generations at any time.

Each entry shows the text used, the voice selected, and the date it was generated.
Click the play button on any entry to listen to it directly in the browser.
Use the download button to save any past generation as a WAV file.
Generations made without an account are not saved. Sign in before generating to keep a record.

Account & Settings

Manage your account details and preferences from the Settings page, accessible from the user dropdown in the navbar.

Profile

Update your username, display name, and profile picture. Your public profile is visible to other users on the community pages.

Generation Preferences

Set your default advanced settings so they're pre-loaded every time you open the TTS page. This saves you from adjusting the sliders in every session.

Blocked Users

Manage users you've blocked. Blocked users cannot interact with your content, and their content is hidden from your feeds.

Account Security

Change your password and manage login credentials from the security section of settings.

Tips & Tricks

Use punctuation to control pacing

Commas add brief pauses; periods add longer ones. Ellipses can create dramatic pauses. Experiment with punctuation to get the rhythm you want.

Keep generations short for best quality

Shorter texts (under 500 characters) tend to produce more accurate, natural-sounding results. For long scripts, break them into paragraphs and generate each separately.

Preview voices before committing

Use the small play button next to each voice in the dropdown to hear a sample before selecting. Different voices suit different content; a narrator voice won't always work for dialogue.

Spell out numbers and abbreviations

The model handles most numbers and abbreviations well, but for unusual cases, spelling them out (e.g. "forty-two" instead of "42") can improve pronunciation accuracy.
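
If a script contains many small numbers, spelling them out by hand gets tedious. The Python sketch below shows one way to do it before pasting the text into the app. It is an illustration under narrow assumptions: it only converts standalone integers from 0 to 999 and leaves decimals, years, and comma-grouped numbers untouched. Both function names are hypothetical.

```python
import re

ONES = [
    "zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
    "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
    "sixteen", "seventeen", "eighteen", "nineteen",
]
TENS = ["", "", "twenty", "thirty", "forty", "fifty",
        "sixty", "seventy", "eighty", "ninety"]


def number_to_words(n: int) -> str:
    """Spell out an integer from 0 to 999 in English."""
    if not 0 <= n <= 999:
        raise ValueError("only 0 to 999 is supported in this sketch")
    if n < 20:
        return ONES[n]
    if n < 100:
        tens, ones = divmod(n, 10)
        return TENS[tens] + (f"-{ONES[ones]}" if ones else "")
    hundreds, rest = divmod(n, 100)
    words = f"{ONES[hundreds]} hundred"
    return f"{words} {number_to_words(rest)}" if rest else words


def spell_out_numbers(text: str) -> str:
    """Replace standalone integers of up to three digits with words."""
    # Skip digits that are part of a decimal, a comma-grouped number,
    # a longer number, or a token such as "42nd".
    pattern = r"(?<![\d.,])\b\d{1,3}\b(?![.,]\d)"
    return re.sub(pattern, lambda m: number_to_words(int(m.group())), text)
```

For example, `spell_out_numbers("Gate 42 opens at 9.")` rewrites both numbers as words, while a value such as 3.14 is left as written.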

Save your preferred settings

Once you find advanced settings that work well for your use case, save them as your defaults. They'll be pre-loaded every time you open the TTS page.

Use community voices for variety

The community voice library grows over time. Check back regularly for new voices; you might find one that's a perfect fit for a project without needing to clone your own.