Speech generation, also known as speech synthesis or text-to-speech (TTS), is an integral part of modern AI systems. We built this endpoint with strong support for African languages.

Request

Use the generate() function to generate speech. It accepts the following parameters:

Parameters

  • text (required) - The text you want to convert to speech
  • language (required) - Language code: en, yo, ha, ig, am
  • voice (required) - Voice ID to use for generation
  • model (optional) - Model to use, defaults to legacy
  • format (optional) - Output audio format, defaults to wav
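The parameter rules above can be checked before a request is sent. The sketch below is a hypothetical helper, not part of the Spitch SDK: it encodes only the language codes, formats, and defaults documented on this page, and it leaves model out unless you set it, so the service default (legacy) applies.

```python
# Hypothetical helper, not part of the Spitch SDK. It encodes only the
# language codes, formats, and defaults documented on this page.
SUPPORTED_LANGUAGES = {"en", "yo", "ha", "ig", "am"}
SUPPORTED_FORMATS = {
    "wav", "mp3", "ogg_opus", "webm_opus",
    "flac", "pcm_s16le", "mulaw", "alaw",
}


def build_generate_kwargs(text, language, voice, model=None, audio_format="wav"):
    """Validate arguments and return keyword arguments for speech generation.

    audio_format maps to the API's `format` parameter (renamed here to avoid
    shadowing Python's built-in format()).
    """
    if not text or not text.strip():
        raise ValueError("text is required and must not be blank")
    if language not in SUPPORTED_LANGUAGES:
        raise ValueError(f"unsupported language code: {language!r}")
    if not voice:
        raise ValueError("voice is required")
    if audio_format not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported format: {audio_format!r}")

    kwargs = {"text": text, "language": language, "voice": voice,
              "format": audio_format}
    # Omit model unless given, so the service default (legacy) applies.
    if model is not None:
        kwargs["model"] = model
    return kwargs
```

Assuming a client created as in the Python example on this page, the result can be unpacked into the call, e.g. client.speech.generate(**build_generate_kwargs("Bawo ni ololufe mi?", "yo", "sade", audio_format="mp3")).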

Audio Formats

The format parameter supports the following audio formats:
  • wav (default) - Standard wave format
  • mp3 - MPEG Layer III audio
  • ogg_opus - OGG container with Opus codec
  • webm_opus - WebM container with Opus codec
  • flac - Free Lossless Audio Codec
  • pcm_s16le - Raw PCM 16-bit little-endian
  • mulaw - μ-law encoded audio
  • alaw - A-law encoded audio

Examples are provided below as a guide.
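The list above describes pcm_s16le as raw PCM, so its output has no file header and many players cannot open it directly. As a minimal sketch, raw pcm_s16le bytes can be wrapped in a WAV container with Python's standard wave module. The sample rate (24 kHz) and channel count (mono) below are assumptions, not values documented on this page; replace them with the actual properties of your audio.

```python
import io
import wave

# Assumptions (not documented on this page): mono audio at 24 kHz.
# Replace these with the real properties of the audio you receive.
ASSUMED_SAMPLE_RATE = 24_000
ASSUMED_CHANNELS = 1


def pcm_s16le_to_wav(pcm: bytes, sample_rate: int = ASSUMED_SAMPLE_RATE,
                     channels: int = ASSUMED_CHANNELS) -> bytes:
    """Wrap raw 16-bit little-endian PCM in a WAV (RIFF) container."""
    frame_size = 2 * channels  # 2 bytes per 16-bit sample, per channel
    if len(pcm) % frame_size:
        raise ValueError("PCM length is not a whole number of frames")
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(2)  # 16-bit samples
        wav.setframerate(sample_rate)
        # wave expects native byte order; on little-endian machines
        # (nearly all current hardware) pcm_s16le bytes pass through as-is.
        wav.writeframes(pcm)
    return buf.getvalue()
```

The same idea does not carry over unchanged to mulaw or alaw, which use 8-bit companded samples rather than 16-bit linear PCM.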

Best Practices for Use

  • We highly recommend tone-marking your text before TTS. Tone marks help the model pronounce words correctly during speech generation.
  • Make sure your text is correctly punctuated before sending it for speech generation; this produces more natural and accurate output.
  • Not every voice works with every language, so select a voice that matches your chosen language. More information on voices is available on the Voices page.
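Two of these practices, punctuation and voice/language matching, can be checked in code before a request is sent. The sketch below is a hypothetical pre-flight check, not part of the Spitch SDK. VOICES_BY_LANGUAGE is a placeholder you would fill in from the Voices page; only the "sade" voice for Yoruba comes from this page's example. Tone-marking is not checked.

```python
# Hypothetical pre-flight check, not part of the Spitch SDK.
# VOICES_BY_LANGUAGE is a placeholder: fill it in from the Voices page.
# Only "sade" for Yoruba comes from this page's example.
VOICES_BY_LANGUAGE = {
    "yo": {"sade"},
}

# Sentence-ending marks, including the Ethiopic full stop (U+1362) and
# Ethiopic question mark (U+1367) used in Amharic text.
TERMINAL_PUNCTUATION = (".", "!", "?", "\u1362", "\u1367")


def preflight_warnings(text, language, voice, voices_by_language=VOICES_BY_LANGUAGE):
    """Return a list of warnings; an empty list means no problems were found.

    Tone-marking is not checked here; apply it to the text beforehand.
    """
    warnings = []
    stripped = text.strip()
    if not stripped.endswith(TERMINAL_PUNCTUATION):
        warnings.append("text does not end with sentence punctuation")
    known_voices = voices_by_language.get(language)
    if known_voices is None:
        warnings.append(f"no voice list recorded for language {language!r}")
    elif voice not in known_voices:
        warnings.append(f"voice {voice!r} is not listed for language {language!r}")
    return warnings
```

Returning warnings instead of raising keeps the check advisory: the recorded voice list may be incomplete, so the final decision stays with the caller.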

Response

The speech generation response is returned as raw audio bytes.
  • With the default format (wav), the Content-Type is audio/wav and the generated audio is a WAV file. When you set format, the audio is returned in the requested format instead.
  • The content is streamed back to the caller.
  • If you use the streaming interface of the Python SDK, you can act on the byte chunks as they arrive, for example by streaming them to a file.
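Because the content is streamed, audio can be written to disk chunk by chunk instead of buffered in memory. The following is a minimal sketch: write_audio_chunks and looks_like_wav are hypothetical helpers, not SDK functions, and they accept any iterable of bytes. The WAV check relies only on the standard RIFF/WAVE header layout, so it applies to the default wav format.

```python
from typing import BinaryIO, Iterable


def write_audio_chunks(chunks: Iterable[bytes], out: BinaryIO) -> int:
    """Write audio byte chunks to an open binary file as they arrive.

    Returns the total number of bytes written.
    """
    total = 0
    for chunk in chunks:
        if chunk:  # ignore empty chunks
            out.write(chunk)
            total += len(chunk)
    return total


def looks_like_wav(data: bytes) -> bool:
    """Return True if data starts with a RIFF/WAVE header (the default wav format)."""
    return len(data) >= 12 and data[:4] == b"RIFF" and data[8:12] == b"WAVE"
```

In practice you would open the output file with open("speech.wav", "wb") (a hypothetical filename) and pass it along with the chunk iterator from the Python SDK's streaming interface; check the SDK reference for the exact method that yields the chunks.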

Choosing a Voice

We currently have 8 characters, each with a unique voice, across the supported languages. Each character has its own distinct attributes, and we think you will find them fun to use. Feel free to try them out and let us know which one you love the most. 😉

Language Support

The speech generation model supports the following languages:
  • English: en
  • Hausa: ha
  • Igbo: ig
  • Amharic: am
  • Yoruba: yo

Examples

Python
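import os
from spitch import Spitch

# Replace with your own key, or export SPITCH_API_KEY in your shell instead.
os.environ["SPITCH_API_KEY"] = "YOUR_API_KEY"
client = Spitch()  # picks up the key from SPITCH_API_KEY

# Generate Yoruba speech with the "sade" voice, returned as MP3.
response = client.speech.generate(
    text="Bawo ni ololufe mi?",
    language="yo",
    voice="sade",
    format="mp3",
)

# Open the file only after the request succeeds, so a failed call
# does not leave an empty file behind.
with open("new.mp3", "wb") as f:
    f.write(response.read())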