Also known as speech-to-text (STT), transcription is the process of converting speech to text. We have built this endpoint with strong support for African languages.

Request

Use the transcribe() function to transcribe audio. Pass it either a url or content.

Parameters

  • language (required) - Language code: en, yo, ha, ig, am
  • content (optional) - Audio file content (binary data)
  • url (optional) - URL to the audio file
  • model (optional) - STT model to use: mansa_v1, legacy, human. Defaults to legacy
  • special_words (optional) - Custom words to help with recognition accuracy
  • timestamp (optional) - Timestamp granularity: sentence, word, none. Defaults to none
You must provide either content or url, but not both.
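As a quick illustration of the two input modes, here is a minimal sketch that calls transcribe() once with content and once with url. It uses the same Spitch Python client shown in the full examples further down; the file name and URL are placeholders.

Python
import os
from spitch import Spitch

os.environ["SPITCH_API_KEY"] = "YOUR_API_KEY"
client = Spitch()

# Option 1: upload the audio bytes directly via content
with open("recording.wav", "rb") as f:
    from_file = client.speech.transcribe(language="en", content=f.read())

# Option 2: point the endpoint at a hosted file via url (never both)
from_url = client.speech.transcribe(
    language="en",
    url="https://example.com/recording.wav",
)

print(from_file.text)
print(from_url.text)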

STT Models

The model parameter allows you to choose different speech-to-text models:
  • legacy (default) - Standard transcription model
  • mansa_v1 - Enhanced model with better accuracy for African languages
  • human - High-quality model optimized for human speech patterns
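If you are unsure which model suits your audio, one simple approach is to run the same clip through each of them and compare the transcripts. The sketch below does exactly that; the file name is a placeholder.

Python
import os
from spitch import Spitch

os.environ["SPITCH_API_KEY"] = "YOUR_API_KEY"
client = Spitch()

with open("sample.wav", "rb") as f:
    audio = f.read()

# Run the same clip through each model and compare the transcripts
for model in ("legacy", "mansa_v1", "human"):
    response = client.speech.transcribe(language="yo", content=audio, model=model)
    print(f"{model}: {response.text}")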

Timestamp Options

The timestamp parameter controls the level of timing information returned:
  • none (default) - No timestamp information
  • sentence - Timestamps for each sentence
  • word - Timestamps for each individual word
The full examples at the end of this page show these options in context.
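For timestamps specifically, the sketch below requests word-level timing. The exact shape of the timing data in the SDK response is not reproduced in this page, so the sketch simply prints the response object for inspection; the file name is a placeholder.

Python
import os
from spitch import Spitch

os.environ["SPITCH_API_KEY"] = "YOUR_API_KEY"
client = Spitch()

with open("interview.wav", "rb") as f:
    # Ask for word-level timing alongside the transcript
    response = client.speech.transcribe(
        language="ha",
        content=f.read(),
        timestamp="word",
    )

print(response.text)
# The timing data is returned with the response; print the object to inspect
# the exact structure your SDK version exposes.
print(response)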

Best Practices for Use

  • You can provide either the content (file) or url (str), but do not provide both.
  • The maximum file size is 25MB; support for larger files is planned.
  • We only support mp3, wav, m4a, and ogg file formats.
  • If you provide url, ensure that access to the file is not blocked by authentication.
  • When transcribing, use the language code (e.g. en, yo, ig), not the full language name.
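Before uploading, it can help to check a file against the limits above. The helper below is a client-side convenience sketch, not part of the API; the 25MB limit and the format list come from the best practices above.

Python
import os

MAX_BYTES = 25 * 1024 * 1024                     # 25MB limit noted above
ALLOWED_EXTENSIONS = {".mp3", ".wav", ".m4a", ".ogg"}

def check_audio_file(path: str) -> None:
    """Raise ValueError if the file would be rejected by the endpoint."""
    ext = os.path.splitext(path)[1].lower()
    if ext not in ALLOWED_EXTENSIONS:
        raise ValueError(f"Unsupported format: {ext}")
    if os.path.getsize(path) > MAX_BYTES:
        raise ValueError("File exceeds the 25MB limit")

check_audio_file("new.wav")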

Response

The transcription endpoint returns a JSON response.
  • The Content-Type is application/json
  • A request_id is returned for issue resolution with our support team.
Below is an example of a response from the transcription endpoint.
    {
      "request_id": "86095cea-77d5-45ba-a093-0f800ac2c7df",
      "text": "Báwo ni olólùfẹ́ mi?"
    }
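When something goes wrong, the request_id is what support will ask for, so it is worth logging alongside the transcript. Whether the SDK exposes it as an attribute on the response object is an assumption in the sketch below; the JSON field shown above is the source of truth.

Python
import os
import logging
from spitch import Spitch

logging.basicConfig(level=logging.INFO)
os.environ["SPITCH_API_KEY"] = "YOUR_API_KEY"
client = Spitch()

with open("new.wav", "rb") as f:
    response = client.speech.transcribe(language="yo", content=f.read())

# Assumes the SDK surfaces the request_id from the JSON body as an attribute;
# keep it in your logs so support can trace the request.
logging.info("request_id=%s text=%s", getattr(response, "request_id", "n/a"), response.text)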

Language Support

Our speech-to-text model supports the following languages:
  • Hausa: ha
  • Igbo: ig
  • Yoruba: yo
  • Amharic: am
  • English: en
More information on supported languages can be found on the Languages page.
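Because the endpoint expects the code rather than the language name, a small lookup table can guard against passing the wrong value. This is plain client-side code based on the list above.

Python
# Supported language codes, taken from the list above
LANGUAGE_CODES = {
    "hausa": "ha",
    "igbo": "ig",
    "yoruba": "yo",
    "amharic": "am",
    "english": "en",
}

def to_language_code(name: str) -> str:
    try:
        return LANGUAGE_CODES[name.strip().lower()]
    except KeyError:
        raise ValueError(f"Unsupported language: {name}") from None

print(to_language_code("Yoruba"))  # -> yo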

Examples - file

Python
import os
from spitch import Spitch

os.environ["SPITCH_API_KEY"] = "YOUR_API_KEY"
client = Spitch()

with open("new.wav", "rb") as f:
    response = client.speech.transcribe(
        language="yo",
        content=f.read(),
        model="mansa_v1",
        timestamp="sentence"
    )
print(f"Text: {response.text}")

Examples - url

Python
import os
from spitch import Spitch

os.environ["SPITCH_API_KEY"] = "YOUR_API_KEY"
client = Spitch()

response = client.speech.transcribe(
    language="yo",
    url="https://myfilelocation.com/file.mp3",
    model="human",
    special_words="Spitch API"
)
print(response.text)