Also known as speech-to-text (STT), transcription is the process of converting speech to text. We have built this endpoint with strong support for African languages.

Request

The transcribe() function can be used to transcribe audio. Pass either a url or a content to the transcribe function. Examples are provided below as a guide for you.

Best Practices for Use

  • You can provide either the content (file) or url (str), but do not provide both.
  • The maximum file size is 25MB, we will support larger sizes in the future.
  • We only support mp3, wav, m4a, and ogg file formats.
  • If you provide url, ensure that access to the file is not blocked by authentication.
  • When transcribing, you should use the language code (e.g. en, yo, ig) and not the full text.

Response

The response for speech generation is in bytes.
  • The Content-Type is application/json
  • A request_id is returned for issue resolution with our support team.
Below is an example of a response from the transcription endpoint.
    {
      "request_id": "86095cea-77d5-45ba-a093-0f800ac2c7df",
      "text": "Báwo ni olólùfẹ́ mi?"
    }

Language Support

Our speech-to-text model supports the following languages:
  • Hausa: ha
  • Igbo: ig
  • Yoruba: yo
  • Amharic: am
  • English: en
More info on languages can be found on the Languages page

Examples - file

Python
import os
from spitch import Spitch

os.environ["SPITCH_API_KEY"] = "YOUR_API_KEY"
client = Spitch()

with open("new.wav", "rb") as f:
    response = client.speech.transcribe(
        language="yo",
        content=f.read()
    )
print(f"Text: {response.text}")

Examples - url

Python
import os
from spitch import Spitch

os.environ["SPITCH_API_KEY"] = "YOUR_API_KEY"
client = Spitch()

response = client.speech.transcribe(
    language="yo",
    url="https://myfilelocation.com/file.mp3"
)
print(response.text)