Mansa.v1

Mansa is a 1.4B-parameter automatic speech recognition (ASR) model optimized for African languages, offering high transcription accuracy and low-latency performance. Its key features are:
  • African named entity recognition in English contexts
  • Custom spelling guidance for names and specialized terms
  • Sentence- or word-level timestamps for audio up to 30 minutes (25 MB)
Before you try the new model, make sure you are using version 1.34.0 or later of our SDK. Quote the requirement so that your shell does not treat >= as an output redirect:
pip install "spitch>=1.34.0"
Ready to give it a spin? Use the code sample below to get started.
from spitch import Spitch

client = Spitch(api_key="YOUR-API-KEY")

# Open the audio in binary mode; the context manager closes the file afterwards.
with open("YOUR-AUDIO-FILE", "rb") as audio_file:
    response = client.speech.transcribe(
        content=audio_file,
        language="en",
        model="mansa_v1",
        timestamps="sentence",  # Choose sentence-level or word-level timestamps
        special_words=["Spitch"],  # Add your special words here
    )
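Because timestamped transcription is limited to audio of up to 30 minutes (25MB), it can be useful to check the file size before uploading. The sketch below is a local pre-flight check only, not part of the SDK. The helper name check_audio_size is hypothetical, and reading "25MB" as 25 * 1024 * 1024 bytes is an assumption; the service may count megabytes differently.

```python
import os

# Assumption: "25MB" is interpreted as 25 * 1024 * 1024 bytes.
MAX_UPLOAD_BYTES = 25 * 1024 * 1024


def check_audio_size(path: str, limit: int = MAX_UPLOAD_BYTES) -> int:
    """Return the file size in bytes, or raise ValueError if it exceeds limit."""
    size = os.path.getsize(path)
    if size > limit:
        raise ValueError(f"{path} is {size} bytes; the limit is {limit} bytes")
    return size
```

Calling check_audio_size("YOUR-AUDIO-FILE") before the transcription request fails fast on oversized files. It does not check the 30-minute duration limit, which would require decoding the audio.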
The sample response format is:
SpeechTranscribeResponse(
    request_id="35580dcf-xxxx-4666-8d75-xxxxxxxxxx",
    text=(
        "I'm having some power issues, I won't be available for a bit. "
        "All right. Please let me know when you are back."
    ),
    timestamps=[
        Timestamp(
            start=1.20,
            end=4.64,
            text="I'm having some power issues, I won't be available for a bit."
        ),
        Timestamp(
            start=5.80,
            end=6.72,
            text="All right."
        ),
        Timestamp(
            start=8.08,
            end=9.92,
            text="Please let me know when you are back."
        ),
    ]
)
You can read individual fields of this response, such as text or timestamps.
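As a rough illustration of reading those fields, the sketch below formats each sentence-level timestamp as a single line. To keep it runnable offline, it defines stand-in Timestamp and SpeechTranscribeResponse dataclasses that only mirror the field names shown in the sample response; they are not the SDK's own classes. It also assumes attribute-style access (response.text, response.timestamps), and format_segments is a hypothetical helper.

```python
from dataclasses import dataclass, field
from typing import List


# Stand-in types mirroring the sample response; the real objects come from the SDK.
@dataclass
class Timestamp:
    start: float
    end: float
    text: str


@dataclass
class SpeechTranscribeResponse:
    request_id: str
    text: str
    timestamps: List[Timestamp] = field(default_factory=list)


def format_segments(response) -> List[str]:
    """Render each timestamp as '[start-end] text', with times in seconds."""
    return [f"[{t.start:.2f}-{t.end:.2f}] {t.text}" for t in response.timestamps]


response = SpeechTranscribeResponse(
    request_id="example",
    text="All right. Please let me know when you are back.",
    timestamps=[
        Timestamp(start=5.80, end=6.72, text="All right."),
        Timestamp(start=8.08, end=9.92, text="Please let me know when you are back."),
    ],
)

print(response.text)  # full transcript
for line in format_segments(response):
    print(line)  # one line per sentence-level segment
```

With a real response from client.speech.transcribe, the same helper should work as long as the returned object exposes start, end, and text on each timestamp entry.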
Mansa.v1 currently supports only English. Support for additional languages will be available in upcoming releases.
Parameter        Type                    Accepted values
model            string                  “mansa_v1”
timestamp        string                  “sentence”, “word”, “none”
special_words    comma-separated list
You can use special_words to guide Mansa’s named entity recognition. When you pass a list of entity strings, Mansa uses it to accurately recognize and transcribe those terms in your audio.
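If your terms arrive as a single comma-separated string (for example, from a config file or form field), they need to be turned into a list before being passed as special_words, as in the code sample above. The sketch below is one way to do that; parse_special_words is a hypothetical helper, not part of the SDK, and dropping empty entries and exact duplicates is a design choice rather than a documented requirement.

```python
from typing import List


def parse_special_words(raw: str) -> List[str]:
    """Split a comma-separated string into a trimmed list without exact duplicates."""
    seen = set()
    words = []
    for term in raw.split(","):
        term = term.strip()
        # Skip empty entries (e.g. from ",,") and terms already collected.
        if term and term not in seen:
            seen.add(term)
            words.append(term)
    return words


special_words = parse_special_words("Spitch, Mansa,, Spitch , Lagos")
print(special_words)
```

The resulting list can then be passed directly as the special_words argument. Order is preserved, so the terms appear in the request in the order they were entered.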