Mansa is a 1.4B-parameter ASR model optimized for African languages, offering high transcription accuracy and low-latency performance. Some of its key features are:
African named entity recognition in English contexts
Custom spelling guidance for names and specialized terms
Sentence- or word-level timestamps for audio up to 30 minutes (25 MB)
Before you try out the new model, make sure you're using the latest version of our SDK.
pip install "spitch>=1.34.0"
Ready to give it a spin? Use the code sample below to get started.
from spitch import Spitch

client = Spitch(api_key="YOUR-API-KEY")

response = client.speech.transcribe(
    content=open("YOUR-AUDIO-FILE", "rb"),
    language="en",
    model="mansa_v1",
    timestamps="sentence",  # Choose "sentence" or "word" level timestamps
    special_words=["Spitch"],  # Add your special words here
)
The sample response format is:
SpeechTranscribeResponse(
    request_id="35580dcf-xxxx-4666-8d75-xxxxxxxxxx",
    text=(
        "I'm having some power issues, I won't be available for a bit. "
        "All right. Please let me know when you are back."
    ),
    timestamps=[
        Timestamp(
            start=1.20,
            end=4.64,
            text="I'm having some power issues, I won't be available for a bit.",
        ),
        Timestamp(start=5.80, end=6.72, text="All right."),
        Timestamp(start=8.08, end=9.92, text="Please let me know when you are back."),
    ],
)
You can read specific fields from this response, such as text or timestamps.
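As a minimal sketch of working with those fields, the snippet below formats each timestamped segment for display. The Timestamp stand-in here is hypothetical and only mirrors the start, end, and text attributes shown in the sample response above; in real code you would iterate over response.timestamps instead.

```python
from collections import namedtuple

# Hypothetical stand-in mirroring the SDK's Timestamp objects, which
# (per the sample response above) expose start, end, and text.
Timestamp = namedtuple("Timestamp", ["start", "end", "text"])

segments = [
    Timestamp(1.20, 4.64, "I'm having some power issues, I won't be available for a bit."),
    Timestamp(5.80, 6.72, "All right."),
]

def format_segments(segments):
    """Render each segment as '[start-end] text' for display or logging."""
    return [f"[{s.start:.2f}-{s.end:.2f}] {s.text}" for s in segments]

for line in format_segments(segments):
    print(line)
```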
Mansa v1 currently supports English only. Additional language support will be available in upcoming releases.
You can use special_words to guide Mansa's named entity recognition: by providing a list of entity strings, you help Mansa accurately recognize and transcribe those terms in your audio.