Prerequisites

  • Python 3.7 or higher
  • Spitch API key
  • Livekit API key, API secret, and URL

Installation & Setup

Install the following module into your Python environment and set up your *.env* file with the parameters below.
pip install "livekit-agents[spitch]~=1.0"

.env
SPITCH_API_KEY=<Your Spitch API Key>
LIVEKIT_API_KEY=<Your Livekit API Key>
LIVEKIT_API_SECRET=<Your Livekit API Secret>
LIVEKIT_URL=<Your Livekit URL>
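Before running any of the snippets below, it can help to confirm these variables are actually visible to your process. This is a minimal sketch using only the standard library; `require_env` is a hypothetical helper for this guide, not part of the LiveKit or Spitch SDKs:

```python
import os

# The variable names match the .env file above.
REQUIRED_VARS = ("SPITCH_API_KEY", "LIVEKIT_API_KEY", "LIVEKIT_API_SECRET", "LIVEKIT_URL")

def require_env(names=REQUIRED_VARS):
    """Return the names of any variables that are unset or empty, so startup can fail fast."""
    return [name for name in names if not os.environ.get(name)]

missing = require_env()
if missing:
    print(f"Missing environment variables: {', '.join(missing)}")
```

If you load your `.env` file with `python-dotenv` (as the agent script later in this guide does), call `load_dotenv()` before running this check.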

Integrating Spitch STT with Livekit

from livekit.agents import AgentSession
from livekit.plugins import spitch

session = AgentSession(
    stt=spitch.STT(
        language="en",
    ),
)

STT Parameters

language (string, required): Check here for language options
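If you want to fail fast on an unsupported language before opening a session, you can validate the code up front. The set below is an assumption based on the languages Spitch focuses on; treat the linked language options as the authoritative list and adjust accordingly:

```python
# Assumed language codes (English, Yoruba, Hausa, Igbo, Amharic);
# verify against the official language options before relying on this set.
SPITCH_LANGUAGES = {"en", "yo", "ha", "ig", "am"}

def check_language(code: str) -> str:
    """Raise early if the language code is not in the known set."""
    if code not in SPITCH_LANGUAGES:
        raise ValueError(f"Unsupported language code: {code!r}")
    return code
```

You would then call `check_language("en")` before passing the code to `spitch.STT(language=...)`.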

Integrating Spitch TTS with Livekit

from livekit.agents import AgentSession
from livekit.plugins import spitch

session = AgentSession(
    tts=spitch.TTS(
        language="en",
        voice="lina",
    ),
)

TTS Parameters

language (string, required, default: "en"): Check here for language options
voice (string, required): Check here for voice options

Building a Voice Agent with Spitch

In this section, we will build an end-to-end voice agent using Spitch and Livekit. The flowchart below summarizes the architecture of our agent.

To get started, you'll need API keys for Spitch, Livekit, and your preferred LLM. We will use OpenAI in this demo. To find out more about which LLMs are available on Livekit, check out this doc.
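The architecture can also be sketched in plain Python: the agent transcribes incoming audio (STT), sends the transcript to the LLM, and synthesizes the reply back to speech (TTS), with VAD and turn detection deciding when a turn starts and ends. The functions below are illustrative stand-ins, not the LiveKit API:

```python
def transcribe(audio: bytes) -> str:
    # Stand-in for spitch.STT: audio in, transcript out.
    return "hello"

def generate(text: str) -> str:
    # Stand-in for the LLM: user text in, reply text out.
    return f"You said: {text}"

def synthesize(text: str) -> bytes:
    # Stand-in for spitch.TTS: reply text in, audio out.
    return text.encode()

def handle_turn(audio: bytes) -> bytes:
    # One conversational turn: STT -> LLM -> TTS.
    return synthesize(generate(transcribe(audio)))
```

In the real agent, `AgentSession` wires these stages together for you; this sketch only shows the order in which data flows through them.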

Install packages

Install the following packages on your local computer to get started.
pip install \
  "livekit-agents[spitch,openai,silero,turn-detector]~=1.0" \
  "livekit-plugins-noise-cancellation~=0.2" \
  "python-dotenv"

Set up your environment variables

.env
SPITCH_API_KEY=<Your Spitch API Key>
OPENAI_API_KEY=<Your OpenAI API Key>
LIVEKIT_API_KEY=<your API Key>
LIVEKIT_API_SECRET=<your API Secret>
LIVEKIT_URL=<Your Livekit URL>

Voice Agent Code

agent.py
from dotenv import load_dotenv
from livekit import agents
from livekit.agents import AgentSession, Agent, RoomInputOptions
from livekit.plugins import (
    spitch,
    openai,
    noise_cancellation,
    silero,
)
from livekit.plugins.turn_detector.multilingual import MultilingualModel

load_dotenv()

class Assistant(Agent):
    def __init__(self) -> None:
        super().__init__(instructions="You are a helpful voice AI assistant.")  # You can change the instruction to meet your unique use case.

async def entrypoint(ctx: agents.JobContext):
    session = AgentSession(
        stt=spitch.STT(language="en"),
        llm=openai.LLM(model="gpt-4o-mini"),
        tts=spitch.TTS(language="en", voice="lina"),
        vad=silero.VAD.load(),  # Voice Activity Detection model
        turn_detection=MultilingualModel(),  # Turn detection model
    )

    await session.start(
        room=ctx.room,
        agent=Assistant(),
        room_input_options=RoomInputOptions(
            noise_cancellation=noise_cancellation.BVC(),
        ),
    )

    await session.generate_reply(
        instructions="Greet the user and offer your assistance."
    )


if __name__ == "__main__":
    agents.cli.run_app(agents.WorkerOptions(entrypoint_fnc=entrypoint))

Running the Agent

Download model files

Download the model files first to use the turn-detector, silero, or noise-cancellation plugins:
python agent.py download-files

Speak to the agent

Start your agent in console mode to run inside your terminal:
python agent.py console

Connect the agent to the playground

Start the agent in dev mode to connect it to LiveKit and make it available from anywhere on the internet:
python agent.py dev

Additional Resources