# Streaming TTS Quickstart
This tutorial streams text to the Truebar WebSocket API and plays back the synthesised audio. It mirrors the STT quickstart but focuses on text-to-speech (TTS) pipelines.
## Prerequisites
- Credentials for the OAuth password grant flow (username, password, and client ID). Store them in a `.env` file or a secrets manager; avoid checking them into version control.
- Runtime requirements:
  - Node.js 18+ with ES modules enabled (`"type": "module"` in `package.json` or an `.mjs` extension).
  - Python 3.10+ with `pip` and a working C toolchain (required by `soundfile`/`libsndfile`).
  - FFmpeg (for the `ffplay` utility) or an alternative audio player.
If you are using a non-production cluster, replace hostnames and WebSocket URLs with the ones provided by your environment administrator.
## Prepare environment variables
Load the following variables in your shell (or write them to `.env` and use a loader such as `direnv` or `dotenv-cli`):
```bash
export TRUEBAR_USERNAME="alice@example.com"
export TRUEBAR_PASSWORD="super-secret"
export TRUEBAR_CLIENT_ID="truebar-client"
export TRUEBAR_AUTH_URL="https://auth.true-bar.si/realms/truebar/protocol/openid-connect/token"
export TRUEBAR_TTS_WS_URL="wss://api.true-bar.si/api/pipelines/stream"
```

## What you will build
- Authenticate with Truebar and attach the resulting bearer token to the TTS WebSocket handshake (the samples below send it as an `Authorization` header).
- Stream text segments to the pipeline and receive raw 16 kHz PCM audio chunks in response.
- Persist the stream locally (`speech.pcm` or `speech.wav`) and play it back for verification.
Expect initial configuration round trips to add ~150–300 ms latency depending on the selected voice and preprocessing stages. See the troubleshooting notes below for guidance on rate limits and payload sizing.
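Once a run completes, you can sanity-check the output size against these expectations: at 16 kHz mono PCM16, every second of audio occupies exactly 16,000 × 2 = 32,000 bytes. A minimal sketch, assuming the default `speech.pcm` output name used below:

```python
from pathlib import Path

# 16 kHz mono PCM16LE: 16000 samples/s * 2 bytes/sample = 32000 bytes per second.
pcm_bytes = Path("speech.pcm").read_bytes()
print(f"Synthesised {len(pcm_bytes) / 32_000:.2f} s of audio")
```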
**Voice tag**

Set `TRUEBAR_TTS_TAG` to the voice you intend to use (discoverable via `GET /api/pipelines/stages` or your `.env.truebar`). Riva voices also require the accompanying `TRUEBAR_TTS_NLP_*` tags shown later in this guide.
## 1. Run the sample
### JavaScript (Node.js)
Install dependencies and run the script below:
```bash
npm init -y
npm pkg set type=module
npm install axios ws
node tts.js
```

**Node environment**

If your credentials live in `.env`, launch the script with `npx dotenv -e .env -- node tts.js` (or configure your shell to export the variables before running `node`). The sample requires Node.js 18+.
```js
import axios from "axios";
import WebSocket from "ws";
import { createWriteStream } from "node:fs";

const isRivaTag = (tag) => !!tag && /^(RIVA|RIVA-STREAM|VIT):/i.test(tag);

const parseJsonEnv = (name) => {
  const raw = process.env[name];
  if (!raw) return {};
  try {
    const parsed = JSON.parse(raw);
    return typeof parsed === "object" && parsed !== null ? parsed : {};
  } catch (error) {
    throw new Error(`Invalid JSON in ${name}: ${raw}`);
  }
};

async function fetchToken() {
  const form = new URLSearchParams({
    grant_type: "password",
    username: process.env.TRUEBAR_USERNAME,
    password: process.env.TRUEBAR_PASSWORD,
    client_id: process.env.TRUEBAR_CLIENT_ID ?? "truebar-client",
  });

  const { data } = await axios.post(process.env.TRUEBAR_AUTH_URL, form, {
    headers: { "Content-Type": "application/x-www-form-urlencoded" },
    // Retry guidance: a 401 means the password or client ID is wrong or expired.
    validateStatus: (status) => status >= 200 && status < 500,
  });

  if (!data.access_token) {
    throw new Error(`Failed to obtain token: ${JSON.stringify(data)}`);
  }

  return data.access_token;
}

function buildPipelineConfig() {
  const tag = process.env.TRUEBAR_TTS_TAG ?? "RIVA:en-US:*:*";
  const pipeline = [];

  if (isRivaTag(tag) && process.env.TRUEBAR_TTS_NLP_ST_TAG) {
    pipeline.push({
      task: "NLP_st",
      exceptionHandlingPolicy: "THROW",
      config: {
        tag: process.env.TRUEBAR_TTS_NLP_ST_TAG,
        parameters: parseJsonEnv("TRUEBAR_TTS_NLP_ST_PARAMETERS"),
      },
    });
  }

  if (isRivaTag(tag) && process.env.TRUEBAR_TTS_NLP_TN_TAG) {
    pipeline.push({
      task: "NLP_tn",
      exceptionHandlingPolicy: "THROW",
      config: {
        tag: process.env.TRUEBAR_TTS_NLP_TN_TAG,
        parameters: parseJsonEnv("TRUEBAR_TTS_NLP_TN_PARAMETERS"),
      },
    });
  }

  if (isRivaTag(tag) && process.env.TRUEBAR_TTS_NLP_G2A_TAG) {
    pipeline.push({
      task: "NLP_g2a",
      exceptionHandlingPolicy: "THROW",
      config: {
        tag: process.env.TRUEBAR_TTS_NLP_G2A_TAG,
        parameters: parseJsonEnv("TRUEBAR_TTS_NLP_G2A_PARAMETERS"),
      },
    });
  }

  return {
    type: "CONFIG",
    pipeline: [
      ...pipeline,
      {
        task: "TTS",
        exceptionHandlingPolicy: "THROW",
        config: {
          tag,
          parameters: parseJsonEnv("TRUEBAR_TTS_PARAMETERS"),
        },
      },
    ],
  };
}

const useSsml = () => {
  const flag = process.env.TRUEBAR_TTS_ENABLE_SSML;
  return !!flag && ["1", "true", "yes", "on"].includes(flag.toLowerCase());
};

const wrapText = (text) => {
  const rate = process.env.TRUEBAR_TTS_SSML_RATE?.trim();
  const rateAttr = rate ? ` rate="${rate}"` : "";
  // Escape XML special characters before embedding the text in SSML.
  const escaped = text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
  return `<speak><prosody${rateAttr}>${escaped}</prosody></speak>`;
};

async function main() {
  const token = await fetchToken();
  const ws = new WebSocket(process.env.TRUEBAR_TTS_WS_URL, {
    headers: { Authorization: `Bearer ${token}` },
  });
  const out = createWriteStream("speech.pcm");
  let streamed = false;
  const text =
    "Hello from the Truebar streaming API. This sample generates audio on demand.";

  ws.on("message", (payload, isBinary) => {
    // Binary frames carry raw PCM audio; text frames carry JSON protocol messages.
    if (isBinary) {
      out.write(payload);
      return;
    }

    const msg = JSON.parse(payload.toString());
    if (msg.type === "STATUS") {
      console.log("STATUS:", msg.status);
      if (msg.status === "CONFIGURED" && !streamed) {
        streamed = true;
        const body = useSsml() ? wrapText(text) : text;
        ws.send(
          JSON.stringify({
            type: "TEXT_SEGMENT",
            textSegment: { text: body },
          }),
        );
        ws.send(JSON.stringify({ type: "EOS", lockSession: false }));
      }
      if (msg.status === "FINISHED") {
        out.end();
        ws.close();
      }
      return;
    }

    if (msg.type === "ERROR") {
      console.error("Pipeline error", msg);
    }
  });

  // The server speaks first: reply to its initial status message with the CONFIG.
  ws.once("message", () => {
    ws.send(JSON.stringify(buildPipelineConfig()));
  });

  ws.on("open", () => console.log("TTS stream connected"));
  ws.on("close", () => console.log("TTS stream closed"));
  ws.on("error", (err) => console.error("TTS error", err));
}

main().catch((err) => {
  console.error("Fatal error", err);
  process.exit(1);
});
```

Listen to the output (PCM16LE, 16 kHz mono):
```bash
ffplay -f s16le -ar 16000 -ac 1 speech.pcm
```

**No ffplay?**

Install FFmpeg (`brew install ffmpeg`, `sudo apt-get install ffmpeg`, or `choco install ffmpeg`). As alternatives, convert the raw PCM to WAV with `ffmpeg -f s16le -ar 16000 -ac 1 -i speech.pcm speech.wav` and play it via `afplay` (macOS) or any desktop audio editor (Windows/Audacity).
### Python

Set up a virtual environment and run the sample:
```bash
python -m venv .venv
source .venv/bin/activate  # On Windows use: .venv\Scripts\activate
pip install aiohttp numpy soundfile
python tts.py
```

**Python environment**

`soundfile` depends on `libsndfile`. Install it via `sudo apt-get install libsndfile1` (Debian/Ubuntu), `brew install libsndfile` (macOS), or prebuilt binaries on Windows. If installing native dependencies is not feasible, swap `soundfile` for Python's built-in `wave` module and keep the PCM stream, as sketched below.
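Here is a minimal sketch of that `wave`-based fallback; the `pcm_to_wav` helper is illustrative, not part of the tutorial code:

```python
import wave

def pcm_to_wav(pcm_bytes: bytes, path: str = "speech.wav", rate: int = 16_000) -> None:
    """Write raw PCM16LE mono audio to a WAV file without libsndfile."""
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)     # mono
        wav.setsampwidth(2)     # 16-bit samples
        wav.setframerate(rate)  # 16 kHz
        wav.writeframes(pcm_bytes)
```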
```python
import asyncio
import json
import os
import re
from html import escape
from typing import Any

import aiohttp
import numpy as np
import soundfile as sf


async def fetch_token(session: aiohttp.ClientSession) -> str:
    payload = {
        "grant_type": "password",
        "username": os.environ["TRUEBAR_USERNAME"],
        "password": os.environ["TRUEBAR_PASSWORD"],
        "client_id": os.getenv("TRUEBAR_CLIENT_ID", "truebar-client"),
    }
    async with session.post(os.environ["TRUEBAR_AUTH_URL"], data=payload) as resp:
        if resp.status == 401:
            raise RuntimeError(
                "Authentication failed. Check username, password, or client permissions."
            )
        resp.raise_for_status()
        data = await resp.json()
        return data["access_token"]


def load_tts_parameters() -> dict[str, Any]:
    raw = os.getenv("TRUEBAR_TTS_PARAMETERS")
    if not raw:
        return {}
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return {}
    return data if isinstance(data, dict) else {}


def load_stage_parameters(env_var: str) -> dict[str, Any]:
    raw = os.getenv(env_var)
    if not raw:
        return {}
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        raise RuntimeError(f"Invalid JSON in {env_var}")
    return data if isinstance(data, dict) else {}


def build_pipeline() -> list[dict[str, Any]]:
    tag = os.getenv("TRUEBAR_TTS_TAG", "RIVA:en-US:*:*")
    is_riva = bool(re.match(r"^(RIVA|RIVA-STREAM|VIT):", tag, re.IGNORECASE))
    pipeline = []

    # Riva voices expect the NLP pre-processing stages ahead of the TTS stage.
    if is_riva and os.getenv("TRUEBAR_TTS_NLP_ST_TAG"):
        pipeline.append({
            "task": "NLP_st",
            "exceptionHandlingPolicy": "THROW",
            "config": {
                "tag": os.environ["TRUEBAR_TTS_NLP_ST_TAG"],
                "parameters": load_stage_parameters("TRUEBAR_TTS_NLP_ST_PARAMETERS"),
            },
        })

    if is_riva and os.getenv("TRUEBAR_TTS_NLP_TN_TAG"):
        pipeline.append({
            "task": "NLP_tn",
            "exceptionHandlingPolicy": "THROW",
            "config": {
                "tag": os.environ["TRUEBAR_TTS_NLP_TN_TAG"],
                "parameters": load_stage_parameters("TRUEBAR_TTS_NLP_TN_PARAMETERS"),
            },
        })

    if is_riva and os.getenv("TRUEBAR_TTS_NLP_G2A_TAG"):
        pipeline.append({
            "task": "NLP_g2a",
            "exceptionHandlingPolicy": "THROW",
            "config": {
                "tag": os.environ["TRUEBAR_TTS_NLP_G2A_TAG"],
                "parameters": load_stage_parameters("TRUEBAR_TTS_NLP_G2A_PARAMETERS"),
            },
        })

    pipeline.append({
        "task": "TTS",
        "exceptionHandlingPolicy": "THROW",
        "config": {
            "tag": tag,
            "parameters": load_tts_parameters(),
        },
    })

    return pipeline


def use_ssml() -> bool:
    flag = os.getenv("TRUEBAR_TTS_ENABLE_SSML")
    if flag and flag.lower() in {"1", "true", "yes", "on"}:
        return True
    return bool(os.getenv("TRUEBAR_TTS_SSML_RATE"))


def wrap_text(text: str) -> str:
    rate = os.getenv("TRUEBAR_TTS_SSML_RATE")
    rate_attr = f' rate="{rate.strip()}"' if rate else ""
    return f"<speak><prosody{rate_attr}>{escape(text)}</prosody></speak>"


async def main() -> None:
    async with aiohttp.ClientSession() as session:
        token = await fetch_token(session)

        ws_url = os.environ["TRUEBAR_TTS_WS_URL"]
        headers = {"Authorization": f"Bearer {token}"}

        async with session.ws_connect(ws_url, headers=headers, heartbeat=30) as ws:
            configured = asyncio.Event()
            audio_chunks: list[bytes] = []

            await ws.send_json({"type": "CONFIG", "pipeline": build_pipeline()})

            text = (
                "Hello from the Truebar streaming API. "
                "This sample generates audio on demand."
            )

            async def sender():
                try:
                    await asyncio.wait_for(configured.wait(), timeout=5.0)
                except asyncio.TimeoutError:
                    print("CONFIGURED status not received in time; streaming anyway.")

                payload = wrap_text(text) if use_ssml() else text
                await ws.send_json({
                    "type": "TEXT_SEGMENT",
                    "textSegment": {"text": payload},
                })
                await ws.send_json({"type": "EOS", "lockSession": False})

            async def receiver():
                async for msg in ws:
                    if msg.type in (aiohttp.WSMsgType.CLOSE, aiohttp.WSMsgType.CLOSING):
                        break
                    if msg.type == aiohttp.WSMsgType.BINARY:
                        audio_chunks.append(bytes(msg.data))
                        continue
                    if msg.type != aiohttp.WSMsgType.TEXT:
                        continue

                    data = msg.json()
                    message_type = data.get("type")

                    if message_type == "STATUS":
                        status = data.get("status")
                        print("STATUS:", status)
                        if status in {"INITIALIZED", "CONFIG_REQUIRED", "CONFIGURED"}:
                            configured.set()
                        if status == "FINISHED":
                            break
                        continue

                    if message_type == "ERROR":
                        configured.set()
                        raise RuntimeError(f"Pipeline returned error: {data}")

            await asyncio.gather(sender(), receiver())

    if not audio_chunks:
        raise RuntimeError("Pipeline returned no audio")

    # Interpret the stream as little-endian 16-bit PCM and save it as WAV.
    pcm = np.frombuffer(b"".join(audio_chunks), dtype="<i2")
    sf.write("speech.wav", pcm, 16_000)
    print("Saved synthesis to speech.wav")


if __name__ == "__main__":
    try:
        asyncio.run(main())
    except KeyboardInterrupt:
        print("Cancelled by user.")
```

**Text payload format**
Truebar accepts a full sentence per `TEXT_SEGMENT`. Set `TRUEBAR_TTS_ENABLE_SSML=true` (and optionally `TRUEBAR_TTS_SSML_RATE`) to wrap the payload as SSML, matching the production implementation.
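For illustration, with `TRUEBAR_TTS_ENABLE_SSML=true` and `TRUEBAR_TTS_SSML_RATE=95%`, the `wrap_text` helper above would produce a frame like the following (the rate value is an example, not a required setting):

```json
{
  "type": "TEXT_SEGMENT",
  "textSegment": {
    "text": "<speak><prosody rate=\"95%\">Hello from the Truebar streaming API.</prosody></speak>"
  }
}
```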
Play the resulting WAV file:
```bash
ffplay speech.wav
```

**Alternative players**

Any media player that supports 16 kHz mono WAV (e.g. QuickTime, Windows Media Player, VLC) works here. If you cannot install FFmpeg, double-click `speech.wav` or run `open speech.wav` (macOS) / `start speech.wav` (Windows).
### Run the Python sample in Jupyter Notebook
```python
%pip install aiohttp numpy soundfile nest_asyncio

import nest_asyncio, asyncio, os
from tts import main  # assumes the tutorial code lives in tts.py next to the notebook
```

```python
os.environ["TRUEBAR_USERNAME"] = "alice@example.com"
os.environ["TRUEBAR_PASSWORD"] = "super-secret-passphrase"
os.environ["TRUEBAR_CLIENT_ID"] = "truebar-client"
os.environ["TRUEBAR_AUTH_URL"] = "https://playground-auth.true-bar.si/realms/truebar/protocol/openid-connect/token"
os.environ["TRUEBAR_TTS_WS_URL"] = "wss://playground-api.true-bar.si/api/pipelines/stream"
```

```python
nest_asyncio.apply()  # allows asyncio.run inside Jupyter
await main()
```

Play the generated audio directly in the notebook:

```python
from IPython.display import Audio

Audio("speech.wav")
```

## 2. Configure voices and pipeline stages
- Voice tags follow the pattern `FRAMEWORK:locale:voice:version`. Run `GET /api/pipelines/stages/TTS` to list everything your account can access. Set `TRUEBAR_TTS_TAG` to lock the example to a specific voice.
- Stage parameters let you tune a specific stage.
- Pre-processing stages (punctuation, text normalisation, accentuation) run before synthesis. Add additional objects ahead of the `TTS` task in the `pipeline` array to clean or enrich text before it reaches the voice.
- Segmenting text: use multiple `TEXT_SEGMENT` messages for long content. Keep individual segments at least one sentence long and under a few sentences (<4 KB payload) to avoid hitting per-message limits and to reduce latency. A chunking sketch follows this list.
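If you need to stream longer documents, the sketch below shows one way to chunk text into sentence-sized segments before sending them. The `segment_text` helper and the 4 KB ceiling mirror the guidance above but are not part of any official SDK:

```python
import re

def segment_text(text: str, max_bytes: int = 4_000) -> list[str]:
    """Split on sentence boundaries, keeping each segment under max_bytes.

    A single sentence longer than max_bytes still becomes its own segment.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    segments: list[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate.encode("utf-8")) > max_bytes:
            segments.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        segments.append(current)
    return segments

# Inside the sender() coroutine you could then stream each segment:
# for segment in segment_text(long_text):
#     await ws.send_json({"type": "TEXT_SEGMENT", "textSegment": {"text": segment}})
# await ws.send_json({"type": "EOS", "lockSession": False})
```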
### Enable the Riva multi-stage pipeline
Truebar’s Riva voices expect a four-stage pipeline (sentence tokenization → text normalisation → grapheme-to-accentuation → TTS). Configure it by exporting the NLP stage tags and turning on SSML support:
```bash
export TRUEBAR_TTS_TAG="RIVA:sl-SI:ZIGA:20250408-2103"
export TRUEBAR_TTS_ENABLE_SSML=true  # wrap payload in <speak> automatically
export TRUEBAR_TTS_NLP_ST_TAG="VIT:sl-SI:*:1.0.0"
export TRUEBAR_TTS_NLP_TN_TAG="VIT:sl-SI:*:*"
export TRUEBAR_TTS_NLP_G2A_TAG="VIT:sl-SI:*:*"
export TRUEBAR_TTS_NLP_ST_PARAMETERS='{"processSsml": true}'
export TRUEBAR_TTS_NLP_G2A_PARAMETERS='{"fastEnabled": true}'
```

With those variables set, the tutorial's Python and Node.js samples emit a CONFIG payload similar to:
{ "type": "CONFIG", "pipeline": [ {"task": "NLP_st", "exceptionHandlingPolicy": "THROW", "config": {"tag": "VIT:sl-SI:*:1.0.0", "parameters": {"processSsml": true}}}, {"task": "NLP_tn", "exceptionHandlingPolicy": "THROW", "config": {"tag": "VIT:sl-SI:*:*", "parameters": {}}}, {"task": "NLP_g2a", "exceptionHandlingPolicy": "THROW", "config": {"tag": "VIT:sl-SI:*:*", "parameters": {"fastEnabled": true}}}, {"task": "TTS", "exceptionHandlingPolicy": "THROW", "config": {"tag": "RIVA:sl-SI:ZIGA:20250408-2103", "parameters": {}}} ]}You can discover valid tags and parameters via GET /api/pipelines/stages (use the token from the earlier step) or by copying them from your working .env.truebar. Toggle TRUEBAR_TTS_ENABLE_SSML off if you want to send plain text through the same pipeline.
Both the Node.js and Python samples automatically add these NLP stages when the active tag belongs to the Riva family (a `RIVA`, `RIVA-STREAM`, or `VIT` prefix).
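As a starting point for discovery, a minimal sketch that lists the stages your account can use; it reuses `fetch_token` from `tts.py` above and assumes the production API hostname from this guide (swap in your cluster's hostname if needed):

```python
import asyncio
import aiohttp

from tts import fetch_token  # the auth helper defined earlier in this guide

async def list_stages() -> None:
    async with aiohttp.ClientSession() as session:
        token = await fetch_token(session)
        url = "https://api.true-bar.si/api/pipelines/stages"
        async with session.get(url, headers={"Authorization": f"Bearer {token}"}) as resp:
            resp.raise_for_status()
            print(await resp.json())

asyncio.run(list_stages())
```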
## 3. Troubleshooting and production readiness
- 401 responses typically indicate wrong or expired credentials, while 403 responses mean the authenticated user lacks the required permissions.
- Watch for 429 rate-limit responses, which mean you have hit a predefined quota; back off and retry (see the sketch after this list).
- Log and persist pipeline errors from the `ERROR` message type. They often contain actionable details (e.g. an unsupported SSML tag or a missing pipeline stage).
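A minimal backoff sketch for 429s around the token request; the retry count and delays are illustrative, not Truebar-mandated values:

```python
import asyncio
import aiohttp

from tts import fetch_token  # the auth helper defined earlier in this guide

async def fetch_token_with_backoff(session: aiohttp.ClientSession, retries: int = 5) -> str:
    """Retry on 429 with exponential backoff; other errors propagate immediately."""
    delay = 1.0
    for attempt in range(retries):
        try:
            return await fetch_token(session)
        except aiohttp.ClientResponseError as err:
            if err.status != 429 or attempt == retries - 1:
                raise
            await asyncio.sleep(delay)
            delay *= 2  # 1 s, 2 s, 4 s, ...
    raise RuntimeError("unreachable")
```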
## 4. Next steps
- Prototype advanced SSML features (say-as, emphasis, phonemes) and combine them with preprocessing stages to keep payloads readable.
- Review the Streaming TTS guide for protocol details, SSML authoring tips, and pipeline discovery queries.
- Instrument your application with observability (request IDs, audio durations, jitter) before deploying to production; a minimal metrics sketch follows this list.
- If you also need transcription, try the Streaming STT quickstart and experiment with duplex pipelines.
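One possible shape for per-synthesis metrics, assuming you hook `record_chunk` into the `receiver()` loop where binary frames arrive; the class and field names are illustrative, not a Truebar convention:

```python
import time

class SynthesisMetrics:
    """Track time-to-first-audio and output volume for one synthesis run."""

    def __init__(self) -> None:
        self.started = time.monotonic()
        self.first_audio: float | None = None
        self.total_bytes = 0

    def record_chunk(self, chunk: bytes) -> None:
        if self.first_audio is None:
            self.first_audio = time.monotonic() - self.started
        self.total_bytes += len(chunk)

    def summary(self) -> str:
        ttfa = f"{self.first_audio:.3f}s" if self.first_audio is not None else "n/a"
        # 32,000 bytes per second at 16 kHz mono PCM16.
        return (
            f"first_audio={ttfa} bytes={self.total_bytes} "
            f"audio_seconds={self.total_bytes / 32_000:.2f}"
        )
```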