# Streaming TTS Quickstart
This tutorial streams text to the Truebar WebSocket API and plays back the synthesised audio. It mirrors the STT quickstart but focuses on text-to-speech (TTS) pipelines.
## Prerequisites

- A Truebar tenant with Streaming TTS enabled for your project. Ask your workspace admin to grant the Streaming pipelines privilege to a dedicated service user instead of your personal login.
- Credentials for the OAuth password grant flow (username, password, and client ID). Store them in a `.env` file or secrets manager—avoid checking them into version control.
- Runtime requirements:
  - Node.js 18+ with ES modules enabled (`"type": "module"` in `package.json` or an `.mjs` extension).
  - Python 3.10+ with `pip` and a working C toolchain (required by `soundfile`/libsndfile).
  - FFmpeg (for the `ffplay` utility) or an alternative audio player.
If you are using a non-production cluster, replace hostnames and WebSocket URLs with the ones provided by your environment administrator.
## Prepare environment variables

Load the following variables in your shell (or write them to `.env` and use a loader such as `direnv` or `dotenv-cli`):
```bash
export TRUEBAR_USERNAME="alice@example.com"
export TRUEBAR_PASSWORD="super-secret"
export TRUEBAR_CLIENT_ID="truebar-client"
export TRUEBAR_AUTH_URL="https://auth.true-bar.si/realms/truebar/protocol/openid-connect/token"
export TRUEBAR_TTS_WS_URL="wss://api.true-bar.si/api/pipelines/stream"
```
Rotate credentials regularly and prefer short-lived passwords or tokens where possible.
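Before running either sample, it can help to fail fast on missing configuration. A minimal sketch (the variable names match the exports above; the check itself is an addition, not part of the samples):

```python
import os
import sys

# Variables the samples read; TRUEBAR_CLIENT_ID falls back to a default.
REQUIRED = (
    "TRUEBAR_USERNAME",
    "TRUEBAR_PASSWORD",
    "TRUEBAR_AUTH_URL",
    "TRUEBAR_TTS_WS_URL",
)

missing = [name for name in REQUIRED if not os.getenv(name)]
if missing:
    sys.exit(f"Missing environment variables: {', '.join(missing)}")
```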
## What you will build

- Authenticate with Truebar and attach the resulting bearer token as an `Authorization` header on the TTS WebSocket connection.
- Stream text segments to the pipeline and receive raw 16 kHz PCM audio chunks in response.
- Persist the stream locally (`speech.pcm` or `speech.wav`) and play it back for verification.
Expect initial configuration round trips to add ~150–300 ms latency depending on the selected voice and preprocessing stages. See the troubleshooting notes below for guidance on rate limits and payload sizing.
**Voice tag**

Set `TRUEBAR_TTS_TAG` to the voice you intend to use (discoverable via `GET /api/pipelines/stages` or your `.env.truebar`). Riva voices also require the accompanying `TRUEBAR_TTS_NLP_*` tags shown later in this guide.
## 1. Run the sample

### JavaScript (Node.js)

Install dependencies and run the script below:
```bash
npm init -y
npm pkg set type=module
npm install axios ws
node tts.js
```
**Node environment**

If your credentials live in `.env`, launch the script with `npx dotenv -e .env -- node tts.js` (or configure your shell to export the variables before running `node`). The sample requires Node.js 18+.
```js
import axios from "axios";
import WebSocket from "ws";
import { createWriteStream } from "node:fs";

const isRivaTag = (tag) => !!tag && /^(RIVA|RIVA-STREAM|VIT):/i.test(tag);

const parseJsonEnv = (name) => {
  const raw = process.env[name];
  if (!raw) return {};
  try {
    const parsed = JSON.parse(raw);
    return typeof parsed === "object" && parsed !== null ? parsed : {};
  } catch (error) {
    throw new Error(`Invalid JSON in ${name}: ${raw}`);
  }
};

async function fetchToken() {
  const form = new URLSearchParams({
    grant_type: "password",
    username: process.env.TRUEBAR_USERNAME,
    password: process.env.TRUEBAR_PASSWORD,
    client_id: process.env.TRUEBAR_CLIENT_ID ?? "truebar-client",
  });

  const { data } = await axios.post(process.env.TRUEBAR_AUTH_URL, form, {
    headers: { "Content-Type": "application/x-www-form-urlencoded" },
    // Retry guidance: a 401 means the password or client ID is wrong or expired.
    validateStatus: (status) => status >= 200 && status < 500,
  });

  if (!data.access_token) {
    throw new Error(`Failed to obtain token: ${JSON.stringify(data)}`);
  }

  return data.access_token;
}

function buildPipelineConfig() {
  const tag = process.env.TRUEBAR_TTS_TAG ?? "RIVA:en-US:*:*";
  const pipeline = [];

  if (isRivaTag(tag) && process.env.TRUEBAR_TTS_NLP_ST_TAG) {
    pipeline.push({
      task: "NLP_st",
      exceptionHandlingPolicy: "THROW",
      config: {
        tag: process.env.TRUEBAR_TTS_NLP_ST_TAG,
        parameters: parseJsonEnv("TRUEBAR_TTS_NLP_ST_PARAMETERS"),
      },
    });
  }

  if (isRivaTag(tag) && process.env.TRUEBAR_TTS_NLP_TN_TAG) {
    pipeline.push({
      task: "NLP_tn",
      exceptionHandlingPolicy: "THROW",
      config: {
        tag: process.env.TRUEBAR_TTS_NLP_TN_TAG,
        parameters: parseJsonEnv("TRUEBAR_TTS_NLP_TN_PARAMETERS"),
      },
    });
  }

  if (isRivaTag(tag) && process.env.TRUEBAR_TTS_NLP_G2A_TAG) {
    pipeline.push({
      task: "NLP_g2a",
      exceptionHandlingPolicy: "THROW",
      config: {
        tag: process.env.TRUEBAR_TTS_NLP_G2A_TAG,
        parameters: parseJsonEnv("TRUEBAR_TTS_NLP_G2A_PARAMETERS"),
      },
    });
  }

  return {
    type: "CONFIG",
    pipeline: [
      ...pipeline,
      {
        task: "TTS",
        exceptionHandlingPolicy: "THROW",
        config: { tag, parameters: parseJsonEnv("TRUEBAR_TTS_PARAMETERS") },
      },
    ],
  };
}

const useSsml = () => {
  const flag = process.env.TRUEBAR_TTS_ENABLE_SSML;
  return !!flag && ["1", "true", "yes", "on"].includes(flag.toLowerCase());
};

const wrapText = (text) => {
  const rate = process.env.TRUEBAR_TTS_SSML_RATE?.trim();
  const rateAttr = rate ? ` rate="${rate}"` : "";
  // Escape XML special characters before embedding the text in SSML.
  const escaped = text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
  return `<speak><prosody${rateAttr}>${escaped}</prosody></speak>`;
};

async function main() {
  const token = await fetchToken();
  const ws = new WebSocket(process.env.TRUEBAR_TTS_WS_URL, {
    headers: { Authorization: `Bearer ${token}` },
  });
  const out = createWriteStream("speech.pcm");
  let streamed = false;
  const text =
    "Hello from the Truebar streaming API. This sample generates audio on demand.";

  ws.on("message", (payload, isBinary) => {
    if (isBinary) {
      out.write(payload);
      return;
    }

    const msg = JSON.parse(payload.toString());
    if (msg.type === "STATUS") {
      console.log("STATUS:", msg.status);
      if (msg.status === "CONFIGURED" && !streamed) {
        streamed = true;
        const textPayload = useSsml() ? wrapText(text) : text;
        ws.send(
          JSON.stringify({
            type: "TEXT_SEGMENT",
            textSegment: { text: textPayload },
          }),
        );
        ws.send(JSON.stringify({ type: "EOS", lockSession: false }));
      }
      if (msg.status === "FINISHED") {
        out.end();
        ws.close();
      }
      return;
    }

    if (msg.type === "ERROR") {
      console.error("Pipeline error", msg);
    }
  });

  // Send the pipeline configuration as soon as the server's first message arrives.
  ws.once("message", () => {
    ws.send(JSON.stringify(buildPipelineConfig()));
  });

  ws.on("open", () => console.log("TTS stream connected"));
  ws.on("close", () => console.log("TTS stream closed"));
  ws.on("error", (err) => console.error("TTS error", err));
}

main().catch((err) => {
  console.error("Fatal error", err);
  process.exit(1);
});
```
Listen to the output (PCM16LE, 16 kHz mono):

```bash
ffplay -f s16le -ar 16000 -ac 1 speech.pcm
```
**No ffplay?**

Install FFmpeg (`brew install ffmpeg`, `sudo apt-get install ffmpeg`, or `choco install ffmpeg`). As alternatives, convert the raw PCM to WAV with `ffmpeg -f s16le -ar 16000 -ac 1 -i speech.pcm speech.wav` and play it via `afplay` (macOS) or any desktop audio editor (Windows/Audacity).
### Python

Set up a virtual environment and run the sample:
```bash
python -m venv .venv
source .venv/bin/activate  # On Windows use: .venv\Scripts\activate
pip install aiohttp numpy soundfile
python tts.py
```
**Python environment**

`soundfile` depends on libsndfile. Install it via `sudo apt-get install libsndfile1` (Debian/Ubuntu), `brew install libsndfile` (macOS), or pre-built binaries for Windows. If installing native dependencies is not feasible, swap `soundfile` for Python's built-in `wave` module and keep the PCM stream, as sketched below.
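A minimal sketch of that fallback with the standard-library `wave` module, assuming the PCM16LE, 16 kHz mono stream described in this tutorial (`write_wav` is a name chosen here, not part of the samples):

```python
import wave

def write_wav(path: str, pcm_bytes: bytes, sample_rate: int = 16_000) -> None:
    """Wrap raw PCM16LE mono audio in a WAV container without soundfile."""
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)        # mono
        wav.setsampwidth(2)        # 16-bit samples
        wav.setframerate(sample_rate)
        wav.writeframes(pcm_bytes)

# Usage: write_wav("speech.wav", b"".join(audio_chunks))
```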
```python
import asyncio
import json
import os
import re
from html import escape
from typing import Any

import aiohttp
import numpy as np
import soundfile as sf


async def fetch_token(session: aiohttp.ClientSession) -> str:
    payload = {
        "grant_type": "password",
        "username": os.environ["TRUEBAR_USERNAME"],
        "password": os.environ["TRUEBAR_PASSWORD"],
        "client_id": os.getenv("TRUEBAR_CLIENT_ID", "truebar-client"),
    }
    async with session.post(os.environ["TRUEBAR_AUTH_URL"], data=payload) as resp:
        if resp.status == 401:
            raise RuntimeError("Authentication failed. Check username, password, or client permissions.")
        resp.raise_for_status()
        data = await resp.json()
        return data["access_token"]


def load_tts_parameters() -> dict[str, Any]:
    raw = os.getenv("TRUEBAR_TTS_PARAMETERS")
    if not raw:
        return {}
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return {}
    return data if isinstance(data, dict) else {}


def load_stage_parameters(env_var: str) -> dict[str, Any]:
    raw = os.getenv(env_var)
    if not raw:
        return {}
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        raise RuntimeError(f"Invalid JSON in {env_var}")
    return data if isinstance(data, dict) else {}


def build_pipeline() -> list[dict[str, Any]]:
    tag = os.getenv("TRUEBAR_TTS_TAG", "RIVA:en-US:*:*")
    is_riva = bool(re.match(r"^(RIVA|RIVA-STREAM|VIT):", tag, re.IGNORECASE))
    pipeline = []

    if is_riva and os.getenv("TRUEBAR_TTS_NLP_ST_TAG"):
        pipeline.append({
            "task": "NLP_st",
            "exceptionHandlingPolicy": "THROW",
            "config": {
                "tag": os.environ["TRUEBAR_TTS_NLP_ST_TAG"],
                "parameters": load_stage_parameters("TRUEBAR_TTS_NLP_ST_PARAMETERS"),
            },
        })

    if is_riva and os.getenv("TRUEBAR_TTS_NLP_TN_TAG"):
        pipeline.append({
            "task": "NLP_tn",
            "exceptionHandlingPolicy": "THROW",
            "config": {
                "tag": os.environ["TRUEBAR_TTS_NLP_TN_TAG"],
                "parameters": load_stage_parameters("TRUEBAR_TTS_NLP_TN_PARAMETERS"),
            },
        })

    if is_riva and os.getenv("TRUEBAR_TTS_NLP_G2A_TAG"):
        pipeline.append({
            "task": "NLP_g2a",
            "exceptionHandlingPolicy": "THROW",
            "config": {
                "tag": os.environ["TRUEBAR_TTS_NLP_G2A_TAG"],
                "parameters": load_stage_parameters("TRUEBAR_TTS_NLP_G2A_PARAMETERS"),
            },
        })

    pipeline.append({
        "task": "TTS",
        "exceptionHandlingPolicy": "THROW",
        "config": {
            "tag": tag,
            "parameters": load_tts_parameters(),
        },
    })

    return pipeline


def use_ssml() -> bool:
    flag = os.getenv("TRUEBAR_TTS_ENABLE_SSML")
    if flag and flag.lower() in {"1", "true", "yes", "on"}:
        return True
    return bool(os.getenv("TRUEBAR_TTS_SSML_RATE"))


def wrap_text(text: str) -> str:
    rate = os.getenv("TRUEBAR_TTS_SSML_RATE")
    rate_attr = f' rate="{rate.strip()}"' if rate else ""
    return f"<speak><prosody{rate_attr}>{escape(text)}</prosody></speak>"


async def main() -> None:
    async with aiohttp.ClientSession() as session:
        token = await fetch_token(session)

        ws_url = os.environ["TRUEBAR_TTS_WS_URL"]
        headers = {"Authorization": f"Bearer {token}"}

        async with session.ws_connect(ws_url, headers=headers, heartbeat=30) as ws:
            configured = asyncio.Event()
            audio_chunks: list[bytes] = []

            await ws.send_json({"type": "CONFIG", "pipeline": build_pipeline()})

            text = (
                "Hello from the Truebar streaming API. "
                "This sample generates audio on demand."
            )

            async def sender():
                try:
                    await asyncio.wait_for(configured.wait(), timeout=5.0)
                except asyncio.TimeoutError:
                    print("CONFIGURED status not received in time; streaming anyway.")

                payload = wrap_text(text) if use_ssml() else text
                await ws.send_json({
                    "type": "TEXT_SEGMENT",
                    "textSegment": {"text": payload},
                })
                await ws.send_json({"type": "EOS", "lockSession": False})

            async def receiver():
                async for msg in ws:
                    if msg.type in (aiohttp.WSMsgType.CLOSE, aiohttp.WSMsgType.CLOSING):
                        break
                    if msg.type == aiohttp.WSMsgType.BINARY:
                        audio_chunks.append(bytes(msg.data))
                        continue
                    if msg.type != aiohttp.WSMsgType.TEXT:
                        continue

                    data = msg.json()
                    message_type = data.get("type")

                    if message_type == "STATUS":
                        status = data.get("status")
                        print("STATUS:", status)
                        if status in {"INITIALIZED", "CONFIG_REQUIRED", "CONFIGURED"}:
                            configured.set()
                        if status == "FINISHED":
                            break
                        continue

                    if message_type == "ERROR":
                        configured.set()
                        raise RuntimeError(f"Pipeline returned error: {data}")

            await asyncio.gather(sender(), receiver())

    if not audio_chunks:
        raise RuntimeError("Pipeline returned no audio")

    pcm = np.frombuffer(b"".join(audio_chunks), dtype="<i2")
    sf.write("speech.wav", pcm, 16_000)
    print("Saved synthesis to speech.wav")


if __name__ == "__main__":
    try:
        asyncio.run(main())
    except KeyboardInterrupt:
        print("Cancelled by user.")
```
**Text payload format**

Truebar accepts a full sentence per `TEXT_SEGMENT`. Set `TRUEBAR_TTS_ENABLE_SSML=true` (and optionally `TRUEBAR_TTS_SSML_RATE`) to wrap the payload as SSML, matching the production implementation.
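For reference, this is the shape of the `TEXT_SEGMENT` message the samples emit once SSML wrapping is enabled (the rate value is illustrative):

```json
{
  "type": "TEXT_SEGMENT",
  "textSegment": {
    "text": "<speak><prosody rate=\"1.1\">Hello from the Truebar streaming API.</prosody></speak>"
  }
}
```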
Play the resulting WAV file:
```bash
ffplay speech.wav
```
**Alternative players**

Any media player that supports 16 kHz mono WAV (e.g. QuickTime, Windows Media Player, VLC) works here. If you cannot install FFmpeg, double-click `speech.wav` or run `open speech.wav` (macOS) / `start speech.wav` (Windows).
## Run the Python sample in Jupyter Notebook

```python
%pip install aiohttp numpy soundfile nest_asyncio

import nest_asyncio, asyncio, os
from tts import main  # assumes the tutorial code lives in tts.py next to the notebook

os.environ["TRUEBAR_USERNAME"] = "alice@example.com"
os.environ["TRUEBAR_PASSWORD"] = "super-secret-passphrase"
os.environ["TRUEBAR_CLIENT_ID"] = "truebar-client"
os.environ["TRUEBAR_AUTH_URL"] = "https://playground-auth.true-bar.si/realms/truebar/protocol/openid-connect/token"
os.environ["TRUEBAR_TTS_WS_URL"] = "wss://playground-api.true-bar.si/api/pipelines/stream"

nest_asyncio.apply()  # allows asyncio.run inside Jupyter
await main()
```
Play the generated audio directly in the notebook:
```python
from IPython.display import Audio

Audio("speech.wav")
```
## 2. Configure voices and pipeline stages

- Voice tags follow the pattern `PROVIDER:locale:gender:voice`. Run `GET /api/pipelines?task=TTS` or check the Truebar Console > Pipelines to list everything your tenant can access. Set `TRUEBAR_TTS_TAG` to lock the example to a specific voice.
- Pipeline parameters let you tune the generated speech without switching voices. Common fields include `speakingRate` (0.5–2.0), `pitch` (−20 to +20 semitones), and `volumeGainDb` (−96 to +16). Pass them in the `parameters` object shown in the snippets.
- Pre-processing stages (punctuation, text normalisation, accentuation) run before synthesis. Add additional objects ahead of the `TTS` task in the `pipeline` array to clean or enrich text before it reaches the voice.
- Segmenting text: Use multiple `TEXT_SEGMENT` messages for long content. Keep individual segments under a few sentences (<4 KB payload) to avoid hitting per-message limits and to reduce latency; a splitting sketch follows this list.
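A minimal splitting sketch under those constraints (naive sentence-boundary splitting plus the 4 KB ceiling; the helper is an illustration, not part of the API):

```python
import re

MAX_SEGMENT_BYTES = 4_000  # stay under the ~4 KB per-message guidance above

def split_into_segments(text: str, max_bytes: int = MAX_SEGMENT_BYTES) -> list[str]:
    """Greedily pack whole sentences into segments below the byte ceiling."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    segments: list[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate.encode("utf-8")) > max_bytes:
            segments.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        segments.append(current)
    return segments

# Each returned segment becomes its own message:
# await ws.send_json({"type": "TEXT_SEGMENT", "textSegment": {"text": segment}})
```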
## Enable the Riva multi-stage pipeline

Truebar’s Riva voices expect a four-stage pipeline (semantic tagging → text normalisation → grapheme-to-accentuation → TTS). Configure it by exporting the NLP stage tags and turning on SSML support:
```bash
export TRUEBAR_TTS_TAG="RIVA:sl-SI:ZIGA:20250408-2103"
export TRUEBAR_TTS_ENABLE_SSML=true  # wrap payload in <speak> automatically
export TRUEBAR_TTS_NLP_ST_TAG="VIT:sl-SI:*:1.0.0"
export TRUEBAR_TTS_NLP_TN_TAG="VIT:sl-SI:*:*"
export TRUEBAR_TTS_NLP_G2A_TAG="VIT:sl-SI:*:*"
export TRUEBAR_TTS_NLP_ST_PARAMETERS='{"processSsml": true}'
export TRUEBAR_TTS_NLP_G2A_PARAMETERS='{"fastEnabled": true}'
```
With those variables set, the tutorial’s Python and Node.js samples emit a `CONFIG` payload similar to:
{ "type": "CONFIG", "pipeline": [ {"task": "NLP_st", "exceptionHandlingPolicy": "THROW", "config": {"tag": "VIT:sl-SI:*:1.0.0", "parameters": {"processSsml": true}}}, {"task": "NLP_tn", "exceptionHandlingPolicy": "THROW", "config": {"tag": "VIT:sl-SI:*:*", "parameters": {}}}, {"task": "NLP_g2a", "exceptionHandlingPolicy": "THROW", "config": {"tag": "VIT:sl-SI:*:*", "parameters": {"fastEnabled": true}}}, {"task": "TTS", "exceptionHandlingPolicy": "THROW", "config": {"tag": "RIVA:sl-SI:ZIGA:20250408-2103", "parameters": {}}} ]}
You can discover valid tags and parameters via `GET /api/pipelines/stages` (use the token from the earlier step) or by copying them from your working `.env.truebar`. Toggle `TRUEBAR_TTS_ENABLE_SSML` off if you want to send plain text through the same pipeline.
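A quick way to inspect what your tenant exposes, sketched with `aiohttp` and the `fetch_token` helper from the Python sample (the REST base URL is an assumption; substitute your environment's API host):

```python
import asyncio

import aiohttp

from tts import fetch_token  # reuse the helper defined earlier

async def list_stages() -> None:
    async with aiohttp.ClientSession() as session:
        token = await fetch_token(session)
        # Assumption: REST endpoints live on the same host as the WebSocket URL.
        url = "https://api.true-bar.si/api/pipelines/stages"
        headers = {"Authorization": f"Bearer {token}"}
        async with session.get(url, headers=headers) as resp:
            resp.raise_for_status()
            print(await resp.json())

asyncio.run(list_stages())
```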
Both the Node.js and Python samples automatically add these NLP stages when the active tag belongs to the Riva family and the corresponding `TRUEBAR_TTS_NLP_*_TAG` variables are set.
## 3. Troubleshooting and production readiness

- 401 or 403 responses typically mean the service user lacks the Streaming pipelines permission or the password expired. Reset credentials or switch to a client-credentials flow if available.
- Watch for 429 rate-limit responses. Batch non-urgent synthesis jobs and retry with exponential backoff.
- Treat network hiccups as transient: reconnect the WebSocket and resume from the last unsent segment. Both samples can be extended with retry wrappers such as `p-retry` (Node) or `tenacity` (Python).
- Monitor latency by timestamping when you send `CONFIG`, the first audio chunk, and `FINISHED`. This highlights voice selection or preprocessing steps that slow down synthesis; see the instrumentation sketch after this list.
- Log and persist pipeline errors from the `ERROR` message type. They often contain actionable details (e.g. an unsupported SSML tag or a missing pipeline stage).
## 4. Next steps

- Prototype advanced SSML features (say-as, emphasis, phonemes) and combine them with preprocessing stages to keep payloads readable.
- Review the Streaming TTS guide for protocol details, SSML authoring tips, and pipeline discovery queries.
- Instrument your application with observability (request IDs, audio durations, jitter) before deploying to production.
- If you also need transcription, try the Streaming STT quickstart and experiment with duplex pipelines.