Phrase Sets (Dictionaries) API
This manual describes how to manage dictionaries via the /api/phrase-sets endpoints and how to reference them when configuring a pipeline.
All requests must include a bearer access token:

    Authorization: Bearer <access-token>

All endpoints live under the API context path /api (e.g. https://<host>/api/phrase-sets).
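As a minimal sketch of the header requirement above, the helper below builds (but does not send) an authenticated request using only the Python standard library. The function name build_request and the example host are hypothetical and not part of the API.

```python
import json
import urllib.request


def build_request(base_url, token, method, path, body=None):
    """Build an authenticated request; pass it to urllib.request.urlopen to send it."""
    headers = {"Authorization": f"Bearer {token}"}
    data = None
    if body is not None:
        # JSON bodies are sent as UTF-8 with an explicit content type.
        data = json.dumps(body).encode("utf-8")
        headers["Content-Type"] = "application/json"
    url = base_url.rstrip("/") + path
    return urllib.request.Request(url, data=data, headers=headers, method=method)
```

A call such as build_request("https://<host>", token, "GET", "/api/phrase-sets") yields a request object that urllib.request.urlopen can send.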
1. Concepts
A phrase set is a named dictionary of phrases. It has one of two mutually exclusive types — you pick it at creation time and cannot change it afterwards.
| Type | Purpose | Where it is used |
|---|---|---|
| BOOST | Boost recognition of specific words / phrases by the ASR model. | ASR stage, parameter boostedPhraseSets |
| G2A | Custom grapheme‑to‑accent transformations. | NLP_g2a stage, parameter customTransformationSets |
1.1 Visibility and ownership
Every phrase set has a createdBy (the user who created it, or null for system‑wide / global sets) and an isShared flag. From the API perspective that produces three categories:
| Category | How it is identified in the response | Who can see it |
|---|---|---|
| Personal | createdBy = <your user id>, isShared = false | Only the creator. |
| Shared | createdBy = <someone in your group>, isShared = true | Everyone in the same group as the creator. |
| Global | createdBy = null | Everyone. |
isEditable in the response tells you whether your token can modify the set.
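The three categories can be derived from a response object roughly as follows. This is a sketch, not the backend's logic: the function name classify and the extra "other" label (a non-shared set created by someone else, which by section 1.2 should only be visible through an elevated read authority) are assumptions made for illustration.

```python
def classify(phrase_set, my_user_id):
    """Map a phrase-set response object to its visibility category."""
    if phrase_set["createdBy"] is None:
        return "global"
    if phrase_set["isShared"]:
        return "shared"
    if phrase_set["createdBy"] == my_user_id:
        return "personal"
    # Assumption: a non-shared set of another user is only listed
    # when the token carries an elevated read authority.
    return "other"
```

Write access is better read from isEditable than inferred from the category.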
1.2 Authorities
Default behaviour for a regular user: read your own, your group's shared, and global sets; write only your own. Elevated authorities extend this:
| Authority | Effect |
|---|---|
| GROUP_READ | Read every set created by any user in your group (including non‑shared). |
| GROUP_WRITE | Write every set created in your group. |
| ALL_READ | Read every phrase set in the system. |
| ALL_WRITE | Write every non‑global phrase set. |
1.3 Priority when merging sets in a pipeline
When a pipeline stage references several phrase sets that contain the same text, the entry from the set closest to the caller wins (lower number = higher priority):
1. your own sets,
2. shared sets from your group,
3. global sets.
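The priority rule above can be illustrated with a small client-side model. It is a sketch of the described behaviour, not the backend implementation; it assumes phrases are matched by exact text and that, within one category, the first entry seen wins (the manual does not specify either point). The names PRIORITY and merge_by_priority are hypothetical.

```python
# Lower number = higher priority, as in the list above.
PRIORITY = {"personal": 1, "shared": 2, "global": 3}


def merge_by_priority(sets):
    """Merge (category, phrases) pairs into a {text: phrase} dict.

    For duplicate texts, the phrase from the highest-priority category is kept.
    """
    merged = {}
    # sorted() is stable, so sets of equal priority keep their input order.
    for _category, phrases in sorted(sets, key=lambda s: PRIORITY[s[0]]):
        for phrase in phrases:
            merged.setdefault(phrase["text"], phrase)
    return merged
```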
2. Managing phrase sets
2.1 List phrase sets

    GET /api/phrase-sets
        ?set-type=BOOST|G2A   (optional, filter by type)
        &user-ids=1,2         (optional, filter by creator id)
        &group-ids=3,4        (optional, filter by creator's group id)

Response: 200 OK

    [
      {
        "id": 42,
        "name": "MyMedicalTerms",
        "type": "BOOST",
        "createdBy": 7,
        "isEditable": true,
        "isShared": false
      }
    ]

2.2 Get a single phrase set

    GET /api/phrase-sets/{set-id}

Returns a single object with the same shape as above.
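One way to assemble the list URL with its optional filters is sketched below. The helper name list_url is hypothetical; the query parameter names are the ones documented above, and ids are joined with commas as in the example.

```python
from urllib.parse import urlencode


def list_url(base_url, set_type=None, user_ids=(), group_ids=()):
    """Build the GET /api/phrase-sets URL with optional filters."""
    params = {}
    if set_type is not None:
        if set_type not in ("BOOST", "G2A"):
            raise ValueError("set_type must be BOOST or G2A")
        params["set-type"] = set_type
    if user_ids:
        params["user-ids"] = ",".join(str(i) for i in user_ids)
    if group_ids:
        params["group-ids"] = ",".join(str(i) for i in group_ids)
    url = base_url.rstrip("/") + "/api/phrase-sets"
    # safe="," keeps the id lists readable instead of encoding commas as %2C.
    return url + ("?" + urlencode(params, safe=",") if params else "")
```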
2.3 Create a phrase set

    POST /api/phrase-sets
    Content-Type: application/json

    {
      "name": "MyMedicalTerms",
      "isShared": false,
      "type": "BOOST"
    }

- name: 1–31 characters, unique per creator.
- type: BOOST or G2A.

Response: 201 Created

    { "id": 42 }

Errors:
- 409 Conflict: you already have a phrase set with the same name.
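The field rules above can be checked on the client before sending the request. The sketch below only mirrors the documented constraints (name length and allowed types); uniqueness per creator can only be verified by the server, which answers with 409. The helper name create_payload is hypothetical.

```python
def create_payload(name, set_type, is_shared=False):
    """Return the JSON body for POST /api/phrase-sets after basic checks."""
    if not 1 <= len(name) <= 31:
        raise ValueError("name must be 1-31 characters long")
    if set_type not in ("BOOST", "G2A"):
        raise ValueError("type must be BOOST or G2A")
    return {"name": name, "isShared": is_shared, "type": set_type}
```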
2.4 Update a phrase set
Use PATCH to change the name and/or the sharing flag. Any field you omit is left untouched.

    PATCH /api/phrase-sets/{set-id}
    Content-Type: application/json

    {
      "name": "NewName",
      "isShared": true
    }

Response: 204 No Content

Errors:
- 400 Bad Request: the body contains type (the type is immutable after creation).
- 403 Forbidden: you can read the set but not modify it.
- 404 Not Found: the set does not exist or you cannot read it.
- 409 Conflict: the new name collides with another of your phrase sets.
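Because omitted fields are left untouched, a PATCH body only needs the fields that change. The sketch below builds such a body and refuses anything other than name and isShared on the client side, which avoids the documented 400 for type. The helper name patch_payload is hypothetical, and rejecting every other field is an assumption about what the server accepts.

```python
ALLOWED_PATCH_FIELDS = {"name", "isShared"}


def patch_payload(changes):
    """Return a partial-update body containing only patchable fields."""
    rejected = sorted(set(changes) - ALLOWED_PATCH_FIELDS)
    if rejected:
        # "type" lands here: it is immutable after creation.
        raise ValueError(f"fields cannot be patched: {rejected}")
    return dict(changes)
```

Sending {"isShared": true} alone, for example, shares a set without touching its name.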
2.5 Delete a phrase set

    DELETE /api/phrase-sets/{set-id}

Response: 204 No Content. All phrases inside the set are removed together with the set.
3. Managing phrases inside a set
3.1 List phrases

    GET /api/phrase-sets/{set-id}/phrases
        ?text=substring   (optional, case-insensitive LIKE filter on phrase text)

Response: 200 OK with a list of phrase objects (shape depends on the set's type, see below).
3.2 Get a single phrase

    GET /api/phrase-sets/{set-id}/phrases/{phrase-id}

3.3 Add a phrase
The request body is polymorphic. The discriminator field type must match the phrase set's type, otherwise the request is rejected with 400.
3.3.1 BOOST phrase (for a BOOST set)

    POST /api/phrase-sets/{set-id}/phrases

    {
      "type": "boost",
      "text": "Remdesivir",
      "boost": 0.3,
      "variants": ["remdesivire", "Remdesiviru"]
    }

- text: required, non‑blank.
- boost: required recognition boost factor.
- variants: optional. Each entry must be non‑blank and must not contain the underscore character (_). Only honoured by engines that support contextual biasing; others ignore it.
3.3.2 G2A phrase (for a G2A set)

    POST /api/phrase-sets/{set-id}/phrases

    {
      "type": "g2a",
      "text": "accentuated",
      "acc": "accéntuáted",
      "ipa": "əkˈsɛntʃueɪtɪd"
    }

- text: required, non‑blank.
- At least one of acc (accentuated written form) or ipa (IPA transcription) must be provided.
- Pipelines currently consume acc; ipa is stored for future use.

Response: 201 Created

    { "id": 812 }

Errors:
- 400 Bad Request: the phrase type does not match the set's type, or a validation rule failed.
- 409 Conflict: a phrase with the same text already exists in this set.
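The validation rules for both phrase shapes can be mirrored on the client to catch most 400 responses early. This is a sketch under assumptions: the function name validate_phrase is hypothetical, and the discriminator is compared case-insensitively because the examples use lowercase "boost" / "g2a" for sets typed BOOST / G2A. Duplicate texts (409) can only be detected by the server.

```python
def validate_phrase(set_type, phrase):
    """Return a list of rule violations for a phrase body (empty if valid)."""
    errors = []
    if str(phrase.get("type") or "").upper() != set_type:
        errors.append("type does not match the set's type")
    if not str(phrase.get("text") or "").strip():
        errors.append("text must be non-blank")
    if set_type == "BOOST":
        if phrase.get("boost") is None:
            errors.append("boost is required")
        for variant in phrase.get("variants") or []:
            if not variant.strip() or "_" in variant:
                errors.append(f"invalid variant: {variant!r}")
    elif set_type == "G2A":
        if not (phrase.get("acc") or phrase.get("ipa")):
            errors.append("acc or ipa is required")
    return errors
```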
3.4 Replace a phrase
PUT fully overwrites an existing phrase. The request body has the same shape as POST.

    PUT /api/phrase-sets/{set-id}/phrases/{phrase-id}

Response: 204 No Content. Returns 409 if the new text collides with a different phrase in the same set.
3.5 Delete a phrase

    DELETE /api/phrase-sets/{set-id}/phrases/{phrase-id}

Response: 204 No Content.
3.6 Common error responses
| Status | Meaning |
|---|---|
| 400 Bad Request | Validation failed (bad body, mismatched type, forbidden characters in variants, trying to change set type, …). |
| 403 Forbidden | You can read the set but cannot modify it. |
| 404 Not Found | The set or phrase does not exist, or you do not have permission to read it. |
| 409 Conflict | Duplicate name (for sets) or duplicate text (for phrases). |
4. Using dictionaries in a pipeline
Dictionaries are plugged into a pipeline configuration by id. You do not embed phrases directly in the pipeline request — you reference the phrase set(s) you own or have access to, and the backend expands them when the session starts.
4.1 BOOST sets → ASR stage

    {
      "tag": "RIVA:sl-SI:COL:20251020-1400",
      "parameters": {
        "boostedPhraseSets": [42, 57],
        "additionalBoostedPhrases": ["additional phrase", "another"]
      }
    }

- boostedPhraseSets: list of phrase‑set ids. Every id must be readable by the caller and have type = BOOST; otherwise 400.
- additionalBoostedPhrases: optional list of raw strings that do not need to be stored as a phrase set. They are merged in with a default boost and no variants.
Requirements / behaviour:
- The selected ASR model must advertise the boostedPhraseSets and/or additionalBoostedPhrases parameters. If it does not, requests with boostedPhraseSets or additionalBoostedPhrases are rejected with 400. Not all ASR models support boosting; check the stage config options returned by the /stages endpoint.
- The backend flattens all referenced sets into one de‑duplicated list of phrases, applying the priority order described in 1.3.
- Variants (on BOOST phrases) are forwarded only to models that understand them; others silently ignore them. The same holds true for the boost value.
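The requirement that the model advertise the boosting parameters can be checked before the pipeline is started. The sketch below assumes the caller has already extracted the set of supported parameter names from the stage options (the response format of that endpoint is not described here, so supported_params is a plain collection of names). The helper name asr_stage is hypothetical.

```python
def asr_stage(tag, supported_params, phrase_set_ids=(), extra_phrases=()):
    """Build an ASR stage config, failing early on unsupported parameters."""
    parameters = {}
    if phrase_set_ids:
        parameters["boostedPhraseSets"] = list(phrase_set_ids)
    if extra_phrases:
        parameters["additionalBoostedPhrases"] = list(extra_phrases)
    unsupported = [p for p in parameters if p not in supported_params]
    if unsupported:
        # The backend would reject such a request with 400.
        raise ValueError(f"model does not support: {unsupported}")
    return {"tag": tag, "parameters": parameters}
```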
4.2 G2A sets → NLP_g2a stage

    {
      "tag": "VIT:sl-SI:*:*",
      "parameters": {
        "fastEnabled": false,
        "customTransformationSets": [11, 12],
        "additionalCustomTransformations": {"word": "wórd"}
      }
    }

- customTransformationSets: list of phrase‑set ids. Every id must be readable by the caller and have type = G2A; otherwise 400.
- additionalCustomTransformations: optional inline map of text → acc that does not need to be stored as a phrase set.
Behaviour:
- Phrases from the referenced sets are flattened into a text → acc map. Entries whose acc is missing or empty are dropped.
- Sets are merged in priority order (1.3).
- additionalCustomTransformations is applied last, so any key it defines overrides a matching entry coming from a phrase set.
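The three behaviours above can be modelled in a few lines to predict which transformation ends up in effect. This is a client-side sketch of the described behaviour, not the backend code; the function name build_transformations is hypothetical, and the caller is assumed to pass the sets already sorted by priority (own, then shared, then global).

```python
def build_transformations(sets_in_priority_order, additional=None):
    """Flatten G2A phrase lists into a {text: acc} map.

    sets_in_priority_order: phrase lists, highest priority first.
    additional: inline text -> acc map, applied last.
    """
    result = {}
    for phrases in sets_in_priority_order:
        for phrase in phrases:
            acc = phrase.get("acc")
            if acc:  # entries with missing or empty acc are dropped
                # Earlier (higher-priority) sets keep their entry.
                result.setdefault(phrase["text"], acc)
    # Inline transformations override anything coming from a phrase set.
    result.update(additional or {})
    return result
```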
4.3 Discovering supported parameters
Before building a pipeline config you can query the pipeline‑definition endpoints (see the Pipelines tag in Swagger UI at /api/swagger-ui/). The returned options for each stage tell you which parameters the selected model actually supports — including whether boostedPhraseSets or customTransformationSets are configurable.
5. Typical workflow
1. POST /api/phrase-sets: create a BOOST and/or G2A dictionary, remember the returned id.
2. POST /api/phrase-sets/{id}/phrases: populate the dictionary.
3. When starting a pipeline (REST or WebSocket at /pipelines/stream/**), reference the dictionary ids in the matching stage's parameters (boostedPhraseSets for ASR, customTransformationSets for G2A).
4. Optionally PATCH the set with "isShared": true so other members of your group can reference the same dictionary.
5. Maintain the dictionaries with PATCH / PUT / DELETE as your vocabulary evolves.
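The workflow can be sketched end to end for a BOOST dictionary as follows. To keep the example independent of any HTTP library, the transport is injected: send is a hypothetical callable taking (method, path, body) and returning the parsed JSON response (or None for 204 responses). The function name run_workflow is also hypothetical, and error handling is omitted.

```python
def run_workflow(send, name, phrases):
    """Create a BOOST set, fill it, share it, and return ASR stage parameters."""
    # Step 1: create the dictionary and remember its id.
    created = send("POST", "/api/phrase-sets",
                   {"name": name, "isShared": False, "type": "BOOST"})
    set_id = created["id"]
    # Step 2: populate it; each phrase carries the lowercase discriminator.
    for phrase in phrases:
        send("POST", f"/api/phrase-sets/{set_id}/phrases",
             {"type": "boost", **phrase})
    # Step 4 (optional): share the set with the rest of the group.
    send("PATCH", f"/api/phrase-sets/{set_id}", {"isShared": True})
    # Step 3: these parameters go into the ASR stage of the pipeline config.
    return {"boostedPhraseSets": [set_id]}
```

In a test, send can be a stub that records calls; in real use it would wrap an authenticated HTTP client.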