Phrase Sets (Dictionaries) API

This manual describes how to manage dictionaries via the /api/phrase-sets endpoints and how to reference them when configuring a pipeline.

All requests must include a bearer access token:

Authorization: Bearer <access-token>

All endpoints live under the API context path /api (e.g. https://<host>/api/phrase-sets).
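
As a minimal sketch of the two rules above, the helper below builds (but does not send) an authenticated request using only the Python standard library. The host `example.invalid`, the token value, and the helper name `build_request` are placeholders chosen for illustration; they are not part of the API.

```python
import json
import urllib.request


def build_request(host, token, path, method="GET", body=None):
    """Build (without sending) an authenticated request under the /api context path."""
    url = f"https://{host}/api{path}"
    headers = {"Authorization": f"Bearer {token}"}
    data = None
    if body is not None:
        # JSON bodies are sent as UTF-8 with an explicit content type.
        data = json.dumps(body).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(url, data=data, headers=headers, method=method)


req = build_request("example.invalid", "my-token", "/phrase-sets")
print(req.get_method(), req.full_url)
```

Passing the returned object to `urllib.request.urlopen` would send it; the sketch stops short of that so it can be run without a server.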

1. Concepts

A phrase set is a named dictionary of phrases. It has one of two mutually exclusive types — you pick it at creation time and cannot change it afterwards.

Type	Purpose	Where it is used
`BOOST`	Boost recognition of specific words / phrases by the ASR model.	`ASR` stage, parameter `boostedPhraseSets`
`G2A`	Custom grapheme‑to‑accent transformations.	`NLP_g2a` stage, parameter `customTransformationSets`

1.1 Visibility and ownership

Every phrase set has a createdBy (the user who created it, or null for system‑wide / global sets) and an isShared flag. From the API perspective that produces three categories:

Category	How it is identified in the response	Who can see it
Personal	`createdBy = <your user id>`, `isShared = false`	Only the creator.
Shared	`createdBy = <someone in your group>`, `isShared = true`	Everyone in the same group as the creator.
Global	`createdBy = null`	Everyone.

isEditable in the response tells you whether your token can modify the set.
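
One way to read the category table on the client side is sketched below. The function name `classify` and the extra label `"other"` (for sets that are visible only because of an elevated authority) are illustrative assumptions, not API terms.

```python
def classify(phrase_set, my_user_id):
    """Map one phrase-set object from the API response to a visibility category."""
    if phrase_set["createdBy"] is None:
        return "global"
    if phrase_set["isShared"]:
        return "shared"
    if phrase_set["createdBy"] == my_user_id:
        return "personal"
    # Assumption: a non-shared set of another user is visible only through
    # an elevated authority, so it fits none of the three categories.
    return "other"


example = {"id": 42, "createdBy": 7, "isShared": False, "isEditable": True}
print(classify(example, my_user_id=7))
```

Whether a set can be modified is a separate question; for that, the `isEditable` field in the response is the value to rely on.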

1.2 Authorities

Default behaviour for a regular user: read your own, your group's shared, and global sets; write only your own. Elevated authorities extend this:

Authority	Effect
`GROUP_READ`	Read every set created by any user in your group (including non‑shared).
`GROUP_WRITE`	Write every set created in your group.
`ALL_READ`	Read every phrase set in the system.
`ALL_WRITE`	Write every non‑global phrase set.
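
The rules above can be modelled roughly as follows. This is only a client-side reasoning aid: the `User` and `PhraseSet` classes are hypothetical, the server remains authoritative, and the sketch assumes that global sets are never writable through these authorities (the table only says `ALL_WRITE` covers non-global sets).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class User:
    id: int
    group: int
    authorities: frozenset = frozenset()


@dataclass
class PhraseSet:
    created_by: Optional[int]      # None marks a global set
    creator_group: Optional[int]   # group of the creator, None for global sets
    is_shared: bool = False


def can_read(user, s):
    # Global sets and your own sets are always readable.
    if s.created_by is None or s.created_by == user.id:
        return True
    if "ALL_READ" in user.authorities:
        return True
    same_group = s.creator_group == user.group
    return same_group and (s.is_shared or "GROUP_READ" in user.authorities)


def can_write(user, s):
    if s.created_by is None:
        return False  # assumption: global sets are not writable here
    if s.created_by == user.id or "ALL_WRITE" in user.authorities:
        return True
    return s.creator_group == user.group and "GROUP_WRITE" in user.authorities
```

In practice the `isEditable` flag returned by the API already encodes the write decision for the calling token, so a model like this is mainly useful for explaining why a request was refused.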

1.3 Priority when merging sets in a pipeline

When a pipeline stage references several phrase sets that contain the same text, the entry from the set closest to the caller wins (lower number = higher priority):

1. your own sets,
2. shared sets from your group,
3. global sets.
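
The ordering above can be sketched as a small merge function. The category labels `"own"`, `"shared"` and `"global"` and the function name are illustrative; how the backend breaks ties between two sets of the same category is not stated here, so the sketch simply keeps whichever was listed first.

```python
PRIORITY = {"own": 1, "shared": 2, "global": 3}  # lower number = higher priority


def merge_by_priority(sets):
    """Merge (category, phrases) pairs; phrases maps phrase text to its payload."""
    merged = {}
    # sorted() is stable, so sets of equal priority keep their input order.
    for _category, phrases in sorted(sets, key=lambda s: PRIORITY[s[0]]):
        for text, payload in phrases.items():
            merged.setdefault(text, payload)  # first (highest-priority) entry wins
    return merged


result = merge_by_priority([
    ("global", {"aspirin": 0.1, "ibuprofen": 0.2}),
    ("own", {"aspirin": 0.9}),
])
print(result)
```

Here the caller's own entry for "aspirin" is kept and the global one is discarded, while "ibuprofen" survives because no higher-priority set defines it.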

2. Managing phrase sets

2.1 List phrase sets

GET /api/phrase-sets
      ?set-type=BOOST|G2A       (optional; filter by type)
      &user-ids=1,2             (optional; filter by creator id)
      &group-ids=3,4            (optional; filter by creator's group id)

Response: 200 OK

[
  {
    "id": 42,
    "name": "MyMedicalTerms",
    "type": "BOOST",
    "createdBy": 7,
    "isEditable": true,
    "isShared": false
  }
]
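
A hedged sketch of assembling the list URL with its optional filters follows. The helper name `list_url` and the host are placeholders; only the three query parameter names come from this manual.

```python
from urllib.parse import urlencode


def list_url(host, set_type=None, user_ids=(), group_ids=()):
    """Build the list URL, adding only the filters that were requested."""
    params = {}
    if set_type is not None:
        if set_type not in ("BOOST", "G2A"):
            raise ValueError("set_type must be BOOST or G2A")
        params["set-type"] = set_type
    if user_ids:
        params["user-ids"] = ",".join(str(i) for i in user_ids)
    if group_ids:
        params["group-ids"] = ",".join(str(i) for i in group_ids)
    # Keep commas literal so the id lists look like the documented form.
    query = urlencode(params, safe=",")
    base = f"https://{host}/api/phrase-sets"
    return f"{base}?{query}" if query else base


print(list_url("example.invalid", set_type="BOOST", user_ids=[1, 2]))
```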

2.2 Get a single phrase set

GET /api/phrase-sets/{set-id}

Returns the same object shape as above.

2.3 Create a phrase set

POST /api/phrase-sets
Content-Type: application/json

{
  "name":     "MyMedicalTerms",
  "isShared": false,
  "type":     "BOOST"
}

name: 1–31 characters, unique per creator.
type: BOOST or G2A.

Response: 201 Created

{ "id": 42 }

Errors:

409 Conflict — you already have a phrase set with the same name.
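
The constraints on the create body can be checked before the request is sent, as in the sketch below. The helper `create_body` is hypothetical, and it assumes "characters" means Unicode code points as counted by Python's `len`; uniqueness per creator can only be verified by the server (409).

```python
def create_body(name, set_type, is_shared=False):
    """Validate and build the JSON body for POST /api/phrase-sets."""
    if not 1 <= len(name) <= 31:
        raise ValueError("name must be 1-31 characters long")
    if set_type not in ("BOOST", "G2A"):
        raise ValueError("type must be BOOST or G2A")
    return {"name": name, "isShared": is_shared, "type": set_type}


body = create_body("MyMedicalTerms", "BOOST")
print(body)
```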

2.4 Update a phrase set

Use PATCH to change the name and/or the sharing flag. Any field you omit is left untouched.

PATCH /api/phrase-sets/{set-id}
Content-Type: application/json

{
  "name":     "NewName",
  "isShared": true
}

Response: 204 No Content

Errors:

400 Bad Request — attempting to send type (the type is immutable after creation).
403 Forbidden — you can read the set but not modify it.
404 Not Found — the set does not exist or you cannot read it.
409 Conflict — the new name collides with another of your phrase sets.
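
Because omitted fields are left untouched and `type` must not be sent, a PATCH body is best built from only the fields that actually change. The sketch below illustrates this; the helper name `patch_body` is an assumption, and refusing an empty update is a client-side choice, not a documented server rule.

```python
def patch_body(name=None, is_shared=None):
    """Build a PATCH body containing only the fields that should change."""
    body = {}
    if name is not None:
        body["name"] = name
    if is_shared is not None:
        body["isShared"] = is_shared
    if not body:
        raise ValueError("nothing to update")
    # "type" is deliberately never included: it is immutable after creation.
    return body


print(patch_body(is_shared=True))
```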

2.5 Delete a phrase set

DELETE /api/phrase-sets/{set-id}

Response: 204 No Content. All phrases inside the set are removed together with the set.

3. Managing phrases inside a set

3.1 List phrases

GET /api/phrase-sets/{set-id}/phrases
      ?text=substring     (optional; case-insensitive LIKE filter on phrase text)

Response: 200 OK with a list of phrase objects (shape depends on the set's type — see below).

3.2 Get a single phrase

GET /api/phrase-sets/{set-id}/phrases/{phrase-id}

3.3 Add a phrase

The request body is polymorphic. The discriminator field type must correspond to the phrase set's type (boost for a BOOST set, g2a for a G2A set, as in the examples in 3.3.1 and 3.3.2); otherwise the request is rejected with 400.

3.3.1 `BOOST` phrase (for a `BOOST` set)

POST /api/phrase-sets/{set-id}/phrases

{
  "type":     "boost",
  "text":     "Remdesivir",
  "boost":    0.3,
  "variants": ["remdesivire", "Remdesiviru"]
}

text: required, non‑blank.
boost: required recognition boost factor.
variants: optional. Each entry must be non‑blank and must not contain the _ character. Only honoured by engines that support contextual biasing; others ignore it.

3.3.2 `G2A` phrase (for a `G2A` set)

POST /api/phrase-sets/{set-id}/phrases

{
  "type": "g2a",
  "text": "accentuated",
  "acc":  "accéntuáted",
  "ipa":  "əkˈsɛntʃueɪtɪd"
}

text: required, non‑blank.
At least one of acc (accentuated written form) or ipa (IPA transcription) must be provided.
Pipelines currently consume acc; ipa is stored for future use.

Response (for either phrase type): 201 Created

{ "id": 812 }

Errors:

400 Bad Request — the phrase type does not match the set's type, or a validation rule failed.
409 Conflict — a phrase with the same text already exists in this set.
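
The validation rules for both phrase types can be mirrored on the client to catch most 400 responses early. The builders below are a sketch under that assumption; the function names are hypothetical, and the server may apply further checks that are not listed in this manual.

```python
def boost_phrase(text, boost, variants=()):
    """Build a BOOST phrase body, applying the documented validation rules."""
    if not text.strip():
        raise ValueError("text must be non-blank")
    if boost is None:
        raise ValueError("boost is required")
    for variant in variants:
        if not variant.strip() or "_" in variant:
            raise ValueError("variants must be non-blank and must not contain '_'")
    body = {"type": "boost", "text": text, "boost": boost}
    if variants:
        body["variants"] = list(variants)
    return body


def g2a_phrase(text, acc=None, ipa=None):
    """Build a G2A phrase body; at least one of acc / ipa must be given."""
    if not text.strip():
        raise ValueError("text must be non-blank")
    if not acc and not ipa:
        raise ValueError("provide at least one of acc or ipa")
    body = {"type": "g2a", "text": text}
    if acc:
        body["acc"] = acc
    if ipa:
        body["ipa"] = ipa
    return body


print(boost_phrase("Remdesivir", 0.3, ["remdesivire", "Remdesiviru"]))
print(g2a_phrase("accentuated", acc="accéntuáted"))
```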

3.4 Replace a phrase

PUT fully overwrites an existing phrase. The request body has the same shape as POST.

PUT /api/phrase-sets/{set-id}/phrases/{phrase-id}

Response: 204 No Content. Returns 409 if the new text collides with a different phrase in the same set.

3.5 Delete a phrase

DELETE /api/phrase-sets/{set-id}/phrases/{phrase-id}

Response: 204 No Content.

3.6 Common error responses

Status	Meaning
`400 Bad Request`	Validation failed (bad body, mismatched `type`, forbidden characters in `variants`, trying to change set type, …).
`403 Forbidden`	You can read the set but cannot modify it.
`404 Not Found`	The set or phrase does not exist, or you do not have permission to read it.
`409 Conflict`	Duplicate name (for sets) or duplicate `text` (for phrases).

4. Using dictionaries in a pipeline

Dictionaries are plugged into a pipeline configuration by id. You do not embed phrases directly in the pipeline request — you reference the phrase set(s) you own or have access to, and the backend expands them when the session starts.

4.1 `BOOST` sets → `ASR` stage

{
  "tag": "RIVA:sl-SI:COL:20251020-1400",
  "parameters": {
    "boostedPhraseSets":        [42, 57],
    "additionalBoostedPhrases": ["additional phrase", "another"]
  }
}

boostedPhraseSets — list of phrase‑set ids. Every id must be readable by the caller and have type = BOOST; otherwise 400.
additionalBoostedPhrases — optional list of raw strings that do not need to be stored as a phrase set. They are merged in with a default boost and no variants.

Requirements / behaviour:

The selected ASR model must advertise boostedPhraseSets and/or additionalBoostedPhrases parameters. If it does not, requests with boostedPhraseSets or additionalBoostedPhrases are rejected with 400. Not all ASR models support boosting — check the stage config options returned by the /stages endpoint.
The backend flattens all referenced sets into one de‑duplicated list of phrases, applying the priority order described in 1.3.
Variants (on BOOST phrases) are forwarded only to models that understand them; others silently ignore them. The same holds true for the boost value.
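
Since every referenced id must be readable and of the right type, a pre-flight check against the output of the list endpoint (2.1) can catch a 400 before the pipeline is started. The sketch below assumes `readable_sets` is that list response; the function name `check_set_ids` is hypothetical, and the same check applies to G2A sets with `expected_type="G2A"`.

```python
def check_set_ids(set_ids, readable_sets, expected_type):
    """Return the ids that are unreadable or whose type differs from expected_type."""
    types_by_id = {s["id"]: s["type"] for s in readable_sets}
    # A missing id yields None, which never equals the expected type.
    return [i for i in set_ids if types_by_id.get(i) != expected_type]


readable = [
    {"id": 42, "type": "BOOST"},
    {"id": 57, "type": "BOOST"},
    {"id": 11, "type": "G2A"},
]
bad = check_set_ids([42, 57, 11, 99], readable, "BOOST")
print(bad)  # [11, 99]
```

An empty result means the ids pass this particular rule; it does not establish that the selected ASR model supports boosting at all.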

4.2 `G2A` sets → `NLP_g2a` stage

{
  "tag": "VIT:sl-SI:*:*",
  "parameters": {
    "fastEnabled":                     false,
    "customTransformationSets":        [11, 12],
    "additionalCustomTransformations": {"word": "wórd"}
  }
}

customTransformationSets — list of phrase‑set ids. Every id must be readable by the caller and have type = G2A; otherwise 400.
additionalCustomTransformations — optional inline map of text → acc that does not need to be stored as a phrase set.

Behaviour:

Phrases from the referenced sets are flattened into a text → acc map. Entries whose acc is missing or empty are dropped.
Sets are merged in priority order (1.3).
additionalCustomTransformations is applied last, so any key it defines overrides a matching entry coming from a phrase set.
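
The three behaviours above can be illustrated with a short sketch. It assumes the caller has already arranged the referenced sets from highest to lowest priority (see 1.3); the function name is hypothetical, and this models the described outcome rather than the backend's actual code.

```python
def build_transformations(sets_in_priority_order, additional=None):
    """Flatten G2A phrase lists into a text -> acc map."""
    result = {}
    for phrases in sets_in_priority_order:  # highest priority first
        for phrase in phrases:
            acc = phrase.get("acc")
            if acc:  # entries with a missing or empty acc are dropped
                result.setdefault(phrase["text"], acc)
    # Inline transformations are applied last and override set entries.
    result.update(additional or {})
    return result


own = [{"text": "word", "acc": "wòrd"}, {"text": "only-ipa", "ipa": "x"}]
global_set = [{"text": "word", "acc": "word"}, {"text": "other", "acc": "óther"}]
print(build_transformations([own, global_set], {"word": "wórd"}))
```

In this run "only-ipa" is dropped because it has no acc, "other" comes from the lower-priority set, and "word" ends up with the inline value.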

4.3 Discovering supported parameters

Before building a pipeline config you can query the pipeline‑definition endpoints (see the Pipelines tag in Swagger UI at /api/swagger-ui/). The returned options for each stage tell you which parameters the selected model actually supports — including whether boostedPhraseSets or customTransformationSets are configurable.

5. Typical workflow

1. POST /api/phrase-sets: create a BOOST and/or G2A dictionary and note the returned id.
2. POST /api/phrase-sets/{id}/phrases: populate the dictionary.
3. When starting a pipeline (REST or WebSocket at /pipelines/stream/**), reference the dictionary ids in the matching stage's parameters (boostedPhraseSets for the ASR stage, customTransformationSets for the NLP_g2a stage).
4. Optionally PATCH the set with "isShared": true so other members of your group can reference the same dictionary.
5. Maintain dictionaries with PATCH / PUT / DELETE as your vocabulary evolves.
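
The workflow can be sketched end to end for a BOOST dictionary as follows. To keep the example runnable without a server, the HTTP layer is injected as a `send(method, path, body)` callable, which is a hypothetical interface, and a fake implementation stands in for it here; the ASR tag is the one used in the example in 4.1.

```python
def run_workflow(send, name, phrases, asr_tag):
    """Create a BOOST set, fill it, share it, and return an ASR stage config."""
    created = send("POST", "/api/phrase-sets",
                   {"name": name, "isShared": False, "type": "BOOST"})
    set_id = created["id"]
    for phrase in phrases:
        send("POST", f"/api/phrase-sets/{set_id}/phrases", phrase)
    # Optional step: share the set with the rest of the group.
    send("PATCH", f"/api/phrase-sets/{set_id}", {"isShared": True})
    return {"tag": asr_tag, "parameters": {"boostedPhraseSets": [set_id]}}


calls = []


def fake_send(method, path, body):
    """Stand-in transport that records calls instead of using the network."""
    calls.append((method, path))
    return {"id": 42} if path == "/api/phrase-sets" else None


stage = run_workflow(
    fake_send,
    "MyMedicalTerms",
    [{"type": "boost", "text": "Remdesivir", "boost": 0.3}],
    "RIVA:sl-SI:COL:20251020-1400",
)
print(stage)
```

A real `send` would issue the request with the bearer token and parse the JSON response; error handling for 400, 403, 404 and 409 is left out of the sketch.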
