Languages
Each language track (audio, descriptive audio, subtitle or closed-caption track) in the streams that Encoding Service generates has a language code assigned to it, for example eng. In the output, the language code is defined in both the track's metadata in the file and in the DASH manifest or HLS playlist ("manifest"). In the manifest, the track also usually has a display name (e.g. English).
For this to work, the service must acquire a correct language code for each language track.
Acquiring the language code
Encoding Service searches for the language code in the following places, in this order:
- Source file name.
- Source track metadata.
- Fallback language code (default en).
If none of these steps yields a language code, the job fails.
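The lookup order above can be sketched as follows. This is a minimal illustration, not the service's actual implementation: the function name, its parameters, and the exception type are hypothetical, and each argument stands for whatever the corresponding step produced (or None if it produced nothing).

```python
from typing import Optional


class LanguageCodeError(Exception):
    """Hypothetical error type: no language code could be acquired."""


def acquire_language_code(
    from_file_name: Optional[str],
    from_metadata: Optional[str],
    fallback: Optional[str] = "en",
) -> str:
    """Return the first available language code, checking sources in order."""
    # Order matters: file name first, then track metadata, then the fallback.
    for candidate in (from_file_name, from_metadata, fallback):
        if candidate:  # skips None and empty strings
            return candidate
    # Nothing was found and the fallback was removed: the job fails.
    raise LanguageCodeError("no language code acquired for the track")
```

For example, `acquire_language_code(None, "eng")` returns the metadata value, while `acquire_language_code(None, None, fallback=None)` raises, mirroring a job that fails because the fallback language code was removed.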
Language code format
The language code should be present in our table of supported languages. We support two- and three-letter language codes, optionally followed by a culture code, with a total length of up to 6 characters, for example en, eng, en-US, es-419.
It is possible to use any 2-6 character string as a language code, but depending on the player it might negatively affect the playback if the language code is not a globally recognized one. Also, if the language code is not in our list of supported languages, its display name in the manifest gets set to the same value as the language code.
By default, we fail the job if a language code is not in our list of supported languages, but this safety feature can be disabled by disabling 'Fail on Unrecognized Language Codes' in the 'Processing Profile' settings in Mosaic. When using the Encoding REST API, the same option is "FailOnUnrecognisedLanguageCodes": false in MediaMappings.
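The interaction between the supported-languages table, the display name, and the fail-on-unrecognized option can be illustrated with a short sketch. Everything here is an assumption made for illustration: the table holds only a few made-up entries, the lookup is an exact match, and the function and parameter names are hypothetical rather than part of the Encoding REST API.

```python
# Hypothetical excerpt; the real table of supported languages is much larger.
SUPPORTED_LANGUAGES = {
    "en": "English",
    "eng": "English",
    "de": "German",
}


def resolve_display_name(code: str, fail_on_unrecognised: bool = True) -> str:
    """Return the manifest display name for a language code."""
    name = SUPPORTED_LANGUAGES.get(code)
    if name is not None:
        return name
    if fail_on_unrecognised:
        # Default behaviour: an unrecognized code fails the job.
        raise ValueError(f"unrecognised language code: {code!r}")
    # Safety feature disabled: the display name is set to the code itself.
    return code
```

With the option disabled, `resolve_display_name("xx", fail_on_unrecognised=False)` returns `"xx"`, which is what a player would then show as the track name.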
Language code in the file name
In the file name the language code can be defined by either naming the files in a certain way (recommended) or by using regex (more versatile).
Metadata
If no language code was detected in the file name, or if the language code was unrecognized when using the file naming convention, we try to detect the language code from the track metadata. Properly prepared source media usually has language codes present in the tracks' metadata, although some file formats, such as .vtt files, usually do not.
Fallback language code
If both previous steps fail to acquire a usable language code, we assign a fallback language code to the track. By default this is set to en. You can change this value or remove it. If you remove it, the job fails when no language code is acquired from the file name or the metadata.
Language code rules
To protect our users from invalid language information in the output, we impose the following rules on language codes:
- Language code length must be 2-6 characters (this covers de, de-DE, es-419, eng, etc.).
- Language code can only include the following characters: a-z, A-Z, 0-9, -
  - The exception is the language code in the file name when using file naming convention, in which case an underscore (_) must be used instead of a dash.
- Language code can only include a maximum of one - character (not e.g. de-D-E).
These rules cannot be bypassed and if any language code violates them, the job fails.
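As a rough illustration, the three rules can be expressed as a single check. This sketch covers only the general rules as stated above; it does not model the underscore exception for the file naming convention, and the function name is hypothetical.

```python
import re

# Allowed characters: a-z, A-Z, 0-9 and the dash.
_ALLOWED_CHARS = re.compile(r"[A-Za-z0-9-]+")


def satisfies_language_code_rules(code: str) -> bool:
    """Check a language code against the length, character, and dash rules."""
    if not 2 <= len(code) <= 6:
        return False  # rule 1: length must be 2-6 characters
    if _ALLOWED_CHARS.fullmatch(code) is None:
        return False  # rule 2: only letters, digits, and dashes
    if code.count("-") > 1:
        return False  # rule 3: at most one dash
    return True
```

Under this check, de, de-DE, es-419, and eng pass, while de-D-E (two dashes), en_US (underscore), and english (7 characters) do not.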
und language code
The language code und is currently not supported. If this language code is encountered, it is not used, and the language detection logic tries to get the language code from the next place (metadata or fallback language code).