Source Subtitle Formats
The Encoding Service supports five source subtitle formats. This article describes which elements within each format are recognized and how they are handled during processing.
Note: All source formats are converted to WebVTT internally. Styling, positioning, and region information from non-WebVTT formats is not preserved in the output. WebVTT cue settings (e.g. align, position, line, size) are preserved as-is.
SubRip (.srt)
SubRip is a plain-text format with a simple block structure.
Recognized elements
| Element | Handling |
|---|---|
| Sequence numbers (e.g. 1, 2, …) | Parsed and ignored |
| Timecodes (HH:MM:SS,mmm --> HH:MM:SS,mmm) | Required; milliseconds use a comma separator |
| Multi-line text | Supported; each line within a block becomes a separate text line |
| Empty line | Acts as block separator |
Not recognized
Comments, inline styling tags, and positioning information are not supported.
Example
1
00:00:01,000 --> 00:00:04,000
This is the first subtitle line.
A second line of the same cue.

2
00:00:05,500 --> 00:00:08,000
Another cue.
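The block structure above can be sketched in a few lines of Python. This is only an illustration of the rules in the table, not the Encoding Service's parser; the names `parse_srt` and `SRT_TIMING` are hypothetical.

```python
import re

# Matches "HH:MM:SS,mmm --> HH:MM:SS,mmm" (comma before the milliseconds).
SRT_TIMING = re.compile(
    r"(\d{2}):(\d{2}):(\d{2}),(\d{3})\s*-->\s*(\d{2}):(\d{2}):(\d{2}),(\d{3})"
)


def _to_ms(h, m, s, ms):
    """Convert hour/minute/second/millisecond strings to milliseconds."""
    return ((int(h) * 60 + int(m)) * 60 + int(s)) * 1000 + int(ms)


def parse_srt(text):
    """Return a list of (start_ms, end_ms, [text lines]) tuples."""
    cues = []
    # Blocks are separated by one or more empty lines.
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = block.splitlines()
        # The sequence number is parsed and ignored.
        if lines and lines[0].strip().isdigit():
            lines = lines[1:]
        if not lines:
            continue
        match = SRT_TIMING.fullmatch(lines[0].strip())
        if match is None:
            raise ValueError(f"invalid timecode line: {lines[0]!r}")
        g = match.groups()
        # Every remaining line of the block is one text line of the cue.
        cues.append((_to_ms(*g[:4]), _to_ms(*g[4:]), lines[1:]))
    return cues


sample = """1
00:00:01,000 --> 00:00:04,000
This is the first subtitle line.
A second line of the same cue.

2
00:00:05,500 --> 00:00:08,000
Another cue.
"""

cues = parse_srt(sample)
for start, end, text_lines in cues:
    print(start, end, text_lines)
```

Inline tags and positioning hints are not interpreted here, in line with the "Not recognized" list above.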
WebVTT (.vtt)
WebVTT is a text-based format with richer structure than SRT.
Recognized elements
| Element | Handling |
|---|---|
| WEBVTT header | Required as the first line of the file |
| Cue numbers | Parsed and ignored |
| Timecodes ([HH:]MM:SS.mmm --> [HH:]MM:SS.mmm) | Required; the hours part is optional; both . and , are accepted as the millisecond separator |
| Cue settings (align, position, line, size) | Parsed and preserved in the output |
| NOTE comment blocks | Skipped (the comment and all subsequent lines up to the next empty line are ignored) |
| Multi-line cue text | Supported |
Not recognized
STYLE blocks (CSS) and REGION definitions are not processed.
Example
WEBVTT

NOTE This is a comment and will be ignored.

1
00:00:01.000 --> 00:00:04.000 align:center position:50%
This cue has positioning settings.

00:00:05.500 --> 00:00:08.000
A cue without a number is also valid.
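The timecode rules (optional hours, either separator) and the four preserved cue settings could be handled along these lines. This is a minimal sketch; `parse_vtt_timing` is a hypothetical helper, and the assumption that unknown settings are simply dropped is not confirmed by this article.

```python
import re

# Hours are optional; "." or "," may precede the milliseconds.
VTT_TIME = r"(?:(\d{2,}):)?(\d{2}):(\d{2})[.,](\d{3})"
VTT_TIMING = re.compile(rf"{VTT_TIME}\s+-->\s+{VTT_TIME}(.*)")

# Cue settings the article lists as preserved.
KNOWN_SETTINGS = ("align", "position", "line", "size")


def _to_ms(h, m, s, ms):
    """Convert to milliseconds; a missing hours part counts as 0."""
    return ((int(h or 0) * 60 + int(m)) * 60 + int(s)) * 1000 + int(ms)


def parse_vtt_timing(line):
    """Return (start_ms, end_ms, settings) for one cue timing line."""
    match = VTT_TIMING.fullmatch(line.strip())
    if match is None:
        raise ValueError(f"invalid timing line: {line!r}")
    g = match.groups()
    settings = {}
    # Settings follow the end time as space-separated "key:value" tokens.
    for token in g[8].split():
        key, sep, value = token.partition(":")
        if sep and key in KNOWN_SETTINGS:
            settings[key] = value
    return _to_ms(*g[:4]), _to_ms(*g[4:8]), settings


print(parse_vtt_timing("00:00:01.000 --> 00:00:04.000 align:center position:50%"))
print(parse_vtt_timing("00:05,500 --> 00:08,000"))
```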
TTML (.ttml)
TTML (Timed Text Markup Language) is an XML-based format.
Recognized elements
| Element / Attribute | Handling |
|---|---|
| <body>/<div>/<p> hierarchy | Standard structure; <p> elements are the cues |
| begin on <p> | Required start time |
| end on <p> | End time (mutually exclusive with dur) |
| dur on <p> | Duration; end time is calculated as begin + dur |
| Timestamp format HH:MM:SS.mmm | Standard format |
| Timestamp format HH:MM:SS:ff | Frame-based format; frames are converted at 25 fps |
| <br/> within <p> | Converted to a new text line within the same cue |
| <styling>, <layout>, <region>, inline tts:* attributes | Parsed as part of the XML but stripped from output |
Not recognized
<span> style attributes, TTML metadata, and time offset on <div> are not processed.
Example
<?xml version="1.0" encoding="UTF-8"?>
<tt xml:lang="en" xmlns="http://www.w3.org/ns/ttml">
<body>
<div>
<p begin="00:00:01.000" end="00:00:04.000">First line.<br/>Second line.</p>
<p begin="00:00:05.500" dur="00:00:02.500">Using dur instead of end.</p>
<p begin="00:00:10.000" end="00:00:30:00">Frame-based end time at 25 fps.</p>
</div>
</body>
</tt>
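The two timestamp formats and the begin + dur rule can be illustrated as follows. This is a sketch under stated assumptions: the helper names are hypothetical, frames are converted with integer arithmetic (the rounding actually used is not documented here), and a cue with neither end nor dur is treated as an error.

```python
import re

FPS = 25  # frame rate the table above states for HH:MM:SS:ff values

# HH:MM:SS followed by either ".mmm" (milliseconds) or ":ff" (frames).
CLOCK = re.compile(r"(\d{2,}):(\d{2}):(\d{2})(?:\.(\d{3})|:(\d{2}))")


def ttml_time_to_ms(value):
    """Convert an HH:MM:SS.mmm or HH:MM:SS:ff timestamp to milliseconds."""
    match = CLOCK.fullmatch(value.strip())
    if match is None:
        raise ValueError(f"unsupported timestamp: {value!r}")
    h, m, s, millis, frames = match.groups()
    total = ((int(h) * 60 + int(m)) * 60 + int(s)) * 1000
    if millis is not None:
        total += int(millis)
    else:
        # One frame lasts 1000 / 25 = 40 ms.
        total += int(frames) * 1000 // FPS
    return total


def cue_times(begin, end=None, dur=None):
    """Return (start_ms, end_ms); end and dur are mutually exclusive."""
    if (end is None) == (dur is None):
        # Assumption: exactly one of the two attributes must be present.
        raise ValueError("expected exactly one of end or dur")
    start = ttml_time_to_ms(begin)
    if end is not None:
        return start, ttml_time_to_ms(end)
    return start, start + ttml_time_to_ms(dur)


print(cue_times("00:00:01.000", end="00:00:04.000"))
print(cue_times("00:00:05.500", dur="00:00:02.500"))
print(cue_times("00:00:10.000", end="00:00:30:00"))
```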
iTT — iTunes Timed Text (.itt)
iTT is Apple's XML-based subtitle format, closely related to TTML.
Recognized elements
| Element / Attribute | Handling |
|---|---|
| <div>/<p> hierarchy | Standard structure |
| begin on <div> | Global time offset applied to all <p> timestamps |
| begin on <p> | Required start time |
| end on <p> | Required end time |
| Timestamp format HH:MM:SS:ff | Frame-based; frames are converted at 25 fps; negative values are supported |
| <br/> within <p> | Converted to a new text line within the same cue |
| <span> elements | Content is extracted; style attributes are ignored |
| More than 2 lines per cue | Allowed, but a warning is logged |
Not recognized
xml:lang attributes and span-level styling (font, color, etc.) are not processed.
Example
<?xml version="1.0" encoding="UTF-8"?>
<tt xmlns="http://www.w3.org/ns/ttml">
<body>
<div begin="00:00:00:00">
<p begin="00:00:01:00" end="00:00:04:00">
<span>First line.</span><br/><span>Second line.</span>
</p>
<p begin="-00:00:01:00" end="00:00:02:00">Negative begin time (offset by div).</p>
</div>
</body>
</tt>
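To make the offset and line-break handling concrete, here is a small sketch using Python's standard `xml.etree.ElementTree`. It assumes the `<div>` offset is added to each `<p>` timestamp and that frames are converted with integer arithmetic; `parse_itt` and its helpers are hypothetical names, not part of the service.

```python
import re
import xml.etree.ElementTree as ET

FPS = 25  # frame rate stated in the table above
NS = "{http://www.w3.org/ns/ttml}"
FRAME_TIME = re.compile(r"(-?)(\d{2,}):(\d{2}):(\d{2}):(\d{2})")


def itt_time_to_ms(value):
    """Convert a (possibly negative) HH:MM:SS:ff timestamp to milliseconds."""
    match = FRAME_TIME.fullmatch(value.strip())
    if match is None:
        raise ValueError(f"unsupported timestamp: {value!r}")
    sign, h, m, s, frames = match.groups()
    total = ((int(h) * 60 + int(m)) * 60 + int(s)) * 1000
    total += int(frames) * 1000 // FPS
    return -total if sign else total


def cue_lines(p):
    """Collect the text of a <p>, starting a new line at each <br/>."""
    lines = [""]

    def walk(node):
        lines[-1] += node.text or ""
        for child in node:
            if child.tag == NS + "br":
                lines.append("")
            else:
                # <span> content is extracted; its attributes are ignored.
                walk(child)
            lines[-1] += child.tail or ""

    walk(p)
    return [line.strip() for line in lines if line.strip()]


def parse_itt(xml_text):
    """Return a list of (start_ms, end_ms, [text lines]) tuples."""
    cues = []
    root = ET.fromstring(xml_text)
    for div in root.iter(NS + "div"):
        # Assumption: the <div> offset is added to every <p> timestamp.
        offset = itt_time_to_ms(div.get("begin", "00:00:00:00"))
        for p in div.iter(NS + "p"):
            start = offset + itt_time_to_ms(p.get("begin"))
            end = offset + itt_time_to_ms(p.get("end"))
            cues.append((start, end, cue_lines(p)))
    return cues


sample = (
    '<tt xmlns="http://www.w3.org/ns/ttml"><body>'
    '<div begin="00:00:10:00">'
    '<p begin="00:00:01:00" end="00:00:04:00">'
    "<span>First line.</span><br/><span>Second line.</span></p>"
    '<p begin="-00:00:01:00" end="00:00:02:00">Early cue.</p>'
    "</div></body></tt>"
)
print(parse_itt(sample))
```

With a 10-second `<div>` offset in this sample, the negative begin time resolves to a positive start of 9 seconds.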
PAC — Screen Electronics (.pac)
PAC is a binary subtitle format used in broadcast workflows.
Recognized elements
| Element | Handling |
|---|---|
| Timecodes | Parsed |
| Text content | Parsed |
| Position and justification | Parsed; mapped to alignment values in the output |
| Italicization | Parsed |
| Character set / code page | Detected automatically; see supported code pages below |
Supported character sets
| Code page | Script |
|---|---|
| Latin | Western European languages |
| Greek | Greek |
| Latin Czech | Central European languages |
| Arabic | Arabic |
| Hebrew | Hebrew |
| Thai | Thai |
| Cyrillic | Russian and related |
| Chinese Traditional | Traditional Chinese |
| Chinese Simplified | Simplified Chinese |
| Korean | Korean |
| Japanese | Japanese |
Duration alignment
After conversion, the subtitle track is adjusted to match the video stream duration. If the subtitles are longer than the video, trailing cues are removed. If they are shorter, an invisible padding cue is appended so the subtitle track covers the full video length.
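The adjustment can be pictured with the sketch below. The details are assumptions made for illustration only: a "trailing" cue is taken to be one that starts at or after the end of the video, and the padding cue is modelled as a cue with empty text that ends exactly at the video duration. The function name `align_to_video` is hypothetical.

```python
def align_to_video(cues, video_ms):
    """Trim or pad a cue list so it matches the video duration.

    cues: list of (start_ms, end_ms, text) tuples.
    video_ms: duration of the video stream in milliseconds.
    """
    # Assumption: cues starting at or after the video end are "trailing".
    kept = [cue for cue in cues if cue[0] < video_ms]
    last_end = max((end for _, end, _ in kept), default=0)
    if last_end < video_ms:
        # Assumption: an empty-text cue stands in for the invisible padding.
        kept.append((last_end, video_ms, ""))
    return kept


cues = [(1000, 4000, "First"), (5500, 8000, "Second")]

# Subtitles shorter than the video: a padding cue is appended.
print(align_to_video(cues, 10000))

# Subtitles longer than the video: the trailing cue is removed.
print(align_to_video(cues, 5000))
```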
Element support summary
| Element | SRT | WebVTT | TTML | iTT | PAC |
|---|---|---|---|---|---|
| Timecodes | Yes | Yes | Yes | Yes | Yes |
| Plain text | Yes | Yes | Yes | Yes | Yes |
| Multi-line cues / line breaks | Yes | Yes | Yes (<br/>) | Yes (<br/>) | Yes |
| Comments | — | Yes (NOTE, skipped) | — | — | — |
| Cue / sequence numbers | Ignored | Ignored | — | — | — |
| Cue settings (align, position…) | — | Preserved | Stripped | — | Converted |
| Inline styling | — | — | Stripped | Stripped | Converted |
| Regions | — | — | Stripped | — | — |
See also: Encoding Phase — Supported Input