Source Subtitle Formats

The Encoding Service supports five source subtitle formats. This article describes which elements within each format are recognized and how they are handled during processing.

Note: All source formats are converted to WebVTT internally. Styling, positioning, and region information from non-WebVTT formats is not preserved in the output. WebVTT cue settings (e.g. align, position, line, size) are preserved as-is.

SubRip (.srt)

SubRip is a plain-text format with a simple block structure.

Recognized elements

| Element | Handling |
| --- | --- |
| Sequence numbers (e.g. 1, 2, …) | Parsed and ignored |
| Timecodes (HH:MM:SS,mmm --> HH:MM:SS,mmm) | Required; milliseconds use a comma separator |
| Multi-line text | Supported; each line within a block becomes a separate text line |
| Empty line | Acts as block separator |

Not recognized

Comments, inline styling tags, and positioning information are not supported.

Example

1
00:00:01,000 --> 00:00:04,000
This is the first subtitle line.
A second line of the same cue.

2
00:00:05,500 --> 00:00:08,000
Another cue.
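
The block structure described above can be sketched in Python. This is a minimal illustration, not the Encoding Service's actual parser: the names `Cue` and `parse_srt` are hypothetical, and silently skipping blocks without a valid timing line is an assumed policy.

```python
import re
from typing import NamedTuple

# Matches "HH:MM:SS,mmm --> HH:MM:SS,mmm" (comma before the milliseconds).
TIMING = re.compile(
    r"(\d{2}):(\d{2}):(\d{2}),(\d{3})\s*-->\s*(\d{2}):(\d{2}):(\d{2}),(\d{3})"
)


class Cue(NamedTuple):
    start_ms: int
    end_ms: int
    lines: list[str]


def _to_ms(h: str, m: str, s: str, ms: str) -> int:
    """Convert the four timecode fields to milliseconds."""
    return ((int(h) * 60 + int(m)) * 60 + int(s)) * 1000 + int(ms)


def parse_srt(text: str) -> list[Cue]:
    cues = []
    # An empty line acts as the block separator.
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = block.splitlines()
        # The sequence number is parsed and ignored.
        if lines and "-->" not in lines[0]:
            lines = lines[1:]
        if not lines:
            continue
        match = TIMING.fullmatch(lines[0].strip())
        if match is None:
            continue  # assumed policy: skip blocks without a valid timing line
        g = match.groups()
        # Every remaining line becomes a separate text line of the cue.
        cues.append(Cue(_to_ms(*g[:4]), _to_ms(*g[4:]), lines[1:]))
    return cues
```

Applied to the example above, `parse_srt` yields two cues, the first spanning 1000 ms to 4000 ms with two text lines.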

WebVTT (.vtt)

WebVTT is a text-based format with richer structure than SRT.

Recognized elements

| Element | Handling |
| --- | --- |
| WEBVTT header | Required as the first line of the file |
| Cue numbers | Parsed and ignored |
| Timecodes ([HH:]MM:SS.mmm --> [HH:]MM:SS.mmm) | Required; the hours part is optional; both . and , are accepted as the millisecond separator |
| Cue settings (align, position, line, size) | Parsed and preserved in the output |
| NOTE comment blocks | Skipped (the comment and all subsequent lines up to the next empty line are ignored) |
| Multi-line cue text | Supported |

Not recognized

STYLE blocks (CSS) and REGION definitions are not processed.

Example

WEBVTT

NOTE This is a comment and will be ignored.

1
00:00:01.000 --> 00:00:04.000 align:center position:50%
This cue has positioning settings.

00:00:05.500 --> 00:00:08.000
A cue without a number is also valid.
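
The handling rules above (optional hours, either millisecond separator, skipped NOTE blocks, preserved cue settings) could look roughly like this in Python. The function name `parse_vtt` and the dictionary layout are illustrative assumptions, not the service's internal representation.

```python
import re

# [HH:]MM:SS.mmm, accepting "." or "," before the milliseconds.
STAMP = r"(?:(\d{2,}):)?(\d{2}):(\d{2})[.,](\d{3})"
TIMING = re.compile(rf"{STAMP}\s+-->\s+{STAMP}(?:\s+(.*))?")


def _to_ms(h, m, s, ms):
    """Convert timecode fields to milliseconds; a missing hours part counts as 0."""
    return ((int(h or 0) * 60 + int(m)) * 60 + int(s)) * 1000 + int(ms)


def parse_vtt(text):
    lines = text.splitlines()
    if not lines or not lines[0].startswith("WEBVTT"):
        raise ValueError("missing WEBVTT header on the first line")
    cues = []
    for block in re.split(r"\n\s*\n", "\n".join(lines[1:]).strip()):
        rows = block.splitlines()
        # NOTE blocks are skipped up to the next empty line.
        if not rows or rows[0].startswith("NOTE"):
            continue
        # A cue number before the timing line is parsed and ignored.
        if "-->" not in rows[0]:
            rows = rows[1:]
        if not rows:
            continue
        match = TIMING.fullmatch(rows[0].strip())
        if match is None:
            continue  # assumed policy: skip blocks without a valid timing line
        g = match.groups()
        # Cue settings are kept as key/value pairs so they can be written back unchanged.
        settings = dict(p.split(":", 1) for p in (g[8] or "").split() if ":" in p)
        cues.append({
            "start_ms": _to_ms(*g[0:4]),
            "end_ms": _to_ms(*g[4:8]),
            "settings": settings,
            "lines": rows[1:],
        })
    return cues
```

Run on the example above, the sketch returns two cues; the first carries the settings `align` and `position`, the second has none.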

TTML (.ttml)

TTML (Timed Text Markup Language) is an XML-based format.

Recognized elements

| Element / Attribute | Handling |
| --- | --- |
| <body>/<div>/<p> hierarchy | Standard structure; <p> elements are the cues |
| begin on <p> | Required start time |
| end on <p> | End time (mutually exclusive with dur) |
| dur on <p> | Duration; end time is calculated as begin + dur |
| Timestamp format HH:MM:SS.mmm | Standard format |
| Timestamp format HH:MM:SS:ff | Frame-based format; frames are converted at 25 fps |
| <br/> within <p> | Converted to a new text line within the same cue |
| <styling>, <layout>, <region>, inline tts:* attributes | Parsed as part of the XML but stripped from output |

Not recognized

<span> style attributes, TTML metadata, and a time offset on <div> are not processed.

Example

<?xml version="1.0" encoding="UTF-8"?>
<tt xml:lang="en" xmlns="http://www.w3.org/ns/ttml">
  <body>
    <div>
      <p begin="00:00:01.000" end="00:00:04.000">First line.<br/>Second line.</p>
      <p begin="00:00:05.500" dur="00:00:02.500">Using dur instead of end.</p>
      <p begin="00:00:10.000" end="00:00:30:00">Frame-based end time at 25 fps.</p>
    </div>
  </body>
</tt>
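
The timing rules above (two timestamp formats, frames converted at 25 fps, end computed as begin + dur) can be sketched with Python's standard `xml.etree.ElementTree`. The helper names `parse_time` and `ttml_cues` are hypothetical, and the sketch ignores everything the table lists as stripped.

```python
import re
import xml.etree.ElementTree as ET

FPS = 25  # frame rate used for HH:MM:SS:ff timestamps, per the table above
TTML_NS = "{http://www.w3.org/ns/ttml}"


def parse_time(value: str) -> int:
    """Return milliseconds for HH:MM:SS.mmm or frame-based HH:MM:SS:ff."""
    m = re.fullmatch(r"(\d+):(\d{2}):(\d{2})(?:\.(\d{1,3})|:(\d{2}))?", value.strip())
    if m is None:
        raise ValueError(f"unsupported timestamp: {value!r}")
    h, mi, s, frac, frames = m.groups()
    ms = ((int(h) * 60 + int(mi)) * 60 + int(s)) * 1000
    if frac is not None:
        ms += int(frac.ljust(3, "0"))      # ".5" means 500 ms
    elif frames is not None:
        ms += int(frames) * 1000 // FPS    # one frame is 40 ms at 25 fps
    return ms


def ttml_cues(xml_text: str):
    """Yield (start_ms, end_ms, lines) for every <p> element."""
    root = ET.fromstring(xml_text)
    for p in root.iter(f"{TTML_NS}p"):
        start = parse_time(p.attrib["begin"])
        if "end" in p.attrib:
            end = parse_time(p.attrib["end"])
        else:
            end = start + parse_time(p.attrib["dur"])  # end = begin + dur
        parts = [p.text or ""]
        for child in p:
            if child.tag == f"{TTML_NS}br":
                parts.append("\n")                      # <br/> starts a new text line
            else:
                parts.append("".join(child.itertext()))
            parts.append(child.tail or "")
        yield start, end, "".join(parts).strip().split("\n")
```

For the example above, the second cue resolves to 5500 ms through 8000 ms, and the frame-based end time `00:00:30:00` resolves to 30000 ms.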

iTT — iTunes Timed Text (.itt)

iTT is Apple's XML-based subtitle format, closely related to TTML.

Recognized elements

| Element / Attribute | Handling |
| --- | --- |
| <div>/<p> hierarchy | Standard structure |
| begin on <div> | Global time offset applied to all <p> timestamps |
| begin on <p> | Required start time |
| end on <p> | Required end time |
| Timestamp format HH:MM:SS:ff | Frame-based; frames are converted at 25 fps; negative values are supported |
| <br/> within <p> | Converted to a new text line within the same cue |
| <span> elements | Content is extracted; style attributes are ignored |
| More than 2 lines per cue | Allowed, but a warning is logged |

Not recognized

xml:lang attributes and span-level styling (font, color, etc.) are not processed.

Example

<?xml version="1.0" encoding="UTF-8"?>
<tt xmlns="http://www.w3.org/ns/ttml">
  <body>
    <div begin="00:00:00:00">
      <p begin="00:00:01:00" end="00:00:04:00">
        <span>First line.</span><br/><span>Second line.</span>
      </p>
      <p begin="-00:00:01:00" end="00:00:02:00">Negative begin time (offset by div).</p>
    </div>
  </body>
</tt>
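
How the <div> offset combines with signed, frame-based <p> timestamps can be illustrated with a short Python sketch. The function names are hypothetical, and the sketch returns the raw sum; what the service does when the result is still negative is not stated here, so no clamping is applied.

```python
import re

FPS = 25  # frames are converted at 25 fps, per the table above


def parse_frame_time(value: str) -> int:
    """Return signed milliseconds for [-]HH:MM:SS:ff."""
    m = re.fullmatch(r"(-?)(\d+):(\d{2}):(\d{2}):(\d{2})", value.strip())
    if m is None:
        raise ValueError(f"unsupported timestamp: {value!r}")
    sign, h, mi, s, ff = m.groups()
    ms = ((int(h) * 60 + int(mi)) * 60 + int(s)) * 1000 + int(ff) * 1000 // FPS
    return -ms if sign else ms


def resolve_cue_times(div_begin, p_begin, p_end):
    """Apply the <div> begin offset (if any) to a cue's begin and end."""
    offset = parse_frame_time(div_begin) if div_begin else 0
    return offset + parse_frame_time(p_begin), offset + parse_frame_time(p_end)
```

With a <div> offset of `00:01:00:00`, a cue with `begin="-00:00:01:00"` and `end="00:00:02:00"` resolves to 59000 ms through 62000 ms.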

PAC — Screen Electronics (.pac)

PAC is a binary subtitle format used in broadcast workflows.

Recognized elements

| Element | Handling |
| --- | --- |
| Timecodes | Parsed |
| Text content | Parsed |
| Position and justification | Parsed; mapped to alignment values in the output |
| Italicization | Parsed |
| Character set / code page | Detected automatically; see supported code pages below |

Supported character sets

| Code page | Script |
| --- | --- |
| Latin | Western European languages |
| Greek | Greek |
| Latin Czech | Central European languages |
| Arabic | Arabic |
| Hebrew | Hebrew |
| Thai | Thai |
| Cyrillic | Russian and related |
| Chinese Traditional | Traditional Chinese |
| Chinese Simplified | Simplified Chinese |
| Korean | Korean |
| Japanese | Japanese |

Duration alignment

After conversion, the subtitle track is adjusted to match the video stream duration. If the subtitles are longer than the video, trailing cues are removed. If they are shorter, an invisible padding cue is appended so the subtitle track covers the full video length.
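
The adjustment can be pictured with the following Python sketch. Several details are assumptions rather than documented behavior: a cue that straddles the end of the video is clipped instead of removed, the padding cue is modeled as a cue with empty text, and it spans from the last cue's end to the video end. The names `Cue` and `align_to_duration` are illustrative.

```python
from dataclasses import dataclass, replace


@dataclass
class Cue:
    start_ms: int
    end_ms: int
    text: str


def align_to_duration(cues: list[Cue], video_ms: int) -> list[Cue]:
    """Trim or pad a cue list so it covers exactly the video duration."""
    # Remove trailing cues that start at or after the end of the video.
    kept = [c for c in cues if c.start_ms < video_ms]
    # Assumption: a cue running past the end is clipped to the video length.
    kept = [replace(c, end_ms=min(c.end_ms, video_ms)) for c in kept]
    last_end = max((c.end_ms for c in kept), default=0)
    if last_end < video_ms:
        # Assumption: the invisible padding cue is a cue with empty text.
        kept.append(Cue(last_end, video_ms, ""))
    return kept
```

For a 10-second video, a cue starting at 15 s is dropped, while a track whose last cue ends at 1 s gains an empty cue from 1 s to 10 s.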

Element support summary

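| Element | SRT | WebVTT | TTML | iTT | PAC |
| --- | --- | --- | --- | --- | --- |
| Timecodes | Yes | Yes | Yes | Yes | Yes |
| Plain text | Yes | Yes | Yes | Yes | Yes |
| Multi-line cues / line breaks | Yes | Yes | Yes (<br/>) | Yes (<br/>) | Yes |
| Comments | | Yes (NOTE, skipped) | | | |
| Cue / sequence numbers | Ignored | Ignored | | | |
| Cue settings (align, position…) | | Preserved | Stripped | | Converted |
| Inline styling | | | Stripped | Stripped | Converted |
| Regions | | | Stripped | | |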

See also: Encoding Phase — Supported Input
