tpwo.github.io

A personal blog

Convert dvdsub into srt subtitles


I wanted to extract subtitles from a bunch of .mp4 files, because they contained the names of the songs playing in the background of the movies.

This was more complicated than I thought.

It turns out video subtitles can come in very different formats. One of them is dvdsub, and it’s a bitmap subtitle format.

To convert it into something you can grep, you have to go through several steps, including OCR.

Fortunately, good people created tools to do just that.

ffmpeg for the win

I have loved ffmpeg ever since I discovered it many years ago. I really don’t like bloated, restrictive GUI apps for manipulating video and audio files.

It can help with subtitles as well.

To install it on macOS you can use Homebrew:

brew install ffmpeg

On Linux it should be available in your package manager. On Windows, the best solution is probably WSL2.

Check your subs

ffmpeg can help you verify that your subs are actually in dvdsub format. Use -i and point it at the video file:

ffmpeg -i <video-file>

e.g. ffmpeg -i video.mp4

Output should look like this:

<...snip...>
  Stream #0:0[0x1](eng): Video: mpeg4 (Simple Profile) (mp4v / 0x7634706D), yuv420p, 448x336 [SAR 1:1 DAR 4:3], 825 kb/s, 29.97 fps, 29.97 tbr, 30k tbn (default)
      Metadata:
        creation_time   : 2007-09-27T17:47:49.000000Z
        handler_name    : Video Media Handler
        vendor_id       : [0][0][0][0]
  Stream #0:1[0x2](eng): Audio: aac (LC) (mp4a / 0x6134706D), 44100 Hz, stereo, fltp, 74 kb/s (default)
      Metadata:
        creation_time   : 2007-09-27T17:47:49.000000Z
        handler_name    : Sound Media Handler
        vendor_id       : [0][0][0][0]
  Stream #0:2[0x3](eng): Subtitle: dvd_subtitle (dvdsub) (mp4s / 0x7334706D), 720x480, 0 kb/s (default)
      Metadata:
        creation_time   : 2007-09-27T17:47:57.000000Z
        handler_name    : Unspecified
At least one output file must be specified

The part we’re interested in is Subtitle: dvd_subtitle (dvdsub). This confirms that our subtitles are bitmap-based, and are one of the three streams in the file.

The number after Stream # has the form <input-file>:<stream>, and both parts are 0-based. The subtitle stream here is 0:2, i.e. the third stream of the first (and only) input file.

This stream index is required in the next step.
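
If you’d rather not read the index off by eye, it can be pulled out of the listing with sed. This is only a minimal sketch: dvdsub_specifiers is a hypothetical helper name, and it assumes the stream lines look like the ones in the output above.

```shell
# Print the stream index (e.g. 0:2) of every dvd_subtitle stream
# found in ffmpeg's stream listing, which is read from stdin.
dvdsub_specifiers() {
    sed -n 's/^ *Stream #\([0-9]*:[0-9]*\).*Subtitle: dvd_subtitle.*/\1/p'
}

# Sample input: two lines copied from the output above.
sample='  Stream #0:1[0x2](eng): Audio: aac (LC) (mp4a / 0x6134706D), 44100 Hz, stereo, fltp, 74 kb/s (default)
  Stream #0:2[0x3](eng): Subtitle: dvd_subtitle (dvdsub) (mp4s / 0x7334706D), 720x480, 0 kb/s (default)'

printf '%s\n' "$sample" | dvdsub_specifiers

# With a real file (ffmpeg prints the listing on stderr):
#   ffmpeg -i video.mp4 2>&1 | dvdsub_specifiers
```

For the sample lines this prints 0:2, and it prints nothing when the listing has no dvd_subtitle stream.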

Extract subs into separate container

ffmpeg comes in handy again:

ffmpeg -i <video-file> -map <stream-index> -c:s copy <output-file>.mkv

e.g.

ffmpeg -i video.mp4 -map 0:2 -c:s copy output.mkv

Convert mkv into sub and idx files

This step requires mkvextract. Again, you can install it with Homebrew:

brew install mkvtoolnix

Then mkvextract should be available. You can use it to create the sub and idx files from the mkv:

mkvextract tracks <output-file>.mkv 0:<output-file>.sub

e.g. mkvextract tracks output.mkv 0:output.sub

We do this because the OCR tool used in the next step reads the sub and idx pair, not mkv.
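
Since the next step needs both files, it can be worth checking that the pair is complete before starting OCR. A minimal sketch, assuming the files share a base name as in the example above; has_vobsub_pair is a hypothetical helper, not part of mkvtoolnix.

```shell
# Succeed only if both halves of the sub/idx pair exist for a base name.
has_vobsub_pair() {
    [ -f "$1.sub" ] && [ -f "$1.idx" ]
}

# Demo in a throwaway directory with empty stand-in files.
demo_dir=$(mktemp -d)
touch "$demo_dir/output.sub"

if has_vobsub_pair "$demo_dir/output"; then
    echo "pair complete"
else
    echo "missing .sub or .idx"   # this branch runs: only the .sub exists so far
fi

touch "$demo_dir/output.idx"
has_vobsub_pair "$demo_dir/output" && echo "pair complete"

rm -r "$demo_dir"
```

Note that the helper takes the base name without an extension, e.g. has_vobsub_pair output.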

Use OCR to get srt file

This step took me the most time, as the tool we’re using here is pretty old, and I had to manually clone the repo and tweak it to be able to compile it.

Install VobSub2SRT

You can use my fork, so you don’t have to apply the fix yourself. The "fix" was bumping the minimum required version of cmake, which I did in this one-liner.

Use my homebrew tap

I was able to set up my own Homebrew tap, so it’s possible to install my fixed version with these two commands. Homebrew will take care of the build dependencies (at least cmake and tesseract):

brew tap tpwo/vobsub2srt
brew install --HEAD tpwo/vobsub2srt/vobsub2srt

Setting this up was new for me. It turns out that Homebrew expects a tap to live in a repo named <user>/homebrew-<name>, so I created one. This repo contains the formula to install VobSub2SRT from my fork.

Manual installation

If you prefer manual installation, it’s also possible. Before compilation, you have to install build dependencies:

brew install cmake
brew install tesseract

Then you can clone my fork and compile the tool:

git clone https://github.com/tpwo/VobSub2SRT
cd VobSub2SRT

# Then you can follow original README
./configure
make

# `vobsub2srt` should be present in `build/bin`

The original README suggests sudo make install, but I just grabbed the compiled binary and moved it to ~/.local/bin, which is on my PATH.

Then you can finally convert the sub and idx files into an srt file:

# Note we don't pass file extension here
vobsub2srt <output-file>

e.g. vobsub2srt output
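
Because the extension has to be left off, shell parameter expansion is handy when you start from a file name. A minimal sketch; vobsub_base is a hypothetical helper introduced only for illustration.

```shell
# Turn a path to a .sub (or .idx) file into the extension-less
# base name that is passed to vobsub2srt.
vobsub_base() {
    case "$1" in
        *.sub) printf '%s\n' "${1%.sub}" ;;
        *.idx) printf '%s\n' "${1%.idx}" ;;
        *)     printf '%s\n' "$1" ;;      # already a base name
    esac
}

vobsub_base music/output.sub    # prints: music/output
vobsub_base output              # prints: output

# Intended use (not run here):
#   vobsub2srt "$(vobsub_base music/output.sub)"
```

Only a trailing .sub or .idx is removed, so dots elsewhere in the name are left alone.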

Script to run it all with one command

I had a bunch of videos with embedded subtitles, so I created a Bash script that I could quickly run in a for f in * loop:

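#!/usr/bin/env bash
# Pass the video file as the only argument

title="$1"
# Strip the last extension: video.mp4 -> video
title_wo_suffix="${title%.*}"

outdir=music

# ffmpeg prints the stream list on stderr; take the index (e.g. 0:2)
# of the first dvdsub stream
stream=$(ffmpeg -i "$title" 2>&1 | grep 'dvdsub' | head -n 1 | awk -F '[#\[]' '{print $2}')

if [[ -z "$stream" ]]; then
    echo 'No dvdsub subtitles track in the video!' >&2
    exit 1
fi

mkdir -p "$outdir"

# Copy the subtitle stream into its own mkv container
ffmpeg -i "$title" -map "$stream" -c:s copy "$outdir"/"$title_wo_suffix".mkv

# Extract the sub and idx files from it
mkvextract tracks "$outdir"/"$title_wo_suffix".mkv 0:"$outdir"/"$title_wo_suffix".sub

# OCR them into an srt file (no file extension here)
vobsub2srt "$outdir"/"$title_wo_suffix"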

To run it, save it under <name>, do chmod +x <name>, and run:

./<name> <video-file>.<ext>

e.g. ./extract-subs.sh video.mp4

To run it for all .mp4 files in the current folder, for example, do:

for f in *.mp4; do ./extract-subs.sh "$f"; done
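
One thing to keep in mind with a plain loop like this: when no file matches, the shell leaves the pattern as the literal string *.mp4 and the script is run on it once. A slightly more defensive variant might look like the sketch below. for_each_mp4 is a hypothetical helper, and the demo uses echo in place of the extraction script.

```shell
# Run a command once per .mp4 file in a directory.
# Usage: for_each_mp4 <dir> <command> [args...]
for_each_mp4() {
    dir=$1
    shift
    for f in "$dir"/*.mp4; do
        # Without a match the pattern stays literal, so skip names that don't exist.
        [ -e "$f" ] || continue
        # Keep going when one file fails (e.g. it has no subtitle track).
        "$@" "$f" || echo "skipped: $f" >&2
    done
}

# Demo with echo standing in for the extraction script.
demo_dir=$(mktemp -d)
touch "$demo_dir/a.mp4" "$demo_dir/b b.mp4" "$demo_dir/notes.txt"
for_each_mp4 "$demo_dir" echo "would process:"
rm -r "$demo_dir"

# Real use (assumes the script was saved as ./extract-subs.sh):
#   for_each_mp4 . ./extract-subs.sh
```

Quoting "$f" keeps file names with spaces intact, and the || branch means one video without subtitles doesn’t stop the whole batch.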
