Convert dvdsub into srt subtitles
Written on
I wanted to extract subtitles from a bunch of .mp4 files, as they contained the names of the songs playing in the background of the movie.
This was more complicated than I thought.
It turns out video subtitles can come in very different formats.
One of them is dvdsub, a bitmap subtitle format: the subtitles are stored as images, not text.
To convert them into something you can grep, you have to go through several steps, including OCR.
Fortunately, good people created tools to do just that.
ffmpeg for the win
I’ve loved ffmpeg ever since I discovered it many years ago.
I really don’t like bloated, constrained GUI apps for manipulating video and audio files.
ffmpeg can help with subtitles as well.
To install it on macOS, you can use Homebrew:
brew install ffmpeg
On Linux, it should be available in your package manager. On Windows, the best option is probably WSL2.
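The later steps also rely on mkvextract and vobsub2srt, so it can help to check up front that everything is on your PATH. A minimal sketch; `check_tools` is a hypothetical helper name, and it only uses the shell builtin `command -v`:

```shell
#!/usr/bin/env bash
# Hypothetical helper: report every command from the argument list
# that is not on PATH. Returns 0 if all are found, 1 otherwise.
check_tools() {
    local missing=0 tool
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Usage (the tools used in this post):
# check_tools ffmpeg mkvextract vobsub2srt
```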
Check your subs
ffmpeg can help you verify that your subs are actually in dvdsub format.
Use -i and point it to the video file:
ffmpeg -i <video-file>
e.g. ffmpeg -i video.mp4
Output should look like this:
<...snip...>
  Stream #0:0[0x1](eng): Video: mpeg4 (Simple Profile) (mp4v / 0x7634706D), yuv420p, 448x336 [SAR 1:1 DAR 4:3], 825 kb/s, 29.97 fps, 29.97 tbr, 30k tbn (default)
    Metadata:
      creation_time : 2007-09-27T17:47:49.000000Z
      handler_name : Video Media Handler
      vendor_id : [0][0][0][0]
  Stream #0:1[0x2](eng): Audio: aac (LC) (mp4a / 0x6134706D), 44100 Hz, stereo, fltp, 74 kb/s (default)
    Metadata:
      creation_time : 2007-09-27T17:47:49.000000Z
      handler_name : Sound Media Handler
      vendor_id : [0][0][0][0]
  Stream #0:2[0x3](eng): Subtitle: dvd_subtitle (dvdsub) (mp4s / 0x7334706D), 720x480, 0 kb/s (default)
    Metadata:
      creation_time : 2007-09-27T17:47:57.000000Z
      handler_name : Unspecified
At least one output file must be specified
The part we’re interested in is Subtitle: dvd_subtitle (dvdsub).
This confirms that our subtitles are bitmap-based and are one of the three streams in the file.
The stream index is the number after Stream #, i.e. 0:2 for the subtitle stream. Both numbers are 0-based: the first is the input file, the second is the stream within that file.
You’ll need this index in the next step.
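If you’d rather not read the index off by eye, a small Bash function can pull it out of ffmpeg’s output. A minimal sketch, assuming the `Stream #<file>:<stream>...: Subtitle: dvd_subtitle` line format shown above; `first_dvdsub_stream` is a hypothetical name, and it uses Bash’s `[[ =~ ]]` regex matching instead of the awk call used in the script further down:

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `ffmpeg -i` output on stdin and print the
# "<file>:<stream>" specifier (e.g. 0:2) of the first dvdsub stream.
# Returns 1 if no dvdsub stream is found.
first_dvdsub_stream() {
    local line
    local re='Stream #([0-9]+:[0-9]+).*Subtitle: dvd_subtitle'
    # The "|| [[ -n $line ]]" part also handles a last line without a newline
    while IFS= read -r line || [[ -n $line ]]; do
        if [[ $line =~ $re ]]; then
            echo "${BASH_REMATCH[1]}"
            return 0
        fi
    done
    return 1
}

# Usage: ffmpeg prints stream info to stderr, hence 2>&1
# ffmpeg -i video.mp4 2>&1 | first_dvdsub_stream
```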
Extract subs into a separate container
ffmpeg comes in handy again. Here -map picks only the subtitle stream, and -c:s copy copies it as-is, without re-encoding:
ffmpeg -i <video-file> -map <stream-index> -c:s copy <output-file>.mkv
e.g. ffmpeg -i video.mp4 -map 0:2 -c:s copy output.mkv
Convert mkv into sub and idx files
This step requires mkvextract, which is part of MKVToolNix. Again, you can install it with Homebrew:
brew install mkvtoolnix
Then mkvextract should be available.
You can use it to create sub and idx files from the mkv file:
mkvextract tracks <output-file>.mkv 0:<output-file>.sub
e.g. mkvextract tracks output.mkv 0:output.sub
We do this because the OCR tool works with this format (a .sub/.idx pair) rather than mkv.
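The 0 in 0:output.sub is a track ID, and it points at the subtitles only because the intermediate mkv holds nothing else. If you want to double-check that before running mkvextract, you can inspect what ffmpeg reports for the mkv. A minimal sketch; `is_single_dvdsub` is a hypothetical helper, and it assumes ffmpeg’s usual `Stream #` line prefix:

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `ffmpeg -i` output on stdin and succeed only if
# exactly one stream is listed and it is a dvd_subtitle stream.
is_single_dvdsub() {
    local out streams
    out=$(cat)
    streams=$(grep -c 'Stream #' <<<"$out")
    [[ $streams -eq 1 ]] && grep -q 'Subtitle: dvd_subtitle' <<<"$out"
}

# Usage (for the mkv produced in the previous step):
# ffmpeg -i output.mkv 2>&1 | is_single_dvdsub && echo 'track 0 is the subtitle track'
```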
Use OCR to get an srt file
This step took me the longest, as the tool we’re using here is pretty old, and I had to manually clone the repo and tweak it before it would compile.
Install VobSub2SRT
You can use my fork, so you don’t have to apply the fix yourself.
The "fix" was bumping the minimum required cmake version, which I did in this one-liner.
Use my Homebrew tap
I set up a personal Homebrew tap, so you can install my fixed version with these two commands.
Homebrew will take care of the build dependencies (at least cmake and tesseract):
brew tap tpwo/vobsub2srt
brew install --HEAD tpwo/vobsub2srt/vobsub2srt
Setting this up was new for me.
It turns out that brew tap <user>/<name> expects a GitHub repo named <user>/homebrew-<name>, so I created one.
This repo contains the formula that installs VobSub2SRT from my fork.
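The naming rule is mechanical: the repo name is the tap name with a homebrew- prefix. A minimal sketch of that mapping; `tap_repo_url` is a hypothetical function, purely to illustrate the convention, and it assumes the default GitHub location:

```shell
#!/usr/bin/env bash
# Hypothetical helper: map a tap name like "tpwo/vobsub2srt" to the GitHub
# repo that `brew tap` clones for it by default ("homebrew-" prefix on the repo part).
tap_repo_url() {
    local user="${1%%/*}" name="${1#*/}"
    echo "https://github.com/${user}/homebrew-${name}"
}

# Usage:
# tap_repo_url tpwo/vobsub2srt
```

For this tap, that gives https://github.com/tpwo/homebrew-vobsub2srt.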
Manual installation
If you prefer to install it manually, that’s also possible. Before compiling, you have to install the build dependencies:
brew install cmake
brew install tesseract
Then you can clone my fork and compile the tool:
git clone https://github.com/tpwo/VobSub2SRT
cd VobSub2SRT
# Then you can follow the original README
./configure
make
# `vobsub2srt` should be present in `build/bin`
The original README suggests sudo make install, but I just grabbed the compiled binary and moved it to ~/.local/bin, which is on my PATH.
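Moving the binary to ~/.local/bin only helps if that directory is actually on your PATH. A minimal sketch to check it; `in_path` is a hypothetical helper name, and it does an exact string match on the colon-separated entries (no ~ expansion or path normalization):

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if directory $1 is one of the colon-separated
# entries of $2 (defaults to $PATH). Exact string match only.
in_path() {
    local dir="$1" search_path="${2:-$PATH}"
    case ":$search_path:" in
        *":$dir:"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage:
# in_path "$HOME/.local/bin" || echo 'add ~/.local/bin to PATH in your shell rc file'
```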
Then you can finally convert the sub and idx files into an srt file:
# Note: we don't pass the file extension here
vobsub2srt <output-file>
e.g. vobsub2srt output
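Since vobsub2srt takes the base name and works on the .sub/.idx pair, a missing .idx is an easy way to get a confusing failure. A minimal sketch of a pre-flight check; `vobsub_pair_ready` is a hypothetical helper, and accepting a name that already ends in .sub or .idx is just a convenience assumption:

```shell
#!/usr/bin/env bash
# Hypothetical helper: accept "output", "output.sub" or "output.idx", print the
# base name vobsub2srt expects, and fail if either half of the pair is missing.
vobsub_pair_ready() {
    local base="$1"
    case "$base" in
        *.sub|*.idx) base="${base%.*}" ;;  # drop the extension if one was given
    esac
    if [[ ! -f "$base.sub" || ! -f "$base.idx" ]]; then
        echo "missing $base.sub or $base.idx" >&2
        return 1
    fi
    echo "$base"
}

# Usage:
# base=$(vobsub_pair_ready output.sub) && vobsub2srt "$base"
```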
Script to run it all with one command
I had a bunch of videos with embedded subtitles, so I wrote a Bash script that I could run on each of them in a for f in * loop:
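#!/usr/bin/env bash
# Usage: ./extract-subs.sh <video-file>
set -e

title="$1"
title_wo_suffix="${title%.*}"  # file name without extension
outdir=music

# Specifier of the first dvdsub stream, e.g. 0:2 (ffmpeg prints stream info to stderr)
stream=$(ffmpeg -i "$title" 2>&1 | grep 'dvdsub' | awk -F '[#\[]' '{print $2}' | head -n 1)
if [[ -z "$stream" ]]; then
    echo 'No subtitles track in the video!' >&2
    exit 1
fi

mkdir -p "$outdir"
# Copy only the subtitle stream into its own mkv container
ffmpeg -i "$title" -map "$stream" -c:s copy "$outdir"/"$title_wo_suffix".mkv
# Track 0 is the only track in that mkv; this writes the .sub/.idx pair
mkvextract tracks "$outdir"/"$title_wo_suffix".mkv 0:"$outdir"/"$title_wo_suffix".sub
# OCR the .sub/.idx pair into .srt (base name, no extension)
vobsub2srt "$outdir"/"$title_wo_suffix"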
To run it, save it under <name> (e.g. extract-subs.sh), make it executable with chmod +x <name>, and run:
./<name> <video-file>.<ext>
e.g. ./extract-subs.sh video.mp4
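The script writes to music/<title_wo_suffix>, so it assumes the video sits in the current folder: for videos/video.mp4 it would try to write music/videos/video.mkv, and music/videos doesn’t exist. A minimal sketch of a variant that keeps only the file name; `out_base` is a hypothetical helper, not part of the script:

```shell
#!/usr/bin/env bash
# Hypothetical helper: build the output base path (no extension) for a video,
# keeping only the file name so inputs from other folders still land in outdir.
out_base() {
    local outdir="$1" title="$2"
    local name="${title##*/}"   # strip any leading directories
    echo "$outdir/${name%.*}"   # strip the last extension only
}

# Usage:
# base=$(out_base music "$title")
```

In the script, you would then pass "$base".mkv, "$base".sub and "$base" to the three commands instead of building the paths from title_wo_suffix.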
To run it for, e.g., all .mp4 files in the current folder:
for f in *.mp4; do ./extract-subs.sh "$f"; done
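One caveat with this loop: if the folder has no .mp4 files, Bash leaves *.mp4 unexpanded and the script gets called with the literal string *.mp4. A minimal sketch of a more defensive runner, using Bash’s nullglob option and counting failures instead of silently moving on; `run_all` is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a command once per file, print how many runs failed,
# and return non-zero if any of them did.
# Usage: run_all <command> <file>...
run_all() {
    local cmd="$1" f failed=0
    shift
    for f in "$@"; do
        "$cmd" "$f" || { echo "failed: $f" >&2; failed=$((failed + 1)); }
    done
    echo "$failed failed"
    (( failed == 0 ))
}

# Usage: with nullglob, *.mp4 expands to nothing when there are no matches
# shopt -s nullglob
# run_all ./extract-subs.sh *.mp4
```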
Sources and further reading
- Original VobSub2SRT repo (I forked a fork!): https://github.com/ruediger/VobSub2SRT
- Tutorial describing an alternative flow with ffmes. I figured VobSub2SRT is easier to run on an Apple Silicon MacBook: https://subarashii-no-fansub.github.io/Subbing-Tutorial/Convert-vobsub-file-into-srt/