
How to automatically create captions for your videos using free tools

If you've ever had to manually add captions and timestamps to a long video, you'll understand how painstaking the process can be. There are various free and professional products that can recognize speech and transcribe the audio for you to manipulate with captioning software. We'll be outlining one such method using YouTube's powerful automatic captioning AI and the free and open source software youtube-dl.

Setting expectations

I'd like to set expectations for any speech-to-text solution. None of them work 100% of the time. There will be transcription mistakes that need to be reviewed and edited. However, the bulk of the audio will be transcribed just fine, saving you a bunch of time and headache. Secondly, I'm assuming that you need captions for a video that isn't necessarily going to be viewed on a video sharing site like YouTube. Sometimes you need the captions for other purposes, which is our focus here.

Once you've created your content

The first thing you will do is upload your video file to YouTube. I like using the YouTube transcriber since it also automatically adds timestamps to your captions, which you can in turn download using the youtube-dl program described later. As an aside, you can optionally use Watson (yes, the Jeopardy! contestant AI) to do the transcribing for you as well, but you'll then have to create the timestamps yourself.

We'll be uploading video files in private mode. This keeps the video files under our control and doesn't share them with the world. Remember, I'm only having YouTube do the quick and dirty transcribing for a video file I want to have mastered locally on my computer. Once it's transcribed, you can remove it from YouTube entirely if you desire. Once you've created your video file on your computer, do the following.

  1. Navigate to YouTube and log in.
  2. Click the Upload Arrow in the top right.
  3. Set the dropdown to Private.
  4. Drag and drop your video file onto your browser window.
  5. Wait for your video to upload and process.
  6. Once completed, click Done.

Now you wait for YouTube to do its thing, transcribing the video's audio into captions using its AI. How long this takes depends on how busy YouTube's servers are. I've had hour-long videos transcribed in 10 minutes. I've also had ten-minute videos take a few hours. Patience is key here. You'll know that captions were automatically added when you are able to select the CC option at the bottom of the video.

Extracting the automatically added captions

To get access to the automatically created captions, we need to use youtube-dl. youtube-dl is free and open source software that you can download directly from the maintainers, or you can use a package manager such as Brew to download the binaries. For a complete guide on how to install the Brew package manager, which gives you access to hundreds of amazing free and open source programs right from your terminal, check out our Brew install guide. Assuming you already have Brew installed, do the following.

  1. Open terminal.
  2. Type in brew install youtube-dl.
  3. Hit enter.
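
Before running the install, it can help to check whether youtube-dl is already on your system. Here's a minimal sketch of that check. The have_tool helper is a name I made up for illustration; it isn't part of Brew or youtube-dl.

```shell
# have_tool is a hypothetical helper: it succeeds if the named command is on the PATH.
have_tool() {
  command -v "$1" >/dev/null 2>&1
}

if have_tool youtube-dl; then
  # Already installed: print the version so you know what you are running.
  youtube-dl --version
else
  echo "youtube-dl not found. Install it with: brew install youtube-dl"
fi
```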

Once installed, we can now use youtube-dl to download our captions. Since we've uploaded our video file and set it to private, we'll have to use our YouTube credentials in order to access the video file and extract the captions and timestamps. We'll also avoid re-downloading the video file since we already have it on our computer. This is how we do that.

  1. Open terminal.
  2. Type the following command, substituting your own information: youtube-dl --write-auto-sub --skip-download -u YourYouTubeUserName -p YourYouTubePassword http://theyoutubeURL
  3. Hit enter.

The command broken down is:

  • The program name youtube-dl.
  • The option to get the automatic captions --write-auto-sub.
  • The option to not re-download the video --skip-download.
  • Adding your username to get access to the private video file -u YourYouTubeUserName.
  • The password for your username -p YourYouTubePassword.
  • And finally, the YouTube URL where the video lives on YouTube servers http://theyoutubeURL.

Adjust your information as needed.
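
If you'd rather not type your password in plain text, where it ends up in your shell history, one option is to leave out -p. youtube-dl then asks for the password interactively. Here's a rough sketch of the same command wrapped in a small function. The function name fetch_auto_captions and the YTDL variable are made up for this example, and the username and URL are the same placeholders used above.

```shell
# YTDL is a hypothetical variable naming the program to run; it defaults to youtube-dl.
YTDL="${YTDL:-youtube-dl}"

# fetch_auto_captions USERNAME URL
# Downloads only the auto-generated captions for a video, not the video itself.
# -p is left out on purpose so youtube-dl prompts for the password.
fetch_auto_captions() {
  "$YTDL" --write-auto-sub --skip-download -u "$1" "$2"
}

# Usage (placeholders; substitute your own details):
# fetch_auto_captions YourYouTubeUserName http://theyoutubeURL
```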

Once you hit enter, your captions will be saved with timestamps in WebVTT format (a .vtt file) in the terminal's current working directory.
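
The .vtt file mixes the caption text with a header, timestamp lines, and inline timing tags. If all you want is a readable transcript to proofread, a small awk filter can strip those out. This is only a sketch under assumptions: vtt_to_text is a name I made up, and it assumes the file looks like typical YouTube auto-captions, with no numbered cue identifiers or NOTE blocks.

```shell
# vtt_to_text FILE
# Hypothetical helper: prints only the spoken text from a WebVTT caption file.
vtt_to_text() {
  awk '
    /^WEBVTT/          { header = 1; next }   # start of the file header
    header && NF == 0  { header = 0; next }   # first blank line ends the header
    header             { next }               # skip header lines such as "Kind:" and "Language:"
    /-->/              { next }               # skip timestamp lines
    {
      gsub(/<[^>]*>/, "")                     # remove inline tags such as <c> and <00:00:01.000>
      gsub(/^[ \t]+|[ \t]+$/, "")             # trim leading and trailing whitespace
    }
    $0 == ""           { next }               # skip lines left empty
    $0 != prev         { print; prev = $0 }   # drop consecutive repeated lines
  ' "$1"
}
```

You would run it as vtt_to_text captions.vtt > transcript.txt, where captions.vtt stands in for whatever file name youtube-dl saved. It throws away the timestamps, so keep the original .vtt file if you still need timed captions.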

Final comments

So that's it! You can now take those captions and manipulate them as needed: fix errors, add them to your own publishable videos, and so on. There are a lot of other ways to use freely available AI to transcribe your video and audio into text, such as Siri and Google Docs. You can see which works best for you and let us know in the comments how you fared!

2 Comments
  • This is a pretty complex set of instructions, especially for people who have never used Terminal before. Even if they're successful in getting the last stage, the example WebVTT captions shown are not "human readable", and very few people will understand WebVTT markup, let alone be successful in editing captions that look like the last image in the article. Users would be better off with a much simpler workflow. 4K Video Downloader will download video (I know you don’t need the video), but more importantly, it downloads subtitles, including YouTube’s auto-captions. It works with private files, and the subtitle format it downloads is Subrip (.srt). (4K Video Downloader is a paid app, but the trial version lets you download several videos and captions) Subrip is very human readable. Users could use a text editor to make changes, or they could use Aegisub, a free subtitle editor available for macOS that’s incredibly easy to use. Once they finish editing, users can upload the fixed captions to YouTube, and delete the auto-caption option. Yes, you need to manually delete auto-captions even if you upload proper English subtitles/closed captions, otherwise you’ll get caption options for both English and English (auto-generated). Lastly, when editing subtitles/closed captions, you might want to include information on captioning practices, such as timing, line breaks, indicating non-dialog sounds, and more. If you want captions to look as professional as your video and sound editing, you need to learn and understand the various rules and guidelines for proper closed captions.
  • I certainly agree it's rather convoluted. It's definitely screaming for an app to create workflows for automating the process. Those will be tricky because of the long and unpredictable delay. I'm going to suggest to the developer of the Downie App ( https://software.charliemonroe.net/downie.php ) to support this. That should be right in Charlie's wheelhouse. Easy transcripts could be a real game-changer for the podcast industry. Not all of it will be good: transcripts will provide an easy way to pierce the opacity of advertising -- and maybe automating mechanisms to skip over the ads. Challenging times! I hope that Leo Laporte discusses this new magical tool on the next TWiT.