What’s CCExtractor?

A tool that analyzes video files and produces independent subtitle files from the closed captions data. CCExtractor is portable, small, and very fast. It works in Linux, Windows, and OSX.

What kind of closed captions does CCExtractor support?

Almost all of them:

Missing:

How easy is it to use CCExtractor?

Very. Just tell it what file to process and it does everything for you.

CCExtractor integration with other tools

It is possible to integrate CCExtractor into a larger process. A couple of tools already call CCExtractor as part of their video processing - this way they get subtitle support for free. Starting in 0.52, CCExtractor is very front-end friendly: front-ends can easily get real-time status information. The GUI source code is provided and can be used for reference. Any tool, commercial or not, is specifically allowed to use CCExtractor for any use its authors see fit. So if your favourite video tool still lacks a captioning tool, feel free to send its authors here.

You can also use CCExtractor as a library (as opposed to just running the binary), or take parts of the code. Keep in mind, however, that CCExtractor is licensed under GPLv2, so if you take part or all of the source code, your code must also be released under GPLv2.
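As a rough sketch of the "call the binary from a larger process" approach described above, a tool written in Python might shell out to CCExtractor like this. The helper names `build_command` and `extract_captions` are hypothetical, the binary is assumed to be on the PATH under the name `ccextractor`, and the `-o` output option is an assumption that should be checked against the built-in help screen of your version.

```python
import shutil
import subprocess
from pathlib import Path


def build_command(video, output=None, binary="ccextractor"):
    """Build the argument list for one CCExtractor run (hypothetical helper)."""
    cmd = [binary, str(video)]
    if output is not None:
        # Assumption: -o sets the output file name; verify with the help screen.
        cmd += ["-o", str(output)]
    return cmd


def extract_captions(video, output=None):
    """Run CCExtractor on one file; return True if the process exited with 0."""
    if shutil.which("ccextractor") is None:
        raise FileNotFoundError("ccextractor binary not found on PATH")
    result = subprocess.run(build_command(video, output), check=False)
    return result.returncode == 0


if __name__ == "__main__":
    # Only prints the command; call extract_captions() to actually run it.
    print(build_command(Path("sample.ts"), Path("sample.srt")))
```

Passing the arguments as a list (rather than one shell string) avoids quoting problems with file names that contain spaces.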

What’s the point of generating separate files for subtitles, if they are already in the source file?
There are several reasons to have subtitles separated from the video file, including:

How do I use subtitles once they are in a separate file?
CCExtractor generates files in the two most common formats: .srt (SubRip) and .smi (which is a Microsoft standard). Most players support at least .srt natively. You just need to give the .srt file the same name as the video file you want to play it with, apart from the extension, for example sample.avi and sample.srt.
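The naming rule can be illustrated with a small sketch. The helper `subtitle_path_for` is a hypothetical name, and the file names are only examples.

```python
from pathlib import Path


def subtitle_path_for(video):
    """Return the .srt path that shares the video's base name."""
    # with_suffix() replaces only the last extension, so dots earlier
    # in the name (as in the.sopranos.ts) are preserved.
    return Path(video).with_suffix(".srt")


if __name__ == "__main__":
    for name in ("sample.avi", "the.sopranos.ts"):
        print(name, "->", subtitle_path_for(name))
```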

Other formats such as .txt (transcripts) are supported as well.

What kind of files can I extract closed captions from?
CCExtractor currently handles:

Usually, if you record a TV show with your capture card and CCExtractor produces the expected result, it will work for all your recordings. If it doesn’t, which means that your card uses a format CCExtractor can’t handle, please contact me and we’ll try to make it work.

Can I edit the subtitles?
.srt files are just text files, with time information (when subtitles are supposed to be shown and for how long) and some basic formatting (use italics, bold, etc). So you can edit them with any text editor. If you need to do serious editing (such as adjusting timing), you can use subtitle editing tools - there are many available.
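Because .srt timing lines are plain text of the form `HH:MM:SS,mmm --> HH:MM:SS,mmm`, a simple timing adjustment can be scripted. The following is a minimal sketch, not a replacement for a real subtitle editor: the function names and the sample cue are made up for illustration, and it assumes well-formed timestamps.

```python
import re

# An SRT timestamp looks like 00:01:02,500 (hours:minutes:seconds,milliseconds).
TIMESTAMP = re.compile(r"(\d{2}):(\d{2}):(\d{2}),(\d{3})")


def shift_timestamp(match, offset_ms):
    """Return the matched timestamp moved by offset_ms, clamped at zero."""
    h, m, s, ms = (int(g) for g in match.groups())
    total = max(0, ((h * 60 + m) * 60 + s) * 1000 + ms + offset_ms)
    h, rest = divmod(total, 3_600_000)
    m, rest = divmod(rest, 60_000)
    s, ms = divmod(rest, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def shift_srt(text, offset_ms):
    """Shift every timestamp on the timing lines (those containing '-->')."""
    shifted = []
    for line in text.splitlines():
        if "-->" in line:
            line = TIMESTAMP.sub(lambda mt: shift_timestamp(mt, offset_ms), line)
        shifted.append(line)
    return "\n".join(shifted)


# Made-up sample cue: index line, timing line, then the subtitle text.
SAMPLE = """1
00:00:01,000 --> 00:00:03,500
<i>Hello there.</i>
"""

if __name__ == "__main__":
    # Delay all subtitles by 1.5 seconds.
    print(shift_srt(SAMPLE, 1500))
```

Only lines containing `-->` are touched, so subtitle text that happens to contain digits and colons is left alone.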

Can CCExtractor generate other subtitle formats?
At this time, CCExtractor can generate .srt, .smi, raw, and bin files.

What’s a raw file?
A raw file contains an exact dump of the closed caption bytes, without any processing. This lets you use any tool of your choice to process the data. For example, McPoodle’s excellent tools can generate subtitle files in several formats, adjust timing, etc.

What’s a bin file? How is it different from a raw file?
A bin file contains a dump of the closed caption bytes (same as a raw file) but it also includes timing information. This is a format that we made up for CCExtractor, i.e. it’s not any kind of industry standard. However, it’s the most useful (to us) for debugging purposes, so if you need to send us a sample please use this format. Also, a bin file can hold several CC streams (several languages, even from both analog and digital sources). A raw file cannot.

How long does it take to process an MPEG file?
Obviously, it depends on the computer and the length of the file. On my (really old) computer it took around 90 seconds for a 45-minute show in HDTV, with CPU usage around 3% (I/O operations are what’s holding it back). Currently (2018) we’re processing as many as 20 TV channels in real time using a single computer with an i5 CPU.

What platforms does CCExtractor work on?
CCExtractor is developed and tested in Windows and Linux. It is also known to compile and run fine in OSX (a build script is included in the source .zip).

Where can I download it?
The source code is hosted on GitHub. Check out our download page for links to everything. Old versions were hosted on SourceForge. We’re keeping those there for statistical purposes. This is the old download page and this is the old project summary page.

How can I contact the author?
There’s no longer one author. Carlos is still the official maintainer but there are a lot of people contributing to the project. The best thing is to check out our support page.

How do I use this tool (parameters, etc)?

Run it without parameters and you will get a help screen. Basically, you just give it the input file name, like this:

ccextractor the.sopranos.ts

As for the lack of documentation: There is no lack of documentation! It’s just included in the program itself. Just run it without parameters and you will get complete details.

How can I contribute to this project?
There are several ways:

Does CCExtractor use code from other projects?
Yes. Lots of code came originally from McPoodle’s tools (even though it was ported from Perl to C). We’ve also taken code from MythTV (which in turn took some from other places) and FFmpeg. The teletext code is 95% Petr Kutalek’s and was integrated with permission.

A good thing about Open Source is that you don’t need to reinvent the wheel unless you want to (or unless you think you can come up with a ‘rounder’ wheel).

If you like CCExtractor but can’t submit code patches or video samples, you can still contribute a bit by inviting the developers to a beer, which is just as welcome as any other kind of support.