How To: read EPUB files from the command line

Originally posted on my Blog:
manerosss.wordpress.com

Hi!

This post follows a previous one on how to read PDF files from the command line using poppler.

The method used for PDFs was to convert them to text or HTML and then pipe the output to a pager, browser or editor.

The same method can be used for EPUB files with a program called epub2txt (available on GitHub).
But a better way is not to convert them at all, but rather to extract the content of the EPUB file with unzip.

In fact, as written on Wikipedia:

An EPUB file is a ZIP archive that contains, in effect, a website—including HTML files, images, CSS style sheets, and other assets. It also contains metadata.

Therefore the conversion process can be skipped.
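Since the metadata lives in ordinary XML files inside the archive, unzip can read it directly too. Here is a rough sketch (epub_title is a name I made up) that follows the OCF container layout: META-INF/container.xml names the package file (.opf) in a full-path attribute, and the .opf file holds the Dublin Core title. It uses grep and sed on the raw text instead of a real XML parser, so it assumes the attribute and the dc:title element each sit on one line and that the dc: prefix is used; books that format their XML differently would need a proper parser.

```shell
#!/bin/sh
# epub_title BOOK.epub: print the title stored in an EPUB's package (.opf) file.
# Hypothetical helper: a text-based sketch, not a real XML parser.
# Assumptions: container.xml names the package file in a full-path="..." attribute,
# and the title is written on one line as <dc:title ...>Title</dc:title>.
epub_title() {
    # Step 1: META-INF/container.xml points at the package document (.opf).
    opf=$(unzip -p "$1" META-INF/container.xml |
          grep -o 'full-path="[^"]*"' | head -n 1 | cut -d '"' -f 2)
    # No container.xml or no full-path attribute: give up.
    [ -n "$opf" ] || return 1
    # Step 2: take the text of the first <dc:title> element in the .opf file.
    unzip -p "$1" "$opf" |
        grep -o '<dc:title[^>]*>[^<]*' | head -n 1 | sed 's/^<dc:title[^>]*>//'
}
```

Once the example book below is downloaded, it can be run as:

epub_title plato_the_republic.epub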

To try these commands out I need an EPUB file. Gutenberg.org has plenty of them, so I’m going to download Plato – The Republic:

wget https://www.gutenberg.org/ebooks/1497.epub.images -O plato_the_republic.epub
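Before going further, it’s easy to check that the download really is a ZIP-based EPUB and not, say, an HTML error page. The sketch below (is_epub is a hypothetical name) first checks for the PK signature every ZIP file starts with. It then relies on a rule from the OCF container spec: the first entry must be an uncompressed file called mimetype containing application/epub+zip, with no extra field in its header, which puts those 28 bytes at a fixed position (offsets 30 to 57). A book that ignores that rule would be rejected here even if readers can open it.

```shell
#!/bin/sh
# is_epub FILE: succeed if FILE starts like an OCF-conforming EPUB container.
# Hypothetical helper; it only inspects the first 58 bytes and unzips nothing.
is_epub() {
    # Every ZIP local file header begins with the two bytes "PK".
    [ "$(head -c 2 -- "$1")" = "PK" ] || return 1
    # Assumption (OCF rule): the first entry is the stored "mimetype" file with no
    # extra field, so its 8-byte name starts at offset 30 and its 20-byte content
    # "application/epub+zip" follows right after it.
    [ "$(head -c 58 -- "$1" | tail -c 28)" = "mimetypeapplication/epub+zip" ]
}
```

For example:

is_epub plato_the_republic.epub && echo "looks like an EPUB"

Because the book is a normal ZIP archive, unzip -p plato_the_republic.epub mimetype should also print application/epub+zip.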

You can find other famous books on Gutenberg.org too.


Unzip

The commands are quite simple. To list what’s inside an archive, run:

unzip -l plato_the_republic.epub

The output lists every file in the archive along with its size, date and time.

At this point I can extract the content to stdout with the -p option, so that it appears on the command line.
The best way to render it is with a text-mode browser such as w3m: images and other files are printed as raw text, but the HTML is rendered properly.

unzip -p plato_the_republic.epub |w3m

In this case there are no images at all, not even the cover, so the text looks nice apart from some strange stuff at the beginning.

However, if your file has images, a huge wall of incomprehensible text (the raw image data) will appear before the HTML files that contain the book’s actual words.

Try it for yourself:

wget https://www.gutenberg.org/ebooks/74.epub.images -O mark_twain_the_adventures_of_tom_sawyer.epub

unzip -p mark_twain_the_adventures_of_tom_sawyer.epub |w3m

To avoid this, you can either extract only the files with a given extension (here, names containing .h such as .htm and .html):

unzip -p mark_twain_the_adventures_of_tom_sawyer.epub "*.h*" |w3m

Or exclude files with a given extension (here, .jpg and .jpeg images) from being piped:

unzip -p mark_twain_the_adventures_of_tom_sawyer.epub -x "*.j*" |w3m
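Wildcards depend on how each book names its files (.htm, .html or .xhtml). A sketch of a more general approach, using a hypothetical helper called epub_html: list the bare entry names with unzip -Z1 (unzip’s zipinfo mode, one name per line), keep those ending in an HTML extension, and print each one with unzip -p. It assumes content files can be recognised by their extension, and it prints them in archive order, which is not necessarily the reading order (that is defined by the spine in the .opf file).

```shell
#!/bin/sh
# epub_html BOOK.epub: write only the (X)HTML entries of an EPUB to stdout,
# skipping images, fonts and other binary files, ready to pipe into a browser.
# Hypothetical helper. Assumptions: content files end in .htm, .html or .xhtml,
# and archive order is good enough (it is not guaranteed to be reading order).
epub_html() {
    # List bare entry names, keep the HTML-looking ones (case-insensitive).
    unzip -Z1 "$1" | grep -Ei '\.x?html?$' | while IFS= read -r entry; do
        # unzip treats member names as wildcard patterns, so a name containing
        # characters like [ or * could match more than one entry.
        unzip -p "$1" "$entry"
    done
}
```

Usage:

epub_html mark_twain_the_adventures_of_tom_sawyer.epub | w3m -T text/html

The -T text/html option tells w3m explicitly that its standard input is HTML. Adding -dump makes w3m print the rendered text and exit, which gives plain text similar to what epub2txt produces and is easier to use in scripts:

epub_html mark_twain_the_adventures_of_tom_sawyer.epub | w3m -T text/html -dump | less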


epub2txt

This works similarly to what I showed in the previous post about PDF files; check that page for some of the pagers, editors and browsers you can use to display the text.

To use it:

epub2txt plato_the_republic.epub |less


Ta salüde! (Cheers!)

It's amazing how much stuff you can do from the command line. I love the versatility of Linux and blog about it on occasion. However I'm not sure if this application would actually help me get more reading done. Maybe if I'm stuck with an extremely old computer with no desktop environment!

Yeah, this is in no way better than a graphical EPUB reader (such as MuPDF, etc.).
But at least you know how to open an EPUB from the command line, which can be very handy for anyone interested in writing scripts.
Cheers!