Markability: Capture Web Articles in Markdown
December 12, 2011

I’ve thrown a little utility up on GitHub called Markability, which allows you to capture articles from the web in Markdown format, using a combination of Readability (actually, a port of Readability to Python) and html2text.

Haven’t written documentation yet, but it’s as simple as installing a few Python modules from PyPI, installing html2text from its Git repository, cloning my repo, and running the script from the command line:

easy_install lxml
easy_install readability-lxml
easy_install chardet
git clone https://github.com/aaronsw/html2text.git 
cd html2text
python setup.py install
cd ..
git clone https://github.com/evandeaubl/markability.git
cd markability
python markability.py url...
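For the curious, here is a minimal sketch of the Readability-plus-html2text idea in modern Python 3, assuming the readability-lxml and html2text packages installed above. It is not the actual code in markability.py (which was written for Python 2), and the names fetch_html, article_to_markdown and capture are hypothetical.

```python
import urllib.request


def fetch_html(url: str) -> bytes:
    # Download the raw page. When handed bytes, readability-lxml
    # detects the encoding itself before parsing.
    with urllib.request.urlopen(url) as response:
        return response.read()


def article_to_markdown(html) -> str:
    # Imported inside the function so this module still loads even if
    # the third-party packages are missing.
    from readability import Document  # provided by readability-lxml
    import html2text

    doc = Document(html)
    # summary() returns the main article content as cleaned-up HTML.
    article_html = doc.summary()
    # html2text converts that HTML fragment into Markdown text.
    return html2text.html2text(article_html)


def capture(url: str) -> str:
    # Hypothetical end-to-end helper: one URL in, Markdown out.
    return article_to_markdown(fetch_html(url))
```

Calling capture() with an article URL should return the article's main text as Markdown: Document.summary() does the extraction and html2text.html2text() does the conversion.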

Multiple URLs on the command line are combined into one document, with a horizontal rule (---) separating each page. Unfortunately, the Python port of Readability does not handle multi-page articles, so you have to supply each page's URL by hand. Images are not saved locally yet, but that is a planned feature.
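The multi-URL behavior can be sketched like this. It is an illustration under assumptions, not Markability's code: the names combine_pages and capture_all, the converter passed in as convert, and the exact whitespace around the separator are all hypothetical.

```python
from typing import Callable, Iterable

# Blank lines around the rule keep Markdown from reading "---" as a
# setext heading underline for the preceding paragraph.
PAGE_SEPARATOR = "\n\n---\n\n"


def combine_pages(pages: Iterable[str]) -> str:
    """Join per-page Markdown into one document separated by horizontal rules."""
    return PAGE_SEPARATOR.join(page.strip() for page in pages) + "\n"


def capture_all(urls: Iterable[str], convert: Callable[[str], str]) -> str:
    """Convert each URL in command-line order and combine the results.

    `convert` is a hypothetical per-URL converter (for example, one built
    on Readability and html2text); it is passed in so this part stays
    testable without network access.
    """
    return combine_pages(convert(url) for url in urls)
```

For a multi-page article, passing each page's URL in order yields a single document with the pages separated by rules. The blank lines matter: a bare "---" directly under a line of text is a setext heading in Markdown, not a horizontal rule.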

That should be enough to get you started. I plan to use this tool to archive things for myself, so it will grow as I develop features for my own needs, or as I receive pull requests from others. Hint, hint :-)



You've stumbled on the blog of Evan Deaubl. And I'm flattered. Really!

This is my place for writing about my varied interests.

I hope you find something useful here, or at least entertaining.

Other Places Where I Am

Twitter (@evandeaubl)
GitHub (evandeaubl)