2011 DEC 12 Markability: Capture Web Articles in Markdown
I’ve thrown a little utility up on GitHub called Markability, which lets you capture articles from the web in Markdown format. It combines Readability (actually, a port of Readability to Python) with html2text.
I haven’t written documentation yet, but setup is as simple as installing a few Python modules from PyPI, cloning my repo, and running the script from the command line:
easy_install lxml
easy_install readability-lxml
easy_install chardet
git clone https://github.com/aaronsw/html2text.git
cd html2text
python setup.py install
cd ..
git clone https://github.com/evandeaubl/markability.git
cd markability
python markability.py url...
Multiple URLs on the command line are combined into one document, with --- (a horizontal rule) separating each page. Unfortunately, the Python port of Readability does not handle multi-page articles, so stitching those together is a by-hand process: pass each page's URL yourself. Images are not saved locally yet, but that is a planned feature.
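For the curious, here is a rough sketch of the idea in Python 3: fetch each URL, extract the article, convert it to Markdown, and join the pages with horizontal rules. This is not the actual markability.py source, just an illustration under some assumptions. The helper names (fetch_html, article_to_markdown, combine_pages) are made up for this example, and it assumes a recent readability-lxml that exposes Document at the package top level, plus html2text's html2text() function.

```python
import sys
import urllib.request

SEPARATOR = "\n\n---\n\n"  # Markdown horizontal rule placed between pages


def fetch_html(url):
    """Download a page and return its HTML as text (hypothetical helper)."""
    with urllib.request.urlopen(url) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def article_to_markdown(html):
    """Extract the main article from HTML and convert it to Markdown."""
    # Third-party imports are kept local so combine_pages() below
    # can be used even when these packages are not installed.
    from readability import Document  # provided by readability-lxml
    import html2text

    article_html = Document(html).summary()  # cleaned-up article HTML
    return html2text.html2text(article_html)


def combine_pages(pages):
    """Join already-converted Markdown pages, skipping empty ones."""
    return SEPARATOR.join(page.strip() for page in pages if page.strip())


if __name__ == "__main__":
    # Each command-line argument is treated as one page URL.
    pages = [article_to_markdown(fetch_html(url)) for url in sys.argv[1:]]
    print(combine_pages(pages))
```

Passing the pages of a multi-page article as separate URLs, in order, gives you a single Markdown document with a rule between each page.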
This should get you started. This is a tool I plan to use to archive things for myself, so it will grow as I develop new features for my own needs, or as I receive pull requests from others. Hint, hint :-)
You've stumbled on the blog of Evan Deaubl. And I'm flattered. Really!