You are invited to Log in or Register a free Frihost Account!

IPython and lxml

I have been playing a bit with ipython and lxml these days.
IPython is a powerful and interactive shell for Python. It supports browser based notebooks with support for code, text ( actually html markup ), mathematical expressions, inline plots and other rich media. Nice intro here:

Another nice demo what you can do with ipython actually is the pandas demo video here.
Several additional packages need to be installed first to really be able to use all these features, like pandas, mathplotlib or numpy. A good idea it is to install the entire scipy stack, as described here.
I did the installation first on my windows thinkpad and later on on a Mint Linux box.
This is some work to get thru, like bumping into missing dependencies and installing those first, or try several installation methods in case of problems. Sometimes it is better to take a compiled binary, sometimes using pip install, sometimes fetching a source code package and going from there.
I finally succeeded on both my machines. Next step was to figure out how to run an ipython notebook server, because using ipython notebooks in a browser is the most efficient and fun way to work with ipython. For Windows there are useful instructions here, on my Linux Mint machine it worked very differently, working instructions I finally found here.
After that I developed my first notebook using lxml, called GetTableFromWikipedia, which basically goes out on a wikipedia page ( im my case the one about Chemical Elements ) and fetch a table from there ( in my case table # 10 with a list of chemical elements ), retrieves that table using lxml and xpath code and converts it to csv.
The nice thing about ipython is that you can write code into cells and then just re-run those cells to see results immediately in the browser. This makes it very efficient and convenient to develop code by simply trying, or to do a lot "prototyping" -- which sounds more professional.
Having an ipython notebook server running locally on your machine is certainly a must for developing a notebook. But how to share notebooks with others ? I found allowing to share notebooks with the public. You have to store your notebook somewhere in the cloud and pass the URL to the nbviewer. I uploaded my notebook to one of my dropbox folder and here we go: have a look ! Unfortunately it is not possible to actually run the notebook with nbviewer ( nbviewer basically converts a notebook to html ).
My notebook of course works with other tables too, like the List of rivers longer than 1000 km, published in this wikipedia article as table # 5.

0 blog comments below

© 2005-2011 Frihost, forums powered by phpBB.