Created on Wed Sep 09, 2009 9:27 pm with 8 blog posts
This blog will contain posts about things I learn while setting up a geo-locative game. Languages/Technologies - Python, sqlite, TurboGears
A follow-up on the SQLAlchemy vs sqlite post.
It's only now that I realize what a bad idea the SQLAlchemy approach is for loading data. While the ORM is good for querying, which is what I plan to use it for, I finally have a solution for loading the data using sqlite directly, along with some fast Python code for parsing it.
It took a lot of tweaking, but I have finally brought the load time down to 5 minutes, from the initial 30 minutes (which had grown to 2 hours using SQLAlchemy). And I think it can be tweaked even further. If anyone is reading this blog at all and is interested, let me know and I can post the files up somewhere.
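For what it's worth, the core of the speedup can be sketched like this: parse the flat file into tuples and insert them with executemany() in large batches, committing once at the end. The table layout (x, y, material) and the "x y material" file format here are my stand-ins for illustration, not the project's actual schema.

```python
import sqlite3
from itertools import islice

def fast_load(conn, lines, batch_size=50000):
    """Parse "x y material" lines and bulk-insert them into sqlite.

    Rows go in via executemany() in large batches, inside a single
    transaction committed once at the end -- this is where most of
    the speedup over per-row INSERT-and-commit comes from.
    """
    conn.execute("CREATE TABLE IF NOT EXISTS grid "
                 "(x INTEGER, y INTEGER, material TEXT)")
    rows = ((int(x), int(y), m)
            for x, y, m in (line.split() for line in lines))
    while True:
        batch = list(islice(rows, batch_size))
        if not batch:
            break
        conn.executemany("INSERT INTO grid VALUES (?, ?, ?)", batch)
    conn.commit()

conn = sqlite3.connect(":memory:")
fast_load(conn, ["0 0 rock", "0 1 water", "1 0 sand"])
```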
Here are some links I found that were really helpful in tweaking my Python code:
http://www.python.org/doc/essays/list2str.html - Guido van Rossum's essay about Python optimization
http://www.tbray.org/ongoing/When/200x/2007/10/30/WF-Results - The Wide-Finder project. An interesting project about parsing text using different approaches and different scripting languages
http://effbot.org/zone/wide-finder.htm - A really useful article about optimization techniques for parsing with Python.
I'm trying to load grid data for a map of about 300x300 grids. Each grid has attributes such as the material in it etc.
I have the data in flat text files and I decided to load it into a sqlite db for queries and fast lookups.
I started with a straightforward read-and-load into sqlite using Python's built-in sqlite module. The load time was quite large, as can be expected for some 3 million rows. I unfortunately don't have the exact numbers, but the total load took about 30 minutes.
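The straightforward version looked roughly like this sketch (the "x y material" file format and the table layout are placeholders of mine, not the project's actual schema): one INSERT executed per parsed line.

```python
import sqlite3

def load_grid(conn, lines):
    """Straightforward load: parse each "x y material" line and
    execute one INSERT per row. Simple, but slow at millions of rows."""
    conn.execute("CREATE TABLE IF NOT EXISTS grid "
                 "(x INTEGER, y INTEGER, material TEXT)")
    for line in lines:
        x, y, material = line.split()
        conn.execute("INSERT INTO grid VALUES (?, ?, ?)",
                     (int(x), int(y), material))
    conn.commit()

conn = sqlite3.connect(":memory:")
load_grid(conn, ["0 0 rock", "0 1 water", "1 0 sand"])
```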
Since I need different views of the data and for various other reasons I decided to switch to using SQLAlchemy to access the data.
While SQLAlchemy allowed me to quickly set up the configuration and define my data access classes, the load itself is taking forever. I am writing this even as it loads, and it looks like the load will take something like 4-6 hours.
I must be doing something differently with SQLAlchemy vs plain sqlite for there to be such a big difference. I can't believe that an ORM layer like SQLAlchemy would add this much overhead for inserting data into tables.
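For reference, an ORM-style load looks roughly like this sketch (the class and column names are my own stand-ins). Every row becomes a full Python object that the session tracks in its identity map and turns into an INSERT at flush time, which is a plausible source of per-row overhead compared with raw executemany():

```python
from sqlalchemy import create_engine, Column, Integer, String
from sqlalchemy.orm import declarative_base, sessionmaker

Base = declarative_base()

class GridCell(Base):
    __tablename__ = "grid"
    id = Column(Integer, primary_key=True)
    x = Column(Integer)
    y = Column(Integer)
    material = Column(String)

engine = create_engine("sqlite:///:memory:")
Base.metadata.create_all(engine)
session = sessionmaker(bind=engine)()

# One mapped object per row: the session tracks every instance and
# builds INSERT statements at flush time, adding Python-level work
# for each of the millions of rows.
for x in range(10):
    for y in range(10):
        session.add(GridCell(x=x, y=y, material="rock"))
session.commit()
```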
More updates when I figure out what's going on.
If you have multiple versions of Python (2.4, 2.5, 2.6) installed on the same machine, or development libraries that you want to try out but not necessarily add to the main Python libs, then virtualenv seems to be a good solution, though it has its share of problems.
Basically, the combination of virtualenv and easy_install lets you create a separate working instance of Python on your machine, into which you can install whatever libraries you need. You can even think of it as analogous to the setup in a virtual hosting environment: you install the libraries you need and run the code, and when you are done you can move or remove the virtual host, but it does not affect the system-wide libraries etc.
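As a quick sketch of that workflow (the environment name "myenv" and the SQLAlchemy install are just examples of mine):

```shell
# create an isolated Python environment in ./myenv
virtualenv myenv

# activate it (on Windows: myenv\Scripts\activate.bat)
. myenv/bin/activate

# libraries installed now go into myenv only, not the system Python
easy_install SQLAlchemy

# leave the environment; deleting the myenv directory removes it entirely
deactivate
```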
Both virtualenv and easy_install are available on pypi.python.org (The Cheeseshop).
easy_install also appears to have a replacement in pip, though I haven't used it much myself. I have been reading good things about it.
I use EmergeDesktop and Samurize to customize my Windows XP Professional environment.
I looked around for Samurize scripts and configs which would allow me to monitor the health of a remote machine. So far I haven't found any, so I am thinking of rolling my own scripts.
This is more of a low priority to-do but should be kind of interesting.
Tools I am planning to use - PuttyCS (http://www.millardsoftware.com/puttycs). Probably some Python scripts on the linux server side that can parse server logs, check memory, CPU utilizations etc.
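The server-side piece could start as something like this sketch, which reads load averages and memory figures straight from the Linux /proc filesystem (the function names and the exact fields reported are my own choices):

```python
# Sketch of a server-side health probe; assumes a Linux /proc filesystem.

def read_loadavg(path="/proc/loadavg"):
    """Return the 1-, 5- and 15-minute load averages as floats."""
    with open(path) as f:
        fields = f.read().split()
    return tuple(float(v) for v in fields[:3])

def read_meminfo(path="/proc/meminfo"):
    """Return a dict of meminfo fields in kB, e.g. {'MemTotal': 2048000, ...}."""
    info = {}
    with open(path) as f:
        for line in f:
            key, _, rest = line.partition(":")
            info[key.strip()] = int(rest.split()[0])
    return info

if __name__ == "__main__":
    one, five, fifteen = read_loadavg()
    mem = read_meminfo()
    print("load: %.2f %.2f %.2f" % (one, five, fifteen))
    print("mem free: %d kB of %d kB" % (mem["MemFree"], mem["MemTotal"]))
```

The output of these could then be shipped to Samurize on the Windows side for display.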
If anyone has a better idea or solution, or has configs for Samurize to do this, let me know!
My development environment before starting on this project consisted of simple text editors and compilers from the MSys/MinGW package. While this had served me well for things like compiling GIMP plugins with Python, it really came up short with one particular tool: Mapnik.
Mapnik was a pain to set up and install using MSys, as were its required libraries such as Boost. After repeated package installs and failed compilations, I decided to scrap the MSys approach and get a real dev environment.
I had played around with CoLinux a while back and decided to see how their new versions were doing.
I set up CoLinux (http://www.colinux.org/) with an Ubuntu 9.0 disk image. Disk images are basically Windows files that act like a hard disk for CoLinux. So you can install a really small Linux distro, theoretically of 10-20 MB, and play with it, or install a full-scale system (mine is currently at 4 GB).
It was such a breeze using aptitude to set up the required packages and have everything install with a few simple commands.
Mapnik is installed, but the TileCache TurboGears example refuses to display any map in my case.
Back to debugging it. More info when I get it working.