

A question about dictionary of lists -- Python





davidv
This is a very small and basic Python function I wrote a while ago when I was working on a script that takes in 4 datasets and generates XML, HTML and LaTeX documents after the datasets have been processed.

Code:
{'ptv_show.txt': (['title', 'genre', 'rating', 'year_added', 'num_seasons', 'last_update'], 1), 'ptv_ep.txt': (['title', 'season_id'], 1), 'pmovie.txt': (['title', 'genre_one', 'genre_two', 'published', 'rating', 'date_added'], 1), 'ptv_season.txt': (['title', 'show_id', 'num_eps', 'last_update'], 1)}


The above is a generated argument value. It's a dictionary of tuples. Each key is the name of a processed file, and its tuple holds a list and an integer. The list contains the column names of that dataset, and the integer is the index of the field the user has chosen to categorize by. Other functions within the script determine which fields can and cannot be used to categorize with, e.g. it would be silly to categorize each tv_show by its title.
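
To make the shape concrete, here is a minimal sketch of how one entry of that argument unpacks. It uses a one-entry copy of the value above; `group_field` is a name made up for this example, and `.items()` is used instead of `.iteritems()` so the snippet also runs on Python 3.

```python
# One-entry copy of the generated argument (illustration only).
files = {
    'ptv_ep.txt': (['title', 'season_id'], 1),
}

for file_name, group in files.items():
    # Each value is a (list of column names, column index) tuple.
    fields, group_by = group
    # Name of the column the user chose to categorize by.
    group_field = fields[group_by]

# For 'ptv_ep.txt', index 1 selects the 'season_id' column.
```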

Code:
def organize_results(files):
    """ organize_results(dict) -> dict

        Function takes the processed files and moves them all into a
        dictionary, categorizing each entry based on user input.

        The dictionary of categorized files is returned.
    """
    print 'organizing processed files...'
    epic_bitch = {} # x = {'file_name': {'category': ['lines']}}
    for file_name, group in files.iteritems():
        # Unpacks tuples
        fields, group_by = group; temp_dic = {}
        # Generate keys with empty list values
        for line in open(file_name, 'rU'):
            temp_dic[line.split(',')[group_by]] = []
        for line in open(file_name, 'rU'):
            temp_dic[line.split(',')[group_by]].append(line)
        epic_bitch[file_name] = temp_dic
    for k, v in epic_bitch.iteritems():
        print k
        for x,y in v.iteritems():
            print x
            for i in xrange(len(y)):
                print y[i][:-1]


The code above (please excuse my horrid variable names, and ignore the last few lines, which only display the dictionary) creates a dictionary where each key is a value of the chosen field (e.g. a particular genre). The value is a list of strings containing each line of the file that falls into that category. Because I have 4 different datasets and I don't want them to get mixed up, I used a dictionary of dictionaries, where the outer key is the file name.

You can see that I had to open the file twice. The first set of iterations populates the empty dictionary with all the possible keys (the categories to sort by), giving each one an empty list as its value. The second set of iterations walks the file again in the same way, except that it appends each line to the list stored under its key. The resulting dictionary looks something like this.

Code:
    for k, v in epic_bitch.iteritems():
        print k
        for x,y in v.iteritems():
            print x
            for i in xrange(len(y)):
                print y[i][:-1]


After using that, of course.

Code:
pmovie.txt
93

The Big Bang Theory S04E01 -- The Robotic Manipulation,93
The Big Bang Theory S04E02 -- The Cruciferous Vegetable Amplification,93
The Big Bang Theory S04E03 -- The Zazzy Substitution,93
The Big Bang Theory S04E04 -- The Hot Troll Deviation,93
The Big Bang Theory S04E05 -- The Desperation Emanation,93
The Big Bang Theory S04E06 -- The Irish Pub Formulation,93
The Big Bang Theory S04E07 -- The Apology Insufficiency,93
The Big Bang Theory S04E08 -- The 21-Second Excitation,93
...

81

Chuck S02E01 -- Chuck Versus the First Date,81
Chuck S02E02 -- Chuck Versus the Seduction,81
Chuck S02E03 -- Chuck Versus the Break-Up,81
Chuck S02E04 -- Chuck Versus the Cougars,81
Chuck S02E05 -- Chuck Versus Tom Sawyer,81
Chuck S02E06 -- Chuck Versus the Ex,81
Chuck S02E07 -- Chuck Versus the Fat Lady,81
Chuck S02E08 -- Chuck Versus the Gravitron,81
Chuck S02E09 -- Chuck Versus the Sensei,81
Chuck S02E10 -- Chuck Versus the DeLorean,81
Chuck S02E11 -- Chuck Versus Santa Claus,81
Chuck S02E12 -- Chuck Versus the Third Dimension,81
Chuck S02E13 -- Chuck Versus the Suburbs,81

...


Of course there's a lot more. I think I have over 3000 lines to process but that's what the result looks like.

So I'm wondering if there's a better way to do this: rather than two sets of iterations, just a single one. Right now, if I try to append to a list in an empty dictionary without first generating the keys with empty list values, I get a KeyError. If anyone could explain why this is the case, that'd be great.
cgkanchi
What version of Python are you using? If you're using Python 2.5 or above, there is a nice little class called collections.defaultdict to solve exactly this problem:
Code:

... skipping lines
    # needs "import collections" at the top of the script
    epic_bitch = {}
    for file_name, group in files.iteritems():
        # Unpacks tuples
        fields, group_by = group
        # Missing keys start out as an empty list, so no first pass is needed
        temp_dic = collections.defaultdict(list)
        for line in open(file_name, 'rU'):
            temp_dic[line.split(',')[group_by]].append(line)
        epic_bitch[file_name] = temp_dic
...skipping lines

I obviously can't test this, but this should work. If you have any more questions, feel free to ask.
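
On the KeyError itself: `temp_dic[key]` on a plain dict is a lookup, and looking up a key that isn't there raises KeyError before `.append` is ever reached, because there is no list yet to append to. Below is a small self-contained sketch of single-pass grouping that uses in-memory lines instead of your files. The sample lines and the two helper functions are made up for illustration, and unlike your loop it strips the trailing newline before splitting. `dict.setdefault` is shown as an alternative that needs no import.

```python
import collections

# Made-up sample lines standing in for one processed file.
sample_lines = [
    'Show A S01E01,93\n',
    'Show B S02E01,81\n',
    'Show A S01E02,93\n',
]

def group_lines(lines, group_by):
    """Group lines by the value in column `group_by`, in a single pass."""
    grouped = collections.defaultdict(list)
    for line in lines:
        # Strip the newline so a key taken from the last column is clean.
        key = line.rstrip('\n').split(',')[group_by]
        # On a missing key, defaultdict calls list() and stores the result.
        grouped[key].append(line)
    return grouped

def group_lines_setdefault(lines, group_by):
    """Same grouping with a plain dict and dict.setdefault."""
    grouped = {}
    for line in lines:
        key = line.rstrip('\n').split(',')[group_by]
        # setdefault inserts [] for a new key, then returns the stored list.
        grouped.setdefault(key, []).append(line)
    return grouped

# A plain dict lookup of a missing key is what raises the KeyError.
plain = {}
try:
    plain['93'].append('some line')
    raised = False
except KeyError:
    raised = True

by_code = group_lines(sample_lines, 1)
```

One thing to keep in mind with defaultdict: merely reading a missing key (e.g. `by_code['nope']`) also creates it, so use `in` or `.get()` when you only want to check.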

Cheers,
cgkanchi
davidv
This was quite an old post. It's interesting and also very embarrassing to see code I wrote a while ago. There have been lots of revisions made to the code base, but I've stopped. Just looking at that piece of code makes me blush.
jcreus
davidv wrote:
This was quite an old post. It's interesting and also very embarrassing to see code I wrote a while ago. There have been lots of revisions made to the code base, but I've stopped. Just looking at that piece of code makes me blush.

Exactly the same happens to me. Sometimes I look at code I wrote 2 or 3 years ago and say "Gosh! No classes! How ugly". And I keep programming ugly (with classes, however). Sometimes, I must say, I don't understand code I wrote a while ago. "My hobby: writing list comprehensions in Python" has long-term consequences :)
davidv
jcreus wrote:
"My hobby: writing list comprehensions in Python" has long-term consequences :)


One of the many reasons why I love Python. All my recent projects involved tonnes of lambda functions, list comprehensions, map, filter, etc. It's my guilty pleasure. I try to avoid it at work, but my mind just goes to lambdas as simple, single-line solutions (I really should take a look at Haskell). Oh, how I will regret those decisions in the future.
© 2005-2011 Frihost, forums powered by phpBB.