Johan Nilssons Lifestream

Adding Columns in Loop to Pandas DataFrame

I'm struggling to understand the behaviour of Python's pandas library when adding columns to a DataFrame in a loop. I want to loop through a list of objects (these are actually tuples of dates), adding a number of columns on each iteration. A simplified version is as follows:

import pandas as pd
import numpy as np

df = pd.DataFrame(np.arange(6).reshape(2, 3), columns=('a', 'b', 'c'))

for x in range(10):

    # Printed on each loop:
    print('Adding column type 1')
    df['{}_type1'.format(x)] = 'Type 1'

    # Printed on last loop only:
    print('Adding column type 2')
    df['{}_type2'.format(x)] = 'Type 2'

I would expect this to add 20 new columns to the DataFrame (2 per iteration), but instead it adds 11: ten 'Type 1' columns followed by a single 'Type 2' column. Further, the first message is printed 10 times but the second only once:

Adding column type 1
Adding column type 1
Adding column type 1
Adding column type 1
Adding column type 1
Adding column type 1
Adding column type 1
Adding column type 1
Adding column type 1
Adding column type 1
Adding column type 2
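
For comparison, here is a minimal sketch of how the column count can be checked. It assumes both assignments sit inside the loop body with consistent indentation, as in the listing above; the `new_columns` name is introduced here purely for illustration. Run this way, the loop adds 2 columns per iteration.

```python
import pandas as pd
import numpy as np

df = pd.DataFrame(np.arange(6).reshape(2, 3), columns=('a', 'b', 'c'))

for x in range(10):
    # Both assignments are inside the loop body, so each runs 10 times.
    df['{}_type1'.format(x)] = 'Type 1'
    df['{}_type2'.format(x)] = 'Type 2'

# Count only the columns added by the loop (illustrative helper name).
new_columns = [c for c in df.columns if c not in ('a', 'b', 'c')]
print(len(new_columns))  # prints 20
```

If this prints 20 in the same environment, the difference is more likely in how the original script is indented or executed than in pandas itself.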

I am new to pandas, so I may be missing something fundamental, but this seems like a bug to me: perhaps a rogue continue in the logic that does the vectorised operation? Any thoughts or explanations would be greatly appreciated.
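
The output described above (ten 'Type 1' messages, one 'Type 2' message, 11 new columns) is what Python produces when the second block ends up outside the loop body. The sketch below is a hypothetical variant written to reproduce those symptoms. It is an assumption about what might be happening, since nothing in the question confirms how the original script is indented or run.

```python
import pandas as pd
import numpy as np

df = pd.DataFrame(np.arange(6).reshape(2, 3), columns=('a', 'b', 'c'))

for x in range(10):
    print('Adding column type 1')
    df['{}_type1'.format(x)] = 'Type 1'

# Hypothetical variant: this block is NOT part of the loop body,
# so it runs once after the loop, with x still equal to 9.
print('Adding column type 2')
df['{}_type2'.format(x)] = 'Type 2'

print(list(df.columns[-2:]))  # prints ['9_type1', '9_type2']
```

In this variant the only 'Type 2' column is named 9_type2, so checking the name of the 11th column in the real output is one way to test the assumption.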

Thanks, Dominic

via Stack Overflow
