Johan Nilssons Lifestream

Increment an index variable based on an existing sequence in Pandas

I would like to know whether it's possible to create a new index in pandas based on how a column's values evolve (like the history of a variable). Normally I would do this with an explicit loop (a for loop or similar), but I'm trying to learn more about pandas.

In this example, the column 'Tags' contains 'tag1', 'tag2' and 'tag3', but the tags may be interleaved during the change from the 1st to the 2nd and from the 2nd to the 3rd (see the example output below).

The new index should start at 0 on the first row and increment only when 'tag1' begins a new sequence (only once, not each time 'tag1' appears). It then stays the same while 'Tags' is 'tag2' or 'tag3'. Finally, the new index increments again, and only once, when 'tag1' appears again (in the example output below, that is each 'tag1' that follows a 'tag3').

I could do this row by row inside a for loop, using an 'idx++'-style counter, but I want to write idiomatic pandas code. I thought of pandas.DataFrame.applymap(), but I'm not sure it is the right way.
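For reference, the row-wise version might look roughly like the sketch below. It assumes a new sequence starts whenever 'tag1' directly follows 'tag3', which is what the example output suggests; the helper name `loop_index` and the short tag list are made up for illustration.

```python
import pandas as pd

def loop_index(tags):
    """Row-wise baseline: bump the counter when 'tag1' follows 'tag3'."""
    idx = 0
    prev = None
    out = []
    for tag in tags:
        # Assumption: a new sequence starts only on a tag3 -> tag1 transition.
        if tag == 'tag1' and prev == 'tag3':
            idx += 1
        out.append(idx)
        prev = tag
    return out

# Small illustrative input (not the full example from the question).
df = pd.DataFrame({'Tags': ['tag1', 'tag2', 'tag1', 'tag2',
                            'tag3', 'tag1', 'tag3', 'tag1']})
df['new index'] = loop_index(df['Tags'])
```

This works, but it walks the column one value at a time in Python, which is exactly what I would like to avoid.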

If you know a correct way of doing this, I'd appreciate your solution. Thanks in advance!

# example input
import pandas as pd

# phase_col is the list of tag strings shown in the 'Tags' column below
df = pd.DataFrame(data=phase_col, columns=['Tags'])

# I need this column:
df['new index'] = ''  # increment once for each sequence of tag1->tag3

# example of an ideal output:

    Tags new index
0   tag1    0     
1   tag1    0     
2   tag1    0     
3   tag1    0     
4   tag2    0     
5   tag2    0     
6   tag1    0     
7   tag2    0     
8   tag1    0     
9   tag2    0     
10  tag3    0     
11  tag3    0     
12  tag2    0     
13  tag3    0     
14  tag2    0     
15  tag3    0     
16  tag3    0     
17  tag3    0     
18  tag1    1     
19  tag1    1     
20  tag2    1     
21  tag2    1     
22  tag3    1     
23  tag1    2     
24  tag2    2     
25  tag3    2     
26  tag1    3     
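
One possible vectorized approach, sketched under the assumption that a new sequence begins exactly where 'tag1' directly follows 'tag3' (as in the table above): compare each row with the previous one using `Series.shift()`, then take the cumulative sum of the resulting boolean flags. The tag list is reconstructed from the table, and the variable name `starts` is illustrative.

```python
import pandas as pd

# Tag sequence copied from the example table above (27 rows).
phase_col = (
    ['tag1'] * 4 + ['tag2'] * 2 + ['tag1', 'tag2', 'tag1', 'tag2']
    + ['tag3', 'tag3', 'tag2', 'tag3', 'tag2'] + ['tag3'] * 3
    + ['tag1', 'tag1', 'tag2', 'tag2', 'tag3']
    + ['tag1', 'tag2', 'tag3', 'tag1']
)
df = pd.DataFrame(data=phase_col, columns=['Tags'])

# True where a 'tag1' row directly follows a 'tag3' row
# (assumed to mark the start of a new sequence).
starts = df['Tags'].eq('tag1') & df['Tags'].shift().eq('tag3')

# Each True adds 1 from that row onward; the first row stays at 0
# because shift() leaves a missing value there, which compares as False.
df['new index'] = starts.cumsum()
```

If 'tag2' rows could appear between the last 'tag3' and the next 'tag1', this check would miss the transition and a different rule would be needed.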

via Stack Overflow
