Johan Nilssons Lifestream

pandas str.extractall on complete words

I have a column of tweets. I want to get a list of all mentions inside each tweet using the regex:

\@(\w+)

I tried df.Tweets.str.extractall('\@(\w+)'), but it fails to match the entire word. My guess is that it tries to split each matched word across many columns. I get the following error:

AssertionError: 1 columns passed, passed data had 15 columns.

Note that '\@(\w)' works as expected and returns a result, but only the first letter of each mention. The + that extends the match to the entire word is probably the root of the problem.
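One plausible reading of the error, offered as an assumption rather than a confirmed diagnosis of pandas internals: with a single capture group, Python's re.findall returns plain strings instead of tuples, and anything that then iterates over a match as if it were a tuple of groups would see one item per character. The sketch below uses only the standard library and the example tweet from this question to show that two of the mentions are exactly 15 characters long, which would line up with "passed data had 15 columns".

```python
import re

# Example tweet quoted in the question.
tweet = ("Aslm Please share our new account after the previous one was "
         "suspended.@KhalidMaghrebi @seifulmaslul123 @CheerLeadUnited")

# With exactly one capture group, re.findall returns a list of strings,
# not a list of tuples.
matches = re.findall(r"@(\w+)", tweet)
print(matches)

# Iterating over a string yields its characters, so a match treated as a
# sequence of groups would appear to have len(match) "columns".
widths = [len(list(m)) for m in matches]
print(widths)
```

With '\@(\w)' every match is a single character, so the same iteration would yield one item per match, which may be why that variant does not fail.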

This is the ISIS dataset from Kaggle. For example, the first match is in the tweet
'Aslm Please share our new account after the previous one was suspended.@KhalidMaghrebi @seifulmaslul123 @CheerLeadUnited'
Using .extract() works fine but only finds the first mention. Using .extractall('\@(\w)') I get:

             0
  match   
8     0      K
      1      s
      2      C

which makes sense. But extracting the complete words with '\@(\w+)' raises the error above.
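For reference, here is a minimal sketch of what the intended call looks like, assuming a reasonably recent pandas release in which str.extractall handles a single group with a quantifier. The small DataFrame is made up for illustration (only the first tweet comes from the question), and the pattern is written as a raw string without the backslash before @, which is not needed. If only a list of mentions per tweet is wanted, str.findall may be the simpler tool.

```python
import pandas as pd

# Hypothetical two-row frame; the first tweet is the one quoted in the question.
df = pd.DataFrame({
    "Tweets": [
        "Aslm Please share our new account after the previous one was "
        "suspended.@KhalidMaghrebi @seifulmaslul123 @CheerLeadUnited",
        "a tweet without any mentions",
    ]
})

# extractall returns one row per match, indexed by (original row, match number).
mentions = df["Tweets"].str.extractall(r"@(\w+)")
print(mentions)

# findall returns one list per tweet, including empty lists for tweets
# that contain no mentions.
mention_lists = df["Tweets"].str.findall(r"@(\w+)")
print(mention_lists.tolist())
```

Rows without a match are simply absent from the extractall result, whereas findall keeps one entry per tweet, so the choice depends on whether a long table of matches or a per-tweet list is more convenient.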

via Stack Overflow
