Tuesday, December 4, 2012

The #anzmac12 hashtag data is building

I was interested to see how it goes. Last year I crunched a whole heap of stuff, calculated Pareto shares and even estimated Negative Binomial Distribution (NBD) parameters for a bunch of those hashtags. It was all fun enough, but it never really made it into the little five-pager. This year:
Just quietly, the 20% Pareto share is 85%: the top 20% of tweeters (all six of them) made 578 of about 679 tweets. Yes, the Pareto effect is in play, but we would expect that, because as soon as you have some light and some heavy tweeters you'll get some sort of Pareto effect.
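For the record, 578 / 679 is about 0.851, which is where the 85% comes from. A minimal sketch of that calculation, assuming you have a list of tweets-per-tweeter counts; the function name `pareto_share` and the sample counts below are made up for illustration and are not the #anzmac12 data:

```python
def pareto_share(counts, top_fraction=0.2):
    """Return (number of top tweeters, their share of all tweets).

    counts: tweets per tweeter, one entry per person who tweeted at least once.
    top_fraction: the slice of heaviest tweeters to look at (0.2 = top 20%).
    """
    ranked = sorted(counts, reverse=True)
    # Round to a whole number of people, but always keep at least one.
    n_top = max(1, round(len(ranked) * top_fraction))
    return n_top, sum(ranked[:n_top]) / sum(ranked)


# Hypothetical counts, purely illustrative.
sample_counts = [10, 5, 3, 1, 1]
n_top, share = pareto_share(sample_counts)
print(f"top {n_top} tweeter(s) made {share:.0%} of the tweets")
```

How you round "20% of tweeters" to whole people matters a little with a sample this small, so it is worth reporting the head count alongside the share.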

This all looks NBD above, except that we don't have a count of the "zero tweeters", which is what you really need if you're an NBD purist. Now that's a whole other question. How many zeroes should there be, and how do we classify the population:

  • all conference delegates?
  • all conference delegates with a Twitter account?
  • all conference delegates with a Twitter account who have viewed the #anzmac12 hashtag?

I did it - stochastically - for an EMAC paper back in 2004 by estimating parameters for a truncated (at zero) NBD and "backfitting" the number of zeroes. I still wonder about that - it sounds a little circular. But these are the games we play - the reviewers didn't mind the idea too much so it got in.
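A rough sketch of what that backfitting idea can look like, and not the method from the 2004 paper: it assumes the mean/shape (m, k) form of the NBD, uses a crude grid search in place of a proper optimiser, and every function name and number in it is hypothetical. The zero-truncated likelihood divides each probability by 1 - P(0), and once m and k are estimated the implied number of zero tweeters is n_observed × P(0) / (1 - P(0)).

```python
import math


def nbd_log_pmf(x, m, k):
    """Log P(X = x) for an NBD with mean m and shape k."""
    return (math.lgamma(k + x) - math.lgamma(k) - math.lgamma(x + 1)
            + k * math.log(k / (k + m)) + x * math.log(m / (k + m)))


def p_zero(m, k):
    """P(X = 0) under the untruncated NBD."""
    return (1 + m / k) ** (-k)


def truncated_loglik(counts, m, k):
    """Log-likelihood of strictly positive counts under a zero-truncated NBD."""
    return (sum(nbd_log_pmf(x, m, k) for x in counts)
            - len(counts) * math.log(1 - p_zero(m, k)))


def fit_truncated_nbd(counts, m_grid, k_grid):
    """Pick the (m, k) pair on the grid with the highest truncated likelihood."""
    return max(((m, k) for m in m_grid for k in k_grid),
               key=lambda mk: truncated_loglik(counts, mk[0], mk[1]))


def backfit_zeros(n_observed, m, k):
    """Implied number of zero tweeters, given n_observed non-zero tweeters."""
    p0 = p_zero(m, k)
    return n_observed * p0 / (1 - p0)


# Hypothetical tweets-per-tweeter counts (all >= 1), purely illustrative.
sample_counts = [40, 22, 9, 5, 3, 2, 2, 1, 1, 1, 1, 1]
m_grid = [0.5 * i for i in range(1, 41)]   # candidate means 0.5 .. 20
k_grid = [0.05 * i for i in range(1, 41)]  # candidate shapes 0.05 .. 2
m_hat, k_hat = fit_truncated_nbd(sample_counts, m_grid, k_grid)
print(m_hat, k_hat, backfit_zeros(len(sample_counts), m_hat, k_hat))
```

The circularity is visible in the code: the zeroes come straight out of the same fitted parameters, so they are only as believable as the NBD assumption itself.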

Still, there we are - we have a set of data that is growing and it'll be interesting to see how it fills out tomorrow. Leave aside the fact that perhaps the tweets per day activity might be growing - at least among the lighter tweeters.

So many questions...
