Minority Report-style pre-crime prediction using Twitter data: possible?

October 8, 2013

Retweet graph of the Project X dataset (snapshot at 2012-09-26 04:32:15). (credit: Marijn ten Thij/University of Twente)

It’s the year 2054, and “PreCrime,” a specialized police department, apprehends criminals based on foreknowledge provided by three psychics called “precogs.”

That’s the premise of the film Minority Report. Now University of Twente (UT) mathematics student Marijn ten Thij* has developed a mathematical model that he says could achieve a similar result (though not limited to crimes) by analyzing tweets, perhaps much as the NSA is already doing.

Retweets gone wild

In 2012, a party invitation (dubbed “Project X”) went viral on Facebook and ended in rioting and injuries after an estimated 30,000 revelers descended on Haren, a small town in the Netherlands. The invitation borrowed its name from the 2012 film Project X.

Ten Thij analyzed all retweets about the event from the period before and after the riots. To do so, he fed the retweets into a mathematical model he developed, which he says can simulate how Twitter users are connected to each other through retweets.**
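To make the idea of a retweet graph concrete (the caption above shows one such graph at a fixed snapshot time), here is a minimal sketch of building a snapshot from timestamped retweets. It is an illustration, not ten Thij's model: the user names and timestamps are made up, the `snapshot` function is hypothetical, and treating each retweet as an undirected link between retweeter and original author is an assumption.

```python
from datetime import datetime

# Hypothetical records: (retweeter, original_author, timestamp).
# Real data would come from a Twitter collection; these are made up.
RETWEETS = [
    ("anna", "bram", datetime(2012, 9, 20, 10, 0)),
    ("carla", "bram", datetime(2012, 9, 20, 12, 30)),
    ("dirk", "anna", datetime(2012, 9, 21, 9, 15)),
    ("anna", "carla", datetime(2012, 9, 22, 18, 0)),
]


def snapshot(retweets, cutoff):
    """Return (nodes, edges) of the retweet graph built from all
    retweets up to and including `cutoff`. Edges are undirected
    (an assumption), so a retweet in either direction links two users."""
    nodes, edges = set(), set()
    for retweeter, author, ts in retweets:
        if ts <= cutoff:
            nodes.update((retweeter, author))
            edges.add(frozenset((retweeter, author)))
    return nodes, edges


nodes, edges = snapshot(RETWEETS, datetime(2012, 9, 21, 23, 59))
print(len(nodes), len(edges))  # 4 3
```

Taking snapshots at successive cutoffs shows how the graph grows as a topic spreads.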

“If a trend is connected with an event in real life, we see that different user groups retweet each other’s messages and that users more frequently tweet on the same topic. In the Project X data, we saw this [retrospectively] a day before the event itself happened,” ten Thij said.
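One simple, hypothetical way to quantify "users more frequently tweet on the same topic" is the average number of retweets per active user per day; a rising value would be the kind of signal ten Thij describes. This is not his published measure, and the sample data and the `retweets_per_user` function below are invented for illustration.

```python
from collections import defaultdict
from datetime import date

# Hypothetical (user, day) pairs, one per retweet about a single topic.
RETWEETS = [
    ("anna", date(2012, 9, 19)), ("bram", date(2012, 9, 19)),
    ("anna", date(2012, 9, 20)), ("anna", date(2012, 9, 20)),
    ("bram", date(2012, 9, 20)), ("bram", date(2012, 9, 20)),
]


def retweets_per_user(retweets):
    """Map each day to (number of retweets) / (number of distinct users)."""
    counts = defaultdict(int)
    users = defaultdict(set)
    for user, day in retweets:
        counts[day] += 1
        users[day].add(user)
    return {day: counts[day] / len(users[day]) for day in sorted(counts)}


for day, rate in retweets_per_user(RETWEETS).items():
    print(day, rate)
```

Here the rate doubles from 1.0 to 2.0 between the two days, because the same two users each retweet twice as often.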

Nelly Litvak, senior lecturer in the department of Stochastic Operations Research at the university and ten Thij’s supervisor, said that she had been “awarded a grant by Google for our research into trend detection and we will certainly continue with our work.”

* Marijn ten Thij studied Applied Mathematics at the University of Twente and recently graduated from the department of Stochastic Operations Research under Prof. Richard J. Boucherie, PhD. The research was carried out at the university’s Centre for Telematics and Information Technology (CTIT). Ten Thij completed his thesis, titled “Modelling trends in social media,” at TNO (Netherlands Organisation for Applied Scientific Research) in its Performance of Networks and Systems (PONS) expertise group. TNO conducts research on the societal impact of social media, including trend detection, as does the University of Twente.

** To test whether he could predict a trend independently of user information, ten Thij deliberately excluded any information about the Twitter users themselves. For example, his research did not take into account whether a particular Twitter user was influential, such as a Dutch celebrity.

Follow-up KurzweilAI conversation with ten Thij

Can you describe your analysis in more detail?

Most current studies of trend prediction use the number of tweets posted about a given topic or the actual content of the messages. Our approach is to analyze how a topic spreads through the Twittersphere. Based on this analysis, we constructed a mathematical model that can simulate the progression of a topic. With this model, we aim to gain insight into the critical value for a phase transition at which a topic becomes trending. We will investigate this phase transition in more detail in future work.
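The interview does not specify ten Thij's model, so as a stand-in, here is the textbook example of a phase transition in a network: the Erdős–Rényi random graph G(n, p). When the mean degree c = p(n - 1) crosses the critical value 1, a "giant" connected component containing a positive fraction of all nodes suddenly appears. The sketch below is a hedged illustration of that general phenomenon, not the author's model; the function name `giant_fraction` and the parameter values are assumptions.

```python
import random


def giant_fraction(n, mean_degree, seed=0):
    """Fraction of nodes in the largest connected component of an
    Erdős–Rényi graph G(n, p) with p = mean_degree / (n - 1)."""
    rng = random.Random(seed)
    p = mean_degree / (n - 1)
    parent = list(range(n))  # union-find forest over node indices

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # Include each possible edge independently with probability p.
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:
                ri, rj = find(i), find(j)
                if ri != rj:
                    parent[ri] = rj

    # Count component sizes by root.
    sizes = {}
    for v in range(n):
        r = find(v)
        sizes[r] = sizes.get(r, 0) + 1
    return max(sizes.values()) / n


for c in (0.5, 1.0, 1.5, 2.0, 3.0):
    print(c, round(giant_fraction(1000, c), 3))
```

Below c = 1 the largest component stays a tiny fraction of the graph; above it, the fraction climbs quickly (roughly 0.8 at c = 2 in theory). A "critical value" for trending topics would play an analogous role.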

What did you learn specifically?

We analyzed the number of people in the graph and the connections between them. We found that the majority of the people discussing this event were located in a single connected component. But since this did not hold for another dataset we used (on the 2013 World Single Distances Speed Skating Championships in Sochi), we have to investigate this further.
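A connected component is a group of users who can all reach each other through chains of retweets. The check ten Thij describes, what share of users sits in the largest component, can be sketched with a breadth-first search. The edge list below is hypothetical, and treating retweet links as undirected is an assumption.

```python
from collections import defaultdict, deque


def largest_component_share(edges):
    """Return the share of users that fall in the largest connected
    component of an undirected graph given as (user, user) pairs."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    seen, best = set(), 0
    for start in adj:
        if start in seen:
            continue
        # Breadth-first search from an unvisited user measures one component.
        seen.add(start)
        queue, size = deque([start]), 0
        while queue:
            node = queue.popleft()
            size += 1
            for nbr in adj[node]:
                if nbr not in seen:
                    seen.add(nbr)
                    queue.append(nbr)
        best = max(best, size)
    return best / len(adj) if adj else 0.0


# Hypothetical retweet pairs: one group of four users and one pair.
pairs = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "a"), ("x", "y")]
print(largest_component_share(pairs))  # 0.6666666666666666
```

A share close to 1 would match the Project X finding; a lower share, with users spread over several components, would match the speedskating dataset.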

What prediction accuracy did you have and how many hours in advance?

These are questions that we cannot answer thus far; we will have to perform more research. In our study, we only used three datasets. We are currently looking for more datasets to fine-tune our findings.

Is this a useful forecasting tool? If so, when will the model be available?

Because our model cannot determine how large the actual graph will become, it cannot give a clear forecast at the moment. However, there are other models that can do this, so combining them with ours could be interesting.

Can you provide some additional interesting examples of the model’s use?

One example that comes to mind is a demonstration, where people tweet and retweet messages to persuade others to join.

What other research is being done in this area?

Here are three examples:

Predicting Retweet Behavior in Weibo Social Network
Analyzing Big Data with Twitter
Prediction of retweet cascade size over time