About The Project

Sick of the city, you need some perspective. You catch the red eye to England to find out more about the masterminds behind Phrase Detectives.

As they welcome you into their brightly lit university, you realise they aren't so shadowy after all.

More about the game...

The Phrase Detectives game has been developed as a fun way for people to collaborate in creating large linguistic resources that will be used to advance the language technology used on the Internet, in business and on home computers. It is an example of a "Game With A Purpose" or GWAP, where a gaming environment is used to complete tasks on a scale not possible in more traditional ways. The data collected from the game is compiled to create an annotated corpus (a collection of files that humans have annotated thoroughly enough to be useful for computational purposes). This methodology has been referred to as human computation or the Wisdom of the Crowds, where large groups of people collaboratively come up with good answers.

The original Phrase Detectives game was released in 2008 and collects collaborative anaphoric decisions from online volunteers. As of December 2018 the game had collected over 4 million examples of human language in its database, submitted by 60,000 players: a collaborative effort of over 9,500 hours, or roughly 395 days. Exported data from the game shows that the combined answers of players give a very high quality result.
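To illustrate how combining players' answers can work, here is a minimal sketch of the simplest possible scheme, a majority vote per phrase. It is not the game's actual aggregation method; the function name, the item ids and the sample answers are all made up for the example.

```python
from collections import Counter


def aggregate_judgements(judgements):
    """Majority vote: map each item to (winning label, share of votes).

    `judgements` is an iterable of (item_id, label) pairs, one per
    player answer. This format is an assumption made for the sketch.
    """
    by_item = {}
    for item_id, label in judgements:
        by_item.setdefault(item_id, Counter())[label] += 1

    result = {}
    for item_id, counts in by_item.items():
        label, votes = counts.most_common(1)[0]
        result[item_id] = (label, votes / sum(counts.values()))
    return result


# Hypothetical answers: (phrase id, antecedent chosen by one player)
judgements = [
    ("m1", "the detective"),
    ("m1", "the detective"),
    ("m1", "the suspect"),
    ("m2", "London"),
    ("m2", "London"),
]

for item_id, (label, share) in aggregate_judgements(judgements).items():
    print(item_id, label, round(share, 2))
```

The share of votes gives a rough measure of how much the players agree: a low share suggests a phrase that players read in more than one way.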

Read more about the game in these articles:
Innovations Report
Science Daily

Disagreements and Language Interpretation (DALI) Project

Natural language expressions are supposed to be unambiguous in context. Yet more and more examples are emerging of expressions that are ambiguous in context while remaining felicitous and rhetorically unmarked. In previous work, we demonstrated that ambiguity in anaphoric reference is ubiquitous, through the study of disagreements in annotation, which we pioneered in computational linguistics (CL). Since then, additional cases of ambiguous anaphoric reference have been found, and similar findings have been made for other aspects of language interpretation, including word sense disambiguation and even part-of-speech tagging. Using the Phrase Detectives Game-With-A-Purpose to collect massive amounts of judgments online, we found that up to 30% of anaphoric expressions in our data are ambiguous. These findings raise a serious challenge for CL, as assumptions about the existence of a single interpretation in context are built into the dominant methodology, which depends on a reliably annotated gold standard.

The goal of DALI is to tackle this fundamental issue of disagreements in interpretation by using computational methods for collecting and analysing such disagreements. Some of these methods already exist but have never before been applied in linguistics on a large scale; others we will develop from scratch. First, we will develop more advanced games-with-a-purpose to collect massive amounts of data about anaphora from people playing a game.

Second, we will use Bayesian models of annotation, widely used in epidemiology but not in linguistics, to analyse such data and identify genuine ambiguities; doing this for anaphora will require novel methods. Third, we will use these data to revisit current theories about anaphoric expressions that do not seem to cause infelicitousness when ambiguous. Finally, we intend to develop the first supervised approach to anaphora resolution that does not require a gold standard, which could serve as a blueprint for other areas.
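To give a flavour of what a model of annotation does, here is a minimal sketch of one well-known model from this family, the Dawid and Skene model, fitted with expectation-maximisation. It is a maximum-likelihood simplification rather than a fully Bayesian model, and it is not necessarily the model used in DALI. The function name, the data format and the sample votes are assumptions made for the example.

```python
def dawid_skene(votes, labels, iterations=20, smoothing=0.01):
    """Estimate P(true label) per item from noisy annotator votes.

    `votes` is a list of (item, annotator, label) triples (an assumed
    format). Returns {item: {label: probability}}.
    """
    items = sorted({i for i, _, _ in votes})
    annotators = sorted({a for _, a, _ in votes})

    # Initialise each item's posterior from its raw vote proportions.
    post = {i: {l: 0.0 for l in labels} for i in items}
    for i, _, l in votes:
        post[i][l] += 1.0
    for i in items:
        total = sum(post[i].values())
        post[i] = {l: v / total for l, v in post[i].items()}

    for _ in range(iterations):
        # M-step: label priors and one confusion matrix per annotator,
        # conf[a][t][l] = P(annotator a answers l | true label is t).
        prior = {t: sum(post[i][t] for i in items) / len(items) for t in labels}
        conf = {
            a: {t: {l: smoothing for l in labels} for t in labels}
            for a in annotators
        }
        for i, a, l in votes:
            for t in labels:
                conf[a][t][l] += post[i][t]
        for a in annotators:
            for t in labels:
                row_total = sum(conf[a][t].values())
                for l in labels:
                    conf[a][t][l] /= row_total

        # E-step: recompute the posterior over each item's true label.
        score = {i: dict(prior) for i in items}
        for i, a, l in votes:
            for t in labels:
                score[i][t] *= conf[a][t][l]
        for i in items:
            total = sum(score[i].values())
            post[i] = {t: s / total for t, s in score[i].items()}
    return post


# Hypothetical votes: p1 and p2 always agree, p3 answers erratically.
votes = [
    ("i1", "p1", "A"), ("i1", "p2", "A"), ("i1", "p3", "B"),
    ("i2", "p1", "B"), ("i2", "p2", "B"), ("i2", "p3", "B"),
    ("i3", "p1", "A"), ("i3", "p2", "A"), ("i3", "p3", "A"),
    ("i4", "p1", "B"), ("i4", "p2", "B"), ("i4", "p3", "A"),
]
posteriors = dawid_skene(votes, labels=["A", "B"])
for item, dist in posteriors.items():
    print(item, {label: round(p, 3) for label, p in dist.items()})
```

Unlike a plain majority vote, this kind of model weights each annotator by how reliable they appear to be. An item whose posterior stays spread across several labels, even after accounting for annotator reliability, is a candidate for genuine ambiguity.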

The original AnaWiki Project

Creating anaphorically annotated resources through Web cooperation

The ability to make progress in Computational Linguistics depends on the availability of large annotated corpora, but creating such corpora by hand annotation is very expensive and time-consuming; in practice, annotating more than one million words this way is infeasible.

However, the success of Wikipedia and other projects shows that another approach might be possible: take advantage of the willingness of Web users to contribute to collaborative resource creation. AnaWiki is a project that develops tools to allow and encourage large numbers of volunteers over the Web to collaborate in the creation of semantically annotated corpora (in the first instance, of a corpus annotated with information about anaphora).


June 2019
This has been a busy year for the DALI team, analysing and publishing results from Phrase Detectives, including releasing the second version of the annotated corpus. Read the latest publications about the project on the about page.
April 2019
Head office confirm prizes will be continuing into 2019. Great news for you hard-working players!
December 2018
We're back! The geeks have been working hard to produce a new version of Phrase Detectives that takes the best bits of the Facebook version (RIP) and updates the original game. Phrase Detectives version 2 may have a few teething problems as players are moved over, so get in contact if something odd happens and we'll get it fixed.
Sept 2018
The team met up at Queen Mary University to discuss progress in Year 2 of the DALI project. Some of our goals are well ahead of schedule, which is great news.
May 2018
The team ran a workshop at LREC'18 called Games4NLP where researchers and game developers got together to share ideas.
