More about the game...
The Phrase Detectives game was developed as a fun way for people to collaborate in creating large
linguistic resources that will be used to advance the language technology used on the Internet, in business and on
home computers. It is an example of a "Game With A Purpose" (GWAP), in which a gaming environment is used to
complete tasks on a scale not possible in more traditional ways. The data collected from the game is compiled
to create an annotated corpus (a collection of files that humans have worked on enough to be useful
for computational purposes). This methodology has been referred to as human computation or the
Wisdom of the Crowds, where large groups of people collaboratively come up with good answers.
The original Phrase Detectives game was released in 2008 and collects collaborative anaphoric decisions from online volunteers.
As of December 2018, the game's database held over 4 million examples of human language submitted by 60,000 players,
a collaborative effort of over 9,500 hours (about 395 days).
Exported data from the game shows that the combined answers of players give a very high-quality result.
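This page does not say how players' answers are combined, so the following is only a minimal sketch of the simplest possible approach, majority voting. The function name `majority_label`, the markable identifiers and the example answers are all hypothetical and are not taken from the game's data or code.

```python
from collections import Counter


def majority_label(votes):
    """Return the most common answer and the share of players who gave it."""
    if not votes:
        raise ValueError("no votes to aggregate")
    counts = Counter(votes)
    # most_common(1) yields the top (answer, count) pair; on a tie the
    # answer that was seen first wins.
    label, n = counts.most_common(1)[0]
    return label, n / len(votes)


# Hypothetical player answers: for each expression, the antecedent chosen.
judgments = {
    "markable_1": ["the dog", "the dog", "the cat", "the dog"],
    "markable_2": ["new entity", "the cat", "new entity", "the cat", "new entity"],
}

aggregated = {m: majority_label(v) for m, v in judgments.items()}
for markable, (label, share) in aggregated.items():
    print(f"{markable}: {label} ({share:.0%} agreement)")
```

A low agreement share, as for the second hypothetical expression, is one simple signal that an expression may be genuinely ambiguous rather than merely mislabelled by a few players.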
Read more about the game in these articles:
Natural language expressions are supposed to be unambiguous in context. Yet more and more examples are emerging of expressions that are ambiguous in context but nonetheless felicitous and rhetorically unmarked. In previous work, we demonstrated through the study of disagreements in annotation, which we pioneered in computational linguistics (CL), that ambiguity in anaphoric reference is ubiquitous. Since then, additional cases of ambiguous anaphoric reference have been found, and similar findings have been made for other aspects of language interpretation, including word sense disambiguation and even part-of-speech tagging. Using the Phrase Detectives Game-With-A-Purpose to collect massive amounts of judgments online, we found that up to 30% of anaphoric expressions in our data are ambiguous. These findings raise a serious challenge for CL, as assumptions about the existence of a single interpretation in context are built into the dominant methodology, which depends on a reliably annotated gold standard.
The goal of DALI is to tackle this fundamental issue of disagreements in interpretation by using computational methods for collecting and analysing such disagreements. Some of these methods already exist but have never before been applied in linguistics on a large scale; others we will develop from scratch. First, we will develop more advanced games-with-a-purpose to collect massive amounts of data about anaphora from people playing a game.
Second, we will use Bayesian models of annotation, widely used in epidemiology but not in linguistics, to analyse such data and identify genuine ambiguities; doing this for anaphora will require novel methods. Third, we will use these data to revisit current theories about anaphoric expressions that do not seem to cause infelicity when ambiguous. Finally, we intend to develop the first supervised approach to anaphora resolution that does not require a gold standard, as a blueprint for other areas.
Creating anaphorically annotated resources through Web cooperation
The ability to make progress in Computational Linguistics depends on the availability of large annotated corpora, but creating such corpora by hand annotation is very expensive and time-consuming; in practice, annotating more than one million words is infeasible.
However, the success of Wikipedia and other projects shows that another approach might be possible: take advantage of the willingness of Web users to contribute to collaborative resource creation. AnaWiki is a project that develops tools to allow and encourage large numbers of volunteers over the Web to collaborate in the creation of semantically annotated corpora (in the first instance, of a corpus annotated with information about anaphora).
Listed below are published papers relating to the Phrase Detectives game. For more information please