GLTR
Automatically identifies text that has been generated by a computer.
OVERVIEW
The MIT-IBM Watson AI Lab and HarvardNLP have developed GLTR, a tool that detects artificially generated text through forensic analysis: it estimates how likely it is that a language model produced each word of a text. Specifically, GLTR runs the text through OpenAI's GPT-2 117M language model and ranks each word by how highly the model would have predicted it at that position. The most likely words are highlighted in green, followed by yellow and red, while the remaining, least likely words are shown in purple. This overlay makes computer-generated text easy to spot visually, because sampled model output tends to draw almost exclusively from the top-ranked (green and yellow) words, whereas human writing mixes in far more lower-ranked choices.

GLTR also presents three histograms that aggregate information over the whole text: the first counts the words in each rank category, the second shows the ratio between the probability of the top predicted word and that of the word actually used, and the third shows the distribution of prediction entropies, indicating how certain the model was at each position. Together, these histograms provide further statistical evidence of whether a text was artificially generated.

GLTR is particularly useful for detecting fake reviews, comments, or news articles produced by large language models, since such texts can be nearly indistinguishable from human writing for non-expert readers. The tool is accessible through a live demo, and its source code is available on GitHub. Researchers can also refer to the ACL 2019 demo track paper, which was nominated for best demo.
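The per-word analysis described above can be sketched as follows. This is a minimal illustration, not GLTR's actual code: it uses a hypothetical toy next-word distribution in place of real GPT-2 117M predictions, and the rank thresholds (10 / 100 / 1000) are the color buckets used by the GLTR demo.

```python
import math

def color_for_rank(rank):
    """Map a word's rank under the model to a GLTR-style highlight color."""
    if rank <= 10:
        return "green"
    if rank <= 100:
        return "yellow"
    if rank <= 1000:
        return "red"
    return "purple"

def analyze(probs, actual_word):
    """probs: dict mapping each candidate word to P(word | context),
    as a language model would provide. Returns the statistics GLTR
    visualizes for one position: rank, color, the top-word-to-actual
    probability ratio, and the prediction entropy."""
    ranked = sorted(probs, key=probs.get, reverse=True)
    rank = ranked.index(actual_word) + 1
    ratio = probs[ranked[0]] / probs[actual_word]
    entropy = -sum(p * math.log2(p) for p in probs.values() if p > 0)
    return rank, color_for_rank(rank), ratio, entropy

# Hypothetical distribution over a tiny four-word vocabulary:
probs = {"the": 0.5, "a": 0.3, "cat": 0.15, "xylophone": 0.05}
print(analyze(probs, "the"))  # rank 1 -> "green", ratio 1.0
```

Aggregating these per-position values over a whole document yields the three histograms: counts per color bucket, the distribution of probability ratios, and the distribution of entropies.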