Major Projects

Policing Hate Crime: modernising the craft, an evidence based approach (2015-2017)

As the general level of reported crime across the UK falls, hate crimes continue to increase. In the context of national and regional priorities to combat this trend, this project’s partners have come together to form the Hate Crime Consortium, a multidisciplinary team of academics with experts in social science, law/criminology and computer science, public policy researchers, law enforcement officers, and commercial software providers.

The project examines how the policing of hate crime might be improved through the application of NLP technology. It investigates the relationship between community temperament (as expressed on social media) and the pattern and incidence of hate crime, and explores ways in which NLP technology can improve the recording and investigation of hate crime.

This is a joint project with the Metropolitan Police Service, Palantir Technologies, Demos, and CASM Consulting LLP, and is funded by the Police Knowledge Fund.

Near You Now (2014-2016)

The Near You Now Demonstrator, a software product, allowed customers to access hyperlocal media stories from a range of publishers on mobile devices, selected by relevance to their precise location. It gave publishers, large and small, the opportunity to syndicate content and share revenue. The resulting collaborative ecosystem aimed to create more relevant local media services for end customers, deeper engagement and monetisation opportunities for large regional publishers, and a more sustainable future for hundreds of smaller independent hyperlocal media producers across the UK.

This was a joint research project with Near You Now Ltd, Archant Community Media Ltd, London Belongs to Me Ltd (KentishTowner) and Streetbook Ltd, and was funded by Innovate UK.

In the Hands of the Analyst: Unlocking the value of social media for professional market research (2014-2015)

This project explored the approaches required to undertake attitudinal analysis using social media data in a principled fashion, taking into account issues of representativeness, the limits of current technology and ethical considerations. The project built an integrated system – both technology and method – to analyse social media data in a way that reflects the values and principles of conventional attitudinal research. Details of this system, Method52, can be found here.

It was a joint research project with Ipsos MORI, Demos, and CASM Consulting LLP, and was funded by Innovate UK, the ESRC, the EPSRC and DSTL.

Developing A Unified Model of Compositional and Distributional Semantics (2012-2015)

Historically, there have been two main approaches to modelling the meaning of language in NLP: the compositional approach and the distributional approach. This project sought to explore ways of combining the two perspectives.

Compositional approach to semantics

The 19th-century logician Frege proposed that the meaning of a phrase can be determined from the meanings of its parts and how those parts are combined. From this, logicians have developed formal accounts of how the meaning of a sentence can be determined from the relations between the words in it, culminating in the work of Richard Montague in the 1970s. The compositional approach addresses a fundamental problem in linguistics: how humans are able to generate an unlimited number of sentences using a limited vocabulary. We would like computers to have a similar capacity.

Distributional approach to semantics

The distributional approach focuses on the meanings of the words themselves and is based on the ideas of de Saussure in the 1910s, Wittgenstein in the 1940s and later "structural" linguists such as Zellig Harris and Firth. The idea is that the meanings of words can be determined by considering the contexts in which words appear in text. For example, if we take a large amount of text and see which words appear close to the word "dog", and do a similar thing for the word "cat", we will see that the contexts of "dog" and "cat" tend to share many words (such as walk, run, furry, pet, and so on). By contrast, if we see which words appear in the context of the word "television", we will find less overlap with the contexts for "dog". Mathematically, these contexts are typically represented in a vector space, so that word meanings occupy positions in a geometrical space; we would expect to find that "dog" and "cat" are much closer in the space than "dog" and "television".
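The "dog", "cat" and "television" example can be illustrated with a minimal sketch. It is not the project's own method: the toy corpus, the two-word context window, and the function names context_vector and cosine are all assumptions made for illustration. Each word is represented by counts of the words that appear near it, and closeness in the space is measured with cosine similarity.

```python
from collections import Counter
import math

# Toy corpus, invented purely for illustration (an assumption, not project data).
CORPUS = [
    "the furry dog likes to run and walk in the park",
    "a sleepy cat likes to nap in the house",
    "my pet dog is a furry friend",
    "my pet cat is a furry friend",
    "the television shows the news every evening",
    "we watch the television news in the evening",
]


def context_vector(target, sentences, window=2):
    """Count the words found within `window` tokens of each occurrence of `target`."""
    counts = Counter()
    for sentence in sentences:
        tokens = sentence.split()
        for i, token in enumerate(tokens):
            if token != target:
                continue
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            # Add every neighbour in the window, skipping the target itself.
            counts.update(tokens[j] for j in range(lo, hi) if j != i)
    return counts


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * v[word] for word, count in u.items() if word in v)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


dog = context_vector("dog", CORPUS)
cat = context_vector("cat", CORPUS)
television = context_vector("television", CORPUS)

print("dog vs cat:       ", round(cosine(dog, cat), 3))
print("dog vs television:", round(cosine(dog, television), 3))
```

On this toy corpus the similarity between "dog" and "cat" comes out higher than that between "dog" and "television", because the first pair shares more context words. Real systems would use far larger corpora and usually some weighting of the raw counts.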

The two approaches to meaning can be roughly characterized as follows: the compositional approach is concerned with how meanings combine, but has little to say about the individual meanings of words; the distributional approach is concerned with word meanings, but has little to say about how those meanings combine.

This project exploited the strengths of the two approaches and examined how they might be combined. This multisite project was funded by the EPSRC, with related grants funding the project at the Universities of Cambridge, Edinburgh, Oxford, York and Queen Mary's College, London. At the TAG Laboratory, this project led to the development of the APT approach to compositional distributional semantics, for further details of which see here.

Mobile Commerce as a service (2013-2014)

The objective of this project was to develop a universal mobile transaction platform that used SMS text messaging as the user interface. The technology aimed to allow the general public to make use of well-understood and ubiquitous text messaging practices to purchase and gift goods and services, affording a low barrier to retailers and service providers to enter the digital marketplace and reach the widest mobile audience.

The TAG Laboratory's role was to explore approaches for automated dialogue between customers and the system, analysing text messages to determine customers' intent. It was a joint research project with Parcelpoke Limited and Indulge Retail, and was funded by Innovate UK.

Towards a Social Media Science: Tools and Methodologies (2012-2013)

The explosion of social media has created an unprecedented research opportunity for social scientists. Social media present a digital tableau of society-in-motion: of people arguing, condemning, joking, influencing. The growth of these digital spaces has coincided with the emergence of a family of tools – ‘big data analytics’ – that can make sense of them. Harnessing social media data as behavioural evidence using these tools could bring about a step change in the social sciences.

This project aimed to explore this potential by creating software tools and methodologies for analysing large social media datasets. The project led to the development of Method51, a system for the analysis of social media data, and a precursor to our current system Method52, further details of which can be seen here. This was a joint project with the think tank Demos and was funded by the ESRC National Centre for Research Methods.

Exploitation of Diverse Data via Automatic Adaptation of Knowledge Extraction Software (2011-2012)

This project involved two industrial partners: Brandwatch and Linguamatics. The companies were developing systems for turning 'big data' into useful business information, and wanted to cover further diverse data sources, from patents to Tweets. The project addressed a bottleneck that often arises when applying natural language processing technology in practical settings: the need for laborious customisation with respect both to the type of data source (e.g. newswire vs. patent literature) and to a domain's terminology (e.g. medical practice vs. pharmaceutical research). The project partners explored ‘distributional’ methods based on contextual similarity of word usage in order to accelerate two key components of this customisation, namely the recognition of concepts and the creation or adaptation of terminologies that link terms to concepts. This allowed software which extracts information from 'big data' to be adapted extremely rapidly to new and diverse data sources.

This project was funded by Innovate UK under the “Harnessing Large and Diverse Data Sources” Competition.