My research interests are in natural language processing / computational linguistics, focussing on information extraction and human-in-the-loop NLP. I have been a PI and CoI on various UKRI, EU H2020, Innovate UK, Home Office and DSTL projects. Many of these projects are cross-disciplinary, featuring consortia of academic and commercial partners experienced in a range of domains and disciplines. My PhD, supervised by Prof Dave de Roure and Prof Sir Nigel Shadbolt, investigated recommender systems and ontologies and was completed in October 2002.
- Natural Language Processing
- Human-in-the-loop NLP: Active Learning, Adversarial Training, Rationale-based Learning, Interactive Sense Making
- Information Extraction: Few/Zero-Shot Learning, Graph-based Models, Behaviour Classification, Geoparsing/Location Extraction, Event Extraction, Argument Mining
- Domains: Law Enforcement, Defence, Mental Health, Environmental Science, Social Science
Natural Language Processing
My research interests lie in the natural language processing areas of information extraction and human-in-the-loop NLP, developing novel algorithms that discover and exploit patterns in free text and metadata in order to extract actionable human intelligence and machine-readable knowledge. In contrast to big data approaches, my research has focussed on developing novel solutions to problems where training sets are small, sparse or fragmented. Such settings are common: social media posts during breaking news events, emerging topics within online community forums, criminal marketplaces exhibiting deliberate obfuscation, and historical datasets where information may have been inaccurately recorded, corrupted or lost over time. I am interested in socio-technical NLP approaches that promote explainable and trustworthy AI, in human-in-the-loop approaches, and in information extraction based on techniques such as few/zero-shot learning, graph-based models, sentence embeddings, domain adaptation and argument mining.
Open source software: GitHub
ProTechThem: an ESRC-funded project. ProTechThem will explore sharenting (parents sharing online information about minors). Motivations for sharenting and the automated detection of online risk behaviours will be explored through online ethnography, criminological analysis and natural language processing (NLP) algorithms, to support improvements in cybersecurity behaviours.
SafeSpacesNLP: a UKRI TAS Hub funded project. Behaviour classification NLP in a socio-technical AI setting, addressing online harmful behaviours affecting children and young people. The project explores human-in-the-loop graph-based and few-shot NLP models for behaviour classification of online forum posts.
GloSAT: a UK NERC platform grant. Global Surface Air Temperature (GloSAT) aims to improve understanding of climate variability and change. Objectives include information extraction and data rescue of climate sensor data from historical texts.
Multimodal audio-textual argumentation mining of political debates: a Web Science Institute grant. Development of a multimodal dataset for training NLP models to perform argument mining of political debates.
CYShadowWatch: a UK DSTL-funded project. Automated Multilingual Information Extraction for Online Cybercrime Sites. CYShadowWatch will explore statistical machine translation and information extraction for Russian-language online cybercrime forums.
LPLP (Legal and Property Language Processing): an Innovate UK funded project. LPLP will develop cutting-edge AI techniques to extract and analyse legal rights and obligations related to property and land, including natural language processing (NLP) algorithms that extract these rights and obligations from HM Land Registry documents.
FloraGuard project: a UK ESRC-funded project. FloraGuard will examine and map, from a multidisciplinary perspective, the criminal market in endangered plants affecting the UK. Quantitative evidence will come from crawling cyber-criminal activity on both the surface web (web forums, social media) and the dark web (Tor forums); natural language processing and machine learning will be used to map this activity socio-economically at a community level.
Intel-Analysis DSTL: a UK DSTL-funded project. Intel-Analysis DSTL uses argumentation schemes and evidential reasoning to support teams of analysts evaluating conflicting hypotheses during real-time events. Evidence is obtained in real time from a combination of human intelligence reports and information extracted from social media via natural language processing.
REVEAL project: an EU-funded FP7 project. REVEAL aims to advance the technologies needed to make higher-level analysis of social media possible. It focussed on social media verification, including digital text forensics, trust and credibility analytics, and decision support for journalists verifying user-generated content.
ECS Deputy Examinations Officer and Prizes
Module Leader COMP3225 Natural Language Processing
Teaching Team COMP3222/COMP6246 Machine Learning Technologies; COMP3208 Social Computing Techniques; COMP6214 Open Data Innovation
If you are a student interested in a PhD in NLP, have a look at my research projects and contact me to discuss ideas.