University of Southampton Electronics and Computer Science
Professional Engagement

Research areas

Stuart's research interests are focussed on computational linguistics and information extraction.

Computational linguistics - exploring new approaches in computational semantics, including statistical semantics, lexical and discourse semantics, knowledge representation and automated inference.

Information extraction - extraction and use of semantic information to intelligently search/index/link/infer within and between application domains. This includes open information extraction, knowledge base population, geoparsing/location extraction, temporal extraction and event/topic detection.

Domain expertise - media, law enforcement, sensors, environment, health, crisis management and defence.

Stuart's research focuses on techniques that work effectively with small training sets, a priori expert knowledge and relevance feedback, in addition to utilizing larger web-scale corpora such as DBpedia, NELL and WordNet. Selecting computational linguistics and information extraction algorithms whose underlying knowledge representation preserves domain semantics makes incremental knowledge injection possible and makes it easier to explain results in terms of understandable domain features. Explainability is particularly important as it helps engender trust in the final results of AI systems. This contrasts with the trend towards deep learning on big datasets, where the data is often owned and controlled by large commercial companies and solutions tend to be black boxes.

Many interesting research questions exist around the scalability, context, sources of bias and explainability of computational linguistics and information extraction algorithms. Large datasets, multi-lingual language models and domain vocabularies all have issues relating to availability, completeness and maintainability, which in turn affect the scalability of techniques when applied in the real world. Context is important for correctly understanding information extractions, especially around sarcasm, sentiment and stance. The training examples and datasets that AI algorithms need have biases, depending on which group of people created them and the methodology used in their capture. Understanding how training bias affects the final algorithmic result is important. Lastly, explainability and algorithmic transparency are critical if end users and decision makers are going to trust and believe the results and patterns found by AI algorithms.

Steering committees

IEEE International Conference on Intelligent Environments [IE] 2016 Posters & Short Paper Track Chair
MediaEval Benchmarking Initiative for Multimedia Evaluation [MediaEval] 2016 Verifying Multimedia Use Task Committee
RGS-IBG Annual Conference 2018, Using New Forms of Data in Research Session Convenor

Programme committees

ACL North American Chapter of the Association for Computational Linguistics : Human Language Technologies (NAACL-HLT) 2018
IEEE DSAA Special Session on Sentiment, Emotion, and Credibility of Information in Social Data (SeCredISData) @ International Conference on Data Science and Advanced Analytics (DSAA) 2018
IEEE International Conference on Intelligent Environments [IE] 2016 to 2018
Conversations on chatbots workshop [CONVERSATIONS] in conjunction with INSCI 2017
Workshop on Internet for Financial Collective Awareness and Intelligence [IFIN] 2016 to 2017
Workshop on Social News on the Web [SNOW] 2016
Workshop on Web Multimedia Verification [WeMuV] 2015
Intelligent Personalization Workshop @ International Joint Conferences on Artificial Intelligence (IJCAI) 2015
IEEE/WIC/ACM Workshop on Web Personalization, Recommender Systems and Social Media [WPRSM] 2009 to 2014
ACM Conference on Recommender Systems [RECSYS] 2008 to 2013
International Conference on Electronic Commerce and Web Technologies [EC-Web] 2009 to 2013
User Modeling and User-Adapted Interaction: The Journal of Personalization Research [UMUAI] 2013
AAAI Intelligent Techniques for Web Personalization [ITWP] 2004 to 2012
IFIP International Conference on Artificial Intelligence Applications and Innovations [AIAI] 2010
IEEE Topic Feature Discovery and Opinion Mining [TFDOM] 2010
IEEE Recommender Systems and Personalized Retrieval [RSPR] 2008
ACM Genetic and Evolutionary Computation Conference [GECCO] 2004

Research projects

FloraGuard project : a UK ESRC funded project. FloraGuard will examine and map, from a multidisciplinary perspective, the criminal market in endangered plants affecting the UK. Quantitative evidence will come from a combination of surface web (web forums, social media) and dark web (TOR forums) crawling of cyber-criminal activity, with natural language processing and machine learning used to socio-economically map this activity at a community level.

Intel-Analysis DSTL : a UK DSTL funded project. Intel-Analysis DSTL uses argumentation schemes and evidential reasoning to support teams of analysts trying to evaluate conflicting hypotheses during real-time events. Evidence is obtained in real-time from a combination of human intelligence reports and information extraction from social media via natural language processing.

REVEAL project : an EU funded FP7 project. REVEAL aims to advance the necessary technologies for making a higher level analysis of social media possible. Focussed on social media verification, including digital text forensics, trust and credibility analytics and decision support for journalists verifying user generated content.

GRAVITATE project : an EU funded H2020 project. Focussed on supporting geometric reconstruction and semantic reunification of cultural heritage objects using techniques such as semantic enrichment using natural language processing, graph matching and 3D geometric matching.

Digital Police Officer (DPO) project : a UK WSI funded project. The DPO project aims to apply linguistic analysis to identify cyber criminals operating under pseudonyms on different online forums and within the same forum. The project will apply natural language processing techniques guided by insights from criminology.

OFERTIE project : an EU funded FP7 project. OFERTIE aims to enhance and use the OFELIA Testbed for OpenFlow Programmable Networking to run experiments to establish how programmable networks can be used to support technical solutions such as multicast and managed QoS, and what business models and value chains would be able to use these solutions in an economically sustainable fashion.

TRIDEC project : an EU funded FP7 project. Focuses on context aware semantic information retrieval and data fusion for crisis management in the tsunami early warning and oil rig drilling domains. Work includes geospatial sensor information fusion for decision support, task context management and context aware information filtering of real-time sensor and video event streams.

ENVIROFI project : an EU funded FP7 project. Focuses on context aware semantic information fusion and the creation of future internet environmental enablers. Work includes geospatial sensor information fusion for marine and biodiversity domains, uncertainty context management and context aware information filtering of geo-distributed heterogeneous data streams publishing sensor time series, satellite images, video and web 2.0.

DESURBS project : an EU funded FP7 project. Focussed on knowledge-based decision support tools to help planning organizations (councils, city planners, companies) better understand the vulnerabilities and design possibilities when designing safer urban spaces. Work includes semantic enrichment, personalization of best practice reports, advanced visualization and use of mapping/charting tooling.

IRMOS project : an EU funded FP7 project. Focuses on application performance modelling for use in automated Cloud resource provisioning. Work includes using semantically annotated UML diagrams to produce discrete event simulations and optimised cloud provisioning strategies.

SANY project : an EU funded FP6 project. Focuses on interoperability of in-situ sensors and sensor networks. Work includes building an OGC compliant generic sensor information fusion infrastructure and use of semantic OGC standards to handle fusion processes and application datasets.

MUPPITS and POSTMARK projects : UK TSB funded projects. Focussed on media management for post-production companies. Work included the creation of a media data and metadata warehouse with auditable event tracking and automated media management. Later work focussed on business models and exploitation paths via media partners such as Pinewood Studios.

POLYMNIA project : an EU funded FP6 project. An intelligent cross-media platform for personalised leisure and entertainment in thematic parks or venues. Work included an automated video media production tool encoding directorial knowledge for automated personalized video editing.

PrestoSpace project : an EU funded FP6 project. The project's objective is to provide technical solutions and integrated systems for a complete digital preservation of all kinds of audio-visual collections. Work included a distributed rendering system for video restoration and the overall distributed control of mixed-media restoration sub-systems.

Moretea project : an EPSRC project. Electronic notebook project to improve the information environment for chemists doing chemistry - within and beyond the lab using semantic search within a secure environment.

SCULPTEUR project : an EU funded FP5 semantic web project. A digital library system for searching and retrieval of diverse multimedia representations to support the work of professional users in the fine arts. Work included semantic search and retrieval of mixed-media archives.

GEMSS project : an EU funded FP5 medical Grid project. Six medical imaging applications are supported within a secure, commercial Grid infrastructure. Work included Grid-based negotiation for medical simulation quality of service.

Quickstep project : a hybrid collaborative/content-based recommender system to recommend on-line research papers. Uses kNN multi-class paper classification and an ontology to enhance the profiling process. Two trials of the system were conducted, each lasting 1.5 months, with 14 and 24 subjects respectively. The results demonstrated the utility of an ontological approach to user profiling and how applying domain knowledge can enhance profiling.
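
As a rough illustration of kNN multi-class paper classification, the sketch below labels a paper by a majority vote among its k most similar labelled examples. It is not Quickstep's actual implementation: the term-frequency vectors, the cosine similarity measure, the topic labels and the function names are all assumptions made for this example.

```python
from collections import Counter
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two sparse term-weight dictionaries."""
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    norm_a = sqrt(sum(w * w for w in a.values()))
    norm_b = sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def knn_classify(paper, labelled, k=3):
    """Return the majority topic among the k labelled papers nearest to `paper`."""
    ranked = sorted(labelled, key=lambda ex: cosine(paper, ex[0]), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical training set: (term frequencies, topic label).
labelled = [
    ({"agent": 3, "negotiation": 2}, "Agents"),
    ({"agent": 2, "multi": 1}, "Agents"),
    ({"hypertext": 3, "link": 2}, "Hypermedia"),
    ({"link": 1, "navigation": 2}, "Hypermedia"),
]

topic = knn_classify({"agent": 1, "negotiation": 1}, labelled, k=3)
```

In a profiling setting, the predicted topic for each paper a user browses could then be accumulated into that user's interest profile.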

Foxtrot project : evolution of the Quickstep recommender system. Uses Pearson's r correlation to recommend and kNN classification to profile user interests. An ontological approach is taken to represent user profiles. This allows users to visualize and edit profiles, encouraging direct feedback on what the system thinks they are interested in. A year-long trial is in progress with hundreds of staff, postgraduates and undergraduates from the university to evaluate the utility of this approach.
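
To illustrate how Pearson's r can drive collaborative recommendation, the sketch below correlates users' topic-interest profiles and ranks the users most similar to a target user. It is a minimal example under stated assumptions, not Foxtrot's implementation: the profile structure (topic name mapped to an interest score), the user names, the scores and the function names are all hypothetical.

```python
from math import sqrt


def pearson_r(a, b):
    """Pearson correlation over the topics that both profiles have scored."""
    shared = sorted(set(a) & set(b))
    n = len(shared)
    if n < 2:
        return 0.0  # too little overlap to correlate
    xs = [a[t] for t in shared]
    ys = [b[t] for t in shared]
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat profile has no defined correlation
    return cov / sqrt(var_x * var_y)


def most_similar_users(target, profiles, k=2):
    """Rank the other users by correlation with the target user's profile."""
    scores = [
        (pearson_r(profiles[target], profile), user)
        for user, profile in profiles.items()
        if user != target
    ]
    scores.sort(reverse=True)
    return [(user, r) for r, user in scores[:k]]


# Hypothetical interest profiles: topic -> interest score.
profiles = {
    "alice": {"agents": 5.0, "hypermedia": 1.0, "recommenders": 4.0},
    "bob": {"agents": 4.0, "hypermedia": 2.0, "recommenders": 5.0},
    "carol": {"agents": 1.0, "hypermedia": 5.0, "recommenders": 2.0},
}

neighbours = most_similar_users("alice", profiles, k=1)
```

Papers rated highly by the top-ranked neighbours, and not yet seen by the target user, would then be candidates for recommendation.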

See the publications link for details of the above work.

IT Innovation Centre
Electronics and Computer Science