An object-based convolutional neural network (OCNN) for urban land use classification

Ce Zhang, Isabel Sargent, Xin Pan, Huapeng Li, A. Gardiner, Jonathon Hare, Peter M. Atkinson
Article

Abstract

Urban land use information is essential for a variety of urban-related applications such as urban planning and regional administration. The extraction of urban land use from very fine spatial resolution (VFSR) remotely sensed imagery has, therefore, drawn much attention in the remote sensing community. Nevertheless, classifying urban land use from VFSR images remains a challenging task, due to the extreme difficulties in differentiating complex spatial patterns to derive high-level semantic labels. Deep convolutional neural networks (CNNs) offer great potential to extract high-level spatial features, thanks to their hierarchical nature with multiple levels of abstraction. However, blurred object boundaries and geometric distortion, as well as huge computational redundancy, severely restrict the potential application of CNNs for the classification of urban land use. In this paper, a novel object-based convolutional neural network (OCNN) is proposed for urban land use classification using VFSR images. Rather than pixel-wise convolutional processes, the OCNN relies on segmented objects as its functional units, and CNN networks are used to analyse and label objects so as to partition within-object and between-object variation. Two CNN networks with different model structures and window sizes are developed to predict linearly shaped objects (e.g. Highway, Canal) and general (other non-linearly shaped) objects. Then a rule-based decision fusion is performed to integrate the class-specific classification results. The effectiveness of the proposed OCNN method was tested on aerial photography of two large urban scenes in Southampton and Manchester in Great Britain. The OCNN combined with large and small window sizes achieved excellent classification accuracy and computational efficiency, consistently outperforming its sub-modules, as well as other benchmark comparators, including the pixel-wise CNN, contextual-based MRF and object-based OBIA-SVM methods. The proposed method provides the first object-based CNN framework to effectively and efficiently address the complicated problem of urban land use classification from VFSR images.
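A minimal sketch of how per-object predictions from the two sub-models might be fused. The object measures, threshold and class list here are assumptions for illustration, not the rules used in the paper:

    # Hypothetical per-object fusion of the two CNN sub-models (illustration only).
    LINEAR_CLASSES = {"Highway", "Canal"}   # example linear classes from the abstract

    def elongation(obj):
        # Ratio of major to minor axis of the object's fitted ellipse; large
        # values suggest a linearly shaped object.
        return obj["major_axis"] / max(obj["minor_axis"], 1e-6)

    def fuse_object_label(obj, pred_large_window, pred_small_window,
                          elongation_threshold=3.0):
        # Each prediction is a (class label, softmax confidence) pair.
        label_l, conf_l = pred_large_window
        label_s, conf_s = pred_small_window
        if elongation(obj) >= elongation_threshold and label_l in LINEAR_CLASSES:
            return label_l          # trust the large-window, linear-object CNN
        return label_s              # otherwise use the general-object CNN

    obj = {"major_axis": 120.0, "minor_axis": 15.0}
    print(fuse_object_label(obj, ("Highway", 0.91), ("Residential", 0.55)))  # Highway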

Neural Wikipedian: generating textual summaries from knowledge base triples

Pavlos Vougiouklis, Hady Elsahar, Lucie-Aimée Kaffee, Christophe Gravier, Frederique Laforest, Jonathon Hare, Elena Simperl
Article

Abstract

Most people need textual or visual interfaces in order to make sense of Semantic Web data. In this paper, we investigate the problem of generating natural language summaries for Semantic Web data using neural networks. Our end-to-end trainable architecture encodes the information from a set of triples into a vector of fixed dimensionality and generates a textual summary by conditioning the output on the encoded vector. We explore a set of different approaches that enable our models to verbalise entities from the input set of triples in the generated text. Our systems are trained and evaluated on two corpora of loosely aligned Wikipedia snippets with triples from DBpedia and Wikidata, with promising results.
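A toy PyTorch sketch of the general encode-then-decode idea: the (subject, predicate, object) triples are embedded and averaged into a fixed-length vector that conditions a GRU decoder over summary tokens. The vocabulary sizes, pooling and layer sizes are made up for illustration, and this is not the architecture evaluated in the paper:

    import torch
    import torch.nn as nn

    class TripleSummariser(nn.Module):
        # Toy model: embed each triple element, average over the triple set,
        # and use the result as the initial hidden state of a GRU decoder.
        def __init__(self, kb_vocab=1000, txt_vocab=5000, dim=128):
            super().__init__()
            self.kb_embed = nn.Embedding(kb_vocab, dim)
            self.txt_embed = nn.Embedding(txt_vocab, dim)
            self.decoder = nn.GRU(dim, dim, batch_first=True)
            self.out = nn.Linear(dim, txt_vocab)

        def forward(self, triples, summary_tokens):
            # triples: (batch, n_triples, 3) KB ids; summary_tokens: (batch, length)
            enc = self.kb_embed(triples).mean(dim=(1, 2))    # (batch, dim)
            dec_in = self.txt_embed(summary_tokens)          # (batch, length, dim)
            dec_out, _ = self.decoder(dec_in, enc.unsqueeze(0))
            return self.out(dec_out)                         # per-token vocabulary logits

    model = TripleSummariser()
    logits = model(torch.randint(0, 1000, (2, 4, 3)),
                   torch.randint(0, 5000, (2, 10)))
    print(logits.shape)  # torch.Size([2, 10, 5000])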

Detecting heel strikes for gait analysis through acceleration flow

Yan Sun, Jonathon Hare, Mark Nixon
Article

Abstract

In some forms of gait analysis it is important to be able to capture when the heel strikes occur. In addition, in terms of video analysis of gait, it is important to be able to localise the heel where it strikes the floor. In this paper, a new motion descriptor, acceleration flow, is introduced for detecting heel strikes. The key frame of a heel strike can be determined by the quantity of acceleration flow within the Region of Interest (ROI), and the position of the strike can be found from the centre of rotation caused by radial acceleration. Our approach has been tested on a number of databases which were recorded indoors and outdoors with multiple views and walking directions, to evaluate the detection rate in various environments. Experiments show the ability of our approach for both temporal detection and spatial positioning. The immunity of this new approach to three anticipated types of noise in real CCTV footage is also evaluated in our experiments. Our acceleration flow detector is shown to be less sensitive to Gaussian white noise, whilst being effective on low-resolution images and with incomplete body position information, when compared to other techniques.

VPRS-based regional decision fusion of CNN and MRF classifications for very fine resolution remotely sensed images

Ce Zhang, Isabel Sargent, Xin Pan, Andy Gardiner, Jonathon Hare, Peter M. Atkinson
Article

Abstract

Recent advances in computer vision and pattern recognition have demonstrated the superiority of deep neural networks using spatial feature representation, such as convolutional neural networks (CNN), for image classification. However, any classifier, regardless of its model structure (deep or shallow), involves prediction uncertainty when classifying spatially and spectrally complicated very fine spatial resolution (VFSR) imagery. We propose here to characterise the uncertainty distribution of CNN classification and integrate it into a regional decision fusion to increase classification accuracy. Specifically, a variable precision rough set (VPRS) model is proposed to quantify the uncertainty within CNN classifications of VFSR imagery, and partition this uncertainty into positive regions (correct classifications) and non-positive regions (uncertain or incorrect classifications). Within those “more correct” areas the CNN classification was trusted, whereas the uncertain areas were rectified by a Multi-Layer Perceptron (MLP)-based Markov random field (MLP-MRF) classifier to provide crisp and accurate boundary delineation. The proposed MRF-CNN fusion decision strategy exploited the complementary characteristics of the two classifiers based on VPRS uncertainty description and classification integration. The effectiveness of the MRF-CNN method was tested in both urban and rural areas of southern England as well as on Semantic Labelling datasets. The MRF-CNN consistently outperformed the benchmark MLP, SVM, MLP-MRF and CNN baseline methods. This research provides a regional decision fusion framework within which to gain the advantages of model-based CNN, while overcoming the problem of losing effective resolution and uncertain prediction at object boundaries, which is especially pertinent for complex VFSR image classification.

How biased is your NLG evaluation?

Pavlos Vougiouklis, Eddy Maddalena, Jonathon Hare, Elena Simperl
Conference or Workshop Item

Abstract

Human assessments by either experts or crowdworkers are used extensively for the evaluation of systems employed on a variety of text generative tasks. In this paper, we focus on the human evaluation of textual summaries from knowledge base triple-facts. More specifically, we investigate possible similarities between the evaluation that is performed by experts and crowdworkers. We generate a set of summaries from DBpedia triples using a state-of-the-art neural network architecture. These summaries are evaluated against a set of criteria by both experts and crowdworkers. Our results highlight significant differences between the scores that are provided by the two groups.

Mind the (language) gap: generation of multilingual Wikipedia summaries from Wikidata for ArticlePlaceholders

Lucie-Aimée Kaffee, Hady Elsahar, Pavlos Vougiouklis, Christophe Gravier, Frederique Laforest, Jonathon Hare, Elena Simperl
Conference or Workshop Item

Abstract

While Wikipedia exists in 287 languages, its content is unevenly distributed among them. It is therefore of utmost social and cultural importance to focus efforts on languages whose speakers only have access to limited Wikipedia content. We investigate supporting communities by generating summaries for Wikipedia articles in underserved languages, given structured data as an input.

We focus on an important support for such summaries: ArticlePlaceholders, dynamically generated content pages in underserved Wikipedias. They enable native speakers to access existing information in Wikidata. To extend those ArticlePlaceholders, we provide a system which processes the triples of the KB as they are provided by the ArticlePlaceholder and generates a comprehensible textual summary. This data-driven approach is employed with the goal of understanding how well it matches the communities' needs in two underserved languages on the Web: Arabic, a language with a large community but disproportionately limited access to knowledge online, and Esperanto, an easily acquired, artificial language whose Wikipedia content is maintained by a small but devoted community. With the help of the Arabic and Esperanto Wikipedians, we conduct a study which evaluates not only the quality of the generated text, but also the usefulness of our end-system to any underserved Wikipedia version.

Learning to count objects in natural images for visual question answering

Yan Zhang, Jonathon Hare, Adam Prügel-Bennett
Conference or Workshop Item

Abstract

Visual Question Answering (VQA) models have struggled with counting objects in natural images so far. We identify a fundamental problem due to soft attention in these models as a cause. To circumvent this problem, we propose a neural network component that allows robust counting from object proposals. Experiments on a toy task show the effectiveness of this component and we obtain state-of-the-art accuracy on the number category of the VQA v2 dataset without negatively affecting other categories, even outperforming ensemble models with our single model. On a difficult balanced pair metric, the component gives a substantial improvement in counting over a strong baseline by 6.6%.
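A small numpy illustration of the problem identified above (not the proposed counting component): normalised soft attention yields a weighted average of proposal features, so adding a second copy of the attended object barely changes the pooled feature, and hence any count read from it:

    import numpy as np

    def soft_attention_pool(features, scores):
        # Standard soft attention: normalise the scores and average the features.
        weights = np.exp(scores) / np.exp(scores).sum()
        return weights @ features

    cat = np.array([1.0, 0.0])      # feature of a "cat" proposal
    bg = np.array([0.0, 1.0])       # feature of a background proposal

    one_cat = soft_attention_pool(np.stack([cat, bg]), np.array([5.0, 0.0]))
    two_cats = soft_attention_pool(np.stack([cat, cat, bg]), np.array([5.0, 5.0, 0.0]))
    print(one_cat, two_cats)  # near-identical pooled features despite different counts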

Deep cascade learning

Enrique Salvador Marquez, Jonathon Hare, Mahesan Niranjan
Article

Abstract

In this paper, we propose a novel approach for efficient training of deep neural networks in a bottom-up fashion using a layered structure. Our algorithm, which we refer to as Deep Cascade Learning, is motivated by the Cascade Correlation approach of Fahlman, who introduced it in the context of perceptrons. We demonstrate our algorithm on networks of convolutional layers, though its applicability is more general. Such training of deep networks in a cascade directly circumvents the well-known vanishing gradient problem by ensuring that the output is always adjacent to the layer being trained. We present empirical evaluations comparing our deep cascade training with standard End-End training using back propagation, on two convolutional neural network architectures and benchmark image classification tasks (CIFAR-10 and CIFAR-100). We then investigate the features learned by the approach and find that better, domain-specific, representations are learned in early layers when compared to what is learned in End-End training. This is partially attributable to the vanishing gradient problem, which inhibits early layer filters from changing significantly from their initial settings. While both networks perform similarly overall, in cascade training recognition accuracy increases progressively with each added layer, with discriminative features learnt at every stage of the network, whereas in End-End training no such systematic feature representation was observed. We also show that such cascade training has significant computational and memory advantages over End-End training, and can be used as a pre-training algorithm to obtain better performance.
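A condensed PyTorch-style sketch of the layer-by-layer idea, under assumptions about the auxiliary heads and optimiser that are not taken from the paper: each convolutional block is trained with its own temporary classifier head while previously trained blocks are frozen, so the supervision signal stays adjacent to the layer being learned:

    import torch
    import torch.nn as nn

    def cascade_train(conv_blocks, num_classes, loader, epochs_per_block=2):
        # Train convolutional blocks one at a time, each with its own temporary
        # classifier head, freezing each block before the next one is appended.
        frozen = nn.Sequential()
        for block in conv_blocks:
            with torch.no_grad():                      # size the head for this block
                feat_dim = block(frozen(next(iter(loader))[0])).shape[1]
            head = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                                 nn.Linear(feat_dim, num_classes))
            optimiser = torch.optim.Adam(list(block.parameters()) + list(head.parameters()))
            for _ in range(epochs_per_block):
                for x, y in loader:
                    with torch.no_grad():
                        x = frozen(x)                  # earlier blocks stay fixed
                    loss = nn.functional.cross_entropy(head(block(x)), y)
                    optimiser.zero_grad()
                    loss.backward()
                    optimiser.step()
            for p in block.parameters():
                p.requires_grad_(False)
            frozen = nn.Sequential(*frozen, block)
        return frozen

    blocks = [nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU()),
              nn.Sequential(nn.Conv2d(16, 32, 3, padding=1), nn.ReLU())]
    fake_loader = [(torch.randn(8, 3, 32, 32), torch.randint(0, 10, (8,)))]
    trained = cascade_train(blocks, num_classes=10, loader=fake_loader)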

Semantic face signatures: recognizing and retrieving faces by verbal descriptions

Nawaf Yousef Almudhahka, Mark Nixon, Jonathon Hare
Article

Abstract

The adverse visual conditions of surveillance environments, and the need to identify humans at a distance, have stimulated research in soft biometric attributes. These attributes can be used to describe a human's physical traits semantically and can be acquired without their cooperation. Soft biometrics can also be employed to retrieve identity from a database using verbal descriptions for suspects. In this paper, we explore unconstrained human face identification with semantic face attributes derived automatically from images. The process uses a deformable face model with keypoint localisation which is aligned with attributes derived by semantic description. Our new framework exploits the semantic feature space to infer face signatures from images, and bridges the semantic gap between humans and machines with respect to face attributes. We use an unconstrained dataset, LFW-MS4, consisting of all the subjects from View-1 of the LFW database that have 4 or more samples. Our new approach demonstrates that retrieval via estimated comparative facial soft biometrics yields a match in the top 10.23% of returned subjects. Furthermore, modelling of face image features in the semantic space can achieve an equal error rate of 12.71%. These results reveal the latent benefits of modelling visual facial features in a semantic space. Moreover, they highlight the potential of using images and verbal descriptions to generate comparative soft biometrics for subject identification and retrieval.

Learning to generate Wikipedia summaries for underserved languages from Wikidata

Lucie-Aimée Kaffee, Hady Elsahar, Pavlos Vougiouklis, Christophe Gravier, Frederique Laforest, Jonathon Hare, Elena Simperl
Conference or Workshop Item

Abstract

While Wikipedia exists in 287 languages, its content is unevenly distributed among them. In this work, we investigate the generation of open domain Wikipedia summaries in underserved languages using structured data from Wikidata. To this end, we propose a neural network architecture equipped with copy actions that learns to generate single-sentence and comprehensible textual summaries from Wikidata triples. We demonstrate the effectiveness of the proposed approach by evaluating it against a set of baselines on two languages of different natures: Arabic, a morphologically rich language with a larger vocabulary than English, and Esperanto, a constructed language known for its easy acquisition.

T-REx: A large scale alignment of natural language with knowledge base triples

Hady Elsahar, Pavlos Vougiouklis, Arslen Remaci, Christophe Gravier, Jonathon Hare, Elena Simperl, Frederique Laforest
Conference or Workshop Item

Abstract

Alignments between natural language and Knowledge Base (KB) triples are an essential prerequisite for training machine learning approaches employed in a variety of Natural Language Processing problems. These include Relation Extraction, KB Population, Question Answering and Natural Language Generation from KB triples. Available datasets that provide those alignments are plagued by significant shortcomings – they are of limited size, they exhibit a restricted predicate coverage, and/or they are of unreported quality. To alleviate these shortcomings, we present T-REx, a dataset of large scale alignments between Wikipedia abstracts and Wikidata triples. T-REx consists of 11 million triples aligned with 3.09 million Wikipedia abstracts (6.2 million sentences). T-REx is two orders of magnitude larger than the largest available alignments dataset and covers 2.5 times more predicates. Additionally, we stress the quality of this language resource thanks to an extensive crowdsourcing evaluation. T-REx is publicly available at: https://w3id.org/t-rex.

Tackling the small data problem in deep learning with multi-sensor approaches

Iris Caroline Kramer, Jonathon Hare, Adam Prugel-Bennett
Conference or Workshop Item

Abstract

Within data science, many problems are solved using machine learning. Recently, with the introduction of deep learning, we see this trend spreading across industries, of which archaeological object detection on remote sensor data is a case in point. From the known case studies, we have identified the main issues and developed improvements accordingly.

The main issue with archaeological datasets is that there are only a limited number of known sites, which makes the networks prone to overfit. Overfitting happens when a network is trained on too few examples and learns patterns that do not generalize well to new data. To an extent, data augmentation can be used to prevent overfitting; however, the training images would still be highly correlated. Therefore, it is argued that the most effect can be gained by limiting storage of irrelevant features in networks. This can be done by optimising network architectures and additionally by using transfer learning, in which pre-trained networks are used to initialise training. Even when pre-trained on datasets without archaeological sites, such a network can still be useful for its low-level features (including lines and edges). A downside of pre-trained networks is that they can only work with data in the same format as they were trained with.

Our main contribution is the research into including multi-sensor data. We will present approaches to training networks using images with stacks of data, applying fusion networks, and generating pre-trained networks for the available data of different sensors.

Automated detection of archaeology in the New Forest using deep learning with remote sensor data

Iris Caroline Kramer, Jonathon Hare, Adam Prugel-Bennett, Isabel Sargent
Conference or Workshop Item

Abstract

As a result of the New Forest Knowledge project, many new sites were discovered. This was partly due to the LiDAR survey undertaken, which was followed by an intensive manual process to interpret the results. The research presented in this paper looks at methods to automate this process, especially for round barrow detection, using deep learning.

Traditionally, automated methods require manual feature engineering to extract the visual appearance of a site on remote sensing data. Whereas this approach is difficult, expensive and bound to detect a single type of site, recent developments have moved towards automated feature learning, of which deep learning is the most notable. In our approach, we use known site locations together with LiDAR data and aerial images to train Convolutional Neural Networks (CNNs). Such a network is typically constructed of many layers, each representing different filters (e.g. to detect lines or edges). When the network is trained, each new site location that is fed to it will update the weights of features to better represent the appearance of sites in the remote sensing data. For this learning process, an accurate dataset with a large number of examples is required, and therefore the New Forest is a very suitable case study, especially thanks to the extensive research of the New Forest Knowledge project.

In this paper, our latest results will be presented together with a future perspective on how we can scale our approach to a country-wide detection method as computing power becomes even more efficient.

Analysing acceleration for motion analysis

Yan Sun, Jonathon Hare, Mark Nixon
Conference or Workshop Item

Abstract

Previous research in motion analysis of image sequences has generally not considered the basic nature of higher orders of motion such as acceleration. In this work, we disambiguate different types of motion, and in particular focus on acceleration. First, we show acceleration can be computed in a principled manner by extending Horn and Schunck’s algorithm for global optical flow estimation. We then demonstrate an approximation of the acceleration field using an alternative established optical flow technique, since most real motions violate the global smoothness assumption of Horn and Schunck. Furthermore, we decompose acceleration into radial and tangential components for greater depth of understanding of the motion. As a general motion descriptor, we show how acceleration provides the capability for differentiating different types of motion in video sequences.
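A small numpy sketch of the quantities involved, under the simplifying assumption that acceleration is approximated by the frame-to-frame difference of dense optical flow at the same pixel (a principled estimate would follow the flow, as in the extension of Horn and Schunck described above); the decomposition projects acceleration onto directions parallel (tangential) and perpendicular (radial) to the velocity:

    import numpy as np

    def acceleration_field(flow_t, flow_t1):
        # Crude finite-difference approximation of the acceleration field.
        return flow_t1 - flow_t

    def decompose(acc, vel, eps=1e-8):
        # Split acceleration into tangential (along velocity) and radial
        # (perpendicular to velocity) components at every pixel.
        speed = np.linalg.norm(vel, axis=-1, keepdims=True)
        unit_v = vel / np.maximum(speed, eps)
        tangential = np.sum(acc * unit_v, axis=-1, keepdims=True) * unit_v
        radial = acc - tangential
        return tangential, radial

    h, w = 4, 4
    flow_t = np.random.randn(h, w, 2)
    flow_t1 = np.random.randn(h, w, 2)
    acc = acceleration_field(flow_t, flow_t1)
    tan, rad = decompose(acc, flow_t)
    print(np.allclose(tan + rad, acc))  # True: the two components sum to the acceleration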

Inference and discovery in remote sensing data with features extracted using deep networks

Isabel Sargent, Jonathon Hare, David Young, Olivia Wilson, Charis Doidge, David Holland, Peter M. Atkinson
Conference or Workshop Item

Abstract

We aim to develop a process by which we can extract generic features from aerial image data that can both be used to infer the presence of objects and characteristics and to discover new ways of representing the landscape. We investigate the fine-tuning of a 50-layer ResNet deep convolutional neural network that was pre-trained with ImageNet data, and extract features at several layers throughout both the pre-trained and the fine-tuned networks. These features were applied to several supervised classification problems, obtaining a significant correlation between classification accuracy and layer number. Visualising the activation of the networks' nodes showed that fine-tuning had not achieved coherent representations at later layers. We conclude that we need to train with considerably more varied data but that, even without fine-tuning, features derived from a deep network can produce better classification results than image data alone.
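A compressed sketch of the general pattern of extracting features from an intermediate layer of an ImageNet-pretrained ResNet-50 and feeding them to a supervised classifier; torchvision and scikit-learn are assumed, the pretrained weights download on first use, and the random tensors stand in for aerial image patches and labels. This is not the experimental setup of the paper:

    import torch
    import torchvision.models as models
    from sklearn.linear_model import LogisticRegression

    # Hook an intermediate layer of an ImageNet-pretrained ResNet-50 and use the
    # pooled activations as features for a simple supervised classifier.
    resnet = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1).eval()
    features = {}
    resnet.layer3.register_forward_hook(
        lambda module, inputs, output: features.update(layer3=output))

    def extract(batch):
        with torch.no_grad():
            resnet(batch)
        return features["layer3"].mean(dim=(2, 3)).numpy()   # global average pooling

    patches = torch.randn(16, 3, 224, 224)        # stand-in for aerial image patches
    labels = torch.randint(0, 2, (16,)).numpy()   # stand-in class labels
    clf = LogisticRegression(max_iter=1000).fit(extract(patches), labels)
    print(clf.score(extract(patches), labels))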

Automation on steroids: an exploration of why deep learning is dominating automation

Iris Caroline Kramer, Jonathon Hare, Adam Prugel-Bennett
Conference or Workshop Item

Abstract

Traditionally, research initiatives into automated detection of archaeological objects were focussed on feature engineering to detect individual object types. These methods have been criticised for their lack of accuracy, which is mostly caused by their inability to capture the variability within an object type and the objects' appearance across different land cover types.

Recently, rather than further optimizing features, research has shifted towards feature learning, which offers more flexibility. This shift was triggered by the overwhelming successes of deep learning (shown for e.g. self-driving cars and medical imagery). A deep convolutional neural network is built up of many layers and learns features from images of known objects which are fed to the network. In the early layers of a network only basic abstractions such as lines and edges are learned, and as the deeper layers are reached the features get more refined and are able to extract the key characteristics of the object type. This process is very similar to how a human learns, although there are some important advantages to the structure of deep networks. For example, they can be designed to incorporate different types of remote sensor data and can hence internally compare this variety of data. In this manner a network will quickly identify obvious false positives and adapt the weights of the layers accordingly. Another important point is that a network can fully appreciate the small variation of pixel values without any image enhancements. For LiDAR data this effect can be demonstrated with a network that identifies a slope in the first layers of the network and later on learns that the slope direction and local relief are important features for a specific object type.

The above-listed approaches just scratch the surface of the wide range of possible methods for using deep learning in aerial archaeology. In the end, the shift in research is mainly driven by the far-future concept of a national model which automatically retrains with newly acquired remote sensing data to allow for new discoveries that can further improve the networks.

A future perspective for automated detection of archaeology using deep learning with remote sensor data

Iris Caroline Kramer, Jonathon Hare, Adam Prugel-Bennett
Conference or Workshop Item

Abstract

An essential aspect of archaeology is the protection of sites from looters, extensive agriculture and erosion. Under this constant threat of destruction, it is of utmost importance that sites are located so that they can be monitored and protected. This is mostly done on the ground or by using remote sensing data such as aerial images or LiDAR derived elevation models. This task is time consuming and requires highly specialised and experienced people and would thus immensely benefit from automation. Within this novel research, the potential of deep learning for the detection of archaeological sites is being assessed.

A hybrid MLP-CNN classifier for very fine resolution remotely sensed image classification

Ce Zhang, Xin Pan, Huapeng Li, Andy Gardiner, Isabel Sargent, Jonathon Hare, Peter M. Atkinson
Article

Abstract

The contextual-based convolutional neural network (CNN) with deep architecture and the pixel-based multilayer perceptron (MLP) with shallow structure are well-recognized neural network algorithms, representing the state-of-the-art deep learning method and the classical non-parametric machine learning approach, respectively. The two algorithms, which have very different behaviours, were integrated in a concise and effective way using a rule-based decision fusion approach for the classification of very fine spatial resolution (VFSR) remotely sensed imagery. The decision fusion rules, designed primarily based on the classification confidence of the CNN, reflect the generally complementary patterns of the individual classifiers. In consequence, the proposed ensemble classifier MLP-CNN harvests the complementary results acquired from the CNN based on deep spatial feature representation and from the MLP based on spectral discrimination. Meanwhile, limitations of the CNN due to the adoption of convolutional filters, such as the uncertainty in object boundary partition and loss of useful fine spatial resolution detail, were compensated for. The effectiveness of the ensemble MLP-CNN classifier was tested in both urban and rural areas using aerial photography together with an additional satellite sensor dataset. The MLP-CNN classifier achieved promising performance, consistently outperforming the pixel-based MLP, spectral and textural-based MLP, and the contextual-based CNN in terms of classification accuracy. This research paves the way to effectively address the complicated problem of VFSR image classification.
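An illustrative per-pixel fusion rule; the confidence threshold and the specific rule are assumptions rather than the decision fusion rules designed in the paper. Where the CNN is confident its label is kept, otherwise the pixel falls back to the spectrally driven MLP:

    import numpy as np

    def fuse_mlp_cnn(cnn_probs, mlp_probs, confidence_threshold=0.8):
        # cnn_probs, mlp_probs: (height, width, n_classes) class-membership maps.
        cnn_conf = cnn_probs.max(axis=-1)
        cnn_label = cnn_probs.argmax(axis=-1)
        mlp_label = mlp_probs.argmax(axis=-1)
        return np.where(cnn_conf >= confidence_threshold, cnn_label, mlp_label)

    cnn_probs = np.random.dirichlet(np.ones(5), size=(64, 64))
    mlp_probs = np.random.dirichlet(np.ones(5), size=(64, 64))
    fused = fuse_mlp_cnn(cnn_probs, mlp_probs)
    print(fused.shape)  # (64, 64) fused label map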

What do Wikidata and Wikipedia have in common? An analysis of their use of external references

Alessandro Piscopo, Pavlos Vougiouklis, Lucie-Aimée Frimelle Kaffee, Christopher Phethean, Jonathon Hare, Elena Simperl
Conference or Workshop Item

Abstract

Wikidata is a community-driven knowledge graph, strongly linked to Wikipedia. However, the connection between the two projects has only sporadically been explored. We investigated the relationship between the two projects in terms of the information they contain by looking at their external references. Our findings show that while only a small number of sources are directly reused across Wikidata and Wikipedia, references often point to the same domain. Furthermore, Wikidata appears to use fewer Anglo-American-centred sources. These results deserve further in-depth investigation.

Automatic semantic face recognition

Nawaf Yousef Almudhahka, Mark Nixon, Jonathon Hare
Conference or Workshop Item

Abstract

Recent expansion in surveillance systems has motivated research in soft biometrics that enable the unconstrained recognition of human faces. Comparative soft biometrics show superior recognition performance to categorical soft biometrics and have been the focus of several studies which have highlighted their ability for recognition and retrieval in constrained and unconstrained environments. These studies, however, only addressed face recognition for retrieval using human-generated attributes, posing a question about the feasibility of automatically generating comparative labels from facial images. In this paper, we propose an approach for the automatic comparative labelling of facial soft biometrics. Furthermore, we investigate unconstrained human face recognition using these comparative soft biometrics in a human-labelled gallery (and vice versa). Using a subset of the LFW dataset, our experiments show the efficacy of the automatic generation of comparative facial labels, highlighting the potential extensibility of the approach to other face recognition scenarios and larger ranges of attributes.

A neural network approach for knowledge-driven response generation

Pavlos Vougiouklis, Jonathon Hare, Elena Simperl
Conference or Workshop Item

Unconstrained human identification using comparative facial soft biometrics

Nawaf Y. Almudhahka, Mark S. Nixon, Jonathon S. Hare
Conference or Workshop Item

Abstract

Soft biometrics are attracting a lot of interest with the spread of surveillance systems, and the need to identify humans at a distance and under adverse visual conditions. Comparative soft biometrics have shown a significantly better impact on identification performance compared to traditional categorical soft biometrics. However, existing work that has studied comparative soft biometrics was based on small datasets with samples taken under constrained visual conditions. In this paper, we investigate human identification using comparative facial soft biometrics on a larger and more realistic scale using 4038 subjects from the View 1 subset of the LFW database. Furthermore, we introduce a new set of comparative facial soft biometrics and investigate the effect of these on identification and verification performance. Our experiments show that by using only 24 features and 10 comparisons, a rank-10 identification rate of 96.98% and a verification accuracy of 93.66% can be achieved.

Aligning texts and knowledge bases with semantic sentence simplification

Yassine Mrabet, Pavlos Vougiouklis, Halil Kilicoglu, Claire Gardent, Dina Demner-Fushman, Jonathon Hare, Elena Simperl
Conference or Workshop Item

Abstract

Finding the natural language equivalent of structured data is both a challenging and promising task. In particular, an efficient alignment of knowledge bases with texts would benefit many applications, including natural language generation, information retrieval and text simplification. In this paper, we present an approach to build a dataset of triples aligned with equivalent sentences written in natural language. Our approach consists of three main steps. First, target sentences are annotated automatically with knowledge base (KB) concepts and instances. The triples linking these elements in the KB are extracted as candidate facts to be aligned with the annotated sentence. Second, we use textual mentions referring to the subject and object of these facts to semantically simplify the target sentence via crowdsourcing. Third, the sentences provided by different contributors are post-processed to keep only the most relevant simplifications for the alignment with KB facts. We present different filtering methods, and share the constructed datasets in the public domain. These datasets contain 1050 sentences aligned with 1885 triples. They can be used to train natural language generators as well as semantic or contextual text simplifiers.

Erica the Rhino: a case study in using Raspberry Pi Single Board Computers for interactive art

Philip Basford, Graeme Bragg, Jonathon Hare, Mike Jewell, Kirk Martinez, David Newman, Reena Pau, Ash Smith, Tyler Ward
Article

Abstract

Erica the Rhino is an interactive art exhibit created by the University of Southampton, UK. Erica was created as part of a city-wide art trail in 2013 called "Go! Rhinos", curated by Marwell Wildlife, to raise awareness of rhino conservation. Erica arrived as a white fibreglass shell which was then painted and equipped with 5 Raspberry Pi Single Board Computers (SBCs). These computers allowed the audience to interact with Erica through a range of sensors and actuators. In particular, the audience could feed and stroke her to prompt reactions, as well as send her Tweets to change her behaviour. Pi SBCs were chosen because of their ready availability and their educational pedigree. During the deployment, 'coding clubs' were run in the shopping centre where Erica was located; these allowed children to experiment with and program the same components used in Erica. The experience gained through numerous deployments around the country has enabled Erica to be upgraded to increase reliability and ease of maintenance, whilst the release of the Pi 2 has allowed her responsiveness to be improved.

Human face identification via comparative soft biometrics

Nawaf Almudhahka, Mark Nixon, Jonathon Hare
Conference or Workshop Item

Detection of Social Events in Streams of Social Multimedia

Jonathon Hare, Sina Samangooei, Mahesan Niranjan, Nicholas Gibbins
Article

Abstract

Combining items from social media streams, such as Flickr photos and Twitter tweets, into meaningful groups can help users contextualise and more effectively consume the torrents of information continuously being made available on the social web. This task is made challenging due to the scale of the streams and the inherently multimodal nature of the information being contextualised.

The problem of grouping social media items into meaningful groups can be seen as an ill-posed and application specific unsupervised clustering problem. A fundamental question in multimodal contexts is determining which features best signify that two items should belong to the same grouping.

This paper presents a methodology which approaches social event detection as a streaming multi-modal clustering task. The methodology takes advantage of the temporal nature of social events and as a side benefit, allows for scaling to real-world datasets. Specific challenges of the social event detection task are addressed: the engineering and selection of the features used to compare items to one another; a feature fusion strategy that incorporates relative importance of features; the construction of a single sparse affinity matrix; and clustering techniques which produce meaningful item groups whilst scaling to cluster very large numbers of items.

The state-of-the-art approach presented here is evaluated using the ReSEED dataset with standardised evaluation measures. With automatically learned feature weights, we achieve an F1 score of 0.94, showing that a good compromise between precision and recall of clusters can be achieved. In a comparison with other state-of-the-art algorithms our approach is shown to give the best results.
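A toy sketch of the affinity-and-cluster idea using numpy and scikit-learn; the two features (posting time and a text vector), the weights, the sparsification threshold and the use of DBSCAN are illustrative assumptions, not the streaming algorithm described above:

    import numpy as np
    from sklearn.cluster import DBSCAN

    # Toy items described by posting time and a small text vector.
    times = np.array([0.0, 5.0, 7.0, 500.0, 505.0])          # seconds
    texts = np.array([[1, 0], [1, 0], [1, 1], [0, 1], [0, 1]], dtype=float)

    time_sim = np.exp(-np.abs(times[:, None] - times[None, :]) / 60.0)
    norms = np.linalg.norm(texts, axis=1)
    text_sim = (texts @ texts.T) / np.maximum(norms[:, None] * norms[None, :], 1e-9)

    affinity = 0.6 * time_sim + 0.4 * text_sim                # weighted feature fusion
    affinity[affinity < 0.3] = 0.0                            # sparsify weak links

    distance = 1.0 - affinity
    labels = DBSCAN(eps=0.5, min_samples=2, metric="precomputed").fit_predict(distance)
    print(labels)   # items posted close in time with similar text share a cluster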

Entity-based Opinion Mining from Text and Multimedia

Diana Maynard, Jonathon Hare
Book Section

Getting by with a little help from the crowd: optimal human computation approaches to social image labeling

Babak Loni, Jonathon Hare, Mihai Georgescu, Michael Riegler, Mohamed Morchid, Richard Dufour, Martha Larson
Conference or Workshop Item

Abstract

Validating user tags helps to refine them, making them more useful for finding images. In the case of interpretation-sensitive tags, however, automatic (i.e., pixel-based) approaches cannot be expected to deliver optimal results. Instead, human input is key. This paper studies how crowdsourcing-based approaches to image tag validation can achieve parsimony in their use of human input from the crowd, in the form of votes collected from workers on a crowdsourcing platform. Experiments in the domain of social fashion images are carried out using the dataset published by the Crowdsourcing Task of the Mediaeval 2013 Multimedia Benchmark. Experimental results reveal that when a larger number of crowd-contributed votes are available, it is difficult to beat a majority vote. However, additional information sources, i.e., crowdworker history and visual image features, allow us to maintain similar validation performance while making use of less crowd-contributed input. Further, investing in "expensive" experts who collaborate to create definitions of interpretation-sensitive concepts does not necessarily pay off. Instead, experts can cause interpretations of concepts to drift away from conventional wisdom. In short, validation of interpretation-sensitive user tags for social images is possible, with "just a little help from the crowd."

NicePic! A system for extracting attractive photos from Flickr streams

Stefan Siersdorfer, Sergej Zerr, Jose San Pedro, Jonathon Hare
Conference or Workshop Item

Information extraction from multimedia web documents: an open-source platform and testbed

David Dupplaw, Michael Matthews, Richard Johansson, Giulia Boato, Andrea Costanzo, Marco Fontani, Enrico Minack, Elena Demidova, Roi Blanco, Thomas Griffiths, Paul H. Lewis, Jonathon Hare, Alessandro Moschitti
Article

Abstract

The LivingKnowledge project aimed to enhance the current state of the art in search, retrieval and knowledge management on the web by advancing the use of sentiment and opinion analysis within multimedia applications. To achieve this aim, a diverse set of novel and complementary analysis techniques have been integrated into a single, but extensible software platform on which such applications can be built. The platform combines state-of-the-art techniques for extracting facts, opinions and sentiment from multimedia documents, and unlike earlier platforms, it exploits both visual and textual techniques to support multimedia information retrieval. Foreseeing the usefulness of this software in the wider community, the platform has been made generally available as an open-source project. This paper describes the platform design, gives an overview of the analysis algorithms integrated into the system and describes two applications that utilise the system for multimedia information retrieval.

Placing Photos with a Multimodal Probability Density Function

Jonathon Hare, Jamie Davies, Sina Samangooei, Paul H. Lewis
Conference or Workshop Item

Abstract

Knowing the location where a photograph was taken provides us with data that could be useful in a wide spectrum of applications. With the advance of digital cameras, and with many users exchanging their digital cameras for GPS-enabled mobile phones, photographs annotated with geographical locations are becoming ever more present on photo-sharing websites such as Flickr. However there is still a mass of content that is not geotagged, meaning that algorithms for efficient and accurate geographical estimation of an image are needed. This paper presents a general model for effectively using both textual metadata and visual features of photos to automatically place them on a world map with state-of-the-art performance. In addition, we explore how information from user-modelling can be fused with our model, and investigate the effect such modelling has on performance.
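A toy numpy sketch of the general idea: evidence from each modality (e.g. each textual tag, plus a visual match) is turned into a density over a discretised world grid, the log densities are summed, and the most probable cell gives the estimate. Grid resolution, kernel bandwidth and the combination rule are assumptions for illustration:

    import numpy as np

    LAT_BINS, LON_BINS = 180, 360                 # 1-degree world grid

    def density_from_points(points, sigma=2.0):
        # Kernel-density-like surface over the grid from previously seen geotags.
        lat_grid, lon_grid = np.mgrid[-90:90:1.0, -180:180:1.0]
        density = np.zeros((LAT_BINS, LON_BINS))
        for lat, lon in points:
            density += np.exp(-((lat_grid - lat) ** 2 + (lon_grid - lon) ** 2)
                              / (2 * sigma ** 2))
        return density / density.sum()

    def estimate_location(per_modality_points):
        log_p = np.zeros((LAT_BINS, LON_BINS))
        for points in per_modality_points:        # e.g. one entry per tag + visual evidence
            log_p += np.log(density_from_points(points) + 1e-12)
        idx = np.unravel_index(np.argmax(log_p), log_p.shape)
        return idx[0] - 90, idx[1] - 180          # back to (lat, lon)

    tag_eiffel = [(48.86, 2.29), (48.85, 2.30)]   # geotags of photos sharing a tag
    visual_match = [(48.80, 2.35), (51.50, -0.12)]
    print(estimate_location([tag_eiffel, visual_match]))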

SemanticNews: Enriching publishing of news stories

Jonathon Hare, David Newman, Wim Peters, Mark Greenwood, Jana Eggink
Monograph

Abstract

A central goal for the EPSRC funded Semantic Media Network project is to support interesting collaboration opportunities between researchers in order to foster relationships and encourage working together (EPSRC priority 'Working Together'). SemanticNews was one of the four projects funded in the first round of Semantic Media Network mini-projects, and was a collaboration between the Universities of Southampton and Sheffield, together with the BBC.
The SemanticNews project aimed to promote people's comprehension and assimilation of news by augmenting broadcast news discussion and debate with information from the semantic web in the form of linked open data (LOD). The project has laid the foundations for a toolkit for (semi-)automatic provision of semantic analysis and contextualization of the discussion of current events, encompassing state of the art semantic web technologies including text mining, consolidation against Linked Open Data, and advanced visualisation.
SemanticNews was bootstrapped using episodes of the BBC Question Time programme that already had transcripts and manually curated metadata, which included a list of the topical questions being debated. This information was used to create a workflow that a) extracts relevant entities using established named entity recognition techniques to identify the types of information to contextualise for a news article; b) provides associations with concepts from LOD resources; and, c) visualises the context using information derived from the LOD cloud.
This document forms the final report of the SemanticNews project, and describes in detail the processes and techniques explored for the enrichment of Question Time episodes. The final section of the report discusses how this work could be expanded in the future, and also makes a few recommendations for additional data that could be captured during the production process that would make the automatic generation of the contextualisation easier.

Exploiting multimedia in creating and analysing multimedia Web archives

Jonathon Hare, David Dupplaw, Paul H. Lewis, Wendy Hall, Kirk Martinez
Article

Abstract

The data contained on the web and the social web are inherently multimedia and consist of a mixture of textual, visual and audio modalities. Community memories embodied on the web and social web contain a rich mixture of data from these modalities. In many ways, the web is the greatest resource ever created by human-kind. However, due to the dynamic and distributed nature of the web, its content changes, appears and disappears on a daily basis. Web archiving provides a way of capturing snapshots of (parts of) the web for preservation and future analysis. This paper provides an overview of techniques we have developed within the context of the EU funded ARCOMEM (ARchiving COmmunity MEMories) project to allow multimedia web content to be leveraged during the archival process and for post-archival analysis. Through a set of use cases, we explore several practical applications of multimedia analytics within the realm of web archiving, web archive analysis and multimedia data on the web in general.

Multimodal Sentiment Analysis of Social Media

Diana Maynard, David Dupplaw, Jonathon Hare
Conference or Workshop Item

Abstract

This paper describes the approach we take to the analysis of social media, combining opinion mining from text and multimedia (images, videos, etc), and centred on entity and event recognition. We examine a particular use case, which is to help archivists select material for inclusion in an archive of social media for preserving community memories, moving towards structured preservation around semantic categories. The textual approach we take is rule-based and builds on a number of sub-components, taking into account issues inherent in social media such as noisy ungrammatical text, use of swear words, sarcasm etc. The analysis of multimedia content complements this work in order to help resolve ambiguity and to provide further contextual information. We provide two main innovations in this work: first, the novel combination of text and multimedia opinion mining tools; and second, the adaptation of NLP tools for opinion mining specific to the problems of social media.

Experiments in Diversifying Flickr Result Sets

Neha Jain, Jonathon Hare, Sina Samangooei, John Preston, Jamie Davies, David Dupplaw, Paul H. Lewis
Conference or Workshop Item

Abstract

The 2013 MediaEval Retrieving Diverse Social Images Task looked at tackling the problem of search result diversification of Flickr result sets formed from queries about geographic places and landmarks. In this paper we describe our approach of using a min-max similarity diversifier coupled with pre-filters and a reranker. We also demonstrate a number of novel features for measuring similarity to use in the diversification step.

A Unified, Modular and Multimodal Approach to Search and Hyperlinking Video

John Preston, Jonathon Hare, Sina Samangooei, Jamie Davies, Neha Jain, David Dupplaw, Paul H. Lewis
Conference or Workshop Item

Abstract

This paper describes a modular architecture for searching and hyperlinking clips of TV programmes. The architecture aimed to unify the combination of features from different modalities through a common representation based on a set of probability density functions over the timeline of a programme. The core component of the system consisted of analysis of sections of transcripts based on a textual query. Results show that search is made worse by the addition of other components, whereas in hyperlinking precision is increased by the addition of visual features.
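A brief numpy sketch of the common representation described above, with made-up anchor times and weights: each modality contributes a density over the programme timeline (here a Gaussian bump per matching segment), the weighted densities are summed, and peaks suggest candidate clips:

    import numpy as np

    def modality_density(timeline, hits, sigma=15.0):
        # hits: list of (time in seconds, relevance score) from one modality.
        density = np.zeros_like(timeline)
        for t, score in hits:
            density += score * np.exp(-((timeline - t) ** 2) / (2 * sigma ** 2))
        return density

    timeline = np.arange(0, 3600.0)                       # one-hour programme
    transcript_hits = [(610, 1.0), (1805, 0.6)]           # textual query matches
    visual_hits = [(620, 0.4)]                            # visual concept matches
    combined = (0.8 * modality_density(timeline, transcript_hits)
                + 0.2 * modality_density(timeline, visual_hits))
    print(int(timeline[np.argmax(combined)]))             # peak time, around 610-620 s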

Identifying the Geographic Location of an Image with a Multimodal Probability Density Function

Jamie Davies, Jonathon Hare, Sina Samangooei, John Preston, Neha Jain, David Dupplaw, Paul H. Lewis
Conference or Workshop Item

Abstract

There is a wide array of online photographic content that is not geotagged. Algorithms for efficient and accurate geographical estimation of an image are needed to geolocate these photos. This paper presents a general model for using both textual metadata and visual features of photos to automatically place them on a world map.

Social Event Detection via sparse multi-modal feature selection and incremental density based clustering

Sina Samangooei, Jonathon Hare, David Dupplaw, Mahesan Niranjan, Nicholas Gibbins, Paul H. Lewis, Jamie Davies, Neha Jain, John Preston
Conference or Workshop Item

Abstract

Combining items from social media streams, such as Flickr photos and Twitter tweets, into meaningful groups can help users contextualise and effectively consume the torrents of information now made available on the social web. This task is made challenging due to the scale of the streams and the inherently multimodal nature of the information to be contextualised. We present a methodology which approaches social event detection as a multi-modal clustering task. We address the various challenges of this task: the selection of the features used to compare items to one another; the construction of a single sparse affinity matrix; combining the features; relative importance of features; and clustering techniques which produce meaningful item groups whilst scaling to cluster large numbers of items. In our best tested configuration we achieve an F1 score of 0.94, showing that a good compromise between precision and recall of clusters can be achieved using our technique.

An investigation of techniques that aim to improve the quality of labels provided by the crowd

Jonathon Hare, Maribel Acosta, Anna Weston, E. Simperl, Sina Samangooei, David Dupplaw, Paul H. Lewis
Conference or Workshop Item

Abstract

The 2013 MediaEval Crowdsourcing task looked at the problem of working with noisy crowdsourced annotations of image data. The aim of the task was to investigate possible techniques for estimating the true labels of an image by using the set of noisy crowdsourced labels, and possibly any content and metadata from the image itself. For the runs in this paper, we’ve applied a shotgun approach and tried a number of existing techniques, which include generative probabilistic models and further crowdsourcing.

The role of multimedia in archiving community memories

Jonathon S. Hare, David Dupplaw, Wendy Hall, Paul Lewis, Kirk Martinez
Conference or Workshop Item

Abstract

The data contained on the web and social web is inherently multimedia, consisting of a mix of textual, visual and audio modalities. Community memories embodied on the web and social web contain a rich mixture of data from these modalities. This paper explores some uses for the automatic analysis of multimedia data within the context of the archival and post-archival analysis of community memories on the web and social web.

OpenIMAJ – Intelligent Multimedia Analysis in Java

Jonathon Hare, Sina Samangooei, David Dupplaw
Article

Building a Multimedia Web Observatory Platform

Jonathon Hare, David Dupplaw, Wendy Hall, Paul H. Lewis, Kirk Martinez
Conference or Workshop Item

Abstract

The data contained within the web is inherently multimedia, consisting of a rich mix of textual, visual and audio modalities. Prospective Web Observatories need to take this into account from the ground up. This paper explores some uses for the automatic analysis of multimedia data within a Web Observatory, and describes a potential platform for an extensible and scalable multimedia Web Observatory.

The Southampton University Web Observatory

Wendy Hall, Thanassis Tiropanis, Ramine Tinati, Paul Booth, Paul Gaskell, Jonathon Hare, Les Carr
Conference or Workshop Item

Twitter's visual pulse

Jonathon Hare, Sina Samangooei, David Dupplaw, Paul H. Lewis
Conference or Workshop Item

Abstract

Millions of images are tweeted every day, yet very little research has looked at the non-textual aspect of social media communication. In this work we have developed a system to analyse streams of image data. In particular we explore trends in similar, related, evolving or even duplicated visual artefacts in the mass of tweeted image data — in short, we explore the visual pulse of Twitter.

Explicit diversification of image search

Jonathon Hare, Paul H. Lewis
Conference or Workshop Item

Abstract

Search result diversification can increase user satisfaction in answering a particular information need. There are many ways of diversifying search results. In some cases the user has a clear idea of how they would like to see their results diversified. This work presents a system that is capable of diversifying search results along specific user-specified axes of diversity.

Practical scalable image analysis and indexing using Hadoop

Jonathon S. Hare, Sina Samangooei, Paul H. Lewis
Article

Abstract

The ability to handle very large amounts of image data is important for image analysis, indexing and retrieval applications. Sadly, in the literature, scalability aspects are often ignored or glanced over, especially with respect to the intricacies of actual implementation details.

In this paper we present a case-study showing how a standard bag-of-visual-words image indexing pipeline can be scaled across a distributed cluster of machines. In order to achieve scalability, we investigate the optimal combination of hybridisations of the MapReduce distributed computational framework which allows the components of the analysis and indexing pipeline to be effectively mapped and run on modern server hardware. We then demonstrate the scalability of the approach practically with a set of image analysis and indexing tools built on top of the Apache Hadoop MapReduce framework. The tools used for our experiments are freely available as open-source software, and the paper fully describes the nuances of their implementation.
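A minimal, pure-Python sketch of the map and reduce steps for the inverted-index stage of such a pipeline. The quantiser output, image identifiers and posting format are placeholders, and the real tools run the equivalent jobs on Apache Hadoop rather than in-process:

    from collections import defaultdict

    def map_image(image_id, quantised_words):
        # One call per image: emit a (visual word, image id) pair per quantised feature.
        for word in quantised_words:
            yield word, image_id

    def reduce_postings(pairs):
        # Group pairs by visual word into an inverted index with term frequencies.
        index = defaultdict(lambda: defaultdict(int))
        for word, image_id in pairs:
            index[word][image_id] += 1
        return {word: sorted(postings.items()) for word, postings in index.items()}

    images = {"img1": [3, 3, 17, 42], "img2": [17, 99]}
    pairs = [pair for img, words in images.items() for pair in map_image(img, words)]
    print(reduce_postings(pairs))
    # {3: [('img1', 2)], 17: [('img1', 1), ('img2', 1)], 42: [('img1', 1)], 99: [('img2', 1)]}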

Semantically Tagging Images of Landmarks

Heather S. Packer, Jonathon S. Hare, Sina Samangooei, Paul Lewis
Conference or Workshop Item

PicAlert!: a system for privacy-aware image classification and retrieval

Sergej Zerr, Stefan Siersdorfer, Jonathon Hare
Conference or Workshop Item

Abstract

Photo publishing in Social Networks and other Web 2.0 applications has become very popular due to the pervasive availability of cheap digital cameras, powerful batch upload tools and a huge amount of storage space. A portion of uploaded images are of a highly sensitive nature, disclosing many details of the users' private life. We have developed a web service which can detect private images within a user's photo stream and provide support in making privacy decisions in the sharing context. In addition, we present a privacy-oriented image search application which automatically identifies potentially sensitive images in the result set and separates them from the remaining pictures.

Event Detection using Twitter and Structured Semantic Query Expansion

Heather S. Packer, Sina Samangooei, Jonathon S. Hare, Nicholas Gibbins, Paul Lewis
Conference or Workshop Item

Proceedings of the 1st International Workshop on Knowledge Extraction & Consolidation from Social Media (KECSM-2012), Boston, USA, November 12, 2012

Diana Maynard, Stefan Dietze, Wim Peters, Jonathon Hare
Book

I know what you did last summer! - privacy-aware image classification and search

Sergej Zerr, Stefan Siersdorfer, Jonathon Hare, Elena Demidova
Conference or Workshop Item

ImageTerrier: an extensible platform for scalable high-performance image retrieval

Jonathon Hare, Sina Samangooei, David Dupplaw, Paul H. Lewis
Conference or Workshop Item

OpenIMAJ and ImageTerrier: Java Libraries and Tools for Scalable Multimedia Analysis and Indexing of Images

Jonathan Hare, Sina Samangooei, David Dupplaw
Conference or Workshop Item

Abstract

OpenIMAJ and ImageTerrier are recently released open-source libraries and tools for experimentation and development of multimedia applications using Java-compatible programming languages. OpenIMAJ (the Open toolkit for Intelligent Multimedia Analysis in Java) is a collection of libraries for multimedia analysis. The image libraries contain methods for processing images and extracting state-of-the-art features, including SIFT. The video and audio libraries support both cross-platform capture and processing. The clustering and nearest-neighbour libraries contain efficient, multi-threaded implementations of clustering algorithms. The clustering library makes it possible to easily create BoVW representations for images and videos. OpenIMAJ also incorporates a number of tools to enable extremely-large-scale multimedia analysis using distributed computing with Apache Hadoop. ImageTerrier is a scalable, high-performance search engine platform for content-based image retrieval applications using features extracted with the OpenIMAJ library and tools. The ImageTerrier platform provides a comprehensive test-bed for experimenting with image retrieval techniques. The platform incorporates a state-of-the-art implementation of the single-pass indexing technique for constructing inverted indexes and is capable of producing highly compressed index data structures.

Efficient clustering and quantisation of SIFT features: Exploiting characteristics of the SIFT descriptor and interest region detectors under image inversion

Jonathon Hare, Sina Samangooei, Paul Lewis
Conference or Workshop Item

Abstract

The SIFT keypoint descriptor is a powerful approach to encoding local image description using edge orientation histograms. Through codebook construction via k-means clustering and quantisation of SIFT features we can achieve image retrieval by treating images as bags-of-words. Intensity inversion of images results in distinct SIFT features for a single local image patch across the two images. Intensity inversions notwithstanding, these two patches are structurally identical. Through careful reordering of the SIFT feature vectors, we can construct the SIFT feature that would have been generated from a non-inverted image patch starting with those extracted from an inverted image patch. Furthermore, through examination of the local feature detection stage, we can estimate whether a given SIFT feature belongs in the space of inverted features or non-inverted features. Therefore we can consistently separate the space of SIFT features into two distinct subspaces. With this knowledge, we can demonstrate reduced time complexity of codebook construction via clustering by up to a factor of four, and also reduce the memory consumption of the clustering algorithms, while producing equivalent retrieval results.
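A small numpy sketch of the reordering idea, assuming the standard 4x4-cell, 8-orientation-bin SIFT layout (the exact permutation used in the paper may differ): intensity inversion flips gradient directions by 180 degrees while leaving magnitudes unchanged, so the orientation histogram in each spatial cell rotates by four bins:

    import numpy as np

    def invert_sift(descriptor):
        # descriptor: 128-d vector viewed as 16 spatial cells x 8 orientation bins.
        cells = descriptor.reshape(16, 8)
        flipped = np.roll(cells, 4, axis=1)   # 180-degree shift of each histogram
        return flipped.reshape(128)

    d = np.random.rand(128)
    print(np.allclose(invert_sift(invert_sift(d)), d))  # True: the reordering is an involution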

Analyzing and Predicting Sentiment of Images on the Social Web

Stefan Siersdorfer, Jonathon Hare, Enrico Minack, Fan Deng
Conference or Workshop Item

Abstract

In this paper we study the connection between sentiment of images expressed in metadata and their visual content in the social photo sharing environment Flickr. To this end, we consider the bag-of-visual words representation as well as the color distribution of images, and make use of the SentiWordNet thesaurus to extract numerical values for their sentiment from accompanying textual metadata. We then perform a discriminative feature analysis based on information theoretic methods, and apply machine learning techniques to predict the sentiment of images. Our large-scale empirical study on a set of over half a million Flickr images shows a considerable correlation between sentiment and visual features, and promising results towards estimating the polarity of sentiment in images.
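
A minimal sketch of the metadata-side step, assuming the SentiWordNet lexicon is accessed through NLTK (the word list, the averaging over word senses, and the scoring scheme are illustrative choices, not the paper's exact procedure):

```python
# Requires: nltk, plus nltk.download('wordnet') and nltk.download('sentiwordnet')
from nltk.corpus import sentiwordnet as swn

def sentiment_score(words):
    """Average (positive - negative) SentiWordNet score over the words of an
    image's textual metadata. Averaging over all senses of each word is an
    illustrative simplification."""
    scores = []
    for w in words:
        synsets = list(swn.senti_synsets(w))
        if synsets:
            scores.append(sum(s.pos_score() - s.neg_score() for s in synsets) / len(synsets))
    return sum(scores) / len(scores) if scores else 0.0

print(sentiment_score(["beautiful", "sunset", "beach"]))  # expected: positive
print(sentiment_score(["sad", "rainy", "funeral"]))       # expected: negative
```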

Image and Collateral Text in Support of Auto-annotation and Sentiment Analysis

Pamela Zontone, Giulia Boato, Jonathon Hare, Paul Lewis, Stefan Siersdorfer, Enrico Minack
Conference or Workshop Item

Abstract

We present a brief overview of the way in which image analysis, coupled with associated collateral text, is being used for auto-annotation and sentiment analysis. In particular, we describe our approach to auto-annotation using the graph-theoretic dominant set clustering algorithm and the annotation of images with sentiment scores from SentiWordNet. Preliminary results are given for both, and our planned work aims to explore synergies between the two approaches.

Automatically Annotating the MIR Flickr Dataset: Experimental Protocols, Openly Available Data and Semantic Spaces

Jonathan Hare, Paul Lewis
Conference or Workshop Item

Abstract

The availability of a large, freely redistributable set of high-quality annotated images is critical to allowing researchers in the area of automatic annotation, generic object recognition and concept detection to compare results. The recent introduction of the MIR Flickr dataset allows researchers such access. A dataset by itself is not enough, and a set of repeatable guidelines for performing evaluations that are comparable is required. In many cases it is also useful to compare the machine-learning components of different automatic annotation techniques using a common set of image features. This paper seeks to provide a solid, repeatable methodology and protocol for performing evaluations of automatic annotation software using the MIR Flickr dataset together with freely available tools for measuring performance in a controlled manner. This protocol is demonstrated through a set of experiments using a “semantic space” auto-annotator previously developed by the authors, in combination with a set of visual term features for the images that has been made publicly available for download. The paper also discusses how much training data is required to train the semantic space annotator with the MIR Flickr dataset. It is the hope of the authors that researchers will adopt this methodology and produce results from their own annotators that can be directly compared to those presented in this work.

Semantic Retrieval and Automatic Annotation: Linear Transformations, Correlation and Semantic Spaces

Jonathan Hare, Paul Lewis
Conference or Workshop Item

Abstract

This paper proposes a new technique for auto-annotation and semantic retrieval based upon the idea of linearly mapping an image feature space to a keyword space. The new technique is compared to several related techniques, and a number of salient points about each of the techniques are discussed and contrasted. The paper also discusses how these techniques might actually scale to a real-world retrieval problem, and demonstrates this through a case study of a semantic retrieval technique being used on a real-world data-set (with a mix of annotated and unannotated images) from a picture library.
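
A minimal sketch of the general idea of a linear map from feature space to keyword space, assuming a least-squares fit from an annotated training set; the paper's exact formulation, weighting and regularisation may differ.

```python
import numpy as np

def learn_linear_map(X, Y):
    """Least-squares estimate of a linear map W from an image-feature matrix X
    (n_images x n_features) to a keyword-indicator matrix Y
    (n_images x n_keywords)."""
    W, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return W

def annotate(W, x, keywords, top_k=5):
    """Score keywords for an unannotated feature vector x and return the
    top_k highest-scoring labels."""
    scores = x @ W
    order = np.argsort(scores)[::-1][:top_k]
    return [keywords[i] for i in order]
```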

Image diversity analysis: context, opinion and bias

Pamela Zontone, Giulia Boato, F. G. B. De Natale, Alessia De Rosa, Mauro Barni, Alessandro Piva, Jonathan Hare, David Dupplaw, Paul Lewis
Conference or Workshop Item

Abstract

The diffusion of new Internet and web technologies has increased the distribution of different digital content, such as text, sounds, images and videos. In this paper we focus on images and their role in the analysis of diversity. We consider diversity as a concept that takes into account the wide variety of information sources, and their differences in perspective and viewpoint. We describe a number of different dimensions of diversity; in particular, we analyze the dimensions related to image searches and context analysis, emotions conveyed by images and opinion mining, and bias analysis.

IAM@ImageCLEFPhotoAnnotation 2009: Naïve application of a linear-algebraic semantic space

Jonathan Hare, Paul Lewis, Francesca Borri, Alessandro Nardi, Carol Peters
Conference or Workshop Item

Abstract

This paper describes Southampton's submissions to the 2009 ImageCLEF photo annotation task. For the task we used an annotation system based on the idea of constructing semantic spaces, which was developed previously at Southampton. To represent the image content, we used a combination of different SIFT and Colour-SIFT features detected using the difference-of-Gaussian and MSER techniques. These features were converted into a visual term representation by applying vector quantisation using a codebook learnt from a hierarchical k-means clustering. In terms of EER and AUC the annotator performs reasonably well; however, it struggles when evaluated using the hierarchical measure proposed for the task, due to the way the annotation confidences are thresholded.
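
The sketch below shows the visual-term quantisation step in its simplest form, using flat k-means rather than the hierarchical k-means used in the submission; the codebook size and normalisation are illustrative assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans

def build_codebook(all_descriptors, n_terms=1000, seed=0):
    """Learn a visual-term codebook from a sample of local descriptors
    (n_samples x descriptor_dim). Flat k-means is used here for brevity;
    the paper uses a hierarchical k-means variant."""
    return KMeans(n_clusters=n_terms, random_state=seed, n_init=4).fit(all_descriptors)

def visual_term_histogram(codebook, descriptors):
    """Quantise one image's descriptors into a normalised bag-of-visual-words
    histogram."""
    words = codebook.predict(descriptors)
    hist = np.bincount(words, minlength=codebook.n_clusters).astype(float)
    return hist / max(hist.sum(), 1.0)
```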

IAM@ImageCLEFphoto 2009: Experiments on Maximising Diversity using Image Features

Jonathan Hare, David Dupplaw, Paul Lewis, Francesca Borri, Alessandro Nardi, Carol Peters
Conference or Workshop Item

Abstract

This paper describes the diversity-enabled retrieval system constructed at Southampton for the ImageCLEFphoto 2009 task. The retrieval system used Terrier as the underlying textual indexing and retrieval system, and combined it with a technique for re-ranking the results by maximising the visual dissimilarity of retrieved images. The results show that our visual re-ranking method does indeed increase the diversity of the top results; however, it also causes a slight drop in precision. The text-based approach designed for handling the 'part 1 topics' of the task is also shown to perform very well.
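
As a sketch of dissimilarity-based re-ranking, the code below greedily trades off original (text-based) rank against visual similarity to the images already selected, in the style of maximal marginal relevance; the trade-off parameter, the cosine measure and the rank-based relevance proxy are illustrative assumptions, not the system's exact method.

```python
import numpy as np

def diversify(ranked_features, ranked_ids, top_n=20, lam=0.7):
    """Greedily re-rank an initial result list so each selected image balances
    its original rank against visual dissimilarity to images already chosen."""
    F = np.asarray(ranked_features, dtype=float)
    F = F / (np.linalg.norm(F, axis=1, keepdims=True) + 1e-12)
    relevance = 1.0 / (1.0 + np.arange(len(ranked_ids)))  # proxy for text score
    selected, remaining = [], list(range(len(ranked_ids)))
    while remaining and len(selected) < top_n:
        if not selected:
            best = remaining[0]
        else:
            sims = F[remaining] @ F[selected].T            # cosine similarities
            mmr = lam * relevance[remaining] - (1 - lam) * sims.max(axis=1)
            best = remaining[int(np.argmax(mmr))]
        selected.append(best)
        remaining.remove(best)
    return [ranked_ids[i] for i in selected]
```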

Delivery of QTIv2 question types

Gary Wills, Hugh Davis, Lester Gilbert, Jonathon Hare, Yvonne Howard, Steve Jeyes, David Millard, Robert Sherratt
Article

Abstract

The IMS Question and Test Interoperability (QTI) standard identifies sixteen different question types which may be used in on-line assessment. While some partial implementations exist, the R2Q2 project has developed a complete solution that renders and responds to all sixteen question types as specified. In addition, care has been taken in the R2Q2 project to ensure that the solution produced will allow for future changes in the specification. The design of R2Q2 is described, the focus being on lessons learnt. We describe the architecture and the rationale of the internal Web services and explain the approach taken in implementing the QTI specification, showing how the design allows for future tags to be added with minimal programming effort. The QTI standard has not had a great take-up, in part due to the lack of tools. In the 2006 JISC Capital Programme, three assessment projects were commissioned: item authoring, item banking, and QTI-compliant test delivery. This paper describes the ‘ASDEL’ test delivery engine, focusing upon its architecture, its relation to the item authoring and item banking services, and the integration of the R2Q2 Web service.

Application of the LifeGuide: The development and quantitative analysis of the 'Internet Doctor'

J.A. Joseph, L. Yardley, Jonathan Hare, Adrian Osmond, Yang Yang, Mark Weal, Gary Wills
Conference or Workshop Item

Abstract

LifeGuide is a software package that allows health professionals and researchers with no programming skills to easily and flexibly create, evaluate and modify behavioural interventions. An intervention called the ‘Internet Doctor’ was developed as a way of identifying many of the tools that were required in LifeGuide. The ‘Internet Doctor’ provides people suffering from cold and flu symptoms with tailored advice for self-care. Participants were automatically randomised to one of two versions of the website: (i) the full, ‘more interactive’ version, or (ii) a ‘less interactive’ version which omitted references to the Internet Doctor and links to obtain further information. Participants who viewed the less interactive version were more likely to complete the full consultation cycle for their selected symptom and were also more likely to consult for more symptoms than those in the more interactive version. Few participants clicked on the optional links in the more interactive version. It is concluded that although the more interactive version of the website provided more information, participants did not make full use of the interactive features which displayed this information, and did not consult for as many symptoms, so may not have benefited from the website as much as those viewing the less interactive version.

Designing authoring tools for the creation of on-line behavioural interventions

Adrian Osmond, Jonathan Hare, Joseph Price, Ashley Smith, Mark Weal, Gary Wills, Yang Yang, Lucy Yardley, David De Roure
Conference or Workshop Item

Abstract

Behavioural interventions are used by social scientists to effect change in a person’s behaviour. The LifeGuide project is developing tools to enable the easy creation, deployment and trialling of Internet-based behavioural interventions. The use of on-line behavioural interventions is appealing as it can be more cost effective than face-to-face interventions, can deliver tailored advice at times that suit the participants, and can provide detailed statistical information that can be used to better understand behaviour or demonstrate the efficacy of the interventions themselves. The problem, however, is that developing on-line interventions is a complex, time-consuming task that has often involved high levels of specialist computing support in construction and delivery. The LifeGuide project is looking to put tools into the hands of domain specialists (psychologists, social scientists, health professionals, etc.) that enable them to easily construct their own behavioural interventions and deploy them on the Internet. This paper looks at the authoring tools currently being developed by the project, assesses their usability through case studies of interventions developed so far, and suggests where the project will look in the future to continue to improve the tools to meet the needs of the wide range of intervention authors.

Application of the LifeGuide: The development and quantitative analysis of the 'Internet Doctor'

J.A. Joseph, Lucy Yardley, J Hare, A Osmond, Yang Yang, Mark J. Weal, Gary B. Wills, S Michie
Conference or Workshop Item

Abstract

LifeGuide is a software package that allows health professionals and researchers with no programming skills to easily and flexibly create, evaluate and modify behavioural interventions. An intervention called the ‘Internet Doctor’ was developed as a way of identifying many of the tools that were required in LifeGuide. The ‘Internet Doctor’ provides people suffering from cold and flu symptoms with tailored advice for self-care. Participants were automatically randomised to one of two versions of the website: (i) the full, ‘more interactive’ version, or (ii) a ‘less interactive’ version which omitted references to the Internet Doctor and links to obtain further information. Participants who viewed the less interactive version were more likely to complete the full consultation cycle for their selected symptom and were also more likely to consult for more symptoms than those in the more interactive version. Few participants clicked on the optional links in the more interactive version. It is concluded that although the more interactive version of the website provided more information, participants did not make full use of the interactive features which displayed this information, and did not consult for as many symptoms, so may not have benefited from the website as much as those viewing the less interactive version.

Introduction to the LifeGuide: software facilitating the development of interactive internet interventions

Lucy Yardley, Adrian Osmond, Jonathon Hare, Gary Wills, Mark Weal, Dave De Roure, Susan Michie
Conference or Workshop Item

Abstract

We are developing a set of software resources named ‘the LifeGuide’ that will enable researchers to collaboratively create, evaluate and modify two central dimensions of behavioural interventions: a) providing tailored advice; b) supporting sustained behaviour.

Introduction to the LifeGuide: software facilitating the development of interactive behaviour change internet interventions

Lucy Yardley, Adrian Osmond, Jonathan Hare, Gary Wills, Mark Weal, David De Roure, Susan Michie
Conference or Workshop Item

Abstract

We are developing a set of software resources named ‘the LifeGuide’ that will enable researchers to collaboratively create, evaluate and modify two central dimensions of behavioural interventions: a) providing tailored advice; b) supporting sustained behaviour.

LifeGuide: A platform for performing web-based behavioural interventions

Jonathan Hare, Adrian Osmond, Yang Yang, Gary Wills, Mark Weal, David De Roure, Judith Joseph, Lucy Yardley
Conference or Workshop Item

Abstract

Behavioural interventions are a technique used by social scientists and health professionals to mediate the behaviour of a subject. Traditionally, interventions take the form of tailored advice given in a face-to-face setting. Internet-based behavioural interventions harness the power of the web to deliver tailored advice to participants at the time that most suits them. The LifeGuide project is a multidisciplinary collaboration with the aim of developing and proving a set of software tools for the development and deployment of internet-based behavioural interventions. The tools developed in LifeGuide cover the complete lifecycle of an intervention, from initial authoring to trialling and refinement to final deployment. Looking ahead, in the longer term we intend to investigate how the LifeGuide toolset can be applied to other domains.

Assessment delivery engine for QTIv2 tests

Gary Wills, Jonathan Hare, Jiri Kajaba, David Argles, Lester Gilbert, David Millard
Conference or Workshop Item

Abstract

The IMS Question and Test Interoperability (QTI) standard has not had a great take-up, in part due to the lack of tools. In the 2006 JISC Capital Programme, three assessment projects were commissioned: item authoring, item banking, and QTI-compliant test delivery. This paper describes the ‘ASDEL’ test delivery engine, focusing upon its architecture, its relation to the item authoring and item banking services, and the integration of the R2Q2 Web service. The project first developed a Java library to implement the system. This will allow other developers and researchers to build their own system or take just the aspects of QTI they want to implement.

Semantic spaces revisited: investigating the performance of auto-annotation and semantic retrieval using semantic spaces

Jonathan Hare, Sina Samangooei, Paul Lewis, Mark Nixon
Conference or Workshop Item

Abstract

Semantic spaces encode similarity relationships between objects as a function of position in a mathematical space. This paper discusses three different formulations for building semantic spaces which allow the automatic-annotation and semantic retrieval of images. The models discussed in this paper require that the image content be described in the form of a series of visual-terms, rather than as a continuous feature-vector. The paper also discusses how these term-based models compare to the latest state-of-the-art continuous feature models for auto-annotation and retrieval.

A delivery engine for QTI assessments

Gary Wills, Jonathan Hare, Jiri Kajaba, David Argles, Lester Gilbert, David Millard
Article

Abstract

The IMS Question and Test Interoperability (QTI) standard has had a restricted take-up, in part due to the lack of tools. This paper describes the ‘ASDEL’ test delivery engine, focusing upon its architecture, its relation to item authoring and item banking services, and the integration of the R2Q2 web service. The tools developed operate with a web client, as a plug-in to Moodle, or as a desktop application. The paper also reports on the load testing of the internal services and concludes that these are best represented as components. The project first developed a Java library to implement the system. This will allow other developers and researchers to build their own system or incorporate the aspects of QTI they want to implement.

Assessment Delivery Engine for QTIv2 Tests.

Gary Wills, Lester Gilbert, Jonathan Hare, Jiri Kajaba, David Argles, David Millard
Conference or Workshop Item

Abstract

The IMS Question and Test Interoperability (QTI) standard has not had a great take-up, in part due to the lack of tools. This paper describes the ‘ASDEL’ test delivery engine, focusing upon its architecture, its relation to the item authoring and item banking services, and the integration of the R2Q2 Web service. The project first developed a Java library to implement the system. This will allow other developers and researchers to build their own system or take just the aspects of QTI they want to implement.

Giving order to image queries

Jonathan Hare, Patrick Sinclair, Paul Lewis, Kirk Martinez, Theo Gevers, Ramesh Jain, Simone Santini
Conference or Workshop Item

Abstract

Users of image retrieval systems often find it frustrating that the image they are looking for is not ranked near the top of the results they are presented with. This paper presents a computational approach for ranking keyworded images in order of relevance to a given keyword. Our approach uses machine learning to attempt to learn which visual features within an image are most related to the keywords, and then provides a ranking based on similarity to a visual aggregate. To evaluate the technique, a Web 2.0 application has been developed to obtain a corpus of user-generated ranking information for a given image collection that can be used to evaluate the performance of the ranking algorithm.
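
As a rough illustration of ranking against a visual aggregate, the sketch below takes the aggregate to be the mean feature vector of images tagged with the keyword and ranks candidates by cosine similarity; the paper learns which features matter per keyword, so the mean prototype here is an assumption for illustration only.

```python
import numpy as np

def keyword_aggregate(features, labels, keyword):
    """Form a simple 'visual aggregate' for a keyword as the mean feature
    vector of the training images carrying that keyword."""
    X = np.asarray([f for f, ls in zip(features, labels) if keyword in ls], dtype=float)
    return X.mean(axis=0)

def rank_by_keyword(aggregate, candidates):
    """Rank candidate feature vectors by cosine similarity to the aggregate."""
    C = np.asarray(candidates, dtype=float)
    sims = (C @ aggregate) / (np.linalg.norm(C, axis=1) * np.linalg.norm(aggregate) + 1e-12)
    return np.argsort(sims)[::-1]
```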

MapSnapper: Engineering an Efficient Algorithm for Matching Images of Maps from Mobile Phones

Jonathan Hare, Paul Lewis, Layla Gordon, Glenn Hart, Theo Gevers, Ramesh Jain, Simone Santini
Conference or Workshop Item

Abstract

The MapSnapper project aimed to develop a system for robust matching of low-quality images of a paper map taken from a mobile phone against a high quality digital raster representation of the same map. The paper presents a novel methodology for performing content-based image retrieval and object recognition from query images that have been degraded by noise and subjected to transformations through the imaging system. In addition the paper also provides an insight into the evaluation-driven development process that was used to incrementally improve the matching performance until the design specifications were met.

Facing the reality of semantic image retrieval

Peter G. B. Enser, Christine J. Sandom, Jonathon S. Hare, Paul H. Lewis
Article

Abstract

Purpose – To provide a better-informed view of the extent of the semantic gap in image retrieval, and the limited potential for bridging it offered by current semantic image retrieval techniques.

Design/methodology/approach – Within an ongoing project, a broad spectrum of operational image retrieval activity has been surveyed, and, from a number of collaborating institutions, a test collection assembled which comprises user requests, the images selected in response to those requests, and their associated metadata. This has provided the evidence base upon which to make informed observations on the efficacy of cutting-edge automatic annotation techniques which seek to integrate the text-based and content-based image retrieval paradigms.

Findings – Evidence from the real-world practice of image retrieval highlights the existence of a generic-specific continuum of object identification, and the incidence of temporal, spatial, significance and abstract concept facets, manifest in textual indexing and real-query scenarios but often having no directly visible presence in an image. These factors combine to limit the functionality of current semantic image retrieval techniques, which interpret only visible features at the generic extremity of the generic-specific continuum.

Research limitations/implications – The project is concerned with the traditional image retrieval environment in which retrieval transactions are conducted on still images which form part of managed collections. The possibilities offered by ontological support for adding functionality to automatic annotation techniques are considered.

Originality/value – The paper offers fresh insights into the challenge of migrating content-based image retrieval from the laboratory to the operational environment, informed by newly-assembled, comprehensive, live data.

Bridging the Semantic Gap in Visual Information Retrieval: End of Project Report

Christine Sandom, Jonathan Hare, Peter Enser, Paul Lewis
Monograph

Delivery of QTIv2 Question Types

Gary Wills, Hugh Davis, Lester Gilbert, Jonathon Hare, Yvonne Howard, Steve Jeyes, David Millard, Robert Sherratt
Conference or Workshop Item

Abstract

The QTI standard identifies sixteen different question types which may be used in on-line assessment. While some partial implementations exist, the R2Q2 project has developed a complete solution that renders and responds to all sixteen question types as specified. In addition, care has been taken in the R2Q2 project to ensure that the solution produced will allow for future changes in the specification. The paper summarises the rationale of Web services and a Service Oriented Architecture, and then demonstrates how the R2Q2 project integrates into JISC’s e-Framework, and the reference model for assessment (FREMA). The design of R2Q2 is described, the focus being on lessons learnt. We describe the architecture and the rationale of the internal Web services and explain the approach taken in implementing the QTI specification, showing how the design allows for future tags to be added with minimal programming effort. A major objective of the design was to solve the problem of having to undertake a major redesign and reimplementation as a result of minor modifications to the specification. In the 2006 Capital Programme from JISC, three new projects were commissioned in the area of assessment: one for authoring of items, one for item banking, and one for a complete test engine as described in the QTI specification. The R2Q2 Web service is at the heart of all three projects, and this paper describes how it will be used.

Semantic Facets: An in-depth Analysis of a Semantic Image Retrieval System

Jonathon S. Hare, Paul H. Lewis, Peter G. B. Enser, Christine J. Sandom
Conference or Workshop Item

Abstract

This paper introduces a faceted model of image semantics which attempts to express the richness of semantic content interpretable within an image. Using a large image data-set from a museum collection the paper shows how the facet representation can be applied. The second half of the paper describes our semantic retrieval system, and demonstrates its use with the museum image collection. A retrieval evaluation is performed using the system to investigate how the retrieval performance varies with respect to each of the facet categories. A number of factors related to the image data-set that affect the quality of retrieval are also discussed.

How to spot a Dalmatian in a pack of Dogs; A data-driven approach to searching unannotated images using natural language

Jonathon S. Hare, Paul H. Lewis, Peter G. B. Enser, Christine J. Sandom
Conference or Workshop Item

Abstract

This poster demonstrates our recent work in the field of intelligent image retrieval in response to real requests from the practitioner domain. The poster shows how we are developing a data-driven 'semantic space' framework for information retrieval which can enable retrieval of unannotated imagery through natural language queries, and also facilitate automatic annotation of imagery.

Saliency for Image Description and Retrieval

Jonathon S. Hare
Thesis

Abstract

We live in a world where we are surrounded by ever increasing numbers of images. More often than not, these images have very little metadata by which they can be indexed and searched. In order to avoid information overload, techniques need to be developed to enable these image collections to be searched by their content. Much of the previous work on image retrieval has used global features such as colour and texture to describe the content of the image. However, these global features are insufficient to accurately describe the image content when different parts of the image have different characteristics. This thesis initially discusses how this problem can be circumvented by using salient interest regions to select the areas of the image that are most interesting and generate local descriptors to describe the image characteristics in that region. The thesis discusses a number of different saliency detectors that are suitable for robust retrieval purposes and performs a comparison between a number of these region detectors. The thesis then discusses how salient regions can be used for image retrieval using a number of techniques, but most importantly, two techniques inspired from the field of textual information retrieval. Using these robust retrieval techniques, a new paradigm in image retrieval is discussed, whereby the retrieval takes place on a mobile device using a query image captured by a built-in camera. This paradigm is demonstrated in the context of an art gallery, in which the device can be used to find more information about particular images. The final chapter of the thesis discusses some approaches to bridging the semantic gap in image retrieval. The chapter explores ways in which un-annotated image collections can be searched by keyword. Two techniques are discussed; the first explicitly attempts to automatically annotate the un-annotated images so that the automatically applied annotations can be used for searching. The second approach does not try to explicitly annotate images, but rather, through the use of linear algebra, it attempts to create a semantic space in which images and keywords are positioned such that images are close to the keywords that represent them within the space.

Ambient Gestures

Maria Karam, Jonathon Hare, Paul Lewis, m.c. schraefel
Monograph

Abstract

We present Ambient Gestures, a novel gesture-based system designed to support ubiquitous ‘in the environment’ interactions with everyday computing technology. Hand gestures and audio feedback allow users to control computer applications without reliance on a graphical user interface, and without having to switch from the context of a non-computer task to the context of the computer. The Ambient Gestures system is composed of a vision recognition software application, a set of gestures to be processed by a scripting application and a navigation and selection application that is controlled by the gestures. This system allows us to explore gestures as the primary means of interaction within a multimodal, multimedia environment. In this paper we describe the Ambient Gestures system, define the gestures and the interactions that can be achieved in this environment and present a formative study of the system. We conclude with a discussion of our findings and future applications of Ambient Gestures in ubiquitous computing.

Mind the Gap: Another look at the problem of the semantic gap in image retrieval

Jonathon S. Hare, Paul H. Lewis, Peter G. B. Enser, Christine J. Sandom, Edward Y. Chang, Alan Hanjalic, Nicu Sebe
Conference or Workshop Item

Abstract

This paper attempts to review and characterise the problem of the semantic gap in image retrieval and the attempts being made to bridge it. In particular, we draw from our own experience in user queries, automatic annotation and ontological techniques. The first section of the paper describes a characterisation of the semantic gap as a hierarchy between the raw media and full semantic understanding of the media's content. The second section discusses real users' queries with respect to the semantic gap. The final sections of the paper describe our own experience in attempting to bridge the semantic gap. In particular we discuss our work on auto-annotation and semantic-space models of image retrieval in order to bridge the gap from the bottom up, and the use of ontologies, which capture more semantics than keyword object labels alone, as a technique for bridging the gap from the top down.

Bridging the Semantic Gap in Multimedia Information Retrieval: Top-down and Bottom-up approaches

Jonathon S. Hare, Patrick A. S. Sinclair, Paul H. Lewis, Kirk Martinez, Peter G.B. Enser, Christine J. Sandom, Paolo Bouquet, Roberto Brunelli, Jean-Pierre Chanod, Claudia Niederée, Heiko Stoermer
Conference or Workshop Item

Abstract

Semantic representation of multimedia information is vital for enabling the kind of multimedia search capabilities that professional searchers require. Manual annotation is often not possible because of the sheer scale of the multimedia information that needs indexing. This paper explores the ways in which we are using both top-down, ontologically driven approaches and bottom-up, automatic-annotation approaches to provide retrieval facilities to users. We also discuss many of the current techniques that we are investigating to combine these top-down and bottom-up approaches.

The Reality of the Semantic Gap in Image Retrieval

Peter G. B. Enser, Christine J. Sandom, Paul H. Lewis, Jonathon S. Hare
Conference or Workshop Item

A Linear-Algebraic Technique with an Application in Semantic Image Retrieval

Jonathon S. Hare, Paul H. Lewis, Peter G. B. Enser, Christine J. Sandom, Hari Sundaram, Milind Naphade, John R. Smith, Yong Rui
Article

Abstract

This paper presents a novel technique for learning the underlying structure that links visual observations with semantics. The technique, inspired by a text-retrieval technique known as cross-language latent semantic indexing, uses linear algebra to learn the semantic structure linking image features and keywords from a training set of annotated images. This structure can then be applied to unannotated images, thus providing the ability to search the unannotated images by keyword. This factorisation approach is shown to perform well, even when using only simple global image features.
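
A minimal sketch of a CL-LSI-style factorisation, assuming the training images are described by a visual-term matrix and a keyword matrix that are stacked and decomposed with a truncated SVD; the weighting, normalisation and rank are illustrative assumptions, not the paper's exact choices.

```python
import numpy as np

def build_semantic_space(V, K, rank=50):
    """V: (n_images x n_visual_terms), K: (n_images x n_keywords).
    Stack the two views and take a truncated SVD so that visual terms and
    keywords share coordinates in one low-rank 'semantic space'."""
    O = np.hstack([V, K])                      # joint observation matrix
    U, s, Vt = np.linalg.svd(O, full_matrices=False)
    basis = Vt[:rank]                          # rank x (n_visual_terms + n_keywords)
    term_space = basis[:, :V.shape[1]]         # coordinates of visual terms
    keyword_space = basis[:, V.shape[1]:]      # coordinates of keywords
    return term_space, keyword_space

def score_keywords(term_space, keyword_space, visual_term_vector):
    """Project an unannotated image into the space and score each keyword by
    cosine similarity, enabling keyword search over unannotated images."""
    img = term_space @ visual_term_vector
    sims = (keyword_space.T @ img) / (
        np.linalg.norm(keyword_space, axis=0) * np.linalg.norm(img) + 1e-12)
    return sims
```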

Image Auto-annotation using a Statistical Model with Salient Regions

Jiayu Tang, Jonathon S. Hare, Paul H. Lewis
Conference or Workshop Item

Abstract

Traditionally, statistical models for image auto-annotation have been coupled with image segmentation. Given the performance of current segmentation algorithms, it can be advantageous to avoid a segmentation stage altogether. In this paper, we propose a new approach to image auto-annotation using statistical models, in which segmentation is avoided through the use of salient regions. The use of the statistical model results in an annotation performance which improves upon our previously proposed saliency-based word propagation technique. We also show that the use of salient regions achieves better results than the use of general image regions or segments.

iGesture: A Platform for Investigating Multimodal, Multimedia Gesture-based Interactions

Jonathon Hare, Maria Karam, Paul Lewis, m.c. schraefel
Monograph

Abstract

This paper introduces the iGesture platform for investigating multimodal gesture based interactions in multimedia contexts. iGesture is a low-cost, extensible system that uses visual recognition of hand movements to support gesture-based input. Computer vision techniques support gesture based interactions that are lightweight, with minimal interaction constraints. The system enables gestures to be carried out 'in the environment' at a distance from the camera, enabling multimodal interaction in a naturalistic, transparent manner in a ubiquitous computing environment. The iGesture system can also be rapidly scripted to enable gesture-based input with a wide variety of applications. In this paper we present the technology behind the iGesture software, and a performance evaluation of the gesture recognition subsystem. We also present two exemplar multimedia application contexts which we are using to explore ambient gesture-based interactions.

Saliency-based Models of Image Content and their Application to Auto-Annotation by Semantic Propagation

Jonathon S. Hare, Paul H. Lewis
Conference or Workshop Item

Abstract

In this paper, we propose a model of automatic image annotation based on propagation of keywords. The model works on the premise that visually similar image content is likely to have similar semantic content. Image content is represented by extracting local descriptors at salient points within the image and quantising the feature vectors into visual terms. The visual terms for each image are modelled using techniques taken from the information retrieval community. The modelled information from an unlabelled query image is compared to the models of a corpus of labelled images and labels are propagated from the most similar labelled images to the query image.
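
The sketch below illustrates the propagation step under simple assumptions: images are represented as visual-term histograms, the most similar labelled images are found by cosine similarity, and their keywords are vote-weighted by similarity. The paper's IR-style term weighting and exact propagation rule may differ.

```python
import numpy as np
from collections import Counter

def propagate_labels(query_hist, corpus_hists, corpus_labels, k=5, top_k=5):
    """Propagate keywords to an unlabelled image from its k most similar
    labelled images, measured over visual-term histograms."""
    H = np.asarray(corpus_hists, dtype=float)
    q = np.asarray(query_hist, dtype=float)
    sims = (H @ q) / (np.linalg.norm(H, axis=1) * np.linalg.norm(q) + 1e-12)
    neighbours = np.argsort(sims)[::-1][:k]
    votes = Counter()
    for i in neighbours:
        for label in corpus_labels[i]:
            votes[label] += sims[i]            # weight each vote by similarity
    return [label for label, _ in votes.most_common(top_k)]
```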

On Image Retrieval using Salient Regions with Vector-Spaces and Latent Semantics

Jonathon S. Hare, Paul H. Lewis, Wee-Kheng Leow, Michael S. Lew, Tat-Seng Chua, Wei-Ying Ma, Lekha Chaisorn, Erwin M. Bakker
Article

Abstract

The vector-space retrieval model and Latent Semantic Indexing approaches to retrieval have been used heavily in the field of text information retrieval over the past years. The use of these approaches in image retrieval, however, has been somewhat limited. In this paper, we present methods for using these techniques in combination with an invariant image representation based on local descriptors of salient regions. The paper also presents an evaluation in which the two techniques are used to find images with similar semantic labels.
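
As a rough sketch of applying the vector-space model to visual terms, the code below builds tf-idf weighted vectors from visual-term counts and ranks indexed images by cosine similarity to a query; the particular weighting scheme is an illustrative assumption rather than the paper's exact formulation.

```python
import numpy as np

def tfidf_matrix(counts):
    """Turn a (n_images x n_visual_terms) count matrix into tf-idf weights,
    mirroring the textual vector-space model applied to visual terms."""
    counts = np.asarray(counts, dtype=float)
    tf = counts / np.maximum(counts.sum(axis=1, keepdims=True), 1.0)
    df = np.count_nonzero(counts > 0, axis=0)
    idf = np.log((1.0 + counts.shape[0]) / (1.0 + df)) + 1.0
    return tf * idf

def retrieve(index, query_vec, top_k=10):
    """Rank indexed images by cosine similarity to a query image's tf-idf
    visual-term vector."""
    sims = (index @ query_vec) / (
        np.linalg.norm(index, axis=1) * np.linalg.norm(query_vec) + 1e-12)
    return np.argsort(sims)[::-1][:top_k]
```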

Content-based image retrieval using a mobile device as a novel interface

Jonathon S. Hare, Paul H. Lewis, Rainer W. Lienhart, Noburu Babaguchi, Edward Y. Chang
Conference or Workshop Item

Abstract

This paper presents an investigation into the use of a mobile device as a novel interface to a content-based image retrieval system. The initial development has been based on the concept of using the mobile device in an art gallery for mining data about the exhibits, although a number of other applications are envisaged. The paper presents a novel methodology for performing content-based image retrieval and object recognition from query images that have been degraded by noise and subjected to transformations through the imaging system. The methodology uses techniques inspired from the information retrieval community in order to aid efficient indexing and retrieval. In particular, a vector-space model is used in the efficient indexing of each image, and a two-stage pruning/ranking procedure is used to determine the correct matching image. The retrieval algorithm is shown to outperform a number of existing algorithms when used with query images from the mobile device.

Salient Regions for Query by Image Content

Jonathon S. Hare, Paul H. Lewis, Peter Enser, Yiannis Kompatsiaris, Noel E. O'Connor
Article

Abstract

Much previous work on image retrieval has used global features such as colour and texture to describe the content of the image. However, these global features are insufficient to accurately describe the image content when different parts of the image have different characteristics. This paper discusses how this problem can be circumvented by using salient interest points, and compares and contrasts an extension to previous work in which the concept of scale is incorporated into the selection of salient regions, so that the most interesting areas of the image are selected and local descriptors are generated to describe the image characteristics in those regions. The paper describes and contrasts two such salient region descriptors and compares them through their repeatability rates under a range of common image transforms. Finally, the paper investigates the performance of one of the salient region detectors in an image retrieval setting.

Scale Saliency: Applications in Visual Matching, Tracking and View-Based Object Recognition

Jonathon S. Hare, Paul H. Lewis
Conference or Workshop Item

Abstract

In this paper, we introduce a novel technique for image matching and feature-based tracking. The technique is based on the idea of using the Scale-Saliency algorithm to pick a sparse set of ‘interesting’ or ‘salient’ features. Feature vectors for each of the salient regions are generated and used in the matching process. Due to the nature of the sparse representation of feature vectors generated by the technique, sub-image matching is also accomplished. We demonstrate the technique's robustness to geometric transformations in the query image and suggest that the technique would be suitable for view-based object recognition. We also apply the matching technique to the problem of feature tracking across multiple video frames by matching salient regions across frame pairs. We show that our tracking algorithm is able to explicitly extract the 3D motion vector of each salient region during the tracking process, using a single uncalibrated camera. We illustrate the functionality of our tracking algorithm by showing results from tracking a single salient region in near real-time with a live camera input.