The 'big data' challenge
Volume, variety, veracity and velocity of data.
According to computer giant IBM, 2.5 exabytes – that’s 2.5bn gigabytes – of data were generated on the Web every day in 2012, and Europe’s governments are sitting on data assets that could be worth €40bn a year. Dr Elena Simperl, Associate Professor in the Web and Internet Science Group, explains the issues with ‘big data’ and what this wealth of data could be used for.
Big data is the term used for collections of data sets so large and complex that they become difficult to process with conventional data management tools. However, when it comes to the data produced on the Web, it’s not just about volume, Elena explains.
“We are constantly being bombarded with a variety of different data in different languages, and different formats such as images and videos. The veracity of the data is also an issue; it could be inaccurate or out of date,” she says. “Finally, velocity of data is something to think about. Many data these days come from data streams – for example in social media where a stream of posts changes all the time – instead of one database where data may remain unchanged for long periods.”
Elena is an expert in semantic technologies, which involve using formal knowledge representation and artificial intelligence methods to teach machines to understand the meaning of data and potentially predict behaviour in the future. “Semantic technologies give us the means to infer new knowledge based on what we see. They also give us the means to integrate data that comes from many different sources on the Web,” says Elena.
This kind of technology is very useful in digital marketing. Data from social media analysis can give huge insights into how consumers think. “In marketing it is important to predict how a certain message will be received, but also to suggest when a brand should take the initiative and launch a new campaign or reach out to particular customers,” Elena explains.
“Our research looks at predicting how behaviour on the Web will change by observing what has already happened, but also – and this is where web science comes in – by injecting what we know from theories from social sciences and economics about human behaviour.”
Elena and her team are also focusing on improving the quality of data using social computing and crowdsourcing platforms. Their latest project involves collecting and analysing data on light pollution. With increased urbanisation and more people moving from the countryside to the cities, and with our increase in energy consumption, pollution from lighting is still a big issue. “We are working with communities that are concerned with this development, collecting data from them as well as from sensors from different environmental agencies,” says Elena. “Using big algorithms, we identify patterns in behaviour and use comparative analysis between phenomena that happen across the world, to understand what causes changes in light pollution in different areas.”
“The overall aim of the project is to use all this crowdsourced information to determine how we could create light sources that are better for the environment and for the people living in those areas,” Elena says. “Our approach to all our work is to think about the human element involved when solving technological problems – that’s what web science is all about.”
Southampton is the birthplace of the web science discipline, which provides a thorough understanding of the Web as a social and technical phenomenon. This demands new ways of working, across traditional academic disciplines, to build skills and expertise in the technical underpinnings of the Web as well as the social processes that have shaped its evolution and the impact of the Web on society.
“In recent times our Web and Internet Science research group has fought for more awareness of interdisciplinary research and the importance of the Web across disciplines,” explains Elena. “Going forward, this is the only way we can tackle global issues such as big data, by collaborating with social and human scientists to make sure we understand behaviour. Then we can develop tools that are going to be really useful for the user.”