We hosted our Cambridge Artificial Intelligence Summit, sponsored by Cambridge Judge Business School Executive Education, on 15–16 June, welcoming analysts, data scientists and researchers to network, develop new skills and gain insight into the evolving field of data science.
Dr Steven McDermott
Qualitative Analysis and Social Media Lead, HMRC
Session: AI as Moderator/Mediator in the Recognition of Citizen’s Voice with Social Media at the Cambridge AI Summit in June 2018.
Abstract: Government departments are now utilising customer feedback channels and social media in an attempt to respond to crowdsourced insights and, eventually, to inform policy. They are also using social media listening platforms to listen in to conversations taking place about their departments, and are taking tentative steps into machine learning and AI techniques. The debates surrounding these tools have tended to frame such activity as surveillance, opening up the possibility of Armageddon with the rise of the machines. However, how can the voice of the citizen be recognised and responded to if these departments are discouraged from listening and from using the latest tools? Does the utilisation of social media, machine learning and AI offer a potential means of escaping the stranglehold of top-down, stage-managed politics? If millions of people could be the producers as well as the receivers of political messages, could that invigorate democracy? And what role will machine learning and AI play in this emerging new media ecology? I intend to present a peek behind the curtain at the level of listening that is taking place and at how machine learning and AI are being applied, asking whether this can be done ethically, to enhance democratic processes and improve evidence-based policy decisions. In which ways will democratic institutions have to change in order to meet these challenges?
Join our community of hundreds of researchers, analysts and data scientists for an opportunity to network, develop new skills and gain insight into the evolving field of data science.
Hear from industry and academic speakers representing a range of sectors, from research and bioinformatics to business and finance.
Learn about the practical application and implementation of the latest tools and techniques through industry case studies.
Share knowledge, pick up new ideas and connect with developers, analysts, researchers and executives.
The Data Science Summits are all about putting research into action. You can see how the latest techniques are implemented, network with other leaders and specialists in the field who make research actionable, and gain insight into how you can help transform your company, your teams and the way you work.
Sarah Curshen, Director of Executive Education Custom Programmes, Cambridge Judge Business School
Prof. Kenneth Benoit
Professor of Quantitative Social Research Methods, London School of Economics
Session: Quantitative Text Mining, the Social Scientific Way: Mining Social Media on Brexit
Dr. Sebastian Kaltwang and Brook Roberts
Machine Learning Engineer, FiveAI
Session: Overcoming the Data Bottleneck for Self-driving Cars
Cloud Developer Advocate, Google
Session: Google Cloud AutoML
Artificial Intelligence DevRel EMEA, Nvidia
Session: Artificial intelligence and the evolution of the computing platform
Dr Haitham Bou-Ammar
Head of Reinforcement Learning and Tuneable AI, Prowler
Session: Data-Efficient Reinforcement Learning
Dr Maksim Sipos
Session: Automated feature extraction and selection for challenging time-series prediction problems
Dr Jeremy Bradley
Lead Data Scientist, Royal Mail
Session: Data Science as a Transformative Process
Dr Steven McDermott
Qualitative Analysis and Social Media Lead, HMRC
Session: AI as Moderator/Mediator in the Recognition of Citizen’s Voice with Social Media
- Value: This project is open to self-funded students worldwide.
UK and EU applicants are eligible for funding from the EPSRC NPIF scholarship.
- Number of awards: 1
- Deadline: 31 May 2018
Type of project
Competition-funded PhD project
Contact Dr Georgios Aivaliotis to discuss this project further informally.
HMRC collects a wealth of data regarding tax compliance of companies and individuals. Sometimes people and companies do not pay the correct amount of tax on time for a variety of reasons (e.g. lack of knowledge, lack of ability, evasion). The data collected are “big”, i.e. a high number of variables and many clients and are of both temporal (time stamped) as well as static nature.
The aim of this project is to develop the methodology needed to extract information from the data, and to apply machine learning and pattern mining alongside classical statistical techniques in order to predict which cases are most likely to result in non-compliance, so that early action can be taken. Linking SMEs' and HMRC data will be an additional possibility and challenge. As a follow-up, economic models will be developed that look into the cost of interventions and which actions are economically meaningful to ensure compliance.
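The prediction task described above can be framed as a supervised learning problem over combined static and summarised temporal features. The sketch below is purely illustrative: the data is synthetic, and the feature names, the label construction and the model choice (scikit-learn's gradient boosting) are all assumptions, not HMRC's actual variables or methods.

```python
# Illustrative sketch: predicting non-compliance from mixed static and
# temporal features. All data here is synthetic; HMRC's real variables
# and models are not public.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_clients, n_months = 1000, 24

# Static features (e.g. sector code, company age) -- names are invented.
static = rng.normal(size=(n_clients, 3))

# Temporal features: a monthly payment-delay history per client.
delays = rng.poisson(lam=2.0, size=(n_clients, n_months))

# Summarise each time series into fixed-length features (mean, max, trend).
trend = np.polyfit(np.arange(n_months), delays.T, deg=1)[0]
temporal = np.column_stack([delays.mean(axis=1), delays.max(axis=1), trend])

X = np.hstack([static, temporal])
# Synthetic label: a worsening delay trend makes non-compliance more likely.
y = (trend + rng.normal(scale=0.05, size=n_clients) > 0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = GradientBoostingClassifier().fit(X_train, y_train)
print(f"held-out accuracy: {model.score(X_test, y_test):.2f}")
```

With real compliance data the time-series summarisation would likely be far richer (the project mentions pattern mining), but the shape of the problem stays the same: a fixed-length feature vector per client and a binary non-compliance label.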
The successful PhD candidate will work under the guidance of academic as well as industrial (HMRC Digital Academy and Cambridge Spark) supervisors. HMRC and Cambridge Spark will provide expertise in the data, the possibility of working onsite, and training. Cambridge Spark offers a variety of training, conferences and workshops in AI and data analytics methodology. HMRC Digital Academy runs a series of regular seminars and is investing in research in data analytics.
Applicants should have, or expect to obtain, a minimum of a UK upper second class honours degree in Mathematics or a related discipline, or equivalent. Applicants whose first language is not English must also meet the University’s English language requirements.
How to apply
Formal applications for research degree study should be made online through the university's website. Please state clearly in the research information section that the PhD you wish to be considered for is 'Predictive analytics for tax compliance', and name Dr Georgios Aivaliotis as your proposed supervisor.
If English is not your first language, you must provide evidence that you meet the University’s minimum English Language requirements.
We welcome scholarship applications from all suitably-qualified candidates, but UK black and minority ethnic (BME) researchers are currently under-represented in our Postgraduate Research community, and we would therefore particularly encourage applications from UK BME candidates. All scholarships will be awarded on the basis of merit.
by Dr Steven McDermott Qualitative Analysis and Social Media Lead, Digital Data Academy, Her Majesty’s Revenue and Customs, UK
Government departments are now utilising customer feedback channels and social media in an attempt to respond to crowdsourced insights and, eventually, to inform policy. They are also using social media listening platforms to listen in to conversations taking place about their departments, and are taking tentative steps into machine learning and AI techniques. The debates surrounding these tools have tended to frame such activity as surveillance, opening up the possibility of Armageddon with the rise of the machines. However, how can the voice of the citizen be recognised and responded to if these departments are discouraged from listening and from using the latest tools? Does the utilisation of social media, machine learning and AI offer a potential means of escaping the stranglehold of top-down, stage-managed politics? If millions of people could be the producers as well as the receivers of political messages, could that invigorate democracy? And what role will machine learning and AI play in this emerging new media ecology? I intend to present a peek behind the curtain at the level of listening that is taking place and at how machine learning and AI are being applied, asking whether this can be done ethically, to enhance democratic processes and improve evidence-based policy decisions. In which ways will democratic institutions have to change in order to meet these challenges?
Why a ‘Listening Organisation’?
Macnamara (2016) has issued a list of criteria for organisations wishing to adhere to the maxim of being a listening organisation. It is acknowledged within Her Majesty's Revenue and Customs (HMRC) that it falls some way short of meeting those criteria, despite pockets of good practice. Part of HMRC's strategy is that, by moving to digital and by utilising advances in technology and in software in particular, it will become a listening organisation. HMRC is trying to address the identified 'crisis of listening' within the organisation in the hope of regaining trust and re-engaging people whose voices are unheard or ignored. In doing so, HMRC understands that urgent attention to organisational listening is essential for maintaining governance, democracy, organisational legitimacy, business sustainability and social equity.
The department is attempting to use data, procure software tools, implement processes and change the culture in order to act on the insights generated by the data; there is an acknowledgement that the solution to the 'crisis of listening' is more than a technological one. A core aspect of overcoming this crisis is the implementation of 'real-time' listening capability, a key component of which is monitoring and responding to social media interactions between HMRC representatives and the citizens/customers of HMRC services. This is coupled with empowering staff at all levels to act on feedback, so that HMRC is in a position to become a world-class customer-listening organisation. HMRC is exploring the capability of 'data scientists' and Machine Learning (ML) to develop less labour-intensive practices of responding to customer feedback.
The discourse of a 'crisis of listening' posited by Macnamara seems to be in stark contrast to the discourse of an emerging 'surveillance capitalism' and the rise of the machines. HMRC has pointed out that, as well as technology-focused solutions, there is also a need to shift away from top-down, stage-managed politics. On one side, governments are invading citizens' privacy by eavesdropping on social media; at the same time, governments are not listening enough to their citizens and are failing to recognise, or are ignoring, the voice of the citizen. This polemic seems rather naïve and sensationalist. HMRC's response is to side with Macnamara and build processes and systems that will enable the utilisation of big data and the empowerment of staff to respond to customer feedback. A key driver is the desire to use machine learning and artificial intelligence to achieve this. However, listening in on social media platforms such as Twitter and Facebook in an attempt to read public sentiment, if it is not coupled with serious attempts "to connect representation to institutional work of speaking for, to and with the represented" (Coleman, 2017: 106), is state surveillance.
The problem with implementing a 'big data' solution is not a culture of resistance to change but one of knowledge. The limitation of introducing big data analytics and data scientists with machine learning is that humans need to make judgements on what is generated: those judgements require human interpretation of the results and visualisations produced by the algorithms. According to Floridi (2012), the problem with big data is an epistemological, not a cultural, one.
The material presented here will assess who the self-declared experts and organisations laying claim to such expertise are – "and […] claiming the power to authorize what constituted acceptable knowledge in specific fields and what [does] not" (Robertson and Travaglia, 2016). The contention aimed at big data practitioners and democratic institutions, and at the methodological approaches they practise, is that they have the potential to undermine free will and autonomy. The goal of big data analytics is to change people's behaviour at scale. A Chief Data Scientist of a Silicon Valley company that develops applications to improve students' learning states that…
“The goal of everything we do is to change people’s actual behaviour at scale. When people use our app, we can capture their behaviours, identify good and bad behaviours, and develop ways to reward the good and punish the bad (emphasis added). We can test how actionable our cues are for them and how profitable for us.” (Zuboff, 2016)
Intending to "punish the bad" is a disciplinary rather than a control mechanism. This attitude contradicts at least two principles of ethical research, and potentially a third with wider social and cultural ramifications. Central principles of social science research are that subjects are afforded autonomy, beneficence and justice (Childress, Meslin and Shapiro, 2005). Individuals are to be treated as autonomous agents – this normally results in informed consent being sought before publication (a limited interpretation of autonomy, and one that needs to be addressed again); the researcher is to minimise harm and weigh it against the potential benefits of the study; and finally, the benefits are to be distributed in a just manner, with no undue denial of such benefits imposed on any member. Goals such as those outlined by the data scientist are indicative of at least an individual, and potentially a discipline, devoid of ethical training.
Social media platforms and the people who use them are not a representative sample of the population as a whole – they are self-selected at best, and possibly the already vocal within online debates and in wider society. The platforms – Twitter, Facebook and Instagram – are data brokers in the first instance, harvesting and selling user data on to third parties, usually for the better targeting of advertising. These social media data brokerage firms are the visible vanguard of surveillance capitalism. The option not to take an active role on these platforms exists but is severely curtailed, as it comes with a sense of opting out of the contemporary world (Cegłowski, 2016).
From its early days, the internet was viewed as a potential way of escaping top-down, heavily managed political performances: suddenly anyone could be a producer as well as a receiver of political messages. For Coleman (2017), governments and global institutions have failed to democratise their ways of operating; the opportunity to reinvent and re-strengthen democracy for the 21st century has been missed.
Coleman's view is that what needs to change is not the technology or the culture but the political architecture upon which democracy rests. On the one hand we have a technologically focused drive with a top-down structure; what is needed is a reorientation of those structures towards the human agent, the cultural turn required to meet the crisis of listening.
There are two approaches: one dominated by a large, macro-level structuralist understanding of human behaviour, and another that allows micro events and the personal understandings of agents also to influence changes in the environment. These disciplines are not clearly delineated – there are those within the Data Science discipline who are prepared to acknowledge the utility of human interpretation of data over algorithmic accounts. A principal Data Scientist at @BoozAllen recently stated that only using computer algorithms for visualisation…
“[…] can miss salient (explanatory) features of the data [therefore] a data analytics approach that combines the best of both worlds (machine algorithms and human perception) will enable efficient and effective exploration of large high-dimensional data”.
Data Science generates crude quantitative knowledge, or "calculated publics" (Gillespie, 2014). It also creates crude, reductive calculated customer/citizen types. What is needed is an acknowledgment of the limitations of the quantitative approach, opening the door to a cultural acknowledgment of qualitative approaches.
Such calculated citizen types are devoid of individual citizen responses to political, cultural and social intervention, and wilfully disrespectful of the autonomy of the people involved and of the dynamics of state–citizen interaction. They are also without notions of geographic location, postcode, gender, age, class or social status: a classification of people and groups without any reference to work from the social sciences. There are calls among data science practitioners for a shift to include more social scientific approaches.
There is a core narrative running through the department's dominant story of Making Tax Digital: the idea that digital online self-service applications and websites will somehow do away with the expensive telephone capabilities within the department. It is founded on the same march-of-technology story that has preceded most shifts in the uptake of the latest piece of technology: radio was going to replace newspapers; television was going to replace radio; computers were going to replace paper. What happens is not one medium replacing another, but content or processes moving while continuing to rely on the others. Many transactions between HMRC and its 'customers' can be facilitated by the move to digital platforms, but the telephone – or voice-to-voice interaction between two people – will still be required; whether that is human to human or human to chatbot remains to be seen. What we are witnessing is an evolution of the media ecology. Government departments are enthralled by the prospect of moving the cost of interactions from the department to the consumer, jumping on the 'home manufacturing' or 'self-service' bandwagon.
The promotion of self-quantifying applications for governance purposes is part of the ongoing increase in "home manufacturing" (Lambert, 2015: 251-252). To the growing list of unpaid labour – the self-service petrol station, the self-checkout machine at the supermarket, check-in machines at the airport, ticket vending machines at tube, train and bus stations, ATMs at the bank, self-service fast food restaurants – add self-help applications that facilitate and monitor a citizen's governance. Governance apps, and the mechanisms that facilitate them, are another way of collecting data on people.
Big data analytics is built on a myth that tries to hide the reality of the situation: pairing human behaviour with technological innovation results in surplus behavioural value. The business world needs to convince the social world that what it trades in – data – is worthless.
The real problems with big data are not the quantity or quality of the data. One is epistemological (Floridi, 2012); another is the lack of impact of big data analytics, described as modest at best in a 2016 report that surveyed 448 senior executives and professionals based in the United States, from pharmaceuticals, medical devices, IT and telecoms, on the current state of marketing and sales analytics.
What follows here is the presentation of the tools of big data process monitoring that are being used to listen to the voice of citizens in relation to HMRC in the United Kingdom. The methodologies that are to be applied here will ultimately shape the insights gained. The tools applied to this digital context will be digital and such an approach will render digital insights; an issue discussed at length by others (Baym, 2013; Boyd and Crawford, 2012; Clough et al., 2015; Gitelman and Jackson, 2013; Kitchin and Lauriault, 2014; Kitchin, 2014; Manovich, 2011; Van Dijck, 2014). Once the digital methods, tools and interpretations have been presented the material will then move to a more human centred analysis. Rather than move to the application of qualitative small scale methods the intention is to place the issues and insights within wider political economic and communicative interpretations of what is going on with big data and governance.
Social Media Analytics in Practice
A link to the slides presented on 8 March 2018 at the "Answering Social Science Questions with Social Media Data" conference hosted by NSMNSS. Applying the data mining techniques of the social media analytics software Brandwatch and the visualisation tool Vizia to one social media platform (Twitter), with two forms of analysis – social network analysis and content analysis – the goal is to re-present the techniques and tools of data scientists. The tools here will also include machine learning and automated approaches.
HMRC Listening Organisation
Data is never raw: it is always the output of an algorithm that requires validation. 'Data' is the result of a long chain of requirements, goals and (in the case of big data) a wider political economy. Without context and meaning, the data becomes fetishised. The 'insights' are at the macro level – devoid of context, and therefore of meaning – and categorised into certain 'calculated publics'. Citizens and the public should have some knowledge of how these calculations are performed, to aid us in navigating their outcomes, particularly in relation to our governance and the governance of the wider public. The algorithmic black box needs to be unpacked, and the assemblages of control that reside within these instruments need to be displayed and debated.
- What questions can be answered using social media analytics in a governance capacity?
- How does an organisation show that it is listening to its customers?
- What role can Machine Learning and AI play in the interface between state and citizen?
- Are epistemological and ethical concerns playing their role in the uptake of Machine Learning, AI and algorithms?
- What impact is the introduction GDPR (General Data Protection Regulation) likely to have?
The methodological contention here is that big data does not represent what we think it represents: it does not represent the social structure or patterns of interaction at a macro scale. The data presented here is presented in a way that is, hopefully, worthy of our consideration (Robertson and Travaglia, 2014). It is not objective – it does not represent how the wider discourse surrounding policies is conducted on other platforms and in face-to-face interaction. What the approach applied here can tell us is which organisations and which people are 'influencing' the debates about big data and governance on Twitter. Hopefully it is also clear that what is being presented is a peek behind the curtains, a look under the hood, a light shone into the black box of big data analytics and data mining. It will in some small way enable us to see how societies are to be regulated if left to the practices and procedures of data scientists. It looks at a sphere of the social as seen through the prism of datafication and provides insight into the references and meanings that are being constructed; it is not only a glimpse of a limited percentage of the population of data scientists and big-data-and-governance analysts, but also a glimpse of the various ways in which they intend to define, manage and govern us: to capture our behaviours, identify good and bad behaviours, and develop ways to reward the good and punish the bad among us.
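In its simplest form, identifying who is 'influencing' a debate on Twitter reduces to ranking accounts in a mention network. The sketch below is purely illustrative, in plain Python: the tweets and account names are invented, and Brandwatch's own proprietary pipeline is not reproduced here.

```python
# Illustrative sketch of the social network analysis step: build a directed
# mention network from tweets and rank accounts by how often distinct
# authors address them (in-degree). All tweets here are invented.
from collections import Counter

tweets = [
    {"author": "alice", "mentions": ["hmrcgovuk"]},
    {"author": "bob", "mentions": ["hmrcgovuk", "alice"]},
    {"author": "carol", "mentions": ["alice"]},
]

# Edges of the mention network: (author, mentioned account).
edges = [(t["author"], m) for t in tweets for m in t["mentions"]]

# In-degree: how many distinct authors mention each account.
in_degree = Counter(m for _, m in set(edges))

# Accounts mentioned most often are candidate 'influencers' in the debate.
for account, degree in in_degree.most_common():
    print(account, degree)
```

Content analysis would then look at what is said in these tweets, while centrality measures richer than in-degree (or a graph library) could refine the ranking; the principle of deriving 'influence' from interaction structure is the same.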
Big data and governance not only have problems regarding matters such as privacy, ownership and a lack of tangible results (so far); they also have one that has less to do with the quantity and quality of data, or even the technicalities surrounding it. It is an epistemological problem (Floridi, 2012): big data does not have a clearly distinct set of criteria that must be met for it to make assertions that are not simply statements of belief. Big data lacks a theory of knowledge. Some claim to have found such a theory – Pentland (2014) argues that he has created a true social physics.