Social media data is of great value to researchers. What people say on platforms such as Twitter can tell us a great deal about society’s attitudes. It can also allow us to capture reactions to political events in the moment. However, since the Cambridge Analytica scandal – which involved data from millions of Facebook users being mined and used for political ads to sway the 2016 US presidential election – people are more sceptical of how their data is used.
For the last two years the National Centre for Social Research (NatCen) has hosted an event at Twitter UK as part of the Festival of Social Science, run by the Economic and Social Research Council (ESRC).
This year the event will focus on how social media data can be used in research that benefits society, as well as what ethical considerations there are in this field. It will also look at how things have changed since last year in this incredibly fast-moving and topical field.
- Welcome: Joe Rice, Data & Enterprise Solutions, EMEA, Twitter UK
- Introduction from the chair: Gerry Nicolaas, Director of Methods, National Centre for Social Research
- Talk: Kenneth Cukier, Senior Editor at The Economist – How social media data allows us to see different
- Talk: Cath Sleeman, Quantitative Research Fellow at Nesta – The state of interactive data visualisation
- Talk: Walid Magdy, Assistant Lecturer at the University of Edinburgh School of Informatics and Faculty Fellow at the Alan Turing Institute – Assessing online behaviour to measure the existence of prejudice and social inclusion
- Talk: Dr Elena Martellozzo, Criminologist and Senior Lecturer at Middlesex University – How social media is shaping criminal research
- Talk: Dr Steven McDermott, Data Scientist at HMRC/Member of Government Social Research Group (GSR) – The impact of social media research on public policy
- Audience Q&A
An extract from “The role of informal networks in creating knowledge among health-care managers: a prospective case study” https://www.journalslibrary.nihr.ac.uk/hsdr/hsdr02120/#/full-report
Leximancer is computer software that conducts quantitative content analysis using a machine learning technique. It learns what the main concepts are in a text and how they relate to each other. It conducts a thematic analysis and a relational (or semantic) analysis of the interview data. Leximancer provides word frequency counts and co-occurrence counts of concepts present in the transcripts of the narrative interviews. It is:
[A] Method for transforming lexical co-occurrence information from natural language into semantic patterns in an unsupervised manner. It employs two stages of co-occurrence information extraction—semantic and relational—using a different algorithm for each stage. The algorithms used are statistical, but they employ nonlinear dynamics and machine learning.
Smith and Humphreys, p. 2686
Once a concept has been identified by the machine learning process, Leximancer then creates a thesaurus of words that are associated with that concept, giving the ‘concept its semantic or definitional content’.87
This makes us aware of the larger context of all the narrative interviews of the cluster and of the prominence of certain concepts, and ensures that we do not become fixated on some concepts to the detriment of others. Leximancer uses a combination of techniques, such as Bayesian statistics, to record the occurrence of a word and connect it to the occurrence of a series of other words. It then quantifies those outputs by coding segments of text, from one sentence to groups of sentences. As the data set presented here is relatively small, we are looking at the data sentence by sentence. Each word or concept is associated with a subset of related terms. In the next step, the machine learns from the concepts already uncovered and links them to other concepts, creating a ‘concept space’. It then iteratively creates a thesaurus around a group of seed concepts. This information is visualised using network analysis.
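The word frequency and sentence-level co-occurrence counts described above can be illustrated with a minimal sketch. This is plain counting only, not Leximancer's own algorithm (which the source describes as using Bayesian statistics and machine learning); the function name and the sample text are hypothetical.

```python
import re
from collections import Counter
from itertools import combinations


def sentence_cooccurrence(text):
    """Count word frequencies and how often two words share a sentence."""
    # Naive sentence split on ., ! and ? (an assumption made for brevity).
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    word_counts = Counter()
    pair_counts = Counter()
    for sentence in sentences:
        words = re.findall(r"[a-z']+", sentence.lower())
        word_counts.update(words)
        # Each unordered pair of distinct words is counted once per sentence.
        for pair in combinations(sorted(set(words)), 2):
            pair_counts[pair] += 1
    return word_counts, pair_counts


# Hypothetical sample text, not taken from the interviews.
sample = (
    "Smoking harms health. Public health services support people. "
    "People talk about smoking."
)
word_counts, pair_counts = sentence_cooccurrence(sample)
print(word_counts["smoking"], pair_counts[("people", "smoking")])
```

Pairs are stored in alphabetical order, so a lookup must use `("people", "smoking")` rather than the reverse. No stop-word filtering is applied here.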
Emergent themes are then visible to the user, and are expandable using the map visualisation that links directly to the areas of the data in which the concept occurs. The themes map enables a quick reading of the narrative interviews. It lets us see what the dominant themes are, rather than imposing our own interpretations on the data. The proximity of two concepts indicates how often they appear in similar conceptual contexts. So, when two concepts are placed at a distance from each other, it indicates that they are not used in the same context. The themes are the coloured circles around clusters of concepts. The lines or pathways navigate the most likely path in conceptual space between concepts in order to aid reading the map. The connectivity score reflects the degree (equivalent to degree score in network analysis) to which the theme is connected to the other concepts in the map.
Re-presenting narrative interviews
We focus here on results from one of our sites, site 1, to illustrate our methods. A thematic analysis looking at the ranked ordering of the concept list was created and then a thesaurus for each concept was collected. The thesaurus list for each concept, presented in Table 26, shows the words most strongly related, either directly or indirectly, to the concept they define.87
At site 1, of the top 20 most important concepts, two – smoke/smoking and tobacco – are the focus of the cluster selected from the sociograms for the ‘Goes To’ network in round 2. This is the specific problem identified by the cluster. They are more generally concerned with the health of people; however, they have focused on smoking as the main hindrance to achieving public health. As the cluster is involved in public health, this is not surprising.
The cluster’s values and preferences are related to the urgency with which a working solution is required. The concept of time in Table 26, with an absolute count of 67 and a relative count of 20%, is present in the cluster’s cognition. The source of that urgency is thematically related to meetings, issue, year, local, working, person, services, public and support.
The value present in the cluster is that of public or more specifically public health. There is a level of uncertainty surrounding the problem of smoking as the suppose (absolute count of 99, relative count of 29%) and probably (absolute count of 96, relative count of 28%) concepts are prominent for this cluster, with obviously (absolute count of 91, relative count of 27%) less prominent.
TABLE 26 Site 1: top 20 ranked concepts
|Top 20 word-like concepts|Absolute count|Relative count|Thesaurus|
|---|---|---|---|
|People|337|100%|smoke, probably, working, trying, services, time, talk, group, prevalence, suppose|
|Smoking|273|81%|prevalence, service, services, smoke, working, somebody, talk, team, meeting, support|
|Health|243|72%|public, management, issue, support, look, team, services, smoke, service, coming|
|Work|200|59%|tobacco, public, trying, health, smoke, time, things, issue, probably, doing|
|Service|159|47%|spec, management, provider, year, smoking, look, working, public, health, doing|
|Cause|156|46%|stuff, smoke, management, things, involved, meetings, level, saying, work, talk|
|Public|126|37%|health, management, working, probably, team, smoke, year, issue, service, different|
|Things|121|36%|tobacco, different, issue, prevalence, look, cause, support, suppose, meetings, trying|
|Doing|121|36%|spec, somebody, year, stuff, look, service, probably, provider, work, used|
|Suppose|99|29%|trying, services, prevalence, smoke, talk, issue, things, money, meetings, probably|
|Probably|96|28%|thought, public, person, services, talk, people, coming, management, doing, trying|
|Group|96|28%|tobacco, person, trying, suppose, people, team, public, local, terms, support|
|Different|93|28%|working, things, support, role, tobacco, look, public, meetings, level, service|
|Team|91|27%|support, public, health, smoking, level, involved, services, different, group, thought|
|Obviously|91|27%|management, provider, spec, prevalence, smoking, services, public, local, involved, role|
|Terms|85|25%|support, level, look, person, different, probably, tobacco, stuff, coming, work|
|Services|81|24%|smoking, probably, suppose, management, health, person, time, prevalence, team, tobacco|
|Tobacco|77|23%|things, group, coming, different, trying, work, smoke, issue, services, terms|
|Time|67|20%|used, meetings, issue, year, local, working, person, services, public, support|
|Look|66|20%|prevalence, spec, provider, role, things, health, service, different, doing, terms|
|Name-like|Absolute count|Relative count|Thesaurus|
|Tobacco Alliance|45|13%|person, prevalence, group, probably, meeting, tobacco, suppose, look, time, smoke|
|Site 1|40|12%|prevalence, trying, provider, coming, team, year, group, public, tobacco, services|
|PCT|40|12%|year, role, money, doing, different, cause, time, prevalence, provider, people|
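The relative counts in Table 26 appear to be each absolute count expressed as a percentage of the count for the top-ranked concept (People, 337). This is inferred from the numbers in the table rather than stated in the text, so the sketch below should be read as a consistency check under that assumption. The function name is hypothetical and only a few rows are reproduced.

```python
# Absolute counts copied from a few rows of Table 26.
absolute_counts = {
    "People": 337,
    "Smoking": 273,
    "Health": 243,
    "Time": 67,
    "Tobacco Alliance": 45,
}


def relative_counts(counts):
    """Express each count as a rounded percentage of the largest count."""
    top = max(counts.values())
    return {concept: round(100 * n / top) for concept, n in counts.items()}


relative = relative_counts(absolute_counts)
for concept, pct in relative.items():
    print(f"{concept}: {pct}%")
```

Under this assumption the reproduced rows match the table: Smoking 81%, Health 72%, Time 20% and Tobacco Alliance 13%.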
The range of activities which can be used to share and exchange knowledge related to the specific problem of smoking is within the cluster. These activities are situated within the context that the cluster is in. After a 20-year period of market-inspired organisational reform (managerialism, or the New Public Management), concepts such as service (absolute count of 159 and relative count of 47%) and public (absolute count of 126 and relative count of 37%) could be indicative of a social policy-orientated outlook rather than a managerial one. The thesaurus lists do contain the more market-inspired concept of management; however, as a concept in its own right, management appears with only a low absolute count of 30 and a relative count of 0.9%.
It appears that the cluster is social policy orientated. However, this is not unambiguously so. The concept of service is related to spec, management, provider, year, smoking, look, working, public, health and doing. The concept of public is related to health, management, working, probably, team, smoke, year, issue, service and different. So, the concept of service is orientated towards management more than towards public or health, whereas public relates to health and management more than to service. What this highlights is the level of ambiguity around the concept of management for the cluster.
How the concepts are semantically contextualised can be seen in the Leximancer concept map below (Figure 30). The map is a re-presentation of the relational or semantic characteristics of the concepts presented in Table 1. To paraphrase Rooney,87 the direct co-occurrence between concepts is extracted from the data and these direct links are based on the strength of relations between the concepts. The more often two concepts appear together in the same sentence, the more likely they are to be linked together. Leximancer then compares each concept’s thesaurus and creates indirect links between them, meaning that even when concepts do not appear in the same sentence together there can still be an indirect connection between them. So, Leximancer rank orders concepts and presents them according to the strength of association and semantic similarity. In Rooney’s words:
Concepts that are directly related but not necessarily strongly semantically linked can be far apart on the concept map while concepts that are strongly semantically related will be close to each other on the concept map . . . concepts that occur in similar semantic contexts tend to form clusters (or gather together).
Rooney, p. 41087
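The idea of an indirect link, where two concepts are connected because their thesauri resemble each other, can be illustrated with a simple overlap measure. The sketch below uses Jaccard similarity as a stand-in; the source does not say which comparison Leximancer uses, so this is an assumption for illustration only. The thesaurus entries are copied from the Service and Public rows of Table 26.

```python
# Thesaurus lists copied from Table 26 (Service and Public rows).
thesaurus = {
    "service": {
        "spec", "management", "provider", "year", "smoking",
        "look", "working", "public", "health", "doing",
    },
    "public": {
        "health", "management", "working", "probably", "team",
        "smoke", "year", "issue", "service", "different",
    },
}


def jaccard(a, b):
    """Share of terms common to both sets out of all terms in either set."""
    return len(a & b) / len(a | b)


# Terms that appear in both thesauri, and the resulting overlap score.
shared = thesaurus["service"] & thesaurus["public"]
similarity = jaccard(thesaurus["service"], thesaurus["public"])
print(sorted(shared), similarity)
```

Here the two thesauri share four terms (health, management, working and year) out of sixteen distinct terms. Note that exact string matching treats smoke and smoking as different terms.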
FIGURE 30 Leximancer map of the 11 site 1 narrative interviews combined. General concepts are in black with themes in colour.
The coloured circles indicate the thematic space of a theme with the label of that theme at the centre. The words in black are the concepts and the lines between are links that tell us which concepts are semantically linked. When two or more circles overlap it indicates that the themes are semantically related to each other.
Figure 31 shows that, for the cluster, the most dominant theme is SMOKING, followed by PEOPLE, CAUSE, TIME, SAYING, OBVIOUSLY, INVOLVED, GROUP, TERMS, LEVEL, TALK, TOBACCO, ALLIANCE, MONEY, THOUGHT and USED. The proximity of the SMOKING, PEOPLE, CAUSE, TIME, OBVIOUSLY and INVOLVED themes indicates that they are related to each other in a chain-like manner. GROUP, TERMS, LEVEL, TALK, TOBACCO, ALLIANCE, MONEY, THOUGHT and USED are semantically isolated. The name-like concepts of Tobacco Alliance and PCT are not directly connected to the dominant theme of SMOKING, while site 1 is within the SMOKING theme. Therefore, site 1 resides in the same semantic space and is connected to SMOKING, while Tobacco Alliance and PCT are not.
Focusing on the theme of SMOKING (see Figure 31), it is associated and linked with smoking, health, services, prevalence, support, spec, site 1, different and look. However, smoking is not linked directly but indirectly to management, team or working. SMOKING is also semantically associated with management, as the two themes overlap slightly.
FIGURE 31 Close-up of the SMOKING theme from site 1.
We tell more than we realise we know
Taking Polanyi’s concept of ‘knowing more than we can tell’, we can reformulate it to read ‘we tell more than we realise we know’, to paraphrase Zappavigna (p. 298).88
This is the position that speakers express what they tacitly know through grammatical patterns without being aware they are doing so. By carefully analysing the grammatical patterns of ‘under-representations’88 in texts, we can bring to the fore the tacit knowledge assertions of our interviewees.
The next phase of the analysis of the interview texts makes explicit that which is tacit by looking at the function of the grammatical choices the interviewees are making.
Systemic Functional Linguistics (SFL) is an analytical method from Halliday89 which is concerned with grammar’s functionality or, rather, how it creates and expresses meaning. It regards grammar as a system of explaining things by referring to other things. Each system of interconnected words constructs the ‘meaning potential’ shaped by the semantic choices being made and the activity in the brain. SFL’s position is that when text is analysed it brings to the fore the meaningful choices made at the expense of the choices that were not made. This analysis goes beyond the usual procedures employed by others who typically look at scenarios90,91 and narratives92 and then deliver a running commentary on the text.
The functionality in language is central to language or rather the function of language is to convey experience and to generate interaction with others. With the construction of experience and interaction needing cohesion and continuity of text, a second function of language emerges – that experience and interaction require text. According to SFL, language has three ‘metafunctions’, ideational, interpersonal and textual, with the term ‘metafunction’ being used to ensure that function is regarded as an integral component of the interaction of the three terms.
The texts collected during the narrative interviews are ways of being that allow the relationship between the text and the persons involved to bring in the ‘below-view patterning in language’.88 SFL allows us to bring out the ‘tacit assumptions and ideological assumptions’ that characterise certain domains of discourse. This corresponds with Halliday’s interpersonal function of language. So when analysing the texts, this accounts for the social practices that are being realised in them.
Language is an abstract social structure that defines what is and is not possible. Orders of discourse are linguistic practices that select which linguistic elements are included and excluded, and texts or social events are the products of the mediation by orders of discourse. By focusing on the use of nominalisations, modality, generalisation and agency in what people commit themselves to when they make statements, ask questions, or make demands or offers in texts, we are able to categorise the tacit knowledge assertions that are being made during the narrative interviews.88
The following descriptions are taken from Zappavigna.88 Her approach is also based on the ideas of Nonaka and Takeuchi93 – that middle managers are knowledge engineers. When they are involved in creating mid-level business and product concepts they mediate between ‘what is’ (epistemic modality; is, are, was, were . . .) and ‘what should be’ (deontic modality; should, would, will, ought to be, can . . .). They remake reality, or engineer new knowledge assertions, according to the ideas they have received from meetings and documents from more senior or external inputs. ‘They facilitate all four varieties of knowledge conversion and engineer knowledge spirals between organisational levels (cross-levelling). Their essential skills are in project coordination, formulating hypotheses, integrative methodologies, facilitating dialogue, use of metaphor, ability to engender trust, and ability to envision the future based on an understanding of the past’ (Nonaka and Takeuchi).93
According to Zappavigna,88 these attempts at project co-ordination, formulating hypotheses, integrative methodologies, facilitating dialogue, use of metaphor, ability to engender trust, and ability to envision the future based on an understanding of the past are evident in the choices they make when talking about what they do. Analysing the specific words they use can highlight for us when they are facilitating knowledge.
Zappavigna argues that ‘the central linguistic process of tacit knowing is “under-representation”. The under-representation of meaning is how tacit knowledge is indicated in language.’88
The use of nominalisation in speech indicates an ongoing project. By looking at when the interviewees use nominalisation, we are seeing where the interviewee is presenting an ambiguous or unambiguous relationship with the statement they are making.
When they refer to processes as things, such as ‘health improvement’, which is an ongoing project that they are co-ordinating, they in fact see it as a project and they refer to it as an entity in its own right. The meaning of ‘a person’s need to do something’ (i.e. improve health) has become condensed with the use of ‘ment’ in improvement.
Nominalisations are demarcated by the use of suffixes (able, ad, age, agogy, al, ality, ative, ment, to name only a few) which are placed at the end of words.
Processes become things that act on other processes as things, then this relation of ‘acting upon’ itself becomes a thing. The unfolding of activity sequences are finally re-expressed as parts of composition taxonomies, as criteria for classifying the abstract entities they modify. Instead of a sensually experienced world of unfolding processes involving actual people, things, places and qualities, reality comes to be experienced virtually as a generalised structure of abstractions.
Rose, pp. 263–494
The use of modality in speech indicates the formulating of hypotheses [is, are, were] and the ability to envision the future [should, would, will].
Examples of modality are can, could, should, would, might, must and probably (this list is not exhaustive). They are an indicator of the level of certainty or uncertainty that the speaker has in regard to the assertion being made.
Modality contains meaning by embedding the agent motivating the opinion expressed. The use of modality in text under-represents agency or cause. For example, an IT professional might say ‘I should reassess this requirement’.
The use of the modal verb should is masking the ‘who’ or ‘what’ motivating the process of reassessing. It could be a command from a senior and not from the interviewee.
Rather than saying that something is a fact, speakers make generalisations in order to sound less direct and allow for uncertainty in the statement that they are making. Generalisations indicate to us the cognitive process and contents of the statement.
Generalisation contains meaning through underspecifying a concept and pattern. Examples of words that demarcate generalisations are some, a bit, a few, any, part of, complete, entire, none, no one, nothing and zero (again to name only a few). The generalisation usually follows these words.
General terms are not necessarily more abstract; a bird is no more abstract than a pigeon. But some words have referents that are purely abstract – words like cost and clue and habit and strange; they are construing some aspect of our experience, but there is no concrete thing or process with which they can be identified.
Halliday and Matthiessen, p. 61595
Generalisation underspecifies meaning and highlights assumptions; examples are system and programme.
Within Leximancer, there is a pre-set ability to conduct sentiment analysis. Simply put, sentiment analysis measures the attitude of a speaker or writer towards a concept, whether they express something positively or negatively. In order to conduct cognitive analysis, we have combined sentiment, nominalisation, generalisation and modality. By doing so we focus on what the interviewee holds to be pre-supposed or tacit knowledge, thereby enabling us to answer two questions: what do they know, and what do they not know?
Cognitive analysis using sentiment analysis settings
What types of knowledge is the cluster concerned with? Taking each concept as highlighted by Leximancer and extracting the complete thesaurus of all words related to that concept by Leximancer, we then count the number of uses of nominalisation, modality, generalisation and agency in relation to each concept (see Table 27).
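The counting step described above can be sketched as follows. This is a deliberately crude illustration, not the Leximancer configuration used in the study: the marker lists are the non-exhaustive examples given in the text, nominalisations are detected by suffix alone (so a word such as ‘local’ would be over-counted), agency is omitted, and the function name is hypothetical.

```python
import re

# Marker lists taken from the examples given in the text (not exhaustive).
MODALITY = {"can", "could", "should", "would", "might", "must", "probably"}
GENERALISATION = [
    "some", "a bit", "a few", "any", "part of", "complete",
    "entire", "none", "no one", "nothing", "zero",
]
NOMINAL_SUFFIXES = ("ment", "ality", "ative", "agogy", "able", "age", "al", "ad")


def count_markers(sentence):
    """Count modality, generalisation and nominalisation markers in a sentence."""
    text = sentence.lower()
    words = re.findall(r"[a-z]+", text)
    modality = sum(w in MODALITY for w in words)
    # Generalisation markers can span several words, so search the raw text.
    generalisation = sum(
        len(re.findall(r"\b" + re.escape(marker) + r"\b", text))
        for marker in GENERALISATION
    )
    # Crude suffix test; requires a stem of at least three letters.
    nominalisation = sum(
        any(w.endswith(s) and len(w) > len(s) + 2 for s in NOMINAL_SUFFIXES)
        for w in words
    )
    return {
        "modality": modality,
        "generalisation": generalisation,
        "nominalisation": nominalisation,
    }


# The example sentence quoted earlier in the text.
example = count_markers("I should reassess this requirement")
print(example)
```

For the quoted example, the sketch finds one modality marker (should) and one suffix-based nominalisation (requirement). In practice the counts would be restricted to text segments in which a given concept or its thesaurus terms occur.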
It is clear from Table 27 and Figure 32 that the cluster is predominantly involved in the use of nominalisations; this indicates ongoing projects being perceived as entities in their own right rather than processes. What Table 27 and Figure 32 do not tell us, however, is whether the cluster perceives these projects as ongoing or finished, whether they are making claims with epistemic certainty or uncertainty and whether their assertions are based on assumptions or ‘fact’.
TABLE 27 Site 1: top 20 ranked concepts by types of knowledge
|Concept|Nominalisation: project coordination|Modality: formulating hypotheses|Generalisations|Agency|
|Name-like|Nominalisation: project coordination|Modality: formulating hypotheses|Generalisations|Agency: power|
FIGURE 32 Nominalisation, modality and generalisation frequency for each concept of site 1 interviews.
What follows is an automated report generated by limiting the number of concepts to 23 listed in Figure 30, above, plus 2 GPs, and Public Health, as they were highlighted by Leximancer as potential names. The categories of interest are the interviewee data files. So, what we get is an analysis of each interviewee’s use of the top concepts for the cluster.
As well as that, the technology within Leximancer that analyses positive and negative sentiment has been altered to include categorisation of terms that indicate nominalisation, generalisation and modality. The results are presented in a high-level, visual chart displayed in a ‘magic quadrant’ format. One axis is relative frequency, which is a measure of the conditional probability of the concept given the categories of Sentiment, Nominalisation, Generalisation and Modality (cognitive analysis – positive or negative). For example, we are looking at the occurrence of positive or negative words when ‘health’ is mentioned. The axis labelled ‘strength’ is a measure of the conditional probability of the category (cognitive analysis – positive or negative) given the particular concept (e.g. how often is ‘service’ mentioned with positive or negative cognition?).
There are four areas to the quadrant, and the different colours of concepts refer to different interviewees’ accounts. Concepts in quadrant one (bottom left) are weak and less prevalent within the interviewee’s data – this is where negative Sentiment, Nominalisation, Generalisation and Modality manifest. Concepts in quadrant four (top right) are strong, prominent and more likely to co-occur with the category. This is where positive Sentiment, Nominalisation, Generalisation and Modality sit.
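The two axes described above are both conditional probabilities, taken in opposite directions. A minimal sketch, assuming each text segment has already been tagged with the concepts and categories it contains, is given below. The function name and the toy segments are hypothetical, and Leximancer's actual estimation may differ.

```python
def quadrant_scores(segments, concept, category):
    """Return (relative frequency, strength) for a concept and a category.

    segments: list of (concepts_in_segment, categories_in_segment) set pairs.
    """
    n_concept = sum(concept in concepts for concepts, _ in segments)
    n_category = sum(category in cats for _, cats in segments)
    n_both = sum(
        concept in concepts and category in cats for concepts, cats in segments
    )
    # Relative frequency: P(concept | category).
    relative_frequency = n_both / n_category if n_category else 0.0
    # Strength: P(category | concept).
    strength = n_both / n_concept if n_concept else 0.0
    return relative_frequency, strength


# Hypothetical tagged segments, for illustration only.
segments = [
    ({"health", "service"}, {"positive"}),
    ({"health"}, {"negative"}),
    ({"service"}, {"positive"}),
    ({"health", "smoking"}, {"positive"}),
    ({"smoking"}, {"positive"}),
]
freq, strength = quadrant_scores(segments, "health", "positive")
print(freq, strength)
```

In this toy data, half of the positive segments mention health (relative frequency 0.5), while two of the three health segments are positive (strength of about 0.67). A concept with both values high would fall in the top-right quadrant.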
Figure 33 indicates a low frequency for the majority of concepts except for terms and obviously and these are both from one interviewee. A majority of the concepts are also viewed negatively on the negative cognition scale.
FIGURE 33 Cognitive analysis quadrant of top 20 concepts: frequency and strength results for site 1.
When the data from Figure 33 are compared with the cognition scale frequency and strength results of the cluster, this generates Figure 34 (presented below). The concepts cause, service, health, smoking, people and work are viewed moderately positively on the cognition scale. They have also scored highly for cognition scale for each concept of site interviews in Figure 33.
FIGURE 34 Cognitive analysis quadrant of top 20 concepts: frequency and strength compared with cluster results for site 1.
The most striking aspect of Figure 34, which shows all interviews combined as well as the individual interviews, is that the Tobacco Alliance, which has a high frequency score, also has a negative or weak cognition score, meaning that the concept Tobacco Alliance is used in a manner that indicates that the cluster does not know what the Tobacco Alliance is, or what it intends to do. Work, cause, smoking, people, health and service are all within the positive quadrant of the scale, indicating that these terms are used positively and that the cluster knows what these things are. For the cluster, the concepts public, things, doing, suppose, probably, group, different, team, obviously, terms, tobacco, time, look, Tobacco Alliance, site 1 and PCTs fall into the negative, high-frequency quadrant.
Site 1 documents results
Figure 35 shows that for the cluster the most dominant theme is TOBACCO, followed by SMOKING, LOCAL, SUPPORT, GROUPS, SMOKEFREE, SCHOOL, ENSURE, SMOKING, YEAR, TOBACCO, CIGARETTES and PROJECT. The TOBACCO, LOCAL and GROUPS themes overlap, which indicates that they are related to each other in a chain-like manner. YEAR, SCHOOL and PROJECT are semantically isolated. The concepts of council, control, products, public, communities and inequalities are directly connected to the dominant theme of TOBACCO (Figure 36). The theme of SCHOOL is semantically isolated from the dominant theme of TOBACCO.
FIGURE 35 Leximancer default positions map of the documentation for site 1. General concepts are in black with themes in colour.
FIGURE 36 Close-up of the TOBACCO theme from documents for site 1.
TABLE 28 Site 1 documents top 20 ranked concepts
|Top 20 word-like concepts|Count|Relevance|Thesaurus|
|---|---|---|---|
|smoking|797|100%|prevalence, quit, risk, children, smokers, likely, reduce, social, groups, service|
|tobacco|774|97%|products, control, illicit, councils, use, communities, key, public, local, reduces|
|control|650|82%|councils, illicit, products, tobacco, use, key, communities, reduce, national, programme|
|inequalities|458|57%|health, councils, public, use, approach, communities, reduce, control, services, national|
|health|446|56%|inequalities, councils, public, reduce, use, approach, communities, services, control, social|
|people|405|51%|young, children, social, likely, smokers, quit, groups, products, smoke, use|
|young|393|49%|people, children, social, likely, smokers, quit, groups, smoke, products, group|
|local|327|41%|services, effective, national, areas, communities, approach, public, partnership, community, use|
|smokers|267|34%|quit, likely, groups, communities, services, impact, cigarettes, year, range, prevalence|
|illicit|253|32%|products, programme, control, tobacco, partnership, reduce, working, impact, communities, key|
|smoke|247|31%|children, risk, likely, smoke-free, legislation, cigarettes, smokers, people, year, young|
|support|229|29%|services, effective, local, staff, areas, quit, legislation, ensure, national, research|
|use|177|22%|reduce, social, communities, impact, national, range, products, areas, interventions, councils|
|work|158|20%|partnership, legislation, effective, national, working, public, local, including, programme, reduce|
|communities|157|20%|key, approach, councils, partnership, public, effective, social, use, local, reduce|
|groups|157|20%|social, likely, smokers, key, range, group, services, areas, research, communities|
|school|151|19%|policy, staff, smokefree, ensure, legislation, including, support, community, children, smoking|
|public|149|19%|approach, legislation, communities, inequalities, health, working, reduce, local, partnership, work|
|councils|149|19%|inequalities, key, communities, health, approach, control, use, tobacco, services, range|
|prevalence|136|17%|reduce, areas, national, smoking, groups, smokers, further, year, services, likely|
|England|123|15%|year, public, prevalence, reduce, young, people, national, children, control, research|
|Smoking|86|11%|risk, prevalence, groups, social, smoking, interventions, further, including, year, smokers|
|R&M|71|9%|groups, smokers, communities, group, key, likely, impact, social, councils, quit|
|Tobacco|56|7%|products, control, use, tobacco, smoke, cigarettes, public, social, people, young|
Note on R&M (‘routine and manual’) smokers
The term ‘routine and manual’ (R&M) is widely used by NHS partners, but is less commonly used by councils where deprivation and geographical classifications take precedence over occupational classifications. R&M smokers are defined by their occupation according to the Standard Occupational Classification (SOC) codes where jobs are classified by their skill level and skill content. The SOC codes for R&M groups include occupations such as lower supervisory and technical or routine and semi-routine occupations. While R&M smokers are defined by their occupation, most non-employed people (the unemployed, the retired, those looking after a home, those on government employment or training schemes, the sick, and people with disabilities) are classified according to their last main job. This means that many individuals who fall into the R&M category are not employed in R&M occupations. This qualification is important, particularly in the context of the current economic climate, with increased unemployment levels and worklessness being a key priority for many councils.
Comparison of site 1 documents against cluster (Figure 37)
FIGURE 37 Cognitive analysis quadrant of top 20 concepts: frequency and strength results for site 1 with comparison with all interviewee data for the same site.
We hosted our Cambridge Artificial Intelligence Summit, sponsored by Cambridge Judge Business School Executive Education, on 15–16 June, welcoming Analysts, Data Scientists and Researchers to network, develop new skills and gain insight into the evolving field of Data Science.
Dr Steven McDermott
Qualitative Analysis and Social Media Lead, HMRC
Session: AI as Moderator/Mediator in the Recognition of Citizen’s Voice with Social Media at the Cambridge AI Summit in June 2018.
Abstract: Government departments are now utilising customer feedback channels and social media in an attempt to respond to crowdsourced insights and eventually inform policy. They are also using social media listening platforms to listen in to conversations taking place regarding their departments, and taking tentative steps into machine learning and AI techniques. The debates surrounding these tools have tended to frame such activity as surveillance and as opening up the possibility of Armageddon with the rise of the machines. However, how can the voice of the citizen be recognised and responded to if these departments are discouraged from listening and using the latest tools? Does the utilisation of social media, machine learning and AI offer the potential means of escaping from the stranglehold of top-down, stage-managed politics? If millions of people could be the producers as well as receivers of political messages, could that invigorate democracy? And what role will machine learning and AI play in this emerging new media ecology? I intend to present a peek behind the curtain regarding the level of listening that is taking place and how machine learning and AI are being applied, asking whether this can be done ethically and in a way that enhances democratic processes and improves evidence-based policy decisions. In which ways will democratic institutions have to change in order to meet these challenges?
Join our community of hundreds of researchers, analysts and data scientists for an opportunity to network, develop new skills and gain insight into the evolving field of data science.
Hear from industry and academic speakers representing a range of sectors, from research and bioinformatics to business and finance.
Learn about the practical application and implementation of the latest tools and techniques through industry case studies.
Share knowledge, pick up new ideas and connect with developers, analysts, researchers and executives.
The Data Science Summits are all about putting research into action. You can see how the latest techniques are implemented, network with other leaders and specialists in the field who make research actionable, and get insight into how you can help transform your company, your teams and the way you work.
Sarah Curshen, Director of Executive Education Custom Programmes, Cambridge Judge Business School
Prof. Kenneth Benoit
Professor of Quantitative Social Research Methods, London School of Economics
Session: Quantitative Text Mining, the Social Scientific Way: Mining Social Media on Brexit
Dr. Sebastian Kaltwang and Brook Roberts
Machine Learning Engineer, FiveAI
Session: Overcoming the Data Bottleneck for Self-driving Cars
Cloud Developer Advocate, Google
Session: Google Cloud AutoML
Artificial Intelligence DevRel EMEA, Nvidia
Session: Artificial intelligence and the evolution of the computing platform
Dr Haitham Bou-Ammar
Head of Reinforcement Learning and Tuneable AI, Prowler
Session: Data-Efficient Reinforcement Learning
Dr Maksim Sipos
Session: Automated feature extraction and selection for challenging time-series prediction problems
Dr Jeremy Bradley
Lead Data Scientist, Royal Mail
Session: Data Science as a Transformative process
Dr Steven McDermott
Qualitative Analysis and Social Media Lead, HMRC
Session: AI as Moderator/Mediator in the Recognition of Citizen’s Voice with Social Media
- Value: This project is open to self-funded students worldwide.
UK and EU applicants are eligible for funding from the EPSRC NPIF scholarship.
- Number of awards: 1
- Deadline: 31 May 2018
Type of project
Competition funded PhD projects
Contact Dr Georgios Aivaliotis to discuss this project further informally.
HMRC collects a wealth of data on the tax compliance of companies and individuals. Sometimes people and companies do not pay the correct amount of tax on time, for a variety of reasons (e.g. lack of knowledge, lack of ability, evasion). The data collected are “big”, that is, they cover a high number of variables and many clients, and they are both temporal (time-stamped) and static in nature.
The aim of this project is to develop the methodology needed to extract information from the data, and to apply machine learning and pattern mining alongside classical statistical techniques to predict which cases are most likely to result in non-compliance, so that early action can be taken. Linking SME and HMRC data will be an additional possibility and challenge. As a follow-up, economic models will be developed to examine the cost of interventions and which actions are economically meaningful for ensuring compliance.
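As a rough illustration of the kind of prediction described above, the following is a minimal sketch only, not the project's methodology. It assumes a handful of invented client records, a single temporal summary (the share of late payments), one hypothetical static flag (`new_business`) and a plain logistic regression fitted by gradient descent. All names and data here are assumptions made for the example.

```python
import math
from datetime import date

# Hypothetical toy records, invented for illustration only (not HMRC data).
# Each client: time-stamped payments as (due, paid), one static flag, known outcome.
CLIENTS = [
    {"payments": [("2017-01-31", "2017-01-20"), ("2017-07-31", "2017-07-31")],
     "new_business": 0, "non_compliant": 0},
    {"payments": [("2016-01-31", "2016-01-15"), ("2016-07-31", "2016-08-10"),
                  ("2017-01-31", "2017-01-31"), ("2017-07-31", "2017-07-01")],
     "new_business": 1, "non_compliant": 0},
    {"payments": [("2016-07-31", "2016-07-01"), ("2017-01-31", "2017-01-05"),
                  ("2017-07-31", "2017-07-20")],
     "new_business": 0, "non_compliant": 0},
    {"payments": [("2017-01-31", "2017-03-15"), ("2017-07-31", "2017-10-02")],
     "new_business": 0, "non_compliant": 1},
    {"payments": [("2017-01-31", "2017-01-31"), ("2017-07-31", "2017-09-30")],
     "new_business": 1, "non_compliant": 1},
    {"payments": [("2016-01-31", "2016-02-20"), ("2016-07-31", "2016-07-31"),
                  ("2017-01-31", "2017-04-01"), ("2017-07-31", "2017-08-15")],
     "new_business": 0, "non_compliant": 1},
]


def features(client):
    """Combine temporal and static data into one feature vector."""
    late = [date.fromisoformat(paid) > date.fromisoformat(due)
            for due, paid in client["payments"]]
    late_share = sum(late) / len(late)                  # temporal summary
    return [late_share, float(client["new_business"])]  # plus the static flag


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, y, lr=0.5, epochs=2000):
    """Fit weights and bias by batch gradient descent on the logistic loss."""
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        # Prediction error for every training case under the current model.
        errs = [sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
                for xi, yi in zip(X, y)]
        w = [wj - lr * sum(e * xi[j] for e, xi in zip(errs, X)) / n
             for j, wj in enumerate(w)]
        b -= lr * sum(errs) / n
    return w, b


def risk(model, client):
    """Predicted probability of non-compliance for one client."""
    w, b = model
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, features(client))))


X = [features(c) for c in CLIENTS]
y = [c["non_compliant"] for c in CLIENTS]
model = fit_logistic(X, y)

# Rank cases so that the highest predicted risk comes first.
ranked = sorted(range(len(CLIENTS)),
                key=lambda i: risk(model, CLIENTS[i]), reverse=True)
print(ranked)
```

Ranking cases by predicted risk is one simple way to decide where early action might be directed. A real study would need far richer temporal features, proper validation on held-out data and attention to the cost of interventions.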
The successful PhD candidate will work under the guidance of both academic and industrial (HMRC Digital Academy and Cambridge Spark) supervisors. HMRC and Cambridge Spark will provide expertise in the data, training and the possibility of working onsite. Cambridge Spark offers a variety of training, conferences and workshops in AI and data analytics methodology. HMRC Digital Academy runs a series of regular seminars and is investing in research in data analytics.
Applicants should have, or expect to obtain, a minimum of a UK upper second class honours degree in Mathematics or a related discipline, or equivalent. Applicants whose first language is not English must also meet the University’s English language requirements.
How to apply
Formal applications for research degree study should be made online through the university’s website. Please state clearly in the research information section that the PhD you wish to be considered for is ‘Predictive analytics for tax compliance’, and name Dr Georgios Aivaliotis as your proposed supervisor.
If English is not your first language, you must provide evidence that you meet the University’s minimum English Language requirements.
We welcome scholarship applications from all suitably-qualified candidates, but UK black and minority ethnic (BME) researchers are currently under-represented in our Postgraduate Research community, and we would therefore particularly encourage applications from UK BME candidates. All scholarships will be awarded on the basis of merit.