By Benjamin Timmermans @8w

My research is about

Using crowdsourcing to extract human interpretation

I'm trying to achieve

Understand controversies through human and machine data

My research question is

How to enable the discovery and understanding of controversial issues and events by combining human-machine active learning workflows.


  • The ControCurator Project
  • Motivation
  • State of the Art
  • Methodology

The ControCurator Project


Who's Involved

Lora Aroyo: Crowdsourcing (non-expert) annotations from CrowdTruth

Evangelos Kanoulas: Combining active learning with expert and crowd annotations

Johan Oomen: Analysis and mining of broadcast news

Edwin Kuipers: Analysis and mining of user-generated data

Understanding controversies for

  • Journalists, News agencies
  • Media professionals
  • Companies
  • Governments
  • General public (Preventing filter bubbles)

Goal of the project

How to enable the discovery and understanding
of controversial issues and events by combining
human-machine active learning workflows.



noun | con·tro·ver·sy | \ˈkän-trə-ˌvər-sē\

An argument that involves many people who strongly disagree about something


Prolonged public disagreement or heated discussion

Oxford English Dictionary

When people often disagree and debate with opposing viewpoints

Mejova, Y., Zhang, A. X., Diakopoulos, N., & Castillo, C. (2014).

Aspects of a Controversy


Arguments, Viewpoints, Disagreement, Discussion

about anything

by many people

is strong, is heated, is prolonged, is public

Wicked problems

Problems that are (near) impossible to solve

Solutions are not right or wrong, but good or bad.

as defined by Rittel and Webber in 1973

Example Topics

US Presidential Elections
  • Trump
  • Hillary
Vaccination
  • Pro vaccination
  • Anti vaccination
Climate Change
  • Humans are responsible
  • Humans are not responsible

US Presidential Elections

Trump's statements are controversial
  • He strongly expresses opposing views
  • He is popular and the media fuels it
  • He says things at the right moment
  • They add to him being controversial

Hillary's actions are controversial
  • Her email scandal
  • Her health issues


Pro vaccination
  • Lots of scientific evidence
  • Vaccines save lives
  • Not vaccinating also hurts the vaccinated

Anti vaccination
  • Has existed since 1700
  • Claimed link to autism
  • Freedom of choice
  • It is unnatural


US Presidential Elections

Climate Change

Defining the problem:

Is it about identifying (opposing) views?




Being able to capture and model a lot of content,
with different granularities and temporal aspects

State of the Art


Analysing controversies using Wikipedia discussions

Cons: Limited to Wikipedia

Automated controversy detection on the web.

Dori-Hacohen, Shiri, and James Allan. ECIR 2015.

Linking topics to Wikipedia discussions

Cons: Limited to Wikipedia articles

Mining Contrastive Opinions on Political Texts using Cross-Perspective Topic Model.

Fang, Si, Somasundaram, & Yu.
ACM Web Search and Data Mining 2012

Topic modeling on Dutch parliamentary discussions

Cons: Assumes that the opposing parties and the number of topics are known in advance

Controversy and sentiment in online news.

Mejova, Y., Zhang, A. X., Diakopoulos, N., & Castillo, C. (2014).

Crowdsourcing words indicating controversies

Cons: If someone says something is controversial, it does not mean it is controversial

Identifying Controversial Issues and Their Sub-topics in News Articles

Yoonjung Choi, Yuchul Jung, Sung-Hyon Myaeng, PAISI'10

Sentiment analysis of topics

Cons: Limited to sentiment analysis and web articles

The gap we are filling

  • No predefined topics
  • Not limited to Wikipedia
  • Unknown parties in the debate
  • Not limited to sentiment analysis
  • Combining different media and features



Anti-Vaccination debate: QuPID dataset

350 related web documents:

  • News articles
  • Blog posts
  • Dictionaries

Extending with:
  • CrowdyNews: Social Media
  • NISV: Broadcast news
  • VU: Crowdsourcing / Nichesourcing

How documents reference each other


  1. Gather related documents
  2. Extract features that indicate controversy
  3. Train predictive algorithm
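
The three steps above can be sketched in a few lines of Python. This is only a toy illustration under loud assumptions: the documents, the cue-word list, the two features and the plain logistic regression are all hypothetical stand-ins, not the features or the algorithm used in the project.

```python
import math

# Hypothetical cue words; a real lexicon could come from crowd annotations.
CUE_WORDS = {"controversial", "debate", "dispute", "oppose", "scandal"}
NEGATIONS = {"not", "no", "never"}


def extract_features(text):
    """Step 2: map a document to [cue-word rate, negation rate]."""
    tokens = [t.strip(".,!?") for t in text.lower().split()]
    n = max(len(tokens), 1)
    cue_rate = sum(t in CUE_WORDS for t in tokens) / n
    negation_rate = sum(t in NEGATIONS for t in tokens) / n
    return [cue_rate, negation_rate]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train(docs, labels, epochs=1000, lr=1.0):
    """Step 3: fit a logistic regression with stochastic gradient descent."""
    X = [extract_features(d) for d in docs]
    weights, bias = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for x, y in zip(X, labels):
            p = sigmoid(sum(w * v for w, v in zip(weights, x)) + bias)
            error = y - p
            weights = [w + lr * error * v for w, v in zip(weights, x)]
            bias += lr * error
    return weights, bias


def predict(model, text):
    """Return the estimated probability that a document is controversial."""
    weights, bias = model
    x = extract_features(text)
    return sigmoid(sum(w * v for w, v in zip(weights, x)) + bias)


# Step 1: a toy stand-in for the gathered documents (1 = controversial).
DOCS = [
    "Critics oppose the vaccine mandate and the debate is not over",
    "The scandal started a heated dispute in parliament",
    "The museum opens a new exhibition next week",
    "Rain is expected in the north on Friday",
]
LABELS = [1, 1, 0, 0]

model = train(DOCS, LABELS)
```

Calling predict(model, text) on an unseen document returns a score between 0 and 1. In this toy setup the score is driven almost entirely by the cue-word rate, which is exactly the weakness noted for lexicon-based approaches in the state of the art.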

Controversy Barometer

  • Identifying controversial claims
  • Combining machine and crowd annotations in an active learning workflow
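
One way to picture such a workflow is uncertainty sampling: the machine scores every claim, and only the claims it is least sure about are sent to the crowd. The sketch below assumes this selection strategy and a simple vote fraction as the crowd score; the function names, claim identifiers and numbers are hypothetical and not taken from the project.

```python
def uncertainty(p):
    """1.0 when the machine score is 0.5, 0.0 when it is 0 or 1."""
    return 1.0 - 2.0 * abs(p - 0.5)


def select_for_crowd(machine_scores, budget):
    """Pick the `budget` claims whose machine score is closest to 0.5."""
    ranked = sorted(
        machine_scores,
        key=lambda claim: uncertainty(machine_scores[claim]),
        reverse=True,
    )
    return ranked[:budget]


def crowd_score(votes):
    """Fraction of workers who marked the claim as controversial.

    Keeping a fraction instead of a majority label preserves the
    disagreement between workers.
    """
    return sum(votes) / len(votes)


def merge(machine_scores, crowd_votes):
    """Replace machine scores with crowd scores wherever the crowd voted."""
    merged = dict(machine_scores)
    for claim, votes in crowd_votes.items():
        merged[claim] = crowd_score(votes)
    return merged


# Hypothetical machine scores for four claims.
machine_scores = {"claim_a": 0.95, "claim_b": 0.52, "claim_c": 0.10, "claim_d": 0.45}
to_annotate = select_for_crowd(machine_scores, budget=2)

# Hypothetical crowd votes (1 = controversial) for the selected claims.
crowd_votes = {"claim_b": [1, 1, 1, 0], "claim_d": [0, 0, 1, 0]}
updated = merge(machine_scores, crowd_votes)
```

In a full loop, the updated scores would be used to retrain the machine model before the next round of selection.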

Event Blender

  • Summarizing high-profile events
  • Timelines to understand development of events


Compare to baseline of controversy in Wikipedia discussions

Match topics to Wikipedia articles
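
As a rough illustration of the matching step, a topic can be linked to the article title with the largest word overlap. The sketch below assumes Jaccard similarity on lowercased tokens, which is a simplification chosen for illustration; the titles are only example inputs.

```python
def tokens(text):
    """Lowercased word set with surrounding punctuation stripped."""
    return {t.strip(".,:;!?()").lower() for t in text.split()} - {""}


def jaccard(a, b):
    """Size of the intersection divided by size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def best_match(topic, titles):
    """Return the title whose token set overlaps most with the topic."""
    topic_tokens = tokens(topic)
    return max(titles, key=lambda title: jaccard(topic_tokens, tokens(title)))


# Example article titles, used here only as illustrative input.
TITLES = [
    "Vaccine hesitancy",
    "Climate change denial",
    "United States presidential election",
]

match = best_match("climate change debate", TITLES)
```

Word overlap fails when a topic and its article share no vocabulary, so a real system would likely need synonyms or embeddings on top of this.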


Understand controversies by:

  • Gathering lots of data from multiple sources
  • Modelling it at different granularities
  • Capturing temporal aspects
  • Training a predictive algorithm with the relevant features