Target audience
  • AI Master
  • Ability to read and understand papers written in English.
  • Ability to perform academic writing.
  • Strong programming skills (e.g. Java; essential)
  • Completion of the lectures Information Retrieval or Information Mining, and experience with tools such as RapidMiner (essential)

Task description

Social media platforms allow millions of internet users to easily create and share multimedia content. This generates a continuously increasing volume of big data that harbours valuable knowledge of the crowd. Much of this crowd wisdom is bundled up in arguments, i.e. claims that are supported or refuted by evidence. This evidential data could be used to answer questions, understand complex phenomena or evaluate services and products, if it were easily accessible. Currently, however, analytic tools can only tell what users report in big data, not why.

Given the sheer volume of data, arguments will necessarily recur and will also be disconnected from each other. On Twitter, for instance, communication between users happens asynchronously, so the same argument is easily repeated by several contributors. It is therefore important to identify similar arguments and group them under a representative argument.
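
The grouping step described above can be illustrated with a minimal sketch. It is not the method the project prescribes: token-level Jaccard overlap stands in for a real semantic similarity measure, the greedy single-pass strategy and the 0.5 threshold are arbitrary assumptions, and the function names (tokens, jaccard, cluster_arguments) as well as the example texts are hypothetical and not taken from any dataset.

```python
import re


def tokens(text):
    """Lowercase word tokens; a crude stand-in for a semantic representation."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))


def jaccard(a, b):
    """Jaccard overlap of two token sets (0.0 if both sets are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_arguments(arguments, threshold=0.5):
    """Greedy single-pass clustering (an assumption, not a prescribed method).

    Each argument joins the first cluster whose representative is at least
    `threshold` similar; otherwise it starts a new cluster and becomes that
    cluster's representative.
    """
    clusters = []  # list of (representative, members) pairs
    for arg in arguments:
        arg_tokens = tokens(arg)
        for representative, members in clusters:
            if jaccard(arg_tokens, tokens(representative)) >= threshold:
                members.append(arg)
                break
        else:
            clusters.append((arg, [arg]))
    return clusters


# Invented example texts, for illustration only.
example_arguments = [
    "Nuclear power is safe and clean",
    "nuclear power is clean and safe",
    "Wind turbines kill too many birds",
]
example_clusters = cluster_arguments(example_arguments)
```

In this sketch the first argument seen in each cluster serves as its representative. A real system would likely replace the Jaccard measure with the semantic similarity or textual entailment components studied in the project, and might choose the representative differently, for example as the member most similar to all others.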

This master's project will investigate argument clustering using semantic similarity, textual entailment and supervised machine learning approaches. Each of these approaches should be evaluated against gold standard data (the DART dataset reported by Bosc et al., 2016).
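
One possible way to compare a predicted clustering with gold standard groupings is pairwise precision, recall and F1, sketched below. The choice of metric is an assumption, as is the input format: one cluster label per argument, in the same order for both lists. The actual annotation scheme of the DART data may differ and would need to be mapped to such labels first. The function name pairwise_scores is hypothetical.

```python
from itertools import combinations


def pairwise_scores(predicted, gold):
    """Pairwise precision, recall and F1 of a clustering against gold labels.

    `predicted` and `gold` are equally long lists holding one cluster label
    per argument (an assumed format). A pair of arguments counts as a true
    positive when both labelings place the two arguments in the same cluster.
    """
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")

    tp = fp = fn = 0
    for i, j in combinations(range(len(gold)), 2):
        same_predicted = predicted[i] == predicted[j]
        same_gold = gold[i] == gold[j]
        if same_predicted and same_gold:
            tp += 1
        elif same_predicted:
            fp += 1
        elif same_gold:
            fn += 1

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1
```

Because the measure only looks at whether pairs are grouped together, the label values themselves do not need to match between the predicted and the gold clustering.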

Bosc, Tom; Cabrio, Elena; Villata, Serena. DART: a Dataset of Arguments and their Relations on Twitter. Proceedings of the 10th edition of the Language Resources and Evaluation Conference, 2016.