Supervisor

Student

Formalities

Target group
  • AI Master
Prerequisites
  • Ability to read and understand papers written in English.
  • Ability to perform academic writing.
  • Strong programming skills (e.g. Java, essential)
  • Lectures: Information Retrieval or Information Mining (essential)

Task description

Social media platforms allow millions of internet users to easily create and share multimedia content. This generates a continuously growing volume of big data that holds valuable knowledge of the crowd. Much of this crowd wisdom is bundled up in arguments, i.e. claims that are supported or refuted by evidence. This evidential data could be used to answer questions, understand complex phenomena, or evaluate services and products, if it were easily accessible. Currently, however, analytic tools can only tell what users report in big data, not why. Furthermore, most argument mining studies work with English or resource-rich European languages. Ideally, such systems would also exist for under-resourced languages such as Urdu, Arabic, Persian and Turkish.

This master's project will contribute towards developing tools for the automatic extraction of relevant and reliable arguments from multilingual big data. In particular, the tools will be applied to news articles written in English, Arabic, Persian, Urdu and Turkish.

The student will be provided with news articles in the respective languages; the data amounts to close to 1 TB. Before argument mining, the documents should ideally be paired into comparable corpora. Two documents written in two different languages are comparable if they talk about the same topic or event. The aim of this project is to develop a tool for creating comparable corpora, e.g. by pairing English documents with Arabic ones. The project should investigate features for determining the similarity between documents. Recent developments in semantic similarity, such as word embeddings, should also be considered. Given gold standard data, these features should be used to train a (multi-label) classifier. To assess its quality, the classifier should also be evaluated against the gold standard data.
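
As an illustration of one possible similarity feature, the following minimal sketch averages word vectors per document and compares two documents by cosine similarity. It assumes that words from both languages are already mapped into a shared cross-lingual embedding space. The toy vectors, the Turkish example words, and the function names are hypothetical and chosen for illustration only; they are not part of the project data or of any prescribed method.

```python
import math

# Hypothetical toy vectors standing in for a shared cross-lingual embedding
# space. Real embeddings would be loaded from a pretrained multilingual model.
TOY_EMBEDDINGS = {
    "election": [0.90, 0.10, 0.00],
    "vote": [0.80, 0.20, 0.10],
    "seçim": [0.88, 0.12, 0.02],   # Turkish: "election"
    "oy": [0.82, 0.18, 0.08],      # Turkish: "vote"
    "football": [0.00, 0.90, 0.30],
    "match": [0.10, 0.80, 0.40],
}


def document_vector(tokens, embeddings):
    """Average the vectors of all tokens that have an embedding."""
    known = [embeddings[t] for t in tokens if t in embeddings]
    if not known:
        return None  # no token of this document is covered by the embeddings
    dim = len(known[0])
    return [sum(vec[i] for vec in known) / len(known) for i in range(dim)]


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def document_similarity(tokens_a, tokens_b, embeddings=TOY_EMBEDDINGS):
    """Similarity feature for a document pair; 0.0 if a document has no known tokens."""
    vec_a = document_vector(tokens_a, embeddings)
    vec_b = document_vector(tokens_b, embeddings)
    if vec_a is None or vec_b is None:
        return 0.0
    return cosine_similarity(vec_a, vec_b)


english_doc = ["election", "vote"]
turkish_doc = ["seçim", "oy"]
sports_doc = ["football", "match"]

same_topic = document_similarity(english_doc, turkish_doc)
different_topic = document_similarity(english_doc, sports_doc)
print(same_topic > different_topic)
```

In a full system, a score like this would be only one of several features passed to the classifier, and the decision of which document pairs count as comparable would be learned from the gold standard data rather than read off a single similarity value.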

Tasks: