Supervisor

Student

Formalities

Target group
  • Master's students in AI
Prerequisites
  • Ability to read and understand papers written in English.
  • Ability to write academic texts.
  • Strong programming skills, e.g. in Java (essential).
  • Completion of the lecture Information Retrieval or Information Mining (essential).

Task description

Social media platforms allow millions of internet users to easily create and share multimedia content. This generates a continuously growing volume of big data that harbours precious knowledge of the crowds. Much of this crowd wisdom is bundled up in arguments, i.e. claims that are supported or refuted by evidence. This evidential data could be used to answer questions, understand complex phenomena or evaluate services and products, if it were easily accessible. However, current analytic tools can only tell what users report in big data, not why. Furthermore, most argument mining studies currently focus on English or other resource-rich European languages. Ideally, such systems would also exist for under-resourced languages such as Urdu, Arabic, Persian and Turkish.

This Master's project will contribute towards developing tools for the automatic extraction of relevant and reliable arguments from multilingual big data. In particular, the tools will be applied to news articles written in English, Arabic, Persian, Urdu and Turkish.

The student will be provided with comparable news articles in these languages. Two documents written in different languages are comparable if they cover the same topic or event. The aim of this project is to develop a tool that identifies comparable text passages, i.e. one that matches text passages from an English article with passages from an article in another language (e.g. Urdu). The project should investigate features for measuring the similarity between text passages, including recent developments in semantic similarity such as word embeddings and neural networks. Using the gold-standard data, these features should then be used to train a (multi-label) classifier, whose quality should in turn be evaluated against the gold-standard data.
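The matching and evaluation steps described above can be illustrated with a minimal Java sketch. It assumes that every passage has already been mapped into a shared bilingual embedding space (for example via aligned word embeddings), so that English and Urdu vectors are directly comparable. The class `PassageMatcher`, its methods, the toy two-dimensional vectors, the romanised Urdu tokens and the 0.5 threshold are all hypothetical. Note that it uses a simple similarity-threshold baseline rather than the trained multi-label classifier the project asks for; in the actual project, the cosine score would be only one of the features fed into such a classifier.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical sketch of cross-lingual passage matching. Assumes all passages
 * are represented in one shared bilingual embedding space.
 */
public class PassageMatcher {

    /** Averages the word vectors of a passage, skipping words without an embedding.
     *  Returns the zero vector if none of the words is known. */
    static double[] passageVector(List<String> tokens, Map<String, double[]> embeddings, int dim) {
        double[] sum = new double[dim];
        int found = 0;
        for (String token : tokens) {
            double[] v = embeddings.get(token);
            if (v == null) continue; // out-of-vocabulary word
            for (int i = 0; i < dim; i++) sum[i] += v[i];
            found++;
        }
        if (found > 0) {
            for (int i = 0; i < dim; i++) sum[i] /= found;
        }
        return sum;
    }

    /** Cosine similarity of two vectors; returns 0 if either vector is all zeros. */
    static double cosine(double[] a, double[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return (normA == 0 || normB == 0) ? 0 : dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    /** Maps each source passage index to the index of its most similar target passage,
     *  keeping only pairs whose similarity is at least the threshold. */
    static Map<Integer, Integer> align(List<double[]> source, List<double[]> target, double threshold) {
        Map<Integer, Integer> pairs = new TreeMap<>();
        for (int i = 0; i < source.size(); i++) {
            int best = -1;
            double bestSim = threshold;
            for (int j = 0; j < target.size(); j++) {
                double sim = cosine(source.get(i), target.get(j));
                if (sim >= bestSim) {
                    bestSim = sim;
                    best = j;
                }
            }
            if (best >= 0) pairs.put(i, best);
        }
        return pairs;
    }

    /** Returns {precision, recall, F1} of predicted pairs against gold-standard pairs. */
    static double[] evaluate(Map<Integer, Integer> predicted, Map<Integer, Integer> gold) {
        long truePositives = predicted.entrySet().stream()
                .filter(e -> e.getValue().equals(gold.get(e.getKey())))
                .count();
        double p = predicted.isEmpty() ? 0 : (double) truePositives / predicted.size();
        double r = gold.isEmpty() ? 0 : (double) truePositives / gold.size();
        double f1 = (p + r == 0) ? 0 : 2 * p * r / (p + r);
        return new double[]{p, r, f1};
    }

    public static void main(String[] args) {
        // Hypothetical 2-d bilingual embeddings; romanised tokens stand in for Urdu script.
        Map<String, double[]> emb = new HashMap<>();
        emb.put("election", new double[]{1.0, 0.0});
        emb.put("intikhab", new double[]{0.9, 0.1});
        emb.put("flood", new double[]{0.0, 1.0});
        emb.put("sailab", new double[]{0.1, 0.9});

        List<double[]> english = List.of(
                passageVector(List.of("election", "results"), emb, 2),
                passageVector(List.of("flood"), emb, 2));
        List<double[]> urdu = List.of(
                passageVector(List.of("sailab"), emb, 2),
                passageVector(List.of("intikhab"), emb, 2));

        Map<Integer, Integer> predicted = align(english, urdu, 0.5);
        Map<Integer, Integer> gold = Map.of(0, 1, 1, 0); // hypothetical gold standard
        double[] prf = evaluate(predicted, gold);
        System.out.println("pairs: " + predicted);
        System.out.printf("P=%.2f R=%.2f F1=%.2f%n", prf[0], prf[1], prf[2]);
    }
}
```

Running `main` aligns each English passage with its most similar Urdu passage and reports precision, recall and F1 against the toy gold standard. Averaging word vectors is the simplest possible passage representation; it ignores word order, which is one reason to investigate richer features and neural models as well.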

Tasks: