• Good knowledge of data mining

Task description

Healthcare insurance companies collect huge amounts of medical data. This data includes patients' demographic information, medical details such as diagnoses made and procedures performed, and the related costs. In certain situations (e.g. a change of contract), it is important to estimate the expected healthcare costs over a given period (e.g. the next two years). Possible approaches to this problem include numeric prediction (regression) and classification.
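Classification requires a discrete label, so a continuous cost has to be mapped onto cost intervals. The sketch below shows one way to do this. It assumes five buckets with hypothetical EUR boundaries. `BOUNDARIES`, `cost_to_bucket` and the sample costs are illustrative only and do not come from the task; in practice the intervals would be chosen from the insurer's own data.

```python
from bisect import bisect_right
from collections import Counter

# Hypothetical upper bounds (EUR) separating five cost buckets; the task does
# not specify the real intervals.
BOUNDARIES = [500.0, 2_000.0, 10_000.0, 50_000.0]


def cost_to_bucket(cost: float) -> int:
    """Map a cost to a bucket index 0..4.

    cost < 500 -> 0, 500 <= cost < 2000 -> 1, ..., cost >= 50000 -> 4
    (a cost equal to a boundary falls into the higher bucket).
    """
    return bisect_right(BOUNDARIES, cost)


# Illustrative, made-up yearly costs for eight insured persons.
costs = [120.0, 480.0, 750.0, 3_200.0, 15_000.0, 90_000.0, 300.0, 1_100.0]
buckets = [cost_to_bucket(c) for c in costs]
print(buckets)                         # [0, 0, 1, 2, 3, 4, 0, 1]
print(sorted(Counter(buckets).items()))  # [(0, 3), (1, 2), (2, 1), (3, 1), (4, 1)]
```

Counting the labels, as in the last line, is a simple first check of how skewed the resulting class distribution is.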

In this thesis, classification approaches are to be investigated. To this end, personal health records are to be classified into five buckets, each corresponding to a different cost interval. Support vector machines (SVMs) are to be used as the classification method. SVMs natively support only binary classification, so the multi-class problem has to be decomposed. This can be done either by one-against-one classification (one SVM per pair of classes) or by one-against-rest classification (one SVM per class against all other classes). Further problems are caused by the skewed distribution of records over the five classes.
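The difference between the two decomposition schemes can be sketched in plain Python. This is a minimal illustration, not a recommended SVM implementation. The binary learner is a linear SVM trained with Pegasos-style stochastic subgradient descent; it is swapped in here only to keep the example dependency-free, and a real study would use an established SVM library. The data are synthetic 2-D clusters with deliberately skewed class sizes, and all function names are hypothetical. With five classes, one-against-rest trains 5 binary SVMs, while one-against-one trains 5·4/2 = 10.

```python
import math
import random
from itertools import combinations


def train_linear_svm(X, y, lam=0.01, epochs=100, seed=0):
    """Binary linear SVM via Pegasos-style SGD on the hinge loss (y in {+1, -1}).

    A constant 1.0 feature is appended so the last weight acts as a bias;
    as a simplification, that bias is regularized too.
    """
    rng = random.Random(seed)
    data = [(list(x) + [1.0], t) for x, t in zip(X, y)]
    w = [0.0] * len(data[0][0])
    step = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, t in data:
            step += 1
            eta = 1.0 / (lam * step)
            margin = t * sum(wi * xi for wi, xi in zip(w, x))
            w = [(1.0 - eta * lam) * wi for wi in w]  # shrink: L2 regularizer
            if margin < 1.0:  # hinge loss active: move towards the example
                w = [wi + eta * t * xi for wi, xi in zip(w, x)]
    return w


def score(w, x):
    """Decision value w.x + b of a trained binary SVM."""
    return sum(wi * xi for wi, xi in zip(w, list(x) + [1.0]))


def fit_one_vs_rest(X, y, classes):
    # One SVM per class: that class (+1) against all other classes (-1).
    return {c: train_linear_svm(X, [1 if t == c else -1 for t in y]) for c in classes}


def predict_one_vs_rest(models, x):
    # The class whose SVM returns the largest decision value wins.
    return max(models, key=lambda c: score(models[c], x))


def fit_one_vs_one(X, y, classes):
    # One SVM per pair of classes, trained only on records of those two classes.
    models = {}
    for a, b in combinations(classes, 2):
        Xp = [x for x, t in zip(X, y) if t in (a, b)]
        yp = [1 if t == a else -1 for t in y if t in (a, b)]
        models[(a, b)] = train_linear_svm(Xp, yp)
    return models


def predict_one_vs_one(models, classes, x):
    # Each pairwise SVM casts one vote; the class with the most votes wins.
    votes = {c: 0 for c in classes}
    for (a, b), w in models.items():
        votes[a if score(w, x) > 0 else b] += 1
    return max(classes, key=lambda c: votes[c])


def make_clusters(sizes, radius=3.0, spread=0.3, seed=42):
    # Hypothetical synthetic data: class k is a Gaussian blob around
    # the k-th vertex of a regular polygon.
    rng = random.Random(seed)
    X, y = [], []
    for c, n in enumerate(sizes):
        angle = 2 * math.pi * c / len(sizes)
        cx, cy = radius * math.cos(angle), radius * math.sin(angle)
        for _ in range(n):
            X.append((rng.gauss(cx, spread), rng.gauss(cy, spread)))
            y.append(c)
    return X, y


classes = list(range(5))
X, y = make_clusters([40, 25, 15, 12, 8])  # deliberately skewed class sizes
ovr = fit_one_vs_rest(X, y, classes)
ovo = fit_one_vs_one(X, y, classes)
acc_ovr = sum(predict_one_vs_rest(ovr, x) == t for x, t in zip(X, y)) / len(y)
acc_ovo = sum(predict_one_vs_one(ovo, classes, x) == t for x, t in zip(X, y)) / len(y)
print(len(ovr), len(ovo))  # 5 10
print(f"training accuracy OvR={acc_ovr:.2f}, OvO={acc_ovo:.2f}")
```

One-against-one trains more models, but each model sees only the records of two classes. For kernel SVMs, whose training cost grows faster than linearly in the number of records, this can make one-against-one the cheaper scheme overall. LIBSVM, for instance, uses one-against-one internally.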

  1. Data preparation: application of different sampling methods
  2. SVM classification: selection of an SVM implementation, investigation of multi-class classification approaches
  3. Comparison of results: application of statistical tests
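Steps 1 and 3 can be illustrated with two small helpers. The task does not name a sampling method or a statistical test, so both choices below are assumptions:

- Random oversampling is used as the sampling method. It duplicates randomly chosen minority-class records until every class matches the size of the majority class.
- The exact McNemar test is used as the statistical test, comparing two classifiers evaluated on the same test set.

`random_oversample` and `mcnemar_exact` are hypothetical names.

```python
import random
from collections import Counter, defaultdict
from math import comb


def random_oversample(X, y, seed=0):
    """Duplicate random minority-class records until all classes are equally large."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for x, label in zip(X, y):
        by_class[label].append(x)
    target = max(len(items) for items in by_class.values())
    X_out, y_out = [], []
    for label, items in by_class.items():
        # Keep every original record, then draw extra copies with replacement.
        extra = [rng.choice(items) for _ in range(target - len(items))]
        for x in items + extra:
            X_out.append(x)
            y_out.append(label)
    return X_out, y_out


def mcnemar_exact(y_true, pred_a, pred_b):
    """Two-sided exact McNemar test: p-value for 'classifiers A and B err equally often'."""
    only_a_right = sum(1 for t, a, b in zip(y_true, pred_a, pred_b) if a == t and b != t)
    only_b_right = sum(1 for t, a, b in zip(y_true, pred_a, pred_b) if a != t and b == t)
    n = only_a_right + only_b_right  # records on which the classifiers disagree
    if n == 0:
        return 1.0
    k = min(only_a_right, only_b_right)
    # Under H0 the disagreements follow Binomial(n, 0.5); double the smaller tail.
    p = 2 * sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, p)


# Toy, made-up example with a skewed label distribution (6 / 3 / 1 records).
X = [(float(i),) for i in range(10)]
y = [0] * 6 + [1] * 3 + [2] * 1
X_bal, y_bal = random_oversample(X, y)
print(sorted(Counter(y_bal).items()))  # [(0, 6), (1, 6), (2, 6)]
print(mcnemar_exact([1] * 10, [0] * 10, [1] * 10))  # 0.001953125
```

Oversampling should only be applied to the training split; otherwise duplicated records leak into the evaluation.

McNemar's test considers only the records on which the two classifiers disagree. Under the null hypothesis of equal error rates, these disagreements split evenly between the two classifiers, so the p-value is a two-sided binomial tail probability.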


Nello Cristianini, John Shawe-Taylor: An Introduction to Support Vector Machines. Cambridge University Press, Cambridge, UK, 2000.