Data mining, decision tree, sampling, data preparation
Hepatitis is a liver disease characterized by the presence of inflammatory cells in liver tissue. Its effects on the liver range from mild to serious, and it can be fatal. Decision trees are important data mining tools in medicine because doctors can easily understand the resulting models and use them to diagnose disease. Decision tree algorithms give priority to classes with more training instances, so over-sampling the minority class is a plausible technique when better classification of that class is the main interest. In our hepatitis data, the ratio of minority to majority instances is 32 to 123. To build a better decision tree for the minority class without sacrificing much overall accuracy, we select good synthetic over-sampled instances to train the tree. Experiments with various levels of over-sampling support this approach.
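The over-sampling idea can be illustrated with a minimal Python sketch. It is not the paper's procedure: the abstract does not state how synthetic instances are generated or what makes one "good", so the sketch assumes SMOTE-style interpolation between minority neighbours and a hypothetical filter that keeps a candidate only if its nearest original instance belongs to the minority class. The function names, the over-sampling levels, and the random two-feature placeholder data are all assumptions; only the class sizes (32 and 123) come from the text.

```python
import math
import random


def nearest(point, pool, k):
    """Return the k points in pool closest to point (Euclidean distance)."""
    return sorted(pool, key=lambda q: math.dist(point, q))[:k]


def oversample(minority, majority, level, k=5, seed=0):
    """Create round(level * len(minority)) synthetic minority instances.

    Candidates are interpolations between a minority instance and one of its
    k nearest minority neighbours (SMOTE-style). The acceptance test below is
    an ASSUMPTION of this sketch, not the paper's selection rule.
    """
    rng = random.Random(seed)
    target = round(level * len(minority))
    synthetic, attempts = [], 0
    # Cap the number of attempts so the loop ends even if the filter is strict.
    while len(synthetic) < target and attempts < 100 * max(target, 1):
        attempts += 1
        base = rng.choice(minority)
        neighbours = [q for q in nearest(base, minority, k + 1) if q is not base]
        other = rng.choice(neighbours)
        t = rng.random()
        candidate = tuple(b + t * (o - b) for b, o in zip(base, other))
        # Hypothetical "good instance" filter: keep the candidate only if the
        # closest original instance is a minority one.
        d_min = min(math.dist(candidate, q) for q in minority)
        d_maj = min(math.dist(candidate, q) for q in majority)
        if d_min <= d_maj:
            synthetic.append(candidate)
    return synthetic


# Placeholder data: random points, NOT the hepatitis data set.
data_rng = random.Random(42)
minority = [(data_rng.uniform(5, 6), data_rng.uniform(5, 6)) for _ in range(32)]
majority = [(data_rng.uniform(0, 1), data_rng.uniform(0, 1)) for _ in range(123)]

# Hypothetical over-sampling levels, as multiples of the minority class size.
results = {level: oversample(minority, majority, level) for level in (0.5, 1.0, 2.0)}
for level, synthetic in results.items():
    print(level, len(minority) + len(synthetic), "vs.", len(majority))
```

In practice, the original and accepted synthetic instances would then be passed to a decision tree learner, and minority-class accuracy compared with overall accuracy at each level.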
Cite this paper
Hyontai Sug. (2016) Flexible Data Mining for Medicine Data. Biology and Biomedicine, 1, 102-105.