Effects of Integrated Kernel PCA Function on Data Sampling Techniques for Liver Disease Prediction

Rubia Yasmin

A novel integrated feature extraction method is proposed that combines different kernel functions of principal component analysis (PCA) to handle imbalanced medical datasets for diagnosing liver patients. Feature extraction using PCA projects the original features onto a lower-dimensional space that maximizes variance. In the presence of imbalanced data, feature extraction using PCA for classification tasks is challenging because the extracted features favor the majority class. The problem is more acute when kernel PCA is used instead of classical PCA. Linear, polynomial, and sigmoid kernels are used in PCA, and each kernel projects the feature space in a different way. Most medical data shows class imbalance, which degrades the performance of machine-learning algorithms. To solve this problem, we propose an integrated kernel-based feature extraction framework to classify liver patients from an imbalanced liver function dataset, using recently developed hybrid data balancing techniques as well as robust methods for imputing missing values and treating outliers. Five classification algorithms and five data balancing techniques are compared on the imbalanced Indian Liver Patients Database (ILPD). The proposed integrated pipeline improves the classification evaluation metrics by 5-20% compared with other methods.
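The idea of integrating several kernels can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract does not say how the kernel outputs are combined, so concatenating the component scores from the linear, polynomial, and sigmoid kernels is an assumption, as are the kernel parameters (`gamma`, `coef0`, `degree`), the number of components, and the choice to balance the data before extracting features. Plain random oversampling stands in for the hybrid balancing techniques mentioned above, and the data are synthetic. All function names are hypothetical.

```python
import numpy as np

def kernel_matrix(X, kind, gamma=None, coef0=1.0, degree=3):
    """Gram matrix for one of the three kernels named in the abstract."""
    if gamma is None:
        gamma = 1.0 / X.shape[1]  # assumed default, not taken from the paper
    G = X @ X.T
    if kind == "linear":
        return G
    if kind == "poly":
        return (gamma * G + coef0) ** degree
    if kind == "sigmoid":
        return np.tanh(gamma * G + coef0)
    raise ValueError(f"unknown kernel: {kind}")

def kernel_pca_scores(X, kind, n_components=2):
    """Scores of the training samples on the leading kernel principal components."""
    K = kernel_matrix(X, kind)
    n = K.shape[0]
    one = np.full((n, n), 1.0 / n)
    Kc = K - one @ K - K @ one + one @ K @ one   # centre the data in feature space
    vals, vecs = np.linalg.eigh(Kc)              # eigenvalues in ascending order
    idx = np.argsort(vals)[::-1][:n_components]  # keep the largest ones
    vals = np.clip(vals[idx], 0.0, None)         # sigmoid kernel need not be PSD
    return vecs[:, idx] * np.sqrt(vals)

def integrated_kpca(X, kinds=("linear", "poly", "sigmoid"), n_components=2):
    """Assumed integration step: concatenate the scores from every kernel."""
    return np.hstack([kernel_pca_scores(X, k, n_components) for k in kinds])

def random_oversample(X, y, rng):
    """Stand-in balancer: resample minority rows until all classes are equal."""
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()
    parts_X, parts_y = [X], [y]
    for c, n in zip(classes, counts):
        if n < target:
            extra = rng.choice(np.flatnonzero(y == c), size=target - n, replace=True)
            parts_X.append(X[extra])
            parts_y.append(y[extra])
    return np.vstack(parts_X), np.concatenate(parts_y)

# Synthetic imbalanced data (45 majority vs. 15 minority samples, 5 features).
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 5))
y = np.array([0] * 45 + [1] * 15)
X[y == 1] += 1.0
X = (X - X.mean(axis=0)) / X.std(axis=0)         # standardize before the kernels

X_bal, y_bal = random_oversample(X, y, rng)
Z = integrated_kpca(X_bal, n_components=2)
print(Z.shape)  # (90, 6): 90 balanced samples, 2 components from each of 3 kernels
```

The matrix `Z` would then be passed to a classifier. In practice a library implementation such as scikit-learn's `KernelPCA`, fitted on training folds only, would be the safer choice; the sketch computes scores for the training samples and does not project new data.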
