Fang-Qi Li, Shi-Lin Wang∗, Gong-Shen Liu
School of Electronic Information and Electrical Engineering, Shanghai Jiao Tong University, 800 Dongchuan Road, Minhang District, Shanghai 200240, China
Cervical cancer screening
Bayesian Possibilistic C-Means clustering
Recently, much attention has been given to the treatment of cervical cancer due to its high lethality and morbidity. Early screening of this disease is of vital importance. In this paper, we propose an automatic cervical cancer screening algorithm that analyzes the related risk factors to provide preliminary diagnostic information for medical practitioners. In cervical cancer screening, a number of risk factors are considered to be highly private or sensitive, and many patients elect not to provide the corresponding information. Such a large proportion of missing attributes creates great difficulties for many automatic screening algorithms. To solve this problem, a Bayesian Possibilistic C-means (BPCM) clustering algorithm is proposed to discover the representative patterns from the complete data and to estimate the missing values of a specific sample using its closest representative pattern. After the data completion step, a two-stage fuzzy ensemble learning scheme is proposed to derive the final screening result. In the first stage, the bootstrap aggregation (bagging) procedure is adopted to sample the entire class-imbalanced dataset into a number of class-balanced subsets. In the second stage, a number of weak classifiers are trained on each subset, and a fuzzy-logic-based approach is designed to analyze the classification results of the weak classifiers and to obtain the final classification result. Experiments have been conducted on a dataset containing 858 patients. From the experimental results, it can be observed that the proposed BPCM can effectively discover the underlying patterns and is reliable in estimating the missing attributes compared with the traditional approaches.
Moreover, by applying the proposed fuzzy ensemble learning scheme, the final classification results on the data completed by BPCM are promising (an accuracy of 76% or a positive sensitivity of 79%) under the severe missing-attribute scenario (only 6% of the samples have complete data).
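The two data-handling steps summarized above (completing a sample from its closest representative pattern, then drawing class-balanced bootstrap subsets) can be illustrated with a minimal sketch. This is not the BPCM algorithm itself: the patterns are assumed to be given already (for example, as cluster centres), closeness is assumed to be squared Euclidean distance over the observed attributes, and the function names `impute_from_nearest_pattern` and `balanced_bootstrap_subsets` are hypothetical.

```python
import random


def impute_from_nearest_pattern(sample, patterns):
    """Fill missing entries (None) of `sample` from its closest pattern.

    Assumption: closeness is squared Euclidean distance computed only over
    the attributes that are observed in `sample`.
    """
    observed = [i for i, v in enumerate(sample) if v is not None]

    def dist(pattern):
        return sum((pattern[i] - sample[i]) ** 2 for i in observed)

    nearest = min(patterns, key=dist)
    # Keep observed values; copy the pattern's value where one is missing.
    return [nearest[i] if v is None else v for i, v in enumerate(sample)]


def balanced_bootstrap_subsets(labels, n_subsets, seed=0):
    """Return index lists with equal numbers of positives and negatives.

    Each subset is drawn with replacement (bootstrap), and the per-class
    size is set to the size of the smaller class (an assumption).
    """
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    k = min(len(pos), len(neg))
    return [rng.choices(pos, k=k) + rng.choices(neg, k=k)
            for _ in range(n_subsets)]


# Usage: complete one sample, then build 5 balanced subsets.
patterns = [[0.0, 0.0, 0.0], [10.0, 10.0, 10.0]]
filled = impute_from_nearest_pattern([9.0, None, 11.0], patterns)
labels = [1] * 3 + [0] * 20  # imbalanced toy labels
subsets = balanced_bootstrap_subsets(labels, n_subsets=5)
```

Each index list in `subsets` could then be used to train one weak classifier; how the classifier outputs are combined by fuzzy logic is not shown here.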
With the past decades witnessing the rapid development of data science as well as bioscience, increasing efforts are now being devoted to combining the techniques in these two fields, and the results have been fruitful and inspiring [14,24,35]. Apart from applying the latest machine learning algorithms to biomedical data sets [5,22], many works were motivated by the unique challenges that biomedical data inherit from their biological and clinical circumstances. In the literature concerning bioscience and medical science, the most frequently studied issues include the high dimensionality that requires effective feature selection, severe class imbalance, and privacy issues with the consequent uncertainty. These issues in biomedical data pose great challenges to classical machine learning and data mining tools and methods.
∗ Corresponding author. E-mail address: [email protected] (S.-L. Wang).
Among various data mining and decision-making tasks in biomedical contexts, computer-aided diagnosis (CAD) [41,50] has attracted much research interest because CAD systems can help to save lives in many developing countries where medical resources are still scarce. Existing CAD systems usually adopt various kinds of machine learning algorithms to analyze disease-related information (including images, unstructured data, etc.) and to provide advice to doctors. For example, in cancer screening, X-ray images and Computed Tomography (CT) scans are provided as the inputs to the CAD system. Machine learning algorithms are then applied to the inputs to discover the underlying patterns, which can effectively differentiate cancer patients and potential cancer patients from healthy people. The most widely used learners include k-nearest neighbours, naive Bayes classifiers, and neural networks. However, appropriate modifications are often necessary for the traditional machine learning tools to perform well in the biomedical context.