3.2.2. Network-based methylation (NBM) score of EC genes further improves classification accuracy
Since many ECs do not show methylation changes between the recurrent and non-recurrent tumors, they may play a role through their interactions with the DMs; for example, DNA methylation changes of the DM genes may affect the functions of the ECs via protein-protein interactions. To measure the relevance between the DMs and ECs based on the network topology, we used a well-established random walk procedure to compute the probability that a random walker starting at a DM gene reaches each EC gene. The TDM score of a DM gene is then distributed to the EC genes according to these probabilities. As some of the ECs may also have TDM scores, probabilities were also calculated for a random walker starting at an EC gene to reach other EC genes, so that each EC may both distribute its TDM score to other ECs and receive contributions from the DMs and other ECs. These contributions are combined to derive the NBM scores for all the ECs (see Methods).
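The score propagation described above can be illustrated with a minimal sketch. It is not the procedure from the Methods: it assumes a standard random walk with restart on an unweighted, undirected network, renormalizes the arrival probabilities over the EC genes, and sums the contributions from all source genes. The function names (`rwr`, `nbm_scores`), the toy network, and the restart value of 0.5 are hypothetical choices made only for illustration.

```python
def rwr(adj, seed, restart=0.5, tol=1e-10, max_iter=1000):
    """Random walk with restart from `seed` (assumed variant).

    adj: dict mapping each node to a list of its neighbours (undirected).
    Returns a dict mapping each node to its steady-state visiting probability.
    """
    nodes = list(adj)
    p = {n: 0.0 for n in nodes}
    p[seed] = 1.0
    for _ in range(max_iter):
        nxt = {n: 0.0 for n in nodes}
        for u in nodes:
            if adj[u]:
                # Move to a uniformly chosen neighbour with probability 1 - restart.
                share = (1.0 - restart) * p[u] / len(adj[u])
                for v in adj[u]:
                    nxt[v] += share
            else:
                # Isolated node: the walker stays in place.
                nxt[u] += (1.0 - restart) * p[u]
        # Jump back to the seed with probability `restart`.
        nxt[seed] += restart
        delta = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if delta < tol:
            break
    return p


def nbm_scores(adj, tdm, ec_genes, restart=0.5):
    """Distribute TDM scores of source genes (DMs or ECs) onto the EC genes.

    Assumption: each source's score is split among the other EC genes in
    proportion to the random-walk probabilities, and contributions add up.
    """
    nbm = {g: 0.0 for g in ec_genes}
    for src, score in tdm.items():
        if score == 0.0:
            continue
        probs = rwr(adj, src, restart)
        targets = {g: probs[g] for g in ec_genes if g != src}
        total = sum(targets.values())
        if total == 0.0:
            continue  # no EC gene is reachable from this source
        for g, pr in targets.items():
            nbm[g] += score * pr / total
    return nbm


# Toy network (hypothetical): DM1 - EC1 - EC2, with a TDM score only on DM1.
toy_adj = {"DM1": ["EC1"], "EC1": ["DM1", "EC2"], "EC2": ["EC1"]}
toy_nbm = nbm_scores(toy_adj, {"DM1": 2.0}, ["EC1", "EC2"])
print(toy_nbm)
```

In this toy case EC1, the direct neighbour of the DM gene, receives a larger share of the score than EC2, which is two steps away. Under the renormalization assumed here the shares add up to the source's TDM score.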
Fig. 2(d) shows the NBM scores for the ECs. Interestingly, while none of the ECs were differentially methylated at p-value < 0.02 according to the TDM scores, 203 (43%) of the 474 ECs show a statistically significant difference between recurrent and non-recurrent tumors (p < 0.02, Student's t-test) according to the NBM scores, suggesting that the ECs may indeed have functions related to the DMs in cancer metastasis.
These NBM scores of the ECs are then used, either alone or in combination with the TDM scores of the DM genes, to construct a support vector machine (SVM) classifier to separate recurrent and non-recurrent tumors. As shown in Table 2, the performance of the classifier constructed with the NBM scores of the ECs (EC*) is significantly higher than that with the original TDM scores of the ECs, and even slightly better than that of the DM genes. This is to some extent not surprising, as the NBM scores of the ECs are derived from the TDM scores of the DMs. To see whether the ECs provide any information beyond approximating the DM genes, we combined the TDM scores of the DMs and the NBM scores of the ECs (DM + EC*, Table 2). This combination resulted in the highest classification accuracy (kappa statistic 0.513 and accuracy 82.9%), suggesting that the topologically derived scores for the ECs provide non-redundant information that complements the TDM scores of the DM genes. In addition, when the PPI network is randomly rewired, the benefit of the EC genes vanishes (Table 2, EC# and DM + EC#). Finally, it is worth noting that the performance of the algorithm is relatively robust with respect to the parameter (restart probability) of the random walk procedure (Fig. 3).
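Since classifier performance is reported as both accuracy and a kappa statistic, a small worked example of how the two relate may help. The sketch below assumes the kappa statistic is Cohen's kappa, which corrects the observed agreement for the agreement expected by chance; the function name and the toy labels are hypothetical and are not data from the study.

```python
def accuracy_and_kappa(y_true, y_pred):
    """Return (accuracy, Cohen's kappa) for two equal-length label sequences."""
    y_true, y_pred = list(y_true), list(y_pred)
    n = len(y_true)
    labels = set(y_true) | set(y_pred)
    # Observed agreement: fraction of samples classified correctly.
    observed = sum(t == p for t, p in zip(y_true, y_pred)) / n
    # Chance agreement: product of the marginal label frequencies, summed over labels.
    expected = sum((y_true.count(c) / n) * (y_pred.count(c) / n) for c in labels)
    if expected == 1.0:
        return observed, float("nan")  # kappa is undefined when only one label occurs
    return observed, (observed - expected) / (1.0 - expected)


# Toy labels (hypothetical): 1 = recurrent, 0 = non-recurrent.
toy_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
toy_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
toy_acc, toy_kappa = accuracy_and_kappa(toy_true, toy_pred)
print(toy_acc, toy_kappa)
```

Here the accuracy is 0.8 while kappa is about 0.58, because a classifier that guessed with the same label frequencies would already agree with the truth 52% of the time. This is why kappa is a useful complement to accuracy when the two classes are unbalanced.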
3.2.3. Comparison with existing methods
We compared the performance of our algorithm with two alternative methods. First, we implemented a simple pathway-based approach by using each KEGG pathway as a metagene. Briefly, for each KEGG pathway and each patient, we counted the number of genes in that pathway that had a positive TDM score, as well as the number of genes that had a negative score. Each pathway therefore yields two features: one for positive scores and one for negative scores. (This strategy has the best classification performance among multiple variations of pathway-based models.) We used all 208 KEGG pathways, resulting in 516 features for each patient. Second, we downloaded the program from  for identifying discriminative subnetworks, i.e., subnetworks whose average node activity can discriminate the two classes of samples. Following the suggestions from the authors, we limited the subnetwork size to five, and obtained the top 100 subnetworks with the highest discriminative power. These discriminative subnetworks (DS) are then used in two ways (denoted DS and DS*, respectively): (1) the genes in the subnetworks were pooled, and these individual genes