In Silico Validation of ncRNA-ncRNA Interaction Sites with ncRNAs represented by k-mers features
Abstract
A recent catalogue of the human transcriptome, the CHESS database, assembled from RNA sequencing experiments carried out as part of the Genotype-Tissue Expression (GTEx) Project, reported more non-coding RNA genes (21,856) than protein-coding genes (21,306), revealing an unexpectedly vast amount of transcriptional noise (Pertea et al., 2018). In this study, we introduce a workflow implemented in KNIME that computationally distinguishes less reliable ncRNA-ncRNA interaction sites, which contain fewer experimentally validated binding sites, from interaction sites with more experimental validation. Duplex structure features and k-mer features of experimentally verified ncRNA-ncRNA binding sites were used as input to the classification workflow. In our analysis, duplex structure features had no positive effect on classification performance, whereas k-mer features alone achieved a success rate of ~80% in categorizing the confidence of ncRNA-ncRNA binding sites. This result corroborates the classification performance obtained for miRNA-mRNA targets using only k-mer features in our previous study (Yousef et al., 2018). © 2019 by SCITEPRESS - Science and Technology Publications, Lda. All rights reserved.
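As an illustration of what a k-mer representation of a binding-site sequence can look like, the following is a minimal Python sketch. It is not the KNIME workflow described above: the function name `kmer_features`, the choice of k, the normalization to relative frequencies, and the example sequence are all assumptions made for this example.

```python
from collections import Counter
from itertools import product

ALPHABET = "ACGU"  # RNA alphabet; DNA-style "T" is mapped to "U" below


def kmer_features(seq, k=3):
    """Return relative frequencies of all 4**k k-mers in an RNA sequence.

    Hypothetical helper for illustration only; the abstract does not
    specify k or the normalization used in the actual workflow.
    """
    seq = seq.upper().replace("T", "U")
    # Enumerate every possible k-mer so the feature vector has a fixed length.
    kmers = ["".join(p) for p in product(ALPHABET, repeat=k)]
    # Count overlapping windows of length k.
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    # Windows containing characters outside ACGU (e.g. "N") are ignored.
    total = sum(counts[m] for m in kmers)
    return {m: (counts[m] / total if total else 0.0) for m in kmers}


# Made-up example sequence, not taken from any dataset.
features = kmer_features("AUGGCUAUGG", k=2)
```

Each sequence is mapped to a vector of fixed length 4^k regardless of its own length, which is what makes such features convenient as input to a standard classifier.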