Show simple item record

dc.contributor.author: Süzek, Barış Ethem
dc.contributor.author: Wang, Yuqi
dc.contributor.author: Huang, Hongzhan
dc.contributor.author: McGarvey, Peter B.
dc.contributor.author: Wu, Cathy H.
dc.date.accessioned: 2020-11-20T15:06:15Z
dc.date.available: 2020-11-20T15:06:15Z
dc.date.issued: 2015
dc.identifier.issn: 1367-4803
dc.identifier.issn: 1460-2059
dc.identifier.uri: https://doi.org/10.1093/bioinformatics/btu739
dc.identifier.uri: https://hdl.handle.net/20.500.12809/3110
dc.description: 0000-0002-1521-4306 (en_US)
dc.description: WOS: 000352268900017 (en_US)
dc.description: PubMed ID: 25398609 (en_US)
dc.description.abstract: Motivation: UniRef databases provide full-scale clustering of UniProtKB sequences and are utilized for a broad range of applications, particularly similarity-based functional annotation. Non-redundancy and intra-cluster homogeneity in UniRef were recently improved by adding a sequence length overlap threshold. Our hypothesis is that these improvements would enhance the speed and sensitivity of similarity searches and improve the consistency of annotation within clusters. Results: Intra-cluster molecular function consistency was examined by analysis of Gene Ontology terms. Results show that UniRef clusters bring together proteins of identical molecular function in more than 97% of the clusters, implying that clusters are useful for annotation and can also be used to detect annotation inconsistencies. To examine coverage in similarity results, BLASTP searches against UniRef50 followed by expansion of the hit lists with cluster members demonstrated advantages compared with searches against UniProtKB sequences; the searches are concise (~7 times shorter hit list before expansion), faster (~6 times) and more sensitive in detection of remote similarities (>96% recall at e-value <0.0001). Our results support the use of UniRef clusters as a comprehensive and scalable alternative to native sequence databases for similarity searches and reinforce their reliability for use in functional annotation. (en_US)
dc.description.sponsorship: National Institutes of Health (NIH), United States Department of Health & Human Services [U41HG006104]; NIH National Human Genome Research Institute (NHGRI), United States Department of Health & Human Services [U41HG007822, U41HG006104]; Funding Source: NIH RePORTER (en_US)
dc.description.sponsorship: This project is supported by the UniProt grant U41HG006104 from the National Institutes of Health. (en_US)
dc.item-language.iso: eng (en_US)
dc.publisher: Oxford Univ Press (en_US)
dc.item-rights: info:eu-repo/semantics/openAccess (en_US)
dc.subject: UniRef clusters (en_US)
dc.title: UniRef clusters: a comprehensive and scalable alternative for improving sequence similarity searches (en_US)
dc.item-type: article (en_US)
dc.contributor.department: MÜ, Faculty of Engineering, Department of Computer Engineering (en_US)
dc.contributor.institutionauthor: Süzek, Barış Ethem
dc.identifier.doi: 10.1093/bioinformatics/btu739
dc.identifier.volume: 31 (en_US)
dc.identifier.issue: 6 (en_US)
dc.identifier.startpage: 926 (en_US)
dc.identifier.endpage: 932 (en_US)
dc.relation.journal: Bioinformatics (en_US)
dc.relation.publicationcategory: Article - International Refereed Journal - Institutional Faculty Member (en_US)

