Automated subtyping of HIV-1 genetic sequences for clinical and surveillance purposes: performance evaluation of the new REGA version 3 and seven other tools.

Detalhes bibliográficos
Autor(a) principal: Pineda-Peña,,  A.C.
Data de Publicação: 2013
Outros Autores: Faria, Nuno Rodrigues, Imbrechts, Stijn, Libin,, P., Abecasis, Ana, Deforche, K, Ruano Gómez , Lorena, Gomez-Lopez, Arley, Camacho, Ricardo Jorge, de Oliveira, T., Vandamme, Anne-mieke
Tipo de documento: Artigo
Idioma: eng
Título da fonte: Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
Texto Completo: http://hdl.handle.net/10362/116966
Resumo: BACKGROUND To investigate differences in pathogenesis, diagnosis and resistance pathways between HIV-1 subtypes, an accurate subtyping tool for large datasets is needed. We aimed to evaluate the performance of automated subtyping tools to classify the different subtypes and circulating recombinant forms using pol, the most sequenced region in clinical practice. We also present the upgraded version 3 of the Rega HIV subtyping tool (REGAv3). METHODOLOGY HIV-1 pol sequences (PR+RT) for 4674 patients retrieved from the Portuguese HIV Drug Resistance Database, and 1872 pol sequences trimmed from full-length genomes retrieved from the Los Alamos database were classified with statistical-based tools such as COMET, jpHMM and STAR; similarity-based tools such as NCBI and Stanford; and phylogenetic-based tools such as REGA version 2 (REGAv2), REGAv3, and SCUEAL. The performance of these tools, for pol, and for PR and RT separately, was compared in terms of reproducibility, sensitivity and specificity with respect to the gold standard which was manual phylogenetic analysis of the pol region. RESULTS The sensitivity and specificity for subtypes B and C was more than 96% for seven tools, but was variable for other subtypes such as A, D, F and G. With regard to the most common circulating recombinant forms (CRFs), the sensitivity and specificity for CRF01_AE was ~99% with statistical-based tools, with phylogenetic-based tools and with Stanford, one of the similarity based tools. CRF02_AG was correctly identified for more than 96% by COMET, REGAv3, Stanford and STAR. All the tools reached a specificity of more than 97% for most of the subtypes and the two main CRFs (CRF01_AE and CRF02_AG). Other CRFs were identified only by COMET, REGAv2, REGAv3, and SCUEAL and with variable sensitivity. When analyzing sequences for PR and RT separately, the performance for PR was generally lower and variable between the tools. Similarity and statistical-based tools were 100% reproducible, but this was lower for phylogenetic-based tools such as REGA (~99%) and SCUEAL (~96%). CONCLUSIONS REGAv3 had an improved performance for subtype B and CRF02_AG compared to REGAv2 and is now able to also identify all epidemiologically relevant CRFs. In general the best performing tools, in alphabetical order, were COMET, jpHMM, REGAv3, and SCUEAL when analyzing pure subtypes in the pol region, and COMET and REGAv3 when analyzing most of the CRFs. Based on this study, we recommend to confirm subtyping with 2 well performing tools, and be cautious with the interpretation of short sequences.
id RCAP_07a50580508374a445be2e19cef0cb1c
oai_identifier_str oai:run.unl.pt:10362/116966
network_acronym_str RCAP
network_name_str Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
repository_id_str 7160
spelling Automated subtyping of HIV-1 genetic sequences for clinical and surveillance purposes: performance evaluation of the new REGA version 3 and seven other tools.GeneticsSoftwareEcology, Evolution, Behavior and SystematicsInfectious DiseasesSDG 3 - Good Health and Well-beingBACKGROUND To investigate differences in pathogenesis, diagnosis and resistance pathways between HIV-1 subtypes, an accurate subtyping tool for large datasets is needed. We aimed to evaluate the performance of automated subtyping tools to classify the different subtypes and circulating recombinant forms using pol, the most sequenced region in clinical practice. We also present the upgraded version 3 of the Rega HIV subtyping tool (REGAv3). METHODOLOGY HIV-1 pol sequences (PR+RT) for 4674 patients retrieved from the Portuguese HIV Drug Resistance Database, and 1872 pol sequences trimmed from full-length genomes retrieved from the Los Alamos database were classified with statistical-based tools such as COMET, jpHMM and STAR; similarity-based tools such as NCBI and Stanford; and phylogenetic-based tools such as REGA version 2 (REGAv2), REGAv3, and SCUEAL. The performance of these tools, for pol, and for PR and RT separately, was compared in terms of reproducibility, sensitivity and specificity with respect to the gold standard which was manual phylogenetic analysis of the pol region. RESULTS The sensitivity and specificity for subtypes B and C was more than 96% for seven tools, but was variable for other subtypes such as A, D, F and G. With regard to the most common circulating recombinant forms (CRFs), the sensitivity and specificity for CRF01_AE was ~99% with statistical-based tools, with phylogenetic-based tools and with Stanford, one of the similarity based tools. CRF02_AG was correctly identified for more than 96% by COMET, REGAv3, Stanford and STAR. All the tools reached a specificity of more than 97% for most of the subtypes and the two main CRFs (CRF01_AE and CRF02_AG). Other CRFs were identified only by COMET, REGAv2, REGAv3, and SCUEAL and with variable sensitivity. When analyzing sequences for PR and RT separately, the performance for PR was generally lower and variable between the tools. Similarity and statistical-based tools were 100% reproducible, but this was lower for phylogenetic-based tools such as REGA (~99%) and SCUEAL (~96%). CONCLUSIONS REGAv3 had an improved performance for subtype B and CRF02_AG compared to REGAv2 and is now able to also identify all epidemiologically relevant CRFs. In general the best performing tools, in alphabetical order, were COMET, jpHMM, REGAv3, and SCUEAL when analyzing pure subtypes in the pol region, and COMET and REGAv3 when analyzing most of the CRFs. Based on this study, we recommend to confirm subtyping with 2 well performing tools, and be cautious with the interpretation of short sequences.Instituto de Higiene e Medicina Tropical (IHMT)Centro de Malária e outras Doenças Tropicais (CMDT)RUNPineda-Peña,,  A.C.,Faria, Nuno RodriguesImbrechts, StijnLibin,, P.Abecasis, AnaDeforche, KRuano Gómez , LorenaGomez-Lopez, ArleyCamacho, Ricardo Jorgede Oliveira, T.Vandamme, Anne-mieke2021-05-04T22:32:12Z2013-01-012013-01-01T00:00:00Zinfo:eu-repo/semantics/publishedVersioninfo:eu-repo/semantics/articleapplication/pdfhttp://hdl.handle.net/10362/116966eng1567-7257PURE: 182640https://doi.org/10.1016/j.meegid.2013.04.032info:eu-repo/semantics/openAccessreponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informaçãoinstacron:RCAAP2024-03-11T04:59:53Zoai:run.unl.pt:10362/116966Portal AgregadorONGhttps://www.rcaap.pt/oai/openaireopendoar:71602024-03-20T03:43:24.533175Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informaçãofalse
dc.title.none.fl_str_mv Automated subtyping of HIV-1 genetic sequences for clinical and surveillance purposes: performance evaluation of the new REGA version 3 and seven other tools.
title Automated subtyping of HIV-1 genetic sequences for clinical and surveillance purposes: performance evaluation of the new REGA version 3 and seven other tools.
spellingShingle Automated subtyping of HIV-1 genetic sequences for clinical and surveillance purposes: performance evaluation of the new REGA version 3 and seven other tools.
Pineda-Peña,,  A.C.,
Genetics
Software
Ecology, Evolution, Behavior and Systematics
Infectious Diseases
SDG 3 - Good Health and Well-being
title_short Automated subtyping of HIV-1 genetic sequences for clinical and surveillance purposes: performance evaluation of the new REGA version 3 and seven other tools.
title_full Automated subtyping of HIV-1 genetic sequences for clinical and surveillance purposes: performance evaluation of the new REGA version 3 and seven other tools.
title_fullStr Automated subtyping of HIV-1 genetic sequences for clinical and surveillance purposes: performance evaluation of the new REGA version 3 and seven other tools.
title_full_unstemmed Automated subtyping of HIV-1 genetic sequences for clinical and surveillance purposes: performance evaluation of the new REGA version 3 and seven other tools.
title_sort Automated subtyping of HIV-1 genetic sequences for clinical and surveillance purposes: performance evaluation of the new REGA version 3 and seven other tools.
author Pineda-Peña,,  A.C.,
author_facet Pineda-Peña,,  A.C.,
Faria, Nuno Rodrigues
Imbrechts, Stijn
Libin,, P.
Abecasis, Ana
Deforche, K
Ruano Gómez , Lorena
Gomez-Lopez, Arley
Camacho, Ricardo Jorge
de Oliveira, T.
Vandamme, Anne-mieke
author_role author
author2 Faria, Nuno Rodrigues
Imbrechts, Stijn
Libin,, P.
Abecasis, Ana
Deforche, K
Ruano Gómez , Lorena
Gomez-Lopez, Arley
Camacho, Ricardo Jorge
de Oliveira, T.
Vandamme, Anne-mieke
author2_role author
author
author
author
author
author
author
author
author
author
dc.contributor.none.fl_str_mv Instituto de Higiene e Medicina Tropical (IHMT)
Centro de Malária e outras Doenças Tropicais (CMDT)
RUN
dc.contributor.author.fl_str_mv Pineda-Peña,,  A.C.,
Faria, Nuno Rodrigues
Imbrechts, Stijn
Libin,, P.
Abecasis, Ana
Deforche, K
Ruano Gómez , Lorena
Gomez-Lopez, Arley
Camacho, Ricardo Jorge
de Oliveira, T.
Vandamme, Anne-mieke
dc.subject.por.fl_str_mv Genetics
Software
Ecology, Evolution, Behavior and Systematics
Infectious Diseases
SDG 3 - Good Health and Well-being
topic Genetics
Software
Ecology, Evolution, Behavior and Systematics
Infectious Diseases
SDG 3 - Good Health and Well-being
description BACKGROUND To investigate differences in pathogenesis, diagnosis and resistance pathways between HIV-1 subtypes, an accurate subtyping tool for large datasets is needed. We aimed to evaluate the performance of automated subtyping tools to classify the different subtypes and circulating recombinant forms using pol, the most sequenced region in clinical practice. We also present the upgraded version 3 of the Rega HIV subtyping tool (REGAv3). METHODOLOGY HIV-1 pol sequences (PR+RT) for 4674 patients retrieved from the Portuguese HIV Drug Resistance Database, and 1872 pol sequences trimmed from full-length genomes retrieved from the Los Alamos database were classified with statistical-based tools such as COMET, jpHMM and STAR; similarity-based tools such as NCBI and Stanford; and phylogenetic-based tools such as REGA version 2 (REGAv2), REGAv3, and SCUEAL. The performance of these tools, for pol, and for PR and RT separately, was compared in terms of reproducibility, sensitivity and specificity with respect to the gold standard which was manual phylogenetic analysis of the pol region. RESULTS The sensitivity and specificity for subtypes B and C was more than 96% for seven tools, but was variable for other subtypes such as A, D, F and G. With regard to the most common circulating recombinant forms (CRFs), the sensitivity and specificity for CRF01_AE was ~99% with statistical-based tools, with phylogenetic-based tools and with Stanford, one of the similarity based tools. CRF02_AG was correctly identified for more than 96% by COMET, REGAv3, Stanford and STAR. All the tools reached a specificity of more than 97% for most of the subtypes and the two main CRFs (CRF01_AE and CRF02_AG). Other CRFs were identified only by COMET, REGAv2, REGAv3, and SCUEAL and with variable sensitivity. When analyzing sequences for PR and RT separately, the performance for PR was generally lower and variable between the tools. Similarity and statistical-based tools were 100% reproducible, but this was lower for phylogenetic-based tools such as REGA (~99%) and SCUEAL (~96%). CONCLUSIONS REGAv3 had an improved performance for subtype B and CRF02_AG compared to REGAv2 and is now able to also identify all epidemiologically relevant CRFs. In general the best performing tools, in alphabetical order, were COMET, jpHMM, REGAv3, and SCUEAL when analyzing pure subtypes in the pol region, and COMET and REGAv3 when analyzing most of the CRFs. Based on this study, we recommend to confirm subtyping with 2 well performing tools, and be cautious with the interpretation of short sequences.
publishDate 2013
dc.date.none.fl_str_mv 2013-01-01
2013-01-01T00:00:00Z
2021-05-04T22:32:12Z
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/article
format article
status_str publishedVersion
dc.identifier.uri.fl_str_mv http://hdl.handle.net/10362/116966
url http://hdl.handle.net/10362/116966
dc.language.iso.fl_str_mv eng
language eng
dc.relation.none.fl_str_mv 1567-7257
PURE: 182640
https://doi.org/10.1016/j.meegid.2013.04.032
dc.rights.driver.fl_str_mv info:eu-repo/semantics/openAccess
eu_rights_str_mv openAccess
dc.format.none.fl_str_mv application/pdf
dc.source.none.fl_str_mv reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron:RCAAP
instname_str Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron_str RCAAP
institution RCAAP
reponame_str Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
collection Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
repository.name.fl_str_mv Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
repository.mail.fl_str_mv
_version_ 1799138043958919168