Automatic classification of written descriptions by healthy adults: An overview of the application of natural language processing and machine learning techniques to clinical discourse analysis

Bibliographic details
Main author: Toledo, Cíntia Matsuda
Publication date: 2014
Other authors: Cunha, Andre; Scarton, Carolina; Aluísio, Sandra
Document type: Article
Language: eng
Source title: Dementia & Neuropsychologia
Full text: http://old.scielo.br/scielo.php?script=sci_arttext&pid=S1980-57642014000300227
Abstract: Discourse production is an important aspect in the evaluation of brain-injured individuals. We believe that studies comparing the performance of brain-injured subjects with that of healthy controls must use groups with compatible education. A pioneering application of machine learning methods using Brazilian Portuguese for clinical purposes is described, highlighting education as an important variable in the Brazilian scenario. OBJECTIVE: The aims were to describe how to: (i) develop machine learning classifiers using features generated by natural language processing tools to classify descriptions produced by healthy individuals into classes based on their years of education; and (ii) automatically identify the features that best distinguish the groups. METHODS: The proposed approach extracts linguistic features automatically from the written descriptions with the aid of two natural language processing tools: Coh-Metrix-Port and AIC. It also includes nine task-specific features: three new ones, two extracted manually, and four others (description time; type of scene described, simple or complex; presentation order, i.e., which type of picture was described first; and age). This study used the descriptions written by 144 of the 200 healthy Brazilians of both genders studied in Toledo (reference 18). RESULTS AND CONCLUSION: A Support Vector Machine (SVM) with a radial basis function (RBF) kernel is the most recommended approach for the binary classification of our data, classifying three of the four initial classes. CfsSubsetEval (CFS) is a strong candidate to replace manual feature selection methods.
id ANCC-1_bc9045c79b23f60a24be878d83ae5971
oai_identifier_str oai:scielo:S1980-57642014000300227
network_acronym_str ANCC-1
network_name_str Dementia & Neuropsychologia
dc.contributor.author.fl_str_mv Toledo,Cíntia Matsuda
Cunha,Andre
Scarton,Carolina
Aluísio,Sandra
dc.subject.por.fl_str_mv natural language processing
language tests
narratives
adults
educational status
age groups
publishDate 2014
dc.date.none.fl_str_mv 2014-09-01
dc.type.driver.fl_str_mv info:eu-repo/semantics/article
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
format article
status_str publishedVersion
dc.identifier.uri.fl_str_mv http://old.scielo.br/scielo.php?script=sci_arttext&pid=S1980-57642014000300227
dc.language.iso.fl_str_mv eng
language eng
dc.relation.none.fl_str_mv 10.1590/S1980-57642014DN83000006
dc.rights.driver.fl_str_mv info:eu-repo/semantics/openAccess
eu_rights_str_mv openAccess
dc.format.none.fl_str_mv text/html
dc.publisher.none.fl_str_mv Academia Brasileira de Neurologia, Departamento de Neurologia Cognitiva e Envelhecimento
dc.source.none.fl_str_mv Dementia & Neuropsychologia v.8 n.3 2014
reponame:Dementia & Neuropsychologia
instname:Associação de Neurologia Cognitiva e do Comportamento (ANCC)
instacron:ANCC
repository.name.fl_str_mv Dementia & Neuropsychologia - Associação de Neurologia Cognitiva e do Comportamento (ANCC)
repository.mail.fl_str_mv demneuropsy@uol.com.br
_version_ 1754212930981724160