Automatic classification of written descriptions by healthy adults: An overview of the application of natural language processing and machine learning techniques to clinical discourse analysis
Main author: | Toledo, Cíntia Matsuda |
---|---|
Publication date: | 2014 |
Other authors: | Cunha, Andre; Scarton, Carolina; Aluísio, Sandra |
Document type: | Article |
Language: | eng |
Source title: | Dementia & Neuropsychologia |
Full text: | http://old.scielo.br/scielo.php?script=sci_arttext&pid=S1980-57642014000300227 |
Abstract: | Discourse production is an important aspect in the evaluation of brain-injured individuals. We believe that studies comparing the performance of brain-injured subjects with that of healthy controls must use groups with compatible education. A pioneering application of machine learning methods using Brazilian Portuguese for clinical purposes is described, highlighting education as an important variable in the Brazilian scenario. OBJECTIVE: The aims were to describe how to: (i) develop machine learning classifiers that use features generated by natural language processing tools to sort descriptions produced by healthy individuals into classes based on their years of education; and (ii) automatically identify the features that best distinguish the groups. METHODS: The proposed approach extracts linguistic features automatically from the written descriptions with the aid of two natural language processing tools: Coh-Metrix-Port and AIC. It also includes nine task-specific features (three new ones, two extracted manually, besides description time; type of scene described - simple or complex; presentation order - which type of picture was described first; and age). This study used the descriptions by 144 of the subjects studied in Toledo [18], a sample that included 200 healthy Brazilians of both genders. RESULTS AND CONCLUSION: A Support Vector Machine (SVM) with a radial basis function (RBF) kernel is the most recommended approach for the binary classification of our data, classifying three of the four initial classes. CfsSubsetEval (CFS) is a strong candidate to replace manual feature selection methods. |
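
The abstract names correlation-based feature selection (CFS) as a replacement for manual feature selection. As a rough illustration of the idea only, the sketch below scores a feature subset with the standard CFS merit, k * r_cf / sqrt(k + k(k-1) * r_ff), and grows the subset greedily. It is not the study's pipeline and does not reproduce Weka's CfsSubsetEval: Pearson correlation is used as a simple stand-in for the association measure, the greedy forward search is an assumption, and the feature names and values (`word_count`, `token_count`, `noise`) are hypothetical toy data.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def cfs_merit(columns, labels, subset):
    """CFS merit of a subset: high feature-class correlation, low redundancy."""
    k = len(subset)
    if k == 0:
        return 0.0
    # Mean absolute feature-class correlation.
    r_cf = sum(abs(pearson(columns[f], labels)) for f in subset) / k
    # Mean absolute feature-feature correlation over all pairs in the subset.
    pairs = [(a, b) for i, a in enumerate(subset) for b in subset[i + 1:]]
    r_ff = 0.0
    if pairs:
        r_ff = sum(abs(pearson(columns[a], columns[b])) for a, b in pairs) / len(pairs)
    return k * r_cf / math.sqrt(k + k * (k - 1) * r_ff)


def greedy_cfs(columns, labels):
    """Forward selection: add the feature that most improves merit, stop when none does."""
    selected, best = [], 0.0
    remaining = sorted(columns)
    while remaining:
        merit, name = max((cfs_merit(columns, labels, selected + [f]), f) for f in remaining)
        if merit <= best:
            break
        selected.append(name)
        remaining.remove(name)
        best = merit
    return selected, best


# Hypothetical toy data: two education classes (0/1) and three made-up features.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
columns = {
    "word_count": [1, 2, 3, 4, 5, 6, 7, 8],      # informative
    "token_count": [1, 2, 3, 5, 4, 6, 7, 8],     # informative but nearly redundant
    "noise": [1, -1, -1, 1, 1, -1, -1, 1],       # uncorrelated with the class
}

selected, best = greedy_cfs(columns, labels)
print(selected, round(best, 3))
```

On this toy data the search keeps `word_count` and rejects both the near-duplicate `token_count` and the uninformative `noise`, which is the behaviour the merit formula is designed to reward. The selected columns would then be passed to a classifier such as the RBF-kernel SVM mentioned in the abstract.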
id |
ANCC-1_bc9045c79b23f60a24be878d83ae5971 |
---|---|
oai_identifier_str |
oai:scielo:S1980-57642014000300227 |
network_acronym_str |
ANCC-1 |
network_name_str |
Dementia & Neuropsychologia |
repository_id_str |
|
dc.title.none.fl_str_mv |
Automatic classification of written descriptions by healthy adults: An overview of the application of natural language processing and machine learning techniques to clinical discourse analysis |
author |
Toledo, Cíntia Matsuda |
author_facet |
Toledo, Cíntia Matsuda; Cunha, Andre; Scarton, Carolina; Aluísio, Sandra |
author_role |
author |
author2 |
Cunha, Andre; Scarton, Carolina; Aluísio, Sandra |
author2_role |
author author author |
dc.contributor.author.fl_str_mv |
Toledo, Cíntia Matsuda; Cunha, Andre; Scarton, Carolina; Aluísio, Sandra |
dc.subject.por.fl_str_mv |
natural language processing; language tests; narratives; adults; educational status; age groups |
topic |
natural language processing; language tests; narratives; adults; educational status; age groups |
publishDate |
2014 |
dc.date.none.fl_str_mv |
2014-09-01 |
dc.type.driver.fl_str_mv |
info:eu-repo/semantics/article |
dc.type.status.fl_str_mv |
info:eu-repo/semantics/publishedVersion |
format |
article |
status_str |
publishedVersion |
dc.identifier.uri.fl_str_mv |
http://old.scielo.br/scielo.php?script=sci_arttext&pid=S1980-57642014000300227 |
url |
http://old.scielo.br/scielo.php?script=sci_arttext&pid=S1980-57642014000300227 |
dc.language.iso.fl_str_mv |
eng |
language |
eng |
dc.relation.none.fl_str_mv |
10.1590/S1980-57642014DN83000006 |
dc.rights.driver.fl_str_mv |
info:eu-repo/semantics/openAccess |
eu_rights_str_mv |
openAccess |
dc.format.none.fl_str_mv |
text/html |
dc.publisher.none.fl_str_mv |
Academia Brasileira de Neurologia, Departamento de Neurologia Cognitiva e Envelhecimento |
publisher.none.fl_str_mv |
Academia Brasileira de Neurologia, Departamento de Neurologia Cognitiva e Envelhecimento |
dc.source.none.fl_str_mv |
Dementia & Neuropsychologia v.8 n.3 2014 reponame:Dementia & Neuropsychologia instname:Associação de Neurologia Cognitiva e do Comportamento (ANCC) instacron:ANCC |
instname_str |
Associação de Neurologia Cognitiva e do Comportamento (ANCC) |
instacron_str |
ANCC |
institution |
ANCC |
reponame_str |
Dementia & Neuropsychologia |
collection |
Dementia & Neuropsychologia |
repository.name.fl_str_mv |
Dementia & Neuropsychologia - Associação de Neurologia Cognitiva e do Comportamento (ANCC) |
repository.mail.fl_str_mv |
demneuropsy@uol.com.br |
_version_ |
1754212930981724160 |