Integrating Manual and Automatic Annotation for the Creation of Discourse Network Data Sets
Main author: | Haunss, Sebastian |
---|---|
Publication date: | 2020 |
Other authors: | Kuhn, Jonas; Padó, Sebastian; Blessing, Andre; Blokker, Nico; Dayanik, Erenay; Lapesa, Gabriella |
Document type: | Article |
Language: | eng |
Source title: | Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos) |
Full text: | https://doi.org/10.17645/pag.v8i2.2591 |
Abstract: | This article investigates the integration of machine learning into the political claim annotation workflow, with the goal of partially automating the annotation and analysis of large text corpora. It introduces the MARDY annotation environment and presents results from an experiment comparing the annotation quality of annotators with and without machine-learning-based annotation support. The design and setting aim to measure and evaluate: a) annotation speed; b) annotation quality; and c) applicability to the use case of discourse network generation. While the results indicate only slight increases in annotation speed, the authors find a moderate boost in annotation quality. Additionally, with manual annotation of the actors and filtering out of false positives, the machine-learning-based annotation suggestions allow the authors to fully recover the core network of the discourse as extracted from the articles annotated during the experiment. This is due to the redundancy that is naturally present in the annotated texts. Thus, assuming a research focus not on the complete network but on the network core, AI-based annotation can provide reliable information about discourse networks with much less human intervention than the traditional manual approach. |
id |
RCAP_70bc0c3d8c86d9ea6bdc37a8c20840a6 |
---|---|
oai_identifier_str |
oai:ojs.cogitatiopress.com:article/2591 |
network_acronym_str |
RCAP |
network_name_str |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos) |
repository_id_str |
7160 |
spelling |
Integrating Manual and Automatic Annotation for the Creation of Discourse Network Data Sets; annotation; automation; discourse networks; machine learning; migration discourse; Cogitatio; 2020-06-02; info:eu-repo/semantics/publishedVersion; info:eu-repo/semantics/article; application/pdf; https://doi.org/10.17645/pag.v8i2.2591; oai:ojs.cogitatiopress.com:article/2591; Politics and Governance; Vol 8, No 2 (2020): Policy Debates and Discourse Network Analysis; 326-339; 2183-2463; reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos); instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação; instacron:RCAAP; eng; https://www.cogitatiopress.com/politicsandgovernance/article/view/2591; https://doi.org/10.17645/pag.v8i2.2591; https://www.cogitatiopress.com/politicsandgovernance/article/view/2591/2591; https://www.cogitatiopress.com/politicsandgovernance/article/downloadSuppFile/2591/1112; Copyright (c) 2020 Sebastian Haunss, Jonas Kuhn, Sebastian Padó, Andre Blessing, Nico Blokker, Erenay Dayanik, Gabriella Lapesa; http://creativecommons.org/licenses/by/4.0; info:eu-repo/semantics/openAccess; Haunss, Sebastian; Kuhn, Jonas; Padó, Sebastian; Blessing, Andre; Blokker, Nico; Dayanik, Erenay; Lapesa, Gabriella; 2022-10-21T16:03:53Z; oai:ojs.cogitatiopress.com:article/2591; Portal Agregador; ONG; https://www.rcaap.pt/oai/openaire; opendoar:7160; 2024-03-19T16:13:48.013068; Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação; false |
dc.title.none.fl_str_mv |
Integrating Manual and Automatic Annotation for the Creation of Discourse Network Data Sets |
title |
Integrating Manual and Automatic Annotation for the Creation of Discourse Network Data Sets |
spellingShingle |
Integrating Manual and Automatic Annotation for the Creation of Discourse Network Data Sets Haunss, Sebastian annotation; automation; discourse networks; machine learning; migration discourse |
title_short |
Integrating Manual and Automatic Annotation for the Creation of Discourse Network Data Sets |
title_full |
Integrating Manual and Automatic Annotation for the Creation of Discourse Network Data Sets |
title_fullStr |
Integrating Manual and Automatic Annotation for the Creation of Discourse Network Data Sets |
title_full_unstemmed |
Integrating Manual and Automatic Annotation for the Creation of Discourse Network Data Sets |
title_sort |
Integrating Manual and Automatic Annotation for the Creation of Discourse Network Data Sets |
author |
Haunss, Sebastian |
author_facet |
Haunss, Sebastian Kuhn, Jonas Padó, Sebastian Blessing, Andre Blokker, Nico Dayanik, Erenay Lapesa, Gabriella |
author_role |
author |
author2 |
Kuhn, Jonas Padó, Sebastian Blessing, Andre Blokker, Nico Dayanik, Erenay Lapesa, Gabriella |
author2_role |
author author author author author author |
dc.contributor.author.fl_str_mv |
Haunss, Sebastian Kuhn, Jonas Padó, Sebastian Blessing, Andre Blokker, Nico Dayanik, Erenay Lapesa, Gabriella |
dc.subject.por.fl_str_mv |
annotation; automation; discourse networks; machine learning; migration discourse |
topic |
annotation; automation; discourse networks; machine learning; migration discourse |
description |
This article investigates the integration of machine learning into the political claim annotation workflow, with the goal of partially automating the annotation and analysis of large text corpora. It introduces the MARDY annotation environment and presents results from an experiment comparing the annotation quality of annotators with and without machine-learning-based annotation support. The design and setting aim to measure and evaluate: a) annotation speed; b) annotation quality; and c) applicability to the use case of discourse network generation. While the results indicate only slight increases in annotation speed, the authors find a moderate boost in annotation quality. Additionally, with manual annotation of the actors and filtering out of false positives, the machine-learning-based annotation suggestions allow the authors to fully recover the core network of the discourse as extracted from the articles annotated during the experiment. This is due to the redundancy that is naturally present in the annotated texts. Thus, assuming a research focus not on the complete network but on the network core, AI-based annotation can provide reliable information about discourse networks with much less human intervention than the traditional manual approach. |
publishDate |
2020 |
dc.date.none.fl_str_mv |
2020-06-02 |
dc.type.status.fl_str_mv |
info:eu-repo/semantics/publishedVersion |
dc.type.driver.fl_str_mv |
info:eu-repo/semantics/article |
format |
article |
status_str |
publishedVersion |
dc.identifier.uri.fl_str_mv |
https://doi.org/10.17645/pag.v8i2.2591 oai:ojs.cogitatiopress.com:article/2591 |
url |
https://doi.org/10.17645/pag.v8i2.2591 |
identifier_str_mv |
oai:ojs.cogitatiopress.com:article/2591 |
dc.language.iso.fl_str_mv |
eng |
language |
eng |
dc.relation.none.fl_str_mv |
https://www.cogitatiopress.com/politicsandgovernance/article/view/2591 https://doi.org/10.17645/pag.v8i2.2591 https://www.cogitatiopress.com/politicsandgovernance/article/view/2591/2591 https://www.cogitatiopress.com/politicsandgovernance/article/downloadSuppFile/2591/1112 |
dc.rights.driver.fl_str_mv |
http://creativecommons.org/licenses/by/4.0 info:eu-repo/semantics/openAccess |
rights_invalid_str_mv |
http://creativecommons.org/licenses/by/4.0 |
eu_rights_str_mv |
openAccess |
dc.format.none.fl_str_mv |
application/pdf |
dc.publisher.none.fl_str_mv |
Cogitatio |
publisher.none.fl_str_mv |
Cogitatio |
dc.source.none.fl_str_mv |
Politics and Governance; Vol 8, No 2 (2020): Policy Debates and Discourse Network Analysis; 326-339 2183-2463 reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos) instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação instacron:RCAAP |
instname_str |
Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação |
instacron_str |
RCAAP |
institution |
RCAAP |
reponame_str |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos) |
collection |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos) |
repository.name.fl_str_mv |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação |
repository.mail.fl_str_mv |
|
_version_ |
1799130591275253760 |