Integrating Manual and Automatic Annotation for the Creation of Discourse Network Data Sets

Bibliographic details
Main author: Haunss, Sebastian
Publication date: 2020
Other authors: Kuhn, Jonas; Padó, Sebastian; Blessing, Andre; Blokker, Nico; Dayanik, Erenay; Lapesa, Gabriella
Document type: Article
Language: English
Source title: Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos)
Full text: https://doi.org/10.17645/pag.v8i2.2591
Abstract: This article investigates the integration of machine learning into the political claim annotation workflow, with the goal of partially automating the annotation and analysis of large text corpora. It introduces the MARDY annotation environment and presents results from an experiment that compares the annotation quality of annotators with and without machine-learning-based annotation support. The design and setting aim to measure and evaluate: a) annotation speed; b) annotation quality; and c) applicability to the use case of discourse network generation. While the results indicate only slight increases in annotation speed, the authors find a moderate boost in annotation quality. Additionally, with manual annotation of the actors and filtering out of false positives, the machine-learning-based annotation suggestions allow the authors to fully recover the core network of the discourse as extracted from the articles annotated during the experiment. This is due to the redundancy naturally present in the annotated texts. Thus, assuming a research focus not on the complete network but on the network core, AI-based annotation can provide reliable information about discourse networks with much less human intervention than the traditional manual approach.
Keywords: annotation; automation; discourse networks; machine learning; migration discourse
Journal: Politics and Governance, Vol. 8, No. 2 (2020): Policy Debates and Discourse Network Analysis, pp. 326-339
ISSN: 2183-2463
Publisher: Cogitatio
Date: 2020-06-02
Status: Published version
Format: PDF
Related links:
https://www.cogitatiopress.com/politicsandgovernance/article/view/2591
https://www.cogitatiopress.com/politicsandgovernance/article/view/2591/2591
https://www.cogitatiopress.com/politicsandgovernance/article/downloadSuppFile/2591/1112
Rights: Copyright (c) 2020 Sebastian Haunss, Jonas Kuhn, Sebastian Padó, Andre Blessing, Nico Blokker, Erenay Dayanik, Gabriella Lapesa; licensed under CC BY 4.0 (http://creativecommons.org/licenses/by/4.0); open access
OAI identifier: oai:ojs.cogitatiopress.com:article/2591