Exploring events and distributed representations of text in multi-document summarization

Bibliographic details
Main author: Marujo, L.
Publication date: 2016
Other authors: Ling, W., Ribeiro, R., Gershman, A., Carbonell, J., de Matos, D., Neto, J. P.
Document type: Article
Language: eng
Source title: Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
Full text: http://hdl.handle.net/10071/11022
Abstract: In this article, we explore an event detection framework to improve multi-document summarization. Our approach is based on a two-stage single-document method that extracts a collection of key phrases, which are then used in a centrality-as-relevance passage retrieval model. We explore how to adapt this single-document method for multi-document summarization methods that are able to use event information. The event detection method is based on Fuzzy Fingerprint, which is a supervised method trained on documents with annotated event tags. To cope with the possible usage of different terms to describe the same event, we explore distributed representations of text in the form of word embeddings, which contributed to improving the summarization results. The proposed summarization methods are based on the hierarchical combination of single-document summaries. The automatic evaluation and human study performed show that these methods improve upon current state-of-the-art multi-document summarization systems on two mainstream evaluation datasets, DUC 2007 and TAC 2009. We show a relative improvement in ROUGE-1 scores of 16% for TAC 2009 and of 17% for DUC 2007.
id RCAP_cafc6c65b6840fe9c93e159cd077159b
oai_identifier_str oai:repositorio.iscte-iul.pt:10071/11022
network_acronym_str RCAP
network_name_str Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
repository_id_str 7160
dc.subject.por.fl_str_mv Multi-document summarization
Extractive summarization
Event detection
Distributed representations of text
publishDate 2016
dc.date.none.fl_str_mv 2016-03-04T15:03:33Z
2016-01-01T00:00:00Z
2016
2019-03-28T16:29:57Z
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/article
dc.identifier.uri.fl_str_mv http://hdl.handle.net/10071/11022
url http://hdl.handle.net/10071/11022
dc.language.iso.fl_str_mv eng
language eng
dc.relation.none.fl_str_mv 0950-7051
10.1016/j.knosys.2015.11.005
dc.rights.driver.fl_str_mv info:eu-repo/semantics/openAccess
eu_rights_str_mv openAccess
dc.format.none.fl_str_mv application/pdf
dc.publisher.none.fl_str_mv Elsevier Science BV
dc.source.none.fl_str_mv reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron:RCAAP