Exploring events and distributed representations of text in multi-document summarization
Main author: | Marujo, L. |
---|---|
Publication date: | 2016 |
Other authors: | Ling, W.; Ribeiro, R.; Gershman, A.; Carbonell, J.; de Matos, D.; Neto, J. P. |
Document type: | Article |
Language: | eng |
Source title: | Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos) |
Full text: | http://hdl.handle.net/10071/11022 |
Abstract: | In this article, we explore an event detection framework to improve multi-document summarization. Our approach is based on a two-stage single-document method that extracts a collection of key phrases, which are then used in a centrality-as-relevance passage retrieval model. We explore how to adapt this single-document method into multi-document summarization methods that can use event information. The event detection method is based on Fuzzy Fingerprint, a supervised method trained on documents with annotated event tags. To cope with the use of different terms to describe the same event, we explore distributed representations of text in the form of word embeddings, which contributed to improving the summarization results. The proposed summarization methods are based on the hierarchical combination of single-document summaries. The automatic evaluation and the human study show that these methods improve upon current state-of-the-art multi-document summarization systems on two mainstream evaluation datasets, DUC 2007 and TAC 2009. We show a relative improvement in ROUGE-1 scores of 16% on TAC 2009 and 17% on DUC 2007. |
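The two-stage idea in the abstract (extract key phrases, then rank passages by centrality) can be illustrated with a small, hypothetical sketch. It assumes a support-set view of centrality: every passage, and every key phrase added to the pool as a pseudo-passage, votes for its `k` most similar passages, and the passages collecting the most votes are treated as the most central. Bag-of-words cosine similarity, the parameter `k`, and the names `tokenize`, `cosine`, and `rank_passages` are illustrative assumptions, not the authors' implementation; key phrase extraction, Fuzzy Fingerprint event detection, and the hierarchical multi-document combination are not shown.

```python
# Hypothetical sketch of centrality-as-relevance passage ranking with key phrases.
# Assumptions (not from the paper): bag-of-words cosine similarity, a fixed
# support-set size k, and key phrases treated as extra pseudo-passages.
import math
import re
from collections import Counter


def tokenize(text: str) -> Counter:
    """Lowercased word counts; a stand-in for real preprocessing."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_passages(passages: list[str], key_phrases: list[str], k: int = 2) -> list[int]:
    """Return passage indices ordered from most to least central."""
    n = len(passages)
    # Key phrases join the pool after the real passages, so they can vote
    # but are never themselves ranked.
    pool = [tokenize(p) for p in passages] + [tokenize(kp) for kp in key_phrases]
    votes = [0] * n
    for i, item in enumerate(pool):
        # Support set of item i: its k most similar real passages, excluding itself.
        sims = sorted(
            ((cosine(item, pool[j]), j) for j in range(n) if j != i),
            key=lambda s: (-s[0], s[1]),
        )
        for score, j in sims[:k]:
            if score > 0:  # ignore passages with no lexical overlap
                votes[j] += 1
    # More support-set memberships means more central; ties keep input order.
    return sorted(range(n), key=lambda j: (-votes[j], j))


if __name__ == "__main__":
    docs = [
        "The earthquake struck the coastal city on Monday.",
        "Rescue teams searched collapsed buildings after the earthquake.",
        "The local football team won its match.",
    ]
    print(rank_passages(docs, [], k=1))                              # [0, 1, 2]
    print(rank_passages(docs, ["earthquake", "rescue teams"], k=1))  # [1, 0, 2]
```

In this toy run the key phrases shift the top-ranked passage from the first sentence to the rescue sentence that covers them. Exact word matching also treats "team" and "teams" as unrelated; that kind of lexical mismatch is what the abstract says word embeddings were used to mitigate, and in this sketch it would mean swapping `cosine` over counts for a similarity over embedding vectors.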
id | RCAP_cafc6c65b6840fe9c93e159cd077159b |
---|---|
oai_identifier_str | oai:repositorio.iscte-iul.pt:10071/11022 |
network_acronym_str | RCAP |
repository_id_str | 7160 |
dc.contributor.author.fl_str_mv | Marujo, L.; Ling, W.; Ribeiro, R.; Gershman, A.; Carbonell, J.; de Matos, D.; Neto, J. P. |
dc.subject.por.fl_str_mv | Multi-document summarization; Extractive summarization; Event detection; Distributed representations of text |
dc.date.none.fl_str_mv | 2016-03-04T15:03:33Z; 2016-01-01T00:00:00Z; 2016; 2019-03-28T16:29:57Z |
dc.type.status.fl_str_mv | info:eu-repo/semantics/publishedVersion |
dc.type.driver.fl_str_mv | info:eu-repo/semantics/article |
dc.relation.none.fl_str_mv | ISSN 0950-7051; DOI 10.1016/j.knosys.2015.11.005 |
dc.rights.driver.fl_str_mv | info:eu-repo/semantics/openAccess |
dc.format.none.fl_str_mv | application/pdf |
dc.publisher.none.fl_str_mv | Elsevier Science BV |
dc.source.none.fl_str_mv | reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos); instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação; instacron:RCAAP |