Term Weighting based on Document Revision History
Autor(a) principal: | |
---|---|
Data de Publicação: | 2011 |
Outros Autores: | , |
Tipo de documento: | Artigo |
Idioma: | eng |
Título da fonte: | Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
Texto Completo: | http://repositorio.inesctec.pt/handle/123456789/2237 http://dx.doi.org/10.1002/asi.21597 |
Resumo: | In real-world information retrieval systems, the underlying document collection is rarely stable or definitive. This work is focused on the study of signals extracted from the content of documents at different points in time for the purpose of weighting individual terms in a document. The basic idea behind our proposals is that terms that have existed for a longer time in a document should have a greater weight. We propose 4 term weighting functions that use each document's history to estimate a current term score. To evaluate this thesis, we conduct 3 independent experiments using a collection of documents sampled from Wikipedia. In the first experiment, we use data from Wikipedia to judge each set of terms. In a second experiment, we use an external collection of tags from a popular social bookmarking service as a gold standard. In the third experiment, we crowdsource user judgments to collect feedback on term preference. Across all experiments results consistently support our [...] |
id |
RCAP_506937006da44b7b97929176eea9ba9c |
---|---|
oai_identifier_str |
oai:repositorio.inesctec.pt:123456789/2237 |
network_acronym_str |
RCAP |
network_name_str |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
repository_id_str |
7160 |
spelling |
Term Weighting based on Document Revision HistoryIn real-world information retrieval systems, the underlying document collection is rarely stable or definitive. This work is focused on the study of signals extracted from the content of documents at different points in time for the purpose of weighting individual terms in a document. The basic idea behind our proposals is that terms that have existed for a longer time in a document should have a greater weight. We propose 4 term weighting functions that use each document's history to estimate a current term score. To evaluate this thesis, we conduct 3 independent experiments using a collection of documents sampled from Wikipedia. In the first experiment, we use data from Wikipedia to judge each set of terms. In a second experiment, we use an external collection of tags from a popular social bookmarking service as a gold standard. In the third experiment, we crowdsource user judgments to collect feedback on term preference. Across all experiments results consistently support our [...]2017-11-16T13:23:31Z2011-01-01T00:00:00Z2011info:eu-repo/semantics/publishedVersioninfo:eu-repo/semantics/articleapplication/pdfhttp://repositorio.inesctec.pt/handle/123456789/2237http://dx.doi.org/10.1002/asi.21597engSérgio NunesCristina RibeiroGabriel Davidinfo:eu-repo/semantics/openAccessreponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informaçãoinstacron:RCAAP2023-05-15T10:19:54Zoai:repositorio.inesctec.pt:123456789/2237Portal AgregadorONGhttps://www.rcaap.pt/oai/openaireopendoar:71602024-03-19T17:52:23.943902Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informaçãofalse |
dc.title.none.fl_str_mv |
Term Weighting based on Document Revision History |
title |
Term Weighting based on Document Revision History |
spellingShingle |
Term Weighting based on Document Revision History Sérgio Nunes |
title_short |
Term Weighting based on Document Revision History |
title_full |
Term Weighting based on Document Revision History |
title_fullStr |
Term Weighting based on Document Revision History |
title_full_unstemmed |
Term Weighting based on Document Revision History |
title_sort |
Term Weighting based on Document Revision History |
author |
Sérgio Nunes |
author_facet |
Sérgio Nunes Cristina Ribeiro Gabriel David |
author_role |
author |
author2 |
Cristina Ribeiro Gabriel David |
author2_role |
author author |
dc.contributor.author.fl_str_mv |
Sérgio Nunes Cristina Ribeiro Gabriel David |
description |
In real-world information retrieval systems, the underlying document collection is rarely stable or definitive. This work is focused on the study of signals extracted from the content of documents at different points in time for the purpose of weighting individual terms in a document. The basic idea behind our proposals is that terms that have existed for a longer time in a document should have a greater weight. We propose 4 term weighting functions that use each document's history to estimate a current term score. To evaluate this thesis, we conduct 3 independent experiments using a collection of documents sampled from Wikipedia. In the first experiment, we use data from Wikipedia to judge each set of terms. In a second experiment, we use an external collection of tags from a popular social bookmarking service as a gold standard. In the third experiment, we crowdsource user judgments to collect feedback on term preference. Across all experiments results consistently support our [...] |
publishDate |
2011 |
dc.date.none.fl_str_mv |
2011-01-01T00:00:00Z 2011 2017-11-16T13:23:31Z |
dc.type.status.fl_str_mv |
info:eu-repo/semantics/publishedVersion |
dc.type.driver.fl_str_mv |
info:eu-repo/semantics/article |
format |
article |
status_str |
publishedVersion |
dc.identifier.uri.fl_str_mv |
http://repositorio.inesctec.pt/handle/123456789/2237 http://dx.doi.org/10.1002/asi.21597 |
url |
http://repositorio.inesctec.pt/handle/123456789/2237 http://dx.doi.org/10.1002/asi.21597 |
dc.language.iso.fl_str_mv |
eng |
language |
eng |
dc.rights.driver.fl_str_mv |
info:eu-repo/semantics/openAccess |
eu_rights_str_mv |
openAccess |
dc.format.none.fl_str_mv |
application/pdf |
dc.source.none.fl_str_mv |
reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação instacron:RCAAP |
instname_str |
Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação |
instacron_str |
RCAAP |
institution |
RCAAP |
reponame_str |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
collection |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
repository.name.fl_str_mv |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação |
repository.mail.fl_str_mv |
|
_version_ |
1799131600005365760 |