On URL and content persistence
Autor(a) principal: | |
---|---|
Data de Publicação: | 2005 |
Outros Autores: | |
Tipo de documento: | Relatório |
Idioma: | por |
Título da fonte: | Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
Texto Completo: | http://hdl.handle.net/10451/14153 |
Resumo: | This report presents a study of URL and content persistence among 51 million pages from a national web harvested 8 times over almost 3 years. This study differs from previous ones because it describes the evolution of a large set of web pages for several years, studying in depth the characteristics of persistent data. We found that the persistence of URLs and contents follows a logarithmic distribution. We characterized persistent URLs and contents, and identified reasons for URL death. We found that lasting contents tend to be referenced by different URLs during their lifetime. On the other hand, half of the contents referenced by persistent URLs did not change |
id |
RCAP_cffcf6dc9ccdfd1c85d537cc6a2048ce |
---|---|
oai_identifier_str |
oai:repositorio.ul.pt:10451/14153 |
network_acronym_str |
RCAP |
network_name_str |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
repository_id_str |
7160 |
spelling |
On URL and content persistenceURL persistencecontent persistencetombaThis report presents a study of URL and content persistence among 51 million pages from a national web harvested 8 times over almost 3 years. This study differs from previous ones because it describes the evolution of a large set of web pages for several years, studying in depth the characteristics of persistent data. We found that the persistence of URLs and contents follows a logarithmic distribution. We characterized persistent URLs and contents, and identified reasons for URL death. We found that lasting contents tend to be referenced by different URLs during their lifetime. On the other hand, half of the contents referenced by persistent URLs did not changeDepartment of Informatics, University of LisbonRepositório da Universidade de LisboaGomes, DanielSilva, Mário J.2009-02-10T13:12:02Z2005-122005-12-01T00:00:00Zinfo:eu-repo/semantics/publishedVersioninfo:eu-repo/semantics/reportapplication/pdfhttp://hdl.handle.net/10451/14153porinfo:eu-repo/semantics/openAccessreponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informaçãoinstacron:RCAAP2023-11-08T15:59:47Zoai:repositorio.ul.pt:10451/14153Portal AgregadorONGhttps://www.rcaap.pt/oai/openaireopendoar:71602024-03-19T21:35:59.785756Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informaçãofalse |
dc.title.none.fl_str_mv |
On URL and content persistence |
title |
On URL and content persistence |
spellingShingle |
On URL and content persistence Gomes, Daniel URL persistence content persistence tomba |
title_short |
On URL and content persistence |
title_full |
On URL and content persistence |
title_fullStr |
On URL and content persistence |
title_full_unstemmed |
On URL and content persistence |
title_sort |
On URL and content persistence |
author |
Gomes, Daniel |
author_facet |
Gomes, Daniel Silva, Mário J. |
author_role |
author |
author2 |
Silva, Mário J. |
author2_role |
author |
dc.contributor.none.fl_str_mv |
Repositório da Universidade de Lisboa |
dc.contributor.author.fl_str_mv |
Gomes, Daniel Silva, Mário J. |
dc.subject.por.fl_str_mv |
URL persistence content persistence tomba |
topic |
URL persistence content persistence tomba |
description |
This report presents a study of URL and content persistence among 51 million pages from a national web harvested 8 times over almost 3 years. This study differs from previous ones because it describes the evolution of a large set of web pages for several years, studying in depth the characteristics of persistent data. We found that the persistence of URLs and contents follows a logarithmic distribution. We characterized persistent URLs and contents, and identified reasons for URL death. We found that lasting contents tend to be referenced by different URLs during their lifetime. On the other hand, half of the contents referenced by persistent URLs did not change |
publishDate |
2005 |
dc.date.none.fl_str_mv |
2005-12 2005-12-01T00:00:00Z 2009-02-10T13:12:02Z |
dc.type.status.fl_str_mv |
info:eu-repo/semantics/publishedVersion |
dc.type.driver.fl_str_mv |
info:eu-repo/semantics/report |
format |
report |
status_str |
publishedVersion |
dc.identifier.uri.fl_str_mv |
http://hdl.handle.net/10451/14153 |
url |
http://hdl.handle.net/10451/14153 |
dc.language.iso.fl_str_mv |
por |
language |
por |
dc.rights.driver.fl_str_mv |
info:eu-repo/semantics/openAccess |
eu_rights_str_mv |
openAccess |
dc.format.none.fl_str_mv |
application/pdf |
dc.publisher.none.fl_str_mv |
Department of Informatics, University of Lisbon |
publisher.none.fl_str_mv |
Department of Informatics, University of Lisbon |
dc.source.none.fl_str_mv |
reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação instacron:RCAAP |
instname_str |
Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação |
instacron_str |
RCAAP |
institution |
RCAAP |
reponame_str |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
collection |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
repository.name.fl_str_mv |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação |
repository.mail.fl_str_mv |
|
_version_ |
1799134258566004736 |