Collecting Statistics about the Portuguese Web
Main author: | Gomes, Daniel |
---|---|
Publication date: | 2003 |
Other authors: | Silva, Mário J. |
Document type: | Report |
Language: | por |
Source title: | Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
Full text: | http://hdl.handle.net/10451/14211 |
Abstract: | This report presents a characterization of text documents from the Portuguese Web. This characterization was produced from a crawl of over 4 million URLs and 131 thousand sites in 2003. We describe the rules that we established for defining its boundaries and the methodology used to gather statistics. We also show how crawling constraints and abnormal situations on the Web can influence the results. |
id | RCAP_825f66214581aa0d6df3c23d9e6e5ca7 |
---|---|
oai_identifier_str | oai:repositorio.ul.pt:10451/14211 |
network_acronym_str | RCAP |
network_name_str | Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
repository_id_str | 7160 |
dc.title.none.fl_str_mv | Collecting Statistics about the Portuguese Web |
title | Collecting Statistics about the Portuguese Web |
spellingShingle | Collecting Statistics about the Portuguese Web Gomes, Daniel Web characterization Portuguese Portugal tumba! statistics crawling |
title_short | Collecting Statistics about the Portuguese Web |
title_full | Collecting Statistics about the Portuguese Web |
title_fullStr | Collecting Statistics about the Portuguese Web |
title_full_unstemmed | Collecting Statistics about the Portuguese Web |
title_sort | Collecting Statistics about the Portuguese Web |
author | Gomes, Daniel |
author_facet | Gomes, Daniel; Silva, Mário J. |
author_role | author |
author2 | Silva, Mário J. |
author2_role | author |
dc.contributor.none.fl_str_mv | Repositório da Universidade de Lisboa |
dc.contributor.author.fl_str_mv | Gomes, Daniel; Silva, Mário J. |
dc.subject.por.fl_str_mv | Web characterization Portuguese Portugal tumba! statistics crawling |
topic | Web characterization Portuguese Portugal tumba! statistics crawling |
description | This report presents a characterization of text documents from the Portuguese Web. This characterization was produced from a crawl of over 4 million URLs and 131 thousand sites in 2003. We describe the rules that we established for defining its boundaries and the methodology used to gather statistics. We also show how crawling constraints and abnormal situations on the Web can influence the results. |
publishDate | 2003 |
dc.date.none.fl_str_mv | 2003-06 2003-06-01T00:00:00Z 2009-02-10T13:11:41Z |
dc.type.status.fl_str_mv | info:eu-repo/semantics/publishedVersion |
dc.type.driver.fl_str_mv | info:eu-repo/semantics/report |
format | report |
status_str | publishedVersion |
dc.identifier.uri.fl_str_mv | http://hdl.handle.net/10451/14211 |
url | http://hdl.handle.net/10451/14211 |
dc.language.iso.fl_str_mv | por |
language | por |
dc.rights.driver.fl_str_mv | info:eu-repo/semantics/openAccess |
eu_rights_str_mv | openAccess |
dc.format.none.fl_str_mv | application/pdf |
dc.publisher.none.fl_str_mv | Department of Informatics, University of Lisbon |
publisher.none.fl_str_mv | Department of Informatics, University of Lisbon |
dc.source.none.fl_str_mv | reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação instacron:RCAAP |
instname_str | Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação |
instacron_str | RCAAP |
institution | RCAAP |
reponame_str | Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
collection | Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
repository.name.fl_str_mv | Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação |
repository.mail.fl_str_mv | |
_version_ | 1799134259327270912 |