Collecting Statistics about the Portuguese Web

Bibliographic details
Main author: Gomes, Daniel
Publication date: 2003
Other authors: Silva, Mário J.
Document type: Report
Language: por
Source title: Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
Full text: http://hdl.handle.net/10451/14211
Abstract: This report presents a characterization of text documents from the Portuguese Web. The characterization was produced from a 2003 crawl of over 4 million URLs and 131 thousand sites. We describe the rules we established for defining the boundaries of the Portuguese Web and the methodology used to gather statistics. We also show how crawling constraints and abnormal situations on the Web can influence the results.
id RCAP_825f66214581aa0d6df3c23d9e6e5ca7
oai_identifier_str oai:repositorio.ul.pt:10451/14211
network_acronym_str RCAP
network_name_str Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
repository_id_str 7160
dc.title.none.fl_str_mv Collecting Statistics about the Portuguese Web
title Collecting Statistics about the Portuguese Web
spellingShingle Collecting Statistics about the Portuguese Web
Gomes, Daniel
Web
characterization
Portuguese
Portugal
tumba!
statistics
crawling
title_short Collecting Statistics about the Portuguese Web
title_full Collecting Statistics about the Portuguese Web
title_fullStr Collecting Statistics about the Portuguese Web
title_full_unstemmed Collecting Statistics about the Portuguese Web
title_sort Collecting Statistics about the Portuguese Web
author Gomes, Daniel
author_facet Gomes, Daniel
Silva, Mário J.
author_role author
author2 Silva, Mário J.
author2_role author
dc.contributor.none.fl_str_mv Repositório da Universidade de Lisboa
dc.contributor.author.fl_str_mv Gomes, Daniel
Silva, Mário J.
dc.subject.por.fl_str_mv Web
characterization
Portuguese
Portugal
tumba!
statistics
crawling
topic Web
characterization
Portuguese
Portugal
tumba!
statistics
crawling
description This report presents a characterization of text documents from the Portuguese Web. The characterization was produced from a 2003 crawl of over 4 million URLs and 131 thousand sites. We describe the rules we established for defining the boundaries of the Portuguese Web and the methodology used to gather statistics. We also show how crawling constraints and abnormal situations on the Web can influence the results.
publishDate 2003
dc.date.none.fl_str_mv 2003-06
2003-06-01T00:00:00Z
2009-02-10T13:11:41Z
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/report
format report
status_str publishedVersion
dc.identifier.uri.fl_str_mv http://hdl.handle.net/10451/14211
url http://hdl.handle.net/10451/14211
dc.language.iso.fl_str_mv por
language por
dc.rights.driver.fl_str_mv info:eu-repo/semantics/openAccess
eu_rights_str_mv openAccess
dc.format.none.fl_str_mv application/pdf
dc.publisher.none.fl_str_mv Department of Informatics, University of Lisbon
publisher.none.fl_str_mv Department of Informatics, University of Lisbon
dc.source.none.fl_str_mv reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron:RCAAP
instname_str Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron_str RCAAP
institution RCAAP
reponame_str Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
collection Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
repository.name.fl_str_mv Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
repository.mail.fl_str_mv
_version_ 1799134259327270912