Genealogical trees on the web : a search engine user perspective.

Bibliographic details
Main author: Yates, Ricardo Baeza
Publication date: 2008
Other authors: Pereira Junior, Álvaro Rodrigues; Ziviani, Nivio
Document type: Conference paper
Language: eng
Source title: Repositório Institucional da UFOP
dARK ID: ark:/61566/001300000283n
Full text: http://www.repositorio.ufop.br/handle/123456789/1676
Abstract: This paper presents an extensive study about the evolution of textual content on the Web, which shows how some new pages are created from scratch while others are created using already existing content. We show that a significant fraction of the Web is a byproduct of the latter case. We introduce the concept of Web genealogical tree, in which every page in a Web snapshot is classified into a component. We study in detail these components, characterizing the copies and identifying the relation between a source of content and a search engine, by comparing page relevance measures, documents returned by real queries performed in the past, and click-through data. We observe that sources of copies are more frequently returned by queries and more clicked than other documents.
id UFOP_9e799d1a66015b59144ddc8d0ca3ed28
oai_identifier_str oai:repositorio.ufop.br:123456789/1676
network_acronym_str UFOP
network_name_str Repositório Institucional da UFOP
repository_id_str 3233
dc.title.none.fl_str_mv Genealogical trees on the web : a search engine user perspective.
title Genealogical trees on the web : a search engine user perspective.
author Yates, Ricardo Baeza
author_facet Yates, Ricardo Baeza
Pereira Junior, Álvaro Rodrigues
Ziviani, Nivio
author_role author
author2 Pereira Junior, Álvaro Rodrigues
Ziviani, Nivio
author2_role author
author
dc.contributor.author.fl_str_mv Yates, Ricardo Baeza
Pereira Junior, Álvaro Rodrigues
Ziviani, Nivio
dc.subject.por.fl_str_mv Web
Text
Content evolution
Search engine
Web mining
topic Web
Text
Content evolution
Search engine
Web mining
publishDate 2008
dc.date.none.fl_str_mv 2008
2012-10-18T19:01:55Z
2012-10-18T19:01:55Z
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/conferenceObject
format conferenceObject
status_str publishedVersion
dc.identifier.uri.fl_str_mv YATES, R. B.; PEREIRA JÚNIOR, A. R.; ZIVIANI, N. Genealogical trees on the web : a search engine user perspective. In: 17th International World Wide Web Conference, 17., 2008, Beijing. Proceedings... Beijing: International World Wide Web Conference, 2008. Available at: <http://homepages.dcc.ufmg.br/~nivio/papers/www08.pdf>. Accessed: 18 Oct. 2012.
http://www.repositorio.ufop.br/handle/123456789/1676
dc.identifier.dark.fl_str_mv ark:/61566/001300000283n
url http://www.repositorio.ufop.br/handle/123456789/1676
dc.language.iso.fl_str_mv eng
language eng
dc.rights.driver.fl_str_mv info:eu-repo/semantics/openAccess
eu_rights_str_mv openAccess
dc.format.none.fl_str_mv application/pdf
dc.source.none.fl_str_mv reponame:Repositório Institucional da UFOP
instname:Universidade Federal de Ouro Preto (UFOP)
instacron:UFOP
instname_str Universidade Federal de Ouro Preto (UFOP)
instacron_str UFOP
institution UFOP
reponame_str Repositório Institucional da UFOP
collection Repositório Institucional da UFOP
repository.name.fl_str_mv Repositório Institucional da UFOP - Universidade Federal de Ouro Preto (UFOP)
repository.mail.fl_str_mv repositorio@ufop.edu.br
_version_ 1817705746344706048