The evolution of web content and search engines.

Bibliographic details
Main author: Yates, Ricardo Baeza
Publication date: 2006
Other authors: Pereira Junior, Álvaro Rodrigues; Ziviani, Nivio
Document type: Conference paper
Language: eng
Source title: Repositório Institucional da UFOP
Full text: http://www.repositorio.ufop.br/handle/123456789/1677
Abstract: The Web grows at a fast pace and little is known about how new content is generated. The objective of this paper is to study the dynamics of content evolution in the Web, giving answers to questions like: How much new content has evolved from the Web's old content? How much of the Web content is biased by ranking algorithms of search engines? We used four snapshots of the Chilean Web containing documents of all the Chilean primary domains, crawled in four distinct periods of time. If a page in a newer snapshot has content of a page in an older snapshot, we say that the source is a parent of the new page. Our hypothesis is that when pages have parents, in a portion of pages there was a query that related the parents and made possible the creation of the new page. Thus, part of the Web content is biased by the ranking function of search engines. We also define a genealogical tree for the Web, where many pages are new and do not have parents and others have one or more parents. We present the Chilean Web genealogical tree and study its components. To the best of our knowledge this is the first paper that studies how old content is used to create new content, relating a search engine ranking algorithm with the creation of new pages.
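The parent relation described in the abstract can be illustrated with a small sketch. The abstract does not say how shared content is detected, so the word-shingle overlap used here, the shingle size k, the min_shared threshold, and all function and page names are assumptions made for illustration only, not the authors' method.

```python
from typing import Dict, List, Set, Tuple


def shingles(text: str, k: int = 3) -> Set[Tuple[str, ...]]:
    """Return the set of k-word sequences (shingles) in a text."""
    words = text.lower().split()
    # Texts shorter than k words yield an empty set.
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def find_parents(new_text: str, old_snapshot: Dict[str, str],
                 k: int = 3, min_shared: int = 2) -> List[str]:
    """List the old pages that share at least min_shared shingles
    with the new page (assumed stand-in for "has content of")."""
    new_shingles = shingles(new_text, k)
    parents = [
        page_id
        for page_id, old_text in old_snapshot.items()
        if len(new_shingles & shingles(old_text, k)) >= min_shared
    ]
    return sorted(parents)


def build_genealogy(old_snapshot: Dict[str, str],
                    new_snapshot: Dict[str, str],
                    k: int = 3, min_shared: int = 2) -> Dict[str, List[str]]:
    """Map every page of the newer snapshot to its parents.
    Pages mapped to an empty list are new pages without parents."""
    return {
        page_id: find_parents(text, old_snapshot, k, min_shared)
        for page_id, text in new_snapshot.items()
    }


if __name__ == "__main__":
    # Hypothetical toy snapshots; real snapshots would hold crawled pages.
    old = {
        "old-1": "the quick brown fox jumps over the lazy dog",
        "old-2": "search engines rank pages by link analysis",
    }
    new = {
        "new-1": "yesterday the quick brown fox jumps again",
        "new-2": "completely original text with no reuse at all",
    }
    for page, parents in build_genealogy(old, new).items():
        print(page, "parents:", parents)
```

In this toy setting a page may have zero, one, or several parents, which matches the shape of the genealogical tree the abstract describes. A study at the scale of a national Web crawl would need a far more scalable comparison than this pairwise loop.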
id UFOP_8a6e37486b550e5405d026f47ef8098b
oai_identifier_str oai:localhost:123456789/1677
network_acronym_str UFOP
network_name_str Repositório Institucional da UFOP
repository_id_str 3233
dc.title.pt_BR.fl_str_mv The evolution of web content and search engines.
dc.contributor.author.fl_str_mv Yates, Ricardo Baeza
Pereira Junior, Álvaro Rodrigues
Ziviani, Nivio
publishDate 2006
dc.date.issued.fl_str_mv 2006
dc.date.accessioned.fl_str_mv 2012-10-18T19:15:45Z
dc.date.available.fl_str_mv 2012-10-18T19:15:45Z
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/conferenceObject
format conferenceObject
status_str publishedVersion
dc.identifier.citation.fl_str_mv YATES, R. B.; PEREIRA JUNIOR, A. R.; ZIVIANI, N. The evolution of web content and search engines. In: 8th ACM Workshop on Web Mining and Web Usage Analysis, 8., 2006, Philadelphia. Proceedings... Philadelphia: ACM Workshop on Web Mining and Web Usage Analysis, 2006. v. 1. Available at: <http://webmining.spd.louisville.edu/webkdd06/papers/paper-7-The%20evolution%20of%20Web-Alvaro-Baeza-Final.pdf>. Accessed: 18 Oct. 2012.
dc.identifier.uri.fl_str_mv http://www.repositorio.ufop.br/handle/123456789/1677
url http://www.repositorio.ufop.br/handle/123456789/1677
dc.language.iso.fl_str_mv eng
language eng
dc.rights.driver.fl_str_mv info:eu-repo/semantics/openAccess
eu_rights_str_mv openAccess
dc.source.none.fl_str_mv reponame:Repositório Institucional da UFOP
instname:Universidade Federal de Ouro Preto (UFOP)
instacron:UFOP
instname_str Universidade Federal de Ouro Preto (UFOP)
instacron_str UFOP
institution UFOP
reponame_str Repositório Institucional da UFOP
collection Repositório Institucional da UFOP
bitstream.url.fl_str_mv http://www.repositorio.ufop.br/bitstream/123456789/1677/5/license.txt
http://www.repositorio.ufop.br/bitstream/123456789/1677/1/EVENTO_EvolutionContentSearch.pdf
bitstream.checksum.fl_str_mv 8a4605be74aa9ea9d79846c1fba20a33
ca7a7d3971c7e53ba69f45cbf54cd160
bitstream.checksumAlgorithm.fl_str_mv MD5
MD5
repository.name.fl_str_mv Repositório Institucional da UFOP - Universidade Federal de Ouro Preto (UFOP)
repository.mail.fl_str_mv repositorio@ufop.edu.br
_version_ 1801685731843768320