Comparison of existing open-source tools for Web crawling and indexing of free Music

Bibliographic details
Main author: Serrão, C.
Publication date: 2013
Other authors: Ricardo, A.
Document type: Article
Language: eng
Source title: Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
Full text: https://sites.google.com/site/journaloftelecommunications/volume-18-issue-1-january-2013
https://ciencia.iscte-iul.pt/public/pub/id/14731
http://hdl.handle.net/10071/7329
Abstract: This paper surveys existing open-source web crawler tools that also include an indexing component. The goal is to determine which tool is best suited to crawl and index a large collection of music MP3 files freely available on the Internet. Each piece of software is briefly described, with an overview, identification of some of its users, and its main advantages and disadvantages. To clarify the most significant differences between the tools, a summary of features is presented: the programming language in which each is written, the deployment platform, the type of index used, database integration, front-end capabilities, the existence of a plugin system, and parsing support for MP3 and Adobe Flash (SWF) files. Finally, the tools are classified according to the expected collection size: tools for mirroring small collections, tools for medium collections, and tools for large collections, capable of handling large amounts of data. The paper concludes with an assessment of which tools are best suited to handle large collections in a distributed way.
id RCAP_bba0c967173364f5254720c2d98a7acd
oai_identifier_str oai:repositorio.iscte-iul.pt:10071/7329
network_acronym_str RCAP
network_name_str Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
repository_id_str 7160
dc.title.none.fl_str_mv Comparison of existing open-source tools for Web crawling and indexing of free Music
dc.contributor.author.fl_str_mv Serrão, C.
Ricardo, A.
dc.subject.por.fl_str_mv Content Analysis and Indexing
Information Storage and Retrieval
Information Filtering
Retrieval Process
Selection Process
Open Source
Creative Commons
Music
MP3.
publishDate 2013
dc.date.none.fl_str_mv 2013-01-01T00:00:00Z
2013
2014-05-22T10:56:03Z
2014-05-22T10:54:10Z
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/article
dc.identifier.uri.fl_str_mv https://sites.google.com/site/journaloftelecommunications/volume-18-issue-1-january-2013
https://ciencia.iscte-iul.pt/public/pub/id/14731
http://hdl.handle.net/10071/7329
dc.language.iso.fl_str_mv eng
dc.relation.none.fl_str_mv 2042-8839
dc.rights.driver.fl_str_mv info:eu-repo/semantics/embargoedAccess
dc.format.none.fl_str_mv application/pdf
dc.publisher.none.fl_str_mv Journal of Telecommunications
dc.source.none.fl_str_mv reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron:RCAAP
repository.name.fl_str_mv Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
_version_ 1799134822654803968