Comparison of existing open-source tools for Web crawling and indexing of free Music
Main author: | Serrão, C. |
---|---|
Publication date: | 2013 |
Other authors: | Ricardo, A. |
Document type: | Article |
Language: | eng |
Source title: | Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
Full text: | https://sites.google.com/site/journaloftelecommunications/volume-18-issue-1-january-2013 https://ciencia.iscte-iul.pt/public/pub/id/14731 http://hdl.handle.net/10071/7329 |
Abstract: | This paper surveys existing open-source web crawler tools that also include an indexing component. The goal is to determine which tool is best suited to crawling and indexing a large collection of MP3 music files freely available on the Internet. Each piece of software is briefly described, with an overview, an identification of some of its users, and its main advantages and disadvantages. To clarify the most significant differences between the tools, a summary of features is presented: the programming language in which each is written, the deployment platform, the type of index used, database integration, front-end capabilities, the existence of a plugin system, and support for parsing MP3 and Adobe Flash (SWF) files. Finally, the tools are classified according to the expected collection size: tools for mirroring small collections, tools for medium collections, and tools for large collections, the last being software capable of handling large amounts of data. In conclusion, the paper assesses which tools are best suited to handling large collections in a distributed way. |
id |
RCAP_bba0c967173364f5254720c2d98a7acd |
---|---|
oai_identifier_str |
oai:repositorio.iscte-iul.pt:10071/7329 |
network_acronym_str |
RCAP |
network_name_str |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
repository_id_str |
7160 |
dc.title.none.fl_str_mv |
Comparison of existing open-source tools for Web crawling and indexing of free Music |
author |
Serrão, C. |
author_facet |
Serrão, C.; Ricardo, A. |
author_role |
author |
author2 |
Ricardo, A. |
author2_role |
author |
dc.contributor.author.fl_str_mv |
Serrão, C.; Ricardo, A. |
dc.subject.por.fl_str_mv |
Content Analysis and Indexing; Information Storage and Retrieval; Information Filtering; Retrieval Process; Selection Process; Open Source; Creative Commons; Music; MP3 |
topic |
Content Analysis and Indexing; Information Storage and Retrieval; Information Filtering; Retrieval Process; Selection Process; Open Source; Creative Commons; Music; MP3 |
description |
This paper surveys existing open-source web crawler tools that also include an indexing component. The goal is to determine which tool is best suited to crawling and indexing a large collection of MP3 music files freely available on the Internet. Each piece of software is briefly described, with an overview, an identification of some of its users, and its main advantages and disadvantages. To clarify the most significant differences between the tools, a summary of features is presented: the programming language in which each is written, the deployment platform, the type of index used, database integration, front-end capabilities, the existence of a plugin system, and support for parsing MP3 and Adobe Flash (SWF) files. Finally, the tools are classified according to the expected collection size: tools for mirroring small collections, tools for medium collections, and tools for large collections, the last being software capable of handling large amounts of data. In conclusion, the paper assesses which tools are best suited to handling large collections in a distributed way. |
publishDate |
2013 |
dc.date.none.fl_str_mv |
2013-01-01T00:00:00Z 2013 2014-05-22T10:56:03Z 2014-05-22T10:54:10Z |
dc.type.status.fl_str_mv |
info:eu-repo/semantics/publishedVersion |
dc.type.driver.fl_str_mv |
info:eu-repo/semantics/article |
format |
article |
status_str |
publishedVersion |
dc.identifier.uri.fl_str_mv |
https://sites.google.com/site/journaloftelecommunications/volume-18-issue-1-january-2013 https://ciencia.iscte-iul.pt/public/pub/id/14731 http://hdl.handle.net/10071/7329 |
url |
https://sites.google.com/site/journaloftelecommunications/volume-18-issue-1-january-2013 https://ciencia.iscte-iul.pt/public/pub/id/14731 http://hdl.handle.net/10071/7329 |
dc.language.iso.fl_str_mv |
eng |
language |
eng |
dc.relation.none.fl_str_mv |
2042-8839 |
dc.rights.driver.fl_str_mv |
info:eu-repo/semantics/embargoedAccess |
eu_rights_str_mv |
embargoedAccess |
dc.format.none.fl_str_mv |
application/pdf |
dc.publisher.none.fl_str_mv |
Journal of Telecommunications |
publisher.none.fl_str_mv |
Journal of Telecommunications |
dc.source.none.fl_str_mv |
reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação instacron:RCAAP |
instname_str |
Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação |
instacron_str |
RCAAP |
institution |
RCAAP |
reponame_str |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
collection |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
repository.name.fl_str_mv |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação |
repository.mail.fl_str_mv |
|
_version_ |
1799134822654803968 |