A commodity platform for Distributed Data Mining - the HARVARD System

Bibliographic details
Main author: Ruy Ramos
Publication date: 2006
Other authors: Rui Camacho, Pedro Souto
Document type: Book
Language: English
Source title: Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
Full text: https://repositorio-aberto.up.pt/handle/10216/73310
Abstract: Systems that perform Data Mining analysis are usually dedicated and expensive, and they often require special-purpose machines to run the data analysis tool. In this paper we propose an architecture for distributed Data Mining that runs on general-purpose desktop computers. The proposed architecture was deployed in the HARVesting Architecture of idle machines foR Data mining (HARVARD) system. The HARVARD system has the following features. It does not require special-purpose or expensive machines, as it runs on general-purpose PCs. It is based on distributed computing over a set of networked PCs. In the style of Condor, it exploits available, idle computational resources distributed across the network, and it is suited to problems that can be decomposed into coarse-grain subtasks. It updates the set of available computational resources dynamically. It is written in Java and therefore runs on several platforms, including Linux and Windows. Its fault-tolerant features make it quite reliable. Because it is independent of the data analysis tool, it can use a wide variety of such tools without modification. It provides an easy-to-use but powerful task specification and control language. The HARVARD system was deployed with two data analysis tools: C4.5, a decision tree tool, and an Inductive Logic Programming (ILP) tool.
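The abstract describes HARVARD only at the feature level. As a rough illustration of the Condor-style idea it names (an idle machine pulls a coarse-grain subtask, a subtask whose machine is reclaimed or fails goes back to the queue, and machines may join at any time), here is a minimal Java sketch. All class and method names, and the tool command line, are hypothetical assumptions, not HARVARD's actual design or API; the scheduling logic runs in one process rather than over a network.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch of task farming over idle machines: a master hands
 * coarse-grain subtasks to idle workers and re-queues a subtask when the
 * worker running it is lost. Not HARVARD's real classes or API.
 */
public class TaskFarmSketch {

    /** A coarse-grain subtask; the command is an opaque, hypothetical tool invocation. */
    record Task(int id, String command) {}

    private final Deque<Task> pending = new ArrayDeque<>();
    private final Map<String, Task> running = new LinkedHashMap<>(); // worker -> current task
    private final List<Integer> completed = new ArrayList<>();

    void submit(Task t) { pending.addLast(t); }

    /** An idle worker asks for work; returns null when nothing is pending. */
    Task requestWork(String worker) {
        Task t = pending.pollFirst();
        if (t != null) running.put(worker, t);
        return t;
    }

    /** A worker reports that its current task finished successfully. */
    void reportDone(String worker) {
        Task t = running.remove(worker);
        if (t != null) completed.add(t.id());
    }

    /** Fault tolerance: the worker crashed or was reclaimed, so its task is re-queued first. */
    void workerLost(String worker) {
        Task t = running.remove(worker);
        if (t != null) pending.addFirst(t);
    }

    boolean finished() { return pending.isEmpty() && running.isEmpty(); }

    List<Integer> completed() { return completed; }

    public static void main(String[] args) {
        TaskFarmSketch master = new TaskFarmSketch();
        for (int i = 0; i < 4; i++) {
            // The analysis tool is treated as a black-box command (hypothetical syntax).
            master.submit(new Task(i, "tool --fold " + i));
        }
        master.requestWork("pc-a");   // pc-a gets task 0
        master.requestWork("pc-b");   // pc-b gets task 1
        master.workerLost("pc-b");    // pc-b's owner reclaims it: task 1 is re-queued
        master.reportDone("pc-a");    // task 0 done
        master.requestWork("pc-c");   // a newly idle machine joins and gets task 1
        master.requestWork("pc-a");   // pc-a gets task 2
        master.reportDone("pc-c");
        master.reportDone("pc-a");
        master.requestWork("pc-a");   // pc-a gets task 3
        master.reportDone("pc-a");
        System.out.println(master.completed() + " finished=" + master.finished());
        // prints "[0, 1, 2, 3] finished=true"
    }
}
```

Re-queuing a lost task at the front of the queue, rather than the back, is one simple choice that keeps a reclaimed machine from delaying its subtask behind all remaining work.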
Record ID: RCAP_b96626d7978d8b146da15d269a2c0e70
OAI identifier: oai:repositorio-aberto.up.pt:10216/73310
Network: RCAP, Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
Repository ID: 7160
Subjects: Computer engineering, Electrical engineering, Electronic engineering, Information engineering
Version: Published version
Access rights: Open access
Format: application/pdf
Institution: Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação (RCAAP)