A commodity platform for Distributed Data Mining - the HARVARD System
Main author: | Ruy Ramos |
---|---|
Publication date: | 2006 |
Other authors: | Rui Camacho, Pedro Souto |
Document type: | Book |
Language: | eng |
Source title: | Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
Full text: | https://repositorio-aberto.up.pt/handle/10216/73310 |
Abstract: | Systems performing Data Mining analysis are usually dedicated and expensive. They often require special-purpose machines to run the data analysis tool. In this paper we propose an architecture for distributed Data Mining running on general-purpose desktop computers. The proposed architecture was deployed in the HARVesting Architecture of idle machines foR Data mining (HARVARD) system. The HARVARD system has the following features. It does not require special-purpose or expensive machines, as it runs on general-purpose PCs. It is based on distributed computing using a set of PCs connected in a network. In a Condor fashion, it takes advantage of a distributed setting of available and idle computational resources and is adequate for problems that may be decomposed into coarse-grain subtasks. The system includes dynamic updating of the computational resources. It is written in Java and therefore runs on several different platforms, including Linux and Windows. It has fault-tolerant features that make it quite reliable. It may use a wide variety of data analysis tools without modification, since it is independent of the data analysis tool. It uses an easy but powerful task specification and control language. The HARVARD system was deployed using two data analysis tools: a decision tree tool called C4.5 and an Inductive Logic Programming (ILP) tool. |
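The scheduling idea summarized in the abstract (coarse-grain subtasks handed to idle machines, a resource pool that is re-read over time, and re-execution after a failure) can be illustrated with a small simulation. This is only a sketch under stated assumptions: the record does not describe HARVARD's actual interfaces, so `Worker`, `harvest`, `fail_on`, and `max_rounds` are hypothetical names, the workers are in-process stand-ins for networked PCs, and Python is used for brevity although the system itself is written in Java.

```python
from collections import deque
from typing import Callable, Dict, Iterable, List


class Worker:
    """Hypothetical stand-in for a desktop PC; `fail_on` lists tasks it loses once."""

    def __init__(self, name: str, idle: bool = True, fail_on: Iterable[int] = ()):
        self.name = name
        self.idle = idle
        self.fail_on = set(fail_on)

    def run(self, task_id: int, func: Callable[[int], int]) -> int:
        if task_id in self.fail_on:
            self.fail_on.discard(task_id)  # simulate a single failure, then recovery
            raise RuntimeError(f"{self.name} lost task {task_id}")
        return func(task_id)


def harvest(tasks: Iterable[int], workers: List[Worker],
            func: Callable[[int], int], max_rounds: int = 100) -> Dict[int, int]:
    """Hand independent subtasks to idle workers and re-queue any that fail."""
    pending = deque(tasks)
    results: Dict[int, int] = {}
    for _ in range(max_rounds):
        if not pending:
            break
        # Re-read the pool every round so machines can change state between rounds.
        idle = [w for w in workers if w.idle]
        if not idle:
            raise RuntimeError("no idle workers available")
        for worker in idle:
            if not pending:
                break
            task_id = pending.popleft()
            try:
                results[task_id] = worker.run(task_id, func)
            except RuntimeError:
                pending.append(task_id)  # assumed fault-tolerance policy: retry later
    return results


# pc2 is busy and is never used; pc3 loses task 1 once, so the task is retried.
pool = [Worker("pc1"), Worker("pc2", idle=False), Worker("pc3", fail_on=[1])]
results = harvest(range(5), pool, lambda n: n * n)
print(sorted(results.items()))
```

The sketch only works because each subtask is independent of the others, which is the coarse-grain decomposition the abstract requires; a real deployment would also need idleness detection and remote execution, which are out of scope here.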
id |
RCAP_b96626d7978d8b146da15d269a2c0e70 |
---|---|
oai_identifier_str |
oai:repositorio-aberto.up.pt:10216/73310 |
network_acronym_str |
RCAP |
network_name_str |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
repository_id_str |
7160 |
dc.title.none.fl_str_mv |
A commodity platform for Distributed Data Mining - the HARVARD System |
author |
Ruy Ramos |
author_facet |
Ruy Ramos Rui Camacho Pedro Souto |
author_role |
author |
author2 |
Rui Camacho Pedro Souto |
author2_role |
author author |
dc.contributor.author.fl_str_mv |
Ruy Ramos Rui Camacho Pedro Souto |
dc.subject.por.fl_str_mv |
Computer engineering, Electrical engineering, Electronic engineering, Information engineering |
publishDate |
2006 |
dc.date.none.fl_str_mv |
2006 2006-01-01T00:00:00Z |
dc.type.status.fl_str_mv |
info:eu-repo/semantics/publishedVersion |
dc.type.driver.fl_str_mv |
info:eu-repo/semantics/book |
format |
book |
status_str |
publishedVersion |
dc.identifier.uri.fl_str_mv |
https://repositorio-aberto.up.pt/handle/10216/73310 |
url |
https://repositorio-aberto.up.pt/handle/10216/73310 |
dc.language.iso.fl_str_mv |
eng |
language |
eng |
dc.rights.driver.fl_str_mv |
info:eu-repo/semantics/openAccess |
eu_rights_str_mv |
openAccess |
dc.format.none.fl_str_mv |
application/pdf |
dc.source.none.fl_str_mv |
reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação instacron:RCAAP |
instname_str |
Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação |
instacron_str |
RCAAP |
institution |
RCAAP |
reponame_str |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
collection |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
repository.name.fl_str_mv |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação |
repository.mail.fl_str_mv |
|
_version_ |
1799135947049140224 |