Optimization of algorithm to identification of duplicate tuples through similarity phonetic based on multithreading
Main author: | De Andrade, Tiago Luís [UNESP] |
---|---|
Publication date: | 2011 |
Other authors: | De Souza, Rogéria Cristiane Gratão [UNESP]; Babini, Maurizio [UNESP]; Valêncio, Carlos Roberto [UNESP] |
Document type: | Conference paper |
Language: | eng |
Source title: | Repositório Institucional da UNESP |
Full text: | http://dx.doi.org/10.1109/PDCAT.2011.58 http://hdl.handle.net/11449/72860 |
Abstract: | To ensure greater reliability and consistency of the data stored in a database, the data cleaning stage is placed early in the Knowledge Discovery in Databases (KDD) process and is responsible for eliminating problems and adjusting the data for the later stages, especially data mining. Such problems occur at both the instance level and the schema level and include missing values, null values, duplicate tuples, and out-of-domain values, among others. Several algorithms have been developed to perform the cleaning step in databases; some of them were developed specifically to work with the phonetics of words, since a word can be written in different ways. Within this perspective, this work presents as its original contribution an optimization, based on multithreading, of an algorithm for detecting duplicate tuples in databases through phonetic similarity, without the need for training data, as well as a language-independent environment to support it. © 2011 IEEE. |
id |
UNSP_2f5f94bc2908ce2ce1c57b724f0bdee8 |
---|---|
oai_identifier_str |
oai:repositorio.unesp.br:11449/72860 |
network_acronym_str |
UNSP |
network_name_str |
Repositório Institucional da UNESP |
repository_id_str |
2946 |
spelling |
Optimization of algorithm to identification of duplicate tuples through similarity phonetic based on multithreading; Algorithm; Data cleansing; Duplicated tuples; Data cleaning; Knowledge discovery in database; Missing values; Multi-threading; Null value; Database systems; Linguistics; Optimization; Algorithms; Depto. de Ciências de Computação e Estatística, Universidade Estadual Paulista - Unesp, São José do Rio Preto; Departamento de Letras Modernas, Universidade Estadual Paulista - Unesp, São José do Rio Preto; Universidade Estadual Paulista (Unesp); De Andrade, Tiago Luís [UNESP]; De Souza, Rogéria Cristiane Gratão [UNESP]; Babini, Maurizio [UNESP]; Valêncio, Carlos Roberto [UNESP]; 2014-05-27T11:26:14Z; 2011-12-01; info:eu-repo/semantics/publishedVersion; info:eu-repo/semantics/conferenceObject; 299-304; http://dx.doi.org/10.1109/PDCAT.2011.58; Parallel and Distributed Computing, Applications and Technologies, PDCAT Proceedings, p. 299-304.; http://hdl.handle.net/11449/72860; 10.1109/PDCAT.2011.58; 2-s2.0-84856660893; 4644812253875832; 4035066471503413; 5914651754517864; 0000-0002-9325-3159; 0000-0002-7449-9022; Scopus; reponame:Repositório Institucional da UNESP; instname:Universidade Estadual Paulista (UNESP); instacron:UNESP; eng; info:eu-repo/semantics/openAccess; 2021-10-23T10:10:59Z; oai:repositorio.unesp.br:11449/72860; Repositório Institucional; PUB; http://repositorio.unesp.br/oai/request; opendoar:2946; Repositório Institucional da UNESP - Universidade Estadual Paulista (UNESP); false |
dc.title.none.fl_str_mv |
Optimization of algorithm to identification of duplicate tuples through similarity phonetic based on multithreading |
author |
De Andrade, Tiago Luís [UNESP] |
author_role |
author |
author2 |
De Souza, Rogéria Cristiane Gratão [UNESP]; Babini, Maurizio [UNESP]; Valêncio, Carlos Roberto [UNESP] |
author2_role |
author author author |
dc.contributor.none.fl_str_mv |
Universidade Estadual Paulista (Unesp) |
dc.contributor.author.fl_str_mv |
De Andrade, Tiago Luís [UNESP]; De Souza, Rogéria Cristiane Gratão [UNESP]; Babini, Maurizio [UNESP]; Valêncio, Carlos Roberto [UNESP] |
dc.subject.por.fl_str_mv |
Algorithm; Data cleansing; Duplicated tuples; Data cleaning; Knowledge discovery in database; Missing values; Multi-threading; Null value; Database systems; Linguistics; Optimization; Algorithms |
publishDate |
2011 |
dc.date.none.fl_str_mv |
2011-12-01 2014-05-27T11:26:14Z 2014-05-27T11:26:14Z |
dc.type.status.fl_str_mv |
info:eu-repo/semantics/publishedVersion |
dc.type.driver.fl_str_mv |
info:eu-repo/semantics/conferenceObject |
format |
conferenceObject |
status_str |
publishedVersion |
dc.identifier.uri.fl_str_mv |
http://dx.doi.org/10.1109/PDCAT.2011.58; Parallel and Distributed Computing, Applications and Technologies, PDCAT Proceedings, p. 299-304.; http://hdl.handle.net/11449/72860; 10.1109/PDCAT.2011.58; 2-s2.0-84856660893; 4644812253875832; 4035066471503413; 5914651754517864; 0000-0002-9325-3159; 0000-0002-7449-9022 |
url |
http://dx.doi.org/10.1109/PDCAT.2011.58 http://hdl.handle.net/11449/72860 |
identifier_str_mv |
Parallel and Distributed Computing, Applications and Technologies, PDCAT Proceedings, p. 299-304.; 10.1109/PDCAT.2011.58; 2-s2.0-84856660893; 4644812253875832; 4035066471503413; 5914651754517864; 0000-0002-9325-3159; 0000-0002-7449-9022 |
dc.language.iso.fl_str_mv |
eng |
language |
eng |
dc.relation.none.fl_str_mv |
Parallel and Distributed Computing, Applications and Technologies, PDCAT Proceedings |
dc.rights.driver.fl_str_mv |
info:eu-repo/semantics/openAccess |
eu_rights_str_mv |
openAccess |
dc.format.none.fl_str_mv |
299-304 |
dc.source.none.fl_str_mv |
Scopus reponame:Repositório Institucional da UNESP instname:Universidade Estadual Paulista (UNESP) instacron:UNESP |
instname_str |
Universidade Estadual Paulista (UNESP) |
instacron_str |
UNESP |
institution |
UNESP |
reponame_str |
Repositório Institucional da UNESP |
collection |
Repositório Institucional da UNESP |
repository.name.fl_str_mv |
Repositório Institucional da UNESP - Universidade Estadual Paulista (UNESP) |
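
The abstract in this record describes detecting duplicate tuples by phonetic similarity with the work spread over several threads, but the record itself does not give the algorithm. The following is only a minimal sketch of that general idea, not the authors' method: it substitutes classic American Soundex as the phonetic code (an English-oriented choice, so it is not language independent), and the names `soundex`, `phonetic_key`, `find_duplicates`, the `field` parameter, and the `workers` default are all hypothetical.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

# Soundex digit for each coded consonant; vowels, H, W and Y have no digit.
_CODES = {}
for _letters, _digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                         ("L", "4"), ("MN", "5"), ("R", "6")):
    for _ch in _letters:
        _CODES[_ch] = _digit


def soundex(word):
    """Return the 4-character American Soundex code of one word."""
    letters = [c for c in word.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    key = letters[0]
    prev = _CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate letters with the same digit
        code = _CODES.get(ch, "")
        if code and code != prev:
            key += code
        prev = code  # a vowel resets prev, so a repeated digit is kept
    return (key + "000")[:4]


def phonetic_key(text):
    """Phonetic key of a multi-word value: one Soundex code per word."""
    return " ".join(soundex(word) for word in text.split())


def find_duplicates(rows, field, workers=4):
    """Group row indexes whose `field` values share a phonetic key.

    Assumption: rows are dicts and `field` holds a string. The keys are
    computed by a thread pool; grouping is done in the calling thread.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        keys = list(pool.map(lambda row: phonetic_key(row[field]), rows))
    groups = defaultdict(list)
    for index, key in enumerate(keys):
        groups[key].append(index)
    return [ids for ids in groups.values() if len(ids) > 1]


if __name__ == "__main__":
    sample = [
        {"name": "Robert Smith"},
        {"name": "Rupert Smyth"},
        {"name": "Maria Souza"},
    ]
    print(find_duplicates(sample, "name"))
```

Each returned list holds the indexes of rows that are candidate duplicates; a real cleaning step would still need to confirm them, since equal phonetic keys do not guarantee equal entities. In standard CPython builds the global interpreter lock serializes pure-Python work, so the thread pool here mainly illustrates how the key computation can be partitioned rather than a guaranteed speedup.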