Optimization of algorithm to identification of duplicate tuples through similarity phonetic based on multithreading

De Andrade, Tiago Luís [UNESP]; De Souza, Rogéria Cristiane Gratão [UNESP]; Babini, Maurizio [UNESP]; Valêncio, Carlos Roberto [UNESP]

Bibliographic details
Main author:	De Andrade, Tiago Luís [UNESP]
Publication date:	2011
Other authors:	De Souza, Rogéria Cristiane Gratão [UNESP]; Babini, Maurizio [UNESP]; Valêncio, Carlos Roberto [UNESP]
Document type:	Conference paper
Language:	eng
Source title:	Repositório Institucional da UNESP
Full text:	http://dx.doi.org/10.1109/PDCAT.2011.58 http://hdl.handle.net/11449/72860
Abstract:	To ensure greater reliability and consistency of the data stored in a database, the data cleaning stage is placed early in the Knowledge Discovery in Databases (KDD) process and is responsible for eliminating problems and adjusting the data for the later stages, especially data mining. Such problems occur at the instance and schema levels and include missing values, null values, duplicate tuples, and values outside the domain, among others. Several algorithms have been developed to perform the cleaning step in databases; some of them were developed specifically to work with the phonetics of words, since a word can be written in different ways. Within this perspective, this work presents as its original contribution an optimization, based on multithreading, of an algorithm for detecting duplicate tuples in databases through phonetic similarity, without the need for training data, together with a language-independent environment to support it. © 2011 IEEE.

Item metadata

id	UNSP_2f5f94bc2908ce2ce1c57b724f0bdee8
oai_identifier_str	oai:repositorio.unesp.br:11449/72860
network_acronym_str	UNSP
network_name_str	Repositório Institucional da UNESP
repository_id_str	2946
spelling	Optimization of algorithm to identification of duplicate tuples through similarity phonetic based on multithreading; Algorithm; Data cleansing; Duplicated tuples; Data cleaning; Knowledge discovery in database; Missing values; Multi-threading; Null value; Database systems; Linguistics; Optimization; Algorithms; Aiming to ensure greater reliability and consistency of data stored in the database, the data cleaning stage is set early in the process of Knowledge Discovery in Databases (KDD) and is responsible for eliminating problems and adjust the data for the later stages, especially for the stage of data mining. Such problems occur in the instance level and schema, namely, missing values, null values, duplicate tuples, values outside the domain, among others. Several algorithms were developed to perform the cleaning step in databases, some of them were developed specifically to work with the phonetics of words, since a word can be written in different ways. Within this perspective, this work presents as original contribution an optimization of algorithm for the detection of duplicate tuples in databases through phonetic based on multithreading without the need for trained data, as well as an independent environment of language to be supported for this. © 2011 IEEE.; Depto. de Ciências de Computação e Estatística Universidade Estadual Paulista - Unesp, São José do Rio Preto; Departamento de Letras Modernas Universidade Estadual Paulista - Unesp, São José do Rio Preto; Depto. de Ciências de Computação e Estatística Universidade Estadual Paulista - Unesp, São José do Rio Preto; Departamento de Letras Modernas Universidade Estadual Paulista - Unesp, São José do Rio Preto; Universidade Estadual Paulista (Unesp); De Andrade, Tiago Luís [UNESP]; De Souza, Rogéria Cristiane Gratão [UNESP]; Babini, Maurizio [UNESP]; Valêncio, Carlos Roberto [UNESP]; 2014-05-27T11:26:14Z; 2014-05-27T11:26:14Z; 2011-12-01; info:eu-repo/semantics/publishedVersion; info:eu-repo/semantics/conferenceObject; 299-304; http://dx.doi.org/10.1109/PDCAT.2011.58; Parallel and Distributed Computing, Applications and Technologies, PDCAT Proceedings, p. 299-304.; http://hdl.handle.net/11449/72860; 10.1109/PDCAT.2011.58; 2-s2.0-84856660893; 4644812253875832; 4035066471503413; 5914651754517864; 0000-0002-9325-3159; 0000-0002-7449-9022; Scopus; reponame:Repositório Institucional da UNESP; instname:Universidade Estadual Paulista (UNESP); instacron:UNESP; eng; Parallel and Distributed Computing, Applications and Technologies, PDCAT Proceedings; info:eu-repo/semantics/openAccess; 2021-10-23T10:10:59Z; oai:repositorio.unesp.br:11449/72860; Repositório Institucional; PUB; http://repositorio.unesp.br/oai/request; opendoar:2946; 2021-10-23T10:10:59; Repositório Institucional da UNESP - Universidade Estadual Paulista (UNESP); false
dc.title.none.fl_str_mv	Optimization of algorithm to identification of duplicate tuples through similarity phonetic based on multithreading
author	De Andrade, Tiago Luís [UNESP]
author_facet	De Andrade, Tiago Luís [UNESP]; De Souza, Rogéria Cristiane Gratão [UNESP]; Babini, Maurizio [UNESP]; Valêncio, Carlos Roberto [UNESP]
author_role	author
author2	De Souza, Rogéria Cristiane Gratão [UNESP]; Babini, Maurizio [UNESP]; Valêncio, Carlos Roberto [UNESP]
author2_role	author author author
dc.contributor.none.fl_str_mv	Universidade Estadual Paulista (Unesp)
dc.contributor.author.fl_str_mv	De Andrade, Tiago Luís [UNESP]; De Souza, Rogéria Cristiane Gratão [UNESP]; Babini, Maurizio [UNESP]; Valêncio, Carlos Roberto [UNESP]
dc.subject.por.fl_str_mv	Algorithm; Data cleansing; Duplicated tuples; Data cleaning; Knowledge discovery in database; Missing values; Multi-threading; Null value; Database systems; Linguistics; Optimization; Algorithms
topic	Algorithm; Data cleansing; Duplicated tuples; Data cleaning; Knowledge discovery in database; Missing values; Multi-threading; Null value; Database systems; Linguistics; Optimization; Algorithms
publishDate	2011
dc.date.none.fl_str_mv	2011-12-01 2014-05-27T11:26:14Z 2014-05-27T11:26:14Z
dc.type.status.fl_str_mv	info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv	info:eu-repo/semantics/conferenceObject
format	conferenceObject
status_str	publishedVersion
dc.identifier.uri.fl_str_mv	http://dx.doi.org/10.1109/PDCAT.2011.58; Parallel and Distributed Computing, Applications and Technologies, PDCAT Proceedings, p. 299-304.; http://hdl.handle.net/11449/72860; 10.1109/PDCAT.2011.58; 2-s2.0-84856660893; 4644812253875832; 4035066471503413; 5914651754517864; 0000-0002-9325-3159; 0000-0002-7449-9022
url	http://dx.doi.org/10.1109/PDCAT.2011.58 http://hdl.handle.net/11449/72860
identifier_str_mv	Parallel and Distributed Computing, Applications and Technologies, PDCAT Proceedings, p. 299-304.; 10.1109/PDCAT.2011.58; 2-s2.0-84856660893; 4644812253875832; 4035066471503413; 5914651754517864; 0000-0002-9325-3159; 0000-0002-7449-9022
dc.language.iso.fl_str_mv	eng
language	eng
dc.relation.none.fl_str_mv	Parallel and Distributed Computing, Applications and Technologies, PDCAT Proceedings
dc.rights.driver.fl_str_mv	info:eu-repo/semantics/openAccess
eu_rights_str_mv	openAccess
dc.format.none.fl_str_mv	299-304
dc.source.none.fl_str_mv	Scopus; reponame:Repositório Institucional da UNESP; instname:Universidade Estadual Paulista (UNESP); instacron:UNESP
instname_str	Universidade Estadual Paulista (UNESP)
instacron_str	UNESP
institution	UNESP
reponame_str	Repositório Institucional da UNESP
collection	Repositório Institucional da UNESP
repository.name.fl_str_mv	Repositório Institucional da UNESP - Universidade Estadual Paulista (UNESP)
repository.mail.fl_str_mv
_version_	1803650182972178432
