Semantic Similarity Match for Data Quality

Martins, Fernando; Falcão, André; Couto, Francisco M.

Bibliographic details
Main author:	Martins, Fernando
Publication date:	2007
Other authors:	Falcão, André; Couto, Francisco M.
Document type:	Report
Language:	por
Source title:	Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos)
Full text:	http://hdl.handle.net/10451/14158
Abstract:	Data quality is a critical aspect of applications that support business operations. Entities are often represented more than once in data repositories. Since duplicate records do not share a common key, they are hard to detect. Duplicate detection over text is usually performed using lexical approaches, which do not capture text sense. The difficulties increase when the duplicate detection must be performed using the text sense. This work presents a semantic similarity approach, based on a text sense matching mechanism, that detects text units which are similar in sense. The goal of the proposed semantic similarity approach is therefore to perform the duplicate detection task in a data quality process.
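
To illustrate the contrast the abstract draws between lexical and sense-based matching, the following is a minimal Python sketch. It is not the authors' method: the `SENSES` table is a hypothetical hand-made stand-in for a lexical resource such as WordNet (listed among the record's keywords), and the function names and the Jaccard scoring are assumptions made for illustration.

```python
from difflib import SequenceMatcher

# Hypothetical toy sense inventory: each word maps to a sense label.
# A real system would draw senses from a lexical resource such as WordNet.
SENSES = {
    "car": "vehicle.motor",
    "automobile": "vehicle.motor",
    "auto": "vehicle.motor",
    "buy": "acquire",
    "purchase": "acquire",
    "cheap": "low_cost",
    "inexpensive": "low_cost",
}


def lexical_similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1], using difflib's ratio."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def to_senses(text: str) -> set:
    """Map each token to its sense label, falling back to the token itself."""
    return {SENSES.get(tok, tok) for tok in text.lower().split()}


def sense_similarity(a: str, b: str) -> float:
    """Jaccard overlap of the sense sets of two text units."""
    sa, sb = to_senses(a), to_senses(b)
    if not sa and not sb:
        return 1.0  # two empty texts are treated as identical
    return len(sa & sb) / len(sa | sb)


if __name__ == "__main__":
    left = "buy cheap car"
    right = "purchase inexpensive automobile"
    print("lexical:", lexical_similarity(left, right))
    print("sense:  ", sense_similarity(left, right))
```

In this sketch the two example strings share no words, yet they map to identical sense sets, so the sense-based score is 1.0 while the character-level score stays below 1. A duplicate detector built only on the lexical score would likely miss such a pair.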

Item metadata

id	RCAP_33a12691348cb22b9dda534961712cbc
oai_identifier_str	oai:repositorio.ul.pt:10451/14158
network_acronym_str	RCAP
network_name_str	Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos)
repository_id_str	7160
dc.title.none.fl_str_mv	Semantic Similarity Match for Data Quality
author	Martins, Fernando
author_facet	Martins, Fernando Falcão, André Couto, Francisco M.
author_role	author
author2	Falcão, André Couto, Francisco M.
author2_role	author author
dc.contributor.none.fl_str_mv	Repositório da Universidade de Lisboa
dc.contributor.author.fl_str_mv	Martins, Fernando Falcão, André Couto, Francisco M.
dc.subject.por.fl_str_mv	semantic similarity data cleaning data quality wordnet similarity match
topic	semantic similarity data cleaning data quality wordnet similarity match
description	Data quality is a critical aspect of applications that support business operations. Often entities are represented more than once in data repositories. Since duplicate records do not share a common key, they are hard to detect. Duplicate detection over text is usually performed using lexical approaches, which do not capture text sense. The difficulties increase when the duplicate detection must be performed using the text sense. This work presents a semantic similarity approach, based on a text sense matching mechanism, that performs the detection of text units which are similar in sense. The goal of the proposed semantic similarity approach is therefore to perform the duplicate detection task in a data quality process
publishDate	2007
dc.date.none.fl_str_mv	2007-10 2007-10-01T00:00:00Z 2009-02-10T13:12:03Z
dc.type.status.fl_str_mv	info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv	info:eu-repo/semantics/report
format	report
status_str	publishedVersion
dc.identifier.uri.fl_str_mv	http://hdl.handle.net/10451/14158
url	http://hdl.handle.net/10451/14158
dc.language.iso.fl_str_mv	por
language	por
dc.rights.driver.fl_str_mv	info:eu-repo/semantics/openAccess
eu_rights_str_mv	openAccess
dc.format.none.fl_str_mv	application/pdf
dc.publisher.none.fl_str_mv	Department of Informatics, University of Lisbon
publisher.none.fl_str_mv	Department of Informatics, University of Lisbon
dc.source.none.fl_str_mv	reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos) instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação instacron:RCAAP
instname_str	Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron_str	RCAAP
institution	RCAAP
reponame_str	Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos)
collection	Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos)
repository.name.fl_str_mv	Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
_version_	1799134258574393344
