End-to-end system for extracting and interpreting textual information of interest from identity documents images.

Gutiérrez Menéndez, José Carlos


Bibliographic details
Main author:	Gutiérrez Menéndez, José Carlos
Publication date:	2019
Document type:	Dissertation
Language:	eng
Source title:	Biblioteca Digital de Teses e Dissertações da USP
Full text:	http://www.teses.usp.br/teses/disponiveis/3/3141/tde-19112019-100543/
Abstract:	Identity documents (IDs) are one of the primary sources of information about a citizen. Extracting the data contained in ID cards is central to many applications in the administrative and service sectors. This research therefore proposes the implementation of an automated system able to extract and interpret the textual information in images of identity documents. The proposed end-to-end system allows the automation of registration or verification processes that require acquiring information about a citizen from his or her identity documents. The system is considered end-to-end because it covers every stage of extracting the information of interest from ID images. Unlike template-based systems, the proposed system uses a semantic attribution algorithm that classifies the information from IDs and attributes meaning to it according to its semantics. This research is the first comprehensive description of a complete information extraction system for IDs, covering every step from image processing to named entity recognition. To evaluate the system's performance, different metrics based on its internal functions were proposed. The final evaluation shows satisfactory results, indicating that the end-to-end system is capable of extracting and interpreting textual information from identity document images without prior knowledge of their layouts.

Item metadata

id	USP_721d9e2151f3dc7ff220e821919463e8
oai_identifier_str	oai:teses.usp.br:tde-19112019-100543
network_acronym_str	USP
network_name_str	Biblioteca Digital de Teses e Dissertações da USP
repository_id_str	2721
dc.title.none.fl_str_mv	End-to-end system for extracting and interpreting testual information of interest from identify documents images. Sistema de ponta a ponta para extração e interpretação das informações de interesse textuais a partir de imagens de documentos de identidade.
title	End-to-end system for extracting and interpreting testual information of interest from identify documents images.
spellingShingle	End-to-end system for extracting and interpreting testual information of interest from identify documents images. Gutiérrez Menéndez, José Carlos Documentos Identificação Identification documents Named entity recognition and classification Reconhecimento de texto Reconhecimento e classificação da entidade nomeada Text recognition
title_short	End-to-end system for extracting and interpreting testual information of interest from identify documents images.
title_full	End-to-end system for extracting and interpreting testual information of interest from identify documents images.
title_fullStr	End-to-end system for extracting and interpreting testual information of interest from identify documents images.
title_full_unstemmed	End-to-end system for extracting and interpreting testual information of interest from identify documents images.
title_sort	End-to-end system for extracting and interpreting testual information of interest from identify documents images.
author	Gutiérrez Menéndez, José Carlos
author_facet	Gutiérrez Menéndez, José Carlos
author_role	author
dc.contributor.none.fl_str_mv	Bressan, Graça
dc.contributor.author.fl_str_mv	Gutiérrez Menéndez, José Carlos
dc.subject.por.fl_str_mv	Documentos Identificação Identification documents Named entity recognition and classification Reconhecimento de texto Reconhecimento e classificação da entidade nomeada Text recognition
topic	Documentos Identificação Identification documents Named entity recognition and classification Reconhecimento de texto Reconhecimento e classificação da entidade nomeada Text recognition
publishDate	2019
dc.date.none.fl_str_mv	2019-05-14
dc.type.status.fl_str_mv	info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv	info:eu-repo/semantics/masterThesis
format	masterThesis
status_str	publishedVersion
dc.identifier.uri.fl_str_mv	http://www.teses.usp.br/teses/disponiveis/3/3141/tde-19112019-100543/
url	http://www.teses.usp.br/teses/disponiveis/3/3141/tde-19112019-100543/
dc.language.iso.fl_str_mv	eng
language	eng
dc.relation.none.fl_str_mv
dc.rights.driver.fl_str_mv	Release the content for public access. info:eu-repo/semantics/openAccess
rights_invalid_str_mv	Release the content for public access.
eu_rights_str_mv	openAccess
dc.format.none.fl_str_mv	application/pdf
dc.coverage.none.fl_str_mv
dc.publisher.none.fl_str_mv	Biblioteca Digitais de Teses e Dissertações da USP
publisher.none.fl_str_mv	Biblioteca Digitais de Teses e Dissertações da USP
dc.source.none.fl_str_mv	reponame:Biblioteca Digital de Teses e Dissertações da USP instname:Universidade de São Paulo (USP) instacron:USP
instname_str	Universidade de São Paulo (USP)
instacron_str	USP
institution	USP
reponame_str	Biblioteca Digital de Teses e Dissertações da USP
collection	Biblioteca Digital de Teses e Dissertações da USP
repository.name.fl_str_mv	Biblioteca Digital de Teses e Dissertações da USP - Universidade de São Paulo (USP)
repository.mail.fl_str_mv	virginia@if.usp.br || atendimento@aguia.usp.br || virginia@if.usp.br
_version_	1815256520718286848
