Finding structured data from text using language models

SILVA, Levy de Souza

Bibliographic details
Main author:	SILVA, Levy de Souza
Publication date:	2023
Document type:	Thesis
Language:	eng
Source title:	Repositório Institucional da UFPE
Full text:	https://repositorio.ufpe.br/handle/123456789/55271
Abstract:	The Internet is a rich source of structured information. From Web Tables to public datasets, there is a huge corpus of relational data online. Previous studies estimate that over 418M tables in Hypertext Markup Language (HTML) format can be found on the Web. In addition, a large number of data repositories provide access to thousands of datasets. As a result, in recent years a growing body of work has begun to explore this data for several downstream applications. For example, Web Tables have been widely used for Question Answering (QA), where the goal is to retrieve, from a table collection, a table that answers a query. For datasets, the most popular application is dataset retrieval, which aims to find structured datasets for an end user. Table retrieval and dataset retrieval share two traits: both must match unstructured queries against relational data, and both are ranking tasks. The core challenge is constructing a robust matching model that computes the degree of similarity between a query and the relational data. To address it, this thesis is divided into three parts. In the first, we explore the problem of QA Table Retrieval, with the goal of outlining the best solutions for this task. Next, we focus on an unexplored news-table matching problem, in which Web Tables are used to augment news stories. Lastly, we concentrate on the dataset retrieval task. Our main contributions are as follows: (I) we present a novel taxonomy for table retrieval that classifies table retrieval methods into five groups, from probabilistic approaches to sophisticated neural networks. Our research also indicates that the best results for this task are achieved by deep neural models built on top of recurrent networks and convolutional architectures; (II) we introduce a novel attention model based on Bidirectional Encoder Representations from Transformers (BERT) for computing the degree of similarity between news stories and Web Tables, and we compare its performance against Information Retrieval (IR) techniques, document/sentence encoders, text-matching models, and neural IR approaches. A hypothesis test confirms that our approach outperforms all baselines in terms of the Mean Reciprocal Rank metric; and (III) we propose Data Augmentation Pipeline for Dataset Retrieval (DAPDR), a solution that leverages Large Language Models (LLMs) to create synthetic questions for dataset descriptions, which are then used to train supervised retrievers. Finally, we evaluate DAPDR on dataset search benchmarks using a set of dense retrievers; the main results show that the retrievers tuned with DAPDR statistically outperform the original models at different Normalized Discounted Cumulative Gain (NDCG) levels.
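The abstract reports results with two ranking metrics, the mean reciprocal rank and NDCG at a cutoff. As a rough illustration of what those metrics measure, here is a minimal sketch, not the thesis's evaluation code. It assumes relevance labels are already arranged in ranked order for each query, uses the linear-gain form of DCG, and computes the ideal ranking only from the labels of the returned list. The function names are hypothetical.

```python
import math
from typing import Sequence


def mean_reciprocal_rank(ranked_labels: Sequence[Sequence[int]]) -> float:
    """Average of 1/rank of the first relevant item, one ranked list per query.

    A query with no relevant item contributes 0 to the average.
    """
    if not ranked_labels:
        return 0.0
    total = 0.0
    for labels in ranked_labels:
        for position, label in enumerate(labels, start=1):
            if label > 0:
                total += 1.0 / position
                break
    return total / len(ranked_labels)


def dcg_at_k(gains: Sequence[float], k: int) -> float:
    """Discounted cumulative gain over the top-k items (linear gain, log2 discount)."""
    return sum(g / math.log2(i + 1) for i, g in enumerate(gains[:k], start=1))


def ndcg_at_k(gains: Sequence[float], k: int) -> float:
    """DCG@k divided by the DCG@k of the same gains sorted in descending order.

    Assumption: the ideal ranking is built from the returned list only,
    not from every relevant item in the collection.
    """
    ideal = dcg_at_k(sorted(gains, reverse=True), k)
    return dcg_at_k(gains, k) / ideal if ideal > 0 else 0.0
```

For example, with two queries whose first relevant result sits at rank 2 and rank 1 respectively, `mean_reciprocal_rank([[0, 1, 0], [1, 0, 0]])` averages 1/2 and 1, giving 0.75.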

Item metadata

id	UFPE_60a8f2ed0a73effa7debaf815f502f5d
oai_identifier_str	oai:repositorio.ufpe.br:123456789/55271
network_acronym_str	UFPE
network_name_str	Repositório Institucional da UFPE
repository_id_str	2221
spelling	Concatenated index field that repeats the other fields of this record (author, advisor, dates, citation, the abstract in English and in Portuguese, subjects, rights, and bitstream list), followed by the base64-encoded text of the repository's deposit license. Values that appear only in this field: CNPq; https://repositorio.ufpe.br/oai/request; opendoar:2221.
dc.title.pt_BR.fl_str_mv	Finding structured data from text using language models
title	Finding structured data from text using language models
spellingShingle	Finding structured data from text using language models; SILVA, Levy de Souza; Computational intelligence; Web tables; Table retrieval; News-table matching
title_short	Finding structured data from text using language models
title_full	Finding structured data from text using language models
title_fullStr	Finding structured data from text using language models
title_full_unstemmed	Finding structured data from text using language models
title_sort	Finding structured data from text using language models
author	SILVA, Levy de Souza
author_facet	SILVA, Levy de Souza
author_role	author
dc.contributor.authorLattes.pt_BR.fl_str_mv	http://lattes.cnpq.br/1532801358254302
dc.contributor.advisorLattes.pt_BR.fl_str_mv	http://lattes.cnpq.br/7113249247656195
dc.contributor.author.fl_str_mv	SILVA, Levy de Souza
dc.contributor.advisor1.fl_str_mv	BARBOSA, Luciano de Andrade
contributor_str_mv	BARBOSA, Luciano de Andrade
dc.subject.por.fl_str_mv	Computational intelligence; Web tables; Table retrieval; News-table matching
topic	Computational intelligence; Web tables; Table retrieval; News-table matching
publishDate	2023
dc.date.issued.fl_str_mv	2023-12-07
dc.date.accessioned.fl_str_mv	2024-02-29T11:54:33Z
dc.date.available.fl_str_mv	2024-02-29T11:54:33Z
dc.type.status.fl_str_mv	info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv	info:eu-repo/semantics/doctoralThesis
format	doctoralThesis
status_str	publishedVersion
dc.identifier.citation.fl_str_mv	SILVA, Levy de Souza. Finding structured data from text using language models. 2023. Tese (Doutorado em Ciência da Computação) – Universidade Federal de Pernambuco, Recife, 2023.
dc.identifier.uri.fl_str_mv	https://repositorio.ufpe.br/handle/123456789/55271
identifier_str_mv	SILVA, Levy de Souza. Finding structured data from text using language models. 2023. Tese (Doutorado em Ciência da Computação) – Universidade Federal de Pernambuco, Recife, 2023.
url	https://repositorio.ufpe.br/handle/123456789/55271
dc.language.iso.fl_str_mv	eng
language	eng
dc.rights.driver.fl_str_mv	Attribution-NonCommercial-NoDerivs 3.0 Brazil http://creativecommons.org/licenses/by-nc-nd/3.0/br/ info:eu-repo/semantics/openAccess
rights_invalid_str_mv	Attribution-NonCommercial-NoDerivs 3.0 Brazil http://creativecommons.org/licenses/by-nc-nd/3.0/br/
eu_rights_str_mv	openAccess
dc.publisher.none.fl_str_mv	Universidade Federal de Pernambuco
dc.publisher.program.fl_str_mv	Programa de Pós-Graduação em Ciência da Computação
dc.publisher.initials.fl_str_mv	UFPE
dc.publisher.country.fl_str_mv	Brazil
publisher.none.fl_str_mv	Universidade Federal de Pernambuco
dc.source.none.fl_str_mv	reponame:Repositório Institucional da UFPE instname:Universidade Federal de Pernambuco (UFPE) instacron:UFPE
instname_str	Universidade Federal de Pernambuco (UFPE)
instacron_str	UFPE
institution	UFPE
reponame_str	Repositório Institucional da UFPE
collection	Repositório Institucional da UFPE
bitstream.url.fl_str_mv	https://repositorio.ufpe.br/bitstream/123456789/55271/2/license_rdf https://repositorio.ufpe.br/bitstream/123456789/55271/1/TESE%20Levy%20de%20Souza%20Silva.pdf https://repositorio.ufpe.br/bitstream/123456789/55271/3/license.txt https://repositorio.ufpe.br/bitstream/123456789/55271/4/TESE%20Levy%20de%20Souza%20Silva.pdf.txt https://repositorio.ufpe.br/bitstream/123456789/55271/5/TESE%20Levy%20de%20Souza%20Silva.pdf.jpg
bitstream.checksum.fl_str_mv	e39d27027a6cc9cb039ad269a5db8e34 3469aa82f747b2c91f9a3df08d41ed4f 5e89a1613ddc8510c6576f4b23a78973 d6e230a377351ed3e98d214daff4707f 12f896c3b5c4315ae48cdd324dbedc2c
bitstream.checksumAlgorithm.fl_str_mv	MD5 MD5 MD5 MD5 MD5
repository.name.fl_str_mv	Repositório Institucional da UFPE - Universidade Federal de Pernambuco (UFPE)
repository.mail.fl_str_mv	attena@ufpe.br
_version_	1802310812896854016
