Towards Transfer Learning Techniques—BERT, DistilBERT, BERTimbau, and DistilBERTimbau for Automatic Text Classification from Different Languages: A Case Study
Main author: | Barbon, Rafael Silva |
---|---|
Publication date: | 2022 |
Other authors: | Akabane, Ademar Takeo |
Document type: | Article |
Language: | eng |
Source title: | Repositório Institucional PUC-Campinas |
Full text: | http://repositorio.sis.puc-campinas.edu.br/xmlui/handle/123456789/17187 |
Abstract: | The Internet of Things is a paradigm that interconnects smart devices through the internet to provide ubiquitous services to users. Together with Web 2.0 platforms, it generates vast amounts of textual data, so a significant challenge in this context is performing text classification automatically. State-of-the-art results have recently been obtained by employing language models trained from scratch on corpora of online news to handle text classification better. Two notable examples are BERT (Bidirectional Encoder Representations from Transformers) and DistilBERT, a smaller pre-trained general-purpose language representation model. In this context, through a case study, we apply these models, together with their Brazilian Portuguese counterparts BERTimbau and DistilBERTimbau, to the text classification task in two languages (English and Brazilian Portuguese) on different datasets. The results show that DistilBERT trained about 45% faster than its larger counterpart for both English and Brazilian Portuguese, was also 40% smaller, and preserved about 96% of its language comprehension ability on balanced datasets. |
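The classification setup described in the abstract can be sketched with the Hugging Face `transformers` library (an assumption; the record does not name the toolkit used). To keep the sketch runnable without downloading pre-trained weights, it builds a randomly initialized DistilBERT-architecture classifier; in practice one would load a pre-trained checkpoint such as `distilbert-base-uncased` (English) or a DistilBERTimbau checkpoint (Brazilian Portuguese) and fine-tune it on the labeled dataset.

```python
# Sketch of a DistilBERT-style sequence classifier for text classification.
# Note: this instantiates the architecture with random weights so it runs
# offline; real fine-tuning starts from a pre-trained checkpoint.
import torch
from transformers import DistilBertConfig, DistilBertForSequenceClassification

# Hypothetical 4-class setup (e.g. a news-topic dataset).
config = DistilBertConfig(num_labels=4)
model = DistilBertForSequenceClassification(config)

# Dummy batch of token ids; real use would come from DistilBertTokenizer.
input_ids = torch.randint(0, config.vocab_size, (2, 16))
attention_mask = torch.ones_like(input_ids)

outputs = model(input_ids=input_ids, attention_mask=attention_mask)
print(outputs.logits.shape)  # one logit per class: (batch_size, num_labels)
```

The same code path serves both languages: only the tokenizer and checkpoint change, which is what makes the BERT-vs-DistilBERT size and training-time comparison in the abstract possible under identical classification heads.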
id |
PUC_CAMP-5_a67cf78e1d6ae32b822a081d56265b57 |
---|---|
oai_identifier_str |
oai:repositorio.sis.puc-campinas.edu.br:123456789/17187 |
network_acronym_str |
PUC_CAMP-5 |
network_name_str |
Repositório Institucional PUC-Campinas |
repository_id_str |
|
spelling
Funding: none received. Published in Sensors, 2022-10-26; deposited 2024-03-18. Authors: Barbon, Rafael Silva; Akabane, Ademar Takeo (Pontifícia Universidade Católica de Campinas). Open access; application/pdf; language: eng. Identifiers: http://repositorio.sis.puc-campinas.edu.br/xmlui/handle/123456789/17187, 9713891218812963, 6781874728187325. Source: Repositório Institucional PUC-Campinas (PUC_CAMP), oai:repositorio.sis.puc-campinas.edu.br:123456789/17187; contact: sbi.bibliotecadigital@puc-campinas.edu.br. |
dc.title.none.fl_str_mv |
Towards Transfer Learning Techniques—BERT, DistilBERT, BERTimbau, and DistilBERTimbau for Automatic Text Classification from Different Languages: A Case Study
Rumo a técnicas de aprendizagem por transferência - BERT, DistilBERT, BERTimbau e DistilBERTimbau para classificação automática de texto de diferentes idiomas: um estudo de caso
title |
Towards Transfer Learning Techniques—BERT, DistilBERT, BERTimbau, and DistilBERTimbau for Automatic Text Classification from Different Languages: A Case Study |
author |
Barbon, Rafael Silva |
author_facet |
Barbon, Rafael Silva; Akabane, Ademar Takeo
author_role |
author |
author2 |
Akabane, Ademar Takeo |
author2_role |
author |
dc.contributor.none.fl_str_mv |
Pontifícia Universidade Católica de Campinas (PUC-Campinas) |
dc.contributor.author.fl_str_mv |
Barbon, Rafael Silva; Akabane, Ademar Takeo
dc.subject.por.fl_str_mv |
big data; pre-trained model; BERT; DistilBERT; BERTimbau; DistilBERTimbau; transformer-based machine learning
topic |
big data; pre-trained model; BERT; DistilBERT; BERTimbau; DistilBERTimbau; transformer-based machine learning
publishDate |
2022 |
dc.date.none.fl_str_mv |
2022-10-26
2024-03-18T14:50:50Z
2024-03-18T14:50:50Z
dc.type.status.fl_str_mv |
info:eu-repo/semantics/publishedVersion |
dc.type.driver.fl_str_mv |
info:eu-repo/semantics/article |
format |
article |
status_str |
publishedVersion |
dc.identifier.uri.fl_str_mv |
http://repositorio.sis.puc-campinas.edu.br/xmlui/handle/123456789/17187
9713891218812963
6781874728187325
url |
http://repositorio.sis.puc-campinas.edu.br/xmlui/handle/123456789/17187 |
identifier_str_mv |
9713891218812963
6781874728187325
dc.language.iso.fl_str_mv |
eng |
language |
eng |
dc.rights.driver.fl_str_mv |
info:eu-repo/semantics/openAccess |
eu_rights_str_mv |
openAccess |
dc.format.none.fl_str_mv |
application/pdf |
dc.publisher.none.fl_str_mv |
Sensors |
publisher.none.fl_str_mv |
Sensors |
dc.source.none.fl_str_mv |
reponame: Repositório Institucional PUC-Campinas
instname: Pontifícia Universidade Católica de Campinas (PUC-CAMPINAS)
instacron: PUC_CAMP
instname_str |
Pontifícia Universidade Católica de Campinas (PUC-CAMPINAS) |
instacron_str |
PUC_CAMP |
institution |
PUC_CAMP |
reponame_str |
Repositório Institucional PUC-Campinas |
collection |
Repositório Institucional PUC-Campinas |
repository.name.fl_str_mv |
Repositório Institucional PUC-Campinas - Pontifícia Universidade Católica de Campinas (PUC-CAMPINAS) |
repository.mail.fl_str_mv |
sbi.bibliotecadigital@puc-campinas.edu.br |
_version_ |
1798415782749667328 |