Automatic FoodEx2 classification system for food description
Main author: | Fonseca, João Emanuel Sousa |
---|---|
Publication date: | 2022 |
Document type: | Dissertation |
Language: | eng |
Source title: | Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos) |
Full text: | http://hdl.handle.net/10400.22/21470 |
Abstract: | Food has a significant impact on human health. Food safety systems protect consumers by providing a safety net that lets them trust the quality of a product. In Europe, entities such as the European Food Safety Authority (EFSA) act as risk assessors and provide the information used to shape food safety legislation. To collect food safety data, EFSA developed a comprehensive food classification and description system called FoodEx2. Mapping food descriptions to FoodEx2 codes is currently a manual process. This work is motivated by the time that could be saved by automating code assignment with an algorithm. Knowledge Discovery in Databases is an established field for automatically extracting patterns from large quantities of data. The main objective of this project is to explore automatic approaches for classifying food descriptions with FoodEx2 codes. Several classic classifiers are compared on the prediction of FoodEx2 base codes, a multiclass classification task. Performance was evaluated on distinct datasets and at different levels of text preprocessing, using the exact match ratio and the F1-score as metrics and a Bag-of-Words document representation with TF-IDF weighting. All the datasets contain imbalanced data distributions. The documents are short texts describing ingredients, dishes, and animal sample details. Performance varied mainly between datasets and classifiers. The best-performing classifiers were Random Forests, Decision Trees, and Linear Support Vector Machines. The results show that building an automatic classifier depends on further exploration of the available data. |
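The document representation and the two metrics named in the abstract can be illustrated with a minimal, dependency-free sketch. It is not the thesis implementation: the whitespace tokenizer, the smoothed IDF formula, the macro averaging of the F1-score, and the placeholder labels (`CODE_A`, `CODE_B`, `CODE_C`, which are not real FoodEx2 codes) are all assumptions made for illustration.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Bag-of-Words with TF-IDF weighting: one {term: weight} dict per document."""
    # Assumption: lowercase + whitespace split stands in for the preprocessing levels.
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents in which each term occurs.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            # Raw term count times a smoothed inverse document frequency
            # (one common variant, chosen here as an assumption).
            term: count * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1.0)
            for term, count in counts.items()
        })
    return vectors


def exact_match_ratio(y_true, y_pred):
    """Share of descriptions whose predicted code equals the reference code."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 (the averaging scheme is an assumption)."""
    scores = []
    for label in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Hypothetical short food descriptions, invented for this example.
    docs = ["apple juice", "apple pie with cinnamon", "raw chicken liver sample"]
    print(tfidf_vectors(docs)[0])

    # Hypothetical reference and predicted labels.
    y_true = ["CODE_A", "CODE_A", "CODE_B", "CODE_C"]
    y_pred = ["CODE_A", "CODE_B", "CODE_B", "CODE_C"]
    print(exact_match_ratio(y_true, y_pred), macro_f1(y_true, y_pred))
```

In a single-label multiclass task such as base-code prediction, the exact match ratio coincides with plain accuracy. With imbalanced data, a macro-averaged F1-score gives rare classes the same weight as frequent ones, which is one reason to report both metrics.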
id |
RCAP_2eeb1920039e6407002c917cb0997723 |
---|---|
oai_identifier_str |
oai:recipp.ipp.pt:10400.22/21470 |
network_acronym_str |
RCAP |
network_name_str |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos) |
repository_id_str |
7160 |
dc.title.none.fl_str_mv |
Automatic FoodEx2 classification system for food description |
title |
Automatic FoodEx2 classification system for food description |
author |
Fonseca, João Emanuel Sousa |
author_facet |
Fonseca, João Emanuel Sousa |
author_role |
author |
dc.contributor.none.fl_str_mv |
Faria, Brígida Mónica; Reis, Luís Paulo; Pimenta, Rui; Repositório Científico do Instituto Politécnico do Porto |
dc.contributor.author.fl_str_mv |
Fonseca, João Emanuel Sousa |
dc.subject.por.fl_str_mv |
Knowledge discovery in databases; Machine learning; Text mining; Food classification; FoodEx2 |
topic |
Knowledge discovery in databases; Machine learning; Text mining; Food classification; FoodEx2 |
publishDate |
2022 |
dc.date.none.fl_str_mv |
2022-11-28; 2022-11-28T00:00:00Z; 2023-11-28T01:34:22Z |
dc.type.status.fl_str_mv |
info:eu-repo/semantics/publishedVersion |
dc.type.driver.fl_str_mv |
info:eu-repo/semantics/masterThesis |
format |
masterThesis |
status_str |
publishedVersion |
dc.identifier.uri.fl_str_mv |
http://hdl.handle.net/10400.22/21470; TID:203147391 |
url |
http://hdl.handle.net/10400.22/21470 |
identifier_str_mv |
TID:203147391 |
dc.language.iso.fl_str_mv |
eng |
language |
eng |
dc.rights.driver.fl_str_mv |
info:eu-repo/semantics/openAccess |
eu_rights_str_mv |
openAccess |
dc.format.none.fl_str_mv |
application/pdf |
dc.source.none.fl_str_mv |
reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos); instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação; instacron:RCAAP |
instname_str |
Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação |
instacron_str |
RCAAP |
institution |
RCAAP |
reponame_str |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos) |
collection |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos) |
repository.name.fl_str_mv |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação |
repository.mail.fl_str_mv |
|
_version_ |
1799131502987968512 |