Automatic Nutritional Information Extraction from Photographic Images of Labels
Main author: | Lara Rafaela Almeida Marinha |
---|---|
Publication date: | 2015 |
Document type: | Dissertation |
Language: | eng |
Source title: | Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
Full text: | https://hdl.handle.net/10216/83493 |
Abstract: | In recent years, people have shown a growing interest in improving their diet. Several factors explain this trend, one of them being the alarming rise of diet-related diseases. This group of diseases, which includes cardiovascular diseases, obesity, diabetes and cancer, is progressively becoming the most common cause of death. Currently, almost all food products on the market carry a nutrition label, that is, any information on the product package that refers to the values of the following nutrients: energy, proteins, carbohydrates, fats, dietary fiber, sodium, vitamins and minerals. This information gives good insight into a product's composition and helps consumers make healthier food choices. Because labels do not have a regulated or standard format, each product often presents its nutrition information differently, which leads to a wide variety of nutrition labels on the market. This, combined with the large amount of information displayed and the difficulty of interpreting the data without the necessary knowledge, makes extracting and analyzing the relevant data a hard task for consumers. One solution suggested in many studies on this subject is to present a summary of the nutrition information as a complement to the nutrient-specific information. The main goal of this project is to address this problem by offering consumers an Android application that helps them extract and interpret these values. The application attempts to automatically extract the nutritional information from an image of a nutrition declaration and presents it in a single, uniform format that follows the new regulations, with some additional aids, including values relative to the recommended daily intake and simplified schemes. It is also possible to compare two products of the same category. 
To achieve these goals, the image must first be converted into digital text for later processing. For this conversion the application uses Tesseract, the open-source OCR engine whose development was sponsored by Google. Several problems arose during development, such as the low accuracy of the OCR engine and the difficulty of acquiring images with a mobile device. However, after applying pre-processing and post-processing algorithms, the accuracy increased to 55%, which is 83% more than without any preprocessing. In addition, the percentage of images that return 0 matches decreased from 30% to 8%. |
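As a rough reading of the figures reported in the abstract, and assuming that "83% more" denotes a relative increase over the accuracy obtained without preprocessing (the abstract does not state that baseline explicitly; the symbol $a_0$ is introduced here only for illustration), the implied baseline accuracy and the change in zero-match images work out as follows:

```latex
% Assumption: 55% is the final accuracy and 83% is a relative gain over the baseline a_0.
a_0 \approx \frac{0.55}{1 + 0.83} \approx 0.30
% Zero-match images fall from 30% to 8% of the test set:
\Delta_{\text{abs}} = 30\% - 8\% = 22 \text{ percentage points}, \qquad
\Delta_{\text{rel}} = \frac{30 - 8}{30} \approx 73\%
```

Under this reading, preprocessing raises accuracy from roughly 30% to 55%. A percentage-point interpretation of "83% more" is not possible here, since it would imply a negative baseline.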
id |
RCAP_776efdc64bfe3ce0b9e453698dc82e7e |
---|---|
oai_identifier_str |
oai:repositorio-aberto.up.pt:10216/83493 |
network_acronym_str |
RCAP |
network_name_str |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
repository_id_str |
7160 |
dc.title.none.fl_str_mv |
Automatic Nutritional Information Extraction from Photographic Images of Labels |
title |
Automatic Nutritional Information Extraction from Photographic Images of Labels |
author |
Lara Rafaela Almeida Marinha |
author_facet |
Lara Rafaela Almeida Marinha |
author_role |
author |
dc.contributor.author.fl_str_mv |
Lara Rafaela Almeida Marinha |
dc.subject.por.fl_str_mv |
Engenharia electrotécnica, electrónica e informática Electrical engineering, Electronic engineering, Information engineering |
topic |
Engenharia electrotécnica, electrónica e informática Electrical engineering, Electronic engineering, Information engineering |
publishDate |
2015 |
dc.date.none.fl_str_mv |
2015-07-16 2015-07-16T00:00:00Z |
dc.type.status.fl_str_mv |
info:eu-repo/semantics/publishedVersion |
dc.type.driver.fl_str_mv |
info:eu-repo/semantics/masterThesis |
format |
masterThesis |
status_str |
publishedVersion |
dc.identifier.uri.fl_str_mv |
https://hdl.handle.net/10216/83493 TID:201310651 |
url |
https://hdl.handle.net/10216/83493 |
identifier_str_mv |
TID:201310651 |
dc.language.iso.fl_str_mv |
eng |
language |
eng |
dc.rights.driver.fl_str_mv |
info:eu-repo/semantics/openAccess |
eu_rights_str_mv |
openAccess |
dc.format.none.fl_str_mv |
application/pdf |
dc.source.none.fl_str_mv |
reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação instacron:RCAAP |
instname_str |
Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação |
instacron_str |
RCAAP |
institution |
RCAAP |
reponame_str |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
collection |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
repository.name.fl_str_mv |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação |
repository.mail.fl_str_mv |
|
_version_ |
1799136155947499521 |