Automatic Nutritional Information Extraction from Photographic Images of Labels

Bibliographic details
Main author: Lara Rafaela Almeida Marinha
Publication date: 2015
Document type: Dissertation
Language: English
Source: Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos)
Full text: https://hdl.handle.net/10216/83493
Abstract: In recent years, people have shown increasing interest in improving their diet. Many factors contribute to this trend, one of them being the alarming rise of diet-related diseases such as cardiovascular disease, obesity, diabetes and cancer, which are increasingly among the most common causes of death. Almost all food products on the market now carry nutrition labels, that is, any information on the package that refers to the amounts of the following nutrients: energy, protein, carbohydrates, fat, dietary fiber, sodium, vitamins and minerals. This information gives valuable insight into a product's composition and helps consumers make healthier food choices. However, because labels do not follow a regulated or standard format, products often present nutrition information differently, resulting in a wide variety of nutrition labels on the market. This, combined with the large amount of information displayed and the difficulty of interpreting it without the necessary knowledge, makes extracting and analysing the relevant data a hard task for consumers. One solution suggested in many studies on this subject is to present a summary of the nutrition information as a complement to the nutrient-specific values. The goal of this project is to address this problem by giving consumers a tool, in the form of an Android application, that helps them extract and interpret these values. The application attempts to automatically extract the nutritional information from an image of a nutrition declaration and presents it in a single, consistent format that follows the new regulations, with additional aids such as values relative to the recommended daily intakes and simplified diagrams. It can also compare two products of the same category.
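The abstract mentions values relative to the recommended daily intakes and a comparison between two products of the same category, without saying how either is computed. A minimal sketch of both follows, assuming the adult reference intakes listed in Annex XIII, Part B of EU Regulation No 1169/2011 (the thesis may use different reference values) and amounts declared per 100 g. The names `REFERENCE_INTAKES`, `percent_of_ri` and `compare_per_100g` are hypothetical, not taken from the thesis.

```python
from typing import Dict

# Assumed adult reference intakes (EU Regulation 1169/2011, Annex XIII, Part B).
# The thesis does not state which reference values it uses.
REFERENCE_INTAKES: Dict[str, float] = {
    "energy_kcal": 2000.0,
    "fat_g": 70.0,
    "saturates_g": 20.0,
    "carbohydrate_g": 260.0,
    "sugars_g": 90.0,
    "protein_g": 50.0,
    "salt_g": 6.0,
}


def percent_of_ri(values: Dict[str, float]) -> Dict[str, float]:
    """Express each nutrient that has a reference intake as a percentage of it."""
    return {
        name: round(100.0 * amount / REFERENCE_INTAKES[name], 1)
        for name, amount in values.items()
        if name in REFERENCE_INTAKES  # nutrients without a reference are skipped
    }


def compare_per_100g(a: Dict[str, float], b: Dict[str, float]) -> Dict[str, str]:
    """For nutrients declared by both products, report which one has less per 100 g."""
    result: Dict[str, str] = {}
    for name in sorted(a.keys() & b.keys()):
        if a[name] < b[name]:
            result[name] = "A"
        elif b[name] < a[name]:
            result[name] = "B"
        else:
            result[name] = "equal"
    return result
```

For example, `percent_of_ri({"energy_kcal": 90.0, "fat_g": 3.5, "sugars_g": 12.0})` gives `{'energy_kcal': 4.5, 'fat_g': 5.0, 'sugars_g': 13.3}`. Comparing per 100 g rather than per portion keeps the comparison independent of each manufacturer's chosen portion size.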
To achieve these goals, the image must first be converted into digital text for further processing. For this conversion the application uses Tesseract, the open-source OCR engine maintained by Google. Several problems arose during development, such as the low accuracy of the OCR engine and the difficulty of acquiring good images with a mobile device. However, after applying pre- and post-processing algorithms, accuracy rose to 55%, 83% higher than without any preprocessing. In addition, the percentage of images that returned zero matches dropped from 30% to 8%.
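The abstract does not describe the post-processing step. One plausible approach, sketched here under stated assumptions, is to correct common OCR letter/digit confusions inside numeric tokens and then match nutrient keywords line by line with regular expressions. It assumes Tesseract returns plain text with one nutrient per line and that the per-100 g value is the first number on each line; the keyword table, `OCR_DIGIT_FIXES`, `fix_numeric_tokens` and `parse_nutrients` are all hypothetical and not the thesis's implementation.

```python
import re
from typing import Dict

# Hypothetical keyword table covering Portuguese and English label terms.
# More specific nutrients come first so that "dos quais saturados"
# (of which saturates) is not read as total fat.
NUTRIENT_PATTERNS = [
    ("saturates_g", re.compile(r"saturad|saturates", re.IGNORECASE)),
    ("sugars_g", re.compile(r"açúcares|acucares|sugars", re.IGNORECASE)),
    ("fat_g", re.compile(r"l[íi]pidos|gordura|\bfat\b", re.IGNORECASE)),
    ("carbohydrate_g", re.compile(r"hidratos de carbono|carbohydrate", re.IGNORECASE)),
    ("protein_g", re.compile(r"prote[íi]nas?|protein", re.IGNORECASE)),
    ("salt_g", re.compile(r"\bsal\b|\bsalt\b", re.IGNORECASE)),
    ("energy_kcal", re.compile(r"energ", re.IGNORECASE)),
]

# Assumed letter/digit confusions; a real system would derive these from its data.
OCR_DIGIT_FIXES = str.maketrans({"O": "0", "o": "0", "I": "1", "l": "1", "S": "5"})
# Runs of digits and look-alike letters, optionally with a decimal separator.
NUMERIC_TOKEN = re.compile(r"[0-9OoIlS]+(?:[.,][0-9OoIlS]+)?")
NUMBER = r"(\d+(?:[.,]\d+)?)"


def fix_numeric_tokens(line: str) -> str:
    """Swap look-alike letters for digits, but only in tokens that already contain a digit."""
    def repl(match: re.Match) -> str:
        token = match.group(0)
        if any(ch.isdigit() for ch in token):
            return token.translate(OCR_DIGIT_FIXES)
        return token  # ordinary words such as "Sal" are left untouched
    return NUMERIC_TOKEN.sub(repl, line)


def parse_nutrients(ocr_text: str) -> Dict[str, float]:
    """Map each recognised nutrient to the first number on its line."""
    values: Dict[str, float] = {}
    for raw_line in ocr_text.splitlines():
        line = fix_numeric_tokens(raw_line)
        for name, pattern in NUTRIENT_PATTERNS:
            if not pattern.search(line):
                continue
            # Energy lines often read "kJ / kcal"; keep only the kcal figure.
            regex = NUMBER + r"\s*kcal" if name == "energy_kcal" else NUMBER
            match = re.search(regex, line, re.IGNORECASE)
            if match and name not in values:
                values[name] = float(match.group(1).replace(",", "."))
            break  # at most one nutrient per line
    return values
```

Applied to a line such as `Lípidos 3,S g`, the digit fix turns `3,S` into `3,5` before matching, while words like `Sal` pass through unchanged because they contain no digit. Taking the first number on each line assumes the per-100 g column precedes any per-portion column.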
Record ID: RCAP_776efdc64bfe3ce0b9e453698dc82e7e
OAI identifier: oai:repositorio-aberto.up.pt:10216/83493
Network: RCAP
Repository ID: 7160
Subject: Electrical engineering, Electronic engineering, Information engineering
Date: 2015-07-16
Document type: Master's thesis (published version)
Identifier: TID:201310651
Access rights: Open access
Format: application/pdf
Institution: Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
Aggregator: RCAAP