Using data mining techniques to support breast cancer diagnosis
Main author: | Diz, Joana Moreira |
---|---|
Publication date: | 2014 |
Document type: | Dissertation |
Language: | eng |
Source title: | Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
Full text: | http://hdl.handle.net/10400.22/5790 |
Abstract: | Decision support methods and computer-aided diagnosis systems are increasingly applied across many areas of medicine. In breast cancer research, many studies have used such systems as a second reader in order to reduce false positives. This study presents a set of data mining techniques applied toward a decision support system for breast cancer diagnosis. The approach is intended to assist clinical practice in identifying mammographic findings such as microcalcifications, masses, and normal tissue, in order to avoid misdiagnosis. A reliable database of 410 images from about 115 patients was used, with findings previously reviewed and labeled by radiologists as microcalcifications, masses, or normal tissue. Two feature extraction techniques were used: the gray level co-occurrence matrix and the gray level run length matrix. For classification, several scenarios were considered, each distinguishing a different set of lesion patterns, and several classifiers were compared to identify the best performer in each scenario. The classifiers were Naïve Bayes, Support Vector Machines, k-nearest Neighbors, and Decision Trees (J48 and Random Forests). In distinguishing mammographic findings, the classifiers achieved high positive predictive values (PPV) and very good accuracy. Related results are also presented for the classification of breast density and the BI-RADS® scale. The best predictive method across all tested groups was the Random Forest classifier, and the best performance was achieved in distinguishing microcalcifications. The conclusions drawn from the tested scenarios represent a new perspective on breast cancer diagnosis using data mining techniques. |
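
The abstract names the gray level co-occurrence matrix (GLCM) as one of its two feature extraction techniques. As a rough illustration of the idea only, the sketch below counts how often gray level i is followed by gray level j at a fixed pixel offset, then derives two common texture features from the normalized counts. The function names, the 4x4 sample image, the horizontal offset, and the choice of contrast and energy are all assumptions made for this example; the record does not state which offsets, quantization, or features the dissertation used.

```python
def glcm(image, levels, offset=(0, 1)):
    """Count co-occurrences of gray levels (i, j) at a (row, col) offset.

    `image` is a list of rows of integer gray levels in [0, levels).
    The default offset pairs each pixel with its right-hand neighbor.
    """
    d_row, d_col = offset
    rows, cols = len(image), len(image[0])
    matrix = [[0] * levels for _ in range(levels)]
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + d_row, c + d_col
            if 0 <= r2 < rows and 0 <= c2 < cols:
                matrix[image[r][c]][image[r2][c2]] += 1
    return matrix


def normalize(matrix):
    """Turn raw co-occurrence counts into joint probabilities."""
    total = sum(sum(row) for row in matrix)
    if total == 0:
        raise ValueError("co-occurrence matrix is empty")
    return [[value / total for value in row] for row in matrix]


def contrast(p):
    """Weight each pair by its squared gray-level difference."""
    n = len(p)
    return sum(p[i][j] * (i - j) ** 2 for i in range(n) for j in range(n))


def energy(p):
    """Sum of squared probabilities (higher for uniform textures)."""
    return sum(value * value for row in p for value in row)


# Hypothetical 4x4 image with four gray levels, for illustration only.
sample = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 2, 2, 2],
    [2, 2, 3, 3],
]
probabilities = normalize(glcm(sample, levels=4))
print("contrast:", contrast(probabilities))
print("energy:", energy(probabilities))
```

Features of this kind, computed per region of interest, would form the input vector for classifiers such as those listed in the abstract.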
id |
RCAP_471fcd3ced5b0740441f34983dfc1494 |
---|---|
oai_identifier_str |
oai:recipp.ipp.pt:10400.22/5790 |
network_acronym_str |
RCAP |
network_name_str |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
repository_id_str |
7160 |
dc.title.none.fl_str_mv |
Using data mining techniques to support breast cancer diagnosis Aplicação de técnicas de data mining para suporte ao diagnóstico de cancro da mama |
author |
Diz, Joana Moreira |
dc.contributor.none.fl_str_mv |
Marreiros, Goreti Freitas, Alberto Repositório Científico do Instituto Politécnico do Porto |
publishDate |
2014 |
dc.date.none.fl_str_mv |
2014 2014-01-01T00:00:00Z 2015-04-08T13:30:35Z |
dc.type.status.fl_str_mv |
info:eu-repo/semantics/publishedVersion |
dc.type.driver.fl_str_mv |
info:eu-repo/semantics/masterThesis |
format |
masterThesis |
status_str |
publishedVersion |
dc.identifier.uri.fl_str_mv |
http://hdl.handle.net/10400.22/5790 TID:201816636 |
url |
http://hdl.handle.net/10400.22/5790 |
identifier_str_mv |
TID:201816636 |
dc.language.iso.fl_str_mv |
eng |
language |
eng |
dc.rights.driver.fl_str_mv |
info:eu-repo/semantics/openAccess |
eu_rights_str_mv |
openAccess |
dc.format.none.fl_str_mv |
application/pdf |
dc.source.none.fl_str_mv |
reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação instacron:RCAAP |
repository.name.fl_str_mv |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação |
_version_ |
1799131358647287808 |