Using data mining techniques to support breast cancer diagnosis

Diz, Joana Moreira

Bibliographic details
Main author:	Diz, Joana Moreira
Publication date:	2014
Document type:	Dissertation
Language:	eng
Source title:	Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
Full text:	http://hdl.handle.net/10400.22/5790
Abstract:	Decision support methods and computer-aided diagnosis systems are increasingly applied across many areas of medicine. In breast cancer research, many such systems have been developed for use as a second reader, with the aim of reducing false positives. This study presents a set of data mining techniques applied towards a decision support system for breast cancer diagnosis. The approach is intended to assist clinical practice in identifying mammographic findings such as microcalcifications, masses, and normal tissue, in order to avoid misdiagnosis. The work uses a reliable database of 410 images from about 115 patients, previously reviewed by radiologists and labeled as microcalcifications, masses, or normal tissue. Two feature extraction techniques were used: the gray level co-occurrence matrix and the gray level run length matrix. For classification, several scenarios were considered, each defined by a different way of grouping lesion types, and several classifiers were compared to identify the best performer in each scenario. The classifiers were Naïve Bayes, Support Vector Machines, k-nearest Neighbors, and Decision Trees (J48 and Random Forests). In distinguishing mammographic findings, the classifiers achieved high positive predictive values (PPV) and very good accuracy. Results are also presented for the classification of breast density and of the BI-RADS® scale. Random Forest was the best predictive method across all tested groups, and the best performance was achieved in distinguishing microcalcifications. The conclusions drawn from the tested scenarios offer a new perspective on breast cancer diagnosis using data mining techniques.
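
The abstract names the gray level co-occurrence matrix as one of its two feature extraction techniques but gives no implementation details. The following is a minimal sketch of the general idea only, not the thesis's pipeline. The function names `glcm` and `contrast`, the horizontal offset of one pixel, the four gray levels, and the toy 4x4 patch are all assumptions chosen for illustration.

```python
import numpy as np

def glcm(image, levels, dx=1, dy=0):
    """Count how often gray level i is followed by gray level j at offset (dy, dx)."""
    image = np.asarray(image)
    matrix = np.zeros((levels, levels), dtype=np.int64)
    rows, cols = image.shape
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            # Skip pairs whose second pixel falls outside the image.
            if 0 <= r2 < rows and 0 <= c2 < cols:
                matrix[image[r, c], image[r2, c2]] += 1
    return matrix

def contrast(matrix):
    """Texture contrast: sum of p(i, j) * (i - j)^2 over the normalized matrix."""
    p = matrix / matrix.sum()
    i, j = np.indices(p.shape)
    return float(np.sum(p * (i - j) ** 2))

# Hypothetical 4x4 patch quantized to 4 gray levels (0..3).
patch = np.array([[0, 0, 1, 1],
                  [0, 0, 1, 1],
                  [0, 2, 2, 2],
                  [2, 2, 3, 3]])

m = glcm(patch, levels=4)
print(m)
print(contrast(m))
```

Scalar descriptors such as this contrast value are the kind of texture feature that can then be passed to a classifier. A real mammogram would need gray-level quantization first, and several offsets and directions are usually combined.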

Item metadata

id	RCAP_471fcd3ced5b0740441f34983dfc1494
oai_identifier_str	oai:recipp.ipp.pt:10400.22/5790
network_acronym_str	RCAP
network_name_str	Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
repository_id_str	7160
dc.title.none.fl_str_mv	Using data mining techniques to support breast cancer diagnosis Aplicação de técnicas de data mining para suporte ao diagnóstico de cancro da mama
title	Using data mining techniques to support breast cancer diagnosis
author	Diz, Joana Moreira
author_facet	Diz, Joana Moreira
author_role	author
dc.contributor.none.fl_str_mv	Marreiros, Goreti Freitas, Alberto Repositório Científico do Instituto Politécnico do Porto
dc.contributor.author.fl_str_mv	Diz, Joana Moreira
publishDate	2014
dc.date.none.fl_str_mv	2014 2014-01-01T00:00:00Z 2015-04-08T13:30:35Z
dc.type.status.fl_str_mv	info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv	info:eu-repo/semantics/masterThesis
format	masterThesis
status_str	publishedVersion
dc.identifier.uri.fl_str_mv	http://hdl.handle.net/10400.22/5790 TID:201816636
url	http://hdl.handle.net/10400.22/5790
identifier_str_mv	TID:201816636
dc.language.iso.fl_str_mv	eng
language	eng
dc.rights.driver.fl_str_mv	info:eu-repo/semantics/openAccess
eu_rights_str_mv	openAccess
dc.format.none.fl_str_mv	application/pdf
dc.source.none.fl_str_mv	reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação instacron:RCAAP
instname_str	Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron_str	RCAAP
institution	RCAAP
reponame_str	Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
collection	Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
repository.name.fl_str_mv	Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
repository.mail.fl_str_mv
_version_	1799131358647287808
