Copycat CNN: convolutional neural network extraction attack with unlabeled natural images
Main author: | Silva, Jacson Rodrigues Correia da |
---|---|
Publication date: | 2023 |
Document type: | Doctoral thesis |
Language: | por |
Source title: | Repositório Institucional da Universidade Federal do Espírito Santo (riUfes) |
Full text: | http://repositorio.ufes.br/handle/10/16914 |
Abstract: | Convolutional Neural Networks (CNNs) have achieved state-of-the-art performance on a variety of problems in recent years, leading many companies to develop neural-network-based products that require expensive data acquisition, annotation, and model generation. To protect their models from being copied or attacked, companies often deliver them as black boxes accessible only through APIs, which must be secure, robust, and reliable across different problem domains. However, recent studies have shown that state-of-the-art CNNs have vulnerabilities: simple perturbations of the input images can change the model's response, and even images unrecognizable to humans can obtain high confidence in the model's output. These methods require access to the model parameters, but other studies have shown how to generate a copy (imitation) of a model using its output probabilities (soft labels) and problem-domain data. With such a surrogate model, an adversary can attack the target model with a higher chance of success. We further explored these vulnerabilities. Our hypothesis is that, using only publicly available images (accessible to everyone) and responses that any model must provide (even a black box), it is possible to copy a model and achieve high performance. We therefore proposed a method called Copycat to exploit CNN classification models. Our main goal is to copy the model in two stages: first, the target model is queried with random natural images, such as those from ImageNet, and the classes of maximum probability (hard labels) are recorded; then, these labeled images are used to train a Copycat model that should achieve performance similar to the target model's. We evaluated this hypothesis on seven real-world problems and against a cloud-based API. All Copycat models achieved performance (F1-Score) above 96.4% when compared to the target models. After obtaining these results, we performed several experiments to consolidate and evaluate our method. Furthermore, concerned about this vulnerability, we also analyzed various existing defenses against the Copycat method. In these experiments, defenses that detect attack queries did not work against our method, but watermarking defenses could identify the target model's intellectual property. Thus, the method proved effective for model extraction, being immune to the detection-based defenses from the literature and identifiable only by watermarking defenses. |
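The two-stage procedure summarized in the abstract (query the black-box target with random natural images, keep only its hard labels, then train a surrogate on those labels) can be illustrated with a short sketch. This is a minimal PyTorch example, not the thesis's actual implementation: the locally instantiated stand-in target, the `natural_images/` folder path, the VGG-16 copycat architecture, and all hyperparameters are assumptions for illustration only.

```python
# Minimal sketch of the two-stage Copycat procedure, assuming PyTorch/torchvision.
# The stand-in target, data path, architecture, and hyperparameters are illustrative.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset
from torchvision import datasets, models, transforms

device = "cuda" if torch.cuda.is_available() else "cpu"
num_classes = 10  # number of classes exposed by the target model (problem-dependent)

# Stand-in for the remote black-box target; in a real attack this would be an
# API call that returns class scores (or only the predicted class) per image.
target = models.resnet18(num_classes=num_classes).to(device).eval()

# Stage 1: query the target with random natural images (e.g., ImageNet samples)
# and keep only the argmax of its responses (hard labels); any original labels
# of the query images are never used.
preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
])
natural_images = datasets.ImageFolder("natural_images/", transform=preprocess)  # hypothetical path
query_loader = DataLoader(natural_images, batch_size=64, shuffle=False)

stolen_images, stolen_labels = [], []
with torch.no_grad():
    for images, _ in query_loader:                        # ignore the folder's own labels
        scores = target(images.to(device))                # black-box query
        stolen_images.append(images)
        stolen_labels.append(scores.argmax(dim=1).cpu())  # hard labels only

fake_dataset = TensorDataset(torch.cat(stolen_images), torch.cat(stolen_labels))

# Stage 2: train the Copycat model on the hard-labeled natural images so that
# it imitates the target's decision function.
copycat = models.vgg16(num_classes=num_classes).to(device)
optimizer = torch.optim.SGD(copycat.parameters(), lr=1e-3, momentum=0.9)
criterion = nn.CrossEntropyLoss()

copycat.train()
for epoch in range(5):  # illustrative number of epochs
    for images, labels in DataLoader(fake_dataset, batch_size=64, shuffle=True):
        optimizer.zero_grad()
        loss = criterion(copycat(images.to(device)), labels.to(device))
        loss.backward()
        optimizer.step()

# The copycat would then be evaluated on the target problem's test set and
# compared to the target model (the thesis reports F1-Scores above 96.4%).
```

Because only argmax responses are consumed, the same sketch applies to APIs that return nothing but the predicted class, which is the minimal response any deployed classifier must provide.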
id |
UFES_3f338eb79d6a9358c395874eaa4aad93 |
---|---|
oai_identifier_str |
oai:repositorio.ufes.br:10/16914 |
network_acronym_str |
UFES |
network_name_str |
Repositório Institucional da Universidade Federal do Espírito Santo (riUfes) |
repository_id_str |
2108 |
dc.title.none.fl_str_mv |
Copycat CNN: convolutional neural network extraction attack with unlabeled natural images |
title |
Copycat CNN: convolutional neural network extraction attack with unlabeled natural images |
spellingShingle |
Copycat CNN: convolutional neural network extraction attack with unlabeled natural images; Silva, Jacson Rodrigues Correia da; Ciência da Computação; Aprendizado Profundo; Redes Neurais Convolucionais; Roubo de Conhecimento de Redes Neurais; Destilação de Conhecimento; Extração de Modelo; Roubo de Modelo; Compressão de Modelo |
title_short |
Copycat CNN: convolutional neural network extraction attack with unlabeled natural images |
title_full |
Copycat CNN: convolutional neural network extraction attack with unlabeled natural images |
title_fullStr |
Copycat CNN: convolutional neural network extraction attack with unlabeled natural images |
title_full_unstemmed |
Copycat CNN: convolutional neural network extraction attack with unlabeled natural images |
title_sort |
Copycat CNN: convolutional neural network extraction attack with unlabeled natural images |
author |
Silva, Jacson Rodrigues Correia da |
author_facet |
Silva, Jacson Rodrigues Correia da |
author_role |
author |
dc.contributor.authorID.none.fl_str_mv |
https://orcid.org/0000-0002-4314-1693 |
dc.contributor.authorLattes.none.fl_str_mv |
http://lattes.cnpq.br/0637308986252382 |
dc.contributor.advisor1.fl_str_mv |
Santos, Thiago Oliveira dos |
dc.contributor.advisor1ID.fl_str_mv |
https://orcid.org/0000-0001-7607-635X |
dc.contributor.advisor1Lattes.fl_str_mv |
http://lattes.cnpq.br/5117339495064254 |
dc.contributor.author.fl_str_mv |
Silva, Jacson Rodrigues Correia da |
dc.contributor.referee1.fl_str_mv |
Goncalves, Claudine Santos Badue |
dc.contributor.referee1ID.fl_str_mv |
https://orcid.org/0000-0003-1810-8581 |
dc.contributor.referee1Lattes.fl_str_mv |
http://lattes.cnpq.br/1359531672303446 |
dc.contributor.referee2.fl_str_mv |
Luz, Eduardo Jose da Silva |
dc.contributor.referee2ID.fl_str_mv |
https://orcid.org/0000-0001-5249-1559 |
dc.contributor.referee2Lattes.fl_str_mv |
http://lattes.cnpq.br/5385878413487984 |
dc.contributor.referee3.fl_str_mv |
Almeida Junior, Jurandy Gomes de |
dc.contributor.referee3ID.fl_str_mv |
https://orcid.org/0000-0002-4998-6996 |
dc.contributor.referee3Lattes.fl_str_mv |
http://lattes.cnpq.br/4495269939725770 |
dc.contributor.referee4.fl_str_mv |
Rauber, Thomas Walter |
dc.contributor.referee4ID.fl_str_mv |
https://orcid.org/0000-0002-6380-6584 |
dc.contributor.referee4Lattes.fl_str_mv |
http://lattes.cnpq.br/0462549482032704 |
contributor_str_mv |
Santos, Thiago Oliveira dos; Goncalves, Claudine Santos Badue; Luz, Eduardo Jose da Silva; Almeida Junior, Jurandy Gomes de; Rauber, Thomas Walter |
dc.subject.cnpq.fl_str_mv |
Ciência da Computação |
topic |
Ciência da Computação; Aprendizado Profundo; Redes Neurais Convolucionais; Roubo de Conhecimento de Redes Neurais; Destilação de Conhecimento; Extração de Modelo; Roubo de Modelo; Compressão de Modelo |
dc.subject.por.fl_str_mv |
Aprendizado Profundo; Redes Neurais Convolucionais; Roubo de Conhecimento de Redes Neurais; Destilação de Conhecimento; Extração de Modelo; Roubo de Modelo; Compressão de Modelo |
description |
Convolutional Neural Networks (CNNs) have achieved state-of-the-art performance on a variety of problems in recent years, leading many companies to develop neural-network-based products that require expensive data acquisition, annotation, and model generation. To protect their models from being copied or attacked, companies often deliver them as black boxes accessible only through APIs, which must be secure, robust, and reliable across different problem domains. However, recent studies have shown that state-of-the-art CNNs have vulnerabilities: simple perturbations of the input images can change the model's response, and even images unrecognizable to humans can obtain high confidence in the model's output. These methods require access to the model parameters, but other studies have shown how to generate a copy (imitation) of a model using its output probabilities (soft labels) and problem-domain data. With such a surrogate model, an adversary can attack the target model with a higher chance of success. We further explored these vulnerabilities. Our hypothesis is that, using only publicly available images (accessible to everyone) and responses that any model must provide (even a black box), it is possible to copy a model and achieve high performance. We therefore proposed a method called Copycat to exploit CNN classification models. Our main goal is to copy the model in two stages: first, the target model is queried with random natural images, such as those from ImageNet, and the classes of maximum probability (hard labels) are recorded; then, these labeled images are used to train a Copycat model that should achieve performance similar to the target model's. We evaluated this hypothesis on seven real-world problems and against a cloud-based API. All Copycat models achieved performance (F1-Score) above 96.4% when compared to the target models. After obtaining these results, we performed several experiments to consolidate and evaluate our method. Furthermore, concerned about this vulnerability, we also analyzed various existing defenses against the Copycat method. In these experiments, defenses that detect attack queries did not work against our method, but watermarking defenses could identify the target model's intellectual property. Thus, the method proved effective for model extraction, being immune to the detection-based defenses from the literature and identifiable only by watermarking defenses. |
publishDate |
2023 |
dc.date.issued.fl_str_mv |
2023-04-25 |
dc.date.accessioned.fl_str_mv |
2024-05-30T01:41:49Z |
dc.date.available.fl_str_mv |
2024-05-30T01:41:49Z |
dc.type.status.fl_str_mv |
info:eu-repo/semantics/publishedVersion |
dc.type.driver.fl_str_mv |
info:eu-repo/semantics/doctoralThesis |
format |
doctoralThesis |
status_str |
publishedVersion |
dc.identifier.uri.fl_str_mv |
http://repositorio.ufes.br/handle/10/16914 |
url |
http://repositorio.ufes.br/handle/10/16914 |
dc.language.iso.fl_str_mv |
por |
language |
por |
dc.rights.driver.fl_str_mv |
info:eu-repo/semantics/openAccess |
eu_rights_str_mv |
openAccess |
dc.format.none.fl_str_mv |
Text |
dc.publisher.none.fl_str_mv |
Universidade Federal do Espírito Santo; Doutorado em Ciência da Computação |
dc.publisher.program.fl_str_mv |
Programa de Pós-Graduação em Informática |
dc.publisher.initials.fl_str_mv |
UFES |
dc.publisher.country.fl_str_mv |
BR |
dc.publisher.department.fl_str_mv |
Centro Tecnológico |
publisher.none.fl_str_mv |
Universidade Federal do Espírito Santo; Doutorado em Ciência da Computação |
dc.source.none.fl_str_mv |
reponame:Repositório Institucional da Universidade Federal do Espírito Santo (riUfes); instname:Universidade Federal do Espírito Santo (UFES); instacron:UFES |
instname_str |
Universidade Federal do Espírito Santo (UFES) |
instacron_str |
UFES |
institution |
UFES |
reponame_str |
Repositório Institucional da Universidade Federal do Espírito Santo (riUfes) |
collection |
Repositório Institucional da Universidade Federal do Espírito Santo (riUfes) |
bitstream.url.fl_str_mv |
http://repositorio.ufes.br/bitstreams/16cbe44b-29db-439e-81ad-6eb4ee4fa4ca/download |
bitstream.checksum.fl_str_mv |
b1bfbaeed31431270c04c91c7db23564 |
bitstream.checksumAlgorithm.fl_str_mv |
MD5 |
repository.name.fl_str_mv |
Repositório Institucional da Universidade Federal do Espírito Santo (riUfes) - Universidade Federal do Espírito Santo (UFES) |
repository.mail.fl_str_mv |
|
_version_ |
1813022513321476096 |