Dissertation

Compression of activation signals from partitioned deep neural networks exploring temporal correlation

Main author: SILVA, Lucas Damasceno
Degree: Dissertation
Language: English
Published in: Universidade Federal do Pará, 2025
Online access: https://repositorio.ufpa.br/jspui/handle/2011/16859
Abstract:
Artificial neural networks for object detection, together with advances in 6G and IoT research, play an important role in applications such as drone-based monitoring of structures and search and rescue operations, and are increasingly deployed on hardware platforms such as FPGAs. A key challenge in implementing these networks on such hardware is the need to economize computational resources: despite substantial advances in computational capacity, building devices with ample resources remains difficult. To address this, techniques for partitioning and compressing neural networks, as well as for compressing activation signals (feature maps), have been developed. This work proposes a system that partitions neural network models for object detection in videos, allocating part of the network to an end device and the remainder to a cloud server. The system also compresses the feature maps generated by the last layers on the end device by exploiting their temporal correlation, which enables a predictive compression scheme. This approach allows neural networks to be embedded in low-power devices while respecting the computational limits of the device, the transmission rate constraints of the communication channel between the device and the server, and the network’s accuracy requirements. Experiments conducted on pre-trained neural network models show that the proposed system can significantly reduce the amount of data to be stored or transmitted by leveraging temporal correlation, facilitating the deployment of these networks on devices with limited computational power.
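
The abstract does not specify how the predictive scheme is built, so the following is only a minimal sketch of the general idea, not the dissertation's method. It assumes uniform 8-bit quantization, a previous-frame predictor, and `zlib` as a stand-in entropy coder; the function names, the value range `[0, 1]`, and the synthetic feature maps are all hypothetical choices made for illustration.

```python
import zlib
import numpy as np


def quantize(fmap, lo, hi, levels=256):
    """Uniformly quantize a float feature map to integer codes in [0, levels - 1]."""
    scaled = (np.clip(fmap, lo, hi) - lo) / (hi - lo) * (levels - 1)
    return np.round(scaled).astype(np.int16)


def encode_sequence(fmaps, lo, hi):
    """Device side: code each quantized feature map as a residual against the previous one."""
    prev = np.zeros(fmaps[0].shape, dtype=np.int16)  # the first frame is predicted from zeros
    payloads = []
    for fmap in fmaps:
        q = quantize(fmap, lo, hi)
        residual = q - prev  # values stay small when consecutive frames are correlated
        payloads.append(zlib.compress(residual.tobytes()))
        prev = q
    return payloads


def decode_sequence(payloads, shape):
    """Server side: rebuild the quantized feature maps by accumulating residuals."""
    prev = np.zeros(shape, dtype=np.int16)
    decoded = []
    for payload in payloads:
        residual = np.frombuffer(zlib.decompress(payload), dtype=np.int16).reshape(shape)
        prev = prev + residual
        decoded.append(prev)
    return decoded


# Synthetic stand-in for feature maps of five consecutive video frames:
# each frame is the previous one plus a small perturbation (an assumption, not real data).
rng = np.random.default_rng(0)
frame = rng.random((16, 32, 32)).astype(np.float32)
frames = []
for _ in range(5):
    frames.append(frame)
    frame = frame + rng.normal(0.0, 0.01, frame.shape).astype(np.float32)

payloads = encode_sequence(frames, 0.0, 1.0)
decoded = decode_sequence(payloads, frames[0].shape)

# Compare against compressing every quantized frame independently.
predictive_bytes = sum(len(p) for p in payloads)
independent_bytes = sum(len(zlib.compress(quantize(f, 0.0, 1.0).tobytes())) for f in frames)
print(predictive_bytes, independent_bytes)
```

In this sketch the only loss comes from quantization: the decoder recovers the quantized feature maps exactly, and the saving depends entirely on how similar consecutive frames are. A real system would likely use a stronger predictor and entropy coder than the ones assumed here.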