Unleashing 2x Acceleration for DNNs: Transforming Models with Integral Neural Networks in Just 1 Min

by TheStage AI, August 31st, 2023


Integral Neural Networks (INNs) are flexible architectures that, once trained, can be transformed to an arbitrary user-defined size without any fine-tuning. Just as a sound wave (music) can be sampled at any desired sampling rate (sound quality), an INN can dynamically change the shape of its data and parameters (DNN quality).


INN applications. During inference, the size of the network can be changed dynamically depending on the hardware or data conditions. The size reduction is structured and automatically leads to compression and acceleration of the neural network.



The TheStage.ai team presented their paper, “Integral Neural Networks,” at the IEEE/CVF CVPR 2023 conference, where it was recognized as one of only 12 ‘Award Candidate’ papers. INNs are a new class of neural networks that combine continuous parameters and integral operators to represent basic layers. At inference time, an INN is converted to a vanilla DNN representation by discretely sampling its continuous weights. The parameters of such networks are continuous along the filter and channel dimensions, so structured pruning without fine-tuning reduces to re-discretization along those dimensions.


In this article, we will guide you through the process of transforming a 4x image super-resolution EDSR model to an INN, then show how to achieve structured pruning of the model. Finally, we will convert the INN back to a discrete DNN and deploy it on an Nvidia GPU for efficient inference. This article will proceed as follows:


  1. A brief introduction to INNs.
  2. Overview of the EDSR network for the super-resolution task.
  3. Applying the TorchIntegral framework to obtain an integral EDSR in a single line of code.
  4. Structured pruning of the INN without fine-tuning (fast pipeline).
  5. Deploying pruned models on Nvidia GPUs.


For further information and updates, please check the following resources:

INN project site

INN project Github

Support code of this article


Feature maps of discrete EDSR.


Feature maps of INN EDSR. It is easy to see that channels in INNs are organized continuously.


INNs for DNN pruning without fine-tuning

In INNs, layers are replaced by integral operators. Evaluating an integral operator in practice requires discretizing the input signal so that numerical integration methods can be applied. INN layers are designed so that, after discretization, they coincide with classical DNN layers (fully-connected, convolution).


Overview of the integral fully-connected layer evaluation.
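
To make the discretization step concrete, here is a minimal pure-Python sketch. It is not TorchIntegral code: the smooth weight function `kernel`, the trapezoidal quadrature, and the input signal f(x) = x are all assumptions chosen for illustration. It shows that sampling a continuous weight on a grid and folding the quadrature weights into it yields an ordinary weight matrix, and that the same continuous layer can be discretized at different sizes with nearly identical outputs.

```python
import math

def uniform_grid(n):
    """n equally spaced points on [0, 1]."""
    return [i / (n - 1) for i in range(n)]

def trapezoid_weights(n):
    """Composite trapezoidal quadrature weights for uniform_grid(n)."""
    h = 1.0 / (n - 1)
    return [h / 2 if i in (0, n - 1) else h for i in range(n)]

def kernel(x_out, x_in):
    """Hypothetical smooth weight function W(x_out, x_in) (illustration only)."""
    return math.cos(math.pi * x_out) * math.sin(math.pi * x_in) + x_out * x_in

def discretize(n_out, n_in):
    """Sample the kernel and fold in the quadrature weights.
    The result is an ordinary (n_out x n_in) weight matrix."""
    q = trapezoid_weights(n_in)
    return [[kernel(xo, xi) * qj for xi, qj in zip(uniform_grid(n_in), q)]
            for xo in uniform_grid(n_out)]

def linear(weight, x):
    """Standard fully-connected layer without bias: a matrix-vector product."""
    return [sum(w * v for w, v in zip(row, x)) for row in weight]

def exact(x_out):
    """Closed-form integral of W(x_out, x) * x over [0, 1], for the input f(x) = x."""
    return math.cos(math.pi * x_out) / math.pi + x_out / 3

# The same continuous layer, discretized with two different input sizes.
# The input signal f(x) = x is sampled on the matching grid each time.
coarse = linear(discretize(n_out=3, n_in=51), uniform_grid(51))
fine = linear(discretize(n_out=3, n_in=201), uniform_grid(201))

for x_out, c, f in zip(uniform_grid(3), coarse, fine):
    print(f"x_out={x_out:.1f} coarse={c:.5f} fine={f:.5f} exact={exact(x_out):.5f}")
```

Both discretizations approximate the same integral, which is the property that lets an INN be resampled to a smaller size.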


EDSR 4x Image Super-Resolution pruning

Image super-resolution is a well-known computer vision task in which an image degraded by a known or unknown degradation operator must be enhanced. We consider the classical form of super-resolution, with bicubic downsampling as the degradation operator.


Many architectures can be used for image super-resolution, including high-end neural networks based on diffusion models and transformers. In this article, we focus on the 4x EDSR architecture. EDSR is well suited to our demonstration because it consists of a ResNet-style trunk (residual networks are widely used in many deep learning problems) followed by a 4x upsampling block. A schematic description of EDSR is given in the following figures.


EDSR architecture. The EDSR architecture consists of a sequence of Residual Blocks followed by an upsample block, which is made up of several convolutions and Upsample layers.


Left: Residual block architecture. Right: Upsampling block for 4x super-resolution, each Upsample layer has 2x scale.


Specifics of EDSR architecture pruning

Structured pruning, which deletes entire filters or channels, has specific implications for residual blocks, the primary building blocks of EDSR. Since each state is updated by adding the output of a Conv -> ReLU -> Conv block to the input, the input and output signals must have the same number of channels. TorchIntegral manages this constraint by building a pruning dependency graph. The figure below illustrates that the second convolutions of all residual blocks form a single group.


Pruning dependency groups. In a sequence of residual blocks, the second convolutions form a single group.


Pruning the filters of the second convolution in one residual block therefore requires pruning the second convolutions of all residual blocks in the same way. However, for a more flexible setup, we prune the filters of the first convolutions, and therefore the channels of the second convolutions, in all residual blocks.
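
The shape constraint behind these dependency groups can be checked with a small shape-only sketch. This is not TorchIntegral's dependency graph: the helper names and the trunk width of 256 are hypothetical and chosen for illustration. Convolution weights are represented only by their shapes, in the order (filters, channels, kernel, kernel).

```python
def conv_out_channels(weight_shape, in_channels):
    """Return the number of output channels of a convolution,
    after checking that the input has the expected channel count."""
    filters, channels, _, _ = weight_shape
    if channels != in_channels:
        raise ValueError(f"expected {channels} input channels, got {in_channels}")
    return filters

def residual_block_channels(conv1, conv2, in_channels):
    """x + conv2(relu(conv1(x))): the skip connection requires
    the block output to have as many channels as the block input."""
    hidden = conv_out_channels(conv1, in_channels)
    out = conv_out_channels(conv2, hidden)
    if out != in_channels:
        raise ValueError("residual add needs matching channel counts")
    return out

def prune_inner(conv1, conv2, new_hidden):
    """Prune filters of conv1 together with the matching channels of conv2."""
    _, c1, k1, _ = conv1
    f2, _, k2, _ = conv2
    return (new_hidden, c1, k1, k1), (f2, new_hidden, k2, k2)

WIDTH = 256  # hypothetical trunk width, for illustration only
conv1 = (WIDTH, WIDTH, 3, 3)
conv2 = (WIDTH, WIDTH, 3, 3)

# Pruning inside the block leaves the trunk width untouched,
# so each block can be handled without affecting its neighbours.
pruned1, pruned2 = prune_inner(conv1, conv2, new_hidden=128)
trunk = residual_block_channels(pruned1, pruned2, WIDTH)

def pruning_second_conv_alone_fails():
    """Pruning only the filters of conv2 breaks the residual addition."""
    try:
        residual_block_channels(conv1, (128, WIDTH, 3, 3), WIDTH)
    except ValueError:
        return True
    return False

print(trunk, pruning_second_conv_alone_fails())
```

Pruning the filters of a second convolution changes the trunk width, so it has to be applied to every block that writes to the same trunk, which is why those convolutions end up in one group.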


Conversion of EDSR model to INN EDSR

To convert a pre-trained DNN, we use our filter-channel permutation algorithm followed by smooth interpolation. The permutation preserves model quality while making the weights of the DNN look as if they were sampled from continuous functions.


DNN to INN conversion. We use a Travelling Salesman Problem formulation to permute the discrete weights. After permutation, the weights are smoother, with no drop in quality of the pre-trained DNN.
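
The following toy sketch illustrates why such a permutation costs no quality. It uses a greedy nearest-neighbour ordering as a simple stand-in for a TSP solver; this heuristic, the tiny two-layer network, and all weight values are assumptions for illustration and not the authors' algorithm. Reordering the filters of one layer together with the matching channels of the next layer leaves the output unchanged, while the variation between neighbouring filters drops.

```python
def matvec(w, x):
    return [sum(a * b for a, b in zip(row, x)) for row in w]

def relu(v):
    return [max(0.0, a) for a in v]

def mlp(w1, w2, x):
    """Two-layer network: w2 @ relu(w1 @ x)."""
    return matvec(w2, relu(matvec(w1, x)))

def l1(r0, r1):
    return sum(abs(a - b) for a, b in zip(r0, r1))

def total_variation(rows):
    """Sum of L1 distances between neighbouring rows (filters)."""
    return sum(l1(r0, r1) for r0, r1 in zip(rows, rows[1:]))

def greedy_order(rows):
    """Nearest-neighbour tour starting from row 0 (a heuristic, not an exact TSP solver)."""
    order, remaining = [0], set(range(1, len(rows)))
    while remaining:
        last = rows[order[-1]]
        nxt = min(sorted(remaining), key=lambda i: l1(rows[i], last))
        order.append(nxt)
        remaining.remove(nxt)
    return order

def permute(w1, w2, order):
    """Reorder filters (rows) of w1 and the matching channels (columns) of w2."""
    return [w1[i] for i in order], [[row[i] for i in order] for row in w2]

# Toy weights: filters drawn from a smooth function of t, stored in scrambled order.
ts = [0.0, 0.5, 0.2, 0.7, 0.1, 0.6, 0.3, 0.4]
w1 = [[t, 1.0 - t] for t in ts]
w2 = [[(-1) ** i * (i + 1) / 10 for i in range(8)], [0.5 - t for t in ts]]
x = [0.8, -0.3]

order = greedy_order(w1)
w1_smooth, w2_perm = permute(w1, w2, order)

print("variation before:", total_variation(w1))
print("variation after: ", total_variation(w1_smooth))
print("outputs:", mlp(w1, w2, x), mlp(w1_smooth, w2_perm, x))
```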


import torch
import torchintegral as inn
from super_image import EdsrModel

# creating 4x EDSR model
model = EdsrModel.from_pretrained("eugenesiow/edsr", scale=4).cuda()

# Transform model layers to integral.
# continuous_dims and discrete_dims define which dimensions
# of parameter tensors should be parametrized continuously
# or stay fixed size as in discrete networks.
# In our case we make all filter and channel dimensions
# continuous, excluding convolutions of the upsample block.
# NOTE: example_input, continuous_dims and discrete_dims are not
# defined in this snippet; see the support code of this article.
model = inn.IntegralWrapper(init_from_discrete=True)(
    model, example_input, continuous_dims, discrete_dims
).cuda()


Integration grid tuning: structured post-training pruning of DNNs

Integration grid tuning is an operation that smoothly selects (under SGD optimization) the points at which the filters of each parameter tensor are sampled, for a user-defined number of filters. Unlike filter/channel deletion methods, INNs generate filters that can combine several discrete filters, thanks to the interpolation operation.


INNs introduce a soft select-by-index operation on the tensor of parameters along the filter and channel dimensions.
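
A soft select-by-index can be sketched as sampling a weight tensor at fractional positions. The example below assumes linear interpolation between neighbouring filters and a fixed grid; the function name and the weight values are hypothetical. In an INN the grid points would be trainable, and the interpolation actually used may differ.

```python
def sample_filters(weight, grid):
    """Sample rows (filters) of `weight` at continuous positions in [0, 1],
    using linear interpolation between neighbouring discrete filters.
    Requires at least two filters."""
    n = len(weight)
    sampled = []
    for t in grid:
        pos = min(max(t, 0.0), 1.0) * (n - 1)  # fractional filter index
        lo = min(int(pos), n - 2)              # left neighbour
        frac = pos - lo                        # weight of the right neighbour
        sampled.append([(1 - frac) * a + frac * b
                        for a, b in zip(weight[lo], weight[lo + 1])])
    return sampled

# Four discrete filters with two weights each (toy values).
weight = [[0.0, 0.0], [1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]

# Three grid points produce three filters: the first and last coincide with
# existing filters, the middle one blends filters 1 and 2 equally.
pruned = sample_filters(weight, [0.0, 0.5, 1.0])
print(pruned)
```

A grid point that falls exactly on an index reproduces hard filter selection. Any other position blends two neighbours, which is why a sampled filter can combine several discrete filters.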


# Set a trainable grid for each integral layer.
# All layers in a group must share the same grid: when continuous
# signals are summed, they have to be sampled at the same set of points.

for group in model.groups:
    new_size = 224 if 'operator' in group.operations else 128
    group.reset_grid(inn.TrainableGrid1D(new_size))

# Prepare the model for tuning of the integration grid
model.grid_tuning()
# Start training; train, train_data and test_data are placeholders
# for a user-defined training loop and datasets.
train(model, train_data, test_data)


Integration grid tuning is a fast optimization process that can be carried out on a small calibration set. The result of this optimization is a structurally compressed DNN. Tests on a single Nvidia A4000 show that integration grid tuning on the full Div2k dataset requires 4 minutes. A distributed setup on 4x A4000 demonstrates almost a 4x speedup, resulting in an optimization time of just 1 minute.


In our experiments, we found that 500 images give the same result as the full Div2k train set of 4000 images.


Performance

The resulting INN model can be easily converted to a discrete model and deployed on any Nvidia GPU. We report frames per second (FPS) on the RTX A4000 with an input resolution of 64x64. The compressed model achieves almost a 2x speedup. To convert the pruned INN model to a discrete model, the following line of code can be used:


model = model.transform_to_discrete()
# then model can be compiled, for instance
# compilation can add an additional 1.4x speedup for inference
model = torch.compile(model, backend='cudagraphs')
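
The article does not describe how FPS was measured. As a rough guide, a throughput measurement could look like the sketch below. The helper `measure_fps` and its parameters are hypothetical; for a CUDA model one would pass a synchronization function such as `torch.cuda.synchronize` as `sync`, because GPU kernels run asynchronously.

```python
import time

def measure_fps(run_once, warmup=3, iters=20, sync=None):
    """Average number of `run_once` calls per second.

    `sync` is an optional callable that blocks until queued work has finished
    (for CUDA models, e.g. torch.cuda.synchronize)."""
    for _ in range(warmup):  # warm-up runs are not timed
        run_once()
    if sync is not None:
        sync()
    start = time.perf_counter()
    for _ in range(iters):
        run_once()
    if sync is not None:
        sync()
    elapsed = time.perf_counter() - start
    return iters / elapsed

# Stand-in workload; replace with a forward pass on a fixed input batch.
fps = measure_fps(lambda: sum(range(10_000)))
print(f"{fps:.0f} calls per second")
```

Warm-up iterations matter especially for compiled models, since the first calls include compilation time.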


Left: 4x bicubic upscaled image. Right: output of the 50% compressed EDSR model obtained with INN.


Model          Size FP16   FPS (RTX A4000)   PSNR (dB)
EDSR orig.     75 MB       170               30.65
INN EDSR 30%   52 MB       230               30.43
INN EDSR 40%   45 MB       270               30.34
INN EDSR 50%   37 MB       320               30.25
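
The relative gains can be read directly off the reported numbers. The short script below only restates the figures from the results above and derives the ratios; the dictionary layout and key names are illustrative.

```python
# Reported results: FP16 size in MB, FPS on RTX A4000, PSNR in dB.
results = {
    "EDSR orig.":   {"size_mb": 75, "fps": 170, "psnr": 30.65},
    "INN EDSR 30%": {"size_mb": 52, "fps": 230, "psnr": 30.43},
    "INN EDSR 40%": {"size_mb": 45, "fps": 270, "psnr": 30.34},
    "INN EDSR 50%": {"size_mb": 37, "fps": 320, "psnr": 30.25},
}
base = results["EDSR orig."]

def relative(name):
    """Speedup, size ratio and PSNR drop relative to the original EDSR."""
    r = results[name]
    return {
        "speedup": r["fps"] / base["fps"],
        "size_ratio": r["size_mb"] / base["size_mb"],
        "psnr_drop_db": base["psnr"] - r["psnr"],
    }

for name in results:
    rel = relative(name)
    print(f"{name:13s} speedup={rel['speedup']:.2f}x "
          f"size={rel['size_ratio']:.0%} PSNR drop={rel['psnr_drop_db']:.2f} dB")
```

For the 50% model this gives roughly a 1.88x speedup at about half the size, with a PSNR drop of 0.40 dB.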

Conclusion

In this article, we have presented an overview of the CVPR 2023 award candidate paper, “Integral Neural Networks”. We applied it to post-training pruning of the 4x EDSR model, achieving nearly a 2x speedup with a single line of code and 1 minute of integration grid tuning.


In our future articles, we will present more applications of the INNs and will cover more details on efficient model deployment. Stay tuned:


INN project site

INN project Github

Support code of this article


Thank you for your attention!