Every byte and every operation counts when you’re trying to build a faster model, especially if the model needs to run on-device. Neural architecture search (NAS) algorithms design sophisticated model architectures by searching a larger model space than is possible by hand. NAS algorithms such as MnasNet and TuNAS have been proposed and have found several efficient model architectures, including MobileNetV3 and EfficientNet.

Here we present the LayerNAS approach, which reformulates the multi-objective NAS problem within a combinatorial optimization framework to greatly reduce its complexity. This results in an order-of-magnitude reduction in the number of model candidates that must be searched, less computation required for multi-trial searches, and the discovery of model architectures that perform better overall. Using a search space built on backbones taken from MobileNetV2 and MobileNetV3, we find models with top-1 accuracy on ImageNet up to 4.9% better than current state-of-the-art alternatives.

## Formulation of the problem

NAS tackles a variety of different problems on different search spaces. To understand what LayerNAS is solving, let’s start with a simple example: You are the owner of GBurger and are designing the flagship burger, which is made up of three layers, each of which has four options with different costs. Burgers taste differently with different combinations of options. You want to make the most delicious burger you can within a budget.

Build up your burger with different options available for each layer, each of which has different costs and provides different benefits.
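At this size the toy problem can simply be brute-forced; the sketch below enumerates all 4³ = 64 burgers and keeps the tastiest one within budget. The costs, taste scores, and budget are made-up illustration values, not numbers from the post:

```python
from itertools import product

# Hypothetical (cost, taste) options for each of the three burger layers.
# These numbers are invented for illustration only.
LAYERS = [
    [(1, 2), (2, 3), (3, 5), (4, 6)],  # layer 1 options
    [(1, 1), (2, 4), (3, 4), (4, 7)],  # layer 2 options
    [(1, 3), (2, 3), (3, 6), (4, 8)],  # layer 3 options
]
BUDGET = 9  # maximum total cost we are willing to pay

def brute_force(layers, budget):
    """Enumerate every layer combination and keep the tastiest within budget."""
    best = None
    for combo in product(*layers):
        cost = sum(c for c, _ in combo)
        taste = sum(t for _, t in combo)
        if cost <= budget and (best is None or taste > best[0]):
            best = (taste, cost, combo)
    return best

print(brute_force(LAYERS, BUDGET))  # checks 4**3 = 64 combinations
```

Exhaustive enumeration works here, but the number of combinations grows exponentially with the number of layers, which is exactly what the layerwise approach below avoids.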

Just like a neural network architecture, the search space for the perfect burger follows a layerwise pattern, where each layer has several options with different changes to cost and performance. This simplified model illustrates a common approach for setting up search spaces. For example, for convolutional neural network (CNN)-based models such as MobileNet, the NAS algorithm may select between a different number of options (filters, strides, or kernel sizes, etc.) for each convolutional layer.

## Method

We base our approach on search spaces that satisfy two conditions.

- An optimal model can be constructed by using one of the model candidates generated from searching the previous layer and applying search options on the current layer.
- If we apply a FLOPs constraint on the current layer, we can set constraints on the previous layer by reducing the FLOPs of the current layer.

Under these conditions it is possible to search linearly, from layer 1 to layer *n*, knowing that when searching for the best option for layer *i*, a change to any previous layer will not improve the performance of the model. We can then bucket candidates by their cost, so that only a limited number of candidates are stored per layer. If two models have the same FLOPs but one has better accuracy, we only keep the better one, and assume this won’t affect the architecture of the following layers. Whereas the search space of a full treatment would expand exponentially with layers, since the full range of options is available at each layer, our layerwise cost-based approach allows us to significantly reduce the search space, while being able to rigorously reason about the polynomial complexity of the algorithm. Our experimental evaluation shows that within these constraints we are able to discover top-performance models.
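The pruning idea in the paragraph above can be sketched as a small helper: group candidates into cost (FLOPs) buckets and keep only the most accurate candidate per bucket. The function name and bucket width below are illustrative, not taken from the paper:

```python
def prune_by_cost(candidates, bucket_width):
    """Keep only the best-accuracy candidate in each cost bucket.

    candidates: iterable of (cost, accuracy, architecture) tuples.
    Two candidates that land in the same cost bucket are treated as
    equivalent in cost, so the lower-accuracy one is discarded; this
    keeps the number of stored candidates per layer bounded.
    """
    best_per_bucket = {}
    for cost, accuracy, arch in candidates:
        bucket = int(cost // bucket_width)
        kept = best_per_bucket.get(bucket)
        if kept is None or accuracy > kept[1]:
            best_per_bucket[bucket] = (cost, accuracy, arch)
    # Return survivors ordered by bucket (i.e., roughly by cost).
    return [best_per_bucket[b] for b in sorted(best_per_bucket)]

# Example: the two ~10-FLOP models share a bucket, so only the
# more accurate one survives.
survivors = prune_by_cost(
    [(10, 0.70, "arch-a"), (11, 0.72, "arch-b"), (25, 0.80, "arch-c")],
    bucket_width=10,
)
print(survivors)
```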

## NAS as a combinatorial optimization problem

By applying a layerwise cost approach, we reduce NAS to a combinatorial optimization problem. That is, for layer *i*, we can compute the cost and reward after training with a given component *Sᵢ*. This implies the following combinatorial problem:

*How can we get the best reward if we choose one option per layer within the cost budget?* This problem can be solved in many different ways, one of the simplest of which is to use dynamic programming, as described in the following pseudocode:

```
while True:
    # Select a candidate to search in layer i.
    candidate = select_candidate(layeri)
    if searchable(candidate):
        # Use the layerwise structural information to generate the children.
        children = generate_children(candidate)
        reward = train(children)
        bucket = bucketize(children)
        if memorial_table[i][bucket] < reward:
            memorial_table[i][bucket] = children
        move to next layer
```

LayerNAS pseudocode.
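For intuition, here is a runnable sketch of the same layerwise dynamic program, applied to a made-up three-layer burger search space (costs, tastes, and budget are invented for illustration). The memo table maps accumulated cost to the best taste found so far, so only one candidate per cost value survives each layer instead of exponentially many:

```python
def layerwise_search(layers, budget):
    """LayerNAS-style layer-by-layer search with a per-cost memo table.

    layers: per-layer list of (cost, taste) options.
    Returns (best_taste, best_cost) achievable within the budget.
    """
    memo = {0: 0}  # accumulated cost -> best taste of a partial burger
    for options in layers:
        next_memo = {}
        for cost, taste in memo.items():
            for opt_cost, opt_taste in options:
                new_cost = cost + opt_cost
                if new_cost > budget:
                    continue  # the layerwise cost bound prunes this child
                new_taste = taste + opt_taste
                if next_memo.get(new_cost, -1) < new_taste:
                    next_memo[new_cost] = new_taste  # keep the best per bucket
        memo = next_memo
    best_cost, best_taste = max(memo.items(), key=lambda kv: kv[1])
    return best_taste, best_cost

# Illustrative (cost, taste) options per layer.
layers = [
    [(1, 2), (2, 3), (3, 5), (4, 6)],
    [(1, 1), (2, 4), (3, 6), (4, 7)],
    [(1, 3), (2, 3), (3, 6), (4, 8)],
]
print(layerwise_search(layers, budget=9))
```

With three layers of four options each, the memo table never holds more than `budget` entries per layer, while a full enumeration would grow as 4ⁿ in the number of layers.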

An illustration of the LayerNAS approach, using the example of trying to create the best burger within a budget of $7–$9. We have four options for the first layer, which results in four burger candidates. By applying four options on the second layer, we have 16 candidates in total. We then bucket them into ranges of $1–$2, $3–$4, $5–$6, and $7–$8, and keep only the most delicious burger within each bucket, i.e., four candidates. Then, for those four candidates, we build 16 candidates using the pre-selected options for the first two layers and four options per candidate for the third layer. We bucket them again, select the burgers within the budget range, and keep the best one.

## Experimental results

When comparing NAS algorithms, we evaluate the following metrics:

- *Quality*: What is the most accurate model the algorithm can find?
- *Stability*: How stable is the selection of a good model? Can high-accuracy models be consistently discovered in consecutive trials of the algorithm?
- *Efficiency*: How long does it take for the algorithm to find a high-accuracy model?

We evaluate our algorithm on the standard benchmark NATS-Bench using 100 NAS runs, and we compare against other NAS algorithms previously described in the NATS-Bench paper: random search, regularized evolution, and proximal policy optimization. Below we visualize the differences between these search algorithms for the metrics described above. For each comparison, we record the average accuracy and the variation in accuracy (variation is indicated by a shaded region corresponding to the 25% to 75% interquartile range).

The NATS-Bench size search defines a 5-layer CNN model, where each layer can choose from eight different options, each with different channel sizes on the convolutional layers. Our goal is to find the best model with 50% of the FLOPs required by the largest model. LayerNAS performance stands out because it formulates the problem in a different way, separating the cost and reward to avoid searching a significant number of irrelevant model architectures. We found that models with fewer channels in earlier layers tend to yield better performance, which explains how LayerNAS discovers better models much faster than other algorithms, as it avoids spending time on models outside the desired cost range. Note that the accuracy curve drops slightly after searching longer, due to the lack of correlation between validation accuracy and test accuracy, i.e., some model architectures with higher validation accuracy have lower test accuracy in NATS-Bench size search.

We construct search spaces based on MobileNetV2, MobileNetV2 1.4x, MobileNetV3 Small, and MobileNetV3 Large, and search for an optimal model architecture under different #MAdds (number of multiply-accumulate operations per image) constraints. Among all settings, LayerNAS finds models with better accuracy on ImageNet. See the paper for details.

Comparison of models under different #MAdds.

## Conclusion

In this post, we showed how to reformulate NAS into a combinatorial optimization problem, and proposed LayerNAS as a solution that requires only polynomial search complexity. We compared LayerNAS with existing popular NAS algorithms and showed that it can discover improved models on NATS-Bench. We also used the method to find better architectures based on MobileNetV2 and MobileNetV3.

## Acknowledgements

*We would like to thank Jingyue Shen, Keshav Kumar, Dai Peng, Mingxing Tan, Esteban Real, Peter Yang, Weijun Wang, Qifei Wang, Suani Dong, Xin Wang, Yinjie Miao, Yun Long, Zhuo Wang, Da-Cheng Huan, Deqiang Chen, Fotis Iliopoulos, Han-Byul Kim, Rino Lee, Andrew Howard, Eric Wee, Reena Panigrahi, Ravi Kumar, and Andrew Tomkins for their input, collaboration, and advice.*