Modularity in Deep Learning: Tackling Challenges and Exploring Opportunities
Scaling laws have driven machine learning models to unprecedented sizes. As a result, full fine-tuning has become expensive, and cheaper alternatives are not always reliable. Modularity offers a way to address some of these persistent challenges. Here, we provide an overview of modularity in deep learning and its potential applications in transfer learning.
Introducing Modularity in Deep Learning
Modularity allows us to separate fundamental knowledge and reasoning abilities about language, vision, etc., from domain and task-specific capabilities, which may help address some of the outstanding challenges in deep learning. This approach also offers a versatile way to extend models to new settings and augment them with new abilities.
Taxonomy of Modular Approaches
Modular approaches can be classified based on four dimensions: computation function, routing function, aggregation function, and training setting. We delve into some of the methods under each category.
Computation Functions in Modules
A neural network can be defined as a composition of functions, each with its own set of parameters. Modules modify the model at one of three levels: its individual weights, its input, or a function's output. We detail these three types of computation functions along with examples.
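The three levels can be illustrated on a single linear layer. Below is a minimal sketch, assuming a base layer h = Wx; the names and shapes are illustrative, not taken from any specific library.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, rank = 4, 3, 2
W = rng.normal(size=(d_out, d_in))   # frozen base weights
x = rng.normal(size=d_in)

# 1) Weight-level: a low-rank delta added to the base weights (LoRA-style).
A = rng.normal(size=(d_out, rank))
B = rng.normal(size=(rank, d_in))
h_weight = (W + A @ B) @ x

# 2) Input-level: trainable prefix vectors concatenated to the input
# (prompt-tuning style); the layer then operates on a longer input.
prefix = rng.normal(size=2)
x_aug = np.concatenate([prefix, x])

# 3) Output-level: a small bottleneck adapter transforms the layer's output,
# with a residual connection back to the original output.
def adapter(h, down, up):
    return h + up @ np.tanh(down @ h)

down = rng.normal(size=(rank, d_out))
up = rng.normal(size=(d_out, rank))
h_func = adapter(W @ x, down, up)
```

In each case only the small module parameters (A, B, the prefix, or the adapter matrices) would be trained, while W stays frozen.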
Routing Functions for Module Activation
Routing functions determine which modules are active by assigning each module a score. Routing methods include fixed routing (pre-defined logic), hard learned routing (a binary selection of modules), and soft learned routing (a probability distribution over modules). Routing can also operate at different levels, from a single global decision to separate decisions per layer.
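The learned variants can be sketched as follows, assuming a linear router that scores each module; the module functions and router weights here are toy placeholders.

```python
import numpy as np

def soft_route(x, router_W, modules):
    # Soft learned routing: softmax over module scores, then a
    # probability-weighted mixture of all module outputs.
    scores = router_W @ x
    probs = np.exp(scores - scores.max())
    probs = probs / probs.sum()
    return sum(p * m(x) for p, m in zip(probs, modules))

def hard_route(x, router_W, modules):
    # Hard learned routing: select only the top-scoring module (top-1).
    return modules[int(np.argmax(router_W @ x))](x)

# Toy setup: two modules and a router that strongly prefers module 0.
modules = [lambda v: 2.0 * v, lambda v: -v]
router_W = np.array([[1.0, 0.0],
                     [0.0, 1.0]])
x = np.array([3.0, 0.0])

y_hard = hard_route(x, router_W, modules)   # exactly module 0's output
y_soft = soft_route(x, router_W, modules)   # mixture dominated by module 0
```

Fixed routing, by contrast, needs no scores at all: a pre-defined mapping (e.g., a dict from task name to module) selects the module directly.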
Aggregation Functions for Module Outputs
Aggregation functions determine how the outputs of the activated modules are combined. They can be categorized as parameter, representation, input, or function aggregation.
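Two of these strategies are easy to contrast directly. The sketch below, with arbitrary example weights, interpolates two linear modules either in representation space (combining their outputs) or in parameter space (combining their weights before a single forward pass).

```python
import numpy as np

rng = np.random.default_rng(1)
W1 = rng.normal(size=(3, 4))   # weights of module 1
W2 = rng.normal(size=(3, 4))   # weights of module 2
x = rng.normal(size=4)
alpha = 0.3                    # mixing coefficient

# Representation aggregation: weighted sum of the module outputs.
h_repr = alpha * (W1 @ x) + (1 - alpha) * (W2 @ x)

# Parameter aggregation: interpolate the weights, then apply once.
h_par = (alpha * W1 + (1 - alpha) * W2) @ x
```

For purely linear modules the two coincide; once the modules contain nonlinearities, the results differ, and the choice of aggregation function matters.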
Applications of Modularity in Transfer Learning
Modularity can improve performance in transfer learning, where knowledge from a source task is reused on a different target task. In modularity-based transfer learning, transfer can occur at different levels, such as module-level transfer (reusing whole modules) or parameter-level transfer (reusing and adapting individual parameters).
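Module-level transfer can be sketched as follows: a shared base model stays frozen, and each task gets its own small adapter module that is trained independently and swapped in at inference time. The task names and adapter shapes below are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)
d, rank = 3, 2
base_W = rng.normal(size=(d, d))   # shared base model, kept frozen

def make_adapter():
    # Hypothetical per-task module: bottleneck adapter parameters.
    return rng.normal(size=(rank, d)), rng.normal(size=(d, rank))

# One small module per task; only these would receive gradient updates.
adapters = {"source_task": make_adapter(), "target_task": make_adapter()}

def forward(x, task):
    h = base_W @ x                 # frozen computation, shared across tasks
    down, up = adapters[task]      # task-specific module, swapped by name
    return h + up @ np.tanh(down @ h)

x = rng.normal(size=d)
y_src = forward(x, "source_task")
y_tgt = forward(x, "target_task")
```

Because the base model is shared, adding a new task only requires training (and storing) one new adapter rather than a full copy of the model.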
Future Directions and Concluding Remarks
The field of modular deep learning is still in its infancy but has promising potential. With further advancements in modularity, we can expect better performance, faster training times, and greater adaptability for deep learning models.