KSM: Fast Multiple Task Adaption via Kernel-wise Soft Mask Learning
9 Citations · 27 References
Abstract:
Deep neural networks (DNNs) can forget knowledge about earlier tasks when learning new tasks, a phenomenon known as catastrophic forgetting. To learn a new task without forgetting, mask-based learning methods (e.g., Piggyback [10]) have recently been proposed; they learn only a binary element-wise mask while keeping the backbone model fixed. However, a binary mask has limited modeling capacity for new tasks. A more recent work [5] proposes a compress-grow-based method (CPG) that achieves better accuracy on new tasks by partially retraining the backbone model, but at an order of magnitude higher training cost, which makes it infeasible for state-of-the-art edge/mobile learning. The primary goal of this work is to simultaneously achieve fast and high-accuracy multi-task adaptation in the continual learning setting. Thus motivated, we propose a new training method called Kernel-wise Soft Mask (KSM), which learns a kernel-wise hybrid binary and real-valued soft mask for each task. Such a hybrid mask can be viewed as the superposition of a binary mask and a properly scaled real-valued tensor, which offers a richer representation capability without requiring low-level kernel support, meeting the objective of low hardware overhead. We validate KSM on multiple benchmark datasets against recent state-of-the-art methods (e.g., Piggyback, PackNet, CPG), showing clear improvements in both accuracy and training cost.
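To make the idea concrete, here is a minimal PyTorch sketch of a kernel-wise soft mask applied to a frozen convolution. The class name, the threshold, the tanh scaling, and the straight-through trick are illustrative assumptions about how a hybrid binary/real-valued mask could be realized, not the paper's reference implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class KernelWiseSoftMask(nn.Module):
    """Sketch of a kernel-wise soft mask over a frozen conv layer.
    Names and the exact mask decomposition are illustrative assumptions."""

    def __init__(self, conv: nn.Conv2d, threshold: float = 0.0, scale: float = 0.1):
        super().__init__()
        self.conv = conv
        for p in self.conv.parameters():
            p.requires_grad_(False)  # the backbone model stays fixed
        out_c, in_c = conv.weight.shape[:2]
        # One learnable score per kernel (not per weight element),
        # so the mask is far smaller than an element-wise mask.
        self.scores = nn.Parameter(0.01 * torch.ones(out_c, in_c, 1, 1))
        self.threshold = threshold
        self.scale = scale

    def forward(self, x):
        hard = (self.scores > self.threshold).float()
        # Straight-through estimator: the forward pass uses the hard
        # binary mask; gradients flow through the real-valued scores.
        binary = hard + self.scores - self.scores.detach()
        # Hybrid mask = binary part + properly scaled real-valued part.
        mask = binary + self.scale * torch.tanh(self.scores)
        masked_weight = self.conv.weight * mask  # broadcasts over the k x k kernel
        return F.conv2d(x, masked_weight, self.conv.bias,
                        self.conv.stride, self.conv.padding)

# Usage: wrap a frozen backbone convolution for a new task.
layer = KernelWiseSoftMask(nn.Conv2d(3, 16, kernel_size=3, padding=1))
out = layer(torch.randn(1, 3, 32, 32))
```

Only the per-kernel scores are trained per task, which is where the low training cost relative to retraining the backbone would come from.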
Related Papers
Explaining the behavior of deep neural networks, which are usually treated as black boxes, is critical as they are adopted across diverse aspects of human life. Leveraging interpretable machine learning (interpretable ML), this work proposes a novel tool called Catastrophic Forgetting Dissector (CFD) to explain catastrophic forgetting in continual learning settings. We also introduce a new method called Critical Freezing based on the observations of our tool. Experiments on ResNet articulate how catastrophic forgetting happens, in particular showing which components of this well-known network are forgetting. Our new continual learning algorithm outperforms various recent techniques by a significant margin, demonstrating the value of the investigation. Critical Freezing not only mitigates catastrophic forgetting but also provides explainability.
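The abstract does not specify which components Critical Freezing targets; as a hedged illustration, the sketch below freezes a hypothetical set of ResNet blocks (the block names are placeholders for whatever the dissection step identifies as critical) while leaving the rest trainable for the new task.

```python
import torch.nn as nn
from torchvision.models import resnet18

def critical_freeze(model: nn.Module, critical_blocks=("layer1", "layer2")):
    """Freeze the parameters of the named top-level blocks (placeholder
    choices here), leaving the remaining layers trainable."""
    for name, param in model.named_parameters():
        if name.split(".")[0] in critical_blocks:
            param.requires_grad_(False)
    return model

model = critical_freeze(resnet18())
trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"trainable parameters after freezing: {trainable}")
```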
Selective forgetting, i.e., removing specific information from deep neural networks (DNNs), is essential for continual learning and is challenging to control. Such forgetting is also crucial in practice, since deployed DNNs may have been trained on data containing outliers, poisoned by attackers, or containing leaked/sensitive information. In this paper, we formulate selective forgetting for classification tasks at a finer level than the sample level. We specify this finer level using four datasets distinguished by two conditions: whether they contain information to be forgotten, and whether they are available during the forgetting procedure. We further motivate this formulation with concrete, practical scenarios. Moreover, we cast the forgetting procedure as an optimization problem over three criteria: a forgetting term, a correction term, and a remembering term. Experimental results show that the proposed methods can make the model stop using specific information for classification. Notably, in certain cases our methods improve the model's accuracy on datasets that contain information to be forgotten but are unavailable during the forgetting procedure; such data are unexpectedly encountered and misclassified in real situations.
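The paper's exact objective is not reproduced here; the sketch below shows one plausible form of a three-term loss of the kind described, with hypothetical weighting coefficients (lam_f, lam_c, lam_r) and a uniform-prediction forgetting term chosen purely for illustration.

```python
import torch
import torch.nn.functional as F

def selective_forgetting_loss(model, forget_batch, correct_batch, remember_batch,
                              lam_f=1.0, lam_c=1.0, lam_r=1.0):
    """Hypothetical three-term objective: suppress use of forget data,
    fix labels on correction data, preserve accuracy on retained data."""
    xf, _ = forget_batch
    xc, yc = correct_batch
    xr, yr = remember_batch
    # Forgetting term: push predictions on forget data toward uniform.
    logits_f = model(xf)
    uniform = torch.full_like(logits_f, 1.0 / logits_f.size(1))
    loss_f = F.kl_div(F.log_softmax(logits_f, dim=1), uniform,
                      reduction="batchmean")
    # Correction term: standard cross-entropy on corrected labels.
    loss_c = F.cross_entropy(model(xc), yc)
    # Remembering term: keep performance on data to be retained.
    loss_r = F.cross_entropy(model(xr), yr)
    return lam_f * loss_f + lam_c * loss_c + lam_r * loss_r
```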
Chapter 5, on the work of forgetting, explores how memory has a certain need for forgetting. Especially cognizant of forgetfulness of self, the chapter engages forgetting forward (extension) and backward (distention) through images of Paul the runner, Lot's wife, and the eagle of forgetting. As in chapter 4 on the work of remembering, the "self" reemerges as constituted in the whole, uncovering its Christological identity in the practice of forgetting together. Forgetting complements remembering at the heart of the work of memory, or participation in the life of Christ through the pushes and pulls of distended temporal life. Chapter 5 is the second of a two-chapter binary: the work of remembering (chapter 4) and the work of forgetting.
This study analyses national ways of forgetting. Following the eminent British anthropologist Mary Douglas, I treat "forgetting" here as "selective remembering, misremembering and disremembering" (Douglas 2007: 13). The case study is the Israeli-Jewish forgetting of the uprooting of the Palestinians in the war of 1948. The paper discusses three facets of this collective forgetting. In the first subchapter, I analyze the foundations of the Israeli regime of forgetting and discern three mechanisms for removing selected events from memory: narrative forgetting, the formation and dissemination of a historical narrative; physical forgetting, the destruction of physical remains; and symbolic forgetting, the creation of a new symbolic geography of new places and street names. In the second subchapter, I look at the tenacious ambiguity within the regime of forgetting, as it does not completely erase all traces of the past. Finally, in the third subchapter, I discuss the growth of subversive memory and counter-memory, which at least indicates the possibility of a future revision of the Israeli regime of forgetting.
Various approaches to computer modelling of the assimilation and forgetting of educational information are considered. Using a multi-component model, the Ebbinghaus forgetting curve for poorly assimilated information memorized through repetition is reproduced. The model takes into account that during training weak knowledge is converted into strong (firm) knowledge, while during forgetting the reverse transition from strong to weak knowledge occurs. A model of the assimilation and forgetting of highly interlinked educational material, consisting of information blocks of connected concepts, is also constructed. It explains two observations: 1) during training there is a sharp increase in the level of understanding of the studied problem; and 2) after training ends, the pupil's knowledge level remains high for some time and then slowly declines as individual elements of the learning material are gradually forgotten. The paper shows that the processes of assimilation and forgetting follow a logistic law. In addition, a simulation model of school training is proposed that divides knowledge into three categories and distributes the educational information across classes. For all cases, graphs of the dependence of the knowledge level on time are given.
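As a hedged illustration of the weak/strong knowledge dynamics described above, the following sketch integrates a hypothetical two-component model (all rate constants are invented for illustration): material is assimilated into a weak component during study, consolidates into a strong component, and is forgotten much faster from the weak component once study ends.

```python
import numpy as np

def simulate_knowledge(t_study=10.0, t_total=60.0, dt=0.01,
                       a=0.4, c=0.25, fw=0.15, fs=0.01):
    """Euler integration of a hypothetical two-component model:
    W = weak knowledge, S = strong (firm) knowledge, level = W + S.
    While studying (t < t_study), assimilation into W is proportional to
    what is not yet known (a logistic-style saturation) and W consolidates
    into S; weak knowledge is forgotten much faster than strong (fw >> fs)."""
    n = int(t_total / dt)
    t = np.arange(n) * dt
    W = np.zeros(n)
    S = np.zeros(n)
    for i in range(1, n):
        assimilation = a * (1.0 - W[i-1] - S[i-1]) if t[i] < t_study else 0.0
        dW = assimilation - c * W[i-1] - fw * W[i-1]
        dS = c * W[i-1] - fs * S[i-1]
        W[i] = W[i-1] + dt * dW
        S[i] = S[i-1] + dt * dS
    return t, W + S

t, level = simulate_knowledge()
print(f"level at end of study (t=10): {level[1000]:.2f}; at t=60: {level[-1]:.2f}")
```

Run as written, the curve rises sharply during study, stays high for a while after study ends, then slowly declines, which is the qualitative behaviour the abstract describes.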
Fifty subjects were asked to keep a diary of instances in which they realized they had forgotten something. The 750 forgettings recorded were grouped on the basis of nominal similarity, with 64% falling into one of 24 categories. The major categories included forgetting to comply with requests, failures of habitual actions, absentmindedness, and forgetting to bring something. Most failures involved forgetting to perform a future action (i.e., forgetting to do something) rather than forgetting facts, names, or other information once known. These results suggest that failure to retrieve the intention to do something at the appropriate time is the source of many forgettings, a finding that may have implications for the construction of memory inventories.
Intentional forgetting refers to the attempt to marshal top-down control to purposefully forget, and has been demonstrated in the laboratory using directed forgetting paradigms. Here we asked whether the mechanisms of top-down control can run in the opposite direction to prevent the forgetting of information: that is, can we actively resist unintentional forgetting? Recognition-induced forgetting is an unintentional forgetting effect in which accessing one memory leads to the forgetting of related memories. We showed subjects a ten-minute video explaining the recognition-induced forgetting paradigm and how recognition of certain objects unintentionally leads to forgetting of semantically related objects. After testing their comprehension of the video, we conducted a typical recognition-induced forgetting experiment and challenged the subjects to resist this form of unintentional forgetting. Despite their knowledge of the forgetting effect and the explicit challenge to resist it, recognition-induced forgetting persisted. A minority of subjects were able to resist the effect, but this resistance was not enough to eliminate it when averaging across subjects. These results show that knowledge of this unintentional forgetting phenomenon, together with the challenge to resist it, does not eliminate the effect, suggesting that it is cognitively impenetrable.