In the era of big data, data-driven classification has become an essential method in smart manufacturing to guide production and optimize inspection. The industrial data obtained in practice are usually time series collected by soft sensors, and they are highly nonlinear, nonstationary, imbalanced, and noisy. Most existing soft-sensing machine learning models focus on capturing either intra-series temporal dependencies or pre-defined inter-series correlations, while ignoring the correlations between labels, even though each instance is associated with multiple labels simultaneously. In this paper, we propose a novel graph-based soft-sensing neural network (GraSSNet) for multivariate time-series classification of noisy and highly imbalanced soft-sensing data. The proposed GraSSNet is able to 1) capture the inter-series and intra-series dependencies jointly in the spectral domain; 2) exploit the label correlations by superimposing a label graph built from statistical co-occurrence information; 3) learn features with an attention mechanism from both the textual and numerical domains; and 4) leverage unlabeled data and mitigate data imbalance through semi-supervised learning. Comparative studies with other commonly used classifiers are carried out on Seagate soft-sensing data, and the experimental results validate the competitive performance of our proposed method.
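As an illustration of point 2), the following is a minimal sketch of one common way to build a label graph from co-occurrence statistics: count how often pairs of labels appear together, convert the counts to conditional probabilities, and threshold them into an adjacency matrix. The abstract does not specify the construction used in GraSSNet, so the function name `label_cooccurrence_graph`, the `threshold` value, and the toy label matrix are assumptions made for the example.

```python
import numpy as np

def label_cooccurrence_graph(Y, threshold=0.4):
    """Build a label graph from a binary label matrix Y (samples x labels).

    Hypothetical helper: the threshold of 0.4 is an illustrative choice,
    not a value taken from the paper.
    """
    Y = np.asarray(Y, dtype=float)
    counts = Y.T @ Y                      # counts[i, j] = samples with both label i and label j
    occurrences = np.diag(counts).copy()  # occurrences[i] = samples with label i
    safe = np.where(occurrences > 0, occurrences, 1.0)  # avoid division by zero
    cond_prob = counts / safe[:, None]    # cond_prob[i, j] = P(label j | label i)
    adjacency = (cond_prob >= threshold).astype(float)
    np.fill_diagonal(adjacency, 1.0)      # keep self-loops
    return cond_prob, adjacency

# Toy data: 4 instances, 3 labels (made up for the example).
Y = np.array([
    [1, 1, 0],
    [1, 0, 0],
    [1, 1, 0],
    [0, 0, 1],
])
cond_prob, adjacency = label_cooccurrence_graph(Y)
print(adjacency)
```

Note that the conditional probabilities are not symmetric (here P(label 1 | label 0) is 2/3 while P(label 0 | label 1) is 1), so the resulting graph is directed in general.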
ABSTRACT BACKGROUND Although the advent of two FDA-approved therapies for idiopathic pulmonary fibrosis (IPF) has energized the field, their effects are largely suppressive rather than pulmonary fibrosis remission- or reversion-inducing. Hence, the pursuit of newer IPF therapeutics continues. Recent studies show that joint analysis of systems biology level information with drug-disease connectivity is effective in the discovery of biologically relevant candidate therapeutics. METHODS Publicly available gene expression signatures from IPF patients are used to query a large-scale perturbagen signature library to discover compounds that can potentially reverse dysregulated gene expression in IPF. Two methods are used to calculate IPF-compound connectivity: gene expression-based connectivity and feature-based connectivity. Identified compounds are further prioritized based on their shared mechanism(s) of action. RESULTS We identified 77 compounds as potential candidate therapeutics for IPF. Of these, 39 compounds are either FDA-approved for other diseases or are currently in phase 2/3 clinical trials, suggesting their repurposing potential for IPF. Among these compounds are multiple receptor kinase inhibitors (e.g., nintedanib, currently approved for IPF, and sunitinib), an aurora kinase inhibitor (barasertib), EGFR inhibitors (erlotinib, gefitinib), a calcium channel blocker (verapamil), phosphodiesterase inhibitors (roflumilast, sildenafil), PPAR agonists (pioglitazone), HDAC inhibitors (entinostat), and opioid receptor antagonists (nalbuphine). As a proof of concept, we performed in vitro validations with verapamil using IPF lung fibroblasts and show its potential benefits in pulmonary fibrosis. CONCLUSIONS Since about half of the candidates discovered in this study are either FDA-approved or are currently in clinical trials for other diseases, rapid translation of these compounds as potential IPF therapeutics is feasible.
Further, the generalizable, integrative connectivity analysis framework in this study can be readily adapted in early phase drug discovery for other common and rare diseases with transcriptomic profiles.
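To make the idea of gene expression-based connectivity concrete, here is a minimal sketch of a sign-based reversal score: a compound scores negatively when it lowers genes that are up-regulated in the disease and raises genes that are down-regulated. This is a simplified stand-in, not the scoring method used in the study, and the function name `connectivity_score`, the placeholder gene names, and all numeric values are assumptions made for the example.

```python
from statistics import mean

def connectivity_score(disease_up, disease_down, compound_profile):
    """Hypothetical reversal score.

    compound_profile maps gene -> differential expression under the compound.
    A negative score means the compound tends to reverse the disease signature.
    """
    up = [compound_profile[g] for g in disease_up if g in compound_profile]
    down = [compound_profile[g] for g in disease_down if g in compound_profile]
    if not up or not down:
        return 0.0  # not enough overlap to score
    return mean(up) - mean(down)

# Placeholder signature and profiles (illustrative only, not real data).
disease_up = {"GENE_A", "GENE_B"}
disease_down = {"GENE_C", "GENE_D"}
profiles = {
    "compound_x": {"GENE_A": -2.0, "GENE_B": -1.0, "GENE_C": 1.5, "GENE_D": 0.5},
    "compound_y": {"GENE_A": 1.0, "GENE_B": 2.0, "GENE_C": -1.0, "GENE_D": -1.0},
}
scores = {name: connectivity_score(disease_up, disease_down, p)
          for name, p in profiles.items()}
ranked = sorted(scores, key=scores.get)  # most negative (strongest reversal) first
print(ranked, scores)
```

In this toy case `compound_x` ranks first because its profile opposes the disease signature, while `compound_y` mimics it.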
The growing availability of data collected from smart manufacturing is changing the paradigms of production monitoring and control. The increasing complexity and content of the wafer manufacturing process, together with time-varying unexpected disturbances and uncertainties, make it infeasible to control the process with model-based approaches. As a result, data-driven soft-sensing modeling has become more prevalent in wafer process diagnostics. Recently, deep learning has been utilized in soft-sensing systems with promising performance on highly nonlinear and dynamic time-series data. Despite its successes in soft-sensing systems, however, the underlying logic of the deep learning framework is hard to understand. In this paper, we propose a deep learning-based model for defective wafer detection using a highly imbalanced dataset. To understand how the proposed model works, a deep visualization approach is applied. The model is then fine-tuned under the guidance of the deep visualization. Extensive experiments are performed to validate the effectiveness of the proposed system. The results provide an interpretation of how the model works and an instructive fine-tuning method based on the interpretation.
Over the last few decades, modern industrial processes have investigated several cost-effective methodologies to improve the productivity and yield of semiconductor manufacturing. While playing an essential role in facilitating real-time monitoring and control, data-driven soft sensors in industry have provided a competitive edge when augmented with deep learning approaches for wafer fault diagnostics. Despite the success of deep learning methods across various domains, they tend to perform poorly on multivariate soft-sensing data. To mitigate this, we propose a soft-sensing ConFormer (CONvolutional transFORMER) for the wafer fault-diagnostic classification task. It primarily consists of multi-head convolution modules that reap the benefits of fast and lightweight convolution operations, along with the ability to learn robust representations through a multi-head design like that of transformers. Another key issue is that traditional learning paradigms tend to suffer from low performance on noisy and highly imbalanced soft-sensing data. To address this, we augment our soft-sensing ConFormer model with a curriculum learning-based loss function, which effectively learns easy samples in the early phase of training and difficult ones later. To further demonstrate the utility of our proposed architecture, we performed extensive experiments on various toolsets of Seagate Technology's wafer manufacturing process which are shared openly along with this work. To the best of our knowledge, this is the first time that a curriculum learning-based soft-sensing ConFormer architecture has been proposed for soft-sensing data, and our results show strong promise for future use in the soft-sensing research domain.
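The curriculum idea described above (easy samples first, difficult ones later) can be sketched as a loss that keeps only the lowest-loss samples early in training and gradually admits the rest. This is a generic pacing scheme written for illustration, not the loss function proposed in the paper; the function name `curriculum_weighted_loss`, the `start_fraction` parameter, and the toy predictions are assumptions.

```python
import numpy as np

def curriculum_weighted_loss(probs, labels, epoch, total_epochs, start_fraction=0.5):
    """Cross-entropy over the easiest samples, with the kept fraction growing per epoch.

    Hypothetical helper: linear pacing from start_fraction to 1.0 is an
    illustrative choice.
    """
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels, dtype=int)
    eps = 1e-12
    # Per-sample cross-entropy: low loss means an "easy" sample.
    per_sample = -np.log(probs[np.arange(len(labels)), labels] + eps)
    progress = min(1.0, epoch / max(1, total_epochs - 1))
    fraction = start_fraction + (1.0 - start_fraction) * progress
    keep = max(1, int(np.ceil(fraction * len(labels))))
    easiest = np.argsort(per_sample)[:keep]
    weights = np.zeros_like(per_sample)
    weights[easiest] = 1.0
    loss = float((weights * per_sample).sum() / weights.sum())
    return loss, weights

# Toy predictions for 4 samples and 2 classes; the true class is 0 for all (made up).
probs = [[0.9, 0.1], [0.8, 0.2], [0.4, 0.6], [0.3, 0.7]]
labels = [0, 0, 0, 0]
loss_early, w_early = curriculum_weighted_loss(probs, labels, epoch=0, total_epochs=5)
loss_late, w_late = curriculum_weighted_loss(probs, labels, epoch=4, total_epochs=5)
print(w_early, w_late)
```

At epoch 0 only the two confidently classified samples contribute to the loss; by the final epoch all four do, so the harder samples influence training only in the later phase.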
Background: There are two US Food and Drug Administration (FDA)-approved drugs, pirfenidone and nintedanib, for treatment of patients with idiopathic pulmonary fibrosis (IPF). However, neither of these drugs provides a cure. In addition, both are associated with several drug-related adverse events. Hence, the pursuit of newer IPF therapeutics continues. Recent studies show that joint analysis of systems-biology-level information with drug–disease connectivity is effective in the discovery of biologically relevant candidate therapeutics. Methods: Publicly available gene expression signatures from patients with IPF were used to query a large-scale perturbagen signature library to discover compounds that can potentially reverse dysregulated gene expression in IPF. Two methods were used to calculate IPF–compound connectivity: gene expression-based connectivity and feature-based connectivity. Identified compounds were further prioritized if their shared mechanism(s) of action were IPF-related. Results: We identified 77 compounds as potential candidate therapeutics for IPF. Of these, 39 compounds are either FDA-approved for other diseases or are currently in phase II/III clinical trials, suggesting their repurposing potential for IPF. Among these compounds are multiple receptor kinase inhibitors (e.g. nintedanib, currently approved for IPF, and sunitinib), an aurora kinase inhibitor (barasertib), epidermal growth factor receptor inhibitors (erlotinib, gefitinib), a calcium channel blocker (verapamil), phosphodiesterase inhibitors (roflumilast, sildenafil), PPAR agonists (pioglitazone), histone deacetylase inhibitors (entinostat), and opioid receptor antagonists (nalbuphine). As a proof of concept, we performed in vitro validations with verapamil using IPF lung fibroblasts and show its potential benefits in pulmonary fibrosis.
Conclusions: As about half of the candidates discovered in this study are either FDA-approved or are currently in clinical trials for other diseases, rapid translation of these compounds as potential IPF therapeutics is possible. Further, the integrative connectivity analysis framework in this study can be adapted in early phase drug discovery for other common and rare diseases with transcriptomic profiles. The reviews of this paper are available via the supplemental material section.
Efforts to maximize the indication potential and revenue of drugs that are already marketed are largely motivated by what Sir James Black, a Nobel Prize-winning pharmacologist, advocated: “The most fruitful basis for the discovery of a new drug is to start with an old drug”. However, rational design of drug mixtures poses formidable challenges because information about in vivo cell regulation, mechanisms of genetic pathway activation, and in vivo pathway interactions is lacking or limited. Hence, most of the successfully repositioned drugs are the result of “serendipity”, arising from unexpected but beneficial findings during late-phase clinical studies. The connections between drug candidates and their potential adverse drug reactions or new applications are often difficult to foresee because the underlying mechanism associating them is largely unknown, complex, or dispersed and buried in silos of information. Discovery of such multi-domain pharmacomodules (pharmacologically relevant sub-networks of biomolecules and/or pathways) from collections of databases by independent or simultaneous mining of multiple datasets is an active area of research. Here, while presenting some of the promising bioinformatics approaches and pipelines, we summarize and discuss the current and evolving landscape of computational drug repositioning.
IEEE BigData 2021 Cup: Soft Sensing at Scale is a data mining competition organized by Seagate Technology, in association with the IEEE BigData 2021 conference. The scope of this challenge is to tackle the task of classifying soft-sensing data with machine learning techniques. In this paper, we go into the details of the challenge and describe the data set provided to participants. We define the metrics of interest, present baseline models, and describe approaches we found meaningful, which may be a good starting point for further analysis. We discuss the results obtained with our approaches and give insights into the potential challenges participants may run into. Students, researchers, and anyone interested in working on a major industrial problem are welcome to participate in the challenge!