As an important branch of Natural Language Processing (NLP), text classification has long been limited by the difficulty of extracting useful text information and effective long-range associations. Thanks to the sustained effort of deep learning researchers, deep Convolutional Neural Networks (CNNs) have achieved remarkable results in Computer Vision, but their value in NLP tasks remains controversial. In this paper, we propose a novel deep CNN named Deep Pyramid Temporal Convolutional Network (DPTCN) for short text classification, which mainly consists of a concatenated embedding layer, causal convolution, 1/2 max-pooling down-sampling, and residual blocks. Our work was strongly inspired by two well-designed models: the temporal convolutional network for sequence modeling and the deep pyramid CNN for text categorization. Their applicability and pertinence showed us how to build a model for a specific domain. In the experiments, we evaluate the proposed model on 7 datasets with 6 models and analyze the impact of three different embedding methods. The results show that our work is a promising attempt to apply a word-level deep convolutional network to short text classification.
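The two building blocks named in the abstract above, causal convolution and 1/2 max-pooling down-sampling, can be illustrated with a minimal pure-Python sketch. It is not the DPTCN implementation: the function names, the single-channel input, the pooling window of 2, and the toy kernel are all assumptions made for illustration.

```python
from typing import List


def causal_conv1d(x: List[float], kernel: List[float]) -> List[float]:
    """Causal 1-D convolution: y[t] = sum_k kernel[k] * x[t - k].

    Positions before the start of the sequence are treated as zero, so the
    output at step t never depends on inputs from steps later than t.
    """
    out = []
    for t in range(len(x)):
        acc = 0.0
        for k, w in enumerate(kernel):
            if t - k >= 0:
                acc += w * x[t - k]
        out.append(acc)
    return out


def max_pool_half(x: List[float], window: int = 2) -> List[float]:
    """Stride-2 max pooling, which roughly halves the sequence length.

    The window size is an assumption; only the stride of 2 is implied by
    the "1/2" down-sampling described in the abstract.
    """
    return [max(x[i:i + window]) for i in range(0, len(x), 2)]


signal = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
features = causal_conv1d(signal, kernel=[0.5, 0.5])  # average of x[t] and x[t-1]
pooled = max_pool_half(features)                     # 6 steps -> 3 steps
```

Stacking such a pair repeatedly halves the sequence at every stage, which is what gives a pyramid-shaped network its name.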
The separation of data management from data ownership makes it difficult to protect data security and privacy in cloud storage systems, and traditional encryption technologies are not well suited to data protection in this setting. A novel multi-authority proxy re-encryption mechanism based on ciphertext-policy attribute-based encryption (MPRE-CPABE) is proposed for cloud storage systems. MPRE-CPABE requires the data owner to split each file into two blocks, one big and one small. The small block serves as the private key used to encrypt the big block, and the encrypted big block is then uploaded to the cloud storage system. Even if the uploaded big block is stolen, illegal users cannot easily obtain the complete information of the file. Ciphertext-policy attribute-based encryption (CPABE) is often criticized for its heavy overhead and its security issues when distributing keys or revoking users' access rights. MPRE-CPABE applies CPABE to the multi-authority cloud storage system and addresses these issues. The weighted access structure (WAS) is proposed to support a variety of fine-grained threshold access control policies in multi-authority environments and to reduce the computational cost of key distribution. Meanwhile, MPRE-CPABE uses proxy re-encryption to reduce the computational cost of access revocation. Experiments are conducted on the Ubuntu and CloudSim platforms. Experimental results show that MPRE-CPABE can greatly reduce the computational cost of generating key components and of revoking users' access rights. MPRE-CPABE is also proved secure under the decisional bilinear Diffie-Hellman (DBDH) assumption.
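To make the idea of a weighted threshold policy concrete, the sketch below checks whether a user's attributes carry enough total weight to satisfy a policy. It is a plaintext illustration only: in a CP-ABE scheme the policy is enforced cryptographically, and the attribute names, weights, and threshold here are hypothetical values, not taken from the abstract above.

```python
from typing import Dict, Set


def satisfies_weighted_threshold(
    user_attrs: Set[str], weights: Dict[str, int], threshold: int
) -> bool:
    """Return True when the weights of the attributes the user holds
    sum to at least the policy threshold."""
    total = sum(w for attr, w in weights.items() if attr in user_attrs)
    return total >= threshold


# Hypothetical policy: each attribute contributes a weight toward access.
policy_weights = {"doctor": 3, "nurse": 1, "cardiology": 2, "auditor": 1}
policy_threshold = 4

# doctor (3) + cardiology (2) = 5, which meets the threshold of 4.
doctor_ok = satisfies_weighted_threshold(
    {"doctor", "cardiology"}, policy_weights, policy_threshold
)
# nurse (1) + cardiology (2) = 3, which falls short of 4.
nurse_ok = satisfies_weighted_threshold(
    {"nurse", "cardiology"}, policy_weights, policy_threshold
)
```

Assigning weights lets one threshold express several policies at once, since a single heavy attribute can stand in for several light ones.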
In a membership inference attack, the attacker aims to infer whether a given data sample is in the training dataset of the target classifier. The ability of an adversary to ascertain the presence of an individual constitutes an obvious privacy threat if the dataset relates to a group of users that share a sensitive characteristic. Many defense methods have been proposed against membership inference attacks, but they have not achieved the expected privacy effect. In this paper, we quantify the impact of these defense choices on privacy in experiments using logistic regression and neural network models. Using both formal and empirical analyses, we illustrate that differential privacy and L2 regularization can effectively defend against membership inference attacks.
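A common baseline form of the attack described above guesses "member" whenever the model is very confident on a sample's true label, since models tend to be more confident on data they were trained on. The sketch below is an illustration under that assumption, not the attack evaluated in the paper; the threshold and the confidence values are made-up toy inputs, not experimental results.

```python
from typing import List


def infer_membership(confidences: List[float], threshold: float = 0.9) -> List[bool]:
    """Guess that a sample was in the training set when the model's
    confidence on its true label reaches the threshold."""
    return [c >= threshold for c in confidences]


def attack_accuracy(guesses: List[bool], truth: List[bool]) -> float:
    """Fraction of samples whose membership was guessed correctly."""
    correct = sum(g == t for g, t in zip(guesses, truth))
    return correct / len(truth)


# Toy confidences on the true label (illustrative values only).
train_conf = [0.99, 0.97, 0.95, 0.85]  # samples that were in the training set
test_conf = [0.92, 0.70, 0.60, 0.55]   # samples that were not

guesses = infer_membership(train_conf + test_conf)
truth = [True] * len(train_conf) + [False] * len(test_conf)
acc = attack_accuracy(guesses, truth)
```

Defenses such as L2 regularization aim to shrink the gap between training and test confidence, which pushes the accuracy of this kind of attack toward the 0.5 of random guessing on a balanced set.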
Federated deep learning has been widely used in various fields. To protect data privacy, many privacy-preserving approaches have been designed and implemented in various scenarios. However, existing works rarely consider a fundamental issue: the data shared by certain users (called irregular users) may be of low quality. In a federated training process, data shared by many irregular users may impair the training accuracy or, worse, render the final model useless. In this article, we propose PPFDL, a Privacy-Preserving Federated Deep Learning framework with irregular users. Specifically, we design a novel solution to reduce the negative impact of irregular users on the training accuracy, which guarantees that the training results are mainly calculated from the contribution of high-quality data. Meanwhile, we exploit Yao's garbled circuits and additively homomorphic cryptosystems to ensure the confidentiality of all user-related information. Moreover, PPFDL is robust to users dropping out during the whole implementation: each user can go offline at any subprocess of training, as long as the remaining online users can still complete the training task. Extensive experiments demonstrate the superior performance of PPFDL in terms of training accuracy, computation, and communication overheads.
Medication combination recommendation is critical in clinical practice, since accurately predicting therapeutic drugs can provide essential decision support to physicians. However, current approaches do not consider the multilevel structure of electronic health record (EHR) data or the hierarchical dependencies between multiple visits, leading to suboptimal recommendations. To address these limitations, we propose a novel hierarchical feedback interaction network (HIFINet) that uses an examination-diagnosis-treatment hierarchical network to model the inherent multilevel structure of EHR data. The feedback long short-term memory network, called FeLSTM, is the basic unit of our hierarchical network; it performs hierarchical interactions and leverages change information as feedback to propagate forward among different levels. HIFINet contains four modules. First, an embedding module is designed to learn the health information representation of patients. Second, a three-layer time-series learning module is employed to capture temporal dependencies within each sequence. Next, a differential feedback interaction module is developed to capture the difference features between visits. Finally, an attention fusion module is used to learn a comprehensive representation of the patient's health information and to recommend the multiple medications for the next treatment. HIFINet is compared with state-of-the-art approaches on a real-world dataset. The results indicate that HIFINet outperforms other approaches, offering more accurate recommendations.
The widespread use of edge devices in E-Health, such as smartphones and wearables, means that richer electronic health records (EHR) are becoming available. Training deep learning models on these data can effectively improve the quality of healthcare services. Recently, federated learning (FL) has received extensive attention in E-Health because it can train a model by sharing only gradients, without disclosing the owners' original EHR. Even in this case, however, an adversary can still violate EHR owners' privacy based on the shared gradients. To mitigate this privacy threat, several privacy-preserving FL protocols have been proposed using different cryptographic techniques. Unfortunately, existing privacy-preserving FL schemes do not take into account irrelevant updates, which are useless for the convergence of the global model. This may reduce the predictive accuracy or, worse, render the final model useless. In this paper, we propose PFL-IU, an efficient and privacy-preserving FL framework that is compatible with irrelevant updates. Specifically, we first design a communication-efficient secure aggregation protocol by using a non-interactive key generation algorithm. We then present a sign method to mitigate the negative impact of irrelevant updates, which accelerates model convergence and improves predictive accuracy. Moreover, PFL-IU is robust to EHR owners' dropout during the whole training phase. Extensive experiments using a real-world dataset demonstrate that PFL-IU achieves better performance in terms of accuracy, convergence, and efficiency.
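The abstract above does not spell out how the sign method works. One plausible reading, sketched below purely as an assumption, is to compare the sign of each coordinate of a local update with a reference direction (for example, the previous global update) and to aggregate only the updates that agree often enough. The function names, the agreement threshold, and the toy vectors are hypothetical, and the sketch works on plaintext values, whereas PFL-IU operates under secure aggregation.

```python
from typing import List


def sign(v: float) -> int:
    """Return -1, 0, or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def sign_agreement(update: List[float], reference: List[float]) -> float:
    """Fraction of coordinates whose sign matches the reference direction."""
    matches = sum(1 for u, r in zip(update, reference) if sign(u) == sign(r))
    return matches / len(reference)


def filter_relevant(
    updates: List[List[float]], reference: List[float], min_agreement: float = 0.5
) -> List[List[float]]:
    """Keep only the updates that agree with the reference often enough."""
    return [u for u in updates if sign_agreement(u, reference) >= min_agreement]


def average(updates: List[List[float]]) -> List[float]:
    """Coordinate-wise mean of the kept updates."""
    if not updates:
        raise ValueError("no relevant updates to aggregate")
    return [sum(col) / len(updates) for col in zip(*updates)]


reference = [1.0, -1.0, 1.0, -1.0]
updates = [
    [0.5, -0.5, 1.0, -1.0],   # 4 of 4 signs agree: kept
    [1.0, -1.0, -1.0, -1.0],  # 3 of 4 signs agree: kept
    [-1.0, 1.0, -1.0, 1.0],   # 0 of 4 signs agree: treated as irrelevant
]
kept = filter_relevant(updates, reference)
aggregated = average(kept)
```

Comparing signs rather than full values keeps the relevance check cheap, which matters when every extra bit must pass through a cryptographic protocol.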
Unsupervised domain adaptation aims to use labeled data from the source domain to annotate target-domain data, which has no labels. Existing work uses Siamese network-based models to minimize the domain discrepancy and thereby learn a domain-invariant feature. Aligning the second-order statistics (covariances) of the source and target distributions has proven to be an effective method. Previous papers use Euclidean methods or geodesic (log-Euclidean) methods to measure the distance. However, covariances lie on a Riemannian manifold, and neither method can accurately calculate the Riemannian distance, so neither aligns the distributions well. To tackle the distribution alignment problem, this paper proposes mapped correlation alignment (MCA), a novel technique for end-to-end domain adaptation with deep neural networks. This method maps covariances from the Riemannian manifold to a reproducing kernel Hilbert space, uses Gaussian radial basis function-based positive definite kernels on manifolds to calculate the inner product in that space, and then uses the Euclidean metric there to measure the distance accurately and align the distributions better. This paper builds an end-to-end model that minimizes both the classification loss and the MCA loss. The model can be trained efficiently using back-propagation. Experiments show that the MCA method yields state-of-the-art results on standard domain adaptation data sets.
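The step from a kernel to a distance follows from the standard kernel identity, sketched below. The symbols are assumptions introduced for illustration: $C_s$ and $C_t$ denote the source and target covariances, $\phi$ the implicit feature map into the reproducing kernel Hilbert space $\mathcal{H}$, $d$ a distance on the manifold for which the Gaussian kernel is positive definite, and $\sigma$ the kernel bandwidth. The abstract above does not state which $d$ is used.

```latex
k(C_s, C_t) = \exp\!\left( -\frac{d^2(C_s, C_t)}{2\sigma^2} \right)

\lVert \phi(C_s) - \phi(C_t) \rVert_{\mathcal{H}}^2
  = k(C_s, C_s) - 2\,k(C_s, C_t) + k(C_t, C_t)
  = 2 - 2\,k(C_s, C_t)
```

The last equality uses $k(C, C) = 1$ for a Gaussian kernel, so the distance in $\mathcal{H}$ is computed from kernel evaluations alone, without ever forming $\phi$ explicitly.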
Pidgin is a special kind of language variation. Historically, Chinese pidgin has passed through four stages: Chinese-Portuguese pidgin, Cantonese-English pidgin, Shanghai-English pidgin, and modern Chinese-English pidgin. As one of the most important components of modern Chinese-English pidgin, Chinese cyber-pidgin has become increasingly popular in recent years. Sociolinguistically, Chinese-Portuguese pidgin, Cantonese-English pidgin, and Shanghai-English pidgin resulted from the political, military, and economic invasion of the imperialists, and are therefore viewed as colonial language varieties. Chinese cyber-pidgin differs from the pidgins of those historical stages in that it results from foreign cultural penetration and from the emphasis that Chinese educational and personnel departments place on English language proficiency. It therefore acts, in fact, as the acculturation model of pidgin. At present, the majority of the members of the Chinese cyber-pidgin speech community are Chinese, so most of the words or phrases in a sentence are Chinese, mixed with some loan words, most of which are derived from English. Morphologically, the loan words used in Chinese cyber-pidgin can be classified into two kinds, content-morpheme words and allomorph words, to which word-forming approaches such as blending, clipping, abbreviation, prefix-word blending, number or number-letter blending, partial tones, or even drawing have been applied. The invention and popularization of Chinese cyber-pidgin are found to rest on a combination of motivations: psychological, expressive, logical, rhetorical, aesthetic, and regional.
In addition, Chinese cyber-pidgin has a deep and powerful influence on the invention and development of Chinese net cultural neology. The different approaches to allomorphs in the Chinese cyber-pidgin system have been employed so widely that an increasing number of lexical, phonological, and syntactic variations are found in Chinese net cultural neology.