This paper presents a para-virtualized network model that optimizes inter-VM communication performance; it is designed as a shared-memory channel that crosses VM boundaries to reduce data copies in virtualization systems. Implemented on the Kernel-based Virtual Machine (KVM) with para-virtualized I/O device emulation, the model simplifies I/O operations and reduces the traps that incur switches between KVM's root and non-root modes. It provides an efficient I/O path; the communication channel is transparent to guest applications, while the front-end driver exposes a standard kernel interface. This method achieves performance close to that of inter-process communication. Test results on a prototype show that both the throughput and latency of inter-VM communication are improved.
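As a rough illustration of the shared-memory channel idea (not the paper's KVM implementation, which maps pages across VM boundaries inside the hypervisor), the Python sketch below shows the core copy-reducing pattern: two endpoints exchange data through a ring buffer in a single shared region, so no intermediate buffer copies are needed. All names and the capacity are illustrative assumptions.

# Minimal sketch of a shared-memory ring buffer, the core pattern behind
# copy-reducing inter-VM channels. Illustrative only: a real inter-VM channel
# shares pages via the hypervisor, not via this module.
from multiprocessing import shared_memory
import struct

HDR = struct.Struct("II")           # (head, tail) indices
CAP = 4096                          # payload capacity in bytes

def create_channel(name):
    """Producer side: allocate the shared region once."""
    shm = shared_memory.SharedMemory(name=name, create=True,
                                     size=HDR.size + CAP)
    HDR.pack_into(shm.buf, 0, 0, 0)  # head = tail = 0
    return shm

def send(shm, data: bytes):
    """Write the payload straight into the shared ring."""
    head, tail = HDR.unpack_from(shm.buf, 0)
    if (tail + len(data)) - head > CAP:
        raise BufferError("ring full")
    for i, b in enumerate(data):     # byte-wise for wrap-around simplicity
        shm.buf[HDR.size + (tail + i) % CAP] = b
    HDR.pack_into(shm.buf, 0, head, tail + len(data))

def recv(shm, n: int) -> bytes:
    """Consumer side: read up to n bytes directly out of the shared region."""
    head, tail = HDR.unpack_from(shm.buf, 0)
    n = min(n, tail - head)
    out = bytes(shm.buf[HDR.size + (head + i) % CAP] for i in range(n))
    HDR.pack_into(shm.buf, 0, head + n, tail)
    return out

if __name__ == "__main__":
    ch = create_channel("demo_ring")
    send(ch, b"hello guest")
    print(recv(ch, 64))              # b'hello guest'
    ch.close(); ch.unlink()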
With the fast development of AI technologies, deep learning is widely applied to biomedical data analytics and digital healthcare. However, gaps remain between AI-aided diagnosis and real-world healthcare demands. For example, hemodynamic parameters of the middle cerebral artery (MCA) have significant clinical value for diagnosing adverse perinatal outcomes, yet the current measurement procedure is tedious for sonographers. To reduce sonographers' workload, we propose MCAS-GP, a deep-learning-empowered framework that tackles Middle Cerebral Artery Segmentation and Gate Proposition. MCAS-GP automatically segments the MCA region and detects the corresponding position of the gate in the procedure of fetal MCA Doppler assessment. In MCAS-GP, a novel learnable atrous spatial pyramid pooling (LASPP) module is designed to adaptively learn multi-scale features. We also propose a novel evaluation metric, the Affiliation Index, for measuring the effectiveness of the output gate position. To evaluate MCAS-GP, we build a large-scale MCA dataset in collaboration with the International Peace Maternity and Child Health Hospital of China Welfare Institute (IPMCH). Extensive experiments on the MCA dataset and two other public surgical datasets demonstrate that MCAS-GP achieves considerable improvements in both accuracy and inference time.
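One plausible reading of a "learnable" ASPP, sketched below in PyTorch, is a standard atrous spatial pyramid whose multi-scale branch outputs are fused with learned, softmax-normalized weights instead of fixed concatenation; the paper's actual LASPP design may differ, and the channel sizes and dilation rates here are assumptions.

# Hypothetical sketch of a learnable ASPP block: parallel atrous (dilated)
# convolutions fused with learned, softmax-normalized branch weights.
import torch
import torch.nn as nn

class LearnableASPP(nn.Module):
    def __init__(self, in_ch, out_ch, rates=(1, 6, 12, 18)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Sequential(
                nn.Conv2d(in_ch, out_ch, 3, padding=r, dilation=r, bias=False),
                nn.BatchNorm2d(out_ch),
                nn.ReLU(inplace=True),
            )
            for r in rates
        )
        # One learnable scalar per scale, normalized so the scales compete.
        self.scale_logits = nn.Parameter(torch.zeros(len(rates)))

    def forward(self, x):
        w = torch.softmax(self.scale_logits, dim=0)
        feats = [b(x) for b in self.branches]
        return sum(wi * f for wi, f in zip(w, feats))

if __name__ == "__main__":
    m = LearnableASPP(64, 128)
    y = m(torch.randn(2, 64, 32, 32))
    print(y.shape)   # torch.Size([2, 128, 32, 32])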
Hypervisor intervention in the virtual I/O event path is a main performance bottleneck for I/O virtualization because of the costly VM exits it incurs. The shortcomings of prior software solutions for virtual interrupt delivery, a major source of VM exits, prompted the emergence of hardware-based Posted-Interrupt (PI) technology. PI provides exit-free interrupt delivery without compromising any virtualization benefit. However, it acts on only half of the event path, i.e., the interrupt path, while guests' I/O requests may also trigger a large number of VM exits. Additionally, PI may still suffer severe latency from vCPU scheduling while delivering interrupts. Aiming at an optimal event path, we propose ES2 to simultaneously improve bidirectional I/O event delivery between guests and their devices. On the basis of PI, ES2 introduces a hybrid I/O handling scheme for efficient I/O request delivery and intelligent interrupt redirection for enhanced I/O responsiveness. It requires no modification to the guest OS. We demonstrate that ES2 greatly reduces I/O-related VM exits, keeping the exit handling time (EHT) below 2.5 percent for TCP streams and 0.1 percent for UDP streams, increases guest throughput by 1.9x for Memcached and 1.6x for Nginx, and keeps guest latency at a low level.
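The toy sketch below illustrates the general shape of a hybrid request-delivery policy in the spirit of the abstract: while the request ring is busy, the host polls it and the guest posts requests exit-free; when the ring goes idle, the guest falls back to exit-based notification. The class names, the idle threshold, and the print statements are purely illustrative, not ES2's actual mechanism.

# Toy simulation of a hybrid I/O request-delivery policy.
import collections, time

class HybridQueue:
    IDLE_POLLS = 64                        # empty polls before sleeping

    def __init__(self):
        self.ring = collections.deque()    # guest-posted I/O requests
        self.polling = False               # host-side polling-mode flag

    def guest_submit(self, req):
        self.ring.append(req)
        if not self.polling:
            self.notify_host()             # costs a VM exit
        # else: the polling host picks it up exit-free

    def notify_host(self):
        print("VM exit: kick host for", self.ring[0])

    def host_poll_loop(self):
        self.polling, empty = True, 0
        while empty < self.IDLE_POLLS:
            if self.ring:
                print("handled", self.ring.popleft()); empty = 0
            else:
                empty += 1
                time.sleep(0)              # yield, keep polling
        self.polling = False               # idle: revert to exit mode

if __name__ == "__main__":
    q = HybridQueue()
    q.guest_submit("read-blk-0")           # triggers one exit
    q.host_poll_loop()                     # drains exit-free while busy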
In the pursuit of ubiquitous computing, distributed computing systems spanning the cloud, edge devices, and Internet-of-Things devices are in high demand. However, existing distributed frameworks are not tailored to the fast development of Deep Neural Networks (DNNs), the key technique behind many intelligent applications today. Building on prior exploration of distributed deep neural networks (DDNNs), we propose the Heterogeneous Distributed Deep Neural Network (HDDNN) over the distributed hierarchy, targeting ubiquitous intelligent computing. While supporting the basic functionalities of DNNs, our framework is optimized for various types of heterogeneity, including heterogeneous computing nodes, heterogeneous neural networks, and heterogeneous system tasks. Besides, our framework features parallel computing, privacy protection, and robustness, with further consideration of how heterogeneous distributed systems and DNNs combine. Extensive experiments demonstrate that our framework utilizes the hierarchical distributed system better for DNNs and tailors DNNs properly for real-world distributed systems, achieving low response time, high performance, and a better user experience.
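A representative pattern from the DDNN literature that HDDNN builds on is early-exit offloading over the hierarchy: an edge device runs a small local exit and forwards a sample to a larger cloud model only when the local prediction is not confident. The sketch below shows that pattern; the placeholder models and the entropy threshold are assumptions, not HDDNN's actual configuration.

# Sketch of confidence-gated early exit at the edge with cloud fallback.
import torch
import torch.nn as nn
import torch.nn.functional as F

edge_model  = nn.Sequential(nn.Flatten(), nn.Linear(784, 10))       # tiny, local
cloud_model = nn.Sequential(nn.Flatten(), nn.Linear(784, 256),
                            nn.ReLU(), nn.Linear(256, 10))          # larger, remote

def predict(x, threshold=0.5):
    """Return (label, where_it_was_decided)."""
    probs = F.softmax(edge_model(x), dim=-1)
    entropy = -(probs * probs.clamp_min(1e-9).log()).sum(dim=-1)
    if entropy.item() < threshold:                  # confident: exit at edge
        return probs.argmax(dim=-1).item(), "edge"
    probs = F.softmax(cloud_model(x), dim=-1)       # else offload to cloud
    return probs.argmax(dim=-1).item(), "cloud"

if __name__ == "__main__":
    print(predict(torch.randn(1, 1, 28, 28)))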
In this paper, the authors propose an extensible metadata architecture that meets information-sharing requirements. The key differences between Dublin Core and library cataloging are also discussed.
Video object detection is more challenging than image object detection because of deteriorated frame quality. To enhance the feature representation, state-of-the-art methods propagate temporal information into the deteriorated frame by aligning and aggregating entire feature maps from multiple nearby frames. However, restricted by the low storage efficiency of feature maps and their vulnerable content-based address allocation, long-term temporal information is not fully exploited by these methods. In this work, we propose the first object-guided external memory network for online video object detection. Storage efficiency is handled by object-guided hard attention that selectively stores valuable features, and long-term information is protected when stored in an addressable external data matrix. A set of read/write operations is designed to accurately propagate, allocate, and delete multi-level memory features under object guidance. We evaluate our method on the ImageNet VID dataset and achieve state-of-the-art performance as well as a good speed-accuracy trade-off. Furthermore, by visualizing the external memory, we show the detailed object-level reasoning process across frames.
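To make the read/write idea concrete, here is a minimal sketch of an object-guided external memory: write stores only features pooled from detected object boxes (hard attention), and read retrieves a memory-enhanced feature by attention over the stored slots. The paper's actual multi-level operations are more elaborate; shapes, names, and the pooling scheme here are assumptions.

# Illustrative object-guided external memory with attention-based read.
import torch
import torch.nn.functional as F

class ObjectMemory:
    def __init__(self, dim, capacity=256):
        self.slots = torch.empty(0, dim)   # addressable external data matrix
        self.capacity = capacity

    def write(self, feat_map, boxes):
        """feat_map: (C,H,W); boxes: (N,4) as x1,y1,x2,y2 in feature coords."""
        for x1, y1, x2, y2 in boxes.long().tolist():
            obj = feat_map[:, y1:y2, x1:x2].mean(dim=(1, 2))  # pooled object feature
            self.slots = torch.cat([self.slots, obj[None]])[-self.capacity:]

    def read(self, query):
        """query: (C,) current-frame feature; returns memory-enhanced feature."""
        if self.slots.numel() == 0:
            return query
        attn = F.softmax(self.slots @ query / query.numel() ** 0.5, dim=0)
        return query + attn @ self.slots                      # soft read

if __name__ == "__main__":
    mem = ObjectMemory(dim=64)
    mem.write(torch.randn(64, 32, 32), torch.tensor([[2., 2., 10., 10.]]))
    print(mem.read(torch.randn(64)).shape)   # torch.Size([64])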
A Symmetric Multi-Processing (SMP) virtual machine (VM), or virtual SMP for short, enables a single virtual machine to span multiple processors, thereby allowing the virtual machine to run resource-intensive applications. In addition to offering higher computing capacity, virtual SMP also offers an opportunity to alleviate the problem of unpredictable I/O responsiveness. To this end, we propose vINT (scheduling status based virtual INterrupt remapping adapTer), a scheme that leverages hardware-assisted interrupt mapping. vINT attains high efficiency and flexibility by adding a lightweight module to the virtual machine monitor (VMM); it requires no changes to the VMM scheduler and is transparent to the guest OS. We implement the prototype in Xen 4.3.0 and conduct evaluations with both micro-benchmarks and macro-benchmarks. The experimental results show that vINT increases networking throughput by 5x and reduces the execution time required for disk I/O by 17.5%, while introducing only light overhead.
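The core decision behind scheduling-status-based remapping can be sketched in a few lines: rather than posting an interrupt to a vCPU that is currently descheduled (and paying its scheduling latency), remap it to a sibling vCPU of the same SMP guest that is running right now. The Python below is purely illustrative; vINT lives inside the VMM, and the data structures here are assumptions.

# Toy model of scheduling-status-based interrupt redirection.
from dataclasses import dataclass

@dataclass
class VCpu:
    vcpu_id: int
    running: bool        # scheduling status exposed by the VMM scheduler

def pick_interrupt_target(dest: VCpu, siblings: list[VCpu]) -> VCpu:
    """Redirect only when the original destination is descheduled."""
    if dest.running:
        return dest                    # fast path: deliver directly
    for v in siblings:
        if v.running:                  # any running sibling can take it
            return v
    return dest                        # none running: fall back and wait

if __name__ == "__main__":
    guest = [VCpu(0, False), VCpu(1, True), VCpu(2, False)]
    target = pick_interrupt_target(guest[0], guest[1:])
    print("deliver to vCPU", target.vcpu_id)   # deliver to vCPU 1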
In this paper, we propose an effective method for fast and accurate scene parsing called the Bidirectional Alignment Network (BiAlignNet). A previous representative work, BiSeNet~\cite{bisenet}, uses two different paths (the Context Path and the Spatial Path) to achieve balanced learning of semantics and details, respectively. However, the relationship between the two paths is not well explored. We argue that the two paths can benefit each other in a complementary way. Motivated by this, we propose a novel network that aligns the two paths' information with each other through a learned flow field. To avoid noise and semantic gaps, we introduce a Gated Flow Alignment Module to align both features bidirectionally. Moreover, to make the Spatial Path learn more detailed information, we present an edge-guided hard pixel mining loss to supervise the aligned learning process. Our method achieves 80.1\% and 78.5\% mIoU on the Cityscapes validation and test sets, respectively, while running at 30 FPS with full-resolution inputs. Code and models will be available at \url{https://github.com/jojacola/BiAlignNet}.
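A hypothetical sketch of one gated flow-alignment step is given below: predict a per-pixel flow field from the two paths' features, warp one path's features along it with grid sampling, and gate the warped features before fusing them into the other path. BiAlignNet's actual module layout may differ; the layer shapes and names are placeholders.

# Gated flow alignment sketch: flow-based warping plus a sigmoid gate.
import torch
import torch.nn as nn
import torch.nn.functional as F

class GatedFlowAlign(nn.Module):
    def __init__(self, ch):
        super().__init__()
        self.flow = nn.Conv2d(2 * ch, 2, 3, padding=1)   # per-pixel (dx, dy)
        self.gate = nn.Conv2d(2 * ch, 1, 3, padding=1)   # per-pixel gate

    def forward(self, src, dst):
        """Align src toward dst; both (B,C,H,W) at the same resolution."""
        b, _, h, w = src.shape
        pair = torch.cat([src, dst], dim=1)
        flow = self.flow(pair).permute(0, 2, 3, 1)       # (B,H,W,2)
        # Base sampling grid in [-1, 1], offset by the learned flow.
        ys, xs = torch.meshgrid(torch.linspace(-1, 1, h),
                                torch.linspace(-1, 1, w), indexing="ij")
        grid = torch.stack([xs, ys], dim=-1).expand(b, h, w, 2)
        warped = F.grid_sample(src, grid + flow, align_corners=True)
        g = torch.sigmoid(self.gate(pair))               # suppress noisy flow
        return dst + g * warped

if __name__ == "__main__":
    m = GatedFlowAlign(64)
    out = m(torch.randn(1, 64, 32, 32), torch.randn(1, 64, 32, 32))
    print(out.shape)   # torch.Size([1, 64, 32, 32])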
The development of the Internet of Things (IoT) has allowed devices to collect massive amounts of data, and Artificial Intelligence (AI) provides the ability to analyze those data. Moreover, researchers have adopted Distributed Machine Learning (DML) methods to train neural networks collaboratively on different users' data. However, DML suffers from privacy issues, for which Federated Learning (FL) has been an effective solution. FL transfers the model instead of the data to protect privacy, but the trained models have low accuracy on local datasets due to statistical heterogeneity. Thus, personalized FL (pFL) algorithms have been proposed to handle such heterogeneous data distributions. However, the communication overhead of pFL algorithms is significant, as they require transmitting additional information. We therefore propose Federated Learning with Combined Particle Swarm Optimization (FedCPSO) in this paper. FedCPSO replaces the aggregation process of FL algorithms with PSO, and we design a PSO velocity specifically for FL algorithms that combines the best global model, the best client models, and the best neighbor models. In addition, we implement magnitude pruning to reduce the communication volume. The experimental results illustrate that FedCPSO can reduce communication volume by up to 50% while incurring less than a 2% accuracy drop compared with the state-of-the-art (SOTA) pFL algorithm.
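The sketch below shows one way such a PSO-style update could replace averaging-based aggregation: a client's velocity pulls its weights toward the best global, best client, and best neighbor models, and magnitude pruning zeroes small entries before transmission. The coefficients and the exact velocity form are illustrative assumptions, not the paper's equations.

# PSO-style model update plus magnitude pruning, in the spirit of FedCPSO.
import torch

def pso_update(weights, velocity, global_best, client_best, neighbor_best,
               inertia=0.7, c1=0.5, c2=0.5, c3=0.5):
    """All arguments are dicts mapping parameter names to tensors."""
    new_w, new_v = {}, {}
    for k in weights:
        r1, r2, r3 = torch.rand(3)           # stochastic attraction strengths
        v = (inertia * velocity[k]
             + c1 * r1 * (global_best[k]   - weights[k])
             + c2 * r2 * (client_best[k]   - weights[k])
             + c3 * r3 * (neighbor_best[k] - weights[k]))
        new_v[k] = v
        new_w[k] = weights[k] + v
    return new_w, new_v

def magnitude_prune(w, sparsity=0.5):
    """Zero the smallest-magnitude entries to cut communication volume."""
    t = w.abs().flatten().quantile(sparsity)
    return torch.where(w.abs() >= t, w, torch.zeros_like(w))

if __name__ == "__main__":
    zeros = {"fc.weight": torch.zeros(4, 4)}
    w, v = pso_update({"fc.weight": torch.randn(4, 4)}, dict(zeros),
                      dict(zeros), dict(zeros), dict(zeros))
    sparse = magnitude_prune(w["fc.weight"])
    print((sparse != 0).float().mean())      # ~0.5 of entries survive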
Cloud storage provides convenient, massive, and scalable storage at low cost, but data privacy is a major concern that prevents users from trustingly storing files on the cloud. One way of enhancing privacy from the data owner's point of view is to encrypt files before outsourcing them to the cloud and decrypt them after downloading. However, data encryption is a heavy overhead for mobile devices, and the data retrieval process incurs complicated communication between the data user and the cloud. Given the limited bandwidth and battery life typical of mobile devices, these issues impose heavy computing and communication overheads as well as higher power consumption on mobile device users, which makes encrypted search over the mobile cloud very challenging. In this paper, we propose the traffic and energy saving encrypted search (TEES), a bandwidth- and energy-efficient encrypted search architecture over the mobile cloud. The proposed architecture offloads computation from mobile devices to the cloud, and we further optimize the communication between the mobile clients and the cloud. We demonstrate that data privacy does not degrade when the performance enhancement methods are applied. Our experiments show that TEES reduces computation time by 23 to 46 percent and saves energy consumption by 35 to 55 percent per file retrieval, while network traffic during file retrievals is also significantly reduced.
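The toy model below illustrates the general offloading pattern behind such architectures (not TEES's actual scheme): the client sends only a keyword trapdoor, and the server matches it against an encrypted index and ranks files by precomputed relevance scores, so the mobile device neither scans the index nor downloads unneeded files. The key, index contents, and scores are all made up for illustration.

# Toy offloaded ranked keyword search over a trapdoor-keyed index.
import hmac, hashlib

KEY = b"shared-secret"                       # established out of band

def trapdoor(keyword: str) -> str:
    """Deterministic keyword token; the server never sees the plaintext."""
    return hmac.new(KEY, keyword.encode(), hashlib.sha256).hexdigest()

# Server-side index: trapdoor -> [(encrypted file id, relevance score), ...]
index = {
    trapdoor("virtualization"): [("f1.enc", 0.91), ("f7.enc", 0.40)],
    trapdoor("memory"):         [("f7.enc", 0.88)],
}

def server_search(token: str, top_k: int = 5):
    """Runs on the cloud: ranking is offloaded away from the mobile client."""
    hits = index.get(token, [])
    return [fid for fid, _ in sorted(hits, key=lambda h: -h[1])[:top_k]]

if __name__ == "__main__":
    print(server_search(trapdoor("virtualization")))   # ['f1.enc', 'f7.enc']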