Network contention between concurrently running jobs on HPC systems is a primary cause of performance variability. Optimizing job allocation and avoiding network sharing are therefore crucial to alleviating potential performance degradation. Doing so effectively requires an understanding of the interference among concurrently running jobs, their communication patterns, and contention in the network. In this work, we choose three representative HPC applications from the DOE Design Forward Project and conduct detailed simulations on a torus network model to analyze both intra- and inter-job interference. By scrutinizing the communication behaviors of these applications, we identify relationships between these behaviors and the interference introduced by different job placement policies. Our analyses illuminate a path toward communication-pattern awareness in job placement on HPC systems.
The size and scope of cutting-edge scientific simulations are growing much faster than the I/O subsystems of their runtime environments, not only making I/O the primary bottleneck but also straining the storage capacities of many computing facilities. These problems are exacerbated by the need for data-intensive analytics, such as querying a dataset by variable and spatio-temporal constraints, for which current database technologies commonly build query indices larger than the raw data itself. To help solve these problems, we present a parallel query-processing engine that can handle both range queries and queries with spatio-temporal constraints on B-spline-compressed data with user-controlled accuracy. Our method adapts to the widening gap between computation and I/O performance by querying compressed metadata separated into bins by variable value, utilizing Hilbert space-filling curves to optimize for spatial constraints, and aggregating data accesses to improve the locality of per-bin stored data, substantially reducing the false-positive rate and latency-bound I/O operations (such as seeks). We show our method to be efficient with respect to storage, computation, and I/O compared to existing database technologies optimized for query processing on scientific data.
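The spatial-constraint optimization above relies on Hilbert space-filling curves, which map multidimensional coordinates to a one-dimensional index while preserving locality. As an illustration (not the paper's implementation), here is a minimal sketch of the standard iterative coordinate-to-index conversion for a 2-D grid:

```python
def xy2d(n, x, y):
    """Map grid cell (x, y) in an n-by-n grid (n a power of two)
    to its distance d along the Hilbert curve (standard iterative form)."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if (x & s) > 0 else 0
        ry = 1 if (y & s) > 0 else 0
        d += s * s * ((3 * rx) ^ ry)
        # rotate the quadrant so the recursion pattern repeats
        if ry == 0:
            if rx == 1:
                x = s - 1 - x
                y = s - 1 - y
            x, y = y, x
        s //= 2
    return d
```

Sorting stored records by this index keeps spatially adjacent cells near each other in storage order, which is what lets a spatial query touch few, contiguous regions instead of scattered ones.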
A case study is presented that provides computation caching (memoization) to high-performance computing (HPC) applications through a microservice architecture, focusing on the ExMatEx proxy application CoEVP (Co-designed Embedded ViscoPlasticity Scale-bridging). CoEVP represents a class of multiscale physics methods in which inexpensive coarse-scale models are combined with expensive fine-scale models to simulate physical phenomena scalably across multiple time and length scales. Recently, CoEVP has employed interpolation based on previously executed fine-scale models in order to reduce the number of fine-scale evaluations needed to advance the simulation. Building on this work, we envision that distributed microservices composed to provide new capabilities to large-scale parallel applications can be an important component in simulating ever-larger systems at ever-greater fidelities. We explore three aspects of a microservice composition for interpolation-based memoization in our study. First, we present a cost assessment of CoEVP's current fine-scale modeling and interpolation approach. Second, we present an alternative interpolation strategy in which interpolation models are directly constructed on demand from previous fine-scale evaluations: a "database of points" rather than a "database of models." Third, we evaluate the characteristics of the two approaches with and without cross-process sharing of database entries. Lessons learned from the study are used to inform designs for future work in developing distributed, large-scale memoization services for HPC.
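The "database of points" idea can be illustrated with a toy memoization scheme: cache past fine-scale evaluations and reuse any stored point that falls within a trust radius of the query. This is a hypothetical sketch under simplifying assumptions (1-D inputs, radius-based lookup, invented class name), not CoEVP's actual interpolation machinery:

```python
class PointDatabase:
    """Toy 'database of points': reuse any past fine-scale evaluation
    within a trust radius of the query. Hypothetical sketch, not
    CoEVP's actual interpolation models."""

    def __init__(self, fine_scale_model, radius=0.05):
        self.model = fine_scale_model     # the expensive function
        self.radius = radius              # reuse tolerance
        self.points = []                  # stored (input, output) pairs
        self.evaluations = 0              # count of expensive calls

    def query(self, x):
        # find the nearest stored evaluation within the trust radius
        best = None
        for xi, yi in self.points:
            d = abs(x - xi)
            if d <= self.radius and (best is None or d < best[0]):
                best = (d, yi)
        if best is not None:
            return best[1]                # hit: skip the fine-scale model
        y = self.model(x)                 # miss: run the expensive model
        self.evaluations += 1
        self.points.append((x, y))
        return y
```

Sweeping 101 query points across [0, 1] with radius 0.1 triggers only about ten expensive evaluations; the paper's cross-process sharing question amounts to whether `self.points` is private to one rank or visible to all.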
The size and scope of cutting-edge scientific simulations are growing much faster than the I/O and storage capabilities of their runtime environments. This growing gap is exacerbated by exploratory data-intensive analytics, such as querying simulation data for regions of interest with multivariate, spatio-temporal constraints. Query-driven data exploration induces heterogeneous access patterns that further stress the performance of the underlying storage system. To partially alleviate the problem, data reduction via compression and multi-resolution data extraction are becoming an integral part of I/O systems. While addressing the data-size issue, these techniques introduce yet another mix of access patterns to an already heterogeneous set of possibilities. Moreover, how extreme-scale datasets are partitioned into multiple files and organized on a parallel file system adds to an already combinatorial space of possible access patterns. To address this challenge, we present MLOC, a parallel Multilevel Layout Optimization framework for Compressed scientific spatio-temporal data at extreme scale. MLOC proposes multiple fine-grained data-layout optimization kernels that form a generic core from which a broader constellation of such kernels can be organically consolidated to enable effective data exploration under various combinations of access patterns. Specifically, the kernels are optimized for access patterns induced by (a) query-driven multivariate, spatio-temporal constraints, (b) precision-driven data analytics, (c) compression-driven data reduction, (d) multi-resolution data sampling, and (e) multi-file data partitioning and organization on a parallel file system. MLOC organizes these optimization kernels within a multi-level architecture in which all the levels can be flexibly re-ordered by user-defined priorities.
When tested on query-driven exploration of compressed data, MLOC demonstrates superior performance compared with state-of-the-art scientific database management technologies.
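MLOC's notion of composable, user-reorderable optimization kernels can be sketched as a pipeline of ordering functions applied in priority order. The kernel names and record fields below are illustrative assumptions, not MLOC's API:

```python
# Illustrative sketch of re-orderable layout-optimization kernels;
# kernel names and record fields are assumptions, not MLOC's API.

def bin_by_value(records):
    # value binning for query-/precision-driven access
    return sorted(records, key=lambda r: r["value"])

def order_by_space(records):
    # spatial ordering, e.g. by a space-filling-curve index
    return sorted(records, key=lambda r: r["sfc_index"])

def chunk_for_files(records, chunk=4):
    # final level: partition the ordered stream into per-file chunks
    return [records[i:i + chunk] for i in range(0, len(records), chunk)]

def build_layout(records, priorities):
    # apply kernels in user-defined priority order; because each sort is
    # stable, earlier (lower-priority) kernels survive as tie-breakers
    kernels = {"value": bin_by_value, "space": order_by_space}
    for name in priorities:
        records = kernels[name](records)
    return chunk_for_files(records)
```

With priorities `["value", "space"]` the spatial order dominates and value binning breaks ties; swapping the list reverses the preference, mirroring the user-defined re-ordering of levels described above.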
Biometric authentication systems are becoming more prevalent for commercial use with computers and smart devices. Biometric systems also have several vulnerable points that can be exploited by a hacker to gain unauthorized access to a system. Replay attacks focus on capturing feature extractors (FEs) during transmission, decrypting them, and replaying them for illegal access. The Genetic and Evolutionary Feature Extraction (GEFE) technique, developed at North Carolina A&T State University, recently showed promising results in mitigating replay attacks in combination with a feature selection algorithm. Biometric presentation attacks, the focus of this work, are another biometric system vulnerability, in which an attacker presents a biometric sample of sufficient quality to illegally gain access to secured data. Recently, deep learning techniques to mitigate presentation attacks have shown promising results. However, the accuracy of deep-learning-based biometric presentation attack detection (PAD) methods is limited by the quality of the samples provided. In the absence of large sets of original biometric sample data, data augmentation has been shown to be successful in generating synthetic biometric image data and improving the performance of the deep learning techniques applied. The novelty of this paper lies in the following two aspects. First, a data augmentation technique based on Generative Adversarial Networks (GANs) is used to generate a comparative synthetic (spoofing) dataset. With the proliferation of deep fakes in media, this technique should also provide insight into the GAN techniques often used to create them. Once the GANs are properly trained, the synthetic images are used to create spoofing datasets. Second, the GEFE technique is used in combination with the GANs to generate improved anti-spoofing feature extractors optimized to mitigate presentation attacks. The combination of GEFE and GANs identifies the discriminative biometric features used to mitigate synthetic presentation attacks.
The GEFE + GAN technique outperforms the Local Binary Patterns (LBP) and GEFE techniques alone in overall identification and verification results on spoofing datasets.
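The genetic-and-evolutionary idea behind GEFE can be illustrated, in greatly simplified form, by a genetic search over binary feature masks that rewards class separation. Everything below (the fitness function, operators, and toy data) is an illustrative stand-in, not the actual GEFE algorithm:

```python
import random

def evolve_feature_mask(samples, labels, n_features,
                        generations=30, pop_size=20, seed=0):
    """Toy genetic search for a binary feature mask (a stand-in for
    evolving feature extractors; not the actual GEFE algorithm)."""
    rng = random.Random(seed)

    def fitness(mask):
        # reward masks whose selected features separate the two classes:
        # mean per-feature distance between the class centroids
        sel = [i for i, bit in enumerate(mask) if bit]
        if not sel:
            return 0.0
        means = {}
        for lab in (0, 1):
            rows = [s for s, l in zip(samples, labels) if l == lab]
            means[lab] = [sum(r[i] for r in rows) / len(rows) for i in sel]
        return sum(abs(a - b) for a, b in zip(means[0], means[1])) / len(sel)

    pop = [[rng.randint(0, 1) for _ in range(n_features)]
           for _ in range(pop_size)]
    best = max(pop, key=fitness)
    for _ in range(generations):
        ranked = sorted(pop, key=fitness, reverse=True)
        parents = ranked[:pop_size // 2]       # truncation selection
        children = []
        while len(children) < pop_size - 1:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_features)
            child = a[:cut] + b[cut:]          # one-point crossover
            if rng.random() < 0.1:             # occasional bit-flip mutation
                j = rng.randrange(n_features)
                child[j] ^= 1
            children.append(child)
        pop = [best] + children                # elitism keeps the best mask
        best = max(pop, key=fitness)
    return best
```

On toy data where two features discriminate the classes and a third carries no signal, the search converges to a mask over the discriminative features, which is the intuition behind evolving feature extractors that ignore spoofable, non-discriminative regions.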