Digital forensics tools are now common in criminal investigations that involve computing and communication devices. As with any evidence in a criminal investigation, preserving digital evidence is critical to the success of the investigation. Digital forensic tools use Cryptographic Hash Functions (CHFs) to ensure that evidence is preserved during acquisition and analysis. In particular, they apply CHFs during acquisition to verify that the created image of the evidence is accurate. The CHFs currently in use are serial in nature and can be time-consuming when working with large data sets. We propose a new parallel CHF transformation that speeds up the image creation process by a factor of 6.5 over existing methods. We discuss the use of the parallel algorithm in the image creation process and compare the results with the existing sequential methods.
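The general idea of parallelizing a hash over a large image can be sketched as follows. This is a minimal illustration and not the transformation proposed in the paper: it assumes a simple two-level "hash of chunk hashes" construction, uses SHA-256 from Python's standard `hashlib`, and the names `parallel_digest`, `hash_chunk`, and `CHUNK_SIZE` are hypothetical. Note that the resulting digest is not the same value as a plain serial hash of the data, so both sides of a verification would have to use the same construction and chunk size.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor

# Hypothetical chunk size (1 MiB); the paper does not specify one here.
CHUNK_SIZE = 1 << 20


def hash_chunk(chunk: bytes) -> bytes:
    """Hash one chunk independently of all the others."""
    return hashlib.sha256(chunk).digest()


def parallel_digest(data: bytes, chunk_size: int = CHUNK_SIZE, workers: int = 4) -> str:
    """Two-level digest: hash each chunk in parallel, then hash the
    concatenation of the chunk digests in their original order."""
    # Split into fixed-size chunks; empty input is treated as one empty chunk.
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)] or [b""]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order, so the result is deterministic.
        chunk_digests = list(pool.map(hash_chunk, chunks))
    return hashlib.sha256(b"".join(chunk_digests)).hexdigest()


if __name__ == "__main__":
    image = bytes(range(256)) * 4096  # stand-in for an evidence image
    print(parallel_digest(image, chunk_size=64 * 1024))
```

The digest depends on the chunk size but not on the number of workers, which is what allows an acquisition tool and a later verifier to agree as long as they share the chunking parameters.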
Digital evidence
Computer forensics
MD5
Citations (4)
Courseware is one of the most important learning resources in distance education; however, it is difficult to record and evaluate the courseware learning process. A system that records and traces the learning process for CD-ROM or online multimedia courseware has been designed and implemented, providing important data for formulating and adjusting formative assessment and teaching strategy.
Tracing
TRACE (psycholinguistics)
Citations (0)
Recent innovations in Big Data have enabled major strides forward in our ability to glean important insights from massive amounts of data, and to use these insights to make better decisions. Underlying many of these innovations is a computational paradigm known as MapReduce, which enables computational processes to be scaled up to very large sizes and to take advantage of cloud computing. While very powerful, MapReduce also requires a nontrivial shift in algorithm design strategies. In this paper we provide an overview of MapReduce and the types of problems it is suited for. We discuss general strategies for designing MapReduce-based algorithms and provide an illustration using social media analytics.
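The shift in algorithm design that MapReduce asks for can be illustrated with a small in-process sketch. It is not taken from the paper: the hashtag-counting task, the sample posts, and the function names `map_phase`, `shuffle`, `reduce_phase`, and `count_hashtags` are all hypothetical, and a real deployment would run the map and reduce steps on many machines rather than in one Python process.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(post: str) -> Iterator[Tuple[str, int]]:
    """Map: emit a (hashtag, 1) pair for every hashtag in one post."""
    for token in post.lower().split():
        if token.startswith("#") and len(token) > 1:
            yield token, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def count_hashtags(posts: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle, and reduce in sequence over a collection of posts."""
    mapped = (pair for post in posts for pair in map_phase(post))
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    sample_posts = [
        "Loving the new release #python #data",
        "Big data meetup tonight #data",
    ]
    print(count_hashtags(sample_posts))
```

The key design constraint is that each call to `map_phase` sees only one record and each call to `reduce_phase` sees only one key, which is what lets a framework distribute the work.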
Citations (4)
Recent advancements in location-aware analytics have created novel opportunities in different domains. In the area of process mining, enriching process models with geolocation helps to gain a better understanding of how the process activities are executed in practice. In this paper, we introduce our idea of geo-enabled process modeling and report on our industrial experience. To this end, we present a real-world case study to describe the importance of considering location in process mining. We then discuss the shortcomings of currently available process mining tools and propose our novel approach for modeling geo-enabled processes, focusing on 1) increasing process interpretability through geo-visualization, 2) incorporating location-related metadata into process analysis, and 3) using location-based measures for the assessment of process performance. Finally, we conclude the paper with future research directions.
Process mining
Interpretability
Process modeling
Geolocation
Business process discovery
Citations (1)
Neural Machine Translation (NMT) has become the mainstream technology in machine translation. Supervised NMT models are trained on abundant sentence-level parallel corpora, but for low-resource languages or dialects with no such corpus available, it is difficult to achieve good performance. Researchers have therefore begun to focus on unsupervised neural machine translation (UNMT), which uses monolingual corpora as training data. UNMT needs to construct a language model (LM) that learns semantic information from the monolingual corpus. This paper focuses on the pre-training of the LM in unsupervised machine translation and proposes a pre-training method, NER-MLM (named entity recognition masked language model). By performing NER, the proposed method obtains better semantic information and language model parameters, with better training results. In the unsupervised machine translation task, the BLEU scores on the WMT’16 English–French and English–German data sets are 35.30 and 27.30, respectively. To the best of our knowledge, these are the highest results reported so far in the field of UNMT.
Named Entity Recognition
Citations (12)
Combining domain knowledge and data science knowledge is often a problem in applications of industrial data analytics. Data scientists usually spend a lot of time understanding the domain in order to develop an application, while domain experts lack the skills to interpret the results of the underlying mathematical models. This leads to difficulties when adapting to changes, handling issues, and transferring applications to similar scenarios, and thus to a lack of acceptance of data analytics applications in industrial companies. Based on the Cross Industry Standard Process for Data Mining (CRISP-DM), we propose a novel process model that integrates the training of domain experts, enabling them to become citizen data scientists who independently develop and implement data analytics applications. We qualitatively evaluated our process model on a storage location assignment problem in the warehouse of a manufacturer of high-end domestic appliances.
Data Analysis
Subject-matter expert
Citations (7)
Automatic Process Discovery aims at developing algorithmic methodologies for the extraction and elicitation of process models as described in data. While Process Discovery from event-log data is a well-established area that has already moved from research to concrete adoption in a mature manner, Process Discovery from text is still a research area at an early stage of development and rarely scales to real-world documents. In this paper we comparatively analyze the reference state-of-the-art literature, with particular attention to the techniques used, the process elements extracted, and the evaluations performed. As a result of the analysis, we discuss important limitations that hamper the exploitation of recent Natural Language Processing techniques in this field, as well as fundamental limitations and future challenges concerning the datasets, the techniques, the experimental evaluations, and the pipelines currently adopted and yet to be developed.
Business process discovery
Process mining
Citations (1)
This paper analyses dynamic MPI communicator reconstruction and proposes a solution that permits a newly created MPI process to join an existing process group freely. The solution resolves the problem of communication between newly created processes and existing processes in MPI fault tolerance and process migration.
Message Passing Interface
Process migration
Citations (0)
Process mining is a paradigm shift from traditional process understanding methodologies, such as interviews and surveys, to a data-driven understanding of the actual digital processes. It analyzes business processes by applying algorithms to the event data generated by digital systems. The chapter provides insight into various uses of process mining in different social and economic processes, with examples from past works demonstrating how practical process mining is in detecting and mitigating bottlenecks in these sectors. The chapter then delves into the details of process mining algorithms, key features, and metrics that can help practitioners and researchers evaluate process mining for their work. It also highlights some data quality issues in the event log that can prevent process models from yielding fair results. Additionally, some current limitations and concerns are described to create awareness and to build on the body of knowledge in process and sequential mining techniques.
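As a small illustration of what "applying algorithms to event data" can look like, the sketch below builds a directly-follows graph, a common building block of process discovery, from a toy event log. It is not drawn from the chapter: the log format (case ID, activity, timestamp), the sample activities, and the function name `directly_follows` are assumptions made for this example.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple

# Assumed event format: (case_id, activity, timestamp).
Event = Tuple[str, str, int]


def directly_follows(event_log: Iterable[Event]) -> Counter:
    """Count how often activity `a` is immediately followed by activity `b`
    within the same case."""
    traces: Dict[str, List[str]] = defaultdict(list)
    # Order events by case and then by timestamp to rebuild each trace.
    for case_id, activity, _ts in sorted(event_log, key=lambda e: (e[0], e[2])):
        traces[case_id].append(activity)

    dfg: Counter = Counter()
    for activities in traces.values():
        # Pair every activity with its successor in the same trace.
        for current, following in zip(activities, activities[1:]):
            dfg[(current, following)] += 1
    return dfg


if __name__ == "__main__":
    log = [
        ("c1", "register", 1), ("c1", "review", 2), ("c1", "approve", 3),
        ("c2", "register", 1), ("c2", "review", 2), ("c2", "reject", 4),
    ]
    for (a, b), count in sorted(directly_follows(log).items()):
        print(f"{a} -> {b}: {count}")
```

Missing or misordered timestamps would change the reconstructed traces, which is one concrete way event-log data quality affects the resulting model.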
Process mining
Business process discovery
Process modeling
Citations (0)