XML Schema standards often undergo several revisions to fit application requirements and business demands. To be successful, the development process of such standards must be collaborative, allowing multiple users to work on the same schema. In this editing environment, the ability to merge branched versions of the schema becomes essential in certain situations. Conventional three-way XML merging tools are not suitable for merging XML Schemas because the tree model of an XML Schema differs from that of an ordinary XML document.
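As a rough illustration of why schema-aware merging differs from line-based merging, the hypothetical Python sketch below performs a three-way merge of top-level element declarations keyed by their name attribute rather than by line position. The function names, the decision to key on element names, and the micro-example schema are illustrative assumptions and not the approach proposed in the paper.

import xml.etree.ElementTree as ET

# Namespace prefix used by XML Schema documents.
XS = "{http://www.w3.org/2001/XMLSchema}"

def declared_elements(xsd_text):
    """Map each top-level element declaration's name to its declared type."""
    root = ET.fromstring(xsd_text)
    return {e.get("name"): e.get("type") for e in root.findall(XS + "element")}

def three_way_merge(base, ours, theirs):
    """Merge two edited versions against a common base, keyed by element name."""
    merged, conflicts = {}, []
    for name in set(base) | set(ours) | set(theirs):
        b, o, t = base.get(name), ours.get(name), theirs.get(name)
        if o == t:            # both sides agree (or neither changed)
            merged[name] = o
        elif o == b:          # only "theirs" changed this declaration
            merged[name] = t
        elif t == b:          # only "ours" changed this declaration
            merged[name] = o
        else:                 # both changed it differently: a real conflict
            conflicts.append(name)
    return merged, conflicts

# Hypothetical micro-example: one branch retypes the "order" element.
BASE = ('<schema xmlns="http://www.w3.org/2001/XMLSchema">'
        '<element name="order" type="string"/></schema>')
OURS = BASE.replace('"string"', '"orderType"')
THEIRS = BASE
merged, conflicts = three_way_merge(declared_elements(BASE),
                                    declared_elements(OURS),
                                    declared_elements(THEIRS))
print(merged, conflicts)    # {'order': 'orderType'} []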
Understanding uncertainties and assessing the risks surrounding business opportunities is essential to the success of the sustainable entrepreneurial initiatives launched on a daily basis. The contribution of this study is the identification of uncertainties surrounding opportunities in the opportunity evaluation stage of the entrepreneurial process and the examination of how analysing and evaluating uncertainty factors, with the help of data, can predict the future success of an organization. In the first phase, the uncertainty factors are classified by their sources, and their likely implications for new venture success are discussed with the help of the existing literature. In the second phase, a success prediction model is implemented using machine learning techniques and strategic analysis. The model is trained so that, when new data emerges, the qualitative data is transformed into quantitative data and the probability of success or failure is produced as output in the pre-start-up phase. The method and findings are relevant for nascent entrepreneurs and researchers focusing on sustainable technology entrepreneurship.
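A minimal sketch of the kind of pipeline described above, assuming scikit-learn: qualitative uncertainty factors are one-hot encoded into quantitative features, and a logistic regression outputs a probability of success for a new opportunity. The factor names, the tiny training set, and the choice of classifier are invented for illustration and are not the study's actual model.

import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Hypothetical historical opportunities with qualitative uncertainty ratings.
train = pd.DataFrame({
    "market_uncertainty": ["low", "high", "medium", "low", "high"],
    "tech_uncertainty":   ["medium", "high", "low", "low", "medium"],
    "regulatory_risk":    ["low", "medium", "high", "low", "high"],
    "succeeded":          [1, 0, 1, 1, 0],      # historical outcomes
})

features = ["market_uncertainty", "tech_uncertainty", "regulatory_risk"]
model = Pipeline([
    ("encode", ColumnTransformer(
        [("onehot", OneHotEncoder(handle_unknown="ignore"), features)])),
    ("clf", LogisticRegression()),
])
model.fit(train[features], train["succeeded"])

# Probability of success for a new opportunity evaluated pre-start-up.
new_case = pd.DataFrame([{"market_uncertainty": "medium",
                          "tech_uncertainty": "low",
                          "regulatory_risk": "medium"}])
print(model.predict_proba(new_case)[0, 1])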
In today’s big data era, cleaning big data streams has become a challenging task because of the varied formats and the massive volume of data being generated. Many studies have proposed techniques to overcome these challenges, such as cleaning big data in real time. This systematic literature review presents recently developed techniques that have been used in the cleaning process and for each data cleaning issue. Following the PRISMA framework, four databases are searched, namely IEEE Xplore, ACM Library, Scopus, and Science Direct, to select relevant studies. After selecting the relevant studies, we identify the techniques that have been used to clean big data streams and the evaluation methods used to examine their efficiency. We also define the cleaning issues that may appear during the cleaning process, namely missing values, duplicated data, outliers, and irrelevant data. Based on our study, future directions for cleaning big data streams are identified.
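As a toy illustration of the four cleaning issues named above, the pandas sketch below handles duplicated data, missing values, outliers, and irrelevant data for a single micro-batch of a stream. The column names, the three-sigma outlier rule, and the dropped column are illustrative assumptions rather than techniques drawn from the surveyed studies.

import pandas as pd

def clean_batch(batch: pd.DataFrame) -> pd.DataFrame:
    # 1. Duplicated data: drop exact duplicate records.
    batch = batch.drop_duplicates()
    # 2. Missing values: impute numeric gaps with the batch median.
    numeric = batch.select_dtypes("number").columns
    batch[numeric] = batch[numeric].fillna(batch[numeric].median())
    # 3. Outliers: keep values within 3 standard deviations of the mean.
    for col in numeric:
        mean, std = batch[col].mean(), batch[col].std()
        if std > 0:
            batch = batch[(batch[col] - mean).abs() <= 3 * std]
    # 4. Irrelevant data: drop columns not needed downstream (assumed list).
    return batch.drop(columns=["debug_info"], errors="ignore")

# Hypothetical micro-batch with a duplicate row, a missing reading, and noise.
batch = pd.DataFrame({"sensor_id": [1, 1, 2, 3],
                      "reading": [10.2, 10.2, None, 900.0],
                      "debug_info": ["a", "a", "b", "c"]})
print(clean_batch(batch))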
The large-scale data stream problem refers to a high-speed information flow that cannot be processed in a scalable manner on a traditional computing platform. This problem also imposes an expensive labelling cost, making the deployment of fully supervised algorithms infeasible. At the same time, the problem of semi-supervised large-scale data streams is little explored in the literature, because most works are designed for traditional single-node computing environments and are fully supervised. This paper offers the Weakly Supervised Scalable Teacher Forcing Network (WeScatterNet) to cope with the scarcity of labelled samples and large-scale data streams simultaneously. WeScatterNet is built on the distributed computing platform Apache Spark with a data-free model fusion strategy for model compression after the parallel computing stage. It features an open network structure to address the global and local drift problems while integrating a data augmentation, annotation and auto-correction ($DA^3$) method for handling partially labelled data streams. The performance of WeScatterNet is numerically evaluated on six large-scale data stream problems with only $25\%$ label proportions. It shows highly competitive performance even when compared with fully supervised learners trained with $100\%$ label proportions.
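The sketch below is not WeScatterNet or its $DA^3$ procedure; it only illustrates, with scikit-learn on a single node, the general weak-supervision idea of learning from a stream in which roughly $25\%$ of labels arrive and the confident remainder is pseudo-labelled by the model itself. The synthetic stream, confidence threshold, and classifier choice are assumptions made for illustration.

import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.default_rng(0)
model = SGDClassifier(loss="log_loss")     # probabilistic outputs for pseudo-labelling
classes = np.array([0, 1])
fitted = False

for step in range(50):                     # simulated stream of mini-batches
    X = rng.normal(size=(40, 5))
    y = (X[:, 0] > 0).astype(int)          # true labels, mostly withheld below
    labelled = rng.random(40) < 0.25       # only ~25% of labels actually arrive

    X_extra = np.empty((0, 5))
    y_extra = np.empty(0, dtype=int)
    if fitted:                             # pseudo-label confident unlabelled samples
        proba = model.predict_proba(X[~labelled])
        confident = proba.max(axis=1) > 0.9
        X_extra = X[~labelled][confident]
        y_extra = proba.argmax(axis=1)[confident]

    X_fit = np.vstack([X[labelled], X_extra])
    y_fit = np.concatenate([y[labelled], y_extra])
    if len(X_fit):                         # incremental update on labels + pseudo-labels
        model.partial_fit(X_fit, y_fit, classes=classes)
        fitted = True

X_test = rng.normal(size=(1000, 5))
print(model.score(X_test, (X_test[:, 0] > 0).astype(int)))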
The popularity and rapid growth of social networking sites are undeniable. However, it is hard to guarantee the success and sustainability of these sites. This study focuses on identifying the key success factors for each phase of agile iterative development for social networks. A qualitative and quantitative analysis was adopted, using web analytical tools to gather and measure these success factors. A comparative study between popular and unpopular social networking sites was undertaken to gather realistic data. Results reveal that the determinants of success for the agile development phases include goal setting, developing a brand image, quality content, trust building, user-centered design, technology and client-server platform, service quality, user satisfaction, and stability. The successful implementation of these factors will benefit developers and users in achieving the success and survival of social networking website development.
Security of data in the cloud computing environment is considered a critical issue because of the importance and sensitivity of the data outsourced to the cloud, as is the trustworthiness of the cloud service provider. The failure of cloud services and the danger of malicious insiders in the cloud have attracted intense attention from cloud users. The aim of this work is to analyse and evaluate an existing MCDB (Multi-clouds Databases) model that utilizes multiple cloud providers. The model adopts a triple modular redundancy (TMR) technique together with a sequential method to improve the data trustworthiness of the cloud system and, in turn, its data security. It also incorporates Shamir's secret sharing algorithm. The evaluation is done through simulation using a cloud computing simulator toolkit.
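Shamir's secret sharing, which the MCDB model incorporates, can be sketched in a few lines: the secret becomes the constant term of a random polynomial over a prime field, each cloud provider receives one point on that polynomial, and any k points recover the secret by Lagrange interpolation at x = 0. The prime, the (k, n) parameters, and the small integer secret below are simplifying assumptions for illustration, not the model's concrete configuration.

import random

PRIME = 2**127 - 1  # a Mersenne prime large enough for small integer secrets

def split(secret, k, n):
    """Create n shares; any k of them reconstruct the secret."""
    coeffs = [secret] + [random.randrange(PRIME) for _ in range(k - 1)]
    def f(x):
        return sum(c * pow(x, i, PRIME) for i, c in enumerate(coeffs)) % PRIME
    return [(x, f(x)) for x in range(1, n + 1)]

def reconstruct(shares):
    """Lagrange interpolation at x = 0 recovers the constant term (the secret)."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret

shares = split(123456789, k=3, n=5)   # e.g. one share per cloud provider
print(reconstruct(shares[:3]))        # any 3 shares recover 123456789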
This study aims to investigate the potential of educational data mining (EDM) to address the issue of delayed completion in undergraduate student thesis courses. The problem of delayed completion of these courses is a common issue that impacts both students and higher education institutions. The study employed clustering analysis to create clusters of thesis topics. The research model was constructed by using expert labeling to assign each thesis title to a computer science ontology standard. Cross-referencing was employed to associate supporting courses with each thesis title, resulting in a labeled dataset with three supporting courses for each thesis title. This study analyzed five different clustering algorithms, namely K-Means, DBSCAN, BIRCH, Gaussian Mixture, and Mean Shift, to identify the best approach for analyzing undergraduate thesis data. The results demonstrated that K-Means clustering was the most efficient method, generating five distinct clusters with unique characteristics. Furthermore, this research investigated the correlation between educational data, specifically GPA and the average grades of thesis-supporting courses, and the duration of thesis completion. Our investigation revealed a moderate correlation between GPA, thesis-supporting course average grades, and the time to complete the thesis, with higher academic performance associated with shorter completion times. These moderate results indicate the need for further studies to explore additional factors beyond GPA and the average grades of thesis-supporting courses that contribute to thesis completion delays. This study contributes to understanding and evaluating the educational outcomes within study programs as defined in the curriculum, particularly concerning the design and implementation of thesis topics. Additionally, the clustering results serve as a foundation for future research and offer valuable insights into the potential of using EDM techniques to assist in selecting appropriate thesis topics, thereby reducing the risk of delayed completion.
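A minimal sketch of the two analysis steps described above, assuming scikit-learn and SciPy: thesis titles are clustered with K-Means over TF-IDF features, and the correlation between GPA and completion time is computed with Pearson's r. The titles, GPAs, and durations below are invented for illustration and are not the study's dataset.

from scipy.stats import pearsonr
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical thesis titles standing in for the labeled dataset.
titles = [
    "Deep learning for image classification",
    "Convolutional networks for object detection",
    "Database indexing strategies for e-commerce",
    "Query optimization in relational databases",
    "Clustering algorithms for customer segmentation",
    "Anomaly detection in network traffic streams",
]
X = TfidfVectorizer(stop_words="english").fit_transform(titles)
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
print(labels)                                  # cluster id per thesis title

gpa    = [3.8, 3.5, 3.2, 3.9, 2.9, 3.4]        # illustrative GPA values
months = [8, 10, 14, 7, 16, 11]                # illustrative completion times
r, p = pearsonr(gpa, months)
print(f"correlation r={r:.2f}, p={p:.3f}")     # expect a negative correlation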