Shin et al. [19] and McKay et al. [15] previously applied tree compression and semantics-based simplification to study the distribution of building blocks in evolving Genetic Programming (GP) populations. However, their method could only give static estimates of the degree of repetition of building blocks in one generation at a time, supplying no information about the flow of building blocks between generations. Here, we use a state-of-the-art tree compression algorithm, xmlppm, to estimate the extent to which frequent building blocks from one generation are still in use in a later generation. Whereas they compared the behaviour of different GP algorithms on one specific problem -- a simple symbolic regression problem -- we extend the analysis to a more complex symbolic regression problem, finding a Fourier approximation to a sawtooth wave, and to a Boolean domain, odd parity.
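The idea of using compression to estimate how much of an earlier generation is still in use later can be sketched as follows. This is only an illustration under stated assumptions: zlib from the Python standard library stands in for xmlppm, individuals are represented as plain s-expression strings, and the names `compressed_size`, `reuse_estimate` and `random_tree` are hypothetical helpers, not part of the method described above.

```python
import random
import zlib


def compressed_size(text: str) -> int:
    """Size in bytes of the zlib-compressed text (zlib is a stand-in for xmlppm)."""
    return len(zlib.compress(text.encode("utf-8"), 9))


def reuse_estimate(earlier: list[str], later: list[str]) -> float:
    """Fraction of the later generation's compressed size that is saved when the
    compressor has already seen the earlier generation. Higher values suggest
    that more material is shared between the two generations."""
    a = "\n".join(earlier)
    b = "\n".join(later)
    # Extra bytes needed to encode the later generation after the earlier one.
    conditional = compressed_size(a + "\n" + b) - compressed_size(a)
    return 1.0 - conditional / compressed_size(b)


def random_tree(rng: random.Random, depth: int = 4) -> str:
    """Hypothetical generator of small expression trees, used only as toy data."""
    if depth == 0 or rng.random() < 0.2:
        return rng.choice(["x", "y", "1", "2", "3"])
    op = rng.choice(["add", "sub", "mul", "div"])
    left = random_tree(rng, depth - 1)
    right = random_tree(rng, depth - 1)
    return f"({op} {left} {right})"


rng = random.Random(0)
generation_a = [random_tree(rng) for _ in range(40)]
reordered = list(reversed(generation_a))                # same trees, new order
unrelated = [random_tree(rng) for _ in range(40)]       # freshly sampled trees

print("reordered copy:", reuse_estimate(generation_a, reordered))
print("unrelated trees:", reuse_estimate(generation_a, unrelated))
```

A generation made of the same trees should score noticeably higher than one made of freshly sampled trees, which is the kind of between-generation signal that a static, single-generation estimate cannot provide.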
The role of semi-supervised network intrusion detection systems is becoming increasingly important in the ever-changing digital landscape. Despite the boom in commercial and research interest, many concerns over accuracy have yet to be addressed. Two of the major limitations contributing to these concerns are reliably learning the underlying probability distribution of normal network data and identifying the boundary between the normal and anomalous data regions in the latent space. Recent research has proposed many different ways to learn the latent representation of normal data in a semi-supervised manner, such as using a Clustering-based Autoencoder (CAE) and hybridized approaches combining Principal Component Analysis (PCA) and CAE. However, such approaches are still affected by these limitations, predominantly due to an overreliance on feature engineering or an inability to handle high data dimensionality. In this paper, we propose a novel Cluster Variational Autoencoder (CVAE) deep learning model to overcome these limitations and increase the efficiency of network intrusion detection. The model enables a more concise and dominant representation of the latent space to be learnt. The probability distribution learning capabilities of the VAE are fully exploited to learn the underlying probability distribution of the normal network data. Combining clustering with the VAE enables us to address the limitations discussed. The performance of the proposed model is evaluated using eight benchmark network intrusion datasets: NSL-KDD, UNSW-NB15, CICIDS2017 and five scenarios from CTU13 (CTU13-08, CTU13-09, CTU13-10, CTU13-12 and CTU13-13). The experimental results demonstrate that the proposed method outperforms semi-supervised approaches from existing works.
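The semi-supervised setting described above, where a model of normal traffic is fitted on normal data only and a boundary then separates normal from anomalous regions, can be sketched in a few lines. This is not the CVAE: as a loudly labelled assumption, a per-feature Gaussian profile replaces the learned latent distribution, and the function names, the toy data and the 0.99 quantile are all hypothetical choices made for illustration.

```python
import random
import statistics


def fit_normal_profile(normal_rows):
    """Estimate per-feature mean and spread from normal-only training data."""
    columns = list(zip(*normal_rows))
    means = [statistics.fmean(column) for column in columns]
    # Fall back to 1.0 for constant features to avoid division by zero.
    stdevs = [statistics.pstdev(column) or 1.0 for column in columns]
    return means, stdevs


def anomaly_score(row, profile):
    """Sum of squared standardised deviations from the normal profile."""
    means, stdevs = profile
    return sum(((x - m) / s) ** 2 for x, m, s in zip(row, means, stdevs))


def fit_threshold(normal_rows, profile, quantile=0.99):
    """Place the boundary at a high quantile of the scores of normal data."""
    scores = sorted(anomaly_score(row, profile) for row in normal_rows)
    index = min(len(scores) - 1, int(quantile * len(scores)))
    return scores[index]


def is_anomalous(row, profile, threshold):
    return anomaly_score(row, profile) > threshold


# Toy stand-in for normal traffic features (assumption: 3 numeric features).
rng = random.Random(0)
normal_traffic = [[rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(500)]
profile = fit_normal_profile(normal_traffic)
threshold = fit_threshold(normal_traffic, profile)

print(is_anomalous([0.0, 0.0, 0.0], profile, threshold))
print(is_anomalous([10.0, 10.0, 10.0], profile, threshold))
```

In a VAE-based detector the score would typically come from the learned model rather than from raw feature statistics, but the division of labour is the same: learn what normal looks like, then choose where the boundary lies.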
In evolutionary computation methods such as Genetic Algorithms and Evolution Strategies, it is well known that the choice of genetic operator rates is important to the success of the algorithm. Researchers have mainly focused on choosing genetic operator rates appropriate to specific problems. Several papers adapt crossover and mutation rates in evolutionary algorithms, with promising results suggesting that adaptive algorithms may outperform non-adaptive ones. In this paper, we examine the application of adaptive operator selection rates to genetic programming and propose a new algorithm for self-adapting crossover and mutation rates in a specific genetic programming system, Tree Adjoining Grammar Guided Genetic Programming (TAG3P). Experimental results show that our proposed algorithm improves the performance of TAG3P relative to previous work.
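Self-adaptation of operator rates can be illustrated with a generic scheme in which each individual carries its own crossover and mutation rates and passes slightly perturbed copies to its offspring. The sketch below is an assumption for illustration only and is not the algorithm proposed for TAG3P: the log-normal perturbation, the bounds, and the names `OperatorRates`, `perturb`, `inherit_rates` and `choose_operator` are all hypothetical.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class OperatorRates:
    crossover: float
    mutation: float


def perturb(rate, rng, tau=0.2, low=0.05, high=0.95):
    """Log-normal perturbation of a rate, clipped to [low, high]."""
    candidate = rate * math.exp(tau * rng.gauss(0.0, 1.0))
    return min(high, max(low, candidate))


def inherit_rates(parent, rng):
    """Offspring receives perturbed copies of the parent's rates."""
    return OperatorRates(
        crossover=perturb(parent.crossover, rng),
        mutation=perturb(parent.mutation, rng),
    )


def choose_operator(rates, rng):
    """Pick an operator with probability proportional to its current rate."""
    total = rates.crossover + rates.mutation
    return "crossover" if rng.random() < rates.crossover / total else "mutation"


rng = random.Random(1)
rates = OperatorRates(crossover=0.9, mutation=0.1)
for _ in range(5):
    rates = inherit_rates(rates, rng)
    print(rates, choose_operator(rates, rng))
```

Because the rates are inherited along with the individual, selection acts on them indirectly: rates that tend to produce fitter offspring spread through the population without being set by hand for each problem.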
Standard genetic programming genotypes are generally highly disorganized and poorly structured, with little code replication. This is also true of existing developmental genetic programming systems, which exploit regularity by using procedures, functional modules, or macros and parameter passing. By contrast, in biological developmental evolution, nature works through code duplication to generate modularity, regularity and hierarchy. Previous developmental approaches have only one level of evaluation for each individual, an approach which limits the advantages of modularity to the species rather than the individual, and hence inhibits selection for modularity. We argue that evaluation during development is necessary for structural regularity to emerge. To test the benefits of developmental evaluation and the contribution of code duplication, as found in nature, we introduce a new developmental process with a new representation. We investigate developmental tree adjoining grammar guided GP (DTAG3P), which uses L-systems to encode tree adjoining grammar (TAG) derivation trees. We have demonstrated scalable solutions to difficult families of problems, and have evidence that this performance is linked to the generation and exploitation of structural regularities in the solutions.
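How an L-system produces repeated structure through rewriting, and how development yields a sequence of stages that could each be evaluated, can be shown with a minimal deterministic, context-free L-system. This is a generic sketch, not the DTAG3P encoding: the string representation, the rule set and the function name `developmental_stages` are assumptions made for illustration.

```python
def developmental_stages(axiom: str, rules: dict[str, str], steps: int) -> list[str]:
    """Return the axiom followed by each successive rewriting of it.

    At every step all symbols are rewritten in parallel; symbols without a
    rule are copied unchanged.
    """
    stages = [axiom]
    current = axiom
    for _ in range(steps):
        current = "".join(rules.get(symbol, symbol) for symbol in current)
        stages.append(current)
    return stages


# Classic two-rule example: every A becomes AB and every B becomes A.
rules = {"A": "AB", "B": "A"}
for stage_number, stage in enumerate(developmental_stages("A", rules, 4)):
    # In a developmental system, each stage could be evaluated separately.
    print(stage_number, stage)
```

Each stage reuses the whole of the previous stage as a prefix, so duplicated substructure arises from the rewriting itself rather than from explicitly defined procedures or modules.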
We investigate interactions between evolution, development and lifelong layered learning in a combination we call evolutionary developmental evaluation (EDE), using a specific implementation, developmental tree-adjoining grammar guided genetic programming (GP). The approach is consistent with the process of biological evolution and development in higher animals and plants, and is justifiable from the perspective of learning theory. In experiments, the combination is synergistic, outperforming algorithms using only some of these mechanisms. It is able to solve GP problems that lie well beyond the scaling capabilities of standard GP. The solutions it finds are simple, succinct, and highly structured. We conclude this paper with a number of proposals for further extension of EDE systems.
In this paper, we describe a new test problem for genetic programming (GP), ORDERTREE. We argue that it is a natural analogue of ONEMAX, a popular GA test problem, and that it also avoids some of the known weaknesses of other benchmark problems for GP. Through experiments, we show that the difficulty of the problem can be tuned not only by increasing the size of the problem, but also by increasing the non-linearity in the fitness structure.
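For readers unfamiliar with the GA benchmark that ORDERTREE is compared to, ONEMAX scores a fixed-length bit string by the number of ones it contains. The sketch below shows only ONEMAX; the definition of ORDERTREE is not given in this abstract, so no attempt is made to reproduce it, and the function name `onemax` is simply an illustrative choice.

```python
def onemax(bits: list[int]) -> int:
    """ONEMAX fitness: the number of ones in a bit string."""
    if any(bit not in (0, 1) for bit in bits):
        raise ValueError("ONEMAX is defined on bit strings of 0s and 1s")
    return sum(bits)


candidate = [1, 0, 1, 1, 0, 1]
print(onemax(candidate), "out of a maximum of", len(candidate))
```

Each bit contributes to the fitness independently of the others, so the only way to make ONEMAX harder is to lengthen the string.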