This research presents a systematic methodology for producing accurate power models for single instruction set architecture (ISA) heterogeneous processors. We use hardware event counters from the processor's performance monitoring unit (PMU) to accurately capture CPU state, and ordinary least squares (OLS) regression, assisted by automated event selection algorithms, to compute the power models. Several estimators for single-thread and multi-thread benchmarks are proposed, capable of performing power predictions across different frequency levels of one processor, as well as between the heterogeneous processors, with less than 3% error. The models are compared to related work, showing a significant improvement in accuracy and good computational efficiency, which makes them suitable for run-time deployment.
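As a minimal sketch of the core regression step, assuming synthetic data and illustrative event names (not the paper's actual selected predictors), an OLS power model over PMU event rates can be fitted as follows:

```python
import numpy as np

# Hypothetical training data: each row holds PMU event rates sampled
# over one benchmark interval (events/cycle); columns are illustrative.
events = np.array([
    # instr_retired  l2_refills  bus_accesses
    [0.82, 0.010, 0.031],
    [1.10, 0.004, 0.012],
    [0.45, 0.025, 0.060],
    [0.95, 0.008, 0.020],
])
measured_power = np.array([1.91, 2.10, 1.52, 1.98])  # Watts (invented)

# Ordinary least squares: power ~ b0 + b1*e1 + b2*e2 + b3*e3.
X = np.hstack([np.ones((events.shape[0], 1)), events])  # add intercept
coeffs, *_ = np.linalg.lstsq(X, measured_power, rcond=None)

def predict_power(event_rates):
    """Estimate power (W) from a vector of PMU event rates."""
    return coeffs[0] + coeffs[1:] @ np.asarray(event_rates)

print(predict_power([0.9, 0.01, 0.03]))
```

In the paper's methodology the choice of event columns is itself automated; here the predictors are simply fixed by hand to keep the sketch short.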
In this paper, we investigate the application of early-exit strategies to quantized neural networks with binarized weights, mapped to low-cost FPGA SoC devices. The increasing complexity of network models means that hardware reuse and heterogeneous execution are needed, and this opens the opportunity to evaluate the prediction confidence level early on. We apply the early-exit strategy to a network model suitable for ImageNet classification that combines weights with floating-point and binary arithmetic precision. The experiments show an improvement in inference speed of around 20% using an early-exit network, compared with using a single primary neural network, with a negligible accuracy drop of 1.56%.
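A minimal sketch of the early-exit decision follows, with a hypothetical confidence threshold and placeholder sub-networks; the paper's actual exit criterion and network structure may differ:

```python
import numpy as np

CONFIDENCE_THRESHOLD = 0.9  # assumed value; tuned per deployment

def softmax(logits):
    e = np.exp(logits - logits.max())
    return e / e.sum()

def classify(image, early_stage, full_network):
    """Run the cheap early-exit branch first; fall back to the full
    network only when the early prediction is not confident enough.
    `early_stage` and `full_network` are placeholders for the binarized
    sub-network and the complete model."""
    probs = softmax(early_stage(image))
    if probs.max() >= CONFIDENCE_THRESHOLD:
        return int(probs.argmax())          # confident: exit early
    return int(softmax(full_network(image)).argmax())

# Toy usage with stand-in models producing random logits.
rng = np.random.default_rng(0)
print(classify(None, lambda x: rng.normal(size=10),
               lambda x: rng.normal(size=10)))
```

The speed-up comes from how often the threshold test succeeds: every early exit skips the expensive full-precision stages entirely.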
Non-functional properties, such as energy, time, and security (ETS), are becoming increasingly important for the programming of Cyber-Physical Systems (CPS). This paper describes TeamPlay, a research project funded under the EU Horizon 2020 programme between January 2018 and June 2021. TeamPlay aimed to provide the system designer with a toolchain for developing embedded applications where ETS properties are first-class citizens, allowing the developer to reflect directly on energy, time, and security properties at the source code level. In this paper we give an overview of the TeamPlay methodology, introduce the challenges and solutions of our approach, and summarise the results achieved. Overall, applying the TeamPlay methodology led to improvements of up to 18% in performance and 52% in energy usage over traditional approaches.
This paper presents a methodology for simultaneous heterogeneous computing, named ENEAC, where a quad-core ARM Cortex-A53 CPU works in tandem with a preprogrammed on-board FPGA accelerator. A heterogeneous scheduler distributes the tasks optimally among all the resources, and all compute units run asynchronously, which allows for improved performance on irregular workloads. ENEAC achieves up to a 17% performance improvement when using all platform resources compared to using just the FPGA accelerators, and up to an 865% performance increase compared to using just the CPU. The workflow uses existing commercial tools and C/C++ as a single programming language for both accelerator design and CPU programming, for improved productivity and ease of verification.
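The following is a minimal sketch of the shared-queue dispatch idea, where idle compute units pull the next available task; all worker names and task counts are purely illustrative, and ENEAC's actual scheduler and FPGA interface are not shown:

```python
import queue
import threading

tasks = queue.Queue()

def worker(name, run_task):
    # Each compute unit pulls the next task as soon as it becomes free,
    # so faster units naturally absorb more of an irregular workload.
    while True:
        task = tasks.get()
        if task is None:              # sentinel: stop this worker
            break
        run_task(name, task)

def run_on_cpu(name, task):
    print(f"{name} ran task {task}")

def run_on_fpga(name, task):          # stand-in for a real accelerator call
    print(f"{name} ran task {task}")

units = [threading.Thread(target=worker, args=(f"cpu{i}", run_on_cpu))
         for i in range(4)]           # quad-core Cortex-A53
units.append(threading.Thread(target=worker, args=("fpga0", run_on_fpga)))
for u in units:
    u.start()
for t in range(20):                   # enqueue illustrative tasks
    tasks.put(t)
for _ in units:                       # one sentinel per compute unit
    tasks.put(None)
for u in units:
    u.join()
```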
Graph processing is an area that has received significant attention in recent years due to the substantial expansion of industries relying on data analytics. Alongside its vital role in finding relations in social networks, graph processing is also widely used in transportation to find optimal routes and in biological networks to analyse sequences. The main bottleneck in graph processing is irregular memory access rather than computational intensity. Since computational intensity is not the driving factor, we propose a method to perform graph processing more efficiently at the edge; current cloud computing solutions remain costly and suffer from latency issues. The results demonstrate the benefits of a dedicated sparse graph processing algorithm over dense graph processing when analysing data with low density. As graph datasets grow exponentially, traversal algorithms such as breadth-first search (BFS), fundamental to many graph processing applications and metrics, become more costly to compute. Our work reviews other implementations of breadth-first search designed for low-power systems and proposes a solution that applies a set of enhancements to achieve up to 9.2x better performance in terms of MTEPS than other state-of-the-art solutions, at a power usage of 2.32 W.
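As an illustration of the traversal at the heart of this work, a minimal BFS over a CSR (compressed sparse row) graph is sketched below; the edge count it returns is the quantity behind the MTEPS metric (millions of traversed edges per second, i.e. edges traversed divided by runtime in microseconds). The paper's optimised implementation is, of course, far more elaborate:

```python
from collections import deque

def bfs(row_ptr, col_idx, source):
    """Breadth-first search over a graph stored in CSR form, the layout
    typically used for sparse graph processing. Returns hop distances
    from `source` (-1 = unreachable) and the number of edges traversed."""
    n = len(row_ptr) - 1
    dist = [-1] * n
    dist[source] = 0
    frontier = deque([source])
    edges_traversed = 0
    while frontier:
        u = frontier.popleft()
        for v in col_idx[row_ptr[u]:row_ptr[u + 1]]:
            edges_traversed += 1
            if dist[v] == -1:
                dist[v] = dist[u] + 1
                frontier.append(v)
    return dist, edges_traversed

# Tiny example graph: 0->1, 0->2, 1->3, 2->3
row_ptr = [0, 2, 3, 4, 4]
col_idx = [1, 2, 3, 3]
print(bfs(row_ptr, col_idx, 0))   # ([0, 1, 1, 2], 4)
```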
This paper investigates the application to embedded GPUs of a robust CPU-based power modelling methodology that performs an automatic search of explanatory events derived from performance counters. A 64-bit Tegra TX1 SoC is configured with DVFS enabled, and multiple CUDA benchmarks are used to train and test models optimized for each frequency and voltage point. These optimized models are then compared with a simpler unified model that uses a single set of model coefficients for all frequency and voltage points of interest. To obtain this unified model, a number of experiments are conducted to extract information on idle, clock, and static power, so that power usage can be derived from a single reference equation. The results show that the unified model offers competitive accuracy, with an average 5% error using four explanatory variables on the test data set, and is capable of correctly predicting the impact of voltage, frequency, and temperature on power consumption. This model could be used to replace direct power measurements when these are unavailable due to hardware limitations, or for worst-case analysis in emulation platforms.
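A minimal sketch of what such a single reference equation could look like, assuming the classical CMOS decomposition into a static term plus dynamic terms scaling with V²f; the paper's exact functional form and coefficient values are not reproduced here:

```python
def unified_power(v, f_mhz, event_rates, coeffs, static_power, clock_coeff):
    """Illustrative unified model: static power plus dynamic terms that
    scale with V^2 * f, following classical CMOS power behaviour. This
    only shows how one coefficient set can cover all V/f operating
    points; it is not the paper's fitted equation."""
    dynamic = clock_coeff + sum(b * e for b, e in zip(coeffs, event_rates))
    return static_power + (v ** 2) * f_mhz * dynamic

# Hypothetical coefficients and a 0.9 V / 998 MHz operating point.
print(unified_power(v=0.9, f_mhz=998.0,
                    event_rates=[0.8, 0.02],   # e.g. inst/cycle, misses/cycle
                    coeffs=[1.2e-3, 9.0e-3],
                    static_power=0.35, clock_coeff=6.0e-4))
```

Because voltage and frequency appear explicitly in the equation, a single coefficient set can track power across every DVFS point instead of requiring one model per point.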
Energy modelling can enable energy-aware software development and assist the developer in meeting an application's energy budget. Although many energy models for embedded processors exist, most do not account for processor-specific configurations, nor are they suitable for static energy consumption estimation. This paper introduces a set of comprehensive energy models for Arm's Cortex-M0 processor, ready to support energy-aware development of edge computing applications using either profiling- or static-analysis-based energy consumption estimation. We use a commercially representative physical platform together with a custom-modified Instruction Set Simulator to obtain the physical data and system state markers used to generate the models. The models account for different processor configurations, which all have a significant impact on the execution time and energy consumption of edge computing applications. Unlike existing works, which target a very limited set of applications, all developed models are generated and validated using a wide range of benchmarks from a variety of emerging IoT application areas, including machine learning, and have a prediction error of less than 5%.
Heterogeneous processors, formed by binary-compatible CPU cores with different microarchitectures, enable energy reductions by better matching processing capabilities to software application requirements. This new hardware platform requires novel techniques to manage power and energy in order to fully utilize its capabilities, particularly regarding the mapping of workloads to appropriate cores. In this paper, we validate relevant published work on power modelling for heterogeneous systems and propose a new approach for developing run-time power models that uses a hybrid set of physical predictors, performance events, and CPU state information. We demonstrate the accuracy of this approach compared with the state of the art and its applicability to energy-aware scheduling. Our results are obtained on a commercially available platform built around the Samsung Exynos 5 Octa SoC, which features the ARM big.LITTLE heterogeneous architecture.
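As an illustration of how such run-time models support energy-aware scheduling, the hypothetical sketch below picks the cluster with the lowest predicted energy; the model functions, runtime estimates, and numbers are invented for the example and are not the paper's fitted models:

```python
def pick_cluster(power_models, predicted_runtime, features):
    """Choose the core cluster with the lowest predicted energy.
    `power_models` maps a cluster name to a callable returning predicted
    power (W); `predicted_runtime` maps it to an estimated execution
    time (s). Energy = predicted power * predicted time."""
    energy = {c: power_models[c](features) * predicted_runtime[c]
              for c in power_models}
    return min(energy, key=energy.get), energy

# Toy linear models keyed on instructions per cycle (ipc).
models = {"big":    lambda f: 1.2 + 2.0 * f["ipc"],
          "little": lambda f: 0.3 + 0.5 * f["ipc"]}
runtimes = {"big": 1.0, "little": 2.8}   # LITTLE is slower but frugal
print(pick_cluster(models, runtimes, {"ipc": 0.9}))
```

In this toy case the LITTLE cluster wins despite a 2.8x longer runtime, which is exactly the kind of trade-off a run-time power model lets a scheduler quantify.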
Energy modeling can enable energy-aware software development and assist the developer in meeting an application's energy budget. Although many energy models for embedded processors exist, most do not account for processor-specific configurations, nor are they suitable for static energy consumption estimation. This paper introduces a comprehensive energy model for Arm's Cortex-M0 processor, ready to support energy-aware development of edge computing applications using either profiling- or static-analysis-based energy consumption estimation. The model accounts for the Frequency, PreFetch, and WaitState processor configurations, which all have a significant impact on the execution time and energy consumption of edge computing applications. All model variants have a prediction error of less than 5%.
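A minimal sketch of how a static-analysis flow might consume such a per-configuration model; the power table and all numbers below are invented for illustration, not the paper's fitted model data:

```python
def cortex_m0_energy(cycles, f_hz, config_power_mw):
    """Static-analysis style estimate: energy (J) = average power for
    the active configuration * execution time, where time follows from
    a statically derived cycle count and the clock frequency."""
    return config_power_mw / 1000.0 * (cycles / f_hz)

# Hypothetical average-power table keyed by (frequency MHz, prefetch
# enabled, flash wait states); a real model would be fitted from
# hardware measurements for each configuration.
power_table_mw = {
    (48, True, 1):  14.0,
    (24, False, 2):  6.5,
}

cfg = (48, True, 1)
print(cortex_m0_energy(cycles=1_200_000, f_hz=cfg[0] * 1e6,
                       config_power_mw=power_table_mw[cfg]))
```

This structure makes the configuration dependence explicit: changing PreFetch or WaitState settings changes both the cycle count and the power entry, so the model captures their combined effect on energy.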