Spike-based neuromorphic sensors such as retinas and cochleas change the way in which the world is sampled. Instead of producing data sampled at a constant rate, these sensors output spikes that are asynchronous and event-driven. The event-based nature of neuromorphic sensors implies a complete paradigm shift in current perception algorithms, toward algorithms that emphasize the importance of precise timing. The spikes produced by these sensors usually have a time resolution on the order of microseconds. This high temporal resolution is a crucial factor in learning tasks, and it is also exploited widely by biological neural networks: sound localization, for instance, relies on detecting time lags between the two ears, a mechanism that in the barn owl reaches a temporal resolution of 5 μs. Currently available neuromorphic computation platforms such as SpiNNaker often limit their users to a time resolution on the order of milliseconds, which is not compatible with the asynchronous outputs of neuromorphic sensors. To overcome these limitations and allow the exploration of new types of neuromorphic computing architectures, we introduce a novel software framework on the SpiNNaker platform. This framework allows simulations of spiking networks and plasticity mechanisms to run as a completely asynchronous, event-based scheme with a microsecond time resolution. Results on two example networks using this new implementation are presented.
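To make the event-driven scheme concrete, the sketch below processes spikes from a priority queue of integer microsecond timestamps and updates a leaky integrate-and-fire neuron only when an event arrives, with no fixed time step. All names and constants are illustrative and do not reflect the framework's actual API.

```python
# A minimal sketch of event-driven neuron updates with microsecond
# timestamps; Neuron, TAU_M_US, etc. are illustrative names, not the
# framework's actual interface.
import heapq
import math

TAU_M_US = 20_000.0   # membrane time constant, 20 ms expressed in us
THRESHOLD = 1.0

class Neuron:
    def __init__(self):
        self.v = 0.0           # membrane potential
        self.last_update = 0   # timestamp (us) of the last event

    def receive(self, t_us, weight):
        # Decay the membrane exactly over the elapsed interval, then add
        # the synaptic weight; no clock tick is involved.
        dt = t_us - self.last_update
        self.v *= math.exp(-dt / TAU_M_US)
        self.v += weight
        self.last_update = t_us
        if self.v >= THRESHOLD:
            self.v = 0.0
            return True        # the neuron fires
        return False

# Event queue entries: (timestamp_us, target_neuron_id, weight)
events = [(5, 0, 0.6), (12, 0, 0.6)]   # two input spikes, 7 us apart
heapq.heapify(events)
neurons = [Neuron()]

while events:
    t, target, w = heapq.heappop(events)
    if neurons[target].receive(t, w):
        print(f"neuron {target} spiked at t = {t} us")
```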
Power consumption has become one of the most important concerns in microprocessor design. However, the potential for further power saving in microprocessors with a conventional architecture is limited by their unified architectures and by the maturity of existing low-power techniques. This paper proposes an alternative way to save power: embedding a dataflow coprocessor in a conventional RISC processor. The dataflow coprocessor is designed to execute short code segments very efficiently. Preliminary experimental results show that the dataflow coprocessor can increase the power efficiency of a RISC processor by an order of magnitude.
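The firing rule that lets such a coprocessor execute short code segments efficiently can be illustrated with a toy dataflow interpreter: an instruction runs as soon as all of its operand tokens are available, with no program counter. This is a generic sketch of the dataflow execution model, not the paper's design.

```python
# A toy dataflow interpreter: instructions fire when their operands are
# ready, illustrating the general dataflow model (not the paper's design).
import operator

# (dest, op, src1, src2) -- a short code segment as a dataflow graph
program = [
    ("t1", operator.add, "a", "b"),
    ("t2", operator.mul, "t1", "c"),
]

tokens = {"a": 2, "b": 3, "c": 4}   # initial operand tokens
pending = list(program)

while pending:
    for instr in list(pending):
        dest, op, s1, s2 = instr
        if s1 in tokens and s2 in tokens:   # firing rule: operands ready
            tokens[dest] = op(tokens[s1], tokens[s2])
            pending.remove(instr)

print(tokens["t2"])   # (2 + 3) * 4 = 20
```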
In this paper, we present biologically inspired means to enhance the retrieval of perceptually important information from rank-order encoded images. Validating a retinal model proposed by VanRullen and Thorpe, we observe that, on average, at most 70% of the available information can be retrieved from rank-order encoded images. We propose a biologically inspired treatment to reduce the losses caused by the high correlation of adjacent basis vectors, introducing a filter-overlap correction algorithm (FoCal) based on the lateral inhibition technique used by sensory neurons to deal with data redundancy. We observe a more than 10% increase in the recovery of perceptually important information. Subsequently, we present a model of the primate retinal ganglion cell layout corresponding to the foveal pit. We observe that information recovery using the foveal-pit model is possible only when it is used in tandem with FoCal. Furthermore, information recovery is similar for the foveal-pit model and VanRullen and Thorpe's retinal model when either is used with FoCal, even though the foveal-pit model has four ganglion cell layers, as in biology, while VanRullen and Thorpe's retinal model has a 16-layer structure.
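The core of FoCal, as described, is a greedy rank-order pass with a lateral-inhibition correction for filter overlap. The sketch below shows that loop on a toy filter bank; the filters and their correlations are random stand-ins, not the retinal model's difference-of-Gaussians kernels.

```python
# A schematic sketch of the FoCal idea: coefficients are picked greedily
# in rank order, and after each pick the chosen filter's overlap with its
# neighbours is subtracted (lateral inhibition). Toy filters, not the
# actual retinal basis.
import numpy as np

rng = np.random.default_rng(0)
n = 6
# Toy filter bank: rows are flattened, normalized basis vectors.
filters = rng.normal(size=(n, 16))
filters /= np.linalg.norm(filters, axis=1, keepdims=True)
overlap = filters @ filters.T          # pairwise filter correlations

signal = rng.normal(size=16)
coeffs = filters @ signal              # initial filter responses

ranked = []
remaining = coeffs.copy()
for _ in range(n):
    i = int(np.argmax(np.abs(remaining)))
    c = remaining[i]
    ranked.append((i, c))
    # Lateral-inhibition step: remove this filter's contribution from
    # the responses of all correlated (overlapping) filters.
    remaining -= c * overlap[i]
    remaining[i] = 0.0

print([i for i, _ in ranked])          # rank order after overlap correction
```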
The SpiNNaker (Spiking Neural Network Architecture) project will soon deliver a machine incorporating a million ARM processor cores for real-time modelling of large-scale spiking neural networks. Although the scale of the machine is in the realms of high-performance computing, the technology used to build the machine comes very much from the mobile embedded world, using small integer cores and Network-on-Chip communications both on and between chips. The full machine will use a total of 10 square meters of active silicon area with 57,600 routers using predominantly multicast algorithms to convey real-time spike information through a lightweight asynchronous packet-switched fabric. In this talk I will focus on the NoC aspects, including novel approaches to fault-tolerance and deadlock avoidance.
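The multicast fan-out mentioned here rests on ternary key/mask matching in each router: an entry matches when the packet key agrees with the entry key on every bit selected by the mask, and the route word is a bitmap of output links and local cores. A minimal sketch of that lookup, with made-up table entries, follows.

```python
# A minimal sketch of SpiNNaker-style key/mask multicast lookup; the
# entries and bit assignments below are invented for illustration.
entries = [
    # (key, mask, route) -- low route bits: inter-chip links, high: cores
    (0x00010000, 0xFFFF0000, 0b0000011),   # population 1 -> links 0 and 1
    (0x00020000, 0xFFFF0000, 0b1000000),   # population 2 -> a local core
]

def route(packet_key):
    for key, mask, route_word in entries:
        if (packet_key & mask) == key:
            return route_word        # ternary-CAM hit: fan out on these bits
    return None                      # miss: default routing (not shown)

print(bin(route(0x00010042)))        # a spike from population 1, neuron 0x42
```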
SpiNNaker is a massively parallel architecture designed to model large-scale spiking neural networks in (biological) real time. Its design is based around bespoke multi-core Systems-on-Chip interconnected in a two-dimensional toroidal triangular mesh. Neurons are modeled in software, and their spikes generate packets that propagate through the on- and inter-chip communication fabric, relying on custom-made on-chip multicast routers. This paper models and evaluates large-scale instances of this novel interconnect (more than 65 thousand nodes, or over one million computing cores), focusing on real-time features and fault tolerance. The key contribution is an understanding of the properties of the feasible topologies and the establishment of stable operation of SpiNNaker under different levels of degradation. First we derive analytically the topological characteristics of the network, which are later confirmed by experimental work. With the computational model developed, we investigate the topology of SpiNNaker and compare it with a standard three-dimensional torus. The novel emergency routing mechanism implemented within the routers makes the topology of SpiNNaker more robust than the three-dimensional torus, even though the latter has better topological characteristics. Furthermore, we obtain optimal values for two router parameters related to the livelock and deadlock avoidance mechanisms.
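The topological characteristics derived in the paper follow from the hop-count metric of the triangular torus, which the following sketch reproduces for illustrative (much smaller) dimensions. With links along E/W, N/S and the NE/SW diagonal, a displacement (dx, dy) costs max(|dx|, |dy|) hops when dx and dy share a sign and |dx| + |dy| hops otherwise; on the torus, the shortest wrap-around is taken.

```python
# Hop-count metric of a triangular torus; the 8x8 size is illustrative
# (the real machine is far larger).
X, Y = 8, 8   # torus dimensions

def hops(dx, dy):
    if (dx >= 0) == (dy >= 0):
        return max(abs(dx), abs(dy))    # the diagonal link shortens the path
    return abs(dx) + abs(dy)

def distance(a, b):
    # Try all torus wrap-arounds of the displacement and keep the shortest.
    dx = (b[0] - a[0]) % X
    dy = (b[1] - a[1]) % Y
    return min(hops(ddx, ddy)
               for ddx in (dx, dx - X)
               for ddy in (dy, dy - Y))

print(distance((0, 0), (5, 6)))   # wraps to displacement (-3, -2): 3 hops
```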
SpiNNaker is a biologically inspired, massively parallel computer architecture optimized specifically for modeling large-scale systems of spiking neurons in biological real time. The biological inspiration is manifest in the lightweight inter-processor communications architecture, which enables a “spike” generated by a neuron modeled on one processor to be transmitted to all of the processors modeling neurons to which the source neuron connects, in zero biological time (which we take to be a time much less than a millisecond). The design of such a machine is based upon achieving a balance between the processing power required to execute the neuron and synapse models, the memory hierarchy required to maintain neuron and synapse state sufficiently close to the respective processor, and a communications infrastructure that can meet the latency and bandwidth constraints of the target application.
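The balance described here can be made concrete with a back-of-envelope calculation. The figures below use the commonly quoted SpiNNaker design point of roughly 1,000 neurons per core, each with roughly 1,000 inputs, at a mean firing rate near 10 Hz; they are indicative numbers, not taken from the paper.

```python
# Back-of-envelope estimate of per-core load at the commonly quoted
# SpiNNaker design point; all figures are indicative.
neurons_per_core = 1_000
synapses_per_neuron = 1_000
mean_rate_hz = 10
bytes_per_synaptic_word = 4

events_per_sec = neurons_per_core * synapses_per_neuron * mean_rate_hz
print(f"synaptic events/s per core : {events_per_sec:,}")     # 10,000,000

memory_traffic = events_per_sec * bytes_per_synaptic_word
print(f"synaptic-word traffic      : {memory_traffic / 1e6:.0f} MB/s")

spikes_out_per_sec = neurons_per_core * mean_rate_hz
print(f"spike packets/s per core   : {spikes_out_per_sec:,}")  # 10,000
```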
The memory requirements of deep learning algorithms are considered incompatible with the memory restrictions of energy-efficient hardware. A low memory footprint can be achieved by pruning obsolete connections or by reducing the precision of connection strengths after the network has been trained. However, these techniques are not applicable when neural networks have to be trained directly on hardware under hard memory constraints. Deep Rewiring (DEEP R) is a training algorithm that continuously rewires the network while preserving very sparse connectivity throughout the training procedure. We apply DEEP R to a deep neural network implementation on a prototype chip of the 2nd-generation SpiNNaker system. The local memory of a single core on this chip is limited to 64 KB, and a deep network architecture is trained entirely within this constraint without the use of external memory. Throughout training, the proportion of active connections is limited to 1.3%. On the MNIST handwritten digits dataset, this extremely sparse network achieves 96.6% classification accuracy at convergence. Utilizing the multi-processor feature of the SpiNNaker system, we find very good scaling in terms of computation time, per-core memory consumption, and energy consumption. Compared to an x86 CPU implementation, neural network training on the SpiNNaker 2 prototype improves power and energy consumption by two orders of magnitude.
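A condensed sketch of one DEEP R update step, as the abstract outlines it: active parameters take a noisy gradient step, any parameter that crosses zero is pruned, and an equal number of dormant connections are re-activated at random, so the sparsity level stays fixed. Hyperparameters are illustrative, and the fixed connection signs of the full algorithm are omitted for brevity.

```python
# One illustrative DEEP R rewiring step; learning rate, noise scale and
# sizes are made up, and connection signs are omitted.
import numpy as np

rng = np.random.default_rng(1)
n_params = 10_000
n_active = 130                       # ~1.3% connectivity, as in the paper

theta = np.zeros(n_params)
active = rng.choice(n_params, size=n_active, replace=False)
theta[active] = rng.uniform(0.01, 0.1, size=n_active)

def deep_r_step(theta, active, grad, lr=0.05, noise=0.01):
    # Gradient step plus random-walk noise on the active parameters only.
    theta[active] -= lr * grad[active] + noise * rng.normal(size=active.size)
    dead = active[theta[active] <= 0.0]      # crossed zero: prune
    theta[dead] = 0.0
    survivors = active[theta[active] > 0.0]
    # Re-activate as many dormant connections as were pruned.
    dormant = np.setdiff1d(np.arange(theta.size), survivors)
    reborn = rng.choice(dormant, size=dead.size, replace=False)
    theta[reborn] = 1e-3                     # small initial magnitude
    return np.concatenate([survivors, reborn])

grad = rng.normal(size=n_params)             # stand-in for a real gradient
active = deep_r_step(theta, active, grad)
print("active connections:", active.size)    # stays at 130
```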
While the adult human brain has approximately 8.8 × 10^10 neurons, this number is dwarfed by its 1 × 10^15 synapses. From the point of view of neuromorphic engineering, and of neural simulation in general, this makes the simulation of these synapses a particularly complex problem. SpiNNaker is a digital neuromorphic architecture designed for simulating large-scale spiking neural networks at speeds close to biological real time. Current solutions for simulating spiking neural networks on SpiNNaker are heavily inspired by work on distributed high-performance computing. However, while SpiNNaker shares many characteristics with such distributed systems, its component nodes have much more limited resources and, as the system lacks global synchronization, the computation performed on each node must complete within a fixed time step. We first analyze the performance of the current SpiNNaker neural simulation software and identify several problems that occur when it is used to simulate networks of the type often used to model the cortex, which contain large numbers of sparsely connected synapses. We then present a new, more flexible approach for mapping the simulation of such networks to SpiNNaker which solves many of these problems. Finally, we analyze the performance of our new approach using both benchmarks designed to represent cortical connectivity and larger, functional cortical models. In a benchmark network where neurons receive input from 8,000 STDP synapses, our new approach allows 4× more neurons to be simulated on each SpiNNaker core than was previously possible. We also demonstrate that the largest plastic neural network previously simulated on neuromorphic hardware can be run in real time using our new approach: double the speed that was previously achieved. Additionally, this network contains two types of plastic synapse which previously had to be trained separately but which, using our new approach, can be trained simultaneously.
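The row-per-source-neuron synapse layout at the heart of such simulations can be sketched as follows: an arriving spike selects one synaptic row (fetched by DMA from SDRAM on the real machine), and the row's entries are accumulated into a delay-indexed ring buffer of neuron input. The data structures below are toy ones, not the paper's representation.

```python
# A simplified sketch of spike-driven synaptic-row processing in a
# SpiNNaker-style simulation; sizes and rows are toy values.
import numpy as np

N_LOCAL = 4
# One row per pre-synaptic neuron: (target indices, weights, delays).
synaptic_rows = {
    17: (np.array([0, 2]), np.array([0.5, 0.3]), np.array([1, 3])),
    42: (np.array([1, 2, 3]), np.array([0.2, 0.1, 0.4]), np.array([2, 2, 1])),
}

MAX_DELAY = 8
# Ring buffer of future input, indexed by (delay slot, local neuron).
ring_buffer = np.zeros((MAX_DELAY, N_LOCAL))

def on_spike(source_key, now):
    targets, weights, delays = synaptic_rows[source_key]   # "DMA" the row
    slots = (now + delays) % MAX_DELAY
    np.add.at(ring_buffer, (slots, targets), weights)      # accumulate input

on_spike(17, now=0)
on_spike(42, now=0)
print(ring_buffer[1])   # input landing at t = 1: neurons 0 and 3 get weight
```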
Asynchronous circuits require components that display hazard-free operation under normal input conditions. In addition, quasi-delay-insensitive circuits are based on the assumption of isochronic forks, an assumption that can in practice be compromised by threshold variations due to the use of, for example, dynamic or pseudo-dynamic C-gate circuits. In the paper, the authors investigate the severity of these problems in practical circuits. It is shown that threshold variations are much less significant than has previously been assumed, but hazard-free operation is, by contrast, a much more significant problem. Gates with a stack of transistors in series can exhibit charge-sharing problems under specific input sequences that expose hazards that are not evident in the logic description. A design methodology is proposed which overcomes the charge-sharing problem, resulting in more robust circuits.
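For reference, the intended logic-level behaviour of the C-gate discussed here: the output follows the inputs when they agree and holds its previous state otherwise. Charge sharing is an analogue effect that a behavioural model like this cannot expose, which is precisely why the hazards the paper identifies are not evident from the logic description alone.

```python
# A behavioural model of the Muller C-element; analogue charge-sharing
# hazards are invisible at this level of abstraction.
class CElement:
    def __init__(self):
        self.out = 0

    def step(self, a, b):
        if a == b:            # both inputs agree: output follows them
            self.out = a
        return self.out       # otherwise: state-holding behaviour

c = CElement()
for a, b in [(0, 0), (1, 0), (1, 1), (0, 1), (0, 0)]:
    print(a, b, "->", c.step(a, b))
# 0 0 -> 0 | 1 0 -> 0 | 1 1 -> 1 | 0 1 -> 1 | 0 0 -> 0
```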