Category: Neuromorphic Computing

Address-Event Variable Length Compression for Time-Encoded Data

Illustration of the problem of variable length address-event compression with 3 traces:  At each events’ occurrence time T_n, the encoder outputs a variable-length packet describing the set of ‘addresses’ of the traces.

Problem Description

The information age has relied on digital information processing: audio, video, and text are represented as strings of bits. Biological brains, however, process information in the timing of events, also known as spikes. Time-encoded information underlies many data types of increasing practical importance, such as social network update times, communication network logs, retweet traces, wireless activity sensors, neuromorphic sensors, and synaptic traces from in-brain measurements for brain-computer interfaces. For example, neuromorphic cameras encode information by producing a spike in response to changes in the sensed environment; and neurons in a Spiking Neural Networks (SNNs) compute and communicate via spiking traces in a way that mimics the operation of biological brains.

When time encoded data is processed at a remote site with respect to the location in which the data is produced, the occurrence of events needs to be encoded and transmitted in a timely fashion. This is particularly relevant in SNN chips for which neurons are partitioned into several cores and spikes produced by neurons in a given core need to be conveyed to the recipient neurons in a separate core in order to enable correct processing.  Spikes in SNN’s are encoded into packets through Address Event Representation (AER) protocol. With AER, a packet encoding the occurrence of one or more events is produced at the same time in which the events take place. Thus, the spike timing information is directly carried by the reception of the packet. Therefore, assuming that the packet is detected by the receiver with negligible delay, the packet payload only needs to contain the information about the identity, also referred to as “addresses”, of the “spiking” traces.

A close-up shot of Intel Nahuku board, each of which contains 8 to 32 Intel Loihi neuromorphic chips. (Credit: Tim Herman/Intel Corporation)

In our recent paper accepted for presentation at the IEEE International Symposium on Information Theory and Applications (ISITA 2020), we study the problem of compressing packets generated by an AER-like protocol for generic time-encoded data. This could help alleviate communication bottlenecks in systems that rely on time-encoded data processing.

Suggested Solution

The key idea is that time-encoded traces are characterized by strong correlations both over time and across different traces. These intra-and inter-trace correlations can be harnessed to compress, using variable length codes, the addresses of the event producing traces at a given time.

Towards this, we first model time-encoded data with multiple traces as a discrete-time multi-variate Hawkes process that captures the inter- and intra-trace correlations. This allows formulating the address-event compression problem in terms of the parameters of the discrete-time Hawkes process. Finally, the variable-length compression of packets is achieved through entropy coding via conditional codebooks. The details of the problem modeling can be found here.

Experiment on Real-World Dataset

We implemented the proposed variable-length scheme on a real-world retweet dataset. The dataset consists of retweet sequences, each corresponding to the retweets of an original tweet. Each retweet event in a sequence is marked with the type of user group (‘small’, ‘medium’ or, ‘large’) and with the time (quantized to an integer) elapsed since the original tweet. Accordingly, each sequence can be formatted into 3 discrete-time traces. For our experiments, we sampled 2100 sequences from the data set with 2000 sequences used for training and 100 for testing. The training set is used to fit the parameters of the discrete-time Hawkes process. After training, the test sequences are used to evaluate the average number of bits per event, using the trained variable-length code. We study three scenarios: (i) compression with both inter-and intra-trace correlations; (ii) “compression with an i.i.d. model” which assumes the traces to be independent; and (iii) “compression with intra-trace correlation”, which assumes independent traces that are allowed to correlate across time.  As seen in the figure below, compared to a no-compression scheme that requires approximately 2.8 bits per event, we find that compression with an i.i.d. model requires only 1.22 bits per event, a gain of 57% over no-compression. Further reductions in rates result from compression schemes that assume intra-trace correlation across time, particularly if accounting also for inter-trace correlations.

 

Federated Neuromorphic Computing

Problem
Training state-of-the-art Artificial Neural Network (ANN) models requires distributed computing on large mixed CPU-GPU clusters, typically over many days or weeks, at the expense of massive memory, time, and energy resources, and potentially of privacy violations. Alternative solutions for low-power machine learning on resource-constrained devices have been recently the focus of intense research. In our recently accepted paper at ICASSP 2020, we study the convergence of two such recent lines of inquiries.

On the one hand, Spiking Neural Networks (SNNs) are biologically inspired neural networks in which neurons are dynamic elements processing and communicating via sparse spiking signals over time, rather than via real numbers, enabling the native processing of time-encoded data, e.g., from DVS cameras. They can be implemented on dedicated hardware, offering energy consumptions as low as a few picojoules per spike. A more thorough introduction to probabilistic SNNs can be found in this previous blog post.

On the other hand, Federated Learning (FL) allows devices to carry out collaborative learning without exchanging local data. This makes it possible to train more effective machine learning models by benefiting from data at multiple devices with limited privacy concerns. FL requires devices to periodically exchange information about their local model parameters through a parameter server. It has become de-facto standard for training ANNs over large numbers of distributed devices.

System model

Figure 1 Federated Learning (FL) model under study: Mobile devices collaboratively train on-device SNNs based on different, heterogeneous, and generally unbalanced local data sets, by communicating through a base station (BS).

In our work, as seen in Figure 1, we consider a distributed edge computing architecture in which N mobile devices communicate through a Base Station (BS) in order to perform the collaborative training of local SNN models via FL. Each device holds a different local data set. The goal of FL is to train a common SNN-based model without direct exchange of the data from the local data sets.

FL proceeds in an iterative fashion across T global time-steps. To elaborate, at each global time-step, the devices refine their local model, based on their local datasets. Every τ iterations, they will also transmit their updated local model parameters to the BS, which will in turn compute a centralized averaged parameter and send it back to the devices. This global averaged parameter will be used at the beginning of the next iteration.

An SNN is a network of spiking neurons connected via an arbitrary directed graph, possibly with cycles (see Figure 2). SNNs process information through time, based on a local clock. At each local algorithmic time-step, each neuron receives the signals emitted by the subset of neurons connected to it through directed links, known as synapses. Neurons in the network will then output a binary signal, either ‘0’ or ‘1’. The instantaneous spiking probability of a neuron is determined by its past spiking behaviour and the previous spikes of its pre-synaptic neurons. SNNs are trained over sequences of S local algorithmic time-steps, made of D examples of length S’. In an image classification task, an example could be an image encoded as a binary signal.

Figure 2 Example of an internal architecture for an on-device SNN.

In FL-SNN, we cooperatively train distributed on-device SNNs thanks to Federated Learning. To that end, we derived a novel algorithm, for which the time scales involved are summarized in Figure 3. Each global algorithmic iteration t corresponds to Δs local SNN time-steps, and the total number S of SNN local algorithmic time steps and the number T of global algorithmic time steps during the training procedure are hence related as S = DS’ = T∆s.

Figure 3 Illustration of the time scales involved in the cooperative training of SNNs via FL for τ = 3 and ∆s = 4.

Experiments

We consider a classification task based on the MNIST-DVS dataset. The training dataset is composed of 900 examples per class and the test dataset is composed of 100 samples per class. We consider 2 devices which have access to disjoint subsets of the training dataset. In order to validate the advantages of FL, we assume that the first device has only samples from class ‘1’ and the second only from class ‘7’. We train over D = 400 randomly selected examples from the local data sets, which results in S = DS’ = 32,000 local time-steps.

As a baseline, we consider the test loss at convergence for the separate training of the two SNNs. In Figure 4, we plot the local test loss normalized by the mentioned baseline as a function of the global algorithmic time. A larger communication period τ is seen to impair the learning capabilities of the SNNs, yielding a larger final value of the loss. In fact, for τ = 400, after a number of local iterations without communication, the individual devices are not able to make use of their data to improve performance.

Figure 4 Evolution of the mean test loss during training for different values of the communication period τ. Shaded areas represent standard deviations over 3 trials

One of the major flaws of FL is the communication load incurred by the need to regularly transmit large model parameters. To partially explore this aspect, in the paper, we consider exchanging only a subset of synaptic weights during global iterations. We refer to the text at this link for details.

Compute With Time, Not Over It: An Introduction to Spiking Neural Networks

Problem

Artificial Neural Networks (ANNs) have become the de-facto standard tool to carry out supervised, unsupervised, and reinforcement learning tasks. Their recent successes have built upon various algorithmic advances, but have also heavily relied on the unprecedented availability of computing power and memory in data centers and cloud computing platforms. The resulting considerable energy requirements run counter to the constraints imposed by implementations on low-power mobile or embedded devices for applications such as personal health monitoring or neural prosthetics.

How can the human brain perform general and complex tasks at a minute fraction of the power required by state-of-the-art supercomputers and ANN-based models? Neurons in the human brain are different from those in an ANN: they process and communicate using sparse spiking signals over time, rather than real numbers; and they are dynamic devices, rather than static non-linearites (see, Figure 1). Taking inspiration from this observation, Spiking Neural Networks (SNNs) have been introduced in the theoretical neuroscience literature as networks of dynamic spiking neurons that enables efficient on-line inference learning. SNNs have the unique capability to process information encoded in the timing of spikes, with the energy per spike being as a few picojoules. Proof-of-concept and commercial hardware implementations of SNNs (e.g., Intel, IBM) have demonstrated orders-of-magnitude improvements in terms of energy efficiency over ANNs.

Figure 1. Illustration of neural networks: (left) an ANN, where each neuron processes real numbers; and (right) an SNN, where dynamic spiking neurons process and communicate binary sparse spiking signals over time.

The most common SNN model consists of a network of neurons with deterministic dynamics, e.g., leaky-integrate-and-fire model, whereby a spike is emitted as soon as an internal state variable, known as membrane potential, crosses a given threshold value. Learning problems should be formulated as the minimization of a loss function that directly accounts for the timing of the spikes emitted by the neurons. While this minimization can be done using Stochastic Gradient Descent (SGD) as for ANNs, it is made challenging by the non-differentiability of the behavior of spiking neurons with respect to the synaptic weights. In contrast to deterministic models, a probabilistic model for SNNs defines the outputs of all spiking neurons as differentiable joint distributed binary random processes. A probabilistic viewpoint has hence significant analytic advantages in that we can apply flexible learning rules from the principled learning criteria such as likelihood and mutual information.

Some Results

Our recent work published on IEEE Signal Processing Magazine (SPM) Special Issue on Learning Algorithms and Signal Processing for Brain-Inspired Computing provides a review on the topic of probabilistic SNNs with a specific focus on the most commonly used Generalized Linear Models (GLMs) by covering probabilistic models, learning rules, and applications.

Figure 2. Illustration of the neurons with probabilistic dynamics with exponential feedforward and feedback kernels.

As illustrated in Figure 2, in a GLM, any post-synaptic neuron i receives the signals emitted by pre-synaptic neurons through synapses. Its internal state, or the probability to spike, is defined by membrane potential, which is the sum of contributions from the incoming spikes of the pre-synaptic neurons and from the past spiking behavior of the neuron itself, where both contributions are filtered by feedforward and feedback kernels, respectively. Under the GLM, the gradient of the log-likelihood of the spiking signals depends on the difference between the desired spiking behavior and its average behavior under the model.

SNNs can be trained using supervised, unsupervised, and reinforcement learning, by following a learning rule. This defines how the model parameters are updated on the basis of the available observations – in a batch mode or in an on-line fashion. Our work derives Maximum Likelihood learning rules using SGD in a batch and on-line mode, for both fully observed and partially observed SNNs. The learning rules can be interpreted in light of the general form of the three-factor rule; the synaptic weight wj,i from pre-synaptic neuron j to a post-synaptic neuron i is updated as wj,i ← wj,i + η × ℓ × pre(j) × post(i), where η is a learning rate; is a scalar global learning signal which is absent in case of fully observed SNNs; pre(j) is given by the filtered feedforward trace of the pre-synaptic neuron j; and post(i) is given by the error term of the post-synaptic neuron i, appeared in the gradient above. In case of partially observed SNNs, variational inference is needed to approximate the true posterior distribution by means of variational posterior. With a feedforward distribution for the variational posterior, we derive the learning rule using doubly SGD, whereby the global learning signal is obtained by sampling spike signals of unobserved neurons.

Figure 3. On-line prediction task based on an SNN with 9 visible and 2 hidden neurons; (left, top) real, analog time signal (dashed) and predicted, decoded signal (solid); (left, bottom) total number of spikes emitted by the SNN; and (right) spike raster plot of the SNN.

Experiments on an on-line prediction task allowed us to observe the potential of SNNs for ‘always-on’ event-driven applications. The SNN observes a time sequence and is trained to predict the next value of sequence given the observation of the previous values, where the time sequence is encoded in the spike domain with ΔT spike samples per each value of the sequence. In Figure 3, the SNN is seen to be able to provide an accurate prediction (left, top) with the corresponding number of spikes (left, bottom) and spikes emitted by the SNN (right). To demonstrate the efficiency benefits of SNNs that may arise from their unique time encoding capabilities, we also compare the prediction error and the number of spikes, with rate and time coding schemes.

Please refer to the full paper at IEEE Xplore (open access: arXiv) for details. The tutorial for learning algorithms and signal processing for brain-inspired computing can be found at IEEE Xplore.