Safe and Data-Efficient Control and Monitoring of Wireless Networks using a Digital Twin

A digital twin (DT) consists of a high-fidelity virtual replica of a physical entity, the physical twin (PT), such that, in the words of DT pioneer Michael Grieves, “at its optimum, any information that could be obtained from inspecting a physical manufactured product can be obtained from its digital twin” [1]. Based on a fully automated bi-directional flow of information [2], the DT uses the data collected from the physical world to maintain an up-to-date model of the PT, which in turn provides command and analysis functionalities to the PT (see Fig. 1). With the ever-growing demand for communication resources, next-generation wireless networks will be required to adapt to a large number of scenarios, and DT platforms are increasingly seen as a promising data-driven solution to build intelligent wireless systems that can offer the necessary flexibility and responsiveness.

As depicted in Fig. 1, in a recent work to be presented at the IEEE International Conference on Communications, we consider a DT that autonomously learns a model of a wireless network, providing a safe sandbox environment for network optimization and analysis, while also enabling monitoring and prediction features. The main motivation for our work stems from the realization that, in real-world scenarios, it is challenging to transfer sufficient data to and from the PT in a way that “any information that could be obtained from inspecting the PT can be obtained from its DT”. In light of this, we propose to leverage Bayesian methods to learn a DT model that is aware of “what it knows” as much as it is aware of “what it does not know”, thereby taking into account the epistemic uncertainty arising from limited PT-to-DT communication [3].

 

Figure 1 – The digital twin (DT) platform for the control and analysis of the communication system studied in this work. The physical twin (PT) consists of a group of K devices receiving correlated data and communicating over a shared multi-packet reception (MPR) channel. The DT platform operates along the phases of model learning (step 1) and policy optimization (step 2), while also enabling functionalities such as prediction, counterfactual analysis and monitoring (step 3).

 

The Physical Twin

We consider a PT system made of a group of devices, referred to as agents, that attempt to communicate with a single base station (BS) over a shared multi-packet reception (MPR) channel [4]. Each agent is equipped with a limited-capacity buffer, and packet generation is taken to be correlated in time and across agents. At any given time slot, each agent can take an action to decide whether or not to transmit a packet from its buffer, which can be received at the BS depending on the MPR channel dynamics. Upon packet reception, the BS transmits acknowledgement signals to the corresponding agents before the next time slot.

Agents cannot communicate with each other and each agent can only sense its local state, which contains information about its packet generation, buffer occupancy and BS feedback at a given time. Given the collective states and actions of all agents, the PT system evolves to a new state according to a transition distribution that is unknown to the DT and describes the packet-generation, buffer and channel dynamics.

The Digital Twin

Model Learning

During model learning (step 1 in Fig. 1), the DT leverages sequences of states and actions collected from the PT to learn a parametric Bayesian model of the transition distribution. As opposed to frequentist learning, which only keeps the most probable model parameter, Bayesian learning keeps a (possibly infinite) ensemble of models, where the probability of each model is given by a posterior distribution. Given that all state variables are discrete, we represent the transition distribution using a categorical model and learn the corresponding posterior using the conjugate Dirichlet distribution [3]. In order to lower the spatial complexity of the model, we leverage prior information available at the DT about state transitions like data-generation clusters, known buffer dynamics, and symmetry of the MPR channel.
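The conjugate Dirichlet-categorical update mentioned above can be sketched as follows. This is a minimal illustration, assuming integer-indexed states and actions and a symmetric prior; the function names, the `prior_alpha` parameter and the flat `(state, action, next_state)` encoding are hypothetical, and the sketch does not include the structural prior information (clusters, buffer dynamics, channel symmetry) used in the actual work.

```python
import numpy as np

def dirichlet_posterior(transitions, num_states, num_actions, prior_alpha=1.0):
    """Return Dirichlet concentration parameters alpha[s, a, s'].

    transitions: iterable of (state, action, next_state) integer triples
    observed at the PT. prior_alpha is an assumed symmetric prior.
    """
    alpha = np.full((num_states, num_actions, num_states), prior_alpha, dtype=float)
    for s, a, s_next in transitions:
        # Conjugacy: each observed transition adds one pseudo-count.
        alpha[s, a, s_next] += 1.0
    return alpha

def posterior_mean(alpha):
    """Average transition distribution under the Dirichlet posterior."""
    return alpha / alpha.sum(axis=-1, keepdims=True)
```

Because the posterior is fully described by the count tensor, the DT only needs to store and update these pseudo-counts as new data arrives from the PT.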

Policy Optimization: Safely Learn by Trial and Error

A medium access control (MAC) protocol at the PT can be established by providing each agent with a policy distribution that maps the sequence of locally observed states and actions into a new action. Using the learned model, we can safely assess new policies in virtual space by defining a reward function that yields positive values for desired behavior (e.g., successfully delivered packets) and negative penalties for undesired behavior (e.g., buffer overflow). Policy optimization (step 2 in Fig. 1) aims at providing an optimal policy to each agent that maximizes the expected sum of future rewards. This amounts to a Decentralized Markov Decision Process [5] problem that we tackle using the COunterfactual Multi-Agent (COMA) algorithm proposed in [6], in which we periodically sample a new transition distribution from the model posterior during training.
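To illustrate the idea of periodically sampling a transition distribution from the posterior, the sketch below draws one model from a Dirichlet posterior and rolls out a policy inside it. It is not an implementation of COMA: the single-agent `policy` and `reward` callables, the tensor layout `alpha[s, a, s']` and all function names are assumptions made for illustration only.

```python
import numpy as np

def sample_transition_model(alpha, rng):
    """Draw one transition tensor P[s, a, s'] from the Dirichlet posterior."""
    num_states, num_actions, _ = alpha.shape
    model = np.empty_like(alpha, dtype=float)
    for s in range(num_states):
        for a in range(num_actions):
            model[s, a] = rng.dirichlet(alpha[s, a])
    return model

def rollout_return(model, policy, reward, start_state, horizon, rng):
    """Simulate one virtual episode in the sampled model and sum the rewards."""
    state, total = start_state, 0.0
    for _ in range(horizon):
        action = policy(state)
        next_state = rng.choice(model.shape[-1], p=model[state, action])
        total += reward(state, action, next_state)
        state = next_state
    return total
```

In a training loop, one would call `sample_transition_model` again every few episodes, so that the policy is exposed to the whole range of models that remain plausible given the data, rather than to a single point estimate.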

Monitoring: Let’s Agree to Disagree

After an initial model learning phase, the DT can provide monitoring features by checking whether newly received data fits previously observed transitions, or whether it instead provides evidence of changed dynamics or anomalous behavior (step 3 in Fig. 1). To this end, we use a disagreement-based test metric that measures the extent to which the models in the Bayesian ensemble agree on the likelihood of the newly observed data. A large disagreement is taken as evidence of a large epistemic uncertainty compared to model-learning conditions, which in turn can indicate that the observation is anomalous.
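The post does not spell out the exact form of the disagreement metric, so the following is only one possible instantiation, assuming a Dirichlet posterior `alpha[s, a, s']`: sample an ensemble of models from the posterior, score the new transitions under each, and use the variance of the log-likelihoods across the ensemble as the test statistic. The function names, the ensemble size and the use of the variance are all assumptions.

```python
import numpy as np

def sample_model(alpha, rng):
    """Draw one full transition tensor from the Dirichlet posterior."""
    flat = alpha.reshape(-1, alpha.shape[-1])
    rows = np.stack([rng.dirichlet(row) for row in flat])
    return rows.reshape(alpha.shape)

def disagreement_score(alpha, transitions, num_models=200, seed=0):
    """Variance, across sampled models, of the log-likelihood of new data."""
    rng = np.random.default_rng(seed)
    log_liks = np.empty(num_models)
    for m in range(num_models):
        model = sample_model(alpha, rng)
        # Floor the probabilities to avoid log(0) on extreme samples.
        log_liks[m] = sum(
            np.log(max(model[s, a, s_next], 1e-12)) for s, a, s_next in transitions
        )
    return float(np.var(log_liks))
```

Under this sketch, transitions that resemble the model-learning data yield nearly identical likelihoods across the ensemble and hence a small score, while transitions that were rarely or never observed yield a large one. A detector would compare the score against a threshold chosen to meet a target false-alarm rate.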

Results

We evaluate the proposed DT platform on a simulated scenario consisting of 4 devices distributed across 2 data-generation clusters. The MPR channel allows for the successful delivery of one or two simultaneous packets, while more than two simultaneous transmissions cause the loss of all packets. Each device is equipped with a buffer with single-packet capacity.

During policy optimization, we reward successfully delivered packets, while we penalize buffer overflows, caused by generating a new packet on an already full buffer. We analyze the performance of the policy trained inside the Bayesian model across different sizes of model-learning datasets, and compare it to a policy trained inside the corresponding maximum a posteriori (MAP) frequentist model, and to an oracle-aided policy that is trained using the ground-truth transition distribution.

 

 

Figure 2 – Throughput and buffer overflow probability as a function of the size of the dataset available in the model learning phase for the proposed Bayesian model-based approach (orange), as well as the oracle-aided model-free (blue) and frequentist model-based (green) benchmarks.

From Fig. 2, we observe that, in regimes with high data availability during the model learning phase, both Bayesian and frequentist model-based methods yield policies with similar performance to the oracle-aided benchmark. In the low-data regime, however, Bayesian learning achieves superior performance compared to its frequentist counterpart.

To assess the performance of anomaly detection, we assume that an anomalous event occurs where a device is disconnected, resulting in an anomalous packet-generation distribution in the corresponding cluster. We compare the performance of the disagreement metric using the Bayesian model to a log-likelihood criterion using the frequentist MAP model for model-learning datasets comprising 20 and 50 transitions, and report the results as receiver operating characteristic (ROC) curves in Fig. 3.

Figure 3 – Receiver operating characteristic (ROC) curves of the Bayesian (orange) and frequentist (green) anomaly detection tests for model-learning dataset sizes comprising 20 (solid lines) and 50 (dashed lines) transitions.

Bayesian anomaly detection is observed to uniformly outperform its frequentist counterpart, achieving a higher area under the ROC curve in Fig. 3.

 

For a more formal presentation of our proposed Bayesian framework for wireless network DTs and more details on the experimental procedure, please refer to our paper at this link and to the extended version at this link.

References

[1] M. Grieves and J. Vickers, “Digital twin: Mitigating unpredictable, undesirable emergent behavior in complex systems,” in Transdisciplinary perspectives on complex systems. Springer, 2017, pp. 85–113.

[2] W. Kritzinger, M. Karner, G. Traar, J. Henjes, and W. Sihn, “Digital twin in manufacturing: A categorical literature review and classification,” IFAC-PapersOnLine, vol. 51, no. 11, pp. 1016–1022, 2018.

[3] O. Simeone, Machine Learning for Engineers. Cambridge University Press, 2022.

[4] L. Tong, Q. Zhao, and G. Mergen, “Multipacket reception in random access wireless networks: From signal processing to optimal medium access control,” IEEE Communications Magazine, vol. 39, no. 11, pp. 108–112, 2001.

[5] F. A. Oliehoek and C. Amato, A concise introduction to decentralized POMDPs. Springer, 2016.

[6] J. Foerster, G. Farquhar, T. Afouras, N. Nardelli, and S. Whiteson, “Counterfactual multi-agent policy gradients,” in Proceedings of the AAAI conference on artificial intelligence, vol. 32, no. 1, 2018.

Neuromorphic Integrated Sensing and Communications

Integrated sensing and communications (ISAC), a key enabling technology for 6G systems, leverages shared radio resources and hardware to realize the functions of sensing and communication. As an example of an application that can benefit from ISAC, consider the inter-vehicle communication scenario in Fig. 1. In it, a car wishes to send a message to a second car, while also enabling the latter to detect the presence of a possible target, e.g., a pedestrian. While conventional systems would use two separate radio resources for data transmission and radar detection, ISAC solutions reuse the same transmitted waveform for the dual role of carrier of digital information and radar signal [1]. A natural radio interface to serve this dual function is impulse radio (IR), also known as ultrawideband (UWB). In fact, IR encodes information in the timing of pulses, which can in turn be repurposed for radar detection [2].

Fig. 1. Illustration of a neuromorphic ISAC system, in which the same IR (or UWB) signal is used for transmission and radar detection of the presence of a target. The key novel element is the use of neuromorphic computing at the ISAC receiver to simultaneously demodulate digital data and provide an online estimate of the presence or absence of the radar target.

Neuromorphic sensing and computing are emerging as alternative, brain-inspired, paradigms for efficient data collection and semantic signal processing [3]. The main features of this technology are energy efficiency, native event-driven processing of time-varying semantic sources, spike-based computing, and always-on on-hardware adaptation [4]. Neuromorphic processors, also known as spiking neural networks (SNNs), are networks of dynamic spiking neurons that mimic the operation of biological neurons. When implemented on specialized — digital or mixed analog-digital — hardware or on tailored FPGA configurations, SNNs have minimal idle and operating energy cost, and consume as little as a few picojoules per spike [5].

The integration of IR and neuromorphic computing was investigated in our recent works [6, 7], which proposed an end-to-end neuromorphic architecture for remote inference that replaces traditional digital blocks with SNNs as encoder and decoder.

Our work

With the aim of reducing energy consumption and facilitating online and always-on operation on specialized hardware, as illustrated in Fig. 1, we propose to leverage the synergy between IR transmission and neuromorphic computing to realize efficient ISAC systems. The neuromorphic ISAC (N-ISAC) receiver is able to leverage spiking neural network (SNN)-based processing to demodulate digital information and detect the radar signal.

As illustrated in Fig. 2, we consider an ISAC system in which digital communication and radar sensing leverage the same IR transmitted signal. In order to efficiently and simultaneously decode the digital data and detect the possible presence of a target at a known delay cell, the receiver processes the received signal via an SNN. Technical details can be found in our paper at this link.

Fig. 2. N-ISAC: Digital data is transmitted by an IR transmitter via pulse-position modulation (PPM), while the receiver simultaneously decodes the digital data and performs radar detection by means of an SNN, which can be efficiently implemented on neuromorphic hardware.
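As a reminder of how pulse-position modulation encodes information in the timing of pulses, here is a minimal noise-free sketch. The function names and the one-pulse-per-frame slot layout are assumptions for illustration, and the simple argmax demodulator below merely stands in for the SNN receiver, which is not reproduced here.

```python
import numpy as np

def ppm_modulate(symbols, positions_per_frame):
    """Map each symbol k in {0, ..., L-1} to a frame of L slots with one pulse in slot k."""
    symbols = np.asarray(symbols)
    frames = np.zeros((symbols.size, positions_per_frame), dtype=np.uint8)
    frames[np.arange(symbols.size), symbols] = 1
    return frames.reshape(-1)

def ppm_demodulate(signal, positions_per_frame):
    """Recover symbols by locating the strongest slot in each frame."""
    frames = np.asarray(signal).reshape(-1, positions_per_frame)
    return frames.argmax(axis=1)
```

Since each frame contains a single pulse, the transmitted signal is sparse in time, which is the property that an event-driven receiver can exploit.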

Results

We compare the proposed N-ISAC system with a conventional separate sensing and communications (SSAC) scheme, which divides the transmission slots into slots used for transmission and slots used for sensing. For SSAC, two SNNs are implemented at the receiver, one performing data decoding for the transmission slots, and the other responsible for radar sensing in the sensing slots.

To evaluate the performance of our system, we adopt the following performance metrics for data transmission and radar sensing: 1) Normalized test throughput, i.e., the ratio of the average number of correctly decoded bits over the total number of time slots; 2) Radar test detection error, i.e., the probability that the sensing decision is not correctly taken upon processing all time slots.

In Fig. 3, we plot the normalized test throughput versus the radar test detection error for ISAC and SSAC. For the ISAC scheme, we vary a hyperparameter β dictating the relative weight in the design criterion in favor of communications; for SSAC we vary the fraction α of slots allocated to communications. As β increases, more priority is given by ISAC to communication over radar detection; and, similarly, as α increases, SSAC assigns more slots to communications. The performance of ISAC with an SNN having 10 hidden neurons is essentially independent of β for any 0.25 < β < 0.75. A first observation is that, for SSAC, there is a trade-off between communication and sensing performance levels caused by the slot allocation. A similar trade-off is also observed for ISAC when using an SNN with 6 hidden neurons. This is due to the limited capacity of the shared common hidden layer of the SNN. In contrast, when 10 hidden neurons are available at the SNN, ISAC is seen to optimize both data decoding and target sensing performance, obtaining significant gains over SSAC.

Fig. 3. Normalized test throughput versus radar test detection error for ISAC and SSAC.

Fig. 4 illustrates how the SNN receiver can leverage the temporal sparsity of the IR signals to enhance energy efficiency. In this regard, we recall that energy consumption in an SNN is essentially proportional to the number of spikes produced by the SNN, given the extremely low idle energy of neuromorphic chips [8]. The top panel shows the transmitted IR signal consisting of two frames of transmitted signals, separated by an idle frame with a duration of 20 slots. We observe that in the idle frame, the spike count is significantly reduced, showing that the neuromorphic receiver can adjust its energy consumption to the activity level of the transmitter.
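The link between input activity and spike count can be illustrated with a single leaky integrate-and-fire (LIF) neuron in discrete time. This is a toy sketch, not the SNN used in the experiments: the `weight`, `decay` and `threshold` values and the input pattern in the usage note are arbitrary assumptions.

```python
import numpy as np

def lif_spikes(inputs, weight=0.6, decay=0.8, threshold=1.0):
    """Simulate one LIF neuron and return its binary spike train."""
    v = 0.0  # membrane potential
    spikes = np.zeros(len(inputs), dtype=np.uint8)
    for t, x in enumerate(inputs):
        v = decay * v + weight * x  # leak, then integrate the input pulse
        if v >= threshold:
            spikes[t] = 1
            v = 0.0  # reset after emitting a spike
    return spikes
```

Feeding this neuron an active frame such as `[1, 1, 0, 0, 1, 1, 0, 0]`, followed by 20 idle slots and a second active frame, produces spikes only while pulses arrive. With no input, the membrane potential simply decays and no spikes are emitted, so the spike count, and with it the energy proxy, follows the transmitter's activity.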

Fig. 4. Top: Transmitted signal consisting of two frames in which the transmitter is active separated by an idle frame. Bottom: Corresponding spike count for the SNN.

References

[1] S. Jeong, O. Simeone, A. Haimovich, and J. Kang, “Beamforming design for joint localization and data transmission in distributed antenna system,” IEEE Transactions on Vehicular Technology, vol. 64, no. 1, pp. 62–76, 2014.

[2] A. Nezirovic, A. G. Yarovoy, and L. P. Ligthart, “Signal processing for improved detection of trapped victims using UWB radar,” IEEE Transactions on Geoscience and Remote Sensing, vol. 48, no. 4, pp. 2005–2014, 2009.

[3] A. Mehonic and A. J. Kenyon, “Brain-inspired computing needs a master plan,” Nature, vol. 604, no. 7905, pp. 255–260, 2022.

[4] M. Davies, A. Wild, G. Orchard, Y. Sandamirskaya, G. A. F. Guerra, P. Joshi, P. Plank, and S. R. Risbud, “Advancing neuromorphic computing with Loihi: a survey of results and outlook,” Proceedings of the IEEE, vol. 109, no. 5, pp. 911–934, 2021.

[5] B. Rajendran, A. Sebastian, M. Schmuker, N. Srinivasa, and E. Eleftheriou, “Low-power neuromorphic hardware for signal processing applications: a review of architectural and system-level design approaches,” IEEE Signal Processing Magazine, vol. 36, no. 6, pp. 97–110, 2019.

[6] N. Skatchkovsky, H. Jang, and O. Simeone, “End-to-end learning of neuromorphic wireless systems for low-power edge artificial intelligence,” in Proc. Asilomar Conference on Signals, Systems, and Computers, pp. 166–173, 2020.

[7] J. Chen, N. Skatchkovsky, and O. Simeone, “Neuromorphic wireless cognition: event-driven semantic communications for remote inference,” arXiv preprint arXiv:2206.06047, 2022.

[8] M. Davies, N. Srinivasa, T.-H. Lin, G. Chinya, Y. Cao, S. H. Choday, G. Dimou, P. Joshi, N. Imam, S. Jain et al., “Loihi: A neuromorphic manycore processor with on-chip learning,” IEEE Micro, vol. 38, no. 1, pp. 82–99, 2018.

 

 

Learning to Learn How to Calibrate

As discussed in our previous post ‘Is Accuracy Sufficient for AI in 6G? (No, Calibration is Equally Important)’, reliable AI should be able to quantify its uncertainty, i.e., to “know when it knows” and “know when it does not know”. To obtain reliable, or well-calibrated, AI models, two types of approaches can be adopted: (i) training-based calibration, and (ii) post-hoc calibration. Training-based calibration modifies the training procedure by accounting for calibration performance, and includes methods such as Bayesian learning [1, 2], robust Bayesian learning [3, 4], and calibration-aware regularization [5]; while post-hoc calibration utilizes validation data to “recalibrate” a probabilistic model, as in temperature scaling [6], Platt scaling [7], and isotonic regression [8]. All these methods have no formal guarantees on calibration, either due to inevitable model misspecification [9], or due to overfitting to the validation set [10, 11]. In contrast, conformal prediction (CP) offers formal calibration guarantees, although calibration is defined in terms of set, rather than probabilistic, prediction [12].

Fig. 1. Improvements in calibration can be obtained by either (i) training-based calibration or (ii) post-hoc calibration. Only conformal prediction, a post-hoc calibration approach, provides formal guarantees on calibration via set prediction.

A well-calibrated set predictor is one that contains the true label with probability no smaller than a predetermined coverage level, say 90%. A set predictor obtained by conformal prediction is provably well calibrated, irrespective of the unknown underlying ground-truth distribution, as long as the data examples are exchangeable, a condition that includes the i.i.d. (independent and identically distributed) case.
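For concreteness, the validation-based variant of conformal prediction can be sketched in a few lines. The sketch assumes a classifier that outputs class probabilities and uses the nonconformity score 1 - p(y|x); the function names are hypothetical, and `alpha` denotes the target miscoverage level (e.g., 0.1 for 90% coverage).

```python
import numpy as np

def conformal_threshold(cal_scores, alpha):
    """Score threshold computed from n held-out calibration examples.

    cal_scores: nonconformity scores of the true labels on the calibration set.
    """
    n = len(cal_scores)
    rank = int(np.ceil((n + 1) * (1 - alpha)))  # 1-indexed order statistic
    if rank > n:
        # Too few calibration points: only the full label set is guaranteed valid.
        return np.inf
    return np.sort(np.asarray(cal_scores, dtype=float))[rank - 1]

def prediction_set(probs, threshold):
    """Labels whose nonconformity score 1 - p(y|x) does not exceed the threshold."""
    scores = 1.0 - np.asarray(probs, dtype=float)
    return np.flatnonzero(scores <= threshold)
```

The `(n + 1)` correction in the rank is what turns the empirical quantile into a finite-sample guarantee under exchangeability. It also shows why few calibration examples lead to large sets: when `n` is small the threshold moves toward the largest scores, or becomes infinite.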

One could trivially build a well-calibrated set predictor by producing the entire label set as the predicted set. However, such a set predictor would be completely uninformative, since the size of the predicted set determines how informative the set predictor is. While conformal prediction is always guaranteed to yield reliable set predictors, it may produce large predicted sets in the presence of limited data examples [13]. In our recent work, presented at the NeurIPS 2022 Workshop on Meta-Learning, we have introduced a novel method that enhances the informativeness of CP-based set predictors via meta-learning.

Fig. 2. Meta-learning transfers knowledge from multiple tasks. In our recent paper, we have proposed an application of meta-learning to conformal prediction with the aim of reducing the average prediction set size while preserving formal calibration guarantees.

Meta-learning, or learning to learn, transfers knowledge from multiple tasks to optimize the inductive bias (e.g., the model class) for new, related, tasks [14]. In our recent work, meta-learning was applied to cross-validation-based conformal prediction (XB-CP) [13] to achieve well-calibrated and informative set predictors. As demonstrated in the following figure, the proposed meta-learning approach for XB-CP, termed meta-XB, can reduce the average prediction set size as compared to conventional CP approaches (XB-CP and validation-based conformal prediction (VB-CP) [12]) and to previous work on meta-learning for VB-CP [15], while preserving the formal guarantees on reliability (the predetermined coverage level, 90%, is always satisfied for meta-XB).

Fig. 3. Average prediction set size (left) and coverage (right) for new tasks as a function of the number of meta-training tasks. As compared to conventional CP schemes (VB-CP and XB-CP), meta-learning-based approaches (meta-VB and meta-XB) have smaller prediction set size, while the proposed meta-XB guarantees reliability for every task, unlike meta-VB, which satisfies the coverage condition only on average over multiple tasks.

For more details, including improvements in terms of input-conditional coverage via meta-learning with adaptive nonconformity scores [16], and further experimental results on image classification and communication engineering aspects, please refer to the arXiv posting.

References

[1] O. Simeone, Machine Learning for Engineers. Cambridge University Press, 2022.

[2] J. Knoblauch, et al., “Generalized variational inference: Three arguments for deriving new posteriors,” arXiv:1904.02063, 2019.

[3] W. Morningstar, et al., “PACm-Bayes: Narrowing the empirical risk gap in the Misspecified Bayesian Regime,” NeurIPS 2021.

[4] M. Zecchin, et al., “Robust PACm: Training ensemble models under model misspecification and outliers,” arXiv:2203.01859, 2022.

[5] A. Kumar, et al., “Trainable calibration measures for neural networks from kernel mean embeddings,” ICML 2018.

[6] C. Guo, et al., “On calibration of modern neural networks,” ICML 2017.

[7] J. Platt, “Probabilistic outputs for support vector machines and comparisons to regularized likelihood methods,” Advances in Large Margin Classifiers, 1999.

[8] B. Zadrozny and C. Elkan, “Transforming classifier scores into accurate multiclass probability estimates,” KDD 2002.

[9] A. Masegosa, “Learning under model misspecification: Applications to variational and ensemble methods,” NeurIPS 2020.

[10] A. Kumar, et al., “Verified Uncertainty Calibration,” NeurIPS 2019.

[11] X. Ma and M. B. Blaschko, “Meta-Cal: Well-controlled Post-hoc Calibration by Ranking,” ICML 2021.

[12] V. Vovk, et al., “Algorithmic Learning in a Random World,” Springer, 2005.

[13] R. F. Barber, et al., “Predictive inference with the jackknife+,” The Annals of Statistics, 2021.

[14] L. Chen, et al., “Learning with limited samples -- Meta-learning and applications to communication systems,” arXiv:2210.02515, 2022.

[15] A. Fisch, et al., “Few-shot conformal prediction with auxiliary tasks,” ICML 2021.

[16] Y. Romano, et al., “Classification with valid and adaptive coverage,” NeurIPS 2020.

 

Life-long brain-inspired learning that knows what it does not know

An emerging research topic in artificial intelligence (AI) consists in designing systems that take inspiration from biological brains. This is notably driven by the fact that, although most AI algorithms are becoming more and more effective (for instance, for image generation), this comes at a cost. Indeed, with architectures constantly growing in size, training a single large neural network today consumes a prohibitive amount of energy. In contrast, human brains consume only about 12 W and yet exhibit impressive capabilities, such as life-long learning.

By taking a Bayesian perspective, we demonstrate in our latest work how biologically inspired spiking neural networks (SNNs) can exhibit learning mechanisms similar to those applied in brains, which allow them to perform continual learning. As we will see, the technique also addresses a key challenge in deep learning, namely obtaining well-calibrated solutions in the face of previously unseen data.

 

Our work

Bayesian Learning

As seen in Fig. 1, we propose to equip each synaptic weight in the SNN with a probability distribution. The distribution captures the epistemic uncertainty induced by the lack of knowledge of the true distribution of the data. This is done by assigning probabilities to model parameters that fit the data equally well, while also being consistent with prior knowledge. As a consequence, Bayesian learning is known to produce better calibrated decisions, i.e., decisions whose associated confidence better reflects the actual accuracy of the decision.

This contrasts with frequentist learning, in which the vector of synaptic weights is optimized by minimizing a training loss. The training loss is adopted as a proxy for the population loss, i.e., for the loss averaged over the true, unknown, distribution of the data. Therefore, frequentist learning disregards the inherent uncertainty caused by the availability of limited training data, which causes the training loss to be a potentially inaccurate estimate of the population loss. As a result, frequentist learning is known to potentially yield poorly calibrated, and overconfident decisions for ANNs.

Figure 1: Illustration of Bayesian learning in an SNN: In a Bayesian SNN, the synaptic weights are assigned a joint distribution, often simplified as a product distribution across weights.

We consider both real-valued (with possibly limited resolution, as dictated by deployment on neuromorphic hardware) and binary-valued synapses, parametrised by Gaussian and Bernoulli distributions, respectively. The advantages of models with binary-valued synapses, i.e., binary SNNs, include a reduced complexity for the computation of the membrane potential. Furthermore, binary SNNs are particularly well suited for implementations on chips with nanoscale components that provide discrete conductance levels for the synapses.

 

Continual Learning

In addition to uncertainty quantification, we apply the proposed solution to continual learning, as illustrated in Fig. 2. In continual learning, the system is sequentially presented with several datasets corresponding to distinct, but related, learning tasks, where each task is selected, possibly with replacement, from a pool of tasks, and its identity is unknown to the system. The goal of the system is to learn to make predictions that generalize well on each new task, while causing minimal loss of accuracy on previous tasks.

Figure 2: Illustration of Bayesian continual learning: the system is successively presented with similar, but different, tasks. Bayesian learning allows the model to retain previously learned information.

Many existing works on continual learning draw their inspiration from the mechanisms underlying the capability of biological brains to carry out life-long learning. Learning is believed to be achieved in biological systems by modulating the strength of synaptic links. In this process, a variety of mechanisms are at work to establish short- to intermediate-term and long-term memory for the acquisition of new information over time. These mechanisms operate at different time and spatial scales.

 

Biological Principles of Learning

One of the best understood mechanisms, long-term potentiation, contributes to the management of long-term memory through the consolidation of synaptic connections. Once established, these are rendered resistant to disruption by changing their capacity to change via metaplasticity. As a related mechanism, return to a base state is ensured after exposure to small, noisy changes by heterosynaptic plasticity, which plays a key role in ensuring the stability of neural systems. Neuromodulation operates at the scale of neural populations to respond to particular events registered by the brain. Finally, episodic replay plays a key role in the maintenance of long-term memory, by allowing biological brains to re-activate signals seen during previous active periods when inactive (i.e., sleeping).

In this work, we demonstrate how the continual learning rule we obtain exhibits some of these mechanisms. In particular, synaptic consolidation and metaplasticity for each synapse can be modeled by a precision parameter. A larger precision reduces the step size of the synaptic weight updates. During learning, the precision is increased to a degree that depends on the relevance of each synapse, as measured by the estimated Fisher information matrix for the current mini-batch of examples.

Heterosynaptic plasticity, which drives the updates towards previously learned and resting states to prevent catastrophic forgetting, is obtained from first principles via an information risk minimization formulation with a Kullback-Leibler regularization term. This mechanism drives the updates of the precision and mean parameter towards the corresponding parameters of the variational posterior obtained at the previous task.
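The interplay between precision, step size and the pull towards the previous posterior can be illustrated with a simplified per-synapse update for a Gaussian variational posterior. This is not the learning rule derived in the paper: it is a generic sketch in which the squared gradient serves as a diagonal Fisher estimate, and the names and hyperparameters (`lr`, `rho`, `kl_weight`) are assumptions.

```python
import numpy as np

def bayesian_continual_step(mean, precision, grad, prev_mean, prev_precision,
                            lr=0.1, rho=0.5, kl_weight=0.1):
    """One illustrative update of the mean and precision of each synapse.

    grad: mini-batch gradient of the loss with respect to the weights.
    prev_mean, prev_precision: posterior parameters from the previous task.
    """
    # Consolidation / metaplasticity: precision grows with the squared gradient
    # (diagonal Fisher estimate) and is pulled towards the previous precision.
    new_precision = precision + rho * grad**2 + kl_weight * (prev_precision - precision)
    # Heterosynaptic-like term: the KL regularizer pulls the mean towards prev_mean.
    pull = kl_weight * prev_precision * (mean - prev_mean)
    # A larger precision shrinks the effective step size.
    new_mean = mean - lr * (grad + pull) / new_precision
    return new_mean, new_precision
```

In this sketch, synapses with large gradients on the current mini-batch quickly acquire a high precision and then change slowly, while the `pull` term keeps all synapses close to the solution found for the previous task.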

Figure 3: Predictive probabilities evaluated on the two-moons dataset after training for Bayesian learning. Top row: Real-valued synapses; Bottom row: Binary synapses.

Results

We start by considering the two-moons dataset shown in Fig. 3. Triangles indicate training points for class “0”, while circles indicate training points for class “1”. The color intensity represents the predictive probabilities for frequentist learning and for Bayesian learning: the more intense the color, the higher the prediction confidence determined by the model. Bayesian learning is observed to provide better calibrated predictions, which are more uncertain outside the input regions covered by training data points. As can be seen, the confidence of the Bayesian models can be tuned via a parameter, as detailed in the full text.

Figure 4: Top three classes predicted by both Bayesian and frequentist models on selected examples from the DVS-Gestures dataset. Top: real-valued synapses. Bottom: binary synapses. The correct class is indicated in bold font.

This point is further illustrated in Fig. 4 by showing the three largest probabilities assigned by the different models on selected examples from the DVS-Gestures dataset, considering real-valued synapses in the top row and binary synapses in the bottom row. In the left column, we observe that, when both models predict the wrong class, Bayesian SNNs tend to do so with a lower level of certainty, and typically rank the correct class higher than their frequentist counterparts. Specifically, in the examples shown, Bayesian models with both real-valued and binary synapses rank the correct class second, while the frequentist models rank it third. Furthermore, as seen in the middle column, in a number of cases, the Bayesian models manage to predict the correct class, while the frequentist models predict a wrong class with high certainty. In the right column, we show that even when frequentist models predict the correct class and Bayesian models fail to do so, the Bayesian models still assign low probabilities (i.e., <50%) to the class they predict.

Figure 5: Evolution of the average test accuracies and ECE on all tasks of the split-MNIST-DVS across training epochs, with Gaussian and Bernoulli variational posteriors, and frequentist schemes for both real-valued and binary synapses. Continuous lines: test accuracy, dotted lines: ECE, bold: current task. Blue: {0, 1}; Red: {2, 3}; Green: {4, 5}; Purple:{6, 7}; Yellow: {8, 9}.

Finally, we show results for continual learning on the MNIST-DVS dataset in Fig. 5. We show the evolution of the test accuracy and expected calibration error (ECE) on all tasks, represented with lines of different colors, during training. The performance on the current task is shown as a thicker line. We consider frequentist and Bayesian learning, with both real-valued and binary synapses. With Bayesian learning, the test accuracy on previous tasks does not decrease excessively when learning a new task, which shows the capacity of the technique to tackle catastrophic forgetting. Also, the ECE across all tasks is seen to remain more stable for Bayesian learning as compared to the frequentist benchmarks. For both real-valued and binary synapses, the final average accuracy and ECE across all tasks show the superiority of Bayesian over frequentist learning.
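For reference, the expected calibration error is commonly estimated by binning predictions by confidence and averaging the gap between accuracy and confidence in each bin. The sketch below follows this standard recipe; the function name and the choice of 10 equal-width bins are assumptions and need not match the evaluation settings of the paper.

```python
import numpy as np

def expected_calibration_error(confidences, correct, num_bins=10):
    """Weighted average of |accuracy - confidence| over confidence bins.

    confidences: predicted probability of the chosen class for each example.
    correct: 1 if the prediction was right, 0 otherwise.
    """
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, num_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (confidences > lo) & (confidences <= hi)
        if in_bin.any():
            gap = abs(correct[in_bin].mean() - confidences[in_bin].mean())
            ece += in_bin.mean() * gap  # weight by the fraction of examples in the bin
    return float(ece)
```

An overconfident model, which reports high confidence on examples it often gets wrong, yields a large ECE, whereas a model whose confidence matches its accuracy in every bin yields an ECE close to zero.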

More details can be found in the full text at this link.

Is Accuracy Sufficient for AI in 6G? (No, Calibration is Equally Important)

AI modules are being considered as native components of future wireless communication systems that can be fine-tuned to meet the requirements of specific deployments [1]. While conventional training solutions target accuracy as the only design criterion, the pursuit of “perfect accuracy” is generally neither a feasible nor a desirable goal. In Alan Turing’s words, “if a machine is expected to be infallible, it cannot also be intelligent”. Rather than seeking an optimized accuracy level, a well-designed AI should be able to quantify its uncertainty: It should “know when it knows”, offering high confidence for decisions that are likely to be correct, and it should “know when it does not know”, providing a low confidence level for decisions that are unlikely to be correct. An AI module that can provide reliable measures of uncertainty is said to be well-calibrated.

Importantly, accuracy and calibration are two distinct criteria. As an example, Fig. 1 illustrates a QPSK demodulator trained using a limited number of pilots. Depending on the input, the trained probabilistic model may result in either accurate or inaccurate demodulation decisions, whose uncertainty is either correctly or incorrectly characterized.

Fig. 1. The hard decision regions of an optimal demodulator (dashed lines) and of a data-driven demodulator trained on few pilots (solid lines) are displayed in panel (a), while the corresponding probabilistic predictions for some outputs are shown in panel (b).

 

The property of “knowing what the AI knows/does not know” is very useful when the AI module is used as part of a larger engineering system. In fact, well-calibrated decisions should be treated differently depending on their confidence level. Furthermore, well-calibrated models enable monitoring, by tracking the confidence of the decisions made by an AI, and other functionalities, such as anomaly detection [2].

In a recent paper from our group published in the IEEE Transactions on Signal Processing [3], we proposed a methodology to develop well-calibrated and efficient AI modules that are capable of fast adaptation. The methodology builds on Bayesian meta-learning.

To start, we summarize the main techniques under consideration.

  1. Conventional, frequentist, learning ignores epistemic uncertainty – uncertainty caused by limited data – and tends to be overconfident in the presence of limited training samples.
  2. Bayesian learning captures epistemic uncertainty by optimizing a distribution in the model parameter space, rather than finding a single deterministic value as in frequentist learning. By obtaining decisions via ensembling, Bayesian predictors can account for the “opinions” of multiple models, hence providing more reliable decisions. Note that this approach is routinely used to quantify uncertainty in established fields like weather prediction [4].
  3. Frequentist meta-learning [5], also known as learning to learn, optimizes a shared training strategy across multiple tasks, so that it can easily adapt to new tasks. This is done by transferring knowledge from different learning tasks. As a communication system example, see Fig. 2, in which the demodulator adapts quickly to a new frame with only a few pilots. While frequentist meta-learning is well-suited for adaptation purposes, its decisions tend to be overconfident, hence not improving monitoring in general.
  4. Bayesian meta-learning [6,7] integrates meta-learning with Bayesian learning in order to facilitate adaptation to new tasks for Bayesian learning.
  5. Bayesian active meta-learning [8] reduces the number of meta-training tasks that are required. When meta-training tasks become available in a streaming fashion, e.g., as a sequential supply of new frames from which the AI modules can be meta-learned online, active meta-learning effectively reduces the time required to reach satisfactory meta-learning performance.
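The contrast between items 1 and 2 in the list above can be sketched in a few lines. This is only an illustration: the one-parameter logistic model, the hand-picked parameter samples and the function names are assumptions made for the example, not the setup used in [3].

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def frequentist_predict(x, w):
    """Confidence in class 1 computed from a single parameter value w."""
    return sigmoid(w * x)

def bayesian_predict(x, w_samples):
    """Ensemble prediction: average the confidence over parameter samples
    drawn from a distribution in the model parameter space."""
    return sum(sigmoid(w * x) for w in w_samples) / len(w_samples)

# Hypothetical values: with limited data, plausible models disagree.
w_point = 4.0                       # single point estimate
w_samples = [4.0, 0.5, -1.0, 2.5]   # samples standing in for a posterior
x = 2.0

p_freq = frequentist_predict(x, w_point)
p_bayes = bayesian_predict(x, w_samples)
print(f"frequentist confidence: {p_freq:.3f}")
print(f"ensemble confidence:    {p_bayes:.3f}")
```

Because the sampled models disagree on this input, the ensemble reports a lower confidence than the single point estimate, which is the mechanism by which ensembling accounts for epistemic uncertainty.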

 

Fig. 2. Through meta-learning, a learner (e.g., a demodulator) can be adapted quickly to a new environment using few pilots, by means of a hyperparameter vector optimized over related learning tasks (e.g., frames with different channel conditions).

 

Some Results

We first show the benefits of Bayesian meta-learning for monitoring purposes by examining the reliability of its decisions in terms of calibration. In Fig. 3, reliability diagrams for frequentist and Bayesian meta-learning are compared. For an ideally calibrated predictor, the accuracy level should match the self-reported confidence (dashed line in the plots). It can be easily checked that AI modules designed by Bayesian meta-learning (right part) are more reliable than the ones designed by frequentist meta-learning (left part), validating the suitability of Bayesian meta-learning for monitoring purposes. Experimental results are obtained by considering a demodulation problem.

 

 

Fig. 3. Bayesian meta-learning (right) yields more reliable decisions than frequentist meta-learning (left), as captured by reliability diagrams [9].
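For readers who want to reproduce this kind of plot, the following minimal sketch computes the per-bin points of a reliability diagram and the associated expected calibration error (ECE), following the standard binning approach described in [9]. The function names, the number of bins and the toy inputs are assumptions made for illustration.

```python
def reliability_bins(confidences, correct, num_bins=10):
    """Group predictions by confidence level and return, for every
    non-empty bin, the tuple (mean confidence, accuracy, count)."""
    bins = [[] for _ in range(num_bins)]
    for conf, ok in zip(confidences, correct):
        idx = min(int(conf * num_bins), num_bins - 1)
        bins[idx].append((conf, ok))
    points = []
    for members in bins:
        if members:
            mean_conf = sum(c for c, _ in members) / len(members)
            accuracy = sum(1 for _, ok in members if ok) / len(members)
            points.append((mean_conf, accuracy, len(members)))
    return points

def expected_calibration_error(confidences, correct, num_bins=10):
    """Weighted average gap between accuracy and confidence across bins."""
    total = len(confidences)
    return sum(count / total * abs(accuracy - mean_conf)
               for mean_conf, accuracy, count
               in reliability_bins(confidences, correct, num_bins))

# Toy inputs: an overconfident predictor and a well-calibrated one.
ece_overconfident = expected_calibration_error(
    [0.95, 0.95, 0.95, 0.95], [True, True, False, False])
ece_calibrated = expected_calibration_error(
    [0.75, 0.75, 0.75, 0.75], [True, True, True, False])
print(ece_overconfident, ece_calibrated)
```

Plotting accuracy against mean confidence for each bin gives the reliability diagram; a well-calibrated predictor lies on the diagonal and has an ECE close to zero.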

Fig. 4 demonstrates the impact of Bayesian active meta-learning that successfully reduces the number of required meta-training tasks. The results are obtained by considering an equalization problem.

Fig. 4. Bayesian active meta-learning actively searches for meta-training tasks that are most surprising (left), hence increasing the task efficiency as compared to Bayesian meta-learning which randomly chooses tasks to be meta-trained.

 

References

[1] O-RAN Alliance, “O-RAN Working Group 2 AI/ML Workflow Description and Requirements,” ORAN-WG2. AIML. v01.02.02, vol. 1, 2.

[2] C. Ruah, O. Simeone, and B. Al-Hashimi, “Digital Twin-Based Multiple Access Optimization and Monitoring via Model-Driven Bayesian Learning,” arXiv preprint arXiv:2210.05582.

[3] K.M. Cohen, S. Park, O. Simeone and S. Shamai, “Learning to Learn to Demodulate with Uncertainty Quantification via Bayesian Meta-Learning,” arXiv https://arxiv.org/abs/2108.00785

[4] T. Palmer, “The Primacy of Doubt: From Climate Change to Quantum Physics, How the Science of Uncertainty Can Help Predict and Understand Our Chaotic World,” Oxford University Press, 2022.

[5] C. Finn, P. Abbeel, and S. Levine, “Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks,” in Proceedings of the 34th International Conference on Machine Learning, vol. 70. PMLR, 06–11 Aug 2017, pp. 1126–1135.

[6] J. Yoon, T. Kim, O. Dia, S. Kim, Y. Bengio, and S. Ahn, “Bayesian Model-Agnostic Meta-Learning,” Proc. Advances in neural information processing systems (NIPS), in Montreal, Canada, vol. 31, 2018.

[7] C. Nguyen, T.-T. Do, and G. Carneiro, “Uncertainty in Model-Agnostic Meta-Learning using Variational Inference,” in Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2020, pp. 3090–3100.

[8] J. Kaddour, S. Sæmundsson et al., “Probabilistic Active Meta-Learning,” Proc. Advances in Neural Information Processing Systems (NIPS) as Virtual-only Conference, vol. 33, pp. 20 813–20 822, 2020.

[9] C. Guo, G. Pleiss, Y. Sun, and K. Q. Weinberger, “On Calibration of Modern Neural Networks,” in International Conference on Machine Learning. PMLR, 2017, pp. 1321–1330.

The Born Supremacy in Learning How to Learn

Whilst the true impact of quantum computers is anybody’s guess, there seems to be some consensus on the advantages offered by near-term devices in modeling more complex probability distributions. These distributions can be used to model complex particle interactions, e.g., in quantum chemistry, or, as we will see next, to train principled machine learning models – in this case, binary Bayesian neural networks – and enable fast adaptation to new learning tasks from few training examples.

Fig. 1. (left) A binary Bayesian neural network, i.e., a neural network with stochastic binary weights, is trained to carry out a learning task. (right) The probability distribution of the binary weights of the neural network is modelled by a Born machine, i.e., by a parametric quantum circuit (PQC), leveraging the PQC’s capacity to model complex distributions [1].

Setting

In our latest work, accepted for presentation at the IEEE MLSP, we are interested in training Bayesian binary neural networks, i.e., classical neural networks with stochastic binary weights, in a sample-efficient manner by means of meta-learning, as illustrated in Fig. 1. The key idea of this work is to model the distribution of the binary weights via a Born machine, i.e., via a probabilistic parametric quantum circuit (PQC), due to the capacity of PQCs to efficiently implement complex probability distributions [1]-[4]. We propose a novel method that integrates meta-learning with the gradient-based optimization of quantum Born machines [3], with the aim of speeding up adaptation to new learning tasks from few examples.

Born Machines

A Born machine produces random binary strings $w \in \{0,1\}^n$, where $n$ denotes the total number of model parameters, by measuring the output of a PQC $U(\theta)$ defined by parameters $\theta$.

Fig. 2. Hardware-efficient ansatz for a Born machine. All qubits are initialized in the ground state. The rotations are parametrized by the entries of the variational vector.

As illustrated in Fig. 2, the PQC takes the initial state $|0\rangle^{\otimes n}$ of n qubits as an input, and operates on it via a sequence of unitary gates described by a unitary matrix $U(\theta)$. This operation outputs the final quantum state

$$|\psi(\theta)\rangle = U(\theta)\,|0\rangle^{\otimes n},$$

which is measured in the computational basis to produce a random binary string $w \in \{0,1\}^n$. Note that each basis vector of the computational basis corresponds to one of the 2^n possible patterns of model parameters $w$.

The PQC can be implemented using a hardware-efficient ansatz [2], in which a layer of one-qubit unitary gates, parametrized by vector $\theta$, is followed by a layer of fixed, entangling, two-qubit gates. This pattern can be repeated any number of times, building a progressively deeper circuit. Another option is using the mean-field ansatz that does not use entangling gates, and only relies on one-qubit gates.

By Born’s rule (hence the name of the circuit), the probability distribution of the output model parameter vector is given by

$$p_\theta(w) = |\langle w | \psi(\theta) \rangle|^2.$$

Importantly, Born machines only provide samples, while the actual distribution above can only be estimated by averaging multiple measurements of the PQC’s outputs. Therefore, Born machines model implicit distributions, and only define a stochastic procedure that directly generates samples.
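To make the sampling procedure concrete, here is a minimal classical simulation of a small Born machine. It is a sketch under simplifying assumptions: the one-qubit gates are restricted to RY rotations (so all amplitudes stay real), the entangling layer is a chain of CNOT gates, and the parameter values are arbitrary. None of these choices is taken from the paper. On a simulator the exact probabilities are available, whereas on quantum hardware only the samples are, so the distribution has to be estimated from measurement frequencies.

```python
import math
import random

def apply_ry(state, qubit, theta):
    """Apply the one-qubit rotation RY(theta) to `qubit`."""
    c, s = math.cos(theta / 2), math.sin(theta / 2)
    new = state[:]
    bit = 1 << qubit
    for i in range(len(state)):
        if not i & bit:
            a0, a1 = state[i], state[i | bit]
            new[i] = c * a0 - s * a1
            new[i | bit] = s * a0 + c * a1
    return new

def apply_cnot(state, control, target):
    """Flip `target` on the basis states in which `control` is 1."""
    new = state[:]
    for i in range(len(state)):
        if i & (1 << control):
            new[i] = state[i ^ (1 << target)]
    return new

def born_state(layers, n, entangle=True):
    """Output state of the circuit; entangle=False gives a mean-field ansatz."""
    state = [0.0] * (2 ** n)
    state[0] = 1.0  # all qubits start in the ground state
    for angles in layers:
        for q in range(n):
            state = apply_ry(state, q, angles[q])
        if entangle:
            for q in range(n - 1):
                state = apply_cnot(state, q, q + 1)
    return state

def measure(state, n, rng):
    """Sample one binary string according to Born's rule."""
    probs = [a * a for a in state]
    idx = rng.choices(range(len(state)), weights=probs)[0]
    return [(idx >> q) & 1 for q in range(n)]

n = 2
layers = [[0.8, 1.9], [0.3, 2.4]]   # hypothetical variational parameters
state = born_state(layers, n)
probs = [a * a for a in state]      # exact distribution (simulator only)

rng = random.Random(0)
shots = 20000
counts = [0] * (2 ** n)
for _ in range(shots):
    w = measure(state, n, rng)
    counts[sum(b << q for q, b in enumerate(w))] += 1
freqs = [c / shots for c in counts]
print("exact:    ", [round(p, 3) for p in probs])
print("estimated:", [round(f, 3) for f in freqs])
```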

Some Results

Fig. 3 illustrates the results in terms of the prediction root mean squared error (RMSE) as a function of the number of meta-training iterations. By comparison with conventional per-task learning, the figure illustrates the capacity of both joint learning and meta-learning to transfer knowledge from the meta-training to the meta-test task, with hardware-efficient (HE) and mean-field (MF) quantum meta-learning clearly outperforming joint learning. For example, HE meta-learning requires around 150 meta-training iterations to achieve the same RMSE as ideal per-task training, whilst joint learning requires more than 200 to achieve comparable performance. The HE ansatz performs best, due to the use of entangling unitaries; however, the MF ansatz approaches the minimal RMSE after 230 iterations. The classical solution based on MF Bernoulli does not achieve lower RMSE than the quantum-aided meta-learning schemes, even with joint learning.

Fig. 3. Average RMSE for a new, meta-test, task as a function of the number of meta-training iterations. The results are averaged over 5 independent trials.

Please see the paper for a more detailed exposition, available here.

References


[1] Arute, F., Arya, K., Babbush, R., Bacon, D., Bardin, J.C., Barends, R., Biswas, R., Boixo, S., Brandao, F.G., Buell, D.A., et al.: Quantum supremacy using a programmable superconducting processor. Nature 574(7779), 505–510 (2019)
[2] Kandala, A., Mezzacapo, A., Temme, K., Takita, M., Brink, M., Chow, J.M., Gambetta, J.M.: Hardware-efficient variational quantum eigensolver for small molecules and quantum magnets. Nature 549(7671), 242–246 (2017)
[3] Liu, J.G., Wang, L.: Differentiable learning of quantum circuit Born machines. Physical Review A 98(6), 062324 (2018)
[4] Sweke, R., Seifert, J.P., Hangleiter, D., Eisert, J.: On the quantum versus classical learnability of discrete distributions. Quantum 5, 417 (2021)

There is Plenty of Room at the Bottom (but How do We Learn There?)

In 1959, Richard Feynman gave an after-dinner talk at an American Physical Society meeting in Pasadena entitled “There’s Plenty of Room at the Bottom”. In a later keynote, delivered in 1981 and crediting Edward Fredkin for inspiration, the transcription of which would become a landmark paper in quantum computation and simulation [1], he takes some existing ideas (computation is a physical process, perhaps even a quantum mechanical one) and makes a particularly famous statement:

“I’m not happy with all the analyses that go with just classical theory, because Nature isn’t classical, dammit, and if you want to make a simulation of Nature, you’d better make it quantum mechanical, and by golly it’s a wonderful problem!”

But how can we simulate the quantum mechanical nature of Nature? This new kind of machine would become the quantum computer, and from then on, quantum computing has been on a journey with many ups and downs. Nowadays, excitement seems to be in the air again as quantum machine learning, a hybrid research discipline that combines machine learning and quantum computing, has emerged as a potential practical use of quantum hardware. Generally, quantum machine learning methods apply classical optimization routines to select parameters that define the operation of a quantum circuit. Alternative approaches, which may be more promising in the short term, involve hybrid quantum-classical models, where classical computation, e.g., for feature extraction, is combined with quantum parametric circuits [2].

Our Work

In our latest work, published in the IEEE Signal Processing Letters, we focus on the hybrid classical-quantum two-layer architecture illustrated in Fig. 1.

Fig. 1. In the studied hybrid classical-quantum classifier, a quantum hidden layer, fed via amplitude encoding and consisting of quantum generalized linear models (QGLMs), is followed by a classical combining output layer with a single classical GLM (CGLM) neuron. All weights and activations are binary.

In it, a first layer of quantum generalized linear models (QGLMs) is followed by a second classical combining layer. The input to the first, hidden, layer is obtained via amplitude encoding (see, e.g., [3]). Several implementations of QGLM neurons have been proposed in the literature using different quantum circuits. Given a binary input sample $x$ and an N-dimensional vector of binary weights $w$, the main goal of these circuits is to produce a stochastic binary output with probabilities which are a function of the inner product

$$\langle \psi_w | \psi_x \rangle$$

between the amplitude-encoded input state $|\psi_x\rangle$ and the amplitude-encoded binary weight vector $|\psi_w\rangle$.

Different solutions, along with the resulting QGLM neuron’s response functions, are given in the paper. For this hybrid model, we introduced a stochastic variational optimization (SVO) approach [4] that enables the joint training of quantum and classical layers via stochastic gradient descent. The proposed SVO-based training strategy operates in a relaxed continuous space of variational classical parameters.
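The inner product at the heart of a QGLM neuron can be illustrated classically. The sketch below assumes a sign-based amplitude encoding, in which a binary entry 0 maps to amplitude +1/sqrt(N) and a binary entry 1 maps to -1/sqrt(N), in the spirit of the neuron in [5], and a response in which the probability of a positive output equals the squared inner product. Both are illustrative assumptions; the exact encodings and response functions are those given in the paper.

```python
import math

def amplitude_encode(bits):
    """Map N binary entries to the amplitudes of a unit-norm state
    (assumed encoding: 0 -> +1/sqrt(N), 1 -> -1/sqrt(N))."""
    n = len(bits)
    return [(1.0 if b == 0 else -1.0) / math.sqrt(n) for b in bits]

def inner_product(x_bits, w_bits):
    """Inner product between the encoded weight and input states."""
    psi_x = amplitude_encode(x_bits)
    psi_w = amplitude_encode(w_bits)
    return sum(a * b for a, b in zip(psi_w, psi_x))

def quadratic_response(x_bits, w_bits):
    """Assumed response: output probability equal to the squared inner product."""
    return inner_product(x_bits, w_bits) ** 2

w = [0, 1, 0, 1]
print(quadratic_response([0, 1, 0, 1], w))  # input aligned with the weights
print(quadratic_response([0, 0, 1, 1], w))  # input orthogonal to the weights
```

Under this encoding, an input that matches the weight vector (or its complement) produces the largest response, while an input that agrees with the weights on exactly half of the entries produces none.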

Some Results

In Fig. 2, we show the classification accuracy, defined as the ratio of the number of correct predictions to the total number of predictions made by the model, as a function of the training iterations.

 

Fig. 2. Classification accuracy as a function of the training iteration for the benchmark sign-flips scheme [5] and the proposed SVO-based procedure for the BAS data set. The results are averaged over 5 independent trials.

The proposed SVO scheme is seen to achieve high classification accuracy for all of the considered response functions. In particular, the QGLM using the Quadratic (Q) response function yields the fastest convergence and achieves the best performance. Due to the additional bias terms resulting from the swap test routine, the QGLMs relying on the Biased quadratic (BQ) and Biased centered quadratic (BCQ) response functions are slower to learn, but ultimately converge after around 3000 training iterations.

Please see the paper for a more extensive presentation, available here.

Code, alongside a tutorial, is available here.

References

[1] R. P. Feynman et al., “Simulating physics with computers,” Int. J. Theor. Phys., vol. 21, no. 6/7, 1982.
[2] A. Mari, T. R. Bromley, J. Izaac, M. Schuld, and N. Killoran, “Transfer learning in hybrid classical-quantum neural networks,” Quantum, vol. 4, p. 340, 2020.
[3] M. Schuld and F. Petruccione, Machine Learning with Quantum Computers. Springer, 2021.
[4] T. Bird, J. Kunze, and D. Barber, “Stochastic variational optimization,” arXiv preprint arXiv:1809.04855, 2018.
[5] F. Tacchino, C. Macchiavello, D. Gerace, and D. Bajoni, “An artificial neuron implemented on an actual quantum processor,” npj Quantum Information, vol. 5, no. 1, pp. 1–8, 2019.

Understanding the Uncertainty of Learning to Learn

The overall predictive uncertainty of a trained predictor comprises two main contributions: the aleatoric uncertainty arising from inherent randomness in the data generation process, and the epistemic uncertainty resulting from limitations of the available training data. While the epistemic uncertainty, also called minimum excess risk, can be made to vanish with increasing training data, the aleatoric uncertainty is independent of the data. In our recent work accepted to AISTATS 2022, we provide an information-theoretic quantification of the epistemic uncertainty arising in the broad framework of Bayesian meta-learning.

Problem Formulation

In conventional Bayesian learning, the model parameter   that describes the data generating distribution is assumed to be random and is endowed with a prior distribution. This distribution is conventionally chosen based on prior knowledge about the problem. In contrast,  Bayesian meta-learning (see Fig. 1 below) aims to automatically infer this prior distribution by observing data from several related tasks. The statistical relationship among the tasks is accounted for via a global latent hyperparameter .  Specifically, the model parameter for each observed task  is drawn according to a shared prior distribution   with shared global hyperparameter . Following the Bayesian formalism, the hyperparameter is assumed to be random and distributed according to a hyper-prior distribution .

Figure 1: Bayesian meta-learning decision problem

The data from the observed related tasks, collectively called meta-training data, is used to reduce the expected loss incurred on a test task. The test task is modelled as generated by an independent model parameter with the same shared hyperparameter. This model parameter underlies the generation of the test task's training data, used to infer the task-specific model parameter, as well as of a test data sample from the test task. The Bayesian meta-learning decision problem is to predict the label corresponding to the test input feature of the test task, after observing the meta-training data and the training data of the test task.

A meta-learning decision rule    thus maps the meta-training data, the test task training data and test input feature to an action space.  The Bayesian meta-risk can be defined as the minimum expected loss incurred over all meta-learning decision rules, i.e., .  In the genie-aided case when the model parameter and hyper-parameter are known, the genie-aided Bayesian risk is defined as  . The epistemic uncertainty, or minimum excess risk, corresponds to the difference between the Bayesian meta-risk and Genie-aided meta-risk as  .

Main Result

Our main result shows that under the log-loss, the minimum excess meta-risk can be exactly characterized using the conditional mutual information

where H(A|B)  denotes the conditional entropy of A given B and  I(A;B|C) denotes the conditional mutual information between A and B given C.  This in turn implies that

More importantly, we show that the epistemic uncertainty is contributed by two levels of uncertainties – model parameter level and hyperparameter level as

which scales in the order of 1/Nm+1/m, and vanishes as both the number of observed tasks and per-task data samples go to infinity. The behavior of the bounds is illustrated for the problem of meta-learning the Bayesian neural network prior for regression tasks in the figure below.
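The identification of the minimum excess meta-risk with a conditional mutual information under the log-loss can be checked in a few lines. The sketch below uses an assumed notation that is not taken from the paper: $U$ is the hyperparameter, $W$ the test-task model parameter, $Z_{1:N}$ the meta-training data, $Z$ the test-task training data, and $(X, Y)$ the test input and label. It relies on two facts: under the log-loss, the minimum expected loss given an observation equals the corresponding conditional entropy, and, given $(X, W)$, the label $Y$ is assumed to be independent of $U$, $Z$ and $Z_{1:N}$.

```latex
\begin{align}
R_{\text{meta}}  &= H(Y \mid X, Z, Z_{1:N})
  && \text{(Bayesian meta-risk)} \\
R_{\text{genie}} &= H(Y \mid X, W, U) = H(Y \mid X, W) = H(Y \mid X, W, Z, Z_{1:N})
  && \text{(genie-aided risk)} \\
R_{\text{meta}} - R_{\text{genie}}
  &= H(Y \mid X, Z, Z_{1:N}) - H(Y \mid X, W, Z, Z_{1:N}) \\
  &= I(Y; W \mid X, Z, Z_{1:N})
  && \text{(minimum excess meta-risk)}
\end{align}
```

Read this way, the epistemic uncertainty is the information that the unknown model parameter would still provide about the test label once all available data has been observed.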

Figure 2: Performance of MEMR and derived upper bounds as a function of number of tasks and per-task data samples

Learning How to Adapt Power Control in Dynamic Communication Networks

Problem

An essential property of any wireless channel is the fact that it is a shared medium, much like the air through which sound propagates is shared among the participants of a conversation. As a result, communication engineers must deal with the resulting interference, which may substantially limit the reliability and the achievable rates in a wireless communication system. A proven remedy is to adapt the transmission power to current channel conditions, a problem that was successfully addressed by the data-driven methodology introduced in [1], in which the power control policy is parametrized by a random edge graph neural network (REGNN).
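The effect of interference on the achievable rates, and the reason why power control matters, can be seen in a small worked example. The sketch below evaluates the standard sum-rate expression for an interference channel in which each receiver treats interference as noise. The function name, the gain matrix and the noise power are illustrative assumptions and are not taken from [1].

```python
import math

def sum_rate(gains, powers, noise=1.0):
    """Sum rate in bits per channel use, treating interference as noise.
    gains[j][i] is the power gain from transmitter j to receiver i."""
    total = 0.0
    for i in range(len(powers)):
        signal = gains[i][i] * powers[i]
        interference = sum(gains[j][i] * powers[j]
                           for j in range(len(powers)) if j != i)
        total += math.log2(1.0 + signal / (noise + interference))
    return total

# Hypothetical two-link network with strong cross-link interference.
gains = [[1.0, 2.0],
         [2.0, 1.0]]
rate_full_power = sum_rate(gains, [1.0, 1.0])
rate_one_link = sum_rate(gains, [1.0, 0.0])
print(rate_full_power, rate_one_link)
```

In this toy setting, silencing one of the two transmitters yields a larger sum rate than letting both transmit at full power, which is the kind of decision a power control policy has to make as a function of the channel conditions.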

In our recent work to be presented at SPAWC 2021, we focus on the higher-level problem of facilitating adaptation of the power control policy. We consider the case where the topology of the network varies across periods of operation of the system, with each period being in turn characterized by time-varying channel conditions. In order to facilitate fast adaptation of the power control policy — in terms of data and iteration requirements — we integrate meta-learning with REGNN training.

Meta-learning Solution

Our meta-learning solution leverages channel state information (CSI) data from a number of previous periods to optimize an adaptation procedure that facilitates fast adaptation on a new topology to be encountered in a future period. We specifically adopt first-order meta-learning methods, namely first-order model-agnostic meta-learning (FOMAML) [2] and REPTILE [3], which parametrize the adaptation procedure via its initialization within each period. While GNNs are known to be robust to changes in the topology, the proposed integration of meta-learning and REGNNs is shown to offer significant improvements in terms of sample and iteration efficiency.
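The idea of meta-learning an initialization can be illustrated with a REPTILE-style update [3] on a toy problem. The sketch below is not the REGNN setup of the paper: the one-parameter regression model, the task family, the learning rates and all function names are assumptions chosen to keep the example self-contained.

```python
import random

def make_task(slope, xs=(-1.0, -0.5, 0.5, 1.0)):
    """A toy task: noiseless samples of y = slope * x."""
    return [(x, slope * x) for x in xs]

def loss(w, data):
    return sum((w * x - y) ** 2 for x, y in data) / len(data)

def sgd_adapt(w, data, lr=0.05, steps=5):
    """Inner loop: a few SGD passes on one task, starting from w."""
    for _ in range(steps):
        for x, y in data:
            w -= lr * 2 * (w * x - y) * x
    return w

def reptile(tasks, w_init=0.0, meta_lr=0.1, meta_iters=200, seed=0):
    """Outer loop: move the initialization toward the adapted weights."""
    rng = random.Random(seed)
    for _ in range(meta_iters):
        data = rng.choice(tasks)           # sample a meta-training task
        w_adapted = sgd_adapt(w_init, data)
        w_init += meta_lr * (w_adapted - w_init)
    return w_init

tasks = [make_task(a) for a in (1.8, 2.0, 2.2)]
w_meta = reptile(tasks)

# Adaptation to a new task with a single pass over its data.
new_task = make_task(2.1)
loss_meta = loss(sgd_adapt(w_meta, new_task, steps=1), new_task)
loss_scratch = loss(sgd_adapt(0.0, new_task, steps=1), new_task)
print(w_meta, loss_meta, loss_scratch)
```

Starting from the meta-learned initialization, one pass over the new task's data leaves a much smaller loss than starting from zero, which mirrors the sample and iteration efficiency discussed above. FOMAML follows the same outer-loop structure but updates the initialization with a gradient evaluated at the adapted parameters.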

Fig 1. Sum rate as a function of the number of samples used for adaptation, for a network with dynamic size.

Some Results

The achievable sum rate with respect to the number of CSI samples used for adaptation is illustrated in Fig. 1 for a network in which the number of transmitters and receivers changes in each period. Meta-learning, via both FOMAML and REPTILE, is seen to adapt quickly to the new topology, outperforming conventional REGNN, even when allowing for fine-tuning of the latter. This significant improvement can be attributed to the variability of the topologies observed across periods in the considered scenario, which makes the joint training approach in [1] ineffective. That said, when the number of samples for adaptation is sufficiently large, conventional REGNN training as in [1] outperforms meta-learning, as the initialization obtained by meta-learning induces a more substantial bias than joint training due to the mismatch in the conditions assumed for the updates on meta-training and meta-testing tasks (i.e., the different number of samples used for meta-training and adaptation).

 

Please see the paper, which is available here, for more results and a more extensive analysis.

 

[1] M. Eisen and A. Ribeiro, “Optimal wireless resource allocation with random edge graph neural networks,” IEEE Transactions on Signal Processing, vol. 68, pp. 2977–2991, April, 2020.

[2] C. Finn, P. Abbeel, and S. Levine, “Model-agnostic meta-learning for fast adaptation of deep networks,” in Proc. International Conference on Machine Learning (PMLR). Sydney, 6–11 August, 2017, pp. 1126–1135.

[3] A. Nichol, J. Achiam, and J. Schulman, “On first-order meta-learning algorithms,” arXiv preprint arXiv:1803.02999, 2018.

 

“Hyper-Learning” How to Transfer from Uplink to Downlink for Massive MIMO

Problem

Most cellular deployments rely on frequency division duplex (FDD) due to its lower latency and greater coverage potential. In FDD, uplink and downlink channels use different carrier frequencies. Therefore, as illustrated in Fig. 1, with FDD, downlink channel state information (CSI) cannot be directly obtained from uplink pilots due to a lack of full reciprocity between uplink and downlink channels.

Fig.1 FDD massive MIMO system over multipath channels with partial reciprocity

The conventional solution to this problem is to leverage downlink training and feedback from the devices. This, however, generally causes a prohibitively large downlink and uplink overhead in massive multiple-input multiple-output (MIMO) systems owing to the need to transmit pilot sequences of length proportional to the number of antennas.

Recent state-of-the-art proposals to address the inefficiencies of this conventional solution adopt machine learning (ML) tools. The use of ML is justified by the technical challenges arising from the lack of efficient optimal model-based methods.

In our recent work to be presented at SPAWC 2021, we contribute to this line of work by introducing a new ML-based solution that improves over the state of the art by leveraging partial channel reciprocity and the tool of hypernetworks.

Our approach

 

Fig. 2 The proposed HyperRNN architecture for end-to-end channel estimation based on temporal correlations and partial reciprocity

In this work, we propose a novel end-to-end architecture, HyperRNN, illustrated in Fig. 2. The main innovation of the approach is that simultaneously transmitted pilot symbols in the uplink, across multiple time slots, are leveraged to automatically extract long-term reciprocal channel features (see Fig. 1) via a hypernetwork that determines the weights of the downlink CSI estimation or beamforming network. Importantly, the long-term features implicitly underlie the discriminative mapping implemented by the hypernetwork between uplink pilots and downlink CSI estimation network, rather than being estimated explicitly. The second main innovation is to incorporate recurrent neural networks (RNNs), in lieu of (feedforward) deep neural networks (DNNs), for both uplink and downlink processing in order to leverage the temporal correlation of the fading amplitudes.
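The hypernetwork principle, in which one network outputs the weights of another, can be sketched as follows. This is a deliberately reduced illustration: both the hypernetwork and the downlink estimator are linear maps, the recurrent processing of the HyperRNN is omitted, and the class name, dimensions and inputs are hypothetical.

```python
import random

def matvec(matrix, vector):
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]

class HyperNetwork:
    """Maps an uplink feature vector to the weight matrix of a
    (here linear) downlink estimator."""

    def __init__(self, feat_dim, in_dim, out_dim, rng):
        self.in_dim = in_dim
        self.out_dim = out_dim
        # Trainable parameters of the hypernetwork itself.
        self.h = [[rng.gauss(0.0, 0.1) for _ in range(feat_dim)]
                  for _ in range(out_dim * in_dim)]

    def generate_weights(self, uplink_features):
        flat = matvec(self.h, uplink_features)
        return [flat[r * self.in_dim:(r + 1) * self.in_dim]
                for r in range(self.out_dim)]

def estimate_downlink(hyper, uplink_features, downlink_obs):
    """The estimator's weights are generated on the fly, not stored."""
    weights = hyper.generate_weights(uplink_features)
    return matvec(weights, downlink_obs)

rng = random.Random(0)
hyper = HyperNetwork(feat_dim=3, in_dim=4, out_dim=2, rng=rng)
obs = [1.0, -0.5, 0.25, 2.0]
est_a = estimate_downlink(hyper, [1.0, 0.0, 0.0], obs)
est_b = estimate_downlink(hyper, [0.0, 1.0, 0.0], obs)
print(est_a, est_b)
```

The same downlink observation is processed differently depending on the uplink features, which is how information extracted from the uplink can shape the downlink estimator without being estimated explicitly.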

Results

We compare the normalized mean square error (NMSE) performance of our proposed HyperRNN with that of an earlier scheme based on an end-to-end training procedure, the downlink-based DNN (DL-DNN), which encompasses downlink pilot training, distributed quantization for the uplink, and downlink channel estimation. Simulations are performed over the spatial channel model (SCM) standardized in 3GPP Release 16. Fig. 3 demonstrates the NMSE of the proposed HyperRNN and of the benchmark DL-DNN for channel estimation as a function of the number of paths. Larger performance gains can be achieved when the channel has a lower number of paths. In fact, in this regime, the invariance of the long-term features of the channel defines a low-rank structure of the channel that can be leveraged by the hypernetwork.
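For reference, the NMSE metric can be computed as in the sketch below. The normalization used here, the total squared error divided by the total energy of the true channels, is one common convention and is an assumption; the paper may average per-sample ratios instead. The function name and the toy channel vectors are illustrative.

```python
import math

def nmse(h_true, h_est):
    """NMSE between lists of complex channel vectors: total squared
    estimation error divided by the total energy of the true channels."""
    err = sum(abs(e - t) ** 2
              for vec_t, vec_e in zip(h_true, h_est)
              for t, e in zip(vec_t, vec_e))
    ref = sum(abs(t) ** 2 for vec_t in h_true for t in vec_t)
    return err / ref

h_true = [[1 + 1j, 2 + 0j], [0 - 1j, 1 + 0j]]
h_noisy = [[1.1 + 1j, 2 - 0.1j], [0 - 0.9j, 1 + 0j]]
value = nmse(h_true, h_noisy)
print(value, 10 * math.log10(value))  # linear scale and dB
```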

Fig. 3 NMSE of the HyperRNN and DL-DNN over frequency-flat fading channels having different number of paths for an FDD system

Full paper can be found here.
