Bayesian Optimization with Formal Safety Guarantees via Online Conformal Prediction

Motivation

In the general formulation of black-box optimization problems, a designer sequentially attempts candidate solutions, receiving noisy feedback on the value of each attempt from the system. As illustrated in Fig. 1, we consider scenarios in which feedback is also provided on the safety of the attempted solution, and the optimizer is constrained to limit the number of unsafe solutions tried throughout the optimization process [1] [2]. Focusing on methods based on Bayesian optimization (BO), prior works provide a safety guarantee that any unsafe solution is excluded with a controllable probability with respect to the feedback noise. This theoretical guarantee is, however, only valid if the optimizer has access to information about the constraint function, e.g., a bound on its reproducing kernel Hilbert space (RKHS) norm. In practice, specifying such information may be difficult, since the constraint function is a priori unknown.

Fig. 1. Illustration of black-box optimization with safety constraints. We provide a formal safety guarantee on keeping the fraction of unsafe solutions attempted during the optimization process below some tolerated threshold.

 

Safe-BO via Online Conformal Prediction

In our recent work, to appear in the IEEE Journal of Selected Topics in Signal Processing, we study for the first time the use of online conformal prediction (CP) to provide assumption-free guarantees on the safety level of the attempted candidate solutions, while allowing any non-zero target safety violation level. As shown in Fig. 2, we introduce Safe-BOCP, which models the objective and constraint functions using independent Gaussian processes (GPs) as surrogate models, and which calibrates the credible intervals used to construct safe sets adaptively, based on the observation history, via online CP [3] [4]. The key mechanism is to use safety feedback, in the form of a well-designed safety error signal, on the reliability of past decisions to adjust the post-processing of the probabilistic surrogate models’ outputs. In contrast to previous safe BO methods, which assume RKHS properties of the constraint function to ensure a strict safety guarantee, Safe-BOCP adopts a “caution-increasing” back-off strategy that compensates for the uncertainty on the boundaries of the safe regions without any such assumptions.

Fig. 2. Block diagram of the main steps: safe set creation, which produces the safe set, and acquisition, which selects the next iterate.
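To make the calibration mechanism concrete, below is a minimal Python sketch of an online-CP-style update of the safe-set threshold. This is not the authors’ exact algorithm: gp_mean, gp_std, acquire, and observe_safety are hypothetical placeholders for the constraint GP surrogate, the acquisition rule, and the safety feedback.

```python
import numpy as np

# Minimal sketch of an online-conformal calibration loop for safe BO
# (illustrative only; see the paper for the actual Safe-BOCP rules).

def run_safe_bo(candidates, gp_mean, gp_std, acquire, observe_safety,
                alpha=0.1, gamma=0.05, T=100):
    """Online-CP-style calibration of the safe-set width multiplier lam."""
    lam = 1.0  # initial width multiplier for the constraint credible interval
    for t in range(T):
        # Safe set: points whose calibrated lower bound on the constraint
        # value is non-negative (constraint(x) >= 0 means "safe").
        lower = gp_mean(candidates) - lam * gp_std(candidates)
        safe_set = candidates[lower >= 0.0]
        if safe_set.size == 0:
            safe_set = candidates  # fallback (pure sketch; the paper uses a back-off rule)
        x_t = acquire(safe_set)        # pick the next iterate inside the safe set
        err_t = observe_safety(x_t)    # 1 if the attempted solution was unsafe, else 0
        # Adaptive-CP-style update [3]: widen the interval after unsafe picks,
        # shrink it (allowing more exploration) after safe ones.
        lam = max(0.0, lam + gamma * (err_t - alpha))
    return lam
```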

 

Experiments

We compare Safe-BOCP with the state-of-the-art SAFEOPT on a safe movie recommendation problem and a plug flow reactor (PFR) optimization problem. Fig. 3 plots the histograms of the ratings of all selected movies during the optimization procedure for varying target violation rates, showing that SAFEOPT does not meet the safety requirement (red dashed line), while D-SAFE-BOCP correctly controls the fraction of unsafe movies. As shown in Fig. 4, P-SAFE-BOCP meets the target reliability level irrespective of the observation noise power, while SAFEOPT can only achieve it when the observation noise power is sufficiently large.

Fig. 3. Histograms of the ratings of movies recommended by SAFEOPT, as well as by D-SAFE-BOCP under different target violation rates.

Fig. 4. Probability of excessive violation rate (top) and optimality ratio (bottom) as a function of constraint observation noise power.

 

References

[1] Y. Sui, A. Gotovos, J. Burdick, and A. Krause, “Safe exploration for optimization with Gaussian processes,” in Proceedings of International Conference on Machine Learning, Lille, France, 2015.
[2] F. Berkenkamp, A. Krause, and A. P. Schoellig, “Bayesian optimization with safety constraints: Safe and automatic parameter tuning in robotics,” Machine Learning, pp. 1–35, 2021.
[3] I. Gibbs and E. Candes, “Adaptive conformal inference under distribution shift,” in Proceedings of Advances in Neural Information Processing Systems, Virtual, 2021.
[4] S. Feldman, L. Ringel, S. Bates, and Y. Romano, “Achieving risk control in online learning settings,” Transactions on Machine Learning Research, 2023.

Cross-Validation Conformal Risk Control

Motivation

Conformal risk control (CRC) [1] [2] is a recently proposed technique that is applied post hoc to a conventional point predictor to provide calibration guarantees. Generalizing conformal prediction (CP) [3], CRC ensures calibration for a set predictor that is extracted from the point predictor so as to control a risk function, such as the probability of miscoverage or the false negative rate. The original CRC requires the available data set to be split into training and validation sets. This can be problematic when data availability is limited, resulting in inefficient set predictors. In [4], a novel CRC method is introduced that is based on cross-validation, rather than on validation as in the original CRC. The proposed cross-validation CRC (CV-CRC) allows for the control of a broader range of risk functions, is proved to offer theoretical guarantees on the average risk of the set predictor, and yields a reduced average set size with respect to CRC when the available data are limited.

Cross-Validation Conformal Risk Control

The objective of CRC is to design a set predictor with a mean risk no larger than a predefined level α, i.e.,

E[ ℓ( y, Γ(x | D) ) ] ≤ α,        (1)

where the expectation is over the test input-label pair (x, y) and the set D of N data pairs, and the risk ℓ(y, Γ) is defined between the true label y and a predictive set Γ of labels.

VB-CRC generalizes VB-CP [2] in the sense that it allows the risk to take an arbitrary form, under technical conditions such as boundedness and monotonicity in the set. VB-CRC reduces to VB-CP in the special case of the miscoverage risk ℓ(y, Γ) = 1(y ∉ Γ).

In this work, we introduce CV-CRC, a cross-validation-based version of VB-CRC. In a manner similar to how CV-CP [5] generalizes VB-CP, CV-CRC generalizes VB-CRC; see Fig. 1 for an illustration.

Fig. 1. (top) Validation-based CRC; (bottom) the proposed method, CV-CRC.

In the top panel of Fig. 2, VB-CRC is shown to operate by splitting the available data into training data and validation data. The former is used to train a model, while the latter is used to post-process the model’s outputs and to control a threshold λ. Upon observing a test input x, a predictive set Γ of labels y is formed. In the bottom panel, CV-CRC is illustrated as a generalization. The available data are split into K≤N folds, and K leave-fold-out models are trained. Then, K predictive sets are formed and merged via a threshold that is set using the trained models and the left-out folds.

Fig. 2. (top) Validation-based CRC; (bottom) the proposed method, CV-CRC.
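To make the construction concrete, the following Python sketch mimics the cross-validation structure described above under simplifying assumptions; the exact CV-CRC threshold rule and its guarantees are those of [4]. The helpers train, score, loss, and label_space are hypothetical placeholders, with the loss assumed bounded by B and non-increasing as the set grows.

```python
import numpy as np

# Illustrative cross-validation-style CRC calibration (not the exact CV-CRC rule).
# data: list of (x, y) pairs; score(model, x, y): nonconformity score;
# loss(y, Gamma): risk in [0, B], monotone in the set Gamma.

def cv_crc(data, K, alpha, B, lam_grid, train, score, loss, label_space):
    folds = np.array_split(np.arange(len(data)), K)
    # Leave-fold-out models: model k is trained without fold k.
    models = [train([data[i] for i in range(len(data)) if i not in set(f)])
              for f in folds]

    def fold_set(k, x, lam):
        # Per-fold prediction set for input x at threshold lam.
        return [y for y in label_space if score(models[k], x, y) <= lam]

    def cv_risk(lam):
        # Cross-validated risk: each held-out point is evaluated with the
        # model that did not see its fold.
        risks = [loss(data[i][1], fold_set(k, data[i][0], lam))
                 for k, f in enumerate(folds) for i in f]
        return float(np.mean(risks))

    # Smallest threshold whose inflated cross-validated risk is below alpha.
    N = len(data)
    lam_hat = min(lam for lam in lam_grid
                  if (N * cv_risk(lam) + B) / (N + 1) <= alpha)

    def predict(x):
        # Merge the K per-fold sets (union), as in Fig. 2 (bottom).
        return sorted({y for k in range(K) for y in fold_set(k, x, lam_hat)})

    return predict
```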

Experiments

To illustrate the main theorem, namely that the risk guarantee (1) is met while the average set size is reduced, two experiments were conducted. The first is a vector regression problem using maximum-likelihood learning, shown in Fig. 3.

Fig. 3. VB-CRC and CV-CRC for the vector regression problem.

The second problem is temporal point process prediction, where a point process set predictor aims to output sets that contain the future events of a temporal process with a false negative rate of no more than a predefined level α. As can be seen, in both problems, CV-CRC is more data-efficient in the small-data regime, while satisfying the risk condition (1).

 

Fig. 4. VB-CRC and CV-CRC for the temporal point process prediction problem.

Full details can be found in the ISIT preprint [4].

References

[1] A. N. Angelopoulos, S. Bates, A. Fisch, L. Lei, and T. Schuster, “Conformal Risk Control,” in The Twelfth International Conference on Learning Representations, 2024.

[2] S. Feldman, L. Ringel, S. Bates, and Y. Romano, “Achieving Risk Control in Online Learning Settings,” Transactions on Machine Learning Research, 2023.

[3] V. Vovk, A. Gammerman, and G. Shafer, Algorithmic Learning in a Random World. New York: Springer, 2005.

[4] K. M. Cohen, S. Park, O. Simeone, and S. Shamai Shitz, “Cross-Validation Conformal Risk Control,” accepted to IEEE International Symposium on Information Theory Proceedings (ISIT2024), July 2024.

[5] R. F. Barber, E. J. Candes, A. Ramdas, and R. J. Tibshirani, “Predictive Inference with the Jackknife+,” The Annals of Statistics, vol. 49, no. 1, pp. 486–507, 2021.

Generalization and Informativeness of Conformal Prediction

Motivation

When using a machine learning model to make important decisions, such as in healthcare, finance, or engineering, we not only need accurate predictions but also want to know how sure the model is about its answers [1-3]. CP offers a practical solution for generating certified “error bars”, i.e., certified ranges of uncertainty, by post-processing the outputs of a fixed, pre-trained base predictor. This is crucial for safety and reliability. At the upcoming ISIT 2024 conference, we will present our research work, which aims to bridge the generalization properties of the base predictor with the expected size of the set predictions, also known as informativeness, produced by CP. Understanding the informativeness of CP is particularly relevant as it can usually only be assessed at test time.

Conformal prediction

Figure 1: Conformal prediction (CP) set predictors (gray areas) obtained by calibrating a base predictor with a higher generalization error on the left and a lower generalization error on the right. Thanks to CP, both set predictors satisfy a user-defined coverage guarantee, but the inefficiency, i.e., the average prediction set size, is larger when the generalization error of the base predictor is larger.

The most practical form of CP, known as inductive CP, divides the available data into a training set and a calibration set [4]. We use the training data to train a base model, and the calibration data to determine the prediction sets around the decisions made by the base model. As shown in Figure 1, a more accurate base predictor, which generalizes better outside the training set, tends to produce more informative sets when CP is applied.
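As a point of reference, here is a minimal Python sketch of inductive (split) CP for classification, assuming a generic pre-trained probabilistic classifier predict_proba (a hypothetical placeholder); the calibrated quantile of the nonconformity scores determines the set size.

```python
import numpy as np

# Minimal sketch of inductive (split) CP for classification.
# predict_proba(x) is assumed to return a dict mapping each label to a probability.

def split_cp(calibration_data, predict_proba, alpha=0.1):
    # Nonconformity score: one minus the probability assigned to the true label.
    scores = np.array([1.0 - predict_proba(x)[y] for x, y in calibration_data])
    n = len(scores)
    # Finite-sample-corrected (1 - alpha) empirical quantile of the scores.
    k = min(n - 1, int(np.ceil((n + 1) * (1 - alpha))) - 1)
    q = np.sort(scores)[k]

    def predict_set(x):
        probs = predict_proba(x)
        # Keep every label whose score does not exceed the calibrated quantile.
        return [y for y, p in probs.items() if 1.0 - p <= q]

    return predict_set
```

A base predictor that generalizes better assigns more probability to the true label on the calibration points, yielding smaller scores, a smaller quantile, and hence smaller prediction sets; this is the link that our bound formalizes.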

Results

Figure 2: Bound on the average set size for different values of the training and calibration data set sizes as a function of the target reliability level. Increasing the number of calibration data points causes the bound to converge exponentially fast to a function (black line) that is increasing in the target reliability level and decreasing in the amount of training data.

Our work’s main contribution is a high-probability bound on the expected size of the predicted sets. The bound relates the informativeness of CP to the generalization properties of the base model and to the amount of available training and calibration data. As illustrated in Figure 2, our bound predicts that, by increasing the amount of calibration data, CP’s efficiency converges rapidly to a quantity determined by the coverage level, the size of the training set, and the predictor’s generalization performance. However, for a finite amount of calibration data, the bound is also influenced by the discrepancy between the target reliability and the empirical reliability measured over the training data set. Overall, the bound justifies a common practice: allocating more data to training the base model than to calibrating it.

Figure 3: Normalized empirical CP set size for a multi-class classification problem on the MNIST data set as a function of the reliability level and for different sizes of the calibration and training data sets.

Since what really proves the worth of a theory is how well it holds up in real-world testing, we also compare our theoretical findings with numerical evaluations. In our study, we considered both classification and regression tasks. We ran CP with various splits of calibration and training data, and then measured the average efficiency. As shown in Figure 3, the empirical results from our experiments match up nicely with what our theory predicts in Figure 2.

References

[1] A. L. Beam and I. S. Kohane, “Big data and machine learning in health care,” JAMA, vol. 319, no. 13, pp. 1317–1318, 2018.

[2] J. W. Goodell, S. Kumar, W. M. Lim, and D. Pattnaik, “Artificial intelligence and machine learning in finance: Identifying foundations, themes, and research clusters from bibliometric analysis,” Journal of Behavioral and Experimental Finance, vol. 32, p. 100577, 2021.

[3] L. Hewing, K. P. Wabersich, M. Menner, and M. N. Zeilinger, “Learning-based model predictive control: Toward safe learning in control,” Annual Review of Control, Robotics, and Autonomous Systems, vol. 3, pp. 269–296, 2020.

[4] V. Vovk, A. Gammerman, and G. Shafer, Algorithmic learning in a random world, vol. 29. Springer, 2005.

Empowering Wireless Digital Twins with Ray Tracing Simulations

At the crossroads between simulation and machine learning, digital twin systems are envisioned to bridge the theoretical guarantees of model-based approaches with the flexibility of data-driven methods. However, one major concern is whether insights drawn from simulation still apply to the real world. Embodying both the opportunities and challenges of simulation intelligence, we believe that ray tracing will drive the understanding of signal propagation in the next generation of wireless digital twins, while relying on machine learning to cope with the diversity of real-world materials and inaccuracies in the available geometry.

Wireless Reliable Federated Inference

Written by Meiyi Zhu during her visit to KCLIP.

Motivation

Consider a wireless federated inference scenario in which a number of devices and a server share a pre-trained machine learning model, e.g., one trained via federated learning. The server wishes to carry out inference on its own new inputs based on this pre-trained model. Note that the server has no access to the data, which is only present at the devices. This scenario is common in practice. For example, a personal healthcare system would first train a model via federated learning, without acquiring personal data from the end users; once the model is trained, the system should provide useful answers to new users. We will assume that new users submit queries to the central server, although the general conclusions of this article also hold in the case in which a new user has its own access to the pre-trained model.

However, depending on the quality of the pre-trained model, e.g., due to a lack of data, the model may yield wrong decisions. More importantly, such a model is likely to yield unreliable decisions; see, e.g., our previous post ‘Is Accuracy Sufficient for AI in 6G? (No, Calibration is Equally Important)’. As reliability plays an important role in various fields, including healthcare monitoring and autonomous vehicle navigation, it is important to find ways to make federated inference reliable. But how can we make the pre-trained model reliable when the central server has no access to the data at all?

Recent work has introduced federated conformal prediction (CP), which improves the reliability of the server’s decisions by utilizing held-out local data available at each device, without the central server accessing such data. The goal of federated CP is to provide an interval or set of potential outputs that is guaranteed to contain the correct answer at a predefined reliability level [1, 2]. As a state-of-the-art solution, reference [1] proposed a quantile-of-quantiles (QQ) scheme, referred to as FedCP-QQ, whereby each device computes and communicates a pre-determined quantile of its local losses. However, existing work assumed noise-free communication between the server and the devices, whereby each device can communicate a single real number to the server.

Wireless Federated Conformal Prediction

In our recent work, to appear in Transactions on Signal Processing, we study for the first time federated CP in a wireless setting, as illustrated in Fig. 1. Specifically, we introduce a novel protocol, termed wireless federated conformal prediction (WFCP), which builds on type-based multiple access (TBMA) and on a novel quantile correction scheme.

Fig. 1. Illustration of the wireless reliable federated inference problem under study.

TBMA is a multiple access scheme that aims at recovering aggregated statistics, rather than individual messages [3]. Noting that federated CP also requires an aggregated statistic across the devices, namely a quantile, we propose to apply TBMA to WFCP. More precisely, as illustrated in Fig. 2, TBMA enables the estimation of the global histogram of the data available across all devices without having to separately estimate the histogram of each device. Specifically, each histogram bin is assigned an orthogonal codeword, and the server can estimate the global histogram thanks to the superposition property of the wireless channel. In this way, WFCP enables a direct estimate of the global quantile at the server without imposing bandwidth requirements that scale linearly with the number of active devices, as FedCP-QQ does. Rather, the communication requirements of WFCP are dictated only by the precision with which the signals are represented for transmission to the server, i.e., by the length of each codeword.

Fig. 2. Illustration of the TBMA enabled communication model.
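The following Python sketch conveys the TBMA aggregation idea in a highly simplified form (no fading, no coding); it is illustrative only, and the actual transmission model and the quantile correction of WFCP are detailed in the paper.

```python
import numpy as np

# Illustrative sketch of TBMA-based estimation of a global quantile.
# Each device maps its local nonconformity scores to a histogram over M bins
# (one codeword per bin); all devices transmit simultaneously, so the server
# receives a noisy superposition, i.e., an estimate of the global histogram.

def tbma_global_quantile(local_scores_per_device, bin_edges, alpha,
                         noise_std=0.1, rng=np.random.default_rng(0)):
    M = len(bin_edges) - 1
    # Each device's contribution: its local histogram of scores.
    local_hists = [np.histogram(s, bins=bin_edges)[0]
                   for s in local_scores_per_device]
    # Over-the-air superposition: bin-wise sum of all devices' histograms,
    # corrupted by additive channel noise (fading is omitted in this sketch).
    rx = np.sum(local_hists, axis=0) + noise_std * rng.standard_normal(M)
    global_hist = np.clip(rx, 0.0, None)
    # Empirical (1 - alpha) quantile read off the estimated global histogram.
    cdf = np.cumsum(global_hist) / np.sum(global_hist)
    k = int(np.searchsorted(cdf, 1.0 - alpha))
    return bin_edges[min(k + 1, M)]  # upper edge of the quantile bin
```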

The other key technical challenge tackled in our work is the derivation of a novel quantile correction approach that ensures the reliability of the set predictor despite the presence of channel noise.

Experiments

We evaluate the proposed WFCP on the CIFAR-10 data set over Rayleigh fading channels. We show here one result that illustrates the performance gains of WFCP in the presence of limited communication resources. In Fig. 3, we evaluate the performance of WFCP and of our implementation of the existing FedCP-QQ (DQQ) over wireless channels, using finite-blocklength information theory, as a function of the SNR. As the SNR increases, both WFCP and DQQ maintain the target reliability level while offering a decreasing prediction set size. Across all SNRs, WFCP generates a more informative prediction set than DQQ, and it approaches the performance of centralized CP. Please refer to our paper for more details.

 

Fig. 3. Empirical coverage and normalized empirical inefficiency of centralized CP, WFCP, and digital implementation of existing FedCP-QQ [1].

References

[1] P. Humbert, B. Le Bars, A. Bellet, and S. Arlot, “One-shot federated conformal prediction,” ICML 2023

[2] C. Lu and J. Kalpathy-Cramer, “Distribution-free federated learning with conformal predictions,” arXiv:2110.07661, 2021

[3] G. Mergen and L. Tong, “Type based estimation over multiaccess channels,” IEEE Transactions on Signal Processing, 2006.

Safe Model Predictive Control via Reliable Time-Series Forecasting

Motivation

The control of dynamical systems is the backbone of modern technologies, ranging from industrial processes to autonomous vehicles. In many of these scenarios, systems must be controlled while satisfying a set of safety and reliability constraints with respect to the unknown evolution of a target process. For example, as illustrated in Figure 1, autonomous vehicles or unmanned aerial vehicles (UAVs) must plan their trajectory while maintaining a safe distance from other vehicles or obstacles. To this end, predictions about the future evolution of the system must be used. In this context, a primary challenge is to ensure safety and reliability in the face of predictions that are often uncertain.

Figure 1: UAV tracking problem, an example of model predictive control in which the UAV must plan its path based on the unknown evolution of the object to be tracked.

Probabilistic Time Series-Conformal Risk Prediction

To support the deployment of reliable control mechanisms for dynamical systems, in our recent work we have proposed probabilistic time series-conformal risk prediction (PTS-CRC). PTS-CRC is a novel post-hoc calibration procedure that operates on the predictions produced by any pre-designed probabilistic forecaster to yield reliable time series prediction sets. As illustrated in Figure 2, PTS-CRC generates predictive sets based on an ensemble of multiple prototype trajectories sampled from the probabilistic model, supporting the efficient representation of forking uncertainties. This contrasts with previous solutions that apply conformal prediction [1] to deterministic predictors (TS-CP) [2], which are limited to producing compact prediction sets. Furthermore, the sets produced by PTS-CRC can be calibrated to satisfy a wide array of reliability definitions, beyond the standard notion of coverage.

Figure 2: Construction of a prototype-based set predictor based on 3 prototypical sequences.
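Below is a minimal Python sketch of a prototype-based prediction set in the spirit of PTS-CRC; it is illustrative only, and the hypothetical sample_trajectories function stands in for any probabilistic sequence forecaster. The exact construction and calibration are described in [3].

```python
import numpy as np

# Illustrative prototype-based prediction set (PTS-CRC-style).
# sample_trajectories(history, K) is assumed to return K sampled future
# trajectories, each an array of shape (horizon,).

def prototype_set(history, sample_trajectories, lam, K=3):
    prototypes = np.stack(sample_trajectories(history, K))   # (K, horizon)
    # Per-step prediction set: union of K bands of half-width lam centered at
    # the prototypes, so it can be multimodal (disjoint), unlike the single
    # interval produced by applying CP to a deterministic forecast.
    lower = prototypes - lam
    upper = prototypes + lam
    return lower, upper

def covered(lower, upper, trajectory):
    # A future value is covered if it falls inside at least one prototype band.
    hit = (trajectory >= lower) & (trajectory <= upper)       # (K, horizon)
    return hit.any(axis=0)                                    # per-step coverage

# The radius lam is then calibrated offline, CRC-style, as the smallest value
# for which the empirical risk (e.g., the miscoverage rate of `covered`) on
# held-out sequences stays below the target level alpha.
```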

PTS-CRC Based Model Predictive Control

Based on the reliability properties of PTS-CRC predictions, we devise a novel model predictive control (MPC) framework that addresses open-loop and closed-loop control problems under general average constraints on the quality or safety of the control policy. The key idea is to derive the control by replacing constraints that depend on the unknown dynamics of the target process with constraints that depend on the predictive sets output by PTS-CRC. The reliability requirements on the PTS-CRC predictions then translate into reliability requirements for the original control problem.
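As a simplified illustration of this idea (anticipating the power-control experiment below), the following Python sketch replaces an unknown interference channel gain with its worst case over a calibrated prediction set when selecting a transmit power; the variable names and the constraint form are assumptions made for the example, not the paper’s exact formulation.

```python
import numpy as np

# Illustrative constraint substitution in PTS-CRC-based MPC:
# the unknown interference gain g_{t+1} is replaced by the worst case over
# the calibrated prediction set when enforcing the safety constraint.

def safe_power(pred_set_gains, i_max, p_max):
    """Largest transmit power whose interference is safe for every gain in the set."""
    g_worst = np.max(pred_set_gains)      # worst-case gain in the prediction set
    return min(p_max, i_max / g_worst)    # ensures p * g <= i_max for all g in the set

# Because the set contains the true gain at the calibrated reliability level,
# the interference constraint of the original control problem inherits the
# same reliability guarantee.
```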

Experiments

We apply PTS-CRC-based MPC to wireless networking problems, specifically focusing on a scenario in which a base station must modulate its future power allocation based on the unknown evolution of the channel conditions. For instance, we address the challenge of controlling the transmit power so as to maximize the communication rate of an unlicensed user while adhering to a safety requirement, expressed as a maximum interference level experienced by a licensed user. By employing PTS-CRC, we can replace the unknown system evolution with efficient multimodal predictive sets that capture the multimodal channel evolution more effectively than TS-CP (Figure 3). As exemplified in Figure 4, PTS-CRC-based power control leads to power allocations that achieve a higher communication rate compared to TS-CP.

Figure 3: Comparison between the prediction sets of TS-CP and PTS-CRC for the problem of channel gain evolution forecasting.

Figure 4: Comparison between the power control solution obtained using PTS-CRC and TS-CP based MPC.

References

[1] Vovk, Vladimir, Alexander Gammerman, and Glenn Shafer. “Algorithmic learning in a random world,” Vol. 29. New York: Springer, 2005.

[2] Stankeviciute, Kamile, Ahmed M Alaa, and Mihaela van der Schaar. “Conformal time-series forecasting.” Advances in neural information processing systems 34, 2021.

[3] Zecchin, Matteo, Sangwoo Park, and Osvaldo Simeone. “Forking Uncertainties: Reliable Prediction and Model Predictive Control with Sequence Models via Conformal Risk Control.” arXiv preprint arXiv:2310.10299, 2023.

Time-Varying Quantum Channel Simulation via Programmable Quantum Computers using Online Learning

A quantum computer can be programmed to carry out a given functionality in different ways, including the direct engineering of pulse sequences, the design of parametric quantum circuits via quantum machine learning (QML) [1], the use of adaptive measurements on cluster states, and the optimization of a program state operating on a fixed quantum processor. A fundamental result derived in [2] states that there is no universal programmable quantum processor that operates with finite-dimensional program states. Since a quantum processor is universal if it can implement any quantum operation, this conclusion implies that the exact simulation of an arbitrary quantum channel on a single programmable quantum processor is impossible. This, in turn, highlights the importance of developing tools for the optimization of quantum programs. Reference [3] addressed the problem of approximately simulating a given quantum channel using a finite-dimensional program state.

 

Fig. 1: Time-varying quantum channel ε^t (top) and its simulation ε_(π^t) via a programmable quantum processor Q controlled by the time-varying program state π^t (bottom).

 

In our recent work, presented at IEEE ITW 2023, we study the more challenging setting illustrated in Fig. 1, in which the channel to be simulated varies over time. We adopt a worst-case formulation in which the channel variation is arbitrary and possibly chosen by “nature” in an adversarial way. To study this setting, we adopt the framework of online convex optimization [4], which provides tools to track the optimal solution of time-varying convex problems. We specifically develop and analyze an online mirror descent algorithm over the space of positive definite matrices, yielding a matrix exponentiated gradient descent (MEGD) algorithm [5]. We prove that the regret of MEGD with respect to an optimized fixed program state is sublinear in time.
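For concreteness, here is a minimal Python sketch of the MEGD update over density matrices, in the spirit of [5]; the per-step losses and their gradients are placeholders for the channel-simulation error considered in the paper.

```python
import numpy as np
from scipy.linalg import expm, logm

# Illustrative matrix exponentiated gradient descent (MEGD) over program
# states (density matrices): pi_{t+1} is proportional to exp(log pi_t - eta * G_t).

def megd(grad_loss_fns, dim, eta=0.1):
    """Run online MEGD for a sequence of per-step gradient functions."""
    pi = np.eye(dim, dtype=complex) / dim        # maximally mixed initial program state
    trajectory = [pi]
    for grad_t in grad_loss_fns:                 # one (possibly adversarial) loss per step
        G = grad_t(pi)                           # Hermitian gradient of the loss at pi
        M = logm(pi) - eta * G
        M = (M + M.conj().T) / 2                 # enforce Hermiticity numerically
        pi = expm(M)
        pi = pi / np.trace(pi).real              # renormalize to unit trace
        trajectory.append(pi)
    return trajectory

# Example: tracking a slowly varying target state sigma_t under the quadratic
# loss ||pi - sigma_t||_F^2, whose gradient is 2 * (pi - sigma_t).
```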

 

Experiments

Fig. 2: Generalized teleportation processor as a programmable processor Q operating on one input qubit (n=1) and on a two-qubit program state π (n_π=2).

We conduct experiments by adopting the generalized teleportation processor (GTP), shown in Fig. 2, as the programmable quantum processor. The GTP can exactly simulate the class of teleportation-covariant channels, which includes Pauli and erasure channels, and it is operated here in an adversarial setting with time-varying dephasing channels. Fig. 3 plots the normalized regret as a function of time T. We observe that MEGD achieves a normalized regret that decreases with T, consistent with the sublinear regret guarantee, hence approaching the performance of the reference program that would have been optimal in hindsight.

 

Fig. 3: Normalized regret as a function of time T for MEGD when simulating a time-varying dephasing channel with dephasing probabilities drawn independently and uniformly at each time in the interval [0.2,p_max) (setting p_max=0.2 models a constant channel).

[1] O. Simeone, “An introduction to quantum machine learning for engineers,” Foundations and Trends in Signal Processing, vol. 16, no. 1-2, pp. 1–223, 2022.

[2] M. A. Nielsen and I. L. Chuang, “Programmable quantum gate arrays”, Phys. Rev. Lett., vol. 79, pp. 321–324, Jul 1997.

[3] L. Banchi, J. Pereira, S. Lloyd, and S. Pirandola, “Convex optimization of programmable quantum computers”, npj Quantum Information, vol. 6, no. 1, pp. 1–10, 2020.

[4] F. Orabona, “A modern introduction to online learning”, CoRR, vol. abs/1912.13213, 2019.

[5] K. Tsuda, G. Ratsch, and M. K. Warmuth, “Matrix exponentiated gradient updates for on-line learning and Bregman projection”, Journal of Machine Learning Research, vol. 6, no. 34, pp. 995–1018, 2005.

Distributed Quantum Entanglement Distillation via Quantum Machine Learning

Quantum networking, and with it the quantum Internet, rely on the management and exploitation of entanglement. In fact, entangled qubits enable fundamental quantum communication primitives such as teleportation and superdense coding. Practical sources of entangled qubits, such as those based on single-photon detection, are imperfect, producing mixed states with reduced fidelity as compared to ideal, fully entangled Bell pairs. In order to enhance the fidelity of the entangled qubits available at distributed parties, entanglement distillation protocols leverage local operations and classical communication (LOCC). While existing solutions, such as the DEJMPS protocol [1] and LOCCNet [2], assume ideal classical communications, we study the case in which the communication between the parties holding imperfectly entangled qubits is noisy. As illustrated in Fig. 1, to address this more challenging scenario, we propose the use of quantum machine learning (QML) [3] via parameterized quantum circuits (PQCs).

 

Fig. 1: Entanglement distillation at two quantum-enabled devices (Alice and Bob) aided by a noisy classical communication channel to a third party (Charlie).

 

Noise Aware-LOCCNet (NA-LOCCNet)

In our recent work, accepted for presentation at IEEE ICASSP 2023, we introduce NA-LOCCNet, shown in Fig. 2, which improves the average output fidelity while accounting for the classical channel errors.

 

Fig. 2: Proposed Noise Aware-LOCCNet (NA-LOCCNet) circuit for distilling two S states.

 

Experiments

Fig. 3 plots the average output fidelity as a function of the bit-flip probability of the noisy channel for a given input fidelity, whereas Fig. 4 plots the same quantity as a function of the input fidelity for a given bit-flip probability. In both figures, NA-LOCCNet performs far better than state-of-the-art protocols.

 

Fig. 3: Average output fidelity as a function of the bit flip probability p of the noisy classical channels from Alice and Bob to Charlie for input fidelity F = 0.6.

 

Fig. 4: Average output fidelity, conditioned on a successful distillation, as a function of the input fidelity F for bit flip probability p = 0.25 on the noisy classical channels from Alice and Bob to Charlie. The black dashed line corresponds to the reference performance of a scheme that simply outputs the input state.

 

In another recent work, published in Entropy, we have extended the NA-LOCCNet framework to the problem of quantum state discrimination.

 

[1] D. Deutsch, A. Ekert, R. Jozsa, C. Macchiavello, S. Popescu, and A. Sanpera, “Quantum privacy amplification and the security of quantum cryptography over noisy channels”, Phys. Rev. Lett., vol. 77, pp. 2818– 2821, Sep 1996.

[2] X. Zhao, B. Zhao, Z. Wang, Z. Song, and X. Wang, “Practical distributed quantum information processing with LOCCNet,” Quantum Information, vol. 7, no. 1, pp. 1–7, 2021.

[3] O. Simeone, “An introduction to quantum machine learning for engineers,” Foundations and Trends in Signal Processing, vol. 16, no. 1-2, pp. 1–223, 2022.

How to Turn an Unreliable Predictor into a Reliable Scheduler

Motivation

Servicing ultra-reliable and low-latency communication (URLLC) traffic typically calls for a pre-emptive allocation of resources in order to meet stringent delay constraints. A conservative static allocation of resources for URLLC may guarantee desired levels of reliability and latency, but this comes at the expense of other services, most notably enhanced mobile broadband (eMBB), which cannot use the resources reserved for URLLC. A dynamic allocation of resources, while potentially more efficient, is made challenging by the stochastic nature of URLLC data packet generation. A promising solution is the adoption of predictors of URLLC data packet generation. Concretely, with reference to Fig. 1, a base station can deploy a predictor of URLLC data packet generation for the following frame, so as to guide the adaptive allocation of slots for URLLC packets, leaving the other slots available for eMBB users.

 

Background

URLLC traffic

URLLC traffic must satisfy two requirements:

  1. Ultra-reliability – a fraction of at least 1-α of all generated packets must be scheduled for transmission.
  2. Low latency – each packet must be allocated a unique transmission resource no later than a predefined acceptable latency.

Fig. 1 (a) Ground-truth URLLC traffic generation patterns; (b) a predictor that underestimates the traffic dynamics leads to unreliable URLLC allocation, which the CP-based scheduler is able to compensate for; (c) a predictor that overestimates the traffic dynamics leads to over-reliable URLLC allocation, i.e., low eMBB efficiency, which the CP-based scheduler is also able to compensate for.

Online Conformal Prediction

CP is a class of post-hoc calibration methods that transform a standard probabilistic model into a set predictor that is guaranteed to contain the true target with probability no smaller than a predetermined coverage level [1]. Online CP alleviates the limitation of conventional CP of requiring separate calibration data, at the cost of providing time-averaged, rather than ensemble, reliability guarantees [2,3]. The adoption of CP in communication engineering was proposed in the ICASSP 2023 work by Cohen et al. (see the post below), which focused on wireless applications such as symbol demodulation, modulation classification, and received signal strength prediction.

Guaranteed Dynamic Scheduling

In our new work [4], accepted at IEEE Signal Processing Letters, we introduce a novel scheduler for URLLC packets that provides formal guarantees on reliability and latency irrespective of the quality of the URLLC traffic predictor.

Fig. 2(a) illustrates the frame-based segmentation of time. Fig. 2(b) shows four generated URLLC packets and six pre-emptively allocated URLLC resources; yet the latest packet is not allocated a resource within the allowed latency. In contrast, Fig. 2(c) shows an allocation that meets the constraints, even though the number of URLLC resources is smaller, leaving a larger portion of resources for eMBB traffic.

Fig. 2 (a) Frame-based timeline; (b) miscovered allocation; (c) well-covered allocation.

 

 

The proposed method leverages recent advances in online CP, and follows the principle of dynamically adjusting the amount of allocated resources so as to meet the reliability and latency requirements set by the designer. To this end, we adjust a threshold across frames on the basis of a reliability condition; the threshold controls how conservative the predictor is for the next frame.
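A minimal Python sketch of this frame-by-frame adjustment is given below; it is illustrative only, the exact update rule and its guarantees being those of [4]. The predictor interface and the reliability signal are hypothetical placeholders.

```python
import numpy as np

# Illustrative frame-by-frame threshold adaptation for URLLC scheduling.
# `scores` is one predicted score per slot (higher = more likely URLLC arrival);
# `was_reliable` is 1 if at least a fraction 1 - alpha of the previous frame's
# URLLC packets were served within the latency budget, else 0.

def schedule_next_frame(scores, theta):
    # Reserve every slot whose predicted score exceeds the current threshold;
    # the remaining slots are left to eMBB traffic.
    return scores >= theta

def update_threshold(theta, was_reliable, alpha, gamma=0.05):
    # Online-CP-style update: if the last frame missed its reliability target,
    # lower the threshold (reserve more slots, i.e., be more conservative);
    # otherwise raise it and return resources to eMBB.
    err = 1 - was_reliable
    return theta - gamma * (err - alpha)
```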

 

Experiments

We consider two mismatched predictors: the first underestimates the dynamics of the URLLC traffic, while the second overestimates them.

 

Fig. 3 investigates the impact of such mismatches between the URLLC model parameter and the ground-truth model parameter. The conventional scheduler is significantly affected by a mismatch between the predictor and the ground-truth packet generation mechanism, yielding either insufficient empirical coverage (below 1-α) or over-coverage, i.e., low eMBB efficiency. In contrast, the CP-based scheduler flattens the coverage so as to asymptotically reach the long-term target 1-α.

 

Fig. 3 Empirical URLLC reliability rate and eMBB efficiency. The CP-based scheduler flattens out the coverage.

 

Full details can be found in the SPL preprint [4].

 

[1] Vovk, Vladimir, Alexander Gammerman, and Glenn Shafer. “Algorithmic learning in a random world,” Vol. 29. New York: Springer, 2005.

[2] Gibbs, Isaac, and Emmanuel Candes. “Adaptive conformal inference under distribution shift.” Advances in Neural Information Processing Systems 34 (2021): 1660-1672.

[3] Feldman, Shai, Stephen Bates, and Yaniv Romano. “Conformalized Online Learning: Online Calibration Without a Holdout Set.” arXiv preprint arXiv:2205.09095 (2022).

[4] Cohen, Kfir M., Sangwoo Park, Osvaldo Simeone, Petar Popovski, and Shlomo Shamai. “Guaranteed Dynamic Scheduling of Ultra-Reliable Low-Latency Traffic via Conformal Prediction.” To appear in Signal Processing Letters, [online] arXiv preprint arXiv:2302.07675 (2023).

Making a Demodulator Trustworthy via Conformal Prediction

Motivation

Artificial intelligence (AI) models typically report a confidence measure associated with each prediction, which reflects the model’s self-evaluation of the accuracy of a decision. Notably, neural networks implement probabilistic predictors that produce a probability distribution across all possible values of the output variable. As an example, Fig. 1 illustrates the operation of a neural network-based demodulator, which outputs a probability distribution over the constellation points given the corresponding received baseband sample. The self-reported model confidence, however, may not be a reliable measure of the true, unknown, accuracy of the prediction, in which case we say that the AI model is poorly calibrated. Poor calibration may be a substantial problem when AI-based decisions are processed within a larger system, such as a communication network.

 

Fig. 1 Accuracy and calibration are different properties of probabilistic predictors.

Set Predictors

A set predictor is defined as a set-valued function that maps an input to a subset of the output domain based on a data set. As illustrated in the example of Fig. 1, the predicted set depends in general on the input, and its size can be taken as a measure of the uncertainty of the predictor. The performance of a set predictor is evaluated in terms of coverage and inefficiency: coverage refers to the probability that the true label is included in the predicted set, while inefficiency refers to the average size of the predicted set. There is a clear trade-off between the two metrics.

Given a probabilistic predictor, one can construct a set predictor by relying on the confidence levels reported by the model. To this end, one can construct the smallest subset of the output domain that covers a fraction 1 − α of the probability assigned by the trained model given an input. For poorly calibrated predictors, however, this approach is not guaranteed to satisfy the coverage condition for the desired miscoverage level α.
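For concreteness, here is a minimal Python sketch of this naïve construction: it accumulates the most probable constellation points until their total self-reported probability reaches 1 − α.

```python
import numpy as np

# Naive set predictor built from self-reported confidence levels: keep the most
# probable constellation points until their cumulative probability reaches
# 1 - alpha. This is the baseline that fails to cover when the demodulator is
# poorly calibrated.

def naive_set(probs, alpha=0.1):
    """probs: array of model probabilities over the constellation points."""
    order = np.argsort(probs)[::-1]            # most probable symbols first
    cum = np.cumsum(probs[order])
    k = int(np.searchsorted(cum, 1.0 - alpha)) + 1
    return order[:k]                           # indices of the predicted symbols
```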

 

Conformal Prediction

In our new work [3], presented at ICASSP 2023, we applied three different conformal prediction schemes to a demodulation problem:

  1. Validation-based (VB) [1] – which partitions the available data set into a training set and a validation set. The first is used to train a model, and the second for calibration purposes.
  2. Cross-validation-based (CV) [2] – which trains multiple models, each using all the available data except for one data point, which acts as a validation example. While increasing the computational complexity, this approach generally reduces the inefficiency of the predictive sets.
  3. K-fold CV-based (K-CV) [2] – which cross-validates using folds rather than single points: K different models are trained using a leave-fold-out approach. This generalization of CV-CP set predictors strikes a balance between complexity and inefficiency by reducing the total number of model training rounds to K.

 

Experiments

Fig. 2 shows the empirical coverage level and Fig. 3 shows the empirical inefficiency as a function of the size N of the available data set D. From Fig. 2, we first observe that the naïve set predictor, with both frequentist and Bayesian learning, does not meet the desired coverage level in the regime of a small number N of available samples. In contrast, all CP methods provide coverage guarantees, achieving coverage rates of at least 1 − α. From Fig. 3, we observe that the size of the predicted sets, and hence the inefficiency, decreases as the data set size increases. Furthermore, due to their more efficient use of the available data, CV and K-CV predictors have a lower inefficiency than VB predictors. Finally, Bayesian NC scores are generally seen to yield set predictors with lower inefficiency, confirming the merits of Bayesian learning in terms of calibration.

Overall, the experiments confirm that all the CP-based set predictors are well calibrated with a small average prediction set size, unlike naïve set predictors built directly on the self-reported confidence levels of conventional probabilistic predictors.

Fig. 2 Empirical coverage as function of data set size

Fig. 3 Empirical inefficiency as function of data set size

 

 

Please see the preprint of the ICASSP 2023 paper for full details.

 

[1] Vovk, Vladimir, Alexander Gammerman, and Glenn Shafer. “Algorithmic learning in a random world,” Vol. 29. New York: Springer, 2005.

[2] Barber, Rina Foygel, Emmanuel J. Candes, Aaditya Ramdas, and Ryan J. Tibshirani. “Predictive inference with the jackknife+.” The Annals of Statistics, vol. 49, no. 1, pp. 486–507, 2021.

[3] Cohen, Kfir M., Park, Sangwoo, Simeone, Osvaldo, and Shamai, Shlomo (Shitz). “Calibrating AI Models for Wireless Communications via Conformal Prediction,” to appear in ICASSP 2023. [Online]. Available: https://arxiv.org/abs/2212.07775

 
