
Quantile Learn-Then-Test: Quantile-Based Risk Control for Hyperparameter Optimization

Motivation

Hyperparameter optimization (HPO) is essential in tuning artificial intelligence (AI) models for practical engineering applications, as it governs model performance across varied deployment scenarios. Conventional HPO techniques such as random search and Bayesian optimization often focus on optimizing average performance without providing statistical guarantees, which can be limiting in high-stakes engineering tasks where system reliability is crucial. The learn-then-test (LTT) method [1], introduced in recent research, offers statistical guarantees on the average risk associated with selected hyperparameters. However, in fields like wireless networks and real-time systems, designers frequently need assurance that a specified quantile of performance will meet reliability thresholds.

To address this need, our proposed method, Quantile Learn-Then-Test (QLTT), extends LTT to offer statistical guarantees on quantiles of risk rather than just the average. This quantile-based approach provides greater robustness in real-world applications where it’s critical to control risk-aware objectives, ensuring that the system meets performance goals in a specified fraction of scenarios.

Quantile Learn-Then-Test (QLTT)

LTT, as introduced in [1], guarantees that the average risk remains within a defined threshold with high probability. However, many real-world applications require tighter control over performance measures. For instance, in cellular network scheduling, system designers may need to ensure that key performance indicators (KPIs) like latency and throughput stay within acceptable limits for a majority of users, not just on average.

Our approach, QLTT, extends LTT to provide guarantees on any specified quantile of risk. Specifically, QLTT selects hyperparameters that ensure a predefined quantile of the risk distribution meets a target threshold. This probabilistic guarantee, based on quantile risk control, better aligns with the needs of applications where performance variability is critical.

Methodology

QLTT builds on LTT’s multiple hypothesis testing framework, incorporating a quantile-specific confidence interval, obtained using [2], to achieve guarantees on the desired quantile of risk. The method takes a set of hyperparameter candidates and identifies those that meet the desired quantile threshold with high probability, enhancing reliability beyond what is possible through average risk control alone. This quantile-based approach enables QLTT to adapt to varying risk tolerance levels, making it versatile for different engineering contexts.
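To make this concrete, the following minimal sketch shows one way such a calibration loop could look, assuming i.i.d. risk samples per candidate and an order-statistic confidence bound in the spirit of [2]; all function names are illustrative, and the actual QLTT procedure may differ in its choice of confidence interval and testing order.

```python
import numpy as np
from scipy.stats import binom

def quantile_ucb(risks, q, delta):
    """Distribution-free (1 - delta)-confidence upper bound on the q-quantile
    of the risk, from order statistics of n i.i.d. risk samples."""
    n, sorted_r = len(risks), np.sort(risks)
    for k in range(1, n + 1):
        # the k-th order statistic upper-bounds the q-quantile
        # with probability at least Binom.cdf(k - 1; n, q)
        if binom.cdf(k - 1, n, q) >= 1 - delta:
            return sorted_r[k - 1]
    return np.inf  # too few samples for a valid bound at this (q, delta)

def qltt_select(candidates, risk_samples, q=0.9, alpha=0.1, delta=0.05):
    """Fixed-sequence testing over a pre-ordered candidate list: certify a
    hyperparameter if the bound on its q-quantile of risk is below alpha."""
    selected = []
    for lam in candidates:  # candidates ordered by expected promise beforehand
        if quantile_ucb(risk_samples(lam), q, delta) <= alpha:
            selected.append(lam)
        else:
            break  # stopping at the first failure preserves the FWER
    return selected
```

Testing the candidates in a pre-specified order and stopping at the first failure avoids explicit multiplicity corrections while still controlling the family-wise error rate.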

Experiments

To demonstrate QLTT’s effectiveness, we applied it to a radio access scheduling problem in wireless communication [3]. Here, the task was to allocate limited resources among users with different quality of service (QoS) requirements, ensuring that latency requirements were met for the vast majority of users in real-time.

Our experimental results highlight QLTT’s advantage over LTT with respect to quantile control. While both methods controlled the average risk effectively, only QLTT managed to limit the higher quantiles of the risk distribution, reducing instances where latency exceeded critical thresholds.

The following figure compares the distributions of packet delays for conventional LTT and QLTT for a test run of the simulation. While LTT shows considerable variance, with some instances exceeding the desired threshold, QLTT consistently meets the reliability requirements by providing tighter control over risk quantiles.

Conclusion

QLTT extends the applicability of LTT by providing hyperparameter sets with guarantees on quantiles of a risk measure, thus offering a more rigorous approach to HPO for risk-sensitive engineering applications. Our experiments confirm that QLTT effectively addresses scenarios where quantile risk control is required, providing a robust solution to ensure high-confidence performance across diverse conditions.

Future work may explore expanding QLTT to more complex settings, such as other types of risk functionals and broader engineering challenges. By advancing risk-aware HPO, QLTT represents a significant step toward reliable, application-oriented AI optimization in critical industries.

References

[1] Angelopoulos, A.N., Bates, S., Candès, E.J., Jordan, M.I., & Lei, L. (2021). Learn then test: Calibrating predictive algorithms to achieve risk control. arXiv preprint arXiv:2110.01052.

[2] Howard, S.R., & Ramdas, A. (2022). Sequential estimation of quantiles with applications to A/B testing and best-arm identification. Bernoulli, 28(3), 1704–1728.

[3] De Sant Ana, P.M., & Marchenko, N. (2020). Radio Access Scheduling using CMA-ES for Optimized QoS in Wireless Networks. IEEE Globecom Workshops (GC Wkshps), pp. 1-6.

Statistically Valid Information Bottleneck via Multiple Hypothesis Testing

Motivation

In machine learning, the information bottleneck (IB) problem [1] is a critical framework used to extract compressed features that retain sufficient information for downstream tasks. However, a major challenge lies in selecting hyperparameters that ensure the learned features comply with information-theoretic constraints. Current methods rely on heuristic tuning without providing guarantees that the chosen features satisfy these constraints. This lack of rigor can lead to suboptimal models. For example, in the context of language model distillation, failing to enforce these constraints may result in the distilled model losing important information from the teacher model.

Our proposed method, “IB via Multiple Hypothesis Testing” (IB-MHT), addresses this issue by introducing a statistically valid solution to the IB problem. We ensure that the features learned by any IB solver meet the IB constraints with high probability, regardless of the dataset size. IB-MHT builds on Pareto testing [2] and learn-then-test (LTT) [3] methods to wrap around existing IB solvers, providing statistical guarantees on the information bottleneck constraints. This approach offers robustness and reliability compared to conventional methods that may not meet these constraints in practice.

IB-MHT

In the traditional IB framework, we aim to minimize the mutual information between the input data X and a compressed representation T, while ensuring that T retains sufficient information about a target variable Y. This is expressed mathematically as minimizing I(X;T) under the constraint that I(T;Y) exceeds a certain threshold. In practice, though, solving this problem often relies on tuning a Lagrange multiplier or hyperparameters to balance the compression of T and the information retained about Y. These approaches do not guarantee that the solution will meet the required information-theoretic constraints.
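In symbols, with β denoting the assumed information threshold, the constrained program reads:

$$\min_{p(t \mid x)} \; I(X;T) \quad \text{subject to} \quad I(T;Y) \geq \beta.$$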

To overcome this, IB-MHT introduces a probabilistic approach where we wrap around any existing IB solver to ensure that the learned features satisfy the IB constraint with high probability. By leveraging Pareto testing, IB-MHT identifies the optimal hyperparameters through a family-wise error rate (FWER) testing mechanism, ensuring that the final solution is statistically sound.
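A rough sketch of this wrapper logic is given below, under the simplifying assumption that the constraint can be expressed through bounded per-sample losses whose mean tracks the violation of the I(T;Y) requirement; all function names are illustrative, and the exact p-values and ordering used by IB-MHT may differ.

```python
import numpy as np

def hoeffding_pvalue(losses, alpha):
    """LTT-style p-value for H0: E[loss] > alpha, valid for losses in [0, 1]."""
    m, lhat = len(losses), float(np.mean(losses))
    return 1.0 if lhat >= alpha else float(np.exp(-2.0 * m * (alpha - lhat) ** 2))

def ib_mht(candidates, info_loss, compression, opt_data, cal_data, alpha, delta):
    """Pareto-testing-style wrapper (sketch): one data split orders the
    candidates (most promising first), the other is used for fixed-sequence
    testing of the IB constraint, controlling the FWER at level delta."""
    order = sorted(candidates, key=lambda lam: compression(lam, opt_data))
    certified = []
    for lam in order:
        # info_loss returns per-sample losses in [0, 1] quantifying how much
        # the I(T;Y) constraint is violated (an assumption of this sketch)
        if hoeffding_pvalue(info_loss(lam, cal_data), alpha) <= delta:
            certified.append(lam)
        else:
            break  # stop at the first non-rejection
    return certified
```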

Experiments

To validate the effectiveness of IB-MHT, we conducted experiments on both classical and deterministic IB [4] formulations. One experiment was performed on the MNIST dataset, where we applied IB-MHT to ensure that the learned representations met the IB constraints with high probability. In another experiment, we applied IB-MHT to the task of distilling language models, transferring knowledge from a large teacher model to a smaller student model. We demonstrated that IB-MHT successfully guarantees that the compressed features retain sufficient information about the target variable. Compared to conventional IB methods, IB-MHT showed significant improvements in both the reliability and consistency of the learned representations, with reduced variability in the mutual information estimates.

The following figure illustrates the difference between the performance of conventional IB solvers and IB-MHT in a classical IB setup. While the conventional solver shows a wide variance in the mutual information values, IB-MHT provides tighter control, ensuring that the learned representation T meets the desired information-theoretic constraints.

Conclusion

IB-MHT introduces a reliable, statistically valid solution to the IB problem, addressing the limitations of heuristic hyperparameter tuning in existing methods. By guaranteeing that the learned features meet the required information-theoretic constraints with high probability, IB-MHT enhances the robustness and performance of IB solvers across a range of applications. Future work can explore extending IB-MHT to continuous variables and applying similar techniques to other information-theoretic objectives such as convex divergences.

References

[1] Naftali Tishby, Fernando Pereira, and William Bialek. The information bottleneck method. Proceedings of the 37th Allerton Conference on Communication, Control, and Computing, 1999.

[2] Laufer-Goldshtein, Bracha, Adam Fisch, Regina Barzilay, and Tommi Jaakkola. Efficiently controlling multiple risks with Pareto testing. International Conference on Learning Representations, 2023.

[3] Angelopoulos, Anastasios N., Stephen Bates, Emmanuel J. Candès, Michael I. Jordan, and Lihua Lei. Learn then test: Calibrating predictive algorithms to achieve risk control. arXiv preprint arXiv:2110.01052, 2021.

[4] Strouse, Daniel, and David Schwab. The deterministic information bottleneck. Neural Computation, 2017.

Neuromorphic Wireless Split Computing with Wake-Up Radios

Context and Motivations

Neuromorphic processing units (NPUs), such as Intel’s Loihi or BrainChip’s Akida, leverage the sparsity of temporal data to reduce processing energy by activating a small subset of neurons and synapses at each time step. When deployed for split computing in edge-based systems, remote NPUs, each carrying out part of the computation, can reduce the communication power budget by communicating asynchronously using sparse impulse radio (IR) waveforms [1-2], a form of ultra-wide bandwidth (UWB) spread-spectrum signaling.

However, the power savings afforded by sparse transmitted signals are limited to the transmitter's side, which emits impulsive waveforms only at the times of synaptic activations. The main contributor to the overall energy consumption remains the power required to keep the main radio on.

Architecture

To address this architectural problem, as seen in the figure above, our recent work [3-4] proposes a novel architecture that integrates a wake-up radio mechanism within a split computing system consisting of remote, wirelessly connected NPUs. In the proposed architecture, the NPU at the transmitter side remains idle until a signal of interest is detected by the signal detection module. Subsequently, a wake-up signal (WUS) is transmitted by the wake-up transmitter over the channel to the wake-up receiver, which activates the main receiver. The IR transmitter modulates the encoded signals from the NPU and sends them to the main receiver. The NPU at the receiver side then decodes the received signals and makes an inference decision.

Digital twin-aided design methodology with reliability guarantee

A key challenge in the design of wake-up radios is the selection of thresholds for sensing, WUS detection, and decision making (the three λ's in the figure above). A conventional solution would be to calibrate the thresholds via on-air testing, trying out different thresholds on the actual physical system. On-air calibration, however, is expensive in terms of spectral resources, and there is generally no guarantee that the selected thresholds would provide desirable performance levels for the end application.

To address this design problem, as illustrated in the figure below, this work proposes a novel methodology, dubbed DT-LTT, which leverages a digital twin, i.e., a simulator, of the physical system, coupled with a sequential statistical testing approach that provides theoretical reliability guarantees. Specifically, the digital twin is leveraged to pre-select a sequence of hyperparameters to be tested using on-air calibration via Learn then Test (LTT) [5]. The proposed DT-LTT calibration procedure is proved to guarantee the reliability of the receiver's decisions irrespective of the fidelity of the digital twin and of the data distribution.
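A minimal sketch of this two-stage logic follows; the function names, the Hoeffding-style p-value, and the assumption of bounded per-decision losses are our own illustrative choices, not the exact construction of [3].

```python
import numpy as np

def dt_ltt(candidates, dt_risk, on_air_losses, alpha=0.1, delta=0.05):
    """DT-LTT sketch: the digital twin ranks the candidate thresholds, and
    only this pre-selected sequence is tested on-air, in order, via LTT."""
    order = sorted(candidates, key=dt_risk)       # cheap ranking in simulation
    reliable = []
    for lam in order:                             # spend on-air samples in DT order
        losses = np.asarray(on_air_losses(lam))   # per-decision losses in [0, 1]
        n, lhat = len(losses), losses.mean()
        p = 1.0 if lhat >= alpha else np.exp(-2.0 * n * (alpha - lhat) ** 2)
        if p <= delta:
            reliable.append(lam)
        else:
            break  # FWER control holds even if the digital twin is inaccurate
    return reliable
```

Note that the digital twin only determines the order in which candidates are tested; the statistical guarantee comes entirely from the on-air testing stage, which is why it holds regardless of the twin's fidelity.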

Experiment

We compare the proposed DT-LTT calibration method with conventional neuromorphic wireless communications without wake-up radio, conventional LTT without a digital twin, and DT-LTT with an always-on main radio system. As shown in the figure below, the conventional calibration scheme fails to meet the reliability requirement, while the basic LTT scheme selects conservative hyperparameters, often including all classes in the predicted set, which results in zero expected loss. In contrast, the proposed DT-LTT schemes are guaranteed to meet the probabilistic reliability requirement.

References

[1] J. Chen, N. Skatchkovsky and O. Simeone, “Neuromorphic Wireless Cognition: Event-Driven Semantic Communications for Remote Inference,” in IEEE Transactions on Cognitive Communications and Networking, vol. 9, no. 2, pp. 252-265, April 2023.

[2] J. Chen, N. Skatchkovsky and O. Simeone, “Neuromorphic Integrated Sensing and Communications,” in IEEE Wireless Communications Letters, vol. 12, no. 3, pp. 476-480, March 2023.

[3] J. Chen, S. Park, P. Popovski, H. V. Poor and O. Simeone, “Neuromorphic Split Computing with Wake-Up Radios: Architecture and Design via Digital Twinning,” in IEEE Transactions on Signal Processing, Early Access, 2024.

[4] J. Chen, S. Park, P. Popovski, H. V. Poor and O. Simeone, “Neuromorphic Semantic Communications with Wake-Up Radios,” Proc. IEEE 25th International Workshop on Signal Processing Advances in Wireless Communications (SPAWC), Lucca, Italy, pp. 91-95, 2024.

[5] A. N. Angelopoulos, S. Bates, E. J. Candès, M. I. Jordan and L. Lei, “Learn Then Test: Calibrating Predictive Algorithms to Achieve Risk Control,” arXiv preprint arXiv:2110.01052, 2021.

Localized Adaptive Risk Control

Motivation

In many online decision-making settings, ensuring that predictions are well-calibrated is crucial for the safe operation of systems. One way to achieve calibration is through adaptive risk control, which adjusts the uncertainty estimates of a machine learning model based on past feedback [1]. This method guarantees that the calibration error over an arbitrary sequence is controlled and that, in the long run, the model becomes statistically well-calibrated if the data points are independently and identically distributed [2]. However, these schemes only ensure calibration when averaged across the entire input space, raising concerns about fairness and robustness. For instance, consider the figure below, which depicts a tumor segmentation model calibrated to identify potentially cancerous areas. If the model is calibrated using images from different datasets, marginal calibration may be achieved by prioritizing certain subpopulations at the expense of others.

A tumor segmentation model is calibrated using data from two sources to ensure that the marginal false negative rate (FNR) is controlled. However, as shown on the right, the error rate for one source is significantly lower than for the other, leading to unfair performance across subpopulations.

Localized Adaptive Risk Control

To address this issue, our recent work at NeurIPS 2024 proposes a method to localize uncertainty estimates by leveraging the connection between online learning in reproducing kernel Hilbert spaces [3] and online calibration methods. The key idea behind our approach is to use feedback to adjust a model’s confidence levels only in regions of the input space that are near observed data points. This allows for localized calibration, tailoring uncertainty estimates to specific areas of the input space. We demonstrate that, for adversarial sequences, the number of mistakes can be controlled. More importantly, the scheme provides asymptotic guarantees that are localized, meaning they remain valid under a wide range of covariate shifts, for instance those induced by considering certain subpopulations of the data.
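The following sketch illustrates one way such a localized update could look; the class structure, the RBF kernel, the shrinkage term, and the sign conventions are illustrative assumptions rather than the exact algorithm of the paper.

```python
import numpy as np

class LocalizedARC:
    """Sketch of localized adaptive risk control: a kernel expansion adjusts
    the calibration threshold only near inputs where feedback was received."""
    def __init__(self, alpha=0.1, eta=0.05, shrink=0.01, gamma=1.0):
        self.alpha, self.eta, self.shrink, self.gamma = alpha, eta, shrink, gamma
        self.theta = 0.0                   # global threshold offset
        self.points, self.coeffs = [], []  # support points and weights of f_t

    def _k(self, x, y):
        return np.exp(-self.gamma * np.sum((np.asarray(x) - np.asarray(y)) ** 2))

    def threshold(self, x):
        """Localized threshold theta + f_t(x), with f_t living in an RKHS."""
        return self.theta + sum(c * self._k(x, p)
                                for p, c in zip(self.points, self.coeffs))

    def update(self, x, err):
        """err = 1 if the target was missed at x (e.g., a miscoverage event).
        Error rates above alpha push the threshold up near x, enlarging the
        prediction sets there; shrinkage keeps the local term bounded."""
        step = self.eta * (err - self.alpha)
        self.theta += step
        self.coeffs = [(1.0 - self.shrink) * c for c in self.coeffs]
        self.points.append(x)
        self.coeffs.append(step)
```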

Experiments

Comparison between the coverage map obtained using adaptive risk control (on the left) and localized adaptive risk control (on the right). Adaptive risk control is unable to deliver uniform coverage across the deployment areas, leading to large regions where the SNR level is unsatisfactory. In contrast, localized adaptive risk control is capable of guaranteeing a more uniform SNR level, improving the overall system coverage.

To demonstrate the fairness improvements of our algorithm, we conducted a series of experiments using standard machine learning benchmarks as well as wireless communication problems. Specifically, in the wireless domain, we considered the problem of beam selection based on contextual information. Here, a base station must select a subset of communication beam vectors to guarantee a level of signal-to-noise ratio (SNR) across a deployment area. Standard calibration methods like adaptive risk control (on the left) result in substantial SNR variation across the area, creating regions where communication is impossible. In contrast, our localized adaptive risk control scheme (on the right) enables the base station to calibrate the beam selection algorithm to match the local uncertainty, providing more uniform coverage throughout the deployment area.


References

[1] Isaac Gibbs and Emmanuel Candes. Adaptive conformal inference under distribution shift. Advances in Neural Information Processing Systems, 34 (2021).

[2] Anastasios Nikolas Angelopoulos, Rina Barber, Stephen Bates. Online conformal prediction with decaying step sizes. Proceedings of the 41st International Conference on Machine Learning. (2024).

[3] Jyrki Kivinen, Alex Smola and Robert C. Williamson. Online Learning with Kernels. Advances in Neural Information Processing Systems, 14 (2001).

Cross-Validation Conformal Risk Control

Motivation

Conformal risk control (CRC) [1] [2] is a recently proposed technique that applies post-hoc to a conventional point predictor to provide calibration guarantees. Generalizing conformal prediction (CP) [3], with CRC, calibration is ensured for a set predictor that is extracted from the point predictor to control a risk function such as the probability of miscoverage or the false negative rate. The original CRC requires the available data set to be split between training and validation data sets. This can be problematic when data availability is limited, resulting in inefficient set predictors. In [4], a novel CRC method is introduced that is based on cross-validation, rather than on validation as in the original CRC. The proposed cross-validation CRC (CV-CRC) allows for the control of a broader range of risk functions, while provably offering theoretical guarantees on the average risk of the set predictor and a reduced average set size with respect to CRC when the available data are limited.

Cross-Validation Conformal Risk Control

The objective of CRC is to design a set predictor with a mean risk no larger than a predefined level α, i.e.,

$$\mathbb{E}\left[\ell\big(y, \Gamma(x \mid \mathcal{D})\big)\right] \leq \alpha, \qquad (1)$$

where the expectation is taken over the test input-label pair (x, y) and over the set D of N data pairs, and where the risk ℓ is defined between the true label y and a predictive set Γ of labels.

VB-CRC generalizes VB-CP [2] in the sense that it allows the risk to take an arbitrary form, under technical conditions such as boundedness and monotonicity in the set. VB-CP is recovered when VB-CRC is specialized to the miscoverage risk

$$\ell(y, \Gamma) = \mathbb{1}\left(y \notin \Gamma\right).$$

In this work, we introduce CV-CRC, a cross-validation-based version of VB-CRC. In a manner similar to how CV-CP [5] generalizes VB-CP, CV-CRC generalizes VB-CRC. See Fig. 1 for an illustration.

Fig. 1. (top) validation-based CRC (bottom) the proposed method, CV-CRC.

In the top panel of Fig. 2, VB-CRC is illustrated: the available data is split into training data and validation data, where the former is used to train a model, while the latter is used to post-process the model's outputs and control a threshold λ. Upon observing a test input x, a predictive set Γ of labels y is formed. In the bottom panel, CV-CRC is illustrated as a generalization: the available data is split into K ≤ N folds, and K leave-fold-out models are trained. Then, K predictive sets are formed and merged via a threshold that is set using the trained models and the held-out folds.

Fig. 2. (top) validation-based CRC (bottom) the proposed method, CV-CRC.
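As a rough sketch of the threshold selection in the two schemes (the exact correction terms in [4] differ; this sketch reuses the standard CRC inflation for both, and all function names are illustrative):

```python
import numpy as np

def vb_crc_threshold(val_risk, lambdas, n, alpha, B=1.0):
    """VB-CRC (sketch): scan lambdas along which the empirical validation
    risk is non-increasing, and return the first one whose inflated risk
    meets (1); B is an upper bound on the risk function."""
    for lam in lambdas:
        if (n * val_risk(lam) + B) / (n + 1) <= alpha:
            return lam
    return lambdas[-1]  # most conservative choice

def cv_crc_threshold(fold_risk, lambdas, n, K, alpha, B=1.0):
    """CV-CRC (sketch): fold_risk(k, lam) is the empirical risk, at threshold
    lam, of the model trained with fold k left out, evaluated on fold k."""
    for lam in lambdas:
        rbar = np.mean([fold_risk(k, lam) for k in range(K)])
        if (n * rbar + B) / (n + 1) <= alpha:
            return lam
    return lambdas[-1]
```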

Experiments

To illustrate the main theorem, namely that the risk guarantee (1) is met while the average set sizes are expected to shrink, two experiments were conducted. The first is a vector regression problem addressed via maximum-likelihood learning, shown in Fig. 3.

Fig. 3. VB-CRC and CV-CRC for the vector regression problem.

The second problem is temporal point process prediction, where a point process set predictor aims to predict sets that contain future events of a temporal process with a false negative rate no larger than a predefined level α. As can be seen, in both problems, CV-CRC is more data-efficient in the small data regime, while satisfying the risk condition (1).


Fig. 4. VB-CRC and CV-CRC for the temporal point process prediction problem.

Full details can be found in the ISIT preprint [4].

References

[1] A. N. Angelopoulos, S. Bates, A. Fisch, L. Lei, and T. Schuster, “Conformal Risk Control,” in The Twelfth International Conference on Learning Representations, 2024.

[2] S. Feldman, L. Ringel, S. Bates, and Y. Romano, “Achieving Risk Control in Online Learning Settings,” Transactions on Machine Learning Research, 2023.

[3] V. Vovk, A. Gammerman, and G. Shafer, Algorithmic Learning in a Random World. Springer, New York, 2005.

[4] K. M. Cohen, S. Park, O. Simeone, and S. Shamai Shitz, “Cross-Validation Conformal Risk Control,” accepted to IEEE International Symposium on Information Theory Proceedings (ISIT2024), July 2024.

[5] R. F. Barber, E. J. Candes, A. Ramdas, and R. J. Tibshirani, “Predictive Inference with the Jackknife+,” The Annals of Statistics, vol. 49, no. 1, pp. 486–507, 2021.

Empowering Wireless Digital Twins with Ray Tracing Simulations

At the crossroad between simulation and machine learning, digital twin systems are envisioned to bridge the theoretical guarantees of model-based approaches with the flexibility of data-driven methods. However, one major concern is whether insights drawn from the simulation can still apply to the real world. Embodying both the opportunities and challenges of simulation intelligence, we believe that ray tracing will drive the understanding of signal propagation in the next generation of wireless digital twins, while relying on machine learning to cope with the diversity of real-world materials and inaccuracies in the available geometry.

How to Turn an Unreliable Predictor into a Reliable Scheduler

Motivation

Servicing ultra-reliable and low-latency communication (URLLC) traffic typically calls for a pre-emptive allocation of resources in order to meet stringent delay constraints. A conservative static allocation of resources for URLLC may guarantee desired levels of reliability and latency, but this comes at the expense of other services, most notably enhanced mobile broadband (eMBB), which cannot use the resources reserved for URLLC. A dynamic allocation of resources, while potentially more efficient, is made challenging by the stochastic nature of URLLC data packet generation. A promising solution is the adoption of predictors of URLLC data packet generation. Concretely, with reference to Fig. 1, a base station can deploy a predictor of URLLC data packet generation for the following frame, so as to guide the adaptive allocation of slots for URLLC packets, leaving the other slots available for eMBB users.


Background

URLLC traffic

URLLC traffic must satisfy two requirements:

  1. Ultra-Reliability – a fraction of at least 1-α of all generated packets must be scheduled for transmission.
  2. Low-Latency – each packet must be assigned a dedicated scheduled resource no later than a predefined acceptable latency.

Fig. 1 (a) URLLC traffic ground-truth generation patterns; (b) using a predictor that underestimates the traffic dynamics leads to unreliable URLLC allocation, which the CP-based scheduler successfully compensates for; (c) using a predictor that overestimates the traffic dynamics leads to overly conservative URLLC allocation, i.e., low eMBB efficiency, which the CP-based scheduler successfully compensates for as well.

Online Conformal Prediction

CP is a class of post-hoc calibration methods that transform a standard probabilistic model into a set predictor that is guaranteed to contain the true target with probability no smaller than a predetermined coverage level [1]. Online CP alleviates the limitation of conventional CP of requiring separate calibration data, at the cost of providing time-averaged, rather than ensemble, reliability guarantees [2,3]. The adoption of CP in communication engineering was proposed in our ICASSP 2023 paper on calibrating AI models for wireless communications, which focused on applications such as symbol demodulation, modulation classification, and received signal strength prediction.

Guaranteed Dynamic Scheduling

In our new work [4], accepted at IEEE Signal Processing Letters, we introduce a novel scheduler for URLLC packets that provides formal guarantees on reliability and latency irrespective of the quality of the URLLC traffic predictor.

Fig. 2(a) illustrates the frame-based segmentation. Fig. 2(b) shows 4 generated URLLC packets and 6 pre-emptively allocated URLLC resources; yet the latest packet is not allocated a resource within the allowed latency. In contrast, Fig. 2(c) shows an allocation that meets the constraints, even though the number of URLLC resources is smaller. This leaves a larger portion of resources for eMBB traffic.

Fig. 2 (a) Frame-based timeline; (b) miscovered allocation; (c) well-covered allocation.


The proposed method leverages recent advances in online CP and follows the principle of dynamically adjusting the amount of allocated resources so as to meet the reliability and latency requirements set by the designer. To this end, we adjust a threshold that changes between frames on the basis of a reliability condition, which controls how conservative the predictor for the next frame is.
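In its simplest form, such an update could look as follows; this is a minimal sketch, with illustrative names and a clipping step that we add for feasibility, rather than the exact rule in [4].

```python
def update_allocation_threshold(lam, err, alpha, eta, lam_max):
    """Online-CP-style update (sketch): after each frame, nudge the threshold
    that determines how many URLLC slots are pre-emptively allocated, so that
    the long-run fraction of frames with a deadline violation approaches alpha.
    err = 1 if some URLLC packet missed its resource in the last frame, else 0."""
    lam = lam + eta * (err - alpha)      # violations push the allocation up
    return min(max(lam, 0.0), lam_max)   # keep within the feasible range
```

The key property inherited from online CP is that the time-averaged violation rate converges to alpha for any packet-generation sequence, which is why the guarantee holds irrespective of the quality of the traffic predictor.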


Experiments

We consider two mismatched predictors: the first underestimates the dynamics of the URLLC traffic, while the second overestimates them.


Fig. 3 investigates the impact of such mismatches between the URLLC model parameter and the ground-truth model parameter. The conventional scheduler is significantly affected by a mismatch between the predictor and the ground-truth packet generation mechanism, yielding either poor empirical coverage (below 1-α) or over-coverage. In contrast, the CP-based scheduler flattens the coverage so that it asymptotically reaches the long-term target 1-α.


Fig. 3 Empirical URLLC reliability rate and eMBB efficiency. The CP-based scheduler flattens out the coverage.


Full details can be found in the SPL preprint [4].


[1] Vovk, Vladimir, Alexander Gammerman, and Glenn Shafer. “Algorithmic learning in a random world,” Vol. 29. New York: Springer, 2005.

[2] Gibbs, Isaac, and Emmanuel Candes. “Adaptive conformal inference under distribution shift.” Advances in Neural Information Processing Systems 34 (2021): 1660-1672.

[3] Feldman, Shai, Stephen Bates, and Yaniv Romano. “Conformalized Online Learning: Online Calibration Without a Holdout Set.” arXiv preprint arXiv:2205.09095 (2022).

[4] Cohen, Kfir M., Sangwoo Park, Osvaldo Simeone, Petar Popovski, and Shlomo Shamai. “Guaranteed Dynamic Scheduling of Ultra-Reliable Low-Latency Traffic via Conformal Prediction.” To appear in IEEE Signal Processing Letters, [online] arXiv preprint arXiv:2302.07675 (2023).

Making a Demodulator Trustworthy via Conformal Prediction

Motivation

Artificial intelligence (AI) models typically report a confidence measure associated with each prediction, which reflects the model’s self-evaluation of the accuracy of a decision. Notably, neural networks implement probabilistic predictors that produce a probability distribution across all possible values of the output variable. As an example, Fig. 1 illustrates the operation of a neural network-based demodulator, which outputs a probability distribution on the constellation points given the corresponding received baseband sample. The self-reported model confidence, however, may not be a reliable measure of the true, unknown, accuracy of the prediction, in which case we say that the AI model is poorly calibrated. Poor calibration may be a substantial problem when AI-based decisions are processed within a larger system such as a communication network.


Fig. 1 Accuracy and calibration are different properties of probabilistic predictors.

Set Predictors

A set predictor is defined as a set-valued function that maps an input to a subset of the output domain based on a data set. As illustrated in the example of Fig. 1, the predicted set depends in general on the input, and its size can be taken as a measure of the uncertainty of the predictor. The performance of a set predictor is evaluated in terms of coverage and inefficiency: coverage refers to the probability that the true label is included in the predicted set, while inefficiency refers to the average size of the predicted set. There is a clear trade-off between the two metrics.

Given a probabilistic predictor, one can construct a set predictor by relying on the confidence levels reported by the model. To this end, one can construct the smallest subset of the output domain that covers a fraction 1 − α of the probability assigned by the trained model to a given input. For poorly calibrated predictors, this approach cannot satisfy the coverage condition for the desired miscoverage level α.
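As a minimal sketch of this naïve construction (function and variable names are illustrative):

```python
import numpy as np

def naive_set(probs, alpha=0.1):
    """Smallest set of constellation points whose self-reported probability
    mass reaches 1 - alpha; offers no coverage guarantee if the underlying
    probabilistic predictor is poorly calibrated. probs: 1D array summing to 1."""
    order = np.argsort(probs)[::-1]  # most confident points first
    k = int(np.searchsorted(np.cumsum(probs[order]), 1.0 - alpha)) + 1
    return order[:k].tolist()
```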


Conformal Prediction

In our new work [3], presented at ICASSP 2023, we applied three different conformal prediction schemes to a demodulation problem:

  1. Validation-based (VB) [1] – partitions the available data set into a training set and a validation set, using the first to train a model and the second for calibration purposes (see the sketch after this list).
  2. Cross-validation-based (CV) [2] – trains multiple models, each using all of the available data set excluding one data point, which acts as a validation example. While increasing the computational complexity, this in general reduces the inefficiency of the predicted sets.
  3. K-fold CV-based (K-CV) [2] – cross-validates using a fold rather than a single point: K different models are trained using a leave-fold-out approach. This generalization of CV set predictors strikes a balance between complexity and inefficiency by reducing the total number of model training phases to K.
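A minimal sketch of the VB scheme follows; the calibrated quantile level is the standard split-CP construction, and all names are illustrative:

```python
import numpy as np

def vb_cp_set(val_scores, test_score, labels, alpha=0.1):
    """Validation-based CP (sketch): val_scores are nonconformity scores of
    the n validation pairs under the trained model; test_score(y) scores the
    test input paired with candidate label y. Including every y whose score
    falls below the calibrated quantile yields 1 - alpha coverage on average."""
    n = len(val_scores)
    level = min(1.0, np.ceil((1.0 - alpha) * (n + 1)) / n)
    q = np.quantile(val_scores, level, method="higher")
    return [y for y in labels if test_score(y) <= q]
```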


Experiments

Fig. 2 shows the empirical coverage level and Fig. 3 shows the empirical inefficiency as a function of the size N of the available data set D. From Fig. 2, we first observe that the naïve set predictor, with both frequentist and Bayesian learning, does not meet the desired coverage level in the regime of a small number N of available samples. In contrast, all CP methods provide coverage guarantees, achieving coverage rates of at least 1 − α. From Fig. 3, we observe that the size of the predicted sets, and hence the inefficiency, decreases as the data set size increases. Furthermore, due to their efficient use of the available data, CV and K-CV predictors have a lower inefficiency as compared to VB predictors. Finally, Bayesian NC scores are generally seen to yield set predictors with lower inefficiency, confirming the merits of Bayesian learning in terms of calibration.

Overall, the experiments confirm that the CP-based predictors are all well calibrated, with a small average prediction set size, unlike naïve set predictors that build directly on the self-reported confidence levels of conventional probabilistic predictors.

Fig. 2 Empirical coverage as function of data set size

Fig. 3 Empirical inefficiency as function of data set size


Please see the preprint of the ICASSP 2023 paper for full details.


[1] Vovk, Vladimir, Alexander Gammerman, and Glenn Shafer. “Algorithmic learning in a random world,” Vol. 29. New York: Springer, 2005.

[2] Barber, Rina Foygel, Emmanuel J. Candes, Aaditya Ramdas, and Ryan J. Tibshirani. “Predictive inference with the jackknife+.” The Annals of Statistics 49, no. 1 (2021): 486-507.

[3] Cohen, Kfir M., Sangwoo Park, Osvaldo Simeone, and Shlomo Shamai (Shitz). “Calibrating AI Models for Wireless Communications via Conformal Prediction,” to appear in ICASSP 2023 [Online]. Available: https://arxiv.org/abs/2212.07775


Neuromorphic Integrated Sensing and Communications

Integrated sensing and communications (ISAC), a key enabling technology for 6G systems, leverages shared radio resources and hardware to realize the functions of sensing and communication. As an example of an application that can benefit from ISAC, consider the inter-vehicle communication scenario in Fig. 1. In it, a car wishes to send a message to a second car, while also enabling the latter to detect the presence of a possible target, e.g., of a pedestrian. While conventional systems would use two separate radio resources for data transmission and radar detection, ISAC solutions reuse the same transmitted waveform for the dual role of carrier of digital information and radar signal [1]. A natural radio interface to serve this dual function is impulse radio (IR), also known as ultrawideband (UWB). In fact, IR encodes information in the timing of pulses, which can in turn be repurposed for radar detection [2].

Fig. 1. Illustration of a neuromorphic ISAC system, in which the same IR (or UWB) signal is used for transmission and radar detection of the presence of a target. The key novel element is the use of neuromorphic computing at the ISAC receiver to simultaneously demodulate digital data and provide an online estimate of the presence or absence of the radar target.

Neuromorphic sensing and computing are emerging as alternative, brain-inspired, paradigms for efficient data collection and semantic signal processing [3]. The main features of this technology are energy efficiency, native event-driven processing of time-varying semantic sources, spike-based computing, and always-on on-hardware adaptation [4]. Neuromorphic processors, also known as spiking neural networks (SNNs), are networks of dynamic spiking neurons that mimic the operation of biological neurons. When implemented on specialized — digital or mixed analog-digital — hardware or on tailored FPGA configurations, SNNs have minimal idle and operating energy cost, and consume as little as a few picojoules per spike [5].

The integration of IR and neuromorphic computing was investigated in our recent works [6, 7], which proposed an end-to-end neuromorphic architecture for remote inference that replaces traditional digital blocks with SNNs as encoder and decoder.

Our work

With the aim of reducing energy consumption and facilitating online and always-on operation on specialized hardware, as illustrated in Fig. 1, we propose to leverage the synergy between IR transmission and neuromorphic computing to realize efficient ISAC systems. The neuromorphic ISAC (N-ISAC) receiver is able to leverage spiking neural network (SNN)-based processing to demodulate digital information and detect the radar signal.

As illustrated in Fig. 2, we consider an ISAC system in which digital communication and radar sensing leverage the same IR transmitted signal. In order to efficiently and simultaneously decode the digital data and detect the possible presence of a target at a known delay cell, the receiver processes the received signal via an SNN. Technical details can be found in our paper at this link.

Fig. 2. N-ISAC: Digital data is transmitted by an IR transmitter via pulse-position modulation (PPM); while the receiver simultaneously decodes digital data, and performs radar detection by means of an SNN, which can be efficiently implemented on neuromorphic hardware.
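As a toy illustration of the PPM mapping used by the IR transmitter (a sketch; the frame length and helper names are our own choices):

```python
import numpy as np

def ppm_encode(bits, slots_per_frame=4):
    """Pulse-position modulation (sketch): each group of log2(slots_per_frame)
    bits selects the position of a single impulse within a frame of slots."""
    bits_per_frame = int(np.log2(slots_per_frame))
    assert len(bits) % bits_per_frame == 0
    signal = []
    for i in range(0, len(bits), bits_per_frame):
        pos = int("".join(map(str, bits[i:i + bits_per_frame])), 2)
        frame = [0] * slots_per_frame
        frame[pos] = 1  # one impulse per frame, at the bit-selected slot
        signal.extend(frame)
    return signal

# Example: bits [1,0, 0,1] -> pulse in slot 2 of frame 1 and slot 1 of frame 2
print(ppm_encode([1, 0, 0, 1]))  # [0, 0, 1, 0, 0, 1, 0, 0]
```

Since energy is radiated only at pulse positions, the sparser the encoded spike train, the lower the transmit energy, which is the synergy with neuromorphic processing exploited here.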

Result

We compare the proposed N-ISAC system with a conventional separate sensing and communications (SSAC) scheme, which divides the time slots into slots used for data transmission and slots used for sensing. For SSAC, two SNNs are implemented at the receiver, one performing data decoding in the transmission slots, and the other responsible for radar sensing in the sensing slots.

To evaluate the performance of our system, we adopt the following performance metrics for data transmission and radar sensing: 1) normalized test throughput, i.e., the ratio of the average number of correctly decoded bits over the total number of time slots; 2) radar test detection error, i.e., the probability that the sensing decision is incorrect upon processing all time slots.

In Fig. 3, we demonstrate the normalized test throughput versus the radar test detection error for ISAC and SSAC. For the ISAC scheme, we vary a hyperparameter β dictating the relative weight in the design criterion in favor of communications; for SSAC we vary the fraction α of slots allocated to communications. As β increases, more priority is given by ISAC to communication over radar detection; and, similarly, as α increases, SSAC assigns more slots to communications. The performance of ISAC with an SNN having 10 hidden neurons is essentially independent of β for any 0.25 < β < 0.75. A first observation is that, for SSAC, there is a trade-off between communication and sensing performance levels caused by the slot allocation. A similar trade-off is also observed for ISAC when using an SNN with 6 hidden neurons. This is due to the limited capacity of the shared common hidden layer of the SNN. In contrast, when 10 hidden neurons are available at the SNN, ISAC is seen to optimize both data decoding and target sensing performance, obtaining significant gains over SSAC.

Fig. 3. Normalized test throughput versus radar test detection error for ISAC and SSAC.

Fig. 4 illustrates how the SNN receiver can leverage the temporal sparsity of the IR signals to enhance energy efficiency. In this regard, we recall that energy consumption in an SNN is essentially proportional to the number of spikes produced by the SNN, given the extremely low idle energy of neuromorphic chips [8]. The top panel shows the transmitted IR signal, consisting of two frames of transmitted signals separated by an idle frame of 20 slots. We observe that in the idle frame, the spike count is significantly reduced, showing that the neuromorphic receiver can adjust its energy consumption to the activity level of the transmitter.

Fig. 4. Top: Transmitted signal consisting of two frames in which the transmitter is active separated by an idle frame. Bottom: Corresponding spike count for the SNN.

References

[1] S. Jeong, O. Simeone, A. Haimovich, and J. Kang, “Beamforming design for joint localization and data transmission in distributed antenna system,” IEEE Transactions on Vehicular Technology, vol. 64, no. 1, pp. 62–76, 2014.

[2] A. Nezirovic, A. G. Yarovoy, and L. P. Ligthart, “Signal processing for improved detection of trapped victims using UWB radar,” IEEE Transactions on Geoscience and Remote Sensing, vol. 48, no. 4, pp. 2005–2014, 2009.

[3] A. Mehonic and A. J. Kenyon, “Brain-inspired computing needs a master plan,” Nature, vol. 604, no. 7905, pp. 255–260, 2022.

[4] M. Davies, A. Wild, G. Orchard, Y. Sandamirskaya, G. A. F. Guerra, P. Joshi, P. Plank, and S. R. Risbud, “Advancing neuromorphic computing with Loihi: a survey of results and outlook,” Proceedings of the IEEE, vol. 109, no. 5, pp. 911–934, 2021.

[5] B. Rajendran, A. Sebastian, M. Schmuker, N. Srinivasa, and E. Eleftheriou, “Low-power neuromorphic hardware for signal processing applications: a review of architectural and system-level design approaches,” IEEE Signal Processing Magazine, vol. 36, no. 6, pp. 97–110, 2019.

[6] N. Skatchkovsky, H. Jang, and O. Simeone, “End-to-end learning of neuromorphic wireless systems for low-power edge artificial intelligence,” in Proc. Asilomar Conference on Signals, Systems, and Computers, pp. 166–173, 2020.

[7] J. Chen, N. Skatchkovsky, and O. Simeone, “Neuromorphic wireless cognition: event-driven semantic communications for remote inference,” arXiv preprint arXiv:2206.06047, 2022.

[8] M. Davies, N. Srinivasa, T.-H. Lin, G. Chinya, Y. Cao, S. H. Choday, G. Dimou, P. Joshi, N. Imam, S. Jain et al., “Loihi: A neuromorphic manycore processor with on-chip learning,” IEEE Micro, vol. 38, no. 1, pp. 82–99, 2018.


Life-long brain-inspired learning that knows what it does not know

An emerging research topic in artificial intelligence (AI) is the design of systems that take inspiration from biological brains. This is notably driven by the fact that, although AI algorithms are becoming more and more capable (for instance, at image generation), this comes at a cost. Indeed, with architectures constantly growing in size, training a single large neural network today consumes a prohibitive amount of energy. In contrast, despite consuming only about 12W, human brains exhibit impressive capabilities, such as life-long learning.

By taking a Bayesian perspective, we demonstrate in our latest work how biologically inspired spiking neural networks (SNNs) can exhibit learning mechanisms similar to those at work in brains, which allow them to perform continual learning. As we will see, the technique also addresses a key challenge in deep learning, namely obtaining well-calibrated decisions in the face of previously unseen data.


Our work

Bayesian Learning

As seen in Fig. 1, we propose to equip each synaptic weight in the SNN with a probability distribution. The distribution captures the epistemic uncertainty induced by the lack of knowledge of the true distribution of the data. This is done by assigning probabilities to model parameters that fit equally well the data, while also being consistent with prior knowledge. As a consequence, Bayesian learning is known to produce better calibrated decisions, i.e., decisions whose associated confidence better reflects the actual accuracy of the decision.

This contrasts with frequentist learning, in which the vector of synaptic weights is optimized by minimizing a training loss. The training loss is adopted as a proxy for the population loss, i.e., for the loss averaged over the true, unknown, distribution of the data. Therefore, frequentist learning disregards the inherent uncertainty caused by the availability of limited training data, which causes the training loss to be a potentially inaccurate estimate of the population loss. As a result, frequentist learning is known to potentially yield poorly calibrated, and overconfident decisions for ANNs.

Figure 1: Illustration of Bayesian learning in an SNN: In a Bayesian SNN, the synaptic weights are assigned a joint distribution, often simplified as a product distribution across weights.

We consider both real-valued (with possibly limited resolution, as dictated by deployment on neuromorphic hardware) and binary-valued synapses, parametrised by Gaussian and Bernoulli distributions, respectively. The advantages of models with binary-valued synapses, i.e., binary SNNs, include a reduced complexity for the computation of the membrane potential. Furthermore, binary SNNs are particularly well suited for implementations on chips with nanoscale components that provide discrete conductance levels for the synapses.
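For concreteness, a mean-field Gaussian synaptic layer could be sketched as follows; this is an illustrative abstraction (the class name is ours, and the temporal dynamics of the spiking neurons are omitted), with a Bernoulli parametrization replacing the Gaussian sampling in the binary-synapse case.

```python
import torch

class BayesianSynapse(torch.nn.Module):
    """Mean-field Gaussian synapses (sketch): each weight carries a mean and
    a scale, and a fresh weight realization is sampled per forward pass via
    the reparameterization trick."""
    def __init__(self, n_in, n_out):
        super().__init__()
        self.mu = torch.nn.Parameter(torch.zeros(n_out, n_in))
        self.rho = torch.nn.Parameter(torch.full((n_out, n_in), -3.0))

    def forward(self, spikes):
        sigma = torch.nn.functional.softplus(self.rho)  # positive std. dev.
        w = self.mu + sigma * torch.randn_like(sigma)   # sampled weights
        return spikes @ w.t()                           # synaptic currents
```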


Continual Learning

In addition to uncertainty quantification, we apply the proposed solution to continual learning, as illustrated in Fig. 2. In continual learning, the system is sequentially presented with several datasets corresponding to distinct, but related, learning tasks, where each task is selected, possibly with replacement, from a pool of tasks, and its identity is unknown to the system. The goal is to learn to make predictions that generalize well to each new task, while causing minimal loss of accuracy on previous tasks.

Figure 2: Illustration of Bayesian continual learning: the system is successively presented with similar, but different, tasks. Bayesian learning allows the model to retain information about previously learned information.

Many existing works on continual learning draw their inspiration from the mechanisms underlying the capability of biological brains to carry out life-long learning. Learning is believed to be achieved in biological systems by modulating the strength of synaptic links. In this process, a variety of mechanisms are at work to establish short- to intermediate-term and long-term memory for the acquisition of new information over time. These mechanisms operate at different time and spatial scales.


Biological Principles of Learning

One of the best understood mechanisms, long-term potentiation, contributes to the management of long-term memory through the consolidation of synaptic connections. Once established, these connections are rendered resistant to disruption via metaplasticity, which modulates their capacity to change. As a related mechanism, heterosynaptic plasticity ensures a return to a base state after exposure to small, noisy changes, and plays a key role in ensuring the stability of neural systems. Neuromodulation operates at the scale of neural populations to respond to particular events registered by the brain. Finally, episodic replay plays a key role in the maintenance of long-term memory, by allowing biological brains to re-activate signals seen during previous active periods when inactive (i.e., sleeping).

In this work, we demonstrate how the continual learning rule we obtain exhibits some of these mechanisms. In particular, synaptic consolidation and metaplasticity for each synapse can be modeled by a precision parameter. A larger precision reduces the step size of the synaptic weight updates. During learning, the precision is increased to a degree that depends on the relevance of each synapse, as measured by the estimated Fisher information matrix for the current mini-batch of examples.

Heterosynaptic plasticity, which drives the updates towards previously learned and resting states to prevent catastrophic forgetting, is obtained from first principles via an information risk minimization formulation with a Kullback-Leibler regularization term. This mechanism drives the updates of the precision and mean parameter towards the corresponding parameters of the variational posterior obtained at the previous task.
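Schematically, one update step combining these two mechanisms could be written as follows; this is our own abstraction of the ideas above, not the exact variational rule derived in the paper.

```python
import torch

def continual_step(mu, prec, mu_prev, prec_prev, grad_mu, fisher, lr=1e-2, rho=0.1):
    """One schematic synaptic update: the Fisher term consolidates relevant
    synapses by raising their precision (metaplasticity), which shrinks their
    effective step size, while the KL-induced pull toward the previous task's
    posterior (mu_prev, prec_prev) plays the role of heterosynaptic plasticity."""
    prec = prec + fisher + rho * (prec_prev - prec)                 # consolidation
    mu = mu - lr * (grad_mu + rho * prec_prev * (mu - mu_prev)) / prec
    return mu, prec
```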

Figure 3: Predictive probabilities evaluated on the two-moons dataset after training for Bayesian learning. Top row: Real-valued synapses; Bottom row: Binary synapses.

Results

We start by considering the two-moons dataset shown in Fig. 3. Triangles indicate training points for class “0”, while circles indicate training points for class “1”. The color intensity represents the predictive probabilities for frequentist learning and for Bayesian learning: the more intense the color, the higher the prediction confidence determined by the model. Bayesian learning is observed to provide better calibrated predictions, which are more uncertain outside the input regions covered by training data points. As can be seen, the confidence of the Bayesian models can further be tempered via a parameter, as detailed in the full text.

Figure 4: Top three classes predicted by both Bayesian and frequentist models on selected examples from the DVS-Gestures dataset. Top: real-valued synapses. Bottom: binary synapses. The correct class is indicated in bold font.

This point is further illustrated in Fig. 4 by showing the three largest probabilities assigned by the different models on selected examples from the DVS-Gestures dataset, considering real-valued synapses in the top row and binary synapses in the bottom row. In the left column, we observe that, when both models predict the wrong class, Bayesian SNNs tend to do so with a lower level of certainty, and typically rank the correct class higher than their frequentist counterparts. Specifically, in the examples shown, Bayesian models with both real-valued and binary synapses rank the correct class second, while the frequentist models rank it third. Furthermore, as seen in the middle column, in a number of cases the Bayesian models manage to predict the correct class, while the frequentist models predict a wrong class with high certainty. In the right column, we show that even when the frequentist models predict the correct class and the Bayesian models fail to do so, the latter still assign lower probabilities (i.e., below 50%) to their predicted class.

Figure 5: Evolution of the average test accuracies and ECE on all tasks of the split-MNIST-DVS across training epochs, with Gaussian and Bernoulli variational posteriors, and frequentist schemes for both real-valued and binary synapses. Continuous lines: test accuracy, dotted lines: ECE, bold: current task. Blue: {0, 1}; Red: {2, 3}; Green: {4, 5}; Purple:{6, 7}; Yellow: {8, 9}.

Finally, we show results for continual learning on the MNIST-DVS dataset in Fig. 5. We show the evolution of the test accuracy and expected calibration error (ECE) on all tasks, represented with lines of different colors, during training. The performance on the current task is shown as a thicker line. We consider frequentist and Bayesian learning, with both real-valued and binary synapses. With Bayesian learning, the test accuracy on previous tasks does not decrease excessively when learning a new task, which shows the capacity of the technique to tackle catastrophic forgetting. Also, the ECE across all tasks is seen to remain more stable for Bayesian learning as compared to the frequentist benchmarks. For both real-valued and binary synapses, the final average accuracy and ECE across all tasks show the superiority of Bayesian over frequentist learning.

More details can be found in the full text at this link.
