{"id":660,"date":"2022-11-08T15:07:38","date_gmt":"2022-11-08T15:07:38","guid":{"rendered":"https:\/\/blogs.kcl.ac.uk\/kclip\/?p=660"},"modified":"2022-11-08T15:07:38","modified_gmt":"2022-11-08T15:07:38","slug":"is-accuracy-sufficient-for-ai-in-6g-no-calibration-is-equally-important","status":"publish","type":"post","link":"https:\/\/blogs.kcl.ac.uk\/kclip\/2022\/11\/08\/is-accuracy-sufficient-for-ai-in-6g-no-calibration-is-equally-important\/","title":{"rendered":"Is Accuracy Sufficient for AI in 6G? (No, Calibration is Equally Important)"},"content":{"rendered":"<p>AI modules are being considered as native components of future wireless communication systems that can be fine-tuned to meet the requirements of specific deployments [1]. While conventional training solutions target the accuracy as the only design criterion, the pursuit of \u201cperfect accuracy\u201d is generally neither a feasible nor a desirable goal. In Alan Turing\u2019s words, \u201cif a machine is expected to be infallible, it cannot also be intelligent\u201d. Rather than seeking an optimized accuracy level, a well-designed AI should be able to quantify its uncertainty: It should \u201cknow when it knows\u201d, offering high confidence for decisions that are likely to be correct, and it should \u201cknow when it does not know\u201d, providing a low confidence level for decisions are that are unlikely to be correct. An AI module that can provide reliable measures of uncertainty is said to be well-calibrated.<\/p>\n<p>Importantly, accuracy and calibration are two distinct criteria. As an example, Fig. 1 illustrates \u00a0a QPSK demodulator trained using limited number of pilots. 
Depending on the input, the trained probabilistic model may produce either accurate or inaccurate demodulation decisions, whose uncertainty is, in turn, either correctly or incorrectly characterized.<\/p>\n<div id=\"attachment_653\" style=\"width: 1259px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-653\" class=\"wp-image-653 size-full\" src=\"https:\/\/blogs.kcl.ac.uk\/kclip\/files\/2022\/11\/Fig1.jpg\" alt=\"\" width=\"1249\" height=\"607\" srcset=\"https:\/\/blogs.kcl.ac.uk\/kclip\/files\/2022\/11\/Fig1.jpg 1249w, https:\/\/blogs.kcl.ac.uk\/kclip\/files\/2022\/11\/Fig1-300x146.jpg 300w, https:\/\/blogs.kcl.ac.uk\/kclip\/files\/2022\/11\/Fig1-1024x498.jpg 1024w, https:\/\/blogs.kcl.ac.uk\/kclip\/files\/2022\/11\/Fig1-768x373.jpg 768w, https:\/\/blogs.kcl.ac.uk\/kclip\/files\/2022\/11\/Fig1-676x329.jpg 676w\" sizes=\"auto, (max-width: 1249px) 100vw, 1249px\" \/><p id=\"caption-attachment-653\" class=\"wp-caption-text\">Fig. 1. The hard decision regions of an optimal demodulator (dashed lines) and of a data-driven demodulator trained on few pilots (solid lines) are displayed in panel (a), while the corresponding probabilistic predictions for some outputs are shown in panel (b).<\/p><\/div>\n<p>&nbsp;<\/p>\n<p>The property of \u201cknowing what the AI knows\/does not know\u201d is very useful when the AI module is used as part of a larger engineering system. In fact, well-calibrated decisions should be treated differently depending on their confidence level. Furthermore, well-calibrated models enable monitoring \u2013 by tracking the confidence of the decisions made by an AI \u2013 and other functionalities, such as anomaly detection [2].<\/p>\n<p>In a recent paper from our group, published in the IEEE Transactions on Signal Processing [3], we proposed a methodology to develop well-calibrated and efficient AI modules that are capable of fast adaptation. 
The methodology builds on Bayesian meta-learning.<\/p>\n<p>To start, we summarize the main techniques under consideration.<\/p>\n<ol>\n<li><strong>Conventional (frequentist) learning<\/strong> ignores epistemic uncertainty \u2013 uncertainty caused by limited data \u2013 and tends to be overconfident in the presence of limited training samples.<\/li>\n<li><strong>Bayesian learning <\/strong>captures epistemic uncertainty by optimizing a distribution in the model parameter space, rather than finding a single deterministic value as in frequentist learning. By obtaining decisions via ensembling, Bayesian predictors can account for the \u201copinions\u201d of multiple models, hence providing more reliable decisions. Note that this approach is routinely used to quantify uncertainty in established fields like weather prediction [4].<\/li>\n<li><strong>Frequentist meta-learning<\/strong> [5], also known as learning to learn, optimizes a shared training strategy across multiple tasks, so that it can easily adapt to new tasks. This is done by transferring knowledge across different learning tasks. As a communication system example, see Fig. 2, in which the demodulator adapts quickly with only a few pilots for a new frame. While frequentist meta-learning is well-suited for adaptation purposes, its decisions tend to be overconfident, and hence it does not generally improve monitoring.<\/li>\n<li><strong>Bayesian meta-learning<\/strong> [6,7] integrates meta-learning with Bayesian learning in order to facilitate the adaptation of Bayesian models to new tasks.<\/li>\n<li><strong>Bayesian active meta-learning<\/strong> [8] applies active selection of meta-training tasks in order to reduce their number. 
When meta-training tasks become available in a streaming fashion, e.g., through the sequential supply of new frames from which the AI modules can be meta-learned online, active meta-learning effectively reduces the time required to reach satisfactory meta-learning performance.<\/li>\n<\/ol>\n<p>&nbsp;<\/p>\n<div id=\"attachment_654\" style=\"width: 982px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-654\" class=\"size-full wp-image-654\" src=\"https:\/\/blogs.kcl.ac.uk\/kclip\/files\/2022\/11\/Fig2.jpg\" alt=\"\" width=\"972\" height=\"1097\" srcset=\"https:\/\/blogs.kcl.ac.uk\/kclip\/files\/2022\/11\/Fig2.jpg 972w, https:\/\/blogs.kcl.ac.uk\/kclip\/files\/2022\/11\/Fig2-266x300.jpg 266w, https:\/\/blogs.kcl.ac.uk\/kclip\/files\/2022\/11\/Fig2-907x1024.jpg 907w, https:\/\/blogs.kcl.ac.uk\/kclip\/files\/2022\/11\/Fig2-768x867.jpg 768w, https:\/\/blogs.kcl.ac.uk\/kclip\/files\/2022\/11\/Fig2-676x763.jpg 676w\" sizes=\"auto, (max-width: 972px) 100vw, 972px\" \/><p id=\"caption-attachment-654\" class=\"wp-caption-text\">Fig. 2. Through meta-learning, a learner (e.g., a demodulator) can be quickly adapted to a new environment using only a few pilots, leveraging a hyperparameter vector optimized over related learning tasks (e.g., frames with different channel conditions).<\/p><\/div>\n<p>&nbsp;<\/p>\n<h1>Some Results<\/h1>\n<p>We first show the benefits of Bayesian meta-learning for monitoring purposes by examining the reliability of its decisions in terms of calibration. In Fig. 3, reliability diagrams for frequentist and Bayesian meta-learning are compared. For an ideally calibrated predictor, the accuracy level should match the self-reported confidence (dashed line in the plots). It can be easily checked that AI modules designed via Bayesian meta-learning (right) are more reliable than those designed via frequentist meta-learning (left), validating the suitability of Bayesian meta-learning for monitoring purposes. 
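For readers who wish to experiment, the match between self-reported confidence and accuracy summarized by a reliability diagram can be computed in a few lines. The sketch below is illustrative only and is not the code used in [3]; the function names and the choice of ten equal-width bins are our own assumptions.

```python
def reliability_bins(confidences, correct, n_bins=10):
    """Group predictions into equal-width confidence bins.

    Returns, for each non-empty bin, the mean self-reported confidence,
    the empirical accuracy, and the fraction of samples in the bin.
    A well-calibrated model has accuracy close to confidence in every bin
    (the dashed diagonal of a reliability diagram).
    """
    bins = []
    for b in range(n_bins):
        lo, hi = b / n_bins, (b + 1) / n_bins
        idx = [i for i, c in enumerate(confidences) if lo < c <= hi]
        if idx:
            mean_conf = sum(confidences[i] for i in idx) / len(idx)
            accuracy = sum(correct[i] for i in idx) / len(idx)
            bins.append((mean_conf, accuracy, len(idx) / len(confidences)))
    return bins


def expected_calibration_error(confidences, correct, n_bins=10):
    """Fraction-weighted gap between confidence and accuracy across bins."""
    return sum(frac * abs(acc - conf)
               for conf, acc, frac in reliability_bins(confidences, correct, n_bins))
```

For instance, a demodulator that reports 90% confidence on every decision but is right only half of the time has an expected calibration error of about 0.4, which a reliability diagram displays as a bar far below the diagonal.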
The experimental results are obtained by considering a demodulation problem.<\/p>\n<p>&nbsp;<\/p>\n<p>&nbsp;<\/p>\n<p><div id=\"attachment_655\" style=\"width: 1295px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-655\" class=\"size-full wp-image-655\" src=\"https:\/\/blogs.kcl.ac.uk\/kclip\/files\/2022\/11\/Fig3.jpg\" alt=\"\" width=\"1285\" height=\"841\" srcset=\"https:\/\/blogs.kcl.ac.uk\/kclip\/files\/2022\/11\/Fig3.jpg 1285w, https:\/\/blogs.kcl.ac.uk\/kclip\/files\/2022\/11\/Fig3-300x196.jpg 300w, https:\/\/blogs.kcl.ac.uk\/kclip\/files\/2022\/11\/Fig3-1024x670.jpg 1024w, https:\/\/blogs.kcl.ac.uk\/kclip\/files\/2022\/11\/Fig3-768x503.jpg 768w, https:\/\/blogs.kcl.ac.uk\/kclip\/files\/2022\/11\/Fig3-676x442.jpg 676w\" sizes=\"auto, (max-width: 1285px) 100vw, 1285px\" \/><p id=\"caption-attachment-655\" class=\"wp-caption-text\">Fig. 3. Bayesian meta-learning (right) yields more reliable decisions than frequentist meta-learning (left), as captured via reliability diagrams [9].<\/p><\/div>Fig. 4 demonstrates the impact of Bayesian active meta-learning, which successfully reduces the number of required meta-training tasks. 
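To give a feel for how \u201cmost surprising\u201d tasks can be identified, the following sketch scores each candidate task by the average predictive entropy that the current model ensemble assigns to its inputs, and selects the highest-scoring one. This is a conceptual illustration under our own assumptions (the `ensemble_predict` interface is hypothetical), not the acquisition criterion of [8].

```python
import math


def predictive_entropy(probs):
    """Entropy of a (model-averaged) predictive distribution.

    Higher entropy means the ensemble finds the input more surprising.
    """
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def select_next_task(candidate_tasks, ensemble_predict):
    """Pick the candidate task whose inputs the ensemble is most uncertain about.

    candidate_tasks: list of tasks, each a list of inputs (e.g., received pilots).
    ensemble_predict: hypothetical callable returning the model-averaged
    class probabilities for a single input.
    """
    def score(task):
        return sum(predictive_entropy(ensemble_predict(x)) for x in task) / len(task)
    return max(candidate_tasks, key=score)
```

A random-selection baseline would instead draw the next meta-training task uniformly; the comparison in Fig. 4 is between these two kinds of strategies.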
The results are obtained by considering an equalization problem.<\/p>\n<div id=\"attachment_656\" style=\"width: 2174px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-656\" class=\"size-full wp-image-656\" src=\"https:\/\/blogs.kcl.ac.uk\/kclip\/files\/2022\/11\/Fig4.jpg\" alt=\"\" width=\"2164\" height=\"788\" srcset=\"https:\/\/blogs.kcl.ac.uk\/kclip\/files\/2022\/11\/Fig4.jpg 2164w, https:\/\/blogs.kcl.ac.uk\/kclip\/files\/2022\/11\/Fig4-300x109.jpg 300w, https:\/\/blogs.kcl.ac.uk\/kclip\/files\/2022\/11\/Fig4-1024x373.jpg 1024w, https:\/\/blogs.kcl.ac.uk\/kclip\/files\/2022\/11\/Fig4-768x280.jpg 768w, https:\/\/blogs.kcl.ac.uk\/kclip\/files\/2022\/11\/Fig4-1536x559.jpg 1536w, https:\/\/blogs.kcl.ac.uk\/kclip\/files\/2022\/11\/Fig4-2048x746.jpg 2048w, https:\/\/blogs.kcl.ac.uk\/kclip\/files\/2022\/11\/Fig4-676x246.jpg 676w\" sizes=\"auto, (max-width: 2164px) 100vw, 2164px\" \/><p id=\"caption-attachment-656\" class=\"wp-caption-text\">Fig. 4. Bayesian active meta-learning searches for the meta-training tasks that are most surprising (left), hence increasing task efficiency as compared to standard Bayesian meta-learning, which chooses meta-training tasks at random.<\/p><\/div>\n<p>&nbsp;<\/p>\n<h1>References<\/h1>\n<p>[1] O-RAN Alliance, \u201cO-RAN Working Group 2 AI\/ML Workflow Description and Requirements,\u201d ORAN-WG2. AIML. v01.02.02, vol. 1, 2.<\/p>\n<p>[2] C. Ruah, O. Simeone, and B. Al-Hashimi, \u201cDigital Twin-Based Multiple Access Optimization and Monitoring via Model-Driven Bayesian Learning,\u201d <em>arXiv preprint arXiv:2210.05582<\/em>.<\/p>\n<p>[3] K. M. Cohen, S. Park, O. Simeone, and S. Shamai, &#8220;Learning to Learn to Demodulate with Uncertainty Quantification via Bayesian Meta-Learning,&#8221; <i>arXiv <\/i><a href=\"https:\/\/arxiv.org\/abs\/2108.00785\">https:\/\/arxiv.org\/abs\/2108.00785<\/a><\/p>\n<p>[4] T. 
Palmer, \u201cThe Primacy of Doubt: From Climate Change to Quantum Physics, How the Science of Uncertainty Can Help Predict and Understand Our Chaotic World,\u201d Oxford University Press, 2022.<\/p>\n<p>[5] C. Finn, P. Abbeel, and S. Levine, \u201cModel-Agnostic Meta-Learning for Fast Adaptation of Deep Networks,\u201d in Proceedings of the 34th International Conference on Machine Learning, vol. 70. PMLR, 06\u201311 Aug 2017, pp. 1126\u20131135.<\/p>\n<p>[6] J. Yoon, T. Kim, O. Dia, S. Kim, Y. Bengio, and S. Ahn, \u201cBayesian Model-Agnostic Meta-Learning,\u201d Proc. Advances in Neural Information Processing Systems (NIPS), Montreal, Canada, vol. 31, 2018.<\/p>\n<p>[7] C. Nguyen, T.-T. Do, and G. Carneiro, \u201cUncertainty in Model-Agnostic Meta-Learning using Variational Inference,\u201d in Proceedings of the IEEE\/CVF Winter Conference on Applications of Computer Vision, 2020, pp. 3090\u20133100.<\/p>\n<p>[8] J. Kaddour, S. S\u00e6mundsson et al., \u201cProbabilistic Active Meta-Learning,\u201d Proc. Advances in Neural Information Processing Systems (NIPS), virtual-only conference, vol. 33, pp. 20813\u201320822, 2020.<\/p>\n<p>[9] C. Guo, G. Pleiss, Y. Sun, and K. Q. Weinberger, \u201cOn Calibration of Modern Neural Networks,\u201d in International Conference on Machine Learning. PMLR, 2017, pp. 1321\u20131330.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>AI modules are being considered as native components of future wireless communication systems that can be fine-tuned to meet the requirements of specific deployments [1]. While conventional training solutions target accuracy as the only design criterion, the pursuit of \u201cperfect accuracy\u201d is generally neither a feasible nor a desirable goal. 
In Alan Turing\u2019s words, [&hellip;]<\/p>\n","protected":false},"author":1227,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[27,10],"tags":[],"class_list":["post-660","post","type-post","status-publish","format-standard","hentry","category-bayesian-learning","category-meta-learning","post-preview"],"_links":{"self":[{"href":"https:\/\/blogs.kcl.ac.uk\/kclip\/wp-json\/wp\/v2\/posts\/660","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/blogs.kcl.ac.uk\/kclip\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blogs.kcl.ac.uk\/kclip\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blogs.kcl.ac.uk\/kclip\/wp-json\/wp\/v2\/users\/1227"}],"replies":[{"embeddable":true,"href":"https:\/\/blogs.kcl.ac.uk\/kclip\/wp-json\/wp\/v2\/comments?post=660"}],"version-history":[{"count":3,"href":"https:\/\/blogs.kcl.ac.uk\/kclip\/wp-json\/wp\/v2\/posts\/660\/revisions"}],"predecessor-version":[{"id":663,"href":"https:\/\/blogs.kcl.ac.uk\/kclip\/wp-json\/wp\/v2\/posts\/660\/revisions\/663"}],"wp:attachment":[{"href":"https:\/\/blogs.kcl.ac.uk\/kclip\/wp-json\/wp\/v2\/media?parent=660"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blogs.kcl.ac.uk\/kclip\/wp-json\/wp\/v2\/categories?post=660"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blogs.kcl.ac.uk\/kclip\/wp-json\/wp\/v2\/tags?post=660"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}