{"id":677,"date":"2022-12-08T02:22:10","date_gmt":"2022-12-08T02:22:10","guid":{"rendered":"https:\/\/blogs.kcl.ac.uk\/kclip\/?p=677"},"modified":"2023-05-11T22:00:52","modified_gmt":"2023-05-11T22:00:52","slug":"learning-to-learn-how-to-calibrate","status":"publish","type":"post","link":"https:\/\/blogs.kcl.ac.uk\/kclip\/2022\/12\/08\/learning-to-learn-how-to-calibrate\/","title":{"rendered":"Learning to Learn How to Calibrate"},"content":{"rendered":"<p>As discussed in our previous post \u2018<a href=\"https:\/\/blogs.kcl.ac.uk\/kclip\/2022\/11\/08\/is-accuracy-sufficient-for-ai-in-6g-no-calibration-is-equally-important\/\">Is Accuracy Sufficient for AI in 6G? (No, Calibration is Equally Important)<\/a>\u2019, <b>reliable AI<\/b> should be able to quantify its uncertainty, i.e., to \u201cknow when it knows\u201d and \u201cknow when it does not know\u201d. To obtain reliable, or well-calibrated, AI models, two types of approaches can be adopted: <i>(i)<\/i> training-based calibration, and <i>(ii) <\/i>post-hoc calibration. Training-based calibration modifies the training procedure by accounting for calibration performance, and includes methods such as Bayesian learning [1, 2], robust Bayesian learning [3, 4], and calibration-aware regularization [5]; while post-hoc calibration utilizes validation data to \u201crecalibrate\u201d a probabilistic model, as in temperature scaling [6], Platt scaling [7], and isotonic regression [8]. All these methods have no formal guarantees on calibration, either due to inevitable model misspecification [9], or due to overfitting to the validation set [10, 11]. 
In contrast, <b>conformal prediction (CP)<\/b> offers formal calibration guarantees, although calibration is defined in terms of set, rather than probabilistic, prediction [12].<\/p>\n<div id=\"attachment_679\" style=\"width: 442px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-679\" class=\" wp-image-679\" src=\"https:\/\/blogs.kcl.ac.uk\/kclip\/files\/2022\/12\/meta_xb_1.png\" alt=\"\" width=\"432\" height=\"308\" srcset=\"https:\/\/blogs.kcl.ac.uk\/kclip\/files\/2022\/12\/meta_xb_1.png 2452w, https:\/\/blogs.kcl.ac.uk\/kclip\/files\/2022\/12\/meta_xb_1-300x214.png 300w, https:\/\/blogs.kcl.ac.uk\/kclip\/files\/2022\/12\/meta_xb_1-1024x730.png 1024w, https:\/\/blogs.kcl.ac.uk\/kclip\/files\/2022\/12\/meta_xb_1-768x547.png 768w, https:\/\/blogs.kcl.ac.uk\/kclip\/files\/2022\/12\/meta_xb_1-1536x1095.png 1536w, https:\/\/blogs.kcl.ac.uk\/kclip\/files\/2022\/12\/meta_xb_1-2048x1460.png 2048w, https:\/\/blogs.kcl.ac.uk\/kclip\/files\/2022\/12\/meta_xb_1-676x482.png 676w\" sizes=\"auto, (max-width: 432px) 100vw, 432px\" \/><p id=\"caption-attachment-679\" class=\"wp-caption-text\">Fig. 1. Improvements in calibration can be obtained by either (i) training-based calibration or (ii) post-hoc calibration. Only conformal prediction, a post-hoc calibration approach, provides formal guarantees on calibration via set prediction.<\/p><\/div>\n<p>A <b>well-calibrated set predictor<\/b> is one that contains the true label with probability no smaller than a predetermined coverage level, say 90%.
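To make the coverage requirement just stated concrete, here is a minimal sketch of validation-based (split) conformal prediction for classification, in the spirit of [12]. This is an illustration with our own function names, not code from the paper; it assumes a classifier that outputs class probabilities, and uses one minus the probability of the true label as the nonconformity score.

```python
import numpy as np

def conformal_prediction_set(cal_probs, cal_labels, test_probs, alpha=0.1):
    """Split conformal prediction: for each test input, return a set of
    labels that contains the true one with probability at least 1 - alpha
    (marginally, assuming calibration and test data are exchangeable)."""
    cal_probs = np.asarray(cal_probs, dtype=float)
    cal_labels = np.asarray(cal_labels)
    n = len(cal_labels)
    # Nonconformity score: one minus the probability assigned to the true label.
    scores = 1.0 - cal_probs[np.arange(n), cal_labels]
    # Finite-sample-corrected (1 - alpha) empirical quantile of the scores.
    level = min(np.ceil((n + 1) * (1 - alpha)) / n, 1.0)
    threshold = np.quantile(scores, level, method='higher')
    # Keep every candidate label whose score is within the threshold.
    return [np.where(1.0 - np.asarray(p) <= threshold)[0].tolist()
            for p in test_probs]
```

When the calibration scores are noisy, or when alpha is small, the threshold grows and more labels survive the test, producing larger predicted sets; this inefficiency is exactly what the meta-learning approach described below targets.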
A set predictor obtained by conformal prediction is <i>provably<\/i> well calibrated, irrespective of the unknown underlying ground-truth distribution, as long as the data examples are <i>exchangeable<\/i>, a condition weaker than being <i>i.i.d.<\/i> (independent and identically distributed).<\/p>\n<p>One could trivially build a well-calibrated set predictor by always producing the entire label set as the predicted set. Such a set predictor, however, would be completely uninformative: the smaller the predicted set, the more informative the prediction. While conformal prediction is always guaranteed to yield reliable set predictors, it may produce large predicted sets when data examples are limited [13]. In our <a href=\"https:\/\/openreview.net\/pdf?id=S0ItikPStJy\">recent work<\/a>, presented at the <a href=\"https:\/\/meta-learn.github.io\/2022\/\">NeurIPS 2022 Workshop on Meta-Learning<\/a>, we introduced a novel method that enhances the informativeness of CP-based set predictors via meta-learning.<\/p>\n<div id=\"attachment_689\" style=\"width: 2954px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-689\" class=\"size-full wp-image-689\" src=\"https:\/\/blogs.kcl.ac.uk\/kclip\/files\/2022\/12\/meta_xb_2_rev.png\" alt=\"\" width=\"2944\" height=\"1656\" srcset=\"https:\/\/blogs.kcl.ac.uk\/kclip\/files\/2022\/12\/meta_xb_2_rev.png 2944w, https:\/\/blogs.kcl.ac.uk\/kclip\/files\/2022\/12\/meta_xb_2_rev-300x169.png 300w, https:\/\/blogs.kcl.ac.uk\/kclip\/files\/2022\/12\/meta_xb_2_rev-1024x576.png 1024w, https:\/\/blogs.kcl.ac.uk\/kclip\/files\/2022\/12\/meta_xb_2_rev-768x432.png 768w, https:\/\/blogs.kcl.ac.uk\/kclip\/files\/2022\/12\/meta_xb_2_rev-1536x864.png 1536w, https:\/\/blogs.kcl.ac.uk\/kclip\/files\/2022\/12\/meta_xb_2_rev-2048x1152.png 2048w, https:\/\/blogs.kcl.ac.uk\/kclip\/files\/2022\/12\/meta_xb_2_rev-676x380.png 676w\" 
sizes=\"auto, (max-width: 2944px) 100vw, 2944px\" \/><p id=\"caption-attachment-689\" class=\"wp-caption-text\">Fig. 2. Meta-learning transfers knowledge from multiple tasks. In our recent <a href=\"https:\/\/openreview.net\/pdf?id=S0ItikPStJy\">paper<\/a>, we have proposed an application of meta-learning to conformal prediction with the aim of reducing the average prediction set size while preserving formal calibration guarantees.<\/p><\/div>\n<p><b>Meta-learning<\/b>, or learning to learn, transfers knowledge from multiple tasks to optimize the inductive bias (e.g., the model class) for new, related, tasks [14]. In our recent work, meta-learning was applied to cross-validation-based conformal prediction (XB-CP) [13] to achieve well-calibrated and informative set predictors. As demonstrated in the following figure, the proposed meta-learning approach for XB-CP, termed <b>meta-XB<\/b>, can reduce the average prediction set size as compared to conventional CP approaches (XB-CP and validation-based conformal prediction (VB-CP) [12]) and to previous work on meta-learning for VB-CP [14], while preserving the formal guarantees on reliability (the predetermined coverage level, 90%, is always satisfied for meta-XB).<span class=\"Apple-converted-space\">\u00a0<\/span><\/p>\n<div id=\"attachment_681\" style=\"width: 665px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-681\" class=\"wp-image-681\" src=\"https:\/\/blogs.kcl.ac.uk\/kclip\/files\/2022\/12\/meta_xb_3.png\" alt=\"\" width=\"655\" height=\"234\" srcset=\"https:\/\/blogs.kcl.ac.uk\/kclip\/files\/2022\/12\/meta_xb_3.png 3096w, https:\/\/blogs.kcl.ac.uk\/kclip\/files\/2022\/12\/meta_xb_3-300x107.png 300w, https:\/\/blogs.kcl.ac.uk\/kclip\/files\/2022\/12\/meta_xb_3-1024x365.png 1024w, https:\/\/blogs.kcl.ac.uk\/kclip\/files\/2022\/12\/meta_xb_3-768x274.png 768w, https:\/\/blogs.kcl.ac.uk\/kclip\/files\/2022\/12\/meta_xb_3-1536x548.png 1536w, 
https:\/\/blogs.kcl.ac.uk\/kclip\/files\/2022\/12\/meta_xb_3-2048x730.png 2048w, https:\/\/blogs.kcl.ac.uk\/kclip\/files\/2022\/12\/meta_xb_3-676x241.png 676w\" sizes=\"auto, (max-width: 655px) 100vw, 655px\" \/><p id=\"caption-attachment-681\" class=\"wp-caption-text\">Fig. 3. Average prediction set size (left) and coverage (right) for new tasks as a function of number of meta-training tasks. As compared to conventional CP schemes (VB-CP and XB-CP), meta-learning based approaches (meta-VB and meta-XB) have smaller prediction set size; while the proposed meta-XB guarantees reliability for every task unlike meta-VB that satisfies coverage condition on average over multiple tasks.<\/p><\/div>\n<p>For more details including improvements in terms of input-conditional coverage via meta-learning with adaptive nonconformity scores [15], and further experimental results on image classification and communication engineering aspects, please refer to the <a href=\"https:\/\/arxiv.org\/abs\/2210.03067\">arXiv posting<\/a>.<\/p>\n<h4>References<\/h4>\n<p>[1] O. Simeone, <i>Machine learning for engineers<\/i>. Cambridge University Press, 2022<\/p>\n<p>[2] J. Knoblauch, et al, \u201cGeneralized variational inference: Three arguments for deriving new posteriors,\u201d <i>arXiv:1904.02063<\/i>, 2019<\/p>\n<p>[3] W. Morningstar, et al \u201cPACm-Bayes: Narrowing the empirical risk gap in the Misspecified Bayesian Regime,\u201d <i>NeurIPS <\/i>2021<\/p>\n<p>[4] M. Zecchin, et al, \u201cRobust PACm: Training ensemble models under model misspecification and outliers,\u201d <i>arXiv:2203.01859,<\/i> 2022<\/p>\n<p>[5] A. Kumar, et al, \u201cTrainable calibration measures for neural networks from kernel mean embeddings,\u201d <i>ICML <\/i>2018<\/p>\n<p>[6] C. Guo, et al, \u201cOn calibration of modern neural networks,\u201d <i>ICML<\/i> 2017<\/p>\n<p>[7] J. 
Platt, et al, \u201cProbabilistic outputs for support vector machines and comparisons to regularized likelihood method,\u201d<i> Advances in Large Margin Classifiers<\/i> 1999<\/p>\n<p>[8]<span class=\"Apple-converted-space\">\u00a0 <\/span>B. Zadrozny and C. Elkan \u201cTransforming classifier scores into accurate multiclass probability estimates,\u201d <i>KDD<\/i> 2022<\/p>\n<p>[9] A. Masegosa, \u201cLearning under model misspecification: Applications to variational and ensemble methods.\u201d<i> NeurIPS 2020<\/i><\/p>\n<p>[10] A. Kumar, et al, \u201cVerified Uncertainty Calibration,\u201d <i>NeurIPS<\/i> 2019<\/p>\n<p>[11] X. Ma and M. B. Blaschko, \u201cMeta-Cal: Well-controlled Post-hoc Calibration by Ranking,\u201d ICML 2021<span class=\"Apple-converted-space\">\u00a0<\/span><\/p>\n<p>[12]<span class=\"Apple-converted-space\">\u00a0 <\/span>V. Vovk, et al, \u201cAlgorithmic Learning in a Random World,\u201d <i>Springer<\/i> 2005<\/p>\n<p>[13] R. F. Barber, et al, \u201cPredictive inference with the jackknife+,\u201d <i>The Annals of Statistics,<\/i> 2021<\/p>\n<p>[14] Chen, Lisha, et al. &#8220;Learning with limited samples\u2014Meta-learning and applications to communication systems.&#8221; <i>arXiv preprint arXiv:2210.02515, <\/i>2022.<\/p>\n<p>[14] A. Fisch, et al, \u201cFew-shot conformal prediction with auxiliary tasks,\u201d <i>ICML <\/i>2021<\/p>\n<p>[15] Y. Romano, et al, \u201cClassification with valid and adaptive coverage,\u201d <i>NeurIPS<\/i> 2020<\/p>\n<p>&nbsp;<\/p>\n","protected":false},"excerpt":{"rendered":"<p>As discussed in our previous post \u2018Is Accuracy Sufficient for AI in 6G? (No, Calibration is Equally Important)\u2019, reliable AI should be able to quantify its uncertainty, i.e., to \u201cknow when it knows\u201d and \u201cknow when it does not know\u201d. 
To obtain reliable, or well-calibrated, AI models, two types of approaches can be adopted: (i) [&hellip;]<\/p>\n","protected":false},"author":667,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[28,10],"tags":[],"class_list":["post-677","post","type-post","status-publish","format-standard","hentry","category-conformal-prediction","category-meta-learning","post-preview"],"_links":{"self":[{"href":"https:\/\/blogs.kcl.ac.uk\/kclip\/wp-json\/wp\/v2\/posts\/677","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/blogs.kcl.ac.uk\/kclip\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blogs.kcl.ac.uk\/kclip\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blogs.kcl.ac.uk\/kclip\/wp-json\/wp\/v2\/users\/667"}],"replies":[{"embeddable":true,"href":"https:\/\/blogs.kcl.ac.uk\/kclip\/wp-json\/wp\/v2\/comments?post=677"}],"version-history":[{"count":5,"href":"https:\/\/blogs.kcl.ac.uk\/kclip\/wp-json\/wp\/v2\/posts\/677\/revisions"}],"predecessor-version":[{"id":786,"href":"https:\/\/blogs.kcl.ac.uk\/kclip\/wp-json\/wp\/v2\/posts\/677\/revisions\/786"}],"wp:attachment":[{"href":"https:\/\/blogs.kcl.ac.uk\/kclip\/wp-json\/wp\/v2\/media?parent=677"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blogs.kcl.ac.uk\/kclip\/wp-json\/wp\/v2\/categories?post=677"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blogs.kcl.ac.uk\/kclip\/wp-json\/wp\/v2\/tags?post=677"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}