Language is at the core of all forms of human and technological communication. A language model has a similar function in the field of AI, providing a basis for communication and idea generation.
Large language models (LLMs) are a type of AI algorithm used for a vast array of tasks, from article translation to financial fraud detection. LLMs use deep learning and massive data sets to understand, summarize, generate, and predict new content.
However, despite the incredible capabilities and versatility of these models, they occasionally generate inaccurate responses and can be overconfident about wrong answers or underconfident about correct ones. This makes it tough for users to know when a model can be trusted.
Researchers at MIT and the MIT-IBM Watson AI Lab have developed a new calibration method for LLMs called ‘Thermometer’. The technique uses a ‘temperature’ parameter to align the model’s confidence with its prediction accuracy, so that the model is neither overconfident nor underconfident. Because it pinpoints situations where a model is overconfident about false predictions, this approach helps users decide when to trust the model’s outputs.
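To give a sense of what a ‘temperature’ parameter does in general, the sketch below shows classic temperature scaling applied to a model’s raw scores (logits). It is a minimal illustration of the underlying idea, not the Thermometer method itself; the logits and temperature value are made up for the example.

```python
import numpy as np

def softmax_with_temperature(logits, temperature=1.0):
    # Divide the logits by the temperature before normalizing:
    # a temperature above 1 flattens the distribution (less confident),
    # a temperature below 1 sharpens it (more confident).
    scaled = logits / temperature
    scaled -= scaled.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    exp = np.exp(scaled)
    return exp / exp.sum(axis=-1, keepdims=True)

# Hypothetical logits for one multiple-choice question with four answer options.
logits = np.array([3.2, 1.1, 0.4, -0.5])

print(softmax_with_temperature(logits, temperature=1.0))  # raw confidence
print(softmax_with_temperature(logits, temperature=2.0))  # softened confidence
```

With a temperature of 2.0, the model still favors the same answer, but its stated probability for that answer drops, which is the desired behavior when the raw model tends to be overconfident.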
A well-calibrated model should report lower confidence on incorrect predictions and higher confidence on correct ones, so that its stated confidence tracks how often it is actually right. This kind of calibration is expected to make LLMs such as GPT models significantly more trustworthy in practice.
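One common way to check how well confidence tracks accuracy is the expected calibration error (ECE), which bins predictions by confidence and compares each bin’s average confidence with its actual accuracy. The sketch below uses made-up confidence scores and outcomes purely to illustrate the metric; it is not drawn from the Thermometer paper.

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    # Bin predictions by their confidence, then accumulate the gap between
    # average confidence and empirical accuracy, weighted by bin size.
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (confidences > lo) & (confidences <= hi)
        if in_bin.any():
            gap = abs(confidences[in_bin].mean() - correct[in_bin].mean())
            ece += in_bin.mean() * gap
    return ece

# Hypothetical predictions: confidence scores and whether each answer was correct.
conf = np.array([0.95, 0.80, 0.70, 0.60, 0.55])
hit = np.array([1, 1, 0, 1, 0], dtype=float)

print(f"ECE: {expected_calibration_error(conf, hit):.3f}")
```

A perfectly calibrated model would have an ECE of zero; the larger the value, the more its confidence diverges from its actual accuracy.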