Quiet-STaR algorithm allows chatbots to think over possible answers before responding
Quiet-STaR. We visualize the algorithm as applied during training to a single thought. We generate thoughts, in parallel, following all tokens in the text (think). The model produces a mixture of its next-token predictions with and without a thought (talk). We apply REINFORCE, as in STaR, to increase the likelihood of thoughts that help the model predict future text while discarding thoughts that make the future text less likely (learn). Credit: arXiv (2024). DOI: 10.48550/arxiv.2403.09629
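The caption's "talk" and "learn" steps can be illustrated with a minimal sketch. This is not the paper's implementation: the function names, the plain-float mixing weight, and the toy distributions are assumptions for illustration; in Quiet-STaR the mixing weight is produced by a learned head and the probabilities come from the language model itself.

```python
import math

def mix_next_token(p_without, p_with, w):
    # "Talk" step sketch: blend the model's next-token distributions
    # predicted without and with a generated thought, using a mixing
    # weight w in [0, 1] (a plain float here; learned in the paper).
    vocab = set(p_without) | set(p_with)
    return {t: (1 - w) * p_without.get(t, 0.0) + w * p_with.get(t, 0.0)
            for t in vocab}

def thought_reward(p_true_with, p_true_without):
    # "Learn" step sketch: a thought earns a positive REINFORCE-style
    # reward when it raises the log-likelihood of the true next token.
    return math.log(p_true_with) - math.log(p_true_without)

# Toy example: the thought sharpens the prediction toward "cat".
mixed = mix_next_token({"cat": 0.7, "dog": 0.3}, {"cat": 0.9, "dog": 0.1}, 0.5)
print(round(mixed["cat"], 3))               # blended probability, ~0.8
print(thought_reward(0.9, 0.7) > 0)         # thought helped, so reward > 0
```

A thought whose reward is negative (it made the true continuation less likely) is pushed down by the same signal, which is how unhelpful thoughts get discarded over training.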

A collaboration between AI researchers at Stanford University and Notbad AI Inc. has resulted in the development of an algorithm that allows current chatbots to mull over possible responses to a query before giving their final answers. The team has published a paper on the arXiv preprint server describing their new approach and how well their algorithm worked when paired with an existing chatbot.

As the researchers note, the general approach taken by current chatbots is to develop an answer to a query posed by a human using training data. None of the chatbots currently being used by the public stop to ponder multiple possible answers to a query before giving the one they think is most likely to be what the human wanted. If a human responded in such a fashion, it would be described as simply blurting out an answer.

In this new study, the research team has given chatbots a means for mulling a bit before answering, and in so doing, claims to have created a way for chatbots to be much more accurate and to answer questions in more human-like ways.

The algorithm, Quiet-STaR, works by first asking the chatbot to produce multiple answers to a given query. It compares the answers with the original query to decide which appears to be the best. It then directs the chatbot to return that answer to the user. The team also gave the algorithm the ability to learn from its own work, thereby improving its mulling capabilities over time.
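The sample-then-select loop described above can be sketched as follows. The `generate` and `score` callables are stand-ins for the chatbot's sampling and likelihood functions; everything here, including the toy answer pool, is an illustration rather than the paper's code.

```python
import itertools

def mull_then_answer(query, generate, score, n_candidates=4):
    # "Mulling" sketch: sample several candidate answers, score each
    # against the query, and return the highest-scoring one instead of
    # blurting out the first sample.
    candidates = [generate(query) for _ in range(n_candidates)]
    return max(candidates, key=lambda ans: score(query, ans))

# Toy stand-ins for demonstration only; a real system would use the
# language model's own sampler and next-token likelihoods.
_pool = itertools.cycle(["Lyon", "Paris", "Marseille"])

def toy_generate(query):
    return next(_pool)

def toy_score(query, answer):
    return 1.0 if answer == "Paris" else 0.0

print(mull_then_answer("What is the capital of France?",
                       toy_generate, toy_score))  # picks "Paris"
```

Because the selection step only needs a scoring signal, the same loop could in principle wrap any chatbot's output, which is what makes the approach pluggable.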

To test their algorithm, the researchers added it to the open-source Mistral 7B chatbot and tested it using a standard reasoning test, on which it scored 47.2%. Without the algorithm, Mistral 7B scored just 36.3%. It also did much better on a math test.

The research team notes that their algorithm could be plugged into any of the chatbots currently in use, though the integration would have to be done by the chatbots' makers, a move they suggest could improve the accuracy of chatbots in general.
