Large language models (LLMs) often use a very simple linear function to recover and decode stored facts

Large language models, such as those that power popular artificial intelligence chatbots like ChatGPT, are incredibly complex. Even though these models are being used as tools in many areas, such as customer support, code generation, and language translation, scientists still don’t fully grasp how they work.

In an effort to better understand what is going on under the hood, researchers at MIT and elsewhere studied the mechanisms at work when these enormous machine-learning models retrieve stored knowledge.

They found a surprising result: Large language models (LLMs) often use a very simple linear function to recover and decode stored facts. Moreover, the model uses the same decoding function for similar types of facts. A linear function is an equation with no exponents or products of variables; it captures a straightforward, straight-line relationship between its inputs and outputs.
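
In vector terms, such a decoder can be pictured as an affine map applied to a subject's internal representation. The following is a minimal sketch of that idea; the dimensions and values are random placeholders, not anything from the paper:

```python
import numpy as np

# Toy sketch: a relation-specific linear (affine) decoder o = W @ s + b,
# where s stands in for a subject's hidden representation. All values are
# random placeholders; real models use far larger dimensions.
rng = np.random.default_rng(0)
d = 4                          # toy hidden-state dimensionality

W = rng.normal(size=(d, d))    # weights for one relation, e.g. "plays instrument"
b = rng.normal(size=d)         # bias for that relation

s = rng.normal(size=d)         # stand-in for the subject's representation
o = W @ s + b                  # decoded stand-in for the object's representation

# Linearity: the map preserves sums and scaling (up to the shared bias).
s2 = rng.normal(size=d)
assert np.allclose(W @ (s + s2) + b, (W @ s + b) + (W @ s2 + b) - b)
```

Because the map is linear, it is cheap to apply, easy to estimate from examples, and much easier to inspect than the full network.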

The researchers showed that, by identifying linear functions for different facts, they can probe the model to see what it knows about new subjects, and where within the model that knowledge is stored.

Using a technique they developed to estimate these simple functions, the researchers found that even when a model answers a prompt incorrectly, it has often stored the correct information. In the future, scientists could use such an approach to find and correct falsehoods inside the model, which could reduce a model’s tendency to sometimes give incorrect or nonsensical answers.

“Even though these models are really complicated, nonlinear functions that are trained on lots of data and are very hard to understand, there are sometimes really simple mechanisms working inside them. This is one instance of that,” says Evan Hernandez, an electrical engineering and computer science (EECS) graduate student and co-lead author of a paper detailing these findings posted to the arXiv preprint server.

Hernandez wrote the paper with co-lead author Arnab Sharma, a computer science graduate student at Northeastern University; his advisor, Jacob Andreas, an associate professor in EECS and a member of the Computer Science and Artificial Intelligence Laboratory (CSAIL); senior author David Bau, an assistant professor of computer science at Northeastern; and others at MIT, Harvard University, and the Israel Institute of Technology. The research will be presented at the International Conference on Learning Representations (ICLR 2024), held May 7–11 in Vienna.

Finding facts

Most large language models, also called transformer models, are neural networks. Loosely based on the human brain, neural networks contain billions of interconnected nodes, or neurons, that are grouped into many layers, and which encode and process data.

Much of the knowledge stored in a transformer can be represented as relations that connect subjects and objects. For instance, “Miles Davis plays the trumpet” is a relation that connects the subject, Miles Davis, to the object, trumpet.

As a transformer gains more knowledge, it stores additional facts about a certain subject across multiple layers. If a user asks about that subject, the model must decode the most relevant fact to respond to the query.

If someone prompts a transformer by saying “Miles Davis plays the…” the model should respond with “trumpet” and not “Illinois” (the state where Miles Davis was born).
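
This view of factual knowledge can be sketched as a lookup over (subject, relation, object) triples. The facts and helper below are hypothetical stand-ins for knowledge that a real model stores implicitly in its weights:

```python
# Hypothetical sketch: factual knowledge viewed as (subject, relation, object)
# triples. The model must pick the object matching the prompt's relation,
# not just any fact it happens to know about the subject.
facts = {
    ("Miles Davis", "plays instrument"): "trumpet",
    ("Miles Davis", "born in state"): "Illinois",
}

def complete(subject: str, relation: str) -> str:
    """Return the stored object for a (subject, relation) query."""
    return facts[(subject, relation)]

print(complete("Miles Davis", "plays instrument"))  # -> trumpet
```

The hard part, which the paper investigates, is how the network implements this lookup inside its continuous hidden states rather than in an explicit table.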

“Somewhere in the network’s computation, there has to be a mechanism that goes and looks for the fact that Miles Davis plays the trumpet, and then pulls that information out and helps generate the next word. We wanted to understand what that mechanism was,” Hernandez says.

The researchers set up a series of experiments to probe LLMs, and found that, even though they are extremely complex, the models decode relational information using a simple linear function. Each function is specific to the type of fact being retrieved.

LRE performance for selected relations in different layers of GPT-J. The last row features some of the relations where LRE could not achieve satisfactory performance, indicating a non-linear decoding process for them. Credit: arXiv (2023). DOI: 10.48550/arxiv.2308.09124

For example, the transformer would use one decoding function any time it wants to output the instrument a person plays and a different function each time it wants to output the state where a person was born.
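
One way to picture this is a table that maps each relation type to its own linear decoder, which is then reused across subjects. Everything below (dimensions, relation names, random weights) is illustrative, not the authors' code:

```python
import numpy as np

# Illustrative sketch: one linear decoder (W, b) per relation type, with the
# same decoder reused for every subject queried under that relation.
rng = np.random.default_rng(1)
d = 4  # toy hidden-state dimensionality

decoders = {
    "plays instrument": (rng.normal(size=(d, d)), rng.normal(size=d)),
    "born in state":    (rng.normal(size=(d, d)), rng.normal(size=d)),
}

def decode(relation: str, subject_state: np.ndarray) -> np.ndarray:
    W, b = decoders[relation]      # look up the relation-specific function...
    return W @ subject_state + b   # ...and apply it to this subject's state

s = rng.normal(size=d)             # stand-in hidden state for one subject
instrument = decode("plays instrument", s)
birth_state = decode("born in state", s)
```

The same subject state yields different decoded objects depending on which relation's function is applied, mirroring how the prompt's relation selects which fact to surface.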

The researchers developed a method to estimate these simple functions, and then computed functions for 47 different relations, such as “capital city of a country” and “lead singer of a band.”

While there could be an infinite number of possible relations, the researchers chose to study this specific subset because it is representative of the kinds of facts that can be written in this way.

They tested each function by changing the subject to see if it could recover the correct object information. For instance, the function for “capital city of a country” should retrieve Oslo if the subject is Norway and London if the subject is England.

Functions retrieved the correct information more than 60% of the time, showing that some information in a transformer is encoded and retrieved in this way.
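
The paper uses its own estimator for these functions; as a rough sketch of the general idea, a linear map can be fit from (subject state, object state) pairs by least squares and then tested by changing the subject. The synthetic relation below is exactly linear by construction, so the fit recovers it:

```python
import numpy as np

# Hedged sketch (not the paper's estimator): fit a linear decoder W, b from
# example (subject_state, object_state) pairs, then check it on a new subject.
rng = np.random.default_rng(2)
d, n = 4, 50  # toy dimensionality, number of training pairs

W_true = rng.normal(size=(d, d))
b_true = rng.normal(size=d)

S = rng.normal(size=(n, d))        # subject hidden states (rows)
O = S @ W_true.T + b_true          # object states; exactly linear here

# Append a ones column so the bias is fitted jointly: O ≈ [S 1] @ M
S1 = np.hstack([S, np.ones((n, 1))])
M, *_ = np.linalg.lstsq(S1, O, rcond=None)
W_hat, b_hat = M[:d].T, M[d]

# "Test by changing the subject": a fresh subject state should decode correctly.
s_new = rng.normal(size=d)
pred = W_hat @ s_new + b_hat
assert np.allclose(pred, W_true @ s_new + b_true, atol=1e-6)
```

With real model states the relationship is only approximately linear, which is why the fitted functions succeed on some relations and fail on others, as the article describes next.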

“But not everything is linearly encoded. For some facts, even though the model knows them and will predict text that is consistent with these facts, we can’t find linear functions for them. This suggests that the model is doing something more intricate to store that information,” Hernandez says.

Visualizing a model’s knowledge

They also used the functions to determine what a model believes is true about different subjects.

In one experiment, they started with the prompt “Bill Bradley was a” and used the decoding functions for “plays sports” and “attended university” to see if the model knows that Sen. Bradley was a basketball player who attended Princeton.

“We can show that, even though the model may choose to focus on different information when it produces text, it does encode all that information,” Hernandez says.

They used this probing technique to produce what they call an “attribute lens,” a grid that visualizes where specific information about a particular relation is stored within the transformer’s many layers.

Attribute lenses can be generated automatically, providing a streamlined method to help researchers understand more about a model. This visualization tool could enable scientists and engineers to correct stored knowledge and help prevent an AI chatbot from giving false information.
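
As a toy illustration of the idea (not the authors' implementation), a grid in the spirit of the attribute lens could be built by applying one relation's linear decoder to the hidden state at every layer and token position, then reading out the nearest vocabulary entry. All states, names, and the readout rule below are synthetic assumptions:

```python
import numpy as np

# Toy "attribute lens"-style grid: decode one relation at every (layer, token)
# position and map each result to its closest synthetic vocabulary vector.
rng = np.random.default_rng(3)
layers, tokens, d = 3, 4, 4

hidden = rng.normal(size=(layers, tokens, d))       # per-layer, per-token states
W, b = rng.normal(size=(d, d)), rng.normal(size=d)  # one relation's decoder
vocab = {"trumpet": rng.normal(size=d), "Illinois": rng.normal(size=d)}

def nearest_word(v: np.ndarray) -> str:
    # crude readout: the vocabulary vector closest to the decoded state
    return min(vocab, key=lambda w: np.linalg.norm(vocab[w] - v))

grid = [[nearest_word(W @ hidden[l, t] + b) for t in range(tokens)]
        for l in range(layers)]

for row in grid:       # one row per layer: where the attribute is readable
    print(row)
```

In the real tool, inspecting such a grid shows at which layers and positions a fact becomes linearly decodable, which is what makes the lens useful for locating stored knowledge.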

In the future, Hernandez and his collaborators want to better understand what happens in cases where facts are not stored linearly. They would also like to run experiments with larger models, as well as study the precision of linear decoding functions.

“This is an exciting work that reveals a missing piece in our understanding of how large language models recall factual knowledge during inference. Previous work showed that LLMs build information-rich representations of given subjects, from which specific attributes are being extracted during inference.

“This work shows that the complex nonlinear computation of LLMs for attribute extraction can be well-approximated with a simple linear function,” says Mor Geva Pipek, an assistant professor in the School of Computer Science at Tel Aviv University, who was not involved with this work.

Massachusetts Institute of Technology