In-context learning (ICL) is already a rapidly advancing area. LLMs do not need their weights modified in order to persist state.
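A toy sketch of that point: the "state" can live entirely in the context window, with no weight update. Everything here is illustrative; `build_prompt` and the memory format are made up for the example, not any particular API.

```python
# Toy sketch: persistent "state" lives in the prompt, not the weights.
# The memory buffer and prompt format here are hypothetical.
from typing import List, Tuple

def build_prompt(memory: List[Tuple[str, str]], query: str) -> str:
    """Fold prior (user, assistant) turns into the context window."""
    lines = []
    for user_msg, assistant_msg in memory:
        lines.append(f"User: {user_msg}\nAssistant: {assistant_msg}")
    lines.append(f"User: {query}\nAssistant:")
    return "\n".join(lines)

memory = [("My name is Ada.", "Nice to meet you, Ada!")]
prompt = build_prompt(memory, "What is my name?")
# A model given this prompt can answer from context alone --
# it has "learned" the fact without any gradient step.
```

The same pattern, scaled up with retrieval over a store of past interactions, is roughly what memory-augmented LLM systems do.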
The human brain is not that different. Our long-term memories are stored separately from our executive function (the prefrontal cortex), and specialist structures such as the hippocampus route, store, and retrieve those long-term memories to support executive function. Much of the PFC can only retain working memory briefly without intermediate memory systems to support it.
If you squint a bit, that structure starts to look similar to what's being engineered now in LLM systems.
Focusing on whether the model's weights change is myopic. The question is: does the system learn and adapt? And ICL is showing us that it can; these are not the stateless systems of two years ago, nor is it the simplistic approach of "feeding old context back to it."
It seems like there is a good deal of research and a number of working implementations that allow efficient fine-tuning of models (e.g., low-rank adapter methods like LoRA). Additionally, there are ways to tune a model toward outcomes rather than toward training examples.
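A back-of-envelope sketch of why low-rank adapter methods count as "efficient": instead of updating a full weight matrix, you freeze it and train two thin matrices whose product is added to it. The dimensions below are illustrative, not tied to any specific model.

```python
# Sketch: parameter counts for full fine-tuning vs. a low-rank adapter.
# W (d_in x d_out) stays frozen; we learn A (d_in x r) and B (r x d_out),
# and use W' = W + A @ B at inference time.

def full_params(d_in: int, d_out: int) -> int:
    """Trainable weights if we fine-tune the whole matrix."""
    return d_in * d_out

def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable weights for the two low-rank factors."""
    return d_in * rank + rank * d_out

d = 4096  # a typical hidden size, chosen for illustration
print(full_params(d, d))     # 16777216 trainable weights
print(lora_params(d, d, 8))  # 65536 -- about 0.4% of the full matrix
```

That three-orders-of-magnitude reduction in trainable parameters is what makes fine-tuning feasible on modest hardware.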
Right now, the state of the world with LLMs is that they try to predict a script in which they play a happy assistant, as guided by their alignment phase.
I'm not sure what happens when they start getting trained in simulations to be goal-oriented, i.e., their token generation is based not on what they predict should come next, but on what should come next in order to accomplish a goal. I'm not sure how far away that is, but it's worrying.
That's already happening. It started happening when they incorporated reinforcement learning into the training process.
It's been some time since LLMs were purely stochastic average-token predictors; their later RL fine-tuning stages make them quite goal-directed, and this is what has produced some big leaps in verifiable domains like math and programming. It doesn't work as well in non-verifiable domains, though, since verifiability is what gives us the reward function.
That would explain why they are so much better at writing code than at actually following the steps the same code specifies.
Curious: is anyone training in adversarial simulations? In open-world simulations?
I think what humans do is align their own survival instinct with surrogate activities, and then rewrite their internal schema to be successful in those activities.