Machine Learning Street Talk
December 31, 2025

The Algorithm That IS The Scientific Method [Dr. Jeff Beck]

The Bayesian Brain: Why Scaling World Models Beats Function Approximation by Machine Learning Street Talk


This summary is for builders who realize LLMs are hitting a wall. It explains why the next leap in AI requires moving from pixel-space prediction to object-centered Bayesian world models.

  • 💡 Why is the brain's efficiency the ultimate proof of its Bayesian nature?
  • 💡 How does the "Lots of Little Models" strategy solve the data-hungry scaling problem?
  • 💡 Why is "surprisal" the secret weapon for AI safety in unpredictable environments?

Top 3 Ideas

🏗️ THE BAYESIAN MANDATE

"I just believe it's the right way to think about the empirical world."
  • Bayesian Efficiency: The brain combines sensory cues with near-mathematical optimality. This suggests our biology isn't just guessing but running rigorous internal simulations.
  • Information Filtering: Intelligence is defined by the ability to compress reality into actionable macroscopic variables. 90% of neural activity involves deciding what to ignore.
  • Engineering Pivot: Autograd turned AI into an engineering discipline, but we lost the cognitive thread. We must return to brain-like structures to move past simple pattern matching.

🏗️ OBJECT-CENTERED LOGIC

"Intelligence must be embodied."
  • Object-Centered Sparsity: Instead of one giant model, use thousands of tiny, specialized models for specific objects. This allows for combinatorial creativity and systems engineering in AI.
  • Asset Libraries: View world models as game engine assets rather than raw video frames. Agents can then generalize across environments by simply swapping asset libraries.

🏗️ THE ALIGNMENT CONUNDRUM

"Belief and value are fundamentally conflated when all you observe is action."
  • Value Identification: We cannot infer an agent's goals just by watching it move. We must force agents to communicate their internal beliefs to separate what they know from what they want.
  • Surprisal Signals: When an agent hits a novel outlier, it should stop and phone a friend. This creates a natural safety buffer that current black-box models lack.
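
The "stop and phone a friend" idea in the last bullet can be made concrete as a surprisal check. A minimal sketch, assuming a simple Gaussian observation model; the model, the data, and the 1%-tail threshold are illustrative inventions, not details from the episode:

```python
import numpy as np

# Minimal surprisal check: the agent models observations as Gaussian and
# "phones a friend" (defers) whenever -log p(observation) exceeds a threshold.
# The Gaussian model and the 1%-tail threshold are illustrative assumptions.

rng = np.random.default_rng(0)
train = rng.normal(loc=0.0, scale=1.0, size=10_000)   # past experience
mu, sigma = train.mean(), train.std()

def surprisal(x, mu, sigma):
    """-log N(x | mu, sigma^2), in nats."""
    return 0.5 * np.log(2 * np.pi * sigma**2) + 0.5 * ((x - mu) / sigma) ** 2

# Calibrate the threshold so ~1% of familiar data would trigger deferral.
threshold = np.quantile(surprisal(train, mu, sigma), 0.99)

for obs in [0.3, -1.2, 6.0]:          # the last one is a novel outlier
    s = surprisal(obs, mu, sigma)
    action = "defer to a human" if s > threshold else "act autonomously"
    print(f"obs={obs:+.1f}  surprisal={s:.2f}  -> {action}")
```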

Actionable Takeaways

  • 🌐 The Macro Transition: Move from Big Data mimicry to Small Data causal reasoning.
  • The Tactical Edge: Prioritize Active Inference frameworks that track uncertainty.
  • 🎯 The Bottom Line: AGI won't come from bigger LLMs; it will come from agents that possess a physics-grounded world model.


My PhD is in mathematics from Northwestern University. I studied pattern formation in complex systems, in particular combustion synthesis, which is all about burning things that don't ever enter the gaseous phase. Bayesian inference provides us with a normative approach to empirical inquiry and encapsulates the scientific method at large. I just believe it's the right way to think about the empirical world.

I remember I was at a talk many years ago by Zoubin Ghahramani, and he was explaining the Dirichlet process prior. This was when the Chinese restaurant process and all that stuff was relatively new. His explanation of it so resonated with me: oh my gosh, this is the algorithm that summarizes how the scientific method actually works. You get some data, then you get some new data, and you say, how is it like the old data? If it's similar enough, you lump them together, and then you build theories and properly test hypotheses in that fashion.
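
The "lump new data with old data if it's similar enough, otherwise start something new" intuition is what the Chinese restaurant process formalizes. A minimal sketch of that logic, as a greedy one-pass toy rather than a full Gibbs sampler; the concentration parameter alpha, the noise scale, and the prior are illustrative choices:

```python
import numpy as np

# One-pass, greedy Dirichlet-process-style clustering: each new observation is
# either "lumped" with an existing cluster or starts a new one. This is a toy
# approximation (no Gibbs sweeps); alpha, the noise scale, and the prior are
# illustrative choices, not values from the episode.

rng = np.random.default_rng(1)
data = np.concatenate([rng.normal(-4, 0.5, 30), rng.normal(3, 0.5, 30)])
rng.shuffle(data)

alpha, sigma, prior_mu, prior_sd = 1.0, 0.5, 0.0, 5.0

def gauss(x, mu, sd):
    return np.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * np.sqrt(2 * np.pi))

clusters = []  # each cluster is a list of points
for x in data:
    # Existing cluster k: weight ~ (cluster size) * likelihood under its mean.
    scores = [len(c) * gauss(x, np.mean(c), sigma) for c in clusters]
    # New cluster: weight ~ alpha * marginal likelihood under the broad prior.
    scores.append(alpha * gauss(x, prior_mu, np.sqrt(prior_sd**2 + sigma**2)))
    k = int(np.argmax(scores))        # greedy assignment
    if k == len(clusters):
        clusters.append([x])
    else:
        clusters[k].append(x)

print("clusters found:", len(clusters))
print("cluster means:", [round(float(np.mean(c)), 2) for c in clusters])
```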

That's the essence of the Bayesian approach. It's about explicit hypothesis testing and explicit models, in particular generative models of the world conditioned on those hypotheses. I believe it is the only right way to think about how the world works, and it encapsulates the structure of the scientific method. I mean, if I'm being perfectly honest, what actually convinced me the brain was Bayesian had a lot more to do with behavioral experiments done by other people.

My principal focus was on, well, how does the brain actually do this? So, I'm referring to experiments showing that humans and animals do optimal cue combination. We're surprisingly efficient in terms of using the information that comes into our brains with regard to these low-level sensorimotor tasks.

Interesting. So it's almost like we're so efficient that the only explanation that makes sense is that we must be doing Bayesian analysis.

Yeah, more or less. I mean, it's a bit more precise than that. The cue combination experiments, I think, are really compelling. The idea behind a cue combination experiment is that I give you two pieces of information about the same thing, and one piece of information is more reliable than the other. The degree of reliability changes on a trial-by-trial basis, so you never know a priori whether the visual cue or the auditory cue is going to be the more reliable one.

And yet, when people combine those two pieces of information, they take into account the relative reliability on a trial-by-trial basis, and that means they're optimal in a sense. Now, we have to be super careful with our words. They're relatively optimal: they don't actually use 100% of the information the computer screen provided, because there is some loss between the screen and the brain, but the system behaves as if it has optimally combined those two cues. It has taken into account uncertainty.
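
The optimality claim has a simple mathematical core: for Gaussian cues, the Bayesian combination weights each cue by its reliability (inverse variance), so the weighting flips when the reliabilities flip. A minimal sketch with made-up numbers, not values from any specific experiment:

```python
import numpy as np

# Bayes-optimal combination of two Gaussian cues about the same quantity:
# each cue is weighted by its reliability (inverse variance). The specific
# numbers are illustrative, not taken from any experiment in the episode.

def combine(mu_a, var_a, mu_b, var_b):
    w_a, w_b = 1.0 / var_a, 1.0 / var_b           # reliabilities
    var_post = 1.0 / (w_a + w_b)                  # combined uncertainty shrinks
    mu_post = var_post * (w_a * mu_a + w_b * mu_b)
    return mu_post, var_post

# Trial 1: cue A (say, visual) happens to be the reliable one.
print(combine(mu_a=10.0, var_a=1.0, mu_b=14.0, var_b=4.0))   # pulled toward 10
# Trial 2: reliabilities flip, so the optimal weighting flips too.
print(combine(mu_a=10.0, var_a=4.0, mu_b=14.0, var_b=1.0))   # pulled toward 14
```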

This is also just how we really do think about the world. We take uncertainty into account all the time in our decisions; if you've ever driven in the fog, you're aware of this. 90% of what the brain does is decide what to ignore, because if we didn't, we'd be screwed. We receive an insane amount of information, most of which we don't even bother to process.

Is that definitely the case though? Do you think that we could actually be processing more information than we know?

We are definitely processing more information than comes out in behavior. A lot of that is because we are continually learning. If you close your eyes for five years, your visual system decays: it loses fidelity, it forgets. It requires constant input simply to maintain its understanding of the low-level statistics of the visual world. So is that using all the information, or is it just using the low-level information? It's information that we don't directly perceive but that is still definitely being used. What is it being used for? It's being used to track these low-level statistics that we sometimes need but don't always need.

This is why I say that when we say context matters, you can think of that in terms of being able to flexibly switch between tasks, which means having a lot of resources maintained and still in good working order just in case we need them. This is also why the self-supervised or unsupervised learning approaches that are ubiquitous for giving your LLM a reasonable prior over language are the sort of thing your brain is definitely doing. So in a sense it is using everything, but it's not really using all of the information that's present. That's the argument I want to make.

The idea of having to traffic in squishy people in order to make our systems go is not immediately appealing. Let's put it that way. This episode is sponsored by Prolific. Let's get a few quality examples in. Let's get the right humans in to get the right quality of human feedback. We treat human data and human feedback as an infrastructure problem. We try to make it accessible, we're making it cheaper, and we effectively democratize access to this data.

What do you think about these broad metaphorical idealizations? The big one is that the brain is a computer. Probably the more popular one is that the brain is a prediction machine.

It will always be the case that our explanation for how the brain works will be by analogy to the most sophisticated technology that we have. How's that for a non-answer? A couple of thousand years ago, how did the brain work? It was levers and pulleys, man. Duh, don't be ridiculous. At some point in the Middle Ages it became humors, because fluid dynamics, the technology that took advantage of water power, was the most advanced technology we had. Now the most advanced technology is computers, so, duh, that's exactly how the brain works.

Philosophers used to think that the universe was a machine. We interviewed Chomsky about this as well, because he talks about the ghost in the machine, and the ghost is all of the bits in the machine that we don't understand. But do you think now that we can think of the universe as a machine?

I think that that is a very convenient way to think of the universe. So when we model the universe as having causal structure, do we do so because it actually has causal structure or because that's a really convenient class of models with which to work?

I think that it has causal structure, but it's also a convenient class of models. A good example is large language models. Most, but not all, are autoregressive in their predictions. Why? Because it's mathematically convenient. It's a compact way to take the past and make a prediction about the future. Does that mean that's actually the way language works? No, I don't think it's actually the way language works, but it's a computationally convenient model.

In physics, momentum is a good example. We don't observe momentum directly. You're just looking at videos of the position of a ball; if you want to infer the velocity, you take the difference between two adjacent positions. You don't ever directly observe momentum in a mechanical setting. So why did we choose momentum? We chose it because it's the variable that, if we knew it, makes everything Markovian: now there's a simple causal model that describes how the world works. We picked that particular hidden variable because it's what rendered the model causal. Does that mean that's how the universe works, or was that just a computationally convenient choice?
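
The momentum point can be made concrete: position alone is not a Markov state (the next frame depends on more than the current one), but position plus an inferred velocity evolves by a one-step rule. A minimal sketch under an illustrative constant-gravity setup; the time step and numbers are made up:

```python
import numpy as np

# A ball follows projectile motion; we only observe positions. Position alone
# isn't a Markov state (you can't predict the next frame from one frame), but
# position plus an inferred velocity is. Setup and numbers are illustrative.

dt, g = 0.05, -9.8
t = np.arange(0.0, 1.0, dt)
x = 5.0 * t + 0.5 * g * t**2               # observed positions only

# "Momentum" is never observed directly: estimate velocity from adjacent frames.
v = np.diff(x) / dt                        # v[n] ~ velocity between frames n, n+1

# With the augmented state (x, v), a one-step rule predicts the next position.
# (The g*dt**2 term, rather than 0.5*g*dt**2, compensates for the half-step lag
# of the finite-difference velocity estimate.)
x_pred = x[1:-1] + v[:-1] * dt + g * dt**2
err = np.max(np.abs(x_pred - x[2:]))
print(f"max one-step prediction error using (x, v): {err:.2e}")

# Using position alone (persistence: x_next = x_now) does much worse.
err_naive = np.max(np.abs(x[1:-1] - x[2:]))
print(f"max one-step prediction error using x only: {err_naive:.2e}")
```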

I'm gonna stay agnostic on that one. But I do like that it's a computational convenience that ended up working out. And just quickly, riff on the benefits of having models that prioritize causal relationships.

So, when you have a causal relationship, it reduces the number of variables you have to worry about and track. That's the beauty of having a cause. It's the same argument as with momentum and Markov models: we chose that hidden variable because it's the thing that made the model simpler. It made the calculations easy. Now we can just go forward in time and make predictions in a totally iterative fashion. That's what makes causal models great.

The other thing that makes causal models great is that if you ever intend to act, you still need to be able to predict the consequences of your action. The more tightly linked your actions, or your affordances, are to the things that causally impact the world, the more effective those actions are with respect to your model, and hopefully also with respect to reality. So we prefer causal models in part because they are, relatively speaking, simpler to execute in simulation, but also because they point directly to where I should intervene and how I should choose the series of actions that will lead me to the desired conclusion or goal.

What's the difference between micro causation and macro causation?

I think the difference between micro and macro is a single letter. No. So we could just model the light cone at the particle level.

Yeah, that's the way physicists see the world, and we see the world in terms of populations and people and all these macroscopic things, and we still reasonably do experiments, and we do interventions, and we do randomization. To truly identify a causal relationship, you have to do an intervention. The classic example is lung cancer. I forget how long ago this was, but at one point there was this belief that alcoholism caused lung cancer, when actually it was that alcoholics were in poor health and smoked a lot more than the rest of the population. So you do need to do that kind of intervention to discover a causal relationship.

However, the causal relationships that we care about are the ones that mesh with our affordances. Identifying a microscopic causal relationship is great, but unless you have really tiny tweezers, it's not very helpful. What you need to do is identify the causal relationships that are present in the domain in which you are capable of acting. We care about causal relationships at the macroscopic level because that is where we live; most of our actions are at that scale. Now, one of the best things about humans is our ability to extend the domain of our affordances with technology. We have nuclear power because we acquired the ability to take tweezers to that scale and make these things happen. We figured out how to take advantage of causal relationships at that level, not because we have those abilities innately, but because we were able to create the tools that gave us access to that space.

It all depends on the problem you're trying to solve, and the causal relationships you care about will always be the ones related to the actions you are capable of performing. That said, there's clearly a great advantage in understanding microscopic causal relationships, if for no other reason than that it might lead us to discover a way to expand our affordances into another aspect of the microscopic domain.

Is this just instrumental? It's a little bit like saying that agents have intentions and representations: it's a great way of understanding things, but for all intents and purposes, it's not actually how it works.

Well, that sentence ended on a rather definitive statement with which I don't think I would agree, but for the rest of it, you're asking the scientific anti-realist whether it's all instrumental. So, yeah, it's all instrumental. The things that we care about, again back to affordances, are the causal relationships at the scale that we can manipulate. That's what matters most, because that allows us to have effective actions in the world in which we actually live. To the extent that we care about other scales, it is because we simply wish to expand our domain of influence.

The mind is quite an interesting example. So let's say I want to move my hand and my mind willed it. So it's top down causation. Now I can't act in the world of my mind. But it seems macroscopically intelligible. We think about our minds. So maybe the mind is a special case. I don't know.

Well, the mind is a special case. I'll agree with that. I think of downward causation from an instrumentalist perspective: I'm not saying downward causation is how it all works. I would take it more from the perspective that downward causation, if discovered, is what justifies your macroscopic assumption.

So what do I mean by that? Suppose I'm in the following situation: I've got a bunch of microscopic elements, they're all doing stuff, and I'd like to draw a circle around them and call that a macroscopic object. Now, I am justified in doing so if that particular description at the macroscopic level has the downward causation property. It is a way of saying: that circle you drew was a good circle, because it summarized the behavior of the system as a whole in a way that rendered the microscopic behavior irrelevant for further consideration.

Yes, I can think of some situations where we do this. I mean, we might identify an aspect of culture or a meme and we might say that is responsible for violence or something like that.

You still have to show that it has that property, and I think intentionality is a tough one, because it's a variable that has a lot of explanatory power, but it's not one whose evolution we understand. When I think of a good macroscopic variable, it's one whose evolution over time I understand. That's what makes it a good macroscopic variable. I can write down a simple equation that says pressure, volume, and temperature are going to do this over time, and taking any little microscopic measurement becomes totally irrelevant.

But what made it useful wasn't just that the microscopic measurements are irrelevant; it's that I had an equation that describes how it would behave, and that equation is fairly accurate. So I have a relatively deterministic model at the macroscopic level. And so when we talk about intentionality, yes, it can be used as an explanatory variable, but it's only good to the extent that we understand how that intentionality changes over time. It's a long-term prediction.
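
One way to make "a good macroscopic variable is one whose evolution you can write down, so microscopic measurements become irrelevant" concrete is to check whether the macro variable's dynamics are approximately closed. A toy sketch; the microscopic model here is an illustrative invention chosen so that the mean is such a variable:

```python
import numpy as np

# Toy check of what makes a good macroscopic variable: its next value is
# predictable from the macro variable alone, so adding any little microscopic
# measurement is irrelevant. The micro model here is an illustrative invention.

rng = np.random.default_rng(2)
N, T, decay, noise = 1000, 2000, 0.9, 1.0
micro = rng.normal(loc=5.0, scale=1.0, size=N)

macro, probe = [], []
for _ in range(T):
    macro.append(micro.mean())     # macroscopic summary (a "temperature", say)
    probe.append(micro[0])         # one microscopic measurement
    micro = decay * micro + noise * rng.normal(size=N)
macro, probe = np.array(macro), np.array(probe)

def rmse_of_fit(X, y):
    """Root-mean-square residual of a least-squares linear fit y ~ X."""
    X = np.column_stack([X, np.ones(len(y))])          # add intercept
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return np.sqrt(np.mean((y - X @ coef) ** 2))

y = macro[1:]
print(f"predict macro(t+1) from macro(t) alone:     RMSE = "
      f"{rmse_of_fit(macro[:-1, None], y):.4f}")
print(f"macro(t) plus one microscopic measurement:  RMSE = "
      f"{rmse_of_fit(np.column_stack([macro[:-1], probe[:-1]]), y):.4f}")
```

In this toy, the two errors come out essentially identical: once you know the macro variable, the extra microscopic measurement adds nothing to the prediction, which is the property being described above.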

This is why the jurisprudence example made me really uncomfortable. What you're kind of doing is saying, this is a bad person, and I don't know how we would identify that intentionality except in a very indirect way. Intentionality is only good as a macroscopic variable if we can make predictions about how it changes over time, and we're not doing that. We're saying you're stuck with it, and that's why it makes me a little uncomfortable.

I did actually notice that the active inference community is quite a ragtag bunch. It's very diverse.

Well, I think this was Karl's influence. So what did Karl actually discover? He's got this link between information theory and statistical physics that in some way gives you a uniform mathematical framework that's widely applicable to a huge number of situations. A lot of how we think about the world is baked into it, so it can be applied in a whole bunch of different areas. And Karl spent a lot of time basically evangelizing to various parts of the scientific community: look, you can apply this to epidemiology, you can apply this to the social sciences, you can apply this to physics. This is one of the reasons I think he's so prolific: he's basically written variations on the same paper applied in different domains. He did this intentionally, because he wanted to show that this is a uniformly applicable mathematical framework, and I think he's largely right about that.

As a result, there's all these people from all these different communities that have been pulled into his sphere that think about the world very differently and it makes for some very entertaining conversations at the pub.

Yes, even in our Discord server, we've got people thinking about it in terms of crypto, even in terms of Christianity, phenomenology, psychology. It's really interesting. But that's the beauty of constructing a nearly uniformly applicable mathematical framework. Exactly. This is one of the things I love about the community, in fact: we now have a relatively common language to discuss a huge variety of different things. Of course, that means we often end up talking at cross purposes, but that's half the fun.

I often ask people in the business: what changed? Why did we have this massive explosion in AI development over the last several years? There are three common responses, and I agree with every single one of them:

  • Autograd.
  • The transformer architecture, though why the transformer mattered is something I often disagree with people about.
  • The ability to scale things up in a manner that we haven't really seen before.

The reason the transformer comes with an asterisk is that a lot of the capabilities people believe the transformer enabled, I think, really resulted more from scaling. The point of evidence I like to cite is Mamba, which is a traditional state space model, basically a Kalman filter on steroids. They scaled it way up, and now Mistral has a very nice coding agent built on it that works pretty darn well. They got a lot of the same functionality with a completely different architecture simply by virtue of scaling. So transformers get an asterisk.
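
The "Kalman filter on steroids" description refers to the linear state-space recurrence that Kalman filters and models like Mamba share; Mamba adds learned, input-dependent parameters and hardware-friendly parallel scans on top. A minimal sketch of just the shared recurrence, with hand-picked matrices rather than anything from the actual architecture:

```python
import numpy as np

# The core recurrence shared by Kalman filters and (deep) state space models
# such as Mamba: a hidden state updated linearly by the input, read out linearly.
# Mamba adds learned, input-dependent parameters and parallel scans; this sketch
# with hand-picked matrices shows only the bare recurrence.

rng = np.random.default_rng(3)
d_state, T = 4, 10
A = 0.9 * np.eye(d_state)            # state transition (decaying memory)
B = rng.normal(size=(d_state, 1))    # how the input writes into the state
C = rng.normal(size=(1, d_state))    # how the state is read out

u = rng.normal(size=T)               # input sequence (token features, say)
h = np.zeros((d_state, 1))           # hidden state
outputs = []
for t in range(T):
    h = A @ h + B * u[t]             # state update:  h_t = A h_{t-1} + B u_t
    outputs.append((C @ h).item())   # readout:       y_t = C h_t
print(np.round(outputs, 3))
```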

I think the biggest thing was autograd. Autograd turned the development of artificial intelligence from something done by carefully constructing your neural networks, writing down your learning rules by hand, and going through that painful process that took forever, into an engineering problem. It made it possible to experiment with different architectures, different networks, different nonlinearities, different structures, different ways of getting memory in there, and all this fun stuff, which allowed people to just start trying things out in a way we couldn't before. And then what did we discover? Oh, it turns out backprop does work.

I mean, when I was a young man, backprop was considered a non-starter for two reasons. One is that it's not brain-like, which is true; the brain does not use backprop. The other was the vanishing gradient: you'll never solve the vanishing gradients problem, it'll always be unstable. And yet, once we turned it into an engineering problem and started playing around with tricks and hacks, things like ReLUs, we discovered that, in fact, there are ways around this. We weren't going to discover them by playing with equations; we had to actually start building. As soon as it became an engineering problem, that's what enabled the hyperscaling, which is what led to all of these great developments over the last several years.
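
Autograd is what lets you write down an arbitrary differentiable model and get the "learning rule" for free instead of deriving it by hand. A minimal sketch using JAX's `grad` as one example of an autograd framework; the tiny model, data, and learning rate are illustrative:

```python
import jax
import jax.numpy as jnp

# Autograd in one screen: write any differentiable model and loss, and the
# framework hands you the gradient (the "learning rule") automatically.
# The tiny one-layer model and data here are illustrative.

def loss(params, x, y):
    w, b = params
    pred = jnp.tanh(x @ w + b)               # a small nonlinear model
    return jnp.mean((pred - y) ** 2)

key = jax.random.PRNGKey(0)
x = jax.random.normal(key, (32, 3))
y = jnp.sin(x[:, :1])                         # arbitrary target
params = (jnp.zeros((3, 1)), jnp.zeros(1))

grad_fn = jax.grad(loss)                      # no hand-derived backprop equations
print("initial loss:", float(loss(params, x, y)))
for _ in range(200):
    grads = grad_fn(params, x, y)
    params = jax.tree_util.tree_map(lambda p, g: p - 0.1 * g, params, grads)
print("final loss:  ", float(loss(params, x, y)))
```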

What got lost in the mix, though, was the notion that there's more to artificial intelligence than just function approximation. We got really good function approximators, but that's not the only thing you need to develop proper AI. You need models that are structured the way the brain is structured, and the way we conceive the world to be structured, certainly if you want models that think the way we think. That got lost in the shuffle, and we're starting to see the limitations, faults, and flaws of these approaches, and to see them not living up to the hype. I don't know if you read it the other day, but at least according to experts at the best companies in the business, AGI is no longer a huge priority, and they're dialing back the rhetoric surrounding it, in part because I think they've begun to realize that function approximation alone isn't going to deliver, or that it was just hype.

We do need to do something different. We need to start bringing in what we know about how the brain works if we're ever going to get to something that is a humanlike intelligence. That was the starting point for us about a year or so ago: let's do the same thing for cognitive models. Let's take what we know about how the brain actually works and how people actually think about the world in which they live, and start building an artificial intelligence that thinks like we do by incorporating these principles. That means creating a modeling and coding framework for building brain-like models at scale. And that's the critical element, because obviously scaling was a big part of the solution.

Right now, most of the work in the active inference space, as I'm sure you're aware, is not at scale. There's very little active inference work at scale; most of the models are relatively small, toy, grid-worldy models. Part of the reason is that it is in fact difficult to scale Bayesian methods. That has now begun to change. We have a lot of great mathematical tools and frameworks for approximating Bayesian inference. You'll never do it exactly. We're approximating Bayesian inference, which I believe is how the brain works, the Bayesian brain and all that, and that allows us to build structured models that are structured both after how the brain is structured and after how the world we live in is actually structured.
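
"You'll never do it exactly; we're approximating Bayesian inference" usually means something like variational inference: pick a tractable family of distributions and fit it to the true posterior, typically by maximizing the evidence lower bound (ELBO) with stochastic gradients. A minimal sketch of that generic recipe, not of the speaker's framework, on a conjugate problem where the exact posterior is known so the approximation can be checked; all numbers are illustrative:

```python
import jax
import jax.numpy as jnp

# Minimal variational (approximate) Bayesian inference: fit q(theta) = N(m, s^2)
# to the posterior over a Gaussian mean by maximizing a Monte Carlo estimate of
# the ELBO with the reparameterization trick. The problem is conjugate, so the
# exact posterior is available for comparison. All numbers are illustrative.

key = jax.random.PRNGKey(0)
data = 2.0 + 0.5 * jax.random.normal(key, (50,))        # observations
obs_sd, prior_mu, prior_sd = 0.5, 0.0, 10.0

def log_joint(theta):
    log_lik = jnp.sum(-0.5 * ((data - theta) / obs_sd) ** 2 - jnp.log(obs_sd))
    log_prior = -0.5 * ((theta - prior_mu) / prior_sd) ** 2 - jnp.log(prior_sd)
    return log_lik + log_prior                           # constants dropped

eps = jax.random.normal(jax.random.PRNGKey(1), (256,))   # fixed base samples

def neg_elbo(params):
    m, log_s = params
    theta = m + jnp.exp(log_s) * eps                     # reparameterization
    entropy = log_s                                      # entropy of q, + const
    return -(jnp.mean(jax.vmap(log_joint)(theta)) + entropy)

params = jnp.array([0.0, 0.0])                           # init: m=0, log s=0
grad_fn = jax.jit(jax.grad(neg_elbo))
for _ in range(3000):
    params = params - 0.002 * grad_fn(params)            # plain gradient descent

m, s = params[0], jnp.exp(params[1])
exact_var = 1.0 / (len(data) / obs_sd**2 + 1.0 / prior_sd**2)
exact_mean = exact_var * (data.sum() / obs_sd**2 + prior_mu / prior_sd**2)
print(f"variational: mean={float(m):.3f}  sd={float(s):.3f}")
print(f"exact:       mean={float(exact_mean):.3f}  sd={float(jnp.sqrt(exact_var)):.3f}")
```

The fitted Gaussian approximately recovers the exact posterior here; the point is that the same gradient-based recipe still runs when the model is too structured or too large for exact inference.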

Hence the notion that what we need to get to the next layer of AGI, and I don't like that term and don't intend to use it very often, is a framework that allows us to build the kinds of models we know people actually use, make them bigger and more sophisticated, and then take advantage of that. Hyperscaling Bayesian inference is part of it, but it's also about constructing models of the world as it actually works. The way the world actually works provides us with the structure of our own thinking. The atomic elements of thought, as I like to phrase it, are models of the physical world in which we live.

The physical world in which we live is a world of macroscopic objects that have specific relations and interact in certain ways that we understand. I'm looking around the room for a good example: you sit on a chair. That's an example of a relationship; it holds you up and all that fun stuff. That understanding of the physical world was necessary for us in order to survive. Dogs have it too. Language isn't what makes it special, well, language is actually quite special, but that understanding of the world in which we live is where we get the models that form the atomic elements of our thoughts, out of which we have composed more sophisticated models that have allowed us to do all this great systems engineering and build the great technology that we've got.

That's what we want to do. We're focused on building cognitively inspired models based on our understanding of the way the world we live in actually works, because we believe intelligence must be embodied. We're building a framework for putting those models together and experimenting with them at scale, all in an approximately Bayesian way, because we believe that's how the brain works. It's not just about putting your AI into a robot. It's about giving the robot a model of the world that is like our model of the world: a model that is object-centered, dynamic, and largely causal. That's the big difference. And I think sparse, structured models are another key differentiating component.

When you think about how a transformer and an LLM work: the transformer takes every word in the document and asks how this word relates to every other word, and it does that many, many times. The same goes for generative vision-language-action models: they operate in pixel space; they are microscopic models. Do they have an implicit notion of the macroscopic? Yes, they must, because they work, but it's implicit, and it's not implemented with the kind of sparse structure that actually exists in the real world and in our conceptualization of it. That's what we are saying: if we want an AI that thinks like us, then we should build models that are structured the way the real world is structured. The real world has this sparse, causal, macroscopic structure, and so should our models. The only way to do that is not just to put a robot in the real world, but to put a robot with a model structured in that fashion into the real world. No one's using the xLSTM, and not many people are using Mamba, because why bother when all you need to do is scale the transformer as much as possible?

Many people really do think you just magically get these things for free. You could argue that with enough data of the right kind, one of these really big super-scaled models will obtain an implicit representation of the world that is more or less correct. Having an implicit representation is great if your only goal is to represent the world, or to predict what's going to happen. But it turns out people do something very different. People are creative. People can solve novel problems. It's not just about mining old problems and figuring out where to move some words around to get an answer that looks more or less right. We are actually capable of creating, capable of inventing new things.

The way we invent, I think, is exemplified by systems engineering. How does systems engineering work? I know how an airfoil works to create lift. I know how a jet engine works to create thrust, and I can take those two bits of information and invent something brand new: an airplane. That kind of systems engineering was predicated on having a relational model of the world. Here's the wing; I can put a jet on it. You don't staple it on, I'm sure you use rivets or something. I know how to put things together, how to construct new relationships and new objects. An AI designed to do systems engineering will have an object-centered or system-centered understanding of the world and will know how all of the objects relate, so that it can start experimenting with different ways to combine them.
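
A toy illustration of why an object-centered, relational world model enables this kind of composition: small standalone models of a wing and an engine can be assembled into a new object, an aircraft, and queried for a property neither component has on its own. The class names, formulas, and numbers here are all illustrative inventions, not anyone's actual asset library:

```python
from dataclasses import dataclass

# Toy "asset library" composition: independent object models (a wing, an engine)
# are combined into a brand-new object (an aircraft) whose feasibility neither
# component can answer alone. All classes, formulas and numbers are illustrative.

RHO = 1.2  # air density, kg/m^3 (sea level, roughly)

@dataclass
class Wing:
    area_m2: float
    lift_coeff: float
    def lift_newtons(self, airspeed: float) -> float:
        return 0.5 * RHO * airspeed**2 * self.area_m2 * self.lift_coeff

@dataclass
class Engine:
    max_thrust_newtons: float

@dataclass
class Aircraft:
    wing: Wing
    engines: list
    mass_kg: float
    drag_coeff_area: float = 2.0
    def can_sustain(self, airspeed: float) -> bool:
        lift = self.wing.lift_newtons(airspeed)
        drag = 0.5 * RHO * airspeed**2 * self.drag_coeff_area
        thrust = sum(e.max_thrust_newtons for e in self.engines)
        return lift >= self.mass_kg * 9.81 and thrust >= drag

# Compose existing "assets" into something new and query the new relationship.
plane = Aircraft(wing=Wing(area_m2=120, lift_coeff=0.9),
                 engines=[Engine(1.2e5), Engine(1.2e5)],
                 mass_kg=70_000)
print(plane.can_sustain(airspeed=70.0))    # too slow for enough lift
print(plane.can_sustain(airspeed=150.0))   # composition works at this speed
```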

Without that, the only thing you will ever be able to do is retool solutions for new purposes, and even that, I think, is a generous interpretation of what a purely predictive model is going to do. So this is how I like to think about the principal advantage of taking this object-centered approach: it enables systems engineering.

What is a grounded world model?

So, I feel like that's a trick question. I actually had this conversation with one of my friends and co-conspirators, Maxi, the other day. In some sense every model is grounded: it's grounded in the data it was given. Okay, that's a true statement, but that's not what we want. When we use the phrase "grounded world model", we say that it's grounded in something, and that something is not just the data it saw. Take vision language models, for example. A vision language model is a way of grounding the visual model in the linguistic space. This is the approach that's being taken broadly; it's what LangChain does, right? Everything becomes a something-language model: vision-language, whatever. When you do that, what you're doing is grounding all of your models in a common linguistic space so that they can communicate with one another via language.

Now, why did we choose language? Honestly, I think it's because we wanted models we could talk to. It was really all about making the interface convenient for us, which is great; that's totally something you want. But it raises the question: what's the right domain in which to ground your models? We also use the phrase "ground truth", and of course ground truth is the thing you made up a priori and declared to be ground truth. So what's ground truth? What is the right domain in which to ground models in order to get them to think like we do? That's the relevant question.

My view is that if you want AI that thinks like we do, you need to ground it in the same domain in which we are grounded. This is why the embodied bit is such an important thing. We want models that are grounded in the physical world in which we evolved, because that is the world that provides us with these atomic elements of thought. A single cell lives in a soup, and whatever model it has of the world, to the extent that it has one or behaves as if it has one, is a model of its environment. If it didn't understand the environment in which it lived to some extent, it wouldn't be able to continue to exist and function in that environment. So you can say that a cell has a model grounded in the chemistry of the soup in which it lives, and that model is a prerequisite for its survival.

Now take mammals, bigger animals, things that live in the macroscopic world that includes other animals. What's that model? At the very least, we can say that a significant subset of whatever models we have are grounded in that world, and that world, we know, has properties we can understand: it is object-centered, it's relational, and so on. So the "grounded" bit is more about being properly grounded in the domain in which we ourselves are grounded, as a route to creating AI models that in fact think like we think. That's the grounding we're particularly focused on.

If you had to choose the domain in which to ground your models, what would you choose?

I don't think language is the right one. Language is an incredibly poor description of both our thought processes and reality. I tell this story all the time. Ask any cognitive scientist or psychologist who's done experimental work with humans: you put someone in a chair, you make them do some task, you carefully monitor their behavior, you look at what they did, and that informs your theory of the behavior. If you do the experiment well, you have a very good model of how they made whatever decisions they made throughout the course of the experiment. Then you go back and ask them why they did what they did, and they give you an explanation. It sounds totally reasonable. It is also completely inconsistent with an accurate model of their behavior. Self-report is the least reliable form of data you get out of a cognitive or psychological experiment. So we don't want to rely on that. We don't want to ground our models in what we know is an unreliable representation of both the world and our thought processes. We want to ground them in something that's a good model of our world, and that's why we've chosen to focus on models grounded in the domain of macroscopic physics, as opposed to language.

Can you speak a little bit more to the limitations of current active inference?

Active inference is a nearly uniformly applicable information-theoretic framework for describing objects and agents. It really is inspired by statistical physics and its links to information theory. When you take those two mathematical structures and throw in a little Markov-blanket machinery so you can talk about macroscopic objects, you have a very generic, widely applicable mathematical framework that you can throw at many problems. A lot of what has gone on in the active inference community over much of the last 20 years has been demonstrating that it's uniformly applicable, so there's been a lot of breadth and not a lot of depth. To some extent that's appropriate: if you really want to argue that everyone should be using this, you show that it works in this domain and that domain on toy examples. But the community has had this habit of showing, look, I can handle this psychological phenomenon, I can model this cognitive phenomenon, and look, it's a good post hoc description of this neural network's behavior, and things like that. They've been showing that, but they've never really sat down and tried to tackle a really big, really hard problem, because the emphasis has been on evangelism.
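
The "statistical physics plus information theory plus a Markov blanket" recipe comes down to a single quantity, the variational free energy, which upper-bounds surprisal: F = E_q[log q(s) − log p(o, s)] = −log p(o) + KL(q(s) ‖ p(s|o)). A minimal numerical check of that standard identity on a made-up two-state discrete model; the probabilities are not from any real system:

```python
import numpy as np

# Variational free energy for a tiny discrete generative model, checking the
# identity F = -log p(o) + KL(q || posterior) >= surprisal. The two hidden
# states, the likelihoods, and the approximate posterior q are all made up.

p_s = np.array([0.7, 0.3])              # prior over hidden states
p_o_given_s = np.array([0.9, 0.2])      # likelihood of the observed outcome o

p_o = np.sum(p_o_given_s * p_s)         # model evidence p(o)
posterior = p_o_given_s * p_s / p_o     # exact Bayesian posterior p(s|o)

q = np.array([0.6, 0.4])                # some approximate posterior belief

free_energy = np.sum(q * (np.log(q) - np.log(p_o_given_s * p_s)))
surprisal = -np.log(p_o)
kl = np.sum(q * np.log(q / posterior))

print(f"free energy          = {free_energy:.4f}")
print(f"surprisal + KL       = {surprisal + kl:.4f}")   # identical, by the identity
print(f"surprisal -log p(o)  = {surprisal:.4f}  (F is an upper bound)")
```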

Couple that with the fact that there is a strong bias within the active inference community towards being as Bayesian as possible, so of course they also shun the really hard problems, because Bayesian inference has historically been challenging to scale. There have been a lot of developments over the last few years, mostly out of the Bayesian machine learning community, that have made it possible to start scaling Bayesian inference in ways we really weren't able to before. Couple that with a desire to stop evangelizing and start solving really hard problems with these methods, and you've got a way to prove that active inference really can live up to its promises.
