Machine Learning Street Talk
January 21, 2026

"We Made a Dream Machine That Runs on Your Gaming PC"

We Made a Dream Machine That Runs on Your Gaming PC

By Machine Learning Street Talk

Date: October 2023

Quick Insight: This summary is for builders and researchers tracking the move from static video generation to real-time, interactive world models. Overworld Labs is proving that high-fidelity simulation belongs on consumer GPUs, not just centralized TPU clusters.

  • 💡 How can a 2-billion parameter model hit 60 FPS on a standard gaming PC?
  • 💡 Why is "lucid dreaming" the ultimate benchmark for the next generation of interactive media?
  • 💡 How does diffusion distillation trade off diversity for the "playability" required in world models?

Andrew Lap and Shabalan Matiana of Overworld Labs are decentralizing the world model by moving it from Google’s closed clusters to your home office. They are building Waypoint One, an interactive diffusion model that turns text and image prompts into playable, 60 FPS realities.

The Sovereign Simulation "[Making this open, making this public, making this accessible and runnable on consumer hardware... that's a big distinction.]"
  • Consumer Hardware Focus: The model runs on 3090s and 4090s rather than enterprise-grade silicon. This democratizes high-fidelity simulation and prevents platform gatekeeping.
  • Privacy by Design: Local execution ensures your internal monologues and simulated scenarios remain private. This prevents the outcome of a centralized entity owning your subconscious data.
  • Open Weight Strategy: Releasing the 2B parameter model on Hugging Face invites community experimentation. This accelerates the discovery of use cases beyond traditional gaming.
The Lucid Dream Benchmark "[Dreams can give you these really amazing, fully immersive experiences, but there's no way to record them.]"
  • Zero Latency Input: The architecture generates one frame at a time to ensure immediate responsiveness to controller inputs. This avoids the laggy feel of temporal autoencoders that batch multiple frames.
  • Grounded Shared Reality: These simulations provide a shared physical context for multiple users. This allows for the creation of social experiences that were previously impossible in subjective dream states.
The Efficiency Breakthrough "[Every other week it feels like someone comes out with a paper that finds a way to make diffusion 100 times faster.]"
  • Compute Bound Performance: Large batch sizes of 256 tokens per forward pass maximize GPU utilization. This shifts the bottleneck from memory bandwidth to raw processing power.
  • Distillation Sweet Spot: Using a four-step rectified flow model maintains quality while hitting high frame rates. This proves that extreme step reduction does not have to sacrifice the structural integrity of the generated world.

Actionable Takeaways

  • 🌐 The Macro Shift: The transition from static LLMs to interactive world models marks the move from AI as a tool to AI as a persistent environment.
  • ⚡ The Tactical Edge: Monitor the Hugging Face release of the 2B model to build custom image-to-experience wrappers for niche training or spatial entertainment.
  • 🎯 The Bottom Line: Local world models will become the primary interface for spatial computing within the next year, making high-end local compute more valuable than cloud-based streaming.

Podcast Link: Click here to listen

I'm Andrew Lap. I'm a member of technical staff at Overworld Labs. I focus on pre-training, post-training, and inference.

I'm Shabalan Matiana. I am a co-founder and head of research here at Overworld. I also focus on pre-training, but mostly on the autoencoder side, as well as helping with things all around for the pre-training team.

So, many in the audience will have seen that we did the launch video for Genie, and this is a new technology that we're just trying to understand: what the hell it is and how we could use it. It's really exciting, and I guess you could call it a kind of continuous generative vision model that allows us to do a form of interactive entertainment, which is a little bit like dreaming, or doing simulations of the world and actually interacting with it in real time.

I think what they've done is really impressive. I think we're a little bit different in that we're a research project. We're letting the community run this on their own hardware. We're letting people explore the possibilities of Overworld and Waypoint One and we really want to see what the community can do and figure out what all the use cases for this are.

We have a few in mind ourselves, but you know making this open, making this public, making this accessible and runnable on consumer hardware, I think that's a big distinction that lets people, you know, figure out what is possible with this technology.

And that is a big constraint, right? So Google, they have it running on their TPU network. God knows how big the model is. It all just runs in the magical cloud and you want to run it on consumer hardware. What do you mean by consumer hardware?

Consumer hardware that you can purchase: your 3090s, 3070s, 4090s, 5090s, AMD cards, and we're going to be targeting Apple silicon soon. So basically making it so you don't have to buy a $30,000 B200 to run it, making it so that anyone can run it on their own gaming hardware.

And I think before we go into discussions, can you just show a demo of it just so that folks in the audience can see what it is we're talking about?

So, this is our Overworld streaming demo. There are a few ways you can run Waypoint One. You can run it locally on your own hardware. You can run it on overworld.stream, which is our service that you can go to in your browser and try out.

There are third-party clients that allow you to run Waypoint One. And, you know, we've really opened up the tooling and allowed people to stream it and run it however they want. But yeah, this is our streaming demo. It allows you to enter a prompt; you can generate a world that you can explore, that you can play in. It's text-to-experience, basically, with real-time interaction.

The typical workflow for video diffusion models is you enter a prompt, you wait, and then you have a video. So this kind of disrupts that paradigm, because you're able to enter control inputs in real time, drive the video, drive the experience, and dynamically have anything you want to experience.

So this thing is running all the time, and initially I think you said you can put a prompt in and just generate a scene. Do I understand correctly that you also support this feature where you can construct the scene interactively? So you build the scene, and then you can say, I want to have this thing and I want to have this thing. You said the model's trained for that, but you're not supporting it at launch.

So the stack is available to run this. The model is trained with the capability to enter a prompt at the beginning and then adjust the scene either through controls or through a prompt that adds a new event, adds a new change to the environment. So every frame that's generated is conditioned on the prompt and the controller inputs.

So you can describe a change in the scene, but that is not supported in this client? No, it's supported in our world engine inference library. Anyone can play with it, anyone can test it, and you can make modifications to our open client as well.

And we're planning on adding new features such as what we're calling in-flight captions that drive the scene. That is, just to reiterate, not a capability at launch, but it's something we're excited to be bringing soon.

And you said that you're supporting 60 frames a second, or you're aiming to. What resolution is it? How big is the model, and what are the performance constraints that you're coming up against?

This is our small model, actually. It's a two-billion-parameter model and it runs at 60 fps on a 5090. The constraint there is really that you're just doing so much compute: you're generating 15,000 forward-pass tokens per second, and you're batching 256 tokens in a forward pass. For some context, you convert an image into a series of 256 tokens, a 16x16 grid, and then a transformer model predicts the next frame based on the entire history of frames and all the conditioning, the text conditioning and the prompt conditioning.

And that's a lot of processing; it's a lot of work to do. That's basically the main constraint: just how heavy the operations being performed on the GPU are.
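To make those numbers concrete, here is a back-of-the-envelope sketch of the throughput quoted above. This is my own arithmetic based on the figures in the conversation (256 tokens per frame, 60 fps, a 2B-parameter model), not Overworld's internal accounting:

```python
# Rough throughput arithmetic for the figures quoted above (my estimate only).
tokens_per_frame = 16 * 16              # one frame = a 16x16 grid of 256 latent tokens
fps = 60
tokens_per_second = tokens_per_frame * fps
print(tokens_per_second)                # 15,360 -- roughly the "15,000 tokens per second" figure

# A dense transformer costs roughly 2 * params FLOPs per token per forward pass,
# so per denoising step (before multiplying by the 4-5 steps per frame):
params = 2e9
flops_per_step = 2 * params * tokens_per_second
print(f"{flops_per_step:.2e} FLOP/s per denoising step")   # ~6e13
```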

And at the moment, I think you said the context window is quite small, about 2 seconds. But as we increase that, it will have a memory, so you could look at something, look away from it, and go back, and it would still remember what that was. How are you going to increase the length of the context window over time?

Well, the big constraint is just a technological constraint that we've run into. To alleviate that, we're basically distributing the sequence across GPUs when we're training so that we can actually have the full sequence. Thirty seconds at 60 frames per second, that's 1,800 frames you have to have a GPU process.

And so we're sharding that sequence across GPUs so that the model can actually learn how to process such a long sequence.
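As a rough sketch of why that sharding is needed, here is the context-length arithmetic implied by the numbers above. The GPU count is purely hypothetical; the interview does not say how many devices the sequence is split across:

```python
# Context-length arithmetic for the training setup described above (my numbers).
seconds = 30
fps = 60
tokens_per_frame = 256

frames = seconds * fps                      # 1,800 frames
tokens = frames * tokens_per_frame          # 460,800 tokens of attention context
print(frames, tokens)

# Sequence (context) parallelism: split the frame axis across GPUs so that no
# single device has to hold the whole 1,800-frame sequence. Hypothetical split:
num_gpus = 8
frames_per_gpu = frames // num_gpus         # 225 frames (~57,600 tokens) per GPU
print(frames_per_gpu)
```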

And at the moment, it's constructed with text prompts. But in the future, I think you're going to introduce image prompts and other forms of conditioning as well. Tell me about that.

Yeah, so actually on day one, we're supporting image-to-experience, image-to-real-time-streaming. Prompting is just one way you can generate the experience, or you can just leave it unconditioned as well. The model is quite flexible; it's just a matter of implementing a client that can handle all these different functionalities.

But on day one, we are supporting image to video and text to video within the client.

And how much does it depend on the imagination and competence of the prompter? What I mean by that is, we've all had the experience that Claude Code is like an amplifier of intelligence. The more competent you are and the more you understand, the more you can specify things, the more worlds you can create, the more things you can explore. Do you see a similar thing here? Do you think that some people are really imaginative and can come up with beautiful scenes and worlds, or do you think it's actually quite democratized, and almost anyone can make this thing sing?

Unlike Claude Code, there isn't a need for one user to do everything themselves. We do think that as people explore the models, they're probably going to want to customize the experience a bit, to see what it can do and what kinds of things they can make. However, unlike Claude Code and LLMs, it is way more intuitive to share these kinds of experiences.

You can imagine someone taking their prompts, taking their seed images, taking their instructions for how seed images are put in, things like that. Being able to basically share these directly. Imagine, you know, being able to go into a room, having a wall of all these experiences that people have made and just being able to step into one and experience that.

How composable are these things? I guess I'm imagining that level one of this tech stack is: I've got the simulation machine, and it's really interesting from a cognition point of view, because I think our brain is like a simulator, and the reason we have this amazing form of intelligence is that we can simulate things without direct physical experience, and we can share those simulations with others through language. Language is pointers to the simulations.

So that's level zero; that's very exciting. The next level is we have a social component, so we can share simulations we've created with other people, and other people can compose them and extend them and so on. And then maybe the third layer of the stack is virtual reality, you know, headset instantiation. So it'd be the killer application for virtual reality. But I guess the question is, if you think about it, these simulations are constructed through a trajectory of conditioning. So it doesn't seem obvious how they could be composable and modifiable. How do you see that working?

I mean, the way that we're currently going about it is that as you play through an experience, there are things that happen, events in the world, where you have these prompts that show certain things happening at certain points throughout the experience. We are currently primarily experimenting with text and being able to add text to the world.

But the plan is in the future to also have this work for images, audio, and other kinds of media, so that people who are creative, who might have doodles and art of, let's say, a boss or something, can just inject that boss into the experience and have it spawn in. And then when they share this experience with others, other people can fight this boss that they've created, or something like that.

When we spoke earlier, you were at pains to say that this is not necessarily a game engine; it's a form of interactive experience generation. And you were saying that you're quite interested in lucid dreaming, and some of those experiences informed your viewpoint here. Tell me about that.

Yeah, so honestly, let me get specific. Let me talk about an actual lucid dream I had, for example, and it'll make sense when I say that these are things that modern games can't really do. One of the coolest experiences I've had in my whole life was a lucid dream. I was in this house floating in space, and there was a giant dragon circling the house, and I hear it detect me. It's coming for me.

I draw a katana from my waist and I parry the dragon's teeth as it tries to bite me. I feel a clang reverberate through my whole body. The floorboards crack beneath my feet. The windows shatter around me. Things like that. And I woke up and I was like, "Oh my god, that was awesome."

And this is the kind of thing where dreams can give you these really amazing, fully immersive experiences, but there's no way to record them. There's no way to share them. And it's something that modern games just cannot hit, because you cannot get to that level of immersion, where the world is bending around what you're doing.

I think that this technology, and fully developing it, is the only way to get there. And my experience growing up with lucid dreams is what's motivated me to pursue this direction.

There's also this thing that our conscious experience and dreaming are Subjective with a capital S, which means they're ungrounded. When I tell you about my dream and I say there was a blue swirly thing, you don't know what the blue swirly thing is, because obviously there's no grounding physical experience between us. But this is a way of almost curating a shared, grounded experience, which means we can effectively share some of these amazing things together.

But it also raises the question: I think this might be different from Claude Code, because with Claude Code the cognitive wall is that some people can think at very high levels of abstraction and some cannot. But I have a theory that this is actually very grounded, because we create these experiences using, possibly, photos or videos or everyday descriptions of our reality, and that could actually democratize how we share these experiences.

I think there's an interface problem to it as well, where it depends on how you present these to people and how you allow them to share it. I think one of the most important things going forward is going to be developing the social experience in a way that encourages people to share their experiences and try out the experiences that, let's say, their friends or others have created.

Do I understand correctly that you're sharing the weights for this? So is it a kind of semi-open-weights situation?

Yeah, the small model, which is what we've been demoing, a two-billion-parameter model, and that's going to be open source. It's going to be on Hugging Face tomorrow, around lunchtime.

How did this whole thing come about?

Sora came out. I saw people using it for video generation and I was like, "Oh, this is really cool. It's like building these worlds, but you can't interact or step into them, but they look really good." So I figured that diffusion was a much better direction to get to where I wanted to go, and I did a full pivot from LLMs into diffusion.

And, you know, we were at Stability back then. I spent a few years mastering diffusion; a pretty great place to do that. And after the first world models came out, it ignited a passion within me, and I was like, okay, we should be trying to get there as fast as possible and get it into everyone's hands so they can experience it themselves.

I think it will possibly be the killer application for AI. So I'm very excited about that.

But one question I did have, though: it seems like you're trying to use local GPUs and you've got this local-processing approach, but it seems to me like the direction of travel is platformification, right? Eventually, I think you said that if I wanted to take a snapshot, it's not just the prompt, it's the conditioning trajectory, it's actually the accumulated KV cache that I'd need to share, and that's several gigabytes; it's not something I could easily share with people over the internet. So the direction of travel is that this will be running in the cloud somewhere. So why not start with the cloud?

I'm not sure I fully agree with that. I think that you can definitely transmit, maybe not the KV cache, but things like an image sequence or a seed sequence to someone's local computer and run it there. I don't really think you are going to be that heavily bottlenecked by large data formats, maybe for downloading the model itself, but beyond that, not really.

I also think it gives you a lot of privacy, and a lot more control, if you can actually make things locally. It gives you freedom to experiment with it too, and run different things.

I mean, the privacy thing, this might be a bit of a vexed topic to talk about. I hadn't really thought about that. But in a sense, when we imagine future situations, quite often we have a monologue in our heads and we say, well, in this situation I could have done this, and I could have done that, and that is actually deeply private. So I can really see the rationale for having this separated so it can't be shared with other people, because this is like an extension of my mind. It should be very private. How do you think about that?

No, I totally agree. I mean, it's one of those things where, if dreams could be recorded, and everyone's dreams were recorded and shared with the entire world to see, it would be a bit dystopian. I think there is a level to which, when you're playing these kinds of experiences and exploring these worlds, that experience should be your own. There should be some sense of ownership over it. I think with streaming, you lose that, just from a technical architecture point of view.

So can you go into a lot more technical detail about how the model actually works and what sampler you're using? Yeah, all of the details.

The thing that this is built on is having compression that can make the images much, much smaller. First of all, these models are not operating in the actual pixel space; you're not just generating raw frames. We have a lot of different directions we're going with this, but the one that Waypoint 1 is launching with, at least, is a pretty basic image compressor: there's an autoencoder that can compress 360p video into just a small 32x32 latent image.

Then from that, you can compress videos into just 32x32 latents, and Andrew can talk about what he actually does with those compressed videos. In a lot of senses it's a combination of a causal LLM and an image diffusion model.

So in an image diffusion model, you're given a set of patches and you figure out how to denoise them, and you condition those denoised patches on text, usually. But in our case, you are generating a new patch, a new frame, I should say, a new series of patches, every 60th of a second. And each of these frames is denoised just like in an image diffusion model, but they're conditioned not only on the text, but also on the controller input from the last 60th of a second, from the last frame, and on all preceding frames.

So what you end up with is just kind of a standard transformer LLM, except instead of generating the next token, you're denoising the next 256 tokens. It is a standard feed-forward transformer. It has attention and MLPs like a transformer. It conditions using cross-attention between the current activations and the text embeddings, and the controller inputs use our own home-baked controller input embedding, as we're calling it. So that's kind of the gist of our architecture.
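To make that concrete, here is a minimal sketch of the frame-autoregressive denoising loop described above. This is my own reconstruction, not Overworld's code: `FrameDenoiser`, `ctrl_embed`, the layer sizes, and the controller-input dimension are all hypothetical stand-ins for the real architecture.

```python
# A minimal sketch (assumed structure, not Overworld's implementation):
# each new frame is 256 latent tokens, denoised in a few steps by a transformer
# that attends to the frame history, the text embedding, and the latest controls.
import torch
import torch.nn as nn

TOKENS_PER_FRAME, DIM, STEPS = 256, 512, 4

class FrameDenoiser(nn.Module):
    """Stand-in model: self-attention over the 256 noisy frame tokens, plus
    cross-attention to the concatenated frame history, text embedding, and a
    'home-baked' controller embedding (sizes here are illustrative)."""
    def __init__(self):
        super().__init__()
        self.ctrl_embed = nn.Linear(16, DIM)     # hypothetical controller-state size
        self.block = nn.TransformerDecoderLayer(DIM, nhead=8, batch_first=True)
        self.out = nn.Linear(DIM, DIM)

    def forward(self, noisy_frame, history, text_emb, ctrl):
        ctx = torch.cat([history, text_emb, self.ctrl_embed(ctrl)[:, None, :]], dim=1)
        h = self.block(noisy_frame, ctx)         # cross-attend to history + conditioning
        return self.out(h)                       # predicted rectified-flow velocity

@torch.no_grad()
def generate_next_frame(model, history, text_emb, ctrl):
    x = torch.randn(1, TOKENS_PER_FRAME, DIM)    # start the new frame from pure noise
    for i in range(STEPS):                       # few-step Euler integration
        t0, t1 = i / STEPS, (i + 1) / STEPS
        v = model(x, history, text_emb, ctrl)
        x = x + (t1 - t0) * v                    # move toward the clean latent frame
    return x                                     # 256 denoised tokens -> VAE decodes to pixels

# Toy usage with dummy conditioning, just to show the shapes:
model = FrameDenoiser()
history = torch.zeros(1, 10 * TOKENS_PER_FRAME, DIM)   # 10 previous frames' tokens
text_emb = torch.zeros(1, 77, DIM)                     # encoded prompt tokens
ctrl = torch.zeros(1, 16)                              # latest controller state
print(generate_next_frame(model, history, text_emb, ctrl).shape)  # (1, 256, 512)
```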

Okay. So do I understand correctly that it's a sequential architecture? You're decoding one frame on its own and then moving to the next frame, or do you have some kind of cross-frame processing going on?

We only generate one frame at a time. If we generated more than that, then you would have a lag in controller inputs, right? If you're generating five frames at a time, eight frames at a time, then you have to wait for those eight frames to be generated before it's actually responsive to the controller input. So for that reason, we're constrained to generating one frame at a time.

Yeah, playability is super important for us. I think that you start to introduce latency for the player, which can come from streaming if you're not doing it efficiently, as well as from using a traditional autoencoder. Most of these video diffusion models use a temporal autoencoder, which basically means that they compress every four frames into one latent frame, so the diffusion model under the hood doesn't actually need to generate that fast.

In fact, if you are, let's say, trying to aim for 24 fps, and you have a 4x temporal-compression VAE, like Wan or Hunyuan, those kinds of models, then your latent model only needs to generate at, let's say, 6 fps, and that 6 fps can be upsampled to 24 fps. However, you would not be able to take user input every frame. You'd be taking user input once every fourth frame, which can add pretty bad latency.

You'll see that a lot of other world models have basic WASD controls and arrow keys to look around and things like that, because you can't do proper high-frequency controls like a mouse or, eventually, a VR headset looking around. If you are only taking controls once every fourth frame at 6 fps, and there's a delay of more than, let's say, 500 milliseconds, it's a no-go.
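Here is that latency argument in numbers, using the figures quoted above (my arithmetic, not theirs): a 4x temporal-compression VAE at 24 fps means the latent model runs at 6 fps, so controller input is only consumed once per latent frame.

```python
# Control-input latency with a temporal-compression VAE vs. per-frame generation.
target_fps = 24
temporal_compression = 4

latent_fps = target_fps / temporal_compression      # 6 latent frames per second
input_interval_ms = 1000 / latent_fps               # ~167 ms between control reads
print(latent_fps, input_interval_ms)

# Versus generating one latent frame per output frame at 60 fps:
per_frame_ms = 1000 / 60                            # ~16.7 ms between control reads
print(per_frame_ms)
```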

And what kind of sampler are you using? And are you doing the same amount of computation every time or do you have some kind of adaptive decoding?

So we don't have any adaptive decoding. We're using the same four-step flow-matching Euler sampler. Basically, we're doing a rectified flow model, so it just generates with four-step diffusion, four-step denoising, the same trajectory for every single sample.

And can you just explain, for the audience who don't understand how this process works, what a rectified flow model is and so on? Just explain how that works.

Basically, in a rectified flow model, you are sampling a random point of noise in space, and there's a clean ground truth somewhere else in space, and the rectified flow model predicts the vector that'll get you closer to that clean point, that ideal point, the point you're trying to generate conditioned on the inputs. But it's not always right there, so you have to take multiple steps so it can correct and gravitate towards the ideal denoised frame.

So it's multiple transformer passes, and each of them is predicting the vector that'll move you from the pure noise input, through the partially noised input, to the clean output.
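The loop below is a generic few-step rectified-flow Euler sampler of the kind just described, not Overworld's implementation. The `velocity` callable stands in for the transformer; the lambda at the end is only a toy field so the loop runs end to end.

```python
# A minimal few-step rectified-flow Euler sampler (generic sketch).
import torch

def euler_sample(velocity, cond, shape, num_steps=4):
    x = torch.randn(shape)                        # start at pure noise (t = 0)
    ts = torch.linspace(0.0, 1.0, num_steps + 1)  # fixed 4-step schedule
    for t0, t1 in zip(ts[:-1], ts[1:]):
        v = velocity(x, t0, cond)                 # one transformer forward pass
        x = x + (t1 - t0) * v                     # Euler step toward the data point
    return x                                      # approximately clean sample at t = 1

# Toy velocity field that simply points toward the conditioning target,
# just to exercise the loop (a real model predicts this from data):
target = torch.ones(1, 4)
print(euler_sample(lambda x, t, c: c - x, target, shape=(1, 4)))  # moves toward `target`
```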

And can you tell me about some of the trade-offs here? There are all of these different parameters you've set on the architecture, like how much decoding do I do, what do I set the parameters to, and so on. How did you go through that engineering process?

The one thing that has been found in the literature for diffusion distillation is that very often, the thing you're actually sacrificing when you reduce step count is, more often than not, diversity, as opposed to actual quality. Here's the interesting thing: for image diffusion, that's a pretty big deal. If people want very diverse and interesting images for every prompt they give, they'll often use higher step counts, or they won't use distilled models at all, because that lets you retain the diversity.

However, for autoregressive diffusion models, which is this hybrid we're using between an autoregressive transformer and a diffusion model, it doesn't really matter that much, because your conditioning is more than just the prompt. Your conditioning is the previous frames. Your conditioning is the control inputs. Your conditioning is the text. All these things put together. So the loss of diversity that these distillation methods give you doesn't really matter anymore.

What we do notice is that you can drop the step count to four during distillation without really losing any quality. It's only when you start to go to three, two, or one that you see big jumps in quality loss. I'm of the opinion that even though people say in the literature that one step works, it's a bit of a strange thing, because if you're doing one-step diffusion, it's not really diffusion anymore, right? At that point it's basically a GAN, because you're going directly from noise to an image. It's a bit weird.

But four seems to be the sweet spot where you're not really losing any quality. There is, of course, the trade-off for speed: four diffusion steps means four forward passes, five actually, because of the way our setup works. That's a lot of added latency. So even if your model was running at, let's say, 100 fps, if you do five steps, it's going to drop down to 20 fps. So there's a bit of a balancing act here.

We think that at two diffusion steps, you can still get a pretty good level of quality in a lot of cases, especially for these bigger models, without needing to sacrifice too much speed.
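The speed side of that trade-off is simple arithmetic, sketched below with the hypothetical 100 fps single-pass figure used above; the effective frame rate is roughly the raw per-pass rate divided by the number of forward passes per frame.

```python
# Step count vs. effective frame rate (my arithmetic on the quoted example).
raw_fps = 100                        # hypothetical single-pass throughput
for passes in (5, 4, 2, 1):          # 4 denoising steps is 5 passes in their setup
    print(passes, raw_fps / passes)  # 5 -> 20 fps, 2 -> 50 fps, 1 -> 100 fps
```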

So you said that in traditional models, more step count means more diversity. Yes. And why is that the case?

When you're doing these long trajectories, the paths are very, very chaotic. Even though rectified flow is technically trained to follow a straight line from input to output, that's not really how it works in practice. In most cases, the path you take is going to be very curvy and wiggly, and the place you end up is very chaotic depending on where you started.

I've actually done some experiments with optimized versions of these models where I can play around with the input starting noise in a 2D space, and I find that as you move your cursor around in that space and give it different starting noise, the image you end up with is very different. It's honestly kind of an application of chaos theory: the starting conditions have really heavy effects on where you end up in the end. But when you do a lot of these distillation methods, they are analogous to forcing the line to be straight.

So, like I said, it's normally wiggly, right? But when you try to do it in one or two steps, you're basically forcing it to be straight. A straight line often ends up at the same place, or at least very close. So you do sometimes lose that diversity when you do low-step sampling, especially for raw text-to-image with nothing else in the mix.

The safest place for a rectified flow model to point to is the middle of the data distribution. And if you're in a two-step model, you point to whatever is safe to point to, basically, if that makes sense. So you're often getting very close to the mean of the samples, the mean of the model distribution.

And funny enough, if you don't do any proper diffusion distillation methods and you try to do a one-step or two-step generation with a rectified flow model, so without any distillation, you'll actually see that it just mode collapses. You'll see it generate something blurry that looks like the mean of your data set.
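As a toy illustration of that collapse-to-the-mean intuition (my own example, not from the interview): under a squared-error-style objective, the best single guess for "the clean sample" with almost no information is the average of the data, which for images looks blurry.

```python
# Toy demo: the optimal one-shot prediction under MSE is the data mean,
# which lands between the modes of a bimodal dataset ("blurry" output).
import numpy as np

rng = np.random.default_rng(0)
# Bimodal "dataset": half the samples near -1, half near +1.
data = np.concatenate([rng.normal(-1, 0.05, 5000), rng.normal(+1, 0.05, 5000)])

one_step_guess = data.mean()   # optimal single prediction under squared error
print(one_step_guess)          # ~0.0 -- in between the modes, not a real sample
```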

Yeah, I wanted to talk about that. So many folks at home will have played with Stable Diffusion models, and they've probably had the experience of having all these things to play with, right? Am I going to use LCM? Am I going to use Karras? Am I going to use Euler? There's this CFG parameter, which is about how much it should pay attention to the prompt, and so on. And I guess intuitively people find that there's a Goldilocks zone for these parameters, where you can't really move too much, because otherwise you get mode collapse and the image diverges.

So do you envision a world where you're just setting these parameters and they're the same for everyone, or would they reasonably depend on the type of hardware people are using and the type of images they're generating? Would you ever foresee the users themselves changing these parameters, or do you think they should just be fixed?

When you're doing distillation, you do lose control over a lot of these things. Just as a basic example: two axes that I think are probably the biggest things people use to control these models are the CFG scale and the actual scheduler you use during inference. But here's the problem: when you do DMD, or most of these diffusion distillation methods, you lose control over the schedule. For DMD, for example, in our setup at least, we pick specific noise levels that the student is then trained on. So, just as an example, let's say the student was only trained on 1.0, 0.75, 0.5, and 0.25 for four steps.

Then if you were to feed it a sample with, let's say, 0.8 noise, it just wouldn't be used to that. You can't just give it any schedule you want during inference, because during distillation you fix the schedule. You also fix the guidance scale, because it's distilled with classifier-free guidance built into it.

Otherwise, you'd need to do a forward pass for each conditioning vector, each conditioning signal, plus an unconditioned forward pass, which is really expensive. So not only would we be performing all our denoising steps, we'd be having to do them four times over. So basically, you bake that in and say there's only one value for the conditioning signal; you don't need to do unguided and guided. You hardcode the CFG during distillation. And then you end up with a model that might need to be post-trained for a different setting if you want a different setting, but it's four times as fast.
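Below is a generic sketch of what baking the guidance in saves, under the scheme described above; the function names and the example noise schedule are illustrative, not Overworld's API. With classifier-free guidance at inference time you need at least two forward passes per denoising step, while a distilled student with the guidance scale baked in needs one, and it only expects the noise levels it was trained on.

```python
# Generic sketch: explicit CFG vs. a distilled student with CFG baked in.
import torch

TRAINED_NOISE_LEVELS = (1.0, 0.75, 0.5, 0.25)   # example fixed schedule the student saw

def guided_velocity(model, x, t, cond, null_cond, cfg_scale):
    v_cond = model(x, t, cond)            # conditioned forward pass
    v_uncond = model(x, t, null_cond)     # unconditioned forward pass
    return v_uncond + cfg_scale * (v_cond - v_uncond)   # 2 passes per step

def distilled_velocity(student, x, t, cond):
    assert t in TRAINED_NOISE_LEVELS      # off-schedule noise levels are out of distribution
    return student(x, t, cond)            # guidance already baked in: 1 pass per step
```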

Do you think it is bottlenecked on hardware or algorithms? I mean, you guys know this far better than I do, because you're studying the state of the art of all of these different algorithms. Do you think there could be some kind of breakthrough that could just unlock this and really make it tractable on consumer hardware?

There was a point with LLMs where they broke past this threshold where it stopped being mostly research and started being mostly engineering. I don't think we're at that point for world models. I don't even think diffusion itself is at that point. And I think there is still a lot of ground to cover. There are new papers coming out all the time about ways to speed up diffusion training, ways to make smaller models more efficient, ways to do, let's say, distillation.

There are so many axes of improving these models and making them smaller that it would be nonsense to make assumptions like, oh, this will never run locally, oh, this will never run on a phone. It doesn't make sense, because every other week it feels like someone comes out with a paper that finds a way to make diffusion 100 times faster.

In the short term, we have the capability to reduce the step count without reducing fidelity, and we also have the opportunity to quantize the model with only a very marginal decrease in fidelity. So those are our two targets to immediately improve the speed 3x.

And there's also a variety of strategies that allow us to scale up the model without losing substantial throughput, and a variety of strategies targeting improved model quality for the size the model is. So it is a very dynamic research space.

And your current architecture at the moment, is it compute-bound or memory-bound? What is the GPU utilization like, and what are the areas of improvement?

Yeah, it's absolutely compute-bound. It's not memory bound. And that's because you are processing 256 tokens at a time. When you're batching, LLMs use this trick where you increase batch size up to the point where you're no longer memory-bandwidth bound. LLMs are memory-bandwidth bound, but the fact that our effective batch size is so large makes us compute-bound. So you're utilizing all the FLOPs in your GPU.
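Here is a rough arithmetic-intensity estimate of that point (my numbers, assuming fp16 weights and a 2B-parameter dense model): weights are read once per forward pass either way, but FLOPs scale with how many tokens that pass processes, so a 256-token frame does far more work per byte of weights moved than token-by-token LLM decoding.

```python
# Rough arithmetic intensity: 1-token decode vs. a 256-token frame per pass.
params = 2e9
bytes_per_weight = 2                      # fp16/bf16 weights (assumption)
weight_bytes = params * bytes_per_weight

for tokens in (1, 256):                   # LLM decode step vs. one latent frame
    flops = 2 * params * tokens           # ~2 FLOPs per parameter per token
    intensity = flops / weight_bytes      # FLOPs per byte of weights read
    print(tokens, intensity)              # 1 -> ~1 FLOP/byte, 256 -> ~256 FLOP/byte
```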

And to mitigate this, the things we're looking towards are things like mixture of experts, which allows you to select an expert, get some sparsity, and move a little bit of the load, a little bit of the intelligence, we might say, towards the memory bandwidth. Basically, the idea is to make it so that you can get the best throughput possible while also getting the benefits of a more capable model.

This is absolutely brilliant. I'm really excited, genuinely very excited about this technology. So great job. I wish you the very best of luck with it, and yeah, thank you very much for coming on MLST.

Thanks Tim.
