Ventura Labs
January 12, 2026

Marc Graczyk: Numinous, Bittensor Subnet 6, AI Forecasting Agents, Polymarket Predictions | Ep. 78


By Ventura Labs



Quick Insight: Marc Graczyk explains how Numinous uses Bittensor to build a decentralized oracle of truth. This summary details the move from closed-source predictions to open-source agent competitions that outpace traditional markets.

  • 💡 Why do LLMs beat time-series models in complex geopolitical forecasting?
  • 💡 How does open-sourcing miner code accelerate the evolutionary process of model improvement?
  • 💡 What is the roadmap for autonomous trading agents that execute on Polymarket odds?

Marc Graczyk, a Cambridge pure maths alum, is building Numinous to solve the gaming problem in decentralized forecasting. By forcing miners into a constrained, open-source environment, he is creating a self-improving engine for superhuman predictions.

Top 3 Ideas

🏗️ Reasoning Over Regression

"An LLM will be able to look at the latest news and actually forecast these questions."
  • Contextual Intelligence: Pure time-series models miss qualitative jumps like troop movements. LLMs synthesize unstructured data into actionable probabilities.
  • Unified Models: Numinous favors one large model over specialized routing. This forces the AI to develop generalized reasoning that applies across diverse domains.
  • Information Reaction: Markets move when new data hits. LLMs mimic this by updating forecasts the moment a news API triggers a change.

🏗️ The Genetic Algorithm

"Open sourcing the code just drives innovation faster."
  • Code Transparency: Closed-source forecasting is a black box that invites gaming. Open-sourcing the logic allows the protocol to verify how a prediction was reached.
  • Survival of Fittest: Winner-take-all rewards create an evolutionary pressure. Miners must constantly fork and improve the best-performing architectures to stay relevant.

🏗️ The Oracle of Truth

"If you truly build the super forecaster, you will get explainability."
  • Reverse Engineering: High-quality forecasts allow for an autopsy of data inputs. This reveals exactly which news source or data point moved the needle.
  • Execution Layer: Numinous is moving toward autonomous trading. Accurate forecasting is the foundation for a decentralized fund that beats Polymarket liquidity.

Actionable Takeaways

  • 🌐 The Macro Trend: Verifiable intelligence is replacing black-box predictions. As AI agents become the primary participants in prediction markets, the value moves from the prediction itself to the verifiable logic behind it.
  • The Tactical Edge: Integrate real-time news APIs like Desearch to give agents a qualitative edge over pure quant models.
  • 🎯 The Bottom Line: Forecasting is the ultimate utility for LLMs. If Numinous succeeds, Bittensor becomes the world's most accurate, explainable source of truth for investors and researchers.

Podcast Link: Click here to listen

Open sourcing the forecasting code is very exciting because you don't normally see open-source forecasting. Usually, it's closed source, and you only see the prediction.

It's a way for us to better understand what exactly the forecasters are doing. Ultimately, we want to be a forecasting subnet, and the tricky problem has been to actually build these general forecasters and ensure that miners actually build them without gaming it.

That's why we decided to build these environments where we give miners a tool and put constraints, but at the same time, we want to make them as rich as possible for them to create in these environments. So, as long as they're able to build really strong forecasting LLMs in these environments, we're going to be able to create prediction market agents.

I'm very confident about that.

An LLM should mimic a market in its price, in the sense that it's able to react fast to new information. The key property of a market is that it will react very fast. If I send a big order, the market will react, and similarly, an LLM should do the same whenever there is impactful information.

If you have an LLM that is able to do that, then building these layers is doable.

Welcome to the Ventura Labs podcast. Today we're joined by Marc, the founder of Numinous, Subnet 6. Marc shares his journey from pure maths at Cambridge to building on Bittensor, the pivot to open-source agent competitions for superhuman forecasting, favoring unified models over specialized routing, LLMs for complex predictions beyond time series, the roadmap to an oracle of truth with explainability, benchmarking against Polymarket with trading agents, and key tools like Chutes and Desearch and a winner-take-all design.

Welcome, Marc of Subnet 6, Numinous.

Hey man, how you doing today?

Great to meet you, man. Great, great, great to chat.

Yeah, absolutely. And you were one of the cognitive athletes at this hacker house here in London. How did you get involved with joining a hacker house? What made you want to join a hacker house?

I think that things could just go much faster. As we were mentioning it before, I met my current co-founder here. You're constantly in this environment where you're both optimizing work and optimizing self.

I get a lot of energy from people, so that's part of it, and it's a way also for me to be in the community in London. It's an amazing house.

You made the move to London for university. You graduated from Cambridge, which is considered a pretty elite school by most standards. How was your experience there? What did you study?

Pure maths. It was a very transformative year for me. I left Paris and moved here without really knowing anyone.

I was studying pure maths, which is really what I love. I just always liked maths a lot, but also when I learned more about crypto and tech, I first thought of going into research. So I had started a pure maths PhD.

I couldn't wait and went into crypto after a year. After a couple of different experiences, I learned about Bittensor and got into Bittensor, and now I'm building Numinous.

Numinous. You guys have made a pretty big pivot here in the last few months, and that is switching to this open-source agent competition style. What was the decision-making behind this?

Yeah, absolutely. We fully changed the paradigm. Or rather, we fully changed the mechanics behind the paradigm. The core paradigm of using LLMs to forecast the future and using them on this very wide event landscape hasn't changed, but the way we do it fully changed.

The reason why is open sourcing the forecasting. Open sourcing the forecasting code is very exciting because you don't normally see open-source forecasting. Usually, it's closed source, and you only see the prediction.

It's a way for us to better understand what exactly the forecasters are doing, but most of all, to intrinsically force the forecasters to optimize around a sort of central model.

We don't want forecasters to have smaller models to which they will route different types of questions. We want them to have one unified model that is able to be good across the different questions we send, and we can enforce that by open sourcing.

This is more the forecasting reason. Open sourcing the code just drives innovation faster. There's this efficient frontier of the best code, and miners can continually pick up from there.

You said you prefer one big model instead of routing to smaller models for predictions. Why this design choice?

Because it's part of our vision that we want to build models that aren't just narrow predictors, that aren't going to, for crypto questions, only need certain pieces of data or time series that correlate very closely to your crypto price.

We want them to be able to process across different domains, and we want the miners to learn how to do that. There's also the idea behind LLMs because LLMs are exactly able to reason across different domains.

The ability to do this is actually the edge because then most questions, the most important questions you want to forecast, won't have this sort of narrow data set you can optimize around. Think about macroeconomic questions or geopolitical questions.

Yeah, that sort of touches on the answer for the next question I was going to ask you here.

You brought up time series, and you guys do exclusively Polymarket prediction markets right now. Why not do time series predictions? Why the LLM approach?

If you do just time series, you could think of models that would just predict the time series of Polymarket, but the problem is, how do you find relevant features for Polymarket questions?

If Polymarket has markets that are like Bitcoin markets, then you need Synth, for example, to perform on these markets, and these are indeed traditional finance markets, and then it makes sense.

When you look at the Venezuela market, for example, like Maduro, sure, you can build a time series forecasting model. You will learn something; you will definitely learn some properties about the price process you're studying.

You will not know, for example, the information of where the American planes are. You have all these maps in the sky of where the American planes are, which are very useful. Sometimes these planes turn off their flight tracker, sometimes they turn it on. Knowing where they are is probably very useful information for making your forecast, but you can't really encode it into a time series.

An LLM will be able to look at that, look at the latest news that came out, and actually forecast these questions. Having a time series model is still useful. You definitely should ideally use them as well.

To come back to my earlier point about this unified model, this unified forecasting model: you want this unified forecasting model that then decides, okay, I'm going to use a time series model for the short-term trend. I'm going to also process this satellite data to understand where the base is. I'm going to look at the flight paths.

It's really the only way to forecast these really complex questions. When you see a jump in a time series on Polymarket, it usually means that the information landscape changed, and it's very hard to forecast these jumps.

That's really where the LLM is the right way to think about it.
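To make this "unified model plus tools" idea concrete, here is a minimal sketch, assuming hypothetical `llm`, `news_client`, and `ts_model` clients. It illustrates the architecture described above, not the actual Numinous miner API.

```python
# Illustrative sketch only: a unified forecaster that blends a time-series
# baseline with fresh news context and reasons to a single probability.
# All dependencies and the prompt format are assumptions, not the real API.
import json

def forecast(question: str, market_id: str, llm, news_client, ts_model) -> float:
    """Return a probability in [0, 1] for a binary Polymarket-style question."""
    baseline = ts_model.predict(market_id)             # short-term trend from price history
    articles = news_client.search(question, limit=10)  # latest unstructured context

    prompt = (
        f"Question: {question}\n"
        f"Time-series baseline probability: {baseline:.2f}\n"
        "Recent news:\n"
        + "\n".join(f"- {a['title']}: {a['snippet']}" for a in articles)
        + '\nReturn JSON like {"probability": 0.42, "key_sources": ["..."]}'
    )
    answer = json.loads(llm.complete(prompt))
    # Clamp so a malformed answer cannot leave the valid range.
    return min(max(float(answer["probability"]), 0.0), 1.0)
```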

What is the end goal of this? Is it just to build this Oracle of truth agent, or is it to build this agent that can explain why things happened?

I think you want both. The goal is probably, if you have to choose, you want the oracle of truth. We're really driven partly by R&D for sure. We want to build super forecasters. You want to build the best possible truth oracle.

I would answer this on an R&D level. You have explainability, which is the reasoning, and you have the actual probability. What is the degree of your confidence in your forecast, in this outcome happening?

Explainability can be hard. It's important to align them, and the clearest way to align them is through sources. The forecast will be made, and some sources will be very important sources that underpin your forecast, and by weighting such a source more, you can later explain your forecast.

In the limit, you want both because if you build this forecasting world model where you really build the map behind the question, and you don't only understand one question but all the related questions, then you do explain as well because you're able to understand what are the factors that will affect this question, this event.

If you truly build the super forecaster, you will get explainability and you will get the truth oracle.

Well, essentially you're saying the better we are at forecasting, the better we would be at explaining, because you could reverse engineer the explanation, doing essentially an autopsy of what input caused the biggest change in the prediction probability.

It's also a great point for why we decided to open source the protocol, because this way, now we can see inside the machine, and we can get this explainability. We can reverse engineer, we can see which type of information moved the forecast more, and simulate.

I do think that explainability can be hard because there has sometimes been a mismatch. You ask a model to explain its thinking process, and the explanation will be very different from what it actually outputs.
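The "autopsy" idea can be made concrete with a simple leave-one-out pass: rerun the forecaster with each source withheld and rank sources by how far the probability moves. A rough sketch, assuming a hypothetical `forecast_fn(question, sources)`:

```python
# Illustrative only: rank sources by how much removing each one shifts the
# forecast. `forecast_fn` is a hypothetical callable, not a Numinous API.
def attribute_sources(forecast_fn, question: str, sources: list[dict]) -> list[tuple[str, float]]:
    full = forecast_fn(question, sources)
    impact = []
    for i, src in enumerate(sources):
        without = sources[:i] + sources[i + 1:]
        shifted = forecast_fn(question, without)
        impact.append((src["title"], abs(full - shifted)))
    # The source whose removal moves the probability most "explains" the forecast.
    return sorted(impact, key=lambda pair: pair[1], reverse=True)
```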

How are you guys judging performance here against Polymarket, like just the straight odds? Are you outperforming the odds a day out, a week out? Is that the right way to measure this, or is there some other measurement you guys use?

Polymarket is the most obvious benchmark. There are many more ways to measure how good our forecasters are, but the ability to beat the odds is the most natural way to measure our forecasters.

That's what we're working on now. We haven't yet released our benchmarking and our trading agents, but we're working on the execution layer and also working on perhaps directing the forecasters more toward certain questions, certain Polymarket questions.

We definitely see our forecasters beat the odds quite consistently.
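One concrete way to score "beating the odds" at a fixed horizon, sketched here under assumed data shapes rather than the subnet's published methodology, is a Brier-score comparison against the market price snapshotted at the same moment before resolution:

```python
# Sketch of a horizon benchmark (assumed record shape, not the official one).
# Each record holds a miner forecast and the market price captured at the same
# time, e.g. one day or one week before resolution, plus the 0/1 outcome.
def brier(prob: float, outcome: int) -> float:
    return (prob - outcome) ** 2

def compare_at_horizon(records: list[dict]) -> dict:
    n = len(records)
    miner = sum(brier(r["miner_prob"], r["outcome"]) for r in records) / n
    market = sum(brier(r["market_prob"], r["outcome"]) for r in records) / n
    # Positive edge means the miner was better calibrated than the market here.
    return {"miner_brier": miner, "market_brier": market, "edge": market - miner}
```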

How are you comparing to Polymarket? Are you doing, like, one-day-out, one-week-out performance differences? Because I'd assume it's essentially impossible to beat that closing line of Polymarket, because that's the market literally settling.

Polymarket varies a lot depending on the liquidity and the type of market, like sports markets, whether you're in-match or before the match. If a market has a lot of liquidity, especially close to settlement, it will be hard to beat. As soon as there's a jump, it will be hard.

We're going to release a benchmark comparing the performance of our forecasters with a reference forecaster, both on Polymarket. We're going to release a simulated P&L of our forecasters on Polymarket, and we're also already working on autonomous trading agents.

We're building out this execution layer on top of the forecasters, and these will actually trade on Polymarket, and that's the real benchmark. Really beating the market is also one of our goals, because we're in the age of prediction markets.

It's true that there are many more parameters that go into trading. There's this execution layer which is this additional layer which goes on top of the forecasting. You can have a forecaster which gives you a calibrated forecast, but then you still have a rich set of parameters before you actually trade.

This needs to be built out, and we're also thinking of integrating this within the subnet as well, so that the miners perhaps also choose what parameters they think are best.

You're hinting at the trading execution complexity here; building that risk management system is the hard part.

You need an LLM which is able to create a calibrated forecast and really mimic a market in its price, in the sense that it's able to react fast to new information. The key property of a market is that it will react very fast. If I send a big order, the market will react, and similarly, an LLM should do the same whenever there is impactful information.

If you have an LLM that is able to do that, then building the execution layer is doable. It's not something impossible to do, but there are a lot of refinements and subtleties there. The price gives you a baseline, but then the way you move around this baseline is exactly the work of this execution layer.

It's something we're working on actively right now, and we do want to build a Polymarket fund as well.
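As a sketch of what one execution-layer rule could look like, purely as an illustration of the general idea and not Numinous's trading logic, a fractional Kelly rule sizes a position from the gap between the calibrated forecast and the market price:

```python
# Fractional Kelly sizing for a YES share bought at `price`, given forecast `p`.
# Purely illustrative; real execution adds slippage, liquidity and risk limits.
def kelly_fraction(p: float, price: float, cap: float = 0.05) -> float:
    """Fraction of bankroll to stake on YES; returns 0.0 when there is no edge."""
    if p <= price or not 0.0 < price < 1.0:
        return 0.0
    b = (1.0 - price) / price                 # net odds received per unit staked
    f_full = (b * p - (1.0 - p)) / b          # classic Kelly fraction
    return max(0.0, min(0.25 * f_full, cap))  # quarter-Kelly plus a hard cap
```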

I almost feel an obligation to ask this question, since prediction markets have probably been the most popular subnet design attempt. What makes this a better or more feasible prediction market subnet? Why does your design work for prediction markets?

Ultimately, we want to be a forecasting subnet, and the tricky problem has been to actually build these general forecasters and ensure that miners actually build them without them gaming it.

That's why we decided to build these environments where we give miners a tool and we put constraints, but at the same time, we want to make them as rich as possible for them to create in these environments. So, as long as they're able to build really strong forecasting LLMs in these environments, we're going to be able to create prediction market agents.

I'm very confident about that. Now it's even less about the event landscape. You don't even need to think, okay, I need a very diverse event landscape in order to incentivize generalized predictors, because I already enforce all of that through my environment.

I could actually only send price predictions, for example. The only challenge is just making these environments rich enough. When we just launched the subnet, they were quite limited. They are still maybe not exactly where we want them to be.

When you look at a benchmark like ForecastBench, for example, where you have LLMs that don't have access to news, but on a statistically proper benchmark they near the performance of human superforecasters without access to news, then miners will do that, as long as they have the same tools.

The tools available in this environment you've created are Chutes for the LLMs and then Desearch for access to X APIs, and I think they also offer web search. Are those the only tools currently available? What does adding tooling look like for you guys? How hard is it to add new tools?

We're just building out the pieces, and it's going to be streamlined now. The ability for us to just add these building blocks will be very streamlined, but we just have to think which building block we should choose.

Right now it's indeed Chutes, which already offers a lot of models, and some of them are very strong, but it doesn't give access to Gemini or the closed-source models. As for Desearch, we're very grateful for Desearch partnering with us because it really allowed us to bootstrap the subnet with a strong news provider.

With news, you really want to be comprehensive. We really want to give miners as large a spectrum of news searches as possible. For now, miners can only tap into that search, where they can use web search, the AI endpoints, and also X. X, I think, is really, really important.

In the future, we will be adding more, and the first things we'll be adding in the coming days are Gemini, Gemini news search, and Perplexity. Then it will be about thinking about the curated sources, and this is going to be part of the work. In the very first step of the roadmap, which is building out these environments, we'll be thinking about the curated, high-value data sources that correspond to certain markets.

One thing we could even do would be to run these environments live during sports matches, and then you want to make sure that miners can know when a goal is scored. You want to give them APIs in the environment that can tell them whether a goal was just scored or not. That's an example.
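A live-match tool of the kind described could be exposed to miners as a small interface like the following; the feed client, method names, and event fields are assumptions for illustration only.

```python
# Hypothetical environment tool: lets an in-match forecaster learn about goals
# the moment they happen. The feed client and event schema are assumptions.
from dataclasses import dataclass

@dataclass
class LiveScoreTool:
    feed: object  # injected client for a live sports data feed

    def latest_events(self, match_id: str) -> list[dict]:
        """Return events such as {'minute': 37, 'type': 'goal', 'team': 'home'}."""
        return self.feed.get_events(match_id)

    def goal_just_scored(self, match_id: str, since_minute: int) -> bool:
        return any(e["type"] == "goal" and e["minute"] >= since_minute
                   for e in self.latest_events(match_id))
```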

Have you seen any designs for this agent yet that have surprised you or what have you been most impressed by from the agent designs?

I'm yet to see something which has really surprised me. I think we're still early. I definitely saw a lot of different architectures. One key idea is this idea of a genetic algorithm, and that's of course what Ridges is doing. Any subnet which open sources the code and strongly prefers certain miners to others will effectively be doing a sort of genetic algorithm which continually explores the best models.

If you do open source and winner-take-all, you're doing it. It's especially interesting in the case of forecasting, because when you do forecasting and you evaluate a miner against this quote-unquote evolutionary process that is the market, that is Polymarket, it's a very natural environment to evaluate in and then have these agents learn from their mistakes.

The subnet should be constantly exploring new architectures and then building on top of them and pruning the architectures that have low value. I have seen miners already building on top of architectures which kind of stay as a baseline.

As we build out these environments more and more and give more bandwidth, we will see more of these kinds of models. I think the reason I don't have one model off the top of my head that really struck me is because ultimately, at the moment, miners are aggregating LLMs and finding different ways to best calibrate their forecasts.

One very cool thing for sure is that we're starting to see miners basically really optimize for event types which we really want to do.
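The evolutionary pressure described above can be caricatured in a few lines: keep the best open-sourced architectures, let others fork and mutate them, prune the rest. A toy sketch of the dynamic, not the subnet's actual validator or incentive code:

```python
# Toy sketch of the "genetic algorithm" dynamic created by open source plus
# strong preference for top miners. Not the actual incentive mechanism.
def evolve(population: list[dict], evaluate, keep: int = 3, rounds: int = 10) -> dict:
    """Each member is {'name': str, 'mutate': callable returning a forked member}."""
    for _ in range(rounds):
        ranked = sorted(population, key=evaluate, reverse=True)
        elites = ranked[:keep]                    # the efficient frontier of best code
        forks = [m["mutate"](m) for m in elites]  # miners build on top of the winners
        population = elites + forks               # low-value architectures are pruned
    return max(population, key=evaluate)          # winner-take-all: top miner earns the reward
```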

In terms of tooling, what would you say is the most important tool or most utilized tool that you offer miners?

They're all using the search. They each have a limited choice right now. It's crucial for a miner to be able to tap into an LLM, and it's crucial for a miner to get outside, up-to-date information on a new question.

They will all use both Desearch and Chutes. When we give them more tools to use, there will maybe be some more interesting properties.

In terms of this winner-take-all design, why do you opt for winner-take-all? Is it possible you guys switch that in the future to try to reward more of a breadth of diversity in terms of these architectures for agents?

I think it's a possibility. I definitely see us changing the way we actually choose the top miner. We're going to be doing that. We very much want to measure the top miner by comparing it to other miners at the same time, more directly.

You look at how the top miner is marginally improving on top of the other miners. In forecasting, you could also argue that naturally you will use many forecasters. If I am a user of a forecasting product, I would rather use an ensemble of forecasting agents than just one forecasting agent.

From this perspective, it makes sense to really reward many miners at the same time, but winner-take-all is very efficient. So for now we'll probably stay with it, and we'll see later.
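The ensemble point can be illustrated with a simple aggregation, here averaging several miners' probabilities in log-odds space; this is one reasonable pooling choice among many, not the subnet's reward or aggregation scheme.

```python
# Illustrative aggregation of several miners' probabilities into one forecast
# by averaging in log-odds space, a common pooling rule for probability forecasts.
import math

def ensemble(probs: list[float], eps: float = 1e-6) -> float:
    logits = [math.log(max(p, eps) / max(1.0 - p, eps)) for p in probs]
    mean_logit = sum(logits) / len(logits)
    return 1.0 / (1.0 + math.exp(-mean_logit))
```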

All right, Marc. Well, I've got one last question for you here today, and that is being a subnet builder in probably the most competitive sector of Bittensor, prediction markets. We've seen easily double-digit attempts at prediction market subnets. What advice do you have for people in differentiating themselves from their competitors within Bittensor when they see subnets in that same sector? How do they differentiate themselves and take a different approach?

I'm tempted to just answer this on a general level, which probably applies to Bittensor anyway. It's always the same general answer. Ultimately, you have your vision. You have your idea, and I'm not really even thinking about other subnets that are building on prediction markets. I'm only thinking about our problem and our steps to build this forecasting world model.

If you have a true, sincere vision for your tech, only you can build it out, and it's ultimately about listening to, almost like, your heart, to what you think.

If what you're trying to do doesn't make sense, the market will tell you.

All right. Well, Marc, it was a pleasure having you on the podcast, man.

Thank you so much, man. Thank you. Thank you so much. Pleasure. See you soon.
