This episode delves into Celium's rapid six-month journey building a permissionless GPU-as-a-Service platform on Bittensor's Subnet 51, detailing how they leverage crypto-economic incentives to challenge established players like Vast and RunPod while navigating significant technical and security hurdles.
Celium's Launch and the GPU Market Context
- Celium's representative marks the six-month anniversary of launching on Bittensor, framing the discussion around their "permissionless GPU as a service" model via Subnet 51.
- The speaker highlights the explosive growth of the GPU market, citing multi-billion-dollar valuations and projected 10x growth within eight years, driven by massive investments from AI giants like Meta and xAI.
- The core problem addressed is the prohibitive cost and complexity of owning GPU infrastructure for smaller players; the speaker notes that variable costs alone can equal the 5-year rental cost of high-end GPUs like H100s. Celium positions itself as the solution for those without billions to spend, offering easy GPU rentals.
Origins: From Bittensor Mining to Celium
- The speaker shares their background as a large-scale miner on Bittensor, particularly in its early days, when mining focused on LLM (Large Language Model) inference. LLMs are AI models trained on vast text data to understand and generate human-like language.
- This experience involved managing GPUs across multiple providers, dealing with varied terms, and negotiating rates, revealing inefficiencies in the traditional GPU rental market.
- Recognizing the synergy between Bittensor's incentive structure and the demand for GPUs, the speaker saw an opportunity: "I figured that it would be the perfect place to launch a new cloud computer... incentivizing only the best GPUs... into a comprehensive platform that provided a valuable service by utilizing the valuable [Bittensor] miners."
Overcoming Miner Exploits and Security Challenges
- The journey wasn't smooth; the speaker candidly discusses the "brutal" nature of miners attempting to exploit the system for illegitimate rewards (Subnet 51 tokens).
- Initial GPU verification methods, like checking Nvidia libraries, were circumvented by miners faking their hardware (e.g., reporting H100s when using 4090s).
- Miners also created multiple mining containers on single GPUs or proxied tasks to more powerful machines off-platform.
Innovations in GPU Verification and Incentive Mechanisms
- Celium developed robust verification by measuring actual GPU performance (speed and capacity) via matrix multiplication tests, similar to Subnet 64, making hardware spoofing ineffective. Speed in such tests is typically expressed in FLOPS (floating-point operations per second), a standard measure of compute performance.
- To combat multi-container exploits, they implemented checks on Nvidia UUIDs (Universally Unique Identifiers) and began monitoring GPU utilization outside the designated rental container.
- Proxying was addressed using SSH (Secure Shell) interactive shells, preventing miners from offloading computation tasks.
- The speaker emphasizes the continuous improvement cycle, moving from initial instability to a more robust platform capable of directing miner behavior effectively.
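The verification ideas above can be sketched in Python. This is a minimal illustration under stated assumptions, not Celium's implementation: timing the matrix multiplication on the GPU is assumed to happen elsewhere, the per-model bands in `EXPECTED_TFLOPS` are hypothetical placeholders rather than vendor specifications, and the UUID strings are shortened examples. It shows how a timed n-by-n matmul converts to achieved TFLOPS, how that figure can be checked against the claimed GPU model, and how one physical GPU UUID reported by several executors can be flagged.

```python
from collections import defaultdict

# Hypothetical acceptance bands in TFLOPS; placeholders, not vendor specs.
EXPECTED_TFLOPS: dict[str, tuple[float, float]] = {
    "H100": (600.0, 1000.0),
    "RTX 4090": (100.0, 200.0),
}


def matmul_flops(n: int) -> int:
    """FLOPs for one n-by-n times n-by-n matrix multiply, conventionally 2 * n**3
    (n multiplies plus roughly n adds for each of the n*n output entries)."""
    return 2 * n ** 3


def measured_tflops(n: int, seconds: float) -> float:
    """Convert the wall-clock time of one n-by-n matmul into achieved TFLOPS."""
    return matmul_flops(n) / seconds / 1e12


def plausible_gpu(claimed: str, tflops: float,
                  expected: dict[str, tuple[float, float]] = EXPECTED_TFLOPS) -> bool:
    """Accept a hardware claim only if measured throughput falls in that model's band."""
    low, high = expected[claimed]
    return low <= tflops <= high


def duplicate_uuids(reports: dict[str, list[str]]) -> dict[str, list[str]]:
    """Return each GPU UUID reported by more than one executor, with those executors."""
    seen: dict[str, list[str]] = defaultdict(list)
    for executor, uuids in reports.items():
        for uuid in set(uuids):  # ignore repeats within a single report
            seen[uuid].append(executor)
    return {uuid: sorted(execs) for uuid, execs in seen.items() if len(execs) > 1}


tflops = measured_tflops(8192, 0.0015)
print(round(tflops, 1))                   # 733.0
print(plausible_gpu("H100", tflops))      # True
print(plausible_gpu("RTX 4090", tflops))  # False
print(duplicate_uuids({"exec-a": ["GPU-1a2b", "GPU-3c4d"], "exec-b": ["GPU-3c4d"]}))
# {'GPU-3c4d': ['exec-a', 'exec-b']}
```

Judging the hardware by measured throughput rather than by the reported model name is what defeats a 4090 posing as an H100. A real check would presumably also randomize the matrix contents so results cannot be precomputed; that detail is an assumption, not something the talk states.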
Enhancing the User Experience: Frontend and Features
- Miner exploits initially impacted front-end performance for renters. Celium addressed this by refining incentives and platform features.
- A key innovation is dynamically adjusting GPU incentives based on rental demand for specific GPU types. When 4090 supply exceeded demand while H100s/B200s were fully rented, they increased incentives for the latter and decreased them for the former.
- This dynamic system leverages Bittensor's rapid miner response: "if we make one specific GPU like plus 50% incentive within hours we can see more of that GPU type come online."
- A major breakthrough announced is Docker-in-Docker compatibility (using Sysbox for security, avoiding privileged mode), allowing users to run containerized applications like Subtensor within their rented Celium instance, which the speaker says competitors do not offer. Docker packages applications and their dependencies into portable containers.
- Other quality-of-life improvements include remote pod reboots, UI-based SSH key management, and documented, verified custom template creation.
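The demand-driven incentive adjustment described above can be illustrated with a small sketch. Everything here is hypothetical: the 80% utilization target, the +50%/-25% step sizes (the +50% echoes the speaker's example), the floor and cap, and the function names are assumptions, not Celium's actual parameters. Utilization is taken to mean the fraction of listed GPUs of a given type that are currently rented.

```python
def adjust_multipliers(
    multipliers: dict[str, float],
    utilization: dict[str, float],
    target: float = 0.8,  # hypothetical "fully rented" threshold
    boost: float = 0.5,   # +50% for scarce GPU types
    cut: float = 0.25,    # -25% for oversupplied GPU types
    floor: float = 0.1,
    cap: float = 3.0,
) -> dict[str, float]:
    """Raise incentives for GPU types renters are exhausting, lower them for idle ones."""
    adjusted = {}
    for gpu, mult in multipliers.items():
        if utilization.get(gpu, 0.0) >= target:
            mult *= 1 + boost
        else:
            mult *= 1 - cut
        adjusted[gpu] = min(cap, max(floor, mult))  # keep multipliers bounded
    return adjusted


def per_gpu_weight(multipliers: dict[str, float], counts: dict[str, int]) -> dict[str, float]:
    """Share of the availability incentive pool earned by one GPU of each type."""
    total = sum(multipliers[g] * counts[g] for g in counts)
    if total == 0:
        return {g: 0.0 for g in counts}
    return {g: multipliers[g] / total for g in counts}


mults = adjust_multipliers(
    {"H100": 1.0, "B200": 1.0, "RTX 4090": 1.0},
    {"H100": 1.0, "B200": 0.95, "RTX 4090": 0.3},
)
print(mults)  # {'H100': 1.5, 'B200': 1.5, 'RTX 4090': 0.75}
```

Because Bittensor miners redeploy hardware quickly, a rule like this can shift supply toward scarce GPU types within hours; the floor and cap keep any single type from collapsing to zero or dominating the pool.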
Celium's Growth and Current Performance Metrics
- The platform now supports programmatic interaction via an API, with a community-developed CLI (Command Line Interface) available and official tooling in development (like a Kubernetes-style configuration manager). A CLI allows users to interact with a system using text commands.
- Celium reports approximately $7,000 per day in rental revenue and around 500 unique users, achieved within six months.
- The speaker expresses confidence in continued rapid growth due to ongoing stability improvements and feature additions.
Competitive Advantage: Permissionless Onboarding and Cost Efficiency
- Celium's primary competitive edge is its automated, permissionless onboarding for GPU providers: no KYC (Know Your Customer), no contracts, no vendor lock-in.
- This contrasts sharply with competitors requiring lengthy agreements, sometimes locking GPUs for a year or more.
- This accessibility broadens the potential supply pool, including providers in jurisdictions potentially excluded by stricter platforms.
- The result for renters is greater GPU diversity and significantly lower costs, claimed to be "on average about half the price of other GPU providers" while aiming for comparable quality.
Comparing Celium to Vast and RunPod
- Celium highlights lower fees compared to competitors like RunPod (which takes 24%). Celium also offers baseline incentives for available (not just rented) GPUs, ensuring supply.
- Rental revenue is shared back with miners via the Subnet 51 token.
- Celium claims faster access to new hardware (e.g., obtaining Nvidia B200s before Vast) due to its dynamic incentive model attracting providers with the latest GPUs.
- The speaker points out restrictive terms in competitor agreements (Vast's ability to withhold payments/charge taxes, RunPod's stringent physical security requirements) contrasting with Celium's simple, hardware-focused requirements.
Access to Cutting-Edge Hardware and Custom Deployments
- Celium offers a wide variety of GPUs and is expanding into user-friendly custom deployments like one-click LLM, image, or video generators.
- A demo showed launching an image generator template and accessing its UI via an external IP within a minute, requiring no SSH or coding.
- Users can create and share their own public or private custom templates via Docker Hub, defining specific environments, packages, and ports.
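A custom template of the kind described above bundles a container image with its ports and environment. The sketch below is a hypothetical schema, not Celium's actual template format: the `Template` fields, the simplified image-reference regex (stricter than Docker's full reference grammar), and the example image `myuser/sd-webui:1.0` are all assumptions. It validates a template before it would be published.

```python
import re
from dataclasses import dataclass, field

# Simplified Docker image reference: lowercase path components separated by "/",
# with an optional ":tag". Docker's real grammar also allows registry ports, digests, etc.
IMAGE_REF = re.compile(
    r"^[a-z0-9]+(?:[._-][a-z0-9]+)*"
    r"(?:/[a-z0-9]+(?:[._-][a-z0-9]+)*)*"
    r"(?::[A-Za-z0-9_][A-Za-z0-9_.-]{0,127})?$"
)


@dataclass
class Template:
    name: str
    image: str                                     # e.g. "myuser/sd-webui:1.0" on Docker Hub
    ports: list[int] = field(default_factory=list)
    env: dict[str, str] = field(default_factory=dict)
    private: bool = False                          # visible only to its creator

    def validate(self) -> list[str]:
        """Return a list of problems; an empty list means the template looks valid."""
        errors = []
        if not IMAGE_REF.match(self.image):
            errors.append(f"invalid image reference: {self.image!r}")
        for port in self.ports:
            if not 1 <= port <= 65535:
                errors.append(f"port out of range: {port}")
        if len(set(self.ports)) != len(self.ports):
            errors.append("duplicate ports")
        return errors


ok = Template("image-generator", "myuser/sd-webui:1.0", ports=[7860])
bad = Template("broken", "MyUser/Img", ports=[0, 7860, 7860])
print(ok.validate())  # []
print(bad.validate())
```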
Platform Features: Metrics, Payments, and Reliability
- Detailed dashboards provide real-time GPU usage and utilization metrics accessible via the front end.
- SSH key management is integrated into the UI.
- Flexible payment options include crypto and fiat (via Stripe, supporting multiple currencies), with auto-top-up functionality.
- Upcoming features (launching "tomorrow") include uptime trackers and reliability scores for individual provider GPUs, allowing renters to assess track records before renting. A public database already tracks miner rental history and success rates.
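A reliability score of the kind described above could combine a provider GPU's rental success rate with its uptime. This sketch is an assumption about how such a score might be computed, not Celium's formula: the `Rental` record, the Beta(1, 1) prior (which keeps a GPU with a single successful rental from scoring a perfect success rate), and taking the product of success rate and uptime are all illustrative choices.

```python
from dataclasses import dataclass


@dataclass
class Rental:
    succeeded: bool      # rental completed without a provider-side failure
    hours_booked: float  # hours the renter paid for
    hours_up: float      # hours the GPU was actually reachable


def reliability(history: list[Rental], prior_ok: float = 1.0,
                prior_fail: float = 1.0) -> dict[str, float]:
    """Smoothed success rate times uptime; a GPU with no history starts at a 0.5 success rate."""
    ok = sum(r.succeeded for r in history)
    fail = len(history) - ok
    success_rate = (ok + prior_ok) / (ok + fail + prior_ok + prior_fail)
    booked = sum(r.hours_booked for r in history)
    uptime = sum(r.hours_up for r in history) / booked if booked else 0.0
    return {"success_rate": success_rate, "uptime": uptime, "score": success_rate * uptime}


history = [Rental(True, 10, 10), Rental(True, 10, 9), Rental(False, 5, 1)]
print(reliability(history))
```

Smoothing matters for a renter-facing score: without a prior, a brand-new GPU with one good rental would outrank a GPU with hundreds of rentals and a single failure.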
Live Demo and Technical Deep Dive
- During the Q&A, the speaker demonstrates renting an H100 GPU using a pre-made image generator template. The process involves selecting the GPU and template, deploying, and accessing the service via the provided IP address and port.
- The speaker acknowledges the need for more templates, like a Jupyter Notebook environment (a web-based interactive computing environment popular in data science), which is planned but not yet available.
Addressing Stability and Technical Hurdles
- Addressing recent instability, the speaker reiterates the iterative process of identifying and patching miner exploits related to storage, verification, and resource allocation.
- Storage verification involves randomly assigning templates of varying sizes (up to 100 GB or more) during automated checks. Failure to provision adequate storage results in penalties, ensuring miners have sufficient disk space.
- The proxy issue (miners using one powerful GPU behind multiple fake instances) was solved using interactive SSH shells that prevent proxying, combined with performance verification (matrix multiplication speed/size) directly on the assigned machine.
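The randomized storage check can be sketched as follows. It is a minimal illustration under stated assumptions: the candidate sizes in `SIZES_GB`, the zero-reward penalty, and the helper name `storage_check` are hypothetical, and the miner's free space is passed in directly rather than measured by actually deploying a template. Because the size is drawn at random for each check, a miner cannot provision just enough disk to pass one predictable test.

```python
import random

GB = 1024 ** 3
SIZES_GB = (10, 25, 50, 100, 150)  # hypothetical template sizes, some above 100 GB


def storage_check(rng: random.Random, free_bytes: int,
                  penalty: float = 0.0) -> tuple[int, float]:
    """Draw a random template size; return (required bytes, reward multiplier)."""
    required = rng.choice(SIZES_GB) * GB
    multiplier = 1.0 if free_bytes >= required else penalty
    return required, multiplier


rng = random.Random(51)  # seeded only to make the example repeatable
for free_gb in (500, 60, 5):
    required, mult = storage_check(rng, free_gb * GB)
    print(f"free={free_gb} GB required={required // GB} GB multiplier={mult}")
```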
Pricing, Incentives, and Scalability
- The speaker clarifies that rental prices are market-driven, while only 20% of the subnet's token emissions are currently used for incentivizing GPU availability. This is a deliberate choice to maintain a healthy supply/demand balance and avoid oversupply of unrented GPUs.
- Rental revenue is partially returned to miners, creating a feedback loop.
- The speaker believes Celium could easily attract "thousands of H100s" if incentives were maximized, but prioritizes building robust features and services (like video rendering, custom deployments) first to ensure product quality before scaling supply aggressively.
- The long-term vision involves a sustainable cycle: improved platform -> increased rental utilization -> higher revenue share for miners -> increased GPU supply -> further platform growth.
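The feedback loop above can be made concrete with a small payout sketch. Only the 20% availability share and the roughly $7,000/day revenue figure come from the talk; the emission amount, token price, revenue-share fraction, and the splitting rules (availability rewards by verification weight, revenue by rented GPU-hours) are hypothetical assumptions for illustration.

```python
def split(pool: float, weights: dict[str, float]) -> dict[str, float]:
    """Divide a pool proportionally to non-negative weights."""
    total = sum(weights.values())
    if total == 0:
        return {k: 0.0 for k in weights}
    return {k: pool * w / total for k, w in weights.items()}


def miner_payouts_usd(
    emission_tokens: float,     # hypothetical emissions the availability share is taken from
    token_price_usd: float,     # hypothetical token price
    rental_revenue_usd: float,  # renter spend over the same period
    availability_weights: dict[str, float],
    rented_hours: dict[str, float],
    availability_share: float = 0.20,  # 20%, per the talk
    revenue_share: float = 0.5,        # hypothetical fraction returned to miners
) -> dict[str, float]:
    """Per-miner USD value: availability incentive plus a cut of rental revenue."""
    avail = split(emission_tokens * availability_share * token_price_usd, availability_weights)
    revenue = split(rental_revenue_usd * revenue_share, rented_hours)
    miners = set(avail) | set(revenue)
    return {m: avail.get(m, 0.0) + revenue.get(m, 0.0) for m in sorted(miners)}


print(miner_payouts_usd(100, 10, 7000, {"m1": 3, "m2": 1}, {"m1": 10, "m2": 30}))
# {'m1': 1025.0, 'm2': 2675.0}
```

In this toy setup rental revenue dominates the availability incentive, which mirrors the speaker's argument: as utilization grows, miners are paid mainly for GPUs people actually use.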
Tokenomics: Value Accrual for Holders and Validators
- The speaker argues that value accrues to token holders not through direct capital flowback, but through access to the underlying commodity: GPUs.
- Validators on Subnet 51 gain exclusive, stake-weighted access to the GPU network. "The more stake you have the more access to GPUs you have."
- This creates intrinsic value for the token tied directly to compute access. Plans exist to extend this access beyond validators to general token holders (e.g., holding 1% of supply grants access to 1% of GPUs).
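Stake-proportional access (1% of stake or supply, 1% of GPUs) needs a rounding rule, since GPUs are indivisible. The sketch below uses the largest-remainder method; that rule, the function name, and the holder names are illustrative assumptions rather than how Subnet 51 actually allocates access.

```python
def allocate_gpus(stakes: dict[str, float], total_gpus: int) -> dict[str, int]:
    """Split whole GPUs among holders in proportion to stake (largest-remainder method)."""
    total_stake = sum(stakes.values())
    if total_stake <= 0:
        return {holder: 0 for holder in stakes}
    quotas = {h: total_gpus * s / total_stake for h, s in stakes.items()}
    alloc = {h: int(q) for h, q in quotas.items()}  # floor of each exact quota
    leftover = total_gpus - sum(alloc.values())
    # Give the remaining GPUs to holders whose quotas were rounded down the most;
    # the stable sort breaks ties in insertion order.
    by_remainder = sorted(quotas, key=lambda h: quotas[h] - alloc[h], reverse=True)
    for h in by_remainder[:leftover]:
        alloc[h] += 1
    return alloc


print(allocate_gpus({"holder-1pct": 1, "everyone-else": 99}, 1000))
# {'holder-1pct': 10, 'everyone-else': 990}
```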
Openness, Validator Access, and Future Integrations (TEEs)
- Celium's platform is designed to be open. Validators can bypass the Celium front-end entirely and access miner GPUs directly via SSH using their validator keys. Competitors could even build on Subnet 51.
- The goal is seamless access, potentially via a simple CLI command using wallet keys.
- The speaker discusses the importance of TEEs (Trusted Execution Environments) and Confidential Compute (like Nvidia's offering) for enhancing privacy. These technologies encrypt workloads, preventing even the GPU owner from seeing the data being processed.
- Integrating TEEs is a priority, potentially incentivizing miners who offer this capability, to attract users with high privacy requirements (like validators running sensitive key material). Talks are underway with providers like Phala Cloud.
Current Use Cases and Ecosystem Integration
- Currently, Celium's user base is primarily within the Bittensor community, which has significant GPU demand for running miners on various subnets (e.g., Subnet 19 for image generation).
- Proof-of-concept work is underway to enable one-click deployment of miners (like for Subnet 19) using Celium GPUs via custom templates, simplifying the mining setup process.
Datura: The Company Behind Celium
- The speaker explains that Datura was formed to provide structure and credibility, facilitating hiring and operations.
- Datura is broadly focused on adding value to the Bittensor ecosystem through various means (tools like TAO Market Cap, subnet development, advising) based on the speaker's extensive experience.
Debate: Miner Emissions and Validator Incentives
- The speaker defends the decision to reduce miner emissions (currently at 20%), arguing it prevents rewarding non-performing or exploitative miners and aligns incentives with actual value creation. The level can be scaled based on platform maturity and revenue.
- Extending this logic, the speaker suggests validator and even subnet owner emissions could also be significantly reduced (potentially to 20% or even 0% for validators).
- The rationale is that the primary incentive for validators should be the direct value derived from accessing the commodity (GPUs), making traditional token emissions less necessary for honest participation compared to subnets without a direct commodity link.
Weight Copying Dynamics on Subnet 51
- The speaker claims weight copying (validators mimicking others' weights instead of performing independent validation) is not a significant problem on Subnet 51.
- The reasoning is that miners have no incentive to allocate their valuable GPUs to weight-copying validators, as these validators don't contribute to the honest weight-setting that determines miner rewards.
- Honest validators gain stake-weighted access to valuable hardware (potentially $10M+ worth) for minimal operational cost (a cheap CPU server), creating a strong financial incentive to validate correctly rather than copy weights: "The biggest incentive to not weight copy is financial or value based."
Security Considerations: Validator Keys and Docker-in-Docker
- A key barrier for validators using Celium GPUs is the security risk of placing hotkeys on non-owned hardware. Confidential Compute/TEEs are presented as the solution to encrypt validator operations.
- Regarding Docker-in-Docker security, the speaker clarifies they use Sysbox, not privileged mode. Sysbox allows nested containers without granting root access to the host machine, mitigating risks for GPU providers.
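The Sysbox approach can be illustrated by the container launch command itself. This is a hedged sketch, not Celium's deployment code: `sysbox-runc` is the runtime name Sysbox registers with Docker, but the image name, container name, and the `nested_docker_cmd` and `reject_privileged` helpers are hypothetical, and GPU device flags are omitted because the talk does not say how Celium exposes GPUs inside Sysbox containers. The point is that nesting is enabled by the runtime choice, with no `--privileged` flag.

```python
import shlex


def nested_docker_cmd(image: str, name: str) -> list[str]:
    """Build a `docker run` argv that enables Docker-in-Docker via Sysbox, not --privileged."""
    return [
        "docker", "run", "--detach",
        "--runtime=sysbox-runc",  # Sysbox runtime: the container can run its own Docker daemon
        "--name", name,
        image,
    ]


def reject_privileged(argv: list[str]) -> list[str]:
    """Refuse any launch command that would put the container in privileged mode."""
    if any(arg == "--privileged" or arg.startswith("--privileged=") for arg in argv):
        raise ValueError("privileged containers are not allowed on provider hosts")
    return argv


cmd = reject_privileged(nested_docker_cmd("example/dind-workload:latest", "renter-pod"))
print(shlex.join(cmd))
# docker run --detach --runtime=sysbox-runc --name renter-pod example/dind-workload:latest
```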
Reflective and Strategic Conclusion
- Celium's journey highlights how crypto-economic incentives on Bittensor can bootstrap a decentralized compute network capable of undercutting incumbents. Investors and researchers should track Celium's progress in maintaining stability, integrating privacy features like TEEs, and scaling its user base beyond the core Bittensor community.