AI Engineer
December 19, 2025

Leadership in AI Assisted Engineering – Justin Reock, DX (acq. Atlassian)

AI assistance in engineering promises a productivity surge, but the reality is more complex. Justin Reock, from DX (acquired by Atlassian), cuts through the hype, revealing that while engineers feel more productive, the data shows a highly variable, often negative, impact. The path to real gains requires strategic integration and precise measurement.

The Productivity Paradox: Feeling vs. Fact

  • "Every engineer that took part in this study felt more productive, but then the data actually bore out that they were less productive. Kind of interesting, right? We've got this induced flow that makes us feel really good about what we're doing."
  • The Illusion of Flow: Engineers report feeling more productive with AI tools, yet studies sometimes show a decrease in actual output. This "induced flow" can mask inefficiencies.
  • Averages Deceive: Industry-wide productivity averages for AI adoption (e.g., a modest 2.6% increase in change confidence) hide extreme company-level volatility. Some organizations see 20% gains, others 20% declines.
  • Mandates Fail: Top-down mandates for AI adoption without proper education or enablement lead to superficial compliance, not genuine productivity improvements.

Beyond Code: Integrating AI Where It Matters

  • "For most organizations, writing code has never been the bottleneck, right? We can increase productivity a bit by helping with code completion, but our biggest bottlenecks are elsewhere within the SDLC. There's a lot more to creating software than just writing code."
  • Target Bottlenecks: Code generation is rarely the primary constraint. AI's highest impact comes from addressing bottlenecks across the entire Software Development Life Cycle (SDLC), like legacy code modernization (Morgan Stanley saved 300,000 hours annually) or accelerating new engineer onboarding (Zapier cut new engineers' time-to-effectiveness from months to two weeks).
  • Unblock Usage: Data exfiltration concerns should not halt experimentation. Infrastructure like Amazon Bedrock or Fireworks AI enables running powerful models in secure, private environments; think of it as a secure intranet for AI models, giving engineers controlled access to capable tools (see the sketch after this list).
  • Augmentation, Not Replacement: AI augments engineers, it does not replace them. Leaders must transparently communicate this to foster psychological safety. AI proficiency is a new skill; organizations must provide time and resources for learning.
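
As a concrete instance of the private-hosting pattern above, here is a minimal sketch using Amazon Bedrock's Converse API through boto3. The region, model ID, and prompt are illustrative assumptions, and it presumes the account already has Bedrock model access (Fireworks AI offers a similar OpenAI-compatible hosted endpoint):

```python
# Sketch: invoking a model inside a private AWS environment via Amazon Bedrock,
# so prompts and completions stay within your cloud boundary. Assumes Bedrock
# model access (and, for stricter setups, a VPC interface endpoint) is configured.
import boto3

# Region and model ID are illustrative; substitute whatever your org has enabled.
client = boto3.client("bedrock-runtime", region_name="us-east-1")

response = client.converse(
    modelId="anthropic.claude-3-5-sonnet-20240620-v1:0",
    messages=[{
        "role": "user",
        "content": [{"text": "Draft a modernization spec for this legacy module: ..."}],
    }],
    inferenceConfig={"maxTokens": 512, "temperature": 0.2},
)

print(response["output"]["message"]["content"][0]["text"])
```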

Measure Outcomes, Build Trust

  • "Our AI metrics like utilization and things are telling us what's happening with the tech, but these core metrics that we've been able to trust are telling us whether these initiatives are actually working, right? Are we actually moving the needle and having the outcomes that we want to see?"
  • Outcome-Driven Metrics: Move beyond simple AI utilization. Focus on core developer experience and productivity metrics: speed, quality, change failure rate, maintainability, and change confidence.
  • Psychological Safety First: Google's Project Aristotle identified psychological safety as the strongest predictor of high-performing teams. This foundation is essential for successful AI adoption.
  • Feedback Loops for AI: Establish gatekeepers and continuous feedback loops for AI system prompts (e.g., "cursor rules") to refine model behavior, ensure compliance, and build trust.
  • Control Creativity: Understand "temperature" settings in AI models. A low temperature yields deterministic output (like a precise recipe), while a higher setting increases creativity (like a chef experimenting with ingredients). Match the setting to the task; see the sketch after this list.
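
A minimal sketch of the temperature point above, using the OpenAI Python client; the model name is a placeholder, and most hosted or self-hosted chat endpoints accept the same parameter:

```python
# Sketch: the same prompt at two temperatures (model name is a placeholder).
# Low temperature -> near-deterministic output, suited to refactoring or spec work.
# Higher temperature -> more varied output, suited to brainstorming.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def ask(prompt: str, temperature: float) -> str:
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder; any chat-capable model works
        messages=[{"role": "user", "content": prompt}],
        temperature=temperature,
    )
    return response.choices[0].message.content

prompt = "Name a helper function that batches retries for flaky tests."
print(ask(prompt, temperature=0.0))  # the precise recipe: repeatable answer
print(ask(prompt, temperature=0.9))  # the experimenting chef: varied answers
```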

Key Takeaways:

  • Strategic Shift: Successful AI integration means identifying and solving your organization's specific SDLC bottlenecks, not just boosting code completion.
  • Builder/Investor Note: Prioritize psychological safety and invest in AI skill development. For builders, this means dedicated learning time; for investors, look for companies that do this well.
  • The "So What?": The next 6-12 months will separate organizations that merely adopt AI from those that master its strategic application and measurement, driving real competitive advantage.

Podcast Link: https://www.youtube.com/watch?v=PmZDupPM3UM

This episode exposes the stark variability in generative AI's impact on engineering productivity, revealing a critical need for strategic integration, psychological safety, and precise measurement beyond mere adoption rates.

The Generative AI Productivity Paradox

  • Initial assessments of generative AI's impact on engineering productivity present a contradictory picture. Google reports a 10% productivity increase, while the METR study found a 19% slowdown, even though participating engineers felt more productive. This "induced flow" masks underlying inefficiencies.
  • DORA (DevOps Research and Assessment) research shows modest average gains: a 25% increase in AI adoption correlates with a 7.5% improvement in documentation quality and a 3.4% improvement in code quality.
  • DX's aggregate data mirrors these averages, showing a 2.6% increase in change confidence and a 1% reduction in change failure rate.
  • Company-level data reveals extreme volatility: some organizations achieve 20% increases in change confidence, while others experience 20% decreases.
  • This variability extends to code maintainability and change failure rates, with some companies shipping 50% more defects.

"Every engineer that took part in this study felt more productive, but then the data actually bore out that they were less productive." – Justin Reock

Drivers of AI Impact and Adoption Challenges

  • Effective AI integration requires more than top-down mandates; it demands education, enablement, and clear measurement. Organizations often fail by simply deploying technology without guidance or impact assessment.
  • Top-down mandates for 100% AI adoption prove ineffective, failing to move the needle on actual productivity.
  • Lack of education and enablement negatively impacts adoption, as organizations expect engineers to intuitively grasp best practices.
  • Difficulty measuring impact, or even identifying relevant metrics beyond simple utilization, hinders successful integration.
  • DORA research highlights clear AI policies and dedicated learning time as key factors for positive impact.

"Top down mandates are not working. Driving towards, oh, we must have 100% adoption of AI. Great, I will update my read my file every morning and I will be compliant." – Justin Reock

Strategic Integration and Fear Reduction

  • AI integration must span the entire Software Development Life Cycle (SDLC), not just code writing, addressing actual bottlenecks. Leaders must proactively reduce engineer fear by emphasizing augmentation over replacement and fostering psychological safety.
  • Integrating AI across the SDLC is crucial; code writing is rarely the primary bottleneck.
  • Unblocking usage requires creative solutions, such as leveraging secure infrastructure like Amazon Bedrock and Fireworks AI to run powerful models in safe spaces.
  • Reducing fear involves transparent communication that AI augments, rather than replaces, engineers.
  • Google's Project Aristotle demonstrated psychological safety as the biggest indicator of team productivity, a principle directly applicable to AI adoption.
  • SWE-bench (Software Engineering Benchmark) data shows AI agents complete only one-third of tasks without human intervention, reinforcing their role as augmentative tools.

"AI is not coming for your job, but somebody really good at AI might take your job." – Justin Reock

Effective AI Measurement and Compliance

  • Measuring AI impact demands a focus on foundational developer experience (DevEx) metrics like speed and quality, rather than just utilization. Establishing robust compliance and trust mechanisms is equally vital.
  • Key metrics focus on speed (pull request throughput, velocity) and quality (change failure rate, change confidence, maintainability); see the sketch after this list.
  • Three types of metrics provide a comprehensive view: telemetry (API data), experience sampling (e.g., form fields on pull requests), and well-designed, high-participation surveys.
  • W. Edwards Deming's principle states 90-95% of organizational productivity is system-determined, not worker-determined, underscoring the importance of DevEx metrics.
  • The DX AI Measurement Framework normalizes metrics across utilization, impact, and cost, forming a maturity curve for AI adoption.
  • Compliance requires feedback loops for system prompts (the rules that control model behavior) and an understanding of temperature (a sampling setting, typically between 0 and 1, that trades determinism for creativity).
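
To ground the speed and quality metrics above, here is a minimal sketch computing change failure rate and weekly pull request throughput from raw records; the record shapes and field names are illustrative assumptions, not DX's actual schema:

```python
# Sketch: two of the core metrics named above, computed from raw records.
# The record shapes are illustrative assumptions, not DX's actual schema.
from dataclasses import dataclass

@dataclass
class Deployment:
    caused_incident: bool  # did this change trigger a rollback, hotfix, or incident?

@dataclass
class PullRequest:
    merged: bool

def change_failure_rate(deploys: list[Deployment]) -> float:
    """Fraction of deployments that led to a failure in production."""
    if not deploys:
        return 0.0
    return sum(d.caused_incident for d in deploys) / len(deploys)

def pr_throughput_per_week(prs: list[PullRequest], weeks: float) -> float:
    """Merged pull requests per week, a simple speed signal."""
    return sum(p.merged for p in prs) / weeks

deploys = [Deployment(False), Deployment(True), Deployment(False), Deployment(False)]
prs = [PullRequest(True)] * 30 + [PullRequest(False)] * 5

print(f"Change failure rate: {change_failure_rate(deploys):.0%}")         # 25%
print(f"PR throughput: {pr_throughput_per_week(prs, weeks=4):.1f}/week")  # 7.5/week
```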

"Our AI metrics like utilization and things are telling us what's happening with the tech, but these core metrics that we've been able to trust are telling us whether these initiatives are actually working." – Justin Reock

Unblocking Usage and Identifying Bottlenecks

  • To maximize AI's value, organizations must unblock usage through self-hosted models and early compliance partnerships, then strategically apply AI to address specific bottlenecks within the SDLC.
  • Self-hosted and private models facilitate secure experimentation and usage.
  • Partnering with compliance from day one clarifies permissible AI uses, often revealing more flexibility than initially assumed.
  • Applying Eli Goldratt's Theory of Constraints, AI's value is realized only when it addresses the actual bottleneck in the workflow.
  • Morgan Stanley uses DevGen.AI to modernize legacy COBOL, mainframe Natural, and Perl code, saving 300,000 hours annually by generating modernization specs.
  • Zapier employs AI agents for onboarding, cutting new engineers' time-to-effectiveness from months to two weeks, which in turn enabled increased hiring.
  • Spotify assists Site Reliability Engineers (SREs) by using AI to aggregate incident context and runbook steps, significantly reducing Mean Time To Resolution (MTTR).

"An hour saved on something that isn't the bottleneck is worthless." – Justin Reock

Investor & Researcher Alpha

  • Capital Reallocation: Investment should shift from broad AI adoption mandates to targeted solutions addressing specific SDLC bottlenecks (e.g., legacy code modernization, onboarding, incident response). Companies demonstrating precise AI application to known constraints will outperform.
  • Measurement Innovation: The next frontier for AI tooling is not just model performance, but robust, system-level DevEx measurement frameworks that correlate AI utilization with tangible quality and speed outcomes, moving beyond vanity metrics.
  • Human-AI Teaming Research: Research into "induced flow" and psychological safety in AI-augmented workflows is critical. Understanding how to bridge the gap between perceived and actual productivity will unlock deeper value and inform future human-AI interface design.

Strategic Conclusion

Generative AI's true value in engineering lies in its precise application to identified bottlenecks, supported by psychological safety and rigorous, outcome-focused measurement. The next step for industry leaders is to move beyond superficial adoption, integrating AI strategically to augment human capabilities and drive measurable improvements in speed and quality.
