Measuring AI Usage and Safety in the Software Development Lifecycle


We’ve explored a range of approaches to measuring software delivery in my recent posts, from well-established engineering and testing practices to modern delivery metrics. The metrics we’ll explore today, however, are far newer, emerging in response to the rapidly evolving role of AI in the software development lifecycle.


As AI becomes embedded in the software development process, from code generation to testing and documentation, engineering teams are unlocking significant productivity gains. However, alongside these benefits comes a new responsibility: ensuring that AI is used safely, effectively, and responsibly.


Just as we measure flow, quality, and reliability in traditional delivery, we now need a new category of metrics that focuses on AI usage, impact, and risk. Without these metrics, organisations risk introducing hidden defects, security vulnerabilities, compliance issues, and unintended behaviours into their systems.


This is not just about tracking adoption; it’s about ensuring that AI enhances engineering outcomes without compromising trust.


Why AI Usage & Safety Metrics Matter


Without visibility into how AI is used in the development process:

  • AI-generated code may introduce undetected defects or vulnerabilities

  • Teams may become over-reliant on AI without proper validation

  • Sensitive data could be exposed through prompts or outputs

  • Biases or hallucinations may propagate into production systems

  • Compliance and auditability become increasingly difficult


AI introduces a new layer of non-determinism and opacity into the development lifecycle. Metrics help bring structure, transparency, and control to this complexity, ensuring that AI remains a force multiplier, not a risk amplifier.


Core AI Usage Metrics


The first set of metrics focuses on how teams and organisations are actually using AI.


AI Adoption Rate

What it measures: The percentage of development activities supported or augmented by AI tools.

Why it matters: Helps organisations understand where AI is delivering value and where adoption is lagging.

How to measure it:

  • Track usage of AI tools across IDEs, code assistants, or platforms

  • AI Adoption Rate = (AI-assisted tasks ÷ Total development tasks) × 100
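The formula above can be sketched as a small helper. The task counts here are illustrative; in practice they would come from your IDE telemetry or task tracker:

```python
def adoption_rate(ai_assisted_tasks: int, total_tasks: int) -> float:
    """AI Adoption Rate = (AI-assisted tasks / total development tasks) * 100."""
    if total_tasks == 0:
        return 0.0
    return ai_assisted_tasks / total_tasks * 100

# e.g. 45 of 120 tasks in a sprint used an AI assistant
print(f"{adoption_rate(45, 120):.1f}%")  # 37.5%
```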


AI Contribution Ratio

What it measures: The proportion of code, tests, or documentation generated or assisted by AI.

Why it matters: Provides visibility into how much of your codebase is influenced by AI.

How to measure it:

  • Analyse commit metadata or AI tool integrations

  • Tag or estimate AI-assisted contributions in pull requests
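One way to implement the commit-metadata approach is a team convention of tagging AI-assisted commits with a message trailer. The `Assisted-by:` trailer below is an assumed convention, not something tooling adds automatically:

```python
def ai_contribution_ratio(commits: list[dict]) -> float:
    """Share of commits tagged as AI-assisted, as a percentage.

    Assumes each commit dict has a 'message' field and that the team
    tags AI-assisted work with an 'Assisted-by:' trailer (an agreed
    convention -- tools do not add this automatically).
    """
    if not commits:
        return 0.0
    assisted = sum(1 for c in commits if "Assisted-by:" in c["message"])
    return assisted / len(commits) * 100

commits = [
    {"message": "Fix pagination bug\n\nAssisted-by: code-assistant"},
    {"message": "Refactor auth module"},
    {"message": "Add retry logic\n\nAssisted-by: code-assistant"},
    {"message": "Update README"},
]
print(f"{ai_contribution_ratio(commits):.0f}%")  # 50%
```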


Developer Productivity Impact

What it measures: The effect of AI on delivery speed and throughput.

Why it matters: AI should improve flow, not just shift effort elsewhere.

How to measure it:

  • Compare cycle time and throughput before and after AI adoption

  • Track time-to-complete for common tasks with vs without AI
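A minimal sketch of the before/after comparison, using median cycle time to dampen outliers. The hours below are hypothetical:

```python
from statistics import median

def cycle_time_delta(before_hours: list[float], after_hours: list[float]) -> float:
    """Percentage change in median cycle time after AI adoption.

    Negative values mean comparable tasks complete faster with AI.
    """
    before, after = median(before_hours), median(after_hours)
    return (after - before) / before * 100

# Hypothetical cycle times (hours) for comparable tasks
before = [30, 42, 36, 48, 40]
after = [24, 30, 28, 36, 32]
print(f"{cycle_time_delta(before, after):+.1f}%")
```

Using the median rather than the mean is a deliberate choice here: a single pathological task won’t swing the comparison.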


Core AI Quality & Safety Metrics


These metrics look at how effectively teams are applying quality and safety best practices to their AI usage.


AI-Generated Defect Rate

What it measures: The number of defects linked to AI-generated code.

Why it matters: AI can introduce subtle bugs that bypass traditional review processes.

How to measure it:

  • Tag defects traced back to AI-generated code

  • AI Defect Rate = (AI-related defects ÷ Total AI-generated changes) × 100
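If defects traced to AI-generated code are labelled during root-cause analysis (the `ai-generated` label below is an assumed convention), the rate falls out directly:

```python
def ai_defect_rate(defects: list[dict], total_ai_changes: int) -> float:
    """AI Defect Rate = (AI-related defects / total AI-generated changes) * 100.

    Assumes defects carry an 'ai-generated' label when root-cause
    analysis traces them back to AI-generated code.
    """
    if total_ai_changes == 0:
        return 0.0
    ai_defects = sum(1 for d in defects if "ai-generated" in d.get("labels", []))
    return ai_defects / total_ai_changes * 100

defects = [
    {"id": 101, "labels": ["ai-generated", "regression"]},
    {"id": 102, "labels": ["config"]},
    {"id": 103, "labels": ["ai-generated"]},
]
print(f"{ai_defect_rate(defects, 80):.1f}%")  # 2.5%
```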


Prompt Safety Violations

What it measures: Instances where prompts or outputs violate security, privacy, or policy guidelines.

Why it matters: Uncontrolled prompts can expose sensitive data or generate unsafe outputs.

How to measure it:

  • Monitor prompt logs for restricted data patterns

  • Track flagged or blocked interactions via AI governance tools
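Monitoring prompt logs for restricted data can start with simple pattern matching. These patterns are purely illustrative; real governance tooling uses far more comprehensive detectors:

```python
import re

# Illustrative patterns only -- real governance tooling uses far more
# comprehensive detectors for credentials, PII, and proprietary data.
RESTRICTED_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "aws_key": re.compile(r"AKIA[0-9A-Z]{16}"),
    "api_token": re.compile(r"(?i)(api[_-]?key|token)\s*[:=]\s*\S+"),
}

def scan_prompt(prompt: str) -> list[str]:
    """Return the names of restricted-data patterns found in a prompt."""
    return [name for name, pat in RESTRICTED_PATTERNS.items() if pat.search(prompt)]

print(scan_prompt("Summarise this log for jane.doe@example.com"))  # ['email']
print(scan_prompt("Refactor this sorting function"))               # []
```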


Hallucination Rate

What it measures: The frequency of incorrect or fabricated AI outputs.

Why it matters: Hallucinations can lead to faulty logic, incorrect assumptions, or misleading documentation.

How to measure it:

  • Use human review or automated validation checks

  • Hallucination Rate = (Invalid AI outputs ÷ Total AI outputs) × 100
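For code outputs, one crude automated validation is checking that a generated snippet even parses; real pipelines would layer human review and semantic checks on top. A sketch, with the validators treated as pluggable:

```python
import ast

def hallucination_rate(outputs: list[str], validators) -> float:
    """Share of AI outputs failing any validator, as a percentage."""
    if not outputs:
        return 0.0
    invalid = sum(1 for o in outputs if not all(v(o) for v in validators))
    return invalid / len(outputs) * 100

def parses_as_python(snippet: str) -> bool:
    """One crude automated check: does generated code even parse?"""
    try:
        ast.parse(snippet)
        return True
    except SyntaxError:
        return False

outputs = ["def f(x): return x + 1", "def g(x) return x", "x = [1, 2, 3]"]
print(f"{hallucination_rate(outputs, [parses_as_python]):.1f}%")
```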


AI Review Coverage

What it measures: The percentage of AI-generated content that undergoes human review.

Why it matters: Human oversight is critical for ensuring correctness and safety.

How to measure it:

  • Track PRs or outputs flagged as AI-generated

  • AI Review Coverage = (Reviewed AI outputs ÷ Total AI outputs) × 100


Governance & Compliance Metrics


These metrics track how well teams comply with the laws, regulations, and internal policies that govern AI usage, and help ensure that sensitive data and the AI systems themselves remain protected.


Data Exposure Risk

What it measures: The likelihood or occurrence of sensitive data being shared with AI systems.

Why it matters: AI tools can inadvertently leak confidential or regulated information.

How to measure it:

  • Scan prompts and outputs for sensitive data (PII, credentials, IP)

  • Track policy violations or blocked interactions


Model Usage Compliance

What it measures: Whether AI tools and models are used in accordance with organisational policies.

Why it matters: Ensures alignment with legal, regulatory, and ethical standards.

How to measure it:

  • Audit tool usage against approved vendors and models

  • Track unauthorised or shadow AI usage
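Auditing against an allowlist is straightforward once tool usage is logged. The model names below are hypothetical placeholders for your organisation’s approved list:

```python
# Hypothetical allowlist of approved vendor/model identifiers
APPROVED_MODELS = {"vendor-a/model-1", "vendor-b/model-2"}

def audit_usage(usage_log: list[dict]) -> list[dict]:
    """Return log entries that used a model outside the approved list."""
    return [e for e in usage_log if e["model"] not in APPROVED_MODELS]

log = [
    {"user": "dev1", "model": "vendor-a/model-1"},
    {"user": "dev2", "model": "shadow-tool/unknown"},  # shadow AI usage
]
print(len(audit_usage(log)))  # 1
```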


Auditability & Traceability Score

What it measures: The ability to trace AI-generated outputs back to prompts, models, and users.

Why it matters: Critical for debugging, compliance, and accountability.

How to measure it:

  • Track logging completeness (inputs, outputs, metadata)

  • Score based on traceability coverage across systems
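Logging completeness can be scored by checking each interaction record against the fields you require for traceability. The field set below is an assumed baseline, not a standard:

```python
# Assumed baseline of fields every AI interaction record should carry
REQUIRED_FIELDS = {"prompt", "output", "model", "user", "timestamp"}

def traceability_score(log_entries: list[dict]) -> float:
    """Average fraction of required trace fields present, as a percentage."""
    if not log_entries:
        return 0.0
    per_entry = [
        len(REQUIRED_FIELDS & entry.keys()) / len(REQUIRED_FIELDS)
        for entry in log_entries
    ]
    return sum(per_entry) / len(per_entry) * 100

entries = [
    {"prompt": "...", "output": "...", "model": "m", "user": "u", "timestamp": 1},
    {"prompt": "...", "output": "..."},  # metadata missing
]
print(f"{traceability_score(entries):.0f}%")  # 70%
```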


Connecting AI Metrics to Engineering Outcomes


AI metrics should not exist in isolation. They must be correlated with traditional engineering metrics to understand their true impact.


For example:

  • Increased AI adoption should align with improved cycle time and throughput

  • Stable or reduced AI defect rates indicate safe usage

  • Improved flow metrics without rising failure rates signal successful AI integration


This ensures that AI is not just accelerating delivery, but doing so safely and sustainably.


Turning AI Metrics into Responsible Innovation


Tracking AI usage and safety is not about limiting innovation; it’s about enabling it responsibly.


Use these metrics to:

  • Identify where AI adds the most value

  • Detect risks before they become incidents

  • Strengthen governance without slowing teams down

  • Build trust in AI-assisted development


Closing Thought


AI is changing how software is built, but it doesn’t remove the need for discipline. It increases it. By measuring AI usage, quality, and safety, organisations can move beyond experimentation and into trusted, production-grade AI adoption.


Because in the end, the goal is not just to build faster with AI; it’s to build better, safer, and more responsibly.

© 2025 Craig Risi