Measuring Developer Productivity in the AI Era

Every engineering leader running GitHub Copilot in Visual Studio has asked some version of the same question: "Is this actually making us faster, or just making us feel faster?" Lines of code went up. Commit counts went up. But so did code churn, review backlog, and — in some teams — production incidents. The old productivity dashboard doesn't tell you which story you're living. Here's a practical, no-nonsense way to measure it on a Microsoft stack: Visual Studio, GitHub Copilot, Azure DevOps, and Power BI.

Why the Old Metrics Lie to You Now

Three assumptions quietly broke when AI started writing real production code:

  • Lines of code (LOC) is now infinitely gameable — Copilot can generate hundreds of lines of boilerplate in seconds with zero functional value.
  • Deployment frequency and lead time can rise just because typing got faster, not because more real value shipped.
  • Code churn — code reverted or rewritten within two weeks — is trending upward industry-wide as AI-generated code increases. It's the quiet tax nobody puts on the dashboard.

The fix isn't to throw out delivery metrics. It's to add an AI-attribution layer on top of them, so you can tell real acceleration from acceleration built on hidden debt.
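To make the churn definition above concrete, here is a minimal sketch of the calculation. It assumes you already have per-line history (for example, derived from git blame data); the `LineRecord` type and its field names are hypothetical, not something Copilot or Azure DevOps hands you directly.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass
class LineRecord:
    """One added line of code. Field names are hypothetical."""
    added_on: date
    removed_on: Optional[date] = None  # None means the line still exists


def churn_rate(lines: list[LineRecord], window_days: int = 14) -> float:
    """Share of added lines reverted or rewritten within the window."""
    if not lines:
        return 0.0
    window = timedelta(days=window_days)
    churned = sum(
        1
        for line in lines
        if line.removed_on is not None
        and line.removed_on - line.added_on <= window
    )
    return churned / len(lines)


sample = [
    LineRecord(date(2026, 1, 5)),                     # still in the codebase
    LineRecord(date(2026, 1, 5), date(2026, 1, 9)),   # rewritten after 4 days
    LineRecord(date(2026, 1, 5), date(2026, 2, 20)),  # removed after the window
    LineRecord(date(2026, 1, 6), date(2026, 1, 20)),  # rewritten on day 14
]
print(f"churn rate: {churn_rate(sample):.0%}")
```

Whether day 14 itself counts as churn is a judgment call; this sketch treats the two-week window as inclusive.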

The Four-Layer Model

Instead of one productivity number, track four short layers — 8 to 10 metrics total, no more:

| Layer | Question it answers | Example metrics |
| --- | --- | --- |
| Delivery | Is the pipeline healthy? | Deployment frequency, lead time, change failure rate, recovery time |
| AI attribution | How much is AI really contributing, and does it hold up? | AI code share, Copilot acceptance rate, code durability (30-day survival) |
| Quality guardrail | Is speed being bought with debt? | Code churn, rework rate |
| Human experience | Are developers actually better off? | Satisfaction pulse survey, PR review cycle time |
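The two least familiar metrics in the table, AI code share and 30-day code durability, can be sketched roughly as follows. This assumes you can label each added line as AI-originated or human-authored, which is the hard part and is not solved here; the `AddedLine` type and the `origin` labels are hypothetical.

```python
from collections import defaultdict
from datetime import date, timedelta
from typing import NamedTuple, Optional


class AddedLine(NamedTuple):
    origin: str                 # "ai" or "human"; attribution is assumed
    added_on: date
    removed_on: Optional[date]  # None if the line still exists


def ai_code_share(lines: list[AddedLine]) -> float:
    """Fraction of added lines attributed to AI."""
    if not lines:
        return 0.0
    return sum(1 for line in lines if line.origin == "ai") / len(lines)


def durability_by_origin(
    lines: list[AddedLine], as_of: date, survival_days: int = 30
) -> dict[str, float]:
    """Per origin, the share of lines that lasted at least `survival_days`.

    Lines younger than the survival window are skipped, because their
    outcome is not known yet.
    """
    window = timedelta(days=survival_days)
    survived: dict[str, int] = defaultdict(int)
    evaluated: dict[str, int] = defaultdict(int)
    for line in lines:
        if as_of - line.added_on < window:
            continue  # too recent to judge
        evaluated[line.origin] += 1
        if line.removed_on is None or line.removed_on - line.added_on >= window:
            survived[line.origin] += 1
    return {origin: survived[origin] / evaluated[origin] for origin in evaluated}


lines = [
    AddedLine("ai", date(2026, 1, 5), None),
    AddedLine("ai", date(2026, 1, 5), date(2026, 1, 12)),
    AddedLine("ai", date(2026, 2, 20), None),  # too recent on 1 March
    AddedLine("human", date(2026, 1, 10), date(2026, 2, 25)),
    AddedLine("human", date(2026, 1, 10), None),
]
print(ai_code_share(lines))
print(durability_by_origin(lines, as_of=date(2026, 3, 1)))
```

Reading durability side by side for AI and human code is what tells you whether the AI share "holds up", which is the question this layer is meant to answer.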

Golden rule: every speed metric gets a quality counter-metric, and every quantitative number gets a qualitative check. Never use any of this to rank individual developers — these are system-level signals, not performance reviews.

Step 1 — Turn On Copilot Telemetry in Visual Studio

  1. Make sure everyone has a Copilot Business or Enterprise seat — personal seats don't report usage back to the organization.
  2. In Visual Studio, update the GitHub Copilot and GitHub Copilot Chat extensions and sign in with the org's GitHub identity.
  3. Go to Tools > Options > GitHub > Copilot and confirm telemetry and completions are enabled — this is what feeds acceptance-rate data.
  4. If you use Copilot's agent mode against Azure resources or Azure DevOps work items, enable the Azure MCP Server tools from the Copilot Chat tools picker so agent activity gets counted too, not just inline suggestions.

Step 2 — Pull Org-Wide Copilot Metrics

GitHub's Copilot Usage Metrics API is your source of truth for acceptance rate, active users, and code-generation breakdowns.

Fastest path — deploy Microsoft's free solution accelerator:

azd init -t microsoft/copilot-metrics-dashboard
azd up

These two commands provision Azure App Service, Azure Functions, Cosmos DB, and Key Vault, then wire them together for you. You'll be prompted for your GitHub org/enterprise name, a token, and whether metrics should scope to organization or enterprise.

Want something lighter? Use the community copilot-metrics-viewer project instead. It is a standalone web app over the same data with a different UI.

Note: GitHub retired the old Copilot Metrics API in April 2026. Point any new integration at the current Copilot Usage Metrics API.
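If you work from a raw export rather than the accelerator, acceptance rate is simple to compute yourself. The sketch below assumes an NDJSON export with one JSON record per line; the field names `suggestions_shown` and `suggestions_accepted` are hypothetical placeholders, so map them to whatever the real export schema calls them.

```python
import json


def acceptance_rate(ndjson_text: str) -> float:
    """Accepted suggestions divided by shown suggestions, pooled over all records."""
    shown = 0
    accepted = 0
    for raw in ndjson_text.splitlines():
        if not raw.strip():
            continue  # tolerate blank lines in the export
        record = json.loads(raw)
        # Hypothetical field names; adjust to the real schema.
        shown += record.get("suggestions_shown", 0)
        accepted += record.get("suggestions_accepted", 0)
    return accepted / shown if shown else 0.0


export = """
{"day": "2026-05-01", "suggestions_shown": 400, "suggestions_accepted": 120}
{"day": "2026-05-02", "suggestions_shown": 600, "suggestions_accepted": 150}
"""
print(f"acceptance rate: {acceptance_rate(export):.1%}")
```

Pooling the counts before dividing, rather than averaging daily percentages, keeps a quiet day with a handful of suggestions from skewing the number.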

Step 3 — Connect Azure DevOps Analytics

This is where your delivery-side DORA data already lives:

  • Turn on Analytics Views in Azure Boards for cycle time, lead time, and velocity per team.
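  • Link your GitHub repos to Azure Boards work items so PRs and commits roll up against the same items Copilot is helping with. If you also tag AI-assisted PRs or work items (the link alone does not record who or what wrote the code), this lets you split PR cycle time into AI-assisted vs. human-authored.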
  • Use the Pipelines > Analytics tab for deployment frequency and success/failure trends.
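If you want the same cycle-time and lead-time numbers outside the built-in views, the Analytics OData endpoint can be queried directly. Here is a rough sketch that only builds the query URL and summarizes rows you have already fetched. The organization and project names are made up, and you should check the endpoint version and field names against the Analytics documentation for your organization before relying on them.

```python
from statistics import median
from urllib.parse import quote


def work_items_url(org: str, project: str, completed_since: str) -> str:
    """Build an Analytics OData query for completed user stories."""
    base = (
        f"https://analytics.dev.azure.com/{quote(org)}/{quote(project)}"
        "/_odata/v3.0-preview/WorkItems"
    )
    odata_filter = (
        "WorkItemType eq 'User Story' "
        "and StateCategory eq 'Completed' "
        f"and CompletedDate ge {completed_since}"
    )
    select = "WorkItemId,CycleTimeDays,LeadTimeDays"
    return f"{base}?$filter={quote(odata_filter)}&$select={select}"


def median_days(rows: list[dict], field: str) -> float:
    """Median of a numeric field, ignoring rows where it is missing."""
    values = [row[field] for row in rows if row.get(field) is not None]
    return median(values)


url = work_items_url("contoso", "Web Shop", "2026-01-01Z")
print(url)

# Rows shaped like the "value" array of an OData response (sample data).
rows = [
    {"WorkItemId": 1, "CycleTimeDays": 2.0, "LeadTimeDays": 5.0},
    {"WorkItemId": 2, "CycleTimeDays": 4.0, "LeadTimeDays": 9.0},
    {"WorkItemId": 3, "CycleTimeDays": 6.0, "LeadTimeDays": None},
]
print(median_days(rows, "CycleTimeDays"), median_days(rows, "LeadTimeDays"))
```

Medians are used instead of means because one long-running work item can otherwise dominate a team's number.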

Step 4 — Unify Everything in Power BI

Bring three sources into one model:

  1. Azure DevOps Analytics → connect via the built-in OData feed connector.
  2. Copilot metrics → connect to the Cosmos DB store behind the solution accelerator (or the raw NDJSON export).
  3. Application Insights / Azure Monitor → for production incidents and deployment markers, which feed Change Failure Rate and Recovery Time.
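Once deployment markers and incidents sit in the same model, Change Failure Rate and Recovery Time are straightforward arithmetic. A minimal sketch, assuming you have already joined each deployment to the incident it caused (if any); the `Deployment` type and its fields are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Deployment:
    deployed_at: datetime
    incident_started_at: Optional[datetime] = None  # None: no incident caused
    restored_at: Optional[datetime] = None          # when service recovered


def change_failure_rate(deployments: list[Deployment]) -> float:
    """Share of deployments that caused an incident."""
    if not deployments:
        return 0.0
    failed = sum(1 for d in deployments if d.incident_started_at is not None)
    return failed / len(deployments)


def mean_recovery_time(deployments: list[Deployment]) -> Optional[timedelta]:
    """Average time from incident start to restoration, or None if no data."""
    durations = [
        d.restored_at - d.incident_started_at
        for d in deployments
        if d.incident_started_at is not None and d.restored_at is not None
    ]
    if not durations:
        return None
    return sum(durations, timedelta()) / len(durations)


deployments = [
    Deployment(datetime(2026, 3, 2, 9, 0)),
    Deployment(
        datetime(2026, 3, 3, 9, 0),
        incident_started_at=datetime(2026, 3, 3, 10, 0),
        restored_at=datetime(2026, 3, 3, 11, 30),
    ),
    Deployment(datetime(2026, 3, 4, 9, 0)),
    Deployment(
        datetime(2026, 3, 5, 13, 0),
        incident_started_at=datetime(2026, 3, 5, 14, 0),
        restored_at=datetime(2026, 3, 5, 14, 30),
    ),
]
print(change_failure_rate(deployments), mean_recovery_time(deployments))
```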

Build one dashboard page per layer — Delivery, AI Attribution, Quality Guardrail, Human Experience — and schedule a daily refresh. Copilot's API only exposes a rolling 28-day window, so if you want longer trend lines, store the data yourself (Cosmos DB or a Fabric/Azure SQL warehouse).
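Keeping your own history mostly comes down to an idempotent daily upsert, so that overlapping 28-day pulls never duplicate a day. The sketch below uses SQLite purely to stay self-contained; the same pattern applies to Cosmos DB or a SQL warehouse. The table and column names are hypothetical.

```python
import sqlite3


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a small local store with one row per day."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS copilot_daily (
               day TEXT PRIMARY KEY,
               suggestions_shown INTEGER NOT NULL,
               suggestions_accepted INTEGER NOT NULL
           )"""
    )
    return conn


def save_snapshot(conn: sqlite3.Connection, rows: list[dict]) -> None:
    """Insert new days and overwrite days that were already stored."""
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            """INSERT OR REPLACE INTO copilot_daily
                   (day, suggestions_shown, suggestions_accepted)
               VALUES (:day, :suggestions_shown, :suggestions_accepted)""",
            rows,
        )


conn = open_store()
# Two pulls whose windows overlap on 2026-05-02.
save_snapshot(conn, [
    {"day": "2026-05-01", "suggestions_shown": 400, "suggestions_accepted": 120},
    {"day": "2026-05-02", "suggestions_shown": 500, "suggestions_accepted": 140},
])
save_snapshot(conn, [
    {"day": "2026-05-02", "suggestions_shown": 600, "suggestions_accepted": 150},
    {"day": "2026-05-03", "suggestions_shown": 300, "suggestions_accepted": 90},
])
day_count = conn.execute("SELECT COUNT(*) FROM copilot_daily").fetchone()[0]
print(day_count)
```

Keying on the day means the scheduled refresh can simply re-save whatever window it gets back, without tracking what it has seen before.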

Step 5 — Don't Forget the Human Layer

Numbers alone don't tell you why. Run a short quarterly pulse survey (8–12 questions, Microsoft Forms is enough) covering satisfaction and flow. Whenever a metric spikes or dips, check the human context before reacting — a cycle-time increase during a planned refactor sprint isn't a regression.
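One lightweight way to build the "check the context first" habit into the dashboard is to keep a small log of known events and surface it next to any outlier. This is a sketch under simple assumptions: a basic standard-deviation rule for spotting outliers and a hand-maintained event list. Both helper names are hypothetical.

```python
from datetime import date, timedelta
from statistics import mean, stdev


def flag_outliers(series, threshold=2.0):
    """Return (day, value) pairs more than `threshold` std devs from the mean."""
    values = [value for _, value in series]
    if len(values) < 3:
        return []  # too little data to call anything an outlier
    mu, sigma = mean(values), stdev(values)
    if sigma == 0:
        return []
    return [(day, value) for day, value in series if abs(value - mu) / sigma > threshold]


def context_for(day, events):
    """Labels of known events whose date range covers `day`."""
    return [label for start, end, label in events if start <= day <= end]


# Weekly PR cycle time in days (sample data).
start = date(2026, 1, 5)
cycle_times = [3.0, 3.2, 2.9, 3.1, 3.0, 3.1, 2.8, 9.5]
series = [(start + timedelta(weeks=i), v) for i, v in enumerate(cycle_times)]

events = [(date(2026, 2, 16), date(2026, 3, 1), "planned refactor sprint")]

for day, value in flag_outliers(series):
    print(day, value, context_for(day, events) or "no known context, investigate")
```

The point is not the statistics. It is that a flagged number arrives with its explanation attached, or with an explicit note that nobody has one yet.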

A Starter Dashboard (Copy This)

LayerMetricSource
Delivery Deployment Frequency Azure Pipelines Analytics
Delivery Change Failure Rate Application Insights
Delivery Recovery Time Application Insights
AI Attribution AI Code Share Copilot Usage Metrics API
AI Attribution Suggestion Acceptance Rate Copilot Usage Metrics API
Quality Code Churn Azure Repos
Quality Rework Rate Azure Boards + Pipelines
Human Experience Satisfaction (pulse survey) Quarterly survey
Human Experience PR Review Cycle Time Azure Repos Analytics

That's it. Nine metrics, four layers, no vanity numbers.

Three Traps to Avoid

  • Don't reward raw AI token/suggestion volume. "Tokenmaxxing" measures spend, not value.
  • Don't use any of this to rank individuals. DORA and SPACE were built to measure systems, not people.
  • Don't let the dashboard grow past 10 metrics. If a number wouldn't change a decision, stop tracking it.
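These rules are easy to state and easy to forget six months in, so it can help to encode them as a check that runs whenever the dashboard definition changes. A minimal sketch follows; the `Metric` type is hypothetical, and the speed-to-quality pairings shown are one possible choice, not a prescribed mapping.

```python
from dataclasses import dataclass
from typing import Optional

MAX_METRICS = 10


@dataclass(frozen=True)
class Metric:
    name: str
    layer: str
    kind: str = "other"                   # "speed", "quality", or "other"
    counter_metric: Optional[str] = None  # quality metric paired with a speed metric
    per_individual: bool = False          # True if broken down per developer


def validate_dashboard(metrics: list[Metric]) -> list[str]:
    """Return a list of rule violations; an empty list means the dashboard passes."""
    problems = []
    names = {m.name for m in metrics}
    if len(metrics) > MAX_METRICS:
        problems.append(f"{len(metrics)} metrics tracked; the limit is {MAX_METRICS}")
    for m in metrics:
        if m.per_individual:
            problems.append(f"{m.name}: must not be broken down per developer")
        if m.kind == "speed" and m.counter_metric not in names:
            problems.append(f"{m.name}: speed metric needs a quality counter-metric")
    return problems


dashboard = [
    Metric("Deployment Frequency", "Delivery", "speed", "Change Failure Rate"),
    Metric("Change Failure Rate", "Delivery", "quality"),
    Metric("Suggestion Acceptance Rate", "AI Attribution", "speed", "Code Churn"),
    Metric("Code Churn", "Quality", "quality"),
]
print(validate_dashboard(dashboard))

bad = dashboard + [Metric("Commits per developer", "Delivery", "speed", per_individual=True)]
print(validate_dashboard(bad))
```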

Bottom Line

AI genuinely removes toil and can speed up real delivery — but only if you're watching quality and durability alongside speed. On the Microsoft stack, the pieces to do this properly already exist: Copilot's usage API, Azure DevOps Analytics, Application Insights, and Power BI. The work isn't building new tools — it's wiring the ones you already have into one honest picture.

· · ·

Have you set up something similar on your Azure DevOps + Copilot stack? I'd be curious to compare dashboard structures. Reach out via ridilabs.net.

About @ridife

This blog aims to bridge academic knowledge and industry needs in software engineering, DevOps, cloud computing, and the Microsoft 365 platform. Enjoy the blog, and let's get in touch on social media.
