Testing & Live Mode for Evaluations

Last updated: March 6, 2026

Evaluations in Solidroad now have two lifecycle modes: Testing and Live. Testing mode lets you safely calibrate a new evaluation before it affects your reports; Live mode is when your evaluation runs for real and produces results that appear in QA reporting.

Your QA team can review AI scoring, mark results as correct or incorrect, and refine your scorecard criteria in Testing mode — all completely isolated from live reporting data.


Why use Testing mode?

When you create an evaluation, the AI scores conversations using your scorecard's sections, descriptions, and score ranges. Small differences in how criteria are worded can significantly shift results. Testing mode gives you a safe environment to iterate on your scorecard until the AI's scoring consistently matches your team's expectations — before any of it touches your live reports.


How it works

1. Create an evaluation with testing enabled

When setting up a new evaluation, complete your standard configuration (data source, scorecard, filters, agent selection), then navigate to the Testing section and toggle Testing on.


Configure the following:

  • Conversations per period — how many conversations to send to QAs for calibration (e.g. 5 per day, 10 per week)

  • Calibrators — which QA team members will review and calibrate the AI's scoring

  • Accuracy goal (optional) — a target percentage to track progress toward (e.g. 85%)

Testing mode can only be enabled when creating a new evaluation. It cannot be added to an existing live evaluation. If you need to recalibrate a live evaluation, duplicate it and enable testing on the new copy.


Click Create and Confirm to save your evaluation in Testing mode.


2. The system begins running and generating calibration tasks

Once created, the evaluation immediately starts:

  • Running on incoming conversation batches

  • Generating calibration tasks for assigned QAs at your configured frequency (daily, weekly, or monthly)

  • Showing no accuracy metric yet, because no calibrations have been submitted


3. QAs complete calibration tasks

QAs receive calibration tasks and review the AI's scores for each assigned conversation. For each scorecard section, they mark the AI's score as either:

  • Correct — the AI scored this section accurately

  • Incorrect — the AI's score doesn't match the QA's judgement

Completed calibrations are submitted to the QA Manager for review.


4. QA Manager reviews calibrations

As QA Manager, you review submitted calibrations and accept or reject them.

The Testing dashboard tracks your progress:

  • Accuracy — how closely the AI's scoring matches your team's judgement, calculated across scorecard sections

  • Number of conversations — how many conversations have been calibrated so far

  • Accuracy trend — whether accuracy is improving as you refine the scorecard


The accuracy card only appears once enough conversations have been calibrated to be statistically meaningful. Until then, the card shows "Calibrate N more to see accuracy" — telling you exactly how many more calibrations are needed.

5. Go live when you're ready

When you're confident the scorecard is performing well, click Go Live.


This transitions the evaluation from Testing to Live mode:

  • All future evaluation runs produce live results

  • Results appear in your QA reports and metrics

  • Testing-mode data remains accessible on the evaluation's Testing tab but is never included in reporting

Going live is a one-way action. Once an evaluation is in Live mode, it cannot be switched back to Testing. If you need to recalibrate, duplicate the evaluation and enable testing on the new copy.


Understanding accuracy

Accuracy is calculated section by section for each calibrated conversation, then averaged across all conversations. Think of it as a partial score — if the AI got most sections right on a conversation, that still counts positively toward your overall accuracy, rather than the whole conversation being marked wrong.
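The averaging described above can be sketched in a few lines of Python. This is an illustrative model only; the function names and data shapes are hypothetical and do not reflect Solidroad's actual implementation.

```python
# Illustrative sketch of section-level accuracy averaging.
# Names and data shapes are hypothetical, not Solidroad's API.

def conversation_accuracy(section_marks):
    """Fraction of scorecard sections the AI scored correctly
    for one calibrated conversation (each mark is True/False)."""
    return sum(section_marks) / len(section_marks)

def overall_accuracy(calibrations):
    """Average the per-conversation accuracies across all
    calibrated conversations."""
    per_conversation = [conversation_accuracy(marks) for marks in calibrations]
    return sum(per_conversation) / len(per_conversation)

# Example: two conversations, four scorecard sections each.
# The AI got 3/4 sections right on the first and 4/4 on the second,
# so the first conversation counts partially rather than as wholly wrong.
calibrations = [
    [True, True, True, False],  # 75% on this conversation
    [True, True, True, True],   # 100% on this conversation
]
print(overall_accuracy(calibrations))  # 0.875
```

Note the partial-credit effect: a conversation with one wrong section contributes 0.75, not 0, which keeps the overall metric sensitive to section-level improvements.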


Who sees what

  • QA Manager (Admin) — all calibrations from all QAs, plus aggregate accuracy metrics

  • QA (Calibrator) — only their own assigned calibrations, plus the same aggregate accuracy metrics

Testing-mode evaluations never appear in QA Reporting. Once an evaluation is in Live mode, it becomes visible in QA Reporting to everyone with reporting access.


Tips for effective calibration

  • Write specific scorecard descriptions. The more precisely you define what "poor", "average", and "strong" look like, the more accurately the AI will score.

  • Calibrate across different conversation types. Include edge cases and difficult conversations — not just the easy ones.

  • Watch the accuracy trend. If accuracy stalls, check which sections are consistently wrong and revise those descriptions.

  • Set a realistic accuracy goal. 100% AI-human agreement is uncommon. An accuracy goal of 80–90% is a strong benchmark for most teams. Note that reaching your accuracy goal doesn't automatically move the evaluation to Live mode — you always choose when to go live by clicking Go Live.

  • Don't rush to go live. Testing mode continues until you click Go Live — there's no time pressure. Take the time to get your scorecard right.


FAQ

Can I enable testing on an existing live evaluation? No. Testing mode can only be enabled at creation time. To recalibrate a live evaluation, duplicate it and enable testing on the new copy.

What happens to testing data after I go live? Testing runs and their data are preserved and remain visible on the evaluation's Testing tab, but are excluded from all reporting.

Does the evaluation still run while in testing mode? Yes. The evaluation processes incoming conversations normally — those runs are stamped as "testing" so they're excluded from live reporting.

Can I edit the evaluation while it's in testing mode? Yes. You can adjust testing configuration (conversations per period, assigned QAs, accuracy goal) while in Testing mode. Testing configuration cannot be changed after going live.

What if I never go live? The evaluation continues in Testing mode indefinitely. No data appears in your reports. You can archive or delete the evaluation at any time.