# Benchmarks

Ango Hub allows project managers to mark certain labeling tasks as benchmarks (also known as 'Test Questions' or 'Gold Standards' in other environments). Benchmarks allow the project manager to measure the performance of annotators.

## Enabling benchmarking for your project

Benchmarking is not enabled by default in new projects.

To enable benchmarking, navigate to your project's *Settings* page, then to the *General* section. Enable the toggle next to *Benchmark.*

<figure><img src="https://3895963154-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FTcOUG6rfWxqGM0N4db2P%2Fuploads%2Fl9s59KJ9TIo4aH6GlmiC%2Fimage.png?alt=media&#x26;token=a7a8e218-019b-4e50-b4aa-63c0cbe609aa" alt=""><figcaption></figcaption></figure>

Click on the *Save* button at the bottom of the page.

From the same section, you may also choose the likelihood that annotators are shown a benchmark task whenever they receive a new task from the queue. By default, it is 10%, meaning that each time an annotator clicks on "Submit" and a new task is shown to them, there is a 10% chance that task is a benchmark (provided any benchmarks are left for that user to annotate).

{% hint style="danger" %}
Disabling benchmarking in a project where benchmark tests have taken place **will delete all benchmarking information** collected so far in the project.

It is strongly recommended not to disable benchmarking in a project after it has been enabled.

If you wish to pause benchmarking on your project, you may set the likelihood annotators are shown a benchmark to 0% in the project settings. This will cause benchmark tasks not to appear to annotators.
{% endhint %}

## Setting and removing benchmark tasks

Hub allows you to mark existing tasks as benchmarks. The task must already exist in the project: you cannot mark tasks as benchmarks during asset upload. Assets must first be uploaded; only then can their tasks be marked.

Tasks may be marked as benchmark one at a time (single) or in bulk.

### Single

Navigate to and open the task you would like to set as benchmark. For example, you may click on the task from the *Assets* or the *Tasks* tab.

Once you have opened the task, from the three-dot menu at the top right of the labeling editor, click on *Set as Benchmark.*

To unset a task as benchmark, and turn it back to a normal task, follow the same steps, then click *Remove as Benchmark*. You may need to refresh the page if the benchmark status was changed recently.

<figure><img src="https://3895963154-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FTcOUG6rfWxqGM0N4db2P%2Fuploads%2F8wty2OWsxoOErtrXofx8%2Fimage.png?alt=media&#x26;token=645bc101-4c34-47a4-b24c-07a8e160f846" alt=""><figcaption></figcaption></figure>

The *Set as Benchmark* dialog will appear. Click on *Set as Benchmark* to finalize setting the task as benchmark. The task will be shown to all annotators in all labeling stages in the project.

#### Marking more than one classification answer as correct

By default, the benchmark score is calculated against the classification answers as they were when the task was set as benchmark. For example, if a [radio](https://docs.imerit.net/labeling/labeling-tools/classification-tools/radio) classification with three answers (A, B, and C) has "B" marked as the benchmark answer, annotators who give any other answer will receive a 0% score for that classification.

You may allow more than one answer to be accepted for a 100% benchmark score on a classification.

To do so, either before you mark the task as benchmark or after you have opened an existing benchmark task, look at the left-hand side of the screen and find the classification for which you would like to add another correct answer.

Answer the classification with the first correct answer, then click on "OR":

<figure><img src="https://3895963154-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FTcOUG6rfWxqGM0N4db2P%2Fuploads%2FvW0P2M7X7HVEFY7b38tA%2Fimage.png?alt=media&#x26;token=44591e2d-067b-458d-a35d-e073c9cab284" alt=""><figcaption></figcaption></figure>

A new classification answer will appear below. You may then answer it to add a new potentially correct answer. Keep on clicking on "OR" to add more correct answers. Save the task.

In the example above, annotators may now answer either "Top-Down" or "Orthogonal" to the "Camera Angle" classification and will, in both cases, receive a 100% score for this classification.
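The acceptance logic described above can be sketched as follows (a minimal illustration; the function and variable names are ours, not Ango Hub's actual implementation):

```python
def classification_score(annotator_answer: str, accepted_answers: set[str]) -> float:
    """Return 1.0 (i.e. 100%) if the answer matches any of the benchmark's
    accepted answers, otherwise 0.0."""
    return 1.0 if annotator_answer in accepted_answers else 0.0

# A "Camera Angle" benchmark accepting two answers, as in the example above:
accepted = {"Top-Down", "Orthogonal"}
print(classification_score("Top-Down", accepted))    # 1.0
print(classification_score("Orthogonal", accepted))  # 1.0
print(classification_score("Side View", accepted))   # 0.0
```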

### In bulk

From the *Tasks* tab of your project, select one or multiple tasks using the checkboxes to their left. Then, from the *Items* menu, click on *Set as Benchmark*.

To unset tasks as benchmark, and turn them back to normal tasks, follow the same steps, then click *Remove as Benchmark*.

<figure><img src="https://3895963154-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FTcOUG6rfWxqGM0N4db2P%2Fuploads%2F8T4LB8eOPJenQ4E0WDJg%2FMar-01-2024%2014-04-40.gif?alt=media&#x26;token=8975dd3a-b163-4ad3-94ca-5e6826798772" alt=""><figcaption></figcaption></figure>

The *Set as Benchmark* dialog will appear. Click on *Set as Benchmark* to finalize setting the tasks as benchmarks. The tasks will be shown to all annotators in all labeling stages in the project.

## How to see a task's benchmark answers

From the *Tasks* tab, navigate to the task you'd like to examine the benchmark answers of, and open it.

In the labeling editor, a dropdown will appear at the top, allowing you to flip between answers given by different users to this task:

<figure><img src="https://3895963154-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FTcOUG6rfWxqGM0N4db2P%2Fuploads%2FPmpZkX4T5QiaWa9ur0uv%2Floreshot-20260216-120006%402x.png?alt=media&#x26;token=943d6f73-c38d-4904-a4db-b0ecfbf7bbd1" alt=""><figcaption></figcaption></figure>

When you are done checking out a user's answer, click on *Cancel* on the top right to return to the task.

## Allowing a user to retake a benchmark

From the Tasks tab, navigate to the benchmark you'd like to reset and click on the "Reset Benchmark" button on the right.

<figure><img src="https://3895963154-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FTcOUG6rfWxqGM0N4db2P%2Fuploads%2FVYsh0dMWHC8QXFwaCR7q%2Floreshot-20251219-100835%402x.png?alt=media&#x26;token=84caffc6-1011-40f3-b10d-77fc8f607867" alt=""><figcaption></figcaption></figure>

## What happens when you set task(s) as benchmark

<figure><img src="https://3895963154-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FTcOUG6rfWxqGM0N4db2P%2Fuploads%2F66LTeqX6QLyMQfRZGZrT%2Fimage.png?alt=media&#x26;token=d2e35875-ac3c-4cec-b230-3660bbc27917" alt=""><figcaption></figcaption></figure>

In this dialog, click on *Set as Benchmark* to finalize your benchmark selection.

What this means in practice is that:

1. The tasks you have selected will be marked as benchmarks.
2. The tasks you have selected will be moved to the [*Complete* stage](https://docs.imerit.net/core-concepts/workflow/complete). Since you have marked the task(s) as the gold standard, they are assumed to be complete.

{% hint style="info" %}
Benchmark tasks may be re-queued from *Complete* to other stages. If you do, however, an annotator may annotate the task again, or a reviewer may alter it, changing the benchmark for users who have not yet been tested on it. This is not recommended.

We strongly recommend that tasks marked as benchmark not be re-queued from *Complete* to stages where they can be edited.
{% endhint %}

3. Hub will make copies of the task(s) you have selected, one for each user in the project, and place them in every user's labeling queue in all label-type stages in the project. The tasks created this way are known as "benchmark tasks". Benchmark tasks are not included in the final export, are not sent to *Complete*, and are only shown to users in the stage you have selected in this dialog. They are archived afterwards.\
   \
   All users who annotate in label-type stages will be shown the benchmark tasks. To limit who can see the benchmark tasks, you must limit who can annotate or review in all label-type stages in your project.
4. Tasks selected as benchmarks will be visually distinguished from other tasks, to the project manager only, by the presence of a small yellow crown in their row, both in the *Assets* and *Tasks* tab:<br>

   <figure><img src="https://3895963154-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FTcOUG6rfWxqGM0N4db2P%2Fuploads%2FffK96bMNyQA02eCd5OpX%2Fimage.png?alt=media&#x26;token=c7cfc17c-8fdf-409a-86ba-a8425123a25e" alt=""><figcaption></figcaption></figure>

## How benchmark tasks are shown to users

### Look and Feel

Users will not be able to tell that they are annotating a benchmark task. The task will look and feel exactly like any other task, with no indication whatsoever that the task they are annotating will be utilized in their performance evaluation.

Users will be able to annotate, create issues, skip, save, view instructions, and perform any other action they can normally perform on normal labeling tasks.

The only difference is that completing a benchmark task will not increase the number of "Completed" tasks, as benchmark tasks do not appear in the final export, are not sent to *Complete*, and are only used once in the stage where they have been created to measure the annotator's (or reviewer's) performance.

### Benchmark Frequency

For as long as there are benchmark tasks left in the stage for the user annotating, each time the user clicks on "Submit", the next task shown to them has, by default, a 10% chance of being a benchmark task.

By default, if only benchmark tasks are remaining in the user's queue, the user will be exclusively shown benchmark tasks. However, if you'd like for Ango Hub to stop showing benchmark tasks when the user has completed all of their "regular" tasks, you may enable this toggle:

<figure><img src="https://3895963154-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FTcOUG6rfWxqGM0N4db2P%2Fuploads%2FLFqVmWjHmvGCoQKCj9ld%2Floreshot-20260310-144932%402x.png?alt=media&#x26;token=eec8cca6-993c-46f1-8b3a-5b915105ba67" alt=""><figcaption></figcaption></figure>

The frequency at which benchmark tasks are shown can be changed from the project Settings -> General section, under the benchmark toggle:

<figure><img src="https://3895963154-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FTcOUG6rfWxqGM0N4db2P%2Fuploads%2FsLrpU57JqnmBywRKaqWX%2Fimage.png?alt=media&#x26;token=9c91cd46-bbdc-48c5-ae79-f60f851ce454" alt=""><figcaption></figcaption></figure>

By default, there is a 10% chance that, whenever an annotator submits a task, the next task they will receive will be a benchmark (if there are any left for the user to annotate).

Setting this to 0% will cause benchmark tasks to stop being shown to users; setting it to 100% will cause only benchmark tasks to be shown until all of them have been completed, after which users will be shown normal labeling tasks.
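The queue behavior described in this section can be modeled with a short sketch (illustrative only; this is a simplified model of the behavior described above, not Ango Hub's actual scheduler):

```python
import random

def next_task_is_benchmark(benchmark_rate: float,
                           benchmarks_left: int,
                           regular_left: int) -> bool:
    """With probability `benchmark_rate`, the next task is a benchmark,
    provided any benchmarks remain for this user. If only benchmarks
    remain, the user is shown them exclusively (the default behavior)."""
    if benchmarks_left == 0:
        return False
    if regular_left == 0:
        return True
    return random.random() < benchmark_rate

# With the default 10% rate, roughly 1 in 10 submissions yields a benchmark:
random.seed(0)
draws = [next_task_is_benchmark(0.10, benchmarks_left=5, regular_left=100)
         for _ in range(10_000)]
print(sum(draws) / len(draws))  # close to the configured 0.10
```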

## Analyzing annotator performance

As project manager, you can see the performance of each user, as well as the performance of each benchmark question.

To see each user's performance, enter the *Performance* tab. Each user's average benchmark score will be shown on the user's row:

<figure><img src="https://3895963154-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FTcOUG6rfWxqGM0N4db2P%2Fuploads%2FgwWXRair8zSOhYYhXVqr%2Fimage.png?alt=media&#x26;token=c140a82d-7170-4e2d-85e7-c2f8f14293bf" alt=""><figcaption></figcaption></figure>

To see the performance of each benchmark question, from the *Tasks* tab, filter by *Benchmark*. You will then only see tasks which have been set as benchmark.

<figure><img src="https://3895963154-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FTcOUG6rfWxqGM0N4db2P%2Fuploads%2FPylFGMAau7lPY48ojwZZ%2Fimage.png?alt=media&#x26;token=5567f813-a364-4c05-b810-518e8f03edec" alt=""><figcaption></figcaption></figure>

Click on the "+" icon next to a benchmark task to see each user's answers and score as it relates to that benchmark question:

<figure><img src="https://3895963154-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FTcOUG6rfWxqGM0N4db2P%2Fuploads%2F2c0zmlWK1R0mDO4oaURV%2Fimage.png?alt=media&#x26;token=ff9fa4e8-1a07-4637-9895-6f881856f26e" alt=""><figcaption></figcaption></figure>

From each benchmark's row, you may see the average benchmark score for that question, as well as the number of annotators who have submitted an answer to that benchmark:

<figure><img src="https://3895963154-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FTcOUG6rfWxqGM0N4db2P%2Fuploads%2FNZYfCoGtvMTrORKRZeiH%2Fimage.png?alt=media&#x26;token=7f587428-7795-44e3-9baf-a21d12bca002" alt=""><figcaption></figcaption></figure>

You may also download a JSON file containing all information on all tasks used to benchmark users from *Settings* -> *General* -> *Export Benchmark Tasks*.

<figure><img src="https://3895963154-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FTcOUG6rfWxqGM0N4db2P%2Fuploads%2FZcgyXWX4VcJXJ54iU1wY%2Fimage.png?alt=media&#x26;token=8a636c30-8b5c-4ae4-9ce2-76eeab650d5c" alt=""><figcaption></figcaption></figure>

## Properties of benchmark tasks

Benchmark tasks shown to users:

* Do not get sent to Complete
* Do not contribute to completion statistics (e.g. the "Tasks Completed" number will not go up as benchmark tasks are completed)
* Do contribute to all other statistics (TPT, etc.)
* Are immediately archived after being submitted (i.e. they are available for project managers to inspect, but they are not present in any stage)

## FAQ

### What is the algorithm used to calculate the benchmark score?

#### Classifications

Let `questionCount` be the total number of classification questions in the project, and `taskCount` the total number of tasks assigned to an asset.

We calculate `x`, the single-question score for a single task, as `sameAnswers / (taskCount - 1)`, where `sameAnswers` is the number of other tasks whose answer matches the current one (the current task excluded).

We repeat the above calculation for all tasks on the asset; the sum of these results is represented as `∑(x)` below.

We calculate `y`, the overall score on a single question (classification), as `∑(x) / taskCount`.

We repeat the above calculation for all questions on the asset; the sum of these results is represented as `∑(y)` below.

The final score, then, is calculated as `∑(y) / questionCount`.
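The formula above can be sketched as a short Python function (variable names follow the description; this is an illustrative worked example, not Ango Hub's actual code, and it assumes at least two tasks per asset):

```python
# answers[q][t] is the answer the t-th task gives to question q.
def benchmark_score(answers: list[list[str]]) -> float:
    question_count = len(answers)
    task_count = len(answers[0])  # assumes taskCount >= 2
    sum_y = 0.0
    for per_question in answers:
        sum_x = 0.0
        for i, answer in enumerate(per_question):
            # sameAnswers: how many *other* tasks gave the same answer
            same = sum(1 for j, other in enumerate(per_question)
                       if j != i and other == answer)
            sum_x += same / (task_count - 1)  # x for this task
        sum_y += sum_x / task_count           # y for this question
    return sum_y / question_count             # final score

# Two questions, three tasks. Question 1: all agree; question 2: two of three agree.
answers = [
    ["A", "A", "A"],
    ["B", "B", "C"],
]
print(benchmark_score(answers))  # 0.6666666666666666 (about 67%)
```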

{% hint style="info" %}
**Note on Rank Benchmarking**

In the [Rank](https://docs.imerit.net/labeling/labeling-tools/classification-tools/rank) classification tool, if the annotator's answer differs in any way from the benchmark, their score for that classification will be 0. If it is exactly the same, it will be 1 (i.e. 100%) for that classification.
{% endhint %}
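The all-or-nothing Rank scoring described in the note above amounts to an exact-match comparison (illustrative sketch):

```python
def rank_score(annotator_order: list[str], benchmark_order: list[str]) -> float:
    """Rank benchmarking is all-or-nothing: any difference scores 0,
    an exact match scores 1 (i.e. 100%)."""
    return 1.0 if annotator_order == benchmark_order else 0.0

print(rank_score(["A", "B", "C"], ["A", "B", "C"]))  # 1.0
print(rank_score(["A", "C", "B"], ["A", "B", "C"]))  # 0.0
```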

#### Objects (Bounding Box, Polygon...)

We calculate benchmarks for objects using the Intersection over Union (IoU) method.

We compare objects with one another to generate their IoU scores. If two annotations are completely separate, with not even a pixel in common, their IoU score is 0. If they overlap completely, their score is 100.

We then average the IoU scores of all annotations to calculate the final score.
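A minimal sketch of the IoU calculation for axis-aligned bounding boxes (illustrative; Ango Hub's internal implementation may differ):

```python
def iou(box_a: tuple, box_b: tuple) -> float:
    """Intersection over Union for two axis-aligned boxes given as
    (x1, y1, x2, y2) with x1 < x2 and y1 < y2."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0

print(iou((0, 0, 10, 10), (0, 0, 10, 10)))    # 1.0 -> a 100% score
print(iou((0, 0, 10, 10), (20, 20, 30, 30)))  # 0.0 -> no overlap
print(iou((0, 0, 10, 10), (5, 0, 15, 10)))    # 0.3333333333333333 (50 / 150)
```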

#### Points

See [Consensus/Points](https://docs.imerit.net/workflow/consensus#points).

### Can I edit a benchmark task after it has been set?

Yes. Open the task from the *Tasks* or the *Assets* tab, edit it, and save it.

Existing benchmark scores will not be changed. Users who have not yet been benchmarked on the task will see the new, updated task, and they will be tested on this new version of the task.

### What happens to a user's benchmark score after I have edited a benchmark?

The benchmark score remains unchanged. Once a user is tested on a benchmark task, their score regarding that task is unchangeable. Users who have not yet seen the benchmark task, however, will be tested on the edited version of the task.

### What tools and classification types are included in benchmark calculations?

<table><thead><tr><th width="229.06640625">Tool / Classification Type</th><th width="194.5859375">Benchmark supported</th><th>Notes (if any)</th></tr></thead><tbody><tr><td><strong>Tools</strong></td><td></td><td></td></tr><tr><td>Bounding Box</td><td><span data-gb-custom-inline data-tag="emoji" data-code="2705">✅</span></td><td></td></tr><tr><td>Rotated Bounding Box</td><td><span data-gb-custom-inline data-tag="emoji" data-code="274c">❌</span></td><td></td></tr><tr><td>Polygon</td><td><span data-gb-custom-inline data-tag="emoji" data-code="2705">✅</span></td><td></td></tr><tr><td>Polyline</td><td><span data-gb-custom-inline data-tag="emoji" data-code="274c">❌</span></td><td></td></tr><tr><td>Segmentation</td><td><span data-gb-custom-inline data-tag="emoji" data-code="274c">❌</span></td><td></td></tr><tr><td>Brush</td><td><span data-gb-custom-inline data-tag="emoji" data-code="274c">❌</span></td><td></td></tr><tr><td>Voxel Brush</td><td><span data-gb-custom-inline data-tag="emoji" data-code="274c">❌</span></td><td></td></tr><tr><td>Entity</td><td><span data-gb-custom-inline data-tag="emoji" data-code="2705">✅</span></td><td></td></tr><tr><td>Point</td><td><span data-gb-custom-inline data-tag="emoji" data-code="2705">✅</span></td><td></td></tr><tr><td>Circle</td><td><span data-gb-custom-inline data-tag="emoji" data-code="274c">❌</span></td><td></td></tr><tr><td>PDF</td><td><span data-gb-custom-inline data-tag="emoji" data-code="2705">✅</span></td><td></td></tr><tr><td>Message</td><td><span data-gb-custom-inline data-tag="emoji" data-code="274c">❌</span></td><td></td></tr><tr><td>Angle</td><td><span data-gb-custom-inline data-tag="emoji" data-code="274c">❌</span></td><td></td></tr><tr><td><strong>Classifications</strong></td><td></td><td></td></tr><tr><td>Radio</td><td><span data-gb-custom-inline data-tag="emoji" data-code="2705">✅</span></td><td></td></tr><tr><td>Checkbox</td><td><span data-gb-custom-inline data-tag="emoji" 
data-code="2705">✅</span></td><td></td></tr><tr><td>Single-select dropdown</td><td><span data-gb-custom-inline data-tag="emoji" data-code="2705">✅</span></td><td></td></tr><tr><td>Multi-select dropdown</td><td><span data-gb-custom-inline data-tag="emoji" data-code="2705">✅</span></td><td></td></tr><tr><td>Single-select tree</td><td><span data-gb-custom-inline data-tag="emoji" data-code="2705">✅</span></td><td></td></tr><tr><td>Multi-select tree</td><td><span data-gb-custom-inline data-tag="emoji" data-code="2705">✅</span></td><td></td></tr><tr><td>Text</td><td><span data-gb-custom-inline data-tag="emoji" data-code="2705">✅</span></td><td>Benchmark is 0% if the texts differ (even by a single character), and 100% if they are exactly the same.</td></tr><tr><td><strong>Relations</strong></td><td></td><td></td></tr><tr><td>Single</td><td><span data-gb-custom-inline data-tag="emoji" data-code="274c">❌</span></td><td></td></tr><tr><td>Group</td><td><span data-gb-custom-inline data-tag="emoji" data-code="274c">❌</span></td><td></td></tr></tbody></table>


---

# Agent Instructions: Querying This Documentation

If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter:

```
GET https://docs.imerit.net/core-concepts/benchmarks.md?ask=<question>
```

The question should be specific, self-contained, and written in natural language.
The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.
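For example, a question can be URL-encoded and sent with Python's standard library (the endpoint and `ask` parameter come from the documentation above; the question text is just an example):

```python
from urllib.parse import quote

# Build an `ask` query against this page.
question = "How is the benchmark score calculated for polygons?"
url = f"https://docs.imerit.net/core-concepts/benchmarks.md?ask={quote(question)}"
print(url)

# To actually fetch the answer (requires network access):
# from urllib.request import urlopen
# with urlopen(url) as resp:
#     print(resp.read().decode())
```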

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.
