
Benchmarks

Set certain labeling tasks as the gold standard and measure annotator performance.



Ango Hub allows project managers to mark certain labeling tasks as benchmark (also known as 'Test Question' or 'Gold Standard' in other environments). Benchmarks allow the project manager to measure the performance of annotators.

Enabling benchmarking for your project

Benchmarking is not enabled by default in new projects.

To enable benchmarking, navigate to your project's Settings page, then to the General section. Enable the toggle next to Benchmark.

Click on the Save button at the bottom of the page.

From the same menu, you may also choose the likelihood of annotators being shown a benchmark task whenever they are shown a new task from the queue. By default, it is 10%, meaning that each time an annotator clicks on "Submit" and a new task is shown to them, there is a 10% chance that task is a benchmark (if any benchmarks are left to annotate for that user).

Disabling benchmarking in a project where benchmark tests have already taken place will delete all benchmarking information collected so far in the project.

It is strongly recommended not to disable benchmarking in a project after it has been enabled.

If you wish to pause benchmarking on your project, you may set the likelihood annotators are shown a benchmark to 0% in the project settings. This will cause benchmark tasks not to appear to annotators.

Setting and removing benchmark tasks

Hub allows you to mark existing tasks as benchmarks. The task must already exist in the project: you cannot mark tasks as benchmarks during asset upload. Assets must first be uploaded; only then can their tasks be marked.

Tasks may be marked as benchmark one at a time (single) or in bulk.

Single

Navigate to and open the task you would like to set as benchmark. For example, you may click on the task from the Assets or the Tasks tab.

Once you have opened the task, from the three-dot menu at the top right of the labeling editor, click on Set as Benchmark.

To unset a task as benchmark, and turn it back to a normal task, follow the same steps, then click Remove as Benchmark. You may need to refresh the page if the benchmark status was changed recently.

The Set as Benchmark dialog will appear. Click on Set as Benchmark to finalize setting the task as benchmark. The task will be shown to all annotators in all labeling stages in the project.

Marking more than one classification answer as correct

You can configure a classification so that more than one answer is accepted for a 100% benchmark score on that classification.

To do so, before you mark the task as benchmark, or after you have opened a benchmark task, look at the left-hand side of the screen and find the classification for which you would like to create a new potentially correct answer.

Answer the classification with the first correct answer. Then, click on "OR":

A new classification answer will appear below. You may then answer it to add a new potentially correct answer. Keep on clicking on "OR" to add more correct answers. Save the task.

In the example above, annotators may now answer either "Top-Down" or "Orthogonal" to the "Camera Angle" classification and will, in both cases, receive a 100% score for that classification.
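
As a simple illustration of this scoring rule, the sketch below accepts either answer for a 100% score. The function and variable names are illustrative and not part of Ango Hub:

```python
# Illustrative sketch only: a classification scores 100% when the annotator's
# answer matches any of the accepted benchmark answers, and 0% otherwise.
def classification_score(annotator_answer: str, accepted_answers: set[str]) -> float:
    """Return 1.0 (100%) if the answer is one of the accepted answers, else 0.0."""
    return 1.0 if annotator_answer in accepted_answers else 0.0

# "Camera Angle" benchmark accepting either answer, as in the example above.
accepted = {"Top-Down", "Orthogonal"}
print(classification_score("Top-Down", accepted))   # 1.0
print(classification_score("Isometric", accepted))  # 0.0 ("Isometric" is a hypothetical wrong answer)
```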

In bulk

From the Tasks tab of your project, select one or multiple tasks using the checkboxes to their left. Then, from the Items menu, click on Set as Benchmark.

To unset tasks as benchmark, and turn them back to normal tasks, follow the same steps, then click Remove as Benchmark.

The Set as Benchmark dialog will appear. Click on Set as Benchmark to finalize setting the task as benchmark. The task will be shown to all annotators in all labeling stages in the project.

What happens when you set task(s) as benchmark

In the Set as Benchmark dialog, you click on Set as Benchmark to finalize your benchmark selection.

What this means in practice is that:

  1. The tasks you have selected will be marked as benchmarks and moved to the Complete stage. Since you have marked the task(s) as the gold standard, they are assumed to be complete.

Benchmark tasks may be re-queued from Complete to other stages. If that happens, however, an annotator may annotate the task again, or a reviewer may alter it, changing the benchmark for users who have not yet been tested on it. This is not recommended.

We strongly recommend tasks marked as benchmark not be re-queued from Complete to other stages where they can be edited.

  2. Hub will make copies of the task(s) you have selected, one for each user in the project, and place them in every user's labeling queue in all label-type stages in the project. The tasks created this way are known as "benchmark tasks". Benchmark tasks are not included in the final export, are not sent to Complete, and are only shown to users in the stage you have selected in this dialog. They are archived afterwards. All users who annotate in label-type stages will be shown the benchmark tasks. To limit who can see the benchmark tasks, you must limit who can annotate or review in all label-type stages in your project.

  3. Tasks selected as benchmarks will be visually distinguished from other tasks, to the project manager only, by the presence of a small yellow crown in their row, both in the Assets and Tasks tabs:

How benchmark tasks are shown to users

Look and Feel

Users will not be able to tell that they are annotating a benchmark task. The task will look and feel exactly like any other task, with no indication whatsoever that the task they are annotating will be utilized in their performance evaluation.

Users will be able to annotate, create issues, skip, save, view instructions, and perform any other action they can normally perform on normal labeling tasks.

The only difference is that completing a benchmark task will not increase the number of "Completed" tasks, as benchmark tasks do not appear in the final export, are not sent to Complete, and are only used once in the stage where they have been created to measure the annotator's (or reviewer's) performance.

Benchmark Frequency

For as long as there are benchmark tasks left in the stage for the annotating user, whenever that user clicks on "Submit", the next task shown to them has, by default, a 10% chance of being a benchmark task.

If only benchmark tasks are remaining in the user's queue, the user will be exclusively shown benchmark tasks.

This frequency can be changed from the project Settings -> General section, under the benchmark toggle:

By default, there is a 10% chance that, whenever an annotator submits a task, the next task they will receive will be a benchmark (if there are any left for the user to annotate).

Setting this to 0% will stop benchmark tasks from being shown to users, while setting it to 100% will cause only benchmark tasks to be shown until all of them are done. Users will then be shown normal labeling tasks.
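
As a purely illustrative sketch of the behavior described above (not Ango Hub's actual implementation), the selection logic amounts to the following: on every submit, serve a benchmark task with the configured probability as long as any remain, and serve only benchmark tasks once nothing else is left.

```python
import random

# Illustrative sketch of the queue behavior described above;
# not Ango Hub's actual implementation.
def next_task(normal_tasks: list, benchmark_tasks: list, benchmark_chance: float = 0.10):
    """Pick the next task to show an annotator after they submit."""
    if not normal_tasks:
        # Only benchmark tasks are left: serve them exclusively.
        return benchmark_tasks.pop() if benchmark_tasks else None
    if benchmark_tasks and random.random() < benchmark_chance:
        # With the configured likelihood (10% by default), serve a benchmark.
        return benchmark_tasks.pop()
    return normal_tasks.pop()
```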

Analyzing annotator performance

As project manager, you can see the performance of each user, as well as the performance of each benchmark question.

To see each user's performance, enter the Performance tab. Each user's average benchmark score will be shown on the user's row:

To see the performance of each benchmark question, from the Tasks tab, filter by Benchmark. You will then only see tasks which have been set as benchmark.

Click on the "+" icon next to a benchmark task to see each user's answers and score as it relates to that benchmark question:

From each benchmark's row, you may see the average benchmark score for that question, as well as the number of annotators who have submitted an answer to that benchmark:

You may also download a JSON containing all information on all tasks used to benchmark users from Settings -> General -> Export Benchmark Tasks.
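
If you want to inspect that export programmatically, a minimal sketch could look like the following. The file name is a placeholder, and the export is only assumed to be valid JSON; check the downloaded file for its exact structure:

```python
import json

# Load the file downloaded from Settings -> General -> Export Benchmark Tasks.
# "benchmark_export.json" is a placeholder name; the exact schema is not
# documented here, so we only inspect the top-level structure.
with open("benchmark_export.json", encoding="utf-8") as f:
    data = json.load(f)

print(type(data))  # typically a list or dict, depending on the export structure
if isinstance(data, list):
    print(f"{len(data)} benchmark task records")
```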

Properties of benchmark tasks

Benchmark tasks shown to users:

  • Do not get sent to Complete

  • Do not contribute to completion statistics (e.g. the "Tasks Completed" number will not go up as benchmark tasks are completed)

  • Do contribute to all other statistics (TPT, etc.)

  • Are immediately archived after being submitted (i.e. they remain available for project managers to inspect, but they are not present in any stage)

FAQ

What is the algorithm used to calculate the benchmark score?

Classifications

Let questionCount be the total number of classification questions in the project, and taskCount the total number of tasks assigned to an asset.

We calculate x, the single-question score for a single task, as (sameAnswers / (taskCount - 1)), where sameAnswers is the count of answers that are equal to one another, excluding the current one.

We repeat the above calculation for all tasks in the asset and sum the results; the sum is represented as Σ(x) below.

We calculate y, the overall score on a single question (classification), as (Σ(x) / taskCount).

We repeat the above calculation for all questions in the asset and sum the results; the sum is represented as Σ(y) below.

The final score, then, is calculated as Σ(y) / questionCount.
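
As a worked sketch of the calculation above (illustrative code, not Ango Hub's implementation), consider one asset with two classification questions and three tasks:

```python
# Illustrative implementation of the calculation described above; it assumes
# each question was answered by at least two tasks. Not Ango Hub's actual code.
def score_asset(answers_per_question: dict[str, list[str]]) -> float:
    """answers_per_question maps each classification question to the list of
    answers given by the tasks assigned to the asset."""
    question_count = len(answers_per_question)
    sum_y = 0.0
    for answers in answers_per_question.values():
        task_count = len(answers)
        # x for each task: sameAnswers / (taskCount - 1), where sameAnswers
        # counts the other tasks whose answer equals the current one.
        xs = [
            sum(1 for j, other in enumerate(answers) if j != i and other == answer)
            / (task_count - 1)
            for i, answer in enumerate(answers)
        ]
        sum_y += sum(xs) / task_count  # y = Σ(x) / taskCount for this question
    return sum_y / question_count      # final score = Σ(y) / questionCount

answers = {
    "Camera Angle": ["Top-Down", "Top-Down", "Orthogonal"],  # two agree, one differs
    "Weather": ["Sunny", "Sunny", "Sunny"],                  # all agree
}
print(score_asset(answers))  # (1/3 + 1) / 2 ≈ 0.667
```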

For a note on benchmarking Rank classifications, see the Rank page.

Objects (Bounding Box, Polygon...)

We calculate benchmarks for objects using the Intersection over Union (IoU) method.

We compare objects with one another to generate their IoU scores. If two annotations are completely separate, with not even a pixel in common, their IoU score is 0; if they overlap completely, their score is 100.

We then average the IoU scores of all annotations to calculate the final score.
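
As an illustration, the IoU of two axis-aligned bounding boxes can be computed as in the sketch below (Ango Hub's internal implementation is not shown here); multiplying the result by 100 gives the 0 to 100 score mentioned above:

```python
# Illustrative IoU (Intersection over Union) for two axis-aligned boxes given
# as (x_min, y_min, x_max, y_max) tuples; not Ango Hub's internal code.
def iou(box_a, box_b) -> float:
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Intersection rectangle (zero area if the boxes do not overlap).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    intersection = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - intersection
    return intersection / union if union else 0.0

print(iou((0, 0, 10, 10), (5, 5, 15, 15)))    # 25 / 175 ≈ 0.143
print(iou((0, 0, 10, 10), (20, 20, 30, 30)))  # 0.0 (no overlap)
```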

Points

See Consensus/Points.

Can I edit a benchmark task after it has been set?

Yes. Open the task from the Tasks or the Assets tab, edit it, and save it.

Existing benchmark scores will not be changed. Users who have not yet been benchmarked on the task will see the new, updated task, and they will be tested on this new version of the task.

What happens to a user's benchmark score after I have edited a benchmark?

The benchmark score remains unchanged. Once a user is tested on a benchmark task, their score regarding that task is unchangeable. Users who have not yet seen the benchmark task, however, will be tested on the edited version of the task.

By default, the score is calculated against classification answers as they were when the task was set as benchmark. For example, if a classification with three answers, A, B, and C, had "B" marked as the benchmark answer, annotators who give any other answer will receive a 0% score for that classification.

In the classification tool, if the annotator's answer differs, in any way, from the benchmark, their score for that classification will be 0. If it is exactly the same, it will be 1 (i.e. 100%) for that classification.
