Benchmark & Consensus Export

Overview

The Benchmark & Consensus Export plugin is an export plugin designed to measure the agreement between multiple annotators and generate a comprehensive CSV report summarizing the results.

It analyzes annotations across annotators, compares them using selected metrics, and produces CSV reports to support quality assessment, benchmarking, and adjudication workflows.


Plugin Functionality

The Benchmark & Consensus Export plugin analyzes annotations produced by multiple annotators and generates structured CSV outputs for quality evaluation and benchmarking. The plugin is fully configurable using calculation modes and text comparison metrics.

  • The plugin processes tasks in one of two modes: consensus generation or benchmark evaluation. In consensus mode, annotations from multiple annotators are compared to identify agreement and derive a consensus result. In benchmark mode, annotator outputs are evaluated against ground-truth annotations to measure performance.

  • Compares text-based annotations using a configurable evaluation metric. Supported metrics include exact match, case-insensitive match, and BLEU score.

  • Allows specific annotation classes (schemas) to be excluded from consensus and benchmark calculations. Any schema IDs listed in ignored_schema_ids are ignored during metric computation and report generation.

  • Generates detailed CSV reports summarizing annotator agreement, consensus results, and benchmark metrics.
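The plugin's internal scoring is not exposed, but the three text metrics listed above can be illustrated with a minimal sketch. The `bleu_score` implementation below is a simplified sentence-level BLEU with uniform n-gram weights and a brevity penalty; the plugin's actual BLEU computation may differ in tokenization and smoothing.

```python
import math
from collections import Counter

def exact_match(a: str, b: str) -> float:
    """1.0 only when the two strings are byte-for-byte identical."""
    return 1.0 if a == b else 0.0

def case_insensitive_match(a: str, b: str) -> float:
    """1.0 when the strings match after lowercasing."""
    return 1.0 if a.lower() == b.lower() else 0.0

def bleu_score(candidate: str, reference: str, max_n: int = 4) -> float:
    """Simplified sentence-level BLEU (uniform weights, brevity penalty)."""
    cand, ref = candidate.split(), reference.split()
    if not cand:
        return 0.0
    precisions = []
    for n in range(1, max_n + 1):
        cand_ngrams = Counter(tuple(cand[i:i + n]) for i in range(len(cand) - n + 1))
        ref_ngrams = Counter(tuple(ref[i:i + n]) for i in range(len(ref) - n + 1))
        overlap = sum((cand_ngrams & ref_ngrams).values())  # clipped n-gram matches
        precisions.append(overlap / max(sum(cand_ngrams.values()), 1))
    if min(precisions) == 0:
        return 0.0
    log_avg = sum(math.log(p) for p in precisions) / max_n
    # Brevity penalty discourages overly short candidates.
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / len(cand))
    return bp * math.exp(log_avg)
```

For example, `exact_match("Cat", "cat")` is `0.0` while `case_insensitive_match("Cat", "cat")` is `1.0`, and `bleu_score` of two identical sentences is `1.0`.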

Supported Data Types

  • Compatible with all data types available in AngoHub.

Supported Annotation Tools

  • Classifications

    • Radio

    • Checkbox

    • Single-Dropdown

    • Multi-Dropdown

    • Tree Dropdown

    • Text

    • Nested Classifications

    • Multiple Classifications

  • Tools

    • Bounding-Box

    • Rotated Bounding-Box

    • Polygon

    • Polyline

    • Segmentation

    • Entity

    • Point

    • Brush

    • Voxel Brush


Plugin Configuration

From the Stage Filter field, select the stages whose tasks you would like to export. Similarly, from the Batch Filter field, you may select one or more batches whose tasks will be exported. By default, all tasks from all batches are exported.

If you wish to receive an email when the export is complete, toggle Send Email on.

You may vary a number of settings related to your export from the Config JSON field. Each option is detailed below:

  • "mode": Determines what types of tasks the plugin will process.

    • Options:

      • "consensus"

      • "benchmark"

  • "text_metric": Defines the metric used to compare text-based annotations.

    • Options:

      • "exact_match"

      • "case_insensitive_match"

      • "bleu_score"

  • "ignored_schema_ids": Specifies which classes should be excluded from consensus and benchmark calculations.

    • Example:

      • "ignored_schema_ids": ["12345", "12346"]

  • "include_key_frames_only": Specifies whether only key frames should be included in the metrics calculation. (For video assets only)

    • Example:

      • "include_key_frames_only": true

      • "include_key_frames_only": false

  • "logging_frequency": Defines how frequently progress logs are displayed; setting this value to 0 disables logging entirely, while any positive integer enables logging at the specified interval. For more information on how to view plugin logs, see here.

    • Example:

      • "logging_frequency": 0

      • "logging_frequency": 100
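Putting the options above together, a complete Config JSON might look like the following (the values are illustrative; the schema IDs are placeholders):

```json
{
  "mode": "consensus",
  "text_metric": "exact_match",
  "ignored_schema_ids": ["12345", "12346"],
  "include_key_frames_only": false,
  "logging_frequency": 100
}
```

This configuration runs the plugin in consensus mode, compares text annotations by exact match, skips the two listed schemas, evaluates all frames of video assets, and logs progress every 100 tasks.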
