Consensus

Because of the way the Consensus mechanism works under the hood, logic stages of the type "Annotator" and "Duration" may not work as expected when processing tasks output from a Consensus stage.

Requeuing tasks that have open issues after they have been output from a Consensus stage might lead to unexpected behavior regarding those issues. We recommend closing all issues on such tasks before requeuing them.

A Consensus stage lets you present the same task to multiple annotators and output it as either Agreement or Disagreement, depending on how much the annotators agree with one another.

Essentially, the Consensus stage is a container for other Label or Plugin sub-stages.

The Consensus stage accepts plugin sub-stages, such that, for example, you can have a task be labeled by an annotator and a plugin, and you may return the task based on how similarly the annotator labeled the task compared to the plugin.

You may add a maximum of ten sub-stages to the Consensus stage.

By default, the Consensus stage does not prevent the same person from labeling the same task more than once. To prevent this, assign different annotators to different label stages, as described in the section for the Label stage. You can do this automatically by clicking on Auto Assign in the settings for the Consensus stage.

See the Auto Assign section for more details.

Consensus agreement cannot be calculated for the following class types:

- Brush
- Voxel Brush
- Segmentation
- Polyline
- Rotated Bounding Box

In video assets, consensus calculation ignores pages. Because of this, we do not recommend using consensus in video-based tasks yet.

Diagram of how Consensus works

As mentioned in the diagram above, whenever a task enters the Consensus stage, it is 'duplicated' into sub-tasks, and each sub-task is sent to its own sub-stage.

You may examine individual sub-tasks and check their current status from the "Tasks" tab, by clicking on the "Plus" next to the Consensus task to expand it and see details pertaining to the sub-tasks:

If a sub-task is in the "Archive" stage it means it has been completed and submitted.

If a sub-task has been assigned to a specific annotator, but you wish to unassign it from that person and place it back in the queue for that specific consensus sub-stage, you can click on the "Un-assign and send back to queue" button for that sub-task:

Once all sub-tasks have been annotated, they are archived and are no longer accessible through the "Tasks" tab. They remain accessible, however, from the "Stage History" panel in the labeling editor when you open the main task.

Settings

Auto Assign

By default, Ango Hub does not prevent the same annotator from annotating the same asset more than once as part of a consensus stage.

For example, suppose you add two Label stages, Consensus_1 and Consensus_2, both of which can be annotated by Anyone.

Labeler A will open their labeling queue and go through the tasks in Consensus_1.

If no other annotator has yet opened the tasks annotated by Labeler A, and Labeler A clicks on Start Labeling and enters the labeling queue, they may enter the Consensus_2 queue and label the same tasks a second time. In that case, consensus is not calculated between two different annotators, as you would normally expect, because the same annotator has labeled both sub-tasks.

To prevent this, you'd have to assign each labeling stage in consensus to different annotators. Auto Assign automates this process for you.

From the Consensus stage settings, click on Auto Assign. The following dialog will pop up:

Toggle on the users you'd like to assign to the stages within the selected consensus container, and they'll be distributed to every consensus stage in the container. If, after doing so, there are no consensus stages in your container assigned to Anyone, then you have guaranteed that no labeler will see the same task twice.

Stage Settings

Setup

Clicking on Add Label adds a label stage. Clicking on Add Plugin adds a plugin stage. Click on an individual stage to change its options. Enable the grey toggle to mark the stage as dynamic (see the section on Dynamic Consensus). Click on the trash can to delete the stage.

Threshold

From this view, you can choose what counts as Agreement and what counts as Disagreement. You will see a list of the labeling tools present in your project.

To have a tool be included in the Consensus calculation, enable the toggle next to it.

In the example above, we have three tools: a bounding box named Vehicle, a radio classification named Color, and a single dropdown named Model. In this example, the task will be considered in agreement when at least 30% of the annotators give the same answer to Color, and at least 30% of annotators give the same answer to Model. When both of these conditions are satisfied, the task is marked as being in Agreement.

Since the "Vehicle" bounding box had its toggle turned off, annotations from that class will not be counted in the consensus score calculation.
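The decision described in the example above can be sketched as follows. This is a minimal illustration, not Ango Hub's actual implementation: the function name `is_agreement`, the dictionary layout, and the sample scores are all hypothetical, and treating a missing score as 0 is an assumption.

```python
from typing import Dict


def is_agreement(scores: Dict[str, float], thresholds: Dict[str, float]) -> bool:
    """Return True when every enabled tool meets its threshold.

    scores:     tool name -> consensus score in percent (hypothetical layout).
    thresholds: only the tools whose toggle is ON -> required percent.
    Tools that are absent from `thresholds` (toggle off) are ignored.
    A tool with no score is treated as 0 (an assumption).
    """
    return all(scores.get(tool, 0.0) >= required
               for tool, required in thresholds.items())


# Color and Model are enabled at 30%; the Vehicle toggle is off.
thresholds = {"Color": 30.0, "Model": 30.0}
scores = {"Vehicle": 10.0, "Color": 66.7, "Model": 33.3}  # made-up scores

print(is_agreement(scores, thresholds))  # True
```

The low Vehicle score does not affect the result because that tool is not part of `thresholds`.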

Adjudication

The task sent as output is not the judgment from a single annotator – it is instead a composite task, the contents of which will be determined by the adjudication method you pick here.

Best Answer

The output task contains the annotations with the highest consensus score, for each class, for classes where consensus can be calculated.

For example, if the consensus stage has three judgment sub-stages, and the task has three radio classifications A, B, and C, and one bounding box class D, the task output at the end will have, for each classification, the answer annotators coalesced on the most, and for class D, the bounding boxes created by the annotator with the highest class D consensus score.
For classes where consensus cannot be calculated (e.g. assume in our project there is a points class E and a rotated bounding box class F), the final task will have the non-calculable classes from the first user who has submitted them in the consensus stage.
So in this case, we would have the best answers from classes A, B, and C, then the bounding boxes drawn by the user with the highest class D consensus score, and for classes E and F, we would have the answers given by the first user to submit them in the consensus stage.

If, in the Consensus stage, some annotators did not create annotations for a certain class, or did not answer some classification questions, but others did, the output task will still contain those annotations and answers, even though not all consensus annotators provided them.

For example, if we have a project with a bounding box class A, a polygon class B, a radio classification C, and a text classification D, assuming:
- User 1 only created 1 bounding box with class A, and answered the radio classification C (no other answers/annotations)
- User 2 only created 1 bounding box with class A, and a polygon with class B (no other answers/annotations)
- User 3 only created 1 bounding box with class A, answered the text classification D (no other answers/annotations)
The output composite task will have:
- The class A bounding boxes drawn by the user with the highest class A consensus score
- The class B polygons created by User 2
- The class C radio answer from User 1
- The class D text answer from User 3
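The selection in the example above can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the platform's code: the `best_answer` function, the data layout, and the class A scores are hypothetical. It assumes that per-user consensus scores are already available for calculable classes, and that a class without scores falls back to the first user who submitted it.

```python
from typing import Any, Dict, List, Optional, Tuple

# (user, {class name: annotations or answer}), listed in submission order
Submission = Tuple[str, Dict[str, Any]]


def best_answer(submissions: List[Submission],
                scores: Dict[Tuple[str, str], float]) -> Dict[str, Tuple[str, Any]]:
    """Pick, for each class, the contribution of one user.

    scores maps (user, class) -> consensus score; a missing entry means
    no score is available for that user and class.
    """
    chosen: Dict[str, Tuple[str, Any, Optional[float]]] = {}
    for user, annotations in submissions:
        for cls, value in annotations.items():
            score = scores.get((user, cls))
            current = chosen.get(cls)
            if current is None:
                # First user to submit this class becomes the default.
                chosen[cls] = (user, value, score)
            elif score is not None and (current[2] is None or score > current[2]):
                # A strictly higher consensus score replaces the current pick.
                chosen[cls] = (user, value, score)
    return {cls: (user, value) for cls, (user, value, _) in chosen.items()}


submissions = [
    ("User 1", {"A": ["box"], "C": "radio answer"}),
    ("User 2", {"A": ["box"], "B": ["polygon"]}),
    ("User 3", {"A": ["box"], "D": "text answer"}),
]
# Hypothetical class A scores; only class A was annotated by everyone.
scores = {("User 1", "A"): 80.0, ("User 2", "A"): 90.0, ("User 3", "A"): 70.0}

composite = best_answer(submissions, scores)
for cls in sorted(composite):
    print(cls, "from", composite[cls][0])
```

With these made-up scores, class A is taken from User 2 (the highest class A score), while B, C, and D come from the only user who provided each of them.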

Here is a visual representation of the algorithm, given three annotators working on the same image:

All Answers

The output task contains all annotations from all consensus stages/judgments, merged together.

Here is a visual representation of the algorithm, given three annotators working on the same image:

How Consensus is Calculated

Classifications

Let questionCount be the total number of classification questions in the project, and taskCount the total number of tasks assigned to an asset.

We calculate the single-question consensus for a single task as sameAnswers / taskCount, where sameAnswers is the count of answers that are equal to one another, current one included.

We repeat the above calculation for all tasks in the asset. The overall consensus on a single question (classification), y, is the highest value achieved across these repetitions.

We then repeat this for all questions in the asset and sum the per-question values to obtain ∑(y).

The final consensus score, then, is calculated as ∑(y) / questionCount.
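The steps above can be sketched in a few lines. This is a minimal illustration that assumes every task answers every question; the function name, the data layout, and the sample answers are hypothetical.

```python
from typing import Dict, Hashable, List


def classification_consensus(tasks: List[Dict[str, Hashable]],
                             questions: List[str]) -> float:
    """Compute sum(y) / questionCount as described in the text.

    tasks:     one dict per task (judgment) on the asset, question -> answer.
    questions: the classification questions in the project.
    """
    task_count = len(tasks)
    if task_count == 0 or not questions:
        return 0.0
    total = 0.0
    for question in questions:
        answers = [task[question] for task in tasks]
        # Single-task consensus: answers equal to this one (itself included)
        # divided by taskCount; y is the highest value over all tasks.
        y = max(answers.count(answer) / task_count for answer in answers)
        total += y
    return total / len(questions)


tasks = [
    {"Color": "red", "Model": "sedan"},
    {"Color": "red", "Model": "coupe"},
    {"Color": "blue", "Model": "suv"},
]
print(round(classification_consensus(tasks, ["Color", "Model"]), 3))  # 0.5
```

Here Color reaches y = 2/3 (two of three tasks agree) and Model reaches y = 1/3 (no two tasks agree), so the final score is (2/3 + 1/3) / 2 = 0.5.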

Note on Rank Benchmarking

In the Rank classification tool, if the annotator's answers differ in any way from the benchmark, their score for that classification will be 0. If they are exactly the same, it will be 1 (i.e. 100%) for that classification.

Points

The algorithm checks the proportion between the distance separating two points and the longest distance on the image. For example, let's say we have an image with a height of 500 and a width of 1200. The longest distance on this image (its diagonal) is 1300.

Suppose we have a point from Consensus one at [200, 200] and a point from Consensus two at [500, 600]. The distance between these points is 500. The consensus result for these two points is 1300 / 500 = 2.6, expressed as 26%. Since each point is also compared with itself, which gives 100%, the overall consensus is (26% + 100%) / 2 = 63%.
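The geometric quantities in this example can be checked with a short script. It only reproduces the distances and the final averaging step; the 26% pair score is taken directly from the text rather than derived, and the variable names are illustrative.

```python
import math

width, height = 1200, 500
longest = math.hypot(width, height)  # image diagonal, the longest distance

p1, p2 = (200, 200), (500, 600)
distance = math.dist(p1, p2)         # Euclidean distance between the points

pair_score = 26.0    # percent, as stated in the text for this pair
self_score = 100.0   # a point compared with itself
overall = (pair_score + self_score) / 2

print(longest, distance, overall)
```

The script yields the diagonal of 1300 and the point distance of 500 used above, and the average of the pair score and the self-comparison score gives the overall 63%.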

See the following visual examples:

Other objects (Bounding Box, Polygon, PDF Area)

We calculate consensus for objects using the Intersection over Union (IoU) method.

We compare objects with one another to generate their IoU scores. If some annotations are completely separate, for example, with not even a pixel in common, their IoU score would be 0. If they overlapped completely, their score would be 100.

Note that objects are also compared to themselves. In the non-intersecting example above, each object therefore scores 100 against itself and 0 against the other, for a score of 50. The highest score achieved across all tasks is taken as the consensus score of that tool (a tool here means a unique schema id, not the tool type).

We then average the IoU scores of all tools to calculate the final consensus score.
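As a rough sketch of this procedure for axis-aligned bounding boxes, assuming one box per task for each tool (a simplification; matching several objects per task is not covered), the calculation might look like this. All function names are hypothetical.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def iou(a: Box, b: Box) -> float:
    """Intersection over Union of two axis-aligned boxes, on a 0-100 scale."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return 100.0 * inter / union if union > 0 else 0.0


def tool_score(boxes: List[Box]) -> float:
    """Score one tool: each box is compared with every box, itself included,
    and the highest average is kept."""
    if not boxes:
        return 0.0
    per_box = [sum(iou(box, other) for other in boxes) / len(boxes)
               for box in boxes]
    return max(per_box)


def final_score(tool_scores: List[float]) -> float:
    """Average the per-tool scores into the final consensus score."""
    return sum(tool_scores) / len(tool_scores) if tool_scores else 0.0


separate = [(0, 0, 10, 10), (20, 20, 30, 30)]   # no pixel in common
identical = [(0, 0, 10, 10), (0, 0, 10, 10)]    # complete overlap

print(tool_score(separate))    # 50.0
print(tool_score(identical))   # 100.0
print(final_score([tool_score(separate), tool_score(identical)]))  # 75.0
```

The two separate boxes each score 50 because the self-comparison contributes 100 and the other box contributes 0.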

Output

The Consensus stage has two outputs: Agreement and Disagreement.

If the consensus threshold has been achieved for all labeling tools and classifications specified in the stage setup, the consensus task will be output from the Agreement output. Otherwise, it will be sent from the Disagreement output.

The Output Task

The task you will get as the output will be determined by the method you pick in the stage's Adjudication tab. Please refer to the section on adjudication for more on the task being output.
