Subtask 1
IMPORTANT: please note that there is a closed and an open track for this subtask!
In subtask 1, the goal is to predict labels for each text in a dataset, where the target labels are derived from the original labels assigned by several human annotators.
Following the annotation guidelines, the human annotators rated the strength of misogyny/sexism present in the given text with one of the following labels:
- 0-Kein: no sexism/misogyny present
- 1-Gering: mild sexism/misogyny
- 2-Vorhanden: sexism/misogyny present
- 3-Stark: strong sexism/misogyny
- 4-Extrem: extreme sexism/misogyny
While the annotation guidelines define what kind of sexism/misogyny should be annotated, they give no rules for deciding on the strength. If an annotator decided that sexism/misogyny is present in a text, the strength assigned is therefore a matter of personal judgement.
The labels to predict in subtask 1 reflect different strategies for how multiple labels from annotators can be used to derive a final target label:
- bin_maj: predict 1 if a majority of annotators assigned a label other than 0-Kein, and 0 if a majority of annotators assigned the label 0-Kein. If there is no majority, both 1 and 0 count as correct in the evaluation.
- bin_one: predict 1 if at least one annotator assigned a label other than 0-Kein, 0 otherwise
- bin_all: predict 1 if all annotators assigned labels other than 0-Kein, 0 otherwise
- multi_maj: predict the majority label if there is one; if there is no majority label, any of the assigned labels counts as a correct prediction in the evaluation
- disagree_bin: predict 1 if there is disagreement between annotators on 0-Kein versus all other labels, 0 otherwise
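To make these derivation rules concrete, here is a minimal sketch (not the official evaluation code) of how the five targets can be computed from the list of labels assigned to one example. It assumes that “majority” means strictly more than half of the annotators and uses None for the cases where either prediction is accepted:

```python
from collections import Counter

def derive_targets(labels):
    """Derive the five subtask 1 targets from one example's annotator labels,
    e.g. ["0-Kein", "2-Vorhanden", "1-Gering"]."""
    n = len(labels)
    num_sexist = sum(1 for label in labels if label != "0-Kein")

    # bin_maj: 1 if a majority assigned a non-0-Kein label, 0 if a majority
    # assigned 0-Kein; None marks the tie case, where both 0 and 1 are accepted.
    if num_sexist * 2 > n:
        bin_maj = 1
    elif (n - num_sexist) * 2 > n:
        bin_maj = 0
    else:
        bin_maj = None

    bin_one = int(num_sexist >= 1)   # at least one non-0-Kein label
    bin_all = int(num_sexist == n)   # all labels are non-0-Kein

    # multi_maj: the label assigned by more than half of the annotators,
    # None if no such label exists (any assigned label is then accepted).
    top_label, freq = Counter(labels).most_common(1)[0]
    multi_maj = top_label if freq * 2 > n else None

    # disagree_bin: 1 if annotators disagree on 0-Kein versus all other labels.
    disagree_bin = int(0 < num_sexist < n)

    return {"bin_maj": bin_maj, "bin_one": bin_one, "bin_all": bin_all,
            "multi_maj": multi_maj, "disagree_bin": disagree_bin}
```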
Data
For the trial phase of subtask 1, we provide two small datasets:
- a small labeled dataset containing ‘id’, ‘text’, and ‘annotations’ (annotator ids and the labels they assigned)
- a small unlabeled dataset containing ‘id’, ‘text’, and ‘annotators’ (annotator ids)
For the development phase of subtask 1, we provide all participants with the following data:
- the labeled training set containing ‘id’, ‘text’, and ‘annotations’ (annotator ids and the labels they assigned)
- the unlabeled dev set containing ‘id’, ‘text’, and ‘annotators’ (annotator ids)
For the competition phase of subtask 1, we provide
- the unlabeled test set containing ‘id’, ‘text’, and ‘annotators’ (annotator ids)
All five files are in JSONL format (one JSON-serialized object per line), where each object is a dictionary with the following fields:
- id: a hash that identifies the example
- text: the text to classify. The text can contain arbitrary Unicode and newlines.
- annotations (only in the labeled files): an array of dictionaries which contain the following key/value pairs:
  - user: a string in the form “A003”, which is an anonymized id for the annotator who assigned the label
  - label: the label assigned by the annotator
  - Note that the number of annotations and the specific annotators who assigned labels vary between examples.
- annotators (only in the unlabeled files): an array of the ids of the annotators who labeled the example
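For example, a labeled file can be loaded with a few lines of Python (a minimal sketch; the file name train.jsonl is a placeholder for whichever file the current phase provides):

```python
import json

# Placeholder file name; use the file provided for the current phase.
with open("train.jsonl", encoding="utf-8") as f:
    examples = [json.loads(line) for line in f]

first = examples[0]
print(first["id"], first["text"][:50])
for annotation in first.get("annotations", []):  # present in labeled files only
    print(annotation["user"], annotation["label"])
```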
You can download the data for each phase as soon as the corresponding phase starts.
Submission
Your submission must be a file in TSV (tab-separated values) format which contains the following columns, in any order:
- id: the id of the example in the unlabeled dataset for which the predictions are submitted
- bin_maj: prediction of 0 or 1
- bin_one: prediction of 0 or 1
- bin_all: prediction of 0 or 1
- multi_maj: prediction of one of 0-Kein, 1-Gering, 2-Vorhanden, 3-Stark, 4-Extrem
- disagree_bin: prediction of 0 or 1
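A minimal sketch for writing such a file with Python's csv module (the predictions list, the example id, and the values shown are made-up placeholders; it assumes a header row is expected):

```python
import csv

# Made-up predictions: one dict per example in the unlabeled set.
predictions = [
    {"id": "abc123", "bin_maj": 1, "bin_one": 1, "bin_all": 0,
     "multi_maj": "1-Gering", "disagree_bin": 1},
]

columns = ["id", "bin_maj", "bin_one", "bin_all", "multi_maj", "disagree_bin"]
with open("predictions.tsv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=columns, delimiter="\t")
    writer.writeheader()
    writer.writerows(predictions)
```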
Note that how you derive those labels is up to you (as long as the rules for the closed or open track are followed):
- you can train several models or a single model to get the predictions
- you can derive the model-specific training set in any way from the labeled training data
- you can use the information about which annotator assigned which label, or ignore it
To submit your predictions to the competition:
- the file MUST have the file name extension .tsv
- the TSV file must be compressed into a ZIP file with the extension .zip (see the sketch after this list)
- the ZIP file should then be uploaded as a submission to the correct competition
- !! Please make sure you submit to the competition that corresponds to the correct subtask (1 or 2) and correct track (Open or Closed)!
- under “My Submissions” make sure to fill out the form and:
- enter the name of your team which has been registered for the competition
- give a name to your method
- confirm that you have checked that you are indeed submitting to the correct competition for the subtask and track desired
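The compression step referenced above could, for instance, be done like this (a minimal sketch; predictions.tsv and submission.zip are placeholder names):

```python
import zipfile

# Placeholder file names; the file inside the archive must keep the .tsv extension.
with zipfile.ZipFile("submission.zip", "w", zipfile.ZIP_DEFLATED) as zf:
    zf.write("predictions.tsv")
```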
Submission errors and warnings
- Always make sure a phase is selected before trying to upload your submission.
- A submission is successful if it has the submission status ‘finished’. ‘Failed’ submissions can be investigated for error sources by clicking the ‘?’ next to ‘failed’ and looking at LOGS > scoring logs > stderr.
- If you experience any issue, such as a submission file stuck in the “scoring” status, please cancel the submission and try again. If the problem persists, you can contact us using the Forum.
- Following a successful submission, you need to refresh the submission page in order to see your score and your result on the leaderboard.
Phases
- For the trial phase, multiple submissions are allowed for getting to know the problem and the subtask.
- For the development phase, multiple submissions are allowed and they serve the purpose of developing and improving the model(s).
- For the competition phase, participants may only submit a limited number of times. Please note that only the latest valid submission determines the final task ranking.
Evaluation
System performance on each of the five predicted labels (bin_maj, bin_one, bin_all, multi_maj, disagree_bin) is evaluated using the macro-averaged F1 score over all classes.
The final score used for ranking the submissions is the unweighted average of the five per-label scores.
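A minimal sketch of this scoring scheme with scikit-learn (the gold and predicted values are made up; the official scorer additionally accepts either label in the ambiguous bin_maj and multi_maj cases described above):

```python
from sklearn.metrics import f1_score

# Made-up gold labels and predictions for each of the five targets.
gold = {
    "bin_maj": [0, 1, 1], "bin_one": [1, 1, 1], "bin_all": [0, 0, 1],
    "multi_maj": ["0-Kein", "1-Gering", "2-Vorhanden"], "disagree_bin": [1, 1, 0],
}
pred = {
    "bin_maj": [0, 1, 0], "bin_one": [1, 1, 1], "bin_all": [0, 1, 1],
    "multi_maj": ["0-Kein", "1-Gering", "1-Gering"], "disagree_bin": [1, 0, 0],
}

# Macro F1 per target, then the unweighted average as the final ranking score.
per_target = {name: f1_score(gold[name], pred[name], average="macro") for name in gold}
final_score = sum(per_target.values()) / len(per_target)
print(per_target, final_score)
```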