---
title: Multi Label Precision Recall Accuracy Fscore
datasets:
-  
tags:
- evaluate
- metric
description: "Example-based precision, recall, accuracy, and F-score for multi-label classification, following Zhang and Zhou (2014)."
sdk: gradio
sdk_version: 3.19.1
app_file: app.py
pinned: false
---

# Metric Card for Multi Label Precision Recall Accuracy Fscore
Implementation of the example-based evaluation metrics for multi-label classification presented in Zhang and Zhou (2014).
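
To make the example-based definitions concrete, here is a minimal pure-Python sketch of how such metrics can be computed over label sets. It is not the code of this module: the function name `example_based_scores` is hypothetical, the F-score is assumed to be F1 computed from the averaged precision and recall, and ratios with an empty denominator are assumed to count as 0.0.

```python
from typing import Dict, Hashable, Sequence


def example_based_scores(
    predictions: Sequence[Sequence[Hashable]],
    references: Sequence[Sequence[Hashable]],
) -> Dict[str, float]:
    """Example-based precision, recall, accuracy, and F1 over label sets."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not predictions:
        raise ValueError("at least one example is required")

    precision_sum = recall_sum = accuracy_sum = 0.0
    for pred, ref in zip(predictions, references):
        pred_set, ref_set = set(pred), set(ref)
        overlap = len(pred_set & ref_set)
        union = len(pred_set | ref_set)
        # Assumption: a ratio with an empty denominator counts as 0.0.
        precision_sum += overlap / len(pred_set) if pred_set else 0.0
        recall_sum += overlap / len(ref_set) if ref_set else 0.0
        accuracy_sum += overlap / union if union else 0.0

    # Average the per-example ratios over all examples.
    n = len(predictions)
    precision = precision_sum / n
    recall = recall_sum / n
    accuracy = accuracy_sum / n

    # Assumption: F1 is derived from the averaged precision and recall.
    denom = precision + recall
    fscore = 2 * precision * recall / denom if denom else 0.0
    return {
        "precision": precision,
        "recall": recall,
        "accuracy": accuracy,
        "fscore": fscore,
    }


scores = example_based_scores(
    predictions=[["0", "1"], ["1"]],
    references=[["0", "1"], ["1", "2"]],
)
print(scores)
```

In this sketch the second example predicts only one of its two reference labels, so precision stays at 1.0 while recall and accuracy drop to 0.75.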

## How to Use

    >>> import evaluate
    >>> multi_label_precision_recall_accuracy_fscore = evaluate.load("mdocekal/multi_label_precision_recall_accuracy_fscore")
    >>> results = multi_label_precision_recall_accuracy_fscore.compute(
                predictions=[
                    ["0", "1"],
                    ["1", "2"],
                    ["0", "1", "2"],
                ],
                references=[
                    ["0", "1"],
                    ["1", "2"],
                    ["0", "1", "2"],
                ]
            )
    >>> print(results)
    {
        "precision": 1.0,
        "recall": 1.0,
        "accuracy": 1.0,
        "fscore": 1.0
    }

A multiset configuration is also available for multi-label classification with repeated labels.
It uses the same definitions as the previous case, but operates on multisets of labels: intersection, union, and cardinality are replaced by their multiset counterparts.

    >>> results = multi_label_precision_recall_accuracy_fscore.compute(
                predictions=[
                    [0, 1, 1]
                ],
                references=[
                    [1, 0, 1, 1, 0, 0],
                ]
            )
    >>> print(results)
    {
        "precision": 1.0,
        "recall": 0.5,
        "accuracy": 0.5,
        "fscore": 0.6666666666666666
    }
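
The multiset arithmetic behind the result above can be sketched with `collections.Counter`, whose `&` and `|` operators take the per-label minimum and maximum of the counts. This is an illustrative sketch for a single example, not the module's own code; the function name `multiset_scores` is hypothetical, and empty denominators are assumed to yield 0.0.

```python
from collections import Counter
from typing import Dict, Hashable, Sequence


def multiset_scores(
    prediction: Sequence[Hashable],
    reference: Sequence[Hashable],
) -> Dict[str, float]:
    """Precision, recall, accuracy, and F1 for one example with repeated labels."""
    pred, ref = Counter(prediction), Counter(reference)

    # Multiset intersection keeps the minimum count of each label,
    # multiset union keeps the maximum count.
    overlap = sum((pred & ref).values())
    union = sum((pred | ref).values())
    pred_size = sum(pred.values())
    ref_size = sum(ref.values())

    # Assumption: a ratio with an empty denominator counts as 0.0.
    precision = overlap / pred_size if pred_size else 0.0
    recall = overlap / ref_size if ref_size else 0.0
    accuracy = overlap / union if union else 0.0
    denom = precision + recall
    fscore = 2 * precision * recall / denom if denom else 0.0
    return {
        "precision": precision,
        "recall": recall,
        "accuracy": accuracy,
        "fscore": fscore,
    }


result = multiset_scores([0, 1, 1], [1, 0, 1, 1, 0, 0])
print(result)
```

Here the prediction contributes one `0` and two `1`s, all of which also occur in the reference, so the intersection has size 3, the union has size 6, and the sketch arrives at the same values as the example above.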

### Inputs
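- **predictions** *(list of lists of labels)*: Predicted labels for each example, such as strings or integers. In the multiset configuration, repeated labels are counted.
- **references** *(list of lists of labels)*: Ground-truth labels for each example, in the same order as `predictions`.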

### Output Values

The metric returns a dictionary with four keys, `precision`, `recall`, `accuracy`, and `fscore`, e.g. `{"precision": 1.0, "recall": 0.5, "accuracy": 0.5, "fscore": 0.6666666666666666}`.

Each value lies between 0 and 1, inclusive. Higher scores are better.

#### Values from Popular Papers
*Give examples, preferably with links to leaderboards or publications, to papers that have reported this metric, along with the values they have reported.*

### Examples
*Give code examples of the metric being used. Try to include examples that clear up any potential ambiguity left from the metric description above. If possible, provide a range of examples that show both typical and atypical results, as well as examples where a variety of input parameters are passed.*

## Limitations and Bias
*Note any known limitations or biases that the metric has, with links and references if possible.*

## Citation
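Zhang, M.-L. and Zhou, Z.-H. (2014). A Review on Multi-Label Learning Algorithms. IEEE Transactions on Knowledge and Data Engineering, 26(8), 1819-1837.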

## Further References
*Add any useful further references.*