^^Recall and precision.

To build intuition, let us reason on a reference example.

Image classifier: “cat” vs “not cat”.

When it evaluates a single image, the image classifier can

get it wrong
1. judge a non-cat to be a cat. False positive.
  - This adds wrong elements to the selected set.
2. judge a cat to be a non-cat. False negative.
  - This leaves elements missing from the selected set.
get it right
1. call a cat a cat, like calling a spade a spade. True positive.
2. call a non-cat a non-cat. True negative.

Errors in selecting the individual elements add up to an overall error in the selected set:

it should	but instead it
contain only cats	also contains non-cats
contain all the cats	is missing images of cats
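The table above can be read in set terms. The following is a minimal sketch, assuming made-up image names and a hand-written ground truth (neither comes from these notes): the two kinds of overall error are the two set differences between the selected set and the set of actual cats.

```python
# Hypothetical data: the image names are invented for illustration.
actual_cats = {"img1", "img2", "img3", "img4"}  # ground truth set
selected = {"img1", "img2", "img5"}             # what the classifier picked

# Selected but not cats: violates "contain only cats".
false_positives = selected - actual_cats
# Cats that were not selected: violates "contain all the cats".
false_negatives = actual_cats - selected
# Cats that were correctly selected.
true_positives = selected & actual_cats

print(sorted(false_positives))  # ['img5']
print(sorted(false_negatives))  # ['img3', 'img4']
print(sorted(true_positives))   # ['img1', 'img2']
```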

Examples

E.g. information retrieval

E.g. media monitoring

Many customers expect an almost perfect recall: they never want to miss an article about the subjects they are interested in. Precision, while highly desirable, is not as important: a bit of noise in the article feed is usually fine.

Classify, judge, retrieve, select, recognize, search ...

In all these analogous cases, the operation is the one described above.

perfect, exact selection: all and only the relevant items are selected
relevant individuals, positives: they are such by definition; this is the reference judgment; they are the wanted ones, those the classifier is expected to select, the ones to retrieve.
set of relevant individuals; ground truth set (T-set); reference-true set (V): the name is the definition
irrelevant; negatives; reference-false set (F): to be discarded
retrieved, selected, recognized: selected by the classifier
set of retrieved items, classifier set: the name is the definition
high precision: getting almost all correct
high recall: getting almost all of them

Sometimes it is more important to have

high precision: getting all correct, rather than getting them all.
high recall: getting them all, rather than getting all correct.

↑ higher precision ↑
	low recall, high precision	high recall, high precision
	low recall, low precision	high recall, low precision
	>>> higher recall >>>

It is a set diagram

relevant items, on the left, the "filled" ones
irrelevant items, on the right, the "empty" ones
green: correct classification
white: wrong classification

inner rectangle: selected set

The message is redundant: either the green or the inner rectangle would be enough.

Recognition example: “cat” vs “not cat” image classifier

everything within the dotted line represents the system output for “cats”
everything within the full line represents the ground truth “cats” (actual images of “cats”)
everything inside the outer square represents the full set of images.

Figure: four panels (low recall, high precision; high recall, high precision; low recall, low precision; high recall, low precision), with precision increasing upward and recall increasing to the right.

High recall, low precision

classifier casts a very wide net, catches a lot of fish, but also a lot of other things.

it thinks a lot of things are “cats” that are not.
it thinks a lot of “cats” are “cats”.

So from the set of images we got a lot of images classified as “cats”,

many of them were in the set of actual “cats”
a lot of them were also “not cats”.

Low recall, high precision

classifier casts a very small but highly specialised net, does not catch a lot of fish, but almost only fish.

it is very picky, and does not think many things are cats.
almost all the images it thinks are “cats”, are really “cats”.

However it also misses a lot of actual “cats”, because it is so very picky.

High recall, high precision

classifier casts a very wide and highly specialised net, catches a lot of fish, almost only fish.

It is very good: it is very picky, but still it gets almost all of the images of cats.
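One common way to move between the regimes above is to assume that the classifier outputs a score and to vary the decision threshold. That mechanism is not stated in these notes. The sketch below uses invented scores and labels: a high threshold behaves like the small, specialised net, a low one like the wide net.

```python
# Hypothetical (score, is_cat) pairs; both scores and labels are invented.
scored = [
    (0.95, True), (0.90, True), (0.80, True), (0.70, False),
    (0.60, True), (0.40, False), (0.30, True), (0.10, False),
]

def precision_recall(threshold):
    """Select every image with score >= threshold, then compare with the labels."""
    tp = sum(1 for s, is_cat in scored if s >= threshold and is_cat)
    fp = sum(1 for s, is_cat in scored if s >= threshold and not is_cat)
    fn = sum(1 for s, is_cat in scored if s < threshold and is_cat)
    # Convention chosen for this sketch: report 0.0 when a ratio is undefined.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

picky = precision_recall(0.85)  # small, specialised net
wide = precision_recall(0.25)   # wide net

print(picky)  # high precision, low recall
print(wide)   # lower precision, high recall
```

With these made-up numbers the picky threshold selects only two images, both cats, and misses three cats. The wide threshold catches all five cats and also two non-cats.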

The math

Precision = tp ÷ (tp + fp)

The ratio of true positives to the total number of true positives and false positives.

Recall = tp ÷ (tp + fn)

The ratio of true positives to the total number of true positives and false negatives. [https://en.wikipedia.org/wiki/Precision_and_recall]
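As a worked example of the two formulas, here is a small sketch with hypothetical counts (the numbers are invented, not taken from any real classifier). It does not guard against a zero denominator, which would occur if nothing were selected or if there were no relevant items.

```python
def precision(tp, fp):
    """tp / (tp + fp): share of the selected items that are relevant."""
    return tp / (tp + fp)

def recall(tp, fn):
    """tp / (tp + fn): share of the relevant items that were selected."""
    return tp / (tp + fn)

# Hypothetical counts: 8 cats found, 2 non-cats wrongly selected, 12 cats missed.
tp, fp, fn = 8, 2, 12
p = precision(tp, fp)  # 8 / 10
r = recall(tp, fn)     # 8 / 20

print(p, r)  # 0.8 0.4
```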

These metrics are important in machine learning in general, in deep learning, and in Natural Language Processing (NLP).

In depth

Naming difficulties

The set of all images has 2 parts

relevant/irrelevant ≡ positives/negatives ≡ cats/non-cats

The classifier also classifies in binary,

so the same names could be used, qualified as

cats/non-cats according to the reference
cats/non-cats according to the classifier

v/f: evaluation by the classifier

V/F: evaluation by the reference

	V	F
v	vV	vF
f	fV	fF

	P	N
v	vP	vN
f	fP	fN

As abstract terminology one can adopt

set theory: an element belongs to the set
statistics: an individual belongs to the population

To select a subset:

Set theory: an applicable logical proposition

In general, in practice: classification by specification; binary classification.

In mathematics: axiom of selection. Axiom of specification.

True positive: correctly classified as belonging to the T-set
E.g. correctly classified image of a cat as being a “cat”.
True negative: correctly classified as not belonging to the T-set
E.g. correctly classified image of a dog as not being a “cat”.
False positive: incorrectly classified as belonging to the T-set
E.g. incorrectly classified image of a dog as being a “cat”.
False negative: incorrectly classified as not belonging to the T-set
E.g. incorrectly classified image of a cat as not being a “cat”.
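The four definitions above can be sketched as a small function that names the outcome of one classification, followed by a tally over several images. The function name `outcome` and the sample pairs are hypothetical, chosen only for illustration.

```python
from collections import Counter

def outcome(predicted_cat, actually_cat):
    """Name the outcome of one classification against the reference judgment."""
    if predicted_cat and actually_cat:
        return "true positive"
    if predicted_cat and not actually_cat:
        return "false positive"
    if not predicted_cat and actually_cat:
        return "false negative"
    return "true negative"

# Hypothetical (classifier says "cat", image really is a cat) pairs.
pairs = [(True, True), (True, False), (False, True), (False, False), (True, True)]

# Count how often each of the four outcomes occurs.
counts = Counter(outcome(p, a) for p, a in pairs)
print(counts["true positive"], counts["false positive"])   # 2 1
print(counts["false negative"], counts["true negative"])   # 1 1
```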

Talk

Links

Prototypical, paradigmatic, canonical, classic, traditional example, ...

Terminology

Within the universe set, a subset is selected by means of a logical True/False condition applicable to each of the elements.

The selected subset is the set of elements for which the condition is true.
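In code, this kind of selection maps naturally onto a set comprehension. The sketch below is only an illustration: the universe, the stored verdicts and the `condition` function are all hypothetical stand-ins.

```python
# Hypothetical universe: each image name maps to the verdict the condition gives it.
universe = {"img1": True, "img2": False, "img3": True}

def condition(name):
    """Stand-in for the logical condition; here it only looks up a stored verdict."""
    return universe[name]

# The selected subset: the elements for which the condition is true.
selected = {name for name in universe if condition(name)}
print(sorted(selected))  # ['img1', 'img3']
```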

Among all the available images, the classifier selects the wanted ones.

However, the classifier is not perfect: when it classifies a single image, it can get it wrong.