Supported Encodings
The design of this package revolves around a number of immutable
types, each of which represents a specific label-encoding.
These types are contained within their own namespace LabelEnc.
The namespace exists mainly for convenience: it allows for a simple
form of auto-completion and permits more concise names that would
otherwise be too ambiguous.
Abstract LabelEncoding
We offer a number of different encodings that can best be described in terms of two orthogonal properties. The first property is the number of classes an encoding represents, and the second property is the number of array dimensions it operates on.
LabelEncoding{T,K,M}
Abstract super-type of all label encodings. It is mainly intended for dispatch and is therefore not exported. It defines three type-parameters that are useful to divide the different encodings into groups.
- T: The label-type of the encoding, which specifies the concrete type that all labels of that particular encoding have.
- K: The number of labels that the label-encoding can deal with. For binary encodings this is the constant 2.
- M: The number of array dimensions that the encoding works with. For most encodings this is 1, meaning that a target array of that encoding is expected to be a vector. In contrast, the encoding LabelEnc.OneOfK defines M=2, because it represents the target array as a matrix.
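To illustrate how these three type-parameters support dispatch, here is a minimal sketch in plain Julia. The names SketchEncoding, SketchTrueFalse, SketchOneOfK and the sketch_* / isbinary_sketch functions are hypothetical stand-ins, not the package's actual definitions; the point is only that T, K and M can be read directly off the type, so the answers are known at compile time.

```julia
# Hypothetical stand-ins that mirror LabelEncoding{T,K,M}; not the package's definitions.
abstract type SketchEncoding{T,K,M} end

# A binary, vector-based encoding with Bool labels: T = Bool, K = 2, M = 1.
struct SketchTrueFalse <: SketchEncoding{Bool,2,1} end

# A matrix-based encoding with a free label type and class count: M = 2.
struct SketchOneOfK{T<:Number,K} <: SketchEncoding{T,K,2} end

# Each query reads a parameter off the type, so dispatch answers it at compile time.
sketch_labeltype(::SketchEncoding{T}) where {T} = T
sketch_nlabel(::SketchEncoding{T,K}) where {T,K} = K
sketch_ndims(::SketchEncoding{T,K,M}) where {T,K,M} = M

# A method that applies only to binary encodings, selected purely by K = 2.
isbinary_sketch(::SketchEncoding{T,2}) where {T} = true
isbinary_sketch(::SketchEncoding) = false
```

For example, sketch_nlabel(SketchTrueFalse()) returns 2 and sketch_ndims(SketchOneOfK{Float32,4}()) returns 2, without inspecting any data.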
TrueFalse
LabelEnc.TrueFalse
Denotes the classes as boolean values, for which true corresponds to the positive class and false to the negative class.
julia> supertype(LabelEnc.TrueFalse)
MLLabelUtils.LabelEncoding{Bool,2,1}
It belongs to the family of binary vector-based encodings, and as such represents the targets as a vector that uses only two distinct values for its elements. This implies that it is by definition always binary, so the number of labels can be inferred at compile time.
julia> nlabel(LabelEnc.TrueFalse)
2
TrueFalse() → LabelEnc.TrueFalse
Returns the singleton that represents the encoding. All information about the encoding is already contained within the type, so there is no need to specify additional parameters.
For more information on how to use such an encoding, please look at the corresponding parts of the documentation.
julia> true_targets = [false, true, true, false];
julia> labelenc(true_targets)
MLLabelUtils.LabelEnc.TrueFalse()
julia> label(LabelEnc.TrueFalse())
2-element Array{Bool,1}:
true
false
julia> nlabel(LabelEnc.TrueFalse())
2
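Because TrueFalse fixes true as the positive and false as the negative class, a Bool target vector maps one-to-one onto the numeric binary encodings described below (1/0 for ZeroOne, 1/-1 for MarginBased). The following sketch performs that mapping with Base broadcasting only; it is an illustration of the correspondence, not a replacement for the package's convertlabel().

```julia
targets = [false, true, true, false]

# true → positive class, false → negative class.
as_zeroone = Float64.(targets)            # positive 1.0, negative 0.0 (ZeroOne values)
as_margin  = ifelse.(targets, 1.0, -1.0)  # positive 1.0, negative -1.0 (MarginBased values)

# Going back: every non-negative margin value is the positive class.
recovered = as_margin .>= 0
```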
ZeroOne
LabelEnc.ZeroOne
Denotes the classes as numeric values, for which 1 corresponds to the positive class and 0 to the negative class. This type of encoding is often used when the predictions denote a probability.
julia> supertype(LabelEnc.ZeroOne)
MLLabelUtils.LabelEncoding{T<:Number,2,1}
It belongs to the family of binary numeric vector-based encodings, and as such represents the targets as a vector that uses only two distinct values for its elements. In fact, it is by definition always binary, so the number of labels can be inferred at compile time.
julia> nlabel(LabelEnc.ZeroOne)
2
This type also comes with support for classification (see classify()). It assumes that the raw predictions (often called \(\hat{y}\)) are in the closed interval \([0, 1]\) and represent something resembling a probability (or some degree of certainty) that the observation is of the positive class. In order to classify a raw prediction as either positive or negative, one therefore needs to decide on a “threshold” parameter, which determines at which degree of certainty a prediction is “good enough” to be classified as positive.
- threshold: A real number between 0 and 1 that defines the “cutoff” point for classification. Any prediction less than this value will be classified as negative, and any prediction equal to or greater than this value will be classified as positive.
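The threshold rule above fits in a few lines. The function classify_zeroone_sketch below is hypothetical and only mirrors the rule as described (ties go to the positive class); it is not the package's classify(), and it assumes Float64 labels 1.0/0.0, matching the default labeltype.

```julia
# Hypothetical helper: applies the ZeroOne threshold rule to one raw prediction.
function classify_zeroone_sketch(yhat::Real, threshold::Real = 0.5)
    0 <= threshold <= 1 || throw(ArgumentError("threshold must lie in [0, 1]"))
    # Predictions equal to or above the cutoff are positive (1.0), the rest negative (0.0).
    return yhat >= threshold ? 1.0 : 0.0
end

yhat = [0.1, 0.3, 0.5, 0.9]
default_cut = classify_zeroone_sketch.(yhat)       # threshold 0.5
lower_cut   = classify_zeroone_sketch.(yhat, 0.3)  # threshold 0.3
```

Lowering the threshold, as in the second call, labels more observations as positive: here 0.3 moves from negative to positive.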
ZeroOne([labeltype][, threshold]) → LabelEnc.ZeroOne
Creates a new label-encoding of the LabelEnc.ZeroOne family.
Parameters:
- labeltype (DataType) – The type that should be used to represent the labels. Has to be a subtype of Number. Defaults to Float64.
- threshold (Number) – The classification threshold that should be used in classify(). Defaults to 0.5.
For more information on how to use such an encoding, please look at the corresponding parts of the documentation.
julia> LabelEnc.ZeroOne(Int, 0.3) # threshold = 0.3
MLLabelUtils.LabelEnc.ZeroOne{Int64,Float64}(0.3)
julia> true_targets = [0, 1, 1, 0];
julia> labelenc(true_targets)
MLLabelUtils.LabelEnc.ZeroOne{Int64,Float64}(0.5)
julia> label(LabelEnc.ZeroOne())
2-element Array{Float64,1}:
1.0
0.0
julia> nlabel(LabelEnc.ZeroOne())
2
MarginBased
LabelEnc.MarginBased
Denotes the classes as numeric values, for which 1 corresponds to the positive class and -1 to the negative class. This type of encoding is very prominent for margin-based classifiers, in particular SVMs.
julia> supertype(LabelEnc.MarginBased)
MLLabelUtils.LabelEncoding{T<:Number,2,1}
It belongs to the family of binary numeric vector-based encodings, and as such represents the targets as a vector that uses only two distinct values for its elements. In fact, it is by definition always binary, so the number of labels can be inferred at compile time.
julia> nlabel(LabelEnc.MarginBased)
2
This type also comes with support for classification (see classify()). It expects the raw predictions to be real numbers of arbitrary value. The decision boundary between classifying into a negative or a positive label is predefined at zero. More precisely, a raw prediction greater than or equal to zero is considered a positive prediction, while any strictly negative raw prediction is considered a negative prediction.
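A minimal sketch of this fixed zero boundary, together with the margin \(y \hat{y}\) that gives the encoding its name. Both functions are hypothetical illustrations rather than package API, and they assume Float64 labels 1.0/-1.0.

```julia
# Hypothetical helper: the decision boundary is fixed at zero; 0 itself counts as positive.
classify_margin_sketch(yhat::Real) = yhat >= 0 ? 1.0 : -1.0

# With targets in {-1, 1}, the product y * yhat is the margin: it is positive
# exactly when a nonzero prediction has the same sign as the true class.
margin_sketch(y::Real, yhat::Real) = y * yhat

yhat    = [-2.3, 0.0, 0.7, -0.1]
targets = [-1.0, 1.0, -1.0, -1.0]
predicted = classify_margin_sketch.(yhat)
margins   = margin_sketch.(targets, yhat)
```

The third observation has a negative margin and is therefore misclassified; margin-based losses such as the hinge loss \(\max(0, 1 - y\hat{y})\) are written in terms of this product.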
MarginBased([labeltype]) → LabelEnc.MarginBased
Creates a new label-encoding of the LabelEnc.MarginBased family.
Parameters:
- labeltype (DataType) – The type that should be used to represent the labels. Has to be a subtype of Number. Defaults to Float64.
For more information on how to use such an encoding, please look at the corresponding parts of the documentation.
julia> true_targets = [-1, 1, 1, -1];
julia> labelenc(true_targets)
MLLabelUtils.LabelEnc.MarginBased{Int64}()
julia> label(LabelEnc.MarginBased())
2-element Array{Float64,1}:
1.0
-1.0
julia> nlabel(LabelEnc.MarginBased())
2
OneVsRest
LabelEnc.OneVsRest
This is a special type of binary encoding that can be used to convert a multi-class problem into a binary one. It does so by only “caring” about which label is the positive one, and treating everything that is not equal to it as negative.
julia> supertype(LabelEnc.OneVsRest)
MLLabelUtils.LabelEncoding{T,2,1}
It belongs to the family of binary vector-based encodings. It is by definition always binary and as such the number of labels can be inferred at compile time.
julia> nlabel(LabelEnc.OneVsRest)
2
While this encoding only uses the positive label to assert class membership, it still needs a placeholder value of the same type for the negative label in order for convertlabel() to work.
- poslabel: The value that will be used to represent the positive class. It is used to determine whether a given value is positive (if it is equal) or negative.
- neglabel: Placeholder to represent the negative class. This value is not used to determine membership, but simply to impute a reasonable value when converting to such an encoding.
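The membership rule is simple enough to sketch directly. The helpers below are hypothetical, not the package's convertlabel(); they reproduce the behaviour of the examples in this section, where :not_yes serves as the placeholder negative label.

```julia
# Hypothetical helper: only equality with `poslabel` decides membership;
# `neglabel` is merely the placeholder written for every other value.
onevsrest_sketch(targets, poslabel::T, neglabel::T) where {T} =
    T[t == poslabel ? poslabel : neglabel for t in targets]

# The same membership test, mapped onto MarginBased-style values 1.0 / -1.0.
onevsrest_margin_sketch(targets, poslabel) =
    [t == poslabel ? 1.0 : -1.0 for t in targets]

true_targets = [:yes, :no, :maybe, :yes]
as_symbols = onevsrest_sketch(true_targets, :yes, :not_yes)
as_margin  = onevsrest_margin_sketch(true_targets, :yes)
```

Requiring poslabel and neglabel to share the type T mirrors the constraint stated for the constructor and keeps the output vector concretely typed.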
OneVsRest(poslabel[, neglabel]) → LabelEnc.OneVsRest
Creates a new label-encoding of the one-vs-rest family. While both a positive and a negative label have to be known to the encoding, only the positive label is used for comparison and asserting class membership. Note that both parameters have to be of the same type.
Parameters:
- poslabel (Any) – The label of interest.
- neglabel (Any) – The negative label. It is optional for common types, such as symbol, number, or string. For other label-types it has to be provided explicitly.
For more information on how to use such an encoding, please look at the corresponding parts of the documentation.
julia> true_targets = [:yes, :no, :maybe, :yes];
julia> convertlabel(LabelEnc.OneVsRest(:yes), true_targets)
4-element Array{Symbol,1}:
:yes
:not_yes
:not_yes
:yes
julia> convertlabel(LabelEnc.MarginBased, true_targets, LabelEnc.OneVsRest(:yes))
4-element Array{Float64,1}:
1.0
-1.0
-1.0
1.0
julia> label(LabelEnc.OneVsRest(:yes))
2-element Array{Symbol,1}:
:yes
:not_yes
julia> nlabel(LabelEnc.OneVsRest(:yes))
2
Indices
LabelEnc.Indices
A multi-class encoding that uses the integers \(\{1, 2, \ldots, K\}\) as labels to denote the classes. While these “indices” are integers in terms of their values, they don’t need to be of type Int.
julia> supertype(LabelEnc.Indices)
MLLabelUtils.LabelEncoding{T<:Number,K,1}
It belongs to the family of numeric vector-based encodings and can encode any number of classes. As such, the number of labels K is a free type-parameter. It is considered a binary encoding if and only if K = 2.
Indices([labeltype, ]k) → LabelEnc.Indices
Creates a new label-encoding of the LabelEnc.Indices family.
Parameters:
- labeltype (DataType) – The type that should be used to represent the labels. Has to be a subtype of Number. Defaults to Int.
- k (Int) – The number of classes that the encoding should represent. This parameter can be specified as an Int or, in a type-stable manner, as Val{k}.
For more information on how to use such an encoding, please look at the corresponding parts of the documentation.
julia> true_targets = [1, 2, 1, 3, 1, 2];
julia> labelenc(true_targets)
MLLabelUtils.LabelEnc.Indices{Int64,3}()
julia> label(LabelEnc.Indices(3))
3-element Array{Int64,1}:
1
2
3
julia> label(LabelEnc.Indices(Float32,4))
4-element Array{Float32,1}:
1.0
2.0
3.0
4.0
julia> nlabel(LabelEnc.Indices(Val{5})) # type-stable
5
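The difference between passing k as an Int and as Val{k} comes down to where K becomes known. In the hypothetical sketch below (SketchIndices, sketch_label and sketch_nlabel are illustrative names, not package API), the Val{K} method lets the compiler read K from the argument type, so the result type SketchIndices{T,K} is inferable; the Int method only learns K at run time, so the concrete return type cannot be inferred from the argument types alone.

```julia
# Hypothetical stand-in for an Indices-style encoding; K lives in the type.
struct SketchIndices{T<:Number,K} end

# Type-stable: K is part of the argument type Val{K}.
SketchIndices(::Type{T}, ::Type{Val{K}}) where {T<:Number,K} = SketchIndices{T,K}()
# Not type-stable: K is a runtime value that has to be lifted into the type.
SketchIndices(::Type{T}, k::Int) where {T<:Number} = SketchIndices{T,k}()
# Default label type Int, as in the constructor described above.
SketchIndices(k) = SketchIndices(Int, k)

# The labels are simply 1, 2, ..., K converted to the label type T.
sketch_label(::SketchIndices{T,K}) where {T,K} = T.(1:K)
sketch_nlabel(::SketchIndices{T,K}) where {T,K} = K
```

Both forms produce the same object; Val{k} only pays off when k is a compile-time constant at the call site.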
OneOfK
LabelEnc.OneOfK
A multi-class encoding that uses one of the two matrix dimensions to denote the label. In other words, it uses an indicator-encoding to explicitly state which class an observation represents and which it does not, by setting exactly one element of each observation to 1 and the rest to 0.
julia> supertype(LabelEnc.OneOfK)
MLLabelUtils.LabelEncoding{T<:Number,K,2}
It belongs to the family of numeric matrix-based encodings and can encode any number of classes. As such, the number of labels K is a free type-parameter. It is considered a binary encoding if and only if K = 2.
OneOfK([labeltype, ]k) → LabelEnc.OneOfK
Creates a new label-encoding of the matrix-based LabelEnc.OneOfK family.
Parameters:
- labeltype (DataType) – The type that should be used to represent the labels. Has to be a subtype of Number. Defaults to Int.
- k (Int) – The number of classes that the encoding should represent. This parameter can be specified as an Int or, in a type-stable manner, as Val{k}.
For more information on how to use such an encoding, please look at the corresponding parts of the documentation.
julia> true_targets = [0 1 0 0; 1 0 1 0; 0 0 0 1]
3×4 Array{Int64,2}:
0 1 0 0
1 0 1 0
0 0 0 1
julia> labelenc(true_targets)
MLLabelUtils.LabelEnc.OneOfK{Int64,3}()
julia> label(LabelEnc.OneOfK(Float32, 4)) # returns the indices
4-element Array{Int64,1}:
1
2
3
4
julia> ind2label(3, LabelEnc.OneOfK(Float32, 4))
4-element Array{Float32,1}:
0.0
0.0
1.0
0.0
julia> nlabel(LabelEnc.OneOfK(Val{4}))
4
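The indicator layout used in the example above (one column per observation, one row per class) can be built from, and reduced back to, class indices with a few lines of Base Julia. onehot_sketch and decode_sketch are hypothetical helpers for illustration, not the package's own conversion functions.

```julia
# Hypothetical helper: class indices in 1:K → K×N indicator matrix,
# one column per observation with exactly one entry set to one(T).
function onehot_sketch(::Type{T}, idx::AbstractVector{<:Integer}, K::Integer) where {T<:Number}
    Y = zeros(T, K, length(idx))
    for (j, i) in enumerate(idx)
        1 <= i <= K || throw(ArgumentError("class index $i is outside 1:$K"))
        Y[i, j] = one(T)
    end
    return Y
end

# Hypothetical inverse: the row holding the largest entry of each column is its class.
decode_sketch(Y::AbstractMatrix) = [argmax(view(Y, :, j)) for j in axes(Y, 2)]

Y = onehot_sketch(Int, [2, 1, 2, 3], 3)  # same matrix as true_targets above
```

Decoding via argmax also works for soft scores such as per-class probabilities, since it only asks which row is largest in each column.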
NativeLabels
LabelEnc.NativeLabels
A multi-class encoding that can use arbitrary values to represent any number of labels. It does so by mapping each label-index to a class label. The class labels can be of arbitrary type as long as the type is consistent for all labels. Furthermore, all labels have to be specified explicitly.
julia> supertype(LabelEnc.NativeLabels)
MLLabelUtils.LabelEncoding{T,K,1}
It belongs to the family of vector-based encodings that can encode any number of classes. As such, the number of labels K is a free type-parameter. It is considered a binary encoding if and only if K = 2.
- label: A vector that contains all the used labels in their defined order. If it contains only two values, the first value will be interpreted as the positive label and the second value as the negative label.
- invlabel: A Dict that maps each label to its index in the vector label. This map is used for fast lookup and is generated automatically.
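A minimal sketch of these two fields and how they relate. NativeLabelsSketch, label2ind_sketch and ind2label_sketch are hypothetical names; the sketch only assumes what the text states: the labels are distinct, kept in their defined order, and invlabel is derived from label automatically.

```julia
# Hypothetical stand-in: the ordered label vector plus its automatically built inverse map.
struct NativeLabelsSketch{T}
    label::Vector{T}
    invlabel::Dict{T,Int}
    function NativeLabelsSketch(label::Vector{T}) where {T}
        allunique(label) || throw(ArgumentError("labels must be distinct"))
        # invlabel is derived from label, never passed in by the user.
        invlabel = Dict{T,Int}(l => i for (i, l) in enumerate(label))
        return new{T}(label, invlabel)
    end
end

# Label → index via the Dict lookup, index → label via the vector.
label2ind_sketch(x, enc::NativeLabelsSketch) = enc.invlabel[x]
ind2label_sketch(i::Integer, enc::NativeLabelsSketch) = enc.label[i]

enc = NativeLabelsSketch([:a, :b, :c])
indices = [label2ind_sketch(x, enc) for x in [:a, :b, :a, :c, :b, :a]]
```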
NativeLabels(label[, k]) → LabelEnc.NativeLabels
Creates a new vector-based label-encoding for the given values in label. The values in label are expected to be distinct.
Parameters:
- label (Vector) – The labels that the encoding should use, in their intended order.
- k (DataType) – The number of labels in label. This parameter is optional and will be computed from label if omitted. However, if the number of labels is known at compile time, this parameter can be provided as Val{k}.
For more information on how to use such an encoding, please look at the corresponding parts of the documentation.
julia> true_targets = [:a, :b, :a, :c, :b, :a];
julia> le = labelenc(true_targets)
MLLabelUtils.LabelEnc.NativeLabels{Symbol,3}(Symbol[:a,:b,:c],Dict(:c=>3,:a=>1,:b=>2))
julia> label(le)
3-element Array{Symbol,1}:
:a
:b
:c
julia> nlabel(le)
3
julia> LabelEnc.NativeLabels([:yes, :no, :maybe], Val{3}) # type inferrable
MLLabelUtils.LabelEnc.NativeLabels{Symbol,3}(Symbol[:yes,:no,:maybe],Dict(:yes=>1,:maybe=>3,:no=>2))
FuzzyBinary
LabelEnc.FuzzyBinary
A vector-based binary label interpretation without a specific labeltype. It is primarily intended for fuzzy comparison of binary true targets and predicted targets. It works by assuming that the encoding is either TrueFalse, ZeroOne, or MarginBased, and treating all non-negative values as positive outputs.