secml.data

CDataset

class secml.data.c_dataset.CDataset(x, y, header=None)[source]

Bases: secml.core.c_creator.CCreator

Creates a new dataset.

A dataset consists of a 2-dimensional array of patterns, in dense or sparse format, with one pattern per row, and a flat dense array holding each pattern’s label.

Parameters
x : array_like or CArray

Dataset patterns, one for each row. Array is converted to 2-Dimensions before storing.

y : array_like or CArray

Dataset labels. Array is converted to dense format and flattened before storing.

header : CDatasetHeader or None, optional

The header for the dataset. Defines any extra parameters. See the CDatasetHeader docs for more information.

Examples

>>> from secml.data import CDataset
>>> ds = CDataset([[1,2],[3,4],[5,6]],[1,0,1])
>>> print(ds.X)
CArray([[1 2]
 [3 4]
 [5 6]])
>>> print(ds.Y)
CArray([1 0 1])
>>> ds = CDataset([1,2,3],1)  # Patterns will be converted to 2-Dims
>>> print(ds.X)
CArray([[1 2 3]])
>>> print(ds.Y)
CArray([1])
>>> from secml.array import CArray
>>> ds = CDataset(CArray([[1,0],[0,4],[1,0]],tosparse=True), CArray([1,0,1],tosparse=True))
>>> print(ds.X)
CArray(  (0, 0)    1
  (1, 1)    4
  (2, 0)    1)
>>> print(ds.Y)
CArray([1 0 1])

The number of labels must be equal to the number of samples:

>>> ds = CDataset([[1,2],[3,4]],1)
Traceback (most recent call last):
 ...
ValueError: number of labels (1) must be equal to the number of samples (2).
>>> from secml.data import CDatasetHeader
>>> ds = CDataset([1,2,3], 1, CDatasetHeader(id='mydataset', age=34))  # 2 extra attributes
>>> print(ds.header.id)
mydataset
>>> print(ds.header.age)
34
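
The input handling shown in the examples above (patterns promoted to 2 dimensions, a scalar label flattened to a one-element array, and the label/sample count check) can be sketched in plain Python. This is only an illustration on nested lists, not secml's implementation; `normalize_dataset_inputs` is a hypothetical helper name, and the error message simply mirrors the one in the traceback above.

```python
def normalize_dataset_inputs(x, y):
    """Hypothetical helper mimicking the documented input handling with lists."""
    # Promote a flat sequence of features to a single-row 2-D structure.
    if not x or not isinstance(x[0], (list, tuple)):
        x = [list(x)]
    else:
        x = [list(row) for row in x]
    # Wrap a scalar label so that labels are always a flat list.
    y = list(y) if isinstance(y, (list, tuple)) else [y]
    # One label per pattern is required.
    if len(y) != len(x):
        raise ValueError(
            "number of labels ({}) must be equal to the number of "
            "samples ({}).".format(len(y), len(x)))
    return x, y


patterns, labels = normalize_dataset_inputs([1, 2, 3], 1)
print(patterns)  # [[1, 2, 3]]
print(labels)    # [1]
```

Calling the helper with two patterns and a single label, as in `normalize_dataset_inputs([[1, 2], [3, 4]], 1)`, raises `ValueError`.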

Attributes
X

Dataset Patterns.

Y

Dataset Labels.

class_type

Defines class type.

classes

Classes (unique).

header

Dataset header.

isdense

True if patterns are stored in dense format, else False.

issparse

True if patterns are stored in sparse format, else False.

logger

Logger for current object.

num_classes

Number of classes.

num_features

Number of features.

num_labels

Returns dataset’s number of labels.

num_samples

Number of patterns.

verbose

Verbosity level of logger output.

Methods

append(self, dataset)

Append input dataset to current dataset.

copy(self)

Returns a shallow copy of current class.

create([class_item])

This method creates an instance of a class with given type.

deepcopy(self)

Returns a deep copy of current class.

get_bounds(self[, offset])

Return dataset boundaries plus an offset.

get_class_from_type(class_type)

Return the class associated with input type.

get_labels_onehot(self)

Return dataset labels in one-hot encoding.

get_labels_ovr(self, pos_label)

Return dataset labels in one-vs-rest encoding.

get_params(self)

Returns the dictionary of class parameters.

get_subclasses()

Get all the subclasses of the calling class.

list_class_types()

This method lists all types of available subclasses of calling one.

load(path)

Loads class from pickle object.

save(self, path)

Save class object using pickle.

set(self, param_name, param_value[, copy])

Set a parameter that has a specific name to a specific value.

set_params(self, params_dict[, copy])

Set all parameters passed as a dictionary {key: value}.

timed([msg])

Timer decorator.

todense(self)

Convert dataset’s patterns to dense format.

tosparse(self)

Convert dataset’s patterns to sparse format.

property X

Dataset Patterns.

property Y

Dataset Labels.

append(self, dataset)[source]

Append input dataset to current dataset.

Parameters
dataset : CDataset

Dataset to append. Patterns are appended along the first axis (axis=0), so the number of features of the current dataset must be equal to dataset.num_features. If the current dataset is sparse, a dense dataset to append is converted to sparse, and vice versa.

Returns
CDataset

A new dataset resulting from appending the new data to the existing data. The format of the resulting dataset is the same as the format of the current dataset.

See also

CArray.append

More information about array append.

Notes

Append does not occur in-place: a new dataset is allocated and filled.

Examples

>>> from secml.data import CDataset
>>> ds = CDataset([[1,2],[3,4],[5,6]],[1,0,1])
>>> ds_new = ds.append(CDataset([[10,20],[30,40],[50,60]],[1,0,1]))
>>> print(ds_new.X)
CArray([[ 1  2]
 [ 3  4]
 [ 5  6]
 [10 20]
 [30 40]
 [50 60]])
>>> print(ds_new.Y)
CArray([1 0 1 1 0 1])
>>> ds_new = ds.append(CDataset([[10,20],[30,40],[50,60]],[1,0,1]).tosparse())
>>> print(ds_new.X)
CArray([[ 1  2]
 [ 3  4]
 [ 5  6]
 [10 20]
 [30 40]
 [50 60]])
>>> print(ds_new.Y)
CArray([1 0 1 1 0 1])

property classes

Classes (unique).

get_bounds(self, offset=0.0)[source]

Return dataset boundaries plus an offset.

Parameters
offset : float

Quantity to be added as an offset. Default 0.

Returns
boundary : list of tuple

Each tuple contains the min and max feature value, plus an offset, for the corresponding coordinate.

Examples

>>> from secml.array import CArray
>>> from secml.data import CDataset
>>> ds = CDataset([[1,2,3],[4,5,6]], [1,2])
>>> ds.get_bounds()
[(1.0, 4.0), (2.0, 5.0), (3.0, 6.0)]
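
The example above only covers the default offset. The plain-Python sketch below shows one plausible reading of a non-zero offset. It assumes the offset widens each interval on both sides (subtracted from the minimum, added to the maximum), which the description does not spell out. `feature_bounds` is a hypothetical name and the code is not secml's implementation.

```python
def feature_bounds(patterns, offset=0.0):
    """Per-feature (min, max) over all rows, widened by `offset` (assumed)."""
    bounds = []
    for column in zip(*patterns):  # one column per feature
        lower = float(min(column)) - offset  # assumption: offset lowers the min
        upper = float(max(column)) + offset  # assumption: offset raises the max
        bounds.append((lower, upper))
    return bounds


print(feature_bounds([[1, 2, 3], [4, 5, 6]]))
# [(1.0, 4.0), (2.0, 5.0), (3.0, 6.0)]
print(feature_bounds([[1, 2, 3], [4, 5, 6]], offset=0.5))
# [(0.5, 4.5), (1.5, 5.5), (2.5, 6.5)]
```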

get_labels_onehot(self)[source]

Return dataset labels in one-hot encoding.

Returns
binary_labels : CArray

A (num_samples, num_classes) array with the dataset labels one-hot encoded.

Examples

>>> from secml.data import CDataset
>>> ds = CDataset([[11,22],[33,44],[55,66],[77,88]], [1,0,2,1])
>>> print(ds.get_labels_onehot())
CArray([[0 1 0]
 [1 0 0]
 [0 0 1]
 [0 1 0]])

get_labels_ovr(self, pos_label)[source]

Return dataset labels in one-vs-rest encoding.

Parameters
pos_label : scalar or str

Label of the class to consider as positive.

Returns
CArray

Flat array with 1 when the class label is equal to input positive class’s label, else 0.

Examples

>>> from secml.data import CDataset
>>> ds = CDataset([[11,22],[33,44],[55,66],[77,88]], [1,0,2,1])
>>> print(ds.get_labels_ovr(2))
CArray([0 0 1 0])
>>> print(ds.get_labels_ovr(1))
CArray([1 0 0 1])
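
The one-vs-rest rule described above (1 where the label equals the positive class, 0 elsewhere) reduces to a single comparison per label. The following is a plain-Python sketch of that rule on a list, not secml's implementation; `labels_one_vs_rest` is a hypothetical name.

```python
def labels_one_vs_rest(labels, pos_label):
    """Return 1 for labels equal to `pos_label`, 0 for every other label."""
    return [1 if label == pos_label else 0 for label in labels]


labels = [1, 0, 2, 1]
print(labels_one_vs_rest(labels, 2))  # [0, 0, 1, 0]
print(labels_one_vs_rest(labels, 1))  # [1, 0, 0, 1]
```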

property header

Dataset header.

property isdense

True if patterns are stored in dense format, else False.

property issparse

True if patterns are stored in sparse format, else False.

property num_classes

Number of classes.

property num_features

Number of features.

property num_labels

Returns dataset’s number of labels.

property num_samples

Number of patterns.

todense(self)[source]

Convert dataset’s patterns to dense format.

Returns
CDataset

A new CDataset with same patterns converted to dense format. Copy is avoided if possible.

Examples

>>> from secml.data import CDataset
>>> from secml.array import CArray
>>> ds = CDataset(CArray([[1,2],[3,4],[5,6]], tosparse=True),[1,0,1]).todense()
>>> print(ds.X)
CArray([[1 2]
 [3 4]
 [5 6]])

tosparse(self)[source]

Convert dataset’s patterns to sparse format.

Returns
CDataset

A new CDataset with same patterns converted to sparse format. Copy is avoided if possible.

Examples

>>> from secml.data import CDataset
>>> ds = CDataset([[1,2],[3,4],[5,6]],[1,0,1]).tosparse()
>>> print(ds.X)
CArray(  (0, 0)    1
  (0, 1)    2
  (1, 0)    3
  (1, 1)    4
  (2, 0)    5
  (2, 1)    6)
>>> print(ds.Y)
CArray([1 0 1])

CDatasetHeader

class secml.data.c_dataset_header.CDatasetHeader(**kwargs)[source]

Bases: secml.core.c_creator.CCreator

Creates a new dataset header.

Parameters to include in the header can be defined as keyword arguments at initialization or set afterwards as new public header attributes.

When the header is indexed, immutable objects (scalar, string, tuple, dictionary) are passed through as they are, while arrays are indexed and the result of the indexing is returned.

To extract a dictionary with the entire set of attributes, use .get_params().

Parameters
kwargs : any, optional

Any extra attribute of the dataset. Can be an immutable object (scalar, tuple, dict, str) or a vector-like CArray. Lists are automatically converted to vector-like CArrays.

Examples

>>> from secml.data import CDatasetHeader
>>> from secml.array import CArray
>>> ds_header = CDatasetHeader(id='mydataset', colors=CArray([1,2,3]))
>>> print(ds_header.id)
mydataset
>>> print(ds_header.colors)
CArray([1 2 3])
>>> ds_header.age = 32
>>> print(ds_header.age)
32

Attributes
class_type

Defines class type.

logger

Logger for current object.

num_samples

The number of samples for which the header defines extra params.

verbose

Verbosity level of logger output.

Methods

append(self, header)

Append input header to current header.

copy(self)

Returns a shallow copy of current class.

create([class_item])

This method creates an instance of a class with given type.

deepcopy(self)

Returns a deep copy of current class.

get_class_from_type(class_type)

Return the class associated with input type.

get_params(self)

Returns the dictionary of class parameters.

get_subclasses()

Get all the subclasses of the calling class.

list_class_types()

This method lists all types of available subclasses of calling one.

load(path)

Loads class from pickle object.

save(self, path)

Save class object using pickle.

set(self, param_name, param_value[, copy])

Set a parameter that has a specific name to a specific value.

set_params(self, params_dict[, copy])

Set all parameters passed as a dictionary {key: value}.

timed([msg])

Timer decorator.

append(self, header)[source]

Append input header to current header.

Parameters
header : CDatasetHeader

Header to append. Only array attributes are merged. Any other attribute is set if it is not already defined in the current header; otherwise, its value in the input header should be equal to its value in the current header.

Returns
CDatasetHeader

See also

CArray.append

More information about array append.

Notes

Append does not occur in-place: a new header is allocated and filled.

Examples

>>> from secml.data import CDatasetHeader
>>> from secml.array import CArray
>>> ds_header1 = CDatasetHeader(id={'a': 0, 'b': 2}, a=2, age=CArray([1,2,3]))
>>> ds_header2 = CDatasetHeader(id={'a': 0, 'b': 2}, b=4, age=CArray([1,2,3]))
>>> ds_merged = ds_header1.append(ds_header2)
>>> ds_merged.age
CArray(6,)(dense: [1 2 3 1 2 3])
>>> ds_merged.id  
{'a': 0, 'b': 2}
>>> ds_merged.a
2
>>> ds_merged.b
4
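
The merge rules described for `append` can be sketched with plain dictionaries, using Python lists as stand-ins for array attributes. This is an illustration only, not secml's implementation: `merge_header_attributes` is a hypothetical name, and raising `ValueError` on conflicting values is an assumption about how the equality requirement is enforced.

```python
def merge_header_attributes(current, other):
    """Hypothetical dict-based sketch; lists stand in for array attributes."""
    merged = dict(current)  # a new mapping, the inputs are left untouched
    for name, value in other.items():
        if name not in merged:
            # Attribute not defined yet in the current header: just set it.
            merged[name] = value
        elif isinstance(merged[name], list) and isinstance(value, list):
            # Array-like attributes are concatenated.
            merged[name] = merged[name] + value
        elif merged[name] != value:
            # Assumption: conflicting non-array values are rejected.
            raise ValueError(
                "value of '{}' differs between the two headers".format(name))
    return merged


header1 = {"id": {"a": 0, "b": 2}, "a": 2, "age": [1, 2, 3]}
header2 = {"id": {"a": 0, "b": 2}, "b": 4, "age": [1, 2, 3]}
merged = merge_header_attributes(header1, header2)
print(merged["age"])  # [1, 2, 3, 1, 2, 3]
print(merged["a"], merged["b"])  # 2 4
```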

property num_samples

The number of samples for which the header defines extra params.

CDatasetPyTorch

data_utils

secml.data.data_utils.label_binarize_onehot(y)[source]

Return dataset labels in one-hot encoding.

Parameters
y : CArray

Array with the labels to encode. Only integer labels are supported.

Returns
binary_labels : CArray

A (num_samples, num_classes) array with the labels one-hot encoded.

Examples

>>> from secml.array import CArray
>>> from secml.data.data_utils import label_binarize_onehot
>>> a = CArray([1,0,2,1])
>>> print(label_binarize_onehot(a))
CArray([[0 1 0]
 [1 0 0]
 [0 0 1]
 [0 1 0]])
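
For readers who want to see the encoding spelled out, here is a plain-Python sketch of one-hot encoding for integer labels. It is not secml's implementation. It assumes non-negative labels and that the columns cover every value from 0 to the largest label, which is consistent with the example above but not stated in the text. `one_hot` is a hypothetical name.

```python
def one_hot(labels):
    """One row per label, with a single 1 in the column of that label."""
    if not all(isinstance(label, int) and label >= 0 for label in labels):
        raise ValueError("this sketch supports non-negative integer labels only")
    num_classes = max(labels) + 1  # assumption: columns cover 0..max(label)
    encoded = []
    for label in labels:
        row = [0] * num_classes
        row[label] = 1
        encoded.append(row)
    return encoded


for row in one_hot([1, 0, 2, 1]):
    print(row)
# [0, 1, 0]
# [1, 0, 0]
# [0, 0, 1]
# [0, 1, 0]
```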