secml.data
CDataset
class secml.data.c_dataset.CDataset(x, y, header=None)
Bases: secml.core.c_creator.CCreator
Creates a new dataset.
A dataset consists of a 2-dimensional array of patterns, in dense or sparse format, with one pattern per row, and a flat dense array holding the label of each pattern.
- Parameters
- x : array_like or CArray
Dataset patterns, one for each row. The array is converted to 2 dimensions before storing.
- y : array_like or CArray
Dataset labels. The array is converted to dense format and flattened before storing.
- header : CDatasetHeader or None, optional
The header for the dataset. Defines any extra parameters. See the CDatasetHeader docs for more information.
Examples
>>> from secml.data import CDataset
>>> ds = CDataset([[1,2],[3,4],[5,6]],[1,0,1])
>>> print(ds.X)
CArray([[1 2]
 [3 4]
 [5 6]])
>>> print(ds.Y)
CArray([1 0 1])

>>> ds = CDataset([1,2,3],1)  # Patterns will be converted to 2-Dims
>>> print(ds.X)
CArray([[1 2 3]])
>>> print(ds.Y)
CArray([1])

>>> from secml.array import CArray
>>> ds = CDataset(CArray([[1,0],[0,4],[1,0]],tosparse=True), CArray([1,0,1],tosparse=True))
>>> print(ds.X)
CArray(  (0, 0)  1
  (1, 1)  4
  (2, 0)  1)
>>> print(ds.Y)
CArray([1 0 1])

The number of labels must be equal to the number of samples.

>>> ds = CDataset([[1,2],[3,4]],1)
Traceback (most recent call last):
  ...
ValueError: number of labels (1) must be equal to the number of samples (2).

>>> from secml.data import CDatasetHeader
>>> ds = CDataset([1,2,3], 1, CDatasetHeader(id='mydataset', age=34))  # 2 extra attributes
>>> print(ds.header.id)
mydataset
>>> print(ds.header.age)
34
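The input handling described above (patterns promoted to 2 dimensions, labels flattened, label count checked against sample count) can be sketched in plain Python. This is only an illustration with nested lists standing in for CArray; the helper name normalize_inputs is hypothetical and is not part of secml.

```python
def normalize_inputs(x, y):
    """Hypothetical helper mirroring the documented input handling."""
    # A flat pattern list such as [1, 2, 3] becomes a single row.
    if not x or not isinstance(x[0], (list, tuple)):
        x = [list(x)]
    else:
        x = [list(row) for row in x]
    # A scalar label becomes a one-element flat list.
    y = list(y) if isinstance(y, (list, tuple)) else [y]
    # One label is required for each pattern (row).
    if len(y) != len(x):
        raise ValueError(
            "number of labels ({}) must be equal to the number "
            "of samples ({}).".format(len(y), len(x)))
    return x, y


print(normalize_inputs([1, 2, 3], 1))
```

Passing two patterns with a single label, as in the failing example above, raises ValueError in this sketch as well.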
- Attributes
X: Dataset patterns.
Y: Dataset labels.
class_type: Defines class type.
classes: Classes (unique).
header: Dataset header.
isdense: True if patterns are stored in dense format, else False.
issparse: True if patterns are stored in sparse format, else False.
logger: Logger for current object.
num_classes: Number of classes.
num_features: Number of features.
num_labels: Number of labels in the dataset.
num_samples: Number of patterns.
verbose: Verbosity level of logger output.
Methods
append(self, dataset): Append input dataset to current dataset.
copy(self): Returns a shallow copy of current class.
create([class_item]): This method creates an instance of a class with given type.
deepcopy(self): Returns a deep copy of current class.
get_bounds(self[, offset]): Return dataset boundaries plus an offset.
get_class_from_type(class_type): Return the class associated with input type.
get_labels_onehot(self): Return dataset labels in one-hot encoding.
get_labels_ovr(self, pos_label): Return dataset labels in one-vs-rest encoding.
get_params(self): Returns the dictionary of class hyperparameters.
get_state(self, **kwargs): Returns the object state dictionary.
get_subclasses(): Get all the subclasses of the calling class.
list_class_types(): This method lists all types of available subclasses of calling one.
load(path): Loads object from file.
load_state(self, path): Sets the object state from file.
save(self, path): Save class object to file.
save_state(self, path, **kwargs): Store the object state to file.
set(self, param_name, param_value[, copy]): Set a parameter of the class.
set_params(self, params_dict[, copy]): Set all parameters passed as a dictionary {key: value}.
set_state(self, state_dict[, copy]): Sets the object state using input dictionary.
timed([msg]): Timer decorator.
todense(self): Convert dataset’s patterns to dense format.
tosparse(self): Convert dataset’s patterns to sparse format.
property X
Dataset patterns.
property Y
Dataset labels.
append(self, dataset)
Append input dataset to current dataset.
- Parameters
- dataset : CDataset
Dataset to append. Patterns are appended along the first axis (axis=0), so the input dataset must have the same number of features as the current dataset (num_features). If the current dataset is sparse, a dense dataset to append is converted to sparse, and vice versa.
- Returns
- CDataset
A new dataset resulting from appending the new data to the existing data. The format of the resulting dataset is the same as the current dataset format.
See also
CArray.append
More information about arrays append.
Notes
Append does not occur in-place: a new dataset is allocated and filled.
Examples
>>> from secml.data import CDataset
>>> ds = CDataset([[1,2],[3,4],[5,6]],[1,0,1])
>>> ds_new = ds.append(CDataset([[10,20],[30,40],[50,60]],[1,0,1]))
>>> print(ds_new.X)
CArray([[ 1  2]
 [ 3  4]
 [ 5  6]
 [10 20]
 [30 40]
 [50 60]])
>>> print(ds_new.Y)
CArray([1 0 1 1 0 1])

>>> ds_new = ds.append(CDataset([[10,20],[30,40],[50,60]],[1,0,1]).tosparse())
>>> print(ds_new.X)
CArray([[ 1  2]
 [ 3  4]
 [ 5  6]
 [10 20]
 [30 40]
 [50 60]])
>>> print(ds_new.Y)
CArray([1 0 1 1 0 1])
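The non-in-place behaviour noted above can be illustrated with a small plain-Python sketch, where nested lists stand in for the patterns array. The helper name append_rows is hypothetical, and raising ValueError on a feature-count mismatch is an assumption; the text above only states that the feature counts must match.

```python
def append_rows(x_cur, y_cur, x_new, y_new):
    """Hypothetical sketch: stack rows on axis 0 and return new containers."""
    # Assumed behaviour: reject inputs whose rows have a different width.
    if x_cur and x_new and len(x_cur[0]) != len(x_new[0]):
        raise ValueError("number of features must match")
    # Build fresh lists so neither input is modified.
    x_out = [list(row) for row in x_cur] + [list(row) for row in x_new]
    y_out = list(y_cur) + list(y_new)
    return x_out, y_out


x, y = [[1, 2], [3, 4]], [1, 0]
x_all, y_all = append_rows(x, y, [[10, 20]], [1])
print(x_all, y_all)
print(x, y)  # the original containers are left unchanged
```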
property classes
Classes (unique).
get_bounds(self, offset=0.0)
Return dataset boundaries plus an offset.
- Parameters
- offset : float
Quantity to be added as an offset. Default 0.
- Returns
- boundary : list of tuple
Each tuple contains the min and max feature value, plus an offset, for the corresponding coordinate.
Examples
>>> from secml.array import CArray
>>> from secml.data import CDataset

>>> ds = CDataset([[1,2,3],[4,5,6]], [1,2])
>>> ds.get_bounds()
[(1.0, 4.0), (2.0, 5.0), (3.0, 6.0)]
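A per-feature bounds computation can be sketched in plain Python as follows. The helper name feature_bounds is hypothetical, and the sketch assumes the offset widens each interval (subtracted from the minimum, added to the maximum); the description above only says "plus an offset", so check the library source before relying on this for a non-zero offset.

```python
def feature_bounds(rows, offset=0.0):
    """Hypothetical sketch: (min, max) per column, widened by offset."""
    # zip(*rows) iterates over columns, i.e. one feature at a time.
    return [(float(min(col)) - offset, float(max(col)) + offset)
            for col in zip(*rows)]


print(feature_bounds([[1, 2, 3], [4, 5, 6]]))
# [(1.0, 4.0), (2.0, 5.0), (3.0, 6.0)]
```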
get_labels_onehot(self)
Return dataset labels in one-hot encoding.
- Returns
- binary_labels : CArray
A (num_samples, num_classes) array with the dataset labels one-hot encoded.
Examples
>>> from secml.data import CDataset
>>> ds = CDataset([[11,22],[33,44],[55,66],[77,88]], [1,0,2,1])
>>> print(ds.get_labels_onehot())
CArray([[0 1 0]
 [1 0 0]
 [0 0 1]
 [0 1 0]])
get_labels_ovr(self, pos_label)
Return dataset labels in one-vs-rest encoding.
- Parameters
- pos_label : scalar or str
Label of the class to consider as positive.
- Returns
- CArray
Flat array with 1 where the class label equals pos_label and 0 elsewhere.
Examples
>>> from secml.data import CDataset
>>> ds = CDataset([[11,22],[33,44],[55,66],[77,88]], [1,0,2,1])
>>> print(ds.get_labels_ovr(2))
CArray([0 0 1 0])
>>> print(ds.get_labels_ovr(1))
CArray([1 0 0 1])
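The one-vs-rest encoding reduces to an element-wise comparison against the positive label. A minimal plain-Python sketch, with a list standing in for the label array and a hypothetical helper name labels_ovr:

```python
def labels_ovr(labels, pos_label):
    """Hypothetical sketch: 1 where the label equals pos_label, else 0."""
    return [1 if label == pos_label else 0 for label in labels]


print(labels_ovr([1, 0, 2, 1], 2))  # [0, 0, 1, 0]
print(labels_ovr([1, 0, 2, 1], 1))  # [1, 0, 0, 1]
```

The same comparison works for string labels, which matches the "scalar or str" type accepted for pos_label.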
property header
Dataset header.
property isdense
True if patterns are stored in dense format, else False.
property issparse
True if patterns are stored in sparse format, else False.
property num_classes
Number of classes.
property num_features
Number of features.
property num_labels
Number of labels in the dataset.
property num_samples
Number of patterns.
todense(self)
Convert dataset’s patterns to dense format.
- Returns
- CDataset
A new CDataset with the same patterns converted to dense format. Copy is avoided if possible.
Examples
>>> from secml.array import CArray
>>> from secml.data import CDataset

>>> ds = CDataset(CArray([[1,2],[3,4],[5,6]], tosparse=True),[1,0,1]).todense()
>>> print(ds.X)
CArray([[1 2]
 [3 4]
 [5 6]])
tosparse(self)
Convert dataset’s patterns to sparse format.
- Returns
- CDataset
A new CDataset with the same patterns converted to sparse format. Copy is avoided if possible.
Examples
>>> from secml.data import CDataset
>>> ds = CDataset([[1,2],[3,4],[5,6]],[1,0,1]).tosparse()
>>> print(ds.X)
CArray(  (0, 0)  1
  (0, 1)  2
  (1, 0)  3
  (1, 1)  4
  (2, 0)  5
  (2, 1)  6)
>>> print(ds.Y)
CArray([1 0 1])
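The sparse printout above lists one "(row, column) value" entry per stored element. The relationship between the two formats can be sketched in plain Python, with a dictionary of coordinates standing in for the sparse storage. The helper names to_coordinates and to_rows are hypothetical, and this says nothing about the storage scheme secml actually uses internally.

```python
def to_coordinates(rows):
    """Hypothetical sketch: keep only non-zero entries, keyed by (row, col)."""
    return {(i, j): value
            for i, row in enumerate(rows)
            for j, value in enumerate(row)
            if value != 0}


def to_rows(coords, shape):
    """Hypothetical sketch: rebuild dense rows, filling the gaps with zeros."""
    n_rows, n_cols = shape
    rows = [[0] * n_cols for _ in range(n_rows)]
    for (i, j), value in coords.items():
        rows[i][j] = value
    return rows


dense = [[1, 0], [0, 4], [1, 0]]
coords = to_coordinates(dense)
print(coords)                  # {(0, 0): 1, (1, 1): 4, (2, 0): 1}
print(to_rows(coords, (3, 2)))  # [[1, 0], [0, 4], [1, 0]]
```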
CDatasetHeader
class secml.data.c_dataset_header.CDatasetHeader(**kwargs)
Bases: secml.core.c_creator.CCreator
Creates a new dataset header.
Parameters to be included in the header can be defined as keyword init arguments or by setting them as new public header attributes.
When the header is indexed, non-array attributes (scalar, string, tuple, dictionary) are returned unchanged, while array attributes are indexed and the result of the indexing is returned.
To extract a dictionary with the entire set of attributes, use .get_params().
- Parameters
- kwargs : any, optional
Any extra attribute of the dataset. Can be a non-array object (scalar, tuple, dict, str) or a vector-like CArray. Lists are automatically converted to vector-like CArrays.
Examples
>>> from secml.data import CDatasetHeader
>>> from secml.array import CArray

>>> ds_header = CDatasetHeader(id='mydataset', colors=CArray([1,2,3]))

>>> print(ds_header.id)
mydataset
>>> print(ds_header.colors)
CArray([1 2 3])

>>> ds_header.age = 32
>>> print(ds_header.age)
32
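The indexing rule described above (array attributes are indexed, every other attribute is passed through as is) can be sketched in plain Python. Here a dictionary stands in for the header and lists stand in for vector-like CArrays; the helper name index_header is hypothetical and not part of secml.

```python
def index_header(attrs, idx):
    """Hypothetical sketch of indexing a header with a list of positions."""
    out = {}
    for name, value in attrs.items():
        if isinstance(value, list):
            # Array-like attribute: keep only the selected positions.
            out[name] = [value[i] for i in idx]
        else:
            # Any other attribute is returned unchanged.
            out[name] = value
    return out


header = {"id": "mydataset", "colors": [1, 2, 3]}
print(index_header(header, [0, 2]))
# {'id': 'mydataset', 'colors': [1, 3]}
```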
- Attributes
class_type: Defines class type.
logger: Logger for current object.
num_samples: The number of samples for which the header defines extra params.
verbose: Verbosity level of logger output.
Methods
append(self, header): Append input header to current header.
copy(self): Returns a shallow copy of current class.
create([class_item]): This method creates an instance of a class with given type.
deepcopy(self): Returns a deep copy of current class.
get_class_from_type(class_type): Return the class associated with input type.
get_params(self): Returns the dictionary of class hyperparameters.
get_state(self, **kwargs): Returns the object state dictionary.
get_subclasses(): Get all the subclasses of the calling class.
list_class_types(): This method lists all types of available subclasses of calling one.
load(path): Loads object from file.
load_state(self, path): Sets the object state from file.
save(self, path): Save class object to file.
save_state(self, path, **kwargs): Store the object state to file.
set(self, param_name, param_value[, copy]): Set a parameter of the class.
set_params(self, params_dict[, copy]): Set all parameters passed as a dictionary {key: value}.
set_state(self, state_dict[, copy]): Sets the object state using input dictionary.
timed([msg]): Timer decorator.
append(self, header)
Append input header to current header.
- Parameters
- header : CDatasetHeader
Header to append. Only array attributes are merged. Any other attribute is set if it is not already defined in the current header; if it is already defined, its value in the input header is expected to equal its value in the current header.
- Returns
- CDatasetHeader
See also
CArray.append
More information about arrays append.
Notes
Append does not occur in-place: a new header is allocated and filled.
Examples
>>> from secml.data import CDatasetHeader
>>> from secml.array import CArray

>>> ds_header1 = CDatasetHeader(id={'a': 0, 'b': 2}, a=2, age=CArray([1,2,3]))
>>> ds_header2 = CDatasetHeader(id={'a': 0, 'b': 2}, b=4, age=CArray([1,2,3]))

>>> ds_merged = ds_header1.append(ds_header2)
>>> ds_merged.age
CArray(6,)(dense: [1 2 3 1 2 3])
>>> ds_merged.id
{'a': 0, 'b': 2}
>>> ds_merged.a
2
>>> ds_merged.b
4
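The merge rules stated in the Parameters section can be sketched in plain Python, with dictionaries standing in for headers and lists standing in for CArrays. The helper name merge_headers is hypothetical, and raising ValueError when a shared non-array attribute differs is an assumption; the text above only says the values should be equal.

```python
def merge_headers(current, other):
    """Hypothetical sketch of the documented header append rules."""
    merged = dict(current)  # new container: the inputs are not modified
    for name, value in other.items():
        if name not in merged:
            # Attribute only defined in the input header: set it.
            merged[name] = value
        elif isinstance(value, list) and isinstance(merged[name], list):
            # Array attributes are concatenated into a new list.
            merged[name] = merged[name] + value
        elif merged[name] != value:
            # Assumed behaviour for conflicting non-array attributes.
            raise ValueError("attribute {!r} differs".format(name))
    return merged


h1 = {"id": {"a": 0, "b": 2}, "a": 2, "age": [1, 2, 3]}
h2 = {"id": {"a": 0, "b": 2}, "b": 4, "age": [1, 2, 3]}
print(merge_headers(h1, h2))
```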
property num_samples
The number of samples for which the header defines extra params.
CDatasetPyTorch
class secml.data.c_dataset_pytorch.CDatasetPyTorch(data, labels=None, transform=None)
Bases: torch.utils.data.Dataset
CDataset to PyTorch Dataset wrapper.
- Parameters
- data : CDataset or CArray
Dataset to be wrapped. Can also be a CArray with the samples, in which case the labels can be passed using the labels parameter.
- labels : None or CArray
Labels of the dataset. Can be defined if the samples have been passed to the data parameter. Input must be a flat array of shape (num_samples,) or a 2-D array with shape (num_samples, num_classes).
- transform : torchvision.transforms or None, optional
Transformation(s) to be applied to each dataset sample.
- Attributes
- X
- Y
Methods
__call__(self, *args, **kwargs): Call self as a function.
property X
property Y
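The wrapper subclasses torch.utils.data.Dataset, whose map-style interface is built on __getitem__ and __len__. The sketch below imitates that interface in plain Python without importing torch. The class name WrappedSamples, the (sample, label) return pair, and returning None for a missing label are assumptions made for illustration; they are not taken from CDatasetPyTorch.

```python
class WrappedSamples:
    """Hypothetical map-style wrapper over samples and optional labels."""

    def __init__(self, data, labels=None, transform=None):
        self.data = data
        self.labels = labels
        self.transform = transform

    def __len__(self):
        # One item per sample.
        return len(self.data)

    def __getitem__(self, index):
        sample = self.data[index]
        # Apply the optional per-sample transformation.
        if self.transform is not None:
            sample = self.transform(sample)
        # Assumed convention: None when no labels were given.
        label = None if self.labels is None else self.labels[index]
        return sample, label


ds = WrappedSamples([[1, 2], [3, 4]], labels=[1, 0],
                    transform=lambda s: [2 * v for v in s])
print(len(ds), ds[1])
```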
data_utils
secml.data.data_utils.label_binarize_onehot(y)
Return dataset labels in one-hot encoding.
- Parameters
- y : CArray
Array with the labels to encode. Only integer labels are supported.
- Returns
- binary_labels : CArray
A (num_samples, num_classes) array with the labels one-hot encoded.
Examples
>>> from secml.array import CArray
>>> from secml.data.data_utils import label_binarize_onehot
>>> a = CArray([1,0,2,1])
>>> print(label_binarize_onehot(a))
CArray([[0 1 0]
 [1 0 0]
 [0 0 1]
 [0 1 0]])
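One-hot encoding of integer labels can be sketched in plain Python as follows. The helper name onehot is hypothetical, and the sketch assumes that the number of columns is the largest label plus one, so that label k sets column k; this matches the example above but is not confirmed by the text for label sets with gaps.

```python
def onehot(labels):
    """Hypothetical sketch: one row per label, a single 1 in column label."""
    for label in labels:
        # Only non-negative integer labels make sense as column indices.
        if not isinstance(label, int) or isinstance(label, bool) or label < 0:
            raise ValueError("only non-negative integer labels are supported")
    n_classes = max(labels) + 1  # assumed column count
    rows = []
    for label in labels:
        row = [0] * n_classes
        row[label] = 1
        rows.append(row)
    return rows


print(onehot([1, 0, 2, 1]))
# [[0, 1, 0], [1, 0, 0], [0, 0, 1], [0, 1, 0]]
```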