Data Table (table)

class Orange.data.Table(*args, **kwargs)

Stores data instances as a set of 2d tables representing the independent variables (attributes, features) and dependent variables (classes, targets), and the corresponding weights and meta attributes.

The data is stored in 2d numpy arrays X, Y, W, metas. The arrays may be dense or sparse. All arrays have the same number of rows. If certain data is missing, the corresponding array has zero columns.

Arrays can be of any type; default is float (that is, double precision). Values of discrete variables are stored as whole numbers. Arrays for meta attributes usually contain instances of object.

The table also stores the associated information about the variables as an instance of Domain. The number of columns must match the corresponding number of variables in the description.
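The storage layout described above can be pictured with plain NumPy arrays. The following is only a sketch of the convention, not a call into Orange; the array contents and the `gender_values` list are made up for illustration.

```python
import numpy as np

n_rows = 3

# Two attributes: a continuous one and a discrete one ("gender").
# Discrete values are stored as whole numbers that index a list of values.
gender_values = ["female", "male"]      # hypothetical value list
X = np.array([[31.0, 0.0],
              [45.0, 1.0],
              [27.0, 0.0]])

# One class (target) column.
Y = np.array([[1.0], [0.0], [1.0]])

# No weights: the array keeps the row count but has zero columns.
W = np.empty((n_rows, 0))

# Meta attributes usually hold arbitrary Python objects, e.g. strings.
metas = np.array([["Ann"], ["Bob"], ["Cid"]], dtype=object)

# All four arrays agree on the number of rows.
assert X.shape[0] == Y.shape[0] == W.shape[0] == metas.shape[0] == n_rows

# Decoding a discrete value: cast the stored float to int and look it up.
decoded = [gender_values[int(v)] for v in X[:, 1]]
```

A zero-column array such as `W` here still answers `W.shape[0]`, which is why "missing" parts can share the row count with the rest.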

There are multiple ways to get values or entire rows of the table.

  • The index can be an int, e.g. table[7]; the corresponding row is returned as an instance of RowInstance.
  • The index can be a slice or a sequence of ints (e.g. table[7:10] or table[[7, 42, 15]]); indexing returns a new data table with the selected rows.
  • If there are two indices, where the first is an int (a row number) and the second can be interpreted as a column, e.g. table[3, 5] or table[3, 'gender'] or table[3, y] (where y is an instance of Variable), a single value is returned as an instance of Value.
  • In all other cases, the first index should be a row index, a slice or a sequence, and the second index, which represents a set of columns, should be an int, a slice, a sequence or a numpy array. The result is a new table with a new domain.

Rules for setting the data are as follows.

  • If there is a single index (an int, slice, or a sequence of row indices) and the value being set is a single scalar, all attributes (not including the classes) are set to that value. That is, table[r] = v is equivalent to table.X[r] = v.
  • If there is a single index and the value is a data instance (Orange.data.Instance), it is converted into the table’s domain and set to the corresponding rows.
  • The final option for a single index is a value that is a sequence whose length equals the number of attributes and target variables. The corresponding rows are set; meta attributes are set to unknowns.
  • For two indices, the row can again be given as a single int, a slice or a sequence of indices. Column indices can be a single int, str or Orange.data.Variable, a sequence of them, a slice or any iterable. The value can be a single value, or a sequence of appropriate length.
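The first rule states that table[r] = v with a scalar v is equivalent to table.X[r] = v. The NumPy sketch below shows what such an assignment does to an attribute array while the class array stays untouched. It works on stand-in arrays `X` and `Y`, not on an Orange table.

```python
import numpy as np

X = np.arange(12, dtype=float).reshape(4, 3)   # 4 rows, 3 attributes
Y = np.array([[0.0], [1.0], [0.0], [1.0]])     # 1 class column
Y_before = Y.copy()

X[1] = 0.0          # single int: every attribute in row 1
X[2:4] = -1.0       # slice: every attribute in rows 2 and 3
X[[0, 1]] = 9.0     # sequence of row indices: rows 0 and 1

# The class column was never touched by any of the assignments.
classes_unchanged = np.array_equal(Y, Y_before)
```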
domain

Description of the variables corresponding to the table’s columns. The domain is used for determining the variable types, printing the data in human-readable form, conversions between data tables and similar.

columns

A class whose attributes contain attribute descriptors for columns. For a table table, setting c = table.columns allows accessing the table’s variables as, for instance, c.gender, c.age, etc. Spaces in variable names are replaced with underscores.

Constructors

The preferred way to construct a table is to invoke a named constructor.

classmethod Table.from_domain(domain, n_rows=0, weights=False)

Construct a new Table with the given number of rows for the given domain. If weights is True, a vector of weights is created and initialized to ones.

Parameters:
  • domain (Orange.data.Domain) – domain for the Table
  • n_rows (int) – number of rows in the new table
  • weights (bool) – indicates whether to construct a vector of weights
Returns:

a new table

Return type:

Orange.data.Table

classmethod Table.from_table(domain, source, row_indices=Ellipsis)

Create a new table from selected columns and/or rows of an existing one. The columns are chosen using a domain. The domain may also include variables that do not appear in the source table; they are computed from source variables if possible.

The resulting data may be a view or a copy of the existing data.

Parameters:
  • domain (Orange.data.Domain) – the domain for the new table
  • source (Orange.data.Table) – the source table
  • row_indices (a slice or a sequence) – indices of the rows to include
Returns:

a new table

Return type:

Orange.data.Table

classmethod Table.from_table_rows(source, row_indices)

Construct a new table by selecting rows from the source table.

Parameters:
  • source (Orange.data.Table) – an existing table
  • row_indices (a slice or a sequence) – indices of the rows to include
Returns:

a new table

Return type:

Orange.data.Table

classmethod Table.from_numpy(domain, X, Y=None, metas=None, W=None)

Construct a table from numpy arrays with the given domain. The number of variables in the domain must match the number of columns in the corresponding arrays. All arrays must have the same number of rows. Arrays may be of different numpy types, and may be dense or sparse.

Parameters:
  • domain (Orange.data.Domain) – the domain for the new table
  • X (np.array) – array with attribute values
  • Y (np.array) – array with class values
  • metas (np.array) – array with meta attributes
  • W (np.array) – array with weights
Returns:

a new table

Return type:

Orange.data.Table

classmethod Table.from_file(filename, sheet=None)

Read a data table from a file. The path can be absolute or relative.

Parameters:
  • filename (str) – File name
  • sheet (str) – Sheet in a file (optional)
Returns:

a new data table

Return type:

Orange.data.Table

Inspection

Table.is_view()

Return True if all arrays represent a view referring to another table.

Table.is_copy()

Return True if the table owns its data.

Table.ensure_copy()

Ensure that the table owns its data; copy arrays when necessary.
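The view/copy distinction comes from NumPy. The sketch below uses plain arrays to show the difference and what "copy arrays when necessary" amounts to. It assumes, without confirmation from the text above, that the table methods reflect this kind of array ownership; it does not call Orange.

```python
import numpy as np

X = np.arange(12, dtype=float).reshape(4, 3)

view = X[1:3]          # basic slicing returns a view onto X's memory
copy = X[[1, 2]]       # indexing with a sequence returns an independent copy

shares_view = np.shares_memory(X, view)
shares_copy = np.shares_memory(X, copy)

view[0, 0] = -1.0      # writes through to X[1, 0]
copy[0, 1] = -2.0      # leaves X unchanged

owned = np.array(view) # an explicit copy no longer shares memory with X
```

Writing through a view changes the source data, which is the usual reason to make sure the data is owned before modifying it in place.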

Table.has_missing()

Return True if there are any missing attribute or class values.

Table.has_missing_class()

Return True if there are any missing class values.
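The two checks differ only in which arrays they inspect. The NumPy sketch below assumes that missing values are stored as NaN in the float arrays, which the text above does not state; it uses stand-in arrays rather than an Orange table.

```python
import numpy as np

# Assumption: an unknown value is represented by NaN.
X = np.array([[1.0, np.nan], [2.0, 3.0]])   # one missing attribute value
Y = np.array([[0.0], [1.0]])                # no missing class values

missing_class = bool(np.isnan(Y).any())
missing_any = bool(np.isnan(X).any()) or missing_class
```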

Table.checksum(include_metas=True)

Return a checksum over X, Y, metas and W.

Row manipulation

Table.append(instance)

Append a data instance to the table.

Parameters:
  • instance (Orange.data.Instance or a sequence of values) – a data instance

Table.extend(instances)

Extend the table with the given instances. The instances can be given as a table of the same or a different domain, or as a sequence. In the latter case, each instance can be given as an Instance or a sequence of values (e.g. list, tuple, numpy.array).

Parameters:
  • instances (Orange.data.Table or a sequence of instances) – additional instances

Table.insert(row, instance)

Insert a data instance into the table.

Parameters:
  • row (int) – row index
  • instance (Orange.data.Instance or a sequence of values) – a data instance

Table.clear()

Remove all rows from the table.

Table.shuffle()

Randomly shuffle the rows of the table.

Weights

Table.has_weights()

Return True if the data instances are weighted.

Table.set_weights(weight=1)

Set weights of data instances; create a vector of weights if necessary.

Table.total_weight()

Return the total weight of instances in the table, or their number if they are unweighted.
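The rule for the total weight can be sketched with NumPy. The helper `total_weight` below is a hypothetical stand-in written for this example, not Orange's method; it assumes that an unweighted table is one whose weight array holds no values.

```python
import numpy as np

def total_weight(W, n_rows):
    """Sum of weights, or the row count when no weights are stored."""
    W = np.asarray(W, dtype=float)
    return float(W.sum()) if W.size else float(n_rows)

n_rows = 4
unweighted = np.empty((n_rows, 0))      # zero columns: no weights stored
weighted = np.full(n_rows, 1.0)         # a weight vector filled with ones
weighted[0] = 2.5                       # one instance counts more

total_unweighted = total_weight(unweighted, n_rows)
total_weighted = total_weight(weighted, n_rows)
```

With all weights equal to 1 the two cases agree, so code that sums weights behaves the same on weighted and unweighted data.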