CSV Importer

class pruneabletree.csv_importer.CsvImporter(encoding='utf-8', sep=', ', dtype=None, na_values=None, class_index=-1, missing_threshold=0.75)

Transform a CSV document into a numpy matrix that is ready for use by decision tree classifiers. To that end, instances with missing values are removed and one-hot encoding is applied to all non-numeric columns. The class column is processed with a label encoder.
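
The processing described above can be sketched with the standard library alone. This is only an illustration of the three steps (dropping incomplete instances, one-hot encoding non-numeric columns, label-encoding the class column), not the CsvImporter implementation. The helper names `is_number` and `import_csv`, the reduced `na_values` tuple, the sample data, and the use of plain lists instead of a numpy matrix are all assumptions made for the example.

```python
import csv
import io


def is_number(text):
    """Return True if the text parses as a float."""
    try:
        float(text)
        return True
    except ValueError:
        return False


def import_csv(text, class_index=-1, na_values=("", "NA", "NaN")):
    """Hypothetical stand-in for the documented behaviour (not the real class)."""
    rows = list(csv.reader(io.StringIO(text)))
    header, body = rows[0], rows[1:]

    # 1. Remove instances that contain at least one missing value.
    body = [row for row in body if not any(v in na_values for v in row)]

    # 2. Separate the class column from the feature columns.
    class_col = class_index % len(header)
    labels = [row[class_col] for row in body]
    feature_cols = [i for i in range(len(header)) if i != class_col]

    # 3. Numeric columns are kept as floats; every other column is
    #    expanded into one 0/1 column per distinct value (one-hot).
    X = [[] for _ in body]
    names = []
    for i in feature_cols:
        values = [row[i] for row in body]
        if all(is_number(v) for v in values):
            names.append(header[i])
            for out_row, v in zip(X, values):
                out_row.append(float(v))
        else:
            for category in sorted(set(values)):
                names.append(f"{header[i]}={category}")
                for out_row, v in zip(X, values):
                    out_row.append(1.0 if v == category else 0.0)

    # 4. Label-encode the class column (sorted distinct labels -> 0..k-1).
    classes = sorted(set(labels))
    y = [classes.index(label) for label in labels]
    return names, X, y


SAMPLE = """age,colour,label
30,red,yes
,blue,no
41,blue,no
25,red,yes
"""

names, X, y = import_csv(SAMPLE)
print(names)  # feature names after one-hot expansion
print(X)      # the second instance is gone because its age is missing
print(y)      # encoded class values, kept apart from X
```

The class column never appears in `X`, which mirrors how the transformer keeps it separately in its y attribute.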

Parameters:

encoding : str, default ‘utf-8’: The encoding used to decode the input file.
sep : str, default ‘,’: Delimiter to use.
dtype : Type name or dict of column -> type, default None: Data type for data or columns. E.g. {‘a’: np.float64, ‘b’: np.int32}. Use str or object together with suitable na_values settings to preserve and not interpret dtype.
na_values : scalar, str, list-like, or dict, default None: Additional strings to recognize as NA/NaN. If dict passed, specific per-column NA values. The following values are always interpreted as NaN: ‘’, ‘#N/A’, ‘#N/A N/A’, ‘#NA’, ‘-1.#IND’, ‘-1.#QNAN’, ‘-NaN’, ‘-nan’, ‘1.#IND’, ‘1.#QNAN’, ‘N/A’, ‘NA’, ‘NULL’, ‘NaN’, ‘n/a’, ‘nan’, ‘null’.
class_index : int, default -1 (i.e., the last column): Column index of the class attribute. This column will not be present in the transform output, but will be kept separately in the y attribute of this transformer. Multi-output scenarios are not supported.
missing_threshold : float (fraction), default 0.75: Minimum share of the data that must remain after instances with missing values are removed. If less remains, a warning is raised.
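
As a rough illustration of the missing_threshold check, the sketch below compares the share of surviving instances against the threshold and emits a warning when it falls short. The function name `check_remaining`, the choice of `UserWarning`, and the assumption that the share is measured in instances (rows) are not taken from the library; they are placeholders for the example.

```python
import warnings


def check_remaining(n_before, n_after, missing_threshold=0.75):
    """Warn if too little data survives the removal of incomplete instances."""
    remaining = n_after / n_before if n_before else 0.0
    if remaining < missing_threshold:
        warnings.warn(
            f"Only {remaining:.0%} of the instances remain after removing "
            f"missing values (threshold: {missing_threshold:.0%}).",
            UserWarning,
        )
    return remaining


# Record warnings so the effect of the threshold is visible.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    check_remaining(100, 80)  # 0.80 is not below 0.75: no warning
    check_remaining(100, 60)  # 0.60 is below 0.75: one warning

for w in caught:
    print(w.message)
```

With the default of 0.75, a file can lose up to a quarter of its instances before a warning appears.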

Attributes:

y : numpy array, [n_samples]: Data extracted from the CSV based on the given class_index and then encoded. This data is not returned by transform, but saved here instead.
original_y : numpy array, [n_samples]: Same as y, but before encoding.
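
The relationship between original_y and y can be pictured as a label encoding round trip. The sketch below assumes the encoder assigns integers to the sorted distinct labels, which is how scikit-learn's LabelEncoder behaves; whether CsvImporter uses that exact encoder is an assumption, and `label_encode` and the sample labels are hypothetical.

```python
def label_encode(labels):
    """Map each distinct label to an integer, in sorted label order."""
    classes = sorted(set(labels))
    index = {label: i for i, label in enumerate(classes)}
    return [index[label] for label in labels], classes


original_y = ["spam", "ham", "spam", "eggs"]  # class values before encoding
y, classes = label_encode(original_y)         # class values after encoding

# The encoded values can be mapped back through the list of classes.
decoded = [classes[i] for i in y]
print(y, classes, decoded)
```

Keeping the unencoded values around makes it possible to report predictions in terms of the original class names.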

Methods

`fit`(csv_file[, y])	Extract data from the given CSV file.
`fit_transform`(csv_file[, y])	Extract data from the given CSV file and return it as a numpy matrix.
`fit_transform_both`(csv_file)	Extract data from the given CSV file and return it as a numpy matrix, together with the encoded class values.
`get_params`([deep])	Get parameters for this estimator.
`set_params`(**params)	Set the parameters of this estimator.
`transform`(csv_file)	Extract data from the given CSV file and return it as a numpy matrix.

fit(csv_file, y=None)

Extract data from the given CSV file.

Parameters:	csv_file : string: File path to the CSV file.
Returns:	self

fit_transform(csv_file, y=None)

Extract data from the given CSV file and return it as a numpy matrix.

This is equivalent to fit followed by transform, but more efficiently implemented.

Parameters:	csv_file : string: File path to the CSV file.
Returns:	X : numpy matrix, [n_samples, n_features]: Extracted data.

fit_transform_both(csv_file)

Extract data from the given CSV file and return it as a numpy matrix. Also returns the encoded class values at the same time.

Parameters:	csv_file : string: File path to the CSV file.
Returns:	X : numpy matrix, [n_samples, n_features]: Extracted data.
	y : numpy array, [n_samples]: Data extracted from the CSV based on the given class_index and then encoded.

transform(csv_file)

Extract data from the given CSV file and return it as a numpy matrix.

Parameters:	csv_file : string: File path to the CSV file.
Returns:	X : numpy matrix, [n_samples, n_features]: Extracted data.