CSV Importer
class pruneabletree.csv_importer.CsvImporter(encoding='utf-8', sep=', ', dtype=None, na_values=None, class_index=-1, missing_threshold=0.75)
Transform a CSV document into a numpy matrix that is ready for use by decision tree classifiers. Instances with missing values are removed, one-hot encoding is applied to all non-numeric columns, and the class column is processed with a label encoder.
Parameters: - encoding : str, default ‘utf-8’
The encoding used to decode the input file.
- sep : str, default ‘,’
Delimiter to use.
- dtype : Type name or dict of column -> type, default None
Data type for data or columns. E.g. {‘a’: np.float64, ‘b’: np.int32}. Use str or object together with suitable na_values settings to preserve and not interpret dtype.
- na_values : scalar, str, list-like, or dict, default None
Additional strings to recognize as NA/NaN. If dict passed, specific per-column NA values. The following values are always interpreted as NaN: ‘’, ‘#N/A’, ‘#N/A N/A’, ‘#NA’, ‘-1.#IND’, ‘-1.#QNAN’, ‘-NaN’, ‘-nan’, ‘1.#IND’, ‘1.#QNAN’, ‘N/A’, ‘NA’, ‘NULL’, ‘NaN’, ‘n/a’, ‘nan’, ‘null’.
- class_index : int, default -1 (i.e., the last column)
Column index of the class attribute. This column is not present in the transform output; it is kept separately in the y attribute of this transformer. Multi-output scenarios are not supported.
- missing_threshold : float (fraction between 0 and 1), default 0.75
The minimum fraction of the data that must remain after instances with missing values are removed. If less remains, a warning is raised.
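The interaction between missing-value removal and missing_threshold can be sketched in plain Python. This is only an illustration under stated assumptions, not the importer's actual implementation: the helper drop_incomplete_rows and the NA_TOKENS set are hypothetical names, the token set is just a subset of the values listed under na_values, and the threshold is assumed to be compared against the fraction of rows that remain.

```python
import csv
import io
import warnings

# Hypothetical subset of the strings treated as missing (see na_values above).
NA_TOKENS = {"", "NA", "N/A", "NaN", "nan", "NULL", "null"}


def drop_incomplete_rows(rows, missing_threshold=0.75):
    """Drop rows that contain a missing value; warn if too little data remains."""
    complete = [
        row for row in rows
        if not any(cell.strip() in NA_TOKENS for cell in row)
    ]
    # Fraction of the original rows that survived the removal step.
    remaining = len(complete) / len(rows) if rows else 1.0
    if remaining < missing_threshold:
        warnings.warn(
            f"Only {remaining:.0%} of the instances remain "
            "after removing instances with missing values."
        )
    return complete, remaining


# Four instances, two of which have a missing value.
text = "5.1,3.5,setosa\n4.9,NA,setosa\n6.2,2.9,versicolor\n,3.0,virginica\n"
rows = list(csv.reader(io.StringIO(text)))

# Capture warnings so the example can inspect them instead of printing them.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    complete, remaining = drop_incomplete_rows(rows)
```

Here half of the rows are dropped, so the remaining fraction falls below the default threshold of 0.75 and a warning is issued; the data itself is still returned.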
Attributes: - y : numpy array, [n_samples]
Data extracted from the CSV based on the given class_index and then encoded. This data is not returned by transform, but saved here instead.
- original_y : numpy array, [n_samples]
Same as y, but before encoding.
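The difference between original_y and y, and the one-hot encoding applied to non-numeric feature columns, can be illustrated with a small stdlib-only sketch. The functions label_encode and one_hot_encode are hypothetical helpers written for this example; they assume that labels and categories are ordered by sorting, which may differ from what the importer does internally.

```python
def label_encode(values):
    """Map each distinct label to an integer, using sorted label order."""
    classes = sorted(set(values))
    index = {label: i for i, label in enumerate(classes)}
    return [index[v] for v in values], classes


def one_hot_encode(values):
    """Expand one categorical column into one 0/1 column per category."""
    categories = sorted(set(values))
    encoded = [[1 if v == c else 0 for c in categories] for v in values]
    return encoded, categories


# Class column: original_y holds the raw labels, y the encoded integers.
original_y = ["setosa", "versicolor", "setosa", "virginica"]
y, classes = label_encode(original_y)

# A non-numeric feature column becomes three indicator columns.
colour = ["red", "green", "red", "blue"]
encoded, categories = one_hot_encode(colour)
```

With this ordering, y is a list of integer codes that index into classes, and each row of encoded contains exactly one 1 in the position of its category.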
Methods
- fit(csv_file[, y]): Extract data from the given CSV file.
- fit_transform(csv_file[, y]): Extract data from the given CSV file and return it as a numpy matrix.
- fit_transform_both(csv_file): Extract data from the given CSV file and return it as a numpy matrix.
- get_params([deep]): Get parameters for this estimator.
- set_params(**params): Set the parameters of this estimator.
- transform(csv_file): Extract data from the given CSV file and return it as a numpy matrix.
fit(csv_file, y=None)
Extract data from the given CSV file.
Parameters: - csv_file : string
File path to CSV file.
Returns: - self
fit_transform(csv_file, y=None)
Extract data from the given CSV file and return it as a numpy matrix.
This is equivalent to fit followed by transform, but more efficiently implemented.
Parameters: - csv_file : string
File path to CSV file.
Returns: - X : numpy matrix, [n_samples, n_features]
Extracted data.
fit_transform_both(csv_file)
Extract data from the given CSV file and return it as a numpy matrix. Also returns the encoded class values at the same time.
Parameters: - csv_file : string
File path to CSV file.
Returns: - X : numpy matrix, [n_samples, n_features]
Extracted data.
- y : numpy array, [n_samples]
Data extracted from the CSV based on the given class_index and then encoded.
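How class_index separates the class column from the features, so that both X and y can be returned together, can be sketched as follows. This is a simplified illustration using plain lists rather than numpy arrays; split_class_column is a hypothetical helper, not part of pruneabletree, and it performs no encoding.

```python
def split_class_column(rows, class_index=-1):
    """Return (X, y): the rows without the class column, and the class column."""
    X, y = [], []
    for row in rows:
        # Normalise negative indices so that -1 means the last column.
        i = class_index % len(row)
        y.append(row[i])
        X.append(row[:i] + row[i + 1:])
    return X, y


rows = [[5.1, 3.5, "setosa"], [6.2, 2.9, "versicolor"]]

# Default: the last column is the class attribute.
X, y = split_class_column(rows)

# The class attribute can equally be the first column.
X_first, y_first = split_class_column(rows, class_index=0)
```

In both calls the class column is absent from the returned feature rows, which mirrors the description of class_index above.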