mip_dmp.process.mapping module
Module that provides functions to support mapping datasets to a specific Common Data Elements (CDEs) metadata schema.
- mip_dmp.process.mapping.apply_transform_map(dataset_column, transform)
Apply the transform map for binomial and multinomial variables.
Parameters
- dataset_column: pandas.DataFrame
Dataset column to be transformed.
- transform: str
Transformation to be applied to the dataset column. Can be a JSON string for the “map” transformation type or a scaling factor.
Returns
- dataset_column: pandas.DataFrame
The transformed dataset column.
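The idea of a "map" transform can be illustrated with a minimal sketch. It is not the mip_dmp implementation: `sketch_apply_map`, the `sex` column, and the example value map are hypothetical, and the sketch assumes the JSON string decodes to a flat object of source value to target value.

```python
import json

import pandas as pd


def sketch_apply_map(dataset_column: pd.DataFrame, transform: str) -> pd.DataFrame:
    """Hypothetical stand-in for a "map" transform (not the mip_dmp code)."""
    # Assumption: the JSON string is a flat object such as {"M": "male"}.
    value_map = json.loads(transform)
    # Values found in the map are replaced; all other values are left as-is.
    return dataset_column.replace(value_map)


column = pd.DataFrame({"sex": ["M", "F", "M"]})
mapped = sketch_apply_map(column, '{"M": "male", "F": "female"}')
print(mapped)
```

JSON object keys are always strings, so in this sketch a column holding numeric codes would have to be cast to strings before the map could match its values.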
- mip_dmp.process.mapping.apply_transform_scale(dataset_column, cde_code, cde_type, scaling_factor)
Apply the transform scale for real and integer variables.
Parameters
- dataset_column: pandas.DataFrame
Dataset column to be transformed.
- cde_code: str
CDE code of the dataset column.
- cde_type: str
CDE type of the dataset column. Can be “binomial”, “multinomial”, “integer” or “real”.
- scaling_factor: float
Scaling factor to be applied to the dataset column.
Returns
- dataset_column: pandas.DataFrame
The transformed dataset column.
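A "scale" transform can be sketched as a multiplication by the scaling factor. The function below is a hypothetical illustration, not the library's code. In particular, rounding and casting back to integers when `cde_type` is "integer" is an assumption about how the type might be used.

```python
import pandas as pd


def sketch_apply_scale(
    dataset_column: pd.DataFrame, cde_type: str, scaling_factor: float
) -> pd.DataFrame:
    """Hypothetical stand-in for a "scale" transform (not the mip_dmp code)."""
    scaled = dataset_column * scaling_factor
    # Assumption: integer CDEs are rounded and cast back to integers.
    if cde_type == "integer":
        scaled = scaled.round().astype(int)
    return scaled


# Convert heights from metres to centimetres (a "real" variable).
heights_m = pd.DataFrame({"height": [1.5, 1.75, 2.0]})
heights_cm = sketch_apply_scale(heights_m, "real", 100.0)

# Convert durations from weeks to days (an "integer" variable).
weeks = pd.DataFrame({"duration": [1, 2]})
days = sketch_apply_scale(weeks, "integer", 7.0)
```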
- mip_dmp.process.mapping.map_dataset(dataset, mappings, cde_codes)
Map the dataset to the schema.
Parameters
- dataset: pandas.DataFrame
Dataset to be mapped.
- mappings: dict
Mappings of the dataset columns to the schema columns.
- cde_codes: list
List of codes of the CDE metadata schema.
Returns
- pandas.DataFrame
Mapped dataset.
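The overall mapping step can be pictured with a rough sketch. The structure of `mappings` is not specified in this reference, so the sketch assumes a plain dict from CDE code to source column name, which may differ from the format mip_dmp expects. The function name, the column names, and the choice to leave unmapped CDE columns empty are all hypothetical.

```python
import pandas as pd


def sketch_map_dataset(
    dataset: pd.DataFrame, mappings: dict, cde_codes: list
) -> pd.DataFrame:
    """Hypothetical illustration of mapping a dataset onto a CDE schema."""
    # Assumption: mappings is {cde_code: source_column_name}.
    rename_map = {source: cde for cde, source in mappings.items()}
    renamed = dataset.rename(columns=rename_map)
    # Keep only the schema columns, in schema order. CDE codes without a
    # mapped source column become all-NaN columns; extra columns are dropped.
    return renamed.reindex(columns=cde_codes)


dataset = pd.DataFrame({"Age": [70, 65], "Sex": ["M", "F"], "extra": [1, 2]})
mappings = {"age_value": "Age", "gender": "Sex"}
cde_codes = ["gender", "age_value", "apoe4"]
mapped_dataset = sketch_map_dataset(dataset, mappings, cde_codes)
print(mapped_dataset)
```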
- mip_dmp.process.mapping.transform_dataset_column(dataset_column, cde_code, cde_type, transform_type, transform)
Transform the dataset column.
Parameters
- dataset_column: pandas.DataFrame
Dataset column to be transformed.
- cde_code: str
CDE code of the dataset column.
- cde_type: str
CDE type of the dataset column. Can be “binomial”, “multinomial”, “integer” or “real”.
- transform_type: str
Type of transformation to be applied to the dataset column. Can be “map” or “scale”.
- transform: str
Transformation to be applied to the dataset column. Can be a JSON string for the “map” transformation type or a scaling factor.
Returns
- dataset_column: pandas.DataFrame
The transformed dataset column.
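Since `transform_type` selects between "map" and "scale", the function can be thought of as a dispatcher. The sketch below shows that pattern with a hypothetical `sketch_transform_column`. How the real function parses `transform` or handles an unknown type is not documented here, so the `float()` conversion and the `ValueError` are assumptions.

```python
import json

import pandas as pd


def sketch_transform_column(
    dataset_column: pd.DataFrame, transform_type: str, transform: str
) -> pd.DataFrame:
    """Hypothetical dispatcher over transform types (not the mip_dmp code)."""
    if transform_type == "map":
        # Assumption: transform is a JSON object of source value -> target value.
        return dataset_column.replace(json.loads(transform))
    if transform_type == "scale":
        # Assumption: transform is a string holding a numeric scaling factor.
        return dataset_column * float(transform)
    # Assumption: unknown transform types are rejected explicitly.
    raise ValueError(f"Unsupported transform type: {transform_type!r}")


answers = pd.DataFrame({"smoker": ["yes", "no"]})
mapped_answers = sketch_transform_column(answers, "map", '{"yes": "Y", "no": "N"}')

weights = pd.DataFrame({"weight": [1.0, 2.5]})
scaled_weights = sketch_transform_column(weights, "scale", "2")
```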