mip_dmp.process.mapping module

Provides functions for mapping datasets to a specific CDEs metadata schema.

mip_dmp.process.mapping.apply_transform_map(dataset_column, transform)

Apply the transform map for binomial and multinomial variables.

Parameters

dataset_column: pandas.DataFrame

Dataset column to be transformed.

transform: str

Transformation to be applied to the dataset column. Can be a JSON string for the “map” transformation type or a scaling factor.

Returns

dataset_column: pandas.DataFrame

The transformed dataset column.
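As an illustration of what a “map” transformation could look like, the sketch below parses a JSON string into a dictionary and replaces the values of a one-column DataFrame accordingly. The function name `apply_map_sketch`, the JSON layout (source value to target value), and the use of `DataFrame.replace` are assumptions made for this example; they are not taken from the mip_dmp source.

```python
import json

import pandas as pd


def apply_map_sketch(dataset_column: pd.DataFrame, transform: str) -> pd.DataFrame:
    """Hypothetical stand-in for a "map" transform (not the mip_dmp code)."""
    # Assumed JSON layout: {"<source value>": "<target value>", ...}
    value_map = json.loads(transform)
    # Values that are not keys of the map are left unchanged.
    return dataset_column.replace(value_map)


column = pd.DataFrame({"sex": ["male", "female", "male"]})
mapped = apply_map_sketch(column, '{"male": "M", "female": "F"}')
```

With this assumed layout, each category of a binomial or multinomial variable is rewritten to the code expected by the target schema.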

mip_dmp.process.mapping.apply_transform_scale(dataset_column, cde_code, cde_type, scaling_factor)

Apply the transform scale for real and integer variables.

Parameters

dataset_column: pandas.DataFrame

Dataset column to be transformed.

cde_code: str

CDE code of the dataset column.

cde_type: str

CDE type of the dataset column. Can be “binomial”, “multinomial”, “integer” or “real”.

scaling_factor: float

Scaling factor to be applied to the dataset column.

Returns

dataset_column: pandas.DataFrame

The transformed dataset column.
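A scaling transform can be pictured as a multiplication of the column by the scaling factor. The sketch below is a minimal illustration under stated assumptions: the function name `scale_sketch` is hypothetical, the `cde_code` parameter is omitted because its role is not described here, and rounding to whole numbers for the “integer” type is an assumption rather than documented behavior.

```python
import pandas as pd


def scale_sketch(
    dataset_column: pd.DataFrame, cde_type: str, scaling_factor: float
) -> pd.DataFrame:
    """Hypothetical stand-in for a "scale" transform (not the mip_dmp code)."""
    scaled = dataset_column * scaling_factor
    if cde_type == "integer":
        # Assumption: integer CDEs are rounded and cast back to int.
        scaled = scaled.round().astype(int)
    return scaled


# Example: convert weights from kilograms to grams for an "integer" CDE.
column = pd.DataFrame({"weight": [1.5, 2.25]})
scaled = scale_sketch(column, "integer", 1000.0)
```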

mip_dmp.process.mapping.map_dataset(dataset, mappings, cde_codes)

Map the dataset to the CDEs metadata schema.

Parameters

dataset: pandas.DataFrame

Dataset to be mapped.

mappings: dict

Mappings of the dataset columns to the schema columns.

cde_codes: list

List of codes of the CDE metadata schema.

Returns

pandas.DataFrame

Mapped dataset.
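The structure of `mappings` is not spelled out here, so the following sketch assumes the simplest possible form: a dictionary from source column name to CDE code. Under that assumption, mapping a dataset amounts to renaming the columns and reordering them to match the schema. The function name `map_dataset_sketch` and the choice to leave unmapped schema columns empty are hypothetical and may differ from what mip_dmp does.

```python
import pandas as pd


def map_dataset_sketch(
    dataset: pd.DataFrame, mappings: dict, cde_codes: list
) -> pd.DataFrame:
    """Hypothetical stand-in for dataset mapping (not the mip_dmp code)."""
    # Assumed layout of mappings: {"<source column>": "<CDE code>", ...}
    renamed = dataset.rename(columns=mappings)
    # Keep only schema columns, in schema order; missing ones become NaN.
    return renamed.reindex(columns=cde_codes)


dataset = pd.DataFrame({"Age": [70, 65], "Sex": ["M", "F"], "extra": [1, 2]})
mappings = {"Age": "age_value", "Sex": "gender"}
cde_codes = ["gender", "age_value", "diagnosis"]
mapped = map_dataset_sketch(dataset, mappings, cde_codes)
```

In this sketch the `extra` column is dropped because it has no counterpart in the schema, and `diagnosis` is present but empty because no source column maps to it.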

mip_dmp.process.mapping.transform_dataset_column(dataset_column, cde_code, cde_type, transform_type, transform)

Transform the dataset column according to the given transformation type.

Parameters

dataset_column: pandas.DataFrame

Dataset column to be transformed.

cde_code: str

CDE code of the dataset column.

cde_type: str

CDE type of the dataset column. Can be “binomial”, “multinomial”, “integer” or “real”.

transform_type: str

Type of transformation to be applied to the dataset column. Can be “map” or “scale”.

transform: str

Transformation to be applied to the dataset column. Can be a JSON string for the “map” transformation type or a scaling factor.

Returns

dataset_column: pandas.DataFrame

The transformed dataset column.
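Because `transform` is a string in both cases, a dispatcher has to interpret it according to `transform_type`. The sketch below shows one way this could work: a JSON value map for “map” and a number parsed with `float` for “scale”. The function name `transform_column_sketch`, the omission of `cde_code`, the rounding for “integer” CDEs, and the `ValueError` for unknown types are all assumptions of this example, not documented mip_dmp behavior.

```python
import json

import pandas as pd


def transform_column_sketch(
    dataset_column: pd.DataFrame, cde_type: str, transform_type: str, transform: str
) -> pd.DataFrame:
    """Hypothetical dispatcher over the two transform types (not the mip_dmp code)."""
    if transform_type == "map":
        # Assumed JSON layout: {"<source value>": "<target value>", ...}
        return dataset_column.replace(json.loads(transform))
    if transform_type == "scale":
        scaled = dataset_column * float(transform)
        # Assumption: integer CDEs are rounded and cast back to int.
        return scaled.round().astype(int) if cde_type == "integer" else scaled
    raise ValueError(f"Unsupported transform_type: {transform_type!r}")


smoker = pd.DataFrame({"smoker": ["yes", "no"]})
mapped = transform_column_sketch(smoker, "binomial", "map", '{"yes": "1", "no": "0"}')

weight = pd.DataFrame({"weight": [1.5, 2.0]})
scaled = transform_column_sketch(weight, "real", "scale", "1000")
```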