mip_dmp.utils

The mip_dmp.utils package contains modules with utility functions for I/O, logging, and script argument parsing.

mip_dmp.utils.io

Module for input/output operations on the files used by the MIP Dataset Mapper.

mip_dmp.utils.io.generate_output_path(input_cdes_file: str, output_dir: str, output_suffix: str)

Generate output path for CDEs file, but without any extension.

Parameters

input_cdes_file : str

Path to input CDEs file in JSON or Excel format.

output_dir : str

Path to directory where the output CDEs file will be written.

output_suffix : str

Suffix to add to the input CDEs file name to generate the output CDEs file name.

Returns

out_cdes_fname : str

Generated absolute path for the output CDEs file to which the updated CDEs are written, with extension automatically added (.json for JSON, .xlsx for Excel).
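As a rough illustration of how such a path could be assembled, the sketch below uses a hypothetical stand-in named output_path_sketch, not the mip_dmp implementation. It assumes that the suffix is appended to the input file's base name and that the result is made absolute relative to output_dir. Following the summary line of the function, it leaves the extension off; the file names are illustrative only.

```python
from pathlib import Path


def output_path_sketch(input_cdes_file: str, output_dir: str, output_suffix: str) -> str:
    """Hypothetical stand-in for generate_output_path (assumed behaviour)."""
    # Base name of the input file without its extension, e.g. "mip_cdes".
    stem = Path(input_cdes_file).stem
    # Append the suffix and place the result under the output directory.
    return str((Path(output_dir) / f"{stem}{output_suffix}").absolute())


# Usage with illustrative file names.
out_path = output_path_sketch("cdes/mip_cdes.xlsx", "out", "_updated")
print(out_path)
```

A caller that needs a concrete file would then append ".json" or ".xlsx" to the returned value, depending on the format being written.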

mip_dmp.utils.io.load_c2v_model(model_name='eng_50')

Load a chars2vec model from disk.

Parameters

model_name : str, optional

Name of the chars2vec model to load. Defaults to “eng_50”.

Returns

dict

Dictionary containing the chars2vec model.

mip_dmp.utils.io.load_csv(csc_file: str)

Load content of a CSV file.

Parameters

csv_file : str

Path to CSV file.

Returns

data : pd.DataFrame

DataFrame loaded from CSV file.

mip_dmp.utils.io.load_excel(excel_file: str)

Load content of an Excel file.

Parameters

excel_file : str

Path to Excel file.

Returns

data : pd.DataFrame

DataFrame loaded from Excel file.

mip_dmp.utils.io.load_glove_model(model_name='glove-wiki-gigaword-50')

Load a GloVe model from disk.

Parameters

model_name : str, optional

Name of the GloVe model to load. Defaults to “glove-wiki-gigaword-50”.

Returns

dict

Dictionary containing the GloVe model.

mip_dmp.utils.io.load_json(json_file: str)

Load content of a JSON file.

Parameters

json_file : str

Path to JSON file.

Returns

data : dict

Dictionary loaded from JSON file.
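For orientation, a loader with this signature can be sketched with the standard library alone. The function load_json_sketch below is a hypothetical stand-in, not the mip_dmp source, and the file content is made up for the example.

```python
import json
import tempfile
from pathlib import Path


def load_json_sketch(json_file: str) -> dict:
    """Hypothetical stand-in for load_json: parse a JSON file into a dict."""
    with open(json_file, "r", encoding="utf-8") as f:
        return json.load(f)


# Usage: write a small illustrative file, then load it back.
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_file = Path(tmp_dir) / "demo.json"
    demo_file.write_text(
        json.dumps({"code": "age", "type": "integer"}), encoding="utf-8"
    )
    data = load_json_sketch(str(demo_file))

print(data)
```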

mip_dmp.utils.io.load_mapping_json(json_file: str)

Load content of a saved mapping JSON file.

Parameters

json_file : str

Path to JSON file.

Returns

data : dict

Dictionary loaded from JSON file.

mip_dmp.utils.logger

Module to set up logging for the MIP Dataset Mapper.

mip_dmp.utils.logger.setup_logging(log_file)

Set up logging and log file.

Parameters

log_file : str

Path to output log file.
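A file-based logging setup of this kind might look like the sketch below. It is an assumption-laden illustration built on the standard logging module: the function name setup_logging_sketch, the logger name, the log level, and the message format are all hypothetical and not taken from mip_dmp.

```python
import logging
import tempfile
from pathlib import Path


def setup_logging_sketch(log_file: str) -> logging.Logger:
    """Hypothetical stand-in for setup_logging: log INFO and above to a file."""
    logger = logging.getLogger("mip_dmp_sketch")  # hypothetical logger name
    logger.setLevel(logging.INFO)
    handler = logging.FileHandler(log_file, mode="w", encoding="utf-8")
    handler.setFormatter(
        logging.Formatter("%(asctime)s - %(levelname)s - %(message)s")
    )
    logger.addHandler(handler)
    return logger


# Usage: log one message, release the file, then read it back.
with tempfile.TemporaryDirectory() as tmp_dir:
    log_path = Path(tmp_dir) / "run.log"
    logger = setup_logging_sketch(str(log_path))
    logger.info("mapping started")
    for handler in list(logger.handlers):
        handler.close()
        logger.removeHandler(handler)
    content = log_path.read_text(encoding="utf-8")

print(content)
```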

mip_dmp.utils.parser

Module that creates the argument parser of the script, i.e. the command-line interface of the MIP Dataset Mapper.

mip_dmp.utils.parser.create_parser()

Create argument parser of the script.

Returns

p : argparse.ArgumentParser

Parser of the script.
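The general shape of a parser factory like this can be sketched with argparse. The function create_parser_sketch and the --log-file option below are hypothetical and serve only to show the pattern; they do not describe the actual options of the MIP Dataset Mapper command line.

```python
import argparse


def create_parser_sketch() -> argparse.ArgumentParser:
    """Hypothetical stand-in for create_parser with one illustrative option."""
    p = argparse.ArgumentParser(description="Illustrative command-line sketch")
    # Hypothetical option; the real script may define different arguments.
    p.add_argument("--log-file", default=None, help="Path to output log file.")
    return p


# Usage: parse an explicit argument list instead of sys.argv.
parser = create_parser_sketch()
args = parser.parse_args(["--log-file", "run.log"])
print(args.log_file)
```

Returning the parser instead of the parsed arguments keeps the factory easy to test, since callers can pass any argument list to parse_args.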