mip_dmp.plot

The mip_dmp.plot subpackage contains modules that define functions to plot the embeddings of the column names and CDE codes, and the matching results between the input dataset columns and the target CDE codes.

mip_dmp.plot.embedding

Module to plot the embeddings of the column names and CDE codes.

mip_dmp.plot.embedding.scatterplot_embeddings(fig: Figure, embeddings: dict, matchedCdeCodes: dict, selectedColumnName: str)

Plot the embeddings of the selected column name and CDE codes in a 3D scatter plot.

Parameters

fig: matplotlib.figure.Figure

Figure to render the 3D scatter plot of the embeddings.

embeddings: dict

Dictionary of embeddings in the form:

{
    "x": [5, ..., 2],
    "y": [0.5, ..., 0.2],
    "z": [0.5, ..., 0.2],
    "label": ["word1", ..., "wordN"],
    "type": ["cde", ..., "column"]
}

where x, y and z are the lists of coordinates of the embeddings, label is the list of their labels, and type is the list of their types (each either “column” or “cde”). All five lists have the same length, with one entry per embedding.

matchedCdeCodes: dict

Dictionary of the matched CDE codes in the form:

{
    "input_dataset_column1": {
        "words": ["cde_code1", "cde_code2", ...],
        "embeddings": [embedding_vector1, embedding_vector2, ...],
        "distances": [distance1, distance2, ...]
    },
    "input_dataset_column2": {
        "words": ["cde_code1", "cde_code2", ...],
        "embeddings": [embedding_vector1, embedding_vector2, ...],
        "distances": [distance1, distance2, ...]
    },
    ...
}

selectedColumnName: str

Name of the selected column.
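
The data layout described above can be illustrated with a minimal, self-contained sketch. It does not call mip_dmp and is not the library's implementation: sketch_scatter is a hypothetical stand-in that only shows how dictionaries of this shape could be drawn on a matplotlib Figure. The sample labels, coordinates and distances are made up, and the idea of emphasising the selected column together with its matched CDE codes is an assumption about the intended behaviour.

```python
from matplotlib.figure import Figure
from mpl_toolkits.mplot3d import Axes3D  # noqa: F401  (registers the "3d" projection on older matplotlib)


def sketch_scatter(fig, embeddings, matched_cde_codes, selected_column_name):
    """Hypothetical stand-in for illustration only (not the mip_dmp code)."""
    ax = fig.add_subplot(111, projection="3d")

    # One scatter call per embedding type so that each gets its own legend entry.
    for kind, color in (("cde", "tab:blue"), ("column", "tab:orange")):
        idx = [i for i, t in enumerate(embeddings["type"]) if t == kind]
        ax.scatter(
            [embeddings["x"][i] for i in idx],
            [embeddings["y"][i] for i in idx],
            [embeddings["z"][i] for i in idx],
            c=color,
            label=kind,
        )

    # Assumption: emphasise the selected column and the CDE codes matched to it.
    highlighted = {selected_column_name}
    highlighted.update(matched_cde_codes.get(selected_column_name, {}).get("words", []))

    for x, y, z, label in zip(
        embeddings["x"], embeddings["y"], embeddings["z"], embeddings["label"]
    ):
        weight = "bold" if label in highlighted else "normal"
        ax.text(x, y, z, label, fontweight=weight)

    ax.legend()
    return ax


# Made-up sample data following the documented dictionary shapes.
embeddings = {
    "x": [0.5, 0.1, 0.4],
    "y": [0.5, 0.9, 0.6],
    "z": [0.5, 0.3, 0.4],
    "label": ["subjectage", "gender", "age"],
    "type": ["cde", "cde", "column"],
}
matched_cde_codes = {
    "age": {
        "words": ["subjectage", "gender"],
        "embeddings": [[0.5, 0.5, 0.5], [0.1, 0.9, 0.3]],
        "distances": [0.2, 0.9],
    },
}

fig = Figure()
ax = sketch_scatter(fig, embeddings, matched_cde_codes, "age")
```

Splitting the points by type before plotting keeps the legend readable. The real function may colour, label or highlight points differently.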

mip_dmp.plot.matching

Module to plot the initial matching results between the input dataset columns and the target CDE codes.

mip_dmp.plot.matching.heatmap_matching(figure, matrix, inputDatasetColumns, targetCDECodes, matchingMethod)

Render a heatmap of the initial matching results between the input dataset columns and the target CDE codes.

Parameters

figure: matplotlib.figure.Figure

Figure to render the heatmap of the matching results.

matrix: numpy.ndarray

Similarity / distance matrix of the matching results.

inputDatasetColumns: list

List of the input dataset columns. Used as ytick labels.

targetCDECodes: list

List of the target CDE codes. Used as xtick labels.

matchingMethod: str

Matching method used to generate the similarity / distance matrix. Used to generate the title of the figure.
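
To make the roles of the parameters concrete, here is a minimal sketch of a heatmap built from inputs of the same shape. It is not the mip_dmp implementation: sketch_heatmap is a hypothetical helper, the matrix values and labels are made up, and "demo" is a placeholder rather than a real matching method name. The sketch assumes that rows of the matrix correspond to input dataset columns and columns to target CDE codes, which is consistent with the ytick and xtick labels described above.

```python
import numpy as np
from matplotlib.figure import Figure


def sketch_heatmap(figure, matrix, input_dataset_columns, target_cde_codes, matching_method):
    """Hypothetical stand-in for illustration only (not the mip_dmp code)."""
    ax = figure.add_subplot(111)

    # Rows are input dataset columns, columns are target CDE codes (assumption).
    image = ax.imshow(matrix, aspect="auto", cmap="viridis")

    ax.set_xticks(range(len(target_cde_codes)))
    ax.set_xticklabels(target_cde_codes, rotation=90)
    ax.set_yticks(range(len(input_dataset_columns)))
    ax.set_yticklabels(input_dataset_columns)

    # The matching method only feeds the title in this sketch.
    ax.set_title(f"Initial matching ({matching_method})")
    figure.colorbar(image, ax=ax)
    return ax


# Made-up sample data: 2 input columns x 3 target CDE codes.
matrix = np.array([[0.9, 0.1, 0.3],
                   [0.2, 0.8, 0.4]])
columns = ["age", "sex"]
codes = ["subjectage", "gender", "dataset"]

figure = Figure()
heatmap_ax = sketch_heatmap(figure, matrix, columns, codes, "demo")
```

Whether high values mean a better match depends on whether the matrix holds similarities or distances, so a real title or colorbar label would normally say which.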