mip_dmp.plot.embedding module

Module to plot the embeddings of the column names and CDE codes.

mip_dmp.plot.embedding.scatterplot_embeddings(fig: Figure, embeddings: dict, matchedCdeCodes: dict, selectedColumnName: str)

Plot the embeddings of the selected column name and CDE codes in a 3D scatter plot.

fig: matplotlib.figure.Figure

Figure to render the 3D scatter plot of the embeddings.

embeddings: dict

Dictionary of embeddings in the form:

{
    "x": [5, ..., 2],
    "y": [0.5, ..., 0.2],
    "z": [0.5, ..., 0.2],
    "label": ["word1", ..., "wordN"],
    "type": ["cde", ..., "column"]
}

where x, y, and z are the lists of the x, y, and z coordinates of the embeddings, label is the list of their labels, and type is the list of their types (either “column” or “cde”).
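All five lists describe the same points, so they are expected to have the same length. The following is a minimal sketch of a check for that layout; the helper `check_embeddings` and the example labels are hypothetical and are not part of mip_dmp.

```python
# Hypothetical helper (not part of mip_dmp): checks the embeddings dict layout
# described above and returns the number of points.
REQUIRED_KEYS = ("x", "y", "z", "label", "type")
ALLOWED_TYPES = {"column", "cde"}


def check_embeddings(embeddings: dict) -> int:
    """Return the number of points, or raise ValueError if the layout is invalid."""
    missing = [key for key in REQUIRED_KEYS if key not in embeddings]
    if missing:
        raise ValueError(f"missing keys: {missing}")
    # Every list describes the same points, so all lengths must match.
    lengths = {key: len(embeddings[key]) for key in REQUIRED_KEYS}
    if len(set(lengths.values())) != 1:
        raise ValueError(f"lists have different lengths: {lengths}")
    # Each point is either an input dataset column or a CDE code.
    unexpected = set(embeddings["type"]) - ALLOWED_TYPES
    if unexpected:
        raise ValueError(f"unexpected types: {sorted(unexpected)}")
    return lengths["x"]


# Illustrative values only.
example = {
    "x": [0.1, 0.4, 0.9],
    "y": [0.5, 0.3, 0.2],
    "z": [0.7, 0.6, 0.1],
    "label": ["age", "subjectage", "gender"],
    "type": ["column", "cde", "cde"],
}
print(check_embeddings(example))  # 3
```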

matchedCdeCodes: dict

Dictionary of the matched CDE codes in the form:

{
    "input_dataset_column1": {
        "words": ["cde_code1", "cde_code2", ...],
        "embeddings": [embedding_vector1, embedding_vector2, ...],
        "distances": [distance1, distance2, ...]
    },
    "input_dataset_column2": {
        "words": ["cde_code1", "cde_code2", ...],
        "embeddings": [embedding_vector1, embedding_vector2, ...],
        "distances": [distance1, distance2, ...]
    },
    ...
}

selectedColumnName: str

Name of the selected column.
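The sketch below illustrates how the three inputs could fit together in a 3D scatter plot. It assumes, without confirmation from this page, that only the selected column and the CDE codes listed under `matchedCdeCodes[selectedColumnName]["words"]` are drawn. The function `sketch_scatterplot`, the markers, and the example data are hypothetical; this is not the mip_dmp implementation.

```python
from matplotlib.figure import Figure


def sketch_scatterplot(fig, embeddings, matched_cde_codes, selected_column_name):
    """Hypothetical sketch: 3D scatter of one column and its matched CDE codes."""
    # Assumption: only the selected column and its matched CDE codes are drawn.
    keep = {selected_column_name, *matched_cde_codes[selected_column_name]["words"]}
    ax = fig.add_subplot(projection="3d")
    # Draw columns and CDE codes as two series with distinct markers.
    for kind, marker in (("column", "o"), ("cde", "^")):
        pairs = zip(embeddings["label"], embeddings["type"])
        idx = [
            i
            for i, (label, point_type) in enumerate(pairs)
            if point_type == kind and label in keep
        ]
        if not idx:
            continue
        xs = [embeddings["x"][i] for i in idx]
        ys = [embeddings["y"][i] for i in idx]
        zs = [embeddings["z"][i] for i in idx]
        ax.scatter(xs, ys, zs, marker=marker, label=kind)
        # Annotate each point with its label (column name or CDE code).
        for x, y, z, i in zip(xs, ys, zs, idx):
            ax.text(x, y, z, embeddings["label"][i])
    ax.legend()
    return ax


# Illustrative data only.
embeddings = {
    "x": [0.1, 0.4, 0.9, 0.2],
    "y": [0.5, 0.3, 0.2, 0.8],
    "z": [0.7, 0.6, 0.1, 0.3],
    "label": ["age", "subjectage", "dateofbirth", "gender"],
    "type": ["column", "cde", "cde", "cde"],
}
matched = {
    "age": {
        "words": ["subjectage", "dateofbirth"],
        "embeddings": [[0.4, 0.3, 0.6], [0.9, 0.2, 0.1]],
        "distances": [0.1, 0.4],
    },
}
fig = Figure(figsize=(6, 6))
ax = sketch_scatterplot(fig, embeddings, matched, "age")
```

Creating the `Figure` directly, rather than through `pyplot`, keeps the sketch independent of any GUI backend, which suits an application that embeds the figure in its own window.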