mip_dmp.plot
The mip_dmp.plot subpackage contains modules that define functions to plot the embeddings of the column names and CDE codes, and the initial matching results between the input dataset columns and the target CDE codes.
mip_dmp.plot.embedding
Module to plot the embeddings of the column names and CDE codes.
- mip_dmp.plot.embedding.scatterplot_embeddings(fig: Figure, embeddings: dict, matchedCdeCodes: dict, selectedColumnName: str)
Plot the embeddings of the selected column name and CDE codes in a 3D scatter plot.
Parameters
- fig: matplotlib.figure.Figure
Figure to render the 3D scatter plot of the embeddings.
- embeddings: dict
Dictionary of embeddings in the form:
    {
        "x": [5, ..., 2],
        "y": [0.5, ..., 0.2],
        "z": [0.5, ..., 0.2],
        "label": ["word1", ..., "wordN"],
        "type": ["cde", ..., "column"]
    }
where x, y and z are the lists of the x, y and z coordinates of the embeddings, label is the list of the labels of the embeddings, and type is the list of the types of the embeddings (either “column” or “cde”).
- matchedCdeCodes: dict
Dictionary of the matched CDE codes in the form:
    {
        "input_dataset_column1": {
            "words": ["cde_code1", "cde_code2", ...],
            "embeddings": [embedding_vector1, embedding_vector2, ...],
            "distances": [distance1, distance2, ...]
        },
        "input_dataset_column2": {
            "words": ["cde_code1", "cde_code2", ...],
            "embeddings": [embedding_vector1, embedding_vector2, ...],
            "distances": [distance1, distance2, ...]
        },
        ...
    }
- selectedColumnName: str
Name of the selected column.
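As an illustration of the two dictionary layouts described above, the following minimal sketch builds example inputs and checks that the embeddings dictionary is internally consistent. The helper check_embeddings, the variable names, and all values are hypothetical and not part of mip_dmp; the length of the embedding vectors is also an assumption.

```python
# Hypothetical helper (not part of mip_dmp): verify that an embeddings
# dictionary follows the documented layout before passing it to a plot.
def check_embeddings(embeddings):
    keys = ("x", "y", "z", "label", "type")
    missing = [k for k in keys if k not in embeddings]
    if missing:
        raise KeyError(f"missing keys: {missing}")
    # Every list describes the same points, so lengths must agree.
    lengths = {len(embeddings[k]) for k in keys}
    if len(lengths) != 1:
        raise ValueError("all lists must have the same length")
    # The documented types are "column" and "cde".
    unknown = set(embeddings["type"]) - {"column", "cde"}
    if unknown:
        raise ValueError(f"unknown types: {sorted(unknown)}")
    return lengths.pop()


# Illustrative values only.
embeddings = {
    "x": [0.1, 0.4, 0.9],
    "y": [0.5, 0.3, 0.2],
    "z": [0.7, 0.6, 0.1],
    "label": ["age", "subjectage", "gender"],
    "type": ["column", "cde", "cde"],
}

# One entry per input dataset column; the three lists are parallel.
matched_cde_codes = {
    "age": {
        "words": ["subjectage", "gender"],
        "embeddings": [[0.4, 0.3, 0.6], [0.9, 0.2, 0.1]],  # assumed vector size
        "distances": [0.2, 0.8],
    }
}
selected_column_name = "age"

n_points = check_embeddings(embeddings)
```

These objects would then presumably be passed as the embeddings, matchedCdeCodes and selectedColumnName arguments, together with a matplotlib Figure.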
mip_dmp.plot.matching
Module to plot the initial matching results between the input dataset columns and the target CDE codes.
- mip_dmp.plot.matching.heatmap_matching(figure, matrix, inputDatasetColumns, targetCDECodes, matchingMethod)
Render a heatmap of the initial matching results between the input dataset columns and the target CDE codes.
Parameters
- figure: matplotlib.figure.Figure
Figure to render the heatmap of the matching results.
- matrix: numpy.ndarray
Similarity / distance matrix of the matching results.
- inputDatasetColumns: list
List of the input dataset columns. Used as ytick labels.
- targetCDECodes: list
List of the target CDE codes. Used as xtick labels.
- matchingMethod: str
Matching method used to generate the similarity / distance matrix. Used to generate the title of the figure.
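To make the roles of the parameters concrete, here is a minimal sketch of a comparable heatmap drawn directly with matplotlib. It is not the implementation of heatmap_matching: the function sketch_heatmap, the colormap, the title wording, and the sample values are assumptions. It also assumes that the matrix has one row per input dataset column and one column per target CDE code, which is consistent with the ytick and xtick labels described above.

```python
import numpy as np
from matplotlib.figure import Figure


def sketch_heatmap(figure, matrix, input_columns, cde_codes, method):
    """Hypothetical stand-in that mirrors the documented parameters."""
    ax = figure.add_subplot(111)
    image = ax.imshow(matrix, aspect="auto", cmap="viridis")
    # Target CDE codes label the x axis (matrix columns).
    ax.set_xticks(range(len(cde_codes)))
    ax.set_xticklabels(cde_codes, rotation=90)
    # Input dataset columns label the y axis (matrix rows).
    ax.set_yticks(range(len(input_columns)))
    ax.set_yticklabels(input_columns)
    # The matching method only feeds the title in this sketch.
    ax.set_title(f"Initial matching ({method})")
    figure.colorbar(image, ax=ax)
    return ax


# Illustrative values only: 2 input columns against 3 CDE codes.
figure = Figure(figsize=(4, 3))
matrix = np.array([[0.1, 0.9, 0.5],
                   [0.8, 0.2, 0.4]])
ax = sketch_heatmap(
    figure,
    matrix,
    ["age", "sex"],
    ["subjectage", "gender", "dataset"],
    "example-method",
)
```

The same five arguments, in the same order, correspond to figure, matrix, inputDatasetColumns, targetCDECodes and matchingMethod in the signature above.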