Getting Started
If BDI-Kit is not installed yet, you can install it with:
[ ]:
! pip install bdi-kit
Then import the library:
[1]:
import bdikit as bdi
import pandas as pd
In this example, we are mapping data from Dou et al. (https://pubmed.ncbi.nlm.nih.gov/32059776/) to the GDC format.
[2]:
dataset = pd.read_csv("https://raw.githubusercontent.com/VIDA-NYU/bdi-kit/devel/examples/datasets/dou_2020.csv")
columns = [
    "Country",
    "Histologic_type",
    "FIGO_stage",
    "BMI",
    "Age",
    "Race",
    "Ethnicity",
    "Gender",
    "Tumor_Focality",
    "Tumor_Size_cm",
]
dataset[columns].head(10)
[2]:
| | Country | Histologic_type | FIGO_stage | BMI | Age | Race | Ethnicity | Gender | Tumor_Focality | Tumor_Size_cm |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | United States | Endometrioid | IA | 38.88 | 64.0 | White | Not-Hispanic or Latino | Female | Unifocal | 2.9 |
| 1 | United States | Endometrioid | IA | 39.76 | 58.0 | White | Not-Hispanic or Latino | Female | Unifocal | 3.5 |
| 2 | United States | Endometrioid | IA | 51.19 | 50.0 | White | Not-Hispanic or Latino | Female | Unifocal | 4.5 |
| 3 | NaN | Carcinosarcoma | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 4 | United States | Endometrioid | IA | 32.69 | 75.0 | White | Not-Hispanic or Latino | Female | Unifocal | 3.5 |
| 5 | United States | Serous | IA | 20.28 | 63.0 | White | Not-Hispanic or Latino | Female | Unifocal | 6.0 |
| 6 | United States | Endometrioid | IA | 55.67 | 50.0 | White | Not-Hispanic or Latino | Female | Unifocal | 4.5 |
| 7 | Other_specify | Endometrioid | IA | 25.68 | 60.0 | White | Not-Hispanic or Latino | Female | Unifocal | 5.0 |
| 8 | United States | Serous | IIIA | 21.57 | 83.0 | White | Not-Hispanic or Latino | Female | Unifocal | 4.0 |
| 9 | United States | Endometrioid | IA | 34.26 | 69.0 | White | Not-Hispanic or Latino | Female | Unifocal | 5.2 |
Matching the table schema to GDC standard vocabulary
BDI-Kit offers a suite of functions to help with data harmonization tasks. For instance, it can help with automatic discovery of one-to-one mappings between the attributes/columns in the input (source) dataset and a target dataset schema. The target schema can be either another table or a standard data vocabulary such as the GDC (Genomic Data Commons).
To achieve this using BDI-Kit, we can use the match_schema() function to match attributes to the GDC vocabulary schema as follows.
[3]:
attribute_matches = bdi.match_schema(dataset[columns], target="gdc", method="magneto_ft_bp")
attribute_matches
[3]:
| | source_attribute | target_attribute | similarity |
|---|---|---|---|
| 0 | BMI | bmi | 1.000000 |
| 1 | Ethnicity | ethnicity | 1.000000 |
| 2 | Gender | gender | 1.000000 |
| 3 | FIGO_stage | figo_stage | 1.000000 |
| 4 | Tumor_Focality | tumor_focality | 1.000000 |
| 5 | Race | race | 1.000000 |
| 6 | Age | age_at_index | 0.988827 |
| 7 | Country | country_of_birth | 0.957011 |
| 8 | Tumor_Size_cm | tumor_length_measurement | 0.905819 |
| 9 | Histologic_type | histologic_progression_type | 0.727179 |
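Lower similarity scores are a useful hint about which suggestions to review first. The sketch below rebuilds a few rows of the table above by hand, so it runs without BDI-Kit, and uses plain pandas to list the matches that fall under a cutoff. The 0.8 threshold and the variable names are illustrative assumptions, not part of the BDI-Kit API.

```python
import pandas as pd

# A few rows copied by hand from the match_schema() output above.
matches = pd.DataFrame(
    {
        "source_attribute": ["BMI", "Age", "Tumor_Size_cm", "Histologic_type"],
        "target_attribute": [
            "bmi",
            "age_at_index",
            "tumor_length_measurement",
            "histologic_progression_type",
        ],
        "similarity": [1.000000, 0.988827, 0.905819, 0.727179],
    }
)

# Hypothetical cutoff: anything below it is queued for manual review.
REVIEW_THRESHOLD = 0.8

needs_review = matches[matches["similarity"] < REVIEW_THRESHOLD]
print(needs_review["source_attribute"].tolist())
```

A high score does not guarantee a correct match, so a cutoff like this only helps decide where to look first.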
Generating a harmonized table
After discovering a schema mapping, we can generate a new table (DataFrame) using the new attribute names from the GDC standard vocabulary.
To do so using BDI-Kit, we can use the function materialize_mapping() as follows. Note that the column headers have been renamed to the target schema.
[4]:
bdi.materialize_mapping(dataset, attribute_matches)
[4]:
| | bmi | ethnicity | gender | figo_stage | tumor_focality | race | age_at_index | country_of_birth | tumor_length_measurement | histologic_progression_type |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 38.88 | Not-Hispanic or Latino | Female | IA | Unifocal | White | 64.0 | United States | 2.9 | Endometrioid |
| 1 | 39.76 | Not-Hispanic or Latino | Female | IA | Unifocal | White | 58.0 | United States | 3.5 | Endometrioid |
| 2 | 51.19 | Not-Hispanic or Latino | Female | IA | Unifocal | White | 50.0 | United States | 4.5 | Endometrioid |
| 3 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | Carcinosarcoma |
| 4 | 32.69 | Not-Hispanic or Latino | Female | IA | Unifocal | White | 75.0 | United States | 3.5 | Endometrioid |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 99 | 29.40 | NaN | Female | IA | Unifocal | NaN | 75.0 | Ukraine | 4.2 | Endometrioid |
| 100 | 35.42 | NaN | Female | II | Unifocal | NaN | 74.0 | Ukraine | 1.5 | Endometrioid |
| 101 | 24.32 | Not-Hispanic or Latino | Female | II | Unifocal | Black or African American | 85.0 | United States | 3.8 | Serous |
| 102 | 34.06 | NaN | Female | IA | Unifocal | NaN | 70.0 | Ukraine | 5.0 | Serous |
| 103 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | Ukraine | NaN | Serous |
104 rows × 10 columns
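With only a schema mapping, the result resembles selecting the matched columns and renaming them. The following is a minimal pandas sketch of that idea on a tiny stand-in table. It is not BDI-Kit's implementation, and the variable names are illustrative.

```python
import pandas as pd

# Tiny stand-in for the source table (two rows copied from the example above).
source = pd.DataFrame(
    {
        "Age": [64.0, 58.0],
        "Country": ["United States", "United States"],
        "BMI": [38.88, 39.76],
    }
)

# Stand-in for the schema matches: one row per source/target pair.
matches = pd.DataFrame(
    {
        "source_attribute": ["BMI", "Age", "Country"],
        "target_attribute": ["bmi", "age_at_index", "country_of_birth"],
    }
)

# Build a {source: target} dictionary, keep only the matched columns
# (in the order given by the matches table), and rename them.
rename_map = dict(zip(matches["source_attribute"], matches["target_attribute"]))
renamed = source[list(rename_map)].rename(columns=rename_map)
print(renamed.columns.tolist())
```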
Generating a harmonized table with value mappings
BDI-Kit can also help with translation of the values from the source table to the target standard format.
To this end, BDI-Kit provides the function match_values(), which automatically creates value mappings for each string column. The output of match_values() can be fed to materialize_mapping(), which materializes the final target table using both schema and value mappings.
[5]:
value_mappings = bdi.match_values(dataset, target="gdc", attribute_matches=attribute_matches, method="tfidf")
bdi.materialize_mapping(dataset, value_mappings)
[5]:
| | ethnicity | gender | tumor_focality | race | country_of_birth | figo_stage | histologic_progression_type |
|---|---|---|---|---|---|---|---|
| 0 | not hispanic or latino | female | Unifocal | white | United States | Stage IA | NaN |
| 1 | not hispanic or latino | female | Unifocal | white | United States | Stage IA | NaN |
| 2 | not hispanic or latino | female | Unifocal | white | United States | Stage IA | NaN |
| 3 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 4 | not hispanic or latino | female | Unifocal | white | United States | Stage IA | NaN |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 99 | NaN | female | Unifocal | NaN | Ukraine | Stage IA | NaN |
| 100 | NaN | female | Unifocal | NaN | Ukraine | Stage III | NaN |
| 101 | not hispanic or latino | female | Unifocal | black or african american | United States | Stage III | NaN |
| 102 | NaN | female | Unifocal | NaN | Ukraine | Stage IA | NaN |
| 103 | NaN | NaN | NaN | NaN | Ukraine | NaN | NaN |
104 rows × 7 columns
Verifying the schema mappings
Sometimes the automatically generated mappings may be incorrect, or you may want to verify them individually. To verify the suggested attribute matches, you can use BDI-Kit and BDIViz, which offer additional APIs to visualize the data and make modifications when necessary.
For this example, we will use the column Histologic_type. We can start by exploring the columns most similar to Histologic_type.
For this, we can use the rank_schema_matches() function. Here, we notice that primary_diagnosis could be a potential target column.
[6]:
hist_type_matches = bdi.rank_schema_matches(dataset, target="gdc", attributes=["Histologic_type"])
hist_type_matches
[6]:
| | source_attribute | target_attribute | similarity |
|---|---|---|---|
| 0 | Histologic_type | sample_type | 0.554611 |
| 1 | Histologic_type | history_of_tumor_type | 0.542955 |
| 2 | Histologic_type | primary_diagnosis | 0.526662 |
| 3 | Histologic_type | morphologic_architectural_pattern | 0.478859 |
| 4 | Histologic_type | viral_hepatitis_serologies | 0.469105 |
| 5 | Histologic_type | analyte_type_id | 0.462921 |
| 6 | Histologic_type | histone_variant | 0.461407 |
| 7 | Histologic_type | cog_rhabdomyosarcoma_risk_group | 0.425261 |
| 8 | Histologic_type | tumor_descriptor | 0.419358 |
| 9 | Histologic_type | specimen_type | 0.419077 |
Viewing the attribute domains
To verify that primary_diagnosis is a good target attribute, we view and compare the domains of each attribute using the preview_domain() function. For the source table, it returns the list of unique values in the source attribute. For the GDC target, it returns the list of unique valid values that an attribute can have.
Here we see that the values seem to be related.
[7]:
bdi.preview_domain(dataset, "Histologic_type")
[7]:
| | value_name |
|---|---|
| 0 | Endometrioid |
| 1 | Carcinosarcoma |
| 2 | Serous |
| 3 | Clear cell |
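For a source column, the domain shown above is the set of distinct values that occur in it. A rough pandas equivalent, on a hand-made stand-in column, could look like the sketch below. Whether preview_domain() computes the list exactly this way is an assumption.

```python
import pandas as pd

# Stand-in column with repeated and missing values.
histologic_type = pd.Series(
    ["Endometrioid", "Carcinosarcoma", "Endometrioid", None, "Serous", "Clear cell"],
    name="Histologic_type",
)

# Unique non-missing values, in order of first appearance.
domain = pd.DataFrame({"value_name": histologic_type.dropna().unique()})
print(domain)
```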
[8]:
bdi.preview_domain("gdc", "primary_diagnosis")
[8]:
| | value_name | value_description | attribute_description |
|---|---|---|---|
| 0 | Abdominal desmoid | An insidious poorly circumscribed neoplasm ari... | Text term used to describe the patient's histo... |
| 1 | Abdominal fibromatosis | An insidious poorly circumscribed neoplasm ari... | |
| 2 | Achromic nevus | A benign nevus characterized by the absence of... | |
| 3 | Acidophil adenocarcinoma | A malignant epithelial neoplasm of the anterio... | |
| 4 | Acidophil adenoma | An epithelial neoplasm of the anterior pituita... | |
| ... | ... | ... | ... |
| 2620 | Wolffian duct tumor | An epithelial neoplasm of the female reproduct... | |
| 2621 | Xanthofibroma | A benign neoplasm composed of fibroblastic spi... | |
| 2622 | Yolk sac tumor | A non-seminomatous malignant germ cell tumor c... | |
| 2623 | Unknown | Not known, not observed, not recorded, or refu... | |
| 2624 | Not Reported | Not provided or available. | |
2625 rows × 3 columns
Since primary_diagnosis looks like a correct match for Histologic_type, we can modify the attribute_matches variable directly.
[9]:
attribute_matches.loc[attribute_matches["source_attribute"] == "Histologic_type", "target_attribute"] = "primary_diagnosis"
attribute_matches
[9]:
| | source_attribute | target_attribute | similarity |
|---|---|---|---|
| 0 | BMI | bmi | 1.000000 |
| 1 | Ethnicity | ethnicity | 1.000000 |
| 2 | Gender | gender | 1.000000 |
| 3 | FIGO_stage | figo_stage | 1.000000 |
| 4 | Tumor_Focality | tumor_focality | 1.000000 |
| 5 | Race | race | 1.000000 |
| 6 | Age | age_at_index | 0.988827 |
| 7 | Country | country_of_birth | 0.957011 |
| 8 | Tumor_Size_cm | tumor_length_measurement | 0.905819 |
| 9 | Histologic_type | primary_diagnosis | 0.727179 |
Finding correct value mappings
After finding the correct column, we need to find appropriate value mappings. Using match_values(), we can inspect what the possible value mappings for this would look like after the harmonization.
BDI-Kit implements multiple methods for value mapping discovery, including:
- edit_distance: Computes value similarities using Levenshtein’s edit distance measure.
- tfidf: A method based on tf-idf importance weighting computed over character n-grams.
- embedding: Uses BERT word embeddings to compute “semantic similarity” between the values.
To specify a value mapping approach, we can pass the method parameter.
[10]:
bdi.match_values(
    dataset, target="gdc", attribute_matches=("Histologic_type", "primary_diagnosis"), method="edit_distance"
)
[10]:
| | source_attribute | target_attribute | source_value | target_value | similarity |
|---|---|---|---|---|---|
| 0 | Histologic_type | primary_diagnosis | Carcinosarcoma | Carcinosarcoma, NOS | 0.848485 |
| 1 | Histologic_type | primary_diagnosis | Clear cell | Clear cell adenoma | 0.714286 |
| 2 | Histologic_type | primary_diagnosis | Endometrioid | Endometrioid adenoma, NOS | 0.648649 |
| 3 | Histologic_type | primary_diagnosis | Serous | Neuronevus | 0.625000 |
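To give a feel for the kind of score an edit-distance-style method produces, the sketch below uses `difflib.SequenceMatcher` from the Python standard library as a stand-in. Its `ratio()` is 2*M/T, where M is the number of matched characters and T is the combined length of both strings. This is not necessarily the measure BDI-Kit uses internally, although for the Carcinosarcoma row it gives 28/33 ≈ 0.848485, the same value as in the table above. The helper names and the shortened candidate list are illustrative.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Return 2*M/T: M = matched characters, T = len(a) + len(b)."""
    return SequenceMatcher(None, a, b).ratio()


def best_match(value: str, candidates: list[str]) -> tuple[str, float]:
    """Return the candidate with the highest similarity to `value`."""
    scored = [(candidate, similarity(value, candidate)) for candidate in candidates]
    return max(scored, key=lambda pair: pair[1])


# A handful of primary_diagnosis values taken from the outputs in this section.
candidates = ["Carcinosarcoma, NOS", "Carcinofibroma", "Clear cell adenoma"]

target, score = best_match("Carcinosarcoma", candidates)
print(target, round(score, 6))
```

Purely character-based scores reward shared spelling, not shared meaning, which is why a short value such as Serous can end up paired with an unrelated term.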
[11]:
bdi.match_values(
    dataset, target="gdc", attribute_matches=("Histologic_type", "primary_diagnosis"), method="tfidf"
)
[11]:
| | source_attribute | target_attribute | source_value | target_value | similarity |
|---|---|---|---|---|---|
| 0 | Histologic_type | primary_diagnosis | Carcinosarcoma | Carcinosarcoma, NOS | 0.969 |
| 1 | Histologic_type | primary_diagnosis | Endometrioid | Endometrioid adenoma, NOS | 0.897 |
| 2 | Histologic_type | primary_diagnosis | Clear cell | Clear cell adenoma | 0.853 |
| 3 | Histologic_type | primary_diagnosis | Serous | Serous carcinoma, NOS | 0.755 |
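The idea behind tf-idf over character n-grams can be sketched in a few lines of plain Python: split each string into overlapping n-grams, weight them by how rare they are among the candidates, and compare the weighted vectors with cosine similarity. The trigram size, the smoothing formula, and the three-item candidate list below are assumptions made for illustration. BDI-Kit's own implementation may differ, so the scores will not match the table above.

```python
import math
from collections import Counter


def char_ngrams(text: str, n: int = 3) -> list[str]:
    """Lowercase `text` and split it into overlapping character n-grams."""
    text = text.lower()
    return [text[i : i + n] for i in range(len(text) - n + 1)]


def tfidf_vector(text: str, idf: dict[str, float]) -> dict[str, float]:
    """Weight each n-gram count by its inverse document frequency."""
    counts = Counter(char_ngrams(text))
    return {gram: count * idf.get(gram, 0.0) for gram, count in counts.items()}


def cosine(u: dict[str, float], v: dict[str, float]) -> float:
    """Cosine similarity between two sparse vectors stored as dictionaries."""
    dot = sum(weight * v.get(gram, 0.0) for gram, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


candidates = ["Serous carcinoma, NOS", "Neuronevus", "Myoma"]

# Smoothed IDF computed over the candidate values only.
n_docs = len(candidates)
doc_freq = Counter(gram for c in candidates for gram in set(char_ngrams(c)))
idf = {gram: math.log((1 + n_docs) / (1 + df)) + 1.0 for gram, df in doc_freq.items()}

query = tfidf_vector("Serous", idf)
scores = {c: cosine(query, tfidf_vector(c, idf)) for c in candidates}
best = max(scores, key=scores.get)
print(best)
```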
[12]:
bdi.match_values(
    dataset, target="gdc", attribute_matches=("Histologic_type", "primary_diagnosis"), method="embedding"
)
[12]:
| | source_attribute | target_attribute | source_value | target_value | similarity |
|---|---|---|---|---|---|
| 0 | Histologic_type | primary_diagnosis | Carcinosarcoma | Carcinofibroma | 0.897 |
| 1 | Histologic_type | primary_diagnosis | Clear cell | Clear cell carcinoma | 0.773 |
| 2 | Histologic_type | primary_diagnosis | Endometrioid | Endometrioid cystadenocarcinoma | 0.755 |
| 3 | Histologic_type | primary_diagnosis | Serous | Myoma | 0.647 |
Because the automatic suggestions differ across methods and some are clearly wrong (for example, Serous is matched to Neuronevus or Myoma), we can instead define the value mapping for Histologic_type manually as a DataFrame.
[13]:
hist_type_vmap = pd.DataFrame(
    columns=["source_value", "target_value"],
    data=[
        ("Carcinosarcoma", "Carcinosarcoma, NOS"),
        ("Clear cell", "Clear cell adenocarcinoma, NOS"),
        ("Endometrioid", "Endometrioid carcinoma"),
        ("Serous", "Serous cystadenocarcinoma"),
    ],
)
hist_type_vmap
[13]:
| | source_value | target_value |
|---|---|---|
| 0 | Carcinosarcoma | Carcinosarcoma, NOS |
| 1 | Clear cell | Clear cell adenocarcinoma, NOS |
| 2 | Endometrioid | Endometrioid carcinoma |
| 3 | Serous | Serous cystadenocarcinoma |
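A two-column table like this one is effectively a lookup from source values to target values. As a rough illustration of how such a table can be applied to a column with plain pandas (independent of how BDI-Kit applies it), consider the sketch below. The short stand-in column is made up for the example.

```python
import pandas as pd

# The manual value mapping defined above, rebuilt so the sketch is self-contained.
hist_type_vmap = pd.DataFrame(
    columns=["source_value", "target_value"],
    data=[
        ("Carcinosarcoma", "Carcinosarcoma, NOS"),
        ("Clear cell", "Clear cell adenocarcinoma, NOS"),
        ("Endometrioid", "Endometrioid carcinoma"),
        ("Serous", "Serous cystadenocarcinoma"),
    ],
)

# Stand-in for the Histologic_type column, including one missing value.
histologic_type = pd.Series(["Endometrioid", "Serous", None, "Carcinosarcoma"])

# Turn the two-column table into a dictionary and apply it value by value.
# Values without an entry in the dictionary (including missing ones) become NaN.
lookup = dict(zip(hist_type_vmap["source_value"], hist_type_vmap["target_value"]))
primary_diagnosis = histologic_type.map(lookup)
print(primary_diagnosis.tolist())
```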
Verifying multiple value mappings at once
Besides verifying value mappings individually, you can also do it for all column mappings at once.
[14]:
mappings = bdi.match_values(
    dataset,
    target="gdc",
    attribute_matches=attribute_matches,
    method="tfidf",
    output_format="list",
)
for mapping in mappings:
    print(f"{mapping.attrs['source_attribute']} => {mapping.attrs['target_attribute']}")
    display(mapping)
    print("")
Ethnicity => ethnicity
| | source_value | target_value | similarity |
|---|---|---|---|
| 0 | Hispanic or Latino | hispanic or latino | 1.000 |
| 1 | Not reported | not reported | 1.000 |
| 2 | Not-Hispanic or Latino | not hispanic or latino | 0.936 |
| 3 | NaN | NaN | NaN |
Gender => gender
| | source_value | target_value | similarity |
|---|---|---|---|
| 0 | Female | female | 1.0 |
| 1 | NaN | NaN | NaN |
Tumor_Focality => tumor_focality
| | source_value | target_value | similarity |
|---|---|---|---|
| 0 | Unifocal | Unifocal | 1.0 |
| 1 | Multifocal | Multifocal | 1.0 |
| 2 | NaN | NaN | NaN |
Race => race
| | source_value | target_value | similarity |
|---|---|---|---|
| 0 | White | white | 1.0 |
| 1 | White | white | 1.0 |
| 2 | Asian | asian | 1.0 |
| 3 | Not Reported | not reported | 1.0 |
| 4 | Black or African American | black or african american | 1.0 |
| 5 | NaN | NaN | NaN |
Country => country_of_birth
| | source_value | target_value | similarity |
|---|---|---|---|
| 0 | United States | United States | 1.0 |
| 1 | Ukraine | Ukraine | 1.0 |
| 2 | Poland | Poland | 1.0 |
| 3 | NaN | NaN | NaN |
| 4 | Other_specify | NaN | NaN |
Histologic_type => primary_diagnosis
| | source_value | target_value | similarity |
|---|---|---|---|
| 0 | Carcinosarcoma | Carcinosarcoma, NOS | 0.969 |
| 1 | Endometrioid | Endometrioid adenoma, NOS | 0.897 |
| 2 | Clear cell | Clear cell adenoma | 0.853 |
| 3 | Serous | Serous carcinoma, NOS | 0.755 |
FIGO_stage => figo_stage
| | source_value | target_value | similarity |
|---|---|---|---|
| 0 | IIIC2 | Stage IIIC2 | 0.890 |
| 1 | IIIC1 | Stage IIIC1 | 0.890 |
| 2 | IVB | Stage IVB | 0.856 |
| 3 | IIIB | Stage IIIB | 0.850 |
| 4 | IIIA | Stage IIIA | 0.823 |
| 5 | II | Stage III | 0.686 |
| 6 | IB | Stage IB | 0.651 |
| 7 | IA | Stage IA | 0.591 |
| 8 | NaN | NaN | NaN |
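Similarity alone does not always separate good suggestions from bad ones. In the FIGO_stage table above, IA is paired with Stage IA at 0.591, while II is paired with Stage III at a higher 0.686, which appears to be a mismatch. A simple rule-based check can surface such rows. The sketch below assumes, based only on the rows shown above, that the expected target is the source value prefixed with "Stage ". That convention and the variable names are assumptions for illustration.

```python
import pandas as pd

# A few rows copied by hand from the FIGO_stage table above.
figo = pd.DataFrame(
    {
        "source_value": ["IIIA", "II", "IB", "IA"],
        "target_value": ["Stage IIIA", "Stage III", "Stage IB", "Stage IA"],
        "similarity": [0.823, 0.686, 0.651, 0.591],
    }
)

# Assumed convention, inferred from the rows above: target == "Stage " + source.
expected = "Stage " + figo["source_value"]
suspicious = figo[figo["target_value"] != expected]
print(suspicious)
```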
Fixing remaining value mappings
We need to fix a few value mappings:
Tumor_Site
For Tumor_Site, given that this dataset is about endometrial cancer, all values must be mapped to “Endometrium”. So instead of fixing each mapping individually, we will write a custom function that returns “Endometrium” regardless of the input value. Later, we will show how to use this function to transform the dataset.
[15]:
bdi.match_values(
    dataset, target="gdc", attribute_matches=("Tumor_Site", "tissue_or_organ_of_origin"), method="tfidf"
)
[15]:
| | source_attribute | target_attribute | source_value | target_value | similarity |
|---|---|---|---|---|---|
| 0 | Tumor_Site | tissue_or_organ_of_origin | Anterior endometrium | Endometrium | 0.852 |
| 1 | Tumor_Site | tissue_or_organ_of_origin | Posterior endometrium | Endometrium | 0.823 |
| 2 | Tumor_Site | tissue_or_organ_of_origin | Other, specify | Other specified parts of pancreas | 0.543 |
| 3 | Tumor_Site | tissue_or_organ_of_origin | NaN | NaN | NaN |
[16]:
# Custom mapping function that will be used to map the values of the 'Tumor_Site' column
def map_tumor_site(source_value):
    return "Endometrium"
Combining custom user mappings with suggested mappings
Before generating a final harmonized dataset, we can combine the automatically generated value mappings with the fixed mappings provided by the user. To do so, we use the bdi.create_harmonization_spec() function, which takes a list of mappings (e.g., generated automatically) and a list of “user-defined mapping overrides”. The overrides are combined with the first list and take precedence whenever the two conflict.
In our example below, all mappings specified in the variable user_mappings will override the mappings in value_mappings generated by the bdi.match_values() function.
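The precedence rule described above can be sketched in plain Python. The source does not spell out how a conflict is detected or how the result is ordered, so the version below assumes that two mappings conflict when they share the same (source_attribute, target_attribute) pair and that overrides are listed first. The function name `combine_mappings` and the placeholder "matches" strings are hypothetical and are not BDI-Kit's API.

```python
def combine_mappings(suggested: list[dict], overrides: list[dict]) -> list[dict]:
    """Merge two mapping lists; overrides win on the same (source, target) pair."""

    def key(mapping: dict) -> tuple[str, str]:
        return (mapping["source_attribute"], mapping["target_attribute"])

    override_keys = {key(m) for m in overrides}
    kept = [m for m in suggested if key(m) not in override_keys]
    # Overrides first, then the suggestions that were not overridden.
    return list(overrides) + kept


# Placeholder strings stand in for the actual value-mapping tables.
suggested = [
    {"source_attribute": "Gender", "target_attribute": "gender", "matches": "auto"},
    {"source_attribute": "Histologic_type", "target_attribute": "primary_diagnosis", "matches": "auto"},
]
overrides = [
    {"source_attribute": "Histologic_type", "target_attribute": "primary_diagnosis", "matches": "manual"},
]

combined = combine_mappings(suggested, overrides)
print([(m["source_attribute"], m["matches"]) for m in combined])
```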
[17]:
from math import ceil
user_mappings = [
    {
        # When no value mapping is needed, specifying the source and target is enough
        "source_attribute": "BMI",
        "target_attribute": "bmi",
    },
    {
        "source_attribute": "Tumor_Size_cm",
        "target_attribute": "tumor_largest_dimension_diameter",
    },
    {
        # mapper can be a custom Python function
        "source_attribute": "Tumor_Site",
        "target_attribute": "tissue_or_organ_of_origin",
        "mapper": map_tumor_site,
    },
    {
        # Lambda functions can also be used as mappers
        "source_attribute": "Age",
        "target_attribute": "days_to_birth",
        "mapper": lambda age: -age * 365.25,
    },
    {
        # Missing ages stay missing; otherwise convert years to days, rounding up
        "source_attribute": "Age",
        "target_attribute": "age_at_diagnosis",
        "mapper": lambda age: float("nan") if pd.isnull(age) else ceil(age * 365.25),
    },
    {
        # We can also use a data frame to specify value mappings using the `matches` attribute
        "source_attribute": "Histologic_type",
        "target_attribute": "primary_diagnosis",
        "matches": hist_type_vmap,
    },
]
harmonization_spec = bdi.create_harmonization_spec(value_mappings, user_mappings)
Finally, we generate the harmonized dataset, with the user-defined value mappings.
[18]:
harmonized_dataset = bdi.materialize_mapping(dataset, harmonization_spec)
harmonized_dataset
[18]:
| | bmi | tumor_largest_dimension_diameter | tissue_or_organ_of_origin | days_to_birth | age_at_diagnosis | primary_diagnosis | ethnicity | gender | tumor_focality | race | country_of_birth | figo_stage | histologic_progression_type |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 38.88 | 2.9 | Endometrium | -23376.00 | 23376.0 | Endometrioid carcinoma | not hispanic or latino | female | Unifocal | white | United States | Stage IA | NaN |
| 1 | 39.76 | 3.5 | Endometrium | -21184.50 | 21185.0 | Endometrioid carcinoma | not hispanic or latino | female | Unifocal | white | United States | Stage IA | NaN |
| 2 | 51.19 | 4.5 | Endometrium | -18262.50 | 18263.0 | Endometrioid carcinoma | not hispanic or latino | female | Unifocal | white | United States | Stage IA | NaN |
| 3 | NaN | NaN | Endometrium | NaN | NaN | Carcinosarcoma, NOS | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 4 | 32.69 | 3.5 | Endometrium | -27393.75 | 27394.0 | Endometrioid carcinoma | not hispanic or latino | female | Unifocal | white | United States | Stage IA | NaN |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 99 | 29.40 | 4.2 | Endometrium | -27393.75 | 27394.0 | Endometrioid carcinoma | NaN | female | Unifocal | NaN | Ukraine | Stage IA | NaN |
| 100 | 35.42 | 1.5 | Endometrium | -27028.50 | 27029.0 | Endometrioid carcinoma | NaN | female | Unifocal | NaN | Ukraine | Stage III | NaN |
| 101 | 24.32 | 3.8 | Endometrium | -31046.25 | 31047.0 | Serous cystadenocarcinoma | not hispanic or latino | female | Unifocal | black or african american | United States | Stage III | NaN |
| 102 | 34.06 | 5.0 | Endometrium | -25567.50 | 25568.0 | Serous cystadenocarcinoma | NaN | female | Unifocal | NaN | Ukraine | Stage IA | NaN |
| 103 | NaN | NaN | Endometrium | NaN | NaN | Serous cystadenocarcinoma | NaN | NaN | NaN | NaN | Ukraine | NaN | NaN |
104 rows × 13 columns
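The two Age-based columns can be checked by hand. The sketch below re-implements the two mappers as named functions and evaluates them for ages that appear in the table above (64, 58 and 75). It uses `math.isnan` in place of `pd.isnull` so that it has no dependencies, and the function names are illustrative.

```python
from math import ceil, isnan

DAYS_PER_YEAR = 365.25  # same factor as in the mappers above


def days_to_birth(age_years: float) -> float:
    """Negative number of days, mirroring the lambda used for days_to_birth."""
    return -age_years * DAYS_PER_YEAR


def age_at_diagnosis(age_years: float) -> float:
    """Age in days, rounded up; missing ages stay missing."""
    if isnan(age_years):
        return float("nan")
    return float(ceil(age_years * DAYS_PER_YEAR))


for age in (64.0, 58.0, 75.0):
    print(age, days_to_birth(age), age_at_diagnosis(age))
```

For an age of 58, this gives -21184.5 and 21185.0, matching row 1 of the harmonized table.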
For comparison, here is what our original data looked like:
[19]:
original_columns = [m["source_attribute"] for m in harmonization_spec]
dataset[original_columns]
[19]:
| | BMI | Tumor_Size_cm | Tumor_Site | Age | Age | Histologic_type | Ethnicity | Gender | Tumor_Focality | Race | Country | FIGO_stage | Histologic_type |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 38.88 | 2.9 | Anterior endometrium | 64.0 | 64.0 | Endometrioid | Not-Hispanic or Latino | Female | Unifocal | White | United States | IA | Endometrioid |
| 1 | 39.76 | 3.5 | Posterior endometrium | 58.0 | 58.0 | Endometrioid | Not-Hispanic or Latino | Female | Unifocal | White | United States | IA | Endometrioid |
| 2 | 51.19 | 4.5 | Other, specify | 50.0 | 50.0 | Endometrioid | Not-Hispanic or Latino | Female | Unifocal | White | United States | IA | Endometrioid |
| 3 | NaN | NaN | NaN | NaN | NaN | Carcinosarcoma | NaN | NaN | NaN | NaN | NaN | NaN | Carcinosarcoma |
| 4 | 32.69 | 3.5 | Other, specify | 75.0 | 75.0 | Endometrioid | Not-Hispanic or Latino | Female | Unifocal | White | United States | IA | Endometrioid |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 99 | 29.40 | 4.2 | Other, specify | 75.0 | 75.0 | Endometrioid | NaN | Female | Unifocal | NaN | Ukraine | IA | Endometrioid |
| 100 | 35.42 | 1.5 | Other, specify | 74.0 | 74.0 | Endometrioid | NaN | Female | Unifocal | NaN | Ukraine | II | Endometrioid |
| 101 | 24.32 | 3.8 | Other, specify | 85.0 | 85.0 | Serous | Not-Hispanic or Latino | Female | Unifocal | Black or African American | United States | II | Serous |
| 102 | 34.06 | 5.0 | Other, specify | 70.0 | 70.0 | Serous | NaN | Female | Unifocal | NaN | Ukraine | IA | Serous |
| 103 | NaN | NaN | NaN | NaN | NaN | Serous | NaN | NaN | NaN | NaN | Ukraine | NaN | Serous |
104 rows × 13 columns