Gestational diabetes mellitus (GDM) can lead to complications for both mother and baby. The treatment always includes special meal plans and scheduled physical activity, and it may also include daily blood glucose testing and insulin injections. Early screening improves pregnancy outcomes by reducing complications such as emergency cesarean section, neonatal hypoglycemia and macrosomia. While working with gestational diabetes data, a question came up: if GDM could be predicted at a patient's first visit from a few basic biomarkers, it might help patients. Machine learning is therefore applied here to gestational diabetes data to predict a patient's chance of developing GDM in later trimesters.
Importing Libraries :
import numpy as np
import pandas as pd
from sklearn.impute import KNNImputer
from sklearn.model_selection import train_test_split, StratifiedKFold
from sklearn.ensemble import RandomForestClassifier
from xgboost import XGBClassifier
from sklearn.datasets import make_classification
from sklearn.impute import SimpleImputer
import plotly.graph_objs as go
import plotly.offline as py
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.metrics import f1_score
from sklearn.metrics import mean_squared_error
from sklearn import metrics
from sklearn.metrics import roc_auc_score
from sklearn.metrics import accuracy_score
from sklearn.metrics import mean_absolute_error
from sklearn.metrics import precision_score
from imblearn.under_sampling import RandomUnderSampler
from sklearn import preprocessing
from collections import Counter
#Loading data
from google.colab import files
uploaded = files.upload()
Reading the data :
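The reading step itself is not shown in the text. A minimal sketch, assuming the uploaded file is a CSV: in Colab, files.upload() returns a dictionary that maps each file name to its raw bytes, which pandas can read through an in-memory buffer. The helper name read_uploaded_csv, the file name and the two example rows are hypothetical stand-ins, not the real dataset.

```python
import io
import pandas as pd

def read_uploaded_csv(uploaded):
    """Read the first uploaded file (name -> bytes) into a DataFrame."""
    filename = next(iter(uploaded))
    return pd.read_csv(io.BytesIO(uploaded[filename]))

# Stand-in for the dict returned by files.upload(); made-up rows, not real data.
example_upload = {
    "gdm_example.csv": b"SystolicBP,DiastolicBP,GDM_Diagnoised\n120,80,No\n138,63,Yes\n"
}
df = read_uploaded_csv(example_upload)
print(df.head())
```

In the notebook, the call would be df = read_uploaded_csv(uploaded).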
Understanding the dataset :
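Typical first checks are the shape, the column types, the number of missing values per column and the class balance of the target. The sketch below runs them on a small made-up frame; the column names follow those used in this post, and the values are illustrative only.

```python
import numpy as np
import pandas as pd

# Made-up rows for illustration; the real dataset comes from the uploaded file.
df = pd.DataFrame({
    "SystolicBP": [120.0, 138.0, np.nan, 165.0],
    "BMI": [22.5, 38.4, 27.1, np.nan],
    "GDM_Diagnoised": ["No", "Yes", "No", "No"],
})

print(df.shape)  # number of rows and columns
df.info()        # column types and non-null counts

missing = df.isnull().sum()                         # missing values per column
class_counts = df["GDM_Diagnoised"].value_counts()  # class balance of the target
print(missing)
print(class_counts)
```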
Transforming all categorical columns into numerical columns :
The column 'Vit D Deficiency' can be label-encoded in the same way, or a one-hot encoder can be used instead.
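As a sketch of both options, assuming the column holds the strings 'Yes' and 'No' (the actual values in the dataset are not shown here): LabelEncoder maps each category to an integer, while pd.get_dummies creates one indicator column per category. The prefix "VitD" is a hypothetical choice.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Made-up values; the real column contents are an assumption.
df = pd.DataFrame({"Vit D Deficiency": ["No", "Yes", "No", "Yes"]})

# Option 1: label encoding (classes are sorted, so "No" -> 0 and "Yes" -> 1)
le = LabelEncoder()
encoded = le.fit_transform(df["Vit D Deficiency"])
print(list(le.classes_), list(encoded))

# Option 2: one-hot encoding, one indicator column per category
one_hot = pd.get_dummies(df["Vit D Deficiency"], prefix="VitD")
print(one_hot.columns.tolist())
```

Label encoding is enough for a two-valued column; one-hot encoding matters more when a column has several unordered categories.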
The visit 1 biomarkers (SystolicBP, DiastolicBP, Weight, BMI, Age>30, Vit D Deficiency) are taken as the features X, and GDM_Diagnoised as the target y.
Most machine learning models cannot be trained on data that contains null or missing values, so these have to be imputed or removed first.
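One way to handle this, sketched with the KNNImputer imported earlier: each gap is filled with the mean of that feature over the nearest rows that have a value for it. The rows and the choice n_neighbors=2 are illustrative assumptions, not the settings used on the real data.

```python
import numpy as np
import pandas as pd
from sklearn.impute import KNNImputer

# Made-up visit-1 rows with gaps; values are illustrative only.
X = pd.DataFrame({
    "SystolicBP": [120.0, 138.0, np.nan, 165.0],
    "DiastolicBP": [80.0, 63.0, 75.0, 112.0],
    "BMI": [22.5, np.nan, 27.1, 20.4],
})

# Fill each gap with the mean of that feature over the 2 nearest rows
# that have a value for it
imputer = KNNImputer(n_neighbors=2)
X_imputed = pd.DataFrame(imputer.fit_transform(X), columns=X.columns)

print(X.isnull().sum().sum(), "missing values before")
print(X_imputed.isnull().sum().sum(), "missing values after")
```

SimpleImputer, also imported earlier, is a simpler alternative that fills gaps with a single column statistic such as the mean or median.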
In the given dataset, the numbers of patients with and without GDM are not balanced. To balance the classes, we can use either undersampling or oversampling; undersampling is implemented here.
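# Install imbalanced-learn (imported as imblearn); run this in a notebook cell
!pip install imbalanced-learn
from collections import Counter
from imblearn.under_sampling import RandomUnderSampler
# Randomly drop majority-class rows until both classes have the same count
rus = RandomUnderSampler(random_state=0)
X_resampled, y_resampled = rus.fit_resample(X, y)
# Show the class counts after resampling and the new number of rows
print(sorted(Counter(y_resampled).items()), y_resampled.shape)
df2 = pd.DataFrame(X_resampled)
df2.head()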
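To make the idea behind random undersampling concrete, here is a plain NumPy sketch. It is not imbalanced-learn's implementation; the function name random_undersample and the demo data are hypothetical. Each class is cut down to the size of the smallest class by keeping a random subset of its rows.

```python
from collections import Counter
import numpy as np

def random_undersample(X, y, seed=0):
    """Keep a random subset of each class, sized to the smallest class."""
    rng = np.random.default_rng(seed)
    X, y = np.asarray(X), np.asarray(y)
    n_min = min(Counter(y.tolist()).values())
    keep = np.concatenate([
        rng.choice(np.flatnonzero(y == label), size=n_min, replace=False)
        for label in np.unique(y)
    ])
    keep.sort()  # preserve the original row order
    return X[keep], y[keep]

# Made-up imbalanced data: 6 rows without GDM (0), 2 rows with GDM (1)
X_demo = np.arange(16).reshape(8, 2)
y_demo = np.array([0, 0, 0, 1, 0, 0, 1, 0])
X_bal, y_bal = random_undersample(X_demo, y_demo)
print(sorted(Counter(y_bal.tolist()).items()))  # [(0, 2), (1, 2)]
```

The trade-off is that majority-class rows are discarded, which shrinks the training set; oversampling keeps all rows but repeats or synthesizes minority-class samples instead.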
The undersampled data is used for training and testing the model. A logistic regression model (logmodel in the prediction code) trained on this data reaches an accuracy of around 67%.
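The training code is not included in the text. The sketch below shows the step on synthetic data from make_classification rather than the GDM dataset, so its accuracy is not expected to match the figure above. The 80/20 split, the stratification and the random_state and max_iter values are assumptions; only the name logmodel is taken from the prediction code.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the undersampled data: 6 features, roughly balanced classes
X_resampled, y_resampled = make_classification(
    n_samples=200, n_features=6, n_informative=4, n_redundant=0, random_state=0
)

# Hold out 20% for testing, keeping the class ratio the same in both parts
X_train, X_test, y_train, y_test = train_test_split(
    X_resampled, y_resampled, test_size=0.2, stratify=y_resampled, random_state=0
)

logmodel = LogisticRegression(max_iter=1000)
logmodel.fit(X_train, y_train)

accuracy = accuracy_score(y_test, logmodel.predict(X_test))
print(f"Test accuracy: {accuracy:.2f}")
```

With a balanced test set, accuracy is a reasonable first metric; the f1_score, precision_score and roc_auc_score functions imported earlier give a fuller picture.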
Given a patient's biomarker values as input, the model predicts whether the patient has GDM: the predicted GDM_Diagnoised value is 'Yes' if the patient is predicted to have GDM and 'No' otherwise.
If a patient is known to be at risk of gestational diabetes, the necessary precautions can be taken.
def gdm_diagnosis(input_data):
    input_data_as_numpy_array = np.asarray(input_data)  # changing input data to numpy array
    input_data_reshaped = input_data_as_numpy_array.reshape(1, -1)  # reshape the array as we are predicting for one instance
    prediction = logmodel.predict(input_data_reshaped)
    print(prediction)

# Values are assumed to follow the feature order listed for X:
# SystolicBP, DiastolicBP, Weight, BMI, Age>30, Vit D Deficiency
input_data = (165.0, 112.0, 60.6, 20.407797, 0, 0)
gdm_diagnosis(input_data)

input_data = (138.0, 63.0, 94.5, 38.387155, 1.0, 0.0)
gdm_diagnosis(input_data)
When the risk of gestational diabetes is known in advance, these precautions can be taken ahead of time.