Cross-validation is a technique used to assess how well a predictive model will perform on new, unseen data (test data).
We’re testing different ways of predicting diabetes from obesity and inactivity, trying both simple and more complex models (polynomials of degree 1 to 4).
In the dataframe (combined_data) we have common data for all three variables: %DIABETIC, %INACTIVE, and %OBESE. Each of the three variables—obesity, inactivity, and diabetes—has records for the same 354 data points.
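A minimal sketch of how such a combined dataframe can be built with pandas inner merges, assuming hypothetical per-source frames keyed on a shared ID column (the frame names, `FIPS` key, and values here are illustrative, not the actual data):

```python
import pandas as pd

# Hypothetical per-county frames; names and values are assumptions for illustration
diabetes = pd.DataFrame({"FIPS": [1, 2, 3], "%DIABETIC": [8.5, 9.1, 10.2]})
inactivity = pd.DataFrame({"FIPS": [1, 2, 3], "%INACTIVE": [22.0, 25.5, 27.1]})
obesity = pd.DataFrame({"FIPS": [2, 3, 4], "%OBESE": [31.0, 33.4, 29.8]})

# Chained inner merges keep only the rows present in all three sources,
# which is why combined_data holds "common data" for the three variables
combined_data = diabetes.merge(inactivity, on="FIPS").merge(obesity, on="FIPS")
print(combined_data.shape)  # only FIPS 2 and 3 appear in every source
```

Inner merging is the design choice that guarantees every retained record has all three percentages, so no imputation is needed before modeling.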
In cross-validation we perform the following steps:
1. Divide and check:
– We split our data into 5 parts (folds).
– We train models on 4 parts and check how well they predict on the remaining held-out part.
2. What we look for:
– We try to find the best balance: a model that is neither too simple (underfit) nor too complex (overfit).
3. Results:
– After repeating this 5 times (each fold serving as the test set once), we compare which model performs best on average.
– This helps us choose the best way to predict diabetes from obesity and inactivity.