The Washington Post police data has entries dating back to January 2, 2015, and it is updated each week with the latest information. Today’s session covered a variety of questions, including how to handle missing values in columns such as armed, flee, and race, which hold string entries. How can we fill in these missing values? There are several approaches:
Mode imputation: Replace the missing values with the most frequent entry (the mode) of the column. I think this method would be suitable for the armed column: although it has many distinct entries, gun, knife, and replica are the most frequent.
Forward fill (ffill) or backward fill (bfill): Fill in each missing value with the value above it (forward fill) or below it (backward fill). This method might be suitable for flee, as most of its entries are ‘not’.
Constant imputation: Replace missing values with a specified constant. This method would be appropriate for body camera and signs of mental illness, as their entries are either True or False.
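The three approaches above can be sketched with pandas. This is only a minimal illustration on a made-up toy frame: the column names (armed, flee, body_camera) are assumed to match the dataset, the values are invented, and the choice of False as the constant is an assumption rather than something decided in the session.

```python
import pandas as pd

# Toy frame: column names are assumed to mirror the dataset, values are made up.
df = pd.DataFrame({
    "armed": ["gun", None, "knife", "gun", None],
    "flee": ["not", None, "car", "not", "not"],
    "body_camera": pd.array([True, False, None, False, True], dtype="boolean"),
})

# Mode imputation: fill gaps with the most frequent entry of the column.
armed_mode = df["armed"].mode()[0]
df["armed"] = df["armed"].fillna(armed_mode)

# Forward fill copies the value above; the trailing bfill covers a gap in the first row.
df["flee"] = df["flee"].ffill().bfill()

# Constant imputation: False is an assumed default, chosen only for illustration.
df["body_camera"] = df["body_camera"].fillna(False)

print(df)
```

Note that forward and backward fill depend on row order, so the filled value is only meaningful if neighbouring rows are related or, as with flee, one value dominates the column.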
Alternatively, if we are unsure of how to fill in certain unique columns, such as the one named “state” in this dataset, we can train a machine learning model to predict the missing values from the other columns. Beyond the methods discussed above, various other ways of filling missing entries in a column can be considered, and their effect on model accuracy can be assessed to identify the most effective approach. I would like to try the methods mentioned above on my testing set and see which one performs best.
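One possible way to compare imputation methods, sketched here under assumptions, is to hide some entries whose true values are known, impute them, and count how many are recovered. The helper name imputation_accuracy, the 20% masking fraction, and the toy flee values are all hypothetical. This measures recovery of hidden entries, not downstream model accuracy.

```python
import pandas as pd

def imputation_accuracy(series, fill, frac=0.2, seed=0):
    """Hide a random share of known values, impute them, return the fraction recovered."""
    known = series.dropna()
    hidden = known.sample(frac=frac, random_state=seed).index
    masked = series.copy()
    masked.loc[hidden] = None          # pretend these entries are missing
    filled = fill(masked)
    return float((filled.loc[hidden] == series.loc[hidden]).mean())

# Made-up values standing in for the flee column.
flee = pd.Series(["not"] * 8 + ["car", "foot"])

# Each strategy takes a series with gaps and returns a filled series.
strategies = {
    "mode": lambda s: s.fillna(s.mode()[0]),
    "ffill": lambda s: s.ffill().bfill(),
}

scores = {name: imputation_accuracy(flee, fill) for name, fill in strategies.items()}
print(scores)
```

A model-based imputer could be added to the strategies dictionary in the same way, as long as it accepts a series with gaps and returns a filled one.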