The projects aim to impute the missing values in the given datasets. You have to write code in the programming
language of your choice (e.g., MATLAB, Python, R, Fortran, C, or C++) to read some Excel data
(step 1), identify the missing data (step 2), impute the missing values using the technique
given in the reference proposed for this project (step 3), and finally return the imputed data and compare it
with the complete data to measure the accuracy and reliability of your results (step 4).
In step 1, do not limit your code to a specific data size or dimension: it must be able to read
or load datasets of different sizes and dimensions. You will receive some datasets with numerical and/or categorical
attributes in XLS and/or CSV format, which I will upload later.
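As a rough illustration of step 1, the following Python sketch loads a CSV or Excel file without assuming anything about its size or column types. The function name `load_dataset` is hypothetical, and the sketch assumes pandas is installed (reading Excel files additionally needs an engine such as xlrd or openpyxl).

```python
from pathlib import Path

import pandas as pd


def load_dataset(path):
    """Load a CSV or Excel file of any size into a DataFrame.

    Hypothetical helper: the file type is chosen from the extension,
    and empty cells are read as missing values (NaN).
    """
    suffix = Path(path).suffix.lower()
    if suffix == ".csv":
        return pd.read_csv(path)
    if suffix in (".xls", ".xlsx"):
        # pandas needs an Excel engine installed (e.g. xlrd or openpyxl).
        return pd.read_excel(path)
    raise ValueError(f"Unsupported file type: {suffix}")
```

Because nothing about the shape is hard-coded, the same call works for any number of rows and columns, and for mixed numerical and categorical attributes.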
In step 2, you determine the number and the location of the missing values. For instance, if you return the indices
of the missing entries, you can identify the missing-data pattern (univariate, monotone, or arbitrary). This not
only helps you handle the next step successfully, but also earns you more points.
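One possible way to approach step 2 in Python is sketched below. The function names are hypothetical, and the pattern definitions are assumptions: "univariate" is taken to mean that exactly one column has missing values, and "monotone" to mean that the columns can be ordered so that a value missing in one column is also missing in every later column. Check these against the definitions used in your reference.

```python
import numpy as np
import pandas as pd


def missing_summary(df):
    """Return the boolean missing mask and a list of (row, column) locations."""
    mask = df.isna()
    rows, cols = np.where(mask.to_numpy())
    locations = list(zip(df.index[rows], df.columns[cols]))
    return mask, locations


def missing_pattern(mask):
    """Classify the pattern as complete, univariate, monotone or arbitrary."""
    cols_with_missing = [c for c in mask.columns if mask[c].any()]
    if not cols_with_missing:
        return "complete"
    if len(cols_with_missing) == 1:
        return "univariate"
    # Order columns from fewest to most missing values, then check that
    # each column's missing rows are contained in the next column's.
    ordered = mask[cols_with_missing].sum().sort_values().index
    m = mask[ordered].to_numpy()
    nested = all((~m[:, j] | m[:, j + 1]).all() for j in range(m.shape[1] - 1))
    return "monotone" if nested else "arbitrary"
```

The number of missing values is simply `len(locations)`, and the mask can be reused later to restrict the error computation to the entries that were actually imputed.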
In step 3, you have to read the reference paper given for the proposed method, understand the algorithm,
and write code that imputes the missing data (by single or multiple imputation) based on the given approach.
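The method you must implement in step 3 is the one in your reference paper, which is not reproduced here. The Python sketch below is only a simple baseline (single imputation by column mean for numerical attributes and by most frequent value for categorical ones), not the reference method. It may be useful as a placeholder while you wire up the rest of your pipeline and as a point of comparison for your results. The function name `impute_baseline` is hypothetical.

```python
import pandas as pd


def impute_baseline(df):
    """Baseline single imputation (NOT the method of the reference paper).

    Numerical columns are filled with the column mean, all other
    columns with the most frequent value. The input is left unchanged.
    """
    out = df.copy()
    for col in out.columns:
        if not out[col].isna().any():
            continue
        if pd.api.types.is_numeric_dtype(out[col]):
            # Stays NaN if the whole column is missing.
            fill = out[col].mean()
        else:
            modes = out[col].mode(dropna=True)
            if modes.empty:
                continue  # nothing observed in this column
            fill = modes.iloc[0]
        out[col] = out[col].fillna(fill)
    return out
```

Keeping the same interface (a DataFrame in, an imputed DataFrame out) for your implementation of the reference method makes it easy to swap the two and compare them.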
In step 4, your code must return the imputed values. You can then compare the
imputed values with the original complete data to compute the error (NRMS). You can generate some diagrams,
automatically or manually, to present your results and compare them with the original complete datasets.
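For step 4, the error could be computed along the following lines in Python. This sketch assumes that NRMS means the norm of the difference between imputed and true values divided by the norm of the true values, evaluated only on the entries that were missing. Other normalizations exist (for example by the range or the standard deviation), so use the definition from your reference paper if it differs. The function name `nrms` and its parameters are hypothetical, and the sketch covers numerical attributes only.

```python
import numpy as np


def nrms(imputed, complete, mask):
    """Normalized root mean square error over the imputed entries.

    Assumed definition: ||imputed - complete|| / ||complete||,
    restricted to the positions where mask is True (originally missing).
    """
    imputed = np.asarray(imputed, dtype=float)
    complete = np.asarray(complete, dtype=float)
    mask = np.asarray(mask, dtype=bool)
    diff = imputed[mask] - complete[mask]
    denom = np.linalg.norm(complete[mask])
    if denom == 0:
        raise ValueError("NRMS is undefined when the true values are all zero")
    return np.linalg.norm(diff) / denom
```

For example, if the only missing entry has true value 4 and is imputed as 1, `nrms([[3, 1]], [[3, 4]], [[False, True]])` gives 0.75. A value of 0 means the imputed entries match the complete data exactly.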
Every step carries its own credit, and both successful and unsuccessful projects will be taken into account.
However, I expect clear programming that is commented (to some extent), so that we are able to execute your
code easily and see and check your results (preferably by means of a visualization technique of your choice), and
I expect trustworthy and reliable results.