This is a time-critical project (to be ready by Saturday, 28th October 2017)! Only full-time freelancers are welcome to bid. Please don't bid if you lack any of the following:
(1) Expertise in Python, NumPy, SciPy, NLTK and basic linear algebra
(2) Good comfort level with mathematics and equations
(3) Ability to report twice a day, morning IST and evening IST (IST = Indian Standard Time)
(4) Previous experience in text-based outlier detection (this is not the same as numerical outlier detection). This condition may be negotiable if you lack this experience but have the right mathematical maturity.
The project has three parts, (a) to (c), followed by testing and reporting requirements, (d) and (e):
(a) Come up with a proper dissimilarity function that can be fed into the Sequential Exception Technique (SET) to detect semantic outliers (semantic outliers are those portions of a larger text that deviate in meaning from the rest of the text).
(b) The Latent Semantic Analysis (LSA) part is ready, but you have to tune the rank and window size to get good statistics (see (e) below for what defines good statistics).
(c) Combine LSA and SET to get better results than either technique alone.
(d) Parts (a) to (c) will be tested on four datasets: BBC News, ENRON email, NIPS and C50:50 Reuters dataset (links to be provided for interested freelancers)
(e) Four statistical parameters are to be reported for each of (a) to (c): Precision, Recall, Accuracy and F1
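For reference, the four statistics in (e) could be computed along the following lines. This is only a minimal sketch, not the project's evaluation code: it assumes each text portion carries a binary label (1 = semantic outlier, 0 = normal), the function name `outlier_metrics` is hypothetical, and returning 0.0 when a denominator is zero is an assumed convention.

```python
def outlier_metrics(y_true, y_pred):
    """Precision, recall, accuracy and F1 for binary outlier labels.

    Assumption: 1 marks a semantic outlier, 0 marks normal text.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # outliers correctly flagged
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # normal text wrongly flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # outliers that were missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # normal text correctly passed

    # Assumed convention: report 0.0 when a denominator would be zero.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)

    return {"precision": precision, "recall": recall,
            "accuracy": accuracy, "f1": f1}


# Hypothetical labels for six text portions, for illustration only.
metrics = outlier_metrics([1, 0, 0, 1, 0, 1], [1, 0, 1, 0, 0, 1])
print(metrics)
```

Since outliers are usually rare, accuracy alone can look good even when most outliers are missed, which is presumably why Precision, Recall and F1 are requested alongside it.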
Thank you for your time. The current code and selected literature may be provided to shortlisted freelancers on request.