Perl short text classifier to guess a person's ethnicity from their name

Completed, posted 5 years ago, paid on delivery

Your job is to create a simple short-string classifier in Perl. The input to the system is a person's name, that is, a UTF-8 encoded string usually 10 to 40 characters long, and the system determines which of the predefined classes the string belongs to. The classes are ethnicity groups.

For example, if the input to the system is "John Smith", the system would output the class "English"; if the input is "hiromi akiyama", it would output "Japanese". There are 18 different classes (ethnicity groups).

The system has two parts:

1) A training script called [url removed, login to view], which trains the system on the given training data (a list of known "name = ethnicity" pairs) and saves the "trained state" of the system to disk. The script is called with "perl [url removed, login to view] [url removed, login to view]".

2) An analyzer script called [url removed, login to view], which loads the "trained state" from disk (generated earlier by the training script) and uses it to classify a given string. The script is called either as "perl [url removed, login to view] [url removed, login to view]", in which case it loads the given test file, or as "perl [url removed, login to view] "john smith"", in which case it simply classifies the string given on the command line ("john smith" in this case).
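To illustrate the split between the two scripts, here is a minimal sketch of how the trained state could be written and read back, and how the analyzer could tell a test file from a literal name. It assumes the core Storable module for persistence; the sub names save_state, load_state and input_mode, and the sample model contents, are hypothetical and not part of the brief.

```perl
use strict;
use warnings;
use Storable qw(nstore retrieve);
use File::Temp qw(tempfile);

# Training side: write the trained model (any Perl data structure) to disk.
sub save_state {
    my ($model, $path) = @_;
    nstore($model, $path);    # network byte order, portable across machines
    return $path;
}

# Analyzer side: read the trained model back from disk.
sub load_state {
    my ($path) = @_;
    return retrieve($path);
}

# Analyzer side: an existing file is treated as a test file,
# anything else as a literal name to classify.
sub input_mode {
    my ($arg) = @_;
    return 'usage' unless defined $arg && length $arg;
    return (-f $arg) ? 'file' : 'name';
}

# Demonstration with a temporary file and a made-up model.
my ($fh, $path) = tempfile(UNLINK => 1);
close $fh;
save_state({ n => 3, docs => { English => 2 } }, $path);
my $state = load_state($path);
print "restored n-gram size: $state->{n}\n";
print "mode for 'john smith': ", input_mode('john smith'), "\n";
```

Checking for an existing file first means a name that happens to match a file name in the current directory would be read as a file; a stricter design could use an explicit command-line switch instead.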

Attached is data.zip. It contains [url removed, login to view] and testing_data.txt. Each line has the format "name:class", where the name is base64-encoded.
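As a rough illustration of the data format, decoding one "name:class" line might look like the sketch below. It assumes the decoded bytes are UTF-8 (as stated for the input names) and uses the core MIME::Base64 and Encode modules; the sub name parse_line and the sample record are made up for the example.

```perl
use strict;
use warnings;
use MIME::Base64 qw(decode_base64 encode_base64);
use Encode qw(decode encode);

# Parse one "name:class" line; the name part is base64-encoded UTF-8.
# Returns (name, class), or an empty list for blank or malformed lines.
sub parse_line {
    my ($line) = @_;
    $line =~ s/\s+\z//;              # trim trailing newline / carriage return
    return unless length $line;
    # The base64 alphabet contains no ':', so the first colon is the separator.
    my $pos = index($line, ':');
    return if $pos < 1;
    my $b64   = substr($line, 0, $pos);
    my $class = substr($line, $pos + 1);
    return unless length $class;
    my $name = decode('UTF-8', decode_base64($b64));
    return ($name, $class);
}

# Round trip with a made-up record.
my $sample = encode_base64(encode('UTF-8', 'John Smith'), '') . ':English';
my ($name, $class) = parse_line($sample);
print "$name => $class\n";
```

The regular expression here only trims line endings while reading the file; it plays no part in deciding the class.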

Your system must be trainable on the given [url removed, login to view] such that it classifies [url removed, login to view] with 90% or better accuracy.
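One way to check the 90% requirement is to count how many test names receive the expected class. The sketch below assumes the test data has already been decoded into [name, class] pairs and that the classifier is available as a code reference; the sub name accuracy, the stand-in classifier and the sample pairs are hypothetical.

```perl
use strict;
use warnings;

# Fraction of [name, class] pairs for which the classifier returns the
# expected class. $classify is a code reference taking a name.
sub accuracy {
    my ($classify, $pairs) = @_;
    return 0 unless @$pairs;
    my $correct = grep { ($classify->($_->[0]) // '') eq $_->[1] } @$pairs;
    return $correct / @$pairs;
}

# Stand-in classifier for the demonstration: a fixed lookup table.
# A real run would pass the trained classifier instead.
my %lookup = (
    'John Smith'     => 'English',
    'Mary Jones'     => 'English',
    'Hiromi Akiyama' => 'Japanese',
);
my $stub = sub { $lookup{ $_[0] } };

my @test_pairs = (
    ['John Smith',     'English'],
    ['Mary Jones',     'English'],
    ['Hiromi Akiyama', 'Japanese'],
    ['Anna Rossi',     'Italian'],    # unknown to the stub, counts as a miss
);

my $acc = accuracy($stub, \@test_pairs);
printf "accuracy: %.1f%% (%s the 90%% target)\n",
    100 * $acc, $acc >= 0.90 ? 'meets' : 'below';
```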

Note: The solution must be training-based, for example a Bayesian classifier, an n-gram analyzer, or another artificial intelligence or machine learning technique. It must not rely on regular expressions or any fixed, human-written set of detection rules.
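To make the "training-based" requirement concrete, here is a minimal sketch of one such approach: a naive Bayes classifier over character n-grams with add-one smoothing. It is only an illustration under stated assumptions, not a solution known to reach the required accuracy; the sub names ngrams, train and classify, the choice of trigrams, and the four toy training names are all made up for the example.

```perl
use strict;
use warnings;

# Split a name into overlapping character n-grams, padded with spaces so that
# name-initial and name-final letter patterns become distinct features.
sub ngrams {
    my ($name, $n) = @_;
    my $padded = ' ' . lc($name) . ' ';
    my @grams;
    for my $i (0 .. length($padded) - $n) {
        push @grams, substr($padded, $i, $n);
    }
    return @grams;
}

# Count n-grams per class from a list of [name, class] pairs.
sub train {
    my ($pairs, $n) = @_;
    my %model = (n => $n, docs => {}, grams => {}, total => {}, vocab => {});
    for my $pair (@$pairs) {
        my ($name, $class) = @$pair;
        $model{docs}{$class}++;
        for my $g (ngrams($name, $n)) {
            $model{grams}{$class}{$g}++;
            $model{total}{$class}++;
            $model{vocab}{$g} = 1;
        }
    }
    return \%model;
}

# Return the class with the highest log-posterior (add-one smoothing).
sub classify {
    my ($model, $name) = @_;
    my $vocab_size = keys(%{ $model->{vocab} }) || 1;
    my $doc_total  = 0;
    $doc_total += $_ for values %{ $model->{docs} };
    my ($best, $best_score);
    for my $class (sort keys %{ $model->{docs} }) {
        my $score = log($model->{docs}{$class} / $doc_total);
        my $total = $model->{total}{$class} // 0;
        for my $g (ngrams($name, $model->{n})) {
            my $count = $model->{grams}{$class}{$g} // 0;
            $score += log(($count + 1) / ($total + $vocab_size));
        }
        ($best, $best_score) = ($class, $score)
            if !defined $best_score || $score > $best_score;
    }
    return $best;
}

# Toy demonstration; real training would use the attached data.
my $model = train([
    ['John Smith',     'English'],  ['Mary Johnson',     'English'],
    ['Hiromi Akiyama', 'Japanese'], ['Takashi Yamamoto', 'Japanese'],
], 3);
print classify($model, 'akira yamada'), "\n";
```

All decisions here come from counts learned from the training pairs; no name pattern is written by hand. The returned model is a plain hash reference, so it can be saved to disk as the "trained state".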

You are free to use any existing free Perl code, libraries and modules, such as AI or data classifier libraries.

Perl

Project ID: #16538057

About the project

5 proposals, remote project, active 5 years ago

Awarded to:

kchwistek

Hi, I am quite an experienced programmer and know several programming languages. Your project is interesting. I studied Computer Science in the past, and AI is a topic I like to think about. Unfortunately t…

$165 USD in 10 days
(2 reviews)
3.5

5 freelancers are bidding on this job, with an average bid of $156/hour

freelance4hire80

Hi, I've checked the project spec. I can write a Perl script for you, using Algorithm::NaiveBayes for example, to predict a person's ethnicity from the training data sets.

$155 USD in 3 days
(49 reviews)
6.6

balu0priya1

I would like to work on this project, as I have enough experience in Perl. If interested, please let us know.

$155 USD in 3 days
(0 reviews)
0.0