Need help to build and evaluate a named entity recognition (NER) system via sequence tagging, or to perform a systematic comparison of existing NER approaches; it's your choice.
$30-250 AUD
Paid on delivery
3.0.1 Option I: Implementation
• Implement the Viterbi algorithm for predicting the best tag sequence according to a learnt model;
• Use this to construct a Maximum Entropy Markov Model;
• Explore possible feature sets and perform experiments comparing them;
• Evaluate the performance of your system on English and German;
• Describe your experiments, results and analysis in a report.
For Option I, your submission should include:
• your report (∼3 pages, not including tables/diagrams);
• a zipfile containing your code and README instructions on how to run it. Please do not include the
data, but assume the README directory contains the conll03 subdirectory.
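The Viterbi step in Option I can be sketched roughly as follows. This is a minimal illustration, not the required solution: it assumes the learnt model is exposed as a function returning log P(tag | previous tag, tokens, position), which is the local distribution a Maximum Entropy Markov Model would provide. The names `viterbi`, `toy_model` and `START` are hypothetical, and the toy probabilities are made up for the demo.

```python
import math
from typing import Callable, Dict, List, Sequence

# Hypothetical interface: given the previous tag, a position and the tokens,
# return log P(tag | previous tag, tokens, position) for each candidate tag.
LocalModel = Callable[[str, int, Sequence[str]], Dict[str, float]]

START = "<START>"  # assumed start-of-sentence pseudo-tag


def viterbi(tokens: Sequence[str], tags: Sequence[str],
            local_logprob: LocalModel) -> List[str]:
    """Return the highest-scoring tag sequence under the local model."""
    if not tokens:
        return []
    # best[i][t]: best log-probability of any tag sequence ending in t at i.
    first = local_logprob(START, 0, tokens)
    best = [{t: first.get(t, -math.inf) for t in tags}]
    # back[i][t]: the previous tag on that best sequence.
    back = [{t: START for t in tags}]
    for i in range(1, len(tokens)):
        best.append({t: -math.inf for t in tags})
        back.append({t: tags[0] for t in tags})
        for prev in tags:
            if best[i - 1][prev] == -math.inf:
                continue  # unreachable state, nothing to extend
            dist = local_logprob(prev, i, tokens)
            for t in tags:
                score = best[i - 1][prev] + dist.get(t, -math.inf)
                if score > best[i][t]:
                    best[i][t] = score
                    back[i][t] = prev
    # Follow the back-pointers from the best final tag.
    path = [max(tags, key=lambda t: best[-1][t])]
    for i in range(len(tokens) - 1, 0, -1):
        path.append(back[i][path[-1]])
    path.reverse()
    return path


def toy_model(prev_tag: str, position: int,
              tokens: Sequence[str]) -> Dict[str, float]:
    """Made-up stand-in for a trained classifier (illustration only)."""
    if tokens[position][0].isupper():
        if prev_tag in ("B-PER", "I-PER"):
            probs = {"B-PER": 0.1, "I-PER": 0.8, "O": 0.1}
        else:
            probs = {"B-PER": 0.8, "I-PER": 0.05, "O": 0.15}
    else:
        probs = {"B-PER": 0.05, "I-PER": 0.05, "O": 0.9}
    return {tag: math.log(p) for tag, p in probs.items()}


predicted = viterbi(["John", "Smith", "sleeps"],
                    ["B-PER", "I-PER", "O"], toy_model)
```

In a real system, `toy_model` would be replaced by the trained maximum entropy classifier applied to whatever feature set is being explored; the decoder itself does not need to change.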
3.0.2 Option II: Application
• Find existing NER systems and apply them to given text;
• Critically compare NER system descriptions;
• Systematically analyse and compare the errors made by NER systems;
• Describe your experiments, results and analysis in a report.
For Option II, your submission should include:
• your report (∼4 pages, not including tables/diagrams).
• a zipfile containing any code or notebooks used in analysis and README instructions on how to run
it. Please include the [login to view URL] data tagged by each system.
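One possible way to make the error analysis in Option II systematic is to sort each system's predictions into a few error types. The sketch below assumes entities are represented as `(start, end, label)` spans with an exclusive end index; the category names (`wrong_type`, `boundary`, `spurious`, `missed`) and the function name `categorise_errors` are illustrative choices, not something the assignment prescribes.

```python
from collections import Counter
from typing import Set, Tuple

Span = Tuple[int, int, str]  # (start, end_exclusive, label), an assumed format


def categorise_errors(gold: Set[Span], pred: Set[Span]) -> Counter:
    """Count correct predictions and a few coarse error types."""
    counts = Counter()
    gold_bounds = {(s, e): (s, e, label) for s, e, label in gold}
    matched = set()  # gold spans accounted for by some prediction
    for s, e, label in pred:
        if (s, e, label) in gold:
            counts["correct"] += 1
            matched.add((s, e, label))
        elif (s, e) in gold_bounds:
            # Same boundaries, different entity type.
            counts["wrong_type"] += 1
            matched.add(gold_bounds[(s, e)])
        else:
            overlapping = {g for g in gold if g[0] < e and s < g[1]}
            if overlapping:
                # Overlaps a gold entity but the boundaries differ.
                counts["boundary"] += 1
                matched.update(overlapping)
            else:
                counts["spurious"] += 1
    # Gold entities no prediction touched at all.
    counts["missed"] = len(gold - matched)
    return counts


gold = {(0, 2, "PER"), (3, 4, "LOC"), (6, 7, "ORG"), (9, 10, "MISC")}
system_a = {(0, 2, "PER"), (3, 4, "ORG"), (5, 7, "ORG"), (12, 13, "PER")}
report = categorise_errors(gold, system_a)
```

Running the same function over the output of each NER system gives directly comparable counts, which can then be broken down further by entity type in the report.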
Data Set
The main dataset (eng) is a collection of newswire articles (1996-7) from Reuters, which was developed for the
Computational Natural Language Learning (CONLL) 2003 shared task. It uses the typical set of four entity types:
person (PER), organisation (ORG), location (LOC), miscellaneous (MISC). The second dataset (deu) is the German
data from CONLL 2003, which has an extra column (the second) which holds the lemma (base form) of each
word. Your system should be primarily developed for English, but also tested on German for comparison.
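A reader for both datasets might look like the sketch below. It assumes the usual CoNLL 2003 layout, which the prepared files may or may not follow exactly: one token per line, whitespace-separated columns, the word in the first column, the entity tag in the last, blank lines between sentences, and `-DOCSTART-` lines marking document boundaries. Taking the first and last columns means the extra lemma column in the German data needs no special handling. The function names are hypothetical.

```python
from typing import Iterable, Iterator, List, Tuple

Row = Tuple[str, ...]


def read_conll(lines: Iterable[str]) -> Iterator[List[Row]]:
    """Yield one sentence at a time as a list of column tuples."""
    sentence: List[Row] = []
    for line in lines:
        line = line.strip()
        # Blank lines and document markers both end the current sentence.
        if not line or line.startswith("-DOCSTART-"):
            if sentence:
                yield sentence
                sentence = []
            continue
        sentence.append(tuple(line.split()))
    if sentence:
        yield sentence


def words_and_tags(sentence: List[Row]) -> Tuple[List[str], List[str]]:
    """Split a sentence into its words (first column) and tags (last column)."""
    return [row[0] for row in sentence], [row[-1] for row in sentence]


# Hand-written sample lines in the assumed format (not taken from the data).
sample = [
    "-DOCSTART- -X- O O",
    "",
    "EU NNP I-NP I-ORG",
    "rejects VBZ I-VP O",
    "",
    "Berlin Berlin NE I-NC I-LOC",  # five columns, as in the German layout
]
sentences = list(read_conll(sample))
```

With a real file, the same function can be fed an open file handle, for example `read_conll(open(path, encoding="utf-8"))`, where `path` and the encoding depend on how the prepared data is stored.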
The prepared data can be downloaded from here. By downloading the data you agree 1) to only use it for this
assignment, 2) to delete all copies after you are finished with the assignment, and 3) not to distribute it to anybody.
The data is split into subsets:
• training ([login to view URL]) - to be used for training your system;
• development ([login to view URL]) - to be used for development experiments;
• held-out test ([login to view URL]) - to be used only once features and algorithms are finalised.
For further information, see [login to view URL]
Do not download and build the data set from the above URL. It does not include the text.
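For scoring on the development and test subsets, a common choice for this kind of data is entity-level precision, recall and F1, where a prediction counts only if both its boundaries and its type match the gold entity. The posting does not name a metric, so treat this as an assumption. The sketch below accepts both `I-`/`B-` tagging variants and uses hypothetical function names.

```python
from typing import List, Sequence, Set, Tuple

Span = Tuple[int, int, str]  # (start, end_exclusive, label)


def extract_spans(tags: Sequence[str]) -> Set[Span]:
    """Collect entity spans from a sequence of tags such as I-PER, B-LOC, O."""
    spans: Set[Span] = set()
    start, current = None, None
    # A trailing "O" sentinel closes any entity still open at the end.
    for i, tag in enumerate(list(tags) + ["O"]):
        prefix, _, label = tag.partition("-")
        # An open span ends at O, at a B- tag, or when the type changes.
        if current is not None and (
                tag == "O" or prefix == "B" or label != current):
            spans.add((start, i, current))
            start, current = None, None
        if tag != "O" and current is None:
            start, current = i, label
    return spans


def precision_recall_f1(gold_sents: List[Sequence[str]],
                        pred_sents: List[Sequence[str]]
                        ) -> Tuple[float, float, float]:
    """Entity-level scores over parallel lists of tag sequences."""
    tp = n_gold = n_pred = 0
    for gold, pred in zip(gold_sents, pred_sents):
        g, p = extract_spans(gold), extract_spans(pred)
        tp += len(g & p)
        n_gold += len(g)
        n_pred += len(p)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


scores = precision_recall_f1(
    [["I-PER", "I-PER", "O", "I-LOC"]],   # gold tags (made-up example)
    [["I-PER", "O", "O", "I-LOC"]],       # predicted tags (made-up example)
)
```

In the made-up example the location is matched exactly but the person entity has the wrong right boundary, so it counts as both a false positive and a false negative.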
Project ID: #17138164
About the project
4 freelancers are bidding on this job, with an average bid of $236/hour
Hello, I can help you with your project to build and evaluate a named entity recognition system. I have more than 5 years of experience in Machine Learning, Natural Language, Neural Networks, Python, Software Development. [more]
I have good hands-on experience with Advanced Excel, R, Python, BI tools and technologies, AI, and Big Data. I have good knowledge of DL/ML algorithms and have also developed dashboards and web applications. My ar [more]