Develop Text Classification and/or Clustering Algorithms in Python

$250-750 USD

In progress
Posted almost 8 years ago

Paid on delivery
We require assistance with the following tasks. Please contact us directly to describe how you would solve them. Russian language skills may be necessary.

1) Task: Develop or employ a text-classification algorithm in Python or R that assigns each item one of several thousand 10-digit product codes, using a descriptive text field of roughly 300 characters in UTF-8 (Russian / Cyrillic).
Description: We have a database of several million textual product descriptions that were entered by humans. Each entry is linked to a 10-digit product code, but the same product code can be used for many differing textual entries. We require a text-classification algorithm that classifies a document probabilistically and can then be applied to another dataset (see task 2). This task requires tokenizing, stemming, and removing stop words, so you may need to know Russian or use available NLTK packages. Several different algorithms may also need to be used to improve precision.
Output: Python scripts/algorithm(s) that classify documents into 10-digit product codes and can be used in task 2.

2) Task: Use the classification algorithm from (1) to classify textual entries in a second dataset.
Description: Once the clean list has been created, employ a machine learning algorithm to assign the 10-digit codes to a target dataset of over 60 million textual product descriptions in UTF-8 (Russian / Cyrillic). Not all entries will have sufficient information to be classified; these 'leftovers' should be marked as such. For example, an entry could be marked as a leftover if no class has a probability above some threshold. Also, the dataset in (1) only contains examples of a subset of the items in the second dataset, but we will be able to estimate which items these are.
Output: The second dataset of 60 million entries, matched to 10-digit product codes.

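As an illustration of tasks 1 and 2, here is a minimal sketch of probabilistic classification with a leftover threshold. It is not a proposed solution, only one way to read the requirement. It uses a small pure-Python multinomial Naive Bayes so that it runs without extra packages. The product codes, sample descriptions, stop-word list, and the 0.7 threshold are all hypothetical, and stemming is left out (a real pipeline might add NLTK's Russian Snowball stemmer at the tokenizing step).

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical, tiny stop-word list; a real run would use a full Russian list.
STOP_WORDS = {"и", "в", "на", "для", "с", "из"}


def tokenize(text):
    """Lowercase, keep runs of letters (Cyrillic included), drop stop words."""
    words = re.findall(r"[^\W\d_]+", text.lower())
    return [w for w in words if w not in STOP_WORDS]


class NaiveBayes:
    """Multinomial Naive Bayes with add-one smoothing."""

    def fit(self, texts, codes):
        self.class_counts = Counter(codes)
        self.token_counts = defaultdict(Counter)
        for text, code in zip(texts, codes):
            self.token_counts[code].update(tokenize(text))
        self.vocab = {t for c in self.token_counts.values() for t in c}
        return self

    def predict_proba(self, text):
        """Return {code: posterior probability} for one description."""
        tokens = [t for t in tokenize(text) if t in self.vocab]
        total_docs = sum(self.class_counts.values())
        log_scores = {}
        for code, n_docs in self.class_counts.items():
            counts = self.token_counts[code]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)
            for t in tokens:
                score += math.log((counts[t] + 1) / denom)
            log_scores[code] = score
        # Normalize in a numerically stable way.
        top = max(log_scores.values())
        exp = {c: math.exp(s - top) for c, s in log_scores.items()}
        z = sum(exp.values())
        return {c: v / z for c, v in exp.items()}


def classify(model, text, threshold=0.7):
    """Return (code, probability); code is None for a 'leftover'."""
    probs = model.predict_proba(text)
    code, p = max(probs.items(), key=lambda kv: kv[1])
    return (code, p) if p >= threshold else (None, p)


# Hypothetical training data: (description, 10-digit code).
TRAIN = [
    ("кофе жареный в зернах", "0901210000"),
    ("кофе молотый жареный", "0901210000"),
    ("ноутбук портативный компьютер", "8471300000"),
    ("компьютер ноутбук с экраном", "8471300000"),
]
model = NaiveBayes().fit([t for t, _ in TRAIN], [c for _, c in TRAIN])

print(classify(model, "кофе жареный"))   # confident match
print(classify(model, "товар разный"))   # no known words, marked as leftover
```

An entry whose best class falls below the threshold comes back with `None` as its code, which is how the 'leftovers' for task 3 would be marked in this sketch.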
3) Task: For the 'leftovers' of (2), develop or employ a text-clustering algorithm that groups entries into k subclasses.
Description: We will provide you with a higher-level grouping variable for the 'leftovers' and a number k that designates how many clusters we need within each grouping. Your task will be to use a text-clustering algorithm to create k clusters within each higher-level group of 'leftovers'.
Output: A unique variable designating cluster membership for each item in the 'leftovers' (those without 10-digit product codes from step 2).
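Task 3 can be sketched in the same spirit. The post does not name a clustering algorithm, so the choice here is an assumption: a small pure-Python spherical k-means (cosine similarity on normalized word counts) with a deterministic farthest-first initialization, run separately inside each higher-level group. The group names, item IDs, and sample texts are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def vectorize(text):
    """Unit-length bag-of-words vector stored as {token: weight}."""
    counts = Counter(re.findall(r"[^\W\d_]+", text.lower()))
    norm = math.sqrt(sum(v * v for v in counts.values())) or 1.0
    return {t: v / norm for t, v in counts.items()}


def cosine(a, b):
    """Dot product of two unit-length sparse vectors."""
    if len(a) > len(b):
        a, b = b, a
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def kmeans(vectors, k, iterations=20):
    """Spherical k-means; returns one cluster label per vector."""
    if not vectors:
        return []
    k = min(k, len(vectors))
    # Farthest-first initialization: start from the first vector, then keep
    # adding the vector least similar to the centroids chosen so far.
    centroids = [vectors[0]]
    while len(centroids) < k:
        centroids.append(
            min(vectors, key=lambda v: max(cosine(v, c) for c in centroids))
        )
    labels = [0] * len(vectors)
    for _ in range(iterations):
        labels = [
            max(range(k), key=lambda j: cosine(v, centroids[j])) for v in vectors
        ]
        for j in range(k):
            total = defaultdict(float)
            for v, label in zip(vectors, labels):
                if label == j:
                    for t, w in v.items():
                        total[t] += w
            if not total:
                continue  # empty cluster: keep the previous centroid
            norm = math.sqrt(sum(w * w for w in total.values()))
            centroids[j] = {t: w / norm for t, w in total.items()}
    return labels


def cluster_leftovers(rows, k):
    """rows: (item_id, group, text) tuples. Returns {item_id: 'group-cluster'}."""
    by_group = defaultdict(list)
    for item_id, group, text in rows:
        by_group[group].append((item_id, vectorize(text)))
    membership = {}
    for group, items in by_group.items():
        labels = kmeans([v for _, v in items], k)
        for (item_id, _), label in zip(items, labels):
            membership[item_id] = f"{group}-{label}"
    return membership


# Hypothetical leftovers: (item_id, higher-level group, description).
ROWS = [
    (1, "G1", "чай черный листовой"),
    (2, "G1", "чай зеленый листовой"),
    (3, "G1", "кабель медный силовой"),
    (4, "G1", "кабель силовой алюминиевый"),
    (5, "G2", "мыло жидкое"),
    (6, "G2", "краска белая"),
]
membership = cluster_leftovers(ROWS, k=2)
print(membership)
```

Prefixing the cluster label with the group name keeps the membership variable unique across groups, which is what the output of task 3 asks for.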
Project ID: 10479220

About the project

17 proposals
Remote project
Active 8 years ago

17 freelancers are bidding on average $1,372 USD for this job

Hi there. I would love to be part of this project, as it seems very interesting. I am a data scientist with experience applying data mining algorithms to large amounts of data for prediction and description. I do not know Russian, but I do have experience using existing packages to preprocess data. I would do all tasks in Python. Hope to hear back from you soon. Thanks, Daniel
$526 USD in 10 days
5.0 (103 reviews)
8.7

We are a group of Data Scientists based in Bangalore. Our core areas of expertise are big data and machine learning.
$10,000 USD in 40 days
4.9 (9 reviews)
6.4

I am a computer science professional with a PhD and excellent skills in Python and a number of other languages. I've done many projects involving clustering or classification. I'm also a fluent Russian speaker. Please see the reviews on my profile. It would be my pleasure to do your project. Here is another large project in which I had to process a large volume of Russian texts using Python: https://www.freelancer.com/projects/Python/Data-Extraction-from-Word-documents/
$1,000 USD in 10 days
5.0 (59 reviews)
6.3

I am very interested in your project. I have experience in this field. If you work with me, you will succeed. I am ready to work with you now. Phon.
$736 USD in 10 days
4.9 (25 reviews)
5.8

Dear Client, greetings from Flowgica Technologies. I have experience with these skills, and we have done similar work, so I am looking forward to discussing the project and moving ahead. Please check our freelancer portfolio at https://www.freelancer.com/u/mmadi.html?page=portfolio. I am ready to work with you and am waiting for your response. Thanks & Regards, Mmadi
$600 USD in 10 days
5.0 (1 review)
3.9

My name is Mike and I'm from the UK. I work with individual clients and also provide outsourcing services for a number of UK- and USA-based agencies. Your project description sounds interesting to me, and I have the skills and experience required to complete this project. I can show you some examples of my work. Please contact me to discuss your project.
$555 USD in 10 days
5.0 (1 review)
3.2

I have gone through your requirements. We have done similar work before and look forward to discussing the project. Awaiting your earliest reply.
$555 USD in 10 days
0.0 (0 reviews)
0.0

Hello, I understand the initial scope of this project, but I would like to discuss the job further in order to prepare the final concept. After a complete discussion by call or chat, I will prepare the following for you: a technical project proposal, a flow chart for the project, and an execution plan (a step-by-step procedure explaining how and when we will execute each task).
$773 USD in 20 days
0.0 (0 reviews)
0.0

I am currently working part time, using R on a daily basis. I have practical experience with R programming as well as with classification algorithms, text mining, clustering, and machine learning. I am also a student of Economics and Econometrics in Prague.
$1,666 USD in 10 days
0.0 (0 reviews)
0.0

A proposal has not yet been provided
$1,111 USD in 21 days
0.0 (0 reviews)
0.0

About the client

Washington, United States
5.0
6
Payment method verified
Member since January 6, 2016

Freelancer ® is a registered Trademark of Freelancer Technology Pty Limited (ACN 142 189 759)
Copyright © 2024 Freelancer Technology Pty Limited (ACN 142 189 759)