Building a Prototype Environment for Talend and HDP + Crawler

Completed. Posted May 19, 2013. Paid on delivery.

We are looking for a company, group, or individual with experience in Talend and HDP (Hortonworks Data Platform).

This is a prototype project to verify technical issues before the real project begins.

If the result of this project is good, we will also hire you for the real project.

If you have experience with Talend, HDP, and Java, this project is not hard to perform.

Only experienced candidates are welcome.

1. Using user-defined keywords, extract specific data (mainly area) from a specified website and store it on my server as a text file or in a DB format.

2. Using a scheduled job, load the extracted data into HDP. The job must be designed in Talend Open Studio.

- The processing conditions for the job will be defined through discussion as you design the job.

- Create 3 to 5 tables in a MySQL DB for the results.

- The transfer from HDP to the MySQL DB is performed by Talend.

- The design of the tables will be discussed when you reach that step.
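As a rough illustration of requirement 1, the sketch below filters extracted records by user-defined keywords and appends the matches to a text file, one JSON object per line. The function names, the `url` and `text` field names, and the JSON Lines layout are all assumptions made for this example; the posting only asks for "a text file or db format", and the actual format would be agreed during the project.

```python
import json


def matches_keywords(record, keywords):
    """Return True if any keyword occurs (case-insensitively) in the record text."""
    # "text" is a hypothetical field name holding the extracted page text.
    text = record.get("text", "").lower()
    return any(keyword.lower() in text for keyword in keywords)


def append_records(path, records, keywords):
    """Append matching records to a text file, one JSON object per line.

    Returns the number of records written.
    """
    written = 0
    with open(path, "a", encoding="utf-8") as out:
        for record in records:
            if matches_keywords(record, keywords):
                out.write(json.dumps(record, ensure_ascii=False) + "\n")
                written += 1
    return written
```

A line-per-record text file is chosen here only because it is simple to append to from a repeated crawl and simple to read back line by line in a later load step.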

** Reference: The configuration of the prototype system is attached.

1. First, you should install all the components on my servers remotely (TeamViewer will be used).

2. You should work on my servers from the beginning. 

3. All the installation steps must be handed over to my staff, who will follow along with your procedure (Skype will be used for this communication).

: This is the key condition for payment.

1. Linux Web Crawler: you should prepare a suitable crawler that satisfies the needs below.

: This crawler must be delivered to us, and we may request additional customization for the purposes of the project (discussion needed).

- Search the specified website by keywords (including subdirectories).

- Extraction should be repeated on a configurable time schedule.

- Extracted items:

a. URL

b. meta tags (title, description, keywords)

c. plain text between to tag

d. page size

e. last modified date value
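The items above could be pulled from an already fetched page roughly as follows. This is a minimal sketch using only Python's standard `html.parser`, not a full crawler: fetching, link following, and scheduling are left out. It assumes that "title" means the `<title>` element, that "plain text" means all visible text outside `<script>` and `<style>`, that page size is the byte length of the response body, and that the last modified date comes from the HTTP `Last-Modified` response header. The names `PageItemParser` and `extract_items` are hypothetical.

```python
from html.parser import HTMLParser


class PageItemParser(HTMLParser):
    """Collects the title, selected meta tags, and visible text of one page."""

    SKIPPED = {"script", "style"}  # contents of these tags are not plain text

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.title = ""
        self.meta = {}
        self.text_parts = []
        self._in_title = False
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag in self.SKIPPED:
            self._skip_depth += 1
        elif tag == "meta":
            name = (attrs.get("name") or "").lower()
            if name in ("description", "keywords"):
                self.meta[name] = attrs.get("content") or ""

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._in_title:
            self.title += data
        elif self._skip_depth == 0 and data.strip():
            self.text_parts.append(data.strip())


def extract_items(url, body, headers):
    """Build one record from a URL, the raw response bytes, and a header dict."""
    parser = PageItemParser()
    # Assumes UTF-8; undecodable bytes are replaced rather than raising.
    parser.feed(body.decode("utf-8", errors="replace"))
    parser.close()
    return {
        "url": url,
        "title": parser.title.strip(),
        "description": parser.meta.get("description", ""),
        "keywords": parser.meta.get("keywords", ""),
        "text": " ".join(parser.text_parts),
        "page_size": len(body),  # size in bytes of the fetched body
        # Exact-case lookup; a real HTTP client may expose headers differently.
        "last_modified": headers.get("Last-Modified", ""),
    }
```

Usage would look like `extract_items(url, response_bytes, response_headers)`, where the bytes and headers come from whichever HTTP client the crawler uses.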

2. Talend Open Studio for Big Data (free version)

- Already installed on my machine, but you can use your own if you need to.

- However, the resulting Talend project files must be delivered to us.

3. HDP (free version)

- Already installed on my machine; you should work on it during this project.

- You can reinstall it or change the configuration of it if you need.

- The full history of changes and modifications must be handed over to us.

** This project will start this Wednesday (22nd June) at 9:30am (GMT+9).

** Bidding closes this Tuesday (21st June) at 11pm (GMT+9).

** The desired completion date for this project is this Sunday at 2pm (GMT+9).

Skills: Engineering, Hadoop, Java, MySQL

Project ID: #4536503

About the project

4 proposals, remote project, active May 21, 2013

Awarded to:

salemkiro

I should be able to handle the different parts of the project.

$666 USD in 11 days
(0 reviews)
0.0

4 freelancers are bidding on this job, at an average of $727/hour

IMSeriousBidder

Hello Sir, I am Bing from China. I have the necessary skills for this project; please check more details in PM. Thanks, Bing

$800 USD in 5 days
(82 reviews)
7.0

softgallery

Greetings, our goal is to exceed the expectations of every client by offering outstanding designs, increased flexibility, greater value in development, thus optimizing system functionality and improving operation effi...

$666 USD in 20 days
(0 reviews)
0.0

reachusSP

We have handled many projects in the ETL field, such as Ab Initio, Informatica, Talend, and Hadoop. We have talented people who can handle projects and work as a team to complete requirements on time. Few details about our IT Firm belo...

$777 USD in 3 days
(0 reviews)
0.0