WikiData is a project that attempts to collect data about everything in our world. Basically, it will contain information about people, cities, countries, foods, atoms, stars, everything.
You can download their entire database from here in JSON format:
[login to view URL]:Database_download#JSON_dumps_(recommended)
All entries in this database are based on a Q code. The Q code is a unique index number for the specific item. For example, [login to view URL] is the entry for a famous person.
We are interested in WikiData's data on people, for example that of [login to view URL]
Your job is to write a script, in any language you want, that analyzes the downloaded WikiData database file(s) and outputs an SQL insert file.
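As a rough sketch of how the file could be read: the JSON dump is one very large JSON array with one entity per line, so it can be processed line by line instead of being loaded into memory at once. The function names `open_dump` and `iter_entities` are made up for this illustration, and the choice of compression handling is an assumption about which dump file was downloaded.

```python
import bz2
import gzip
import json


def open_dump(path):
    """Open a plain, gzip- or bzip2-compressed dump file as UTF-8 text."""
    if path.endswith(".bz2"):
        return bz2.open(path, "rt", encoding="utf-8")
    if path.endswith(".gz"):
        return gzip.open(path, "rt", encoding="utf-8")
    return open(path, "rt", encoding="utf-8")


def iter_entities(lines):
    """Yield one entity dict per dump line, skipping the array brackets."""
    for line in lines:
        # Each entity line ends with a comma, except the last one.
        line = line.strip().rstrip(",")
        if line in ("", "[", "]"):
            continue
        yield json.loads(line)
```

Usage would be along the lines of `for entity in iter_entities(open_dump(path)): ...`, which keeps only one entry in memory at a time.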
The script must extract 6 fields from each person entry:
1) Person's full name,
2) Person's given name (= first name),
3) Person's family name,
4) Person's gender,
5) Person's country of citizenship,
6) Person's native language.
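One possible way to pull these six fields out of an entry is sketched below. It assumes the usual WikiData property IDs (P735 given name, P734 family name, P21 sex or gender, P27 country of citizenship, P103 native language) and takes the English label as the full name, which is an assumption rather than something this brief specifies. The helper names `first_item_id` and `extract_raw` are invented for the example. Note that the five property values come back as Q codes, not as text, so they still need to be resolved to labels afterwards.

```python
# Column name -> WikiData property ID (assumed mapping for this sketch).
PROPS = {
    "first_name": "P735",   # given name
    "family_name": "P734",  # family name
    "gender": "P21",        # sex or gender
    "country": "P27",       # country of citizenship
    "language": "P103",     # native language
}


def first_item_id(entity, prop):
    """Return the Q code of the first real value of a property, or None."""
    for statement in entity.get("claims", {}).get(prop, []):
        snak = statement.get("mainsnak", {})
        if snak.get("snaktype") != "value":
            continue  # "novalue" / "somevalue" statements carry no Q code
        value = snak.get("datavalue", {}).get("value")
        if isinstance(value, dict) and "id" in value:
            return value["id"]
    return None


def extract_raw(entity, lang="en"):
    """Collect the Q code, the label and the raw Q-code values of one entry."""
    label = entity.get("labels", {}).get(lang, {}).get("value")
    row = {"q_code": entity.get("id"), "full_name": label}
    for column, prop in PROPS.items():
        row[column] = first_item_id(entity, prop)
    return row
```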
So, for the same person [login to view URL], these values would be:
1) Full name = Manuel José Bonnet Locarno
2) Given name = Manuel
3) Family name = Bonnet
4) Gender = M
5) Country of citizenship = Colombia
6) Native language = Spanish
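Values such as "Manuel", "Colombia" or "Spanish" are not stored inside the person's entry itself. The entry only holds the Q codes of the given-name, country and language items, so a lookup of Q code to label is needed. The sketch below assumes a two-pass approach (first collect the Q codes that are referenced, then collect their labels), and it assumes "F" as the counterpart of "M" for the female item. `collect_labels` and `resolve` are hypothetical helper names.

```python
# Q6581097 = male, Q6581072 = female; the "F" code is an assumption.
GENDER_CODES = {"Q6581097": "M", "Q6581072": "F"}
LABEL_COLUMNS = ("first_name", "family_name", "country", "language")


def collect_labels(entities, wanted_ids, lang="en"):
    """Build a {Q code: label} lookup for the Q codes we care about."""
    labels = {}
    for entity in entities:
        qid = entity.get("id")
        if qid in wanted_ids:
            label = entity.get("labels", {}).get(lang, {}).get("value")
            if label:
                labels[qid] = label
    return labels


def resolve(raw_row, labels):
    """Replace Q codes in a raw row by text; unresolved values become None."""
    row = dict(raw_row)
    row["gender"] = GENDER_CODES.get(raw_row.get("gender"))
    for column in LABEL_COLUMNS:
        row[column] = labels.get(raw_row.get(column))
    return row
```

A row whose values cannot all be resolved ends up with `None` in some column, which makes it easy to skip later.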
And this data would be added to the output SQL file as follows:
INSERT IGNORE INTO wikidata (q_code, full_name, first_name, family_name, gender, country, language) VALUES ("Q5993357", "Manuel José Bonnet Locarno", "Manuel", "Bonnet", "M", "Colombia", "Spanish");
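A minimal sketch of how such a line could be generated is shown below. Names can contain quotes or backslashes, so the values are escaped before being wrapped in double quotes. The escaping assumes a MySQL-style server with default settings (backslash escapes enabled, double quotes accepted as string delimiters), and `sql_quote` and `insert_line` are names chosen for this example only.

```python
COLUMNS = ("q_code", "full_name", "first_name", "family_name",
           "gender", "country", "language")


def sql_quote(value):
    """Wrap a string in double quotes, escaping backslashes and quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return '"' + escaped + '"'


def insert_line(row):
    """Build one INSERT IGNORE statement from a dict of the seven columns."""
    values = ", ".join(sql_quote(row[column]) for column in COLUMNS)
    return ("INSERT IGNORE INTO wikidata (" + ", ".join(COLUMNS)
            + ") VALUES (" + values + ");")
```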
The next person found in the WikiData database file(s) would generate a new line in the output SQL file, and so on.
Please have the script show some kind of progress indication, for example the number of rows or entries in the database and the index of the row currently being analyzed, so that the progress is visible while the script runs.
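One simple way to do this, sketched under the assumption that the total number of entries may not be known in advance, is to wrap the entry iterator in a generator that prints a counter every so many entries. `with_progress` and its parameters are hypothetical names. If a total is known (for example from a quick line count), a percentage is shown as well.

```python
import sys


def with_progress(items, every=100000, total=None, out=sys.stderr):
    """Yield items unchanged while writing a running count to `out`."""
    count = 0
    for count, item in enumerate(items, start=1):
        if count % every == 0:
            if total:
                percent = 100 * count / total
                out.write(f"\r{count:,} / {total:,} entries ({percent:.1f}%)")
            else:
                out.write(f"\r{count:,} entries processed")
            out.flush()
        yield item
    out.write(f"\r{count:,} entries processed, done\n")
```

Writing to standard error keeps the progress text out of the SQL output if that is written to standard output.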
The script must ignore all entries other than persons. Also, if the person's data is missing any of the 6 data fields (full name, given name, family name, gender, country of citizenship, or native language), skip that person.
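The two checks could look roughly like this. The sketch assumes that a person is an entry whose "instance of" property (P31) points to the item "human" (Q5), and that a row is a dict with the seven output columns in which a missing value is `None` or an empty string. `is_human` and `is_complete` are illustrative names.

```python
REQUIRED = ("q_code", "full_name", "first_name", "family_name",
            "gender", "country", "language")


def is_human(entity):
    """True if any "instance of" (P31) statement points to human (Q5)."""
    for statement in entity.get("claims", {}).get("P31", []):
        value = statement.get("mainsnak", {}).get("datavalue", {}).get("value")
        if isinstance(value, dict) and value.get("id") == "Q5":
            return True
    return False


def is_complete(row):
    """True only if every required column has a non-empty value."""
    return all(row.get(column) for column in REQUIRED)
```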
The purpose of this task is to have a script that, when run, generates a huge SQL file which will insert the person data (name / gender / country / language) into a database.
In your bid, please state what language you would use to write this. A scripting language such as Perl, PHP or Python would be preferred.
39 freelancers are bidding an average of $184 for this job
I have worked with huge (> 100 GB) files before, which is why I'm sure you'll be impressed with my work. I can provide you with a Perl or Python script that will parse the JSON and generate the SQL file.
I have read your brief and feel very confident about it. I would simply use PHP or Python to do the job. If you award me the project, I promise a faithful result. Kind regards