Twitter Sentiment Analysis in R
I have a JSON file which contains a sample of tweets. I would like to analyse these for sentiment in R.
I have one method of analysing sentiment, shown below, however I am open to ideas on how to improve the method.
Attached are 3 files:
JSON [login to view URL] – the JSON file containing the tweets
[login to view URL] – a lexicon of positive words
[login to view URL] – a lexicon of negative words
Please provide an R script which:
1. Imports the tweets in the JSON file into R as a dataframe
2. Identifies the tweets themselves (the $body field in the JSON file)
3. Allows sentiment analysis in the form demonstrated below, or an improved form of your choosing
4. Is usable in real time with a live JSON stream as well as with the attached static file
Additionally, please provide workspace images showing the result of the script.
Thank you very much in advance!
Sentiment analysis method currently used:
library(plyr)
library(stringr)
# function to score sentiment
score.sentiment = function(sentences, pos.words, neg.words, .progress='none')
{
  require(plyr)
  require(stringr)
  # we got a vector of sentences. plyr will handle a list or a vector as an "l" for us
  # we want a simple array of scores back, so we use "l" + "a" + "ply" = laply:
  scores = laply(sentences, function(sentence, pos.words, neg.words) {
    # clean up sentences with R's regex-driven global substitute, gsub():
    sentence = gsub('[[:punct:]]', '', sentence)
    sentence = gsub('[[:cntrl:]]', '', sentence)
    sentence = gsub('\\d+', '', sentence)
    # and convert to lower case:
    sentence = tolower(sentence)
    # split into words. str_split is in the stringr package
    word.list = str_split(sentence, '\\s+')
    # sometimes a list() is one level of hierarchy too much
    words = unlist(word.list)
    # compare our words to the dictionaries of positive & negative terms
    pos.matches = match(words, pos.words)
    neg.matches = match(words, neg.words)
    # match() returns the position of the matched term or NA
    # we just want a TRUE/FALSE:
    pos.matches = !is.na(pos.matches)
    neg.matches = !is.na(neg.matches)
    # and conveniently enough, TRUE/FALSE will be treated as 1/0 by sum():
    score = sum(pos.matches) - sum(neg.matches)
    return(score)
  }, pos.words, neg.words, .progress=.progress )
  scores.df = data.frame(score=scores, text=sentences)
  return(scores.df)
}
# load positive and negative word lexicons
pos.words.file = scan('[login to view URL]', what='character', comment.char=';')
neg.words.file = scan('[login to view URL]', what='character', comment.char=';')
pos.words = c(pos.words.file)
neg.words = c(neg.words.file)
# run function on the tweet text (the body column of the dataframe named 'tweets')
sentiment = score.sentiment(tweets$body, pos.words, neg.words, .progress='text')
# total score for all tweets
sentiment.total = sum(sentiment$score)
[login to view URL]
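As a possible "improved form", the same count-based scoring can be sketched in base R without plyr or stringr. This is only a minimal sketch: `score_tweets` is a hypothetical helper name, and the lexicons and tweets below are toy values invented for illustration, not the attached files.

```r
# Toy lexicons and tweets, invented for illustration only.
pos_demo <- c("good", "great", "love")
neg_demo <- c("bad", "awful", "hate")
demo_tweets <- c("I love this phone, it is great!",
                 "Awful service. I hate waiting 45 minutes.",
                 "Just landed in Boston")

# score_tweets() is a hypothetical helper name, not part of plyr or stringr.
score_tweets <- function(texts, pos_words, neg_words) {
  vapply(texts, function(txt) {
    # same cleaning steps as the method above: drop punctuation,
    # control characters and digits, then lower-case
    txt <- gsub("[[:punct:]]|[[:cntrl:]]|\\d+", "", txt)
    words <- strsplit(tolower(txt), "\\s+")[[1]]
    # %in% returns TRUE/FALSE directly, so match() + is.na() is not needed
    sum(words %in% pos_words) - sum(words %in% neg_words)
  }, integer(1), USE.NAMES = FALSE)
}

demo_scores <- score_tweets(demo_tweets, pos_demo, neg_demo)
data.frame(score = demo_scores, text = demo_tweets)
```

The result has one row per tweet, in the same score/text layout that the original function returns.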
I currently perform Twitter sentiment analysis on a daily basis, and I recently wrote an R script that gets the content from the JSON files, performs the sentiment analysis and returns the results in a data frame.
Furthermore, the script uses multiple cores and the package [login to view URL], so it runs very fast and scales up quite efficiently. The script is ready to process data from common social content pipelines (e.g. I usually use DataSift). However, I am happy to adapt the script for you, if you want.
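To illustrate the multi-core idea, here is a minimal sketch that spreads the scoring over worker processes. It uses the `parallel` package that ships with R, which is an assumption and not necessarily the package this bid refers to. `score_one` and the toy data are hypothetical.

```r
library(parallel)  # ships with base R

# Hypothetical toy data; real lexicons and tweets would be loaded from files.
pos_demo <- c("good", "great", "love")
neg_demo <- c("bad", "awful", "hate")
demo_tweets <- rep(c("great phone, love it", "bad battery", "no opinion"), 100)

# Score one tweet: count of positive words minus count of negative words.
score_one <- function(txt, pos_words, neg_words) {
  words <- strsplit(tolower(gsub("[[:punct:]]", "", txt)), "\\s+")[[1]]
  sum(words %in% pos_words) - sum(words %in% neg_words)
}

# mclapply() forks worker processes; forking is unavailable on Windows,
# where mc.cores must be 1 (parLapply() with a cluster is the usual alternative).
n_cores <- if (.Platform$OS.type == "windows") 1L else 2L

par_scores <- unlist(mclapply(demo_tweets, score_one,
                              pos_words = pos_demo, neg_words = neg_demo,
                              mc.cores = n_cores))
sum(par_scores)
```

Each tweet is scored independently, so the work splits cleanly across cores. Whether this pays off depends on the volume of tweets relative to the cost of starting the workers.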
I have a few slides of how I usually implement the entire solution, and I can send them to you for evaluation.
Feel free to get in touch at your convenience and visit my profile on LinkedIn: [login to view URL]
Thank you and kind regards,
Daniele.
Hi,
Many thanks for inviting me to bid. I have looked at your project needs and I can confidently tell you that I am able to deliver. I am fluent in the R language and have completed several projects here (on Freelancer) using it. Every classifier has its pitfalls, and this is the subject of ongoing research in text mining. The classifiers I can think of, such as naive Bayes (NB) or support vector machines (SVM), need training data (a list of tweets whose sentiments have been assigned by some gold-standard method, i.e. a human) before they can be applied. So I guess I might have to go with your proposed algorithm.
I will be happy to work with you. Feel free to raise any questions.
Kind regards.
I have the code for my stock market analysis and would need to test whether it works with your code. I would like to discuss this project with you in due course.
I am not a new freelancer: I have completed several projects here using R and Java. The skill "R" was only recently introduced here.
Thanks,
raiseq
I have already worked a bit with the Twitter APIs for fun, so I already have the R code to import the JSON into a dataframe. I tested it on the sample you attached and it works perfectly, converting the JSON into a dataframe with 204 rows (the maximum number of attributes inside the JSON elements) and 499 columns (the number of tweets in the sample file attached).
If a certain attribute <j> is not specified for a user <i> in the JSON file, the <i,j> entry is NA; otherwise it holds that attribute's value.
I can send you the .csv dataframe I obtained from the sample JSON you posted if you want to check it.
From this dataframe it is straightforward to extract the tweets (or any other information we want) and run the sentiment analysis with the script you posted.
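A minimal sketch of this kind of import, using the `jsonlite` package (an assumption; the bid does not say which package its code uses). The two records are made up, and the layout (one JSON object per line, tweet text in a `body` field) is also assumed, since the attached file is not shown here.

```r
library(jsonlite)

# Two made-up records standing in for the attached file. The layout (one JSON
# object per line, tweet text in "body") is an assumption.
sample_lines <- c(
  '{"id": "1", "body": "I love this phone", "actor": {"displayName": "a"}}',
  '{"id": "2", "body": "Awful service today"}'
)

# stream_in() reads newline-delimited JSON from a connection; a file() or
# url() connection would be used for real data instead of textConnection().
tweets <- stream_in(textConnection(sample_lines), verbose = FALSE)

# flatten() turns nested objects into ordinary columns such as
# actor.displayName; attributes missing from a record become NA.
tweets <- flatten(tweets)

# the tweet text itself
tweet_text <- tweets$body
```

Note that this sketch puts one tweet per row and one attribute per column, which is the transpose of the layout described above. A character vector like `tweet_text` can be passed straight to the scoring function.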
Running it in real time is a bit tricky since, as you know, R is an interpreted language and it is not possible to create an .exe file from an R script. What I suggest is to create a .bat file that runs the R script and then, from the scheduled tasks menu in Windows, schedule the .bat file to launch every <x> minutes/hours/days.
I can definitely write the .bat file (it is literally 3 lines of code), and you can then schedule the event whenever you want from your scheduled tasks manager.
(I'm speaking about Windows/.bat because I'm a Windows user. The same should be achievable on a Mac with launchd/command.)
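The R side of such a scheduled job might look like the sketch below: a script that scores one batch and appends the results to a CSV each time the scheduler launches it. `process_batch`, the script name, the file names and the toy lexicons are all hypothetical. The input is assumed to be newline-delimited JSON with the tweet text in `body`, read with the `jsonlite` package.

```r
library(jsonlite)

# process_batch() is a hypothetical helper: it reads newline-delimited JSON
# from in_path, scores each "body" with a simple lexicon count, and appends
# the scored rows to the CSV at out_path.
process_batch <- function(in_path, out_path, pos_words, neg_words) {
  tweets <- stream_in(file(in_path), verbose = FALSE)
  scores <- vapply(tweets$body, function(txt) {
    words <- strsplit(tolower(gsub("[[:punct:]]", "", txt)), "\\s+")[[1]]
    sum(words %in% pos_words) - sum(words %in% neg_words)
  }, integer(1), USE.NAMES = FALSE)
  result <- data.frame(scored_at = format(Sys.time()), score = scores,
                       text = tweets$body)
  # write the header only when the output file is first created
  first_write <- !file.exists(out_path)
  write.table(result, out_path, sep = ",", row.names = FALSE,
              col.names = first_write, append = !first_write,
              qmethod = "double")
  invisible(result)
}

# When launched as: Rscript score_batch.R tweets.json scores.csv
# (script and file names are placeholders), run once and exit.
args <- commandArgs(trailingOnly = TRUE)
if (length(args) == 2) {
  process_batch(args[1], args[2],
                pos_words = c("good", "great", "love"),   # toy lexicons
                neg_words = c("bad", "awful", "hate"))
}
```

A scheduler entry would then only need to call `Rscript` with the two paths. This gives periodic batches rather than a true live stream, which matches the approach suggested in this bid.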
I have done Twitter sentiment analysis myself. I created an app on Twitter and then, in R, used the twitteR library together with ROAuth to connect R to Twitter. I then extracted a large amount of data into dataframes and saved it to Excel files. After that, a few lines of R code and the sentiment analysis is done!