Clustering methods are more
suitable than traditional approaches for sentiment analysis, since these
datasets contain people's opinions and judgements. Sentiment analysis methods can be
classified mainly into lexicon-based methods, machine learning-based methods,
and hybrid methods, which combine the two. The lexicon-based
approach relies on finding the right lexicon, one that contains the opinion words
needed for text analysis. This can be done using a dictionary-based approach or a
corpus-based approach. The dictionary-based approach finds opinion “seed”
words and then searches a dictionary for their synonyms and antonyms. To
indicate the polarity of a document, lexicon-based methods make use of
a predefined lexicon: the polarity of the document is determined solely by the polarity
of the microphrases which compose it. The corpus-based approach begins with a
list of opinion words and then finds other opinion words in a large corpus,
which helps in finding opinion words with context-specific orientations. The lexicon-based
approach offers several advantages over the machine learning approach: it does not
require labeled data; it is not sensitive to the quality and quantity of
training datasets; no classifier needs to be prepared in advance; less effort is
required when documents are labeled by humans; the decisions made by the classifier are
comprehensible; and the learning procedure is easy to understand.
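The lexicon-based polarity computation described above can be sketched in a few lines. This is a minimal illustration, not a reference implementation: the `LEXICON` entries and the `document_polarity` function are hypothetical, and single words stand in for the microphrases mentioned in the text.

```python
import re

# Toy polarity lexicon (hypothetical entries, for illustration only).
LEXICON = {"good": 1, "great": 1, "love": 1, "bad": -1, "poor": -1, "hate": -1}


def document_polarity(text, lexicon=LEXICON):
    """Label a document by summing the polarities of the lexicon words it contains."""
    tokens = re.findall(r"[a-z']+", text.lower())
    # Words missing from the lexicon contribute nothing to the score.
    score = sum(lexicon.get(token, 0) for token in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(document_polarity("The screen is great but the battery is bad and the speaker is poor"))
```

No training step is involved: the label follows directly from the lexicon entries, which is why the decision is easy to inspect.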
This approach does have
shortcomings. A word that expresses a positive emotion in one
domain may express a negative emotion in another. Performance therefore
drops when a lexicon is applied across different domains and different languages.
If enough lexicon entries are available, the accuracy obtained is high, so the
predefined lexicon should be large. The lexicon method also does not perform well
in the presence of emoticons, shorthand text, or
abbreviations. More importance must be
given to finding sentiment words in the same domain. The lexicon-based approach can
be further classified as dictionary-based and corpus-based.
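The domain problem described above can be made concrete with a small sketch. All entries below are hypothetical: a general lexicon is consulted unless the current domain supplies its own orientation for the word, and tokens absent from every lexicon (such as emoticons) simply score zero.

```python
# General-purpose lexicon plus per-domain overrides (all entries are hypothetical).
GENERAL = {"unpredictable": -1, "fast": 1}
DOMAIN_OVERRIDES = {
    "movies": {"unpredictable": 1},   # an unpredictable plot is usually praise
    "cars": {"unpredictable": -1},    # unpredictable steering is a complaint
}


def word_polarity(word, domain=None):
    """Return the domain-specific polarity if one exists, else the general one."""
    overrides = DOMAIN_OVERRIDES.get(domain, {})
    return overrides.get(word, GENERAL.get(word, 0))


print(word_polarity("unpredictable", "movies"))
print(word_polarity("unpredictable", "cars"))
print(word_polarity(":)"))  # emoticon not covered by the lexicon
```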
The dictionary-based approach
first finds the opinion words in the review text, then finds their
synonyms and antonyms in the dictionary. Initially, we gather a set of opinion
words with known orientations or polarities. We use these words as a reference
and grow the set by listing their synonyms and antonyms from a
thesaurus. The words found in this way are added to the original set,
and the next iteration gathers further words using
the newly added ones. This process is repeated until no new words are found. After
the process terminates, a manual check can be performed to remove errors.
The dictionary-based approach has a major disadvantage:
it cannot find opinion words with domain- and context-specific orientations.
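The iterative seed expansion described above can be sketched as follows. The `SYNONYMS` and `ANTONYMS` tables are a hypothetical toy thesaurus standing in for a real dictionary, and `expand_seeds` is an illustrative name. The sketch assumes that synonyms inherit the polarity of the word they come from and that antonyms receive the opposite polarity.

```python
# Toy thesaurus (hypothetical entries standing in for a real dictionary).
SYNONYMS = {
    "good": {"fine", "nice"},
    "nice": {"pleasant"},
    "bad": {"poor"},
    "poor": {"inferior"},
}
ANTONYMS = {
    "good": {"bad"},
    "nice": {"nasty"},
}


def expand_seeds(seeds):
    """Grow a {word: polarity} map until an iteration adds no new words."""
    lexicon = dict(seeds)
    frontier = set(seeds)  # words added in the previous iteration
    while frontier:
        new_words = set()
        for word in frontier:
            polarity = lexicon[word]
            for synonym in SYNONYMS.get(word, ()):
                if synonym not in lexicon:
                    lexicon[synonym] = polarity    # same orientation
                    new_words.add(synonym)
            for antonym in ANTONYMS.get(word, ()):
                if antonym not in lexicon:
                    lexicon[antonym] = -polarity   # opposite orientation
                    new_words.add(antonym)
        frontier = new_words
    return lexicon


lexicon = expand_seeds({"good": 1})
print(sorted(lexicon.items()))
```

Because the thesaurus carries no domain information, every word receives a single fixed orientation, which is the limitation noted above. The resulting list would still need the manual check mentioned in the text.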
The corpus-based approach is not as effective
as the dictionary-based approach, since building a corpus large enough to cover all words
is hard. However, this method can be combined with other methods to obtain good
performance. Its major advantage is that it helps to find domain- and
context-specific opinion words and their orientations using a domain corpus.
We begin with a
list of opinion words and then find other opinion words in a large corpus. The
method relies on syntactic patterns, or patterns of co-occurrence, together with
a seed list of opinion words to find other opinion words in the corpus.
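One possible way to illustrate the corpus-based idea is with conjunction patterns. The text does not say which syntactic patterns are used, so the rule below is an assumption made for illustration: words joined by "and" are taken to share an orientation, and words joined by "but" to have opposite orientations. The corpus, the seed, and the name `expand_from_corpus` are all hypothetical.

```python
import re

# Matches "<word> and <word>" or "<word> but <word>".
PATTERN = re.compile(r"\b([a-z]+) (and|but) ([a-z]+)\b")


def expand_from_corpus(sentences, seeds):
    """Propagate seed polarities through conjunction patterns until nothing changes."""
    lexicon = dict(seeds)
    changed = True
    while changed:
        changed = False
        for sentence in sentences:
            for left, conj, right in PATTERN.findall(sentence.lower()):
                sign = 1 if conj == "and" else -1  # "but" flips the orientation
                if left in lexicon and right not in lexicon:
                    lexicon[right] = sign * lexicon[left]
                    changed = True
                elif right in lexicon and left not in lexicon:
                    lexicon[left] = sign * lexicon[right]
                    changed = True
    return lexicon


corpus = [
    "The door was creaky and drafty.",
    "The bed was spacious but creaky.",
    "The room was clean and spacious.",
]
print(expand_from_corpus(corpus, {"clean": 1}))
```

This toy pattern matches any pair of words around the conjunction. A practical system would likely restrict matches to adjectives, for example with a part-of-speech tagger, and would need a far larger domain corpus.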
The corpus-based approach is performed using either statistical or semantic