Text classifiers are systems that sort texts into different classes. In this article we are going to build one such text classifier using TextBlob and Python. If you want to read more about the naive Bayes theorem, read it here.
Naive Bayesian text classifier using TextBlob and Python
For this we will be using textblob, a library for simple text processing. It provides a simple API for diving into common natural language processing (NLP) tasks such as part-of-speech tagging, noun phrase extraction, sentiment analysis, classification, translation, and more.
We will do this in a separate Python environment, for which we need virtualenv. To install virtualenv, run:
sudo pip install virtualenv
Now that virtualenv is installed, the next step is to create a virtual environment for our little project. Run the command below to create it.
virtualenv sent
Here sent is the name of the environment. The above command creates the environment and installs setuptools in it. Next we need to activate the environment. For this, run the command below.
. sent/bin/activate
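To confirm that the environment is really active before installing anything, a small check like the one below can help. It is only a sketch: the helper name in_virtualenv is made up for this article, and the check relies on the assumption that older virtualenv releases set sys.real_prefix while newer ones (and the built-in venv module) make sys.base_prefix differ from sys.prefix.

```python
import sys


def in_virtualenv():
    """Return True when the interpreter appears to run inside a virtual environment."""
    # Older virtualenv releases expose sys.real_prefix; newer virtualenv and
    # the standard venv module instead make sys.base_prefix differ from sys.prefix.
    base = getattr(sys, "real_prefix", None) or getattr(sys, "base_prefix", sys.prefix)
    return base != sys.prefix


print("inside a virtual environment:", in_virtualenv())
print("interpreter prefix:", sys.prefix)
```

If the first line prints False, the environment was probably not activated in the current shell.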
Now install textblob using the commands below.
pip install textblob
python -m textblob.download_corpora
The second command downloads the data files (corpora) that textblob and nltk rely on. Now look at the script below, which does the sentiment classification for you.
from textblob.classifiers import NaiveBayesClassifier

# Labelled training sentences: (text, label)
train = [
    ('What an amazing weather.', 'pos'),
    ('this is an amazing idea!', 'pos'),
    ('I feel very good about these ideas.', 'pos'),
    ('this is my best performance.', 'pos'),
    ("what an awesome view", 'pos'),
    ('I do not like this place', 'neg'),
    ('I am tired of this stuff.', 'neg'),
    ("I can't deal with all this tension", 'neg'),
    ('he is my sworn enemy!', 'neg'),
    ('my friends is horrible.', 'neg')
]

# Held-out sentences, used only to measure accuracy
test = [
    ('the food was great.', 'pos'),
    ('I do not want to live anymore', 'neg'),
    ("I ain't feeling dandy today.", 'neg'),
    ("I feel amazing!", 'pos'),
    ('Ramesh is a friend of mine.', 'pos'),
    ("I can't believe I'm doing this.", 'neg')
]

cl = NaiveBayesClassifier(train)
print(cl.classify("This is an amazing library!"))

# Let's test the accuracy of the classifier
print(cl.accuracy(test))
You now have a classifier, cl, which is based on the Naive Bayes algorithm. Use this classifier to get your text classified. Keep in mind that text classifiers generally need a large amount of training data, and the data here is very small. The script also prints the accuracy of the classifier on the test sentences.
Training time also grows with the amount of data, so the usual approach is to create the classifier object once and keep it in memory so it can be reused whenever required. You can also update the classifier with new data, as below.
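To give a feel for what "based on Naive Bayes" means, here is a from-scratch sketch in plain Python. It is not how textblob does it internally: textblob delegates to NLTK's classifier, which works on "contains word" features, whereas this simplified version counts words per label and applies add-one smoothing. The function names train_nb and classify_nb and the tiny data set are invented for illustration.

```python
import math
from collections import Counter, defaultdict


def train_nb(examples):
    """Count documents per label and word occurrences per label."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in examples:
        label_counts[label] += 1
        for word in text.lower().split():
            word_counts[label][word] += 1
            vocab.add(word)
    return label_counts, word_counts, vocab


def classify_nb(model, text):
    """Return the label with the highest log-probability score."""
    label_counts, word_counts, vocab = model
    total_docs = sum(label_counts.values())
    scores = {}
    for label in label_counts:
        # Start from the prior: how common the label is in the training data
        score = math.log(label_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            if word not in vocab:
                continue  # words never seen in training carry no information
            # Add-one smoothing keeps unseen word/label pairs from zeroing the score
            score += math.log((word_counts[label][word] + 1) / (total_words + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)


toy_train = [
    ("amazing idea", "pos"),
    ("awesome view", "pos"),
    ("horrible place", "neg"),
    ("tired of this", "neg"),
]
model = train_nb(toy_train)
print(classify_nb(model, "amazing view"))
print(classify_nb(model, "horrible place"))
```

The "naive" part is the assumption that words are independent of each other given the label, which is why the per-word log-probabilities can simply be added up.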
How to update the classifier
new_data = [
    ('She is my best friend.', 'pos'),
    ("I'm happy to have a new friend.", 'pos'),
    ("Stay thirsty, my friend.", 'pos'),
    ("He ain't from around here.", 'neg')
]

# Add the new examples to the existing classifier, then re-check accuracy
cl.update(new_data)
print(cl.accuracy(test))
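Since training takes longer as the data grows, one possible way to avoid retraining on every run is to save the trained object to disk with Python's pickle module. The sketch below is written under the assumption that the classifier object can be pickled. The helpers save_model and load_model and the file name are made up for this example, and a plain dictionary stands in for cl so the snippet runs on its own.

```python
import os
import pickle
import tempfile


def save_model(model, path):
    """Write any picklable object (for example a trained classifier) to disk."""
    with open(path, "wb") as fh:
        pickle.dump(model, fh)


def load_model(path):
    """Read the object back. Only unpickle files you created yourself."""
    with open(path, "rb") as fh:
        return pickle.load(fh)


# Stand-in object so the sketch runs without a trained classifier;
# in the article's script you would pass cl instead.
stand_in = {"labels": ["pos", "neg"], "note": "placeholder for cl"}
path = os.path.join(tempfile.gettempdir(), "sent_classifier.pickle")

save_model(stand_in, path)
restored = load_model(path)
print(restored == stand_in)
```

With a real classifier, the call would look like save_model(cl, path) after training and cl = load_model(path) on later runs.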
It is as simple as that. Building classifiers like this is really easy with the libraries available nowadays; it is just that we often don’t know about these libraries, and hence we don’t build such things.
COMMENTS
Is it possible to classify text by topic? I mean tagging. Or is textblob not suitable for this task?
If you use the naive Bayesian classifier in textblob, and you have a limited number of topics and a dataset to train it on, you can achieve this.