Natural Language Processing using Python and TextBlob
In this article we will see how to get part-of-speech tags, noun phrases, sentences, and tokens from a piece of text. We will use TextBlob, which we used before to build classifiers. You can find the previous tutorial below.
Naive Bayesian Text Classifier using Python and TextBlob
Let's start by installing the required libraries.
We will do this in a separate Python environment, for which we need virtualenv. Install it with the command below.
sudo pip install virtualenv
Now that virtualenv is installed, the next step is to create a virtual environment for our little project. Run the command below to create it.
virtualenv sent
sent is the name of the environment. The command above creates the environment and installs setuptools in it. Next we need to activate the environment. To do so, run the command below.
. sent/bin/activate
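If you want to confirm that the environment is really active, here is a small sketch that uses only the standard library. The function name in_virtualenv is my own and not part of virtualenv; the check relies on sys.prefix differing from sys.base_prefix inside an environment (older virtualenv releases set sys.real_prefix instead).

```python
import sys


def in_virtualenv():
    """Return True if the interpreter appears to run inside a virtual environment."""
    # Older virtualenv releases set sys.real_prefix on the interpreter.
    if hasattr(sys, "real_prefix"):
        return True
    # Otherwise, compare the environment prefix with the base interpreter prefix.
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


print("Inside a virtual environment:", in_virtualenv())
```

Run it with the python from the activated shell; it should print True there and False when run with the system interpreter.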
To install TextBlob, run the commands below.
pip install textblob
python -m textblob.download_corpora
The second command downloads the data files (NLTK corpora) that TextBlob relies on for its functionality.
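Before moving on, you may want to check that the package is visible from the environment. The sketch below uses importlib from the standard library; the helper name is_installed is hypothetical, and it only checks that the package can be found, not that the corpora were downloaded.

```python
import importlib.util


def is_installed(package_name):
    """Return True if a top-level package can be found in the current environment."""
    # find_spec locates the package without importing it.
    return importlib.util.find_spec(package_name) is not None


print("textblob installed:", is_installed("textblob"))
```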
Now that you have installed the required libraries, let's look at the scripts needed for each task.
Part of speech Tagging
from textblob import TextBlob
text = TextBlob("Python is a high-level, general-purpose programming language.I am loving it.")
print(text.tags)
That's it. You will get the POS tags as a list of (word, tag) tuples, as shown below.
[('Python', 'NNP'), ('is', 'VBZ'), ('a', 'DT'), ('high-level', 'JJ'), ('general-purpose', 'JJ'), ('programming', 'NN'), ('language', 'NN')]
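Once you have the tags, a common next step is to pick out words by their tag. The sketch below works on the list shown above, pasted in as a literal so it runs without TextBlob; the helper name words_with_tag_prefix is my own. It assumes the Penn Treebank convention, where noun tags start with NN and adjective tags start with JJ.

```python
# The tagged output from above, copied in as plain data.
tags = [('Python', 'NNP'), ('is', 'VBZ'), ('a', 'DT'), ('high-level', 'JJ'),
        ('general-purpose', 'JJ'), ('programming', 'NN'), ('language', 'NN')]


def words_with_tag_prefix(tagged, prefix):
    """Return the words whose POS tag starts with the given prefix."""
    return [word for word, tag in tagged if tag.startswith(prefix)]


nouns = words_with_tag_prefix(tags, "NN")        # NN, NNS, NNP, NNPS
adjectives = words_with_tag_prefix(tags, "JJ")   # JJ, JJR, JJS

print(nouns)       # ['Python', 'programming', 'language']
print(adjectives)  # ['high-level', 'general-purpose']
```

With a real blob you would pass text.tags instead of the literal list.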
Noun Phrase Extraction
Just use the below attribute to get the noun phrases.
text.noun_phrases
You will get the below results.
WordList(['python'])
Tokenization
text.words
will give the list of all the words.
text.sentences
will give the list of all the sentences.
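To get a feel for what tokenization does, here is a rough approximation built on regular expressions. This is not TextBlob's tokenizer (TextBlob delegates to NLTK, which handles many more edge cases); the function names rough_words and rough_sentences and the sample text are made up for illustration.

```python
import re


def rough_words(text):
    """Split text into words, keeping hyphens and apostrophes inside a word."""
    return re.findall(r"[A-Za-z0-9]+(?:[-'][A-Za-z0-9]+)*", text)


def rough_sentences(text):
    """Split text after '.', '!' or '?' when followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


sample = "Python is a high-level language. I am loving it."
print(rough_words(sample))
print(rough_sentences(sample))
```

A naive splitter like this breaks on abbreviations such as "e.g." or on missing spaces after a period, which is why it is worth using a real tokenizer.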
Spelling Correction
b = TextBlob("I havv goood speling!")
print(b.correct())
This will attempt to correct the spelling. The TextBlob quickstart shows the output as I have good spelling!
Word frequencies
b = TextBlob("One plus One is two")
print(b.word_counts['one'])
The result will be 2. Note that word_counts stores its keys in lowercase, so look up 'one' rather than 'One'.
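TextBlob's word_counts lowercases words before counting them, so lookups should use lowercase keys. The sketch below mimics that behavior with collections.Counter so you can see the effect without TextBlob; the function name lowercase_word_counts and the simple regex are my own simplification, not TextBlob's implementation.

```python
import re
from collections import Counter


def lowercase_word_counts(text):
    """Count words case-insensitively by lowercasing the text first."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return Counter(words)


counts = lowercase_word_counts("One plus One is two")
print(counts["one"])  # 2
print(counts["One"])  # 0, because every key is stored in lowercase
```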
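Translate to another language
en_blob = TextBlob(u'Simple is better than complex.')
en_blob.translate(to='es')
The result will be TextBlob("Simple es mejor que complejo."). Note that translate() relies on the Google Translate service and has been deprecated in newer TextBlob releases, so it may not work with a recent install.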
All of the above information is taken from https://textblob.readthedocs.io/en/dev/quickstart.html#get-word-and-noun-phrase-frequencies. You can read it for more details.