Part of Speech tagging, noun phrases, sentences and tokenization for natural language processing.

Gaurav Yadav

8 years ago

Natural language processing using Python and TextBlob. In this article we will see how to get part-of-speech tags, noun phrases, sentences and tokens from a piece of text. We will use TextBlob, which we used earlier to build classifiers. You can find the previous article below.

Naive Bayesian Text Classifier using Python and TextBlob

Let's start by installing the required libraries.

We will do this in a separate Python environment, for which we need virtualenv. To install virtualenv, run:

sudo pip install virtualenv

Now that virtualenv is installed, the next step is to create a virtual environment for our little project. Run the command below to create it.

virtualenv sent

sent is the name of the environment. The command above creates the environment and installs setuptools in it. Next we need to activate the environment. To do so, run the command below.

. sent/bin/activate

To install TextBlob, use the commands below.

pip install textblob
python -m textblob.download_corpora

The second command downloads the data files (NLTK corpora) that TextBlob relies on.

Now that you have installed the required libraries, let's look at the scripts needed to get each of these pieces.

Part of speech Tagging

from textblob import TextBlob
text = TextBlob("Python is a high-level, general-purpose programming language. I am loving it.")
print(text.tags)

That's it. You will get the POS tags as a list of (word, tag) tuples. For the first sentence the output looks like this:

[('Python', 'NNP'), ('is', 'VBZ'), ('a', 'DT'), ('high-level', 'JJ'), ('general-purpose', 'JJ'), ('programming', 'NN'), ('language', 'NN')]
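Once you have the tag list, plain Python is enough to filter it. The sketch below copies the tuples shown above into a literal, so it runs without TextBlob, and keeps only the words whose tag starts with a given prefix (NN covers the noun tags, JJ the adjective tags). The helper name words_with_tag_prefix is made up for this illustration and is not part of TextBlob.

```python
# Tags copied from the output shown above, so this runs without TextBlob.
tags = [('Python', 'NNP'), ('is', 'VBZ'), ('a', 'DT'), ('high-level', 'JJ'),
        ('general-purpose', 'JJ'), ('programming', 'NN'), ('language', 'NN')]

def words_with_tag_prefix(tagged, prefix):
    """Return the words whose POS tag starts with the given prefix."""
    # Hypothetical helper for illustration; not a TextBlob function.
    return [word for word, tag in tagged if tag.startswith(prefix)]

nouns = words_with_tag_prefix(tags, 'NN')       # NN, NNP, NNS, ...
adjectives = words_with_tag_prefix(tags, 'JJ')  # JJ, JJR, JJS
print(nouns)
print(adjectives)
```

With a real blob you could pass text.tags in place of the literal list.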

Noun Phrase Extraction

Use the attribute below to get the noun phrases.

text.noun_phrases

You will get a result like the one below.

WordList(['python'])

Tokenization

text.words

will give the list of all the words.

text.sentences

will give the list of all the sentences.
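To get a feel for what tokenization does, here is a rough sketch that uses only the standard library. It is not how TextBlob tokenizes (TextBlob builds on NLTK); the regular expressions and the function names split_sentences and split_words are assumptions chosen for illustration, and they will miss cases such as abbreviations or a missing space after a period.

```python
import re

def split_sentences(text):
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r'(?<=[.!?])\s+', text.strip())
    return [p for p in parts if p]

def split_words(text):
    """Naive word tokenizer: alphanumeric runs, optionally joined by '-' or "'"."""
    return re.findall(r"[A-Za-z0-9]+(?:[-'][A-Za-z0-9]+)*", text)

sample = "Python is a high-level, general-purpose programming language. I am loving it."
sentences = split_sentences(sample)
words = split_words(sample)
print(sentences)
print(words)
```

Punctuation is dropped from the word list, while hyphenated words such as high-level stay in one piece.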

Spelling Correction

b = TextBlob("I havv goood speling!")
print(b.correct())

This will attempt to correct the spelling.
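The TextBlob documentation says its spelling correction is based on Peter Norvig's "How to Write a Spelling Corrector". The sketch below shows the core idea of that approach in a much reduced form: generate every string one edit away from the input and pick the known word with the highest frequency. The tiny vocabulary and its counts are invented for this example, and correct_word is a hypothetical name, so treat this as an illustration of the technique rather than TextBlob's actual code.

```python
from collections import Counter

# Toy vocabulary with made-up frequencies (an assumption for illustration;
# a real corrector learns its counts from a large corpus).
WORD_COUNTS = Counter({'i': 50, 'have': 30, 'good': 20, 'spelling': 5, 'gold': 3})
LETTERS = 'abcdefghijklmnopqrstuvwxyz'

def edits1(word):
    """All strings one delete, transpose, replace or insert away from word."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [l + r[1:] for l, r in splits if r]
    transposes = [l + r[1] + r[0] + r[2:] for l, r in splits if len(r) > 1]
    replaces = [l + c + r[1:] for l, r in splits if r for c in LETTERS]
    inserts = [l + c + r for l, r in splits for c in LETTERS]
    return set(deletes + transposes + replaces + inserts)

def known(words):
    """Keep only the candidates that appear in the vocabulary."""
    return {w for w in words if w in WORD_COUNTS}

def correct_word(word):
    """Return the most frequent known word within one edit, else the word itself."""
    word = word.lower()
    candidates = known([word]) or known(edits1(word)) or {word}
    return max(candidates, key=lambda w: WORD_COUNTS[w])

print([correct_word(w) for w in ['I', 'havv', 'goood', 'speling']])
```

Words that are not within one edit of anything in the vocabulary are returned unchanged, which is one reason this kind of corrector is only approximate.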

Word frequencies

b = TextBlob("One plus One is two")
print(b.word_counts['one'])

The result will be 2. Note that word_counts stores its keys in lowercase, so look up 'one' rather than 'One'.
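If you want to see what a case-insensitive word count amounts to, it can be sketched with the standard library alone. The function name count_words and the regular expression are assumptions made for this example; the point is only that the text is lowercased before counting, so in this sketch lookups must use lowercase keys.

```python
import re
from collections import Counter

def count_words(text):
    """Count words case-insensitively by lowercasing before counting."""
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))

counts = count_words("One plus One is two")
print(counts['one'])  # 2
print(counts['One'])  # 0, because every key is stored in lowercase
```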

Translate to another language

en_blob = TextBlob(u'Simple is better than complex.')
en_blob.translate(to='es')

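The result will be TextBlob("Simple es mejor que complejo."). Note that translate() calls the Google Translate API, so it needs an internet connection. It was deprecated and later removed in newer TextBlob releases, so this example only works with older versions.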

All of the above information is taken from https://textblob.readthedocs.io/en/dev/quickstart.html#get-word-and-noun-phrase-frequencies; read it for more details.