
The quality of a Language models depends on two building blocks – the model architecture and the training data.

She used SpaCy’s ‘en_core_web_lg’ language model. My partner created a visualisation identifying the POS per paragraph for Martin Luther King, Jr.’s celebrated “I HAVE A DREAM. SpaCy offers four models for English POS tagging. In their opinion, at sentence level, the accuracy is much lower than the claimed 97%. Some scholars, however, have argued that the per token accuracy is not the best way to estimate the accuracy of the POS engine. Most good POS taggers report accuracy numbers of 97% and above on a per word (aka token) basis. There are many NLP tasks based on POS tags. POS tagging is a fundamental problem in NLP. For example, the work left can be a verb when used as ‘he left the room’ or a noun when used as ‘ left of the room’. A word can have multiple POS tags the goal is to find the right tag given the current context. In natural language processing (NLP), there is a similar task called POS tagging, where the aim is to tag each word in a sentence to the correct part of speech (POS).

Understanding this fundamental exercise of classification requires a lot of training. It is important to realise that it is not unusual for a word to belong to multiple part-of-speech classes, depending on how the word is used. Words are classified into parts of speech (POS) according to the way they function in a sentence. The fundamental building block of all languages is the word.
