In this article, we present a simplified list of the types of language ambiguity in natural language processing (NLP), along with the effects that ambiguity has on processing:
- Word sense ambiguity: The same word can have two or more different meanings, such as the word “bank” in “Bank of America” (a financial institution) and “river bank” (the land alongside a river).
- Part-of-speech ambiguity: Some words can take more than one POS tag depending on the context. For example, “lie” is a verb in “lie down” but a noun in “don’t tell me lies”.
- Syntactic or structural ambiguity: In the sentence “I saw a man with a telescope”, is the man carrying the telescope, or did I see the man through the telescope? This type of ambiguity causes problems in building knowledge graphs, relation extraction, and machine translation.
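- Attachment ambiguity: A constituent of the sentence can be attached to the parse tree at more than one place. In the telescope example, the phrase “with a telescope” can attach either to the verb “saw” or to the noun “man”.
- Coordination ambiguity: A conjunction can group the surrounding words or phrases in more than one way. For example, “old men and women” can be read as “(old men) and women” or as “old (men and women)”.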
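To make structural and attachment ambiguity concrete, here is a minimal sketch that counts the parse trees of the telescope sentence with the CKY algorithm. The grammar and lexicon (`BINARY_RULES`, `LEXICON`) are hypothetical toy examples in Chomsky normal form, written only to cover this one sentence; they do not come from any real treebank or library.

```python
from collections import defaultdict

# Hypothetical toy grammar in Chomsky normal form: binary rules (parent, left, right).
BINARY_RULES = [
    ("S", "NP", "VP"),
    ("VP", "V", "NP"),
    ("VP", "VP", "PP"),  # PP attaches to the verb phrase: "saw ... with a telescope"
    ("NP", "Det", "N"),
    ("NP", "NP", "PP"),  # PP attaches to the noun phrase: "a man with a telescope"
    ("PP", "P", "NP"),
]

# Hypothetical lexicon: each word maps to the categories it can take.
LEXICON = {
    "I": ["NP"],
    "saw": ["V"],
    "a": ["Det"],
    "man": ["N"],
    "with": ["P"],
    "telescope": ["N"],
}


def count_parses(words, root="S"):
    """Count the distinct parse trees of `words` using CKY chart parsing."""
    n = len(words)
    # chart[i][j][A] = number of ways category A can derive words[i:j]
    chart = [[defaultdict(int) for _ in range(n + 1)] for _ in range(n + 1)]
    for i, word in enumerate(words):
        for category in LEXICON.get(word, []):
            chart[i][i + 1][category] += 1
    # Fill longer spans from shorter ones (bottom-up).
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length
            for k in range(i + 1, j):  # split point between left and right child
                for parent, left, right in BINARY_RULES:
                    ways = chart[i][k][left] * chart[k][j][right]
                    if ways:
                        chart[i][j][parent] += ways
    return chart[0][n][root]


print(count_parses("I saw a man with a telescope".split()))  # 2
print(count_parses("I saw a man".split()))                   # 1
```

The two trees differ only in where “with a telescope” attaches (rule `VP -> VP PP` versus `NP -> NP PP`), which is the attachment ambiguity described above; dropping the PP leaves a single parse.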
Ambiguity can affect several stages of the NLP pipeline, such as:
- POS tagging: a tagger may assign nearly equal probabilities to two different part-of-speech tags for the same word, which makes the choice between them unreliable.
- Tokenization: abbreviations such as “vs.” or “et al.” can cause problems when their period is mistaken for a sentence-ending period.
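- Parsing: with a probabilistic context-free grammar (PCFG), a syntactically ambiguous sentence has several valid parse trees. Which tree the parser selects depends on the rule probabilities at the different levels of the tree and, in practice, can also depend on the parsing strategy (top-down, bottom-up, or chart parsing).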
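The tokenization problem can be illustrated with a minimal sketch, assuming a hypothetical hand-written abbreviation list (`ABBREVIATIONS`) and plain regex splitting rather than any particular library's sentence splitter. The naive splitter breaks after every period followed by whitespace, while the second function skips periods that close a known abbreviation.

```python
import re

# Hypothetical abbreviation list; a real system would need a much larger one.
ABBREVIATIONS = {"vs.", "et al.", "e.g.", "i.e.", "Dr."}


def naive_split(text):
    """Split after every period that is followed by whitespace."""
    return [s for s in re.split(r"(?<=\.)\s+", text) if s]


def ends_with_abbreviation(chunk):
    """True if `chunk` ends with a whole known abbreviation (not part of a longer word)."""
    for abbr in ABBREVIATIONS:
        if chunk.endswith(abbr):
            before = chunk[: -len(abbr)]
            if not before or before[-1].isspace():
                return True
    return False


def abbreviation_aware_split(text):
    """Split after a period followed by whitespace, unless the period ends an abbreviation."""
    sentences, start = [], 0
    for match in re.finditer(r"\.\s+", text):
        candidate = text[start:match.start() + 1]  # text up to and including the period
        if ends_with_abbreviation(candidate):
            continue  # the period belongs to an abbreviation, not a sentence boundary
        sentences.append(candidate.strip())
        start = match.end()
    tail = text[start:].strip()
    if tail:
        sentences.append(tail)
    return sentences


text = "Smith et al. compared CRF vs. HMM taggers. The CRF won."
print(naive_split(text))
# ['Smith et al.', 'compared CRF vs.', 'HMM taggers.', 'The CRF won.']
print(abbreviation_aware_split(text))
# ['Smith et al. compared CRF vs. HMM taggers.', 'The CRF won.']
```

The trade-off is that an abbreviation at the real end of a sentence (for example, a sentence that ends in “et al.”) is no longer treated as a boundary, so this sketch would miss that split.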
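For the parsing stage, a probabilistic parser can resolve the telescope ambiguity by keeping only the most probable tree. The following is a minimal sketch of Viterbi-style bottom-up CKY chart parsing over a hypothetical toy PCFG (`BINARY`, `LEXICAL`). The rule probabilities are invented for illustration, so which attachment wins here says nothing about real text.

```python
# Hypothetical toy PCFG in Chomsky normal form.
# Probabilities of rules that share a left-hand side sum to 1.
BINARY = [  # (parent, left, right, probability)
    ("S", "NP", "VP", 1.0),
    ("VP", "V", "NP", 0.7),
    ("VP", "VP", "PP", 0.3),  # verb attachment of a PP
    ("NP", "Det", "N", 0.5),
    ("NP", "NP", "PP", 0.2),  # noun attachment of a PP
    ("PP", "P", "NP", 1.0),
]
LEXICAL = {  # word -> [(category, probability)]; NP -> "I" takes the remaining 0.3
    "I": [("NP", 0.3)],
    "saw": [("V", 1.0)],
    "a": [("Det", 1.0)],
    "man": [("N", 0.5)],
    "telescope": [("N", 0.5)],
    "with": [("P", 1.0)],
}


def viterbi_parse(words, root="S"):
    """Return (probability, tree) of the most probable parse, or (0.0, None)."""
    n = len(words)
    # best[i][j][A] = (probability, tree) of the best derivation of words[i:j] from A
    best = [[{} for _ in range(n + 1)] for _ in range(n + 1)]
    for i, word in enumerate(words):
        for category, p in LEXICAL.get(word, []):
            best[i][i + 1][category] = (p, (category, word))
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length
            cell = best[i][j]
            for k in range(i + 1, j):  # split point
                for parent, left, right, p in BINARY:
                    if left in best[i][k] and right in best[k][j]:
                        lp, ltree = best[i][k][left]
                        rp, rtree = best[k][j][right]
                        prob = p * lp * rp
                        # Keep only the most probable tree per category and span.
                        if prob > cell.get(parent, (0.0, None))[0]:
                            cell[parent] = (prob, (parent, ltree, rtree))
    return best[0][n].get(root, (0.0, None))


prob, tree = viterbi_parse("I saw a man with a telescope".split())
print(prob)  # ≈ 0.0039 for the verb-attachment tree; the noun-attachment tree scores ≈ 0.0026
print(tree)
```

With these made-up numbers the verb attachment wins because `VP -> VP PP` (0.3) outweighs `NP -> NP PP` (0.2); changing the two probabilities is enough to flip the chosen reading.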