AUTHOR(S):

Levon Stepanyan

 

TITLE

Automated Custom Named Entity Recognition and Disambiguation

ABSTRACT

Named Entity Recognition (NER) and Disambiguation are sub-tasks in Natural Language Processing (NLP) that seek to identify named entities in text and classify them into their designated categories. Recent advances in deep learning make it possible to use attention mechanisms and recurrent networks to produce reliable NER predictions. Applications of NER range from profanity detection to extracting metadata from documents. However, the greatest shortcoming of classical NER models is the limited number of predefined classes set in the task (e.g., Person (PER), Location (LOC), Companies/Institutions (ORG)). With this limitation in mind, we propose a novel, fast approach (FastEnt) to identifying and detecting Custom Named Entities (CNE) that are not limited to a predefined set of classes. The task is split into two parts. We initially create a basis space of words from several examples of the entity we are trying to identify, by searching across the word representations found through fastText and Word2Vec. We then perform automated online scraping of several sources, such as Reddit, to obtain an annotated corpus that is used in the modeling step. After producing the annotated corpus with the designated CNE, we train a dilated convolutional neural network with recurrent mechanisms to perform NER on this new entity. We test our approach on the classic named entities mentioned above and are able to reliably reproduce state-of-the-art (SOTA) results, and we further show consistent results with this approach on several custom named entity tasks.
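
The seed-expansion and corpus-annotation steps described above can be illustrated with a minimal sketch. This is not the FastEnt implementation: the toy three-dimensional vectors stand in for pretrained fastText or Word2Vec embeddings, and the function names (expand_seeds, annotate), the centroid-based search, the similarity threshold, and the LANG label are all assumptions made for illustration only.

```python
import math

# Toy vectors standing in for pretrained fastText/Word2Vec embeddings (assumption).
TOY_VECTORS = {
    "python": [0.9, 0.1, 0.0],
    "java": [0.8, 0.2, 0.1],
    "rust": [0.85, 0.15, 0.05],
    "banana": [0.0, 0.9, 0.2],
    "apple": [0.1, 0.8, 0.3],
    "paris": [0.1, 0.2, 0.9],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def expand_seeds(seeds, vectors, threshold=0.95):
    """Grow a seed list into a lexicon of words close to the seed centroid.

    Hypothetical helper: the paper does not specify how its search works.
    """
    known = [s for s in seeds if s in vectors]
    if not known:
        return set(seeds)
    dim = len(vectors[known[0]])
    centroid = [sum(vectors[s][i] for s in known) / len(known) for i in range(dim)]
    expanded = set(seeds)
    for word, vec in vectors.items():
        if cosine(centroid, vec) >= threshold:
            expanded.add(word)
    return expanded


def annotate(tokens, lexicon, label="LANG"):
    """Weakly label tokens by lexicon lookup (single-token entities only)."""
    return [(t, "B-" + label if t.lower() in lexicon else "O") for t in tokens]


lexicon = expand_seeds(["python", "java"], TOY_VECTORS)
tagged = annotate("I write Rust and eat banana".split(), lexicon)
print(sorted(lexicon))
print(tagged)
```

In this sketch the expanded lexicon plays the role of the basis space of words, and the tagged sentences play the role of the automatically annotated corpus on which a sequence model would then be trained.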

KEYWORDS

NLP, NER, Neural Networks, CRF, Parallelization, databases, API

 

Cite this paper

Levon Stepanyan. (2020) Automated Custom Named Entity Recognition and Disambiguation. International Journal of Signal Processing, 5, 1-8

 

Copyright © 2020 Author(s) retain the copyright of this article.
This article is published under the terms of the Creative Commons Attribution License 4.0