Open Science Research Excellence

Open Science Index

Commenced in January 2007 Frequency: Monthly Edition: International Paper Count: 2

2
2830
Named Entity Recognition using Support Vector Machine: A Language Independent Approach
Abstract:

Named Entity Recognition (NER) aims to classify each word of a document into predefined target named entity classes and is now-a-days considered to be fundamental for many Natural Language Processing (NLP) tasks such as information retrieval, machine translation, information extraction, question answering systems and others. This paper reports about the development of a NER system for Bengali and Hindi using Support Vector Machine (SVM). Though this state of the art machine learning technique has been widely applied to NER in several well-studied languages, the use of this technique to Indian languages (ILs) is very new. The system makes use of the different contextual information of the words along with the variety of features that are helpful in predicting the four different named (NE) classes, such as Person name, Location name, Organization name and Miscellaneous name. We have used the annotated corpora of 122,467 tokens of Bengali and 502,974 tokens of Hindi tagged with the twelve different NE classes 1, defined as part of the IJCNLP-08 NER Shared Task for South and South East Asian Languages (SSEAL) 2. In addition, we have manually annotated 150K wordforms of the Bengali news corpus, developed from the web-archive of a leading Bengali newspaper. We have also developed an unsupervised algorithm in order to generate the lexical context patterns from a part of the unlabeled Bengali news corpus. Lexical patterns have been used as the features of SVM in order to improve the system performance. The NER system has been tested with the gold standard test sets of 35K, and 60K tokens for Bengali, and Hindi, respectively. Evaluation results have demonstrated the recall, precision, and f-score values of 88.61%, 80.12%, and 84.15%, respectively, for Bengali and 80.23%, 74.34%, and 77.17%, respectively, for Hindi. Results show the improvement in the f-score by 5.13% with the use of context patterns. Statistical analysis, ANOVA is also performed to compare the performance of the proposed NER system with that of the existing HMM based system for both the languages.

1
12114
Addressing Scalability Issues of Named Entity Recognition Using Multi-Class Support Vector Machines
Abstract:
This paper explores the scalability issues associated with solving the Named Entity Recognition (NER) problem using Support Vector Machines (SVM) and high-dimensional features. The performance results of a set of experiments conducted using binary and multi-class SVM with increasing training data sizes are examined. The NER domain chosen for these experiments is the biomedical publications domain, especially selected due to its importance and inherent challenges. A simple machine learning approach is used that eliminates prior language knowledge such as part-of-speech or noun phrase tagging thereby allowing for its applicability across languages. No domain-specific knowledge is included. The accuracy measures achieved are comparable to those obtained using more complex approaches, which constitutes a motivation to investigate ways to improve the scalability of multiclass SVM in order to make the solution more practical and useable. Improving training time of multi-class SVM would make support vector machines a more viable and practical machine learning solution for real-world problems with large datasets. An initial prototype results in great improvement of the training time at the expense of memory requirements.

Vol:13 No:12 2019Vol:13 No:11 2019Vol:13 No:10 2019Vol:13 No:09 2019Vol:13 No:08 2019Vol:13 No:07 2019Vol:13 No:06 2019Vol:13 No:05 2019Vol:13 No:04 2019Vol:13 No:03 2019Vol:13 No:02 2019Vol:13 No:01 2019
Vol:12 No:12 2018Vol:12 No:11 2018Vol:12 No:10 2018Vol:12 No:09 2018Vol:12 No:08 2018Vol:12 No:07 2018Vol:12 No:06 2018Vol:12 No:05 2018Vol:12 No:04 2018Vol:12 No:03 2018Vol:12 No:02 2018Vol:12 No:01 2018
Vol:11 No:12 2017Vol:11 No:11 2017Vol:11 No:10 2017Vol:11 No:09 2017Vol:11 No:08 2017Vol:11 No:07 2017Vol:11 No:06 2017Vol:11 No:05 2017Vol:11 No:04 2017Vol:11 No:03 2017Vol:11 No:02 2017Vol:11 No:01 2017
Vol:10 No:12 2016Vol:10 No:11 2016Vol:10 No:10 2016Vol:10 No:09 2016Vol:10 No:08 2016Vol:10 No:07 2016Vol:10 No:06 2016Vol:10 No:05 2016Vol:10 No:04 2016Vol:10 No:03 2016Vol:10 No:02 2016Vol:10 No:01 2016
Vol:9 No:12 2015Vol:9 No:11 2015Vol:9 No:10 2015Vol:9 No:09 2015Vol:9 No:08 2015Vol:9 No:07 2015Vol:9 No:06 2015Vol:9 No:05 2015Vol:9 No:04 2015Vol:9 No:03 2015Vol:9 No:02 2015Vol:9 No:01 2015
Vol:8 No:12 2014Vol:8 No:11 2014Vol:8 No:10 2014Vol:8 No:09 2014Vol:8 No:08 2014Vol:8 No:07 2014Vol:8 No:06 2014Vol:8 No:05 2014Vol:8 No:04 2014Vol:8 No:03 2014Vol:8 No:02 2014Vol:8 No:01 2014
Vol:7 No:12 2013Vol:7 No:11 2013Vol:7 No:10 2013Vol:7 No:09 2013Vol:7 No:08 2013Vol:7 No:07 2013Vol:7 No:06 2013Vol:7 No:05 2013Vol:7 No:04 2013Vol:7 No:03 2013Vol:7 No:02 2013Vol:7 No:01 2013
Vol:6 No:12 2012Vol:6 No:11 2012Vol:6 No:10 2012Vol:6 No:09 2012Vol:6 No:08 2012Vol:6 No:07 2012Vol:6 No:06 2012Vol:6 No:05 2012Vol:6 No:04 2012Vol:6 No:03 2012Vol:6 No:02 2012Vol:6 No:01 2012
Vol:5 No:12 2011Vol:5 No:11 2011Vol:5 No:10 2011Vol:5 No:09 2011Vol:5 No:08 2011Vol:5 No:07 2011Vol:5 No:06 2011Vol:5 No:05 2011Vol:5 No:04 2011Vol:5 No:03 2011Vol:5 No:02 2011Vol:5 No:01 2011
Vol:4 No:12 2010Vol:4 No:11 2010Vol:4 No:10 2010Vol:4 No:09 2010Vol:4 No:08 2010Vol:4 No:07 2010Vol:4 No:06 2010Vol:4 No:05 2010Vol:4 No:04 2010Vol:4 No:03 2010Vol:4 No:02 2010Vol:4 No:01 2010
Vol:3 No:12 2009Vol:3 No:11 2009Vol:3 No:10 2009Vol:3 No:09 2009Vol:3 No:08 2009Vol:3 No:07 2009Vol:3 No:06 2009Vol:3 No:05 2009Vol:3 No:04 2009Vol:3 No:03 2009Vol:3 No:02 2009Vol:3 No:01 2009
Vol:2 No:12 2008Vol:2 No:11 2008Vol:2 No:10 2008Vol:2 No:09 2008Vol:2 No:08 2008Vol:2 No:07 2008Vol:2 No:06 2008Vol:2 No:05 2008Vol:2 No:04 2008Vol:2 No:03 2008Vol:2 No:02 2008Vol:2 No:01 2008
Vol:1 No:12 2007Vol:1 No:11 2007Vol:1 No:10 2007Vol:1 No:09 2007Vol:1 No:08 2007Vol:1 No:07 2007Vol:1 No:06 2007Vol:1 No:05 2007Vol:1 No:04 2007Vol:1 No:03 2007Vol:1 No:02 2007Vol:1 No:01 2007