Obtaining useful information from the Internet plays an important role today. News sites are among the Internet services most often used for this purpose. They offer fast updates and a wide variety of information, and in recent years some sites have collaborated with multiple newspaper companies to post content in bulk. However, because there are so many articles, it is difficult for readers to find the ones they would like to read. How to classify and present articles is therefore an important issue. In this study, we consider the category classification of documents using distributed representations of sentences. Specifically, we propose a method that classifies articles by extracting words with similar meanings from the sentence vectors of each category and assigning those words as labels.
Distributed representation, paragraph vector, neural network, automatic labeling, text classification, category classification
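The labeling idea summarized in the abstract can be illustrated with a minimal sketch. It assumes that each category has a vector in the same embedding space as the word vectors, as a paragraph vector model would provide. The toy vectors, the function names (`cosine`, `extract_labels`, `classify`), and the label-overlap scoring rule are all hypothetical choices made for illustration and are not taken from the paper.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def extract_labels(category_vectors, word_vectors, top_k=2):
    """For each category, pick the top_k words closest to its vector."""
    labels = {}
    for category, c_vec in category_vectors.items():
        ranked = sorted(
            word_vectors,
            key=lambda w: cosine(c_vec, word_vectors[w]),
            reverse=True,
        )
        labels[category] = ranked[:top_k]
    return labels

def classify(tokens, labels):
    """Assumed rule: choose the category whose labels overlap the article most."""
    token_set = set(tokens)
    scores = {c: len(token_set & set(ws)) for c, ws in labels.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None

# Toy 2-dimensional vectors, invented for this example only.
word_vectors = {
    "match": [0.9, 0.1],
    "goal": [0.8, 0.2],
    "stock": [0.1, 0.9],
    "market": [0.2, 0.8],
    "today": [0.5, 0.5],
}
category_vectors = {"sports": [1.0, 0.0], "business": [0.0, 1.0]}

labels = extract_labels(category_vectors, word_vectors, top_k=2)
predicted = classify(["the", "goal", "today"], labels)
```

In practice the vectors would come from a trained model rather than being written by hand, and the scoring rule could be replaced by any similarity-based criterion.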
Cite this paper
Taishi Saito, Osamu Uchida. (2018) Automatic Labeling to Classify News Articles Based on Paragraph Vector. International Journal of Computers, 3, 27-32
Copyright © 2018. The author(s) retain the copyright of this article.
This article is published under the terms of the Creative Commons Attribution License 4.0.