New Method Of Feature Selection For Persian Text Mining Based On Evolutionary Algorithms
Abstract
With the rapidly growing volume of text information, text classification methods have become essential, and the growth of Persian text resources makes the problem more pressing for that language. However, work on classifying Persian text is still less extensive than work on Latin-script languages, Chinese, and others. This paper presents a system for Persian text classification that improves precision, recall, and overall efficiency. After text preprocessing and feature extraction, the system applies a new, improved feature selection method based on the Particle Swarm Optimization (PSO) algorithm to reduce the dimension of the feature vector. The classifiers are then applied to the reduced feature vector. To evaluate the feature selection methods within the proposed system, four classifiers are used: support vector machine (SVM), Naive Bayes, K-nearest neighbor (KNN), and Decision Tree. Tests of the implemented system on a set of Hamshahri texts showed improved precision, recall, and overall efficiency. Among the classifiers tested, SVM performed best.
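The abstract does not spell out the improved PSO variant, so the following is only a minimal sketch of the standard sigmoid-based binary PSO applied to feature selection, not the authors' method. Each particle is a 0/1 mask over features. The names `binary_pso`, `toy_fitness`, and `INFORMATIVE`, and all parameter values, are assumptions made for illustration. In a real system the fitness would likely be a classifier score (for example, SVM precision and recall on held-out documents) computed on the selected features.

```python
import math
import random


def binary_pso(fitness, n_features, n_particles=20, n_iters=50,
               w=0.7, c1=1.5, c2=1.5, v_max=4.0, seed=0):
    """Maximize fitness(mask) over 0/1 feature masks with binary PSO."""
    rng = random.Random(seed)
    # Random initial masks, zero initial velocities.
    pos = [[rng.randint(0, 1) for _ in range(n_features)]
           for _ in range(n_particles)]
    vel = [[0.0] * n_features for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(p) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]

    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(n_features):
                # Velocity update: inertia + pull toward personal and global best.
                v = (w * vel[i][d]
                     + c1 * rng.random() * (pbest[i][d] - pos[i][d])
                     + c2 * rng.random() * (gbest[d] - pos[i][d]))
                v = max(-v_max, min(v_max, v))  # clamp to avoid saturation
                vel[i][d] = v
                # Sigmoid of the velocity is the probability that the bit is 1.
                prob = 1.0 / (1.0 + math.exp(-v))
                pos[i][d] = 1 if rng.random() < prob else 0
            f = fitness(pos[i])
            if f > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], f
                if f > gbest_fit:
                    gbest, gbest_fit = pos[i][:], f
    return gbest, gbest_fit


# Hypothetical stand-in for classifier quality: features 0, 3, and 7 are
# "informative"; every selected feature costs 0.1 to favor small subsets.
INFORMATIVE = {0, 3, 7}


def toy_fitness(mask):
    hits = sum(1 for d in INFORMATIVE if mask[d])
    return hits - 0.1 * sum(mask)


best_mask, best_fit = binary_pso(toy_fitness, n_features=12)
print(best_mask, best_fit)
```

The size penalty in `toy_fitness` is one simple way to trade classification quality against the dimension of the feature vector. The weighting is arbitrary here and would need tuning against real data.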