New Method Of Feature Selection For Persian Text Mining Based On Evolutionary Algorithms

Akram Roshdi


With the rapidly growing volume of textual information, text classification methods have become essential, and the growth of Persian text resources makes the problem more pressing for that language. However, classification work done specifically on Persian is still not as extensive as the work on Latin-script languages, Chinese, and others. This paper presents a system for Persian text classification that improves precision, recall, and overall efficiency. After text preprocessing and feature extraction, the system applies a new, improved feature selection method based on the Particle Swarm Optimization (PSO) algorithm to reduce the dimension of the feature vector. The classification methods are then applied to the reduced feature vector. To evaluate feature selection methods within the proposed classification system, support vector machine (SVM), Naive Bayes, k-nearest neighbor (KNN), and decision tree classifiers are employed. Tests of the proposed system on a set of Hamshahri texts showed improved precision, recall, and overall efficiency. Among the classifiers tested, SVM performed best.


Feature vector; classification; support vector machines; feature extraction; dimension reduction
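
The abstract does not give the details of the improved PSO variant, so the following is only a minimal sketch of plain binary PSO for feature selection, not the author's method. Each particle is a 0/1 mask over the features, and velocities pass through a sigmoid to give the probability of keeping a feature. The fitness function (training accuracy of a nearest-centroid classifier minus a small penalty per selected feature), the toy term-count data, and all names such as `binary_pso_select` and `penalty` are assumptions made for illustration.

```python
import math
import random


def centroid_accuracy(X, y, mask):
    """Training accuracy of a nearest-centroid classifier on the masked features."""
    cols = [j for j, keep in enumerate(mask) if keep]
    if not cols:
        return 0.0  # an empty feature subset cannot classify anything
    centroids = {}
    for label in sorted(set(y)):
        rows = [x for x, t in zip(X, y) if t == label]
        centroids[label] = [sum(r[j] for r in rows) / len(rows) for j in cols]
    correct = 0
    for x, t in zip(X, y):
        pred = min(
            centroids,
            key=lambda c: sum((x[j] - m) ** 2 for j, m in zip(cols, centroids[c])),
        )
        correct += pred == t
    return correct / len(X)


def fitness(X, y, mask, penalty=0.01):
    """Assumed objective: accuracy minus a small cost per selected feature."""
    return centroid_accuracy(X, y, mask) - penalty * sum(mask)


def binary_pso_select(X, y, n_particles=10, n_iter=30, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Return (best_mask, best_fitness) found by a plain binary PSO."""
    rng = random.Random(seed)
    n_feat = len(X[0])
    pos = [[rng.randint(0, 1) for _ in range(n_feat)] for _ in range(n_particles)]
    vel = [[0.0] * n_feat for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(X, y, p) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]
    for _ in range(n_iter):
        for i in range(n_particles):
            for j in range(n_feat):
                # Standard velocity update: inertia + cognitive + social terms.
                v = (w * vel[i][j]
                     + c1 * rng.random() * (pbest[i][j] - pos[i][j])
                     + c2 * rng.random() * (gbest[j] - pos[i][j]))
                vel[i][j] = max(-6.0, min(6.0, v))  # clamp to keep the sigmoid stable
                keep_prob = 1.0 / (1.0 + math.exp(-vel[i][j]))
                pos[i][j] = 1 if rng.random() < keep_prob else 0
            f = fitness(X, y, pos[i])
            if f > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], f
                if f > gbest_fit:
                    gbest, gbest_fit = pos[i][:], f
    return gbest, gbest_fit


# Toy term-count vectors for four documents in two hypothetical categories.
X = [[3, 0, 1, 0], [2, 0, 0, 1], [0, 3, 1, 0], [0, 2, 0, 1]]
y = ["sport", "sport", "economy", "economy"]
mask, score = binary_pso_select(X, y)
print("selected feature mask:", mask, "fitness:", score)
```

In a full pipeline, the mask would be applied to the extracted feature vectors before training a classifier such as SVM. The fitness function would normally use held-out or cross-validated accuracy rather than training accuracy.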




