Discrepancies Detection in Arabic and English Documents

Abdulwahed Almarimi, Gabriela Andrejková


In the paper, there are analyzed and compared results of usable methods for discrepancies detection based on character n-gram profiles (the set of character n-gram normalized frequencies of a text) for English and Arabic documents. English and Arabic texts were analyzed from many statistical characteristics point of view. We covered some statistical differences between both languages and we applied some heuristics for measurements of text parts dissimilarities. The results for each text can call for an attention to the text (or not) if the text parts were written by the same author. We evaluate some Arabic and English documents and show its parts they contain discrepancies and they need some following analysis for plagiarism detection. The analysis depends on selected parameters prepared in experiments.


n-grams; stylistic measure; plagiarism; authorship

Full Text:


Lululemon Black Friday cheap nfl jerseys Lululemon factory Outlet ny Black Friday discount tiffany outlet wholesale soccer jerseys online oakley black friday cheap nhl jerseys china cheap nfl jerseys north face black friday sale cheap nfl jerseys online Jordans Black Friday Sale 2015 Cheap Moncler Cyber Monday moncler outlet cheap soccer jerseys moncler outlet black friday cheap authentic nfl jerseys north face cyber monday Louboutin Black Friday canada wholesale cheap nfl jerseys lululemon cyber monday 2015 cheap nfl jerseys from china 2015 Cheap Moncler Black Friday Sale Moncler Cyber Monday 2015 cheap jerseys Lululemon Cyber Monday Sale jordans cyber monday deals 2015 cheap nike nfl jerseys Black Friday deals Lululemon 2015 jordan black friday 2015 Moncler Jackets Black Friday Sale 2015 Louboutin Pas Cher Black Friday 2015 Canada Lululemon north face black friday cheap wholesale soccer jerseys