Multilingual extraction and editing of concept strings for the legal domain

Andrea Varga, Andrew N. Edmonds


Identifying semantic expressions, so-called concept strings (CSs), in multilingual corpora is an important NLP task, as it allows web search engines to define and perform semantic queries over large collections of documents. Existing web search engines in the legal domain are mainly limited to keyword search, in which the query words are matched against the textual content of the documents. This paper presents a novel framework, the Concept Strings Framework, that uses CSs to represent the content of documents and to support semantic search over them. A CS can consist of an individual knowledge base (KB) concept (e.g. a WordNet concept) or a combination of such concepts. In addition, this paper presents an interactive web-based toolkit, the Template Editor, that enables the creation, editing and evaluation of CSs. Experiments on two publicly available legislation websites show satisfactory results.


Semantic Search; Concept Strings; Knowledge Base; WordNet


