Project Torngat: Building Large-Scale Semantic 'Maps of Science' with LuSql, Lucene, Semantic Vectors, R and Processing from Full-Text
Project Torngat is a research project here at NRC-CISTI [Note that I am no longer at CISTI and that I am now continuing this work at Carleton University - GN 2010 04 07] that aims to use the full text of journal articles to construct semantic journal maps. Among other uses, article search results can be projected onto such a map to visualize them and to support interactive exploration and discovery of related articles, terms and journals.

Starting with 5.7 million full-text articles from 2200+ journals (mostly scientific, technical and medical (STM)), and using LuSql, Lucene, Semantic Vectors, R and Processing, a two-dimensional mapping of a 512-dimensional semantic space was created, which revealed an excellent correspondence with the 23 human-created journal categories:

[Figure: Semantic Journal Space of 2231 Journals Scaled to Two Dimensions]

This initial work set out to find a technique that would scale, and follow-up work is looking at integrating t