Monday, December 22, 2008

Web (marketing) controlled experiments == No informed consent?

Kohavi et al.[1] is an extremely useful survey and guide to controlled experiments on the web, told primarily from a marketing perspective. It introduces and describes various experimental methods, examines the technical and organizational challenges of running controlled experiments, and delves into various issues of experimental design. It is - for the most part - an excellent resource for anyone wanting to do these kinds of web-based controlled experiments.

While I know this article is marketing-oriented, it is clear that some of the results from these experiments will be, or have been, published in peer-reviewed journals. Yet the authors make no mention of informed consent - even as an aside - in the entire article (and no mention of privacy or privacy issues either). Some of the experiments described or cited are not too different from those that might be done in social sciences or IT user interface research, where researchers are usually required to go through an ethics review process and invariably need to obtain informed consent from their subjects.

It seems that you just need to say it is for marketing and these issues all go away.

[1]R. Kohavi, R. Longbotham, D. Sommerfield & R. Henne. 2009. Controlled experiments on the web: survey and practical guide. Data Mining and Knowledge Discovery 18:1:140-181.

Tuesday, December 09, 2008

Open Standards and standards organizations

This report - from January 2008 - examines 10 "open" standards organizations and evaluates how "open" they are. It uses a methodology that maps directly into Krechmer's open standards requirements.
The organizations reviewed are:
  1. CEN (European Committee for Standardization)
  2. Ecma (European association for standardizing information and communication systems)
  3. ETSI (European Telecommunications Standards Institute)
  4. IETF (Internet Engineering Task Force)
  5. ISO (International Organization for Standardization)
  6. ITU (International Telecommunication Union)
  7. NIST (National Institute of Standards and Technology)
  8. OASIS (Organization for the Advancement of Structured Information Standards)
  9. OMG (Object Management Group)
  10. W3C (World Wide Web Consortium)

Evaluation of Ten Standard Setting Organizations with Regard to Open Standards

Abstract: On 2 June 2006, the Danish parliament (the Folketing) unanimously adopted
Parliamentary Resolution B103 on the use of open standards for software in the
public sector. The Resolution instructs the Government to ensure that the
public sector's use of information technology, including the use of software,
should be based on open standards. Therefore, the Danish National IT and
Telecom Agency (IT- og Telestyrelsen) has commissioned to IDC to evaluate the
degree of "openness" of the leading standard setting organizations.


Friday, December 05, 2008

Article: Canadian Federal Support for University Research Commercialization

Rasmussen[1] provides a thorough examination of Canadian federal government programs and organizations supporting the commercialization of university research. This work is based on background research and interviews conducted in January 2006 with 28 "...policy makers, program managers, policy researchers, university administrators, and program users...". The author adds that "A case description was written based on the collected material and later verified by several key people at Canadian agencies".

For those following federal university commercialization activities, this work is an excellent review of the recent state of these programs, activities and organizations.

It should be noted that this research is part of a larger and broader research effort[2] benchmarking commercialization of research in Canada, Finland, Ireland, the Netherlands, Scotland, and Sweden. The list of the Canadian interviewees can be found in this larger work (p.52).


"Compared to most countries, Canada has a long tradition of state involvement to promote the economic utilization of scientific research (Atkinson-Grosjean et al., 2001; Slaughter and Leslie, 1997). Moreover, Canada has an overwhelming number of programs at federal and provincial level that may be used to support the commercialization of research. Although using a very broad definition, one survey identified 178 initiatives that represented an expenditure of Canadian dollar (CAD) 3.2 billion a year (Gault and McDaniel, 2004)."

Efficiency of university commercialization:

"Clayman (2004) found that Canadian universities created considerably more spin-off companies than their US counterparts, counting the companies created per dollar of research."

R&D Expenditure: Private/public:

"Canada has a relatively modest level of R&D expenditure due to low investments in the private sector. Public R&D expenditure is, however, among the highest in the world. About one-third of all R&D activity is performed by Canada's close to 100 universities and university colleges (most by the top 20), roughly 12% by government institutes, and just above half by Canadian industry."

Diversity of IP Policy at Canadian universities:

"It is also important to note that Canadian universities have a diversity of approaches to IP ownership, IP strategies, and the organization of their technology transfer activities. For instance, in the city of Vancouver the University of British Columbia owns the IP, while at Simon Fraser University the IP is owned by the inventors. Among the 20 largest universities, the IP is owned by the creator (academics) in eight cases, in another eight cases the IP is university owned, and the remaining four have joint ownership or case-by-case negotiations."

Three Categories of Commercialization Initiatives:

"Federal level initiatives to support the commercialization of Canadian research could be divided into three agency areas. First, the federal research institutes such as NRC make their own internal priorities in supporting commercialization. Second, there are a number of targeted schemes from CIHR, NSERC, and SSHRC towards commercialization at universities. Third, general agencies such as the Industrial Research Assistance Program (NRC-IRAP) and the Business Development Bank of Canada (BDC) give considerable support to research-based spin-off firms. For instance, about half the Canadian university spin-offs have received IRAP funds, and 23 of 35 investments by BDC's Technology Seed Investments involved spin-offs from universities or federal labs according to officials in these organizations."

Approach: bottom-up

"Although all the initiatives investigated in this study are operated by government agencies, they seem to emphasize a bottom-up approach (Goldfarb and Henrekson, 2002). That is, to be flexible according to local needs and support with funding, expertise development, experimenting, and networks, in contrast to a top-down approach imposing a general set of policies and structures for the commercialization of research. As argued by Goldfarb and Henrekson, 2002, a bottom-up approach is a key explanation for the success at US universities in promoting commercialization of research, in contrast to the limited success of the top-down approach in Sweden."

Metrics: People and cooperation would be better?

"A final observation related to the commercialization of university research is that the use of quantitative measures (number of patents, licenses, spin-off firms, revenue generated, etc.) to measure the outcome of technology transfer activity is increasingly critiqued in Canada (Langford et al., 2006). It is recognized that the major channels for technology transfer are the transfer of people, especially graduated students, and research cooperation with existing industry, including faculty consulting. Hence, licensing and spin-offs account for only a small share of technology transfer from research institutions and their impact might be difficult to separate from the other technology transfer activity (Landry et al., 2007). Several Canadian officials expressed concern that a too narrow focus on short-term indicators could be misinterpreted and do more harm than good in order to achieve the potential for social and economic benefits from research."

Programs/Organizations/Activities examined in some detail in this article:

[1]Einar Rasmussen. 2008. Government instruments to support the commercialization of university research: Lessons from Canada. Technovation 28:8:506-517.

[2]Einar Rasmussen, Odd Jarl Borch, Roger Sørheim, Are Gjellan. 2006. Government initiatives to support the commercialization of research - an international benchmarking study.

Disclaimer/disclosure: I am employed by the National Research Council, some of whose activities are described in the above articles. This is a personal blog whose content is my own opinion and does not reflect the policies, views or opinions of the NRC or the Government of Canada.

Uncertainty Reasoning for the Semantic Web I

Uncertainty Reasoning for the Semantic Web I, ISWC International Workshops, URSW 2005-2007, Revised Selected and Invited Papers. Lecture Notes in Computer Science.
Of note:
  1. Towards Machine Learning on the Semantic Web.
  2. Semantic Science: Ontologies, Data and Probabilistic Theories.
  3. Analogical Reasoning in Description Logics.

Thursday, November 27, 2008

The (near) Future of Research Articles

Rod Page's demo for his Elsevier Grand Challenge submission ("Towards realising Darwin’s dream: setting the trees free") shows the type of enrichment of biological - if not all research - articles that is quickly becoming possible. Taking a published article ("Mitochondrial paraphyly in a polymorphic poison frog species (Dendrobatidae; D. pumilio)"), various additional biological, geographical and other metadata are extracted and added to a web page for the article. These include:
  • Map showing all localities mentioned in the paper, with their enclosing
    polygon
  • List of other studies which have samples in area enclosed by the
    study polygon
  • Each of the following is linked through to its underlying
    database (such as NIH accession number and NCBI nucleotide viewer,
    or the uBio taxonomic name viewer record):
    • List of sequence features (such as genes) in the article
    • List of taxa sequenced in the article
    • List of gene sequences cited by the article
  • An image collage of all biological taxa (organisms) in article
  • List of studies on related organisms
You can see his whole vision in his submission, which shows some interesting visualizations, such as his Google Earth Phylogenies and his Treemaps of Taxa.

Monday, November 24, 2008

Lucene 2.3.1 vs 2.4 benchmarks using LuSql

I have been doing some indexing performance tests with LuSql, and have some numbers comparing Lucene 2.3.1 with 2.4.

Despite some discussion about 2.4 having poorer indexing performance, my tests with LuSql 0.9 suggest otherwise:

Lucene 2.3.1

Number of records added= 2000000
Optimizing index
Closing index
Optimizing index time: 311 seconds
Closing JDBC: result set
Closing JDBC: statement
Closing JDBC: connection
*********** Elapsed time: 854 seconds
15m 18s

Lucene 2.4

Number of records added= 2000000
Optimizing index
Closing index
Optimizing index time: 322 seconds
Closing JDBC: result set
Closing JDBC: statement
Closing JDBC: connection
*********** Elapsed time: 759 seconds
12m 39s
Index size: 3.7GB.

It is interesting that the overall indexing time is significantly less, but the optimizing time is slightly higher.
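To put the two runs side by side, here is a minimal Python sketch (my own arithmetic, not part of LuSql) that turns the "Elapsed time" and "Optimizing index time" seconds from the logs above into throughput and relative change. It uses only the seconds values as logged; the function and variable names are illustrative.

```python
RECORDS = 2_000_000

# Seconds copied from the log output above (elapsed = whole run,
# optimize = the "Optimizing index time" line).
runs = {
    "2.3.1": {"elapsed": 854, "optimize": 311},
    "2.4": {"elapsed": 759, "optimize": 322},
}


def records_per_second(records, seconds):
    """Average indexing throughput over the whole run."""
    return records / seconds


def percent_change(old, new):
    """Relative change from old to new, in percent (negative = faster)."""
    return 100.0 * (new - old) / old


for version, timing in runs.items():
    rate = records_per_second(RECORDS, timing["elapsed"])
    print(f"Lucene {version}: {rate:.0f} records/s overall")

elapsed_change = percent_change(runs["2.3.1"]["elapsed"], runs["2.4"]["elapsed"])
optimize_change = percent_change(runs["2.3.1"]["optimize"], runs["2.4"]["optimize"])
print(f"Elapsed time change:  {elapsed_change:+.1f}%")
print(f"Optimize time change: {optimize_change:+.1f}%")
```

On these figures the 2.4 run is roughly 11% shorter overall, while its optimize step is roughly 3.5% longer.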

Data, hardware and system configuration: as per my previous Lucene benchmarking.

Note that this is a simple benchmark, so YMWV. This benchmark was done with the LuSql default number of threads for the hardware in question, 20.
MySQL version used: v5.0.45 compiled from source, concurrency=8.

Wednesday, November 19, 2008

Asian Digital Libraries 2008 Proceedings

Proceedings of the 11th International Conference on Asian Digital Libraries, ICADL 2008, Bali, Indonesia, December 2-5, 2008 are now available:

  • DL2Go: Editable Digital Libraries in the Pocket. Hyunyoung Kil, Wonhong Nam, Dongwon Lee.

  • Hierarchical Classification of Web Pages Using Support Vector Machine. Yi Wang, Zhiguo Gong.

  • The Prevalence and Use of Web 2.0 in Libraries. Alton Yeow Kuan Chua, Dion Hoe-Lian Goh, Chei Sian Lee.

  • Usability of Digital Repository Software: A Study of DSpace Installation and Configuration. Nils Körber, Hussein Suleman.

  • Developing a Traditional Mongolian Script Digital Library. Garmaabazar Khaltarkhuu, Akira Maeda.

  • Weighing the Usefulness of Social Tags for Content Discovery. Khasfariyati Razikin, Dion Hoe-Lian Goh, Chei Sian Lee, Alton Yeow Kuan Chua.

  • A User Reputation Model for DLDE Learning 2.0 Community. Fusheng Jin, Zhendong Niu, Quanxin Zhang, Haiyang Lang, Kai Qin.

  • Query Relaxation Based on Users Unconfidences on Query Terms and Web Knowledge Extraction. Yasufumi Kaneko, Satoshi Nakamura, Hiroaki Ohshima, Katsumi Tanaka.

  • A Query Language and Its Processing for Time-Series Document Clusters. Sophoin Khy, Yoshiharu Ishikawa, Hiroyuki Kitagawa.

  • Ontology Construction Based on Latent Topic Extraction in a Digital Library. Jian-hua Yeh, Naomi Yang.

  • Towards Intelligent and Adaptive Digital Library Services. Md Maruf Hasan, Ekawit Nantajeewarawat.

  • Searching for Illustrative Sentences for Multiword Expressions in a Research Paper Database. Hidetsugu Nanba, Satoshi Morishita.

  • Query Transformation by Visualizing and Utilizing Information about What Users Are or Are Not Searching. Taiga Yoshida, Satoshi Nakamura, Satoshi Oyama, Katsumi Tanaka.

  • Language Independent Word Spotting in Scanned Documents. Sargur N. Srihari, Gregory R. Ball.

  • Focused Page Rank in Scientific Papers Ranking. Mikalai Krapivin, Maurizio Marchese.

  • Scientific Journals, Overlays and Repositories: A Case of Costs and Sustainability Issues. Panayiota Polydoratou, Martin Moyle.

  • A Collaborative Filtering Algorithm Based on Global and Domain Authorities. Li Zhou, Yong Zhang, Chun-Xiao Xing.

  • Complex Data Transformations in Digital Libraries with Spatio-Temporal Information. Bruno Martins, Nuno Freire, Jose Borbinha.

  • Sentiment Classification of Movie Reviews Using Multiple Perspectives. Tun Thura Thet, Jin-Cheon Na, Christopher S. G. Khoo.

  • Scholarly Publishing in Australian Digital Libraries: An Overview. Bhojaraju Gunjal, Hao Shi, Shalini R. Urs.

  • Utilizing Semantic, Syntactic, and Question Category Information for Automated Digital Reference Services. Palakorn Achananuparp, Xiaohua Hu, Xiaohua Zhou, Xiaodan Zhang.

  • A Collaborative Approach to User Modeling for Personalized Content Recommendations. Heung-Nam Kim, Inay Ha, Seung-Hoon Lee, Geun-Sik Jo.

  • Using a Grid for Digital Preservation. José Barateiro, Gonçalo Antunes, Manuel Cabral, José Borbinha, Rodrigo Rodrigues.

  • A User-Oriented Approach to Scheduling Collection Building in Greenstone. Wendy Osborn, David Bainbridge, Ian H. Witten.

  • LORE: A Compound Object Authoring and Publishing Tool for the Australian Literature Studies Community. Anna Gerber, Jane Hunter.

  • Consolidation of References to Persons in Bibliographic Databases. Nuno Freire, José Borbinha, Bruno Martins.

  • On Visualizing Heterogeneous Semantic Networks from Multiple Data Sources. Maureen, Aixin Sun, Ee-Peng Lim, Anwitaman Datta, Kuiyu Chang.

  • Using Mutual Information Technique in Cross-Language Information Retrieval. Syandra Sari, Mirna Adriani.

  • Exploring User Experiences with Digital Library Services: A Focus Group Approach. Kaur Kiran, Diljit Singh.

  • Beyond the Client-Server Model: Self-contained Portable Digital Libraries. David Bainbridge, Steve Jones, Sam McIntosh, Ian H. Witten, Matt Jones.

  • New Era New Development: An Overview of Digital Libraries in China. Guohui Li, Michael Bailou Huang.

  • Browse&Read Picture Books in a Group on a Digital Table. Jia Liu, Keizo Sato, Makoto Nakashima, Tetsuro Ito.

  • Towards a Webpage-Based Bibliographic Manager. Dinh-Trung Dang, Yee Fan Tan, Min-Yen Kan.

  • Spacio-Temporal Analysis Using the Web Archive System Based on Ajax. Suguru Yoshioka, Masumi Morii, Shintaro Matsushima, Seiichi Tani.

  • Mining a Web2.0 Service for the Discovery of Semantically Similar Terms: A Case Study with Del.icio.us. Kwan Yi.

  • Looking for Entities in Bibliographic Records. Trond Aalberg, Maja Žumer.

  • Protecting Digital Library Collections with Collaborative Web Image Copy Detection. Jenq-Haur Wang, Hung-Chi Chang, Jen-Hao Hsiao.

  • Enhancing the Literature Review Using Author-Topic Profiling. Alisa Kongthon, Choochart Haruechaiyasak, Santipong Thaiprayoon.

  • Article Recommendation Based on a Topic Model for Wikipedia Selection for Schools. Choochart Haruechaiyasak, Chaianun Damrongrat.

  • On Developing Government Official Appointment and Dismissal Databank. Jyi-Shane Liu.

  • An Integrated Approach for Smart Digital Preservation System Based on Web Service. Chao Li, Ningning Ma, Chun-Xiao Xing, Airong Jiang.

  • Personalized Digital Library Framework Based on Service Oriented Architecture. Li Dong, Chun-Xiao Xing, Jin Lin, Kehong Wang.

  • Automatic Document Mapping and Relations Building Using Domain Ontology-Based Lexical Chains. Angrosh M.A., Shalini R. Urs.

  • A Paper Recommender for Scientific Literatures Based on Semantic Concept Similarity. Ming Zhang, Weichun Wang, Xiaoming Li.

  • Network of Scholarship: Uncovering the Structure of Digital Library Author Community. Monica Sharma, Shalini R. Urs.

  • Understanding Collection Understanding with Collage. Sally Jo Cunningham, Erin Bennett.

  • Person Specific Document Retrieval Using Face Biometrics. Vikram T.N, Shalini R. Urs, K. Chidananda Gowda.

  • The Potential of Collaborative Document Evaluation for Science. Joran Beel, Bela Gipp.

  • Automated Processing of Digitized Historical Newspapers: Identification of Segments and Genres. Robert B. Allen, Ilya Waldstein, Weizhong Zhu.

  • Arabic Manuscripts in a Digital Library Context. Sulieman Salem Alshuhri.

  • Discovering Early Europe in Australia: The Europa Inventa Resource Discovery Service. Toby Burrows.

  • Mapping the Question Answering Domain. Mohan John Blooma, Alton Yeow Kuan Chua, Dion Hoe-Lian Goh.

  • A Scavenger Grid for Intranet Indexing. Ndapandula Nakashole, Hussein Suleman.

  • A Study of Web Preservation for DMP, NDAP, and TELDAP, Taiwan. Shu-Ting Tsai, Kuan-Hua Huang.

  • Measuring Public Accessibility of Australian Government Web Pages. Yang Sok Kim, Byeong Ho Kang, Raymond Williams.

  • Named Entity Recognition for Improving Retrieval and Translation of Chinese Documents. Rohini K. Srihari, Erik Peterson.

  • Current Approaches in Arabic IR: A Survey. Mohammed Mustafa, Hisham AbdAlla, Hussein Suleman.

  • A Bilingual Information Retrieval Thesaurus: Design and Value Addition with Online Lexical Tools. K. S. Raghavan, A. Neelameghan.

  • Entity-Based Classification of Web Page in Search Engine. Yicen Liu, Mingrong Liu, Liang Xiang, Qing Yang.

  • MobiTOP: Accessing Hierarchically Organized Georeferenced Multimedia Annotations. Thi Nhu Quynh Kim, Khasfariyati Razikin, Dion Hoe-Lian Goh, Quang Minh Nguyen, Yin Leng Theng, Ee-Peng Lim, Aixin Sun, Chew Hung Chang, Kalyani Chatterjea.

  • Social Tagging in Digital Archives. Shihn-Yuarn Chen, Yu-Ying Teng, Hao-Ren Ke.

  • Editor Networks and Making of a Science: A Social Network Analysis of Digital Libraries Journals. Monica Sharma, Shalini R. Urs.

  • Empowering Doctors through Information and Knowledge. Anjana Chattopadhyay.

    Monday, November 17, 2008

    Software Announcement: LuSql: Database to Lucene indexing

    LuSql is a simple but powerful tool for building Lucene indexes from relational databases. It is a command-line Java application for the construction of a Lucene index from an arbitrary SQL query of a JDBC-accessible SQL database. It allows a user to control a number of parameters, including the SQL query to use, individual indexing/storage/term-vector nature of fields, analyzer, stop word list, and other tuning parameters. In its default mode it uses threading to take advantage of multiple cores.

    LuSql can handle complex queries, allows for additional per record sub-queries, and has a plug-in architecture for arbitrary Lucene document manipulation. Its only dependencies are three Apache Commons libraries, the Lucene core itself, and a JDBC driver.
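    The core idea described above (one SQL result row becomes one document of named fields, with an optional per-record sub-query adding more fields) can be sketched in a few lines. This is not LuSql's code or its command-line interface: LuSql is a Java application that writes to a real Lucene index over JDBC. The sketch below is Python with an in-memory SQLite database, and the table names, the `rows_to_documents` helper and the `extra` field are all hypothetical.

```python
import sqlite3


def rows_to_documents(conn, query, subquery=None):
    """Turn each row of `query` into a dict of field name -> value.

    If `subquery` is given, it is run once per record with the record's
    `id` as its single parameter, and its rows are stored under "extra".
    """
    conn.row_factory = sqlite3.Row
    docs = []
    for row in conn.execute(query):
        doc = dict(row)
        if subquery is not None:
            doc["extra"] = [tuple(r) for r in conn.execute(subquery, (doc["id"],))]
        docs.append(doc)
    return docs


# Toy data standing in for a real JDBC-accessible database (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE article (id INTEGER PRIMARY KEY, title TEXT, abstract TEXT);
    CREATE TABLE author (article_id INTEGER, name TEXT);
    INSERT INTO article VALUES (1, 'Indexing with threads', 'A short abstract.');
    INSERT INTO article VALUES (2, 'Faceted search', 'Another abstract.');
    INSERT INTO author VALUES (1, 'A. Writer');
    INSERT INTO author VALUES (1, 'B. Coder');
""")

docs = rows_to_documents(
    conn,
    "SELECT id, title, abstract FROM article ORDER BY id",
    "SELECT name FROM author WHERE article_id = ? ORDER BY name",
)
for doc in docs:
    print(doc)
```

    In a real indexer each of these dicts would then be handed to the index writer, with per-field choices about indexing, storage and term vectors.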

    LuSql has been extensively tested, including a large 6+ million full-text & article metadata document collection, producing an 86GB Lucene index.

    I am the author of the LuSql software.


    Wednesday, November 12, 2008

    New Book: Semantic Digital Libraries

    I am looking forward to getting hold of this just-announced book, Semantic Digital Libraries, Editors: Sebastian Ryszard Kruk, DERI NUI, Galway, Bill McDaniel, DERI NUI, Galway. Springer-Verlag, Heidelberg (DE) 2009, XVI, 246 p. 1 illus., Hardcover ISBN: 978-3-540-85433-3.

    The site for the book includes Tutorial on Semantic Digital Libraries, a tutorial presented at JCDL2008, as well as a faceted searchable interface to the (extensive and useful) links described in the book.

    • Introduction
    • Part I - Introduction to Digital Libraries and Semantic Web
      • Digital Libraries and Knowledge Organization
      • Semantic Web and Ontologies
      • Social Semantic Information Spaces
    • Part II - A Vision of Semantic Digital Libraries
      • Goals of Semantic Digital Libraries
      • Architecture of Semantic Digital Libraries
      • Long-time Preservation
    • Part III - Ontologies for Semantic Digital Libraries
      • Bibliographic Ontology
      • Community-aware Ontologies
    • Part IV - Prototypes of Semantic Digital Libraries
      • JeromeDL - the Social Semantic Digital Library
      • The BRICKS Digital Library Infrastructure
      • Semantics in Greenstone
    • Part V - Building the Future - Semantic Digital Libraries in Use
      • Hyperbooks
      • Semantic Digital Libraries for Archiving
      • Evaluation of Semantic and Social Technologies for Digital Libraries
      • Conclusions: The Future of Semantic Digital Libraries

    Monday, November 03, 2008

    Opportunistic Software Systems Development

    In the 25th anniversary issue (November/December 2008 (vol. 25 no. 6)) of IEEE Software, my NRC colleague Anatol Kark is part of the editorial team for the special issue on "Opportunistic Software Systems Development".

    These are all great articles, and I particularly like the Jansen et al. article ("Pragmatic and Opportunistic Reuse in Innovative Start-up Companies") and feel that almost everyone who is trying to bring their organization's IT into the 21st century should be forced to read the Gamble et al. article ("Monoliths to Mashups: Increasing Opportunistic Assets").

    Wednesday, October 29, 2008

    "The Thistle Amongst the Lilies"

    I have to break from the usual content of this blog to point out to all the bagpipers who read this blog - there is at least one - my Montreal / Black Watch / 78th Fraser Highlanders friend Jeff McCarthy's new book of pipe music called "The Thistle Amongst the Lilies: A Collection of Original Compositions by Montreal Pipers For The Great Highland Bagpipes".

    I'm going to order one. You should too! :-)

    Tuesday, October 28, 2008

    Fantastic Viral Campaign

    The Pomegranate Phone has a great campaign: make sure you look at all of the features before checking out the release date.

    And yes, I do want one!

    Thursday, October 23, 2008

    Springer to acquire BioMed Central Group

    I just read, via Peter Suber's Open Access News, that this happened earlier this month (more at the BioMed Central Blog).

    I must admit I am rather surprised by this turn of events.

    Wednesday, October 22, 2008

    Ukraine law mandating open access to publicly funded research

    I have just discovered that Ukraine passed a law[1] in January 2007
    mandating Open Access to publicly funded research. This was done after
    extensive consultation and lobbying[2,3]:

    • "Since January 2007 Ukraine has a law mandating open access to publicly funded researches.

    • It was widely supported by most of the Parliament members.

    • And it is already the second parliamentary inquiry mandating the
      Cabinet of Ministers to take actions on creating favorable conditions
      for developing open access repositories in archives, libraries,
      museums, scientific and research institutions with open access
      condition to state funded researches."

    [1]Law of Ukraine On the principles of developing information society in Ukraine (in Ukrainian).

    [2]Kuchma, I. 2007. Developing National Open Access Policies: Ukrainian Case Study. Proceedings ELPUB2007 Conference on Electronic Publishing. Vienna, Austria, June 2007.

    [3]Kuchma, I. 2008. Open Access, Equity, and Strong Economy in Developing and Transition Countries: Policy Perspective. Serials Review 34:1:13-20.

    [4]Kuchma, I. 2008. Open Access in Ukraine: Cooperation with the policy makers. Open Access Repositories Workshop, Ahmadu Bello University, Zaria, Nigeria.

    Thursday, October 16, 2008

    Second Life Beagle trip

    Wow! This is so very cool:

    "To commemorate the 150th anniversary of Darwin’s On the Origin of Species by Means of Natural Selection, the University of Cincinnati has recreated the Galapagos Islands, where Darwin conducted some of his famous research, in Second Life. The project is part of the university’s 2009 Darwin Sesquicentennial Celebration.

    By January 2009, all avatars will be able to retrace Darwin’s steps — from his 1832 journey to South America aboard the Beagle to his tours of the islands — with the help of a wind-surfing tour guide. Archived audio and video clips, as well as live events, will be available in the Darwin Celebration Theater and Gallery."

    From: Darwin's Famous Journey Is Recreated in Second Life, The Wired Campus, October 16, 2008

    Friday, October 03, 2008

    Science (and life??) through (augmented reality) Semantic Goggles

    The FP6 CINeSPACE project (Experiencing Urban Film and Cultural Heritage) creates an augmented reality by combining GIS information with semantic technology. The result is a location- and orientation-aware binocular-like device that overlays multimedia information based on - among other things - what the user is looking at.

    This is a great concept and prototype, but could we take it a little further and generalize it? Say, to semantically enhanced reality goggles that allow you to select a particular semantic view of the world, including scientific and social views? Put them on and toggle the "Biological taxonomy" semantic view while you are walking through the rainforest and you have species names overlaid on your enhanced reality; identified poisonous plants and animals are marked with a bright red "Do not eat" and "Avoid", respectively; identified endangered species are marked with an "Endangered: Do not step on / touch / damage / pick"; and when a dangerous human-eating animal is recognized close-up, perhaps your goggles would go opaque and flash "Don't Panic". ;-)

    Hop on a plane to Devon Island and switch to "Geology" and the goggles make use of their hyperspectral sensor to identify rock types as you hike, supplemented by geological ontologies and geological mapping information, as latitude/longitude second lines are projected across the landscape.

    And as you are hiking across the tundra, perhaps - in an Agent-like manner - the "Biological Taxonomy" interrupts when it detects the movements of a polar bear in the distance!

    Head back home and switch to the "Social Networking" view, and as you walk through the mall - through facial recognition and subsequent information gleaned from social networking sites and other human-oriented data sets and ontologies - the people walking by you are augmented by a colour coding based on some criteria you have defined, like "History of violence" or "SF likes long walks on the beach" or "Recently dissed your blog"...


    Saturday, September 27, 2008

    2006 vs. 2008: Doubling of Gold Open Access articles!

    As reported in Heather Morrison's blog, The Imaginary Journal of Poetic Economics.

    An author and a scientific publisher

    I am following with a sense of detached fascination UBC researcher Dr. Rosie Redfield's saga of her quest to get her CIHR funded article published in a scientific journal and also made available Open Access.

    While her blog is not one dedicated to Open Access but instead to her research ("Thinking about our research into the mechanism, function and evolution of DNA uptake by Haemophilus influenzae and other bacteria"), it is clear that she is spending more time and accruing more frustration dealing with this particular issue than she would want to be, or should be...

    Update: 2008 Nov 12: It seems that Dr. Redfield has given up on Elsevier, and has decided to stop publishing articles with this publisher: "...I won't be submitting to any Elsevier journals in the future."

    October 14 2008: Open Access Day & PLoS 5th Birthday

    This Oct 14 is Open Access Day and the 5th publishing anniversary of the first PLoS journal, so PLoS and others are celebrating both with a number of events, T-shirts, buttons, a blog competition, a flyer, downloadable posters, a bookmark, etc.

    PS. I wonder if they will be releasing the T-Shirt designs with a Creative Commons
    license, so anyone can print a T-shirt?

    Science Blogging Challenge: Get a Senior Scientist Blogging

    As reported on the Science Blogging 2008: London forum, this clever challenge was announced at the London Science Blogging Conference August 30 2008.


    Monday, September 22, 2008

    Job ad: Scientific Data Management Specialist

    The following excerpt from an ad for a Scientific Data Management Specialist bodes well for the prospects of this (relatively) nascent profession:
    Processing, soliciting, and providing assistance with data submissions for scientific data from genome sequencing and genotyping experiments into existing databases, analysis pipelines and associated data flows. Developing and improving the infrastructure supporting these systems.

    Required Skills
    • Formal Education PhD
    • Scripting experience in perl or related language
    • Experience with SQL
    • Experience with LINUX/UNIX
    • Ability to use Microsoft Excel and related applications
    • Proven record solving related problems
    Desired Skills
    • Knowledge of genetics, especially human genetics
    • Experience with large data sets
    • XML/XSLT and related web based tools
    • Experience with array data, especially expression or genotyping data
    • C/C++
    • Experience with grid computing (LSF,SunGrid, etc.)
    • QA filtering of genotype data (HWE, non-Mendelian segregation)
    • Experience with the NCBI dbSNP or dbGaP databases
    • Experience with the NCBI Trace or GenBank databases

    Thursday, September 18, 2008

    Katta released: Lucene-on-the-Grid!

    I am excited about the release of Katta, a technology built on Lucene, Zookeeper and Hadoop that allows Lucene indexes to be distributed across a number of nodes for distributed and fault-tolerant search. Note that it does not create the indexes; it simply deploys existing indexes onto this infrastructure.
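    The general pattern behind searching an index split across nodes is scatter-gather: send the query to every shard, take each shard's best hits, then merge them into one ranked list. The Python sketch below illustrates only that pattern and is not Katta's API. Shards are plain in-memory lists, the score is a toy term count, and the function names are hypothetical; replication and node failure handling are left out.

```python
import heapq


def search_shard(shard, term, k):
    """Return up to k (score, doc_id) hits from one shard, best first.

    The score is simply how often `term` occurs in the text (toy scoring).
    """
    hits = [(text.lower().split().count(term), doc_id) for doc_id, text in shard]
    hits = [hit for hit in hits if hit[0] > 0]
    return heapq.nlargest(k, hits)


def search_all(shards, term, k):
    """Scatter the query to every shard, then merge the per-shard top-k."""
    partial = [search_shard(shard, term, k) for shard in shards]
    return heapq.nlargest(k, (hit for hits in partial for hit in hits))


# Two shards standing in for indexes deployed on two nodes (assumption).
shards = [
    [("a1", "lucene index on a grid"), ("a2", "grid grid computing")],
    [("b1", "fault tolerant search"), ("b2", "lucene lucene lucene search")],
]

print(search_all(shards, "lucene", 2))
```

    Because each shard returns only its own top k, the merge step never handles more than k hits per shard, whatever the shard sizes.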

    ECDL (European Conference on Digital Libraries) 2008 Proceedings Available

    Research and Advanced Technology for Digital Libraries 12th European Conference, ECDL 2008, Aarhus, Denmark, September 14-19, 2008.
    Lecture Notes in Computer Science. Volume 5173: Research and Advanced Technology for Digital Libraries. Ed. Birte Christensen-Dalsgaard, Donatella Castelli, Bolette Ammitzbøll Jurik, Joan Lippincott.

    Wednesday, September 10, 2008

    Canadian Minister of Industry Accepts S&T Strategy's Sub-Priorities Recommended by the Science, Technology and Innovation Council

    The Industry Canada minister accepted (Sept 2, 2008) recommendations from the Science, Technology and Innovation Council on the sub-priorities of the four priority areas identified in the 2007 Science and Technology (S&T) Strategy (The State of Science & Technology in Canada, 2006, Council of Canadian Academies). The recommended sub-priorities are:
    • S&T priority: Environmental science and technologies
      Sub-priorities: Water (health, energy, security); cleaner methods of extracting, processing and using hydrocarbon fuels, including reduced consumption of these fuels

    • S&T priority: Natural resources and energy
      Sub-priorities: Energy production in the oil sands; Arctic (resource production, climate change adaptation, monitoring); biofuels, fuel cells and nuclear energy

    • S&T priority: Health and related life sciences and technologies
      Sub-priorities: Regenerative medicine; neuroscience; health in an aging population; biomedical engineering and medical technologies

    • S&T priority: Information and communications technologies
      Sub-priorities: New media, animation and games; wireless networks and services; broadband networks; telecom equipment


    Australian government innovation report Part II: "Innovation in Government"

    The previously reported "Review of the National Innovation System Report - Venturous Australia" -- interestingly and surprisingly -- includes a whole section entitled "Innovation in Government".

    Its recommendations are:
    • Recommendation 10.1: Consideration should be given to extending the platform created to enforce payments and administer income contingent loans through the tax system; for instance, by extending income contingent loans for tertiary education outside universities and for sole trader entrepreneurs seeking to fund innovative projects.

    • Recommendation 10.2: An advisory committee of web 2.0 practitioners should be established to propose and help steer governments as they experiment with web 2.0 technologies and ideas.

    • Recommendation 10.3: An Advocate for Government Innovation should be established to promote innovation in the public sector.

    • Recommendation 10.4: A rigorous policy of evaluating all Australian Government innovation programs - and other relevant programs - be established. In a way analogous to the requirement that new regulation cannot be implemented without adequate regulatory impact analysis, a policy should be adopted whereby new programs cannot be implemented without an adequate evaluation strategy and funding for evaluation including the collection of 'base data' to evaluate the effects of the program.

    • Recommendation 10.5: Experimentation in innovative policy and administration should be a major theme of the current refashioning of federal relations. States and Territories should be able to bid for federal funds to pioneer innovative approaches and to have their innovations properly and independently evaluated. This could be taken up within the COAG National Partnership Rewards payments currently being negotiated.

    • Recommendation 10.6: The Australian Government should recognise its role as an active participant in facilitating innovation through procurement practices. In this context, the Government should: actively manage its ability to enable and demand innovation in procured services and products given its significant presence as a major purchaser; in procurement, be open to participating in risk sharing in relation to innovation demanded; explore the use of forward purchase commitments as a means of fostering more innovative approaches to government procurement; and work with the State and Territories to implement a pilot Small Business Innovation Contracting program based on the US SBIR design principles, to strengthen the growth of highly innovative firms in Australia. The Advocate for Government Innovation should operate as a source of expertise and seed funding for the resourcing of such approaches to procurement.

    • Recommendation 11.1: National innovation priorities as set out in this Review, be a focus of innovation policy and activities and the National Innovation Council be charged with ongoing evaluation of the alignment of public innovation policy with National Research and Innovation priorities.

    • Recommendation 12.1: The Prime Minister's Science, Engineering and Innovation Council should be replaced by a new National Innovation Council, chaired by the Prime Minister, and supported by a small but high level Office of Innovation. An International Innovation Advisory Panel would be formed to provide advice to the Council on international engagement.

    • Recommendation 12.2: To more effectively coordinate the innovation activities of public sector research agencies and to provide a source of coordinated advice to the National Innovation Council, a Research Coordination Council should be established.

    • Recommendation 12.3: The Minister for Innovation should be a joint signatory to any Cabinet proposals from across government significantly bearing on the national innovation agenda, to ensure co-ordination.

    • Recommendation 12.4: Innovation Australia should be the single major agency responsible for delivering innovation program support for firms. Such programs would be delivered through the AusIndustry network.

    • Recommendation 12.5: The Australian Government and State and Territory governments should adopt a framework of principles for innovation interventions (as set out in this Review) to enhance consistency in approach across governments and improve the overall accessibility and efficiency of the suite of interventions.

    • Recommendation 12.6: That governments review the existing suite of programs and develop any new programs in the light of these principles. All program proposals should contain clear ex ante evaluation criteria, and provide for the provision or collection of relevant base line data before program implementation. Design principles and rules should be applied consistently. (See proposed design principles in Chapter 4 and Annex 4)

    • Recommendation 12.7: That senior government officials develop a collaborative mechanism to oversee the agreed approach and report periodically to relevant Australian Government and State and Territory ministers.

    • Recommendation 12.8: That common metrics, performance indicators and mechanisms for collecting and sharing data be developed and adopted by all jurisdictions.

    • Recommendation 12.9: That governments together develop a single mechanism (such as a web portal) for providing information to clients about access to the full range of Australian and State and Territory government innovation programs.

    • Recommendation 12.10: The ABS should be resourced to ensure the longevity and international consistency of innovation data collections and their availability to facilitate effective policy development. The National Innovation Council should advise where additional data collection is required to produce its Annual Statement on Innovation.

    • Recommendation 12.11: An Annual Statement on Innovation should be prepared by the National Innovation Council and incorporate a clear set of framework indicators. (An initial proposal for these indicators is set out in Annex 12).

    • Recommendation 12.12: The Australian Government, with the guidance of the National Innovation Council, should establish rigorous and consistent evaluation processes for innovation programs in line with the principle that the function should be carried out on an arm's-length and transparent basis.

    • Recommendation 12.13: A National Centre for Innovation Research should be established to advance knowledge of the innovation system through high quality, independent research which is strongly relevant to policy and practice.


    Australian innovation report recommends Open Access to research outputs, Creative Commons for government documents, open standards for publishing

    The Australian government has just released a report "Review of the National Innovation System Report - Venturous Australia". Given the similarities in the size and nature of our economies and of our innovation, higher education and R&D environments, this report should be examined by Canadians interested in our own national innovation system.

    The Australian Minister for Innovation, Industry, Science and Research (just having a ministry so named is a Good Thing!), Kim Carr, discussed this report in a speech released yesterday. Among other things of interest to those following national innovation and R&D strategy, he talked about Creative Commons and Open Access to research outputs:
    It is embodied in a series of recommendations aimed at unlocking public information and content, including the results of publicly funded research.

    The review panel recommends making this material available under a creative commons licence through:
    • machine searchable repositories, especially for scientific papers and data

    • cultural agencies, collections and institutions, which would be funded to reflect their role in innovation

    • and the internet, where it would be freely available to the world.

    ...The arguments for stepping out first on open access are the same as the arguments for stepping out first on emissions trading – the more willing we are to show leadership on this, the more chance we have of persuading other countries to reciprocate.
    This speech reflects a number of recommendations in the report:
    • Recommendation 7.7: Australia should establish a National Information Strategy to optimise the flow of information in the Australian economy. The fundamental aim of a National Information Strategy should be to: utilise the principles of targeted transparency and the development of auditable standards to maximise the flow of information in private markets about product quality; and maximise the flow of government generated information, research, and content for the benefit of users (including private sector resellers of information).

    • Recommendation 7.8: Australian governments should adopt international standards of open publishing as far as possible. Material released for public information by Australian governments should be released under a creative commons licence.

    • Recommendation 7.9: Funding models and institutional mandates should recognise the research and innovation role and contributions of cultural agencies and institutions responsible for information repositories, physical collections or creative content and fund them accordingly.

    • Recommendation 7.10: A specific strategy for ensuring the scientific knowledge produced in Australia is placed in machine searchable repositories be developed and implemented using public funding agencies and universities as drivers.

    • Recommendation 7.11: Action should be taken to establish an agreed framework for the designation, funding models, and access frameworks for key collections in recognition of the national and international significance of many State and Territory collections (similar to the frameworks and accords developed around Australia's Major Performing Arts Companies).

    • Recommendation 7.14: To the maximum extent practicable, information, research and content funded by Australian governments - including national collections - should be made freely available over the internet as part of the global public commons. This should be done whilst the Australian Government encourages other countries to reciprocate by making their own contributions to the global digital public commons.

    Thursday, September 04, 2008

    "Big Data" Nature special issue

    The latest Nature -- Vol 455(7209), 4 September 2008 -- is a special issue on "Big Data". Articles (& editorial) include:

    • Editorial (2008). Community cleverness required. Nature, 455 (7209), 1-1. DOI: 10.1038/455001a

    • David Goldston (2008). Big data: Data wrangling. Nature, 455 (7209), 15-15. DOI: 10.1038/455015a

    • Cory Doctorow (2008). Big data: Welcome to the petacentre. Nature, 455 (7209), 16-21. DOI: 10.1038/455016a

    • Mitch Waldrop (2008). Big data: Wikiomics. Nature, 455 (7209), 22-25. DOI: 10.1038/455022a

    • Clifford Lynch (2008). Big data: How do your data grow? Nature, 455 (7209), 28-29. DOI: 10.1038/455028a

    • Sue Nelson (2008). Big data: The Harvard computers. Nature, 455 (7209), 36-37. DOI: 10.1038/455036a

    • Doug Howe, Maria Costanzo, Petra Fey, Takashi Gojobori, Linda Hannick, Winston Hide, David P. Hill, Renate Kania, Mary Schaeffer, Susan St Pierre, Simon Twigger, Owen White, Seung Yon Rhee (2008). Big data: The future of biocuration. Nature, 455 (7209), 47-50. DOI: 10.1038/455047a

    Wednesday, September 03, 2008

    Solar lamps turn on too early....

    This is a little off-topic, but I need to rant: I have several types of those inexpensive outdoor solar-powered light thingies. You know, they come with a plastic spike to plant them in the ground, or they hang from the side of your house (Wikipedia calls them "Solar lamps").

    These are great things, but they turn on too early, when it is still too bright out, wasting energy. Right now, Sept 2ish, where I live (Ottawa area, Canada), they turn on about 45+ minutes before the sun goes down. Yes, you can see them, but these things are so low intensity that they are not useful until around when the sun goes down. That means they burn 45-60 minutes of power while lit but useless. As they often do not have enough juice to stay lit all night, this makes a difference.
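
    To put a rough number on that waste, here is a small back-of-the-envelope sketch. The 6-hour runtime per charge is purely a hypothetical figure for illustration (I have not measured my lamps), and the function name is my own:

```python
def wasted_fraction(early_minutes: float, runtime_hours: float) -> float:
    """Fraction of a lamp's runtime per charge spent lit before it is useful."""
    if runtime_hours <= 0:
        raise ValueError("runtime_hours must be positive")
    # Cap at 1.0: a lamp cannot waste more than its whole charge.
    return min(early_minutes / (runtime_hours * 60.0), 1.0)


# Hypothetical assumption: the lamp stays lit about 6 hours on a full charge.
ASSUMED_RUNTIME_HOURS = 6.0

for early in (45, 60):
    share = wasted_fraction(early, ASSUMED_RUNTIME_HOURS)
    print(f"{early} min early: {share:.1%} of the charge is wasted")
```

    Under that assumed runtime, turning on 45-60 minutes early uses up roughly an eighth to a sixth of the charge before the light is any use; a lamp with a shorter runtime would lose proportionally more.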

    Adding a way to adjust the light level at which these little things turn on is one possible solution, but it would increase their price slightly. Instead, it would be nice if the manufacturers slightly reduced the light level that triggers them. Either way, it would be nice if they indicated the ambient light level (illuminance, in lux) at which the lamps turn on, so consumers could select a solar lamp that comes on at the right light level.