Saturday, September 27, 2008

2006 vs. 2008: Doubling of Gold Open Access articles!

As reported in Heather Morrison's blog, The Imaginary Journal of Poetic Economics.

An author and a scientific publisher

I am following with a sense of detached fascination UBC researcher Dr. Rosie Redfield's saga of her quest to get her CIHR funded article published in a scientific journal and also made available Open Access.

While her blog is not one dedicated to Open Access but instead to her research ("Thinking about our research into the mechanism, function and evolution of DNA uptake by Haemophilus influenzae and other bacteria"), it is clear that she is spending more time and accruing more frustration dealing with this particular issue than she would want to be, or should be...

Update: 2008 Nov 12: It seems that Dr. Redfield has given up on Elsevier, and has decided to stop publishing articles with this publisher: "...I won't be submitting to any Elsevier journals in the future."

October 14 2008: Open Access Day & PLoS 5th Birthday

This Oct 14 is Open Access Day and the 5th publishing anniversary of the first PLoS journal, so PLoS and others are celebrating both with a number of events, T-shirts, buttons, a blog competition, a flyer, downloadable posters, a bookmark, etc.

PS. I wonder if they will be releasing the T-shirt designs with a Creative Commons license, so anyone can print a T-shirt?

Science Blogging Challenge: Get a Senior Scientist Blogging

As reported on the Science Blogging 2008: London forum, this clever challenge was announced at the London Science Blogging Conference August 30 2008.


Monday, September 22, 2008

Job ad: Scientific Data Management Specialist

The following excerpt from an ad for a Scientific Data Management Specialist bodes well for the prospects of this (relatively) nascent profession:
Processing, soliciting, and providing assistance with data submissions for scientific data from genome sequencing and genotyping experiments into existing databases, analysis pipelines and associated data flows. Developing and improving the infrastructure supporting these systems.

Required Skills
  • Formal Education PhD
  • Scripting experience in perl or related language
  • Experience with SQL
  • Experience with LINUX/UNIX
  • Ability to use Microsoft Excel and related applications
  • Proven record solving related problems
Desired Skills
  • Knowledge of genetics, especially human genetics
  • Experience with large data sets
  • XML/XSLT and related web based tools
  • Experience with array data, especially expression or genotyping data
  • C/C++
  • Experience with grid computing (LSF,SunGrid, etc.)
  • QA filtering of genotype data (HWE, non-Mendelian segregation)
  • Experience with the NCBI dbSNP or dbGaP databases
  • Experience with the NCBI Trace or GenBank databases

Thursday, September 18, 2008

Katta released: Lucene-on-the-Grid!

I am excited about the release of Katta, a technology built on Lucene, ZooKeeper and Hadoop that allows Lucene indexes to be distributed across a number of nodes for distributed and fault-tolerant search. Note that it does not create the indexes; it simply deploys existing indexes onto this infrastructure.
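To make the idea of distributed, fault-tolerant search a bit more concrete, here is a minimal Python sketch. It is not Katta's API or its actual placement algorithm: the function names (`assign_shards`, `search_shard`, `distributed_search`), the round-robin placement, and the toy term-frequency scoring are all assumptions made for illustration. It only shows the general pattern: pre-built index shards are copied to several nodes, each shard is searched wherever a live replica exists, and the per-shard hits are merged into one ranked list.

```python
import heapq


def assign_shards(shards, nodes, replicas=2):
    """Hypothetical round-robin placement: each shard is deployed to
    `replicas` distinct nodes (requires replicas <= len(nodes))."""
    placement = {}
    for i, shard in enumerate(shards):
        placement[shard] = [nodes[(i + r) % len(nodes)] for r in range(replicas)]
    return placement


def search_shard(index, term):
    """Toy stand-in for a Lucene index: a dict of doc id -> text.
    The score is simply how often the term occurs in the text."""
    hits = []
    for doc_id, text in index.items():
        score = text.lower().split().count(term.lower())
        if score:
            hits.append((score, doc_id))
    return hits


def distributed_search(term, indexes, placement, live_nodes, k=3):
    """Search every shard that still has a live replica, then merge
    the per-shard hits into a single top-k list."""
    all_hits = []
    for shard, replica_nodes in placement.items():
        if not any(node in live_nodes for node in replica_nodes):
            continue  # every replica of this shard is down; skip it
        all_hits.extend(search_shard(indexes[shard], term))
    return heapq.nlargest(k, all_hits)
```

With two replicas per shard, losing any single node still leaves every shard searchable; results only degrade when all nodes holding a given shard are down.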

ECDL (European Conference on Digital Libraries) 2008 Proceedings Available

Research and Advanced Technology for Digital Libraries 12th European Conference, ECDL 2008, Aarhus, Denmark, September 14-19, 2008.
Lecture Notes in Computer Science. Volume 5173: Research and Advanced Technology for Digital Libraries. Ed. Birte Christensen-Dalsgaard, Donatella Castelli, Bolette Ammitzbøll Jurik, Joan Lippincott.
Table of Contents:

Wednesday, September 10, 2008

Canadian Minister of Industry Accepts S&T Strategy's Sub-Priorities Recommended by the Science, Technology and Innovation Council

The Industry Canada minister accepted (Sept 2 2008) recommendations from the Science, Technology and Innovation Council on the sub-priorities of the four priority areas identified in the 2007 Science and Technology (S&T) Strategy (The State of Science & Technology in Canada, 2006, Council of Canadian Academies). The recommended sub-areas are:
  • S&T priority: Environmental science and technologies
    Sub-priorities: Water (health, energy, security); cleaner methods of extracting, processing and using hydrocarbon fuels, including reduced consumption of these fuels

  • S&T priority: Natural resources and energy
    Sub-priorities: Energy production in the oil sands; Arctic (resource production, climate change adaptation, monitoring); biofuels, fuel cells and nuclear energy

  • S&T priority: Health and related life sciences and technologies
    Sub-priorities: Regenerative medicine; neuroscience; health in an aging population; biomedical engineering and medical technologies

  • S&T priority: Information and communications technologies
    Sub-priorities: New media, animation and games; wireless networks and services; broadband networks; telecom equipment


Australian government innovation report Part II: "Innovation in Government"

The previously reported "Review of the National Innovation System Report - Venturous Australia" -- interestingly and surprisingly -- includes a whole section entitled "Innovation in Government".

Its recommendations are:
  • Recommendation 10.1: Consideration should be given to extending the platform created to enforce payments and administer income contingent loans through the tax system; for instance, by extending income contingent loans for tertiary education outside universities and for sole trader entrepreneurs seeking to fund innovative projects.

  • Recommendation 10.2: An advisory committee of web 2.0 practitioners should be established to propose and help steer governments as they experiment with web 2.0 technologies and ideas.

  • Recommendation 10.3: An Advocate for Government Innovation should be established to promote innovation in the public sector.

  • Recommendation 10.4: A rigorous policy of evaluating all Australian Government innovation programs - and other relevant programs - be established. In a way analogous to the requirement that new regulation cannot be implemented without adequate regulatory impact analysis, a policy should be adopted whereby new programs cannot be implemented without an adequate evaluation strategy and funding for evaluation including the collection of 'base data' to evaluate the effects of the program.

  • Recommendation 10.5: Experimentation in innovative policy and administration should be a major theme of the current refashioning of federal relations. States and Territories should be able to bid for federal funds to pioneer innovative approaches and to have their innovations properly and independently evaluated. This could be taken up within the COAG National Partnership Rewards payments currently being negotiated.

  • Recommendation 10.6: The Australian Government should recognise its role as an active participant in facilitating innovation through procurement practices. In this context, the Government should: · actively manage its ability to enable and demand innovation in procured services and products given its significant presence as a major purchaser; · in procurement, be open to participating in risk sharing in relation to innovation demanded; · explore the use of forward purchase commitments as a means of fostering more innovative approaches to government procurement; and · work with the State and Territories to implement a pilot Small Business Innovation Contracting program based on the US SBIR design principles, to strengthen the growth of highly innovative firms in Australia. The Advocate for Government Innovation should operate as a source of expertise and seed funding for the resourcing of such approaches to procurement.

  • Recommendation 11.1: National innovation priorities as set out in this Review, be a focus of innovation policy and activities and the National Innovation Council be charged with ongoing evaluation of the alignment of public innovation policy with National Research and Innovation priorities.

  • Recommendation 12.1: The Prime Minister's Science, Engineering and Innovation Council should be replaced by a new National Innovation Council, chaired by the Prime Minister, and supported by a small but high level Office of Innovation. An International Innovation Advisory Panel would be formed to provide advice to the Council on international engagement.

  • Recommendation 12.2: To more effectively coordinate the innovation activities of public sector research agencies and to provide a source of coordinated advice to the National Innovation Council, a Research Coordination Council should be established.

  • Recommendation 12.3: The Minister for Innovation should be a joint signatory to any Cabinet proposals from across government significantly bearing on the national innovation agenda, to ensure co-ordination.

  • Recommendation 12.4: Innovation Australia should be the single major agency responsible for delivering innovation program support for firms. Such programs would be delivered through the AusIndustry network.

  • Recommendation 12.5: The Australian Government and State and Territory governments should adopt a framework of principles for innovation interventions (as set out in this Review) to enhance consistency in approach across governments and improve the overall accessibility and efficiency of the suite of interventions.

  • Recommendation 12.6: That governments review the existing suite of programs and develop any new programs in the light of these principles. All program proposals should contain clear ex ante evaluation criteria, and provide for the provision or collection of relevant base line data before program implementation. Design principles and rules should be applied consistently. (See proposed design principles in Chapter 4 and Annex 4)

  • Recommendation 12.7: That senior government officials develop a collaborative mechanism to oversee the agreed approach and report periodically to relevant Australian Government and State and Territory ministers.

  • Recommendation 12.8: That common metrics, performance indicators and mechanisms for collecting and sharing data be developed and adopted by all jurisdictions.

  • Recommendation 12.9: That governments together develop a single mechanism (such as a web portal) for providing information to clients about access to the full range of Australian and State and Territory government innovation programs.

  • Recommendation 12.10: The ABS should be resourced to ensure the longevity and international consistency of innovation data collections and their availability to facilitate effective policy development. The National Innovation Council should advise where additional data collection is required to produce its Annual Statement on Innovation.

  • Recommendation 12.11: An Annual Statement on Innovation should be prepared by the National Innovation Council and incorporate a clear set of framework indicators. (An initial proposal for these indicators is set out in Annex 12).

  • Recommendation 12.12: The Australian Government, with the guidance of the National Innovation Council, should establish rigorous and consistent evaluation processes for innovation programs in line with the principle that the function should be carried out on an arm's length and transparent basis.

  • Recommendation 12.13: A National Centre for Innovation Research should be established to advance knowledge of the innovation system through high quality, independent research which is strongly relevant to policy and practice.


Australian innovation report recommends Open Access to research outputs, Creative Commons for government documents, open standards for publishing

The Australian government has just released a report, "Review of the National Innovation System Report - Venturous Australia". Given the similarities in size and nature of our economies, innovation, higher education and R&D environments, this report should be examined by Canadians interested in our own national innovation system.

The Australian minister for Innovation, Industry, Science and Research (just having a ministry so named is a Good Thing!), Kim Carr, spoke about this report in a speech released yesterday. Among other things of interest to those following national innovation and R&D strategy, he talks about Creative Commons and Open Access to research outputs:
It is embodied in a series of recommendations aimed at unlocking public information and content, including the results of publicly funded research.

The review panel recommends making this material available under a creative commons licence through:
  • machine searchable repositories, especially for scientific papers and data

  • cultural agencies, collections and institutions, which would be funded to reflect their role in innovation

  • and the internet, where it would be freely available to the world.

...The arguments for stepping out first on open access are the same as the arguments for stepping out first on emissions trading – the more willing we are to show leadership on this, the more chance we have of persuading other countries to reciprocate.
This speech reflects a number of recommendations in the report:
  • Recommendation 7.7: Australia should establish a National Information Strategy to optimise the flow of information in the Australian economy. The fundamental aim of a National Information Strategy should be to: · utilise the principles of targeted transparency and the development of auditable standards to maximise the flow of information in private markets about product quality; and · maximise the flow of government generated information, research, and content for the benefit of users (including private sector resellers of information).

  • Recommendation 7.8: Australian governments should adopt international standards of open publishing as far as possible. Material released for public information by Australian governments should be released under a creative commons licence.

  • Recommendation 7.9: Funding models and institutional mandates should recognise the research and innovation role and contributions of cultural agencies and institutions responsible for information repositories, physical collections or creative content and fund them accordingly.

  • Recommendation 7.10: A specific strategy for ensuring the scientific knowledge produced in Australia is placed in machine searchable repositories be developed and implemented using public funding agencies and universities as drivers.

  • Recommendation 7.11: Action should be taken to establish an agreed framework for the designation, funding models, and access frameworks for key collections in recognition of the national and international significance of many State and Territory collections (similar to the frameworks and accords developed around Australia's Major Performing Arts Companies).

  • Recommendation 7.14: To the maximum extent practicable, information, research and content funded by Australian governments - including national collections - should be made freely available over the internet as part of the global public commons. This should be done whilst the Australian Government encourages other countries to reciprocate by making their own contributions to the global digital public commons.

Thursday, September 04, 2008

"Big Data" Nature special issue

The latest Nature -- Vol 455(7209), 4 September 2008 -- is a special issue on "Big Data". Articles (& editorial) include:

  • Editorial (2008). Community cleverness required Nature, 455 (7209), 1-1 DOI: 10.1038/455001a

  • David Goldston (2008). Big data: Data wrangling Nature, 455 (7209), 15-15 DOI: 10.1038/455015a

  • Cory Doctorow (2008). Big data: Welcome to the petacentre Nature, 455 (7209), 16-21 DOI: 10.1038/455016a

  • Mitch Waldrop (2008). Big data: Wikiomics Nature, 455 (7209), 22-25 DOI: 10.1038/455022a

  • Clifford Lynch (2008). Big data: How do your data grow? Nature, 455 (7209), 28-29 DOI: 10.1038/455028a

  • Sue Nelson (2008). Big data: The Harvard computers Nature, 455 (7209), 36-37 DOI: 10.1038/455036a

  • Doug Howe, Maria Costanzo, Petra Fey, Takashi Gojobori, Linda Hannick, Winston Hide, David P. Hill, Renate Kania, Mary Schaeffer, Susan St Pierre, Simon Twigger, Owen White, Seung Yon Rhee (2008). Big data: The future of biocuration Nature, 455 (7209), 47-50 DOI: 10.1038/455047a

Wednesday, September 03, 2008

Solar lamps turn on too early....

This is a little off-topic, but I need to rant: I have several types of those inexpensive outdoor solar powered light thingies. You know, they come with a plastic spike to plant them in the ground or hang from the side of your house (Wikipedia calls them "Solar lamps").

These are great things, but they turn on too early, when it is still too bright out, wasting energy. Right now, Sept 2ish, where I live (Ottawa area, Canada), they turn on 45 minutes or more before the sun goes down. Yes, you can see them, but they are so dim that they are not useful until around sunset. That means they burn 45-60 minutes of power while lit but useless. As they often do not have enough juice to stay lit all night, this makes a difference.

One possible solution is to add a way of adjusting the light level at which these little things turn on, but that would increase their price slightly. Failing that, it would be nice if the manufacturers slightly reduced the light level that triggers them. Either way, it would be nice if they indicated the ambient light level (illuminance, in lux) at which the lamps turn on, so consumers could select a solar lamp that turns on at the right light level.

"Benefits of Data Sharing for Academic Health Centers"

In the September issue of PLoS Medicine, Piwowar et al. [1] examine the benefits of data sharing at academic health centers (AHC) and offer recommendations to encourage it:
  1. Commit to sharing research data as openly as possible, given privacy constraints. Streamline IRB (Institutional Review Board), technology transfer, and information technology policies and procedures accordingly.

  2. Recognize data sharing contributions in hiring and promotion decisions, perhaps as a bonus to a publication's impact factor. Use concrete metrics when available.

  3. Educate trainees and current investigators on responsible data sharing and reuse practices through class work, mentorship, and professional development. Promote a framework for deciding upon appropriate data sharing mechanisms.

  4. Encourage data sharing practices as part of publication policies. Lobby for explicit and enforceable policies in journal and conference instructions, to both authors and peer reviewers.

  5. Encourage data sharing plans as part of funding policies. Lobby for appropriate data sharing requirements by funders, and recommend that they assess a proposal's data sharing plan as part of its scientific contribution.

  6. Fund the costs of data sharing, support for repositories, adoption of sharing infrastructure and metrics, and research into best practices through federal grants and AHC funds.

  7. Publish experiences in data sharing to facilitate the exchange of best practices.

The article also has an excellent table looking at "selected attributes of example data sharing frameworks and systems".

[1]Heather A. Piwowar, Michael J. Becich, Howard Bilofsky, Rebecca S. Crowley (2008). Towards a Data Sharing Culture: Recommendations for Leadership from Academic Health Centers PLoS Medicine, 5 (9) DOI: 10.1371/journal.pmed.0050183

Tuesday, September 02, 2008

Machine Learning: Ten Challenges for the Next Ten Years

In the October 2008 special issue on inductive logic programming in Machine Learning, Dietterich et al. [1] lay out the following ten outstanding problems for the next ten years:
  1. Statistical predicate invention
  2. Generalizing across domains
  3. Learning many levels of structure
  4. Deep combination of learning and inference
  5. Learning to map between representations
  6. Learning in the large
  7. Structured prediction with intractable inference
  8. Reinforcement learning with structured time
  9. Expanding SRL (Statistical Relational Learning) to statistical relational AI
  10. Learning to debug programs

[1] Thomas G. Dietterich, Pedro Domingos, Lise Getoor, Stephen Muggleton, Prasad Tadepalli (2008). Structured machine learning: the next ten years Machine Learning, 73 (1), 3-23 DOI: 10.1007/s10994-008-5079-1

Research Data Archiving = Volumes of data? Conference...

One of the implications of research data archiving is that there will likely be large datasets on a variety of subjects. How third parties will analyse these data in a scalable fashion has not been fully addressed, but a recent conference examines some of the issues for one class of data, images and signals: Advances in Mass Data Analysis of Images and Signals in Medicine, Biotechnology, Chemistry and Food Industry (Third International Conference, MDA 2008 Leipzig, Germany, July 14, 2008). Its papers include:
  • Burcu Yılmaz, Mehmet Göktürk, Natalie Shvets (2008). User Assisted Substructure Extraction in Molecular Data Mining. Advances in Mass Data Analysis of Images and Signals in Medicine, Biotechnology, Chemistry and Food Industry, 5108, 12-26 DOI: 10.1007/978-3-540-70715-8_4

  • Franco Chiarugi, Sara Colantonio, Dimitra Emmanouilidou, Davide Moroni, Ovidio Salvetti (2008). Biomedical Signal and Image Processing for Decision Support in Heart Failure. Advances in Mass Data Analysis of Images and Signals in Medicine, Biotechnology, Chemistry and Food Industry, 5108, 38-51 DOI: 10.1007/978-3-540-70715-8_4

  • Radu Dobrescu, Loretta Ichim (2008). Automatic Data Acquisition and Signal Processing in the Field of Virology. Advances in Mass Data Analysis of Images and Signals in Medicine, Biotechnology, Chemistry and Food Industry, 5108, 52-61 DOI: 10.1007/978-3-540-70715-8_5

Previous conference: Advances in Mass Data Analysis of Signals and Images in Medicine, Biotechnology and Chemistry International Conference MDA 2006/2007, Leipzig, Germany, July 18, 2007