PeerJ responds to request from US Federal Government on challenge of reproducibility in science

This week, as part of the request from The Office of Science and Technology Policy and the National Economic Council for public comments to provide input into an upcoming update of the Strategy for American Innovation, PeerJ offered a response in answer to the following question:

“Given recent evidence of the irreproducibility of a surprising number of published scientific findings, how can the Federal Government leverage its role as a significant funder of scientific research to most effectively address the problem?”

Reproducibility is critical in science. Without it, science is unable to flourish and scientists are unable to build on the work of others. Aristotle’s dictum that there is ‘no scientific knowledge of the individual’ seemingly holds true today, as much of the research published in the 21st century is the result of building on, or testing, the findings of others.

The term reproducible research refers to the idea that the ultimate product of academic research is the paper, along with the full computational environment used to produce its results, such as the code and data (1). The full academic output can then be used to reproduce the results and create new work based on the research. Alongside reproducibility lies repeatability – the idea that anyone in the same lab can repeat the same experiment using the same methods and specimens. For science to flourish, it is imperative that reproducibility and repeatability become its cornerstones.

Science can only advance on the foundation of the trusted discoveries of others. But like any good building project, there is a financial cost to laying these foundations. Scientific research is often funded by governments and other associated funding bodies, all looking to ensure their money is spent optimally. For instance, recent research on reproducibility in the field of cancer studies at the MD Anderson Cancer Center (2) found that only 41.5%–45.4% of scientific outputs were actually reproducible by those surveyed. Other research in this area suggests a still more alarming figure of 11% (3).

The US government gives around $30 billion every year in science funding through the NIH (4), mainly distributed as research grants to academic scientists. If you were to take the lowest reproducibility rate of 11%, that could mean up to 89% of this money (over $26 billion) is wasted. As a tax-paying member of the general public, you would want to ensure that the government is able to plough your hard-earned capital into funds that yield results over and above those figures. It is therefore commendable that the Federal Government is looking to address this issue and leverage its role as a significant funder of scientific research.

Beyond the practicalities of finance, there is also an interesting dilemma. Since the middle of the 20th century, life science research concepts and technologies have rapidly grown from the discovery that DNA is the blueprint for life to sequencing and synthesizing new life altogether. Technologies like microarrays, mass spectrometry, high-throughput assays and imaging have been developed, making biology a data-rich science. With all these new tools you could reasonably expect that science would become more rigorous and precise, but with the reproducibility crisis it appears that something entirely opposite could be happening.

So how do we ensure that scientists are provided with the right conditions for their work to be reproducible?

The current state of affairs results from a combination of the complex nature of modern scientific research, a lack of accountability for researchers, and the incentives created by a publish-or-perish culture in academia.

For scientific researchers to disseminate their work, they are hugely reliant on scientific publishers. The publishing of scientific research has always had a large part to play in the visibility of research, and ultimately in the reproducibility of science. At PeerJ we believe that the more scientific outputs are made available to all, the better it is for science. We would therefore encourage the Federal Government to put more resources into enforcing open access mandates, to ensure scientific research is opened up to all.

PeerJ publishes articles using a Creative Commons CC-BY licence, which means that authors retain their own copyright while others can freely copy and reuse the articles without needing to ask for further permission. If a publisher asks an author to sign over copyright, it becomes difficult, expensive, or impossible for others to access the research. Just as we don’t believe in paywalls blocking access to research, nor do we believe in authors being unable to retain full ownership of their work. By being fully CC-BY, authors and readers don’t need to worry about sharing or reusing articles, so everyone benefits and ultimately science flourishes. The challenge facing those authors who do wish to publish under open access licensing is the proliferation of choice. Choice of licence can be a good thing, but only if the licences are interoperable. There is no single common standard among OA licences, and the recently released STM OA licences don’t necessarily operate alongside Creative Commons licences (5). We recommend a move towards a single, interoperable OA licence.

Scientific journals have a significant role to play in encouraging reproducibility in the first place. They can require more descriptive materials and methods sections and provide unlimited space for them, so that other scientists will know exactly how an experiment was conducted and how they can replicate it. At PeerJ we encourage authors to submit relevant data during the review process, and we would encourage the Federal Government to ensure that more scientific publishers are asking their authors to do so when they submit their work to journals. The current incentive structure for authors does not reward the publication of replication studies. At PeerJ we not only encourage this for our authors, but most importantly our publishing platform enables authors to do just that. We recommend that the Federal Government, and all funders, set aside financial commitment for the replication and publication of the work they fund. We also suggest that the Federal Government looks to set up a specific program incentivising authors to make their data, trackable identifiers, and materials available with publication.

We believe in an open and transparent peer review process. Journals need specialized reviewers to ensure that manuscripts describing technically or statistically advanced experiments are vetted thoroughly prior to publication. PeerJ harnesses the talent of thousands of reviewers able to bring their scientific expertise to bear on assessing the science behind the article. But unlike many other scientific publishers, we encourage our peer reviewers to provide their name as part of their review, and we also give our authors the option to publish the full peer review history of their article alongside the published version. We are hopeful that as more journals allow this, and as more authors and reviewers experience it, it will become a standard feature of all journals. Ultimately, the reason for doing this is to improve the process of review and publication and to provide fresh insights for readers. We would ask that the Federal Government consider encouraging and rewarding those publishers which practice some form of open peer review.

Authors should also be in a position to publish more negative results – those in which an experiment had no effect or clear outcome – because the lack of a finding can sometimes be as important as a finding itself. As technology enables cloud-based storage of all data and file types, we encourage authors to openly share their negative results through open data platforms so that others may learn from the outcomes of their experiments. We would ask that the Federal Government support researchers in making their negative data openly available to the world, perhaps by making the reporting of negative (as well as positive) results a requirement of funding.

Scientists are in the privileged position of being able to shape the world for the benefit of mankind, nature and our planet’s future. As outlined, reproducibility and repeatability are the cornerstones for building on scientific discovery and making breakthroughs that help make the world a better place. Without them, scientists can’t learn from the work of others, or indeed ensure that their own work leaves a legacy. It is up to the publishers of scientific research to do everything we can to provide the best ecosystem for this. And it is up to our governments to foster the right environment for that to happen, and to reward those who contribute to it.


Save the date: participative Bay Area OA week event for Generation Open


Join us as we join forces with ScienceOpen, ZappyLab and My Science Work (and others to be announced) to celebrate OA with a participative event at Open Access Week in the Bay Area.

A core group initiated by Liz Allen (ScienceOpen), including Lenny Teytelman (ZappyLab), Laurence Bianchini (My Science Work), Peter Binfield and Georgina Gurnhill (PeerJ), brainstormed what our ideal OA week event would look like.

We agreed that we wanted to avoid a traditional format and so we settled on:

- Moderated un-conference where the audience talks and asks questions

- Simple event theme – we picked “#OpenAccess – it’s up to all of us”

- “Lightning talks” that anyone can give, 5 image slides in 5 minutes

- Time to chat and mingle over a drink and something to eat

- Ideally, a cool venue with great views (and Disabled Access)

As a group we feel that the program below achieves our vision. We ran it past our academic partners, the UCSF Library (Anneliese Taylor) and the UAW post-doc union (Felicia Goldsmith), and they liked it too. Now all we need is for you all to save the date and make this an event to remember.

Date: Thursday, October 23rd, 2014

Venue: SkyDeck, Berkeley (one of the first research university startup accelerators)

Time: 6.00 pm – 8.30 pm

Theme: #OpenAccess – it’s up to all of us

Format (suitable for global cloning!):

  1. 8 minutes – relax with a drink, a snack and the “What is OA?” video by Jorge Cham (PhD Comics), Nick Shockey (Right to Research) and Jonathan Eisen (UCD)
  2. 10 minutes – un-conference OA topic selection by audience
  3. 20 minutes – topic discussion with moderation (your host for the evening, Lenny!)
  4. 10 minutes – grab another drink (alcoholic or non), stave off hunger with nibbles
  5. 40 minutes – lightning talks, “#OpenAccess – it’s up to all of us”
  6. Last 30 minutes or so – greeting old friends and making some new ones

In the coming weeks, we’ll be letting you know where to send your lightning talks and the deadline for doing so. We will be recording them for social media too, so anyone unable to attend in person can listen in. Register now at our Eventbrite page.

Finally, in the spirit of “the more the merrier” other OA Publishers and Academic Partners who want to participate are welcome to email Liz.  


Is today’s science where the software industry was in the 70s?

Today there are literally billions of apps that you can download to your smartphone anywhere in the world and start using in seconds. Imagine a totally different world though, one where in order to use an app you had to buy a new phone each time, or spend weeks developing it yourself. This was more or less the state of the computing world up until the 1980s.

There’s a recent blog post called “The Myth of the Fall” regarding open source software. In it, the argument is made that up until the 70s and early 80s software was all proprietary, because the differing architectures made it impossible for a program to be compatible with more than one computer – what we call “portability”. It was as if, in order to use the same software, you first had to translate it into the specific language that a particular computer could “speak”. Reusing programs, one of the key benefits of open source, just wasn’t practical. Progress was cumbersome.

It took decades for dedicated individuals to engineer the hardware architectures needed for software portability, and just as long for communities to adopt the common standards to make it possible. Once that was established, open source software such as the Unix operating system could flourish – and now the world literally runs on open source, and you can download countless apps to your phone. (There are some nice parallels to the billions of years of evolution it took to get past single-celled organisms, and the multicellular explosion once it did happen.)

Could it be that our science today, and in particular scholarly publishing, is where software was in the 1970s? Software had to battle with incompatible computer hardware in the 70s, and in science we have cultural practices and business models that limit the reuse and dissemination of scholarly knowledge. Paywall publishers continue to lobby against Open Access, much as IBM in the 1970s tried to convince us that personal computers weren’t needed – until Apple showed us what could be done with them. The result is that we aren’t making as much progress as we might otherwise achieve. Academic publishing today is like the non-portable software we had 30 years ago. That’s slowly changing though.

The Open Access movement today is to scientific progress as Open Source software was to the information technology explosion that began in the 1980s. And as common hardware architectures greased the wheels for open source to thrive, so do today’s emerging reproducibility and data availability initiatives hint towards a budding scientific explosion. Government and funding agency mandates for Open Access are also greasing those wheels, so that science is more “portable” and accessible to a larger audience.

Our mission at PeerJ is to efficiently publish the world’s knowledge, so that we can tackle this century’s greatest challenges. Achieving this will take a modern update to how we think about doing science and publishing. Because of that mission we don’t view PeerJ as a publisher in the traditional sense, and you’ll continue to see us marching to the beat of a 21st century drum. Our job is to grease the wheels of science by getting scholarly knowledge (of any size) out there as far and as cheaply as possible – with a few carrots thrown in for motivation. The software industry is a model we can all look at with confidence for what is possible when “Open” is embraced.


Interview with an Author - Todd Vision

This week is Open Access week, and we thought it would be interesting to talk with Todd Vision, the senior author on our recent publication “Data reuse and the open data citation advantage”. This article was originally submitted as our first ever PeerJ PrePrint; the PeerJ version was published three weeks ago and has subsequently attracted quite a bit of interest in the scientific community.

Todd Vision is in the Department of Biology at the University of North Carolina at Chapel Hill and has been the Associate Director for Informatics at NESCent (the National Evolutionary Synthesis Center) since 2006. His research spans evolutionary genetics, computational biology, data science and scholarly communication. He is a co-founder of Dryad, a widely used repository for data underlying biological and medical publications.

PJ: Tell us a bit about the research you published with us, and what is the ‘Take Home Message’ of your article?

This is part of a larger effort to gather evidence on the costs and benefits of making research data accessible and reusable, particularly for the data producers themselves. We focused on one of the potential motivators for data producers, namely how much additional credit they receive, in the form of increased article citations, for depositing the data reported in the article to a public repository.

One take-home message is that there is a nontrivial benefit to data producers for making data openly available, and it is increasing over time. Another is that the reuse by others of openly available gene expression data accounts for well more than half of its total usage in the literature, and that this proportion is also on the increase. The implications for policy makers are, we think, pretty self-evident.

PJ: What challenges did you face while doing this research?

The biggest challenge was simply getting machine access to the literature, both to query the citation data and to mine the full text of articles for data accession numbers. At the time we conducted this study, the only option for querying citation data with a list of PubMed IDs was Scopus. Unfortunately, this wasn’t available to us through our institutions, and Elsevier declined to provide us individual access despite our willingness to pay. Heather [Piwowar, first author of the article] tried to use the British Library’s walk-in access during a trip overseas, but the restrictions imposed by the library were not designed for the digital world, and made the exercise impractical. It would have required her to manually type in ten thousand PubMed identifiers one by one. She eventually obtained access to Scopus through an arrangement with Canada’s National Research Library, but even that had its Kafka-esque elements, since she needed to be fingerprinted to obtain a police clearance certificate first. Once she had access, getting the data out of Scopus was very laborious, because she compiled these citation data before Elsevier made the current API available. A study of the scientific literature on this scale is difficult to pull off without the ability to automate the search, either with an API or access to the source data.
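The batch lookup described above is exactly the sort of task an API makes trivial. As a hedged illustration (not the study’s actual pipeline, which used Scopus): NCBI’s public E-utilities expose an ELink endpoint that can ask, for a list of PubMed IDs, which articles cite them. The `pubmed_pubmed_citedin` link name and the endpoint URL are real; the helper function names and batch size here are our own.

```python
# Sketch: batch citation lookup for a list of PubMed IDs via NCBI E-utilities,
# the kind of machine access the authors could not get from Scopus at the time.
import urllib.parse

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi"

def build_citedin_url(pmids):
    """Build an ELink URL asking which articles cite the given PubMed IDs."""
    params = {
        "dbfrom": "pubmed",
        "linkname": "pubmed_pubmed_citedin",  # "cited in" links via PubMed Central
        "id": ",".join(str(p) for p in pmids),
    }
    return EUTILS + "?" + urllib.parse.urlencode(params)

def batches(pmids, size=200):
    """Split a long ID list into request-sized chunks."""
    return [pmids[i:i + size] for i in range(0, len(pmids), size)]

# Ten thousand identifiers become ~50 requests rather than ten thousand
# hand-typed queries through a walk-in library terminal.
urls = [build_citedin_url(chunk) for chunk in batches(list(range(1, 10001)))]
```

With programmatic access like this, a study of citation patterns across the whole literature becomes a script rather than a months-long negotiation.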

Another challenge is that we didn’t (and still don’t) have access to mine the full text of all the scientific articles that might be mentioning these datasets. For that reason, the second part of the study was restricted to the subset of articles available from PubMedCentral. This means we had to extrapolate our estimates, or rely on minimum estimates, for a number of the core results. Even though our university library pays for a subscription that enables humans to read these articles, there is a layer of legal fog that prevents academic researchers from writing software that reads the articles. At one point, multiple Elsevier executives were on the phone with Heather to discuss granting her full-text access to the articles they publish. But the legal negotiations were too slow to be of much help for this study, and at any rate it was only one publisher. There are precious few academics with the time, determination and expertise to negotiate bilateral agreements with all the relevant publishers, and to do it afresh every time there is some new study that requires full text access.

So, we are very happy that publishing our own article as Open Access in PeerJ isn’t contributing another brick in this access wall for future researchers.

PJ: So far, did you get any comments from colleagues about the results you have published with us?

We have been getting an encouraging stream of attention and feedback on this work since it was posted as a PeerJ Preprint in April. In fact, I suspect that the availability of the preprint is the reason the article has managed to start receiving citations already, even though it has been published for less than three weeks.

PJ: PeerJ encourages Authors to make their review comments visible. Why did you choose to reproduce the complete peer-review history of your article?

A lot of expert time goes into reviews, and often the dialog between the authors, the reviewers and the editor adds valuable context that does not get fully surfaced in the paper. I find that writing reviews knowing that they will be made public motivates me to be as constructive as I can. If reviewers wish to stay anonymous, that is still an option - and note that one of the two did in this case. Furthermore, if reviewers wish to make sensitive comments, they always have the option of sharing those privately with the editor. Actually, I’m not sure anymore what purpose is served by having the content of the reviews kept secret by default!

PJ: You received some great media coverage for your article. How was that process? Did the fact that we are an Open Access publisher help with exposure at all?

It’s a fairly involved paper, with lots of different quantitative analyses, including 11 figures and tables. Distilling that down to a few key points for a wider audience has been an interesting and fun challenge, and it has shifted my own thinking about which results are most important and why.

Some of the pieces I have seen were clearly based on the PeerJ press release, but in others you can tell that the reporter went to get additional material from the article itself. It stands to reason that reporters are more likely to do that for an Open Access article like this one.

PJ: With our new Q&A feature, you’ve already had the chance to answer a few questions about your paper. Could you comment on that?

We had one questioner, who asked both ‘did you think of this?’ and ‘where can I learn more about this?’ kinds of questions. Since we, as authors, don’t have an infallible crystal ball that lets us know where readers will want to delve deeper, I think it’s great to allow readers and authors to continue the dialog after publication. It feels like being in a very dispersed, asynchronous journal club, except everyone involved is really interested and has actually read the paper. And the authors can respond if someone thinks they’ve found a fatal flaw! The way that has been implemented in the website, marked in the margin for the relevant section of the manuscript, is very nice.

PJ: Anything else you would like to talk about?

One interesting piece of backstory is the source of the introductory paragraph. Heather felt that she had opened the introduction well in a paper she published on the same topic in 2007, and was disinclined to reword the same ideas just for the sake of it. The rationale is that since the introduction section is supposed to restate ideas from the literature anyway, there’s not much point in putting them in new, and potentially inferior, words. So we convinced ourselves to just include the passage verbatim. But then it wasn’t clear whether or how to attribute the passage in order to avoid concerns of self-plagiarism. In the end, we simply stated the source in the acknowledgements. That should be noncontroversial, since these were, after all, the words of one of the authors, and the original paper was Open Access. But it did cause some mild unease during review, so we appreciate that PeerJ allowed us, as authors, to make the final call.

Another interesting aspect of the paper is that it was written under version control in such a way that all the analyses in the paper, including tables and figures, can be updated just by modifying and replacing the data and recompiling the whole thing. We used Knitr with embedded R code and data, and had all the files versioned on GitHub. After it was published we put the final snapshot into Dryad, so anyone interested in reusing the data or analysis is free to do so. We’d be delighted if others end up using our source files as a template for figuring out how to write their own reproducible papers.
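The principle behind that workflow can be shown in miniature. The authors’ actual pipeline used Knitr, R and GitHub; this Python analogue, with made-up column names and data, just illustrates the idea that every number in the manuscript is computed from the data file, so replacing the data and rerunning regenerates the results.

```python
# Minimal sketch of a "recompile the paper" workflow: the reported statistics
# are derived from the raw data on every run, never copied in by hand.
import csv, io

def summarize(csv_text):
    """Compute the summary statistics a table in the paper would report."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    cited = [r for r in rows if int(r["citations"]) > 0]
    return {
        "n_articles": len(rows),
        "pct_cited": round(100 * len(cited) / len(rows), 1),
    }

# Swap in a new data file and rerun: every figure and table updates together.
DATA = "pmid,citations\n1,3\n2,0\n3,5\n4,1\n"
print(summarize(DATA))  # {'n_articles': 4, 'pct_cited': 75.0}
```

In the real Knitr setup, the equivalent of `summarize` lives in R chunks embedded in the manuscript source, so the prose, tables and figures are compiled from one versioned set of files.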

PJ: Thank you for your time. We are pleased to have published this paper, which clearly contributes important new information to the open access debate.

If you would like to experience the future of publishing for yourself, then submit now to PeerJ.


Alleged deletion of comments on new Elsevier video - is that innovating?

A new video on YouTube, produced by Elsevier, aims to promote how “Open” most of its journals really are. It’s actually a decent explainer video about the basics of Open Access and how many of Elsevier’s journals meet Open Access requirements from funding bodies. A few details were of course left out – for example, it fails to explain what hybrid journals are and the controversy of “double dipping”, or the fact that not all Elsevier journals offer the same CC licensing.

The main concern, as shown in this Twitter conversation, is that Elsevier is selective in which comments it allows on its blogs, videos, etc. As pointed out in that thread by John Wilbanks, who was previously at Creative Commons, one should either turn off comments altogether, or allow all of them (save for obvious spam/abuse of course).  


Now, there is no universal law that states a company must abide by the commenting “netiquette” described by Wilbanks. What the alleged removal of comments (in this case, comments mentioning other Open Access options besides Elsevier) suggests, however, is that there is a disconnect between what some companies say and what they do. You cannot proclaim to be innovative while failing to understand or follow the norms of the Internet – which is to say, allowing comments fair and square, even if it means competing journals are mentioned.

Removing comments may be a prudent short-term business tactic, but it won’t help in the long term. It suggests a lack of innovation, and of any real desire to have a conversation about improving the Open Access options available to researchers.