Machine-aided skim reading and life science publications - Author Interview

Today we published a method for machine-aided skim reading, which outperforms tools like PubMed in terms of focused browsing and the informativeness of the browsing context. We were very interested in hearing more about this tool, so we invited the first author, Vit Novacek, to comment on his research and his experience publishing with us.

PJ: Can you tell us a bit about yourself?

VN: I’m currently a senior post-doctoral researcher and project leader at the Insight Centre for Data Analytics, National University of Ireland Galway. After I was awarded my Ph.D. in computer science in 2011, I got involved in setting up a large industrial collaboration between my university and Fujitsu Laboratories in the UK and Japan. Since 2012 I have mostly worked on that project, leading several work packages and helping to coordinate a group of up to 10 Fujitsu-funded researchers located in Galway. My work revolves around knowledge discovery in unstructured and semi-structured data, using methods that build on and extend the theoretical arsenal of distributional semantics and graph analysis. The main application domain of my research has been the life sciences (e.g., knowledge discovery in biomedical publications and ontologies), but recently I’ve also become involved in some enlightening experiments with financial and public administration data. When I’m not busy messing around with computers, I try to balance the sedentary nature of my profession by playing competitive volleyball, and also freediving or surfing along the Irish west coast whenever the moody North Atlantic is kind enough to let me in, and out :)

PJ: Can you briefly explain the research you published in PeerJ?

VN: The main goal of the research is to help life scientists make more sense of all the articles being published at an ever-increasing rate in the field. Since it is harder and harder to keep up with the latest research even in relatively limited sub-domains of the life sciences, we need technical solutions that make `drilling’ for relevant pieces of knowledge more comprehensive and time-efficient. SKIMMR, the methodology and prototype presented in our article, is one such solution. We extracted significant co-occurrence and similarity relationships from various corpora of biomedical abstracts on PubMed, and enabled users to conveniently navigate the networks of those relationships. This way one can `skim’ through the high-level conceptual structure of the domain covered by the corresponding corpus of abstracts. The skimming takes much less time than actual reading, and can reveal interesting relationships between particular topics in the literature that may not be covered by any single abstract. Once an interesting area in the conceptual graph is discovered, the publications associated with it can easily be retrieved and examined manually. This potentially saves a lot of time otherwise spent reading irrelevant articles, and can also lead to serendipitous discoveries that would be easy to miss with classical publication search tools.
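To illustrate the general idea only (this is not the actual SKIMMR implementation; the function names, naive whitespace tokenization, and toy abstracts below are all invented for the sketch), a term co-occurrence network over a corpus of abstracts can be built and `skimmed’ in a few lines of Python:

```python
from collections import Counter, defaultdict
from itertools import combinations

def build_cooccurrence_graph(abstracts, min_count=1):
    """Count how often pairs of terms co-occur within the same abstract."""
    edge_counts = Counter()
    for text in abstracts:
        terms = sorted(set(text.lower().split()))  # naive tokenization
        for a, b in combinations(terms, 2):
            edge_counts[(a, b)] += 1
    # weighted adjacency list keyed by term
    graph = defaultdict(dict)
    for (a, b), n in edge_counts.items():
        if n >= min_count:
            graph[a][b] = n
            graph[b][a] = n
    return graph

def skim(graph, term, top=3):
    """The 'skimming' step: follow a term's strongest neighbours."""
    neighbours = graph.get(term, {})
    return sorted(neighbours, key=neighbours.get, reverse=True)[:top]

# toy stand-ins for PubMed abstracts
abstracts = [
    "lix1 expression in feline models of spinal muscular atrophy",
    "survival motor neuron gene mutations cause spinal muscular atrophy",
    "feline lix1 gene candidate for muscular atrophy",
]
graph = build_cooccurrence_graph(abstracts)
print(skim(graph, "lix1"))
```

SKIMMR itself weights the relationships far more carefully (e.g., by statistical significance) and adds similarity links on top, but the navigation pattern is the same: query a term, follow its strongest neighbours, and then pull up the underlying abstracts once something interesting turns up.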

PJ: Do you have any anecdotes about this research?

VN: One of the most memorable and encouraging things happened at a relatively early stage of the research, during my stay at the Information Sciences Institute, University of Southern California. Gully, the co-author of the SKIMMR article whose group I was visiting at the time, and I were presenting a very preliminary prototype of the SKIMMR methodology to Maryann Martone, a professor of neuroscience at the University of California San Diego. Maryann helped us to set up an experimental corpus of articles on Spinal Muscular Atrophy (SMA), drawing on her experience of working with the SMA Foundation. We demonstrated an SMA instance of SKIMMR to her, letting her navigate the conceptual networks at will. After only about two minutes, she stumbled upon a few relationships that indicated experiments with the LIX1 gene in feline models of the SMA disease. This was new and valuable knowledge to her, so she started to explore the related articles quite enthusiastically to learn more. Witnessing that made us very excited – maybe we just got lucky, but it was a clear indication of the potential of our research even at its very early stages.

PJ: What surprised you the most with these results?

VN: I can’t say anything surprised me so much that I’d be like, wow, I didn’t really expect this at all. But one thing that was perhaps a little counter-intuitive for me was how far one can go with the very simple representation underlying SKIMMR. We experimented with more types of relationships, based on the syntactic structure of the texts being explored and some sophisticated background knowledge, but eventually we decided to go with two simple ones, co-occurrence and similarity. Yet even such a lightweight and relatively abstract knowledge representation model can support complex knowledge acquisition, as I’m learning whenever I use SKIMMR myself. I’m not a biomedical scientist by training, but SKIMMR has by now helped me learn quite a lot about Spinal Muscular Atrophy and Parkinson’s Disease (our testing domains). The process is perhaps a little serendipitous (or chaotic, if you want to call it that), but very time-efficient, especially in the beginning when one does not have a clue about the domain at all.

PJ: What kinds of lessons do you hope the public takes away from the research?

VN: I’d like people to picture the SKIMMR methodology and prototype as something that could soon help them a lot with their day-to-day research, if they give us a hand in adjusting the prototype to their specific needs. I think pretty much everyone acknowledges that we can’t get much further with traditional publication search engines if we want to make sense of all the relevant texts out there. On the other hand, one cannot expect machines to read everything on our behalf and present us with concise and complex summaries of the content, at least not in the near future. We need domain experts actively working with AI researchers to develop tools that can establish an efficient man-machine collaboration, letting computers sieve through the oceans of information and allowing humans to be creative when interpreting whatever relatively simple connections have been discovered automatically. This is obviously a rather abstract vision that involves many specific research questions, and SKIMMR is just a small step towards realizing a truly functional man-machine intelligence. Yet if our preliminary results manage to make a few more people think about how they could contribute towards that ultimate goal, we’d be very happy.

PJ: Where do you hope to go from here?

VN: We’re currently working on methods for inferring more complex, taxonomic and domain-specific relationships from the basic co-occurrence and similarity ones, and once that’s ready, we can reflect it in a much more flexible and comprehensive way of navigating the conceptual networks in SKIMMR. We are also working on using the conceptual networks to compute links between the actual articles that explain the nature of each connection via simple semantic annotations. This has high potential for helping biomedical researchers by complementing services like PubMed’s retrieval of related articles.

PJ: If you had unlimited resources (money, lab equipment, trained personnel, participants, etc.), what study would you run?

VN: We have been running user studies to evaluate the SKIMMR instances `in the wild’, but the data we have collected so far do not allow for a very representative statistical analysis. Therefore we needed to devise an alternative, simulation-based method of evaluation. If we had an army of users working with SKIMMR in several different domains on a daily basis, we could compare the findings from the simulations with actual user behavior and learn many more things about both our method and the data it processes.

PJ: Why did you choose to reproduce the complete peer-review history of your article?

VN: It is an interesting feature not often seen elsewhere. I’ve always been for maximum openness and accountability in science, and making reviews available (and ideally also non-anonymous) is an important step in this direction. The full revision history may be quite helpful for other authors, as they can learn from our mistakes and imperfections. Last but not least, establishing such an open review process across the publishing industry in general could make peer review much more thorough and fair in the long run.

PJ: How did you first hear about PeerJ, and what persuaded you to submit to us?

VN: My co-author met some PeerJ representatives at a workshop and became very enthusiastic about the whole PeerJ business model and revolutionary attitude. This kind of infected me, and when we were thinking about where to publish the results of our joint work, we realized that we should support PeerJ by trying to publish in it as well.

PJ: Do you have any anecdotes about your overall experience with us? Anything surprising?

VN: The biggest (and very positive) surprise was definitely the quick, yet still quite thorough and factual, review feedback. I didn’t really expect to have so much to process over my Christmas break after submitting the first version of the article in early December.

PJ: How would you describe your experience of our submission/review process?

VN: The submission wizard is perhaps a bit confusing and lengthy at first, but that may just be me, coming from the computer science field, where the metadata required for publications is pretty minimalistic compared to the tradition in the life sciences. Anyway, once I got used to the system it was fine, as the web interface is quite clean and intuitive. The reviewing process felt very smooth and informative, with thorough and helpful reviews and sensitive supervision of the whole process by the Academic Editor. Whatever issues we had during submission, the PeerJ team was ready to resolve them quickly, so the whole experience was hassle-free.

PJ: Did you get any comments from your colleagues about your publication with PeerJ?

VN: People I talked to were pretty interested in the publishing model and generally liked it; they only regretted that something like it doesn’t exist in the `hard core’ computer science field (most people I work with at my institute don’t do research that could be published in PeerJ).

PJ: Would you submit again, and would you recommend that your colleagues submit?

VN: Yes, definitely.

PJ: In conclusion, how would you describe PeerJ in three words?

VN: Disruptive, Supportive, Professional

PJ: Many thanks for your time!

Join Vit Novacek and thousands of other satisfied authors, and submit your next article to PeerJ. And until the end of August, if you engage with PeerJ articles or preprints, then you can publish for free!


Classification of bird sounds – Author Interview

Today we published an article which combines feature learning – an automatic analysis technique – with a classification algorithm to create a system that can detect which bird species are present in a large dataset. This automatic large-scale classification could be useful for expert and amateur bird-watchers alike.

We were very interested in hearing more from Dan Stowell about this successful way of identifying bird sounds from large audio collections.

PJ: Can you tell us a bit about yourself?

DS: I’m a research fellow at QMUL in London, and I’m working on applying machine learning techniques to analyse bird sounds. I develop techniques to answer questions such as “What species of bird?”, “How many birds?”, “Are they calling to each other, or ignoring each other?” just by automatically analyzing the audio content.

It’s a fascinating topic because bird vocalizations have so much rich structure – you can tell just by listening – and we are a long way from understanding all of that structure. So I’m developing tools that can help us analyze these sounds. On the one hand I make use of what we know about bird sounds, and on the other hand these tools will enable us to find out more about bird sounds by analyzing large amounts of sound recordings.


PJ: Can you briefly explain the research you published in PeerJ?

DS: The research is about automatically classifying bird species from a sound recording. Simple concept: you have a sound recording, but you’re no bird expert, so you want the machine to tell you which species are present. Or maybe you’re a bird expert but you have thousands of sound recordings because you run a sound archive or an ecological monitoring project. Either way, it’s valuable to have some automated way to work out which species are present in each recording. So we apply “machine learning” to learn from labeled examples and generalize to unlabeled examples.

People have published research on species classification since at least 1997, but often it’s been on small datasets – for example a personal collection covering ten or twenty species. In real outdoor recordings there are hundreds of possible bird species. And the more species you have to choose between, the harder the task becomes.

The specific contribution of this paper is to apply a technique called “unsupervised feature learning” which can dramatically improve classification performance by automatically finding a high-dimensional transformation of the audio data. We put this together with a modern classification algorithm (“random forest”) to create a bird sound classifier that performs very well even on a very big dataset, thousands of recordings covering more than 500 species in Brazil.
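To give a rough flavour of that recipe – learn a dictionary of spectrogram patches without any labels, then re-encode each recording against that dictionary before classification – here is a deliberately simplified sketch. It is not the paper’s pipeline: it uses plain k-means (techniques in this family often use a spherical variant), toy random matrices in place of real spectrograms, and made-up sizes throughout. A classifier such as a random forest would then be trained on the resulting fixed-length feature vectors.

```python
import numpy as np

rng = np.random.default_rng(0)

def learn_patch_dictionary(spectrograms, n_patches=200, patch_frames=4, k=8, iters=10):
    """Unsupervised feature learning: k-means on random spectrogram patches."""
    patches = []
    for spec in spectrograms:              # spec: (time_frames, freq_bins)
        for _ in range(n_patches // len(spectrograms)):
            t = rng.integers(0, spec.shape[0] - patch_frames + 1)
            patches.append(spec[t:t + patch_frames].ravel())
    X = np.array(patches)
    X -= X.mean(axis=1, keepdims=True)     # simple per-patch normalisation
    centroids = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):                 # plain k-means updates
        dists = ((X[:, None, :] - centroids[None]) ** 2).sum(-1)
        assign = dists.argmin(1)
        for j in range(k):
            if (assign == j).any():
                centroids[j] = X[assign == j].mean(0)
    return centroids                       # the learned "jigsaw pieces"

def encode(spec, centroids, patch_frames=4):
    """Represent a whole recording by max-pooled similarity to each patch."""
    windows = np.array([spec[t:t + patch_frames].ravel()
                        for t in range(spec.shape[0] - patch_frames + 1)])
    windows -= windows.mean(axis=1, keepdims=True)
    sims = windows @ centroids.T           # (n_windows, k)
    return sims.max(axis=0)                # fixed-length feature vector

# toy "spectrograms": random matrices standing in for real audio features
specs = [rng.standard_normal((20, 10)) for _ in range(5)]
D = learn_patch_dictionary(specs)
features = np.array([encode(s, D) for s in specs])
print(features.shape)                      # one k-dimensional vector per recording
```

The key design point this sketch tries to capture is that the dictionary is learned without species labels, so it can exploit arbitrarily large unlabeled audio collections – which is exactly why the benefit grows with dataset size.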

PJ: What surprised you the most with these results?

DS: The really tricky thing we found is that you get very different results on small datasets and on big datasets. So imagine, for example, that you’re developing a new method and you test it on twenty recordings, just as a quick check so you can decide whether or not to apply it to a million recordings. And I’m not talking about statistical significance here: let’s assume that twenty is enough to find a significant difference between two techniques. The real killer is that the results from twenty might point the other way from the results you’d get from a million. They might seem to show your method wouldn’t work, when in fact it would! The reason is that some techniques, in particular this unsupervised feature learning, really get their strength from large datasets. You can see it very clearly in our results, where the new technique seems a bit pointless on the smallest dataset, but the benefit becomes obvious as the datasets get bigger.

Another surprise was that we applied our technique, which in a sense learns little “jigsaw pieces” that go to make up a bird sound, and then later on we discovered that neuroscience research had found similar jigsaw pieces represented in the sensitivities of bird auditory neurons. I’m not claiming that our system does the same thing as a bird’s hearing system – it’s not designed to – but it’s a hint that we’re doing something right.

PJ: Where do you hope to go from here?

DS: The next topic I’m working on is how to get more information out than just a species label. I’m working on techniques that can transcribe all the bird sounds in an audio scene: not just who is talking, but when, in response to whom, and what relationships are reflected in the sound (e.g. dominance, pair-bonding). To pull all this information out of unlabeled sound, we need to apply more maths!

PJ: Why did you choose to reproduce the complete peer-review history of your article?

DS: I think it’s a great idea to publish peer reviews. Often they’re the most focused expert feedback you’ll get on your work, and it’s good to make the most of that expertise by letting future readers see how they reacted to the paper. I still think anonymity is important in peer review (to reduce the risk of biased judgments) so I think PeerJ’s approach here is a good one.

PJ: How did you first hear about PeerJ, and what persuaded you to submit to us?

DS: I heard of PeerJ from my biologist colleagues, and I noticed quite a few of them publishing there, so I asked around. The speed of the publishing process was a definite plus for me, as well as it being properly open-access, and the very readable way articles are formatted online.

PJ: How would you describe your experience of our submission/review process?

DS: Smooth and efficient, and yes, it was fast compared with all my previous experience with journals.

PJ: Did you get any comments from your colleagues about your publication with PeerJ?

DS: I’m based in an Electronic Engineering / Computer Science department and most of my colleagues hadn’t heard of PeerJ. It seems that the biological sciences are doing pretty well for modern open-access journals, and it’d be nice for EE/CS to catch up.

PJ: Anything else you would like to talk about?

DS: I should mention one thing that made this “big data” bird sound research possible. There are various audio archives in the world, but for machine learning research we need public datasets, and preferably open-licensed datasets.

One of the sources we used was Xeno Canto, which crowdsources many thousands of recordings under open licenses. We also benefited from a recent French research project called SABIOD, which has performed a massive service for the community by creating research challenges – public challenges to get the best score on tasks such as bird species sound classification. Challenges are a great way to set benchmarks for specific tasks, really understand where the state of the art is, and stretch it a little bit further. They (SABIOD) coordinated a challenge using the Brazil data that I mentioned, and I’m pleased to say our system was the best-performing audio-only classifier entered in the challenge.

PJ: Many thanks for your time!

DS: And thank you!

Join Dan Stowell and thousands of other satisfied authors, and submit your next article to PeerJ. And until the end of August, if you engage with PeerJ articles or preprints, then you can publish for free!


Proceedings of GNOME 2014 - a new PeerJ collection



Earlier this month, PeerJ was pleased to launch the `Proceedings of GNOME 2014 — Festschrift for Gaston Gonnet’ PeerJ Collection. This Collection consists of contributions to a Festschrift on the occasion of the retirement of Prof. Gaston H. Gonnet.

Prof. Gaston H. Gonnet has made seminal contributions to at least three fields. He was a pioneer of symbolic computation as a founder and early developer of the Maple computer algebra system. He contributed to the computerization of the Oxford English Dictionary, and in the process developed several influential text-searching algorithms. In bioinformatics, he performed some of the earliest large-scale analyses of molecular sequences and helped establish fundamental methods for protein identification by mass profile fingerprinting. `GNOME’ therefore stands for “Gonnet is Not Only about Molecular Evolution”!

To mark the launch of the Collection, we asked Christophe Dessimoz, co-organizer of the symposium, to write a guest blog post explaining some of the motivation for this collection.

Dr. Dessimoz’s guest blog post is below:

“On 4 July 2014, the retirement symposium of Prof. Gaston H. Gonnet took place at ETH Zurich, Switzerland. We were delighted to publish the proceedings, `Festschrift for Gaston Gonnet’, as a PeerJ collection.

All submissions were first considered as PeerJ PrePrints. Because two contributions also happened to fall within PeerJ’s remit (original research papers in bioinformatics and computational science papers with a biology component), they were additionally submitted for peer review. This process is still ongoing; provided they meet PeerJ’s criteria, they will be published as PeerJ articles.

To provide a keepsake for attendees, we also produced a hardcopy, which can be obtained as print on demand.

Overall, partnering with PeerJ provided us organizers with an effective, lean, and inexpensive way of doing the proceedings.”

The symposium’s group photo

Here at PeerJ, we believe we have the perfect publishing solution for your next Conference Proceedings, Symposium, or Research Consortium! So if you have a collection of articles and need a high-quality, cost-effective publication solution, then check out this post for more information, and explore our PeerJ Collections at https://peerj.com/collections/.


Combining scientific work and photography - Author Interview

You certainly couldn’t miss the amazing photo of a Reef Manta Ray on our homepage a few weeks ago. Dr. Simon Pierce—marine biologist and photographer—took this picture, along with many others, whilst doing his own research. We were very interested in hearing more about his scientific work and his passion for photography.


Simon Pierce with whale shark

PJ: Can you tell us a bit about yourself and your life as a marine biologist?

SP: Sure! I grew up in New Zealand and studied ecology for my undergraduate degree.  After I learnt to dive in my final year, I decided to move over to Australia to work on rays and sharks for my honors and PhD. My doctoral work focused on the conservation and management of inshore elasmobranchs.

As I was finishing my fieldwork, in 2005, I was invited to start a whale shark research project in Mozambique. Never having been to Africa before, or indeed having seen a whale shark, I found this somewhat challenging. Fun, though. In the years since, I’ve had the opportunity to study whale sharks in multiple countries. I’m also supervising a PhD project on sea turtle conservation in Mozambique, and collaborating on global manta ray studies.

Professionally, I’m a Principal Scientist at the Marine Megafauna Foundation (which I co-founded with manta ray researcher Andrea Marshall), as well as a Director at Wild Me and Science Coordinator of one of their flagship projects, the global whale shark database. I’m also a member of the IUCN Shark Specialist Group.

PJ: How did you become interested in the conservation of threatened marine species?

SP: New Zealand suffered a number of misguided introductions of alien species. Many of our native animals were wiped out, or restricted to small populations on pest-free islands. Learning about what scientists and conservationists are doing to reverse these declines was absolutely fascinating to me. My father was a keen fisher, which introduced me to marine life. As I became more familiar with marine animals I realized that, globally, many marine vertebrates are similarly under threat.

Anthropogenic pressures have hugely impacted sharks and rays. Management intervention is required for recovery to occur in many species. Personally, I feel I can best contribute by identifying and prioritizing the means of reversing these declines, and by assisting with these efforts.

Whale sharks are a focus for me. These gentle giants are globally threatened by fishing and boat strikes. They’re wonderful animals to work with, often interested in people, and despite their enormous size (up to 20 m) they’re entirely harmless. They’re really rather endearing.


Whale sharks have an enormous mouth, but are harmless to people

PJ: You are the author of several PeerJ PrePrints, which form part of the `Third International Whale Shark Conference’ PeerJ Collection. Can you tell us a bit more?

SP: The organizers of the Third International Whale Shark Conference, held at the Georgia Aquarium in the US, partnered with PeerJ to produce the conference proceedings. I think that was a great decision. All the studies published from the conference will be grouped together, freely and easily accessible to all. The abstracts that are currently online will, hopefully, be replaced by full publications in many cases.

PJ: How do you combine your scientific work and your passion for photography?

SP: Photography is a big part of my work. Many of the species I routinely work with, such as whale sharks, manta rays and sea turtles, are individually identifiable through their color patterns or body form. When I’m in the water, I’ve almost always got a camera in hand. It’s a great chance to get some opportunistic pictures.


Researcher photo-identifying a whale shark

In 2012 I got my first interchangeable-lens camera. That really opened up photographic possibilities. Mirrorless cameras, like the one I have, are small enough to be easy to swim (and travel) with. It’s my constant companion underwater.

PJ: Can you tell us about your favorite photo? What’s the story behind it?


Whale shark silhouette

SP: I was in Mexico last year to lead a whale shark research expedition. I had attempted a couple of shark silhouette shots over the preceding days, but I had been stymied by the brightness of the sunball. As you can probably imagine, positioning yourself perfectly underneath a fast-moving fish is not always simple either.

In this particular case, the swimmers alongside the shark were positioned perfectly to block me from taking the ‘science’ photos I wanted. I noted that it was momentarily overcast, and I decided to try for a silhouette instead. In hindsight, the swimmers were a stroke of luck, as I think their presence improves the balance of the picture and emphasizes the size of the shark. 

PJ: Do you have any anecdotes to tell us about your pretty sweet job?

SP: A few years ago I was out with a bunch of tourists off Mozambique. They jumped in with a whale shark, while I stayed on the boat to confirm that everyone could swim. One of them shouted back to me that the shark had a net on it. I went in, dived down, and saw the shark was wrapped in a net that was cutting into both the pectoral fins (on the sides) and the dorsal fin on its back. I quickly grabbed some scissors from the first aid kit and caught up with the shark again. The shark was swimming slowly enough that, in a series of dives, I was able to remove the entire net. As I made the final cut and the net fell away, I made eye contact with the shark. I’m pretty sure we had a moment.

I was radiating moral superiority for about a fortnight after that one.

PJ: What kinds of lessons do you hope the public takes away from your work?

SP: Well, for one thing, they can be active participants. In addition to joining in on my research trips, there’s a global database for whale shark sightings. Anyone who takes a photograph of a whale shark, anywhere in the world, can submit it to the site. Over 4,000 people have already contributed their shots. If the shark is identifiable from the photo, we’ll be able to see the sighting history of that individual shark and track its re-sightings in the future. ‘Citizen science’ is a hugely valuable resource for us – thanks to all the public assistance, over 5,000 whale sharks from 45+ countries have been identified.

More generally, I’d like people to understand that sharks aren’t scary. They’re just fish. Some are big, most are small. They’re not particularly interested in people. Keen marine tourists already know that they’re actually very difficult to find. Some of my work has focused on sustainable whale shark tourism. I hope it helps more people interact positively with these amazing fish. 

PJ: What’s next in your research?

SP: Whale sharks are an enigmatic species. For a shark they receive significant research attention, but in real terms they remain poorly known. We generally study them in the few areas where they aggregate to feed. For some reason, though, the majority of sharks found at such sites are juvenile males. The only location where adult female sharks are consistently seen is the northern Galapagos Islands. Shortly we’ll start deploying satellite-linked tags out there to work out their depth and dietary preferences. Then, hopefully, we can identify where these enormous fish might be in other oceans.

PJ: If you had unlimited resources, what study would you run?

SP: What a great question. What I’d actually like to do is eradicate all the introduced pests from New Zealand to let the native life recover. For many species the technical knowledge of how to remove them exists, but it would obviously be a significant undertaking!

PJ: Anything else you would like to talk about?

SP: For those who would like to get involved in whale shark research themselves, I lead expeditions that are open to the public. I’ve also just started to blog about both marine science and photography.


Tourists with whale shark

More generally, I’d just like to say how glad I am that PeerJ has been created. First, it accelerates the whole process of science. The ability to publish preprints and the speed of review (and subsequent publication) is amazing. In other journals, I’ve sometimes had to wait two years between acceptance and publication online. Second, I want my work to be available and (hopefully) useful. Most of the countries in which I work are still developing, and academics and managers do not have access to paid scientific databases. I love that PeerJ papers are available to anyone with an Internet connection.

PJ: Many thanks for your time!

SP: No worries!

We encourage you to check out some of our Marine Biology publications in PeerJ. Submit your next article to PeerJ or PeerJ PrePrints, and (assuming your article is published) don’t hesitate to contact us at editorial.support@peerj.com if you have an image suitable for being featured on the PeerJ homepage!


Exciting times! PeerJ secures next round of funding led by SAGE and O’Reilly

We are pleased to announce that we have secured a new round of funding led by SAGE and O’Reilly. These new investments ensure that we can keep pushing forward the pace of publishing open access articles at a low cost of entry for authors. We maintain our status as an independent company, and continue to stand by our mission of enabling authors to publish quickly, at minimal cost, and with maximum exposure for their research.

SAGE is one of the world’s leading publishers in science, the humanities, and the social sciences, and their new investment in PeerJ reflects their commitment to exploring innovative publishing models. We are delighted to partner with a publisher that has been such an early advocate for open access publishing. For example, SAGE formed a partnership with Hindawi Publishing Corporation in 2007, and was also a founding board member of OASPA in 2008. They have since launched a number of open access titles – including SAGE Open.

David McCune, non-Executive Director of SAGE, will now sit on the PeerJ Board of Directors alongside Tim O’Reilly, CEO and founder of O’Reilly Media, and our two co-founders. McCune commented: “I’m proud to be joining PeerJ’s Board of Directors, who share SAGE’s commitment to learning, scholarship and high quality research.”

O’Reilly Media and OATV have supported PeerJ since our inception, and their continued support through this second round of funding is testament to their belief in our vision. Tim O’Reilly adds: “PeerJ’s vision for a new model of scientific publishing is as fundamentally important now as at the initial outset, and we are committed to ensuring they are able to continue on their path.”

PeerJ has spent the last year demonstrating market fit and strong customer demand; in that time we have published nearly 900 articles across PeerJ and PeerJ PrePrints, representing the work of almost 3,000 authors. At the same time, we have won significant accolades, including the 2013 ALPSP Award for Publishing Innovation. This shows that practicing academics consider PeerJ beneficial to both their chosen fields and their careers, and this hasn’t gone unnoticed by other players in the industry.

With the additional capital from this round we will be able to further develop our offering, as well as promote ourselves much more broadly to the wider academic community. Our authors consistently rank us highest for speed and responsiveness, and we are signing up institutions at a rapid pace. We believe that PeerJ is a pioneer in low-cost, high-speed, and high-quality scientific publication. We advocate a peer-review process that is rigorous and thorough yet fair, despite the lower entry costs. Funding support from highly regarded global publishers only goes to validate our belief in this mission.

We are excited by these new developments and thrilled that we can continue to build on our growing success. A big thank you to all the authors that have pioneered with us on our journey so far!