Machine-aided skim reading and life science publications – Author Interview

Jul 22, 2014

Today we published a method for machine-aided skim reading, which outperforms tools like PubMed in terms of focused browsing and informativeness of the browsing context. We were very interested in hearing more about this tool so we invited the first author Vit Novacek to comment on his research and his experience publishing with us.

PJ: Can you tell us a bit about yourself?

VN: I’m currently a senior post-doctoral researcher and project leader at the Insight Centre for Data Analytics, National University of Ireland Galway. After I was awarded a Ph.D. in computer science in 2011, I got involved in setting up a large industrial collaboration between my university and Fujitsu Laboratories in the UK and Japan. Since 2012 I have mostly worked on that project, leading several work packages and helping to coordinate a group of up to 10 Fujitsu-funded researchers located in Galway. My work revolves around knowledge discovery in unstructured and semi-structured data, using methods that build on and extend the theoretical arsenal of distributional semantics and graph analysis. The main application domain of my research has been life sciences (e.g., knowledge discovery in biomedical publications and ontologies), but recently I’ve also become involved in some enlightening experiments with financial and public administration data. When I’m not busy messing around with computers, I try to balance the sedentary nature of my profession by playing competitive volleyball, and also freediving or surfing along the Irish west coast whenever the moody North Atlantic is kind enough to let me in, and out 🙂

PJ: Can you briefly explain the research you published in PeerJ?

VN: The main goal of the research is to help life scientists make more sense of all the articles being published at an ever-increasing rate in the field. Since it is harder and harder to keep up with the latest research even in relatively limited sub-domains of life sciences, we need technical solutions that will make ‘drilling’ for relevant pieces of knowledge more comprehensive and time-efficient. SKIMMR, the methodology and prototype presented in our article, is one such solution. We extracted significant co-occurrence and similarity relationships from various corpora of biomedical abstracts on PubMed, and enabled users to conveniently navigate the networks of these relationships. This way one can ‘skim’ through the high-level conceptual structure of the domain covered by the corresponding corpus of abstracts. The skimming takes much less time than actual reading, and can reveal interesting relationships between particular topics in the literature that may not be covered by any single abstract. Once an interesting area in the conceptual graph is discovered, the publications associated with it can easily be retrieved and examined manually. This potentially saves a lot of time otherwise spent reading irrelevant articles, and can also lead to serendipitous discoveries that could easily be missed with classical publication search tools.
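To make the idea of co-occurrence and similarity relationships more concrete, here is a minimal sketch in Python. It is not the SKIMMR implementation: the interview does not say which measures are used, so the sketch assumes pointwise mutual information (PMI) over abstract-level co-occurrence and cosine similarity between co-occurrence profiles. The function names and the toy "abstracts" (sets of already-extracted terms) are hypothetical and serve only as illustration.

```python
import math
from collections import Counter
from itertools import combinations


def cooccurrence_scores(abstracts):
    """Score term pairs by PMI over abstracts (an assumed measure).

    `abstracts` is a list of sets of terms; term extraction is out of scope.
    Only pairs with positive PMI are kept as "significant".
    """
    n = len(abstracts)
    term_counts = Counter()
    pair_counts = Counter()
    for terms in abstracts:
        term_counts.update(terms)
        # Sorting first gives each pair a canonical (a, b) order with a < b.
        pair_counts.update(combinations(sorted(terms), 2))
    scores = {}
    for (a, b), count in pair_counts.items():
        pmi = math.log((count * n) / (term_counts[a] * term_counts[b]))
        if pmi > 0:
            scores[(a, b)] = pmi
    return scores


def similarity(scores, x, y):
    """Cosine similarity between the co-occurrence profiles of x and y."""
    def profile(term):
        vec = {}
        for (a, b), score in scores.items():
            if a == term:
                vec[b] = score
            elif b == term:
                vec[a] = score
        return vec

    px, py = profile(x), profile(y)
    dot = sum(px[k] * py[k] for k in px.keys() & py.keys())
    norm = (math.sqrt(sum(v * v for v in px.values()))
            * math.sqrt(sum(v * v for v in py.values())))
    return dot / norm if norm else 0.0


# Toy data, invented for illustration only (not taken from any real corpus).
abstracts = [
    {"SMA", "LIX1"},
    {"SMA", "SMN1"},
    {"PD", "dopamine"},
    {"PD", "dopamine"},
]
scores = cooccurrence_scores(abstracts)
print(sorted(scores))
print(similarity(scores, "LIX1", "SMN1"))
print(similarity(scores, "LIX1", "dopamine"))
```

In this toy corpus, "LIX1" and "SMN1" never appear in the same abstract, yet they come out as similar because both co-occur with "SMA". That is the kind of link that no single abstract states and that a skimming interface could surface.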

PJ: Do you have any anecdotes about this research?

VN: One of the most memorable and encouraging things happened at a relatively early stage of the research, during my stay at the Information Sciences Institute, University of Southern California. Gully, the co-author of the SKIMMR article whose group I was visiting at that time, and I were presenting a very preliminary prototype of the SKIMMR methodology to Maryann Martone, a professor of neuroscience at the University of California San Diego. Maryann helped us set up an experimental corpus of articles on Spinal Muscular Atrophy (SMA), drawing on her experience of working with the SMA Foundation. We demonstrated an SMA instance of SKIMMR to her, letting her navigate the conceptual networks at will. After only about two minutes, she stumbled upon a few relationships that indicated experiments with the LIX1 gene in feline models of SMA. This was new and valuable knowledge to her, so she started to explore the related articles quite enthusiastically to learn more. Witnessing that made us very excited – maybe we just got lucky, but it was a clear indication of the potential of our research even in its very early stages.

PJ: What surprised you the most with these results?

VN: I can’t say anything surprised me so much that I’d be like, wow, I didn’t really expect this at all. But one thing that was perhaps a little counter-intuitive for me was how far one can go with the very simple representation underlying SKIMMR. We experimented with more types of relationships, based on the syntactic structure of the texts being explored and some sophisticated background knowledge, but eventually we decided to go with two simple ones, co-occurrence and similarity. Yet even such a lightweight and relatively abstract knowledge representation model can support complex knowledge acquisition, as I’m learning whenever I use SKIMMR myself. I’m not a biomedical scientist by training, but SKIMMR has helped me learn quite a lot about Spinal Muscular Atrophy and Parkinson’s Disease (our testing domains) by now. The process is perhaps a little serendipitous (or chaotic, if you want to call it that), but very time-efficient, especially in the beginning when one does not have a clue about the domain at all.

PJ: What kinds of lessons do you hope the public takes away from the research?

VN: I’d like people to picture the SKIMMR methodology and prototype as something that can soon help them a lot with their research on a daily basis, if they give us a hand in adjusting the prototype to their specific needs. I think pretty much everyone acknowledges that we can’t get much further with traditional publication search engines if we want to be able to make sense of all the relevant texts out there. On the other hand, one cannot expect machines to read everything on our behalf and present us with concise and complex summaries of the content, at least not in the near future. We need domain experts actively working with AI researchers to develop tools that can establish an efficient man-machine collaboration, letting computers sieve through the oceans of information and allowing humans to be creative when interpreting whatever relatively simple connections have been discovered automatically. This is obviously a rather abstract vision that involves many specific research questions, and SKIMMR is just a small step towards realizing a truly functional man-machine intelligence. Yet if our preliminary results manage to make a few more people think about how they could contribute towards realizing that ultimate goal, we’d be very happy.

PJ: Where do you hope to go from here?

VN: We’re currently working on methods for inferring more complex, taxonomic and domain-specific relationships from the basic co-occurrence and similarity ones, and once that’s ready, we can reflect it in a much more flexible and comprehensive way of navigating through the conceptual networks in SKIMMR. We are also working on using the conceptual networks to compute links between the actual articles, with simple semantic annotations that explain the nature of each connection. This has high potential for helping biomedical researchers by complementing services like PubMed’s retrieval of related articles.

PJ: If you had unlimited resources (money, lab equipment, trained personnel, participants, etc.), what study would you run?

VN: We have been running user studies to evaluate the SKIMMR instances ‘in the wild’, but the data we have collected so far do not allow for a sufficiently representative statistical analysis. Therefore we needed to devise an alternative, simulation-based method of evaluation. If we had an army of users working with SKIMMR in several different domains on a daily basis, we could compare the findings from the simulations with actual user behavior and learn many more things about both our method and the data processed by it.

PJ: Why did you choose to reproduce the complete peer-review history of your article?

VN: It is an interesting feature not so frequently seen elsewhere. I’ve always been for maximum openness and accountability in science, and making reviews available (and ideally also non-anonymous) is an important step in this direction. The full revision history may be quite helpful for other authors, as they could learn from our mistakes and imperfections. Last but not least, setting up such an open review process in the publishing industry in general could make the peer review process much more thorough and fair in the long run.

PJ: How did you first hear about PeerJ, and what persuaded you to submit to us?

VN: My co-author met some PeerJ representatives at a workshop and became very enthusiastic about the whole PeerJ business model and revolutionary attitude. This kind of infected me, and when we were thinking about where to publish the results of our joint work, we realized that we should support PeerJ by trying to publish in it as well.

PJ: Do you have any anecdotes about your overall experience with us? Anything surprising?

VN: The biggest (and very positive) surprise was definitely the quick, yet still quite thorough and factual review feedback. I didn’t really expect to have so much to process during my Christmas break after submitting the first version of the article in early December.

PJ: How would you describe your experience of our submission/review process?

VN: The submission wizard is perhaps a bit confusing and lengthy at first, but that may be just me, coming from the computer science field, where the meta-data required for publications is pretty minimalistic compared to the tradition in life sciences. Anyway, once I got used to the system, it was OK, as the web interface is quite clean and intuitive. The reviewing process felt very smooth and informative, with thorough and helpful reviews and sensitive supervision of the whole process by the Academic Editor. Whatever issues we had during the submission, the PeerJ team was ready to resolve them quickly, so the whole experience was hassle-free.

PJ: Did you get any comments from your colleagues about your publication with PeerJ?

VN: People I talked to were pretty interested in the publishing model and generally like it. They only regret that something like that doesn’t exist in the ‘hard core’ computer science field (most people I work with at my institute don’t do research that could be published in PeerJ).

PJ: Would you submit again, and would you recommend that your colleagues submit?

VN: Yes, definitely.

PJ: In conclusion, how would you describe PeerJ in three words?

VN: Disruptive, Supportive, Professional

PJ: Many thanks for your time!

Join Vit Novacek and thousands of other satisfied authors, and submit your next article to PeerJ. And until the end of August, if you engage with PeerJ articles or preprints, then you can publish for free!
