A TEXT POST

Author Interview with Alex Clark

Two weeks ago, we published “Fast and accurate semantic annotation of bioassays exploiting a hybrid of machine learning and user confirmation”. In this study, Alex Clark and his colleagues describe a hybrid machine learning / interactive method for marking up bioassay data. Alex shared his “very positive experience” of submitting to us on his blog so we got in touch with him, as we wanted to hear more about his work.

PJ: Can you tell us a bit about yourself?

AC: I grew up in New Zealand, and migrated abroad as scientists tend to do, living in the United States initially before settling in Canada. As a kid, I became fascinated with computer programming, but quickly realized that I did not want to make that my only profession, and so went to university and eventually ended up with a doctorate in chemistry in 1999. Pursuing a career that involves equal parts science and software engineering has been quite the balancing act, and in 2010 entrepreneur was added to my list of day jobs when I founded Molecular Materials Informatics, Inc., which is dedicated to helping bring chemical informatics into the modern software era. The most visible products to date are a variety of chemistry-themed mobile apps for Apple iOS and Google Android devices, though there are a number of advanced original algorithms keeping things moving under the hood.

While many of the projects that I work on are exclusive to my own company, about half of them are collaborative in nature, involving joint efforts with individuals and companies, such as Collaborative Drug Discovery, Inc.

PJ: Can you briefly explain the research you published in PeerJ?

AC: The research addresses the fact that when scientists set up a new screening experiment for testing small molecules for bioactivity, they document the details using plain scientific English text. This is a problem for informaticians, who would like to create software capable of analyzing activity measurements, which is rate-limited by the inability of a computer to determine whether two screening configurations are measuring the same thing. The solution is to express the experiments using semantic markup, where the important properties such as disease target, cell line, protein, measurement type, reference controls, etc., are annotated using a consistent scheme.

Past efforts to solve this problem have mainly focused on either fully automated parsing of text, or completely manual user-operated markup. Unfortunately, the former tends to have an unacceptable error rate, while the latter consumes far too much of a scientist’s time. We built a proof-of-concept software tool that splits the difference between the two extremes: taking the best of automated text-to-markup machine learning in order to get the right answer most of the time, while keeping the user in the loop to confirm when the automated assignments are correct, and to step in and intervene when they are not. In this way scientists can make their experiment descriptions useful to informatics software with just a few minutes of their time per experiment - a burden which decreases as more annotations are collected, which improves the quality of the training sets.

PJ: Do you have any anecdotes about this research?

AC: Once we built the initial prototype and had it working well in our hands, we demonstrated it to a number of colleagues in the industry. One of the many things we learned is that recognition of the problem is widespread: it seems like every organization that has collected a significant number of textual assay descriptions is well aware of the limitations, and many have already looked into trying to find a solution. I’m accustomed to having to provide a reasonably thorough introduction to why a problem is important and why current solutions are not as good as they could be, but in this case that part was pretty much taken as given.

PJ: What surprised you the most with these results?

AC: First of all that the first approach I tried worked well (black box natural language processing followed by Bayesian analysis). And secondly as I mentioned before, that explaining the need for this research was relatively easy due to high awareness of the importance of the problem and that it remains largely unsolved.

PJ: What kinds of lessons do you hope the public takes away from the research?

AC: That writing up an experiment in human readable text is only the first half of the exercise. To be fully useful, documentation has to be processed into a form that computers can use for precise searching, categorization and large-scale decision support informatics. If your data remains as words and arbitrary diagrams, it will remain just an isolated data point that will only ever be read by a handful of other humans. If it is machine readable, it will be able to influence every relevant scientific decision that follows. The research seeks to demonstrate that by balancing the best of natural language processing and the best of user interface design, it is possible to reduce the amount of time a scientist needs to invest in this process to a nominal commitment that is quickly paid back in terms of new capabilities.

PJ: Where do you hope to go from here?

AC: We intend to upgrade the prototype into a modular web interface that can be plugged into a number of data entry systems, starting with CDD Vault. As users annotate more bioassay descriptions with semantic terminology, the training set will continue to improve. As domain coverage increases, the likelihood that an assay can be marked up very quickly increases, i.e. the user just approves all the predicted annotations, rather than having to hunt through and dig them out. As the data grows, the capabilities that we can build on top of it grow too: being able to search for assay properties, or compare assays for similarity, are immediate examples, but there are larger scale options too: once the marked up data becomes prevalent, analysis software can observe trends over the entire domain of drug discovery, revealing patterns that might have otherwise been very difficult to spot.

PJ: If you had unlimited resources, what study would you run?

AC: First of all, I would hire enough expert professionals to painstakingly annotate every biological assay ever written down, and thus create an exhaustive training set. Then I would commission every creator of data entry software for biological content to make use of the annotation interface, so that all lab notebook software would provide scientists with the opportunity to conveniently describe their bioassays in a machine-friendly format.

PJ: Why did you choose to reproduce the complete peer-review history of your article?

AC: The reviews were thoughtful and constructive, and I saw no reason to keep them private. And since the first reviewer had taken the first step and made her identity known to us, it only seemed fair.

PJ: How did you first hear about PeerJ, and what persuaded you to submit to us?

AC: I heard about it before it went live, on a blog or a tweet, I forget which. The PeerJ decision was a combination of moral and financial reasons: I am personally a gigantic fan of open access scientific literature, since the peer review process is essentially an exercise in crowd sourcing, and the whole point of science is to be open. Unfortunately the current breed of scientific publishers has carried over its legacy cost structure from the dead-tree era, which means that scientists have a choice between reader-pays and author-pays, and the fees involved can be prohibitive to many. That system works fine if everyone who is involved in creating or consuming science is rolling around in excess grant money, but that certainly does not describe all of us. PeerJ could be summed up as bringing the lean startup technology movement to scientific publishing. In my opinion it’s not a moment too soon, and I welcome the opportunity to play a small part.
 
PJ: Do you have any anecdotes about your overall experience with us? Anything surprising?

AC: In context it is not surprising, but the responsiveness of the staff took some getting used to: having an email conversation with an identifiable person on the other end is unusual for scientific publishing. I tend to expect to receive automated messages from noreply@bigpublisher.com whenever I have an inquiry. The manuscript submission process gave me the overall impression that the journal was a partner with an interest in making this happen smoothly, rather than a system that is rather indifferent about my contribution.

PJ: How would you describe your experience of our submission/review process?

AC: The website for receiving submissions is very well designed. Given that it is quite detailed, it is no surprise that there are one or two ambiguities, but as long as the staff keep paying attention and iterating, I am optimistic that the design will reduce a significant amount of manual labour for both the authors and publishers. Also, the peer review was done very promptly, and no less thoroughly for its quick turnaround.

PJ: Did you get any comments from your colleagues about your publication with PeerJ?

AC: Just the usual congratulatory encouragement, but it’s early days yet.

PJ: Would you submit again, and would you recommend that your colleagues submit?

AC: I would ideally like to make PeerJ my go-to journal, but there are no categories for chemistry or cheminformatics, which means I can only use it when I occasionally venture out into bioinformatics. I look forward to the day when PeerJ branches out in the direction of chemistry, and/or when other publishing startups recognize that the PeerJ business model is successful and rush to fill all of the other vacant niches.

PJ: Anything else you would like to talk about?

AC: I wish the company every success. In terms of the bigger picture, it’s really important that PeerJ can successfully demonstrate that its lean business model works, because the contemporary journal fee structures are keeping authors and readers out of science. This is indefensible in the information age, but somebody has to step up and show that there is a better way.

PJ: In conclusion, how would you describe PeerJ in three words?

AC: Disrupting scientific publishing.

PJ: Many thanks for your time!

AC: My pleasure.

We encourage you to check out some of our Computational Science publications in PeerJ. Join Alex Clark and thousands of other satisfied authors, and submit your next article to PeerJ.

'I study how crabs sniff' - Author Interview

Yesterday we published the work of Lindsay Waldrop and her colleagues in which they modeled how the performance of the hermit crab’s antennae might change as they both grow and transition from water to air. We invited Lindsay to comment on her research and her experience publishing with us.

PJ: Can you tell us a bit about yourself?

LW: I am a postdoctoral research associate with Prof. Laura Miller at the University of North Carolina at Chapel Hill in the Departments of Biology and Mathematics.  I did my graduate work at the University of California, Berkeley with Prof. Mimi Koehl in the Department of Integrative Biology.

PJ: Can you explain the research you published in PeerJ?

LW: This work is an extension of my dissertation research on how crabs capture odors from their fluid environments. I studied both marine crabs and terrestrial hermit crabs and found that they use different ways of capturing odors. Marine crabs use a very dense tuft of chemosensory hairs on their first antennae to capture and hold a discrete sample of water close to these hairs; it’s important for them to hold this sample because it gives time for odor molecules to diffuse to the surfaces of the hairs so that they can interact with sensory dendrites. But for terrestrial hermit crabs, the diffusion of odor molecules is so much faster in air that they don’t need to hold on to a sample the way marine crabs do, which has likely caused a shift in the morphology of the hairs themselves.

The way fluid interacts with a structure depends a lot on how big the structure is, how fast it moves, and the properties of the fluid it’s in. So the scaling of a structure like a chemosensory hair during growth that operates in water is a big deal. Size changes that crabs experience between when they settle as juveniles (~4 mm in body length) and when they reach their adult size (~30-40 mm in body length) could drastically change the way their antennae interact with the odor-containing fluid. We know that marine crabs scale their antennae allometrically – that is, the antennae of small juveniles are relatively much bigger than the antennae of adults. This helps them continue to capture odor samples when they are small. But since terrestrial hermit crabs have antennae that capture odors in air and a different odor-capture mechanism, it was unclear how being small would impact a juvenile hermit crab’s ability to capture odor molecules. Our study looks at how the antennae of juvenile hermit crabs scale and uses a simple odor-capture model to determine how the scaling impacts odor capture.

PJ: Do you have any anecdotes about this research?

LW: [spoiler alert: gross but funny] I collected the animals used in the study from a very tiny island off the coast of Moorea, French Polynesia. I went out at night (because that is when the large hermit crabs are the most active) with a few other scientists who were collecting other animals, one of whom was collecting moths with a very bright headlamp. His light accidentally flashed me in the eyes, which triggered the very first migraine headache I ever had, not a very fun experience. I threw up as a result, and tried to sit in the dark without moving so that my head would hurt less. As my eyes adjusted to the dark, I looked down and saw tiny things start to cluster around my vomit. They were the hermit crabs that I was there to collect, and they were quite happily munching away on my former dinner. So I can add human vomit to the list of food sources for terrestrial hermit crabs, along with detritus, rotting plant matter, dead animals, and human excrement!

image

Terrestrial hermit crabs. Photo: Lindsay Waldrop

PJ: What kinds of lessons do you hope the public takes away from the research?

LW: When someone asks me what I study, I say “I study how crabs sniff!” which almost always makes someone smile and think. They have often never heard of such a crazy thing – sniffing crabs! – much less that someone has studied it for years. It’s often the first time they have really considered what it must be like to be a crab, what sort of challenges a crab has to face in daily life and how they could go about it all. I hope the public takes away that if you stop and look closely at nature, and consider how plants and animals do all the amazing things they do just to survive, you will always find something brilliant that will make you smile!

PJ: Where do you hope to go from here?

LW: Our mathematical model of odor capture in the paper is extremely simple but it gave us some interesting trends to investigate with a better model. I’m currently working with a mathematician to take a more realistic antenna geometry, based on the antennae morphology reported in the paper, and solve advection-diffusion equations to get a more accurate idea of how odor capture varies during growth. With a more accurate model, we hope to validate some of the trends that the simple model produced.

PJ: Why did you choose to reproduce the complete peer-review history of your article?

LW: I chose to release my peer-review history because first and foremost, I think the history shows how the paper has progressed to something that we’re really proud of. Our reviewers were very thorough and took us to task over the weaknesses of the odor-capture model and the interpretation of our results, but did it in ways that were deeply constructive and helpful. In the end, the finished paper is far higher quality than the preprint, and for that, I have my reviewers and the academic editor to thank.

Second, I think it’s a great example for people to see how the process of peer review really works. It’s a term that’s thrown around a lot, but I think the public has very little understanding of what it’s like to go through the process as a researcher. This is a very concrete example that I can point to when I teach students about constructive peer-review or when an interested layperson wants to know what the phrase really means. I really think this is a great example of how peer review ought to work.

PJ: How did you first hear about PeerJ, and what persuaded you to submit to us?

LW: I heard about PeerJ from colleagues on Twitter, where you seem to be very active! I am a strong proponent of open access research and wanted to be part of the movement to publish in journals where that was an option. However, I did my graduate work without a big grant; I funded it entirely with small grants pieced together, so I didn’t have the budget to pay the thousands of dollars in fees for open access. PeerJ allows me, as a former graduate student who independently funded their research, to publish open access.

In addition to that, your wonderful policy on waiving fees for undergraduate researchers really helped out. My two coauthors, Roxanne Bantay and Quang Nguyen, were both undergrads at UC Berkeley who helped conduct the research through the Undergraduate Research Apprenticeship Program. They did a brilliant job, and I was delighted that I could include them as authors on the paper for free to recognize their contribution to the work. I will continue to have an active undergraduate research program, so I anticipate sending more papers with undergrad authors your way in the future.

PJ: How would you describe your experience of our submission/review process?

LW: I enjoyed my experience publishing with PeerJ. The staff are very friendly and work quickly. The academic editor and reviewers were extremely constructive and professional. I was genuinely shocked by how fast I received a first decision on my manuscript. As an early career scientist, it is critical for me to get papers out in a timely manner, so the fact that the turn-around was so fast is a huge plus!

PJ: Would you submit again, and would you recommend that your colleagues submit?

LW: I will absolutely submit to PeerJ again, and I have already recommended that colleagues, particularly those with active undergraduate research programs, submit as well.

PJ: In conclusion, how would you describe PeerJ in three words?

LW: Fast, Open, and Affordable!

PJ: Many thanks for your time!

Join Lindsay Waldrop and thousands of other satisfied authors, and submit your next article to PeerJ.

DNA sequencing in the middle of the Pacific Ocean – Video



A group of researchers including scientists at San Diego State University overcame equipment failure, space constraints and shark-infested waters to do real-time DNA sequencing in a remote field location. Check out this video to find out how they did it!

Today we published the article in which they describe the sequencing and informatics pipelines established during the 2013 Line Islands research expedition, release the data generated during the expedition, and discuss some of the unexpected challenges in remote sequencing.

DNA sequencing in the middle of the Pacific Ocean – Author Interview

During the 2013 Southern Line Islands Research Expedition, a group of researchers brought a DNA sequencer on a ship in the central Pacific to do remote sequencing in real time. The most surprising thing is that it actually worked! Computer scientists and biologists successfully sequenced 26 marine microbial genomes and 2 marine microbial metagenomes. Sequencing out in the field allowed them to look at their data and develop new hypotheses without delay, enhancing the productivity of the research expedition. Today, we published an account of their trip and methods, and we invited Robert Edwards, corresponding author on the study, to answer a few questions.

PJ: Can you tell us a bit about yourself?

RE: I am an Associate Professor in Computer Science at San Diego State University. My background is in Microbiology; my PhD was studying the regulation of nitrogen fixation in Klebsiella pneumoniae, but I have also worked on bacterial pathogenesis, in particular Salmonella and E. coli infections. I have been focusing on bioinformatics for over a decade, and have built tools for microbial genomics (SEED, RAST) and metagenomics (MG-RAST, RTMg, crAss, etc.) analysis. Part of that focus, which was indispensable here, is an expertise in Linux, which I have been using for nearly 20 years. My research group focuses on building solutions for biologists to enable them to analyze their data, as well as de novo analysis of biological datasets.

PJ: Can you briefly explain the research you published in PeerJ?

RE: We took a next-generation sequencer, an Ion Torrent, on a research expedition, and toured the Line Islands in the central Pacific, sequencing microbes and analyzing their genomes while we traveled. The analysis led to hypotheses that we could test while we were in the field, and shows that next-generation sequencing can be used by biologists wherever their research leads them. This has massive implications for ecology, but also for conservation, bioprospecting, and a whole suite of other sequence-based technologies.

image

Yan Wei Lim, SDSU graduate student and author on the paper, exploring corals in the southern Line Islands. Photo: Rob Edwards

PJ: Do you have any anecdotes about this research?

RE: Most people thought we were completely crazy when we said that we would do this. The protocols for next-generation sequencing are designed to be used in clean laboratories, and the data requires high performance computing. Seawater tends not to prolong the life of electronic components, and so everything was really stacked against us. However, those weren’t the real challenges that we faced. The sample preparation was easy, especially for me, as Yan Wei [first author] did it all! It helps to have someone with her amazing ability to fine tune experiments to ensure their success, of course.

The computational aspects were much more challenging than the biology. As we note in the paper, there were some significant problems that we had to overcome, particularly a broken OneTouch. It would have been easy at home in San Diego – in principle one call to tech support, or an hour searching the Internet, and the problem would have been solved with an express shipment. However, in the middle of the ocean, with no chance of outside help except very expensive satellite phone communications, we were on our own.

You become inspired to be creative when you have to solve all the problems without outside help, and become especially creative when all your colleagues are out SCUBA diving with dolphins and sharks and you are stuck inside working on a computer!

The other anecdote to share was the first time we figured out it had actually worked. We had finally annotated our sequences, and the annotations were being written to a file. It quickly became clear that it was a Vibrio genome, as we expected. The dive teams had left the boat in zodiacs, and were preparing to enter the water, but I couldn’t wait to share the news, so I radioed all the teams to tell them we’d succeeded in sequencing and analyzing the genomes. Most of the scientists aboard didn’t share my enthusiasm and were wondering why I was holding up their dive!

image

SDSU Postdoctoral Researcher Andreas Haas takes a break from processing the water samples in the large tubes to survey the view of Vostok Island from the aft deck of the Hanse Explorer. Photo: Rob Edwards

PJ: What surprised you the most with these results?

RE: That it worked was the biggest surprise – not only to us, but also to the naysayers who said it was impossible! That we can take next-generation sequencing and run it anywhere, even without internet access, that we can analyze genomes and metagenomes, and come up with meaningful results. That is the truly transformative result of the paper.

PJ: What kinds of lessons do you hope the public takes away from the research?

RE: That sequencing is really everywhere, and impacting every aspect of biology. Although our work is not at all about medicine, clearly the ability to sequence DNA anywhere in the world is going to have huge impacts on society. We’re also pushing the boundaries of what is technically feasible and at the same time learning new things about ecology and evolution, areas that we have studied for hundreds of years.

PJ: Where do you hope to go from here?

RE: We talked about next generation sequencing on a boat for a long time, and we learned a lot of lessons by doing it. One of the key lessons we learned is that the “thinking” part quickly becomes limiting. We can generate so much data so easily, but we need more people to actually think about it while we are there! In the next iteration I think we will try and post the data while we are at sea to crowd-source the questions we should be answering while we are still there.

PJ: If you had unlimited resources, what study would you run?

RE: These results, together with our ten years of studying these uninhabited atolls, are leading us to the conclusion that there are highly related bacteria that are resident on the reef, but that they are supplemented by transient bacteria. Of course, they all share their DNA via horizontal gene transfer (HGT). A detailed and comprehensive study on these islands would be the ideal place to unravel the role of HGT in the evolution of bacteria in natural, unspoiled habitats.

PJ: Why did you choose to reproduce the complete peer-review history of your article?

RE: It is a novel feature of PeerJ, and it is also appealing because of the transparency that it inspires. I often read papers and think “oh that sentence was introduced in response to the reviews” or “if I’d reviewed this paper I would have asked x, y, or z”. By seeing the reviews you can see what other people are thinking about the paper, you can see how the authors adapted the paper in response to the reviews, and you can see how science progresses.

For this paper, when you read the reviews, one of the biggest criticisms was that the code was not available. That was a clear oversight on our part: all of the code was already in public repositories, but we never really thought that was what people wanted to see.

PJ: How did you first hear about PeerJ, and what persuaded you to submit to us?

RE: I first heard about it through the affiliation with O’Reilly publishing. I’ve long been a fan of O’Reilly publishing (it was the camel book that got me into Perl), and of Tim O’Reilly’s philosophy on publishing, DRM, and all things to do with electronic media. Since then we’ve published several papers in PeerJ including the Line Islands collection, of which this article is a part. It was a natural home for this article and we never considered publishing it anywhere else.

PJ: Do you have any anecdotes about your overall experience with us? Anything surprising?

RE: I was really surprised to be working with the journal staff while the paper was being revisited after the first round of review and before it went back to the academic editor. That seemed like such a natural way to work that I was surprised that I haven’t experienced that before. It allowed me to present the best possible article to the reviewers and unburdened others from correcting my mistakes!

PJ: How would you describe your experience of our submission/review process?

RE: The ease of submission was fantastic. Working in a Linux environment I use Open Office for everything, and it is awesome to be able to submit native documents. With reference support in Zotero, there is no reason not to use open software. It’s time the rest of the open access community considered other forms of open!

PJ: Did you get any comments from your colleagues about your publication with PeerJ?

RE: Everyone was 100% supportive of the decision to use PeerJ. Not everyone understood the decision to publish a preprint, but it is there and no one has complained (to me!).

PJ: Would you submit again, and would you recommend that your colleagues submit?

RE: Definitely. I would not hesitate to submit to PeerJ again.

PJ: In conclusion, how would you describe PeerJ in three words?

RE: Open. Accessible. Science.

PJ: Many thanks for your time!

RE: You’re welcome!

Check out some of our Marine Biology publications in PeerJ. Submit your next article to PeerJ, and join Robert Edwards and thousands of other satisfied authors.

Passeriform birds introduction - Author Interview

Yesterday we published ‘A comparison of success rates of introduced passeriform birds in New Zealand, Australia and the United States’, in which Michael Moulton and Wendell Cropper compiled lists of successful and unsuccessful passeriform introductions to nine sites in these three countries.

Michael Moulton is Associate Professor in the Department of Wildlife Ecology & Conservation at the University of Florida, and we felt it would be informative to ask him a few questions on his work and his experience publishing with us.

PJ: Can you tell us a bit about yourself?

MM: My colleagues and I are interested in introduced species, chiefly birds.  The questions we are addressing involve determining why some species succeed and others fail when introduced to new places.

PJ: Can you briefly explain the research you published in PeerJ?

MM: Our paper involves a comparison of introduction success rates among passerine birds across sites in Australia, New Zealand and the United States.  We are interested in the importance of site-level factors as the primary determinant of introduction fate.

PJ: What surprised you the most with these results?

MM: Perhaps our greatest surprise was how much lower the success rate for passerine introductions was in the United States, even when we limited our analysis to a comparable time (late nineteenth century) and to reasonably homogeneous sets of species.

PJ: What kinds of lessons do you hope the public takes away from the research?

MM: Several studies have argued that the most important factor in determining introduction outcomes is propagule pressure, meaning the number of individuals released per species. There has clearly been a rush to assume that this is a well-established fact. Unfortunately, the propagule pressure hypothesis is based on an incomplete historical record.

PJ: How was your overall experience with us?

MM: PeerJ appealed to me for its speed of publication and its open access. I was also impressed with how many well-known scientists are serving as Academic Editors. Lastly, the production of the final papers is truly top notch. I really feel that this is the publishing model of the future, and I am pleased to be a part of it.

PJ: How did you first hear about PeerJ, and what persuaded you to submit to us?

MM: A colleague mentioned it to me, and I checked it out and submitted the paper immediately.

PJ: Do you have any anecdotes about your overall experience with us? Anything surprising?

MM: I have been very impressed with the care and attention to detail from the production team. The production people are incredibly fast and sharp-eyed to notice little things that might otherwise go undetected (extra commas, footnote labels, and all that sort of thing). For example, we used several papers by an author (Pfluger) published in the 1890s in the Oregon Naturalist. Most of his papers had in the title the term “German song birds”. However, one of his installments was slightly different and Jackie Thai quickly zeroed in on that. It turned out that Pfluger had actually used a slightly different title for that particular installment, but Jackie’s sharp eye and attention to detail was quite impressive.
 

PJ: How would you describe your experience of our submission / review process?

MM: I would say that it was very fast and that communication with the Academic Editor and the production team has been exceptional. Nobody likes it when journals take months to process their submissions.
 
PJ: Did you get any comments from your colleagues about your publication with PeerJ?

MM: Yes. And I can safely say that PeerJ's reputation as a great place for fast turnaround and top-flight production is growing rapidly.

PJ: Would you submit again, and would you recommend that your colleagues submit?

MM: I will definitely submit again to PeerJ.

PJ: In conclusion, how would you describe PeerJ in three words?
 
MM: Fast, accurate and attractive.

PJ: Many thanks for your time!

Join Michael Moulton and thousands of other satisfied authors, and submit your next article to PeerJ.