
Free software for biologists pt. 1 – writing tools

This is the first in a planned series of five posts, to cover (1) writing tools, (2) data management and analysis, (3) preparing figures, (4) writing presentations and (5) choosing a new operating system. They will eventually be collated here.

Document-writing tools

Microsoft Word remains the default word processing software for the majority of people. Its main advantage is precisely that ubiquity, which makes collaboration relatively straightforward. The track changes function is appreciated by many people, though I would argue it’s unnecessary and can lead to problems; see below for tips on collaborative writing.

If you’re going to be spending a large proportion of your life writing then Word is not the ideal solution, especially for scientists. On this point it’s worth making clear that ‘scientist’ is just another word for ‘writer’. We write constantly: papers, grant proposals, lecture notes, articles and books. Professional writers use other commercial software such as Scrivener, but that is simply paying for a different set of trade-offs. Microsoft Word has improved in recent years, but there are still problems. The main limitations are:

  • It’s terrible at handling large documents (e.g. theses, or anything more than a couple of pages). Do you really need to do all that scrolling?
  • Including equations or mathematical script is difficult, and the results always look poor.
  • Embedded images are reproduced at low resolution.
  • Files are unnecessarily large in size.
  • The .docx format is very unstable. Send it to a collaborator on another computer (even with Windows) and it will appear different, with mangled formatting.
  • The default appearance doesn’t look very professional, and improving it takes forever.
  • It keeps reformatting everything as you go along, particularly when you combine sections from different documents.

I didn’t realise how much time I was spending fighting Word’s defaults until I tried other software. Escaping isn’t tricky, as this blog post reveals. Several options are available to the scientific writer, and they will improve both the quality and the experience of writing.

LibreOffice Writer. Want something that looks exactly like Microsoft Word and does everything that Word does, but don’t fancy paying for it? Just download LibreOffice and you’ll find it works equally well (if not better). This is perhaps the best option if you have an out-of-date or bootlegged version of Word and can’t access updates. With LibreOffice you will be able to open, edit and share all of your existing Word documents, and even save them in .doc format. The native format is .odt (OpenDocument Text); the fact that the British Government recommends it as a stable document format tells you something. Your Word-using colleagues will be able to open .odt files as well.

Markdown. This has grown in popularity with scientists as it’s easier to use than professional tools such as LaTeX (see below) but handles many of the document-formatting tasks that scientists need. You can even write Markdown in Word, though why would you? Combining it with pandoc makes it even more powerful, because you can convert a Markdown document into almost any other format to match the requirements of a journal (or your collaborators). This is much easier than achieving the same in LaTeX, which requires some programming nous. A good, free Markdown editor is ReText.
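To give a flavour, here is a minimal sketch (the file names are invented for illustration). A Markdown manuscript is simply plain text with lightweight markup:

    # Introduction

    Size distributions in even-aged plant cohorts are often *bimodal*,
    which raises obvious questions about **competition**.

    ## Methods

A single pandoc command then converts it to whatever format a journal or collaborator requires, for example:

    pandoc manuscript.md -o manuscript.docx
    pandoc manuscript.md -o manuscript.pdf

Recent versions of pandoc can also build the reference list from a bibliography file (see the --citeproc option in its documentation).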

LaTeX. The gold standard, as used by many professional writers and editors (it’s pronounced lay-tech; the final letter is a chi). All my handouts are prepared in LaTeX, as are my presentations, manuscripts, in fact pretty much everything I write apart from e-mails. The problem is that learning LaTeX takes time. Most word processor programs run on the principle of WYSIWYG (What You See Is What You Get), whereas in LaTeX you need to explicitly state the formatting as you go along.
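To show what that means in practice, here is the smallest complete LaTeX document; a trivial sketch, not something you’d submit, but every piece of formatting is an explicit command rather than a menu option:

    \documentclass{article}
    \begin{document}
    \section{Introduction}  % a numbered heading, styled by the document class
    Trees are \emph{not} lollipops.  % emphasis is marked up, not clicked
    \end{document}

Compile it with pdflatex and the layout decisions (fonts, spacing, numbering) are made consistently by the document class, which is exactly why the output looks professional without any fiddling.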

There are a number of gateway programs which allow you to write in LaTeX but with a more familiar writing environment. These therefore ease the transition and can show you the potential. I know many people who swear by LyX. My preferred editor is Kile, though this will involve a steeper learning curve. A great help while writing in LaTeX is to be able to see what the document looks like as you write. I pair Kile with Okular, but there are many other options that are equally good.

As a health warning, before diving in at the deep end, bear in mind that working in LaTeX will initially be much slower. It takes time to become competent, and there are annoying side issues that remain frustrating (installing new fonts, for example, is bizarrely complex). While the majority of journals and publishers accept LaTeX submissions, and most will provide a template to format your manuscripts, there are still a few who require .doc format. This is changing, though, thanks to demand from authors.

Collaborative writing

In the old days, when you collaborated on writing a paper, it required dozens of e-mails to be sent round as each author added her comments. Version control became impossible as soon as there were multiple copies and it was easy to lose track. Some people persist in working this way despite the fact that there are loads of tools that make this unnecessary. By using an online collaborative-writing site, multiple authors can contribute simultaneously, and you can even chat to each other while you’re at it.

The best-known is of course Google Docs, which has the virtue of a familiar interface. It’s not designed for scientific writing though, and unsurprisingly there are more specific tools out there. While I’ve not used it, Fidus Writer looks like a promising option, with a layout familiar from Google Docs but better suited to the demands of science writing.

The one I’ve used most often is Authorea, which has the major advantage that anyone can write in any style and on any platform. This means that one person can write the technical parts in LaTeX while another adds sections in Markdown, or you can cut-and-paste text from a normal word processor. The final document can be exported in your format of choice. This solves the problem of needing all your collaborators to use the same software. My favoured option (for LaTeX users only) is shareLaTeX, though writeLaTeX looks to be equally good.

I haven’t mentioned GitHub here, even though I know many people who use it to maintain version history in collaborative work. This is particularly true of programmers who need to trace changes in code as it’s being developed. The same functionality can be very helpful in writing manuscripts, but GitHub is not easy to use, and it’s rare in biology that you will find yourself working with a pool of collaborators who know what they’re doing.

As a final note, I discourage the use of tracked changes due to many bad experiences. The main issue is that once more than one person has commented on a document it gets completely mangled, and it can take a long time to reconstruct the flow of the text once all the contradictory changes have been accepted. Furthermore, if your reason for having a WYSIWYG processor is that you want to see how the final document will look, then tracked changes remove that benefit and make your document unreadable. Lastly, whenever I’ve been forced into using them (on one notable occasion by a journal editor) it has invariably introduced errors into the text. By using some of the software recommended here there should be no need for the track changes function at all.

References and citations

The old standard for reference management used to be Endnote, which is an expensive solution if you don’t have either an institutional license or a student discount. Much the same can be said of Papers, which I hear is excellent but have never used.

I strongly recommend Mendeley to all my students. Think of it as iTunes for papers. It’s free and integrates smoothly with all the word processing software above. Even better is the online functionality, which means you can synchronise documents across all your devices, add comments, and share with colleagues. So you can read a PDF on the train, make notes on it, then open your office computer and retrieve all the notes straight away before dropping the citation directly into your manuscript. There are many tutorials online, and the few hours you spend learning to use it will be rewarded by much time saved. Apparently Zotero, which is also free, offers similar functionality, but I’ve not tried it.

Having said all that, I don’t use Mendeley. If you’re using LaTeX then citing references is done through BibTeX, and I prefer kBibTeX to manage my reference library as it integrates nicely with Kile. This is only a personal choice though, and Mendeley would achieve the same result.
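For the curious, BibTeX keeps your reference library as plain text. The entry below is an invented example rather than a real reference:

    @article{smith2015,
      author  = {Smith, Jane},
      title   = {An invented example},
      journal = {Journal of Examples},
      year    = {2015},
      volume  = {12},
      pages   = {1--10}
    }

In the manuscript you then write \cite{smith2015} wherever the citation should appear, and BibTeX assembles and formats the reference list to match whichever style file you (or the journal) choose.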

 

In praise of backwards thinking

What is science? This is a favourite opening gambit of some external examiners in viva voce examinations. PhD students, be warned! Imagine yourself in that position, caught off-guard, expected to produce some pithy definition that somehow encompasses exactly what it is that we do.

It’s likely that in such a situation most of us would jabber something regarding the standard narrative progression from observation to hypothesis then testing through experimentation. We may even mumble about the need for statistical analysis of data to test whether the outcome differs from a reasonable null hypothesis. This is, after all, the sine qua non of scientific enquiry, and we’re all aware of such pronouncements on the correct way to do science, or at least some garbled approximation of them.* It’s the model followed by multiple textbooks aimed at biology students.

Pause and think about this in a little more depth. How many great advances in ecology, or how many publications on your own CV, have come through that route? Maybe some, and if so then well done, but many people will recognise the following routes:

  • You stumble upon a fantastic data repository. It takes you a little while to work out what to do with it (there must be something…) but eventually an idea springs to mind. It might even be your own data — this paper of mine only came about because I was learning about a new statistical technique and remembered that I still had some old data to play with.
  • In an experiment designed to test something entirely different, you spot a serendipitous pattern that suggests something more interesting. Tossing away your original idea, you analyse the data with another question in mind.
  • After years of monitoring an ecological community, you commence descriptive analyses with the aim of getting something out of it. It takes time to work out what’s going on, but on the basis of this you come up with some retrospective hypotheses as to what might have happened.

Are any of these bad ways to do science, or are they just realistic? Purists may object, but I would say that all of these are perfectly valid and can lead to excellent research. Why is it then that, when writing up our manuscripts, we feel obliged — or are compelled — to contort our work into a fantasy in which we had the prescience to sense the outcome before we even began?

We maintain this stance despite the fact that most major advances in science have not come through this route. We need to recognise that descriptive science is both valid and necessary. Parameter estimation and refinement often have more impact than testing a daring new hypothesis. I for one am entranced by a simple question: over what range do individual forest trees compete with one another? It is a question that can only be answered with an empirical value. To quote a favourite passage from a review:

“Biology is pervaded by the mistaken idea that the formulation of qualitative hypotheses, which can be resolved in a discrete unequivocal way, is the benchmark of incisive scientific thinking. We should embrace the idea that important biological answers truly come in a quantitative form and that parameter estimation from data is as important an activity in biology as it is in the other sciences.” (Brookfield 2010)


Over what distance do these Betula ermanii trees in Kamchatka compete with one another? I reckon around three metres but it’s not straightforward to work that out. That’s me on the far left, employing the most high-tech equipment available.

It might appear that I’m creating a straw man of scientific maxims, but I’m basing this rant on tenets I have received from reviewers of manuscripts and grant applications, or been given as advice in person. Here are some things I’ve been told repeatedly:

  • Hypotheses should precede data collection. We all know this is nonsense. Take, for example, the global forest plot network established by the Center for Tropical Forest Science (CTFS). When Steve Hubbell and Robin Foster set up the first 50 ha plot on Barro Colorado Island, they did it because they needed data. The plots have led to many discoveries, with new papers coming out continuously. Much the same could be said of other fields, such as genome mapping. It would be absurd to claim that all the hypotheses should have been known at the start. Many people would refine this to say that the hypothesis should precede data analysis (as in most of macroecology), but that’s still not the way that our papers are structured.
  • Observations are not as powerful as experiments. This view is perhaps shifting with the acknowledgement that sophisticated methods of inference can extract patterns from detailed observations. Take, for example, this nice paper using Bayesian analyses of a global dataset of tropical forests to discern the relationship between wood density and tree mortality. Ecologists frequently complain that there isn’t enough funding for long-term or large-scale datasets to be produced; we need to demonstrate that they are just as valuable as experiments, and recognising the importance of post-hoc explanations is an essential part of making this case. Perfect experimental design isn’t the ideal metric of scientific quality either; even weak experiments can yield interesting findings if interpreted appropriately.
  • Every good study should be a hypothesis test. We need to get over this idea. Many of the major questions in ecology are not hypothesis tests.** Over what horizontal scales do plants interact? To my mind the best element of this paper by Nicolas Barbier was that the authors determined the answer for desert shrubs empirically, by digging them up. If they’d tried to publish with that as the main focus, I doubt it would have made it into a top ecological journal. Yet that was the real, lasting contribution.

Still wondering what to say when the examiner turns to you and asks what science is? My answer would be: whatever gets you to an answer to the question at hand. I recommend reading up on the anarchistic model of science advocated by Paul Feyerabend. That’ll make your examiner pause for thought.


* What I’ve written is definitely a garbled approximation of Popper, but the more specific and doctrinaire one gets, the harder it becomes to achieve any form of consensus. Which is kind of my point.

** I’m not even considering applied ecology, where a practical outcome is in mind from the outset.

EDIT: added the direct quotation from Brookfield (2010) to make my point clearer.

Two lumps please

Here’s a quick thought experiment. Imagine you have a spare flowerbed in your garden, in which you scatter a handful of seeds across the bare ground. You then ignore them, and come back some months later. What will have happened?* Your expectation might be that you will have a healthy patch of plants, all about the same size. Some might be larger or smaller than average, but overall you’d expect them to be pretty similar; they have, after all, experienced identical conditions. This is known as a unimodal size distribution.

You’d be wrong. In fact, it’s more likely that your plants will have separated into two or more size groupings. There will be a set of larger plants, spread apart from one another, and which dominate the newly-formed canopy. In between them will be scattered other plants of smaller size. This results in a bimodal (or multimodal) size distribution. There isn’t a standard, expected size; instead there will be different size classes present.


A normal, unimodal distribution of sizes (left) is what you might expect to see when all plants are the same age and growing in the same conditions. In fact it’s more common to see a bimodal size distribution (right), or something even more complicated.

This observation is nothing new. Much was written about the issue from the 1950s through to the 1970s, particularly in the context of forest stands. The phenomenon was widely recognised but remained paradoxical.

I stumbled upon this old literature back in 2010 when I published a small paper based on a birch forest in Kamchatka which showed a clearly bimodal size distribution. I didn’t need to go all the way to Kamchatka to find a stand with this feature, but since I had the data it made sense to use it. I used the spatial pattern of stems to infer that the bimodality was the result of asymmetric competition (i.e. that large trees obtain disproportionately more resources than small trees, which is definitely true in terms of light capture). All the trees were the same age, but the larger stems were spread out, with the smaller stems in the interstices between them. Had the bimodality been the result of environmental drivers we would expect there to be patches of large and small stems, but in fact they were all mixed together.


This is the stand of Betula platyphylla with the bimodal size distribution described in Eichhorn (2010). If it looks familiar, it’s because the strapline of this blog is a picture of us surveying it. The white lights in the photo aren’t faeries; they’re reflections of mosquito wings in the camera flash. So many mosquitoes.

Three things struck me when I was reading the literature. The first was that hardly anyone had thought about multimodal size distributions in cohorts for several decades**. This was a forgotten problem. The second was that the last major review of the phenomenon back in 1987 had concluded that asymmetric competition was the least likely cause — which conflicted with my own conclusions. Finally, I had no difficulty in finding other examples of multimodal size distributions in the literature, but authors kept dismissing them as anomalous. I wasn’t convinced.

Analysing spatial patterns is all well and good, but if you want to really demonstrate that a particular process is important, you need to create a model. Enter Jorge Velazquez, who was a post-doc with me at the time but now has a faculty position in Mexico. He built a simple model in which trees occupy fixed positions in space and can only obtain resources from the area immediately around themselves. Larger trees can obtain resources from a greater area. When two trees are close to one another, their intake areas overlap, leading to competition for resources.


When there are two individual trees (i and j), each of which obtains resources from within a radius proportional to its size m, the overlap is determined by the distance d between them. Within the area of overlap the amount of resources that each receives depends on the degree of asymmetric competition, i.e. how much of an advantage one gets by being larger than the other. This is included in the model as a parameter described below.

This is where asymmetric competition is introduced as a parameter p. When p = 0, competition is symmetric, and resources are evenly divided between two trees when their intake areas overlap. When p = 1, each tree receives resources in direct proportion to its size (i.e. a tree that’s twice as large will receive two thirds of the available resources). Increasing p makes competition ever more asymmetric, such that the larger competitor receives a greater fraction of the resources being competed for. In nature we expect asymmetric competition to be strong, because a taller tree will capture most of the light and leave very little for those beneath it.
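Spelling that out, with the caveat that this is my shorthand for the model rather than the exact notation of the paper: two trees i and j, with sizes m_i and m_j and intake radii r_i and r_j, compete only when their circles overlap, and within the overlap the share of contested resources going to tree i takes a standard asymmetric-competition form:

    % competition occurs only where the intake circles overlap:
    d_{ij} < r_i + r_j
    % share of the contested resources received by tree i:
    f_i = \frac{m_i^p}{m_i^p + m_j^p}

You can check this against the description above: with p = 0 each tree gets exactly half, whatever its size; with p = 1 a tree twice the size of its neighbour gets two thirds; and as p increases the larger tree takes nearly everything.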

We applied the model to data from a set of well-studied forest plots in New Zealand. Not only did we discover that two thirds of these plots had multimodal size distributions, but also that our model could reproduce them.

We then started running our own thought experiments. What if you changed the starting patterns, making them clustered, random or dispersed? That turned out to have very little effect on size distributions. What about completely regular patterns? That’s when things started to get really interesting.

By testing the model with different patterns we discovered four important things:

  • Asymmetric competition is the only process which consistently causes multimodal size distributions within simulated cohorts of plants. Nothing else we tried worked.
  • Asymmetric competition is the cause, not the consequence, of size differences in the population.
  • The separation of modes is determined by the length of time it takes for competition in the cohort to start, which usually reflects the distance between individuals.
  • The number of modes reflects the effective number of competitors that each individual has.

What does all this mean? Given that asymmetric competition is normal for plants, I would argue that we should expect to see multimodal size distributions everywhere. In fact, seeing unimodal size distributions should be a surprise. Don’t believe me? Grab some seeds, give it a go, and tell me if I’m wrong.

You can read our new paper on the subject here. If you can’t get hold of a copy then let me know.


* Luckily this is a thought experiment, because in my garden the usual answer is ‘everything has been eaten by slugs’.

** I should stress here that I’m specifically referring to multimodality in size distributions of equal-aged cohorts. When several generations overlap then the distribution of sizes reflects the ages of the individuals. If multiple species are present this adds additional complications, and in fact size distributions of species across communities have been a hot topic in the literature of late. This is very interesting but a completely different set of processes are at work.

We’re all stupid to someone

I spend an increasing proportion of my time collaborating with engineers and theoretical physicists. It keeps me on my toes and I’ve had to adjust to very different research cultures. The engineers, for example, get particularly excited by designing a technical solution to a problem. The long haul of data collection and statistical analysis has less appeal; once they’ve proven it can be done then they’re itching to move on to the next challenge. Likewise physicists genuinely do spend meetings in front of whiteboards sketching equations, which leaves me feeling a bit frazzled. Nevertheless, I’ve learnt that if an idea can’t be expressed mathematically then it hasn’t been properly defined. That turns out to apply to a lot of verbal models in ecology.

Both engineers and physicists are ready to publish at an earlier stage than most ecologists would, and their papers are a model of efficiency in preparation. Not for them a lengthy waffle of an introduction, followed by an even more prolonged and rambling discussion. Cut to the point, make it clearly, then wrap up. It makes me wonder whether we’re doing something wrong in ecology. I certainly don’t enjoy either reading or writing long papers, and I can’t fully justify our practice.

I also find myself fielding questions or tackling issues that would never come up when chatting to an ecologist. One of the misapprehensions I’ve had to counter is the idea that trees are lollipops. It might be more computationally efficient to assume that trees are spheres of leaves on a stick, and it can lead to some elegant mathematical solutions, but the outcomes are going to depart from natural systems pretty rapidly. Our disciplinary training leads us to consider particular assumptions to be perfectly reasonable, despite them sounding ridiculous to others or bearing little resemblance to the real world. (Even within their own field, forest ecologists are not immune to this syndrome.)

Understanding how another researcher arrived at their assumptions can be informative — sometimes it boils down to analytical frameworks, computational efficiency or technological limitations, all of which are valid reasons to consider accepting a proposition that on first hearing might sound far-fetched. Likewise it helps to have our own assumptions challenged. Sometimes we are able to justify and defend them. Other times they leave us exposed, which is when we know we’re onto something important.

It’s also a sad but common trait within all social groups to mock outsiders for making mistakes about things that appear self-evident to those on the inside. Ecologists can easily play the same game, but make no friends by doing so. I had a chat with one of my collaborators this week who was itching to find a small tree on campus, scan it using ground-based LiDAR, then strip and record the sizes of all its leaves. It’s a perfectly reasonable idea (if a lot of hard work). The main stumbling block is that it’s the middle of February and we’re a good three months at least from having full leaf canopies to play with. An obvious problem? Only to someone who spends their whole life thinking about trees. We had a laugh about it then moved back to our simulations, which have the considerable benefit of not shedding their leaves seasonally.

This kind of interaction only makes me wonder what crazy things I’m responsible for coming out with in our meetings. It also makes me grateful to my collaborators for their patience in humouring me, because I’m pretty sure that I come across as an idiot more often than I realise. This to me is the greatest pleasure of interdisciplinary collaborations. We could all spend the rest of our careers treading the same academic paths, publishing in the same journals, and not need to stretch ourselves quite as far. By heading way outside our comfort zones we all end up learning more than we expected to, so long as we don’t mind feeling stupid every now and again (which happens every time I get tangled in algebra). If you’re not willing to be wrong then you’re not willing to learn. And if I end up the subject of an amusing anecdote at a theoretical physics meeting? That’s fine by me. I hope it raises a good laugh. As a wise man once said, ridicule is nothing to be scared of.