
Why should anyone care about Ugandan lianas?


The liana team surveying in 2015 (Takuji Usui, Julian Baur and first author Telma Laurentino). Bridget Ogolowa (far left) did not participate in the study. Photo by Line Holm Andersen.

Habent sua fata libelli, as the Latin saying goes: 'little books also have their destinies'. I'd like to think that the same is true of papers. Not every scientific publication appears in a major journal, attracts media attention, or becomes a highly-cited classic. Some, perhaps, are never read again by anyone. This doesn't mean that publishing them wasn't valuable. A paper represents a new piece of knowledge or insight that adds to our total understanding of the world. And in some cases its small part in the greater whole is the main reason why it matters.

As an example, our latest paper just came out in African Journal of Ecology, a minor regional journal with an impact factor so small (0.797 in 2017) that in the metric-obsessed world of Higher Education it barely registers. Some would argue that the effort of publishing in such a low-status journal is a waste of time*. Why bother?

In this case, our study — small and limited in scope as it was — adds an important point on the map. Over recent years it has been noted that the abundance of lianas is increasing in South American forests. This process, sometimes known as ‘lianification’, is troubling because lianas can impede the growth of forest trees, or the recovery of forests following disturbance (including logging). At a time when we need forests to capture carbon from the atmosphere, an increase in the abundance of lianas could be exactly what we don’t want.

The causes of this increase in lianas are unknown, and it is also uncertain how widespread the effect might be. The best evidence that it's happening comes from neotropical forests**, but we can't be sure whether the same process is occurring in Southeast Asia, or Sri Lanka, or Africa. If the driver is a global one, for example a change in the climate (warming, higher carbon dioxide concentrations, or longer dry seasons), then we would expect the same trend to be occurring everywhere. If it's a purely local effect within South America then it might reflect historical factors, modern disturbance or the particular composition of plant communities.

It’s not just that we don’t know whether lianas are increasing in all parts of the world simultaneously; for most forests we don’t even know how many lianas were there in the first place. We could only find evidence of four published studies of liana abundance in the entirety of Africa, of which two were in secondary or transitional forests. That means only two previous studies on the continent had measured lianas in a primary forest. If we want to monitor change then we first need a starting point.


Location of our study in Kanyawara, Kibale National Park, Uganda. Figure 1 in Laurentino et al. (2018).

What did we find? Actually it turns out that liana densities in our forest were quite similar to those seen elsewhere in the world. An average liana basal area of 1.21 m²/ha is well within the range observed in other forests, as are the colonisation rates, with 24% of saplings and 57% of trees having at least one liana growing on them. These figures are unexceptional.

What does this tell us about lianification? To be completely honest, nothing. Or at least not yet. A single survey can’t say anything about whether the abundance of lianas in Africa is increasing, decreasing, or not changing at all. The point is that we now have baseline data from a part of the world where no-one had looked before. On their own these data aren’t particularly interesting. But considering the global context, and the potential for future studies to compare their work with ours, means that we have placed one more small piece in the jigsaw. And for the most part, that’s what science is about.

 

CODA: There’s another story behind this paper, because it came about through the awesome work of the Tropical Biology Association, an educational charity whose aims are capacity-building for ecologists in Africa and exposing ecologists from species-poor northern countries to the diversity and particular challenges of the tropics. Basically they’re fantastic, and I can’t recommend their courses highly enough. The work published here is based on a group project from the 2015 field course in Uganda and represents the first paper by three brilliant post-graduate students, Telma Laurentino, Julian Baur and Takuji Usui, who did all the real work***. That alone justifies publishing it, and I hope it’s only the first output of their scientific careers.


 

* A colleague at a former employer once memorably stated in a staff meeting that any journal with an IF of less than 8 was ‘detritus’. This excluded all but a handful of the most prestigious journals in ecology but was conveniently mid-ranking in his own field.

** Although this might be confounded by other factors — look out for a paper on this hopefully some time in 2019.

*** I also blogged about the liana study at the time here.

How representative of ecology are the top 100 papers?

The publication in Nature Ecology & Evolution of the 100 most important papers in ecology has led, inevitably, to a fierce debate. Several rapid responses are already in review. The main bone of contention has been that not only were the first authors of 98% of the papers male, but the only two papers written by women were relegated to the very bottom of the list. In a generous reading this reflects implicit biases at every stage of their compilation, rather than any malign intent on the part of the authors*, but I’m sure they’ve received plenty of feedback on this oversight.

Pretty soon after it came out, Terry McGlynn on Twitter asked:

If you want a guide to all the essential papers that didn’t make the list, and happen to have been written by women, this thread is a good place to start. I’m not going to fan the flames any further here, but it’s important that this glaring omission remains the headline response. Instead I’m going to respond to another observation:

This pricked up my ears, given that I am also an undergraduate textbook author. In writing the Natural Systems book (published 2016) I made a deliberate attempt to not cite the same things as everyone else, and to emphasise promising directions for the future of the field of ecology. That made me wonder: how many of the 100 most important papers in ecology did I manage to cite? Note that I had no input into the Nature Ecology & Evolution article, and the book only includes references up to the end of 2014, so these form entirely independent samples. Without formally counting, I estimate that I've read around 80% of the top 100 papers, and I'm aware of almost all of them.

How many? Only 17/100 papers.** That raw figure disguises some interesting discontinuities within the list. Of the top ten I actually cited six, and a total of nine from the top twenty. This indicates a reasonable amount of agreement on the most important sources. But of the bottom 80 I only managed another eight (10%). This comes from a total of over 800 sources cited in the book.

Why did I cite them? The main reasons:

  • Posing an important question we have since spent a long time trying to answer (Hutchinson 1957, 1959, 1966, Janzen 1967).
  • Defining a new idea which remains relevant (Grinnell 1917, Gleason 1926, Janzen 1970, Connell 1978).
  • Creating a framework which has been elaborated since (MacArthur 1955, MacArthur & Wilson 1963, Tilman 1994, May 1972, Chesson 2000, Leibold et al. 2004, Brown 2004).
  • Reviewing the evidence for an important principle (Tilman 1996).
  • The first empirical demonstration of an important idea (Tilman 1977).

In many cases I have cited the same authors from the top 100 multiple times, but not necessarily for the original or classic piece of work; often it’s a later review or synthesis. This is because I deliberately chose citations that would be most helpful for students or other readers, not always on the basis of precedence.

The aim of this post is not to argue in any way that the authors of the paper were wrong; this is only a reflection of my personal opinion of what matters in the field. Theirs was generated through the insights of 147 journal editors and a panel of 368 scientists from across the discipline, and is therefore a much more genuine representation of what opinion-makers within the field of ecology believe (although there are better ways to conduct such an exercise). Mine is only one voice and certainly not the authoritative one.***

Writing a textbook is something like curating an exhibition at a museum or art gallery. It bestows on the author the responsibility of deciding which pieces to show in order to tell a particular story. Of necessity this becomes a very personal perspective. I’m amused to find that my view of ecology overlaps by only 17% with the leaders in my field.**** That doesn’t make either of us right or wrong, only that we must be looking in very different directions.

As for their aim of creating an essential reading list for post-graduates or those wishing to learn the foundations of the field, here I profoundly disagree. The best way to learn about current practice in ecology is to start with a good core textbook (and there are lots more out there), read recent synthetic reviews, or pick over the introductions of papers in the major journals. In the same way that you don’t need to read Darwin to understand evolutionary theory, or Wallace to understand biogeography, it’s not strictly necessary to read Grinnell, Clements or Gause to get to grips with modern ecology. Fun if you have the time but most people have more important things to do.

One final comment: three of the top ten papers in ecology were written by one man, G. E. Hutchinson. There is no doubt that his work was highly influential, and I agree that these are important papers to read. What I find most interesting though is that all of them are essentially opinion pieces that frame a general research question, but go little further than that. None of them would get published in a modern ecological journal.

Where would you find similar pieces of writing today? On a blog.

 

UPDATE: Dr Kelly Sierra is soliciting suggestions for a more inclusive list. Whether or not you feel that such lists have any inherent value, if we’re going to make them then they should at least represent the full diversity of our scientific community.


* In the comments below, Jeremy Fox points out that this isn’t very well worded, and could be read as a suggestion that I think there was some malign intent. So, to be absolutely clear, I am not suggesting that the authors made a deliberate choice to exclude or devalue papers written by women. If anything this was a sin of omission, not of commission, and we all need to learn from it rather than attribute blame to individuals.

** As an aside, 16 of the 17 were sole-authored papers. Only Leibold et al. (2004), which defined the metacommunity concept, had more than one author.

*** Nor do I think it’s healthy for there to be a voice of authority in ecology, or any other academic field. We make progress through testing every argument or piece of evidence, not by accepting anyone’s word, however senior or trustworthy. If there were an authority figure you can almost guarantee that I would disagree with them.

**** I’m more in line with the recent attempt to define the 100 most important concepts in ecology, although a little peeved that so many people dismissed Allee effects given my recent work on them.

Free software for biologists pt. 5 – operating systems

If you’ve made it this far in the series then you’ll have already explored software for writing, analysing data, preparing figures and creating presentations, many of which are designed explicitly with scientists in mind. You’re clearly interested in learning how to make your computer work better, which is great. If you’re willing to do this then why not take the natural next step and choose an operating system for your computer which is designed with the scientific user in mind?

Put simply, Windows is not the ideal operating system for scientific computing. It takes up an unnecessarily large amount of space on your hard drive, uses your computer’s resources inefficiently, and slows down everything else you’re trying to do*. Ever wondered why you have to keep updating your anti-virus software, and worry about attachments or executable files? It’s because Windows is so large and unwieldy that it’s full of back-doors, loopholes and other vulnerabilities. You are not safe using Windows.**

What should you do? Macs are superior (and pretty), but also expensive, and free software solutions are preferable. The alternative is to install a Linux operating system. If this sounds intimidating, but you own a smartphone, then you may not realise that Android is actually a Linux operating system. Many other devices, from games consoles to smart TVs and home routers, also run on Linux. Do you own a Chromebook? Linux. You've probably been a Linux user for some time without realising it.


I have no idea why the Linux mascot is a penguin. It just is.

If you’re coming out of Windows then you can get an operating system that looks and behaves almost identically. Try the popular Linux Mint or Mageia, which both offer complete desktops with many of the programs listed in earlier posts pre-installed. Mint is based on the Ubuntu distribution, which is another common Linux version, but has a default desktop environment that will take a few days to get used to. The best thing about Ubuntu is that there is a vast support network, and whatever problem you come across, however basic, a quick web search will show you how to resolve it in seconds.


Your Linux Mint desktop could look like this sample image from their website. See, Linux isn’t so intimidating after all.

Unlike Windows, all these distributions are free to download, easy to install, and everything works straight out of the box. Within a week you will be able to do everything you could do on Windows. Within two weeks you will be realising some of the benefits. Like any change, it takes a little time to get used to, but the investment is worth it. There are hundreds of Linux distributions, each tailored to a particular group of users or devices. Rather than getting confused by them all, try one of the major distributions first, which offer plenty of support for beginners. Once you know what you need you can seek out a distribution that is specifically tailored for you (or, if you're really brave, create one).


Yeah, I know, DistroWatch.com may not look like the most exciting website in the world, but it does contain download links to every Linux OS you could imagine, and many more.

It’s possible to boot many of these distributions from a DVD or even a USB stick. This means you can try them out and see whether they suit before taking the plunge and installing them on your hard drive (remembering to back all your files up first, of course). If it doesn’t work out then take the DVD out of the computer and all will return to normal. An alternative, once you’ve set it up, is VirtualBox, which allows you to run a different distribution inside your existing operating system.

If you have an old computer which appears to have slowed down to a standstill thanks to all the Windows updates and is not capable of running the newer versions, don’t throw it away! This is exactly what manufacturers want you to do, and is why it’s not in their interests to have an efficient operating system. Making your computer obsolete is how they make more money. Try installing one of the smaller operating systems designed for low-powered computers like elementaryOS. You will get years more use out of your old hardware. A really basic OS like Puppy Linux will run even on the most ancient of computers, and if all you need to do are the basics then it might be good enough.

My preferred operating system is Arch, which has a more accessible derivative, Manjaro, for moderately experienced users. It's not recommended for beginners though, so try one of the above first. Why bother? Well, there's an old adage among computer geeks that 'if it isn't broken, break it'. You learn a lot by having to build your OS from the ground up, making active decisions about how your computer is going to work, fixing mistakes as you go along. I won't pretend that it saves time but there is a satisfaction to it***. Even if it means having to remember to update the kernel manually every now and again. One of its best features is the Arch User Repository, which contains a vast array of programs and tools, all a quick download away.


Behold the intimidating greyness that I favour on my laptop, mainly to minimise distractions, which is one of the advantages of the OpenBox window manager. Files and links on the desktop just stress me out.

As with every other article in this series, I've made it clear that you will need to spend a little time learning to use new tools in order to break out of your comfort zone. In this case there are great resources both online and from more traditional sources, such as the magazine Linux Format, which is written with the general Linux user in mind. You might outgrow it after a few years but it's an excellent entry point. If you're going to spend most of your working life in front of a computer then why not learn to use it properly?

With that, my series is complete. Have I missed something out? Made a catastrophic error? Please let me know in the comments!


* To be fair to Microsoft, Windows 10 is much better in this regard. That said, if you don’t already have it then you’ll need to pay for an upgrade, which is unnecessary when there are free equivalents.

** If you think I’m kidding, and you’re currently on a laptop with an integral camera, read this. Then go away, find something to cover the camera, and come back. You’re also never completely safe on other operating systems, but their baseline security is much better. For the absolutely paranoid (or if you really need privacy and security), try the TAILS OS.

*** Right up until something goes snap when you need it most. For this reason I also have a computer in the office that runs safe, stable Debian, which is valued by many computer users for its reliability. It will always work even when I’ve messed up my main workstation.

Free software for biologists pt. 4 – presentations

This post is going to strike a slightly different note to previous pieces on software tools for writing, handling data and preparing figures. In each of those I emphasised the advantages of breaking away from the default proprietary software shipped with the average PC and exploring bespoke options designed for scientists. In the case of giving talks or lectures, I’m going to argue for the complete opposite position: it’s not so much what you use, but how you use it.

When delivering a talk, the slides that accompany it are visual aids. I’ve emphasised that term because its meaning has been lost through repetition. The key word is aids. The slides are there to support and enhance the understanding of the audience, and to back up what you say. They are not supposed to be the focus of attention. The slides are not your notes*.

What’s more, slides cause problems more often than they dramatically improve a talk. An ideal talk is one where the audience receive the message without anything getting in the way. How many times have you walked out of a conference talk thinking ‘great slides’? Perhaps never. On the other hand, how many times have you seen a perfectly good talk ruined by a distracting display or computing failure?** For me, that’s at least once a session.

With this in mind, I recommend starting to plan a talk with a simple question: do you need to have any slides at all? Yes, I know, I've just challenged the default assumption of almost every conference presenter these days. But I'm absolutely serious. Start from the perspective of thinking what you are going to tell the audience, in normal speech, while they look directly at you and listen to what you say. If you can convey all the information you need without slides (or by using other visual aids, such as props or exhibits) then there is no obligation to have slides at all.

Next ask yourself what elements would benefit from being presented visually as well. Note that I’m explicitly trying not to write the talk around the slides, but the visual aids around the talk. Once again there might be no need for slides — you could work through equations or models by sketching them on a blackboard. Nevertheless, for certain types of information, slides are the best means to present them. Data figures, photographs, diagrams, maps and so on are going to need to be put up on the big screen. Note that none of these involve much text, if any.

When you start from that perspective, the software you choose to prepare your slides should be the one that permits you to most clearly present your figures without distracting clutter.


Slides are there to help the audience understand your points, not to replicate the talk. Only include the bare minimum of text and be prepared to walk your audience through the details.

With this in mind, PowerPoint is fine for producing lecture slides, and easy to use. The main challenge is changing all the default settings to be as plain and simple as possible, and resisting the temptation to use features that only serve to distract the audience from your intended content (animations, background images, sound effects). These should be used sparingly, and only if they improve the transmission of information***. Remember: slides are there to inform, not to entertain. If you don’t want to pay for Powerpoint then the free LibreOffice Impress will do all the same things and serves as a direct replacement.

An online alternative is Slides, which adds the neat trick of allowing remote control of presentations from a second computer or your mobile phone. It is built on reveal.js, an open-source HTML presentation framework that you can also use directly. The hosted service is free for basic users, but if you want to download a copy of the presentation or collaborate with a colleague then a subscription is required. Being willing to write a little code helps too.

 

If you’re using LaTeX then an alternative is the beamer document class. powerdot appears to do the same thing but I’ve never used it. The usual caveat about LaTeX applies — if you’re not already using it for everything then the time investment for presentations alone won’t be worth it. I have also yet to find a way to embed videos directly into slides.


All my slides are prepared in LaTeX using the intridea beamer theme. I like the look of them, but it takes time and expertise to set up. You could achieve something similar with much less effort.

One good reason to move away from Powerpoint or its analogues is frequency-dependent selection. You can stand out from the crowd simply by virtue of using something different. By the end of the first day of a meeting people are already suffering from Powerpoint fatigue, which makes anything else a pleasant relief.

 

To really change style and impress your audience, try Prezi. This is a different way of visualising your talk, and some time investment is required to get it right. As with Powerpoint, there are many tricks and decorations that can be inserted, but which will distract from the information you’re trying to get across. Particularly try to minimise use of the ‘swooping’ movement, which can induce nausea in your audience.

The two main disadvantages to Prezi are that you need to be connected to the internet to use it, and that the free version requires your presentation to be visible online. The first is seldom an issue, the latter only matters if what you’re showing is somehow private or confidential, and if so then why are you presenting?

In general I don’t submit posters at conferences, though there are many good reasons to choose a poster over a talk, and a lot of guidance on how to do it well. I’m not going to repeat this because I have nothing to add, but also because I have no personal experience to draw from, and can’t therefore recommend any particular software.


* This is true for most public, professional presentations. Lectures for undergraduate students are a different matter though, at least within my experience. Many students now assume that the slides are the notes, and expect to be able to reconstruct the material from these alone. Some lecturers provide printouts of slides as their handouts. You can debate whether this means you should include more material on your slides to serve this function, or make a stand, expect students to take their own notes, and risk complaints.

** Many years ago — long enough for the scars to have healed — a collaborator of mine presented her work at a major international conference. It was a hot topic, and the theatre was packed. We had gone through the talk together the previous night on her laptop and I’d not seen any problems. But on the day it turned into a nightmare. For some unknown reason, every animation (in Powerpoint terms, that means lines or other elements appearing on the screen) was accompanied by a sound effect. Distorted by the conference room speakers it was transformed into something akin to the bellow of a caged animal. This happened every time she clicked, all the way through the talk. Even worse, none of the videos worked. Her evident mortification was met by the awkward, sympathetic unease of the audience. Everyone remembered that talk, though not for the right reasons.

*** A good general rule is: can I save it as a pdf file with no loss of features? If you can then do; not only are they smaller, but they’re more stable, and guaranteed to look identical on whatever computer you need to use. If there are features that would be lost then think carefully about whether you really need them.

Barnacles are much like trees

I am not a forest ecologist. OK, that’s not entirely true, as demonstrated by the strapline of this blog and the evidence on my research page. Nevertheless, having published papers on entomology, theoretical ecology and snail behaviour (that’s completely true), I’m not just a forest ecologist. Having now published a paper on barnacles, one could suspect that I’m having an identity crisis.

When a biologist is asked what they work on, the answer often depends on the audience. On the corridor that hosts my office, neighbouring colleagues might tell a generally-interested party that they work on spiders, snails, hoverflies or stickleback. Likewise, I usually tell people that I work on forests. When talking to a fellow ecologist, however, the answer is completely different, as it would be for every one of the colleagues mentioned above*.

If you walked up to me at a conference, or met me at a seminar, I would probably say that I work on spatial self-organisation in natural systems. If you were likely to be a mathematician or physicist** then I’d probably claim to study the emergent properties of spatially-structured systems. I might follow this up by saying that I’m mostly concerned with trees, but that would be a secondary point.

What I and all my colleagues have in common is that we are primarily interested in a question. The study organism is a means to an end. We might love the organism in question, rear them in our labs, grow them in our glasshouses, spend weeks catching or watching them in the field, learn the fine details of their taxonomy, or even collect them as a hobby… but in the end it is the fundamental question that drives our work. The general field of study always takes priority when describing your work to a fellow scientist.


Behold the high-tech equipment used to survey barnacles. This is the kind of methodology a forest ecologist can really get behind.

The work on barnacles was done by a brilliant undergraduate student, Beki Hooper, for her final-year project***. The starting point was the theory of spatial interactions among organisms most clearly set out by Iain Couzin in this paper****. His basic argument is that organisms often interact negatively at short distances: they compete for food, or territorial space, or just bump into one another. On the other hand, interactions at longer ranges are often positive: organisms are better protected against predators, able to communicate with one another, and can receive all the benefits of being in a herd. Individuals that get too close to one another will move apart, but isolated individuals will move closer to their nearest neighbour. At some distance the trade-off between these forces will result in the maximum benefit.

Iain’s paper was all about vertebrates, and his main interest has been in the formation of shoals of fish or herds of animals (including humans). I’m interested in sessile species, in other words those that don’t move. Can we apply the same principles? I would argue that we can, and in fact, I’ve already applied the same ideas to trees.

What about barnacles? They’re interesting organisms because, although they don’t move as adults, to some extent they get to choose where they settle. Their larvae drift in ocean currents until they reach a suitable rock surface to which they can cling. They then crawl around and decide whether they can find a good spot to fix themselves. It’s a commitment that lasts a lifetime; get it wrong, and that might not be a long life.

If you know one thing about barnacles, it’s probably that they have enormously long penises for their size. Many species, including acorn barnacles, require physical contact with another individual to reproduce. This places an immediate spatial constraint on their settlement behaviour. More than 2.5 cm from another individual and they can’t mate; this is potentially disastrous. Previous studies have focussed on settling rules based on this proximity principle. They will also benefit from protection from exposure or predators.  On the other hand, settle too close to another barnacle and you run the risk of being crushed, pushed off the rock, or having to compete for other resources.


Barnacles can be expected to interact negatively at short distances, but positively at slightly longer distances. This disparity in the ranges of interactions gives rise to the observed patterning of barnacles in nature.

 

What Beki found was that barnacles are most commonly found just beyond the point at which two barnacles would come into direct contact. They cluster as close as they possibly can, even to the point of touching, and even though this will have the side effect of restricting their growth.

Furthermore, Beki found that dead barnacles had more neighbours at that distance than would be expected by chance, and that particularly crowded patches had more dead barnacles in them. There is evidence that this pattern is structured by a trade-off between barnacles wanting to be close together, but not too close.


On the left, the pattern of barnacles in a 20 cm quadrat. On the right, the weighted probability of finding another barnacle at increasing distance from any individual. A random pattern would have a value of 1. This shows that at short distances (less than 0.30 cm) you’re very unlikely to find another barnacle, but the most frequent distance is 0.36 cm. Where it crosses the line at 1 is where the benefits of being close exceed the costs.
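For anyone who wants to compute this kind of distance-based statistic on their own data, here is a minimal sketch in R using the spatstat package; the coordinates below are simulated purely for illustration, and this is not necessarily the exact method used in the paper.

    # Hypothetical example: a point pattern of individuals in a 20 x 20 cm quadrat
    library(spatstat)
    x <- runif(200, 0, 20)
    y <- runif(200, 0, 20)
    barnacles <- ppp(x, y, window = owin(c(0, 20), c(0, 20)))
    # Pair correlation function g(r): values below 1 mean neighbours are rarer than
    # expected by chance at that distance (spacing out), values above 1 mean clustering
    g <- pcf(barnacles)
    plot(g)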

Hence the title of our paper: too close for comfort. Barnacles deliberately choose to settle near to neighbours, even though this carries risks of being crowded out. The pattern we found was exactly that which would be expected if Iain Couzin’s model of interaction zones were determining the choices made by barnacles.

When trees disperse their seeds, they don’t get to decide where they land, they just have to put up with it. The patterns we see in tree distributions therefore reflect the mortality that takes place as they grow and compete with one another. This is also likely to take place in barnacles, but the interesting difference lies in the early decision by the larvae about where they settle.

Where do we go from here? I’m now developing barnacles as an alternative to trees for studying self-organisation in nature. The main benefit is that their life cycles are much shorter than trees, which means we can track the dynamics year-by-year. For trees this might take lifetimes. We can also scrape barnacles off rocks and see how the patterns actually assemble in real time. Clearing patches of forests for ecological research is generally frowned upon. The next step, working with Maria Dornelas at St. Andrews, will be to look at what happens when you have more than one species of barnacle. Ultimately we’re hoping to test these models of how spatial interactions can allow species to coexist. Cool, right?

The final message though is that as an ecologist you are defined by the question you work on rather than the study organism. If barnacles turn out to be a better study system for experimental tests then I can learn from them, and ultimately they might teach me to understand my forests a little bit better.


 

* Respectively: Sara Goodacre studies the effects of long-range dispersal on population genetics; Angus Davison the genetic mechanisms underpinning snail chirality; Francis Gilbert the evolution of imperfect mimicry; Andrew MacColl works on host-parasite coevolution. I have awesome colleagues.

** I’ve just had an abstract accepted for a maths conference, which will be a first for me, and slightly terrifying. I’ve given talks in mathematics departments before but this is an entirely new experience.

*** Beki is now an MSc student on the Erasmus+ program in Evolutionary Biology (MEME). Look out for her name, she’s going to have a great research career. Although I suspect that it won’t involve barnacles again.

**** Iain and I once shared a department at Leeds, many years ago. He’s now at Princeton. I’m in the East Midlands. I’m not complaining…

Free software for biologists pt. 3 – preparing figures

So far we’ve looked at software tools for handing and analysing data and for writing. Now it’s time to turn to the issue of making figures.

Early in my career, I wish someone had taken me to one side and explained just how important figures are. Too often I see students fretting over the text, reading endless reams of publications out of concern that they haven’t cited enough, or cited the right things. Or fine-tuning their statistical analyses far beyond the point at which it makes any meaningful difference. And yet when it comes to the figures, they slap something together using default formatting, almost as an afterthought.

Having recently written a textbook (shameless plug), it has only brought home to me how crucial figures are to whether your work will get used and cited*. The entry criterion for a study being used in a book isn’t necessarily the quality of science, volume of data or clarity of expression, though I would argue that all of these are high in the best papers. What really sets a paper apart is its figures. Most of us, when we read papers, look at the pictures, and often make a snap judgement based on those. If the figures are no good then the chances of anyone wading through your prose to pick out the gems of insight will be substantially reduced.

Here then is a useful rule of thumb: you should spend at least one working day preparing each figure in a manuscript. That’s after collecting and analysing the data, and after doing a first-pass inspection of the output. A whole day just fine-tuning and making sure that each final figure is carefully and concisely constructed. You might not do it all in one sitting; you may spend 75% of the time trying out multiple formats before settling on the best one. All this is time well spent. And if you’re going to put the time into preparing them then you should look into bespoke software that will improve the eventual output.


Easy to use does not mean good quality! Comic by XKCD.

Presenting statistical outputs

If you’ve been following this series of posts then it will come as no shock that I don’t recommend any of Microsoft’s products for scientific data presentation. The default options for figures in Excel are designed for business users and are unsuitable for academic publication. Trying to reformat an Excel figure so that it is of the required quality is a long task, and one that has to be repeated from scratch every time**. Then saving it in the right format for most journals (a .tiff or .eps file) is even less straightforward. As an intermediate option, and for those who wish to remain in Excel, Daniel’s XL plugin is a set of tools for analysis and presentation that improve its functionality for scientists.

Needless to say, this is all easier in R with a few commands and, once you’ve figured it out, you can tweak and repeat with minimal effort (the ggplot2 package is especially good). The additional investment in learning R will be rewarded. In fact, I’d go so far as to say that R is worth the effort for preparing figures alone. No commercial product will offer the same versatility and quality.
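To give a flavour (this is just a sketch with made-up data, not a recipe for any particular journal), a clean, reusable figure in ggplot2 takes only a few lines:

    # Minimal ggplot2 sketch: scatter plot with a fitted line, saved as a TIFF
    library(ggplot2)
    dat <- data.frame(basal_area = rlnorm(40), liana_load = runif(40))  # invented data
    p <- ggplot(dat, aes(x = basal_area, y = liana_load)) +
      geom_point() +
      geom_smooth(method = "lm") +
      theme_classic() +
      labs(x = "Tree basal area (m2)", y = "Liana load")
    ggsave("figure1.tiff", p, width = 84, height = 84, units = "mm", dpi = 600)

Change the input data and the same script regenerates the figure identically, which is exactly the repeatability that Excel lacks.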


Here’s one I made earlier, showing foliage profiles in 40 woodlands across the UK. Try creating that in Excel.

One of the reasons I recommend ggplot2 is that it is designed to follow the principles of data presentation outlined in Edward Tufte’s seminal book The Visual Display of Quantitative Information. It’s one of those books that people get evangelical about. It will change the way you think about presenting data, and forms the basis for the better scientific graphing tools.


What do you mean you haven’t read it? OK, you don’t have to, but it will convince you that data can be aesthetically pleasing as well as functional.

If you’re not an R user then a good alternative is the trusty gnuplot. Older readers can be forgiven for shedding a nostalgic tear, as this is one of the ancient software tools from the pre-internet age, having been around for about 30 years. It lives on, and has been continually maintained and developed, making it just as useful today as it was then.

A colleague pointed me towards D3.js, which is a JavaScript library that manipulates documents based on data input. I haven’t played with it but it might be an option for those who want to quickly generate standardised and reproducible reports.

Finally, if your main aim is to plot equations, then Octave is a free alternative to the commercial standard MATLAB. Only the most mathematical of biologists will want to use this though.

Diagrams

Some people try to produce diagrams using PowerPoint. No. Don’t do it. They will invariably look rubbish and unprofessional.

For drawing scientific diagrams, the class-leader is the fearsomely expensive Adobe Illustrator. Don’t even consider paying for your own license though because the free Inkscape will do almost everything you’ll ever need, unless you’re a professional graphic designer, in which case someone else is paying. Another free option is sK1 which has even more technical features should you need them. Xara Xtreme may have an awful name but it’s in active development and looks very promising. It’s also worth mentioning LibreOffice Draw, which comes as part of the standard LibreOffice installation.

One interesting tool I’m itching to try is Fiziko, which is a MetaPost script for preparing black-and-white illustrations for textbooks which mimic the appearance of blocky woodcuts or ink drawings. It looks like some effort and experience is required to use it though.

Image editing

The expensive commercial option is Photoshop, which is so ubiquitous that it has even become its own verb. For most users the free GIMP program will do everything they desire. I also sometimes use ImageMagick for image transformation, but mostly the command-line tool sam2p. Metadata attached to image files can be read and edited with ExifTool.

A common task in manuscripts is to create a simplified vector image, perhaps using a photo as a template. You might need to draw a map, show the structure of an organ or demonstrate an animal's behaviour. For this there are specialist tools like Blender, Cheetah3D for Mac users or Google's SketchUp, though the latter only offers a limited version for free download. Incidentally, never use a raster image editor (like Photoshop) to trace an image. All you end up with is a simplified bitmap copy of the original, which looks terrible. Plus you've paid for Photoshop.

For the rather specialised task of cropping and assembling documents from pdf files, briss might be an ancient piece of software but it’s still the go-to application.

Preparing outline maps (e.g. of study sites) is a common task and an expensive platform like ArcGIS is unnecessary. Luckily the free qGIS is almost as good and improving rapidly. There’s a guide to preparing maps here.


A map showing the study site in a forthcoming paper (Hooper & Eichhorn 2016) and prepared by Jon Moore in qGIS.

There are countless programs out there for sorting, handling and viewing photographs (e.g. digiKam, Shotwell). Not being much of a photographer I’m not a connoisseur.

Flowcharts

Flowcharts, organisational diagrams and other images with connected elements can be created in LibreOffice Draw. I’ve not used it for this though, and therefore can’t compare it effectively to commercial options like OmniGraffle, which is good but expensive for something you might not be doing regularly. A LaTeX-based option such as TikZ is my usual choice, and infinitely better than spending ages trying to get boxes to snap to a grid in Powerpoint. If you’re not planning to put the time into learning LaTeX then this is no help, but add it to the reasons why you might. If anyone knows of a particularly good FOSS solution to this issue then please add in the comments and I will update.


I made this in TikZ to illustrate the publication process for my MSci class in research skills. I won’t lie, it took a long time (even as a LaTeX obsessive), and I’d like to find a more efficient means of creating these figures.

Animations

This is one task that R makes very easy. Take the output of a script that creates multiple PNG files from a loop and bundle them into an animation using QuickTime or the very straightforward FFmpeg. For something that looks so impressive, especially in a presentation, it’s surprisingly easy to do.
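As a rough sketch (the plotting command and file names are placeholders), the whole workflow is a loop that writes numbered PNG files, followed by a single FFmpeg call:

    # Write one PNG per time step, then stitch the frames together with FFmpeg
    for (i in 1:100) {
      png(sprintf("frame_%03d.png", i), width = 800, height = 600)
      plot(rnorm(50), rnorm(50), main = paste("Time step", i))  # placeholder plot
      dev.off()
    }
    # Assumes FFmpeg is installed; 10 frames per second, standard mp4 output
    system("ffmpeg -framerate 10 -i frame_%03d.png -pix_fmt yuv420p animation.mp4")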

Collecting data

To collect data from images ImageJ is by far the best program, largely due to the immense number of specialist plug-ins. Some of these have been collected into a spin-off called Fiji, which provides a great set of tools for biologists. Whatever you need to do, someone has almost certainly written a plug-in for it. Note that R can also collect data from images and even interfaces with ImageMagick via the EBImage package. Load JPEGs with the ReadImage package and TIFF files with rtiff.

A common task if you’re redrawing figures, or preparing a meta-analysis, is to extract data from figures. This is especially common when trying to obtain data from papers published before the digital age, or when the authors haven’t put their original data online. For this, Engauge will serve your needs.

Next time: how to prepare presentations!


* At some point in the pre-digital age, maybe in the 90s, I recall an opinion piece by one textbook author making exactly this point. Was it Lawton, Krebs, Southwood… I really can’t remember. If anyone can point me in the right direction then I’d be grateful because I can’t track it down.

** I did overhear one very prominent ecologist declare only half-jokingly that they stopped listening to talks if they saw someone present an Excel figure because it indicated that the speaker didn’t know what they were doing. Obviously I wouldn’t advocate such an extreme position, but using Excel does send a signal, and it’s not a good one.

Free software for biologists pt. 2 – data management and analysis

This is the second part of a five-part series, collated here. Having covered writing tools in the last post, this time I’m focussing on creating something to write about.

Data management

Let’s assume that you’ve been out, conducted experiments or sampling regimes, and returned after much effort with a mountain of data. As scientists we invest much thought into how best to collect reliable data, and also in how to effectively analyse it. The intermediate stage — arranging, cleaning and processing the data — is often overlooked. Yet this can sometimes take as long as collecting the data in the first place, and specialist tools exist to make your life easier.

I’m not going to dwell here on good practices for data management; for that there’s an excellent guide produced by the British Ecological Society which says more than I could. The principles of data organisation are well covered in this paper by Hadley Wickham. Both are on the essential reading list for students in my group, and I’d recommend them to anyone. Instead my focus here is on the tools you can use to do it.

The familiar Microsoft Excel is fine for small datasets, but struggles with large spreadsheets, and if you’ve ever tried to load a sizeable amount of data into it then you’ll know that you might as well go away to make a cup of tea, come back and hope it hasn’t crashed. This is a problem with Excel, not your data. Incidentally, LibreOffice Calc is the free substitute for Excel if you want a straight replacement. Don’t even consider using either of them to do statistics or draw figures (on which there will be more next time). I consider this computational limitation more than enough reason to look elsewhere, even though there are many official and unofficial plug-ins which extend Excel’s capabilities. Excel can also reformat your data without you knowing about it.

One of the main features missing from Excel is support for regular expressions (the search patterns used by tools like grep). Regular expressions allow you to screen data, check for errors and fix problems, and learning how to use them properly will save all the time you used to spend scrolling through datasheets looking for problems until your mind went numb. Proper text editors provide this functionality. Personally I use jEdit to manage my data, which is available free for all operating systems. Learning to parse a .csv or .txt file that isn't in a conventional box-format spreadsheet takes a little time but soon becomes routine.
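The same kind of search can also be scripted. Purely as an illustration (the species names and errors below are invented), a regular expression in R can flag suspect entries in a column of data in seconds:

    # Hypothetical cleaning step: find species names that break the expected format
    species <- c("Quercus robur", "quercus  robur", "Quercus_robur", "Q. robur ")
    # Expect a capitalised genus and lower-case epithet separated by a single space
    suspect <- !grepl("^[A-Z][a-z]+ [a-z]+$", species)
    species[suspect]
    # A first pass at standardising: trim whitespace, collapse underscores and
    # repeated spaces into single spaces
    cleaned <- gsub("[_ ]+", " ", trimws(species))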

For larger, linked databases, Microsoft Access used to be the class-leader. The later versions have compromised functionality for accessibility, leading many people to seek alternatives. Relational databases are built and queried using SQL (Structured Query Language), and learning to use Access compels you to pick up the basics of this anyway. Given this, starting with a free alternative is no more difficult. I have always found MySQL to be easy and straightforward, but some colleagues strongly recommend SQLite. It might not have all the functions of the larger database tools but most users won't notice the difference. Most importantly, a database written in standard SQL can be transferred between any of these tools with no loss of function. Migrating into (or out of) Access is trickier.
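If you want to try SQLite without leaving R, the DBI and RSQLite packages will talk to it directly. A minimal sketch, with an invented table and query:

    # Create (or open) a local SQLite database, add a table and query it
    library(DBI)
    con <- dbConnect(RSQLite::SQLite(), "fieldwork.sqlite")
    dbWriteTable(con, "trees",
                 data.frame(plot = c(1, 1, 2),
                            species = c("oak", "ash", "oak"),
                            dbh_cm = c(23.1, 15.4, 31.0)),
                 overwrite = TRUE)
    dbGetQuery(con, "SELECT species, AVG(dbh_cm) AS mean_dbh
                     FROM trees GROUP BY species")
    dbDisconnect(con)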

As a general rule, your data management software should be used for that alone. The criterion for choosing what software to use is that it should allow you to clean your data and load it into an analysis platform as quickly and easily as possible. Don’t waste time producing summaries, figures or reports when this can be done more efficiently using proper tools.

Data analysis

These days no-one looks further than R. As a working environment it’s the ideal way to load and inspect data, carry out statistical tests, and produce publication-quality figures. Many people — including myself — do pretty much all their data processing, analysis and visualisation in R*.

It’s interesting to note just how rapidly the landscape has changed. As an undergraduate in the 90s we were taught using Minitab. For my PhD I did all my statistics in SPSS, then as a post-doc I transitioned to GenStat. All are perfectly decent, serviceable solutions for basic statistical analyses. Each has its limitations but moving between them isn’t difficult.

I won’t hide the simple truth — learning R is hard, especially if you have no experience of programming. Why then did I bother? The simple answer is that R can do everything that all the above programs can do, and more. It’s also more efficient, reproducible and adaptable. Once you have the code to do a particular set of analyses you can tweak, amend and reapply at will. Never again do you have to work through a lengthy menu, drag-and-drop variables, tick the right boxes and remember the exact sequence for next time. Once a piece of code is written, you keep it.

If you’re struggling then there are loads of websites providing advice to all levels from beginners to experienced statistical programmers. It’s also worth looking at the excellent books by Alain Zuur which I can’t recommend highly enough. If you have a problem then a quick internet search will usually retrieve an answer in no time, while the mailing lists are filled with incredibly helpful people**. The other great thing about R is that it’s free***.

One word of warning is to not dive too deep at the beginning. Start by replicating analyses you’re already familiar with, perhaps from previous papers. The Quick-R page is a good entry point. A bad (but common) way of beginning with R is to be told that you need to use a particular analytical approach, and that R is the only way to do it. This way leads at best to frustration, at worst to errors. If someone tells you to use approximate Bayesian inference via integrated nested Laplace approximation, then you can do it with the R-INLA package. The responsibility is still on you to know what you’re doing though; don’t expect someone to hold your hand.

Because R is a language rather than a program, the default environment isn’t very easy to work in, and you’re much better using another program to interface with R. By far the most widely-used is RStudio, and it’s the one I recommend to my own post-graduate students. It will improve your R coding experience immensely. Some programmers use it for almost everything. An alternative is Tinn-R, which I used to use, but gave up on a few years ago because it was too buggy. It may have improved now so by all means try it out. If you’re desperate for a familiar-looking graphical user interface with menus then R Commander provides one, but I recommend using this as a gateway to learning more (or teaching students) rather than a long-term solution.

I’m a bit old-fashioned and prefer to use a traditional text editor to work in R. My choice, for historical reasons, is eMacs, which links neatly to R through ESS. The other tribe of programmers use Vim with the sensibly-named Vim-R-plugin, and we shall speak no more of them. If you’re already a programmer then you know about these, and can be assured that you can code in R just as easily. If not then stick to Rstudio, which is much easier. I also often use Geany as a tool for making quick edits to scripts.

Most of all, don’t type directly into R, it’s a recipe for disaster, and removes the greatest advantage which is its reproducibility. Likewise don’t keep a Word document open with R commands while continually copy-and-pasting them over. I’ve seen many students doing this, and it’s only recommended if you want to speed the onset of repetitive strain injury. Word will also keep reformatting and autocorrecting your text, introducing many errors. Use a proper editor and it’s done in one click.

One issue with R that more experienced users will come across is that it is relatively slow at processing very large datasets or large numbers of files. This is a problem that relatively few users will encounter, and by that point most will be competent programmers. In these cases it’s worth learning one of the major programming languages for file handling. Python is the easiest to pick up, for which Rosalind provides a nice series of scaled problems for learning and teaching (albeit with a bioinformatics focus). Serious programmers will know of or already use C, which is more widespread and has greater power. Finding out how to use a Bash shell efficiently is also immensely helpful. Learning to program in these other languages will open many doors, including to alternative careers, but is not essential for most people.

As a final aside, there is a recent attempt to link the power of C with the statistical capabilities of R in a new programming language called Julia. This is still in early development but is worth keeping an eye on if statistical programming is likely to become a major feature of your research.

Specialist software tools

Almost everything can be done in R, and those that can’t already, can be programmed. That said, there are some bespoke free software tools that are worth mentioning as they can be of great use to ecologists. They’re also valuable for those who prefer a GUI (Graphical User Interface) and aren’t ready to move over to a command-line tool just yet. Where I know of them, I’ve mentioned the leading R packages too.

Diversity statistics — the majority of people now use the vegan package in R. Outside R, the most widely-used free tool for diversity analysis is EstimateS. Much of the same functionality is contained in SPADE, written by Anne Chao (who has a number of other free programs on her website). I've always found the latter to be a little buggy, but it's also reliably updated with the very latest methods. It has more recently been converted into an R package, SpadeR, which has an accessible webpage that will do all the analyses for you. As a final mention, there is good commercial software available from Pisces Conservation, but apart from a cleaner-looking interface I've never seen any advantage to using it.
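As a minimal sketch of what the vegan route looks like, using the example dataset bundled with the package rather than anything from my own work:

    # Diversity and richness estimates with the vegan package
    library(vegan)
    data(BCI)                                # tree counts in 50 plots, bundled with vegan
    H <- diversity(BCI, index = "shannon")   # Shannon diversity for each plot
    S <- specnumber(BCI)                     # observed species richness per plot
    chao <- estimateR(BCI)                   # Chao1 and ACE richness estimators
    head(H); head(S)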

GIS — I’ll be returning to the issue of making maps in a later post, but will mention here that a direct replacement for the expensive ArcGIS is the free qGIS. I’ve never found any functionality lacking, but I’m not a serious GIS user either. There are a plethora of R packages which in combination cover the same range of functions but I wouldn’t like to make recommendations.

Macroecology — SAM (for Spatial Analysis in Macroecology) is a useful tool for quickly loading and inspecting patterns in spatial ecological data. I would personally still move into R for publication-grade analyses, but this can be a helpful stepping stone when exploring a new dataset.

Null models — these can be very useful in community ecology. The only time I’ve done this, I used the free version of EcoSim. I see that you now have to pay for the full version, so if someone can recommend a comparable R package in the comments then I’ll update this accordingly.

I’m happy to extend this list with further recommendations; please drop a note in the comments.

Further reading

Practical Computing for Biologists is a great book. A little knowledge goes a long way, and learning how to use the shell, regular expressions and a small amount of Python will soon reap dividends for your research, whatever stage you’re at.


* The most mathematically-inclined biologists might hanker after something more like MATLAB, for which a direct free replacement is GNU Octave. You can even transfer MATLAB programs across, although there are some minor differences in the language.

** Normal forum protocol applies here, which is that you shouldn’t ask a question to which you could reasonably have found an answer by searching for yourself. If you ask a stupid question that implies no effort on your part then you can expect a curt answer (or none at all).  That said, if you really can’t work something out then it’s well worth bringing up because you might be the first person to spot an issue. If your problem is an interesting one then often you’ll find yourself receiving support from some of the top names in the field, so long as you are willing to learn and engage. Please read the posting guide before you start.

*** A few years ago a graduate student declined my advice to use R, declaring in my office that if R was so good, someone would be charging for it. I was taken aback, perhaps because I take the logic of Free Open-Source Software for granted. If you’re unsure, then the main benefit is that it’s free to obtain and modify the original code. This means that someone has almost certainly created a specific tool to meet your research needs. Proprietary commercial software is aimed at the market and the average user, whereas open-source software can be tweaked and modified. The reason R is so powerful is that it’s used by so many people, many of whom are actively developing new tools and bringing them directly to your computer. Often these will be published in Journal of Statistical Software or more recently Methods in Ecology and Evolution.

 

Free software for biologists pt. 1 – writing tools

This is the first in a planned series of five posts, to cover (1) writing tools, (2) data management and analysis, (3) preparing figures, (4) writing presentations and (5) choosing a new operating system. They will eventually be collated here.

Document-writing tools

Microsoft Word remains the default word processing software for the majority of people. Its main advantage is exactly that ubiquity, which makes collaboration relatively straightforward. The track changes function is appreciated by many people, though I would argue it’s unnecessary and can lead to problems; see below for tips on collaborative writing.

If you’re going to be spending a large proportion of your life writing then Word is not the ideal solution, especially for scientists. On this point it’s worth making clear that ‘scientist’ is just another word for ‘writer’. We write constantly — papers, grant proposals, lecture notes, articles and books. Professional writers use other commercial software such as Scrivener; that, however, is just paying for something different. Microsoft Word has improved in recent years, but there are still problems. The main limitations are:

  • It’s terrible at handling large documents (e.g. theses, or anything more than a couple of pages). Do you really need to do all that scrolling?
  • Including equations or mathematical notation is awkward, and the results always look poor.
  • Embedded images are reproduced at low resolution.
  • Files are unnecessarily large in size.
  • The .docx format is very unstable. Send a file to a collaborator on another computer (even one running Windows) and it will often open looking different, with mangled formatting.
  • The default appearance doesn’t look very professional, and improving it takes forever.
  • It keeps reformatting everything as you go along, particularly when you combine sections from different documents.

I didn’t realise how much time I spent fighting Word’s defaults until I tried other software. Escaping isn’t tricky, as this blog post reveals. Several options are available to the scientific writer, each of which will improve both the quality of your documents and the experience of writing them.

LibreOffice Writer. Want something that looks exactly like Microsoft Word, does everything that Word does, but don’t fancy paying for it? Just download LibreOffice and you’ll find it works equally well (if not better). This is perhaps the best option if you have an out-of-date or bootlegged version of Word and can’t access updates. With LibreOffice you will be able to open, edit and share all of your existing Word documents, and even save them in .doc format. The native format is .odt (for open document text). This is recommended as a stable document format by the British Government, which tells you something. Your Word-using colleagues will be able to open them as well.

Markdown. This has grown in popularity with scientists because it’s easier to use than professional tools such as LaTeX (see below) but handles many of the document-formatting tasks that scientists need. You can even write Markdown scripts in Word, though why would you? Combining it with pandoc makes it even more powerful, because you can convert a Markdown document into almost any other format to match the requirements of a journal (or your collaborators). This is much easier than doing the same with LaTeX, which requires some programming nous. A good, free Markdown editor is ReText.

LaTeX. The gold standard, as used by many professional writers and editors (it’s pronounced lay-tech; the final letter is a chi). All my handouts are prepared in LaTeX, as are my presentations and manuscripts; in fact, pretty much everything I write apart from e-mails. The problem is that learning LaTeX takes time. Most word processors run on the principle of WYSIWYG (What You See Is What You Get), whereas in LaTeX you mark up the structure and formatting explicitly in plain text and then compile the finished document.

There are a number of gateway programs which allow you to write in LaTeX but with a more familiar writing environment. These therefore ease the transition and can show you the potential. I know many people who swear by LyX. My preferred editor is Kile, though this will involve a steeper learning curve. A great help while writing in LaTeX is to be able to see what the document looks like as you write. I pair Kile with Okular, but there are many other options that are equally good.

As a health warning before diving in at the deep end, bear in mind that working in LaTeX will initially be much slower. It takes time to become competent, and there are annoying side issues that remain frustrating (installing new fonts, for example, is bizarrely complex). While the majority of journals and publishers accept LaTeX submissions, and most will provide a template to format your manuscripts, there are still a few who require .doc format. This is changing, though, due to demand from authors.

Collaborative writing

In the old days, collaborating on a paper required dozens of e-mails to be sent round as each author added her comments. Version control became impossible as soon as there were multiple copies, and it was easy to lose track. Some people persist in working this way despite the fact that there are plenty of tools that make it unnecessary. By using an online collaborative-writing site, multiple authors can contribute simultaneously, and you can even chat to each other while you’re at it.

The best-known is of course Google Docs, which has the virtue of a familiar interface. It’s not designed for scientific writing though, and unsurprisingly there are more specific tools out there. While I’ve not used it, Fidus Writer looks like a promising option, with a layout similar to Google Docs but better suited to the demands of scientific writing.

The one I’ve used most often is Authorea, which has the major advantage that anyone can write in any style and on any platform. This means that one person can write the technical parts in LaTeX while another adds sections in Markdown, or you can cut and paste text from a normal word processor. The final document can be exported in your format of choice. This solves the problem of all your collaborators needing to use the same software. My favoured option (for LaTeX users only) is shareLaTeX, though writeLaTeX looks to be equally good.

I haven’t mentioned GitHub here, even though I know many people who use it to maintain version history in collaborative work. This is particularly true of programmers who need to trace changes in code as it’s being developed. The same functionality can be very helpful in writing manuscripts, but GitHub is not easy to use, and it’s rare in biology to find yourself working with a pool of collaborators who all know what they’re doing.

As a final note, I discourage the use of tracked changes after many bad experiences. The main issue is that once more than one person has commented on a document it gets completely mangled, and it can take a long time to reconstruct the flow of the text once all the contradictory changes have been accepted. Furthermore, if your reason for using a WYSIWYG processor is that you want to see how the final document will look, then tracked changes remove that benefit and make your document unreadable. Lastly, whenever I’ve been forced into using them (on one notable occasion by a journal editor) it has invariably introduced errors into the text. By using some of the software recommended here there should be no need for the track changes function at all.

References and citations

The standard for reference management used to be EndNote, which is an expensive solution if you don’t have either an institutional licence or a student discount. Much the same can be said of Papers, which I hear is excellent but have never used.

I strongly recommend Mendeley to all my students. Think of it as iTunes for papers. It’s free and integrates smoothly with all the word processing software above. Even better is the online functionality, which lets you synchronise documents and your notes on them across all your devices and share them with colleagues. So you can read a PDF on the train, make notes on it, then open your office computer and retrieve all the notes straight away before dropping the citation directly into your manuscript. There are many tutorials online, and the few hours you spend learning to use it will be repaid many times over in time saved. Apparently Zotero, which is also free, offers similar functionality, but I’ve not tried it.

Having said all that, I don’t use Mendeley. If you’re using LaTeX then citing references is done through BibTeX, and I prefer kBibTeX to manage my reference library as it integrates nicely with Kile. This is only a personal choice though, and Mendeley would achieve the same result.

 

In praise of backwards thinking

What is science? This is a favourite opening gambit of some external examiners in viva voce examinations. PhD students, be warned! Imagine yourself in that position, caught off guard, expected to produce some pithy definition that somehow encompasses exactly what it is that we do.

It’s likely that in such a situation most of us would jabber something regarding the standard narrative progression from observation to hypothesis then testing through experimentation. We may even mumble about the need for statistical analysis of data to test whether the outcome differs from a reasonable null hypothesis. This is, after all, the sine qua non of scientific enquiry, and we’re all aware of such pronouncements on the correct way to do science, or at least some garbled approximation of them.* It’s the model followed by multiple textbooks aimed at biology students.

Pause and think about this in a little more depth. How many great advances in ecology, or how many publications on your own CV, have come through that route? Maybe some, and if so then well done, but many people will recognise the following routes:

  • You stumble upon a fantastic data repository. It takes you a little while to work out what to do with it (there must be something…) but eventually an idea springs to mind. It might even be your own data — this paper of mine only came about because I was learning about a new statistical technique and remembered that I still had some old data to play with.
  • In an experiment designed to test something entirely different, you spot a serendipitous pattern that suggests something more interesting. Tossing away your original idea, you analyse the data with another question in mind.
  • After years of monitoring an ecological community, you commence descriptive analyses with the aim of getting something out of it. It takes time to work out what’s going on, but on the basis of this you come up with some retrospective hypotheses as to what might have happened.

Are any of these bad ways to do science, or are they just realistic? Purists may object, but I would say that all of these are perfectly valid and can lead to excellent research. Why is it then that, when writing up our manuscripts, we feel obliged — or are compelled — to contort our work into a fantasy in which we had the prescience to sense the outcome before we even began?

We maintain this stance despite the fact that most major advances in science have not come about through this route. We need to recognise that descriptive science is both valid and necessary. Parameter estimation and refinement often have more impact than testing a daring new hypothesis. I for one am entranced by a simple question: over what range do individual forest trees compete with one another? It is a question that can only be answered with an empirical value. To quote a favourite passage from a review:

“Biology is pervaded by the mistaken idea that the formulation of qualitative hypotheses, which can be resolved in a discrete unequivocal way, is the benchmark of incisive scientific thinking. We should embrace the idea that important biological answers truly come in a quantitative form and that parameter estimation from data is as important an activity in biology as it is in the other sciences.” (Brookfield 2010)

Picture 212

Over what distance do these Betula ermanii trees in Kamchatka compete with one another? I reckon around three metres but it’s not straightforward to work that out. That’s me on the far left, employing the most high-tech equipment available.

It might appear that I’m creating a straw man of scientific maxims, but I’m basing this rant on tenets I have received from reviewers of manuscripts and grant applications, or been given as advice in person. Here are some things I’ve been told repeatedly:

  • Hypotheses should precede data collection. We all know this is nonsense. Take, for example, the global forest plot network established by the Center For Tropical Forest Science (CTFS). When Steve Hubbell and Robin Foster set up the first 50 ha plot on Barro Colorado Island, they did it because they needed data. The plots have led to many discoveries, with new papers coming out continuously. Much the same could be said of other fields, such as genome mapping. It would be absurd to claim that all the hypotheses should have been known at the start. Many people would refine this to say that the hypothesis should precede data analyses (as in most of macroecology) but that’s still not the way that our papers are structured.
  • Observations are not as powerful as experiments. This view is perhaps shifting with the acknowledgement that sophisticated methods of inference can extract patterns from detailed observations. For example, this nice paper used Bayesian analyses of a global dataset of tropical forests to discern the relationship between wood density and tree mortality. Ecologists frequently complain that there isn’t enough funding for long-term or large-scale datasets to be produced; we need to demonstrate that these are just as valuable as experiments, and recognising the importance of post-hoc explanations is an essential part of making this case. Perfect experimental design isn’t the ideal metric of scientific quality either; even weak experiments can yield interesting findings if interpreted appropriately.
  • Every good study should be a hypothesis test. We need to get over this idea. Many of the major questions in ecology are not hypothesis tests.** Over what horizontal scales do plants interact? To my mind the best element of this paper by Nicolas Barbier and colleagues was that they determined the answer for desert shrubs empirically, by digging them up. If they’d tried to publish with that as the main focus, I doubt it would have made it into a top ecological journal. Yet that was the real, lasting contribution.

Still wondering what to say when the examiner turns to you and asks what science is? My answer would be: whatever gets you to an answer to the question at hand. I recommend reading up on the anarchistic model of science advocated by Paul Feyerabend. That’ll make your examiner pause for thought.


* What I’ve written is definitely a garbled approximation of Popper, but the more specific and doctrinaire one gets, the harder it becomes to achieve any form of consensus. Which is kind of my point.

** I’m not even considering applied ecology, where a practical outcome is in mind from the outset.

EDIT: added the direct quotation from Brookfield (2010) to make my point clearer.

Two lumps please

Here’s a quick thought experiment. Imagine you have a spare flowerbed in your garden, in which you scatter a handful of seeds across the bare ground. You then ignore them, and come back some months later. What will have happened?* Your expectation might be that you will have a healthy patch of plants, all about the same size. Some might be larger or smaller than average, but overall you’d expect them to be pretty similar. This is known as a unimodal size distribution. They have after all experienced identical conditions.

You’d be wrong. In fact, it’s more likely that your plants will have separated into two or more size groupings. There will be a set of larger plants, spread apart from one another, which dominate the newly formed canopy. Scattered in between them will be other plants of smaller size. This results in a bimodal (or multimodal) size distribution. There isn’t a single standard, expected size; instead there will be distinct size classes present.

modes.png

A normal, unimodal distribution of sizes (left) is what you might expect to see when all plants are the same age and growing in the same conditions. In fact it’s more common to see a bimodal size distribution (right), or something even more complicated.
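If you want to see the difference, a toy simulation is enough to illustrate the two shapes (the numbers below are entirely made up):

```r
# Illustrative only: a single size class versus a mixture of suppressed and
# dominant plants in the same cohort.
set.seed(1)
unimodal <- rnorm(500, mean = 10, sd = 2)
bimodal  <- c(rnorm(250, mean = 6, sd = 1), rnorm(250, mean = 14, sd = 2))

par(mfrow = c(1, 2))
hist(unimodal, breaks = 30, main = "Unimodal", xlab = "Size")
hist(bimodal,  breaks = 30, main = "Bimodal",  xlab = "Size")
```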

This observation is nothing new. Much was written about the issue from the 1950s through to the 70s, particularly in the context of forest stands. The phenomenon was widely-recognised but remained paradoxical.

I stumbled upon this old literature back in 2010 when I published a small paper based on a birch forest in Kamchatka which showed a clearly bimodal size distribution. I didn’t need to go all the way to Kamchatka to find a stand with this feature, but since I had the data it made sense to use it. I used the spatial pattern of stems to infer that the bimodality was the result of asymmetric competition (i.e. that large trees obtain disproportionately more resources than small trees, which is certainly true of light capture). All the trees were the same age, but the larger stems were spread out, with the smaller stems in the interstices between them. Had the bimodality been the result of environmental drivers we would expect there to be patches of large and small stems, but in fact they were all mixed together.

White birch forest, central Kamchatka

This is the stand of Betula platyphylla with a bimodal size distribution that was described in Eichhorn (2010). If it looks familiar, it’s because the strapline of this blog is a picture of us surveying it. The white lights in the photo aren’t faeries; they’re reflections from mosquito wings caught in the camera flash. So many mosquitoes.

Three things struck me when I was reading the literature. The first was that hardly anyone had thought about multimodal size distributions in cohorts for several decades**. This was a forgotten problem. The second was that the last major review of the phenomenon back in 1987 had concluded that asymmetric competition was the least likely cause — which conflicted with my own conclusions. Finally, I had no difficulty in finding other examples of multimodal size distributions in the literature, but authors kept dismissing them as anomalous. I wasn’t convinced.

Analysing spatial patterns is all well and good, but if you want to really demonstrate that a particular process is important, you need to build a model. Enter Jorge Velazquez, who was a post-doc with me at the time but now has a faculty position in Mexico. He built a simple model in which trees occupy fixed positions in space and can only obtain resources from the area immediately around themselves. Larger trees can obtain resources from a greater area. When two trees are close to one another, their intake areas overlap, leading to competition for resources.

overlap.png

When there are two individual trees (i and j), each of which obtains resources from within a radius proportional to its size m, the overlap is determined by the distance d between them. Within the area of overlap the amount of resources that each receives depends on the degree of asymmetric competition, i.e. how much of an advantage one gets by being larger than the other. This is included in the model as a parameter described below.

This is where asymmetric competition is introduced as a parameter p. When p = 0, competition is symmetric, and resources are evenly divided between two trees when their intake areas overlap. When p = 1, each tree receives resources in direct proportion to its size (i.e. a tree that’s twice as large will receive two thirds of the available resources). Increasing p makes competition ever more asymmetric, such that the larger competitor receives a greater fraction of the resources being competed for. In nature we expect asymmetric competition to be strong because a taller tree will capture most of the light and leave very little for those beneath it.
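To make that concrete, here is a minimal sketch in R of a sharing rule consistent with the description above (my own notation, not code from the paper): tree i takes a fraction m_i^p / (m_i^p + m_j^p) of the resources in the area of overlap.

```r
# Fraction of contested resources taken by tree i when competing with tree j,
# given sizes m_i and m_j and the asymmetry parameter p (my notation, for
# illustration only).
share <- function(m_i, m_j, p) m_i^p / (m_i^p + m_j^p)

share(2, 1, p = 0)   # 0.50 -- symmetric: an even split regardless of size
share(2, 1, p = 1)   # 0.67 -- proportional to size: the larger tree gets two thirds
share(2, 1, p = 5)   # 0.97 -- strongly asymmetric: the larger tree takes nearly all
```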

We applied the model to data from a set of forest plots in New Zealand which had already been well studied. Not only did we discover that two thirds of these plots had multimodal size distributions, but also that our model could reproduce them.

We then started running our own thought experiments. What if you changed the starting patterns, making them clustered, random or dispersed? That turned out to have very little effect on size distributions. What about completely regular patterns? That’s when things started to get really interesting.

By testing the model with different patterns we discovered four important things:

  • Asymmetric competition is the only process which consistently causes multimodal size distributions within simulated cohorts of plants. Nothing else we tried worked.
  • Asymmetric competition is the cause, not the consequence, of size differences in the population.
  • The separation of modes is determined by the length of time it takes for competition in the cohort to start, which usually reflects the distance between individuals.
  • The number of modes reflects the effective number of competitors that each individual has.

What does all this mean? Given that asymmetric competition is normal for plants, I would argue that we should expect to see multimodal size distributions everywhere. In fact, seeing unimodal size distributions should be a surprise. Don’t believe me? Grab some seeds, give it a go, and tell me if I’m wrong.

You can read our new paper on the subject here. If you can’t get hold of a copy then let me know.


* Luckily this is a thought experiment, because in my garden the usual answer is ‘everything has been eaten by slugs’.

** I should stress here that I’m specifically referring to multimodality in size distributions of equal-aged cohorts. When several generations overlap, the distribution of sizes reflects the ages of the individuals. If multiple species are present this adds further complications, and in fact size distributions of species across communities have been a hot topic in the literature of late. This is very interesting, but a completely different set of processes is at work.