Tag Archives: research

How I stopped believing in the biodiversity-ecosystem functioning consensus

The E120 Biodiversity Experiment at Cedar Creek in Minnesota, from their website.

It all began with the best of intentions. By the time the Millennium Ecosystem Assessment was published in 2005, there was already strong evidence that the services nature provided to humanity were in decline, and our degradation and over-exploitation of nature was to blame. We were also acutely aware of rising extinction rates, and biodiversity had long been a cause around which world leaders were prepared to rally.

The missing piece of the puzzle appeared to be obvious: show that biodiversity was good for ecosystem functions and services, and the case for conservation could add raw self-interest to the mix. This was the foundation for thousands of studies of the biodiversity-ecosystem functioning relationship (often shortened to BEF).

In practice biodiversity was usually translated into species richness. This is of course reductive, but then species richness is often assumed to capture many other aspects of diversity, including traits, phylogeny, resource use and much more. This assumption is seldom directly tested for the set of species under study. Nevertheless it seemed to work. Early evidence of positive effects of species richness from sealed laboratory systems had been replicated in some the most ambitious field experiments in ecology, resulting in countless papers and a growing confidence that more species were a universal good. It also appeared that one of the key benefits was multifunctionality, which meant that the more species you had, the more services they were able to deliver. Everyone came to agree that biodiversity was one of the most important forces shaping ecosystem function. There was even emerging evidence that real-world patterns might be stronger than those in experimental systems, which made sense because we expected ecosystems to be structured in a way that made the roles of species complementary.

Ten years ago I published a textbook with a chapter on why diversity matters which presented this consensus. Prior to publication I sent it to a few colleagues who were active in the BEF literature and they gave it the thumbs-up. Everyone was agreed: more species were always a good thing.

At the moment I’m revising the textbook and finding that the last decade has not been kind to the consensus view. Every bandwagon eventually attracts a backlash, so this is not unexpected, but it creates a challenge for how to effectively synthesise the literature and present a clear story that allows readers to understand and engage with the evidence as it emerges.

First, even after thousands of studies, there is still enormous bias in the evidence base behind BEF research. Most studies are of plants, the main ecosystem function measured is simple productivity, and they are geographically located mainly in temperate zones. The evidence from aquatic systems (marine or freshwater), the tropics or non-invertebrate animals is slight. We should always be cautious about declaring a general rule for which the evidence is not yet general.

Then there’s the problem that experimental results haven’t replicated in the real world. Early reviews were unequivocal: biodiversity was always good, all the time. More recent studies in natural systems have found that while positive effects outweigh negative ones, actually the majority of studies find no effect at all.

To this we can add emerging evidence of ecosystem disservices. There are certainly ways in which biodiversity can be harmful to humans. In some of my own research we found that increased tree cover around rural villages was linked to greater risks of disease by supporting vector populations. Likewise in agriculture, the whole history of intensification has been one of deliberately reducing diversity in fields to boost the necessary service of crop production.

There are also critiques of the ways in which previous studies analysed their data which suggest that they may have over-stated the benefits of biodiversity. While not wanting to get into the technical details, it’s at least worth noting that the fundamental nature of the relationship has come under scrutiny.

Finally, the emerging picture is that species richness isn’t always the main correlate of functioning. Sometimes functional or trait diversity turns out to be more important, and species richness is only an indirect or imperfect means of capturing this. This then leads to another set of questions for which ecological theory leaves us on unsteady ground. Which traits matter, and why? There is a danger of circular reasoning if we end up picking traits because they affect ecosystem functions (try them all and see which works), or because they’re the ones we’re able to measure, without putting a deductive framework in place first. We could end up just creating a new set of problems for the next generation of ecologists.

For some groups of organisms there are standard sets of traits, most notably the Leaf Economic Spectrum for plants, which has been joined by the Wood Economic Spectrum for woody plants and some other developing initiatives. There is a different problem here though, which is that we find that particular environments select for a relatively narrow range of traits. If for example only plants with certain traits can survive in the desert, and deserts have low productivity, is it the environment or the trait diversity that limits productivity? Most ecologists would vote for the former. The pathways of causation become complicated and we need to reflect on whether trait diversity is the cause or a side effect of ecosystem functioning.

Where does this leave us? Perhaps we shouldn’t have expected nature to do any favours for us. We certainly haven’t reciprocated. Quite simply, nature doesn’t care about us. There are interesting fundamental questions to be asked about what structural aspects of natural systems are connected to ecosystem processes, but these need to be addressed without the distorting lens of of activism. There are also plenty of ways to justify conservation of biodiversity without making claims for its benefits to people. We might eventually find ways in which maintaining high biodiversity can be reconciled with our own self-interest, but for now I’m stepping off the BEF bandwagon and waiting for the fog of evidence to clear.

Addendum: make sure you read on to the comments below where some serious experts in BEF help to clarify the issues and provide some genuine insights. There was also an excellent discussion on LinkedIn which included a heap of recommended papers. If you’re as confused about BEF as I am then these are great starting points to find out more.

How to write a PhD thesis. Part I: Saving Time

How to write a conference abstract

Why should anyone care about Ugandan lianas?

3 Replies

The liana team surveying in 2015 (Takuji Usui, Julian Baur and first author Telma Laurentino). Bridget Ogolowa (far left) did not participate in the study. Photo by Line Holm Andersen.

Habent sua fata libelli as the Latin epithet puts it, meaning ‘little books also have their destinies’. I’d like to think that the same is true of papers. Not every scientific publication appears in a major journal, or attracts media attention, or becomes a highly-cited classic. Some, perhaps, are never read again by anyone. This doesn’t mean that publishing them wasn’t valuable. A paper represents a new piece of knowledge or insight that adds to our total understanding of the world. And in some cases its small part in the greater whole is the main reason why it matters.

As an example, our latest paper just came out in African Journal of Ecology, a minor regional journal with an impact factor so small (0.797 in 2017) that in the metric-obsessed world of Higher Education it barely registers. Some would argue that the effort of publishing in such a low-status journal is a waste of time*. Why bother?

In this case, our study — small and limited in scope as it was — adds an important point on the map. Over recent years it has been noted that the abundance of lianas is increasing in South American forests. This process, sometimes known as ‘lianification’, is troubling because lianas can impede the growth of forest trees, or the recovery of forests following disturbance (including logging). At a time when we need forests to capture carbon from the atmosphere, an increase in the abundance of lianas could be exactly what we don’t want.

The causes of this increase in lianas are unknown, and it is also uncertain how widespread the effect might be. The best evidence that it’s happening comes from neotropical forests**, but we can’t be sure whether the same process is occurring in Southeast Asia, or Sri Lanka, or Africa. If the driver is global one, for example a change in the climate (warming, higher carbon dioxide concentrations, or longer dry seasons) then we would expect the same trend to be occurring everywhere. If it’s a purely local effect within South America then it might reflect historical factors, modern disturbance or the particular composition of plant communities.

It’s not just that we don’t know whether lianas are increasing in all parts of the world simultaneously; for most forests we don’t even know how many lianas were there in the first place. We could only find evidence of four published studies of liana abundance in the entirety of Africa, of which two were in secondary or transitional forests. That means only two previous studies on the continent had measured lianas in a primary forest. If we want to monitor change then we first need a starting point.

Location of our study in in Kanyawara, Kibale National Park, Uganda. Figure 1 in Laurentino et al. (2018).

What did we find? Actually it turns out that liana densities in our forest were quite similar to those seen elsewhere in the world. An average liana basal area of 1.21 m²/ha is well within the range observed in other forests, as are the colonisation rates, with 24% of saplings and 57% of trees having at least one liana growing on them. These figures are unexceptional.

What does this tell us about lianification? To be completely honest, nothing. Or at least not yet. A single survey can’t say anything about whether the abundance of lianas in Africa is increasing, decreasing, or not changing at all. The point is that we now have baseline data from a part of the world where no-one had looked before. On their own these data aren’t particularly interesting. But considering the global context, and the potential for future studies to compare their work with ours, means that we have placed one more small piece in the jigsaw. And for the most part, that’s what science is about.

CODA: There’s another story behind this paper, because it came about through the awesome work of the Tropical Biology Association, an educational charity whose aims are capacity-building for ecologists in Africa and exposing ecologists from species-poor northern countries to the diversity and particular challenges of the tropics. Basically they’re fantastic, and I can’t recommend their courses highly enough. The work published here is based on a group project from the 2015 field course in Uganda and represents the first paper by three brilliant post-graduate students, Telma Laurentino, Julian Baur and Takuji Usui, who did all the real work***. That alone justifies publishing it, and I hope it’s only the first output of their scientific careers.

* A colleague at a former employer once memorably stated in a staff meeting that any journal with an IF of less than 8 was ‘detritus’. This excluded all but a handful of the most prestigious journals in ecology but was conveniently mid-ranking in his own field.

** Although this might be confounded by other factors — look out for a paper on this hopefully some time in 2019.

*** I also blogged about the liana study at the time here.

How representative of ecology are the top 100 papers?

9 Replies

The publication in Nature Ecology & Evolution of the 100 most important papers in ecology has led, inevitably, to a fierce debate. Several rapid responses are already in review. The main bone of contention has been that not only were the first authors of 98% of the papers male, but the only two papers written by women were relegated to the very bottom of the list. In a generous reading this reflects implicit biases at every stage of their compilation, rather than any malign intent on the part of the authors*, but I’m sure they’ve received plenty of feedback on this oversight.

Pretty soon after it came out, Terry McGlynn on Twitter asked:

https://twitter.com/hormiga/status/930194024182321152

If you want a guide to all the essential papers that didn’t make the list, and happen to have been written by women, this thread is a good place to start. I’m not going to fan the flames any further here, but it’s important that this glaring omission remains the headline response. Instead I’m going to respond to another observation:

https://twitter.com/GrunerDaniel/status/930269079293833216

This pricked up my senses, given that I am also an undergraduate textbook author. In writing the Natural Systems book (published 2016) I made a deliberate attempt to not cite the same things as everyone else, and to emphasise promising directions for the future of the field of ecology. That made me wonder: how many of the 100 most important papers in ecology did I manage to cite? Note that I had no input into the Nature Ecology & Evolution article, and the book only includes references up to the end of 2014, so these form entirely independent samples. Without formally counting, I estimate that I’ve read around 80% of the top 100 papers, and I’m aware of almost all of them.

How many? Only 17/100 papers.** That raw figure disguises some interesting discontinuities within the list. Of the top ten I actually cited six, and a total of nine from the top twenty. This indicates a reasonable amount of agreement on the most important sources. But of the bottom 80 I only managed another eight (10%). This comes from a total of over 800 sources cited in the book.

Why did I cite them? The main reasons:

Posing an important question we have since spent a long time trying to answer (Hutchinson 1957, 1959, 1966, Janzen 1967).
Defining a new idea which remains relevant (Grinnell 1917, Gleason 1926, Janzen 1970, Connell 1978).
Creating a framework which has been elaborated since (MacArthur 1955, MacArthur & Wilson 1963, Tilman 1994, May 1972, Chesson 2000, Leibold et al. 2004, Brown 2004).
Reviewing the evidence for an important principle (Tilman 1996).
The first empirical demonstration of an important idea (Tilman 1977).

In many cases I have cited the same authors from the top 100 multiple times, but not necessarily for the original or classic piece of work; often it’s a later review or synthesis. This is because I deliberately chose citations that would be most helpful for students or other readers, not always on the basis of precedence.

The aim of this post is not to argue in any way that the authors of the paper were wrong; this is only a reflection of my personal opinion of what matters in the field. Theirs was generated through the insights of 147 journal editors and a panel of 368 scientists from across the discipline, and is therefore a much more genuine representation of what opinion-makers within the field of ecology believe (although there are better ways to conduct such an exercise). Mine is only one voice and certainly not the authoritative one.***

Writing a textbook is something like curating an exhibition at a museum or art gallery. It bestows on the author the responsibility of deciding which pieces to show in order to tell a particular story. Of necessity this becomes a very personal perspective. I’m amused to find that my view of ecology overlaps by only 17% with the leaders in my field.**** That doesn’t make either of us right or wrong, only that we must be looking in very different directions.

As for their aim of creating an essential reading list for post-graduates or those wishing to learn the foundations of the field, here I profoundly disagree. The best way to learn about current practice in ecology is to start with a good core textbook (and there are lots more out there), read recent synthetic reviews, or pick over the introductions of papers in the major journals. In the same way that you don’t need to read Darwin to understand evolutionary theory, or Wallace to understand biogeography, it’s not strictly necessary to read Grinnell, Clements or Gause to get to grips with modern ecology. Fun if you have the time but most people have more important things to do.

One final comment: three of the top ten papers in ecology were written by one man, G. E. Hutchinson. There is no doubt that his work was highly influential, and I agree that these are important papers to read. What I find most interesting though is that all of them are essentially opinion pieces that frame a general research question, but go little further than that. None of them would get published in a modern ecological journal.

Where would you find similar pieces of writing today? On a blog.

UPDATE: Dr Kelly Sierra is soliciting suggestions for a more inclusive list. Whether or not you feel that such lists have any inherent value, if we’re going to make them then they should at least represent the full diversity of our scientific community.

* In the comments below, Jeremy Fox points out that this isn’t very well worded, and could be read as a suggestion that I think there was some malign intent. So, to be absolutely clear, I am not suggesting that the authors made a deliberate choice to exclude or devalue papers written by women. If anything this was a sin of omission, not of commission, and we all need to learn from it rather than attribute blame to individuals.

** As an aside, 16 of the 17 were sole-authored papers. Only Leibold et al. (2004), which defined the metacommunity concept, had more than one author.

*** Nor do I think it’s healthy for there to be a voice of authority in ecology, or any other academic field. We make progress through testing every argument or piece of evidence, not by accepting anyone’s word, however senior or trustworthy. If there were an authority figure you can almost guarantee that I would disagree with them.

**** I’m more in line with the recent attempt to define the 100 most important concepts in ecology, although a little peeved that so many people dismissed Allee effects given my recent work on them.

Free software for biologists pt. 5 –operating systems

Free software for biologists pt. 4 – presentations

5 Replies

This post is going to strike a slightly different note to previous pieces on software tools for writing, handling data and preparing figures. In each of those I emphasised the advantages of breaking away from the default proprietary software shipped with the average PC and exploring bespoke options designed for scientists. In the case of giving talks or lectures, I’m going to argue for the complete opposite position: it’s not so much what you use, but how you use it.

When delivering a talk, the slides that accompany it are visual aids. I’ve emphasised that term because its meaning has been lost through repetition. The key word is aids. The slides are there to support and enhance the understanding of the audience, and to back up what you say. They are not supposed to be the focus of attention. The slides are not your notes*.

What’s more, slides cause problems more often than they dramatically improve a talk. An ideal talk is one where the audience receive the message without anything getting in the way. How many times have you walked out of a conference talk thinking ‘great slides’? Perhaps never. On the other hand, how many times have you seen a perfectly good talk ruined by a distracting display or computing failure?** For me, that’s at least once a session.

With this in mind, I recommend starting to plan a talk with a simple question: do you need to have any slides at all? Yes, I know, I’ve just challenged the default assumption of almost every conference presenter these days. But I’m absolutely serious. Start from the perspective of thinking what you are going to tell the audience, in normal speech, while they look directly at you and listen to what you say. If you can convey all the information you need to without slides (or by using other visual aids, such as props or exhibits) then there is no obligation to have slides at all.

Next ask yourself what elements would benefit from being presented visually as well. Note that I’m explicitly trying not to write the talk around the slides, but the visual aids around the talk. Once again there might be no need for slides — you could work through equations or models by sketching them on a blackboard. Nevertheless, for certain types of information, slides are the best means to present them. Data figures, photographs, diagrams, maps and so on are going to need to be put up on the big screen. Note that none of these involve much text, if any.

When you start from that perspective, the software you choose to prepare your slides should be the one that permits you to most clearly present your figures without distracting clutter.

Slides are there to help the audience understand your points, not to replicate the talk. Only include the bare minimum of text and be prepared to walk your audience through the details.

With this in mind, PowerPoint is fine for producing lecture slides, and easy to use. The main challenge is changing all the default settings to be as plain and simple as possible, and resisting the temptation to use features that only serve to distract the audience from your intended content (animations, background images, sound effects). These should be used sparingly, and only if they improve the transmission of information***. Remember: slides are there to inform, not to entertain. If you don’t want to pay for Powerpoint then the free LibreOffice Impress will do all the same things and serves as a direct replacement.

An online alternative is slides, which adds the neat trick of allowing remote control of presentations from a second computer or your mobile phone. Another choice is reveal.js, which is free for basic users, but if you want to download a copy of the presentation or collaborate with a colleague then a subscription is required. Being willing to write a little code helps too.

If you’re using LaTeX then an alternative is the beamer document class. powerdot appears to do the same thing but I’ve never used it. The usual caveat about LaTeX applies — if you’re not already using it for everything then the time investment for presentations alone won’t be worth it. I have also yet to find a way to embed videos directly into slides.

All my slides are prepared in LaTeX using the intridea beamer theme. I like the look of them, but it takes time and expertise to set up. You could achieve something similar with much less effort.

One good reason to move away from Powerpoint or its analogues is frequency-dependent selection. You can stand out from the crowd simply by virtue of using something different. By the end of the first day of a meeting people are already suffering from Powerpoint fatigue, which makes anything else a pleasant relief.

To really change style and impress your audience, try Prezi. This is a different way of visualising your talk, and some time investment is required to get it right. As with Powerpoint, there are many tricks and decorations that can be inserted, but which will distract from the information you’re trying to get across. Particularly try to minimise use of the ‘swooping’ movement, which can induce nausea in your audience.

The two main disadvantages to Prezi are that you need to be connected to the internet to use it, and that the free version requires your presentation to be visible online. The first is seldom an issue, the latter only matters if what you’re showing is somehow private or confidential, and if so then why are you presenting?

In general I don’t submit posters at conferences, though there are many good reasons to choose a poster over a talk, and a lot of guidance on how to do it well. I’m not going to repeat this because I have nothing to add, but also because I have no personal experience to draw from, and can’t therefore recommend any particular software.

* This is true for most public, professional presentations. Lectures for undergraduate students are a different matter though, at least within my experience. Many students now assume that the slides are the notes, and expect to be able to reconstruct the material from these alone. Some lecturers provide printouts of slides as their handouts. You can debate whether this means you should include more material on your slides to serve this function, or make a stand, expect students to take their own notes, and risk complaints.

** Many years ago — long enough for the scars to have healed — a collaborator of mine presented her work at a major international conference. It was a hot topic, and the theatre was packed. We had gone through the talk together the previous night on her laptop and I’d not seen any problems. But on the day it turned into a nightmare. For some unknown reason, every animation (in Powerpoint terms, that means lines or other elements appearing on the screen) was accompanied by a sound effect. Distorted by the conference room speakers it was transformed into something akin to the bellow of a caged animal. This happened every time she clicked, all the way through the talk. Even worse, none of the videos worked. Her evident mortification was met by the awkward, sympathetic unease of the audience. Everyone remembered that talk, though not for the right reasons.

*** A good general rule is: can I save it as a pdf file with no loss of features? If you can then do; not only are they smaller, but they’re more stable, and guaranteed to look identical on whatever computer you need to use. If there are features that would be lost then think carefully about whether you really need them.

Barnacles are much like trees

1 Reply

I am not a forest ecologist. OK, that’s not entirely true, as demonstrated by the strapline of this blog and the evidence on my research page. Nevertheless, having published papers on entomology, theoretical ecology and snail behaviour (that’s completely true), I’m not just a forest ecologist. Having now published a paper on barnacles, one could suspect that I’m having an identity crisis.

When a biologist is asked what they work on, the answer often depends on the audience. On the corridor that hosts my office, neighbouring colleagues might tell a generally-interested party that they work on spiders, snails, hoverflies or stickleback. Likewise, I usually tell people that I work on forests. When talking to a fellow ecologist, however, the answer is completely different, as it would be for every one of the colleagues mentioned above*.

If you walked up to me at a conference, or met me at a seminar, I would probably say that I work on spatial self-organisation in natural systems. If you were likely to be a mathematician or physicist** then I’d probably claim to study the emergent properties of spatially-structured systems. I might follow this up by saying that I’m mostly concerned with trees, but that would be a secondary point.

What I and all my colleagues have in common is that we are primarily interested in a question. The study organism is a means to an end. We might love the organism in question, rear them in our labs, grow them in our glasshouses, spend weeks catching or watching them in the field, learn the fine details of their taxonomy, or even collect them as a hobby… but in the end it is the fundamental question that drives our work. The general field of study always takes priority when describing your work to a fellow scientist.

Behold the high-tech equipment used to survey barnacles. This is the kind of methodology a forest ecologist can really get behind.

The work on barnacles was done by a brilliant undergraduate student, Beki Hooper, for her final-year project***. The starting point was the theory of spatial interactions among organisms most clearly set out by Iain Couzin in this paper****. His basic argument is that organisms often interact negatively at short distances: they compete for food, or territorial space, or just bump into one another. On the other hand, interactions at longer ranges are often positive: organisms are better protected against predators, able to communicate with one another, and can receive all the benefits of being in a herd. Individuals that get too close to one another will move apart, but isolated individuals will move closer to their nearest neighbour. At some distance the trade-off between these forces will result in the maximum benefit.

Iain’s paper was all about vertebrates, and his main interest has been in the formation of shoals of fish or herds of animals (including humans). I’m interested in sessile species, in other words those that don’t move. Can we apply the same principles? I would argue that we can, and in fact, I’ve already applied the same ideas to trees.

What about barnacles? They’re interesting organisms because, although they don’t move as adults, to some extent they get to choose where they settle. Their larvae drift in ocean currents until they reach a suitable rock surface to which they can cling. They then crawl around and decide whether they can find a good spot to fix themselves. It’s a commitment that lasts a lifetime; get it wrong, and that might not be a long life.

If you know one thing about barnacles, it’s probably that they have enormously long penises for their size. Many species, including acorn barnacles, require physical contact with another individual to reproduce. This places an immediate spatial constraint on their settlement behaviour. More than 2.5 cm from another individual and they can’t mate; this is potentially disastrous. Previous studies have focussed on settling rules based on this proximity principle. They will also benefit from protection from exposure or predators. On the other hand, settle too close to another barnacle and you run the risk of being crushed, pushed off the rock, or having to compete for other resources.

Barnacles can be expected to interact negatively at short distances, but positively at slightly longer distances. This disparity in the ranges of interactions gives rise to the observed patterning of barnacles in nature.

What Beki found was that barnacles are most commonly found just beyond the point at which two barnacles would come into direct contact. They cluster as close as they possibly can, even to the point of touching, and even though this will have the side effect of restricting their growth.

Furthermore, Beki found that dead barnacles had more neighbours at that distance than would be expected by chance, and that particularly crowded patches had more dead barnacles in them. There is evidence that this pattern is structured by a trade-off between barnacles wanting to be close together, but not too close.

On the left, the pattern of barnacles in a 20 cm quadrat. On the right, the weighted probability of finding another barnacle at increasing distance from any individual. A random pattern would have a value of 1. This shows that at short distances (less than 0.30 cm) you’re very unlikely to find another barnacle, but the most frequent distance is 0.36 cm. Where it crosses the line at 1 is where the benefits of being close exceed the costs.

Hence the title of our paper: too close for comfort. Barnacles deliberately choose to settle near to neighbours, even though this carries risks of being crowded out. The pattern we found was exactly that which would be expected if Iain Couzin’s model of interaction zones were determining the choices made by barnacles.

When trees disperse their seeds, they don’t get to decide where they land, they just have to put up with it. The patterns we see in tree distributions therefore reflect the mortality that takes place as they grow and compete with one another. This is also likely to take place in barnacles, but the interesting difference lies in the early decision by the larvae about where they settle.

Where do we go from here? I’m now developing barnacles as an alternative to trees for studying self-organisation in nature. The main benefit is that their life cycles are much shorter than trees, which means we can track the dynamics year-by-year. For trees this might take lifetimes. We can also scrape barnacles off rocks and see how the patterns actually assemble in real time. Clearing patches of forests for ecological research is generally frowned upon. The next step, working with Maria Dornelas at St. Andrews, will be to look at what happens when you have more than one species of barnacle. Ultimately we’re hoping to test these models of how spatial interactions can allow species to coexist. Cool, right?

The final message though is that as an ecologist you are defined by the question you work on rather than the study organism. If barnacles turn out to be a better study system for experimental tests then I can learn from them, and ultimately they might teach me to understand my forests a little bit better.

* Respectively: Sara Goodacre studies the effects of long-range dispersal on population genetics; Angus Davison the genetic mechanisms underpinning snail chirality; Francis Gilbert the evolution of imperfect mimicry; Andrew MacColl works on host-parasite coevolution. I have awesome colleagues.

** I’ve just had an abstract accepted for a maths conference, which will be a first for me, and slightly terrifying. I’ve given talks in mathematics departments before but this is an entirely new experience.

*** Beki is now an MSc student on the Erasmus+ program in Evolutionary Biology (MEME). Look out for her name, she’s going to have a great research career. Although I suspect that it won’t involve barnacles again.

**** Iain and I once shared a department at Leeds, many years ago. He’s now at Princeton. I’m in the East Midlands. I’m not complaining…

Free software for biologists pt. 3 – preparing figures

11 Replies

So far we’ve looked at software tools for handing and analysing data and for writing. Now it’s time to turn to the issue of making figures.

Early in my career, I wish someone had taken me to one side and explained just how important figures are. Too often I see students fretting over the text, reading endless reams of publications out of concern that they haven’t cited enough, or cited the right things. Or fine-tuning their statistical analyses far beyond the point at which it makes any meaningful difference. And yet when it comes to the figures, they slap something together using default formatting, almost as an afterthought.

Having recently written a textbook (shameless plug), it has only brought home to me how crucial figures are to whether your work will get used and cited*. The entry criterion for a study being used in a book isn’t necessarily the quality of science, volume of data or clarity of expression, though I would argue that all of these are high in the best papers. What really sets a paper apart is its figures. Most of us, when we read papers, look at the pictures, and often make a snap judgement based on those. If the figures are no good then the chances of anyone wading through your prose to pick out the gems of insight will be substantially reduced.

Here then is a useful rule of thumb: you should spend at least one working day preparing each figure in a manuscript. That’s after collecting and analysing the data, and after doing a first-pass inspection of the output. A whole day just fine-tuning and making sure that each final figure is carefully and concisely constructed. You might not do it all in one sitting; you may spend 75% of the time trying out multiple formats before settling on the best one. All this is time well spent. And if you’re going to put the time into preparing them then you should look into bespoke software that will improve the eventual output.

Easy to use does not mean good quality! Comic by XKCD.

Presenting statistical outputs

If you’ve been following this series of posts then it will come as no shock that I don’t recommend any of Microsoft’s products for scientific data presentation. The default options for figures in Excel are designed for business users and are unsuitable for academic publication. Trying to reformat an Excel figure so that it is of the required quality is a long task, and one that has to be repeated from scratch every time**. Then saving it in the right format for most journals (a .tiff or .eps file) is even less straightforward. As an intermediate option, and for those who wish to remain in Excel, Daniel’s XL plugin is a set of tools for analysis and presentation that improve its functionality for scientists.

Needless to say, this is all easier in R with a few commands and, once you’ve figured it out, you can tweak and repeat with minimal effort (the ggplot2 package is especially good). The additional investment in learning R will be rewarded. In fact, I’d go so far as to say that R is worth the effort for preparing figures alone. No commercial product will offer the same versatility and quality.

Here’s one I made earlier, showing foliage profiles in 40 woodlands across the UK. Try creating that in Excel.

One of the reasons I recommend ggplot2 is that it is designed to follow the principles of data presentation outlined in Edward Tufte’s seminal book The Visual Display of Quantitative Information. It’s one of those books that people get evangelical about. It will change the way you think about presenting data, and forms the basis for the better scientific graphing tools.

visual-display-quantitative-information-tufte

What do you mean you haven’t read it? OK, you don’t have to, but it will convince you that data can be aesthetically pleasing as well as functional.

If you’re not an R user then a good alternative is the trusty gnuplot. Older readers can be forgiven for shedding a nostalgic tear, as this is one of the ancient software tools from the pre-internet age, having been around for about 30 years. It lives on, and has been continually maintained and developed, making it just as useful today as it was then.

A colleague pointed me towards D3.js, which is a JavaScript library that manipulates documents based on data input. I haven’t played with it but it might be an option for those who want to quickly generate standardised and reproducible reports.

Finally, if your main aim is to plot equations, then Octave is a free alternative to the commercial standard MATLAB. Only the most mathematical of biologists will want to use this though.

Diagrams

Some people try to produce diagrams using PowerPoint. No. Don’t do it. They will invariably look rubbish and unprofessional.

For drawing scientific diagrams, the class-leader is the fearsomely expensive Adobe Illustrator. Don’t even consider paying for your own license though because the free Inkscape will do almost everything you’ll ever need, unless you’re a professional graphic designer, in which case someone else is paying. Another free option is sK1 which has even more technical features should you need them. Xara Xtreme may have an awful name but it’s in active development and looks very promising. It’s also worth mentioning LibreOffice Draw, which comes as part of the standard LibreOffice installation.

One interesting tool I’m itching to try is Fiziko, which is a MetaPost script for preparing black-and-white illustrations for textbooks which mimic the appearance of blocky woodcuts or ink drawings. It looks like some effort and experience is required to use it though.

Image editing

The expensive commercial option is Photoshop, which is so ubiquitous that it has even become its own verb. For most users the free GIMP program will do everything they desire. I also sometimes use ImageMagick for image transformation, but mostly the command-line tool sam2p. Metadata attached to image files can be read and edited with ExifTool.

A common task in manuscripts is to create a simplified vector image, perhaps using a photo as a template. You might need to draw a map, show the structure of an organ or demonstrate an animal’s behaviour. For this there are specialist tools like Blender, Cheetah3D for Mac users or Google’s SketchUp, though the latter only offers a limited version for free download. Incidentally, never use a pixel art program (like Photoshop) to trace an image. All you end up with is a simplified pixel image of the original, which looks terrible. Plus you’ve paid for Photoshop.

For the rather specialised task of cropping and assembling documents from pdf files, briss might be an ancient piece of software but it’s still the go-to application.

Preparing outline maps (e.g. of study sites) is a common task and an expensive platform like ArcGIS is unnecessary. Luckily the free qGIS is almost as good and improving rapidly. There’s a guide to preparing maps here.

A map showing the study site in a forthcoming paper (Hooper & Eichhorn 2016) and prepared by Jon Moore in qGIS.

There are countless programs out there for sorting, handling and viewing photographs (e.g. digiKam, Shotwell). Not being much of a photographer I’m not a connoisseur.

Flowcharts

Flowcharts, organisational diagrams and other images with connected elements can be created in LibreOffice Draw. I’ve not used it for this though, and therefore can’t compare it effectively to commercial options like OmniGraffle, which is good but expensive for something you might not be doing regularly. A LaTeX-based option such as TikZ is my usual choice, and infinitely better than spending ages trying to get boxes to snap to a grid in Powerpoint. If you’re not planning to put the time into learning LaTeX then this is no help, but add it to the reasons why you might. If anyone knows of a particularly good FOSS solution to this issue then please add in the comments and I will update.

I made this in TikZ to illustrate the publication process for my MSci class in research skills. I won’t lie, it took a long time (even as a LaTeX obsessive), and I’d like to find a more efficient means of creating these figures.

Animations

This is one task that R makes very easy. Take the output of a script that creates multiple PNG files from a loop and bundle them into an animation using QuickTime or the very straightforward FFmpeg. For something that looks so impressive, especially in a presentation, it’s surprisingly easy to do.

Collecting data

To collect data from images ImageJ is by far the best program, largely due to the immense number of specialist plug-ins. Some of these have been collected into a spin-off called Fiji, which provides a great set of tools for biologists. Whatever you need to do, someone has almost certainly written a plug-in for it. Note that R can also collect data from images and even interfaces with ImageMagick via the EBimage package. Load JPEGs with the ReadImage package and TIFF files with rtiff.

A common task if you’re redrawing figures, or preparing a meta-analysis, is to extract data from figures. This is especially common when trying to obtain data from papers published before the digital age, or when the authors haven’t put their original data online. For this, Engauge will serve your needs.

Next time: how to prepare presentations!

* At some point in the pre-digital age, maybe in the 90s, I recall an opinion piece by one textbook author making exactly this point. Was it Lawton, Krebs, Southwood… I really can’t remember. If anyone can point me in the right direction then I’d be grateful because I can’t track it down.

** I did overhear one very prominent ecologist declare only half-jokingly that they stopped listening to talks if they saw someone present an Excel figure because it indicated that the speaker didn’t know what they were doing. Obviously I wouldn’t advocate such an extreme position, but using Excel does send a signal, and it’s not a good one.

Free software for biologists pt. 2 – data management and analysis

11 Replies

This is the second part of a five-part series, collated here. Having covered writing tools in the last post, this time I’m focussing on creating something to write about.

Data management

Let’s assume that you’ve been out, conducted experiments or sampling regimes, and returned after much effort with a mountain of data. As scientists we invest much thought into how best to collect reliable data, and also in how to effectively analyse it. The intermediate stage — arranging, cleaning and processing the data — is often overlooked. Yet this can sometimes take as long as collecting the data in the first place, and specialist tools exist to make your life easier.

I’m not going to dwell here on good practices for data management; for that there’s an excellent guide produced by the British Ecological Society which says more than I could. The principles of data organisation are well covered in this paper by Hadley Wickham. Both are on the essential reading list for students in my group, and I’d recommend them to anyone. Instead my focus here is on the tools you can use to do it.

The familiar Microsoft Excel is fine for small datasets, but struggles with large spreadsheets, and if you’ve ever tried to load a sizeable amount of data into it then you’ll know that you might as well go away to make a cup of tea, come back and hope it hasn’t crashed. This is a problem with Excel, not your data. Incidentally, LibreOffice Calc is the free substitute for Excel if you want a straight replacement. Don’t even consider using either of them to do statistics or draw figures (on which there will be more next time). I consider this computational limitation more than enough reason to look elsewhere, even though there are many official and unofficial plug-ins which extend Excel’s capabilities. Excel can also reformat your data without you knowing about it.

One of the main functionalities lacking in Excel is a way to use GREP. Regular Expressions are powerful search terms that allow you to screen data, check for errors and fix problems. Learning how to use them properly will save all the time you used to spend scrolling through datasheets looking for problems until your mind went numb. Proper text editors allow this functionality. Personally I use jEdit to manage my data, which is available free for all operating systems. Learning to parse a .csv or .txt file that isn’t in a conventional box-format spreadsheet takes a little time but soon becomes routine.

For larger, linked databases, Microsoft Access used to be the class-leader. The later versions have compromised functionality for accessibility, leading many people to seek alternatives. Databases are compiled using SQL (Structured Query Language), and learning to use Access compels you to pick up the basics of this anyway. Given this, starting with a free alternative is no more difficult. I have always found MySQL to be easy and straightforward, but some colleagues strongly recommend SQLite. It might not have all the same functions of the larger database tools but most users won’t notice the difference. Most importantly, a database in SQL format can be transferred between any of these software tools with no loss of function. Migrating into (or out of) Access is trickier.

As a general rule, your data management software should be used for that alone. The criterion for choosing what software to use is that it should allow you to clean your data and load it into an analysis platform as quickly and easily as possible. Don’t waste time producing summaries, figures or reports when this can be done more efficiently using proper tools.

Data analysis

These days no-one looks further than R. As a working environment it’s the ideal way to load and inspect data, carry out statistical tests, and produce publication-quality figures. Many people — including myself — do pretty much all their data processing, analysis and visualisation in R*.

It’s interesting to note just how rapidly the landscape has changed. As an undergraduate in the 90s we were taught using Minitab. For my PhD I did all my statistics in SPSS, then as a post-doc I transitioned to GenStat. All are perfectly decent, serviceable solutions for basic statistical analyses. Each has its limitations but moving between them isn’t difficult.

I won’t hide the simple truth — learning R is hard, especially if you have no experience of programming. Why then did I bother? The simple answer is that R can do everything that all the above programs can do, and more. It’s also more efficient, reproducible and adaptable. Once you have the code to do a particular set of analyses you can tweak, amend and reapply at will. Never again do you have to work through a lengthy menu, drag-and-drop variables, tick the right boxes and remember the exact sequence for next time. Once a piece of code is written, you keep it.

If you’re struggling then there are loads of websites providing advice to all levels from beginners to experienced statistical programmers. It’s also worth looking at the excellent books by Alain Zuur which I can’t recommend highly enough. If you have a problem then a quick internet search will usually retrieve an answer in no time, while the mailing lists are filled with incredibly helpful people**. The other great thing about R is that it’s free***.

One word of warning is to not dive too deep at the beginning. Start by replicating analyses you’re already familiar with, perhaps from previous papers. The Quick-R page is a good entry point. A bad (but common) way of beginning with R is to be told that you need to use a particular analytical approach, and that R is the only way to do it. This way leads at best to frustration, at worst to errors. If someone tells you to use approximate Bayesian inference via integrated nested Laplace approximation, then you can do it with the R-INLA package. The responsibility is still on you to know what you’re doing though; don’t expect someone to hold your hand.

Because R is a language rather than a program, the default environment isn’t very easy to work in, and you’re much better using another program to interface with R. By far the most widely-used is RStudio, and it’s the one I recommend to my own post-graduate students. It will improve your R coding experience immensely. Some programmers use it for almost everything. An alternative is Tinn-R, which I used to use, but gave up on a few years ago because it was too buggy. It may have improved now so by all means try it out. If you’re desperate for a familiar-looking graphical user interface with menus then R Commander provides one, but I recommend using this as a gateway to learning more (or teaching students) rather than a long-term solution.

I’m a bit old-fashioned and prefer to use a traditional text editor to work in R. My choice, for historical reasons, is eMacs, which links neatly to R through ESS. The other tribe of programmers use Vim with the sensibly-named Vim-R-plugin, and we shall speak no more of them. If you’re already a programmer then you know about these, and can be assured that you can code in R just as easily. If not then stick to Rstudio, which is much easier. I also often use Geany as a tool for making quick edits to scripts.

Most of all, don’t type directly into R, it’s a recipe for disaster, and removes the greatest advantage which is its reproducibility. Likewise don’t keep a Word document open with R commands while continually copy-and-pasting them over. I’ve seen many students doing this, and it’s only recommended if you want to speed the onset of repetitive strain injury. Word will also keep reformatting and autocorrecting your text, introducing many errors. Use a proper editor and it’s done in one click.

One issue with R that more experienced users will come across is that it is relatively slow at processing very large datasets or large numbers of files. This is a problem that relatively few users will encounter, and by that point most will be competent programmers. In these cases it’s worth learning one of the major programming languages for file handling. Python is the easiest to pick up, for which Rosalind provides a nice series of scaled problems for learning and teaching (albeit with a bioinformatics focus). Serious programmers will know of or already use C, which is more widespread and has greater power. Finding out how to use a Bash shell efficiently is also immensely helpful. Learning to program in these other languages will open many doors, including to alternative careers, but is not essential for most people.

As a final aside, there is a recent attempt to link the power of C with the statistical capabilities of R in a new programming language called Julia. This is still in early development but is worth keeping an eye on if statistical programming is likely to become a major feature of your research.

Specialist software tools

Almost everything can be done in R, and those that can’t already, can be programmed. That said, there are some bespoke free software tools that are worth mentioning as they can be of great use to ecologists. They’re also valuable for those who prefer a GUI (Graphical User Interface) and aren’t ready to move over to a command-line tool just yet. Where I know of them, I’ve mentioned the leading R packages too.

Diversity statistics — the majority of people now use the vegan package in R. Outside R, the most widely-used free tool for diversity analysis is EstimateS. Much of the same functionality is contained in SPADE, written by Anne Chao (who has a number of other free programs on her website). I’ve always found the latter to be a little buggy, but it’s also reliably updated with the very latest methods. It has more recently been converted into an R package, spadeR, which has an accessible webpage that will do all the analyses for you. As a final mention, there is good commercial software available from Pisces Conservation, but apart from a cleaner-looking interface I’ve never seen any advantage to using it.

GIS — I’ll be returning to the issue of making maps in a later post, but will mention here that a direct replacement for the expensive ArcGIS is the free qGIS. I’ve never found any functionality lacking, but I’m not a serious GIS user either. There are a plethora of R packages which in combination cover the same range of functions but I wouldn’t like to make recommendations.

Macroecology — SAM (for Spatial Analysis in Macroecology) is a useful tool for quickly loading and inspecting patterns in spatial ecological data. I would personally still move into R for publication-grade analyses, but this can be a helpful stepping stone when exploring a new dataset.

Null models — these can be very useful in community ecology. The only time I’ve done this, I used the free version of EcoSim. I see that you now have to pay for the full version, so if someone can recommend a comparable R package in the comments then I’ll update this accordingly.

I’m happy to extend this list with further recommendations; please drop a note in the comments.

Further reading

Practical Computing for Biologists is a great book. A little knowledge goes a long way, and learning how to use the shell, regular expressions and a small amount of Python will soon reap dividends for your research, whatever stage you’re at.

* The most mathematically-inclined biologists might hanker after something more like MATLAB, for which a direct free replacement is GNU Octave. You can even transfer MATLAB programs across, although there are some minor differences in the language.

** Normal forum protocol applies here, which is that you shouldn’t ask a question to which you could reasonably have found an answer by searching for yourself. If you ask a stupid question that implies no effort on your part then you can expect a curt answer (or none at all). That said, if you really can’t work something out then it’s well worth bringing up because you might be the first person to spot an issue. If your problem is an interesting one then often you’ll find yourself receiving support from some of the top names in the field, so long as you are willing to learn and engage. Please read the posting guide before you start.

*** A few years ago a graduate student declined my advice to use R, declaring in my office that if R was so good, someone would be charging for it. I was taken aback, perhaps because I take the logic of Free Open-Source Software for granted. If you’re unsure, then the main benefit is that it’s free to obtain and modify the original code. This means that someone has almost certainly created a specific tool to meet your research needs. Proprietary commercial software is aimed at the market and the average user, whereas open-source software can be tweaked and modified. The reason R is so powerful is that it’s used by so many people, many of whom are actively developing new tools and bringing them directly to your computer. Often these will be published in Journal of Statistical Software or more recently Methods in Ecology and Evolution.

Trees In Space

Boldly going where no forest ecologist has gone before