Over the past few years it has become apparent that we have a bit of a problem in psychology: because of wonky stats, methods, and publishing ethics, a lot of the progress in our field over the past decades turns out to be built on quicksand. 2016 has seen several high-profile replication failures which may in part be attributed to these problems. But there is hope! The so-called Open Science Movement is pushing for reforms in scientific methodology and practices that will undoubtedly leave their mark for years to come (in a good way!).
The Open Science Movement, a loose group of scientists passionate about open science, is well-represented on social media, for example on Twitter (check the hashtag #OpenScience), or on Facebook in the Psychological Methods and Practices (PsychMAP) page (please note that open science is not just about psychology, but being a psychologist I primarily follow the psychologists!) I have been following the developments in Open Science and improving statistics and methodology over the past years with great interest, because I genuinely believe open and better science to be a good thing.
Nevertheless, one of the first things I have done in the new year is to disengage from the ongoing discussions on Twitter and Facebook. I found that they take up a lot of time, and a lot of energy, without actually helping improve my science. The main reason for this is that all the arguments have been made, the open science community is increasingly preaching to the choir, and all too often, the sermons are little more than Schadenfreude at yet another failed replication (or improbable research result).
A similar observation can be made in the Bayesian versus Frequentist statistics debate. Following this debate on Twitter (or Facebook) is very interesting and entertaining. Snarky comments, interesting stats theory and beautiful examples of confirmation bias, as evidenced by papers linked to by Bayesian converts or frequentist fundamentalists – it’s brilliant. However, what to make of it?
Finally, it seems that more and more discussions are about the tone of the debate rather than the debate itself (see here for a rather [fill in your own adjective] example)… It leads to the puzzling situation where a lot of people are talking metascience on metascience. A bit too meta-squared for my taste.
It was fun for a while, following all this stuff, but after the latest posts on PsychMAP I realized that following the methods debate is no longer teaching me much that is new, and that Open Science Advocates and Methods Reformers annoy me more often than they inspire me. That is not a good thing, because the cause of open science is a good one. So, the best way to deal with that negative emotion? Well, for me: disengage. However, not before leaving a few final comments on my position in these debates, of course 😉
1. Bayesians versus Frequentists
It seems to me that there is a deep misunderstanding of ‘Bayesianism’ at the core of this entire debate. Bayesian statistics is not just another way of doing statistics; rather, it is a different epistemological view on inference. You can read tonnes on Bayesian versus frequentist statistics elsewhere, so I will not reiterate this here, but it seems to me that the problems many people observe with p-values and/or Bayes factors simply boil down to a problem with inference and interpreting statistics.
Basically, a Bayesian does not believe we can express our knowledge of the world in absolutes (as in, “the null hypothesis can be rejected”), whereas a frequentist with a naive interpretation of null-hypothesis significance testing does. The Bayesian expresses her/his knowledge about the world in likelihood ratios, or how much more likely hypothesis A is than hypothesis B, which is exactly what a Bayes factor allows you to do. Unfortunately, this very nice and sensible philosophy is undermined by people who think a Bayes factor can be interpreted in a very similar way as a p-value, and are craving a cutoff at which they can say “my alternative hypothesis is true”! No, that’s not how it works, sorry. Whether you need to revise your beliefs in a hypothesis is up to you and not specified by a cutoff table. Given that a Bayes factor means something completely different than a p-value, I see very little use in reporting both p-values and Bayes factors, as some people propose.
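To make the "no cutoff table" point concrete, here is a minimal sketch of how the same Bayes factor moves different readers to different conclusions. It assumes only two competing hypotheses, A and B, and uses the standard relation that posterior odds equal the Bayes factor times the prior odds. Strictly speaking, the Bayes factor says how much more likely the data are under A than under B; what you believe afterwards also depends on what you believed before. The function name and the example numbers are made up for illustration.

```python
def posterior_probability(bayes_factor: float, prior_probability: float) -> float:
    """Posterior probability of hypothesis A, given the Bayes factor BF_AB
    (A versus B) and the prior probability of A.

    Assumes A and B are the only two hypotheses under consideration.
    """
    if bayes_factor <= 0.0:
        raise ValueError("bayes_factor must be positive")
    if not 0.0 < prior_probability < 1.0:
        raise ValueError("prior_probability must lie strictly between 0 and 1")
    prior_odds = prior_probability / (1.0 - prior_probability)
    posterior_odds = bayes_factor * prior_odds  # Bayes' rule in odds form
    return posterior_odds / (1.0 + posterior_odds)


if __name__ == "__main__":
    # Same evidence (hypothetical BF of 10), three readers with different priors.
    for prior in (0.5, 0.1, 0.01):
        post = posterior_probability(10.0, prior)
        print(f"prior P(A) = {prior:.2f} -> posterior P(A) = {post:.3f}")
```

With a Bayes factor of 10, an undecided reader ends up fairly convinced, while a reader who started out highly sceptical remains mostly unconvinced. The evidence is the same; the belief revision is not.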
However, of course one does not need a Bayes factor to make nuanced inferences. By actually reading papers, and looking at statistical evidence (such as a p-value) we can do the very same thing, albeit not in a quantified manner. A ‘significant’ result does not mean anything in and of itself. A replicated significant result, however, … brings me to the following:
2. Replications
…are key to scientific progress. However, when replicating a study, there is such a thing as ‘flair’, a concept introduced by Roy Baumeister, and subsequently widely ridiculed by methodological progressives. I don’t think that is entirely justified – there are effects that require quite a bit of expertise to obtain. In my own field, I am thinking of TMS-induced masking, for example. There’s a lot of tacit knowledge required with regard to subject handling, coil positioning, and stimulus design to get good masking effects. However, I think the same goes for ‘social’ manipulations. Sometimes you need to make a participant believe something that isn’t true (such as that they are participating in a group effort). Not every experimenter is equally good at this. Therefore I tend to be a bit careful when seeing a non-replication, rather than basking in Schadenfreude, which seems to be a bit more customary than it should be – especially when a non-replication is reported by a sceptical researcher. Experimenter effects are a thing, after all… Personally, I take the extreme sceptic view of wanting to replicate something for myself (which does not always work out…) before I believe it.
3. Effect sizes and power
Yeah, about that – what Andrew and Sabrina said. Small effects may matter, but at the group level these are most often the result of a small number of participants showing a stronger response to a manipulation for some unknown reason. A small effect typically means (at least, in my book) that you do not know how the mechanism works, and perhaps need a better theory.
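The arithmetic behind that intuition can be sketched quickly. The toy model below is my own simplification, not something taken from a specific paper: a between-subjects design, unit-variance noise in both groups, and a manipulation that shifts a fraction of participants ("responders") by a fixed amount while leaving everyone else untouched. All names and numbers are hypothetical.

```python
import math


def group_effect_size(responder_fraction: float, responder_effect: float) -> float:
    """Group-level standardized mean difference (Cohen's d with a pooled SD)
    when only a fraction of the treated participants respond.

    Toy assumptions: control scores have variance 1; treated scores have the
    same unit-variance noise plus a shift of `responder_effect` (in noise-SD
    units) for responders and 0 for non-responders.
    """
    if not 0.0 <= responder_fraction <= 1.0:
        raise ValueError("responder_fraction must lie between 0 and 1")
    mean_shift = responder_fraction * responder_effect
    # Variance of a two-point mixture of shifts, added to the noise variance.
    treated_variance = 1.0 + (
        responder_fraction * (1.0 - responder_fraction) * responder_effect ** 2
    )
    pooled_sd = math.sqrt((1.0 + treated_variance) / 2.0)
    return mean_shift / pooled_sd


if __name__ == "__main__":
    # 10% of participants respond strongly, 90% do not respond at all.
    print(f"d = {group_effect_size(0.1, 2.0):.2f}")
```

In this toy setting, one in ten participants responding with a very large effect (two noise SDs) produces a group-level d just under 0.2, which would conventionally be read as a 'small effect'. The group statistic alone cannot tell you whether everyone moved a little or a few people moved a lot.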
No-brainer. Data unboxing parties, or rather protocols, are in full effect in my lab as of this year.
5. Open access and open data
Open access: that’s a no-brainer, too. Everyone should have access to scientific papers, all the more so if those papers are funded by the taxpayer.
Open materials: sure. All in. Feel free to download my bread-and-butter paradigm!
Open data: that depends. In psychology, we have a problem with data sharing. Several studies have shown that ‘data is available on request’ doesn’t mean sh*t when it comes to data sharing. The PRO Initiative, which came into effect on January 1, suggests that to counter this, scientists should make their raw data publicly available. I have some issues with this, and the concerns I express in that and subsequent posts have not been taken away. I am preparing a more detailed response in a full paper, including actual data and legal stuff, but I don’t think it is up to scientists themselves to decide what data can be publicly shared for the benefit of science, and I think we should err on the side of caution and not have public sharing of human participant data as the default. In sum, I am still not joining PRO. However, with regard to my own data sharing practices: my university already requires me to store all my data on the university servers, and to share with other researchers (which is not the same as public sharing!), so I think I am as open with my data as I can (and want to) be at this stage.
The Open Science Movement has definitely changed my scientific practices for the better, and I have learnt a great deal following the debate. However, apart from the open data issue, I think I am kind of done with it. Time to move on, and use all the great things I have learnt to do some real science!