Why I am not signing the PRO Initiative (yet)

This week, Richard Morey et al.’s PRO (Peer Reviewer Openness) Initiative launched, the revamped version of their Agenda for Open Research. The PRO Initiative is a laudable step by a group of devoted Open Science proponents to make our science more transparent. And for good reason – science should be open, and accessible to everyone. The PRO Initiative aims to achieve this by asking reviewers to withhold in-depth review of academic papers if data and materials are not made open. The arguments for improving the way we handle data are compelling. The present practice, in which data is ‘available on request’, simply does not work, as has been shown several times. Moreover, data sharing encourages collaboration and emphasizes that science is a collaborative enterprise. We’re in this together, figuring out how the world works, and hopefully making it a better place. Sharing data helps towards that ideal.

If you’re following Richard (if you do not, you should – even if you’re not into Open Science, his posts on Bayesian statistics cannot be missed!) or other members of the PRO Initiative (whom you should follow, too, again, even if you’re not into Open Science, because they’re all pretty good bloggers with sensible things to say), you will have seen many calls to sign the Initiative. As a signatory, you pledge that, starting January 2017, you will request that authors make their data publicly available when you review a paper, and withhold further review if they do not do so without good reason. Of course I have been thinking about this, starting a couple of months ago when Richard published the first version of the ‘Agenda for Open Research’ – what is not to like about Open Science?

But something did not sit well with me. I decided to wait for the updated version of the Agenda, which now is the PRO Initiative, and there was still this something that made me feel uneasy, a bit worried even. As a matter of fact, the past weeks I have been writing on a manuscript detailing these concerns, but maybe it’s better to throw some of these ideas out here and see what you think of this. I am still not decided.

I have a big problem with the PRO Initiative’s definition of ‘open data’. The PRO Initiative asks researchers to make data publicly available. Sharing, or depositing data on a server to which only researchers have access, is not enough – data, preferably raw data, has to be publicly accessible in order to count as ‘open’. In principle, there is nothing wrong with wide open data – on the contrary. CERN streams its data live to a publicly accessible server, some major archaeological discoveries have been made in the openly accessible data of Google Maps, and undoubtedly, if you want to discover extraterrestrial life, you’re free to roam NASA’s open database of images from other worlds. So, why do I feel uneasy about open data, then? The main reason: because we (cognitive neuroscientists/psychologists) observe people. Our raw data is a detailed description of human behaviour and neurophysiology. I have a problem throwing such data out in the open.

What I dearly miss in most discussions on open data is the perspective of the research participant. All arguments are centered around scientists and the process of science. We seem to forget that our (psychologists’) data is about actual people. In the discussion on open data, participants are stakeholders, too. It’s their data (not ours) we are planning to throw on the internet. As a scientist studying human behaviour, I feel my very first responsibility is to the participants in my experiments. I am obliged to guard them as well as I can from any harm coming from their participation. Moreover, I think that they should have strong voice in stating how their data can and should be used – stronger than that of the scientist. If a participant requests to be taken out of a dataset, so be it.

So, what may be harmful about publishing properly anonymized raw data? Well, I am trained in thinking in doomsday scenarios, so let’s come up with a potential disaster:

I participate in an fMRI/EEG experiment of a colleague in which my brain responses to a pornographic clip with very inappropriate material (insert your favourite fetish here) are measured, together with the physiological response of my Private Willy Johnson. The participant after me happens to be one of my first year students. This student unfortunately has an unhealthy obsession with his lecturer, and makes a note that I participated in this weird experiment, on Dec 3, around noon. One year later, the research paper with raw data is published. Being a good experimenter, my colleague notifies all research participants of this joyful occasion. Our student now downloads the data, and although I am known as Participant-007, our student checks the time stamps, and presto, he can now work on his blog post “How My Professor Got A Stiffy From Copulating Hippopotamuses And He Really Enjoyed It! (with data)”. Moreover, the student now also has my fMRI and EEG data. A recent study has shown that individuals can be reliably identified on the basis of their neural connectivity data, so this means my stalker can now also identify my data in the study on The Effects of Mindfulness of Believing in Bullshit – an EEG Connectivity Study, and see that I score massive bonus points on the bullshit scale for knowing who Deepak Chopra is (and actually having talked to him).

Ok, sure, this is a strictly hypothetical scenario – but it does show how vulnerable wide open data is to breaches of privacy. Open data basically means giving up privacy to anyone who knows you participated in a particular experiment at a given time, and such knowledge can fairly easily be obtained by someone who wants to. So, delete the time stamps! Well, my colleague from the example would love to, but she pre-registered her study and she needs the time stamps to show she did collect the data after she submitted her preregistration…

Fine, you say. Let’s not post such sensitive data then. The PRO Initiative leaves enough space for this – if a researcher has a good reason to not make data publicly available, she/he can say so. However, I still have some more issues.

If the PRO Initiative gains momentum, petabytes of behavioural and neurophysiological data will become publicly accessible. Given that the vast majority of our studies are carried out in undergraduate psychology students, it is relatively easy to identify particular strata (e.g. students from the class 2015-2016 – I can just look at the timestamped data). For example, most of our freshmen are on Facebook, where they started a group page, and in the kindness of their hearts, they allowed me to be a member as well. This means I have access to all their profiles, and as such, I can compile a pretty interesting profile of the average psychology student. But with research data out in the open, I can also mine actual research data measuring validated psychological constructs. A lot will not be particularly interesting, but data about cognitive abilities, implicit prejudice, or attitudes towards political ideas may all be quite worthwhile from the perspective of, let’s say, a marketing company, or another party with an interest in nudging behaviour.

This is not a direct threat to any individual research participant (contrary to a breach of anonymity), but if I, as a research participant, come into the lab for the benefit of science, I would be somewhat displeased to figure out that a shady marketing company uses my data for profiling. To aggravate matters – it is very well conceivable that matching up open research data with data from social networks or other sources can lead to identification – for example, individual ‘likes’ on Facebook predict personality; a measure I may be able to cross-reference with all the open data I just downloaded from the Groningen psychology servers. The more data, the happier my data-crunching algorithms will be.

You may think this is all very hypothetical, and very unlikely. Maybe you say I am scaremongering. Then again, that is what you need to do when thinking about research ethics, I’d say – I have served on a couple of Ethics Committees over the past years. What is the absolute worst that can happen, and how likely is that scenario? My personal evaluation is that a breach of anonymity is conceivable, and in some cases even likely (e.g., when someone knows you participated in a particular experiment, which at least is not uncommon among first year participants here in Groningen).

So, yes, I am worried about the PRO Initiative. I am not at all convinced making data wide open is such a good idea. “But the data belongs to the public, the tax payer paid your salary!”, I hear you say. Well, sure, the tax payer also pays for the construction of the road they’re building next to our office building, but that doesn’t mean I (yes, I also pay taxes) can go to the construction site and help myself to a nice supply of concrete. I am entitled to use the road, however, once it’s finished. Metaphorically, the concrete is the data underlying the research paper. I think access to the research paper is what tax payers pay for – so making research papers open access should be a no-brainer to anyone.

I believe that posting raw data on the internet according to the guidelines of the PRO Initiative results in an increased risk to the well-being of my participants that I am not willing to expose them to, no matter how small it is. I am happy the PRO Initiative leaves enough room to voice such concerns on an individual basis, but if the Initiative gains momentum, many research participants will be exposed to the risk of their data being used in ways they did not anticipate or consent to.

This is all the more pressing since there are excellent alternatives to ‘wide open data’ – hosting data on an institutional or national repository, for example, where access to data is regulated by an Ethics Committee or dedicated data officer, who can grant access to data on a case-by-case basis, or to registered users, independent of the researcher. As a matter of fact, this is already a requirement by many funding agencies and institutions, including mine (mind you, depositing data is required; making data publicly available typically is not!). Publicly posting raw data is – in my opinion – exposing participants to unnecessary risk of adverse effects of their research participation. Asking other scientists to do the same, and putting pressure on them to do so, makes me feel uneasy. Maybe this feeling is unjustified, I don’t know. But in a rather long nutshell, this is why I have not signed the PRO Initiative.

So, my Open Science pledge is that I will not make raw data public in any way unless my research participants request me to do so, nor will I ask others to expose their participants to unnecessary risk. I will pre-register my studies, upload my stimulus materials and analysis scripts to a publicly accessible place, but I will post my raw data to a reliable repository hosted by either my institution or a third party, where anyone who needs my data for research purposes can have access to it without intervention on my part. Moreover, I promise my participants that I will share their data with anyone who needs it for her or his research to advance science, but also that I will not share their data if they so request. And finally, I will make sure that all my research output is openly accessible to everyone.

That’s what I have to offer, Team PRO. Hope we can still be friends?

The Feel Good Song Formula

Update 22/9/2016

I see the Feel Good Formula has been getting some attention again! Since last year, we have repeated this study in a Dutch sample, but now with a continuous rating (i.e. “How ‘feel good’ is this song on a scale from 1-100?”). That allows for a far better statistical model. Fortunately, the results do confirm the earlier work (i.e. Don’t Stop Me Now is still firmly in the Top 3). The full Dutch list can be found here (of course, it is edited for radio-friendliness). For those of you interested, based on the Dutch data, the full regression formula is:

Rating = 60 + 0.00165 * (BPM – 120)^2 + (4.376 * Major) + (0.78 * nChords) – (Major * nChords)

Where BPM is beats per minute (tempo), Major is 1 if the song is in a major key and 0 if the song is in a minor key, and nChords is the number of chords in the song (including modulations etc.). The formula basically says we generally like songs with a tempo that deviates from the average pop song tempo, that are in a major key, and are a bit more complex than 3-chord songs, UNLESS the song is in a major key.
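For those who want to play with it, the formula is easy to put into code. Below is a minimal sketch in Python; note that I am assuming the tempo term is bracketed as 0.00165 * (BPM – 120)^2, i.e. the squared deviation from the average pop tempo described above, and the function and variable names are my own.

```python
# A minimal sketch of the regression formula quoted above, assuming the
# tempo term is 0.00165 * (BPM - 120)^2. Names are my own, for illustration.

def feel_good_rating(bpm: float, major: bool, n_chords: int) -> float:
    """Predicted 'feel good' rating (1-100 scale) of a song.

    bpm      -- tempo in beats per minute
    major    -- True if the song is in a major key
    n_chords -- number of chords in the song (including modulations)
    """
    m = 1 if major else 0
    return (60
            + 0.00165 * (bpm - 120) ** 2   # bonus for deviating from ~120 BPM
            + 4.376 * m                    # bonus for a major key
            + 0.78 * n_chords              # bonus for harmonic complexity...
            - m * n_chords)                # ...which shrinks in a major key

# Example: a 150 BPM, major-key song with 4 chords
print(round(feel_good_rating(150, True, 4), 3))  # 64.981
```

A fast major-key song with a handful of chords thus lands a few points above the baseline of 60 – which matches the verbal summary above.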

If you’re in the UK, maybe you have seen or heard something about the ultimate feel good song formula uncovered by a real scientist with a somewhat unpronounceable name from a university with an equally unpronounceable name. Well, that scientist was yours truly! I got quite some questions about this feel good formula and how I ‘uncovered’ it, so here’s a short blog post on it!

The research was commissioned by a British electronics brand called Alba, who did a large customer survey in the UK and the Republic of Ireland, asking respondents for their musical preference, where they got their musical taste from, and, most importantly, their favourite songs to improve their moods. Probably because of my background in using music as a mood manipulation (among others in Jolij & Meurs, 2011), my name popped up when they were looking for an academic to help them analyze this enormous dataset. Basically, they asked me whether I could find a general pattern in the songs that respondents reported as ‘feel good songs’, and whether they could use this pattern to come up with a ‘formula’. I found this an interesting challenge, so I said yes. One week later I received the data, and I could get to work.

A ‘feel good song’ is rather tricky to define. Music appreciation is highly personal and strongly depends on social context and personal associations. In that respect, the idea of a ‘feel good formula’ is a bit odd – factoring in all these personal aspects is next to impossible, in particular if you want to come up with a quantitative feel good formula. Basically, what you need are song features that you can express in numbers.

Fortunately, music does have specific features that are known to play an important role in emotional reception of songs. In particular these are mode (major or minor) and tempo. So, the first thing I did was to identify all unique songs that respondents listed as ‘feel good’, and find the scores of these songs to determine key and tempo. Next, I looked at some additional variables, such as season in which the song was released, genre, lyrical theme, and overall emotionality of the lyrics.

So, now I’d got myself a big matrix with numbers. Now what? Originally, I planned to fit a linear mixed model to predict whether a song is a feel good song or not. A mixed model would be ideal – it would allow me to include a random factor for song, or even for respondent, and thus correct (somewhat) for individual differences such as social context, associations, and what more. Unfortunately, the list I got only listed feel-good songs. That’s a problem for an LMM, because you cannot fit a model if your outcome variable (feel good-song or not a feel good-song) has zero variability. Same thing for a machine learning algorithm – you need exemplars of both categories you want to classify. And I had just one…

The perfect solution is of course to come up with a baseline of songs that were not classified as ‘feel good songs’. Given I had only a very limited amount of time for this analysis, that was not feasible. I therefore decided to have a look at the means and in particular the distributions of the key variables tempo and key, to see if they would differ from the average pop song. The pattern was very clear – the average tempo of a ‘feel good’ song was substantially higher than that of the average pop song. Where the average tempo of pop songs is around 118 BPM, the list of feel good songs had an average tempo of around 140 to 150 BPM. Next I had a look at key (major or minor). Again a very clear pattern: only two or three songs were in a minor key; the rest were all in a major key. Of course, the proof of the pudding is in the eating. I’ve created four short clips, two in a major key (C G Am F, the famous I-V-vi-IV progression), and two in a minor key (Am Am Em Bm), each at 118 BPM and 148 BPM, with a 4-to-the-floor beat under them. Listen to the differences, and decide which one would make the best feel good song.

Of course, a song is more than its score. I have also looked at lyrical themes. Predominantly, the feel good songs were about positive events (going to a beach, going to a party, doing something with your love, etc.) or did not make sense at all.

At the end of the story, I had to cook up a formula. My client had asked me to come up with a formula for PR-purposes: a formula can nicely explain the ‘main’ ingredients of a feel good song at a glance. The formula I came up with takes the number of positive lyrical elements in a song, and divides that by how much a song deviates from 150 BPM and from the major key. It’s not perfect at all – it’s mostly an illustration (all four clips I posted here would score 0 on my formula, simply because they have no lyrics, for example).

So, how to get from the ‘formula’ to the list of ultimate feel good songs? I had little to do with that, actually – we simply took the most often mentioned song per decade. Given that these modal feel good songs contribute to the averages, of course they fit the ‘formula’ reasonably well.

All in all, this was a fun assignment to do. Of course the main purpose for Alba was marketing, but that’s ok. They are to be commended for doing this in such a data-driven fashion, instead of making something up. Is this hardcore science? No, it’s data crunching – for me as a scientist, it’s useful because I now have a list of songs I can use for mood manipulations. However, the truly interesting questions are still open. Is this model predictive, that is, can it be used by composers to write specific feel good songs? What is so special about the major key that it makes us feel good? Why do fast songs work so well? Stuff to work on in the future – and maybe the most exciting thing about this commission is the sheer amount of responses I got from people interested in this work, and interested in finding an answer to the questions I mentioned earlier. I’m sure you’ll be hearing more about this topic from us in the near future!

PS: as this research was a private commission, I am afraid there is not going to be a peer-reviewed publication in the short term, nor am I at liberty to release the data. However, the reception of this work has inspired me to put my music-related work on top of my to-do list. Watch this space for more music research soon!

Within-subject designs in social priming – my attempts

TL;DR summary: it’s perfectly possible to do a within-subject design for ‘social’ priming.

This is going to be an attempt at a more serious post, about some actual research I have done. Moreover, I really need to get back into writing mode after summer leave. Just starting cold turkey on the >7 manuscripts still waiting for me did not work out that well, but maybe a nice little blog will do the trick!

This weekend, I was engaged in a Twitter exchange with Rolf Zwaan, Sam Schwarzkopf, and Brett Buttliere about social priming (what else? Ah, psi maybe!)  A quick recap: social (or better: behavioural) priming refers to the modification of behaviour by environmental stimuli. For example, washing your hands (and thus ‘cleaning your conscience’) reduces severity of moral judgments. Reading words that have to do with elderly people (‘bingo’, ‘Florida’) makes you walk slower. Or, feeling happy makes you more likely to see happy faces.

The general idea behind such effects is that external stimuli trigger a cascade of semantic associations, resulting in a change in behaviour. ‘Florida’ triggers the concept ‘old’, the concept ‘old’ triggers the concept ‘slow’, and if you think about ‘slow’ this automatically makes you walk slower. Indeed, semantics are closely tied to information processing in the brain – a beautiful study from the lab of Jack Gallant shows that attention during viewing of natural scenes guides activation of semantically related concepts. However, whether the influence of external stimuli and semantic concepts is indeed as strong as some researchers want us to believe is questionable. Sam Schwarzkopf argued in a recent blog post that if we were so strongly guided by external stimuli, our behaviour would be extremely unstable. Given the recent string of failures to replicate high-profile social priming studies, many researchers have become very suspicious of the entire concept of ‘social priming’.

What does not exactly help is that the average social priming study is severely underpowered. People like Daniel Lakens and Uli Schimmack have done far better in explaining what that means than I can, but basically it boils down to this: if you’re interested in running a social priming study (example courtesy of Rolf Zwaan), you pick a nice proverb (e.g., ‘sweet makes you sweeter’), and come up with an independent variable (your priming manipulation, e.g. half of your participants drink sugary water; the other half lemon juice) and a dependent variable (e.g., the amount of money a participant would give to charity after drinking the priming beverage). I’ve got no idea whether someone did something like this… oh wait… of course someone did.

Anyway, this is called a ‘between subject’ design. You test two groups of people on the same measure (amount of money donated to charity), but the groups are exposed to different primes. To detect a difference between your two groups, you need to test an adequate number of participants (or, your sample needs to have sufficient power). How many is adequate? Well, that depends on how large the effect size is. The effect size is the mean difference divided by the pooled standard deviation of your groups, and the smaller your effect size, the more participants you need to test in order to draw reliable conclusions. The problem with many social priming-like studies is that participants are only asked to produce the target behaviour once (they come into the lab, drink their beverage, fill out a few questionnaires, and that’s it). This means that the measurements are inherently noisy. Maybe one of the participants in the sweet group was in a foul mood, or happened to be Oscar the Grouch. Maybe one of the participants in the sour group was Mother Theresa. Probably three participants fell asleep, and at least one will not have read the instructions at all.

To cut a long story short, if you don’t test enough participants, you run a large risk of missing a true effect (a false negative), but also you risk finding a significant difference between your groups whilst there is no true effect present (a false positive). Unfortunately, many social priming studies have used far too few participants to draw valid conclusions. This latter thing is significant (no pun intended). Given that journal editors until recently primarily were interested in ‘significant results’ (i.e., studies that report a significant difference between two groups), getting a significant result meant ‘bingo’. A non-significant result… well, too bad. Maybe the sweet drink wasn’t sweet enough to win over Oscar the Grouch! Add a sugar cube to the mix, and test your next batch of subjects. If you test batches of around 30 participants (i.e., 15 per group, which was not abnormal in the literature), you can run such an experiment in half a day. Sooner or later (at least within two weeks if you test full time), there will be one study that gives you your sought-after p < .05. Boom, paper in!
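The power problem is easy to demonstrate with a quick simulation. Assume a true effect of d = 0.4 (an illustrative number, not taken from any particular study) and 15 participants per group, and count how often a two-sample t-test comes out significant:

```python
# Quick simulation of the power problem sketched above: with 15 participants
# per group and a true effect of d = 0.4 (my assumption, for illustration),
# how often does a two-sample t-test reach p < .05?
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
n_per_group, d, n_sims = 15, 0.4, 5000

hits = 0
for _ in range(n_sims):
    sour = rng.normal(0.0, 1.0, n_per_group)   # control group
    sweet = rng.normal(d, 1.0, n_per_group)    # primed group
    if stats.ttest_ind(sweet, sour).pvalue < 0.05:
        hits += 1

power = hits / n_sims
print(f"Estimated power: {power:.2f}")
```

With these numbers the estimated power lands somewhere around .20 – roughly four out of five such experiments will miss a perfectly real effect, and the ‘significant’ results that do emerge will tend to overestimate it.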

In cognitive psychology and neuroscience we tend to be a bit jealous of such ‘easy’ work. Our experiments are harder to pull off. Right before summer, one of my grad students finished her TMS experiment, for which she tested 12 participants. For 18 hours. Per participant. In the basement of a horrible 1960s building with poor airco whilst the weather outside was beautiful. Yes, a position in my lab comes with free vitamin D capsules because of occupational health & safety reasons.

Moreover, the designs that we typically employ are within-subject designs. We subject our participants to different conditions and compare performance between conditions. Each participant is his/her own control. In particular for physiological measurements such as EEG this makes sense: the morphology, latency and topography of brain evoked potentials vary wildly from person to person, but are remarkably stable within a person. This means that I can eliminate a lot of noise in my sample by using a within-subjects design. As a matter of fact, the within-subjects design is pretty much the default in most EEG (and fMRI, and TMS, and NIRS, etc.) work. Of course we have to deal with order effects, learning effects, etc., but careful counterbalancing can counteract such effects to some extent.
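As an aside, the ‘careful counterbalancing’ mentioned above can be automated. Below is a minimal sketch of a balanced Latin square generator (my own code, not from any particular toolbox): across participants, every condition occurs equally often in every serial position, and for an even number of conditions each condition also precedes every other one equally often.

```python
# A minimal sketch of balanced Latin square counterbalancing of condition
# order across participants (my own code, for illustration).
def balanced_latin_square(conditions):
    """One presentation order per participant 'row'. Across rows, every
    condition appears equally often in every serial position; for an even
    number of conditions, each condition also precedes every other one
    equally often (first-order carry-over balance)."""
    n = len(conditions)
    rows = [[conditions[((j // 2 + 1 if j % 2 else n - j // 2) + i) % n]
             for j in range(n)]
            for i in range(n)]
    if n % 2:               # odd n: add the mirrored rows to restore balance
        rows += [row[::-1] for row in rows]
    return rows

for order in balanced_latin_square(["A", "B", "C", "D"]):
    print(order)
```

Each participant then receives one row as his or her condition order; with four conditions, running a multiple of four participants uses every row equally often.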

Coming from this tradition, when I started running my own ‘social priming’ experiments, I naturally opted for a within-subjects design. My interest in social priming comes from my work on unconscious visual processing – very briefly, my idea about unconscious vision is that we only use it for fight-or-flight responses, but that we otherwise rely on conscious vision. The reason for this is that conscious vision is more accurate, because of the underlying cortical circuitry. Given that (according to the broad social priming hypothesis) our behaviour is largely guided by the environment, it is important to base our behaviour on what we consciously perceive (otherwise we’d be acting very odd all the time). This led me to hypothesize that social priming only works if the primes are perceived consciously.


I tested this idea using a typical masked priming experiment: I presented a prime (in this case, eyes versus flowers, after this paper), and measured the participant’s response in a social interaction task after being exposed to the prime, in total 120 trials (2 primes (eyes/flowers) x 2 conditions (masked/not masked) x 30 trials per prime per condition). The ‘social interaction’ was quite simple: the participant got to briefly see a target stimulus (happy versus sad face), and had to guess the identity of the face, and bet money on whether the answer was correct. Critically, we told the participant (s)he was not betting her/his own money, but that of the participant in the lab next door. Based on the literature, we expected participants to be more conservative on ‘eye’-primed trials, because the eyes would remind them to behave more prosocially and not waste someone else’s money.
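For illustration, the trial structure of such a design is straightforward to generate. The sketch below builds the 2 × 2 × 30 trial list described above and shuffles it into a random order; the condition labels are mine, as the original experiment code is not public.

```python
# A sketch of how the trial list of such a design can be built: 2 primes x
# 2 masking conditions x 30 repetitions = 120 trials, randomly shuffled.
# Condition labels are mine, for illustration only.
import itertools
import random

primes = ["eyes", "flowers"]
masking = ["masked", "unmasked"]
reps = 30

trials = [{"prime": p, "masking": m}
          for p, m in itertools.product(primes, masking)
          for _ in range(reps)]
random.shuffle(trials)

print(len(trials))  # 120 trials, 30 per cell
```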

Needless to say, this horrible design led to nothing. Major problem: it is very doubtful whether my DV truly captured prosocial behaviour. After this attempt, we tried again in a closer replication of earlier eye-priming studies using a between-subjects design and a dictator task, but after wasting >300 participants we came to the conclusion many others had drawn before: eye priming does not work.

But this doesn’t mean within-subjects designs cannot work for priming studies. There’s no reason why you could not use a within-subjects design to test, for example, whether having a full bladder makes you act less impulsively. As a matter of fact, I’ve proposed such a study in a blog post from last year.

Another example: I am not sure if we could call it ‘social priming’, but a study we did a while ago used a within-subject design to test the hypothesis whether happy music makes you better at detecting happy faces and vice-versa. Actually, this study fits the bill of a typical ‘social priming’ study – activation of a high level concept (happy music) has an effect on behaviour (detecting real and imaginary faces) via a very speculative route. It’s a ‘sexy’ topic and a finding anyone can relate to. It may not surprise you we got a lot of media attention for this one…

Because of the within-subjects design we got very robust effects. More importantly, though, we have replicated this experiment twice now, and I am aware of others replicating this result. As a matter of fact, we were hardly the first to show these effects… music-induced mood effects on face perception had been reported as early as the 1990s (and we nicely cite those papers). The reason I am quite confident in the effect of mood on perception is that in our latest replication, we also measured EEG, and indeed find an effect of mood congruence on visual evoked potentials. Now, I am not saying that if you cannot find a neural correlate of an effect, it does not exist, but if you do find a reliable one, it’s pretty convincing that the effect *does* exist.

What would be very interesting for the social priming field is to come up with designs that show robust effects in a within-subjects setting, and ideally, effects that show up on physiological measures. And to be frank, it’s not that difficult. Let’s suppose that elderly priming is true. If concepts related to old people indeed make you behave like grandpa, we should not just see this in walking speed, but also in cognitive speed. Enter the EEG-amplifier! Evoked potentials can be used to nicely assess speed of cognitive processing – in a stimulus recognition task, for example, latency of the P3 correlates with reaction time. If ‘old’ makes you slower, we’d expect longer P3 latencies for trials preceded by ‘old’ or a related word, than for trials preceded by ‘young’. Fairly easy experiment to set up, can be run in a week.
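To make the proposed measure concrete: here is a minimal sketch of extracting P3 peak latency from an averaged single-channel ERP. The ERP below is synthetic, purely for illustration – real data would obviously come from an EEG recording.

```python
# A minimal sketch of the P3-latency measure suggested above: the latency of
# the largest positive deflection in a 250-500 ms window of an averaged ERP.
# The synthetic ERP (a Gaussian 'P3' peaking at 380 ms) is illustrative only.
import numpy as np

fs = 500                                   # sampling rate in Hz
t = np.arange(0, 0.8, 1 / fs)              # 0-800 ms epoch

def p3_peak_latency(erp, t, window=(0.250, 0.500)):
    """Latency (s) of the maximum amplitude within the P3 window."""
    mask = (t >= window[0]) & (t <= window[1])
    return t[mask][np.argmax(erp[mask])]

rng = np.random.default_rng(0)
erp = 5 * np.exp(-((t - 0.380) ** 2) / (2 * 0.03 ** 2)) + rng.normal(0, 0.1, t.size)

print(f"P3 latency: {p3_peak_latency(erp, t) * 1000:.0f} ms")
```

On real data one would compute this per condition (‘old’- vs ‘young’-primed trials) and test the latency difference, for example with a jackknife-based procedure.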

Or even better – if, as the broad social priming hypothesis postulates, social priming works by means of semantic association, we should be able to find evidence of semantic relations between concepts. Again something that is testable, for example in a simple associative priming task in which you measure N400 amplitudes (an index for semantic relatedness). As a matter of fact, we have already run such experiments, in the context of Erik Schoppen’s PhD project, with some success – we were able to discriminate Apple from Android enthusiasts using a very simple associative priming test, for example.

All in all, my position in the entire social priming debate has not changed that much. I do believe that environmental stimuli can influence behaviour to quite some extent, but I am very skeptical of many of the effects reported in the literature, not least because of the very speculative high-level semantic association mechanisms that are supposed to be involved. In order to lend more credibility to the claims of ‘social priming’, the (often implicit) hypotheses about the involved mechanisms have to be tested. I think we (cognitive/social neuroscientists) are in an excellent position to help flesh out paradigms and designs that are more informative than the typical between-subject designs in this field. At least I think working together with our colleagues in social psychology in this way is more fruitful than trying to ‘educate’ social priming researchers about how ‘wrong’ they have been, doing direct replications (however useful) of seminal studies, and basking in Schadenfreude when yet another replication attempt fails, or a meta-analysis shows how flimsy an effect is. We know that stuff already. No need to piss each other off IMO (I am referring to a rather escalated ISCON discussion here of last week).

Let’s do some cool stuff and learn something new about how the mind works. Together. Offer made last year still stands.

The Open Data Pitfall II – Now With Data

Yesterday I wrote something on why I think providing unrestricted access to data from psychological experiments, as advocated by some, is not a good idea. Today I had the opportunity to actually collect some data surrounding this issue, from the people who are neglected in this discussion: the participants.

I used Mentimeter to ask the 60 first-year students who showed up for my Biopsychology lecture whether they would participate in an experiment of which the data would be made publicly available.

At the beginning of the lecture, I gave a short introduction on open data. I referred to the LaCour case, and to Wicherts et al.’s work on the lack of willingness to share data, and emphasized the necessity of sharing data. I also mentioned that there is a debate going on about how data should be shared: some researchers are in favour of storing data in institutional repositories, whereas others are in favour of posting data on publicly accessible repositories. I then explicitly said I would give my thoughts on the matter after I had asked them two short questions via Mentimeter.

I read out two vignettes to the students:

1. “Imagine you signed up via Sona for one of *name of researcher*’s studies on sexual arousal. Data of the study will be shared with other researchers. The dataset will be anonymized – it may contain some information such as your gender and age, but no personally identifiable information. Would you consent to participate in this study?”

2. “Imagine you signed up for the same study. However, now *name of researcher* will make the data publicly available on the internet. This means other researchers will have easier access to it, but also that anyone, such as your fellow students, companies, or the government, can see the data. Of course, the dataset will be anonymized – it may contain your gender, or age, but no personally identifiable information. Would you consent to participate in this study?”

After each vignette, they submitted their response to Mentimeter.com.

As I said, respondents were 60 first-year psychology students of the international bachelor in psychology of the University of Groningen, most of them German. It is my experience that this population generally guards its privacy a lot more than their Dutch counterparts – please keep this in mind.

The results? For scenario 1, 13.3% indicated they would *not* participate. This percentage suggests the data may be a bit skewed – for most studies I run (EEG work on visual perception and social interaction) I have a non-consent rate of about 5 to at most 10%. For my TMS work this can go up to 33%. However, given the nature of the research I used as an example (I named a researcher they know, and her research involves the role of disgust in sexual arousal – stuff like touching the inside of a toilet bowl after watching a porn clip), 13.3% might not be totally unreasonable.

For scenario 2, the percentage of non-consenters was obviously higher. But not just a little bit – it went up to a whopping 52.4%. More than half of the students present indicated they would not want to participate in this study if the data were to be made publicly available, even though I clearly indicated all data would be anonymized.

The Mentimeter result can be found here. Please note that there are 61 votes for vignette 2; one student was late and voted only for vignette 2. Feel free to remove one ‘no’ vote from the poll – it’s now 51.6% non-consenters.
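For what it’s worth, the raw counts behind these percentages are easy to reconstruct. The ‘no’ counts below (8, 32, and 31) are my inference from the reported percentages, not the raw Mentimeter export, and the displayed values appear to be truncated (not rounded) to one decimal:

```python
# Reconstructing the reported non-consent percentages.
# The 'no' counts (8, 32, 31) are inferred from the percentages in the
# post; Mentimeter appears to truncate to one decimal place.

def pct_truncated(no_votes, total):
    """Percentage of non-consenters, truncated to one decimal place."""
    return int(1000 * no_votes / total) / 10

print(pct_truncated(8, 60))    # vignette 1: 13.3
print(pct_truncated(32, 61))   # vignette 2: 52.4
print(pct_truncated(31, 60))   # vignette 2, one late 'no' removed: 51.6
```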

What does this tell us? Well, there are some obvious caveats. First of all, this was a very ad-hoc experiment in a rather select and possibly biased group of students (i.e., students who took the trouble of going to a lecture from 17:00 to 19:00 in a lecture hall 15 minutes from the city centre, knowing I would lecture about consciousness, my favourite topic). Second, the experimenter (me) was biased, and even though I explicitly mentioned I would only give my view after the experiment, we all know how experimenter bias affects the outcome of experiments. Maybe I did not defend the ‘open’ option furiously enough. Maybe I made a weird face during vignette 2. Finally, the vignettes I used were about experiments in which potentially sensitive data (sexual arousal) is collected.

Nevertheless, I was surprised by the result. I expected an increase in non-consent, but not to such an extent that more than half would decline. Either I am very good at unconsciously influencing people, or this sample actually has a problem with having their data made publicly accessible. Anyway, it confirmed my hunch that in the debate on open data we should involve the people it is really about: our participants.

I do not wish to use this data as a plea against open data. But I do think researchers should talk to participants. Have a student on your IRB if you use first-year participant pools, or otherwise someone from your paid participant pool. Set up a questionnaire to find out what participants find acceptable with regard to data sharing. In the end, if you post a dataset online without restrictions, it’s *their* data and *their* privacy that are at stake.

As a side note, going through some paperwork about consent forms, it actually turned out that data storage and sharing in my default consent form is phrased as follows:

“My data will be stored anonymously, and will only be used for scientific purposes, including publication in scientific journals.”

This formulation, which is prescribed by my IRB, allows for data sharing between researchers, but forbids unrestricted (open) publication. I was actually quite happy to rediscover this – it means I can adhere to the Agenda for Open Research (or rather, can not adhere to it with good reason): open publication of data would be a breach of consent in this case. If I were to put my data publicly online, I could not keep my promise that the data would only be used for scientific purposes.

But why not add something to the informed consent?

“The researcher will take care my data is stored at an institutional repository and guarantees she or he will share my data upon request with other researchers.”

Everybody happy.

The pitfalls of open data

TL;DR summary: some data can be made publicly available without any problems. A lot of data, however, cannot. Therefore, unrestricted sharing should not be the default. Instead, all data could be hosted on institutional repositories to which researchers can get access upon request to the institution.

Data is an essential part of research, and it is a no-brainer that scientists should share their data. The default approach is and has been ‘share on request’: if you’re interested in a dataset, you simply e-mail the author of a paper, and ask for the data. However, it turns out that this does not work that well. Wicherts, Borsboom, Kats, & Molenaar (2006) have shown, for example, that authors are not really enthusiastic about sharing data, something not unique to psychology.

This is bad, and not just for the sake of scientific progress – recently, social science has seen yet another data-fabrication scandal, in which a graduate student faked his data for a study published in Science (you would think they had learned their lesson at Science after Stapel, but sadly, no). Making data available with your publication at least a) shows that you actually conducted the study, and b) allows others to (re)use your data, saving work in the end.

It is therefore not surprising that there is now an open research movement, calling for full transparency in research, including making all research data public by default. I totally support open research, and I have considered signing the ‘Agenda’ several times. After a discussion on the ISCON Facebook page I have now decided not to.

As a matter of fact, the discussion has convinced me that making all research data publicly available without restriction by default is in fact a bad idea.

Before the flame-war starts, let me point out that I am not against sharing data between researchers, nor even against compulsory data sharing (i.e., if an author refuses to share data without good reason, her/his boss will send the data). However, I disagree with unrestricted data publishing, i.e. putting all data online where anyone (i.e., the general public) can access it. I am strongly in favour of a system where data is deposited at an institutional repository and anyone interested in the data may ask for access, if necessary even without the consent of the author.

Let me illustrate my concerns with the following thought experiment. You participate in an experiment on sexual arousal, and have to fill out a questionnaire about how aroused you are after watching a clip with the most depraved sex acts. Your data is stored anonymously, and will be uploaded to Github directly after the experiment (see Jeff Rouder’s paper on an implementation of such a system). Would you give consent?

For this example, I might. I can always fake my response on the questionnaire should I feel something tingling in my nether regions, to avoid embarrassment.

For the next experiment, this study is repeated, but we’re now measuring physiological arousal (i.e., the response of your private parts to said depraved sex acts). Again, the data will be uploaded directly to Github after the experiment.

Now, I would be a bit uncomfortable. Suppose I got sexually aroused (or not – it actually does not matter, the behaviour of my private Willy Johnson is not anyone’s business besides my own and my wife’s, and for this one occasion, the researcher’s). This is now freely available for anyone to see. And by the timestamp on the file, I may be identified by the one or two students who saw me entering the sex research room for the 12:00 session on June 2nd. Unlikely, but not impossible. Oh sure, remove the timestamp then! Yes, but how is a researcher then going to show (s)he collected all the data after preregistering his/her study and not before (or did not fabricate the data on request after someone asked for it)?

Ok, we take it a step further. We now measure the response of your nether regions, but now we ask you to have your fingerprint scanned and stored with the data as well.

Making this data publicly available would be a huge no to me. Fingerprints are unique identifiers – are you mad?

But now replace ‘fingerprint’ with raw EEG data. We do not often consider this, but EEG data is as uniquely identifiable as fingerprints. I can recognize some of my regular test subjects and students from their raw EEG data – shape and topography of alpha, for example, are individual traits and may be used to identify individuals if you really, really want to.

One step further: individual, raw fMRI data, associated with your physiological ‘performance’ on this sex rating task. Rendering a 3D face from the associated anatomical image is trivial – it’s one of the first things you do (for fun!) when you start learning MRI analysis. How identifiable do you want to have your participant? And note that raw individual fMRI data cannot be interpreted without the anatomical scan – you need the latter to map activations on brain structures.

So, don’t publish the raw data then! Sure, that fixes some problems, but creates others. What if I want to re-analyze a study’s data because I do not agree with the author’s preprocessing pipeline, and would rather try my own? For this I would still have to ask the author for the full data set. Mind you – most researcher degrees of freedom for EEG and fMRI are in the preprocessing of data (e.g., what filters you use, what kind of corrections you apply, what rereferencing you apply, etc.), and aggregate datasets, such as those published on Neurosynth, do not allow you to reproduce a preprocessing pipeline.

But the main problem is that many data or patterns in data can be used as unique identifiers. Even questionnaire, reaction time, or psychophysics data. Data mining techniques can be used to find patterns in datasets, such as Facebook likes, that can be used for personal identification. What’s to stop people from running publicly available research data through such algorithms? Unlikely? Sure. Very much so, even. Impossible? Nope.
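A toy illustration of how this works in the simplest case: count how many ‘anonymized’ records are already unique given just a couple of demographic attributes. The records below are fabricated for illustration only; real re-identification attacks combine such quasi-identifiers with outside information.

```python
from collections import Counter

# Hypothetical 'anonymized' records: (gender, age, handedness).
# Fabricated toy data for illustration -- not real participant data.
records = [
    ("F", 19, "R"), ("F", 19, "R"), ("M", 20, "R"),
    ("F", 21, "L"), ("M", 19, "R"), ("M", 20, "R"),
    ("F", 22, "R"), ("M", 23, "L"),
]

# A record is unique if its attribute combination occurs only once --
# anyone who knows those attributes about you can single out your row.
counts = Counter(records)
unique = [r for r in records if counts[r] == 1]
print(f"{len(unique)} of {len(records)} records are uniquely identifiable")
```

Even in this tiny example, half the rows can be singled out from three innocuous-looking attributes; with reaction-time patterns or physiological traces, the combinations only get richer.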

Of course, my thought experiment deals with a rather extreme example – I guess that very few people are willing to have their boy/girl-boner data in a public database for everyone to see. So let’s take another example: visual masking. What can go wrong with that? Well – performance on a visual masking task may be affected by illnesses such as schizophrenia, or by being related to an individual with schizophrenia. Is that something you want to be publicly accessible? And there are many other examples. Data reveals an awful lot about participants, and it is not clear at all how much data is needed to identify people. It may be less than we think.

I fully realize that the scenarios I put forward here are extreme and hypothetical, and I am sure some people will think I am fearmongering, making a fuss, and maybe even an enemy of open science. Ok, so be it. I think that we as scientists have a responsibility not only to each other, but even more so to our participants. People participating in our studies are the lifeblood of what we do and deserve our utmost respect and care. They provide us with often very intimate data as a contribution to science, and trust us to handle that data conscientiously. We need to protect their privacy. Just putting all data online for everyone to see does not fit with that idea. There is always a potential for violations of privacy, but making all data public also opens up the data to, let’s say, the government, insurance companies, marketeers, and so on, for corporate analyses, marketing purposes, and other goals than the progress of science. Do we want that?

Maybe I should give another example – what about video material? Suppose you carried out an experiment in which you taped participants’ emotional responses to shocking material. Even if you were to blur out faces to prevent identification, and my IRB were ok with publishing these clips, I would still not submit such material to a public repository for every Tom, Dick, and Harry to browse clips of crying participants.

I am not saying these are realistic scenarios, but it is worth giving them some thought – at least, more than people are doing now.

There are and will be many datasets that can be made publicly available without any concern at all. I’ve got a feeling that the authors of the Agenda for Open Research primarily work with such datasets, but do not sufficiently realize that a lot of sensitive data is being collected as well. The ideal of all data being made public by default does not fit well with my idea of being a responsible experimenter. And there is a clear ‘grey zone’ here. Not everyone will share my concerns. Some will even say I am making a fuss out of nothing. But I would like to be able to carry out my job with a clear conscience. Towards my colleagues, but most of all towards my participants. And that means I will not make every dataset I collect publicly available, even if this entails that the signatories of the Agenda for Open Research will not review my paper because they do not agree with my reasons for not making a given dataset publicly available. Too bad.

So, you want access to the data I did not make publicly available, but I am on extended leave? There is a fix! And actually, this fix should appeal to the Open Research movement too.

For every IRB-approved experiment, require authors to deposit their data at an institutional repository. All data and materials, that is: raw data, stimuli, analysis code and scripts. The whole shebang, all documented, of course. Authors are free to give anyone they want access to this data. Scientists interested in the data can request access via a link provided with the paper. In principle, the author will provide access, but if no reply is given within a reasonable term (let’s say two weeks), or the data is not shared without proper reason, the request is forwarded to the Director of Research (or another independent authority), who then decides.

In Groningen, we have such a system in place. It ensures that for every published study, the data is accounted for, and access to the raw data can be granted if an individual requires so. The author of a study controls who has access to the data, but can be overruled by the Director of Research. It works for me, and I do not see what the added benefits of unrestricted access are over this system. Working in this way makes me feel a lot better. I can only hope that the signatories of the Agenda for Open Research consider this practice to be open enough.

A quick comment to my previous post

Well, my previous blogpost was poorly timed – I did not expect such an explosion on my Twitter timeline. I cannot reply to everyone (I am sorting Legos with my son on my day off, quite an important and pressing matter), and replying tomorrow during my staff meeting would be a bit rude, so let me clarify a couple of things I have seen in the message centre of my iPad.

An apology to Daniel Lakens

First, I feel I have to briefly apologize to Daniel Lakens – in the first two paragraphs of my post I made a little fun of him. On my part in good jest, but I admit that I may have let some of my annoyance at Daniel’s occasionally moralizing tone shine through a bit too much. I’m sorry, Daniel, if I offended you – if anything, know that I deeply appreciate the good work you’re doing.

Can you please derive psi from GTR?

Not sure if this was directed at me, but I’ve seen this briefly in a tweet. Er, no, I cannot derive psi from the general theory of relativity. But neither can the Stroop effect be derived from GTR. So, it’s a silly question. If the question is “can you derive psi from known physics”, then it’s a different matter. Physical laws give the boundary conditions for normal biological functioning. Psi, according to many, cannot exist because these boundary conditions forbid it. I have argued that that is not necessarily true. That does not mean that psi exists, though – the existence of pink elephants and uranium-powered dragons is also not prohibited by physics, and their existence also remains unproven by many studies (my son and I prefer uranium-powered dragons over Russell’s Teapot, but essentially it’s the same argument).

By the way – I do suspect that the asker knows that GTR is indeed potentially problematic for psi. Most physics-based psi theories are based on the concept of quantum non-locality. The non-locality aspect of quantum theory is incompatible with GTR, and yet we know both are correct – it’s one of the great problems in physics. There is presently one theory that seems to integrate both successfully, based on the very speculative concept of spontaneous collapse of the wave function. If this theory is correct, it would rule out pretty much all non-local psi theories that assign a special role to consciousness.

There is no evidence for psi, can we please stop this non-sense?

True. There is no conclusive experimental evidence for the existence of psi. If such evidence existed, we would not be having this debate. However, does this mean psi does not exist? No, of course not. But, “Russell’s Teapot!”, I hear you think. Sure, it would be, if psi were confined to a (non-existent) lab-related phenomenon. However, paranormal experiences have been reported throughout history, by all cultures. Of course, the vast majority of these phenomena can nowadays be explained by normal psychological or biological processes (including fraud). Psi research, however, started with the aim to reliably recreate such phenomena in the lab. Which does not seem to work for many phenomena. Although that is of course not very promising, we have to acknowledge that Reality is not confined to our labs. An inability to recreate something in a lab does not mean it does not exist.

You are a@#$@@#$ psi-believer!

Well, not really. As stated above, I do not agree with people like Bem and Tressoldi that there is convincing evidence for psi. However, I also do not agree with people like Daniel Lakens and EJ Wagenmakers that psi research is nonsense. I do believe psi is a very worthwhile topic of study, if done properly, because a convincing demonstration of psi would be a breakthrough for consciousness research. Given that there is a continuous stream of experiments that do seem to show effects, and that I have been getting some odd (and replicable) results in my own lab, I am inclined to keep a close eye on this line of research and give it the benefit of the doubt. However, I do not expect others to jump on the bandwagon or make it priority research (yet).

What genuinely annoys me, though, is the patronizing, scoffing, ridiculing, and accusations of QRPs or outright fraud directed at psi researchers by self-proclaimed skeptics. There is a lot of chaff among the wheat, that’s absolutely true, but I would say it’s not really necessary to make fun of intelligent people who are genuinely trying to do serious research.

Any more?

Not for now – but if you’ve got comments/questions re: this topic, please do engage.


Why a meta-analysis of 90 studies does not tell that much about psi, or why academic papers should not be reduced to their data

Social psychologist-turned-statistics-and-publication-ethics crusader Daniel Lakens has recently published his review of a meta-analysis of 90 studies by Bem and colleagues that allegedly shows that there is strong evidence for precognition. Lakens rips apart the meta-analysis in his review, in particular because of the poor control for publication bias. According to Lakens, who recently converted to PET-PEESE as the best way to control for publication bias, there is a huge publication bias in the literature on psi, and if one, contrary to the meta-analysis’ authors, properly controls for that, the actual effect size is not different from zero. Moreover, Lakens suggests in his post that doing experiments without a theoretical framework is like running around like a headless chicken – every now and then you bump into something, but it’s not as if you were actually aiming.
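For readers unfamiliar with PET-PEESE: it is essentially a weighted meta-regression of effect sizes on their standard errors (PET) or sampling variances (PEESE), where the intercept estimates the effect for a hypothetical study with zero standard error, i.e. corrected for small-study bias. Below is a minimal sketch on simulated toy data; it omits the significance-based switch from PET to PEESE that the full procedure prescribes, so treat it as an illustration of the idea, not a reference implementation.

```python
import numpy as np

def pet_peese(d, se):
    """Simplified PET-PEESE sketch: return both bias-corrected intercepts.

    PET regresses effect sizes on their standard errors (WLS, weights
    1/SE^2); PEESE uses the variances (SE^2) as predictor. The intercept
    of each regression estimates the effect at SE = 0, i.e. free of
    small-study bias. The conventional PET-to-PEESE switch based on the
    significance of the PET intercept is omitted here.
    """
    d, se = np.asarray(d, float), np.asarray(se, float)
    w = 1.0 / se  # sqrt of the WLS weights 1/SE^2

    def wls_intercept(predictor):
        X = np.column_stack([np.ones_like(d), predictor])
        beta, *_ = np.linalg.lstsq(X * w[:, None], d * w, rcond=None)
        return beta[0]

    return wls_intercept(se), wls_intercept(se**2)

# Toy data with built-in small-study bias: smaller (higher-SE) studies
# report inflated effects, while the true effect is zero. Because the
# simulated bias is exactly linear in SE, PET should recover ~0.
rng = np.random.default_rng(1)
se = rng.uniform(0.05, 0.4, 40)
d = 1.5 * se + rng.normal(0, 0.02, 40)
pet, peese = pet_peese(d, se)
print(f"PET intercept: {pet:.3f}, PEESE intercept: {peese:.3f}")
```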

I cannot comment on Daniel’s statistical points. I have not spent the last two years brushing up my stats, as Daniel so thoroughly has, so I have to assume that he knows to some extent what he’s talking about. However, it may be worth noting that the notion of what an effect is, and how to determine its existence, has become somewhat fluid over the past five years. An important part of the debate we’re presently having in psychology is no longer about interpretations of what we have observed, but increasingly about the question of whether we have observed anything at all. Daniel’s critical review of Bem et al.’s meta-analysis is an excellent example.

However, I do think Daniel’s post shows something interesting about the role of theory and methods in meta-analyses as well, something that in my opinion stretches beyond the present topic. After reading Daniel’s post, and going through some of the original studies included in the meta-analysis, it struck me that something might be going wrong here. And with ‘here’ I mean reducing experiments to datasets and effect sizes. We all know that in order to truly appreciate an experiment and its outcomes, it does not suffice to look at the results section, or to have access to the data. You also need to carefully study the methods section to verify that the author has actually carried out the experiment in such a way that it measured what the author claims has been measured. And this is where many studies go wrong. I will illustrate this with an (in)famous example: Maier et al.’s 2014 paper ‘Feeling the Future Again’.

To give you some more background: Daniel claims that psi lacks a theoretical framework. This statement is incorrect. In fact, there are quite a few theoretical models that potentially explain psi effects. Most of these use or abuse concepts from (quantum) physics, and as a result many psychologists either do not understand the models, or do not bother to try to understand them, and simply draw the ‘quantum waffling’ card. Often this is the appropriate response, but it’s nothing more than a heuristic.

Maier et al. (2014) did not start running experiments like headless chickens hoping to find a weird effect. In fact, they quite carefully crafted a hypothesis about what can be expected from precognitive effects. Precognition is problematic from a physics point of view, not because it’s impossible (it isn’t), but because it creates the possibility for grandfather paradoxes. In typical precognition/presentiment experiments, an observer shows an anomalous response to an event that will take place in the near future, let’s say a chandelier falling down from the ceiling. However, if the observer is aware of his precognitive response, (s)he can act in order to prevent the future event (fixing new screws to the chandelier). However, now said event will not occur anymore, so how can it affect the past? Similarly, you cannot win roulette using your precognitive powers – any attempt to use a signal from the future to intentionally alter your behaviour leads to time paradoxes.

In order to avoid this loophole, Maier et al. suggest that precognition may only work unconsciously; that is, if there are precognitive effects, they may only work in a probabilistic way, and only affect unconsciously initiated behaviour. Very superficially, this line of reasoning resembles Deutsch’s closed timelike curves proposal for time-travel of quantum information, but that is beside the point here. The critical issue is that Maier et al. set up a series of experiments in which they manipulated consciousness of stimuli and actions that were believed to induce or be influenced by precognitive signals.

And that is where things go wrong in their paper.

Maier et al. used stimuli from the IAPS to evoke emotional responses. Basically, the methodology is this: participants had to press two buttons, left and right. Immediately after, two images would appear on the screen, one of which would have negative emotional content. The images were masked in order to prevent them from entering conscious awareness. The idea is that participants would respond slower when pressing the button on the same side as where the negative image would appear (i.e., they would precognitively avoid the negative image). However, since this would be a strictly unconscious effect, it would avoid time paradoxes (although one could argue about that one).

What Maier et al. failed to do, though, is properly check whether their masking manipulation worked. Properly masking stimuli is deceptively difficult, and reading through their method sections, I am actually very skeptical whether they could have been successful at all. The presentation time of the masked stimuli was one video frame, which would be necessary to properly mask the stimuli, but the presentation software used (E-Prime) is quite notorious for its timing errors, especially under Windows 7 or higher, with video cards low on VRAM. The authors, however, do not provide any details on what operating system or graphics board they used. To add insult to injury, they did not ask participants on a trial-by-trial basis whether the masked image was seen or not (and even that may not be the best way to check for awareness). Therefore, I have little faith that the authors actually succeeded in masking their emotional images in the lab. Their important, super-high-powered N=1221 study, which is often cited, was carried out online. It is very dubious whether masking was successful in this case at all.
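This kind of timing failure is at least detectable after the fact, provided the software logs actual screen-flip times. A minimal sketch of such a check follows; the timestamps are simulated, since in practice you would feed in whatever flip/refresh timestamps your presentation software reports:

```python
# Dropped-frame check for one-frame masking designs.
# The flip timestamps below are simulated for illustration; in a real
# experiment you would log the actual flip times during the session.

FRAME_MS = 1000 / 60          # nominal frame duration on a 60 Hz display
TOLERANCE = 0.5 * FRAME_MS    # flag intervals more than half a frame off

def dropped_frames(flip_times_ms):
    """Return indices of inter-flip intervals that missed the deadline."""
    bad = []
    for i in range(1, len(flip_times_ms)):
        interval = flip_times_ms[i] - flip_times_ms[i - 1]
        if abs(interval - FRAME_MS) > TOLERANCE:
            bad.append(i)
    return bad

# Simulated log: the fourth flip arrives a whole frame late, i.e. the
# stimulus stayed on screen for two frames -- long enough to break a mask.
flips = [0.0, 16.7, 33.3, 66.7, 83.3]
print(dropped_frames(flips))   # [3]
```

Trials flagged this way could then be excluded, or at least reported, rather than silently contaminating the ‘unconscious’ condition.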

If we follow the reasoning of Maier et al., conscious awareness of stimuli is important in getting precognitive effects (or not). Suppose that E-Prime’s timing messed up in 1 out of 4 trials, and the stimuli were visible – what does that mean for the results? Should these trials have been excluded? Can’t it be the case that such trials diluted the effect, so we end up with an underestimation? And, can’t it be that the inclusion of non-masked trials in the online experiment has affected the outcomes? Measuring ‘unconscious’ behaviour, as in blindsight-like behaviour, in normal individuals is extremely difficult and sensitive to general context – could this have played a role?
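The dilution worry can be put in numbers. Under the simplifying assumption that trials with failed masking contribute no (unconscious) effect at all, the observed effect is just the true effect scaled by the proportion of properly masked trials:

```python
# Back-of-the-envelope attenuation, assuming the hypothesized unconscious
# effect vanishes entirely on trials where masking failed (a simplifying
# assumption; the effect sizes here are made up for illustration).

def diluted_effect(true_effect, failure_rate):
    """Observed mean effect when a fraction of trials contributes nothing."""
    return true_effect * (1 - failure_rate)

# If masking failed on 1 in 4 trials, a true effect of d = 0.20 shrinks
# to an observed d of 0.15 -- a 25% underestimate.
print(f"{diluted_effect(0.20, 0.25):.2f}")
```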

In sum, if you do not carefully check your critical manipulations you’re left with a high-powered study that may or may not tell us something about precognition. However, it matters when you include it in your meta-analysis – a study with such a high N will appear very informative because of its (potential) power, but if the methodology is not sound, it is not informative at all.

On a separate note, Maier et al.’s study is not the only one where consciousness is sloppily manipulated – the average ‘social priming’ or ‘unconscious thinking’ study is far worse – make sure you read Tom Stafford’s excellent commentary on this matter!

So, how is this relevant to Bem’s meta-analysis? Quite simply put: which studies you put in matters. You cannot reduce an experiment to its data if you are not absolutely sure the experiment was carried out properly. And particularly with sensitive techniques like visual masking, or manipulations of consciousness, having some expertise matters. To some extent, Neuroskeptic’s Harry Potter Theory makes perfect sense – there are effects and manipulations that require specific expertise and technical knowledge to replicate (ironically, Neuroskeptic came up with HPT to state the opposite). In order to evaluate an experiment you need access not only to the data, but also to the methods used. Given that this information seems to be lacking, it is unclear what this meta-analysis actually tells us.

Now, one problem is that you will run into a whole series of ‘No True Scotsman’ arguments (“we should leave Maier et al.’s paper out of our discussions of psi, because they did not really measure psi”), but to some extent that is inevitable. The data of an experiment with clear methodological problems is worthless, even if it is preregistered, open, and high-powered. Open data is not necessarily good data, more data does not necessarily mean better data, and a replication of a bad experiment will not result in better data. The present focus in the ‘replication debate’ draws attention away from this – Tom Postmes referred to this as ‘data fetishism’ in a recent post, and he is right.

So how do we solve this problem? The answer is not just “more power”. The answer is “better methods”, and a better a priori theoretical justification of our experiments and analyses. What do we measure, how do we measure it, and why? Preferably, such a justification should be peer-reviewed, and ideally a paper should be accepted on the basis of such a proposal rather than on the basis of the results. Hmm, that sounds somewhat familiar…