
Oneweb's List: Problems with Research

    • Johannessen’s experience, and the project’s as a whole, illustrate that the crisis of “reproducibility” has a double meaning. In one sense, it’s a problem of results: Can a given finding be repeated in another lab? Does the finding tell us something true about the world? In another sense, though, it’s a problem of methodology: Can a given experiment even be repeated in another lab, whatever its results might be? If there’s no way to reproduce experiments, there’s no way to know if we can reproduce results.
    • Functional MRI (fMRI) is 25 years old, yet surprisingly its most common statistical methods have not been validated using real data. Here, we used resting-state fMRI data from 499 healthy controls to conduct 3 million task group analyses. Using this null data with different experimental designs, we estimate the incidence of significant results. In theory, we should find 5% false positives (for a significance threshold of 5%), but instead we found that the most common software packages for fMRI analysis (SPM, FSL, AFNI) can result in false-positive rates of up to 70%. These results question the validity of some 40,000 fMRI studies and may have a large impact on the interpretation of neuroimaging results.

       

      I’m not a big fan of the whole false-positive, false-negative thing. In this particular case it makes sense because they’re actually working with null data, but ultimately what you’ll want to know is what’s happening to the estimates in the more realistic case that there are nonzero differences amidst the noise. The general message is clear, though: don’t trust FMRI p-values. And let me also point out that this is yet another case of a classical (non-Bayesian) method that is fatally assumption-based.

       

      Perhaps what’s the most disturbing thing about this study is how unsurprising it all is. In one sense, it’s big big news: FMRI is a big part of science nowadays, and if it’s all being done wrong, that’s a problem. But, from another perspective, it’s no surprise at all: we’ve been hearing about “voodoo correlations” in FMRI for nearly a decade now, and I didn’t get much sense that the practitioners of this sort of study were doing much of anything to clean up their act. I pretty much don’t believe FMRI studies on the first try, any more than I believe “gay gene” studies or various other headline-of-the-week auto-science results.

    • Those guys at Harvard (but not in the statistics department!) will say, “the replication rate in psychology is quite high—indeed, it is statistically indistinguishable from 100%.” But they’re innumerate, and they’re wrong. Time for us to move on, time for the scientists to do more science and for the careerists to find new ways to play the game.
    • “A lot of what is published is incorrect.”
    • the idea that something has gone fundamentally wrong with one of our greatest human creations.

    1 more annotation...
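
      A toy sketch of the null-data logic described in the excerpts above: repeatedly split the same pool of "healthy controls" into two random groups, test for a difference that by construction does not exist, and count how often the test comes back significant. A well-calibrated procedure should reject about 5% of the time at a 5% threshold; the paper's finding is that standard parametric fMRI cluster inference drifts far above that line. The code below is illustrative only and is not the paper's pipeline.

          # Toy calibration check: empirical false-positive rate on null data.
          # Illustrative only -- not the Eklund et al. fMRI analysis pipeline.
          import numpy as np
          from scipy import stats

          rng = np.random.default_rng(0)
          pool = rng.normal(size=499)      # stand-in for 499 healthy controls (one value each)
          n_analyses = 10_000              # the paper ran about 3 million group analyses
          alpha = 0.05

          false_positives = 0
          for _ in range(n_analyses):
              idx = rng.permutation(pool.size)
              group_a, group_b = pool[idx[:20]], pool[idx[20:40]]   # 20 vs 20, same population
              _, p = stats.ttest_ind(group_a, group_b)
              false_positives += p < alpha

          print(f"empirical false-positive rate: {false_positives / n_analyses:.3f}")
          # Prints roughly 0.05: a plain t-test on truly null scalar data is well calibrated.
          # The trouble in fMRI comes from spatially smooth voxel data combined with
          # parametric cluster-level inference whose assumptions do not hold.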

    • But a study published last month in the Proceedings of the National Academy of Sciences uncovered flaws in the software researchers rely on to analyze fM.R.I. data. The glitch can cause false positives — suggesting brain activity where there is none — up to 70 percent of the time.
    • The authors of the paper on the software glitch found that a vast majority of published papers in the field do not make this “multiple comparison” correction. But when they do, they said, the most widely used fM.R.I. data analysis software often doesn’t do it adequately.

    2 more annotations...
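
      The "multiple comparison" correction referred to above addresses a separate source of inflation: a whole-brain analysis tests tens of thousands of voxels at once, so even a perfectly calibrated 5% test will flag thousands of voxels by chance unless the threshold is tightened. A minimal sketch of the idea on simulated null data, using a plain Bonferroni correction (illustrative; real packages typically rely on cluster-extent or false-discovery-rate corrections, which are the procedures the PNAS paper examined):

          # Why uncorrected voxelwise thresholds guarantee false positives (illustrative).
          import numpy as np
          from scipy import stats

          rng = np.random.default_rng(1)
          n_voxels = 50_000                  # order of magnitude of a whole-brain analysis
          alpha = 0.05

          # Null data: 20 vs 20 subjects at every voxel, no true effect anywhere.
          a = rng.normal(size=(20, n_voxels))
          b = rng.normal(size=(20, n_voxels))
          _, p = stats.ttest_ind(a, b, axis=0)

          print("uncorrected 'active' voxels:", int((p < alpha).sum()))            # about 2,500
          print("Bonferroni 'active' voxels:", int((p < alpha / n_voxels).sum()))  # almost always 0
          # Bonferroni controls the family-wise error rate but is very conservative,
          # which is why fMRI software favours cluster-based or FDR corrections --
          # and those are what the paper found to be badly miscalibrated.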

    • Dr Eklund then applied the standard analysis methods and compared 20 healthy people with 20 other healthy people. In other words, there should not have been any differences -- or, at most, only the five percent that chance alone provides. In total he made three million comparisons of randomly selected groups, using data from 499 healthy persons.

       

      "The differences were considerably greater than five percent, up to 60 percent in the worst case," Dr Eklund says.

       

      This means that the analyses could have shown positive results where there shouldn't have been any, thereby indicating brain activity where there was no activity.

       

      He also analysed the same data set with his more calculation-heavy method and obtained a considerably better correspondence, with differences in the expected five percent of cases.
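
      The "more calculation-heavy method" referred to above is non-parametric permutation testing: instead of assuming a particular null distribution for the group statistic, the group labels are shuffled many times and the observed statistic is compared against the distribution the shuffling produces. A bare-bones two-sample version is sketched below; it illustrates the general technique, not the paper's fMRI implementation.

          # Bare-bones two-sample permutation test (sketch of the technique only).
          import numpy as np

          def permutation_test(x, y, n_perm=10_000, rng=None):
              """P-value for a difference in means, using label shuffling as the null."""
              rng = rng or np.random.default_rng()
              observed = abs(x.mean() - y.mean())
              pooled = np.concatenate([x, y])
              count = 0
              for _ in range(n_perm):
                  rng.shuffle(pooled)
                  diff = abs(pooled[:x.size].mean() - pooled[x.size:].mean())
                  count += diff >= observed
              return (count + 1) / (n_perm + 1)   # add-one correction keeps p above zero

          # Null example: two groups of 20 drawn from the same distribution.
          rng = np.random.default_rng(2)
          x, y = rng.normal(size=20), rng.normal(size=20)
          print(f"p = {permutation_test(x, y, rng=rng):.3f}")
          # Repeated across many null datasets, p < 0.05 turns up close to 5% of the
          # time, which is the calibration described in the paragraph above.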

    • Irreproducible history

      In 2012, Amgen researchers made headlines when they declared that they had been unable to reproduce the findings in 47 of 53 'landmark' cancer papers [1]. Those papers were never identified.

    • One study adds to existing criticism of a Science paper that suggested that a cancer drug might be a potential treatment for Alzheimer’s disease [2]; a second counters earlier findings (including some by Amgen researchers) connecting a gene to insulin sensitivity in mice [3, 4]; and a third counters a Nature paper reporting that inhibiting one particular protein could enhance degradation of other proteins associated with neurodegenerative diseases [5].

    2 more annotations...

    • I'm aware that there's a lot of debate going on around the PACE trial, and, as you will gather, my advice would be to make the data available.
      However, as I note in my blogpost, open data is still a long way from being the norm, and refusal to deposit data seems to be more common than not (see the Ascoli paper I cited). In addition, there was a time when every study that I got ethics approval for had to tell the participants explicitly that their data would *not* be publicly available and would *only* be seen by members of the research team. This was done to protect patients and because of concerns about confidentiality.
      Things are now changing, not least because some patient groups have objected to over-restrictive ethics statements that preclude re-use of data, so studies that are starting out now are more likely to include consent forms and information sheets that make it clear that data-sharing will occur. In such cases, it is of course vital that adequate anonymisation is carried out and that the data does not contain information such as postcodes or dates of birth that could link back to individuals. Even particular symptom patterns could identify someone in some cases. None of these are insuperable problems if data collection and ethics procedures plan for them in advance. But they are issues that affect older studies and can be hard to deal with retrospectively.
      Anyhow, having said all that, I hope it does become possible to fully anonymise and release the PACE data; given the high tension around this study, people are going to think there is some kind of cover-up if the data are withheld.
    • “I believe we are in the steroids era of social science,” he says.
    • Crowdsourcing research can reveal how conclusions are contingent on analytical choices
    • Under the current system, strong storylines win out over messy results. Worse, once a finding has been published in a journal, it becomes difficult to challenge. Ideas become entrenched too quickly, and uprooting them is more disruptive than it ought to be. The crowdsourcing approach gives space to dissenting opinions.
    • The stem-cell field holds enormous promise for therapy. As a result, all claims of considerable importance should be verified with utmost care before being made public. The Review suggests that such claims in the field of reprogramming and pluripotency should be demonstrated in more than one experimental model, and encourages their independent replication.

      Nature will endeavour to help the field to achieve its promise, and is looking at ways to support and encourage this reproducibility enterprise. For example, we ask authors to include more details about the methods developed in their studies. We strongly encourage our authors to deposit step-by-step protocols on freely accessible platforms, such as Protocol Exchange (www.nature.com/protocolexchange) — this may be requested for extraordinary claims, at the editor’s discretion. We encourage our authors to verify the origin of the cell lines they use, as we do for cancer cell lines (see Nature 520, 264; 2015).

      The Review concludes: “Science is ultimately a self-correcting process where the scientific community plays a crucial and collective role.” In this case, the stem-cell community has excelled in that role and should be congratulated.

    • Tie funding to verified good institutional practice, and robust science will shoot up the agenda, say C. Glenn Begley, Alastair M. Buchan and Ulrich Dirnagl.
    • Irreproducible research poses an enormous burden: it delays treatments, wastes patients' and scientists' time, and squanders billions of research dollars. It is also widespread.

    13 more annotations...

    • A large portion of replications produced weaker evidence for the original findings despite using materials provided by the original authors, review in advance for methodological fidelity, and high statistical power to detect the original effect sizes.
    • The 39% figure derives from the team's subjective assessments of success or failure (see graphic, 'Reliability test'). Another method assessed whether a statistically significant effect could be found, and produced an even bleaker result. Whereas 97% of the original studies found a significant effect, only 36% of replication studies found significant results
    • The team also found that the average size of the effects found in the replicated studies was only half that reported in the original studies.

    3 more annotations...
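
      The halving of effect sizes noted above goes a long way toward explaining the drop from 97% to 36% significant results: replications were powered to detect the originally published effects, and power falls off sharply when the true effect is only half as large. A rough illustration by simulation follows; the effect sizes and sample size are invented for the example, not taken from the project.

          # How halving an effect size cuts the chance of a 'significant' replication.
          # Effect sizes and sample size below are illustrative only.
          import numpy as np
          from scipy import stats

          rng = np.random.default_rng(3)
          n_per_group = 50        # hypothetical replication sample size
          alpha = 0.05
          n_sim = 5_000

          def power(effect_size):
              hits = 0
              for _ in range(n_sim):
                  a = rng.normal(loc=effect_size, size=n_per_group)
                  b = rng.normal(loc=0.0, size=n_per_group)
                  _, p = stats.ttest_ind(a, b)
                  hits += p < alpha
              return hits / n_sim

          print("power at a published effect size (d = 0.6):", power(0.6))   # roughly 0.85
          print("power at half that effect size   (d = 0.3):", power(0.3))   # roughly 0.3
          # A study designed for ~80% power at the published effect size can drop to a
          # coin flip or worse when the real effect is half as big.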

    • But these disparate results don’t mean that studies can’t inch us toward truth. “On the one hand, our study shows that results are heavily reliant on analytic choices,” Uhlmann told me. “On the other hand, it also suggests there’s a there there. It’s hard to look at that data and say there’s no bias against dark-skinned players.” Similarly, most of the permutations you could test in the study of politics and the economy produced, at best, only weak effects, which suggests that if there’s a relationship between the number of Democrats or Republicans in office and the economy, it’s not a strong one.
    • He and some colleagues looked back at papers their journal had already published. “We had to go back about 17 papers before we found one without an error,” he told me. His journal isn’t alone — similar problems have turned up, he said, in anesthesia, pain, pediatrics and numerous other types of journals.

    1 more annotation...
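
      The point above about results being "heavily reliant on analytic choices" can be made concrete with a small specification sweep: run every defensible combination of analytic decisions on the same dataset and look at the spread of estimates rather than at any single p-value. The toy version below uses entirely made-up data and choices; it sketches the crowdsourcing idea, not the actual referee-bias analyses.

          # Toy specification sweep: one dataset, many defensible analyses (made-up data).
          import itertools
          import numpy as np
          from scipy import stats

          rng = np.random.default_rng(4)
          n = 200
          exposure = rng.normal(size=n)
          covariate = rng.normal(size=n)
          outcome = 0.1 * exposure + 0.5 * covariate + rng.normal(size=n)  # weak true effect

          def estimate(adjust_covariate, trim_outliers, log_outcome):
              x, z, y = exposure.copy(), covariate.copy(), outcome.copy()
              if trim_outliers:                      # choice 1: drop extreme outcomes
                  keep = np.abs(y - y.mean()) < 2 * y.std()
                  x, z, y = x[keep], z[keep], y[keep]
              if log_outcome:                        # choice 2: transform the outcome
                  y = np.log1p(y - y.min())
              if adjust_covariate:                   # choice 3: residualize out the covariate
                  y = y - np.polyval(np.polyfit(z, y, 1), z)
              slope, _, _, p, _ = stats.linregress(x, y)
              return slope, p

          for choices in itertools.product([False, True], repeat=3):
              slope, p = estimate(*choices)
              print(choices, f"slope={slope:+.3f}  p={p:.3f}")
          # The sign of the estimate tends to be stable (the "there there"), but whether
          # any single specification clears p < 0.05 swings with choices that are all
          # individually defensible.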
