we don’t need another p < 0.05

This is an Eval Central archive copy; find the original at camman-evaluation.com.

Photo by Jordan McDonald on Unsplash. I swear the picture is relevant.

If you listened to our recent episode of Eval Cafe with Michael Quinn Patton on principles-focused evaluation, you’ll remember him sharing his new favourite example of principles in action. It’s from the introductory article of a recent special issue of The American Statistician, which is all about moving beyond the use of p < 0.05 as the threshold for determining statistical significance. The article offers an impassioned explanation of why abandoning the entire concept of statistical significance is necessary and also outlines the beginnings of an alternative practice for valuing and interpreting statistical findings. The reason it showed up in the podcast is that the authors ground this new framework in principles, or flexible advice that can guide decisions and give direction, but must be adapted and interpreted in context. In comparison, p < 0.05 is a rule: it is applied the same way regardless of any contextual factors. (Check out the podcast and also Michael’s book, Principles-Focused Evaluation, to learn more about the implications of principles for evaluative work.) Specifically, the principles that the authors offer are, “Accept uncertainty. Be thoughtful, open, and modest” (or “ATOM”, as a mnemonic), and the remainder of the issue (43 articles’ worth!) goes on to offer more depth around the issues of p < 0.05 and the discussion of alternatives.

For an academic publication about statistics, it is, frankly, stirring. Read this:

“At times in this editorial and the papers you’ll hear deep dissonance, the echoes of ‘statistics wars’ still simmering today (Mayo 2018). At other times you’ll hear melodies wrapping in a rich counterpoint that may herald an increasingly harmonious new era of statistics. To us, these are all the sounds of statistical inference in the 21st century, the sounds of a world learning to venture beyond ‘p < 0.05.’ This is a world where researchers are free to treat ‘p = 0.051’ and ‘p = 0.049’ as not being categorically different, where authors no longer find themselves constrained to selectively publish their results based on a single magic number. … As we venture down this path, we will begin to see fewer false alarms, fewer overlooked discoveries, and the development of more customized statistical strategies. Researchers will be free to communicate all their findings in all their glorious uncertainty, knowing their work is to be judged by the quality and effective communication of their science, and not by their p-values.”

(Has a paper on inferential statistical testing ever brought tears to your eyes? This brought tears to mine. The freedom in it, the vision of it, the clarion call to a remembered sacred purpose of meaningful scientific discovery—it’s poetry. And, honestly, the whole article is a delight to read and it’s open access. Treat yourself!)

So what’s wrong with p < 0.05? Why devote an entire issue of a journal to explaining why it should be done away with?

The problem is that it’s a seductively simple idea that was never equipped to be used the way that it has been. We took something that might have been an okay guideline for thinking about whether to explore a statistical relationship further and turned it into something so hard-and-fast that careers are made and broken on it, that people are incentivized to do bad science because of it (whether that’s bad actors manipulating results or more subtly the cumulative, unintentional harm of something like the “file drawer problem”). We took p < 0.05 and applied it thoughtlessly, rigidly, imperiously, and with total disregard for context. To the point that the statisticians tell us that the solution to p < 0.05 is not to tweak it, whether by switching to “p < 0.10” or confidence intervals or by coming up with a more complicated system of rules that lets us keep doing essentially the same thing as we always have but “better” this time. Instead they ask us to recognize that the entire concept of a fixed threshold for statistical inference is flawed and we must shift to a way of thinking that incentivizes nuance, humility, and care. It is a call to transformation, to a world beyond p < 0.05.

So why am I bringing this up, since this post isn’t actually about statistics? (Surprise!) Bear with me: we’re about to go on a bit of a journey.

I bring up p < 0.05 and its critiques because I realized that to me it speaks to the same issues I have with sex and gender binaries*. We have taken what is a fluid, complex interplay based around complementary elements that are meant to be more like tent poles, lifting up a fabric of possibility that drapes around and between them, and we have stripped them of nuance and severed what connects them. We have sacrificed depth and breadth in our understanding and experience of sex and gender for convenience, control, and predictability, leaving ourselves with two bare stakes in the ground. Because just like with p < 0.05, there’s something seductive about the notion of a fundamental, highly predictive, nearly-universal binary division of sex and gender, something deeply appealing about the idea that this dichotomy is rooted in biology and manifested at all individual and sociocultural levels. We can see the evidence of the attraction in how often it shapes the most basic of our everyday activities: going to the washroom, putting on clothes, talking about other people, filling out forms, etc. All of which feeds back into the perception that these are innate, meaningful, nigh-universal differences, which is why it is so powerful to take note of where the contrasts, contradictions, and variations arise, in language, biology, psychology, culture, and elsewhere. And how these variations are not statistical noise or mere outliers but vital parts of the whole picture.

Because I’m not saying there is no pattern to sex and gender. I’m saying that the pattern has been reduced, over-simplified, and blown completely out of proportion, with individual components being mistaken for the entire phenomenon, like a painting being the sum of the pots of paints used to produce it. Now imagine a world in which every piece of art was classified based on its relative proportion of red and blue hues just because those are two primary colours. Whole galleries divided into “red wings” and “blue wings”. Vast swathes of art from across the colour spectrum lumped together without regard to the rest of their palettes (and never mind all the other defining characteristics one could consider about them). New technology devised for the sole purpose of pinpointing with stunning accuracy the exact amount of red and blue present in a given piece. The stubbornly unclassifiable relegated to storage because it’s only a small proportion anyway and it just upsets what is otherwise such an elegant, simple binary scheme. Because establishing clear, bright lines around sex and gender categories does make life easier in many ways. But convenience comes with a cost (think about Amazon and Uber), usually a profound human cost that is disproportionately distributed along lines of power and influence although we each pay our own price for it.

Rachel Pollack, a science fiction and comic book author and trans woman, captures the heart of the struggle in her recent essay, “Trans Central Station”, where she shares her experience of coming out as a trans woman in the 1970s (a time before the language of “transition”, “transgender”, and “trans woman” even existed in English):

“What I felt, what I desired was unspeakable because, for me, at least, the words did not exist. Or rather, the telling did not exist. … The mind could not form the thought. I did not wish to tell people and didn’t dare. I simply could not imagine doing it. … I was not trapped in the wrong body, I was trapped in the wrong universe. In order to become who I was, I had to break the world open. I had to embrace a kind of science fiction life. … The physical world may be made out of elementary particles (and dark matter) but the world of our lives is made out of language. With the wrong language, one of strict categories and confinement, the world becomes a fake, a stage set whose actors don’t know they are in a play … Most people do not notice this because their own sense of self, of language, more or less fits the received version of existence. They still suffer, for in a world of strict and very limited categories, they must constantly check themselves against the model of a ‘real man’ or a ‘real woman’. The ones who reveal the fake are the ones who simply cannot make themselves fit. To not fit can bring great pain and often very real danger, yet who else can discover the light behind the screen?”

As I shared when I wrote about my pronouns, the words to describe my gender (and my sex and the relationship between them and my relationship with them) don’t exist in my language. I borrow words like “queer” and “trans” and “nonbinary”, because they come as close as they can, but they are a finger pointing at the moon, not the moon itself. I suppose all of language is a finger pointing at the moon, but there are varying degrees of remembering and forgetting that the finger is not the moon. With p < 0.05, we forgot the moon existed at all, and I think we do the same with “male” and “female”, “man” and “woman”, the lonely tent poles that they are. Because I don’t think my gender is any more mysterious or complex than anyone else’s or any less coherent, save for the misfortune of being born in the wrong language, one that has the idea that these things are governed by rules, clear and delineated and precise. Sex and gender are messy, multifaceted, ill-defined, and exhilarating. They’re never going to boil down to simple rules so we should start thinking about what kinds of principles might help us here. We need to embrace the ambiguity rather than keep trying to erase it, because it belongs to all of us, not just those of us who simply cannot make ourselves fit.

I don’t have 43 articles to share of deep exploration into this issue and possible alternatives. But even if I did, our friends at The American Statistician still had to warn their readers, “What you will NOT find in this issue is one solution that majestically replaces the outsized role that statistical significance has come to play. The statistical community has not yet converged on a simple paradigm for the use of statistical inference in scientific research—and in fact it may never do so. A one-size-fits-all approach to statistical inference is an inappropriate expectation, even after the dust settles from our current remodeling of statistical practice (Tong 2019).” A one-(or-two)-size-fits-all approach to sex and gender is unlikely to be forthcoming either. I also don’t have principles to offer for a better understanding of sex and gender right now, though, “Accept uncertainty. Be thoughtful, open, and modest”, are pretty good ones to start with.

We’re going to keep wrestling with these issues, standing in the world of now while we look to a world beyond. (And that world is coming—you can watch it happen in the flux and growth of the language and the ideas that are becoming available to us.) One thing we need to do while we experience this uncertainty is to not try to negate it or reduce it or look for a newer, better p < 0.05 that will let us carry on as usual. It’s not about tacking on an extra category or two. It’s about letting go of the seductive simplicity altogether, and finding a way forward that allows for nuance and wholeness.

That’s what I find so inspiring about the p < 0.05 article. It’s a beautiful exploration of how (and why) to move from disastrous over-simplification to an adaptive embrace of complexity (and within an institution that is itself complex, with a life and momentum of its own that resists change out of an impulse for self-preservation which is understandable even as we recognize that to resist change is also an act of self-destruction through obsolescence**). It may contain no absolute answers (that would be too simple), but it has an abundance of hope, compassion, and courage, as well as a frank reckoning of the systemic, institutional challenges to such a profound shift. I want to see the same combination of depth, hope, and strategy brought to our conversations around sex and gender as well.

The transformation is out there. What starts as science fiction can become the art that life imitates.

Happy Pride, y’all.

*I’m referring to “sex and gender” throughout because while they can be thought of as different things, they are BOTH complex and BOTH socially-constructed. It’s not a case of “sex is simple, gender is complex”. If you want to know more on that, check out the readings I reference at the end of my pronoun post.

**I think I’m going to start summarizing this paradox as “change and/or die”.

BONUS:

Podcast episode recommendation! Check out the Indigiqueer episode of the All My Relations podcast for a look into gender and sexuality from Indigenous viewpoints.