Women in Mathematics Day 2021

On May 12, it’ll be International Women in Maths Day. This is a joyful opportunity for the mathematical community to celebrate women in mathematics. The goal of the day is to inspire women everywhere to celebrate their achievements in mathematics, and to encourage an open, welcoming and inclusive work environment for everybody. The celebration takes place every year, all around the world. The first Women in Mathematics Day was held in 2019. But why May 12? Because that is the birthday of Maryam Mirzakhani (1977 – 2017). In 2014, Maryam Mirzakhani was awarded the Fields Medal for her outstanding contributions to the dynamics and geometry of Riemann surfaces and their moduli spaces, becoming the first woman to be recognised for her mathematical achievements by this top mathematical prize.

In 2019 the Australian Research Council Centre of Excellence for Mathematical and Statistical Frontiers (ACEMS) created posters that celebrate women in mathematics and statistics, mainly in South Australia but with representatives from across the world.

In 2020 we were all in lockdown due to COVID-19 but the day did not go by unmarked! Have a look at ACEMS Women in Maths to find videos from prominent women in mathematics (and statistics!) including SCU consultant Marijke Welvaert and SCU Director Alice Richardson. In 2021 the main event from ACEMS is a virtual panel discussion on the day itself, 12 May.

It doesn’t escape my attention that the same day will this year mark the 201st birthday of Florence Nightingale. You can read the Conversation piece I co-wrote for her 200th birthday last year, which focused on the healing power of data. Florence was a prodigious writer, which possibly makes her something of a role model for HDR students struggling to put pen to paper (or fingers to keyboard!). Indeed, I think Florence would do very well with the text-based communication of the 21st century, and I imagine her with smartphone in hand like this.

Her vast correspondence has provided us with a large number of quotable quotes for many situations. My favourite is the comment she made about her time in the Crimea (1854 – 1856), and the data she drew together from that experience to inform the reform of the British health and military systems: “However exhausted I might be, the sight of long columns of numbers was perfectly reviving to me.”

I think many statistical consultants would feel the same way. The sight of a long column of numbers means data is available to address a research question, and it’s the intersection between questions, data and methods where advice from a statistical consultant can make the biggest difference to a research project.

#May12 #WomeninMaths #May12WIM

Associate Professor Alice Richardson is Director of the Statistical Consulting Unit (SCU) at the Australian National University. Her research interests are in linear models and robust statistics; statistical properties of data mining methods; and innovation in statistics education. In her role at the SCU she applies statistical methods to large and small data sets, especially for research questions in population health and the biomedical sciences.

2020 in statistics

2020 has been an extraordinary year in every sense of the word. Many of you got a tough lesson in resilience, with your research plans interrupted or even obliterated by extreme weather or the pandemic. Some may have used this forced pause to upskill, and the enrolment numbers in our online courses encouragingly suggest that statistics was one of the areas researchers chose to focus on.

At the end of the year it is natural to reflect and review. So in that spirit, let’s have a look at some statistics in 2020.

The year social media got mixed in with world power

Even when trying to stay clear of politics, it is hard to deny that social media have played a major role in the latest US presidential election. When you look at the statistics it is not too hard to understand why. At the start of the year, more than 3.8 billion people used social media. That is more than half of the world’s population.

Interestingly, in the US this proportion is much higher, with 79% of Americans having a social media profile. Whilst social media is still most popular with younger people, it is plain to see that if you want to disseminate your message to as many people as possible, social media is the way to go. It is therefore not surprising that social media have moved away from connecting friends and family towards being advertising platforms for brands. In 2019 alone, the marketing spend on social media advertising exceeded 89 billion USD.

Social media is also on the rise within academia. With the introduction of altmetrics as a possible performance indicator, social media can boost your researcher profile like never before.

The year misuse of statistics got called out

As researchers, we are all committed to responsible research practice. This includes appropriate use of statistics. It is therefore noteworthy that p-hacking and optional stopping (with the aim of obtaining a statistically significant result) have been judged to have violated the code of conduct for research integrity in the Netherlands.

There is a bit of history to this and for a full exposé, please refer to this blog post by Daniel Lakens. In brief, when the fraudulent research practices of social psychologist Diederik Stapel came out, the research culture was referred to as “sloppy science”. But at the time nobody went as far as calling it a violation of the research integrity code of conduct.

However, in July 2020 the Dutch National Body for Scientific Integrity did judge a Dutch researcher from Leiden University to have violated the code by p-hacking and optional stopping. Norms change and what was once a misdemeanour is now a violation of the code.

P-hacking is only one of a range of data misuse techniques known as data dredging. The main aim is to obtain significant results by dramatically increasing the number of tests performed on a given dataset and reporting only the significant ones. Similarly, optional stopping refers to stopping data collection prematurely as soon as an interim statistical test returns a significant result (so no further data collection is required). While the term went mainstream in 2019, the practice is still present in many areas of research, and it is only through education and promotion of good statistical practice that it will disappear.
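To see why optional stopping is a problem, a small simulation can help. The sketch below is only an illustration under stated assumptions: every simulated study is drawn from a population with no true effect, the test is a simple two-sided z-test with a known standard deviation of 1, and the sample sizes, the number of studies and the function names are all made up for the example. It compares testing once at the planned sample size with testing after every 10 observations and stopping at the first significant result.

```python
import math
import random


def z_test_p_value(sample):
    """Two-sided p-value for H0: mean = 0, assuming a known standard deviation of 1."""
    z = sum(sample) / math.sqrt(len(sample))
    return math.erfc(abs(z) / math.sqrt(2))


def false_positive_rate(n_studies, max_n, peek_every, rng):
    """Share of simulated null studies that end up reporting p < 0.05.

    peek_every == max_n means a single test at the planned sample size;
    a smaller value means testing repeatedly and stopping at the first "hit".
    """
    hits = 0
    for _ in range(n_studies):
        sample = []
        while len(sample) < max_n:
            # Collect the next batch of observations (there is no true effect).
            sample.extend(rng.gauss(0.0, 1.0) for _ in range(peek_every))
            if z_test_p_value(sample) < 0.05:
                hits += 1
                break  # optional stopping: data collection ends here
    return hits / n_studies


rng = random.Random(2020)  # fixed seed so the run is repeatable
fixed_rate = false_positive_rate(2000, max_n=100, peek_every=100, rng=rng)
peeking_rate = false_positive_rate(2000, max_n=100, peek_every=10, rng=rng)
print(f"single planned test:        {fixed_rate:.3f}")
print(f"peek every 10 observations: {peeking_rate:.3f}")
```

With a single planned test the share of false positives should sit close to the nominal 5%, whereas repeated peeking pushes it well above that, even though nothing is going on in the data.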


The SCU saw many clients during 2020, be it in person or virtually. Our mission continues in 2021 and we remain your port of call for assistance with your data analysis plan.

Translating your data for understanding #AcWriMo

November is Academic Writing Month at ANU. There is no better time to put the process of translating data into words in the spotlight.

“The secret language of statistics, so appealing in a fact minded culture, is employed to sensationalise, inflate, confuse, and oversimplify. Statistical methods and statistical terms are necessary in reporting the mass data of social and economic trends, business conditions, “opinion” polls, the census. But without writers who use the words with honesty and understanding and readers who know what they mean, the result can only be semantic nonsense.”

From: How to Lie with Statistics – Darrell Huff (Chapter 1, p. 10)

This rather cynical quote points to the responsibility of any academic author, whether of a thesis or a journal paper, to write with the reader’s understanding front of mind. At the same time, they should never lose sight of the objectivity and precision required to uphold the integrity of the scientific process.

More specifically, when writing the statistical methods and results sections of an academic piece, a few of the goals that we aim for are: (1) reproducibility, (2) objectivity and clarity, and (3) preempting misunderstanding.

Reproducibility

In the context of academic writing, reproducibility refers to the ability of an independent reviewer/researcher to replicate the reported results based on the information provided in the paper.

Writing reproducible methods starts with conducting reproducible analyses. The National Academy of Sciences has published several guides to aid this process but a good starting point would be to document every single step of the analysis process. For science researchers, this would be akin to maintaining a lab book. In essence, it is a record of every data manipulation and calculation that was performed before obtaining the end result.

Statistical software can be of great help here. By avoiding copy/paste or point-and-click processes and instead using the syntax or coding functions of the software, you have an immediate record of all the steps performed during the analysis. Ideally, this code will include every manipulation and estimation that was performed on the raw data.
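As a rough illustration of what such a record can look like, here is a minimal sketch in Python. The data values, the plausible range of 0 to 30 and the function names are all hypothetical; the point is only that every step from raw data to reported numbers lives in code that can be rerun and inspected.

```python
import statistics

# Hypothetical raw measurements; in practice these would be read from the raw data file.
RAW_DATA = [12.1, 11.8, None, 13.4, 12.9, 55.0, 12.3]


def drop_missing(values):
    """Step 1: remove missing observations."""
    return [v for v in values if v is not None]


def drop_out_of_range(values, lower=0.0, upper=30.0):
    """Step 2: remove values outside the (assumed) plausible measurement range."""
    return [v for v in values if lower <= v <= upper]


def summarise(values):
    """Step 3: compute the statistics that end up in the paper."""
    return {
        "n": len(values),
        "mean": round(statistics.mean(values), 2),
        "sd": round(statistics.stdev(values), 2),
    }


# The full analysis, from raw data to reported numbers, in one traceable chain.
cleaned = drop_out_of_range(drop_missing(RAW_DATA))
summary = summarise(cleaned)
print(summary)
```

Because each manipulation is a named step, the methods section can describe the exclusions and calculations exactly as they were performed, and anyone with the raw data can rerun them.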

Source: xkcd

Objectivity and clarity

“The real purpose of the scientific method is to make sure nature hasn’t misled you into thinking you know something you actually don’t know.”

Robert Pirsig, Zen and the Art of Motorcycle Maintenance

When reporting statistics, your field of research or intended journal will typically have guidelines on exactly what needs to be reported. As a general rule, you as a writer need to report all the numbers a reader requires to form a full picture and draw their own conclusions.

For example, when reporting p-values you could simply state whether p < 0.05 or p > 0.05, indicating that you have evaluated the actual number against an arbitrary threshold. However, you then rob the reader of the more precise information in the exact p-value.

Of course, p-values often have too many decimals, which would hamper the flow of the text. A good practice is to always report exact p-values to 3 decimals and, when a p-value is too small for that, to state p < 0.001. This way you preserve the flow of your writing whilst maintaining its objectivity and clarity.

As a side note, it would be even better practice to accompany your p-values with the test statistics from which they were derived, as well as confidence intervals and/or effect sizes where appropriate.
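If you produce your results tables with code, this reporting rule is easy to automate. The helper below is a minimal sketch; the function name format_p_value is made up for this example and your journal’s style guide may ask for a different format.

```python
def format_p_value(p):
    """Format a p-value for reporting: three decimals, or 'p < 0.001' when smaller."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("a p-value must lie between 0 and 1")
    if p < 0.001:
        return "p < 0.001"
    return f"p = {p:.3f}"


for value in (0.0432, 0.2, 0.00004):
    print(format_p_value(value))
```

Applying the same function to every p-value in a paper keeps the reporting consistent and avoids hand-typed rounding errors.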

Preempt misunderstanding

“… When we reason informally – call it intuition, if you like – we use rules of thumb which simplify problems for the sake of efficiency. Many of these shortcuts have been well characterised in a field called ‘heuristics’, and they are efficient ways of knowing in many circumstances.

This convenience comes at a cost – false beliefs – because there are systematic vulnerabilities in these truth-checking strategies which can be exploited. …”

From: Bad Science – Ben Goldacre (Chapter 13: Why clever people believe stupid things)

Every one of us comes with our own set of beliefs. As a writer it is important to be conscious of and open about your own set of beliefs. At the same time, you will need to understand your readers’ beliefs, as conflicting belief systems may lead to incompatible interpretations.

When your writing is reproducible and objective, you have already taken important steps towards avoiding your research being misunderstood as the reader has all the necessary tools to draw their own conclusions. When your conclusions are supported by the data and your statistical analyses are sound, your readers will inevitably concur.

Avoiding misunderstanding also relates to the description of the statistical methods you used. All too often we see papers in which well-known statistical analysis techniques are referred to by their software name, or more obscure statistical analyses are not well referenced or justified. Readers will have confidence in your results when they believe that those results were obtained in the most appropriate way.

As a word of caution though, belief systems also come into play when selecting statistical methods for data analysis, and while there is often more than one way to analyse your data, not every method is always appropriate. But that is probably leading us too far from the writing aspect.


For further reading on how to translate your data for understanding, check out this post by The Writing Center and this online video tutorial.


The Statistical Consulting Unit values research integrity and we see writing as an integral part of that through appropriate representation of statistical analyses and results.


Marijke joined the Statistical Consulting Unit in May 2019. She is passionate about explaining statistics, especially to those who deem themselves not statistically gifted.

Connecting the world with data we trust #StatsDay2020

Today is World Statistics Day 2020!

Now, I can hear you think: Is there a day for statistics? Like, really?? Surely this is only celebrated by statisticians a.k.a. old grey-bearded men?

Thankfully we are long past the time when checking the box for sporting a grey beard was required to obtain the qualification of a statistician. All jokes aside, with the theme “Connecting the world with data we trust” World Statistics Day 2020 has never been more relevant for everyone from statisticians to researchers to the general public. And while this day may not make the evening news highlights, it did make the newspaper!

Because this day is not only for statisticians, here are a few activities you can do to mark the day.


Fake news hunt

This could easily be a family activity. It is like a treasure hunt, but without the treasure. Everybody picks a piece of news that is based on data or statistics and presents it to the others as the news of the day. Other players try to verify the news and ultimately vote if it is fake news or not. For some added creativity you could create your own news story (fake or not). Find out how convincing a story you can tell from data or how easily you are misled.

Read a statistics book

No, we are not talking about statistics for dummies or an undergraduate textbook, but about a book on the role of statistics in our society, or the history of statistics (and why we care about them), or maybe even how statistical results from research are being misused/misrepresented.

Ever wondered how you can lie with statistics? Now is the time to find out.

The cookie experiment

Baking the perfect cookie is the result of meticulous experimenting. Due to variability in oven temperature, ingredient quality and mixing techniques the results can vary from recipe to recipe and from baker to baker. Sounds familiar?

Involve your friends/roommates/family and have a perfect cookie competition. You all start from the same recipe and you are allowed as many attempts as you wish.

Hot tip: use your researcher skills to set up an experimental design to control for variation you know of.


These are just a few ideas to have some fun with statistics. It may be hard to see how engaging in World Statistics Day 2020 will benefit you as a researcher but sometimes you need to think outside the box to reap the most benefit.

By engaging with statistics in a different context, you will learn new skills or learn that you had more skills than you ever gave yourself credit for. Both are valuable when you bring them back to the office tomorrow.

Disclaimer

The Statistical Consulting Unit does not provide advice on how to obtain the perfect cookie or about how to spread fake news. However, if you have research related questions on study design, data collection and/or analysis, do get in touch!


Mathematics & Statistics Awareness Month

Since 1986, on the initiative of President Ronald Reagan, Mathematics Awareness Week has aimed to increase public understanding of and appreciation for mathematics. It has grown over the years to last a month, and the inclusion of statistics is due in part to the rapid growth of statistics jobs. Driven by the US-based organisation This is Statistics, Mathematics & Statistics Awareness Month is relatively unknown Down Under but is a good opportunity to talk about the role of mathematics and statistics in our current environment.

Statistics Is For Everyone Source: This is Statistics

The role of mathematics and statistics in the COVID-19 pandemic

Strange times we live in. The information stream is endless and as we are all hibernating in our homes, every morsel of news coming in, from reputable websites to social media, is taken in and analysed. Should we wear masks? Is it safe to go for a run? How long is this situation going to last?

Whilst a lot of questions don’t quite have answers yet, it has become clear that data and visualisations of those data are at the forefront in the media. This is the time when statisticians come out of the woodwork (rather than staying in the background, their usual habitat) to help steer future directions and influence new policies (see what the ABS is doing here for example). It is also a time in which scientists have a duty to help interpret key messages and reinforce the ones that are evidence-based while countering urban myths. So in the spirit of the Mathematics & Statistics Awareness Month, it is useful to look at a few of these examples and think more critically about the role statistics plays.

Flatten the curve

It’s the talk of the street. We need to flatten the curve. But do we really understand what we mean by that? And which curve are we talking about?

ABC FactCheck did a good piece explaining the different formats of the curves that have been published in the media and why the numbers are not always what they are made out to be.

The same data graphed in a different format can lead to diverging interpretations if not investigated carefully. Mark Sanderson, Irene Hudson and Mark Osborn from RMIT University published a tutorial on The Bar Necessities in The Conversation. Republished by ABC News this hopefully reached an audience that exceeds the science-minded crowd.

Scott Morrison has announced that the curve is flattening, based on the trend seen in the daily count of new infections. At the same time, the government released the scientific modelling on which it based its decisions to curb the COVID-19 infection rate. Rachael Brown from ANU provides a more in-depth analysis of what exactly scientific modelling is (The Conversation). Thus far, the released modelling has been a theoretical exercise, and we are looking forward to the more in-depth modelling based on actual Australian data to draw inferences about whether we are indeed flattening the curve to the extent that we think we are.

Irrespective of the challenges that COVID-19 is throwing at us from a public health and economic perspective, it has been an exceptional opportunity to increase public awareness and understanding of data visualisation.

Thinking on the log scale

Another mathematical and statistical concept that is now firmly in the public eye is the log scale. Basically, a log transform is applied to the COVID-19 case count data to derive a growth rate. This practice is standard in time series analysis. In statistics, the transform is useful for analysing exponentially growing time series within linear models whilst maintaining an intuitive interpretation.

Popular media have been using log-transformed numbers to discuss the rate at which cases double. The log operator is often seen as an abstract concept, but it is visualisations like this that attach practical meaning.
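To make the link between the log scale and the doubling rate concrete, here is a minimal sketch in Python. The case counts are invented for the example (they double exactly every three days) and the helper name least_squares_slope is made up. On the log2 scale exponential growth becomes a straight line, the slope of that line is the number of doublings per day, and its reciprocal is the doubling time.

```python
import math

# Hypothetical cumulative case counts that double every three days.
days = list(range(0, 13, 3))                 # 0, 3, 6, 9, 12
cases = [100 * 2 ** (d // 3) for d in days]  # 100, 200, 400, 800, 1600

# On the log2 scale, each doubling adds exactly 1.
log2_cases = [math.log2(c) for c in cases]


def least_squares_slope(x, y):
    """Slope of the ordinary least-squares line through the points (x, y)."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    numerator = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    denominator = sum((xi - mean_x) ** 2 for xi in x)
    return numerator / denominator


doublings_per_day = least_squares_slope(days, log2_cases)
doubling_time = 1 / doublings_per_day  # days needed for the count to double
print(f"doubling time: {doubling_time:.1f} days")
```

Real case counts are noisier than this, so the fitted line will only approximate the data, but the interpretation of the slope stays the same.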

No two proportions are the same

Most stats that are published around COVID-19 are proportions or conditional counts, even if that is not always clear. For example, the number of cases reported is actually a count conditional on the number of tests performed. Interestingly though, the criteria for testing vary widely between countries. So when we get lured into comparing confirmed cases between countries, we tend to forget that there is a denominator that is not taken into account.

For example, the fatality rate is affected by the number of tests, but test protocols differ between countries. In some countries in Europe tests are only performed on those with severe symptoms requiring hospitalisation whilst in Australia, being touted for its low death rate, tests are performed more broadly, including amongst those displaying mild symptoms. On the other hand, in China, asymptomatic cases which have tested positive for COVID-19 are not being counted in the COVID-19 case tally. So when we are comparing fatality rates between those countries we are really comparing apples and oranges. Even worse, in New York COVID-19 related deaths are only counted if they occur in a hospital, meaning that the actual count is already an underestimate, let alone that we have a clear idea of the actual number of cases in the city.

When you see a proportion or a number, always ask yourself the question: Is this statistic conditional on something? If it’s not, should it be? Does it make sense to compare absolute counts when we know that underlying variables like population size and number of tests vary? It is all too easy to take numbers for granted as they are objective and quantitative. But the field of statistics is built around putting uncertainty around those numbers, whether through measuring variance or through thinking through the conditional probabilities.
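A tiny worked example shows how much the denominator matters. The sketch below uses two made-up regions; none of the figures are real data, and the names are purely illustrative. One region reports more confirmed cases simply because it tests far more widely.

```python
# Hypothetical figures for two regions; none of these numbers are real data.
regions = {
    "Region A": {"tests": 10_000, "confirmed_cases": 500},
    "Region B": {"tests": 100_000, "confirmed_cases": 1_000},
}


def positivity_rate(region):
    """Confirmed cases as a proportion of tests performed."""
    return region["confirmed_cases"] / region["tests"]


for name, figures in regions.items():
    print(
        f"{name}: {figures['confirmed_cases']} cases, "
        f"{positivity_rate(figures):.1%} of tests positive"
    )
```

On raw counts Region B looks twice as badly affected as Region A, yet conditional on the number of tests its proportion of positive results is five times lower. Neither number tells the whole story on its own, which is exactly why it pays to ask what a statistic is conditional on.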

Miracle cures, misinformation and fake news

From celebrity chef (note chef, not scientist) Pete Evans’ $15,000 miracle machine to Harry Potter author J.K. Rowling’s dangerous breathing practices, you don’t have to look far to find a cure for COVID-19. One caveat: evidence for these so-called cures is typically based on an n=1 design. And whilst there is a place for case studies in science, one might hope that someone who has a bit of a critical mind will classify these particular “cures” as shams.

It becomes a little more complicated though when the touted cure is backed by peer-reviewed science. Enter hydroxychloroquine, which has been the subject of lots of controversy within the scientific community but also in popular media after being publicised by the higher echelons. The claims are based on a French study which was published online after peer review. As stated in the title, this was a non-randomised, non-blinded study and whilst the results could easily be interpreted as hydroxychloroquine indeed being an effective drug, the study has been hammered for its scientific flaws. There is uncertainty around discrepancies between the trial registration and reported data, mismatches between pre-published versions and the final paper, missing patients, etc. If you have time to read through the comments on the critique, you will soon see that all science goes out of the window and it becomes a personal and ideological argument.

Another paper from China shows less convincing results with regard to the drug, but here too there are issues in that the reported methods and results do not correspond with the trial registration. And while it was a randomised study, it still wasn’t blinded. In the meantime, the hype around hydroxychloroquine has resulted in bans from countries like India after people died taking it unsupervised, providing us with a sad example of what can happen when scientific rigour is not adhered to.

The role of statistics in the pandemic

It would be easy to dwell on the examples in which statistics were not applied properly or were interpreted with bias, but the key message to take away here is that statistics is playing a crucial role in helping us come out on the better end of the pandemic. Government decisions and policies are being based on modelling and science, and the general public is getting educated on how to interpret data in a critical way. For researchers, it is an opportune moment to reserve some of the time that would normally be spent in the lab to upskill in data analysis. Maybe you can find the time to produce that cool graph you always wanted to create. Or to learn to code in R and set up some routines to process your data for when you get out to collect again. Or join in on the conversation around how research is critical in handling the pandemic and how understanding data is crucial to having an informed opinion.

Further reading: A statistician’s guide to coronavirus numbers (Royal Statistical Society)

Saul Newman uses large-scale human data to predict fitness-linked demographic traits at the newly-formed Biological Data Science Institute. Saul is also appointed to the ANU Research School of Biology, applying machine learning models to multi-species large scale experiment data using satellites and ground sensors.

Alice was appointed as Director of the Statistical Consulting Unit in October 2019. She is passionate about maintaining girls’ interest in maths and stats, and she enjoys collaborating on research projects in every part of the University.

Marijke joined the Statistical Consulting Unit in May 2019. She is passionate about explaining statistics, especially to those who deem themselves not statistically gifted.