How has COVID-19 impacted on clinical laboratories?

Honorary RSPH Associate Professor Tony Badrick is the CEO of the Royal College of Pathologists Australia Quality Assurance Program. He gave this talk on Thursday 25 March, and about a dozen people attended in person and on Zoom.

Tony also presented a variety of statistics and views of COVID-19, this time from the Australian perspective. His talk concentrated on the development of the assay for COVID-19, because of course that link in the chain did not exist before the virus began to spread.

From a pathology laboratory point of view, the most interesting outcome of the pandemic has been the changing perception of the work labs do. For instance, many people used to think that when you have a blood test at a lab’s office down at the Mall or wherever, the sample is analysed right there on site, behind the office. Not so!

From a statistical point of view, Tony’s most interesting point was about the use of pooled samples to increase efficiency in the lab. In a country with such a successful response to the pandemic, most of the samples are going to be negative, so a single negative result for a pool clears every sample in it at once.
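A minimal sketch of why pooling helps when prevalence is low. It uses the textbook two-stage (Dorfman) scheme: test each pool once, then retest every sample individually only if the pool is positive. This is an illustration and not a description of what any Australian lab actually does. The function names, the prevalence value and the assumption of a perfectly accurate assay are all hypothetical.

```python
def expected_tests_per_person(prevalence: float, pool_size: int) -> float:
    """Expected number of tests per sample under two-stage pooling.

    Assumes independent samples and a perfectly accurate assay
    (both simplifying assumptions for this sketch).
    """
    if pool_size == 1:
        return 1.0  # no pooling: one test per sample
    p_pool_negative = (1.0 - prevalence) ** pool_size
    # One pooled test shared by pool_size people, plus an individual
    # retest for everyone whenever the pool comes back positive.
    return 1.0 / pool_size + (1.0 - p_pool_negative)


def best_pool_size(prevalence: float, max_pool_size: int = 32) -> int:
    """Pool size (1..max_pool_size) minimising expected tests per sample."""
    return min(
        range(1, max_pool_size + 1),
        key=lambda k: expected_tests_per_person(prevalence, k),
    )
```

With a hypothetical prevalence of 1%, `best_pool_size(0.01)` picks pools of 11, needing fewer than 0.2 tests per sample on average. As prevalence rises, the saving shrinks and eventually pooling stops being worthwhile.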

School transition estimation and projection (STEP) model: update on model development and activities

Professors James Raymer and Edith Gray, and Lili Zhang, all of the School of Demography, have been involved in this project for the ACT Education Directorate. James presented this Demography seminar on their behalf on Tuesday 30 March. I Zoomed in and could hear the disembodied voices of the attendees in the room.

The ACT school system is a complex arrangement, with 60,000 to 80,000 students per year, transitioning from primary to secondary, college and graduating, along with potentially moving between Catholic, independent and public schools, plus in- and out-migration!

The three-year project that James described has established a multistate cohort-component projection model for this system, which estimates numbers in all parts of the system simultaneously, and furthermore can be checked against the school census that occurs in March each year. So far their model is doing incredibly well, with less than 1% error. COVID impacts were also modelled through scenario analysis.
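To give a feel for what a multistate cohort-component projection does, here is a toy one-year step in Python. It is only a sketch of the general technique, not the STEP model itself. The sector list, the transition probabilities, the entrant counts and the net migration figures are all hypothetical inputs that a real model would have to estimate.

```python
SECTORS = ("public", "catholic", "independent")


def project_one_year(counts, transition, entrants, net_migration):
    """Advance every year-level cohort by one school year.

    counts[g][s]        students currently in year level g, sector s
    transition[a][b]    share of sector-a students who are in sector b next year
    entrants[s]         new students starting in the first year level
    net_migration[g][s] net movers into year level g, sector s next year
    All inputs are hypothetical; this is an illustrative sketch only.
    """
    n_levels = len(counts)
    projected = [{s: 0.0 for s in SECTORS} for _ in range(n_levels)]

    # New entrants fill the first year level.
    for s in SECTORS:
        projected[0][s] = float(entrants[s])

    # Each cohort moves up one level and may switch sector.
    # The final level graduates and leaves the system.
    for g in range(n_levels - 1):
        for s_from in SECTORS:
            for s_to in SECTORS:
                projected[g + 1][s_to] += counts[g][s_from] * transition[s_from][s_to]

    # Add in- and out-migration as a net adjustment.
    for g in range(n_levels):
        for s in SECTORS:
            projected[g][s] += net_migration[g][s]

    return projected
```

Calling the function repeatedly, feeding each year's output back in as `counts`, gives a multi-year projection that could then be compared with census counts.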

The data available to James and his team are high quality, yet it remains difficult to estimate when areas change rapidly, such as new suburbs in Molonglo. Ian McDermid told an interesting story about using housing approval data in the 1980s to predict school enrolment in the new suburbs at that time.

Predicting benefit of aggressive treatment for localised kidney cancer patients using modified principal stratification with latent survival classes

The Statistical Society of Australia is continuing to present online seminars into 2021, some sponsored by Sections such as this one on Thursday 25 March by the Biostatistics Section. Dr Brian Egleston is Associate Research Professor and Biostatistician at the Fox Chase Cancer Center in Philadelphia. He sensibly interspersed his talk with his own photos of views of his home town during the pandemic. Some were almost totally devoid of people, others surprisingly busy such as the outdoor eating scene that was part of the city’s COVID response.

Brian’s talk discussed the modelling of the increasing rate of kidney cancer, and the effect that increased rates of scanning for other purposes may be having on that. Brian’s models included the Weibull proportional hazards model and a logistic model for class membership. The data Brian used come from the SEER database (Surveillance, Epidemiology, and End Results), a long term follow-up of 14% of the United States population.

Bad statistics in health and medical research

The NSW Branch of the Statistical Society of Australia hosts the Lancaster Lecture every year. On Wednesday 24 March 2021, the lecture was presented by Professor Adrian Barnett of QUT. Over 70 people either attended in person or Zoomed in from around the world.

Adrian’s talk was engaging, shocking and entertaining all at once. These are some of the sound grabs that really spoke to me.

Bad statistics is abetting bad science, with all the criminal overtones of the choice of verb. Some of the examples of bad statistics that Adrian had included smoothing syndrome, and the famous example of researcher degrees of freedom (also known as the garden of forking paths). I’m getting more interested in this phenomenon as I advise more research students, and have even read the relevant Jorge Luis Borges short story, but that’s a post for another time. Adrian’s own example of researcher degrees of freedom was the “one data set, many analyses” of the relationship between race and red cards in soccer.

There are some fields of research where there are more systematic reviews than there are pieces of original research. As someone put it on Twitter, “People! What are we doing?”

One of the hooks Adrian used to get us interested in his talk was the promise of the three worst statistical methods sections ever. Now, to be fair, these did come from the register of clinical trials, not published research as such. I don’t think I should give them away right now; suffice to say they each consisted of fewer words than I have fingers on my right hand!

One of Adrian’s final points was about research projects required of medical students, another topic close to my heart as a statistical consultant. I liked Adrian’s concept of a focus on competence, not excellence, as being more likely to lead to successful graduates, and more likely to reduce research waste.

Advancing research on healthy longevity in Australia and the Asia-Pacific

Dr Collin Payne is a member of the School of Demography at ANU and he presented this School seminar on Tuesday 23 March. This time I joined on Zoom, appreciating the hybrid presentation model that the School is using.

Collin holds an ANU Futures grant and an ARC DECRA to advance his research in this area. The Futures project is focused on data harmonisation across social data sets from up to a dozen countries. The HILDA data from Australia and the NZHWR from New Zealand are part of the project, amongst others.

The DECRA is focused on high-income countries, as opposed to the Futures project, which has a range of low to high income countries in it. The DECRA is also particularly for studying social gradients in longevity – looking for evidence around notions such as wealthier and more educated people living longer. Collin also introduced the concept of morbidity compression or expansion. What this means is that as lifespans get longer, the amount of time spent living with disability or lowered quality of life may be expanded (because health care prolongs life but does not make it better) or compressed (because health care pushes health problems away into the future). In the US, morbidity expands, but the pattern may not be realised across other countries.

Finally, of interest to me and the smoking prevalence project I work on, Collin talked about the notion of drivers of inequality. He was particularly careful to separate out the effect on health outcomes of being a smoker from the effect of smoking. This comes about because smokers and non-smokers may differ in systematic ways that can be measured in the data you have, and these differences should be attributed to the smoker/non-smoker, not to the smoking/non-smoking.

A modern conceptual framework for statistics on migration and international mobility

Dr James Raymer is in the School of Demography at ANU and he gave this School seminar on Tuesday 16 March.

For the past two years James has been involved in a 50-person task force associated with the UN Expert Group on Migration Statistics in the Statistical Division. This task force has been closely examining concepts and definitions, because migration is not as simple a matter as you might think! A combination of citizenship, country of birth and “usual residence” is used to define migration in slightly different ways across the world. Only 40 countries even make migration data publicly available, and they’re mostly in Europe. James pointed out that a variety of data sources feed into migration statistics too, ranging from registers and administrative databases to border statistics and good old censuses and surveys. The intersection between duration of stay and international mobility in the definitions of migration is also tricky to pin down.

The task force has had to balance harmonisation, to enable comparability between countries, with the individual needs of countries and the practicalities of collecting data there. The new framework should provide a great basis for this. It’ll have a focus on stocks (the population) first, and then flows, i.e. migration. Next steps in the process will be to tighten up the proposed definitions based on country feedback, and start work on examples of best practice and data integration.

High output writing or shut-up-and-write?

A provocative title for the RSPH seminar on Thursday 11 March. Emeritus Professor Brian Martin of the University of Wollongong was in the high output writing corner. His strategy hinged on writing for a set amount of time each day, even if it’s super-short, properly scheduled and rewarded at the end. This is in total contrast to the usual academic approach of procrastinate-then-binge, which is learnt as an undergraduate (or even younger!) and quickly proves to be a very painful approach. Brian also supported writing before you’re ready – if you start, then it quickly becomes clear what resources you need to check and what other work needs to be done, rather than having to guess at that before you start.

Dr Cally Guerin of ANU was in the shut-up-and-write corner. She has been facilitating this approach through sessions held at ANU for a number of years now. Actually there are a number of similarities in the two approaches, as shut-up-and-write also involves concentrating on writing and nothing else for a couple of mid-length (maybe 50 minute) sessions, then rewarding yourself with something. Given the group nature of the shut-up-and-write sessions, the reward is often a chat and a drink with your fellow workshop attendees.

The presentation allowed plenty of time for questions at the end and there was plenty to discuss. The issue of productivity versus quality was discussed, and touched on high output editing as another possible activity. The danger of perfectionism in these writing sessions was also discussed. I was thinking about whether the concept of shut-up-and-analyse-data would work, possibly under the already existing banner of shut-up-and-code. It seems to me that sometimes the writing process gets mixed up with the results generation process, and there may be different processes required in hard sciences compared to humanities.

Central Limit Theorem for spiked eigenvalues of high-dimensional sample autocovariance matrices

The RSFAS seminar for Thursday 11 March was a double act, by Daning Bi and Adam Nie of ANU. Around 20 people Zoomed in for the talk.

Their problem was motivated by data such as measures of particulate matter in multiple locations in the atmosphere, often taken half-hourly for long periods e.g. six months. Similar series arise in economics, with daily prices of multiple stocks over the course of a year; and also in population health, looking at mortality curves for every year of age in a given country over a period of time. All of these examples are characterised by large numbers of series over a period not much longer than the number of series.

Interest in describing these series leads to interest in the spectral properties of the autocovariance matrices, which leads to the problem that Daning and Adam solved around a Central Limit Theorem for spiked eigenvalues from those matrices (the spike refers to the first few in order from largest to smallest).

The asymptotic theory is easy (!) if the number of eigenvalues, p, is fixed, but not so simple as p goes to infinity. Adam showed with some theory and some simulation studies that so long as a certain threshold is reached, asymptotically Normal distributions arise for the eigenvalues. This leads to z tests for comparisons e.g. the comparison of mortality curves for two countries.
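Once an estimate is asymptotically Normal, the z test follows the usual recipe. The sketch below shows that generic recipe in Python and nothing more. It assumes the two eigenvalue estimates are independent and that their standard errors are already available. How those standard errors are derived is the hard part of Daning and Adam's work and is not reproduced here. The function name and arguments are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def two_sample_z_test(estimate_a, se_a, estimate_b, se_b):
    """Two-sided z test for the difference of two independent estimates.

    estimate_a, estimate_b: e.g. the largest spiked eigenvalue for each country
    se_a, se_b: their (assumed known) asymptotic standard errors
    Returns the z statistic and the two-sided p-value.
    """
    z = (estimate_a - estimate_b) / sqrt(se_a ** 2 + se_b ** 2)
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return z, p_value
```

A small p-value would suggest that the two series, for instance the mortality curves of two countries, differ in that eigenvalue.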

Daning took the mortality curves example a little further, with a factor analysis that yielded groupings of countries requiring anywhere between one and five factors to describe the covariance matrix. It’s my understanding that Daning and Adam carried out this work together while they were both PhD candidates, and I’d like to congratulate them on an elegant piece of mathematical statistics with a solid grounding in real-world applications.

How to see the forest from the trees: Removing pesky artefacts from discrete data for better visualisation

Dr Gordana Popovic is a statistical consultant in Stats Central at UNSW. She was the first speaker for 2021 at the Canberra Branch of the Statistical Society and around 30 people Zoomed in to hear her talk.

Gordana’s talk was co-hosted by the Canberra chapter of R-Ladies. I made the effort to use my R-Ladies Zoom background rather than erecting the R-Ladies banner, because the banner is a two-person job and if I’m going to sit behind my profile photo for most of the talk, the virtual background provides better return on effort!

Gordana’s presentation drew on publicly available data sets about penguins and spiders to illustrate the methods she introduced. Her main message was around clever use of new tools (both methodological and implementational) to better visualise discrete data. The implementational tools she mentioned included tourr and ecoCopula, and tourr particularly looks as though it continues to expand on the notion of the rotating point cloud as a useful visualisation tool. Copula methods seem to have gained traction in some disciplines rather than others, and Gordana’s biology-ecology examples were a testament to that.

On the methodological front, Gordana was a big fan of the Dunn-Smyth residuals available in R. I shall have to congratulate my colleague Peter Dunn on getting his name attached to a statistical concept, which is available through the R package mvabund. I think their technical name is randomised quantile residuals. I’m just sad that though I own Dunn and Smyth’s entire book on generalised linear models in R, I can’t find these excellent residuals in the index!
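For anyone curious about what a randomised quantile residual actually is, here is a minimal Python sketch for a Poisson response. It is not the mvabund implementation, and the function names are hypothetical. The idea is that for a discrete outcome y with fitted mean mu, you draw a uniform value between the fitted CDF just below y and the CDF at y, then map it through the inverse standard Normal CDF. If the model is right, the residuals look like standard Normal noise, without the banding that discrete data otherwise produce in residual plots.

```python
import math
import random
from statistics import NormalDist


def poisson_cdf(k: int, mu: float) -> float:
    """P(Y <= k) for Y ~ Poisson(mu), summed term by term."""
    if k < 0:
        return 0.0
    term = math.exp(-mu)  # P(Y = 0)
    total = term
    for i in range(1, k + 1):
        term *= mu / i    # P(Y = i) from P(Y = i - 1)
        total += term
    return min(total, 1.0)


def randomised_quantile_residual(y: int, mu: float, rng: random.Random) -> float:
    """One randomised quantile residual for a Poisson observation."""
    lower = poisson_cdf(y - 1, mu)  # CDF just below the observed count
    upper = poisson_cdf(y, mu)      # CDF at the observed count
    u = rng.uniform(lower, upper)   # jitter within the probability mass at y
    u = min(max(u, 1e-12), 1.0 - 1e-12)  # keep strictly inside (0, 1)
    return NormalDist().inv_cdf(u)
```

Because of the random draw, refitting gives slightly different residuals each time, so it can be worth looking at a few sets before trusting a pattern.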

Gordana presented her talk from Sydney, one of the great advantages of Zoom talks. Nonetheless it was also a great sign of more COVID-relaxed times that I was able to meet in person with half a dozen other Society members for dinner in a local restaurant after the talk.