Data analysis: an oil to be handled with care

10 June 2021

«Data are the oil of the 21st century, but they must be handled with great care». As Professor Guido Consonni, head of the Department of Statistical Science since November 2019, points out, statistical analysis of data requires appropriate methodologies if it is to yield valid conclusions. This is the theme taken up by the American scholar Elizabeth Ogburn of Johns Hopkins University, guest speaker at the first meeting of the Statistical Bridges cycle, promoted by the Department of Statistical Science. On Friday 11 June at 6pm, live on the Webex platform, she will give a talk entitled "Social network dependence, unmeasured confounding, and the replication crisis".

«The webinars are aimed at all researchers who carry out empirical investigations and data analysis, with particular regard to those working in our University: a bridge between Statistics and other disciplines», Professor Consonni said. «The seminars in 2021 will focus on the replication crisis».

What is meant by this term?

It has been observed for many years that, with alarming frequency, results already published in scientific journals, often high-impact ones, cannot be replicated: subsequent investigations fail to confirm them. The effectiveness of a new drug is a typical example. This generates confusion in the scientific community, but also economic damage, for example because funding goes to lines of research that lead nowhere. It should be noted that we are not talking about fraudulently obtained results or unprofessionally conducted experiments (those exist too, of course), but rather about the incorrect use of statistical analysis techniques. This is the focus of the seminar series.

What can be the origins of the lack of reproducibility?

They are manifold. Professor Elizabeth Ogburn analyses one in particular in an interesting article entitled "Dependence Can Lead to Spurious Associations and Invalid Inference", published in the Journal of the American Statistical Association. The article shows that many of the analyses based on a data set widely used in the US to study heart problems were in fact marred by a fundamental flaw in the methods of analysis. Simply put, they did not take into account that many of the people sampled were connected by friendship, social ties and sometimes even family relationships.

Can you explain?

Any statistical analysis rests on basic assumptions, which researchers often fail to consider adequately. A frequent assumption is that the observations in the sample are independent, i.e. that the units are not "linked" to each other. If this assumption does not hold, the conclusions are not valid and the results cannot be expected to replicate. In fact, a more correct methodology of analysis, one that takes into account the dependence between subjects, would lead to very different results, as Ogburn shows.
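The effect described here can be illustrated with a small simulation. What follows is only a minimal Python sketch under stated assumptions, not the analysis carried out in Ogburn's article. It assumes that people belong to groups of friends, that two measured quantities X and Y have no relationship with each other, and that each quantity is partly shared within a group. The function names and the parameters (`shared_weight`, `n_groups`, `group_size`) are hypothetical and chosen for illustration. The sketch counts how often a standard correlation test, which treats all observations as independent, wrongly reports an association.

```python
import math
import random


def pearson_r(xs, ys):
    """Sample Pearson correlation between two equally long lists."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def naive_test_rejects(xs, ys, z_crit=1.96):
    """Fisher z-test of 'no correlation' at the 5% level.

    The test assumes that all observations are independent.
    """
    n = len(xs)
    z = math.atanh(pearson_r(xs, ys)) * math.sqrt(n - 3)
    return abs(z) > z_crit


def simulate_sample(rng, n_groups, group_size, shared_weight):
    """Draw X and Y with no true association between them.

    Assumption for illustration: each group of friends shares one random
    component for X and a separate, unrelated one for Y. shared_weight = 0
    gives fully independent observations.
    """
    xs, ys = [], []
    for _ in range(n_groups):
        group_x = rng.gauss(0, 1)
        group_y = rng.gauss(0, 1)
        for _ in range(group_size):
            xs.append(shared_weight * group_x + rng.gauss(0, 1))
            ys.append(shared_weight * group_y + rng.gauss(0, 1))
    return xs, ys


def false_positive_rate(shared_weight, n_sims=1000, n_groups=20,
                        group_size=10, seed=0):
    """Share of simulated studies in which the naive test finds an 'effect'."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_sims):
        xs, ys = simulate_sample(rng, n_groups, group_size, shared_weight)
        if naive_test_rejects(xs, ys):
            rejections += 1
    return rejections / n_sims


if __name__ == "__main__":
    print("independent observations:", false_positive_rate(0.0))
    print("observations linked within groups:", false_positive_rate(2.0))
```

With independent observations the rate of false alarms should stay close to the nominal 5%. When observations are linked within groups it should be several times higher, even though X and Y are unrelated by construction. Findings of this kind are the ones that later fail to replicate.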

What are the lessons we can learn from this?

The first is that an experiment is a very delicate activity, and that data collection should follow rigorous and transparent criteria so that others can reproduce the investigation as faithfully as possible. In this sense, replication projects should be further encouraged in all branches of science. The second lesson is that the analysis must be appropriate to the problem to be solved and to the type of data. Data are precious, but care must be taken when collecting and analysing them. Otherwise we can fall into various traps, and the reproducibility crisis is one of them.

What will be the topics of the other two webinars?

In September we will host the freelance science writer Regina Nuzzo, who will talk to us about the "p-value", a tool still widely used in statistical investigations. Starting from one of her articles, which resonated widely in the scientific community when it was published in the journal Nature, she will show us that the "p-value" is greatly overrated as an indicator of the reliability of an experiment. It must therefore be handled with care, and those who use it must be aware of its limitations and, most importantly, know that there are more reliable alternatives. Finally, in October it will be the turn of Eric J. Wagenmakers, a quantitative psychologist at the University of Amsterdam, who will present a review of the replication crisis with a particular focus on psychological studies, a field particularly exposed to this problem. In this seminar too, we will not only highlight the problems but also learn analytical tools for dealing with them.

An article by

Katya Biondi
