Last February, AIMed talked about the “crisis in science” as described by statistician Dr. Genevera Allen of Rice University in Houston during her speech at the American Association for the Advancement of Science (AAAS) annual meeting in Washington.

Dr. Allen said more scientists are turning to machine learning (ML) techniques to analyze their data, but for some the exercise may be a waste of time and money. That’s because ML can single out patterns that are merely noise within the datasets used to train and test the algorithm. Such patterns may not be representative of the real world and may not be replicated in subsequent studies.

The reproducibility problem is neither new nor unique to ML. According to a formal survey, up to 70% of researchers were unable to recreate what other scientists had reported in earlier findings. Of all the fields of scientific research, only those specializing in chemistry, physics and engineering, and earth and environmental science showed confidence in the published literature.

The reproducibility measures

Some researchers began to take serious action as they gathered for the recent Conference on Neural Information Processing Systems (NeurIPS), which took place between 8 and 14 December in Vancouver, Canada. They do not want artificial intelligence (AI) to become the latest victim of the reproducibility crisis.

NeurIPS 2019 was phenomenal not only because it attracted close to 13,000 researchers but also because it initiated an effort to improve AI reproducibility by asking scientists, ahead of the meeting, to share their code. Joelle Pineau, an ML expert at McGill University and Facebook in Montreal, Canada, and Koustuv Sinha, a PhD student at McGill, were the driving forces behind this.

In an interview with Nature, Pineau said that while the same code should lead a machine to perform the same actions every time, the challenge lies in recreating that code from the description in a research paper. She first realized this could be a problem when her students reported difficulty in getting the same results and questioned whether the methodology was correct. The problem is especially pronounced in reinforcement learning, where often only the best results are reported, with no indication of the initial settings or of how many trials were performed.
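The reporting gap in reinforcement learning can be illustrated with a small Python sketch. It is not taken from Pineau's work or from any NeurIPS material. The function `run_trial` is a hypothetical stand-in for a full training run, and the score distribution it draws from is an assumption chosen purely for illustration. The point is only that fixing the random seeds and reporting every trial, along with the mean and standard error, gives readers far more to reproduce than a single best score.

```python
import random
import statistics


def run_trial(seed: int) -> float:
    """Hypothetical stand-in for one training run; returns a final score.

    The Gaussian score model is an assumption made for illustration only.
    """
    rng = random.Random(seed)  # a fixed seed makes this run repeatable
    return 100.0 + rng.gauss(0.0, 15.0)


def summarize(seeds):
    """Run one trial per seed and report all results, not just the best."""
    seeds = list(seeds)
    scores = [run_trial(s) for s in seeds]
    mean = statistics.mean(scores)
    # Standard error of the mean: sample standard deviation / sqrt(n)
    sem = statistics.stdev(scores) / len(scores) ** 0.5
    return {
        "seeds": seeds,            # the initial settings that were used
        "n_trials": len(scores),   # how many trials were performed
        "scores": scores,
        "mean": mean,
        "sem": sem,
        "best": max(scores),       # what a best-only report would show
    }


report = summarize(range(10))
print(f"trials: {report['n_trials']}, seeds: {report['seeds']}")
print(f"mean +/- standard error: {report['mean']:.1f} +/- {report['sem']:.1f}")
print(f"best single run: {report['best']:.1f}")
```

Because the best single run is never lower than the mean, a paper that reports only the best run shows the method in its most favorable light, and without the seeds and the trial count a reader cannot tell by how much.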

The ultimate goal

As such, Pineau proposed a reproducibility challenge prior to NeurIPS to encourage fellow researchers to submit their code and to recreate some of the published results. A checklist covering the types of metrics, error bars, and model details was issued to facilitate the process. Pineau said the effort had drawn a good response: 75% of the papers accepted by NeurIPS 2019 contained a link to code, and 34% of the reviewers said they found the checklist useful.

Overall, participants were surprised by the attention given to AI reproducibility. Reports generated during the reproducibility challenge are available on OpenReview, a website that promotes open scientific communication. Pineau believes the review process for research papers tends to be short and limited, and that the real impact of a particular study can only be felt much later. She hopes the NeurIPS effort can serve as a foundation for similar efforts in the future.


Author Bio

Hazel Tang: A science writer with a data background and an interest in current affairs, culture, and arts; a non-med from an (almost) all-med family.