NCI's Cloud Competency Centre Proudly Contributes to the Prestigious Nature Portfolio

Enhancing Findability, Reusability, and Reputation of Data Versions:

A FAIR-compliant Proposal 

Alba González-Cebrián, Michael Bradford, Adriana E. Chis, and Horacio González-Vélez contribute to Nature’s Scientific Data Journal.

Presenting a standardised dataset versioning framework for improved reusability, recognition, and data version tracking, facilitating comparisons and informed decision-making for data usability and workflow integration.

A passionate team of researchers at the Cloud Competency Centre at National College of Ireland have proudly contributed to the prestigious Nature Portfolio, renowned for facilitating the dissemination of research that substantially advances scientific disciplines and addresses societal challenges. 


Introduction: What Is Data Versioning and Why Is it Significant? 

Data versioning is the practice of recording the different versions of a dataset that are generated over time. Each version is labelled according to when the data was created and how it was changed. Versioning matters because it makes changes traceable over time; its main aims are to promote reproducibility and to encourage collaboration. The study presented by the research team at NCI acknowledges that the RDA (Research Data Alliance) and the DataCite Metadata Schema incorporate versioning for data citation. The proposed versioning framework aims to address issues found within those existing frameworks.


Explaining FAIR:

The FAIR principles are a set of data guidelines.
FAIR stands for …

  • Findable 
  • Accessible 
  • Interoperable 
  • Reusable 

These principles were created to combat data sharing issues.
The key goal of FAIR is to improve the reuse of data. To achieve this, data needs to be extremely well-defined so that it can be reliably replicated and reused.


Current Framework Issues: 

While the RDA and the DataCite Metadata Schema incorporate versioning for data citation, their frameworks have limitations with regard to aligning with the FAIR principles. This new study points out that the current framework often relies on traditional publication indexing, which hinders efficient discovery based on dataset attributes. Discovery is further hampered because the tools for comprehending the differences between versions are inadequate. These inadequacies lead to a low level of reusability, which means that previous data curation efforts are undermined and underutilised.


The Proposal: 

To address the above-mentioned issues, National College of Ireland’s research team have presented a standardised dataset versioning framework, proposing to refine and optimise the DataCite schema. The aims of this new dataset versioning framework are to enhance the findability, reusability, and recognition of data versions. Several needs have been identified.

There is a need to identify new versions of data. There is a need to be able to differentiate between revisions. This study acknowledges that some research has explored data monitoring (timestamp-based data), but a clear standard approach that reflects the multiple changes in data versioning is lacking. 


“All researchers use datasets,” Horacio González-Vélez, Head of the Cloud Competency Centre, explains, “a standardised versioning framework would serve as an open source to the research community. In a research market-place, data users would be given the ability to buy, sell, and rate data as well as tracking any changes that are made.”


To put it in simpler terms, Horacio explained that when changes are made to a dataset, they may be relatively small. For example, if you have ten columns on a page and delete two of them, the general overview remains the same and you can glean the same information from the remaining columns, but the dataset is still different. A person looking at the two versions might say they are practically the same. A computer, however, operates on a Yes/No basis: the versions either match or they do not. The proposed standardised versioning framework will therefore track the changes made.
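
The ten-column example can be illustrated with a minimal Python sketch. This is not the framework described in the paper: the function names `fingerprint` and `column_changes`, the column names `c1` to `c10`, and the sample values are all hypothetical, chosen only to show the difference between a strict Yes/No comparison and a summary of what actually changed.

```python
import hashlib
import json


def fingerprint(rows):
    """Return a SHA-256 digest of a table given as a list of dict rows.

    Illustrative only: any change to the data yields a different digest,
    which mirrors the computer's strict Yes/No view of "same dataset".
    """
    canonical = json.dumps(rows, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def column_changes(old_rows, new_rows):
    """Summarise which columns were removed, added, or kept between versions."""
    old_cols = set().union(*(row.keys() for row in old_rows))
    new_cols = set().union(*(row.keys() for row in new_rows))
    return {
        "removed": sorted(old_cols - new_cols),
        "added": sorted(new_cols - old_cols),
        "kept": sorted(old_cols & new_cols),
    }


# Hypothetical version 1: ten columns (c1..c10) and three rows.
columns = [f"c{i}" for i in range(1, 11)]
v1 = [{col: row * 10 + idx for idx, col in enumerate(columns)} for row in range(3)]

# Hypothetical version 2: the same data with two columns deleted.
v2 = [{k: v for k, v in row.items() if k not in ("c9", "c10")} for row in v1]

# Strict comparison: are the two versions byte-for-byte the same dataset?
same = fingerprint(v1) == fingerprint(v2)

# Descriptive comparison: what changed between the two versions?
changes = column_changes(v1, v2)

print("Identical:", same)
print("Columns removed:", changes["removed"])
print("Columns kept:", len(changes["kept"]))
```

The strict comparison reports only that the versions differ, while the column summary shows that eight of the ten columns are untouched. A versioning record that carries this kind of summary lets a data user judge whether a new version is still suitable for their workflow.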


“It is like Turnitin for datasets.” ~ Horacio González-Vélez. 


Currently, this project is at the proof-of-concept stage, and the versioning technique has been tested and proved through a series of experiments including the creation, updating, and deletion of data. While there is always room for further research, this proposed framework is the first step towards creating a more sophisticated research tool that will allow researchers to utilise data in the most efficient and up-to-date way possible. Innovation of any kind is always an exciting and rewarding process, and we are incredibly proud that this ambitious work has been published in Nature’s Scientific Data Journal. For our research team, it is wonderful that their hard work and tremendous efforts have been recognised in the prestigious Nature Portfolio.
To read the entire article, click here for open access to Nature’s Scientific Data Journal.

Learn More About the Cloud Competency Centre:

The Cloud Competency Centre at National College of Ireland is located in the heart of Dublin. It is renowned for its academic excellence in parallel and distributed computing and offers a range of postgraduate courses and research opportunities aimed at fostering innovation and understanding in computing technologies, data analytics, and related fields.

Students in the MSc in Cloud Computing programme study in a state-of-the-art facility that supports the understanding of cloud technologies for commercial and research purposes. The programme is offered on a full-time and part-time basis, and students can develop competitive technological skills that they can go on to utilise in leading companies such as IBM and Microsoft.

The Sunday Times praised the MSc in Cloud Computing programme, naming it one of the most consistent courses regarding employability.
For further information, visit the Cloud Competency Centre homepage on NCI’s website.