The research data of the 21st century is complex, multi-modal, voluminous and used in a multitude of ways with numerous obligations.
As part of a responsible research culture, this myriad of data activities needs to be governed and controlled by creating standardised processes and by using systems and tools. But it can be difficult to think through all the issues as we plan, undertake and close out our research.
A systematic way of ensuring we have addressed all these issues is to use a value chain to group like activities together. This is a technique developed in corporate strategy by Michael Porter in the 1980s. Monash used this approach when thinking about the governance of health research data (see Health Research Data Governance Framework for a more detailed outline).
Research Data Value Chain
At each stage of the value chain, processes and products/tools can be used to ensure that data is properly governed and that obligations are being met. Note that each step in the value chain is essential to undertaking your research, but the steps are not necessarily sequential.
The data activities involved in each stage of the research data value chain are outlined below.
These are the activities that clarify why the data are being collected/generated. They form part of the process of determining the research question and related methodology, and of defining the outcomes/aims of your research.
The purpose of the data will be contained in documents such as:
research proposal
research protocol
thesis outline
business case for the research, or
articles of association/terms of reference/collaboration agreement
You may need to use processes such as the Delphi Method to reach a consensus on what outcomes you are going to measure.
These are the activities involved in the secure generation, collection and capture of research data. These may involve data that is digital or analogue; primary or secondary.
There may be multiple pathways to obtain particular data, and various workflows may be required. Within these pathways, a number of factors may need to be considered:
data systems that are used (e.g. survey tools, electronic laboratory notebooks, bespoke systems);
large infrastructure that is required and how data will be handled (e.g. microscopy, gene sequencing, imaging);
extensive experimental practice and workflows involved (e.g. animal research, materials);
staffing requirements and how access will be given and training provided (e.g. data collection staff, interviewers, laboratory staff); and
data provision/sharing from third parties, such as data linkage (e.g. IP/copyright impacts, data sharing requirements).
These are the activities involved in ensuring data is fit for purpose and of high quality, i.e. complete, timely, accurate, consistent, relevant, reliable, traceable, cleaned, validated and well documented.
Each field of research will have different frameworks and ways of measuring data quality, but activities required may include:
inbuilt quality checks (including calibration); and
manual verification checks and cleaning, such as accounting for missing data.
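As an illustration only, the sketch below shows what a simple automated check for missing and implausible values might look like. The field names, the sample records and the age range of 0 to 120 are all hypothetical assumptions made for the example; each field of research will define its own rules for what counts as missing or out of range.

```python
from typing import Any


def is_missing(value: Any) -> bool:
    """Treat None and blank strings as missing (an assumption for this sketch)."""
    return value is None or (isinstance(value, str) and value.strip() == "")


def missing_counts(records: list[dict[str, Any]], fields: list[str]) -> dict[str, int]:
    """Count, for each expected field, how many records lack a usable value."""
    return {f: sum(1 for r in records if is_missing(r.get(f))) for f in fields}


def out_of_range(records: list[dict[str, Any]], field: str,
                 low: float, high: float) -> list[int]:
    """Return the positions of records whose numeric value falls outside [low, high]."""
    flagged = []
    for i, record in enumerate(records):
        value = record.get(field)
        if is_missing(value):
            continue  # missing values are reported separately by missing_counts
        if not (low <= float(value) <= high):
            flagged.append(i)
    return flagged


# Hypothetical sample data, invented purely for illustration.
records = [
    {"participant_id": "P001", "age": 34, "weight_kg": 71.5},
    {"participant_id": "P002", "age": None, "weight_kg": 64.0},
    {"participant_id": "P003", "age": 29, "weight_kg": ""},
    {"participant_id": "P004", "age": 212, "weight_kg": 80.2},
]

print(missing_counts(records, ["participant_id", "age", "weight_kg"]))
# {'participant_id': 0, 'age': 1, 'weight_kg': 1}
print(out_of_range(records, "age", 0, 120))
# [3]
```

A check like this only flags records for attention. Deciding whether to correct, impute or exclude a flagged value remains a manual verification step that should itself be documented.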
These are the activities involving the systems, processes and frameworks used to analyse data and to produce synthesised information as output. This output must be replicable, robust and scientifically sound.
Documentation of the analysis plan, the process undertaken and the data used will be required regardless of the processing, transforming, modelling and analysis techniques used (which will be specific to each field of research).
These are the activities involved in publishing research output (report, research paper, thesis, presentation, performance, artwork, etc.) in line with the purpose of the research and your obligations to funders and other parties.
This is the end result of the research, which is shared with others to serve the public good.
This will often involve publishing data sets in data repositories and making data publicly available with limited controls (e.g. only licensing terms or open access).
These are the activities involved in sharing research data outside the research team for reuse or a secondary purpose, with controls over how the data is shared (controls may be contractual, physical or system-based).