At the CGIAR Big Data in Agriculture 2019 Convention, I moderated the “Good Practices Throughout the Research Data Life Cycle” panel.
Good Practices Throughout the Research Data Life Cycle panel at the CGIAR Big Data in Agriculture 2019 Convention. Photo courtesy of Abenet Yabowork (ILRI).
Over the past few years, all of the CGIAR Research Centers have been working to enhance their overall approach to research data management from the research design phase through to making research outputs findable, accessible, interoperable, and re-usable; this panel was designed to share good practices that could be adopted by other organizations.
I was joined by five colleagues from several of the CGIAR Research Centers:
- Henry Juarez – Systems and Data Management Officer (International Potato Center – CIP)
- Marie-Angelique Laporte – Associate Scientist (Bioversity International)
- Jacquie Muliro – Research Knowledge and Data Manager (WorldFish)
- Harrison Njamba – Data Systems Manager (International Livestock Research Institute – ILRI)
- Abhishek Rathore – Principal Scientist & Theme Leader for Statistics, Bio-Informatics & Data Management (International Crops Research Institute for the Semi-Arid Tropics – ICRISAT)
Some of the key themes highlighted by multiple panelists:
Organization-wide approaches to managing research data are needed. A key practice highlighted by many of the panelists, and one consistent with global trends and best practices, has been to move from ad hoc, individualized, and project-specific approaches to organization-wide approaches for research data management.
It is critical to support the entire life cycle and not focus only on sharing and publishing data. With so much emphasis on open access, open data, and the FAIR principles, it is imperative that we not overlook all of the phases of the research life cycle that precede data sharing and publication.
Good practices start at the time of project inception, during the proposal phase, with realistic budgeting and planning, and they continue throughout all phases of data collection, data analysis, and beyond.
Re-use of data is also critical. Big data offers the potential for aggregating, combining, re-using, and expanding upon research. But to do this, data must be findable, accessible, interoperable, and re-usable. Applying proper security measures to ensure the confidentiality, availability, and integrity of data; adopting commonly used ontologies; applying data curation techniques; and publishing data with machine-readable licenses all help to achieve data re-use and the promise of big data.
Support from the top matters when it comes to changing organizational approaches to research data management. Shifting organizational culture and organization-wide approaches is always a daunting task, but leadership commitment to such a change is often a critical factor. When senior leaders and high-level managers demonstrate their commitment to new approaches to research data management, data governance, applying stronger security measures, using common ontologies, and investing time in data curation, the rest of the organization notices.
CGIAR Big Data in Agriculture 2019: Highlights of Good Practices from CGIAR Centers
Abhishek Rathore, Theme Leader for Statistics, Bio-Informatics & Data Management, provided a brief overview of how ICRISAT approaches research data management, including a visual of the ICRISAT data ecosystem.
The Research Data Ecosystem at ICRISAT
Abhishek noted that establishing the Statistics, Bioinformatics, and Data Management (SBDM) Unit dramatically changed data compliance and sharing within the organization. He explained that the SBDM Unit works with researchers and projects from throughout ICRISAT to help design good data practices for projects. Unit members look at how data will flow into a new project and can advise on all aspects of data management, including storage management.
Creating this centralized unit and having members of this unit work closely with project-based researchers was a critical success factor for ICRISAT to move towards putting good practices in place throughout the Institute.
Harrison Njamba, Data Systems Manager, talked about data collection and data analysis. In terms of data collection, Harrison noted that research projects increasingly have been shifting from paper-based to mobile data collection over the past few years. Mobile data collection is taking off, using tools such as ODK and KoboToolbox.
Harrison also highlighted some of the work the Research and Informatics Unit is doing to increase awareness of best practices for structuring data sets in order to make data as useful as possible. Examples include using numeric codes rather than text when coding data for analysis in Stata, R, and SPSS, and shifting towards the adoption of common ontologies.
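To illustrate the point about numeric codes, here is a minimal sketch of what coding text responses against a codebook might look like. The variable, labels, codes, and missing-value convention are all hypothetical and were not presented by the panel; they only show the general idea.

```python
# Hypothetical codebook: the labels and codes below are illustrative only.
SEX_CODES = {"female": 1, "male": 2}
MISSING = -99  # assumed convention for missing or unrecognized answers


def encode(responses, codebook, missing=MISSING):
    """Map free-text answers to numeric codes, ignoring case and stray whitespace."""
    return [codebook.get(str(r).strip().lower(), missing) for r in responses]


raw = ["Female", "male ", "FEMALE", "unknown"]
coded = encode(raw, SEX_CODES)
print(coded)
```

Keeping the codebook alongside the data set means the numeric values can be turned back into labels in whichever statistical package is used later.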
An important point stressed by Harrison: domain knowledge is critical for ontologies. Although ILRI has a centralized data management unit like many of the organizations represented on the panel, it is essential for researchers with subject matter expertise to be involved in determining how concepts are expressed in ontologies.
Following on Harrison’s comments regarding ontologies, Marie-Angelique Laporte, Associate Scientist and ontology expert, discussed the importance of using ontologies from the start of a project. Rather than trying to retro-fit ontologies onto data after it has been collected, today’s good practices emphasize designing data models using existing, widely-adopted ontologies such as the Crop Ontology, Gene Ontology, and Trait Ontology — in other words, using common terminology and field structure to describe traits, variables, and species across agricultural research projects to allow for interoperability between datasets.
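As a rough sketch of what designing with an ontology from the start could mean in practice, the snippet below checks a data set's column names against a mapping to ontology variable identifiers before any data is collected. The column names and the "EX:" identifiers are placeholders made up for this example; they are not real Crop Ontology, Gene Ontology, or Trait Ontology terms.

```python
# Hypothetical mapping from local column names to ontology variable IDs.
# The "EX:" identifiers are placeholders, not real ontology terms.
VARIABLE_MAP = {
    "plant_ht_cm": "EX:0000001",
    "tuber_wt_g": "EX:0000002",
}


def annotate_columns(columns, variable_map):
    """Split columns into those with an ontology ID and those still unmapped."""
    mapped = {c: variable_map[c] for c in columns if c in variable_map}
    unmapped = [c for c in columns if c not in variable_map]
    return mapped, unmapped


mapped, unmapped = annotate_columns(
    ["plant_ht_cm", "tuber_wt_g", "notes"], VARIABLE_MAP
)
print(mapped)
print(unmapped)
```

Any column that comes back unmapped is a prompt to talk with a subject matter expert about which existing term fits, rather than inventing a new label.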
Marie-Angelique also highlighted the importance of proper planning and budgeting for data. She noted that with resources becoming strained, projects are often looking for ways to cut budgets. While projects still want support for data management, they don’t want to pay for it, which is where support from leaders can make an impact. Bioversity is moving towards incorporating a line item for data management support in proposals for large projects. This is one way to formalize data management support throughout a project, rather than only budgeting for data sharing or publishing data at the end of a project.
Jacquie Muliro, the Research Knowledge and Data Manager, stressed the importance of leadership when it comes to data management. She noted that organizational thinking regarding data has evolved and that the organization now appreciates data as being valuable in itself. One major step in this evolution was ratifying an organizational Open Access/Data Management policy, one that is based on the CGIAR Open Access & Data Management Policy but is specific to WorldFish.
In addition to the policy, WorldFish has invested in people, processes, and platforms. The organization has key platforms in place with Dataverse and DSpace, and the centralized data management team has been focused on writing easy-to-use documentation and processes. The goal is to make it as easy as possible for researchers to follow good practices and use the platforms WorldFish has put in place for data management.
Henry Juarez, the Systems and Data Management Officer, shared how CIP has aligned data management with the organization’s project life cycle, a formalized process for managing projects from beginning to end.
The Project Life Cycle and Data Life Cycle at CIP
Formalizing the roles and responsibilities and expectations for data management in connection to the project life cycle has been a big win for the organization. CIP also has an organization-specific Open Access & Data Management Policy as well as Research Data Management Guidelines and Procedures. Processes and practices are outlined in standalone annexes that support the Research Data Management Guidelines and Procedures, all of which link up to project life cycle phases.
Another good practice at CIP: data sprints. Data sprints have been held to encourage researchers to publish data sets with proper documentation. The event is a competition, with the winner receiving a prize.
While organizations have been solidifying good research practices throughout a project’s life cycle, work remains. None of the organizations represented on the panel are yet incorporating data management directly into performance evaluations, although all panelists indicated that long-term, sustainable incentives and/or accountability are needed.
All of the organizations indicated that they are still working to shift good practices earlier into the life cycle. Abhishek noted that ICRISAT is in the process of redrafting its Open Access/Data Management Policy to require data management plans for all projects, whether or not a donor requests one. Furthermore, a data manager would need to approve the data management plan.
All of the panelists reiterated the message that if data is in good shape from the beginning, it is much easier to publish and get to re-use at the end of a project. Continued work is needed to support researchers from day 1!