At the CGIAR Big Data in Agriculture 2019 Convention, FireOak Strategies moderated the “Good Practices Throughout the Research Data Life Cycle” panel.
Over the past few years, all of the CGIAR Research Centers have been working to improve their approach to research data management, from the research design phase through to making research outputs findable, accessible, interoperable, and re-usable. This panel was designed to share good practices that other organizations could adopt.
The panelists included colleagues from several of the CGIAR Research Centers:
- Henry Juarez – Systems and Data Management Officer (International Potato Center – CIP)
- Marie-Angelique Laporte – Associate Scientist (Bioversity International)
- Jacquie Muliro – Research Knowledge and Data Manager (WorldFish)
- Harrison Njamba – Data Systems Manager (International Livestock Research Institute – ILRI)
- Abhishek Rathore – Principal Scientist & Theme Leader for Statistics, Bio-Informatics & Data Management (International Crops Research Institute for the Semi-Arid Tropics – ICRISAT)
Key themes highlighted by multiple panelists:
- Organization-wide approaches to managing research data are needed. A key practice highlighted by many panelists, and consistent with global trends, has been to move from ad hoc, individualized, and project-specific approaches to organization-wide approaches for research data management.
- Supporting the entire life cycle is critical, not just data sharing and publication. Emphasis on open access, open data, and the FAIR principles should not overlook the phases of the research life cycle that precede data sharing and publication.
- Good practices start at project inception, during the proposal phase with realistic budgeting and planning, and continue throughout all phases of data collection, data analysis, and beyond.
- Data re-use is critical. Big data offers the potential for aggregating, combining, re-using, and expanding upon research. To enable re-use, data must be findable, accessible, interoperable, and re-usable. Applying proper security measures, adopting commonly-used ontologies, applying data curation techniques, and publishing data with machine-readable licenses all support data re-use.
- Support from organizational leadership matters. Changing culture and organization-wide approaches is daunting, but leadership commitment is often a critical success factor. When senior leaders demonstrate their commitment to research data management, data governance, security, and curation, the rest of the organization notices.
CGIAR Big Data in Agriculture 2019: Highlights of Good Practices from CGIAR Centers
International Crops Research Institute for the Semi-Arid Tropics (ICRISAT)
Abhishek Rathore, Theme Leader for Statistics, Bio-Informatics & Data Management, provided an overview of ICRISAT's approach. Establishing the Statistics, Bioinformatics, and Data Management (SBDM) Unit dramatically improved data compliance and sharing. The SBDM Unit advises researchers and projects throughout ICRISAT on designing effective data practices, including planning for data flow and storage. Creating a centralized unit to work closely with project-based researchers has been critical for implementing good practices institute-wide.
International Livestock Research Institute (ILRI)
Harrison Njamba, Data Systems Manager, discussed the shift from paper-based to mobile data collection (using tools such as ODK and KoboToolbox). ILRI emphasizes best practices for structuring data sets to maximize utility—such as using numeric codes over text and adopting common ontologies. Domain knowledge is essential for designing effective ontologies, and researchers with subject matter expertise need to be involved.
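To make the idea of numeric codes and ontology-linked values concrete, here is a minimal sketch in Python. It is not ILRI's actual implementation: the codebook, the species labels, the `EX:` term IDs, and the function names are all hypothetical placeholders chosen for illustration.

```python
# Hypothetical codebook: numeric code -> (label, ontology term ID).
# The "EX:" term IDs are placeholders, not real ontology identifiers.
CODEBOOK = {
    1: ("cattle", "EX:0001"),
    2: ("goat", "EX:0002"),
    3: ("sheep", "EX:0003"),
}

# Reverse lookup from normalized label to numeric code.
LABEL_TO_CODE = {label: code for code, (label, _) in CODEBOOK.items()}


def encode_species(raw: str) -> int:
    """Map a free-text survey answer to its numeric code."""
    key = raw.strip().lower()
    if key not in LABEL_TO_CODE:
        # Fail loudly so unknown values get reviewed, not silently stored.
        raise ValueError(f"unknown species label: {raw!r}")
    return LABEL_TO_CODE[key]


def decode_species(code: int) -> str:
    """Return the human-readable label for a numeric code."""
    return CODEBOOK[code][0]


# Inconsistent free-text entries collapse to consistent numeric codes.
responses = ["Cattle", " goat", "SHEEP", "cattle "]
codes = [encode_species(r) for r in responses]
```

Storing the code alongside a published codebook keeps the data set compact and consistent, while the ontology term ID gives other researchers a shared reference point for what each code means.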
Bioversity International
Marie-Angelique Laporte, Associate Scientist and ontology expert, highlighted the importance of using established ontologies from the start of a project. Good practices now emphasize designing data models using widely adopted ontologies (e.g., the Crop Ontology, Gene Ontology, and Trait Ontology) to enable interoperability across datasets. She also addressed the need for proper planning and budgeting for data management, suggesting that formalizing support for data management in project proposals is vital.
WorldFish
Jacquie Muliro, Research Knowledge and Data Manager, emphasized the role of leadership in driving data management policy. WorldFish ratified its own Open Access/Data Management policy, based on CGIAR guidelines but customized for the organization. Investments in people, processes, and platforms (such as Dataverse and DSpace) have supported this shift. The goal is to make good practices easy for researchers through clear documentation and efficient processes.
The International Potato Center (CIP)
Henry Juarez, Systems and Data Management Officer, described how CIP aligns data management with the project life cycle, formalizing roles, responsibilities, and expectations. CIP maintains an organizational Open Access & Data Management Policy and dedicated Research Data Management Guidelines and Procedures. “Data sprints” encourage researchers to publish data sets with proper documentation, with prizes awarded for participation.
Moving Forward
While the centers have advanced good research practices across project life cycles, work remains. None of the centers has yet incorporated data management into performance evaluations, though there is agreement on the need for sustainable incentives and accountability. All centers are working to shift best practices earlier into their project cycles. For example, ICRISAT is redrafting its Open Access/Data Management Policy to require data management plans for all projects, regardless of donor mandates. Data managers will approve these plans.
The consensus: If data is managed well from the beginning, it is much easier to publish and promote re-use at the end of a project. Continued, proactive support for researchers from day one remains essential.