In this interview Amy Nurnberger, who is the Program Head, Data Management Services and Interim Department Head, Data and Specialized Services at MIT Libraries, answers questions from Katie Mika about the importance of creating open infrastructure to share and, crucially, cite research data. Amy discusses:
A current MIT project to automatically index research data outputs published by Institute affiliates in open repositories around the world;
The importance of creating and upholding systems of openness and equity in open access movements;
The value of normalizing data citation for developing an understanding of data use and sharing practices; and
Resisting impulses to treat potential or emerging data citation metrics like existing article citation metrics, which may replicate existing systemic problems.
Thank you for the invitation to participate! My name is Amy Nurnberger (she/her), and I currently hold the roles at the MIT Libraries of Program Head, Data Management Services, and Interim Department Head, Data and Specialized Services. At MIT, I also co-chair the ad-hoc Research Data Group and the Open Access Task Force Implementation Team.
Outside of MIT, I am an adjunct assistant professor at Teachers College, Columbia University in the Learning Analytics program, teaching management of education data. I also represent MIT Libraries on the Research Data Alliance’s Organisational Advisory Board and contribute to a variety of editorial boards and advisory committees.
It is perhaps needless to say, but the following are my opinions, and not those of any associated organizations. In all of these roles, my overarching goal is to increase the impact and effectiveness of research through good management and stewardship of diverse research outputs, with a focus on research data (broadly interpreted). Bringing about good management and stewardship means focusing on infrastructure (people, policies, and technologies) and on systems (individuals, teams, practices, and processes), in context and in interaction. Implicit in all of this work are the principles of openness, equity, and respect. Repeatedly, scholarly systems are demonstrably more successful in achieving both research and societal goals when they strive for openness and equity while respecting their stakeholders and collaborators.
As you can tell from the names, our main focus is around issues concerning research data. However, we also collaborate closely with our colleagues in Scholarly Communications and Collections Strategy (SCCS) in the larger space of Open Scholarship. The connecting theme is that in order for scholarship to be shared, built on, and extended productively it must be accomplished and promulgated in ways that support these activities rather than raise barriers to them. Within the space of research data, we actively support training, guidance, practices, and tools that contribute to the production of data and research outputs that can be made open, in ways that align with the FAIR principles of findable, accessible, interoperable and reusable for our research community. We also participate in conversations at the institutional, regional, national, and global levels around effecting Open Data and Open Scholarship at each of these levels. Internally, we work to build awareness of the many roles that libraries play in the research data and Open Data spaces. One project we are currently working on across the MIT Libraries, and with select faculty partners, is creating a tool that can automatically index MIT-produced research data products from open repositories. We think this is an exciting and useful way to demonstrate the breadth of what is available and the importance of Open Data.
The types of open access that work in partnership between stakeholders to support researcher and societal goals, that create and uphold systems of openness and equity, and that allow a diversity of participants and perspectives. Latin America has some exemplary models from which we should be actively learning more.
In no way do I think I can explain it more effectively and completely than the Force11 Data Citation Principles and Make Data Count project have! The first and most necessary realization is that data, or the information collected, assembled, derived, created, etc. during the course of research and scholarship, are the foundations of any subsequent findings. Without understanding where these come from, or how they are used, we lose a large part of the provenance of scholarship, and it makes the whole ‘standing on giants’ shoulders and seeing further’ thing much more difficult! Briefly, normalizing data citation is a necessary first step for three things: developing an understanding of data use and sharing practices as part of the conduct of scholarship and the scholarly record, positioning data as a first-class research object, and creating incentive systems around the data use and sharing practices that most benefit the conduct of scholarship.
There could be, but currently they would be a poor representation of, well, anything. The basic research necessary to develop meaningful metrics around data citation has been difficult to do because of the lack of normalized data citation practices.
The initial challenge is creating a normalized and open practice of data citation that can be tracked and on which bibliometric research can be conducted. Following that, we must learn from current systems of tracking article citations and not recapitulate their issues.
To start, we must fight the urge to treat data citation metrics like article citation metrics, and simply repeat all the issues of misrepresentation, misinterpretation, bias, and inequity present in that system. Building on that, their use in incentive systems must be considered so the metrics created actually reward the behavior we hope to see in the conduct of scholarship when it comes to data use and sharing.
Thank you, again, for inviting me to participate! As I mentioned, we are working on a tool that will automatically index MIT-produced research data. Tools like this will help us understand the breadth and depth of our research endeavors and give the scholarly and public community better access to, and understanding of, what MIT offers when it says,
“The Institute is committed to generating, disseminating, and preserving knowledge, and to working with others to bring this knowledge to bear on the world’s great challenges.”
As a starting point, the tool will harvest metadata hosted in select open repositories, but we hope to extend its effective reach in collaboration with other members of the scholarly community who may be interested in developing similar tools for their institutions (please, contact me!). Truly, the only way tools like this will realize their potential is when data citation becomes normalized, and we can definitively incorporate data use and sharing into the scholarly incentives system.
Text: © 2021 the President and Fellows of Harvard College and Amy Nurnberger, and licensed under a Creative Commons Attribution (CC BY 4.0) license