Data and the infrastructure that is necessary to support their preservation, access, use, and reuse are increasing in size and complexity, making digital curation a critical research area at present. Digital curation refers to maintaining, preserving, and adding value to digital research data throughout their lifecycles. As a digital curation researcher, I study a broad range of important issues related to data and data sharing as well as data infrastructure and capacity. My research interests include: digital repositories, data sharing practices, mass digitization, research data management, trust, security, and users’ perceptions of archives and archival content.
Selected Current Projects
The Bridging the Gap between Scientists, Institutional Repositories, and Data Management Practices
In this Early Career Development project, I will conduct a three-year empirical investigation into the use of data repositories by scientists. The research will investigate how institutional repositories (IRs), data management plans, and librarian expertise support the sharing and preservation of research data. The research will expand knowledge about scientists’ data needs and practices in domains where attitudes toward data sharing are currently evolving. The investigation will inform best practices for librarians who decide which data repositories to recommend to researchers, what features to add to IRs, when to use IRs for handling research data, and when alternative data repositories are more appropriate. I am the principal investigator of this research. I was awarded $330,408 from the Institute of Museum and Library Services (IMLS) Laura Bush 21st Century Librarian Program to conduct this research.
The Library Capacity Assessment and Development for Big Data Curation Project
The goal of this project is to develop a conceptual framework for assessing libraries’ capacity for big data curation, which is essential for implementing sustainable and scalable big data curation programs. To develop our framework, we are performing a systematic review of the literature on organizational capacity, data curation, and big data practices (Yoon & Donaldson, in preparation); a national online survey of libraries (Yoon & Donaldson, 2019); and in-person focus groups with various classes of stakeholders (e.g., library staff, information technology (IT) staff, and researchers who collect and/or analyze big data) to identify relevant dimensions of capacity for big data curation from their perspectives (Donaldson & Yoon, in preparation). From our findings, we hope to provide a foundation for developing a toolkit for academic and public libraries, since both types of libraries increasingly face challenges regarding big data and are increasingly expected to help preserve and provide access to these data. Dr. Ayoung Yoon, Assistant Professor of Library and Information Science at IUPUI, is the PI on this project, and I am co-PI. This $49,773 award is funded by the National Leadership Grants for Libraries (NLG-L) program of the Institute of Museum and Library Services (#LG-72-17-0139-17).
The Impact of Trust in Archives on Trust in Archival Content in the Digital Age Project
For centuries, archives have served as valuable, dependable sources of information. Archives preserve documents that provide accountability of government, protect citizens’ rights, and solve historical puzzles. More recently, archives have begun digitizing large quantities of content to provide greater access and to address users’ information needs and preferences. Also, archives are increasingly collecting and preserving born-digital primary source materials as more organizations, governments, agencies, and individuals create records and documents in digital form. Although trust in records has been an area of concern in archival science research for quite some time, the digital environment raises new questions about trust in digital documents and records. In particular, research on users’ trust in digital archival content has begun to emerge, raising new questions about what trust means and how users interpret the term, as well as what influences users’ perceptions of trust in digital archival content, broadly defined. The objective of my research project is to develop a conceptual framework for understanding the influence of users’ trust in archives on their trust in digital archival content. This conceptual framework will be the first of its kind to combine perceptions of trust at the document level and at the archive or repository level in a unified framework. The project also includes a survey of over 2,000 archives users and potential archives users to test the framework. A report of the findings from this project is currently in press in Archivaria.
Selected Past Projects
The Media Digitization and Preservation Initiative Project
The Media Digitization and Preservation Initiative (MDPI) is a massive project representing Indiana University’s comprehensive work to preserve historical and cultural time-based media for the research, education, and enrichment of future generations. It involves digitizing time-based media that has been deemed to be of scholarly value by experts. My research centers on understanding how the MDPI implements the Digital Curation Centre’s (DCC) Curation Lifecycle Model (Higgins, 2008). Since its creation nearly a decade ago, the DCC Curation Lifecycle Model has become the quintessential framework for understanding digital curation. It outlines specific actions for digital curation. Projects/programs are to: 1) create or receive digital research data, 2) appraise and select digital research data, 3) ingest digital research data, 4) perform preservation actions on digital research data, 5) store research data, 6) ensure access, use, and the ability to reuse those data, and 7) transform those data, either by migrating them into a different format or by creating a subset, by selection or query, to create newly derived results, perhaps for publication as necessary. Findings from my research underscore the success of MDPI in performing digital curation by illustrating the ways it implements each of the model’s components.
This study contributes to the digital curation research literature in two specific ways. First, it adds to the body of research literature aimed at performing empirical tests of the DCC Curation Lifecycle Model. Second, this study advances the concept of mass digitization by providing a more nuanced definition for the term. In addition to defining mass digitization as “conversion of materials on an industrial scale” as Coyle (2006, p. 641) suggests, I recommend considering six characteristics when determining whether a digitization project is a mass digitization project: 1) Aggregation and Production, 2) Openness, 3) Business Model and Cost, 4) Scope, 5) Format, and 6) Time Spent Digitizing. More research on and comparison of mass digitization projects could validate my proposed definition of mass digitization, or help to refine it.
A peer-reviewed journal article based on these findings is published in the International Journal of Digital Curation (IJDC) (Donaldson, McClanahan, Christiansen, Bell, Narlock, Martin, & Suby, 2018).
The Perspectives on Sharing Neutron Data at Oak Ridge National Laboratory Project
My research on data sharing practices has focused on perspectives on sharing data that are very expensive to produce. The rationale for focusing on these data is that their expense could be further justified if more researchers can use these data beyond those who originally produced them. To start, I have focused on understanding the perspectives of data consumers, managers, and producers on sharing neutron data. I found that the neutron scientists who participated in my study had an interest in reusing others’ data. They could imagine important scenarios for data reuse, including: 1) comparing or verifying the results of prior studies against their own measurements, and 2) testing new theories using existing data (Donaldson, Martin, & Proffen, 2017). This is a significant finding because within the field of neutron science as well as across many different scientific research communities, not all are convinced of the value of data sharing.
Additionally, based on this study’s findings, I have produced a framework called the Consumers Managers Producers (CMP) Model for understanding the interplay of data consumers, managers, and producers regarding reuse of neutron data at ORNL. This model may be useful for describing the interactions of similar classes of stakeholders at other national laboratories where neutron data are produced. The CMP Model may also apply to other scientific domains that utilize expensive research data; however, more empirical data need to be collected to test the model in this regard. I have a research article that reports on the initial phase of this research in the SciDataCon special issue of Data Science Journal (Donaldson, Martin, & Proffen, 2017). I also gave a research presentation and poster on this project at the Research Data Alliance 8th plenary in Denver, CO during International Data Week. This research was funded by the United States Department of Energy.
The Perceived Value of Audit and Certification of Trustworthy Digital Repositories Project
Digital repository trustworthiness is one of the most pressing issues raised in digital curation research. Members of the digital curation research community understand that data will not preserve themselves. Data infrastructure, such as digital repositories, has to be built with the goal of long-term preservation in mind if those data are going to be accessible in the future. Also, members of the digital curation research community understand that anyone can say that a digital repository is trustworthy. It is much harder and more important to provide evidence that an organization responsible for preserving and protecting data for the long term is actually able to do so. The Data Seal of Approval (DSA) is one of the most widely used standards for Trusted Digital Repositories to date. Those who developed this standard have articulated seven main benefits of acquiring DSAs: 1) Stakeholder confidence, 2) Improvements in communication, 3) Improvement in processes, 4) Transparency, 5) Differentiation from others, 6) Awareness raising about digital preservation, and 7) Less labor- and time-intensive. Little research has focused on whether and how those who have acquired DSAs actually perceive these benefits. Consequently, my study examines the benefits of acquiring DSAs from the point of view of those who have them (Donaldson, Dillo, Downs, & Ramdeen, 2017). In a series of 15 semi-structured interviews with representatives from 16 different organizations, participants described the benefits of having DSAs. Findings suggest that participants experience all seven benefits that those who developed the standard promised. Additionally, the findings reflect the greater importance of some of those benefits as compared to others.
For example, participants mentioned the benefits of Stakeholder confidence, Transparency, Improvement in processes, and Awareness raising about digital preservation more frequently than they discussed Less labor- and time-intensive (e.g., it being less labor- and time-intensive to acquire DSAs than to become certified by other standards), Improvements in communication, and Differentiation from others. Participants also mentioned two additional benefits of acquiring DSAs that are not explicitly listed on the DSA website but were very important to them: 1) the impact of acquiring the DSA on documentation of their workflows, and 2) assurance that they were following best practice. A report of the study, including implications and future directions for research, is published in a peer-reviewed article in IJDC. I also presented posters on this research at the Research Data Alliance 6th plenary in Paris, France and the Research Data Alliance 7th plenary in Tokyo, Japan. I also delivered a presentation on this research at the Archival Education and Research Institute at Kent State University (AERI’2016). This is the first in-depth, empirical systematic analysis of the perceived benefit of certification of TDRs. This research is important because it helps the digital curation community to understand the extent to which all of the effort to establish TDRs actually has value to those who undergo audit and certification.
The Securing Trustworthy Digital Repositories Project
In addition to studying the benefits of audit and certification of TDRs, I have investigated how those who are responsible for managing and securing digital repositories think about the concept of security and the security criteria in standards for TDRs. This is important because how staff members think about security may affect their approach toward securing their digital repositories. I conducted surveys and interviews with digital repository staff members to understand their attitudes about three central principles of security as defined in the computer science research literature: confidentiality, integrity, and availability. I found empirical, statistical support for digital repository staff members being more concerned about the integrity than about the availability or confidentiality of the digital resources under their care. I also found that the digital repository staff who participated in my studies considered security a prerequisite for trustworthiness. A peer-reviewed conference paper based on these findings is published in the proceedings of the 13th International Conference on Digital Preservation (iPres’2016) (Donaldson, Hill, Dowding, & Keitel, 2016).