Data, and the infrastructure necessary to support their preservation, access, use, and reuse, are increasing in size and complexity, making digital curation a critical area of research today. Digital curation refers to maintaining, preserving, and adding value to digital research data throughout their lifecycles. As a digital curation researcher, I study a broad range of issues related to data and data sharing, as well as data infrastructure and capacity. My research interests include digital repositories, data sharing practices, mass digitization, research data management, trust, security, and users’ perceptions of archives and archival content.
Selected Current Projects
The Bridging the Gap between Scientists, Institutional Repositories, and Data Management Practices Project
In this Early Career Development project, I will conduct a three-year empirical investigation into scientists’ use of data repositories. The research will investigate how institutional repositories (IRs), data management plans, and librarian expertise support the sharing and preservation of research data. It will expand knowledge about scientists’ data needs and practices in domains where attitudes toward data sharing are still evolving. The investigation will inform best practices for librarians who decide which data repositories to recommend to researchers, what features to add to IRs, when to use IRs for handling research data, and when alternative data repositories are more appropriate. I am the principal investigator of this research, which is funded by a $330,408 award from the Institute of Museum and Library Services (IMLS) Laura Bush 21st Century Librarian Program.
The Library Capacity Assessment and Development for Big Data Curation Project
The goal of this project is to develop a conceptual framework for assessing libraries’ capacity for big data curation, which will be essential in implementing sustainable and scalable big data curation programs. Dr. Ayoung Yoon, Assistant Professor of Library and Information Science at IUPUI, is the PI on this project, and I am co-PI. Capacity assessment is a critical tool for planning, monitoring, and evaluating programs before defining outcomes or launching curation efforts. We hope our findings will provide a foundation for developing a toolkit for academic and public libraries, both of which increasingly face challenges regarding big data and are expected to help preserve and provide access to these data. To develop the framework, we will conduct a systematic review of the literature on organizational capacity, data curation, and big data practices; a large-scale online survey of libraries; and in-person focus groups. My prior experience with focus groups and surveys will help us understand the perceptions and perspectives of various stakeholders. Our research will give academic and public libraries a reference point when considering whether they are up to the challenge of curating big data. We were awarded a $49,773 planning grant from the Institute of Museum and Library Services (IMLS) to conduct this research.
The Impact of Trust in Archives on Trust in Archival Content in the Digital Age Project
For centuries, archives have served as valuable, dependable sources of information. Archives preserve documents that hold governments accountable, protect citizens’ rights, and help solve historical puzzles. More recently, archives have begun digitizing large quantities of content to provide greater access and to address users’ information needs and preferences. Archives are also increasingly collecting and preserving born-digital primary source materials as more organizations, governments, agencies, and individuals create records and documents in digital form. Although trust in records has long been a concern in archival science research, the digital environment raises new questions about trust in digital documents and records. In particular, emerging research on users’ trust in digital archival content raises new questions about what trust means and how users interpret the term, as well as what influences users’ perceptions of trust in digital archival content, broadly defined. The objective of my research project is to develop a conceptual framework for understanding the influence of users’ trust in archives on their trust in digital archival content. This will be the first conceptual framework to combine perceptions of trust at the document level and at the archive or repository level in a single, unified framework. The project also includes a survey of over 2,000 archives users and potential archives users to test the framework. A report of the findings from this project is currently in press in Archivaria.
Selected Past Projects
The Media Digitization and Preservation Initiative Project
Indiana University has a strong record of commitment to audio preservation and has earned recognition as a national leader in research and development of best practices in the field of digital curation. The Media Digitization and Preservation Initiative (MDPI) is a massive project representing Indiana University’s comprehensive work to preserve historical and cultural time-based media for the research, education, and enrichment of future generations. The project involves digitizing time-based media that experts have deemed to be of scholarly value.
My research project centers on understanding how the MDPI project implements the Digital Curation Centre’s (DCC) Curation Lifecycle Model. The lifecycle model outlines specific actions for digital curation. Under the model, projects and programs should 1) create or receive digital research data, 2) appraise and select digital research data, 3) ingest digital research data, 4) perform preservation actions on digital research data, 5) store research data, 6) ensure access to, use of, and the ability to reuse those data, and 7) transform those data as necessary, either by migrating them into a different format or by creating a subset, through selection or query, to produce newly derived results, perhaps for publication. I have presented findings from my in-depth, qualitative case study of how the MDPI project approaches digital curation at the Research Data Alliance 9th plenary in Barcelona, Spain; the Archival Education and Research Institute (AERI’2017) at the University of Toronto; Digital Directions’2017 in Seattle, WA; and the Black Doctoral Network Conference (BDN’2017) in Atlanta, GA. A peer-reviewed journal article based on these findings has been published in the International Journal of Digital Curation (IJDC) (Donaldson, McClanahan, Christiansen, Bell, Narlock, Martin, & Suby, 2018).
This study contributes to the digital curation research literature in two specific ways. First, it adds to the body of research aimed at empirically testing the DCC Curation Lifecycle Model. Researchers have recently begun exploring the utility of the DCC Curation Lifecycle Model for understanding how digital curation is performed in various contexts with different types of digital data, including, but not limited to, video data in social studies of interaction and brain images in psychiatric research. One unexplored research area is the utility of the DCC Curation Lifecycle Model as a lens for understanding digital curation in mass digitization projects. This type of research is critical given the rise of mass digitization projects over the past decade, a trend expected to continue within the cultural heritage and library services domains. Second, this study advances the concept of mass digitization by providing a more nuanced definition of the term. In addition to defining mass digitization as “conversion of materials on an industrial scale,” as Coyle (2006, p. 641) suggests, I recommend considering six characteristics when determining whether a digitization project is a mass digitization project: 1) Aggregation and Production, 2) Openness, 3) Business Model and Cost, 4) Scope, 5) Format, and 6) Time Spent Digitizing. More research on, and comparison of, mass digitization projects could validate my proposed definition of mass digitization or help to refine it.
The Perspectives on Sharing Neutron Data at Oak Ridge National Laboratory Project
My research on data sharing practices has focused on perspectives on sharing data that are very expensive to produce. The rationale for focusing on these data is that their cost could be further justified if researchers beyond those who originally produced them could use them. To start, I focused on understanding the perspectives of data consumers, managers, and producers on sharing neutron data. I partnered with Dr. Thomas Proffen, Director for Neutron Data Analysis and Visualization in the Neutron Sciences Directorate at Oak Ridge National Laboratory (ORNL) in Oak Ridge, TN, on this research. I found that the neutron scientists who participated in my study were interested in reusing others’ data. They could imagine important scenarios for data reuse, including 1) comparing or verifying the results of prior studies against their own measurements, and 2) testing new theories using existing data (Donaldson, Martin, & Proffen, 2017). This is a significant finding because, within neutron science as well as across many other scientific research communities, not all researchers are convinced of the value of data sharing. Additionally, based on this study’s findings, I have produced a framework called the Consumers Managers Producers (CMP) Model for understanding the interplay of data consumers, managers, and producers regarding reuse of neutron data at ORNL. This model may be useful for describing the interactions of similar classes of stakeholders at other national laboratories where neutron data are produced. The CMP Model may also apply to other scientific domains that rely on expensive research data; however, more empirical data need to be collected to test the model in those domains. A research article reporting on the initial phase of this research appears in the SciDataCon special issue of Data Science Journal (Donaldson, Martin, & Proffen, 2017).
I also presented a talk and a poster on this project at the Research Data Alliance 8th plenary in Denver, CO, during International Data Week. This research was funded by the United States Department of Energy.
The Perceived Value of Audit and Certification of Trustworthy Digital Repositories Project
Digital repository trustworthiness is one of the most pressing issues raised in digital curation research. Members of the digital curation research community understand that data will not preserve themselves. Data infrastructure, such as digital repositories, must be built with long-term preservation in mind if the data are to remain accessible in the future. Members of the community also understand that anyone can claim that a digital repository is trustworthy. It is much harder, and more important, to provide evidence that an organization responsible for preserving and protecting data over the long term is actually able to do so. The Data Seal of Approval (DSA) is one of the most widely used standards for Trusted Digital Repositories (TDRs) to date. The developers of this standard have articulated seven main benefits of acquiring a DSA: 1) Stakeholder confidence, 2) Improvements in communication, 3) Improvement in processes, 4) Transparency, 5) Differentiation from others, 6) Awareness raising about digital preservation, and 7) Less labor- and time-intensive. Little research has examined whether and how those who have acquired DSAs actually perceive these benefits. Consequently, my study examines the benefits of acquiring DSAs from the point of view of those who hold them (Donaldson, Dillo, Downs, & Ramdeen, 2017). In a series of 15 semi-structured interviews with representatives from 16 different organizations, participants described the benefits of having DSAs. Findings suggest that participants experience all seven benefits that the developers of the standard promised. The findings also indicate that some of these benefits are more important to participants than others.
For example, participants mentioned the benefits of Stakeholder confidence, Transparency, Improvement in processes, and Awareness raising about digital preservation more frequently than they discussed Less labor- and time-intensive (i.e., acquiring a DSA being less labor- and time-intensive than certification under other standards), Improvements in communication, and Differentiation from others. Participants also mentioned two additional benefits of acquiring DSAs that are not explicitly listed on the DSA website but were very important to them: 1) the impact of acquiring the DSA on documentation of their workflows, and 2) assurance that they were following best practice. The study, including its implications and directions for future research, is reported in a peer-reviewed article in IJDC. I also presented posters on this research at the Research Data Alliance 6th plenary in Paris, France, and the Research Data Alliance 7th plenary in Tokyo, Japan, and delivered a presentation on it at the Archival Education and Research Institute at Kent State University (AERI’2016). This is the first in-depth, systematic empirical analysis of the perceived benefits of certification of TDRs. This research is important because it helps the digital curation community understand the extent to which the effort to establish TDRs actually has value for those who undergo audit and certification.
The Securing Trustworthy Digital Repositories Project
Digital repositories are essential infrastructures for the preservation of digital research data. Digital repositories must prove that they are trustworthy, in the sense that they are actually able to preserve digital materials over the long term. The digital curation community has developed standards with criteria that digital repositories must meet to attain “trustworthy” status. Part of what it means for a digital repository to be trustworthy is for it to be secure. One understudied area in this regard is how those who manage and secure digital repositories think about the concept of security and about the security criteria in standards for Trustworthy Digital Repositories (TDRs). This matters because how staff members think about security may affect how they approach securing their digital repositories. I have researched this topic with colleagues across different academic disciplines, bringing together computer scientists, librarians, and archivists from within and outside the United States. I developed a survey for understanding digital repository staff members’ attitudes toward three central principles of security as defined in the computer science research literature: confidentiality, integrity, and availability. Participants’ responses to this survey provide statistical support for the finding that staff members responsible for managing and securing TDRs are more concerned about the integrity of the digital resources in their care than about their availability or confidentiality. This research appears in the proceedings of the 13th International Conference on Digital Preservation (iPres’2016) (Donaldson, Hill, Dowding, & Keitel, 2016).