Data, and the infrastructure necessary to support their preservation, access, use, and reuse, are increasing in size and complexity, making digital curation a critical research area. Digital curation refers to maintaining, preserving, and adding value to digital research data throughout their lifecycles. As a digital curation researcher, I study a broad range of important issues related to data and data sharing as well as data infrastructure and capacity. My research interests include digital repositories, data sharing practices, mass digitization, research data management, trust, security, and users’ perceptions of archives and archival content.
My research agenda primarily focuses on trust as a social phenomenon. Within the field of digital curation, research has shown that trust operates at different levels and has different definitions at each level, and yet all of these notions of trust are interrelated. For example, trust in a digital repository is related to the reputation of the organization responsible for that repository, while trust in content found within a digital repository is related to the reputation of the author of that information. Even though trust in repositories and trust in information differ, there is a relationship between the two. For example, trust in a digital repository can positively affect trust in content found within that repository. My research seeks to better understand: 1) what it means for a digital repository to be trustworthy, 2) what it means for content found within a digital repository to be perceived as trustworthy, and 3) what type of relationship can exist between trust at both levels (i.e., the repository level and the document or content level). To approach these questions from a variety of perspectives, I have engaged in four research projects:
1. The Digitized Archival Document Trustworthiness Scale (DADTS) Project,
2. The Perceived Value of Audit and Certification of Trustworthy Digital Repositories Project,
3. The Securing Trustworthy Digital Repositories Project, and
4. The Impact of Trust in Archives on Trust in Archival Content in the Digital Age Project.
In addition to trust issues, I also research topics more broadly related to data sharing practices, capacity, and mass digitization. My projects related to these topics include:
5. The Perspectives on Sharing Neutron Data at Oak Ridge National Laboratory Project,
6. The Library Capacity Assessment and Development for Big Data Curation Project,
7. The Media Digitization and Preservation Initiative Project.
1. The Digitized Archival Document Trustworthiness Scale (DADTS) Project
A digital repository is thought to be successful at preserving content (i.e., perceived as trustworthy) when the designated community of users for whom the content is being preserved and made accessible can understand and use the content. Because one first has to understand content in order to judge its trustworthiness, measuring the perceived trustworthiness of content found within a digital repository can act as a proxy for measuring a repository’s effectiveness in preservation (i.e., its trustworthiness). Thus, I created a research project centering on developing, testing, and assessing a scale for measuring trustworthiness perception. A report of the project findings was published in Volume 11, Issue 1 of the International Journal of Digital Curation (IJDC) (Donaldson, 2016). It presents empirical, statistical results on the development of an original scale for measuring genealogists’ concept of trustworthiness as it pertains to digitized genealogical records – the Digitized Archival Document Trustworthiness Scale (DADTS). Since no bona fide measures of the understandability of preserved information by a designated community exist, DADTS represents a novel approach to assessing users’ perception of content. Specifically, digital curators can use their designated community members’ ratings of DADTS items as evidence of perceived understandability, thereby addressing criteria in standards for Trustworthy Digital Repositories related to understanding and monitoring designated communities as well as ensuring the understandability and usability of preserved information by designated communities.
2. The Perceived Value of Audit and Certification of Trustworthy Digital Repositories Project
Digital repository trustworthiness is one of the most pressing issues raised in digital curation research. Members of the digital curation research community understand that data will not preserve themselves. Data infrastructure, such as digital repositories, has to be built with the goal of long-term preservation in mind if those data are going to be accessible in the future. Members of the digital curation research community also understand that anyone can claim that a digital repository is trustworthy. It is much harder, and more important, to provide evidence that an organization responsible for preserving and protecting data for the long term is actually able to do so. The Data Seal of Approval (DSA) is one of the most widely used standards for Trustworthy Digital Repositories to date. Those who developed this standard have articulated seven main benefits of acquiring DSAs: 1) Stakeholder confidence, 2) Improvements in communication, 3) Improvement in processes, 4) Transparency, 5) Differentiation from others, 6) Awareness raising about digital preservation, and 7) Less labor- and time-intensive. Little research has focused on whether and how those who have acquired DSAs actually perceive these benefits. Consequently, my study examines the benefits of acquiring DSAs from the point of view of those who have them (Donaldson, Dillo, Downs, & Ramdeen, in press). In a series of 15 semi-structured interviews with representatives from 16 different organizations, participants described the benefits of having DSAs. Findings suggest that participants experience all seven benefits that those who developed the standard promised. Additionally, the findings reflect the greater importance of some of those benefits as compared to others.
For example, participants mentioned the benefits of Stakeholder confidence, Transparency, Improvement in processes, and Awareness raising about digital preservation more frequently than they discussed Less labor- and time-intensive (e.g., it being less labor- and time-intensive to acquire DSAs than becoming certified under other standards), Improvements in communication, and Differentiation from others. Participants also mentioned two additional benefits of acquiring DSAs, not explicitly listed on the DSA website, that were very important to them: 1) the impact of acquiring the DSA on documentation of their workflows, and 2) assurance that they were following best practice. A report of the study, including implications and future directions for research, appears in a peer-reviewed article that is currently in press in the International Journal of Digital Curation. I also presented posters on this research at the Research Data Alliance 6th plenary in Paris, France and the Research Data Alliance 7th plenary in Tokyo, Japan, and delivered a presentation on this research at the Archival Education and Research Institute at Kent State University (AERI’2016). This is the first in-depth, empirical, systematic analysis of the perceived benefits of certification of TDRs. This research is important because it helps the digital curation community understand the extent to which all of the effort to establish TDRs actually has value to those who undergo audit and certification.
3. The Securing Trustworthy Digital Repositories Project
Digital repositories are essential infrastructures for the preservation of digital research data. Digital repositories must prove that they are trustworthy in the sense that they are actually able to preserve digital materials for the long term. The digital curation community has developed standards with criteria that must be met in order for digital repositories to attain “trustworthy” status. Part of what it means for a digital repository to be trustworthy is for it to be secure. One understudied area in this regard is how those who are responsible for managing and securing digital repositories think about the concept of security and the security criteria in standards for Trustworthy Digital Repositories (TDRs). This is important because how staff members think about security may affect their approach toward securing their digital repositories. I have begun researching this topic with colleagues across different academic disciplines, bringing together computer scientists, librarians, and archivists within as well as outside the United States. Thus far, I have found empirical, statistical evidence that staff members who are responsible for managing and securing TDRs are more concerned about the integrity of the digital resources under their care than about their availability or confidentiality. These findings are based on participants’ responses to a survey I recently developed that is useful for understanding digital repository staff members’ attitudes about three central principles of security as defined in the computer science research literature: confidentiality, integrity, and availability. To date, this research appears in the proceedings of the 13th International Conference on Digital Preservation (iPres’2016) (Donaldson, Hill, Dowding, & Keitel, 2016).
4. The Impact of Trust in Archives on Trust in Archival Content in the Digital Age Project
For centuries, archives have served as valuable, dependable sources of information. Archives preserve documents that hold governments accountable, protect citizens’ rights, and solve historical puzzles. More recently, archives have begun digitizing large quantities of content to provide greater access and to address users’ information needs and preferences. Also, archives are increasingly collecting and preserving born-digital primary source materials as more organizations, governments, agencies, and individuals create records and documents in digital form. Although trust in records has been an area of concern in archival science research for quite some time, the digital environment raises new questions about trust in digital documents and records. In particular, research on users’ trust in digital archival content has begun to emerge, raising new questions about what trust means and how users interpret the term, as well as what influences users’ perceptions of trust in digital archival content, broadly defined. The objective of my research project is to develop a conceptual framework for understanding the influence of users’ trust in archives on their trust in digital archival content. This conceptual framework will be the first of its kind to combine perceptions of trust at the document level and at the archive or repository level in a unified framework. The project also includes semi-structured interviews and surveys to test the framework and to further assess the impact of trust in archives on users’ trust in digital archival content. The initial foundation for the framework that I plan to develop for this project is described in an article I wrote that is currently in press in the Midwest Archives Conference (MAC) Newsletter (Donaldson, 2018).
5. The Perspectives on Sharing Neutron Data at Oak Ridge National Laboratory Project
My research on data sharing practices has focused on perspectives on sharing data that are very expensive to produce. The rationale for focusing on these data is that their expense could be further justified if more researchers could use them beyond those who originally produced them. To start, I have focused on understanding the perspectives of data consumers, managers, and producers on sharing neutron data. I have partnered with Dr. Thomas Proffen, Director for Neutron Data Analysis and Visualization in the Neutron Sciences Directorate at Oak Ridge National Laboratory (ORNL) in Oak Ridge, TN, for this research. Thus far, I have found that the neutron scientists who participated in my study have an interest in reusing others’ data. They could imagine important scenarios for data reuse, including: 1) comparing or verifying the results of prior studies against their own measurements, and 2) testing new theories using existing data (Donaldson, Martin, & Proffen, 2017). This is a significant finding because, within the field of neutron science as well as across many different scientific research communities, not all are convinced of the value of data sharing. Additionally, based on this study’s findings, I have produced a framework called the Consumers Managers Producers (CMP) Model for understanding the interplay of data consumers, managers, and producers regarding reuse of neutron data at ORNL. This model may be useful for describing the interactions of similar classes of stakeholders at other national laboratories where neutron data are produced. The CMP Model may also apply to other scientific domains that utilize expensive research data; however, more empirical data need to be collected to test the model in this regard. A research article reporting on the initial phase of this research appears in the SciDataCon special issue of the Data Science Journal (Donaldson, Martin, & Proffen, 2017).
I also gave a research presentation and poster on this project at the Research Data Alliance 8th plenary in Denver, CO during International Data Week. While this research is currently funded by the United States Department of Energy, I intend to apply for funding from additional federal funding agencies, such as the National Science Foundation, as this research continues.
6. The Library Capacity Assessment and Development for Big Data Curation Project
The goal of this project is to develop a conceptual framework for assessing libraries’ capacity for big data curation, which will be essential in implementing sustainable and scalable big data curation programs. Dr. Ayoung Yoon, Assistant Professor of Library and Information Science at IUPUI, is the PI on this project, and I am co-PI. Assessing capacity is a critical tool for planning, monitoring, and evaluating programs prior to defining outcomes or launching curation. From our findings, we hope to provide a foundation for developing a toolkit for academic and public libraries. Both types of libraries are increasingly facing challenges regarding big data and are expected to help preserve and provide access to these data. To develop the framework, we will perform a systematic review of the literature on organizational capacity, data curation, and big data practices; conduct a large-scale, online survey of libraries; and hold in-person focus groups. My prior experience with focus groups and surveys will help us understand the perceptions and perspectives of various stakeholders. Our research will provide academic and public libraries with a reference point when considering whether they are up to the challenge of curating big data. We were recently awarded a one-year, $49,773 planning grant from the Institute of Museum and Library Services (IMLS) to conduct this research.
7. The Media Digitization and Preservation Initiative Project
Indiana University has a strong record of commitment to audio preservation and has earned recognition as a national leader in research and development of best practices in the field of digital curation. The Media Digitization and Preservation Initiative (MDPI) is a massive project representing Indiana University’s comprehensive work to preserve historical and cultural time-based media for the research, education, and enrichment of future generations. The project involves digitizing time-based media that experts have deemed to be of scholarly value.
My research project centers on understanding how the MDPI project implements the Digital Curation Centre’s (DCC) Curation Lifecycle Model. The lifecycle model outlines specific actions for digital curation. Projects and programs are to: 1) create or receive digital research data, 2) appraise and select digital research data, 3) ingest digital research data, 4) perform preservation actions on digital research data, 5) store research data, 6) ensure access, use, and the ability to reuse those data, and 7) transform those data, either by migrating them into a different format or by creating a subset (by selection or query) to produce newly derived results, perhaps for publication. I have presented findings from my in-depth, qualitative case study analysis of how the MDPI project approaches digital curation at the Research Data Alliance 9th plenary in Barcelona, Spain, the Archival Education and Research Institute (AERI’2017) at the University of Toronto, Digital Directions’2017 in Seattle, WA, and the Black Doctoral Network Conference (BDN’2017) in Atlanta, GA. A peer-reviewed journal article based on these findings is currently under review in IJDC (Donaldson, McClanahan, Christiansen, Bell, Narlock, Martin, & Suby, under review).
This study contributes to the digital curation research literature in two specific ways. First, it adds to the body of research literature aimed at performing empirical tests of the DCC Curation Lifecycle Model. Recently, researchers have begun exploring the utility of the DCC Curation Lifecycle Model for understanding how digital curation is performed in various contexts with different types of digital data. These include, but are not limited to, video data in social studies of interaction and brain images in psychiatric research. One unexplored research area involves the utility of the DCC Curation Lifecycle Model as a lens for understanding digital curation in mass digitization projects. This type of research is critical given the rise of mass digitization projects over the past decade, a trend expected to continue within the cultural heritage and library services domains. Second, this study advances the concept of mass digitization by providing a more nuanced definition for the term. In addition to defining mass digitization as “conversion of materials on an industrial scale,” as Coyle (2006, p. 641) suggests, I recommend considering six characteristics when determining whether a digitization project is a mass digitization project: 1) Aggregation and Production, 2) Openness, 3) Business Model and Cost, 4) Scope, 5) Format, and 6) Time Spent Digitizing. More research on and comparison of mass digitization projects could validate my proposed definition of mass digitization, or help to refine it.