It’s been a long three months of research, but it’s also been really informative. I’ve gone from learning the foundations of metadata and linked data to truly understanding the integral role that metadata plays in describing and understanding the world around us. I have learned that metadata needs to be flexible and able to change with time, circumstances, and our understanding of others if it is to remain relevant. Researchers often have implicit, unintentional biases that affect how they conduct research, especially if the only tools available to them are the vocabularies, algorithms, and search engines that have returned biased results in the past. However, if tools like the one being proposed can be developed and implemented in metadata, search, and other areas, perhaps bias in metadata and searches can be slowly but surely reduced.
We’ve seen a general overview of why and how biased metadata impacts marginalized communities. However, each of these communities has been deeply impacted by the issues created by biased metadata, and should be explored a bit more to understand why inclusive and adaptive metadata is important. Since I covered the disabled community somewhat in a previous post, I’ll focus on the others here.
For the LGBTQ community, terminology is a fraught subject. From slurs and hate speech to eventual reclamation of terms, this community has had its share of struggles regarding terms and classification. The shift from terms like “homosexual,” “f****t,” and “queer” to terms like “gay,” a reclaimed version of “queer,” and others illustrates the steps forward that the community and society have taken in their understanding of this community. However, not everyone has progressed at the same rate, whether in larger society or in certain academic circles and research environments. If researchers are not knowledgeable about the most recent acceptable terms and research methods, they may use a term in a way that is offensive, outdated, or even triggering to their audience.
Black/African American communities and indigenous populations have both been treated less than admirably by conventional terminology and metadata systems. I’m grouping them together here mostly to make this post a bit less lengthy, but they are distinct communities that really should be considered separately. Both of these communities have been affected by racist, oppressive histories and events that have strongly impacted the way that these communities are thought about, and how documentation about them has been categorized. Too often, the histories of these groups are lumped under categories like “colonialism,” diminishing or erasing the unique cultures and experiences of these peoples and instead allowing their stories to be dominated by the white majority.
Women are sometimes an afterthought when it comes to marginalized communities and how they have been classified. However, they are an important group to consider in matters of categorization, especially as their metadata has evolved from conceptions of them as wives and homemakers to being women who hold careers, offices, and achievements. Though many attribute the shift in women’s roles and their struggle for independence to the rise of feminism, the way that women are thought about and classified has in fact been slowly shifting and evolving for many years.
Without improved and inclusive metadata, the struggles and achievements of these communities are overlooked, because their historical roots and oppression are the headings under which they are categorized. While the work of current metadata and library professionals has been helping to improve the metadata and classification of these groups, it has a ways to go before it completely overcomes the shortcomings of the past.
So, you’ve seen all of the steps of this information retrieval tool in its various parts. What does it look like as one cohesive tool? Let’s take a look at the steps of using the tool. Sorry for the less-than-great screenshots!
Step 1: IUCAT is configured to allow a user to browse the catalog using one of the following community-created vocabularies: Homosaurus, Brian Deer Indigenous People’s Classification System, or the Disabled People’s Association. Once a vocabulary is selected, the vocabulary opens up to all of the terms that have SKOS links to LCSH.
Step 2: Once a vocabulary and a term in that vocabulary are selected, the term maps to other terms in the vocabulary (in this case Homosaurus version 1), and those terms link via established SKOS relationships to a term in LCSH.
Step 3: Using the terms and relationships from both Homosaurus (or another vocabulary) and LCSH, a user can then insert a query into a system such as IUCAT using a term or terms that incorporates features/portions from both the community vocabulary and LCSH, returning results that most closely match those terms.
And that’s it! The idea is that, if the vocabulary is narrowed from the beginning based on what a user is searching for, and the links to LCSH are established, the research process becomes more cohesive and the user can be more self-sufficient as they conduct their research.
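To make Step 1 a little more concrete, here’s a minimal Python sketch of narrowing a vocabulary down to only the terms that have SKOS links to LCSH. This is not how IUCAT or the proposed tool actually works. The `Term` class, the `browsable_terms` function, and the sample terms and links are all hypothetical stand-ins for real vocabulary data.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# SKOS mapping properties that can connect a community term to an LCSH heading.
SKOS_MAPPING_PROPERTIES = (
    "exactMatch", "closeMatch", "broadMatch", "narrowMatch", "relatedMatch",
)


@dataclass
class Term:
    """A community vocabulary term (hypothetical structure for illustration)."""
    label: str
    # Maps a SKOS mapping property name to a list of LCSH heading labels.
    lcsh_links: Dict[str, List[str]] = field(default_factory=dict)

    def has_lcsh_link(self) -> bool:
        # True when at least one mapping property points to an LCSH heading.
        return any(self.lcsh_links.get(prop) for prop in SKOS_MAPPING_PROPERTIES)


def browsable_terms(vocabulary: List[Term]) -> List[Term]:
    """Return only the terms with an LCSH link, sorted by label for browsing."""
    linked = (term for term in vocabulary if term.has_lcsh_link())
    return sorted(linked, key=lambda term: term.label.lower())


if __name__ == "__main__":
    # Illustrative data only; these are not verified mappings from the project.
    sample = [
        Term("Queer", {"closeMatch": ["Sexual minorities"]}),
        Term("Unlinked term"),
        Term("Bisexuality", {"exactMatch": ["Bisexuality"]}),
    ]
    for term in browsable_terms(sample):
        print(term.label, term.lcsh_links)
```

Filtering up front keeps the browse list short, but it also hides terms that haven’t been linked yet, which is exactly the trade-off the project still has to decide on.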
We’ve clearly established a need for a tool that helps to mitigate instances of bias in metadata, but what are the questions that need to be posed going forward in the development of this project, and what are the implications if it is successful?
Realistically, this research and proposal are only the first step in helping this project come to fruition. Now that a proposal and models have been created, the proposal will be put through further development steps. It will be taken to a faculty working group, as well as presented in a Brown Bag presentation. The feedback from those groups will help to develop and polish the tool, as well as help to answer some of the questions that need to be asked going forward, such as: Should the entire vocabulary be shown, or only the terms that have SKOS links to LCSH? How should the project proceed if a vocabulary is available but not in linked data form? Finally, is an online library catalog like IUCAT the best source for experimenting with the proof-of-concept information retrieval aid, or would multiple sources provide more comprehensive results?
The development of this tool is going to require time, polishing, and likely more research. But, this initial research and drafting of a proposal is
Ok, now on to the third stage of this information retrieval tool. After this, there will be a post that brings all of the parts together to show the entirety of what’s being proposed.
So, after a vocabulary and term have been selected, and the term has been mapped and linked to LCSH, the third step is using the terms and SKOS links to search in a local system, such as IUCAT, and return sources that are more accurate and relevant. What does this look like, and what does it accomplish? Once the user’s selected term is found in LCSH, the terms from the search bar in LCSH can be plugged into a search in IUCAT. This should return results that are most closely related to both the vocabulary term and the LCSH term, since the community-created vocabulary and its related terms have contributed to the term and connections in LCSH.
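Here’s a rough Python sketch of what that third step might look like: taking the community term and the LCSH headings it links to, and combining them into one search string. The post doesn’t describe IUCAT’s actual query syntax, so the quoted-phrase and `OR` format here is an assumption, and `build_query` is a hypothetical helper, not part of any real catalog API.

```python
from typing import List


def quote_phrase(term: str) -> str:
    """Wrap a term in double quotes so it is searched as a phrase."""
    escaped = term.replace('"', '\\"')
    return f'"{escaped}"'


def build_query(community_term: str, lcsh_headings: List[str],
                operator: str = "OR") -> str:
    """Combine a community term and its linked LCSH headings into one query.

    Duplicate terms (ignoring case and surrounding spaces) are dropped so the
    same phrase is not searched twice. The Boolean syntax is an assumption.
    """
    seen = set()
    phrases = []
    for term in [community_term, *lcsh_headings]:
        cleaned = term.strip()
        key = cleaned.lower()
        if key and key not in seen:
            seen.add(key)
            phrases.append(quote_phrase(cleaned))
    return f" {operator} ".join(phrases)


if __name__ == "__main__":
    # Illustrative terms only; not verified mappings from the project.
    print(build_query("Queer", ["Sexual minorities", "queer"]))
```

Using `OR` casts a wider net, while passing `operator="AND"` would only return records that match every term. Which one works better would depend on how the catalog ranks its results.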
Obviously, seeing this process just written out without examples can be confusing and hard to visualize. But, have no fear! As I said earlier, one of the upcoming posts will have the entire process of using the tool, plus some visualizations so that the entire process makes more sense. So, stay tuned for that!
As was mentioned before, a lot of the research is focusing on finding controlled vocabularies and metadata or classification schemes. However, sources on how the description and classification of these communities have changed over time, and on the circumstances that have motivated that change, are also important to the success of developing this information retrieval tool. So, what has been found so far and how does it add to the project? Well, the short answer is it’s a lot of resources, and it’s honestly a bit overwhelming. When you look at it all more closely though, the sources span information on the history and nature of classification, the often inherently problematic nature of classification, the changes in terminology and the evolution of the classification of each of these communities, and more. Sometimes these sources also recommend vocabularies or classifications that can be pulled out for use in the project. However, even if a source only provides general information and background, it is invaluable not only for establishing a need for a project such as this one but also for helping us better understand the communities that are part of this project, where the issues in their classification exist, and how to frame and shape this tool so that it most effectively meets its goals for all of the communities and researchers it is meant to help.
So it’s halfway through the summer semester, and it’s also the beginning of the next stage of my part in this project. Although we’ve found a variety of very useful and thorough vocabularies, only one, Homosaurus version 1, will be used for the project proposal. What’s the next step, exactly? Well, that’s to take the terms from Homosaurus that relate to our topic, search for them in the Library of Congress Subject Headings, and determine links via SKOS relationships between the vocabulary and LCSH. This sounds confusing, but here’s what it means and why it’s important. Linking the terms from Homosaurus to LCSH will be important for the next stage of this tool, which will be explained in a later post. By comparing the related terms, broader terms, and close matches between the Homosaurus term and the term in LCSH, we can establish SKOS relationships that link the community vocabulary to an authoritative source such as LCSH. Having terms from both sources will then help to determine the best terms to use to search a system such as IUCAT. I know that all of this is a bit confusing, and I promise that a later post will make sense of it. Thanks for bearing with me!
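If it helps to see the comparison spelled out, here’s a small Python sketch of how a candidate SKOS relationship could be suggested by comparing a community term with an LCSH heading. The matching rules below are my own simplified heuristic for illustration, not the procedure the project uses, and any suggestion would still need to be reviewed by a person. The `Concept` class, `suggest_mapping` function, and sample terms are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class Concept:
    """A term with its broader and related terms (hypothetical structure)."""
    label: str
    broader: Set[str] = field(default_factory=set)
    related: Set[str] = field(default_factory=set)


def normalize(label: str) -> str:
    """Lowercase a label and collapse whitespace so labels compare cleanly."""
    return " ".join(label.lower().split())


def suggest_mapping(community: Concept, lcsh: Concept) -> Optional[str]:
    """Suggest a SKOS mapping property from a community term to an LCSH heading.

    Returns None when the simple rules below find no connection.
    """
    target = normalize(lcsh.label)
    # Same label: suggest closeMatch rather than exactMatch, since identical
    # wording does not guarantee identical meaning across vocabularies.
    if normalize(community.label) == target:
        return "skos:closeMatch"
    # The LCSH heading appears among the community term's broader terms.
    if target in {normalize(b) for b in community.broader}:
        return "skos:broadMatch"
    # The heading is a related term, or both share at least one related term.
    community_related = {normalize(r) for r in community.related}
    lcsh_related = {normalize(r) for r in lcsh.related}
    if target in community_related or community_related & lcsh_related:
        return "skos:relatedMatch"
    return None


if __name__ == "__main__":
    # Illustrative data only; not verified entries from Homosaurus or LCSH.
    term = Concept("Queer people", broader={"LGBTQ people"},
                   related={"Queer theory"})
    for heading in ("Queer people", "LGBTQ people", "Queer theory", "Cooking"):
        print(heading, "->", suggest_mapping(term, Concept(heading)))
```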
So, I thought that it was important to briefly address my own perspective and unintentional biases and how they may impact or have been impacted by this research process. I will be the first to admit that my knowledge of existing metadata standards, metadata creation, and the biases of metadata is a continuing learning process, and I am by no means an expert. However, I have seen firsthand in my own research the way that biases in metadata and terminology can impact metadata creation, search results, the accuracy of certain resources, and ultimately the final research product that users put forth. In particular, my perspective and experiences as a disabled person have illustrated how, for my own and other marginalized communities, outdated and biased terminology, harmful assumptions, and resistance to change can all impact not only the way that people research and the results that are given precedence, but can also have real-world implications. People still think that words like “crippled” and “handicapped” are acceptable terms, and rather than asking people in the community how we prefer to be referred to, they instead make assumptions as to what the community needs or wants. As long as these majority-centric perspectives and processes are allowed to dominate the research environment, they also dominate the public discourse and the way that society perceives disabled people and other marginalized communities. Utilizing community-created vocabularies allows individuals to have a voice and agency in the larger research environment.
As for how my perspective may impact this research and project, I hope that it can bring a fresh set of eyes to the research process and help me locate reliable sources that best represent the thoughts, preferences, and needs of all of the communities concerned. I hope that my somewhat novice level of metadata knowledge will not be a hindrance to this project, and that the project and my role in it will instead serve as a learning experience and as a deeper introduction into the world of the metadata of marginalized communities.
As I mentioned in an earlier post, a large part of this project is finding controlled vocabularies and metadata schemas/standards that are created by or accepted by marginalized communities. While this may sound like a simple task given the many organizations that have emerged to aid and represent minority groups, often those groups do not include the input of people from these communities, which perpetuates the same outdated terms and assumptions that have plagued these communities over the years. By locating and finding ways to implement community-created vocabularies, we can make sure that terms deemed acceptable by members of these communities, rather than chosen by those in the majority, are the ones used to create more accurate and inclusive metadata, even if the process is ongoing and may proceed slowly.
The following are controlled vocabularies and classification schemas that the project research has determined best represent marginalized communities: The Homosaurus LGBTQ Vocabulary, the Disabled People’s Association of Singapore Vocabulary, the Brian Deer Indigenous Peoples Classification System, and the North Carolina Council on Developmental Disabilities Glossary. While these are likely not the only resources that contain vocabularies, these were all found to have a certain level of community input into their creation, and a few have been cited by other sources as being a reliable source for community-determined terminology and classification.
Locating and knowing how to use vocabularies and classifications such as these is crucial to understanding where the issues and gaps in metadata occur, and how to improve it. By making these resources more visible and bringing them into the mainstream research process, the voices of these marginalized communities can be prioritized over the voices of the majority and majority institutions, which can potentially shape both metadata creation and the research process as a whole.
Metadata makes up the very foundation of most library classification and knowledge organization standards. However, like all things that are created by individuals, metadata often takes on the biases and exclusions, both intentional and unintentional, of the time and of its creators. The existing literature on recognizing and combating bias in metadata is ever-expanding. As societal understanding and acceptance of minority and marginalized communities steadily improves, the way in which these groups are categorized by knowledge information systems and metadata has attempted to slowly evolve as well. From Library of Congress subject headings to community-driven vocabularies and standards, researchers have worked to understand where the gaps in metadata exist, how they can be addressed and remedied, and the implications for all researchers whether or not they are part of a specific community. With all of the metadata that exists for marginalized communities as well as for society in general, one would think that it would be fairly simple to establish consistent, useful metadata standards for these communities. However, persistent implicit biases and the continued use of outdated or even offensive terms for the sake of “historical accuracy” are detrimental to research done by and about these marginalized communities. What, then, can we as librarians do to help combat these biases in metadata and create a more inclusive literature and research atmosphere?
Although there are gaps and issues in metadata for most marginalized communities, one of the most prevalent gaps that I have observed is in metadata for the disabled community. This is due largely to the fact that the terms “disabled” and “disability” can be so broadly defined. While broad terminology is useful in that it can encompass a variety of physical, mental, and other forms of disability, it can become cumbersome and problematic when it allows outdated and even offensive terminology and ideas about disabled people to persist. The persistence of terms such as “crippled” and “handicapped” in search results and other headings does not reflect the advances that have been made in regard to term reclamation by the community, or the ways that researchers and other individuals have a better understanding of the medical and social implications of certain terms and perspectives.
Something to consider is whether the inclusivity and usability of metadata is improved when members of marginalized communities are involved in the creation of metadata and determination of standards. Evaluation of the current metadata landscape and the push to make it less biased and more community-driven is an intricate and ongoing process. Although there is an increasing amount of research and literature on metadata diversity, there seems to be a lack of initiatives or projects to actually make less-biased metadata a reality. My hope is that this project will expose me to more of these metadata efforts, as well as what we as researchers can do to further the process.
Adler, Melissa, et al. “Stigmatizing Disability: Library Classifications and the Marking and Marginalization of Books about People with Disabilities.” University of Chicago Press Journals, 17 May 2019.
Bowker, Geoffrey C., and Susan Leigh Star. Sorting Things Out: Classification and Its Consequences. MIT Press, 1999.
“Glossary of Disability Terms.” NCCDD, 2016.
Johnson, Matt. “Gay, Lesbian, Bisexual, and Transgender Subject Access: History and Current Practice.” Academia.edu, 2007.
Koford, Amelia. “How Disability Studies Scholars Interact with Subject Headings.” Cataloging & Classification Quarterly, 2014.