Meaningful Metadata for Discoverability and Findability
The following article was first published in Taxonomy Times, a publication of SLA, in November of 2015.
Abby Clobridge, Taxonomy Times -- November 2015
The experience of launching a publicly accessible repository can quickly bring to mind the Field of Dreams quote, “If you build it, they will come.” There’s an expectation that once a repository is built -- as soon as it is publicly accessible, indexed, and available through search engines -- people will automatically find, access, and download the items from the treasure trove of valuable resources collected and disseminated by the repository. Unfortunately, most repositories and websites aren’t immediately inundated with users, and page views are lower than managers would hope. So what’s the problem? How can we increase website access and usage? The secret, of course, is the metadata.
Metadata tends to be an overlooked, under-appreciated, and time-consuming aspect of maintaining a repository. Yet without meaningful metadata, it is harder, if not impossible, for repository contents to be found. In order to maximize the potential impact, uptake, and usage of repository contents, the digital objects stored within must be findable via search and discoverable via browse -- which requires robust metadata.
Within this context, we are referring to repositories in their broadest possible sense -- any sort of system that collects, stores, and disseminates content. A Content Management System (CMS) such as WordPress, which is frequently used as the platform for public-facing websites, includes a combination of automatically applied metadata (publication dates for posts) and user-applied metadata (“categories” and “tags” for blog posts and pages). (See Figure 1 for examples of metadata in WordPress.)
SharePoint, a commonly used intranet/portal system, relies heavily on taxonomies and has the potential to incorporate metadata through taxonomy terms throughout an organization’s installation. When used as an enterprise knowledge management (KM) system, an expertise locator system within SharePoint can draw upon the same taxonomy terms used within an organizational knowledge base, thereby strengthening the linkages within the web of organizational knowledge. Consistent application of taxonomy terms lets users browse from one subject area to a related one, which makes repository contents more easily discoverable.
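To make the idea of shared taxonomy terms concrete, here is a minimal sketch in Python. It is not SharePoint code and does not use any SharePoint API; the document titles, expert names, taxonomy terms, and the helper functions `build_term_index` and `related_terms` are all hypothetical, invented purely for illustration. It shows how tagging documents and people with the same controlled terms lets a single term lead to both, and how co-occurring terms suggest related subject areas to browse.

```python
from collections import defaultdict

# Hypothetical records: every title, name, and term is invented for illustration.
documents = {
    "Metadata style guide": {"Metadata", "Taxonomy"},
    "Repository launch checklist": {"Repositories", "Metadata"},
    "Open data policy": {"Open Access", "Repositories"},
}
experts = {
    "A. Rivera": {"Taxonomy", "Metadata"},
    "B. Chen": {"Open Access"},
}


def build_term_index(documents, experts):
    """Map each taxonomy term to the documents and experts tagged with it."""
    index = defaultdict(lambda: {"documents": set(), "experts": set()})
    for title, terms in documents.items():
        for term in terms:
            index[term]["documents"].add(title)
    for name, terms in experts.items():
        for term in terms:
            index[term]["experts"].add(name)
    return index


def related_terms(term, documents):
    """Return terms that share at least one document with `term`."""
    related = set()
    for terms in documents.values():
        if term in terms:
            related |= terms - {term}
    return related


index = build_term_index(documents, experts)
print(sorted(index["Metadata"]["documents"]))
print(sorted(index["Metadata"]["experts"]))
print(sorted(related_terms("Metadata", documents)))
```

In this toy data, the term "Metadata" leads to two documents and one expert, and it points onward to "Repositories" and "Taxonomy" as neighboring subject areas. The linkage only works because both the knowledge base and the expertise profiles draw on the same vocabulary; if one used "Metadata" and the other used "Meta-data", the connection would be lost.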
For Open Access to research -- including open data -- research outputs stored and disseminated via repository systems such as DSpace, EPrints, and Dataverse need to be findable via search engines such as Bing, Google, and Google Scholar. Considering the massive -- and growing -- amount of content published on the internet, it is imperative that research outputs be well described so that someone can find the right needle in the haystack.
Fortunately, metadata has been baked into all of these systems in various ways. But the real trick is using metadata fields, categories, and tags effectively. Following are 8 tips for using meaningful metadata to encourage findability and discoverability of repository contents -- whether the repository is a public-facing website running on a CMS such as WordPress or Drupal, a restricted-access intranet or KM system such as SharePoint, or an open knowledge repository such as DSpace, EPrints, Invenio, or Dataverse.