Thursday, May 3, 2007

Future of MARC -- Dr. Bill Moen

Future of MARC: the Challenges and Opportunities of 21st Century Cataloging
[Eventually, the link to this presentation will be on the MLA Conference website]

  • "We need to be adding value through our practices....We'd better be saving people time and money."
  • "We need to be meeting the needs of our users."
  • Less focus on the methods. Our methods should be invisible and unobtrusive. How can we take our structures and hide them, but not hide the power that they provide?

There's nothing wrong with thinking in terms of market share -- that's our reality. We no longer have the lock on the target market that we used to. We have a limited set of resources that are available through our catalog, and now users can see how limited they are compared with everything else that's out there.

What do we mean when we say "MARC"?
Record format, as defined by ISO 2709 / ANSI Z39.2, & structural elements of the format -- this is going to go away and be replaced by markup languages
Metadata scheme -- defined by MARC 21 and fields, subfields, indicators and their semantics
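
To make the two senses of "MARC" concrete, here is a minimal sketch (mine, not from the talk) of the ISO 2709 structure: a 24-character leader, a directory of 12-character entries (tag, field length, start offset), then the field data. The function names and the sample record are made up for illustration; the tags, indicators and subfield codes are the MARC 21 "metadata scheme" layer riding on top of the record format.

```python
FIELD_TERMINATOR = b"\x1e"
RECORD_TERMINATOR = b"\x1d"
SUBFIELD_DELIMITER = b"\x1f"


def build_record(fields):
    """Toy helper: build an ISO 2709 record from (tag, body_bytes) pairs."""
    directory = b""
    data = b""
    for tag, body in fields:
        field = body + FIELD_TERMINATOR
        # Directory entry: 3-char tag, 4-digit length, 5-digit start offset.
        directory += f"{tag}{len(field):04d}{len(data):05d}".encode("ascii")
        data += field
    base = 24 + len(directory) + 1       # leader + directory + its terminator
    total = base + len(data) + 1         # plus the record terminator
    leader = f"{total:05d}nam a22{base:05d} a 4500".encode("ascii")
    return leader + directory + FIELD_TERMINATOR + data + RECORD_TERMINATOR


def parse_record(raw):
    """Split a single ISO 2709 record into its leader and fields."""
    leader = raw[:24].decode("ascii")
    base = int(leader[12:17])            # base address of data
    directory = raw[24:base - 1]         # drop the directory's terminator
    fields = []
    for i in range(0, len(directory), 12):
        entry = directory[i:i + 12].decode("ascii")
        tag, length, start = entry[:3], int(entry[3:7]), int(entry[7:12])
        body = raw[base + start:base + start + length - 1]  # drop terminator
        if tag < "010":
            # Control fields (00X) have no indicators or subfields.
            fields.append((tag, body.decode("utf-8")))
        else:
            indicators = body[:2].decode("ascii")
            chunks = body[2:].split(SUBFIELD_DELIMITER)[1:]
            subfields = [(c[:1].decode("ascii"), c[1:].decode("utf-8"))
                         for c in chunks]
            fields.append((tag, indicators, subfields))
    return leader, fields


# Made-up sample record: a control number and a title statement.
raw = build_record([
    ("001", b"12345"),
    ("245", b"10" + SUBFIELD_DELIMITER + b"aFuture of MARC"
            + SUBFIELD_DELIMITER + b"cBill Moen"),
])
leader, fields = parse_record(raw)
```

Everything in `parse_record` is the part Dr. Moen expects markup languages to replace; the tag/indicator/subfield semantics are what could carry over.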

Approaching MARC's future:

Requirements for a record format/metadata scheme
Goldsmith & Knudson's Requirements:
Granularity -- how fine a detail can you get to
Transparency --
Extensibility --

Roy Tennant's Requirements (slide went by too fast)

McCallum's 10 Format Attributes
  • XML
  • Granularity
  • Versatility
  • Extensibility
  • Modularity
  • Hierarchy support
  • Crosswalks
  • Tools
  • Cooperative Management
  • Pervasive

Functional Requirements for Bibliographic Records (FRBR)
Produce a framework that would provide a clear, precisely stated and commonly shared understanding of what it is that the bib record aims to provide information about and what it is that we expect the record to achieve in terms of answering user needs.
Based on Entity-Relationship modeling (work, expression, manifestation, item / persons, corporate bodies)
An important part of the FRBR report was the focus on users & user tasks: find, identify, select, obtain
If bib records are not supporting user tasks, what's the point?
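
A rough sketch (mine, not from the slides) of the work / expression / manifestation / item chain as nested data. The class names follow the FRBR entities; the fields, the `drill_down` helper and the sample data are hypothetical, and persons and corporate bodies are left out to keep it short.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Item:
    label: str  # hypothetical copy identifier


@dataclass
class Manifestation:
    description: str
    items: List[Item] = field(default_factory=list)


@dataclass
class Expression:
    description: str
    manifestations: List[Manifestation] = field(default_factory=list)


@dataclass
class Work:
    title: str
    expressions: List[Expression] = field(default_factory=list)


def drill_down(work):
    """Return indented display lines: work, then expressions, then manifestations."""
    lines = [work.title]
    for expression in work.expressions:
        lines.append("  " + expression.description)
        for manifestation in expression.manifestations:
            count = len(manifestation.items)
            lines.append(f"    {manifestation.description} ({count} item(s))")
    return lines


# Made-up example data.
hamlet = Work("Hamlet", [
    Expression("English text", [
        Manifestation("1992 paperback", [Item("c.1"), Item("c.2")]),
        Manifestation("2003 e-book", [Item("online")]),
    ]),
    Expression("German translation", [
        Manifestation("1998 hardcover", [Item("c.1")]),
    ]),
])
display = drill_down(hamlet)
```

The idea is that the user starts from one entry for the work and narrows from there, instead of scanning one flat record per manifestation.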

FRBR is introducing new vocabulary and a new understanding of the items we catalog. We're seeing the implementation of these ideas in new library catalogs.
We currently ask our users to put up with a lot of noise, lots of individual records, rather than letting them actively drill down with limited choices.

Responding to recent developments
No AACR3; instead, Resource Description and Access (RDA), which is more a set of guidelines on content creation and is kept separate from syntax or record format

Library Systems and Data Formats (wiki) -- grassroots efforts to look at

  • Essential in library applications
  • Variety of metadata schemes
  • Variety of functions and services supported
  • Increasing use of machine-generated metadata - there aren't enough catalogers in the world
  • Role of handcrafted metadata needs continuing review & assessment

These are not threats to the livelihood of catalogers/TS librarians. There's plenty of work around, but they have to change the approach to handcrafted metadata -- where's the value added of that hands-on work?

Looking at empirical data
The cataloging record you create is an artifact that reflects decisions, policies and choices, and can be investigated to see patterns and needs.
Catalogers create metadata that can be very rich (MARC).

There had never been a study before of exactly how catalogers actually construct MARC records and what they actually do. So, Dr. Moen did one.
There is a *lot* of redundancy in the records. The 80/20 rule holds: 4% of fields/subfields accounted for 80% of occurrences, 96% of all fields accounted for 20% of occurrences (where occurrence = data in the field)

MCDU Project -- Reports containing results of analysis of utilization, commonly used elements
OCLC gave them the entire WorldCat as of May 2005, so they're working from the whole dataset
82 hours for a script to process and load the records into MySQL, and 258 GB = mad data set
Millions of book records
Categories of Questions: General profile of the dataset & actual numbers of occurrences
167 fields used
14 fields accounted for 80% of all occurrences
21 fields accounted for 90% of all occurrences
110 fields occur in less than 1% of all records
"656 a:" occurred in 1 record out of 7.5 million. Why?

They also looked at field/subfield combinations. They keep finding a small core of elements that are commonly used -- are these what catalogers should be focusing on? Is this what the machine-generated cataloging should be focusing on?
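
A toy sketch (mine, not the MCDU scripts) of the kind of counting behind those numbers: tally occurrences per field, ask how many of the most-used fields it takes to cover a given share of all occurrences, and list the fields that show up in fewer than a given share of records. The function names and sample records are invented; the real analysis ran against WorldCat in MySQL.

```python
from collections import Counter


def count_occurrences(records):
    """Count every occurrence of every field tag across all records."""
    return Counter(tag for record in records for tag in record)


def fields_to_cover(occurrences, share):
    """How many of the most-used fields account for `share` of all occurrences?"""
    total = sum(occurrences.values())
    running = 0
    for n, (_, count) in enumerate(occurrences.most_common(), start=1):
        running += count
        if running >= share * total:
            return n
    return len(occurrences)


def rare_fields(records, threshold):
    """Tags that appear in fewer than `threshold` (a fraction) of the records."""
    in_records = Counter(tag for record in records for tag in set(record))
    return sorted(tag for tag, n in in_records.items()
                  if n / len(records) < threshold)


# Each toy record is just the list of field tags it contains.
records = [
    ["245", "100", "260", "300", "650"],
    ["245", "100", "260", "300"],
    ["245", "260", "300", "650", "650"],
    ["245", "100", "260", "656"],
]
occurrences = count_occurrences(records)
core_size = fields_to_cover(occurrences, 0.8)
rarely_used = rare_fields(records, 0.3)
```

Note the two different denominators: "accounted for 80% of all occurrences" counts field instances, while "occur in less than 1% of all records" counts records that contain the field at least once.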

Making Sense of Numbers
Not interpreting the value of individual fields, but looking at patterns and larger recommendations/guidelines. Comparing the data to FRBR user tasks. Is there a common core of elements that are used? Is there a threshold below which things just aren't used so much?

Are library catalogers providing data to support FRBR tasks?
In MCDU dataset, only 59 fields/subfields (13% of total) occur at or above the threshold of use in OCLC book records.

Questions for consideration
  • What is needed in a bib record? Are catalogers working too hard and creating stuff no one uses?
  • Support for four user tasks? In the context of FRBR, what does it mean to support a user task?
  • How can we use metadata for effective management of information resources?
  • How do your systems use the infrequently used data? What about the 62% of all fields used in less than 1% of records?
  • Can we argue persuasively for the cost/benefit for your existing practice?
  • Should the focus be on the high-value, high-impact, high-quality data in a few fields/subfields? Can we identify these? What would it mean to costs of cataloging to focus this way? What would this mean for training new catalogers?
  • Can MCDU results inform your local practices?
  • What metadata scheme will we use? It won't be MARC.
  • [missed one]

Confluence of change -- all of this data and the realities of life now mean that change will happen. It's just a question of when and will we be prepared?

Is the next study a look at which fields users are using? Is this useful to know? What data about how users search is useful? There have been a lot of studies done of how users search in the Search Engine world; how do these translate into the library field?

[Jenn sez: There are a lot of stunned heads in this audience. As a non-cataloger, I'm hoping this hasn't just caused heart attacks throughout the room.]
