October 12, 2004

Suggestions

I've already received some interesting suggestions of topics to cover here, including the new transition to full UPC code in the U.S.--13 digits instead of 12--and how a Wiki-biblio-o-pedia might be expanded to include other media, like DVDs and music.

I have to say that I'm pretty excited about the notion of building a Wiki sort of informational site. The Wikipedia itself is a shining example of how to manage such a project, and because facts about books aren't copyrighted, there's very little risk of having people contribute tainted data. In fact, depending on how well the project is structured, some online bookstores might provide a core of data to start with in the project in order to get the benefit of collaborative error fixing.

As an author, I find it frustrating that there's no good way for me to say to the world of bookselling: here's the definitive, authoritative, author-a-tative version of my book's details.

October 06, 2004

What makes me an expert, anyway?

One word: experience.

I'm launching this site in the interests of starting conversations about the way in which book details -- author, title, subject, and even page count -- are collected, sold, disseminated, updated, broken, and misused.

My credentials? I worked for Amazon.com from 1996 to 1997 as catalog manager. In that capacity, I worked with data vendors, and developed or helped develop in-house resources as well as Web site tools that would provide the best information about each book we listed. We wanted the list to be exhaustive, but also accurate. In the process, I developed an intimate knowledge of several data feeds from book distributors and the Library of Congress.

One outgrowth of my time at Amazon was the realization that most online bookstores and library catalogs structured searching around the ISBN or a library cataloguing number, like the LCCN. Instead of dealing with books as works--that is, as a discrete idea, not an instantiation as an edition--the searches always seemed to pull down every instance. If I search on Wizard of Oz, I'm generally thinking about the work "Wizard of Oz"--I need tools that turn my concept into a set, which I can then narrow down to particular members.
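To make the work-versus-edition distinction concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the record layout, the placeholder ISBNs, and the `work_key` rule (lowercase the title and drop a leading article) are hypothetical and far cruder than anything a real catalog would need.

```python
from collections import defaultdict

# Hypothetical edition records: (isbn, title, format).
# The ISBN values are placeholders, not real identifiers.
EDITIONS = [
    ("ISBN-A", "The Wizard of Oz", "hardcover"),
    ("ISBN-B", "The Wizard of Oz", "paperback"),
    ("ISBN-C", "the wizard of oz", "audio"),
    ("ISBN-D", "Ozma of Oz", "paperback"),
]

def work_key(title):
    """Crude, assumed work identifier: lowercase title minus a leading article."""
    words = title.lower().split()
    if words and words[0] in {"the", "a", "an"}:
        words = words[1:]
    return " ".join(words)

def group_by_work(editions):
    """Collapse individual editions into sets keyed by the work they belong to."""
    works = defaultdict(list)
    for isbn, title, fmt in editions:
        works[work_key(title)].append((isbn, fmt))
    return dict(works)

works = group_by_work(EDITIONS)
# One search concept ("wizard of oz") now maps to a set of editions
# that can be narrowed by format, date, and so on.
for isbn, fmt in works["wizard of oz"]:
    print(isbn, fmt)
```

The point of the sketch is the shape of the result: a search lands on one work first, and the editions become members of that set rather than separate hits.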

I used these concepts as part of my consulting for Powell's Books and Half.com after my non-compete with Amazon.com expired, although neither really implemented this part of my idea. I built isbn.nu partly as a programming experiment in 1999, and partly as a forum in which to try out my ideas on this topic.

Years after I left Amazon.com, they introduced a version of this kind of linkage, which has no particular name. Search on Wizard of Oz, and you're still presented with a long list, but the second item has a link that lets you see all 98 titles linked to a single authoritative Wizard of Oz. It's still not great: you can't view through editions. And if you click through to the first result in the list, you see a link that shows eight other editions in a better format. But what about the other 91 results?

ISBN.nu has had a kind of authority that I dub work authority for all of the books I list, which is nearly three million. Library scientists use the term authority to denote how to normalize multiple instances of, say, an author's name into a single definitive version. So if Jimmy Carter, James E. Carter, and James Earl Carter Jr. are really the same person, the authority entry for this person maps those to, say, Jimmy Carter (which is President Carter's preference, I believe).

Authority doesn't mean that the information is correct. Rather, it means that you have authoritatively settled on a single form of a category of information that might be represented in several ways. It's a way to collapse lots of individual information that is fundamentally about the same thing into a single set of information that is mapped to the same thing. This is closer to how people conceive of what they want from a book search than any of the tools I've seen.
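As a rough illustration of that mapping, an authority file can be pictured as a lookup from every known variant to one preferred form. The table and the `authorized_name` function below are hypothetical, a sketch of the idea rather than how any library system or ISBN.nu stores it.

```python
# Hypothetical authority file: every known variant maps to one preferred form.
# Keys are lowercased so the lookup ignores capitalization.
AUTHOR_AUTHORITY = {
    "jimmy carter": "Jimmy Carter",
    "james e. carter": "Jimmy Carter",
    "james earl carter jr.": "Jimmy Carter",
}

def authorized_name(name):
    """Return the preferred form if the variant is known, else the name unchanged."""
    return AUTHOR_AUTHORITY.get(name.strip().lower(), name)

print(authorized_name("James E. Carter"))   # Jimmy Carter
print(authorized_name("L. Frank Baum"))     # L. Frank Baum (no entry, left as-is)
```

Note that nothing here checks whether "Jimmy Carter" is the correct form. The table only guarantees that all three variants resolve to the same one.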

ISBN.nu's system isn't perfect. I haven't developed full author or title authority yet, so the same author may be listed in different ways, and different authors with the same name are thought to have written books they have not. I have worked with a programmer to fully normalize our data, which means removing a lot of the detritus in information that comes into my system from a licensed source and filing down the parts that aren't legitimate differences, like capitalization, extra spaces, and punctuation.
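A minimal sketch of that kind of normalization might look like the following. The `normalize` function is hypothetical and assumes that capitalization, runs of whitespace, and ASCII punctuation never carry meaning, which is too blunt for real data (it would also merge titles that differ only by punctuation).

```python
import re
import string

# Translation table that deletes ASCII punctuation characters.
_PUNCT = str.maketrans("", "", string.punctuation)

def normalize(text):
    """File down differences that aren't legitimate: case, punctuation, extra spaces."""
    text = text.lower()                 # ignore capitalization
    text = text.translate(_PUNCT)       # drop punctuation
    return re.sub(r"\s+", " ", text).strip()  # collapse and trim whitespace

print(normalize("  The  Wizard of  OZ! "))  # the wizard of oz
print(normalize("the wizard of oz"))        # the wizard of oz
```

Two records whose normalized forms match become candidates for being the same thing. Deciding that they really are the same is the authority step, not the normalization step.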

The next step beyond normalization will be full authority development, which I have in process.

In the coming entries, I hope to discuss a number of book information and meta-information problems I have encountered and coped with, including The Scottish Problem, The New, Bad Information Problem, The Chunky Problem, and others with names that I hope you find similarly amusing.

I'll start talking about OCLC Research's xISBN project, and how I believe that with a little effort and some coordination, along with the tenets of Feist v. Rural, the Internet community could develop a WikiBook-a-pedia, or a compendium of updatable bibliographic information along with authority and chunky linkages that could be distributed freely.

Comments are welcome. I'm using TypeKey from Movable Type to prevent comment spam and other problems. I apologize for requiring a centralized, email-verified registration, but this seems like the only solution to ensure that comments aren't turned into a horrible mush, as happened on other sites I operate.