<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
    <title>ISBlogN</title>
    <link rel="alternate" type="text/html" href="http://blog.bookinfo.info/" />
    <link rel="self" type="application/atom+xml" href="http://blog.bookinfo.info/atom.xml" />
   <id>tag:blog.bookinfo.info,2009://13</id>
    <link rel="service.post" type="application/atom+xml" href="https://db.isbn.nu/mt3/mt-atom.cgi/weblog/blog_id=13" title="ISBlogN" />
    <updated>2009-01-23T17:55:53Z</updated>
    <subtitle>information about book information -- yes, you heard me: book information information</subtitle>
    <generator uri="http://www.sixapart.com/movabletype/">Movable Type 3.33</generator>
 
<entry>
    <title>Testing New Ideas</title>
    <link rel="alternate" type="text/html" href="http://blog.bookinfo.info/archives/2009/01/testing_new_ide.html" />
    <link rel="service.edit" type="application/atom+xml" href="https://db.isbn.nu/mt3/mt-atom.cgi/weblog/blog_id=13/entry_id=8567" title="Testing New Ideas" />
    <id>tag:blog.bookinfo.info,2009://13.8567</id>
    
    <published>2009-01-23T17:55:51Z</published>
    <updated>2009-01-23T17:55:53Z</updated>
    
    <summary>A test version of isbn.nu at beta.isbn.nu is up and running. In this new version, the informational backend is still more or less the same, but after all the cosmetic and flow issues are dealt with, I&apos;ll be adding more ways in which information can be reported for correction by users....</summary>
    <author>
        <name>Glenn Fleishman</name>
        <uri>http://glennf.com</uri>
    </author>
            <category term="Site Information" />
    
    <content type="html" xml:lang="en" xml:base="http://blog.bookinfo.info/">
        <![CDATA[<p>A test version of isbn.nu at <a href="http://beta.isbn.nu/"><strong>beta.isbn.nu</strong></a> is up and running. In this new version, the informational backend is still more or less the same, but after all the cosmetic and flow issues are dealt with, I'll be adding more ways in which information can be reported for correction by users.</p>]]>
        
    </content>
</entry>
<entry>
    <title>Fixing Authors in Their Place</title>
    <link rel="alternate" type="text/html" href="http://blog.bookinfo.info/archives/2008/09/fixing_authors.html" />
    <link rel="service.edit" type="application/atom+xml" href="https://db.isbn.nu/mt3/mt-atom.cgi/weblog/blog_id=13/entry_id=8435" title="Fixing Authors in Their Place" />
    <id>tag:blog.bookinfo.info,2008://13.8435</id>
    
    <published>2008-09-11T18:22:07Z</published>
    <updated>2008-09-11T18:22:10Z</updated>
    
    <summary>I haven&apos;t written on this blog for some time, due to the exigencies of other demands, but I wanted to note here (in case I have any readers left) that I just updated isbn.nu a few days ago to provide unique author pages. This means that 100 John Smiths can all be uniquely identified with a permanent and distinct code and URL location for their unique set of books. It also means that I could license and add author biographies, link to author Web sites, and allow reader and author submitted biographies or other details. That&apos;s to come. My colleague Jeff built the outline of this months ago, and I finally had the time to integrate all the pieces. Jeff did all the heavy lifting on isbn.nu for the past few years in terms of refactoring old code, turning it into a maintainable and scalable beast, and answering my programming questions. He&apos;s helped me become an object-oriented programmer, and to understand how to build code that works. In the last few weeks, I&apos;ve built an API for isbn.nu, an interface that will allow an application programmer I&apos;m working with to create an iPhone/iPod touch application that will work as a front-end...</summary>
    <author>
        <name>Glenn Fleishman</name>
        <uri>http://glennf.com</uri>
    </author>
    
    <content type="html" xml:lang="en" xml:base="http://blog.bookinfo.info/">
        <![CDATA[<p>I haven't written on this blog for some time, due to the exigencies of other demands, but I wanted to note here (in case I have any readers left) that I just updated isbn.nu a few days ago to provide unique author pages. This means that 100 John Smiths can all be uniquely identified with a permanent and distinct code and URL location for their unique set of books. It also means that I could license and add author biographies, link to author Web sites, and allow reader and author submitted biographies or other details. That's to come.</p>

<p>My colleague Jeff built the outline of this months ago, and I finally had the time to integrate all the pieces. Jeff did all the heavy lifting on isbn.nu for the past few years in terms of refactoring old code, turning it into a maintainable and scalable beast, and answering my programming questions. He's helped me become an object-oriented programmer, and to understand how to build code that works.</p>

<p>In the last few weeks, I've built an API for isbn.nu, an interface that will allow an application programmer I'm working with to create an iPhone/iPod touch application that will work as a front-end to isbn.nu. That API sort of necessitated fixing authors into their unique identities and solving a number of other site problems that have been lagging.</p>

<p>I've got a bunch of other features and improvements coming at long last, but this unique author page/identity is one of the ones I've wanted to add for years.</p>]]>
        
    </content>
</entry>
<entry>
    <title>Switching over to ISBN-13</title>
    <link rel="alternate" type="text/html" href="http://blog.bookinfo.info/archives/2007/04/switching_over.html" />
    <link rel="service.edit" type="application/atom+xml" href="https://db.isbn.nu/mt3/mt-atom.cgi/weblog/blog_id=13/entry_id=7527" title="Switching over to ISBN-13" />
    <id>tag:blog.bookinfo.info,2007://13.7527</id>
    
    <published>2007-04-02T03:15:37Z</published>
    <updated>2007-04-02T03:15:05Z</updated>
    
    <summary> Isbn.nu was abruptly dragged into 2007 when several bookstores that I work with to retrieve price information suddenly switched to ISBN-13 as their feed or query format! We&apos;re in the process of rooting out all the 10-digit ISBN references and updating databases, so we&apos;re compliant with ISBN-13. There are no 979 ISBN-13s out there yet (as far as I can tell), so it&apos;s a good time to make the transition. I&apos;ve written up a page that explains to my isbn.nu users precisely what&apos;s going on. It might be too technical/industry for them....</summary>
    <author>
        <name>Glenn Fleishman</name>
        <uri>http://glennf.com</uri>
    </author>
            <category term="Normalization" />
    
    <content type="html" xml:lang="en" xml:base="http://blog.bookinfo.info/">
        <![CDATA[<p>
Isbn.nu was abruptly dragged into 2007 when several bookstores that I work with to retrieve price information suddenly switched to ISBN-13 as their feed or query format! We're in the process of rooting out all the 10-digit ISBN references and updating databases, so we're compliant with ISBN-13.
</p><p>
There are no 979 ISBN-13s out there yet (as far as I can tell), so it's a good time to make the transition.
</p><p>
I've written up a <strong><a href="http://isbn.nu/whatsanisbn.html">page</a></strong> that explains to my isbn.nu users precisely what's going on. It might be too technical/industry for them.
</p>]]>
        
    </content>
</entry>
<entry>
    <title>Get Ready for ISBN-13 and 979</title>
    <link rel="alternate" type="text/html" href="http://blog.bookinfo.info/archives/2006/10/get_ready_for_i.html" />
    <link rel="service.edit" type="application/atom+xml" href="https://db.isbn.nu/mt3/mt-atom.cgi/weblog/blog_id=13/entry_id=7035" title="Get Ready for ISBN-13 and 979" />
    <id>tag:blog.bookinfo.info,2006://13.7035</id>
    
    <published>2006-10-09T05:44:41Z</published>
    <updated>2006-10-09T05:31:05Z</updated>
    
    <summary> We&apos;re rapidly approaching January 1, 2007, when two important developments in the history of ISBN occur. First, the 10-digit ISBN (known now as ISBN-10) will no longer be the standard for use. R.R. Bowker, which coordinates ISBNs in the US, says that ISBN-10s can be phased out from use on books starting on January 1. It&apos;s not obligatory to get rid of them, but it will make less and less sense, because of point 2. Second, the use of ISBN-13s that start with 979 will come into being. Up until Dec. 31, 2006, ISBN-10s have been freely convertible into EAN UPCs, or the global barcodes used to identify national origin and product stockkeeping unit (SKU) numbers. ISBN-10 can be converted by taking the digits 978 plus the first nine digits of the ISBN-10 and computing a new base-10 checksum for digit 13. (ISBNs use base 11, where the 10th digit can be zero through nine, or X representing 10 in base 11.) With the introduction of 979s, not all ISBN-13s will be convertible into ISBN-10s; only 978-prepended ISBN-13s have a corresponding ISBN-10. The reason for this is numberspace. They&apos;ve run out of ISBNs in the current 10-digit space, and...</summary>
    <author>
        <name>Glenn Fleishman</name>
        <uri>http://glennf.com</uri>
    </author>
    
    <content type="html" xml:lang="en" xml:base="http://blog.bookinfo.info/">
        <![CDATA[<p>
We're rapidly approaching January 1, 2007, when two important developments in the history of ISBN occur.
</p><p>
First, the 10-digit ISBN (known now as ISBN-10) will no longer be the standard for use. R.R. Bowker, which coordinates ISBNs in the US, <strong><a href="http://www.isbn.org/standards/home/isbn/transition.asp">says that ISBN-10s can be phased out</a></strong> from use on books starting on January 1. It's not obligatory to get rid of them, but it will make less and less sense, because of point 2.
</p><p>
Second, the use of ISBN-13s that start with 979 will come into being. Up until Dec. 31, 2006, ISBN-10s have been freely convertible into EAN UPCs, or the global barcodes used to identify national origin and product stockkeeping unit (SKU) numbers. ISBN-10 can be converted by taking the digits 978 plus the first nine digits of the ISBN-10 and computing a new modulo-10 check digit for digit 13. (ISBN-10s use a modulo-11 check digit, where the 10th digit can be zero through nine, or X representing 10.)
</p><p>
With the introduction of 979s, not all ISBN-13s will be convertible into ISBN-10s; only 978-prepended ISBN-13s have a corresponding ISBN-10.
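</p><p>
As a minimal sketch of the conversion described above, here is one way it could be written in Python. The function names are illustrative assumptions rather than anything from isbn.nu's code, and the input is assumed to be a well-formed ISBN (no validation is attempted).
</p>
```python
def isbn10_check_digit(first_nine):
    """Modulo-11 check character for the first nine digits of an ISBN-10."""
    # Weights run from 10 down to 2 across the nine digits.
    total = sum(int(d) * w for d, w in zip(first_nine, range(10, 1, -1)))
    value = (11 - total % 11) % 11
    return "X" if value == 10 else str(value)


def isbn13_check_digit(first_twelve):
    """Modulo-10 (EAN-13) check digit for the first twelve digits."""
    # Weights alternate 1, 3, 1, 3, ... starting from the left.
    total = sum(int(d) * (3 if i % 2 else 1) for i, d in enumerate(first_twelve))
    return str((10 - total % 10) % 10)


def isbn10_to_isbn13(isbn10):
    """Prefix 978 to the first nine digits and recompute the check digit."""
    core = "978" + isbn10.replace("-", "")[:9]
    return core + isbn13_check_digit(core)


def isbn13_to_isbn10(isbn13):
    """Return the ISBN-10, or None when the prefix is not 978 (for example 979)."""
    digits = isbn13.replace("-", "")
    if not digits.startswith("978"):
        return None
    first_nine = digits[3:12]
    return first_nine + isbn10_check_digit(first_nine)
```
<p>
Calling <code>isbn13_to_isbn10</code> on any 979-prefixed number returns <code>None</code>, which is the one-way street described above: every ISBN-10 has an ISBN-13, but not the reverse.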
</p><p>
The reason for this is numberspace. They've run out of ISBNs in the current 10-digit space, and by adding another EAN prefix, they buy 10 to the 9th power potential new ISBNs. These are assigned in blocks to publishers, based on the publisher's scale and country of operation, and thus the numbers are used inefficiently. Certain ranges are reserved, as well.
</p><p>
I wrote about this at some length and less clarity <strong><a href="http://blog.bookinfo.info/archives/2004/10/lucky_number_13.html">two years ago</a></strong>. The deadline is finally upon us!
</p>]]>
        
    </content>
</entry>
<entry>
    <title>The Chunky Problem</title>
    <link rel="alternate" type="text/html" href="http://blog.bookinfo.info/archives/2005/11/the_chunky_prob.html" />
    <link rel="service.edit" type="application/atom+xml" href="https://db.isbn.nu/mt3/mt-atom.cgi/weblog/blog_id=13/entry_id=6049" title="The Chunky Problem" />
    <id>tag:blog.bookinfo.info,2005://13.6049</id>
    
    <published>2005-11-13T18:05:59Z</published>
    <updated>2005-11-24T04:21:11Z</updated>
    
    <summary> Back in October 2004, I promised to write about three problems of book information: The New, Bad Information Problem (information hygiene over time); The Scottish Problem (normalization errors); and the Chunky Problem. I wrote about the other two back then; here&apos;s my exegesis on chunkiness. The Chunky Problem relates to how people conceptualize information and then how that information is represented in databases and on Web sites, to name two concrete instantiations. Despite decades of analogy between the psychic construct that is the mind and card catalogs, computer filing systems, and so forth, it&apos;s clear from equally long stretches of research that people represent information in chunks that interrelate. Thus, even someone who can remember 1000 discrete ISBNs or phone numbers isn&apos;t representing that information with one neuron per digit. Rather, there&apos;s a holistic standing wave in which information flows as a basis of triggers and relationship that allows us to contain these facts. You wouldn&apos;t know this to look at most Web sites that present information of any kind about books, media, or, really any data that&apos;s available in masses larger than a few. Web sites typically represent data as stored in discrete records, and offer very flat...</summary>
    <author>
        <name>Glenn Fleishman</name>
        <uri>http://glennf.com</uri>
    </author>
    
    <content type="html" xml:lang="en" xml:base="http://blog.bookinfo.info/">
        <![CDATA[<p>
Back in October 2004, I promised to write about three problems of book information: <strong><a href="http://blog.bookinfo.info/archives/2004/10/new_bad_informa.html">The New, Bad Information Problem</a></strong> (information hygiene over time); <strong><a href="http://blog.bookinfo.info/archives/2004/10/if_it_isnt_scot.html">The Scottish Problem</a></strong> (normalization errors); and the Chunky Problem. I wrote about the other two back then; here's my exegesis on chunkiness.
</p><p>
The Chunky Problem relates to how people conceptualize information and then how that information is represented in databases and on Web sites, to name two concrete instantiations. Despite decades of analogy between the psychic construct that is the mind and card catalogs, computer filing systems, and so forth, it's clear from equally long stretches of research that people represent information in chunks that interrelate. Thus, even someone who can remember 1000 discrete ISBNs or phone numbers isn't representing that information with one neuron per digit. Rather, there's a holistic standing wave in which information flows as a basis of triggers and relationship that allows us to contain these facts.
</p><p>
You wouldn't know this to look at most Web sites that present information of any kind about books, media, or, really, any data that's available in masses larger than a few. Web sites typically represent data as stored in discrete records, and offer very flat databases. A flat database lacks relationships: each record is defined uniquely by one or more fields, and all of the information in the record is fielded. There's always an entry (even null) for each field. A relational database allows richer information by creating virtual objects: a list of attributes in one table might be assigned via another table to an object that resides in a third table which uniquely defines it.
</p><p>
For instance, a book might be a library edition paperback with an embossed cover. One table stores book attributes: paperback, hardcover, turtlebound, etc. Another table stores a list of books by ISBN. The joining table then pairs the unique ISBN with the attributes to create an object.
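</p><p>
The three-table arrangement can be sketched with Python's built-in sqlite3 module. The table names, column names, and the placeholder ISBN 0000000000 are all assumptions made up for illustration; this is not the isbn.nu schema.
</p>
```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE attributes (attr_id INTEGER PRIMARY KEY, name TEXT UNIQUE);
    CREATE TABLE books (isbn TEXT PRIMARY KEY, title TEXT);
    -- The joining table pairs an ISBN with each attribute that applies to it.
    CREATE TABLE book_attributes (
        isbn TEXT REFERENCES books(isbn),
        attr_id INTEGER REFERENCES attributes(attr_id),
        PRIMARY KEY (isbn, attr_id)
    );
""")

conn.executemany(
    "INSERT INTO attributes (name) VALUES (?)",
    [("paperback",), ("hardcover",), ("library edition",), ("embossed cover",)],
)
# Placeholder ISBN and title, not a real book.
conn.execute("INSERT INTO books VALUES (?, ?)", ("0000000000", "Example Title"))
for name in ("paperback", "library edition", "embossed cover"):
    conn.execute(
        "INSERT INTO book_attributes SELECT ?, attr_id FROM attributes WHERE name = ?",
        ("0000000000", name),
    )


def attributes_for(isbn):
    """Assemble the 'object' for one ISBN by joining through the pairing table."""
    rows = conn.execute(
        "SELECT a.name FROM book_attributes ba "
        "JOIN attributes a ON a.attr_id = ba.attr_id "
        "WHERE ba.isbn = ? ORDER BY a.name",
        (isbn,),
    )
    return [name for (name,) in rows]
```
<p>
Adding a new attribute such as large print means one new row in the attributes table, with no change to the books table at all.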
</p><p>
XML is quite exciting to most people involved in structuring information because it allows the definition of an object-oriented structure and attributes that adhere to it without requiring the rigid formalized storage of a database. In an XML file, you can create objects explicitly, producing human-readable and machine-parseable results that represent data in a simpler and substantially more powerful method.
</p><p>
Now The Chunky Problem is that most media data is in very flat form. In fact, when I license data or review data for license for media like books and movies (in DVD and other form), I find that there is often neither table structure nor normalization. Rather than have a table of actors with attributes (like a biographic sketch) and tie those by reference using a unique identifier to a table of SKUs (individual movie UPC codes, for instance), it's just a long list with repetitive information all organized uniquely by SKU.
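</p><p>
To make the contrast concrete, here is a rough Python sketch that takes flat, SKU-keyed rows and splits them into an actor table, a SKU table, and a table of references between them. The field names and sample rows are invented for illustration, and the sketch assumes an actor's name is enough to identify the actor, which real data would not guarantee.
</p>
```python
# Flat input: the same actor and biography repeated on every SKU row.
flat_rows = [
    {"sku": "SKU-1", "title": "Movie A", "actor": "Actor One", "bio": "Bio of Actor One."},
    {"sku": "SKU-2", "title": "Movie B", "actor": "Actor One", "bio": "Bio of Actor One."},
    {"sku": "SKU-2", "title": "Movie B", "actor": "Actor Two", "bio": "Bio of Actor Two."},
]


def normalize(rows):
    """Split flat rows into actors, SKUs, and (sku, actor_id) references."""
    actors = {}      # actor_id -> {"name": ..., "bio": ...}
    actor_ids = {}   # actor name -> actor_id (assumes names are unique)
    skus = {}        # sku -> title
    credits = set()  # (sku, actor_id) pairs
    for row in rows:
        name = row["actor"]
        if name not in actor_ids:
            actor_id = len(actor_ids) + 1
            actor_ids[name] = actor_id
            actors[actor_id] = {"name": name, "bio": row["bio"]}
        skus[row["sku"]] = row["title"]
        credits.add((row["sku"], actor_ids[name]))
    return actors, skus, sorted(credits)
```
<p>
After normalizing, each biography is stored once, and correcting it fixes every SKU that refers to that actor.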
</p><p>
Since information wants to be chunked, or so I think, I have spent nearly a decade working on systems--mentally and in actual code--that rework flat information into rich relationships that can be represented chunkily to an end user.
</p><p>
Thus, my long-standing example of The Wizard of Oz. There is a work of fiction called The Wizard of Oz, and it is instantiated as many, many things, which include dozens of ISBNs, dozens of books that precede the days of ISBNs and are out of print, collections in which it appears as an item in a larger compendium, essays about it, and parodies.
</p><p>
At the chunkiest level, you have the work, The Wizard of Oz. Move down a chunk and you're into categories of things: books, movies, parodies. Move into the books chunk, because you're thinking "I want to buy the book, Wizard of Oz," and you find the sea of types: paperback, hardcover, large print. Finally, we can delve into a specific SKU or ISBN.
</p><p>
There are ways to short circuit this, too. Imagine the chunky statement, "I want to buy a new copy of the latest edition of the book Wizard of Oz." Zoom down through and the user is on that precise book's offering page with details. Or, "find me a collection in print of The Wizard of Oz and Ozma of Oz." A system that's chunky knows about those two works, and can track those works as uniquely identified items across containers, which are instantiated editions. Thus, the system knows that ISBN Z is a container of unique work Wizard of Oz and unique work Ozma of Oz.
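</p><p>
The works-in-containers idea might be modeled along these lines in Python. The class names, fields, and the identifiers ISBN-X, ISBN-Y, and ISBN-Z are placeholders invented for this sketch, not real ISBNs or an actual isbn.nu data model.
</p>
```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Work:
    """The chunkiest level: the work itself, independent of any edition."""
    work_id: int
    title: str


@dataclass
class Edition:
    """A container: one instantiated edition holding one or more works."""
    isbn: str
    binding: str
    in_print: bool
    works: frozenset = field(default_factory=frozenset)  # work_ids it contains


WIZARD = Work(1, "The Wizard of Oz")
OZMA = Work(2, "Ozma of Oz")

EDITIONS = [
    Edition("ISBN-X", "paperback", True, frozenset({WIZARD.work_id})),
    Edition("ISBN-Y", "hardcover", False, frozenset({WIZARD.work_id, OZMA.work_id})),
    Edition("ISBN-Z", "paperback", True, frozenset({WIZARD.work_id, OZMA.work_id})),
]


def collections_in_print(editions, *works):
    """Find in-print editions that contain every one of the requested works."""
    wanted = {w.work_id for w in works}
    return [e.isbn for e in editions if e.in_print and wanted.issubset(e.works)]
```
<p>
With this sample data, asking for an in-print collection of both works skips ISBN-X (only one work) and ISBN-Y (out of print) and lands on ISBN-Z.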
</p><p>
Chunkiness works as a way to share information across objects, too. A review of Wizard of Oz as a work should adhere at the chunkiest level. A review of a particular newly edited edition that might appear as several ISBNs is much less chunky. Chunkiness needs to adhere to objects at various scales.
</p><p>
There's a great little joke about a professor who brings a large glass jar into a class along with a bin of rocks, a bin of pebbles, and a container of sand. He pours rocks into the jar and asks the class if it's full. They agree it is. Then he shakes in pebbles. Is it full now? Yes. Then he pours in sand. Now? Yes. Then (in some versions) a couple of beers. It's apparently a metaphor for life. (The moral in <a href="http://www.google.com/search?q=sand+rocks+jar+professor&amp;sourceid=mozilla-search&amp;start=0&amp;start=0&amp;ie=utf-8&amp;oe=utf-8&amp;client=firefox-a&amp;rls=org.mozilla:en-US:official">many tellings</a>: there's always room for a couple of beers.)
</p><p>
The Chunky Problem looks at this as the reverse: it used to be when you went to a bookstore online, the whole jar was full of sand. Over time, it became full of pebbles. Now it's rocks, pebbles, and sand. I'd rather view it this way: the jar contains rocks. The rocks, when broken down, turn into pebbles. The pebbles, when pulverized, are sand. But unlike physical objects, you can go from a jar to a grain of sand in one step, and the process is reversible at any time.
</p><p>
Most product sites would benefit from getting chunky.
</p>]]>
        
    </content>
</entry>
<entry>
    <title>The Problem of Persistence</title>
    <link rel="alternate" type="text/html" href="http://blog.bookinfo.info/archives/2005/11/the_problem_of.html" />
    <link rel="service.edit" type="application/atom+xml" href="https://db.isbn.nu/mt3/mt-atom.cgi/weblog/blog_id=13/entry_id=6047" title="The Problem of Persistence" />
    <id>tag:blog.bookinfo.info,2005://13.6047</id>
    
    <published>2005-11-12T21:37:26Z</published>
    <updated>2005-11-24T04:21:11Z</updated>
    
    <summary> I&apos;ve left this blog alone for far too long, so I come back with The Problem of Persistence, which is another element in my series of problems relating to structuring data, which include The New, Bad Information Problem and The Scottish Problem. The problem with persistence relates to creating objects that represent a particular set of ideas that have a persistence as that object over time. Good example: The K&amp;#246;chel or K Numbers assigned to Mozart&apos;s works. Mozart himself didn&apos;t number or catalog his own oeuvre, but a 19th century intellectual did. The works are numbered by appearance and the numbers have been used widely to refer to specific works. The numbers don&apos;t change, even if additional works are discovered between existing numbers, to avoid destroying decades of research and publishing. The numbers are persistent object identifiers to which many attributes are attached. In Bookland, our happy 978 and 979 UPC code prefix world, there&apos;s no such animal. I am finding there is no such animal among many realms of information in which there are discrete entities that have members but not a persistent and consistent method of numbering them. In developing a new DVD and video price comparison...</summary>
    <author>
        <name>Glenn Fleishman</name>
        <uri>http://glennf.com</uri>
    </author>
            <category term="Normalization" />
    
    <content type="html" xml:lang="en" xml:base="http://blog.bookinfo.info/">
        <![CDATA[<p>
I've left this blog alone for far too long, so I come back with <strong>The Problem of Persistence</strong>, which is another element in my series of problems relating to structuring data, which include <strong>The New, Bad Information Problem</strong> and <strong>The Scottish Problem</strong>.
</p><p>
The problem with persistence relates to creating objects that represent a particular set of ideas that have a persistence as that object over time. Good example: The K&#246;chel or K Numbers assigned to Mozart's works. Mozart himself didn't number or catalog his own oeuvre, but a 19th century intellectual did. The works are numbered by appearance and the numbers have been used widely to refer to specific works. The numbers don't change, even if additional works are discovered between existing numbers, to avoid destroying decades of research and publishing. The numbers are persistent object identifiers to which many attributes are attached.
</p><p>
In Bookland, our happy 978 and 979 UPC code prefix world, there's no such animal. I am finding there is no such animal among many realms of information in which there are discrete entities that have members but not a persistent and consistent method of numbering them.
</p><p>
In developing a new DVD and video price comparison service that I hope to launch later this year or in early 2006, I worked with a programmer to develop persistence in the kind of unique work objects that I create for isbn.nu every time the database is regenerated.
</p><p>
The idea is simple, when we finally figured it out: You have to build a structure in which the information present in the structure is authoritative when a record exists. New information is always considered less authoritative for specific values in an object, and reconciliation can happen or a queue of details to reconcile can be built based on the manner in which a particular new detail is out of sync with existing records.
</p><p>
Practically speaking, this means you prime the pump of a database on, say, movies with a good authoritative and deep source and then choose specific values that are unique to become authoritative, such as a director's name (even the spelling) or a movie production's title, like "Star Wars: Episode I." When new information arrives, it's checked by SKU against existing information. If the authoritative details vary they are either ignored or stuck into a queue for review--it's possible that the SKU had the wrong title, for instance. In either case, non-authoritative details, such as the movie's length or the DVD region encoding, are updated to whatever the new information provides.
</p><p>
This isn't the full depth of hygiene and reconciliation I'd like, but it means that once an object is created and numbered for a unique movie production such as "Pride and Prejudice" as produced in 2005, that object can retain the same ID number or other code permanently. All other details will accrue to or revolve around that initial unique movie production information.
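</p><p>
The reconciliation rule can be sketched in a few lines of Python. Which fields count as authoritative, the record layout, and the shape of the review queue are all assumptions made for this illustration; the sketch always queues conflicts for review rather than ignoring them.
</p>
```python
# Assumed choice of authoritative fields; everything else is freely updated.
AUTHORITATIVE = {"title", "director"}


def reconcile(existing, incoming, review_queue):
    """Merge one incoming record into the stored records, keyed by SKU."""
    sku = incoming["sku"]
    record = existing.get(sku)
    if record is None:
        # First sighting of this SKU: store it, and it becomes authoritative.
        existing[sku] = dict(incoming)
        return
    for key, value in incoming.items():
        if key in AUTHORITATIVE:
            if record.get(key) != value:
                # Conflict on an authoritative field: keep the stored value
                # and queue the difference for a human to review.
                review_queue.append((sku, key, record.get(key), value))
        else:
            # Non-authoritative details take whatever the new data provides.
            record[key] = value
```
<p>
Feeding the same SKU twice with a differently spelled title leaves the stored title untouched, adds one entry to the review queue, and still picks up a changed running length or region code.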
</p><p>
I'll write more about this as the system rolls out, as it will for both the new movie site and for my existing isbn.nu book site.
</p><p>
Remarkable coincidence: This David Weinberger piece about the issues of metadata and book identification and sub-book (chapters, etc.) identification <strong><a href="http://www.boston.com/ae/books/articles/2005/11/13/crunching_the_metadata/">appears today</a></strong> in the Boston Globe.
</p>]]>
        
    </content>
</entry>
<entry>
    <title>Why Certain Books Are Popular Searches</title>
    <link rel="alternate" type="text/html" href="http://blog.bookinfo.info/archives/2005/02/why_certain_boo.html" />
    <link rel="service.edit" type="application/atom+xml" href="https://db.isbn.nu/mt3/mt-atom.cgi/weblog/blog_id=13/entry_id=4767" title="Why Certain Books Are Popular Searches" />
    <id>tag:blog.bookinfo.info,2005://13.4767</id>
    
    <published>2005-02-02T03:48:59Z</published>
    <updated>2005-11-24T04:21:11Z</updated>
    
    <summary> I started seeing lions everywhere. Okay, not everywhere. They were just among the topics of the top book searches at isbn.nu. I have a page that shows me and anyone else who cares the most recent 10 searches and (since I added this a few days ago) the top 10 searches since the database for popularity was reset. This has helped me understand how the Internet works a little better, as I can explain the source of the popularity of some of these links. How to Have Sex in the Woods. Many people search Yahoo asking &quot;how to have sex&quot;--there&apos;s nothing about the woods in there at all. The book price page on my site is the #11 answer to this question. 101 Ways to Promote Your Web Site. One spammer trick is to flood your site with referrals. They hope that you either review or publish your statistics, thus increasing their Google Whuffie. This book was highly &quot;promoted&quot; by get-rich-quick sites through referral spam. It doesn&apos;t have anything to do with the quality or nature of the book; I suspect it&apos;s just a random link choice. Dollar Bill Origami. Sounds like a cool book, but folks arrive here...</summary>
    <author>
        <name>Glenn Fleishman</name>
        <uri>http://glennf.com</uri>
    </author>
    
    <content type="html" xml:lang="en" xml:base="http://blog.bookinfo.info/">
        <![CDATA[<p>
I started seeing lions everywhere.
</p><p>
Okay, not everywhere. They were just among the topics of the top book searches at isbn.nu. I have a <a href="http://isbn.nu/recent.html">page</a> that shows me and anyone else who cares the most recent 10 searches and (since I added this a few days ago) the top 10 searches since the database for popularity was reset. 
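</p><p>
A page like that needs very little bookkeeping. The following Python sketch is only a guess at the general shape, with invented class and method names, and says nothing about how isbn.nu actually stores its search statistics.
</p>
```python
from collections import Counter, deque


class SearchLog:
    """Track the most recent searches and the most popular ones since a reset."""

    def __init__(self, recent_size=10):
        self.recent = deque(maxlen=recent_size)  # oldest entries fall off
        self.counts = Counter()

    def record(self, isbn):
        self.recent.append(isbn)
        self.counts[isbn] += 1

    def most_recent(self):
        return list(reversed(self.recent))  # newest first

    def top(self, n=10):
        return self.counts.most_common(n)

    def reset_popularity(self):
        # Clears the popularity counts but leaves the recent list alone.
        self.counts.clear()
```
<p>
The recent list and the popularity counts are kept separately, so resetting one does not disturb the other.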
</p><p>
This has helped me understand how the Internet works a little better, as I can explain the source of the popularity of some of these links.
</p><p>
<strong><a href="http://isbn.nu/0609804022/price">How to Have Sex in the Woods</a></strong>. Many people search Yahoo asking "<a href="http://search.yahoo.com/search?p=how+to+have+sex&#38;fr=F%0AP-tab-web-t&#38;toggle=1&#38;ei=UTF-8">how to have sex</a>"--there's nothing about the woods in there at all. The book price page on my site is the #11 answer to this question.
</p><p>
<strong><a href="http://isbn.nu/188506845X/price">101 Ways to Promote Your Web Site</a></strong>. One spammer trick is to flood your site with referrals. They hope that you either review or publish your statistics, thus increasing their Google <a href="http://en.wikipedia.org/wiki/Whuffie">Whuffie</a>. This book was highly "promoted" by get-rich-quick sites through referral spam. It doesn't have anything to do with the quality or nature of the book; I suspect it's just a random link choice.
</p><p>
<strong><a href="http://isbn.nu/0486429822/price">Dollar Bill Origami</a></strong>. Sounds like a cool book, but folks arrive here because they're searching on <a href="http://www.google.com/search?hl=en&#38;q=dollar+bill+origami&#38;btnG=Google+Search">dollar bill origami</a> over at Google. I did not know so many people were interested in folding dollar bills.
</p><p>
<strong><a href="http://isbn.nu/0778718964/price">Endangered Tigers</a></strong>. Google <a href="http://www.google.com/search?hl=en&#38;q=%22endangered+tigers%22&#38;btnG=Google+Search">thinks</a> I know something about this topic.
</p><p>
<strong><a href="http://isbn.nu/0131423436/price">Rapid Application Development with Mozilla</a></strong>. A <a href="http://www.ilrt.bris.ac.uk/discovery/rdf/resources/">single page</a> on RDF links to where to find prices for one book. And it provokes dozens of clickthroughs. Must be a popular page.
</p><p>
<strong><a href="http://isbn.nu/0321168909/price">Quarkexpress 6: For Print and Web Design</a></strong>. The program name is misspelled in this iteration of the book title in my database. It should be QuarkXPress. That extra 'e' in the middle means that Google points folks to me whenever they mis-search on Quark's flagship program.
</p><p>
<strong><a href="http://isbn.nu/0736809643/price">Lions: Life in the Pride</a></strong> and <strong><a href="http://isbn.nu/089686328X/price">The African Lion</a></strong>: This one flummoxed me until I found that these two books and links to my site were used as part of an example in "<a href="http://www.w3.org/TR/swbp-classes-as-values/">Representing Classes As Property Values on the Semantic Web</a>." I wrote the author telling her how amusing I thought this random traffic was! She was apologetic, and I said I did not mind: the more people that find my site, the better.
</p><p>
Finally, an old favorite, Susie Bright's book <strong><a href="http://isbn.nu/156025551X/price">Mommy's Little Girl: On Sex, Motherhood, Porn, and Cherry Pie</a></strong>. She's a fantastic writer. But why me? (Or rather, why isbn.nu?) The answer is, unfortunately, horrible. Searches on "little girl porn" lead one to my price service's doorstep. I feel like putting up a custom page for those referrers--not for this book--"Y'oughta be ashamed of yourselves, pervs!"
</p>]]>
        
    </content>
</entry>
<entry>
    <title>I Still Love Meta-Information</title>
    <link rel="alternate" type="text/html" href="http://blog.bookinfo.info/archives/2005/01/i_still_love_me.html" />
    <link rel="service.edit" type="application/atom+xml" href="https://db.isbn.nu/mt3/mt-atom.cgi/weblog/blog_id=13/entry_id=4683" title="I Still Love Meta-Information" />
    <id>tag:blog.bookinfo.info,2005://13.4683</id>
    
    <published>2005-01-12T18:42:46Z</published>
    <updated>2005-11-24T04:21:11Z</updated>
    
    <summary> I haven&apos;t posted anything for months to this blog while dealing with piles of writing, research, and baby holding. But I&apos;m still thinking about the bigger issues of book information information. I&apos;m starting to think more about how to build a Wiki-based repository of book details. There are a huge number of problems with such a project, but I do know how to seed the data. The Library of Congress&apos;s Cataloguing Distribution Service resells the vast data that the LOC has compiled on books, which include their holdings and general catalog information. The information is not copyrighted (per se) for use in the U.S. U.S. citizens have certain copyright interests when it comes to information created by the government. NASA, for instance, holds the copyright for images it takes, but releases them for use by U.S. citizens without advance or specific permission or licensing fees. The LOC, NASA, and other agencies can charge cost-recovery fees for distributing and maintaining data, which is why the CDS charges many tens of thousands of dollars for a complete set and subscription. This information is available from other sources. Some libraries and other groups have purchased, say, Books All, the full set...</summary>
    <author>
        <name>Glenn Fleishman</name>
        <uri>http://glennf.com</uri>
    </author>
    
    <content type="html" xml:lang="en" xml:base="http://blog.bookinfo.info/">
        <![CDATA[<p>
I haven't posted anything for months to this blog while dealing with piles of writing, research, and baby holding. But I'm still thinking about the bigger issues of book information information.
</p><p>
I'm starting to think more about how to build a Wiki-based repository of book details. There are a huge number of problems with such a project, but I do know how to seed the data. The Library of Congress's <a href="http://www.loc.gov/cds/">Cataloguing Distribution Service</a> resells the vast data that the LOC has compiled on books, which include their holdings and general catalog information. The information is not copyrighted (per se) for use in the U.S. U.S. citizens have certain copyright interests when it comes to information created by the government. NASA, for instance, holds the copyright for images it takes, but releases them for use by U.S. citizens without advance or specific permission or licensing fees. The LOC, NASA, and other agencies can charge cost-recovery fees for distributing and maintaining data, which is why the CDS charges many tens of thousands of dollars for a complete set and subscription.
</p><p>
This information is available from other sources. Some libraries and other groups have purchased, say, Books All, the full set of book data, and could re-distribute it for free to a Wiki-bibliographic project. (Wikibib? Wikilibris?)
</p><p>
If the project were set up correctly, there would be an API that would allow publishers and other parties to import massive amounts of data in the right format into the project, too. So there would be many inputs that would have to be monitored and contended with.
</p><p>
As I've written previously, corrections are a particularly troubling problem. Authoritativeness has to be an aspect of correcting information, but who decides who is authoritative? That's a problem that the Wikipedia contends with (some say) or thrives upon (others say).
</p>]]>
        
    </content>
</entry>
<entry>
    <title>Lucky Number 13</title>
    <link rel="alternate" type="text/html" href="http://blog.bookinfo.info/archives/2004/10/lucky_number_13.html" />
    <link rel="service.edit" type="application/atom+xml" href="https://db.isbn.nu/mt3/mt-atom.cgi/weblog/blog_id=13/entry_id=4387" title="Lucky Number 13" />
    <id>tag:blog.bookinfo.info,2004://13.4387</id>
    
    <published>2004-10-25T19:49:57Z</published>
    <updated>2005-11-24T04:21:11Z</updated>
    
    <summary>The Millions (A Blog About Books) writes about the 13-digit ISBN transition. By 2007, the 10-digit ISBN (nine real information digits and one checksum digit) will be replaced by a 13-digit ISBN (3-digit prefix, 9 info digits, 1 checksum digit), which will be identical to the current 13-digit BookLand EAN, except that newly assigned numbers will start with 979 instead of 978. C. Max Magee explains that there will be some real transition issues for smaller booksellers who have to convert legacy systems. On the other hand, I&apos;ve been told by a lot of the bookseller aggregator sites--ones like Alibris and ABEBooks who entirely or largely list books from independent booksellers--that most of the stores use one of a few inventory management packages. (Hey, so all the online booksellers have to move to using EANs primarily, too; some allow them now.) The 978 prefix, which identifies a mythical country in the EAN worldspace called BookLand, allowed ISBNs to be freely converted into internationally compatible EAN systems. More confusingly, the US has been using a 12-digit UPC code in retailing which is transitioning to the full 13 digits for better globalization. (Mark of the Beast theorists, start your biblical engines!) It&apos;s an expansion of the namespace...</summary>
    <author>
        <name>Glenn Fleishman</name>
        <uri>http://glennf.com</uri>
    </author>
    
    <content type="html" xml:lang="en" xml:base="http://blog.bookinfo.info/">
        <![CDATA[<p>The Millions (A Blog About Books) <a href="http://www.realisticrecords.net/themillions/2004/10/crazy-insane-isbn-newsthis-will.html">writes about the 13-digit ISBN transition</a>. By 2007, the 10-digit ISBN (nine real information digits and one checksum digit) will be replaced by a 13-digit ISBN (3-digit prefix, 9 info digits, 1 checksum digit), which will be identical to the current 13-digit BookLand EAN, except that newly assigned numbers will start with 979 instead of 978.</p>

<p>C. Max Magee explains that there will be some real transition issues for smaller booksellers who have to convert legacy systems. On the other hand, I've been told by a lot of the bookseller aggregator sites--ones like Alibris and ABEBooks who entirely or largely list books from independent booksellers--that most of the stores use one of a few inventory management packages. (Hey, so all the online booksellers have to move to using EANs primarily, too; some allow them now.)</p>

<p>The 978 prefix, which identifies a mythical country in the EAN worldspace called BookLand, allowed ISBNs to be freely converted into internationally compatible EAN systems. More confusingly, the US has been using a 12-digit UPC code in retailing which is transitioning to the full 13 digits for better globalization.</p>

<p>(Mark of the Beast theorists, start your biblical engines!)</p>

<p>It's an expansion of the namespace for ISBNs, too, because all existing ISBNs will be honored in the 978-prefix namespace forever. Publishers with existing ISBN inventories can continue to use them by converting them into the 978 system. New ISBNs will be assigned as a complete EAN number starting with 979 and will not be convertible back to the old 10-digit system. This opens 1,000,000,000 new ISBNs for assignment, just incidentally, since the two namespaces (978 and 979) will be independent.</p>
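
<p>For the curious, here is a minimal sketch in Python of the 10-to-13 conversion, assuming the standard EAN-13 check digit rule (weights alternating 1 and 3, left to right). The function names are my own invention for illustration, not anything from the ISBN agency.</p>

```python
def isbn13_check_digit(first12):
    """Return the EAN-13 check digit for a string of 12 digits."""
    # Weights alternate 1, 3, 1, 3, ... starting from the leftmost digit.
    total = sum(int(d) * (3 if i % 2 else 1) for i, d in enumerate(first12))
    return str((10 - total % 10) % 10)


def isbn10_to_isbn13(isbn10, prefix="978"):
    """Convert a 10-digit ISBN into its 13-digit form under the given prefix."""
    chars = isbn10.replace("-", "").replace(" ", "").upper()
    if len(chars) != 10:
        raise ValueError("expected a 10-character ISBN")
    body = chars[:9]  # the old check digit (possibly X) is dropped
    if not body.isdigit():
        raise ValueError("the first nine characters must be digits")
    first12 = prefix + body
    return first12 + isbn13_check_digit(first12)


print(isbn10_to_isbn13("156025551X"))
```

<p>The old check digit is thrown away and recomputed because the two systems use different checksum arithmetic. Going the other way only works for numbers in the 978 namespace.</p>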

<p>Here's the U.S. ISBN authority's <a href="http://www.isbn.org/standards/home/isbn/transition.asp">explanation</a>.</p>

<p>Magee's site is worth reading for any inside-baseball bibliophile.</p>]]>
        
    </content>
</entry>
<entry>
    <title>Whither Onix?</title>
    <link rel="alternate" type="text/html" href="http://blog.bookinfo.info/archives/2004/10/whither_onix.html" />
    <link rel="service.edit" type="application/atom+xml" href="https://db.isbn.nu/mt3/mt-atom.cgi/weblog/blog_id=13/entry_id=4361" title="Whither Onix?" />
    <id>tag:blog.bookinfo.info,2004://13.4361</id>
    
    <published>2004-10-18T18:35:07Z</published>
    <updated>2005-11-24T04:21:11Z</updated>
    
    <summary>Is anyone using Onix, an attempt to standardize fielded data for book information in an XML schema and DTD? The site appears to be stuck in 2001 except for what one blog reader noted: the 2.1 specification was released in July 2004. So I ask: is anyone using Onix? And Onix doesn&apos;t solve normalization or authority, just data fielding and formatting....</summary>
    <author>
        <name>Glenn Fleishman</name>
        <uri>http://glennf.com</uri>
    </author>
            <category term="Normalization" />
    
    <content type="html" xml:lang="en" xml:base="http://blog.bookinfo.info/">
        <![CDATA[<p>Is anyone using <a href="http://www.bisg.org/onix/index.html">Onix</a>, an attempt to standardize fielded data for book information in an XML schema and DTD? The site appears to be stuck in 2001 except for what one blog reader noted: the 2.1 specification was released in July 2004.</p>

<p>So I ask: is anyone using Onix? </p>

<p>And Onix doesn't solve normalization or authority, just data fielding and formatting.</p>]]>
        
    </content>
</entry>
<entry>
    <title>Suggestions</title>
    <link rel="alternate" type="text/html" href="http://blog.bookinfo.info/archives/2004/10/suggestions.html" />
    <link rel="service.edit" type="application/atom+xml" href="https://db.isbn.nu/mt3/mt-atom.cgi/weblog/blog_id=13/entry_id=4333" title="Suggestions" />
    <id>tag:blog.bookinfo.info,2004://13.4333</id>
    
    <published>2004-10-12T23:17:56Z</published>
    <updated>2005-11-24T04:21:11Z</updated>
    
    <summary>I&apos;ve already received some interesting suggestions of topics to cover here, including the new transition to full UPC code in the U.S.--13 digits instead of 12--and how a Wiki-biblio-o-pedia might be expanded to include other media, like DVDs and music. I have to say that I&apos;m pretty excited about the notion of building a Wiki sort of informational site. The Wikipedia itself is a shining example of how to manage such a project, and because facts about books aren&apos;t copyrighted, there&apos;s very little risk of having people contribute tainted data. In fact, depending on how well the project is structured, some online bookstores might provide a core of data to start with in the project in order to get the benefit of collaborative error fixing. I know as an author it frustrates me that there&apos;s no good way for me to say to the world of bookselling, here&apos;s the definitive, authoritative, author-a-tative version of my book&apos;s details....</summary>
    <author>
        <name>Glenn Fleishman</name>
        <uri>http://glennf.com</uri>
    </author>
            <category term="Site Information" />
    
    <content type="html" xml:lang="en" xml:base="http://blog.bookinfo.info/">
        <![CDATA[<p>I've already received some interesting suggestions of topics to cover here, including the new transition to full UPC code in the U.S.--13 digits instead of 12--and how a Wiki-biblio-o-pedia might be expanded to include other media, like DVDs and music.</p>

<p>I have to say that I'm pretty excited about the notion of building a Wiki sort of informational site. The Wikipedia itself is a shining example of how to manage such a project, and because facts about books aren't copyrighted, there's very little risk of having people contribute tainted data. In fact, depending on how well the project is structured, some online bookstores might provide a core of data to start with in the project in order to get the benefit of collaborative error fixing.</p>

<p>I know as an author it frustrates me that there's no good way for me to say to the world of bookselling, here's the definitive, authoritative, author-a-tative version of my book's details.<br />
</p>]]>
        
    </content>
</entry>
<entry>
    <title>New, Bad Information</title>
    <link rel="alternate" type="text/html" href="http://blog.bookinfo.info/archives/2004/10/new_bad_informa.html" />
    <link rel="service.edit" type="application/atom+xml" href="https://db.isbn.nu/mt3/mt-atom.cgi/weblog/blog_id=13/entry_id=4332" title="New, Bad Information" />
    <id>tag:blog.bookinfo.info,2004://13.4332</id>
    
    <published>2004-10-12T23:15:11Z</published>
    <updated>2005-11-24T04:21:11Z</updated>
    
    <summary>There&apos;s a very large category of informational problem that I&apos;m sure the information science people have a good name for, but I call it The New, Bad Information Problem. The problem, which I discussed in an early post a little bit, is how you deal with corrections and new details of all sorts. This is a very common issue with bibliographic details for books in print which are often changed during pre-publication--I have written books that go from 600 to 1,000 pages between signing a contract and delivering electronic files. Errors are bound to occur because most bibliographic information isn&apos;t a neat XML-based hand-off of triple-checked data provided by the publisher to booksellers. No. Heaven forbid. Far too easy. Instead, the vast majority of bibliographic data is entered by many hands in many places. The publisher may provide one form of database, or even a very clear XML dump, but probably in their own schema. As I talked about in my Scottish post just previously, unless you agree to normalization and authority, you&apos;re lost. If Stephen King is listed as &quot;King, Stephen (author)&quot; in the data sent by one publisher and &quot;Stephen E. King&quot; by another, there has to be...</summary>
    <author>
        <name>Glenn Fleishman</name>
        <uri>http://glennf.com</uri>
    </author>
            <category term="Normalization" />
    
    <content type="html" xml:lang="en" xml:base="http://blog.bookinfo.info/">
        <![CDATA[<p>There's a very large category of informational problem that I'm sure the information science people have a good name for, but I call it <b>The New, Bad Information Problem</b>. The problem, which I discussed in an early post a little bit, is how you deal with corrections and new details of all sorts. This is a very common issue with bibliographic details for books in print which are often changed during pre-publication--I have written books that go from 600 to 1,000 pages between signing a contract and delivering electronic files. Errors are bound to occur because most bibliographic information isn't a neat XML-based hand-off of triple-checked data provided by the publisher to booksellers.</p>

<p>No.</p>

<p>Heaven forbid. Far too easy.</p>

<p>Instead, the vast majority of bibliographic data is entered by many hands in many places. The publisher may provide one form of database, or even a very clear XML dump, but probably in their own schema. As I talked about in my Scottish post just previously, unless you agree to <strong>normalization</strong> and <strong>authority</strong>, you're lost.</p>

<p>If Stephen King is listed as "King, Stephen (author)" in the data sent by one publisher and "Stephen E. King" by another, there has to be a mechanism by which you re-normalize or map all incoming autonomously normalized data into one standard set. This is the process of authority, generally speaking, but all data suppliers--authors, publishers, independent collectors of book data like Books In Print--must be consulting the same authority, or have a mapping from their normalization to an agreed-on normalization.</p>

<p>Thus, the new, bad information problem. If a name is misspelled and a new piece of data is collected that indicates the name is misspelled, how can a database be reconciled between those two pieces of data? Most booksellers and book information providers, and I speak from broad experience, produce flat outputs in which all new information is supposed to always be better than old information.</p>

<p>This raises an additional question, obviously: if you take information from multiple sources and they each produce feeds of corrections and updates, how can you tell, between two sources updated on the same date, even at the same minute, whether one is now authoritative and one is not?</p>

<p>You can't.</p>

<p>I support a wholesale revision in the method by which book information should be corrected, and am thinking about how to build a system to support my method of thinking. In my world, you have a definitive current snapshot of the most reliable reconciliation of all data--that's your live feed. But you also have a very deep mine of the change history with authoritativeness attached to it.</p>

<p>My proposed database structure would have a table for each field, and each table would have a datestamp, the informational value, and a score representing authority; a comment field would also be necessary for human review when needed.</p>

<p>The score would correspond to values like:<br/><i>Physically examined the book<br />Author provided detail in email<br />Provided in updated XML feed from publisher<br />Manually renormalized<br /></i>and so on.</p>

<p>The tricky part here is establishing what trumps what. For a page count field, you might have a simple hierarchy in which a physical examination by a trusted party of the actual book trumps everything else except a "physical examination was incorrect" override by a master authority.</p>
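
<p>A rough sketch in Python of that per-field history, using a page count as the field. Everything here is an assumption for illustration: the numeric ranking of sources, the names, and the sample dates are made up. The only ordering taken from the discussion above is that a physical examination beats everything except a master override.</p>

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical authority scale: a higher number trumps a lower one.
AUTHORITY = {
    "publisher_feed": 1,
    "manually_renormalized": 2,
    "author_email": 3,
    "physical_examination": 4,
    "master_override": 5,
}


@dataclass(frozen=True)
class Revision:
    stamp: date        # datestamp of the change
    value: str         # the informational value
    source: str        # key into AUTHORITY
    comment: str = ""  # free text for human review


def current_value(history):
    """Pick the live value: highest authority wins, newest breaks ties."""
    best = max(history, key=lambda r: (AUTHORITY[r.source], r.stamp))
    return best.value


# Illustrative history for one book's page count field.
page_count_history = [
    Revision(date(2004, 3, 1), "600", "publisher_feed"),
    Revision(date(2004, 9, 15), "1000", "physical_examination", "counted pages"),
    Revision(date(2004, 10, 1), "600", "publisher_feed", "stale feed re-sent"),
]

print(current_value(page_count_history))
```

<p>The newest entry is the stale one, yet it loses because its source ranks lower. Nothing is ever deleted, so a human can still review why the live value is what it is.</p>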

<p>It gets more complicated when you look into normalization, as always. Let's say you get data from sources A, B, and C. A tells you in an initial entry for a new book that the author's name is A. Alfred Smith. B tells you it's Alfred A. Smith. C tells you it's A.A. Smith. Don't laugh: I've seen the same title of a book--often with the image on the booksellers' sites--appear six different ways on eight stores.</p>

<p>How does poor Mr. Smith get his name correctly formed? One aspect might be to score the corrections from each source. If A has fewer patches applied to its data by a certain margin than B and C, A trumps. If B provides a new record later, then B's changes to the name might be ignored, unless the name goes from A. Alfred Smith to B. Benjamin Smythe. Notice the level of logic you need to compare normalizations: we'd want to score the differences between components of each kind of fielded data.</p>
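
<p>To make "scoring the differences between components" a bit more concrete, here is a deliberately crude Python sketch. It assumes the surname is the last token and compares only surnames and the initials of the given names; the function names and the half-point weights are hypothetical, and real name matching needs far more care.</p>

```python
import re


def name_parts(name):
    """Split a name into (surname, sorted initials of the other tokens)."""
    tokens = [t.lower() for t in re.split(r"[.\s]+", name) if t]
    surname = tokens[-1]  # assumption: the surname comes last
    initials = sorted(t[0] for t in tokens[:-1])
    return surname, initials


def name_agreement(a, b):
    """Score 0.0 to 1.0: half for matching surnames, half for matching initials."""
    surname_a, initials_a = name_parts(a)
    surname_b, initials_b = name_parts(b)
    score = 0.0
    if surname_a == surname_b:
        score += 0.5
    if initials_a == initials_b:
        score += 0.5
    return score


for other in ["Alfred A. Smith", "A.A. Smith", "B. Benjamin Smythe"]:
    print(other, name_agreement("A. Alfred Smith", other))
```

<p>The three forms of Mr. Smith's name agree with each other under this measure, so a later record from B could be treated as a re-normalization rather than a correction. The jump to B. Benjamin Smythe scores zero, which is the kind of change that ought to go to a human.</p>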

<p>This is all to say that <b>new, bad information</b> is a substantial problem, and any method of trying to fix the accuracy, authoritativeness, and normalization of book data--or any kind of fielded data--has to involve scoring, comparisons, and history.</p>]]>
        
    </content>
</entry>
<entry>
    <title>If it isn&apos;t Scottish, it&apos;s crap!</title>
    <link rel="alternate" type="text/html" href="http://blog.bookinfo.info/archives/2004/10/if_it_isnt_scot.html" />
    <link rel="service.edit" type="application/atom+xml" href="https://db.isbn.nu/mt3/mt-atom.cgi/weblog/blog_id=13/entry_id=4196" title="If it isn't Scottish, it's crap!" />
    <id>tag:blog.bookinfo.info,2004://13.4196</id>
    
    <published>2004-10-07T01:03:09Z</published>
    <updated>2005-11-24T04:21:11Z</updated>
    
    <summary>When I worked at Amazon.com, we had a situation we called the Scottish Problem. I think Rebecca, the catalog programmer, was the coiner, as she was wont to coin a nice turn of phrase. The Scottish Problem wasn&apos;t to do with Macbeth or tartans, but rather that Ingram Book Company delivered its electronic book information to us in a normalized fashion. Their normalization routines stunk, frankly. One of the normalizations was that whenever the word macintosh appeared anywhere in a book title, the elves or daemons in their clean-up routine changed it to MacIntosh. That&apos;s right: with hundreds of computer books including the computer model Macintosh (no capital letter i) in the title, Ingram was overriding this. The Scottish Problem was a knotty one because of a related issue that I call the New, Bad Information Problem which I will spend some time on this blog discussing. If you fail to tag every field of information that you receive from another party, or enter yourself, with its characteristics, then when you receive new, bad information, it can easily overwrite your existing revised, good information. Thus you need depth to every field, from page count to title to scholastic level. Otherwise, there&apos;s no way for...</summary>
    <author>
        <name>Glenn Fleishman</name>
        <uri>http://glennf.com</uri>
    </author>
            <category term="Normalization" />
    
    <content type="html" xml:lang="en" xml:base="http://blog.bookinfo.info/">
        <![CDATA[<p>When I worked at Amazon.com, we had a situation we called <b>the Scottish Problem</b>. I think Rebecca, the catalog programmer, was the coiner, as she was wont to coin a nice turn of phrase. The Scottish Problem wasn't to do with Macbeth or tartans, but rather that Ingram Book Company delivered its electronic book information to us in a normalized fashion. Their normalization routines stunk, frankly. </p>

<p>One of the normalizations was that whenever the word <b>macintosh</b> appeared anywhere in a book title, the elves or daemons in their clean-up routine changed it to <b>MacIntosh</b>. That's right: with hundreds of computer books including the computer model <b>Macintosh</b> (no capital letter i) in the title, Ingram was overriding this.</p>
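
<p>I never saw Ingram's code, so the following Python is purely a guess at the sort of rule that could produce this result, along with the obvious guard of an exception list. The pattern, the function names, and the exception table are all hypothetical.</p>

```python
import re

# Hypothetical rule: capitalize whatever letter follows a leading "mac".
MAC_PREFIX = re.compile(r"\bmac([a-z])", re.IGNORECASE)


def naive_scottish_fix(title):
    """Blindly turn 'macdonald' into 'MacDonald', and 'Macintosh' into 'MacIntosh'."""
    return MAC_PREFIX.sub(lambda m: "Mac" + m.group(1).upper(), title)


# Words whose spelling is known and must never be touched by the rule.
EXCEPTIONS = {"macintosh": "Macintosh"}


def guarded_fix(title):
    """Apply the same rule word by word, but let the exception list win."""
    def fix_word(match):
        word = match.group(0)
        known = EXCEPTIONS.get(word.lower())
        if known is not None:
            return known
        return naive_scottish_fix(word)

    return re.sub(r"[A-Za-z]+", fix_word, title)


print(naive_scottish_fix("The Macintosh Bible"))
print(guarded_fix("The Macintosh Bible"))
```

<p>Notice that the naive rule would also mangle a word like "machine", which is why a blanket clean-up without an exception list, or without a record of who changed what, is so hazardous.</p>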

<p>The Scottish Problem was a knotty one because of a related issue that I call the <b>New, Bad Information Problem</b> which I will spend some time on this blog discussing. If you fail to tag every field of information that you receive from another party, or enter yourself, with its characteristics, then when you receive new, bad information, it can easily overwrite your existing revised, good information.</p>

<p>Thus you need depth to every field, from page count to title to scholastic level. Otherwise, there's no way for anyone managing the information or collecting it to understand why a change was made, and whether that change should permanently override any <b>new, bad information</b>. In fact, you can't even evaluate whether new information is bad or not without a history and a process of reconciliation.</p>

<p>I'll write more about this soon, including my ideas on how to use a revision history per field per record along with prioritization.</p>]]>
        
    </content>
</entry>
<entry>
    <title>What makes me an expert, anyway?</title>
    <link rel="alternate" type="text/html" href="http://blog.bookinfo.info/archives/2004/10/what_makes_me_a.html" />
    <link rel="service.edit" type="application/atom+xml" href="https://db.isbn.nu/mt3/mt-atom.cgi/weblog/blog_id=13/entry_id=4195" title="What makes me an expert, anyway?" />
    <id>tag:blog.bookinfo.info,2004://13.4195</id>
    
    <published>2004-10-06T21:48:31Z</published>
    <updated>2005-11-24T04:21:11Z</updated>
    
    <summary>One word: experience. I&apos;m launching this site in the interests of starting conversations about the way in which book details -- author, title, subject, and even page count -- are collected, sold, disseminated, updated, broken, and misused. My credentials? I worked for Amazon.com from 1996 to 1997 as catalog manager. In that capacity, I worked with data vendors, and developed or helped develop in-house resources as well as Web site tools that would provide the best information about each book we listed. We wanted the list to be exhaustive, but also accurate. I developed an intimate knowledge of several data feeds from book distributors and the Library of Congress in the process. Part of the outgrowth of my time at Amazon was the realization that the way in which most online bookstores and library catalogs dealt with searching was structured around the ISBN or a library cataloguing number, like the LCCN. Instead of dealing with books as works--that is, a discrete idea, not an instantiation as an edition--the searches always seemed to pull down every instance. If I search on Wizard of Oz, I&apos;m generally thinking about the work &quot;Wizard of Oz&quot;--I need tools that turn my concept into a...</summary>
    <author>
        <name>Glenn Fleishman</name>
        <uri>http://glennf.com</uri>
    </author>
            <category term="Site Information" />
    
    <content type="html" xml:lang="en" xml:base="http://blog.bookinfo.info/">
        <![CDATA[<p>One word: experience.</p>

<p>I'm launching this site in the interests of starting conversations about the way in which book details -- author, title, subject, and even page count -- are collected, sold, disseminated, updated, broken, and misused.</p>

<p>My credentials? I worked for Amazon.com from 1996 to 1997 as catalog manager. In that capacity, I worked with data vendors, and developed or helped develop in-house resources as well as Web site tools that would provide the best information about each book we listed. We wanted the list to be exhaustive, but also accurate. I developed an intimate knowledge of several data feeds from book distributors and the Library of Congress in the process.</p>

<p>Part of the outgrowth of my time at Amazon was the realization that the way in which most online bookstores and library catalogs dealt with searching was structured around the ISBN or a library cataloguing number, like the LCCN. Instead of dealing with books as works--that is, a discrete idea, not an instantiation as an edition--the searches always seemed to pull down every instance. If I search on Wizard of Oz, I'm generally thinking about the work "Wizard of Oz"--I need tools that turn my concept into a set that I can then refine down to its members.</p>
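
<p>In data terms, the idea is simply that every edition record points at a work, and a search returns works first. A tiny Python sketch follows; the edition and work identifiers are placeholders I made up, not real catalog entries.</p>

```python
from collections import defaultdict

# (edition id, work id) pairs; all identifiers are illustrative placeholders.
EDITIONS = [
    ("edition-paperback", "wizard-of-oz"),
    ("edition-hardcover", "wizard-of-oz"),
    ("edition-annotated", "wizard-of-oz"),
    ("edition-sequel", "marvelous-land-of-oz"),
]


def editions_by_work(editions):
    """Collapse a flat list of editions into one entry per work."""
    works = defaultdict(list)
    for edition_id, work_id in editions:
        works[work_id].append(edition_id)
    return dict(works)


for work, members in editions_by_work(EDITIONS).items():
    print(work, len(members))
```

<p>A search result built this way shows one line per work, and the reader drills down into the members of the set only when an edition actually matters.</p>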

<p>I used these concepts as part of my consulting for Powell's Books and Half.com after my non-compete with Amazon.com expired, although neither really implemented this part of my idea. I built <a href="http://isbn.nu">isbn.nu</a> partly as a programming experiment in 1999, and partly as a forum in which to try out my ideas on this topic.</p>

<p>Years after I left Amazon.com, they introduced a version of this kind of linkage, which has no particular name. Search on Wizard of Oz, and you're still presented with a long list, but the second item has a <a href="http://www.amazon.com/exec/obidos/search-handle-url/index=books&#38;field-titleid=29560&#38;ve-field=none/qid=1097095752/sr=12-2/104-9662394-9291921">link</a> that lets you see all 98 titles linked to a single authoritative Wizard of Oz. It's still not great: you can't view through editions. And if you click through to the first result in the list, you see a link that shows <a href="http://www.amazon.com/exec/obidos/tg/detail/-/0812523350/qid=1097095774/sr=1-1/ref=sr_1_1/104-9662394-9291921?v=glance&#38;s=books">eight other editions</a> in a better format. But what about the other 91 results?</p>

<p>ISBN.nu has had a kind of authority that I dub <b>work authority</b> for all of the books I list, which is nearly three million. Library scientists use the term authority to denote how to normalize multiple instances of, say, an author's name into a single definitive version. So if Jimmy Carter, James E. Carter, and James Earl Carter Jr. are really the same person, the authority entry for this person maps those to, say, Jimmy Carter (which is President Carter's preference, I believe).</p>

<p>Authority doesn't mean that the information is correct. Rather, it means that you have authoritatively settled on a single form of a category of information that might be represented in several ways. It's a way to collapse lots of individual information that is fundamentally about the same thing into a single set of information that is mapped to the same thing. This is closer to how people conceive of what they want from a book search than any of the tools I've seen.</p>
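
<p>At its simplest, an authority file is a lookup table from every known variant to the one settled form. A minimal Python sketch, with a table and function name that are only illustrative:</p>

```python
# Each known variant maps to the single settled (authoritative) form.
AUTHOR_AUTHORITY = {
    "Jimmy Carter": "Jimmy Carter",
    "James E. Carter": "Jimmy Carter",
}


def authoritative_name(name):
    """Return the settled form of a name, or the name itself if unmapped."""
    return AUTHOR_AUTHORITY.get(name, name)


print(authoritative_name("James E. Carter"))
```

<p>An unmapped name passes through unchanged, which is a design choice: it keeps unknown authors visible rather than guessing, at the cost of leaving duplicates until someone adds the mapping.</p>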

<p>ISBN.nu's system isn't perfect. I haven't developed full author or title authority yet, so the same author may be listed in different ways and different authors with the same name are thought to have written books they have not. I have worked with a programmer to fully <b>normalize</b> our data, which means to remove a lot of the detritus in information that comes into my system from a licensed source and file down the parts that aren't legitimate differences, like capitalization, extra spaces, and punctuation.</p>
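
<p>This is not the code that runs on isbn.nu, just a small Python sketch of the filing-down step under the assumption that only ASCII punctuation, case, and whitespace need handling. The function name is hypothetical.</p>

```python
import string

# Translation table that deletes every ASCII punctuation character.
STRIP_PUNCTUATION = str.maketrans("", "", string.punctuation)


def normalization_key(text):
    """Reduce a title or name to a key that ignores case, spacing, and punctuation."""
    no_punct = text.translate(STRIP_PUNCTUATION)
    # split() with no arguments also collapses runs of whitespace.
    return " ".join(no_punct.lower().split())


print(normalization_key("  The Wizard  of OZ. "))
```

<p>Two records whose keys match differ only in ways that aren't legitimate differences. The original strings should still be kept, since the key is for comparison and not for display.</p>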

<p>The next step beyond normalization will be full authority development, which I have in process.</p>

<p>In the coming entries, I hope to discuss a number of book information and meta-information problems I have encountered and coped with including <b>The Scottish Problem</b>, <b>The New, Bad Information Problem</b>, <b>The Chunky Problem</b>, and others with names that I hope you find similarly amusing.</p>

<p>I'll start talking about <a href="http://www.oclc.org/research/projects/xisbn/default.htm">OCLC Research's xISBN project</a>, and how I believe that with a little effort and some coordination, along with the tenets of <a href="http://www.fact-index.com/f/fe/feist_v__rural.html">Feist v. Rural</a>, the Internet community could develop a WikiBook-a-pedia, or a compendium of updatable bibliographic information along with authority and chunky linkages that could be distributed freely.</p>

<p>Comments are welcome. I'm using TypeKey from Movable Type to prevent comment spam and other problems. I apologize for requiring a centralized, email-verified registration, but this seems like the only solution to ensure that comments aren't turned into a horrible mush as happened on other sites I operate.</p>]]>
        
    </content>
</entry>

</feed> 

