SPARC webinar: The Google Books Settlement: What’s Up and What’s Next?

SPARC offered a free webinar on the Google Books settlement today, with attorney Jonathan Band.  Band represented the national library associations in connection with the settlement. He has written extensively on the GBS, producing the widely used Guide to the Perplexed, and is the architect of the oft-consulted “GBS March Madness: Paths forward for the Google Book Settlement” diagram.  (Thanks to SPARC for that wording.)

Here are some of my notes from the webinar…all errors, omissions, and confusion are mine!

  • Basic problem:  Copyright law acts as an obstacle to the mass digitization of books.  About 70% of books in the Library of Congress are in copyright but out of print:  orphan works.  This is what the discussion is about.  There’s no publisher to talk to about clearances for these books.  Very large number of books + complex copyright laws creates hugely expensive problem.
  • In 2004, Google proposed to sign partnerships with large research libraries to scan all their books & display snippets (a couple of sentences around a search term, no more than 3 snippets per book) online.  This is still what you see in Google Books unless the book is public domain or in partnership with Google.
  • Also, Google was going to give libraries digital copies of the books that were scanned.  Currently, a lot of those digital copies are held in the Hathi Trust.  (Rather than have each library participating in the Google project have its own copy and figure out what to do with it–the Hathi Trust is the shared repository for those copies.)
  • The legal theory that Google used for this was fair use.  They also offered an opt-out to publishers.

 

man sitting in library stacks reading a book

Image from State Library and Archives of Florida.

 

  • In Oct 2005, the Authors Guild and the Association of American Publishers filed class action suits against Google.  Argument:  It was not fair use for Google to scan all these books into its database.  Snippet display not as much of a concern as Google scanning all the books and providing copies to the libraries.  (Libraries were never sued.)
  • Settlement discussions started shortly after the suits were filed.  Three years of quiet while discussions progressed.

 

black and white photo of boys reading books in an old library

Image from New York Public Library.

 

  • In Oct 2008 the Google Books Settlement was announced, and it was a doozy.  Its terms went far beyond the initial project, which was just scan and snippet display.  The settlement created two subclasses for the class action suit:  all authors of books with US copyright, and all publishers of books with US copyright.
  • Proposed terms of the settlement for in-copyright, out-of-print (“orphan”) books:  1.)  online preview display (up to 20% of the book, several pages before and after search terms)  2.)  option to buy within the US  3.)  institutional subscription (for an annual fee, authorized users can see the whole book)  4.)  free public access at public libraries–users could access the entire book from a single computer terminal.
  • Because the suit was class action, the settlement had to be approved by a judge.
  • The Dept of Justice raised objections:  1.)  the settlement was much broader than the original proposal;  2.)  the class members were too diverse to constitute a legal class:  academics, foreign rights holders, rights holders of orphan works; 3.)  competition concerns:  display services only available to Google, not to any other competitors.
  • Dept of Justice proposed cutting the settlement back to scan and snippet display, with all the other features offered on an opt-in basis only.
  • After negotiation, the parties created an Amended Settlement Agreement, which eliminated foreign books from the agreement.

 

hands holding an open book, with some pages uncut at the top

Image from LSE Library.

 

  • In Feb 2010 a Fairness Hearing was held.  All parties presented views to Judge Denny Chin.
  • Chin deliberated for another year.
  • In March 2011 Judge Chin rejected the settlement.  Main reason:  the settlement was opposed by many class members.  (However, the vast majority did not oppose it actively.)
  • Chin’s concerns included:  that copyright permissions should be an opt-in process (not opt-out), and that class members have diverse interests:  academics may want open access, publishers may not.  Google’s monopoly over orphan works also problematic.  Reader privacy was uncertain under the proposed arrangement.  The settlement’s consistency with international law a concern.
  • Chin’s guidance to go forward was minimal:  suggested a change from opt-out to opt-in for all services except scan and snippet display.

 

white woman with sixties hair studying from notes and a binder

Image from LSE Library.

 

  • Several options for resolution from here…parties come to an agreement, settlement rejection gets appealed, more litigation and new settlement, Google abandons project (unlikely), plaintiffs throw in the towel and allow scan and snippet display without more pushback (unlikely.)
  • Meanwhile, the Hathi Trust has its own set of books digitized by Google and is offering different display services–they will presumably do whatever copyright allows.  (This will be up for debate between Hathi and rights holders.)
  • We still have the protection of fair use, while litigation resumes.
  • Open access is a way forward, which will prevent us from having these same problems in the future.  People will keep writing books and sharing them in new and innovative ways.  Presumably these copyright issues will not apply in the same way to future works, because of changes in publishing technology and licensing.

 

young black woman in sixties clothes studying at a library table

Image from New York Public Library.

 

Question & Answer

Q:  Why did libraries ask for so little from the Google Books deal?  Why not ask for some ongoing financial support from Google as well?

A:  Not sure…at the time, libraries were excited about the project and didn’t have their own rapid digitization plans in place.  Subsequently it may have seemed not such a good deal, not very cost-effective.

Q:  What might a new settlement look like?

A:  From the beginning Google has had an opt-in feature allowing more display access to publishers’ works.  So the question is, in part, why do we need the settlement?  (This is one of the proposed terms of the revised settlement.)  Scan and snippet display seems like the only item still at issue.  Possible that some rights holders want some more compensation for the use.

Q:  How does print-on-demand affect “in/out of copyright”?

A:  Settlement has cutoff date:  only books published before Jan 5, 2009.  But we’re starting to assume that books in the future will always be in print, because of POD.  So these issues may not apply; someone will actively manage the copyright for those items.

Q:  What’s a likely subscription cost for academic libraries to access the Google works database (assuming an institutional subscription is an option in the final settlement)?

A:  Numbers are being floated…Google may be doing some test-marketing to see what the market will bear.  But question may be moot:  seems unlikely that there will be a settlement that will have an institutional subscription feature attached to it.

Q:  Will this go to the Supreme Court?

A:  Only if they go back to the fair use issue.

Q:  Will some of this be solved by litigation, other aspects by legislation?

A:  Likely a combination:  a settlement covering scan & snippet display, opt-in for more robust displays, and some kind of orphan works legislation to deal with those who don’t opt in.  But it’s also possible that legislation will be too hard to enact.  A legislative solution could follow the EU model, or could arrive around 10 years from now.

Q:  Could Hathi Trust offer limited access to works by library patrons, no matter what happens with Google?

A:  Hathi Trust + fair use will be a major aspect of this going forward; institutions may decide individually what access is allowable.  Likely: full-text access within the Hathi Trust institution, more limited access to the public.  If a rights holder steps forward with objections, Hathi Trust may have to negotiate to avoid litigation.

 

 


Robert Darnton weighs in on libraries, books, and the information age.

After my last post, I was encouraged and pleased to see this piece in the Chronicle by Robert Darnton, Harvard’s university librarian.*  Darnton says a lot of things that I think are absolutely true for many libraries these days.  We’re busy, our chairs are full, we’re more involved in our institutions and communities than ever.  Books aren’t going away anytime soon, though the digital world is on the rise.  The two can coexist.  A very small percentage of what’s published and used for research is available online. Librarians help people find what is there, just as we’ve always helped people find what’s in print.  (We still do that too.)

Or as Darnton puts it (more eloquently than I just did):

A more nuanced view would reject the common notion that old books and e-books occupy opposite and antagonistic extremes on a technological spectrum. Old books and e-books should be thought of as allies, not enemies.

Bottom line:  if a library is getting cut, it’s not because its fundamental services aren’t needed anymore.  The world of information didn’t start suddenly researching and interpreting itself, and delivering itself to untrained searchers in a comprehensible format.  People didn’t wake up knowing how to find census data and copyright-free images.  The need for a space to study, a networked computer, a self-help book, an answer to a question, a kids’ summer reading program–none of that went away.  And chances are very good that nobody else is supplying those needs.

Library for the blind, New York Public Library

*  Thanks to Sarah McDaniel, most awesome Coordinator of Library & Information Literacy Instruction at University of Wisconsin, Madison for sharing this with me.  On Facebook, of course.

HarperCollins does pretty much what we expected.

Claire Dannenbaum recently sent me this link to a summary of what’s going down with e-books at HarperCollins:  essentially, they’ve placed a 26-loan limit on their ebooks.  After a library lends an HC ebook 26 times, they have to pay for it again–at a reduced cost, but still.  They pay for the title they’ve already paid for.

HC helpfully estimates that 26 loans should provide about a year’s worth of use for a given title, and they note that they aim to charge less for ebooks than paper books.  But there are a lot of problems with this model, from my point of view.  (And from the point of view of lots of other librarians.)  As I wrote to Claire:

This further undermines the rights we’ve come to expect from owning content–the rights of first sale.  Instead of owning e-books, libraries will effectively be leasing them, without ever having an option to buy (as I understand it).  I find it hard to believe that 26 circs of a print copy would put it out of circulation at a public library–I suspect that print lasts a lot longer than a year and 26 readers.  And none of this even touches on the rights to share, photocopy portions, etc.  All gone with the first-sale doctrine, replaced by perpetual recharge for “access” to content.

And I suspect there’s an additional hidden cost to libraries in this new model–tracking whether the access we’re supposed to have is really there, contacting publishers to restore it if it isn’t, managing our own records, and explaining this all to our users.

I should say, I’m not clear on whether the 26-loan renewal policy repeats, or is one-time only:  if you have a hit book that circulates hundreds of times, will you have to pay for it five, six, or ten times over?
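To put rough numbers on that question, here is a minimal sketch of the arithmetic.  It assumes (and this is only an assumption, since I don’t know how the policy actually works) that the 26-loan limit repeats with every repurchase.  The function name and the sample loan counts are made up for illustration.

```python
import math

# 26 loans per purchase, per the HarperCollins policy described above.
LOAN_LIMIT = 26


def purchases_needed(total_loans: int, loan_limit: int = LOAN_LIMIT) -> int:
    """Return how many times a title must be paid for to cover total_loans.

    ASSUMPTION: the loan limit applies again after every repurchase.
    If the renewal is one-time only, the answer would never exceed 2.
    """
    if total_loans <= 0:
        return 0
    # Each purchase covers up to loan_limit loans, so round up.
    return math.ceil(total_loans / loan_limit)


if __name__ == "__main__":
    # Hypothetical circulation counts, not real data.
    for loans in (26, 27, 130, 260):
        print(f"{loans} loans -> pay {purchases_needed(loans)} time(s)")
```

Under that assumption, a title that circulates 130 times would be paid for five times, and one that circulates 260 times would be paid for ten times.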

The HC boycott is an interesting response, but it makes me wonder whether there’s any other appropriate response for this.  HC has a legal right to set the terms of their sales, I guess–and customers have a legal right to boycott.  But I wonder if the brave new world of access vs. ownership opens up legal issues over what constitutes a “sale,” and whether  companies writing their own licensing terms carte blanche (at a disadvantage to their customers) merits further legal discussion.

Online Northwest on the horizon

online northwest 2011 logo

I’m on the conference planning committee for Online Northwest, so I’ll definitely be attending–and I encourage everyone reading this within a 400-mile radius of Corvallis, OR to attend too!  The program is looking great, with sessions on e-books, presenting with Prezi, search engine optimization, cloud computing, QR codes, and more.

I’ll try to blog some of these sessions, and we’re asking some of our volunteers to blog and Tweet as well.  If you can’t make it there, follow us from afar!

How far can you get…digitally?

This entry was first posted at Re: Generations.

I was recently asked by a senior administrator at my university why the library still buys books.  Everything’s going digital, he told me.  Why don’t we just get some iPads and stop buying books that we have to process, store, circulate, mend, and replace?  Books cost a lot of money.  If we became an all-digital library, surely we’d save money.

This was a very quick conversation, so I answered him as briefly as I could:  yes, some disciplines have adopted digital formats.  Others, like the architecture department that I work with closely, have not.  We buy electronic materials for the disciplines that use them, and print for disciplines that prefer print.  We aim to please.

But the question stayed with me, and for the last few days it’s been collecting lint in my brain.  I’m particularly stuck on it, I think, because this was a senior administrator, someone with decision-making power (not directly over the Library, but still a decision-maker at the institution.)  I think it’s been a wake-up call to me that just because I think I have a fairly balanced understanding of where we stand, digital-content-wise, that doesn’t mean that others do.  And it’s also forced me to re-examine some of what I thought I understood about digital content and new media.

Our print circulation stats, like the print circulation stats of all ARL libraries, are declining.  I think the last stat I heard was that print circulation accounts for eight percent of our overall circulation.  (Don’t quote me on that, but I think it’s right.)  That figure startled me.  Eight percent?  The millions of volumes of books, documents, bound journals, all those years and years of buying and building a research-level collection–and it’s less than ten percent of our overall circulation?  And dropping?  Wow.

Well, surely that means that users are getting access to those books, documents, and bound journals electronically, right?  They’re still using all the same materials, they’re just shifting to online access.  Except…I know for a fact that most of the scholarly books we buy today aren’t available online.  We do occasionally buy access to an e-book version of a print book, but since that costs more than buying one format alone, usually we only buy one or the other.  So if a book is on the shelf, chances are very good we don’t have it electronically.  Not to mention, many of the books we buy just aren’t offered as e-books.  For all that we talk about going digital, the scholarly book market is still (in many disciplines) very print-based.

woman reading in book stacks
Image courtesy LSE Library, via Flickr Commons.

But hey, government documents are largely e-only now…at least the current ones.  (Which raises all sorts of issues about archiving, discovery, and access, by the way.)  But the older documents, the ones that fill the shelves in our library, are often only print.  It’s rare for a government agency to spend money on digitizing its print archive, because there’s always somewhere else that money could go–into servers hosting new documents, for instance.

Some of our bound journals are definitely available digitally, thanks to Project Muse, JSTOR, and other digital publishers and archives.  In fact, many libraries see old issues of JSTOR and Project Muse journals as the low-hanging fruit that can be moved to offsite repositories when the library’s shelves fill up.  But other journal back runs aren’t available online, as anyone who’s ever done a research project knows.  And we still get plenty of current subscriptions in print, either because there’s no online version available, or because publishers bundle print + digital as a package price (with penalties for canceling one or the other), or because we’ve made consortial agreements to keep a particular title in exchange for another library keeping a different one, or because our users see value in the print version and don’t want us to cancel it.

Other people have done actual studies on electronic access versus print access–who’s using what, how often, and sometimes even why.  I can’t claim to have done research on this, but based on everything above, I think it’s fair to say that declining print stats can’t mean that users are just migrating to digital access of the same materials.  Some of that material–a lot of it–isn’t there for them to access in the first place.  We do know that new kinds of digital resources are getting lots of use–our digital image collections, institutional repositories, and special collections & archives projects are seeing increased use as other types of use are decreasing.  But does any of this prove anything about the viability of a digital-only library?

I don’t know, but here’s a thought experiment:  pick a research project, and see how far you can go with it if you use only materials that are available to you digitally.  That means no books off the shelf, no old newspaper articles, no browsing images in the current journals.  I’m still playing with this, but it’s fascinating to see how different kinds of projects would do (even hypothetically) in an e-only library.

Here’s an example:  a research project on the experiences of first-generation (issei) Japanese immigrants to Washington state at the turn of the 20th century.  Online, I can get some great images from Washington and California digital repositories.  I can get a few relevant Wikipedia articles (with citations pointing me to print sources).  I can get some full-text articles from our online databases.  I can get historical New York Times articles from ProQuest.

I can’t get any books off our shelves, even though we have plenty.  I can’t get any of the articles I find in America: History and Life that are only available in print.  At my own library (YMMV) I can’t get historical newspaper articles from any West Coast newspapers.  I can’t get any of the photos, personal narratives, maps, drawings, or references that are in those sources.

Basically, I estimate that I can get about a quarter to a third of a decent starting picture on a project like this.  I’m left with gaping holes in my research, and a lot of ILL requests to make.  And since ILL requests cost money, the institution is now paying money to borrow items that it could otherwise pay to own.

I’m not nostalgic about the days when libraries had nothing but print indexes and card catalogs–I’m old enough to remember them, and they weren’t fast or easy to use.  I’m strongly in favor of making our cultural heritage (and currency) accessible to all, as quickly and transparently as possible.  But even though we’re entering the Age of the Digital, we’re not fully there yet.  We still need print, and I expect we will for many years to come.  And as for the question of whether we’ll ever be able to just buy a few iPads and sit back to count our savings, secure in the knowledge that our users will be able to find, filter, and synthesize what they need–well, I wouldn’t bet on it.

women reading tarot cards
Image courtesy George Eastman House via Flickr Commons.

Some thoughts on Google’s eBookstore

Paul Oliver, over at MobyLives, reviews the new Google eBookstore venture, and finds it neat but wanting.  For one thing, his downloaded copy of A Tale of Two Cities came complete with a scan of the contributing library’s aged book pocket, stamped with past borrowers’ dates.  For another, it was volume two…of a two-volume set.

Google’s cataloging, OCR quality, and organization of its digital book files seem to be still stumbling around a little, like a toddler just learning to walk.  In a separate MobyLives post, more than a few sticky cataloging and quality-of-information glitches were exposed.  (Mae West biographies filed under “Religion,” anyone?)  It seems likely that these problems will be cleaned up before too long, but for now at least, caveat searcher.

Other folks have been pointing out that if you use a Kindle, you’re out of luck as far as the eBookstore is concerned.  No .mobi, .prc, or .azw files up there…for obvious reasons.  I’ve seen some folks blaming Google for this, but since those are proprietary Amazon file types, and since you can’t read .epub files on the Kindle, that one seems to sit pretty squarely in Amazon’s lap.

Google is making an effort (at least a marketing effort) to include indie booksellers in its sales strategy, which is not only smart (long tail!) but seems less determinedly hostile to a rich bookselling ecosystem than the “kill them all” Amazon approach.  And Google is (for now) wisely staying out of the business of making a proprietary device or file format, instead making money off doing what it does best:  mediating access to information, and skimming a little profit off every exchange.

If I had to lay money on where we’ll be in three years with purchasing e-books, I’d lay it on Google over Amazon.  I have a Kindle 2.  I’ve written here before about my reservations about its physical design and the business model on which it relies.  The way Google’s positioned, it seems to me that it’s not so much Google vs. Amazon, as it is Amazon vs. every other e-reader designer out there that wants to make money off Google’s huge reach.  Amazon’s going to have to do something pretty amazing to stay ahead of that.