
A brief introduction to digital preservation

MSP Staff
The MSP Guide to Journal Publishing - This article is part of a series.
Part 9: This Article

Despite the best of intentions, even the most robust journals and publishers must sometimes wind down their operations. And if the published content has primarily (or only) been made available online, there are some distinctly negative outcomes to be avoided:

  • Users of the content — such as libraries — might lose access to electronic content that they’ve paid for, despite being entitled to it.
  • The content itself, formerly available on the publisher’s website, might vanish from existence completely.

To prevent this, the scholarly community has developed a number of digital preservation initiatives and services. Generally, these operate by making copies of the electronic content, and housing the copies on various servers not controlled by the publisher. That way, if something happens to the publisher, the copies are not affected.

Preservation of scholarly content is a high priority for the academic community. Some libraries even include requirements about long-term preservation in their user agreements. Long-term preservation plans are also required by certain journal-focused programs and networks; the Directory of Open Access Journals, for instance, will only accept journals that are “continuously deposited in a long-term digital archiving system”. And of course, preservation is also important to the authors whose work you publish — after all, in a worst-case scenario, it’s their hard work that would disappear.

Many digital preservation services and systems exist, but here we’ll introduce you to three of the most popular ones: LOCKSS, CLOCKSS, and Portico.

The Global LOCKSS Network (GLN)

In the late 1990s, staff at the Stanford Libraries began the LOCKSS (“Lots of Copies Keep Stuff Safe”) Program, in an effort to develop an open-source software system that would allow libraries to manage the preservation of their own content.

The term “LOCKSS” is now used to refer variously to this software itself; to the program that developed (and continues to develop) it; and to the Global LOCKSS Network (GLN), a worldwide operational network that uses the LOCKSS software to archive and collaboratively safeguard digital content.

This network is now a crucial part of how LOCKSS and its supporters achieve their preservation aims. Each library participant in the network installs the LOCKSS software on its local systems. Once configured, the software saves copies of library content (if the content is from participating publishers) to the library’s onsite “node”, sometimes also known as a “LOCKSS box”.

The software periodically audits the node’s contents against other nodes in the Network or against the publisher sites, filling any gaps and making repairs where files may have been damaged.
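To make the audit-and-repair idea concrete, here is a minimal sketch in Python. It is not the actual LOCKSS protocol or its API: we simply assume that each node can be modelled as a dictionary of file names and contents, and that a file version counts as trustworthy when a strict majority of the polled peers hold it. The function names `digest` and `audit_and_repair` are hypothetical, chosen only for this illustration.

```python
import hashlib
from collections import Counter


def digest(content: bytes) -> str:
    """Return a SHA-256 hex digest, so copies can be compared by fingerprint."""
    return hashlib.sha256(content).hexdigest()


def audit_and_repair(local: dict[str, bytes], peers: list[dict[str, bytes]]) -> list[str]:
    """Compare a node's files with peer copies and repair the node in place.

    Hypothetical simplification: a version is trusted only if a strict
    majority of the polled peers hold it. Returns the sorted names of the
    files that were filled in or repaired.
    """
    repaired = []
    all_names = set(local)
    for peer in peers:
        all_names.update(peer)

    for name in sorted(all_names):
        # Count how many peers hold each distinct version of this file.
        votes = Counter(digest(peer[name]) for peer in peers if name in peer)
        if not votes:
            continue  # no peer holds this file, so there is nothing to compare
        winning_digest, count = votes.most_common(1)[0]
        if count * 2 <= len(peers):
            continue  # no strict majority, so leave the local copy alone
        if name in local and digest(local[name]) == winning_digest:
            continue  # local copy already agrees with the majority
        # Missing or damaged: copy the agreed version from a peer that holds it.
        for peer in peers:
            if name in peer and digest(peer[name]) == winning_digest:
                local[name] = peer[name]
                repaired.append(name)
                break
    return repaired


# Example: b.pdf is damaged on our node and c.pdf is missing entirely.
node = {"a.pdf": b"article A", "b.pdf": b"artXcle B"}
peers = [
    {"a.pdf": b"article A", "b.pdf": b"article B", "c.pdf": b"article C"},
    {"a.pdf": b"article A", "b.pdf": b"article B", "c.pdf": b"article C"},
    {"a.pdf": b"article A", "b.pdf": b"article B"},
]
print(audit_and_repair(node, peers))  # ['b.pdf', 'c.pdf']
```

Comparing digests rather than whole files is a common design choice for this kind of check, since nodes only need to exchange short fingerprints until a repair is actually required.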

Meanwhile, publishers can request to join the LOCKSS network and give participating libraries permission to store copies of their content. Participating publishers are responsible for ensuring their website allows proper access and permissions to the LOCKSS software (with help from the Global Network’s staff). Publishers must also notify the Network about any new publications they have released, so that the Network can add them to its systems.

Through this group effort, the continued existence of published works is protected, and access for paying subscribers is preserved, no matter what happens to the publisher, its website, or its archives.

It’s important to note that once created, LOCKSS box copies can be accessed by the library at any time. This means they’re useful to the library in more situations than the complete closure of the publisher. For example, libraries can use their LOCKSS box to make copies of content temporarily available to their patrons if the main publisher website is down for maintenance. Some publishers also use LOCKSS membership to provide a perpetual access option for their subscribers, in lieu of (or in addition to) continuing to maintain IP-address or login information for subscribers after their paid access terms expire. This type of archive, to which users have direct, immediate access, is called a “light archive”.

Publisher requests to join the Network are not guaranteed to be approved — it depends on “the needs of the library community that the LOCKSS Program serves, the preservation status of the content, the archivability of the content, and other factors”. (MSP is part of the Global LOCKSS Network.) If approved, however, there are no fees for the publisher to participate. Rather, the Global LOCKSS Network is funded primarily through library membership fees, and has in the past received funding from a number of scientific, archival, and technology-focused organizations.

The CLOCKSS Archive

The CLOCKSS Archive, also known as simply “CLOCKSS”, is a nonprofit organization that uses LOCKSS technology to preserve content. But whereas nodes of the Global LOCKSS Network are controlled and maintained by the end-user libraries, the twelve CLOCKSS nodes (all identical copies, geographically dispersed) are stewarded collectively by the CLOCKSS organization.

Individual libraries do not have direct access to CLOCKSS archives. Rather, content in CLOCKSS is kept in trust in case the original version of the content becomes permanently unavailable for some reason, such as the publisher becoming unable or unwilling to host that specific content anymore, or a disaster of some kind wiping out the original servers. In that event (called a “trigger event”), the content is assigned a Creative Commons license and becomes fully open access.

This type of archive is called a “dark archive”. Its purpose is general long-term availability of the content to the world at large, not specific availability to any particular paid subscriber.

Like LOCKSS, CLOCKSS receives funding from supporting libraries. But unlike LOCKSS, CLOCKSS also charges publishers for its services. (MSP is one such participating publisher.) There’s an annual fee, scaled to the publisher’s total revenue, plus some setup fees that vary depending on complexity. Additionally, high-volume publishers may incur a small ingestion fee per article (though the threshold is such that a new independent journal would be unlikely to reach it). Overall, we’d say the charges are reasonable enough: setup may cost up to several thousand dollars for more complicated journals; but afterwards, a new independent publication would likely pay only a few hundred dollars (US) per year.

Portico

Portico began as a project by JSTOR (as the “Electronic-Archiving Initiative”) with a grant from the Andrew W. Mellon Foundation. Today, it is part of the nonprofit organization ITHAKA.

Portico is similar to CLOCKSS in that it is primarily a “dark archive” hosting copies (deposited into Portico’s servers by the publishers) on a centralized platform.

However, whereas CLOCKSS uses the open-source (and largely library-run) LOCKSS software, Portico uses its own bespoke software. It also uses some different formats for the content it collects — LOCKSS software simply copies content in the publisher’s original format, whereas Portico normalizes files upon ingesting them, aiming to make its content as application-neutral as possible.

The most important difference, however, is that after a “trigger event”, content archived with Portico does not necessarily become open access. Rather, any subscriber-only content archived with Portico remains accessible only to Portico subscribers.

Additionally, although Portico-archived content is not generally available to library users, publishers can choose to use Portico’s services to offer post-cancellation access to content. Again, this service is only available to Portico-participating libraries.

To join Portico, publishers must submit their content for consideration. If accepted, there’s an annual fee, scaled to the journal’s annual revenue. At the time of writing, Portico’s annual fees for smaller publishers are roughly comparable to those charged by CLOCKSS. (Only accepted publishers, of course, can use Portico to provide post-cancellation access.) Aside from publisher fees, Portico also receives financial support from more than a thousand participating library institutions.

Some libraries prefer Portico, for various reasons. Portico describes itself as “the first digital preservation service to be independently audited by the Center for Research Libraries (CRL)”; that means it has long established itself as a preservation solution that serves libraries’ concerns. Its longevity also means that it has built and curated a robust community of users and publishers. Joining Portico grants access to that community, which may be appealing to libraries and publishers alike.

What else?

These aren’t the only digital preservation options available. They are, however, three of the best-known. So understanding the differences between them is a good start towards understanding what your options are.

If you strongly believe in open scholarship, then you may want an option that will automatically apply a Creative Commons license to all triggered content. CLOCKSS is the only option we’ve listed here that satisfies this criterion.

However, if you also want to provide post-cancellation access (and in our experience, libraries prefer that we do), then you’ll need to look further. You could consider joining the Global LOCKSS Network along with CLOCKSS; joining Portico instead of either; or seeking out a different option. Depending on your subscriber base, you might find the libraries you serve prefer one post-cancellation solution or another.

Or of course, if it’s in your budget, you can join multiple services. Some publishers are members of CLOCKSS, the Global LOCKSS Network, and Portico at the same time. This certainly gives users extra flexibility for post-cancellation access, and may give the publisher more peace of mind as well.

One last thing: Long-term preservation of digital content isn’t just about libraries retaining access to what they’ve paid for, or publishers satisfying the requirements of indexing databases or other networks. Fundamentally, it’s about ensuring that the work of the scholarly community continues to be available for generations to come — and this is something that all responsible journals and publishers should make a priority.

Further reading

Comparison of CLOCKSS, Global LOCKSS Network, and Portico, jointly prepared by the CLOCKSS Archive, the LOCKSS Program, and Portico.
Dobson, Chris, “From Bright Idea to Beta Test: The Story of LOCKSS”, Searcher: The Magazine for Database Professionals, February 2003, 50–53.


Header image by Catrin Johnson, available from Unsplash. Free to use under the Unsplash license.
