Thursday, December 21, 2017

Canadiana DevOps 2017 year review and look to the future

Many ongoing changes for the DevOps team this year.

CRKN update

The CRKN December 2017 Newsbrief provides some updates about Canadiana itself. The short version is that there will be a Canadiana membership meeting in mid-January to vote on an offer to consolidate the operations of the two organizations.

I'm excited about the possibility of being a CRKN employee in the next few months.  As an organization they maintain close ties with their educational sector members, and they don't get confused with being a 'vendor' as Canadiana has been.  I look forward to not only the new employer, but the closer relationship with other people working in library technology across Canada.

In the new year we'll also be meeting some new staff. In addition to getting to know the existing CRKN staff, we have two positions to fill in DevOps.

  • A Metadata Architect, as Julienne left for a job at LAC.
  • A System Administrator, as we have more work than the 2 of us remaining can handle.

IIIF Update

Our custom content server has been replaced with Cantaloupe and a few support applications. We are using our existing authentication model, which requires a signature for each specific request. This means that regular IIIF clients won't work yet without separate authentication.

We set up a demonstration for Sascha's talk at Access 2017, which allows specific logged-in users to set a cookie that tells the content server to allow access to everything. For these users the "About" tab for any document on any of our portals also has an "IIIF Presentation Document" section, which includes a URL that can be copied and pasted into any IIIF viewer. If you would like access to this demo, please get in touch with Sascha.

We have plans for a new authentication system and adopting more of the IIIF APIs in the future, but we need more work done on other aspects of the platform before we can do that.
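
To make the IIIF discussion a little more concrete, here is a minimal sketch of how a client builds IIIF Image API URLs, following the URL template from the public specification. The base URL and identifier in the usage note are hypothetical, not our real endpoints, and the sketch ignores the per-request signature our current setup requires.

```python
from urllib.parse import quote


def iiif_image_url(base, identifier, region="full", size="max",
                   rotation="0", quality="default", fmt="jpg"):
    """Build {base}/{identifier}/{region}/{size}/{rotation}/{quality}.{format}."""
    # The identifier must be URL-encoded, including any slashes it contains.
    encoded_id = quote(identifier, safe="")
    # "max" is a valid size in Image API 2.1 and 3.0; 2.0 servers expect "full".
    return f"{base.rstrip('/')}/{encoded_id}/{region}/{size}/{rotation}/{quality}.{fmt}"


def iiif_info_url(base, identifier):
    """Build the URL of the image information document (info.json)."""
    return f"{base.rstrip('/')}/{quote(identifier, safe='')}/info.json"
```

For example, `iiif_image_url("https://example.org/iiif/2", "a/b.jpg", size="600,")` asks for the whole image scaled to 600 pixels wide. A regular IIIF viewer generates URLs like these on its own, which is why a scheme that needs a signature on each request gets in its way.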

Future of Canadiana's preservation platform

Back in April I wrote about JHOVE, and how we would be integrating format identification and validation into our preservation platform.

After we started some of that work in the spring, we decided to explore some alternatives. Our metadata architect Julienne did an environmental scan and evaluated some of the available tools. We concluded that, rather than continuing to update and maintain our current OAIS platform, we would adopt an existing, actively maintained one. The plan is to migrate from our custom OAIS platform to Archivematica.

This will involve providing a clean separation between our preservation platform and our access platform.

As well as changing the OAIS platform, we also plan to move from our own custom replication and validation services on top of ZFS to OpenStack Swift.
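
The validation half of that work is essentially fixity checking: recompute a checksum for each stored file and compare it with the value recorded at ingest. The sketch below shows the general idea only. The manifest format (relative path mapped to a hex digest) and the choice of SHA-256 are assumptions for illustration, not a description of our actual tools.

```python
import hashlib
from pathlib import Path


def file_sha256(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_fixity(root, manifest):
    """Compare files under root against a {relative_path: expected_hex_digest} manifest.

    Returns a list of (relative_path, problem) tuples; an empty list means
    every listed file is present and unchanged.
    """
    problems = []
    for rel_path, expected in sorted(manifest.items()):
        target = Path(root) / rel_path
        if not target.is_file():
            problems.append((rel_path, "missing"))
        elif file_sha256(target) != expected.lower():
            problems.append((rel_path, "checksum mismatch"))
    return problems
```

Running a check like this on every replica, on a schedule, is what lets a repository notice silent corruption and repair it from another copy.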

When we will be able to make these changes depends on the new staff, and on how familiar they already are (or how quickly they can become familiar) with these new technologies.

Future of Canadiana's access platform

As part of the environmental scan we also evaluated access tools such as Access to Memory, also primarily developed by Artefactual Systems.  While a great tool for archival description and access, Canadiana exists in that mushy-middle where we aren't exactly a library, exactly an archive, or exactly like any specific part of the GLAM community.

Before Julienne left she did some data modeling work and pointed us in the right direction for next steps. Next year we look forward to exploring the Portland Common Data Model, Fedora, and components of Samvera.

Docker and GitHub

More of our software and configuration is up on our c7a GitHub, an initiative started in the summer. While we still maintain a local private Subversion repository, we are slowly moving everything to public repositories on GitHub.

At the same time as the move to GitHub we started to deploy more of our software via Docker, and most of our software and configuration has now been moved.

Repository servers

A repository server node, which currently also provides public access to repository content, has the following docker images:

  • cihm-public-cos has the Apache image that sits in front of all related web services.
  • cihm-cookie-authorizer is used to verify JWTs and set related cookies.
  • cihm-file-access verifies JWTs and provides direct access to files, such as for PDF downloads.
  • cihm-cantaloupe is our Cantaloupe configuration. Cantaloupe provides derivative images using the IIIF Image API.
  • cihm-repomanage has tools for managing the file repository, such as replication and validation (fixity).
  • I am currently switching over to using the official Docker CouchDB image (tag 1.7.1 for the time being).
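
Two of the images above verify JWTs. As a rough illustration of what that verification involves, here is a standard-library sketch for HS256-signed tokens. The algorithm, the shared secret, and the `exp` claim check are assumptions for the example; this is not the code in cihm-cookie-authorizer or cihm-file-access.

```python
import base64
import hashlib
import hmac
import json
import time


def _b64url_decode(segment):
    # JWT segments drop base64 padding, so restore it before decoding.
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def _b64url_encode(raw):
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def sign_hs256(payload, secret):
    """Create a compact HS256 JWT (used here only to produce test tokens)."""
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = ".".join(
        _b64url_encode(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, payload)
    )
    signature = hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()
    return f"{signing_input}.{_b64url_encode(signature)}"


def verify_hs256(token, secret, now=None):
    """Return the payload dict if the token is valid and unexpired, else None."""
    try:
        header_b64, payload_b64, signature_b64 = token.split(".")
        header = json.loads(_b64url_decode(header_b64))
        payload = json.loads(_b64url_decode(payload_b64))
        signature = _b64url_decode(signature_b64)
    except (ValueError, TypeError):
        return None  # wrong number of segments, bad base64, or bad JSON
    if not isinstance(header, dict) or not isinstance(payload, dict):
        return None
    # Pin the algorithm instead of trusting whatever the token claims.
    if header.get("alg") != "HS256":
        return None
    expected = hmac.new(secret, f"{header_b64}.{payload_b64}".encode("ascii"),
                        hashlib.sha256).digest()
    if not hmac.compare_digest(signature, expected):
        return None
    expiry = payload.get("exp")
    current = time.time() if now is None else now
    if expiry is not None and current >= expiry:
        return None
    return payload
```

A service in front of the file store would call something like `verify_hs256(token, secret)` on each request and only serve the file, or set the access cookie, when a payload comes back.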

Application servers

An Application server node has the following images:

  • cihm-public-cap has the Apache image that sits in front of all the web-facing application services.
  • 'cap', which is currently only in our Subversion repository and only deployed to development servers. In production we still use the older deployment mechanism (Capistrano).
  • cihm-metadatabus has scripting to stream search data distributed by CouchDB to Solr.
  • We are using the official Docker Solr image (tag 5.5.5).
  • These servers also have CouchDB servers.
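
The CouchDB-to-Solr streaming mentioned above can be pictured as a transformation of CouchDB's `_changes` feed into Solr updates. The sketch below is an assumption-laden illustration, not the cihm-metadatabus code: it expects a `_changes` response fetched with `include_docs=true`, and the field mapping (copy every field that is not CouchDB bookkeeping) is hypothetical.

```python
def changes_to_solr(changes):
    """Turn a CouchDB _changes response into Solr add and delete lists.

    Returns (docs_to_add, ids_to_delete, last_seq). The caller would store
    last_seq and pass it as "since" on the next poll.
    """
    docs_to_add = []
    ids_to_delete = []
    for row in changes.get("results", []):
        doc_id = row["id"]
        if doc_id.startswith("_design/"):
            continue  # design documents are not search content
        if row.get("deleted"):
            ids_to_delete.append(doc_id)
            continue
        doc = row.get("doc")
        if doc is None:
            continue  # feed was requested without include_docs=true
        # Drop CouchDB bookkeeping fields such as _id and _rev.
        solr_doc = {key: value for key, value in doc.items()
                    if not key.startswith("_")}
        solr_doc["id"] = doc_id
        docs_to_add.append(solr_doc)
    return docs_to_add, ids_to_delete, changes.get("last_seq")
```

The add list can be sent to Solr's update handler as a JSON array of documents and the delete list as a JSON delete command. Because every application server has its own CouchDB, each one can feed its local Solr from local data.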

Demo server

We have a demo server which mostly runs legacy applications that we don't maintain, like the CDP, rdf and AV, but it also hosts one current demo:

  • cihm-iiif-presentation offers the IIIF Presentation API demonstration, which reads our current CouchDB presentation documents and serves them in an IIIF-compatible form.

Other servers that don't provide public access

We have other servers which are used internally by staff to manage processes and which don't have a publicly accessible interface.

  • The servers that build SIPs, and the servers we use to ingest SIPs into AIPs to be stored on repository servers, use the cihm-ingestwip image.
  • The server used to manage our metadata bus databases uses the cihm-metadatabus image (which extracts metadata from SIPs and produces documents for presentation and search), along with a Solr server running the official Docker image for local search.
  • We also have CouchDB running on many servers, as we use it as a reliable multi-datacenter replication service for most of our metadata.
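
For readers unfamiliar with CouchDB replication, one common way to set it up is to store documents in each server's `_replicator` database describing a source, a target, and whether the replication is continuous. The sketch below generates such documents for every ordered pair of nodes. The node names, URLs, and database name are hypothetical, and the full-mesh topology is an assumption rather than a description of our deployment.

```python
from itertools import permutations


def replication_docs(nodes, database):
    """Build _replicator documents for continuous replication between all nodes.

    nodes maps a short node name to that node's CouchDB base URL.
    """
    docs = []
    for source, target in permutations(sorted(nodes), 2):
        docs.append({
            "_id": f"{database}-{source}-to-{target}",
            "source": f"{nodes[source].rstrip('/')}/{database}",
            "target": f"{nodes[target].rstrip('/')}/{database}",
            "continuous": True,  # keep replicating as new changes arrive
        })
    return docs
```

With two nodes this yields two documents, one per direction, so a write at either data centre ends up at the other.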

Best wishes of the season, and Happy New Year.
