31 July 2012
The SuperOERGlue project is a pilot to integrate OER Glue with Newcastle University's Dynamic Learning Maps.
Tatamae, our partners in the SuperOERGlue project and the creators of OERGlue, have been busy building a harvester that can aggregate data and provide search and recommendations based on that aggregated data. After learning that this was now operational we arranged a Skype call with their developer Justin Ball to discuss how best to interact with the harvester and in what format they would prefer to receive our data. From this call it emerged that the accepted formats would be RSS or OAI-PMH, with OAI-PMH preferred, so we decided to go with the OAI-PMH option.
OAI-PMH (Open Archives Initiative Protocol for Metadata Harvesting) is used to harvest the metadata descriptions of the records in an archive so that services can be built using metadata from many archives.
We started by investigating whether an existing Python package could output our DLM resources in OAI-PMH format. This search led us to the OAI-PMH Python module, pyoai. Installing the package was fairly pain-free, a simple easy_install of pyoai. However, once it was installed it wasn't obvious how to use the package to output OAI-PMH, and the documentation was fairly weak, with the majority covering how to read in OAI-PMH feeds rather than produce them. Looking through the code there did seem to be some classes for exposing metadata, but these appeared to be empty containers rather than fully functional implementations. After a little more searching to no avail we decided to write our own code in Python.
We did this by working from the spec on the Open Archives web site, which gave a good basis for what functions were needed and what the outputs should look like. According to the documentation, the protocol defines a set of six verbs that are invoked within HTTP: Identify, ListMetadataFormats, ListSets, ListIdentifiers, ListRecords and GetRecord.
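The verb handling described above can be sketched as a simple dispatch table. This is only an illustrative sketch, not our actual code; the handler bodies and repository name are placeholders.

```python
# Sketch of dispatching the six OAI-PMH verbs to handler functions.
# Handler bodies and repository details are illustrative placeholders.

def identify(params):
    # Abbreviated Identify response body for illustration only
    return "<Identify><repositoryName>DLM</repositoryName></Identify>"

def bad_verb(params):
    # The protocol defines a badVerb error for illegal or missing verbs
    return '<error code="badVerb">Illegal or missing verb</error>'

# The six verbs the protocol defines, mapped to handlers
HANDLERS = {
    "Identify": identify,
    "ListMetadataFormats": lambda p: "...",
    "ListSets": lambda p: "...",
    "ListIdentifiers": lambda p: "...",
    "ListRecords": lambda p: "...",
    "GetRecord": lambda p: "...",
}

def handle(params):
    """Dispatch on the mandatory `verb` request parameter."""
    return HANDLERS.get(params.get("verb"), bad_verb)(params)
```

An unknown or missing verb falls through to the badVerb error response, as the spec requires.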
Example HTTP requests to harvest the data can be accessed at the following URLs:
Notice that each of these takes a verb and a metadata prefix, which are the minimum requirements for the request. Some requests also take optional parameters to help with selective harvesting; for example, ListRecords accepts from and until dates to limit the records returned.
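A selective-harvesting request of this kind can be built as a simple query string. The base URL below is a placeholder, not our real endpoint, and the date bounds are only examples:

```python
# Sketch of constructing a selective ListRecords harvest request.
# The base URL is a placeholder, not the real DLM endpoint.
from urllib.parse import urlencode

BASE = "https://example.org/oai"  # placeholder endpoint

params = {
    "verb": "ListRecords",        # mandatory verb
    "metadataPrefix": "oai_dc",   # required for ListRecords
    "from": "2012-07-01",         # optional selective-harvesting bounds
    "until": "2012-07-31",
}
url = BASE + "?" + urlencode(params)
```

A harvester would then fetch this URL repeatedly, following resumption tokens if the response is paged.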
To aid performance during testing we decided to return only a selection of the DLM resources rather than the whole set, until we were happy that the feed was formatted correctly and there were no problems accessing it. We limited results by only including resources flagged as having a Wikimedia Commons licence.
Although OAI-PMH supports a number of metadata representations, we currently provide only one: Dublin Core. This is the minimum the protocol requires, although we may look into providing additional formats in the future.
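Each record's metadata is wrapped in an oai_dc container using the standard Dublin Core namespaces. The following is a minimal sketch with illustrative element values, not our production serialiser:

```python
# Sketch of serialising one resource as an oai_dc Dublin Core record.
# The title and identifier values are illustrative placeholders.
import xml.etree.ElementTree as ET

OAI_DC = "http://www.openarchives.org/OAI/2.0/oai_dc/"
DC = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("oai_dc", OAI_DC)
ET.register_namespace("dc", DC)

def dc_record(title, identifier):
    """Build an oai_dc:dc element holding the minimal DC fields."""
    root = ET.Element(f"{{{OAI_DC}}}dc")
    ET.SubElement(root, f"{{{DC}}}title").text = title
    ET.SubElement(root, f"{{{DC}}}identifier").text = identifier
    return ET.tostring(root, encoding="unicode")

record_xml = dc_record("Anatomy learning map", "oai:dlm:123")
```

In a full response this fragment would sit inside the record's metadata element, alongside the OAI header carrying the identifier and datestamp.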
We also don't yet support sets, which are used for selective harvesting. The main reason is that our data does not map easily onto sets, but again this could be revisited in the future if required.
We have now passed these feeds on to OERGlue and are waiting to hear if they have managed to process them or whether we need to update our code. Once we are both happy we will expose the rest of the data for them to harvest.
Related tags: oerri, rapid innovation, RIDLR, supoerglue, ukoer
Posted by: John Peterson