[OCWR] Week 4 - OpenCitations Weekly Report

Week from Aug 24 to Aug 30

Introduction

During the fourth week I continued working on the oc_ocdm project.

  • I fixed some issues in the source code
  • I completed the refactoring of dataset.py and distribution.py (the changes will be pushed to the online repository as soon as possible)
  • I verified that the oc_ocdm package can be used in place of the original graphlib.py module without causing integration errors with the CCC workflow (namely BEE and SPACIN)

Report

Fixed some issues

First, I went through the entire oc_ocdm codebase and compared it with the OCDM documentation, looking for implementation mistakes. I found the following two methods, both added last week, whose parameter was typed as str when it should have been URIRef, since the documentation describes neither value as a literal or a string:

ResourceEmbodiment

  • has_url
  • has_media_type
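A minimal sketch of why the parameter type matters, assuming triples are modelled as plain tuples. `URIRef` here is a hypothetical stand-in for rdflib's class of the same name (which also subclasses str), `ResourceEmbodimentSketch` is not the oc_ocdm class, and the predicates (frbr:exemplar for the URL, dcterms:format for the media type) are assumptions for illustration. The runtime type check is an addition of this sketch; the oc_ocdm fix itself concerned the declared parameter type.

```python
from typing import Set, Tuple


# Hypothetical stand-in for rdflib.term.URIRef (rdflib's class also subclasses
# str); defined here only so the sketch runs without third-party packages.
class URIRef(str):
    pass


# Predicate IRIs assumed for illustration.
FRBR_EXEMPLAR = URIRef("http://purl.org/vocab/frbr/core#exemplar")
DCTERMS_FORMAT = URIRef("http://purl.org/dc/terms/format")


class ResourceEmbodimentSketch:
    """Hypothetical, simplified entity that records (subject, predicate, object) triples."""

    def __init__(self, res: URIRef) -> None:
        self.res = res
        self.triples: Set[Tuple[str, str, str]] = set()

    def has_url(self, thing_res: URIRef) -> None:
        # The URL identifies a resource, so it is stored as an IRI, not a string literal.
        self._add_iri(FRBR_EXEMPLAR, thing_res)

    def has_media_type(self, thing_res: URIRef) -> None:
        # The media type is likewise an IRI rather than free text.
        self._add_iri(DCTERMS_FORMAT, thing_res)

    def _add_iri(self, predicate: URIRef, obj: URIRef) -> None:
        # Reject plain str values so that an IRI can never be passed as a bare string.
        if not isinstance(obj, URIRef):
            raise TypeError(f"expected URIRef, got {type(obj).__name__}")
        self.triples.add((self.res, predicate, obj))
```

For example, `has_media_type("application/pdf")` raises a TypeError, while passing a `URIRef` adds the corresponding triple.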

Moreover, I found some minor inconsistencies in the OCDM itself, which I reported by opening four GitHub issues (#3, #4, #5, #6) in the opencitations/metadata repository.

Dataset and Distribution code cleaning

I finished cleaning up the Dataset and Distribution classes, moving some functions into the appropriate class and documenting every method with a short comment. The Distribution class now works internally much like the other oc_ocdm classes, storing its own URIRef as self.res.
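A minimal sketch of that shared pattern, in which each entity keeps its own IRI in `self.res` and uses it as the subject of every statement it makes. `EntitySketch`, `DatasetSketch`, `DistributionSketch` and their method names are hypothetical; dcat:distribution and dcat:downloadURL are real DCAT properties, but whether oc_ocdm exposes exactly these is an assumption.

```python
from typing import Set, Tuple

# DCAT namespace; the specific properties used below are assumed for illustration.
DCAT = "http://www.w3.org/ns/dcat#"


class EntitySketch:
    """Hypothetical common base: each entity keeps its own IRI in self.res."""

    def __init__(self, res: str) -> None:
        self.res = res
        self.triples: Set[Tuple[str, str, str]] = set()

    def _add(self, predicate: str, obj: str) -> None:
        # Every statement made by an entity uses self.res as its subject.
        self.triples.add((self.res, predicate, obj))


class DistributionSketch(EntitySketch):
    def has_download_url(self, url: str) -> None:
        self._add(DCAT + "downloadURL", url)


class DatasetSketch(EntitySketch):
    def has_distribution(self, dist: DistributionSketch) -> None:
        # Another entity is referenced through its own self.res.
        self._add(DCAT + "distribution", dist.res)
```

Because a dataset links to a distribution through `dist.res`, both classes can share the same triple-building helper instead of each managing its IRI differently.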

Integration test with BEE and SPACIN

I decided to run a test to verify whether the oc_ocdm package could be used as a drop-in replacement for the original graphlib.py module. In the CCC repository, where graphlib.py is located, the module is used primarily by SPACIN. In the OpenCitations workflow, SPACIN is meant to be launched together with (or after) another tool called BEE. Hence, I cloned the CCC repository locally and removed the graphlib.py file from it. Then, I changed every import statement in the SPACIN and BEE source files so that they pointed at the globally installed oc_ocdm package. When everything was ready, I first launched BEE to produce the JSON-LD files that SPACIN consumes, and then I launched SPACIN itself, hoping everything would run smoothly without throwing exceptions. In the end, I got three different exceptions:

  1. the first one was caused by a minor bug in the ProvSet class, which I fixed immediately with this commit
  2. the second exception was thrown because the jats2oc.py module from SPACIN assumes that DiscourseElement objects have a “create_number” method (see jats2oc.py, line 1666), which is not documented in the OCDM
  3. the third exception was thrown because the jats2oc.py module from SPACIN assumes that ReferencePointer objects have a “has_next_de” method (see jats2oc.py, line 1568), whereas the ReferencePointer class already has a “has_next_rp” method that does the same thing
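The second and third exceptions are the AttributeError that Python raises when code calls a method a class does not define. A minimal sketch of a check that could surface such mismatches before a full BEE/SPACIN run: given the method names a caller expects on each class, it reports the missing ones. The stub classes are hypothetical stand-ins that only mirror the method names mentioned in the report.

```python
from typing import Dict, List


# Hypothetical stubs: only their method names mirror the situation in the report.
class DiscourseElementStub:
    pass  # no create_number method, as in the OCDM


class ReferencePointerStub:
    def has_next_rp(self, next_rp: "ReferencePointerStub") -> None:
        pass  # placeholder body


def missing_methods(expected: Dict[type, List[str]]) -> Dict[str, List[str]]:
    """Map each class name to the expected method names it does not provide."""
    report: Dict[str, List[str]] = {}
    for cls, names in expected.items():
        absent = [name for name in names if not callable(getattr(cls, name, None))]
        if absent:
            report[cls.__name__] = absent
    return report


# Calls made by jats2oc.py according to the report (lines 1666 and 1568).
EXPECTED = {
    DiscourseElementStub: ["create_number"],
    ReferencePointerStub: ["has_next_de"],
}

print(missing_methods(EXPECTED))
# {'DiscourseElementStub': ['create_number'], 'ReferencePointerStub': ['has_next_de']}
```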

The first exception has already been fixed, and the third one can easily be fixed by changing line 1568 of jats2oc.py from

    cur_rp.has_next_de(next_rp)

to

    cur_rp.has_next_rp(next_rp)

The second one, however, requires a review of the OCDM to be handled properly.
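Going back to the setup of the integration test, the step of changing every graphlib import in the BEE and SPACIN sources can be made less error-prone with a static scan. A minimal sketch using Python's `ast` module that lists every import statement mentioning a given module under a directory; the function name `find_module_imports` and its return format are hypothetical. A static scan is also safer than waiting for an ImportError: since Python 3.9 the standard library ships its own `graphlib` module, so a leftover `import graphlib` may not fail at import time once graphlib.py is removed.

```python
import ast
from pathlib import Path
from typing import List, Tuple


def find_module_imports(root: str, module: str = "graphlib") -> List[Tuple[str, int, str]]:
    """Return (file, line, statement) for each import that mentions `module` under root."""
    hits: List[Tuple[str, int, str]] = []
    for path in sorted(Path(root).rglob("*.py")):
        source = path.read_text(encoding="utf-8")
        for node in ast.walk(ast.parse(source, filename=str(path))):
            if isinstance(node, ast.Import):
                parts = [p for alias in node.names for p in alias.name.split(".")]
            elif isinstance(node, ast.ImportFrom):
                # Covers absolute and relative forms, e.g. "from .graphlib import X"
                # and "from . import graphlib".
                parts = (node.module or "").split(".") + [alias.name for alias in node.names]
            else:
                continue
            if module in parts:
                hits.append((str(path), node.lineno, ast.get_source_segment(source, node) or ""))
    return hits
```

For example, `for file, line, stmt in find_module_imports("path/to/ccc"): print(f"{file}:{line}: {stmt}")` prints one line per import to rewrite, where `path/to/ccc` stands for the local clone.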