[OCWR] Week 4 - OpenCitations Weekly Report
Week from Aug 24 to Aug 30
Introduction
During the fourth week I continued working on the oc_ocdm project.
- I fixed some issues in the source code
- I completed the code refactoring of dataset.py and distribution.py (which will be added to the online repository as soon as possible)
- I verified that the oc_ocdm package can be used in place of the original graphlib.py module without causing integration errors with the CCC workflow (namely the BEE and SPACIN software)
Report
Fixed some issues
First, I went through the entire oc_ocdm codebase, comparing it with the OCDM documentation and looking for potential implementation mistakes. I found the following two methods, both added last week, whose parameter was of type str while it should have been of type URIRef, since the documentation describes neither value as a literal or a string.
ResourceEmbodiment
- has_url
- has_media_type
Moreover, I found some minor inconsistencies in the OCDM itself, which I reported by opening four GitHub issues (#3, #4, #5 and #6) in the opencitations/metadata repository.
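The fix amounts to requiring an IRI rather than a plain string for these two parameters. The following is a minimal sketch, not the actual oc_ocdm code: to stay self-contained it uses a stand-in `URIRef` class (rdflib's real `URIRef` is likewise a `str` subclass), a plain set of triples instead of an rdflib graph, and hypothetical `example.org` predicate IRIs.

```python
from typing import Set, Tuple


class URIRef(str):
    """Stand-in for rdflib.URIRef, which is likewise a subclass of str."""


Triple = Tuple[str, str, str]

# Hypothetical predicate IRIs, for illustration only.
HAS_URL = URIRef("http://example.org/hasURL")
HAS_MEDIA_TYPE = URIRef("http://example.org/hasMediaType")


class ResourceEmbodiment:
    """Simplified, hypothetical take on the OCDM ResourceEmbodiment entity."""

    def __init__(self, res: URIRef) -> None:
        self.res = res  # the IRI of this entity
        self.triples: Set[Triple] = set()

    def _add_iri(self, predicate: URIRef, thing_ref: URIRef) -> None:
        # Values must be IRIs, not plain strings (which RDF would treat as literals).
        if not isinstance(thing_ref, URIRef):
            raise TypeError(f"expected URIRef, got {type(thing_ref).__name__}")
        self.triples.add((self.res, predicate, thing_ref))

    def has_url(self, thing_ref: URIRef) -> None:
        self._add_iri(HAS_URL, thing_ref)

    def has_media_type(self, thing_ref: URIRef) -> None:
        self._add_iri(HAS_MEDIA_TYPE, thing_ref)
```

With this sketch, `has_url(URIRef("http://example.org/article.pdf"))` adds a triple, while `has_media_type("application/pdf")` raises a `TypeError` instead of silently storing a string.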
Dataset and Distribution code cleaning
I finished cleaning up the Dataset and Distribution classes, moving some functions into the appropriate class and documenting every method with a short comment. The Distribution class now works internally in much the same way as the other oc_ocdm classes, storing its URIRef reference as self.res.
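The pattern described above, where each entity keeps its own IRI in `self.res` and uses it as the subject of the triples it writes, can be sketched as follows. Everything here is hypothetical and simplified (the `BaseEntity` class, the `has_title` method, a list of triples instead of an rdflib graph, a stand-in `URIRef`); it only illustrates the idea and is not the refactored oc_ocdm code.

```python
from typing import List, Tuple


class URIRef(str):
    """Stand-in for rdflib.URIRef, which is likewise a subclass of str."""


class BaseEntity:
    """Hypothetical common base class: every entity keeps its own IRI in self.res."""

    def __init__(self, res: URIRef) -> None:
        self.res = res
        self.triples: List[Tuple[str, str, str]] = []

    def _add(self, predicate: str, obj: str) -> None:
        # The entity's own IRI is always the subject of the triples it creates.
        self.triples.append((self.res, predicate, obj))


class Distribution(BaseEntity):
    """Only the shared behaviour is shown; the method below is hypothetical."""

    def has_title(self, title: str) -> None:
        # dcterms:title; the title is stored as a plain string (a literal).
        self._add("http://purl.org/dc/terms/title", title)
```

Keeping `self.res` in a shared base class means that Distribution needs no bookkeeping of its own to know which resource it describes.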
Integration test with BEE and SPACIN
I decided to run a test to verify whether the oc_ocdm package could be used as a drop-in replacement for the original graphlib.py module. In the CCC repository, where it is located, graphlib.py is used primarily by SPACIN. In the OpenCitations workflow, SPACIN is meant to be launched together with (or after) another piece of software called BEE. Hence, I cloned the CCC repository locally and removed the graphlib.py file from it. Then, I changed every import statement in the SPACIN and BEE source files so that they point to the globally installed oc_ocdm package. When everything was ready, I first launched BEE to produce the JSON-LD files that SPACIN consumes, and then I launched SPACIN itself, hoping everything would run smoothly without throwing exceptions. In the end, I got three different exceptions:
- the first one was caused by a very small bug in the ProvSet class, which I fixed immediately with this commit
- the second exception was thrown because the jats2oc.py module from SPACIN assumes that DiscourseElement objects have a “create_number” method (see jats2oc.py, line 1666), while this method is not documented in the OCDM
- the third exception was thrown because the jats2oc.py module from SPACIN assumes that ReferencePointer objects have a “has_next_de” method (see jats2oc.py, line 1568), while the ReferencePointer class already has a “has_next_rp” method which does the same thing
The first exception has already been fixed, and the third one can easily be fixed by changing line 1568 of jats2oc.py from
cur_rp.has_next_de(next_rp)
to
cur_rp.has_next_rp(next_rp)
The second one, however, requires a review of the OCDM to be handled properly.
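The changes made to the local clone of the CCC repository (pointing the imports at oc_ocdm instead of graphlib.py, and the one-line fix to jats2oc.py) are plain textual substitutions, so they could be scripted. The following is a hedged sketch: the exact form of the original import statements in BEE and SPACIN, and the module path they should point to in oc_ocdm, are assumptions, so the rules and the helpers `patch_source` and `patch_tree` are hypothetical and would need adapting to the real files.

```python
import re
from pathlib import Path
from typing import List, Tuple

# Hypothetical substitution rules (regex pattern, replacement).
RULES: List[Tuple[str, str]] = [
    # Assumes imports of the form "from graphlib import X" and that oc_ocdm
    # exposes the same names at top level; adapt both to the real code.
    (r"^([ \t]*)from[ \t]+graphlib[ \t]+import[ \t]+", r"\1from oc_ocdm import "),
    # The jats2oc.py fix: only the ReferencePointer call is renamed, so any
    # other has_next_de call (on objects that do define it) is left untouched.
    (r"\bcur_rp\.has_next_de\(", "cur_rp.has_next_rp("),
]


def patch_source(text: str) -> str:
    """Apply every substitution rule to a module's source code."""
    for pattern, replacement in RULES:
        text = re.sub(pattern, replacement, text, flags=re.MULTILINE)
    return text


def patch_tree(root: Path) -> None:
    """Rewrite, in place, every .py file under root that needs patching."""
    for path in root.rglob("*.py"):
        original = path.read_text(encoding="utf-8")
        patched = patch_source(original)
        if patched != original:
            path.write_text(patched, encoding="utf-8")
```

Narrowing the rename to the `cur_rp` call, rather than replacing every `.has_next_de(`, keeps the patch limited to the ReferencePointer mismatch reported above.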