[OCWR] Week 4 - OpenCitations Weekly Report
Week from Aug 24 to Aug 30
Introduction
During the fourth week I continued working on the oc_ocdm project.
- I fixed some issues in the source code
- I completed the code refactoring of dataset.py and distribution.py (which will be added to the online repository as soon as possible)
- I tested whether the oc_ocdm package can be used in place of the original graphlib.py module without causing integration errors with the CCC workflow (namely the BEE and SPACIN software)
Report
Fixed some issues
Firstly, I checked the entire codebase of oc_ocdm against the documentation provided by the OCDM, looking for potential implementation mistakes. I found the following two methods, both added last week, whose parameter was of type str whereas it should have been of type URIRef, since the documentation refers neither to a literal nor to a string.
ResourceEmbodiment
- has_url
- has_media_type
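The difference between the two parameter types can be illustrated with a minimal sketch. The class below is a hypothetical, heavily simplified stand-in for ResourceEmbodiment (it is not the oc_ocdm implementation), and the triple store is just a Python list. URIRef is the rdflib type; a tiny fallback is defined only so that the sketch also runs where rdflib is not installed.

```python
try:
    from rdflib import URIRef  # the RDF term type used for resources
except ImportError:
    class URIRef(str):
        """Minimal stand-in so the sketch runs without rdflib installed."""


class ResourceEmbodimentSketch:
    """Hypothetical, simplified stand-in for the real ResourceEmbodiment class."""

    def __init__(self, res: URIRef) -> None:
        self.res = res
        self.triples = []  # (subject, predicate name, object) tuples

    def has_url(self, thing_res: URIRef) -> None:
        # Reject plain strings: the value is expected to be a resource,
        # not a literal or an ordinary Python string.
        if not isinstance(thing_res, URIRef):
            raise TypeError(
                "has_url expects a URIRef, got " + type(thing_res).__name__
            )
        self.triples.append((self.res, "has_url", thing_res))
```

With this signature, passing URIRef("https://example.org/paper.pdf") is accepted, while passing the same address as a plain str raises a TypeError instead of silently producing the wrong kind of RDF term.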
Moreover, I found some minor inconsistencies in the OCDM itself, which I reported by opening GitHub issues (#3, #4, #5, #6) in the opencitations/metadata repository.
Dataset and Distribution code cleaning
I finished cleaning up the Dataset and Distribution classes, moving some functions into the right class and documenting every method with short comments. The Distribution class now works internally in a very similar fashion to the other classes of oc_ocdm, storing its URIRef reference as self.res.
Integration test with BEE and SPACIN
I decided to run a test to verify whether the oc_ocdm package could be used as a drop-in replacement for the original graphlib.py module. In the CCC repository where it is located, graphlib.py is used primarily by SPACIN. In the OpenCitations workflow, SPACIN is a piece of software intended to be launched together with (or after) another one called BEE. Hence, I cloned the CCC repository locally and removed the file graphlib.py from it. Then, I changed every import statement in the SPACIN and BEE source files so that they point to the globally installed oc_ocdm package. When everything was ready, I first launched BEE to produce the JSON-LD files that SPACIN consumes, and then I launched SPACIN itself, hoping everything would run smoothly without throwing exceptions. In the end, I got 3 different exceptions:
- the first one was caused by a minor bug in the ProvSet class, which I fixed immediately with this commit
- the second exception was thrown because the jats2oc.py module from SPACIN assumes that DiscourseElement objects have a “create_number” method (see jats2oc.py, line 1666), while this is not documented in the OCDM
- the third exception was thrown because the jats2oc.py module from SPACIN assumes that ReferencePointer objects have a “has_next_de” method (see jats2oc.py, line 1568), while the ReferencePointer class already has a “has_next_rp” method which does the same thing
The first one is already fixed, and the third one should be easy to fix by changing line 1568 of jats2oc.py from
cur_rp.has_next_de(next_rp)
to
cur_rp.has_next_rp(next_rp)
The second one, however, requires a review of the OCDM to be properly handled.
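Mismatches like the second and third exceptions could be detected before running the whole workflow by checking that each class exposes the methods its callers expect. The following is only a sketch under stated assumptions: the two classes are hypothetical stand-ins that mirror what the report says about ReferencePointer and DiscourseElement, and missing_methods is an illustrative helper, not part of oc_ocdm or SPACIN.

```python
class ReferencePointerSketch:
    """Hypothetical stand-in: provides has_next_rp but not has_next_de."""

    def has_next_rp(self, next_rp):
        self.next_rp = next_rp


class DiscourseElementSketch:
    """Hypothetical stand-in: provides no create_number method."""


def missing_methods(cls, expected):
    """Return the names in `expected` that `cls` does not provide as callables."""
    return [name for name in expected if not callable(getattr(cls, name, None))]


# Method names that jats2oc.py relies on, according to the exceptions above.
report = {
    "ReferencePointer": missing_methods(
        ReferencePointerSketch, ["has_next_rp", "has_next_de"]
    ),
    "DiscourseElement": missing_methods(
        DiscourseElementSketch, ["create_number"]
    ),
}
```

Here report maps each class name to the methods that callers expect but the class lacks, so the same check could be repeated against the real classes after each change to either side.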