[OCWR] Week 21 - OpenCitations Weekly Report
Week from Dec 21 to Dec 27
Introduction
During the twenty-first (and last) week I did a lot of work, because I had to finalize the work on the oc_ocdm package.
Various fixes were made to the Storer class, to the CounterHandler class and its subclasses, and to the GraphEntity,
GraphSet, ProvEntity and ProvSet classes. The new Reader class was created, and the load method previously owned by
Storer was moved into it. However, most of the work went into a large code refactoring, which changed much of the
internal structure of the package and made it easier to integrate the new MetadataSet and MetadataEntity classes.
Report
Various fixes for the Storer class
The set_preface_query and get_preface_query methods from Storer were removed: they were only needed for the old
dataset_handler.py module and are not needed anymore since that module was replaced by the new metadata entities added
this week to the codebase (see below).
The update_all method was fixed so that it also handles the case in which the SPARQL update query for a particular
entity is an empty string (no triples to add or remove).
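One way to picture this fix is a helper that merges per-entity update queries into a single request. The sketch below is hypothetical: the name join_update_queries does not come from oc_ocdm, and the real update_all may send queries differently. It assumes the queries are joined with the SPARQL 1.1 Update operation separator `;`. Under that assumption, an empty entry would otherwise leave an invalid `; ;` sequence in the request.

```python
from typing import Iterable, List


def join_update_queries(queries: Iterable[str]) -> str:
    """Join per-entity SPARQL Update queries into one request string.

    Hypothetical helper: empty (or whitespace-only) queries are dropped,
    because an entity with nothing to add or remove yields no operation.
    """
    operations: List[str] = [q.strip() for q in queries if q and q.strip()]
    # SPARQL 1.1 Update separates operations with ';'. Keeping the empty
    # strings would produce an invalid sequence such as "... ; ; ...".
    return " ;\n".join(operations)
```

Given an INSERT DATA query, an empty string and a DELETE DATA query, only the two real operations end up in the joined request.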
The load method of the Storer class was moved to the newly created Reader class. An old “hack” inside the load method
was removed because it was no longer needed: it had been put in place to work around a bug in older versions of rdflib,
which could not parse/load an RDF file whose path contained spaces. Moreover, a small bug in the store methods was
fixed: the rdflib addN method was being called incorrectly, with a single quad instead of a collection of quads.
Code refactoring
A large code refactoring was applied to the entire package. The code was split into three main folders: graph (for
entities which represent the actual data), prov (for entities which describe snapshots of the history of the graph
entities) and metadata (for entities that describe datasets and distributions, with code taken from the old
dataset_handler.py module and docstrings taken from the OCDM specification).
GraphEntity, ProvEntity and MetadataEntity are now subclasses of AbstractEntity, while GraphSet, ProvSet and
MetadataSet are now subclasses of AbstractSet. ProvEntity is no longer a subclass of GraphEntity, and ProvSet is no
longer a subclass of GraphSet. Test files were prepared for most of the new classes.
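The new hierarchy can be summarized with the skeleton below. This is a sketch only. The class names come from the text above, but the res attribute, the res_to_entity dictionary and the add method are hypothetical stand-ins rather than the package's actual members.

```python
from abc import ABC
from typing import Dict


class AbstractEntity(ABC):
    """Common base of every entity type (sketch; attributes are illustrative)."""

    def __init__(self, res: str) -> None:
        self.res = res  # IRI of the described resource


class AbstractSet(ABC):
    """Common base of every container of entities (sketch)."""

    def __init__(self) -> None:
        # Hypothetical attribute: entities indexed by their res IRI.
        self.res_to_entity: Dict[str, AbstractEntity] = {}

    def add(self, entity: AbstractEntity) -> None:
        self.res_to_entity[entity.res] = entity


# Entities: ProvEntity derives from AbstractEntity, no longer from GraphEntity.
class GraphEntity(AbstractEntity): ...
class ProvEntity(AbstractEntity): ...
class MetadataEntity(AbstractEntity): ...


# Sets: ProvSet derives from AbstractSet, no longer from GraphSet.
class GraphSet(AbstractSet): ...
class ProvSet(AbstractSet): ...
class MetadataSet(AbstractSet): ...
```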
The Storer was updated accordingly, so that it now accepts a generic AbstractSet: it can store every type of entity,
treating each type in the appropriate way.
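A minimal sketch of how a Storer might dispatch on the concrete set type is shown below. All classes here are stand-ins, and the _store_* methods are hypothetical placeholders for the real per-type serialization logic. The sketch also shows one benefit of the new hierarchy: because ProvSet no longer derives from GraphSet, the isinstance checks cannot overlap, so their order does not matter.

```python
class AbstractSet:
    """Stand-in for the package's AbstractSet."""


class GraphSet(AbstractSet): ...
class ProvSet(AbstractSet): ...
class MetadataSet(AbstractSet): ...


class Storer:
    """Hypothetical sketch of a Storer that accepts any AbstractSet."""

    def __init__(self, abstract_set: AbstractSet) -> None:
        self.a_set = abstract_set

    def store_all(self) -> str:
        # Dispatch on the concrete set type. Since ProvSet is no longer a
        # subclass of GraphSet, at most one of these checks can match.
        if isinstance(self.a_set, GraphSet):
            return self._store_graph()
        if isinstance(self.a_set, ProvSet):
            return self._store_prov()
        if isinstance(self.a_set, MetadataSet):
            return self._store_metadata()
        raise TypeError(f"Unsupported set type: {type(self.a_set).__name__}")

    # Placeholder strategies: each would serialize its own kind of entity.
    def _store_graph(self) -> str:
        return "graph"

    def _store_prov(self) -> str:
        return "prov"

    def _store_metadata(self) -> str:
        return "metadata"
```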
New CounterHandler methods
The CounterHandler interface was extended with four new methods: read_metadata_counter, increment_metadata_counter,
set_counter and set_metadata_counter. The first two were added because the new MetadataSet class required a different
strategy for managing entity counters. The last two were needed to support a new behaviour of the GraphSet, ProvSet
and MetadataSet classes: if an entity added to one of them has a res IRI whose count value is greater than the one
stored in the CounterHandler, then the CounterHandler's counter is updated with this new value.
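The rule above is illustrated by the sketch below. It makes two assumptions. First, IRIs are taken to have the simplified shape <base_iri><short_name>/<count>, with supplier prefixes ignored. Second, the counter store is a hypothetical in-memory stand-in whose read_counter/set_counter names and signatures are illustrative, not the package's actual API.

```python
from typing import Dict, Tuple


class InMemoryCounters:
    """Hypothetical stand-in for a CounterHandler that keeps counters in a dict."""

    def __init__(self) -> None:
        self._counters: Dict[str, int] = {}

    def read_counter(self, short_name: str) -> int:
        return self._counters.get(short_name, 0)

    def set_counter(self, new_value: int, short_name: str) -> None:
        self._counters[short_name] = new_value


def parse_res(res: str) -> Tuple[str, int]:
    """Split an IRI such as 'https://example.org/br/42' into ('br', 42).

    Assumes the simplified shape <base_iri><short_name>/<count>.
    """
    short_name, count = res.rstrip("/").split("/")[-2:]
    return short_name, int(count)


def sync_counter(counters: InMemoryCounters, res: str) -> None:
    """Raise the stored counter if the added entity's count exceeds it."""
    short_name, count = parse_res(res)
    if count > counters.read_counter(short_name):
        counters.set_counter(count, short_name)
```

Adding an entity with IRI .../br/5 to an empty store sets the "br" counter to 5. Adding .../br/3 afterwards leaves it at 5. Keeping the counter at least as large as any count already in use presumably lets entities created later receive fresh, non-colliding IRIs.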
Other changes
Many methods of ProvEntity were moved into a new EntitySnapshot class, and some of them were renamed:
get_snapshot_of became get_is_snapshot_of, snapshot_of became is_snapshot_of, and remove_snapshot_of became
remove_is_snapshot_of.
Missing type hints were added to the support.py module, and the constructor signatures of GraphEntity, GraphSet,
ProvEntity, ProvSet, MetadataEntity and MetadataSet were changed. The new signatures are the following:
GraphEntity(g: Graph, g_set: GraphSet, res: URIRef = None, res_type: URIRef = None,
resp_agent: str = None, source_agent: str = None, source: str = None,
count: str = None, label: str = None, short_name: str = "",
preexisting_graph: Graph = None)
GraphSet(base_iri: str, info_dir: str = "", supplier_prefix: str = "",
wanted_label: bool = True)
ProvEntity(prov_subject: GraphEntity, g: Graph, p_set: ProvSet,
res: URIRef = None, resp_agent: str = None, source_agent: str = None,
source: str = None, res_type: URIRef = None, count: str = None,
label: str = None, short_name: str = "")
ProvSet(prov_subj_graph_set: GraphSet, base_iri: str, info_dir: str = "",
supplier_prefix: str = "", wanted_label: bool = True)
MetadataEntity(g: Graph, base_iri: str, dataset_name: str, m_set: MetadataSet,
res: URIRef = None, res_type: URIRef = None, resp_agent: str = None,
source_agent: str = None, source: str = None, count: str = None,
label: str = None, short_name: str = "", preexisting_graph: Graph = None)
MetadataSet(base_iri: str, info_dir: str = "", wanted_label: bool = True)
Please note that users should no longer pass a CounterHandler instance to the “Set” constructors: a CounterHandler of
the appropriate subclass is instantiated automatically, depending on whether the info_dir parameter is None/empty or a
non-empty string.
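A minimal sketch of this selection logic is shown below. It assumes an in-memory subclass for the empty case and a filesystem-backed subclass for a non-empty info_dir. The subclass names and the make_counter_handler function are hypothetical stand-ins, not the package's confirmed API.

```python
from abc import ABC
from typing import Optional


class CounterHandler(ABC):
    """Stand-in for the CounterHandler interface."""


class InMemoryCounterHandler(CounterHandler):
    """Hypothetical subclass: counters are kept only in memory."""


class FilesystemCounterHandler(CounterHandler):
    """Hypothetical subclass: counters are persisted under info_dir."""

    def __init__(self, info_dir: str) -> None:
        self.info_dir = info_dir


def make_counter_handler(info_dir: Optional[str] = "") -> CounterHandler:
    # None and "" are both falsy, so either one selects the in-memory
    # handler; any non-empty path selects the filesystem-backed one.
    if info_dir:
        return FilesystemCounterHandler(info_dir)
    return InMemoryCounterHandler()
```

A "Set" constructor could call such a factory internally, which is why callers only need to supply info_dir.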
The ShEx files were updated to include the DatasetShape and the DistributionShape.
All import statements pointing to internal modules were changed to use full absolute paths, which makes circular dependencies less likely. Since only the needed modules are imported, code execution should also be slightly faster.
The clean_info_dir script was removed because it turned out to be useless.