[OCWR] Week 9 - OpenCitations Weekly Report

Week from Sep 28 to Oct 04

Introduction

During the ninth week, I cleaned up the package in several ways. All the code responsible for handling the entity counters was extracted from GraphSet/ProvSet and moved into the oc_ocdm.counter_handler subpackage. Every variable and function argument related to that responsibility was removed from the signatures of the GraphSet and ProvSet constructors; the GraphSet.add_ci method signature was modified too, reflecting the changes made last week. As a result, a new major version of the package, 1.0.0, was published on the GitHub repository.

Report

Encapsulation of counter handling code inside CounterHandler subclasses

All the logic for handling entity counters was extracted from GraphSet/ProvSet and is now encapsulated in the FilesystemCounterHandler class. As an alternative, the InMemoryCounterHandler class was implemented, together with the abstract CounterHandler base class and a test module for each of the two implementations. The user/developer can now choose between the available implementations, with much more flexibility than before. Furthermore, extending the existing implementations is now much simpler for the package maintainers. As their names suggest, FilesystemCounterHandler stores the counters on the filesystem, whilst InMemoryCounterHandler keeps them in RAM, which is quicker but non-persistent.
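The separation described above can be pictured with a minimal, self-contained sketch. It is not the oc_ocdm code: the classes carry a Sketch prefix to make that clear, and the method names (read_counter, increment_counter), the one-text-file-per-counter layout and the "br"/"ci" keys are assumptions made only for illustration.

```python
import os
import tempfile
from abc import ABC, abstractmethod


class SketchCounterHandler(ABC):
    """Hypothetical interface: one integer counter per entity short name."""

    @abstractmethod
    def read_counter(self, entity_short_name: str) -> int:
        """Return the current counter value (0 if never incremented)."""

    @abstractmethod
    def increment_counter(self, entity_short_name: str) -> int:
        """Add one to the counter and return the new value."""


class SketchInMemoryCounterHandler(SketchCounterHandler):
    """Keeps counters in a dict: fast, but lost when the process exits."""

    def __init__(self) -> None:
        self._counters = {}

    def read_counter(self, entity_short_name: str) -> int:
        return self._counters.get(entity_short_name, 0)

    def increment_counter(self, entity_short_name: str) -> int:
        value = self.read_counter(entity_short_name) + 1
        self._counters[entity_short_name] = value
        return value


class SketchFilesystemCounterHandler(SketchCounterHandler):
    """Keeps each counter in a small text file (assumed layout), so values persist."""

    def __init__(self, info_dir: str) -> None:
        self._info_dir = info_dir
        os.makedirs(info_dir, exist_ok=True)

    def _path(self, entity_short_name: str) -> str:
        return os.path.join(self._info_dir, entity_short_name + ".txt")

    def read_counter(self, entity_short_name: str) -> int:
        path = self._path(entity_short_name)
        if not os.path.exists(path):
            return 0
        with open(path, "r", encoding="utf-8") as f:
            return int(f.read().strip() or 0)

    def increment_counter(self, entity_short_name: str) -> int:
        value = self.read_counter(entity_short_name) + 1
        with open(self._path(entity_short_name), "w", encoding="utf-8") as f:
            f.write(str(value))
        return value


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        handlers = (SketchInMemoryCounterHandler(), SketchFilesystemCounterHandler(tmp))
        for handler in handlers:
            handler.increment_counter("br")
            handler.increment_counter("br")
            print(type(handler).__name__, handler.read_counter("br"))
```

Because both classes share the same abstract interface, calling code does not need to know which storage is in use; a second filesystem handler pointed at the same folder would pick up the stored values, while the in-memory one always starts from zero.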

GraphSet.__init__, ProvSet.__init__ and GraphSet.add_ci signatures changed

From now on, the URI of a Citation object no longer contains the OCI; it contains a sequential integer instead, just like the URI of every other entity. The GraphSet.add_ci method signature changed accordingly:

# OLD signature
add_ci(self, resp_agent: str, citing_res: BibliographicResource, cited_res: BibliographicResource,
       rp_num: str = None, source_agent: str = None, source: str = None,
       res: URIRef = None) -> Citation
# NEW signature
add_ci(self, resp_agent: str, source_agent: str = None, source: str = None, res: URIRef = None) -> Citation

TIP: the OCI can still be associated with the citation as an Identifier object (see the Identifier.create_oci method).

The GraphSet and ProvSet constructors now have different signatures. The old parameters info_dir and n_file_item (of both classes) and dir_split and default_dir (of ProvSet) were removed. Moreover, both constructors require a new counter_handler parameter, which must be an instance of a subclass of the CounterHandler abstract class.

# OLD signatures
class GraphSet:
    def __init__(self, base_iri: str, context_path: str, info_dir: str, n_file_item: int = 1,
                 supplier_prefix: str = "", forced_type: bool = False, wanted_label: bool = True) -> None:
class ProvSet(GraphSet):
    def __init__(self, prov_subj_graph_set: GraphSet, base_iri: str, context_path: str, default_dir: str,
                 info_dir: str, dir_split: int, n_file_item: int, supplier_prefix: str,
                 triplestore_url: str, wanted_label: bool = True) -> None:

# NEW signatures
class GraphSet:
    def __init__(self, base_iri: str, context_path: str, counter_handler: CounterHandler,
                 supplier_prefix: str = "", forced_type: bool = False, wanted_label: bool = True) -> None:
class ProvSet(GraphSet):
    def __init__(self, prov_subj_graph_set: GraphSet, base_iri: str, context_path: str,
                 counter_handler: CounterHandler, supplier_prefix: str,
                 triplestore_url: str, wanted_label: bool = True) -> None:
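
To illustrate why passing the handler to the constructor is convenient, here is a small sketch of the injection pattern. Only the idea of a counter_handler constructor argument comes from the signatures above; SketchGraphSet, DictCounter, mint_uri, the example.org base IRI and the URI layout are all hypothetical stand-ins, not the package's API.

```python
class DictCounter:
    """Hypothetical stand-in for a CounterHandler implementation."""

    def __init__(self) -> None:
        self._counters = {}

    def increment_counter(self, short_name: str) -> int:
        self._counters[short_name] = self._counters.get(short_name, 0) + 1
        return self._counters[short_name]


class SketchGraphSet:
    """Hypothetical container that delegates all numbering to the injected handler."""

    def __init__(self, base_iri: str, counter_handler: DictCounter,
                 supplier_prefix: str = "") -> None:
        self.base_iri = base_iri
        self.counter_handler = counter_handler
        self.supplier_prefix = supplier_prefix

    def mint_uri(self, short_name: str) -> str:
        # Assumed layout: <base_iri><short_name>/<supplier_prefix><sequential integer>
        number = self.counter_handler.increment_counter(short_name)
        return f"{self.base_iri}{short_name}/{self.supplier_prefix}{number}"


graph_set = SketchGraphSet("https://example.org/", DictCounter())
first_citation = graph_set.mint_uri("ci")    # "https://example.org/ci/1"
second_citation = graph_set.mint_uri("ci")   # "https://example.org/ci/2"
first_resource = graph_set.mint_uri("br")    # "https://example.org/br/1"
```

In this sketch the container never touches the storage itself, so swapping a persistent handler for an in-memory one (for example in tests) requires no change to the container.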

New script and import statements optimization

A new clean_info_dir script was implemented. It empties the info_dir folder, which is populated whenever the tests are run. To execute the script, run the following command:

poetry run clean_info_dir
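
For readers curious about what such a cleanup involves, the following is one possible way to empty a folder while keeping the folder itself. It is only a sketch: the function name empty_directory is hypothetical and the actual clean_info_dir script may work differently.

```python
import os
import shutil
import tempfile


def empty_directory(dir_path: str) -> int:
    """Delete everything inside dir_path but keep the directory itself.

    Returns the number of top-level entries removed.
    """
    removed = 0
    for name in os.listdir(dir_path):
        full_path = os.path.join(dir_path, name)
        if os.path.isdir(full_path) and not os.path.islink(full_path):
            shutil.rmtree(full_path)   # real sub-directory: remove recursively
        else:
            os.remove(full_path)       # regular file or symbolic link
        removed += 1
    return removed


if __name__ == "__main__":
    # Try it on a throwaway folder rather than on real data.
    with tempfile.TemporaryDirectory() as tmp:
        open(os.path.join(tmp, "counter.txt"), "w").close()
        print(empty_directory(tmp), os.listdir(tmp))
```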

Moreover, every import statement in the package was shortened, thanks to the new hierarchical module structure and to the way the __init__.py files expose names at the package level. For example:

# OLD statements
from oc_ocdm.graph_set import GraphSet
from oc_ocdm.entities.bibliographic.bibliographic_resource import BibliographicResource

# NEW statements
from oc_ocdm import GraphSet
from oc_ocdm.entities.bibliographic import BibliographicResource
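
The general mechanism behind shorter imports of this kind can be tried in isolation. The sketch below builds a throwaway package on disk (demo_pkg is a made-up name, unrelated to oc_ocdm) whose __init__.py re-exports a class defined in a deeper module, and then checks that the short and the long import paths yield the same object. How oc_ocdm's own __init__.py files are written is not shown here.

```python
import importlib
import os
import shutil
import sys
import tempfile

# Build a throwaway package on disk (hypothetical name: demo_pkg).
root = tempfile.mkdtemp()
pkg_dir = os.path.join(root, "demo_pkg")
os.makedirs(pkg_dir)

# The "deep" module that actually defines the class.
with open(os.path.join(pkg_dir, "graph_set.py"), "w", encoding="utf-8") as f:
    f.write("class GraphSet:\n    pass\n")

# __init__.py re-exports the class so callers can import it from the package root.
with open(os.path.join(pkg_dir, "__init__.py"), "w", encoding="utf-8") as f:
    f.write("from demo_pkg.graph_set import GraphSet\n")

sys.path.insert(0, root)
importlib.invalidate_caches()

# Equivalent to "from demo_pkg import GraphSet" ...
short_import = importlib.import_module("demo_pkg").GraphSet
# ... and to "from demo_pkg.graph_set import GraphSet".
long_import = importlib.import_module("demo_pkg.graph_set").GraphSet

same_object = short_import is long_import

# Clean up the temporary files and the path entry.
sys.path.remove(root)
shutil.rmtree(root)
```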