Workshop on Research Objects 2019
(Disclosure: I’ve been collaborating with Eoghan Ó Carragáin on the concept of DaMaHub since 2017)
Data generated through publicly funded scientific studies must be preserved in persistent, discoverable, verifiable and re-usable formats in order to facilitate reproducible scientific research across disciplines, institutions and continents. In order to insure usability of data for a long time (and for anyone in the world), it has to be decided/agreed within the community of academic librarians, data archive and repository maintainers what kind of metadata must be captured and how it must be permanently linked to the valuable research data.
RO-Crate proposes to combine currently available openly licensed technologies, tools and linked-data formats that can be used independent of infrastructure to facilitate FAIR sharing of scientific datasets and employment of computational analytical methods.
The proposal is well-written and existing challenges of reaching consensus among stakeholders are well defined. However it’s not clear how community efforts around this initiative to be expanded and sustained on the long timeline.
Is the text easy to follow? Are core concepts defined or referenced? Is it clear what is the author’s contribution?
the abstract is easy to follow, present a fair motivation and links to relevant initiatives/works, as well as the key contribution
URL for a Research Object or Zenodo record provided? Guidelines followed? Open format (e.g. HTML)? Sufficient metadata, e.g. links to software? Some form of Data Package provided? Add text below if you need to clarify your score.
Please provide a brief review, including a justification for your scores. Both score and review text are required.
This abstract introduces RO-Crate, a lightweight approach to Research Object (RO) data packaging. RO-crate started in response to discussions and feedback received regarding the existing research object packaging alternatives and recommendations. In fact, the RO model and its formalisation as a suite of ontologies do not formalize how ROs are saved or transmitted. Additionally, many domain experts, as well as a great number of developers are typically using similar principles for data packaging, e.g, JSON-LD manifest, schema.org annotation, BagIT, etc, and in some case they find the full specification of an RO and its manifest slightly complicated.
RO-crate takes all these shared principles and applies them to the RO packaging, allowing to simplify the creation and maintenance of metadata ( both manually or programmatic), but also to create full ROs if needed. Furthermore, RO-crate consider the possibility of domain extensions when needed, but using schema.org for annotations whenever possible.
RO-crate is quite relevant for the workshop, and even though it is still in an early phase, the workshop will be a great opportunity to discuss with other participants about their challenges when dealing with ROs, and ideas they could bring to the RO-crate specification, which is open for contributions.
It will be in particular important to discuss about the challenges to come up with the “minimal” and “sufficient” set of metadata elements an RO-crate should include, how the profiling for different domains should be made or guided, generation guides for full ROs, possibilities to reuse existing tooling (both RO and non-RO aware), etc. Additionally, it would be good for authors to provide a clear distinction, if any, and possibilities when dealing with full ROs or RO-crates, and to provide some use cases for that.
Is the text easy to follow? Are core concepts defined or referenced? Is it clear what is the author’s contribution?
URL for a Research Object or Zenodo record provided? Guidelines followed? Open format (e.g. HTML)? Sufficient metadata, e.g. links to software? Some form of Data Package provided? Add text below if you need to clarify your score.
Please provide a brief review, including a justification for your scores. Both score and review text are required.
This abstract describes RO crate, a lightweight data packaging approach for Research objects.
The topic of the talk is highly relevant to the RO community, the workshop and potentially to eScience participants. I specially like the idea of relying on existing standards or commonly used vocabularies to define the packaging initiative, and look forward to discussing aspects like the metadata inference in a RO crate.
I am curious about other challenges that could be faced by a RO crate, but are not mentioned in the abstract. For example, many data products now include metadata properties that describe important parts of the provenance of the object. How to propagate these metadata to the ROcrate itself? What if the ROcrate infers metadata that are inconsistent with these metadata? Another example would be the need of a resource to be physically present in a RO crate. Can the ROcrate be virtual, e.g., if the datasets are big? How to handle provenance and outside changes then?
Finally, I think that the need for Linked Data in this context needs a better motivation. A packaging approach by design aims to create a silo of self-contained information. Where is the need to connect it to outside resources if I just want to reproduce the experiment?
Minor comments: Codemeta is described as a data packaging initiative for software, but it’s not. It’s a metadata vocabulary for software.
I also don’t know what a filename glob pattern is.