9 changes: 6 additions & 3 deletions softwarereview_policies.Rmd
@@ -60,14 +60,17 @@ Note that not all rOpenSci projects and packages are in-scope or go through peer

### Package categories {#package-categories}

- **data retrieval**: Packages for accessing and downloading data from online sources with scientific applications. Our definition of scientific applications is broad, including data storage services, journals, and other remote servers, as many data sources may be of interest to researchers. However, retrieval packages should be focused on data *sources* / *topics*, rather than *services*. For example a general client for Amazon Web Services data storage would not be in-scope. (Examples: [**rotl**](https://github.com/ropensci/software-review/issues/17),
[**gutenbergr**](https://github.com/ropensci/software-review/issues/41))
- **data retrieval**: Packages for accessing and downloading data from online sources with scientific applications.
Our definition of scientific applications is broad, including data storage services, journals, and other remote servers, as many data sources may be of interest to researchers.
However, retrieval packages should be focused on data *sources* or *topics*, rather than *services*, and should do [more than just download data](https://ropensci.org/blog/2022/06/16/publicize-api-client-yes-no).
For example, a general client for Amazon Web Services data storage would not be in-scope, nor would a package that only offers download functionality without any pre-processing or pre-filtering.
Member

Would https://docs.ropensci.org/riem/ still be in scope? Really minimal processing, just checking an airport exists for instance.

Member Author

I don't know, but the most important point is that scope evolves, so all packages remain valid and implicitly in scope according to the time they were first judged. Scope decisions are immutable, and are not revisited.

Member

I'm just curious, to help polish the new text.

Example reviews of in-scope data retrieval packages include [**rotl**](https://github.com/ropensci/software-review/issues/17), which offers a [wide range of functions](https://docs.ropensci.org/rotl/reference/index.html), and [**gutenbergr**](https://github.com/ropensci/software-review/issues/41), which pre-processes an enormous set of metadata to make it much easier to interface with a [huge and varied database](https://docs.ropensci.org/gutenbergr/).

- **data extraction**: Packages that aid in retrieving data from unstructured sources such as text, images and PDFs, as well as parsing scientific data types and outputs from scientific equipment. Statistical/ML libraries for modelling or prediction are typically not included in this category, nor are code parsers. Trained models that act as utilities (e.g., for optical character recognition) may qualify. (Examples: [**tabulizer**](https://github.com/ropensci/software-review/issues/42) for extracting tables from PDF documents, [**genbankr**](https://github.com/ropensci/software-review/issues/47) for parsing files from GenBank, [**treeio**](https://github.com/ropensci/software-review/issues/179) for reading in phylogenetic tree files, [**lightr**](https://github.com/ropensci/software-review/issues/267) for parsing files from spectroscopic instruments)

- **data munging**: Packages for processing data from formats above. This area does not include broad data manipulation tools such as **reshape2** or **tidyr**, or tools for extracting data from R code itself. Rather, it focuses on tools for handling data in specific scientific formats generated from scientific workflows or exported from scientific instruments. (Examples: [**plateR**](https://github.com/ropensci/software-review/issues/60) for reading in data structured as plate maps for scientific instruments, or [**phonfieldwork**](https://github.com/ropensci/software-review/issues/385) for processing annotated audio files for phonics research)

- **data deposition**: Packages that support deposition of data into research repositories, including data formatting and metadata generation.
(Example: [**EML**](https://github.com/ropensci/software-review/issues/80))

- **data validation and testing**: Tools that enable automated validation and checking of data quality and completeness as part of scientific workflows. (Example: [**assertr**](https://github.com/ropensci/software-review/issues/23))