diff --git a/softwarereview_policies.Rmd b/softwarereview_policies.Rmd
index 6e6c5b9ec..bd83e8f81 100644
--- a/softwarereview_policies.Rmd
+++ b/softwarereview_policies.Rmd
@@ -60,14 +60,17 @@ Note that not all rOpenSci projects and packages are in-scope or go through peer
 ### Package categories {#package-categories}
-- **data retrieval**: Packages for accessing and downloading data from online sources with scientific applications. Our definition of scientific applications is broad, including data storage services, journals, and other remote servers, as many data sources may be of interest to researchers. However, retrieval packages should be focused on data *sources* / *topics*, rather than *services*. For example a general client for Amazon Web Services data storage would not be in-scope. (Examples: [**rotl**](https://github.com/ropensci/software-review/issues/17),
-  [**gutenbergr**](https://github.com/ropensci/software-review/issues/41))
+- **data retrieval**: Packages for accessing and downloading data from online sources with scientific applications.
+  Our definition of scientific applications is broad, including data storage services, journals, and other remote servers, as many data sources may be of interest to researchers.
+  However, retrieval packages should be focused on data *sources* or *topics*, rather than *services*, and should do [more than just download data](https://ropensci.org/blog/2022/06/16/publicize-api-client-yes-no).
+  For example, a general client for Amazon Web Services data storage would not be in-scope, nor would a package that only offers download functionality without any pre-processing or pre-filtering.
+  Example reviews of in-scope data retrieval packages include [**rotl**](https://github.com/ropensci/software-review/issues/17), which offers a [wide range of functions](https://docs.ropensci.org/rotl/reference/index.html), and [**gutenbergr**](https://github.com/ropensci/software-review/issues/41), which pre-processes an enormous set of metadata to make it much easier to interface with a [huge and varied database](https://docs.ropensci.org/gutenbergr/).
 - **data extraction**: Packages that aid in retrieving data from unstructured sources such as text, images and PDFs, as well as parsing scientific data types and outputs from scientific equipment. Statistical/ML libraries for modelling or prediction are typically not included in this category, nor are code parsers. Trained models that act as utilities (e.g., for optical character recognition), may qualify. (Examples: [**tabulizer**](https://github.com/ropensci/software-review/issues/42) for extracting tables from PDF documents, [**genbankr**](https://github.com/ropensci/software-review/issues/47) for parsing files from GenBank, [**treeio**](https://github.com/ropensci/software-review/issues/179) for phylogentic reading in phylogentic tree files, [**lightr**](https://github.com/ropensci/software-review/issues/267) for parsing files from spectroscopic instruments))
 - **data munging**: Packages for processing data from formats above. This area does not include broad data manipulation tools such as **reshape2** or **tidyr**, or tools for extracting data from R code itself. Rather, it focuses on tools for handling data in specific scientific formats generated from scientific workflows or exported from scientific instruments.
   (Examples: [**plateR**](https://github.com/ropensci/software-review/issues/60) for reading in data structured as plate maps for scientific instruments, or [**phonfieldwork**](https://github.com/ropensci/software-review/issues/385) for processing annotated audio files for phonics research)
-- **data deposition**: Packages that support deposition of data into research repositories, including data formatting and metadata generation.
+- **data deposition**: Packages that support deposition of data into research repositories, including data formatting and metadata generation. (Example: [**EML**](https://github.com/ropensci/software-review/issues/80))
 - **data validation and testing**: Tools that enable automated validation and checking of data quality and completeness as part of scientific workflows. (Example: [**assertr**](https://github.com/ropensci/software-review/issues/23))