Update NMDC ingest script to include Study name in `data_collections` field

As shown in the code snippet below, the ingest script currently includes only the study's ID and the URL of the study's page on the NMDC data portal, both of which it derives from information in the Biosample. Retrieving additional details about the study, such as its name and description, will require fetching data from the `study_set` collection (via a Runtime API endpoint such as `GET /studies`).
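A minimal sketch of fetching one study, assuming the Runtime API serves `GET /studies/{study_id}` (the same URL the current code builds) and returns a JSON object shaped like an NMDC `Study`, with optional `name` and `description` fields. The `fetch_study_summary` helper and its injectable `opener` parameter are hypothetical; the `opener` hook exists only so the function can be exercised without network access.

```python
import json
from typing import Callable
from urllib.parse import quote
from urllib.request import urlopen

# Base URL taken from the existing ingest code; everything else here is illustrative.
RUNTIME_API_BASE_URL = "https://api.microbiomedata.org"


def fetch_study_summary(
    study_id: str,
    base_url: str = RUNTIME_API_BASE_URL,
    opener: Callable = urlopen,
) -> dict:
    """Fetch one study from the Runtime API and keep only the fields we need.

    Assumes `GET /studies/{study_id}` returns a JSON object shaped like an
    NMDC `Study`; `name` and `description` may be absent, so default to None.
    """
    # Keep the CURIE colon (e.g. "nmdc:sty-...") unescaped, as in the current URL.
    url = f"{base_url}/studies/{quote(study_id, safe=':')}"
    with opener(url) as response:
        study = json.loads(response.read())
    return {
        "id": study.get("id", study_id),
        "name": study.get("name"),
        "description": study.get("description"),
    }
```

In the real ingest script, this would likely be called once per distinct study ID (not once per Biosample), with the results cached alongside the Biosamples.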

https://github.com/ber-data/data/blob/87fab60513f49119b75a19176ec6ae1b1d46c2f5/contrib/nmdc/ingest.py#L152-L169

I think this will be a straightforward change, but it may require renaming some variables and wrapping the cached data in a higher-level JSON object (e.g., one with a `biosamples` property and a `studies` property).
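A minimal sketch of the wrapped cache format, assuming the cache is a single JSON file and that the current cache is a bare JSON list of Biosamples. The `save_cache` and `load_cache` names, and the fallback for a legacy bare-list cache, are hypothetical.

```python
import json
from pathlib import Path


def save_cache(path: Path, biosamples: list[dict], studies: list[dict]) -> None:
    """Write biosamples and studies under top-level properties of one JSON object."""
    payload = {"biosamples": biosamples, "studies": studies}
    path.write_text(json.dumps(payload, indent=2))


def load_cache(path: Path) -> tuple[list[dict], dict[str, dict]]:
    """Return the cached biosamples and a lookup of studies keyed by their `id`.

    Hypothetical migration aid: a legacy cache that is a bare JSON list is
    treated as biosamples only, with no studies.
    """
    data = json.loads(path.read_text())
    if isinstance(data, list):
        return data, {}
    studies_by_id = {study["id"]: study for study in data.get("studies", [])}
    return data.get("biosamples", []), studies_by_id
```

Keying the studies by `id` on load keeps the per-Biosample lookup cheap, since many Biosamples typically share the same study.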

```python
def get_part_of_collection(self) -> list[bertron.DataCollection]:
    """Returns a list of `DataCollection` instances, each describing one of the Biosample's associated studies.

    References:
    - https://ber-data.github.io/bertron-schema/DataCollection/
    - https://microbiomedata.github.io/nmdc-schema/associated_studies/

    TODO: Retrieve the name and description of the Study from the NMDC Runtime API, then include it here.
    """
    data_collections = []
    if self.associated_studies is not None and len(self.associated_studies) > 0:
        for study_id in self.associated_studies:
            data_collection = bertron.DataCollection(
                id=study_id,
                url=f"https://api.microbiomedata.org/studies/{study_id}",
            )
            data_collections.append(data_collection)
    return data_collections
```
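One way the TODO above could be resolved, sketched as a free function so it runs on its own. `DataCollection` here is a hypothetical stand-in for `bertron.DataCollection`, and the sketch assumes that class accepts optional `name` and `description` fields; that should be checked against the BERtron schema before relying on it. `studies_by_id` is assumed to be a dict mapping study IDs to study records, for example built from cached `studies` data.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DataCollection:
    """Hypothetical stand-in for `bertron.DataCollection` (field names assumed)."""

    id: str
    url: str
    name: Optional[str] = None
    description: Optional[str] = None


def get_part_of_collection(
    associated_studies: Optional[list[str]],
    studies_by_id: dict[str, dict],
) -> list[DataCollection]:
    """Build one DataCollection per associated study, enriched with study metadata."""
    data_collections = []
    # `or []` covers both None and an empty list, matching the original guard.
    for study_id in associated_studies or []:
        # Studies missing from the lookup still get an entry, just without name/description.
        study = studies_by_id.get(study_id, {})
        data_collections.append(
            DataCollection(
                id=study_id,
                url=f"https://api.microbiomedata.org/studies/{study_id}",
                name=study.get("name"),
                description=study.get("description"),
            )
        )
    return data_collections
```

Falling back to `None` rather than raising keeps the ingest running when a study has not been cached yet, at the cost of silently emitting collections without names.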

Update NMDC ingest script to include Study name in `data_collections` field #30

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Update NMDC ingest script to include Study name in data_collections field #30

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions

Update NMDC ingest script to include Study name in `data_collections` field #30