Skip to content

Conversation

@pfurio
Copy link
Member

@pfurio pfurio commented Sep 22, 2025

No description provided.

@pfurio pfurio requested a review from jtarraga September 22, 2025 10:32
@halender
Copy link
Contributor

Base automatically changed from TASK-7442 to develop December 17, 2025 11:34
@pfurio pfurio marked this pull request as ready for review December 17, 2025 13:04
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR implements comprehensive clinical pipeline functionality including preprocessing, variant assessment, and reporting capabilities. The main focus is on integrating clinical analysis workflows with enhanced reporting features and support for additional trait association sources (HGMD, CIVIC).

Key changes:

  • Clinical Report Refactoring: Deprecated several fields (title, logo, signedBy, signature, supportingEvidences, files) and introduced new structured fields (signatures, references, images, methodology, limitations, experimentalProcedure)
  • Clinical Pipeline APIs: Added three new REST endpoints for clinical pipeline execution (prepare, genomics, affy)
  • Checksum Algorithm Update: Changed from MD5 to SHA-256 for file checksums across the platform
  • Clinical Configuration Enhancements: Added tier configuration and report configuration to study clinical settings
  • File Content Management: New endpoint to update file contents directly

Reviewed changes

Copilot reviewed 168 out of 253 changed files in this pull request and generated 15 comments.

Show a summary per file
File Description
ClinicalReport.java Major refactoring with deprecated fields and new structured report fields
ClinicalAnalysis.java Added reportedFiles field and removed deprecated methods
ClinicalWebService.java Added 3 new pipeline endpoints and updated report parameter names
FileWSServer.java Added updateContent endpoint and changed checksum description to SHA-256
FileManager.java Implemented updateContent method and changed checksum from MD5 to SHA-256
SampleIndexVariantAnnotationConverter.java Uncommented code to extract clinical sources from traitAssociation
SampleIndexConfiguration.java Added 'hgmd' and 'civic' to clinical source values
ClinicalAnalysisStudyConfiguration.java Added tiers, report config, and new status types (INCONCLUSIVE, REJECTED)
Various test files Updated tests for new ClinicalReport constructor, fixed upload method signatures
Client files (Python, JS, R, Java) Auto-generated updates for new endpoints and parameters
Comments suppressed due to low confidence (3)

opencga-catalog/src/main/java/org/opencb/opencga/catalog/db/mongodb/MongoDBAdaptor.java:904


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Path localPath;

// CEL files
List<File> opencgaCelFiles = new ArrayList<>();
Copy link

Copilot AI Dec 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The contents of this container are never accessed.

Copilot uses AI. Check for mistakes.
# + ["-M", str(dedup_bam_file_metrics)])
# self.run_command(cmd_dedup)

dedup_bai_file = dedup_bam_file + ".bai"
Copy link

Copilot AI Dec 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Variable dedup_bai_file is not used.

Copilot uses AI. Check for mistakes.
if (Path(bam_file).is_file()):
# files = input.get("files", [])

vcf_file = Path(bam_file).stem + ".strelka2.vcf"
Copy link

Copilot AI Dec 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Variable vcf_file is not used.

Copilot uses AI. Check for mistakes.
cmd = ["configureStrelkaGermlineWorkflow.py", "--referenceFasta", strelka2_tool_config.get("reference"), "--bam", bam_file, "--runDir", str(self.output / "strelka2_run")]
self.run_command(cmd)

run_cmd = [str(self.output / "strelka2_run" / "runWorkflow.py"), "-m", "local", "-j", "4"]
Copy link

Copilot AI Dec 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Variable run_cmd is not used.

Copilot uses AI. Check for mistakes.
@@ -0,0 +1,96 @@
import os
Copy link

Copilot AI Dec 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Import of 'os' is not used.

Copilot uses AI. Check for mistakes.
Comment on lines +36 to 48
wget http://github.com/samtools/bcftools/releases/download/1.22/bcftools-1.22.tar.bz2 && \
tar xjvf bcftools-1.22.tar.bz2 && \
bash -c 'wget -P bcftools-1.22 http://raw.githubusercontent.com/DrTimothyAldenDavis/SuiteSparse/stable/{SuiteSparse_config/SuiteSparse_config,CHOLMOD/Include/cholmod}.h' && \
cd bcftools-1.22 && \
## liftover plugin
bash -c '/bin/rm -f plugins/{score.{c,h},{munge,liftover,metal,blup}.c,pgs.{c,mk}}' && \
bash -c 'wget -P plugins http://raw.githubusercontent.com/freeseek/score/master/{score.{c,h},{munge,liftover,metal,blup}.c,pgs.{c,mk}}' && \
## affy2vcf plugin
bash -c '/bin/rm -f plugins/{idat2gtc.c,gtc2vcf.{c,h},affy2vcf.c}' && \
bash -c 'wget -P plugins http://raw.githubusercontent.com/freeseek/gtc2vcf/master/{idat2gtc.c,gtc2vcf.{c,h},affy2vcf.c,BAFregress.c}' && \
make && \
bash -c '/bin/cp bcftools plugins/{munge,liftover,score,metal,pgs,blup}.so /usr/local/bin/' && \
bash -c '/bin/cp bcftools plugins/{munge,liftover,score,metal,pgs,blup,idat2gtc,gtc2vcf,affy2vcf,BAFregress}.so /usr/local/bin/' && \
wget -P /usr/local/bin http://raw.githubusercontent.com/freeseek/score/master/assoc_plot.R && \
Copy link

Copilot AI Dec 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this Dockerfile, multiple external tools and source files (bcftools tarball and several plugin and header files) are downloaded over plain HTTP (wget http://...) without any integrity verification, which allows a network attacker to supply malicious binaries or source and have them built into the image. Because these binaries and plugins run inside the opencga-ext-tools container, a compromised download could lead to arbitrary code execution within your analysis environment and potentially access to sensitive data or credentials. Switch all these downloads to HTTPS and add integrity checks (e.g., pinned versioned URLs plus checksums or signatures) so that the build fails if the fetched artifacts are tampered with.

Copilot uses AI. Check for mistakes.

ObjectMap storageOptions = analysisParams.getPipelineParams().getVariantIndexParams() != null
// ? new ObjectMap(analysisParams.getPipelineParams().getVariantIndexParams())
? analysisParams.getPipelineParams().getVariantIndexParams().toObjectMap()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This won't work directly.
Check VariantIndexOperationTool#check. There, you can see that the conversion between "indexParams" into "options" is not direct. Need some mapping.
Suggest to move that mapping to an static method

VariantIndexOperationTool {
    public static ObjectMap toStorageOptions(VariantIndexParams params) {..}
}

File vcfFile = catalogManager.getFileManager().link(study, new FileLinkParams(vcfPath.toAbsolutePath().toString(),
"", "", "", null, null, null, null, null), false, token).first();

ObjectMap storageOptions = analysisParams.getPipelineParams().getVariantIndexParams() != null
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here

// private static AbstractClinicalManagerTest clinicalTest;
// private static ResourceManager resourceManager;

private final static Path NGS_PIPELINE_DATA_PATH = Paths.get("/opt/ngs-pipeline");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the data that should, somehow, be available for the testing environment? Isn't it possible to make it smaller so it can be pushed to git?

contentList.add(line);
}
// Split the content into lines
String[] allLines = content.split("\r?\n", -1);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Believe it or not, this is a "complex" pattern that should be compiled and moved to a static variable.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok

public class GenomicsVariantCallingPipelineTool extends PipelineTool {

@DataField(id = "reference", description = "Reference genome path")
private String reference;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This reference is a file. It doesn't have the suffix "file", so it would be missed by the getJobInputFilesFromParams

public class PrepareClinicalPipelineParams extends ToolParams {

@DataField(id = "referenceGenome", description = "Reference genome to be used in the clinical pipeline.")
private String referenceGenome;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this a catalog file as well?

@DataField(id = "pipelineParams", description = "Parameters to execute the pipeline preparation (i.e., creating the indexes,...)")
private PrepareClinicalPipelineParams pipelineParams;

@DataField(id = "outdir", description = FieldConstants.JOB_OUT_DIR_DESCRIPTION)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this outdir needed?

// }
if (ToolParams.class.isAssignableFrom(rawPrimaryType)) {
if (hasNestedFields(((Class<? extends ToolParams>) rawPrimaryType))) {
Logger logger = org.slf4j.LoggerFactory.getLogger(aClass);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hummm... Although I might like this warning, this section was already commented out in a prev commit. Do we really want to keep this warning?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it doesn't make any harm and given the amount of changes it may be helpful to understand future problems (hopefully there won't but...)

<artifactId>log4j-core</artifactId>
</dependency>
<dependency>
<groupId>jakarta.validation</groupId>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is this jakarta validation being used?
Is it really needed? If so, move it to the dependency manager, and create a "jakarta.version" property

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't seem to be used anywhere so just deleted it

juanfeSanahuja
juanfeSanahuja previously approved these changes Dec 22, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Development

Successfully merging this pull request may close these issues.

7 participants