Added support for Top Alleles, added more report options#18
Added support for Top Alleles, added more report options#18mjung2019 wants to merge 5 commits intoIllumina:developfrom
Conversation
mjung2019
commented
Jun 13, 2019
- added support for Top Alleles
- added more options for gtc_final_report.py example
- added new example script gtc_final_report_matrix.py which will create matrix reports depending on the Allele type and optional genotype scores. The format is identical to the matrix export option from GenomeStudio.
…final_report script and added a new example script to support building matrix reports`x
module/BeadPoolManifest.py
Outdated
| Unknown = 0 | ||
| TOP = 1 | ||
| BOT = 2 | ||
| PLUS = 1 |
There was a problem hiding this comment.
These should have different values, otherwise to_string behavior below will be undesirable. If necessary, could generalize get_base_calls generic to take a list of report strands (instead of a single value)
There was a problem hiding this comment.
Changed attribute assignments, can now be handled by get_base_calls_generic, see below.
module/BeadPoolManifest.py
Outdated
| Raises: | ||
| ValueError: Unexpected value for Illumina strand | ||
| """ | ||
| if ilmn_strand == "U" or ilmn_strand == "": |
There was a problem hiding this comment.
Shouldn't empty string also raise a ValueError?
There was a problem hiding this comment.
Right, empty string is now an exception.
Question though:
for class RefStrand it states:
if ref_strand == "U" or ref_strand == "":
return RefStrand.Unknown
Is that the desired behavior for empty string?
module/GenotypeCalls.py
Outdated
| The genotype basecalls on the report strand as a list of strings. | ||
| The characters are A, C, G, T, or - for a no-call/null. | ||
| """ | ||
| return self.get_base_calls_generic(snps, ilmn_strand_annotations, IlmnStrand.TOP, RefStrand.Unknown) |
There was a problem hiding this comment.
e.g., here could potentially pass
return self.get_base_calls_generic(snps, ilmn_strand_annotations, [IlmnStrand.TOP, IlmnStrand.PLUS], IlmnStrand.Unknown)
There was a problem hiding this comment.
Also, should be passing IlmnStrand.Unknown here instead of RefStrand.Unknown. (get_base_calls_forward strand also passes RefStrand.Unknown, but that is probably not the correct behavior there either)
There was a problem hiding this comment.
Added list support for get_base_calls_generic. That should solve the ambiguity. The list type for the method is optional in order to leave the other methods untouched. Within the method converting non list types to list before iterating through every allele as list type checks in the inner loop might be something of a performance drain compared to looping through only one element.
Changed to IlmnStrand.Unknown.
|
|
||
| try: | ||
| manifest = BeadPoolManifest(args.manifest) | ||
| manifest = BeadPoolManifest.BeadPoolManifest(args.manifest) |
There was a problem hiding this comment.
Does the reference to this import need to change?
There was a problem hiding this comment.
Reference should be fine, checked with running gtf_final_report_matrix.py. There is some ambiguity as the module name and class name are the same.
|
|
||
| args = parser.parse_args() | ||
|
|
||
| if len(sys.argv) != NUM_ARGUMENTS: |
There was a problem hiding this comment.
The argument parser should be able to handle this error?
There was a problem hiding this comment.
In this case yes as only one matrix report is supported per run, also didn't make sense to me to add a default case.
examples/gtc_final_report_matrix.py
Outdated
| parser.add_argument("manifest", help="BPM manifest file") | ||
| parser.add_argument("gtc_directory", help="Directory containing GTC files") | ||
| parser.add_argument("output_file", help="Location to write report") | ||
| parser.add_argument("--forward", help="python gtc_final_report_matrix.py <path_to_manifest> <path_to_gtc_directory> <path_to_output_file> --forward 1, print matrix with forward alleles") |
There was a problem hiding this comment.
Should there be an enumerated format option here?
There was a problem hiding this comment.
I don't see any harm in explicitly spelling out the arguments in the parser as long as we are not expecting many more cases. But if you like I can use an enumerator for the options. I would think the trade-off in better readability not that big though.
examples/gtc_final_report_matrix.py
Outdated
| parser.add_argument("manifest", help="BPM manifest file") | ||
| parser.add_argument("gtc_directory", help="Directory containing GTC files") | ||
| parser.add_argument("output_file", help="Location to write report") | ||
| parser.add_argument("--forward", help="python gtc_final_report_matrix.py <path_to_manifest> <path_to_gtc_directory> <path_to_output_file> --forward 1, print matrix with forward alleles") |
There was a problem hiding this comment.
Is there a reason you have to pass a "1" instead of using something like action='store_true' which would make something like args.forward eval to True? Also, why does the "help" have the entire command written, wouldn't that be redundant?
There was a problem hiding this comment.
Not really, you can pass action="store_true", default=False to the argument and then you are fine. Ok, I can make this change.
There was a problem hiding this comment.
Also, concerning should we write the entire command. Normally I wouldn't bother but I got so many requests how to run commands for folks not familiar with the console, so doesn't hurt to be too obvious.
There was a problem hiding this comment.
Ok, added the changes with the latest commit
…into an official release
…into an official release