
FETCH hangs in the middle of classification. #41

@astrogewgaw

We are using FETCH as part of the transient search pipeline for the SPOTLIGHT project (a commensal survey for FRBs/pulsars at the GMRT). We are currently facing an issue where FETCH often hangs in the middle of classification. This happens even though:

  1. The number of candidates is not very large.
  2. The model is run on an NVIDIA A100, with 80 GB of GPU memory.

Unfortunately, we have not been able to reliably reproduce the bug. It seems to happen randomly and does not appear to be triggered by a particular candidate: we verified the latter by rerunning FETCH on the same candidate, and it ran successfully. Any idea what could be causing the issue? I am using tensorflow v2.15.0.post1 and keras v2.15.0 (later versions just do not work) with Python 3.10.14. I am aware that the bug will be difficult to solve since there is no reproducibility (as far as we can see), but I thought I would still open an issue so that we can at least discuss the possible causes.
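Since the hang cannot be triggered on demand, one way to collect evidence the next time it happens is to have Python dump the stack of every thread once a classification step exceeds a time limit. Below is a minimal sketch using the standard-library `faulthandler` module. It assumes the classification call can be wrapped in a `with` block; `hang_watchdog`, the timeout, and the log path are hypothetical names chosen for illustration and are not part of FETCH.

```python
import contextlib
import faulthandler
import time


@contextlib.contextmanager
def hang_watchdog(timeout_s: float, log_path: str):
    """Dump every thread's Python stack to log_path if the wrapped block
    runs longer than timeout_s seconds, repeating every timeout_s after that.

    Hypothetical diagnostic helper; not part of FETCH.
    """
    # faulthandler writes straight to the file descriptor, so the file must
    # be a real, open file for as long as the timer is armed.
    with open(log_path, "w") as log_file:
        faulthandler.dump_traceback_later(timeout_s, repeat=True, file=log_file)
        try:
            yield
        finally:
            # Disarm the timer before the log file is closed.
            faulthandler.cancel_dump_traceback_later()


if __name__ == "__main__":
    # Stand-in for one classification step (assumption: the real FETCH
    # prediction call on a batch of candidates would go here).
    with hang_watchdog(timeout_s=0.5, log_path="fetch_hang_traceback.log"):
        time.sleep(1.2)  # simulated stall longer than the timeout
    with open("fetch_hang_traceback.log") as f:
        print(f.read())
```

The dump is written by a separate watchdog thread that does not need the GIL, so it can still produce output while the main thread is blocked. It only shows Python frames, though: a stall inside TensorFlow's C++ runtime would appear as the Python line that called into it, which would at least show whether the pipeline is stuck in prediction, in data loading, or somewhere else.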
