
How to preprocess data #4

@bstee615

Hi, I am trying to run your project because I would love to use it in my research work. I'm working from branch new_implementation (e93a3b6) since that seems to be the most stable and I wasn't able to run the master branch. I was able to run through the training code with the provided sample data. Thank you for making your project available and for providing preprocessed data.

I want to train the model on a new dataset. Unfortunately, I can't figure out how to preprocess raw code into the format of IVDetect/data/graph_data/devign_sample_output.json. Could you give me detailed instructions for the full preprocessing pipeline, from raw source code, through Joern, to the final JSON, so I can faithfully reproduce your work? This is how far I got:

  • I assume that the keys CPG, CDG, and DDG are the direct output of Joern, from running joern-export --repr XXX --out outdir with --repr as cpg14, cdg, and ddg respectively. Is this correct?
  • How is the key No2St generated? As far as I can tell, this maps nodes to statements, but I don't know what the indexes mean.
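
To make the first bullet concrete, here is a minimal sketch of the export step as I understand it. It assumes joern-export is on PATH and that a CPG file (cpg.bin here) was already produced, for example by joern-parse. The helper names, the paths, and the mapping from JSON keys to --repr values are my own assumptions, not something confirmed by the repository.

```python
import subprocess
from pathlib import Path

# Assumed mapping from the JSON keys to joern-export representations.
# Whether this mapping is right is exactly what the question asks.
REPRESENTATIONS = {"CPG": "cpg14", "CDG": "cdg", "DDG": "ddg"}


def build_export_command(repr_name, out_dir, cpg_path="cpg.bin"):
    """Return the joern-export argument list for one representation."""
    return ["joern-export", "--repr", repr_name, "--out", str(out_dir), str(cpg_path)]


def export_all(cpg_path, out_root, dry_run=True):
    """Build (and, unless dry_run is set, run) one export per representation.

    Paths are placeholders. Each representation gets its own output
    directory under out_root, named after the lowercased JSON key.
    """
    commands = {}
    for key, repr_name in REPRESENTATIONS.items():
        out_dir = Path(out_root) / key.lower()
        cmd = build_export_command(repr_name, out_dir, cpg_path)
        commands[key] = cmd
        if not dry_run:
            # Raises CalledProcessError if joern-export exits non-zero.
            subprocess.run(cmd, check=True)
    return commands
```

Calling export_all("cpg.bin", "outdir") only returns the three command lists; passing dry_run=False actually invokes joern-export. This covers the export only and says nothing about how No2St is derived.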
