torch_geometric.nn.models.MoleculeGPT

class MoleculeGPT(llm: LLM, graph_encoder: Module, smiles_encoder: Module, mlp_out_channels: int = 32, max_tokens: Optional[int] = 20)[source]

Bases: Module

The MoleculeGPT model from the “MoleculeGPT: Instruction Following Large Language Models for Molecular Property Prediction” paper.

Parameters:
  • llm (LLM) – The LLM to use.

  • graph_encoder (torch.nn.Module) – The encoder used to encode the 2D molecular graph.

  • smiles_encoder (torch.nn.Module) – The encoder used to encode the 1D SMILES string.

  • mlp_out_channels (int, optional) – The size of each embedding after Q-Former encoding. (default: 32)

  • max_tokens (int, optional) – The maximum number of output tokens of the 1D/2D encoders. (default: 20)
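
A minimal construction sketch is shown below. It assumes the torch_geometric.nn.nlp.LLM and SentenceTransformer wrappers together with a single-layer GINEConv graph encoder; the feature sizes, hidden dimensions, and checkpoint names are illustrative placeholders rather than requirements of the class (see examples/llm/molecule_gpt.py for the full setup).

    import torch

    from torch_geometric.nn import GINEConv
    from torch_geometric.nn.models import MoleculeGPT
    from torch_geometric.nn.nlp import LLM, SentenceTransformer


    class GraphEncoder(torch.nn.Module):
        # Illustrative 2D graph encoder: one GINEConv layer over placeholder
        # 9-dimensional atom features and 3-dimensional bond features.
        def __init__(self, in_channels=9, hidden_channels=1024, edge_dim=3):
            super().__init__()
            mlp = torch.nn.Sequential(
                torch.nn.Linear(in_channels, hidden_channels),
                torch.nn.ReLU(),
                torch.nn.Linear(hidden_channels, hidden_channels),
            )
            self.conv = GINEConv(mlp, edge_dim=edge_dim)

        def forward(self, x, edge_index, edge_attr):
            return self.conv(x, edge_index, edge_attr)


    # The checkpoint names below are illustrative; any compatible LLM and
    # SMILES language model can be plugged in.
    llm = LLM(model_name='lmsys/vicuna-7b-v1.5', num_params=7)
    smiles_encoder = SentenceTransformer(
        model_name='DeepChem/ChemBERTa-77M-MTR')

    model = MoleculeGPT(
        llm=llm,
        graph_encoder=GraphEncoder(),
        smiles_encoder=smiles_encoder,
        mlp_out_channels=32,
        max_tokens=20,
    )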

Warning

This module has been tested with the following HuggingFace model as the LLM backbone

  • "lmsys/vicuna-7b-v1.5"

and may not work with other models. See the HuggingFace Models page for other options, and let us know if you encounter any issues.

Note

For an example of using MoleculeGPT, see examples/llm/molecule_gpt.py.

forward(x: Tensor, edge_index: Tensor, batch: Tensor, edge_attr: Optional[Tensor], smiles: List[str], instructions: List[str], label: List[str], additional_text_context: Optional[List[str]] = None)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
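
Continuing from the construction sketch above, a toy forward call could look as follows. The tensors and strings are placeholders, and the assumption that the call returns the LLM training loss on label mirrors the training loop in examples/llm/molecule_gpt.py.

    # Toy batch: two molecules with 3 and 2 atoms, using the placeholder
    # 9-dimensional atom and 3-dimensional bond features from above.
    x = torch.randn(5, 9)
    edge_index = torch.tensor([[0, 1, 3],
                               [1, 2, 4]])
    batch = torch.tensor([0, 0, 0, 1, 1])
    edge_attr = torch.randn(3, 3)

    smiles = ['CCO', 'CO']
    instructions = ['Describe this molecule.', 'Describe this molecule.']
    label = ['Ethanol is ...', 'Methanol is ...']

    # Call the module instance (not `forward` directly) so registered hooks
    # run; with labels provided, the call is assumed to return the LLM
    # training loss.
    loss = model(x, edge_index, batch, edge_attr, smiles, instructions, label)
    loss.backward()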