torch_geometric.data.Database

class Database(schema: Union[Any, Dict[str, Any], Tuple[Any], List[Any]] = object)

Bases: ABC

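Base class for inserting and retrieving data from a database. Because insert() and get() are abstract, Database() in the examples below stands for a concrete subclass.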

A database acts as a persisted, out-of-memory and index-based key/value store for tensor and custom data:

import torch
from torch_geometric.data import Data, Database
db = Database()
db[0] = Data(x=torch.randn(5, 16), y=0, z='id_0')
print(db[0])
>>> Data(x=[5, 16], y=0, z='id_0')

To improve efficiency, it is recommended to specify the underlying schema of the data:

db = Database(schema={  # Custom schema:
    # Tensor information can be specified through a dictionary:
    'x': dict(dtype=torch.float, size=(-1, 16)),
    'y': int,
    'z': str,
})
db[0] = dict(x=torch.randn(5, 16), y=0, z='id_0')
print(db[0])
>>> {'x': torch.tensor(...), 'y': 0, 'z': 'id_0'}

In addition, databases support batch-wise insertion and retrieval, along with the indexing syntax familiar from lists (slices, ranges, and index tensors), e.g.:

db = Database()
db[2:5] = torch.randn(3, 16)
print(db[torch.tensor([2, 3])])
>>> [torch.tensor(...), torch.tensor(...)]

Parameters:

schema (Any or Tuple[Any] or List[Any] or Dict[str, Any], optional) – The schema of the input data. Accepts int, float, str, object, or a dictionary with dtype and size keys (for specifying tensor data), and can be nested as a tuple or dictionary. Specifying the schema improves efficiency, since without one the database falls back to Python pickling for serialization and deserialization. (default: object)

connect() → None

Connects to the database. Databases will automatically connect on instantiation.

close() → None

Closes the connection to the database.
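The connect/close lifecycle described above can be illustrated with a small standalone sketch. TinySQLiteStore is a hypothetical class written for this example, not part of torch_geometric; it only assumes that a persisted backend opens its connection on instantiation, pickles values by default, and flushes writes when closed.

```python
import os
import pickle
import sqlite3
import tempfile
from typing import Any, Optional


class TinySQLiteStore:
    """Hypothetical stand-in for a persisted store; not a torch_geometric class."""

    def __init__(self, path: str) -> None:
        self.path = path
        self._conn: Optional[sqlite3.Connection] = None
        self.connect()  # mirror "automatically connect on instantiation"

    def connect(self) -> None:
        if self._conn is None:  # calling connect() twice is a no-op
            self._conn = sqlite3.connect(self.path)
            self._conn.execute(
                "CREATE TABLE IF NOT EXISTS store "
                "(id INTEGER PRIMARY KEY, payload BLOB)"
            )

    def close(self) -> None:
        if self._conn is not None:
            self._conn.commit()  # flush pending writes before closing
            self._conn.close()
            self._conn = None

    def insert(self, index: int, data: Any) -> None:
        # Pickle the value, as a schema-less store would do by default.
        self._conn.execute(
            "INSERT OR REPLACE INTO store (id, payload) VALUES (?, ?)",
            (index, pickle.dumps(data)),
        )

    def get(self, index: int) -> Any:
        row = self._conn.execute(
            "SELECT payload FROM store WHERE id = ?", (index,)
        ).fetchone()
        if row is None:
            raise KeyError(index)
        return pickle.loads(row[0])


# Write, close, reopen: the value survives because it lives on disk.
path = os.path.join(tempfile.mkdtemp(), "store.db")
store = TinySQLiteStore(path)
store.insert(0, {"y": 0, "z": "id_0"})
store.close()

reopened = TinySQLiteStore(path)
restored = reopened.get(0)
reopened.close()
```

Closing before reopening matters in this sketch because the pending insert is only committed inside close().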

abstract insert(index: int, data: Any) → None

Inserts data at the specified index.

Parameters:
  • index (int) – The index at which to insert.

  • data (Any) – The object to insert.

multi_insert(indices: Union[Sequence[int], Tensor, slice, range], data_list: Sequence[Any], batch_size: Optional[int] = None, log: bool = False) → None

Inserts a chunk of data at the specified indices.

Parameters:
  • indices (List[int] or torch.Tensor or slice or range) – The indices at which to insert.

  • data_list (List[Any]) – The objects to insert.

  • batch_size (int, optional) – If specified, will insert the data to the database in batches of size batch_size. (default: None)

  • log (bool, optional) – If set to True, will log progress to the console. (default: False)
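How batch_size might split a bulk insert can be sketched without the library. The helpers below (to_index_list, multi_insert_sketch) are hypothetical names for illustration, and a plain dict stands in for the backend; the actual implementation may normalize indices and group writes differently.

```python
from typing import Any, Dict, List, Optional, Sequence, Union


def to_index_list(indices: Union[Sequence[int], slice, range]) -> List[int]:
    """Normalize a slice, range, or sequence of indices to a list of ints."""
    if isinstance(indices, slice):
        if indices.stop is None:
            # Assumption of this sketch: open-ended slices are rejected.
            raise ValueError("this sketch requires an explicit slice stop")
        return list(range(indices.start or 0, indices.stop, indices.step or 1))
    # int() also accepts the 0-dim elements yielded by iterating a 1-D tensor.
    return [int(i) for i in indices]


def multi_insert_sketch(
    store: Dict[int, Any],
    indices: Union[Sequence[int], slice, range],
    data_list: Sequence[Any],
    batch_size: Optional[int] = None,
    log: bool = False,
) -> int:
    """Insert data_list at indices in chunks; return the number of batches."""
    index_list = to_index_list(indices)
    if len(index_list) != len(data_list):
        raise ValueError("indices and data_list must have the same length")

    # Without a batch size, everything goes in as a single batch.
    step = batch_size if batch_size is not None else max(len(index_list), 1)
    num_batches = 0
    for start in range(0, len(index_list), step):
        chunk_indices = index_list[start:start + step]
        chunk_data = data_list[start:start + step]
        store.update(zip(chunk_indices, chunk_data))  # one write per batch
        num_batches += 1
        if log:
            print(f"inserted {start + len(chunk_indices)}/{len(index_list)}")
    return num_batches


backend: Dict[int, Any] = {}
batches_used = multi_insert_sketch(backend, slice(2, 5), ["a", "b", "c"], batch_size=2)
```

Here three items with batch_size=2 are written as two batches, holding two items and one item respectively.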

abstract get(index: int) → Any

Gets data from the specified index.

Parameters:

index (int) – The index to query.
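Since insert() and get() are abstract, a usable database has to come from a subclass that implements both. The sketch below mirrors that contract with a hypothetical stand-in base class (IndexedStore) instead of importing torch_geometric, so the names and the dict-backed storage are assumptions made for illustration only.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict


class IndexedStore(ABC):
    """Hypothetical stand-in mirroring the two abstract methods above."""

    @abstractmethod
    def insert(self, index: int, data: Any) -> None:
        ...

    @abstractmethod
    def get(self, index: int) -> Any:
        ...

    # Indexing sugar for single integer keys only, for brevity.
    def __setitem__(self, index: int, data: Any) -> None:
        self.insert(index, data)

    def __getitem__(self, index: int) -> Any:
        return self.get(index)


class InMemoryStore(IndexedStore):
    """Concrete subclass that keeps everything in a dict (not persisted)."""

    def __init__(self) -> None:
        self._items: Dict[int, Any] = {}

    def insert(self, index: int, data: Any) -> None:
        self._items[index] = data

    def get(self, index: int) -> Any:
        return self._items[index]


db = InMemoryStore()
db[0] = {"y": 0, "z": "id_0"}
item = db[0]

# The abstract base cannot be instantiated on its own.
try:
    IndexedStore()
    base_is_abstract = False
except TypeError:
    base_is_abstract = True
```

A real backend would replace the dict with persistent storage while keeping the same two methods.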

multi_get(indices: Union[Sequence[int], Tensor, slice, range], batch_size: Optional[int] = None) → List[Any]

Gets a chunk of data from the specified indices.

Parameters:
  • indices (List[int] or torch.Tensor or slice or range) – The indices to query.

  • batch_size (int, optional) – If specified, will request the data from the database in batches of size batch_size. (default: None)
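As a rough illustration of batched retrieval, the sketch below reads from a plain dict in chunks and returns a list in the same order as the requested indices. multi_get_sketch is a hypothetical helper, and the order-preserving behaviour is an assumption of this example, not a documented guarantee.

```python
from typing import Any, Dict, List, Optional, Sequence


def multi_get_sketch(
    store: Dict[int, Any],
    indices: Sequence[int],
    batch_size: Optional[int] = None,
) -> List[Any]:
    """Fetch the values at indices, one chunk of batch_size at a time."""
    index_list = [int(i) for i in indices]
    # Without a batch size, everything is requested as a single batch.
    step = batch_size if batch_size is not None else max(len(index_list), 1)

    results: List[Any] = []
    for start in range(0, len(index_list), step):
        chunk = index_list[start:start + step]
        # One "request" per chunk; results keep the order of the query.
        results.extend(store[i] for i in chunk)
    return results


backend = {2: "a", 3: "b", 4: "c"}
fetched = multi_get_sketch(backend, [4, 2, 3], batch_size=2)
```

The return value is always a list, even when a single index is requested.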