torch_geometric.data.Database
- class Database(schema: Union[Any, Dict[str, Any], Tuple[Any], List[Any]] = object)
Bases: ABC
Base class for inserting and retrieving data from a database.
A database acts as a persisted, out-of-memory and index-based key/value store for tensor and custom data:
    db = Database()
    db[0] = Data(x=torch.randn(5, 16), y=0, z='id_0')
    print(db[0])
    >>> Data(x=[5, 16], y=0, z='id_0')

To improve efficiency, it is recommended to specify the underlying schema of the data:

    db = Database(schema={  # Custom schema:
        # Tensor information can be specified through a dictionary:
        'x': dict(dtype=torch.float, size=(-1, 16)),
        'y': int,
        'z': str,
    })
    db[0] = dict(x=torch.randn(5, 16), y=0, z='id_0')
    print(db[0])
    >>> {'x': torch.tensor(...), 'y': 0, 'z': 'id_0'}
In addition, databases support batch-wise insert and get, and support syntactic sugar known from indexing Python lists, e.g.:
    db = Database()
    db[2:5] = torch.randn(3, 16)
    print(db[torch.tensor([2, 3])])
    >>> [torch.tensor(...), torch.tensor(...)]
- Parameters:
schema (Any or Tuple[Any] or Dict[str, Any], optional) – The schema of the input data. Can take int, float, str, object, or a dictionary with dtype and size keys (for specifying tensor data) as input, and can be nested as a tuple or dictionary. Specifying the schema will improve efficiency, since by default the database will use Python pickling for serializing and deserializing. (default: object)
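The efficiency argument can be illustrated with a small standard-library sketch. It is not the library's serialization code. It only contrasts a generic pickle round trip with a typed encoding that becomes possible once the element type is known in advance. The names `row`, `generic_bytes`, `typed_bytes`, and `restored` are hypothetical, and the choice of 32-bit floats is an assumption made for the example.

```python
import pickle
from array import array

# A single row of 16 values, chosen so each one is exactly representable
# as a 32-bit float (assumption made for a lossless round trip).
row = [0.5 * i for i in range(16)]

# Generic path: pickle stores type information alongside every value.
generic_bytes = pickle.dumps(row)

# Typed path: with the element type known up front, the row can be written
# as a flat buffer of 4-byte floats with no per-value metadata.
typed_bytes = array("f", row).tobytes()

# Decoding the typed buffer requires knowing the type, which is the
# information a schema would supply.
restored = array("f")
restored.frombytes(typed_bytes)

print(pickle.loads(generic_bytes) == row)
print(list(restored) == row)
print(len(typed_bytes) < len(generic_bytes))
```

The trade-off is that the typed buffer is only decodable when the reader already knows the layout, whereas the pickled bytes are self-describing.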
- connect() → None
Connects to the database. Databases will automatically connect on instantiation.
- abstract insert(index: int, data: Any) → None
Inserts data at the specified index.
- Parameters:
index (int) – The index at which to insert.
data (Any) – The object to insert.
- multi_insert(indices: Union[Sequence[int], Tensor, slice, range], data_list: Sequence[Any], batch_size: Optional[int] = None, log: bool = False) → None
Inserts a chunk of data at the specified indices.
- Parameters:
indices (List[int] or torch.Tensor or range) – The indices at which to insert.
data_list (List[Any]) – The objects to insert.
batch_size (int, optional) – If specified, will insert the data to the database in batches of size batch_size. (default: None)
log (bool, optional) – If set to True, will log progress to the console. (default: False)
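The batching behavior described by batch_size can be sketched in plain Python. This is a minimal illustration of splitting paired indices and objects into chunks, under the assumption that indices have already been normalized to a sequence of integers. The helper name `iter_batches` is hypothetical and is not part of the library.

```python
from typing import Any, Iterator, List, Optional, Sequence, Tuple


def iter_batches(
    indices: Sequence[int],
    data_list: Sequence[Any],
    batch_size: Optional[int] = None,
) -> Iterator[Tuple[List[int], List[Any]]]:
    """Yield (indices, data) chunks of at most batch_size items (hypothetical helper)."""
    if len(indices) != len(data_list):
        raise ValueError("indices and data_list must have the same length")
    if batch_size is None:
        # No batch size given: process everything as a single chunk.
        batch_size = max(len(indices), 1)
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(indices), batch_size):
        stop = start + batch_size
        yield list(indices[start:stop]), list(data_list[start:stop])


batches = list(iter_batches(range(2, 7), ["a", "b", "c", "d", "e"], batch_size=2))
print(batches)
```

Each yielded pair could then be handed to a backend in one write, which is the usual motivation for inserting in batches rather than one item at a time.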
- abstract get(index: int) → Any
Gets data from the specified index.
- Parameters:
index (int) – The index to query.
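Because insert and get are abstract, a concrete backend has to provide both. The sketch below shows that pattern with a standard-library stand-in rather than the real base class, so it runs without any dependencies. `KeyValueStore` and `InMemoryStore` are hypothetical names, and the indexing sugar shown here handles single integer indices only.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict


class KeyValueStore(ABC):
    """Hypothetical stand-in mirroring the two abstract methods documented above."""

    @abstractmethod
    def insert(self, index: int, data: Any) -> None:
        """Store data at the given index."""

    @abstractmethod
    def get(self, index: int) -> Any:
        """Return the data stored at the given index."""

    # List-style sugar for single integer indices (assumption: no slices here).
    def __setitem__(self, index: int, data: Any) -> None:
        self.insert(index, data)

    def __getitem__(self, index: int) -> Any:
        return self.get(index)


class InMemoryStore(KeyValueStore):
    """Toy backend that keeps everything in a dictionary."""

    def __init__(self) -> None:
        self._rows: Dict[int, Any] = {}

    def insert(self, index: int, data: Any) -> None:
        self._rows[index] = data

    def get(self, index: int) -> Any:
        return self._rows[index]


store = InMemoryStore()
store[0] = {"y": 0, "z": "id_0"}
print(store[0])
```

A persisted backend would follow the same shape, replacing the dictionary with reads and writes against its storage engine.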
- multi_get(indices: Union[Sequence[int], Tensor, slice, range], batch_size: Optional[int] = None) → List[Any]
Gets a chunk of data from the specified indices.
- Parameters:
indices (List[int] or torch.Tensor or range) – The indices to query.
batch_size (int, optional) – If specified, will request the data from the database in batches of size batch_size. (default: None)