all options
trixie  ] [  forky  ] [  sid  ]
[ Source: borghash  ]

Package: python3-borghash (0.1.1-1)

Links for python3-borghash

Screenshot

Debian Resources:

Download Source Package borghash:

Maintainer:

External Resources:

Similar packages:

Memory-efficient hashtable implementations as a Python library implemented in Cython.

Memory-efficient hashtable implementations as a Python library implemented in Cython.

HashTable ---------

``HashTable`` is a fairly low-level implementation; usually one will want to use the ``HashTableNT`` wrapper. Read on for the basics...

Keys and Values ~~~~~~~~~~~~~~~

The keys MUST be perfectly random ``bytes`` of arbitrary but fixed length, like from a cryptographic hash (SHA-256, HMAC-SHA-256, ...). The implementation relies on this "perfectly random" property and does not implement its own hash function; it just takes 32 bits from the given key.

The values are ``bytes`` of arbitrary but fixed length.

The lengths of the keys and values are defined when creating a ``HashTable`` instance; thereafter, the lengths must always match the defined size.

Implementation details ~~~~~~~~~~~~~~~~~~~~~~

To have little memory overhead overall, the hashtable only stores ``uint32_t`` indices into separate keys and values arrays (short: kv arrays).

A new key is appended to the keys array. The corresponding value is appended to the values array. After that, the key and value do not change their index as long as they exist in the hashtable and the ht and kv arrays are in memory. Even when kv pairs are deleted from ``HashTable``, the kv arrays never shrink and the indices of other kv pairs don't change.

This is because we want to have stable array indices for the keys/values, so the indices can be used outside of ``HashTable`` as memory-efficient references.

Memory allocated ~~~~~~~~~~~~~~~~

For a hashtable load factor of 0.1 – 0.5, a kv array growth factor of 1.3, and N kv pairs, memory usage in bytes is approximately:

- Hashtable: from ``N * 4 / 0.5`` to ``N * 4 / 0.1`` - Keys/Values: from ``N * len(key + value) * 1.0`` to ``N * len(key + value) * 1.3`` - Overall: from ``N * (8 + len(key + value))`` to ``N * (40 + len(key + value) * 1.3)``

When the hashtable or the kv arrays are resized, there will be brief memory-usage spikes. For the kv arrays, ``realloc()`` is used to avoid copying data and to minimize memory-usage spikes, if possible.

HashTableNT -----------

``HashTableNT`` is a convenience wrapper around ``HashTable``:

- Accepts and returns ``namedtuple`` values. - Implements persistence: can read the hashtable from a file and write it to a file.

Keys and Values ~~~~~~~~~~~~~~~

Keys: ``bytes``, see ``HashTable``.

Values: any fixed ``namedtuple`` type that can be serialized to ``bytes`` by Python's ``struct`` module using a given format string.

When setting a value, it is automatically serialized. When a value is returned, it will be a ``namedtuple`` of the given type.

Persistence ~~~~~~~~~~~

``HashTableNT`` has ``.write()`` and ``.read()`` methods to save/load its contents to/from a file, using an efficient binary format.

When a ``HashTableNT`` is saved to disk, only the non-deleted entries are persisted. When it is loaded from disk, a new hashtable and new, dense kv arrays are built; thus, kv indices will be different!

API ---

HashTable / HashTableNT have an API similar to a dict:

- ``__setitem__`` / ``__getitem__`` / ``__delitem__`` / ``__contains__`` - ``get()``, ``pop()``, ``setdefault()`` - ``items()``, ``len()`` - ``read()``, ``write()``, ``size()``

Example code ------------

::

    # HashTableNT mapping 256-bit key [bytes] --> Chunk value [namedtuple]
    Chunk = namedtuple("Chunk", ["refcount", "size"])
    ChunkFormat = namedtuple("ChunkFormat", ["refcount", "size"])
    chunk_format = ChunkFormat(refcount="I", size="I")

    # 256-bit (32-byte) key, 2x 32-bit (4-byte) values
    ht = HashTableNT(key_size=32, value_type=Chunk, value_format=chunk_format)

    key = b"x" * 32  # the key is usually from a cryptographic hash function
    value = Chunk(refcount=1, size=42)
    ht[key] = value
    assert ht[key] == value

    for key, value in ht.items():
        assert isinstance(key, bytes)
        assert isinstance(value, Chunk)

    file = "dump.bin"  # giving an fd of a file opened in binary mode also works
    ht.write(file)
    ht = HashTableNT.read(file)

Building / Installing --------------------- ::

    python setup.py build_ext --inplace
    python -m build
    pip install dist/borghash*.tar.gz

. Want a demo? ------------

Run ``borghash-demo`` after installing the ``borghash`` package.

It will show you the demo code, run it, and print the results for your machine.

Results on an Apple MacBook Pro (M3 Pro CPU) look like:

::

    HashTableNT in-memory ops (count=50000): insert: 0.042s, lookup: 0.045s, pop: 0.043s.
    HashTableNT serialization (count=50000): write: 0.014s, read: 0.010s.

State of this project ---------------------

**API is still unstable and expected to change as development continues.**

**As long as the API is unstable, there will be no data migration tools, e.g., for reading an existing serialized hashtable.**

There might be missing features or optimization potential; feedback is welcome!

Other Packages Related to python3-borghash

  • depends
  • recommends
  • suggests
  • enhances

Download python3-borghash

Download for all available architectures
Architecture Package Size Installed Size Files
alpha (unofficial port) 253.5 kB2,607.0 kB [list of files]
amd64 267.8 kB2,349.0 kB [list of files]
arm64 242.8 kB2,349.0 kB [list of files]
armhf 251.0 kB2,207.0 kB [list of files]
hppa (unofficial port) 239.0 kB2,327.0 kB [list of files]
i386 251.0 kB2,351.0 kB [list of files]
loong64 258.7 kB2,541.0 kB [list of files]
m68k (unofficial port) 232.2 kB2,253.0 kB [list of files]
ppc64 (unofficial port) 253.3 kB2,613.0 kB [list of files]
ppc64el 257.0 kB2,605.0 kB [list of files]
riscv64 264.3 kB2,213.0 kB [list of files]
s390x 264.6 kB2,357.0 kB [list of files]
sh4 (unofficial port) 262.7 kB2,335.0 kB [list of files]
sparc64 (unofficial port) 230.6 kB5,819.0 kB [list of files]
x32 (unofficial port) 261.8 kB2,297.0 kB [list of files]