Grid Volunteer Computing For A.I.

!!Distributed Computing, Grid Computing
{br}
# Berkeley Open Infrastructure for Network Computing - BOINC
# ArchiveTeam Warrior - a virtual appliance from Archive Team that donates your bandwidth to archive at-risk websites into the Internet Archive, a library free for everyone to access. https://warrior.archiveteam.org/
# European Grid Infrastructure (EGI) - a series of projects funded by the European Commission.
# OurGrid - https://ourgrid.org/
# Petals - run large language models at home, BitTorrent-style: each peer serves a slice of the model's layers. https://petals.dev/, https://huggingface.co/
{br}
!!Distributed storage network (DSN)
{br}
Example: a 10 MB file is split into ten 1 MB shards held by ten computers - like RAID striping over a wide-area network. Instead of downloading the data, you send computation to it: a search request, an SQL query, a regex or awk filter, or a round of LLM training.
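A minimal sketch of the striping idea, stdlib only - the shard size and node count are illustrative, not part of any real protocol:

```python
# Split a file's bytes into fixed-size shards, one per node, then
# reassemble them - a toy model of RAID-style striping over a
# wide-area network.

def stripe(data: bytes, num_nodes: int) -> list[bytes]:
    """Divide data into num_nodes roughly equal shards."""
    shard_size = -(-len(data) // num_nodes)  # ceiling division
    return [data[i:i + shard_size] for i in range(0, len(data), shard_size)]

def reassemble(shards: list[bytes]) -> bytes:
    """Concatenate shards back into the original data."""
    return b"".join(shards)

blob = bytes(10 * 1024 * 1024)   # a 10 MB file
shards = stripe(blob, 10)        # 10 computers hold 1 MB each
assert all(len(s) <= 1024 * 1024 for s in shards)
assert reassemble(shards) == blob
```

A real system would add replication or erasure coding on top, since losing any one node here loses part of the file.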
IPFS (InterPlanetary File System): a content-addressed peer-to-peer protocol that chunks files into blocks addressed by the hash of their contents, so any peer holding a block can serve it and the requester can verify it.
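The core of content addressing can be sketched in a few lines. This is not the real IPFS CID format (IPFS uses multihash-based CIDs), just the underlying idea that a block's address is the hash of its bytes:

```python
import hashlib

# A toy content-addressed block store: blocks are keyed by the
# SHA-256 of their bytes, so an address both locates a block and
# lets any peer verify what it received.

store: dict[str, bytes] = {}

def put(block: bytes) -> str:
    """Store a block and return its content address."""
    cid = hashlib.sha256(block).hexdigest()
    store[cid] = block
    return cid

def get(cid: str) -> bytes:
    """Fetch a block and verify it matches its address."""
    block = store[cid]
    if hashlib.sha256(block).hexdigest() != cid:
        raise ValueError("block corrupted: hash mismatch")
    return block

cid = put(b"hello, swarm")
assert get(cid) == b"hello, swarm"
assert put(b"hello, swarm") == cid  # same content, same address
```

Because identical content always hashes to the same address, duplicate blocks are stored once and can be fetched from whichever peer is nearest.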
{br}
# TensorFlow: provides the tf.distribute API for distributed training. Choose a strategy such as MirroredStrategy (one machine, multiple GPUs) or MultiWorkerMirroredStrategy (multiple machines).
# PyTorch: offers the torch.distributed package for training across GPUs and machines, with a choice of Gloo, NCCL, or MPI backends depending on hardware and communication needs.
# Horovod: This open-source library simplifies distributed training across diverse platforms (CPUs, GPUs, clusters) and works with TensorFlow, PyTorch, and MXNet.
# Ray Tune: offers distributed hyperparameter tuning, which can optimize LLM training by running many trials in parallel across separate machines.
{br}
Data parallelism: split the training data across machines; each replica computes gradients on its own shard, and the gradients are averaged before every replica applies the same update.
Model parallelism: for models too large for one device, split the model itself across machines so that different machines hold different layers or slices.
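The data-parallel idea can be sketched without any ML framework: each simulated worker computes a gradient on its own shard, and the per-worker gradients are averaged (the all-reduce step) before a shared update. The model, loss, and shard layout here are hypothetical:

```python
# Data parallelism in miniature: fit y = w*x by gradient descent,
# with the dataset sharded across simulated workers. With equal-size
# shards, the average of per-shard gradients equals the full-batch
# gradient, which is why data-parallel training tracks single-machine
# training.

def gradient(w: float, shard: list[tuple[float, float]]) -> float:
    """d/dw of mean squared error 0.5*(w*x - y)^2 over one shard."""
    return sum((w * x - y) * x for x, y in shard) / len(shard)

data = [(x, 3.0 * x) for x in range(1, 9)]  # true w = 3.0
shards = [data[0:4], data[4:8]]             # 2 workers, 4 points each

w = 0.0
for _ in range(200):
    grads = [gradient(w, s) for s in shards]  # each worker, locally
    avg = sum(grads) / len(grads)             # the all-reduce step
    w -= 0.01 * avg                           # identical update everywhere

assert abs(w - 3.0) < 1e-3
```

In a real framework the averaging is done by a collective operation (e.g. NCCL all-reduce) over the network rather than a Python loop, but the arithmetic is the same.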

{br}
# https://github.com/search?q=+distributed+storage+system&type=repositories&s=stars&o=desc&p=2
# https://github.com/horovod/horovod
  
