Most software libraries let you compute over information you own and can see, inside machines you control. However, this means that you cannot compute on information without first obtaining (at least partial) ownership of that information. It also means that you cannot compute using machines without first obtaining control over those machines. This is very limiting to human collaboration and systematically drives the centralization of data, because you cannot work with a large body of data without first putting it all in one (central) place.
The Syft ecosystem seeks to change this system, allowing you to write software which can compute over information you do not own on machines you do not have (total) control over. This not only includes servers in the cloud, but also personal desktops, laptops, mobile phones, websites, and edge devices. Wherever you want your data to live under your ownership, the Syft ecosystem exists to help keep it there while allowing it to be used privately for computation.
This repo contains multiple projects which work together, namely PySyft and PyGrid. PyGrid will be added soon; in the meantime, this is the directory structure:
├── README.md <-- You are here 📌
├── grid <-- Coming to this Mono repo 🔜
└── syft <-- The Syft droids you are looking for 👋🏽
NOTE: Changing the entire folder structure will likely result in some minor issues. If you spot one, please let us know or open a PR.
PySyft is the centerpiece of the Syft ecosystem. You can use PySyft to perform two types of computation:
1. Dynamic: Directly compute over data you cannot see.
2. Static: Create static graphs of computation which can be deployed/scaled at a later date on different compute.
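The difference between the two modes can be sketched in plain Python. This is a toy illustration only: `RemoteStore` and `StaticPlan` are hypothetical names invented for this example and are not part of the PySyft API. The sketch assumes a "remote" machine is just an object that hides its data and accepts functions to run against it.

```python
# Toy illustration only: hypothetical names, NOT the PySyft API.
from typing import Callable, List


class RemoteStore:
    """Stands in for a machine holding data the caller cannot see."""

    def __init__(self, values: List[float]) -> None:
        self._values = list(values)  # stays on the owner's side

    def run(self, fn: Callable[[List[float]], float]) -> float:
        # The function travels to the data; only the result comes back.
        return fn(self._values)


class StaticPlan:
    """A recorded chain of steps that can be executed later, on any store."""

    def __init__(self) -> None:
        self._steps: List[Callable[[float], float]] = []

    def then(self, step: Callable[[float], float]) -> "StaticPlan":
        self._steps.append(step)
        return self

    def execute_on(self, store: RemoteStore) -> float:
        def pipeline(values: List[float]) -> float:
            result = sum(values) / len(values)  # every plan starts from the mean
            for step in self._steps:
                result = step(result)
            return result

        return store.run(pipeline)


store_a = RemoteStore([1.0, 2.0, 3.0])
store_b = RemoteStore([10.0, 20.0])

# Dynamic: compute directly against data you cannot see.
dynamic_total = store_a.run(sum)

# Static: define the computation once, deploy it later on different stores.
plan = StaticPlan().then(lambda x: x * 2).then(lambda x: x + 1)
result_a = plan.execute_on(store_a)
result_b = plan.execute_on(store_b)
```

In both cases the caller never reads the raw values. The static plan is built before any store is chosen, which is what lets the same computation be reused on different compute later.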
The PyGrid library serves as an API for the management and deployment of PySyft at scale. It also lets you extend PySyft for the purposes of Federated Learning on web, mobile, and edge devices using the following Syft worker libraries:
- PySyft (Python, you can use PySyft itself as one of these "FL worker libraries")
However, the Syft ecosystem only focuses on consistent object serialization/deserialization, core abstractions, and algorithm design/execution across these languages. These libraries alone will not connect you with data in the real world. The Syft ecosystem is supported by the Grid ecosystem, which focuses on the deployment, scalability, and other additional concerns around running real-world systems to compute over and process data (such as data compliance web applications).
- PySyft is the library that defines objects, abstractions, and algorithms.
PySyft has also been explained in videos on YouTube:
PySyft is available on PyPI and Conda.
$ conda create -n pysyft python=3.9
$ conda activate pysyft
$ conda install jupyter notebook
We support Linux, macOS, and Windows, along with the following Python and Torch versions. Older versions may work; however, we have stopped testing and supporting them.
$ pip install syft
Coming soon! Until then, please view the Examples below.
These tutorials cover a variety of Python libraries for data science and machine learning.
All the examples can be played with by launching a Jupyter Notebook and navigating to the
$ jupyter notebook
Duet is a peer-to-peer tool within PySyft that provides a research-friendly API for a Data Owner to privately expose their data, while a Data Scientist can access or manipulate the data on the owner's side through a zero-knowledge access control mechanism. It's designed to lower the barrier between research and privacy-preserving mechanisms, so that scientific progress can be made on data that is currently inaccessible or tightly controlled. The main benefit of using Duet is that it allows you to get started with PySyft without needing to manage a full PyGrid deployment. It is the simplest path to using Syft, with nothing to install except Syft itself 😉.
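The owner/scientist split described above can be pictured with a small plain-Python sketch. It is a conceptual toy, not Duet: `DataOwner`, `Pointer`, and all of their methods are hypothetical names made up for this example, and the "access control" here is a simple approval set rather than the mechanism Duet actually uses.

```python
# Toy illustration only: hypothetical names, NOT the Duet or PySyft API.
from typing import Any, Callable, Dict, Set


class Pointer:
    """A handle the Data Scientist holds; it never contains the data itself."""

    def __init__(self, owner: "DataOwner", obj_id: int) -> None:
        self.owner = owner
        self.obj_id = obj_id

    def apply(self, fn: Callable[[Any], Any]) -> "Pointer":
        # The computation runs on the owner's side and yields a new pointer.
        return self.owner.compute(self.obj_id, fn)

    def get(self) -> Any:
        # Downloading a value only works once the owner has approved it.
        return self.owner.release(self.obj_id)


class DataOwner:
    """Holds the private objects and decides what may leave."""

    def __init__(self) -> None:
        self._store: Dict[int, Any] = {}
        self._approved: Set[int] = set()
        self._next_id = 0

    def share(self, value: Any) -> Pointer:
        obj_id = self._next_id
        self._next_id += 1
        self._store[obj_id] = value
        return Pointer(self, obj_id)

    def compute(self, obj_id: int, fn: Callable[[Any], Any]) -> Pointer:
        return self.share(fn(self._store[obj_id]))

    def approve(self, pointer: Pointer) -> None:
        self._approved.add(pointer.obj_id)

    def release(self, obj_id: int) -> Any:
        if obj_id not in self._approved:
            raise PermissionError("the owner has not approved this object")
        return self._store[obj_id]


owner = DataOwner()
ages_ptr = owner.share([34, 51, 29])                       # owner exposes data
mean_ptr = ages_ptr.apply(lambda xs: sum(xs) / len(xs))    # scientist computes remotely

try:
    mean_ptr.get()
    denied = False
except PermissionError:
    denied = True  # the result cannot be downloaded before approval

owner.approve(mean_ptr)   # owner reviews and approves the aggregate only
mean_age = mean_ptr.get()
```

The raw list of ages is never approved, so the scientist only ever sees the aggregate the owner chose to release.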
This software is in beta. Use at your own risk.
We are very grateful for contributions to PySyft from the following organizations!