Projects API (.projects)#
Estimated reading time: 5’
What you’ll learn#
The objective of this guide is to explain how projects are used in collaboration between data owners and data scientists.
Introduction#
Remote data science enables researchers to answer questions using data that they cannot see.
This usually happens as part of a larger research project, and PySyft helps you model this with the Project object.
Purpose#
A project states its purpose up front, so that the data owner can better understand a data scientist’s submission and their motivation for using the private data, and can make informed decisions about whether to approve or reject incoming code requests. A project can also be viewed as an invitation to collaborate with the data owner in answering the data scientist’s question.
Definition#
In PySyft, projects are syft objects that can contain one or more iterative code requests. Code requests are syft objects which hold the code (UserCode) submitted by the data scientists to run on the private assets. To learn more about code requests, check out the Code API guide.
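The containment described above (a project groups code requests, and each request wraps submitted code) can be pictured with a small plain-Python model. This is only a sketch for intuition: `ToyProject`, `ToyCodeRequest`, `ToyUserCode`, the `add_request` helper, and the `"PENDING"` label are hypothetical names invented for this illustration and are not PySyft classes or values.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ToyUserCode:
    """Hypothetical stand-in for the code a data scientist submits."""
    source: str


@dataclass
class ToyCodeRequest:
    """Hypothetical stand-in for a code request wrapping submitted code."""
    code: ToyUserCode
    status: str = "PENDING"  # assumed label, not PySyft's actual status value


@dataclass
class ToyProject:
    """Hypothetical stand-in for a project grouping related code requests."""
    name: str
    description: str
    requests: List[ToyCodeRequest] = field(default_factory=list)

    def add_request(self, source: str) -> ToyCodeRequest:
        # Wrap the submitted source in a request and attach it to the project.
        request = ToyCodeRequest(code=ToyUserCode(source=source))
        self.requests.append(request)
        return request


toy_project = ToyProject(
    name="Sum of private numbers",
    description="I am researching the sum of the data points.",
)
# Requests accumulate iteratively under the same stated purpose.
toy_project.add_request("def sum_of_numbers(data): return sum(data)")
toy_project.add_request("def mean_of_numbers(data): return sum(data) / len(data)")
print(len(toy_project.requests))
```

The point of the model is that the data owner reviews each request in the context of the project's stated purpose, rather than as an isolated piece of code.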
Let’s do a preliminary setup to explore the functionality offered by the Project object.
import syft as sy
import pandas as pd
# launch a local test server
server = sy.orchestra.launch(
    name="my_server",
    port="auto",
    dev_mode=False,
    reset=True,
)
# log in as the data owner
do_client = sy.login(
    email="[email protected]",
    password="changethis",
    port=server.port,
)
# upload a small dataset with one asset (real and mock values)
dataset = sy.Dataset(
    name="Dataset name",
    description="**Placeholder Dataset description**",
    asset_list=[
        sy.Asset(
            name="asset_name",
            data=[1, 2, 3],  # real data
            mock=[4, 5, 6],  # mock data
        )
    ],
)
do_client.upload_dataset(dataset)
# register a data scientist account and log in with it
do_client.register(
    email="[email protected]",
    password="123",
    password_verify="123",
    name="Curious Scientist",
)
ds_client = sy.login(
    email="[email protected]",
    password="123",
    port=server.port,
)
# Let's log in to the server as a data owner and a data scientist, respectively
do_client = sy.login(
    email="[email protected]",
    password="changethis",
    port=server.port,
)
ds_client = sy.login(
    email="[email protected]",
    password="123",
    port=server.port,
)
Create a new project#
Creating a project requires the following attributes:
name: The name of your project. It must be unique, so you cannot re-submit a project with the same name as an existing project you previously created.
description: A brief overview of what the project aims to accomplish. If you have links that are relevant to the purpose of your project, include them here.
members: PySyft projects can be expanded to work with datasets from multiple Datasites. This field takes a list of clients, one for each server hosting data for this project, each already authenticated with its server.
# create a project as a data scientist
new_project = sy.Project(
    name="Sum of private numbers",
    description="I am researching the sum of the data points.",
    members=[ds_client],
)
new_project
Submit a project#
When you are ready, you can submit the project to the Datasites you would like to inform about it. This is possible via:
new_project.send()
Note
Updating projects: Once submitted, a project cannot be further updated. If you want to make changes to the scope of your project, you need to create a new project.
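One way to think about this rule is as a read-only record: after submission the project is fixed, and a change of scope produces a separate project, which needs its own name because names are unique. The sketch below models that idea in plain Python with a frozen dataclass. `SubmittedToyProject` is a hypothetical class made up for this illustration; it does not describe how PySyft enforces the rule internally.

```python
from dataclasses import FrozenInstanceError, dataclass, replace


@dataclass(frozen=True)
class SubmittedToyProject:
    """Hypothetical read-only record of a project after submission."""
    name: str
    description: str


submitted = SubmittedToyProject(
    name="Sum of private numbers",
    description="I am researching the sum of the data points.",
)

# Editing a submitted project in place is rejected.
try:
    submitted.description = "I am researching the mean of the data points."
    edit_rejected = False
except FrozenInstanceError:
    edit_rejected = True

# A change of scope becomes a separate project with its own name.
revised = replace(
    submitted,
    name="Mean of private numbers",
    description="I am researching the mean of the data points.",
)
print(edit_rejected, revised.name)
```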
To check that the submission was successful, you can look at the client.projects API:
project = ds_client.projects[0]
project
Add requests to a project#
One or more requests can be created on a project. To create a new request, you can use the project.create_code_request(syft_function) method. This method accepts only Syft Functions, as shown below. More details on Syft Functions are available in the Code API guide.
# wrap the computation as a Syft Function bound to the first asset
@sy.syft_function_single_use(data=ds_client.datasets[0].assets[0])
def sum_of_numbers(data):
    return sum(data)

project.create_code_request(sum_of_numbers)
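Before attaching a function to a request, it can help to sanity-check its logic on the mock values. The sketch below is plain Python without the Syft decorator; the two lists are copied from the asset uploaded in the setup cell, and the variable names `mock_data`, `real_data`, `mock_result`, and `real_result` are chosen only for this illustration.

```python
# Values copied from the asset uploaded in the setup cell.
mock_data = [4, 5, 6]  # mock values the data scientist can inspect
real_data = [1, 2, 3]  # private values, shown here only for comparison


def sum_of_numbers(data):
    # Same body as the function submitted in the code request.
    return sum(data)


# What the data scientist can verify locally on the mock values.
mock_result = sum_of_numbers(mock_data)
print(mock_result)  # 15

# What the same logic yields on the real values.
real_result = sum_of_numbers(real_data)
print(real_result)  # 6
```

The mock result differs from the real one by design: it confirms that the code runs and returns the expected type, not what the private answer is.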
Inspect existing projects#
All users have access to the client.projects API. However, they have different permissions based on their roles:
ADMIN and DATA_OWNER users can see all projects submitted on the server
DATA_SCIENTIST users can only see their own projects
do_client.projects
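The visibility rule above amounts to a simple filter. The following plain-Python sketch shows that logic under stated assumptions: `ToyRole`, `ToyProjectRecord`, `visible_projects`, and the example email addresses are hypothetical and exist only in this illustration; PySyft applies its own permission checks on the server.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class ToyRole(Enum):
    """Hypothetical role labels mirroring the ones named in this guide."""
    ADMIN = "ADMIN"
    DATA_OWNER = "DATA_OWNER"
    DATA_SCIENTIST = "DATA_SCIENTIST"


@dataclass
class ToyProjectRecord:
    name: str
    created_by: str  # email of the user who submitted the project


def visible_projects(
    projects: List[ToyProjectRecord], role: ToyRole, user_email: str
) -> List[ToyProjectRecord]:
    # Admins and data owners see every project on the server.
    if role in (ToyRole.ADMIN, ToyRole.DATA_OWNER):
        return list(projects)
    # Data scientists only see the projects they submitted themselves.
    return [p for p in projects if p.created_by == user_email]


all_projects = [
    ToyProjectRecord("Sum of private numbers", "alice@example.com"),
    ToyProjectRecord("Mean of private numbers", "bob@example.com"),
]

owner_view = visible_projects(all_projects, ToyRole.DATA_OWNER, "owner@example.com")
scientist_view = visible_projects(all_projects, ToyRole.DATA_SCIENTIST, "alice@example.com")
print(len(owner_view), [p.name for p in scientist_view])
```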