Projects API (.projects)#

Estimated reading time: 5’

What you’ll learn#

The objective of this guide is to explain how projects are used in collaboration between data owners and data scientists.

Introduction#

Remote data science enables researchers to answer questions using data that they cannot see.

This usually happens as part of a larger research project, and PySyft helps you model this with the Project object.

Purpose#

The main purpose of a project is to state the research goal up front, so that the data owner can better understand a data scientist’s submission and their motivation for using the private data, and can make an informed decision about whether to approve or reject incoming code requests. A project can also be viewed as an invitation for the data owner to collaborate in answering the data scientist’s question.

Definition#

In PySyft, projects are syft objects that can contain one or more iterative code requests. Code Requests are syft objects which hold the code (UserCode) submitted by the data scientists to run on the private assets. To learn more about code requests, check out the Code API guide.


Projects Overview

Let’s do some preliminary setup to explore the functionality offered by the Project object.

import syft as sy
import pandas as pd

# launching a test server
server = sy.orchestra.launch(
    name="my_server",
    port="auto",
    dev_mode=False,
    reset=True
)

do_client = sy.login(
    email="[email protected]",
    password="changethis",
    port=server.port
)

dataset = sy.Dataset(
    name="Dataset name",
    description="**Placeholder Dataset description**",
    asset_list=[sy.Asset(
        name="asset_name",
        data=[1,2,3], # real data
        mock=[4,5,6], # mock data
    )]
)

do_client.upload_dataset(dataset)

do_client.register(
    email="[email protected]",
    password="123",
    password_verify="123",
    name="Curious Scientist"
)

ds_client = sy.login(
    email="[email protected]",
    password="123",
    port=server.port
)

# Log in to the server as a data owner and as a data scientist, respectively

do_client = sy.login(
    email="[email protected]",
    password="changethis",
    port=server.port
)

ds_client = sy.login(
    email="[email protected]",
    password="123",
    port=server.port
)

Create a new project#

Creating a project requires the following attributes:

  • name: The name of your project. It must be unique, so you cannot submit a project with the same name as an existing project you previously created.

  • description: Brief overview of the project you aim to accomplish. If you have links that are relevant for the purpose of your project, include them here.

  • members: PySyft projects can be expanded to work with datasets from multiple Datasites. This field requires the data scientist to pass a list of clients, one authenticated against each server hosting data for this project.

# create a project as a data scientist

new_project = sy.Project(
    name="Sum of private numbers",
    description="I am researching the sum of the data points.",
    members=[ds_client],
)
new_project
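
The constraints listed above can be checked locally before constructing the project. The following is a minimal plain-Python sketch and not part of PySyft: `check_project_fields` and `existing_names` are hypothetical names introduced for illustration, the rule that the members list must be non-empty is an assumption, and the server performs its own validation on submission.

```python
def check_project_fields(name, description, members, existing_names):
    """Return a list of problems; an empty list means the fields look fine.

    Hypothetical helper for illustration only, not a PySyft API.
    """
    problems = []
    if not name.strip():
        problems.append("name is empty")
    elif name in existing_names:
        # Project names must be unique among your own projects.
        problems.append(f"a project named {name!r} already exists")
    if not description.strip():
        problems.append("description is empty")
    # Assumption: a project needs at least one member client.
    if len(members) == 0:
        problems.append("members list is empty")
    return problems


# Names of projects you previously submitted (made-up values).
existing_names = {"Sum of private numbers"}

# "client" stands in for an authenticated client object.
duplicate = check_project_fields(
    "Sum of private numbers", "Researching the sum.", ["client"], existing_names
)
fresh = check_project_fields(
    "Mean of private numbers", "Researching the mean.", ["client"], existing_names
)
print(duplicate)
print(fresh)
```

A check like this only catches obvious mistakes early; the response returned when the project is submitted remains the authoritative result.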

Submit a project#

When you are ready, you can submit the project to the Datasites you want to notify about it. This is done via:

new_project.send()

Note

Updating projects: Once submitted, a project cannot be updated. If you want to change the scope of your project, you need to create a new project.

To check that the submission was successful, you can look into the client.projects API:

project = ds_client.projects[0]
project

Add requests to a project#

One or more requests can be created on a project. To create a new request, use the project.create_code_request(syft_function) method. This method accepts only Syft Functions, as shown below. More details on Syft Functions are available in the Code API guide.

@sy.syft_function_single_use(data=ds_client.datasets[0].assets[0])
def sum_of_numbers(data):
    return sum(data)

project.create_code_request(sum_of_numbers)
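
Before wrapping a function as a Syft Function, it can help to check its logic locally on values shaped like the mock data. The following is a minimal plain-Python sketch, assuming the mock values `[4, 5, 6]` from the setup; it does not call PySyft and only exercises the undecorated function body.

```python
def sum_of_numbers(data):
    # Same body as the Syft Function, without the decorator.
    return sum(data)


mock_values = [4, 5, 6]  # mock data from the setup
result = sum_of_numbers(mock_values)
print(result)  # 15
```

A passing local check on mock values says nothing about the private data itself, but it makes a request less likely to fail for trivial reasons once it is approved.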

Inspect existing projects#

All users have access to the client.projects API. However, they have different permissions based on their roles:

  • ADMIN and DATA_OWNER users can see all projects submitted on the server

  • DATA_SCIENTIST users can only see their own projects

do_client.projects
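
The visibility rules above can be pictured as a simple filter over the stored projects. The following is an illustrative plain-Python sketch, not PySyft's implementation: `ProjectRecord`, `visible_projects`, and the example email addresses are hypothetical, and treating any other role as seeing nothing is an assumption.

```python
from dataclasses import dataclass


@dataclass
class ProjectRecord:
    # Hypothetical stand-in for a stored project.
    name: str
    created_by: str  # email of the user who submitted the project


def visible_projects(projects, user_email, role):
    """Return the projects a user may see, following the role rules above."""
    if role in ("ADMIN", "DATA_OWNER"):
        # Admins and data owners see every project on the server.
        return list(projects)
    if role == "DATA_SCIENTIST":
        # Data scientists see only the projects they submitted.
        return [p for p in projects if p.created_by == user_email]
    # Assumption: any other role sees no projects.
    return []


projects = [
    ProjectRecord("Sum of private numbers", "alice@example.com"),
    ProjectRecord("Mean of private numbers", "bob@example.com"),
]

owner_view = visible_projects(projects, "owner@example.com", "DATA_OWNER")
alice_view = visible_projects(projects, "alice@example.com", "DATA_SCIENTIST")
print([p.name for p in owner_view])
print([p.name for p in alice_view])
```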