Code API (.code)

Estimated reading time: 10 minutes

This guide’s objective is to help you get familiar with PySyft’s Code API. You will learn how to:

  • write a Syft Function

  • test a Syft Function

What is a Syft Function?

A Syft function is any Python function decorated with the @sy.syft_function decorator. This means you can seamlessly write arbitrary Python code and use it directly in PySyft.

The decorator creates a function object which is recognized by PySyft and allows you to create code requests for remote execution.

Here is a quick example of how a Syft function is created and used.

import syft as sy
import pandas as pd

node = sy.orchestra.launch(name="demo_datasite", port="auto", dev_mode=False, reset=True)

admin_client = sy.login(
    url='localhost',
    port=node.port,
    email="[email protected]",
    password="changethis",
)

df = pd.DataFrame({'A': [1, 2, 3], 'B': [10, 20, 30]})
mock_df = pd.DataFrame({'A': [1, 2, 1], 'B': [20, 10, 20]})

main_contributor = sy.Contributor(
    name='John Doe',
    role='Uploader',
    email='[email protected]'
)

asset = sy.Asset(
    name='demo_asset',
    data=df,
    mock=mock_df,
    contributors=[main_contributor]
)

dataset = sy.Dataset(
    name='Demo Dataset',
    description='Demo Dataset',
    asset_list=[asset],
    contributors=[main_contributor]
)

admin_client.upload_dataset(dataset)
admin_client.settings.allow_guest_signup(enable=True)
Starting demo_datasite server on 0.0.0.0:55035
Waiting for server to start Done.
SyftInfo:
You have launched a development server at http://0.0.0.0:55035. It is intended only for local use.

Logged into <demo_datasite: High side Datasite> as <[email protected]>
SyftWarning:
You are using a default password. Please change the password using `[your_client].account.set_password([new_password])`.

Uploading: demo_asset: 100%|██████████| 1/1 [00:00<00:00, 10.00it/s]

SyftSuccess:
Registration feature successfully enabled

First, a Data Scientist would connect to the Datasite and explore the available datasets. (See the Users API for more details on user accounts on Datasites.)

import syft as sy

# connect to the Datasite
datasite = sy.login_as_guest(url='localhost', port=node.port).register(
    email='[email protected]',
    name='Data Scientist',
    password='123',
    password_verify='123'
)
Logged into <demo_datasite: High-side Datasite> as GUEST
ds_client = sy.login(
    url='localhost', port=node.port,
    email='[email protected]',
    password='123'
)

ds_client.datasets
Logged into <demo_datasite: High side Datasite> as <[email protected]>

Dataset DictTuple

Total: 1

ds_client.datasets[0].assets[0]

demo_asset

Asset ID: 0cd690ed046d4e63a4550434b5ce4df3

Action Object ID: 1090cb335ecb408fba03cafc05157d57

Uploaded by: John Doe ([email protected])

Created on: 2024-08-02 13:01:08

Data:

You do not have permission to access private data.

Mock Data:

   A   B
0  1  20
1  2  10
2  1  20

After inspecting the mock data, the Data Scientist can prototype a Python function for the analysis against it.

def example_function(private_dataset):
    return private_dataset.sum()

example_function(ds_client.datasets[0].assets[0].mock)
A     4
B    50
dtype: int64

Then, the Data Scientist can convert the Python function into a Syft function (with the @sy.syft_function decorator) and submit a code request to the Data Owner’s Datasite.

import syft as sy
from syft.service.policy.policy import ExactMatch, SingleExecutionExactOutput

@sy.syft_function(
    input_policy=ExactMatch(private_dataset=ds_client.datasets[0].assets[0]),
    output_policy=SingleExecutionExactOutput()
)
def example_function(private_dataset):
    return private_dataset.sum()

ds_client.code.request_code_execution(example_function)
SyftSuccess:
Syft function 'example_function' successfully created. To add a code request, please create a project using `project = syft.Project(...)`, then use command `project.create_code_request`.

Request

Id: ea3e41ab9ac548cd886e297fe235c090

Request time: 2024-08-02 13:01:10

Status: RequestStatus.PENDING

Requested on: Demo_datasite of type Datasite

Requested by: Data Scientist ([email protected])

Changes: Request to change example_function (Pool Id: default-pool) to permission RequestStatus.APPROVED. No nested requests.

In the example above, example_function becomes a Syft function that can be executed remotely on the private dataset once the request is approved.
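
Once the Data Owner approves the request, the Data Scientist can call the function through the code API and download the real result. The following is a minimal sketch, assuming the request created above has already been approved:

asset = ds_client.datasets[0].assets[0]

# runs on the private data and returns a pointer to the result
result_ptr = ds_client.code.example_function(private_dataset=asset)

# fetch the actual value from the pointer
result = result_ptr.get()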

I/O Policies

Input and Output Policies are rules that define what data can go IN and what data can come OUT of a code request. They are mainly used to ensure that code submissions are properly paired with the datasets they are intended for.

Input policies deal with questions like:

What datasets or assets can your code be run on?

The input policy ensures that the code will run only on the specified assets (passed as arguments to the function). This means that an approved code request can’t run on any other asset.

Output policies deal with questions like:

How many times can your code be run?

Output policies are used to maintain state between executions. They are useful for imposing limits, such as allowing a piece of code to be executed only a set number of times. This gives the Data Owner control over how many times a code request can be executed and what the output structure looks like.
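
For example, the two policies used earlier in this guide pair a single asset with a single allowed execution. A minimal sketch, reusing ds_client from above:

import syft as sy
from syft.service.policy.policy import ExactMatch, SingleExecutionExactOutput

@sy.syft_function(
    # IN: the code may only run with this exact asset as its argument
    input_policy=ExactMatch(data=ds_client.datasets[0].assets[0]),
    # OUT: the approved code may be executed exactly once
    output_policy=SingleExecutionExactOutput(),
)
def column_sums(data):
    return data.sum()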

You can read more about I/O Policies in the Syft Policies Guide.

Writing a Syft Function

Since a Syft function is an object designed to work in a remote workflow, there are some aspects you need to take into consideration when writing one.

Function Body

The function’s body shouldn’t contain any references to objects from outside the function’s scope. This includes any:

  • objects

  • functions

  • classes

  • modules

Writing Syft Functions

A general rule of thumb is that a Syft function should be self-contained.

Here are some examples. Note that the Syft function is still created successfully in the “don’t” cases below (hence the SyftSuccess messages); the broken reference only surfaces as an error when the code is executed remotely.

🚫 Don’t use variables from outside the function’s scope.

CONST = 10

@sy.syft_function()
def example():
    return CONST * 2
SyftSuccess:
Syft function 'example' successfully created. To add a code request, please create a project using `project = syft.Project(...)`, then use command `project.create_code_request`.

✅ Do define every used variable inside the function.

@sy.syft_function()
def example():
    CONST = 10
    return CONST * 2
SyftSuccess:
Syft function 'example' successfully created. To add a code request, please create a project using `project = syft.Project(...)`, then use command `project.create_code_request`.

🚫 Don’t use functions from outside the function’s scope.

def helper(x):
    return x ** 2

@sy.syft_function()
def example():
    return helper(10)
SyftSuccess:
Syft function 'example' successfully created. To add a code request, please create a project using `project = syft.Project(...)`, then use command `project.create_code_request`.

✅ Do define helper functions inside the Syft function.

@sy.syft_function()
def example():
    def helper(x):
        return x ** 2
    return helper(10)
SyftSuccess:
Syft function 'example' successfully created. To add a code request, please create a project using `project = syft.Project(...)`, then use command `project.create_code_request`.

🚫 Don’t use modules imported outside the function’s scope.

import numpy as np

@sy.syft_function()
def example():
    return np.sum([1, 2, 3])
SyftSuccess:
Syft function 'example' successfully created. To add a code request, please create a project using `project = syft.Project(...)`, then use command `project.create_code_request`.

✅ Do import used modules inside the Syft function.

@sy.syft_function()
def example():
    import numpy as np
    return np.sum([1, 2, 3])
SyftSuccess:
Syft function 'example' successfully created. To add a code request, please create a project using `project = syft.Project(...)`, then use command `project.create_code_request`.

Allowed Return Types

PySyft has a custom serialization implementation, so only the types it supports can be returned from Syft functions.

Here is a complete list of objects that can be serialized by PySyft:

  • Python primitives (including collections)

  • pandas.DataFrame

  • pandas.Series

  • pandas.Timestamp

  • numpy.ndarray

  • numpy numeric types

  • datetime.date

  • datetime.time

  • datetime.datetime

  • result.Ok

  • result.Err

  • result.Result

  • pymongo.collection.Collection

  • io.BytesIO

  • inspect.Signature

  • inspect.Parameter

The serialization process is recursive, so any combination of datatypes mentioned above will work (e.g. a dictionary containing numpy arrays).
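
For instance, a nested structure built only from the supported types above serializes without issues. A minimal sketch:

@sy.syft_function()
def example():
    import numpy as np
    import pandas as pd

    # a dictionary mixing primitives, a numpy array and a DataFrame
    # works because serialization recurses into collections
    return {
        "weights": np.array([0.1, 0.9]),
        "summary": pd.DataFrame({"A": [1, 2]}),
        "count": 3,
    }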

Using other data types as return values for a Syft function will likely cause an error.

However, if you need to use a datatype that PySyft cannot serialize, you can convert it into a supported type as a workaround. For example, you can save a plot as a PNG image into a binary buffer before returning the value from the function:

@sy.syft_function()
def example():
    from io import BytesIO
    import matplotlib.pyplot as plt
    import numpy as np

    x = np.arange(10)
    y = np.sin(x)

    plt.plot(x, y)
    
    figfile = BytesIO()
    plt.savefig(figfile, format='png')
    return figfile
SyftSuccess:
Syft function 'example' successfully created. To add a code request, please create a project using `project = syft.Project(...)`, then use command `project.create_code_request`.

from io import BytesIO

b = BytesIO()

type(b)
_io.BytesIO
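
On the Data Scientist’s side, the returned buffer can then be written back into an image file. The snippet below is a hypothetical post-processing step, where result stands for the already-fetched return value of the approved function:

# hypothetical: 'result' is the BytesIO object returned by the function
with open("plot.png", "wb") as f:
    f.write(result.getvalue())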

Test a Function

To increase the likelihood that a code request gets approved, it’s important to test your functions before submitting them. You can do this both locally and remotely.

Local Testing

To test a function locally, simply run your experiment on the mock data, without creating a Syft function.

def example(data):
    return data.sum()

mock_data = ds_client.datasets[0].assets[0].mock

example(mock_data)
A     4
B    50
dtype: int64

If everything looks all right, turn it into a Syft function and test it server-side.

Testing in an Emulated Server

You can test a Syft function in an “ephemeral” server that emulates a Data Owner’s server (using only mock data) simply by creating a Syft function and invoking it.

Asset restrictions

When testing a function server-side, you need to pass the whole asset. PySyft will automatically select the mock data for invoking the underlying function.

@sy.syft_function(
    input_policy=ExactMatch(data=ds_client.datasets[0].assets[0]),
    output_policy=SingleExecutionExactOutput()
)
def example(data):
    return data.sum()

data = ds_client.datasets[0].assets[0]

example(data=data)
SyftSuccess:
Syft function 'example' successfully created. To add a code request, please create a project using `project = syft.Project(...)`, then use command `project.create_code_request`.

SyftInfo: Closing the server after time_alive=300 (the default value)
SyftInfo:
You have launched a development server at http://0.0.0.0:None. It is intended only for local use.

Logged into <ephemeral_server_example_8641: High side Datasite> as <[email protected]>
SyftWarning:
You are using a default password. Please change the password using `[your_client].account.set_password([new_password])`.

Approving request on change example for datasite ephemeral_server_example_8641
SyftInfo: Landing the ephemeral server...

Pointer

A     4
B    50
Name: 0, dtype: int64

Testing on a Remote Server

In some scenarios it makes more sense to test code directly on the Data Owner’s server (using only mock data, before the request is approved). This approach might be useful, for example, when the experiment involves heavy computation. In such cases, the Data Owner may grant the Data Scientist access to a computing cluster to test their code on mock data.

Warning

For this to work, the Data Owner must enable mock execution for any external researcher using this feature.

# the Datasite admin must enable mock execution for a specific user

admin_client.users[1].allow_mock_execution()
SyftSuccess:
User details successfully updated.

# the data scientist can now test their code on the remote server

@sy.syft_function(
    input_policy=ExactMatch(data=ds_client.datasets[0].assets[0]),
    output_policy=SingleExecutionExactOutput()
)
def example(data):
    return data.sum()

ds_client.code.submit(example)
SyftSuccess:
Syft function 'example' successfully created. To add a code request, please create a project using `project = syft.Project(...)`, then use command `project.create_code_request`.

SyftSuccess:
User Code Submitted

After submitting the Syft function, you can test it using client.code.FUNCTION(args).

ds_client.code

UserCode List


ds_client.code.example(data=mock_data)

Pointer

A     4
B    50
Name: 0, dtype: int64

Blocking vs Non-Blocking Execution

When you submit a code request for execution on the Data Owner’s server, it runs in a blocking manner by default. This means the Data Owner’s server won’t process anything else until that computation is done, which can hurt performance for heavy computations or when many Data Scientists send requests at the same time.

To mitigate this issue, you can send non-blocking requests that are queued and execute only when the server has enough available resources. See the Jobs API for more details on how to work with non-blocking requests.
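
Below is a sketch of what a non-blocking call can look like. The blocking=False keyword and the Job interface are covered in the Jobs API; treat the exact names here as assumptions:

# run the approved function as a queued, non-blocking job
job = ds_client.code.example(data=data, blocking=False)

# the call returns immediately with a Job handle;
# wait for the job and fetch its result when needed
job.wait()
result = job.result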

Nested Code Requests

For heavy computations, a single code execution environment might not be enough.

As a Data Owner, you can deploy PySyft in a cluster (read the Deployment Guides and the WorkerPool API for more details) and allow Data Scientists to use something like the MapReduce model for computations.

For example, this is how you could submit an aggregated computation in PySyft:

asset = ds_client.datasets[0].assets[0]
mock_data = asset.mock

# setup processing functions
@sy.syft_function()
def process_batch(batch):
    return batch.to_numpy().sum()


@sy.syft_function()
def aggregate_job(job_results):
    return sum(job_results)

    
# Syft function with nested requests
@sy.syft_function_single_use(data=asset)
def process_all(datasite, data):
    import numpy as np
    
    job_results = []
    for batch in np.array_split(data, 2):
        batch_job = datasite.launch_job(process_batch, batch=batch)
        job_results += [batch_job.result]

    job = datasite.launch_job(aggregate_job, job_results=job_results)
    return job.result


# submit the processing functions so they are available on the Data Owner's server
ds_client.code.submit(process_batch)
ds_client.code.submit(aggregate_job)

# create a code request
ds_client.code.request_code_execution(process_all)
SyftSuccess:
Syft function 'process_batch' successfully created. To add a code request, please create a project using `project = syft.Project(...)`, then use command `project.create_code_request`.

SyftSuccess:
Syft function 'aggregate_job' successfully created. To add a code request, please create a project using `project = syft.Project(...)`, then use command `project.create_code_request`.

SyftSuccess:
Syft function 'process_all' successfully created. To add a code request, please create a project using `project = syft.Project(...)`, then use command `project.create_code_request`.

Request

Id: aff2ef3c054a4a2ca747aadf143aa5b3

Request time: 2024-08-02 13:01:18

Status: RequestStatus.PENDING

Requested on: Demo_datasite of type Datasite

Requested by: Data Scientist ([email protected])

Changes: Request to change process_all (Pool Id: default-pool) to permission RequestStatus.APPROVED. Nested Requests not resolved.