Part 4: Review Code Request


Code Review Workflow

By now, we have seen how Rachel submitted her research project through PySyft; it is currently awaiting review by Owen, the data owner.

What you will learn

By the end of part 4, you will learn:

  • How to access incoming project requests;

  • How to review user code;

  • How to approve a code request.

4.1. Review Incoming Requests



As always, the first step is to log in to the Datasite. This time, we will log in using Owen's credentials as the data owner.

import syft as sy

data_site = sy.orchestra.launch(name="cancer-research-centre")

client = data_site.login(email="[email protected]", password="cancer_research_syft_admin")
SyftInfo:
You have launched a development server at http://0.0.0.0:None. It is intended only for local use.

Logged into <cancer-research-centre: High side Datasite> as <[email protected]>

Then, we can get access to existing projects through our client instance:

client.projects

Project List

Total: 0

As expected, the Datasite currently includes a request from Rachel for her "Breast Cancer ML Project". Looking at the description, Owen can get a general understanding of what to expect in the incoming code request.

Let's access the request so it can be inspected further. Existing requests can be accessed by index:

request = client.requests[0]
request

Request

Id: 500366fbe5eb48abb8670eed8242d1af

Request time: 2024-07-25 10:41:44

Status: RequestStatus.PENDING

Requested on: Cancer-research-centre of type Datasite

Requested by: Dr. Rachel Science ([email protected]) Institution: Data Science Institute

Changes: Request to change ml_experiment_on_breast_cancer_data (Pool Id: default-pool) to permission RequestStatus.APPROVED. No nested requests.
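Since `client.requests` behaves like a Python list, the usual list patterns apply when triaging an inbox with many entries. The sketch below uses stand-in objects — the `Request` class and its fields here are illustrative, not PySyft's actual API:

```python
from dataclasses import dataclass

# Hypothetical stand-in for PySyft's Request object, for illustration only.
@dataclass
class Request:
    id: str
    status: str  # e.g. "PENDING", "APPROVED", "DENIED"

def pending_requests(requests):
    """Return only the requests still awaiting review."""
    return [r for r in requests if r.status == "PENDING"]

inbox = [Request("500366fb", "PENDING"), Request("a1b2c3d4", "APPROVED")]
print([r.id for r in pending_requests(inbox)])  # ['500366fb']
```

In the tutorial there is only one request, so indexing with `[0]` is enough; on a busier Datasite, filtering by status first helps a reviewer focus on what is actionable.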

Starting from the request object, we can immediately get a reference to the code associated with it. This is the code submitted by the data scientist and attached to the original project.

Before proceeding to test the code execution, the data owner can review the code and double-check that the expectations set in the project description are met:

request.code

UserCode

id: UID = 0fd60c6f0e9d46ae80bdf8a23e3a39a9

service_func_name: str = ml_experiment_on_breast_cancer_data

shareholders: list = ['cancer-research-centre']

status: list = ['Server: cancer-research-centre, Status: pending']

inputs: dict =

{
  "assets": {
    "features_data": {
      "action_id": "7ce00a4444414d8db52e61c2e8c4393f",
      "source_asset": "Breast Cancer Data: Features",
      "source_dataset": "Breast Cancer Biomarker",
      "source_server": "151a31aa3965414dbbf9f27d4e4108aa"
    },
    "labels": {
      "action_id": "96cf1a9e9fd64af6ad843b7e40b6fe89",
      "source_asset": "Breast Cancer Data: Targets",
      "source_dataset": "Breast Cancer Biomarker",
      "source_server": "151a31aa3965414dbbf9f27d4e4108aa"
    }
  }
}

code:

def ml_experiment_on_breast_cancer_data(features_data, labels, seed: int = 12345) -> tuple[float, float]:
    # include the necessary imports in the main body of the function
    # to prepare for what PySyft would expect for submitted code.
    from sklearn.model_selection import train_test_split
    from sklearn.preprocessing import StandardScaler
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import accuracy_score

    X, y = features_data, labels.values.ravel()
    # 1. Data Partition
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=seed, stratify=y)
    # 2. Data normalisation
    scaler = StandardScaler()
    scaler.fit(X_train, y_train)
    X_train = scaler.transform(X_train)
    X_test = scaler.transform(X_test)
    # 3. Model training
    model = LogisticRegression().fit(X_train, y_train)
    # 4. Metrics Calculation
    acc_train = accuracy_score(y_train, model.predict(X_train))
    acc_test = accuracy_score(y_test, model.predict(X_test))

    return acc_train, acc_test

Warning

Privacy & Security Code Review

Code requests must go through a security and privacy assessment before approval. To ensure privacy is respected, the data owner must check that the code abides by their own data release rules. To ensure security, it is good practice to keep an eye out for malicious code requests. Security code review guides can be a useful reference for this assessment.
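One simple screening step a reviewer might apply is a static scan of the submitted source for imports that have no business in a data science function. This is not a PySyft feature — the module blocklist below is purely illustrative, and a real review policy is up to the data owner:

```python
import ast

# Illustrative blocklist: modules a reviewer might flag in submitted code.
SUSPICIOUS_MODULES = {"os", "subprocess", "socket", "shutil", "requests"}

def flag_suspicious_imports(source: str) -> set:
    """Return the names of flagged modules imported by the submitted code."""
    tree = ast.parse(source)
    found = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            found.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            found.add(node.module.split(".")[0])
    return found & SUSPICIOUS_MODULES

safe = "from sklearn.linear_model import LogisticRegression"
risky = "import os\nos.system('cat /etc/passwd')"
print(flag_suspicious_imports(safe))   # set()
print(flag_suspicious_imports(risky))  # {'os'}
```

A check like this is only a first pass — it cannot catch every exfiltration attempt, so it complements rather than replaces a careful human read of the code.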

4.2. Execute Code


Having reviewed the code, the next step for Owen is to execute it on both the mock and the real data of the assets specified in the submitted code. From Rachel's code, we can see that the function expects both the features and labels assets, as available in the "Breast Cancer Biomarker" dataset.

First, let's get a reference to the specific syft function:

syft_function = request.code

Then, let's get access to the required assets:

bc_dataset = client.datasets["Breast Cancer Biomarker"]
features, labels = bc_dataset.assets

At this point, the data owner can first run the syft_function on features.mock and labels.mock, and then repeat the same on features.data and labels.data:

result_mock_data = syft_function.run(features_data=features.mock, labels=labels.mock)
result_mock_data
SyftWarning:
This code was submitted by a User and could be UNSAFE.

(0.6737089201877934, 0.5874125874125874)

Execution on mock data

The results on mock data are exactly the same as those we originally obtained with Rachel's code in part 3. This is because Rachel's code uses the random seed properly, yielding reproducible results.
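Why does a fixed seed make the results repeatable? The seed pins down the pseudo-random shuffle behind the train/test split, so every run partitions the rows identically. A minimal stdlib sketch of the idea (the `split_indices` helper is illustrative, not part of scikit-learn or PySyft):

```python
import random

def split_indices(n: int, test_size: float, seed: int):
    """Shuffle row indices deterministically and split them into train/test."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)  # seeded generator -> same order every run
    cut = int(n * (1 - test_size))
    return idx[:cut], idx[cut:]

# The same seed always yields the same partition, hence identical metrics.
assert split_indices(100, 0.25, seed=12345) == split_indices(100, 0.25, seed=12345)
```

This is what scikit-learn's `random_state=seed` argument to `train_test_split` does in Rachel's function, which is why Owen's mock run reproduces her numbers exactly.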

Having checked that the code runs on the mock data, we can now test it on the real data and gather the results Rachel is waiting for:

result_real_data = syft_function.run(features_data=features.data, labels=labels.data)
result_real_data
SyftWarning:
This code was submitted by a User and could be UNSAFE.

(0.9859154929577465, 0.972027972027972)
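The two-step pattern above — run on mock data first, then on real data only once the mock run succeeds — can be captured in a small helper. This is a sketch of the workflow, not a PySyft API; the `toy_experiment` function below is a stand-in for a submitted syft function:

```python
def review_run(fn, mock_inputs: dict, real_inputs: dict) -> dict:
    """Run a reviewed function on mock data first, then on real data,
    mirroring the manual two-step check shown above."""
    results = {"mock": fn(**mock_inputs)}
    # Reaching this line means the mock run completed without raising,
    # so it is reasonable to proceed to the real, non-public data.
    results["real"] = fn(**real_inputs)
    return results

# Toy stand-in for a submitted function: returns a (train_acc, test_acc) pair.
def toy_experiment(features_data, labels):
    return (len(features_data) / 100, len(labels) / 100)

out = review_run(toy_experiment,
                 mock_inputs={"features_data": range(60), "labels": range(60)},
                 real_inputs={"features_data": range(98), "labels": range(98)})
print(out)  # {'mock': (0.6, 0.6), 'real': (0.98, 0.98)}
```

Running on mock data first means any crash or obviously malformed output is caught before the private data is ever touched.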

4.3. Approve the Request



Now that we have reviewed, checked, and tested Rachel's function on the selected assets, and gathered the results on the real, non-public data, Owen can proceed to approve the code request:

request.approve()
Approving request on change ml_experiment_on_breast_cancer_data for datasite cancer-research-centre
SyftSuccess:
Request 500366fbe5eb48abb8670eed8242d1af changes applied

We can verify that the request has been approved by looking at the status of the currently available requests:

client.requests

Request List

Total: 0

As expected, the status of Rachel's request is now Approved.

Congrats on completing Part 4 🎉

Congratulations on completing part 4 of the tutorial.

In this part, we explored the entire workflow of request review and approval in PySyft. In particular, after receiving a new request from Rachel, Owen could read the purpose of her study, review the submitted code, and check that all the expectations and data access criteria were met. Finally, after executing the code on both mock and real data, the request was approved.

In the last part, part 5, we will see how Rachel can retrieve the result she is waiting for, now that her request has been approved.