Part 4: Review Code Request


Code Review Workflow

So far, we have seen how Rachel submitted her research project through PySyft. The project is now waiting for review by Owen, the data owner.

What you will learn

By the end of part 4 you will learn:

  • How to access incoming project requests;

  • How to review user code;

  • How to approve a code request.

4.1. Review Incoming Requests



As always, the first step is to log in to the Datasite. This time, we will log in using Owen’s credentials as a data owner.

import syft as sy

data_site = sy.orchestra.launch(name="cancer-research-centre")

client = data_site.login(email="[email protected]", password="cancer_research_syft_admin")

Then, we can get access to existing projects through our client instance:

client.projects

As expected, the Datasite currently includes a request from Rachel for her “Breast Cancer ML Project”. Looking at the description, Owen can get a general understanding of what to expect in the incoming code request.

Let’s retrieve the request so that it can be inspected further. Existing requests can be accessed by index:

request = client.requests[0]
request

Starting from the request object, we can immediately get a reference to the code associated with it. This is the code submitted by the data scientist and attached to the original project.

Before testing the code execution, the data owner can review the code and double-check that the expectations set in the project description are met:

request.code

Warning

Privacy & Security Code Review

Code reviews must include a security and privacy assessment before approval. To ensure that privacy is respected, the data owner must check that the code abides by their own data release rules. To ensure security, it is good practice to keep an eye out for malicious code requests. Here are a few security code review links which you might find useful:
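As a complement to reading the code manually, part of the security check can be automated. The sketch below is not a PySyft feature: it is a minimal, hypothetical helper built on Python’s standard `ast` module, and it assumes the submitted source is available as a plain string. The deny-lists and the function name `flag_suspicious_code` are illustrative assumptions, to be adapted to your own release rules.

```python
import ast

# Hypothetical deny-lists, for illustration only.
SUSPICIOUS_MODULES = {"os", "subprocess", "socket", "shutil", "requests", "urllib"}
SUSPICIOUS_CALLS = {"eval", "exec", "open", "compile", "__import__"}


def flag_suspicious_code(source: str) -> list[str]:
    """Return findings for imports and calls that deserve a closer look."""
    findings = []  # (line number, message) pairs
    for node in ast.walk(ast.parse(source)):
        # Collect the top-level package names of any import statement.
        if isinstance(node, ast.Import):
            roots = [alias.name.split(".")[0] for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            roots = [(node.module or "").split(".")[0]]
        else:
            roots = []
        for root in roots:
            if root in SUSPICIOUS_MODULES:
                findings.append((node.lineno, f"import of '{root}'"))
        # Flag direct calls to risky built-ins such as eval() or open().
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            if node.func.id in SUSPICIOUS_CALLS:
                findings.append((node.lineno, f"call to '{node.func.id}()'"))
    return [f"line {lineno}: {message}" for lineno, message in sorted(findings)]


# A made-up submission, used only to exercise the helper.
example_source = '''
import os

def hypothetical_experiment(features_data, labels):
    os.system("curl http://example.invalid")
    return eval("1 + 1")
'''

for finding in flag_suspicious_code(example_source):
    print(finding)
```

A scan like this only surfaces obvious red flags (here, the `os` import and the `eval` call). It cannot prove that code is safe, so it supports the human review rather than replacing it.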

4.2. Execute Code


After reviewing the code, Owen’s next step is to execute it on both the mock and the real data of the assets specified in the submitted code. From Rachel’s code, we can see that the function expects both the features and labels assets, as available in the “Breast Cancer Biomarker” dataset.

First, let’s get the reference to the specific syft function:

syft_function = request.code

Then, let’s get access to the required assets:

bc_dataset = client.datasets["Breast Cancer Biomarker"]
features, labels = bc_dataset.assets

At this point, the data owner can first run the syft_function on features.mock and labels.mock, and then repeat the same for features.data and labels.data:

result_mock_data = syft_function.run(features_data=features.mock, labels=labels.mock)
result_mock_data

Execution on mock data

The results on mock data are exactly the same as the ones we originally obtained with Rachel’s code in part 3. This is because Rachel’s code sets the random seed properly, which makes the results reproducible.

Having checked that the code runs on the mock data, we can now test it on the real data and gather the results Rachel is waiting for:
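The effect of seeding can be illustrated in isolation. The following is a toy sketch, not Rachel’s actual code: it assumes a made-up function `split_and_score` that shuffles data with a seeded generator from Python’s standard `random` module and returns a stand-in "score". Running it twice with the same seed yields the same split, and therefore the same result.

```python
import random


def split_and_score(values, seed):
    """Shuffle with a local seeded generator, then 'score' the held-out part."""
    rng = random.Random(seed)  # local generator: global random state is untouched
    shuffled = list(values)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * 0.75)
    held_out = shuffled[cut:]
    # Toy metric: the mean of the held-out values stands in for a model score.
    return sum(held_out) / len(held_out)


data = list(range(100))
first_run = split_and_score(data, seed=42)
second_run = split_and_score(data, seed=42)
print(first_run == second_run)  # True: same seed, same split, same score
```

This is why the data owner and the data scientist can compare mock results directly: with the seed fixed, any difference would point to a change in code or data rather than to randomness.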

result_real_data = syft_function.run(features_data=features.data, labels=labels.data)
result_real_data
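The "mock first, then real" order used above can be sketched as a small reusable pattern. The example below does not use PySyft objects: `ToyAsset`, `run_mock_then_real`, and `count_positive` are hypothetical names introduced only for illustration, with `ToyAsset` standing in for an asset that exposes both a mock and a private version.

```python
from dataclasses import dataclass


@dataclass
class ToyAsset:
    """Hypothetical stand-in for an asset with a mock and a private version."""
    mock: list
    data: list


def run_mock_then_real(func, features, labels):
    """Run func on mock inputs first; real data is touched only if that succeeds."""
    # If this call raises, the exception propagates and the real data is never used.
    mock_result = func(features_data=features.mock, labels=labels.mock)
    real_result = func(features_data=features.data, labels=labels.data)
    # Sanity check: both runs should produce the same kind of output.
    if type(mock_result) is not type(real_result):
        raise TypeError("mock and real results have different types")
    return mock_result, real_result


def count_positive(features_data, labels):
    """Toy analysis: number of samples and number of positive labels."""
    return {"n_samples": len(features_data), "n_positive": sum(labels)}


features = ToyAsset(mock=[[0.1], [0.2]], data=[[1.3], [0.7], [2.1]])
labels = ToyAsset(mock=[0, 1], data=[1, 0, 1])

mock_result, real_result = run_mock_then_real(count_positive, features, labels)
print(mock_result)  # {'n_samples': 2, 'n_positive': 1}
print(real_result)  # {'n_samples': 3, 'n_positive': 2}
```

Running on mock data first means that crashes and obvious mistakes surface before any private data is read.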

4.3. Approve the request



Now that we have reviewed and tested Rachel’s function on the selected assets, and gathered the result on the real, non-public data, Owen can proceed to approve the code request:

request.approve()

We can verify that the request has been approved by looking at the status of the currently available requests:

client.requests

As expected, the status of Rachel’s request is now Approved.
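The steps that precede approval can also be made explicit as a simple gate. The sketch below is independent of PySyft: `ReviewChecklist` is a hypothetical helper, assumed here only to illustrate that approval should follow the code review, the mock run, and the real run.

```python
from dataclasses import dataclass


@dataclass
class ReviewChecklist:
    """Hypothetical record of the checks completed before approving a request."""
    code_reviewed: bool = False
    ran_on_mock: bool = False
    ran_on_real: bool = False

    def missing_steps(self):
        """Names of the checks that have not been completed yet."""
        return [name for name, done in vars(self).items() if not done]

    def ready_to_approve(self):
        """True only when every check has been completed."""
        return not self.missing_steps()


checklist = ReviewChecklist(code_reviewed=True, ran_on_mock=True)
print(checklist.missing_steps())     # ['ran_on_real']

checklist.ran_on_real = True
print(checklist.ready_to_approve())  # True
```

In practice, a data owner might call the approval step only once such a gate reports that nothing is missing.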

Congrats on completing Part 4 🎉

Congratulations on completing part 4 of the tutorial.

In this part we have explored the entire workflow of request review and approval in PySyft. In particular, after receiving a new request from Rachel, Owen could read the purpose of her study, review the submitted code, and check that all the expectations and data access criteria were met. Finally, after the code had been executed on both mock and real data, the request was approved.

In the last part, part 5, we will see how Rachel can retrieve the result she is waiting for once her request has been approved.