Custom API Endpoints (.api.services.<custom>)#
Estimated reading time: 16’
What you’ll learn#
This guide’s objective is to show data owners how to expand the existing Python API to:
expose API methods that can be run without approval (pre-approved code)
create API bridges to enable access to private assets hosted on third-party platforms
Introduction#
Custom API endpoints are extensions of the Python API available in the Syft Client of a Datasite that allow admins and data owners to expand the methods available to researchers.
This is not something new:
This is - in fact - exactly what you do when you create a Syft Function. For example, if you have defined and submitted the following:
@syft_function_single_use(df=some_dataset)
def my_special_method():
...
you would have extended the <client>.code API with your custom method, which can only be run upon approval as <client>.code.my_special_method().
In this case, however, admins define in advance the custom methods data scientists can run, alongside the above functionality, to serve various goals.
Let’s walk through a few scenarios where this is particularly useful.
1. Data owner: My private data is in Microsoft Azure SQL Database and too big to be moved to PySyft
Let’s say you are storing dozens of TBs of data in a SQL database. There are multiple concerns that could come to your mind if you were to directly upload it to PySyft’s blob storage:
Extra work is needed to host the most recent snapshot
Duplicating the data is costly
SQL databases are very efficient at processing SQL queries, and it would be cheaper to use the Microsoft Azure SQL engine for computation
The same way you would host a mock and private dataset asset on PySyft, you can define a bridge to your mock and private dataset.
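To make the bridge idea concrete, here is a minimal, hypothetical sketch of what the private half of such a bridge could do: forward a bounded query to Azure SQL instead of moving the data. The helper name, table name, and settings key are assumptions for illustration, not part of PySyft's API.

```python
# Hypothetical sketch: the private side of an Azure SQL bridge builds a
# bounded query and runs it on the database engine instead of in PySyft.

def build_sample_query(table: str, limit: int) -> str:
    """Build a T-SQL query that returns at most `limit` rows."""
    if not table.isidentifier():
        raise ValueError("invalid table name")
    if not 0 < limit <= 1000:
        raise ValueError("limit out of range")
    return f"SELECT TOP {limit} * FROM {table}"

# Inside a @sy.api_endpoint_method-decorated private function, you would
# connect with credentials kept in context.settings and run the query, e.g.:
#   import pyodbc
#   conn = pyodbc.connect(context.settings["connection_string"])
#   rows = conn.cursor().execute(build_sample_query("events", length)).fetchall()
```

The query stays parameterised and bounded, so the endpoint never ships the full dataset back through PySyft.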
2. Data owner: My model is hosted on Google Vertex AI and too big to be moved to PySyft
Let’s say you are storing your proprietary model in Google Vertex. Likewise, you might be concerned that if you were to host it directly in PySyft:
Extra work is needed to host the most recent model checkpoint
Training or inference on the model requires a lot of resources (e.g. GPUs), or it is optimised to run on TPUs; or, simply, compute ends up being cheaper when running it via Google Vertex.
The same way you would host a mock and private model on PySyft, you can define a bridge to your mock and private model.
3. Data owner: I want to expose pre-approved methods for my data scientists
Let’s say you already know what your data scientists might need and request. To make it easier for them to conduct their research, you can expose a few public API endpoints they can use.
An extensive how-to-guide on how to create such an API bridge to a third-party platform is available here.
import syft as sy
datasite = sy.orchestra.launch(
name="test-datasite",
dev_mode=False,
create_producer=True,
n_consumers=1,
reset=True,
port="auto",
)
client = datasite.login(email="[email protected]", password="changethis")
client.register(
email="[email protected]",
password="verysecurepassword",
password_verify="verysecurepassword",
name="New User",
)
guest_client = datasite.login(email="[email protected]", password="verysecurepassword")
Starting test-datasite server on 0.0.0.0:54319
Waiting for server to start Done.
You have launched a development server at http://0.0.0.0:54319. It is intended only for local use.
Logged into <test-datasite: High side Datasite> as <[email protected]>
You are using a default password. Please change the password using `[your_client].account.set_password([new_password])`.
Logged into <test-datasite: High side Datasite> as <[email protected]>
Create API endpoints#
Public API endpoint#
A public API endpoint is essentially a method that anyone can use freely, without your approval. This is how you can expose pre-approved methods.
The elements of an endpoint are:
a sy.api_endpoint decorator with the following args:
path: composed of an API name and a subpath, as in api_name.subpath; once defined in this manner, your API is going to be available under <client>.api.services.api_name.subpath
description: supports Markdown for extensive explanations on how researchers are expected to use it
settings: optional; a dictionary with values you might want to access in your API endpoint. This is ideal for passing API keys, since this way no one can see or access them.
a context argument: always part of the method signature, this allows you to access settings variables or the user client, and it looks like this:
from typing import Any

@sy.api_endpoint(
    path="my_api.subpath",
    description="This is an example endpoint",
    settings={"key": "value"}
)
def my_preapproved_func(
    context,
    query: str,
) -> Any:
    pass
Let’s take an example:
@sy.api_endpoint(
path="my_org.sum_numbers",
description="This is an example endpoint",
)
def my_preapproved_func(
context,
number_1,
number_2
):
return number_1 + number_2
client.custom_api.add(endpoint=my_preapproved_func)
Endpoint successfully created.
Let’s test it quickly!
client.api.services.my_org.sum_numbers(number_1=2, number_2=3)
Pointer
5
Note
Passing values: due to the way they are defined, custom API endpoints always require specifying all arguments in full as kwargs.
Twin API endpoint#
As we saw before, custom API endpoints can help as a bridge, enabling access to private assets using the mock-private paradigm.
To do so, you have to define two methods, one for mock access and one for private access, and then create the final endpoint, as such:
Mock method:
@sy.api_endpoint_method(
settings={"key": MY_API_KEY}
)
def mock_method(context, query:str):
pass
Private method:
@sy.api_endpoint_method(
settings={"key": MY_API_KEY}
)
def private_method(context, query:str):
pass
and using both, you can just define a twin endpoint, including its path and description:
new_endpoint = sy.TwinAPIEndpoint(
path="my_org.my_data",
mock_function=mock_method,
private_function=private_method,
description="This is an example endpoint"
)
Let’s see an example:
MOCK_STORAGE_API_KEY = ""
PRIVATE_STORAGE_API_KEY = ""
@sy.api_endpoint_method(
settings={"key": MOCK_STORAGE_API_KEY}
)
def mock_get_sample_method(context, length: int):
import pandas as pd
# CALL external API using the key
# result = call(key=context.settings.key, length=length)
return pd.DataFrame.from_dict({'data': ['mock_example', 'another_mock_example']})
@sy.api_endpoint_method(
settings={"key": PRIVATE_STORAGE_API_KEY}
)
def private_get_sample_method(context, length: int):
import pandas as pd
# CALL external API using the key
# result = call(key=context.settings.key, length=length)
return pd.DataFrame.from_dict({'data': ['private_example', 'another_private_example']})
new_endpoint = sy.TwinAPIEndpoint(
path="my_org.get_data_sample",
mock_function=mock_get_sample_method,
private_function=private_get_sample_method,
description="Get a fixed number of rows of the data."
)
client.custom_api.add(endpoint=new_endpoint)
Endpoint successfully created.
Context Manager#
As you can see above, a context argument is part of the method signature. This is what helps you:
Settings: access settings and any non-public secrets, such as API keys
Example:
context.settings["key"]
Stateful APIs: keep state across runs of the same API by a user, allowing you to build upon prior state or implement functionality like rate-limiting
Example:
context.state["time_of_run"] = time()
User metadata: access metadata about the running user
Example:
context.user.email
Running user’s client: access the running user’s client to fetch various information, such as their email, or even do operations on their behalf, such as submitting queries
Example:
context.user_client.code.request_code_execution(execute_query)
Your own external code: you can pass external methods or classes that you define in the process to the API endpoint definition via helper_functions; they become available under context.code
Example:
def count_user_runs_in_last_hour(context):
    # inspect context.state to count recent runs
    ...

@sy.api_endpoint_method(
    helper_functions=[count_user_runs_in_last_hour]
)
def my_endpoint(context):
    context.code.count_user_runs_in_last_hour(context)
    ...
This leaves a great amount of flexibility to the admin. To look at an example, please refer to this guide.
Using third-party libraries#
It is very common, especially when setting up bridges to external platforms, to need third-party Python packages, such as the libraries these platforms offer for interacting with their APIs.
In this case, you can configure your custom API endpoints to always run on a custom worker pool (that runs the appropriate image). This can be specified as such:
public API endpoint
@sy.api_endpoint(
path="my_org.sum_numbers",
description="This is an example endpoint",
worker_pool="math-libs-pool"
)
def my_preapproved_func(context, number_1, number_2):
import my_third_party_math_lib
return number_1 + number_2
Twin API endpoint
new_endpoint = sy.TwinAPIEndpoint(
path="my_org.get_data_sample",
mock_function=mock_get_sample_method,
private_function=private_get_sample_method,
description="Get a fixed number of rows of the data.",
worker_pool="azure-sql-pool"
)
View & use API endpoints#
By this point, the work of defining the behaviour of the custom API endpoints is done, and we would like to understand how to use them, whether as an admin, data owner, or data scientist.
Note that admins and data owners have equal permissions in this case.
Fetch all available custom endpoints#
All users can inspect the available endpoints via:
client.custom_api
TwinAPIEndpoint List
Total: 0
We can even look at individual endpoints further, by using the path defined above, as such:
client.api.services.my_org.get_data_sample
API: my_org.get_data_sample
Description: Get a fixed number of rows of the data.
Private Code:
def private_get_sample_method(context, length: int):
import pandas as pd
# CALL external API using the key
# result = call(key=context.settings.key, length=length)
return pd.DataFrame.from_dict({'data': ['private_example', 'another_private_example']})
Public Code:
def mock_get_sample_method(context, length: int):
import pandas as pd
# CALL external API using the key
# result = call(key=context.settings.key, length=length)
return pd.DataFrame.from_dict({'data': ['mock_example', 'another_mock_example']})
Viewing code definition#
You can see above that as an admin, we can see all the code we defined.
By default, data scientists can only see the mock definition. However, this can be updated so that data scientists see nothing of how an endpoint works (check the update part below).
guest_client.api.services.my_org.get_data_sample
API: my_org.get_data_sample
Description: Get a fixed number of rows of the data.
Private Code:
N / A
Public Code:
def mock_get_sample_method(context, length: int):
import pandas as pd
# CALL external API using the key
# result = call(key=context.settings.key, length=length)
return pd.DataFrame.from_dict({'data': ['mock_example', 'another_mock_example']})
Running it directly#
For a public endpoint, you do not need any approval - you run it directly and get the final answer without anyone needing to approve your query, as such:
# As a data scientist / data owner / admin
guest_client.api.services.my_org.sum_numbers(number_1=2, number_2=3)
Pointer
5
This creates a job which is immediately executed, and a result is returned. The jobs created for a specific endpoint can be inspected via client.jobs:
guest_client.jobs
Job List
Total: 0
However, for twin endpoints, which have both a mock and a private version, this is a multi-step process that includes an approval step.
Let’s see first what we are able to run without approval:
# As a data scientist / data owner / admin
guest_client.api.services.my_org.get_data_sample.mock(length=2)
| | data |
|---|---|
| 0 | mock_example |
| 1 | another_mock_example |
# As a data scientist
guest_client.api.services.my_org.get_data_sample.private(length=2)
SyftError: You're not allowed to run this code.
# As a data owner / admin
client.api.services.my_org.get_data_sample.private(length=2)
| | data |
|---|---|
| 0 | private_example |
| 1 | another_private_example |
Warning
Who can access the private one: as we can see, data scientists are restricted to public and mock custom API runs. To run the real one, approval from the data owner is needed, which is possible by defining a syft function.
Use it in your syft function#
Custom API endpoints can be used like any other private asset on your server and can be specified as part of the input policy.
Note
Using .mock or .private is great for testing, similar to how one interacts with datasets or models. However, when passing an endpoint as an input, we expect you want to compute against the private data, so PySyft only allows you to run it once approved.
@sy.syft_function_single_use(
data_endpoint=guest_client.api.services.my_org.get_data_sample)
def get_sample_fixed_length(data_endpoint):
data = data_endpoint(length=2)
return data.iloc[0]
guest_client.code.request_code_execution(get_sample_fixed_length)
Syft function 'get_sample_fixed_length' successfully created. To add a code request, please create a project using `project = syft.Project(...)`, then use command `project.create_code_request`.
Request
Id: 4dda4d363f9043a589f6f5eae9ff889c
Request time: 2024-08-24 19:48:04
Status: RequestStatus.PENDING
Requested on: Test-datasite of type Datasite
Requested by: New User ([email protected])
Changes: Request to change get_sample_fixed_length (Pool Id: default-pool) to permission RequestStatus.APPROVED. No nested requests.
guest_client.code.get_sample_fixed_length()
Your code is waiting for approval. Code status on server 'test-datasite' is 'UserCodeStatus.PENDING'.
Let’s approve and try again:
# as data owner / admin
client.requests[-1].approve()
# as data scientist
result = guest_client.code.get_sample_fixed_length(data_endpoint=guest_client.api.services.my_org.get_data_sample)
result.get()
Approving request on change get_sample_fixed_length for datasite test-datasite
data private_example
Name: 0, dtype: object
Update API endpoints#
Update code definition#
To update the code definition, you can define a new mock or private method and update it as:
@sy.api_endpoint_method(
settings={"key": MOCK_STORAGE_API_KEY}
)
def new_mock_get_sample_method(context, length: int):
import pandas as pd
return "Oh, look, I updated the mock function!"
client.custom_api.update(endpoint_path="my_org.get_data_sample", mock_function=new_mock_get_sample_method)
Endpoint successfully updated.
client.api.services.my_org.get_data_sample.mock(length=2)
Pointer
‘Oh, look, I updated the mock function!’
You can proceed similarly for the private_function.
Update permissions to read code#
As stated before, by default, data scientists can access the mock definition. In some cases this is very useful, as seeing the code helps the data scientist understand the assumptions behind the mock response.
However, it is possible to hide it by updating the appropriate setting for an endpoint:
client.custom_api.update(endpoint_path="my_org.get_data_sample", hide_mock_definition=True)
Endpoint successfully updated.
# as data scientist, now:
guest_client.api.services.my_org.get_data_sample
API: my_org.get_data_sample
Description: Get a fixed number of rows of the data.
Private Code:
N / A
Public Code:
N / A
Update timeout length#
This specifies how many seconds an API endpoint is given before timing out. By default, this value is 60s.
client.custom_api.update(endpoint_path="my_org.get_data_sample", endpoint_timeout=120) # Update to 120s
Endpoint successfully updated.
Delete API endpoints#
Deleting an API endpoint is as simple as:
client.custom_api.delete(endpoint_path="my_org.get_data_sample")
Endpoint successfully deleted.
client.custom_api
TwinAPIEndpoint List