Custom API Endpoints (.api.services.<custom>)#

Estimated reading time: 16’

What you’ll learn#

This guide’s objective is to show data owners how to expand the existing Python API to:

  • expose API methods that can be run without approval (pre-approved code)

  • create API bridges to enable access to private assets hosted on third-party platforms

Introduction#

Custom API endpoints are extensions of the Python API available in the Syft Client of a Datasite. They allow admins and data owners to expand the set of methods available to researchers.

This is not something new:

This is - in fact - exactly what you do when you create a Syft Function. For example, if you have defined and submitted the following:

@syft_function_single_use(df=some_dataset)
def my_special_method():
    ...

you would have extended our <client>.code API with your custom method, which can only be run upon approval as <client>.code.my_special_method().

With custom API endpoints, however, admins define in advance the custom methods data scientists can run, complementing the functionality above to serve various goals.

Let’s walk through a few scenarios where this is particularly useful.

|:data_owner:| 1. Data owner: My private data is in Microsoft Azure SQL Database and too big to be moved to PySyft

Let’s say you are storing dozens of TBs of data in a SQL database. There are multiple concerns that could come to your mind if you were to directly upload it to PySyft’s blob storage:

  • Extra work is needed to host the most recent snapshot

  • Duplicating the data is costly

  • SQL databases are very efficient at processing SQL queries, so it would be cheaper to use the Microsoft Azure SQL engine for computation

The same way you would host a mock and private dataset asset on PySyft, you can define a bridge to your mock and private dataset.
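The bridge idea can be sketched in plain Python before looking at the real PySyft API further below. All names here are illustrative: the mock side returns canned rows, while the private side would run the query against Azure SQL (the hypothetical connection call is left commented out).

```python
# Illustrative sketch of the mock/private bridge idea. The real PySyft version
# uses @sy.api_endpoint_method and sy.TwinAPIEndpoint, shown later in this guide.

def mock_query(sql: str) -> list:
    # Canned rows, so data scientists can develop against the schema freely
    return [{"id": 1, "value": "mock_row"}]

def private_query(sql: str) -> list:
    # In a real bridge this would run against Azure SQL, e.g. via pyodbc:
    # conn = pyodbc.connect(AZURE_SQL_CONNECTION_STRING)  # hypothetical secret
    # return conn.execute(sql).fetchall()
    return [{"id": 1, "value": "private_row"}]

def run(sql: str, approved: bool) -> list:
    # The datasite dispatches to the private side only for approved requests
    return private_query(sql) if approved else mock_query(sql)

print(run("SELECT * FROM t", approved=False))  # → [{'id': 1, 'value': 'mock_row'}]
```

The key design point is that both sides share one signature, so code written against the mock runs unchanged against the private data once approved.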

|:data_owner:| 2. Data owner: My model is hosted on Google Vertex AI and too big to be moved to PySyft

Let’s say you are storing your proprietary model in Google Vertex. Likewise, you might be concerned that if you were to host it directly in PySyft:

  • Extra work is needed to host the most recent model checkpoint

  • Training or inference on the model requires a lot of resources (e.g. GPUs), or it is optimised to run on TPUs; or compute simply ends up being cheaper when run via Google Vertex AI.

The same way you would host a mock and private model on PySyft, you can define a bridge to your mock and private model.

|:data_owner:| 3. Data owner: I know in advance what my data scientists need

Let’s say you already know what your data scientists might need and request. To make it easier for them to conduct their research, you can expose a few public API endpoints they can use.

An extensive how-to guide on creating such an API bridge to a third-party platform is available here.

Hide code cell source
import syft as sy

datasite = sy.orchestra.launch(
    name="test-datasite",
    dev_mode=False,
    create_producer=True,
    n_consumers=1,
    reset=True,
    port="auto",
)

client = datasite.login(email="[email protected]", password="changethis")
client.register(
    email="[email protected]",
    password="verysecurepassword",
    password_verify="verysecurepassword",
    name="New User",
)
guest_client = datasite.login(email="[email protected]", password="verysecurepassword")
Hide code cell output
Starting test-datasite server on 0.0.0.0:54319
Waiting for server to start Done.
SyftInfo:
You have launched a development server at http://0.0.0.0:54319. It is intended only for local use.

Logged into <test-datasite: High side Datasite> as <[email protected]>
SyftWarning:
You are using a default password. Please change the password using `[your_client].account.set_password([new_password])`.

Logged into <test-datasite: High side Datasite> as <[email protected]>

|:data_owner:| Create API endpoints#

Public API endpoint#

A public API endpoint is essentially a method that anyone can use freely, without your approval. This is how you can expose pre-approved methods.

The elements of an endpoint are:

  • a sy.api_endpoint decorator with the following args:

    • path: composed of an API name and a subpath, as such api_name.subpath; once defined in this manner, your API is going to be available under <client>.api.services.api_name.subpath

    • description: supports markdown for extensive explanations on how researchers are expected to use it

    • settings: optional; a dictionary with values you might want to access in your API endpoint. This is ideal for passing API keys, since this way no one can see or access them.

  • context: always part of the method signature; this allows you to access settings variables or the user client, and it looks like this:

@sy.api_endpoint(
    path="my_api.subpath",
    description="This is an example endpoint",
    settings={"key": "value"}
)
def my_preapproved_func(
    context,
    query: str,
):
    pass

Let’s take an example:

@sy.api_endpoint(
    path="my_org.sum_numbers",
    description="This is an example endpoint",
)
def my_preapproved_func(
    context,
    number_1,
    number_2
):
    return number_1 + number_2
client.custom_api.add(endpoint=my_preapproved_func)
SyftSuccess:
Endpoint successfully created.

Let’s give it a quick test!

client.api.services.my_org.sum_numbers(number_1=2, number_2=3)

Pointer

5

Note

Passing values: due to the way they are defined, custom API endpoints always require specifying arguments in full as keyword arguments.
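The behaviour can be mimicked in plain Python with keyword-only parameters (an illustrative analogy; PySyft enforces this at the API layer rather than via this syntax):

```python
# Keyword-only parameters mimic the endpoint's calling convention
def sum_numbers(*, number_1, number_2):
    return number_1 + number_2

print(sum_numbers(number_1=2, number_2=3))  # → 5

try:
    sum_numbers(2, 3)  # positional arguments are rejected
except TypeError as e:
    print("TypeError:", e)
```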

Twin API endpoint#

As we saw before, custom API endpoints can act as a bridge, enabling access to private assets using the mock/private paradigm.

To do so, you have to define two methods, one for mock access and one for private access, and then create the final endpoint from them, as such:

  • Mock method:

@sy.api_endpoint_method(
    settings={"key": MY_API_KEY}
)
def mock_method(context, query:str):
    pass
  • Private method:

@sy.api_endpoint_method(
    settings={"key": MY_API_KEY}
)
def private_method(context, query:str):
    pass

and using both, you can define a twin endpoint, specifying its path and description:

new_endpoint = sy.TwinAPIEndpoint(
    path="my_org.my_data",
    mock_function=mock_method,
    private_function=private_method,
    description="This is an example endpoint"
)

Let’s see an example:

MOCK_STORAGE_API_KEY = ""
PRIVATE_STORAGE_API_KEY = ""

@sy.api_endpoint_method(
    settings={"key": MOCK_STORAGE_API_KEY}
)
def mock_get_sample_method(context, length: int):
    import pandas as pd
    # CALL external API using the key
    # result = call(key=context.settings.key, length=length)
    return pd.DataFrame.from_dict({'data': ['mock_example', 'another_mock_example']})

@sy.api_endpoint_method(
    settings={"key": PRIVATE_STORAGE_API_KEY}
)
def private_get_sample_method(context, length: int):
    import pandas as pd
    # CALL external API using the key
    # result = call(key=context.settings.key, length=length)
    return pd.DataFrame.from_dict({'data': ['private_example', 'another_private_example']})
new_endpoint = sy.TwinAPIEndpoint(
    path="my_org.get_data_sample",
    mock_function=mock_get_sample_method,
    private_function=private_get_sample_method,
    description="Get a fixed number of rows of the data."
)

client.custom_api.add(endpoint=new_endpoint)
SyftSuccess:
Endpoint successfully created.

Context Manager#

As you can see above, a context argument is part of the method signature. This is what helps you:

  • Settings: access settings and any non-public secrets, such as API keys

    Example: context.settings["key"]

  • Stateful APIs: keep state; you can save state across runs of the same API by a user, allowing you to build upon prior state or implement functionality like rate limiting

    Example: context.state["time_of_run"] = time()

  • Access metadata about the running user

    Example: context.user.email

  • Access running user’s client: access the running user’s client to fetch various information, such as email, or even do operations on their behalf, such as submitting queries.

    Example: context.user_client.code.request_code_execution(execute_query).

  • Your own external code: you can pass external methods or classes that you define to the API endpoint definition via `helper_functions`

    Example:

def count_user_runs_in_last_hour(state):
    # ...

@sy.api_endpoint_method(
    helper_functions=[count_user_runs_in_last_hour]
)
def my_endpoint(context):
    context.code.count_user_runs_in_last_hour(context.state)
    # ...

This leaves a great amount of flexibility to the admin. To look at an example, please refer to this guide.
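For instance, the stateful rate-limiting idea mentioned above can be sketched in plain Python. A dict stands in for `context.state`, and the limit and window size are illustrative choices:

```python
import time

RATE_LIMIT = 5          # max calls per window (illustrative)
WINDOW_SECONDS = 3600   # one hour

def allow_call(state: dict) -> bool:
    """Record a run in the state and return whether the call is allowed."""
    now = time.time()
    # Keep only timestamps that fall inside the current window
    runs = [t for t in state.get("runs", []) if now - t < WINDOW_SECONDS]
    if len(runs) >= RATE_LIMIT:
        state["runs"] = runs
        return False
    runs.append(now)
    state["runs"] = runs
    return True

state = {}
print([allow_call(state) for _ in range(6)])  # → [True, True, True, True, True, False]
```

Inside a real endpoint, the same logic would read and write `context.state` instead of a local dict, so the counts persist across runs by the same user.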

Using third-party libraries#

It is very common, especially when setting up bridges to external platforms, to need third-party Python packages, such as the Python libraries these platforms offer for interacting with their APIs.

In this case, you can configure your custom API endpoints to always run on a custom worker pool (that runs the appropriate image). This can be specified as such:

  • public API endpoint

@sy.api_endpoint(
    path="my_org.sum_numbers",
    description="This is an example endpoint",
    worker_pool="math-libs-pool"
)
def my_preapproved_func(context, number_1, number_2):
    import my_third_party_math_lib
    return number_1 + number_2
  • Twin API endpoint

new_endpoint = sy.TwinAPIEndpoint(
    path="my_org.get_data_sample",
    mock_function=mock_get_sample_method,
    private_function=private_get_sample_method,
    description="Get a fixed number of rows of the data.",
    worker_pool="azure-sql-pool"
)

|:data_scientist:| |:data_owner:| View & use API endpoints#

By this point, the work of defining the behaviour of the custom API endpoints is done; let's look at how to use them, whether as admin, data owner, or data scientist.

Note that admins and data owners have equal permissions in this case.

Fetch all available custom endpoints#

All users can inspect the available endpoints via:

client.custom_api

TwinAPIEndpoint List

Total: 0

We can look at individual endpoints in more detail by using the path defined above, as such:

client.api.services.my_org.get_data_sample

API: my_org.get_data_sample

Description: Get a fixed number of rows of the data.

Private Code:

def private_get_sample_method(context, length: int):
    import pandas as pd
    # CALL external API using the key
    # result = call(key=context.settings.key, length=length)
    return pd.DataFrame.from_dict({'data': ['private_example', 'another_private_example']})

Public Code:

def mock_get_sample_method(context, length: int):
    import pandas as pd
    # CALL external API using the key
    # result = call(key=context.settings.key, length=length)
    return pd.DataFrame.from_dict({'data': ['mock_example', 'another_mock_example']})

Viewing code definition#

You can see above that as an admin, we can see all the code we defined.

By default, data scientists can only see the mock definition. However, this can be updated so that data scientists see nothing of how an endpoint works (check the update part below).

guest_client.api.services.my_org.get_data_sample

API: my_org.get_data_sample

Description: Get a fixed number of rows of the data.

Private Code:

N / A

Public Code:

def mock_get_sample_method(context, length: int):
    import pandas as pd
    # CALL external API using the key
    # result = call(key=context.settings.key, length=length)
    return pd.DataFrame.from_dict({'data': ['mock_example', 'another_mock_example']})

Running it directly#

For a public endpoint, no approval is needed: you run it directly and get the final answer, as such:

# As a data scientist / data owner / admin

guest_client.api.services.my_org.sum_numbers(number_1=2, number_2=3)

Pointer

5

This creates a job which is immediately executed and a result is returned. The jobs created for a specific endpoint can be inspected via client.jobs:

guest_client.jobs

Job List

Total: 0

However, for Twin API Endpoints, which have both a mock and a private version, this is a multi-step process that includes an approval step.

Let’s see first what we are able to run without approval:

# As a data scientist / data owner / admin
guest_client.api.services.my_org.get_data_sample.mock(length=2)
data
0 mock_example
1 another_mock_example
# As a data scientist
guest_client.api.services.my_org.get_data_sample.private(length=2)
SyftError: You're not allowed to run this code.

# As a data owner / admin
client.api.services.my_org.get_data_sample.private(length=2)
data
0 private_example
1 another_private_example

Warning

Who can access the private one: as we can see, data scientists are restricted to public and mock custom API runs. To run the private one, approval from the data owner is needed, which can be requested by defining a syft function.

Use it in your syft function#

Custom API endpoints can be used like any other private asset on your server and can be specified as part of the input policy.

Note

Using .mock or .private is great for testing, similar to how one interacts with datasets or models. However, when passing an endpoint as an input, we expect you want to compute against the private data, so PySyft only allows you to run it once approved.

@sy.syft_function_single_use(
    data_endpoint=guest_client.api.services.my_org.get_data_sample)
def get_sample_fixed_length(data_endpoint):
    data = data_endpoint(length=2)
    return data.iloc[0]

guest_client.code.request_code_execution(get_sample_fixed_length)
SyftSuccess:
Syft function 'get_sample_fixed_length' successfully created. To add a code request, please create a project using `project = syft.Project(...)`, then use command `project.create_code_request`.

Request

Id: 4dda4d363f9043a589f6f5eae9ff889c

Request time: 2024-08-24 19:48:04

Status: RequestStatus.PENDING

Requested on: Test-datasite of type Datasite

Requested by: New User ([email protected])

Changes: Request to change get_sample_fixed_length (Pool Id: default-pool) to permission RequestStatus.APPROVED. No nested requests.

guest_client.code.get_sample_fixed_length()
SyftError:
 Your code is waiting for approval. Code status on server 'test-datasite' is 'UserCodeStatus.PENDING'.

Let’s approve and try again:

# as data owner / admin
client.requests[-1].approve()

# as data scientist
result = guest_client.code.get_sample_fixed_length(data_endpoint=guest_client.api.services.my_org.get_data_sample)

result.get()
Approving request on change get_sample_fixed_length for datasite test-datasite
data    private_example
Name: 0, dtype: object

|:data_owner:| Update API endpoints#

Update code definition#

To update the code definition, you can define a new mock or private method and update it as:

@sy.api_endpoint_method(
    settings={"key": MOCK_STORAGE_API_KEY}
)
def new_mock_get_sample_method(context, length: int):
    return "Oh, look, I updated the mock function!"


client.custom_api.update(endpoint_path="my_org.get_data_sample", mock_function=new_mock_get_sample_method)
SyftSuccess:
Endpoint successfully updated.

client.api.services.my_org.get_data_sample.mock(length=2)

Pointer

‘Oh, look, I updated the mock function!’

You can proceed similarly for the private_function.

Update permissions to read code#

As stated before, by default data scientists can access the mock definition. In some cases this is very useful, as seeing the code helps data scientists understand the assumptions behind the mock response.

However, it is possible to hide it by updating the appropriate setting for an endpoint:

client.custom_api.update(endpoint_path="my_org.get_data_sample", hide_mock_definition=True)
SyftSuccess:
Endpoint successfully updated.

# as data scientist, now:
guest_client.api.services.my_org.get_data_sample

API: my_org.get_data_sample

Description: Get a fixed number of rows of the data.

Private Code:

N / A

Public Code:

N / A

Update timeout length#

This specifies how many seconds an API endpoint may run before timing out. By default, this value is 60 seconds.

client.custom_api.update(endpoint_path="my_org.get_data_sample", endpoint_timeout=120) # Update to 120s
SyftSuccess:
Endpoint successfully updated.

|:data_owner:| Delete API endpoints#

Deleting an API endpoint is as simple as:

client.custom_api.delete(endpoint_path="my_org.get_data_sample")
SyftSuccess:
Endpoint successfully deleted.

client.custom_api

TwinAPIEndpoint List

Total: 0