Ginkgo’s AI model API client

Work in progress: this repo was just made public and we are still working on integration

A Python client for Ginkgo’s AI model API, to run inference on public and Ginkgo-proprietary models. Learn more in the Model API announcement.

Prerequisites

Register at https://models.ginkgobioworks.ai/ to get credits and an API key (of the form xxxxxxx-xxxx-xxxx-xxxx-xxxxxxxx). Store the API key in the GINKGOAI_API_KEY environment variable.
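
As a minimal sketch, you can export the variable in your shell or set it from Python before creating the client (the key below is a placeholder):

import os

# Placeholder value; use the key obtained from https://models.ginkgobioworks.ai/
os.environ["GINKGOAI_API_KEY"] = "xxxxxxx-xxxx-xxxx-xxxx-xxxxxxxx"

# Alternatively, pass the key explicitly: GinkgoAIClient(api_key="xxxxxxx-...")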

Installation

Install the Python client with pip:

pip install ginkgo-ai-client

Usage

Note: This is an alpha version of the client and its interface may change in the future.

Example: masked inference with Ginkgo’s AA0 model

The client requires an API key (it defaults to os.environ.get("GINKGOAI_API_KEY") if none is explicitly provided).

from ginkgo_ai_client import GinkgoAIClient, aa0_masked_inference_params

client = GinkgoAIClient()
prediction = client.query(aa0_masked_inference_params("MPK<mask><mask>RRL"))
# prediction["sequence"] == "MPKYLRRL"

predictions = client.batch_query([
    aa0_masked_inference_params("MPK<mask><mask>RRL"),
    aa0_masked_inference_params("M<mask>RL"),
    aa0_masked_inference_params("MLLM<mask><mask>R"),
])
# predictions[0]["result"]["sequence"] == "MPKYLRRL"

Note that you can get ESM2 predictions by using esm_masked_inference_params in the example above.
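
For instance (the predicted sequence in the comment is illustrative):

from ginkgo_ai_client import GinkgoAIClient, esm_masked_inference_params

client = GinkgoAIClient()
prediction = client.query(esm_masked_inference_params("MPK<mask><mask>RRL"))
# prediction["sequence"] == "MPKYLRRL"  (illustrative output)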

Example: embedding computation with Ginkgo’s 3’UTR language model

from ginkgo_ai_client import GinkgoAIClient, three_utr_mean_embedding_params

client = GinkgoAIClient()
prediction = client.query(three_utr_mean_embedding_params("ATTGCG"))
# prediction["embedding"] == [1.05, -2.34, ...]

predictions = client.batch_query([
    three_utr_mean_embedding_params("ATTGCG"),
    three_utr_mean_embedding_params("CAATGC"),
    three_utr_mean_embedding_params("GCGCACATGT"),
])
# predictions[0]["result"]["embedding"] == [1.05, -2.34, ...]

Available models

See the example folder and reference docs for more details on usage and parameters.

| Model | Description | Reference | Supported queries | Versions |
|-------|-------------|-----------|-------------------|----------|
| ESM2 | Large protein language model from Meta | Github | Embeddings, masked inference | 3B, 650M |
| AA0 | Ginkgo’s proprietary protein language model | Announcement | Embeddings, masked inference | 650M |
| 3UTR | Ginkgo’s proprietary 3’UTR language model | Preprint | Embeddings, masked inference | v1 |

License

This project is licensed under the MIT License. See the LICENSE file for details.

Releases

Make sure the changelog is up to date, increment the version in pyproject.toml, create a new tag, then create a release on GitHub (publication to PyPI is automated).


API Documentation

GinkgoAIClient

class ginkgo_ai_client.client.GinkgoAIClient(api_key: str | None = None, polling_delay: float = 1)

A client for the public Ginkgo AI models API.

Parameters:
  • api_key (str (optional)) – The API key to use for the Ginkgo AI API. If none is provided, the GINKGOAI_API_KEY environment variable will be used.

  • polling_delay (float (default: 1)) – The delay between polling requests to the Ginkgo AI API, in seconds.

Examples

from ginkgo_ai_client import GinkgoAIClient, aa0_masked_inference_params

client = GinkgoAIClient()
query_params = aa0_masked_inference_params("MPK<mask><mask>RRL")
response = client.query(query_params)
# response["sequence"] == "MPKYLRRL"
responses = client.batch_query([query_params, other_query_params])
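
A sketch of explicit configuration using the constructor parameters documented above (the key value is a placeholder):

client = GinkgoAIClient(
    api_key="xxxxxxx-xxxx-xxxx-xxxx-xxxxxxxx",  # placeholder; defaults to the GINKGOAI_API_KEY environment variable
    polling_delay=2,  # poll the API every 2 seconds instead of the default 1 second
)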

batch_query(params_list: List[Dict], timeout: float = None) → List[Dict]

Query the Ginkgo AI API in batch mode.

Parameters:
  • params_list (list of dict) – The parameters of the queries to send to the Ginkgo AI API (these depend on the model used). They will typically be generated using the helper methods in ginkgo_ai_client.queries.

  • timeout (float (optional)) – The maximum time to wait for the batch to complete, in seconds.

Returns:

The responses from the Ginkgo AI API. Their content depends on the query; see the docstrings in ginkgo_ai_client.queries.

Return type:

list of dict
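
For illustration, a batched call with an explicit timeout, reusing the client and the *_params helpers shown elsewhere in these docs (result access follows the README example):

queries = [
    aa0_masked_inference_params("MPK<mask><mask>RRL"),
    aa0_masked_inference_params("M<mask>RL"),
]
# Wait at most 5 minutes for the whole batch to complete.
responses = client.batch_query(queries, timeout=300)
# For masked inference, responses[0]["result"]["sequence"] holds the unmasked sequence.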

query(params: Dict, timeout: float = 60) → Dict

Query the Ginkgo AI API with a single query.

Parameters:
  • params (dict) – The parameters of the query to send to the Ginkgo AI API (these depend on the model used). They will typically be generated using the helper methods ending in *_params.

  • timeout (float (default: 60)) – The maximum time to wait for the query to complete, in seconds.

Returns:

The response from the Ginkgo AI API, for instance {"sequence": "ATG…"}. Its content depends on the query; see the docstrings of the helper methods ending in *_params.

Return type:

dict

Query Parameters

Helpers for generating query parameters for the Ginkgo AI API.

ginkgo_ai_client.query_parameters.aa0_masked_inference_params(sequence: str, model: str = 'ginkgo-aa0-650M') → Dict

Generate the query parameters for a masked inference query with Ginkgo’s AA0 protein-language model.

Parameters:
  • sequence (str) – The sequence to unmask. The sequence should be of the form “MLPP<mask>PPLM” with as many masks as desired.

  • model (str (default: "ginkgo-aa0-650M")) – The model to use for the inference (only “ginkgo-aa0-650M” is supported for now).

Query results:
  • sequence (str) – The predicted sequence, where every masked position has been replaced by the amino acid with the highest probability at this position.

Examples

>>> client.query(aa0_masked_inference_params("MLPP<mask>PPLM<mask>"))
>>> # {"sequence": "MLPPKPPLMR"}

ginkgo_ai_client.query_parameters.aa0_mean_embedding_params(sequence: str, model: str = 'ginkgo-aa0-650M') → Dict

Generate the query parameters for an AA0 mean embedding query.

The mean embedding refers to the mean of the token embedding in the encoder’s last layer.

Parameters:
  • sequence (str) – The sequence for which to compute the mean embedding.

  • model (str (default: "ginkgo-aa0-650M")) – The model to use for the embedding (only “ginkgo-aa0-650M” is supported for now).

Query results:
  • embedding (List[float]) – The mean embedding of the sequence.

Examples

>>> client.query(aa0_mean_embedding_params("MLPP<mask>PPLM"))
>>> # {"embedding": [1.05, 0.002, ...]}

ginkgo_ai_client.query_parameters.esm_masked_inference_params(sequence: str, model: str = 'esm2-650M') → Dict

Generate the query parameters for an ESM masked inference query.

Parameters:
  • sequence (str) – The sequence to unmask. The sequence should be of the form “MLPP<mask>PPLM” with as many masks as desired.

  • model (str (default: "esm2-650M")) – The model to use for the inference (“esm2-650M” or “esm2-3B”).

Query results:
  • sequence (str) – The predicted sequence, where every masked position has been replaced by the amino acid with the highest probability at this position.

Examples

>>> client.query(esm_masked_inference_params("MLPP<mask>PPLM<mask>"))
>>> # {"sequence": "MLPPKPPLMR"}

ginkgo_ai_client.query_parameters.esm_mean_embedding_params(sequence: str, model: str = 'esm2-650M') → Dict

Generate the query parameters for a mean embedding query with the ESM2 protein language model.

The mean embedding of a protein sequence refers to the mean of the token embedding in the encoder’s last layer.

Parameters:
  • sequence (str) – The sequence for which to compute the mean embedding.

  • model (str (default: "esm2-650M")) – The model to use for the embedding (“esm2-650M” or “esm2-3B”).

Query results:
  • embedding (List[float]) – The mean embedding of the sequence.

Examples

>>> client.query(esm_mean_embedding_params("MLPP<mask>PPLM"))
>>> # {"embedding": [1.05, 0.002, ...]}

ginkgo_ai_client.query_parameters.three_utr_masked_inference_params(sequence: str, model: str = 'ginkgo-maskedlm-3utr-v1') → Dict

Generate the query parameters for a masked inference query for Ginkgo’s 3UTR language model.

Parameters:
  • sequence (str) – The sequence to unmask. The sequence should be of the form “ATGC<mask>ATGC” with as many masks as desired.

  • model (str (default: "ginkgo-maskedlm-3utr-v1")) – The model to use for the inference (only “ginkgo-maskedlm-3utr-v1” is supported for now).

Query results:
  • sequence (str) – The predicted sequence, where every masked position has been replaced by the nucleotide (A, T, G, or C) with the highest probability at this position.
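
Examples

A usage sketch in the style of the other helpers (the completed sequence shown in the comment is illustrative):

>>> client.query(three_utr_masked_inference_params("ATTG<mask>G"))
>>> # {"sequence": "ATTGCG"}  (illustrative output)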

ginkgo_ai_client.query_parameters.three_utr_mean_embedding_params(sequence: str, model: str = 'ginkgo-maskedlm-3utr-v1') → Dict

Generate the query parameters for a mean embedding query for Ginkgo’s 3UTR language model.

The mean embedding refers to the mean of the token embedding in the encoder’s last layer.

Parameters:
  • sequence (str) – The sequence for which to compute the mean embedding, of the form “ATGC…”

  • model (str (default: "ginkgo-maskedlm-3utr-v1")) – The model to use for the embedding (only “ginkgo-maskedlm-3utr-v1” is supported for now).

Query results:
  • embedding (List[float]) – The mean embedding of the sequence.

Examples

>>> client.query(three_utr_mean_embedding_params("ATTGCG"))
>>> # {"embedding": [1.05, 0.002, ...]}
