Ginkgo’s AI model API client¶
Work in progress: this repo was just made public and we are still working on integration
A Python client for Ginkgo’s AI model API, to run inference on public and Ginkgo-proprietary models. Learn more in the Model API announcement.
Prerequisites¶
Register at https://models.ginkgobioworks.ai/ to get credits and an API key (of the form xxxxxxx-xxxx-xxxx-xxxx-xxxxxxxx).
Store the API key in the GINKGOAI_API_KEY environment variable.
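Once the client is installed (see below), the key can be supplied either way. A minimal sketch, with a placeholder key value:
from ginkgo_ai_client import GinkgoAIClient

# Option 1: rely on the GINKGOAI_API_KEY environment variable (the default).
client = GinkgoAIClient()

# Option 2: pass the key explicitly (placeholder shown, not a real key).
client = GinkgoAIClient(api_key="xxxxxxx-xxxx-xxxx-xxxx-xxxxxxxx")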
Installation¶
Install the Python client with pip:
pip install ginkgo-ai-client
Usage¶
Note: This is an alpha version of the client and its interface may change in the future.
Example: masked inference with Ginkgo’s AA0 model
The client requires an API key (it defaults to os.environ.get("GINKGOAI_API_KEY") if none is explicitly provided):
from ginkgo_ai_client import GinkgoAIClient, aa0_masked_inference_params
client = GinkgoAIClient()
prediction = client.query(aa0_masked_inference_params("MPK<mask><mask>RRL"))
# prediction["sequence"] == "MPKYLRRL"
predictions = client.batch_query([
aa0_masked_inference_params("MPK<mask><mask>RRL"),
aa0_masked_inference_params("M<mask>RL"),
aa0_masked_inference_params("MLLM<mask><mask>R"),
])
# predictions[0]["result"]["sequence"] == "MPKYLRRL"
Note that you can get ESM predictions by using esm_masked_inference_params instead of aa0_masked_inference_params in the example above, as shown in the sketch below.
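A minimal sketch of that substitution (the predicted sequence is illustrative):
from ginkgo_ai_client import GinkgoAIClient, esm_masked_inference_params

client = GinkgoAIClient()
prediction = client.query(esm_masked_inference_params("MPK<mask><mask>RRL"))
# prediction["sequence"] holds ESM2's unmasked protein sequence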
Example: embedding computation with Ginkgo’s 3’UTR language model
from ginkgo_ai_client import GinkgoAIClient, three_utr_mean_embedding_params
client = GinkgoAIClient()
prediction = client.query(three_utr_mean_embedding_params("ATTGCG"))
# prediction["embedding"] == [1.05, -2.34, ...]
predictions = client.batch_query([
three_utr_mean_embedding_params("ATTGCG"),
three_utr_mean_embedding_params("CAATGC"),
three_utr_mean_embedding_params("GCGCACATGT"),
])
# predictions[0]["result"]["embedding"] == [1.05, -2.34, ...]
Available models¶
See the example folder and reference docs for more details on usage and parameters.
Model | Description | Reference | Supported queries | Versions
---|---|---|---|---
ESM2 | Large protein language model from Meta | | Embeddings, masked inference | 3B, 650M
AA0 | Ginkgo’s proprietary protein language model | | Embeddings, masked inference | 650M
3UTR | Ginkgo’s proprietary 3’UTR language model | | Embeddings, masked inference | v1
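For example, a specific version can be selected through the model argument of the query helpers (a minimal sketch; see the parameter docstrings below for the supported values):
from ginkgo_ai_client import GinkgoAIClient, esm_masked_inference_params

client = GinkgoAIClient()
# Use the 3B-parameter ESM2 checkpoint instead of the 650M default.
prediction = client.query(esm_masked_inference_params("MPK<mask><mask>RRL", model="esm2-3B"))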
License¶
This project is licensed under the MIT License. See the LICENSE file for details.
Releases¶
Make sure the changelog is up to date, increment the version in pyproject.toml, create a new tag, then create a release on GitHub (publication to PyPI is automated).
API Documentation¶
GinkgoAIClient¶
- class ginkgo_ai_client.client.GinkgoAIClient(api_key: str | None = None, polling_delay: float = 1)[source]¶
A client for the public Ginkgo AI models API.
- Parameters:
api_key (str (optional)) – The API key to use for the Ginkgo AI API. If none is provided, the GINKGOAI_API_KEY environment variable will be used.
polling_delay (float (default: 1)) – The delay between polling requests to the Ginkgo AI API, in seconds.
Examples
client = GinkgoAIClient()
query_params = aa0_masked_inference_params("MPK<mask><mask>RRL")
response = client.query(query_params)
# response["sequence"] == "MPKYLRRL"
responses = client.batch_query([query_params, other_query_params])
- batch_query(params_list: List[Dict], timeout: float = None) List[Dict] [source]¶
Query the Ginkgo AI API in batch mode.
- Parameters:
params_list (list of dict) – The parameters of the queries (these depend on the model used) to send to the Ginkgo AI API. These will typically be generated using the helper methods in ginkgo_ai_client.query_parameters.
timeout (float (optional)) – The maximum time to wait for the batch to complete, in seconds.
- Returns:
The responses from the Ginkgo AI API. They will differ depending on the query; see the docstrings of the helper methods in ginkgo_ai_client.query_parameters.
- Return type:
list of dict
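For reference, a minimal batch usage sketch (the timeout value and queries are illustrative):
client = GinkgoAIClient()
queries = [
    aa0_masked_inference_params("MPK<mask><mask>RRL"),
    aa0_masked_inference_params("M<mask>RL"),
]
# Wait at most five minutes for the whole batch to complete.
responses = client.batch_query(queries, timeout=300)
# responses[0]["result"]["sequence"] holds the first unmasked sequence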
- query(params: Dict, timeout: float = 60) Dict [source]¶
- Parameters:
params (dict) – The parameters of the query (these depend on the model used) to send to the Ginkgo AI API. These will typically be generated using the helper methods ending in *_params.
timeout (float (default: 60)) – The maximum time to wait for the query to complete, in seconds.
- Returns:
The response from the Ginkgo AI API, for instance {"sequence": "ATG…"}. It will differ depending on the query; see the docstrings of the helper methods ending in *_params.
- Return type:
dict
Query Parameters¶
Helpers for generating query parameters for the Ginkgo AI API.
- ginkgo_ai_client.query_parameters.aa0_masked_inference_params(sequence: str, model: str = 'ginkgo-aa0-650M') Dict [source]¶
Generate the query parameters for a masked inference query with Ginkgo’s AA0 protein-language model.
- Parameters:
sequence (str) – The sequence to unmask. The sequence should be of the form “MLPP<mask>PPLM” with as many masks as desired.
model (str (default: "ginkgo-aa0-650M")) – The model to use for the inference (only “ginkgo-aa0-650M” is supported for now).
- Query results:
sequence (str) – The predicted sequence where every masked position has been replaced by the amino acid with the highest probability at this position.
Examples
>>> client.query(aa0_masked_inference_params("MLPP<mask>PPLM<mask>"))
>>> # {"sequence": "MLPPKPPLMR"}
- ginkgo_ai_client.query_parameters.aa0_mean_embedding_params(sequence: str, model: str = 'ginkgo-aa0-650M') Dict [source]¶
Generate the query parameters for an AA0 mean embedding query.
The mean embedding refers to the mean of the token embedding in the encoder’s last layer.
- Parameters:
sequence (str) – The sequence for which to compute the mean embedding.
model (str (default: "ginkgo-aa0-650M")) – The model to use for the embedding (only “ginkgo-aa0-650M” is supported for now).
- Query results:
List[float] – The mean embedding of the sequence.
Examples
>>> client.query(aa0_mean_embedding_params("MLPP<mask>PPLM"))
>>> # {"embedding": [1.05, 0.002, ...]}
- ginkgo_ai_client.query_parameters.esm_masked_inference_params(sequence: str, model: str = 'esm2-650M') Dict [source]¶
Generate the query parameters for an ESM masked inference query.
- Parameters:
sequence (str) – The sequence to unmask. The sequence should be of the form “MLPP<mask>PPLM” with as many masks as desired.
model (str (default: "esm2-650M")) – The model to use for the inference (“esm2-650M” or “esm2-3B”).
- Query results:
sequence (str) – The predicted sequence where every masked position has been replaced by the amino acid with the highest probability at this position.
Examples
>>> client.query(esm_masked_inference_params("MLPP<mask>PPLM<mask>"))
>>> # {"sequence": "MLPPKPPLMR"}
- ginkgo_ai_client.query_parameters.esm_mean_embedding_params(sequence: str, model: str = 'esm2-650M') Dict [source]¶
Generate the query parameters for a mean embedding query with the ESM2 protein language model.
The mean embedding of a protein sequence refers to the mean of the token embeddings in the encoder’s last layer.
- Parameters:
sequence (str) – The sequence for which to compute the mean embedding.
model (str (default: "esm2-650M")) – The model to use for the embedding (“esm2-650M” or “esm2-3B”).
- Query results:
List[float] – The mean embedding of the sequence.
Examples
>>> client.query(esm_mean_embedding_params("MLPP<mask>PPLM"))
>>> # {"embedding": [1.05, 0.002, ...]}
- ginkgo_ai_client.query_parameters.three_utr_masked_inference_params(sequence: str, model: str = 'ginkgo-maskedlm-3utr-v1') Dict [source]¶
Generate the query parameters for a masked inference query for Ginkgo’s 3UTR language model.
- Parameters:
sequence (str) – The sequence to unmask. The sequence should be of the form “ATGC<mask>ATGC” with as many masks as desired.
model (str (default: "ginkgo-maskedlm-3utr-v1")) – The model to use for the inference (only “ginkgo-maskedlm-3utr-v1” is supported for now).
- Query results:
sequence (str) – The predicted sequence where every masked position has been replaced by the "ATGC" nucleotide with the highest probability at this position.
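For reference, a usage sketch modeled on the other helpers (the returned sequence is illustrative):
>>> client.query(three_utr_masked_inference_params("ATT<mask>CGG"))
>>> # {"sequence": "ATTGCGG"}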
- ginkgo_ai_client.query_parameters.three_utr_mean_embedding_params(sequence: str, model: str = 'ginkgo-maskedlm-3utr-v1') Dict [source]¶
Generate the query parameters for a mean embedding query for Ginkgo’s 3UTR language model.
The mean embedding refers to the mean of the token embedding in the encoder’s last layer.
- Parameters:
sequence (str) – The sequence for which to compute the mean embedding, of the form “ATGC…”
model (str (default: "ginkgo-maskedlm-3utr-v1")) – The model to use for the embedding (only “ginkgo-maskedlm-3utr-v1” is supported for now).
- Query results:
List[float] – The mean embedding of the sequence.
Examples
>>> client.query(three_utr_mean_embedding_params("ATTGCG"))
>>> # {"embedding": [1.05, 0.002, ...]}