API Documentation

GinkgoAIClient

class ginkgo_ai_client.client.GinkgoAIClient(api_key: str | None = None, polling_delay: float = 1)[source]

A client for the public Ginkgo AI models API.

Parameters:
  • api_key (str (optional)) – The API key to use for the Ginkgo AI API. If none is provided, the GINKGOAI_API_KEY environment variable will be used.

  • polling_delay (float (default: 1)) – The delay between polling requests to the Ginkgo AI API, in seconds.

Examples

from ginkgo_ai_client.client import GinkgoAIClient
from ginkgo_ai_client.queries import MaskedInferenceQuery

client = GinkgoAIClient()
query = MaskedInferenceQuery("MPK<mask><mask>RRL", model="ginkgo-aa0-650m")
response = client.send_request(query)
# response.sequence == "MPKYLRRL"
other_query = MaskedInferenceQuery("MES<mask><mask>YKL", model="ginkgo-aa0-650m")
responses = client.send_batch_request([query, other_query])
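
The api_key and polling_delay parameters can also be set explicitly rather than through the environment (a hedged sketch; the key value is a placeholder):

import os
from ginkgo_ai_client.client import GinkgoAIClient

# Option 1: rely on the GINKGOAI_API_KEY environment variable
os.environ["GINKGOAI_API_KEY"] = "my-api-key"  # placeholder value
client = GinkgoAIClient()

# Option 2: pass the key explicitly and poll for results every 2 seconds
client = GinkgoAIClient(api_key="my-api-key", polling_delay=2)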
send_batch_request(queries: List[QueryBase], timeout: float = None, on_failed_queries: Literal['ignore', 'warn', 'raise'] = 'ignore') → List[Any][source]

Send multiple queries at once to the Ginkgo AI API in batch mode.

All the queries are sent at once and the returned list has results in the same order as the queries. Additionally, if the queries have a query_name attribute, it will be preserved in the query_name attribute of the results.

Parameters:
  • queries (List[QueryBase]) – The queries to send to the Ginkgo AI API (their content depends on the model used). These will typically be built using the query classes in ginkgo_ai_client.queries.

  • timeout (float (optional)) – The maximum time to wait for the batch to complete, in seconds.

  • on_failed_queries (Literal["ignore", "warn", "raise"] = "ignore") – What to do if some queries fail. The default ("ignore") returns the failures as part of the results, carrying the corresponding query_name, and leaves it to the user to check and handle them (see the sketch after the example below). "warn" prints a warning if some queries failed, and "raise" raises an exception if at least one query failed.

Returns:

responses – A list of responses from the Ginkgo AI API. The class of each response depends on the class of the corresponding query. If some queries failed, the failures are returned in their place (see on_failed_queries).

Return type:

List[Any]

Raises:

RequestError – If the request failed due to the query content or a system error.

Examples

client = GinkgoAIClient()
queries = [
    MaskedInferenceQuery("MPK<mask><mask>RRL", model="ginkgo-aa0-650m"),
    MaskedInferenceQuery("MES<mask><mask>YKL", model="ginkgo-aa0-650m")
]
responses = client.send_batch_request(queries)
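
Because on_failed_queries defaults to "ignore", failures are returned alongside the successful responses. A hedged sketch of checking for them, continuing the example above (it assumes failed entries are returned as RequestError instances, described further below):

from ginkgo_ai_client.client import RequestError

responses = client.send_batch_request(queries, on_failed_queries="ignore")
failures = [r for r in responses if isinstance(r, RequestError)]
successes = [r for r in responses if not isinstance(r, RequestError)]
if failures:
    # each failure carries the original query, so the failed queries can be re-submitted
    retried = client.send_batch_request([f.query for f in failures])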
send_request(query: QueryBase, timeout: float = 60) → Any[source]

Send a query to the Ginkgo AI API.

Parameters:
  • query (QueryBase) – The query to send to the Ginkgo AI API.

  • timeout (float (default: 60)) – The maximum time to wait for the query to complete, in seconds.

Returns:

The response from the Ginkgo AI API. Its class depends on the query class: for instance, a MaskedInferenceQuery returns a SequenceResponse whose sequence attribute holds the predicted sequence. See the docstrings of the query classes in ginkgo_ai_client.queries.

Return type:

Response

Raises:

RequestError – If the request failed due to the query content or a system error. The exception carries the original query and (if it reached that stage) the url it was polling for the results.

send_requests_by_batches(queries: List[QueryBase] | Iterator[QueryBase], batch_size: int = 20, timeout: float = None, on_failed_queries: Literal['ignore', 'warn', 'raise'] = 'ignore', max_concurrent: int = 3, show_progress: bool = True)[source]

Send many queries to the Ginkgo AI API, processing them in successive small batches.

This method is useful for sending large numbers of queries to the Ginkgo AI API and processing the results in small batches as they become ready. It avoids running out of RAM from holding thousands of requests and their results in memory, and avoids overwhelming the web API servers.

The method divides the queries into small batches, submits the batches to the web API (only 3 batches are submitted at the same time by default), and yields the list of results for each batch as soon as that batch is complete.

Important warning: this means that the batch results are not returned strictly in the same order as the batches were sent. The best way to attribute results to inputs is to give each input query a query_name attribute, which will be preserved in the query_name attribute of the results (see the second sketch in the examples below). This is done automatically by some query methods such as .iter_from_fasta(), which assigns each sequence's name to its query.

Examples

model="esm2-650m"
queries = MeanEmbeddingQuery.iter_from_fasta("sequences.fasta", model=model)
    for batch_result in client.send_requests_by_batches(queries, batch_size=10):
         for query_result in batch_result:
              query_result.write_to_jsonl("results.jsonl")
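
A second hedged sketch, matching results back to inputs explicitly via query_name (the names and sequences are illustrative, and the client from the previous example is reused):

queries = [
    MeanEmbeddingQuery(sequence=seq, model="esm2-650m", query_name=name)
    for name, seq in [("seq_a", "MLKKV"), ("seq_b", "MESAT")]
]
results_by_name = {}
for batch_result in client.send_requests_by_batches(queries, batch_size=1):
    for result in batch_result:
        results_by_name[result.query_name] = result
# results_by_name["seq_a"].embedding is the mean embedding of "MLKKV"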
Parameters:
  • queries (Union[List[QueryBase], Iterator[QueryBase]]) – The queries to send to the Ginkgo AI API. This can be a list, any other iterable, or an iterator.

  • batch_size (int (default: 20)) – The size of the batches to send to the Ginkgo AI API.

  • timeout (float (optional)) – The maximum time to wait for one batch to complete, in seconds.

  • on_failed_queries (Literal["ignore", "warn", "raise"] = "ignore") – What to do if some queries fail. The default ("ignore") returns the failures as part of the results, carrying the corresponding query_name, and leaves it to the user to check and handle them. "warn" prints a warning if some queries failed, and "raise" raises an exception if at least one query failed.

exception ginkgo_ai_client.client.RequestError(cause: Exception, query: QueryBase | None = None, result_url: str | None = None)[source]

An exception raised by a request, due to the query content or a system error.

This exception carries the original query and the result url to enable users to better handle failure cases.

Parameters:
  • cause (Exception) – The original exception that caused the request to fail.

  • query (QueryBase (optional)) – The query that failed. This enables users to retrieve and retry the failed queries in a batch query.

  • result_url (str (optional)) – The url where the result can be retrieved from. This enables users to get the result later if the failure cause was a temporary network error or an accidental timeout.
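
A hedged sketch of handling this exception (it assumes the constructor arguments above are stored as attributes of the same names):

from ginkgo_ai_client.client import GinkgoAIClient, RequestError
from ginkgo_ai_client.queries import MaskedInferenceQuery

client = GinkgoAIClient()
query = MaskedInferenceQuery(sequence="MPK<mask><mask>RRL", model="ginkgo-aa0-650m")
try:
    response = client.send_request(query)
except RequestError as error:
    print(error.query)       # the failed query, e.g. to re-submit it later
    print(error.result_url)  # may allow fetching the result after a temporary failure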

Mean embedding queries

Used to get embedding vectors for protein or nucleotide sequences, using models such as ESM, Ginkgo-AA0, etc.

class ginkgo_ai_client.queries.MeanEmbeddingQuery(*, sequence: str, model: str, query_name: str | None = None)[source]

A query to infer mean embeddings from a DNA or protein sequence.

Parameters:
  • sequence (str) – The sequence to embed. It may contain "<mask>" tokens, e.g. "MLPP<mask>PPLM".

  • model (str) – The model to use for the inference.

  • query_name (Optional[str] = None) – The name of the query. It will appear in the API response and can be used to handle exceptions.

Returns:

client.send_request(query) returns an EmbeddingResponse with attributes embedding (the mean embedding of the model’s last encoder layer) and query_name (the original query’s name).

Return type:

EmbeddingResponse

Examples

>>> query = MeanEmbeddingQuery("MLPP<mask>PPLM", model="ginkgo-aa0-650M")
>>> client.send_request(query)
EmbeddingResponse(embedding=[1.05, 0.002, ...])
class ginkgo_ai_client.queries.EmbeddingResponse(*, embedding: List[float], query_name: str | None = None)[source]

A response to a MeanEmbeddingQuery, with attributes embedding (the mean embedding of the model’s last encoder layer) and query_name (the original query’s name).
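
A hedged sketch of one way to use the returned vectors, here comparing two sequences with a cosine similarity (numpy, the sequences, and the query names are illustrative and not part of this API):

import numpy as np
from ginkgo_ai_client.client import GinkgoAIClient
from ginkgo_ai_client.queries import MeanEmbeddingQuery

client = GinkgoAIClient()
queries = [
    MeanEmbeddingQuery(sequence=seq, model="ginkgo-aa0-650M", query_name=name)
    for name, seq in [("wild_type", "MLPPKPPLM"), ("variant", "MLPPRPPLM")]
]
responses = client.send_batch_request(queries)
vectors = {r.query_name: np.array(r.embedding) for r in responses}
cosine = float(
    vectors["wild_type"] @ vectors["variant"]
    / (np.linalg.norm(vectors["wild_type"]) * np.linalg.norm(vectors["variant"]))
)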

Masked inference queries

Used to get maximum-likelihood predictions for masked protein or nucleotide sequences, using models such as ESM, Ginkgo-AA0, etc.

class ginkgo_ai_client.queries.MaskedInferenceQuery(*, sequence: str, model: str, query_name: str | None = None)[source]

A query to infer masked tokens in a DNA or protein sequence.

Parameters:
  • sequence (str) – The sequence to unmask. The sequence should be of the form “MLPP<mask>PPLM” with as many masks as desired.

  • model (str) – The model to use for the inference (only “ginkgo-aa0-650M” is supported for now).

  • query_name (Optional[str] = None) – The name of the query. It will appear in the API response and can be used to handle exceptions.

Returns:

client.send_request(query) returns a SequenceResponse with attributes sequence (the predicted sequence) and query_name (the original query’s name).

Return type:

SequenceResponse
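
Examples

A hedged usage sketch, mirroring the client example above (the predicted residues depend on the model):

>>> client = GinkgoAIClient()
>>> query = MaskedInferenceQuery(sequence="MPK<mask><mask>RRL", model="ginkgo-aa0-650m")
>>> response = client.send_request(query)
>>> response.sequence  # e.g. "MPKYLRRL"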

class ginkgo_ai_client.queries.SequenceResponse(*, sequence: str, query_name: str | None = None)[source]

A response to a MaskedInferenceQuery, with attributes sequence (the predicted sequence) and query_name (the original query’s name).

Promoter activity prediction queries

Used to predict the activity of promoters in various human tissues, using Borzoi and Ginkgo’s Promoter-0.

class ginkgo_ai_client.queries.PromoterActivityQuery(*, promoter_sequence: str, orf_sequence: str, tissue_of_interest: Dict[str, List[str]], source: str, inference_framework: Literal['promoter-0'] = 'promoter-0', borzoi_model: Literal['human-fold0'] = 'human-fold0', query_name: str | None = None)[source]

A query to infer the activity of a promoter in different tissues.

Parameters:
  • promoter_sequence (str) – The promoter sequence. Only ATGCN characters are allowed.

  • orf_sequence (str) – The ORF sequence. Only ATGCN characters are allowed.

  • tissue_of_interest (Dict[str, List[str]]) – The tissues of interest, with the tracks representing each tissue, for instance {“heart”: [“CNhs10608+”, “CNhs10612+”], “liver”: [“CNhs10608+”, “CNhs10612+”]}.

  • query_name (Optional[str] = None) – The name of the query. It will appear in the API response and can be used to handle exceptions.

  • inference_framework (Literal["promoter-0"] = "promoter-0") – The inference framework to use for the inference. Currently only "promoter-0" is supported.

  • borzoi_model (Literal["human-fold0"] = "human-fold0") – The model to use for the inference. Currently only supports the trained model of “human-fold0”.

Returns:

client.send_request(query) returns a PromoterActivityResponse with attributes activity_by_tissue (the activity of the promoter in each tissue) and query_name (the original query’s name).

Return type:

PromoterActivityResponse
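
Examples

A hedged usage sketch (the sequences are placeholders, and the track names reuse those from the tissue_of_interest example above):

client = GinkgoAIClient()
query = PromoterActivityQuery(
    promoter_sequence="GATTACAGATTACAGATTACA",  # placeholder promoter (ATGCN only)
    orf_sequence="ATGGCCAAGTAA",                 # placeholder ORF
    tissue_of_interest={"heart": ["CNhs10608+", "CNhs10612+"]},
    query_name="promoter_variant_1",
)
response = client.send_request(query)
# response.activity_by_tissue is a Dict[str, float], e.g. {"heart": 0.42}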

class ginkgo_ai_client.queries.PromoterActivityResponse(*, activity_by_tissue: Dict[str, float], query_name: str | None = None)[source]

A response to a PromoterActivityQuery, with attributes activity_by_tissue (the predicted activity in each tissue) and query_name (the original query’s name).

activity_by_tissue

The activity of the promoter in each tissue.

Type:

Dict[str, float]

query_name

The name of the query. It will appear in the API response and can be used to handle exceptions.

Type:

Optional[str] = None

Diffusion queries

Used to generate protein or nucleotide sequences using the Ginkgo-developed diffusion models LCDNA and AB-Diffusion.

class ginkgo_ai_client.queries.DiffusionMaskedQuery(*, sequence: str, temperature: float = 0.5, decoding_order_strategy: str = 'entropy', unmaskings_per_step: int = 50, model: str, query_name: str | None = None)[source]

A query to perform masked sampling using a diffusion model.

Parameters:
  • sequence (str) – Input sequence for masked sampling. The sequence may contain “<mask>” tokens.

  • temperature (float, optional (default=0.5)) – Sampling temperature, a value between 0 and 1.

  • decoding_order_strategy (str, optional (default="entropy")) – Strategy for decoding order, must be either “max_prob” or “entropy”.

  • unmaskings_per_step (int, optional (default=50)) – Number of tokens to unmask per step, an integer between 1 and 1000.

  • model (str) – The model to use for the inference.

  • query_name (Optional[str] = None) – The name of the query. It will appear in the API response and can be used to handle exceptions.

Returns:

client.send_request(query) returns a DiffusionMaskedResponse with attributes sequence (the predicted sequence) and query_name (the original query’s name).

Return type:

DiffusionMaskedResponse

Examples

>>> query = DiffusionMaskedQuery(
...     sequence="ATTG<mask>TAC",
...     model="lcdna",
...     temperature=0.7,
...     decoding_order_strategy="entropy",
...     unmaskings_per_step=20,
... )
>>> client.send_request(query)
DiffusionMaskedResponse(sequence="ATTGCGTAC", query_name=None)
class ginkgo_ai_client.queries.DiffusionMaskedResponse(*, sequence: str, query_name: str | None = None)[source]

A response to a DiffusionMaskedQuery, with attributes sequence (the predicted sequence) and query_name (the original query’s name).
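
A hedged sketch of sampling several completions at different temperatures in a single batch (the sequence and query names are illustrative):

from ginkgo_ai_client.client import GinkgoAIClient
from ginkgo_ai_client.queries import DiffusionMaskedQuery

client = GinkgoAIClient()
queries = [
    DiffusionMaskedQuery(
        sequence="ATTG<mask><mask><mask>TAC",
        model="lcdna",
        temperature=t,
        decoding_order_strategy="entropy",
        unmaskings_per_step=20,
        query_name=f"sample_T{t}",
    )
    for t in (0.2, 0.5, 0.8)
]
responses = client.send_batch_request(queries)
for response in responses:
    print(response.query_name, response.sequence)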

Boltz structure inference queries

Used to predict the 3D structure of a protein sequence using Boltz.

class ginkgo_ai_client.queries.BoltzStructurePredictionQuery(*, sequences: List[Dict[Literal['protein', 'ligand'], _Protein | _CCD | _Smiles]], model: Literal['boltz'] = 'boltz', query_name: str | None = None)[source]

A query to predict the structure of a protein using the Boltz model.

This type of query is most easily constructed using the from_yaml_file or from_protein_sequence class methods.

Parameters:
  • sequences (List[Dict[Literal["protein", "ligand"], Union[_Protein, _CCD, _Smiles]]]) – The sequences to predict the structure for. Only protein sequences of size <1000aa are supported for now.

  • model (Literal["boltz"] = "boltz") – The model to use for the inference (only Boltz(1) is supported for now).

  • query_name (Optional[str] = None) – The name of the query. It will appear in the API response and can be used to handle exceptions.

Examples

query = BoltzStructurePredictionQuery.from_yaml_file("input.yaml")
# or, from a single protein sequence:
query = BoltzStructurePredictionQuery.from_protein_sequence("MLLKP")
response = client.send_request(query)
response.download_structure("structure.cif")
# or, as a PDB file:
response.download_structure("structure.pdb")
class ginkgo_ai_client.queries.BoltzStructurePredictionResponse(*, cif_file_url: str, confidence_data: Dict[str, Any], query_name: str | None = None)[source]

A response to a BoltzStructurePredictionQuery, with attributes cif_file_url, confidence_data, and query_name.

cif_file_url

The URL of the cif file.

Type:

str

confidence_data

The confidence data.

Type:

Dict[str, Any]

query_name

The name of the query. It will appear in the API response and can be used to handle exceptions.

Type:

Optional[str] = None

Examples

response = BoltzStructurePredictionResponse(
    cif_file_url="https://example.com/structure.cif",
    confidence_data={"confidence": 0.95},
    query_name="my_query",
)
response.download_structure("structure.cif") # or...
response.download_structure("structure.pdb")