API Documentation¶
GinkgoAIClient¶
- class ginkgo_ai_client.client.GinkgoAIClient(api_key: str | None = None, polling_delay: float = 1)[source]¶
A client for the public Ginkgo AI models API.
- Parameters:
api_key (str (optional)) – The API key to use for the Ginkgo AI API. If none is provided, the GINKGOAI_API_KEY environment variable will be used.
polling_delay (float (default: 1)) – The delay between polling requests to the Ginkgo AI API, in seconds.
Examples
client = GinkgoAIClient()
query = MaskedInferenceQuery("MPK<mask><mask>RRL", model="ginkgo-aa0-650m")
response = client.send_request(query)
# response["sequence"] == "MPKYLRRL"

responses = client.send_batch_request([query_params, other_query_params])
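Both ways of providing credentials described above can be used; a minimal sketch (the key value below is a placeholder):

import os
from ginkgo_ai_client.client import GinkgoAIClient

# Pass the key explicitly:
client = GinkgoAIClient(api_key="YOUR_API_KEY")

# Or rely on the GINKGOAI_API_KEY environment variable:
os.environ["GINKGOAI_API_KEY"] = "YOUR_API_KEY"
client = GinkgoAIClient()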
- send_batch_request(queries: List[QueryBase], timeout: float = None, on_failed_queries: Literal['ignore', 'warn', 'raise'] = 'ignore') List[Any] [source]¶
Send multiple queries at once to the Ginkgo AI API in batch mode.
All the queries are sent at once and the returned list has results in the same order as the queries. Additionally, if a query has a query_name attribute, it will be preserved in the query_name attribute of the corresponding result.
- Parameters:
queries (List[QueryBase]) – The queries (whose parameters depend on the model used) to send to the Ginkgo AI API. These will typically be constructed using the query classes and helper methods in ginkgo_ai_client.queries.
timeout (float (optional)) – The maximum time to wait for the batch to complete, in seconds.
on_failed_queries (Literal["ignore", "warn", "raise"] = "ignore") – What to do if some queries fail. The default ("ignore") leaves the failures in the returned results, where they carry the corresponding query_name; the user then has to check and handle them. "warn" prints a warning if there are failed queries, and "raise" raises an exception if at least one query failed.
- Returns:
responses – A list of responses from the Ginkgo AI API. The class of the responses depends on the class of the queries. If some queries failed (and on_failed_queries is not set to "raise"), the corresponding entries in the list represent the failures and carry the original query_name.
- Return type:
List[Any]
- Raises:
RequestException – If the request failed due to the query content or a system error.
Examples
client = GinkgoAIClient()
queries = [
    MaskedInferenceQuery("MPK<mask><mask>RRL", model="ginkgo-aa0-650m"),
    MaskedInferenceQuery("MES<mask><mask>YKL", model="ginkgo-aa0-650m"),
]
responses = client.send_batch_request(queries)
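A sketch of one way to check for failed queries when on_failed_queries is left at its default. It assumes that failed entries come back as RequestError objects (see the exception below) carrying the failed query; check this against your client version.

from ginkgo_ai_client.client import GinkgoAIClient, RequestError
from ginkgo_ai_client.queries import MaskedInferenceQuery

client = GinkgoAIClient()
queries = [
    MaskedInferenceQuery("MPK<mask><mask>RRL", model="ginkgo-aa0-650m", query_name="q1"),
    MaskedInferenceQuery("MES<mask><mask>YKL", model="ginkgo-aa0-650m", query_name="q2"),
]
responses = client.send_batch_request(queries)

# Assumption: failures appear in the results as RequestError instances.
failures = [r for r in responses if isinstance(r, RequestError)]
for failure in failures:
    print("Failed query:", failure.query)  # the carried query can be re-tried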
- send_request(query: QueryBase, timeout: float = 60) Any [source]¶
Send a query to the Ginkgo AI API.
- Parameters:
query (QueryBase) – The query to send to the Ginkgo AI API.
timeout (float (default: 60)) – The maximum time to wait for the query to complete, in seconds.
- Returns:
The response from the Ginkgo AI API, for instance {“sequence”: “ATG…”}. It will differ depending on the query; see the docstrings of the query classes in ginkgo_ai_client.queries.
- Return type:
Response
- Raises:
RequestError – If the request failed due to the query content or a system error. The exception carries the original query and (if it reached that stage) the url it was polling for the results.
- send_requests_by_batches(queries: List[QueryBase] | Iterator[QueryBase], batch_size: int = 20, timeout: float = None, on_failed_queries: Literal['ignore', 'warn', 'raise'] = 'ignore', max_concurrent: int = 3, show_progress: bool = True)[source]¶
Send multiple queries at once to the Ginkgo AI API in batch mode.
This method is useful for sending large numbers of queries to the Ginkgo AI API and processing the results in small batches as they become ready. It avoids running out of RAM (the client does not hold thousands of requests and their results in memory at once) and avoids overwhelming the web API servers.
The method divides the queries into small batches, submits the batches to the web API (only 3 batches are submitted at the same time by default), and returns the list of results of each batch as soon as that batch is fully ready.
Important warning: this means that the batch results are not returned strictly in the same order as the batches were sent. The best way to attribute results to inputs is to give each input query a query_name attribute, which will be preserved in the query_name attribute of the results; a short sketch follows the parameter list below. This is done automatically by some query constructors such as .iter_from_fasta(), which use the sequence name as each query’s name.
Examples
model="esm2-650m" queries = MeanEmbeddingQuery.iter_from_fasta("sequences.fasta", model=model) for batch_result in client.send_requests_by_batches(queries, batch_size=10): for query_result in batch_result: query_result.write_to_jsonl("results.jsonl")
- Parameters:
queries (Union[List[QueryBase], Iterator[QueryBase]]) – The queries to send to the Ginkgo AI API. This can be a list, or any iterable or iterator.
batch_size (int (default: 20)) – The size of the batches to send to the Ginkgo AI API.
timeout (float (optional)) – The maximum time to wait for one batch to complete, in seconds.
on_failed_queries (Literal["ignore", "warn", "raise"] = "ignore") – What to do if some queries fail. The default ("ignore") leaves the failures in the returned results, where they carry the corresponding query_name; the user then has to check and handle them. "warn" prints a warning if there are failed queries, and "raise" raises an exception if at least one query failed.
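As noted in the warning above, batch results may arrive out of order; a minimal sketch of collecting results keyed by query_name (the sequence names and sequences below are made up):

from ginkgo_ai_client.client import GinkgoAIClient
from ginkgo_ai_client.queries import MeanEmbeddingQuery

client = GinkgoAIClient()
named_sequences = {"seq_1": "MLLKP", "seq_2": "MPKRRL"}  # illustrative inputs
queries = [
    MeanEmbeddingQuery(sequence=seq, model="esm2-650m", query_name=name)
    for name, seq in named_sequences.items()
]

# Collect results by query_name so the batch completion order does not matter.
results_by_name = {}
for batch_result in client.send_requests_by_batches(queries, batch_size=10):
    for result in batch_result:
        results_by_name[result.query_name] = result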
- exception ginkgo_ai_client.client.RequestError(cause: Exception, query: QueryBase | None = None, result_url: str | None = None)[source]¶
An exception raised by a request, due to the query content or a system error.
This exception carries the original query and the result url to enable users to better handle failure cases.
- Parameters:
cause (Exception) – The original exception that caused the request to fail.
query (QueryBase (optional)) – The query that failed. This enables users to retrieve and re-try the failed queries in a batch query.
result_url (str (optional)) – The url where the result can be retrieved from. This enables users to get the result later if the failure cause was a temporary network error or an accidental timeout.
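A minimal sketch of handling this exception around a single request, assuming the constructor arguments above are stored as attributes of the same names (the re-try at the end is only illustrative):

from ginkgo_ai_client.client import GinkgoAIClient, RequestError
from ginkgo_ai_client.queries import MaskedInferenceQuery

client = GinkgoAIClient()
query = MaskedInferenceQuery("MPK<mask><mask>RRL", model="ginkgo-aa0-650m")

try:
    response = client.send_request(query)
except RequestError as error:
    print("Original cause:", error.cause)
    if error.result_url is not None:
        # The result may still be retrievable later if the failure was a
        # temporary network error or an accidental timeout.
        print("Result may become available at:", error.result_url)
    if error.query is not None:
        response = client.send_request(error.query)  # illustrative re-try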
Mean embedding Queries¶
Used to get embedding vectors for protein or nucleotide sequences, using models such as ESM, Ginkgo-AA0, etc.
- class ginkgo_ai_client.queries.MeanEmbeddingQuery(*, sequence: str, model: str, query_name: str | None = None)[source]¶
A query to infer mean embeddings from a DNA or protein sequence.
- Parameters:
sequence (str) – The sequence to compute a mean embedding for, for instance “MLPP<mask>PPLM” (it may contain “<mask>” tokens).
model (str) – The model to use for the inference.
query_name (Optional[str] = None) – The name of the query. It will appear in the API response and can be used to handle exceptions.
- Returns:
client.send_request(query) returns an EmbeddingResponse with attributes embedding (the mean embedding of the model’s last encoder layer) and query_name (the original query’s name).
- Return type:
EmbeddingResponse
Examples
>>> query = MeanEmbeddingQuery("MLPP<mask>PPLM", model="ginkgo-aa0-650M")
>>> client.send_request(query)
EmbeddingResponse(embedding=[1.05, 0.002, ...])
Masked inference queries¶
Used to get maximum-likelihood predictions for masked protein or nucleotide sequences, using models such as ESM, Ginkgo-AA0, etc.
- class ginkgo_ai_client.queries.MaskedInferenceQuery(*, sequence: str, model: str, query_name: str | None = None)[source]¶
A query to infer masked tokens in a DNA or protein sequence.
- Parameters:
sequence (str) – The sequence to unmask. The sequence should be of the form “MLPP<mask>PPLM” with as many masks as desired.
model (str) – The model to use for the inference (only “ginkgo-aa0-650M” is supported for now).
query_name (Optional[str] = None) – The name of the query. It will appear in the API response and can be used to handle exceptions.
- Returns:
client.send_request(query) returns a SequenceResponse with attributes sequence (the predicted sequence) and query_name (the original query’s name).
- Return type:
SequenceResponse
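Examples
A minimal sketch; the unmasked sequence and the response repr shown are illustrative:
>>> query = MaskedInferenceQuery("MPK<mask><mask>RRL", model="ginkgo-aa0-650m")
>>> client.send_request(query)
SequenceResponse(sequence="MPKYLRRL", query_name=None)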
Promoter activity prediction queries¶
Used to predict the activity of promoters in various human tissues, using Borzoi and Ginkgo’s Promoter-0.
- class ginkgo_ai_client.queries.PromoterActivityQuery(*, promoter_sequence: str, orf_sequence: str, tissue_of_interest: Dict[str, List[str]], source: str, inference_framework: Literal['promoter-0'] = 'promoter-0', borzoi_model: Literal['human-fold0'] = 'human-fold0', query_name: str | None = None)[source]¶
A query to infer the activity of a promoter in different tissues.
- Parameters:
promoter_sequence (str) – The promoter sequence. Only ATGCN characters are allowed.
orf_sequence (str) – The ORF sequence. Only ATGCN characters are allowed.
tissue_of_interest (Dict[str, List[str]]) – The tissues of interest, with the tracks representing each tissue, for instance {“heart”: [“CNhs10608+”, “CNhs10612+”], “liver”: [“CNhs10608+”, “CNhs10612+”]}.
query_name (Optional[str] = None) – The name of the query. It will appear in the API response and can be used to handle exceptions.
inference_framework (Literal["promoter-0"] = "promoter-0") – The inference framework to use for the inference. Currently only supports “promoter-0”.
borzoi_model (Literal["human-fold0"] = "human-fold0") – The model to use for the inference. Currently only supports the trained model of “human-fold0”.
- Returns:
client.send_request(query) returns a PromoterActivityResponse with attributes activity_by_tissue (the activity of the promoter in each tissue) and query_name (the original query’s name).
- Return type:
PromoterActivityResponse
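Examples
A minimal sketch of building a query. The track identifiers reuse the ones from the parameter description above, the sequences are made up, and the source value is a placeholder (its allowed values are not documented here):
query = PromoterActivityQuery(
    promoter_sequence="ATGCATGCATGCATGC",
    orf_sequence="ATGGCCAAGTAA",
    tissue_of_interest={
        "heart": ["CNhs10608+", "CNhs10612+"],
        "liver": ["CNhs10608+", "CNhs10612+"],
    },
    source="example-source",  # placeholder value
)
response = client.send_request(query)
# response.activity_by_tissue, e.g. {"heart": 0.7, "liver": 0.3} (illustrative values)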
- class ginkgo_ai_client.queries.PromoterActivityResponse(*, activity_by_tissue: Dict[str, float], query_name: str | None = None)[source]¶
A response to a PromoterActivityQuery, with attributes activity_by_tissue (the predicted activity in each tissue) and query_name (the original query’s name).
- activity_by_tissue¶
The activity of the promoter in each tissue.
- Type:
Dict[str, float]
- query_name¶
The name of the query. It will appear in the API response and can be used to handle exceptions.
- Type:
Optional[str] = None
Diffusion queries¶
Used to generate protein or nucleotide sequences using the Ginkgo-developed diffusion models LCDNA and AB-Diffusion.
- class ginkgo_ai_client.queries.DiffusionMaskedQuery(*, sequence: str, temperature: float = 0.5, decoding_order_strategy: str = 'entropy', unmaskings_per_step: int = 50, model: str, query_name: str | None = None)[source]¶
A query to perform masked sampling using a diffusion model.
- Parameters:
sequence (str) – Input sequence for masked sampling. The sequence may contain “<mask>” tokens.
temperature (float, optional (default=0.5)) – Sampling temperature, a value between 0 and 1.
decoding_order_strategy (str, optional (default="entropy")) – Strategy for decoding order, must be either “max_prob” or “entropy”.
unmaskings_per_step (int, optional (default=50)) – Number of tokens to unmask per step, an integer between 1 and 1000.
model (str) – The model to use for the inference.
query_name (Optional[str] = None) – The name of the query. It will appear in the API response and can be used to handle exceptions.
- Returns:
client.send_request(query) returns a DiffusionMaskedResponse with attributes sequence (the predicted sequence) and query_name (the original query’s name).
- Return type:
DiffusionMaskedResponse
Examples
>>> query = DiffusionMaskedQuery(
...     sequence="ATTG<mask>TAC",
...     model="lcdna",
...     temperature=0.7,
...     decoding_order_strategy="entropy",
...     unmaskings_per_step=20,
... )
>>> client.send_request(query)
DiffusionMaskedResponse(sequence="ATTGCGTAC", query_name=None)
Boltz structure inference queries¶
Used to predict the 3D structure of a protein sequence using Boltz.
- class ginkgo_ai_client.queries.BoltzStructurePredictionQuery(*, sequences: List[Dict[Literal['protein', 'ligand'], _Protein | _CCD | _Smiles]], model: Literal['boltz'] = 'boltz', query_name: str | None = None)[source]¶
A query to predict the structure of a protein using the Boltz model.
This type of query is more conveniently constructed using the from_yaml_file or from_protein_sequence methods.
- Parameters:
sequences (List[Dict[Literal["protein", "ligand"], Union[_Protein, _CCD, _Smiles]]]) – The sequences to predict the structure for. Only protein sequences of size <1000aa are supported for now.
model (Literal["boltz"] = "boltz") – The model to use for the inference (only Boltz(1) is supported for now).
query_name (Optional[str] = None) – The name of the query. It will appear in the API response and can be used to handle exceptions.
Examples
query = BoltzStructurePredictionQuery.from_yaml_file("input.yaml")
# or:
query = BoltzStructurePredictionQuery.from_protein_sequence("MLLKP")

response = client.send_request(query)
response.download_structure("structure.cif")
# or:
response.download_structure("structure.pdb")
- class ginkgo_ai_client.queries.BoltzStructurePredictionResponse(*, cif_file_url: str, confidence_data: Dict[str, Any], query_name: str | None = None)[source]¶
A response to a BoltzStructurePredictionQuery.
- cif_file_url¶
The URL of the cif file.
- Type:
str
- confidence_data¶
The confidence data.
- Type:
Dict[str, Any]
- query_name¶
The name of the query. It will appear in the API response and can be used to handle exceptions.
- Type:
Optional[str] = None
Examples
response = BoltzStructurePredictionResponse(
    cif_file_url="https://example.com/structure.cif",
    confidence_data={"confidence": 0.95},
    query_name="my_query",
)
response.download_structure("structure.cif")
# or...
response.download_structure("structure.pdb")