Report catalogue
The IGEM server ships a registry of curated annotation reports that resolve identifiers from external biological databases against the IGEM knowledge graph. This page is the per-report reference: one section per registered report, with parameters, output columns, typical inputs, and gotchas.
For how to call reports (list / explain / run, typed helpers, the `ReportResult` API, and the `igem report` CLI), see Reporting data. This page assumes you already know the mechanics and just want to look up what each report does.
Tip
The catalogue below mirrors the markdown returned by
igem.report.explain(name). Calling explain from a notebook is
often the fastest way to confirm a parameter or a column name
without leaving your session.
Note
The catalogue grows over time. As new reports are added on the
server, this page is updated to keep parity with igem.report.list()
on a current snapshot. If list() returns a name not described
here, run explain(name) for the canonical contract — the server is
authoritative.
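To check parity yourself, compare the names returned by `igem.report.list()` against the five reports documented on this page. The sketch below is plain Python and assumes only that `list()` yields report names as strings; the helper `undocumented_reports` and the extra report name in the example are hypothetical.

```python
# Reports that have a section on this page.
DOCUMENTED_REPORTS = frozenset({
    "gene_annotations",
    "disease_annotations",
    "go_annotations",
    "pathway_annotations",
    "protein_annotations",
})


def undocumented_reports(listed_names):
    """Return registered report names that have no section on this page.

    `listed_names` is assumed to be an iterable of report-name strings,
    e.g. the output of igem.report.list() (return type assumed).
    """
    return sorted(set(listed_names) - DOCUMENTED_REPORTS)


# Hypothetical server snapshot with one report not described here.
snapshot = ["gene_annotations", "go_annotations", "variant_annotations"]
print(undocumented_reports(snapshot))  # ['variant_annotations']
```

Any name it returns is a candidate for `igem.report.explain(name)`.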
Currently five reports are registered. They share a uniform input / output contract, summarised once below to avoid repeating it in every section.
gene_annotations
Master annotation report for human genes. Accepts a list of gene identifiers (symbols, HGNC IDs, Ensembl IDs, Entrez IDs, or any registered alias) and returns one row per matched entity with consolidated cross-references, locus classification, genomic coordinates, and a relationship summary.
Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| `input_values` | | (all genes) | Symbols, HGNC IDs, Ensembl IDs, Entrez IDs, or aliases. |
| `assembly` | | | Genome assembly name used for the coordinate columns. |
Identifiers are normalised case-insensitively and matched against
all registered aliases. An input with no match emits a row with
status = "not_found".
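The matching rule can be modelled with a toy alias index: every alias is stored case-folded, each input is case-folded before lookup, and an input with no hit still produces a row flagged `not_found`. This illustrates the behaviour described above and is not the server's implementation; the alias table, the `input_value` column name, and the `"found"` status for matches are made up for the example (only `"not_found"` is documented).

```python
# Toy alias index: alias -> canonical gene symbol (illustrative data only).
ALIASES = {
    "TP53": "TP53",
    "P53": "TP53",
    "EGFR": "EGFR",
    "ERBB1": "EGFR",
}

# Case-fold every alias once so lookups are case-insensitive.
INDEX = {alias.casefold(): symbol for alias, symbol in ALIASES.items()}


def resolve(input_values):
    """Return one row per input; unmatched inputs get status 'not_found'."""
    rows = []
    for value in input_values:
        symbol = INDEX.get(value.casefold())
        rows.append({
            "input_value": value,  # hypothetical column name
            "gene_symbol": symbol,
            # "found" is a placeholder; only "not_found" is documented.
            "status": "found" if symbol else "not_found",
        })
    return rows


for row in resolve(["tp53", "ErbB1", "NOTAGENE"]):
    print(row)
```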
Report-specific output columns

| Column | Description |
|---|---|
| `gene_symbol` | Approved HGNC gene symbol. |
| `hgnc_id` | HGNC identifier (e.g. …). |
| | Ensembl gene ID (e.g. …). |
| | NCBI Entrez Gene ID. |
| | HGNC approval status. |
| | HGNC locus group (e.g. …). |
| | HGNC locus type (e.g. …). |
| | Semicolon-separated HGNC gene family / group names. |
| | Genome assembly name for the coordinate columns. |
| `chromosome` | Chromosome number (1–22, 23=X, 24=Y, 25=MT). |
| `start_position` | Genomic start (1-based). |
| | Genomic end (1-based). |
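The `chromosome` column is numeric, so X, Y and the mitochondrial genome appear as 23, 24 and 25. The sketch below maps those codes back to conventional labels and converts a 1-based start/end pair to the 0-based, half-open convention used by BED files. It assumes the end coordinate is inclusive (a closed interval), which the table implies but does not state; both function names are hypothetical.

```python
# Numeric chromosome codes used by the report:
# 1-22 are autosomes, 23 = X, 24 = Y, 25 = MT.
_SPECIAL = {23: "X", 24: "Y", 25: "MT"}


def chromosome_label(code):
    """Map a numeric chromosome code to its conventional label."""
    code = int(code)
    if 1 <= code <= 22:
        return str(code)
    if code in _SPECIAL:
        return _SPECIAL[code]
    raise ValueError(f"unknown chromosome code: {code}")


def to_bed_interval(start, end):
    """Convert a 1-based closed interval [start, end] to 0-based half-open.

    Assumes `end` is inclusive, so the feature length is end - start + 1.
    """
    if end < start:
        raise ValueError("end must not precede start")
    return start - 1, end


print(chromosome_label(23))       # X
print(to_bed_interval(100, 200))  # (99, 200)
```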
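Example

```python
from igem import IGEM  # import path assumed; see Reporting data

with IGEM() as igem:
    # Coordinate columns are reported on the requested assembly.
    result = igem.report.gene_annotations(
        input_values=["TP53", "BRCA1", "EGFR"],
        assembly="GRCh38.p14",
    )

result.df[["gene_symbol", "hgnc_id", "chromosome", "start_position"]]
```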
Notes

- `RegressionResults.annotate(igem)` calls this report under the hood to enrich association results; see Analyzing data.
- The `assembly` parameter only affects the coordinate columns; the HGNC / Ensembl / Entrez identifiers are assembly-agnostic.
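If you call the report directly instead of going through `RegressionResults.annotate(igem)`, enriching association results amounts to a left join keyed on the gene symbol: every association row is kept, and annotation columns are added where a match exists. The plain-Python sketch below illustrates that join only; it is not how `annotate` is implemented, and the `gene` / `pvalue` fields on the association rows are hypothetical.

```python
def left_join_annotations(associations, annotations,
                          key="gene", ann_key="gene_symbol"):
    """Attach annotation columns to each association row (left join).

    associations: list of dicts, one per association result; the `gene`
                  field name is hypothetical.
    annotations:  list of dicts shaped like gene_annotations rows.
    Every association row is kept; unmatched rows gain no new fields.
    """
    # Index annotation rows by symbol; rows without a symbol are skipped.
    by_symbol = {row[ann_key]: row for row in annotations if row.get(ann_key)}
    joined = []
    for assoc in associations:
        merged = dict(assoc)
        match = by_symbol.get(assoc[key])
        if match is not None:
            for column, value in match.items():
                # Never overwrite a field already present on the association.
                merged.setdefault(column, value)
        joined.append(merged)
    return joined


associations = [
    {"gene": "TP53", "pvalue": 1e-6},
    {"gene": "NOT_A_GENE", "pvalue": 0.2},
]
annotations = [{"gene_symbol": "TP53", "chromosome": 17}]

for row in left_join_annotations(associations, annotations):
    print(row)
```

With pandas, the same idea is a left merge on the symbol column.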
disease_annotations
Master annotation report for diseases. Accepts a list of disease identifiers (MONDO IDs, OMIM IDs, MeSH IDs, ICD-10 codes, Orphanet IDs, names, or any registered alias) and returns one row per matched entity with cross-references, disease group memberships, and a relationship summary. Passing an empty input list returns all diseases.
Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| `input_values` | | (all diseases) | Disease identifiers or names to look up. |
| `group_filter` | | | Restrict the output to a named disease group (e.g. …). |
| | | | See shared contract. |
| | | | See shared contract. |
| | | | See shared contract. |
Report-specific output columns

| Column | Description |
|---|---|
| `disease_id` | Source-native primary disease ID (typically MONDO). |
| `label` | Canonical disease label. |
| | Human-readable description. |
| `mondo_id` | MONDO cross-reference. |
| | OMIM cross-reference. |
| | MeSH cross-reference. |
| `icd10` | ICD-10 cross-reference. |
| | Orphanet cross-reference. |
| | Semicolon-separated disease group memberships. |
| | Number of parent diseases in the ontology. |
| | Number of child diseases in the ontology. |
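Group memberships arrive as one semicolon-separated string per row. Splitting that string gives a client-side alternative to `group_filter` when you already hold the rows. The sketch assumes the column is called `disease_groups` (a hypothetical name; `explain("disease_annotations")` gives the real one) and that an empty or missing value means no membership. The disease IDs and group names in the example are invented.

```python
def parse_groups(value):
    """Split a semicolon-separated membership string into a set of names."""
    if not value:
        return set()
    return {part.strip() for part in value.split(";") if part.strip()}


def rows_in_group(rows, group, column="disease_groups"):
    """Keep rows whose semicolon-separated `column` contains `group`.

    `disease_groups` is a hypothetical column name used for illustration.
    """
    return [row for row in rows if group in parse_groups(row.get(column))]


rows = [
    {"disease_id": "D1", "disease_groups": "neurological; autoimmune"},
    {"disease_id": "D2", "disease_groups": "cardiovascular"},
    {"disease_id": "D3", "disease_groups": None},
]
print([r["disease_id"] for r in rows_in_group(rows, "autoimmune")])  # ['D1']
```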
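Example

```python
from igem import IGEM  # import path assumed; see Reporting data

with IGEM() as igem:
    # Mixed identifier types: a MONDO ID, an OMIM ID, and a free-text name.
    result = igem.report.disease_annotations(
        input_values=["MONDO:0005301", "OMIM:104300", "multiple sclerosis"],
    )

result.df[["disease_id", "label", "mondo_id", "icd10"]]
```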
Notes

- Mixed identifier types in a single call are supported: the matcher resolves each input independently against the alias index.
- `group_filter` is most useful in all-mode (no `input_values`) for slicing the catalogue by therapeutic area.
go_annotations
Master annotation report for Gene Ontology terms. Accepts GO
identifiers (GO:xxxxxxx), term names, synonyms, or any registered
alias. Returns one row per matched term with namespace, name, and
relationship summary. Supports filtering by namespace.
Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| `input_values` | | (all GO terms) | GO IDs, term names, or aliases. |
| `namespace` | | | Restrict to a single GO namespace. |
| | | | See shared contract. |
| | | | See shared contract. |
| | | | See shared contract. |

Namespace codes:

| Code | Label |
|---|---|
| `BP` | Biological Process |
| `MF` | Molecular Function |
| `CC` | Cellular Component |
Report-specific output columns

| Column | Description |
|---|---|
| `go_id` | GO identifier (e.g. …). |
| `go_name` | GO term name. |
| `namespace_label` | Human-readable namespace (e.g. …). |
| | Number of parent terms in the GO DAG. |
| | Number of child terms in the GO DAG. |
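The parent and child counts summarise each term's neighbourhood in the ontology graph. Assuming they count direct edges only (the table does not say whether transitive ancestors are included), they are the in- and out-degree of a node in a child → parent edge list. The sketch below computes both with invented term IDs; `parent_child_counts` is a hypothetical helper.

```python
from collections import Counter


def parent_child_counts(edges):
    """Count direct parents and children per term.

    edges: iterable of (child, parent) pairs, one per direct DAG edge.
    Returns (parents, children) as Counters keyed by term ID.
    """
    parents, children = Counter(), Counter()
    for child, parent in set(edges):  # ignore duplicate edges
        parents[child] += 1
        children[parent] += 1
    return parents, children


# A small diamond-shaped DAG with made-up term IDs.
edges = [("T:2", "T:1"), ("T:3", "T:1"), ("T:4", "T:2"), ("T:4", "T:3")]
parents, children = parent_child_counts(edges)
print(parents["T:4"], children["T:1"], parents["T:1"])  # 2 2 0
```

Because a DAG term may have several parents, the parent count can exceed 1, unlike in a tree.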
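Example

```python
from igem import IGEM  # import path assumed; see Reporting data

with IGEM() as igem:
    # namespace="BP" restricts the output to Biological Process terms.
    result = igem.report.go_annotations(
        input_values=["GO:0007049", "GO:0006281", "cell cycle"],
        namespace="BP",
    )

result.df[["go_id", "go_name", "namespace_label"]]
```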
Notes

- Combining `namespace="BP"` with an explicit `input_values` list filters the output: terms outside BP simply emit `not_found`.
- The relationship summary is most informative when querying broad terms (large `entity_relationships_by_group` totals).
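The namespace filter behaves as if it were applied after resolution: an input that resolves to a term in another namespace is reported as `not_found` rather than silently dropped. The toy model below reproduces that rule with a hand-written term table. It illustrates the documented behaviour and is not server code; `resolve_with_namespace` is hypothetical, and the `MF` / `CC` codes are assumed by analogy with `BP`.

```python
# Namespace codes and their labels (MF and CC assumed by analogy with BP).
NAMESPACE_LABELS = {
    "BP": "Biological Process",
    "MF": "Molecular Function",
    "CC": "Cellular Component",
}

# Toy term table: GO ID -> (term name, namespace code).
TERMS = {
    "GO:0007049": ("cell cycle", "BP"),
    "GO:0006281": ("DNA repair", "BP"),
    "GO:0005634": ("nucleus", "CC"),
}


def resolve_with_namespace(input_values, namespace=None):
    """One row per input; out-of-namespace or unknown terms are 'not_found'."""
    if namespace is not None and namespace not in NAMESPACE_LABELS:
        raise ValueError(f"unknown namespace code: {namespace}")
    rows = []
    for go_id in input_values:
        term = TERMS.get(go_id)
        if term is None or (namespace and term[1] != namespace):
            rows.append({"go_id": go_id, "status": "not_found"})
        else:
            name, code = term
            rows.append({"go_id": go_id, "go_name": name,
                         "namespace_label": NAMESPACE_LABELS[code]})
    return rows


for row in resolve_with_namespace(["GO:0007049", "GO:0005634"], namespace="BP"):
    print(row)
```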
pathway_annotations
Master annotation report for biological pathways. Accepts Reactome IDs, KEGG IDs, pathway names, or any registered alias. Returns one row per matched pathway with source metadata and relationship summary.
Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| `input_values` | | (all pathways) | Reactome IDs, KEGG IDs, names, or aliases. |
| | | | See shared contract. |
| | | | See shared contract. |
| | | | See shared contract. |
Report-specific output columns

| Column | Description |
|---|---|
| `pathway_id` | Source-native ID (e.g. …). |
| `pathway_name` | Human-readable pathway name. |
| `source_db` | Database of origin (…). |
| | Species (e.g. …). |
| | Source system name. |
| | Data source name (e.g. …). |
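Because Reactome and KEGG identifiers share the `pathway_id` column, grouping on `source_db` keeps the two ID spaces apart, for instance before handing IDs to source-specific tools. A minimal sketch over result rows represented as dicts; the `source_db` values `"Reactome"` and `"KEGG"` are assumed spellings, since the exact strings are not listed here, and `ids_by_source` is a hypothetical helper.

```python
from collections import defaultdict


def ids_by_source(rows):
    """Group pathway_id values by their source_db value."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["source_db"]].append(row["pathway_id"])
    return dict(grouped)


# Example rows; the source_db spellings are assumptions.
rows = [
    {"pathway_id": "R-HSA-109581", "source_db": "Reactome"},
    {"pathway_id": "hsa04110", "source_db": "KEGG"},
    {"pathway_id": "R-HSA-1640170", "source_db": "Reactome"},
]
print(ids_by_source(rows))
```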
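Example

```python
from igem import IGEM  # import path assumed; see Reporting data

with IGEM() as igem:
    # A Reactome ID, a KEGG ID, and a pathway name in one call.
    result = igem.report.pathway_annotations(
        input_values=["R-HSA-109581", "hsa04110", "Cell Cycle"],
    )

result.df[["pathway_id", "pathway_name", "source_db",
           "entity_relationships_by_group"]]
```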
Notes

- Reactome IDs and KEGG IDs share a single `pathway_id` column; disambiguate with `source_db` if needed.
- Querying by name (`"Cell Cycle"`) is exact-match against the alias index; close matches without an exact hit return `not_found`.
protein_annotations
Master annotation report for human proteins. Accepts UniProt accessions, protein names, gene symbols, or any registered alias. Returns one row per matched entity with consolidated UniProt cross-references, function / location notes, Pfam domain summary, and relationship summary. Isoform inputs are resolved and annotated with their canonical counterpart.
Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| `input_values` | | (all proteins) | UniProt accessions, names, gene symbols, or aliases. |
| `include_pfam_summary` | | | Populate … |
| `include_pfam_details` | | | Also populate … |
| `max_pfam_ids_per_type` | | | Cap the number of Pfam accessions per type when … |
| | | | See shared contract. |
| | | | See shared contract. |
| | | | See shared contract. |
Report-specific output columns

| Column | Description |
|---|---|
| `canonical_entity_id` | Entity ID of the canonical (non-isoform) entry. |
| | Internal … |
| | UniProt accession (e.g. …). |
| | Isoform-specific accession (e.g. …). |
| | Total number of isoforms registered for this protein master. |
| | Functional description from UniProt (truncated to 512 chars). |
| | Subcellular location(s). |
| | Tissue expression notes. |
| | Source system name (e.g. …). |
| | Data source name (e.g. …). |
| `pfam_total_count` | Total Pfam domains linked to this protein master. |
| | Semicolon-separated … |
| | Pfam accessions per type (only when …). |
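Example

```python
from igem import IGEM  # import path assumed; see Reporting data

with IGEM() as igem:
    # Two canonical accessions plus a TP53 isoform (P04637-2).
    result = igem.report.protein_annotations(
        input_values=["P04637", "P00533", "P04637-2"],
        include_pfam_details=True,  # also return Pfam accessions
        max_pfam_ids_per_type=5,    # at most 5 accessions per Pfam type
    )

result.df[[
    "protein_id", "input_is_isoform", "input_isoform_accession",
    "canonical_entity_id", "pfam_total_count",
]]
```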
Notes

- Isoform queries (`P04637-2`) resolve to the same `protein_master_id` as the canonical accession (`P04637`); use `input_is_isoform` to tell them apart.
- Pfam details can inflate the response considerably for proteins with many domains. Prefer the summary (`include_pfam_summary=True`, `include_pfam_details=False`) unless you specifically need the accessions.
- Querying by gene symbol returns the protein product, not the gene; use `gene_annotations` for the gene.
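UniProt writes isoform identifiers as the canonical accession followed by a hyphen and an isoform number, so `P04637-2` names an isoform of `P04637`. To make the same distinction on your own inputs before querying (for example, to anticipate which rows will have `input_is_isoform` set), a small parser is enough. `split_isoform` is a hypothetical helper, and its loose accession pattern is a simplification of UniProt's real accession syntax.

```python
import re

# Accession followed by an optional "-<n>" isoform suffix. The accession part
# is deliberately loose ([A-Z0-9]+); full UniProt syntax is not validated.
_ISOFORM_RE = re.compile(r"^([A-Z0-9]+)(?:-(\d+))?$")


def split_isoform(accession):
    """Return (canonical_accession, isoform_number or None).

    Raises ValueError for strings that do not look like an accession.
    """
    match = _ISOFORM_RE.match(accession.strip().upper())
    if match is None:
        raise ValueError(f"not a UniProt-style accession: {accession!r}")
    canonical, number = match.groups()
    return canonical, (int(number) if number else None)


for acc in ["P04637", "P04637-2", "P00533"]:
    print(acc, split_isoform(acc))
```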
See also

- Reporting data: how to call the reports (Python API and CLI), the `ReportResult` interface, and integration with `analyze.annotate`.
- Analyzing data: `RegressionResults.annotate` for joining `gene_annotations` columns into an EWAS / GWAS result.
- Cookbook → Custom report end-to-end: adding a new report on the server side.