Installation¶
There are two supported ways to install IGEM, chosen by what you want to do with it:
You want to… |
Install via |
When |
|---|---|---|
Run analyses against a remote knowledge graph |
pip |
Daily analyst workflow, laptop, notebook, CI |
Run a self-contained, reproducible IGEM environment |
image |
HPC, network-restricted hosts, frozen-version reproducibility |
Both paths give you the same igem Python API and CLI. The
difference is only where the knowledge graph lives — on a remote
server you point at, or inside a container you run locally.
Option 1 — pip¶
Install¶
python -m venv .venv
source .venv/bin/activate
pip install igem
Confirm the install:
igem --version
# igem, version 2.1.0
Connect to the GE knowledge graph¶
The IGEM client is purpose-built for the analyst side of the
pipeline: it is intentionally lightweight and does not manage
knowledge data itself. Knowledge curation — ETL pipelines, entity
normalisation (NLP), smart queries, and the catalog of ingested data
— lives in the companion igem-server package, which is
typically deployed once per institution and shared.
Tip
Many of IGEM’s analytical functions do not require a server
connection. Local-only operations include data loading, QC via
describe and modify, single-feature association tests
(gwas / ewas), interaction tests (lrt), multi-test correction
(Bonferroni / FDR), and visualisation. Connecting to a server is
only required for knowledge-graph reports
(gene_annotations, pathway_annotations, etc.) and for the
filter-then-test workflows that consume them.
The Hall Lab maintains a public endpoint that anyone can use to explore the knowledge graph:
https://geneexposure.org/api
Point your client at it with igem config:
igem config set server-url https://geneexposure.org/api
# set server-url = https://geneexposure.org/api
# → /your/cwd/.igem.toml
By default this writes a project-scoped ./.igem.toml in the current
directory. To set it user-globally (used from any directory), pass
--global:
igem config set --global server-url https://geneexposure.org/api
# → ~/.igem.toml
The resulting file is plain TOML:
[client]
server_url = "https://geneexposure.org/api"
Manage it through the CLI or by hand:
igem config show # print the merged local + home config
igem config get server-url # print just the resolved value
igem config unset server-url
Resolution order is environment variable → local ./.igem.toml
(walking up from cwd) → home ~/.igem.toml — so a one-off
IGEM_URL=… env var always wins for a single command.
Verify¶
igem health
# status: ok
If the call returns ok, the client is wired up correctly. From here
the Quickstart walks through running your first
report.
Hosting your own server¶
Standing up an igem-server instance — Postgres backend, ETL
pipelines, snapshot generation — is documented separately in
Operations → Server setup. For most
analysts the public endpoint above is sufficient and no local
server is needed.
Option 2 — Docker image¶
For HPC environments, network-restricted hosts, or any scenario where
you want a frozen, reproducible IGEM stack, the project ships a
single container image on GitHub Container Registry that bundles the
client, the server (running in-process), and the full scientific
Python stack. The container reads the knowledge graph from a
Parquet snapshot mounted at /snapshot, with no external
database connection involved.
Pull the image¶
docker pull ghcr.io/halllab/igem:latest
The image is public — no authentication required. Equivalent Apptainer pull on HPC nodes:
apptainer pull igem.sif docker://ghcr.io/halllab/igem:latest
Note
The :latest tag tracks the most recent stable release and is the
right choice for everyday use. For scientific reproducibility —
papers, regulatory submissions, re-analyses — pin a specific tag
instead (for example ghcr.io/halllab/igem:1.0.0). All published
tags are listed at
https://github.com/HallLab/IGEM/pkgs/container/igem. The container
records the embedded client and server versions as image labels;
inspect them with:
docker inspect ghcr.io/halllab/igem:latest \
--format '{{json .Config.Labels}}'
Download a knowledge graph snapshot¶
The container ships with the igem-server db snapshot-download
command, which fetches a versioned Parquet snapshot, verifies every
file against a sha256 manifest, and writes them to a directory you
choose:
mkdir -p $HOME/igem-snapshot
docker run --rm \
-v $HOME/igem-snapshot:/work \
ghcr.io/halllab/igem:latest \
igem-server db snapshot-download --output /work
The default URL is https://geneexposure.org/downloads/latest/. To
pin a specific version for reproducibility, pass --url:
docker run --rm -v $HOME/igem-snapshot:/work \
ghcr.io/halllab/igem:latest \
igem-server db snapshot-download \
--url https://geneexposure.org/downloads/2026-04-25/ \
--output /work
Useful flags:
--workers N— concurrent downloads (default 4).--include-nlp— also fetch the NLP automaton cache (~3.5 GB, saves ~70s on first NLP query).--overwrite— force re-download of every file. Without this flag, files whose sha256 already matches the manifest are skipped (cached), so re-running the command is safe and cheap.
Run a query¶
With the snapshot in place, mount it read-only and run any IGEM command:
docker run --rm \
-v $HOME/igem-snapshot:/snapshot:ro \
-v $(pwd):/work -w /work \
ghcr.io/halllab/igem:latest \
igem report run --name gene_annotations \
--input BRCA1 --input TP53 --input MYC \
--columns gene_symbol,entrez_id,chromosome,gene_locus_type
The container’s entrypoint validates the snapshot, starts the server
in-process, and routes the client to it via
IGEM_URL=embedded:///snapshot — no manual configuration required.
HPC and cloud platforms¶
For LSF / SLURM clusters, Apptainer-based execution, job submission templates, and integration notes for Anvil, DNAnexus, and All of Us, see Cookbook → Container and HPC workflows. The container is the same; only the runtime (Docker vs Apptainer) and the job scheduler change.
That page also collects ready-to-copy recipes for the three common execution patterns: interactive shell, scripted runs, and hybrid setups that combine the container’s Python stack with the public remote knowledge graph.
Troubleshooting¶
igem: command not found — the install succeeded but the entry
point is not on PATH. Activate the virtual environment, or invoke
with python -m igem.api.cli.main --version.
SSL or proxy errors on igem health — your network blocks
outbound HTTPS to geneexposure.org. Either route through your
institution’s proxy (HTTPS_PROXY=…), or move to the Docker image
above with a downloaded snapshot — once the snapshot is local, no
further network access is required.
status: not ok from igem health — the server is reachable
but reports an internal problem. Re-run with --debug to see the
raw response and contact the maintainers if it persists.
Docker pull fails with manifest unknown — the tag does not
exist. List available tags at
https://github.com/HallLab/IGEM/pkgs/container/igem.
Container runs but manifest.json missing — the snapshot bind
mount is wrong. Confirm the host path contains a manifest.json
file and that the bind syntax is --volume HOST:/snapshot:ro (note
the colon between source and destination, and :ro for read-only).