Interactive scientific database for Wnt signaling pathway research. Ungureanu Lab · University of Oulu
WntHub is a panel-based interactive platform for exploring Wnt signaling pathway genes across genomic, transcriptomic, epigenomic, proteomic, and clinical dimensions. It integrates data from GTEx, Human Protein Atlas, TCGA, CELLxGENE, Tahoe-100M, ENCODE, ChIP-Atlas, LINCS L1000, and more into a unified sidebar + tabbed interface with interactive Plotly.js charts, an IGV.js genome browser, D3 force-directed co-expression networks, and an AI-powered research assistant grounded in 23K+ Wnt publications.
The platform covers 91 Wnt-pathway genes, organised into eight functional categories: ligands (all 19 WNTs), Frizzled receptors (FZD1–10), co-receptors (LRP5/6) and non-canonical receptors (ROR1/2, RYK), the R-spondin axis (RSPO3, LGR4/5/6, RNF43, ZNRF3), secreted antagonists (DKK1–4, SFRP1/2/4/5, FRZB, WIF1, SOST, KREMEN1/2, NOTUM) and secretion machinery (PORCN, WLS), the destruction complex (APC, AXIN1/2, GSK3B, CSNK1A1) with cytoplasmic transducers (CTNNB1, DVL1–3, CSNK1E) and feedback inhibitors (NKD1/2, FRAT1/2), the planar-cell-polarity / Rho axis (VANGL1/2, PRICKLE1/2, CELSR1–3, DAAM1/2, RHOA, RAC1, CDC42), and the transcription layer (TCF7, TCF7L1/2, LEF1, NLK, TLE1, CTNNBIP1) with canonical target genes (MYC, CCND1).
Gene Identity Card (full name, genomic coordinates, aliases, NCBI summary, cross-reference IDs: Entrez, HGNC, Ensembl, UniProt, RefSeq, Pfam, PDB) plus AI-generated summaries across 5 dimensions: Genomic, Expression, Pathway, Functional, Clinical.
Good for: Quick overview of any gene's role, function, disease associations, expression patterns, and database identifiers.
IGV.js genome browser (hg38) with an expandable track catalog:
Good for: Gene structure, exon/intron layout, clinical variants, regulatory landscape, TF binding, epigenomics, conservation.
Kitchen-sink Plotly.js charts across 6 data sources:
Good for: Tissue expression patterns, cancer expression, cell line data, survival analysis, immunotherapy survival, cross-database comparison, multi-gene analysis.
Protein-level data across 5 sources with interactive 3D structure viewer:
Good for: Protein expression across tissues and cancers, post-translational modifications, kinase-substrate relationships, 3D structure visualization with PTM overlays.
Good for: Cell-type-specific expression, which cell types express a gene, tumor microenvironment expression.
Good for: Co-expression partners, tissue-specific networks, pathway enrichment, functional associations.
Two complementary perturbation datasets accessible via sub-tab toggle, identifying drugs and genetic manipulations that significantly alter expression of Wnt pathway genes.
LINCS L1000 — Compound Perturbations
LINCS L1000 — Genetic Perturbations (CRISPR, shRNA, Overexpression)
72 of 91 genes are detected as L1000 readout targets (View 1) and 72 of 91 are used as L1000 perturbagens (View 2). Not detected as readout targets: AXIN2, KREMEN1, LGR6, NKD1, NKD2, NOTUM, PRICKLE1, PRICKLE2, RSPO3, SFRP2, SOST, VANGL2, WNT10A, WNT3A, WNT7B, WNT8A, WNT9A, WNT9B, ZNRF3.
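The readout-target figure follows directly from the exclusion list: 91 panel genes minus the 19 listed as not detected leaves 72. A minimal sketch of that check, assuming the panel size of 91 stated above; the names `NOT_DETECTED`, `PANEL_SIZE`, and `readout_coverage` are illustrative and not taken from the WntHub codebase.

```python
# Genes listed above as not detected as L1000 readout targets.
NOT_DETECTED = [
    "AXIN2", "KREMEN1", "LGR6", "NKD1", "NKD2", "NOTUM", "PRICKLE1",
    "PRICKLE2", "RSPO3", "SFRP2", "SOST", "VANGL2", "WNT10A", "WNT3A",
    "WNT7B", "WNT8A", "WNT9A", "WNT9B", "ZNRF3",
]

PANEL_SIZE = 91  # total genes in the panel, as stated above


def readout_coverage(panel_size: int, not_detected: list) -> int:
    """Return how many panel genes remain once the undetected ones are removed."""
    missing = set(not_detected)  # de-duplicate defensively
    return panel_size - len(missing)


detected = readout_coverage(PANEL_SIZE, NOT_DETECTED)
print(f"{detected} of {PANEL_SIZE} genes detected as readout targets")
# prints: 72 of 91 genes detected as readout targets
```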
Tahoe-100M — Single-Cell Perturbations
Good for: Drug discovery, target validation, single-cell perturbation responses, identifying compounds that modulate Wnt gene expression, cross-referencing bulk (LINCS) and single-cell (Tahoe) evidence.
Dynamic links to 30+ external databases (GeneCards, UniProt, NCBI, Ensembl, KEGG, Reactome, ClinVar, OMIM, etc.).
Good for: Finding a gene in other databases, jumping to external resources.
RAG-powered chatbot grounded in 23,323 Wnt pathway publications (31,773 text chunks). Accessible via the floating widget from any tab. Supports multi-turn conversation with history, markdown rendering, and adjustable temperature.
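As a rough illustration of what "RAG-powered" means here, the sketch below ranks text chunks against a question and builds a grounded prompt from the best matches. It is a generic toy, not WntHub's implementation: the word-count "embedding", the function names, and the three example chunks are all assumptions made for the example, and a production system would use a learned embedding model and a vector index.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: lower-cased word counts (stand-in for a learned model)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(question: str, chunks: list, k: int = 2) -> list:
    """Return the k chunks most similar to the question."""
    q = embed(question)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]


def build_prompt(question: str, chunks: list, k: int = 2) -> str:
    """Assemble a prompt that asks the model to answer from retrieved context only."""
    context = "\n".join(f"- {c}" for c in retrieve(question, chunks, k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


# Made-up example chunks, not drawn from the actual corpus.
CHUNKS = [
    "RNF43 and ZNRF3 are ubiquitin ligases that target Frizzled receptors.",
    "DKK1 antagonises Wnt signaling by binding LRP5/6.",
    "GSK3B phosphorylates beta-catenin in the destruction complex.",
]

print(build_prompt("Which receptor does DKK1 bind", CHUNKS, k=1))
```

Multi-turn history would be handled by prepending earlier turns to the prompt, and the temperature setting applies at generation time rather than at retrieval.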
The AI assistant can optionally include structured experimental data from the site alongside literature context. Selectable data sections:
www CNAME, Let’s Encrypt cert auto-provisioned, Force HTTPS enabled. The auto-generated aesthetic-cascaron-f8669f.netlify.app URL keeps working (Netlify auto-301-redirects it to wnthub.org), so any grant-application links already circulating remain valid. Pre-submission inquiry letter (NAR Database Issue) updated to the live URL.
… (WNT_GENES, WNT_SET, WNT_SET_HASH, _wntSetHash, wnt_gene/wnt_adj/wnt_edges/wnt_df local variables, load_wnt_expression(), etc.) into neutral names (GENES, GENE_SET, GENE_SET_HASH, _geneSetHash, pathway_gene, gene_adj, ...) across 27 engine scripts. WNT_GENES kept as a one-line alias so site-specific code (the AI-summary toolchain) keeps working unchanged. Future cherry-picks between WntHub and CABase now auto-merge instead of needing manual conflict resolution.
scripts/pipelines/config.py now reads site-config.json at import and re-exports SITE_NAME, GENE_SET_LABEL, GENE_SET_FULL, DATA_SUFFIX, so engine code never needs the literal “WntHub” / “Wnt” / “Wnts” strings hardcoded.
… (scripts/build_site_stats.py + scripts/render_site_text.py) compute every data-derived number on the site (gene count, GTEx subtissues, TCGA KM curves, PRECOG records, iPTMnet sites, CELLxGENE counts, LINCS records, Tahoe drug-perturbation counts, RAG corpus size, ...) and substitute them into HTML/JS via declarative <span data-stat="key"> markers. Wired into master_rebuild_all.sh as Step 18, after data pipelines and before the _site/ rsync. Manual overrides via site_stats_overrides.json for values that can’t be derived from on-disk data (e.g. RAG corpus counts that live in the Neon Postgres DB). Eliminates the stale-number drift that has historically dogged about/index page edits.
… scripts/pipelines/master_rebuild_all.sh were executed against the expanded set with cache-aware skip logic, so existing per-gene API calls (ChIP-Atlas, iPTMnet, ProteomicsDB) and per-gene LINCS GCTX extractions were re-used; only the 45 new genes hit the external services.
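The data-stat marker substitution described above can be sketched as a regex pass over the page source. This is a minimal illustration under stated assumptions, not the contents of scripts/render_site_text.py: the function name `render_stats`, the `gene_count` key, and the restriction to spans carrying only a data-stat attribute are all hypothetical.

```python
import re
from typing import Optional

# Matches <span data-stat="key">old value</span> and captures the key.
# Assumes the span carries no other attributes and contains no nested spans.
STAT_SPAN = re.compile(r'(<span data-stat="([^"]+)">)(.*?)(</span>)', re.DOTALL)


def render_stats(html: str, computed: dict, overrides: Optional[dict] = None) -> str:
    """Fill every data-stat span with its current value.

    Manual overrides win over computed values; unknown keys are left untouched.
    """
    stats = {**computed, **(overrides or {})}

    def fill(match):
        key = match.group(2)
        if key not in stats:
            return match.group(0)  # leave unrecognised markers as they are
        return f"{match.group(1)}{stats[key]}{match.group(4)}"

    return STAT_SPAN.sub(fill, html)


page = 'Covers <span data-stat="gene_count">?</span> genes.'
print(render_stats(page, {"gene_count": 91}))
# prints: Covers <span data-stat="gene_count">91</span> genes.
```

Keeping the marker in the output means the same file can be re-rendered on every rebuild, and merging overrides last mirrors the idea of letting hand-maintained values take precedence over anything derived from disk.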
Co-expression network JSONs now embed a _wntSetHash fingerprint and self-invalidate when the gene set changes.
… (build_survival_json.py, build_network_json.py, gtex/06_build_site_correlations.py) consolidated to import from a single source (scripts/pipelines/config.py); the duplicate gtex/04_network_json.py was reduced to a deprecation stub. LINCS pipelines gained per-gene skip-if-exists logic with --force override, with cached per-gene files folded into the combined output so subsequent runs touch only new genes.
… rna_cancer.tsv.zip endpoint; pipeline definitions in hpa/01_retrieve_and_filter.py updated accordingly. The DepMap portal is currently 403’ing public file downloads; CCLE expression now resolves through a local snapshot symlinked into data/_raw/DepMap/.
scripts/pipelines/gtex/08_build_true_gtex_expression.py (GTEx V10 parquet → per-subtissue median TPM → nTPM, ≥20 samples/subtissue, 55 subtissues)
docs/manuscript-data-table.2026-04-14.xlsx supersedes the 2026-03-23 version. GTEx row reclassified as direct/in-house; added rows for Tahoe-100M perturbations, ENCODE/UCSC regulatory tracks, and HPA subcellular localization; LINCS L1000 split into three rows (compounds, "what affects gene", "what gene affects"); LINCS compound count corrected to 12,735 filtered uniques
docs/data-generation-report.md: per-plot/per-table provenance covering every panel in every sidebar tab, from upstream database through pipeline script to rendered value
scripts/pipelines/precog/01_extract_survival_zscores.py
… (scripts/pipelines/lincs/01_extract_perturbations.py) extracts per-gene perturbation data at |modz| ≥ 3.0 from Level 5 GCTX files (401K perturbations, 12,735 compounds, 702 MOA classes). Genetic pipeline (scripts/pipelines/lincs/02_extract_genetic_perturbations.py) extracts CRISPR/shRNA/overexpression data with the same threshold
… [dev] publish = "." to netlify.toml: local dev serves from project root, no rsync needed.
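The _wntSetHash self-invalidation noted above can be illustrated with an order-independent fingerprint of the gene list that is stored in the cached JSON and compared on the next run. A minimal sketch: the field name `_wntSetHash` comes from the changelog, while the choice of SHA-256, the 12-character truncation, and the helper names `gene_set_hash` and `is_stale` are assumptions made for the example.

```python
import hashlib


def gene_set_hash(genes) -> str:
    """Order-independent fingerprint of a gene set (hash choice is an assumption)."""
    canonical = ",".join(sorted(set(genes)))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]


def is_stale(cached: dict, genes) -> bool:
    """A cached payload is stale if its fingerprint is missing or differs."""
    return cached.get("_wntSetHash") != gene_set_hash(genes)


old_genes = ["CTNNB1", "APC", "AXIN1"]
cached = {"_wntSetHash": gene_set_hash(old_genes), "edges": []}

print(is_stale(cached, ["APC", "AXIN1", "CTNNB1"]))  # same set, new order: False
print(is_stale(cached, old_genes + ["RSPO3"]))       # gene added: True
```

Sorting before hashing keeps the fingerprint stable when the list is merely reordered, so only a genuine change to the gene set triggers a rebuild.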
Production deploys via netlify deploy --prod (build is automatic)
… data/TCGA/ (survival + expression). All paths and pipelines updated
WntHub integrates data from:
Developed at the University of Oulu, Precision Oncology group (Ungureanu Lab). Contact: harlan[dot]barker[at]oulu.fi