{ "cells": [ { "cell_type": "markdown", "id": "4d4d22ec", "metadata": {}, "source": [ "## Running CellART on the Xenium colorectal cancer dataset" ] }, { "cell_type": "markdown", "id": "9f0ca67c", "metadata": {}, "source": [ "### Download data" ] }, { "cell_type": "markdown", "id": "54de16b5", "metadata": {}, "source": [ "The Xenium colorectal cancer dataset can be obtained from the 10x Genomics website [here](https://www.10xgenomics.com/products/visium-hd-spatial-gene-expression/dataset-human-crc), under the name “Xenium In Situ, Sample P2 CRC”. The demo script below creates a new data directory and downloads the required Xenium files." ] }, { "cell_type": "code", "execution_count": null, "id": "db845d0d", "metadata": {}, "outputs": [], "source": [ "# Shell commands: run in a terminal, or start the cell with the %%bash magic\n", "mkdir -p ./xenium_crc\n", "cd ./xenium_crc\n", "\n", "# Download Xenium colorectal cancer dataset files\n", "curl -O https://cf.10xgenomics.com/samples/xenium/2.0.0/Xenium_V1_Human_Colon_Cancer_P2_CRC_Add_on_FFPE/Xenium_V1_Human_Colon_Cancer_P2_CRC_Add_on_FFPE_gene_panel.json\n", "curl -O https://cf.10xgenomics.com/samples/xenium/2.0.0/Xenium_V1_Human_Colon_Cancer_P2_CRC_Add_on_FFPE/Xenium_V1_Human_Colon_Cancer_P2_CRC_Add_on_FFPE_he_image.ome.tif\n", "curl -O https://cf.10xgenomics.com/samples/xenium/2.0.0/Xenium_V1_Human_Colon_Cancer_P2_CRC_Add_on_FFPE/Xenium_V1_Human_Colon_Cancer_P2_CRC_Add_on_FFPE_he_imagealignment.csv\n", "curl -O https://cf.10xgenomics.com/samples/xenium/2.0.0/Xenium_V1_Human_Colon_Cancer_P2_CRC_Add_on_FFPE/Xenium_V1_Human_Colon_Cancer_P2_CRC_Add_on_FFPE_analysis_summary.html\n", "curl -O https://cf.10xgenomics.com/samples/xenium/2.0.0/Xenium_V1_Human_Colon_Cancer_P2_CRC_Add_on_FFPE/Xenium_V1_Human_Colon_Cancer_P2_CRC_Add_on_FFPE_outs.zip\n", "\n", "# Unzip files\n", "unzip Xenium_V1_Human_Colon_Cancer_P2_CRC_Add_on_FFPE_outs.zip\n", "\n", "# Return to the root directory\n", "cd .."
] }, { "cell_type": "markdown", "id": "04365484", "metadata": {}, "source": [ "The paired scRNA-seq reference, subset to patient 2, can be downloaded [here](https://drive.google.com/file/d/1kzNZq7h4V-JyaBcjJ1Kcz-JSlLFNAQrY/view?usp=drive_link). Save the reference file `adata_sc_p2.h5ad` in the data directory (`./xenium_crc`). You now have all the raw data needed to run CellART." ] }, { "cell_type": "markdown", "id": "2dce3b68", "metadata": {}, "source": [ "### Preprocess" ] }, { "cell_type": "code", "execution_count": null, "id": "be066ac3", "metadata": {}, "outputs": [], "source": [ "from cellart.utils.preprocess import SingleCellPreprocessor, XeniumPreprocessor\n", "from cellart.utils.io import load_list\n", "import scanpy as sc\n", "\n", "# Directory for saving the preprocessed data\n", "save_dir = './preprocessed_xenium_crc/'\n", "# Transcripts and nucleus boundary files in the data directory\n", "transcripts_file = \"./xenium_crc/transcripts.parquet\"\n", "nucleus_boundary_10X = \"./xenium_crc/nucleus_boundaries.parquet\"\n", "\n", "st_preprocessor = XeniumPreprocessor(transcripts_file, nucleus_boundary_10X, save_dir)\n", "\n", "# Annotated scRNA reference path\n", "sc_adata = sc.read(\"./xenium_crc/adata_sc_p2.h5ad\")\n", "# Remember to specify your celltype_col and make sure you are using raw count data\n", "sc_preprocessor = SingleCellPreprocessor(sc_adata, celltype_col=\"celltype\", save_path=save_dir, st_gene_list=load_list(save_dir + \"/st_gene_list.txt\"))\n", "\n", "sc_preprocessor.preprocess()\n", "\n", "st_preprocessor.prepare_sst(load_list(save_dir + \"/filtered_gene_names.txt\"))\n", "st_preprocessor.get_nuclei_segmentation()" ] }, { "cell_type": "markdown", "id": "0811936b", "metadata": {}, "source": [ "The `preprocessed_xenium_crc` directory now contains all the preprocessed files. You can check the spatial (gene map) and segmentation files to confirm that they are aligned."
] }, { "cell_type": "code", "execution_count": null, "id": "2b6d5ad9", "metadata": {}, "outputs": [], "source": [ "# Check that the gene map and the nuclei segmentation mask are aligned\n", "import numpy as np\n", "import matplotlib.pyplot as plt\n", "\n", "gene_map = np.load(save_dir + \"/gene_map.npy\")\n", "segmentation_mask = np.load(save_dir + \"/segmentation_mask.npy\")\n", "\n", "# Sum over the last axis to collapse the gene map into a single 2D image\n", "gene_map_sum = gene_map.sum(axis=-1)" ] }, { "cell_type": "code", "execution_count": null, "id": "6ca217bd", "metadata": {}, "outputs": [], "source": [ "# Show the summed gene map and the binary nuclei mask side by side\n", "fig, ax = plt.subplots(1,2, figsize=(12,5))\n", "ax[0].imshow(gene_map_sum)\n", "ax[0].set_title(\"Gene expression map sum\")\n", "ax[1].imshow(segmentation_mask > 0)\n", "ax[1].set_title(\"Nuclei segmentation mask\")\n", "plt.show()" ] }, { "cell_type": "markdown", "id": "50d25b94", "metadata": {}, "source": [ "### Running CellART" ] }, { "cell_type": "markdown", "id": "38701821", "metadata": {}, "source": [ "NOTE: This part of the code may take hours to run, so we highly recommend not running it directly in the notebook."
] }, { "cell_type": "code", "execution_count": null, "id": "f9f99148", "metadata": {}, "outputs": [], "source": [ "import cellart\n", "from pathlib import Path\n", "import wandb\n", "import os\n", "\n", "# Preprocessed data\n", "save_dir = './preprocessed_xenium_crc/'\n", "# Directory to store all results\n", "log_dir = \"./results_xenium_crc/\"\n", "\n", "manager = cellart.ExperimentManager(\n", " # Basic input data settings (must be specified)\n", " gene_map=os.path.join(save_dir, \"gene_map.npy\"),\n", " nuclei_mask=os.path.join(save_dir, \"segmentation_mask.npy\"),\n", " basis=os.path.join(save_dir, \"basis.npy\"),\n", " gene_names=os.path.join(save_dir, \"filtered_gene_names.txt\"),\n", " celltype_names=os.path.join(save_dir, \"celltype_names.txt\"),\n", " log_dir=log_dir,\n", "\n", " # Training parameters (adjust based on convergence and wandb visualization)\n", " epochs=200, \n", " seg_training_epochs=10,\n", " deconv_warmup_epochs=100,\n", "\n", " pred_period=50,\n", " gpu=\"0\"\n", ")\n", "\n", "# Update options\n", "opt = manager.get_opt()\n", "print(opt)" ] }, { "cell_type": "code", "execution_count": null, "id": "b6918838", "metadata": {}, "outputs": [], "source": [ "# Set up wandb for logging and visualization\n", "run = wandb.init(project=\"CellART\", dir=manager.get_log_dir(), config=opt,\n", " name=os.path.basename(os.path.normpath(manager.get_log_dir())))" ] }, { "cell_type": "code", "execution_count": null, "id": "f1ca389c", "metadata": {}, "outputs": [], "source": [ "# Set up dataset\n", "dataset = cellart.SSTDataset(manager)\n", "gene_map_shape = dataset.gene_map.shape\n", "\n", "# Initialize and train the CellART model\n", "model = cellart.CellARTModel(manager, gene_map_shape, len(dataset.coords_starts))\n", "model.train_model(dataset)" ] }, { "cell_type": "markdown", "id": "ffe61514", "metadata": {}, "source": [ "### Check the output of CellART" ] }, { "cell_type": "code", "execution_count": 180, "id": "72b25c73", "metadata": {}, "outputs": 
[], "source": [ "# Load annotated adata at epoch 200\n", "adata = sc.read(os.path.join(\"./results_xenium_crc/\", \"epoch_200\", \"cell_deconv.h5ad\"))\n", "# Load segmentation\n", "segmentation_mask = np.load(os.path.join(\"./results_xenium_crc/\", \"new_segmentation_mask.npy\")).astype(\"int32\")" ] }, { "cell_type": "code", "execution_count": 181, "id": "32879f09", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
| cell_id | x | y | celltype |
|---|---|---|---|
| 16712 | 383 | 1231 | Tumor III |
| 16713 | 390 | 1254 | Tumor III |
| 16714 | 385 | 1249 | Tumor III |
| 16715 | 377 | 1244 | Tumor III |
| 16716 | 384 | 1240 | Tumor III |