{ "cells": [ { "cell_type": "markdown", "id": "4d4d22ec", "metadata": {}, "source": [ "## Running CellART on the Visium HD colorectal cancer dataset" ] }, { "cell_type": "markdown", "id": "9f0ca67c", "metadata": {}, "source": [ "### Download data" ] }, { "cell_type": "markdown", "id": "54de16b5", "metadata": {}, "source": [ "The Visium HD colorectal cancer dataset can be obtained from the 10x Genomics website [here](https://www.10xgenomics.com/products/visium-hd-spatial-gene-expression/dataset-human-crc) under the name “Visium HD, Sample P2 CRC”. The cell below creates a new data directory and downloads the required Visium HD files into it." ] }, { "cell_type": "code", "execution_count": null, "id": "db845d0d", "metadata": {}, "outputs": [], "source": [ "%%bash\n", "# Shell commands, run with the %%bash cell magic\n", "mkdir -p ./visiumhd_crc\n", "cd ./visiumhd_crc\n", "\n", "curl -O https://cf.10xgenomics.com/samples/spatial-exp/3.0.0/Visium_HD_Human_Colon_Cancer_P2/Visium_HD_Human_Colon_Cancer_P2_tissue_image.btf\n", "curl -O https://cf.10xgenomics.com/samples/spatial-exp/3.0.0/Visium_HD_Human_Colon_Cancer_P2/Visium_HD_Human_Colon_Cancer_P2_alignment_file.json\n", "curl -O https://cf.10xgenomics.com/samples/spatial-exp/3.0.0/Visium_HD_Human_Colon_Cancer_P2/Visium_HD_Human_Colon_Cancer_P2_binned_outputs.tar.gz\n", "curl -O https://cf.10xgenomics.com/samples/spatial-exp/3.0.0/Visium_HD_Human_Colon_Cancer_P2/Visium_HD_Human_Colon_Cancer_P2_spatial.tar.gz\n", "\n", "# Extract the archives (creates binned_outputs/ and spatial/)\n", "tar -xzvf Visium_HD_Human_Colon_Cancer_P2_binned_outputs.tar.gz\n", "tar -xzvf Visium_HD_Human_Colon_Cancer_P2_spatial.tar.gz" ] }, { "cell_type": "markdown", "id": "04365484", "metadata": {}, "source": [ "After extraction, the data directory contains the `binned_outputs` and `spatial` directories. The paired scRNA-seq reference, restricted to patient 2, can be downloaded [here](https://drive.google.com/file/d/1kzNZq7h4V-JyaBcjJ1Kcz-JSlLFNAQrY/view?usp=drive_link). Save this reference file, `adata_sc_p2.h5ad`, in the same `./visiumhd_crc` data directory. 
Now you have prepared all the raw data needed to run CellART." ] }, { "cell_type": "markdown", "id": "2dce3b68", "metadata": {}, "source": [ "### Preprocess" ] }, { "cell_type": "code", "execution_count": null, "id": "be066ac3", "metadata": {}, "outputs": [], "source": [ "import os\n", "# Raise OpenCV's image-size limit for the large H&E image; this must be set before cv2 is imported\n", "os.environ[\"OPENCV_IO_MAX_IMAGE_PIXELS\"] = str(2**40)\n", "import cv2\n", "from cellart.utils.preprocess import SingleCellPreprocessor, VisiumHDPreprocessor\n", "from cellart.utils.io import load_list\n", "import scanpy as sc\n", "\n", "# Directory for the preprocessed outputs\n", "save_dir = './preprocessed_visiumhd_crc/'\n", "# Path to the 2 µm binned data\n", "path = \"./visiumhd_crc/binned_outputs/square_002um/\"\n", "# Path to the full-resolution H&E image\n", "source_image_path = \"./visiumhd_crc/Visium_HD_Human_Colon_Cancer_P2_tissue_image.btf\"\n", "# Path to the Space Ranger spatial directory\n", "spaceranger_image_path = \"./visiumhd_crc/spatial/\"\n", "\n", "# Spatial preprocessing: nuclei segmentation on the H&E image\n", "st_preprocessor = VisiumHDPreprocessor(path, source_image_path, spaceranger_image_path, save_dir)\n", "st_preprocessor.get_nuclei_segmentation()\n", "\n", "# Preprocess the patient 2 scRNA-seq reference (3000 highly variable genes, seurat_v3 method)\n", "sc_adata = sc.read(\"./visiumhd_crc/adata_sc_p2.h5ad\")\n", "sc_preprocessor = SingleCellPreprocessor(sc_adata, celltype_col=\"celltype\", save_path=save_dir, st_gene_list=load_list(os.path.join(save_dir, \"st_gene_list.txt\")))\n", "sc_preprocessor.preprocess(hvg_method=\"seurat_v3\", n_hvg=3000)\n", "\n", "# Build the spatial inputs for CellART using the filtered gene list\n", "st_preprocessor.prepare_sst(load_list(os.path.join(save_dir, \"filtered_gene_names.txt\")))" ] }, { "cell_type": "markdown", "id": "0811936b", "metadata": {}, "source": [ "The preprocessed files are now in the `./preprocessed_visiumhd_crc/` directory. Before training, check that the gene expression map and the nuclei segmentation mask are spatially aligned." 
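] }, { "cell_type": "markdown", "id": "5a1e9c3b", "metadata": {}, "source": [ "As an optional, rough complement to the visual check, the sketch below shifts the nuclei mask by a few pixels in each direction and reports the offset at which nuclei pixels carry the highest mean total expression. It is not part of CellART: the function names are hypothetical, and it assumes that `gene_map.npy` and `segmentation_mask.npy` share the same (H, W) pixel grid, with genes on the last axis of the gene map, as the plotting code below implies.

```python
import numpy as np


def mean_expression_in_nuclei(expr_sum, mask):
    # Mean total expression over pixels labelled as nuclei (mask > 0).
    nuclei = mask > 0
    if not nuclei.any():
        return 0.0
    return float(expr_sum[nuclei].mean())


def best_mask_offset(expr_sum, mask, max_shift=3):
    # Hypothetical helper: try every (dy, dx) shift of the mask within
    # +/- max_shift pixels and return (best_offset, best_score, score_at_zero).
    # np.roll wraps around the borders, which is negligible for small
    # shifts on large images.
    if expr_sum.shape != mask.shape:
        raise ValueError(f'shape mismatch: {expr_sum.shape} vs {mask.shape}')
    scores = {}
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            shifted = np.roll(mask, shift=(dy, dx), axis=(0, 1))
            scores[(dy, dx)] = mean_expression_in_nuclei(expr_sum, shifted)
    best = max(scores, key=scores.get)
    return best, scores[best], scores[(0, 0)]
```

With the arrays loaded in the next cell, an offset from `best_mask_offset(gene_map_sum, segmentation_mask)` at or next to `(0, 0)` is consistent with good registration, while a large offset suggests that the H&E image and the spot grid are misaligned. Each shift copies the whole mask, so on a full-resolution Visium HD image it is faster to run this on a cropped region." 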
] }, { "cell_type": "code", "execution_count": null, "id": "2b6d5ad9", "metadata": {}, "outputs": [], "source": [ "# Load the preprocessed gene map and nuclei mask\n", "import os\n", "import numpy as np\n", "import matplotlib.pyplot as plt\n", "\n", "gene_map = np.load(os.path.join(save_dir, \"gene_map.npy\"))\n", "segmentation_mask = np.load(os.path.join(save_dir, \"segmentation_mask.npy\"))\n", "\n", "# Total expression per pixel, summed over the last (gene) axis\n", "gene_map_sum = gene_map.sum(axis=-1)" ] }, { "cell_type": "code", "execution_count": null, "id": "6ca217bd", "metadata": {}, "outputs": [], "source": [ "# Plot total expression next to the nuclei mask to compare their layout\n", "fig, ax = plt.subplots(1, 2, figsize=(12, 5))\n", "ax[0].imshow(gene_map_sum)\n", "ax[0].set_title(\"Gene expression map sum\")\n", "ax[1].imshow(segmentation_mask > 0)\n", "ax[1].set_title(\"Nuclei segmentation mask\")\n", "plt.show()" ] }, { "cell_type": "markdown", "id": "50d25b94", "metadata": {}, "source": [ "### Running CellART" ] }, { "cell_type": "markdown", "id": "38701821", "metadata": {}, "source": [ "NOTE: This part can take hours to run, so we strongly recommend running it as a standalone script rather than directly in the notebook." 
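] }, { "cell_type": "markdown", "id": "7d2f4b60", "metadata": {}, "source": [ "One way to follow this advice is to copy the three training cells below into a standalone script (for example a hypothetical `train_cellart.py`) and launch it as a separate process that writes its output to a log file. The sketch below does this from Python with the standard `subprocess` module; the function name and file names are illustrative assumptions, not part of CellART.

```python
import subprocess
import sys


def launch_in_background(script_path, log_path):
    # Hypothetical helper: run `python script_path` in a separate process
    # with the same interpreter as the notebook, sending stdout and stderr
    # to log_path. start_new_session=True (POSIX only) puts the child in
    # its own session, so it is not tied to the kernel's process group.
    with open(log_path, 'w') as log:
        proc = subprocess.Popen(
            [sys.executable, str(script_path)],
            stdout=log,
            stderr=subprocess.STDOUT,
            start_new_session=True,
        )
    # The child keeps its own copy of the log file descriptor, so the
    # parent can close its handle right away.
    return proc
```

For example, `proc = launch_in_background('train_cellart.py', 'train_cellart.log')` returns immediately; `proc.poll()` stays `None` while training is still running, and the log file can be inspected at any time. Running `nohup python train_cellart.py > train_cellart.log 2>&1 &` from a terminal has much the same effect." 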
] }, { "cell_type": "code", "execution_count": null, "id": "f9f99148", "metadata": {}, "outputs": [], "source": [ "import os\n", "import cellart\n", "import wandb\n", "\n", "# Preprocessed data\n", "save_dir = './preprocessed_visiumhd_crc/'\n", "# Directory to store all results\n", "log_dir = \"./results_visiumhd_crc/\"\n", "\n", "manager = cellart.ExperimentManager(\n", " # Basic input data settings (must be specified)\n", " gene_map=os.path.join(save_dir, \"gene_map.npy\"),\n", " nuclei_mask=os.path.join(save_dir, \"segmentation_mask.npy\"),\n", " basis=os.path.join(save_dir, \"basis.npy\"),\n", " gene_names=os.path.join(save_dir, \"filtered_gene_names.txt\"),\n", " celltype_names=os.path.join(save_dir, \"celltype_names.txt\"),\n", " log_dir=log_dir,\n", "\n", " # Training parameters (adjust based on convergence in the wandb curves)\n", " epochs=400,\n", " seg_training_epochs=15,\n", " deconv_warmup_epochs=200,\n", "\n", " pred_period=50,\n", " # GPU to use\n", " gpu=\"0\"\n", ")\n", "\n", "# Retrieve the resolved experiment options\n", "opt = manager.get_opt()\n", "print(opt)" ] }, { "cell_type": "code", "execution_count": null, "id": "b6918838", "metadata": {}, "outputs": [], "source": [ "# Set up wandb for logging and visualization\n", "run = wandb.init(project=\"CellART\", dir=manager.get_log_dir(), config=opt,\n", " name=os.path.basename(os.path.normpath(manager.get_log_dir())))" ] }, { "cell_type": "code", "execution_count": null, "id": "f1ca389c", "metadata": {}, "outputs": [], "source": [ "# Set up the dataset\n", "dataset = cellart.SSTDataset(manager)\n", "gene_map_shape = dataset.gene_map.shape\n", "\n", "# Initialize and train the CellART model\n", "model = cellart.CellARTModel(manager, gene_map_shape, len(dataset.coords_starts))\n", "model.train_model(dataset)\n", "\n", "# Close the wandb run once training has finished\n", "run.finish()" ] }, { "cell_type": "markdown", "id": "ffe61514", "metadata": {}, "source": [ "### Check the output of CellART" ] }, { "cell_type": "code", "execution_count": 3, "id": "07e0e6b2", "metadata": {}, "outputs": 
[], "source": [ "import os\n", "import numpy as np\n", "import scanpy as sc\n", "\n", "# Load the annotated AnnData saved at epoch 400\n", "adata = sc.read(os.path.join(\"./results_visiumhd_crc/\", \"epoch_400\", \"cell_deconv.h5ad\"))\n", "# Load the segmentation mask produced by CellART\n", "segmentation_mask = np.load(os.path.join(\"./results_visiumhd_crc/\", \"new_segmentation_mask.npy\")).astype(\"int32\")" ] }, { "cell_type": "code", "execution_count": 4, "id": "961b9f7e", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
|  | x | y | celltype |\n",
"|---|---|---|---|\n",
"| cell_id |  |  |  |\n",
"| 209338 | 389 | 19 | CD8 Cytotoxic T cell |\n",
"| 209346 | 360 | 25 | Plasma |\n",
"| 209365 | 344 | 320 | CD4 T cell |\n",
"| 209383 | 351 | 321 | CD4 T cell |\n",
"| 209404 | 391 | 30 | Endothelial |\n", "