Stanford University AI Foundation Model Enhances Cancer Diagnosis and Tailors Treatment
Feb 04, 2025 / NVIDIA –
A new study and AI model from researchers at Stanford University are streamlining cancer diagnostics, treatment planning, and prognosis prediction. The model, named MUSK (Multimodal transformer with Unified maSKed modeling), aims to advance precision oncology by tailoring treatment plans to each patient based on their unique medical data.
“Multimodal foundation models are a new frontier in medical AI research,” said Ruijiang Li, an associate professor of radiation oncology and study senior author. “Recently, vision–language foundation models have been developed for medicine, particularly in the field of pathology. However, existing studies use off-the-shelf foundation models that require paired image–text data for pretraining. Despite extensive efforts that led to the curation of 1M pathology image–text pairs, it’s still insufficient to fully capture the diversity of the entire disease spectrum.”
Oncologists rely on many data sources when considering a patient’s condition and planning optimal treatments. However, integrating and interpreting complex medical data remains difficult for doctors and AI models. The study, recently published in Nature, highlights how MUSK could help doctors make more accurate and informed decisions while also solving this long-standing challenge in medical AI.
Using deep learning, MUSK processes clinical text data (such as doctors’ notes) and pathology images (like histology slides) to identify patterns that may not be immediately obvious to doctors, leading to better clinical insights.
To do so, it uses a two-step multimodal transformer model. First, it learns from large amounts of unpaired data, pulling useful features from the text and images. It then fine-tunes its understanding of the data by linking paired image-text examples, which helps it recognize different types of cancer, predict biomarkers, and suggest effective treatment options.
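The paper details MUSK’s actual architecture; purely as an illustration of this general two-phase recipe (masked modeling on unpaired data, followed by contrastive alignment on paired data), here is a minimal PyTorch sketch with toy encoders, in which the dimensions, vocabulary, and loss choices are all assumptions rather than the authors’ implementation:

```python
# Simplified sketch of a two-phase multimodal pretraining recipe
# (illustrative only; dimensions and losses are assumptions, not MUSK's code).
import torch
import torch.nn as nn
import torch.nn.functional as F

D = 256  # shared embedding width (assumed)

class Encoder(nn.Module):
    """Tiny stand-in for the image or text transformer tower."""
    def __init__(self, vocab):
        super().__init__()
        self.embed = nn.Embedding(vocab, D)
        layer = nn.TransformerEncoderLayer(D, nhead=4, batch_first=True)
        self.tower = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(D, vocab)  # reconstructs masked tokens

    def forward(self, tokens):
        h = self.tower(self.embed(tokens))
        return h, self.head(h)

def masked_modeling_loss(encoder, tokens, mask_id, p=0.15):
    """Phase 1: mask random tokens and train the tower to reconstruct them."""
    mask = torch.rand(tokens.shape) < p
    corrupted = tokens.masked_fill(mask, mask_id)
    _, logits = encoder(corrupted)
    return F.cross_entropy(logits[mask], tokens[mask])

def contrastive_loss(img_emb, txt_emb, temperature=0.07):
    """Phase 2: CLIP-style alignment of paired image/text embeddings."""
    img = F.normalize(img_emb, dim=-1)
    txt = F.normalize(txt_emb, dim=-1)
    logits = img @ txt.t() / temperature
    labels = torch.arange(len(logits))
    return (F.cross_entropy(logits, labels) +
            F.cross_entropy(logits.t(), labels)) / 2

# Toy data: image "tokens" (e.g., patch indices) and text tokens.
img_enc, txt_enc = Encoder(vocab=1000), Encoder(vocab=1000)
img_tokens = torch.randint(0, 999, (8, 64))
txt_tokens = torch.randint(0, 999, (8, 32))

# Phase 1: unpaired masked modeling on each modality independently.
loss1 = (masked_modeling_loss(img_enc, img_tokens, mask_id=999) +
         masked_modeling_loss(txt_enc, txt_tokens, mask_id=999))

# Phase 2: align pooled embeddings of *paired* image-text examples.
img_emb = img_enc(img_tokens)[0].mean(dim=1)
txt_emb = txt_enc(txt_tokens)[0].mean(dim=1)
loss2 = contrastive_loss(img_emb, txt_emb)
(loss1 + loss2).backward()
```

The property the sketch mirrors is that phase one needs no pairing between images and text at all, which is what lets pretraining scale beyond curated image–text pairs, the limitation Li describes above.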
The researchers pretrained the AI model on one of the biggest datasets in the field, using 50M pathology images from 11,577 patients across 33 tumor types and 1B pathology-related text tokens.
According to Jinxi Xiang, study lead author and postdoctoral scholar in radiation physics, the pretraining was conducted over 10 days using 64 NVIDIA V100 Tensor Core GPUs across eight nodes, enabling MUSK to process vast amounts of pathology images and clinical text efficiently. A secondary pretraining phase and ablation studies used NVIDIA A100 80GB Tensor Core GPUs, and the researchers used NVIDIA RTX A6000 GPUs for evaluating downstream tasks. The framework was accelerated with the NVIDIA CUDA and NVIDIA cuDNN libraries for optimized performance.
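For context on what a multi-node run of this kind involves, below is a minimal PyTorch DistributedDataParallel skeleton; the stand-in model, batch, and launch parameters are illustrative assumptions, not the study’s training code:

```python
# Minimal multi-node data-parallel training skeleton (illustrative only).
# Launch on each of the 8 nodes with, e.g.:
#   torchrun --nnodes=8 --nproc_per_node=8 \
#            --rdzv_backend=c10d --rdzv_endpoint=<master-host>:29500 train.py
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

dist.init_process_group(backend="nccl")       # NCCL backend for NVIDIA GPUs
local_rank = int(os.environ["LOCAL_RANK"])    # set by torchrun
torch.cuda.set_device(local_rank)

model = torch.nn.Linear(512, 512).cuda(local_rank)  # stand-in for the real model
model = DDP(model, device_ids=[local_rank])

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
batch = torch.randn(32, 512, device=local_rank)     # stand-in for a data loader
loss = model(batch).pow(2).mean()                   # dummy objective
loss.backward()                # DDP all-reduces gradients across all 64 GPUs
optimizer.step()
dist.destroy_process_group()
```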

When tested on 23 pathology benchmarks, MUSK outperformed existing AI models in several key areas. It excelled at matching pathology images with the corresponding medical text, making it more effective at gathering relevant patient information. It also answered pathology-related questions, such as identifying a cancerous area or predicting biomarker presence, with 73% accuracy.
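Cross-modal matching of this kind is typically done by embedding images and text into a shared space and ranking candidates by cosine similarity. The toy sketch below shows that retrieval step with placeholder embeddings; the tensor names and dimensions are assumptions, not MUSK’s actual API:

```python
# Illustrative image-to-text retrieval by cosine similarity in a shared
# embedding space (placeholder tensors, not MUSK's actual API).
import torch
import torch.nn.functional as F

# Pretend these came from the image and text encoders: one slide embedding
# and four candidate report embeddings, all L2-normalized.
image_emb = F.normalize(torch.randn(1, 256), dim=-1)
report_embs = F.normalize(torch.randn(4, 256), dim=-1)

scores = image_emb @ report_embs.t()     # cosine similarities, shape (1, 4)
best = scores.argmax(dim=-1).item()      # index of the best-matching report
print(f"best-matching report: {best}, similarity {scores[0, best]:.3f}")
```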
It improved detection and classification of cancer subtypes, including breast, lung, and colorectal cancer, by up to 10%, which could help with early diagnosis and treatment planning. It also detected breast cancer biomarkers with an AUC (area under the curve, a measure of model accuracy) of 83%.
Additionally, MUSK predicted cancer survival outcomes with 75% accuracy, and predicted which lung and gastro-esophageal cancers would respond to immunotherapy with 77% accuracy. This is a significant improvement over standard clinical biomarkers, which are only 60-65% accurate.
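For readers unfamiliar with these metrics: accuracy counts correct predictions at a fixed decision threshold, while AUC measures how well a model ranks positive cases above negative ones across all thresholds. A quick scikit-learn illustration on invented numbers:

```python
# Toy comparison of accuracy and AUC on invented predictions
# (the numbers are made up for illustration, not the study's data).
from sklearn.metrics import accuracy_score, roc_auc_score

y_true = [1, 0, 1, 1, 0, 0, 1, 0]   # e.g., responder (1) vs. non-responder (0)
y_score = [0.9, 0.2, 0.7, 0.6, 0.4, 0.3, 0.8, 0.55]  # model probabilities

y_pred = [int(s >= 0.5) for s in y_score]  # threshold at 0.5 for accuracy
print("accuracy:", accuracy_score(y_true, y_pred))  # fraction of correct calls
print("AUC:", roc_auc_score(y_true, y_score))       # ranking quality; 0.5 = chance
```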
“One striking finding is that AI models that integrate multi-modal data consistently outperform those based on imaging or text data alone, highlighting the power of a multimodal approach,” Li said. “The true value of MUSK lies in its ability to leverage large-scale unpaired image and text data for pretraining, which is a substantial increase over existing models that require paired data.”
A core strength of MUSK is that it can adapt to different clinical settings with little additional training. This could improve efficiency in oncology workflows and help doctors diagnose cancer faster while tailoring treatments for better patient outcomes.
Their future work will focus on validating the model in multi-institution cohorts of patients from diverse populations and for high-stakes applications such as treatment decision-making. The researchers note that prospective validation in clinical trials will be required for regulatory approval.
“We are also working on an extension of the MUSK approach from digital pathology to other types of data such as radiology images and genomic data,” said Li.
The researchers’ work, including installation instructions, model weights, evaluation code, and sample data, is available on GitHub.