Nilotpal Sanyal
Assistant Professor
Assistant Director, Data Analytics Lab
Department of Mathematical Sciences
University of Texas at El Paso
500 W University Ave
El Paso, TX 79968-0514
Office: Bell Hall 328
Phone: (915) 747-6763
E-mail: nsanyal@utep.edu
Personal Webpage
I am an Assistant Professor in the Department of Mathematical Sciences at the University of Texas at El Paso.
I obtained a PhD in Statistics from the University of Missouri-Columbia with a dissertation on Bayesian functional magnetic resonance imaging (fMRI) data analysis and Bayesian optimal design. Following that, I gained extensive postdoctoral research experience at Stanford University, the University of California-San Diego, and Texas A&M University, working on biological data applications.
My current research interests are Bayesian statistics, survival analysis, high-dimensional variable selection, nonparametric regression, statistical genetics and bioinformatics, and computational neuroscience.
I am truly passionate about teaching and have immense respect for the value of good teaching and good mentoring.
Research
Overview
My research lies at the intersection of statistical theory, machine learning, computation, and real-world applications, with a focus on developing scalable, interpretable, and computationally efficient statistical and machine learning methods. I work within both the Bayesian and frequentist paradigms and enjoy translating methodological advances into practical software tools.
My current research interests include Bayesian statistics, survival analysis, high-dimensional variable selection, nonparametric regression, statistical genetics and bioinformatics, and computational neuroscience. Applications of my work span omics, epidemiology, public health, and neuroscience, with an emphasis on extracting meaningful signals from complex, high-dimensional data.
Specific research areas
- Bayesian methods. Bayesian methods offer a modern, flexible way to learn from data by combining prior knowledge with evidence and updating beliefs as new information arrives. They let you reason under uncertainty in a principled and transparent way, making them especially powerful for real-world problems where data are complex, noisy, or limited.
For students: Widely used in machine learning and AI, genomics, neuroscience, and public health, Bayesian approaches offer students a powerful foundation for impactful research that integrates theory, computation, and real-world applications.
- High-dimensional variable selection and inference methods. Such methods focus on identifying a small number of truly important signals hidden within massive, high-dimensional datasets, where the number of variables can far exceed the number of observations. These methods are central to modern data science and machine learning, with exciting applications in public health, genomics (GWAS), and gene expression studies.
For students: This area is ideal for students who enjoy blending mathematical theory, scalable computation, and impactful applications. You will work on problems where smart modeling turns overwhelming data into clear scientific insight, with applications in genomics, public health, and modern machine learning.
- Multiscalar methods. Such methods address data that contain meaningful patterns at multiple scales or levels of resolution. They are especially valuable for analyzing image data, spatial or areal data, and time series, where underlying processes--such as fMRI brain activity--naturally operate across scales.
For students: This area is well suited for students who enjoy discovering hidden structure in complex data. You will combine statistical modeling, computation, and scientific reasoning to analyze phenomena that unfold across multiple scales, with applications ranging from brain imaging and spatial data to time-evolving systems.
- Survival data methods in the presence of competing risks. Such methods focus on accurately predicting the risk of an event due to a specific cause while accounting for other possible causes that can lead to the same outcome. These methods are essential in medical and public health studies--for example, estimating cardiovascular mortality while considering deaths from other causes such as accidents.
For students: This area is appealing for students who want to build rigorous statistical models with real clinical and public health impact. You will learn how to quantify cause-specific risks and translate complex survival data into meaningful insights for medical decision-making and risk assessment.

- Survival data methods in the presence of cure fraction. Such methods address situations where a subset of the population may never experience the event of interest. These methods are particularly important in biomedical and mental health studies, where some individuals--such as long-term meditators--may be effectively protected or "cured."
For students: This area is appealing for students interested in developing realistic survival models that reflect long-term protection and resilience. You will blend statistical theory and applied modeling to identify who remains at risk and who may never experience the event, with applications in biomedicine, mental health, and behavioral science.
- Gene by environment (GxE) interaction methods. Such methods aim to understand how genetic effects on traits or diseases are modified by environmental and lifestyle factors. By studying interactions such as genetics with pollution exposure or smoking behavior in diseases like lung cancer, these methods provide deeper insight into disease mechanisms and individual risk.
For students: If you're interested in how nature and nurture work together, this area lets you uncover how genetic risk depends on environment and lifestyle, developing models that provide deeper insight into disease mechanisms and individualized risk.
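As a small illustration of the belief-updating idea described under Bayesian methods above, here is a minimal sketch in Python. It assumes a simple success/failure setting with a conjugate Beta prior; the function names and the numbers are hypothetical and are not taken from any of my software packages.

```python
from fractions import Fraction


def beta_binomial_update(prior_a, prior_b, successes, failures):
    """Return the posterior Beta parameters after observing binomial data.

    With a Beta(prior_a, prior_b) prior on a success probability, the
    posterior is Beta(prior_a + successes, prior_b + failures).
    """
    return prior_a + successes, prior_b + failures


def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution, kept exact as a fraction."""
    return Fraction(a, a + b)


# Hypothetical data: start from a uniform Beta(1, 1) prior,
# then update the belief as two batches of observations arrive.
a, b = 1, 1
a, b = beta_binomial_update(a, b, successes=7, failures=3)
first_mean = beta_mean(a, b)

a, b = beta_binomial_update(a, b, successes=2, failures=8)
second_mean = beta_mean(a, b)

print(f"posterior mean after batch 1: {float(first_mean):.3f}")
print(f"posterior mean after batch 2: {float(second_mean):.3f}")
print(f"final posterior: Beta({a}, {b})")
```

The posterior from the first batch serves as the prior for the second, so processing the batches one at a time gives the same answer as processing all the data at once.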
I have developed/co-developed the R software packages S3VS, NLPwavelet, GWASinlps, CGEN, and BHMSMAfMRI based on my research. See the Software tab for more details about them.
Publications
- Google Scholar
- ResearchGate
- ORCiD
Software
Here are some software packages that I have developed/co-developed based on my research.
S3VS: This is an R software package that performs variable selection using the structured screen-and-select (S3VS) framework in linear models, generalized linear models with binary data, and survival models such as the Cox model and accelerated failure time (AFT) model. Description and download instructions are available at the package webpage at https://nilotpalsanyal.github.io/S3VS/.
NLPwavelet: This is an R software package that performs Bayesian wavelet analysis using individual non-local priors as described in Sanyal & Ferreira (2017) and non-local prior mixtures as described in Sanyal (2025). Description and download instructions are available at the package webpage at https://nilotpalsanyal.github.io/NLPwavelet/.
BHMSMAfMRI: This is an R software package that performs Bayesian hierarchical multi-subject multiscale analysis of functional MRI (fMRI) data, or other multiscale data, as described in Sanyal & Ferreira (2012). It uses a wavelet-based prior that borrows strength across subjects, and it provides smooth posterior estimates of the effect sizes along with samples from their posterior distribution. Description and download instructions are available at the package webpage at https://nilotpalsanyal.github.io/BHMSMAfMRI/.
GWASinlps: This is an R software package that performs Bayesian non-local prior based iterative variable selection for data from genome-wide association studies (GWAS), or other high-dimensional data, as described in Sanyal et al. (2019). Description and download instructions are available at the package webpage at https://nilotpalsanyal.github.io/GWASinlps/.
SPLC-RAT: My past colleagues at Stanford University developed this Shiny app based on our joint work on the development and validation of the first risk prediction tool for second primary lung cancer that incorporates comprehensive risk factors, including smoking information, medical history, treatment, and tumor characteristics, using large population-based data. It is available at https://splc-risk-prediction.shinyapps.io/SPLC-RiskAssessmentTool/.
Teaching Software
Univariate probability distribution viewer: A Shiny app to visualize various univariate probability distributions. Feel free to use it for non-commercial classroom teaching.
References:
Teaching
I ardently love to teach and have immense respect for the value of good teaching and good mentoring.
Current courses (Spring 2026)
- DS 6494 - Statistical Data Mining II, UTEP
- DS 6398 - Dissertation I, UTEP
- STAT 6396 - Graduate Research, UTEP
- DS 6390 - Data Science Research Collaborative, UTEP
- STAT 5399 - Thesis II, UTEP
Past Courses
- DS 6390 - Data Science Research Collaborative, UTEP
- STAT 6370 - Special Topics (Competing Risk Methods), UTEP
- STAT 6370 - Special Topics (Advanced Competing Risk Analysis), UTEP
- STAT 6370 - Special Topics (Advanced Regression Analysis), UTEP
- DS 6339 - Data Visualization, UTEP
- DS 6335 - Introduction to Data Science Collaborations, UTEP
- STAT 6329 - Statistical Programming, UTEP
- STAT 5396 - Graduate Research, UTEP
- STAT 5398 - Thesis I, UTEP
- STAT 5399 - Thesis II, UTEP
- STAT 3320 - Probability and Statistics, UTEP
- Statistical Data Analysis (with project supervision for 12 students), International Statistical Education Center, ISI, Kolkata, 2022-23.
- Statistical Methods, International Statistical Education Center, ISI, Kolkata, 2022-23.
- Descriptive Statistics, International Statistical Education Center, ISI, Kolkata, 2022-23.
Workshop teaching
- Special Lecture on Survival Analysis, Maulana Azad College, Kolkata, April 2023.
- R Sessions for CoxBoost modeling, Virtual workshop, Stanford University Quantitative Science Unit, January 2021.
- Random Forest for Competing Risk Data, Virtual workshop, Stanford University Quantitative Science Unit, December 2020.
- Predictive Modeling of Competing Risk Data Using Penalized Regression, Virtual workshop, Stanford University Quantitative Science Unit, November 2020.
- Time Series Analysis, Winter School on Statistical Data Analysis Methods, Indian Statistical Institute, Kolkata, February 2015.
- Introduction to R, Winter School on Statistical Data Analysis Methods, Indian Statistical Institute, Kolkata, February 2015.
- Descriptive Statistics, Winter School on Statistical Data Analysis Methods, Indian Statistical Institute, Kolkata, February 2015.
- Time Series Analysis, Short-term Course on Statistical Methods, Arya Vidyapeeth College, Guwahati, Assam, India, November 2014.
- Introduction to R, Short-term Course on Statistical Methods, Arya Vidyapeeth College, Guwahati, Assam, India, November 2014.
- Design of Experiments, Workshop on Techniques of Data Analysis, Dimapur Govt. College, Nagaland, India, September 2014.
- Time Series Analysis, Workshop on Techniques of Data Analysis, Dimapur Govt. College, Nagaland, India, September 2014.
- R for Time Series, Workshop on Techniques of Data Analysis, Dimapur Govt. College, Nagaland, India, September 2014.
Some materials from past teaching/workshops:
- Penalized regression analysis for competing risks data: Concepts and data analysis.
- Random forest analysis for competing risks data: Concepts and data analysis.
- Boosting for competing risks data: Concepts and data analysis.
- Competing risks data simulation
- Introductory time series analysis: Concepts and data analysis.
- Introduction to R and exercises.
Service
Learn
Here are some self-made precise guides for quick learning.
Links
Here are miscellaneous useful links for research.
Probability / Statistics / Linear algebra
- A History of the Central Limit Theorem - Hans Fischer
- A Geometrical Understanding of Matrices: Gregory Gundersen blog
- Affine transformations: Arcane Algorithm Archive
- What's So Special About Logit?: Statistical Horizons
Data Science
- Data Science Blog by Matthias Döring. A blog about everything related to data science and programming.
Computer
- Mac keyboard shortcuts
- Sublime Text Regular Expression Cheat Sheet
- LaTeX accents
- LaTeX Beamer themes
- Draw symbol to get LaTeX command
- Common Math Symbols in HTML, TeX, and Unicode
- Text to HTML
Free datasets / search
- UC Irvine Machine Learning Repository - Machine learning
- Arizona State Univ Datasets
- MIT-BIH Arrhythmia Database
- KDnuggets: Datasets for Data Science, Machine Learning, AI & Analytics
- KDD Cup Archives (the annual Data Mining and Knowledge Discovery competition)
- Kaggle Datasets - Miscellaneous
- Data.gov: The home of the U.S. Government's open data - Government
- Datahub.io - Miscellaneous
- Gene Expression Omnibus
- Google Dataset Search - Miscellaneous
- NASA Earth Data - Earth observation data
- CERN Open Data - Particle physics
- Global Health Observatory data repository: WHO - Health
- Tableau: Free Public Data Sets - Miscellaneous
- OpenML: A worldwide machine learning lab - Machine learning
- Data.world: 132355 free datasets - Miscellaneous
Free online courses
- MIT OpenCourseWare (OCW): Web based publication of virtually all MIT course content, open and available to the world.
Miscellaneous
- Scimago Journal & Country Rank
- Statistics and Probability Journals
- Convert DOI/ArXiv/ISBN to BibTeX, etc.
- A website of timelines
- Arcane Algorithm Archive: A collaborative effort to create a guide for all important algorithms in all languages. https://www.algorithm-archive.org. The corresponding YouTube channel is here.
- Philosophy of mathematics
- Visa rules for all countries
- External Funding Sources
- Choose graphics by data
Statistics Teaching
Bioinformatics / Biomedical Informatics / Biostatistics
- NCBI (National Center for Biotechnology Information): The National Center for Biotechnology Information advances science and health by providing access to biomedical and genomic information.
- dbGaP (database of Genotypes and Phenotypes): An archive and distribution center for the description and results of studies which investigate the interaction of genotype and phenotype. These studies include genome-wide association (GWAS), medical resequencing, molecular diagnostic assays, as well as association between genotype and non-clinical traits.
- dbVar (Database of Genomic Structural Variation): The dbVar database has been developed to archive information associated with large scale genomic variation, including large insertions, deletions, translocations and inversions. In addition to archiving variation discovery, dbVar also stores associations of defined variants with phenotype information.
- dbSNP (Database of Short Genetic Variations): Includes single nucleotide variations, microsatellites, and small-scale insertions and deletions. dbSNP contains population-specific frequency and genotype data, experimental conditions, molecular context, and mapping information for both neutral variations and clinical mutations.
- GenBank: The NIH genetic sequence database, an annotated collection of all publicly available DNA sequences. GenBank is part of the International Nucleotide Sequence Database Collaboration, which comprises the DNA DataBank of Japan (DDBJ), the European Molecular Biology Laboratory (EMBL), and GenBank at NCBI. These three organizations exchange data on a daily basis.
- Gene: A searchable database of genes, focusing on genomes that have been completely sequenced and that have an active research community to contribute gene-specific data. Information includes nomenclature, chromosomal localization, gene products and their attributes (e.g., protein interactions), associated markers, phenotypes, interactions, and links to citations, sequences, variation details, maps, expression reports, homologs, protein domain content, and external databases.
- Gene Expression Omnibus (GEO) Database: A public functional genomics data repository supporting MIAME-compliant data submissions. Array- and sequence-based data are accepted and tools are provided to help users query and download experiments and curated gene expression profiles.
- Genome: Contains sequence and map data from the whole genomes of over 1000 organisms. The genomes represent both completely sequenced organisms and those for which sequencing is in progress. All three main domains of life (bacteria, archaea, and eukaryota) are represented, as well as many viruses, phages, viroids, plasmids, and organelles.
- RefSeq (Reference Sequence): A collection of curated, non-redundant genomic DNA, transcript (RNA), and protein sequences produced by NCBI. RefSeqs provide a stable reference for genome annotation, gene identification and characterization, mutation and polymorphism analysis, expression studies, and comparative analyses. The RefSeq collection is accessed through the Nucleotide and Protein databases.
- PubMed: A database of citations and abstracts for biomedical literature from MEDLINE and additional life science journals. Links are provided when full text versions of the articles are available via PubMed Central or other websites.
- SRA (Sequence Read Archive): The Sequence Read Archive (SRA) stores sequencing data from the next generation of sequencing platforms including Roche 454 GS System®, Illumina Genome Analyzer®, Life Technologies AB SOLiD System®, Helicos Biosciences Heliscope®, Complete Genomics®, and Pacific Biosciences SMRT®.
- SARS CoV: A summary of data for the SARS coronavirus (CoV), including links to the most recent sequence data and publications, links to other SARS related resources, and a pre-computed alignment of genome sequences from various isolates.
- cBioPortal for Cancer Genomics
- GWAS Catalog
- GSEA (Gene Set Enrichment Analysis): Gene Set Enrichment Analysis (GSEA) is a computational method that determines whether an a priori defined set of genes shows statistically significant, concordant differences between two biological states (e.g. phenotypes).
- MSigDB (Molecular Signature Database): A collection of annotated gene sets for use with GSEA software.
- NITRC (NeuroImaging Tools & Resources Collaboratory): Award-winning free web-based resource offering comprehensive information on an ever expanding scope of neuroinformatics software and data.
- TCGA (The Cancer Genome Atlas Program): The Cancer Genome Atlas (TCGA), a landmark cancer genomics program, molecularly characterized over 20,000 primary cancer and matched normal samples spanning 33 cancer types. This joint effort between NCI and the National Human Genome Research Institute began in 2006, bringing together researchers from diverse disciplines and multiple institutions. Over the next dozen years, TCGA generated over 2.5 petabytes of genomic, epigenomic, transcriptomic, and proteomic data. The data, which has already led to improvements in our ability to diagnose, treat, and prevent cancer, will remain publicly available for anyone in the research community to use.
- Handbook of Biological Statistics - John H. McDonald
Funding sources
Others
Alongside academic research I have multifarious interests. Feel free to explore some of them here, to comment, and to connect.
UTEP Miscellaneous Links
- UTEP
- UTEP Mathematical Sciences
- MyUTEP / Single Sign On
- UTEP E-mail
- Blackboard
- Tech Support, Service Desk
- Academic Calendar
- On-campus Housing: Miner Village, Miner Canyon, and Miner Heights
- UTEP News
- Center for Faculty Leadership and Development
- Tenure and Promotion
- Holiday Schedule
- Travel Office
- Shuttles
- Pay Dates
- Computing Equipment
- Computer Purchases
- Campus Directory
- Campus Map
- Events Calendar
- Bookstore and Shop
- Library
- Building Addresses
- Blackboard Tutorials
El Paso Miscellaneous Links
- El Paso Official Website: Official website of the city with information on local government services, departments, permits, and regulations.
- 26 Things You Need To Know About El Paso Before You Move There
- What do I need to know before moving to El Paso, TX?
- Visit El Paso: Comprehensive resource for exploring tourism, attractions, events, dining, and recreational opportunities in El Paso.
- Electric
- Water
- Sun Metro: Provides information on public transportation services, routes, schedules, and fares in El Paso.
- County Transportation
- Public Libraries
- Zoo
- Museum of Art
- Museum of Archaeology
- Museum of History
- Symphony Orchestra: Provides details on upcoming concerts, ticket information, and educational programs related to classical music.
- Parks and Recreation
- County Parks & Recreation
- Convention Center
- Craigslist El Paso
- Community College: Provides information on academic programs, admissions, campus locations, and resources for students.
- El Paso International Airport
- Public Health Department: Offers information on healthcare services, immunizations, disease prevention, and community health programs.
- El Paso Times: A local newspaper that covers news, events, and community updates in El Paso.
- Chamber of Commerce: Provides business resources, networking opportunities, and information on local businesses and industries.
- County: Offers resources on government services, departments, taxes, property records, and elections.
- County Clerk: Provides access to various services such as marriage licenses, birth and death certificates, property records, and voting information.
- Community Foundation: Offers information on philanthropic initiatives, grant opportunities, and community programs aimed at improving the quality of life in El Paso.
- Independent School District Police Department: Provides information on school safety, emergency protocols, and resources related to the El Paso Independent School District Police Department.
- County Tax Assessor-Collector: Provides information on property taxes, motor vehicle registration, and other tax-related services.
- 311: A centralized platform where residents can submit service requests, report issues, and seek information on various city services.
- Sun City Driving School West
- Cherokee Driving School