
Nilotpal Sanyal

Assistant Professor
Assistant Director, Data Analytics Lab

Department of Mathematical Sciences
University of Texas at El Paso
500 W University Ave
El Paso, TX 79968-0514
Office: Bell Hall 328
Phone: (915)747-6763
E-mail: nsanyal@utep.edu
Personal Webpage


I am an Assistant Professor in the Department of Mathematical Sciences at the University of Texas at El Paso.

I obtained a PhD in Statistics from the University of Missouri-Columbia with a dissertation on Bayesian functional magnetic resonance imaging (fMRI) data analysis and Bayesian optimal design. I then gained extensive postdoctoral research experience at Stanford University, the University of California-San Diego, and Texas A&M University, working on applications to biological data.

My current research interests are Bayesian statistics, survival analysis, high-dimensional variable selection, nonparametric regression, statistical genetics and bioinformatics, and computational neuroscience.

I am truly passionate about teaching and have immense respect for the value of good teaching and good mentoring.




Research

Overview

My research lies at the intersection of statistical theory, machine learning, computation, and real-world applications, with a focus on developing scalable, interpretable, and computationally efficient statistical and machine learning methods. I work within both the Bayesian and frequentist paradigms and enjoy translating methodological advances into practical software tools.

My current research interests include Bayesian statistics, survival analysis, high-dimensional variable selection, nonparametric regression, statistical genetics and bioinformatics, and computational neuroscience. Applications of my work span omics, epidemiology, public health, and neuroscience, with an emphasis on extracting meaningful signals from complex, high-dimensional data.


Specific research areas

  • Bayesian methods. Bayesian methods offer a modern, flexible way to learn from data by combining prior knowledge with evidence and updating beliefs as new information arrives. They let you reason under uncertainty in a principled and transparent way, making them especially powerful for real-world problems where data are complex, noisy, or limited.

    For students: Widely used in machine learning and AI, genomics, neuroscience, and public health, Bayesian approaches offer students a powerful foundation for impactful research that integrates theory, computation, and real-world applications.

  • High-dimensional variable selection and inference methods. Such methods focus on identifying a small number of truly important signals hidden within massive, high-dimensional datasets, where the number of variables can far exceed the number of observations. These methods are central to modern data science and machine learning, with exciting applications in public health, genome-wide association studies (GWAS), and gene expression studies.

    For students: This area is ideal for students who enjoy blending mathematical theory, scalable computation, and impactful applications. You will work on problems where smart modeling turns overwhelming data into clear scientific insight, with applications in genomics, public health, and modern machine learning.

  • Multiscalar methods. Such methods address data that contain meaningful patterns at multiple scales or levels of resolution. They are especially valuable for analyzing image data, spatial or areal data, and time series, where underlying processes, such as fMRI brain activity, naturally operate across scales.

    For students: This area is well suited for students who enjoy discovering hidden structure in complex data. You will combine statistical modeling, computation, and scientific reasoning to analyze phenomena that unfold across multiple scales, with applications ranging from brain imaging and spatial data to time-evolving systems.

  • Survival data methods in the presence of competing risks. Such methods focus on accurately predicting the risk of an event due to a specific cause while accounting for other possible causes that can lead to the same outcome. These methods are essential in medical and public health studies, for example, when estimating cardiovascular mortality in the presence of deaths from other causes such as accidents.

    For students: This area is appealing for students who want to build rigorous statistical models with real clinical and public health impact. You will learn how to quantify cause-specific risks and translate complex survival data into meaningful insights for medical decision-making and risk assessment.

  • Survival data methods in the presence of a cure fraction. Such methods address situations where a subset of the population may never experience the event of interest. These methods are particularly important in biomedical and mental health studies, where some individuals, such as long-term meditators, may be effectively protected or "cured."

    For students: This area is appealing for students interested in developing realistic survival models that reflect long-term protection and resilience. You will blend statistical theory and applied modeling to identify who remains at risk and who may never experience the event, with applications in biomedicine, mental health, and behavioral science.

  • Gene by environment (GxE) interaction methods. Such methods aim to understand how genetic effects on traits or diseases are modified by environmental and lifestyle factors. By studying interactions such as genetics with pollution exposure or smoking behavior in diseases like lung cancer, these methods provide deeper insight into disease mechanisms and individual risk.

    For students: If you're interested in how nature and nurture work together, this area lets you uncover how genetic risk depends on environment and lifestyle, developing models that provide deeper insight into disease mechanisms and individualized risk.
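The competing-risks item above concerns estimating the risk from one cause (for example, cardiovascular death) while other causes remain possible. To illustrate the idea, the sketch below implements the standard nonparametric Aalen-Johansen estimator of the cumulative incidence function. This is a textbook estimator, not one of the methods from my own papers, and the toy data are hypothetical. A status of 0 marks a censored observation, and any other integer is a cause label.

```python
from collections import Counter


def cumulative_incidence(times, causes, cause_of_interest):
    """Aalen-Johansen estimate of the cumulative incidence function (CIF).

    times  -- follow-up times, one per subject
    causes -- 0 for a censored subject, otherwise an integer cause label
    Returns a list of (event_time, CIF) pairs at each distinct event time.
    """
    data = list(zip(times, causes))
    event_times = sorted({t for t, c in data if c != 0})
    surv = 1.0  # all-cause Kaplan-Meier survival just before the current time
    cif = 0.0
    out = []
    for t in event_times:
        at_risk = sum(1 for s, _ in data if s >= t)  # censored at t still count as at risk
        counts = Counter(c for s, c in data if s == t and c != 0)
        d_all = sum(counts.values())                 # events from any cause at t
        d_k = counts.get(cause_of_interest, 0)       # events from the cause of interest at t
        cif += surv * d_k / at_risk                  # hazard of cause k times prior survival
        surv *= 1.0 - d_all / at_risk                # update all-cause survival
        out.append((t, cif))
    return out


if __name__ == "__main__":
    # Hypothetical data: cause 1 = event of interest, cause 2 = competing event.
    print(cumulative_incidence([1, 2, 3, 4, 5], [1, 2, 0, 1, 2], cause_of_interest=1))
```

On these toy data, the cause-1 cumulative incidence at time 5 is 0.5. Naively computing 1 minus the Kaplan-Meier estimate, with cause-2 events treated as censored, gives 0.6 instead. This overestimation is the usual reason for using the cumulative incidence function when competing risks are present.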

I have developed/co-developed the R software packages S3VS, NLPwavelet, GWASinlps, CGEN, and BHMSMAfMRI based on my research. See the Software tab for more details about them.


Publications

- Google Scholar
- ResearchGate
- ORCiD

Software

Here are some software packages that I have developed/co-developed based on my research.

S3VS: This is an R software package that performs variable selection using the structured screen-and-select (S3VS) framework in linear models, generalized linear models with binary data, and survival models such as the Cox model and accelerated failure time (AFT) model. Description and download instructions are available at the package webpage at https://nilotpalsanyal.github.io/S3VS/.


NLPwavelet: This is an R software package that performs Bayesian wavelet analysis using individual non-local priors as described in Sanyal & Ferreira (2017) and non-local prior mixtures as described in Sanyal (2025). Description and download instructions are available at the package webpage at https://nilotpalsanyal.github.io/NLPwavelet/.


BHMSMAfMRI: This is an R software package that performs Bayesian hierarchical multi-subject multiscale analysis of functional MRI (fMRI) data, or other multiscale data, as described in Sanyal & Ferreira (2012). It uses a wavelet-based prior that borrows strength across subjects and provides smooth posterior estimates of the effect sizes along with samples from their posterior distribution. Description and download instructions are available at the package webpage at https://nilotpalsanyal.github.io/BHMSMAfMRI/.
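BHMSMAfMRI and NLPwavelet both work in the wavelet domain. To show what a multiscale (wavelet) decomposition does, here is a minimal pure-Python sketch of the orthonormal Haar discrete wavelet transform and its inverse. The Haar wavelet is chosen only for simplicity and is an assumption of this example. The packages' actual wavelet choices and their Bayesian shrinkage steps are not shown here.

```python
import math


def haar_dwt(signal, levels):
    """Orthonormal Haar discrete wavelet transform (illustrative sketch).

    Returns (approximation, [detail_level_1, ..., detail_level_J]); level 1 is
    the finest scale. len(signal) must be divisible by 2**levels.
    """
    if len(signal) % (2 ** levels) != 0:
        raise ValueError("signal length must be divisible by 2**levels")
    approx = list(signal)
    details = []
    for _ in range(levels):
        new_approx, detail = [], []
        for a, b in zip(approx[0::2], approx[1::2]):
            new_approx.append((a + b) / math.sqrt(2))  # local average (coarser scale)
            detail.append((a - b) / math.sqrt(2))      # local difference (detail at this scale)
        approx = new_approx
        details.append(detail)
    return approx, details


def haar_idwt(approx, details):
    """Invert haar_dwt, rebuilding the signal from coarsest to finest scale."""
    x = list(approx)
    for detail in reversed(details):
        out = []
        for s, d in zip(x, detail):
            out.append((s + d) / math.sqrt(2))
            out.append((s - d) / math.sqrt(2))
        x = out
    return x


if __name__ == "__main__":
    step = [4, 4, 4, 4, 8, 8, 8, 8]  # hypothetical signal with one jump
    coarse, dets = haar_dwt(step, levels=3)
    print(coarse, dets)
    print(haar_idwt(coarse, dets))
```

For the step signal above, the two finest levels of detail coefficients are all zero. The jump appears only in the single coarsest detail coefficient, which is about -5.66 (that is, -8/√2). This is the kind of sparsity that wavelet-based shrinkage priors are designed to exploit.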


GWASinlps: This is an R software package that performs Bayesian non-local prior based iterative variable selection for data from genome-wide association studies (GWAS), or other high-dimensional data, as described in Sanyal et al. (2019). Description and download instructions are available at the package webpage at https://nilotpalsanyal.github.io/GWASinlps/.
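GWASinlps is built on non-local priors. A defining feature of a non-local prior is that its density goes to zero at the null value zero. As a result, it separates "no effect" from "small effect" more sharply than a local prior such as a normal centered at zero. The sketch below evaluates one well-known example, the first-order product moment (pMOM) density of Johnson and Rossell, π(β) = β²/(τσ²) · N(β; 0, τσ²). The parameter values are arbitrary, and the specific non-local prior and settings used by GWASinlps should be checked in its documentation rather than assumed from this example.

```python
import math


def normal_pdf(x, var):
    """Density of N(0, var) evaluated at x."""
    return math.exp(-x * x / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def pmom_pdf(beta, tau, sigma2=1.0):
    """First-order pMOM non-local prior density:
    beta^2 / (tau * sigma2) * N(beta; 0, tau * sigma2)."""
    var = tau * sigma2
    return beta * beta / var * normal_pdf(beta, var)


def midpoint_integral(f, lo, hi, n=20000):
    """Simple midpoint-rule numerical integral of f over [lo, hi]."""
    h = (hi - lo) / n
    return h * sum(f(lo + (i + 0.5) * h) for i in range(n))


if __name__ == "__main__":
    tau = 0.5  # arbitrary illustrative value
    print(pmom_pdf(0.0, tau), normal_pdf(0.0, tau))  # non-local vs local density at zero
    print(midpoint_integral(lambda b: pmom_pdf(b, tau), -20.0, 20.0))  # close to 1
```

The pMOM density is exactly zero at β = 0 and has modes at ±√(2τσ²). By contrast, the normal density with the same variance is largest at zero.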


CGEN: This is an R software package that analyzes case-control data in genetic epidemiology. It provides a set of statistical methods for evaluating gene × environment (or gene × gene) interactions under multiplicative and additive risk models (Sanyal et al., 2021; de Rochemonteix et al., 2021), with or without assuming gene-environment (or gene-gene) independence in the underlying population. Description and download instructions are available at the package webpage at https://www.bioconductor.org/packages/release/bioc/html/CGEN.html. A tutorial for the additive gene × environment interaction tests under the trend effect of genotype, proposed in the above references, is available at https://github.com/thehanlab/AdditiveGxEtrendtest.
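CGEN's methods evaluate gene × environment interaction on both the multiplicative and the additive scale. The sketch below shows, for a simple case-control table with a binary genotype and a binary exposure, how the two scales are commonly quantified. Multiplicative interaction is measured by the ratio OR11/(OR10·OR01). Additive interaction is measured by the relative excess risk due to interaction, RERI = OR11 - OR10 - OR01 + 1. This is a hypothetical illustration with made-up counts. It is not CGEN's API, nor is it the trend-effect tests described above, which handle three-level genotypes, the gene-environment independence assumption, and shrinkage estimation.

```python
def odds_ratios(table):
    """Odds ratios relative to the non-carrier, unexposed group (G=0, E=0).

    table maps (g, e) -> (cases, controls) for binary genotype g and exposure e.
    """
    ref_cases, ref_controls = table[(0, 0)]
    ref_odds = ref_cases / ref_controls
    return {key: (cases / controls) / ref_odds for key, (cases, controls) in table.items()}


def interaction_measures(table):
    """Return (multiplicative interaction ratio, RERI) from a 2x2x2 case-control table.

    RERI uses odds ratios as stand-ins for relative risks, which assumes a rare disease.
    """
    ors = odds_ratios(table)
    or_g, or_e, or_ge = ors[(1, 0)], ors[(0, 1)], ors[(1, 1)]
    multiplicative = or_ge / (or_g * or_e)  # 1.0 means no multiplicative interaction
    reri = or_ge - or_g - or_e + 1.0        # 0.0 means no additive interaction
    return multiplicative, reri


# Hypothetical counts: (cases, controls) for each genotype/exposure combination.
example = {(0, 0): (100, 200), (1, 0): (100, 100), (0, 1): (150, 100), (1, 1): (300, 100)}
print(interaction_measures(example))  # -> (1.0, 2.0)
```

With these counts, the odds ratios show no multiplicative interaction (ratio 1.0) but a positive additive interaction (RERI 2.0). This is why the choice of scale matters when testing for gene × environment interaction.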

SPLC-RAT: My former colleagues at Stanford University developed this Shiny app based on our joint work on developing and validating the first risk prediction tool for second primary lung cancer that incorporates comprehensive risk factors, including smoking information, medical history, treatment, and tumor characteristics, using large population-based data. It is available at https://splc-risk-prediction.shinyapps.io/SPLC-RiskAssessmentTool/.


Teaching Software

Univariate probability distribution viewer: A Shiny app for visualizing various univariate probability distributions. Feel free to use it for non-commercial classroom teaching.



References:

Sanyal, N. and Ferreira, M.A.R. (2012). Bayesian hierarchical multi-subject multiscale analysis of functional MRI data. NeuroImage, 63(3), pp.1519-1531. doi:10.1016/j.neuroimage.2012.08.041.

Sanyal, N., Lo, M.T., Kauppi, K., Djurovic, S., Andreassen, O.A., Johnson, V.E. and Chen, C.H. (2019). GWASinlps: non-local prior based iterative SNP selection tool for genome-wide association studies. Bioinformatics, 35(1), pp.1-11. doi:10.1093/bioinformatics/bty472

Sanyal, N., Napolioni, V., de Rochemonteix, M., Belloy, M.E., Caporaso, N.E., Landi, M.T., Greicius, M.D., Chatterjee, N. and Han, S.S. (2021). A Robust Test for Additive Gene-Environment Interaction Under the Trend Effect of Genotype Using an Empirical Bayes-Type Shrinkage Estimator. American Journal of Epidemiology, 190(9), pp.1948-1960. doi:10.1093/aje/kwab124.

De Rochemonteix, M., Napolioni, V., Sanyal, N., Belloy, M.E., Caporaso, N.E., Landi, M.T., Greicius, M.D., Chatterjee, N. and Han, S.S. (2021). A likelihood ratio test for gene-environment interaction based on the trend effect of genotype under an additive risk model using the gene-environment independence assumption. American Journal of Epidemiology, 190(1), pp.129-141. doi:10.1093/aje/kwaa132.

Sanyal, N. and Ferreira, M.A.R. (2017). Bayesian wavelet analysis using nonlocal priors with an application to fMRI analysis. Sankhya B, 79(2), pp.361-388.

Sanyal, N. (2025). Nonlocal prior mixture-based Bayesian wavelet regression. arXiv preprint arXiv:2501.18134.

Teaching

I love to teach and have immense respect for the value of good teaching and good mentoring.

Current courses (Spring 2026)

  • DS 6494 - Statistical Data Mining II, UTEP
  • DS 6398 - Dissertation I, UTEP
  • STAT 6396 - Graduate Research, UTEP
  • DS 6390 - Data Science Research Collaborative, UTEP
  • STAT 5399 - Thesis II, UTEP

Past courses

  • DS 6390 - Data Science Research Collaborative, UTEP
  • STAT 6370 - Special Topics (Competing Risk Methods), UTEP
  • STAT 6370 - Special Topics (Advanced Competing Risk Analysis), UTEP
  • STAT 6370 - Special Topics (Advanced Regression Analysis), UTEP
  • DS 6339 - Data Visualization, UTEP
  • DS 6335 - Introduction to Data Science Collaborations, UTEP
  • STAT 6329 - Statistical Programming, UTEP
  • STAT 5396 - Graduate Research, UTEP
  • STAT 5398 - Thesis I, UTEP
  • STAT 5399 - Thesis II, UTEP
  • STAT 3320 - Probability and Statistics, UTEP
  • Statistical Data Analysis (with project supervision for 12 students), International Statistical Education Center, ISI, Kolkata, 2022-23.
  • Statistical Methods, International Statistical Education Center, ISI, Kolkata, 2022-23.
  • Descriptive Statistics, International Statistical Education Center, ISI, Kolkata, 2022-23.

Workshop teaching

  • Special Lecture on Survival Analysis, Maulana Azad College, Kolkata, April 2023.
  • R Sessions for CoxBoost modeling, Virtual workshop, Stanford University Quantitative Science Unit, January 2021.
  • Random Forest for Competing Risk Data, Virtual workshop, Stanford University Quantitative Science Unit, December 2020.
  • Predictive Modeling of Competing Risk Data Using Penalized Regression, Virtual workshop, Stanford University Quantitative Science Unit, November 2020.
  • Time Series Analysis, Winter School on Statistical Data Analysis Methods, Indian Statistical Institute, Kolkata, February 2015.
  • Introduction to R, Winter School on Statistical Data Analysis Methods, Indian Statistical Institute, Kolkata, February 2015.
  • Descriptive Statistics, Winter School on Statistical Data Analysis Methods, Indian Statistical Institute, Kolkata, February 2015.
  • Time Series Analysis, Short-term Course on Statistical Methods, Arya Vidyapeeth College, Guwahati, Assam, India, November 2014.
  • Introduction to R, Short-term Course on Statistical Methods, Arya Vidyapeeth College, Guwahati, Assam, India, November 2014.
  • Design of Experiments, Workshop on Techniques of Data Analysis, Dimapur Govt. College, Nagaland, India, September 2014.
  • Time Series Analysis, Workshop on Techniques of Data Analysis, Dimapur Govt. College, Nagaland, India, September 2014.
  • R for Time Series, Workshop on Techniques of Data Analysis, Dimapur Govt. College, Nagaland, India, September 2014.

Some materials from past teaching/workshops:

Service


Others

Alongside my academic research, I have many other interests. Feel free to explore some of them here, to comment, and to connect.