Publications

Journal publications

1. Doubly high-dimensional contextual bandits: An interpretable model with applications to assortment/pricing

Junhui Cai, Ran Chen, Martin Wainwright, Linda Zhao (2025)

Management Science (accepted)

Key challenges in running a retail business include how to select products to present to consumers (the assortment problem), and how to price products (the pricing problem) to maximize revenue or profit. Instead of considering these problems in isolation, we propose a joint approach to assortment-pricing based on contextual bandits. Our model is doubly high-dimensional, in that both context vectors and actions are allowed to take values in high-dimensional spaces. In order to circumvent the curse of dimensionality, we propose a simple yet flexible model that captures the interactions between covariates and actions via a (near) low-rank representation matrix. The resulting class of models is reasonably expressive while remaining interpretable through latent factors, and includes various structured linear bandit and pricing models as particular cases. We propose a computationally tractable procedure that combines an exploration/exploitation protocol with an efficient low-rank matrix estimator, and we prove bounds on its regret. Simulation results show that this method has lower regret than state-of-the-art methods applied to various standard bandit and pricing models. We also illustrate the gains achievable using our method through two case studies on real-world assortment-pricing problems for an industry-leading instant noodles company and a smaller beauty start-up. In each case, we show both the gains in revenue achievable by our bandit methods and the interpretability of the latent factor models that are learned.

2. Network regression and supervised centrality estimation

Junhui Cai, Dan Yang, Ran Chen, Haipeng Shen, Linda Zhao, Wu Zhu (2025)

Journal of the American Statistical Association (accepted)

[ Abstract ] [ Paper ] [ Published version ]

Networks are ubiquitous and play a crucial role in our lives. The position of an agent in the network, usually captured by the “centrality”, has implications for the agent’s behaviour and serves as an important intermediary of network effects. Therefore, the centrality is often incorporated in regression models to elucidate the network effect on an outcome variable of interest. In empirical studies, researchers often adopt a two-stage procedure to estimate the centrality and to infer the network effect: they first estimate the centrality from the observed network and then employ the estimated centrality in the regression for estimation and inference. Despite its prevalent adoption, this naive two-stage procedure lacks theoretical backing and can fail in both estimation and inference. We therefore propose a unified framework that combines a network model and a network regression model, under which we prove the shortcomings of the two-stage procedure in centrality estimation and the undesirable consequences in the network regression. We then propose a novel supervised network centrality estimation (SuperCENT) methodology that simultaneously combines the information from the two models. SuperCENT dominates the two-stage procedure in the estimation of the centrality and the true underlying network universally. In addition, SuperCENT yields superior estimation of the network effect and provides valid and narrower confidence intervals than those from the two-stage procedure. We apply our method to predict the currency risk premium based on the global trade network. We show that a trading strategy based on SuperCENT centrality estimates yields a return three times as high as that of the two-stage method, and the inference drawn by SuperCENT verifies an economic theory via rigorous statistical testing while the two-stage procedure cannot.

3. State ownership in China: An equity network perspective

Junhui Cai, Xian Gu, Linda Zhao, Wu Zhu (2025)

The Arc of the Chinese Economy (edited by Hanming Fang and Marshall Meyer), Cambridge University Press

[ Abstract ] [ Published version ]

State ownership is the pillar of China’s economy. One cannot understand China’s economy without understanding state ownership. Existing measures of state-owned enterprises (SOEs), largely self-reported, are limited to industrial firms covered by the Annual Industrial Survey (AIS). We provide a new lens by constructing a novel dynamic equity ownership network of all 40 million registered firms in China. Based on the network, we propose a new dynamic SOE metric. Our analysis reveals systematic and large-scale discrepancies between our method and the existing measures, with ours identifying a notably larger pool of SOEs. By the end of 2017, state capital had increased to 31% among all the in-network firms, while the total capital of all SOEs, including partial SOEs, had climbed up to 85%. Our findings suggest that state ownership exhibits both decentralization and indirect control trends over time, offering new insights for future research.

4. Hierarchical vintage sparse PCA. Discussion on the paper by Rohe and Zeng

Junhui Cai, Dan Yang, Wu Zhu, Linda Zhao (2023)

Journal of the Royal Statistical Society. Series B: Statistical Methodology

[ Paper ] [ Published version ]

5. Practical issues concerning assumption-lean inference for generalized linear models. Discussion on the paper by Vansteelandt and Dukes

Elizabeth Ogburn, Junhui Cai, Arun Kumar Kuchibhotla, Richard Berk, Andreas Buja (2021)

Journal of the Royal Statistical Society. Series B: Statistical Methodology

[ Paper ] [ Published version ]

6. Valid post-selection inference in model-free linear regression

Arun Kumar Kuchibhotla, Lawrence D. Brown, Andreas Buja, Junhui Cai, Edward I. George, Linda Zhao (2019)

Annals of Statistics

[ Abstract ] [ Paper ] [ Published version ]

Modern data-driven approaches to modeling make extensive use of covariate/model selection. Such selection incurs a cost: it invalidates classical statistical inference. A conservative remedy to the problem was proposed by Berk et al. (2013) and further extended by Bachoc et al. (2016). These proposals, labeled “PoSI methods”, provide valid inference after arbitrary model selection. They are computationally NP-hard and have certain limitations in their theoretical justifications. We therefore propose computationally efficient PoSI confidence regions and prove large-$p$ asymptotics for them. We do this for linear OLS regression allowing misspecification of the normal linear model, for both fixed and random covariates, and for independent as well as some types of dependent data. We start by proving a general equivalence result for the post-selection inference problem and a simultaneous inference problem in a setting that strips inessential features still present in a related result of Berk et al. (2013). We then construct valid PoSI confidence regions that are the first to have vastly improved computational efficiency in that the required computation times grow only quadratically rather than exponentially with the total number $p$ of covariates. These are also the first PoSI confidence regions with guaranteed asymptotic validity when the total number of covariates $p$ diverges (almost exponentially) with the sample size $n$. Under standard tail assumptions, we only require $(\log p)^7 = o(n)$ and $k = o(\sqrt{n/\log p})$ where $k (\le p)$ is the largest number of covariates (model size) considered for selection. We study various properties of these confidence regions, including their Lebesgue measures, and compare them (theoretically) with those proposed previously.

7. Statistical theory powering data science

Junhui Cai, Avishai Mandelbaum, Chaitra H Nagaraja, Haipeng Shen, Linda Zhao (2019)

Statistical Science

[ Abstract ] [ Paper ] [ Published version ]

Statisticians are finding their place in the emerging field of data science. However, many issues considered “new” in data science have long histories in statistics. Examples of using statistical thinking are illustrated, which range from exploratory data analysis to measuring uncertainty to accommodating nonrandom samples. These examples are then applied to service networks, baseball predictions and official statistics.

Preprints

8. Ownership network and firm growth: What do forty million companies tell about the Chinese economy?

Franklin Allen, Junhui Cai, Xian Gu, Jun Qian, Linda Zhao, Wu Zhu (2025)

Revision, Management Science

China Financial Research Conference (CFRC) 2021 Best Paper Award (3 out of 534 papers).

[ Abstract ] [ Paper ] [ SSRN ]

The finance–growth nexus has been a central question in understanding the unprecedented success of the Chinese economy. Using unique data on all the registered firms in China, we build extensive firm-to-firm equity ownership networks. Entering a network and increasing network centrality lead to higher firm growth, and the effect of global centralities strengthens over time. The RMB 4 trillion stimulus launched by the Chinese government in 2008 partially “crowded out” the positive network effects. Equity ownership networks and bank credit tend to act as substitutes for state-owned enterprises, but as complements for private firms in promoting growth.

9. Towards a holistic representation of online customer journeys: A tensor-based framework

Xinyuan Zhang, Junhui Cai, Jingjing Li, Ahmed Abbassi (2025)

Revision, Information Systems Research

INFORMS Workshop on Data Science 2023, 2024.
INFORMS Conference on Information Systems and Technology (CIST) 2025.
The 46th AIS International Conference on Information Systems (ICIS).

[ Abstract ]

Understanding online user journeys has become crucial for explaining and predicting digital behavior. Existing methodologies often rely on principled feature engineering, which, while successful in predicting and interpreting customer journeys, are constructed artificially and thus present certain limitations. A more holistic and parsimonious framework is needed to fully comprehend omni-channel customer journeys. In this paper, we propose a tensor-based framework to capture users' digital channel interactions over time. We represent customer journeys through the lens of a three-dimensional user-channel-time tensor. We adopt tensor decomposition to extract interpretable latent factors. These factors capture digital trace patterns with explanatory and predictive power. For prediction, we incorporate the tensor into a deep learning architecture to learn the nonlinear and temporal convolutional patterns in customers' journeys. We evaluate our framework on 24 million raw user clickstreams and show that our methodology not only enhances our understanding of customer decision-making processes in purchases, but also significantly improves conversion prediction.

10. Centralization or decentralization? The evolution of state-ownership in China

Franklin Allen, Junhui Cai, Xian Gu, Jun Qian, Linda Zhao, Wu Zhu (2025)

Under review

China International Conference in Finance (CICF) 2021 XiYue Best Paper Award (2 out of 2065 papers).

[ Abstract ] [ Paper ] [ SSRN ] [ VoxChina ]

In this paper, we anatomize the state sector and its role in the Chinese economy. We propose a measure of Chinese SOEs (and partial SOEs) based on firm-to-firm equity investment relationships. We are the first to identify all SOEs among the more than 40 million registered Chinese firms. Our measure captures a significantly larger number of SOEs than the existing measure. By 2017, the aggregated capital of all (partial) SOEs had climbed to 85% of total capital in the economy, and the total state capital in all SOEs had increased to 31%. State ownership shows parallel trends of decentralization (authoritarian hierarchy) and indirect control (ownership hierarchy) over time. In addition, we find that mixed ownership is associated with higher firm growth and performance, while hierarchical distance to governments is associated with better firm performance but lower growth. Drawing a stark distinction between SOEs and privately-owned enterprises (POEs) could lead to misperceptions of the role of state ownership in the Chinese economy.

11. AI as "Co-founder": GenAI for Entrepreneurship

Junhui Cai, Xian Gu, Liugang Sheng, Mengjia Xia, Linda Zhao, Wu Zhu (2025)

Under review

[ Abstract ] [ Paper ] [ SSRN ]

This paper studies whether, how, and for whom generative artificial intelligence (GenAI) facilitates firm creation. Our identification strategy exploits the November 2022 release of ChatGPT as a global shock that lowered start-up costs and leverages variations across geocoded grids with differential pre-existing AI-specific human capital. Using high-resolution and universal data on Chinese firm registrations by the end of 2024, we find that grids with stronger AI-specific human capital experienced a sharp surge in new firm formation--driven entirely by small firms, contributing to 6.0% of overall national firm entry. Large-firm entry declines, consistent with a shift toward leaner ventures. New firms are smaller in capital, shareholder number, and founding team size, especially among small firms. The effects are strongest among firms with potential AI applications, weaker financing needs, and among first-time entrepreneurs. Overall, our results highlight that GenAI serves as a pro-competitive force by disproportionately boosting small-firm entry.

12. Poisson-MNL Bandit: Nearly Optimal Dynamic Joint Assortment and Pricing with Decision-Dependent Customer Arrivals

Junhui Cai, Ran Chen, Qitao Huang, Linda Zhao, Wu Zhu (2025)

Under review

2024 ESIF Economics and AI+ML Meeting.

[ Abstract ] [ arXiv ]

We study dynamic joint assortment and pricing where a seller updates decisions at regular accounting/operating intervals to maximize the cumulative per-period revenue over a horizon $T$. In many settings, assortment and prices affect not only what an arriving customer buys but also how many customers arrive within the period, whereas classical multinomial logit (MNL) models assume arrivals as fixed, potentially leading to suboptimal decisions. We propose a Poisson–MNL model that couples a contextual MNL choice model with a Poisson arrival model whose rate depends on the offered assortment and prices. Building on this model, we develop an efficient algorithm PMNL based on the idea of upper confidence bound (UCB). We establish its (near) optimality by proving a non-asymptotic regret bound of order $\sqrt{T\log(T)}$ and a matching lower bound (up to $\log(T)$). Simulation studies underscore the importance of accounting for the dependency of arrival rates on assortment and pricing: PMNL effectively learns customer choice and arrival models and provides joint assortment-pricing decisions that outperform others that assume fixed arrival rates.

13. Dash-M5H: An interactive dashboard for multi-modal, multi-model mental health assessment

Raymond Alavo, Xinyuan Zhang, Gemza Ademaj, Junhui Cai, Hyeokhyen Kwon, Robert Cotes, Gari Clifford, Ahmed Abbassi (2025)

Under review

14. Personalized reinforcement learning: With applications to sepsis management in ICU

Junhui Cai, Ran Chen, Martin Wainwright, Linda Zhao (2025)

[ Abstract ]

Reinforcement learning (RL) has achieved remarkable success across various domains; however, its applicability is often hampered by challenges in practicality and interpretability. Many real-world applications, such as in healthcare and business settings, have large and/or continuous state and action spaces and demand personalized solutions. In addition, the interpretability of the model is crucial to decision-makers so as to guide their decision-making process while incorporating their domain knowledge. To bridge this gap, we propose a personalized reinforcement learning framework that integrates personalized information into the state-transition and reward-generating mechanisms. We develop an online RL algorithm for our framework. Specifically, our algorithm learns the embeddings of the personalized state-transition distribution in a Reproducing Kernel Hilbert Space (RKHS) by balancing the exploitation-exploration trade-off. We further provide the regret bound of the algorithm and demonstrate its effectiveness in recommender systems.

15. INTFACT: Theory-guided in-context learning via parallel representation for LLM-based health assessment

Xinyuan Zhang, Junhui Cai, Brent Kitchens, Reza Mousavi, Ahmed Abbassi (2025)

INFORMS Workshop on Data Science 2025.

[ Abstract ]

The challenging and deepening mental health crisis, the increasing availability of textual data from online platforms, and the advancement of large language models (LLMs) present challenges and opportunities for LLM-based health assessment with textual data. The emerging learning paradigm in-context learning (ICL), while achieving great performance in textual assessment tasks, still presents several challenges. One main challenge is the high sensitivity of ICL performance to the examples selected, while example quality is hard to quantify. Another layer of challenge is how text data is represented. While pretrained language models (PLMs) are commonly used, embeddings generated from PLMs are general and lack domain-specific considerations. In this study, we propose INTFACT, a theory-guided ICL framework via tensor-based parallel representation and factorization. The first part of the framework aims to develop document-level embeddings that parsimoniously capture context-aware semantic characteristics. Building on linguistic and social science theories, we generate parallel representations for each document, essentially converting each text input into a token-lexicon feature matrix. We then construct a count-based document-segment-feature tensor that effectively represents highly granular linguistic information at the document-segment level. We generate low-dimensional latent factors using tensor decomposition methods, and create document-level embeddings with decomposition outputs. The second part of INTFACT proposes a retrieval strategy from a global perspective, where we pre-cluster documents with tensor-based embeddings and retrieve examples based on both embedding similarities and cluster assignments. We evaluate our framework over a series of text-based mental health classification experiments on user-generated messages, and compare our method to common baseline methods. We demonstrate that our method outperforms baseline methods in prediction performance, especially when larger numbers of examples are retrieved.

16. Ethical Adhocracies

Junhui Cai, Matthew Coetzee (2025)

Under review

Academy of Management Annual Meeting (AOM 2026).

17. Nonparametric empirical Bayes estimation and testing for sparse and heteroscedastic signals

Junhui Cai, Xu Han, Ya'acov Ritov, Linda Zhao (2021)

arXiv:2106.08881

[ Abstract ] [ arXiv ]

Large-scale modern data often involves estimation and testing for high-dimensional unknown parameters. It is desirable to identify the sparse signals, “the needles in the haystack”, with accuracy and false discovery control. However, the unprecedented complexity and heterogeneity in modern data structure require new machine learning tools to effectively exploit commonalities and to robustly adjust for both sparsity and heterogeneity. In addition, estimates for high-dimensional parameters often lack uncertainty quantification. In this paper, we propose a novel Spike-and-Nonparametric mixture prior (SNP): a spike to promote the sparsity and a nonparametric structure to capture signals. In contrast to the state-of-the-art methods, the proposed methods solve the estimation and testing problem at once with several merits: 1) an accurate sparsity estimation; 2) point estimates with shrinkage/soft-thresholding property; 3) credible intervals for uncertainty quantification; 4) an optimal multiple testing procedure that controls false discovery rate. Our method exhibits promising empirical performance on both simulated data and a gene expression case study.

18. Microscopic dynamics of equity ownership networks in China

Junhui Cai, Xian Gu, Linda Zhao, Wu Zhu (2021)

[ Abstract ]

19. All of Linear Regression

Arun Kumar Kuchibhotla, Lawrence D. Brown, Andreas Buja, Junhui Cai (2019)

arXiv:1910.06386

[ Abstract ] [ Paper ] [ arXiv ]

Least squares linear regression is one of the oldest and most widely used data analysis tools. Although the theoretical analysis of the ordinary least squares (OLS) estimator is as old, several fundamental questions are yet to be answered. Suppose regression observations $(X_1,Y_1),...,(X_n,Y_n)$ (not necessarily independent) are available. Some of the questions we deal with are as follows: under what conditions does the OLS estimator converge and what is the limit? What happens if the dimension is allowed to grow with $n$? What happens if the observations are dependent with dependence possibly strengthening with $n$? How to do statistical inference under these kinds of misspecification? What happens to the OLS estimator under variable selection? How to do inference under misspecification and variable selection? We answer all the questions raised above with one simple deterministic inequality which holds for any set of observations and any sample size. This implies that all our results are finite sample (non-asymptotic) in nature. At the end, one only needs to bound certain random quantities under specific settings of interest to get concrete rates and we derive these bounds for the case of independent observations. In particular the problem of inference after variable selection is studied, for the first time, when $d$, the number of covariates, increases (almost exponentially) with sample size $n$. We provide comments on the “right” statistic to consider for inference under variable selection and efficient computation of quantiles.

Working papers

20. Sensemaking in multimodal, multi-model environments: Designing support for remote mental health assessments

Gemza Ademaj, Xinyuan Zhang, Junhui Cai, Ahmed Abbassi, Saonee Sarker, Suprateek Sarker (2025)

[ Abstract ]

Remote mental health assessments routinely generate rich multimodal data in the form of video, audio, and text. This creates new opportunities to support psychiatrists who are overloaded with patients, face increasing documentation demands, and risk missing subtle behaviors or emotional cues. These multimodal data are processed by different machine learning models, creating the need to support sensemaking in complex, multimodal and multi-model environments. This research presents a theory-driven approach to designing sensemaking tools for remote mental health assessments. Drawing on Integrative Sensemaking Theory and Signal Detection Theory, the study derives a set of design requirements and instantiates them in a dashboard prototype to support sensemaking during mental health assessments. The evaluation employs a set of design validity measures to assess the artifact’s support of sensemaking, tracing clinician attention patterns, integrative sensemaking framings, and diagnostic outcomes.

21. Ownership structure in China's real estate sector

Junhui Cai, Xian Gu, Wu Zhu, Linda Zhao (2021)

[ Abstract ]

22. Valid post-selection inference for the average treatment effect with covariate adjustment in randomized experiments

Junhui Cai, Arun Kumar Kuchibhotla, Linda Zhao (2020)

[ Abstract ]

Randomized experiments are fundamental tools for evaluating treatment effects in many fields. Prior to the treatment assignment, the baseline covariates are often collected and can be incorporated into the analysis to improve the estimation efficiency. The efficiency gain from covariate adjustment might encourage attempts to hunt for covariates that maximize the efficiency of the treatment effect estimate. This kind of “significance hunting” can invalidate statistical inference due to the data-dependent selection. Luckily, randomization makes an exception. We show that under a class of unbiased estimators of the average treatment effect, the inference remains valid after selecting the estimator with minimum variance, provided that a consistent standard error is available. We adopt a model-free approach that imposes no parametric outcome model and depends solely on the randomization in treatment assignment.

23. Generalized Cp and a predictive model selection test in assumption-lean framework

Junhui Cai, Lawrence D. Brown, Arun Kumar Kuchibhotla, Linda Zhao (2020)

[ Abstract ]

The classical methods of variable selection based on the estimate of the out-of-sample prediction risk are designed under the Gauss-Markov model and thus are not justifiable under misspecification. The customary elbow rule based on the scree plot can be misleading and a formal testing procedure accompanying confidence intervals will be more desirable. We propose a model-free analog of Cp, generalized Cp (GCp), and a predictive model selection test based on GCp. This estimator can be shown to be asymptotically equivalent to the testing error based on an independent sample and is also asymptotically equivalent to the leave-one-out cross-validation estimator of the out-of-sample prediction risk. We are currently pursuing the optimality and properties of the model selection test.

24. Computation of PoSI statistics

Arun Kumar Kuchibhotla, Junhui Cai (2020)

[ Abstract ]

The use of covariate selection in modern data-driven modelling invalidates classical statistical inference. The "PoSI methods" of Berk et al. (2013) and Bachoc et al. (2016) provide valid inference after arbitrary model selection but are computationally inefficient because they involve inference simultaneously over all models. Even in the linear regression problem, the number of operations required therein is $O(p2^{p-1})$, which is prohibitive for large $p$. We propose a continuum relaxation of the PoSI statistic. This relaxation allows the use of various maximization algorithms for functions on a continuous convex set, which require a number of operations at most logarithmic in the total number of models, with guaranteed approximation error bounds.

25. Common versus idiosyncratic risk

Junhui Cai, Wu Zhu, Linda Zhao (2020)

[ Abstract ]

It is of great interest to dissect the driving forces of common movements, or co-movement, among correlated objects, such as asset prices and product sales. Two popular models are commonly used, a common factor model and a network model, to explain the co-movement. However, there exists no literature on simultaneously examining the relative importance of these two mechanisms. We develop a flexible model incorporating both common factors and networks. We investigate conditions under which the common factors and the network effects can be simultaneously identified. Applying our model to asset pricing, we evaluate the relative importance of the two mechanisms in the co-movement of asset returns.

26. Self-reported Chinese company data: Can it be trusted?

Junhui Cai, Edward Cai, Ann Harrison, Marshall Meyer, Linda Zhao, Minyuan Zhao (2018)

[ Abstract ]

The Annual Industrial Survey (AIS), dubbed the "census data", has been used as the golden source for empirical firm-level economic and operational research. It covers a long time span (from as early as 1992) and provides rich information including identification information, stocks, and flows. However, its self-reported nature casts doubt on its credibility. The goal of this paper is to determine the reliability of AIS by comparing it with Orbis, another firm-level data source that has the largest collection of firms with detailed ownership information. Firms' ownership is of particular interest to researchers and serves as one of the most important control variables in their analyses. We therefore examine the disparities between AIS and Orbis in ownership type, namely state-owned, privately-owned or foreign. Among the firms that have ownership information on both sides, the matching rate of ownership information is as high as 90%, which supports the credibility of AIS ownership information. Careful comparisons of several control variables between the cohort of matched firms and the AIS general population show there is no systematic bias in the matched cohort.

Publications

Journal publications

1. Doubly high-dimensional contextual bandits: An interpretable model with applications to assortment/pricing

Junhui Cai, Ran Chen, Martin Wainwright, Linda Zhao (2025)

Management Science (accepted)

Categories: High-dimensional statistics, Online Decision-making, Revenue Management, Bandit, Machine learning

[ Abstract ] [ SSRN ]

2. Network regression and supervised centrality estimation

Junhui Cai, Dan Yang, Ran Chen, Haipeng Shen, Linda Zhao, Wu Zhu (2025)

Journal of the American Statistical Association (accepted)

Categories: Network Analysis

[ Abstract ] [ Paper ] [ Published version ]

3. State ownership in China: An equity network perspective

Junhui Cai, Xian Gu, Linda Zhao, Wu Zhu (2025)

The Arc of the Chinese Economy (edited by Hanming Fang and Marshall Meyer), Cambridge University Press

Categories: Equity Holding Network, Ownership Structure

[ Abstract ] [ Published version ]

4. Hierarchical vintage sparse PCA. Discussion on the paper by Rohe and Zeng

Junhui Cai, Dan Yang, Wu Zhu, Linda Zhao (2023)

Journal of the Royal Statistical Society. Series B: Statistical Methodology

Categories: Network Analysis

[ Paper ] [ Published version ]

5. Practical issues concerning assumption-lean inference for generalized linear models. Discussion on the paper by Vansteelandt and Dukes

Elizabeth Ogburn, Junhui Cai, Arun Kumar Kuchibhotla, Richard Berk, Andreas Buja (2021)

Journal of the Royal Statistical Society. Series B: Statistical Methodology

Categories: Misspecification

[ Paper ] [ Published version ]

6. Valid post-selection inference in model-free linear regression

Arun Kumar Kuchibhotla, Lawrence D. Brown, Andreas Buja, Junhui Cai, Edward I. George, Linda Zhao (2019)

Annals of Statistics

Categories: Post-selection Inference, High-dimensional Statistics, Dependent Data

[ Abstract ] [ Paper ] [ Published version ]

7. Statistical theory powering data science

Junhui Cai, Avishai Mandelbaum, Chaitra H Nagaraja, Haipeng Shen, Linda Zhao (2019)

Statistical Science

Categories: Misspecification

[ Abstract ] [ Paper ] [ Published version ]

Preprints

8. Ownership network and firm growth: What do forty million companies tell about the Chinese economy?

Franklin Allen, Junhui Cai, Xian Gu, Jun Qian, Linda Zhao, Wu Zhu (2025)

Revision, Management Science

China Financial Research Conference (CFRC) 2021 Best Paper Award (3 out of 534 papers).

Categories: Equity Holding Network, Network Analysis, Operations-finance Interface

[ Abstract ] [ Paper ] [ SSRN ]

9. Towards a holistic representation of online customer journeys: A tensor-based framework

Xinyuan Zhang, Junhui Cai, Jingjing Li, Ahmed Abbassi (2025)

Revision, Information Systems Research

INFORMS Workshop on Data Science 2023, 2024. INFORMS Conference on Information Systems and Technology (CIST) 2025. The 46th AIS International Conference on Information Systems (ICIS).

Categories: Online Decision-making, Revenue Management, Nonparametric Statistics

[ Abstract ]

10. Centralization or decentralization? The evolution of state-ownership in China

Franklin Allen, Junhui Cai, Xian Gu, Jun Qian, Linda Zhao, Wu Zhu (2025)

Under review

China International Conference in Finance (CICF) 2021 XiYue Best Paper Award (2 out of 2065 papers).

Categories: Equity Holding Network, Ownership Structure

[ Abstract ] [ Paper ] [ SSRN ] [ VoxChina ]

11. AI as "Co-founder": GenAI for Entrepreneurship

Junhui Cai, Xian Gu, Liugang Sheng, Mengjia Xia, Linda Zhao, Wu Zhu (2025)

Under review

Categories: Equity Holding Network, Network Analysis, Operations-finance Interface

[ Abstract ] [ Paper ] [ SSRN ]

12. Poisson-MNL Bandit: Nearly Optimal Dynamic Joint Assortment and Pricing with Decision-Dependent Customer Arrivals

Junhui Cai, Ran Chen, Qitao Huang, Linda Zhao, Wu Zhu (2025)

Under review

2024 ESIF Economics and AI+ML Meeting.

Categories: Online Decision-making, Revenue Management

[ Abstract ] [ arXiv ]

13. Dash-M5H: An interactive dashboard for multi-modal, multi-model mental health assessment

Raymond Alavo, Xinyuan Zhang, Gemza Ademaj, Junhui Cai, Hyeokhyen Kwon, Robert Cotes, Gari Clifford, Ahmed Abbassi (2025)

Under review

Categories: Large Language Model, Online Decision-making

14. Personalized reinforcement learning: With applications to sepsis management in ICU

Junhui Cai, Ran Chen, Martin Wainwright, Linda Zhao (2025)

Categories: Online Decision-making, Revenue Management, Nonparametric Statistics

[ Abstract ]

15. INTFACT: Theory-guided in-context learning via parallel representation for LLM-based health assessment

Xinyuan Zhang, Junhui Cai, Brent Kitchens, Reza Mousavi, Ahmed Abbassi (2025)

INFORMS Workshop on Data Science 2025.
