Three method extensions built on the core EM, each closing a documented gap in the mixture-of-quantile-regressions toolkit.
Expectile and M-quantile components (family = "expectile" / "mquantile").
Asymmetric-least-squares (Newey & Powell 1987) and asymmetric-Huber
(Breckling & Chambers 1988) component losses, fitted by IRLS through
new registry engines. Expectile components are crossing-free in the
asymmetry level by construction; M-quantile components interpolate between the quantile
and the expectile.
Penalised variable selection (mixqr_pen()). A SCAD / adaptive-LASSO / LASSO / MCP
penalty on the weighted check-loss M-step (the quantile analogue of
Khalili & Chen 2007) gives each component its own sparse
support, with a mixture BIC path for tuning and component pruning. The inner
solve reuses rqPen; selectedVars() reports the
active set per component.
Non-crossing multi-quantile fits (mixqr_nc()). Fits a vector of
quantile levels jointly with one latent classification shared across all
levels (a coupled E-step), closing the two problems Wu & Yao (2016,
sec. 5) leave open: cross-level classification ambiguity and
within-component crossing (repaired by monotone rearrangement,
Chernozhukov, Fernandez-Val & Galichon 2010).
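On a finite grid of increasing quantile levels, monotone rearrangement reduces to sorting, at each design point, the fitted quantiles across levels. The Python sketch below (for illustration only; the package itself is R) assumes one component's fits are stored as one row per observation with one column per tau level; rearrange_quantiles and count_crossings are hypothetical helpers, not package API.

```python
def rearrange_quantiles(fitted_rows):
    """Monotone rearrangement of fitted conditional quantiles.

    fitted_rows: one list per observation, holding the fitted quantiles at
    an increasing grid of tau levels. Sorting each row yields values that
    are non-decreasing in tau, i.e. non-crossing.
    """
    return [sorted(row) for row in fitted_rows]


def count_crossings(fitted_rows):
    """Number of adjacent tau pairs where the fitted quantile decreases."""
    return sum(
        1
        for row in fitted_rows
        for lo, hi in zip(row, row[1:])
        if hi < lo
    )


if __name__ == "__main__":
    # Fitted quantiles at tau = 0.25, 0.5, 0.75 for three observations;
    # the second row crosses (its 0.5-quantile exceeds its 0.75-quantile).
    fitted = [[1.0, 2.0, 3.0], [0.5, 2.4, 2.1], [-1.0, 0.0, 0.2]]
    repaired = rearrange_quantiles(fitted)
    print(count_crossings(fitted), count_crossings(repaired))  # 1 0
```

Rows that are already monotone come back unchanged, so the repair only alters fits that actually cross.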
sim_mixqr_cross() provides a crossing-exhibiting design for
demonstrations.
First CRAN release.
Exported two extension-API building blocks,
weighted_rq() and constrained_kde(), so
companion packages (location-varying gating, non-crossing) can reuse the
component and error-density machinery without forking the core.
Post-review refinements (correctness and performance) addressing two independent adversarial peer reviews:
Constraint integrity (R1). The constrained KDE
now preserves the tau-quantile = 0 constraint in every feasible case:
the two-constant Hall-Presnell weights are used only when non-negative
and well-conditioned, otherwise a per-point empirical-likelihood tilt
(Hall & Presnell 1999) enforces the constraint (verified for tau in
{0.05, 0.5, 0.9, 0.95}). Genuinely infeasible (one-sided) components are
flagged via fit$diagnostics$constraint and a warning, never
silently mis-calibrated.
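A per-point empirical-likelihood tilt of this kind can be sketched as follows. Assumptions, all hypothetical rather than taken from the package: a Gaussian kernel with fixed bandwidth h, and the helper names normal_cdf and el_tilt_weights. Maximising sum log w_i subject to sum w_i = 1 and "kernel mass below zero equals tau" gives w_i = 1 / (n (1 + lam g_i)). Here g_i = Phi(-r_i / h) - tau, and lam is found by bisection. If all g_i share a sign, no positive weights exist; this is the one-sided case the entry describes as infeasible.

```python
import math


def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def el_tilt_weights(residuals, tau, h, rel_tol=1e-12):
    """Empirical-likelihood weights that put kernel mass tau below zero.

    For a Gaussian-kernel KDE f(x) = sum_i w_i * phi((x - r_i) / h) / h,
    the mass below zero is sum_i w_i * Phi(-r_i / h). Returns weights
    summing to 1 that make this mass equal tau, or None when every
    g_i = Phi(-r_i / h) - tau has the same sign (infeasible).
    """
    g = [normal_cdf(-r / h) - tau for r in residuals]
    g_min, g_max = min(g), max(g)
    if not (g_min < 0.0 < g_max):
        return None  # one-sided component: the constraint cannot be met

    def score(lam):
        # Strictly decreasing in lam wherever every 1 + lam * g_i > 0.
        return sum(gi / (1.0 + lam * gi) for gi in g)

    # Bisection on the open interval that keeps every weight positive.
    lo, hi = -1.0 / g_max, -1.0 / g_min
    lam = 0.5 * (lo + hi)
    while hi - lo > rel_tol * max(1.0, abs(lo), abs(hi)):
        if score(lam) > 0.0:
            lo = lam
        else:
            hi = lam
        new_lam = 0.5 * (lo + hi)
        if new_lam in (lo, hi):
            break  # floating-point resolution reached
        lam = new_lam
    n = len(g)
    return [1.0 / (n * (1.0 + lam * gi)) for gi in g]
```

Because the score identity makes sum_i 1 / (1 + lam g_i) = n at the root, the weights sum to one automatically; only the single scalar lam has to be solved for.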
Faithful Algorithm 3.1 (R2). The stochastic-EM P-step now draws the mixing probabilities (rejection-sampled, eq. 3.4) and the error density (bootstrap), not only the regression coefficients.
Calibrated standard errors (R3). summary() now states that the sparsity
SEs are conditional on the estimated classification;
under-supported components and rank-deficient weighted designs now trigger warnings;
the SE method and its conditioning are recorded in se_method / se_conditional.
kdEM performance (R4). The E-step uses O(n) grid
interpolation and the grid is built by a binned/FFT KDE
(stats::density), removing the O(n^2) cost: kdEM now takes about
3x the run time of the ALD engine (previously ~220x), meeting the speed target.
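The cost structure can be illustrated as follows. Build the density once on a fixed grid from binned counts, then evaluate each residual by linear interpolation between the two nearest grid nodes. stats::density uses an FFT for the convolution step. This Python sketch swaps in a direct, truncated convolution on the grid, which is slower than an FFT but keeps the same shape: an O(n) binning pass, a grid cost that does not grow with n, and O(1) work per residual. binned_kde and interp_density are hypothetical names.

```python
import math


def binned_kde(x, h, grid_size=512, pad=4.0):
    """Gaussian KDE on an equally spaced grid, built by linear binning.

    Returns (grid_lo, delta, dens). The binning pass is O(n); the
    convolution costs O(G * K) for G grid nodes and a kernel truncated
    at pad bandwidths (K nodes), independent of n.
    """
    lo = min(x) - pad * h
    hi = max(x) + pad * h
    delta = (hi - lo) / (grid_size - 1)
    counts = [0.0] * grid_size
    for xi in x:
        pos = (xi - lo) / delta
        j = min(int(pos), grid_size - 2)
        frac = pos - j
        counts[j] += 1.0 - frac  # share of the point at the left node
        counts[j + 1] += frac    # share at the right node
    half = int(math.ceil(pad * h / delta))
    norm = 1.0 / (len(x) * h * math.sqrt(2.0 * math.pi))
    kern = [norm * math.exp(-0.5 * (k * delta / h) ** 2)
            for k in range(-half, half + 1)]
    dens = [0.0] * grid_size
    for j in range(grid_size):
        total = 0.0
        for k in range(-half, half + 1):
            m = j - k
            if 0 <= m < grid_size:
                total += counts[m] * kern[k + half]
        dens[j] = total
    return lo, delta, dens


def interp_density(grid, t):
    """Linear interpolation of the gridded density at t (0 off the grid)."""
    lo, delta, dens = grid
    pos = (t - lo) / delta
    if pos < 0.0 or pos > len(dens) - 1:
        return 0.0
    j = min(int(pos), len(dens) - 2)
    frac = pos - j
    return (1.0 - frac) * dens[j] + frac * dens[j + 1]
```

In an E-step loop the grid would be rebuilt once per iteration and interp_density called once per residual, so the per-iteration cost is linear in n.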
Bounded separability diagnostic (R6).
mi_fraction is now a bounded trace ratio in [0, 1]
(previously could return ~1e14 on imbalanced clusters).
New responsibility-based overlap diagnostic
(fit$diagnostics$overlap), independent of the stochastic-EM
path.
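A bounded trace ratio follows naturally from the Rubin's-rules total variance V_W + (1 + 1/B) V_B used for the stochastic-EM standard errors. The Python sketch below assumes mi_fraction is the share of the total-variance trace contributed by the inflated between-draw term. That is one trace ratio guaranteed to lie in [0, 1] when the covariance matrices are positive semi-definite, but it is not necessarily the package's exact definition. rubin_combine is a hypothetical helper.

```python
def rubin_combine(estimates, covariances):
    """Combine B stochastic-EM draws with Rubin's rules.

    estimates:   B parameter vectors (lists of length p), one per draw.
    covariances: B within-draw covariance matrices (p x p nested lists).
    Returns (theta_bar, total_cov, mi_fraction).
    """
    B = len(estimates)
    p = len(estimates[0])
    theta_bar = [sum(est[j] for est in estimates) / B for j in range(p)]
    # Within-draw variance V_W: average of the per-draw covariances.
    v_w = [[sum(c[i][j] for c in covariances) / B for j in range(p)]
           for i in range(p)]
    # Between-draw variance V_B: sample covariance of the draws.
    v_b = [[sum((est[i] - theta_bar[i]) * (est[j] - theta_bar[j])
                for est in estimates) / (B - 1) for j in range(p)]
           for i in range(p)]
    inflate = 1.0 + 1.0 / B
    total = [[v_w[i][j] + inflate * v_b[i][j] for j in range(p)]
             for i in range(p)]
    # Both traces are non-negative, so the ratio lies in [0, 1].
    trace_between = inflate * sum(v_b[i][i] for i in range(p))
    trace_total = sum(total[i][i] for i in range(p))
    mi_fraction = trace_between / trace_total if trace_total > 0 else 0.0
    return theta_bar, total, mi_fraction
```

A ratio of traces cannot blow up the way an element-wise or determinant-based ratio can when one cluster is tiny, which is the failure mode the entry describes.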
Real data (R5). Ships the engine
dataset (Brinkman 1981 ethanol-combustion data, the Wu & Yao Fig. 5
example); the README example now uses it. Added a golden test
reproducing the Wu & Yao Table 1 simulation means.
Selection rigor (R7).
mixqr_select() gains criterion = "cv" (K-fold
cross-validated held-out predictive log-likelihood) that penalises
complexity and works for either engine; AIC/BIC selection now emits the
mixture-boundary caveat and the ALD likelihood is labelled a working
likelihood.
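The scoring loop behind a held-out predictive log-likelihood can be sketched as follows. Each observation is scored by a model fitted without it, so an extra component only helps if it improves out-of-sample fit. The helpers kfold_indices and cv_heldout_loglik are hypothetical, the caller is assumed to supply a fit function and a per-point log-density, and round-robin fold assignment stands in for the shuffled folds a real implementation would likely use.

```python
import math


def kfold_indices(n, k):
    """Assign observations 0..n-1 to k folds in round-robin order."""
    return [list(range(f, n, k)) for f in range(k)]


def cv_heldout_loglik(data, k, fit, loglik):
    """K-fold cross-validated held-out log-likelihood.

    fit(train) -> model; loglik(model, point) -> log density of one point.
    Every point contributes exactly once, scored by a model trained on
    the other k - 1 folds.
    """
    total = 0.0
    for held in kfold_indices(len(data), k):
        held_set = set(held)
        train = [x for i, x in enumerate(data) if i not in held_set]
        model = fit(train)
        total += sum(loglik(model, data[i]) for i in held)
    return total
```

For component-count selection one would evaluate this for each candidate number of components and keep the largest value.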
Slope-based identifiability (R8). Default label ordering is now by slope (aligned with Wu & Yao Thm 2.1’s distinct-slope condition), and the distinctness guard uses a scale-relative threshold.
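Slope-based labelling with a scale-relative distinctness check might look like the following. Assumptions, not taken from the package: each component contributes a single slope, the scale is the largest absolute slope, and the tolerance is 1e-3. order_by_slope is a hypothetical name.

```python
def order_by_slope(slopes, tol=1e-3):
    """Label components by increasing slope; flag near-equal slopes.

    slopes: one slope per component.
    Returns (order, distinct): order lists component indices from the
    smallest to the largest slope; distinct is False when two adjacent
    slopes differ by no more than tol times the overall slope scale.
    """
    order = sorted(range(len(slopes)), key=lambda j: slopes[j])
    scale = max(abs(s) for s in slopes) or 1.0  # guard all-zero slopes
    gaps = [slopes[b] - slopes[a] for a, b in zip(order, order[1:])]
    distinct = all(gap > tol * scale for gap in gaps)
    return order, distinct
```

The relative threshold is what makes the check unit-free. A gap of 0.5 between slopes near 1000 is flagged as indistinct, while a gap of 0.0005 between slopes near 0.001 is not. An absolute threshold would get at least one of these cases wrong.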
Robustness / UX (R10). Rank-deficient
(collinear) designs now error clearly; added
confint.mixqr() (Wald intervals).
Calibrated standard errors. The sparsity
variance now reads f(0) off a kernel density estimate of
the component residuals (Wu & Yao 2016, p.166) rather than the ALD
working density. A Monte-Carlo benchmark
(inst/benchmarks/se_coverage.R) shows
variance = "stochEM" now achieves ~95% (near-nominal)
coverage for the regression coefficients, up from ~67-77%; the
mixing-probability intervals reflect the documented finite-sample
pi-bias.
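In the simplest, intercept-only case, the sparsity standard error is sqrt(tau (1 - tau) / n) / f(0), so everything hinges on the estimate of f(0). The Python sketch below reads f(0) off a weighted Gaussian KDE of the component residuals instead of a working (ALD) density. Several details are assumptions, not the package's choices: the Silverman-style bandwidth, the use of responsibilities as weights with n replaced by their sum, and the helper names kde_at_zero and sparsity_se. The regression case would multiply the same f(0)^(-2) factor into a design-based matrix.

```python
import math


def kde_at_zero(residuals, weights=None):
    """Weighted Gaussian KDE of the residuals, evaluated at zero.

    Bandwidth: Silverman-style rule 1.06 * sd * n_eff ** (-1/5), where
    n_eff is the total weight (e.g. summed posterior responsibilities).
    """
    w = weights if weights is not None else [1.0] * len(residuals)
    n_eff = sum(w)
    mean = sum(wi * r for wi, r in zip(w, residuals)) / n_eff
    var = sum(wi * (r - mean) ** 2 for wi, r in zip(w, residuals)) / n_eff
    h = 1.06 * math.sqrt(var) * n_eff ** (-0.2)
    norm = 1.0 / (n_eff * h * math.sqrt(2.0 * math.pi))
    return norm * sum(wi * math.exp(-0.5 * (r / h) ** 2)
                      for wi, r in zip(w, residuals))


def sparsity_se(residuals, tau, weights=None):
    """Sparsity-based SE of an intercept-only tau-quantile fit.

    Classical asymptotic variance tau(1 - tau) / (n f(0)^2), with f(0)
    taken from the residual KDE rather than a parametric working density.
    """
    w = weights if weights is not None else [1.0] * len(residuals)
    f0 = kde_at_zero(residuals, w)
    return math.sqrt(tau * (1.0 - tau) / sum(w)) / f0
```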
Diagnostics & docs. New mixqr()
help sections on the Wu & Yao sec.6 semiparametric bias and on
standard-error validity;
predict(type = "quantile_byclass"); component-collapse
and ALD non-monotonicity warnings; fit$total_iter (total EM
iterations across starts). Removed the dead package URL.
Documentation site. A full pkgdown website with
a comprehensive applied tutorial (“A Tutorial on Mixtures of Quantile
Regressions”) featuring publication-ready ggplot2 visualizations, a
get-started vignette, and a validation & diagnostics article. Added
inst/CITATION, author/affiliation metadata, and documented
every exported method.
First release. The frequentist EM substrate (sub-project 01 of the QMM suite).
mixqr() fits finite mixtures of tau-quantile
regressions with two engines: "ald" (fast parametric
asymmetric-Laplace mixture, genuine likelihood + AIC/BIC) and
"kdEM" (Wu & Yao 2016 kernel-density EM with
nonparametric component error densities, unequal or pooled), both run by a
generic pluggable EM driver, mixqr_em().
Variance estimation over B draws by Rubin's rules (V_W + (1 + 1/B) V_B), with a cluster-separability
diagnostic.
mixqr_select() for component-count selection
(AIC/BIC).
sim_mixqr2() / sim_mixqr3() reproducing the Wu & Yao 2- and
3-component designs.
An engine registry (register_mixqr_engine())
and reserved diagnostics$crossing /
diagnostics$class_stability slots, the integration channel
for QMM sub-projects 03 (gating) and 04 (non-crossing).
Note: this v0.1 is pure R. Rcpp acceleration of the KDE/E-step hot loops is planned.