Recent UW theory papers

  • 2026-04-23 Sampling from the Hardcore Model on Random Regular Bipartite Graphs above the Uniqueness Threshold. Nicholas Kocurek, Shayan Oveis Gharan, and Dante Tjowasi.

    Summary: We design an efficient sampling algorithm that generates samples from the hardcore model on random regular bipartite graphs whenever $λ\lesssim \frac{1}{\sqrt{Δ}}$, where $Δ$ is the degree. Combined with recent work of Jenssen, Keevash, and Perkins, this implies an FPRAS for the partition function of the hardcore model on random regular bipartite graphs at any fugacity. Our algorithm is based on two new Markov chains that work in complementary regimes. Our proof proceeds by showing that the corresponding simplicial complexes are top-link spectral expanders and appealing to the trickle-down theorem to prove fast mixing.
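
    The hardcore model itself is easy to state. As a toy illustration only (this is our own sketch of plain single-site Glauber dynamics, not the paper's algorithm, which relies on two new chains and spectral machinery), here is a sampler for independent sets weighted by fugacity $λ$ on a small bipartite graph; the name `glauber_hardcore` is ours:

```python
import random

def glauber_hardcore(adj, lam, steps, seed=0):
    """Single-site (heat-bath) Glauber dynamics for the hardcore model:
    samples independent sets I with Pr[I] proportional to lam^|I|."""
    rng = random.Random(seed)
    n = len(adj)
    occupied = [False] * n  # start from the empty independent set
    for _ in range(steps):
        v = rng.randrange(n)
        if any(occupied[u] for u in adj[v]):
            occupied[v] = False  # a neighbor is occupied, so v must stay out
        else:
            # conditional law of v given its neighborhood: Bernoulli(lam/(1+lam))
            occupied[v] = rng.random() < lam / (1 + lam)
    return {v for v in range(n) if occupied[v]}

# Complete bipartite graph K_{2,2}: sides {0, 1} and {2, 3}
adj = {0: [2, 3], 1: [2, 3], 2: [0, 1], 3: [0, 1]}
sample = glauber_hardcore(adj, lam=0.5, steps=10_000)
assert all(u not in sample for v in sample for u in adj[v])  # independence holds
```

    On bipartite graphs at large fugacity this simple chain is exactly the one that mixes slowly, which is what makes the regime studied above delicate.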

  • 2026-04-09 The Boolean surface area of polynomial threshold functions. Fan Chang, Joseph Slote, Alexander Volberg, and Haonan Zhang.

    Summary: Polynomial threshold functions (PTFs) are an important low-complexity class of Boolean functions, with strong connections to learning theory and approximation theory. Recent work on learning and testing PTFs has exploited structural and isoperimetric properties of the class, especially bounds on average sensitivity, one of the central themes in the study of PTFs since the Gotsman--Linial conjecture. In this work we study PTFs through the lens of the Boolean surface area (or Talagrand boundary) \[ \mathbf{BSA}[f]=\mathbb{E}|\nabla f|=\mathbb{E}\sqrt{s_{f}(x)}, \] a natural measure of vertex-boundary complexity on the discrete cube. Our main result is that every degree-$d$ PTF has polylogarithmic Boolean surface area: \[ \mathbf{BSA}[f]\le C_d(\log(en))^{C_d}. \] The proof is based on the PTF Restriction Lemma of Kabanets, Kane, and Lu (2017) and proceeds through a tail bound for the pointwise sensitivity. In particular, it controls all subcritical fractional moments of the sensitivity. We also record a random block partition principle for Boolean surface area and an alternative recursive argument following Kane's work on average sensitivity, which independently yields the weaker bound \[ \mathbf{BSA}[f]\le \exp(C_d\sqrt{\log n}). \]
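
    The quantity $\mathbf{BSA}[f]=\mathbb{E}\sqrt{s_f(x)}$ defined above can be computed exactly for small functions by enumerating the cube. A minimal sketch of our own (the function `bsa` and the 3-bit majority example are ours, not from the paper):

```python
from itertools import product
from math import sqrt

def bsa(f, n):
    """Boolean surface area E[sqrt(s_f(x))], where s_f(x) counts the
    coordinates of x whose flip changes f (the pointwise sensitivity)."""
    total = 0.0
    for x in product([0, 1], repeat=n):
        s = sum(f(x) != f(x[:i] + (1 - x[i],) + x[i + 1:]) for i in range(n))
        total += sqrt(s)
    return total / 2 ** n

maj3 = lambda x: int(sum(x) >= 2)  # a degree-1 PTF: sign(x1 + x2 + x3 - 1.5)
print(bsa(maj3, 3))  # (6/8)*sqrt(2) ~ 1.061: only the six "2-vs-1" points are sensitive
```

    The all-zeros and all-ones points have sensitivity 0, and each of the remaining six points has sensitivity 2, giving $\frac{6\sqrt{2}}{8}$.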

  • 2026-03-22 Playing Tag with Okamoto-Schnorr: Three-Move Pairing-Free Blind Signatures from DDH. Rutchathon Chairattana-Apirom, Michael Reichle, and Stefano Tessaro.

    Summary: This paper presents the first blind signature scheme in a pairing-free group with the following properties: (1) the signing protocol consists of only three moves; (2) the proof of one-more unforgeability relies solely on the Decisional Diffie-Hellman (DDH) assumption in the Random Oracle Model (ROM); and (3) the construction makes only black-box use of the underlying group. This resolves a major open problem in the area, as all prior pairing-free blind signatures either additionally relied on the Algebraic Group Model (AGM) or required at least four moves. Moreover, a recent lower bound by Dietz et al. (ePrint, '26) shows that three moves are optimal for such constructions. Both the communication complexity and the signature size in our scheme consist of a constant number of group elements. Our construction in fact achieves strong one-more unforgeability (which was not known for any of the recent AGM-free constructions requiring four moves), and we also present a partially blind variant. Furthermore, blindness is statistical (in the ROM). Our approach is based on a new construction paradigm that combines a conventional (yet, by itself, not fully secure) blind signature scheme (specifically, the blind Okamoto-Schnorr scheme) with a carefully crafted algebraic MAC.

  • 2026-03-09 SIMD HSS and aHMAC from Interval Encoding with Application to One-Bit-Per-Gate Garbling. Jaehyung Kim, Hanjun Li, Huijia Lin, and Zeyu Liu.

    Summary: Primitives enabling homomorphic computation over secret-shared values--Homomorphic Secret Sharing (HSS) and algebraic Homomorphic MACs (aHMAC)--have recently emerged as efficient alternatives to ciphertext-based primitives such as fully homomorphic encryption (FHE) and attribute-based encryption (ABE). Leveraging the distributed nature of secret sharing, direct constructions of HSS and aHMAC are simple, lightweight, avoid costly bootstrapping, and have many applications including one-bit-per-gate garbled circuits. Despite encouraging progress, all existing direct schemes still lack one key feature: efficient Single Instruction Multiple Data (SIMD) evaluation, a capability that has been critical to the efficiency of FHE. This gap leaves the potential of substantial efficiency improvements untapped. We present the first SIMD evaluation techniques for HSS and aHMAC, based on variants of the RLWE assumption. Using a new interval coefficient encoding, our approach embeds $\sqrt{n}$ integer-valued slots per ring element and supports $\sqrt{n}$-fold batch addition and multiplication in just $O(\log n)$ ring operations, achieving a multiplicative $\tilde O(\sqrt{n})$ improvement in amortized efficiency over prior direct constructions. Building on top of these improvements, we show a streamlined one-bit-per-gate SIMD garbling scheme with similar efficiency gains in the online phase. Our efficiency gains are concrete: operation counts and microbenchmark-based estimates show $6\times$--$10\times$ improvements in amortized multiplication cost over prior non-SIMD constructions, with up to $25\times$--$50\times$ speedups for aggregation-heavy workloads such as matrix--vector multiplication. These results demonstrate the practical potential of SIMD techniques for secret-sharing-based homomorphic computation.

  • 2026-03-02 Tweed: Adaptively Secure Lattice-Based Two-Round Threshold Signatures. Kaijie Jiang, Stefano Tessaro, Hoeteck Wee, and Chenzhi Zhu.

    Summary: This paper gives the first lattice-based two-round threshold signature scheme that tolerates the adaptive corruption of up to $T-1$ out of $N$ signers. Our construction is based on the MLWE and MSIS assumptions. We substantially improve upon the only existing adaptively secure lattice-based construction, recently given by Katsumata, Reichle, and Takemure (CRYPTO '24), which requires five rounds.

  • 2026-02-18 Separating Oblivious and Adaptive Models of Variable Selection. Ziyun Chen, Jerry Li, Kevin Tian, and Yusong Zhu.

    Summary: Sparse recovery is among the most well-studied problems in learning theory and high-dimensional statistics. In this work, we investigate the statistical and computational landscapes of sparse recovery with $\ell_\infty$ error guarantees. This variant of the problem is motivated by \emph{variable selection} tasks, where the goal is to estimate the support of a $k$-sparse signal in $\mathbb{R}^d$. Our main contribution is a provable separation between the \emph{oblivious} (``for each'') and \emph{adaptive} (``for all'') models of $\ell_\infty$ sparse recovery. We show that under an oblivious model, the optimal $\ell_\infty$ error is attainable in near-linear time with $\approx k\log d$ samples, whereas in an adaptive model, $\gtrsim k^2$ samples are necessary for any algorithm to achieve this bound. This establishes a surprising contrast with the standard $\ell_2$ setting, where $\approx k \log d$ samples suffice even for adaptive sparse recovery. We conclude with a preliminary examination of a \emph{partially-adaptive} model, where we show nontrivial variable selection guarantees are possible with $\approx k\log d$ measurements.

  • 2026-02-17 Unforgeable Watermarks for Language Models via Robust Signatures. Huijia Lin, Kameron Shahabi, and Min Jae Song.

    Summary: Language models now routinely produce text that is difficult to distinguish from human writing, raising the need for robust tools to verify content provenance. Watermarking has emerged as a promising countermeasure, with existing work largely focused on model quality preservation and robust detection. However, current schemes provide limited protection against false attribution. We strengthen the notion of soundness by introducing two novel guarantees: unforgeability and recoverability. Unforgeability prevents adversaries from crafting false positives, texts that are far from any output from the watermarked model but are nonetheless flagged as watermarked. Recoverability provides an additional layer of protection: whenever a watermark is detected, the detector identifies the source text from which the flagged content was derived. Together, these properties strengthen content ownership by linking content exclusively to its generating model, enabling secure attribution and fine-grained traceability. We construct the first undetectable watermarking scheme that is robust, unforgeable, and recoverable with respect to substitutions (i.e., perturbations in Hamming metric). The key technical ingredient is a new cryptographic primitive called robust (or recoverable) digital signatures, which allow verification of messages that are close to signed ones, while preventing forgery of messages that are far from all previously signed messages. We show that any standard digital signature scheme can be boosted to a robust one using property-preserving hash functions (Boyle, LaVigne, and Vaikuntanathan, ITCS 2019).

  • 2026-02-12 The Power of Two Bases: Robust and copy-optimal certification of nearly all quantum states with few-qubit measurements. Andrea Coladangelo, Jerry Li, Joseph Slote, and Ellen Wu.

    Summary: A central task in quantum information science is state certification: testing whether an unknown state is $ε_1$-close to a fixed target state, or $ε_2$-far. Recent work has shown that surprisingly simple measurement protocols--comprising only single-qubit measurements--suffice to certify arbitrary $n$-qubit states [Huang, Preskill, Soleimanifar '25; Gupta, He, O'Donnell '25]. However, these certification protocols are not robust: rather than allowing constant $ε_1$, they can only positively certify states within $ε_1=O(1/n)$ trace distance of the target. In many experimental settings, the appropriate error tolerance is constant as the system size grows, so this lack of robustness renders existing tests inapplicable at scale, no matter how many times the test is repeated. Here we present robust certification protocols based on few-qubit measurements that apply to all but a $O(2^{-n})$-fraction of pure target states. Our first protocol achieves constant robustness, i.e. $ε_1=Θ(1)$, using a single $O(\log n)$-qubit measurement along with single-qubit measurements in the $Z$ or $X$ basis on the other qubits. As a corollary of its robustness, this protocol also achieves constant (in $n$) copy complexity, which is optimal. Our second protocol uses exclusively single-qubit measurements and is nearly robust: $ε_1=Ω(1/\log n)$. Our tests are based on a new uncertainty principle for conditional fidelities, which may be of independent interest.

  • 2026-01-13 Obfuscation of Arbitrary Quantum Circuits. Miryam Mi-Ying Huang and Er-Cheng Tang.

    Summary: Program obfuscation aims to conceal a program's internal structure while preserving its functionality. A central open problem is whether an obfuscation scheme for arbitrary quantum circuits exists. Despite several efforts toward this goal, prior works have succeeded only in obfuscating quantum circuits that implement either pseudo-deterministic functions or unitary transformations. Although unitary transformations already include a broad class of quantum computation, many important quantum tasks, such as state preparation and quantum error-correction, go beyond unitaries and fall within general completely positive trace-preserving maps. In this work, we construct the first quantum ideal obfuscation scheme for arbitrary quantum circuits supporting quantum inputs and outputs, in the classical oracle model, assuming post-quantum one-way functions, thereby resolving an open problem posed in Bartusek et al. (STOC 2023), Bartusek, Brakerski, and Vaikuntanathan (STOC 2024), and Huang and Tang (FOCS 2025). At the core of our construction lies a novel primitive that we introduce, called the subspace-preserving strong pseudorandom unitary (spsPRU). An spsPRU is a family of efficient unitaries that fix every vector in a given linear subspace $S$, while acting as a Haar random unitary on the orthogonal complement $S^\perp$ under both forward and inverse oracle queries. Furthermore, by instantiating the classical oracle model with the ideal obfuscation scheme for classical circuits proposed by Jain et al. (CRYPTO 2023) and later enhanced by Bartusek et al. (arxiv:2510.05316), our obfuscation scheme can also be realized in the quantumly accessible pseudorandom oracle model.

  • 2026-01-09 Rigorous Implications of the Low-Degree Heuristic. Jun-Ting Hsieh, Daniel M. Kane, Pravesh K. Kothari, Jerry Li, Sidhanth Mohanty, and Stefan Tiegel.

    Summary: Over the past decade, the low-degree heuristic has been used to estimate the algorithmic thresholds for a wide range of average-case planted-vs-null distinguishing problems. Such results rely on the hypothesis that if the low-degree moments of the planted and null distributions are sufficiently close, then no efficient (noise-tolerant) algorithm can distinguish between them. This hypothesis is appealing due to the simplicity of calculating the low-degree likelihood ratio (LDLR) -- a quantity that measures the similarity between low-degree moments. However, despite sustained interest in the area, it remains unclear whether low-degree indistinguishability actually rules out any interesting class of algorithms. In this work, we initiate the study of, and develop technical tools for, translating LDLR upper bounds into rigorous lower bounds against concrete algorithms. As a consequence, we prove the following for any permutation-invariant distribution $\mathsf{P}$:
    1. If $\mathsf{P}$ is over $\{0,1\}^n$ and is low-degree indistinguishable from $U = \mathrm{Unif}(\{0,1\}^n)$, then a noisy version of $\mathsf{P}$ is statistically indistinguishable from $U$.
    2. If $\mathsf{P}$ is over $\mathbb{R}^n$ and is low-degree indistinguishable from the standard Gaussian ${N}(0, 1)^n$, then no statistic based on symmetric polynomials of degree at most $O(\log n/\log \log n)$ can distinguish a noisy version of $\mathsf{P}$ from ${N}(0, 1)^n$.
    3. If $\mathsf{P}$ is over $\mathbb{R}^{n\times n}$ and is low-degree indistinguishable from ${N}(0,1)^{n\times n}$, then no constant-sized subgraph statistic can distinguish a noisy version of $\mathsf{P}$ from ${N}(0, 1)^{n\times n}$.
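
    For reference, the LDLR mentioned above has a standard closed form (our paraphrase of the usual definition, not a statement from this paper): writing $L = \frac{d\mathsf{P}}{dU}$ for the likelihood ratio and $L^{\le D}$ for its projection onto polynomials of degree at most $D$,

```latex
\|L^{\le D}\|_2^2
  \;=\; \mathbb{E}_{x \sim U}\!\left[\, L^{\le D}(x)^2 \,\right]
  \;=\; \sum_{\deg \alpha \le D} \Big( \mathbb{E}_{x \sim \mathsf{P}}\big[\varphi_\alpha(x)\big] \Big)^{2},
```

    where $\{\varphi_\alpha\}$ is an orthonormal polynomial basis for $L^2(U)$. The heuristic predicts computational hardness of distinguishing when this quantity remains bounded as $n \to \infty$.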

  • 2026-01-07 Exact Bounds for Forbidden Configurations and the Extremal Matrices. Richard P. Anstee, Oakley Edens, Arvin Sahami, Jaehwan Seok, and Attila Sali.

    Summary: Let $F$ be a $k\times \ell$ (0,1)-matrix. A matrix is simple if it is a (0,1)-matrix with no repeated columns. A (0,1)-matrix $A$ is said to have $F$ as a configuration if some submatrix of $A$ is a row and column permutation of $F$. In the language of sets, a configuration is a trace. Let $\mathrm{Avoid}(m,F)$ be the set of all simple $m$-rowed matrices $A$ with no configuration $F$. Define $\mathrm{forb}(m,F)$ as the maximum number of columns of any matrix in $\mathrm{Avoid}(m,F)$. The $2\times (p+1)$ (0,1)-matrix $F(0,p,1,0)$ consists of a row of $p$ 1's and a row with a single 1 in the remaining column. The paper determines $\mathrm{forb}(m,F(0,p,1,0))$ for $1\le p\le 9$ and characterizes the extremal matrices. A construction is given that may be extremal for all $p$.
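
    The definitions of configuration and $\mathrm{forb}$ can be explored by brute force at very small sizes. A sketch of our own (exponential time, illustration only; the functions `has_config` and `forb` are ours): for $p=1$, $F(0,1,1,0)$ is the $2\times 2$ identity up to permutation, avoiding it forces the columns to form a chain under containment, and indeed $\mathrm{forb}(3,F)=4=m+1$:

```python
from itertools import combinations, permutations, product

def has_config(A, F):
    """Does the (0,1)-matrix A contain a submatrix that is a row and
    column permutation of F?  A and F are given as lists of rows."""
    k, l, m, n = len(F), len(F[0]), len(A), len(A[0])
    for rows in combinations(range(m), k):
        for cols in combinations(range(n), l):
            for rp in permutations(rows):          # row permutation
                for cp in permutations(cols):      # column permutation
                    if all(A[rp[i]][cp[j]] == F[i][j]
                           for i in range(k) for j in range(l)):
                        return True
    return False

def forb(m, F):
    """Max number of columns of a simple m-rowed matrix avoiding F."""
    cols = list(product([0, 1], repeat=m))  # the 2^m distinct columns
    for r in range(len(cols), 0, -1):
        for chosen in combinations(cols, r):
            A = [[c[i] for c in chosen] for i in range(m)]
            if not has_config(A, F):
                return r
    return 0

F_p1 = [[1, 0], [0, 1]]  # F(0,1,1,0): one 1 per row, in different columns
print(forb(3, F_p1))     # 4: the extremal matrices are maximal chains of columns
```

    This brute force is only feasible for tiny $m$; the paper's point is exact bounds and extremal characterizations for all $m$.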

  • 2025-12-07 Ideal Attribution and Faithful Watermarks for Language Models. Min Jae Song and Kameron Shahabi.

    Summary: We introduce ideal attribution mechanisms, a formal abstraction for reasoning about attribution decisions over strings. At the core of this abstraction lies the ledger, an append-only log of the prompt-response interaction history between a model and its user. Each mechanism produces deterministic decisions based on the ledger and an explicit selection criterion, making it well-suited to serve as a ground truth for attribution. We frame the design goal of watermarking schemes as faithful representation of ideal attribution mechanisms. This novel perspective brings conceptual clarity, replacing piecemeal probabilistic statements with a unified language for stating the guarantees of each scheme. It also enables precise reasoning about desiderata for future watermarking schemes, even when no current construction achieves them, since the ideal functionalities are specified first. In this way, the framework provides a roadmap that clarifies which guarantees are attainable in an idealized setting and worth pursuing in practice.

  • 2025-11-21 High-Accuracy List-Decodable Mean Estimation. Ziyun Chen, Spencer Compton, Daniel Kane, and Jerry Li.

    Summary: In list-decodable learning, we are given a set of data points such that an $α$-fraction of these points come from a nice distribution $D$, for some small $α\ll 1$, and the goal is to output a short list of candidate solutions, such that at least one element of this list recovers some non-trivial information about $D$. By now, there is a large body of work on this topic; however, while many algorithms can achieve optimal list size in terms of $α$, all known algorithms must incur error which decays, in some cases quite poorly, with $1 / α$. In this paper, we ask if this is inherent: is it possible to trade off list size with accuracy in list-decodable learning? More formally, given $ε> 0$, can we output a slightly larger list in terms of $α$ and $ε$, but so that one element of this list has error at most $ε$ with respect to the ground truth? We call this problem high-accuracy list-decodable learning. Our main result is that non-trivial high-accuracy guarantees, both information-theoretically and algorithmically, are possible for the canonical setting of list-decodable mean estimation of identity-covariance Gaussians. Specifically, we demonstrate that there exists a list of candidate means of size at most $L = \exp \left( O\left( \tfrac{\log^2 1 / α}{ε^2} \right)\right)$ so that one of the elements of this list has $\ell_2$ distance at most $ε$ to the true mean. We also design an algorithm that outputs such a list with runtime and sample complexity $n = d^{O(\log L)} + \exp \exp (\widetilde{O}(\log L))$. We do so by demonstrating a completely novel proof of identifiability, as well as a new algorithmic way of leveraging this proof without the sum-of-squares hierarchy, which may be of independent technical interest.

  • 2025-11-12 Separating QMA from QCMA with a classical oracle. John Bostanci, Jonas Haferkamp, Chinmay Nirkhe, and Mark Zhandry.

    Summary: We construct a classical oracle proving that, in a relativized setting, the set of languages decidable by an efficient quantum verifier with a quantum witness (QMA) is strictly bigger than those decidable with access only to a classical witness (QCMA). The separating classical oracle we construct is for a decision problem we coin spectral Forrelation -- the oracle describes two subsets of the boolean hypercube, and the computational task is to decide if there exists a quantum state whose standard basis measurement distribution is well supported on one subset while its Fourier basis measurement distribution is well supported on the other subset. This is equivalent to estimating the spectral norm of a "Forrelation" matrix between two sets that are accessible through membership queries. Our lower bound derives from a simple observation that a query algorithm with a classical witness can be run multiple times to generate many samples from a distribution, while a quantum witness is a "use once" object. This observation allows us to reduce proving a QCMA lower bound to proving a sampling hardness result which does not simultaneously prove a QMA lower bound. To prove said sampling hardness result for QCMA, we observe that quantum access to the oracle can be compressed by expressing the problem in terms of bosons -- a novel "second quantization" perspective on compressed oracle techniques, which may be of independent interest. Using this compressed perspective on the sampling problem, we prove the sampling hardness result, completing the proof.

  • 2025-11-10 How Do Data Owners Say No? A Case Study of Data Consent Mechanisms in Web-Scraped Vision-Language AI Training Datasets. Chung Peng Lee, Rachel Hong, Harry H. Jiang, Aster Plotnik, William Agnew, and Jamie Morgenstern.

    Summary: The internet has become the main source of data for training modern text-to-image and vision-language models, yet it is increasingly unclear whether web-scale data collection practices for training AI systems adequately respect data owners' wishes. Ignoring an owner's indication of consent around data usage not only raises ethical concerns but has also recently been elevated into lawsuits around copyright infringement. In this work, we aim to reveal information about data owners' consent to AI scraping and training, and study how it is expressed in DataComp, a popular dataset of 12.8 billion text-image pairs. We examine both sample-level information, including copyright notices, watermarking, and metadata, and web-domain-level information, such as a site's Terms of Service (ToS) and Robots Exclusion Protocol. We estimate that at least 122M samples in CommonPool exhibit some indication of a copyright notice, and find that 60\% of the samples in the top 50 domains come from websites with ToS that prohibit scraping. Furthermore, we estimate (with a 95\% confidence interval) that 9-13\% of samples from CommonPool contain watermarks, which existing watermark detection methods fail to capture with high fidelity. Our holistic methods and findings show that data owners rely on various channels to convey data consent, which current AI data collection pipelines do not entirely respect. These findings highlight the limitations of current dataset curation and release practices and the need for a unified data consent framework that takes AI purposes into consideration.
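
    The Robots Exclusion Protocol signals examined above are machine-readable. As a small illustration of our own (the robots.txt text is made up, and `GPTBot` is used only as an example AI-crawler token; this is not the paper's measurement pipeline), Python's standard library can evaluate such rules:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that permits general crawling but opts out of an AI scraper
robots_txt = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

rp = RobotFileParser()
rp.parse(robots_txt.splitlines())

print(rp.can_fetch("GPTBot", "https://example.com/photo.jpg"))        # False
print(rp.can_fetch("SomeSearchBot", "https://example.com/photo.jpg")) # True
```

    A domain-level audit like the one described above amounts to fetching each site's robots.txt and running such checks against the user-agent tokens of known AI crawlers.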

  • 2025-11-06 Optimal Inference Schedules for Masked Diffusion Models. Sitan Chen, Kevin Cong, and Jerry Li.

    Summary: A major bottleneck of standard auto-regressive large language models is that their inference process is inherently sequential, resulting in very long and costly inference times. To circumvent this, practitioners proposed a class of language models called diffusion language models, of which the masked diffusion model (MDM) is the most successful. The MDM is able to sample tokens out-of-order and, ostensibly, many tokens at once and in parallel. However, there is very limited rigorous understanding of how much parallel sampling these models can perform without noticeable degradation in their sampling performance. Prior work of Li and Cai obtained some preliminary bounds, but these are not tight for many natural classes of distributions. In this work, we give a new, exact characterization of the expected divergence between the true distribution and the sampled distribution, for any distribution and any unmasking schedule for the sampler, showing an elegant connection to the theory of univariate function approximation. By leveraging this connection, we then attain a number of novel lower and upper bounds for this problem. While the connection to function approximation in principle gives the optimal unmasking schedule for any distribution, we show that it is in general impossible to compete with it without strong a priori knowledge of the distribution, even in seemingly benign settings. However, we also demonstrate new upper bounds and new sampling schedules in terms of well-studied information-theoretic properties of the base distribution, namely, its total correlation and dual total correlation, which show that in some natural settings, one can sample in $O(\log n)$ steps without any visible loss in performance, where $n$ is the total sequence length.
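
    The degradation caused by parallel unmasking is visible in a two-token toy example (ours, not the paper's analysis): if the two tokens are perfectly correlated, unmasking both in one parallel step draws each from its marginal independently, and the sampled law differs from the true one in total variation:

```python
from itertools import product

true_p = {(0, 0): 0.5, (1, 1): 0.5}  # perfectly correlated token pair
marg = {0: 0.5, 1: 0.5}              # marginal of each coordinate

# One parallel unmasking step samples the two tokens independently
parallel_p = {(a, b): marg[a] * marg[b] for a, b in product([0, 1], repeat=2)}

tv = 0.5 * sum(abs(true_p.get(x, 0.0) - parallel_p[x])
               for x in product([0, 1], repeat=2))
print(tv)  # 0.5: maximal correlation forces sequential (two-step) unmasking
```

    Unmasking the tokens one at a time, conditioning the second on the first, reproduces the true distribution exactly; this is the kind of schedule-dependent error the characterization above quantifies in general.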

  • 2025-11-04 Spectral Certificates and Sum-of-Squares Lower Bounds for Semirandom Hamiltonians. Nicholas Kocurek.

    Summary: The $k$-$\mathsf{XOR}$ problem is one of the most well-studied problems in classical complexity. We study a natural quantum analogue of $k$-$\mathsf{XOR}$, the problem of computing the ground energy of a certain subclass of structured local Hamiltonians, signed sums of $k$-local Pauli operators, which we refer to as $k$-$\mathsf{XOR}$ Hamiltonians. To exhibit the connection between this model and classical $k$-$\mathsf{XOR}$, we extend results on refuting $k$-$\mathsf{XOR}$ instances to the Hamiltonian setting by crafting a quantum variant of the Kikuchi matrix for CSP refutation, capturing ground-energy optimization instead. As our main result, we show an $n^{O(\ell)}$-time classical spectral algorithm certifying ground energy at most $\frac{1}{2} + \varepsilon$ in (1) semirandom Hamiltonian $k$-$\mathsf{XOR}$ instances or (2) sums of Gaussian-signed $k$-local Paulis, both with $O(n) \cdot \left(\frac{n}{\ell}\right)^{k/2-1} \log n /\varepsilon^4$ local terms, a tradeoff known as the refutation threshold. Additionally, we give evidence that this tradeoff is tight in the semirandom regime via non-commutative Sum-of-Squares lower bounds embedding classical $k$-$\mathsf{XOR}$ instances as entirely classical Hamiltonians.

  • 2025-10-23 Excluding a Line Minor via Design Matrices and Column Number Bounds for the Circuit Imbalance Measure. Daniel Dadush, Friedrich Eisenbrand, Rom Pinchasi, Thomas Rothvoss, and Neta Singer.

    Summary: For a real matrix $A \in \mathbb{R}^{d \times n}$ with non-collinear columns, we show that $n \leq O(d^4 κ_A)$ where $κ_A$ is the \emph{circuit imbalance measure} of $A$. The circuit imbalance measure $κ$ is a real analogue of $Δ$-modularity for integer matrices, satisfying $κ_A \leq Δ_A$ for integer $A$. The circuit imbalance measure has numerous applications in the context of linear programming (see Ekbatani, Natura, and Végh (2022) for a survey). Our result generalizes the $O(d^4 Δ_A)$ bound of Averkov and Schymura (2023) for integer matrices and provides the first polynomial bound holding for all parameter ranges on real matrices. To derive our result, similar to the strategy of Geelen, Nelson, and Walsh (2021) for $Δ$-modular matrices, we show that real representable matroids induced by $κ$-bounded matrices are minor closed and exclude a rank $2$ uniform matroid on $O(κ)$ elements as a minor (also known as a line of length $O(κ)$). As our main technical contribution, we show that any simple rank $d$ complex representable matroid which excludes a line of length $l$ has at most $O(d^4 l)$ elements. This complements the tight bound of $(l-3)\binom{d}{2} + d$ for $l \geq 4$ due to Geelen, Nelson, and Walsh, which holds when the rank $d$ is sufficiently large compared to $l$ (at least doubly exponential in $l$).

  • 2025-10-09 Agnostic Product Mixed State Tomography via Robust Statistics. Alvan Arulandu, Ilias Diakonikolas, Daniel Kane, and Jerry Li.

    Summary: We study the complexity of two closely related learning problems, one quantum and one classical. In the quantum setting, we consider agnostic tomography for the natural class of product mixed states. Given $N$ copies of an $n$-qubit state $ρ$, the goal is to output a nearly optimal product mixed state approximation in trace distance. While recent work has focused on pure-state ansatz (e.g., product or stabilizer states), no polynomial-time guarantees were previously known for mixed-state ansatz. In the classical setting, we study robust learning of binary product distributions: given samples from an unknown distribution on $\{0,1\}^n$, the goal is to output a nearly optimal product approximation. Our main contributions are as follows. (1) We give a semi-agnostic tomography algorithm for product mixed states with polynomial sample and computational complexity achieving error $O(\mathrm{opt}\log(1/\mathrm{opt}))$, where $\mathrm{opt}$ is the trace distance to the best product approximation. This is the first efficient algorithm with any nontrivial agnostic guarantee for mixed-state ansatz, using only single-qubit, single-copy measurements. We also prove a Quantum Statistical Query lower bound showing near-optimality, and an unconditional lower bound demonstrating that adaptivity is necessary under single-qubit measurements. (2) We give a semi-agnostic algorithm for robustly learning binary product distributions with matching guarantees and establish a Statistical Query lower bound, essentially resolving the efficient robust learnability of this class and improving on prior work since Diakonikolas et al. (2016).

  • 2025-10-09 A Meta-Complexity Characterization of Minimal Quantum Cryptography. Bruno Cavalar, Boyang Chen, Andrea Coladangelo, Matthew Gray, Zihan Hu, Zhengfeng Ji, and Xingjian Li.

    Summary: We give a meta-complexity characterization of EFI pairs, which are considered the "minimal" primitive in quantum cryptography (and are equivalent to quantum commitments). More precisely, we show that the existence of EFI pairs is equivalent to the following: there exists a non-uniformly samplable distribution over pure states such that the problem of estimating a certain Kolmogorov-like complexity measure is hard given a single copy. A key technical step in our proof, which may be of independent interest, is to show that the existence of EFI pairs is equivalent to the existence of non-uniform single-copy secure pseudorandom state generators (nu 1-PRS). As a corollary, we get an alternative, arguably simpler, construction of a universal EFI pair.

  • 2025-10-08 Trickle-down Theorems via C-Lorentzian Polynomials II: Pairwise Spectral Influence and Improved Dobrushin's Condition. Jonathan Leake and Shayan Oveis Gharan.

    Summary: Let $μ$ be a probability distribution on a multi-state spin system on a set $V$ of sites; equivalently, a $d$-partite simplicial complex with distribution $μ$ on maximal faces. For any pair of vertices $u,v\in V$, define the pairwise spectral influence $\mathcal{I}_{u,v}$ as follows. Let $σ$ be a choice of spins $s_w\in S_w$ for every $w\in V\setminus\{u,v\}$, and construct a matrix in $\mathbb{R}^{(S_u\cup S_v)\times (S_u\cup S_v)}$ where for any $s_u\in S_u, s_v\in S_v$, the $(us_u,vs_v)$-entry is the probability that $s_v$ is the spin of $v$ conditioned on $s_u$ being the spin of $u$ and on $σ$. Then $\mathcal{I}_{u,v}$ is the maximal second eigenvalue of this matrix, over all choices of spins for all $w\in V\setminus\{u,v\}$. Equivalently, $\mathcal{I}_{u,v}$ is the maximum local spectral expansion of links of codimension $2$ that include a spin for every $w \in V \setminus \{u,v\}$. We show that if the largest eigenvalue of the pairwise spectral influence matrix with entries $\mathcal{I}_{u,v}$ is bounded away from 1, i.e. $λ_{\max}(\mathcal{I})\leq 1-ε$ (and $X$ is connected), then the Glauber dynamics mixes rapidly and generates samples from $μ$. This improves and generalizes the classical Dobrushin influence condition, as $\mathcal{I}_{u,v}$ lower-bounds the classical influence of $u\to v$. As an application, we prove that the Glauber dynamics mixes rapidly up to (approximately) the phase transition for the multi-state hardcore model--a widely studied model in telecommunication networks and statistical physics (generalizing the hardcore model) introduced by Mazel and Suhov. As a by-product of our results, we also prove improved, almost optimal trickle-down theorems for partite simplicial complexes. Our proof builds on the trickle-down theorems via $\mathcal{C}$-Lorentzian polynomials machinery recently developed by the authors and Lindberg.

  • 2025-10-06 Quantum precomputation: parallelizing cascade circuits and the Moore-Nilsson conjecture is false. Adam Bene Watts, Charles R. Chen, J. William Helton, and Joseph Slote.

    Summary: Parallelization is a major challenge in quantum algorithms due to physical constraints like no-cloning. This is vividly illustrated by the conjecture of Moore and Nilsson from their seminal work on quantum circuit complexity [MN01, announced 1998]: unitaries of a deceptively simple form--controlled-unitary "staircases"--require circuits of minimum depth $Ω(n)$. If true, this lower bound would represent a major break from classical parallelism and prove a quantum-native analogue of the famous NC $\neq$ P conjecture. In this work we settle the Moore-Nilsson conjecture in the negative by compressing all circuits in the class to depth $O(\log n)$, which is the best possible. The parallelizations are exact, ancilla-free, and can be computed in poly($n$) time. We also consider circuits restricted to 2D connectivity, for which we derive compressions of optimal depth $O(\sqrt{n})$. More generally, we make progress on the project of quantum parallelization by introducing a quantum blockwise precomputation technique somewhat analogous to the method of Arlazarov, Dinič, Kronrod, and Faradžev [Arl+70] in classical dynamic programming, often called the "Four-Russians method." We apply this technique to more general "cascade" circuits as well, obtaining for example polynomial depth reductions for staircases of controlled $\log(n)$-qubit unitaries.
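    The classical blockwise-precomputation idea alluded to above can be shown in miniature: tabulate the answer for every fixed-width block once, then answer each query by table lookups, in the spirit of the Four-Russians method. This is only the classical analogue (here, population count over 8-bit blocks), not the quantum construction:

    ```python
    def make_table(bits=8):
        """Precompute popcount for all 2^bits block values (Four-Russians
        flavor: spend 2^bits preprocessing to speed up every later query)."""
        return [bin(i).count("1") for i in range(1 << bits)]

    def popcount(x, table, bits=8):
        """Count set bits of x one block at a time via table lookup,
        instead of bit-by-bit work."""
        total, mask = 0, (1 << bits) - 1
        while x:
            total += table[x & mask]
            x >>= bits
        return total

    T = make_table()
    assert popcount(0b10110110, T) == 5
    assert popcount((1 << 64) - 1, T) == 64
    ```
    
    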

  • 2025-09-26 T-TAMER: Provably Taming Trade-offs in ML Serving. Yuanyuan Yang, Ruimin Zhang, Jamie Morgenstern, and Haifeng Xu.

    Summary: As machine learning models continue to grow in size and complexity, efficient serving faces increasingly broad trade-offs spanning accuracy, latency, resource usage, and other objectives. Multi-model serving further complicates these trade-offs; for example, in cascaded models, each early-exit decision balances latency reduction against potential accuracy loss. Despite the pervasiveness and importance of such trade-offs, current strategies remain largely heuristic and case-specific, limiting both their theoretical guarantees and general applicability. We present a general framework, T-Tamer, which formalizes this setting as a multi-stage decision process, where the objective is to determine both when to exit and which model to consult. Our main result shows that recall (i.e., the ability to revisit earlier models) is both necessary and sufficient for achieving provable performance guarantees. In particular, we prove that strategies without recall cannot obtain any constant-factor approximation to the optimal trade-off, whereas recall-based strategies provably attain the optimal trade-off in polynomial time. We validate our analysis through experiments on synthetic datasets and early-exit workloads for vision and NLP benchmarks. The results show that recall-based strategies consistently yield efficient accuracy-latency trade-offs. We hope this work provides a principled foundation for bridging heuristic practice with theoretical guarantees in the design of early-exit and cascaded models.
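    The role of recall in a cascaded serving policy can be sketched schematically: the policy keeps every earlier model's output and may answer with any of them at exit time. This is a minimal illustration of the recall idea, with made-up model names, costs, and thresholds -- not the T-Tamer algorithm itself:

    ```python
    def cascade_with_recall(models, x, conf_threshold, latency_budget):
        """Run models in order; retain every (confidence, prediction) seen
        ("recall"), exit when confident enough or the budget is spent, and
        answer with the best prediction seen so far.
        models: list of (predict_fn, cost); predict_fn(x) -> (label, confidence)."""
        seen, spent = [], 0.0
        for predict, cost in models:
            if spent + cost > latency_budget:
                break
            spent += cost
            label, conf = predict(x)
            seen.append((conf, label))
            if conf >= conf_threshold:
                break
        conf, label = max(seen)   # recall: may revisit an earlier model's answer
        return label, spent

    # hypothetical two-model cascade: a cheap model then an expensive one
    models = [(lambda x: ("small-model-label", 0.6), 1.0),
              (lambda x: ("big-model-label", 0.9), 3.0)]
    pred, spent = cascade_with_recall(models, x=None,
                                      conf_threshold=0.8, latency_budget=10.0)
    ```

    A no-recall policy would be forced to return whichever output it holds when it exits; keeping `seen` is what lets the policy trade a little latency for the best answer observed.
    
    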

  • 2025-09-26 OptiMind: Teaching LLMs to Think Like Optimization Experts. Xinzhi Zhang, Zeyi Chen, Humishka Zope, Hugo Barbalho, Konstantina Mellou, Marco Molinaro, Janardhan Kulkarni, Ishai Menache, and Sirui Li.

    Summary: Mathematical programming -- the task of expressing operations and decision-making problems in precise mathematical language -- is fundamental across domains, yet remains a skill-intensive process requiring operations research expertise. Recent advances in large language models for complex reasoning have spurred interest in automating this task, translating natural language into executable optimization models. Current approaches, however, achieve limited accuracy, hindered by scarce and noisy training data without leveraging domain knowledge. In this work, we systematically integrate optimization expertise to improve formulation accuracy for mixed-integer linear programming, a key family of mathematical programs. Our OptiMind framework leverages semi-automated, class-based error analysis to guide both training and inference, explicitly preventing common mistakes within each optimization class. Our resulting fine-tuned LLM significantly improves formulation accuracy by 20.7% across multiple optimization benchmarks, with consistent gains under test-time scaling methods such as self-consistency and multi-turn feedback, enabling further progress toward robust LLM-assisted optimization formulation.

  • 2025-09-25 Semi-Random Graphs, Robust Asymmetry, and Reconstruction. Julian Asilis, Xi Chen, Dutch Hansen, and Shang-Hua Teng.

    Summary: The Graph Reconstruction Conjecture famously posits that any undirected graph on at least three vertices is determined up to isomorphism by its family of (unlabeled) induced subgraphs. At present, the conjecture admits partial resolutions of two types: 1) casework-based demonstrations of reconstructibility for families of graphs satisfying certain structural properties, and 2) probabilistic arguments establishing reconstructibility of random graphs by leveraging average-case phenomena. While results in the first category capture the worst-case nature of the conjecture, they play a limited role in understanding the general case. Results in the second category address much larger graph families, but it remains unclear how heavily the necessary arguments rely on optimistic distributional properties. Drawing on the perspectives of smoothed and semi-random analysis, we study the robustness of what are arguably the two most fundamental properties in this latter line of work: asymmetry and uniqueness of subgraphs. Notably, we find that various semi-random graph distributions exhibit these properties asymptotically, much like their Erdős-Rényi counterparts. In particular, Bollobás (1990) demonstrated that almost all Erdős-Rényi random graphs $G = (V, E) \sim \mathscr{G}(n, p)$ enjoy the property that their induced subgraphs on $n - Θ(1)$ vertices are asymmetric and mutually non-isomorphic, for $1 - p, p = Ω(\log(n) / n)$. We show that this property is robust against perturbation -- even when an adversary is permitted to add/remove each vertex pair in $V^{(2)}$ with (independent) arbitrarily large constant probability. Exploiting this result, we derive asymptotic characterizations of asymmetry in random graphs with planted structure and bounded adversarial corruptions, along with improved bounds on the probability mass of nonreconstructible graphs in $\mathscr{G}(n, p)$.

  • 2025-09-25 Lower Bounds for Learning Hamiltonians from Time Evolution. Ziyun Chen, Jerry Li, and Joseph Slote.

    Summary: Learning about a Hamiltonian $H$ from its time evolution $e^{-iHt}$ is a fundamental task in quantum science. A flurry of recent work has developed powerful new algorithms with provable guarantees for this task, for a variety of natural settings. Despite this, relatively little is known about lower bounds for learning Hamiltonians. In particular, in the natural setting where we assume $H$ is a $k$-local Hamiltonian on $n$ qubits, all existing algorithms require total evolution time at least $n^{Ω(k)}$ to learn the parameters of $H$, and it remained open whether one could obtain even faster algorithms -- or at the very least, if one could obtain better runtimes for simpler tasks, such as estimating a single designated coefficient of the Hamiltonian. In this work we show the answer is essentially no, by obtaining strong lower bounds for these problems. We find that not only do $k$-local Hamiltonians require $n^{Ω(k)}$ time evolution or interactions to learn, but also that in several senses, learning anything about a Hamiltonian is just as hard as learning everything. In particular, we find the same $n^{Ω(k)}$ lower bound holds for learning a single coefficient of a $k$-local Hamiltonian $H$, even if the rest of $H$ is already known. We also show an $n^{Ω(k)}$ lower bound for the task of effective Hamiltonian learning, where one seeks only to learn a unitary that approximately implements time evolution of $H$. Several related lower bounds, such as for general sparse (but not necessarily local) $H$ are also given. On the technical side, we make a new connection between Hamiltonian learning lower bounds and the analysis of Boolean functions, where we introduce a novel extremal property that may be of independent interest.

  • 2025-09-21 Auditability and the Landscape of Distance to Multicalibration. Nathan Derhake, Siddartha Devic, Dutch Hansen, Kuan Liu, and Vatsal Sharan.

    Summary: Calibration is a critical property for establishing the trustworthiness of predictors that provide uncertainty estimates. Multicalibration is a strengthening of calibration which requires that predictors be calibrated on a potentially overlapping collection of subsets of the domain. As multicalibration grows in popularity with practitioners, an essential question is: how do we measure how multicalibrated a predictor is? Błasiok et al. (2023) considered this question for standard calibration by introducing the distance to calibration framework (dCE) to understand how calibration metrics relate to each other and the ground truth. Building on the dCE framework, we consider the auditability of the distance to multicalibration of a predictor $f$. We begin by considering two natural generalizations of dCE to multiple subgroups: worst group dCE (wdMC), and distance to multicalibration (dMC). We argue that there are two essential properties of any multicalibration error metric: 1) the metric should capture how much $f$ would need to be modified in order to be perfectly multicalibrated; and 2) the metric should be auditable in an information theoretic sense. We show that wdMC and dMC each fail to satisfy one of these two properties, and that similar barriers arise when considering the auditability of general distance to multigroup fairness notions. We then propose two (equivalent) multicalibration metrics which do satisfy these requirements: 1) a continuized variant of dMC; and 2) a distance to intersection multicalibration, which leans on intersectional fairness desiderata. Along the way, we shed light on the loss-landscape of distance to multicalibration and the geometry of the set of perfectly multicalibrated predictors. Our findings may have implications for the development of stronger multicalibration algorithms as well as multigroup auditing more generally.

  • 2025-09-18 The Story of Sunflowers. Anup Rao.

    Summary: Sunflowers, or $Δ$-systems, are a fundamental concept in combinatorics introduced by Erdős and Rado in their paper: {\em Intersection theorems for systems of sets}, J. Lond. Math. Soc. (1) {\bf 35} (1960), 85--90. A sunflower is a collection of sets where all pairs have the same intersection. This paper explores the wide-ranging applications of sunflowers in computer science and combinatorics. We discuss recent progress towards the sunflower conjecture and present a short elementary proof of the best known bounds for the robust sunflower lemma.
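    The defining property -- all pairwise intersections equal a common "core" -- is easy to state in code; a minimal checker on toy set families:

    ```python
    from itertools import combinations

    def is_sunflower(sets):
        """True iff every pair of sets has the same intersection (the core)."""
        sets = [frozenset(s) for s in sets]
        if len(sets) < 2:
            return True
        core = sets[0] & sets[1]
        return all(a & b == core for a, b in combinations(sets, 2))

    assert is_sunflower([{1, 2, 3}, {1, 2, 4}, {1, 2, 5}])   # core {1, 2}
    assert is_sunflower([{1}, {2}, {3}])                     # empty core is allowed
    assert not is_sunflower([{1, 2}, {2, 3}, {1, 3}])        # pairwise cores differ
    ```
    
    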

  • 2025-09-15 Approximating the operator norm of local Hamiltonians via few quantum states. Lars Becker, Joseph Slote, Alexander Volberg, and Haonan Zhang.

    Summary: Consider a Hermitian operator $A$ acting on a complex Hilbert space of dimension $2^n$. We show that when $A$ has small degree in the Pauli expansion, or in other words, $A$ is a local $n$-qubit Hamiltonian, its operator norm can be approximated independently of $n$ by maximizing $|\braket{ψ|A|ψ}|$ over a small collection $\mathbf{X}_n$ of product states $\ket{ψ}\in (\mathbf{C}^{2})^{\otimes n}$. More precisely, we show that whenever $A$ is $d$-local, \textit{i.e.,} $\deg(A)\le d$, we have the following discretization-type inequality: \[ \|A\|\le C(d)\max_{ψ\in \mathbf{X}_n}|\braket{ψ|A|ψ}|. \] The constant $C(d)$ depends only on $d$. This collection $\mathbf{X}_n$ of $ψ$'s, termed a \emph{quantum norm design}, is independent of $A$, consists of product states, and can have cardinality as small as $(1+ε)^n$, which is essentially tight. Previously, norm designs were known only for homogeneous $d$-local Hamiltonians $A$ \cite{L,BGKT,ACKK}, and for non-homogeneous $2$-local traceless $A$ \cite{BGKT}. Several other results, such as boundedness of Rademacher projections for all levels and estimates of operator norms of random Hamiltonians, are also given.

  • 2025-09-01 The curious case of "XOR repetition" of monogamy-of-entanglement games. Andrea Coladangelo, Qipeng Liu, and Ziyi Xie.

    Summary: In this work, we consider "decision" variants of a monogamy-of-entanglement game by Tomamichel, Fehr, Kaniewski, and Wehner [New Journal of Physics '13]. In its original "search" variant, Alice prepares a (possibly entangled) state on registers $\mathsf{ABC}$; register $\mathsf{A}$, consisting of $n$ qubits, is sent to a Referee, while $\mathsf{B}$ and $\mathsf{C}$ are sent to Bob and Charlie; the Referee then measures each qubit in the standard or Hadamard basis (chosen uniformly at random). The basis choices are sent to Bob and Charlie, whose goal is to simultaneously guess the Referee's $n$-bit outcome string $x$. Tomamichel et al. show that the optimal winning probability is $\cos^{2n}(\frac{π}{8})$, following a perfect parallel repetition theorem. We consider the following "decision" variants of this game: - Variant 1, "XOR repetition": Bob and Charlie's goal is to guess the XOR of all the bits of $x$. Ananth et al. [Asiacrypt '24] conjectured that the optimal advantage over random guessing decays exponentially in $n$. Surprisingly, we show that this conjecture is false, and, in fact, there is no decay at all: there exists a strategy that wins with probability $\cos^2(\frac{π}{8}) \approx 0.85$ for any $n$. - Variant 2, "Goldreich-Levin": The Referee additionally samples a uniformly random $n$-bit string $r$ that is sent to Bob and Charlie along with the basis choices. Their goal is to guess the parity of $r\cdot x$. We show that the optimal advantage over random guessing decays exponentially in $n$ for the restricted class of adversaries that do not share entanglement. A similar result was already shown by Champion et al. and Çakan et al.; we give a more direct proof. Additionally, we put forward a reasonably concrete conjecture that is equivalent to exponential decay for general adversaries.

  • 2025-08-25 Spectral Refutations of Semirandom $k$-LIN over Larger Fields. Nicholas Kocurek and Peter Manohar.

    Summary: We study the problem of strongly refuting semirandom $k$-LIN$(\mathbb{F})$ instances: systems of $k$-sparse inhomogeneous linear equations over a finite field $\mathbb{F}$. For the case of $\mathbb{F} = \mathbb{F}_2$, this is the well-studied problem of refuting semirandom instances of $k$-XOR, where the works of [GKM22,HKM23] establish a tight trade-off between runtime and clause density for refutation: for any choice of a parameter $\ell$, they give an $n^{O(\ell)}$-time algorithm to certify that there is no assignment that can satisfy more than a $\frac{1}{2} + \varepsilon$-fraction of constraints in a semirandom $k$-XOR instance, provided that the instance has $O(n) \cdot \left(\frac{n}{\ell}\right)^{k/2 - 1} \log n /\varepsilon^4$ constraints, and the work of [KMOW17] provides good evidence that this is tight up to a $\mathrm{polylog}(n)$ factor via lower bounds for the Sum-of-Squares hierarchy. However, for larger fields, the only known results for this problem are established via black-box reductions to the case of $\mathbb{F}_2$, resulting in an $|{\mathbb{F}}|^{3k}$ gap between the current best upper and lower bounds. In this paper, we give an algorithm for refuting semirandom $k$-LIN$(\mathbb{F})$ instances with the "correct" dependence on the field size $|{\mathbb{F}}|$. For any choice of a parameter $\ell$, our algorithm runs in $(|{\mathbb{F}}|n)^{O(\ell)}$-time and strongly refutes semirandom $k$-LIN$(\mathbb{F})$ instances with at least $O(n) \cdot \left(\frac{|{\mathbb{F}^*}| n}{\ell}\right)^{k/2 - 1} \log(n |{\mathbb{F}^*}|) /\varepsilon^4$ constraints. We give good evidence that this dependence on the field size $|{\mathbb{F}}|$ is optimal by proving a lower bound for the Sum-of-Squares hierarchy that matches this threshold up to a $\mathrm{polylog}(n |{\mathbb{F}^*}|)$ factor. Our results also extend to the more general case of finite Abelian groups.

  • 2025-08-20 Robust Estimation Under Heterogeneous Corruption Rates. Syomantak Chaudhuri, Jerry Li, and Thomas A. Courtade.

    Summary: We study the problem of robust estimation under heterogeneous corruption rates, where each sample may be independently corrupted with a known but non-identical probability. This setting arises naturally in distributed and federated learning, crowdsourcing, and sensor networks, yet existing robust estimators typically assume uniform or worst-case corruption, ignoring structural heterogeneity. For mean estimation for multivariate bounded distributions and univariate Gaussian distributions, we give tight minimax rates for all heterogeneous corruption patterns. For multivariate Gaussian mean estimation and linear regression, we establish the minimax rate for squared error up to a factor of $\sqrt{d}$, where $d$ is the dimension. Roughly, our findings suggest that samples beyond a certain corruption threshold may be discarded by the optimal estimators -- this threshold is determined by the empirical distribution of the corruption rates given.

  • 2025-08-14 Welfare-Centric Clustering. Claire Jie Zhang, Seyed A. Esmaeili, and Jamie Morgenstern.

    Summary: Fair clustering has traditionally focused on ensuring equitable group representation or equalizing group-specific clustering costs. However, Dickerson et al. (2025) recently showed that these fairness notions may yield undesirable or unintuitive clustering outcomes and advocated for a welfare-centric clustering approach that models the utilities of the groups. In this work, we model group utilities based on both distances and proportional representation and formalize two optimization objectives based on welfare-centric clustering: the Rawlsian (Egalitarian) objective and the Utilitarian objective. We introduce novel algorithms for both objectives and prove theoretical guarantees for them. Empirical evaluations on multiple real-world datasets demonstrate that our methods significantly outperform existing fair clustering baselines.

  • 2025-07-25 Forbidden Configurations and Boundary Cases. Richard P. Anstee, Oakley Edens, Arvin Sahami, and Attila Sali.

    Summary: Let $F$ be a $k\times \ell$ (0,1)-matrix. Define a (0,1)-matrix $A$ to have $F$ as a \emph{configuration} if there is a submatrix of $A$ which is a row and column permutation of $F$. In the language of sets, a configuration is a \emph{trace}. Define a matrix to be {\it simple} if it is a (0,1)-matrix with no repeated columns. Let $\mathrm{Avoid}(m,F)$ be all simple $m$-rowed matrices $A$ with no configuration $F$. Define $\mathrm{forb}(m,F)$ as the maximum number of columns of any matrix in $\mathrm{Avoid}(m,F)$. Determining $\mathrm{forb}(m,F)$ requires determining bounds and constructions of matrices in $\mathrm{Avoid}(m,F)$. The paper considers some column-maximal $k$-rowed simple $F$ that have the bound $Θ(m^{k-2})$ and yet adding a column increases the bound to $Ω(m^{k-1})$. By a construction, $\mathrm{forb}(m,F)$ is determined exactly.

  • 2025-07-16 Obfuscation of Unitary Quantum Programs. Mi-Ying Huang and Er-Cheng Tang.

    Summary: Program obfuscation aims to hide the inner workings of a program while preserving its functionality. In the quantum setting, recent works have obtained obfuscation schemes for specialized classes of quantum circuits. For instance, Bartusek, Brakerski, and Vaikuntanathan (STOC 2024) constructed a quantum state obfuscation scheme, which supports the obfuscation of quantum programs represented as quantum states for pseudo-deterministic quantum programs with classical inputs and outputs in the classical oracle model. In this work, we improve upon existing results by constructing the first quantum state obfuscation scheme for unitary (or approximately unitary) quantum programs supporting quantum inputs and outputs in the classical oracle model. At the core of our obfuscation scheme are two novel ingredients: a functional quantum authentication scheme that allows key holders to learn specific functions of the authenticated quantum state with simulation-based security, and a compiler that represents an arbitrary quantum circuit as a projective linear-plus-measurement quantum program described by a sequence of non-adaptive Clifford gates interleaved with adaptive and compatible measurements.

  • 2025-07-08 The Delta Learning Hypothesis: Preference Tuning on Weak Data can Yield Strong Gains. Scott Geng, Hamish Ivison, Chun-Liang Li, Maarten Sap, Jerry Li, Ranjay Krishna, and Pang Wei Koh.

    Summary: Improvements in language models are often driven by improving the quality of the data we train them on, which can be limiting when strong supervision is scarce. In this work, we show that paired preference data consisting of individually weak data points can enable gains beyond the strength of each individual data point. We formulate the delta learning hypothesis to explain this phenomenon, positing that the relative quality delta between points suffices to drive learning via preference tuning--even when supervised finetuning on the weak data hurts. We validate our hypothesis in controlled experiments and at scale, where we post-train 8B models on preference data generated by pairing a small 3B model's responses with outputs from an even smaller 1.5B model to create a meaningful delta. Strikingly, on a standard 11-benchmark evaluation suite (MATH, MMLU, etc.), our simple recipe matches the performance of Tulu 3, a state-of-the-art open model tuned from the same base model while relying on much stronger supervisors (e.g., GPT-4o). Thus, delta learning enables simpler and cheaper open recipes for state-of-the-art post-training. To better understand delta learning, we prove in logistic regression that the performance gap between two weak teacher models provides useful signal for improving a stronger student. Overall, our work shows that models can learn surprisingly well from paired data that might typically be considered weak.

  • 2025-06-30 Sampling and Identity-Testing Without Approximate Tensorization of Entropy. William Gay, William He, Nicholas Kocurek, and Ryan O'Donnell.

    Summary: Certain tasks in high-dimensional statistics become easier when the underlying distribution satisfies a local-to-global property called approximate tensorization of entropy (ATE). For example, the Glauber dynamics Markov chain of an ATE distribution mixes fast and can produce approximate samples in a small amount of time, since such a distribution satisfies a modified log-Sobolev inequality. Moreover, identity-testing for an ATE distribution requires few samples if the tester is given coordinate conditional access to the unknown distribution, as shown by Blanca, Chen, Štefankovič, and Vigoda (COLT 2023). A natural class of distributions that do not satisfy ATE consists of mixtures of (few) distributions that do satisfy ATE. We study the complexity of identity-testing and sampling for these distributions. Our main results are the following: 1. We show fast mixing of Glauber dynamics from a data-based initialization, with optimal sample complexity, for mixtures of distributions satisfying modified log-Sobolev inequalities. This extends work of Huang, Koehler, Lee, Mohanty, Rajaraman, Vuong, and Wu (STOC 2025, COLT 2025) for mixtures of distributions satisfying Poincaré inequalities. 2. Answering an open question posed by Blanca et al., we give efficient identity-testers for mixtures of ATE distributions in the coordinate-conditional sampling access model. We also give some simplifications and improvements to the original algorithm of Blanca et al.

  • 2025-06-28 MPC in the Quantum Head (or: Superposition-Secure (Quantum) Zero-Knowledge). Andrea Coladangelo, Ruta Jawale, Dakshita Khurana, Giulio Malavolta, and Hendrik Waldner.

    Summary: The MPC-in-the-head technique (Ishai et al., STOC 2007) is a celebrated method to build zero-knowledge protocols with desirable theoretical properties and high practical efficiency. This technique has generated a large body of research and has influenced the design of real-world post-quantum cryptographic signatures. In this work, we present a generalization of the MPC-in-the-head paradigm to the quantum setting, where the MPC is running a quantum computation. As an application of our framework, we propose a new approach to build zero-knowledge protocols where security holds even against a verifier that can obtain a superposition of transcripts. This notion was pioneered by Damgard et al., who built a zero-knowledge protocol for NP (in the common reference string model) secure against superposition attacks, by relying on perfectly hiding and unconditionally binding dual-mode commitments. Unfortunately, no such commitments are known from standard cryptographic assumptions. In this work we revisit this problem, and present two new three-round protocols in the common reference string model: (i) A zero-knowledge argument for NP, whose security reduces to the standard learning with errors (LWE) problem. (ii) A zero-knowledge argument for QMA from the same assumption.

  • 2025-06-20 A Common Pool of Privacy Problems: Legal and Technical Lessons from a Large-Scale Web-Scraped Machine Learning Dataset. Rachel Hong, Jevan Hutson, William Agnew, Imaad Huda, Tadayoshi Kohno, and Jamie Morgenstern.

    Summary: We investigate the contents of web-scraped data for training AI systems, at sizes where human dataset curators and compilers no longer manually annotate every sample. Building off of prior privacy concerns in machine learning models, we ask: What are the legal privacy implications of web-scraped machine learning datasets? In an empirical study of a popular training dataset, we find significant presence of personally identifiable information despite sanitization efforts. Our audit provides concrete evidence to support the concern that any large-scale web-scraped dataset may contain legally defined personal data. We use these findings of a real-world dataset to inform our legal analysis with respect to existing privacy and data protection laws. We surface various legal risks of current data curation practices that may propagate personal information to train downstream models. Based on our empirical and legal analyses, we argue for reorientation of current frameworks of "publicly available" information to meaningfully limit the development of AI built upon indiscriminate scraping of the internet.

  • 2025-06-05 Improved Regret Bounds for Linear Bandits with Heavy-Tailed Rewards. Artin Tajdini, Jonathan Scarlett, and Kevin Jamieson.

    Summary: We study stochastic linear bandits with heavy-tailed rewards, where the rewards have a finite $(1+ε)$-absolute central moment bounded by $\upsilon$ for some $ε\in (0,1]$. We improve both upper and lower bounds on the minimax regret compared to prior work. When $\upsilon = \mathcal{O}(1)$, the best prior known regret upper bound is $\tilde{\mathcal{O}}(d T^{\frac{1}{1+ε}})$. While a lower bound with the same scaling has been given, it relies on a construction using $\upsilon = \mathcal{O}(d)$, and adapting the construction to the bounded-moment regime with $\upsilon = \mathcal{O}(1)$ yields only a $Ω(d^{\fracε{1+ε}} T^{\frac{1}{1+ε}})$ lower bound. This matches the known rate for multi-armed bandits and is generally loose for linear bandits, in particular being $\sqrt{d}$ below the optimal rate in the finite-variance case ($ε= 1$). We propose a new elimination-based algorithm guided by experimental design, which achieves regret $\tilde{\mathcal{O}}(d^{\frac{1+3ε}{2(1+ε)}} T^{\frac{1}{1+ε}})$, thus improving the dependence on $d$ for all $ε\in (0,1)$ and recovering a known optimal result for $ε= 1$. We also establish a lower bound of $Ω(d^{\frac{2ε}{1+ε}} T^{\frac{1}{1+ε}})$, which strictly improves upon the multi-armed bandit rate and highlights the hardness of heavy-tailed linear bandit problems. For finite action sets, we derive similarly improved upper and lower bounds for regret. Finally, we provide action set dependent regret upper bounds showing that for some geometries, such as $l_p$-norm balls for $p \le 1 + ε$, we can further reduce the dependence on $d$, and we can handle infinite-dimensional settings via the kernel trick, in particular establishing new regret bounds for the Matérn kernel that are the first to be sublinear for all $ε\in (0, 1]$.

  • 2025-05-29 Learning to Incentivize in Repeated Principal-Agent Problems with Adversarial Agent Arrivals. Junyan Liu, Arnab Maiti, Artin Tajdini, Kevin Jamieson, and Lillian J. Ratliff.

    Summary: We initiate the study of a repeated principal-agent problem over a finite horizon $T$, where a principal sequentially interacts with $K\geq 2$ types of agents arriving in an adversarial order. At each round, the principal strategically chooses one of the $N$ arms to incentivize for an arriving agent of unknown type. The agent then chooses an arm based on its own utility and the provided incentive, and the principal receives a corresponding reward. The objective is to minimize regret against the best incentive in hindsight. Without prior knowledge of agent behavior, we show that the problem becomes intractable, leading to linear regret. We analyze two key settings where sublinear regret is achievable. In the first setting, the principal knows the arm each agent type would select greedily for any given incentive. Under this setting, we propose an algorithm that achieves a regret bound of $O(\min\{\sqrt{KT\log N},K\sqrt{T}\})$ and provide a matching lower bound up to a $\log K$ factor. In the second setting, an agent's response varies smoothly with the incentive and is governed by a Lipschitz constant $L\geq 1$. Under this setting, we show that there is an algorithm with a regret bound of $\tilde{O}((LN)^{1/3}T^{2/3})$ and establish a matching lower bound up to logarithmic factors. Finally, we extend our algorithmic results for both settings by allowing the principal to incentivize multiple arms simultaneously in each round.

  • 2025-05-24 Improved Regret and Contextual Linear Extension for Pandora's Box and Prophet Inequality. Junyan Liu, Ziyun Chen, Kun Wang, Haipeng Luo, and Lillian J. Ratliff.

    Summary: We study the Pandora's Box problem in an online learning setting with semi-bandit feedback. In each round, the learner sequentially pays to open up to $n$ boxes with unknown reward distributions, observes rewards upon opening, and decides when to stop. The utility of the learner is the maximum observed reward minus the cumulative cost of opened boxes, and the goal is to minimize regret defined as the gap between the cumulative expected utility and that of the optimal policy. We propose a new algorithm that achieves $\widetilde{O}(\sqrt{nT})$ regret after $T$ rounds, which improves the $\widetilde{O}(n\sqrt{T})$ bound of Agarwal et al. [2024] and matches the known lower bound up to logarithmic factors. To better capture real-life applications, we then extend our results to a natural but challenging contextual linear setting, where each box's expected reward is linear in some known but time-varying $d$-dimensional context and the noise distribution is fixed over time. We design an algorithm that learns both the linear function and the noise distributions, achieving $\widetilde{O}(nd\sqrt{T})$ regret. Finally, we show that our techniques also apply to the online Prophet Inequality problem, where the learner must decide immediately whether or not to accept a revealed reward. In both non-contextual and contextual settings, our approach achieves similar improvements and regret bounds.
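    The offline benchmark such regret bounds compare against is Weitzman's classical reservation-value ("index") policy: open boxes in decreasing order of the index $σ_i$ solving $\mathbb{E}[(X_i - σ_i)^+] = c_i$, and stop once the best reward seen exceeds every remaining index. A minimal sketch for discrete reward distributions, with illustrative box parameters (this is the classical policy, not the paper's learning algorithm):

    ```python
    def reservation_value(values, probs, cost, lo=0.0, hi=1e6, iters=60):
        """Solve E[(X - sigma)^+] = cost by bisection (Weitzman's index).
        Assumes a discrete reward distribution and a nonnegative index."""
        def surplus(sigma):
            return sum(p * max(v - sigma, 0.0) for v, p in zip(values, probs))
        for _ in range(iters):
            mid = (lo + hi) / 2
            lo, hi = (mid, hi) if surplus(mid) > cost else (lo, mid)
        return (lo + hi) / 2

    def weitzman(boxes, draw):
        """boxes: list of (values, probs, cost); draw(i) samples box i's reward.
        Open in decreasing index order; stop when the best reward seen
        already beats the next box's index. Returns realized utility."""
        idx = [reservation_value(v, p, c) for v, p, c in boxes]
        order = sorted(range(len(boxes)), key=lambda i: -idx[i])
        best, spent = 0.0, 0.0
        for i in order:
            if best >= idx[i]:
                break                      # no remaining box is worth its cost
            spent += boxes[i][2]
            best = max(best, draw(i))
        return best - spent

    # two deterministic toy boxes: reward 10 at cost 1, reward 5 at cost 1
    boxes = [([10.0], [1.0], 1.0), ([5.0], [1.0], 1.0)]
    util = weitzman(boxes, draw=lambda i: boxes[i][0][0])
    ```
    
    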

  • 2025-05-20 Subquadratic Algorithms and Hardness for Attention with Any Temperature. Shreya Gupta, Boyang Huang, Barna Saha, Yinzhan Xu, and Christopher Ye.

    Summary: Despite the popularity of the Transformer architecture, the standard algorithm for computing Attention suffers from quadratic time complexity in context length $n$. Alman and Song [NeurIPS 2023] showed that when the head dimension $d = Θ(\log n)$, subquadratic Attention is possible if and only if the inputs have small entries bounded by $B = o(\sqrt{\log n})$ in absolute value, under the Strong Exponential Time Hypothesis ($\mathsf{SETH}$). Equivalently, subquadratic Attention is possible if and only if the softmax is applied with high temperature for $d=Θ(\log n)$. The running times of these algorithms depend exponentially on $B$, and thus they do not yield even a polynomial-time algorithm outside this specific range of $B$. This naturally leads to the question: when can Attention be computed efficiently without strong assumptions on temperature? Are there fast Attention algorithms that scale polylogarithmically with entry size $B$? In this work, we resolve this question and characterize when fast Attention for arbitrary temperatures is possible. First, for all constant $d = O(1)$, we give the first subquadratic $\tilde{O}(n^{2 - 1/d} \cdot \mathrm{polylog}(B))$ time algorithm for Attention with large $B$. Our result holds even for matrices with large head dimension if they have low rank. In this regime, we also give a similar running time for Attention gradient computation, and therefore for the full LLM training process. Furthermore, we show that any substantial improvement on our algorithm is unlikely. In particular, we show that even when $d = 2^{Θ(\log^* n)}$, Attention requires $n^{2 - o(1)}$ time under $\mathsf{SETH}$. Finally, in the regime where $d = \mathrm{poly}(n)$, we show that the standard algorithm is optimal under popular fine-grained complexity assumptions.
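For concreteness, the quadratic-time baseline these results concern is standard softmax Attention, which materializes the full $n \times n$ score matrix $QK^\top/\sqrt{d}$ at $\Theta(n^2 d)$ cost. A toy NumPy sketch (the paper's subquadratic algorithms are not reproduced here):

```python
import numpy as np

def attention(Q, K, V):
    """Standard (quadratic-time) softmax attention: softmax(QK^T / sqrt(d)) V."""
    d = Q.shape[1]
    scores = Q @ K.T / np.sqrt(d)                  # n x n matrix: the bottleneck
    scores -= scores.max(axis=1, keepdims=True)    # shift for numerical stability
    P = np.exp(scores)
    P /= P.sum(axis=1, keepdims=True)              # row-stochastic attention weights
    return P @ V

rng = np.random.default_rng(0)
n, d = 8, 4
Q, K, V = (rng.standard_normal((n, d)) for _ in range(3))
out = attention(Q, K, V)   # shape (n, d); each row is a convex combination of V's rows
```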

  • 2025-04-25 Pseudo-Asynchronous Local SGD: Robust and Efficient Data-Parallel Training. Hiroki Naganuma, Xinzhi Zhang, Man-Chung Yue, Ioannis Mitliagkas, Philipp A. Witte, Russell J. Hewett, and Yin Tat Lee.

    Summary: Following AI scaling trends, frontier models continue to grow in size and continue to be trained on larger datasets. Training these models requires huge investments in exascale computational resources, which has in turn driven the development of distributed deep learning methods. Data parallelism is an essential approach to speed up training, but it requires frequent global communication between workers, which can bottleneck training at the largest scales. In this work, we propose a method called Pseudo-Asynchronous Local SGD (PALSGD) to improve the efficiency of data-parallel training. PALSGD is an extension of Local SGD (Stich, 2018) and DiLoCo (Douillard et al., 2023), designed to further reduce communication frequency by introducing a pseudo-synchronization mechanism. PALSGD allows the use of longer synchronization intervals compared to standard Local SGD. Despite the reduced communication frequency, the pseudo-synchronization approach ensures that model consistency is maintained, leading to performance results comparable to those achieved with more frequent synchronization. Furthermore, we provide a theoretical analysis of PALSGD, establishing its convergence and deriving its convergence rate. This analysis offers insights into the algorithm's behavior and performance guarantees. We evaluated PALSGD on image classification and language modeling tasks. Our results show that PALSGD achieves better performance in less time compared to existing methods like Distributed Data Parallel (DDP), and DiLoCo. Notably, PALSGD trains 18.4% faster than DDP on ImageNet-1K with ResNet-50, 24.4% faster than DDP on TinyStories with GPT-Neo-125M, and 21.1% faster than DDP on TinyStories with GPT-Neo-8M.
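As context for the communication trade-off, plain Local SGD, the baseline that PALSGD extends, can be sketched on a toy quadratic objective: each worker takes several local gradient steps between global synchronizations, at which point models are averaged. All hyperparameters here are illustrative, and PALSGD's pseudo-synchronization mechanism itself is not reproduced.

```python
import random

def local_sgd(workers=4, sync_every=5, rounds=10, lr=0.1):
    """Local SGD on f(x) = (x - 3)^2 with noisy gradients per worker."""
    models = [0.0] * workers
    rng = random.Random(0)
    for _ in range(rounds):
        for _ in range(sync_every):          # local steps: no communication
            for w in range(workers):
                grad = 2 * (models[w] - 3.0) + rng.gauss(0, 0.1)
                models[w] -= lr * grad
        avg = sum(models) / workers          # one global synchronization
        models = [avg] * workers
    return models[0]

x = local_sgd()   # converges near the optimum x* = 3 despite infrequent syncs
```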

  • 2025-04-21 Price Stability and Improved Buyer Utility with Presentation Design: A Theoretical Study of the Amazon Buy Box. Ophir Friedler, Hu Fu, Anna Karlin, and Ariana Tang.

    Summary: Platforms design the presentation through which sellers are shown to buyers. This design not only shapes the buyers' experience but also leads to different market equilibria or dynamics. One component of this design is the platform's mediation of the search frictions experienced by the buyers for different sellers. We take a model of monopolistic competition and show that, on one hand, when all sellers have the same inspection costs, the market sees no stable price, since the sellers always have incentives to undercut each other; on the other hand, the platform may stabilize the price by giving prominence to one seller chosen by a carefully designed mechanism. This calls to mind Amazon's Buy Box. We study natural mechanisms for choosing the prominent seller, characterize the range of equilibrium prices implementable by them, and find that in certain scenarios the buyers' surplus improves as the search friction increases.

  • 2025-03-24 Training-Free Personalization via Retrieval and Reasoning on Fingerprints. Deepayan Das, Davide Talon, Yiming Wang, Massimiliano Mancini, and Elisa Ricci.

    Summary: Vision Language Models (VLMs) have led to major improvements in multimodal reasoning, yet they still struggle to understand user-specific concepts. Existing personalization methods address this limitation but heavily rely on training procedures that can be either costly or inconvenient for individual users. We depart from existing work and, for the first time, explore the training-free setting in the context of personalization. We propose a novel method, Retrieval and Reasoning for Personalization (R2P), leveraging the internal knowledge of VLMs. First, we leverage VLMs to extract the concept fingerprint, i.e., key attributes uniquely defining the concept within its semantic class. When a query arrives, the most similar fingerprints are retrieved and scored via chain-of-thought reasoning. To reduce the risk of hallucinations, the scores are validated through cross-modal verification at the attribute level: in case of a discrepancy between the scores, R2P refines the concept association via pairwise multimodal matching, where the retrieved fingerprints and their images are directly compared with the query. We validate R2P on two publicly available benchmarks and a newly introduced dataset, Personal Concepts with Visual Ambiguity (PerVA), for concept identification, which highlights the challenges of visual ambiguity. R2P consistently outperforms state-of-the-art approaches on various downstream tasks across all benchmarks. Code will be available upon acceptance.

  • 2025-03-02 Optimal Trickle-Down Theorems for Path Complexes via C-Lorentzian Polynomials with Applications to Sampling and Log-Concave Sequences. Jonathan Leake, Kasper Lindberg, and Shayan Oveis Gharan.

    Summary: Let $X$ be a $d$-partite $d$-dimensional simplicial complex with parts $T_1,\dots,T_d$ and let $μ$ be a distribution on the facets of $X$. Informally, we say $(X,μ)$ is a path complex if for any $i$

  • 2025-02-24 S4S: Solving for a Diffusion Model Solver. Eric Frankel, Sitan Chen, Jerry Li, Pang Wei Koh, Lillian J. Ratliff, and Sewoong Oh.

    Summary: Diffusion models (DMs) create samples from a data distribution by starting from random noise and iteratively solving a reverse-time ordinary differential equation (ODE). Because each step in the iterative solution requires an expensive neural function evaluation (NFE), there has been significant interest in approximately solving these diffusion ODEs with only a few NFEs without modifying the underlying model. However, in the few NFE regime, we observe that tracking the true ODE evolution is fundamentally impossible using traditional ODE solvers. In this work, we propose a new method that learns a good solver for the DM, which we call Solving for the Solver (S4S). S4S directly optimizes a solver to obtain good generation quality by learning to match the output of a strong teacher solver. We evaluate S4S on six different pre-trained DMs, including pixel-space and latent-space DMs for both conditional and unconditional sampling. In all settings, S4S uniformly improves the sample quality relative to traditional ODE solvers. Moreover, our method is lightweight, data-free, and can be plugged in as a black box on top of any discretization schedule or architecture to improve performance. Building on top of this, we also propose S4S-Alt, which optimizes both the solver and the discretization schedule. By exploiting the full design space of DM solvers, with 5 NFEs, we achieve an FID of 3.73 on CIFAR10 and 13.26 on MS-COCO, representing a $1.5\times$ improvement over previous training-free ODE methods.
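The few-NFE difficulty can be seen in miniature: with only a handful of steps, a traditional solver such as explicit Euler visibly deviates from the true trajectory. The toy ODE $dx/dt = -x$ below (exact solution $x(t) = x_0 e^{-t}$) stands in for the diffusion ODE, where each step would cost one NFE; it is our illustration, not the paper's setup.

```python
import math

def euler_solve(x0, t_end, nfe):
    """Explicit Euler with `nfe` steps (one function evaluation per step)."""
    x, h = x0, t_end / nfe
    for _ in range(nfe):
        x += h * (-x)   # f(x) = -x plays the role of the learned drift
    return x

x0, t_end = 1.0, 2.0
exact = x0 * math.exp(-t_end)
err_few  = abs(euler_solve(x0, t_end, 5)   - exact)  # few NFEs: large error
err_many = abs(euler_solve(x0, t_end, 500) - exact)  # many NFEs: small error
```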

  • 2025-02-11 Pseudorandomness Properties of Random Reversible Circuits. William Gay, William He, Nicholas Kocurek, and Ryan O'Donnell.

    Summary: Motivated by practical concerns in cryptography, we study pseudorandomness properties of permutations on $\{0,1\}^n$ computed by random circuits made from reversible $3$-bit gates (permutations on $\{0,1\}^3$). Our main result is that a random circuit of depth $\sqrt{n} \cdot \tilde{O}(k^3)$, with each layer consisting of $Θ(n)$ random gates in a fixed two-dimensional nearest-neighbor architecture, yields approximate $k$-wise independent permutations. Our result can be seen as a particularly simple/practical block cipher construction that gives provable statistical security against attackers with access to $k$~input-output pairs within a few rounds. The main technical component of our proof consists of two parts: 1. We show that the Markov chain on $k$-tuples of $n$-bit strings induced by a single random $3$-bit one-dimensional nearest-neighbor gate has spectral gap at least $1/n \cdot \tilde{O}(k)$. Then we infer that a random circuit with layers of random gates in a fixed one-dimensional gate architecture yields approximate $k$-wise independent permutations of $\{0,1\}^n$ in depth $n\cdot \tilde{O}(k^2)$. 2. We show that if the $n$ wires are laid out on a two-dimensional lattice of bits, then repeatedly alternating applications of approximate $k$-wise independent permutations of $\{0,1\}^{\sqrt n}$ to the rows and columns of the lattice yields an approximate $k$-wise independent permutation of $\{0,1\}^n$ in small depth. Our work improves on the original work of Gowers, who showed a gap of $1/\mathrm{poly}(n,k)$ for one random gate (with non-neighboring inputs); and, on subsequent work improving the gap to $Ω(1/n^2k)$ in the same setting.
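The circuit model itself is easy to sketch: each gate is a random permutation of $\{0,1\}^3$ applied to three neighboring wires, and composing layers of such gates gives a permutation of $\{0,1\}^n$. The sketch below places single gates on random consecutive triples, a simplification of the paper's fixed nearest-neighbor layered architectures, and merely verifies reversibility on a small $n$.

```python
import itertools, random

def apply_gate(x, wires, gate):
    """Apply a permutation `gate` of {0,...,7} to the 3 bits of x at `wires`."""
    bits = list(x)
    val = bits[wires[0]] << 2 | bits[wires[1]] << 1 | bits[wires[2]]
    out = gate[val]
    bits[wires[0]], bits[wires[1]], bits[wires[2]] = out >> 2 & 1, out >> 1 & 1, out & 1
    return tuple(bits)

def random_circuit(n, depth, rng):
    """Compose `depth` random 3-bit gates on random consecutive wire triples."""
    layers = []
    for _ in range(depth):
        gate = list(range(8))
        rng.shuffle(gate)                       # a uniform permutation of {0,1}^3
        start = rng.randrange(n - 2)
        layers.append(((start, start + 1, start + 2), gate))
    def run(x):
        for wires, gate in layers:
            x = apply_gate(x, wires, gate)
        return x
    return run

rng = random.Random(1)
n = 5
circ = random_circuit(n, depth=10, rng=rng)
# A reversible circuit permutes {0,1}^n: all 2^n outputs are distinct.
images = {circ(x) for x in itertools.product((0, 1), repeat=n)}
```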

  • 2025-02-10 On the Emergence of Thinking in LLMs I: Searching for the Right Intuition. Guanghao Ye, Khiem Duc Pham, Xinzhi Zhang, Sivakanth Gopi, Baolin Peng, Beibin Li, Janardhan Kulkarni, and Huseyin A. Inan.

    Summary: Recent AI advancements, such as OpenAI's new models, are transforming LLMs into LRMs (Large Reasoning Models) that perform reasoning during inference, taking extra time and compute for higher-quality outputs. We aim to uncover the algorithmic framework for training LRMs. Methods like self-consistency, PRM, and AlphaZero suggest reasoning as guided search. We ask: what is the simplest, most scalable way to enable search in LLMs? We propose a post-training framework called Reinforcement Learning via Self-Play (RLSP). RLSP involves three steps: (1) supervised fine-tuning with human or synthetic demonstrations of the reasoning process, (2) using an exploration reward signal to encourage diverse and efficient reasoning behaviors, and (3) RL training with an outcome verifier to ensure correctness while preventing reward hacking. Our key innovation is to decouple exploration and correctness signals during PPO training, carefully balancing them to improve performance and efficiency. Empirical studies in the math domain show that RLSP improves reasoning. On the Llama-3.1-8B-Instruct model, RLSP can boost performance by 23% on the MATH-500 test set; on AIME 2024 math problems, Qwen2.5-32B-Instruct improved by 10% due to RLSP. However, a more important finding of this work is that the models trained using RLSP, even with the simplest exploration reward that encourages the model to take more intermediate steps, showed several emergent behaviors such as backtracking, exploration of ideas, and verification. These findings demonstrate that the RLSP framework might be enough to enable the emergence of complex reasoning abilities in LLMs when scaled. Lastly, we propose a theory as to why the RLSP search strategy is more suitable for LLMs, inspired by a remarkable result showing that CoT provably increases the computational power of LLMs, which grows with the number of steps in CoT \cite{li2024chain,merrill2023expresssive}.
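One way to read the decoupling in steps (2) and (3) is as a composite reward: a verifier-based correctness signal plus a separate, capped exploration bonus. The sketch below is a heavily simplified illustration with invented constants; it is not the paper's reward function or its PPO setup.

```python
def rlsp_style_reward(is_correct, num_steps, alpha=0.01, max_bonus=0.2):
    """Illustrative composite reward: verifier correctness + capped exploration bonus.

    The exploration term (here, crudely, rewarding more intermediate steps)
    is kept separate from, and much smaller than, the correctness term, so
    correctness dominates and the bonus cannot be "hacked" indefinitely.
    """
    correctness = 1.0 if is_correct else 0.0
    exploration = min(alpha * num_steps, max_bonus)
    return correctness + exploration

r_short_wrong = rlsp_style_reward(False, 2)    # little signal either way
r_long_wrong  = rlsp_style_reward(False, 30)   # exploration rewarded, but capped
r_correct     = rlsp_style_reward(True, 12)    # correctness dominates
```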

  • 2025-02-05 Unweighted One-Sided Code Sparsifiers and Thin Subgraphs. Shayan Oveis Gharan and Arvin Sahami.

    Summary: For a linear code $\mathcal{C} \subseteq \mathbb{F}_2^n$ and $α\in [0,1]$, call a set $S \subseteq [n]$ an (unweighted) one-sided $α$-sparsifier of $\mathcal{C}$ if for all $c \in \mathcal{C}$, $\mathrm{wt}(c_S)\geq α\cdot \mathrm{wt}(c)$, where $c_S$ is the projection of $c$ onto the coordinates in $S$ and $\mathrm{wt}(c)$ is the Hamming weight of $c$. \\ We show that every $k$-dimensional linear code $\mathcal{C}\subseteq \mathbb{F}_2^n$ has at least $2^{n - k}$ many unweighted one-sided $1/2$-sparsifiers and hence one of size at most $n/2 + O(\sqrt{n k})$. As an application, letting $\mathcal{C} \subseteq \mathbb{F}_2^E$ denote the cut-space of a graph $G=(V, E)$, we show a lower bound of $2^{\lvert E \rvert- (\lvert V \rvert - 1)}$ on the number of $1/2$-thin subgraphs of $G$ and the existence of a $1/2$-thin subgraph with at least $\lvert E \rvert /2-O(\sqrt{\lvert E \rvert \cdot \lvert V \rvert})$ edges. In contrast to previous results on thin subgraphs, our proofs are purely "combinatorial".
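The defining property can be checked directly on a toy code: $S$ is a one-sided $α$-sparsifier if every codeword keeps at least an $α$ fraction of its Hamming weight on the coordinates in $S$. A small brute-force illustration (the paper's counting argument is not reproduced):

```python
import itertools

def codewords(generators, n):
    """All F_2-linear combinations (XORs) of the generator vectors."""
    words = set()
    for coeffs in itertools.product((0, 1), repeat=len(generators)):
        w = [0] * n
        for c, g in zip(coeffs, generators):
            if c:
                w = [a ^ b for a, b in zip(w, g)]
        words.add(tuple(w))
    return words

def is_one_sided_sparsifier(S, gens, n, alpha=0.5):
    """Check wt(c_S) >= alpha * wt(c) for every codeword c."""
    for c in codewords(gens, n):
        if sum(c[i] for i in S) < alpha * sum(c):
            return False
    return True

# A 2-dimensional code in F_2^4; S = {0, 2} keeps half of every codeword's weight.
gens = [[1, 1, 0, 0], [0, 0, 1, 1]]
ok = is_one_sided_sparsifier({0, 2}, gens, 4)
```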

  • 2025-01-31 Markovian Pandora's box. Yuanyuan Yang, Ruimin Zhang, Jamie Morgenstern, and Haifeng Xu.

    Summary: In this paper, we study the Markovian Pandora's Box Problem, where decisions are governed by both order constraints and Markovian-correlated rewards, structured within a shared directed acyclic graph. To the best of our knowledge, previous work has not incorporated Markovian dependencies in this setting. This framework is particularly relevant to applications such as data- or computation-driven algorithm design, where exploration of future models incurs cost. We present optimal fully adaptive strategies when the associated graph forms a forest. Under static transitions, we introduce a strategy that achieves a near-optimal expected payoff on multi-line graphs and a $1/2$-approximation on forest-structured graphs. Notably, this algorithm provides a significant speedup over the exact solution, with the improvement becoming more pronounced as the graph size increases. Our findings deepen the understanding of sequential exploration under Markovian correlations in graph-based decision-making.

  • 2025-01-11 DiscQuant: A Quantization Method for Neural Networks Inspired by Discrepancy Theory. Jerry Chee, Arturs Backurs, Rainie Heck, Li Zhang, Janardhan Kulkarni, Thomas Rothvoss, and Sivakanth Gopi.

    Summary: Quantizing the weights of a neural network has two steps: (1) Finding a good low bit-complexity representation for weights (which we call the quantization grid) and (2) Rounding the original weights to values in the quantization grid. In this paper, we study the problem of rounding optimally given any quantization grid. The simplest and most commonly used way to round is Round-to-Nearest (RTN). By rounding in a data-dependent way instead, one can improve the quality of the quantized model significantly. We study the rounding problem from the lens of \emph{discrepancy theory}, which studies how well we can round a continuous solution to a discrete solution without affecting solution quality too much. We prove that given $m=\mathrm{poly}(1/ε)$ samples from the data distribution, we can round all but $O(m)$ model weights such that the expected approximation error of the quantized model on the true data distribution is $\le ε$ as long as the space of gradients of the original model is approximately low rank (which we empirically validate). Our proof, which is algorithmic, inspired a simple and practical rounding algorithm called \emph{DiscQuant}. In our experiments, we demonstrate that DiscQuant significantly improves over the prior state-of-the-art rounding method called GPTQ and the baseline RTN over a range of benchmarks on Phi3mini-3.8B and Llama3.1-8B. For example, rounding Phi3mini-3.8B to a fixed quantization grid with 3.25 bits per parameter using DiscQuant gets 64\% accuracy on the GSM8k dataset, whereas GPTQ achieves 54\% and RTN achieves 31\% (the original model achieves 84\%). We make our code available at https://github.com/jerry-chee/DiscQuant.
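The Round-to-Nearest baseline that DiscQuant improves on is simple to state in code: map each weight to its closest value in the quantization grid, independently of the data. (DiscQuant's discrepancy-based, data-dependent rounding is the paper's contribution and is not reproduced here.)

```python
def round_to_nearest(weights, grid):
    """RTN: map each weight to its closest grid value (ties go to the smaller value)."""
    grid = sorted(grid)
    return [min(grid, key=lambda g: abs(g - w)) for w in weights]

# A uniform low-bit grid (illustrative; real grids come from step 1 of quantization).
grid = [-1.0, -0.5, 0.0, 0.5, 1.0]
q = round_to_nearest([0.34, -0.6, 0.9], grid)
```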

  • 2025-01-04 A parameterized linear formulation of the integer hull. Friedrich Eisenbrand and Thomas Rothvoss.

    Summary: Let $A \in \mathbb{Z}^{m \times n}$ be an integer matrix with components bounded by $Δ$ in absolute value. Cook et al.~(1986) have shown that there exists a universal matrix $B \in \mathbb{Z}^{m' \times n}$ with the following property: For each $b \in \mathbb{Z}^m$, there exists $t \in \mathbb{Z}^{m'}$ such that the integer hull of the polyhedron $P = \{ x \in \mathbb{R}^n \colon Ax \leq b\}$ is described by $P_I = \{ x \in \mathbb{R}^n \colon Bx \leq t\}$. Our \emph{main result} is that $t$ is an \emph{affine} function of $b$ as long as $b$ is from a fixed equivalence class of the lattice $D \cdot \mathbb{Z}^m$. Here $D \in \mathbb{N}$ is a number that depends on $n$ and $Δ$ only. Furthermore, $D$ as well as the matrix $B$ can be computed in time depending on $Δ$ and $n$ only. An application of this result is the solution of an open problem posed by Cslovjecsek et al.~(SODA 2024) concerning the complexity of \emph{2-stage-stochastic integer programming} problems. The main tool of our proof is the classical theory of \emph{Chvátal-Gomory cutting planes} and the \emph{elementary closure} of rational polyhedra.
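As a one-variable illustration of the Chvátal-Gomory machinery the proof relies on (our toy example, not from the paper): any valid inequality whose left-hand side takes integer values on integer points can have its right-hand side rounded down without cutting off any integer solution.

```latex
% From the fractional bound 2x <= 3, a Chvátal-Gomory cut yields x <= 1:
\[
  2x \le 3
  \;\Longrightarrow\; x \le \tfrac{3}{2}
  \;\Longrightarrow\; x \le \left\lfloor \tfrac{3}{2} \right\rfloor = 1
  \qquad \text{for all } x \in \mathbb{Z}.
\]
```

Iterating such cuts over all valid inequalities produces the elementary closure; the paper's result controls how the resulting description $P_I = \{x : Bx \le t\}$ depends on the right-hand side $b$.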