
2023-06-02 An Empirical Study on Challenging Math Problem Solving with GPT-4.
Summary: Employing Large Language Models (LLMs) to address mathematical problems is an intriguing research endeavor, considering the abundance of math problems expressed in natural language across numerous science and engineering fields. While several prior works have investigated solving elementary mathematics using LLMs, this work explores the frontier of using GPT-4 for solving more complex and challenging math problems. We evaluate various ways of using GPT-4. Some of them are adapted from existing work, and one is MathChat, a conversational problem-solving framework newly proposed in this work. We perform the evaluation on difficult high school competition problems from the MATH dataset, which shows the advantage of the proposed conversational approach.
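The core loop of a conversational solver of this kind can be pictured with a short sketch. This is our own minimal illustration, not the MathChat implementation: `query_llm` is a hypothetical wrapper around a chat LLM API, and the loop simply executes any Python the model emits and feeds the output back.

```python
# Minimal sketch of a conversational math-solving loop (illustrative only).
# `query_llm(messages) -> str` is an assumed wrapper around a chat LLM API.
import re
import subprocess
import sys

def solve(problem, query_llm, max_turns=5):
    messages = [{"role": "user", "content": problem}]
    fence = "`" * 3  # code-block delimiter used in the model's replies
    for _ in range(max_turns):
        reply = query_llm(messages)
        messages.append({"role": "assistant", "content": reply})
        code = re.search(fence + r"python\n(.*?)" + fence, reply, re.DOTALL)
        if code is None:  # no code to run: treat the reply as the final answer
            return reply
        # Run the model's code and feed its output back into the conversation.
        run = subprocess.run([sys.executable, "-c", code.group(1)],
                             capture_output=True, text=True, timeout=30)
        messages.append({"role": "user", "content": run.stdout or run.stderr})
    return messages[-1]["content"]
```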

2023-05-31 Doubly Constrained Fair Clustering.
Summary: The remarkable attention which fair clustering has received in the last few years has resulted in a significant number of different notions of fairness. Despite the fact that these notions are well-justified, they are often motivated and studied in a disjoint manner where one fairness desideratum is considered exclusively in isolation from the others. This leaves the understanding of the relations between different fairness notions as an important open problem in fair clustering. In this paper, we take the first step in this direction. Specifically, we consider the two most prominent demographic representation fairness notions in clustering: (1) Group Fairness (GF), where the different demographic groups are supposed to have close to population-level representation in each cluster, and (2) Diversity in Center Selection (DS), where the selected centers are supposed to have close to population-level representation of each group. We show that given a constant approximation algorithm for one constraint (GF or DS only), we can obtain a constant approximation solution that satisfies both constraints simultaneously. Interestingly, we prove that any given solution that satisfies the GF constraint can always be post-processed, at a bounded degradation to the clustering cost, to additionally satisfy the DS constraint, while the reverse is not true. Furthermore, we show that both GF and DS are incompatible (having an empty feasibility set in the worst case) with a collection of other distance-based fairness notions. Finally, we carry out experiments to validate our theoretical findings.

2023-05-25 Learning across Data Owners with Joint Differential Privacy.
Summary: In this paper, we study the setting in which data owners train machine learning models collaboratively under a privacy notion called joint differential privacy [Kearns et al., 2018]. In this setting, the model trained for each data owner $j$ uses $j$'s data without privacy consideration and other owners' data with differential privacy guarantees. This setting was initiated in [Jain et al., 2021] with a focus on linear regression. In this paper, we study this setting for stochastic convex optimization (SCO). We present an algorithm that is a variant of DP-SGD [Song et al., 2013; Abadi et al., 2016] and provide theoretical bounds on its population loss. We compare our algorithm to several baselines and discuss for which parameter setups our algorithm is preferable. We also empirically study joint differential privacy in the multiclass classification problem over two public datasets. Our empirical findings are consistent with the insights from our theoretical results.

2023-05-15 Sparsifying Sums of Norms.
Summary: For any norms $N_1,\ldots,N_m$ on $\mathbb{R}^n$ and $N(x) := N_1(x)+\cdots+N_m(x)$, we show there is a sparsified norm $\tilde{N}(x) = w_1 N_1(x) + \cdots + w_m N_m(x)$ such that $|N(x) - \tilde{N}(x)| \leq \epsilon N(x)$ for all $x \in \mathbb{R}^n$, where $w_1,\ldots,w_m$ are nonnegative weights, of which only $O(\epsilon^{-2} n \log(n/\epsilon) (\log n)^{2.5})$ are nonzero. Additionally, we show that such weights can be found with high probability in time $O(m (\log n)^{O(1)} + \mathrm{poly}(n))\, T$, where $T$ is the time required to evaluate a norm $N_i(x)$, assuming that $N(x)$ is $\mathrm{poly}(n)$-equivalent to the Euclidean norm. This immediately yields analogous statements for sparsifying sums of symmetric submodular functions. More generally, we show how to sparsify sums of $p$-th powers of norms when the sum is $p$-uniformly smooth.

2023-05-15 Linear-Sized Sparsifiers via Near-Linear Time Discrepancy Theory.
Summary: Discrepancy theory provides powerful tools for producing higher-quality objects which "beat the union bound" in fundamental settings throughout combinatorics and computer science. However, this quality has often come at the price of more expensive algorithms. We introduce a new framework for bridging this gap, by allowing for the efficient implementation of discrepancy-theoretic primitives. Our framework repeatedly solves regularized optimization problems to low accuracy to approximate the partial coloring method of [Rot17], and simplifies and generalizes recent work of [JSS23] on fast algorithms for Spencer's theorem. In particular, our framework only requires that the discrepancy body of interest has exponentially large Gaussian measure and is expressible as a sublevel set of a symmetric, convex function. We combine this framework with new tools for proving Gaussian measure lower bounds to give improved algorithms for a variety of sparsification and coloring problems. As a first application, we use our framework to obtain an $\widetilde{O}(m \cdot \epsilon^{-3.5})$ time algorithm for constructing an $\epsilon$-approximate spectral sparsifier of an $m$-edge graph, matching the sparsity of [BSS14] up to constant factors and improving upon the $\widetilde{O}(m \cdot \epsilon^{-6.5})$ runtime of [LeeS17]. We further give a state-of-the-art algorithm for constructing graph ultrasparsifiers and an almost-linear time algorithm for constructing linear-sized degree-preserving sparsifiers via discrepancy theory; in the latter case, such sparsifiers were not known to exist previously. We generalize these results to their analogs in sparsifying isotropic sums of positive semidefinite matrices. Finally, to demonstrate the versatility of our technique, we obtain a nearly-input-sparsity time constructive algorithm for Spencer's theorem (where we recover a recent result of [JSS23]).

2023-05-05 On Optimization and Counting of Non-Broken Bases of Matroids.
Summary: Given a matroid $M=(E,{\cal I})$ and a total ordering over the elements $E$, a broken circuit is a circuit with its smallest element removed, and an NBC independent set is an independent set in ${\cal I}$ containing no broken circuit. The NBC independent sets of any matroid $M$ define a simplicial complex called the broken circuit complex, which has been the subject of intense study in combinatorics. Recently, Adiprasito, Huh and Katz showed that the face numbers of any broken circuit complex form a log-concave sequence, proving a longstanding conjecture of Rota. We study counting and optimization problems on NBC bases of a generic matroid. We find several fundamental differences with the independent set complex: for example, we show that it is NP-hard to find the max-weight NBC base of a matroid, and that the convex hull of NBC bases of a matroid can have edges of arbitrarily large length. We also give evidence that the natural down-up walk on the space of NBC bases of a matroid may not mix rapidly, by showing that for some family of matroids it is NP-hard to count the number of NBC bases after certain conditionings.

2023-05-04 Automatic Prompt Optimization with "Gradient Descent" and Beam Search.
Summary: Large Language Models (LLMs) have shown impressive performance as general-purpose agents, but their abilities remain highly dependent on prompts which are handwritten with onerous trial-and-error effort. We propose a simple and nonparametric solution to this problem, Automatic Prompt Optimization (APO), which is inspired by numerical gradient descent to automatically improve prompts, assuming access to training data and an LLM API. The algorithm uses minibatches of data to form natural language ``gradients'' that criticize the current prompt. The gradients are then ``propagated'' into the prompt by editing the prompt in the opposite semantic direction of the gradient. These gradient descent steps are guided by a beam search and bandit selection procedure which significantly improves algorithmic efficiency. Preliminary results across three benchmark NLP tasks and the novel problem of LLM jailbreak detection suggest that Automatic Prompt Optimization can outperform prior prompt editing techniques and improve an initial prompt's performance by up to 31\%, by using data to rewrite vague task descriptions into more precise annotation instructions.
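One round of this loop is easy to picture in code. The sketch below is our own schematic reading of the abstract, not the authors' implementation: `llm` is an assumed text-in/text-out callable and `score` an assumed prompt evaluator.

```python
# Schematic of one APO round: textual "gradients" followed by beam selection.
import random

def apo_round(prompts, data, llm, score, beam=4, edits=2):
    candidates = set(prompts)
    for prompt in prompts:
        batch = random.sample(data, k=min(8, len(data)))
        errors = [(x, y) for x, y in batch if llm(prompt + "\n" + x) != y]
        if not errors:
            continue
        # "Gradient": a natural-language critique of the current prompt.
        critique = llm(f"Prompt: {prompt}\nErrors: {errors}\n"
                       "Explain why this prompt fails on these examples.")
        # "Propagation": edit the prompt in the opposite semantic direction.
        for _ in range(edits):
            candidates.add(llm(f"Prompt: {prompt}\nCritique: {critique}\n"
                               "Rewrite the prompt to address the critique."))
    # Beam (or bandit) selection: keep only the best-scoring candidates.
    return sorted(candidates, key=lambda q: score(q, data), reverse=True)[:beam]
```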

2023-05-04 A Hyperbolic Extension of Kadison-Singer Type Results.
Summary: In 2013, Marcus, Spielman, and Srivastava resolved the famous Kadison-Singer conjecture. It states that for $n$ independent random vectors $v_1,\cdots, v_n$ that have expected squared norm bounded by $\epsilon$ and are in isotropic position in expectation, there is a positive probability that the determinant polynomial $\det(xI - \sum_{i=1}^n v_iv_i^\top)$ has roots bounded by $(1 + \sqrt{\epsilon})^2$. An interpretation of the Kadison-Singer theorem is that we can always find a partition of the vectors $v_1,\cdots,v_n$ into two sets with a low discrepancy in terms of the spectral norm (in other words, relying on the determinant polynomial). In this paper, we provide two results for a broader class of polynomials, the hyperbolic polynomials. Furthermore, our results are in two generalized settings: $\bullet$ The first one shows that the Kadison-Singer result requires a weaker assumption, namely that the vectors have a bounded sum of hyperbolic norms. $\bullet$ The second one relaxes the Kadison-Singer result's distribution assumption to Strongly Rayleigh distributions. To the best of our knowledge, the previous results only support determinant polynomials [Anari and Oveis Gharan'14, Kyng, Luh and Song'20]. It is unclear whether they can be generalized to a broader class of polynomials. In addition, we also provide a subexponential time algorithm for constructing our results.

2023-04-16 Thin trees for laminar families.
Summary: In the laminar-constrained spanning tree problem, the goal is to find a minimum-cost spanning tree which respects upper bounds on the number of times each cut in a given laminar family is crossed. This generalizes the well-studied degree-bounded spanning tree problem, as well as a previously studied setting where a chain of cuts is given. We give the first constant-factor approximation algorithm; in particular, we show how to obtain a multiplicative violation of the crossing bounds of less than 22 while losing less than a factor of 5 in terms of cost. Our guarantee is relative to the natural LP relaxation. As a consequence, our results show that given a $k$-edge-connected graph and a laminar family $\mathcal{L} \subseteq 2^V$ of cuts, there exists a spanning tree which contains only an $O(1/k)$ fraction of the edges across every cut in $\mathcal{L}$. This can be viewed as progress towards the Thin Tree Conjecture, which (in a strong form) states that this guarantee can be obtained for all cuts simultaneously.

2023-04-07 Convex Minimization with Integer Minima in $\widetilde O(n^4)$ Time.
Summary: Given a convex function $f$ on $\mathbb{R}^n$ with an integer minimizer, we show how to find an exact minimizer of $f$ using $O(n^2 \log n)$ calls to a separation oracle and $O(n^4 \log n)$ time. The previous best polynomial time algorithm for this problem, given in [Jiang, SODA 2021, JACM 2022], achieves $\widetilde{O}(n^2)$ oracle complexity. However, the overall runtime of Jiang's algorithm is at least $\widetilde{\Omega}(n^8)$, due to expensive subroutines such as the Lenstra-Lenstra-Lov\'asz (LLL) algorithm [Lenstra, Lenstra, Lov\'asz, Math. Ann. 1982] and the random-walk-based cutting plane method [Bertsimas, Vempala, JACM 2004]. Our significant speedup is obtained by a nontrivial combination of a faster version of the LLL algorithm due to [Neumaier, Stehl\'e, ISSAC 2016] that gives similar guarantees, the volumetric center cutting plane method (CPM) by [Vaidya, FOCS 1989], and its fast implementation given in [Jiang, Lee, Song, Wong, STOC 2020]. For the special case of submodular function minimization (SFM), our result implies a strongly polynomial time algorithm for this problem using $O(n^3 \log n)$ calls to an evaluation oracle and $O(n^4 \log n)$ additional arithmetic operations. Both the oracle complexity and the number of arithmetic operations of our more general algorithm are better than those of the previous best known algorithms for this specific problem, given in [Lee, Sidford, Wong, FOCS 2015] and [Dadush, V\'egh, Zambelli, SODA 2018, MOR 2021].

2023-04-06 New Ways to Garble Arithmetic Circuits.
Summary: The beautiful work of Applebaum, Ishai, and Kushilevitz [FOCS'11] initiated the study of arithmetic variants of Yao's garbled circuits. An arithmetic garbling scheme is an efficient transformation that converts an arithmetic circuit $C: \mathcal{R}^n \rightarrow \mathcal{R}^m$ over a ring $\mathcal{R}$ into a garbled circuit $\widehat C$ and $n$ affine functions $L_i$ for $i \in [n]$, such that $\widehat C$ and $L_i(x_i)$ reveal only the output $C(x)$ and no other information about $x$. AIK presented the first arithmetic garbling scheme supporting computation over integers from a bounded (possibly exponentially large) range, based on Learning With Errors (LWE). In contrast, converting $C$ into a Boolean circuit and applying Yao's garbled circuit treats the inputs as bit strings instead of ring elements, and hence is not "arithmetic". In this work, we present new ways to garble arithmetic circuits, which improve the state-of-the-art on efficiency, modularity, and functionality. To measure efficiency, we define the rate of a garbling scheme as the maximal ratio between the bit-length of the garbled circuit $\widehat C$ and that of the computation tableau $|C|\ell$ in the clear, where $\ell$ is the bit length of wire values (e.g., Yao's garbled circuit has rate $O(\lambda)$). $\bullet$ We present the first constant-rate arithmetic garbled circuit for computation over large integers based on the Decisional Composite Residuosity (DCR) assumption, significantly improving the efficiency of the schemes of Applebaum, Ishai, and Kushilevitz. $\bullet$ We construct an arithmetic garbling scheme for modular computation over $\mathcal{R} = \mathbb{Z}_p$ for any integer modulus $p$, based on either DCR or LWE. The DCR-based instantiation achieves rate $O(\lambda)$ for large $p$. Furthermore, our construction is modular and makes black-box use of the underlying ring and a simple key extension gadget. $\bullet$ We describe a variant of the first scheme supporting arithmetic circuits over bounded integers that are augmented with Boolean computation (e.g., truncation of an integer value, and comparison between two values), while keeping the constant rate when garbling the arithmetic part. To the best of our knowledge, constant-rate (Boolean or arithmetic) garbling was only achieved before using the powerful primitive of indistinguishability obfuscation, or for restricted circuits with small depth.

2023-03-27 Revisiting Area Convexity: Faster Box-Simplex Games and Spectrahedral Generalizations.
Summary: We investigate different aspects of area convexity [Sherman '17], a mysterious tool introduced to tackle optimization problems under the challenging $\ell_\infty$ geometry. We develop a deeper understanding of its relationship with more conventional analyses of extragradient methods [Nemirovski '04, Nesterov '07]. We also give improved solvers for the subproblems required by variants of the [Sherman '17] algorithm, designed through the lens of relative smoothness [Bauschke-Bolte-Teboulle '17, Lu-Freund-Nesterov '18]. Leveraging these new tools, we give a state-of-the-art first-order algorithm for solving box-simplex games (a primal-dual formulation of $\ell_\infty$ regression) in a $d \times n$ matrix with bounded rows, using $O(\log d \cdot \epsilon^{-1})$ matrix-vector queries. As a consequence, we obtain improved complexities for approximate maximum flow, optimal transport, min-mean-cycle, and other basic combinatorial optimization problems. We also develop a near-linear time algorithm for a matrix generalization of box-simplex games, capturing a family of problems closely related to semidefinite programs recently used as subroutines in robust statistics and numerical linear algebra.

2023-03-26 Standard Model Time-Lock Puzzles: Defining Security and Constructing via Composition.
Summary: The introduction of time-lock puzzles initiated the study of publicly “sending information into the future.” For time-lock puzzles, the underlying security-enabling mechanism is the computational complexity of the operations needed to solve the puzzle, which must be tunable to reveal the solution after a predetermined time, and not before that time. Time-lock puzzles are typically constructed via a commitment to a secret, paired with a reveal algorithm that sequentially iterates a basic function over such commitment. One then shows that shortcutting the iterative process violates cryptographic hardness of an underlying problem. To date, and for more than twenty-five years, research on time-lock puzzles relied heavily on iteratively applying well-structured algebraic functions. However, despite the tradition of cryptography to reason about primitives in a realistic model with standard hardness assumptions (often after initial idealized assumptions), most analysis of time-lock puzzles to date still relies on cryptography modeled (in an ideal manner) as a random oracle function or a generic group function. Moreover, Mahmoody et al. showed that time-lock puzzles with a superpolynomial gap cannot be constructed from random oracles; yet still, current treatments generally use an algebraic trapdoor to efficiently construct a puzzle with a large time gap, and then apply the inconsistent (with respect to Mahmoody et al.) random-oracle idealizations to analyze the solving process. Finally, little attention has been paid to the nuances of composing multiparty computation with timed puzzles that are solved as part of the protocol. In this work, we initiate a study of time-lock puzzles in a model built upon a realistic (and falsifiable) computational framework. We present a new formal definition of residual complexity to characterize a realistic, gradual time-release for time-lock puzzles. We also present a general definition of timed multiparty computation (MPC) and both sequential and concurrent composition theorems for MPC in our model.

2023-03-26 The Subspace Flatness Conjecture and Faster Integer Programming.
Summary: In a seminal paper, Kannan and Lov\'asz (1988) considered a quantity $\mu_{KL}(\Lambda,K)$ which denotes the best volume-based lower bound on the covering radius $\mu(\Lambda,K)$ of a convex body $K$ with respect to a lattice $\Lambda$. Kannan and Lov\'asz proved that $\mu(\Lambda,K) \leq n \cdot \mu_{KL}(\Lambda,K)$, and the Subspace Flatness Conjecture by Dadush (2012) claims that an $O(\log n)$ factor suffices, which would match the lower bound from the work of Kannan and Lov\'asz. We settle this conjecture up to a constant in the exponent by proving that $\mu(\Lambda,K) \leq O(\log^{3}(n)) \cdot \mu_{KL} (\Lambda,K)$. Our proof is based on the Reverse Minkowski Theorem due to Regev and Stephens-Davidowitz (2017). Following the work of Dadush (2012, 2019), we obtain a $(\log n)^{O(n)}$-time randomized algorithm to solve integer programs in $n$ variables. Another implication of our main result is a near-optimal flatness constant of $O(n \log^{4}(n))$.

2023-03-22 Sparks of Artificial General Intelligence: Early experiments with GPT-4.
Summary: Artificial intelligence (AI) researchers have been developing and refining large language models (LLMs) that exhibit remarkable capabilities across a variety of domains and tasks, challenging our understanding of learning and cognition. The latest model developed by OpenAI, GPT-4, was trained using an unprecedented scale of compute and data. In this paper, we report on our investigation of an early version of GPT-4, when it was still in active development by OpenAI. We contend that (this early version of) GPT-4 is part of a new cohort of LLMs (along with ChatGPT and Google's PaLM, for example) that exhibit more general intelligence than previous AI models. We discuss the rising capabilities and implications of these models. We demonstrate that, beyond its mastery of language, GPT-4 can solve novel and difficult tasks that span mathematics, coding, vision, medicine, law, psychology and more, without needing any special prompting. Moreover, in all of these tasks, GPT-4's performance is strikingly close to human-level performance, and often vastly surpasses prior models such as ChatGPT. Given the breadth and depth of GPT-4's capabilities, we believe that it could reasonably be viewed as an early (yet still incomplete) version of an artificial general intelligence (AGI) system. In our exploration of GPT-4, we put special emphasis on discovering its limitations, and we discuss the challenges ahead for advancing towards deeper and more comprehensive versions of AGI, including the possible need for pursuing a new paradigm that moves beyond next-word prediction. We conclude with reflections on societal influences of the recent technological leap and future research directions.

2023-03-14 Testing Causality for High Dimensional Data.
Summary: Determining causal relationships between high-dimensional observations is among the most important tasks in scientific discovery. In this paper, we revisit the \emph{linear trace method}, a technique proposed in~\citep{janzing2009telling,zscheischler2011testing} to infer the causal direction between two random variables of high dimension. We strengthen the existing results significantly by providing an improved tail analysis, in addition to extending the results to nonlinear trace functionals with sharper confidence bounds under certain distributional assumptions. We obtain our results by interpreting the trace estimator in the causal regime as a function over random orthogonal matrices, where the concentration of Lipschitz functions over such a space can be applied. We additionally propose a novel ridge-regularized variant of the estimator in \cite{zscheischler2011testing}, and give provable bounds relating the ridge-estimated terms to their ground-truth counterparts. We support our theoretical results with encouraging experiments on synthetic datasets, most prominently in the high-dimension, low-sample-size regime.

2023-03-09 Optimal Security for Keyed Hash Functions: Avoiding Time-Space Tradeoffs for Finding Collisions.
Summary: Cryptographic hash functions map data of arbitrary size to a fixed-size digest, and are one of the most commonly used cryptographic objects. As it is infeasible to design an individual hash function for every input size, variable-input-length hash functions are built by designing and bootstrapping a single fixed-input-length function that looks sufficiently random. To prevent trivial preprocessing attacks, applications often require not just a single hash function but rather a family of keyed hash functions. The most well-known methods for designing variable-input-length hash function families from a fixed idealized function are the Merkle-Damgård and Sponge designs. The former underlies the SHA-1 and SHA-2 constructions and the latter underlies SHA-3. Unfortunately, recent works (Coretti et al. EUROCRYPT 2018, Coretti et al. CRYPTO 2018) show nontrivial time-space tradeoff attacks for finding collisions for both. Thus, this forces a parameter blowup (i.e., efficiency loss) for reaching a certain desired level of security. We ask whether it is possible to build families of keyed hash functions which are provably resistant to any nontrivial time-space tradeoff attacks for finding collisions, without incurring significant efficiency costs. We present several new constructions of keyed hash functions that are provably resistant to any nontrivial time-space tradeoff attacks for finding collisions. Our constructions provide various tradeoffs between their efficiency and the range of parameters where they achieve optimal security for collision resistance. Our main technical contribution is proving optimal security bounds for converting a hash function with a fixed-size input to a keyed hash function with a (potentially larger) fixed-size input. We then use this keyed function as the underlying primitive inside the standard Merkle-Damgård and Merkle tree constructions. We strongly believe that this paradigm of using a keyed inner hash function in these constructions is the right one, for which non-uniform security has not been analyzed prior to this work.
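To make the paradigm concrete, here is a toy sketch (ours, not the paper's construction) of a keyed fixed-input-length function chained in Merkle-Damgård style; SHA-256 stands in for the idealized fixed-size primitive, and strengthening details such as length padding are elided.

```python
# Toy Merkle-Damgård chain over a keyed fixed-input-length function.
# Illustrative only: real constructions differ (e.g., MD strengthening).
import hashlib

def keyed_compress(key: bytes, state: bytes, block: bytes) -> bytes:
    # Keyed fixed-input-length primitive (idealized-function stand-in).
    return hashlib.sha256(key + state + block).digest()

def keyed_md_hash(key: bytes, msg: bytes, block_size: int = 64) -> bytes:
    state = b"\x00" * 32                    # fixed initialization vector
    msg += b"\x80" + b"\x00" * (-(len(msg) + 1) % block_size)  # simple pad
    for i in range(0, len(msg), block_size):
        state = keyed_compress(key, state, msg[i:i + block_size])
    return state
```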

2023-03-07 Complete Log-Concavity of Coverage-Like Functions.
Summary: We introduce an expressive subclass of nonnegative almost submodular set functions, called strongly 2-coverage functions, which include coverage and (sums of) matroid rank functions, and prove that the homogenization of the generating polynomial of any such function is completely log-concave, taking a step towards characterizing the coefficients of (homogeneous) completely log-concave polynomials. As a consequence we obtain that the "level sets" of any such function form an ultra-log-concave sequence.

2023-03-02 An Improved Classical Singular Value Transformation for Quantum Machine Learning.
Summary: Quantum machine learning (QML) has shown great potential to produce large quantum speedups for linear algebra tasks. The quantum singular value transformation (QSVT), introduced by [GSLW, STOC'19, arXiv:1806.01838], is a unifying framework to obtain QML algorithms. We provide a classical algorithm that matches the performance of QSVT on low-rank inputs, up to small polynomial overhead. Under quantum memory assumptions, given a bounded matrix $A\in\mathbb{C}^{m\times n}$, vector $b\in\mathbb{C}^{n}$, and bounded degree-$d$ polynomial $p$, QSVT can output a measurement from the state $|p(A)b\rangle$ in $O(d\|A\|_F)$ time after linear-time preprocessing. We show that, in the same setting, for any $\varepsilon>0$, we can output a vector $v$ such that $\|v - p(A)b\| \leq \varepsilon\|b\|$ in $O(d^9\|A\|_F^4/\varepsilon^2)$ time after linear-time preprocessing. This improves upon the best known classical algorithm [CGLLTW, STOC'20, arXiv:1910.06151], which requires $O(d^{22}\|A\|_F^6/\varepsilon^6)$ time. Instantiating the aforementioned algorithm with different polynomials, we obtain fast quantum-inspired algorithms for regression, recommendation systems, and Hamiltonian simulation. We improve in numerous parameter settings on prior work, including those that use problem-specialized approaches. Our key insight is to combine the Clenshaw recurrence, an iterative method for computing matrix polynomials, with sketching techniques to simulate QSVT classically. The tools we introduce in this work include (a) a matrix sketch for approximately preserving bilinear forms, (b) an asymmetric approximate matrix product sketch based on $\ell_2^2$ sampling, (c) a new stability analysis for the Clenshaw recurrence, and (d) a new technique to bound arithmetic progressions of the coefficients appearing in the Chebyshev series expansion of bounded functions, each of which may be of independent interest.
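For readers unfamiliar with the Clenshaw recurrence mentioned above, the following small numpy sketch shows how it applies a Chebyshev series $p(A) = c_0 T_0(A) + \cdots + c_d T_d(A)$ to a vector without ever forming $p(A)$; the paper's contribution is to combine this recurrence with sketching, which is omitted here.

```python
# Clenshaw recurrence for computing p(A) @ b, where p is given by its
# Chebyshev coefficients c_0, ..., c_d (dense A; sketching not shown).
import numpy as np

def clenshaw_apply(coeffs, A, b):
    u1 = np.zeros_like(b)   # plays the role of u_{k+1}
    u2 = np.zeros_like(b)   # plays the role of u_{k+2}
    for c in reversed(coeffs[1:]):          # k = d, d-1, ..., 1
        u1, u2 = c * b + 2 * (A @ u1) - u2, u1
    return coeffs[0] * b + A @ u1 - u2
```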

2023-02-28 A CS guide to the quantum singular value transformation.
Summary: We present a simplified exposition of some pieces of [Gily\'en, Su, Low, and Wiebe, STOC'19, arXiv:1806.01838], who introduced a quantum singular value transformation (QSVT) framework for applying polynomial functions to block-encoded matrices. The QSVT framework has garnered substantial recent interest from the quantum algorithms community, as it was demonstrated by [GSLW19] to encapsulate many existing algorithms naturally phrased as an application of a matrix function. First, we posit that the lifting of quantum signal processing (QSP) to QSVT is better viewed not through Jordan's lemma (as was suggested by [GSLW19]) but as an application of the cosine-sine decomposition, which can be thought of as a more explicit and stronger version of Jordan's lemma. Second, we demonstrate that the constructions of bounded polynomial approximations given in [GSLW19], which use a variety of ad hoc approaches drawing from Fourier analysis, Chebyshev series, and Taylor series, can be unified under the framework of truncation of Chebyshev series, and indeed, can in large part be matched via a bounded variant of a standard meta-theorem from [Trefethen, 2013]. We hope this work finds use in the community as a companion guide for understanding and applying the powerful framework of [GSLW19].

2023-02-27 Query-optimal estimation of unitary channels in diamond distance.
Summary: We consider process tomography for unitary quantum channels. Given access to an unknown unitary channel acting on a $\textsf{d}$-dimensional qudit, we aim to output a classical description of a unitary that is $\varepsilon$-close to the unknown unitary in diamond norm. We design an algorithm achieving error $\varepsilon$ using $O(\textsf{d}^2/\varepsilon)$ applications of the unknown channel and only one qudit. This improves over prior results, which use $O(\textsf{d}^3/\varepsilon^2)$ [via standard process tomography] or $O(\textsf{d}^{2.5}/\varepsilon)$ [Yang, Renner, and Chiribella, PRL 2020] applications. To show this result, we introduce a simple technique to "bootstrap" an algorithm that can produce constant-error estimates to one that can produce $\varepsilon$-error estimates with Heisenberg scaling. Finally, we prove a complementary lower bound showing that estimation requires $\Omega(\textsf{d}^2/\varepsilon)$ applications, even with access to the inverse or controlled versions of the unknown unitary. This shows that our algorithm has both optimal query complexity and optimal space complexity.

2023-02-24 Quantum trapdoor functions from classical one-way functions.
Summary: We formalize and study the notion of a quantum trapdoor function. This is an efficiently computable unitary that takes as input a "public" quantum state and a classical string $x$, and outputs a quantum state. This map is such that (i) it is hard to invert, in the sense that it is hard to recover $x$ given the output state (and many copies of the public state), and (ii) there is a classical trapdoor that allows efficient inversion. We show that a quantum trapdoor function can be constructed from any quantum-secure one-way function. A direct consequence of this result is that, assuming just the existence of quantum-secure one-way functions, there exists a public-key encryption scheme with a (pure) quantum public key.

2023-02-21 $k$NN-Adapter: Efficient Domain Adaptation for Black-Box Language Models.
Summary: Fine-tuning a language model on a new domain is standard practice for domain adaptation. However, it can be infeasible for modern large-scale language models such as GPT-3, which can only be accessed through APIs that expose no internal model parameters. In this paper, we propose $k$NN-Adapter, a method to effectively adapt these black-box large language models (LLMs) to a new domain. The $k$NN-Adapter builds on top of the retrieval-augmented language model, and adaptively learns to interpolate the output of the language model with retrieval results from a datastore consisting of the target domain data. Our experiments on four different domains demonstrate that $k$NN-Adapter significantly improves perplexity, and works particularly well in settings with limited access to LLMs. Additionally, we show that $k$NN-Adapter is more effective than fine-tuning when the amount of training data is limited. We also release a dataset to encourage further study.
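The interpolation at the heart of this approach is simple to sketch. The snippet below is our own simplification with a fixed mixing weight; the paper's method learns the interpolation weights adaptively per context.

```python
# Sketch of kNN-LM-style interpolation (simplified; fixed lambda).
import numpy as np

def interpolated_probs(p_lm, neighbor_ids, distances, vocab_size, lam=0.3):
    """p_lm: black-box LM next-token distribution over the vocabulary;
    neighbor_ids/distances: retrieved target tokens from the domain datastore."""
    weights = np.exp(-np.asarray(distances, dtype=float))
    p_knn = np.zeros(vocab_size)
    for tok, w in zip(neighbor_ids, weights):
        p_knn[tok] += w
    p_knn /= p_knn.sum()
    # The adapter would *learn* lam from the context; here it is fixed.
    return lam * p_knn + (1 - lam) * p_lm
```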

2023-02-20 Private (Stochastic) Non-Convex Optimization Revisited: Second-Order Stationary Points and Excess Risks.
Summary: We consider the problem of minimizing a nonconvex objective while preserving the privacy of the examples in the training data. Building upon the previous variance-reduced algorithm SpiderBoost, we introduce a new framework that utilizes two different kinds of gradient oracles. The first kind of oracle can estimate the gradient of one point, and the second kind, less precise and more cost-effective, can estimate the gradient difference between two points. SpiderBoost uses the first kind periodically, once every few steps, while our framework proposes using the first oracle whenever the total drift has become large, and relies on the second oracle otherwise. This new framework ensures the gradient estimations remain accurate all the time, resulting in improved rates for finding second-order stationary points. Moreover, we address the more challenging task of finding the global minima of a nonconvex objective using the exponential mechanism. Our findings indicate that the regularized exponential mechanism can closely match previous empirical and population risk bounds, without requiring smoothness assumptions for algorithms with polynomial running time. Furthermore, by disregarding running time considerations, we show that the exponential mechanism can achieve a good population risk bound and provide a nearly matching lower bound.

2023-02-13 Algorithmic Aspects of the Log-Laplace Transform and a Non-Euclidean Proximal Sampler.
Summary: The development of efficient sampling algorithms catering to non-Euclidean geometries has been a challenging endeavor, as discretization techniques which succeed in the Euclidean setting do not readily carry over to more general settings. We develop a non-Euclidean analog of the recent proximal sampler of [LST21], which naturally induces regularization by an object known as the log-Laplace transform (LLT) of a density. We prove new mathematical properties (with an algorithmic flavor) of the LLT, such as strong convexity-smoothness duality and an isoperimetric inequality, which are used to prove a mixing time bound for our proximal sampler matching [LST21] under a warm start. As our main application, we show that our warm-started sampler improves the value oracle complexity of differentially private convex optimization in $\ell_p$ and Schatten-$p$ norms for $p \in [1, 2]$ to match the Euclidean setting [GLL22], while retaining state-of-the-art excess risk bounds [GLLST23]. We find our investigation of the LLT to be a promising proof-of-concept of its utility as a tool for designing samplers, and outline directions for future exploration.

2023-01-21 From Pseudorandomness to Multi-Group Fairness and Back.
Summary: We identify and explore connections between the recent literature on multi-group fairness for prediction algorithms and the pseudorandomness notions of leakage-resilience and graph regularity. We frame our investigation using new, statistical-distance-based variants of multicalibration that are closely related to the concept of outcome indistinguishability. Adopting this perspective leads us naturally not only to our graph-theoretic results, but also to new, more efficient algorithms for multicalibration in certain parameter regimes and a novel proof of a hardcore lemma for real-valued functions.

2023-01-13 Cumulative Memory Lower Bounds for Randomized and Quantum Computation.
Summary: Cumulative memory, the sum of space used over the steps of a computation, is a fine-grained measure of time-space complexity that more accurately captures the cost of algorithms with infrequent spikes in memory usage, in the context of technologies such as cloud computing that allow dynamic allocation and deallocation of resources during execution. We give the first lower bounds on cumulative memory complexity that apply to general sequential classical algorithms. We also prove the first such bounds for bounded-error quantum circuits. Among many possible applications, we show that any classical sorting algorithm with success probability at least $1/\text{poly}(n)$ requires cumulative memory $\tilde \Omega(n^2)$, any classical matrix multiplication algorithm requires cumulative memory $\Omega(n^6/T)$, any quantum sorting circuit requires cumulative memory $\Omega(n^3/T)$, and any quantum circuit that finds $k$ disjoint collisions in a random function requires cumulative memory $\Omega(k^3n/T^2)$. More generally, we present theorems that can be used to convert a wide class of existing time-space tradeoff lower bounds to matching lower bounds on cumulative memory complexity.

2023-01-10 Elastic Cash.
Summary: Elastic Cash is a new decentralized mechanism for regulating the money supply. The mechanism operates by modifying the supply so that an interest rate determined by a public market is kept approximately fixed. It can be incorporated into the conventional monetary system to improve the elasticity of the US Dollar, and it can be used to design new elastic cryptocurrencies that remain decentralized.

2023-01-01 ReSQueing Parallel and Private Stochastic Convex Optimization.
Summary: We introduce a new tool for stochastic convex optimization (SCO): a Reweighted Stochastic Query (ReSQue) estimator for the gradient of a function convolved with a (Gaussian) probability density. Combining ReSQue with recent advances in ball oracle acceleration [CJJJLST20, ACJJS21], we develop algorithms achieving state-of-the-art complexities for SCO in parallel and private settings. For an SCO objective constrained to the unit ball in $\mathbb{R}^d$, we obtain the following results (up to polylogarithmic factors). We give a parallel algorithm obtaining optimization error $\epsilon_{\text{opt}}$ with $d^{1/3}\epsilon_{\text{opt}}^{-2/3}$ gradient oracle query depth and $d^{1/3}\epsilon_{\text{opt}}^{-2/3} + \epsilon_{\text{opt}}^{-2}$ gradient queries in total, assuming access to a bounded-variance stochastic gradient estimator. For $\epsilon_{\text{opt}} \in [d^{-1}, d^{-1/4}]$, our algorithm matches the state-of-the-art oracle depth of [BJLLS19] while maintaining the optimal total work of stochastic gradient descent. We give an $(\epsilon_{\text{dp}}, \delta)$-differentially private algorithm which, given $n$ samples of Lipschitz loss functions, obtains near-optimal optimization error and makes $\min(n, n^2\epsilon_{\text{dp}}^2 d^{-1}) + \min(n^{4/3}\epsilon_{\text{dp}}^{1/3}, (nd)^{2/3}\epsilon_{\text{dp}}^{-1})$ queries to the gradients of these functions. In the regime $d \le n \epsilon_{\text{dp}}^{2}$, where privacy comes at no cost in terms of the optimal loss up to constants, our algorithm uses $n + (nd)^{2/3}\epsilon_{\text{dp}}^{-1}$ queries and improves recent advancements of [KLL21, AFKT21]. In the moderately low-dimensional setting $d \le \sqrt n \epsilon_{\text{dp}}^{3/2}$, our query complexity is near-linear.

2022-12-14 Learning threshold neurons via the "edge of stability".
Summary: Existing analyses of neural network training often operate under the unrealistic assumption of an extremely small learning rate. This lies in stark contrast to practical wisdom and empirical studies, such as the work of J. Cohen et al. (ICLR 2021), which exhibit startling new phenomena (the "edge of stability" or "unstable convergence") and potential benefits for generalization in the large learning rate regime. Despite a flurry of recent works on this topic, however, the latter effect is still poorly understood. In this paper, we take a step towards understanding genuinely non-convex training dynamics with large learning rates by performing a detailed analysis of gradient descent for simplified models of two-layer neural networks. For these models, we provably establish the edge of stability phenomenon and discover a sharp phase transition for the step size, below which the neural network fails to learn "threshold-like" neurons (i.e., neurons with a nonzero first-layer bias). This elucidates one possible mechanism by which the edge of stability can in fact lead to better generalization, as threshold neurons are basic building blocks with useful inductive bias for many tasks.

2022-12-13 A (Slightly) Improved Deterministic Approximation Algorithm for Metric TSP.
Summary: We show that the max entropy algorithm can be derandomized (with respect to a particular objective function) to give a deterministic $3/2-\epsilon$ approximation algorithm for metric TSP for some $\epsilon > 10^{-36}$. To obtain our result, we apply the method of conditional expectation to an objective function constructed in prior work, which was used to certify that the expected cost of the algorithm is at most $3/2-\epsilon$ times the cost of an optimal solution to the subtour elimination LP. The proof in this work involves showing that the expected value of this objective function can be computed in polynomial time (at all stages of the algorithm's execution).

2022-12-03 Exploring the Limits of Differentially Private Deep Learning with Group-wise Clipping.
Summary: Differentially private deep learning has recently witnessed advances in computational efficiency and privacy-utility tradeoff. We explore whether further improvements along the two axes are possible and provide affirmative answers leveraging two instantiations of \emph{group-wise clipping}. To reduce the compute time overhead of private learning, we show that \emph{per-layer clipping}, where the gradient of each neural network layer is clipped separately, allows clipping to be performed in conjunction with backpropagation in differentially private optimization. This results in private learning that is as memory-efficient and almost as fast per training update as non-private learning for many workflows of interest. While per-layer clipping with constant thresholds tends to underperform standard flat clipping, per-layer clipping with adaptive thresholds matches or outperforms flat clipping under given training epoch constraints, hence attaining similar or better task performance within less wall time. To explore the limits of scaling (pretrained) models in differentially private deep learning, we privately fine-tune the 175 billion-parameter GPT-3. We bypass scaling challenges associated with clipping gradients that are distributed across multiple devices with \emph{per-device clipping} that clips the gradient of each model piece separately on its host device. Privately fine-tuning GPT-3 with per-device clipping achieves a task performance at $\epsilon=1$ better than what is attainable by non-privately fine-tuning the largest GPT-2 on a summarization task.
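The mechanics of per-layer clipping can be sketched with gradient hooks. This is a rough PyTorch illustration of the idea only: it clips each layer's gradient to its own threshold as backpropagation reaches it, and omits the per-example handling and noise addition a real DP optimizer needs.

```python
# Rough sketch of per-layer gradient clipping via tensor hooks (PyTorch).
# Not a DP implementation: per-example clipping and noise are omitted.
import torch

def make_clip_fn(threshold):
    def clip(grad):
        norm = grad.norm()
        # Scale the layer's gradient down to the threshold if it exceeds it.
        return grad * (threshold / norm) if norm > threshold else grad
    return clip

def register_per_layer_clipping(model, thresholds):
    # thresholds: dict mapping parameter name -> per-layer clipping threshold
    for name, p in model.named_parameters():
        if p.requires_grad:
            p.register_hook(make_clip_fn(thresholds[name]))
```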

2022-11-30 On Disperser/Lifting Properties of the Index and Inner-Product Functions.
Summary: Query-to-communication lifting theorems, which connect the query complexity of a Boolean function to the communication complexity of an associated `lifted' function obtained by composing the function with many copies of another function known as a gadget, have been instrumental in resolving many open questions in computational complexity. Several important complexity questions could be resolved if we could make substantial improvements in the input size required for lifting with the Index function, from its current near-linear size down to polylogarithmic in the number of inputs $N$ of the original function or, ideally, constant. The near-linear size bound was shown by Lovett, Meka, Mertz, Pitassi and Zhang using a recent breakthrough improvement on the Sunflower Lemma to show that a certain graph associated with the Index function of near-linear size is a disperser. They also stated a conjecture about the Index function that is essential for further improvements in the size required for lifting with Index using current techniques. In this paper we prove the following: 1) The conjecture of Lovett et al. is false when the size of the Index gadget is $\log N - \omega(1)$. 2) Also, the Inner-Product function, which satisfies the disperser property at size $O(\log N)$, does not have this property when its size is $\log N - \omega(1)$. 3) Nonetheless, using Index gadgets of size at least 4, we prove a lifting theorem for a restricted class of communication protocols in which one of the players is limited to sending parities of its inputs. 4) Using the ideas from this lifting theorem, we derive a strong lifting theorem from decision tree size to parity decision tree size. We use this to derive a general lifting theorem in proof complexity from tree-resolution size to tree-like $Res(\oplus)$ refutation size, which yields many new exponential lower bounds on such proofs.

2022-11-21 Upper and Lower Bounds on the Smoothed Complexity of the Simplex Method.
Summary: The simplex method for linear programming is known to be highly efficient in practice, and understanding its performance from a theoretical perspective is an active research topic. The framework of smoothed analysis, first introduced by Spielman and Teng (JACM '04) for this purpose, defines the smoothed complexity of solving a linear program with $d$ variables and $n$ constraints as the expected running time when Gaussian noise of variance $\sigma^2$ is added to the LP data. We prove that the smoothed complexity of the simplex method is $O(\sigma^{-3/2} d^{13/4}\log^{7/4} n)$, improving the dependence on $1/\sigma$ compared to the previous bound of $O(\sigma^{-2} d^2\sqrt{\log n})$. We accomplish this through a new analysis of the \emph{shadow bound}, key to earlier analyses as well. Illustrating the power of our new method, we use it to prove a nearly tight upper bound on the smoothed complexity of two-dimensional polygons. We also establish the first non-trivial lower bound on the smoothed complexity of the simplex method, proving that the \emph{shadow vertex simplex method} requires at least $\Omega \Big(\min \big(\sigma^{-1/2} d^{-1/2}\log^{-1/4} d,\, 2^d \big) \Big)$ pivot steps with high probability. A key part of our analysis is a new variation on the extended formulation for the regular $2^k$-gon. We end with a numerical experiment that suggests this analysis could be further improved.

2022-11-17 How to Fine-Tune Vision Models with SGD.
Summary: SGD (with momentum) and AdamW are the two most used optimizers for fine-tuning large neural networks in computer vision. When the two methods perform the same, SGD is preferable because it uses less memory (12 bytes/parameter) than AdamW (16 bytes/parameter). However, on a suite of downstream tasks, especially those with distribution shifts, we show that fine-tuning with AdamW performs substantially better than SGD on modern Vision Transformer and ConvNeXt models. We find that large gaps in performance between SGD and AdamW occur when the fine-tuning gradients in the first "embedding" layer are much larger than in the rest of the model. Our analysis suggests an easy fix that works consistently across datasets and models: merely freezing the embedding layer (less than 1\% of the parameters) leads to SGD performing competitively with AdamW while using less memory. Our insights result in state-of-the-art accuracies on five popular distribution shift benchmarks: WILDS-FMoW, WILDS-Camelyon, Living-17, Waterbirds, and DomainNet.
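The fix described above is a few lines in practice. A minimal PyTorch sketch, where `model.embeddings` stands in for whatever module holds the model's patch/token embeddings:

```python
# Minimal sketch: freeze the embedding layer, then fine-tune with SGD.
# `model.embeddings` is a placeholder for the model-specific embedding module.
import torch

def sgd_with_frozen_embeddings(model, lr=1e-3, momentum=0.9):
    for p in model.embeddings.parameters():   # <1% of all parameters
        p.requires_grad = False
    trainable = [p for p in model.parameters() if p.requires_grad]
    return torch.optim.SGD(trainable, lr=lr, momentum=momentum)
```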

2022-11-09 A 4/3-Approximation Algorithm for Half-Integral Cycle Cut Instances of the TSP.
Summary: A longstanding conjecture for the traveling salesman problem (TSP) states that the integrality gap of the standard linear programming relaxation of the TSP is at most 4/3. Despite significant efforts, the conjecture remains open. We consider the half-integral case, in which the LP has solution values in $\{0, 1/2, 1\}$. Such instances have been conjectured to be the most difficult instances for the overall four-thirds conjecture. Karlin, Klein, and Oveis Gharan, in a breakthrough result, were able to show that in the half-integral case, the integrality gap is at most 1.49993. This result led to the first significant progress on the overall conjecture in decades; the same authors showed the integrality gap is at most $1.5 - 10^{-36}$ in the non-half-integral case. For the half-integral case, the current best-known ratio is 1.4983, a result by Gupta et al. With the improvements on the 3/2 bound remaining very incremental even in the half-integral case, we turn the question around and look for a large class of half-integral instances for which we can prove that the 4/3 conjecture is correct. The previous works on the half-integral case perform induction on a hierarchy of critical tight sets in the support graph of the LP solution, in which some of the sets correspond to "cycle cuts" and the others to "degree cuts". We show that if all the sets in the hierarchy correspond to cycle cuts, then we can find a distribution of tours whose expected cost is at most 4/3 times the value of the half-integral LP solution; sampling from the distribution gives us a randomized 4/3-approximation algorithm. We note that the known bad cases for the integrality gap have a gap of 4/3 and have a half-integral LP solution in which all the critical tight sets in the hierarchy are cycle cuts; thus our result is tight.

2022-11-07 From approximate to exact integer programming.
Summary: Approximate integer programming is the following: For a convex body $K \subseteq \mathbb{R}^n$, either determine whether $K \cap \mathbb{Z}^n$ is empty, or find an integer point in the convex body scaled by $2$ from its center of gravity $c$. Approximate integer programming can be solved in time $2^{O(n)}$, while the fastest known methods for exact integer programming run in time $2^{O(n)} \cdot n^n$. So far, there are no efficient methods for integer programming known that are based on approximate integer programming. Our main contributions are two such methods, each yielding novel complexity results. First, we show that an integer point $x^* \in (K \cap \mathbb{Z}^n)$ can be found in time $2^{O(n)}$, provided that the remainders $x_i^* \bmod \ell$ of each component of $x^*$ are given, for some arbitrarily fixed $\ell \geq 5(n+1)$. The algorithm is based on a cutting-plane technique, iteratively halving the volume of the feasible set. The cutting planes are determined via approximate integer programming. Enumeration of the possible remainders gives a $2^{O(n)}n^n$ algorithm for general integer programming. This matches the current best bound of an algorithm by Dadush (2012) that is considerably more involved. Our algorithm also relies on a new asymmetric approximate Carath\'eodory theorem that might be of interest on its own. Our second method concerns integer programming problems in equation standard form $Ax = b, 0 \leq x \leq u, \, x \in \mathbb{Z}^n$. Such a problem can be reduced to the solution of $\prod_i O(\log u_i + 1)$ approximate integer programming problems. This implies, for example, that knapsack or subset-sum problems with polynomial variable range $0 \leq x_i \leq p(n)$ can be solved in time $(\log n)^{O(n)}$. For these problems, the best running time so far was $n^n \cdot 2^{O(n)}$.

2022-10-29 The Vector Balancing Constant for Zonotopes.
Summary: The vector balancing constant $\mathrm{vb}(K,Q)$ of two symmetric convex bodies $K,Q$ is the minimum $r \geq 0$ so that any number of vectors from $K$ can be balanced into an $r$-scaling of $Q$. A question raised by Schechtman is whether for any zonotope $K \subseteq \mathbb{R}^d$ one has $\mathrm{vb}(K,K) \lesssim \sqrt{d}$. Intuitively, this asks whether a natural geometric generalization of Spencer's Theorem (for which $K = B^d_\infty$) holds. We prove that for any zonotope $K \subseteq \mathbb{R}^d$ one has $\mathrm{vb}(K,K) \lesssim \sqrt{d} \log \log \log d$. Our main technical contribution is a tight lower bound on the Gaussian measure of any section of a normalized zonotope, generalizing Vaaler's Theorem for cubes. We also prove that for two different normalized zonotopes $K$ and $Q$ one has $\mathrm{vb}(K,Q) \lesssim \sqrt{d \log d}$. All the bounds are constructive and the corresponding colorings can be computed in polynomial time.

2022-10-21 Augmentation with Projection: Towards an Effective and Efficient Data Augmentation Paradigm for Distillation.
Summary: Knowledge distillation is one of the primary methods of transferring knowledge from large to small models. However, it requires massive task-specific data, which may not be plausible in many real-world applications. Data augmentation methods such as representation interpolation, token replacement, or augmentation with models are applied to tackle this problem. However, these data augmentation methods either potentially cause shifts in decision boundaries (representation interpolation), are not expressive enough (token replacement), or introduce too much computational overhead (augmentation with models). To this end, we propose AugPro (Augmentation with Projection), an effective and efficient data augmentation method for distillation. Our method builds on top of representation interpolation augmentation methods to maintain the diversity of expressions and converts the augmented data to tokens to avoid shifting decision boundaries. It uses simple operations that come with little computational overhead. The results on multiple GLUE tasks show that our method can improve distillation performance by a large margin at a low time cost. Code is available at https://github.com/google-research/google-research/tree/master/augpro.
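The interpolate-then-project step can be pictured in a few lines. The following is our own rough rendering of that idea (mixup-style embedding interpolation followed by a nearest-token projection), with illustrative names; it assumes the two token sequences have equal length.

```python
# Rough sketch of interpolate-then-project augmentation (illustrative only).
import torch

def interpolate_and_project(ids_a, ids_b, embedding, alpha=0.3):
    """ids_a, ids_b: equal-length LongTensors of token ids."""
    E = embedding.weight                                # (vocab, dim)
    mixed = (1 - alpha) * E[ids_a] + alpha * E[ids_b]   # (seq, dim)
    # Projection: snap each mixed embedding to its nearest vocabulary token,
    # so the augmented example is again a discrete token sequence.
    dists = torch.cdist(mixed, E)                       # (seq, vocab)
    return dists.argmin(dim=-1)
```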

2022-10-13 Condition-number-independent convergence rate of Riemannian Hamiltonian Monte Carlo with numerical integrators.
Summary: We study the convergence rate of discretized Riemannian Hamiltonian Monte Carlo for sampling from distributions of the form $e^{-f(x)}$ on a convex body $\mathcal{M}\subset\mathbb{R}^{n}$. We show that for distributions of the form $e^{-\alpha^{\top}x}$ on a polytope with $m$ constraints, the convergence rate of a family of commonly used integrators is independent of $\left\Vert \alpha\right\Vert _{2}$ and the geometry of the polytope. In particular, the implicit midpoint method (IMM) and the generalized Leapfrog method (LM) have a mixing time of $\widetilde{O}\left(mn^{3}\right)$ to achieve $\epsilon$ total variation distance to the target distribution. These guarantees are based on a general bound on the convergence rate for densities of the form $e^{-f(x)}$ in terms of parameters of the manifold and the integrator. Our theoretical guarantee complements the empirical results of [KLSV22], which show that RHMC with IMM can sample ill-conditioned, non-smooth and constrained distributions in very high dimension efficiently in practice.
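For intuition about the implicit midpoint integrator named above, here is a generic sketch of one IMM step for an ODE $z' = F(z)$, with the implicit equation solved by fixed-point iteration; the Riemannian (metric-dependent) Hamiltonian vector field used in the paper is abstracted into $F$.

```python
# Generic implicit midpoint step z_{t+1} = z_t + h * F((z_t + z_{t+1}) / 2),
# solved by fixed-point iteration (illustrative; not the RHMC specifics).
import numpy as np

def implicit_midpoint_step(F, z, h, iters=20):
    z_new = z + h * F(z)                  # explicit Euler initializer
    for _ in range(iters):                # fixed-point iteration
        z_new = z + h * F((z + z_new) / 2)
    return z_new
```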

2022-10-12 Sample Constrained Treatment Effect Estimation.
Summary: Treatment effect estimation is a fundamental problem in causal inference. We focus on designing efficient randomized controlled trials, to accurately estimate the effect of some treatment on a population of $n$ individuals. In particular, we study sample-constrained treatment effect estimation, where we must select a subset of $s \ll n$ individuals from the population to experiment on. This subset must be further partitioned into treatment and control groups. Algorithms for partitioning the entire population into treatment and control groups, or for choosing a single representative subset, have been well-studied. The key challenge in our setting is jointly choosing a representative subset and a partition for that set. We focus on both individual and average treatment effect estimation, under a linear effects model. We give provably efficient experimental designs and corresponding estimators, by identifying connections to discrepancy minimization and leverage-score-based sampling used in randomized numerical linear algebra. Our theoretical results obtain a smooth transition to known guarantees when $s$ equals the population size. We also empirically demonstrate the performance of our algorithms.
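As a flavor of the leverage-score connection, the sketch below (ours, under a linear effects model with covariate rows $X$) samples $s$ individuals with probabilities proportional to their leverage scores; the paper's designs additionally handle the treatment/control partition, e.g. via discrepancy minimization, which is not shown.

```python
# Sketch: sample s individuals proportionally to their leverage scores.
import numpy as np

def leverage_score_sample(X, s, rng=None):
    rng = rng or np.random.default_rng()
    # Leverage score of row i is x_i^T (X^T X)^{-1} x_i, computed via thin QR.
    Q, _ = np.linalg.qr(X)
    scores = (Q ** 2).sum(axis=1)
    return rng.choice(len(X), size=s, replace=False, p=scores / scores.sum())
```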

2022-10-12 Quantum Depth in the Random Oracle Model.
Summary: We give a comprehensive characterization of the computational power of shallow quantum circuits combined with classical computation. Specifically, for classes of search problems, we show that the following statements hold, relative to a random oracle: (a) $\mathsf{BPP}^{\mathsf{QNC}^{\mathsf{BPP}}} \neq \mathsf{BQP}$. This refutes Jozsa's conjecture [QIP 05] in the random oracle model. As a result, this gives the first instantiatable separation between the classes by replacing the oracle with a cryptographic hash function, yielding a resolution to one of Aaronson's ten semi-grand challenges in quantum computing. (b) $\mathsf{BPP}^{\mathsf{QNC}} \nsubseteq \mathsf{QNC}^{\mathsf{BPP}}$ and $\mathsf{QNC}^{\mathsf{BPP}} \nsubseteq \mathsf{BPP}^{\mathsf{QNC}}$. This shows that there is a subtle interplay between classical computation and shallow quantum computation. In fact, for the second separation, we establish that, for some problems, the ability to perform adaptive measurements in a single shallow quantum circuit is more useful than the ability to perform polynomially many shallow quantum circuits without adaptive measurements. (c) There exists a two-message proof of quantum depth protocol. Such a protocol allows a classical verifier to efficiently certify that a prover must be performing a computation of some minimum quantum depth. Our proof of quantum depth can be instantiated using the recent proof of quantumness construction by Yamakawa and Zhandry [STOC 22].

20220921 Chaining, Group Leverage Score Overestimates, and Fast Spectral Hypergraph Sparsification.
Summary: We present an algorithm that, given any $n$-vertex, $m$-edge, rank-$r$ hypergraph, constructs a spectral sparsifier with $O(n \varepsilon^{-2} \log n \log r)$ hyperedges in nearly-linear $\widetilde{O}(mr)$ time. This improves in both size and efficiency over a line of work (Bansal-Svensson-Trevisan 2019, Kapralov-Krauthgamer-Tardos-Yoshida 2021) for which the previous best size was $O(\min\{n \varepsilon^{-4} \log^3 n, n r^3 \varepsilon^{-2} \log n\})$ and runtime was $\widetilde{O}(mr + n^{O(1)})$. Independent Result: In an independent work, Lee (Lee 2022) also shows how to compute a spectral hypergraph sparsifier with $O(n \varepsilon^{-2} \log n \log r)$ hyperedges.

20220915 Multicalibrated Regression for Downstream Fairness.
Summary: We show how to take a regression function $\hat{f}$ that is appropriately ``multicalibrated'' and efficiently postprocess it into an approximately error-minimizing classifier satisfying a large variety of fairness constraints. The postprocessing requires no labeled data, and only a modest amount of unlabeled data and computation. The computational and sample complexity requirements of computing $\hat{f}$ are comparable to the requirements for solving a single fair learning task optimally, but it can in fact be used to solve many different downstream fairness-constrained learning problems efficiently. Our postprocessing method easily handles intersecting groups, generalizing prior work on postprocessing regression functions to satisfy fairness constraints that applied only to disjoint groups. Our work extends recent work showing that multicalibrated regression functions are ``omnipredictors'' (i.e., can be postprocessed to optimally solve unconstrained ERM problems) to constrained optimization.

20220909 Spectral hypergraph sparsification via chaining.
Summary: In a hypergraph on $n$ vertices where $D$ is the maximum size of a hyperedge, there is a weighted hypergraph spectral $\varepsilon$-sparsifier with at most $O(\varepsilon^{-2} \log(D) \cdot n \log n)$ hyperedges. This improves over the bound of Kapralov, Krauthgamer, Tardos and Yoshida (2021) who achieve $O(\varepsilon^{-4} n (\log n)^3)$, as well as the bound $O(\varepsilon^{-2} D^3 n \log n)$ obtained by Bansal, Svensson, and Trevisan (2019). The same sparsification result was obtained independently by Jambulapati, Liu, and Sidford (2022).

20220829 Online Bidding Algorithms for Return-on-Spend Constrained Advertisers.
Summary: Online advertising has recently grown into a highly competitive and complex multi-billion-dollar industry, with advertisers bidding for ad slots at large scales and high frequencies. This has resulted in a growing need for efficient "autobidding" algorithms that determine the bids for incoming queries to maximize advertisers' targets subject to their specified constraints. This work explores efficient online algorithms for a single value-maximizing advertiser under an increasingly popular constraint: Return-on-Spend (RoS). We quantify efficiency in terms of regret relative to the optimal algorithm, which knows all queries a priori. We contribute a simple online algorithm that achieves near-optimal regret in expectation while always respecting the specified RoS constraint when the input sequence of queries consists of i.i.d. samples from some distribution. We also integrate our results with the previous work of Balseiro, Lu, and Mirrokni [BLM20] to achieve near-optimal regret while respecting both RoS and fixed budget constraints. Our algorithm follows the primal-dual framework and uses online mirror descent (OMD) for the dual updates. However, we need to use a non-canonical setup of OMD, and therefore the classic low-regret guarantee of OMD, which is for the adversarial setting in online learning, no longer holds. Nonetheless, in our case and more generally where low-regret dynamics are applied in algorithm design, the gradients encountered by OMD can be far from adversarial but influenced by our algorithmic choices. We exploit this key insight to show that our OMD setup achieves low regret in the realm of our algorithm.
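
To make the primal-dual template concrete, here is a hypothetical pacing loop for a second-price auction in which a dual variable prices the RoS constraint and is updated with an exponentiated-gradient (OMD with entropic mirror map) step; the bidding formula, step size, and clipping are illustrative assumptions rather than the paper's exact algorithm.

```python
import numpy as np

def ros_pacing(values, prices, target_ros, eta=0.05, lam=1.0):
    # Constraint: total value >= target_ros * total spend. Lagrangian reasoning
    # suggests bids of the form v * (1 + lam) / (lam * target_ros); lam rises
    # when we overspend relative to the constraint and falls otherwise.
    total_value = total_spend = 0.0
    for v, p in zip(values, prices):  # p is the highest competing bid
        bid = v * (1 + lam) / (lam * target_ros)
        if bid >= p:  # win and pay the second price p
            total_value += v
            total_spend += p
            lam *= np.exp(-eta * (v - target_ros * p))  # OMD dual step
            lam = float(np.clip(lam, 1e-3, 1e3))
    return total_value, total_spend

rng = np.random.default_rng(1)
print(ros_pacing(rng.uniform(0, 1, 1000), rng.uniform(0, 1, 1000), target_ros=1.0))
```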

20220824 A Slightly Improved Bound for the KLS Constant.
Summary: We refine the recent breakthrough technique of Klartag and Lehec to obtain an improved polylogarithmic bound for the KLS constant.

20220809 An Improved Trickle-Down Theorem for Partite Complexes.
Summary: Given a $(d+1)$-partite $d$-dimensional simplicial complex, we prove a generalization of the trickle-down theorem. We show that if "on average" faces of codimension 2 are $\frac{1-\delta}{d}$-(one-sided) spectral expanders, then any face of codimension $k$ is an $O(\frac{1-\delta}{k\delta})$-(one-sided) spectral expander, for all $3\leq k\leq d+1$. As an application, using our theorem as a black box, we show that links of faces of codimension $k$ in recent constructions of bounded-degree high-dimensional expanders have local spectral expansion at most an $O(1/k)$ fraction of the local expansion of the worst faces of codimension $2$.

20220807 Decomposable Non-Smooth Convex Optimization with Nearly-Linear Gradient Oracle Complexity.
Summary: Many fundamental problems in machine learning can be formulated by the convex program \[ \min_{\theta\in\mathbb{R}^d}\ \sum_{i=1}^{n}f_{i}(\theta), \] where each $f_i$ is a convex, Lipschitz function supported on a subset of $d_i$ coordinates of $\theta$. One common approach to this problem, exemplified by stochastic gradient descent, involves sampling one $f_i$ term at every iteration to make progress. This approach crucially relies on a notion of uniformity across the $f_i$'s, formally captured by their condition number. In this work, we give an algorithm that minimizes the above convex formulation to $\epsilon$-accuracy in $\widetilde{O}(\sum_{i=1}^n d_i \log (1/\epsilon))$ gradient computations, with no assumptions on the condition number. The previous best algorithm independent of the condition number is the standard cutting plane method, which requires $O(nd \log (1/\epsilon))$ gradient computations. As a corollary, we improve upon the evaluation oracle complexity for decomposable submodular minimization by Axiotis et al. (ICML 2021). Our main technical contribution is an adaptive procedure to select an $f_i$ term at every iteration via a novel combination of cutting-plane and interior-point methods.

20220727 Searching for Regularity in Bounded Functions.
Summary: Given a function $f$ on $\mathbb{F}_2^n$, we study the following problem. What is the largest affine subspace $\mathcal{U}$ such that when restricted to $\mathcal{U}$, all the nontrivial Fourier coefficients of $f$ are very small? For the natural class of bounded Fourier degree $d$ functions $f:\mathbb{F}_2^n \to [-1,1]$, we show that there exists an affine subspace of dimension at least $\tilde\Omega(n^{1/d!}k^{-2})$, wherein all of $f$'s nontrivial Fourier coefficients become smaller than $2^{-k}$. To complement this result, we show the existence of degree $d$ functions with coefficients larger than $2^{-d\log n}$ when restricted to any affine subspace of dimension larger than $\Omega(dn^{1/(d-1)})$. In addition, we give explicit examples of functions with analogous but weaker properties. Along the way, we provide multiple characterizations of the Fourier coefficients of functions restricted to subspaces of $\mathbb{F}_2^n$ that may be useful in other contexts. Finally, we highlight applications and connections of our results to parity kill number and affine dispersers.

20220719 Pandora Box Problem with Nonobligatory Inspection: Hardness and Approximation Scheme.
Summary: Weitzman (1979) introduced the Pandora Box problem as a model for sequential search with inspection costs, and gave an elegant index-based policy that attains provably optimal expected payoff. In various scenarios, the searching agent may select an option without making a costly inspection. The variant of the Pandora Box problem with nonobligatory inspection has attracted interest from both economics and algorithms researchers. Various simple algorithms have proved suboptimal, with the best known 0.8-approximation algorithm due to Guha et al. (2008). No hardness result for the problem was known. In this work, we show that it is NP-hard to compute an optimal policy for Pandora's problem with nonobligatory inspection. We also give a polynomial-time approximation scheme (PTAS) that computes policies with an expected payoff at least a $(1-\epsilon)$-fraction of the optimal, for arbitrarily small $\epsilon > 0$. On the side, we show the decision version of the problem to be in NP.
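
For contrast with the nonobligatory variant studied here, the following is a sketch of Weitzman's classical obligatory-inspection index policy: each box gets a reservation value $\sigma$ solving $\mathbb{E}[(X-\sigma)^+] = c$, boxes are opened in decreasing $\sigma$ order, and search stops once the best observed value beats every remaining index. The discrete distributions and helper names are illustrative assumptions.

```python
import numpy as np

def reservation_value(vals, probs, cost):
    # Bisection for the sigma with E[(X - sigma)^+] = cost.
    lo, hi = 0.0, float(np.max(vals))
    for _ in range(100):
        mid = (lo + hi) / 2
        if np.sum(probs * np.maximum(vals - mid, 0.0)) > cost:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def weitzman_payoff(boxes, rng):
    # Open boxes in decreasing index order; stop when the best observed
    # value is at least every remaining reservation value.
    order = sorted(boxes, key=lambda b: -b["sigma"])
    best = payoff = 0.0
    for b in order:
        if best >= b["sigma"]:
            break
        payoff -= b["cost"]
        best = max(best, rng.choice(b["vals"], p=b["probs"]))
    return payoff + best

rng = np.random.default_rng(0)
boxes = [{"vals": np.array([0.0, 1.0]), "probs": np.array([0.5, 0.5]), "cost": c}
         for c in (0.05, 0.1, 0.2)]
for b in boxes:
    b["sigma"] = reservation_value(b["vals"], b["probs"], b["cost"])
print(np.mean([weitzman_payoff(boxes, rng) for _ in range(10000)]))
```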

20220718 Private Convex Optimization in General Norms.
Summary: We propose a new framework for differentially private optimization of convex functions which are Lipschitz in an arbitrary norm $\|\cdot\|$. Our algorithms are based on a regularized exponential mechanism which samples from the density $\propto \exp(-k(F+\mu r))$ where $F$ is the empirical loss and $r$ is a regularizer which is strongly convex with respect to $\|\cdot\|$, generalizing a recent work of [Gopi, Lee, Liu '22] to non-Euclidean settings. We show that this mechanism satisfies Gaussian differential privacy and solves both DP-ERM (empirical risk minimization) and DP-SCO (stochastic convex optimization) by using localization tools from convex geometry. Our framework is the first to apply to private convex optimization in general normed spaces and directly recovers non-private SCO rates achieved by mirror descent as the privacy parameter $\epsilon \to \infty$. As applications, for Lipschitz optimization in $\ell_p$ norms for all $p \in (1, 2)$, we obtain the first optimal privacy-utility tradeoffs; for $p = 1$, we improve tradeoffs obtained by the recent works [Asi, Feldman, Koren, Talwar '21; Bassily, Guzman, Nandi '21] by at least a logarithmic factor. Our $\ell_p$ norm and Schatten-$p$ norm optimization frameworks are complemented with polynomial-time samplers whose query complexity we explicitly bound.

20220707 Approximate Carathéodory bounds via Discrepancy Theory.
Summary: The approximate Carath\'eodory problem in general form is as follows: Given two symmetric convex bodies $P,Q \subseteq \mathbb{R}^m$, a parameter $k \in \mathbb{N}$ and $\mathbf{z} \in \textrm{conv}(X)$ with $X \subseteq P$, find $\mathbf{v}_1,\ldots,\mathbf{v}_k \in X$ so that $\|\mathbf{z} - \frac{1}{k}\sum_{i=1}^k \mathbf{v}_i\|_Q$ is minimized. Maurey showed that if both $P$ and $Q$ coincide with the $\|\cdot\|_p$-ball, then an error of $O(\sqrt{p/k})$ is possible. We prove a reduction to the vector balancing constant from discrepancy theory which for most cases can provide tight bounds for general $P$ and $Q$. For the case where $P$ and $Q$ are both $\|\cdot\|_p$-balls we prove an upper bound of $\sqrt{\frac{\min\{p, \log(\frac{2m}{k})\}}{k}}$. Interestingly, this bound cannot be obtained by taking independent random samples; instead we use the Lovett-Meka random walk. We also prove an extension to the more general case where $P$ and $Q$ are $\|\cdot\|_p$- and $\|\cdot\|_q$-balls with $2 \leq p \leq q \leq \infty$.

20220707 Individual Preference Stability for Clustering.
Summary: In this paper, we propose a natural notion of individual preference (IP) stability for clustering, which asks that every data point, on average, is closer to the points in its own cluster than to the points in any other cluster. Our notion can be motivated from several perspectives, including game theory and algorithmic fairness. We study several questions related to our proposed notion. We first show that deciding whether a given data set allows for an IP-stable clustering in general is NP-hard. As a result, we explore the design of efficient algorithms for finding IP-stable clusterings in some restricted metric spaces. We present a polytime algorithm to find a clustering satisfying exact IP-stability on the real line, and an efficient algorithm to find an IP-stable 2-clustering for a tree metric. We also consider relaxing the stability constraint, i.e., every data point should not be too far from its own cluster compared to any other cluster. For this case, we provide polytime algorithms with different guarantees. We evaluate some of our algorithms and several standard clustering approaches on real data sets.
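
The definition is easy to operationalize; a brute-force checker (quadratic in the number of points, written for clarity rather than efficiency) might look as follows.

```python
import numpy as np

def is_ip_stable(X, labels):
    # IP-stability: each point's average distance to its own cluster
    # (excluding itself) is at most its average distance to every other cluster.
    X, labels = np.asarray(X, dtype=float), np.asarray(labels)
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    for i in range(len(X)):
        own = labels == labels[i]
        own[i] = False
        if not own.any():
            continue  # a singleton is trivially stable
        own_avg = D[i, own].mean()
        for c in set(labels.tolist()) - {labels[i]}:
            if own_avg > D[i, labels == c].mean():
                return False
    return True
```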

20220701 When Does Differentially Private Learning Not Suffer in High Dimensions?.
Summary: Large pretrained models can be privately fine-tuned to achieve performance approaching that of non-private models. A common theme in these results is the surprising observation that high-dimensional models can achieve favorable privacy-utility tradeoffs. This seemingly contradicts known results on the model-size dependence of differentially private convex learning and raises the following research question: When does the performance of differentially private learning not degrade with increasing model size? We identify that the magnitudes of gradients projected onto subspaces are a key factor that determines performance. To precisely characterize this for private convex learning, we introduce a condition on the objective that we term \emph{restricted Lipschitz continuity} and derive improved bounds for the excess empirical and population risks that are dimension-independent under additional conditions. We empirically show that in private fine-tuning of large language models, gradients obtained during fine-tuning are mostly controlled by a few principal components. This behavior is similar to conditions under which we obtain dimension-independent bounds in convex settings. Our theoretical and empirical results together provide a possible explanation for recent successes in large-scale private fine-tuning. Code to reproduce our results can be found at \url{https://github.com/lxuechen/private-transformers/tree/main/examples/classification/spectral_analysis}.

20220622 Active Learning with Safety Constraints.
Summary: Active learning methods have shown great promise in reducing the number of samples necessary for learning. As automated learning systems are adopted into real-time, real-world decision-making pipelines, it is increasingly important that such algorithms are designed with safety in mind. In this work we investigate the complexity of learning the best safe decision in interactive environments. We reduce this problem to a constrained linear bandits problem, where our goal is to find the best arm satisfying certain (unknown) safety constraints. We propose an adaptive experimental-design-based algorithm, which we show efficiently trades off between the difficulty of showing an arm is unsafe and showing it is suboptimal. To our knowledge, our results are the first on best-arm identification in linear bandits with safety constraints. In practice, we demonstrate that this approach performs well on synthetic and real-world datasets.

20220617 RECAPP: Crafting a More Efficient Catalyst for Convex Optimization.
Summary: The accelerated proximal point algorithm (APPA), also known as "Catalyst", is a well-established reduction from convex optimization to approximate proximal point computation (i.e., regularized minimization). This reduction is conceptually elegant and yields strong convergence rate guarantees. However, these rates feature an extraneous logarithmic term arising from the need to compute each proximal point to high accuracy. In this work, we propose a novel Relaxed Error Criterion for Accelerated Proximal Point (RECAPP) that eliminates the need for high-accuracy subproblem solutions. We apply RECAPP to two canonical problems: finite-sum and max-structured minimization. For finite-sum problems, we match the best known complexity, previously obtained by carefully-designed problem-specific algorithms. For minimizing $\max_y f(x,y)$ where $f$ is convex in $x$ and strongly concave in $y$, we improve on the best known (Catalyst-based) bound by a logarithmic factor.

20220606 Multi-learner risk reduction under endogenous participation dynamics.
Summary: Prediction systems face exogenous and endogenous distribution shift: the world constantly changes, and the predictions the system makes change the environment in which it operates. For example, a music recommender observes exogenous changes in the user distribution as different communities have increased access to high-speed internet. If users under the age of 18 enjoy their recommendations, the proportion of the user base comprised of those under 18 may endogenously increase. Most of the study of endogenous shifts has focused on the single decision-maker setting, where there is one learner that users either choose to use or not. This paper studies participation dynamics between subpopulations and possibly many learners. We study the behavior of systems with \emph{risk-reducing} learners and subpopulations. A risk-reducing learner updates their decision upon observing a mixture distribution of the subpopulations $\mathcal{D}$ in such a way that it decreases the risk of the learner on that mixture. A risk-reducing subpopulation updates its apportionment amongst learners in a way which reduces its overall loss. Previous work on the single-learner case shows that myopic risk minimization can result in high overall loss~\citep{perdomo2020performative, miller2021outside} and representation disparity~\citep{hashimoto2018fairness, zhang2019group}. Our work analyzes the outcomes of multiple myopic learners and market forces, often leading to better global loss and less representation disparity.

20220530 Optimal and Adaptive Monteiro-Svaiter Acceleration.
Summary: We develop a variant of the Monteiro-Svaiter (MS) acceleration framework that removes the need to solve an expensive implicit equation at every iteration. Consequently, for any $p\ge 2$ we improve the complexity of convex optimization with Lipschitz $p$th derivative by a logarithmic factor, matching a lower bound. We also introduce an MS subproblem solver that requires no knowledge of problem parameters, and implement it as either a second-order or a first-order method by solving linear systems or applying MinRes, respectively. On logistic regression our method outperforms previous second-order momentum methods, but underperforms Newton's method; simply iterating our first-order adaptive subproblem solver performs comparably to L-BFGS.

20220525 Preference Dynamics Under Personalized Recommendations.
Summary: Many projects (both practical and academic) have designed algorithms to match users to content they will enjoy under the assumption that users' preferences and opinions do not change with the content they see. Evidence suggests that individuals' preferences are directly shaped by what content they see: radicalization, rabbit holes, polarization, and boredom are all example phenomena of preferences affected by content. Polarization in particular can occur even in ecosystems with "mass media," where no personalization takes place, as recently explored in a natural model of preference dynamics by~\citet{hkazla2019geometric} and~\citet{gaitonde2021polarization}. If all users' preferences are drawn towards content they already like, or are repelled from content they already dislike, uniform consumption of media leads to a population of heterogeneous preferences converging towards only two poles. In this work, we explore whether some phenomenon akin to polarization occurs when users receive \emph{personalized} content recommendations. We use a similar model of preference dynamics, where an individual's preferences move towards content they consume and enjoy, and away from content they consume and dislike. We show that standard user reward maximization is an almost trivial goal in such an environment (a large class of simple algorithms will achieve only constant regret). A more interesting objective, then, is to understand under what conditions a recommendation algorithm can ensure stationarity of users' preferences. We show how to design content recommendations which can achieve approximate stationarity, under mild conditions on the set of available content, when a user's preferences are known, and how one can learn enough about a user's preferences to implement such a strategy even when user preferences are initially unknown.

20220512 Optimal Methods for Higher-Order Smooth Monotone Variational Inequalities.
Summary: In this work, we present new simple and optimal algorithms for solving the variational inequality (VI) problem for $p^{\textrm{th}}$-order smooth, monotone operators, a problem that generalizes convex optimization and saddle-point problems. Recent works (Bullins and Lai (2020), Lin and Jordan (2021), Jiang and Mokhtari (2022)) present methods that achieve a rate of $\tilde{O}(\epsilon^{-2/(p+1)})$ for $p\geq 1$, extending results by Nemirovski (2004) and Monteiro and Svaiter (2012) for $p=1,2$. A drawback to these approaches, however, is their reliance on a line search scheme. We provide the first $p^{\textrm{th}}$-order method that achieves a rate of $O(\epsilon^{-2/(p+1)})$. Our method does not rely on a line search routine, thereby improving upon previous rates by a logarithmic factor. Building on the Mirror Prox method of Nemirovski (2004), our algorithm works even in the constrained, non-Euclidean setting. Furthermore, we prove the optimality of our algorithm by constructing matching lower bounds. These are the first lower bounds for smooth monotone VIs (MVIs) beyond convex optimization for $p > 1$. This establishes a separation between solving smooth MVIs and smooth convex optimization, and settles the oracle complexity of solving $p^{\textrm{th}}$-order smooth MVIs.

20220503 Nested Dissection Meets IPMs: Planar Min-Cost Flow in Nearly-Linear Time.
Summary: We present a nearly-linear time algorithm for finding a minimum-cost flow in planar graphs with polynomially bounded integer costs and capacities. The previous fastest algorithm for this problem is based on interior point methods (IPMs) and works for general sparse graphs in $O(n^{1.5}\text{poly}(\log n))$ time [Daitch-Spielman, STOC'08]. Intuitively, $\Omega(n^{1.5})$ is a natural runtime barrier for IPM-based methods, since they require $\sqrt{n}$ iterations, each routing a possibly-dense electrical flow. To break this barrier, we develop a new implicit representation for flows based on generalized nested dissection [Lipton-Rose-Tarjan, JSTOR'79] and approximate Schur complements [Kyng-Sachdeva, FOCS'16]. This implicit representation permits us to design a data structure to route an electrical flow with sparse demands in roughly $\sqrt{n}$ update time, resulting in a total running time of $O(n\cdot\text{poly}(\log n))$. Our results immediately extend to all families of separable graphs.

20220427 Regularized Box-Simplex Games and Dynamic Decremental Bipartite Matching.
Summary: Box-simplex games are a family of bilinear minimax objectives which encapsulate graph-structured problems such as maximum flow [She17], optimal transport [JST19], and bipartite matching [AJJ+22]. We develop efficient near-linear time, high-accuracy solvers for regularized variants of these games. Beyond the immediate applications of such solvers for computing Sinkhorn distances, a prominent tool in machine learning, we show that these solvers can be used to obtain improved running times for maintaining a (fractional) $\epsilon$-approximate maximum matching in a dynamic decremental bipartite graph against an adaptive adversary. We give a generic framework which reduces this dynamic matching problem to solving regularized graph-structured optimization problems to high accuracy. Through our reduction framework, our regularized box-simplex game solver implies a new algorithm for dynamic decremental bipartite matching in total time $\tilde{O}(m \cdot \epsilon^{-3})$, from an initial graph with $m$ edges and $n$ nodes. We further show how to use recent advances in flow optimization [CKL+22] to improve our runtime to $m^{1 + o(1)} \cdot \epsilon^{-2}$, thereby demonstrating the versatility of our reduction-based approach. These results improve upon the previous best runtime of $\tilde{O}(m \cdot \epsilon^{-4})$ [BGS20] and illustrate the utility of using regularized optimization problem solvers for designing dynamic algorithms.

20220308 A Fast Scale-Invariant Algorithm for Nonnegative Least Squares with Nonnegative Data.
Summary: Nonnegative (linear) least squares problems are a fundamental class of problems that is well-studied in statistical learning and for which solvers have been implemented in many of the standard programming languages used within the machine learning community. The existing off-the-shelf solvers view the nonnegativity constraint in these problems as an obstacle and, compared to unconstrained least squares, perform additional effort to address it. However, in many of the typical applications, the data itself is nonnegative as well, and we show that the nonnegativity in this case makes the problem easier. In particular, while the oracle complexity of unconstrained least squares problems necessarily scales with one of the data matrix constants (typically the spectral norm) and these problems are solved to additive error, we show that nonnegative least squares problems with nonnegative data are solvable to multiplicative error and with complexity that is independent of any matrix constants. The algorithm we introduce is accelerated and based on a primal-dual perspective. We further show how to provably obtain linear convergence using adaptive restart coupled with our method, and demonstrate its effectiveness on large-scale data via numerical experiments.

20220303 Online Balanced Experimental Design.
Summary: We consider the experimental design problem in an online environment, an important practical task for reducing the variance of estimates in randomized experiments, which allows for greater precision and, in turn, improved decision making. In this work, we present algorithms that build on recent advances in online discrepancy minimization which accommodate both arbitrary treatment probabilities and multiple treatments. The proposed algorithms are computationally efficient, minimize covariate imbalance, and include randomization which enables robustness to misspecification. We provide worst-case bounds on the expected mean squared error of the causal estimate and show that the proposed estimator is no worse than an implicit ridge regression; these bounds are within a logarithmic factor of the best known results for offline experimental design. We conclude with a detailed simulation study showing favorable results relative to complete randomization as well as to offline methods for experimental design with time complexities exceeding that of our algorithm.
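
One online discrepancy-minimization primitive in this spirit is the self-balancing walk of Alweiss, Liu, and Sawhney (2021); the sketch below uses it for randomized $\pm 1$ treatment assignment, leaning each new unit against the running covariate imbalance. The constant $c$ and the unit-norm assumption are illustrative.

```python
import numpy as np

def online_assignment(covariates, c=30.0, rng=None):
    # Self-balancing-walk-style assignment: sign each arriving unit so the
    # running imbalance w stays small, with enough randomness for robustness.
    rng = rng or np.random.default_rng()
    w = np.zeros(covariates.shape[1])
    signs = []
    for x in covariates:  # assumes ||x||_2 <= 1
        p = 0.5 - np.dot(w, x) / (2 * c)  # lean against current imbalance
        s = 1 if rng.random() < np.clip(p, 0.0, 1.0) else -1
        w += s * x
        signs.append(s)
    return np.array(signs)
```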

20220303 Data Augmentation as Feature Manipulation.
Summary: Data augmentation is a cornerstone of the machine learning pipeline, yet its theoretical underpinnings remain unclear. Is it merely a way to artificially augment the data set size? Or is it about encouraging the model to satisfy certain invariances? In this work we consider another angle, and we study the effect of data augmentation on the dynamics of the learning process. We find that data augmentation can alter the relative importance of various features, effectively making certain informative but hard-to-learn features more likely to be captured in the learning process. Importantly, we show that this effect is more pronounced for non-linear models, such as neural networks. Our main contribution is a detailed analysis of the effect of data augmentation on the learning dynamics of a two-layer convolutional neural network in the recently proposed multi-view data model of Allen-Zhu and Li [2020]. We complement this analysis with further experimental evidence that data augmentation can be viewed as feature manipulation.

20220301 Private Convex Optimization via Exponential Mechanism.
Summary: In this paper, we study private optimization problems for non-smooth convex functions $F(x)=\mathbb{E}_i f_i(x)$ on $\mathbb{R}^d$. We show that modifying the exponential mechanism by adding an $\ell_2^2$ regularizer to $F(x)$ and sampling from $\pi(x)\propto \exp(-k(F(x)+\mu\|x\|_2^2/2))$ recovers both the known optimal empirical risk and population loss under $(\epsilon,\delta)$-DP. Furthermore, we show how to implement this mechanism using $\widetilde{O}(n \min(d, n))$ queries to $f_i(x)$ for DP-SCO, where $n$ is the number of samples/users and $d$ is the ambient dimension. We also give a (nearly) matching lower bound $\widetilde{\Omega}(n \min(d, n))$ on the number of evaluation queries. Our results utilize the following tools that are of independent interest: (1) We prove Gaussian Differential Privacy (GDP) of the exponential mechanism if the loss function is strongly convex and the perturbation is Lipschitz. Our privacy bound is \emph{optimal} as it includes the privacy of the Gaussian mechanism as a special case and is proved using the isoperimetric inequality for strongly log-concave measures. (2) We show how to sample from $\exp(-F(x)-\mu \|x\|^2_2/2)$ for $G$-Lipschitz $F$ with $\eta$ error in total variation (TV) distance using $\widetilde{O}((G^2/\mu) \log^2(d/\eta))$ unbiased queries to $F(x)$. This is the first sampler whose query complexity has \emph{polylogarithmic dependence} on both dimension $d$ and accuracy $\eta$.
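
As a toy illustration of the mechanism's shape (not the paper's efficient sampler), one can write down the regularized density and draw from it with a generic random-walk Metropolis chain; the loss, step size, and chain length below are assumptions for exposition.

```python
import numpy as np

def regularized_exp_mech(F, d, k, mu, steps=20000, step=0.05, rng=None):
    # Target: pi(x) proportional to exp(-k * (F(x) + mu * ||x||^2 / 2)).
    rng = rng or np.random.default_rng()
    log_pi = lambda x: -k * (F(x) + mu * np.dot(x, x) / 2)
    x = np.zeros(d)
    for _ in range(steps):
        y = x + step * rng.normal(size=d)
        if np.log(rng.random()) < log_pi(y) - log_pi(x):  # accept/reject
            x = y
    return x

# Example: a non-smooth convex loss, F(x) = mean_i |x - a_i|.
a = np.linspace(-1, 1, 11)
print(regularized_exp_mech(lambda x: np.mean(np.abs(x[0] - a)), d=1, k=50.0, mu=0.1))
```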

20220222 Better Private Algorithms for Correlation Clustering.
Summary: In machine learning, correlation clustering is an important problem whose goal is to partition the individuals into groups that correlate with their pairwise similarities as much as possible. In this work, we revisit correlation clustering under differential privacy constraints. In particular, we improve previous results and achieve an $\widetilde{O}(n^{1.5})$ additive error compared to the optimal cost in expectation on general graphs. For unweighted complete graphs, we improve the results further and propose a more involved algorithm which achieves $\widetilde{O}(n \sqrt{\Delta^*})$ additive error, where $\Delta^*$ is the maximum degree of positive edges over all nodes.

20220220 On Optimal Early Stopping: Over-informative versus Under-informative Parametrization.
Summary: Early stopping is a simple and widely used method to prevent over-training neural networks. We develop theoretical results to reveal the relationship between the optimal early stopping time and model dimension, as well as sample size of the dataset, for certain linear models. Our results demonstrate two very different behaviors when the model dimension exceeds the number of features versus the opposite scenario. While most previous works on linear models focus on the latter setting, we observe that the dimension of the model often exceeds the number of features arising from data in common deep learning tasks, and we propose a model to study this setting. We demonstrate experimentally that our theoretical results on the optimal early stopping time correspond to the training process of deep neural networks.

20220211 Distributionally Robust Data Join.
Summary: Suppose we are given two datasets: a labeled dataset and an unlabeled dataset which also has additional auxiliary features not present in the first dataset. What is the most principled way to use these datasets together to construct a predictor? The answer should depend upon whether these datasets are generated by the same or different distributions over their mutual feature sets, and how similar the test distribution will be to either of those distributions. In many applications, the two datasets will likely follow different distributions, but both may be close to the test distribution. We introduce the problem of building a predictor which minimizes the maximum loss over all probability distributions over the original features, auxiliary features, and binary labels, whose Wasserstein distance is $r_1$ away from the empirical distribution over the labeled dataset and $r_2$ away from that of the unlabeled dataset. This can be thought of as a generalization of distributionally robust optimization (DRO), which allows for two data sources, one of which is unlabeled and may contain auxiliary features.

20220204 Optimal Spend Rate Estimation and Pacing for Ad Campaigns with Budgets.
Summary: Online ad platforms offer budget management tools for advertisers that aim to maximize the number of conversions given a budget constraint. As the volume of impressions, conversion rates, and prices vary over time, these budget management systems learn a spend plan (to find the optimal distribution of budget over time) and run a pacing algorithm which follows the spend plan. This paper considers two models for impressions and competition that vary with time: (a) an episodic model which exhibits stationarity in each episode, but each episode can be arbitrarily different from the next, and (b) a model where the distributions of prices and values change slowly over time. We present the first learning-theoretic guarantees on both the accuracy of spend plans and the resulting end-to-end budget management system. We present four main results: 1) for the episodic setting we give sample complexity bounds for the spend rate prediction problem: given $n$ samples from each episode, with high probability we have $|\widehat{\rho}_e - \rho_e| \leq \tilde{O}(\frac{1}{n^{1/3}})$, where $\rho_e$ is the optimal spend rate for the episode and $\widehat{\rho}_e$ is the estimate from our algorithm; 2) we extend the algorithm of Balseiro and Gur (2017) to operate on varying, approximate spend rates and show that the resulting combined system of optimal spend rate estimation and online pacing algorithm for episodic settings has regret that vanishes in the number of historic samples $n$ and the number of rounds $T$; 3) for non-episodic but slowly-changing distributions we show that the same approach approximates the optimal bidding strategy up to a factor dependent on the rate of change of the distributions; and 4) we provide experiments showing that our algorithm outperforms both static spend plans and non-pacing across a wide variety of settings.
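
A cartoon of the plan-then-pace pipeline: given an (estimated) target spend per round, a multiplicative pacing multiplier shades bids up or down to track the plan. The update rule and constants are illustrative assumptions, not the algorithms analyzed above.

```python
import numpy as np

def paced_bidding(values, prices, budget, plan, eta=0.1):
    # plan[t] is the target spend for round t (e.g., from estimated spend rates);
    # alpha >= 1 shades bids, increasing when realized spend runs ahead of plan.
    alpha, spend = 1.0, 0.0
    for t, (v, p) in enumerate(zip(values, prices)):
        bid = v / alpha
        cost = p if (bid >= p and spend + p <= budget) else 0.0  # second price
        spend += cost
        alpha = max(1.0, alpha + eta * (cost - plan[t]))
    return spend

rng = np.random.default_rng(2)
T = 1000
print(paced_bidding(rng.uniform(0, 1, T), rng.uniform(0, 1, T),
                    budget=100.0, plan=np.full(T, 0.1)))
```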

20220203 Sampling with Riemannian Hamiltonian Monte Carlo in a Constrained Space.
Summary: We demonstrate for the first time that ill-conditioned, non-smooth, constrained distributions in very high dimension, upwards of 100,000, can be sampled efficiently $\textit{in practice}$. Our algorithm incorporates constraints into the Riemannian version of Hamiltonian Monte Carlo and maintains sparsity. This allows us to achieve a mixing rate independent of smoothness and condition numbers. On benchmark data sets in systems biology and linear programming, our algorithm outperforms existing packages by orders of magnitude. In particular, we achieve a 1,000-fold speedup for sampling from the largest published human metabolic network (RECON3D). Our package has been incorporated into the COBRA toolbox.

20220201 A Criterion for Decoding on the BSC.
Summary: We present an approach to showing that a linear code is resilient to random errors. We use this approach to obtain decoding results for both transitive codes and Reed-Muller codes. We give three kinds of results about linear codes in general, and transitive linear codes in particular. 1) We give a tight bound on the weight distribution of every transitive linear code $C \subseteq \mathbb{F}_2^N$: $\Pr_{c \in C}[|c| = \alpha N] \leq 2^{-(1-h(\alpha)) \mathsf{dim}(C)}$. 2) We give a criterion that certifies that a linear code $C$ can be decoded on the binary symmetric channel. Let $K_s(x)$ denote the Krawtchouk polynomial of degree $s$, and let $C^\perp$ denote the dual code of $C$. We show that bounds on $\mathbb{E}_{c \in C^{\perp}}[ K_{\epsilon N}(|c|)^2]$ imply that $C$ recovers from errors on the binary symmetric channel with parameter $\epsilon$. Weaker bounds can be used to obtain list-decoding results using similar methods. One consequence of our criterion is that whenever the weight distribution of $C^\perp$ is sufficiently close to the binomial distribution in some interval around $\frac{N}{2}$, $C$ is resilient to $\epsilon$-errors. 3) We combine known estimates for the Krawtchouk polynomials with our weight bound for transitive codes, and with known weight bounds for Reed-Muller codes, to obtain list-decoding results for both these families of codes. In some regimes, our bounds for Reed-Muller codes achieve the information-theoretic optimal tradeoff between rate and list size.

20220128 Electra: Conditional Generative Model based Predicate-Aware Query Approximation.
Summary: The goal of Approximate Query Processing (AQP) is to provide very fast but "accurate enough" results for costly aggregate queries, thereby improving user experience in interactive exploration of large datasets. Recently proposed machine-learning-based AQP techniques can provide very low latency, as query execution only involves model inference as compared to traditional query processing on database clusters. However, with an increase in the number of filtering predicates (WHERE clauses), the approximation error significantly increases for these methods. Analysts often use queries with a large number of predicates for insights discovery. Thus, maintaining low approximation error is important to prevent analysts from drawing misleading conclusions. In this paper, we propose ELECTRA, a predicate-aware AQP system that can answer analytics-style queries with a large number of predicates with much smaller approximation errors. ELECTRA uses a conditional generative model that learns the conditional distribution of the data and at runtime generates a small (~1000 rows) but representative sample, on which the query is executed to compute the approximate result. Our evaluations with four different baselines on three real-world datasets show that ELECTRA provides lower AQP error for a large number of predicates compared to baselines.

20220108 Short-Step Methods Are Not Strongly Polynomial-Time.
Summary: Short-step methods are an important class of algorithms for solving convex constrained optimization problems. In this short paper, we show that under very mild assumptions on the self-concordant barrier and the width of the $\ell_2$-neighbourhood, any short-step interior-point method is not strongly polynomial-time.

20220104 Anti-concentration and the Exact Gap-Hamming Problem.
Summary: We prove anti-concentration bounds for the inner product of two independent random vectors, and use these bounds to prove lower bounds in communication complexity. We show that if $A,B$ are subsets of the cube $\{\pm 1\}^n$ with $|A| \cdot |B| \geq 2^{1.01 n}$, and $X \in A$ and $Y \in B$ are sampled independently and uniformly, then the inner product $\langle X,Y \rangle$ takes on any fixed value with probability at most $O(1/\sqrt{n})$. In fact, we prove the following stronger "smoothness" statement: $$ \max_{k} \big| \Pr[\langle X,Y \rangle = k] - \Pr[\langle X,Y \rangle = k+4]\big| \leq O(1/n).$$ We use these results to prove that the exact gap-Hamming problem requires linear communication, resolving an open problem in communication complexity. We also conclude anti-concentration for structured distributions with low entropy. If $x \in \mathbb{Z}^n$ has no zero coordinates, and $B \subseteq \{\pm 1\}^n$ corresponds to a subspace of $\mathbb{F}_2^n$ of dimension $0.51n$, then $\max_k \Pr[\langle x,Y \rangle = k] \leq O(\sqrt{\ln (n)/n})$.

20211230 Deniable Encryption in a Quantum World.
Summary: (Sender-)deniable encryption provides a very strong privacy guarantee: a sender who is coerced by an attacker into "opening" their ciphertext after the fact is able to generate "fake" local random choices that are consistent with any plaintext of their choice. The only known fully efficient constructions of public-key deniable encryption rely on indistinguishability obfuscation (iO) (which currently can only be based on sub-exponential hardness assumptions). In this work, we study (sender-)deniable encryption in a setting where the encryption procedure is a quantum algorithm, but the ciphertext is classical. First, we propose a quantum analog of the classical definition in this setting. We give a fully efficient construction satisfying this definition, assuming the quantum hardness of the Learning with Errors (LWE) problem. Second, we show that quantum computation unlocks a fundamentally stronger form of deniable encryption, which we call perfect unexplainability. The primitive at the heart of unexplainability is a quantum computation for which there is provably no efficient way, such as exhibiting the "history of the computation", to establish that the output was indeed the result of the computation. We give a construction which is secure in the random oracle model, assuming the quantum hardness of LWE. Crucially, this notion implies a form of protection against coercion "before the fact", a property that is impossible to achieve classically.

20211223 Analysis of Langevin Monte Carlo from Poincaré to Log-Sobolev.
Summary: Classically, the continuous-time Langevin diffusion converges exponentially fast to its stationary distribution $\pi$ under the sole assumption that $\pi$ satisfies a Poincar\'e inequality. Using this fact to provide guarantees for the discrete-time Langevin Monte Carlo (LMC) algorithm, however, is considerably more challenging due to the need for working with chi-squared or R\'enyi divergences, and prior works have largely focused on strongly log-concave targets. In this work, we provide the first convergence guarantees for LMC assuming that $\pi$ satisfies either a Lata{\l}a-Oleszkiewicz or modified log-Sobolev inequality, which interpolates between the Poincar\'e and log-Sobolev settings. Unlike prior works, our results allow for weak smoothness and do not require convexity or dissipativity conditions.
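
For reference, the LMC iteration analyzed here is the Euler-Maruyama discretization of the Langevin diffusion; a minimal sketch (with an arbitrary illustrative potential) follows.

```python
import numpy as np

def lmc(grad_V, x0, h, n_steps, rng=None):
    # Langevin Monte Carlo for pi ~ exp(-V):
    #   x_{k+1} = x_k - h * grad_V(x_k) + sqrt(2h) * N(0, I).
    rng = rng or np.random.default_rng()
    x = np.array(x0, dtype=float)
    for _ in range(n_steps):
        x = x - h * grad_V(x) + np.sqrt(2 * h) * rng.normal(size=x.shape)
    return x

# Toy potential V(x) = ||x||^{1.5}: convex but neither strongly convex nor
# globally smooth at the origin (illustrative of weaker assumptions).
grad_V = lambda x: 1.5 * x / max(np.linalg.norm(x), 1e-12) ** 0.5
print(lmc(grad_V, np.ones(5), h=1e-2, n_steps=5000))
```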

20211213 A gradient sampling method with complexity guarantees for Lipschitz functions in high and low dimensions.
Summary: Zhang et al. introduced a novel modification of Goldstein's classical subgradient method, with an efficiency guarantee of $O(\varepsilon^{-4})$ for minimizing Lipschitz functions. Their work, however, makes use of a non-standard subgradient oracle model and requires the function to be directionally differentiable. In this paper, we show that both of these assumptions can be dropped by simply adding a small random perturbation in each step of their algorithm. The resulting method works on any Lipschitz function whose value and gradient can be evaluated at points of differentiability. We additionally present a new cutting plane algorithm that achieves better efficiency in low dimensions: $O(d\varepsilon^{-3})$ for Lipschitz functions and $O(d\varepsilon^{-2})$ for those that are weakly convex.
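
The flavor of the randomized step is easy to convey: average gradients at random points of the $\delta$-ball (where a Lipschitz $f$ is differentiable almost surely) to form an approximate Goldstein subgradient, and move against it. This sketch illustrates the idea only; it is not the paper's exact descent subroutine or its parameter choices.

```python
import numpy as np

def perturbed_goldstein_step(grad_f, x, delta, m=32, rng=None):
    # Sample m points uniformly from the delta-ball around x, average their
    # gradients (approximately an element of the Goldstein delta-subdifferential's
    # convex hull), and take a normalized step against that direction.
    rng = rng or np.random.default_rng()
    u = rng.normal(size=(m, x.size))
    u /= np.linalg.norm(u, axis=1, keepdims=True)
    r = rng.random(size=(m, 1)) ** (1.0 / x.size)  # uniform radii in the ball
    g = np.mean([grad_f(x + delta * ri * ui) for ri, ui in zip(r, u)], axis=0)
    return x - delta * g / max(np.linalg.norm(g), 1e-12)
```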

20211201 Faster Maxflow via Improved Dynamic Spectral Vertex Sparsifiers.
Summary: We make several advances broadly related to the maintenance of electrical flows in weighted graphs undergoing dynamic resistance updates, including: 1. More efficient dynamic spectral vertex sparsification, achieved by faster length estimation of random walks in weighted graphs using Morris counters [Morris 1978, Nelson-Yu 2020]. 2. A direct reduction from detecting edges with large energy in dynamic electric flows to dynamic spectral vertex sparsifiers. 3. A procedure for turning algorithms for estimating a sequence of vectors under updates from an oblivious adversary to one that tolerates adaptive adversaries, via the Gaussian mechanism from differential privacy. Combining these pieces with modifications to prior robust interior point frameworks gives an algorithm that on graphs with $m$ edges computes a min-cost flow with edge costs and capacities in $[1, U]$ in time $\widetilde{O}(m^{3/2-1/58} \log^2 U)$. In prior and independent work, [Axiotis-M\k{a}dry-Vladu FOCS 2021] also obtained an improved algorithm for sparse min-cost flows on capacitated graphs. Our algorithm implies a $\widetilde{O}(m^{3/2-1/58} \log U)$ time max-flow algorithm, improving over the $\widetilde{O}(m^{3/2-1/328}\log U)$ time max-flow algorithm of [Gao-Liu-Peng FOCS 2021].

20211129 Online MAP Inference and Learning for Nonsymmetric Determinantal Point Processes.
Summary: In this paper, we introduce the online and streaming MAP inference and learning problems for Nonsymmetric Determinantal Point Processes (NDPPs), where data points arrive in an arbitrary order and the algorithms are constrained to use a single pass over the data as well as sublinear memory. The online setting has an additional requirement of maintaining a valid solution at any point in time. For solving these new problems, we propose algorithms with theoretical guarantees, evaluate them on several real-world datasets, and show that they give comparable performance to state-of-the-art offline algorithms that store the entire data in memory and take multiple passes over it.

20211124 Matroid Partition Property and the Secretary Problem.
Summary: A matroid $\mathcal{M}$ on a set $E$ of elements has the $\alpha$-partition property, for some $\alpha>0$, if it is possible to (randomly) construct a partition matroid $\mathcal{P}$ on (a subset of) elements of $\mathcal{M}$ such that every independent set of $\mathcal{P}$ is independent in $\mathcal{M}$ and for any weight function $w:E\to\mathbb{R}_{\geq 0}$, the expected value of the optimum of the matroid secretary problem on $\mathcal{P}$ is at least an $\alpha$-fraction of the optimum on $\mathcal{M}$. We show that the complete binary matroid, $\mathcal{B}_d$ on $\mathbb{F}_2^d$, does not satisfy the $\alpha$-partition property for any constant $\alpha>0$ (independent of $d$). Furthermore, we refute a recent conjecture of B\'erczi, Schwarcz, and Yamaguchi by showing the same matroid is $2^d/d$-colorable but cannot be reduced to an $\alpha 2^d/d$-colorable partition matroid for any $\alpha$ that is sublinear in $d$.

20211121 Multiscale entropic regularization for MTS on general metric spaces.
Summary: We present an $O((\log n)^2)$-competitive algorithm for metrical task systems (MTS) on any $n$-point metric space that is also $1$-competitive for service costs. This matches the competitive ratio achieved by Bubeck, Cohen, Lee, and Lee (2019) and the refined competitive ratios obtained by Coester and Lee (2019). Those algorithms work by first randomly embedding the metric space into an ultrametric and then solving MTS there. In contrast, our algorithm is cast as regularized gradient descent where the regularizer is a multiscale metric entropy defined directly on the metric space. This answers an open question of Bubeck (Highlights of Algorithms, 2019).

20211104 A New Framework for Matrix Discrepancy: Partial Coloring Bounds via Mirror Descent.
Summary: Motivated by the Matrix Spencer conjecture, we study the problem of finding signed sums of matrices with a small matrix norm. A well-known strategy to obtain these signs is to prove, given matrices $A_1, \dots, A_n \in \mathbb{R}^{m \times m}$, a Gaussian measure lower bound of $2^{-O(n)}$ for a scaling of the discrepancy body $\{x \in \mathbb{R}^n: \|\sum_{i=1}^n x_i A_i\| \leq 1\}$. We show this is equivalent to covering its polar with $2^{O(n)}$ translates of the cube $\frac{1}{n} B^n_\infty$, and construct such a cover via mirror descent. As applications of our framework, we show: $\bullet$ Matrix Spencer for Low-Rank Matrices. If the matrices satisfy $\|A_i\|_{\mathrm{op}} \leq 1$ and $\mathrm{rank}(A_i) \leq r$, we can efficiently find a coloring $x \in \{\pm 1\}^n$ with discrepancy $\|\sum_{i=1}^n x_i A_i\|_{\mathrm{op}} \lesssim \sqrt{n \log (\min(rm/n, r))}$. This improves upon the naive $O(\sqrt{n \log r})$ bound for random coloring and proves the matrix Spencer conjecture when $r m \leq n$. $\bullet$ Matrix Spencer for Block Diagonal Matrices. For block diagonal matrices with $\|A_i\|_{\mathrm{op}} \leq 1$ and block size $h$, we can efficiently find a coloring $x \in \{\pm 1\}^n$ with $\|\sum_{i=1}^n x_i A_i\|_{\mathrm{op}} \lesssim \sqrt{n \log (hm/n)}$. Using our proof, we reduce the matrix Spencer conjecture to the existence of a $O(\log(m/n))$ quantum relative entropy net on the spectraplex. $\bullet$ Matrix Discrepancy for Schatten Norms. We generalize our discrepancy bound for matrix Spencer to Schatten norms $2 \le p \leq q$. Given $\|A_i\|_{S_p} \leq 1$ and $\mathrm{rank}(A_i) \leq r$, we can efficiently find a partial coloring $x \in [-1,1]^n$ with $|\{i : |x_i| = 1\}| \ge n/2$ and $\|\sum_{i=1}^n x_i A_i\|_{S_q} \lesssim \sqrt{n \min(p, \log(rk))} \cdot k^{1/p-1/q}$, where $k := \min(1,m/n)$.

20211104 Exact Representation of Sparse Networks with Symmetric Nonnegative Embeddings.
Summary: Many models for undirected graphs are based on factorizing the graph's adjacency matrix; these models find a vector representation of each node such that the predicted probability of a link between two nodes increases with the similarity (dot product) of their associated vectors. Recent work has shown that these models are unable to capture key structures in real-world graphs, particularly heterophilous structures, wherein links occur between dissimilar nodes. In contrast, a factorization with two vectors per node, based on logistic principal components analysis (LPCA), has been proven not only to represent such structures, but also to provide exact low-rank factorization of any graph with bounded max degree. However, this bound has limited applicability to real-world networks, which often have power-law degree distributions with high max degree. Further, the LPCA model lacks interpretability since its asymmetric factorization does not reflect the undirectedness of the graph. We address these issues in two ways. First, we prove a new bound for the LPCA model in terms of arboricity rather than max degree; this greatly increases the bound's applicability to many sparse real-world networks. Second, we propose an alternative graph model whose factorization is symmetric and nonnegative, which allows for link predictions to be interpreted in terms of node clusters. We show that the bounds for exact representation in the LPCA model extend to our new model. On the empirical side, our model is optimized effectively on real-world graphs with gradient descent on a cross-entropy loss. We demonstrate its effectiveness on a variety of foundational tasks, such as community detection and link prediction.
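
A minimal sketch of the kind of training loop described above, using a plain sigmoid link and projected gradient descent to keep the symmetric factor nonnegative; the link function, learning rate, and loss details are assumptions, not necessarily the paper's exact model.

```python
import numpy as np

def fit_symmetric_nonneg(A, rank, steps=2000, lr=0.05, rng=None):
    # Predict P = sigmoid(W W^T) and minimize cross-entropy against the
    # adjacency matrix A, projecting W onto the nonnegative orthant each step.
    rng = rng or np.random.default_rng(0)
    n = A.shape[0]
    W = np.abs(rng.normal(scale=0.1, size=(n, rank)))
    for _ in range(steps):
        P = 1.0 / (1.0 + np.exp(-(W @ W.T)))
        G = 2.0 * (P - A) @ W  # gradient of cross-entropy w.r.t. W (symmetric case)
        W = np.maximum(W - lr * G / n, 0.0)  # projected gradient step
    return W
```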

20211102 Improved Iteration Complexities for Overconstrained $p$-Norm Regression.
Summary: In this paper we obtain improved iteration complexities for solving $\ell_p$ regression. We provide methods which, given any full-rank $\mathbf{A} \in \mathbb{R}^{n \times d}$ with $n \geq d$, $b \in \mathbb{R}^n$, and $p \geq 2$, solve $\min_{x \in \mathbb{R}^d} \left\|\mathbf{A} x - b\right\|_p$ to high precision in time dominated by that of solving $\widetilde{O}_p(d^{\frac{p-2}{3p-2}})$ linear systems in $\mathbf{A}^\top \mathbf{D} \mathbf{A}$ for positive diagonal matrices $\mathbf{D}$. This improves upon the previous best iteration complexity of $\widetilde{O}_p(n^{\frac{p-2}{3p-2}})$ (Adil, Kyng, Peng, Sachdeva 2019). As a corollary, we obtain an $\widetilde{O}(d^{1/3}\epsilon^{-2/3})$ iteration complexity for approximate $\ell_\infty$ regression. Further, for $q \in (1, 2]$ and dual norm $q = p/(p-1)$ we provide an algorithm that solves $\ell_q$ regression in $\widetilde{O}(d^{\frac{p-2}{2p-2}})$ iterations. To obtain this result we analyze row reweightings (closely inspired by $\ell_p$-norm Lewis weights) which allow a closer connection between $\ell_2$ and $\ell_p$ regression. We provide adaptations of two different iterative optimization frameworks which leverage this connection and yield our results. The first framework is based on iterative refinement and multiplicative-weights-based width reduction, and the second framework is based on highly smooth acceleration. Both approaches yield $\widetilde{O}_p(d^{\frac{p-2}{3p-2}})$ iteration methods, but the second has a polynomial dependence on $p$ (as opposed to the exponential dependence of the first algorithm) and provides a new alternative to the previous state-of-the-art methods for $\ell_p$ regression for large $p$.

20211029 Computing Lewis Weights to High Precision.
Summary: We present an algorithm for computing approximate $\ell_p$ Lewis weights to high precision. Given a full-rank $\mathbf{A} \in \mathbb{R}^{m \times n}$ with $m \geq n$ and a scalar $p>2$, our algorithm computes $\epsilon$-approximate $\ell_p$ Lewis weights of $\mathbf{A}$ in $\widetilde{O}_p(\log(1/\epsilon))$ iterations; the cost of each iteration is linear in the input size plus the cost of computing the leverage scores of $\mathbf{D}\mathbf{A}$ for diagonal $\mathbf{D} \in \mathbb{R}^{m \times m}$. Prior to our work, such a computational complexity was known only for $p \in (0, 4)$ [Cohen-Peng 2015], and combined with this result, our work yields the first polylogarithmic-depth polynomial-work algorithm for the problem of computing $\ell_p$ Lewis weights to high precision for all constant $p > 0$. An important consequence of this result is also the first polylogarithmic-depth polynomial-work algorithm for computing a nearly optimal self-concordant barrier for a polytope.
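
For background, the defining fixed point of $\ell_p$ Lewis weights admits a simple iteration (contractive for $p \in (0,4)$ by Cohen-Peng 2015); the sketch below illustrates that fixed point with dense linear algebra and is not the paper's high-precision method for $p > 2$.

```python
import numpy as np

def lewis_weights(A, p, iters=100):
    # Iterate w_i <- (a_i^T (A^T W^{1 - 2/p} A)^{-1} a_i)^{p/2}; at the fixed
    # point, w_i equals the leverage score of row i under the reweighting.
    m, _ = A.shape
    w = np.ones(m)
    for _ in range(iters):
        M = A.T @ (w[:, None] ** (1 - 2 / p) * A)
        lev = np.einsum("ij,ij->i", A @ np.linalg.inv(M), A)
        w = np.clip(lev, 1e-12, None) ** (p / 2)
    return w
```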

20211013 Differentially Private Fine-tuning of Language Models.
Summary: We give simpler, sparser, and faster algorithms for differentially private fine-tuning of large-scale pretrained language models, which achieve the state-of-the-art privacy-versus-utility tradeoffs on many standard NLP tasks. We propose a meta-framework for this problem, inspired by the recent success of highly parameter-efficient methods for fine-tuning. Our experiments show that differentially private adaptations of these approaches outperform previous private algorithms in three important dimensions: utility, privacy, and the computational and memory cost of private training. On many commonly studied datasets, the utility of private models approaches that of non-private models. For example, on the MNLI dataset we achieve an accuracy of $87.8\%$ using RoBERTa-Large and $83.5\%$ using RoBERTa-Base with a privacy budget of $\epsilon = 6.7$. In comparison, absent privacy constraints, RoBERTa-Large achieves an accuracy of $90.2\%$. Our findings are similar for natural language generation tasks. Privately fine-tuning GPT-2-Small, GPT-2-Medium, GPT-2-Large, and GPT-2-XL on DART achieves BLEU scores of 38.5, 42.0, 43.1, and 43.8 respectively (privacy budget of $\epsilon = 6.8$, $\delta =$ 1e-5), whereas the non-private baseline is $48.1$. All our experiments suggest that larger models are better suited for private fine-tuning: while they are well known to achieve superior accuracy non-privately, we find that they also better maintain their accuracy when privacy is introduced.

20211005 Approximate $\mathrm{CVP}$ in time $2^{0.802 \, n}$ -- now in any norm!.
Summary: We show that a constant-factor approximation of the shortest and closest lattice vector problem in any norm can be computed in time $2^{0.802\, n}$. This contrasts with the corresponding $2^n$-time, (gap-)SETH-based lower bounds for these problems, which apply even for small constant approximation. For both problems, $\mathrm{SVP}$ and $\mathrm{CVP}$, we reduce to the case of the Euclidean norm. A key technical ingredient in that reduction is a twist of Milman's construction of an $M$-ellipsoid which approximates any symmetric convex body $K$ with an ellipsoid $\mathcal{E}$ so that $2^{\varepsilon n}$ translates of a constant scaling of $\mathcal{E}$ can cover $K$ and vice versa.

20210919 Multiscale Manifold Warping.
Summary: Many real-world applications require aligning two temporal sequences, including bioinformatics, handwriting recognition, activity recognition, and human-robot coordination. Dynamic Time Warping (DTW) is a popular alignment method, but it can fail on high-dimensional real-world data where the dimensions of aligned sequences are often unequal. In this paper, we show that exploiting the multiscale manifold latent structure of real-world data can yield improved alignment. We introduce a novel framework called Warping on Wavelets (WOW) that integrates DTW with a multiscale manifold learning framework called Diffusion Wavelets. We present a theoretical analysis of the WOW family of algorithms and show that it outperforms previous state-of-the-art methods, such as canonical time warping (CTW) and manifold warping, on several real-world datasets.
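
For reference, classical DTW is the following $O(nm)$ dynamic program over monotone alignments; WOW, roughly speaking, runs such an alignment on multiscale diffusion-wavelet features rather than raw frames. The distance function here is an illustrative default.

```python
import numpy as np

def dtw(X, Y, dist=lambda a, b: float(np.linalg.norm(a - b))):
    # D[i, j] = cost of the best alignment of X[:i] with Y[:j].
    n, m = len(X), len(Y)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            c = dist(X[i - 1], Y[j - 1])
            D[i, j] = c + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]
```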

20210827 Auctions and Peer Prediction for Academic Peer Review.
Summary: Peer-reviewed publications are considered the gold standard in certifying and disseminating ideas that a research community considers valuable. However, we identify two major drawbacks of the current system: (1) the overwhelming demand for reviewers due to a large volume of submissions, and (2) the lack of incentives for reviewers to participate and expend the necessary effort to provide high-quality reviews. In this work, we adopt a mechanism-design approach to propose improvements to the peer review process, tying together the paper submission and review processes and simultaneously incentivizing high-quality submissions and reviews. In the submission stage, authors participate in a VCG auction for review slots by submitting their papers along with a bid that represents their expected value for having their paper reviewed. For the reviewing stage, we propose a novel peer prediction mechanism (H-DIPP) building on recent work in the information elicitation literature, which incentivizes participating reviewers to provide honest and effortful reviews. The revenue raised in the submission stage auction is used to pay reviewers based on the quality of their reviews in the reviewing stage.

20210818 A Tighter Relation Between Hereditary Discrepancy and Determinant Lower Bound.
Summary: In seminal work, Lov\'asz, Spencer, and Vesztergombi [European J. Combin., 1986] proved a lower bound for the hereditary discrepancy of a matrix $A \in \mathbb{R}^{m \times n}$ in terms of the maximum $\det(B)^{1/k}$ over all $k \times k$ submatrices $B$ of $A$. We show algorithmically that this determinant lower bound can be off by at most a factor of $O(\sqrt{\log (m) \cdot \log (n)})$, improving over the previous bound of $O(\log(mn) \cdot \sqrt{\log (n)})$ given by Matou\v{s}ek [Proc. of the AMS, 2013]. Our result immediately implies $\mathrm{herdisc}(\mathcal{F}_1 \cup \mathcal{F}_2) \leq O(\sqrt{\log (m) \cdot \log (n)}) \cdot \max(\mathrm{herdisc}(\mathcal{F}_1), \mathrm{herdisc}(\mathcal{F}_2))$, for any two set systems $\mathcal{F}_1, \mathcal{F}_2$ over $[n]$ satisfying $|\mathcal{F}_1 \cup \mathcal{F}_2| = m$. Our bounds are tight up to constants when $m = O(\mathrm{poly}(n))$ due to a construction of P\'alv\"olgyi [Discrete Comput. Geom., 2010] or the counterexample to Beck's three permutation conjecture by Newman, Neiman and Nikolov [FOCS, 2012].

20210810 Optimal learning of quantum Hamiltonians from high-temperature Gibbs states.
Summary: We study the problem of learning a Hamiltonian $H$ to precision $\varepsilon$, supposing we are given copies of its Gibbs state $\rho=\exp(-\beta H)/\operatorname{Tr}(\exp(-\beta H))$ at a known inverse temperature $\beta$. Anshu, Arunachalam, Kuwahara, and Soleimanifar (Nature Physics, 2021, arXiv:2004.07266) recently studied the sample complexity (number of copies of $\rho$ needed) of this problem for geometrically local $N$-qubit Hamiltonians. In the high-temperature (low $\beta$) regime, their algorithm has sample complexity $\mathrm{poly}(N, 1/\beta, 1/\varepsilon)$ and can be implemented with polynomial, but suboptimal, time complexity. In this paper, we study the same question for a more general class of Hamiltonians. We show how to learn the coefficients of a Hamiltonian to error $\varepsilon$ with sample complexity $S = O(\log N/(\beta\varepsilon)^{2})$ and time complexity linear in the sample size, $O(S N)$. Furthermore, we prove a matching lower bound showing that our algorithm's sample complexity is optimal, and hence our time complexity is also optimal. In the appendix, we show that virtually the same algorithm can be used to learn $H$ from a real-time evolution unitary $e^{-it H}$ in a small $t$ regime with similar sample and time complexity.

20210810 Tutorial on the Robust Interior Point Method.
Summary: We give a short, self-contained proof of the interior point method and its robust version.

20210720 Nonexistence of annular separators in geometric graphs.
Summary: Benjamini and Papasoglou (2011) showed that planar graphs with uniform polynomial volume growth admit $1$-dimensional annular separators: The vertices at graph distance $R$ from any vertex can be separated from those at distance $2R$ by removing at most $O(R)$ vertices. They asked whether geometric $d$-dimensional graphs with uniform polynomial volume growth similarly admit $(d-1)$-dimensional annular separators when $d > 2$. We show that this fails in a strong sense: For any $d \geq 3$ and every $s \geq 1$, there is a collection of interior-disjoint spheres in $\mathbb{R}^d$ whose tangency graph $G$ has uniform polynomial growth, but such that all annular separators in $G$ have cardinality at least $R^s$.

20210713 Tight bounds on the Fourier growth of bounded functions on the hypercube.
Summary: We give tight bounds on the degree-$\ell$ homogeneous parts $f_\ell$ of a bounded function $f$ on the cube. We show that if $f: \{\pm 1\}^n \rightarrow [-1,1]$ has degree $d$, then $\| f_\ell \|_\infty$ is bounded by $d^\ell/\ell!$, and $\| \hat{f}_\ell \|_1$ is bounded by $d^\ell e^{\binom{\ell+1}{2}} n^{\frac{\ell-1}{2}}$. We describe applications to pseudorandomness and learning theory. We use similar methods to generalize the classical Pisier's inequality from convex analysis. Our analysis involves properties of real-rooted polynomials that may be useful elsewhere.

20210713 Multitoken Markov Game with Switching Costs.
Summary: We study a general Markov game with metric switching costs: in each round, the player adaptively chooses one of several Markov chains to advance with the objective of minimizing the expected cost for at least $k$ chains to reach their target states. If the player decides to play a different chain, an additional switching cost is incurred. The special case in which there is no switching cost was solved optimally by Dumitriu, Tetali, and Winkler~\cite{DTW03} by a variant of the celebrated Gittins Index for the classical multi-armed bandit (MAB) problem with Markovian rewards \cite{Git74,Git79}. However, for the Markovian multi-armed bandit with non-trivial switching costs, even if the switching cost is a constant, the classic paper by Banks and Sundaram \cite{BS94} showed that no index strategy can be optimal. In this paper, we complement their result and show there is a simple index strategy that achieves a constant approximation factor if the switching cost is constant and $k=1$. To the best of our knowledge, this index strategy is the first strategy that achieves a constant approximation factor for a general Markovian MAB variant with switching costs. For general metrics, we propose a more involved constant-factor approximation algorithm, via a non-trivial reduction to the stochastic $k$-TSP problem, in which a Markov chain is approximated by a random variable. Our analysis makes extensive use of various interesting properties of the Gittins index.

20210712 Hidden Cosets and Applications to Unclonable Cryptography.
Summary: In this work, we study a generalization of hidden subspace states to hidden coset states (first introduced by Aaronson and Christiano [STOC '12]). This notion was considered independently by Vidick and Zhang [Eurocrypt '21], in the context of proofs of quantum knowledge from quantum money schemes. We explore unclonable properties of coset states and several applications: (1) We show that assuming indistinguishability obfuscation (iO), hidden coset states possess a certain direct product hardness property, which immediately implies a tokenized signature scheme in the plain model. Previously, it was known only relative to an oracle, from a work of Ben-David and Sattath [QCrypt '17]. (2) Combining a tokenized signature scheme with extractable witness encryption, we give a construction of an unclonable decryption scheme in the plain model. The latter primitive was recently proposed by Georgiou and Zhandry [ePrint '20], who gave a construction relative to a classical oracle. (3) We conjecture that coset states satisfy a certain natural (information-theoretic) monogamy-of-entanglement property. Assuming this conjecture is true, we remove the requirement for extractable witness encryption in our unclonable decryption construction, by relying instead on compute-and-compare obfuscation for the class of unpredictable distributions. This conjecture was later proved by Culf and Vidick in a follow-up work. (4) Finally, we give a construction of a copy-protection scheme for pseudorandom functions (PRFs) in the plain model. Our scheme is secure either assuming iO, OWF, and extractable witness encryption, or assuming iO, OWF, compute-and-compare obfuscation for the class of unpredictable distributions, and the conjectured monogamy property mentioned above. This is the first example of a copy-protection scheme with provable security in the plain model for a class of functions that is not evasive.

20210709 Optimal Space and Time for Streaming Pattern Matching.
Summary: In this work, we study the longest common substring, pattern matching, and wildcard pattern matching problems in the asymmetric streaming model. In this streaming model, we have random access to one string and streaming access to the other. We present streaming algorithms with provable guarantees for these three fundamental problems. In particular, our algorithms for pattern matching improve the upper bound and beat the unconditional memory lower bounds that hold for randomized and deterministic algorithms in the standard streaming model. In addition, we present algorithms for wildcard pattern matching in the asymmetric streaming model that have optimal space and time.

20210628 The Convergence Rate of SGD's Final Iterate: Analysis on Dimension Dependence.
Summary: Stochastic Gradient Descent (SGD) is among the simplest and most popular methods in optimization. The convergence rate for SGD has been extensively studied and tight analyses have been established for the running average scheme, but the suboptimality of the final iterate is still not well understood. Shamir and Zhang (2013) gave the best known upper bound for the final iterate of SGD minimizing nonsmooth convex functions, which is $O(\log T/\sqrt{T})$ for Lipschitz convex functions and $O(\log T/ T)$ under an additional strong convexity assumption. The best known lower bounds, however, are worse than the upper bounds by a factor of $\log T$. Harvey et al. (2019) gave matching lower bounds, but their construction requires dimension $d = T$. Koren and Segal (2020) then asked how to characterize the final-iterate convergence of SGD in the constant dimension setting. In this paper, we answer this question in the more general setting for any $d\leq T$, proving $\Omega(\log d/\sqrt{T})$ and $\Omega(\log d/T)$ lower bounds for the suboptimality of the final iterate of SGD in minimizing nonsmooth Lipschitz convex and strongly convex functions, respectively, with standard step size schedules. Our results provide the first general dimension-dependent lower bound on the convergence of SGD's final iterate, partially resolving a COLT open question raised by Koren and Segal (2020). We also present further evidence that the correct rate in one dimension should be $\Theta(1/\sqrt{T})$, such as a proof of a tight $O(1/\sqrt{T})$ upper bound for one-dimensional special cases in settings more general than theirs.
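
The final-iterate-versus-average gap is easy to observe empirically. Below is a toy experiment of our own (not the paper's construction) on the one-dimensional nonsmooth objective $f(x) = |x|$ with the standard step size $\eta_t = 1/\sqrt{t}$:

```python
import numpy as np

rng = np.random.default_rng(0)
T = 10_000
x, avg = 1.0, 0.0
for t in range(1, T + 1):
    g = np.sign(x) + rng.normal(0, 1)   # noisy subgradient of f(x) = |x|
    x -= g / np.sqrt(t)                 # step size eta_t = 1 / sqrt(t)
    avg += (x - avg) / t                # running average of the iterates
# f* = 0, so |x| is exactly the suboptimality
print(f"final iterate suboptimality:   {abs(x):.4f}")
print(f"average iterate suboptimality: {abs(avg):.4f}")
```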

20210622 Robust Regression Revisited: Acceleration and Improved Estimation Rates.
Summary: We study fast algorithms for statistical regression problems under the strong contamination model, where the goal is to approximately optimize a generalized linear model (GLM) given adversarially corrupted samples. Prior works in this line of research were based on the robust gradient descent framework of Prasad et al., a first-order method using biased gradient queries, or the Sever framework of Diakonikolas et al., an iterative outlier-removal method calling a stationary point finder. We present nearly-linear time algorithms for robust regression problems with improved runtime or estimation guarantees compared to the state-of-the-art. For the general case of smooth GLMs (e.g. logistic regression), we show that the robust gradient descent framework of Prasad et al. can be accelerated, and show our algorithm extends to optimizing the Moreau envelopes of Lipschitz GLMs (e.g. support vector machines), answering several open questions in the literature. For the well-studied case of robust linear regression, we present an alternative approach obtaining improved estimation rates over prior nearly-linear time algorithms. Interestingly, our method starts with an identifiability proof introduced in the context of the sum-of-squares algorithm of Bakshi and Prasad, which achieved optimal error rates while requiring large polynomial runtime and sample complexity. We reinterpret their proof within the Sever framework and obtain a dramatically faster and more sample-efficient algorithm under fewer distributional assumptions.

20210617 Stochastic Bias-Reduced Gradient Methods.
Summary: We develop a new primitive for stochastic optimization: a low-bias, low-cost estimator of the minimizer $x_\star$ of any Lipschitz strongly convex function. In particular, we use a multilevel Monte Carlo approach due to Blanchet and Glynn to turn any optimal stochastic gradient method into an estimator of $x_\star$ with bias $\delta$, variance $O(\log(1/\delta))$, and an expected sampling cost of $O(\log(1/\delta))$ stochastic gradient evaluations. As an immediate consequence, we obtain cheap and nearly unbiased gradient estimators for the Moreau-Yoshida envelope of any Lipschitz convex function, allowing us to perform dimension-free randomized smoothing. We demonstrate the potential of our estimator through four applications. First, we develop a method for minimizing the maximum of $N$ functions, improving on recent results and matching a lower bound up to logarithmic factors. Second and third, we recover state-of-the-art rates for projection-efficient and gradient-efficient optimization using simple algorithms with a transparent analysis. Finally, we show that an improved version of our estimator would yield a nearly linear-time, optimal-utility, differentially private nonsmooth stochastic optimization method.
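
A minimal sketch of the multilevel Monte Carlo debiasing idea, applied to the averaged SGD iterate on a toy quadratic. This is our own illustration under simplified assumptions; the level probabilities and constants are not the paper's.

```python
import numpy as np

rng = np.random.default_rng(1)

def sgd_two_points(n_half, n_full):
    """One SGD run on f(x) = (x - 3)^2 / 2; return the averaged iterate
    after n_half steps and after n_full steps (coupled: same noise)."""
    x, avg, avg_half = 0.0, 0.0, 0.0
    for t in range(1, n_full + 1):
        g = (x - 3.0) + rng.normal()        # stochastic gradient
        x -= g / (t + 1)
        avg += (x - avg) / t
        if t == n_half:
            avg_half = avg
    return avg_half, avg

def mlmc_estimate(j_max=10):
    """Draw a random level J with P(J = j) proportional to 2^{-j}; the
    reweighted correction telescopes in expectation, so the estimator's
    bias matches a 2^{j_max}-step run at only O(j_max) expected cost."""
    p = np.array([2.0 ** -j for j in range(1, j_max + 1)])
    p /= p.sum()
    J = int(rng.choice(np.arange(1, j_max + 1), p=p))
    y_base, _ = sgd_two_points(1, 1)                        # one-step base
    y_half, y_full = sgd_two_points(2 ** (J - 1), 2 ** J)   # coupled levels
    return y_base + (y_full - y_half) / p[J - 1]

print(np.mean([mlmc_estimate() for _ in range(500)]))  # approaches x* = 3.0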

20210610 Lower Bounds on Metropolized Sampling Methods for Well-Conditioned Distributions.
Summary: We give lower bounds on the performance of two of the most popular sampling methods in practice, the Metropolis-adjusted Langevin algorithm (MALA) and multi-step Hamiltonian Monte Carlo (HMC) with a leapfrog integrator, when applied to well-conditioned distributions. Our main result is a nearly-tight lower bound of $\widetilde{\Omega}(\kappa d)$ on the mixing time of MALA from an exponentially warm start, matching a line of algorithmic results up to logarithmic factors and answering an open question of Chewi et al. We also show that a polynomial dependence on dimension is necessary for the relaxation time of HMC under any number of leapfrog steps, and bound the gains achievable by changing the step count. Our HMC analysis draws upon a novel connection between leapfrog integration and Chebyshev polynomials, which may be of independent interest.

20210607 A Matrix Trickle-Down Theorem on Simplicial Complexes and Applications to Sampling Colorings.
Summary: We show that the natural Glauber dynamics mixes rapidly and generates a random proper edge-coloring of a graph with maximum degree $\Delta$ whenever the number of colors is at least $q\geq (\frac{10}{3} + \epsilon)\Delta$, where $\epsilon>0$ is arbitrary and the maximum degree satisfies $\Delta \geq C$ for a constant $C = C(\epsilon)$ depending only on $\epsilon$. For edge-colorings, this improves upon prior work \cite{Vig99, CDMPP19} which show rapid mixing when $q\geq (\frac{11}{3}-\epsilon_0 ) \Delta$, where $\epsilon_0 \approx 10^{-5}$ is a small fixed constant. At the heart of our proof, we establish a matrix trickle-down theorem, generalizing Oppenheim's influential result, as a new technique to prove that a high-dimensional simplicial complex is a local spectral expander.

20210605 Numerical Composition of Differential Privacy.
Summary: We give a fast algorithm to optimally compose privacy guarantees of differentially private (DP) algorithms to arbitrary accuracy. Our method is based on the notion of privacy loss random variables to quantify the privacy loss of DP algorithms. The running time and memory needed for our algorithm to approximate the privacy curve of a DP algorithm composed with itself $k$ times is $\tilde{O}(\sqrt{k})$. This improves over the best prior method by Koskela et al. (2020) which requires $\tilde{\Omega}(k^{1.5})$ running time. We demonstrate the utility of our algorithm by accurately computing the privacy loss of the DP-SGD algorithm of Abadi et al. (2016) and showing that our algorithm speeds up the privacy computations by a few orders of magnitude compared to prior work, while maintaining similar accuracy.
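
For intuition, here is a minimal sketch of the underlying privacy-loss-distribution approach for the Gaussian mechanism: discretize the privacy loss random variable, self-compose by convolution (here via FFT), and read off $\delta(\epsilon)$ as a hockey-stick expectation. This toy version omits the discretization-error accounting and the $\tilde{O}(\sqrt{k})$-time refinements that are the paper's actual contributions; all grid parameters are our own choices.

```python
import numpy as np

def gaussian_pld(sigma, v):
    """Discretized privacy loss distribution of the Gaussian mechanism
    with sensitivity 1: the loss is N(mu, 2*mu) with mu = 1/(2 sigma^2)."""
    mu = 1.0 / (2.0 * sigma ** 2)
    pdf = np.exp(-(v - mu) ** 2 / (4.0 * mu)) / np.sqrt(4.0 * np.pi * mu)
    pmf = pdf * (v[1] - v[0])
    return pmf / pmf.sum()

def delta_after_composition(sigma, k, eps, half_width=50.0, n=2 ** 15):
    """delta(eps) after k-fold self-composition, via FFT convolution of
    the discretized PLD (half_width must cover the composed loss)."""
    dx = 2.0 * half_width / n
    v = dx * (np.arange(n) - n // 2)              # value grid, v[n//2] = 0
    pmf = gaussian_pld(sigma, v)
    pmf_k = np.fft.irfft(np.fft.rfft(pmf) ** k, n=n)
    pmf_k = np.roll(pmf_k, -(k - 1) * (n // 2))   # realign the grid origin
    pmf_k = np.clip(pmf_k, 0.0, None)
    # hockey-stick divergence: delta(eps) = E[(1 - exp(eps - L))_+]
    mask = v > eps
    return float(np.sum(pmf_k[mask] * (1.0 - np.exp(eps - v[mask]))))

print(delta_after_composition(sigma=2.0, k=16, eps=2.0))
```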

20210601 Junta Distance Approximation with Sub-Exponential Queries.
Summary: Leveraging tools of De, Mossel, and Neeman [FOCS, 2019], we show two different results pertaining to the \emph{tolerant testing} of juntas. Given black-box access to a Boolean function $f:\{\pm1\}^{n} \to \{\pm1\}$, we give a $\mathrm{poly}(k, \frac{1}{\varepsilon})$ query algorithm that distinguishes between functions that are $\gamma$-close to $k$-juntas and $(\gamma+\varepsilon)$-far from $k'$-juntas, where $k' = O(\frac{k}{\varepsilon^2})$. In the non-relaxed setting, we extend our ideas to give a $2^{\tilde{O}(\sqrt{k/\varepsilon})}$ (adaptive) query algorithm that distinguishes between functions that are $\gamma$-close to $k$-juntas and $(\gamma+\varepsilon)$-far from $k$-juntas. To the best of our knowledge, this is the first subexponential-in-$k$ query algorithm for approximating the distance of $f$ to being a $k$-junta (previous results of Blais, Canonne, Eden, Levi, and Ron [SODA, 2018] and De, Mossel, and Neeman [FOCS, 2019] required exponentially many queries in $k$). Our techniques are Fourier analytical and make use of the notion of "normalized influences" that was introduced by Talagrand [AoP, 1994].

20210528 Lower Bounds for Differentially Private ERM: Unconstrained and Non-Euclidean.
Summary: We consider lower bounds for differentially private empirical risk minimization (DP-ERM) for convex functions in the constrained and unconstrained cases with respect to the general $\ell_p$ norm, beyond the $\ell_2$ norm considered by most previous works. We provide a simple black-box reduction approach which can generalize lower bounds in the constrained case to the unconstrained case. For $(\epsilon,\delta)$-DP, we achieve $\Omega(\frac{\sqrt{d \log(1/\delta)}}{\epsilon n})$ lower bounds for both the constrained and unconstrained cases and any $\ell_p$ geometry with $p\geq 1$ by introducing a novel biased mean property for fingerprinting codes, where $n$ is the size of the dataset and $d$ is the dimension.

20210520 A (Slightly) Improved Bound on the Integrality Gap of the Subtour LP for TSP.
Summary: We show that for some $\epsilon > 10^{-36}$ and any metric TSP instance, the max entropy algorithm returns a solution of expected cost at most $\frac{3}{2}-\epsilon$ times the cost of the optimal solution to the subtour elimination LP. This implies that the integrality gap of the subtour LP is at most $\frac{3}{2}-\epsilon$. This analysis also shows that there is a randomized $\frac{3}{2}-\epsilon$ approximation for the 2-edge-connected multi-subgraph problem, improving upon Christofides' algorithm.

20210518 Time and Query Optimal Quantum Algorithms Based on Decision Trees.
Summary: It has recently been shown that starting with a classical query algorithm (decision tree) and a guessing algorithm that tries to predict the query answers, we can design a quantum algorithm with query complexity $O(\sqrt{GT})$ where $T$ is the query complexity of the classical algorithm (depth of the decision tree) and $G$ is the maximum number of wrong answers by the guessing algorithm [arXiv:1410.0932, arXiv:1905.13095]. In this paper we show that, given some constraints on the classical algorithms, this quantum algorithm can be implemented in time $\tilde O(\sqrt{GT})$. Our algorithm is based on non-binary span programs and their efficient implementation. We conclude that various graph theoretic problems including bipartiteness, cycle detection and topological sort can be solved in time $O(n^{3/2}\log n)$ and with $O(n^{3/2})$ quantum queries. Moreover, finding a maximal matching can be solved with $O(n^{3/2})$ quantum queries in time $O(n^{3/2}\log n)$, and maximum bipartite matching can be solved in time $O(n^2\log n)$.

20210504 Thinking Inside the Ball: Near-Optimal Minimization of the Maximal Loss.
Summary: We characterize the complexity of minimizing $\max_{i\in[N]} f_i(x)$ for convex, Lipschitz functions $f_1,\ldots, f_N$. For nonsmooth functions, existing methods require $O(N\epsilon^{-2})$ queries to a first-order oracle to compute an $\epsilon$-suboptimal point and $\tilde{O}(N\epsilon^{-1})$ queries if the $f_i$ are $O(1/\epsilon)$-smooth. We develop methods with improved complexity bounds of $\tilde{O}(N\epsilon^{-2/3} + \epsilon^{-8/3})$ in the nonsmooth case and $\tilde{O}(N\epsilon^{-2/3} + \sqrt{N}\epsilon^{-1})$ in the $O(1/\epsilon)$-smooth case. Our methods consist of a recently proposed ball optimization oracle acceleration algorithm (which we refine) and a careful implementation of said oracle for the softmax function. We also prove an oracle complexity lower bound scaling as $\Omega(N\epsilon^{-2/3})$, showing that our dependence on $N$ is optimal up to polylogarithmic factors.

20210504 Quantum Key-length Extension.
Summary: Should quantum computers become available, they will reduce the effective key length of basic secret-key primitives, such as blockciphers. To address this we will either need to use blockciphers which inherently have longer keys or use key-length extension techniques which employ a blockcipher to construct a more secure blockcipher that uses longer keys. We consider the latter approach and revisit the FX and double encryption constructions. Classically, FX is known to be secure, while double encryption is no more secure than single encryption due to a meet-in-the-middle attack. We provide positive results, with concrete and tight bounds, for both of these constructions against quantum attackers in ideal models. For FX, we consider a partially-quantum model, where the attacker has quantum access to the ideal primitive, but only classical access to FX. We provide two results for FX in this model. The first establishes the security of FX against non-adaptive attackers. The second establishes security against general adaptive attacks for a variant of FX using a random oracle in place of an ideal cipher. This result relies on the techniques of Zhandry (CRYPTO '19) for lazily sampling a quantum random oracle. An extension to perfectly lazily sampling a quantum random permutation, which would help resolve the adaptive security of standard FX, is an important but challenging open question. We introduce techniques for partially-quantum proofs without relying on analyzing the classical and quantum oracles separately, which is common in existing work. This may be of broader interest. For double encryption we apply a technique of Tessaro and Thiruvengadam (TCC '18) to establish that security reduces to the difficulty of solving the list disjointness problem, which we are able to reduce, through a chain of results, to the known quantum difficulty of the element distinctness problem.

20210430 On the Hardness of Scheduling With Non-Uniform Communication Delays.
Summary: In the scheduling with non-uniform communication delay problem, the input is a set of jobs with precedence constraints. Associated with every precedence constraint between a pair of jobs is a communication delay, the time duration the scheduler has to wait between the two jobs if they are scheduled on different machines. The objective is to assign the jobs to machines to minimize the makespan of the schedule. Despite being a fundamental problem in theory and a consequential problem in practice, the approximability of scheduling problems with communication delays is not very well understood. One of the top ten open problems in scheduling theory, in the influential list by Schuurman and Woeginger and its latest update by Bansal, asks if the problem admits a constant factor approximation algorithm. In this paper, we answer this question in the negative by proving a logarithmic hardness for the problem under the standard complexity theory assumption that NP-complete problems do not admit quasi-polynomial time algorithms. Our hardness result is obtained using a surprisingly simple reduction from a problem that we call Unique Machine Precedence constraints Scheduling (UMPS). We believe that this problem is of central importance in understanding the hardness of many scheduling problems and we conjecture that it is very hard to approximate. Among other things, our conjecture implies a logarithmic hardness of related machine scheduling with precedences, a long-standing open problem in scheduling theory and approximation algorithms.

20210329 Wave-encoding and Shuffling Enables Rapid Time-Resolved Structural Imaging.
Summary: T2-Shuffling reconstructs multiple sharp T2-weighted images from a single volumetric fast spin-echo (3D-FSE) scan. Wave-CAIPI is a parallel imaging technique that achieves good reconstruction at high accelerations through additional sinusoidal gradients that induce a voxel spreading effect in the readout direction to better take advantage of coil-sensitivity information. In this work, the Shuffling model in T2-Shuffling is augmented with wave-encoding to achieve higher acceleration capability. The resulting "Wave-Shuffling" approach is applied to 3D-FSE and Magnetization-Prepared Rapid Gradient-Echo (MPRAGE) to achieve rapid, 1 mm-isotropic resolution, time-resolved structural imaging.

20210329 Private Nonsmooth Empirical Risk Minimization and Stochastic Convex Optimization in Subquadratic Steps.
Summary: We study the differentially private Empirical Risk Minimization (ERM) and Stochastic Convex Optimization (SCO) problems for nonsmooth convex functions. We get a (nearly) optimal bound on the excess empirical risk and excess population loss with subquadratic gradient complexity. More precisely, our differentially private algorithm requires $O(\frac{N^{3/2}}{d^{1/8}}+ \frac{N^2}{d})$ gradient queries for optimal excess empirical risk, which is achieved with the help of subsampling and smoothing the function via convolution. This is the first subquadratic algorithm for the nonsmooth case when $d$ is super-constant. As a direct application, using the iterative localization approach of Feldman et al. \cite{fkt20}, we achieve the optimal excess population loss for the stochastic convex optimization problem, with $O(\min\{N^{5/4}d^{1/8},\frac{ N^{3/2}}{d^{1/8}}\})$ gradient queries. Our work makes progress towards resolving a question raised by Bassily et al. \cite{bfgt20}, giving the first algorithms for private ERM and SCO with subquadratic steps. We note that independently Asi et al. \cite{afkt21} gave other algorithms for private ERM and SCO with subquadratic steps.

20210315 Counting and Sampling Perfect Matchings in Regular Expanding Non-Bipartite Graphs.
Summary: We show that the ratio of the number of near-perfect matchings to the number of perfect matchings in $d$-regular strong expander (non-bipartite) graphs with $2n$ vertices is a polynomial in $n$; thus, the Jerrum and Sinclair Markov chain [JS89] mixes in polynomial time and generates an (almost) uniformly random perfect matching. Furthermore, we prove that such graphs have at least $\Omega(d)^n$ many perfect matchings, thus proving the Lovász-Plummer conjecture [LP86] for this family of graphs.

20210308 Asymptotics of Ridge Regression in Convolutional Models.
Summary: Understanding the generalization and estimation error of estimators for simple models such as linear and generalized linear models has attracted a lot of attention recently. This is in part due to an interesting observation made in the machine learning community that highly overparameterized neural networks achieve zero training error, and yet they are able to generalize well over test samples. This phenomenon is captured by the so-called double descent curve, where the generalization error starts decreasing again after the interpolation threshold. A series of recent works tried to explain this phenomenon for simple models. In this work, we analyze the asymptotics of the estimation error in ridge estimators for convolutional linear models. These convolutional inverse problems, also known as deconvolution, naturally arise in different fields such as seismology, imaging, and acoustics, among others. Our results hold for a large class of input distributions that include i.i.d. features as a special case. We derive exact formulae for the estimation error of ridge estimators that hold in a certain high-dimensional regime. We show the double descent phenomenon in our experiments for convolutional models and show that our theoretical results match the experiments.

20210225 Machine Unlearning via Algorithmic Stability.
Summary: We study the problem of machine unlearning and identify a notion of algorithmic stability, Total Variation (TV) stability, which we argue is suitable for the goal of exact unlearning. For convex risk minimization problems, we design TV-stable algorithms based on noisy Stochastic Gradient Descent (SGD). Our key contribution is the design of corresponding efficient unlearning algorithms, which are based on constructing a (maximal) coupling of Markov chains for the noisy SGD procedure. To understand the tradeoffs between accuracy and unlearning efficiency, we give upper and lower bounds on the excess empirical and population risk of TV-stable algorithms for convex risk minimization. Our techniques generalize to arbitrary non-convex functions, and our algorithms are differentially private as well.

20210219 Near-Optimal Randomized Exploration for Tabular Markov Decision Processes.
Summary: We study algorithms that use randomized value functions for exploration in reinforcement learning. This class of algorithms enjoys appealing empirical performance. We show that when we use 1) a single random seed in each episode, and 2) a Bernstein-type magnitude of noise, we obtain a worst-case $\widetilde{O}\left(H\sqrt{SAT}\right)$ regret bound for episodic time-inhomogeneous Markov Decision Processes, where $S$ is the size of the state space, $A$ is the size of the action space, $H$ is the planning horizon and $T$ is the number of interactions. This bound polynomially improves all existing bounds for algorithms based on randomized value functions, and, for the first time, matches the $\Omega\left(H\sqrt{SAT}\right)$ lower bound up to logarithmic factors. Our result highlights that randomized exploration can be near-optimal, which was previously achieved only by optimistic algorithms. To achieve the desired result, we develop 1) a new clipping operation to ensure both the probability of being optimistic and the probability of being pessimistic are lower bounded by a constant, and 2) a new recursive formula for the absolute value of estimation errors to analyze the regret.

20210216 Evaluating Fairness of Machine Learning Models Under Uncertain and Incomplete Information.
Summary: Training and evaluation of fair classifiers is a challenging problem. This is partly due to the fact that most fairness metrics of interest depend on both the sensitive attribute information and label information of the data points. In many scenarios it is not possible to collect large datasets with such information. An alternative approach that is commonly used is to separately train an attribute classifier on data with sensitive attribute information, and then use it later in the ML pipeline to evaluate the bias of a given classifier. While such decoupling helps alleviate the problem of demographic scarcity, it raises several natural questions, such as: how should the attribute classifier be trained, and how should one use a given attribute classifier for accurate bias estimation? In this work we study this question from both theoretical and empirical perspectives. We first experimentally demonstrate that the test accuracy of the attribute classifier is not always correlated with its effectiveness in bias estimation for a downstream model. In order to further investigate this phenomenon, we analyze an idealized theoretical model and characterize the structure of the optimal classifier. Our analysis has surprising and counterintuitive implications: in certain regimes, one might want to distribute the error of the attribute classifier as unevenly as possible among the different subgroups. Based on our analysis we develop heuristics for both training and using attribute classifiers for bias estimation in the data-scarce regime. We empirically demonstrate the effectiveness of our approach on real and simulated data.
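
To make the evaluation setup concrete, here is a small synthetic illustration of our own (not the paper's experiment) of estimating a demographic parity gap with a proxy attribute classifier in place of the unavailable true attribute; note how an imperfect proxy attenuates the measured gap:

```python
import numpy as np

def demographic_parity_gap(y_pred, attr):
    """Gap in positive-prediction rates between the two groups."""
    return abs(y_pred[attr == 0].mean() - y_pred[attr == 1].mean())

rng = np.random.default_rng(0)
true_attr = rng.integers(0, 2, 10_000)
# downstream classifier whose positive rate differs by 0.2 across groups
y_pred = (rng.random(10_000) < 0.4 + 0.2 * true_attr).astype(int)
# proxy: a hypothetical attribute classifier with 90% accuracy
pred_attr = np.where(rng.random(10_000) < 0.9, true_attr, 1 - true_attr)
print("gap with true attribute :", demographic_parity_gap(y_pred, true_attr))
print("gap with proxy attribute:", demographic_parity_gap(y_pred, pred_attr))
```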

20210205 Fast and Memory Efficient Differentially Private-SGD via JL Projections.
Summary: Differentially Private SGD (DP-SGD) of Abadi et al. (2016) and its variations are the only known algorithms for private training of large-scale neural networks. This algorithm requires computation of per-sample gradient norms, which is extremely slow and memory intensive in practice. In this paper, we present a new framework to design differentially private optimizers, called DP-SGD-JL and DP-Adam-JL. Our approach uses Johnson-Lindenstrauss (JL) projections to quickly approximate the per-sample gradient norms without exactly computing them, thus making the training time and memory requirements of our optimizers closer to those of their non-DP versions. Unlike previous attempts to make DP-SGD faster, which work only on a subset of network architectures or use compiler techniques, we propose an algorithmic solution which works for any network in a black-box manner, which is the main contribution of this paper. To illustrate this, on the IMDb dataset we train a Recurrent Neural Network (RNN) to achieve a good privacy-vs-accuracy tradeoff, while being significantly faster than DP-SGD and with a similar memory footprint to non-private SGD. The privacy analysis of our algorithms is more involved than that of DP-SGD; we use the recently proposed f-DP framework of Dong et al. (2019) to prove privacy.
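
The JL idea can be seen in a few lines: a gradient norm can be estimated from a handful of random inner products, each of which (in the actual algorithm) is obtained via an autodiff Jacobian-vector product rather than by materializing the per-sample gradient. Below is a minimal NumPy sketch of the norm-estimation step only, with made-up sizes:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, k = 8, 10_000, 32      # batch size, #parameters, #JL projections
G = rng.normal(size=(n, d))  # stand-in for the per-sample gradient matrix

# JL estimate: ||g_i||^2 = E[<g_i, v>^2] for v ~ N(0, I_d), so averaging k
# independent projections approximates each row norm; the relative error
# shrinks like 1/sqrt(k). In DP-SGD-JL, G itself is never formed.
V = rng.normal(size=(d, k))
est_norms = np.sqrt(((G @ V) ** 2).mean(axis=1))
true_norms = np.linalg.norm(G, axis=1)
print(np.max(np.abs(est_norms - true_norms) / true_norms))
```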

20210204 Online Discrepancy Minimization via Persistent Self-Balancing Walks.
Summary: We study the online discrepancy minimization problem for vectors in $\mathbb{R}^d$ in the oblivious setting where an adversary is allowed to fix the vectors $x_1, x_2, \ldots, x_n$ in arbitrary order ahead of time. We give an algorithm that maintains $O(\sqrt{\log(nd/\delta)})$ discrepancy with probability $1-\delta$, matching the lower bound given in [Bansal et al. 2020] up to an $O(\sqrt{\log \log n})$ factor in the high-probability regime. We also provide results for the weighted and multicolor versions of the problem.
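
For intuition, the sketch below implements the basic (non-persistent) self-balancing walk of Alweiss, Liu, and Sawhney that this line of work builds on, not the paper's persistent variant: the sign of each arriving vector is biased against its correlation with the running signed sum. The balancing constant here is illustrative, not tuned.

```python
import numpy as np

def self_balancing_walk(X, delta, rng):
    """Online signs for the rows of X (each with norm <= 1), keeping the
    signed sum's entries O(sqrt(log(nd/delta))) with probability 1-delta.

    Sign +1 is chosen with probability (1 - <w, x>/c)/2, biasing each
    step toward reducing correlation with the running sum w.
    """
    n, d = X.shape
    c = 30.0 * np.log(n * d / delta)   # balancing scale
    w = np.zeros(d)
    signs = np.empty(n)
    for t in range(n):
        x = X[t]
        corr = np.clip(np.dot(w, x) / c, -1.0, 1.0)
        eps = 1.0 if rng.random() < (1.0 - corr) / 2.0 else -1.0
        signs[t] = eps
        w += eps * x
    return signs, w

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 50))
X /= np.linalg.norm(X, axis=1, keepdims=True)   # enforce unit norms
signs, w = self_balancing_walk(X, delta=0.01, rng=rng)
print(np.max(np.abs(w)))   # stays polylogarithmic in n*d
```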

20210124 BU-Trace: A Permissionless Mobile System for Privacy-Preserving Intelligent Contact Tracing.
Summary: The coronavirus disease 2019 (COVID-19) pandemic has caused an unprecedented global health crisis. Digital contact tracing, as a transmission intervention measure, has shown its effectiveness in pandemic control. Despite intensive research on digital contact tracing, existing solutions can hardly meet users' requirements on privacy and convenience. In this paper, we propose BU-Trace, a novel permissionless mobile system for privacy-preserving intelligent contact tracing based on QR code and NFC technologies. First, a user study is conducted to investigate and quantify the user acceptance of a mobile contact tracing system. Second, a decentralized system is proposed to enable contact tracing while protecting user privacy. Third, an intelligent behavior detection algorithm is designed to ease the use of our system. We implement BU-Trace and conduct extensive experiments in several real-world scenarios. The experimental results show that BU-Trace achieves a privacy-preserving and intelligent mobile system for contact tracing without requesting location or other privacy-related permissions.

20210115 Fundamental Tradeoffs in Distributionally Adversarial Training.
Summary: Adversarial training is among the most effective techniques to improve the robustness of models against adversarial perturbations. However, the full effect of this approach on models is not well understood. For example, while adversarial training can reduce the adversarial risk (prediction error against an adversary), it sometimes increases standard risk (generalization error when there is no adversary). Moreover, such behavior is impacted by various elements of the learning problem, including the size and quality of training data, specific forms of adversarial perturbations in the input, model overparameterization, and adversary's power, among others. In this paper, we focus on the \emph{distribution perturbing} adversary framework, wherein the adversary can change the test distribution within a neighborhood of the training data distribution. The neighborhood is defined via the Wasserstein distance between distributions, and the radius of the neighborhood is a measure of the adversary's manipulative power. We study the tradeoff between standard risk and adversarial risk and derive the Pareto-optimal tradeoff, achievable over specific classes of models, in the infinite data limit with the feature dimension kept fixed. We consider three learning settings: 1) regression with the class of linear models; 2) binary classification under the Gaussian mixtures data model, with the class of linear classifiers; 3) regression with the class of random features models (which can be equivalently represented as a two-layer neural network with random first-layer weights). We show that a tradeoff between standard and adversarial risk is manifested in all three settings. We further characterize the Pareto-optimal tradeoff curves and discuss how a variety of factors, such as feature correlation, the adversary's power, or the width of the two-layer neural network, would affect this tradeoff.

20210115 An Improved Approximation Algorithm for the Minimum $k$-Edge Connected Multi-Subgraph Problem.
Summary: We give a randomized $1+\frac{5.06}{\sqrt{k}}$-approximation algorithm for the minimum $k$-edge connected spanning multi-subgraph problem, $k$-ECSM.

20210114 Minimum Cost Flows, MDPs, and $\ell_1$-Regression in Nearly Linear Time for Dense Instances.
Summary: In this paper we provide new randomized algorithms with improved runtimes for solving linear programs with two-sided constraints. In the special case of the minimum cost flow problem on $n$-vertex $m$-edge graphs with integer polynomially-bounded costs and capacities, we obtain a randomized method which solves the problem in $\tilde{O}(m+n^{1.5})$ time. This improves upon the previous best runtime of $\tilde{O}(m\sqrt{n})$ (Lee-Sidford 2014) and, in the special case of unit-capacity maximum flow, improves upon the previous best runtimes of $m^{4/3+o(1)}$ (Liu-Sidford 2020, Kathuria 2020) and $\tilde{O}(m\sqrt{n})$ (Lee-Sidford 2014) for sufficiently dense graphs. For $\ell_1$-regression in a matrix with $n$ columns and $m$ rows, we obtain a randomized method which computes an $\epsilon$-approximate solution in $\tilde{O}(mn+n^{2.5})$ time. This yields a randomized method which computes an $\epsilon$-optimal policy of a discounted Markov Decision Process with $S$ states and $A$ actions per state in time $\tilde{O}(S^2A+S^{2.5})$. These methods improve upon the previous best runtimes of methods which depend polylogarithmically on problem parameters, which were $\tilde{O}(mn^{1.5})$ (Lee-Sidford 2015) and $\tilde{O}(S^{2.5}A)$ (Lee-Sidford 2014, Sidford-Wang-Wu-Ye 2018). To obtain this result we introduce two new algorithmic tools of independent interest. First, we design a new general interior point method for solving linear programs with two-sided constraints, which combines techniques from (Lee-Song-Zhang 2019, Brand et al. 2020) to obtain a robust stochastic method with iteration count nearly the square root of the smaller dimension. Second, to implement this method we provide dynamic data structures for efficiently maintaining approximations to variants of Lewis weights, a fundamental importance measure for matrices which generalizes leverage scores and effective resistances.