
Finding Fair and Efficient Allocations
With Sanath Kumar Krishnamurthy and Rohit Vaish
Preprint, 2017
[arXiv]
[Abstract]
We study the problem of allocating indivisible goods fairly and efficiently among a set of agents who have additive valuations for the goods. Here, an allocation is said to be fair if it is envy-free up to one good (EF1), which means that each agent prefers its own bundle over the bundle of any other agent, up to the removal of one good. In addition, an allocation is deemed to be efficient if it satisfies Pareto efficiency. A notable result of Caragiannis et al. (2016) shows that, under additive valuations, there is no need to trade efficiency for fairness; specifically, an allocation that maximizes the Nash social welfare (NSW) objective simultaneously satisfies these two seemingly incompatible properties. However, this approach does not directly provide an efficient algorithm for finding fair and efficient allocations, since maximizing NSW is an NP-hard problem.
In this paper, we bypass this barrier and develop a pseudo-polynomial time algorithm for finding allocations which are EF1 and Pareto efficient; in particular, when the valuations are bounded, our algorithm finds such an allocation in polynomial time. Furthermore, we establish a stronger existence result compared to Caragiannis et al. (2016): for additive valuations, there always exists an allocation that is EF1 and fractionally Pareto efficient. The approach developed in the paper also leads to a polynomial-time 1.45-approximation for maximizing the Nash social welfare objective, improving upon the best known approximation ratio for this problem. Our results are based on constructing Fisher markets wherein specific equilibria are not only efficient, but also fair.
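As a concrete illustration of the two properties discussed above (this sketch is not from the paper; the function names are illustrative), the following Python computes the Nash social welfare of an allocation under additive valuations and checks the EF1 condition by brute force:

```python
def nsw(valuations, allocation):
    """Nash social welfare: geometric mean of the agents' additive utilities.
    valuations[i][g] is agent i's value for good g; allocation[i] is i's bundle."""
    utils = [sum(valuations[i][g] for g in bundle)
             for i, bundle in enumerate(allocation)]
    prod = 1.0
    for u in utils:
        prod *= u
    return prod ** (1.0 / len(utils))

def is_ef1(valuations, allocation):
    """EF1: any envy of i toward j vanishes after removing some single good
    from j's bundle (we remove the good i values most)."""
    n = len(allocation)
    for i in range(n):
        own = sum(valuations[i][g] for g in allocation[i])
        for j in range(n):
            if i == j or not allocation[j]:
                continue
            other = sum(valuations[i][g] for g in allocation[j])
            best = max(valuations[i][g] for g in allocation[j])
            if own < other - best:
                return False
    return True
```

For instance, with two agents valuing two goods as [[3, 1], [1, 3]], giving each agent her favorite good is EF1 and attains NSW 3; giving both goods to agent 1 violates EF1.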

Online Learning for Structured Loss Spaces
With Aditya Gopalan and Aadirupa Saha
Preprint, 2017
[arXiv]
[Abstract]
We consider prediction with expert advice when the loss vectors are assumed to lie in a set described by the sum of atomic norm balls. We derive a regret bound for a general version of the online mirror descent (OMD) algorithm that uses a combination of regularizers, each adapted to the constituent atomic norms. The general result recovers standard OMD regret bounds, and yields regret bounds for new structured settings where the loss vectors are (i) noisy versions of points from a low-rank subspace, (ii) sparse vectors corrupted with noise, and (iii) sparse perturbations of low-rank vectors. For the problem of online learning with structured losses, we also show lower bounds on regret in terms of rank and sparsity of the source set of the loss vectors, which implies lower bounds for the above additive loss settings as well.

Approximation Algorithms for Maximin Fair Division
With Sanath Kumar Krishnamurthy
EC, 2017
[arXiv]
[Abstract]
We consider the problem of dividing indivisible goods fairly among n agents who have additive and submodular valuations for the goods. Our fairness guarantees are in terms of the maximin share, which is defined to be the maximum value that an agent can ensure for herself if she were to partition the goods into n bundles and then receive a minimum-valued bundle. Since maximin fair allocations (i.e., allocations in which each agent gets at least her maximin share) do not always exist, prior work has focused on approximation results that aim to find allocations in which the value of the bundle allocated to each agent is (multiplicatively) as close to her maximin share as possible. In particular, Procaccia and Wang (2014) along with Amanatidis et al. (2015) have shown that, under additive valuations, a 2/3-approximate maximin fair allocation always exists and can be found in polynomial time. We complement these results by developing a simple and efficient algorithm that achieves the same approximation guarantee.
Furthermore, we initiate the study of approximate maximin fair division under submodular valuations. Specifically, we show that when the valuations of the agents are nonnegative, monotone, and submodular, a 1/10-approximate maximin fair allocation is guaranteed to exist. In fact, we show that such an allocation (with a slightly worse approximation guarantee) can be efficiently found by a simple round-robin algorithm. A technical contribution of the paper is to analyze the performance of this combinatorial algorithm by employing the concept of multilinear extensions.
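For concreteness, the maximin share itself can be computed by brute force on tiny instances. The sketch below (purely illustrative; it enumerates all n^m bundle assignments and is exponential in the number of goods) makes the definition above executable:

```python
from itertools import product

def maximin_share(values, n):
    """Maximin share of an agent with additive values: the best, over all
    partitions of the goods into n bundles, of the minimum bundle value.
    Brute force over every assignment of goods to bundles; tiny inputs only."""
    best = 0
    for assignment in product(range(n), repeat=len(values)):
        bundle_vals = [0] * n
        for good, bundle in enumerate(assignment):
            bundle_vals[bundle] += values[good]
        best = max(best, min(bundle_vals))
    return best
```

For example, for goods valued [3, 2, 2, 1] split into 2 bundles, the partition {3, 1} versus {2, 2} shows the maximin share is 4.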

Algorithmic Aspects of Optimal Channel Coding
With Omar Fawzi
IEEE Transactions on Information Theory, 2017
[arXiv]
[Abstract]
A central question in information theory is to determine
the maximum success probability that can be achieved in sending a fixed number of messages over a noisy channel.
This was first studied in the pioneering work of Shannon, who established a simple expression characterizing this quantity in the limit of multiple
independent uses of the channel. Here we consider the general setting with only one use of the channel. We observe
that the maximum success probability can be expressed as the maximum value of a submodular function. Using this connection,
we establish the following results:
1. There is a simple greedy polynomial-time algorithm that computes a code achieving a (1 - 1/e)-approximation of the maximum success probability. Moreover, for this problem it is NP-hard to obtain an approximation ratio strictly better than (1 - 1/e).
2. Shared quantum entanglement between the sender and the receiver can increase the success probability by a factor of at most 1/(1 - 1/e). In addition, this factor is tight if one allows an arbitrary non-signaling box between the sender and the receiver.
3. We give tight bounds on the one-shot performance of the meta-converse of Polyanskiy-Poor-Verdu.
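The algorithm behind result 1 is an instance of the classical greedy method for monotone submodular maximization (Nemhauser, Wolsey, and Fisher, 1978). A generic sketch on a toy coverage function follows; the instance and names are illustrative, not the channel-coding objective itself:

```python
def greedy_max(f, universe, k):
    """Greedily add the element with the largest marginal gain, k times.
    For monotone submodular f this attains a (1 - 1/e)-approximation."""
    chosen = set()
    for _ in range(k):
        candidates = universe - chosen
        if not candidates:
            break
        best = max(candidates, key=lambda x: f(chosen | {x}) - f(chosen))
        if f(chosen | {best}) - f(chosen) <= 0:
            break  # no positive marginal gain remains
        chosen.add(best)
    return chosen

# Toy monotone submodular function: coverage of ground-set elements.
sets = {'a': {1, 2, 3}, 'b': {3, 4}, 'c': {5}, 'd': {1, 5}}

def coverage(chosen):
    covered = set()
    for name in chosen:
        covered |= sets[name]
    return len(covered)
```

Running `greedy_max(coverage, set(sets), 2)` first picks 'a' (marginal gain 3) and then any element of gain 1, covering 4 ground elements.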

The Dictionary Testing Problem
With Arnab Bhattacharyya and Suprovat Ghoshal
Preprint, 2016
[arXiv]
[Abstract]
The dictionary learning (or sparse coding) problem plays an important role in signal processing, neuroscience, statistics and machine learning. Given a collection of vectors, the goal is to learn a basis with respect to which all the given vectors have a sparse representation. In this work, we study the testing analogue of this problem.
Roughly speaking, the dictionary testing problem is to decide whether a given collection of vectors can be approximately transformed into sparse vectors. More precisely, given as input vectors y_1,..., y_p \in R^d, we want to distinguish between the following two cases (for some choice of parameters):
(i) YES case: There exists a matrix A \in R^{d×m} satisfying RIP and k-sparse unit vectors x_1,..., x_p \in R^m such that y_i = A x_i for all i.
(ii) NO case: There do not exist k′-sparse unit vectors x_1,...,x_p \in R^m and a matrix A \in R^{d×m} satisfying RIP such that (Ax_1,...,Ax_p) is "close" to (y_1,...,y_p).
Note that unlike known results in provable dictionary learning, there is no assumption on the distribution of x_1,...,x_p in the YES case. The goal is to design an efficient algorithm that can distinguish the above two cases using linear queries, i.e., inner products of the input vectors with vectors of our choice.
Our tester is very simple and efficient: it outputs YES iff the Gaussian width of the input is sufficiently small. The analysis of the test can be viewed as giving a robust characterization of Gaussian width in terms of sparsity. The number of linear queries made by the tester is independent of the dimension.
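The quantity the tester thresholds, the Gaussian width w(Y) = E[sup_{y in Y} <g, y>] with g a standard Gaussian vector, can be estimated by plain Monte Carlo. An illustrative sketch for finite point sets (not the paper's tester, which works from linear queries):

```python
import random

def gaussian_width(points, trials=2000, seed=0):
    """Monte-Carlo estimate of w(Y) = E[max_{y in Y} <g, y>] for a finite
    set Y of d-dimensional points, with g ~ N(0, I_d)."""
    rng = random.Random(seed)
    d = len(points[0])
    total = 0.0
    for _ in range(trials):
        g = [rng.gauss(0.0, 1.0) for _ in range(d)]
        total += max(sum(gi * yi for gi, yi in zip(g, y)) for y in points)
    return total / trials
```

As a sanity check, for Y = {e_1, -e_1} in R^2 the width is E|g_1| = sqrt(2/pi), roughly 0.798.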

Algorithmic Aspects of Private Bayesian Persuasion
With Yakov Babichenko
ITCS, 2017
[arXiv]
[Abstract]
We study computational questions in a game-theoretic model that, in particular, aims to capture advertising/persuasion applications such as viral marketing.
Specifically, we consider a multi-agent Bayesian persuasion model where an informed sender (marketer) tries to persuade a group of agents (consumers) to adopt a certain product.
The quality of the product is known to the sender, but it is unknown to the agents. The sender is allowed to commit to a signaling policy where she sends a
private signal (say, a viral marketing ad) to every agent. This work studies the computational aspects of finding a signaling policy that maximizes the sender's revenue.
We show that if the sender's utility is a submodular function of the set of agents that adopt the product, then we can efficiently find a signaling policy whose revenue is at least (1 - 1/e) times the optimal. We also prove that approximating the sender's optimal revenue by a factor better than (1 - 1/e) is NP-hard and, hence, the developed approximation guarantee is essentially tight. When the sender's utility is a function of the number of agents that adopt the product (i.e., the utility function is anonymous), we show that an optimal signaling policy can be computed in polynomial time. Our results are based on an interesting connection between the Bayesian persuasion problem and the evaluation of the concave closure of a set function.

Empirical Distribution of Equilibrium Play and Its Testing Application
With Yakov Babichenko and Ron
Peretz
Mathematics of Operations Research (MOR), 2017
[arXiv]
[Abstract]
We show that in any n-player, m-action normal-form game, we can obtain an approximate
equilibrium by sampling any mixed-action equilibrium a small number of times. We study three types of equilibria: Nash, correlated, and
coarse correlated. For each of them we obtain upper and lower bounds on the number of samples required for the
empirical distribution over the sampled action profiles to form an approximate equilibrium with probability close to one.
These bounds imply that, using a small number of samples, we can test whether or not players are playing according to an approximate equilibrium,
even in games where n and m are large. In addition, our results substantially improve previously known upper bounds on the support size of
approximate equilibria in games with many players. In particular, for all three types of equilibria we show the existence of approximate
equilibria with support size polylogarithmic in n and m, whereas the previously best-known upper bounds were polynomial in n.

Approximating Nash Equilibria in Tree Polymatrix Games
With Katrina Ligett and Georgios Piliouras
SAGT, 2015
[arXiv]
[Abstract]
We develop a quasi-polynomial time Las Vegas algorithm for approximating Nash equilibria in polymatrix
games over trees, under a mild renormalizing assumption. Our result, in particular, leads to an expected
polynomial-time algorithm for computing approximate Nash equilibria of tree polymatrix games in which the number
of actions per player is a fixed constant. Further, for trees with constant degree, the running time of the algorithm
matches the best known upper bound for approximating Nash equilibria in bimatrix games (Lipton, Markakis, and Mehta 2003).
Notably, this work closely complements the hardness result of Rubinstein (2015),
which establishes the inapproximability of Nash equilibria in polymatrix games over constant-degree bipartite graphs with two actions per player.

Finding Any Nontrivial Coarse Correlated Equilibrium Is Hard
With Katrina Ligett
EC, 2015
[arXiv]
[Abstract]
One of the most appealing aspects of the (coarse) correlated
equilibrium concept is that natural dynamics quickly arrive at
approximations of such equilibria, even in games with many players. In
addition, there exist polynomial-time algorithms that compute exact
(coarse) correlated equilibria. In light of these results, a natural
question is: how good are the (coarse) correlated equilibria that can
arise from any efficient algorithm or dynamics?
In this paper we address this question and establish strong negative
results. In particular, we show that in multi-player games that have a
succinct representation, it is NP-hard to compute any coarse correlated
equilibrium (or approximate coarse correlated equilibrium) with welfare
strictly better than the worst possible. The focus on succinct games
ensures that the underlying complexity question is interesting; many
multi-player games of interest are in fact succinct. Our results imply
that, while one can efficiently compute a coarse correlated
equilibrium, one cannot provide any nontrivial welfare guarantee for
the resulting equilibrium, unless P=NP. We show that analogous hardness
results hold for correlated equilibria, and persist under the
egalitarian objective or Pareto optimality.
To complement the hardness results, we develop an algorithmic framework
that identifies settings in which we can efficiently compute an
approximate correlated equilibrium with near-optimal welfare. We use
this framework to develop an efficient algorithm for computing an
approximate correlated equilibrium with near-optimal welfare in
aggregative games.

Approximating Nash Equilibria and Dense Subgraphs via an Approximate Version of Caratheodory's Theorem
STOC, 2015
[arXiv]
[Abstract]
We
present algorithmic applications of an approximate version of
Caratheodory's theorem. The theorem states that given a set of vectors
X in R^d, for every vector in the convex hull of X there exists an
\epsilon-close (under the p-norm distance, for 2 < p < infinity)
vector that can be expressed as a convex combination of at most b
vectors of X, where the bound b depends on \epsilon and the norm p and
is independent of the dimension d. This theorem can be derived by
instantiating Maurey's lemma, early references to which can be found in
the work of Pisier (1981) and Carl (1985). However, in this paper we
present a selfcontained proof of this result.
Using this theorem we establish that in a bimatrix game with n x n
payoff matrices A, B, if the number of nonzero entries in any column
of A+B is at most s, then an \epsilon-Nash equilibrium of the game can
be computed in time n^{O\left(\frac{\log s }{\epsilon^2}\right)}. This,
in particular, gives us a polynomial-time approximation scheme for Nash
equilibrium in games with fixed column sparsity s. Moreover, for
arbitrary bimatrix games (since s can be at most n), the running time
of our algorithm matches the best-known upper bound, which was obtained
by Lipton, Markakis, and Mehta (2003).
The approximate Caratheodory's theorem also leads to an additive
approximation algorithm for the normalized densest k-subgraph problem.
Given a graph with n vertices and maximum degree d, the developed
algorithm determines a subgraph with exactly k vertices whose normalized
density is within \epsilon (in the additive sense) of the optimal, in time
n^{O\left( \frac{\log d}{\epsilon^2}\right)}. Additionally, we show
that a similar approximation result can be achieved for the problem of
finding a k x k bipartite subgraph of maximum normalized density.
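The probabilistic argument behind the approximate Caratheodory theorem (Maurey's empirical method) is easy to make concrete: sample b points i.i.d. according to the convex weights and average them; the sample mean is, by construction, a convex combination of at most b points and concentrates around the target. An illustrative sketch (not the paper's full proof, which controls the p-norm error):

```python
import random

def sparse_convex_approx(points, weights, b, seed=0):
    """Average b i.i.d. samples drawn from `points` with probabilities
    `weights`; the result is a convex combination of at most b of the
    points that approximates sum_i weights[i] * points[i]."""
    rng = random.Random(seed)
    d = len(points[0])
    avg = [0.0] * d
    for _ in range(b):
        x = rng.choices(points, weights=weights)[0]
        for j in range(d):
            avg[j] += x[j] / b
    return avg
```

For example, averaging samples from the uniform mixture of (0, 0) and (1, 1) concentrates around the target point (0.5, 0.5) as b grows.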

Online Convex Optimization Using Predictions
With Niangjun Chen, Anish Agarwal, Adam Wierman, and Lachlan L. H. Andrew
ACM SIGMETRICS, 2015
[arXiv]
[Abstract]
Making
use of predictions is a crucial, but underexplored, area of online
algorithms. This paper studies a class of online optimization problems
where we have external noisy predictions available. We propose a
stochastic prediction error model that generalizes prior models in the
learning and stochastic control communities, incorporates correlation
among prediction errors, and captures the fact that predictions improve
as time passes. We prove that achieving sublinear regret and constant
competitive ratio for online algorithms requires the use of an
unbounded prediction window in adversarial settings, but that under
more realistic stochastic prediction error models it is possible to use
Averaging Fixed Horizon Control (AFHC) to simultaneously achieve
sublinear regret and constant competitive ratio using only a
constantsized prediction window. Furthermore, we show that typical
performance of AFHC is tightly concentrated around its mean.

Query Complexity of Correlated
Equilibrium
With Yakov Babichenko
Trans. on Economics and Computation, 2015
[arXiv]
[Abstract]
We
study lower bounds on the query complexity of determining correlated
equilibrium. In particular, we consider a query model in which an
nplayer game is specified via a black box that returns players'
utilities at pure action profiles. In this model we establish that in
order to compute a correlated equilibrium any deterministic algorithm
must query the black box an exponential (in n) number of times.

Simple Approximate Equilibria in Large Games
With Yakov Babichenko and Ron
Peretz
EC, 2014
[arXiv]
[Slides]
[Abstract]
We prove that in every
normal form nplayer game with m actions for each player, there exists
an approximate Nash equilibrium in which each player randomizes
uniformly among a set of O(log m + log n) pure actions. This result
induces an O(N^{log log N})-time algorithm for computing an approximate
Nash equilibrium in games where the number of actions is polynomial in
the number of players (m=poly(n)); here N=nm^n is the size of the game
(the input size). Furthermore, when the number of actions is a fixed
constant (m = O(1)), the same algorithm runs in O(N^{log log log N})
time. In addition, we establish an inverse connection between the
entropy of Nash equilibria in the game, and the time it takes to find
such an approximate Nash equilibrium using the random sampling method.
We also consider other relevant notions of equilibria. Specifically, we
prove the existence of approximate correlated equilibrium of support
size polylogarithmic in the number of players, n, and the number of
actions per player, m. In particular, using the probabilistic method,
we show that there exists a multiset of action profiles of
polylogarithmic size such that the uniform distribution over this
multiset forms an approximate correlated equilibrium. Along similar
lines, we establish the existence of approximate coarse correlated
equilibrium with logarithmic support. We complement these results by
considering the computational complexity of determining small-support
approximate equilibria. We show that random sampling can be used to
efficiently determine an approximate coarse correlated equilibrium with
logarithmic support. But such a tight result does not hold for
correlated equilibrium, i.e., sampling might generate an approximate
correlated equilibrium of support size Omega(m), where m is the number
of actions per player. Finally, we show that finding an exact
correlated equilibrium with the smallest possible support is NP-hard under
Cook reductions, even in the case of two-player zero-sum games.

Network Design with Coverage Costs
With Shuchi Chawla and Seeun
Umboh
APPROX, 2014
[arXiv]
[Abstract]
We
study network design with a cost structure motivated by redundancy in
data traffic. We are given a graph, g groups of terminals, and a
universe of data packets. Each group of terminals desires a subset of
the packets from its respective source. The cost of routing traffic on
any edge in the network is proportional to the total size of the
distinct packets that the edge carries. Our goal is to find a minimum
cost routing. We focus on two settings. In the first, the collection of
packet sets desired by source-sink pairs is laminar. For this setting,
we present a primal-dual based 2-approximation, improving upon a
logarithmic approximation due to Barman and Chawla (2012). In the
second setting, packet sets can have nontrivial intersection. We focus
on the case where each packet is desired by either a single terminal
group or by all of the groups, and the graph is unweighted. For this
setting we present an O(log g)-approximation.
Our approximation for the
second setting is based on a novel spanner-type construction in
unweighted graphs that, given a collection of g vertex subsets, finds a
subgraph of cost only a constant factor more than that of the minimum spanning
tree of the graph, such that every subset in the collection has a
Steiner tree in the subgraph of cost at most O(log g) times that of its
minimum Steiner tree in the original graph. We call such a subgraph a
group spanner.

On the Existence of LowRank Explanations for Mixed Strategy Behavior
With Umang Bhaskar,
Federico Echenique, and Adam Wierman
WINE, 2014
[arXiv]
[Slides]
[Abstract]
Nash equilibrium is used as a model to explain the observed behavior of
players in strategic settings. For example, in many empirical
applications we observe player behavior, and the problem is to
determine if there exist payoffs for the players for which the
equilibrium corresponds to observed player behavior. Computational
complexity of Nash equilibria is an important consideration in this
framework. If the instance of the model that explains observed player
behavior requires players to have solved a computationally hard
problem, then the explanation provided is questionable. In this paper
we provide conditions under which Nash equilibrium is a reasonable
explanation for strategic behavior, i.e., conditions under which
observed behavior of players can be explained by games in which Nash
equilibria are easy to compute. We identify three structural conditions
and show that if the data set of observed behavior satisfies any of
these conditions, then it is consistent with payoff matrices for which
the observed Nash equilibria could have been computed efficiently. Our
conditions admit large and structurally complex data sets of observed
behavior, showing that even with complexity considerations, Nash
equilibrium is often a reasonable model.

On Load Shedding in Complex Event
Processing
With Yeye He and Jeffrey
Naughton
ICDT, 2014
[arXiv]
[Abstract]
Complex
Event Processing (CEP) is a stream processing model
that focuses on detecting event patterns in continuous event
streams. While the CEP model has gained popularity in
the research communities and commercial technologies, the
problem of gracefully degrading performance under heavy
load in the presence of resource constraints, or load shedding,
has been largely overlooked. CEP is similar to “classical”
stream data management, but addresses a substantially
different class of queries. This unfortunately renders the
load shedding algorithms developed for stream data processing
inapplicable. In this paper we study CEP load shedding
under various resource constraints. We formalize broad
classes of CEP load-shedding scenarios as different optimization
problems. We demonstrate an array of complexity
results that reveal the hardness of these problems and
construct shedding algorithms with performance guarantees.
Our results shed some light on the difficulty of developing
load-shedding algorithms that maximize utility.

The Empirical Implications of Rank
in Bimatrix Games
With Umang Bhaskar,
Federico Echenique, and Adam Wierman
EC, 2013
[arXiv]
[Slides]
[Abstract]
We
study the structural complexity of bimatrix games, formalized via rank,
from an empirical perspective. We consider a setting where we have data
on player behavior in diverse strategic situations, but where we do not
observe the relevant payoff functions. We prove that high complexity
(high rank) has empirical consequences when arbitrary data is
considered. Additionally, we prove that, in more restrictive classes of
data (termed laminar), any observation is rationalizable using a
low-rank game: specifically, a zero-sum game. Hence complexity as a
structural property of a game is not always testable. Finally, we prove
a general result connecting the structure of the feasible data sets
with the highest rank that may be needed to rationalize a set of
observations.

Simultaneous
Bounds on Competitiveness and Regret
With Lachlan L. H. Andrew,
Katrina Ligett, Minghong Lin, Adam Meyerson, Alan Roytman, and Adam
Wierman
COLT, 2013
[pdf]
[Slides]
[Abstract]
We
consider algorithms for "smoothed online convex optimization" problems,
a variant of
the class of online convex optimization problems that is strongly
related to metrical task
systems. Prior literature on these problems has focused on two
performance metrics: regret
and the competitive ratio. There exist known algorithms with sublinear
regret and known
algorithms with constant competitive ratios; however, no known
algorithm achieves both
simultaneously. We show that this is due to a fundamental
incompatibility between these
two metrics: no algorithm (deterministic or randomized) can achieve
sublinear regret and
a constant competitive ratio, even in the case when the objective
functions are linear. However,
we also exhibit an algorithm that, for the important special case of
one-dimensional
decision spaces, provides sublinear regret while maintaining a
competitive ratio that grows
arbitrarily slowly.
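For readers unfamiliar with the two metrics compared above, a toy sketch of how each is computed from cost sequences (names are illustrative; the paper's smoothed setting additionally charges switching costs, which this sketch omits):

```python
def regret_and_cr(alg_costs, per_action_costs, opt_cost):
    """Regret: the algorithm's total cost minus that of the best single
    action in hindsight. Competitive ratio: the algorithm's total cost
    divided by the (dynamic) optimal cost."""
    alg_total = sum(alg_costs)
    best_static = min(sum(costs) for costs in per_action_costs)
    return alg_total - best_static, alg_total / opt_cost
```

For instance, an algorithm paying 2 per round against two actions whose per-round costs alternate between 1 and 3 has zero regret, yet its competitive ratio against a dynamic optimum paying 1 per round is 2.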

Traffic-Redundancy Aware Network
Design
With Shuchi Chawla
SODA, 2012
[arXiv]
[Slides]
[Abstract]
We
consider network design problems for information networks where routers
can replicate data but cannot alter it. This functionality allows the
network to eliminate dataredundancy in traffic, thereby saving on
routing costs. We consider two problems within this framework and
design approximation algorithms. The first problem we study is the
traffic-redundancy aware network design (RAND) problem. We are given a
weighted graph over a single server and many clients. The server owns a
number of different data packets and each client desires a subset of
the packets; the client demand sets form a laminar set system. Our goal
is to connect every client to the source via a single path, such that
the collective cost of the resulting network is minimized. Here the
transportation cost over an edge is its weight times the number
of distinct packets that it carries.
The second problem is a facility
location problem that we call RAFL. Here the goal is to find an
assignment from clients to facilities such that the total cost of
routing packets from the facilities to clients (along unshared paths),
plus the total cost of "producing" one copy of each desired packet at
each facility is minimized. We present a constant factor approximation
for the RAFL and an O(log P) approximation for RAND, where P is the
total number of distinct packets. We remark that P is always at most
the number of different demand sets desired or the number of clients,
and is generally much smaller.

Decomposition Methods for Large
Scale LP Decoding
With Xishuo Liu, Stark Draper,
and Benjamin Recht
IEEE Transactions on Information Theory, 2013
[arXiv]
[Slides]
[Abstract]
When
binary linear error-correcting codes are used over symmetric channels,
a relaxed version of the maximum likelihood decoding problem can be
stated as a linear program (LP). This LP decoder can be used to decode
error-correcting codes at bit-error-rates comparable to
state-of-the-art belief propagation (BP) decoders, but with
significantly stronger theoretical guarantees. However, LP decoding
when implemented with standard LP solvers does not easily scale to the
block lengths of modern error correcting codes. In this paper we draw
on decomposition methods from optimization theory, specifically the
Alternating Directions Method of Multipliers (ADMM), to develop
efficient distributed algorithms for LP decoding.
The key enabling
technical result is a "two-slice" characterization of the geometry of
the parity polytope, which is the convex hull of all codewords of a
single parity check code. This new characterization simplifies the
representation of points in the polytope. Using this simplification, we
develop an efficient algorithm for Euclidean norm projection onto the
parity polytope. This projection is required by ADMM and allows us to
use LP decoding, with all its theoretical guarantees, to decode
largescale error correcting codes efficiently. We present numerical
results for LDPC codes of lengths more than 1000. The waterfall region
of LP decoding is seen to initiate at a slightly higher signal-to-noise
ratio than for sum-product BP; however, unlike BP, LP decoding
exhibits no error floor. Our implementation of LP
decoding using ADMM executes as fast as our baseline sum-product BP
decoder, is fully parallelizable, and can be seen to implement a type
of message-passing with a particularly simple schedule.

A Bicriteria Approximation for
the
Reordering Buffer Problem
With Shuchi Chawla and Seeun
Umboh
ESA, 2012
[arXiv]
[Abstract]
In
the reordering buffer problem (RBP), a server is asked to process a
sequence of requests lying in a metric space. To process a request the
server must move to the corresponding point in the metric. The requests
can be processed slightly out of order; in particular, the server has a
buffer of capacity k which can store up to k requests as it reads in
the sequence. The goal is to reorder the requests in such a manner that
the buffer constraint is satisfied and the total travel cost of the
server is minimized. The RBP arises in many applications that require
scheduling with a limited buffer capacity, such as scheduling a disk
arm in storage systems, switching colors in paint shops of a car
manufacturing plant, and rendering 3D images in computer graphics. We
study the offline version of RBP and develop bicriteria approximations.
When the underlying metric is a tree, we obtain a solution of cost no
more than 9 OPT using a buffer of capacity 4k + 1, where OPT is the cost
of an optimal solution with buffer capacity k. Constant-factor
approximations were known previously only for the uniform metric
(Avigdor-Elgrabli et al., 2012). Via randomized tree embeddings, this
implies an O(log n) approximation to cost and O(1) approximation to
buffer size for general metrics. Previously the best known algorithm
for arbitrary metrics by Englert et al. (2007) provided an O(log^2 k
log n) approximation without violating the buffer constraint.

Secretary Problems with Convex
Costs
With Shuchi Chawla, David
Malec,
and Seeun Umboh
ICALP, 2012
[arXiv]
[Slides]
[Abstract]
We
consider online resource allocation problems where given a set of
requests, our goal is to select a subset that maximizes a
value-minus-cost objective. Requests are presented online in
random order, and each request possesses an adversarial value and an
adversarial size. The online algorithm must make an irrevocable
accept/reject decision as soon as it sees each request. The "profit" of
a set of accepted requests is its total value minus a convex cost
function of its total size. This problem falls within the framework of
secretary problems. Unlike previous work in that area, one of the main
challenges we face is that the objective function can be positive or
negative and we must guard against accepting requests that look good
early on but cause the solution to have an arbitrarily large cost as
more requests are accepted. This requires designing new techniques.
We
study this problem under various feasibility constraints and present
online algorithms with competitive ratios only a constant factor worse
than those known in the absence of costs for the same feasibility
constraints. We also consider a multi-dimensional version of the
problem that generalizes multi-dimensional knapsack within a secretary
framework. In the absence of any feasibility constraints, we present an
O(l)-competitive algorithm, where l is the number of dimensions; this
matches within constant factors the best known ratio for
multi-dimensional knapsack secretary.

Approximate String Membership
Checking: A
Multiple Filter, OptimizationBased Approach
With Chong Sun and Jeffrey
Naughton
ICDE, 2012
[pdf]
[Abstract]
We
consider the approximate string membership
checking (ASMC) problem of extracting all the strings or
substrings in a document that approximately match some string
in a given dictionary. To solve this problem, the current state-of-the-art
approach involves first applying an approximate, fast filter,
then applying a more expensive exact verification algorithm to the
strings that pass the filter. Correspondingly, many string filters
have been proposed. We note that different filters are good at
eliminating different strings, depending on the characteristics of
the strings in both the documents and the dictionary. We suspect
that no single filter will dominate all other filters everywhere.
Given an ASMC problem instance and a set of string filters, we
need to select the optimal filter to maximize performance.
Furthermore, in our experiments we found that in some cases a
sequence of filters dominates any of the filters of the sequence in
isolation, and that the best set of filters and their ordering depend
upon the specific problem instance encountered. Accordingly, we
propose that the approximate match problem be viewed as an
optimization problem, and evaluate a number of techniques for
solving this optimization problem.
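The filter-then-verify pipeline described above can be sketched as follows; the length filter and the edit-distance threshold k are illustrative assumptions, not the specific filters studied in the paper:

```python
# Sketch of the filter-then-verify pipeline, assuming edit distance as the
# similarity measure. The length filter below is one hypothetical cheap
# filter; real systems choose among many such filters.

def edit_distance(a, b):
    """Exact (expensive) verification: classic dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1,
                           prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def length_filter(s, entry, k):
    """Cheap filter: strings whose lengths differ by more than k
    cannot be within edit distance k."""
    return abs(len(s) - len(entry)) <= k

def approximate_members(tokens, dictionary, k=1):
    """Return tokens within edit distance k of some dictionary string."""
    out = []
    for t in tokens:
        candidates = [d for d in dictionary if length_filter(t, d, k)]
        if any(edit_distance(t, d) <= k for d in candidates):
            out.append(t)
    return out

print(approximate_members(["aple", "banana", "cat"],
                          {"apple", "banana"}, k=1))
# ['aple', 'banana']
```

The filter prunes "cat" without ever running the quadratic-time verification, which is the source of the speedups that make filter selection and ordering worth optimizing.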

On the Complexity of
Privacy-Preserving Complex Event
Processing
With Yeye He, Jeffrey
Naughton, and Di Wang
PODS, 2011
[pdf]
[Abstract]
Complex
Event Processing (CEP) systems are stream processing
systems that monitor incoming event streams in search of user-specified
event patterns. While CEP systems have been adopted in
a variety of applications, the privacy implications of event pattern
reporting mechanisms have yet to be studied — a stark contrast to
the significant amount of attention that has been devoted to privacy
for relational systems. In this paper we present a privacy problem
that arises when the system must support desired patterns (those
that should be reported if detected) and private patterns (those that
should not be revealed). We formalize this problem, which we term
privacy-preserving, utility-maximizing CEP (PP-CEP), and analyze
its complexity under various assumptions. Our results show that
this is a rich problem to study and shed some light on the difficulty
of developing algorithms that preserve utility without compromising
privacy.

Preventing Equivalence Attacks in
Updated,
Anonymized Data
With Yeye He and Jeffrey
Naughton
ICDE, 2011
[pdf]
[Abstract]
In
comparison to the extensive body of existing
work considering publish-once, static anonymization, dynamic
anonymization is less well studied. Previous work, most notably
m-invariance, has made considerable progress in devising a
scheme that attempts to prevent individual records from being
associated with too few sensitive values. We show, however, that
in the presence of updates, even an m-invariant table can be
exploited by a new type of attack we call the “equivalence attack.”
To deal with the equivalence attack, we propose a
graphbased anonymization algorithm that leverages solutions
to the classic “min-cut/max-flow” problem, and demonstrate
with experiments that our algorithm is efficient and effective
in preventing equivalence attacks.

Region Growing For Multi-Route
Cuts
With Shuchi Chawla
SODA, 2010
[arXiv]
[Slides]
[Abstract]
We
study a number of multi-route cut problems: given a graph G=(V,E) and
connectivity thresholds k_(u,v) on pairs of nodes, the goal is to find
a minimum cost set of edges or vertices the removal of which reduces
the connectivity between every pair (u,v) to strictly below its given
threshold. These problems arise in the context of reliability in
communication networks; they are natural generalizations of traditional
minimum cut problems where the thresholds are either 1 (we want to
completely separate the pair) or infinity (we don't care about the
connectivity for the pair). We provide the first nontrivial
approximations to a number of variants of the problem including for
both node-disjoint and edge-disjoint connectivity thresholds. A main
contribution of our work is an extension of the region growing
technique for approximating minimum multicuts to the multi-route
setting. When the connectivity thresholds are either 2 or infinity (the
"2route cut" case), we obtain polylogarithmic approximations while
satisfying the thresholds exactly. For arbitrary connectivity
thresholds this approach leads to bicriteria approximations where we
approximately satisfy the thresholds and approximately minimize the
cost. We present a number of different algorithms achieving different
cost-connectivity tradeoffs.
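A minimal sketch of the feasibility check implied by the problem statement, assuming edge-disjoint connectivity: after removing a candidate cut, the number of edge-disjoint u-v paths (computed here by unit-capacity augmentation) must fall strictly below the threshold k_(u,v). The graph and thresholds below are hypothetical:

```python
from collections import deque

def edge_connectivity(edges, u, v):
    """Number of edge-disjoint u-v paths, via repeated BFS augmentation
    on a unit-capacity graph (each undirected edge becomes two arcs)."""
    cap = {}
    for a, b in edges:
        cap[(a, b)] = cap.get((a, b), 0) + 1
        cap[(b, a)] = cap.get((b, a), 0) + 1
    flow = 0
    while True:
        parent = {u: None}
        queue = deque([u])
        while queue and v not in parent:
            x = queue.popleft()
            for (a, b), c in cap.items():
                if a == x and c > 0 and b not in parent:
                    parent[b] = x
                    queue.append(b)
        if v not in parent:
            return flow
        # Push one unit of flow along the augmenting path.
        y = v
        while parent[y] is not None:
            x = parent[y]
            cap[(x, y)] -= 1
            cap[(y, x)] += 1
            y = x
        flow += 1

def is_multiroute_cut(edges, cut, thresholds):
    """thresholds maps (u, v) -> k_(u,v); the cut is feasible if every
    pair's connectivity drops strictly below its threshold."""
    removed = set(cut) | {(b, a) for a, b in cut}
    remaining = [e for e in edges if e not in removed]
    return all(edge_connectivity(remaining, u, v) < k
               for (u, v), k in thresholds.items())

# Triangle graph: s and t are 2-edge-connected; removing edge (s, t)
# leaves a single s-t path, which satisfies threshold k = 2.
edges = [("s", "a"), ("a", "t"), ("s", "t")]
print(is_multiroute_cut(edges, [("s", "t")], {("s", "t"): 2}))  # True
```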

Packing Multiway Cuts in
Capacitated Graphs
With Shuchi Chawla
SODA, 2009
[arXiv]
[Slides]
[Abstract]
We
consider the following "multiway cut packing" problem in undirected
graphs: we are given a graph G=(V,E) and k commodities, each
corresponding to a set of terminals located at different vertices in
the graph; our goal is to produce a collection of cuts {E_1,...,E_k}
such that E_i is a multiway cut for commodity i and the maximum load on
any edge is minimized. The load on an edge is defined to be the number
of cuts in the solution crossing the edge. In the capacitated version
of the problem the goal is to minimize the maximum relative load on any
edge, that is, the ratio of the edge's load to its capacity. Multiway cut
packing arises in the context of graph labeling problems where we are
given a partial labeling of a set of items and a neighborhood structure
over them, and, informally, the goal is to complete the labeling in the
most consistent way. This problem was introduced by Rabani, Schulman,
and Swamy (SODA'08), who developed an O(log n/log log n) approximation
for it in general graphs, as well as an improved O(log^2 k)
approximation in trees. Here n is the number of nodes in the graph. We
present the first constant factor approximation for this problem in
arbitrary undirected graphs. Our approach is based on the observation
that every instance of the problem admits a nearoptimal laminar
solution (that is, one in which no pair of cuts cross each other).
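The load bookkeeping from the problem statement can be made concrete as follows; the edge names and capacities are hypothetical:

```python
# Load of an edge = number of cuts in the solution containing it; the
# capacitated objective is the maximum relative load over all edges.

def max_relative_load(cuts, capacity):
    """cuts: list of edge sets E_1, ..., E_k; capacity: dict edge -> capacity."""
    load = {}
    for cut in cuts:
        for e in cut:
            load[e] = load.get(e, 0) + 1
    return max(load[e] / capacity[e] for e in load)

cuts = [{"e1", "e2"}, {"e2", "e3"}, {"e2"}]
capacity = {"e1": 1, "e2": 2, "e3": 1}
print(max_relative_load(cuts, capacity))  # e2 carries 3 cuts, capacity 2 -> 1.5
```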

Secure
Two-Party Context-Free
Language Recognition
With
Anshuman Singh and K.K.
Shukla
ICDCIT,
2005
[pdf]
[Abstract]
The
growth of the Internet provides opportunities for cooperative
computation, but it also requires the development of protocols that can
accomplish this task among mutually untrusting parties. The aim is to
develop methods that ensure both the correct evaluation of the
function and the privacy of individual inputs. Multiparty computation
protocols help achieve this aim without using a trusted third party.
In this paper we consider the problem of context-free language
recognition in a two-party setting. Alice has the description of a
context-free language L, while Bob has a secret string whose membership
in L is to be checked. Neither Alice nor Bob is willing to disclose
his/her input to the other. We propose a protocol that accomplishes
secure two-party context-free language recognition. The novelty of this
paper lies in its use of a formal-languages-based approach to multiparty
computation.
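The functionality evaluated under such a protocol is ordinary context-free membership. As a plain, non-secure reference point, here is a CYK membership test for a grammar in Chomsky normal form; the grammar for { a^n b^n } below is an illustrative example, not one from the paper:

```python
def cyk_member(word, term_rules, bin_rules, start="S"):
    """CYK membership test for a CNF grammar.
    term_rules: set of (A, a) for rules A -> a
    bin_rules:  set of (A, B, C) for rules A -> B C"""
    n = len(word)
    if n == 0:
        return False
    # table[i][j] holds the variables deriving word[i : i + j + 1]
    table = [[set() for _ in range(n)] for _ in range(n)]
    for i, ch in enumerate(word):
        table[i][0] = {A for (A, a) in term_rules if a == ch}
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            for split in range(1, span):
                for (A, B, C) in bin_rules:
                    if (B in table[i][split - 1]
                            and C in table[i + split][span - split - 1]):
                        table[i][span - 1].add(A)
    return start in table[0][n - 1]

# CNF grammar for { a^n b^n : n >= 1 }:
#   S -> A T | A B,  T -> S B,  A -> a,  B -> b
term_rules = {("A", "a"), ("B", "b")}
bin_rules = {("S", "A", "T"), ("S", "A", "B"), ("T", "S", "B")}
print(cyk_member("aabb", term_rules, bin_rules))  # True
print(cyk_member("abb", term_rules, bin_rules))   # False
```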