The subset-sum problem is the problem of determining whether or not a set S of positive integers contains a subset that sums to a target value t. This can be expressed as a language:
SUBSET-SUM = {<S,t> | S contains a subset whose sum is t }
This language is in NP, because a non-deterministic Turing machine can guess the subset, compute the sum of the elements in the subset, and confirm that the subset sum equals t all in polynomial time.
To show that SUBSET-SUM is in NPC, we reduce 3SAT to SUBSET-SUM. The reduction takes a 3cnf-formula φ as its input and constructs a set of integers S and a target value t from the formula.
The table below shows the structure of the set S and how t is constructed. The table is constructed from a 3cnf-formula φ that has l variables and k clauses.
1 | 2 | 3 | 4 | ⋯ | l | c1 | c2 | ⋯ | ck | |
---|---|---|---|---|---|---|---|---|---|---|
y1 | 1 | 0 | 0 | 0 | ⋯ | 0 | 1 | 0 | ⋯ | 0 |
z1 | 1 | 0 | 0 | 0 | ⋯ | 0 | 0 | 0 | ⋯ | 0 |
y2 | 0 | 1 | 0 | 0 | ⋯ | 0 | 0 | 1 | ⋯ | 0 |
z2 | 0 | 1 | 0 | 0 | ⋯ | 0 | 1 | 0 | ⋯ | 0 |
y3 | 0 | 0 | 1 | 0 | ⋯ | 0 | 1 | 1 | ⋯ | 0 |
z3 | 0 | 0 | 1 | 0 | ⋯ | 0 | 0 | 0 | ⋯ | 1 |
⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ |
yl | 0 | 0 | 0 | 0 | ⋯ | 1 | 0 | 0 | ⋯ | 0 |
zl | 0 | 0 | 0 | 0 | ⋯ | 1 | 0 | 0 | ⋯ | 0 |
g1 | 0 | 0 | 0 | 0 | ⋯ | 0 | 1 | 0 | ⋯ | 0 |
h1 | 0 | 0 | 0 | 0 | ⋯ | 0 | 1 | 0 | ⋯ | 0 |
g2 | 0 | 0 | 0 | 0 | ⋯ | 0 | 0 | 1 | ⋯ | 0 |
h2 | 0 | 0 | 0 | 0 | ⋯ | 0 | 0 | 1 | ⋯ | 0 |
⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ |
gk | 0 | 0 | 0 | 0 | ⋯ | 0 | 0 | 0 | ⋯ | 1 |
hk | 0 | 0 | 0 | 0 | ⋯ | 0 | 0 | 0 | ⋯ | 1 |
t | 1 | 1 | 1 | 1 | ⋯ | 1 | 3 | 3 | ⋯ | 3 |
The table contains two rows for each variable xi in φ. The row labeled yi corresponds to xi appearing in its non-negated form, while the row zi corresponds to xi appearing in its negated form xi. The table also has a column labeled cj for each clause in φ. If the variable xi appears as xi in clause cj, we put a 1 in the table in row yi, column cj. If the variable xi appears as xi in clause cj, we put a 1 in row zi, column cj of the table.
The number t is carefully constructed to express all of the relevant constraints in our system. The first thing to note about the columns of the table and the number t is that no column contains enough 1s in it to trigger a carry operation when we sum down that column.
The next thing to note is that the columns whose sum is 1 express the constraint that each variable in the formula has to either be true or false. By forcing the first set of columns to sum to 1, we are essentially forcing ourselves to pick a subset of the rows in such a way that none of the first l columns in the subset sum contain anything other than exactly one 1 per column.
The second group of columns, which must all sum to 3, express the constraint that each clause must have at least one term that evaluates to true. To get the sum in a particular column to reach 3, we can start by picking numbers from the g and h rows for our subset. These numbers will allow a given column to sum to at most 2, so we will have to pick at least one additional row that has a 1 in that column to reach the sum of three. Picking a row that has a 1 in the proper column corresponds to forcing the variable whose row we picked to evaluate to true.
Definition A language L is in the class P if there exists a polynomial time decider for L.
Definition A decidable language L is in the class NP if there exists a polynomial time nondeterministic Turing machine that decides L.
Definition A decidable language L is in the class NP if the language is polynomial time verifiable. A verifier for a language is a Turing machine program V which, when given a pair <w,c> where w and c are both strings, can use c to determine in polynomial time whether or not w is in L.
Definition A Turing decidable language L is NP-complete if
Cook-Levin Theorem SAT is NP-complete.
Theorem If A and B are both in NP, A is NP-complete, and A ≤P B, then B is NP-complete.
SAT ≤P 3SAT
3SAT ≤P CLIQUE
CLIQUE ≤P VERTEX-COVER
3SAT ≤P ≠SAT
≠SAT ≤P MAX-CUT
3SAT ≤P 3COLOR
3SAT ≤P SUBSET-SUM
If A is in NP there exists a polynomial time verifier V for A. The verifier works on pairs <w,c>, where w is the input that may or may not be in A and c is a certificate chosen from some certificate space C(w).
Here is a Turing machine that can decide A with the help of the verifier V:
On input w:
This machine is a decider, because the certificate space C(w) is finite and V runs in polynomial time.
This machine does not decide A in polynomial time, because although C(w) is finite, for all NP-complete languages C(w) will have a size that is not polynomial in |w|.
Here are some examples of NP-complete languages and the sizes of their certificate spaces:
Language | Typical c | Size of C(w) |
---|---|---|
3SAT | A list of T,F assignments for the n variables in φ | 2n |
CLIQUE | A list of k vertices in the clique | |V| choose k |
3COLOR | A list of color assignments for the vertices in V | 3|V| |
SUBSET-SUM | The numbers in the subset of S | 2|S| |
Definition A Turing decidable language B is NP-complete if
Theorem If A and B are Turing decidable languages with B in P and A ≤P B, then A is in P.
Theorem If B is NP-complete and B ∈ P, then P = NP.
Proof Just combine the definition and the previous theorem.