Mildly Non-linear Codes.

 

 A non-technical abstract.

 In many situations, transmitted data is messed up a little bit by "noise."

 Examples:

1) Cell phone transmission can degrade when the transmitted signal is weak or corrupted by nearby electrical devices or other radio signals.

2) Data read into computer live memory from a hard disk or floppy disk may be imperfect due to slight flaws in the disk (as there almost always are).

3) Phone lines are by nature imperfect; very fast digital transmission over a phone line can easily be corrupted.

 

One imagines a transmission as consisting of a sequence of "packets." In a "coded transmission," the packet X is transmitted along with a "check packet" Y, uniquely determined by X, containing redundant information that will help recover X even if part of the combined packet X,Y is corrupted. The simplest way to do this is for Y simply to be a number of copies of X -- in other words, you transmit X several times. This "please say that again" technique turns out to be very inefficient. For example, suppose that X is sent three times (so Y is two copies of X), and that about 1% of the data sent is corrupted (a fairly large but not unknown level of noise), then roughly one-tenth of one percent of errors will not be corrected. This sounds like a small number of errors until we remember that something like one million packets transmitted is typical. The result would be around 1000 surviving errors. By contrast, a well-designed code in which Y is the same size as X (not equal to X), can clean out the 1% errors in one million packet transmissions with virtual certainty.

 

One of the tasks of coding theory is to build very efficient codes: small check packets and very good error correction. Of course, these goals compete against each other to some extent.

 

Four years ago, a group at the Cambridge University led by a guy named McKay began experimenting with "randomly generated linear codes." A linear code describes a particularly simple method of generating the check packets; so simple, that when Gallager originally discovered them in the early 1960's, it was thought that they couldn't be very good. Linear codes that were constructed systematically turned out, indeed, to be poor. A linear code is determined by a large table of 0's and 1's (a "binary matrix"); a typical size for such a table is 5000 by 10000 or so. McKay discovered techniques for filling in the table by "random" choices -- there are many constraints on the table, and so you don't just flip a coin for each entry, nonetheless, large tracts in the table are chosen randomly. McKay's codes performed at nearly optimal levels, engendering a great deal of interest in them. I heard a talk by McKay on this and I spoke with him last summer at the U. of Minn.

 

I worked last summer on variations of McKay's techniques; a student of mine worked on this too and he found his own variations. I found several alternative methods that employ randomness to construct linear codes. In order to test my codes, I needed to perform a rather complicated calculation that approximates the likelihood that the code will correct an error. In the course of trying to adapt the general formulas to my specific situation, I noticed a similarity between the formulas and some other formulas that come from the analysis of signal frequencies. I saw how to restate the entire theory of these formulas in a new way that is much easier to deal with. When I did this, I realized that small modifications of the linear codes would, in general, improve the correction of errors. The "modified codes" are not linear codes, but they are very close to being linear codes, and so I called them "mildly non-linear." These codes are the subject of ongoing projects.