MIT anti-spam conference

MIT Conference Takes Aim at Spam E-mails

Spam filtering software looks for patterns that suggest an e-mail is spam. But the spammers are constantly evading them, finding new ways to arrange text to make the messages unrecognizable as spam.

Yerazunis’ presentation on his CRM114 Discriminator language was a centerpiece of the conference. His filtering technique “hashes” the messages, matching short phrases from the incoming text with phrases that the user previously supplied as example text, catching spam that might not exactly match standard spam text. He claims that the system has higher than 99.9 percent effectiveness; it can be downloaded for free and is compatible with SpamAssassin or other spam-flagging software.
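To make the phrase-matching idea concrete, here is a minimal Python sketch. It is not CRM114's actual algorithm: it simply hashes every run of one to three consecutive words with CRC32 and counts how often the incoming message's phrase hashes turn up in user-supplied spam and non-spam examples. The function names, the three-word window, and the scoring rule are all assumptions made for illustration.

```python
import re
import zlib
from collections import Counter


def phrase_hashes(text, max_len=3):
    """Hash every run of 1..max_len consecutive words (illustrative window size)."""
    words = re.findall(r"[a-z0-9$']+", text.lower())
    hashes = Counter()
    for start in range(len(words)):
        for length in range(1, max_len + 1):
            phrase = words[start:start + length]
            if len(phrase) < length:
                break  # ran off the end of the message
            hashes[zlib.crc32(" ".join(phrase).encode("utf-8"))] += 1
    return hashes


def train(examples):
    """Build a table of phrase-hash counts from example texts."""
    table = Counter()
    for text in examples:
        table.update(phrase_hashes(text))
    return table


def spam_score(message, spam_table, good_table):
    """Share of matched phrase hits that came from the spam examples.

    Returns 0.5 when nothing in the message matches either table.
    """
    spam_hits = 0
    good_hits = 0
    for h in phrase_hashes(message):
        spam_hits += spam_table[h]
        good_hits += good_table[h]
    total = spam_hits + good_hits
    return spam_hits / total if total else 0.5


spam_table = train(["buy cheap pills now", "cheap pills online no prescription"])
good_table = train(["lunch meeting moved to noon", "please review the attached report"])

print(spam_score("cheap pills available now", spam_table, good_table))
print(spam_score("the report for the lunch meeting", spam_table, good_table))
```

Because phrases are matched as well as single words, a message can reuse fragments of known spam in a new arrangement and still score high, which is the point made in the excerpt above.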

“This thing is even more accurate than humans,” he said.

my first fulltime exposure to Windows

In every other job I have had, I have had some way to get my work done without using Windows. So I have missed out on the joys of 8-hour-a-day exposure to Win95/98/NT/2000/XP.

Those days ended this week when I took command of a box fully loaded with Windows 2000.

I got tired of stumbling around as guest on my own desktop so I asked the IT lads to fix that. Then the machine was brutally slow — the systray stuff took forever to show up — so I made another call for help. The proposed fix was to “clean it up.”

I came in the next day and the box was frozen. Had to powercycle it. It’s a little better but still seems to bog down on login.

the end of spam? Perhaps

A Plan for Spam

One of the things you learn when you analyze spam texts is how narrow a subset of the language spammers operate in. It’s that fact, together with the equally characteristic vocabulary of any individual user’s mail, that makes Bayesian filtering a good bet.

This whole article is a great read, even if you don’t understand the code fragments (I don’t, but the text makes it clear enough). Mozilla 1.3 now has a Bayesian filter, based on Graham’s ideas.
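For anyone who does want to see the idea in code, here is a minimal Python sketch of a Bayesian-style filter. It is a simplification, not Graham's exact weighting and not Mozilla's implementation: each token gets a smoothed spam probability from the training counts, and the most telling tokens in a message are combined with the usual naive Bayes product rule. The class name, the add-one smoothing, and the choice of 15 tokens as the default are assumptions for illustration.

```python
import math
import re
from collections import Counter


def tokens(text):
    """Distinct lowercase tokens in a message."""
    return set(re.findall(r"[a-z0-9$'-]+", text.lower()))


class TinyBayesFilter:
    """Hypothetical toy filter; counts messages containing each token."""

    def __init__(self):
        self.spam_counts = Counter()
        self.good_counts = Counter()
        self.n_spam = 0
        self.n_good = 0

    def train(self, text, is_spam):
        if is_spam:
            self.spam_counts.update(tokens(text))
            self.n_spam += 1
        else:
            self.good_counts.update(tokens(text))
            self.n_good += 1

    def token_spam_prob(self, tok):
        # Add-one smoothing keeps every probability strictly between 0 and 1.
        p_spam = (self.spam_counts[tok] + 1) / (self.n_spam + 2)
        p_good = (self.good_counts[tok] + 1) / (self.n_good + 2)
        return p_spam / (p_spam + p_good)

    def spam_probability(self, text, top_n=15):
        probs = [self.token_spam_prob(t) for t in tokens(text)]
        # Keep the tokens whose probability is furthest from a neutral 0.5.
        probs.sort(key=lambda p: abs(p - 0.5), reverse=True)
        probs = probs[:top_n]
        spam_like = math.prod(probs)
        good_like = math.prod(1 - p for p in probs)
        return spam_like / (spam_like + good_like)


f = TinyBayesFilter()
f.train("win cash now click here", is_spam=True)
f.train("cheap pills click now", is_spam=True)
f.train("meeting notes attached for review", is_spam=False)
f.train("lunch at noon tomorrow", is_spam=False)

print(f.spam_probability("click here for cheap cash"))
print(f.spam_probability("notes for the lunch meeting"))
```

The filter is trained on the user's own mail, so the characteristic vocabulary on both sides (the spammers' and the user's) does the work, as the excerpt describes.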

What struck me about this was the notion of vocabulary shallowness as a filtering strategy. I was catching up on my reading last week (doctors’ offices are great for that) and there was an article about the genesis of The Cat in the Hat and the simultaneous demise of Dick and Jane. The problem with Dick and Jane, as explained by Rudolf Flesch, was that those books introduced words as whole words, rather than as phonemes or sounds: the expectation was that children would learn to recognize words without knowing the sounds that made them up. The success of this approach is illustrated by the title of Flesch’s book: Why Johnny Can’t Read.

(Flesch is also known for the Flesch Reading Ease score, a measure of the comprehensibility of a document based on sentence length and syllables per word. The similar “Fog Index” is Robert Gunning’s.)

Dr. Seuss was commissioned to write a book using some of the simple lists of words Flesch determined contained the core phonemes: it took him a year and 222 words. Not only were The Cat in the Hat and its successor monstrous successes, but a whole line of books (Beginner Books) was created to keep ’em coming.

This idea of being able to filter out junk email based on the message’s lack of vocabulary, either inherent or driven by formula, is fascinating. I’ll be testing this to see how it works.

I know Apple’s Mail.app has a teachable filter in it but it doesn’t seem to work all that well: we’ll have to examine Mozilla’s take on it.

right-sized IT

NewsForge: The Online Newspaper of Record for Linux and Open Source

The Largo solution is to put four terminals in the break room, and load them only with Mozilla and Evolution, and encourage workers to surf, chat, and play online all they want during their lunch hours and other breaks.

Many IT shops might hesitate to put in something like this; those that run PCs would need to supply and maintain four complete computers, including (no doubt) Windows, so they’d need to have virus software kept up to date and take care of all the other chores that go along with running a standalone computer. But none of this applies in the Largo IT scheme. The four thin-client units in the break room were purchased for $2 each on eBay and take no maintenance, and besides the client pieces all you have is keyboards, mice, and monitors, and these are not costly items. The biggest thing that makes this sort of niceness possible, though, big enough that it’s worth saying over again, is no maintenance!

This is a great example of what can be done when you focus on the essential needs and resist the feeping creaturism.

Dave Winer goes to Harvard

DaveNet : First essay of the year

Recently a reporter asked me if all this michegas about weblogs isn’t just the Web, and I said of course it is. The first website, done by Tim Berners-Lee was a weblog in every sense of the word. All we’re doing is lowering the barrier, making it easier to get in. That’s a big deal of course, but in another sense, it’s the same thing again and again, every year for the last decade, and each time through the loop it’s bigger, because it gets easier.

This resonates with me because a large part of my working life has been engaged in simplifying the work of publishing.

From cold type to photo-offset to desktop publishing to the internet, it’s been all about using technology to remove the distance between inspiration and publication.

I may have to add Dave to my reading list.

no child left behind, indeed

FindLaw’s Writ – Ramasastry: No Child Left Unrecruited?

When public high schools opened their doors this fall, military recruiters converged upon them, seeking student data. Schools and parents, taken aback by these unprecedented requests – for thirty years, this private information has been closely guarded – were surprised to discover that the requests were actually authorized by statute.

In January of this year, the “No Child Left Behind Act” was signed into law. The Act was touted as being designed to ensure that no child is left behind when it comes to getting a decent education. But it also had another, much less publicized aspect: It sought to ensure no child is left behind when it comes to military recruitment, as well.

While I had heard about two different attempts to make some kind of national service compulsory, the notion that recruiters could contact students directly, using data that is assumed to be privileged — I don’t expect the school to share it with anyone — is frightening.

High school students are not known for their refined decision-making skills, nor are they always aware of consequences.

quotable

“It especially annoys me when racists are accused of ‘discrimination.’ The ability to discriminate is a precious faculty; by judging all members of one ‘race’ to be the same, the racist precisely shows himself to be incapable of discrimination.”


Letters to a Young Contrarian

Another sees the light . . .

A story of packages: SFS, Debian, and FreeBSD

This gets to a piece of Debian-ese that seems a bit counter-intuitive at first: you can’t compile and install a Debian package in one step. Rather, you download the source package, compile it, and build a Debian .deb binary package. Once you’ve got that binary package, you can use the Debian standard tools to install it. (RPM is the same way.)

Actually, you can install pre-compiled RPM packages, but you’ll eventually run into issues. RPM is widely adopted across the bazillion Linux distros, but no two of them are equivalent: their RPMs don’t always work across distros.

The interesting thing about this is that in some way it runs counter to the cathedral and the bazaar hypothesis: in this case, the cathedral (the single source) does a better job of making solid, well-documented code work than the bazaar (the wild and woolly world of Linux). Learning this once was enough: I have no desire to work with Linux again.

Of the distros, Debian and Gentoo seem the most engaged in making support and upgrading relatively painless. To no one’s surprise, they lack the gleaming GUIs of RedHat and Mandrake. Darwin is experimenting with RPM, and if anyone can make it work, Jordan Hubbard’s team can.