Tuesday, August 26, 2008

CAPTCHA and reCAPTCHA

CAPTCHA is a test that many readers have seen but may not know by name. CAPTCHA stands for "Completely Automated Public Turing test to tell Computers and Humans Apart." A CAPTCHA consists of a small image of letters or words that you must type in to convince a system that you are in fact a human and not a spambot. The letters are generally somewhat distorted or have stray lines added. Such exercises take advantage of the fact that humans can do difficult symbol recognition that computers so far cannot.

A more recent innovation is reCAPTCHA. reCAPTCHA helps to solve an interesting problem. There are long-term efforts to engage in mass digitization of old books and records. Simply saving images of the pages would take up a lot of space and would leave the pages unsearchable, so books are being scanned and computer programs are being used to figure out what the words are. However, some words are scratched, poorly written, water-damaged or otherwise degraded, and the computers cannot recognize the original wording. This is the same weakness that allows CAPTCHA to work. But it is impractical to have humans comb through so many words to identify them all. Now, enter reCAPTCHA.

reCAPTCHA is just like CAPTCHA, but the words to be deciphered are taken from old books and are ones that computers are having trouble recognizing. Each word that needs to be transcribed is presented to multiple different users. If the users agree, the digitizers can be fairly sure that the humans recognized the correct word; they can then digitize that word and reuse it as an additional challenge word.

More specifically, each reCAPTCHA challenge consists of two words: one has a known answer and the other does not. The person being challenged does not know which word is which and thus must answer both.

This procedure is a brilliant way of harvesting otherwise lost processing power.
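To make the two-word scheme and the agreement check concrete, here is a toy model in Python. It is only a sketch under my own assumptions, not how reCAPTCHA is actually implemented: the class and method names are hypothetical, answers are compared case-insensitively, and an unknown word is accepted once three users who got the known word right have typed the same transcription (the threshold of 3 is arbitrary).

```python
import random
from collections import Counter, defaultdict

# Assumption for illustration: how many matching answers from users who
# passed the known word we require before trusting a transcription.
AGREEMENT_THRESHOLD = 3


class ToyReCaptcha:
    """Hypothetical toy model of the two-word challenge, not real reCAPTCHA."""

    def __init__(self, known_words, unknown_words):
        # image id -> verified transcription (the "control" words)
        self.known = {k: v.lower() for k, v in known_words.items()}
        # image ids of words the OCR software could not read
        self.unknown = list(unknown_words)
        # image id -> Counter of transcriptions typed by users
        self.votes = defaultdict(Counter)

    def make_challenge(self, rng=random):
        """Pick one known and one unknown word and show them in random order.

        The server keeps (control, unknown) secret; only `shown` goes to the user.
        """
        control = rng.choice(sorted(self.known))
        unknown = rng.choice(self.unknown)
        shown = [control, unknown]
        rng.shuffle(shown)
        return shown, control, unknown

    def answer(self, control_id, unknown_id, control_text, unknown_text):
        """Grade one response; return True if the user passes as human."""
        if control_text.strip().lower() != self.known[control_id]:
            return False  # wrong on the word whose answer we already know
        if unknown_id not in self.unknown:
            return True  # word was already resolved by earlier users
        # The user passed the known word, so count their other answer as a vote.
        guess = unknown_text.strip().lower()
        self.votes[unknown_id][guess] += 1
        if self.votes[unknown_id][guess] >= AGREEMENT_THRESHOLD:
            # Enough users agree: accept the word and reuse it as a known word.
            self.known[unknown_id] = guess
            self.unknown.remove(unknown_id)
            del self.votes[unknown_id]
        return True


# Usage: three users pass the known word "upon" and agree on the unknown one.
site = ToyReCaptcha({"img1": "upon"}, ["img2"])
for _ in range(AGREEMENT_THRESHOLD):
    site.answer("img1", "img2", "upon", "morning")
print(site.known["img2"])  # morning
```

Checking the known word first is what lets the model treat the second answer as coming from a human, and the agreement threshold is what keeps a single careless or malicious user from deciding what a word says.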

Now, this is all well and good, but what am I expected to do when the reCAPTCHA challenge is:

Friday, August 22, 2008

Thoughts about the late Robert Dunne

Professor Robert Dunne died on Saturday.

This is the first time one of my professors has died. It is disturbing. My assigned adviser, Walter Feit, died right after he retired, and Serge Lang died three days after I talked to him. But it is different when the professor actually taught a class I was in.

Dunne was also much younger than either Feit or Lang. He was only 59.

That's not the only thing that makes this disturbing: I've had professors whose names and faces I barely remember. Bob Dunne was not that sort of professor. He was engaging, charismatic and thoughtful. He was always willing to stay after class and talk about material even tangentially related to the subject.

I only took one class with Professor Dunne: Computers and the Law. The class was a gut. I think that Dunne didn't realize how much material a typical Yale class covers. Between my preexisting general knowledge and the ease of the class, I could have skipped the lectures and gotten a similar grade. But I didn't skip class. Dunne was too good a lecturer. He was thoughtful and funny. I learned things from him that would never be formally articulated in a textbook. He taught me the true rule of parody: Parody is accepted as a fair-use defense only if the court finds the parody funny. I cannot think of a better example of Dunne's humor and his ability to cut through legalism. It is unfortunate that future students will not benefit from his teaching. He will be missed.

Saturday, August 16, 2008

My Reaction to Rick Warren's Civil Forum

As far as I can tell, the exchange went pretty much as expected. I share some of ERV's concerns about this exchange, though I would phrase them differently: whether or not any deities or other supernatural entities exist, and whether or not a candidate believes in them, has little bearing on whether someone can do a decent job as President. The fact that this is the second exchange focusing on religion while the proposal for a Science Debate has been ignored makes this all the more irritating. That said, in practice the questions were much more relevant than I expected. Only a small fraction were about explicitly religious topics.

Overall, the predictions of Mark and Aaron were pretty accurate. A few highlights:

When asked when he thought human life began, Obama dodged the question. He gave a reasonable, moderate position on abortion, but he did not answer the fundamental question.

Obama felt a need to say that he believes that "marriage" is a "sacred" union between one man and one woman.

McCain demonstrated once again that he is not the candidate of science. He repeated his caricature of "3 million dollars to study bear DNA." It has already been established both that this is an inaccurate description of the research in question and that the research was money well spent. (See also Orac's take on the matter.)

McCain also talked briefly about how modern communication has changed many things. He stumbled a bit, repeated himself and never quite got to the point. It might not be fair to attribute this to a lack of technological know-how, but I'm going to tentatively do it. Frankly, the only way it could have gotten much worse is if he had explained that the internet isn't a truck but a series of tubes.

Overall, I think that McCain came across better. He seemed more prepared and was more charismatic. He answered questions quickly, in short bullet points. Obama attempted what may have been more nuance but came across at times as plodding or pedantic.

I expect that more detailed (and less biased) analysis will be available soon at The Presidential Debate blog, so you should go over there (especially because my twin runs it).

Monday, August 11, 2008

Alternative medicine and Wikipedia

A group of alternative medicine practitioners have announced that they are going to start their own wiki to counter Wikipedia, which is apparently not sufficiently positive about alternative treatments. They complain that people who read Wikipedia articles on alternative medicine are being “systematically exposed to anti-CAM data.” I’ll let that phrase speak for itself. I’m not going to examine in detail the incredible idiocy and willful ignorance on display here, other than to note that they have stated that any anti-CAM data on the wiki will be swiftly removed. This will apparently occur regardless of the truth, falsity or verifiability of the data in question. There is an excellent post over at The Lay Scientist discussing this detail. I’m also not going to dwell on the fact that the first group that comes to mind as having also tried to start its own wiki was extreme right-wing Christians.[i]

What I’m actually going to address is a related issue. Some of the commentators who have remarked on this new wiki have attacked Wikipedia in passing. For example, in the otherwise excellent post I linked to above, the author felt a need to say that the alternative medicine proponents "have finally grown tired of trying to insert their claims into the sewerage system of the collective consciousness that is Wikipedia." This is unfair to Wikipedia and to the hundreds of editors who work on Wikipedia’s articles about fringe ideas.

It is a testament to how well Wikipedia functions that extremist groups unable to handle its neutral point of view policy have not been able to subvert its articles. They have been forced to go elsewhere to promote their extreme minority viewpoints. This is an example of Wikipedia succeeding. It isn’t complete success: ideally, these people would stay and help make genuinely neutral articles. But the reader can be confident that for most major alternative medicine claims, such as homeopathy and magnet therapy, the articles will accurately reflect what scientific studies have discovered about the topics, whether positive or negative. The articles will include the claims made by practitioners and will neutrally discuss what the scientific community thinks of those claims.

[i] However, I cannot resist pointing out that Conservapedia has recently decided that Leif Ericson never came to America. Apparently, claims that he did are part of a liberal plot to undermine the achievements of the Christian explorer Christopher Columbus. I’m not making this up. And before anyone comments, yes, I know that Ericson was almost certainly Christian.

While I’m pointing out absurdities on Conservapedia, they also recently announced on their main page that “41 students have already signed up for Conservapedia's in-person class this fall, perhaps making it the largest pre-college American History class in the world.” Again, I’m not making this up. Moreover, they seem to think that a large student/teacher ratio is a good thing.

Edit on January 28, 2009: The linked edits at Conservapedia no longer seem to work. I am currently trying to determine whether this is due to Conservapedia's ongoing server problems or to deliberate attempts to send them down the memory hole. I will post a follow-up entry when I have more information.

Tuesday, August 5, 2008

Aleph-naught is a large cardinal

The math level in this post may be a bit higher than in my usual math entries, so feel free to skip it. I just have to share this because I found it absolutely amazing.

ℵ₀ is a large cardinal. We just declare it to exist with an axiom that we all agree on (the axiom of infinity). ℵ₀ satisfies the requirements of all the common definitions of large cardinals, apart from the explicit requirement of uncountability that most of those definitions include. To a lot of people who have studied model theory and set theory this will likely seem trivial, but it was pretty mind-blowing to me.

This information is courtesy of Harry Altman.
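For anyone who wants the details, here are a few of the usual definitions laid out next to the matching facts about ℵ₀. This summary is my own, built from standard textbook facts rather than taken from Altman's explanation. Each definition as usually stated also demands κ > ℵ₀, and for these properties that is the only clause ℵ₀ fails. The last pair is the sense in which both need an axiom to exist.

```latex
\begin{align*}
&\textbf{Inaccessible: } \kappa > \aleph_0,\quad \operatorname{cf}(\kappa) = \kappa,\quad 2^{\lambda} < \kappa \text{ for all } \lambda < \kappa.\\
&\quad \aleph_0:\ \operatorname{cf}(\aleph_0) = \aleph_0 \ (\text{a finite union of finite sets is finite}),\quad 2^{n} < \aleph_0 \text{ for all } n < \aleph_0.\\
&\textbf{Weakly compact: } \kappa > \aleph_0,\quad \kappa \to (\kappa)^2_2.\\
&\quad \aleph_0:\ \aleph_0 \to (\aleph_0)^2_2 \ (\text{Ramsey's theorem}).\\
&\textbf{Measurable: } \kappa > \aleph_0 \text{ carries a } \kappa\text{-complete nonprincipal ultrafilter}.\\
&\quad \aleph_0:\ \text{a nonprincipal ultrafilter on } \omega \text{ exists (ultrafilter lemma)},\\
&\quad\phantom{\aleph_0:\ } \text{and every ultrafilter is closed under finite intersections, i.e. } \aleph_0\text{-complete}.\\
&\textbf{Model of set theory: } V_\kappa \models \mathrm{ZFC} \text{ for inaccessible } \kappa.\\
&\quad \aleph_0:\ V_\omega \models \mathrm{ZFC} \text{ minus the axiom of infinity}.
\end{align*}
```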