

Human Recognition System-Reverse Turing Test

Abstract:

Websites must be protected from web bots, which are scripts driven by a computer. Unlike a human, a bot cannot visually interpret a web form, and this limitation can be exploited when designing the security of the form. The simplest scheme can be described as follows: the web form contains an entry field for a security code shown in a picture, and the code in the picture is a randomly generated string. If the code is entered incorrectly, the script does not process the submitted data and redirects the user to a page where a newly generated code must be entered. This method is known as a CAPTCHA, short for "Completely Automated Public Turing test to tell Computers and Humans Apart".

Keywords:

Completely Automated Public Turing test to tell Computers and Humans Apart (CAPTCHA), Optical Character Recognition (OCR), Internet Security, Reverse Turing Test.

1. Introduction

A CAPTCHA or Captcha is a type of challenge-response test used in computing to ensure that the response is not generated by a computer. The process usually involves one computer (a server) asking a user to complete a simple test which the computer is able to generate and grade. Because other computers are unable to solve the CAPTCHA, any user entering a correct solution is presumed to be human. Thus, it is sometimes described as a reverse Turing test, because it is administered by a machine and targeted to a human, in contrast to the standard Turing test that is typically administered by a human and targeted to a machine. A common type of CAPTCHA requires that the user type letters or digits from a distorted image that appears on the screen.
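The generate-and-grade flow described above can be sketched in a few lines. This is a minimal illustration, not any particular site's implementation; rendering the code as a distorted image is omitted, and the alphabet (which drops easily confused characters like O, 0, I, and 1) is an illustrative choice:

```python
import hashlib
import hmac
import secrets

def new_challenge(length: int = 6) -> tuple[str, str]:
    """Generate a random code (to be rendered as a distorted image)
    and the digest the server keeps for grading."""
    alphabet = "ABCDEFGHJKLMNPQRSTUVWXYZ23456789"
    code = "".join(secrets.choice(alphabet) for _ in range(length))
    digest = hashlib.sha256(code.encode()).hexdigest()
    return code, digest

def grade(answer: str, digest: str) -> bool:
    """Grade the user's answer, case-insensitively and in constant time."""
    candidate = hashlib.sha256(answer.strip().upper().encode()).hexdigest()
    return hmac.compare_digest(candidate, digest)
```

The server stores only the digest between requests, so a bot that scrapes the page source learns nothing about the code itself.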

The term "CAPTCHA" (based upon the word capture) was coined in 2000 by Luis von Ahn, Manuel Blum, Nicholas J. Hopper, and John Langford (all of Carnegie Mellon University). It is a contrived acronym for "Completely Automated Public Turing test to tell Computers and Humans Apart." Carnegie Mellon University attempted to trademark the term, but the trademark application was abandoned on 21 April 2008.

Most CAPTCHA research to date has been limited to academic applications. Far more powerful algorithms will be required for commercial CAPTCHAs. As CAPTCHAs become more prevalent, bot programmers are expected to unleash armies of bots bent on breaking them. Most research programs focus on either building CAPTCHAs or breaking them through, e.g., dictionary and computer-vision attacks. PARC research is unique in that it does both: we play both offense and defense. From exploring how to break them, researchers are discovering new techniques for building CAPTCHAs that are less vulnerable. For example, Baffle Text uses non-English pronounceable character strings to defend against dictionary-driven attacks, and Gestalt-motivated image-masking degradations to defend against image restoration attacks.

A method similar to the Turing test can be used to distinguish human users from computer programs, with the difference that the human interrogator is replaced by a computer, which poses questions designed to distinguish a human user from a computer program. This method is called CAPTCHA (Completely Automated Public Turing test to tell Computers and Humans Apart). The main focus of this method is therefore on questions that a human user can easily answer but that present-day computer programs are hardly likely to be able to answer. In a reverse Turing test, the machine poses the question; a human can answer it, but a machine or current computer program cannot.

Among the other methods used for distinguishing human users from computer programs is the use of pictures of words. This method is based on the weak points of optical character recognition (OCR) programs. OCR programs are used to read text automatically, but they have difficulty with low-quality print or handwriting and can only reliably recognize high-quality typed text in common standard formats. This defect of OCR programs can be exploited by altering the picture of a word so that it can be recognized only by a human user and not by any OCR program. Section 2 elaborates on the methods used for this purpose.

2. Web Bots

A Web bot is a computer program that browses the World Wide Web in a methodical, automated manner and starts automated processes via the Internet. In most cases, bots execute simple, structured, repetitive tasks, performing them many more times than an average person could. Bots are used to collect and feed data from the Internet; in this process, automated scripts seek, analyze, and file information from web servers.

A Web crawler is a computer program that browses the World Wide Web in a methodical, automated manner or in an orderly fashion. Other terms for Web crawlers are ants, automatic indexers, bots, Web spiders, or Web robots; the process itself is called Web crawling or spidering. Many sites, in particular search engines, use spidering as a means of providing up-to-date data. Web crawlers are mainly used to create a copy of all visited pages for later processing by a search engine that indexes the downloaded pages to provide fast searches. Crawlers can also be used to automate maintenance tasks on a Web site, such as checking links or validating HTML code, and to gather specific types of information from Web pages, such as harvesting e-mail addresses (usually for spam).

A Web crawler is one type of bot, or software agent. In general, it starts with a list of URLs to visit, called the seeds. As the crawler visits these URLs, it identifies all the hyperlinks in the page and adds them to the list of URLs to visit, called the crawl frontier. URLs from the frontier are recursively visited according to a set of policies.
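The seed/frontier loop above can be sketched as follows. The link graph here is a hypothetical in-memory stand-in for real HTTP fetches, so the sketch shows only the crawl policy, not networking:

```python
from collections import deque

# Hypothetical link graph standing in for real HTTP fetches.
LINKS = {
    "http://a.example/": ["http://b.example/", "http://c.example/"],
    "http://b.example/": ["http://c.example/"],
    "http://c.example/": ["http://a.example/"],
}

def crawl(seeds, fetch_links=LINKS.get, limit=100):
    """Breadth-first crawl: pop a URL from the frontier, collect its
    hyperlinks, and enqueue every URL not yet seen."""
    frontier = deque(seeds)      # the "crawl frontier"
    seen = set(seeds)
    visited = []
    while frontier and len(visited) < limit:
        url = frontier.popleft()
        visited.append(url)
        for link in fetch_links(url) or []:
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return visited
```

A real crawler would replace `fetch_links` with an HTTP fetch plus HTML link extraction, and would apply politeness and revisit policies at the same point in the loop.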

2.1 Types of Bots

There are mainly four types of bots.

1. Chatterbot: A chatbot (or chatterbot, or chat bot) is a computer program designed to simulate an intelligent conversation with one or more human users via auditory or textual methods. Traditionally, the aim of such simulation has been to fool the user into thinking that the program's output has been produced by a human (the Turing test). Programs playing this role are sometimes referred to as Artificial Conversational Entities, talk bots or chatterboxes. More recently, however, chatbot-like methods have been used for practical purposes such as online help, personalized service, or information acquisition, in which case the program is functioning as a type of conversational agent. What distinguishes a chatbot from more sophisticated natural language processing systems is the simplicity of the algorithms used. Although many chatbots do appear to interpret human input intelligently when generating their responses, many simply scan for keywords within the input and pull a reply with the most matching keywords, or the most similar wording pattern, from a textual database.

2. Spam bot: A spam bot is one of the following:

a program that performs or assists in e-mail address harvesting (a spammer's tool);

a mail filter program that fights spam;

software used for forum spamming (a spammer's tool).

Spam bots analyze the HTML code of a web page, searching for form data. The important data are the names of the input fields and the main attributes of the form (method and action). After obtaining these data, the bots create random values for the input fields and send them to the address found in the action attribute.
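The form-scanning step a spam bot performs can be illustrated with Python's standard html.parser module; the form markup fed to the scanner below is a made-up example:

```python
from html.parser import HTMLParser

class FormScanner(HTMLParser):
    """Collect each form's method/action and its input field names --
    exactly the data a spam bot needs before posting random values."""
    def __init__(self):
        super().__init__()
        self.forms = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            self.forms.append({"method": attrs.get("method", "get"),
                               "action": attrs.get("action", ""),
                               "fields": []})
        elif tag == "input" and self.forms:
            self.forms[-1]["fields"].append(attrs.get("name"))

scanner = FormScanner()
scanner.feed('<form method="post" action="/comment">'
             '<input name="email"><input name="body"></form>')
```

After `feed()`, `scanner.forms` holds the method, action, and field names the bot would use to construct its automated submission.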

3. IRC bot: An IRC bot is a set of scripts or an independent program that connects to Internet Relay Chat as a client, and so appears to other IRC users as another user. An IRC bot differs from a regular client in that instead of providing interactive access to IRC for a human user, it performs automated functions.

An IRC bot is deployed as a detached program running from a stable host. It sits on an IRC channel to keep it open and prevents malicious users from taking over the channel. It can be configured to give channel operator status to privileged users when they join the channel, and can provide a unified channel operator list. Many of these features require that the bot be a channel operator. Thus, most IRC bots are run from computers which have long uptimes (generally running a BSD derivative or Linux) and a fast, stable Internet connection. As IRC has become popular with many dial-up users as well, special services have appeared that offer limited user-level access to a stable Linux server with a decent connection. The user may run an IRC bot from this shell account. These services are commonly known as shell providers.

4. Knowbot: A knowbot is a kind of bot that collects information by automatically gathering certain specified information from web sites.

KNOWBOT is an acronym for Knowledge-Based Object Technology. The term was coined in 1992 by Ronald T. Carr of the U.S.A. to describe computer-based objects he developed for collecting and storing specific information, using that information to accomplish a specific task, and enabling it to be shared with other objects or processes. An early use of knowbots was to provide a computerized assistant that completes repetitive, detailed tasks for users without requiring them to be trained in computer technology.

3. Types of CAPTCHAs:

3.1 Gimpy CAPTCHA

Gimpy is a more difficult variant of a word-based CAPTCHA. Ten words are presented in distortion and clutter similar to EZ-Gimpy. The words are also overlapped, providing a CAPTCHA test that can be challenging for humans in some cases. The user is required to name 3 of the 10 words in the image in order to pass the test. Our algorithm can pass this more difficult test 33% of the time.

EZ-Gimpy and Gimpy, the CAPTCHAs that we have broken, are examples of word-based CAPTCHAs. In EZ-Gimpy, the CAPTCHA used by Yahoo! (shown in the figure above), the user is presented with an image of a single word. This image has been distorted, and a cluttered, textured background has been added. The distortion and clutter are sufficient to confuse current OCR (optical character recognition) software. However, using our computer vision techniques we are able to correctly identify the word 92% of the time.

3.2 Image Recognition CAPTCHA

An image recognition CAPTCHA presents the website visitor with a grid of random pictures and instructs the visitor to click on specific pictures to verify that they are not a bot (for example, "Click on the pictures of the airplane, the boat and the clock"). Image recognition CAPTCHAs face many potential problems which have not been fully studied. It is difficult for a small site to acquire a large dictionary of images to which an attacker does not have access, and without a means of automatically acquiring new labeled images, an image-based challenge does not usually meet the definition of a CAPTCHA. KittenAuth, by default, had only 42 images in its database. Microsoft's "Asirra," which it is providing as a free web service, attempts to address this by means of Microsoft Research's partnership with Petfinder.com, which has provided it with more than three million images of cats and dogs, classified by people at thousands of US animal shelters. Researchers claim to have written a program that can break the Asirra CAPTCHA. The IMAGINATION CAPTCHA, by contrast, applies a sequence of randomized distortions to the original images to create the CAPTCHA images, so the original images can be made public without risking image-retrieval or image-annotation based attacks.

Fig: Image recognition CAPTCHA
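A minimal sketch of how such a grid challenge might be generated and graded follows; the image names, grid size, and number of target cells are illustrative assumptions, not any deployed system's values:

```python
import random

def make_grid(target, target_images, decoy_images, k=3, n=9):
    """Build an n-cell grid containing k images of the target class,
    shuffled among decoys; return the instruction to show the user,
    the grid, and the set of cell indices that must be clicked."""
    rng = random.SystemRandom()
    cells = rng.sample(target_images, k) + rng.sample(decoy_images, n - k)
    order = rng.sample(range(n), n)              # random permutation
    grid = [cells[i] for i in order]
    answer = {pos for pos, i in enumerate(order) if i < k}
    prompt = f"Click on the {k} pictures of the {target}"
    return prompt, grid, answer

def grade(clicked, answer):
    """Pass only if exactly the target cells were clicked."""
    return set(clicked) == answer
```

The scheme's security rests entirely on the labeled-image database staying secret, which is exactly the weakness the paragraph above describes.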

3.3 Graphic Based CAPTCHAS

3.3.1 BONGO

Bongo is a program that asks the user to solve a visual pattern recognition problem. In particular, Bongo displays two series of blocks, the left and the right series. The blocks in the left series differ from those in the right, and the user must find the characteristic that sets the two series apart. A possible left and right series are shown below:

(These two series are different because everything on the left is drawn with thick lines, while everything on the right is drawn with thin lines.) After seeing the two series of blocks, the user is presented with four single blocks and asked to determine whether each belongs to the right series or to the left. The user passes the test if he or she correctly determines the side to which all four blocks belong.

3.3.2 PIX

PIX is a program that has a large database of labeled images. All of these images are pictures of concrete objects (a horse, a table, a house, a flower, etc.). The program picks an object at random, finds four random images of that object in its database, distorts them at random, presents them to the user, and asks, "What are these pictures of?" (See the example below.) Current computer programs are not able to answer this question, but a human can.

Answer is: DOG

3.4 ReCAPTCHA

A reCAPTCHA challenge contains two words. One is a regular CAPTCHA word whose answer the system already knows; the other (usually the second) is a word scanned from a book that OCR software failed to read. The user must type the known word correctly to pass the test; the system cannot check the unknown word directly, so it records the user's answer for it instead. Answers for the scanned word are compared across many users, and once enough users agree, the word is accepted as the correct transcription and fed back into the book-digitization effort.

In the original reCAPTCHA it was easy to tell which word was the scan and which was the control: the scan was clear and had a line struck through it, while the control word was wavy. Because users exploited this distinction to deliberately mistype the scanned word, the presentation was later changed so that both words look largely alike, although the scanned word can often still be recognized by its slightly blurrier edges or by being an old or unusual word, or a name.
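The two-word verification logic can be sketched as follows. The agreement threshold of 3 is an illustrative assumption, not reCAPTCHA's actual parameter:

```python
from collections import Counter

class ReCaptcha:
    """One control word (known answer) plus one scanned word (unknown).
    A response passes iff the control word is correct; answers for the
    unknown word are tallied, and once enough users agree the word is
    treated as transcribed."""

    def __init__(self, control_answer, agreement=3):
        self.control_answer = control_answer
        self.agreement = agreement
        self.votes = Counter()
        self.transcription = None    # set once consensus is reached

    def submit(self, control_guess, unknown_guess):
        if control_guess.lower() != self.control_answer.lower():
            return False             # human test failed
        self.votes[unknown_guess.lower()] += 1
        word, count = self.votes.most_common(1)[0]
        if count >= self.agreement:
            self.transcription = word
        return True
```

Note that a single user's answer for the unknown word is never trusted on its own; only cross-user agreement promotes it to a transcription.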

4. Techniques in Password Security

Ideally, people would choose passwords uniformly from the set of all possible passwords. However, passwords tend to be low-entropy and are typically taken, with high probability, from a relatively small domain (a dictionary). This is problematic because it lends itself to a dictionary attack, in which the attacker typically succeeds after trying only the passwords in the dictionary (as opposed to trying the entire password space). Dictionary attacks come in two varieties: online and offline. In an online dictionary attack, each password attempt is sent to a verifier (a program not under the attacker's control, a remote machine, etc.) to check. In an offline attack, the attacker knows something that allows him to determine a password's correctness by himself (for example, he might know how to compute h, and that h(p) = x). In this section, we survey a variety of techniques designed to address these problems.

4.1 Early Work and Current Practice

In the 1960s, it was common practice for login usernames and passwords to be stored unencrypted in a password file designed to be unreadable to the system's users. When a user attempts to log in with password p0, the system looks up the password p associated with the username. If p = p0, the login is successful; otherwise the user is rejected.

This practice was changed when a notable bug on the CTSS time-sharing system caused the "Message of the Day," shown at login, to be overwritten with a copy of the password file, exposing everyone's password. It became clear that another approach, one in which the password file need not remain secret, was necessary: store key = h(p) for a hard-to-invert function h. When a user attempts to log in with password p0, compute h(p0) and succeed if h(p0) = key. Publishing the password file does not lead to catastrophic results, since one cannot log in by submitting key, and computing its inverse, the actual password, is hard. Unfortunately, this method is susceptible to an offline dictionary attack if the attacker can get hold of the password file. The idea here is that the attacker pre-computes h(p0) for every password in his dictionary; once he obtains the password file, he merely has to scan his list of hashes for matches. This attack is quite feasible, even on a PDP-11.
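The pre-computation attack just described can be sketched as follows; SHA-256 stands in for whatever unsalted hash function h the system uses:

```python
import hashlib

def precompute(dictionary):
    """Attacker's table, built once: hash of every dictionary word -> word."""
    return {hashlib.sha256(w.encode()).hexdigest(): w for w in dictionary}

def crack(stolen_hashes, table):
    """Scan a stolen (unsalted) password file against the table.
    Returns the accounts whose passwords were in the dictionary."""
    return {user: table[h] for user, h in stolen_hashes.items() if h in table}
```

Because the table is built before the password file is ever stolen, the per-file cost of the attack is a simple lookup; this is precisely why modern systems add a per-user salt, which forces the attacker to redo the pre-computation for every account.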

4.2 A Web-Based Solution

Pinkas and Sander examine the problem of defending against online dictionary attacks in [PS02], specifically focusing on defending a Web site that requires login. They begin by describing, and then refuting, two commonly implemented countermeasures against online dictionary attacks: delayed response and account locking. In the delayed-response strategy, the server simply waits for a second or two before responding yes or no to the login request; this means that an attacker can only try passwords at a slow, system-specified rate. In the account-locking strategy, a given account is locked out for, say, five hours after three incorrect password attempts.

These countermeasures do well if the attacker is trying to compromise a specific account, but fail if the goal is to compromise any account. For example, the attacker can, in parallel, request (u1, p1), (u1, p2), etc., which circumvents the delayed-response scheme. Furthermore, if the list of usernames is large (as it is for most web services), there are plenty of other names to try while pausing five hours between attempts on any one account. In addition, account locking provides a vector for a denial-of-service attack (and a customer-service nightmare).

A simple solution is the following: prompt the user for a CAPTCHA solution. If the CAPTCHA is solved correctly, prompt for the username and password; otherwise, give up on the process. This turns out to be somewhat arduous for a serious production-quality Web site, though, because generating CAPTCHAs for every single login attempt may be too expensive for the Web server, and because it represents added work that the user would most likely be reluctant to deal with every time. Instead, Pinkas and Sander propose another scheme, which we present here in simplified form:

1. Prompt for the username and password.

2. Check whether the client has a valid cookie.

3. If he does:

(a) Password is correct: Accept.

(b) Password is incorrect: show the user a CAPTCHA and Reject regardless of its result.

4. If he does not:

(a) Password is correct: show the user a CAPTCHA. If it is solved correctly, Accept and set a cookie; else Reject.

(b) Password is incorrect: show the user a CAPTCHA and Reject regardless of its result.
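The steps above can be sketched as a single function; `check_password` and `solve_captcha` are hypothetical callbacks standing in for the site's credential check and CAPTCHA round-trip:

```python
def login(username, password, has_valid_cookie, check_password, solve_captcha):
    """Simplified Pinkas-Sander login flow.
    Returns (accepted, set_cookie)."""
    correct = check_password(username, password)
    if has_valid_cookie:
        if correct:
            return True, False       # 3a: accept without a CAPTCHA
        solve_captcha()              # 3b: show a CAPTCHA, reject regardless
        return False, False
    if correct:
        if solve_captcha():          # 4a: accept only if the CAPTCHA is solved
            return True, True        #     ... and set a cookie
        return False, False
    solve_captcha()                  # 4b: show a CAPTCHA, reject regardless
    return False, False
```

Showing a CAPTCHA even on incorrect passwords (cases 3b and 4b) is deliberate: it prevents an attacker from using the presence or absence of the CAPTCHA as an oracle for password correctness.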

This approach is desirable because legitimate users of the system only have to solve CAPTCHAs in two cases: when they log in from a new computer, or when their cookie expires (after, say, 100 successful logins). The chief solvers of the CAPTCHAs are people who mistype their passwords and the would-be attackers. Pinkas and Sander also present a modification to the system in which only a fraction of incorrect password attempts require solving a CAPTCHA.

4.3 INKBLOTS FOR STRONGER PASSWORDS

Another approach of note is one that tries to improve the quality of the user-selected password. The system, Inkblots [SS04], generates a series of images that look like Rorschach inkblots and displays them to the user. The user is supposed to come up with an association for each inkblot and, to create a password, takes the first and last letters of each association and concatenates them. So, if inkblots 1, 2, and 3 reminded the user of "helicopter," "hippo doing stretches," and "crab," the resulting password would be "hrhscb". These inkblots can be seen in the figure below.
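The password-derivation rule can be written directly; this is a sketch of the scheme as described above, with the assumption that spaces inside an association are ignored:

```python
def inkblot_password(associations):
    """Concatenate the first and last letters of each inkblot
    association, as in the [SS04] scheme. Spaces are ignored so
    that multi-word associations contribute two letters, like
    single words."""
    squeezed = (a.replace(" ", "") for a in associations)
    return "".join(a[0] + a[-1] for a in squeezed)
```

With ten inkblots this yields a 20-character string whose letters are tied to the user's private associations rather than to dictionary words.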

Here, unlike in any of the strategies above, the user is essentially given an algorithm for picking a password based on computer input, rather than coming up with his or her own and having it strengthened by other mechanisms. While innovative, it is unclear to what extent rigorous analysis can be performed on the Inkblots: if the majority of them look similar, it is not clear that they will be a good source of entropy. However, preliminary studies done (and referenced) in [SS04] look promising. That point notwithstanding, the Inkblots display an approach different from any discussed above, one in which the user is encouraged to create a password more resilient to dictionary attacks, rather than one that simply makes dictionary attacks more difficult.

5. CONCLUSION:

In this paper we have presented basic information about securing web forms against web bots (robots), with emphasis on the CAPTCHA method, which is the method most often used. Although it has some restrictions, in combination with other techniques it is a good choice for security. Modern bots have algorithms that successfully defeat the CAPTCHA method, yet most web sites use their own form of CAPTCHA implementation, which currently makes life rather difficult for the creators of web bots, because they must create a new bot for every web site. To secure a web form, it is necessary to consider all aspects of possible threats. It is recommended to offer alternative solutions, such as listening to an audio file of the generated code. However, one should not use a wide range of security techniques; one or two methods are enough, because web sites also aim to be user friendly.

6. FUTURE SCOPE

Many CAPTCHA alternatives now exist, and many web sites have replaced CAPTCHA with one of them. Most CAPTCHA alternatives are accessible and easy to use. In one CAPTCHA accessibility survey, 23.73% of respondents said they were not using CAPTCHA on their web sites. CAPTCHA could disappear in the coming few years because it is inaccessible and faces a hard challenge from accessible alternatives. We hope CAPTCHA's creators find some way to make it accessible; in fact, it may well disappear before they do.

REFERENCE PAPERS

1. Luis von Ahn, Manuel Blum and John Langford. Telling Humans and Computers Apart Automatically. In Communications of the ACM.

2. Greg Mori and Jitendra Malik. Recognizing Objects in Adversarial Clutter: Breaking a Visual CAPTCHA. In CVPR. (Explains how to break a simple CAPTCHA.)

3. For CAPTCHA www.captcha.net

4. Luis von Ahn, Manuel Blum, Nicholas Hopper, and John Langford. CAPTCHA: Using Hard AI Problems for Security. In Eurocrypt.

5. www.wikipedia.org/ocr

6. S. Madhvanath and V. Govindaraju. The role of holistic paradigms in handwritten word recognition. IEEE Trans.PAMI, 23(2), February 2001.

7. G. Mori, S. Belongie, and J. Malik. Shape contexts enable efficient retrieval of similar shapes. In CVPR, volume 1, pages 723-730, 2001.

8. S. Belongie, J. Malik, and J. Puzicha. Shape matching and object recognition using shape contexts. IEEE Trans. PAMI, 24(4):509-522, April 2002.

9. M. Chew and J. D. Tygar. Image Recognition CAPTCHAs. In Proceedings of the 7th Annual Information Security Conference (Palo Alto, CA, USA, 2004), 268-279.

10. Datta, R., Li, J., and Wang, J. Z. Exploiting the Human-Machine Gap in Image Recognition for Designing CAPTCHAs. IEEE Transactions on Information Forensics and Security, 4 (3), 504-518.

11. Beowebhost forum, http://www.beowebhost.com/forum

12. Wikipedia, http://en.wikipedia.org/wiki/Captcha


By: Bandu Madar



