***************************************
ACCESSING THE WWW BY E-MAIL
USER GUIDE TO WWW SEARCH ENGINES
HOW TO "CRACK" SEARCH ENGINES BY E-MAIL
***************************************
Copyright (c) 1996-2000, Gerald E. Boyd
gboyd@expita.com
Version 2.0 30Jul00
All rights reserved. Permission is granted to duplicate and
distribute copies of this document provided the copyright
notice and this permission notice are preserved on all copies.
GETTING THE LATEST VERSION
~~~~~~~~~~~~~~~~~~~~~~~~~~
You can get the file by anonymous FTP at:
ftp.expita.com/wscrack.faq
Or by Agora, Getweb, or W3mail
ftp://ftp.expita.com/wscrack.faq
The introduction file (wsintro.faq) and How To "CRACK" Search Engines by
E-mail (wscrack.faq) are located in the same places.
If you want all the files in one zip file, download the file
titled wssearch.zip
CHANGES
~~~~~~~
Version 2.0 30Jul00 Revised AltaVista, added Go (old InfoSeek), revised
Getting the Latest Version and my address
Version 1.9 02Jul99 Revised AltaVista and InfoSeek; added new variables;
added new section about ACTION fields: POST and GET methods.
Version 1.8 31Jan99 Revised AltaVista and InfoSeek -- new addresses
Version 1.7 31Aug98 Revised AltaVista and InfoSeek examples
Version 1.6 10Jan98 Revised AltaVista and InfoSeek examples,
reformatted
Version 1.5 31May97 Revised InfoSeek example again to show new changes
Version 1.4 02Feb97 Revised InfoSeek example to show new changes
Version 1.3 29Sep96 Updated getting the latest version, added changes,
added missing &col=WW in final URL for the InfoSeek example.
Version 1.2 09Aug96 changed ftp directory
Version 1.1 27Jul96 First issue as a FAQ
PURPOSE
~~~~~~~
I have provided many FAQs about how to use search engines by email.
You may be wondering how I worked some of these out, so that you can
begin to do the same thing for yourself.
To begin with, Web search engines are normally presented as HTML
(Hypertext Markup Language) forms, which let users enter queries and
press the Enter key or click a "begin search" button to retrieve the
results of those queries.
To use these forms you must have a web browser and an Internet
connection.
Many of these forms send their input to a program in a cgi-bin (Common
Gateway Interface) directory on the server, which uses the search
parameters to retrieve data from the search engine's database.
Through trial and error it is possible to translate some of these forms
into email URLs. Not only Web search engine forms, but other types of
forms as well, can be used via email.
What we want to do is find the HTML <FORM tag and the matching </FORM>
tag. Between these two tags lies all the HTML code that displays the
form to the user. This code also contains all the relevant information
we will need to construct an email URL, so we must know how to strip
out the HTML code that is useless to us when constructing an email URL.
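Finding the form in a saved page can be automated. The Python sketch
below is only an illustration, not part of the method described here:
it assumes the page source is already in a string (the sample page is
made up) and that forms are not nested, and it uses a regular
expression to pull out each <FORM ... </FORM> span.

```python
import re

# Matches from "<FORM" up to the first "</FORM>" after it, ignoring case
# and spanning line breaks. Assumes forms are not nested and that no
# "</FORM>" text appears inside comments or scripts.
FORM_RE = re.compile(r"<form\b.*?</form\s*>", re.IGNORECASE | re.DOTALL)

def extract_forms(html):
    """Return every <FORM ...> ... </FORM> span found in the page source."""
    return FORM_RE.findall(html)

# A made-up page used only for illustration.
page = """<HTML><TITLE>Example</TITLE>
<P>Some text we do not need.
<FORM name=mfrm action=/cgi-bin/query>
<INPUT maxLength=800 size=53 name=q>
</FORM>
<P>More text we do not need.</HTML>"""

for form in extract_forms(page):
    print(form)
```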
FORM ACTION
~~~~~~~~~~~
Excerpts from RFC1866: Hypertext Markup Language - 2.0
The <FORM> element contains a sequence of input elements, along
with document structuring elements. The attributes are:
ACTION
specifies the action URI for the form. The action URI of
a form defaults to the base URI of the document (see 7,
"Hyperlinks").
METHOD
selects a method of accessing the action URI. The set of
applicable methods is a function of the scheme of the
action URI of the form. See 8.2.2, "Query Forms:
METHOD=GET" and 8.2.3, "Forms with Side-Effects:
METHOD=POST".
Almost all "submit" type web pages use the FORM METHOD=POST action.
Almost all simple search engines use the METHOD=GET action.
The METHOD=POST actions are difficult or nearly impossible to duplicate
via email. The POST method is intended for submissions that change
something on the server (so repeating them is not harmless) or that
send large amounts of data to the server. The GET method places the
entire query in the URL itself, so a request can be reproduced from
the URL alone, which makes it simply easier to duplicate via email.
Of course, there is a lot more to this than stated above, because you
must also understand how the web server handles each METHOD.
Excerpts from RFC1866: Hypertext Markup Language - 2.0
8.2.2. Query Forms: METHOD=GET
If the processing of a form is idempotent (i.e. it has no lasting
observable effect on the state of the world), then the form method
should be `GET'. Many database searches have no visible side-effects and
make ideal applications of query forms.
To process a form whose action URL is an HTTP URL and whose method is
`GET', the user agent starts with the action URI and appends a `?' and
the form data set, in `application/x-www-form-urlencoded' format as
above. The user agent then traverses the link to this URI just as if it
were an anchor (see 7.2, "Activation of Hyperlinks").
NOTE - The URL encoding may result in very long URIs, which cause some
historical HTTP server implementations to exhibit defective behavior. As
a result, some HTML forms are written using `METHOD=POST' even though
the form submission has no side-effects.
8.2.3. Forms with Side-Effects: METHOD=POST
If the service associated with the processing of a form has side effects
(for example, modification of a database or subscription to a service),
the method should be `POST'.
(Notice the phrase "...subscription to a service..." -- G.E. Boyd)
To process a form whose action URL is an HTTP URL and whose method is
`POST', the user agent conducts an HTTP POST transaction using the
action URI, and a message body of type
`application/x-www-form-urlencoded' format as above. The user agent
should display the response
from the HTTP POST interaction just as it would display the response
from an HTTP GET above.
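To see the difference concretely, the following Python sketch builds,
without sending anything, the request a browser would make for each
METHOD, using the standard urllib.request.Request class. The action
URI and field names are hypothetical. With GET the whole form data set
ends up in the URL, which is exactly what an email URL can carry; with
POST the same data travels in the message body, which a URL cannot
express.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical form: the action URI and field values are made up.
action = "http://www.example.com/cgi-bin/search"
form_data = [("q", "frog dissection"), ("lang", "en")]
encoded = urlencode(form_data)  # application/x-www-form-urlencoded

# METHOD=GET: the form data set is appended to the action URI after "?".
get_req = Request(action + "?" + encoded)

# METHOD=POST: the same data goes in the message body, not in the URI.
post_req = Request(action, data=encoded.encode("ascii"),
                   headers={"Content-Type": "application/x-www-form-urlencoded"})

# Nothing is sent; we only inspect what each request would look like.
print(get_req.get_method(), get_req.full_url)
print(post_req.get_method(), post_req.full_url, post_req.data)
```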
Well, it turns out that most of the simple search engine sites that I
have "cracked" use a GET action. The search engine sites that use a
POST method probably should have used a GET method; that is, the site
returns the same results from the POST that a GET would have returned.
That is why I could "crack" those POST forms: the same parameters sent
as a GET query string worked.
When a form's METHOD=POST is truly used as described in section 8.2.3,
I haven't been able to crack the site, because all the processing is
done on the server side by the CGI scripts and only the answer is
returned.
I don't think it is possible to handle a "true" METHOD=POST, because
the processing of the METHOD is done on the server side. If the person
who wrote the server code did not write it to handle a GET, the request
will fail. The HTTP server code has to take into account both
METHOD=GET and METHOD=POST and process each accordingly, and it is up
to the person who installed the server code to make sure that it checks
for either. From my experience setting up and running an OS/2 HTTP
server, I know that these actions have to be handled by the server. In
fact, I only handled GET methods (POST methods just fell through, as I
didn't write any code to handle them).
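A CGI script receives the two methods in different places, which is
why a script written for only one of them ignores the other. The Python
sketch below is an illustration, not the code of any particular server:
it follows the standard CGI convention that GET data arrives in the
QUERY_STRING environment variable and POST data arrives on standard
input, CONTENT_LENGTH bytes long. read_form_data is a hypothetical
helper. A script that had only the POST branch would never look at a
query string sent by email.

```python
import io
from urllib.parse import parse_qs

def read_form_data(environ, stdin):
    """Return the submitted form fields for one CGI request (a sketch)."""
    method = environ.get("REQUEST_METHOD", "GET").upper()
    if method == "GET":
        # GET: everything after the "?" of the URL.
        raw = environ.get("QUERY_STRING", "")
    elif method == "POST":
        # POST: the message body, read from standard input.
        length = int(environ.get("CONTENT_LENGTH") or 0)
        raw = stdin.read(length).decode("ascii")
    else:
        raise ValueError("unsupported method: " + method)
    return parse_qs(raw)  # "+" is decoded back to a blank

# Simulated requests; no real server is involved.
get_env = {"REQUEST_METHOD": "GET", "QUERY_STRING": "q=frog+dissection"}
post_env = {"REQUEST_METHOD": "POST", "CONTENT_LENGTH": "17"}
print(read_form_data(get_env, io.BytesIO()))
print(read_form_data(post_env, io.BytesIO(b"q=frog+dissection")))
```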
NOTE: In light of the development of the www4mail servers, it appears
that most obstacles to the POST action have been overcome. However,
even the www4mail servers don't always work. I have had trouble with
Deja Advanced searching and Lycos Advanced searching: both the www4mail
form submissions and the query strings fail. The query strings also
fail using Agora servers at the same two sites.
EXAMPLE USING ALTAVISTA
~~~~~~~~~~~~~~~~~~~~~~~
Let's take a look at a Web search form by viewing the source HTML code
of the form. I will use AltaVista (http://www.altavista.com) as
an example.
I have simplified the display of the HTML page by stripping out all the
code preceding the <FORM statement and all the code succeeding the
</FORM> statement. We now have the following:
AltaVista - Welcome
<FORM name=mfrm action=/cgi-bin/query>
<INPUT maxLength=800 size=53 name=q>
<SELECT name=kl>
<OPTION value=XX selected>any language
<OPTION value=en>English
<OPTION value=zh>Chinese
<OPTION value=cs>Czech
<OPTION value=da>Danish
<OPTION value=nl>Dutch
<OPTION value=et>Estonian
<OPTION value=fi>Finnish
<OPTION value=fr>French
<OPTION value=de>German
<OPTION value=el>Greek
<OPTION value=he>Hebrew
<OPTION value=hu>Hungarian
<OPTION value=is>Icelandic
<OPTION value=it>Italian
<OPTION value=ja>Japanese
<OPTION value=ko>Korean
<OPTION value=lv>Latvian
<OPTION value=lt>Lithuanian
<OPTION value=no>Norwegian
<OPTION value=pl>Polish
<OPTION value=pt>Portuguese
<OPTION value=ro>Romanian
<OPTION value=ru>Russian
<OPTION value=es>Spanish
<OPTION value=sv>Swedish
</OPTION>
</SELECT>
<INPUT type=image height=20 alt=Search width=58 src="AltaVista_files/search.gif"
align=absMiddle border=0 name=search>
<INPUT type=hidden value=q name=pg>
<INPUT type=hidden value=on name=Translate>
</FORM>
Stripping out the HTML code that is of no use to us leaves:
<FORM name=mfrm action=/cgi-bin/query>
<INPUT maxLength=800 size=53 name=q>
<SELECT name=kl>
<OPTION value=XX selected>any language
<OPTION value=en>English
<OPTION value=zh>Chinese
<OPTION value=cs>Czech
<OPTION value=da>Danish
<OPTION value=nl>Dutch
<OPTION value=et>Estonian
<OPTION value=fi>Finnish
<OPTION value=fr>French
<OPTION value=de>German
<OPTION value=el>Greek
<OPTION value=he>Hebrew
<OPTION value=hu>Hungarian
<OPTION value=is>Icelandic
<OPTION value=it>Italian
<OPTION value=ja>Japanese
<OPTION value=ko>Korean
<OPTION value=lv>Latvian
<OPTION value=lt>Lithuanian
<OPTION value=no>Norwegian
<OPTION value=pl>Polish
<OPTION value=pt>Portuguese
<OPTION value=ro>Romanian
<OPTION value=ru>Russian
<OPTION value=es>Spanish
<OPTION value=sv>Swedish
</OPTION>
</SELECT>
<INPUT type=hidden value=q name=pg>
<INPUT type=hidden value=on name=Translate>
</FORM>
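The stripping step can also be sketched in code. The Python class
below, FormFields, is a hypothetical helper (not from any library)
built on the standard html.parser module. It keeps the ACTION and
METHOD, the default value of each text or hidden INPUT, the CHECKED
radio button, and the SELECTED option of each list, and it skips
submit and image buttons. It assumes every OPTION has a value
attribute.

```python
from html.parser import HTMLParser

class FormFields(HTMLParser):
    """Reduce one HTML form to its ACTION, METHOD and name=value defaults."""

    def __init__(self):
        super().__init__()
        self.action = None
        self.method = "get"      # HTML's default METHOD when none is given
        self.fields = []         # (name, default value) pairs in page order
        self._select = None      # NAME of the <SELECT> we are inside, if any
        self._first = None       # value of that list's first OPTION
        self._chosen = False     # has a SELECTED option been recorded yet?

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)  # tag and attribute names arrive lowercased
        if tag == "form":
            self.action = a.get("action")
            self.method = (a.get("method") or "get").lower()
        elif tag == "input":
            kind = (a.get("type") or "text").lower()
            if kind in ("submit", "image", "reset") or "name" not in a:
                return   # buttons carry no search data we need
            if kind in ("radio", "checkbox") and "checked" not in a:
                return   # only the CHECKED choice is submitted
            self.fields.append((a["name"], a.get("value") or ""))
        elif tag == "select":
            self._select, self._first, self._chosen = a.get("name"), None, False
        elif tag == "option" and self._select is not None:
            value = a.get("value") or ""
            if self._first is None:
                self._first = value
            if "selected" in a and not self._chosen:
                self.fields.append((self._select, value))
                self._chosen = True

    def handle_endtag(self, tag):
        if tag == "select" and self._select is not None:
            if not self._chosen and self._first is not None:
                # No SELECTED option: a browser submits the first one.
                self.fields.append((self._select, self._first))
            self._select = None

# A shortened copy of the AltaVista form above.
form = """<FORM name=mfrm action=/cgi-bin/query>
<INPUT maxLength=800 size=53 name=q>
<SELECT name=kl>
<OPTION value=XX selected>any language
<OPTION value=en>English
</SELECT>
<INPUT type=image alt=Search src="search.gif" name=search>
<INPUT type=hidden value=q name=pg>
<INPUT type=hidden value=on name=Translate>
</FORM>"""

parser = FormFields()
parser.feed(form)
parser.close()
print(parser.action, parser.method)
print(parser.fields)
```

The result is the ACTION plus the q, kl, pg and Translate variables,
which is the same information the hand-stripped listing keeps.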
To construct an email URL, we will need to have an ACTION followed by a
?QUERY string.
The ?QUERY string will begin with ?variable=value followed by one or
more &variable=value pairs.
NOTE: The query always begins with a question mark (?) followed by a
variable name, followed by an equal sign (=), followed by a string.
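The rule can be written as a tiny Python function. build_query is a
hypothetical helper for illustration; it assumes the values are
already encoded (blanks as "+", special characters as %XX).

```python
def build_query(pairs):
    """Join (variable, value) pairs: "?" before the first, "&" before the rest."""
    parts = [name + "=" + value for name, value in pairs]
    return "?" + "&".join(parts) if parts else ""

print(build_query([("q", "frog+dissection"), ("kl", "XX")]))
```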
Normally the ACTION= attribute names the program that executes the
query, and the other fields supply the values that form the query. That
program is often kept in a "cgi-bin" (Common Gateway Interface)
directory, where the actual search parameters are used to retrieve data
from the search engine's database.
So, to begin our email search URL we will use the value following the
action= word (it may or may not be enclosed in quote marks). In this
case it refers to the "cgi-bin" directory: /cgi-bin/query
For cgi scripts and subdirectories like this (the value begins with a
slash), we just append the value to the site name to form our email
URL. So we now have:
http://www.altavista.com/cgi-bin/query
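Python's standard urllib.parse.urljoin applies the same resolution
rules a browser uses for the ACTION value, which is handy when the
value does not begin with a slash. In this sketch the second page
address and ACTION are made up: a relative ACTION is resolved against
the directory of the page that holds the form.

```python
from urllib.parse import urljoin

# An ACTION beginning with "/" replaces the whole path of the page address.
print(urljoin("http://www.altavista.com/", "/cgi-bin/query"))

# A hypothetical ACTION without a leading "/" is relative to the page's directory.
print(urljoin("http://www.example.com/search/form.html", "query.cgi"))
```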
Now we need to construct the ?QUERY string which comes after the "?"
(question mark).
Our search keywords will go into the variable defined by
<INPUT maxLength=800 size=53 name=q>
This statement indicates that the variable "q" (NAME=) can contain 800
characters maximum (MAXLENGTH=800).
With this information, we now have part of the ?QUERY string:
?q=
Our email URL at this point, looks like this:
http://www.altavista.com/cgi-bin/query?q=
Assume we are searching on the string "frog dissection". Note that the
search string cannot contain any blanks; words are separated by a "+"
(plus sign). Other characters in the search string may need to be
converted to hexadecimal values preceded by a percent sign (%).
With this information, we can complete the first "?variable=value"
pair:
?q=frog+dissection
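The conversion can be spelled out in a few lines of Python.
encode_search is a hypothetical helper written for this illustration,
and it assumes an ASCII search string; in practice the standard
urllib.parse.quote_plus does the same job (and also handles non-ASCII
text by encoding it as UTF-8 first).

```python
from urllib.parse import quote_plus

# Characters that may appear as is in the encoded string.
SAFE = set("ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_.~")

def encode_search(text):
    """Encode an ASCII search string the way a form does (a sketch)."""
    out = []
    for ch in text:
        if ch == " ":
            out.append("+")                       # blanks become plus signs
        elif ch in SAFE:
            out.append(ch)                        # letters, digits, -_.~ pass as is
        else:
            out.append("%{:02X}".format(ord(ch)))  # anything else: % and two hex digits
    return "".join(out)

print(encode_search("frog dissection"))
print(quote_plus("frog dissection"))  # the standard library gives the same result
```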
If you want to use quotes (") in your search words, use %22
example: %22spam%22, %22frog+dissection%22
If you want to use parentheses () in your search words, use %28 for a
left paren "(" and %29 for a right paren ")"
example: %28spam%29, %28frog+dissection%29
If you want to use commas (",") in your search words, use %2C
example: man%2Cwoman
You can use hyphens ("-") as is in your search words
example: cable-networks
If you want to use brackets [] in your search words, use %5B for a left
bracket "[" and %5D for a right bracket "]"
example: %5Bspam%5D, %5Bfrog+dissection%5D
If you want to use a plus sign ("+") in your search, use %2B
example: man+%2Bwoman
You can use a minus sign ("-") as is in your search words
example: python+-monty
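The standard urllib.parse.quote_plus function produces these
encodings, so the list can be checked mechanically. This Python sketch
compares its output with each example (for brackets it expects %5B and
%5D).

```python
from urllib.parse import quote_plus

# Raw search string paired with the encoded form it should become.
examples = [
    ('"spam"', "%22spam%22"),
    ('"frog dissection"', "%22frog+dissection%22"),
    ("(frog dissection)", "%28frog+dissection%29"),
    ("man,woman", "man%2Cwoman"),
    ("cable-networks", "cable-networks"),
    ("[spam]", "%5Bspam%5D"),
    ("man +woman", "man+%2Bwoman"),
    ("python -monty", "python+-monty"),
]

for raw, expected in examples:
    encoded = quote_plus(raw)
    print(raw, "->", encoded)
    assert encoded == expected, (raw, encoded)
```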
Next, we construct the "&variable=value" pairs for the remaining
variables, whose names we get from the source HTML code. These come
from two types of statements in the HTML code:
<SELECT NAME= and <INPUT TYPE=
Some forms have options that you choose by selecting a button, a
scrolling list, or a check box.
AltaVista uses a scrolling list to display documents in various
languages.
In the view source mode, you will see the following:
... Or enter a few words in
<SELECT NAME=kl>
<OPTION value=XX selected>any language
<OPTION value=en>English
<OPTION value=zh>Chinese
<OPTION value=cs>Czech
<OPTION value=da>Danish
<OPTION value=nl>Dutch
<OPTION value=et>Estonian
<OPTION value=fi>Finnish
<OPTION value=fr>French
<OPTION value=de>German
<OPTION value=el>Greek
<OPTION value=he>Hebrew
<OPTION value=hu>Hungarian
<OPTION value=is>Icelandic
<OPTION value=it>Italian
<OPTION value=ja>Japanese
<OPTION value=ko>Korean
<OPTION value=lv>Latvian
<OPTION value=lt>Lithuanian
<OPTION value=no>Norwegian
<OPTION value=pl>Polish
<OPTION value=pt>Portuguese
<OPTION value=ro>Romanian
<OPTION value=ru>Russian
<OPTION value=es>Spanish
<OPTION value=sv>Swedish
</SELECT>
Using the above source data, we can construct the second
"&variable=value" pair:
&kl=XX (in any language - default because of the word SELECTED)
=en English
=zh Chinese
=cs Czech
=da Danish
=nl Dutch
=et Estonian
=fi Finnish
=fr French
=de German
=el Greek
=he Hebrew
=hu Hungarian
=is Icelandic
=it Italian
=ja Japanese
=ko Korean
=lv Latvian
=lt Lithuanian
=no Norwegian
=pl Polish
=pt Portuguese
=ro Romanian
=ru Russian
=es Spanish
=sv Swedish
After making a choice (for example - in any language) our email URL
looks like this:
http://www.altavista.com/cgi-bin/query?q=frog+dissection&kl=XX
Hidden values are *NOT* shown to a user viewing the web page in a
browser. However, they are required as part of the ?QUERY string.
In our example above, we have two hidden values:
<INPUT type=hidden value=q name=pg>
<INPUT type=hidden value=on name=Translate>
Using the above source data, we can construct the two hidden
"&variable=value" pairs:
&pg=q
&Translate=on
So our final URL looks like this:
http://www.altavista.com/cgi-bin/query?q=frog+dissection&kl=XX&pg=q&Translate=on
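The same URL can be assembled with the standard urllib.parse.urlencode
function, which joins the pairs with "&" and encodes each value
(blanks become "+"). This is a sketch of the string construction only;
the AltaVista address and variables are the ones from this 2000-era
example and may no longer be accepted by the site.

```python
from urllib.parse import urlencode

site = "http://www.altavista.com"
action = "/cgi-bin/query"
pairs = [
    ("q", "frog dissection"),  # text box; the blank becomes "+"
    ("kl", "XX"),              # SELECT list: any language
    ("pg", "q"),               # hidden value
    ("Translate", "on"),       # hidden value
]
url = site + action + "?" + urlencode(pairs)
print(url)
```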
EXAMPLE USING GO
~~~~~~~~~~~~~~~~
Let's take a look at another Web search form by viewing the source
HTML code of the form. I will use Go (http://www.go.com) as an
example.
I have simplified the display of the HTML page by stripping out all the
code preceding the <FORM statement and all the code succeeding the
</FORM> statement. We now have the following:
GO Network
<FORM name=WW_dir_search action=/Titles method=get>
<TBODY>
Sunday, July 30, 2000
<SPACER width="14" type="block">
<TBODY>
SEARCH
<INPUT type=radio CHECKED value=WW name=col>the Web
<INPUT type=radio value=MC name=col>Images
<INPUT type=radio value=AV name=col>Audio/Video
<INPUT maxLength=511 size=40 name=qt>
<INPUT type=hidden value=home_searchbox name=svx>
<INPUT type=hidden value=IS name=sv>
<INPUT type=hidden value=noframes name=lk>
<INPUT type=submit value=Find>
</FORM>
Stripping out the HTML code that is of no use to us leaves:
<FORM name=WW_dir_search action=/Titles method=get>
SEARCH
<INPUT type=radio CHECKED value=WW name=col>the Web
<INPUT type=radio value=MC name=col>Images
<INPUT type=radio value=AV name=col>Audio/Video
<INPUT maxLength=511 size=40 name=qt>
<INPUT type=hidden value=home_searchbox name=svx>
<INPUT type=hidden value=IS name=sv>
<INPUT type=hidden value=noframes name=lk>
</FORM>
To construct an email URL, we will need to have an ACTION followed by a
?QUERY string.
The ?QUERY string will begin with ?variable=value followed by one or
more &variable=value pairs.
NOTE: The query always begins with a question mark (?) followed by a
variable name, followed by an equal sign (=), followed by a string.
Normally the ACTION= attribute names the program that executes the
query, and the other fields supply the values that form the query. That
program is often kept in a "cgi-bin" (Common Gateway Interface)
directory, where the actual search parameters are used to retrieve data
from the search engine's database.
So, to begin our email search URL we will use the value following the
action= word (it may or may not be enclosed in quote marks). In this
case it refers to "/Titles": /Titles
For cgi scripts and subdirectories like this (the value begins with a
slash), we just append the value to the site name to form our email
URL. So we now have:
http://www.go.com/Titles
Now we need to construct the ?QUERY string which comes after the "?"
(question mark).
Normally the search string is the first variable in the query; in the
case of Go, however, it comes later.
The first variable in our query will come from the variable names that
we get from the source HTML code. We get these from two types of
statements in the HTML code:
<SELECT NAME= and <INPUT TYPE=
Some forms have options that you choose by selecting a button, a
scrolling list, or a check box. Go uses radio buttons to choose what to
search:
- the Web
- Images
- Audio/Video
In the view source mode, you will see the following:
<INPUT type=radio CHECKED value=WW name=col>the Web
<INPUT type=radio value=MC name=col>Images
<INPUT type=radio value=AV name=col>Audio/Video
Using the above source data, we can construct the query as a
"?col=value" pair:
?col=WW (World Wide Web)
?col=MC (Images)
?col=AV (Audio/Video)
After making a choice (for example - the World Wide Web) our
email URL looks like this:
http://www.go.com/Titles?col=WW
Our search string will go into the variable defined by
<INPUT maxLength=511 size=40 name=qt>
This statement indicates that the variable "qt" (NAME=) has no default
value and that it can contain 511 characters maximum (MAXLENGTH=511).
Assume we are searching on the string "frog dissection". Note that the
search string cannot contain any blanks; words are separated by a "+"
(plus sign). Other characters in the search string may need to be
converted to hexadecimal values preceded by a percent sign (%).
With this information, we now have the first "&variable=value" pair:
&qt=frog+dissection
If you want to use quotes (") in your search words, use %22
example: %22spam%22, %22frog+dissection%22
If you want to use parentheses () in your search words, use %28 for a
left paren "(" and %29 for a right paren ")"
example: %28spam%29, %28frog+dissection%29
If you want to use commas (",") in your search words, use %2C
example: man%2Cwoman
You can use hyphens ("-") as is in your search words
example: cable-networks
If you want to use brackets [] in your search words, use %5B for a left
bracket "[" and %5D for a right bracket "]"
example: %5Bspam%5D, %5Bfrog+dissection%5D
If you want to use a plus sign ("+") in your search, use %2B
example: man+%2Bwoman
You can use a minus sign ("-") as is in your search words
example: python+-monty
So our URL at this point looks like this:
http://www.go.com/Titles?col=WW&qt=frog+dissection
Hidden values are *NOT* shown to a user viewing the web page in a
browser. However, they are required as part of the ?QUERY string.
In our example above, we have three hidden values:
<INPUT type=hidden value=home_searchbox name=svx>
<INPUT type=hidden value=IS name=sv>
<INPUT type=hidden value=noframes name=lk>
Using the above source data, we can construct the three hidden
"&variable=value" pairs:
&svx=home_searchbox
&sv=IS
&lk=noframes
After adding the hidden values, our final email URL looks like this:
http://www.go.com/Titles?col=WW&qt=frog+dissection&svx=home_searchbox&sv=IS&lk=noframes
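The whole Go walkthrough can be condensed into one small Python
function. go_url is a hypothetical helper for illustration: it uses the
variables from this 2000-era example (which the site may no longer
accept), checks the 511-character limit taken from maxLength=511, and
lets the standard urllib.parse.urlencode do the encoding.

```python
from urllib.parse import urlencode

MAX_QT = 511  # from maxLength=511 on the qt text box

def go_url(search, col="WW"):
    """Build the Go email URL for a search string (a sketch)."""
    if len(search) > MAX_QT:
        raise ValueError("search string longer than the form allows")
    pairs = [
        ("col", col),                # radio button: WW, MC or AV
        ("qt", search),              # the search string itself
        ("svx", "home_searchbox"),   # hidden values
        ("sv", "IS"),
        ("lk", "noframes"),
    ]
    return "http://www.go.com/Titles?" + urlencode(pairs)

print(go_url("frog dissection"))
```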