JavaScript Documentation:
Pavuk uses JavaScript bindings for tasks that are too complex to express
with non-scriptable options alone. First, you need the JavaScript library
from the Mozilla project installed on your system. The sources can be
downloaded from the Mozilla FTP space.
Compilation is very simple.
The use of JavaScript scripts within pavuk is not yet documented in the
manual pages.
You can load one JavaScript file into pavuk using the -js_script_file option.
Currently there are two places in pavuk where users can insert their own
JavaScript functions.
The first is inside the routine that decides whether a particular URL should
be downloaded. If you want to supply your own JavaScript decision function, you
must name it pavuk_url_cond_check. The prototype of this function looks
like the following:
function pavuk_url_cond_check(url, level)
{
}
Arguments:
- level is an integer that indicates from which place in the pavuk code
the pavuk_url_cond_check function was called:
level | place |
0 | Condition checking is called from the HTML parsing routine.
At this point you can use all conditions except -dmax, -min_time,
-max_time, -max_size, -min_size, -amimet, -dmimet |
1 | Condition checking is called from the routine that queues
URLs into the download queue. At this point you can use all conditions
available in level 0, plus -dmax. |
2 | Condition checking is called when a URL is taken from the download
queue; the URL is transferred only if this check succeeds.
At this point you can use the same set of conditions as in level 1. |
3 | Condition checking is called after pavuk has sent the download
request and detected the document size, modification time and MIME type. At
this level you can use all conditions. |
- url is an object instance of the PavukUrl class. It contains all information
about the particular URL and is a wrapper for the parsed URLs defined in pavuk
as a structure of type url.
It has the following attributes:
read-write attributes |
status | (int32, always defined) holds bit fields with various
information (see url.h for details) |
read-only attributes, always defined |
protocol | type of the URL; one of "http", "https", "ftp", "ftps", "file",
"gopher", "unknown" |
level | level in document tree at which this URL lies |
ref_cnt | number of parent documents which reference this URL |
urlstr | full URL string |
read-only attributes defined when protocol is "http" or "https" |
http_host | host name or IP address |
http_port | port number |
http_document | HTTP document |
http_searchstr | query string when available (the part of URL after ?) |
http_anchor_name | anchor name when available (the part of URL after #) |
http_user | user name for authorization when available |
http_password | password for authorization when available |
read-only attributes defined when protocol is "ftp" or "ftps" |
ftp_host | host name or IP address |
ftp_port | port number |
ftp_user | user name for authorization when available |
ftp_password | password for authorization when available |
ftp_path | path to file or directory |
ftp_anchor_name | anchor name when available (the part of URL after #) |
ftp_dir | flag whether this URL points to directory |
read-only attributes defined when protocol is "file" |
file_name | path to file or directory |
file_searchstr | query string when available (the part of URL after ?) |
file_anchor_name | anchor name when available (the part of URL after #) |
read-only attributes defined when protocol is "gopher" |
gopher_host | host name or IP address |
gopher_port | port number |
gopher_selector | selector string |
read-only attributes defined when protocol is "unknown" |
unsupported_urlstr | full URL string |
read-only attributes available while conditions are being checked |
check_level | equivalent to the level parameter of the pavuk_url_cond_check function |
mime_type | MIME type of this URL (defined when available) |
doc_size | size of document (defined when available) |
modification_time | modification time of document (defined when available) |
doc_number | number of document in download queue (defined when available) |
html_doc | full content of the parent document of the current URL (defined when level is 0) |
html_doc_offset | offset of the current HTML tag in the parent document of the URL
(defined when level is 0) |
moved_to | URL to which this URL was moved (defined when available) |
html_tag | full HTML tag, including <>, from which the current URL is taken
(defined when level is 0) |
tag | name of the HTML tag from which the current URL is taken (defined when level is 0) |
attrib | name of the HTML tag attribute from which the current URL is taken (defined
when level is 0) |
And the following methods:
get_parent(n) | get the URL of the n-th parent document |
check_cond(name, ....) | check the condition whose option name is "name".
If you do not provide additional parameters, pavuk uses the
parameters from the command line or scenario file for condition
checking. Otherwise it uses the listed parameters. |
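Because most attributes exist only for certain protocols, a script should look at url.protocol before reading them. The following is only a minimal sketch and not part of pavuk: url_host is a hypothetical helper name, and it relies on nothing beyond the attribute names listed above.

```javascript
// Hypothetical helper (not part of pavuk): returns the host of a PavukUrl
// object, or null for protocols that define no host attribute.
function url_host(url)
{
    switch (url.protocol) {
    case "http":
    case "https":
        return url.http_host;
    case "ftp":
    case "ftps":
        return url.ftp_host;
    case "gopher":
        return url.gopher_host;
    default:
        // "file" and "unknown" URLs have no *_host attribute.
        return null;
    }
}
```

Inside pavuk_url_cond_check such a helper could be used to compare hosts without repeating the protocol test in every condition.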
Here is an example of what a pavuk_url_cond_check function can look like:
function pavuk_url_cond_check (url, level)
{
    // checks performed while the parent HTML document is parsed
    if(level == 0)
    {
        if(url.level > 3 && url.check_cond("-asite", "www.host.com"))
            return false;
        if(url.check_cond("-url_rpattern",
               "http://www.pavuk.org/", "http://www.pavuk.org/~pic/") &&
           url.check_cond("-dsfx", ".jar", ".tgz", ".png"))
            return false;
    }
    // checks performed when the URL is taken from the download queue
    if(level == 2)
    {
        var par = url.get_parent();
        if(par && par.get_moved())
            return false;
    }
    return true;
}
This example is not useful in practice, but it shows how the feature is used.
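A slightly more practical sketch is a check that works only at level 3, where the document size and MIME type have been detected. It rests on several assumptions that the text above does not confirm: that attributes marked "defined when available" are undefined in JavaScript when they are missing, that doc_size is measured in bytes, and that MAX_IMAGE_SIZE (a name invented here) is a sensible limit.

```javascript
// Assumed limit, for illustration only; doc_size is taken to be in bytes.
var MAX_IMAGE_SIZE = 512 * 1024;

function pavuk_url_cond_check(url, level)
{
    // Size and MIME type are known only after the request was sent.
    if (level != 3)
        return true;

    // The attributes are "defined when available", so test before use.
    if (typeof url.mime_type == "undefined" ||
        typeof url.doc_size == "undefined")
        return true;

    // Reject images larger than the assumed limit.
    if (url.mime_type.indexOf("image/") == 0 &&
        url.doc_size > MAX_IMAGE_SIZE)
        return false;

    return true;
}
```

Returning true at every other level leaves the decision there to the conditions given on the command line or in the scenario file.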
The second possible use of JavaScript with pavuk is in the -fnrules option
for generating local names. This is done through a special function of
the extended -fnrules syntax called jsf, which takes one parameter:
the name of the JavaScript function to call. The function must
return a string, and its prototype looks like the following:
function some_jsf_func(fnrule)
{
}
The fnrule parameter is an object instance of the PavukFnrules class.
It has one attribute, url, which is of the PavukUrl type described above,
and one method, get_macro(macro), which returns the value of the
%x macros used in the -fnrules option.
You can do something like -fnrules F "*" '(jsf "some_jsf_func")'
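As a rough illustration of such a naming function, the sketch below builds a local name from the attributes of fnrule.url only. The naming scheme, the "index.html" default and the "other/" prefix are assumptions made for this example, not pavuk conventions, and get_macro is left out because the exact form of its argument is not described here.

```javascript
// Hypothetical naming function for use with (jsf "some_jsf_func").
// HTTP(S) URLs become "<host>_<port><document path>"; everything else
// is stored under "other/" using the escaped full URL string.
function some_jsf_func(fnrule)
{
    var url = fnrule.url;

    if (url.protocol == "http" || url.protocol == "https") {
        var doc = url.http_document;
        // Assumed: directory-like documents end with "/" and need
        // an explicit file name.
        if (doc.charAt(doc.length - 1) == "/")
            doc += "index.html";
        return url.http_host + "_" + url.http_port + doc;
    }

    // Escaping the URL keeps names of different URLs distinct.
    return "other/" + encodeURIComponent(url.urlstr);
}
```

The function always returns a string, as jsf requires.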