|
FAQ:
These are Frequently Asked Questions about using pavuk. There are only a few
entries at the moment. If you find a problem and a solution for it, we would
very much appreciate it if you wrote a FAQ entry and sent it to the developers.
It will help us reduce the load of responding to help requests.
When I'm downloading documents with special characters such as ?&* in their names, the stored document tree is not browsable. I want to convert these characters to others.
|
This is possible with the -tr_chr_chr option. For example, use
-tr_chr_chr '?&*' _
and every ?, & and * character becomes a _ character. If you want to make this
the default behavior, add the following line to your ~/.pavukrc file:
TrChrToChr: "?&*" "_"
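The effect of this mapping can be previewed without pavuk. The following is only a rough sketch that uses the standard tr utility to imitate the substitution on a sample name; sanitize_name is a hypothetical helper and the sample file name is made up, neither is part of pavuk.

```shell
# Imitate the '?&*' -> '_' mapping with tr(1); this does not call pavuk.
# sanitize_name is a hypothetical helper used for illustration only.
sanitize_name() {
    # One underscore per source character keeps the two sets the same length.
    printf '%s\n' "$1" | tr '?&*' '___'
}

sanitize_name 'index.php?id=1&lang=*'   # prints index.php_id=1_lang=_
```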
|
|
Some sites have dynamic session numbers like PHPSESSION. How do I download them without reloading the whole site on each call?
|
The argument -fnrules F '*' '(rmpar %o "PHPSESSION")' rewrites the stored filename and
removes the dynamic parameter PHPSESSION from the name. If the website uses a different parameter,
adapt the command accordingly. It is also possible to remove several parameters from the URLs of a
dynamic website. In that case, nest the rmpar calls like this:
-fnrules F '*' '(rmpar (rmpar (rmpar %o "id") "mode") "PHPSESSION")'.
|
|
I'm using a firewall for Internet access. Can pavuk go through it?
|
Yes. You can use proxies for HTTP, HTTPS, FTP and Gopher.
For an HTTP proxy, use -http_proxy host:port.
For an HTTPS proxy, use -ssl_proxy host:port. This requires an HTTP proxy
that allows the CONNECT request.
For a Gopher proxy, use -gopher_proxy host:port. Pavuk can optionally use an
HTTP gateway for accessing Gopher servers (use the -gopher_httpgw option) or
an HTTP proxy that allows the CONNECT request.
For an FTP proxy, use -ftp_proxy host:port. Pavuk can use three different
methods for going through the firewall: an HTTP gateway for FTP
(option -ftp_httpgw), a native FTP proxy, or an HTTP proxy that allows
the CONNECT request (option -ftp_dirtyproxy).
If your firewall supports a SOCKS 4 or SOCKS 5 proxy, you can compile pavuk
with support for it. You only need the development libraries for these
protocols at compile time.
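As an illustration, the options above might be combined as follows. This is only a sketch under the assumption that a single HTTP proxy serves HTTP, HTTPS (via CONNECT) and FTP (as an HTTP gateway); proxy.example.com:3128 is a placeholder, and pavuk_proxied and the PAVUK variable are hypothetical helpers that are not part of pavuk.

```shell
# Placeholder proxy address; replace it with your own host:port.
PROXY=${PROXY:-proxy.example.com:3128}

# Hypothetical wrapper: run pavuk with the same proxy for HTTP, HTTPS and FTP.
# Set PAVUK=echo to print the arguments instead of running pavuk.
pavuk_proxied() {
    "${PAVUK:-pavuk}" -http_proxy "$PROXY" -ssl_proxy "$PROXY" \
        -ftp_proxy "$PROXY" -ftp_httpgw "$@"
}
```

Calling pavuk_proxied http://www.example.com/ would then start pavuk with all three proxy options set. If your FTP proxy is a native FTP proxy rather than an HTTP gateway, drop -ftp_httpgw.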
|
|
I'm using FWTK as a firewall, but I can't download any files through its FTP proxy.
|
The FTP proxy included in FWTK doesn't support passive data transfers.
Use the -ftp_active option to switch FTP data connections to active mode.
|
|
I have different scenarios, which I want to execute automatically. Is it possible to serialize scenario execution with pavuk?
|
You can use the shell or any scripting language to write a short script that
does this. Here is an example for sh or bash:
for scn in *.scn; do pavuk -scndir . -scenario "$scn"; done
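A slightly more defensive variant of that loop might look like the sketch below. It quotes the file name, skips the case where no scenario files exist, and reports scenarios that end with a non-zero exit status. run_scenarios and the PAVUK variable are hypothetical names introduced here for illustration; they are not part of pavuk.

```shell
# Run every scenario file in the current directory, one after another.
# Set PAVUK=echo to print the arguments instead of running pavuk.
run_scenarios() {
    for scn in *.scn; do
        # Without matching files the pattern stays unexpanded; skip it.
        [ -e "$scn" ] || continue
        "${PAVUK:-pavuk}" -scndir . -scenario "$scn" ||
            echo "scenario failed: $scn" >&2
    done
}
```

Run it from the directory that holds the .scn files. The scenarios are processed in the order in which the shell expands *.scn, which is alphabetical.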
|
|
There are files beginning with .in_ in the directories. What are they for?
|
These files are used as temporary files while a file is being downloaded. If
the transfer fails, the file contains the part already transferred, which is
used for the next reget (if possible). These files are also used for locking
documents.
|
|
I always want to start pavuk with the GUI. Is there any way to set this in the ~/.pavukrc file?
|
No. There is no way to set it in ~/.pavukrc, but you can use the aliasing
mechanism of your shell. For example:
csh:  alias xpavuk 'pavuk -X'
bash: alias xpavuk='pavuk -X'
|
|
Is there any way to close or restart the X server without breaking pavuk when I'm running pavuk with the GUI?
|
Yes, it is possible, but with many limitations. First, pavuk must run as a
background job (start it with pavuk -X & or pavuk -X -bg, or stop it with
CTRL-Z from the shell and then put it into the background with the bg shell
command). Then you can use the "Go Bg" button, which removes all pavuk
windows from the screen as soon as it is safe to do so (the transfer of the
current document must finish) and then closes the connection to the X server.
|
|
Does pavuk preserve symbolic links with FTP servers?
|
Yes, it does, but you have to use the -ftplist option to enable this feature.
|
|
How can I download a complex site to a single directory without subdirectories?
|
Use either of the following sets of options:
a) -store_info -fnrules F '*' '/directory/%n'
b) -store_info -base_level 1000 -cdir /home/my/directory
The -store_info option is optional with version 0.9pl20 and higher,
but it is required if you want to synchronise the mirror in the future
(see the manual for a description).
|
|
How do I force pavuk not to build the whole directory hierarchy for the local document tree?
|
There are two different ways to do this:
1) You can use the -base_level option to cut some levels from the hierarchy.
For example, if you are downloading http://www.site.tld/manual/automake/automake_toc.html
and you want to store it only in the automake directory, use -base_level 3.
2) You can also use the -fnrules option to do this job.
For example, you can put all downloaded files into a single directory by using
-fnrules 'F' '*' '/directory/%n'.
|
|
In sync mode I'm using the -remove_old option, but pavuk doesn't remove documents that have disappeared from the remote server.
|
This is not a bug. Pavuk needs to know which directory contains your
mirror in order to find the files that belong to it. So you have to
use the -subdir option together with the -remove_old option
to specify that directory.
For example, if you are mirroring http://www.pavuk.org/ to the directory
/home/my/mirror, use the command
pavuk -mode sync http://www.pavuk.org/ -dont_leave_dir \
-remove_old -cdir /home/my/mirror/ \
-subdir /home/my/mirror/http/www.pavuk.org/
and the removal of old documents will work.
|
|
Pavuk tells me stat: no such file or directory but all the files seem to be in the local document tree, just where they belong. What's going on?
|
This happens when you're deleting temporary files with an external program
or script via the -post_cmd switch and then try to rewrite links that are
embedded in new incoming documents. By issuing the above mentioned error
message, pavuk tells you that something's wrong (i.e., the file that is
being referenced in an incoming document is no longer in the local document
tree) but it's not crucial as pavuk rewrites the link to the remote
destination nevertheless.
|
|
I am trying to mirror a website but locally cached files are not being removed even though the -remove_old option is set.
|
Make sure you are using the latest version of pavuk. If you are using
mirror mode, be sure that the -remove_old, -cdir, and -subdir options
are set properly. If you are using fnrules to alter the directory
mapping, you must also set the -store_info option.
|
|
If you have questions that are not answered here or in the other documents,
ask on the pavuk mailing list.
|
|