tools/page-loader/README.txt

Fri, 16 Jan 2015 18:13:44 +0100

author
Michael Schloh von Bennewitz <michael@schloh.com>
branch
TOR_BUG_9701
changeset 14
925c144e1f1f
permissions
-rw-r--r--

Integrate suggestion from review to improve consistency with existing code.

# This Source Code Form is subject to the terms of the Mozilla Public
# License, v. 2.0. If a copy of the MPL was not distributed with this
# file, You can obtain one at http://mozilla.org/MPL/2.0/.

Rough notes on setting up this test app. jrgm@netscape.com 2001/08/05

1) this is intended to be run as a mod_perl application under an Apache web
server. [It is possible to run it as a cgi-bin, but then you will be paying
the cost of forking perl and re-compiling all the required modules on each
page load].

2) it should be possible to run this under Apache on win32, but I expect that
there are *nix-oriented assumptions that have crept in. (You would also need
a replacement for Time::HiRes, probably by using Win32::API to call the
Windows 'GetLocalTime()' function directly.)

3) You need to have a few "non-standard" Perl modules installed. This script
will tell you which ones are not installed (let me know if I have left some
out of this test).

--8<--------------------------------------------------------------------
#!/usr/bin/perl
# Report which of the required modules are installed, and their versions.
use strict;
use warnings;

my @modules = qw{
    LWP::UserAgent SQL::Statement Text::CSV_XS DBD::CSV
    DBI Time::HiRes CGI::Request URI
    MIME::Base64 HTML::Parser HTML::Tagset Digest::MD5
};
for my $module (@modules) {
    printf "%20s", $module;
    # String eval, so a missing module is reported instead of being fatal.
    unless (eval "use $module; 1") {
        print ", I don't have that.\n";
        next;
    }
    # UNIVERSAL::VERSION returns the module's $VERSION (undef if unset).
    my $version = $module->VERSION;
    print ", version: ", (defined $version ? $version : '?'), "\n";
}
--8<--------------------------------------------------------------------

For modules that are missing, you can find them at http://www.cpan.org/.
Download the .tar.gz files you need, and then (for the most part) just
do 'perl Makefile.PL; make; make test; make install'.

[Update: 28-Mar-2003] I recently installed Red Hat 7.2 as a server, which
installed Apache 1.3.20 with mod_perl 1.24 and perl 5.6.0. I then ran the
CPAN shell (`perl -MCPAN -e shell') and, after completing configuration, I
did 'install Bundle::CPAN', 'install Bundle::LWP' and 'install DBI' to
upgrade those modules and their dependencies. These instructions work on OSX
as well; make sure you run the CPAN shell with sudo so you have sufficient
privileges to install the files.

CGI::Request seems to have disappeared from CPAN, but you can get a copy
from <http://stein.cshl.org/WWW/software/CGI::modules/> and then install
with the standard `perl Makefile.PL; make; make test; make install'.

To install the SQL::Statement, Text::CSV_XS, and DBD::CSV modules, there is
a bundle available on CPAN, so you can use the CPAN shell and just enter
'install Bundle::DBD::CSV'.

At the end of this, the output for the test program above was the
following. (Note: as far as I know, you don't necessarily need the exact
version numbers for these modules, but something close would be safest.)

    LWP::UserAgent, version: 2.003
    SQL::Statement, version: 1.005
    Text::CSV_XS, version: 0.23
    DBD::CSV, version: 0.2002
    DBI, version: 1.35
    Time::HiRes, version: 1.43
    CGI::Request, version: 2.75
    URI, version: 1.23
    MIME::Base64, version: 2.18
    HTML::Parser, version: 3.27
    HTML::Tagset, version: 3.03
    Digest::MD5, version: 2.24
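
If you want to compare your own installation against the versions listed
above, a small script can do it. This is a sketch, not part of page-loader:
the %reference table is simply the list above, and a version older than the
reference is reported as 'older' rather than treated as an error, since close
versions should work. It assumes the 'version' module, which ships with
Perl 5.10 and later (so not with the perl 5.6.0 mentioned above).

```perl
#!/usr/bin/perl
# Sketch: compare installed module versions against the reference list
# in this README. Assumes the 'version' module (core since Perl 5.10).
use strict;
use warnings;
use version;

# Reference versions from the 28-Mar-2003 Red Hat 7.2 install above.
my %reference = (
    'LWP::UserAgent' => '2.003', 'SQL::Statement' => '1.005',
    'Text::CSV_XS'   => '0.23',  'DBD::CSV'       => '0.2002',
    'DBI'            => '1.35',  'Time::HiRes'    => '1.43',
    'CGI::Request'   => '2.75',  'URI'            => '1.23',
    'MIME::Base64'   => '2.18',  'HTML::Parser'   => '3.27',
    'HTML::Tagset'   => '3.03',  'Digest::MD5'    => '2.24',
);

# Classify an installed version ($have, undef if missing) against $wanted.
sub compare_versions {
    my ($have, $wanted) = @_;
    return 'missing' unless defined $have;
    # version->parse dies on malformed strings; report those as 'unknown'.
    my $cmp = eval { version->parse($have) <=> version->parse($wanted) };
    return 'unknown' unless defined $cmp;
    return $cmp < 0 ? 'older' : 'ok';
}

# Load $module without importing anything. Returns its $VERSION ('0' if
# it has none), or undef if the module cannot be loaded.
sub installed_version {
    my ($module) = @_;
    (my $file = "$module.pm") =~ s{::}{/}g;
    my $loaded = eval { require $file; 1 };
    return unless $loaded;
    my $v = $module->VERSION;
    return defined $v ? $v : '0';
}

for my $module (sort keys %reference) {
    my $have   = installed_version($module);
    my $status = compare_versions($have, $reference{$module});
    printf "%20s: %-7s (have %s, reference %s)\n", $module, $status,
        (defined $have ? $have : 'none'), $reference{$module};
}
```

Run it with plain 'perl'; any line marked 'missing' points at a module you
still need to install from CPAN.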

4) There is code to draw a sorted graph of the final results, but I have
disabled the place in 'report.pl' where its use would be triggered (look
for the comment). This is so that you can run this without having gone
through the additional setup of the 'gd' library and the GD and GD::Graph
modules. If you have those in place, you can turn this on by re-enabling
the print statement in report.pl.

[Note - 28-Mar-2003: with Red Hat 7.2, libgd.so.1.8.4 is preinstalled in
/usr/lib. The current GD.pm modules require libgd 2.0.5 or higher, but you
can use 1.8.4 if you install GD.pm version 1.40, which is available at
<http://stein.cshl.org/WWW/software/GD/old/GD-1.40.tar.gz>. Just do 'perl
Makefile.PL; make; make install' as usual. I chose to build with JPEG
support, but without FreeType, XPM and GIF support. I had a test error when
running 'make test', but it works fine for my purposes. I then installed
'GD::Text' and 'GD::Graph' from the CPAN shell.]

5) To set this up with Apache, create a directory called e.g. 'page-loader'
in the web server's cgi-bin directory.

5a) For Apache 1.x/mod_perl 1.x, place this in the Apache httpd.conf file,
and skip to step 5c.

--8<--------------------------------------------------------------------
Alias /page-loader/ /var/www/cgi-bin/page-loader/
<Location /page-loader>
    SetHandler perl-script
    PerlHandler Apache::Registry
    PerlSendHeader On
    Options +ExecCGI
</Location>
--8<--------------------------------------------------------------------

[MacOSX note: The CGI folder lives in /Library/WebServer/CGI-Executables/,
so the Alias line above should instead read:

Alias /page-loader/ /Library/WebServer/CGI-Executables/page-loader/

Keep the trailing slash on both paths; if only the URL path has one, a
request for /page-loader/loader.pl maps to .../page-loaderloader.pl. Case
is important (even though the file system is case-insensitive) and if you
type it incorrectly you will get "Forbidden" HTTP errors.

In addition, perl (and mod_perl) aren't enabled by default. You need to
uncomment two lines in httpd.conf:
    LoadModule perl_module libexec/httpd/libperl.so
    AddModule mod_perl.c
(basically just search for "perl" and uncomment the lines you find).]
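
The uncommenting can also be scripted. This is a hypothetical helper, not
part of this tool: it only removes a leading '#' from the two mod_perl lines
quoted above and leaves everything else alone, saving the original file as
httpd.conf.bak. Writing httpd.conf will normally need sudo; check the result
before restarting Apache.

```perl
#!/usr/bin/perl
# Hypothetical helper: uncomment the mod_perl lines in an Apache 1.x
# httpd.conf. Usage: sudo perl enable-mod-perl.pl path/to/httpd.conf
use strict;
use warnings;

# Returns the edited config text and the number of lines uncommented.
sub enable_mod_perl {
    my ($conf) = @_;
    my $count = ($conf =~
        s/^#[ \t]*((?:LoadModule[ \t]+perl_module|AddModule[ \t]+mod_perl\.c)\b)/$1/mg);
    return ($conf, $count || 0);
}

if (@ARGV == 1) {
    my $path = $ARGV[0];
    open my $in, '<', $path or die "cannot read $path: $!";
    my $conf = do { local $/; <$in> };
    close $in;

    my ($edited, $count) = enable_mod_perl($conf);
    if ($count) {
        # Save the original as <path>.bak before overwriting it.
        for ([ "$path.bak", $conf ], [ $path, $edited ]) {
            my ($name, $text) = @$_;
            open my $out, '>', $name or die "cannot write $name: $!";
            print {$out} $text;
            close $out or die "cannot close $name: $!";
        }
    }
    print "uncommented $count line(s) in $path\n";
}
```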

5b) If you're using Apache 2.x and mod_perl 1.99/2.x (tested with Red Hat 9),
place this in your perl.conf or httpd.conf:

--8<--------------------------------------------------------------------
Alias /page-loader/ /var/www/cgi-bin/page-loader/

<Location /page-loader>
    SetHandler perl-script
    PerlResponseHandler ModPerl::RegistryPrefork
    PerlOptions +ParseHeaders
    Options +ExecCGI
</Location>
--8<--------------------------------------------------------------------

If your mod_perl version is less than 1.99_09, then copy RegistryPrefork.pm
to your vendor_perl ModPerl directory (for example, on Red Hat 9, this is
/usr/lib/perl5/vendor_perl/5.8.0/i386-linux-thread-multi/ModPerl).

If you are using mod_perl 1.99_09 or above, grab RegistryPrefork.pm from
http://perl.apache.org/docs/2.0/user/porting/compat.html#C_Apache__Registry___C_Apache__PerlRun__and_Friends
and copy it to the vendor_perl directory as described above.

5c) When you're finished, restart Apache. Now you can run this as
'http://yourserver.domain.com/page-loader/loader.pl'

6) You need to create a subdirectory called 'db' under the 'page-loader'
directory. This 'db' subdirectory must be writable by the UID that Apache
runs as (e.g., 'nobody' or 'apache'). [You may want to figure out some
other way to do this if this web server is not behind a firewall.]
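
Step 6 can be done with a small script run as root on the server. This is a
sketch under assumptions: the page-loader path and the Apache user name are
yours to supply on the command line, and it creates the directory with mode
0755 and hands ownership to that user, rather than making it world-writable.

```perl
#!/usr/bin/perl
# Sketch: create page-loader/db and hand it to the Apache user.
# Usage (as root): perl make-db-dir.pl /var/www/cgi-bin/page-loader apache
use strict;
use warnings;
use File::Spec;

# Create <base>/db if needed; if $user is given, chown it to that user.
# Returns the path of the db directory.
sub ensure_db_dir {
    my ($base, $user) = @_;
    my $db = File::Spec->catdir($base, 'db');
    unless (-d $db) {
        mkdir $db, 0755 or die "cannot create $db: $!";
    }
    if (defined $user) {
        # getpwnam returns (name, passwd, uid, gid, ...) or an empty list.
        my (undef, undef, $uid, $gid) = getpwnam($user)
            or die "no such user: $user";
        chown $uid, $gid, $db or die "cannot chown $db: $!";
    }
    return $db;
}

if (@ARGV) {
    my $db = ensure_db_dir(@ARGV[0, 1]);
    print "$db is ready\n";
}
```

Running it without a user name only creates the directory, which is handy
when the web server already runs as your own account.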

7) You need to assemble a set of content pages, with all images, included JS
and CSS pulled into the same directory. These pages can live anywhere on the
same HTTP server that is running this app. The app assumes that each page
is in its own sub-directory, with included content below that
directory. You can set the location and the list of pages in the file
'urllist.txt'. [See 'urllist.txt' for further details on what needs to be
set there.]

There are various tools that will pull in complete copies of web pages
(e.g. 'wget' or something handrolled from LWP::UserAgent). You should edit
the pages to remove any redirects, popup windows, and possibly any
platform-specific JS rules (e.g., Mac-specific CSS included with
'document.write("LINK...'). You should also check for missing content,
or URLs that did not get changed to point to the local content. [One way to
check for this is to tweak this simple proxy server to check your links:
http://www.stonehenge.com/merlyn/WebTechniques/col34.listing.txt]

[MacOSX note: The web files live in /Library/WebServer/Documents, so you will
need to modify urllist.txt to have the appropriate FILEBASE and HTTPBASE.]
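
Besides the proxy approach, a quick static check can flag absolute http(s)
links in a saved page that do not start with your local base URL (for
example the HTTPBASE you set in urllist.txt). This is a rough regex scan of
href and src attributes, not a real HTML parser (HTML::LinkExtor, from the
HTML::Parser distribution, would be more thorough), and it will miss URLs
built in JS or referenced from CSS url(...).

```perl
#!/usr/bin/perl
# Rough sketch: list absolute href/src URLs that do not point at the
# local copy. Usage: perl foreign-links.pl http://localhost/page-loader/ page.html ...
use strict;
use warnings;

# Return the http(s) URLs in href/src attributes of $html that do not
# start with $local_base (case-sensitive prefix match).
sub foreign_links {
    my ($html, $local_base) = @_;
    my @foreign;
    while ($html =~ /\b(?:href|src)\s*=\s*["']?(https?:\/\/[^"'\s>]+)/gi) {
        my $url = $1;
        push @foreign, $url unless index($url, $local_base) == 0;
    }
    return @foreign;
}

if (@ARGV >= 2) {
    my ($base, @files) = @ARGV;
    for my $file (@files) {
        open my $fh, '<', $file or die "cannot read $file: $!";
        my $html = do { local $/; <$fh> };
        close $fh;
        print "$file: $_\n" for foreign_links($html, $base);
    }
}
```

Relative links are not reported, since they resolve against the BASE HREF
that the loader inserts.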

8) The "hook" into the content is a single line in each top-level document like this:
    <!-- MOZ_INSERT_CONTENT_HOOK -->
which should be placed immediately after the opening <HEAD> element. The script uses
this line as the place to insert a BASE HREF and some JS that control the
execution of the test.
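
When preparing a batch of pages, a helper can check for the hook and add it
where it is missing. This is a hypothetical sketch for preparing content, not
the loader's own substitution code; it finds the opening head tag with a
case-insensitive regex, so unusual markup (for example a commented-out head
tag) may still need a manual look.

```perl
#!/usr/bin/perl
# Hypothetical helper: make sure a page has the page-loader hook
# immediately after its opening <head> tag.
# Usage: perl check-hook.pl page1/index.html page2/index.html ...
use strict;
use warnings;

my $HOOK = '<!-- MOZ_INSERT_CONTENT_HOOK -->';

# True if the hook follows the opening <head ...> tag (whitespace allowed).
# '<head\b' does not match '<header'.
sub has_hook {
    my ($html) = @_;
    return $html =~ /<head\b[^>]*>\s*\Q$HOOK\E/i ? 1 : 0;
}

# Return $html with the hook inserted after <head ...>, unchanged if the
# hook is already there, or undef if there is no <head> tag at all.
sub add_hook {
    my ($html) = @_;
    return $html if has_hook($html);
    $html =~ s/(<head\b[^>]*>)/$1\n$HOOK/i or return;
    return $html;
}

for my $file (@ARGV) {
    open my $fh, '<', $file or die "cannot read $file: $!";
    my $html = do { local $/; <$fh> };
    close $fh;
    print "$file: ", (has_hook($html) ? "ok" : "missing hook"), "\n";
}
```

The script only reports; write add_hook's result back to the file yourself
once you are happy with it.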

9) You will most likely need to remove all load event handlers from your
test documents (onload attribute on body and handlers added with
addEventListener).
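
To find those handlers, a line-oriented scan is usually enough. This is a
rough sketch: it reports lines containing an onload attribute or assignment
(which also catches window.onload = ...) or an addEventListener call for the
"load" event. It does not parse JS or markup, so expect the odd false
positive or a handler split across lines that it misses.

```perl
#!/usr/bin/perl
# Rough sketch: report lines that look like load event handlers.
# Usage: perl find-onload.pl page1/index.html ...
use strict;
use warnings;

# Return the 1-based line numbers in $text that look like load handlers.
sub load_handler_lines {
    my ($text) = @_;
    my @hits;
    my $n = 0;
    for my $line (split /\n/, $text) {
        $n++;
        push @hits, $n
            if $line =~ /\bonload\s*=/i
            || $line =~ /addEventListener\s*\(\s*["']load["']/;
    }
    return @hits;
}

for my $file (@ARGV) {
    open my $fh, '<', $file or die "cannot read $file: $!";
    my $text = do { local $/; <$fh> };
    close $fh;
    print "$file:$_: possible load handler\n" for load_handler_lines($text);
}
```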

10) Because the system uses the (X)HTML BASE element, and some XML constructs
are not subject to it (for example, xml-stylesheet processing instructions),
you may need to provide absolute paths to external resources.
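
To find the processing instructions that need fixing, a sketch like the
following lists xml-stylesheet hrefs that are neither absolute URLs nor
absolute paths. It assumes the href pseudo-attribute is quoted, and it is a
regex scan rather than an XML parser.

```perl
#!/usr/bin/perl
# Sketch: list relative hrefs in xml-stylesheet processing instructions,
# which the BASE element does not affect.
# Usage: perl check-pi.pl page1/index.xml ...
use strict;
use warnings;

sub relative_stylesheet_hrefs {
    my ($xml) = @_;
    my @relative;
    while ($xml =~ /<\?xml-stylesheet\b(.*?)\?>/gs) {
        my $pi = $1;
        next unless $pi =~ /\bhref\s*=\s*(["'])(.*?)\1/;
        my $href = $2;
        # Absolute: has a URL scheme (http:, file:, ...) or starts with '/'.
        push @relative, $href unless $href =~ m{^(?:[a-z][a-z0-9+.-]*:|/)}i;
    }
    return @relative;
}

for my $file (@ARGV) {
    open my $fh, '<', $file or die "cannot read $file: $!";
    my $xml = do { local $/; <$fh> };
    close $fh;
    print "$file: relative stylesheet href '$_'\n"
        for relative_stylesheet_hrefs($xml);
}
```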

11) If your documents are transformed on the client side with XSLT, you will
need to add this snippet of XSLT to your stylesheet (and possibly make
sure it does not conflict with your other rules):
--8<--------------------------------------------------------------------
<!-- Page Loader -->
<xsl:template match="html:script">
  <xsl:copy>
    <!-- Attributes must be copied before any child content is added. -->
    <xsl:for-each select="@*">
      <xsl:copy/>
    </xsl:for-each>
    <xsl:apply-templates/>
  </xsl:copy>
</xsl:template>
--8<--------------------------------------------------------------------
And near the top of your output rules add:
    <xsl:apply-templates select="html:script"/>
Finally, make sure you define the XHTML namespace in the stylesheet
with the "html" prefix (xmlns:html="http://www.w3.org/1999/xhtml").

12) I've probably left some stuff out. Bug jrgm@netscape.com for the missing stuff.
