Advanced HTML Authoring (Stretching the Web)
- Part 2: CGI Scripts, Forms, & Other Advanced Features
- Dr Lawrie Brown
- Computer Science, ADFA
- Introduction
- will consider scripts where the query information a user enters is processed by CGI scripts
- using either ISINDEX or FORM docs
- CGI interface
- security concerns
- periodic update of documents
- ISINDEX and Scripts
- so far scripts have produced output only
- may want to obtain information from users
- original mechanism was ISINDEX
- originally designed for search queries
- now more generalised for any single input
- HTML document includes <ISINDEX> tag
- browser supplies an input box for this doc
- information entered is appended to the document's URL after a ?
- http://aserver.on.net/yourdoc.html?query_string
- ISINDEX and Scripts
- request must be handled by a program/script
- usually have a script that both generates
- the original HTML doc with ISINDEX
- and processes the response, generating its doc
- query string is passed to script both
- as command-line arguments
- in QUERY_STRING environment variable
- nb. QUERY_STRING is encoded according to URL encoding rules
- it must be decoded before use, see later
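To see where an ISINDEX query arrives, here is a minimal bash sketch (the function name isindex_query is hypothetical). For the query Lawrie Brown the browser sends ?Lawrie+Brown; the server splits this on + and decodes each word into a separate command-line argument, while QUERY_STRING keeps the raw encoded text.

```bash
#!/bin/bash
# isindex_query: hypothetical helper showing both copies of an ISINDEX query.
isindex_query() {
  # decoded words, one per command-line argument
  printf 'args (%d): %s\n' "$#" "$*"
  # raw query, still URL encoded (empty if the server did not set it)
  printf 'QUERY_STRING: %s\n' "${QUERY_STRING-}"
}
```

Simulating the server by hand, QUERY_STRING='Lawrie+Brown' isindex_query Lawrie Brown prints "args (2): Lawrie Brown" and "QUERY_STRING: Lawrie+Brown".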
- Sample ISINDEX HTML Doc
- <html><head>
- <title>Finger</title>
- </head><body>
- <h1>Finger Gateway</h1>
- Finger? <ISINDEX>
- </body></html>
- Generic Simple Script
- a generic ISINDEX script has the form
- if no query information is passed then
- generate initial query document with ISINDEX
- write to stdout: Content-type header, blank line, then HTML
- else
- extract query details
- use query to obtain desired info
- compose response
- write to stdout: Content-type header, blank line, then HTML
- Sample Script
- #!/bin/sh
- echo Content-type: text/html ; echo
- if [ $# = 0 ]; then
- cat << EOM
- <html><head><title>Finger</title></head>
- <body><h1>Finger Gateway</h1>
- Finger?<ISINDEX></body></html>
- EOM
- else
- cat << EOM
- <html><head><title>Finger Response</title></head>
- <body><h1>Finger Response</h1><pre>
- EOM
- /usr/ucb/finger "$*"
- cat << EOM
- </pre></body></html>
- EOM
- fi
- Security Concerns!!!
- Remember, scripts permit
- anyone, anywhere in the world
- to run programs as a user on our systems
- with input that they supply
- need to take great care that the script cannot be coerced into unintended behaviour
- should carefully vet all information supplied
- check for "naughty chars", eg \n \r ; & | < > $ and the backquote
- could be misinterpreted by program or shell
- take great care how info is passed to program
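A whitelist is safer than hunting for naughty chars: accept only characters known to be harmless and reject everything else. A minimal bash sketch, modelled on the character check in the finger.pl sample later in these notes; the name is_safe_query and the exact character set are assumptions to adapt for each script.

```bash
#!/bin/bash
# is_safe_query: hypothetical whitelist check for user-supplied input.
# Succeeds only if every character is a letter, a digit, or one of _ @ % + / . - space
is_safe_query() {
  case "$1" in
    "") return 1 ;;                          # reject empty input
    *[![:alnum:]_@%+/.\ -]*) return 1 ;;     # reject any character outside the whitelist
    *) return 0 ;;
  esac
}
```

For example, the sh finger script could run finger only when is_safe_query "$*" succeeds, and otherwise print an error page instead.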
- Request Methods
- GET
- default request method
- used for plain requests, ISINDEX, simple forms
- query information is URL encoded and appended to the request URL after a ?
- POST
- alternate request method used with forms
- query information is sent after the request headers, and is available on stdin for CGI scripts
- HEAD
- returns just headers for requested document
- status check
- URL Encodings
- standard for URLs specifies that:
- spaces are replaced by +
- all "special chars" (incl +) are sent as hex %xx
- query info sent from ISINDEX or FORMs will be encoded also
- server decodes ISINDEX queries to command-line arguments
- BUT the QUERY_STRING environment variable and FORM data on standard input remain encoded
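Decoding reverses the two rules above: turn each + back into a space, then each %xx into the character with hex code xx. A minimal bash sketch (the function name url_decode is hypothetical). A form handler should split the data on & and = first and then decode each name and value separately, since a decoded value may itself contain & or =.

```bash
#!/bin/bash
# url_decode: hypothetical helper that decodes one URL-encoded string.
# Assumes bash (substring expansion, [[ =~ ]], printf -v).
url_decode() {
  local in="${1//+/ }"   # rule 1: each + stands for a space
  local out="" ch hex
  local i=0
  while (( i < ${#in} )); do
    ch="${in:i:1}"
    hex="${in:i+1:2}"
    if [[ $ch == "%" && $hex =~ ^[[:xdigit:]]{2}$ ]]; then
      # rule 2: %xx becomes the character with hex code xx
      printf -v ch '%b' "\\x$hex"
      (( i += 3 ))
    else
      (( i += 1 ))   # ordinary character (or a stray %) is kept as is
    fi
    out+="$ch"
  done
  printf '%s\n' "$out"
}
```

For example, url_decode 'a%2Bb+c%26d' prints a+b c&d.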
- CGI Interface Standard
- Common Gateway Interface (CGI)
- mechanism for communication between a web server and some gateway application
- defines
- how program is called
- what environment information is passed
- how the response should be composed
- have already seen examples of "standard" CGI programs (htimage, nph-count, cgiwrap)
- CGI
- user enters query into ISINDEX or FORM
- HTTP server passes the query to the gateway program using CGI
- gateway assembles reply from info as HTML
- server relays HTML + headers back to browser
- CGI Environment & Input
- CGI Environment variables available:
- QUERY_STRING - query info on GET request
- REQUEST_METHOD - type of request
- PATH_INFO - extra path info
- CONTENT_TYPE - for POST requests
- CONTENT_LENGTH - for POST requests
- standard input contains the query data sent with a POST request, with URL encoding
- command-line arguments contain the decoded query info for ISINDEX (GET) requests
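A quick way to see exactly what the server passes is a script that echoes the variables listed above. A minimal bash sketch; cgi_env_report is a hypothetical name, and only the five variables named on this slide are shown.

```bash
#!/bin/bash
# cgi_env_report: hypothetical debugging script body that echoes the
# CGI variables listed above as plain text.
cgi_env_report() {
  local v
  printf 'Content-Type: text/plain\n\n'   # header, then the required blank line
  for v in REQUEST_METHOD QUERY_STRING PATH_INFO CONTENT_TYPE CONTENT_LENGTH; do
    printf '%s=%s\n' "$v" "${!v}"          # ${!v}: value of the variable named in $v
  done
}
```

Because the interface is only environment variables plus stdin and stdout, such a script can be exercised without a server, e.g. REQUEST_METHOD=GET QUERY_STRING='name=lpb' ./env.cgi, where env.cgi is a hypothetical file that defines the function and then calls it.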
- CGI Output
- script response sent to standard output is either:
- a full document: MIME type header, a blank line, then the document
- Content-Type: text/html
- <html> .... </html>
- reference to another document
- Location: http://somewhere.on.net/another.html
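The two forms of response above can be sketched as one bash function (cgi_reply is a hypothetical name): given a URL it sends only a Location header, otherwise a Content-Type header and a small document. Either way the headers must be followed by a blank line.

```bash
#!/bin/bash
# cgi_reply: hypothetical helper producing either kind of CGI response.
cgi_reply() {
  if [ -n "$1" ]; then
    # reference to another document: the server handles the redirection
    printf 'Location: %s\n\n' "$1"
  else
    # full document: MIME type, blank line, then the HTML itself
    printf 'Content-Type: text/html\n\n'
    printf '<html><head><title>Reply</title></head>\n'
    printf '<body><h1>Reply</h1></body></html>\n'
  fi
}
```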
- CGI Output
- server examines these headers & responds appropriately, adding other headers
- if script name is nph-xxx, script must send ALL headers (including the HTTP status line), as no server parsing is done
- useful when sending non-text data that might confuse the server, cf. nph-count
- nph means "non parsed header"
- CARE needed for such scripts!!!
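Because the server passes nph- output straight through, the script must begin with the HTTP status line itself. A minimal bash sketch; nph_reply is a hypothetical name, it assumes the server sets SERVER_PROTOCOL (falling back to HTTP/1.0 otherwise), and it ends header lines with CR LF as HTTP requires.

```bash
#!/bin/bash
# nph_reply: hypothetical nph- script body; it must print the status line
# and all headers itself, because the server does not add or parse any.
nph_reply() {
  local proto="${SERVER_PROTOCOL:-HTTP/1.0}"   # fall back to HTTP/1.0
  printf '%s 200 OK\r\n' "$proto"              # status line
  printf 'Content-Type: text/plain\r\n'
  printf '\r\n'                                # blank line ends the headers
  printf 'hello from an nph script\n'
}
```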
- PERL Scripts
- shell scripts provide limited string handling
- can use ANY suitable programming language
- PERL is the Unix scripting language of choice
- very powerful string manipulation
- full programming language expressive ability
- relatively simple to write and debug
- cgi-lib.pl - standard library of routines
- MethGet, ReadParse, PrintHeader, PrintVariables
- SendMail (in our modified version)
- Sample PERL Script
- #!/usr/local/bin/perl
- # finger.pl - perl version of finger cgi script
- # - written by: Lawrie.brown@adfa.edu.au - Feb 96
- print "Content-type: text/html\n\n"; # print header
- if ($#ARGV < 0) { # no command-line arguments
- print "<html><head><title>Finger</title></head>\n";
- print "<body><h1>Finger Gateway</h1>\n";
- print "Finger?<isindex></body></html>\n";
- } else { # have query passed
- Sample PERL Script (cont)
- print "<html><head><title>Finger</title></head>\n";
- print "<body><h1>Finger Response</h1>\n";
- $args = "@ARGV"; # collect args together
- if ( $args !~ /^[\w\-+ \t\/@%.]+$/) { # check chars (allow . for user@host)
- print "<h2>Illegal Chars</h2>\n";
- print "Your request contains illegal chars!!!\n";
- print "<p>Please go back and re-enter it.\n";
- } else {
- $results = `/usr/ucb/finger @ARGV`;
- print "<pre>\n$results\n</pre>\n";
- }
- print "</body></html>\n";
- }
- Forms
- HTML Forms
- specify the various parts of a fill-out form
- text boxes, radio buttons, check boxes, menus
- enables information to be entered
- info is returned to server for processing
- requires use of CGI scripts on server to process the returned information
- Simple Form
- <html><head>
- <title>Simple Form</title></head><body>
- <h1>Simple HTML Form</h1>
- This document displays a simple form.<hr>
- <form action="http://www.cs.adfa.edu.au/cgi-bin/cgiwrap/~lpb/echo.pl" method="post">
- What is your Name?
- <input type="text" name="name" size=40 value="">
- <input type="submit" value="Submit Form">
- <input type="reset" value="Clear Values">
- </form>
- <hr></body></html>
- Form Handler
- form handler is just another CGI script
- query information is sent URL encoded as
- name1=value1&name2=value2&...&namen=valuen
- must be prepared to accept and decode this information from either
- QUERY_STRING environment if method=GET
- standard input if method=POST
- requires good string handling to do this
- then must construct a response based on the info
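The "either QUERY_STRING or standard input" step above can be sketched in bash as follows (read_form is a hypothetical name; it assumes bash 4.1+ for read -N). It prints one name=value pair per line, still URL encoded, so each name and value must be decoded afterwards.

```bash
#!/bin/bash
# read_form: hypothetical helper that fetches raw form data according to
# REQUEST_METHOD and prints one name=value pair per line (still URL encoded).
read_form() {
  local data="" pair
  local -a pairs
  case "$REQUEST_METHOD" in
    GET)
      data="$QUERY_STRING" ;;
    POST)
      # read exactly CONTENT_LENGTH characters from standard input
      # (encoded form data is plain ASCII, so characters = bytes)
      if [[ $CONTENT_LENGTH =~ ^[0-9]+$ ]] && (( CONTENT_LENGTH > 0 )); then
        IFS= read -r -N "$CONTENT_LENGTH" data || true
      fi ;;
  esac
  # split on & without pathname expansion (read -a does not glob)
  IFS='&' read -r -a pairs <<< "$data"
  for pair in "${pairs[@]}"; do
    printf '%s\n' "$pair"
  done
}
```

Reading exactly CONTENT_LENGTH characters, rather than until end of file, matches the CGI rule that the server need not close standard input after the body.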
- Simple Echo Form Handler
- #!/usr/local/bin/perl
- # echo.pl - simple reflector script for testing forms
- # need cgi-lib.pl for: &ReadParse, &PrintHeader
- require "cgi-lib.pl";
- # now read key=value pairs from form into %in
- &ReadParse(*in);
- # construct html reply showing all variable values
- print &PrintHeader;
- print "<html><head>\n<title>Echo Form</title>\n";
- print "</head><body>\n<h1>Echo Form</h1>\n";
- print &PrintVariables(%in);
- print "<hr>\n</body></html>\n";
- exit(0);
- HTML Form Constructs
- <form action="URL" method=get|post> ... </form> defines an entire form
- URL points to the CGI script handling the info
- method should usually be POST
- within form can have:
- <input ... > input field (no closing tag)
- <select ... > ... </select> menu
- <textarea ... > ... </textarea> text box
- form should include
- <input type=submit> button to send form
- <input type=reset> button to reset form (optional)
- HTML Input Fields
- <input ... > defines an input field (it has no closing tag)
- type=text enter text
- type=password enter password (no echo)
- type=checkbox toggle button on/off
- type=radio one of a group of toggles
- type=hidden value not displayed but sent
- type=submit|reset buttons to send/clear form
- name="name" name of the field (radio buttons in one group share a name)
- value="value" default value (for checkbox/radio: the value sent when on)
- size=nnn specify size of field
- checked radio/checkbox defaults on
- HTML Select Menus
- the <select> ... </select> tags specify a menu
- <select name="name">
- <option selected> First option
- <option> second Option
- ....
- </select>
- the selected attribute on <option> marks the item selected by default (if not given: none for a multiple menu, the first option for a single-choice menu)
- multiple items can be selected using
- <select name="name" multiple> ... </select>
- HTML Textarea
- <textarea> ... </textarea> specifies a multi-line text input box
- <textarea name="name" rows=4 cols=40>
- </textarea>
- rows specifies the number of rows displayed
- cols specifies the number of columns displayed
- any text between the textarea tags is entered in the box as default
- <textarea name="name" rows=4 cols=40>
- this text will appear in the box by default
- </textarea>
- Prototypical Form
- a prototypical form is available at:
- illustrates most components of forms
- best way to see how they work is to try them!
- More Form Handlers
- lookfile.pl and lookfile.html
- takes the supplied info and extracts matching lines from a specified file and returns them
- mailme.pl and mailme.html
- allows anyone to send mail to the owner of the script (must be called using cgiwrap)
- security note:
- great care must be taken with such scripts
- generally restrict recipients to pre-configured value(s)
- or they will be abused
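The core of a lookfile-style handler, sketched in bash. This is not the supplied lookfile.pl; lookfile is a hypothetical function. Passing the user's text to grep -F as a quoted fixed string after -- means it is never treated as a regular expression, an option or shell syntax, and matching lines are HTML-escaped before being sent back.

```bash
#!/bin/bash
# lookfile: hypothetical core of a lookfile-style handler (not lookfile.pl).
# Prints the lines of FILE containing TEXT (case-insensitive), HTML-escaped.
lookfile() {
  local text="$1" file="$2" matches
  # -F: fixed string, not a regex; -i: ignore case; --: text is never an option
  matches="$(grep -F -i -- "$text" "$file")"
  if [ -z "$matches" ]; then
    printf 'No lines match.\n'
  else
    # escape & < > so file contents cannot inject HTML into the reply
    printf '%s\n' "$matches" | sed -e 's/&/\&amp;/g' -e 's/</\&lt;/g' -e 's/>/\&gt;/g'
  fi
}
```

In the spirit of the security note, a handler built on this would keep the file name fixed in the script rather than taking it from the form, and wrap the output in <pre> ... </pre>.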
- Caveats on Scripts
- common problem is that users break connections before script completes
- some scripts (esp. sh) don't detect this & keep running!
- consequently they clog up and can eventually kill the server
- better to use perl, C etc as these do die in this case
- be aware of resources used by scripts
- scripts needing lots of computation can seriously impact server performance
- eg database lookups, dynamic image generation etc can be very costly
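- anyone can customise a form to feed a script, so never assume the data came from your own form (eg hidden fields and menu values can be changed)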
- Periodic Update
- sometimes would like periodic update
- feature recently added to Netscape et al
- client-pull
- client periodically requests update
- include in the <head> of the HTML document
- <meta http-equiv="Refresh" content="1">
- tells the client to reload the document every 1 second
- nb. equiv to sending a special HTTP header "Refresh:"
- must be included in EACH returned doc
- can also force auto-load of another doc
- <meta http-equiv="Refresh" content="5; URL=next.htm"> (loads next.htm after 5 seconds)
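Since the Refresh meta tag is equivalent to a Refresh: HTTP header, a CGI script can send the header directly. A minimal bash sketch of a self-reloading page; refresh_page and its interval argument are hypothetical, and Refresh is a browser extension rather than part of the HTTP/1.0 standard.

```bash
#!/bin/bash
# refresh_page: hypothetical client-pull page that asks to be reloaded
# every SECS seconds (default 5) and shows the current time.
refresh_page() {
  local secs="${1:-5}"
  printf 'Content-Type: text/html\n'
  printf 'Refresh: %s\n' "$secs"   # same effect as the meta tag above
  printf '\n'                       # blank line ends the headers
  printf '<html><head><title>Clock</title></head>\n'
  printf '<body><h1>%s</h1></body></html>\n' "$(date)"
}
```

The header must be sent again in every reply, just as the meta tag must appear in each returned document.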
- Periodic Update
- server-push
- server keeps connection open & periodically sends a new page, replacing previous version
- uses a new HTTP (MIME) header:
- Content-type: multipart/x-mixed-replace; boundary=ThisRandomString
- each MIME part replaces the previous part
- nb. must end each part with the boundary line, ie --ThisRandomString (the boundary prefixed by --)
- efficiency considerations
- server-push usually more efficient, BUT holds the connection open
- client-pull less efficient, but doesn't hold a connection
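A bash sketch of a server-push reply; server_push and its count and delay arguments are hypothetical. After the multipart header it sends the boundary line, then each part (with its own Content-Type) followed by the boundary line again, so the browser knows the part is complete and can display it. Assumption: a real script would also need unbuffered output, which is why such scripts were often set up as nph- scripts that also print the HTTP status line; that line is omitted here.

```bash
#!/bin/bash
# server_push: hypothetical server-push reply sending COUNT parts, DELAY seconds apart.
server_push() {
  local count="${1:-3}" delay="${2:-1}" i
  local boundary="ThisRandomString"
  printf 'Content-Type: multipart/x-mixed-replace; boundary=%s\n\n' "$boundary"
  printf -- '--%s\n' "$boundary"               # opening boundary
  for (( i = 1; i <= count; i++ )); do
    printf 'Content-Type: text/html\n\n'        # each part has its own header
    printf '<html><body><h1>Update %d</h1></body></html>\n' "$i"
    printf -- '--%s\n' "$boundary"             # ends this part so it can be shown
    sleep "$delay"
  done
}
```

Each new part replaces the previous one in the browser window while the connection stays open, which is the efficiency trade-off noted above.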
- Lab Session
- goals for this session
- create an ISINDEX shell script that calls "cal"
- displays a calendar
- can call as "cal 1996" or "cal 4 1996", see man entry
- modify sample finger script
- create a perl script that does the same
- modify sample finger.pl script
- create a form that calls lookfile.pl
- use supplied lookfile.pl & phone files, placing them in your cgi-bin directory
- modify lookfile.pl to lookup your own file
- References
- some useful references on advanced HTML:
- World Wide Web FAQ
- Barebone Guide to HTML
- tutorial on using forms
- the definitive references
- Review
- ISINDEX documents and handlers
- CGI between http server and scripts
- forms and form handlers
- periodic document update
Lawrie.Brown@adfa.edu.au / 02-Feb-96