Peter Benjamin Seminars
CGI Basics For Programmers
Version 0.6
By Peter Benjamin
March 14, 1999

This paper is intended for the intermediate programmer already familiar with the concepts of piping, standard in and standard out, environment variables, command line arguments, and return codes.

The scope of this paper is to provide the briefest description of the basic mechanics of CGI for a programmer new to CGI who wants to learn web CGI concepts and immediately begin programming CGI. Advanced HTTP header methods are not discussed.

After reading this paper, the intermediate programmer will be aware of the basic, not advanced, requirements that CGI places upon the CGI program.

First, CGI is a simple concept. You can learn it by reading this one paper alone, and you can be competent in CGI within one to two days. Full expertise, meaning an understanding of all the various HTTP capabilities, problem areas and troubleshooting techniques, takes two to three years. Those advanced topics are not within the scope of this paper.

CGI stands for Common Gateway Interface and is a standard, not a specification. It has nothing directly to do with the web or the internet. It has nothing to do with programming languages and is not a language itself. A CGI program can be written in most languages. Common ones are Perl, Unix shell script, DOS batch file, C, C++, Java, JavaScript, Frontier, AppleScript, AWK, PHP, Tcl, and many more.

CGI is a standard for two software programs to communicate with each other in the following manner: the first program invokes the second with information, and the second completes its task and returns information to the first program. The methods of passing information to and from the second program are what CGI is all about. The two separate methods, one for passing information to the second program and the other for returning information from it, are documented separately below for web CGI only. There are other information passing methods within the CGI standard, and they are not within the scope of this paper.

Remember that every rule has an exception that proves the rule. It is not possible to cover all the exceptions in this paper. 

Web Server Invokes CGI Program with Information

The CGI program is invoked in the form of a command line program. The following information "pieces" can be provided:

Environment Variables

Four environment variables used for CGI are covered in this paper, as listed below. Environment variable implementation differs for Unix, Windows/DOS, Macintosh and mainframes. Environment variables originate from Unix; DOS has them, and Macintosh must emulate them.

Required Environment Variables

    REQUEST_METHOD

Optional Environment Variables

    PATH_INFO
    QUERY_STRING
    CONTENT_LENGTH

The meaning of the environment variables can be shown by example. Three of the environment variables come from the HTML FORM tag attributes as shown below, or from the anchor tag attribute HREF= value or the IMG tag SRC= value.

<FORM METHOD=POST ACTION="/cgi-bin/scriptname.cgi/PATH_INFO?QUERY_STRING">
<FORM METHOD=GET  ACTION="/cgi-bin/scriptname.cgi/PATH_INFO?QUERY_STRING">
<A  HREF="/cgi-bin/scriptname.cgi/PATH_INFO?QUERY_STRING">
<IMG SRC="/cgi-bin/scriptname.cgi/PATH_INFO?QUERY_STRING">

REQUEST_METHOD=POST - only for METHOD=POST
REQUEST_METHOD=GET  - used for HREF=, SRC=, and METHOD=GET

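For the FORM tag, METHOD=POST is generally the preferred method for submitting form data, because the data travels in the Standard Input file rather than in the URL. METHOD=GET remains valid and places the form data in the QUERY_STRING.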

The PATH_INFO is optional. It is the string of characters after the slash in "scriptname.cgi/" and before the "?". It can have forward slashes, /, in it.

The QUERY_STRING is optional. It is everything after the "?". It should not have forward slashes in it. Put values needing a forward slash into the PATH_INFO.
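The split described above can be sketched in a few lines of Python. This is only an illustration of where each piece comes from, not how any particular web server does it. The function name split_cgi_url and the sample URL are assumptions made for this example. Note that web servers typically keep the leading slash as part of PATH_INFO, and this sketch does the same.

```python
def split_cgi_url(url, script_name):
    """Split a request URL into (PATH_INFO, QUERY_STRING).

    script_name is the part of the URL that names the CGI program,
    for example "/cgi-bin/scriptname.cgi" (hypothetical).
    """
    # Everything after the first "?" is the query string.
    path, _, query_string = url.partition("?")
    if not path.startswith(script_name):
        raise ValueError("URL does not start with the script name")
    # Whatever follows the script name in the path is the extra path info.
    path_info = path[len(script_name):]
    return path_info, query_string


# Hypothetical URL, following the pattern of the FORM ACTION examples.
print(split_cgi_url("/cgi-bin/scriptname.cgi/reports/1999?user=pete",
                    "/cgi-bin/scriptname.cgi"))
```

Both pieces come back as empty strings when the URL stops at the script name, which matches their being optional.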

Avoid using any symbol characters or spaces (blanks) in either the PATH_INFO or the QUERY_STRING. If they are used, they must be converted to the ASCII hex equivalent and preceded by a percent sign. For example, a space becomes the three characters: %20.
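The percent conversion can be tried out with the Python standard library. This is a minimal sketch using urllib.parse.quote and urllib.parse.unquote. The sample strings are made up for this example.

```python
from urllib.parse import quote, unquote

# A space is replaced by a percent sign and its ASCII hex code, 20.
encoded = quote("first name")
print(encoded)

# Symbols such as "&" and "=" are converted the same way.
print(quote("a&b=c"))

# The CGI program reverses the conversion to get the original text back.
print(unquote(encoded))
```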

The CONTENT_LENGTH is the file size of the Standard Input file described in the next section. In other words, it is the character count, including any End Of Line (EOL) characters.
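As a rough sketch, a CGI program written in Python could collect these four variables as shown below. The function name read_cgi_environment and the sample values are assumptions for illustration. A real program would normally read os.environ, which is what the function falls back to when no dictionary is passed in.

```python
import os


def read_cgi_environment(environ=None):
    """Collect the four CGI environment variables discussed in this paper."""
    if environ is None:
        environ = os.environ
    # REQUEST_METHOD is the only required variable.
    method = environ.get("REQUEST_METHOD", "").upper()
    if not method:
        raise RuntimeError("REQUEST_METHOD is required but was not set")
    return {
        "request_method": method,
        # The optional variables default to empty or zero when absent.
        "path_info": environ.get("PATH_INFO", ""),
        "query_string": environ.get("QUERY_STRING", ""),
        "content_length": int(environ.get("CONTENT_LENGTH") or 0),
    }


# Demonstration with a hand-built environment instead of a real web server.
sample = {"REQUEST_METHOD": "post", "PATH_INFO": "/reports", "CONTENT_LENGTH": "21"}
print(read_cgi_environment(sample))
```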

Applications

<FORM ACTION=...> is required to be used with INPUT, TEXTAREA and SELECT tags.

<A HREF=...> would be used for tracking end users through all the web pages they browse and other capabilities.

<IMG SRC=...> is used for push animations and other advanced features. An image is not the only MIME type file that can be returned.


Standard Input File

The Standard Input file, or STDIN, is only present if the environment variable REQUEST_METHOD equals "POST" (any combination of case). It has the file size given in the environment variable CONTENT_LENGTH, and it has a predefined format.

STDIN File Format

The file format is best explained by example, relating the file format to the HTML form input tags that generate it.

The value of the HTML form input tag attribute "name" is used in both the HTML and the CGI STDIN file format.

EXAMPLE:

gets converted by the web browser into a file that looks like this:

This file and its file size, or CONTENT_LENGTH, are sent to the web server to pass to the CGI program. The web server does not "look" into or process the content of the STDIN file in any way.

The structure of the file is two columns of data separated by an equal sign. The first column is the value of the HTML "name" attribute and the second column is the value the end user entered. This lets the CGI program know which input field each user-defined value goes with.
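To make the two-column structure concrete, here is a small Python sketch that takes a form body apart. The field names username and email and their values are hypothetical and do not come from the example above. In the usual form encoding, the name and value pairs are joined with "&", spaces are sent as "+", and symbols are percent-converted. The standard library function urllib.parse.parse_qsl undoes all of that.

```python
from urllib.parse import parse_qsl

# Hypothetical STDIN content for a form with two text inputs.
body = "username=Peter+Benjamin&email=pete%40example.com"

# Each pair is (name, user-entered value), already decoded.
pairs = parse_qsl(body, keep_blank_values=True)
for name, value in pairs:
    print(name, "=", value)
```

Passing keep_blank_values=True keeps fields that the end user left empty, which parse_qsl would otherwise drop.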

The CGI program accesses both the environment variables and the standard input file either with the standard methods of its language or with library methods. Either approach gets the job done. Once the CGI program has the inputs, it can manipulate them, store them in other files, send them as email, and invoke other programs.
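The following Python sketch shows one way a CGI program might gather its inputs, assuming the usual form encoding. The function name read_form_inputs and the sample field names are assumptions for illustration. The input stream is passed in as a parameter so the sketch can be run without a web server. A real CGI program would pass os.environ and sys.stdin.buffer instead.

```python
import io
from urllib.parse import parse_qs


def read_form_inputs(environ, stdin):
    """Return the form inputs as a dict of name -> list of values."""
    method = environ.get("REQUEST_METHOD", "GET").upper()
    if method == "POST":
        # Read exactly CONTENT_LENGTH bytes; the web server may not close STDIN.
        length = int(environ.get("CONTENT_LENGTH") or 0)
        raw = stdin.read(length).decode("utf-8")
    else:
        # For GET there is no STDIN file; the inputs are in QUERY_STRING.
        raw = environ.get("QUERY_STRING", "")
    return parse_qs(raw, keep_blank_values=True)


# Demonstration with a simulated POST request (hypothetical field names).
fake_body = b"color=blue&size=large"
fake_env = {"REQUEST_METHOD": "POST", "CONTENT_LENGTH": str(len(fake_body))}
print(read_form_inputs(fake_env, io.BytesIO(fake_body)))
```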

Even the submit button has a "name" that tells the CGI program which of two or more buttons was pressed:

becomes

but not both, as only one button can be pressed.
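As a hypothetical illustration of this rule, the Python sketch below checks which submit button name arrived in the form data. The button names subscribe and unsubscribe, the email field, and the function name pressed_button are all assumptions made for this example.

```python
from urllib.parse import parse_qs


def pressed_button(body, button_names):
    """Return the name of the submit button found in the form data, or None."""
    fields = parse_qs(body, keep_blank_values=True)
    for name in button_names:
        # Only the button that was pressed is sent by the browser.
        if name in fields:
            return name
    return None


# Hypothetical form data sent after the "subscribe" button was pressed.
body = "email=pete%40example.com&subscribe=Subscribe"
print(pressed_button(body, ("subscribe", "unsubscribe")))
```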

The INPUT tag attribute TYPE=RESET does not invoke the CGI. Instead, the web browser restores the page back to the original HTML page and its default form values.

Typically, the last thing a CGI program does is write out the "response" HTML page, which the web server receives and sends back for the end user to see after pressing the "submit" button. This CGI return is described in the next section. The CGI program can also return any other type of output to the end user, such as streaming push animation images, audio, video or other MIME types.

The CGI Program Ends Returning Information

Two pieces of information are returned from the web CGI program to the web server, and only one of them, the Standard Output file or STDOUT file, is passed on to the end user's browser. The STDOUT file can have three parts: an HTTP command header, a MIME type header, and the file content. They can be combined as follows:

  1. HTTP command
  2. HTTP command with MIME type and content
  3. MIME type with content

The HTTP command alone is used for various non-content conditions like standard errors.

The HTTP command with MIME type and content is used for push animations and other special situations.

The MIME type with content is the most common. It is used for HTML pages, single images, animated GIFs and the "Location" redirection to other web pages or CGI programs.
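The redirection case is commonly written as a single "Location:" header line followed by a blank line, with no content after it. A minimal Python sketch, assuming a hypothetical target page /thanks.html and a made-up function name build_redirect:

```python
import sys


def build_redirect(url):
    """Build CGI output that asks for a redirect to another page or CGI."""
    # A header line, then the blank line that ends the headers.
    return "Location: " + url + "\n\n"


# The web server reads this from the CGI program's standard output.
sys.stdout.write(build_redirect("/thanks.html"))
```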

The first two are rare and outside the scope of this paper. Many web servers require the names of such CGI scripts to start with the letters nph, which signals that the HTTP command will be supplied by the CGI program itself. Otherwise, as is typical, the web server prefixes an HTTP command to the CGI output and passes the result back to the web browser. CGI programs rarely output HTTP commands, just the MIME type and the content file.

CGI STDOUT FILE FORMAT

The STDOUT file has three parts. Each part is optional, but at least one part must be present.

  1. HTTP header or command, which, when present, must be supplied as the first line.
  2. MIME type header stating the "type" of data file following it.
  3. Data File, which is typically a "reply" HTML page to the form.

The HTTP commands are listed at various CGI documentation sites and are outside the scope of this paper, as are most of the MIME types. Most web server and web browser software comes standard with a modifiable file listing the MIME types it can handle, and the MIME types can be viewed there.

The most common MIME type header is:

Content-type: text/html

This header must be followed by at least one blank line. The MIME type header is recognized by the leading "Content-type:", which is followed by two values separated by a slash: the generic type of the data content and the specialized type.

A typical STDOUT file looks like this:

Content-type: text/html
 
<html>
<body>
<H1>Thank you for registering</H1>
</body>
</html>
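A CGI program could produce the STDOUT file above with a sketch like the following, written in Python. The function name build_html_response is an assumption for illustration. The heading text is passed through html.escape so that symbols in it cannot be mistaken for HTML tags.

```python
import html
import sys


def build_html_response(heading):
    """Build a MIME type header, a blank line, and a small HTML page."""
    header = "Content-type: text/html\n"
    page = (
        "<html>\n"
        "<body>\n"
        "<H1>" + html.escape(heading) + "</H1>\n"
        "</body>\n"
        "</html>\n"
    )
    # The blank line between header and page is required.
    return header + "\n" + page


# Writing to standard output hands the response to the web server.
sys.stdout.write(build_html_response("Thank you for registering"))
```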

The other CGI information passed back to the web server is the "return code" and is rarely used. There are no standard uses for the return code for CGI.


© 1999 Copyright by Peter Benjamin. Pete@PeterBenjamin.com All rights reserved.