txt2html - a text to HTML conversion tool

txt2html is a simple tool for converting text files into HTML files. Text files can contain simple formatting and mark-up commands that txt2html converts into HTML.

txt2html was written by Steve Plimpton. I use it for documentation and WWW pages. Anna Reese added the table formatting options.

See the example.txt and example.html files in the txt2html directory for examples of how each formatting command and mark-up construct appears in HTML.


Syntax:

txt2html file
  read from text file, write HTML to standard output
txt2html file1 file2 file3 ...
  read each argument as text file, write one HTML file per argument

Input files are first opened with the specified name. If that fails, a ".txt" suffix is appended to the name and the open is tried again. Output files are created with an ".html" suffix, which either replaces the ".txt" suffix or is appended if there is none.
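
The file naming rules above can be sketched in a few lines of C++. This is only an illustration, not the code in txt2html.cpp; the function names open_input and html_name are hypothetical.

```cpp
#include <fstream>
#include <string>

// Hypothetical helper: try the name as given, then retry with ".txt" appended.
std::ifstream open_input(const std::string& name) {
    std::ifstream in(name);
    if (!in.is_open()) {
        in.clear();               // reset the failure state before retrying
        in.open(name + ".txt");
    }
    return in;
}

// Hypothetical helper: replace a trailing ".txt" with ".html",
// otherwise append ".html".
std::string html_name(const std::string& input) {
    const std::string txt = ".txt";
    if (input.size() >= txt.size() &&
        input.compare(input.size() - txt.size(), txt.size(), txt) == 0) {
        return input.substr(0, input.size() - txt.size()) + ".html";
    }
    return input + ".html";
}
```

Under these assumptions, both "manual" and "manual.txt" map to the output name "manual.html".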


Compiling:

The source for txt2html is a single C++ file. Compile it by typing:

g++ -o txt2html txt2html.cpp 

How the tool works:

txt2html reads a text file, one paragraph at a time. A paragraph ends with:

Any line in the paragraph which ends with "\" is concatenated to the following line by removing the "\" character and following newline. This can be useful for some of the formatting commands described below that operate on individual lines in the paragraph.
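
As a rough sketch of the line-joining rule, the function below removes each "\" that is immediately followed by a newline, along with that newline. The name join_continued_lines is hypothetical and the real tool may handle edge cases differently.

```cpp
#include <string>

// Remove every backslash that is immediately followed by a newline,
// together with that newline, so the two lines become one.
std::string join_continued_lines(const std::string& paragraph) {
    std::string out;
    out.reserve(paragraph.size());
    for (std::size_t i = 0; i < paragraph.size(); ++i) {
        if (paragraph[i] == '\\' && i + 1 < paragraph.size() &&
            paragraph[i + 1] == '\n') {
            ++i;  // skip the newline as well as the backslash
            continue;
        }
        out += paragraph[i];
    }
    return out;
}
```

A backslash anywhere else in a line is left untouched in this sketch.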

If a paragraph starts with a "<" character and ends with a ">" character, it is treated as raw HTML and is written directly into the output file.

If a paragraph does not end with a format string, then it is surrounded with HTML paragraph markers (<P> and </P>), mark-up is performed, and the paragraph is written to the output file.

If the paragraph ends with a format string, then formatting is performed, mark-up is performed, and the paragraph is written to the output file.
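
The three cases above amount to a small classification step. The sketch below assumes that surrounding whitespace is ignored and that the raw HTML test is applied first; neither detail is confirmed here, and the names Kind, last_word and classify are hypothetical.

```cpp
#include <string>

enum class Kind { RawHtml, Plain, Formatted };

// Return the last whitespace-delimited word of the paragraph ("" if none).
std::string last_word(const std::string& p) {
    const char* ws = " \t\n";
    std::size_t end = p.find_last_not_of(ws);
    if (end == std::string::npos) return "";
    std::size_t sep = p.find_last_of(ws, end);
    std::size_t begin = (sep == std::string::npos) ? 0 : sep + 1;
    return p.substr(begin, end - begin + 1);
}

// Decide how a paragraph is handled (assumed order: raw HTML check first).
Kind classify(const std::string& p) {
    const char* ws = " \t\n";
    std::size_t first = p.find_first_not_of(ws);
    std::size_t last = p.find_last_not_of(ws);
    if (first != std::string::npos && p[first] == '<' && p[last] == '>')
        return Kind::RawHtml;      // written to the output unchanged
    std::string w = last_word(p);
    if (!w.empty() && w[0] == ':')
        return Kind::Formatted;    // formatting, then mark-up
    return Kind::Plain;            // wrapped in <P> ... </P>, then mark-up
}
```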


Formatting:

A format string is the last word of a paragraph if it starts with a ":" character. A format string contains one or more comma-separated commands, like ":ulb,l" or ":c,h3". Note that a format string cannot contain spaces; otherwise it would not be the last word. An individual command can have 0 or more arguments:
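
One way to picture the splitting of a format string into commands is sketched below. It assumes that command arguments are written inside parentheses, as with tb(), and that commas inside parentheses belong to the command rather than separating commands. The name split_format is hypothetical.

```cpp
#include <string>
#include <vector>

// Split a format string such as ":ulb,l" into its commands.
// Returns an empty vector if the word does not start with ':'.
// Assumption: commas inside parentheses are part of a command's arguments.
std::vector<std::string> split_format(const std::string& word) {
    std::vector<std::string> cmds;
    if (word.empty() || word[0] != ':') return cmds;
    std::string cur;
    int depth = 0;  // current parenthesis nesting level
    for (std::size_t i = 1; i < word.size(); ++i) {
        char c = word[i];
        if (c == '(') ++depth;
        else if (c == ')' && depth > 0) --depth;
        if (c == ',' && depth == 0) {
            cmds.push_back(cur);
            cur.clear();
        } else {
            cur += c;
        }
    }
    if (!cur.empty()) cmds.push_back(cur);
    return cmds;
}
```

With this sketch, ":c,h3" yields the two commands "c" and "h3".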

Format commands add HTML markers at the beginning or end of the paragraph and individual lines. Commands are processed in the order they appear in the format string. Thus if two commands add HTML markers to the beginning of the paragraph, the 2nd command's marker will appear 2nd. The reverse is true at the end of the paragraph; the 2nd command's marker will appear 1st. Some commands, like line or image, make most sense if used as stand-alone commands without an accompanying paragraph.
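
The ordering rule means the markers nest properly. A minimal sketch, assuming each command contributes one opening and one closing marker for the paragraph (the type Markers and the function wrap are hypothetical, and the marker strings in the usage note are placeholders rather than the tool's real output):

```cpp
#include <string>
#include <utility>
#include <vector>

// Each command contributes an (opening, closing) marker pair.
using Markers = std::pair<std::string, std::string>;

// Apply the commands in order: opening markers are appended,
// closing markers are prepended, so the result is properly nested.
std::string wrap(const std::string& body, const std::vector<Markers>& cmds) {
    std::string pre, post;
    for (const Markers& m : cmds) {
        pre += m.first;          // a later command opens later
        post = m.second + post;  // a later command closes earlier
    }
    return pre + body + post;
}
```

For example, two commands with the placeholder markers <A>...</A> and <B>...</B> applied to "text" give "<A><B>text</B></A>".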

Commands that format the entire paragraph:

Commands that format the lines of the paragraph as a list:

Commands that treat the paragraph as one entry in a list:

Commands applied to each line of the paragraph:

Special commands (all HTML is inserted at beginning of paragraph):

Table command:

Arguments within tb() can appear in any order and are all optional, since they each have default values.


Mark-up:

The text of the paragraph is scanned for special mark-up characters which are converted into HTML.

Bold and italic characters:

Links are inserted by enclosing a section of text in double quotes, and appending an underscore to the ending quote, followed by the link. The link ends when whitespace is found, except that trailing punctuation characters (comma, period, semi-colon, colon, question mark, exclamation point, parenthesis) are not considered part of the link.

A link of the form "text"_link becomes <A HREF = "link">text</A> in the HTML output. The only exception is if "link" is defined elsewhere in the file as an alias (see the link command above). In that case, the value is used instead of the alias name.
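
The link rules can be sketched as a single scan over the paragraph text. This is an illustration under assumptions, not the tool's actual code: the name markup_links is hypothetical, aliases are passed in as a map, both "(" and ")" are treated as trailing punctuation, and a quoted string whose closing quote is not followed by an underscore is left as-is.

```cpp
#include <map>
#include <string>

// Convert "text"_link into <A HREF = "link">text</A>.
// If the link is a key in aliases, the alias value is used instead.
std::string markup_links(const std::string& s,
                         const std::map<std::string, std::string>& aliases) {
    const std::string punct = ",.;:?!()";  // not part of a link when trailing
    std::string out;
    std::size_t i = 0;
    while (i < s.size()) {
        if (s[i] == '"') {
            std::size_t close = s.find('"', i + 1);
            if (close != std::string::npos && close + 1 < s.size() &&
                s[close + 1] == '_') {
                std::size_t start = close + 2;
                std::size_t end = s.find_first_of(" \t\n", start);
                if (end == std::string::npos) end = s.size();
                while (end > start &&
                       punct.find(s[end - 1]) != std::string::npos)
                    --end;  // drop trailing punctuation from the link
                if (end > start) {
                    std::string text = s.substr(i + 1, close - i - 1);
                    std::string link = s.substr(start, end - start);
                    auto it = aliases.find(link);
                    if (it != aliases.end()) link = it->second;
                    out += "<A HREF = \"" + link + "\">" + text + "</A>";
                    i = end;  // trailing punctuation is copied as plain text
                    continue;
                }
            }
        }
        out += s[i++];
    }
    return out;
}
```

With an empty alias map, the text `See "the docs"_doc.html, please.` becomes `See <A HREF = "doc.html">the docs</A>, please.` in this sketch.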

With these rules, links can take several forms.