Individual R. Swindell
Internet Draft Wind River Systems
Document: <draft-swindell-ptsc-hdr-01.txt> October 1999
Category: Informational Expires April 2000
Plain Text/Source Code File Header
Status of this Memo
This document is an Internet-Draft and is in full conformance with
all provisions of Section 10 of RFC 2026.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that
other groups may also distribute working documents as Internet-
Drafts.
Internet-Drafts are draft documents valid for a maximum of six
months and may be updated, replaced, or obsoleted by other documents
at any time. It is inappropriate to use Internet-Drafts as
reference material or to cite them other than as "work in progress."
The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt
The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html.
Distribution of this memo is unlimited.
Copyright (C) The Internet Society 1999. All Rights Reserved.
1. Abstract
Anyone who has dealt at length with plain (ASCII [1]) text and
source code files can testify that the lack of a global definition
of the effect of the horizontal-tab character all too often causes
ill-formed display and printed output of plain text files that
utilize the horizontal-tab character for formatting.
This document defines a common header for plain text and source code
(PT/SC) files, whose primary purpose is to specify the tab-dependent
formatting parameters to be used when displaying, printing, or
editing such files. The defined header also addresses such issues
as whether to use the line-feed character or carriage-return/line-
feed character sequence to terminate lines in the file. Widespread
adoption and support of the header defined in this document could
substantially improve the interoperability of text and source code
files distributed across the Internet and other media.
Swindell [Page 1] Expires April 2000
Internet Draft Plain Text/Source Code File Header October 1999
1.1 Change Log
This section tracks changes made to the revisions of the Internet
Drafts of this document. It will be *deleted* when the document is
published as an RFC.
October 1, 1999: Revision 1 <draft-swindell-ptsc-hdr-01.txt>
1: Introduced limitations on the location of headers within a file
(section 5), i.e. headers must be located within the first 60
lines or 3000 characters of a file (whichever comes first) and
within the first 160 characters of an individual line. This
should simplify the parsing of large files (or files with
unusually long lines) and encourage the placement of headers
towards the top of files and beginning of lines for maximum
visibility.
2: Clarified the header syntax requirements in section 5.
Specifically, added paragraphs to detail the required white-
space, line-feed, or beginning-of-file that must immediately
precede a valid header token and the required white-space that
must immediately follow a valid variable name. Examples of
invalid headers that may be commonly mistaken for valid headers
(e.g. user@format.com) were added.
3: Section 4 now specifies that if a "programmer's editor" adds
PT/SC headers to a file and the file is determined to be a
source code file (possibly determined by the file's extension),
the editor must embed all headers in the comment delimiters of
the appropriate programming language for the file.
4: Added section 3.3, "Existing Proprietary Solutions".
5: Eliminated NOTE2 from section 5.1. This note was determined to
be a misleading and unnecessary suggestion.
6: Replaced the term "column" with "offset" in section 3.
7: Replaced the term "User" with "Name" in the tables contained in
section 3.1.
8: Replaced the term "document" with "file" in section 4.
9: Improved the consistency of the use of the terms "define" and
"specify" with regard to headers and format variables throughout
the document, i.e. formatting variables are "defined", while
formatting variable values or parameters are "specified".
10: Section 5 now notes that formatting variables may be defined in
any order.
11: Copyright section was completed and miscellaneous typos were
fixed throughout the document.
2. Conventions used in this document
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in
this document are to be interpreted as described in RFC 2119 [2].
All numbers in this document are represented in decimal (base 10)
format unless otherwise noted.
3. Introduction
The tab key is typically used in plain (ASCII) text editors as a
fast and convenient way to align text on predetermined column
boundaries (tab-stops). Upon editing or creating a file, when the
tab key is pressed, the editor will usually place a horizontal-tab
(ASCII 9) character in the current insert position and move the
cursor forward to the next tab-stop position (i.e. indentation). The
offset (tab-size) of each subsequent tab-stop is usually
configurable in the editor, though the default value of such a
parameter may be different from one editor to the next (typically in
the range of two to ten character positions). Additionally, some
editors allow asymmetric tab-stops (variable tab-size) where for
example, the first tab-stop may be at offset five and the second at
offset eight.
3.1 The Problem
The problem occurs when the file is printed, viewed, or loaded into
a different editor, or perhaps loaded into the same editor, but with
a different tab-size or tab-stop configuration. The resulting file
image may or may not resemble the original formatting of the file.
This is especially critical for the legibility of program and script
source code (C/C++, Java, HTML, etc.) files.
The problem is especially apparent in a multi-author environment,
where inevitably, different authors will have their editors and
other text processing applications configured with different tab-
dependent formatting parameters, resulting in what is affectionately
referred to as "tab-hell".
Example: Bob creates the following text file with his editor
configured with eight space (symmetric) tab-stops:
+-------+-------+-------+---------------+---------------+
| Name  | Meat  | Dairy | Favorite Food | Favorite Band |
+-------+-------+-------+---------------+---------------+
| Bob   | No    | Yes   | Salad         | Meat Loaf     |
| Sally | Yes   | No    | Burrito       | Cream         |
| Mike  | Yes   | No    | Pasta         | Vanilla Fudge |
+-------+-------+-------+---------------+---------------+
Julie would like to add herself to the table, so she loads the file
into her editor, which happens to be configured for two space tab-
stops (her preference). This is what Julie is presented with:
+-------+-------+-------+---------------+---------------+
| Name | Meat | Dairy | Favorite Food | Favorite Band |
+-------+-------+-------+---------------+---------------+
| Bob | No | Yes | Salad | Meat Loaf |
| Sally | Yes | No | Burrito | Cream |
| Mike | Yes | No | Pasta | Vanilla Fudge |
+-------+-------+-------+---------------+---------------+
Confused, but determined, Julie adds herself to the table and prints
the file for the caterer and DJ of the upcoming company party:
+-------+-------+-------+---------------+---------------+
| Name | Meat | Dairy | Favorite Food | Favorite Band |
+-------+-------+-------+---------------+---------------+
| Bob | No | Yes | Salad | Meat Loaf |
| Sally | Yes | No | Burrito | Cream |
| Mike | Yes | No | Pasta | Vanilla Fudge |
| Julie | Yes | Yes | Milk | Roast Beef |
+-------+-------+-------+---------------+---------------+
Needless to say, the party is a disaster: Bob's a vegetarian, Sally
and Mike are lactose intolerant, and the DJ brings only Modern Dance
music.
This is obviously an extreme hypothetical example. Typically, tab-
related formatting problems are more aesthetic than logistical, but
you get the idea.
NOTE: This document must be viewed or printed using a non-
proportional font for the above tables to appear as intended.
3.2 Existing Work-around
Many existing text editors offer the option of writing the
appropriate number of space (ASCII 32) characters to a file in place
of horizontal-tab characters. While such an option can be a
functional work-around to the problem for files created and
subsequently edited with an editor configured in this manner, it
does nothing to solve the problem of correctly displaying, printing,
or editing files that utilize horizontal-tab characters.
Subsequent editing of a file that uses spaces in place of
horizontal-tab characters may still adversely affect the formatting
of the file if the author utilizes the tab key for indentation and
their editor is configured with different tab-stop parameters than
the original editor configuration.
Utilizing such an option also eliminates the possibility of
convenient indentation/column resizing by simply adjusting the tab-
stop configuration. Additionally, most editors allow quick cursor
movement through white-space in a file that utilizes the horizontal-
tab character (typically, one arrow-key press per tab-stop), while
white-space in files that use spaces in place of horizontal-tabs
must be navigated one character position at a time (e.g. eight
arrow-key presses per tab-stop). An increasingly minor
consideration is the fact that files (particularly source code
files) that replace horizontal-tab characters with spaces can
require as much as twenty percent more storage space than files that
utilize horizontal-tab characters.
3.3 Existing Proprietary Solutions
Some developers of plain text editors have identified the need for
a solution to this problem and have addressed it by allowing
authors to define tab-stops and other formatting parameters in the
contents of their files. Unfortunately, all known
solutions have been implemented in a proprietary, incomplete, non-
extensible, or non-adoptable manner. For this reason, these
solutions are not often found in existing text or source code files
and are not expected to garner widespread use in the future.
3.4 Proposed Common Solution
While common computer users have migrated toward modern word
processors and their elaborate document formats, this problem,
although seemingly obscure, remains a thorn in the side of the
minority of users who must still deal with plain text files.
Ironically, the group of computer users most affected by this
problem is programmers, the very people who are in a position to
solve it by implementing a common solution.
The solution proposed in this document is a plain text/source code
(PT/SC) file header, whose primary purpose is to specify the
tab-dependent formatting parameters to be used when printing,
displaying, or editing such files. Other common text file
formatting issues (such as whether to use the line-feed character or
carriage-return/line-feed character sequence to terminate lines) are
also addressed in this header.
4. Application
It is hoped that a significant percentage of the developers of plain
text editors, specifically those designed for use by programmers,
will adopt this proposal. Adoption would include parsing any PT/SC
headers (if present) in files opened in the editor, setting the
formatting parameters accordingly, and optionally adding PT/SC
headers (if not already present) to files written to disk.
A supporting editor MUST update any PT/SC headers (if present) with
the current formatting parameters when a file is written to disk. A
supporting editor SHOULD allow the option of adding PT/SC headers to
a file (if not already present) with all relevant formatting
parameter values specified.
A supporting programmer's editor (an editor designed specifically
for use by programmers) MUST embed all PT/SC headers in the comment
delimiters of the appropriate language for the file. The
appropriate language may be determined by the file's extension (e.g.
".c", ".pas", ".bas", etc.).
Text editors that support multiple concurrently opened files MUST
support a unique set of formatting parameters for each opened file.
Many editors already support a unique set of parameters based on the
extension (e.g. ".c", ".pas", ".txt") of the opened file. Such a
feature would need to be extended to set the appropriate formatting
parameters based on the values specified in any PT/SC headers
present.
Two-way support is defined as that of linking the PT/SC header
values and corresponding configuration menu options (if applicable)
such that a value changed in the file is reflected in the
configuration menu and vice versa. Two-way support is RECOMMENDED,
but not required.
It is also desirable that developers of applications designed to
view, print, or modify in any way plain text or source code files
adopt this proposal. Such applications include (but are not limited
to) version control systems, file comparison utilities, syntax
verification utilities, source-level debuggers, and universal
document viewing and printing utilities.
Authors of plain text and source code files need not wait for PT/SC
header support in text editors. The simplicity of the PT/SC header
format allows authors to significantly help one another by at least
"documenting" the original formatting parameters by hand-coding the
PT/SC header so that other users and co-authors of such files need
not "guess" at the correct formatting parameters. And when
applications supporting the PT/SC header become available, existing
documents and source code files will immediately benefit from the
automatic adjustment of formatting parameters based on the pre-
existing PT/SC headers.
5. Header Format
<BOF or LF or white-space>@format.<variable><white-space><value>
Where <BOF> is the beginning of the file (offset 0), <LF> is the
line-feed (ASCII 10) character, <white-space> is one or more space
(ASCII 32) or horizontal-tab (ASCII 9) characters, <variable> is one
of the supported format variable names (see section 6), and <value>
is one or more desired values (separated by white-space) in the
appropriate format for the corresponding variable.
Example:
This is MyFile.txt, @format.tab-size 8
A space character, horizontal-tab character, line-feed character, or
the beginning of the file MUST immediately precede the "@format."
header token. Additionally, a supported format variable name MUST
immediately follow the header token. For example, the text string
"user@format.com" SHALL NOT be interpreted as a valid PT/SC header
because it does not meet either of these requirements.
A space or horizontal-tab character MUST immediately follow a
supported format variable name (e.g. " @format.tab-size: 8" is not a
valid header).
The "@format." header token and the supported format variable names
SHALL NOT be case sensitive (e.g. "@FoRmAt." is a valid header
token).
Decimal numeric values SHALL NOT be zero-padded (e.g. "08" is an
invalid decimal value). Hexadecimal numeric values MAY be zero-
padded (e.g. "0x08" is a valid hexadecimal value).
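As a minimal sketch of the numeric rules above, the following Python
function accepts unpadded decimal values and "0x"-prefixed hexadecimal
values. The name parse_numeric is hypothetical, and treating a lone "0"
as a valid decimal value and the "0x" prefix as case-insensitive are
assumptions that this document does not state. Range checks are left to
the individual format variables.

```python
import re

# Decimal: no leading zeros (assumption: a lone "0" is acceptable).
_DECIMAL = re.compile(r"(?:0|[1-9][0-9]*)\Z")
# Hexadecimal: "0x" prefix, zero padding allowed.
_HEX = re.compile(r"0[xX][0-9a-fA-F]+\Z")


def parse_numeric(token):
    """Return the integer value of a numeric token, or None if invalid."""
    if _DECIMAL.match(token):
        return int(token, 10)
    if _HEX.match(token):
        return int(token, 16)
    return None
```

With this sketch, parse_numeric("08") is rejected (None), while
parse_numeric("0x08") and parse_numeric("8") both yield 8.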
PT/SC headers MUST appear within the first 60 lines or 3000
characters of a file (whichever comes first) and they SHOULD be
located as close to the beginning of the file as possible (hence the
use of the term "header").
PT/SC headers MUST appear within the first 160 characters of an
individual line and, when possible, SHOULD appear within the first
80 characters for maximum visibility.
It is RECOMMENDED that no horizontal-tab characters precede the
definition of any tab-related variables (if present) and no end-of-
line characters or character sequences precede the definition of the
new-line variable (if present). If, for example, horizontal-tab
characters precede the definition of the tab-size or tab-stops
variables, they may not be expanded correctly if the file is
processed in a single pass (e.g. first line read, processed,
printed, next line read, processed, printed, etc.).
Multiple formatting variables may be defined by including multiple
headers. Multiple headers may be included in any order. Multiple
headers may be included on a single line:
This is MyFile.txt, @format.tab-size 8, @format.new-line crlf
or multiple lines:
This is MyFile.txt, @format.new-line crlf
@format.tab-size 8
Format variables SHOULD NOT be multiply defined. If a format
variable is defined more than once in a file (e.g. this document),
only the first occurrence SHALL be interpreted as valid.
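The rules in this section can be sketched as a small scanner. The
following Python code is an illustration only, not a reference
implementation. The names find_headers, HEADER_RE, and VALUE_RE are
hypothetical. It assumes that the 60-line, 3000-character, and
160-character limits apply to the position where the "@format." token
starts, that a value list ends at the first token that is not a valid
value (such as a trailing comma), and that a header with no valid value
is ignored.

```python
import re

VARIABLES = ("tab-size", "tab-stops", "indent-size",
             "line-length", "new-line", "use-tabs")

# Header token: at the start of a line or after a space or tab, followed
# by a supported variable name and then a space or tab.
HEADER_RE = re.compile(
    r"(?:^|(?<=[ \t]))@format\.(" + "|".join(VARIABLES) + r")(?=[ \t])",
    re.IGNORECASE)

# One value: white-space, then a numeric or keyword token that is not
# immediately followed by another letter or digit.
VALUE_RE = re.compile(
    r"[ \t]+(0x[0-9a-f]{1,2}|0|[1-9][0-9]{0,2}"
    r"|true|false|on|off|yes|no|(?:cr|lf)+)(?![0-9a-z])",
    re.IGNORECASE)


def find_headers(text):
    """Return {variable: [raw value tokens]} for the headers in text."""
    found = {}
    offset = 0  # offset of the current line within the file
    for line_no, line in enumerate(text.split("\n")):
        if line_no >= 60:
            break
        for m in HEADER_RE.finditer(line):
            # Assumption: the limits apply to where the token starts.
            if m.start() >= 160 or offset + m.start() >= 3000:
                continue
            values = []
            pos = m.end()
            while True:
                v = VALUE_RE.match(line, pos)
                if v is None:
                    break
                values.append(v.group(1).lower())
                pos = v.end()
            name = m.group(1).lower()
            # Only the first definition of a variable is honoured.
            if values and name not in found:
                found[name] = values
        offset += len(line) + 1
    return found
```

For the single-line example above, this sketch yields the mapping
{"tab-size": ["8"], "new-line": ["crlf"]}, while "user@format.com" and
" @format.tab-size: 8" yield no headers.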
5.1 Headers in Source Code Files
PT/SC headers may be embedded in program or script source code files
by including the header in the comment delimiters of the appropriate
language for the file.
Examples:
/* MyProgram.c, @format.tab-size 4 */
// MyProgram.cpp, @format.tab-size 3
// @format.use-tabs true
<!-- MyPage.html, @format.tab-size 2 -->
REM MyProgram.bas, @format.tab-size 8
{ MyProgram.pas, @format.tab-size 4 }
; MyProgram.asm, @format.tab-size 4
# MyProgram.mak, @format.tab-size 4
/**
* MyProgram.java, JavaDoc comment
* @author R. R. Swindell
* @version 1.00
* @format.tab-size 4
* @format.use-tabs true
*/
Since the comment delimiters are not part of the header format,
PT/SC headers are not restricted to a specific set of programming or
scripting languages and should remain compatible with any future
languages provided they allow for free-form in-line comments.
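An editor that adds headers to source code files could choose the
comment delimiters from the file's extension. The Python sketch below
shows one way to do so. The function name header_line and the table
COMMENT_DELIMITERS are hypothetical, the extension mapping is taken
only from the examples above, and falling back to an undelimited line
for unknown extensions is an assumption.

```python
import os

# Comment delimiters keyed by file extension, from the examples above.
COMMENT_DELIMITERS = {
    ".c": ("/* ", " */"),
    ".cpp": ("// ", ""),
    ".html": ("<!-- ", " -->"),
    ".bas": ("REM ", ""),
    ".pas": ("{ ", " }"),
    ".asm": ("; ", ""),
    ".mak": ("# ", ""),
}


def header_line(filename, settings):
    """Build one line holding the file name and its PT/SC headers.

    settings is a sequence of (variable, value) pairs.
    """
    ext = os.path.splitext(filename)[1].lower()
    # Assumption: unknown extensions get no comment delimiters.
    opener, closer = COMMENT_DELIMITERS.get(ext, ("", ""))
    headers = ", ".join("@format.%s %s" % (k, v) for k, v in settings)
    name = os.path.basename(filename)
    return "%s%s, %s%s" % (opener, name, headers, closer)
```

For example, header_line("MyProgram.c", [("tab-size", 4)]) produces the
first example line shown above.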
NOTE: PT/SC headers in source code files define formatting
parameters for the display, editing, or printing of the source code
itself and not the output of the resulting program or script.
Although certain languages (e.g. HTML) could benefit from a
standardized method of specifying formatting parameters for output
(specifically tab-size/tab-stops), such a definition is beyond the
scope of this document.
6. Format Variables
Supported PT/SC Format Variables:
tab-size
tab-stops
indent-size
line-length
new-line
use-tabs
If any of the supported format variables are not defined by a PT/SC
header in a file, the value of the corresponding format parameter
(if supported by the application) SHALL be left in its default or
user-configured state unless otherwise noted.
NOTE: Format variable names SHALL NOT be case sensitive (i.e. "tab-
size" and "TAB-Size" are both valid format variable names).
6.1 tab-size
The <tab-size> variable was the initial inspiration for the PT/SC
header and remains its primary purpose. The <tab-size> variable is
used to specify the symmetric offset of each tab-stop in the file
(in non-proportional character widths).
Syntax: @format.tab-size<white-space><value>
Where <value> is a positive decimal (base 10) number in the
range of 1 to 60.
Example: @format.tab-size 4
Would result in tab-stops at offsets (from the beginning of
each line) of 4, 8, 12, 16, 20, 24, etc.
If asymmetric tab-stops are supported by the application and the
<tab-stops> variable is defined, the application SHALL ignore any
definition of the <tab-size> variable and use the values specified
for the <tab-stops> variable instead.
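The effect of a symmetric tab-size on display can be sketched as
follows. The function name expand_tabs is hypothetical, and the sketch
assumes that every character other than the horizontal-tab occupies
exactly one character position.

```python
def expand_tabs(line, tab_size):
    """Replace each horizontal-tab with spaces up to the next tab-stop."""
    if not 1 <= tab_size <= 60:
        raise ValueError("tab-size must be in the range 1 to 60")
    out = []
    column = 0
    for ch in line:
        if ch == "\t":
            # Distance from the current column to the next tab-stop.
            width = tab_size - (column % tab_size)
            out.append(" " * width)
            column += width
        else:
            out.append(ch)
            column += 1
    return "".join(out)
```

For a single line of text, Python's built-in str.expandtabs method
gives the same result; the loop is written out here only to make the
tab-stop arithmetic visible.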
6.2 tab-stops
The <tab-stops> variable is to be used in files that utilize
asymmetric tab-stops. The <tab-size> variable MAY still be defined
as a back up in the case of applications that do not support
asymmetric tab-stops.
Syntax: @format.tab-stops<white-space><value><white-space><value>...
Where each <value> is a positive decimal (base 10) number in
the range of 1 to 255, increasing in order (e.g. 4 8 10). A
minimum of two (2) values MUST be specified. A maximum of
forty (40) values may be specified. All values MUST be
separated by white-space.
Any tab-stops on a line beyond the offset of the last specified tab-
stop MUST be interpreted as symmetric tab-stops with the width
determined by the difference of the last two (2) specified tab-
stops.
Example: @format.tab-stops 4 8 10
Would result in tab-stops at offsets (from the beginning of
each line) of 4, 8, 10, 12, 14, 16, 18, etc.
If all tab-stops are symmetric, this variable MUST NOT be defined
and the <tab-size> variable MUST be defined instead.
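The extension of asymmetric tab-stops beyond the last specified offset
can be sketched in Python as follows. The function names are
hypothetical, and the sketch assumes that "increasing in order" means
strictly increasing, so that the repeated width is never zero.

```python
def check_tab_stops(stops):
    """Raise ValueError unless stops meets the <tab-stops> constraints."""
    if not 2 <= len(stops) <= 40:
        raise ValueError("between 2 and 40 tab-stops are required")
    if any(not 1 <= s <= 255 for s in stops):
        raise ValueError("each tab-stop must be in the range 1 to 255")
    # Assumption: the values must be strictly increasing.
    if any(a >= b for a, b in zip(stops, stops[1:])):
        raise ValueError("tab-stops must be in increasing order")


def next_tab_stop(column, stops):
    """Return the offset of the first tab-stop beyond the given column."""
    check_tab_stops(stops)
    for stop in stops:
        if stop > column:
            return stop
    # Past the last specified stop: repeat the width of the last interval.
    step = stops[-1] - stops[-2]
    return stops[-1] + ((column - stops[-1]) // step + 1) * step


def first_tab_stops(stops, count):
    """List the first `count` tab-stop offsets on a line."""
    result, column = [], 0
    for _ in range(count):
        column = next_tab_stop(column, stops)
        result.append(column)
    return result
```

Calling first_tab_stops([4, 8, 10], 7) reproduces the offsets listed in
the example above: 4, 8, 10, 12, 14, 16, 18.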
6.3 indent-size
The <indent-size> variable is used in cases where the editor
supports an indent-size configuration option and it has been
configured with a value different than the configured tab-size. The
<indent-size> variable is used to specify the symmetric offset of
each indent-stop in the file (in non-proportional character widths).
Indent-stops are very similar to symmetric tab-stops, except that
they are used only in the editing of the file; they are not used in
the display or printing of the file. If the <indent-size> variable
is defined and the editor is configured to use horizontal-tab
characters, it MUST use a combination of horizontal-tab and space
characters to indent the proper number of character positions when
the tab key is pressed. If the editor supports an indent-size
configuration option, but the <indent-size> variable has not been
defined and the <tab-size> variable has, the indent-size option
shall be set to the value specified for the <tab-size> variable.
Syntax: @format.indent-size<white-space><value>
Where <value> is a positive decimal (base 10) number in the range
of 1 to 60.
Example: @format.indent-size 4
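The combination of horizontal-tab and space characters described above
can be sketched for the simple case of indenting from the beginning of
a line. The function names are hypothetical, and the sketch assumes
symmetric tab-stops and that the editor rewrites the whole run of
leading white-space each time the tab key is pressed.

```python
def next_indent_stop(column, indent_size):
    """Return the first indent-stop beyond the given column."""
    return (column // indent_size + 1) * indent_size


def leading_whitespace(target_column, tab_size):
    """Reach target_column from column 0 with tabs, then spaces."""
    tabs, spaces = divmod(target_column, tab_size)
    return "\t" * tabs + " " * spaces
```

With a tab-size of 8 and an indent-size of 4, successive tab key
presses at the start of a line would produce four spaces, then one
horizontal-tab, then one horizontal-tab followed by four spaces.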
6.4 line-length
The <line-length> variable is used to specify the maximum allowable
individual line length (excluding any end-of-line character
sequences). It is used in cases where the editor has been
configured to enforce a right-hand margin. In cases where the
editor does not support a right-hand margin, this variable may be
defined in a header by the author to notify the file's co-authors of
the desired maximum line length.
Syntax: @format.line-length<white-space><value>
Where <value> is a positive decimal (base 10) number in the
range of 1 to 255.
Example: @format.line-length 79
6.5 new-line
The <new-line> variable is used to specify the character sequence
that signifies the end-of-line. This character sequence will be
used to determine the end of each line when the file is read and to
terminate individual lines when the file is printed or written to
disk.
Syntax: @format.new-line<white-space><value>
Where <value> is one or more decimal (base 10) numbers in the
range of 0 to 255 or hexadecimal (base 16) numbers (signified
by a "0x" prefix) in the range of 0x00 to 0xff. A maximum of
forty (40) values may be specified. If multiple numeric values
are specified, they MUST be separated by white-space.
The keywords "CR" and "LF" may also be used in place of a
numeric value to signify the carriage-return (ASCII 13) and
line-feed (ASCII 10) characters respectively. If the "CR" and
"LF" keywords are used, they need not be separated by white-
space. The keywords are not case sensitive.
Example: @format.new-line lf
Would result in lines being terminated by the ASCII line-feed
character upon input or output.
Example: @format.new-line crlf
Would result in lines being terminated by the ASCII carriage-
return/line-feed character sequence upon input or output.
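Converting the <new-line> values into an end-of-line byte sequence can
be sketched as follows. The function name parse_new_line is
hypothetical. The sketch takes the value tokens already split on
white-space, and it assumes that a lone "0" is a valid decimal value
and that zero-padded decimal values are rejected as described in
section 5.

```python
import re


def parse_new_line(tokens):
    """Convert <new-line> value tokens into an end-of-line byte string."""
    if not 1 <= len(tokens) <= 40:
        raise ValueError("between 1 and 40 values are required")
    out = bytearray()
    for token in tokens:
        t = token.lower()
        if re.fullmatch(r"(?:cr|lf)+", t):
            # Keywords may be run together, e.g. "crlf".
            for i in range(0, len(t), 2):
                out.append(13 if t[i:i + 2] == "cr" else 10)
        elif re.fullmatch(r"0x[0-9a-f]{1,2}", t):
            out.append(int(t, 16))
        elif re.fullmatch(r"0|[1-9][0-9]{0,2}", t) and int(t) <= 255:
            out.append(int(t))
        else:
            raise ValueError("invalid new-line value: %r" % token)
    return bytes(out)
```

Under these assumptions, the tokens ["crlf"], ["CR", "LF"], ["13", "10"],
and ["0x0d", "0x0A"] all produce the same two-byte sequence.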
6.6 use-tabs
The <use-tabs> variable is used to specify whether the editor was
configured to write horizontal-tab (ASCII 9) characters to the file
or use the appropriate number of space (ASCII 32) characters in
place of each horizontal-tab character (see section 3.2).
Syntax: @format.use-tabs<white-space><value>
Where <value> is one of the following keywords (without
quotes): "TRUE", "FALSE", "ON", "OFF", "YES", or "NO".
The keywords "TRUE", "ON", and "YES" specify that horizontal-
tab characters are to be written to the file. The keywords
"FALSE", "OFF", and "NO" specify that the appropriate number of
space characters are to be used in place of each horizontal-tab
character when the file is read from or written to disk. The
keywords are not case sensitive.
Example: @format.use-tabs true
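Interpreting the <use-tabs> keywords is a small lookup, sketched here
in Python. The function name parse_use_tabs is hypothetical, and
raising an error for an unrecognized keyword is an assumption; an
application might prefer to ignore the header instead.

```python
def parse_use_tabs(token):
    """Return True if horizontal-tab characters are to be written."""
    word = token.lower()
    if word in ("true", "on", "yes"):
        return True
    if word in ("false", "off", "no"):
        return False
    # Assumption: unknown keywords are reported rather than ignored.
    raise ValueError("invalid use-tabs value: %r" % token)
```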
7. Formal Syntax
The following syntax specification uses the augmented Backus-Naur
Form (BNF) and Core Rules as described in RFC 2234 [3].
file     = [header] *(*CHAR [escape header])
header   = "@format." variable values
escape   = WSP / LF
variable = "tab-size" / "tab-stops" / "indent-size" /
           "line-length" / "new-line" / "use-tabs"
values   = 1*40((1*WSP) value)
value    = numeric / keyword
numeric  = 1*3DIGIT / ("0x" 1*2HEXDIG)
keyword  = "true" / "false" / "on" / "off" / "yes" / "no" /
           "cr" / "lf"
8. Security Considerations
There are no known security issues with the solution proposed in
this document.
9. References
[1] ANSI X3.4-1986, "US-ASCII Coded Character Set--7-Bit American
Standard Code for Information Interchange".
[2] Bradner, S., "Key words for use in RFCs to Indicate Requirement
Levels", BCP 14, RFC 2119, March 1997.
[3] Crocker, D., "Augmented BNF for Syntax Specifications: ABNF",
RFC 2234, November 1997.
10. Author's Address
Robert R. Swindell
Wind River Systems, Inc.
3961 MacArthur Blvd., Suite 212
Newport Beach, CA 92660
United States of America
Email: swindell@windriver.com
Full Copyright Statement
"Copyright (C) The Internet Society 1999. All Rights Reserved.
This document and translations of it may be copied and furnished to
others, and derivative works that comment on or otherwise explain it
or assist in its implementation may be prepared, copied, published
and distributed, in whole or in part, without restriction of any
kind, provided that the above copyright notice and this paragraph
are included on all such copies and derivative works. However, this
document itself may not be modified in any way, such as by removing
the copyright notice or references to the Internet Society or other
Internet organizations, except as needed for the purpose of
developing Internet standards in which case the procedures for
copyrights defined in the Internet Standards process must be
followed, or as required to translate it into languages other than
English.
The limited permissions granted above are perpetual and will not be
revoked by the Internet Society or its successors or assigns.
This document and the information contained herein is provided on
an "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET
ENGINEERING TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED,
INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE
INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.