Individual                                               R. Swindell
Internet Draft                                    Wind River Systems
Document:                                               October 1999
Category: Informational
Expires April 2000


                 Plain Text/Source Code File Header


Status of this Memo

This document is an Internet-Draft and is in full conformance with
all provisions of Section 10 of RFC 2026.

Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that
other groups may also distribute working documents as
Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six
months and may be updated, replaced, or obsoleted by other documents
at any time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."

The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt

The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html.

Distribution of this memo is unlimited.

Copyright (C) The Internet Society 1999. All Rights Reserved.

1. Abstract

Anyone who has dealt at length with plain (ASCII [1]) text and source
code files can testify that the lack of a global definition of the
effect of the horizontal-tab character all too often causes
ill-formed displayed and printed output of plain text files that
utilize the horizontal-tab character for formatting.

This document defines a common header for plain text and source code
(PT/SC) files, whose primary purpose is to specify the tab-dependent
formatting parameters to be used when displaying, printing, or
editing such files. The defined header also addresses such issues as
whether to use the line-feed character or the
carriage-return/line-feed character sequence to terminate lines in
the file.

Widespread adoption and support of the header defined in this
document could substantially improve the interoperability of text
and source code files distributed across the Internet and other
media.

Swindell [Page 1] Expires April 2000
Internet Draft Plain Text/Source Code File Header October 1999

1.1 Change Log

This section tracks changes made to the revisions of the Internet
Drafts of this document. It will be *deleted* when the document is
published as an RFC.

October 1, 1999: Revision 1

1:  Introduced limitations on the location of headers within a file
    (section 5), i.e. headers must be located within the first 60
    lines or 3000 characters of a file (whichever comes first) and
    within the first 160 characters of an individual line. This
    should simplify the parsing of large files (or files with
    unusually long lines) and encourage the placement of headers
    towards the top of files and beginning of lines for maximum
    visibility.

2:  Clarified the header syntax requirements in section 5.
    Specifically, added paragraphs to detail the required
    white-space, line-feed, or beginning-of-file that must
    immediately precede a valid header token and the required
    white-space that must immediately follow a valid variable name.
    Examples of invalid headers that may be commonly mistaken for
    valid headers (e.g. user@format.com) were added.

3:  Section 4 now specifies that if a "programmer's editor" adds
    PT/SC headers to a file and the file is determined to be a
    source code file (possibly determined by the file's extension),
    the editor must embed all headers in the comment delimiters of
    the appropriate programming language for the file.

4:  Added section 3.3, "Existing Proprietary Solutions".

5:  Eliminated NOTE2 from section 5.1. This note was determined to
    be a misleading and unnecessary suggestion.

6:  Replaced the term "column" with "offset" in section 3.

7:  Replaced the term "User" with "Name" in the tables contained in
    section 3.1.

8:  Replaced the term "document" with "file" in section 4.

9:  Improved the consistency of the use of the terms "define" and
    "specify" in regards to headers and format variables throughout
    the document, i.e.
    formatting variables are "defined", while formatting variable
    values or parameters are "specified".

10: Section 5 now notes that formatting variables may be defined in
    any order.

11: Copyright section was completed and miscellaneous typos were
    fixed throughout the document.

2. Conventions used in this document

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in RFC 2119 [2].

All numbers in this document are represented in decimal (base 10)
format unless otherwise noted.

3. Introduction

The tab key is typically used in plain (ASCII) text editors as a
fast and convenient way to align text on predetermined column
boundaries (tab-stops). Upon editing or creating a file, when the
tab key is pressed, the editor will usually place a horizontal-tab
(ASCII 9) character in the current insert position and move the
cursor forward to the next tab-stop position (i.e. indentation).

The offset (tab-size) of each subsequent tab-stop is usually
configurable in the editor, though the default value of such a
parameter may be different from one editor to the next (typically in
the range of two to ten character positions). Additionally, some
editors allow asymmetric tab-stops (variable tab-size) where, for
example, the first tab-stop may be at offset five and the second at
offset eight.

3.1 The Problem

The problem occurs when the file is printed, viewed, or loaded into
a different editor, or perhaps loaded into the same editor, but with
a different tab-size or tab-stop configuration. The resulting file
image may or may not resemble the original formatting of the file.
This is especially critical for the legibility of program and script
source code (C/C++, Java, HTML, etc.) files.

The problem is especially apparent in a multi-author environment,
where inevitably, different authors will have their editors and
other text processing applications configured with different
tab-dependent formatting parameters, resulting in what is
affectionately referred to as "tab-hell".

Example:

Bob creates the following text file with his editor configured with
eight space (symmetric) tab-stops:

    +-------+-------+-------+---------------+---------------+
    | Name  | Meat  | Dairy | Favorite Food | Favorite Band |
    +-------+-------+-------+---------------+---------------+
    | Bob   | No    | Yes   | Salad         | Meat Loaf     |
    | Sally | Yes   | No    | Burrito       | Cream         |
    | Mike  | Yes   | No    | Pasta         | Vanilla Fudge |
    +-------+-------+-------+---------------+---------------+

Julie would like to add herself to the table, so she loads the file
into her editor, which happens to be configured for two space
tab-stops (her preference).
This is what Julie is presented with:

    +-------+-------+-------+---------------+---------------+
    | Name | Meat | Dairy | Favorite Food | Favorite Band |
    +-------+-------+-------+---------------+---------------+
    | Bob | No | Yes | Salad | Meat Loaf |
    | Sally | Yes | No | Burrito | Cream |
    | Mike | Yes | No | Pasta | Vanilla Fudge |
    +-------+-------+-------+---------------+---------------+

Confused, but determined, Julie adds herself to the table and prints
the file for the caterer and DJ of the upcoming company party:

    +-------+-------+-------+---------------+---------------+
    | Name | Meat | Dairy | Favorite Food | Favorite Band |
    +-------+-------+-------+---------------+---------------+
    | Bob | No | Yes | Salad | Meat Loaf |
    | Sally | Yes | No | Burrito | Cream |
    | Mike | Yes | No | Pasta | Vanilla Fudge |
    | Julie | Yes | Yes | Milk | Roast Beef |
    +-------+-------+-------+---------------+---------------+

Needless to say, the party is a disaster: Bob's a vegetarian, Sally
and Mike are lactose intolerant, and the DJ brings only Modern Dance
music.

This is obviously an extreme hypothetical example. Typically,
tab-related formatting problems are more of an esthetics than a
logistics problem, but you get the idea.

NOTE: This document must be viewed or printed using a
non-proportional font for the above tables to appear as intended.

3.2 Existing Work-around

Many existing text editors offer the option of writing the
appropriate number of space (ASCII 32) characters to a file in place
of horizontal-tab characters. While such an option can be a
functional work-around to the problem for files created and
subsequently edited with an editor configured in this manner, it
does nothing to solve the problem of correctly displaying, printing,
or editing files that utilize horizontal-tab characters.

Subsequent editing of a file that uses spaces in place of
horizontal-tab characters may still adversely affect the formatting
of the file if the author utilizes the tab key for indentation and
their editor is configured with different tab-stop parameters than
the original editor configuration.

Utilizing such an option also eliminates the possibility of
convenient indentation/column resizing by simply adjusting the
tab-stop configuration. Additionally, most editors allow quick
cursor movement through white-space in a file that utilizes the
horizontal-tab character (typically, one arrow-key press per
tab-stop), while white-space in files that use spaces in place of
horizontal-tabs must be navigated one character position at a time
(e.g. eight arrow-key presses per tab-stop).

An increasingly minor consideration is the fact that files
(particularly source code files) that replace horizontal-tab
characters with spaces can require as much as twenty percent more
storage space than files that utilize horizontal-tab characters.

3.3 Existing Proprietary Solutions

There are developers of plain text editors that have identified the
need for a solution to this problem and have addressed the problem
by allowing authors to define tab-stops and other formatting
parameters in the contents of their files. Unfortunately, all known
solutions have been implemented in a proprietary, incomplete,
non-extensible, or non-adoptable manner. For this reason, these
solutions are not often found in existing text or source code files
and are not expected to garner widespread use in the future.

3.4 Proposed Common Solution

While common computer users have migrated toward modern word
processors and their elaborate document formats, this problem,
although seemingly obscure, remains a thorn in the side of the
minority of users who must still deal with plain text files.
Ironically, the one group of computer users who are most affected by
this problem are programmers, the same ones who are in a position to
solve it by implementing a common solution.

The solution proposed in this document is a plain text/source code
(PT/SC) file header, whose primary purpose is to specify the
tab-dependent formatting parameters to be used when printing,
displaying, or editing such files. Other common text file formatting
issues (such as whether to use the line-feed character or the
carriage-return/line-feed character sequence to terminate lines) are
also addressed in this header.

4. Application

It is hoped that a significant percentage of the developers of plain
text editors, specifically those designed for use by programmers,
will adopt this proposal. Adoption would include parsing any PT/SC
headers (if present) in files opened in the editor and setting the
formatting parameters accordingly and, optionally, adding PT/SC
headers (if not already present) to files written to disk.

A supporting editor MUST update any PT/SC headers (if present) with
the current formatting parameters when a file is written to disk.

A supporting editor SHOULD allow the option of adding PT/SC headers
to a file (if not already present) with all relevant formatting
parameter values specified.

A supporting programmer's editor (an editor designed specifically
for use by programmers) MUST embed all PT/SC headers in the comment
delimiters of the appropriate language for the file. The appropriate
language may be determined by the file's extension (e.g. ".c",
".pas", ".bas", etc.).

Text editors that support multiple concurrently opened files MUST
support a unique set of formatting parameters for each opened file.
Many editors already support a unique set of parameters based on the
extension (e.g. ".c", ".pas", ".txt") of the opened file.
Such a feature would need to be extended to set the appropriate
formatting parameters based on the values specified in any PT/SC
headers present.

Two-way support is defined as that of linking the PT/SC header
values and corresponding configuration menu options (if applicable)
such that a value changed in the file is reflected in the
configuration menu and vice versa. Two-way support is RECOMMENDED,
but not required.

It is also desirable that developers of applications designed to
view, print, or modify in any way plain text or source code files
adopt this proposal. Such applications include (but are not limited
to) version control systems, file comparison utilities, syntax
verification utilities, source-level debuggers, and universal
document viewing and printing utilities.

Authors of plain text and source code files need not wait for PT/SC
header support in text editors. The simplicity of the PT/SC header
format allows authors to significantly help one another by at least
"documenting" the original formatting parameters by hand-coding the
PT/SC header so that other users and co-authors of such files need
not "guess" at the correct formatting parameters. And when
applications supporting the PT/SC header become available, existing
documents and source code files will immediately benefit from the
automatic adjustment of formatting parameters based on the
pre-existing PT/SC headers.

5. Header Format

    <BOF|LF|WSP>@format.<variable><WSP><values>

Where <BOF> is the beginning of the file (offset 0), <LF> is the
line-feed (ASCII 10) character, <WSP> is one or more space
(ASCII 32) or horizontal-tab (ASCII 9) characters, <variable> is one
of the supported format variable names (see section 6), and <values>
is one or more desired values (separated by white-space) in the
appropriate format for the corresponding variable.
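
The header form described above can be sketched as a small scanner. The following Python is a minimal, non-normative illustration and is not part of this proposal: the names VARIABLES, HEADER_RE, and find_headers are hypothetical, value ranges are not checked, and the sketch assumes that the 160-character limit applies to the whole header rather than only to its first character. The location limits (first 60 lines or 3000 characters), the ban on zero-padded decimal values, and the first-occurrence rule are taken from the remainder of this section.

```python
import re

# Supported format variable names (section 6).
VARIABLES = ("tab-size", "tab-stops", "indent-size",
             "line-length", "new-line", "use-tabs")

# One value: hexadecimal, non-zero-padded decimal, or a keyword.
# "cr" and "lf" may be run together (e.g. "crlf").
_VALUE = (r"(?:0x[0-9a-f]{1,2}|0|[1-9][0-9]{0,2}"
          r"|true|false|on|off|yes|no|(?:cr|lf)+)(?![0-9a-z])")

# The token must follow the beginning of the file, a line-feed, a space,
# or a horizontal-tab; the variable name must be followed by white-space.
HEADER_RE = re.compile(
    r"(?:^|(?<=[ \t]))@format\.(" + "|".join(VARIABLES) + r")"
    r"((?:[ \t]+" + _VALUE + r")+)",
    re.IGNORECASE | re.MULTILINE,
)


def find_headers(text):
    """Return {variable: [values]} for the PT/SC headers found in text."""
    # Only the first 3000 characters or 60 lines are searched.
    head = "\n".join(text[:3000].split("\n")[:60])
    found = {}
    for match in HEADER_RE.finditer(head):
        line_start = head.rfind("\n", 0, match.start()) + 1
        # Assumption: the whole header must fit in the first 160 characters.
        if match.end() - line_start > 160:
            continue
        name = match.group(1).lower()
        # Only the first definition of a variable is treated as valid.
        found.setdefault(name, match.group(2).lower().split())
    return found
```

A caller would pass the file contents as a string and look up, for example, the "tab-size" key in the returned dictionary. Comment delimiters and trailing punctuation need no special handling here, because matching stops at the first token that is not a valid value.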

Example:

    This is MyFile.txt, @format.tab-size 8

A space character, horizontal-tab character, line-feed character, or
the beginning of the file MUST immediately precede the "@format."
header token. Additionally, a supported format variable name MUST
immediately follow the header token. For example, the text string
"user@format.com" SHALL NOT be interpreted as a valid PT/SC header
because it does not meet either of these requirements.

A space or horizontal-tab character MUST immediately follow a
supported format variable name (e.g. " @format.tab-size: 8" is not a
valid header).

The "@format." header token and the supported format variable names
SHALL NOT be case sensitive (e.g. "@FoRmAt." is a valid header
token).

Decimal numeric values SHALL NOT be zero-padded (e.g. "08" is an
invalid decimal value). Hexadecimal numeric values MAY be
zero-padded (e.g. "0x08" is a valid hexadecimal value).

PT/SC headers MUST appear within the first 60 lines or 3000
characters of a file (whichever comes first) and they SHOULD be
located as close to the beginning of the file as possible (hence the
use of the term "header").

PT/SC headers MUST appear within the first 160 characters of an
individual line and, when possible, SHOULD appear within the first
80 characters for maximum visibility.

It is RECOMMENDED that no horizontal-tab characters precede the
definition of any tab-related variables (if present) and no
end-of-line characters or character sequences precede the definition
of the new-line variable (if present). If, for example,
horizontal-tab characters precede the definition of the tab-size or
tab-stops variables, they may not be expanded correctly if the file
is processed in a single pass (e.g. first line read, processed,
printed, next line read, processed, printed, etc.).

Multiple formatting variables may be defined by including multiple
headers.
Multiple headers may be included in any order. Multiple headers may
be included on a single line:

    This is MyFile.txt, @format.tab-size 8, @format.new-line crlf

or multiple lines:

    This is MyFile.txt, @format.new-line crlf
    @format.tab-size 8

Format variables SHOULD NOT be multiply defined. If a format
variable is defined more than once in a file (e.g. this document),
only the first occurrence SHALL be interpreted as valid.

5.1 Headers in Source Code Files

PT/SC headers may be embedded in program or script source code files
by including the header in the comment delimiters of the appropriate
language for the file.

Examples:

    /* MyProgram.c, @format.tab-size 4 */

    // MyProgram.cpp, @format.tab-size 3
    // @format.use-tabs true

    REM MyProgram.bas, @format.tab-size 8

    { MyProgram.pas, @format.tab-size 4 }

    ; MyProgram.asm, @format.tab-size 4

    # MyProgram.mak, @format.tab-size 4

    /**
     * MyProgram.java, JavaDoc comment
     * @author R. R. Swindell
     * @version 1.00
     * @format.tab-size 4
     * @format.use-tabs true
     */

Since the comment delimiters are not part of the header format,
PT/SC headers are not restricted to a specific set of programming or
scripting languages and should remain compatible with any future
languages provided they allow for free-form in-line comments.

NOTE: PT/SC headers in source code files define formatting
parameters for the display, editing, or printing of the source code
itself and not the output of the resulting program or script.
Although certain languages (e.g. HTML) could benefit from a
standardized method of specifying formatting parameters for output
(specifically tab-size/tab-stops), such a definition is beyond the
scope of this document.

6.
Format Variables

Supported PT/SC Format Variables:

    tab-size
    tab-stops
    indent-size
    line-length
    new-line
    use-tabs

If any of the supported format variables are not defined by a PT/SC
header in a file, the value of the corresponding format parameter
(if supported by the application) SHALL be left in its default or
user-configured state unless otherwise noted.

NOTE: Format variable names SHALL NOT be case sensitive (i.e.
"tab-size" and "TAB-Size" are both valid format variable names).

6.1 tab-size

The tab-size variable was the initial inspiration for the PT/SC
header and remains its primary purpose. The tab-size variable is
used to specify the symmetric offset of each tab-stop in the file
(in non-proportional character widths).

Syntax:

    @format.tab-size <value>

Where <value> is a positive decimal (base 10) number in the range of
1 to 60.

Example:

    @format.tab-size 4

Would result in tab-stops at offsets (from the beginning of each
line) of 4, 8, 12, 16, 20, 24, etc.

If asymmetric tab-stops are supported by the application and the
tab-stops variable is defined, the application SHALL ignore any
definition of the tab-size variable and use the values specified for
the tab-stops variable instead.

6.2 tab-stops

The tab-stops variable is to be used in files that utilize
asymmetric tab-stops. The tab-size variable MAY still be defined as
a backup in the case of applications that do not support asymmetric
tab-stops.

Syntax:

    @format.tab-stops <value> <value>...

Where each <value> is a positive decimal (base 10) number in the
range of 1 to 255, increasing in order (e.g. 4 8 10). A minimum of
two (2) values MUST be specified. A maximum of forty (40) values may
be specified. All values MUST be separated by white-space.

Any tab-stops on a line beyond the offset of the last specified
tab-stop MUST be interpreted as symmetric tab-stops with the width
determined by the difference of the last two (2) specified
tab-stops.
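
The tab-stop rules of sections 6.1 and 6.2 can be illustrated with a short calculation. The following Python is a minimal, non-normative sketch, not part of this proposal: the function names next_tab_stop and expand_tabs are hypothetical, offsets are assumed to be zero-based character positions from the beginning of the line, and the tab-stops list is assumed to already satisfy the stated constraints (at least two values, increasing in order).

```python
def next_tab_stop(column, tab_size=8, tab_stops=None):
    """Return the offset of the first tab-stop beyond the given column."""
    if tab_stops:
        # Asymmetric tab-stops take precedence over tab-size.
        for stop in tab_stops:
            if stop > column:
                return stop
        # Beyond the last specified stop: symmetric stops whose width is
        # the difference of the last two specified tab-stops.
        width = tab_stops[-1] - tab_stops[-2]
        last = tab_stops[-1]
        return last + ((column - last) // width + 1) * width
    # Symmetric tab-stops: every multiple of tab_size.
    return (column // tab_size + 1) * tab_size


def expand_tabs(line, tab_size=8, tab_stops=None):
    """Replace each horizontal-tab in line with spaces up to the next stop."""
    out = []
    column = 0
    for ch in line:
        if ch == "\t":
            stop = next_tab_stop(column, tab_size, tab_stops)
            out.append(" " * (stop - column))
            column = stop
        else:
            out.append(ch)
            column += 1
    return "".join(out)
```

For instance, calling next_tab_stop repeatedly from column 0 with tab_stops=[4, 8, 10] visits offsets 4, 8, 10 and then continues in steps of 2, the difference of the last two specified stops.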

Example:

    @format.tab-stops 4 8 10

Would result in tab-stops at offsets (from the beginning of each
line) of 4, 8, 10, 12, 14, 16, 18, etc.

If all tab-stops are symmetric, this variable MUST NOT be defined
and the tab-size variable MUST be defined instead.

6.3 indent-size

The indent-size variable is used in cases where the editor supports
an indent-size configuration option and it has been configured with
a value different than the configured tab-size. The indent-size
variable is used to specify the symmetric offset of each indent-stop
in the file (in non-proportional character widths). Indent-stops are
very similar to symmetric tab-stops, except that they are used only
in the editing of the file; they are not used in the display or
printing of the file.

If the indent-size variable is defined and the editor is configured
to use horizontal-tab characters, it MUST use a combination of
horizontal-tab and space characters to indent the proper number of
character positions when the tab key is pressed.

If the editor supports an indent-size configuration option, but the
indent-size variable has not been defined and the tab-size variable
has, the indent-size option shall be set to the value specified for
the tab-size variable.

Syntax:

    @format.indent-size <value>

Where <value> is a positive decimal (base 10) number in the range of
1 to 60.

Example:

    @format.indent-size 4

6.4 line-length

The line-length variable is used to specify the maximum allowable
individual line length (excluding any end-of-line character
sequences). It is used in cases where the editor has been configured
to enforce a right-hand margin. In cases where the editor does not
support a right-hand margin, this variable may be defined in a
header by the author to notify the file's co-authors of the desired
maximum line length.

Syntax:

    @format.line-length <value>

Where <value> is a positive decimal (base 10) number in the range of
1 to 255.

Example:

    @format.line-length 79

6.5 new-line

The new-line variable is used to specify the character sequence that
signifies the end-of-line. This character sequence will be used to
determine the end of each line when the file is read and to
terminate individual lines when the file is printed or written to
disk.

Syntax:

    @format.new-line <values>

Where <values> is one or more decimal (base 10) numbers in the range
of 0 to 255 or hexadecimal (base 16) numbers (signified by a "0x"
prefix) in the range of 0x00 to 0xff. A maximum of forty (40) values
may be specified. If multiple numeric values are specified, they
MUST be separated by white-space. The keywords "CR" and "LF" may
also be used in place of a numeric value to signify the
carriage-return (ASCII 13) and line-feed (ASCII 10) characters
respectively. If the "CR" and "LF" keywords are used, they need not
be separated by white-space. The keywords are not case sensitive.

Example:

    @format.new-line lf

Would result in lines being terminated by the ASCII line-feed
character upon input or output.

Example:

    @format.new-line crlf

Would result in lines being terminated by the ASCII
carriage-return/line-feed character sequence upon input or output.

6.6 use-tabs

The use-tabs variable is used to specify whether the editor was
configured to write horizontal-tab (ASCII 9) characters to the file
or use the appropriate number of space (ASCII 32) characters in
place of each horizontal-tab character (see section 3.2).

Syntax:

    @format.use-tabs <value>

Where <value> is one of the following keywords (without quotes):
"TRUE", "FALSE", "ON", "OFF", "YES", or "NO".

The keywords "TRUE", "ON", and "YES" specify that horizontal-tab
characters are to be written to the file. The keywords "FALSE",
"OFF", and "NO" specify that the appropriate number of space
characters are to be used in place of each horizontal-tab character
when the file is read from or written to disk.
The keywords are not case sensitive.

Example:

    @format.use-tabs true

7. Formal Syntax

The following syntax specification uses the augmented Backus-Naur
Form (BNF) and Core Rules as described in RFC 2234 [3].

    file     = [header] *(*CHAR [escape header])
    header   = "@format." variable values
    escape   = WSP / LF
    variable = "tab-size" / "tab-stops" / "indent-size" /
               "line-length" / "new-line" / "use-tabs"
    values   = 1*40((1*WSP) value)
    value    = numeric / keyword
    numeric  = 1*3DIGIT / ("0x" 1*2HEXDIG)
    keyword  = "true" / "false" / "on" / "off" / "yes" / "no" /
               "cr" / "lf"

8. Security Considerations

There are no known security issues with the solution proposed in
this document.

9. References

[1] ANSI X3.4-1986, "US-ASCII Coded Character Set--7-Bit American
    Standard Code for Information Interchange".

[2] Bradner, S., "Key words for use in RFCs to Indicate Requirement
    Levels", BCP 14, RFC 2119, March 1997.

[3] Crocker, D., "Augmented BNF for Syntax Specifications: ABNF",
    RFC 2234, November 1997.

10. Author's Addresses

    Robert R. Swindell
    Wind River Systems, Inc.
    3961 MacArthur Blvd., Suite 212
    Newport Beach, CA 92660
    United States of America
    Email: swindell@windriver.com

Full Copyright Statement

"Copyright (C) The Internet Society 1999. All Rights Reserved.

This document and translations of it may be copied and furnished to
others, and derivative works that comment on or otherwise explain it
or assist in its implementation may be prepared, copied, published
and distributed, in whole or in part, without restriction of any
kind, provided that the above copyright notice and this paragraph
are included on all such copies and derivative works.
However, this document itself may not be modified in any way, such
as by removing the copyright notice or references to the Internet
Society or other Internet organizations, except as needed for the
purpose of developing Internet standards in which case the
procedures for copyrights defined in the Internet Standards process
must be followed, or as required to translate it into languages
other than English.

The limited permissions granted above are perpetual and will not be
revoked by the Internet Society or its successors or assigns.

This document and the information contained herein is provided on an
"AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.