6.2 New Line
In the old days,
developers built applications for terminals and simple daisy-wheel
feed printers. They had agreed on the ASCII standard for 7-bit text
encoding, with the eighth bit reserved for system-specific uses (such
as character-based graphics). These developers neglected, however, to
specify the precise encoding for generating a new line. Some
systems used a carriage return (CR) to return the
printer head to the start of the line, and then a line feed (LF) to
tell the printer to roll up the paper a line.
However, many developers decided that using two characters for a line
break was wasteful and redundant. This led to the use of
either a CR or an LF code (but not both) to indicate the
end of a line. For these developers,
the single character was sufficient to tell the printer or terminal
character generator that a new line should be generated. Of course,
fragmentation occurred: applications didn't
always use the same line-ending character, or didn't
correctly interpret documents that used a different
character than the one they were programmed to expect.
Since then, we've moved to a world of WYSIWYG and
GUI, where users typically associate the return key with a new
paragraph break, not a new line. Today, the
Windows environment is standardized on
the CR/LF sequence (the original
double-character line break), the
Classic Mac OS is standardized on CR,
and the Unix world on LF. As you can see, this is
the worst possible scenario: three major platforms with three
different line-break standards. Therefore, a Java developer
can't assume that any one of these sequences renders
the proper logical result. Since Java is intended to be a
multiplatform language, this situation can be quite a problem.
Fortunately, Java
developers have a standard mechanism that queries the
system properties for the line separator of the
platform the program is running on:
String newline = System.getProperty("line.separator");
However, this mechanism doesn't help users who copy text files
from one system to another. Many of today's popular
text editors take a "best guess" by
scanning through the document until they find a CR, LF, or CR/LF
sequence, and then assuming that what they find is the proper new
line sequence for the file. This can lead to problems, however, if
the user opens a file that uses one line-break convention and then
pastes in data from an application that uses a different one.
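The "best guess" scan described above might look something like the following minimal sketch. The class and method names (NewlineSniffer, detect, isMixed) are hypothetical and chosen only for illustration; they are not part of any standard API. The second method shows one way to notice the pasted-data problem, by checking whether more than one kind of line break appears in the same text.

```java
// Hypothetical helper for illustration; not part of any standard API.
final class NewlineSniffer {

    /** Returns the first line break found ("\r\n", "\r", or "\n"), or null if there is none. */
    static String detect(CharSequence text) {
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '\n') {
                return "\n";
            }
            if (c == '\r') {
                // A CR immediately followed by an LF counts as one CR/LF break.
                boolean followedByLf = i + 1 < text.length() && text.charAt(i + 1) == '\n';
                return followedByLf ? "\r\n" : "\r";
            }
        }
        return null;
    }

    /** Returns true if the text contains more than one kind of line break. */
    static boolean isMixed(CharSequence text) {
        String first = null;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            String found = null;
            if (c == '\r') {
                if (i + 1 < text.length() && text.charAt(i + 1) == '\n') {
                    found = "\r\n";
                    i++; // skip the LF that belongs to this CR/LF pair
                } else {
                    found = "\r";
                }
            } else if (c == '\n') {
                found = "\n";
            }
            if (found != null) {
                if (first == null) {
                    first = found;
                } else if (!first.equals(found)) {
                    return true;
                }
            }
        }
        return false;
    }
}
```

A guess based on the first break alone is only as good as the file is consistent, which is why a check along the lines of isMixed can be worth running before trusting it.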
For general text processing, the best solution is to keep track of
the original line break preference of the text document, normalize
the line breaks in memory to the platform standard, and then convert
the output back to the original when the document is saved. You may
wish to expose new line preferences to the user as well. This means
that you have to work harder at opening and saving documents. Opening
now involves an initial scan to get the line feed syntax, a possible
conversion, and then any normal opening steps; saving involves the
same process in reverse. However, your users will never notice your
work (which may seem frustrating) and never have problems with your
applications (which is definitely good).
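As a rough sketch of that open/normalize/save cycle, assuming the whole document fits in a String and leaving actual file I/O aside, the approach could be modeled as follows. RoundTripDocument and its methods are hypothetical names used only for this illustration, and the fallback to the platform separator for files with no line breaks is an assumption, not something the text prescribes.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical class for illustration; not part of any standard API.
final class RoundTripDocument {
    // CR/LF is listed first so the pair is treated as one break, not two.
    private static final Pattern LINE_BREAK = Pattern.compile("\r\n|\r|\n");

    private final String originalSeparator; // convention found when opening
    private String text;                    // in-memory text, platform line breaks

    private RoundTripDocument(String originalSeparator, String text) {
        this.originalSeparator = originalSeparator;
        this.text = text;
    }

    /** Rewrites every line break in the text to the given separator. */
    static String convert(String text, String separator) {
        return LINE_BREAK.matcher(text)
                .replaceAll(Matcher.quoteReplacement(separator));
    }

    /** Opening: find the first break, remember it, then normalize in memory. */
    static RoundTripDocument open(String raw) {
        Matcher m = LINE_BREAK.matcher(raw);
        // Assumption: a file with no breaks is saved with the platform default.
        String original = m.find() ? m.group() : System.lineSeparator();
        return new RoundTripDocument(original, convert(raw, System.lineSeparator()));
    }

    /** Pasted text is normalized as well, so conventions never mix in memory. */
    void append(String pasted) {
        text += convert(pasted, System.lineSeparator());
    }

    String originalSeparator() {
        return originalSeparator;
    }

    /** Saving: convert the in-memory text back to the original convention. */
    String save() {
        return convert(text, originalSeparator);
    }
}
```

For example, opening a Classic Mac OS file with RoundTripDocument.open("a\rb\r"), appending Windows-style text with append("c\r\nd\n"), and then calling save() should yield text that uses CR throughout, whatever platform the code runs on.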
You will also encounter this issue in the source files of the code
you write. A variety of tools is available for dealing with this,
including several programming text editors for Mac OS X and other
platforms that can deal with these issues seamlessly. If
you're aware of the problem, though,
it's much easier to avoid.