Comma Separated Values

Comma-delimited text, or comma separated variables/values (CSV), is a popular format for storing tabular record data in TextFiles. Most spreadsheets and databases support import and export to this format.

Compared to ExtensibleMarkupLanguage, CSV is more compact and simpler to parse. However, XML can do a few things CSV can't handle.

Compared to TabDelimitedTables, CSV is slightly easier to hand-edit, having a visible field separator, but is harder to mechanically parse and write than a format which uses backslash escapes, and is harder to use with Unix command-line tools.
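
To make the parsing comparison concrete, here is a minimal Python sketch of reading one line of a tab-delimited format that uses backslash escapes. The escape set (\t, \n, \\) and the names unescape and parse_tsv_line are assumptions for illustration, not any particular tool's convention.

```python
def unescape(field):
    """Turn backslash escapes back into the characters they stand for.

    Assumed escape set: \\t (tab), \\n (newline), \\\\ (backslash).
    Any other escaped character is passed through unchanged.
    """
    out = []
    chars = iter(field)
    for ch in chars:
        if ch == "\\":
            # Consume the character after the backslash, if there is one.
            nxt = next(chars, "\\")
            out.append({"t": "\t", "n": "\n", "\\": "\\"}.get(nxt, nxt))
        else:
            out.append(ch)
    return "".join(out)


def parse_tsv_line(line):
    """Split on raw tabs first, then unescape each field separately."""
    return [unescape(f) for f in line.rstrip("\n").split("\t")]
```

Because a raw tab or newline never appears inside a field in such a format, the line can be split before any unescaping happens. A CSV reader cannot do that: it has to track whether it is inside quotes before it knows whether a comma or newline is a separator.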

CSV has the PowerOfPlainText compared to binary formats.

Does it really? As evidenced below, the format is so convoluted that editing a CSV file in a TextEditor or other non-dedicated tool is inviting disaster. SqLite is far more useful, in my experience.

DelimiterSeparatedValues are equally powerful but allegedly closer to DoTheSimplestThingThatCouldPossiblyWork.


Can't CSV files really be something other than comma-separated? From a Microsoft web page:

If a user selects English (United States), the decimal symbol is a period (for example, 3.14). If a user selects German (Germany), the decimal symbol is a comma (for example, 3,14). Similarly, the list separator character used in .csv files is a comma (,) in the United States but a semicolon (;) in Germany.
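
As a rough illustration of what this means for a reader of such files, the Python sketch below parses a semicolon-separated export with decimal commas using the standard csv module. The sample data and the helper name read_rows are made up for the example; a real export may differ in other ways (encoding, quoting).

```python
import csv
import io

# Hypothetical sample, as a German-locale spreadsheet might export it:
# semicolon as list separator, comma as decimal symbol.
german_export = "name;value\npi;3,14\ne;2,72\n"


def read_rows(text, delimiter):
    """Parse delimited text into a list of rows using the given separator."""
    return list(csv.reader(io.StringIO(text), delimiter=delimiter))


rows = read_rows(german_export, ";")

# The csv module only returns strings, so the decimal comma
# has to be converted by hand before calling float().
values = [float(cell.replace(",", ".")) for _, cell in rows[1:]]
```

Reading the same text with "," as the delimiter would split 3,14 into two fields, which is presumably why the semicolon is used in that locale.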


''Someone once wrote:''

Huh? Yes, if I need to put a string into a CSV file and the string has one or more commas anywhere in it, the string must be quoted -- but the quotes go once around the entire string. There's no need to escape each individual comma. (In XML, each and every less-than sign in the data must be escaped with &lt;.)
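
A small Python sketch of that point, using the standard csv module with its default settings (the helper name to_csv_line and the sample record are just for illustration):

```python
import csv
import io


def to_csv_line(fields):
    """Serialize one record with the csv module's default (minimal) quoting."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerow(fields)
    return buf.getvalue()


# Only the field that contains a comma gets quoted, once, around the whole string.
line = to_csv_line(["Smith, John", "42", "plain"])

# Reading the line back recovers the original three fields.
fields = next(csv.reader([line]))
```

Here line comes out as "Smith, John",42,plain followed by a newline, and the comma inside the quoted field is left untouched.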

From what I've seen, typically CSV strings are encoded using C-style escape characters. That would actually be a DelimiterSeparatedValues file that uses comma as the delimiter.

CSV strings are also frequently encoded using two quotation marks to represent one inside the string. Newlines in a quoted string may be converted to C escapes or left unconverted. It's also common for quotation marks within a quoted string to be left unescaped; this is normally considered a bug but tends not to get fixed.

RFC 4180 recommends quoting fields that contain a double quote, comma, or line break, and doubling any double quotes within those fields. According to RFC 4180, there's nothing special about backslash.
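
The quoting rule just described is small enough to write by hand. The following Python sketch is one reading of it, not a complete RFC implementation; the function names quote_field and write_record are invented for the example.

```python
def quote_field(field):
    """Quote a field only if it contains a double quote, comma, CR or LF.

    Embedded double quotes are doubled. Backslash gets no special treatment.
    """
    if any(ch in field for ch in ('"', ",", "\r", "\n")):
        return '"' + field.replace('"', '""') + '"'
    return field


def write_record(fields):
    """Join already-stringified fields into one CRLF-terminated record."""
    return ",".join(quote_field(f) for f in fields) + "\r\n"
```

For example, write_record(["a", "b,c"]) gives a,"b,c" plus CRLF, and a field such as a\b passes through unchanged.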


See also: TabDelimitedTables, DelimiterSeparatedValues, ExtensibleMarkupLanguage, RelationalAlternativeToXml, FlirtDataTextFormat, SqLite


OctoberTwelve


Last edited October 11, 2012