Some table manipulation programs. Most are conversions between different forms:
sort, cut, paste, and join work well on simple tables of tab-separated values, although sort needs -t"^I" and awk needs -F"^I" (where ^I stands for a literal tab character).
Use vi with ":set tabstop=20" or some larger number for a convenient editor.
The Unix utilities often have an option to specify a delimiter other than tab.
Other common delimiters are colon, vertical bar, comma, and semicolon.
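As a minimal sketch of switching between delimiters, the Python function below rewrites a colon-delimited table as tab-separated values. The name redelimit is hypothetical (it is not one of the scripts listed here), and the sketch assumes that no field contains either delimiter and that nothing is quoted.

```python
def redelimit(lines, old=":", new="\t"):
    """Return the lines with every `old` delimiter replaced by `new`.

    Assumes no field contains either delimiter and nothing is quoted.
    """
    out = []
    for line in lines:
        fields = line.rstrip("\n").split(old)
        out.append(new.join(fields))
    return out


# Example: a colon-delimited table becomes tab-separated.
rows = redelimit(["root:0:/root", "daemon:1:/usr/sbin"])
print("\n".join(rows))
```

Splitting into fields and joining again, rather than doing a blind text replace, leaves a natural place to add per-field cleanup later.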
HTML makes a good output format.
You can get an ASCII pretty print by piping to lynx.
e.g. tsv2html table.tsv | lynx -stdin -dump
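The conversion itself is small. The following Python sketch is not the tsv2html script listed here, only an illustration of the idea under the assumption that every input line is one row and every tab separates two cells. The function name tsv_to_html is hypothetical.

```python
import html


def tsv_to_html(lines):
    """Render tab-separated lines as a bare HTML table using tr and td tags."""
    out = ["<table>"]
    for line in lines:
        cells = line.rstrip("\n").split("\t")
        # Escape &, < and > so cell text cannot be mistaken for markup.
        row = "".join("<td>%s</td>" % html.escape(cell) for cell in cells)
        out.append("<tr>%s</tr>" % row)
    out.append("</table>")
    return "\n".join(out)


print(tsv_to_html(["name\tqty", "nuts & bolts\t12"]))
```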
columns.awk columns.py columns.rb csv2tsv.c csv2tsv.rb csv2tsv2.c db2tsv.awk fillHTMLfromTSV.awk fs.awk qTable.awk tsv2html tsv2html.awk tsv2html.pl tsv2html.py tsv2html.rb tsv2html.sed tsv2html3.awk tsv2htmlplus.awk unquote.awk rdb2html.awk tsv2rdb.awk tsv2txt.awk txt2tsv.awk
The Perl program, cvs2tsv.pl, doesn't work, as can be seen by testing it with test.csv.
Compare with: ./csv2tsv <test.csv | unquote.awk -F \\t OFS=\\t | cat -vt
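The hard part of CSV is the quoting: commas, doubled quotes, and even newlines may appear inside a quoted field. As a hedged sketch (not a description of how csv2tsv.c works), Python's standard csv module can do the parsing. The function name csv_to_tsv and the choice to replace embedded tabs and newlines with a space are assumptions made for this illustration.

```python
import csv
import io


def csv_to_tsv(text):
    """Convert CSV text to TSV, dropping the CSV quoting.

    Assumption: tabs and line breaks inside a field are replaced by a
    space, since plain TSV has no way to quote them.
    """
    out = []
    for record in csv.reader(io.StringIO(text, newline="")):
        cleaned = []
        for field in record:
            for ch in ("\r\n", "\n", "\r", "\t"):
                field = field.replace(ch, " ")
            cleaned.append(field)
        out.append("\t".join(cleaned))
    return "\n".join(out)


print(csv_to_tsv('id,remark\n1,"said ""hi"", then left"\n'))
```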
For other text formats see ESR's Art of Unix Programming.
Simple tables contain just data in rows and columns.
Metadata can be introduced several ways.
One could consider the HTML tags to be metadata,
but the tr and td tags alone are not really metadata.
Attribute values in HTML tags could contain metadata.
The simplest and most common bit of metadata is a row of column headings in the first line.
Some utilities like cut, paste, and even join still work with such files.
A heading line breaks sort, which sorts it in with the data,
but try: head -1 file.tsv; sed '1d' file.tsv | sort -t"^I" ...
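The same trick of setting the heading line aside, sorting the rest, and putting the heading back can be sketched in Python. The function name sort_keep_header is hypothetical. The sketch assumes the lines carry no trailing newline, that every data row has the requested column, and that a plain text comparison (like sort without -n) is wanted.

```python
def sort_keep_header(lines, column=0):
    """Sort data rows by one tab-separated column, leaving line 1 in place."""
    if not lines:
        return []
    header, body = lines[0], lines[1:]
    # Text comparison on the chosen column; numbers sort as strings here.
    body = sorted(body, key=lambda line: line.split("\t")[column])
    return [header] + body


table = ["name\tqty", "pear\t3", "apple\t7"]
print("\n".join(sort_keep_header(table)))
```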
Many of the above scripts are designed to allow or even expect such headings.
The th HTML tag can be used for this.
Other metadata is sometimes introduced with a #, which is interpreted as a comment
and ignored by shell scripts, Perl, Ruby, and other languages.
The above scripts do not handle this.
RFC 822 style headers or #%key=value lines can be used
to store a hash of metadata.
The above scripts do not handle this.
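A reader that does handle such lines need not be complicated. The following Python sketch shows one possible approach, assuming metadata lines start with #% and contain key=value, and that any other line starting with # is a plain comment to drop. The function name split_metadata and the sample keys are hypothetical.

```python
def split_metadata(lines):
    """Separate '#%key=value' metadata and '#' comments from data rows.

    Returns (metadata dict, list of data lines). Plain '#' comments are dropped.
    """
    meta, data = {}, []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("#%"):
            key, sep, value = line[2:].partition("=")
            if sep:  # ignore '#%' lines that have no '='
                meta[key.strip()] = value.strip()
        elif line.startswith("#"):
            continue  # ordinary comment
        else:
            data.append(line)
    return meta, data


meta, data = split_metadata(
    ["#%title=Fruit stock", "# counted by hand", "name\tqty", "apple\t7"]
)
print(meta)
print(data)
```

The data lines that come back are an ordinary simple table again, so they can be passed on to tools that know nothing about the metadata.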
See also Relation ASCII.
Eric@BlossomAssociates.Net 2005-12-02