[NTLUG:Discuss] w3m question
Fred James
fredjame at concentric.net
Fri Jun 28 07:00:49 CDT 2002
Application: Using w3m to convert an html file to an ASCII text file
Examples of command lines tested:
./w3m filename.html > filename.txt
./w3m -t 1 filename.html > filename.txt
./w3m -dump -cols 5 filename.html > filename.txt
./w3m -dump -cols 150 filename.html > filename.txt
./w3m -dump -cols 1 filename.html > filename.txt
Input test file is about 142 characters wide.
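Since the choice of "-cols" depends on how wide the report really is, one
way to check the widest line of any text file (for instance a dump made
with a generous "-cols" value) is sketched below. The function name
max_width is hypothetical, and the count assumes plain ASCII text.

```shell
# Hypothetical helper: print the length of the longest line read from stdin.
# Assumption: the text is plain ASCII, so awk's length() matches the
# number of display columns.
max_width() {
    awk '
        # remember the longest line seen so far
        length($0) > max { max = length($0) }
        # max + 0 prints 0 instead of an empty string for empty input
        END { print max + 0 }
    '
}

# Possible use on an already converted file:
#   max_width < filename.txt
```

A "-cols" value at or above the reported width would at least rule out
plain line-length wrapping as the cause of a stray field.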
Problem 1: Wrap
On all but the example using "-cols 5", some lines had the last field of
the record placed flush right, at the right-hand end of a new line.
Example (view this example in a fixed-width font, or imagine that field_13
is directly under field_12):
field_1 field_2 field_3 field_4 ... field_12
                                    field_13
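A possible post-processing workaround for the stray field, offered only
as a sketch: join any line that holds a single indented token back onto
the line before it. The function name rejoin_wrapped is hypothetical,
and the approach assumes that no genuine record ever consists of one
indented field on its own; it has not been checked against real w3m
output.

```shell
# Hypothetical helper: rejoin a lone, right-flushed field onto the previous line.
# Assumption: a line made of leading blanks plus exactly one
# whitespace-free token is a wrapped last field, never a real record.
rejoin_wrapped() {
    awk '
        # indented single-token line: append it to the held line
        NR > 1 && NF == 1 && /^[ \t]+/ { prev = prev " " $1; next }
        # any other line: emit the held line first
        NR > 1 { print prev }
        # hold the current line until we know what follows it
        { prev = $0 }
        # emit the last held line, if there was any input at all
        END { if (NR > 0) print prev }
    '
}

# Possible use in the conversion pipeline:
#   ./w3m -dump -cols 150 filename.html | rejoin_wrapped > filename.txt
```

This only moves the field back to the end of its record; it does not
restore column alignment.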
Problem 2: Tabular format
Tabular format is somewhat preserved; that is, the fields float a little
in all cases.
Also, another field wraps within itself, as if within a cell of a
table, though the html file (opened in a browser) does not display this
behavior (this happens consistently to one specific field within
each record/line).
Questions:
Anyone have any experience, knowledge, or suggestions to correct or
modify these format issues?
Additional comments
(1) Input is a report generated in html format
(2) Output must be ASCII text
(3) w3m is the first converter I have tried that (somewhat) preserves
the tabular format, and can handle larger files (test input is 15,558
lines, 676,417 bytes)
(4) Documentation seems to be centered on using w3m as a browser, with a
slight nod to using it as a file converter.
Thank you to all who have offered help on this subject so far, and
thanks in advance to all who may be able to offer more assistance.
PS: None of these are show stoppers so far for our application, but I
anticipate that they may become so.
--
"Wisdom begins in wonder." - Socrates