[NTLUG:Discuss] w3m question

Fred James fredjame at concentric.net
Fri Jun 28 07:00:49 CDT 2002


Application: Using w3m to convert an html file to an ASCII text file
Examples of command lines tested:
       ./w3m filename.html > filename.txt
       ./w3m -t 1 filename.html > filename.txt
       ./w3m -dump -cols 5 filename.html > filename.txt
       ./w3m -dump -cols 150 filename.html > filename.txt
       ./w3m -dump -cols 1 filename.html > filename.txt
Input test file is about 142 characters wide.
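One way to confirm the rendered width before picking a -cols value is to measure the longest line of a dump. This is only a sketch: max_width is a hypothetical helper name (not part of w3m), and it assumes plain ASCII output so that awk's length() equals the column count.

```shell
# Print the length of the longest line in a text file.
# max_width is a hypothetical helper name, not a w3m feature.
max_width() {
    awk '{ if (length($0) > max) max = length($0) } END { print max + 0 }' "$1"
}

# Possible use (assumes w3m is on the PATH):
#   w3m -dump -cols 200 filename.html > wide.txt
#   max_width wide.txt
```

If the reported width is smaller than the -cols value used for the dump, the column limit is probably not what forces the wrap.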

Problem 1: Wrap
On all but the example using "-cols 5", some lines had the last field of
the record placed flush right, at the right-hand end of a new line.
Example (view this example in a fixed-width font, or imagine that
field_13 is directly under field_12):
     field_1  field_2 field_3 field_4 ... field_12
                                          field_13
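A possible workaround, if the wrap cannot be avoided in w3m itself, is to rejoin the stray field in a post-processing pass. The sketch below assumes that a wrapped field always shows up as a line holding one whitespace-indented token and nothing else; join_wrapped is a hypothetical helper name, and the heuristic would mis-join any legitimate record that happens to look like that.

```shell
# join_wrapped is a hypothetical helper: any line that consists of leading
# whitespace followed by a single token is appended to the previous line.
join_wrapped() {
    awk '
        NR > 1 && /^[ \t]+[^ \t]+[ \t]*$/ { prev = prev " " $1; next }
        NR > 1 { print prev }
        { prev = $0 }
        END { if (NR > 0) print prev }
    ' "$1"
}

# Possible use:
#   join_wrapped filename.txt > filename.joined.txt
```

The rejoined field is separated by a single space, so column alignment for that last field is not restored.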

Problem 2: Tabular format
The tabular format is only roughly preserved: in all cases the fields
drift a little from their columns.
Also, another field wraps within itself, as if within a table cell,
though the html file (opened in a browser) does not display this
behavior. This happens consistently to one specific field within each
record/line.

Questions:
Anyone have any experience, knowledge, or suggestions to correct or 
modify these format issues?

Additional comments
(1) Input is a report generated in html format
(2) Output must be ASCII text
(3) w3m is the first converter I have tried that (somewhat) preserves 
the tabular format, and can handle larger files (test input is 15,558 
lines, 676,417 bytes)
(4) Documentation seems to be centered on using w3m as a browser, with a
slight nod to using it as a file converter.
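Since the output must be ASCII text (comment 2), a quick check on the converted file may be useful. This is a minimal sketch, not something w3m provides: is_ascii is a hypothetical helper that deletes every byte in the range 0 to 127 and succeeds only if nothing is left over.

```shell
# is_ascii is a hypothetical helper: returns success (0) when the file
# contains only bytes in the 7-bit ASCII range, failure (1) otherwise.
is_ascii() {
    n=$(LC_ALL=C tr -d '\000-\177' < "$1" | wc -c | tr -d ' ')
    [ "$n" -eq 0 ]
}

# Possible use:
#   is_ascii filename.txt && echo "ASCII only" || echo "non-ASCII bytes found"
```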

Thank you to all who have offered help on this subject so far, and 
thanks in advance to all who may be able to offer more assistance.

PS: None of these are show stoppers so far for our application, but I
anticipate that they may become so.

-- 
"Wisdom begins in wonder." - Socrates