[NTLUG:Discuss] Re: good "book" format for html? -- DocBook is more simple, more universal

Kevin Brannen kbrannen at pwhome.com
Sun Nov 28 23:44:37 CST 2004


Bryan J. Smith wrote:

...

>Kevin Brannen wrote:  
>  
>
>>BTW, I'm really not searching for docbook and the other formats that 
>>would require me to convert the HTML into something else.  I like HTML
>>because of its simplicity and universality.
>>    
>>
>
>I'm kinda scratching my head on what the problem is, what the objectives
>are and why the interest in HTML.  HTML is such a poor XML standard
>format for books.  It offers virtually no structure and capability for
>books.
>  
>

I'm not sure what "structure and capability" you mean that HTML doesn't 
have.  You'll need to define your terms.

HTML has structure to me.  It can mark up (& do the right thing for)
paragraphs, show headers for chapters, and can create a TOC and an index,
not to mention HR for section separators if you need them.  It has all
the basic capabilities (text markup features) necessary for the basic 
book/article/HowTo.  It can change fonts (size, color, style), justify, 
and you can insert images (more on this in a second).  With the 
exception of some niche needs (e.g. math & music books, plus maybe a few 
other specialties which I'm not worrying about), I have yet to run into 
anything not in HTML that I need.  Of course, MathML may come to the 
rescue for the math niche.

Imagine you're a Robert Heinlein and you have a new novel (e.g. "Space 
Cadet" picking a book at random off my shelf).  For something like that, 
what do you need that is not in HTML?  Now if you were JRR Tolkien,
you'd have a problem with the Elvish fonts, but if someone came up with
that font and distributed it widely enough, even the LOTR would not be a
problem to do in HTML, except maybe the Appendices, but images would
probably come to the rescue there for the genealogy trees.

Why do I care?  Because my HowTos turn into novellas. :-)  Not to
mention I write my own short stories in my free time, as well as collect
ebooks (preferably in open formats, which limits what I can buy, but
that's life).

>SGML, DocBook SGML/XML, OpenOffice XML Writer and others are far
>better.  They have very strict formatting.  And they convert very well
>to/from many other formats.  They are largely eternal.
>  
>

I don't need an 18-blade Swiss Army knife; my little 4-bladed one (HTML
:-) works just fine.

>HTML is not.  It not only lacks standardization other than basic
>headers, but it really wasn't designed for publication at all.
>  
>

Don't care what it was designed for.  It has a standard (4.0), and the
standard meets my needs (with 1 exception).  I don't even need CSS,
though I recognize that's there if I need it.


Bryan J. Smith wrote:

>Kevin Brannen wrote:  
>  
>
>>BTW, I'm really not searching for docbook and the other formats that 
>>would require me to convert the HTML into something else.  I like HTML
>>because of its simplicity and universality.
>>    
>>
>
>I just gotta revisit this.  I'm not trying to be critical, but DocBook
>is even _simpler_ than HTML.  You focus on the structure, not the
>combined structure/format like HTML.  And it's cake to piece together.
>
>That's why it's the most ideal for distributed documentation.
>  
>

And what does it take to create DocBook?  Last I checked, when looking at
"printing" the Subversion docs (from DocBook source), that took a mass
of software -- not to mention a LOT of time and effort on my part.  HTML
is simple, an editor is all I need.  Heaven forbid I should have to use 
Notepad, but even that is good enough if one must.  Simplicity is good.  
(Actually, I use the HTML editor in Mozilla long before I'd use Notepad. :-)

Please note, that's the only example of DocBook I've been exposed to.
It does look somewhat similar to HTML, but it has more tags, so I'm
sorry, but it is NOT simpler than HTML.  I'm willing to allow for the
Subversion folks to have used all the hard parts of DocBook, or to have
defined new tags (if it can do that) to make it harder, but I'm not
going to accept that it's simpler than HTML.  Nope, just not gonna do
it, it just wouldn't be prudent... :-)

>Format is then done with the application of stylesheets to DocBook. 
>Stylesheets generate various "publication" output like HTML, PDF, RTF,
>etc...  So it is not only easier to write, but easily "published" into
>another form with *1* command.
>
>This is a major reason why the Linux Documentation Project (LDP) uses
>DocBook as its standard.  And why people write parsers for such
>documentation maintenance in DocBook XML, instead of HTML.
>  
>

If 1 source with multiple outputs were my goal, I'd be using LaTeX. :-)
I've used it before and have been quite happy with it when I needed it.
Again, wrong goal for me.

>Think of HTML like you do Postscript/PDF.  It's an end-user
>_publication_ format.  It's too "free-form" and "unstructured" for
>maintaining documentation in.  Just like it is very _difficult_ to
>convert Postscript/PDF back to an "editable" form, the same is true for
>HTML.
>  
>

Understand, I do convert PDF back to HTML because I know I can use HTML 
anywhere anytime -- that might not be true for PDF no matter what Adobe 
claims.  It's a bit rough and difficult to figure out where paragraphs 
are, but I do manage because the original source doesn't have much more 
"structure" than paragraphs and chapter headers.

>...
>You can't universally convert from HTML to anything else, at least not
>with any consistency.
>  
>

That's the great thing about HTML, there's no need to convert it to 
anything.  There's a boatload of programs that can use it as is.  It is 
a useful standard all on its own.

>-- Bryan
>
>P.S.  If you're still really not interested in DocBook XML, then I'd
>really look at OpenOffice XML Writer.  You _can_ crank out its XML
>quite easily too.  And then apply style in other XML meta-data and
>package it all in a _single_ ZIP file (that's what SX- files are ;-).
>
>There is a pretty big reason why Boeing was the major end-user sponsor
>of the OASIS standardization of OpenOffice XML by Sun.  Boeing is
>probably the world's leading for-profit producer of technical
>documentation, and really sets the standard.
>  
>

Nope, still too complex.  I have a program or two that can understand
and use HTML as input.  But they would have no clue as to what to do with
DocBook, or XML, or OO XML, or anything else, other than straight text.
You really are trying to solve a problem that doesn't exist for me. :-)

>But DocBook sounds like the most ideal.  If you want simplicity and
>universality, DocBook is the best of both worlds.  Especially if you are
>going to write parsers for it.
>  
>

Nope, no parsers either.  Just display it on multiple OSes (Linux, Palm,
& MS if you care).

The 1 program mentioned that wants HTML is a program that converts it to
PDB files for my Sony Clie PDA.  This is extremely important to me,
which is why DocBook and the other formats are not useful to me.  I truly
only need P, B, I, U, FONT, IMG, and A.  In a rare case you might
also show me where I need TABLE, but that would be for a table of data
and not for alignment/structure.  Oh, I can't forget the occasional
BLOCKQUOTE for indenting. :-)
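
To make that tag list concrete, here is a rough Python sketch that reports
any element in a document outside the subset.  The ALLOWED set and the
function name are made up for illustration; the skeleton tags (HTML, HEAD,
TITLE, BODY) and the table row/cell tags are an assumption added so that a
complete document with a data table passes.

```python
from html.parser import HTMLParser

# The subset named above, plus (as an assumption) the document skeleton
# and the row/cell tags that a data TABLE needs.
ALLOWED = {
    "p", "b", "i", "u", "font", "img", "a", "table", "blockquote",
    "html", "head", "title", "body", "tr", "td", "th",
}


class TagCollector(HTMLParser):
    """Collect the names of all elements that appear in a document."""

    def __init__(self):
        super().__init__()
        self.seen = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser hands over tag names already lowercased.
        self.seen.add(tag)


def unexpected_tags(html_text, allowed=ALLOWED):
    """Return a sorted list of tags used in html_text but not in allowed."""
    parser = TagCollector()
    parser.feed(html_text)
    parser.close()
    return sorted(parser.seen - allowed)
```

Calling unexpected_tags() on a file's contents returns an empty list when
the document stays inside the subset.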

The 1 and only 1 problem with HTML is its inability to store supporting
files/data within it so you can have everything for a doc in only 1 file
-- for convenience.  From what little I know about the XML standard,
what I want is for HTML to have something like XML's CDATA sections, then
encode the binary file into that -- which also means the various
browsers and programs would need to understand what to do with that.

Maybe what I need to do is move to XHTML, which has CDATA sections;
unfortunately, they're for scripts and not images AFAICT.  In XHTML I
could define my own module to take care of that, but since that's not
supported everywhere, I'm not sure what good it would do me. :-(
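
One existing mechanism along these lines is the "data:" URL scheme (RFC
2397), which lets the base64-encoded bytes of a file sit directly in an
attribute such as IMG SRC.  Below is a minimal Python sketch of the
encoding step; the function names are hypothetical.  Support varies
between browsers and other HTML-consuming programs, so a converter like
the PDB one mentioned above may well not understand the result.

```python
import base64
import html


def data_uri(data, mime):
    """Build an RFC 2397 data: URL from raw bytes and a MIME type."""
    payload = base64.b64encode(data).decode("ascii")
    return "data:%s;base64,%s" % (mime, payload)


def inline_img(data, mime, alt=""):
    """Return an IMG tag whose SRC carries the image bytes inline."""
    return '<IMG SRC="%s" ALT="%s">' % (
        data_uri(data, mime),
        html.escape(alt, quote=True),
    )
```

For a real image, read the bytes with open(path, "rb").read() and pass
them in with the matching MIME type, e.g. "image/png".  Base64 grows the
data by roughly a third, so this suits small images better than large
ones.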

Also, please don't underestimate the ubiquity of HTML; that is a major
plus.  I can hand an HTML doc to someone on a Linux, MS, Mac, whatever
box and they can read it right now, as is, no conversion required by me
or them.  That is incredibly convenient!  Can't do that with DocBook, OO
XML, or anything else except plain old ASCII, which doesn't have markup
capability.  Hence, HTML is ideal; source --> browser --> WYSIWYG
output, nothing else needed.

I appreciate the discussion.  Maybe I'll have to look at DocBook more
in depth someday, but for now, HTML is the answer to my question.  And
Konqueror's ability to read TGZ or ZIP files of HTML & support files
solves most of that 1 problem (at least for me, I'll have to work on a
more universal way later).  I hope this helps explain where I'm coming
from and trying to go to.
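
Producing such a ZIP needs nothing beyond a standard library.  A minimal
Python sketch follows, assuming the document and its support files are
already in memory as a {relative path: bytes} mapping; the function name
and the file names in the note are made up for illustration.

```python
import io
import zipfile


def bundle(files):
    """Return the bytes of a ZIP archive for a {relative_path: bytes} mapping."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        # Sorted order keeps the archive listing stable between runs.
        for name in sorted(files):
            zf.writestr(name, files[name])
    return buf.getvalue()
```

Writing the result of bundle({"index.html": ..., "images/map.png": ...})
to a file such as book.zip gives a single-file "book" in which relative
links like images/map.png still resolve, provided the reader can browse
inside archives.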

Maybe MS is right and CHM files are the way to go... ;-)  [OK, that's 
heavily into humor, but if MS would document the CHM format, that might 
become a serious statement.  The OSS chmlib people have done a good job 
of trying to document the format, but it's still unofficial.]

Kevin


