[NTLUG:Discuss] Re: good "book" format for html? -- DocBook is more simple, more universal
Kevin Brannen
kbrannen at pwhome.com
Sun Nov 28 23:44:37 CST 2004
Bryan J. Smith wrote:
...
>Kevin Brannen wrote:
>
>
>>BTW, I'm really not searching for docbook and the other formats that
>>would require me to convert the HTML into something else. I like HTML
>>because of its simplicity and universality.
>>
>>
>
>I'm kinda scratching my head on what the problem is, what the objectives
>are and why the interest in HTML. HTML is such a poor XML standard
>format for books. It offers virtually no structure and capability for
>books.
>
>
I'm not sure what "structure and capability" you mean that HTML doesn't
have. You'll need to define your terms.
HTML has structure to me. It can mark up (& do the right thing for)
paragraphs, show headers for chapters, and can create TOC and an index,
not to mention HR for section separators if you need them. It has all
the basic capabilities (text markup features) necessary for the basic
book/article/HowTo. It can change fonts (size, color, style), justify,
and you can insert images (more on this in a second). With the
exception of some niche needs (e.g. math & music books, plus maybe a few
other specialties which I'm not worrying about), I have yet to run into
anything not in HTML that I need. Of course, MathML may come to the
rescue for the math niche.
Imagine you're a Robert Heinlein and you have a new novel (e.g. "Space
Cadet" picking a book at random off my shelf). For something like that,
what do you need that is not in HTML? Now if you were JRR Tolkien,
you'd have a problem with the Elvish fonts, but if someone came up with
that font and distributed it widely enough, even the LOTR would not be a
problem to do in HTML, except maybe the Appendices, but images would
probably come to the rescue there for the genealogy trees.
Why do I care? Because my HowTo's turn into novellas. :-) Not to
mention I write my own short stories in my free time, as well as collect
ebooks (preferably in open formats, which limits what I can buy, but
that's life).
>SGML, DocBook SGML/XML, OpenOffice XML Writer and others are far
>better. They have very strict formatting. And they convert very well
>to/from many other formats. They are largely eternal.
>
>
I don't need an 18 blade Swiss army knife, my little 4 bladed one (HTML
:-) works just fine.
>HTML is not. It not only lacks standardization other than basic headers, but
>it really wasn't designed for publication at all.
>
>
Don't care what it was designed for. It has a standard (4.0), the
standard meets my needs (with 1 exception). I don't even need CSS,
though I recognize that's there if I need it.
Bryan J. Smith wrote:
>Kevin Brannen wrote:
>
>
>>BTW, I'm really not searching for docbook and the other formats that
>>would require me to convert the HTML into something else. I like HTML
>>because of its simplicity and universality.
>>
>>
>
>I just gotta revisit this. I'm not trying to be critical, but DocBook
>is even _simpler_ than HTML. You focus on the structure, not the
>combined structure/format like HTML. And it's cake to piece together.
>
>That's why it's the most ideal for distributed documentation.
>
>
And what does it take to create Docbook? Last I checked when looking at
"printing" the Subversion docs (from docbook source), that took a mass
of software -- not to mention a LOT of time and effort on my part. HTML
is simple, an editor is all I need. Heaven forbid I should have to use
Notepad, but even that is good enough if one must. Simplicity is good.
(Actually, I use the HTML editor in Mozilla long before I'd use Notepad. :-)
Please note, that's the only example of Docbook I've been exposed to.
It does look somewhat similar to HTML, but it has more tags so I'm
sorry, but it is NOT simpler than HTML. I'm willing to allow for the
Subversion folks to have used all the hard parts of docbook, or to have
defined new tags (if it can do that) to make it harder, but I'm not
going to accept that it's simpler than HTML. Nope, just not gonna do
it, it just wouldn't be prudent... :-)
>Format is then done with the application of stylesheets to DocBook.
>Stylesheets generate various "publication" output like HTML, PDF, RTF,
>etc... So it is not only easier to write, but easily "published" into
>another form with *1* command.
>
>This is a major reason why the Linux Documentation Project (LDP) uses
>DocBook as its standard. And why people write parsers for such
>documentation maintenance in DocBook XML, instead of HTML.
>
>
If 1 source multiple outputs was my goal, I'd be using LaTeX. :-) I've
used it before and have been quite happy with it when I needed it.
Again, wrong goal for me.
>Think of HTML like you do Postscript/PDF. It's an end-user
>_publication_ format. It's too "free-form" and "unstructured" for
>maintaining documentation in. Just like it is very _difficult_ to
>convert Postscript/PDF back to an "editable" form, the same is true for
>HTML.
>
>
Understand, I do convert PDF back to HTML because I know I can use HTML
anywhere anytime -- that might not be true for PDF no matter what Adobe
claims. It's a bit rough and difficult to figure out where paragraphs
are, but I do manage because the original source doesn't have much more
"structure" than paragraphs and chapter headers.
>...
>You can't universally convert from HTML to anything else, at least not
>with any consistency.
>
>
That's the great thing about HTML, there's no need to convert it to
anything. There's a boatload of programs that can use it as is. It is
a useful standard all on its own.
>-- Bryan
>
>P.S. If you're still really not interested in DocBook XML, then I'd
>really look at OpenOffice XML Writer. You _can_ crank out its XML
>quite easily too. And then apply style in other XML meta-data and
>package it all in a _single_ ZIP file (that's what SX- files are ;-).
>
>There is a pretty big reason why Boeing was the major end-user sponsor
>of the OASIS standardization of OpenOffice XML by Sun. Boeing is
>probably the world's leading for-profit producer of technical
>documentation, and really sets the standard.
>
>
Nope, still too complex. I have a program or two that can understand
and use HTML as input. But it would have no clue as to what to do with
docbook, or XML, or OO XML, or anything else, other than straight text.
You really are trying to solve a problem that doesn't exist for me. :-)
>But DocBook sounds like the most ideal. If you want simplicity and
>universality, DocBook is the best of both worlds. Especially if you are
>going to write parsers for it.
>
>
Nope, no parsers either. Just display it on multiple OS's (Linux, Palm,
& MS if you care).
The 1 program mentioned that wants HTML is a program that converts it to
PDB files for my Sony Clie PDA. This is extremely important to me,
hence why the docbook and other things are not useful to me. I truly
only need P, B, I, U, FONT, IMG, and A. In a rare case you might
also show me where I need TABLE, but that would be for a table of data
and not for alignment/structure. Oh, I can't forget the occasional
BLOCKQUOTE for indenting. :-)
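A minimal sketch of how one might check that a document stays within that small tag set, using Python's standard html.parser module. The names BASIC_TAGS, TagCollector, and unexpected_tags are hypothetical, and the whitelist is an assumption built only from the tags listed above; a real document would also need structural tags (HTML, HEAD, BODY, headers) and table row/cell tags added to it.

```python
from html.parser import HTMLParser

# Assumed whitelist: only the tags named in the paragraph above.
# HTMLParser reports tag names in lower case, so they are stored that way.
BASIC_TAGS = frozenset(
    {"p", "b", "i", "u", "font", "img", "a", "table", "blockquote"}
)


class TagCollector(HTMLParser):
    """Records the name of every start tag seen in the input."""

    def __init__(self):
        super().__init__()
        self.seen = set()

    def handle_starttag(self, tag, attrs):
        self.seen.add(tag)


def unexpected_tags(html_text, allowed=BASIC_TAGS):
    """Return the set of tags used in html_text that are not in allowed."""
    collector = TagCollector()
    collector.feed(html_text)
    collector.close()
    return collector.seen - set(allowed)
```

Calling unexpected_tags() on a chapter before handing it to a converter would flag anything outside the subset; passing a larger `allowed` set covers documents that use headers or full tables.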
The 1 and only 1 problem with HTML is its inability to store supporting
files/data within it so you can have everything for a doc in only 1 file
-- for convenience. From what little I know about the XML standard,
what I want is for HTML to have something like XML's CDATA section
(<![CDATA[ ... ]]>), then encode the binary file into that -- which also
means the various browsers and programs would need to understand what to
do with that. Maybe what I need to do is move to XHTML, which allows
CDATA sections; unfortunately, they're for scripts and not images
AFAICT. But in XHTML I
could define my own module to take care of that, but since that's not
supported everywhere, I'm not sure what good it would do me. :-(
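One way to illustrate the "encode the binary file into the document" idea is a base64 data: URI (RFC 2397) in the IMG tag's SRC attribute. This is a sketch of a different mechanism from the CDATA approach described above, and it has the same drawback: it only helps where the browser or converter understands data: URIs, which is by no means guaranteed. The function names image_to_data_uri and img_tag are hypothetical.

```python
import base64
import html


def image_to_data_uri(data, mime_type="image/png"):
    """Encode raw image bytes as an RFC 2397 data: URI."""
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


def img_tag(data, alt, mime_type="image/png"):
    """Build an IMG tag whose SRC carries the image bytes inline."""
    uri = image_to_data_uri(data, mime_type)
    # Escape the ALT text so quotes or angle brackets cannot break the tag.
    return f'<IMG SRC="{uri}" ALT="{html.escape(alt, quote=True)}">'
```

Base64 inflates the payload by roughly a third, so this suits a handful of small images better than a heavily illustrated book.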
Also, please don't underestimate the ubiquity of HTML; that is a major
plus. I can hand an HTML doc to someone on a Linux, MS, Mac, whatever
box and they can read it right now, as is, no conversion required by me
or them. That is incredibly convenient! Can't do that with docbook, OO
XML, or anything else except plain old ASCII, which doesn't have markup
capability. Hence, HTML is ideal; source --> browser --> WYSIWYG
output, nothing else needed.
I appreciate the discussion. Maybe I'll have to look at docbook more
in depth someday, but for now, HTML is the answer to my question. And
Konqueror's ability to read TGZ or ZIP files of HTML & support files
solves most of the 1 problem (at least for me, I'll have to work on a
more universal way later). I hope this helps explain where I'm coming
from and trying to go to.
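A minimal sketch of the ZIP bundling idea, assuming the HTML and its support files are already available as bytes. It uses Python's standard zipfile module; the function name bundle_document and the file names in the usage note are hypothetical, and nothing here is specific to how Konqueror reads archives.

```python
import io
import zipfile


def bundle_document(files):
    """Pack a mapping of archive path -> bytes into one ZIP, returned as bytes."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for name, data in files.items():
            # Relative paths are kept, so links such as images/map.png
            # inside the HTML still resolve within the archive.
            archive.writestr(name, data)
    return buffer.getvalue()
```

For example, bundle_document({"index.html": page_bytes, "images/map.png": image_bytes}) yields the contents of a single file that can be written to disk and passed around like any other ZIP.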
Maybe MS is right and CHM files are the way to go... ;-) [OK, that's
heavily into humor, but if MS would document the CHM format, that might
become a serious statement. The OSS chmlib people have done a good job
of trying to document the format, but it's still unofficial.]
Kevin