[NTLUG:Discuss] Re: Is a AMD Athelon 2400 a i586 or i686? -- ISA by product

Sat Nov 27 14:40:57 CST 2004

Top Post Section:  

All processors are typically going to be i486 or i686 (Pentium Pro)
instruction set architecture (ISA) compatible.  The i586 (Pentium),
released in 1992, was Intel's first superscalar (5-issue) design, while
the i686 (Pentium Pro) was being near-simultaneously designed as its
future superscalar (7-issue) design, and was released in 1994.

Unfortunately, the Pentium gained widespread popularity, so hacks to
address various design flaws (like its ALU) proliferated.  Not only did
these "Pentium Optimizations" hurt the performance of clones, but even
that of i686 designs, including the native Pentium Pro, II, etc...
Fortunately, i686 ISA processor volume was significant enough by around
1998 that most of the "deoptimizations" ended in software.

Processor by _core_ design:

**NOTE:  The ISA+Superscalar design is the actual _pipes_, _not_
extensions support (e.g., "+ 1 SSE" = 1 SSE pipe, _not_ SSE rev)

INTEL
Product & Variants  ISA + _Superscalar_ design**  Scalability
------------------  ----------------------------  ---------------
i486                i386 + TLB + 1 FPU            20-100MHz
------------------  ----------------------------  ---------------
Pentium             i586 (5-issue)                60-300MHz
- P55C (MMX)        
------------------  ----------------------------  ---------------
Pentium Pro/P2      i686 (7-issue)                200-1000MHz
- P3                + 1 SSE
------------------  ----------------------------  ---------------
Pentium 4           i686 (7-issue ext) + 2 SSE    1300-?MHz
- "Yamhill" IA-32e  + most x86-64                 (3800 current)

AMD
Product & Variants  ISA + _Superscalar_ design**  Scalability
------------------  ----------------------------  ---------------
Nx586/686           RISC86/i386+TLB (4-issue)     84-550MHz
- Nx586-FP (K5)     + FPU        
- Nx686 (K6)        RISC86/i686 (4-issue) + FPU   
------------------  ----------------------------  ---------------
Athlon              RISC86/i686 (9-issue)         500-3000MHz (?)
- Athlon64/Opteron  + x86-64 + I/O MMU (10-issue)

INTEL 32-bit DESIGN

The Pentium was clearly a quick'n dirty design.  5-issue, 2 ALU, 1 FPU,
it was a great improvement over the i486 for software that was optimized
as such.  Unfortunately, it relied on a lot of "eccentric"
optimizations, ones that are _not_ recommended for the i686.  The
7-issue i686, 2 ALU, 2 FPU, was a vast improvement.  Although the second
FPU could only do an ADD, and only when the other pipe was also doing
an ADD, this was nice for MMX.  P3 slapped on an SSE pipe for "lossy
math."

After using some asynchronous redesign in the P3 to overcome early
1+GHz design failures, the P3 wouldn't scale past 1.5GHz.  It's
important to note that Intel _still_ uses the P3 core in a number
of mobile solutions, because it offers far more "bang for the buck" than
the P4 core, both MHz and power-wise (sometimes 2-fold).  The "more
aggressive" SSE pipes of the P4 can be not only inaccurate (incorrect
results), but imprecise (can't get the same result twice with the same
data run).  This is why one should be _very_careful_ in using
P4 optimizations with gcc -O3 if accuracy/precision is important.
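
One well-understood way the same arithmetic can come out differently
(offered as an illustration, not necessarily the exact effect described
above) is intermediate width: x87 code can keep intermediates in wider
registers, while scalar SSE single-precision math rounds to 32 bits at
every step.  A minimal Python sketch, assuming we emulate 32-bit rounding
with the standard struct module; the helper names are made up for this
example:

```python
import struct


def to_float32(x: float) -> float:
    """Round a Python float (a 64-bit double) to the nearest 32-bit float."""
    return struct.unpack("f", struct.pack("f", x))[0]


def sum_float64(values):
    """Accumulate with 64-bit intermediates."""
    total = 0.0
    for v in values:
        total += v
    return total


def sum_float32(values):
    """Accumulate with every input and intermediate rounded to 32 bits."""
    total = to_float32(0.0)
    for v in values:
        total = to_float32(total + to_float32(v))
    return total


values = [0.1] * 10
wide = sum_float64(values)
narrow = sum_float32(values)
print(wide, narrow, wide == narrow)
```

Both sums are "right" for their width, yet they do not compare equal,
which is the sort of thing to check before changing optimization targets
on code where the last digits matter.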

The P4 core was a quick, 18-month redesign to extend the 7-issue
pipes so it would scale higher.  Unfortunately, a lot of general
design "no-nos" were made, including not addressing the branch
predictor unit (any branch-mispredict stalls the _entire_ CPU),
and not implementing advanced out-of-order execution or register
renaming in the old, 1994 design.

Long story short, the reason is Intel's pre-Itanium release belief that
EPIC/Predication would render branch prediction, OOExec, register
renaming and other, CPU-size optimizations unnecessary.  Of course,
Intel was proven wrong with the first Itanium; Itanium2 now sports some
of these, and Itanium3 will improve on this.

I personally believe the "Yamhill" project is a 2-part effort, just like
the original i586/686 was.  The first "Yamhill," the new IA-32e/EM64T
processors, are largely P4s with extended registers and instructions for
x86-64, but lack a _number_ of x86-64 features, which makes it still
_only_ a feasible 32-bit/4GB processor (long story).  The 2006 "Yamhill"
should be Intel's first 32-bit redesign in a dozen years, using
virtualized i686 compatible instances on a new x86-64 architecture.  It
most likely will share the same 53-bit "Scalable Node" (not really, it's
still a 100% "Front Side Bottleneck" approach) interconnect of Itanium.

AMD 32-bit DESIGN

Although AMD did release its own 32-bit K5 based on its i486 ISA, it
failed to scale past 100MHz.  So all modern AMD designs begin with a
company by the name of NexGen, which AMD purchased in the mid-'90s.

NexGen was the first to come up with the idea of breaking down variable
length x86 CISC into fixed, 32-bit RISC instructions.  They called this
RISC86.  Not stopping there, they built an entire, superscalar 4-issue
RISC86 design, with a TLB, before Intel even released its i486 with a
TLB.  Unfortunately, they didn't design in a FPU, and a non-pipelined
one was "slapped on" in the Nx586-FP which became the AMD K5 for 110MHz+
(or was it 120MHz+?).

The Nx686 became the K6, which was a respin of the Nx586-FP with full
i686 (Pentium Pro) ISA support.  It still had a non-pipelined FPU.  Even
though its FPU was faster and more accurate than the Pentium's pipelined
FPU, because of the Pentium's ALU misdesign (the K6 can load 3 integers
into its 3-issue ALU in the time the Pentium loads 1 into one of its
2-issue ALU), a "software hack" appeared.  This hack used the Pentium's
pipelined FPU to load integers into the ALU -- yes, even though it took
many instructions to replace 1 ALU load, it was faster on the Pentium.
Unfortunately, it was slower on any CPU without a pipelined FPU,
including the K6 (even the K6-2/3[+] versions too).  Even the Pentium
Pro/II wasn't optimal when doing this (its 2-issue FPU was designed
differently, much faster, but slower for things like this).

The NexGen team, combined with an influx of Digital Alpha engineers
during the "Palmer sell-a-thon" of Digital, came up with a new, 32-bit
architecture.  It wasn't just designed for 32-bit, but in leveraging the
64-bit EV6 platform for Alpha 21264, totally _chucked_ the PC-centric
design of Intel GTL (Gunning Transceiver Logic), which Intel is still
using today for even "Prescott" 64-bit extension CPUs (AGTL+).  In fact,
_all_ Athlon processors actually use a _physical_ 40-bit platform
interface.

The AMD Athlon series was a 9-issue, 3 ALU, 3 FPU.  Its FPU was
purposely designed to be overkill, able to not only do any 2 complex
instructions, but the 3rd pipe could do both an ADD _and_ a MULT while
the other two were doing any other instruction.  Why?  So AMD wouldn't
have to design an SSE pipe.  Anytime Intel added new MMX/SSE
instructions, AMD just had to write some new microcode to leverage their
FPU.  The Athlon's 3-issue FPU is able to "stay competitive" with the
2-issue (ADD+ADD or 1 complex) plus 2-issue SSE of the P4/Yamhill, while
being both far more accurate and, even more crucial, far more precise.

The 32-bit Athlon MP even has a 40-bit memory mapped I/O controller in
its on-chip (not chipset) AGPgart, which is really a "poor man's" I/O
MMU.  Using an enabled BIOS, with a Linux kernel, you can enable this
for "safe" and _direct_ memory access beyond 32-bit/4GB with_out_ using
PAE36 paging (including for I/O!).  The performance increase is 10%+
(even more for memory mapped I/O transfers), and much, much higher if
transfers "break" the 512MB paging limitation of PAE36.

The Athlon64/Opteron is a refinement of the 32-bit Athlon.  It uses the
same 40-bit external design, but adds extensions in several areas.  The
x86-64 not only includes 64-bit general registers, but a whole new set
of 128-bit XMM registers for MMX/3DNow/SSE operations.  Although Intel's
"Prescott" includes these two, the Athlon64/Opteron will _dynamically_
rename these registers for legacy operations (the Prescott does not have
register renaming features -- remember, it's the old 1994 PPro core
;-).  But the killer new unit in the Athlon64/Opteron is the I/O MMU. 
Combined with its NUMA/HyperTransport (local memory _and_ local I/O
interconnect), the I/O MMU on the Athlon64/Opteron can memory map I/O
and transfer directly without any software or other bus usage -- and
safely above 4GB (unlike the current "Prescott" on the AGTL+ platform).

One thing to note about x86-64: it _only_ does up to 48-bit/256TB
addressing.  And both the current Athlon and "Yamhill" designs only have
physical support for 40-bit/1TB.  The reason for 48-bit is simple,
"Long" mode means no segmentation, and the 16-bit segment register is
above the 32-bit offset register = 48-bit.  There is also a new
_register_ approach known as PAE52.  Programmers get confused on this
because the documentation reads that this is a "physical" address of the
registers.  Yes, 52-bit is the way the registers work for "Long" mode,
largely so PAE52 is compatible with old PAE36 from a _programmer_
standpoint.  I.e., PAE36/36-bit "Xeon 512MB paging above 4GB" programs
work fine on PAE52/48-bit "x86-64" -- because the CPU translates the
memory as segmented (which is what PAE36/36-bit does, uses the 4-bit
"overhang" of the 16-bit segment register over the 32-bit offset
register = 36-bit), but it's still only up to 48-bit (40-bit as
implemented today) _physically_ (outside the chip) for full _platform_
"i486 TLB" compatibility.

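The bit-width/size pairs used above (32-bit/4GB, 36-bit for PAE36,
40-bit/1TB, 48-bit/256TB, plus the 52-bit register format) are just
powers of two.  A quick Python sketch to check the arithmetic; the
function name is made up for this example, and it prints binary units
(GiB/TiB), which is what the "GB"/"TB" figures in this post mean:

```python
UNITS = ["B", "KiB", "MiB", "GiB", "TiB", "PiB", "EiB"]


def address_space(bits: int) -> str:
    """Size of a flat address space `bits` wide, in binary units."""
    size = 1 << bits  # 2**bits addressable bytes
    unit = 0
    # Step up one unit for every exact factor of 1024.
    while size >= 1024 and size % 1024 == 0 and unit < len(UNITS) - 1:
        size //= 1024
        unit += 1
    return f"{size} {UNITS[unit]}"


for bits in (32, 36, 40, 48, 52):
    print(f"{bits}-bit -> {address_space(bits)}")
```

Note that 16 + 32 = 48 and 4 + 32 = 36, which are the segment-plus-offset
widths the paragraph above is talking about.
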
And as I mentioned, Intel's AGTL+ platform is still only a 32-bit/4GB
platform.  There is no direct CPU-to-platform support in the AGTL+
platform/chipset for memory access and I/O transfers above 32-bit/4GB. 
Hence why _all_ memory mapped I/O for IA-32e/EM64T is done in _software_
for systems that have more than 4GB of memory.  The AMD Athlon64/Opteron,
with its native NUMA/HyperTransport and I/O MMU, does not need to do this
in software.  Furthermore, OS drivers do not need to be re-written to
utilize the I/O MMU -- it's the opposite: the I/O MMU of the A64/Opteron
ensures memory mapped I/O is _always_ enforced anywhere in the 40-bit/1TB
space.

Which goes back to the AMD v. Intel mindset.  AMD's is to still do
optimization and enforcement at the hardware.  Intel's mindset, at
least in the past, was that they want the compiler to do everything.  It
explains everything about what AMD does, from the NUMA/HyperTransport
platform to the I/O MMU to Non-eXecute (NX) bit to how PAE52 works and
is PAE36 compatible along with the "Long" mode approach of maintaining
48-bit/256TB i486 TLB compatibility.  Do it at the hardware.

AMD, like Intel, is moving to a "virtualized" CPU to get around the
48-bit/256TB limitation.  This is also to get around "Moore's law" with
multi-cores, etc... as both AMD and Intel are having difficulty scaling
past 3,000MHz and 4,000MHz, respectively.  The hardware is going to
virtualize everything for software in 2006+ -- much unlike what Intel
thought would happen with EPIC/Predication.

Bottom Post Section:

On Sat, 2004-11-27 at 10:04, jpmiller at quorumhost.com wrote: 
> Believe k5 are considered to be 586. k6 and higher are considered 686.  I've
> even seen athlon's referred to as 786 along with pentium 4s.
> here's a reference: http://users.erols.com/chare/main.htm

Good breakdown.

While the Nx586/K5 _does_ have some i586 ISA compatibility, it should be
considered _only_ a i486 ISA compatible platform.  You should _never_ build
or optimize for i586 unless you have a _true_, non-Pro/II/etc... Pentium
product.  You should _never_ run i586 code on a i686 processor; _always_
build/optimize for i486 and/or i686.

In fact, that combination is best.  Build for i486, optimize for i686.
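
With GCC, "build for i486, optimize for i686" maps onto two separate
options: -march sets the instruction set the compiler may emit, and
-mtune only affects scheduling/tuning (GCC 3.4 introduced -mtune on x86;
older releases used -mcpu for the same purpose).  A small Python sketch
that assembles such a command line; the helper, the file names and the
-O2 default are assumptions for illustration only:

```python
def gcc_command(source, output, baseline="i486", tune="i686", opt="-O2"):
    """Build a gcc argument list that targets `baseline` and tunes for `tune`."""
    return [
        "gcc",
        opt,
        f"-march={baseline}",  # instructions the compiler is allowed to emit
        f"-mtune={tune}",      # tuning target only; adds no new instructions
        "-o", output,
        source,
    ]


# Hypothetical file names; print the command rather than running it.
print(" ".join(gcc_command("hello.c", "hello")))
```

The resulting binary should still run on anything i486-compatible, while
being scheduled with the i686 pipes in mind.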

On Sat, 2004-11-27 at 10:28, al hardigree wrote: 
> On the pentium side:
> PentiumPro and later are 686
> On AMD side:
> Athlon and later

The Nx686/K6 series are _fully_ i686 compatible.

On Sat, 2004-11-27 at 10:31, al hardigree wrote: 
> Correction to previous post from me ( dont type so good in the morn {; )
> K6 and later are 686
> btw
> PPro's were the first 686

Correct.  i586 is Pentium (circa 1992), i686 is Pentium Pro through even the
"Yamhill" today (since 1994).

On Sat, 2004-11-27 at 11:34, Kipton Moravec wrote:
> It seems like I have lost track of the dividing line of what is a i586 or i686 
> from a linux package frame of reference.
> Since a Athelon 2400 is relatively new, I am guessing it is a i686.  Where is 
> the line between i586 and i686? 

There is no line.  The line is _really_ between i486 and i686.

The i586 ISA is _eccentric_ to the Intel branded Pentium
(non-Pro/II/etc...).  It is _not_ a well-known ISA, and it has various
"issues" with its design -- especially in the ALU.  In fact, one would
argue the creation of MMX was to address the issues with its 2-issue
ALU, largely by leveraging its 1-issue FPU.
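
From a package frame of reference, the advice in this post boils down to
a small lookup.  The Python sketch below only restates what is said
above; the table, the names, and the i486 fallback for anything not
listed are my own illustration, not how any package manager actually
decides:

```python
# CPU family -> package architecture to pick, per the advice in this post.
RECOMMENDED_ARCH = {
    "i486": "i486",
    "Pentium": "i586",       # the only case where i586 is appropriate
    "Pentium MMX": "i586",
    "Pentium Pro": "i686",
    "Pentium II": "i686",
    "Pentium III": "i686",
    "Pentium 4": "i686",
    "K5": "i486",            # some i586 compatibility, but treat as i486
    "K6": "i686",
    "Athlon": "i686",
    "Athlon64": "i686",      # for 32-bit packages
}


def recommended_arch(cpu: str) -> str:
    """Return the package arch for `cpu`, falling back to i486 (assumption)."""
    return RECOMMENDED_ARCH.get(cpu, "i486")


for cpu in ("Athlon", "K5", "Pentium", "Pentium 4"):
    print(f"{cpu}: {recommended_arch(cpu)}")
```

So for the Athlon in the original question, i686 packages are the ones
to grab.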

-- 
Bryan J. Smith                                    b.j.smith at ieee.org 
-------------------------------------------------------------------- 
Subtotal Cost of Ownership (SCO) for Windows being less than Linux
Total Cost of Ownership (TCO) assumes experts for the former, costly
retraining for the latter, omitted "software assurance" costs in 
compatible desktop OS/apps for the former, no free/legacy reuse for
latter, and no basic security, patch or downtime comparison at all.