Published 21 February 2005, 12:00

Intel Pentium 4 660 and Extreme Edition 3.73 GHz: Prescott2 in 32-bit mode. Part 2

The usefulness of the new Pentium 4 core lies not only in higher performance but rather in the advanced technologies it brings: improved anti-virus protection and security, better usability (thanks both to EIST and to a less power-hungry core made on the finally "polished" process technology) and, of course, the EM64T bonus (with Vanderpool to come).

In Part 1 of this review we looked at the main features of the new Intel Pentium 4 processors based on the Prescott2M core, which the company released on 21 February 2005. It is now time to find out what performance these innovations deliver compared with their predecessors. In particular, we are interested in two important questions:

  1. How much performance does a processor built on the Prescott core (with its long pipeline) gain from doubling the L2 cache, and...
  2. ...how significant is the advantage of these processors with the 1066 MHz system bus over mass-market processors with the 800 MHz bus?

Since, according to Intel, "the timelines for introduction of the EM64T technology on desktop platforms are linked to the release of the Microsoft Windows XP Professional x64 platform", which has not yet been officially released, we will first look at how the old and new processors behave in the traditional 32-bit environment, where (under various versions of Windows) they will spend most of their service life. Therefore, in this part of the review we explore the speed of the new processors under Windows XP Professional SP2.

Processors tested in this review:

• Pentium 4 EE 3.73 GHz (FSB 1067 MHz)
• Pentium 4 EE 3.47 GHz (FSB 1067 MHz)
• Pentium 4 EE 3.40 GHz (FSB 800 MHz)
• Pentium 4 660 @ 3.74 GHz (FSB 832 MHz)
• Pentium 4 660 (3.6 GHz, FSB 800 MHz)
• Pentium 4 650 (3.4 GHz, FSB 800 MHz)
• Pentium 4 640 (3.2 GHz, FSB 800 MHz)
• Pentium 4 560 (3.6 GHz, FSB 800 MHz)
• Pentium 4 550 (3.4 GHz, FSB 800 MHz)
• Pentium 4 540 (3.2 GHz, FSB 800 MHz)

The 650 and 640, as well as the 550 and 540, were obtained from the 660 and 560, respectively, by lowering the clock speed (via the multiplier, with the bus fixed at 800 MHz) from 3.6 GHz to 3.4 and 3.2 GHz. The Intel motherboard we used (which offers quite high performance among its peers) sets operating frequencies strictly to specification, with an error of less than 0.01% (i.e., 800.0 MHz and 1066.7 MHz for the FSB).
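
The multiplier arithmetic behind these emulated models can be sketched in a few lines. This is a minimal sketch that assumes the standard quad-pumped Pentium 4 bus, so the "800 MHz" FSB runs at a 200 MHz base clock; the multipliers 18, 17 and 16 are derived here from the stated clock speeds rather than quoted from Intel, and the helper names are hypothetical.

```python
# Minimal sketch (assumption): Pentium 4 core clock = FSB base clock x multiplier,
# with a quad-pumped FSB, so an effective "800 MHz" bus has a 200 MHz base clock.


def base_clock_mhz(effective_fsb_mhz: float) -> float:
    """Base (reference) clock of a quad-pumped FSB, in MHz."""
    return effective_fsb_mhz / 4


def core_clock_ghz(effective_fsb_mhz: float, multiplier: int) -> float:
    """Core clock in GHz for a given effective FSB rating and multiplier."""
    return base_clock_mhz(effective_fsb_mhz) * multiplier / 1000


# Hypothetical multiplier table derived from the stated clock speeds:
# lowering the multiplier at a fixed 800 MHz FSB turns the 3.6 GHz parts
# into the emulated 3.4 and 3.2 GHz models.
MULTIPLIERS = {"660 / 560": 18, "650 / 550": 17, "640 / 540": 16}

for models, mult in MULTIPLIERS.items():
    print(f"{models}: x{mult} at 800 MHz FSB -> {core_clock_ghz(800, mult):.1f} GHz")
```

Because only the multiplier changes, the bus, memory and cache stay identical across each family, which is what makes these emulated models usable for isolating clock-speed effects.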

To find out the effect of the doubled L2 cache, we used the 6x0 and 5x0 lines, whose processors (in these tests) differ from one another only in L2 cache size.

To find out the effect of the 1067 MHz system bus versus 800 MHz, we use the Extreme Edition line (whose two lower-end models differ mostly in FSB speed; their core clock speeds differ by merely 2%), as well as two processors based on the new Prescott2M core at 3.73 GHz: the Pentium 4 EE 3.73 GHz and the Pentium 4 660 running at 3.74 GHz. In the latter case, the 660's system bus (832 MHz) differs little from the nominal (it is 4% higher, whereas 1067 MHz is 33.3% higher than 800 MHz), while the core clocks of the two processors are practically identical. In other words, we can isolate the effect of FSB 1067 MHz (on the same core at the same core frequency) in pure form.
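
The percentages quoted above follow from simple ratios, sketched below. This assumes the same quad-pumped bus model (core clock = FSB rating / 4 x multiplier), takes "1067 MHz" as exactly 3200/3 MHz, and uses multipliers (17, 13, 14, 18) that are inferred from the listed clock speeds, not taken from Intel documentation.

```python
# Minimal sketch: relative differences between the configurations compared above.
# Assumptions: quad-pumped FSB (base clock = rating / 4), "1067 MHz" = 3200/3 MHz,
# and multipliers inferred from the listed clock speeds (not quoted from Intel).


def pct_increase(new: float, base: float) -> float:
    """Percentage by which `new` exceeds `base`."""
    return (new / base - 1) * 100


FSB_800 = 800.0
FSB_1067 = 3200 / 3   # ~1066.7 MHz effective
FSB_832 = 832.0       # overclocked Pentium 4 660

# Core clocks in MHz = (FSB rating / 4) x inferred multiplier.
ee_340 = FSB_800 / 4 * 17    # Pentium 4 EE 3.40 GHz
ee_347 = FSB_1067 / 4 * 13   # Pentium 4 EE 3.47 GHz
ee_373 = FSB_1067 / 4 * 14   # Pentium 4 EE 3.73 GHz
p660_oc = FSB_832 / 4 * 18   # Pentium 4 660 @ 3.74 GHz

print(f"FSB 1067 vs 800:       +{pct_increase(FSB_1067, FSB_800):.1f}%")
print(f"FSB 832 vs 800:        +{pct_increase(FSB_832, FSB_800):.1f}%")
print(f"EE 3.47 vs EE 3.40:    +{pct_increase(ee_347, ee_340):.1f}% core clock")
print(f"660 @ 3.74 vs EE 3.73: +{pct_increase(p660_oc, ee_373):.1f}% core clock")
```

The first three lines reproduce the 33.3%, 4% and roughly 2% figures from the text; the last shows that the two 3.73 GHz-class chips differ by well under 1% in core clock, which is why their comparison isolates the bus.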

Note also that the N0 stepping of the Prescott2M core overclocked without issue, even with the boxed cooler, up to approximately 3.8 GHz; above that, problems appeared. Therefore, for this stepping a 670 model is still quite realistic, but a 4 GHz model, as with the original Prescott, will probably not be feasible (at least in mass production, although a 4 GHz Extreme Edition may be produced in a limited batch).

Test configuration:

• Motherboard Intel D925XECV2 (i925XE) (BIOS version 404).
• Memory Kingston KHX5400D2K2/1G (two 512 MB modules) running in DDR2-533 mode with 3-3-3-7 timings at the nominal supply voltage.
• Video card ATI Radeon X800 XT by Sapphire (Catalyst 5.2).
• Hard disk Maxtor 6Y080P0.
• Case Arbyte YY-W201BK-A with a HIPRO HP-W460GC31 PSU (460 W).
• Cooler – boxed Intel for Pentium 4 LGA775 by Sanyo Denki.

Memory subsystem tests

We used Everest 1.51 for a quick evaluation of the speed of the processor-memory subsystem.

Evidently, the effective memory access speed of the Prescott2M core has gone up slightly compared with the Prescott core, which was already well optimized in this respect: the gain amounts to almost 200 MB/s, or over 3%! The old (Northwood) Extreme Edition cores clearly lose to the new ones even with FSB 1067 MHz(!), but using this bus with the Prescott2M core works miracles: memory subsystem speed jumps sharply from 6 to 7.3 GB/s, making much fuller use of the bandwidth of dual-channel DDR2-533.
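
A quick peak-bandwidth calculation helps put the 6 and 7.3 GB/s figures in context. This minimal sketch assumes a 64-bit (8-byte) data path for the FSB and for each DDR2 channel, treats bus and memory ratings as effective transfer rates (MT/s), and uses 1 GB/s = 1000 MB/s; the read speeds are the ones quoted above, and the helper name is hypothetical.

```python
# Minimal sketch: theoretical peak bandwidth of the FSB and of dual-channel DDR2,
# compared with the measured read speeds quoted in the text.
# Assumptions: 8-byte (64-bit) transfers, ratings taken as MT/s, 1 GB/s = 1000 MB/s.

BYTES_PER_TRANSFER = 8


def peak_gb_s(mt_per_s: float, links: int = 1) -> float:
    """Peak bandwidth in GB/s for `links` parallel 64-bit buses or channels."""
    return mt_per_s * BYTES_PER_TRANSFER * links / 1000


fsb_800 = peak_gb_s(800)
fsb_1067 = peak_gb_s(3200 / 3)              # ~1066.7 MT/s
ddr2_533_dual = peak_gb_s(1600 / 3, links=2)  # ~533.3 MT/s per channel

print(f"FSB 800 peak:          {fsb_800:.2f} GB/s")
print(f"FSB 1067 peak:         {fsb_1067:.2f} GB/s")
print(f"Dual-channel DDR2-533: {ddr2_533_dual:.2f} GB/s")

# Measured read speeds from the text, as a share of the corresponding FSB peak.
for label, measured, peak in [("FSB 800", 6.0, fsb_800), ("FSB 1067", 7.3, fsb_1067)]:
    print(f"{label}: {measured} GB/s = {measured / peak:.0%} of peak")
```

On these assumptions, 6 GB/s is already over 90% of what an 800 MHz bus can carry, so with FSB-800 the bus rather than the memory looks like the ceiling; a 1067 MHz bus raises that ceiling to the same 8.53 GB/s peak as dual-channel DDR2-533.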

With memory write speed, the picture for FSB-1067 is much more modest: it barely manages to overtake the regular Pentium 4 with FSB-800 and, strangely enough, the larger cache of the new processors does not help; on the contrary, it slightly reduces write speed.

Lastly, memory subsystem latency reveals some interesting things: memory access latencies are clearly lower than on the old Northwood core (possibly because of the old core's additional L3 cache, which buffers accesses to system memory from the upper-level caches), although raising the L2 cache size in Prescott brings no benefit. On the contrary, latency on the new core has gone up slightly, which may be related to the slightly higher latency of the Prescott2M L2 cache itself. It is also interesting that latency on Prescott2M with FSB-1067 is clearly no better than on the other Prescott cores; that is, to run the faster bus, the manufacturer has evidently raised the bus latency timings on purpose (to improve stability), thereby nullifying its potential advantages.

Overall system performance

To evaluate overall system performance (that is, the speed of tasks that do not use the 3D accelerator), we ran tests in specific popular applications as well as specialized benchmarks, to make a comprehensive estimate of the systems. Among the latter are PCMark04 and MetaBench 0.98.

PCMark04

MetaBench 0.98

In both tests the situation is very similar: the processors rank almost exactly by clock speed (even when 90-nm and 130-nm cores are compared)! The benefit of the larger cache in Prescott is small, only around 2%. The 1067 MHz system bus gives a mere 0.3% to 1.7% speed advantage for the 130-nm core and is practically useless for the Prescott2M core: the same Pentium 4 660 running at 3.74 GHz (with lower latencies at 832 MHz) runs on par with the Extreme Edition 3.73 GHz!

CPUmark 99

CPUmark99, a comprehensive benchmark of simple mathematical operations, shows that the long-pipeline microarchitecture (Prescott) stumbles on tasks not specifically optimized for it, and even the 2 MB cache of Prescott2M cannot make up for this shortcoming. The faster system bus is useless here too (although on platforms of just 1-2 years ago this test evidently benefited from a larger cache, a faster system bus, and even much faster memory and chipsets).

ScienceMark 2.0

More complex (but still not NetBurst-optimized) scientific calculations in ScienceMark 2.0 repeat the trend of the previous test, but here we do see an effect from the larger cache: about +1%.

WinRAR

The relatively fresh and well-optimized WinRAR archiver, while not tuned for NetBurst, is clearly sensitive to cache size: Prescott2M gains about 5-6% (!) over its predecessor, although the faster system bus is still useless.

JPEG encoding in ACDSee 5.0

JPEG encoding in ACDSee 5.0 finally puts Prescott and Gallatin on par: with a minimal effect from L2 cache size and an almost useless 1067 MHz bus (+0.6% can hardly count in its favor), the "good old" Extreme Edition models finally fall behind the higher-end Pentium 4 6xx and 5xx models.

MP3 encoding with the Lame 3.96.1 codec

MP3 encoding with Lame 3.96.1, one of the most recent versions of the codec, has always depended almost entirely on the processor core (not on memory, the bus, or even the cache). This time is no exception: the processors rank by clock speed, and only the "old" Extreme Edition takes a confident lead over all the Prescotts thanks to its shorter pipeline.

Video re-encoding to MPEG4 with the modern DivX 5.2.1 codec

Video re-encoding to MPEG4 with the modern DivX 5.2.1 codec hands the lead back to processors with higher clock speeds: with zero effect from the larger cache and faster FSB, the 3.73 GHz processors come out ahead.

Windows Media Encoder 9

The situation is about the same for video encoding in Windows Media Encoder 9, but here the larger cache in Prescott gives about a 1% advantage.

RealStorm Benchmark 2004

3D rendering on the CPU (ray tracing, RealStorm Benchmark 2004) unexpectedly showed that Prescott2M can even lose to its predecessor: the lag is consistent and amounts to about 2.5%! Recalling the memory tests at the beginning of this article, the cause becomes clear: with lower memory write speed and slightly higher latency, the new processor cannot keep up with moving the huge amounts of data needed to calculate complex scenes.

CineBench 2003

Fortunately, this effect shows up only occasionally (in non-optimized applications); for example, in CPU rendering of scenes in the professional Cinema 4D package (CineBench 2003 test), the benefit of the larger cache in Prescott2M ranges from 0.3 to 1.5%, although again FSB-1067 has no effect.

3D performance

These tests were run at 1024x768 with 32-bit color depth. As usual, we start with the comprehensive 3DMark05 suite.

3DMark05

Because the 3D accelerator dominates this test, it shows no significant difference between the processors. More interestingly, they rank strictly according to their "ratings", i.e., model numbers. That is, the new Prescott2M chips are clearly faster than the former Prescott, and the gain from the larger cache here reaches 1.5-2.0% (a lot for this test). Nevertheless, the Extreme Edition processors (aimed at extreme gamers, of course) do not stand out in any way from the regular Pentium 4 (the high-end models, especially the 6xx), and FSB-1067 shows no advantage.

CPU score

However, in the suite's processor test, where the 3D accelerator matters less, the processors rank by clock speed rather than by cache size, although the larger cache gives Prescott2M a 2.5-3.5% speed gain that grows with core clock speed! In other words, the larger cache has clearly improved the scalability of these systems in this test. Still, the effect of FSB-1067 is again very small.

In the DirectX games Unreal Tournament 2003 and 2004, popular in the recent past, the situation is similar to the 3DMark05 processor test (including the improved scalability, 2 to 3%!), except that the Extreme Edition on the old core performs clearly better than its successors, despite their 2 MB cache. As for the much faster system bus, I am tired of repeating how useless it is.

Now on to OpenGL gaming tests. First, some not very new but indicative applications.

Quake 3 Arena Crusher

In this "imperishable" masterpiece we see the same picture as in Unreal Tournament 2003/2004: the old cores are clearly faster and scale well with frequency, the larger cache in Prescott gives a 3-4% performance boost (growing with clock speed), and FSB-1067 (in the versions available for the two different cores) is practically unjustified.

Wolfenstein Enemy Territory

In the newer and more demanding WET, built on the same game engine, the situation changes a bit: the advantage of the 2 MB L2 is lost (in fact, Prescott runs slightly faster than Prescott2M), and the old core is the clear favorite.

Vulpine GLmark

Finally, another original OpenGL engine, by Vulpine, shows a surprisingly large boost from the 2 MB cache compared with 1 MB: at 3.2 GHz the gain from the larger cache is 3.2%, at 3.4 GHz it rises to 5%, and at 3.6 GHz it reaches 7.2%! What will it be at 3.8 GHz? Like WinRAR, this test is a clear example of what a larger cache can give modern processors when a task (application) is sensitive to it.

Tests in more modern games

Doom 3 demo

This is exactly what I talked about in the previous paragraph: the effect of the larger cache is simply tremendous (despite the very demanding game). Although the roughly 5% gain from the larger cache does not improve scalability with clock speed, it still deserves respect. Just imagine: the Northwood (Gallatin), at much lower clock speeds, performs on par with the latest Extreme Edition 3.73 GHz!

Gun Metal Benchmark2

Here the effect of the larger cache is more moderate (1.5-2%), and the old 130-nm core again holds its own against the higher-clocked new kids.

Far Cry

The same holds true for Far Cry, although here the advantage of the doubled cache is noticeable and again reaches 5%.

SPEC viewperf 3DSmax

SPEC viewperf ProENGINEER

Lastly, a couple of tests in professional 3D modeling suites: 3DSmax and ProENGINEER. In both cases the transition from Prescott to Prescott2M turns out to be useless and even slightly negative: the system slows down by 1-2%. The old 130-nm cores here run on par with the fastest new kids, and the 1067 MHz bus again shows its weakness.

Conclusion

The comprehensive tests of the new Intel Pentium 4 processors built on the newly released Prescott2M core have shown that, in the traditional 32-bit environment currently used on the vast majority of corporate and home PCs, the performance picture for Intel's new solutions is mixed. On the one hand, there are many applications where the doubled L2 cache in Prescott gives a considerable performance boost (3-5% or even more). On the other hand, there are far more applications where the boost (1-2%) is hardly noticeable and, in my view, absolutely not worth the price difference between 5xx and 6xx models of the same clock speed. That is especially true given the 3.8 GHz Prescott model, which has been released in a limited batch and will evidently outperform the 660. Moreover, there are tasks (albeit few) where the new core is even slower (by 1-2%) than its predecessors! On average, the "usefulness" of the new core compared with the old one in a 32-bit environment amounts to a meager 1 to 2% boost, which fortunately grows with core clock speed and cache size (see the overall diagram showing the averaged result across all the tests in this review, excluding the memory tests).
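
The review does not say how its averaged result was computed. One common way to combine many benchmarks (assumed here, not confirmed by the source) is to normalize each test to a baseline processor and take the geometric mean of the ratios. The sketch below uses hypothetical placeholder scores and test names, not results from this review.

```python
from __future__ import annotations

import math


def relative_scores(scores: dict[str, float], baseline: dict[str, float]) -> list[float]:
    """Per-test ratio to the baseline; assumes every test is "higher is better"."""
    return [scores[test] / baseline[test] for test in baseline]


def geometric_mean(values: list[float]) -> float:
    """Geometric mean, computed via logarithms to avoid overflow on long lists."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Hypothetical placeholder scores for illustration only (NOT results from this review).
baseline_5xx = {"test_a": 100.0, "test_b": 50.0, "test_c": 200.0}
candidate_6xx = {"test_a": 105.0, "test_b": 50.5, "test_c": 196.0}

ratios = relative_scores(candidate_6xx, baseline_5xx)
average_gain_pct = (geometric_mean(ratios) - 1) * 100
print(f"average gain: {average_gain_pct:+.1f}%")
```

A geometric mean keeps one large gain (like the 5% cases above) from dominating, and lets small gains and losses roughly offset. Tests where lower is better (times, latencies) would need their ratio inverted before averaging.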

Overall performance

Therefore, the usefulness of the new Pentium 4 core lies not only in higher performance but rather in the advanced technologies it brings: improved anti-virus protection and security, better usability (thanks both to EIST and to a less power-hungry core made on the finally "polished" process technology) and, of course, the EM64T bonus (with Vanderpool to come).

As for the prospects of the faster 1067 MHz system bus in mass-market Pentium 4 models, they seem rather unclear to me: in the existing implementations of the bus, the platform gains almost no speed. This could change with new chipsets supporting DDR2-667, with a substantial reduction in the latency of the bus itself (which is very unlikely for now), or with a transition (as in future dual-core processors) to two parallel 667 MHz FSBs.
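
The peak-bandwidth arithmetic behind these options can be sketched as follows. It assumes 64-bit (8-byte) buses and channels, treats the FSB and DDR2 ratings as effective transfer rates (MT/s), and models the two 667 MHz buses simply as independent buses whose peaks add up; that last point is an assumption, since the text does not describe how such a platform would be wired.

```python
# Minimal sketch: peak bandwidth of the bus and memory options discussed above.
# Assumptions: 8-byte transfers, ratings taken as MT/s, 1 GB/s = 1000 MB/s,
# and two FSBs modeled as independent buses whose peaks simply add.

BYTES_PER_TRANSFER = 8


def peak_gb_s(mt_per_s: float, links: int = 1) -> float:
    """Peak bandwidth in GB/s for `links` parallel 64-bit buses or channels."""
    return mt_per_s * BYTES_PER_TRANSFER * links / 1000


configs = {
    "FSB 1067 (single bus)": peak_gb_s(3200 / 3),
    "Two 667 MHz FSBs": peak_gb_s(2000 / 3, links=2),
    "Dual-channel DDR2-533": peak_gb_s(1600 / 3, links=2),
    "Dual-channel DDR2-667": peak_gb_s(2000 / 3, links=2),
}

for name, gbps in configs.items():
    print(f"{name:24s} {gbps:5.2f} GB/s")
```

On paper, a 1067 MHz FSB and dual-channel DDR2-533 have the same peak (about 8.5 GB/s), so the memory can at best just keep the faster bus fed. DDR2-667 or a pair of 667 MHz buses would raise the relevant ceiling to about 10.7 GB/s.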

Author: Alex Karabuto