The Intel SSD 545s (512GB) Review: 64-Layer 3D TLC NAND Hits Retail

Today Intel is introducing their SSD 545s, the first product with their new 64-layer 3D NAND flash memory and, in a move that gives Intel a little bit of bragging rights, the first SSD on the market to use 64-layer 3D NAND from any manufacturer.

The Intel SSD 545s is a mainstream consumer SSD, which these days means it’s using the SATA interface and TLC NAND flash. The 545s is the successor to last year’s Intel SSD 540s, which was in many ways a filler product to cover up inconvenient gaps in Intel’s SSD technology roadmap. When the 540s launched, Intel’s first generation of 3D NAND was not quite ready, and Intel had no cost-competitive planar NAND of their own due to skipping the 16nm node at IMFT. This forced Intel to use 16nm TLC from SK Hynix in the 540s. Less unusual for Intel, the 540s also used a third-party SSD controller: Silicon Motion’s SM2258. Silicon Motion’s SSD controllers are seldom the fastest, but performance is usually decent and the cost is low. Intel’s in-house SATA SSD controllers were enterprise-focused and not ready to compete in the new TLC-based consumer market.

The Intel SSD 545s continues Intel’s close relationship with Silicon Motion by being one of the first SSDs to use the latest SM2259 controller. Since the SATA interface is now a dead-end technology, the SM2259 is a fairly minor update over the SM2258 controller used by last year’s Intel SSD 540s. The only significant new feature enabled by the updated controller is hardware end-to-end data protection that includes ECC on the controller’s SRAM and on the external DRAM. This will make the 545s more resilient against corruption of in-flight data, but it should not be mistaken for the power loss protection that is typically found on enterprise SSDs.

The flash memory used in the Intel 545s is Intel’s second generation 3D TLC NAND, a 64-layer design with a floating gate memory cell. Intel did not use their first-generation 32-layer 3D NAND in a consumer SATA SSD, but the 32L 3D TLC is at the heart of Intel’s SSD 600p, their first M.2 NVMe SSD and one of the most affordable consumer NVMe SSDs.

Similar to the strategy Micron used last year when introducing the Crucial MX300, the Intel 545s initially brings a new generation of 3D NAND to the market with just a single SKU. The 512GB 545s is available now on Newegg, with other capacities and the M.2 SATA versions to follow over the next few weeks. The full lineup will include capacities from 128GB to 2TB in both 2.5″ and M.2 form factors.

Intel will be using their smaller 256Gb 64L TLC die for all capacities of the 545s, rather than adopting the 512Gb 64L TLC part for the larger models. The 512Gb die is not yet in volume production and Intel plans to have the full range of 545s models on the market before the 512Gb parts are available in volume. Once the 512Gb parts are available we can expect to see them used in other product families to enable even higher drive capacities, but it is reassuring to see Intel choosing the performance advantages of smaller, more numerous dies for the mainstream consumer product range.

Meanwhile, over the rest of this year, Intel plans to incorporate 64L 3D NAND into SSDs in every product segment. Most of those products are still under wraps, but the Pro 5450s and E 5100s are on the way as the OEM and embedded versions of the 545s.

Intel seems to be in a hurry to get this drive out the door so they can claim to be the first shipping SSDs with 64-layer 3D NAND. At Computex we saw Western Digital announce their first 64L 3D NAND SSDs due to be available in Q3, and Toshiba is already sampling the XG5 M.2 NVMe SSD to OEMs. Earlier this month, Samsung announced the start of volume production of their 64-layer 256Gb V-NAND. By launching with retail availability this week, Intel has narrowly secured first place bragging rights. (It seems Intel and Micron might have an agreement to take turns introducing new 3D NAND, given that Micron was first to ship the 32L 3D NAND last year with the Crucial MX300.)

The downside is that this is a rushed launch; I’ve had the drive in hand for less than five days as of publication time, and that time spanned a weekend. Intel’s press briefing on this drive was a mere fifteen hours before the embargo lift, and the slides included some changed specifications relative to the product brief that was delivered with the drive last week. As with several of their recent SSD launches, Intel is only providing the one-page product brief and is withholding the full specifications document from the general public and the press, but this time it might genuinely be due to the latter document not being ready instead of motivated by the questionable IP security concerns Intel cited earlier this year.

The most significant performance improvement Intel cites for the 545s over the 540s is in sustained sequential transfers where writes exceed the size of the drive’s SLC cache. In the briefing for the 545s Intel claimed the 480GB 540s would drop to 40MB/s while the 512GB 545s is capable of maintaining 475MB/s. The numbers given for the 540s are lower than what the full product specifications from last year list (125 MB/s). Without access to the comparable document for the 545s we can’t entirely explain this discrepancy, but the most plausible reason is that Intel is no longer measuring sustained write speed restricted to an 8GB span of the drive and that they are now instead using a more sensible test where the drive is full or nearly so. Either way, the 545s should be able to perform much better after its SLC cache is full.

Externally, the 545s looks like a typical Intel SATA SSD with only minor design variations. Internally, the density of Intel’s 3D NAND is readily apparent from the PCB that occupies less than half of the case and features only four NAND packages. With 256Gb (32GB) per die, this works out to four dies per package. Even the largest 2TB model should be able to use this PCB with sixteen dies per package and populating the empty pad for a second DRAM package. The Intel SSD 545s uses thermal pads on all four NAND packages and on the controller.
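
For those who want to check the math, here is a quick back-of-the-envelope sketch in Python (purely illustrative; the numbers come from the paragraph above and the function name is my own):

    DIE_GBIT = 256                 # Intel 64L TLC die density in gigabits
    DIE_GB = DIE_GBIT // 8         # 32 GB per die

    def raw_capacity_gb(packages, dies_per_package, die_gb=DIE_GB):
        # Raw NAND soldered to the PCB, before over-provisioning and spare area
        return packages * dies_per_package * die_gb

    print(raw_capacity_gb(4, 4))    # 512 GB: the drive reviewed here
    print(raw_capacity_gb(4, 16))   # 2048 GB: the 2TB model on the same four-package PCB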

The 512GB Intel 545s debuts with a MSRP of $179. This is slightly higher than the launch MSRP of $174 for the 480GB Intel 540s, but on a price per GB basis the 545s is cheaper, and since its launch the MSRP of the 540s has been driven up to $189 by the onset of an industry-wide NAND flash shortage. In this narrow context the MSRP for the 545s may seem reasonable, but its true street price will need to be substantially lower. Intel’s 600p NVMe SSD is currently only $175 on Newegg. Since the 600p outperforms any SATA SSD for typical real-world desktop use, the 545s needs to do better than 35¢/GB. The competition based on Micron’s 32L 3D TLC includes the Crucial MX300 for around 30¢/GB, and the Samsung 850 EVO 500GB happens to be on sale on Newegg today for $165 (33¢/GB).
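
Putting those numbers side by side, a trivial price-per-gigabyte calculation (illustrative only, using the prices quoted above) looks like this:

    def cents_per_gb(price_usd, capacity_gb):
        return 100 * price_usd / capacity_gb

    print(round(cents_per_gb(179, 512), 1))   # ~35.0 cents/GB: 512GB 545s at MSRP
    print(round(cents_per_gb(165, 500), 1))   # 33.0 cents/GB: 850 EVO 500GB on sale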

This launch comes at a bit of an awkward time for us. I’ve retired our aging 2015 SSD testbed and moved all the custom and homemade power measurement equipment over to a new system. Windows 8.1 is out and Windows 10 is in, and our IOmeter synthetic benchmarks are being replaced with Linux-based FIO tests that are more suited to modern TLC SSDs with SLC caches. For the past few weeks I’ve been focusing my efforts on validating the new testbed and test suite against NVMe SSDs, so the arrival at short notice of a new SATA SSD left me with no relevant comparison data. Given the time available, I chose to prioritize the benchmarks that are most relevant to real-world usage and to run a small selection of competing drives through those tests. This review will be updated with more benchmarks as the drives complete them, and the new SSD 2017 section of our Bench database will be going live soon and will be populated with results from the dozens of drives in our back catalog over the coming weeks.

For now, this review includes our three AnandTech Storage Bench (ATSB) workloads run on the new testbed, SYSmark 2014 SE and idle power management tests. The Intel SSD 545s is pitted against its predecessor the Intel SSD 540s, and most of the SATA SSDs with 3D NAND that have been on the market: Samsung’s 850 EVO and 850 PRO, the Crucial MX300 and the ADATA Ultimate SU800.

Micron Introduces 9200 Series Enterprise NVMe SSDs

Today at Flash Memory Summit, Micron is announcing their next generation of high-end enterprise NVMe SSDs. The new Micron 9200 series is the successor to last year’s 9100 series and uses Micron’s 32-layer 3D TLC NAND flash and a new generation of Microsemi SSD controllers. As with the 9100 series, Micron’s 9200 series covers a wide range of capacities, but adds a third tier of write endurance: ECO joins the PRO and MAX tiers, respectively aimed at read-heavy workloads, mixed workloads, and write-intensive workloads.

The Micron 9200 series will be available in either 2.5″ U.2 form factor or PCIe add-in card. Thanks to the new generation of SSD controllers, the add-in card version can now use a PCIe x8 interface and offer significantly higher sequential access performance than the U.2 version, with read speeds reaching up to 5.5GB/s. The range of capacities also extends far beyond the 9100 series, which topped out at 3.2TB for the 9100 PRO and 2.4TB for the 9100 MAX. The 9200 MAX now offers up to 6.4TB, the PRO up to 7.68TB, and the new 9200 ECO is available in 8TB and 11TB capacities.

Micron’s enterprise SATA SSD lineup moved to 3D TLC NAND early this year with the introduction of the 5100 series. Micron’s 7100 series of lower-power enterprise NVMe SSDs has not been replaced with a 3D NAND-based successor and it appears Micron is phasing out the current generation.

More Good News from AMD: 30 Additional Free AFDS Passes Available

You guys really impressed AMD with how quickly you took advantage of their 50 free passes to AMD’s Fusion12 Developer Summit (AFDS). After seeing that a couple of commenters were unable to get in, I went back and asked AMD if there was any way we could get some more passes. After some initial hesitation (AFDS space is pretty limited), AMD agreed to give away another 30 passes to AnandTech readers as a show of appreciation for you guys.

The show runs from June 11 – 14 in Bellevue, WA, with extended early registration going for $395 per person today. Just like last time, the 30 passes are first come, first served. Just use promo code Anand12 anytime between now and June 7 (or sooner if we run out of passes). Please only use the code if you are able to attend.

AMD Will Build 64-bit ARM based Opteron CPUs for Servers, Production in 2014

Last year AMD officially became an ARM licensee, although the deal wasn’t publicized at the time. Fast forward to June 2012 and we saw the first fruits of that deal: AMD announced it would integrate ARM’s Cortex A5 core into its 2013 APUs to enable TrustZone support.

Today comes a much bigger announcement: AMD will be building Opteron processors based on a 64-bit ARM architecture. There are no product announcements today, but the 64-bit ARM Opterons will go into production in 2014. Today’s announcement is about a processor license, not an ARM architecture license – in other words, AMD will integrate an ARM designed 64-bit core for this new Opteron. Update: AMD will integrate ARM’s new Cortex-A50 series of 64-bit ARMv8 CPU cores.

The only other detail we know is that these ARM based Opterons will embed SeaMicro’s Freedom Fabric, presumably on-die.

AMD’s ARM based Opterons are really aimed at the microserver market. As for why AMD isn’t using Jaguar for these parts, it’s likely that by going with ARM it can lower the development time and cost to get into this market. The danger here is that the total microserver market is expected to be around 10% of the overall server market, and that includes both x86 and ARM. With x86 as the default incumbent, it’s going to be an uphill battle for AMD/ARM to carve out a significant portion of that market.

AMD was quick to mention that despite today’s announcement, it will continue to build x86 CPUs and APUs for client and server markets.

Overall the move sounds a lot like AMD trying to move quickly to capitalize on a new market. It’s unclear just how big the ARM based server market will be, but AMD seems to hope that it’ll be on the forefront of that revolution – should it happen. Embracing ARM also further aligns AMD with one of Intel’s most threatening sources of competition at this point. The question is whether or not AMD is doing itself more harm than good by working to devalue x86 in the server space. I suspect it’ll be years before we know the real impact of AMD’s move here.

The other major takeaway is that AMD is looking to find lower cost ways of bringing competitive platforms to market. I do think that a Jaguar based Opteron would likely be the best route for AMD, but it would also likely require a bit more effort than integrating an ARM core.

Obviously competition will be more prevalent in the ARM server space, but here is where AMD hopes its brand and position in the market will be able to give it an advantage. AMD will also be relying heavily on the SeaMicro Freedom Fabric for giving its ARM based Opterons a leg up on the competition. This is one time where I really wish AMD hadn’t spun off its fabs.

Choosing a Gaming CPU: Single + Multi-GPU at 1440p, April 2013

One question when building or upgrading a gaming system is which CPU to choose – does it matter if I have a quad core from Intel, or a quad module from AMD? Perhaps something simpler will do the trick, and I can spend the difference on the GPU. And what if you are running a multi-GPU setup – does the CPU have a bigger effect? These were the questions I set out to help answer.

A few things before we start:

This set of results is by no means extensive or exhaustive. For the sake of expediency I could not select 10 different gaming titles across a variety of engines and then test them in seven or more different configurations per game and per CPU, nor could I test every different CPU made. As a result, on the gaming side, I limited myself to one resolution, one set of settings, and four very regular testing titles that offer time demos: Metro 2033, DiRT 3, Civilization V and Sleeping Dogs. This is obviously not Skyrim, Battlefield 3, Crysis 3 or Far Cry 3, which may be more relevant in your set up.

The arguments for and against time demo testing, as well as those for taking FRAPS values of repeated sequences, are well documented (time demos may not be representative of real gameplay, while FRAPSing a repeated run across a field offers consistency and realism); however, all of our tests can be run on home systems to get a feel for how a system performs. Below is a discussion regarding AI, one of the common uses for a CPU in a game, and how it affects the system. Of our benchmarks, DiRT 3 actually plays out a race, so AI contributes to the result, while the turn-based Civilization V has no concern for real-time AI beyond the time between turns.

All this combines with my unique position as the senior motherboard editor here at AnandTech – the position gives me access to a wide variety of motherboard chipsets, lane allocations and a fair number of CPUs. GPUs are not in large supply on my side of the reviewing area, but ASUS and ECS have provided my test beds with HD7970s and GTX580s respectively, cards that have been fixtures of my test beds for 12 and 21 months. The task set before me in this review would be almost a career in itself if we were to expand to more GPUs and more multi-GPU setups, so testing up to 4x 7970 and up to 2x GTX 580 is a more than reasonable place to start.

Where It All Began

The most important point to note is how this set of results came to pass. Several months ago I came across a few sets of testing by other review websites that floored me – simple CPU comparison tests for gaming which were spreading like wildfire among the forums, and some results contradicted the general prevailing opinion on the topic. These results were pulling all sorts of lurking forum users out of the woodwork to have an opinion, and being the well-adjusted scientist I am, I set forth to confirm the results were, at least in part, valid.

What came next was a shock – some had no real explanation of the hardware setups. While a basic overview of hardware was supplied, there was no rundown of the settings used, and no attempt to justify findings that had obviously caused quite a stir. Needless to say, I was stunned by the lack of verbose testing, as well as by the results themselves and by a lot of the conversation that followed, particularly from avid fans of Team Blue and Team Red. I planned to right this wrong the best way I know how – with science!

The other reason for pulling together the results in this article is perhaps the one I originally started with – the need to update drivers every so often. Since the Ivy Bridge release, I have been using Catalyst 12.3 and GeForce 296.10 WHQL on my test beds. This causes problems – older drivers are not optimized, readers sometimes complain if older drivers are used, and new games cannot be added to the test bed because they might not scale correctly on the older drivers. So while there are some reviews on the internet that update drivers between testing and keep the old numbers (leading to skewed results), taking time out to retest a number of platforms for more data points solely on the new drivers is a large undertaking.

For example, testing new drivers over six platforms (CPU/motherboard combinations) would mean: six platforms, four games, seven different GPU configurations, ~10 minutes per test plus 2+ hours to set up each platform and install a new OS/drivers/set up benchmarks. That makes 40+ hours of solid testing (if all goes without a second lost here or there), or just over a full working week – more if I also test the CPU performance for a computational benchmark update, or exponentially more if I include multiple resolutions and setting options.
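
To make the bookkeeping explicit, here is the same estimate written out as a few lines of Python (a rough sketch using the figures above, nothing more):

    platforms = 6                  # CPU/motherboard combinations
    games = 4
    gpu_configs = 7
    minutes_per_test = 10
    setup_hours_per_platform = 2   # OS install, drivers, benchmark setup

    bench_hours = platforms * games * gpu_configs * minutes_per_test / 60   # 28 hours
    setup_hours = platforms * setup_hours_per_platform                      # 12 hours
    print(bench_hours + setup_hours)   # 40 hours, assuming nothing goes wrong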

If this is all that is worked on that week, it means no new content – so it happens rarely, perhaps once a year or before a big launch. This time was now, and when I started this testing, I was moving to Catalyst 13.1 and GeForce 310.90, which by the time this review goes live will have already been superseded! In reality, I have been slowly working on this data set for the best part of 10 weeks while also reviewing other hardware (but keeping those reviews with consistent driver comparisons). In total this review encapsulates 24 different CPU setups, with up to 6 different GPU configurations, meaning 430 data points, 1375 benchmark loops and over 51 hours in just GPU benchmarks alone, without considering setup time or driver issues.

What Does the CPU do in a Game?

A lot of game developers use customized versions of game engines, such as the EGO engine for driving games or the Unreal engine. The engine provides the underpinnings for a lot of the code, and the optimizations therein. The engine also decides what in the game gets offloaded onto the GPU.

Imagine the code that makes up the game as a linear sequence of events. In order to go through the game quickly, we need the fastest single core processor available. Of course, games are not like this – lots of the game can be parallelized, such as vector calculations for graphics. These were of course the first to be moved from CPU to the GPU. Over time, more parts of the code have made the move – physics and compute being the main features in recent months and years.

The GPU is good at independent, simple tasks – calculating which color is in which pixel is an example of this, along with additional processing and post-processing features (FXAA and so on). If a task is linear, it lives on the CPU, such as loading textures into memory or negotiating which data to transfer between the memory and the GPUs. The CPU also takes control of independent complex tasks, as the CPU is the one that can handle complicated logic and analysis.

Very few parts of a game come under this heading of ‘independent yet complex’. Anything suitable for the GPU but not ported over will be here, and the big one usually quoted is artificial intelligence. Deciding where an NPC is going to run, shoot or fly could be considered a very complex set of calculations, ideal for fast CPUs. The counter argument is that games have had complex AI for years – the number of times I personally was destroyed by a Dark Sim on Perfect Dark on the N64 is testament to either my uselessness or the fact that complex AI can be configured with not much CPU power. AI is unlikely to be a limiting factor in frame rates due to CPU usage.

What is most likely going to be the limiting factor is how the CPU manages data. As engines evolve, they try to move data between the CPU, memory and GPUs less often – if textures can be kept on the GPU, then they will stay there. But some engines are not as perfect as we would like them to be, resulting in the CPU becoming the limiting factor. As CPU performance increases, and as those who write the engines in which games are made come to understand the ecosystem, the CPU should become less of an issue over time. All roads point towards the PS4 of course, and its 8-core Jaguar processor. Is this all that is needed for a single GPU, albeit in an HSA environment?

Multi-GPU Testing

Another angle I wanted to test beyond most other websites is multi-GPU. There is content online dealing mostly with single GPU setups, with a few for dual GPU. Even though the number of multi-GPU users is actually quite small globally, the enthusiast markets are clearly geared for it. We get motherboards with support for four GPU cards; we have cases that will support a dual processor board as well as four double-height GPUs. Then there are GPUs being released with two sets of silicon on a PCB, wrapped in a double or triple width cooler.

More often than not on a forum, people will ask ‘what GPU for $xxx’ and some of the suggestions will be towards two GPUs at half the budget, as it commonly offers more performance than a single GPU if the game and the drivers all work smoothly (at the cost of power, heat, and bad driver scenarios). The ecosystem supports multi-GPU setups, so I felt it right to test at least one four-way setup. Although with great power comes great responsibility – there was no point testing 4-way 7970s on 1080p.

Typically in this price bracket, users will go for multi-monitor setups, along the lines of 5760×1080, or big monitor setups like 1440p, 1600p, or the mega-rich might try 4K. Ultimately the high end enthusiast, with cash to burn, is going to gravitate towards 4K, and I cannot wait until that becomes a reality. So for a median point in all of this, we are testing at 1440p and maximum settings. This will put the strain on our Core 2 Duo and Celeron G465 samples, but should be easy pickings for our multi-processor, multi-GPU beast of a machine.

A Minor Problem In Interpreting Results

Throughout testing for this review, there were clearly going to be some issues to consider. Chief of these is the question of consistency and in particular if something like Metro 2033 decides to have an ‘easy’ run which reports +3% higher than normal. For that specific example we get around this by double testing, as the easy run typically appears in the first batch – so we run two or three batches of four and disregard the first batch.

The other, perhaps bigger, issue is interpreting results. If I get 40.0 FPS on a Phenom II X4-960T, 40.1 FPS on an i5-2500K, and then 40.2 FPS on a Phenom II X2-555 BE, does that make the results invalid? The important points to recognize here are statistics and system state.

System State: We have all had times booting a PC when it feels sluggish, but this sluggish behavior disappears on reboot. The same thing can occur with testing, and usually happens as a result of bad initialization or a bad cache optimization routine at boot time. As a result, we try and spot these circumstances and re-run. With more time we would take 100 different measurements of each benchmark, with reboots, and cross out the outliers. Time constraints outside of academia unfortunately do not give us this opportunity.

Statistics: System state aside, frame rate values will often fluctuate around an average. This will mean (depending on the benchmark) that the result could be +/- a few percentage points on each run. So what happens if you have a run of four time demos, and each of them is +2% above the ‘average’ FPS? From the outside, as you will not know the true average, you cannot say if it is valid as the data set is extremely small. If we take more runs, we can find the variance (in the statistical sense), the standard deviation, and perhaps report the mean, median and mode of the set of results.

As always, the main constraint in articles like these is time – the quicker to publish, the less testing, the larger the error bars and the higher likelihood that some results are going to be skewed because it just so happened to be a good/bad benchmark run. So the example given above of the X2-555 getting a better result is down to interpretation – each result might be +/- 0.5 FPS on average, and because they are all pretty similar we are actually more GPU limited. So it is more whether the GPU has a good/bad run in this circumstance.

For this example, I batched 100 runs of my common WinRAR test from motherboard testing, on an i5-2500K CPU with a Maximus V Formula. Results varied between 71 seconds and 74 seconds, with a strong gravitation towards the lower end. To represent this statistically, we normally use a histogram, which separates the results into ‘bins’ (e.g. 71.00 seconds to 71.25 seconds) whose width determines how precisely the final result is reported. Here is an initial representation of the data (time vs. run number), and a few histograms of that data, using bin sizes of 1.00s, 0.75s, 0.5s, 0.33s, 0.25s and 0.1s.

As we get down to the lower bin sizes, there is a pair of large groupings of results between ~71 seconds and ~72 seconds. The overall average/mean of the data is 71.88 seconds due to the outliers around 74 seconds, with the median at 72.04 seconds and a standard deviation of 0.660. What is the right value to report? Overall average? Peak? Average +/- standard deviation? With the results heavily skewed around two values, what happens if I do 1-3 runs and get ~71 seconds and none around ~72 seconds?
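
For anyone who wants to run this sort of analysis on their own results, a minimal Python sketch along these lines will do – it assumes the 100 timings are already sitting in a plain list, and the function is my own illustration rather than the exact script used here:

    import statistics

    def summarize(times, bin_size=0.25):
        # times: completion times in seconds from repeated runs of one benchmark
        mean = statistics.mean(times)
        median = statistics.median(times)
        stdev = statistics.stdev(times)
        # Histogram: count how many runs land in each bin of width bin_size
        start = min(times)
        bins = {}
        for t in times:
            b = round(start + bin_size * int((t - start) / bin_size), 2)
            bins[b] = bins.get(b, 0) + 1
        return mean, median, stdev, bins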

Statistics is clearly a large field, and without a large sample size, most numbers can be one-off results that are not truly reflective of the data. It is important to ask yourself every time you read a review with a result – how many data points went into that final value, and what analysis was performed?

For this review, we typically take four runs of our GPU tests each, except Civilization V which is extremely consistent +/- 0.1 FPS. The result reported is the average of those four values, minus any results we feel are inconsistent. At times runs have been repeated in order to confirm the value, but this will not be noted in the results.

The Bulldozer Challenge

Another purpose of this article was to tackle the problem surrounding Bulldozer and its derivatives, such as Piledriver and thus all Trinity APUs. The architecture is such that Windows 7, by default, does not accurately assign new threads to new modules – the ‘freshly installed’ stance is to double up on threads per module before moving to the next. By installing a pair of Windows Updates (which do not show up in Windows Update automatically), we get an effect called ‘core parking’, which assigns each of the first series of threads to its own module, giving it access to a pair of integer units and the module’s FP unit, rather than having pairs of threads competing for the prize. This affects variable threaded loading the most, particularly from 2 to 2N-2 threads where N is the number of modules in the CPU (thus 2 to 6 threads in an FX-8150). It should come as no surprise that games fall into this category, so we want to test with and without the core parking updates in our benchmarks.
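
To illustrate the difference between the two policies (a simplified sketch of the behavior described above, not the actual Windows scheduler logic), consider a four-module FX-8150:

    MODULES = 4   # each module has two integer cores and one shared FP unit

    def default_policy(n_threads):
        # Fill both integer cores of a module before moving on to the next module
        return [t // 2 for t in range(n_threads)]

    def core_parking_policy(n_threads):
        # Give each thread its own module first, and only then start doubling up
        return [t % MODULES for t in range(n_threads)]

    print(default_policy(4))        # [0, 0, 1, 1]: two modules loaded, two idle
    print(core_parking_policy(4))   # [0, 1, 2, 3]: each thread gets a module to itself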

Hurdles with NVIDIA and 3-Way SLI on Ivy Bridge

Users who have been keeping up to date with motherboard options on Z77 will understand that there are several ways to put three PCIe slots onto a motherboard. The majority of sub-$250 motherboards will use three PCIe slots in a PCIe 3.0 x8/x8 + PCIe 2.0 x4 arrangement (meaning x8/x8 from the CPU and x4 from the chipset), allowing either two-way SLI or three-way Crossfire. Some motherboards will use a different Ivy Bridge lane allocation option such that we have a PCIe 3.0 x8/x4/x4 layout, giving three-way Crossfire but only two-way SLI. In fact in this arrangement, fitting the final x4 with a sound/raid card disables two-way SLI entirely.

This is due to a not widely publicized requirement of SLI – it needs at least an x8 lane allocation in order to work (either PCIe 2.0 or 3.0). Anything less than this on any GPU and you will be denied in the software. So putting in that third card will cause the second lane to drop to x4, disabling two-way SLI. There are motherboards that have a switch to change to x8/x8 + x4 in this scenario, but we are still capped at two-way SLI.

The only way to go onto 3-way or 4-way SLI is via a PLX 8747 enabled motherboard, which greatly enhances the cost of a motherboard build. This should be kept in mind when dealing with the final results.

Power Usage

It has come to my attention that even if the results were to come out X > Y, some users may call out that the better processor draws more power, which at the end of the day costs more money if you add it up over a year. For the purposes of this review, we are of the opinion that if you are gaming on a budget, then high-end GPUs such as the ones used here are not going to be within your price range.

Simple fun gaming can be had on a low resolution, limited detail system for not much money – for example at a recent LAN I went to I enjoyed 3-4 hours of TF2 fun on my AMD netbook with integrated HD3210 graphics, even though I had to install the ultra-low resolution texture pack and mods to get 30+ FPS. But I had a great time, and thus the beauty of high definition graphics of the bigger systems might not be of concern as long as the frame rates are good.

But if you want the best, you will pay for the best, even if it comes at the electricity cost. Budget gaming is fine, but this review is designed to focus on 1440p with maximum settings, which is not a budget gaming scenario.

Format Of This Article

On the next couple of pages, I will be going through in detail our hardware for this review, including CPUs, motherboards, GPUs and memory. Then we will move to the actual hardware setups, with CPU speeds and memory timings (with motherboards that actually enable XMP) detailed. Also important to note is the motherboards being used – for completeness I have tested several CPUs in two different motherboards because of GPU lane allocations.

We are living in an age where PCIe switches and additional chips are used to expand GPU lane layouts, so much so that there are up to 20 different configurations for Z77 motherboards alone. Sometimes the lane allocation makes a difference, and it can make a large difference using three or more GPUs (x8/x4/x4 vs. x16/x8/x8 with PLX), even with the added latency sometimes associated with the PCIe switches. Our testing over time will include the majority of the PCIe lane allocations on modern setups, but for our first article we are looking at the major ones we are likely to come across.

The results pages will start with a basic CPU analysis, running through my regular motherboard tests on the CPU. This should give us a feel for how much power each CPU has in dealing with mathematics and real world tests, both for integer operations (important on Bulldozer/Piledriver/Radeon) and floating point operations (where Intel/NVIDIA seem to perform best).

We will then move to each of our four gaming titles in turn, in our six different GPU configurations. As mentioned above, in GPU limited scenarios it may seem odd if a sub-$100 CPU is higher than one north of $300, but we hope to explain the tide of results as we go.

I hope this will be an ongoing project here at AnandTech, and over time we can add more CPUs, 4K testing, perhaps even show four-way Titan should that be available to us. The only danger is that on a driver or game change, it takes another chunk of time to get data! Any suggestions of course are greatly appreciated – drop me an email at ian@anandtech.com. Our next port of call will most likely be Haswell, which I am very much looking forward to testing.

CPUs, GPUs, Motherboards, and Memory

The Haswell Ultrabook Review: Core i7-4500U Tested

I don’t think I had a good grasp on why Intel’s Haswell launch felt so weird until now. Haswell arrived less than a month after a new CEO took over, and it shows up a couple of weeks after the abrupt change in leadership within the Intel Architecture Group. Dramatic change at the top is always felt several levels below.

To make matters worse, there are now four very important Haswell families that need to be validated, tested, launched and promoted. There’s desktop Haswell, mobile Haswell, ultramobile Haswell ULT (U-series) and Haswell ULX (tablet, Y-series). The number one explanation I’m getting for why we don’t have a socketed K-series SKU with Crystalwell is that everyone is already too busy validating all of the other variants of Haswell that have to launch as soon as possible.

Unlike previous architectures where Intel spanned the gamut of TDPs, Haswell is expected to have success in pretty much all of the segments and as a result, getting everything out on time is very important.

As anyone who has tried to do too much with too little time/resources knows, these types of stories typically don’t end well. The result is one of the more disorganized launches in Intel history and it seems to be caused by dramatic changes at the top of the company combined with a very aggressive to-do list down below.

Haswell is viewed, at least by some within Intel, as a way to slow the bleeding of the PC industry. The shift of consumer dollars to smartphones and tablets instead of notebooks and desktops won’t be reversed, but a good launch here might at least help keep things moving ok until Silvermont, BayTrail and Merrifield can show up and fill the gaps in Intel’s product stack.

So Haswell is important, Intel management is in a state of flux, and there’s a lot of Haswell to bring to market. The result? We get a staggered launch, with only some parts ready to go immediately. Interestingly enough, it’s the high-end Haswell desktop parts that are most ready at this point. The stakes are high enough that we had to resort to testing a customer reference platform in order to evaluate Intel’s new Iris Pro graphics. And today, we had to track down a pre-production Haswell Ultrabook in Taiwan to even be able to bring you this review of Haswell ULT.

I’ve spent the past few days in Taipei hunting for bandwidth, running tests in my hotel room and trying my best to understand all there is to know about Haswell ULT, the third Haswell I outlined in our microarchitecture piece last year.

New Elements to Samsung SSDs: The MEX Controller, Turbo Write and NVMe

As part of the SSD Summit in Korea today, Samsung gave the world media a brief glimpse into some new technologies. The initial focus on most of these will be in the Samsung 840 Evo, unveiled earlier today.

The MEX Controller

First up is the upgrade to the controller. Samsung’s naming scheme from the 830 onwards has been MCX (830), MDX (840, 840 Pro) and now the MEX with the 840 Evo. This uses the same 3-core ARM Cortex R4 base, however boosted from 300 MHz in the MDX to 400 MHz in the MEX. This 33% boost in pure speed is partly responsible for the overall increase in 4K random IOPS at QD1, which rise from 7900 in the 840 to 10000 in the 840 Evo (+27%). This is in addition to firmware updates with the new controller, and that some of the functions of the system have been ported as hardware ops rather than software ops.

TurboWrite

The most thought provoking announcement was TurboWrite. This is the high performance buffer inside the 840 Evo which contributes to the high write speed compared to the 840 (140 MB/s on 120GB drive with the 840, compared to 410 MB/s on the 840 Evo 120GB). Because writing to 3-bit MLC takes longer than 2-bit MLC or SLC, Samsung are using this high performance buffer in SLC mode. Then, when the drive is idle, it will pass the data on to the main drive NAND when performance is not an issue.

The amount of ‘high-performance buffer’ in the 840 Evo will depend on the model being used. Also, while the buffer is still technically 3-bit MLC, because it is used in SLC mode the amount of storage in the buffer decreases by a factor of three. So the 1TB version of the 840 Evo, which has 36 GB of buffer, can in fact accommodate 12 GB of writes in SLC mode before reverting to the main NAND. In the 1TB model, however, TurboWrite has a minimal effect – it is in the 120GB model where Samsung are reporting the 3x increase in write speeds.

The 120GB and 250GB models will have 9 GB of 3-bit MLC buffer, which equates to 3 GB of writes. Beyond this level of writes (despite the oft-quoted 10GB/day average), one would assume the drive reverts to its underlying write speed – in this case perhaps closer to the 140 MB/s figure from the 840, though firmware improvements may lift it above that. Without a drive to test this is pure speculation, but it will surely come up in the Q&A session later today, and we will update as we learn more.
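
The buffer arithmetic itself is simple enough to sketch out (illustrative only, assuming the TLC-to-SLC conversion is a straight divide-by-three):

    def slc_buffer_gb(tlc_buffer_gb):
        # NAND run in 1-bit SLC mode stores a third of its 3-bit TLC capacity
        return tlc_buffer_gb / 3

    print(slc_buffer_gb(36))   # 12.0 GB of SLC-speed writes on the 1TB 840 Evo
    print(slc_buffer_gb(9))    # 3.0 GB on the 120GB and 250GB models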

Dynamic Thermal Guard

A new feature on the 840 Evo is the addition of Dynamic Thermal Guard, which kicks in when the SSD’s operating temperature moves outside its suggested range (70C+). Above the predefined temperature, onboard logic throttles the drive’s power usage to generate less heat until the operating temperature returns to normal. Unfortunately no additional details on this feature were announced, but I think it might prompt a redesign for certain gaming laptops that reach 80C+ under heavy load.

Non-Volatile Memory Express (NVMe)

While this is something relatively new, it is not on the 840 Evo, but as part of the summit today it is worth some discussion. The principle behind NVMe is simple – command structures like IDE and AHCI were developed with mechanical hard disks in mind. AHCI is still compatible with SSDs, but the move to more devices based on PCIe requires an update to the command structure in order to achieve higher efficiency and lower overhead. There are currently 11 companies in the working group developing the NVMe specification, currently at revision 1.1, including Samsung and Intel, and the benefits NVMe brings over AHCI are substantial.

One big thing that almost everyone in the audience must have spotted is the maximum queue depth. In AHCI, the protocol allows for one queue with a max QD of 32. In NVMe, due to the way NAND works (as well as the increased throughput potential), we can apply 64K queues, each with a max QD of 64K. In terms of real-world usage (or even server usage), I am not sure how far the expanding QD would go, but it would certainly change a few benchmarks.
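
Taking “64K” literally, the theoretical ceilings work out as follows (a rough sketch; real hosts and drives will expose far fewer queues in practice):

    ahci_queues, ahci_depth = 1, 32
    nvme_queues, nvme_depth = 64 * 1024, 64 * 1024

    print(ahci_queues * ahci_depth)   # 32 commands outstanding at most under AHCI
    print(nvme_queues * nvme_depth)   # 4294967296: roughly 4.3 billion under NVMe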

The purpose of NVMe is also to change latency. In AHCI, dealing with mechanical hard drives, if latency is 10% of access times, not much is noticed – but if you reduce access times by two orders of magnitude and the level of latency stays the same, it becomes the main component of any delay. NVMe helps to alleviate that.

Two of the questions from the crowd today were pertinent to how NVMe will be applied in the real world – how will NVMe come about, and given that current chipsets do not have PCIe-based 2.5” SSD connectors, will we get an adapter from a PCIe slot to the drive? On the first front, Samsung acknowledged that they are working with the major OS manufacturers to support NVMe in their software stacks. In terms of motherboard support, in my opinion, as IDE/AHCI is a BIOS option it will require BIOS updates to work in NVMe mode, with AHCI as a fallback.

On the second question about a PCIe -> SSD connector, it makes sense that one will be released in due course until chipset manufacturers implement the connectors for SSDs using the PCIe interface. It should not be much of a leap, given that SATA to USB 3.0 connectors are already shipped in some SSD packages.

More information from Korea as it develops…!

Mushkin Atlas mSATA (240GB & 480GB) Review

The retail mSATA SSD market doesn’t have too many players. Most OEMs, such as Samsung (although that is about to change), Toshiba and SanDisk, only sell their mSATA SSDs straight to PC OEMs. Out of the big guys, only Intel and Crucial/Micron are in the retail game but fortunately there are a few smaller OEMs that sell retail mSATA SSDs as well. One of them is Mushkin and today we’ll be looking at their Atlas lineup.

Mushkin sent us two capacities: 240GB and 480GB. Typically 240GB has been the maximum capacity for mSATA SSDs due to the fact that there’s room for only four NAND packages and with 64Gbit per NAND die the maximum capacity for each package comes in at 64GB (8x8GB), which yields a total NAND capacity of 256GB. Crucial and Samsung have mSATA SSDs of up to 512GB (Samsung offers up to 1TB now) thanks to their 128Gbit NAND but currently neither Samsung nor Micron is selling their 128Gbit NAND to other OEMs (at least not in the volumes required for an SSD). I’m hearing that Micron’s 128Gbit NAND will be available to OEMs early next year and many are already planning products based on it.

Since Mushkin is limited to 64Gbit NAND like other fab-less OEMs, they had to do something different to break the 256GB barrier. Since you can’t get more than 64GB in a single NAND package, the only solution is to increase the amount of NAND packages in the SSD. Mushkin’s approach is to use a separate daughterboard with four NAND packages that’s stacked on top of the standard mSATA SSD. There are already four NAND packages in a regular mSATA SSD, so with four more the maximum NAND capacity doubles to 512GB. However, the actual usable capacity in Atlas is 480GB thanks to SandForce’s RAISE and added over-provisioning.
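
The capacity ceiling Mushkin is working around is easy to sketch with the numbers above (an illustration, not Mushkin’s own accounting):

    DIE_GB = 64 // 8    # a 64Gbit die is 8 GB

    def msata_raw_gb(packages, dies_per_package=8):
        return packages * dies_per_package * DIE_GB

    print(msata_raw_gb(4))   # 256 GB: the usual four-package mSATA ceiling
    print(msata_raw_gb(8))   # 512 GB raw with the daughterboard; 480GB remains
                             # user-accessible after RAISE and the added over-provisioning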

The result is a slightly taller design than a regular mSATA SSD but the drive should still be compatible with all mSATA-equipped devices. Mushkin had to use specially packaged NAND in the 480GB model (LGA60 vs LBGA100 in the 240GB) to lower the height and guarantee compatibility. The NAND daughterboard seems to be glued to the main PCB and dislocating it would require a substantial amount of force. I tried to dislocate it gently with my hands but I couldn’t, so I find it unlikely that the daughterboard would dislocate on its own while in use.

The Atlas is available in pretty much all capacities you can think of, starting from 30GB and going all the way up to 480GB. Mushkin gives the Atlas a three-year warranty, which is the standard for mainstream drives. The retail packaging doesn’t include anything besides the drive itself, but then you don’t really need any accessories with an mSATA drive.

Here you can see the difference in NAND packages. The one on the left is the 480GB model and its NAND packages cover slightly more area on the PCB but are also a hair thinner. Like many other OEMs, Mushkin buys their NAND in wafers and does packaging/validation on their own. Due to supplier agreements, Mushkin couldn’t reveal the manufacturer but I’m guessing we’re dealing with 20nm Micron NAND. So far I’ve only seen Micron and Toshiba selling NAND in wafers and as Mushkin has used Micron in the past (the 240GB sample is a bit older and uses Micron NAND), it would make sense.

Micron M500DC (480GB & 800GB) Review

While the client SSD space has become rather uninteresting lately, the same cannot be said of the enterprise segment. The problem in the client market is that most of the modern SSDs are already fast enough for the vast majority and hence price has become the key, if not the only, factor when buying an SSD. There is a higher-end market for enthusiasts and professionals where features and performance are more important, but the mainstream market is constantly taking a larger and larger share of that.

The enterprise market, on the other hand, is totally different. Unlike in the client world, there is no general “Facebook-email-Office” workload that can easily be traced and the drives can be optimized for that. Another point is that enterprises, especially larger ones, are usually well aware of their IO workloads, but the workloads are nearly always unique in one way or the other. Hence the enterprise SSD market is heavily segmented as one drive doesn’t usually fit all workloads: one workload may require a drive that does 100K IOPS in 4KB random write consistently with endurance of dozens of petabytes, while another workload may be fine with a drive that provides enough 4KB random read performance to be able to replace several hard drives. Case in point, this is what Micron’s enterprise SSD lineup looks like:

In order to fit the table on this page, I even had to leave out a few models, specifically the P300, P410m, and P322h. With today’s release of the M500DC, Micron has a total of eight different active SSDs in its enterprise portfolio while its client portfolio only has two.

Micron’s enterprise lineup has always been two-headed: there are entry to mid-level SATA/SAS products, which are followed by the high-end PCIe drives. The M500DC represents Micron’s new entry-level SATA drive and as the naming suggests, it’s derived from the client M500. The M500 and M500DC share the same controller (Marvell 9187) and NAND (128Gbit 20nm MLC) but otherwise the M500DC has been designed from ground up to fit the enterprise requirements.

The M500DC is aimed at data centers that require affordable solid-state storage, such as content streaming, cloud storage, and big data analytics. These are typically hyperscale enterprises and due to their exponentially growing storage needs, the storage has to be relatively cheap or otherwise the company may not have the capital to keep up with the growth. In addition, most of these data centers are more read heavy (think about Netflix for instance) and hence there is no need for high-end PCIe drives with endurance in the order of dozens of petabytes.

In terms of NAND the M500DC features the same 128Gbit 20nm MLC NAND as its client counterpart. This isn’t even a high-endurance or enterprise specific part — it’s the same 3,000 P/E cycle part you find inside the normal M500. Micron did say that the parts going inside the M500DC are more carefully picked to meet the requirements but at a high-level we are dealing with consumer-grade MLC (or cMLC).

To get away with cMLC in the enterprise space, Micron sets aside an enormous portion of the NAND for over-provisioning. The 480GB model features a total of six NAND packages, each consisting of eight 128Gbit dies for a total NAND capacity of 768GiB. In other words, only 58% of the NAND ends up being user-accessible. Of course not all of that is over-provisioning as Micron’s NAND redundancy technology, RAIN, dedicates a portion of the NAND for parity data, but the M500DC still has more over-provisioning than a standard enterprise drive. The only exception is the 800GB model which has 1024GiB of NAND onboard with 73% of that being accessible by the user.

A quick explanation for the numbers above. To calculate the effective over-provisioning, the space taken by RAIN must be taken into account first because RAIN operates at the page/block/die level (i.e. parity is not only generated for the user data but all data in the drive). A stripe ratio of 11:1 basically means that every twelfth bit is a parity bit and thus there are eleven data bits in every twelve bits. In other words, out of 192GiB of raw NAND only 176GiB is usable by the controller to store data. Out of that 120GB (~112GiB) is accessible by the user, which leaves 64GiB for over-provisioning. Divide that by the total NAND capacity (192GiB) and you should get the same 33.5% figure for effective over-provisioning as I did.
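
For the curious, here is that worked example as a few lines of Python (a sketch of the calculation above, not Micron’s own accounting):

    GIB = 2 ** 30

    raw_nand = 192 * GIB              # raw NAND behind the 120GB model
    after_rain = raw_nand * 11 / 12   # 11:1 stripe: one parity bit per twelve bits
    user_capacity = 120 * 1000 ** 3   # 120 GB decimal, roughly 112 GiB
    spare = after_rain - user_capacity

    print(round(spare / raw_nand, 3))   # 0.335 -> the ~33.5% effective over-provisioning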

Before we get into the actual tests, we would like to thank the following companies for helping us with our 2014 SSD testbed.

Samsung SSD 850 Pro (128GB, 256GB & 1TB) Review: Enter the 3D Era

Over the last three years, Samsung has become one of the most dominant players in the SSD industry. Samsung’s strategy has been tight vertical integration ever since the beginning, which gives Samsung the ability to be in the forefront of new technologies. That is a massive advantage because ultimately all the parts need to be designed and optimized to work properly together. The first fruit of Samsung’s vertical integration was the SSD 840, which was the first mass produced SSD to utilize TLC NAND and gave Samsung a substantial cost advantage. Even today, the SSD 840 and its successor, the 840 EVO, are still the only TLC NAND based SSDs shipping in high volume. Now, two years later, Samsung is doing it again with the introduction of the SSD 850 Pro, the world’s first consumer SSD with 3D NAND.

For years it has been known that the scalability of traditional NAND is coming to an end. Every die shrink has been more difficult than the previous one, as endurance and performance have decreased with every node, making it less and less efficient to scale the size down. Scaling below 20nm was seen as a major obstacle, but the industry was able to cross that with some clever innovations in NAND design. However, the magic hat is now running out of tricks and a more significant change to the NAND design is required to keep scaling costs down.

The present solution to the scalability problem is 3D NAND, or V-NAND as Samsung calls it. Traditionally NAND and other semiconductors are scaled horizontally along the X and Y axes but due to the laws of physics, there is a limit of how small the transistors can be made. To solve the problem, 3D NAND introduces a Z-axis i.e. a vertical dimension. Instead of cramming transistors horizontally closer and closer to each other, 3D NAND stacks layers of transistors on top of each other. I will be going through the structure and characteristics of 3D NAND in detail over the next few pages.

By stacking transistors (i.e. cells when speaking about NAND) vertically, Samsung is able to relax the process node back to a much more convenient 40nm. When there are 32 cells on top of each other, it is obvious that there is no need for a 10nm-class node because the stacking increases the density, allowing production costs to scale lower. As we have seen with the history of NAND die shrinks, a higher process node provides more endurance and higher performance, which is what the 850 Pro and V-NAND is all about.

Fundamentally the only change in the 850 Pro is the switch to V-NAND. The interface is still SATA 6Gbps and the controller is the same triple-core MEX from the 840 EVO, although I am still waiting to hear back from Samsung whether the clock speed is the same 400MHz. The firmware, on the other hand, has gone through a massive overhaul to adopt the characteristics of V-NAND. With shorter read, program and erase latencies and higher endurance, the firmware needs to be properly optimized or otherwise the full benefits of V-NAND cannot be utilized.

I bet many of you would have liked to see the 850 Pro move to the PCIe interface but I understand Samsung’s decision to hold off with PCIe for a little while longer. The market for aftermarket PCIe SSDs is still relatively small as the PC industry is figuring out how to adopt the new interface, so for the time being Samsung is fine with watching from the side. The XP941 is and will continue to be available to the PC OEMs but for now Samsung will be keeping it that way. From what I have heard, Samsung could bring the XP941 to the retail market rather quickly if needed but Samsung has always been more interested in the high volume mainstream market instead of playing in the niches.

The performance figures in the table above give us the first glimpse of what V-NAND is capable of. Typically modern 128GB SSDs are only good for about 300MB/s but the 850 Pro is very close to saturating the SATA 6Gbps bus even at the smallest capacity. This is due to the much lower program times of V-NAND because write performance has been bound by NAND performance for quite some time now.

The other major improvement from V-NAND is the endurance. All capacities, including the smallest 128GB, are rated at 150TB, which is noticeably higher than what any other consumer-grade SSD offers. Moreover, Samsung told me that the endurance figure is mainly meant to separate the 850 Pro from the enterprise drives to guide enterprise clients to the more appropriate (and expensive) drives as the 850 Pro does not have power loss protection or end-to-end data protection for example. However, I was told that the warranty is not automatically denied if 150TB is reached under a client workload. In fact, Samsung said that they have a 128GB 850 Pro in their internal testing with over eight petabytes (that is 8,000TB) of writes and the drive still keeps going, so I tip my hat to the person who is able to wear out an 850 Pro in a client environment during my lifetime.

Another interesting aspect of V-NAND is its odd capacity per die. Traditionally NAND capacities have come in powers of two, such as 64Gbit and 128Gbit, but with V-NAND Samsung is putting an end to that trend. The second generation 32-layer V-NAND comes in at 86Gbit, or 10.75GB if you prefer the gigabyte form. I will be covering the reason behind that in more detail when we look at V-NAND more closely in the next few pages, but as far as I know there has never been a strict rule as to why the capacities have scaled in powers of two. I believe it is just a relic from the old days that has stayed in the memory industry because deep down binary is based on powers of two, but the abnormal die capacity should have no effect on the operation of the NAND or the SSD as long as everything is optimized for it.

Due to the odd die capacity, the die configurations are also quite unusual. I found two different capacity packages inside my review samples and with Samsung’s NAND part decoder I was able to figure out the die configurations for each capacity. Unfortunately, Samsung did not send us the 512GB model and I could not get the 128GB model open, as Samsung uses pentalobe screws and I managed to wear one out while trying to open it with an inappropriate screwdriver (it worked for the other models, though), so there are question marks at those capacities in the table. However, this should not impact the raw NAND capacities as long as all capacities follow the same 7.6% over-provisioning trend, but the package configurations may be different. I will provide an update once I receive confirmation from Samsung regarding the exact configurations for each capacity.

The 850 Pro also switches to smaller PCB designs. The PCB in the 1TB model occupies around two thirds of the chassis area, while the 256GB PCB is smaller still. The reason for the different PCB sizes is the number of NAND packages: the 256GB model only has four, whereas eight NAND packages are required to reach 1TB.