
Ask Ars: Of solid state drives and garbage collection

What is this, the '90s? Ask Ars is back! After a long hiatus, the newly …

Welcome to the re-launch of Ask Ars, brought to you by CDW! 

Re-launch, you ask? Why, yes! Ask Ars was one of the first features of the newly born Ars Technica back in 1998. Ask Ars is all about your questions and our community's answers. Each week, we'll dig into our bag of questions, answer a few based on our own know-how, and then comes the best part: we turn to the community for your take.

To launch, we reached out to some of our geekiest friends to solicit their burning questions. Without further ado, let's dive into our first question. Don't forget to send us your questions, too! To submit your question, see our helpful tips page.

Let's get started with a question that was unthinkable in 1998!

Q: I've heard that some SSD controllers do "garbage collection" while others don't. Is this really that big of a deal, and if so, which controllers should I be on the lookout for?

To begin with, an SSD that doesn't do garbage collection would be like an elevator that only goes up—that is, it would never delete anything. However, some drives are able to do it more quickly than others, and some engage in a process called "idle garbage collection" that distributes the workload across periods of inactivity. But before we get into that, we'll take a minute to describe how and why an SSD does garbage collection, and why a drive that does only that would be a weak one indeed.

Solid state drives have two hangups that force them to deal with data differently than hard disk drives do: they can only erase data in much larger chunks (blocks) than they can write it (pages), and their storage cells can only be written a certain number of times (around 10,000 cycles is a commonly quoted figure) before they start to fail. This makes tasks like modifying files much harder for SSDs than HDDs.

Say, for example, you have a 4KB saved document that you modify and resave. When the document is sprinkled in among other data, an HDD just magnetizes the same spot on the platter in a slightly different way. A solid state drive, on the other hand, can only erase large chunks of data at once, usually between 512KB and 2MB depending on the drive. So to accomplish the same task in place, a solid state drive would have to copy everything in the 512KB block that isn't the document to memory, erase the entire 512KB block the document sits in, and then rewrite all of it along with the new version of the document.
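To put rough numbers on that example, here is a minimal sketch of the arithmetic. The 4KB document and 512KB erase block come from the scenario above; the function name and the "write amplification" ratio (bytes physically written per byte the user changed) are labels added here for illustration, and real drives will differ.

```python
KB = 1024
DOC_SIZE = 4 * KB        # the resaved document from the example
ERASE_BLOCK = 512 * KB   # smallest unit this hypothetical drive can erase


def read_modify_write_cost(doc_size, erase_block):
    """Bytes moved when a whole erase block is rewritten to change one document."""
    preserved = erase_block - doc_size  # neighboring data copied out to memory first
    return {
        "bytes_read": preserved,
        "bytes_erased": erase_block,
        "bytes_written": erase_block,   # neighbors plus the new document
        "write_amplification": erase_block / doc_size,
    }


cost = read_modify_write_cost(DOC_SIZE, ERASE_BLOCK)
print(cost["bytes_read"])           # 520192
print(cost["write_amplification"])  # 128.0
```

In this idealized case, changing 4KB means physically rewriting 128 times as much data, which is why drives try hard not to work this way.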

As is, that's straightforward garbage collection: recognizing that a file is old and invalid, removing it, and rewriting it with good data (many drives will collect little files to modify and write in big chunks, but the idea is the same). Generally, drives with higher read and write speeds and larger caches will do this on-the-fly more quickly. But handling data as described above significantly reduces the speed SSDs are known for, because reading, modifying, and rewriting is much slower than a simple write.

SSDs will also avoid writing to the same cell twice in a row because doing so would wear the cells down unevenly, so they won't do immediate read-modify-writes, especially to the same cell, unless they're completely out of space. To make sure they have less-worn space available for a new file, and to avoid slowing themselves down with reading and modifying during writes, most SSDs engage in "idle" or "background" garbage collection, which I think is what you meant to ask about.
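The wear-avoidance idea can be sketched as a simple placement rule: among the blocks that are currently free, pick the one that has been erased the fewest times. The function name, block ids, and erase counts below are all made up for illustration; actual controllers use more elaborate (and proprietary) policies.

```python
def pick_least_worn(erase_counts, free_blocks):
    """Return the free block with the fewest erases (ties go to the lower id)."""
    return min(free_blocks, key=lambda block: (erase_counts[block], block))


# Hypothetical erase counts per block id, and the set of blocks currently free.
erase_counts = {0: 5, 1: 2, 2: 9, 3: 2}
free_blocks = {0, 1, 3}

print(pick_least_worn(erase_counts, free_blocks))  # 1
```

Block 2 is never considered because it isn't free, and block 0 loses out because it has already been erased more often than blocks 1 and 3.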

When a drive does idle garbage collection, instead of reading, modifying, and writing a cell on the spot, it just marks the location of the old data as invalid, to be reclaimed later, and writes the good data to an empty spot on another cell. Then, when the drive is not doing anything, it goes through its backlog of invalid data and cleans up those locations. Usually, drives prioritize cells with the most invalid data and move the valid data around until it's reasonably compact (idle garbage collection is similar in this way to defragmenting HDDs), continuing until a certain percentage of the drive is free space.
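The process described above can be sketched as a toy simulation. Everything here is an assumption made for illustration: the `ToySSD` class, the 4-block by 4-page geometry, the "most invalid pages first" victim choice, and the `target_free_fraction` parameter are not taken from any real controller's firmware.

```python
FREE, VALID, INVALID = "free", "valid", "invalid"


class ToySSD:
    def __init__(self, num_blocks=4, pages_per_block=4):
        self.pages_per_block = pages_per_block
        # blocks[b][p] holds (state, logical address or None)
        self.blocks = [[(FREE, None)] * pages_per_block for _ in range(num_blocks)]
        self.mapping = {}  # logical address -> (block, page)
        self.erases = 0
        self.gc_copies = 0

    def _find_free(self, skip=None):
        for b, block in enumerate(self.blocks):
            if b == skip:
                continue
            for p, (state, _) in enumerate(block):
                if state == FREE:
                    return b, p
        raise RuntimeError("no free page (this toy has no on-the-fly GC)")

    def write(self, lba, skip=None):
        """Write or rewrite a logical page out of place."""
        if lba in self.mapping:
            old_b, old_p = self.mapping[lba]
            self.blocks[old_b][old_p] = (INVALID, None)  # old copy becomes garbage
        b, p = self._find_free(skip)
        self.blocks[b][p] = (VALID, lba)
        self.mapping[lba] = (b, p)

    def count(self, state):
        return sum(s == state for block in self.blocks for s, _ in block)

    def _invalid_in(self, b):
        return sum(s == INVALID for s, _ in self.blocks[b])

    def idle_gc(self, target_free_fraction=0.5):
        """Reclaim blocks until the target share of pages is free."""
        total = len(self.blocks) * self.pages_per_block
        while self.count(FREE) / total < target_free_fraction:
            # Greedy choice: the block holding the most invalid pages.
            victim = max(range(len(self.blocks)), key=self._invalid_in)
            if self._invalid_in(victim) == 0:
                break  # nothing left to reclaim
            for state, lba in list(self.blocks[victim]):
                if state == VALID:
                    self.write(lba, skip=victim)  # relocate still-good data
                    self.gc_copies += 1
            self.blocks[victim] = [(FREE, None)] * self.pages_per_block
            self.erases += 1


ssd = ToySSD()
for lba in range(8):        # fill two blocks with fresh data
    ssd.write(lba)
for lba in (0, 1, 2, 4):    # rewrite four pages; the old copies become invalid
    ssd.write(lba)
print(ssd.count(FREE), ssd.count(INVALID))  # 4 4

ssd.idle_gc()
print(ssd.count(FREE), ssd.count(INVALID), ssd.erases, ssd.gc_copies)  # 8 0 2 4
```

In this run the rewrites themselves never trigger an erase; the cleanup happens later, costing two block erases and four extra page copies. Raising `target_free_fraction` makes the loop reclaim more space at the price of more copying and erasing.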

What varies in SSDs is how aggressive the idle garbage collection is—how much space it aims to free up during idle times, and how much rearranging it will do until it's satisfied with how compact the data is. More aggressive usually means more cell writes, but if you're not doing a ton of file modifications, this isn't a big concern, and your drives will still last you a few years. If you're doing some kind of enterprise work with your drives where they're in use all the time, this style of garbage collection won't work for you.

As far as specific model recommendations go, SandForce controllers, particularly the SF-1200, are well-known for their efficient and aggressive idle garbage collection. They're used in a wide array of SSD models, but tend toward the expensive side. Indilinx's Barefoot Martini controller isn't quite as good and slows down a bit when drives are fuller and older, but is less pricey. If you're looking at older castoff SSDs to rig up, avoid drives with the JMicron 601 and 602 controllers. They had tiny caches and performed very poorly once they had to start doing the three-card monte of regular garbage collection on drives with a lot of stored data.

Though you didn't specifically ask about it, there's one more feature included on some SSDs that bears mentioning here. While SSDs have no trouble marking data that's been rewritten elsewhere for cleaning, they can't do the same for data that's been flat-out deleted. If the 4KB document from before is dragged to the trash and deleted, the SSD doesn't automatically register that, even though the OS does; instead, it will keep moving that deleted file around with other valid files during idle garbage collection as if it were valid data until the OS assigns a new saved file to that spot. This means that if you delete a lot of files but don't replace them, your SSD can waste a lot of time and energy moving that dead weight around.

TRIM is a command that an OS can issue to help the SSDs realize they have this deleted data problem and fix it. An OS with TRIM support is able to tell an SSD to mark a deleted file for cleaning during idle garbage collection, even before new data has been assigned to that place, so that it doesn't get kept around for an inordinate amount of time. TRIM can be a big speed-saver, but it's only supported natively in Windows 7 and newer versions of Linux; Windows XP and Vista users are out of luck. Apple is said to be working on TRIM support, and while it may make it into OS X Lion, Apple doesn't appear to feel any sense of urgency about it. Still, it's a feature worth looking out for if you're running Windows 7 and are in the market for a new SSD. Hopefully support for it will grow in the coming years.
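A small sketch of why that matters during cleanup: before erasing a block, the drive has to copy out every page it still believes is valid. The function name and the logical addresses below are invented for illustration, and this models only the bookkeeping, not the actual command a drive receives.

```python
def pages_to_copy(block_lbas, trimmed):
    """Pages the drive must relocate before it can erase a block.

    block_lbas: logical addresses the drive holds in this block
    trimmed:    addresses the OS has reported as deleted via TRIM
    """
    return [lba for lba in block_lbas if lba not in trimmed]


block = [10, 11, 12, 13]      # hypothetical contents of one block
deleted_by_os = {11, 12, 13}  # files the user has deleted

without_trim = pages_to_copy(block, trimmed=set())
with_trim = pages_to_copy(block, trimmed=deleted_by_os)

print(len(without_trim))  # 4
print(with_trim)          # [10]
```

Without TRIM, the drive dutifully copies all four pages even though three belong to deleted files; with it, only the one live page has to move.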

So unless you live in the Sideways Stories from Wayside School universe where there are elevators that only go up, you shouldn't worry about accidentally buying a solid state drive without garbage collection, because they don't exist. But to find a drive with good garbage collection on-the-fly, look for ones with high read and write speeds, both random and sequential, and a large cache (but again, real-time garbage collection would only kick in if your drive is completely full). If you're going to be doing a lot of file rewriting and will be able to give the drive some downtime, drives with controllers that do aggressive idle garbage collection are better, and drives and OSes with TRIM support are best.
