Posts Tagged ‘SimpleDB’

The Joy of SimpleDB

September 23rd, 2009

Amazon’s SimpleDB is one of the hardest of their services to understand, despite being one of the simplest.  I think the difficult part is getting over the “database” in the description — we’re prone to start comparing it with the relational databases we work with every day, and unfortunately that’s not a reasonable comparison.

Think of it this way:  SimpleDB is a big hash of hashes that’s web accessible.  That’s basically it.  You get to store arbitrary sets of key-value pairs, each with its own unique identifier, in a big bucket in the sky.
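To make the "hash of hashes" picture concrete, here is a minimal in-memory sketch.  The names (`store`, `put_item`, `get_item`) and the sample attributes are made up for illustration; this is not the SimpleDB API, just the shape of the data.

```python
# A toy, in-memory stand-in for the "hash of hashes" idea.
# All names here are hypothetical; this is not the SimpleDB API.
store = {}

def put_item(item_id, attributes):
    """Store (or merge into) the set of key-value pairs kept under item_id."""
    store.setdefault(item_id, {}).update(attributes)

def get_item(item_id):
    """Return a copy of the key-value pairs for item_id, or {} if absent."""
    return dict(store.get(item_id, {}))

# Each item carries its own arbitrary set of attributes; there is no fixed schema.
put_item("page-001", {"url": "/index.html", "load_ms": "412"})
put_item("page-002", {"url": "/logo.png", "load_ms": "38", "type": "image"})
put_item("page-001", {"status": "200"})  # add an attribute to an existing item

print(get_item("page-001"))
```

Note that the two items don't share the same set of keys, which is the main difference from a row in a relational table.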

Huh.  Interesting.  So where would you use SimpleDB instead of a traditional relational database?

There are two things that SimpleDB handles incredibly well: concurrency, and accessibility.

SimpleDB is designed to stay responsive to queries even when you’re pumping it full of records.  Logging and analysis is a natural fit, and BrowserMob’s monitoring service is a great example.  They poke your site every few minutes to ensure it’s responding, and they don’t just check that your web server is alive: they also monitor the load time for every object in the page (images, CSS files, etc.).  You can check the status and compare responses over time to see how your site is doing as it gets more popular, and as you change pieces under the hood.  The data from all of the sites they monitor is pumped into SimpleDB, and the results are available for their customers to see in real time.[1]

That level of concurrency, of accepting a substantial number of simultaneous writes while querying the complete data set, is hard to do, especially when every piece of the data is being indexed.  Most relational databases fall over pretty quickly, but SimpleDB keeps on ticking.

The second thing that SimpleDB excels at is being accessible.  It’s available to any device that can talk to the web: your computer, of course, but also phones (even the cheap ones), game consoles (portables too), DVD players, information kiosks, environmental monitors, toasters, etc.  Instead of building a web service around a traditional database, you can save a lot of time, energy, and frustration by using SimpleDB as queryable, web-accessible data storage.[2]

… But only if your data model works with SimpleDB, of course.  There are some drawbacks, like eventual consistency, no transactions, and weak constraints, that make it difficult or impossible to use for many applications.  Nevertheless, it’s an important tool to have in your toolbox: a complement rather than competition to traditional RDBMSes.

Last week I gave a presentation at “The Act of Making Clouds” on SimpleDB.  I touched on some of these subjects, and although there isn’t an audio track, you’re welcome to check out the deck, below.

1. “real time” defined as within a few seconds; eventual consistency is exactly that.
2. Also check out CouchDB. It’s a web accessible hash of hashes that you can manage yourself!

Eventual Consistency, Explained

December 20th, 2007

Werner Vogels, the CTO at Amazon, has a great post about the contentious idea of “eventual consistency” for the new SimpleDB service. The idea that a database could be inconsistent is a little disconcerting to a lot of people — after all, inconsistent means unpredictable, and that just doesn’t fly for us deterministic computer people. Right?

Well, “eventual consistency” isn’t entirely unpredictable. And it has its benefits, especially when it means avoiding locking on highly concurrent read and write operations. That’s exactly what SimpleDB was designed to do. To quote Vogels:

“Inconsistency can be tolerated for two reasons: for improving read and write performance under highly concurrent conditions and for handling partition cases where a majority model would render part of the system unavailable even though the nodes are up and running.

“Whether or not inconsistencies are acceptable depends on the client application. A specific popular case is a website scenario in which we can have the notion of user-perceived consistency; the inconsistency window needs to be smaller than the time expected for the customer to return for the next page load. This allows for updates to propagate through the system, before the next read is expected.”

(from “Eventually Consistent”)

SimpleDB was intentionally designed to behave this way, which means it certainly wasn’t built to replace traditional ACID relational databases for all scenarios. If you think about how often you require immediate consistency in your web applications, you’ll likely find that a very significant portion of your database interactions don’t.
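The "inconsistency window" Vogels describes can be sketched with a toy store whose writes only become readable after a delay. This is a simulation under stated assumptions, not how SimpleDB is implemented: `LaggyStore`, its logical clock, and the `lag` value are all hypothetical.

```python
class LaggyStore:
    """Toy store where a write becomes visible to reads only after `lag` ticks."""

    def __init__(self, lag):
        self.lag = lag
        self.now = 0        # logical clock, advanced by tick()
        self.visible = {}   # what readers can currently see
        self.pending = []   # (visible_at, key, value) writes still propagating

    def write(self, key, value):
        self.pending.append((self.now + self.lag, key, value))

    def tick(self, steps=1):
        """Advance the clock and apply any writes whose window has passed."""
        self.now += steps
        still_pending = []
        for visible_at, key, value in self.pending:
            if visible_at <= self.now:
                self.visible[key] = value
            else:
                still_pending.append((visible_at, key, value))
        self.pending = still_pending

    def read(self, key):
        return self.visible.get(key)


store = LaggyStore(lag=2)
store.write("cart:42", "3 items")
immediately = store.read("cart:42")  # still inside the inconsistency window
store.tick(2)                        # time passes before the next page load
next_page = store.read("cart:42")    # the update has propagated by now
print(immediately, next_page)
```

As long as the window (`lag`) is shorter than the time between a user's page loads, the user never notices the gap, which is the "user-perceived consistency" in the quote.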

My biggest concern about SimpleDB isn’t consistency or relationships, it’s latency. SimpleDB queries from outside of the Amazon cloud won’t be fast enough to feed sites that require more than a couple of queries per page — unless those queries can be executed in parallel, which isn’t an easy option in single-threaded web environments (PHP, Rails, etc.).

I’m excited to see how it operates with parallel queries, though. If an application is built to make dozens of queries simultaneously, rather than sequentially, the performance could be excellent.
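Here is a rough sketch of why parallel queries matter when latency dominates. It uses Python threads and a fake query that just sleeps; the 50 ms latency and the `fake_query` function are assumptions for illustration, not measurements of SimpleDB.

```python
import time
from concurrent.futures import ThreadPoolExecutor

LATENCY_S = 0.05  # assumed round-trip time per query, for illustration only

def fake_query(n):
    """Stand-in for one remote query; sleeps to imitate network latency."""
    time.sleep(LATENCY_S)
    return f"result-{n}"

def run_sequential(count):
    # Each query waits for the previous one, so latencies add up.
    return [fake_query(n) for n in range(count)]

def run_parallel(count):
    # One worker per query so every request is in flight at the same time.
    with ThreadPoolExecutor(max_workers=count) as pool:
        return list(pool.map(fake_query, range(count)))

def timed(fn, count):
    start = time.perf_counter()
    results = fn(count)
    return results, time.perf_counter() - start

seq_results, seq_time = timed(run_sequential, 10)
par_results, par_time = timed(run_parallel, 10)
print(f"sequential: {seq_time:.2f}s, parallel: {par_time:.2f}s")
```

With ten queries, the sequential version pays the latency ten times over, while the parallel version pays roughly one round trip plus thread overhead. Real numbers against the actual service would of course depend on the network and on the service itself.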

I have a little Java toolkit for querying web services in parallel, and I’m itching to unleash it on SimpleDB. All this hot air blowing isn’t worth much without real numbers, right? :)

Amazon SimpleDB

December 14th, 2007

Amazon will soon be releasing their SimpleDB service under a limited beta program.

I’m very excited about this. Persistent, high performance databases are a big missing piece in Amazon’s cloud computing initiative — EC2 doesn’t offer storage that persists across reboots, and S3 isn’t structured to provide the IO required by a database.

Conceptually, SimpleDB is very compelling. It’s designed for real time querying, has no hard limits on storage, and is metered based on storage and CPU time. It looks a lot like Amazon’s Dynamo technology … and it wouldn’t surprise me if they released the Dynamo paper to gauge interest in exposing such a service.

But, there are three big caveats.

1. It’s not SQL. This isn’t actually as big a deal as it seems, but I know there are going to be a lot of people who are bent out of shape on this one. Why isn’t it a big deal? Because …

2. It’s not relational. SimpleDB provides a big flat table, with arbitrary attributes per row. So, queries are all about filtering through data, and while they can have very complex rules, it doesn’t behave like the “normal” relational databases we’re accustomed to using.

3. Updates are “eventually consistent.” This means that if you immediately query for data you just pushed into SimpleDB, it may not show up. You have a guarantee that it will show up within a few seconds, but not immediately. Amazon calls this “eventual consistency.”
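The "big flat table, queried by filtering" idea can be sketched in a few lines. This is only an illustration of the query style, not SimpleDB's query language: the `items` data, the `select` helper, and the attribute names are all made up.

```python
# Hypothetical rows in one flat "table": each item has its own set of attributes.
items = {
    "req-1": {"site": "example.com", "object": "/index.html", "load_ms": "412"},
    "req-2": {"site": "example.com", "object": "/logo.png", "load_ms": "38", "type": "image"},
    "req-3": {"site": "example.org", "object": "/app.css", "load_ms": "120", "type": "css"},
}

def select(items, **conditions):
    """Return the sorted ids of items whose attributes satisfy every condition.

    Each condition is a predicate called with the attribute's value;
    items that lack the attribute simply don't match.
    """
    matches = []
    for item_id, attrs in items.items():
        if all(name in attrs and test(attrs[name]) for name, test in conditions.items()):
            matches.append(item_id)
    return sorted(matches)

# No joins, no foreign keys: just filter rules applied to every item.
slow_on_example = select(
    items,
    site=lambda v: v == "example.com",
    load_ms=lambda v: int(v) > 100,
)
images = select(items, type=lambda v: v == "image")
print(slow_on_example, images)
```

Notice that `req-1` has no `type` attribute at all, and the filter just skips it rather than treating the missing value as an error.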

It may be a little scary for some folks who are most comfortable with the traditional model of building apps around a single relational database. On the other hand, it appears to be a great system for people who have built big websites, and are already comfortable dealing with lazy synchronization and custom data sources.

I’m looking forward to playing with it!

(Tip o’ the hat to @grigs)

Update: Here’s a great post that goes into a little more detail about the give and take of SimpleDB. Fun fact: it’s written in Erlang.

Update:  More stats.  Looks like their opening offering lets you create up to 100 “domains” containing up to 10 GB of data each.   That’s a good start.
