Posts Tagged ‘Database’

Playing with Neo4J

May 14th, 2008

Almost all of the apps I build rely heavily on graphs: not pretty charts, but networks of people, places, things, or concepts that relate to each other in different ways. I’ve been looking for alternatives to standard table-oriented relational databases, and the most interesting project I’ve found so far is Neo4J.

The concept behind Neo4J is simple, and it’s relatively easy to get started with if you’re comfortable with Java. The basic semantic concepts you work with are nodes, relationships, and properties. For example, Peat (node) is friends (relationship) with Howard (node). Nodes and relationships both support freeform key:value properties, so I could set a birthday on the Howard node, or a note about how we met on the Friend relationship. Very simple and flexible.
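To make the node / relationship / property idea concrete, here is a rough plain-Java sketch of that data model. It deliberately does not use the Neo4J API: the `Node` and `Relationship` classes, the `relate` helper, and the property values are all made up for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class PropertyGraphSketch {
    /** A node: a named thing with freeform key:value properties. (Hypothetical class.) */
    static class Node {
        final String name;
        final Map<String, Object> properties = new HashMap<>();
        final List<Relationship> relationships = new ArrayList<>();

        Node(String name) {
            this.name = name;
        }
    }

    /** A typed link between two nodes that carries its own properties. (Hypothetical class.) */
    static class Relationship {
        final Node start;
        final Node end;
        final String type;
        final Map<String, Object> properties = new HashMap<>();

        Relationship(Node start, String type, Node end) {
            this.start = start;
            this.type = type;
            this.end = end;
        }
    }

    /** Creates a relationship and registers it on both of its nodes. */
    static Relationship relate(Node start, String type, Node end) {
        Relationship r = new Relationship(start, type, end);
        start.relationships.add(r);
        end.relationships.add(r);
        return r;
    }

    public static void main(String[] args) {
        Node peat = new Node("Peat");
        Node howard = new Node("Howard");
        howard.properties.put("birthday", "placeholder date");    // illustrative value only

        Relationship friends = relate(peat, "FRIEND", howard);
        friends.properties.put("note", "placeholder: how we met"); // illustrative value only

        System.out.println(peat.name + " -[" + friends.type + "]-> " + howard.name);
    }
}
```

The point of the sketch is only that properties hang off relationships as well as nodes, which is the part a plain join table tends to make awkward.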

The real power in Neo4J is in its traversal system. Complex graphs are pretty useless without being able to pull information out of them, and SQL-based systems pretty much suck at handling complex queries through nested or recursive structures. Neo4J’s “traversers” are much simpler to build, and feel pretty darned quick.
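As a rough illustration of what a traversal does, here is a minimal breadth-first walk over a friendship graph in plain Java. This is not Neo4J's traverser API; the `traverse` method, the adjacency-map representation, and the extra names are assumptions made for the sketch. It answers the kind of "friends within N hops" question that takes a self-join per hop in SQL.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class TraversalSketch {
    /**
     * Breadth-first walk from start. Returns everyone reachable within
     * maxDepth hops, in discovery order, excluding the start itself.
     */
    static List<String> traverse(Map<String, List<String>> friends, String start, int maxDepth) {
        List<String> found = new ArrayList<>();
        Set<String> visited = new HashSet<>();
        visited.add(start);

        Deque<String> frontier = new ArrayDeque<>();
        frontier.add(start);

        for (int depth = 0; depth < maxDepth && !frontier.isEmpty(); depth++) {
            Deque<String> next = new ArrayDeque<>();
            for (String person : frontier) {
                for (String friend : friends.getOrDefault(person, Collections.emptyList())) {
                    if (visited.add(friend)) { // add() returns true only for unseen names
                        found.add(friend);
                        next.add(friend);
                    }
                }
            }
            frontier = next;
        }
        return found;
    }

    public static void main(String[] args) {
        // Hypothetical data: Peat - Howard - Ana - Li
        Map<String, List<String>> friends = new HashMap<>();
        friends.put("Peat", Arrays.asList("Howard"));
        friends.put("Howard", Arrays.asList("Peat", "Ana"));
        friends.put("Ana", Arrays.asList("Howard", "Li"));

        System.out.println(traverse(friends, "Peat", 2)); // prints [Howard, Ana]
    }
}
```

Changing the depth from 2 to 3 is a one-argument change here, whereas the SQL version needs another join.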

What I’m really enjoying is the simplicity of the system. The API only describes 12 classes, and there are only 3 or 4 you need to be familiar with. The jar file weighs in at under half a meg. The “hello world” example is readable straight out of the gate. There’s even a command line shell for exploring your data. Simple concepts. Easy to learn. Small footprint.

Pretty cool stuff.

If your work involves designing and building systems that rely heavily on relationships, Neo4J is definitely worth checking out. Caveats? As of May 13th, it’s still in beta — but it looks like the 1.0 release is around the corner.

Eventual Consistency, Explained

December 20th, 2007

Werner Vogels, the CTO at Amazon, has a great post about the contentious idea of “eventual consistency” for the new SimpleDB service. The idea that a database could be inconsistent is a little disconcerting to a lot of people — after all, inconsistent means unpredictable, and that just doesn’t fly for us deterministic computer people. Right?

Well, “eventual consistency” isn’t entirely unpredictable. And it has its benefits, especially when it means avoiding locking on highly concurrent read and write operations. That’s exactly what SimpleDB was designed to do. To quote Vogels:

“Inconsistency can be tolerated for two reasons: for improving read and write performance under highly concurrent conditions and for handling partition cases where a majority model would render part of the system unavailable even though the nodes are up and running.

“Whether or not inconsistencies are acceptable depends on the client application. A specific popular case is a website scenario in which we can have the notion of user-perceived consistency; the inconsistency window needs to be smaller than the time expected for the customer to return for the next page load. This allows for updates to propagate through the system, before the next read is expected.”

(from “Eventually Consistent”)
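The “inconsistency window” in that quote can be pictured with a toy model. The sketch below is not how SimpleDB works internally. It is a single in-memory replica that only sees a write after a fixed delay, driven by a caller-supplied logical clock instead of real time. The class name, the `windowMillis` parameter, and the assumption that writes arrive in time order are all invented for illustration.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

public class EventualConsistencySketch {
    /** A write that has been accepted but is not yet visible to readers. */
    private static class PendingWrite {
        final String key;
        final String value;
        final long visibleAt;

        PendingWrite(String key, String value, long visibleAt) {
            this.key = key;
            this.value = value;
            this.visibleAt = visibleAt;
        }
    }

    private final Map<String, String> replica = new HashMap<>();
    private final Deque<PendingWrite> inFlight = new ArrayDeque<>();
    private final long windowMillis;

    EventualConsistencySketch(long windowMillis) {
        this.windowMillis = windowMillis;
    }

    /** Accepts a write at logical time now; assumes calls arrive in time order. */
    void write(String key, String value, long now) {
        inFlight.add(new PendingWrite(key, value, now + windowMillis));
    }

    /** Reads at logical time now, first applying writes whose window has passed. */
    String read(String key, long now) {
        while (!inFlight.isEmpty() && inFlight.peek().visibleAt <= now) {
            PendingWrite w = inFlight.poll();
            replica.put(w.key, w.value);
        }
        return replica.get(key);
    }

    public static void main(String[] args) {
        EventualConsistencySketch store = new EventualConsistencySketch(500);
        store.write("cart", "1 item", 0);

        System.out.println(store.read("cart", 100));  // inside the window: prints null
        System.out.println(store.read("cart", 2000)); // after the window: prints 1 item
    }
}
```

If the second read stands in for the customer's next page load, the user never notices the stale period as long as the window is shorter than the gap between page loads.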

SimpleDB was intentionally designed to behave this way, which means it certainly wasn’t built to replace traditional ACID relational databases for all scenarios. If you think about how often you require immediate consistency in your web applications, you’ll likely find that a very significant portion of your database interactions don’t.

My biggest concern about SimpleDB isn’t consistency or relationships; it’s latency. SimpleDB queries from outside of the Amazon cloud won’t be fast enough to feed sites that require more than a couple of queries per page, unless those queries can be executed in parallel, which isn’t an easy option in single-threaded web environments (PHP, Rails, etc.).

I’m excited to see how it operates with parallel queries, though. If an application is built to make dozens of queries simultaneously, rather than sequentially, the performance could be excellent.
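The sequential-versus-simultaneous difference can be sketched with the standard `java.util.concurrent` classes. This is only an illustration under stated assumptions: `fakeQuery` is a made-up stand-in that sleeps to mimic network latency and never touches SimpleDB, and `queryAll` is a hypothetical helper name.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelQuerySketch {
    /** Stand-in for a remote call: sleeps to mimic latency, then returns a fake result. */
    static String fakeQuery(String query, long latencyMillis) throws InterruptedException {
        Thread.sleep(latencyMillis);
        return "result:" + query;
    }

    /** Runs every query on its own thread and returns the results in input order. */
    static List<String> queryAll(List<String> queries, long latencyMillis) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, queries.size()));
        try {
            List<Callable<String>> tasks = new ArrayList<>();
            for (String q : queries) {
                tasks.add(() -> fakeQuery(q, latencyMillis));
            }
            List<String> results = new ArrayList<>();
            // invokeAll blocks until all tasks finish and preserves task order.
            for (Future<String> f : pool.invokeAll(tasks)) {
                results.add(f.get());
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<String> queries = Arrays.asList("user", "friends", "photos", "messages");

        long started = System.nanoTime();
        List<String> results = queryAll(queries, 200);
        long elapsedMillis = (System.nanoTime() - started) / 1_000_000;

        System.out.println(results);
        System.out.println("elapsed ms: " + elapsedMillis);
    }
}
```

With one thread per query, the total wait should be close to the slowest single query instead of the sum of all of them, though a simulated sleep says nothing about how the real service behaves under load.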

I have a little Java toolkit for querying web services in parallel, and I’m itching to unleash it on SimpleDB. All this hot air blowing isn’t worth much without real numbers, right? :)