Making DynamoDB Hum
"DynamoDB is a fast, fully managed NoSQL database service that makes it simple and cost-effective to store and retrieve any amount of data, and serve any level of request traffic. All data items are stored on Solid State Drives (SSDs), and are replicated across 3 Availability Zones for high availability and durability."
DynamoDB is interesting because of its ability to scale. Let us talk about the bits of DynamoDB relevant at scale.
Keys
Imagine our DynamoDB table is a giant hash. We call each key of that hash the hash key — clever, eh? When we query or get an item from DynamoDB we must know (or be able to deduce—more on this later) its exact hash key.
It is possible to configure DynamoDB such that there is one item—an item is a hash—per hash key. That is boring. We will not talk about that.
Instead imagine that our giant hash is full of arrays that are full of items. These arrays are indexed by each item’s range key and can be accessed using the Query API.
Consider a table containing Articles. We can use the UserId value of the article as the hash key and the UpdatedAt value as the range key. We can now use the Query API to efficiently retrieve a user’s articles for the last thirty days. DynamoDB goes to the hash key location, scans through the range keys (we can tell it which direction to look) and there we go.
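The giant-hash-of-arrays mental model can be sketched in plain Ruby. This is a toy illustration of how Query behaves, not the real DynamoDB client API (the `ToyTable` name is mine; `ScanIndexForward` is the real parameter that controls scan direction):

```ruby
# A toy model of a DynamoDB table: a hash keyed by hash key,
# each value a list of items kept sorted by range key.
class ToyTable
  def initialize
    @partitions = Hash.new { |h, k| h[k] = [] }
  end

  def put(hash_key, range_key, item)
    @partitions[hash_key] << item.merge(range_key: range_key)
    @partitions[hash_key].sort_by! { |i| i[:range_key] }
  end

  # Jump straight to the hash key, then scan range keys in order
  # (optionally in reverse, like ScanIndexForward: false).
  def query(hash_key, from:, to:, reverse: false)
    items = @partitions[hash_key].select { |i| (from..to).cover?(i[:range_key]) }
    reverse ? items.reverse : items
  end
end

articles = ToyTable.new
articles.put("user-1", "2013-10-01T00:00:00Z", title: "Old post")
articles.put("user-1", "2013-11-20T00:00:00Z", title: "Recent post")

# A user's articles updated in the last thirty days:
recent = articles.query("user-1",
                        from: "2013-11-01T00:00:00Z",
                        to:   "2013-12-01T00:00:00Z")
```

Note that ISO 8601 timestamps sort lexically, which is why they make good range keys.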
Indexes
One table with two indexes means we need roughly triple the write capacity compared to the table alone. Solutions that involve external indexes require management of write capacities across multiple tables and possible data inconsistency—boo—but in the next few weeks AWS will offer Global Secondary Indexes which will hopefully render such schemes obsolete. Local Secondary Indexes—additional range key indexes—are available now.
What is an “additional range key index”?! Sorry! Remember our giant hash filled with arrays filled with hashes, those arrays ordered on a range key value selected from the hashes they contain? LSIs allow us to specify additional range keys so, to add to our above example, we have UserId as the hash key and UpdatedAt as the range key—we add an LSI so we can make a second range key on Name. Now we can also query Articles for a user’s articles that have a certain name or start with ‘z’ or whatever.
Global Secondary Indexes will allow us to add additional hash keys to index our tables. If we add Name as a GSI instead of an LSI then we can query all articles by name rather than just querying articles by name for a given user.
Partitioning
The write capacity of a table is divided evenly among a number of partitions. We do not know the number of partitions but it increases/decreases as capacity is added/removed. To utilize our total provisioned write capacity we must spread writes evenly across partitions and avoid hotspots, because exceeding the capacity of any one partition will cause our table to be throttled. One approach is to choose random hash keys, but this requires that we always know the exact (random) hash key of the item we want. To get around this a hash key index (external or GSI) is added. But! That index has the same hash key constraint—indexes are just other DynamoDB tables, so the index hash key will suffer the same partitioning hotspot issues. Thus we have not really resolved the issue by choosing a random primary hash key but rather moved it elsewhere and doubled our required write capacity for the effort.
What to do now? There are two primary options. One is to use something useful, like a UserId, as the primary hash key and add something to help randomize it. For example we could prepend a random digit 0-9 to each hash key. If we are storing a key for a specific item somewhere we can store it with the random digit and use that to get the item directly. If, on the other hand, we need to query for an item and we are not sure which of the ten possible hash keys it uses, we have to query each of them (or as many as it takes) until we find it.
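The random-digit scheme is simple enough to sketch. The helper names and the shard count of ten are my own choices, matching the 0-9 example above:

```ruby
SHARD_COUNT = 10  # one shard per digit 0-9

# Prepend a random digit to the natural hash key. Whatever sharded
# key the item lands on, store that full key anywhere we reference
# the item so we can fetch it back directly.
def sharded_key(user_id)
  "#{rand(SHARD_COUNT)}.#{user_id}"
end

# When we only know the natural key, every prefixed key is a
# candidate and may have to be queried (ideally in parallel).
def candidate_keys(user_id)
  (0...SHARD_COUNT).map { |digit| "#{digit}.#{user_id}" }
end

key  = sharded_key("user-1")     # e.g. "7.user-1"
keys = candidate_keys("user-1")  # "0.user-1" .. "9.user-1"
```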
The other option is to spread writes for a given hash key out over time. Continuing with our example above, let us imagine our app can import a user’s articles from WordPress in bulk. When that job runs for a prolific blogger we are suddenly beating the snot out of that UserId hash key. Throw those writes in a queue and work on them over time and that snot-beaten spot is not so hot.
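Pacing those queued writes can be as simple as sleeping between items. A minimal sketch—`writer` stands in for the real per-item write call, and the queue would be SQS or similar in practice, a plain Array here:

```ruby
# Drain a queue of pending writes at a fixed rate so a bulk import
# does not hammer a single hash key's partition.
def drain_paced(queue, writes_per_second:, writer:)
  interval = 1.0 / writes_per_second
  until queue.empty?
    writer.call(queue.shift)
    sleep interval unless queue.empty?
  end
end

written = []
drain_paced([:a, :b, :c], writes_per_second: 1000,
            writer: ->(item) { written << item })
```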
Which approach to take (perhaps both!) and the implementation details will naturally depend on our data and usage patterns, but now we know enough to reason about the problem.
Patterns
DynamoDB’s constraints might be off-putting for a lesser data store, but DynamoDB’s speed, scalability, simplicity and cost make its constraints worth thinking about. There are interesting and useful patterns to be discovered.
For example let us go back to our Articles table example from above. We will use the UserId for the hash key again. But this time we are going to make a composite range key by prepending a SequenceId to a randomly generated UUID. [Note that foreign keys—from Dynamo or other data stores—are an interesting choice to use here instead of random UUIDs as long as they are unique within the primary hash key.] We take advantage of DynamoDB’s flexible schema and store the metadata item for a sequence—representing a Category—in position 0 and store the data items for a sequence—representing articles—in positions 1-n. We write the metadata item last to ensure consistency (potentially at the expense of some orphan items) and persist the total number of items written in it. We could also persist some sort of sequence index here, tags related to the sequence, etc. Once the metadata item is written we consider the sequence immutable. Now we can query a user’s categories by searching for all items that start with 0. From there we select a category whose articles we want; we know from the metadata item the sequence’s UUID and how many items are in it, so we batch_get them. Thus if our metadata indicates our sequence contains three items we do a batch_get for hash_key[1.uuid, 2.uuid, 3.uuid] and DynamoDB efficiently retrieves the sequence for us. Yay. We could, of course, get them one at a time or in paginated batches instead. This covers a lot of use cases and does so without using an index. Naturally an appropriate solution to the partitioning problem above must be implemented as well.
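The composite range keys for such a sequence are easy to generate. A sketch (the helper name is mine):

```ruby
require "securerandom"

# Range keys for one immutable sequence: the metadata item lives at
# position 0 and the data items at positions 1..n, all sharing the
# sequence's UUID.
def sequence_keys(uuid, item_count)
  (0..item_count).map { |position| "#{position}.#{uuid}" }
end

uuid = SecureRandom.uuid
keys = sequence_keys(uuid, 3)
# keys[0]    -> "0.<uuid>", the metadata item, written last
# keys[1..3] -> the data items to batch_get once the metadata
#               item tells us the sequence contains three items
```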
Deleting
Deleting items from DynamoDB at scale is expensive. Avoid it if possible. Prefer to rotate tables instead—a new table for each month, for example—eventually expiring or moving old tables to cold storage.
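Rotation falls out of a naming convention. A sketch, assuming monthly rotation as above (the helper name is mine):

```ruby
require "date"

# Name tables by month so expiring old data is a whole-table drop
# (or a move to cold storage) instead of item-by-item deletes.
def table_for(base, date)
  "#{base}-#{date.strftime('%Y-%m')}"
end

current = table_for("Articles", Date.new(2013, 11, 5))
```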
Backups & Reporting
If we are running at scale we are probably going to be very unhappy trying to restore a DynamoDB table of any significant size from some other media. Rotating tables helps here by keeping things small but the only real way to quickly recover from, say, an accidentally dropped table, is a hot spare table. Also if we want to analyze the data on a table, as with reporting or other data mining activity, the recommended method is to use a hot spare table and do the analysis on it so as not to impact performance of the production table.
Scan API
The Scan API iterates over each item in a DynamoDB table. Just no.
Conclusion
Whew! There is plenty more to talk about but this should be enough to get us moving in the right direction. I will write a follow up article if there is interest. Have fun! Feel free to hit me up @fuzzleonard on Twitter.
UPDATE
I wrote a Ruby gem for this, https://github.com/fuzz/moe
Chasing Mercury
Last night @kristinalford asked on Twitter
"This is meta, but how do describe that feeling trying to grasp the ungraspable only to push it further away?"
to which my response was “chasing mercury” but as I was falling asleep (in the wrong place) it occurred to me that I do not get that feeling any more. Somehow I have learned to trust my mind and that calm allows the information to wash up on shore eventually. This was not something I learned to do consciously so I tried to think of what I might have done that caused it to happen. The two things that came to mind were learning to play music and learning functional programming. As I lay there trying to remember when I crossed the threshold from chasing mercury to waiting for it to come to me I drifted off to sleep.
I woke up groggy a while later and had to go downstairs and across the house to the bed where I was supposed to be sleeping. As I walked into the kitchen I had a sudden flash of what I had been thinking about. I could not remember it except that it was interesting to me and I knew I wanted to remember. I grabbed a marker and a piece of mail that were handy, made a note of the words I could remember and then padded off to bed.
- Connection!
- Playing/learning music b/w
- chasing mercury
Woke up this morning and the whole thing was completely gone from my mind. Until after lunch, when I stumbled across the envelope with the note on it. As soon as I saw the note I remembered everything.
Because my mind had not tried to chase the mercury.
Ruby in Jails: Hardening Your Infrastructure
A number of Ruby and Rails security vulnerabilities have come to light recently. http://www.kalzumeus.com/2013/01/31/what-the-rails-security-issue-means-for-your-startup/ is an excellent article about the situation and what you should do about it. If you do not have hardened servers you should follow the given advice to the letter. If you do need to rebuild, this would be a good opportunity to migrate your infrastructure to Chef, Puppet or another tool that will allow you to easily rebuild again in the future.
But is it true that all servers will always be vulnerable? Can you not build out some servers that you can genuinely trust against common threats on an ongoing basis?
It is true that all networked servers will always be vulnerable to a suitably skilled & determined attacker. But, yes, you can build out hardened servers that are extremely difficult to compromise—they will not protect you from a crack team of NSA PhDs or, worse, a really, really smart 13-year-old—but they will be immune to automated attacks as well as determined attacks from less-than-great hackers. Anyone who can compromise a well-hardened server has the skills to be making a lot of money, so you will generally be safe from most attack vectors unless compromising your server would be worth a lot of money to someone. But hopefully if you are in banking or such this is not your first introduction to hardened servers.
The Ruby community mostly uses Linux servers. I have extensive experience with Linux but when it comes to locking things down I prefer BSD. I started using Ruby about a decade ago when I ran across portupgrade(1) which is written in Ruby and at the time was FreeBSD’s primary tool for keeping third-party software from the ports system up to date. I was fascinated by how slow yet intelligent its behavior was; I could just keep running portupgrade(1) over and over again if I got stuck and it would eventually sort everything out. I have been a Rubyist ever since.
FreeBSD’s port system includes rubygem ports for gems with external dependencies. Install those and FreeBSD will make sure a gem’s dependencies are there prior to installation and everything gets upgraded in sync with proper versions forever. My upgrades look like
portupgrade -aRr && gem update
I almost never have to manually intervene; I cannot even remember the last time it happened. I have not had to install a dependency for a gem manually on FreeBSD in years.
My point here is that FreeBSD and Ruby go together like peanut butter and jelly and have for many years; this is not a new thing. Matz used to hang out on freebsd-hackers and much of Ruby’s philosophy comes from that community. That said, I do not expect many converts (and that’s fine with me—it makes me a harder target); but I do hope this will provide some ideas on how to lock down your own infrastructure, whatever platforms you may use. As with anything, one size does not fit all. In most of my environments I have at least a pair of hardened servers and try to keep the rest of the boxes as throwaway as possible. You do not have to lock every box down like Fort Meade. Unless you do.
I have included some links to the relevant FreeBSD documentation, which I find to be well-written, informative and helpful on most topics it covers.
Securing the disk: full encryption at the hardware level. This provides privacy even if the storage media itself is stolen or otherwise compromised (block-level copy).
http://www.freebsd.org/doc/en_US.ISO8859-1/books/handbook/disks-encrypting.html
Securing access: only run services inside of jails (shared kernel VMs) and do not allow in-band remote server access whatsoever; instead connect serial ports on your servers to a console server and only access physical hardware out of band. These days an analog modem is a worthy choice as it is completely off the radar of today’s script kiddie and, as a bonus, you can access your physical machines *even if your data center’s network goes down*. Plus you can access BIOS settings, single-user mode and other system functions not typically available with a remote shell. Console servers are traditionally paired with remote power distribution units (PDUs), which are essentially power strips that you can control programmatically if you, say, need to power cycle a stuck server on a given outlet. Together these devices provide the mechanisms necessary to write scripts to self-diagnose and self-correct pretty much any server failure short of actual hardware failure. Virtual versions of some of these facilities are available with virtualization software (VMWare Server, etc) and server management cards (though beware a lot of the UIs for the latter are Java-only. And buggy to boot.)
http://www.freebsd.org/doc/en_US.ISO8859-1/books/handbook/jails.html
http://avocent.com/Products/Category/Serial_Appliances.aspx
http://avocent.com/Products/Category/Power_Distribution_Units.aspx
Securing the process table and filesystem: securelevels, file flags, TrustedBSD/MAC framework. In the hardlink OS jail model the read-only partitions (/, /usr — BSD is very good about keeping read-only and read-write files in separate trees so they can be treated differently on disk) are all hardlinked (saving inodes across jails) and immutable from inside the jail, even by root. Better yet, a jail does not have to have root access at all and only needs a bare minimum OS—even a single service from a single binary file on a read-only filesystem with no root access—in order to run. You can take things even further by setting file flags (such as immutable, append-only, etc) that can only be changed when the system is in single-user mode. Thus an attacker can only change these files when sitting in front of the server or when connected via console server, as networking (much less sshd) is not available in single-user mode. Or take it further still and use the MAC framework for truly deep control, including the ability to limit the powers of root. SELinux provides similar facilities on Linux.
http://www.freebsd.org/doc/en_US.ISO8859-1/books/handbook/mac.html
Securing the logs: these security features can provide extremely detailed security auditing and logging (and control). With a bit of elbow grease you can capture any sort of malicious system activity you can imagine. Be sure to secure a copy of those logs; separate, hardened log servers are good. One set up to burn logs to non-rewritable optical media (DVD-R, etc) is even better. Security features can be configured to only allow appending to certain files, etc, unless the system is in single-user mode (or a similar safe state). There are many different paths you can take, just make sure your logs are safe.
Securing the cloud: You do not own the cloud, you should not show the cloud your private data. Any data not meant for public download should be encrypted before being persisted in the cloud.
Securing the data: If possible (unless you are dealing with petabytes of data, in other words) you should also be backing your data up to hardened boxes. If the worst happens you want to be 100% sure that no one can tamper with your backups. Put another set of encrypted backups in the cloud, by all means—two or three copies even to be safe—but always keep one set on a hardened server. Backups to physical media should be encrypted, no exceptions.
Securing the network: I assume you already use an external firewall of some sort, whether it belongs to you or your provider. That’s great but I always also run the pf firewall locally on each server both to enforce security (for incoming traffic and for any ports opened by an unauthorized service) and to give me eyes into what is going on with the network on that box. This also lets me adapt the firewall on the fly—for example a script can watch the logs for excessive sshd login failures and then tell the firewall to block all traffic to and from that IP address. PF also lets you route packets to and from local IP addresses so you can have any number of jails running on local IP addresses (127.0.0.2, 127.0.0.3, etc) each providing a different service on a different port on the same public IP address. In this way every service on a given public IP is provided by a different VM and a compromise to one is not a compromise to any other. Meanwhile the remote logging for all jails is running on the jail host (you can see all jail activity and read/write their filesystems from the jail host), which need not have in-band (ssh) access at all, and you have captured all the nefarious activity to your hardened log servers. Now you can make informed decisions about how to respond to the incident rather than having to assume that a compromised user account means wipe everything and start over. Which IS the correct response if you have not properly locked down your infrastructure.
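The log-watching half of that sshd example fits in a few lines of Ruby. A sketch: the function below just counts failures per source IP; feeding an offender to pf would be a shell-out like `pfctl -t badhosts -T add <ip>` (`badhosts` is a table name of my choosing and would need a matching block rule in pf.conf):

```ruby
# Match the source address out of sshd failure lines in auth.log.
FAILURE = /Failed password for .* from (\d+\.\d+\.\d+\.\d+)/

# Return every IP address with at least `threshold` failed logins.
def offenders(log_lines, threshold: 5)
  counts = Hash.new(0)
  log_lines.each do |line|
    ip = line[FAILURE, 1]
    counts[ip] += 1 if ip
  end
  counts.select { |_ip, n| n >= threshold }.keys
end

lines = ["Failed password for root from 203.0.113.9 port 22"] * 6 +
        ["Failed password for admin from 198.51.100.7 port 22"]
bad = offenders(lines)
```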
You can never be entirely crackproof while connected to a network, but you can harden key bits of your infrastructure (the more layers of protection you have the more you frustrate any would-be attacker) and be prepared to quickly rebuild the rest from scratch if necessary—Chef, Puppet, PaaS from which you can spin up instances at will, whatever works best for your environment. Naturally not all of these safeguards will be available to you depending on your OS, hosting provider and other variables. This is not meant as a how-to but rather as an introduction to some tools that I use when securing my servers. With a bit of research and effort you can no doubt find similar tools for your platforms of choice.
Hit me up @fuzzleonard if you would like to discuss.
My response, as a former core IP & WAN backbone engineer, to the Vanity Fair feature on World War 3.0.
http://www.vanityfair.com/culture/2012/05/internet-regulation-war-sopa-pipa-defcon-hacking
We have secure DNS. It has not rolled out everywhere but it is not a problem that needs to be solved; it is a project that needs to be completed.
It is relatively simple to cache DNS locally, which not only prevents a government DNS shutdown from having serious effect on sites you already use, but also speeds up browsing and reduces load on DNS servers. New DNS entries would be missing in the event of a shutdown, but the majority of functionality would still be there. And it would be relatively simple to pass around an update file containing new entries to keep email, social media and other key sites functioning properly. Augmenting existing DNS caching software with versioning and rollback would make it even more robust and be trivial to implement with (for example) djbdns and git.
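The fallback behavior is the whole trick: serve the last-known answer when upstream disappears. A minimal sketch in Ruby (class and method names are mine; the upstream is any callable so the real lookup mechanism is pluggable):

```ruby
# A local DNS cache in front of any resolver. Names already seen
# keep resolving even if the upstream goes away; versioning the
# cached data in git would add rollback on top of this.
class CachingResolver
  attr_accessor :upstream

  def initialize(upstream)
    @upstream = upstream
    @cache = {}
  end

  def resolve(name)
    address = begin
      @upstream.call(name)
    rescue StandardError
      nil  # upstream unreachable; fall back to the cache
    end
    @cache[name] = address if address
    @cache[name]
  end
end

resolver = CachingResolver.new(->(_name) { "192.0.2.10" })
live = resolver.resolve("example.com")            # answered upstream
resolver.upstream = ->(_name) { raise "shutdown" }
cached = resolver.resolve("example.com")          # answered from cache
```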
We have good cryptography.
What we need is one simple (ha ha) technology disruption to take government and big corporations out of the equation: cheap WAN technology. Once we have cheap WAN technology it will not matter what governments and big corporations want to happen unless we move to 1984-style government intrusion where private journals are illegal. World War 3.0, game over.
Of course WAN technology is currently very expensive, which is why government and big corporations are so wrapped up in it. Even if the gear was inexpensive, with today’s technology you still have to deal with laying and maintaining massive amounts of fiber optic cable (and leasing the right-of-way to do so), blasting satellites into orbit and other big, big dollar expenses.
This will not be an easy problem to solve, but once it is solved all this hand-wringing and regulation nonsense will be out the door. When we can spend $50 for a high-capacity WAN endpoint that does not need satellite or fiber it will not matter what governments or big corporations want to happen; it will be our Internet forever.
I am thinking of a public relations firm for science. Rather than individual research organizations writing grants, seeking crowdsource funding and so on, there would be a central organization, or multiple decentralized organizations, to perform this function—a layer of abstraction. A Philanthropy Director would use quantifiable metrics to identify projects that will do humanity the most good such that philanthropic organizations could donate with confidence, knowing their money will go to those projects. An Industry Director would maintain relationships with industry and identify and pitch projects with potential commercial value. A Crowdfunding Director would maintain relationships with the major crowdfunding sites and identify and pitch projects likely to have mass appeal. A Government Director would maintain relationships with government agencies and keep up with the latest programs, matching projects to available funding opportunities. A few points would be shaved from all incoming monies to fund pure research. Monies from overfunded projects would go to underfunded projects.
The biggest obstacle would seem to be creating a fair system, such that the people operating these agencies do not try to influence research with their own agenda.
UPDATE March 21, 2012:
A key function of this agency could be as a buffer between sources of funding and researchers. Scientists are not, generally speaking, great business negotiators, a problem exacerbated when they are dealing with a sole source of funding. I have witnessed a fair amount of stress and arm-twisting and have worked my share of late nights to keep project sponsors happy. An agency in the middle could mitigate this, negotiate on behalf of researchers, manage the expectations of sponsors and even step in with backup funding should a major dispute arise. And since there would be a central agency or a group of central agencies that share information, sponsors could not easily say, “Well, I will just take my money elsewhere and find more compliant scientists.”
A futurist moves back to his hometown in Louisiana to turn it into a technology showcase, but no one in the city government seems interested in the jobs, prestige and prosperity this will bring to the area. As he probes to understand why, he realizes not everything in this town is as it seems—the town has a secret, a big one. Perhaps something to do with the large military base nearby? A cast of colorful characters, intrigue and plenty of forays into futurist thought and technology.
In its current form the Internet is expensive. Massively, mind-bogglingly expensive. Your end-point is relatively inexpensive—I pay $80/month for an Internet connection that would have cost me $20,000/mo in 2000. And a data center, while not cheap, is easy enough to get funding together for. But WAN technology is expensive. Leasing rights along railroad tracks, digging thousands of miles of trenches, laying down fiber. Or blasting satellites into space. Laying cables across the ocean. We are talking big bucks here.
We do not want the Internet controlled by government, but government is the structure we humans have in place to control and manage essential shared resources like roads and fire departments. We also do not want the Internet controlled by big corporations. Or a group of eccentric billionaires. But those are the realistic options with the current cost of WAN technology. The option where we let government, big corporations and/or billionaires pay for the Internet but insist that we retain control of it is not on the table.
But I have an idea! Instead of lashing out in a battle that cannot be won how about instead we get together and develop inexpensive WAN technology and then we can say goodbye to the oligarchy-controlled Internet once and for all.
It is not just a good idea, it is the only viable solution to a free Internet. Period.
utopiapi
I am a big fan of buying local. I am a big fan of shopping on the Internet. I am a big fan of living a car-free lifestyle. We walked to the hospital to have our first child, and we walked him home. Our last child was born at home and, as far as I know, has never ridden in an automobile, though he does ride public transportation from time to time.
It is great to ride a bicycle and I have a trailer I can load with groceries and such, but this is the rainy Pacific Northwest and we have four little kids. What I really want is to be able to shop at an Amazon-like site that knows where I am and what stores near me have in stock and lets me buy stuff from them via credit card/Paypal/whatever. And have it delivered either directly by them or contract someone to go pick it up—cabs would probably be awesome for this, especially if it can be arranged so that they are doing the pickups nearby after dropping off a fare, such that time that was being wasted on unpaid, unfruitful return trips can now earn them extra money. And since most deliveries are not especially time-sensitive as long as they happen within a day (unless frozen food or donor organs are involved) the picked-up items could be brought back to the cab garage for later delivery, when a call for a cab from/to the area is received.
This goal is easy to achieve with today’s technology. So what’s the holdup?
Lack of a standardized API and lack of someone or something to drive adoption of such an API by POS/inventory system vendors. Lack of a geolocation-aware API directory publishing service.
The way I see the future, a shop or restaurant in a well-traveled area will not need its own website—third-party developers will use the available API data to create aggregate websites (perhaps supported by advertising) that let you browse the menus/inventory of all restaurants and shops in the area. Thus rather than zooming in on Alberta Street in Google Maps and then going along checking out the websites, Yelp reviews, etc, of each place, one could just go to an Alberta Street site and see/search it all.
Less investment for the shop owners, more exposure, better data/experience for consumers, new income streams for web developers, a fantasy playground for entrepreneurs. Delivery services are a simple, obvious example; once this actually exists the ensuing creativity and innovation will produce game-changers we can only begin to imagine.
I see this as less of a technical challenge and more as something that requires people and persuasion skills—coming up with a generic standardized retail/restaurant API and directory service can probably be done over a weekend. The real work will be in getting all of the various POS/inventory system vendors to agree to the spec and then either getting them to develop the interfaces for their products internally or convincing them to allow others to do it for them.
I do not mind participating, but I do not have the spare cycles to be the lead on this. Any takers?