Designing Distributed Systems

Scalability and Performance: Split Your Data and Simplify

Introduction

Here are a few guidelines for supporting scalability and performance in your systems.
1. Simplify: Simplify your code and your design. A simpler system is easier to understand and easier to scale, and your life gets simpler too; the more complex a system is, the harder it is to scale out. Of course, if something genuinely cannot be simplified, don't force it; we are sane people. But we often only think simplification is impossible when it isn't, so do yourself a favour and put some effort into it.
2. X axis (duplicate data): Create multiple read-only databases (replicas or clones) of your data to scale your reads. Read queries can then be spread across the copies, which reduces the load on each server.
3. Y axis (split by function): Split your data by business function, as microservices do, at the database level and not only at the service level: different roles, different databases. Do you sell underwear and also run a separate line of business in atomic energy manufacturing? Consider two or more databases. Even if you keep a single database, make sure each line of business keeps its data separate within it.
4. Z axis (split by similar items): If you have multiple customers, shard on customer ID. Smaller customers can share a shard, while larger ones get shards of their own. Choose a split key that produces an even distribution. Location, for example, is often a poor key: if 80% of your customers are in one place, you haven't really split the load.
5. The three-data-center rule: With only two data centers, each one must be able to carry 100% of the load on its own, so you need 200% total capacity to keep serving 100% if one goes down. With three data centers, each needs only 50%, for a total of 150%: if one goes down, the remaining two still provide 100%.
6. Remember the storage alternatives: You have several kinds of storage to choose from, such as file storage like `ceph` (and don't forget the plain file system), NoSQL, wide-column stores such as Cassandra, and relational databases. Wide-column stores usually provide automatic row (horizontal) sharding and asynchronous replication with eventual consistency, whereas splitting by columns requires more manual intervention.
7. Consistency has a cost: If you raise the consistency level, for example in a NoSQL store, operations such as `getSomething` have to contact more replicas, up to all of them, to make sure they return the latest version.
8. Firewalls are like locks: You lock your front door, but you don't lock every internal door, right? Route credit card requests through the firewall, but not image requests. Don't overuse firewalls; your system is complex enough without them.
9. Do you really need a transaction? When you move money from one customer to another, do you really need a transaction? Consider all the options; once you start considering event sourcing, you will often find you can manage without transactions.
10. Don't read back to validate your writes: Have you just written something to disk, cache, or a database? Don't reread it to validate it; rely on the acknowledgment or error the write returns. Your servers have more useful things to do.
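A minimal sketch of the X-axis idea from guideline 2, assuming a single primary that takes all writes and a fixed set of read-only replicas. The class name `ReadReplicaRouter` is hypothetical, plain Python dicts stand in for real database connections, and replication is simulated by copying each write to every replica immediately.

```python
from __future__ import annotations

import itertools


class ReadReplicaRouter:
    """Hypothetical router: writes go to the primary, reads rotate over replicas.

    Dicts stand in for real databases; replication is simulated synchronously.
    """

    def __init__(self, replica_count: int) -> None:
        if replica_count < 1:
            raise ValueError("need at least one replica")
        self.primary: dict[str, str] = {}
        self.replicas: list[dict[str, str]] = [{} for _ in range(replica_count)]
        # Round-robin over replica indexes so each read hits the next copy.
        self._next_replica = itertools.cycle(range(replica_count))
        self.reads_per_replica = [0] * replica_count

    def write(self, key: str, value: str) -> None:
        self.primary[key] = value
        # Simulated replication; real replicas usually apply writes with some lag.
        for replica in self.replicas:
            replica[key] = value

    def read(self, key: str) -> str | None:
        index = next(self._next_replica)
        self.reads_per_replica[index] += 1
        return self.replicas[index].get(key)
```

In a real deployment, replication is usually asynchronous, so a replica can briefly return stale data; reads that must see a write that was just made are commonly sent to the primary instead.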
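A sketch of the Y-axis split from guideline 3: every table belongs to exactly one business domain, and each domain has its own database. The table names, the `database_for_tables` helper, and the connection strings are all hypothetical.

```python
# Hypothetical ownership map: each table belongs to exactly one business domain.
TABLE_OWNER = {
    "underwear_products": "apparel",
    "underwear_orders": "apparel",
    "reactor_parts": "energy",
    "reactor_contracts": "energy",
}

# Hypothetical connection strings, one database per domain (Y-axis split).
DATABASE_FOR_DOMAIN = {
    "apparel": "postgresql://apparel-db.internal/apparel",
    "energy": "postgresql://energy-db.internal/energy",
}


def database_for_tables(tables):
    """Return the single database that owns all of the given tables.

    Raises ValueError if the tables do not come from exactly one domain,
    because a Y-axis split should not need cross-database joins.
    """
    domains = {TABLE_OWNER[table] for table in tables}
    if len(domains) != 1:
        raise ValueError(f"expected tables from exactly one domain, got {sorted(domains)}")
    return DATABASE_FOR_DOMAIN[domains.pop()]
```

Rejecting mixed-domain requests is a design choice: in a functional split, data owned by another domain is normally fetched through that domain's service rather than through a cross-database join.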
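A sketch of the Z-axis split from guideline 4, assuming customer IDs are strings and that you already know which customers are large. Known large customers map to dedicated shards; everyone else is hashed onto a pool of shared shards. The shard names and the `DEDICATED_SHARDS` table are hypothetical. `hashlib.sha256` is used instead of the built-in `hash()` because Python randomizes string hashing per process, which would send the same customer to different shards after a restart.

```python
import hashlib

# Hypothetical: large customers identified in advance get a shard of their own.
DEDICATED_SHARDS = {
    "cust-big-1": "shard-dedicated-1",
    "cust-big-2": "shard-dedicated-2",
}
# Hypothetical pool of shards shared by all smaller customers.
SHARED_SHARDS = ["shard-0", "shard-1", "shard-2", "shard-3"]


def shard_for(customer_id: str) -> str:
    """Return the shard that stores all data for one customer."""
    if customer_id in DEDICATED_SHARDS:
        return DEDICATED_SHARDS[customer_id]
    # Stable hash of the ID, so a customer always lands on the same shard.
    digest = hashlib.sha256(customer_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(SHARED_SHARDS)
    return SHARED_SHARDS[index]
```

Taking the hash modulo the number of shards keeps the example short, but adding a shard then remaps most customers; consistent hashing or a lookup directory avoids that.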
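Guideline 5 generalizes to any number of data centers. Assuming load is spread evenly and the system must survive the loss of one site, each of N data centers has to carry 1/(N-1) of the peak load, so the total provisioned capacity is N/(N-1) of peak: 200% for two sites, 150% for three. The hypothetical `capacity_plan` function below computes both numbers exactly with `fractions.Fraction`.

```python
from __future__ import annotations

from fractions import Fraction


def capacity_plan(data_centers: int) -> tuple[Fraction, Fraction]:
    """Return (capacity per data center, total capacity) as fractions of peak load.

    Assumes load is spread evenly and any single data center may fail.
    """
    if data_centers < 2:
        raise ValueError("need at least 2 data centers to survive losing one")
    # The N - 1 survivors must together carry 100% of the peak.
    per_dc = Fraction(1, data_centers - 1)
    return per_dc, per_dc * data_centers
```

For example, `capacity_plan(3)` returns `(Fraction(1, 2), Fraction(3, 2))`. Each additional site lowers the overhead further (four sites need about 133%), at the cost of operating more locations.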
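To illustrate guideline 7, here is a toy replicated register, assuming N replicas where a write is acknowledged by W of them and a read asks R of them. This is the quorum rule used by Dynamo-style stores (Cassandra exposes it as per-request consistency levels such as ONE, QUORUM and ALL): when R + W > N, every read set overlaps every write set, so a read always reaches at least one replica holding the latest version. The `QuorumStore` class is hypothetical, and it deliberately writes to the first W replicas and reads from the last R to show the worst case.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Versioned:
    version: int
    value: str


class QuorumStore:
    """Hypothetical toy register with N replicas, write quorum W, read quorum R."""

    def __init__(self, n: int, w: int, r: int) -> None:
        if not (1 <= w <= n and 1 <= r <= n):
            raise ValueError("need 1 <= W <= N and 1 <= R <= N")
        self.n, self.w, self.r = n, w, r
        self.replicas: list[Versioned | None] = [None] * n
        self.version = 0

    def overlaps(self) -> bool:
        # R + W > N guarantees every read quorum meets every write quorum.
        return self.r + self.w > self.n

    def put(self, value: str) -> None:
        self.version += 1
        # Worst case: only the first W replicas receive the write.
        for i in range(self.w):
            self.replicas[i] = Versioned(self.version, value)

    def get(self) -> str | None:
        # Worst case: read the last R replicas, as far from the writes as possible.
        answers = [v for v in self.replicas[self.n - self.r:] if v is not None]
        if not answers:
            return None
        # The newest version among the contacted replicas wins.
        return max(answers, key=lambda v: v.version).value
```

With cheap writes (W = 1), consistent reads need R = N, which is exactly the "contact all nodes" cost the guideline describes.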
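A minimal event-sourcing sketch for guideline 9, assuming a single append-only log. A transfer is recorded as one `MoneyTransferred` event rather than as two balance updates, so there is no pair of rows that has to change together inside a transaction; balances are derived by replaying the log. The `Ledger` class and the event names are hypothetical, and a retried request with the same `transfer_id` is ignored so clients can safely retry.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Deposited:
    account: str
    amount: int  # in cents, to avoid floating-point rounding


@dataclass(frozen=True)
class MoneyTransferred:
    transfer_id: str
    source: str
    target: str
    amount: int  # in cents


class Ledger:
    """Hypothetical append-only event log; balances are derived by replay."""

    def __init__(self) -> None:
        self.events: list[Deposited | MoneyTransferred] = []
        self._seen_transfers: set[str] = set()

    def deposit(self, account: str, amount: int) -> None:
        self.events.append(Deposited(account, amount))

    def transfer(self, transfer_id: str, source: str, target: str, amount: int) -> bool:
        # Idempotent: a retried transfer with a known ID is a no-op.
        if transfer_id in self._seen_transfers:
            return False
        if amount <= 0 or self.balances()[source] < amount:
            raise ValueError("invalid amount or insufficient funds")
        # One appended event captures both sides of the transfer.
        self.events.append(MoneyTransferred(transfer_id, source, target, amount))
        self._seen_transfers.add(transfer_id)
        return True

    def balances(self) -> dict[str, int]:
        totals: dict[str, int] = defaultdict(int)
        for event in self.events:
            if isinstance(event, Deposited):
                totals[event.account] += event.amount
            elif isinstance(event, MoneyTransferred):
                totals[event.source] -= event.amount
                totals[event.target] += event.amount
        return totals
```

This toy checks the balance and appends in a single thread; with concurrent writers, a real event store needs some concurrency control on append, such as rejecting the append when the stream has changed since it was read (an expected-version check).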
