Scale a relational database
sql faster
Overview
- replication: master-slave replication, master-master replication
- federation
- sharding
- denormalization
- SQL tuning.
Replication
Federation
Splits up databases by function. chú ý là chia theo db, không phải là chia theo table.
For example, instead of a single, monolithic database, you could have three databases: forums, users, and products.
Hạn chế: khó join từ 2 db
Sharding
Chia db theo từng khoảng dữ liệu vào các shard.
Ưu điểm: chia nhỏ dữ liệu nên các task liên quan song song sẽ nhanh hơn
Hạn chế: cần logic phức tạp khi xử lí liên quan shard. rebalance shard khó (cần consistent hashing)
Denormalization
Storing redundant information (with maintenance) about join
Nên dùng cho các task đọc » ghi (vì các phép đọc có join là tốn kém)
Hạn chế: data duplicate, constraints, heavy write load might perform worse
SQL tuning
It’s important to benchmark (simulate high-load ) and profile (slow query log) to simulate and uncover bottlenecks.
Tighten up the schema
- MySQL dumps to disk in contiguous blocks for fast access.
- Use CHAR instead of VARCHAR for fixed-length fields.CHAR effectively allows for fast, random access, whereas with VARCHAR, you must find the end of a string before moving onto the next one.
- Use TEXT for large blocks of text such as blog posts. TEXT also allows for boolean searches. Using a TEXT field results in storing a pointer on disk that is used to locate the text block.
- Use INT for larger numbers up to 2^32 or 4 billion.
- Use DECIMAL for currency to avoid floating point representation errors.
- Avoid storing large BLOBS, store the location of where to get the object instead.
- VARCHAR(255) is the largest number of characters that can be counted in an 8 bit number, often maximizing the use of a byte in some RDBMS.
- Set the NOT NULL constraint where applicable to improve search performance.
Use good indices
- Columns that you are querying (SELECT, GROUP BY, ORDER BY, JOIN) could be faster with indices.
- Indices are usually represented as self-balancing B-tree that keeps data sorted and allows searches, sequential access, insertions, and deletions in logarithmic time.
- Placing an index can keep the data in memory, requiring more space.
- Writes could also be slower since the index also needs to be updated.
- When loading large amounts of data, it might be faster to disable indices, load the data, then rebuild the indices.
Avoid expensive joins: Denormalize where performance demands it.
Partition tables: Break up a table by putting hot spots in a separate table to help keep it in memory.
Tune the query cache: In some cases, the query cache could lead to performance issues.