Scale a relational database

2024-02-20

sql faster

Overview

  • replication: master-slave replication, master-master replication
  • federation
  • sharding
  • denormalization
  • SQL tuning.

Replication

xem tại bài CAP

Federation

Splits up databases by function. chú ý là chia theo db, không phải là chia theo table.

For example, instead of a single, monolithic database, you could have three databases: forums, users, and products.

Hạn chế: khó join từ 2 db

Sharding

Chia db theo từng khoảng dữ liệu vào các shard.

Ưu điểm: chia nhỏ dữ liệu nên các task liên quan song song sẽ nhanh hơn

Hạn chế: cần logic phức tạp khi xử lí liên quan shard. rebalance shard khó (cần consistent hashing)

Denormalization

Storing redundant information (with maintenance) about join

Nên dùng cho các task đọc » ghi (vì các phép đọc có join là tốn kém)

Hạn chế: data duplicate, constraints, heavy write load might perform worse

SQL tuning

It’s important to benchmark (simulate high-load ) and profile (slow query log) to simulate and uncover bottlenecks.

Tighten up the schema
  • MySQL dumps to disk in contiguous blocks for fast access.
  • Use CHAR instead of VARCHAR for fixed-length fields.CHAR effectively allows for fast, random access, whereas with VARCHAR, you must find the end of a string before moving onto the next one.
  • Use TEXT for large blocks of text such as blog posts. TEXT also allows for boolean searches. Using a TEXT field results in storing a pointer on disk that is used to locate the text block.
  • Use INT for larger numbers up to 2^32 or 4 billion.
  • Use DECIMAL for currency to avoid floating point representation errors.
  • Avoid storing large BLOBS, store the location of where to get the object instead.
  • VARCHAR(255) is the largest number of characters that can be counted in an 8 bit number, often maximizing the use of a byte in some RDBMS.
  • Set the NOT NULL constraint where applicable to improve search performance.
Use good indices
  • Columns that you are querying (SELECT, GROUP BY, ORDER BY, JOIN) could be faster with indices.
  • Indices are usually represented as self-balancing B-tree that keeps data sorted and allows searches, sequential access, insertions, and deletions in logarithmic time.
  • Placing an index can keep the data in memory, requiring more space.
  • Writes could also be slower since the index also needs to be updated.
  • When loading large amounts of data, it might be faster to disable indices, load the data, then rebuild the indices.

Avoid expensive joins: Denormalize where performance demands it.

Partition tables: Break up a table by putting hot spots in a separate table to help keep it in memory.

Tune the query cache: In some cases, the query cache could lead to performance issues.

Ref

system-design-primer