Scale a relational database

2024-02-20

sql faster

Overview

Splits up databases by function. chú ý là chia theo db, không phải là chia theo table.

For example, instead of a single, monolithic database, you could have three databases: forums, users, and products.

Hạn chế: khó join từ 2 db

Chia db theo từng khoảng dữ liệu vào các shard.

Ưu điểm: chia nhỏ dữ liệu nên các task liên quan song song sẽ nhanh hơn

Hạn chế: cần logic phức tạp khi xử lí liên quan shard. rebalance shard khó (cần consistent hashing)

Storing redundant information (with maintenance) about join

Nên dùng cho các task đọc » ghi (vì các phép đọc có join là tốn kém)

Hạn chế: data duplicate, constraints, heavy write load might perform worse

It’s important to benchmark (simulate high-load ) and profile (slow query log) to simulate and uncover bottlenecks.

Tighten up the schema

MySQL dumps to disk in contiguous blocks for fast access.
Use CHAR instead of VARCHAR for fixed-length fields.CHAR effectively allows for fast, random access, whereas with VARCHAR, you must find the end of a string before moving onto the next one.
Use TEXT for large blocks of text such as blog posts. TEXT also allows for boolean searches. Using a TEXT field results in storing a pointer on disk that is used to locate the text block.
Use INT for larger numbers up to 2^32 or 4 billion.
Use DECIMAL for currency to avoid floating point representation errors.
Avoid storing large BLOBS, store the location of where to get the object instead.
VARCHAR(255) is the largest number of characters that can be counted in an 8 bit number, often maximizing the use of a byte in some RDBMS.
Set the NOT NULL constraint where applicable to improve search performance.

Use good indices

Columns that you are querying (SELECT, GROUP BY, ORDER BY, JOIN) could be faster with indices.
Indices are usually represented as self-balancing B-tree that keeps data sorted and allows searches, sequential access, insertions, and deletions in logarithmic time.
Placing an index can keep the data in memory, requiring more space.
Writes could also be slower since the index also needs to be updated.
When loading large amounts of data, it might be faster to disable indices, load the data, then rebuild the indices.

Avoid expensive joins: Denormalize where performance demands it.

Partition tables: Break up a table by putting hot spots in a separate table to help keep it in memory.

Tune the query cache: In some cases, the query cache could lead to performance issues.