MySQL
type: Post
status: Published
date: Feb 19, 2026
slug: mysql
category: Java Web
Fundamental
Normalization
A set of rules followed when constructing databases.
- First Normal Form: Each column contains atomic data that cannot be further split.
- Second Normal Form: Every non-key column must depend on the entire primary key, not just part of it. (This mainly matters for composite primary keys.)
- Third Normal Form: Each column must directly depend on the primary key, not indirectly.
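As a sketch, consider a hypothetical `orders` table keyed by `(order_id, product_id)` (all names are made up); the rules above would split it like this:

```sql
-- Violates 2NF: product_name depends only on product_id, part of the key
-- CREATE TABLE orders (order_id INT, product_id INT,
--                      product_name VARCHAR(50), quantity INT,
--                      PRIMARY KEY (order_id, product_id));

-- Normalized: each non-key column depends on the whole key, directly
CREATE TABLE products (
  product_id   INT PRIMARY KEY,
  product_name VARCHAR(50)
);
CREATE TABLE order_items (
  order_id   INT,
  product_id INT,
  quantity   INT,   -- depends on the full (order_id, product_id) key
  PRIMARY KEY (order_id, product_id)
);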
Complete execution sequence of an SQL statement
- FROM: Specifies the table(s) to query
- WHERE: Filters records based on conditions
- GROUP BY: Groups filtered records
- HAVING: Filters grouped records (aggregate functions apply here)
- SELECT: Selects required columns
- ORDER BY: Sorts the result set
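A hypothetical query annotated with this logical order (table and column names are made up):

```sql
SELECT   dept, AVG(salary) AS avg_sal   -- 5. project the required columns
FROM     employees                      -- 1. pick the table
WHERE    hire_date > '2020-01-01'       -- 2. filter individual rows
GROUP BY dept                           -- 3. group the surviving rows
HAVING   AVG(salary) > 5000             -- 4. filter groups (aggregates allowed)
ORDER BY avg_sal DESC;                  -- 6. sort the final result
```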
MySQL Architecture
MySQL consists of two main components: the Server layer and the Storage Engine layer.
The Server layer primarily includes the Connector, the Parser, the Optimizer, and the Executor.
- The Connector establishes and manages client connections and verifies the username and password.
- The parser performs lexical analysis first, followed by syntactic analysis. For instance, it identifies keywords like "SELECT" and then checks if the entire SQL statement follows the correct grammar rules.
- The optimizer's job is to optimize SQL queries. A single SQL statement can have multiple execution paths, and the optimizer helps us choose the best one.
- Finally, the Executor takes over. Before running the query, it checks whether the user has the necessary privileges. If authorized, it follows the execution plan and calls the storage engine's interfaces to execute the query.
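You can peek at the plan the optimizer chose with `EXPLAIN` (table and column names here are hypothetical):

```sql
EXPLAIN SELECT * FROM employees WHERE id = 42;
-- The `key` column shows the index the optimizer picked (e.g. PRIMARY),
-- and `type` shows the access method (e.g. const for a primary-key point lookup).
```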
Log
Redo Log
The redo log follows the write-ahead logging principle. When a record needs updating, the InnoDB engine first writes the change to the redo log; the modified data pages are flushed to disk later. If a system crash occurs, MySQL can use the redo log to recover changes that were committed but not yet flushed to disk.
Bin log
The bin log records all SQL statements that modify data and stores them in binary format on disk. The bin log employs append-only writing. This means that when the bin log file reaches a certain size, it switches to the next file without overwriting previous logs.
Undo log
The undo log records the inverse operations of the changes a transaction makes, so during a rollback MySQL only needs to replay the undo log to restore the data to its original state. It is also the basis for the old row versions that MVCC reads.
Two-phase commit
Its purpose is to ensure consistency between the redo log and the binlog. During commit, the redo log is written first and held in a 'prepare' state; the transaction is marked committed only after the binlog is successfully written.
This ensures that if the system crashes after the redo log is written but before the binlog is, crash recovery will find a redo log record in the prepare state with no corresponding binlog entry and roll the transaction back; if a complete binlog entry exists, the transaction is committed instead. Either way, the two logs stay logically consistent.
Why not flush to disk immediately after modifying a page?
Flushing data pages to disk involves expensive random I/O. With WAL (Write-Ahead Logging), changes are first appended sequentially to the redo log, and the actual data pages are updated later. This turns random writes into sequential ones, reduces the number of I/O operations, and significantly improves efficiency.
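How aggressively InnoDB flushes the redo log is tunable. A quick sketch of the relevant system variables:

```sql
SHOW VARIABLES LIKE 'innodb_flush_log_at_trx_commit';
-- 1 (default): write and fsync the redo log at every commit (safest)
-- 2: write to the OS cache at commit, fsync roughly once per second
-- 0: write and fsync roughly once per second (fastest, least safe)

SHOW VARIABLES LIKE 'sync_binlog';  -- the analogous control for the binlog
```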
Transactions
Transaction Properties: ACID
ACID (Atomicity, Consistency, Isolation, Durability)
- Atomicity: A transaction either succeeds entirely or fails completely. Atomicity is ensured through the undo log.
- Consistency: The database always moves from one valid state to another; it is the overall goal that atomicity, isolation, and durability together guarantee.
- Isolation: Ensures concurrent transactions do not interfere with each other, achieved through MVCC (Multi-Version Concurrency Control) and locking.
- Durability: Ensures committed modifications are persistent and cannot be lost even if the database crashes. Achieved mainly through the redo log.
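Atomicity in practice: a transfer between two hypothetical accounts either fully commits or fully rolls back (via the undo log):

```sql
START TRANSACTION;
UPDATE accounts SET balance = balance - 100 WHERE id = 1;
UPDATE accounts SET balance = balance + 100 WHERE id = 2;
-- If anything fails in between, ROLLBACK undoes both updates
COMMIT;
```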
Concurrent Transactions
- Dirty Read: A dirty read happens when a transaction reads uncommitted data from another transaction. If that transaction rolls back later, the data you read essentially never existed.
- Non-repeatable Read: You read the same data twice in one transaction but get different results, because another transaction updated and committed it in between.
- Phantom Read: Running the same range query twice returns extra rows the second time, usually because another transaction inserted new rows into your range.
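A non-repeatable read, sketched as two interleaved sessions (the `accounts` table is hypothetical):

```sql
-- Session A                                  -- Session B
START TRANSACTION;
SELECT balance FROM accounts WHERE id = 1;    -- reads 100
                                              UPDATE accounts SET balance = 200
                                                WHERE id = 1;
                                              COMMIT;
SELECT balance FROM accounts WHERE id = 1;    -- reads 200 under READ COMMITTED
COMMIT;                                       -- same query, different result
```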
Isolation Levels
- Read Uncommitted: A transaction can read data modified but not yet committed by another transaction.
- Read Committed: Changes become visible to other transactions only after the transaction is committed.
- Repeatable Read: In Repeatable Read, you always see the same data no matter how many times you read it. Even if other transactions commit updates, those changes stay invisible to you until you finish.
- Serializable: As the name suggests, transactions run one after another. It forces a later transaction to wait until the previous one is fully committed, ensuring the highest level of safety.
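The level is set per session or per transaction; MySQL defaults to REPEATABLE READ:

```sql
-- For the current session
SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED;
-- For the next transaction only
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
-- Check the current setting (variable name as of MySQL 8.0)
SELECT @@transaction_isolation;
```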
How to Avoid Phantom Reads at RR Level
In MySQL's Repeatable Read, phantom reads can still occur if you mix snapshot reads with current reads. The root cause is the inconsistency: a snapshot read looks at an 'old view', while a current read sees the latest data. When you update a record you could not see before, it suddenly appears in your next snapshot read. To avoid this, stick to one type of read: either stay with snapshot reads, or use SELECT ... FOR UPDATE throughout for consistent current reads (under RR, FOR UPDATE takes next-key locks that also block other transactions from inserting into the range).
Index
The primary purpose of an index is to speed up queries.
Index Structures
B-tree: A multi-way balanced search tree.
- Each node has multiple branches
- Both leaf and non-leaf nodes store values
B+ trees are essentially an enhanced version of B-trees
- All data is stored exclusively in leaf nodes; non-leaf nodes store only keys and child pointers
- Leaf nodes are connected via doubly linked lists
Advantages:
- Reduced I/O since non-leaf nodes store no data, resulting in fewer levels for the same data volume
- Efficient range queries due to bidirectional linked lists connecting leaf nodes
- More stable query performance, as all queries traverse to leaf nodes with consistent path lengths
What do B+ tree leaf nodes and non-leaf nodes store?
Non-leaf nodes store keys and child pointers. For leaf nodes, the primary key (clustered) index stores the full row, while a secondary index stores the indexed columns plus the primary key value. Fetching the remaining columns by looking up the clustered index again is known as 'going back to the table'.
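A sketch of the difference, using a hypothetical `users` table:

```sql
CREATE INDEX idx_name ON users (name);   -- secondary index stores (name, id)

-- Finds id in idx_name, then fetches the full row from the clustered
-- index: one extra lookup ("going back to the table")
SELECT * FROM users WHERE name = 'alice';

-- Covering index: everything requested already lives in idx_name,
-- so no second lookup is needed
SELECT id, name FROM users WHERE name = 'alice';
```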
When should indexes be created?
- Fields frequently used in queries
- Fields frequently used in joins (JOIN)
- Fields frequently requiring sorting. Indexing these fields speeds up sorting queries since the index is already sorted.
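Creating such indexes on a hypothetical `orders` table:

```sql
CREATE INDEX idx_customer ON orders (customer_id);  -- frequent WHERE / JOIN column
CREATE INDEX idx_created  ON orders (created_at);   -- frequent ORDER BY column
-- The second index lets `ORDER BY created_at` read rows in index order,
-- avoiding an extra filesort step.
```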
Index ineffectiveness
Index invalidation usually happens when the query breaks the sorted structure of the B+ tree. Here are the 4 most common scenarios:
- Violating the Leftmost Prefix Rule: For composite indexes, if the leading column is missing in the WHERE clause, the index won't be used
- Functions or Calculations: Using functions or performing math on indexed columns makes the index ineffective since it only stores raw values.
- Implicit Type Conversion: For example, comparing a VARCHAR column to a numeric value without quotes.
- Leading Wildcards: Using LIKE with a leading wildcard (e.g., '%abc') forces a full table scan.
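Each failure mode sketched against a hypothetical index on `phone` (a VARCHAR column):

```sql
-- Assume: CREATE INDEX idx_phone ON users (phone);  -- phone is VARCHAR
SELECT * FROM users WHERE SUBSTRING(phone, 1, 3) = '138';  -- function: index unused
SELECT * FROM users WHERE phone = 13800000000;             -- implicit cast: index unused
SELECT * FROM users WHERE phone LIKE '%0000';              -- leading wildcard: full scan
SELECT * FROM users WHERE phone LIKE '138%';               -- prefix match: index CAN be used
```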
Leftmost Matching Principle
MySQL follows the 'leftmost prefix rule' for composite indexes, matching fields from left to right. Therefore, placing high-selectivity columns on the left is a best practice, as it helps filter out more records earlier in the search.
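With a hypothetical composite index on (a, b, c):

```sql
-- CREATE INDEX idx_abc ON t (a, b, c);
SELECT * FROM t WHERE a = 1;            -- uses the index (leftmost column present)
SELECT * FROM t WHERE a = 1 AND b = 2;  -- uses the (a, b) prefix
SELECT * FROM t WHERE b = 2 AND c = 3;  -- skips `a`: index not used for filtering
SELECT * FROM t WHERE a = 1 AND c = 3;  -- only the `a` part of the index is used
```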