CMU 15-445 Project1: BUFFER POOL MANAGER

OS vs. DB

TASK #1 - LRU REPLACEMENT POLICY

1.1 Problem Description

需要完成的函数

  • Victim(frame_id_t*) : Remove the object that was accessed least recently compared to all the other elements being tracked by the Replacer, store its contents in the output parameter and return True. If the Replacer is empty return False.
  • Pin(frame_id_t) : This method should be called after a page is pinned to a frame in the BufferPoolManager. It should remove the frame containing the pinned page from the LRUReplacer.
  • Unpin(frame_id_t) : This method should be called when the pin_count of a page becomes 0. This method should add the frame containing the unpinned page to the LRUReplacer.
  • Size() : This method returns the number of frames that are currently in the LRUReplacer.

LeetCode has a similar problem (the LRU cache) that serves as a good reference: a doubly-linked list plus a hash map, and the overall idea carries over directly. Pin means the page is in use, so its frame must be removed from the lru_replacer; Unpin is the reverse. One difference from the LeetCode version to watch out for: repeated Unpin calls must not add the same frame more than once.

1.2 Partial Implementation

lru_replacer.h

 private:
  // TODO(student): implement me!
  std::unordered_map<frame_id_t, ListNode *> cache_;
  std::mutex lru_latch_;
  ListNode *head_;  // dummy head
  ListNode *tail_;  // dummy tail
  size_t curr_size_;
  size_t capacity_;
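The ListNode type referenced above is not shown in the header excerpt. A minimal definition consistent with how the .cpp below dereferences it might look like this (the frame_id_t alias is an assumption for a self-contained sketch; in BusTub it comes from config.h):

```cpp
#include <cstdint>

// Assumed alias: BusTub defines frame_id_t in src/include/common/config.h.
using frame_id_t = int32_t;

// Hypothetical node type for the intrusive doubly-linked list used by
// LRUReplacer. Field names match the usage in the .cpp below.
struct ListNode {
  explicit ListNode(frame_id_t fid = 0) : frame_id(fid) {}
  frame_id_t frame_id;
  ListNode *prev{nullptr};
  ListNode *next{nullptr};
};
```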

lru_replacer.cpp

bool LRUReplacer::Victim(frame_id_t *frame_id) {  // evict the least recently used frame
  std::lock_guard<std::mutex> guard(lru_latch_);
  if (curr_size_ == 0) {  // the LRUReplacer is empty
    return false;
  }
  ListNode *delframe = tail_->prev;
  *frame_id = delframe->frame_id;
  cache_.erase(delframe->frame_id);
  curr_size_--;
  // Unlink and free the evicted node.
  RemoveFrame(delframe);
  delete delframe;
  return true;
}

void LRUReplacer::Pin(frame_id_t frame_id) {
  std::lock_guard<std::mutex> guard(lru_latch_);
  auto it = cache_.find(frame_id);
  if (it != cache_.end()) {
    ListNode *delframe = it->second;
    cache_.erase(it);
    RemoveFrame(delframe);
    // Release the node.
    delete delframe;
    curr_size_--;
  }
}

void LRUReplacer::Unpin(frame_id_t frame_id) {
  std::lock_guard<std::mutex> guard(lru_latch_);
  // A frame may only be added once: a repeated Unpin is a no-op.
  if (cache_.find(frame_id) != cache_.end()) {
    return;
  }
  // At capacity, evict the least recently used frame first. Evict inline:
  // calling Victim() here would try to re-acquire the non-recursive
  // lru_latch_ and deadlock.
  if (curr_size_ >= capacity_) {
    ListNode *delframe = tail_->prev;
    cache_.erase(delframe->frame_id);
    RemoveFrame(delframe);
    delete delframe;
    curr_size_--;
  }
  ListNode *frame = new ListNode(frame_id);
  cache_[frame_id] = frame;
  AddToHead(frame);
  ++curr_size_;
}

size_t LRUReplacer::Size() {
  std::lock_guard<std::mutex> guard(lru_latch_);
  return curr_size_;
}

void LRUReplacer::AddToHead(ListNode *frame) {
  frame->next = head_->next;
  head_->next->prev = frame;
  head_->next = frame;
  frame->prev = head_;
}

void LRUReplacer::RemoveFrame(ListNode *frame) {
  frame->prev->next = frame->next;
  frame->next->prev = frame->prev;
}

TASK #2 - BUFFER POOL MANAGER INSTANCE

2.1 Problem Description

The course slides clarify how the pieces relate. Key concepts and pitfalls:

  1. free_list, lru_replacer, and the buffer pool are independent structures.
  2. free_list holds frames that are not in use; the page table has no mapping for them (their page_id is INVALID_PAGE_ID).
  3. An unpinned page sits in lru_replacer; the page table still maps its frame, but the page is not currently in use in the buffer pool.
  4. A pinned page sits in the buffer pool and is actively in use.
  5. Flush timing: writing back to disk is only needed in Victim and DeletePage, not on every UnpinPage; UnpinPage just records the dirty flag passed in. As a result, any frame handed out by Fetch or New from the free_list is clean, and flushing on Victim guarantees the evicted frame is clean too.
  6. When Fetch or New takes a frame out of lru_replacer, the old page-to-frame mapping must be erased from the page table.
  7. Avoid nesting functions that each acquire the latch, or a deadlock may occur.

2.2 Implementation

The lock granularity could be refined further; only the unoptimized implementation is shown here.

NewPgImp

FetchPgImp and NewPgImp both bump the pin count, and the two implementations are largely the same.

Page *BufferPoolManagerInstance::NewPgImp(page_id_t *page_id) {
  // 0. Make sure you call AllocatePage!
  // 1. If all the pages in the buffer pool are pinned, return nullptr.
  // 2. Pick a victim page P from either the free list or the replacer. Always pick from the free list first.
  // 3. Update P's metadata, zero out memory and add P to the page table.
  // 4. Set the page ID output parameter. Return a pointer to P.
  std::lock_guard<std::mutex> lock(latch_);
  // step 1
  bool all_pinned = true;
  for (size_t i = 0; i < pool_size_; ++i) {
    if (pages_[i].pin_count_ <= 0) {
      all_pinned = false;
      break;
    }
  }
  if (all_pinned) {
    return nullptr;
  }
  // Allocate a page on disk.
  *page_id = AllocatePage();

  Page *p = nullptr;
  // step 2: pick a victim page P, preferring the free list.
  frame_id_t fid;
  if (free_list_.empty()) {
    if (!replacer_->Victim(&fid)) {
      return nullptr;
    }
    p = &pages_[fid];
    // Flush the evicted page if it is dirty.
    if (p->is_dirty_) {
      disk_manager_->WritePage(p->page_id_, p->data_);
      p->is_dirty_ = false;
    }
    // Erase the old page-to-frame mapping.
    page_table_.erase(p->page_id_);
  } else {
    fid = free_list_.front();
    free_list_.pop_front();
    p = &pages_[fid];
  }
  // step 3: add P to the page table and update its metadata.
  page_table_[*page_id] = fid;
  p->page_id_ = *page_id;
  p->pin_count_ = 1;
  replacer_->Pin(fid);
  // Zero out the data held within the page.
  p->ResetMemory();
  // step 4
  return p;
}

FetchPgImp

Page *BufferPoolManagerInstance::FetchPgImp(page_id_t page_id) {
  // 1. Search the page table for the requested page (P).
  // 1.1 If P exists, pin it and return it immediately.
  // 1.2 If P does not exist, find a replacement page (R) from either the free list or the replacer.
  //     Note that pages are always found from the free list first.
  // 2. If R is dirty, write it back to the disk.
  // 3. Delete R from the page table and insert P.
  // 4. Update P's metadata, read in the page content from disk, and then return a pointer to P.
  std::lock_guard<std::mutex> lock(latch_);
  // step 1
  frame_id_t fid;
  Page *p = nullptr;
  if (page_table_.find(page_id) != page_table_.end()) {  // found: pin it and return.
    fid = page_table_[page_id];
    pages_[fid].pin_count_++;
    replacer_->Pin(fid);
    return &pages_[fid];
  }
  // step 1.2: pick a replacement frame, preferring the free list.
  if (free_list_.empty()) {
    if (!replacer_->Victim(&fid)) {
      return nullptr;
    }
    p = &pages_[fid];
    // step 2: a page coming out of the lru_replacer may need to be flushed.
    if (p->is_dirty_) {
      disk_manager_->WritePage(p->page_id_, p->data_);
      p->is_dirty_ = false;
    }
    // step 3: erase the old page-to-frame mapping.
    page_table_.erase(p->page_id_);
  } else {
    fid = free_list_.front();
    free_list_.pop_front();
    p = &pages_[fid];
  }
  // Insert P and update its metadata.
  page_table_[page_id] = fid;
  replacer_->Pin(fid);
  p->page_id_ = page_id;
  p->pin_count_ = 1;
  // step 4: read in the page content from disk.
  disk_manager_->ReadPage(page_id, p->data_);
  return p;
}

DeletePgImp

After deleting a page from the buffer pool, reset the Page's metadata to its initial values before returning the frame to free_list.

bool BufferPoolManagerInstance::DeletePgImp(page_id_t page_id) {
  // 0. Make sure you call DeallocatePage!
  // 1. Search the page table for the requested page (P).
  // 1. If P does not exist, return true.
  // 2. If P exists, but has a non-zero pin-count, return false. Someone is using the page.
  // 3. Otherwise, P can be deleted. Remove P from the page table, reset its metadata and return it to the free list.
  std::lock_guard<std::mutex> lock(latch_);
  // P does not exist.
  if (page_table_.find(page_id) == page_table_.end()) {
    return true;
  }
  // P does exist.
  frame_id_t fid = page_table_[page_id];
  Page *deletepage = &pages_[fid];
  if (deletepage->pin_count_ != 0) {
    return false;
  }
  // Flush the page before deallocating it.
  if (deletepage->is_dirty_) {
    disk_manager_->WritePage(page_id, deletepage->data_);
    deletepage->is_dirty_ = false;
  }
  DeallocatePage(page_id);
  // Remove P from the page table, and take the frame out of the replacer so
  // it cannot be both victimized and handed out from the free list.
  page_table_.erase(page_id);
  replacer_->Pin(fid);
  // Reset its metadata: a frame on the free list does not hold any page.
  deletepage->page_id_ = INVALID_PAGE_ID;
  deletepage->is_dirty_ = false;
  deletepage->pin_count_ = 0;
  // Return the frame to free_list_.
  free_list_.push_back(fid);
  return true;
}

UnpinPgImp

Record the dirty flag passed in; the actual flush happens at Victim time.

bool BufferPoolManagerInstance::UnpinPgImp(page_id_t page_id, bool is_dirty) {
  std::lock_guard<std::mutex> lock(latch_);
  // Guard the lookup: operator[] on a missing key would insert a bogus entry.
  if (page_table_.find(page_id) == page_table_.end()) {
    return false;
  }
  frame_id_t fid = page_table_[page_id];
  Page *p = &pages_[fid];
  if (p->pin_count_ <= 0) {
    return false;
  }
  // Hold the dirty state until Victim; never overwrite an earlier dirty
  // mark with a clean one.
  p->is_dirty_ = p->is_dirty_ || is_dirty;
  --p->pin_count_;
  if (p->pin_count_ == 0) {
    replacer_->Unpin(fid);
  }
  return true;
}

FlushPgImp

bool BufferPoolManagerInstance::FlushPgImp(page_id_t page_id) {
  // Make sure you call DiskManager::WritePage!
  std::lock_guard<std::mutex> lock(latch_);
  if (page_id == INVALID_PAGE_ID || page_table_.find(page_id) == page_table_.end()) {
    return false;
  }
  disk_manager_->WritePage(page_id, pages_[page_table_[page_id]].data_);
  return true;
}

TASK #3 - PARALLEL BUFFER POOL MANAGER

3.1 Problem Description

This task extends Task 2. A single buffer pool manager can cause heavy latch contention, because every thread that touches the buffer pool competes for the same latch. The fix is a parallel manager that owns multiple buffer pool managers, so that each pool has its own latch. It is enough to reuse the BufferPoolManagerInstance from the previous task and add some routing logic on top.

3.2 Implementation

parallel_buffer_pool_manager.cpp

ParallelBufferPoolManager::ParallelBufferPoolManager(size_t num_instances, size_t pool_size,
                                                     DiskManager *disk_manager, LogManager *log_manager) {
  // Allocate and create individual BufferPoolManagerInstances
  num_instances_ = num_instances;
  pool_size_ = pool_size;
  for (size_t i = 0; i < num_instances; ++i) {
    parallel_.emplace_back(new BufferPoolManagerInstance(pool_size, num_instances, i, disk_manager, log_manager));
  }
}

// Destruct all BufferPoolManagerInstances and deallocate any associated memory
ParallelBufferPoolManager::~ParallelBufferPoolManager() {
  for (auto *bpm : parallel_) {
    delete bpm;
  }
  parallel_.clear();
}

parallel_buffer_pool_manager.h

 private:
  std::vector<BufferPoolManager *> parallel_;
  uint32_t num_instances_;
  size_t pool_size_;
  int start_ = 0;

Summary

Completing Lab 1 made me look at how an operating system works from a DBMS's point of view, and deepened my understanding of page tables, caching, when to flush to disk, and the LRU page replacement algorithm.