Shu-Yu Fu: Let's wrap up the day with false sharing

Storage on a computer, ordered from fastest to slowest: registers > cache > RAM (random-access memory) > and so on down the hierarchy.

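This post is about the cache. The cache line is the unit in which the CPU manages its cache, and there are several ways to find out the cache line size [1]. When writing a program, try to keep the data it works on in the cache so that the program runs faster.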
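As a rough illustration of one of the approaches discussed in [1], here is a minimal sketch that asks the OS for the L1 data cache line size. It assumes Linux with glibc, where `_SC_LEVEL1_DCACHE_LINESIZE` is available as an extension; the helper name `cache_line_size` and the 64-byte fallback are my own assumptions, not something from the references.

```c
#include <unistd.h>

/* Assumed fallback: 64 bytes is common on current x86-64 and many ARM
   cores, but it is a guess, not a guarantee. */
#define FALLBACK_LINE_SIZE 64L

/* Hypothetical helper: returns the L1 data cache line size in bytes. */
static long cache_line_size(void)
{
    long size = -1;
#ifdef _SC_LEVEL1_DCACHE_LINESIZE
    /* glibc extension; it may return 0 or -1 when the value is unknown. */
    size = sysconf(_SC_LEVEL1_DCACHE_LINESIZE);
#endif
    return size > 0 ? size : FALLBACK_LINE_SIZE;
}
```

Calling it from `main` with something like `printf("%ld\n", cache_line_size());` prints the value. Treat the result as a hint: on other platforms the query may not exist, in which case this sketch silently falls back to the assumed 64 bytes.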

In a multi-core environment, the cache coherence mechanism (MESI [2]) hides a trap: false sharing.

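False sharing occurs when a multi-threaded (or multi-process) program accesses data in a way that unintentionally invalidates cache lines on other cores, even though the threads touch different variables that merely happen to share a cache line. Performance drops because the data has to be fetched again to refresh the cache.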
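To make the invalidation concrete, here is a toy model of MESI [2] for a single cache line shared by two cores. It is a heavily simplified sketch: real hardware has more cores, bus messages, and write-backs that are not modelled here, and the names `line_model`, `core_read`, `core_write`, and `reader_misses` are made up for this illustration.

```c
/* The four MESI states of one cache line in one core's cache. */
enum mesi { INVALID, SHARED, EXCLUSIVE, MODIFIED };

/* Toy model (an assumption for illustration): two cores, one cache line. */
struct line_model {
    enum mesi state[2];   /* state of the line in core 0 and core 1 */
    long misses[2];       /* how often each core had to fetch the line */
};

/* Core `c` reads any variable that lives in the line. */
static void core_read(struct line_model *m, int c)
{
    int other = 1 - c;
    if (m->state[c] != INVALID)
        return;                        /* hit: the local copy is usable */
    m->misses[c]++;                    /* miss: the line must be fetched */
    if (m->state[other] != INVALID) {
        m->state[other] = SHARED;      /* the other copy is downgraded */
        m->state[c] = SHARED;
    } else {
        m->state[c] = EXCLUSIVE;       /* nobody else holds the line */
    }
}

/* Core `c` writes any variable that lives in the line. */
static void core_write(struct line_model *m, int c)
{
    int other = 1 - c;
    if (m->state[c] == INVALID)
        m->misses[c]++;                /* fetch the line before writing */
    m->state[other] = INVALID;         /* the other core's copy is invalidated */
    m->state[c] = MODIFIED;
}

/* Core 0 reads one variable while core 1 writes a different variable
   in the same line; returns how many times core 0 missed. */
static long reader_misses(long rounds)
{
    struct line_model m = { { INVALID, INVALID }, { 0, 0 } };
    for (long i = 0; i < rounds; ++i) {
        core_read(&m, 0);
        core_write(&m, 1);
    }
    return m.misses[0];
}
```

In this model the reader misses on every single round, because each write on core 1 invalidates core 0's copy of the line. The model never looks at which variable inside the line is touched, and that is exactly the point: the protocol tracks whole lines, not variables.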

Take the example from Wikipedia [3]:

struct foo {
  int x;
  int y;
};

static struct foo f;

/* The two following functions are running concurrently: */

int sum_a(void)
{
  int s = 0;
  int i;
  for (i = 0; i < 1000000; ++i)
    s += f.x;
  return s;
}

void inc_b(void)
{
  int i;
  for (i = 0; i < 1000000; ++i)
    ++f.y;
}

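Two threads run, one executing sum_a (which reads f.x) and the other executing inc_b (which updates f.y). The two threads look independent, but f.x and f.y sit in the same cache line, and a copy of that line ends up in the cache of each of the two CPUs. Every update on one side invalidates the copy on the other side, which then has to refresh its cache, and time is quietly wasted. This can be fixed, though: [4, 5] list possible solutions. But remember to profile first [6] to find out whether cache misses are actually a serious problem for you!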
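One commonly used remedy is to force the two fields onto separate cache lines by aligning each of them. The following is a minimal sketch using C11 `alignas`; it assumes a 64-byte cache line (check your own hardware), and `foo_padded`, `fp`, and `same_cache_line` are names invented for this example.

```c
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed cache line size in bytes; not guaranteed on every CPU. */
#define CACHE_LINE 64

/* Same fields as struct foo, but each one starts on its own cache line. */
struct foo_padded {
    alignas(CACHE_LINE) int x;   /* read by the summing thread */
    alignas(CACHE_LINE) int y;   /* written by the incrementing thread */
};

static struct foo_padded fp;

/* Compile-time check that x and y are at least one line apart. */
_Static_assert(offsetof(struct foo_padded, y) -
               offsetof(struct foo_padded, x) >= CACHE_LINE,
               "x and y must not share a cache line");

/* Hypothetical helper: do two addresses fall into the same line? */
static int same_cache_line(const void *a, const void *b)
{
    return (uintptr_t)a / CACHE_LINE == (uintptr_t)b / CACHE_LINE;
}
```

With this layout `same_cache_line(&fp.x, &fp.y)` is false, so updates to `fp.y` no longer invalidate the line holding `fp.x`. The price is memory: with a 4-byte `int`, the struct grows from 8 bytes to 128, so this only pays off for data that profiling shows to be contended.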

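With nVIDIA announcing the Tegra 4 [7] at CES, mobile and handheld devices (embedded systems?) have also entered the many-core era (although they went dual-core long ago). Writing programs now means knowing about all these odds and ends.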

That said, with scripting languages such as JavaScript, Python, Ruby, and Lua, how would you ever get control over this level of detail?

[1] Programmatically get the cache line size?, http://stackoverflow.com/questions/794632/programmatically-get-the-cache-line-size
[2] MESI protocol, http://en.wikipedia.org/wiki/MESI_protocol
[3] False sharing, http://en.wikipedia.org/wiki/False_sharing
[4] False sharing問題及其解决方法, http://rritw.com/a/JAVAbiancheng/thread/2011/0604/87966.html
[5] 多核平台下Cache的False Sharing问题, http://blog.csdn.net/duofeng/article/details/1525876
[6] Are there any way to profile cache miss in linux kernel?, http://stackoverflow.com/questions/9394193/are-there-any-way-to-profile-cache-miss-in-linux-kernel
[7] 輝達Tegra 4亮眼台積電代工, http://news.chinatimes.com/tech/171706/122013010800435.html

Shu-Yu Fu

Monday, January 14, 2013
