Review Request: Rough Implementation of LBR on Memtrace #8

Open · wants to merge 111 commits into base: memtrace

Commits (111)
6f281b5
clang zero latency icache consumes 85% less cycles
takhandipu Dec 24, 2019
3a5e8e0
LBR data is defined, but mechanism is not
takhandipu Dec 24, 2019
d97aa92
Implemented LBR
Dec 24, 2019
0db17e3
Semantic bug fix
Dec 24, 2019
3ce9dfb
Checking whether we can pass LBR pointer via MemReq
Dec 24, 2019
3690a2d
LBR uses uint64_t instead of Address to ensure MemReq can use that
Dec 24, 2019
e002c68
remove bits/stdc++.h from lbr.h
Dec 24, 2019
c2130ac
Trying to fix MemReq plain old data type error
Dec 24, 2019
ef7f0fd
Trying to see whether ooo_filter_cache load can take lbr (by feault n…
Dec 25, 2019
4d8b751
Trying to see whether filter_cache load can take lbr (by default nullptr)
Dec 25, 2019
1996ccf
Trying to see whether filter_cache replace and cache_arrays lookup ca…
Dec 25, 2019
2e9c002
LBR logging for a cache_arrays miss is enabled
Dec 25, 2019
bebf08d
Tiny syntactic bug fix
Dec 25, 2019
0917048
Enabling lbr logging for each L1i->load() misses
Dec 25, 2019
b932c7a
Minor semantic bug fix
Dec 25, 2019
6aabb32
Minor implementation bug fix
Dec 25, 2019
5b52963
l1i->load for ooo_core now includes program counter so that we can lo…
Dec 25, 2019
2c5689b
Wrong path L1i fetch does not include LBR or PC logging, Have to talk…
Dec 25, 2019
a77cec4
LBR data for a given BBL is pushed even before calling ooo_core->bbl …
Dec 25, 2019
e6be05b
added missed cache line address when logging LBR, I guess LBR impleme…
Dec 25, 2019
fbd76e1
forgot to add lbr.h in the previous commit, I guess LBR implementatio…
Dec 25, 2019
f7634af
LBR driven prefetch distance based prefetching implementation on zsim…
takhandipu Jan 8, 2020
bd5aab4
Syntax error fix
takhandipu Jan 8, 2020
50829ab
Syntax error fix
takhandipu Jan 8, 2020
bb80432
Syntax error fix
takhandipu Jan 8, 2020
b500b9d
Syntax error fix
takhandipu Jan 8, 2020
39bed0b
Syntax error fix
takhandipu Jan 8, 2020
11ad32c
Trying to fix c++ map SIGFPE
takhandipu Jan 8, 2020
fe83948
Trying to fix c++ map SIGFPE
takhandipu Jan 8, 2020
7d83033
Profile-guided prefetching now works inside zsim, however, the result…
takhandipu Jan 9, 2020
e49e10e
Full BBL tracing is enabled to calculate Fan-out
takhandipu Jan 9, 2020
f50c79d
Semantic error fix
takhandipu Jan 9, 2020
e217caa
Minor error fix
takhandipu Jan 9, 2020
8b3256c
Major bug fix regarding front-end cycle computation on prefetched ins…
takhandipu Jan 10, 2020
10b7368
Initiated the prefetch priority reduction in replacement policy
takhandipu Jan 20, 2020
b97795f
Didn't work, undoing last commit, :-(
takhandipu Jan 20, 2020
56cd56d
Prefetches don't update replacement policy timestamp
takhandipu Jan 20, 2020
47011d0
Previous one segfaulted, let's see whether this one works
takhandipu Jan 20, 2020
c940ffc
Picking the third one worsens the performance
takhandipu Jan 20, 2020
185c8b0
Picking the third one worsens the performance
takhandipu Jan 20, 2020
51b6654
Now picking the third one from the reverse order
takhandipu Jan 20, 2020
2e97283
Now picking the median
takhandipu Jan 20, 2020
e20f6b5
second latest one works best empirically, so we picked the second las…
takhandipu Jan 21, 2020
184b270
Added new bbl-info printing capability to log number of instructions …
takhandipu Jan 22, 2020
e8a5683
Trying pushing on to lbr after the bbl function
takhandipu Jan 23, 2020
d879b9c
Undoing previous commit
takhandipu Jan 23, 2020
ae92a07
Added cfg flags to enable/disable first-pass profile data and prefetc…
takhandipu Jan 24, 2020
8ea842b
Fixed bug in zeroLatencyCache implementation, ideal cache now has acc…
takhandipu Jan 28, 2020
386ea17
Started merging PT trace reader in to the memtrace branch
takhandipu Jan 29, 2020
529e41f
Not yet compiled
takhandipu Jan 29, 2020
9c3f196
Added dummy implementation of trace_reader virtual functions for pt t…
takhandipu Jan 29, 2020
c165df3
Debugging why xed decoding is failing
takhandipu Jan 29, 2020
97b6ee0
Bug fix while calling fill cache from pt trace reader
takhandipu Jan 29, 2020
d2f6391
Making pt instruction valid, but mem ops are still invalid
takhandipu Jan 29, 2020
581f6d4
Making memory ops valid, but memory addresses are still invalid
takhandipu Jan 29, 2020
e3a3053
Implemented the static code bloat fetch impact on the trace and the s…
takhandipu Jan 30, 2020
fe104ee
Prefetch buffer implementation is complete but may not work for cRec …
takhandipu Feb 10, 2020
ed3671b
Minor compile fix
takhandipu Feb 11, 2020
dba5e4b
Minor compile fix
takhandipu Feb 11, 2020
097d34a
Minor compile fix
takhandipu Feb 11, 2020
cf19392
name convention change in dynamorio, due to commit 5ed1a118e820fccae7…
takhandipu Feb 11, 2020
f65a866
Minor update
takhandipu Feb 11, 2020
03e007b
Removing load store match between expected and actual so that PT trac…
takhandipu Feb 11, 2020
9755424
Adding changes for prefetch buffer and hacky solution of jitted code …
Feb 14, 2020
562dd93
moving prefetch buffer from filter cache to ooo filter cache to avoid…
takhandipu Feb 14, 2020
76c6a92
Minor compile fix
takhandipu Feb 14, 2020
eda75c9
Minor compile fix
takhandipu Feb 14, 2020
38a979c
Minor bug fix
takhandipu Feb 14, 2020
b177791
Minor bug fix
takhandipu Feb 14, 2020
3a896cf
Removed BBL decoding cache for PT trace and added bbl info trace for …
takhandipu Feb 15, 2020
72dcd6b
Minor compile fix
takhandipu Feb 15, 2020
0cb2ee6
Minor compile fix
takhandipu Feb 15, 2020
9be9e66
Enabling the disabled assertion after PT decoding cache disable
takhandipu Feb 15, 2020
21719cc
Minor bug fix
takhandipu Feb 15, 2020
df85483
Minor bug fix
takhandipu Feb 15, 2020
f478bd5
Minor update
takhandipu Feb 15, 2020
2794794
Implemented context sensitive improvement on zsim
takhandipu Feb 26, 2020
438bd8c
Minor compile fix
takhandipu Feb 26, 2020
02bd99b
Bug fix
takhandipu Feb 27, 2020
fd70b1b
Added speculative flag on software prefetch so that mGETS does not in…
takhandipu Feb 27, 2020
f7acb03
Replaced original issuePrefetch with my load based prefetch
takhandipu Feb 28, 2020
f5e27cf
minor compile fix
takhandipu Feb 28, 2020
e4726c8
Pulled Heiner's fix from cdf5a3c0031854837639ffeaf3f0ed3a5f48a504
takhandipu Feb 28, 2020
71991a1
Pulled Heiner's fix from cdf5a3c0031854837639ffeaf3f0ed3a5f48a504
takhandipu Feb 28, 2020
921b0e9
reverting back to original issuePrefetch
takhandipu Feb 28, 2020
bdd030a
L1i prefetch now uses issuePrefetch
takhandipu Mar 2, 2020
9a5773a
making isSW flag true for software prefetch
takhandipu Mar 2, 2020
0e52c52
Making prefetches skip 1 to see what is the result
takhandipu Mar 2, 2020
9b28dea
Undoing previous commit which reduces the performance by 14%
takhandipu Mar 2, 2020
fae80ae
Defining MONITOR_MISS_PCS based on Heiner's suggestion
takhandipu Mar 2, 2020
95b7828
monitoring top-25 missed PCs
takhandipu Mar 2, 2020
21e8f8e
Adding a new counter profPrefLateSavedCycles
takhandipu Mar 3, 2020
77baa28
Modified context-sensitive prefetch to support variable sized context
Mar 20, 2020
a5593a3
context size up to 8
Mar 20, 2020
1f85e8c
Minor fix
Mar 20, 2020
45e7ceb
made isSW prefetch false to see the results of all different counters
Mar 27, 2020
67b6298
divided profPrefEarlyMiss counter into profPrefEarlyMiss and profPref…
Mar 29, 2020
1a5f6cc
Added a new entry at the end of the LBR string to map it on to the dcfg
Apr 1, 2020
a004e37
minor bug fix
Apr 1, 2020
5991382
Changed context sensitive prefetch file description again
Apr 3, 2020
febc4d4
minor bug fix context sensitive prefetch
Apr 3, 2020
412a8c1
minor garbage collection
Apr 3, 2020
9a45a0c
AsmDB n-next line prefetching sim parameter is added to simulate the …
Apr 9, 2020
ed7862e
AsmDB miss-based next line prefetching is implemented, let's see whet…
Apr 9, 2020
04b7242
Figuring out verilator next line anomaly
Apr 13, 2020
eec8b6c
Trying to combine context and coalesce in a better way
Apr 14, 2020
fd7fc0d
Added the false positive rate parameter on the context sensitive pref…
Apr 18, 2020
6d222ca
Replace repz's with nops
takhandipu Feb 11, 2021
bd1b17b
repz's could be 4 bytes as well
takhandipu Feb 11, 2021
f5f1059
Added new flag and support to print branch and unconditional-branch d…
takhandipu Feb 18, 2021
d8abe76
Unconditional branch means every branch type other than conditional …
takhandipu Feb 18, 2021
4 changes: 2 additions & 2 deletions SConstruct
@@ -31,7 +31,7 @@ def buildSim(cppFlags, dir, type, pgo=None):
#env['CXX'] = 'g++ -flto -flto-report -fuse-linker-plugin'
#env['CC'] = 'gcc -flto'
env['CC'] = 'gcc7'
env['CXX'] = 'g++-7'
env['CXX'] = 'g++'
#env["LINKFLAGS"] = " -O3 -finline "
if useIcc:
env['CC'] = 'icc'
@@ -122,7 +122,7 @@ def buildSim(cppFlags, dir, type, pgo=None):
##env["CPPFLAGS"] += " -DDEBUG=1"

# Be a Warning Nazi? (recommended)
env["CPPFLAGS"] += " -Werror "
# env["CPPFLAGS"] += " -Werror "
env["CPPFLAGS"] += " -Wno-unused-function "
env["CPPFLAGS"] += " -Wno-int-in-bool-context "

4 changes: 4 additions & 0 deletions source.sh
@@ -0,0 +1,4 @@
export PINPATH=/home/takh/tools/pin-3.11
export XEDPATH=~/git-repos/xed
export LD_LIBRARY_PATH=$XEDPATH/kits/xed-install-base-2019-11-27-lin-x86-64/lib/
export DRIOPATH=/home/takh/git-repos/dynamorio/
25 changes: 23 additions & 2 deletions src/cache_arrays.cpp
@@ -44,12 +44,16 @@ void SetAssocArray::initStats(AggregateStat* parentStat) {
objStats->init("array", "Cache array stats");
profPrefHit.init("prefHits", "Cache line hits that were previously prefetched");
objStats->append(&profPrefHit);
profPrefEarlyMiss.init("prefEarlyMiss", "Prefetched cache lines that were never used or fetched too early so they were already evicted from the cache");
profPrefEarlyMiss.init("prefEarlyMiss", "Prefetched cache lines that were never used or fetched too early so they were already evicted from the cache before 400 cycles from startCycle");
objStats->append(&profPrefEarlyMiss);
profPrefNeverUsed.init("prefNeverUsed", "Prefetched cache lines that were never used or fetched too early so they were already evicted from the cache after 400 cycles from startCycle");
objStats->append(&profPrefNeverUsed);
profPrefLateMiss.init("prefLateMiss", "Prefetched cache lines that were fetched too late and were still in flight");
objStats->append(&profPrefLateMiss);
profPrefLateTotalCycles.init("prefTotalLateCyc", "Total cycles lost waiting on late prefetches");
objStats->append(&profPrefLateTotalCycles);
profPrefLateSavedCycles.init("profPrefLateSavedCycles", "Total cycles saved waiting on late prefetches");
objStats->append(&profPrefLateSavedCycles);
profPrefSavedCycles.init("prefSavedCyc", "Total cycles saved by hitting a prefetched line (also if late)");
objStats->append(&profPrefSavedCycles);

@@ -148,6 +152,7 @@ int32_t SetAssocArray::lookup(const Address lineAddr, const MemReq* req, bool up
profPrefLateMiss.inc();
profPrefLateTotalCycles.inc(*availCycle - req->cycle);
profPrefSavedCycles.inc(req->cycle - array[id].startCycle);
profPrefLateSavedCycles.inc(req->cycle - array[id].startCycle);
if (isHWPrefetch(req)) {
profPrefHitPref.inc();
}
@@ -173,6 +178,13 @@
profPrefNotInCache.inc();
}

#ifdef LOG_L1I_MISS_LBR
if(req->core_lbr)
{
req->core_lbr->log_event(req->pc,lineAddr);
}
#endif

#ifdef MONITOR_MISS_PCS
//Gather Load PC miss stats
if (MONITORED_PCS && isDemandLoad(req)) {
@@ -227,7 +239,16 @@ void SetAssocArray::postinsert(const Address lineAddr, const MemReq* req, uint32
}

if(array[candidate].prefetch) {
profPrefEarlyMiss.inc();

if(array[candidate].startCycle + 400 > req->cycle)
{
profPrefEarlyMiss.inc();
}
else
{
profPrefNeverUsed.inc();
}

if (isHWPrefetch(req)) {
profPrefReplacePref.inc();
}
8 changes: 6 additions & 2 deletions src/cache_arrays.h
@@ -30,7 +30,9 @@
#include "stats.h"
#include "g_std/g_unordered_map.h"
#include "g_std/g_multimap.h"
//#define MONITOR_MISS_PCS //Uncomment to enable monitoring of cache misses
#define MONITOR_MISS_PCS //Uncomment to enable monitoring of cache misses

#define LOG_L1I_MISS_LBR

struct AddrCycle {
Address addr; // block address
@@ -78,7 +80,7 @@ class SetAssocArray : public CacheArray {
uint32_t setMask;

#ifdef MONITOR_MISS_PCS
static const uint32_t MONITORED_PCS = 10;
static const uint32_t MONITORED_PCS = 25;
g_unordered_map<uint64_t, uint64_t> miss_pcs;
g_unordered_map<uint64_t, uint64_t> hit_pcs;
g_unordered_map<uint64_t, uint64_t> late_addr;
@@ -97,8 +99,10 @@

Counter profPrefHit;
Counter profPrefEarlyMiss;
Counter profPrefNeverUsed;
Counter profPrefLateMiss;
Counter profPrefLateTotalCycles;
Counter profPrefLateSavedCycles;
Counter profPrefSavedCycles;
Counter profPrefInaccurateOOO;
Counter profHitDelayCycles;
26 changes: 20 additions & 6 deletions src/filter_cache.h
@@ -31,6 +31,7 @@
#include "galloc.h"
#include "zsim.h"
#include "ooo_core_recorder.h"
#include "lbr.h"

/* Extends Cache with an L0 direct-mapped cache, optimized to hell for hits
*
@@ -128,16 +129,16 @@ class FilterCache : public Cache {
parentStat->append(cacheStat);
}

inline uint64_t load(Address vAddr, uint64_t curCycle, Address pc) {
inline uint64_t load(Address vAddr, uint64_t curCycle, Address pc, LBR_Stack *lbr=nullptr, bool no_update_timestamp=false, bool is_prefetch=false) {
Address vLineAddr = vAddr >> lineBits;
uint32_t idx = vLineAddr & setMask;
uint64_t availCycle = filterArray[idx].availCycle; //read before, careful with ordering to avoid timing races

if (vLineAddr == filterArray[idx].rdAddr) {
if (vLineAddr == filterArray[idx].rdAddr && availCycle < curCycle) {
fGETSHit++;
return MAX(curCycle, availCycle);
return MAX(curCycle+accLat, availCycle);
} else {
return replace(vLineAddr, idx, true, curCycle, pc);
return replace(vLineAddr, idx, true, curCycle, pc, lbr, no_update_timestamp,is_prefetch);
}
}

@@ -149,18 +150,31 @@
fGETXHit++;
//NOTE: Stores don't modify availCycle; we'll catch matches in the core
//filterArray[idx].availCycle = curCycle; //do optimistic store-load forwarding
return MAX(curCycle, availCycle);
return MAX(curCycle+accLat, availCycle);
} else {
return replace(vLineAddr, idx, false, curCycle, pc);
}
}

uint64_t replace(Address vLineAddr, uint32_t idx, bool isLoad, uint64_t curCycle, Address pc) {
uint64_t replace(Address vLineAddr, uint32_t idx, bool isLoad, uint64_t curCycle, Address pc, LBR_Stack *lbr=nullptr, bool no_update_timestamp=false, bool is_prefetch=false) {
//assert(prefetchQueue.empty());
Address pLineAddr = procMask | vLineAddr;
MESIState dummyState = MESIState::I;
futex_lock(&filterLock);
MemReq req = {pc, pLineAddr, isLoad? GETS : GETX, 0, &dummyState, curCycle, &filterLock, dummyState, srcId, reqFlags};
if(lbr)
{
req.core_lbr = lbr;
}
else
{
req.core_lbr = nullptr;
}
req.no_update_timestamp = no_update_timestamp;
if(is_prefetch)
{
req.flags = req.flags| MemReq::SPECULATIVE;
}
uint64_t respCycle = access(req);

//Due to the way we do the locking, at this point the old address might be invalidated, but we have the new address guaranteed until we release the lock
125 changes: 125 additions & 0 deletions src/lbr.h
@@ -0,0 +1,125 @@
#ifndef LBR_H_
#define LBR_H_

#include <stdint.h>
#include <cassert>
#include <deque>
#include <sstream>
#include <string>
#include <fstream>
#include <iostream>
#include <set>
#include <unordered_map>

#define ENABLE_LBR
#define LBR_CAPACITY 32

class LBREntry
{
private:
uint64_t _bbl_address;
uint64_t _cycles; /*elapsed core clocks since last update to the LBR stack*/
public:
LBREntry(uint64_t bbl_address, uint64_t cycles)
{
_bbl_address = bbl_address;
_cycles = cycles;
}
std::string get_string()
{
std::ostringstream os;
os<<_bbl_address<<";"<<_cycles;
return os.str();
}
};

class LBR_Stack
{
private:
std::deque<LBREntry> _queue;
uint64_t last_cycle;
std::ofstream log_file;
std::ofstream full_log_file;
std::ofstream bbl_info_file;
std::ofstream self_modifying_bbl_info_file;
std::set<uint64_t> observed_bbls;
std::unordered_map<uint64_t,std::set<uint32_t>> bbl_size_difference_check;
uint64_t current_bbl_index;
public:
LBR_Stack()
{
last_cycle = 0;
current_bbl_index = 0;
_queue.clear();
}
void set_log_file(const char *path_name)
{
log_file.open(path_name);
}
void set_full_log_file(const char *path_name)
{
full_log_file.open(path_name);
}
void set_bbl_info_file(const char *path_name)
{
bbl_info_file.open(path_name);
std::string tmp(path_name);
tmp+="-self-modifying";
self_modifying_bbl_info_file.open(tmp.c_str());
}
void push(uint64_t bbl_address=0, uint64_t cur_cycle=0, uint32_t instrs=0, uint32_t bytes=0)
{
uint64_t result = cur_cycle;
if(cur_cycle!=0)
{
assert(cur_cycle>=last_cycle);
result=cur_cycle-last_cycle;
last_cycle = cur_cycle;
}
if(full_log_file.is_open())full_log_file<<bbl_address<<","<<result<<std::endl;
LBREntry new_entry(bbl_address, result);
if(likely(_queue.size()==LBR_CAPACITY))
{
_queue.pop_front();
}
_queue.push_back(new_entry);
if(observed_bbls.find(bbl_address)==observed_bbls.end())
{
observed_bbls.insert(bbl_address);
if(bbl_info_file.is_open())bbl_info_file<<bbl_address<<","<<instrs<<","<<bytes<<std::endl;
bbl_size_difference_check[bbl_address]=std::set<uint32_t>();
bbl_size_difference_check[bbl_address].insert(instrs);
}
else if (bbl_size_difference_check[bbl_address].find(instrs)==bbl_size_difference_check[bbl_address].end())
{
bbl_size_difference_check[bbl_address].insert(instrs);
if(self_modifying_bbl_info_file.is_open())self_modifying_bbl_info_file<<bbl_address<<","<<instrs<<","<<bytes<<std::endl;
}
current_bbl_index+=1;
}
std::string get_string()
{
std::ostringstream os;
for(int i=_queue.size()-1; i>-1; i--)
{
os<<_queue[i].get_string()<<",";
}
return os.str();
}
void log_event(uint64_t pc,uint64_t miss_cl_address)
{
if(log_file.is_open())log_file<<miss_cl_address<<","<<pc<<","<<get_string()<<current_bbl_index-1<<std::endl;
}
~LBR_Stack()
{
if(log_file.is_open())log_file.close();
if(full_log_file.is_open())full_log_file.close();
if(bbl_info_file.is_open())bbl_info_file.close();
observed_bbls.clear();
if(self_modifying_bbl_info_file.is_open())self_modifying_bbl_info_file.close();
for(auto it: bbl_size_difference_check)it.second.clear();
bbl_size_difference_check.clear();
}
};

#endif // LBR_H_
4 changes: 4 additions & 0 deletions src/memory_hierarchy.h
@@ -33,6 +33,7 @@
#include "g_std/g_vector.h"
#include "galloc.h"
#include "locks.h"
#include "lbr.h"

/** TYPES **/

@@ -108,6 +109,9 @@ struct MemReq {
//At that point PREFETCH is effectively set for the target cache insertion.
//Use with the 'SPECULATIVE' flag above to separate from demand accesses and to prevent additional reactive prefetches
uint32_t prefetch;

LBR_Stack * core_lbr;
bool no_update_timestamp;

inline void set(Flag f) {flags |= f;}
inline bool is (Flag f) const {return flags & f;}