Paul An edited this page Aug 22, 2022 · 1 revision

CSM (Context-preserving Squash Merge)

  • Why it is needed
    • Tracing from a merge commit in the DAG back to its parents is difficult
    • Converting the DAG into STEMs can lower the complexity, but does not reduce the number of STEMs
  • Approach
    • Collapse each merge commit into a single node for simplicity
    • Carry the messages over so that the context is preserved
    • ! When a commit is a parent of multiple CSM bases, select the leftmost commit as the reference
    • Collect Author, Commit Type, Log Message, etc. and append them to the end of the field
    • For merged PRs, include the PR's additional information (Pull Number, Message, Content)
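
The collapse-and-collect idea above can be sketched as follows. This is a minimal illustration, not the project's implementation; the commit dictionary shape (`id`, `author`, `message` keys) and the helper name `to_csm_node` are assumptions made for the example.

```python
def to_csm_node(merge_commit, squashed_parents):
    """Collapse a merge commit and the commits it absorbs into one CSM node.

    Hypothetical commit structure: each commit is a dict with
    "id", "author", and "message" keys.
    """
    node = dict(merge_commit)  # keep the merge commit as the base node
    absorbed = [merge_commit, *squashed_parents]
    # Collect the authors of every absorbed commit.
    node["authors"] = sorted({c["author"] for c in absorbed})
    # Concatenate the messages so no context is lost by the squash.
    node["message"] = "\n".join(c["message"] for c in absorbed)
    return node

merge = {"id": "m1", "author": "alice", "message": "Merge branch 'feat'"}
parents = [
    {"id": "c1", "author": "bob", "message": "feat: add parser"},
    {"id": "c2", "author": "alice", "message": "fix: parser edge case"},
]
csm = to_csm_node(merge, parents)
```

After the collapse, `csm` is a single node whose `authors` and `message` fields still carry the information of the three original commits.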

Reducing raw PR data to compact PR data

Python

import json

with open("../log/" + repo_name + ".pulls_raw.json", "r", encoding="utf-8") as pulls_json:
    raw_pulls = json.load(pulls_json)

pulls_compact_data = []
for item in raw_pulls:
    body = item["body"]
    pulls_compact_data.append({
        "number": item["number"],
        "state": item["state"],
        "title": item["title"],
        "body": body,
        # Keep a single searchable message field; body may be null in the raw data.
        "message": item["title"] if body is None else item["title"] + " " + body,
        "merge_commit_sha": item["merge_commit_sha"],
        "head": {"sha": item["head"]["sha"]},
        "base": {"sha": item["base"]["sha"]},
        "commitsLink": item["_links"]["commits"]["href"],
        "merged": item["merged"],
    })

with open("../log/" + repo_name + ".pulls_compress.json", "w", encoding="utf-8") as info_json:
    json.dump(pulls_compact_data, info_json, indent="\t")

JS (porting sample)

// Reduce each raw pull request to the same compact shape as the Python code.
return pull_requests.map(({number, state, title, body, merge_commit_sha, head, base, _links, merged}) => ({
  number,
  state,
  title,
  body,
  message: body ? `${title} ${body}` : title,
  merge_commit_sha,
  head: {sha: head.sha},
  base: {sha: base.sha},
  commitsLink: _links.commits.href,
  merged,
}));

Remark

A decision is needed during development on whether to keep using the name commitsLink as-is (it can be changed if there is a better word choice).

Connecting pull / issue info with commit history

Python

import json
import re
import time

import requests

with open("./token.txt", "r") as token_file:
    # strip() removes the trailing newline that readline() keeps
    access_token = "?access_token=" + token_file.readline().strip()

def add_issue():
    origin_file_name = "../log/" + repo_name + ".nlp.json"
    issue_reg = re.compile(r"#\d+")  # issue/PR references such as "#35"

    with open(origin_file_name) as origin_commit_file:
        origin_commits = json.load(origin_commit_file)

    for commit in origin_commits:
        # Drop the leading "#" from every reference found in the message.
        commit["issues"] = [ref[1:] for ref in issue_reg.findall(commit["message"])]

    return origin_commits

def add_pull(origin_commits):
    origin_pull_file_name = "../log/" + repo_name + ".pulls_compress.json"
    final_file_name = "../log/" + repo_name + ".nlp.withissue.json"

    sha2Index = {commit["id"]: idx for idx, commit in enumerate(origin_commits)}

    with open(origin_pull_file_name) as pull_info_file:
        pulls_info = json.load(pull_info_file)

    print("Total pull #: " + str(len(pulls_info)))
    for idx, pull in enumerate(pulls_info):
        r = requests.get(pull["commitsLink"] + access_token)
        if r.ok:
            for commit_info in r.json():
                try:
                    index = sha2Index[commit_info["sha"]]
                except KeyError:
                    continue  # commit not in our history (e.g. from a fork)
                origin_commits[index].setdefault("pulls", []).append(int(pull["number"]))
        else:
            # Rate-limited: wait until enough of the API quota is restored.
            while True:
                print("Wait until the api rate restores...[3 minutes]")
                time.sleep(180)
                remaining_rate = retreive_rate(access_token)
                print("Remaining API Rate: " + str(remaining_rate) + " times")
                if remaining_rate > 2000:
                    break
        if idx % 10 == 0:
            print("Pull #" + str(idx) + " handled")

    with open(final_file_name, "w") as final_file:
        json.dump(origin_commits, final_file, indent=4, separators=(",", ": "))

add_pull(add_issue())

JS (porting sample)

const regex = /#\d+/g;

const commits = origin_commits;

for (const commit of commits) {
  const {message} = commit;
  // match() returns null when the message has no "#<number>" reference.
  const references = message.match(regex) || [];
  commit.issues = references.map(ref => ref.slice(1)); // drop the leading "#"
}

const add_pull = async (origin_commits) => {
  const sha2Index = {};
  origin_commits.forEach((commit, index) => {
    sha2Index[commit.id] = index;
  });

  const pullRequests = pulls_compression;
  for (const [index, pullRequest] of pullRequests.entries()) {
    const response = await axios.get(pullRequest.commitsLink); // TODO : request as octokit
    if (response.status === 200) {
      // axios puts the parsed JSON body on response.data
      for (const info of response.data) {
        const commitIndex = sha2Index[info.sha];
        if (commitIndex === undefined) continue; // commit not in our history
        if ("pulls" in origin_commits[commitIndex])
          origin_commits[commitIndex].pulls.push(+pullRequest.number);
        else
          origin_commits[commitIndex].pulls = [+pullRequest.number];
      }
    } else {
      // TODO : retry as octokit
    }
  }
};

Remark

The regex is used to find commit messages that contain a PR reference such as #35.

Things to verify during development:

  • What type message is (likely Array | Object)
  • What the slice(1) call is doing
  • The response type of the axios call
  • How to handle the parts that are exception-handled in the current Python code
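
The regex and slice behavior questioned above can be checked directly with the Python side as a reference. This minimal sketch shows what `#\d+` extraction plus dropping the leading "#" produces; the sample message is made up for illustration.

```python
import re

# Same pattern as the Python pipeline: references such as "#35" in a message.
issue_reg = re.compile(r"#\d+")

message = "Fix crash on empty input (#35, closes #102)"
# findall() returns the matched substrings; [1:] strips the leading "#"
# (this is what the JS slice(1) step should correspond to).
related_issues = [m[1:] for m in issue_reg.findall(message)]
# related_issues == ["35", "102"]
```

Note that the slice applies to each matched reference, not to the whole message, which suggests the JS port should iterate over regex matches rather than over the message itself.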

Squash Merge