Implementing and Deploying a GitHub Code Review Action with a Local LLM, with GitHub Copilot - 2

Projects / Local LLM code review GitHub Actions · 2025. 3. 3. 21:06 · @ray5273


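Issues found in the previous post

Requests to change the response language did not reliably produce Korean answers.
My custom prompt was not applied well (the review did not recommend places where a logger could be added).
Long inputs and outputs need to be handled.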

Previous post: Implementing and Deploying a GitHub Code Review Action with a Local LLM, with GitHub Copilot - 1 (ray5273.tistory.com)

All changes from the previous post and this post are reflected in the GitHub repository below.

GitHub - ray5273/ollama-pr-review-action (github.com)

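Solution to issue 1 - pipelining two LLMs

In the end, as in the previous post, chaining an LLM for review and an LLM for translation worked best.

Review quality still needs improvement, but translation quality was good enough even with a model of around 8B parameters, so I used one for the translation step.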
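
The two-stage pipeline can be sketched roughly as follows. This is a minimal illustration, not the code from the repository: the function names `ollama_generate` and `review_and_translate`, the prompt wording, and the default `api_url` are assumptions made for the example. It relies only on Ollama's `/api/generate` endpoint with `stream` set to `False`, and the `generate` parameter is injectable so the flow can be exercised without a running server.

```python
import json
import urllib.request


def ollama_generate(api_url: str, model: str, prompt: str) -> str:
    """Send one non-streaming request to Ollama's /api/generate endpoint."""
    payload = json.dumps(
        {"model": model, "prompt": prompt, "stream": False}
    ).encode("utf-8")
    request = urllib.request.Request(
        f"{api_url}/api/generate",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["response"]


def review_and_translate(
    diff: str,
    language: str,
    api_url: str = "http://localhost:11434",  # assumed local Ollama address
    review_model: str = "qwen2.5-coder:32b",
    translation_model: str = "exaone3.5:7.8b",
    generate=ollama_generate,
) -> str:
    """Stage 1: review with the coder model. Stage 2: translate if needed."""
    # Hypothetical prompt wording, for illustration only.
    review = generate(api_url, review_model, f"Review the following diff:\n\n{diff}")
    if language.lower() == "english":
        return review
    translation_prompt = (
        f"Translate the following code review into {language}. "
        f"Keep code blocks and markdown formatting unchanged.\n\n{review}"
    )
    return generate(api_url, translation_model, translation_prompt)
```

Keeping the two stages separate means the review model always works in English, and the smaller translation model only has to rewrite text it is given.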

The actual result was as follows.

Example output: coder LLM for review (Qwen2.5-coder:32B) + translation LLM (exaone3.5:7.8B)

https://github.com/ray5273/ollama-pr-review-action/pull/2#pullrequestreview-2652648232

Removed unnecessary prompt by ray5273 · Pull Request #2 · ray5273/ollama-pr-review-action (github.com)

PR description: check whether it works even without the model already running

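The provided patch introduces several improvements and fixes to the original script, focusing mainly on model management and translation. A detailed breakdown of the changes follows:

1. **Model management functions**:
   - `manage_model`: Handles various model-related operations such as pulling, pushing, pulling via streaming, and deleting a model.
   - `pull_model_stream`: A helper function for pulling a model in streaming mode.

2. **Translation**:
   - `translate_review`: This new function uses a different model (`translation_model`) to translate the generated code review into the specified language.

3. **Code review request logic**:
   - The main code review request logic now includes a cleanup step so that the model is managed properly even when an error occurs.
   - If the response language is not English, the review is translated before being posted to GitHub.

4. **Environment variables**:
   - A `TRANSLATION_MODEL` environment variable was added to specify the model used for translation; its default value is `'exaone3.5:7.8b'`.

5. **Error handling**:
   - The main execution logic is wrapped in a try-except block to catch and print errors during the review process.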

### Detailed explanation of the key changes:

#### Model management
- The `manage_model` function handles various operations on a model:
  ```python
  def manage_model(api_url, model_action, model_name):
      # Handle various model management operations
  ```
  
- The `pull_model_stream` function streams the model pull process:
  ```python
  def pull_model_stream(url):
      response = requests.get(url, stream=True)
      for chunk in response.iter_content(chunk_size=8192):
          if chunk:
              print("Receiving data...")
  ```

#### Translation
- The `translate_review` function translates the review text using the specified model:
  ```python
  def translate_review(api_url, review_text, target_language, translation_model):
      # Translate the review text into the specified language
  ```

#### Code review request logic
- The main code review request logic includes a cleanup step and, where needed, translation:
  ```python
  try:
      review = request_code_review(api_url, github_token, owner, repo, pr_number, model, custom_prompt)
      
      print(f"Generated review: {review}")
      
      # Translate if needed
      if response_language.lower() != "english":
          print(f"Translating review into {response_language} using {translation_model}...")
          review = translate_review(api_url, review, response_language, translation_model)
          print("Translation complete.")
      
      # Post the review to the GitHub PR
      post_review_to_github(github_token, owner, repo, pr_number, review)
  except Exception as e:
      print(f"Error during review process: {str(e)}")
      raise e
  ```

#### Environment variables
- A `TRANSLATION_MODEL` environment variable was added to specify the model used for translation:
  ```python
  translation_model = os.getenv('TRANSLATION_MODEL', 'exaone3.5:7.8b')
  ```

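### Summary
These changes add robust model management and translation to the script, making the code review process more flexible and reliable. The added error handling improves the script's robustness in production environments.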

Occasionally the output comes back untranslated, but I plan to keep improving this little by little.

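Solution to issue 2 - improving the prompt

For the prompt, I read a book on prompting and thought about what would be worth adding.

CoT: Chain of Thought
Prompt design example
1. Add a Role
2. Specify the Audience
3. Add Knowledge/Information
4. Specify the Task/Goal
5. Add Examples
6. Add Policies/Rules - e.g. use simple, easy-to-understand terms
7. Add a Style
8. Add Constraints
9. Add a Format/Structure

Writing good prompts

Make the instructions clear.
Provide appropriate examples.
Give the model time to think.
Break the task down into subtasks.
Provide appropriate context.
Always keep in mind that prompt engineering techniques may not work in some situations.
Write the prompt in a structured form (similar to code).

Those are the points I looked into. Applying the prompt design elements, I added the following prompt.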

Role: You are an expert developer whose sole responsibility is to review pull requests by analyzing only the changed code. 
The changed code is provided in a diff-like format, where lines prefixed with '-' indicate removals and lines with '+' indicate additions. 
Context lines are present for reference but must be ignored in your review.
Audience: Your feedback is aimed at developers responsible for merging code changes. 
The review should help them identify risks and potential issues before integration.
Knowledge/Information: You are provided with a list of filenames and partial file contents. 
You may not have full context of the entire codebase, and libraries or techniques you are unfamiliar with should only be commented on if you are certain of a problem.
Task/Goal: Your objective is to evaluate the changed code and assign a risk score from 1 to 5, where 1 represents minimal risk and 5 indicates changes that are likely to break functionality or compromise safety. 
Your review must focus solely on the negative aspects of the changes, highlighting potential bugs, readability issues, performance problems, and any breaches of SOLID principles. 
Immediately flag any plain-text API keys or secrets as the highest risk.
Policy/Rule: 
1. Only review lines that have been changed (prefixed with '+' or '-'). Ignore context lines.
2. Do not include filenames or the risk score in your detailed feedback.
3. If multiple similar issues are present, only address the most critical one.
4. Provide brief code snippet examples in your feedback using the same programming language as the file under review. For instance, if suggesting a change, use escaped code blocks like: \\`\\`\\`typescript\\n// improved code here\\n\\`\\`\\`.\\n
5. Do not offer praise or compliments; focus strictly on areas of improvement.
Style: Ensure your feedback is concise, clear, and professional. Use markdown formatting with ordered lists for multiple suggestions. Escape all special characters properly: code blocks as \\`\\`\\`typescript\\\\ncode here\\\\n\\`\\`\\`, regular backticks as \\`, newlines as \\n, and double quotes as
Constraints: Do not comment on breaking functions into smaller parts unless it poses a major issue. Avoid critiquing unfamiliar libraries or techniques unless you are certain they cause a problem. 
Your output must be valid JSON with all special characters escaped as required.
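
One way to keep a prompt like this structured "like code" is to store each design element separately and join them at request time. The sketch below is only an illustration under that assumption: `build_system_prompt` and the `sections` dictionary are hypothetical names, and the section texts are shortened versions of the prompt above.

```python
from typing import Dict


def build_system_prompt(sections: Dict[str, str]) -> str:
    """Join labeled prompt sections into one system prompt, one label per line."""
    return "\n".join(f"{label}: {text.strip()}" for label, text in sections.items())


# Shortened, illustrative section texts based on the prompt above.
sections = {
    "Role": "You are an expert developer who reviews pull requests by analyzing only the changed code.",
    "Audience": "Your feedback is aimed at developers responsible for merging code changes.",
    "Task/Goal": "Evaluate the changed code and assign a risk score from 1 to 5.",
    "Policy/Rule": "Only review lines prefixed with '+' or '-'. Do not offer praise.",
    "Style": "Be concise, clear, and professional.",
    "Constraints": "Your output must be valid JSON.",
}

system_prompt = build_system_prompt(sections)
```

With this layout, a single element such as the Policy/Rule list can be edited or swapped per repository without touching the rest of the prompt.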

Solution to issue 3 - unifying the review format

Even when a review is requested with the same model and the same prompt, the review format can vary, and in some cases this affects the translation.

For example, the translation model sometimes cannot tell which parts it is supposed to translate, and the format falls apart.

So I want to make the response format reasonably consistent.

This can be controlled with a feature called structured output.

ollama/docs/api.md at main · ollama/ollama (github.com)

So I added the code below to unify the structure of the review output. It includes:

1. A review for each file
2. Code review feedback for each file (title, details)

The actual code is as follows.

from pydantic import BaseModel
from typing import List


class FeedbackItem(BaseModel):
    title: str
    details: str


class FileReview(BaseModel):
    filename: str
    risk_score: int  # 1-5 scale
    feedback: List[FeedbackItem]


class CodeReviewResponse(BaseModel):
    reviews: List[FileReview]


def generate_review_response(file_reviews):
    """
    Generate the complete code review response combining all file reviews.

    :param file_reviews: List of FileReview objects
    :return: Formatted full review as a string
    """
    response = []

    for review in file_reviews:
        response.append(f"## {review.filename}")
        response.append(f"**Risk Score: {review.risk_score}/5**")
        response.append("")

        for feedback in review.feedback:
            response.append(f"### {feedback.title}")
            response.append(feedback.details)
            response.append("")

    return "\n".join(response)

The classes above can then be passed to the Ollama API from Python code as follows.

# Request code review from Ollama
review_request = {
    'model': model,  # You might want to make this configurable
    'system': complete_system_prompt,
    'prompt': complete_user_prompt,
    'stream': False,
    'format': CodeReviewResponse.model_json_schema()
}
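
On the receiving side, the JSON string returned by the model can be validated against the same Pydantic models before it is rendered. The following is a minimal sketch assuming Pydantic v2 (the same version that provides `model_json_schema`); `parse_review` is a hypothetical helper name, and the models are repeated only so the example is self-contained.

```python
from typing import List, Optional

from pydantic import BaseModel, ValidationError


class FeedbackItem(BaseModel):
    title: str
    details: str


class FileReview(BaseModel):
    filename: str
    risk_score: int  # 1-5 scale
    feedback: List[FeedbackItem]


class CodeReviewResponse(BaseModel):
    reviews: List[FileReview]


def parse_review(raw_response: str) -> Optional[CodeReviewResponse]:
    """Validate the model's JSON output; return None if it does not fit the schema."""
    try:
        return CodeReviewResponse.model_validate_json(raw_response)
    except ValidationError as error:
        # Covers both malformed JSON and JSON with missing or mistyped fields.
        print(f"Model output did not match the schema: {error}")
        return None
```

The `raw_response` argument here would be the `response` field of the Ollama reply. Returning `None` instead of raising lets the caller decide whether to retry the request or skip posting the review.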

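The result itself came out well.

It did what I was aiming for: it reviewed the code and then translated the review while keeping the format.

However, when the number of files to review grew, the output did not contain a review for every file.

Accuracy also dropped as the number of files to review increased.

This is a new issue, described below.

It is still a work in progress, so I have not solved it yet.

I plan to tackle it in the next post as well.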

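Issue 4. When there are many files to review, not every file gets reviewed.

The first fix that comes to mind is to request the code review and review generation file by file, which is what I planned to try.

Doing this naively has some foreseeable problems, listed below.

Reviewing each file separately will probably make it hard for the model to understand the overall context.
Additional code context will probably be needed (the PR's comments should probably go into the query as well).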
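
The per-file idea, with some shared context attached to every request, could look roughly like this. It is a sketch under assumptions rather than the repository's implementation: `build_per_file_prompts` is a hypothetical helper, the PR title and description stand in for the "PR comment" context mentioned above, and `files` is assumed to have the shape returned by GitHub's "list pull request files" endpoint (`filename` plus an optional `patch`).

```python
from typing import Dict, Iterator, List, Tuple


def build_per_file_prompts(
    pr_title: str,
    pr_body: str,
    files: List[Dict[str, str]],
) -> Iterator[Tuple[str, str]]:
    """Yield (filename, prompt) pairs, one per changed file that has a patch."""
    # Shared context repeated in every per-file request.
    context = (
        f"Pull request title: {pr_title}\n"
        f"Pull request description: {pr_body or '(none)'}"
    )
    for changed_file in files:
        patch = changed_file.get("patch")
        if not patch:
            # Binary files and very large diffs come back without a patch.
            continue
        prompt = f"{context}\n\nFile: {changed_file['filename']}\n\n{patch}"
        yield changed_file["filename"], prompt
```

Each prompt can then go through the same review and translation steps as before, so a long pull request becomes several small requests instead of one large one.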

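Before solving this, I figured I first need to add the ability to post review comments per file.

After all, if a single review contains too much, it is hard for the reviewer and the code author to read.

The API below looks like a good fit.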

REST API endpoints for pull request review comments - GitHub Docs (docs.github.com)
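
As a rough sketch of how that endpoint might be used for per-file comments, the helper below builds a `POST /repos/{owner}/{repo}/pulls/{pull_number}/comments` request with `subject_type` set to `"file"`, which attaches the comment to the file rather than to a specific line. `build_file_comment_request` is a hypothetical name, and the function only builds the request so it can be inspected before anything is sent.

```python
import json
import urllib.request


def build_file_comment_request(
    token: str,
    owner: str,
    repo: str,
    pr_number: int,
    commit_id: str,
    path: str,
    body: str,
) -> urllib.request.Request:
    """Build (but do not send) a request that posts a file-level review comment."""
    payload = {
        "body": body,            # review text for this file
        "commit_id": commit_id,  # SHA of the commit being commented on
        "path": path,            # file path relative to the repository root
        "subject_type": "file",  # comment on the whole file, not a single line
    }
    return urllib.request.Request(
        f"https://api.github.com/repos/{owner}/{repo}/pulls/{pr_number}/comments",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Accept": "application/vnd.github+json",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Sending it is then a matter of passing the request to `urllib.request.urlopen`, once per reviewed file.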
