
Meeting Minutes for Week 3 #51

Closed
10 of 12 tasks
SoloSynth1 opened this issue May 10, 2024 · 6 comments

Labels
admin meeting related

Comments

@JohnShiuMK JohnShiuMK changed the title from "Sprint Planning - 2024/05/13 Week 3" to "Meeting Minutes for Week 3" on May 13, 2024
@JohnShiuMK JohnShiuMK added the "admin meeting related" label on May 13, 2024
@tonyshumlh

tonyshumlh commented May 15, 2024

Mentor Meeting - 2024/05/15 Week 3

  • Agenda: 1) partner presentation on Friday; 2) API usage; 3) overview of how to move from the checklist to the application & pipeline
  • Leader Update
    • Shared the message inviting collaboration on the checklist
      • CSV-YAML converter to convert YAML into CSV
    • Consolidated different research papers
    • Engineered the prompt for the checklist
  • Simon's Comments
  • Should allow users to add/edit the checklist for their own use case.
  • Edits should be made in the CSV, since it is easy to edit. The YAML is prepared for the application to consume.
  • Google Sheets allows collaboration. Invite Tiffany/Simon to edit. (Refer to Yingzi's sheet for Simon's comments: https://docs.google.com/spreadsheets/d/16mo6pp76_-msJt93iRhHHv24fQKNzVbENQJXdP-7wrQ/edit#gid=406022926)
  • Consolidate the multiple copies of the checklist into a single source, preferably in CSV.
  • Checklist item references: some common, general checklist items might not require a reference. Use DOI references in the checklist. Own citation key -> first author's last name + year + a keyword
  • Cognitive load: split the System Update into a few slides, one slide per added method/function
  • Consistency: give users some evidence of the current consistency level when they use the application
  • Build an MVP of the code analyzer and revise it later
  • Try switching to GPT 3.5 Turbo, and move the completeness score calculation from the LLM into scripts
  • Improve the prompt engineering with GPT 3.5 Turbo, then move the prompt to GPT 4o and test it there. If GPT 3.5 Turbo performs well, stick with GPT 3.5 Turbo
  • Get the MVP done with 4-5 checklist items (4 straightforward, 1 complicated) before Friday (May 17th)
  • Run a consistency evaluation on the MVP with GPT 3.5 Turbo
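Moving the completeness score calculation from the LLM into a script could look roughly like the sketch below. It assumes, hypothetically, that the LLM returns one status per checklist item ("Satisfied", "Partially Satisfied", "Not Satisfied") and that these map to 1, 0.5 and 0 points. The actual labels and weights are not fixed in these minutes.

```python
from typing import Mapping

# Hypothetical status-to-points mapping; not taken from the project.
STATUS_POINTS = {
    "Satisfied": 1.0,
    "Partially Satisfied": 0.5,
    "Not Satisfied": 0.0,
}


def completeness_score(item_statuses: Mapping[str, str]) -> float:
    """Return the fraction of available points earned across checklist items.

    item_statuses maps a checklist item ID (e.g. "2.1") to the status string
    the LLM returned for that item.
    """
    if not item_statuses:
        # Nothing evaluated (e.g. no tests found): score 0 instead of dividing by zero.
        return 0.0
    earned = 0.0
    for item_id, status in item_statuses.items():
        if status not in STATUS_POINTS:
            raise ValueError(f"Unknown status {status!r} for item {item_id}")
        earned += STATUS_POINTS[status]
    return earned / len(item_statuses)
```

Because the arithmetic is deterministic, the same per-item judgements always produce the same score. Any remaining run-to-run variation then comes only from the LLM's per-item evaluations.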

@tonyshumlh

Possible Issues for the Checklist:

  • Convenience vs Version Control

Possible Issues for the Application:

  • Error handling for variations in the LLM response (e.g. some JSON output might not be easily parsed)
  • Error handling for truncated LLM responses (e.g. the LLM might omit the evaluation of some checklist items)
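Both issues can be caught before the response is used. A minimal sketch, assuming (hypothetically) that the LLM is asked to return a JSON array with one object per checklist item, each carrying an "ID" field:

```python
import json
from typing import Iterable, List

FENCE = "`" * 3  # Markdown code fence marker


class InvalidLLMResponse(ValueError):
    """Raised when an LLM response cannot be used as an evaluation."""


def parse_evaluation(raw: str, expected_ids: Iterable[str]) -> List[dict]:
    """Parse the LLM's JSON output and check that no checklist item is missing."""
    text = raw.strip()
    if text.startswith(FENCE):
        # Models sometimes wrap JSON in a Markdown code fence:
        # drop the opening fence line (with any language tag) and the closing fence.
        text = text.split("\n", 1)[1] if "\n" in text else ""
        text = text.rsplit(FENCE, 1)[0]
    try:
        results = json.loads(text)
    except json.JSONDecodeError as exc:
        raise InvalidLLMResponse(f"Response is not valid JSON: {exc}") from exc
    if not isinstance(results, list):
        raise InvalidLLMResponse("Expected a JSON array of checklist results")

    returned_ids = {str(item.get("ID")) for item in results if isinstance(item, dict)}
    missing = sorted(set(expected_ids) - returned_ids)
    if missing:
        # A truncated response typically drops the last few items.
        raise InvalidLLMResponse(f"Missing checklist items: {missing}")
    return results
```

InvalidLLMResponse subclasses ValueError, so callers can handle parse failures and truncation in a single except clause.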

@tonyshumlh

tonyshumlh commented May 17, 2024

Partner Meeting - 2024/05/17 Week 3

  • Comments
  • A leader/teacher evaluating the report might not know the path of the test function/file. The report should provide the path and line number for easier review -> extract line numbers with scripts
  • Need more information for partially satisfied/unsatisfied checklist items, e.g. add the function name and line number for each test file
  • (Good to have) Output an HTML report, which can be more detailed than the CLI report -> refer to the DSCI522 pytest coverage session
  • Need error handling when there are no test files/functions, e.g. identify the edge case, raise an error, give 0 scores, and present the checklist in a human-readable format
  • Use regression to evaluate whether a parameter (X) is associated with the consistency of the completeness score (Y)
  • To improve consistency, we can 1) do prompt engineering on the checklist, 2) show the explanation and/or uncertainty
  • We can come back to consistency after functionality development and prompt engineering
  • Depth vs breadth for the system: 1) if checklist-oriented, focus on one or a few repos and revise and enrich the checklist; 2) focus on certain test areas and apply them to multiple repos. Focus more on 2) in system development
  • Checklist format: confirmed to use CSV, with conversion into QMD/HTML for viewing
  • For Product 1.0, it is more important for users to read the checklist than to edit it
  • Might need a converter from the code checklist to a human-readable checklist (HTML, PDF)
  • CSV can be embedded into a QMD/HTML file; users can use Pandas to render it as a table
  • Open a GitHub issue for Tiffany to add/review 3-5 checklist items, plus a Slack message, e.g. "review the N-th item on the website"
  • For checklist citations, we can put "General Knowledge" for common-sense items, and Tiffany will review them

@tonyshumlh

tonyshumlh commented May 17, 2024

  • Checklist:
    • Produce a complete checklist (e.g. 20-30 items); focus on and refine the subset used in the application (e.g. 4-5 items) and ask Tiffany to review it
    • Concentrate on 2-3 areas (Data, Model pre-train, Model post-train) with 2-3 items each
    • Convert the current files from YAML to CSV
  • System:
    • Extend the checklist's loader functionality to export the checklist in HTML/PDF format
    • Features:
      • Means to validate LLM output (i.e. Can the output be parsed? Is it truncated, i.e. not covering all checklist items?)
      • Retry logic if the output fails validation
      • Add functionality to parse markdown
      • Make sure the report includes line numbers, file paths, and function names
      • (Optional) Export report into HTML format
      • (Optional) Add multiple calling abilities
      • (Optional) Develop a consistency tool
      • (Optional) Make the LLM output test case specifications
    • Bugfix:
      • Handle repos with no test cases
      • (Optional) Handle repos which are not related to DS
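The retry logic above could be sketched as follows, assuming the validator raises ValueError when the output cannot be parsed or is truncated. An example is a validator that parses the JSON and checks that every checklist item is present. call_with_retry, call_llm and validate are hypothetical names, not the project's API.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def call_with_retry(
    call_llm: Callable[[], str],
    validate: Callable[[str], T],
    max_attempts: int = 3,
    delay_seconds: float = 1.0,
) -> T:
    """Call the LLM until its output passes validation or attempts run out.

    validate must return the parsed result, or raise ValueError when the
    output cannot be parsed or is truncated.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    last_error: Optional[ValueError] = None
    for attempt in range(1, max_attempts + 1):
        raw = call_llm()
        try:
            return validate(raw)
        except ValueError as exc:
            last_error = exc
            if attempt < max_attempts:
                # Linear backoff: wait a little longer before each new attempt.
                time.sleep(delay_seconds * attempt)
    raise RuntimeError(f"LLM output still invalid after {max_attempts} attempts") from last_error
```

Passing the LLM call and the validator in as functions keeps the retry policy independent of the model (GPT 3.5 Turbo or GPT 4o) and of the output format.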

@JohnShiuMK

JohnShiuMK commented May 20, 2024

Partner Meeting Minutes - May 17, 2024

Attendees: John, Orix, Tiffany (Partner), Tony, Yingzi

Key Points Discussed:
System for Researcher Persona

  1. Evaluation Report Output:
    • Include the path and line number of related functions for each checklist item
    • Provide more elaboration behind partial/non-satisfied checklist items (from a teacher’s perspective)
    • Render the report (including score, summary, and breakdown) into HTML format
    • Refer to examples from DSCI522 Pytest coverage in HTML format
  2. Edge Case Handling:
    • Example 1: If a repository has no test files or functions, the system may output a message like "there are no test cases in this repo."
    • Example 2: Detect and handle cases where the project is not related to Machine Learning
  3. Focused Development:
    • Focus on 3-5 checklist items
    • (First) Build the system in depth based on these items using one repository (lightfm)
    • (Then) Apply the system to 4-5 other repositories.
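For edge case 1, a simple check for missing test files could look like this. It is a sketch that assumes pytest's default file patterns (test_*.py and *_test.py); a project can override these in its pytest configuration. All function names are hypothetical.

```python
from fnmatch import fnmatch
from pathlib import Path, PurePath
from typing import Iterable, List, Optional

# pytest's default `python_files` patterns; a repo may configure different ones.
TEST_FILE_PATTERNS = ("test_*.py", "*_test.py")


def is_test_file(path: str) -> bool:
    """Match the file name (not the directories) against the test file patterns."""
    name = PurePath(path).name
    return any(fnmatch(name, pattern) for pattern in TEST_FILE_PATTERNS)


def list_python_files(repo_path: str) -> List[str]:
    """List all .py files under the repository, relative to its root."""
    root = Path(repo_path)
    return sorted(str(p.relative_to(root)) for p in root.rglob("*.py"))


def no_tests_message(paths: Iterable[str]) -> Optional[str]:
    """Return a user-facing message when no test files exist, otherwise None."""
    if not any(is_test_file(p) for p in paths):
        return "There are no test cases in this repo."
    return None
```

Usage: no_tests_message(list_python_files(repo_path)) on a cloned repository. File discovery is kept separate from the check so the check can be tested without touching the file system.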

System Evaluation for Ourselves (System Developer Persona)

  1. "Completeness Score" Consistency Metrics:
    • Examine the Consistency improvement using a regression model (Response: Y = Consistency; Explanatory variable: X = the System Change)
    • Continue working on prompt engineering to minimize uncertainty; and/or,
    • Consider outputting the uncertainty of the Score/Evaluation along with the report and explanation

Checklist for Leader Persona

  1. Checklist Format and Visualization:
    • Confirmed to use CSV format of the checklist as the single source of truth
    • As a version 1.0 of the System, we will focus on users reading the checklist instead of editing it
    • Convert the checklist CSV(s) into a human-readable HTML format using Pandoc or Quarto to facilitate visualization
  2. Checklist Collaboration:
    • Use GitHub issues + Slack for communication with Tiffany
    • Focus each communication on 3-5 checklist items or one area of items instead of the entire checklist
    • For checklist citation, use "General Knowledge" for common sense items. Tiffany will review these citations
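Pandoc or Quarto were named for the conversion. As a lighter stdlib-only sketch of the same idea, the checklist CSV can be rendered directly as an HTML table. The column names used with it are hypothetical.

```python
import csv
import html
import io


def checklist_csv_to_html(csv_text: str) -> str:
    """Render checklist CSV text (first row = header) as an HTML table."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return "<table></table>"
    header, body = rows[0], rows[1:]
    lines = ["<table>", "  <thead>", "    <tr>"]
    # Escape every cell so characters like < or & in checklist text render literally.
    lines += [f"      <th>{html.escape(col)}</th>" for col in header]
    lines += ["    </tr>", "  </thead>", "  <tbody>"]
    for row in body:
        cells = "".join(f"<td>{html.escape(cell)}</td>" for cell in row)
        lines.append(f"    <tr>{cells}</tr>")
    lines += ["  </tbody>", "</table>"]
    return "\n".join(lines)
```

With Pandas available, pd.read_csv(path).to_html(index=False) produces a comparable table. Either output can be embedded in a QMD page.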

@JohnShiuMK

Proceed to #72
