My current CPU is a Ryzen 7 1700X, my GPU a Radeon 5700 XT 8GB, with 32GB of RAM. I'm running Tabby through Vulkan, and I mainly code in Python and JavaScript.

I tried several models, and balancing low latency against answer accuracy, the ones that worked best were StarCoder-3B and DeepSeekCoder-1.3B. I looked at the benchmarks of each model in their repos and saw that DeepSeekCoder outperforms StarCoder, but in my tests (creating an example JS file that handles axios API calls and data-to-CSV conversion) StarCoder gave me more accurate suggestions, while being a bit slower. I also checked the ML leaderboards provided by Tabby, and I saw that there are plans to add StarCoder2-3B to the Tabby model registry (not yet available when I checked).

I'm relatively new to using AI models to help me work, so maybe that is also one of the reasons for my doubts. I also tend to prefer OSS; I checked both licenses and they seem similar to me. Should I trust the benchmarks and use DeepSeekCoder-1.3B, or trust what I observed in my tests and stick with StarCoder-3B? Or switch to StarCoder2-3B when it becomes available? For chat I'm currently using WizardCoder-3B.

Benchmarks I checked:
Thanks.
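For context, the comparison task described above could be sketched roughly like this. The helper name and field names are hypothetical (the actual prompt and file may differ), and the axios call is shown only in a comment so the snippet stays self-contained:

```javascript
// Sketch of the benchmark task: fetch JSON with axios, flatten it to CSV.
// Only the CSV-conversion half is executable here; the fetch is illustrative.

// Convert an array of flat objects to a CSV string.
function toCsv(rows) {
  if (rows.length === 0) return "";
  const headers = Object.keys(rows[0]);
  const escape = (value) => {
    const s = String(value ?? "");
    // Quote fields containing commas, quotes, or newlines (RFC 4180 style).
    return /[",\n]/.test(s) ? `"${s.replace(/"/g, '""')}"` : s;
  };
  const lines = [headers.join(",")];
  for (const row of rows) {
    lines.push(headers.map((h) => escape(row[h])).join(","));
  }
  return lines.join("\n");
}

// The fetch step with axios would look like (not executed here):
//   const { data } = await axios.get("https://example.com/api/users");
//   fs.writeFileSync("users.csv", toCsv(data));

module.exports = { toCsv };
```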
LLM evaluation is actually the most mystical part of the entire ecosystem, a combination of scientific measurement and intuitive feel. :)

In general, we receive pretty good feedback regarding the performance of DeepSeekCoder-6.7B (which tops the leaderboard). We also see some mixed feedback between DeepSeekCoder-1.3B and StarCoder-3B. If a particular model runs better in your environment but performs poorly on the leaderboard, it's likely that your working setup is simply closer to that model's training distribution.

Anyway, our suggestion is to use the leaderboard as a reference and stick with the model that feels best to you. Lastly, FYI: for enterprise or team use cases, we offer consultative support on fine-tuning models for better performance.