- Name: Hyunwoong Ko (Kevin Ko)
- Birth: 1995.09.12
- Job: Software Engineer / AI Researcher
- Languages
- Python: Excellent
- Java: Excellent
- C++: Working knowledge
- Topics
- Language model pre-training
- Language model alignment
- Language model optimization
- Prompt programming
- Korean language processing
- Data processing and crawling
- Distributed programming
- DevOps / MLOps
- Backend development
- Libraries
- Language model training: PyTorch, Transformers, Megatron-LM, DeepSpeed, TRL and OpenRLHF
- Language model evaluation: LM-evaluation-harness
- Language model optimization: TorchScript, Triton Inference Server, ONNX and TensorRT
- DevOps: Docker, Kubernetes, ECS, EKS, GKE, GitHub Actions and CodePipeline
- Backend development: Flask, FastAPI and Spring Boot
- GPA
- 4.15 (major) / 4.07 (total)
- Ranked 1st.
- Open-source projects
- Transformer: PyTorch implementation of Attention Is All You Need
- KoChat: The first open-source Korean chatbot framework
- Awards
- ETC
- I founded an AI robot startup for autistic children.
- Researches
- I conducted research on citrus pest and disease recognition.
- I conducted research on automatic strabismus recognition.
- ETC
- I was a lecturer at the 2020 Data Campus School held by the Korea Data Agency.
[2020.08 ~ 2021.02] Machine Learning Engineer at Kakao Brain
- BrainQA
- I researched a Korean quiz generation model.
- This model was integrated into the Pororo library.
- Pororo
- Pororo is an open-source multilingual NLP toolkit.
- I developed almost all of the generative models in Pororo, such as Question Generation, Text Summarization and Machine Translation.
- ETC
- I hosted Jiphyeonjeon, a natural language processing paper review group.
[2021.03 ~ 2023.05] Co-Founder & Machine Learning Engineer at TUNiB
- Coco & Mas
- Coco & Mas are Korean persona chatbots with dog personas.
- We collected a Korean chatbot dataset with crowd workers and tested crowdsourcing methods to improve data quality and yield.
- I pre-trained a 1.3B-parameter Korean model to create these chatbots and fine-tuned it using the data we collected.
- I researched the impact of pre-training and continual learning techniques.
- I deployed these models with Triton Inference Server on an AWS ECS cluster.
- BLOONY
- BLOONY is an English chatbot powered by OpenAI GPT.
- I developed the backend server using Java Spring Boot.
- I researched the instruction-following ability of OpenAI GPT-3 and discovered an innovative prompting methodology. At that time there were no instruction fine-tuned models, so the models' instruction-following ability was poor. The methodology I devised became famous a year later as Chain-of-Thought (CoT) prompting.
- I deployed the overall services using an AWS ECS cluster.
- TUNiBridge
- TUNiBridge is a service that provides APIs for various natural language models.
- I improved the safety check module's throughput from 7 to 240 TPS using NVIDIA TensorRT, Triton Inference Server and AWS ECS.
- TUNiB's N행시 (Korean acrostic) service was very popular in Korean Internet communities for about two weeks, receiving about 2 million requests per day. I improved the existing system to keep the service stable under this load.
- I designed and developed the overall system and deployed 20+ APIs.
- Open-source projects
- OpenChat: An easy-to-use open-source chatting framework based on neural networks
- Kss: The most popular Korean sentence segmentation toolkit
- Pecab: A pure-Python morpheme analyzer based on Mecab-ko-dic
- Large-scale LM Tutorials: Large-scale language modeling tutorials with PyTorch
- Parallelformers: An easy-to-use transformer model deployment toolkit based on Hugging Face Transformers. The core mechanism of this library was integrated into Microsoft DeepSpeed Inference.
- Awards
- ETC
- I became a manager of Chatbot Korea, a Korean Facebook group for chatbot research.
[2022.02 ~ 2023.09] Lead Machine Learning Scientist at EleutherAI
- Polyglot
- Polyglot is EleutherAI's multilingual modeling project.
- We collected 1.2 TB of Korean text for language model pre-training.
- We released 1B, 3B, 6B and 13B models trained on this large-scale Korean dataset.
- We published a technical report on Polyglot-Ko.
- I managed all members of the Polyglot team and developed the dataset preprocessing, training and evaluation pipelines.
- Japanese StableLM
- We released the Japanese StableLM models in collaboration with Stability AI Japan.
- We released a 7B foundation language model and an instruction fine-tuned model.
- OSLO
- OSLO is a framework that provides various GPU-based optimization technologies for large-scale modeling.
- Its key features, such as 3D parallelism and kernel fusion, are useful when training large models.
- I developed the Tensor Parallelism, Pipeline Parallelism, Kernel Fusion and Activation Checkpointing engines.
[2023.05 ~ present] Machine Learning Researcher at Kakao Brain
- Foundation Model Pre-training
- I pre-trained KoGPT2, the foundation model at Kakao Brain.
- Among its various sizes, the largest is over 60B parameters.
- Long Context Fine-tuning
- I conducted research to extend KoGPT2 to long contexts.
- I tested various existing techniques and internal ideas.
- I successfully extended the model's context length from 4k to 32k and 65k.
- Multimodal Fine-tuning
- I conducted research on extending KoGPT2 for multimodal tasks, primarily text-image input and text output.
- I combined vision encoders with auto-regressive language models, effectively handling image inputs.
- Codebase Management
- I am managing various internal codebases.
- Notably, Megatron-LM for pre-training and OpenRLHF for alignment.
- ETC
- I developed GKE (Google Kubernetes Engine) based evaluation pipelines.
- I improved the data deduplication pipeline, doubling its speed.
- I improved monitoring / planning methods for language model pre-training.
- GitHub: https://github.com/hyunwoongko
- Twitter: https://twitter.com/hyunwoongko
- Facebook: https://www.facebook.com/hyunwoongko
- Instagram: https://www.instagram.com/hyunwoong.ko
- LinkedIn: https://www.linkedin.com/in/hyunwoongko