Safety & Security
thinking about how RL fine-tuning reasoning models on cyber tasks makes a lot of sense.
found this paper: https://arxiv.org/pdf/2406.05590
it also mentions the idea:
```This also presents opportunities for improving LLM reasoning
capabilities through unsupervised learning or reinforcement learning, where models can attempt
challenges repeatedly, with success serving as a signal for model improvement.```
according to chatgpt it hasn't been done yet, seems like an interesting research project
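rough sketch of what "success serving as a signal" could look like, just to make the idea concrete. this is a toy, not the paper's method: there's no LLM here, only a softmax policy over a handful of candidate actions updated with REINFORCE, and the reward is 1 only when the attempt "solves" the challenge. `attempt_challenge`, `solving_action`, and all the hyperparameters are made up for illustration.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def attempt_challenge(action, solving_action):
    """Hypothetical stand-in for a CTF environment.

    Returns a binary reward: 1.0 if the attempt captures the flag, else 0.0.
    """
    return 1.0 if action == solving_action else 0.0


def train(num_actions=5, solving_action=3, episodes=2000, lr=0.5, seed=0):
    """Repeatedly attempt the challenge and reinforce whatever succeeds."""
    rng = random.Random(seed)
    logits = [0.0] * num_actions  # toy "policy" parameters (assumption)
    baseline = 0.0                # running average reward, reduces variance

    for _ in range(episodes):
        probs = softmax(logits)
        action = rng.choices(range(num_actions), weights=probs)[0]
        reward = attempt_challenge(action, solving_action)
        advantage = reward - baseline

        # REINFORCE: d log pi(action) / d logit_i = onehot(action)_i - probs_i
        for i in range(num_actions):
            grad = (1.0 if i == action else 0.0) - probs[i]
            logits[i] += lr * advantage * grad

        baseline += 0.05 * (reward - baseline)

    return softmax(logits)


if __name__ == "__main__":
    final_probs = train()
    print([round(p, 3) for p in final_probs])
```

in a real setup the policy would be the reasoning model, an "action" would be a whole multi-step attempt at the challenge, and the reward would come from checking the captured flag, but the loop has the same shape.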