AI Friends

Been thinking that RL fine-tuning reasoning models on cyber tasks makes a lot of sense. Found this paper: https://arxiv.org/pdf/2406.05590 It mentions the same idea: ```This also presents opportunities for improving LLM reasoning capabilities through unsupervised learning or reinforcement learning, where models can attempt challenges repeatedly, with success serving as a signal for model improvement.``` According to ChatGPT it hasn't been done yet, so it seems like an interesting research project.
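
To make the "success as a signal" idea concrete, here's a rough toy sketch, not anything from the paper. Everything in it is made up for illustration: the action names, the `attempt` stub standing in for a real challenge environment, and the flag string. It swaps the LLM for a tiny softmax policy and uses plain REINFORCE with a running baseline, where the only reward is whether the flag was captured.

```python
import math
import random

# Hypothetical toy setup: only one of these made-up actions yields the flag.
ACTIONS = ["strings", "fuzz", "rop_chain", "sql_inject"]
FLAG = "flag{toy}"


def attempt(action):
    """Stand-in for running an agent against a challenge; returns its output."""
    return FLAG if action == "rop_chain" else ""


def reward(output):
    """Binary success signal: 1.0 if the flag was captured, else 0.0."""
    return 1.0 if output == FLAG else 0.0


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def train(episodes=500, lr=0.5, seed=0):
    """REINFORCE on a one-step task, with a running-average baseline."""
    rng = random.Random(seed)
    logits = [0.0] * len(ACTIONS)
    baseline = 0.0
    for _ in range(episodes):
        probs = softmax(logits)
        # Sample one attempt from the current policy.
        i = rng.choices(range(len(ACTIONS)), weights=probs)[0]
        r = reward(attempt(ACTIONS[i]))
        advantage = r - baseline
        baseline += 0.1 * (r - baseline)
        # Gradient of log pi(i) w.r.t. logit j is (1[j == i] - probs[j]).
        for j in range(len(ACTIONS)):
            indicator = 1.0 if j == i else 0.0
            logits[j] += lr * advantage * (indicator - probs[j])
    return softmax(logits)


final_probs = train()
for name, p in zip(ACTIONS, final_probs):
    print(f"{name}: {p:.3f}")
```

A real version would replace `attempt` with an actual challenge environment and the softmax table with a model's token-level policy, but the reward plumbing (try, check for success, reinforce) would have the same shape.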