Using Vector Steering to Improve Model Guidance | by Matthew Gunton | Oct, 2024


Large language models are complex and don't always give good answers. To remedy this, people try many different techniques to guide the model's output. We've seen pre-training on larger datasets, pre-training models with more parameters, and using a vector database (or some other form of lookup) to add relevant context to the LLM's input. All of these bring some improvement, but no method today is foolproof.

One interesting way to guide the model is vector steering. A well-known example of this is the Claude Golden Gate Bridge experiment. Here we see that no matter what the user asks, Claude will find some clever way to bring up its favorite topic: the Golden Gate Bridge.

Image from "Scaling Monosemanticity: Extracting Interpretable Features from Claude 3 Sonnet" showing Claude Sonnet's behavior change with a steering vector
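
To make the idea a bit more concrete before getting into the research, here is a minimal toy sketch of what "steering" means at its simplest: nudging an internal activation along a chosen direction. Everything in it is an assumption for illustration. The `steer` helper, the 4-dimensional vectors, the `bridge_direction` name, and the `strength` value are all made up, and this is not the procedure used in the Golden Gate Bridge experiment. Real steering is applied to a transformer's hidden activations during a forward pass.

```python
# Toy illustration only: real steering operates on a transformer's hidden
# activations, not on hand-written lists like these.

def steer(hidden_state, steering_vector, strength=1.0):
    """Return hidden_state shifted along steering_vector by `strength`."""
    if len(hidden_state) != len(steering_vector):
        raise ValueError("vectors must have the same dimension")
    return [h + strength * s for h, s in zip(hidden_state, steering_vector)]


def dot(a, b):
    """Dot product, used here to measure alignment with a direction."""
    return sum(x * y for x, y in zip(a, b))


# Hypothetical 4-dimensional activation and feature direction.
hidden = [0.5, -1.0, 0.25, 2.0]
bridge_direction = [0.0, 1.0, 0.0, 0.0]

steered = steer(hidden, bridge_direction, strength=4.0)

# How strongly each activation points along the chosen direction.
before = dot(hidden, bridge_direction)
after = dot(steered, bridge_direction)
```

In this sketch, `after` is larger than `before`, which is the whole point: the activation now points more strongly along the chosen direction, while every other component is left untouched.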

Today I'll be going through the research done on this topic and also explaining Anastasia Borovykh's excellent code implementation. If you're interested in learning more about this topic, I highly recommend checking out her video.

Let’s dive in!
