Discussion about this post

Logan Thorneloe:

Reading back through this again, and I have some thoughts:

Can you elaborate more on this: "just don’t give it the capability to do that to begin with"? I'm guessing it's a simplified way of saying that if you don't want the model to be able to expose a certain piece of data, you shouldn't train it on that data in the first place, but I just want to make sure I understand.
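If that's the right reading, the concrete version I'm picturing is scrubbing or dropping sensitive records before training ever happens, rather than trying to align the behavior away afterwards. A minimal sketch, assuming a plain list-of-strings corpus and a couple of toy PII patterns (all of this is my own illustration, not anything from the post):

```python
import re

# Toy pre-training filter: drop any record containing obvious PII
# (emails, phone numbers) so the model never sees it at all.
# The patterns and record format are assumptions for illustration only.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
PHONE = re.compile(r"\+?\d[\d\s().-]{7,}\d")

def contains_pii(text: str) -> bool:
    """True if the text matches any of the simple PII patterns above."""
    return bool(EMAIL.search(text) or PHONE.search(text))

def filter_corpus(records: list[str]) -> list[str]:
    """Keep only records with no detectable PII, before any training happens."""
    return [r for r in records if not contains_pii(r)]

if __name__ == "__main__":
    corpus = [
        "The quick brown fox jumps over the lazy dog.",
        "Reach me at jane.doe@example.com or +1 (555) 123-4567.",
    ]
    print(filter_corpus(corpus))  # only the first record survives
```

A real pipeline would obviously need far better detectors than two regexes, but the point stands: data the model never sees can't be regurgitated later.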

I find alignment super interesting because I don't think anyone (at commercial scale, at least) has done it well yet. You have examples like this one, where privacy is clearly an issue. There are also examples like Bard, where it doesn't seem to leak private info, but the alignment hinders the core functionality because it constantly tells me it doesn't have access to things of mine it should have access to. As you kind of touched on toward the end, it makes me wonder how alignment-as-a-service will play out. Will it be helpful? Will it work? Will it scale?

The memorization definition makes me realize just how much we don't understand about evaluating LLMs. I feel like there should be a better way to quantify how much training data can be extracted, but I can't think of one myself.
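For context, my mental model of the kind of check that definition implies is roughly: prompt the model with a prefix that appears in the training data and see whether its continuation reproduces the true suffix. Here's a rough, character-level sketch, where `generate` is just a stand-in for whatever model API you have and the lengths are arbitrary (all assumptions on my part, not from the paper):

```python
from typing import Callable

def is_memorized(train_doc: str,
                 generate: Callable[[str], str],
                 prefix_len: int = 200,
                 suffix_len: int = 50) -> bool:
    """Prompt with a prefix from the training data and check whether the
    model's continuation reproduces the true suffix verbatim. A crude
    character-level stand-in for token-level memorization checks."""
    prefix = train_doc[:prefix_len]
    true_suffix = train_doc[prefix_len:prefix_len + suffix_len]
    continuation = generate(prefix)
    return continuation.startswith(true_suffix)

if __name__ == "__main__":
    doc = "x" * 300                           # stand-in for one training document
    parrot = lambda p: doc[len(p):]           # a "model" that memorized everything
    amnesiac = lambda p: "something new..."   # a "model" that memorized nothing
    print(is_memorized(doc, parrot))    # True
    print(is_memorized(doc, amnesiac))  # False
```

Averaging that boolean over many sampled documents gives a memorization rate, but it misses anything that can be surfaced with prompts other than the original prefix, which is probably why this feels so hard to quantify well.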

Milan Reichl:

The DeepMind research on extracting training data from ChatGPT is a real eye-opener. It challenges our assumptions about AI security and the effectiveness of alignment in preventing data leaks. This discovery is a reminder that in AI development, sometimes the simplest solution—limiting certain capabilities from the start—is more effective than trying to fine-tune our way out of potential problems. The nuances in AI behavior, especially in large language models, underscore the importance of a cautious and critical approach in AI research and application.

