9 Comments

Amazing, congrats on cracking the top 10 in such a short amount of time! It's incredible!

Thank you. Appreciate the support

Dev, congrats on getting into the top ten! Very cool.

Today I learned about the stripper index.

Thank you. The stripper index blew my mind as well

"She blew my nose and then she blew my mind" just came to mind. BRB, divergent thinking time!

Such a great update. Thank you.

Glad you liked it

While the certification method for adversarial prompting is interesting, if the false positive rate for safe prompts is as high as the false negative rate for adversarial prompts, it's a dead letter. The history of technologies such as voice recognition has shown that technology that doesn't achieve a success rate close to 99% for the vast majority of interactions (the real interactions, which in this case do not involve adversarial prompts) won't be widely adopted.

I'm sure their method can be tuned. But I'd bet that it's unlikely to be able to achieve a <10% false negative rate for adversarial prompts while also achieving <1% false positive rate for safe prompts.

So, while this might be one method in a toolbox to address adversarial prompts, I expect that certification to meet practical real-world needs would require the use of a few different methods in parallel.

I’d totally forgotten about the stripper index. It’s truly mind-blowing.
