While the certification method for adversarial prompting is interesting, if the false positive rate for safe prompts is as high as the false negative rate for adversarial prompts, it's a dead letter. The history of technologies such as voice recognition has shown that a technology that doesn't achieve close to a 99% success rate for the vast majority of interactions (the real interactions, which in this case do not involve adversarial prompts) won't be widely adopted.
I'm sure their method can be tuned. But I'd bet that it's unlikely to achieve a <10% false negative rate for adversarial prompts while also achieving a <1% false positive rate for safe prompts.
So, while this might be one method in a toolbox to address adversarial prompts, I expect that certification that meets practical real-world needs would require the use of a few different methods in parallel.
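To put rough numbers on that argument, here is a minimal sketch of the arithmetic in Python. The traffic volume and the share of adversarial prompts are made-up assumptions for illustration, not figures from the paper, and `filter_outcomes` is a hypothetical helper, not anything from their method.

```python
def filter_outcomes(total_prompts, adversarial_share,
                    false_positive_rate, false_negative_rate):
    """Count the two kinds of mistakes a prompt filter makes.

    false_positive_rate: fraction of SAFE prompts wrongly blocked.
    false_negative_rate: fraction of ADVERSARIAL prompts wrongly let through.
    All inputs are illustrative assumptions.
    """
    adversarial = total_prompts * adversarial_share
    safe = total_prompts - adversarial
    return {
        "safe_blocked": safe * false_positive_rate,
        "adversarial_missed": adversarial * false_negative_rate,
    }


# Assumed traffic: 1,000,000 prompts, of which 0.1% are adversarial.
equal_rates = filter_outcomes(1_000_000, 0.001, 0.10, 0.10)
tuned_rates = filter_outcomes(1_000_000, 0.001, 0.01, 0.10)

print(equal_rates)
print(tuned_rates)
```

Under these assumed numbers, a filter with 10% error in both directions blocks about 99,900 safe prompts in order to catch 900 of the 1,000 adversarial ones. Even at a 1% false positive rate it still blocks about 9,990 safe prompts, which is why the false positive rate on ordinary traffic dominates whether anyone would actually deploy it.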
Amazing, congrats on cracking the top 10 in such a short amount of time! It's incredible!
Thank you. Appreciate the support.
Dev, congrats on getting into the top ten! Very cool.
Today I learned about the stripper index.
Thank you. The stripper index blew my mind as well
"She blew my nose and then she blew my mind" just came to mind. BRB, divergent thinking time!
Such a great update. Thank you.
Glad you liked it
I’d totally forgotten about the stripper index. It’s truly mindblowing.