4 Comments

I have mentioned this excellent write-up on moral alignment in my newsletter 'AI For Real' this week. Here's the link https://aiforreal.substack.com/p/are-you-ai-positive-or-ai-negative


This is pretty thorough Devansh, well done.

I would like to say that moral alignment is crucial, yet the 11% of participants who did not feel the moral graph was fair is a huge number! Think about it: that 11% alone could be "all Indian men" or "all Latino men" for any set of questions, because morality is so complex that it will never come down to just one issue or question. We know better than that, of course, and the likelihood of such a situation arising (i.e., one in which a single population is targeted solely on the basis of its ethnicity) is extremely low. Still, it points to how morally difficult it is to reach consensus across populations.

I think we should still make efforts to elicit moral graphs. On the other hand, I do not think it is moral or ethical to appease end users who hold unscientific and potentially immoral or unethical beliefs. This means that AGI companies will have to "steer" certain users toward less extreme views, which can itself be considered unethical (though not immoral), and I am willing to accept them doing this with the most extreme users.

In summary, moral graphs seem to be an intuitive and worthwhile area of AI safety research: they point toward a cohesive and potentially coherent superset of values grounded in morality and ethics, which is very important for future machine learning models. On the other hand, there are still issues of representativeness, such as the 11% of users who felt the generated moral graph was not fair. We need to ensure that that 11% does not correspond to a single class along dimensions such as ethnicity, socioeconomic status, or religious belief.

At the same time, we need to recognize that there is a large group of users with unethical and immoral beliefs, and AGI companies cannot "compromise" with or "accommodate" these users; whether the end result is to alienate them or to steer them toward a more reasoned view, that outcome should be acceptable.

Is my view reasonable or itself unethical or immoral? I am willing to change my mind and views as I learn and improve as a human.
