We investigate how to build generative AI systems that can produce and manipulate content across a wide range of modalities, including text, images, audio, and video. Generative models hold enormous potential for creativity, communication, and problem-solving, but realizing that potential requires addressing key challenges in coherence, controllability, and safety. Our research explores how to design models that generate content with consistency and fidelity, follow user intent accurately, and remain reliable under diverse and complex prompts. We are particularly interested in the foundations of generative alignment: how to ensure that model outputs are not only plausible and creative, but also grounded, trustworthy, and appropriate for real-world use.