  • UC Berkeley
  • College of Letters & Science
Berkeley University of California

URAP

Project Descriptions
Spring 2025


Enhancing Safety and Trustworthiness in LLM Agents

Dawn Song, Professor  
Electrical Engineering and Computer Science  

Applications for Spring 2025 are closed for this project.

Large Language Model (LLM) agents are increasingly deployed in diverse and critical applications, ranging from customer service to decision support systems. As these agents become integral to more domains, ensuring their safety and trustworthiness is essential for preventing misuse and unintended behavior and for building user confidence. This project focuses on developing methodologies and frameworks specifically tailored to enhance the robustness, reliability, and ethical alignment of LLM-based agents.

Key areas of exploration include:
- Robustness Against Adversarial Prompts: Designing techniques to make LLM agents resistant to malicious or misleading inputs that could induce erroneous or harmful responses, including the various forms of prompt injection attack.
- Alignment with User Intentions: Developing methods to ensure that LLM agents accurately understand and align their responses with user intentions and contextual needs.
- Ethical Frameworks for LLM Agents: Establishing guidelines and mechanisms to ensure that LLM agents adhere to ethical standards, respect user privacy, and align with societal values in their interactions.

Role: Participants will engage in both theoretical research and practical implementation, collaborating with interdisciplinary teams to address real-world challenges in the safety and trustworthiness of LLM agents.

Qualifications:
- Strong background in machine learning and AI, with a focus on natural language processing.
- Familiarity with ethical considerations and frameworks in technology, particularly in AI.
- Proficiency in a programming language such as Python.
- Experience with adversarial machine learning and working with LLMs (e.g., GPT, BERT) is a plus.

Day-to-day supervisor for this project: Tianneng Shi

Hours: 12 or more

 Engineering, Design & Technologies


Office of Undergraduate Interdisciplinary Studies, Undergraduate Division
College of Letters & Science, University of California, Berkeley