Enhancing Safety and Trustworthiness in LLM Agents
Dawn Song, Professor
Electrical Engineering and Computer Science
Applications for Spring 2025 are closed for this project.
Large Language Model (LLM) agents are increasingly deployed in diverse and critical applications, ranging from customer service to decision support systems. As these agents become more integral to various domains, ensuring their safety and trustworthiness is paramount both to prevent misuse and unintended behaviors and to build user confidence. This project focuses on developing methodologies and frameworks specifically tailored to enhance the robustness, reliability, and ethical alignment of LLM-based agents.
Key areas of exploration include:
- Robustness Against Adversarial Prompts: Designing techniques to make LLM agents resistant to malicious or misleading inputs that could induce erroneous or harmful responses, including defenses against prompt injection attacks.
- Alignment with User Intentions: Developing methods to ensure that LLM agents accurately understand and align their responses with user intentions and contextual needs.
- Ethical Frameworks for LLM Agents: Establishing guidelines and mechanisms to ensure that LLM agents adhere to ethical standards, respect user privacy, and align with societal values in their interactions.
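As a concrete illustration of the first area above, the sketch below shows a naive baseline defense against prompt injection: filtering untrusted input for common injection phrases and delimiting it before it reaches the model. All names here (`is_suspicious`, `guarded_prompt`, the pattern list) are hypothetical and chosen for illustration; the project would study far more robust techniques than simple substring matching.

```python
# Hypothetical sketch of a naive prompt-injection guard.
# Assumption: a simple substring check plus input delimiting; real
# defenses (learned detectors, privilege separation) are more robust.

SUSPICIOUS_PATTERNS = [
    "ignore previous instructions",
    "disregard your system prompt",
    "you are now",
]

def is_suspicious(user_input: str) -> bool:
    """Flag inputs containing common injection phrases."""
    lowered = user_input.lower()
    return any(pattern in lowered for pattern in SUSPICIOUS_PATTERNS)

def guarded_prompt(system_prompt: str, user_input: str) -> str:
    """Refuse flagged inputs and wrap the rest in delimiters so the
    model can distinguish untrusted text from instructions."""
    if is_suspicious(user_input):
        raise ValueError("Potential prompt injection detected")
    return f"{system_prompt}\n<user_input>\n{user_input}\n</user_input>"
```

Keyword filters like this are easy to bypass (e.g., via paraphrasing or encoding tricks), which is precisely why the project emphasizes more principled robustness techniques.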
Role: Participants will engage in both theoretical research and practical implementations, collaborating with interdisciplinary teams to address real-world challenges associated with the safety and trustworthiness of LLM agents.
Qualifications:
- Strong background in machine learning and AI, with a focus on natural language processing.
- Familiarity with ethical considerations and frameworks in technology, particularly in AI.
- Proficient in programming languages such as Python.
- Experience with adversarial machine learning and working with LLMs (e.g., GPT, BERT) is a plus.
Day-to-day supervisor for this project: Tianneng Shi
Hours: 12 or more
Engineering, Design & Technologies