From Attacks to Security-Enhancing Insights in NLP Models

Name: From Attacks to Security-Enhancing Insights in NLP Models
Start: 2026-03-12T11:00:00+02:00
End: 2026-03-12T12:00:00+02:00
Location: Amir Building

March 12 @ 11:00 am - 12:00 pm

Abstract:

Recent advances in natural language processing (NLP) have given rise to transformative models, including large language models (LLMs) and text retrievers. Still, critical concerns remain regarding the security of these models: chiefly, LLMs can be jailbroken and misused (e.g., to launch cyberattacks), and text retrievers in search applications can be manipulated to prioritize adversary-chosen content. In this talk, I will present our recent efforts toward making LLMs and text retrievers more secure. In particular, I will show how potent attacks can provide explanations for models' vulnerabilities, which, in turn, enable us to enhance security. Crucially, I will also demonstrate how our insights can inform the design of even stronger attacks, establishing a cycle that guides continuous model improvements.

Based on joint work with Matan Ben-Tov and Mor Geva.

Bio:

Mahmood Sharif is a senior lecturer at the Blavatnik School of Computer Science at Tel Aviv University, where he directs the privacy, learning, usability, and security (PLUS) group—a research group primarily working at the intersections of computer security and privacy with machine learning, specifically adversarial machine learning, and with human factors. Mahmood obtained his Ph.D. from Carnegie Mellon University, where he was affiliated with the CyLab Security and Privacy Institute. Before joining Tel Aviv University, Mahmood was a postdoctoral researcher in the VMware Research Group and a principal research engineer in the NortonLifeLock Research Group. His work has been recognized by multiple awards, including an Intel Rising Star Faculty award and a Maof prize for outstanding new faculty.

Details

Date: March 12
Time: 11:00 am - 12:00 pm
Event Category: Computer Science Seminars

Venue

Amir Building
HaNamal 65, Room 507
Haifa, North Israel

Faculty of Computer and Information Science

University of Haifa
