TechnologIST Talks

The Coming Age of Agentic AI

Episode Summary

“The goal of AGI is to create solutions that humans haven't thought of. But to make an agent safe, we need to be able to anticipate all solutions that the system would think of.” - Dr. Margaret Mitchell

IST CEO Philip Reiner sits down with Dr. Margaret Mitchell, Chief Ethics Scientist at Hugging Face. Margaret walks us through her research on machine learning and ethics-informed AI development, and what she thinks is next for the industry.

Episode Notes

In this episode of TechnologIST Talks, IST CEO Philip Reiner is joined by Dr. Margaret Mitchell, a computer scientist and researcher focused on machine learning and ethics-informed AI development. She currently serves as Chief Ethics Scientist at Hugging Face, where she studies ML data processing, responsible AI development, and AI ethics. In her previous role at Google, she founded and co-led Google's Ethical AI group, which seeks to advance foundational AI ethics research and operationalize AI ethics internally at Google. Through Hugging Face, Margaret contributes to IST’s AI Risk Reduction Initiative Working Group, which develops technical- and policy-oriented strategies for mitigating risks associated with AI foundation models.

Philip and Margaret sat down to discuss agentic, autonomous, and transparent models – and the pathway to truly secure AI. 

“I think that we can do a lot of really great work with AI by having rigorous data curation that corresponds to specific kinds of use cases, and building models based on that,” Margaret said. 

What’s the current state of AI agents in the marketplace? How autonomous are so-called autonomous models? And once you’ve identified a model’s vulnerabilities, how do you go about protecting against unauthorized actions? Join us for this and more on this episode of TechnologIST Talks. 

Learn more about IST: https://securityandtechnology.org/

Episode Transcription

Philip: Technology revolutionizes the way we live, but insecure, negligent, or exploitative technological advancements are threatening global stability and security as we speak. Welcome to TechnologIST Talks, a podcast from the 501(c)(3) nonprofit the Institute for Security and Technology. TechnologIST Talks features conversations with technology and policy leaders at the forefront of tackling these challenges.

I’m Philip Reiner, CEO at the Institute for Security and Technology and your host today.

I’m joined in this episode by Dr. Margaret Mitchell, Chief Ethics Scientist at Hugging Face. Margaret is a computer scientist studying machine learning and the development of responsible, ethics-informed AI models. She is the founder and previous co-leader of Google's Ethical AI group, which seeks to advance foundational AI ethics research and operationalize AI ethics internally at Google. She was also named one of TIME's 100 Most Influential People in AI in 2023.

Hugging Face is a member of IST’s AI Risk Reduction Initiative Working Group, which works to develop technical- and policy-oriented risk mitigation strategies associated with AI foundation models.

Margaret and I sat down to discuss recent developments in AI and where she believes the technology industry should be turning its attention as AI models proliferate. 

Margaret: I see it more as a focus of attention, and changing the focus of attention and seeing where we really need innovation. And where we really need innovation right now is around guardrails and, you know, security, handling potential privacy breaches, that kind of thing.

Because the way these are currently being developed and deployed is sort of not seriously dealing with the fact that we don't have corresponding advancements in security and privacy to handle foreseeable negative outcomes.

Philip: What’s the current state of AI agents in the marketplace? How autonomous are so-called autonomous models? And once you’ve identified a model’s vulnerabilities, how do you go about protecting against unauthorized actions?

Join us for this and more on this episode of TechnologIST Talks.

Philip: Welcome Meg. 

Margaret: Great. Thank you. Yeah, thank you for having me here. 

Philip: It's really a pleasure. I thought maybe we could kick the conversation off just asking a little bit about your role at Hugging Face, and kind of how you came to be there, and how things have been going.

Margaret: Yeah. So I have a role that we created, called Chief Ethics Scientist, where the idea was that you should be a computer scientist, but then also work to translate human values to what the company is doing and how the company communicates publicly as well. So I come to it from a background in computer science, but I've worked at Microsoft Research and at Google, thinking about how human values come to bear on the technology that we're developing in the tech industry, and kind of wanted to translate that expertise to Hugging Face, where I could, you know, do everything from providing advice and guidance on how things were prioritized, to filling in some gaps where other people weren't developing technology that would be really important for people. So, for example, like measuring efficacy, providing watermarking capabilities, that kind of thing. As well as helping with communication.

Philip: The nature of that conversation around ethics, I find, is increasingly part of a lot of the conversations that I'm in. Have you seen it take root on the technical side?

Margaret: Yeah, yeah, so I think first it's useful for me to kind of clarify that ethics doesn't necessarily have to just be about concerns. I worry that when people are thinking about harms and risks and concerns, that becomes an ethics discussion.

But when they're thinking about benefits and positive impact, that's not an ethics discussion. I think what ends up happening is that people who think about ethics are sort of filling out the space of considerations, and there's always a lot of people talking about all the benefits and hyping things.

And so when you're thinking about ethics, you oftentimes end up being the one having to bring in the bad news, 'cause no one else is doing it, right? But one of the things that I try to do generally and within my role at Hugging Face is have a full picture. So there are benefits from technology and stuff I've worked on in the past, and I really like to highlight things that people are doing well that really speak to different human values. But then there are also concerns and stuff as well, which often is the realm of ethics people to navigate and communicate about and try to do cultural shifts on.

I think one of the difficulties there, like I mentioned, is culture. That is one of the problems, but another one is market dynamics. And that's something that people who work in the tech industry know very well: you might wanna do the responsible thing, but the market, the user base, will go to a competitor if you are being slow or something like that.

And so now you can't even play in the AI space because you've marginalized yourself out. So there's a cultural issue and I work on that cultural issue. There's also a market dynamics issue, which is much harder to kind of change, because supply and demand and consumer demand sort of is what it is, you know, within like, capitalist systems.

So I do try and inform the public generally, because that shapes market dynamics, as well as working on ways to incentivize responsible practices within the culture. So one of the things that has really taken off in the past three years is work I had previously done on model cards, which are transparency artifacts that document how the model is meant to be used, how well the technology actually works, what the intended uses are, what's out of scope, that sort of thing. Previously, people would release machine learning models and provide, like, no information on them. And so people just assume that they work for everything. And so one of the things I had done was sort of work within tech culture to figure out how I can incentivize people to think critically about the downstream use cases, that sort of thing. And one of the ways to do that was to create an artifact, because if you have an artifact, you have a deliverable. You have a launch, that sort of thing. So that plays within tech culture well, while also incentivizing more responsible long-term thinking about the technology.

And that's taken off pretty well. You see model cards all over the place now.
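As a rough illustration of the artifact Margaret is describing, a model card is typically a README that pairs machine-readable metadata with prose sections on intended use, out-of-scope use, evaluation, and limitations. The sketch below is hypothetical; the model name, tags, and figures are placeholders rather than any real model's documentation.

```python
# Hypothetical, minimal model card: a metadata block plus the prose sections
# described above (intended use, out-of-scope use, how well it actually works).
MODEL_CARD = """\
---
license: apache-2.0
language: en
tags:
  - text-classification
---

# Model Card: example-support-ticket-classifier

## Intended Use
Routing English-language customer-support tickets to one of five internal queues.

## Out-of-Scope Use
Medical, legal, or employment decisions; languages other than English.

## Evaluation
Accuracy on a held-out ticket set: 0.91 (placeholder figure).

## Limitations
Performance degrades on slang-heavy or code-mixed text.
"""

# Model cards are usually published as the README of the model repository.
with open("README.md", "w", encoding="utf-8") as f:
    f.write(MODEL_CARD)
```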

Philip: It very much has become part and parcel of how folks are thinking about assessments, evals, and what have you. Absolutely. It's just a given almost at this point. Alright, so I think that's a good way to kind of get us started here.

And what we are really looking to dig into is this question around AI agents. And you and some colleagues recently published this piece, Fully Autonomous AI Agents Should Not Be Developed. But I thought maybe for those who are listening in, I'm thinking about the audience that we've got out there, and before delving into the substance of that paper, I really did want to just pause for a moment with you, because this is something you spend so much time with, and ask about agents themselves, what is it that we are actually talking about?

How good are they? Is this a real thing? Or is it really kind of just one of those hype cycles that we're in that at the end of this year, everyone will look back and be like, yeah, that agent thing really didn't come to fruition.

Margaret: I definitely first started diving into AI agents with a perspective that this was just hype. Because it was time for some new, exciting thing to happen. ChatGPT had been around for like two years. There's always these cycles, so it was sort of like, what's the next big thing?

And that corresponded to when AI agents first started really flourishing in discussions, coming out in products, that kind of thing. But actually AI agents are a significant step forward, in language model development in particular, but there's also been sort of a stream of work on autonomous vehicles and that kind of thing, which also have these sort of agentic capabilities. So like a quick TL;DR for what an AI agent is. What is that? It's essentially like a chatbot, except it can take actions outside of the chat window. If you want the most simple way of understanding what this shift has been: we had chatbots, and now we have chatbots that can do things outside of the chat window. So that's everything from, like, doing a web search, to creating documents for you, to organizing meetings and that kind of thing. But that said, there are other types of agents that aren't fundamentally based on language models, which power chatbots. So that's where, like, robots, autonomous vehicles and stuff come in a bit. But either way, the really big uptick in AI agents has been driven by the increase in large language model capability, and in particular around code. So language models can generate code. That code can be used in systems, and when you have a system that can start generating code, now you start to have the option of having it be agentic in some way, of essentially creating actions and creating tool decisions, and this sort of thing that has not been explicitly designed by a human.
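To make that distinction concrete, here is a minimal sketch of the shift from a chatbot to an agent. Everything in it is a hypothetical stand-in: `call_llm` for a language model API and `web_search` for a tool the system can invoke outside the chat window; it is not any specific product's interface.

```python
import json

def call_llm(prompt: str) -> str:
    """Hypothetical stand-in for a language model API call."""
    raise NotImplementedError

def web_search(query: str) -> str:
    """Hypothetical stand-in for a tool that acts outside the chat window."""
    raise NotImplementedError

def respond(user_message: str) -> str:
    # A plain chatbot stops at this first call: text in, text out.
    decision = call_llm(
        'Reply with JSON {"action": "answer" or "search", "text": "..."} '
        "for this message: " + user_message
    )
    step = json.loads(decision)
    if step["action"] == "search":
        # The agentic step: the model's output triggers an action beyond
        # returning text, here a web search whose results feed a second call.
        results = web_search(step["text"])
        return call_llm(f"Answer {user_message!r} using these results: {results}")
    return step["text"]
```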

Philip: Say a little bit more about that, in terms of how that's different. 

Margaret: So previously, essentially developers had to account for every type of action that a system might take.

So in any sort of user-facing system, you'd have to, like, classify the different kinds of requests that would come in, figure out how that got routed to other parts of the system. So, if they want to fill out a form, then you have to have that sort of sequence already pre-planned as something that might happen.

So you essentially had to just work through all of the ways that user interactions could progress and then create the code to accommodate all those different kinds of user interactions. Now we're in a position where we can have increasingly less code written by the person, and more code written by the agent.

And the paper that you alluded to is sort of getting into how this means increasing autonomy. But essentially, you can have situations where the agent determines the basic program flow, so which way to go, depending on some user input without a developer having to program that. It can go all the way to creating and executing its own code.

So this is like the fully autonomous approach, where the code is written by the agent system, written by the model, and executed by the model. And so you could say something like, you know, program me a game, and it would create the game and make it immediately available for you to play with.

Philip: I think what's been interesting, as I've tried to understand it myself, but then also to help make sense of it for others that we engage with, particularly in the national security space, is how that actually plays out in terms of its ability to do various things. And I think we’ll get into this in a bit more detail.

We think a lot about risk at IST. And so the way you characterized it, in short, from your recent MIT Tech Review article was: the more autonomous an AI system is, the more we cede human control. So maybe talk a little bit about what that means.

Margaret: I think it's helpful to contextualize that with respect to the full spectrum of autonomy. First as some background, there's been discussions within AI about whether you can say something is an agent or not. Is this an agent or not? What that kind of misses is that the way that agents are being developed and used, has essentially more and more options for the agent to create new content. So it's not a binary, either this is an agent or it's not an agent, so much as it's a spectrum of increasing autonomy. And so we've done a bit of a deep dive to sort of pick apart what is that increasing autonomy, what are some of the points within that space that would mark a shift to sort of a next level of autonomy.

And so, if something just generates a response, like a chat bot, that would be something like agent level zero. No autonomy. It's just creating a response based on human specifications, based on the prompt from the human. That's all human controlled, even if the content itself isn't controlled.

The whole functionality and action is human controlled. The next stage, the next sort of level worth noting in the spectrum, is when models determine the basic program flow. So that's like with routers, where a human writes out how different functions are done and then the system decides when, based on various inputs from the user.

The next step up in autonomy is when the model determines how functions are executed. So that's in situations where agents fill in what's called tool calls. So you have a basic function written by a human, and the agent sort of fills in the blanks about what tools might be used in order to return what that function is supposed to return.

And this is where things like web search starts to come in, additional kind of indexing of external resources. And then there's a lot to be said about the type of tools one can make available to an agent to complete tasks, and the control we might have over that, which I think is probably relevant to your work as well.

The next step up there is like multi-step agents, which control iteration and program continuation. So essentially, humans define what are the basic functions that exist and the agent figures out which to do, when, and how. And then the last level that we've sort of identified here is where models create and execute their own code. And that's what we're calling fully autonomous. 

That also has a spectrum. So these are all, like, points within a multidimensional space. So something can be fully autonomous in that it writes and executes its own code, and still work within a secure sandbox environment, or it could not have that secure sandbox environment.

And it's that last point where there's no secure environment offered by a person, there's no explicit functions that the agents have to abide by. That is the highest risk use case, because we have fully ceded human control, we have removed all human constraints from what it can do. And now we can't know how it will act.
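One schematic way to picture the spectrum Margaret walks through is as code that hands progressively more of the program's control flow to the model. The stand-in callables below (`llm`, `run_tool`, `execute`) are illustrative assumptions, not a particular framework's API, and the levels loosely follow the ones she describes rather than reproducing the paper's exact definitions.

```python
# Illustrative sketch of increasing agent autonomy. The `llm`, `run_tool`,
# and `execute` callables are hypothetical stand-ins passed in by the caller.

def level_0_no_autonomy(llm, prompt):
    # The model only generates a response; humans decided everything that runs.
    return llm(prompt)

def level_1_router(llm, prompt, handle_billing, handle_general):
    # The model determines basic program flow: the functions are human-written,
    # the model only picks the branch.
    if "yes" in llm(f"Is this a billing question? yes/no: {prompt}"):
        return handle_billing(prompt)
    return handle_general(prompt)

def level_2_tool_call(llm, prompt, run_tool):
    # The model determines how a function is executed: it fills in which tool
    # to call and with what arguments (assume llm returns that pair here).
    tool_name, args = llm(f"Choose a tool and arguments for: {prompt}")
    return run_tool(tool_name, args)

def level_3_multi_step(llm, state, run_tool):
    # Multi-step agent: the model also controls iteration and when to stop.
    while "continue" in llm(f"Continue or stop? State: {state}"):
        tool_name, args = llm(f"Pick the next tool and arguments for: {state}")
        state = run_tool(tool_name, args)
    return state

def level_4_fully_autonomous(llm, prompt, execute):
    # Fully autonomous: the model writes new code and the system executes it.
    return execute(llm(f"Write code to accomplish: {prompt}"))
```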

Philip: So this gets to the thesis of the paper and the kind of the assertion that, as you've kind of systematically thought about that spectrum of autonomy, you see an increasing level of risk as you move closer toward fully autonomous, right? Your discussion of the action surface struck me as I went through the piece. What is an example of how that would look realistically? Where, by way of example, an agent is trained to help a particular type of employee do their due diligence?

Margaret: The question you ask about what it might be trained to do is really key here because the thing about modern AI agents is that a lot of them are based on large language models that are intentionally marketed as general purpose.

And so a lot of the agents that are being developed now are using, you know, so-called general purpose models. I have some disagreements with saying it's general, but capable of doing a lot of different types of tasks, so generating cookie recipes as well as generating directions. You know, that sort of thing, as well as writing an essay, maybe not a true essay, but still writing an essay on a book. So these so-called general purpose, meaning like multifunctional models, are the brains of a lot of the current AI agents. And that means that the models are not limited to one domain, or limited to one task.

It is possible to have models that are limited and constrained in that way, and I would argue that that is a good idea. I actually have a paper out also right now arguing against this idea of generality, and how this brings harms. And so, having task specificity and domain specificity at the heart of the AI agent, the brain or the heart, that model, that gives rise to everything else.

Philip: As one moves into a world of a large proliferation of these agents that are built off of these foundational models, how do you actually begin to constrain or address these vulnerabilities that you've identified around unauthorized actions? What are some examples of how that would actually work?

Margaret: Philosophically at a higher level, I do take issue with using foundation models, as the backbone for this kind of thing. I think that we can do a lot of really great work with AI by having rigorous data curation that corresponds to specific kinds of use cases, and building models based on that.

There's a whole discussion to be had. It's called artificial narrow intelligence, as opposed to like artificial general intelligence. Clearly, this is a large discussion to have. Given that people are using these general purpose models, in order to build AI agents, then the question is, okay, well now how can you make it secure?

In the situation where we might not be able to tell what the systems are going to do, we don't have a clear definition of the domains that it can work in, 'cause that's the design of the model. What do we do then? And so there's a few different things. One thing we've done at Hugging Face is to have secure sandboxing environments for deployment.

Manus was another AI agent that came out recently, that sort of said it was fully autonomous, which seems to mean that it can write and execute its own code. But Manus, as well, operates within secure environments. That's not bulletproof, but it basically means that you as the developer, or the person who's deploying it with control over the system, get to decide what libraries it has access to, get to decide, like, what its world of knowledge is in terms of what it's actually able to access.

It is not bulletproof. There's definitely ways to figure out things to do to go around it. And actually, like, there's hallucinations happening right now with these systems, where they'll hallucinate methods that don't exist in packages, and then hackers are creating those methods, so that they have, like, a back door to start having influence on the code, even within a secure environment. I don't know how technical I should be here. I'm like, kind of trying –

Philip: No, that's great. That's great. 

Margaret: But basically, so it's not foolproof, it's not bulletproof. But it is a way to have some constraints over what the model can actually have access to.

And this is like using something like Docker, or E2B is another one. And that's essentially the state of the art right now. But I think that means that we need more research on how to certify security, on how to create methods that provide, like, AI agent arenas. Where AI agents can interact, you know, maybe across a few different domains or applications, and then can't do anything outside of that.

That needs to be further developed. I don't think the technology is robust enough now to really handle all the ways that agents might be able to generate content outside of something specified by humans. 
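A rough sketch of the kind of sandboxing being described: run agent-generated code in a disposable container with no network access and tight resource limits. This assumes Docker is available locally and is only an illustration of the idea, not Hugging Face's, E2B's, or any product's actual sandbox; as Margaret notes, this layer is useful but not bulletproof.

```python
import subprocess
import tempfile
from pathlib import Path

def run_in_sandbox(agent_code: str, timeout_s: int = 30) -> str:
    """Execute untrusted, agent-generated Python in a throwaway Docker container
    with no network and capped resources. Illustrative only; real deployments
    add far more (allowlisted libraries, seccomp profiles, audit logging)."""
    with tempfile.TemporaryDirectory() as workdir:
        script = Path(workdir) / "agent_code.py"
        script.write_text(agent_code, encoding="utf-8")
        cmd = [
            "docker", "run", "--rm",
            "--network", "none",          # no internet access
            "--memory", "256m",           # cap memory
            "--cpus", "0.5",              # cap CPU
            "--read-only",                # read-only root filesystem
            "-v", f"{workdir}:/work:ro",  # mount the generated code read-only
            "python:3.11-slim",
            "python", "/work/agent_code.py",
        ]
        result = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout_s)
        return result.stdout if result.returncode == 0 else result.stderr

# Harmless code runs; anything that needs the network or the host filesystem
# fails inside the container.
print(run_in_sandbox("print(sum(range(10)))"))
```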

Philip: So let's go back to the paper and the way that you've characterized this, and get as technical as you feel, that's the audience here. Why is the assertion then that going toward fully autonomous is so risky? What are some examples? 

Margaret: Because we can't make well-informed predictions about how the models will behave, and that's because they're based on, sort of generative technology, where we don't have a good grasp of what the training data had in the first place.

So it goes all back to the start of the pipeline. So, the problems that we have with language models hallucinating, creating inappropriate facts, that sort of thing. False facts. That is now further propagated into AI agent frameworks, where it could do all kinds of things, including libel, and security breaches, and privacy breaches, and that sort of thing, because it's capable of generating arbitrary code that we have not anticipated or designed.

One of the examples that has really struck me, in terms of what might happen based on how people are using AI agents right now, is that some people are starting to use AI agents to be better influencers, social media influencers. And so they provide the agent with various content relevant to what they're interested in, and then the agent kind of summarizes that and puts it forward and posts on social media so they can, you know, extend their reach.

It adds emoji, and that kind of thing to maximize engagement. Other people are using AI agents in their emails to sort of take, you know, top snippets of what is worth knowing about from the email for today, you know, summaries, that sort of thing. And then, you know, maybe using that to do further calendar scheduling, that kind of thing. So more of, like, the enterprise usage?  

If you combine those two functionalities, so an agent that can read your email and then post on social media, then you have a massive risk of a system that can read private information sent to you by someone else in email, that they don't want shared publicly, and post that information on social media for everyone to see, and further hallucinate stuff that the person didn't actually say, but with enough grounded in what was said that it seems like it could happen. So that sort of situation seems like we're just one step away from it, because the different pieces are already there. So that's one that I think most people would sort of be aware of or have familiarity with, that kind of issue, as being concerning.

Philip: And so, coming back to a little bit of what we started to delve into, right, in terms of restraining agentic systems, maybe not allowing them to be as interoperable as we would like them to be until they get to a point where we can build in some of the guardrails. How does that actually work? As these systems, as you were just alluding to, move in that direction, how do you slow things down so that that is actually being done? Is it even possible to put in those guardrails? 

Margaret: Yeah so, I see things less as slowing things down, because of the position I'm in, you know, within the tech industry. But I see it more as a focus of attention, and changing the focus of attention and seeing where we really need innovation. And where we really need innovation right now is around guardrails and, you know, security, handling potential privacy breaches, that kind of thing.

Because the way these are currently being developed and deployed is sort of not seriously dealing with the fact that we don't have corresponding advancements in security and privacy to handle foreseeable negative outcomes.

And so part of the work that I've done recently, you know, one of the reasons why you had me here, was that I really wanted to kind of have a call to action where we do have some technology that's useful, like differential privacy, secure sandboxes, but we need a lot more, we need a lot more advancement on exactly that type of secure, private, safe-type technology, in order to be able to increase autonomy in these systems. 

And I would argue that it's also a good reason to have more task specific models, because it's not going to be possible for a system to do something totally unanticipated in another domain if it doesn't have within its stored parameters, anything representing that domain or actions that can be taken in it.

So, there's a few different ways that research and development could go to handle the foreseeable risks. And that includes everything from data curation practices, to model development practices that focus on task specificity, to coming up with ways to verify or certify security, or the inability to extend beyond some given constraints that a human developer has put forward.

Philip: How much more complicated does this get as we move even beyond a conversation around just agentic systems, into a conversation around multi-agent systems? Which is seemingly, that's where things are really going to change for enterprises. That's where things are really gonna change for companies who are not just, you know, building the foundation models, but everyone else in the ecosystem.

Margaret: I think it's complicated any way you look at it, so I don't know if it's more complicated. I mean, the thing about multi-agent systems is that even if you're able to have, sort of, security and guardrails around one specific agent or each specific agent in the system, they might still combine in problematic ways.

So I know, like, DHL for example has a multi-agent system that's used to optimize routing and scheduling of delivery vehicles. And so that takes into account, like, real-time traffic information, delivery requests, changing weather, that sort of thing. And then there's been some work in trying to do, like, traffic routing using sensor data. So this is like edge computing and that kind of thing, where you take in information from a lot of different sources. And you can imagine this further interacting with self-driving cars, autonomous vehicles, to get information about how traffic might flow, in order to have people get home as quickly as possible.

Now you plug it into more devices, like actually changing street lights. Now things start to get a little weird, right? And this is where, like, the fact that different pieces of the system, different agents, might have different goals could start to come into play. So for example, you know, one agent that tries to make sure that there's a limited rate of car flow through some area due to, like, concerns about wildlife, or children, or whatever it is. That agent might conflict with the fact that there's, like, a wildfire burning everywhere and people need to be able to escape. But if it has control over the stoplights, now it's creating traffic congestion when people need to get out of there.

So it's just kind of like, the misaligned goals that I think end up becoming part of the issue here, and the inability to have, like, the overarching picture in the way that humans can, in order to understand that sometimes the decisions that you're designed to make, or that you're supposed to prioritize, need to be deprioritized in this situation.

So one of the things that has been developed recently, and this is something else Hugging Face has been involved in, is coming up with a setup where you have a manager agent, and that's kind of like the boss. And then you have other types of agents, like code interpreter agents, web search agents, that sort of thing.

And so there is perhaps something to be said for having, like, a boss agent, so you don't run into the tragedy of the commons as well. 

Philip: Watch out everybody. You've all got a boss agent in your future. 

Margaret: Right? Yeah. Yeah. Well, I mean a boss of the other agents, but yes, maybe people will be part of the system too. I mean, I think you're right, that that is likely to happen. 

Philip: What have you been seeing in terms of examples of emergent behaviors or potential risks that kind of align with the thesis that you've put forward in this recent piece, that moving towards full autonomy might not be a thing we're ready for?

Margaret: So, one of the biggest issues, like, at the moment is around privacy and security breaches. So, for example, people are using agents and AI in like medical domains, where at least in the U.S. we have HIPAA constraints.

You don't want private medical information about people to be made available, especially not to other systems that might be in charge of hiring, and have that be part of why they're discriminated against, that sort of thing. But agents aren't providing the kind of security that you do need to abide by HIPAA constraints. So there is the potential for private information to be exfiltrated to some other location, and have that picked up by another agent to be used for whatever. So that's one of the ones that's pressing now. That's happening now. And then, you know, in addition to that, there are sort of the ones that we can foresee happening, and one of the big areas there that I'm really concerned about is around military usage.

I would really hope that there would always be a human in the loop. And our paper talks about some of the lessons we've had in the past about the need for human in the loop.

But that said, you know, a lot of times in military work you have to have split second decisions, like split millisecond decisions. And so automation really helps with that. But if we can't, you know, reliably predict that it's making the right decisions, then we're potentially creating really harmful outcomes around the world.

So this is an area I'm pretty concerned about, and just really hoping that enough people care to, to focus on how to solve the issue. 

Philip: I think it's incredibly important to inject that into the conversation. While, you know, by way of example, the U.S. government may make very clear its intention to always keep a human, quote unquote, in the loop, or on the loop, you have a whole burgeoning system around those who may be making decisions, who are being informed by these agentic and/or multi-agent systems over the course of the coming years. And so, taking into consideration some of the risks that you've highlighted, in building in those guardrails, what are the ways that those risks can actually be driven down?

I think that's really one of the most important pieces of this and really why I was so interested in speaking with you here today. How do you actually move in this direction? Like we just talked about, accelerating in this direction, but still build in those guardrails so that the risks are somewhat mitigated.

Margaret: There's the world as I'd like it to be ideally, and the world as it is. I think that one of the things you're getting at is that, even if I think that the foreseeable harms outweigh the foreseeable benefits, I don't think that realistically there's going to be enough attention to these issues to keep fully autonomous systems from being developed and deployed without any human constraints. I do think that will happen. So the question is sort of, what do we need to develop alongside that to make things as secure as possible, to make things as verifiable as possible?

You know, and this is where again, like, developing a more rigorous science of these systems is necessary. So much of this is based on vibes. You know? Like, this seems like a good system for that. Our benchmarks don't measure capabilities as they translate to real-world deployment. Like, our benchmark evaluation system for our language models is not good.

And I actually have a piece on this as well in the National Academy of Engineering's journal. So, minimally, an entire new science of development and evaluation, and how to actually evaluate to really understand how well these systems work. But then beyond that, coming up with, like, verifiable safety guarantees, coming up with ways to do that.

In the past we've been able to innovate on privacy guarantees. So, like, differential privacy, for example, is mathematically supported as being something verifiable, as maintaining privacy. We need to have something like that for safety as well.
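For reference, the kind of mathematically supported guarantee being pointed to here: a randomized mechanism M is epsilon-differentially private if, for every pair of datasets D and D' that differ in a single person's record, and every set of possible outputs S,

```latex
\Pr[M(D) \in S] \;\le\; e^{\varepsilon}\,\Pr[M(D') \in S]
```

The point in the conversation is that agent safety has no analogous property yet, nothing that can be stated and checked with this kind of precision, which is what the call for verifiable safety guarantees is about.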

I think that there really is a need for development here that hasn't really been prioritized, but should minimally be prioritized alongside where a lot of developers are going right now.

Philip: As we move, as you just aptly alluded to, towards a world where probably fully autonomous systems are being built and deployed, how do you see the bigger-picture risk that that might pose? As they may learn on their own, they may be able to write their own code, they may be able to do certain things cross-domain that maybe they weren't intended to do. How has your risk analysis changed over time?

Margaret: I had, I think, naively thought that no one would be allowing systems to use untrusted code and execute things without any human oversight.

I just sort of assumed, like minimally, like market dynamics wouldn't let that happen because people wouldn't want to have systems that are faulty and you know, are incorporating things that don't make sense and break and that sort of thing. But you know, some organizations have gone forward and done that even without demand for that.

And there's a lot to be said about what their incentives are in doing that. So my sense of the risks, I think, has shifted somewhat more towards the risks that emerge from unmonitored code execution, where it really wasn't a big concern of mine a few years ago, because I just didn't think that people would do that.

For anyone who codes, like if you've ever typed rm -rf *. So that's remove, recursively, force, everything: delete everything. That is enough to give you nightmares for the rest of your life. And that is one small code snippet, right?

Like I once lost three months of work because I thought I was just deleting a directory, but I was actually one level up in the directory tree. So, like, if you're a programmer, it's just so easy to see how very simple lines of code can have massive impact on people's lives, on everything they've done, on their work, on everything. I just sort of didn't think that people would develop this technology and let it be deployed, like, without more serious work on the safeguards.

Philip: And so now we find ourselves there. Or, you know, maybe there will be some constraints on that as things evolve, or maybe there will be some sort of an accident or something that begins to put more standards or regulatory constraints in place that prevent some of the harm that might come from a world like that.

But as these become more powerful systems, what sort of risk does that begin to create with the code that you're speaking to? How do your considerations around how to restrict those risks play out in that space, as these systems get even more powerful?

Margaret: So there's clearly work that can be done on what's not allowable, like putting into place executions that are just not possible to run. So if an agent system puts forward a command, you know, using this library or using an exec function or something like this, it just can't run, it just fails, or whatever it is. So that's clearly one way to go.
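A minimal sketch of the "just not possible to run" idea: statically inspect agent-proposed Python and refuse anything that imports outside an allowlist or reaches for exec-style calls. The allowlist and blocked names below are illustrative assumptions, and, as the next exchange notes, a static check like this is only one layer that a capable system might find ways around.

```python
import ast

ALLOWED_IMPORTS = {"math", "json", "statistics"}   # illustrative allowlist
BLOCKED_CALLS = {"exec", "eval", "compile", "__import__", "open", "system"}

def check_agent_code(source: str) -> list[str]:
    """Return policy violations found in agent-proposed code; an empty list
    means the snippet passed this (deliberately limited) static check."""
    try:
        tree = ast.parse(source)
    except SyntaxError as err:
        return [f"does not parse: {err}"]
    violations = []
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for alias in node.names:
                if alias.name.split(".")[0] not in ALLOWED_IMPORTS:
                    violations.append(f"non-allowlisted import: {alias.name}")
        elif isinstance(node, ast.ImportFrom):
            if (node.module or "").split(".")[0] not in ALLOWED_IMPORTS:
                violations.append(f"non-allowlisted import: {node.module}")
        elif isinstance(node, ast.Call):
            name = getattr(node.func, "id", getattr(node.func, "attr", ""))
            if name in BLOCKED_CALLS:
                violations.append(f"blocked call: {name}")
    return violations

# Agent-proposed code only reaches an executor if this returns no violations.
print(check_agent_code("import os\nos.system('rm -rf /')"))
# -> ['non-allowlisted import: os', 'blocked call: system']
```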

Philip: And if they're smart enough to get around that? 

Margaret: Right. So the thing that's getting me is that the goal of AGI is a bit at odds with safeguards for AI agents. Because the goal of AGI is to create solutions that humans haven't thought of. 

But to make an agent safe, we need to be able to anticipate all solutions that the system would think of. And so I'm someone who works on AI agents. I don't pursue AGI for a variety of reasons, I've never liked it, and the dual sort of tracks here, of working on AGI and AI agents, really come to a head at this point of safety and being able to, as a person, foresee how everything can go wrong.

Philip: It was in the paper you and your colleagues had written: “We contend that if AGI is to be developed, it should not be developed with full autonomy.”

Margaret: Yeah. I'm glad I said that. I agree with myself.

Philip: And I do find that to be inherently a contradiction. I think the notion of AGI is such that it's capable of being fully autonomous.

I think that's kind of the point. How can we make sure, I think, as you've alluded to here, how do we do the innovation that's necessary? How do we make sure that the benefits are just as immense as we would like them to be, but then account for some of this risk that you're identifying might be coming along with it?

Margaret: This is actually where openness and transparency really have a role to play. Because you don't have to just trust the word of a big tech company that something is safe. You can have independent auditors examine, close up at the level of the code, what systems are actually doing, come up with novel stress tests and adversarial testing that other people hadn't thought of.

This idea of openness and transparency as a way to have assurances of safety is a really critical thing for people to wrap their heads around, I think. Sometimes that doesn't happen; it sort of seems counterintuitive. But for well-maintained, updated code, having it open, transparent, and available for anyone to test, find security holes in, and address is a really, really helpful approach to move forward towards more beneficial AI.

Philip: Sounds like a pretty good pitch for Hugging Face. 

Margaret: There's a reason I work there. 

Philip: I think that's a fantastic potential way to bring us to a close. Thank you so much for taking the time for the conversation here, but also for, you know, pushing this out there. I think it's thought-provoking, and I, for one, personally and I think at IST, look forward to making sure that a lot of people are hearing about this, talking about it, and doing something about it.

I think the more rigorous approach that you alluded to is absolutely essential. So thanks for the work you're doing. Thanks for joining us here on the podcast. 

Margaret: Absolutely. Yeah. Thanks for having me.