Sidecar Blog

All About AI Agents [Sidecar Sync Podcast Episode 17]

Written by Mallory Mejias | Feb 16, 2024 7:23:58 PM

In this episode of Sidecar Sync, Amith and Mallory delve into the evolving world of AI agents, exploring OpenAI's latest developments in creating autonomous software that can operate devices and execute tasks on behalf of users. They discuss the revolutionary potential of AI agents to transform our interaction with digital tools, the ethical considerations and privacy implications of such deep integration, and the competitive landscape within the AI industry. The conversation also covers the significance of language models in facilitating communication between humans and AI, and the future vision for AI agents, including their role in increasing productivity and the challenges in developing these advanced technologies.


This transcript was generated by artificial intelligence. It may contain errors or inaccuracies.

[00:00:00]

Amith Nagarajan: Just think about the number of things you're doing manually that you could offload to an agent, and then think about what you could do with that time.

Welcome to Sidecar Sync, your weekly dose of innovation. If you're looking for the latest news, insights, and developments in the association world, especially those driven by artificial intelligence, you're in the right place. We cut through the noise to bring you the most relevant updates, with a keen focus on how AI and other emerging technologies are shaping the future.

No fluff, just facts and informed discussions. I'm Amith Nagarajan, chairman of Blue Cypress, and I'm your host.

Greetings and welcome back to another episode of the Sidecar Sync. It is super fun to be with all of you, as always, but particularly this week, here in New Orleans, where Mallory and I are, it is full Mardi Gras season in action right now. So it is noisy and crazy and loud and super fun here. Um, before we get going, uh, we'd like to [00:01:00] present a brief message about our AI Learning Hub.

Mallory Mejias: If you are looking to dive deeper on your AI education in 2024 and beyond, I encourage you to check out Sidecar's AI Learning Hub. With the Learning Hub, you'll get access to flexible, on-demand lessons, and not only that, lessons that we regularly update, so you can be sure that you are keeping up with the latest in artificial intelligence.

You'll also get access to weekly live office hours with our AI experts, and you get access to a community of fellow AI enthusiasts in the association and greater nonprofit space. You can get the Learning Hub for $399 a year on an annual subscription, and you can also get access for your whole team for one flat rate.

If you want more information on Sidecar's AI Learning Hub, go to https://sidecarglobal.com/ai-learning-hub

As Amith mentioned, we are in full Mardi Gras mode down here in New Orleans. So if you hear any crazy sounds outside of my window, please excuse those. Amith, have you been to any parades so far?

Amith Nagarajan: Just one. So [00:02:00] I'm, I'm kind of the Mardi Gras party pooper, but, uh, I do enjoy it. I just have been super busy this week.

Mallory Mejias: Absolutely. I went to a parade last Friday. For those of you who don't know, it's not just Mardi Gras day or even Mardi Gras weekend. There are parades every single weekend for the full month leading up to it. So lots of opportunities to have fun here in New Orleans, but.

That is not the topic we are discussing today, Amith. We are going into AI agents, and when doing research for this topic, I went back and looked at previous episodes and realized AI agents were one of the topics on, I think, the second episode we ever did. So it's not surprising that we're revisiting it here for sure.

Amith Nagarajan: Definitely AI agents are going to be a big deal. So I'm really excited to share some information this week with our listeners.

Mallory Mejias: This week, we got some interesting news from OpenAI that they are working on AI agents, which is not a surprise. But what's more of a surprise is what kind of agents [00:03:00] they're working on that might be released quite soon. So today we're focusing the whole podcast on that topic, and we're breaking it down into three subtopics.

So first, we're going to introduce you to the idea of AI agents and talk about what we mean and what OpenAI is working on currently. Then we'll move into a conversation around the ethical considerations and the user privacy piece of these AI agents. And finally, we'll wrap it up with a discussion around the competitive landscape and the future vision for AI agents.

So let's dive in to our subtopic one, introducing you to the idea of these AI agents. OpenAI, as we know, is known for its groundbreaking ChatGPT, but they are shifting gears and pioneering again with the development of AI agents. AI agents are software designed to perform tasks autonomously on behalf of users, mimicking human interactions with digital systems.

They can navigate interfaces, execute commands, and manage data across various applications. OpenAI's AI [00:04:00] agents are designed not just to perform tasks, but to take control of user devices, automating complex tasks that typically require human interaction. Imagine an AI that can navigate your computer, moving the cursor, clicking links, typing text, and seamlessly transferring data between applications.

This is the revolutionary potential of OpenAI's new venture. Now this move by OpenAI reflects a broader shift in the AI landscape toward creating more autonomous, interactive technologies that promise to redefine our interaction with digital tools and platforms. So Amith, can you generally explain this concept of AI agents that we've talked about before on the podcast and then maybe get into OpenAI's version?

Amith Nagarajan: Happy to. I think that an agent could be thought of really quite simply: it's an AI that can do something. The type of AI people have started to become familiar with, where you ask a question and get a response, is indeed quite useful, but it does [00:05:00] not take action on your behalf. So the key concept of agency is that these programs can actually perform actions for you. Um, there's tons of examples. We'll go through a bunch of them in this podcast, but that's the basic concept: AI that can take action. And fundamentally that represents a really broad array of potential. There are so many ways that an agent can do things for you to simplify your life, make your work easier.

So it is an exciting concept. Of course, like with all things in AI it means that you have to think about it from an offense perspective where you're excited about all the new things you can do. And you also have to think about defense where you think about ethics, safety, responsibility, which with agency, you know, there's a whole host of new issues that we can tackle.

So, but that's the basic concept: an agent can do something for you, either entirely autonomously, where it can literally take the idea of a task, like, you know, plan a vacation and book it for me, and it'll actually do all the steps for me, [00:06:00] including booking a flight, booking a hotel, uh, perhaps some tours, maybe some restaurant reservations, and all of that stuff. That, you know, could be completely autonomous, or it can be semi-autonomous, where it comes back, perhaps asks some clarifying questions, uh, and maybe does part of it for you and part of it you do yourself. Um, I, myself, am quite excited about the idea of using agents when I have to make phone calls. So, it's not that I'm antisocial and don't want to speak to people, it's just that phone calls are slow, and in the workflow that I'm in, I often would prefer to have somebody else make a phone call on my behalf. And, really, what I'm looking for there is an agent that is smart enough to be able to interact with a human and to perform a task, like pay a bill on the phone or something like that. So there's a lot of smaller vendors I deal with in my personal life, like a gardener or someone like that, where I don't have, like, an automated interface.

So that's purely like my laziness example. But if I can save five minutes here and five minutes there, it really compounds. So I'm looking forward to that. Um, but I [00:07:00] think that agents have much broader implications than these, like really small personal life examples. Although those things, of course, do stack up.

Mallory Mejias: I read that back in November at OpenAI's developer conference, they released the Assistants API, which was a tool that could take action on a user's behalf, but only through APIs. Can you explain how this new type of agent that OpenAI is currently working on is different from the Assistants API?

Amith Nagarajan: Sure. Well, the Assistants API is actually quite rudimentary. We've worked with it quite a bit across a number of our companies, including Skip, which is our agent that does data analysis for people within the MemberJunction platform. So I'm quite familiar with it through that project, and what that Assistants API does allow you to do is to basically specify some kind of instruction, like, I'd like you to do, you know, some kind of a task, and to provide data and to enable tools.

So the tools that the agent has access to are things like internet search, um, creating and running code. Uh, and you can also provide custom tools that it can call [00:08:00] back to. So you're, you're kind of arming the assistant with certain capabilities. Um, and that's useful. You can build a lot of applications with it. But it's really not autonomous in the same sense as an AI agent that has control of your device, certainly, or even control of a particular application on your device. So, um, it is, it's quite different in its scope. Um, also, as you pointed out, Mallory, it's only an API, so there's no application built on top of it. Of course, people are building apps on top of that API, and OpenAI is themselves. In comparison, what they're releasing now seems to be, and again, Mallory and I don't have any inside knowledge of OpenAI or early knowledge, we're basically, you know, sharing with you what we're hearing through the media. The Information is a great resource that we read a lot that tends to have a scoop on a lot of things that are happening in Silicon Valley. But what appears to be the case is they're releasing a consumer-centered AI agent that can run on your computer, whether it be a Mac or a Windows machine, possibly on a phone. And that's a different story, because on your device, you know, it opens up the potential to do [00:09:00] anything with a much broader array of data and a much wider set of tools.
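For readers who want to see roughly what this pattern looks like in practice, here is a minimal sketch using the OpenAI Python SDK's Assistants API. The assistant name, instructions, and sample message are illustrative assumptions, not taken from Skip or MemberJunction.

```python
# Minimal sketch of the Assistants API pattern: define an assistant with tools,
# put a user request on a thread, and run it. Names and prompts are illustrative.
import time
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in your environment

# 1. Arm the assistant with capabilities (here, the built-in code interpreter tool).
assistant = client.beta.assistants.create(
    name="Data Analysis Helper",  # hypothetical name
    instructions="Analyze the data the user provides and summarize the key trends.",
    tools=[{"type": "code_interpreter"}],
    model="gpt-4-turbo-preview",
)

# 2. A thread holds the conversation; the message is the task you hand off.
thread = client.beta.threads.create()
client.beta.threads.messages.create(
    thread_id=thread.id,
    role="user",
    content="Quarterly membership counts were 1200, 1350, and 1280. What's the trend?",
)

# 3. A run executes the assistant against the thread; poll until it finishes.
run = client.beta.threads.runs.create(thread_id=thread.id, assistant_id=assistant.id)
while run.status in ("queued", "in_progress"):
    time.sleep(1)
    run = client.beta.threads.runs.retrieve(thread_id=thread.id, run_id=run.id)

# 4. Read the assistant's reply back off the thread.
for message in client.beta.threads.messages.list(thread_id=thread.id):
    print(message.role, message.content)
```

The contrast with the device-level agents discussed next is that everything here happens through explicit API calls and tools you enable; the assistant never touches your screen, mouse, or local files.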

Mallory Mejias: Would you say this is a novel approach, the idea of AI taking control of your device, or is this the route we probably always expected in terms of AI agents?

Amith Nagarajan: I think agents are going to appear in lots of different forms. You'll see them take, you know, the form of a computer-level agent. I mean, Microsoft essentially said they were doing that with Copilot for Windows, right? So with all the Copilots that Microsoft has released in the last 12 months in particular, it's easy to lose track of the fact that with Windows 11, they introduced a Copilot there, which is somewhat similar to what OpenAI is talking about, but very rudimentary. Um, so I think a computer-level agent that can help you use the computer makes a ton of sense. I don't think it's a novel approach in terms of OpenAI building it. I think they're very well positioned to use both frontier models, like the most cutting-edge models, as well as their distribution, because remember, OpenAI [00:10:00] has hundreds of millions of users that are out there.

So they have wide distribution, and they're seen as very cutting edge, which, of course, they are. So they have the opportunity, I think, to penetrate the market in ways few other companies can right now. So it'll be interesting to watch them, even if what they're building isn't necessarily novel. I think it's novel in the sense that there's no one out there doing this at any scale.

There are other tools out there that attempt to do similar things, but when a company like OpenAI, which we know moves very quickly and is quite innovative, is doing this, it's worth paying attention to it.

Mallory Mejias: I would say we've spent a lot of time within the last year, and in the greater media, discussing large language models and the competition there between companies, open source, closed source. How do these LLMs fit into the AI agent conversation? Is this the layer that we'll interact with to work with the agent?

Like, how does that fit into this greater context?

Amith Nagarajan: So the language model essentially presents the connection between us, as people who communicate in language, and [00:11:00] technology of any kind. So that's really the key unlock with language models. It's not about ChatGPT specifically. It's not about Google's Gemini. It's about the fact that you can actually communicate with a computer of any type. And a phone, for example, is a good example. It's just another computer, and you can talk to it and you can, you know, basically get responses that make sense. And that is by itself a really interesting thing to just think more deeply about, because up until recently, computers' ability to understand language has been extremely limited. And if you think about, like, Siri and Alexa as the two prevalent examples we've had in our consumer experience in years past, up until really when ChatGPT became, you know, part of the contemporary awareness that's out there, um, it was very limited. You wouldn't even contemplate talking to Siri about anything more than a trivial thing, like send a text message or what's the weather or what time's the game, because it was super, super limited.

And now all of a sudden with language models, we've unlocked the ability for technology to [00:12:00] interact with us through our natural language, whereas in all prior time periods, we had to learn the computer's language to get it to do anything meaningful, right? You'd have to have a software engineer who knew how to write code in Python or JavaScript or C++ or whatever the language is.

You'd have to translate human thought into computer language, and then the computer language would do what you wanted. But now computers can talk to us in our language, and that's the key. It's an interesting societal unlock, labor unlock, personal life unlock, that I think people kind of miss the point of sometimes with LLMs, because LLMs are a productivity tool. But the reason they're so interesting is because language has been unlocked. And so coming back to your question of where do LLMs fit in the context of agents: without language understanding, agents can't operate. So, think of it as, okay, like, you might have a person who has the ability to physically execute tasks, but if they don't know how to speak to you, [00:13:00] you can't ask them to do anything for you.

So, maybe they speak a foreign language, right? Like, a computer's language is like a foreign language, but you can't interact with them, they can't interact with you. That's where we've been. Now, we have complete fluency. So it's a really interesting time, and the language model is part of the brain of what an agent is. An agent's just a bigger concept, because the language model is part of it. And the last thing I'll say about that is, the physical manifestation of AI is robotics. When we talk about, like, you know, the physical form of AI, that will actually be a multitude of physical expressions, but robotics, like humanoid robots or other forms of robots that you'll start to see in the workplace, certainly in industry and even at home, will use language models as the way they interact with people.

And that's going to be an unlock of incredible impact. So, uh, LLMs fit into everything is the short answer to your question. Without language models, you don't have the ability to [00:14:00] interface with these technologies and devices in a fluid way.

Mallory Mejias: That makes sense. So large language models allow us to communicate with these agents. My follow-up question to that, then: if we have agents soon, maybe this year, maybe next, that can take actions for us, that can write the blog, post the blog to our website, plan the vacation, make the reservation, so on and so forth, will we be interacting with LLMs on their own?

Or will that kind of go away and we'll only be using the agent piece?

Amith Nagarajan: Great question, Mallory. I think the idea of a separate interface for something like ChatGPT might cease to exist. You know, it might just be baked into the tools that are purpose-built for specific things. So if Copilot within Word becomes so outstanding at the tasks you do in Word, do you go to ChatGPT separately from Word? Um, I do think the idea of some kind of a really clean UI that helps you interact with an agent kind of in a general way probably has, you know, some persistent value and staying power, essentially. So [00:15:00] I think that, you know, ChatGPT will probably be in use for quite some time. I think there's a few interesting things to think about here. Um, a few weeks ago, there was a release of a device called the Rabbit, which we briefly touched on, I think, in a prior episode. And the Rabbit was a device that was all the rage at the Consumer Electronics Show in Vegas in early January. And the reason it was so exciting for people is that they did two things. First, they created agent capabilities. So the Rabbit is able to do things for you, and it can connect with all the other apps. Um, and it's a device that actually doesn't really have an interface other than voice. It has a little bit of a screen, but it's primarily voice interaction. Now, the interesting trend line to watch with the Rabbit, and you're going to see this also with Apple's upcoming AI releases that everyone is quite sure they're going to be doing, is on-device AI, and so that's a key thing to watch. And in this podcast, we've previously talked about open source. We've talked about small models as being as exciting as large models. So [00:16:00] this week, actually, Google finally released Gemini Ultra, and Gemini Ultra is the most capable version of Gemini.

It's a massive model. It is multimodal from the start, meaning it's been trained on not just text, but video, images, audio, all sorts of things, right? Remember, Google has YouTube, the largest source of video anywhere on the planet, and so they've trained Gemini on all that. It's an incredible model. I haven't yet had a chance to play with the Ultra edition of it, but I'm looking forward to doing that.

That is a GPT-4 level model, so now we have two of these, you know, top-tier models. That's exciting. And that's a massive model. You will not run Gemini Ultra on your device. But Google also mentioned back in December, when they first announced all this stuff, that they would have Gemini Nano. Gemini Nano is an on-device language model. Similarly, the Rabbit has an on-device proprietary language model. Apple is believed to be developing an on-device language model. I keep emphasizing that because these are small models that are still quite capable. [00:17:00] They're very good at natural language, obviously.

They can do a lot of things for you. And that's important because of privacy. Uh, do you want all of the data on your computer potentially going up and down to an AI in the cloud? And the short answer is most people aren't comfortable with that. Most people won't be comfortable with it. But if that AI

ran locally and did all these things for you and never needed an internet connection, now that's interesting, because then it's just another program on your computer. So I think that's the thing to watch with agents: how do people package the agent functionality? Will OpenAI's forthcoming agent tool have a local LLM? Because up until recently, you know, most of OpenAI's work is about AGI. It's about frontier models, the biggest, most powerful models you can get to by scaling with more and more compute and data. But they haven't really looked at creating more compact models. So I'm really curious what they're gonna do.

I sense that there will be a local edition of something coming from OpenAI that will be perhaps a GPT-3.5 class model that you can run on a PC, a [00:18:00] Mac, or possibly even a phone. And that might be the engine behind their agent model. I'm purely speculating. I just think that architecturally that would make a lot of sense.

Mallory Mejias: And that is a perfect segue into our next subtopic of the day, which is around the ethical considerations and user privacy piece of these computer using agents. This capability, while innovative, raises concerns like those associated with malware, including unauthorized access to sensitive data and the ethical implications of such deep integration into personal and professional spaces. The push toward developing these computer using agents reflects the need for stringent ethical standards and robust privacy safeguards to protect users in this new frontier of AI technology.

I am one of those people who really enjoys examples to contextualize the things that we talk about on this podcast, but kind of really in my whole life. So I asked ChatGPT to create an example of this type of agent going wrong, and this is what it came up with: imagine a scenario where an AI [00:19:00] agent, designed to streamline financial reporting by accessing various documents and software, mistakenly interprets a draft document as final.

It then proceeds to submit incomplete financial data for an important report, leading to inaccuracies in financial forecasting and potential compliance issues. I think that was a pretty good example on ChatGPT's part. You kind of got into this, but what do you think are the greatest concerns and the greatest obstacles right now to getting agents out there?

Amith Nagarajan: Well, the safety question is going to be perhaps the greatest obstacle to overcome with any new technology. There will be the early adopters, and there will be people that come along later. And the challenge, of course, is that without early adopters, things never go anywhere, right? And so the adoption curve for any new technology is really, really long. And the more potential impact the technology could have, the more likely it is that you'll have a larger number of early adopters willing to experiment. Now, in the case of agents, I think that some people are going to go out there and [00:20:00] do silly things, like they're going to ask an agent to, you know, do some operation that is, uh, unchecked by a human.

So in the example that ChatGPT gave you, um, I would certainly hope that a finance department would put in place a human review in that agent process before anything went beyond, you know, that initial compilation that's being done by the AI agent. Um, and that's part of what I think will happen as the technology matures.

But I guarantee you, you will hear stories of people who have unintentionally, you know, published personal information on the Internet or something like that, where an agent just did that without, you know, getting feedback. So agents will do what you tell them to do. Ultimately, I'm sure some of these tools will have safeguards in them,

where it'll, like, force you to approve things. But ultimately, as a tool, you could have these things automate just about anything you can imagine. And that doesn't mean it's a good idea, though. It means that you have to be thoughtful about what you're willing to fully automate versus what you might want to semi-automate, right? So [00:21:00] I'll give you a good example of this in the association world. Um, one of the things we talked about at Digital Now, back in November in Denver, was on stage with a leading AI researcher who was part of the team that built AutoGen, which is Microsoft Research's project,

uh, their framework for multi-agent systems. Specifically, we talked about this use case of managing submissions of abstracts to a conference. So a typical process associations go through is, if they have an upcoming conference, they'll say, hey, we'd like to put out a call for speakers. We have people submit essentially proposals for a talk at an upcoming conference. And right now it's a totally manual process. People go through this very labor-intensive cycle of reviewing those abstracts, or the submissions essentially, uh, and then grading them or rating them, and then deciding who gets to speak at the event and who does not. And with AI, we can automate a very large portion of that.

And we talked about that in detail, using AutoGen as the example, [00:22:00] at Digital Now. The point, though, was that AutoGen and other agent frameworks do allow you to do something called AI plus human, meaning there's this review process you go through. So I think partly what this needs is common sense, honestly, because I like to tell people: think of AI as a freshly minted college graduate who is brilliant but does not necessarily have a lot of real-world experience. Would you allow that person to do all that work without someone more senior, someone more experienced, checking it? The short answer is I really doubt it, right? But sometimes people don't think like that with technology, because technology has this aura of perfection. Because it's a computer, it has to be right. But that's not the way AI works, which is part of its power and also its risk. So coming back to the idea: a big concern for associations and for people in general is you've got to be careful about how much you unleash to these things. So don't necessarily look at it as a full automation solution right away, at least not right away. You know, go through it stepwise, take steps [00:23:00] forward, experiment. Go play with this stuff, because it's going to remake the world, but take a little bit of time to learn it and see what it's good at and what it's not. Think of the AI as a partner to you, in a way. And if you think of it that way and almost humanize it in that sense, you'll probably have a little bit better decision-making framework for thinking about where it should go full auto versus where you should probably review its results.
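As a concrete illustration of the AI-plus-human pattern described here, below is a minimal sketch using the open-source AutoGen library. The agent names, model choice, and abstract-review prompt are illustrative assumptions, not the actual Digital Now demo.

```python
# Minimal sketch of an AI-plus-human review loop with AutoGen.
# The AI agent drafts ratings; the human proxy pauses for a person to approve or redirect.
import autogen

llm_config = {"config_list": [{"model": "gpt-4", "api_key": "YOUR_OPENAI_API_KEY"}]}  # placeholder key

# The AI reviewer: drafts a score and rationale for each submitted abstract.
reviewer = autogen.AssistantAgent(
    name="abstract_reviewer",
    system_message=(
        "You review conference abstract submissions. For each abstract, give a 1-5 "
        "relevance score and a two-sentence rationale, then wait for feedback."
    ),
    llm_config=llm_config,
)

# The human checkpoint: human_input_mode="ALWAYS" means a person is prompted to
# approve, edit, or reject at every turn before the workflow moves forward.
committee_member = autogen.UserProxyAgent(
    name="program_committee_member",
    human_input_mode="ALWAYS",
    code_execution_config=False,  # no code execution needed for this workflow
)

abstract = "Title: AI Agents for Member Services. Abstract: We describe a pilot..."  # illustrative
committee_member.initiate_chat(
    reviewer,
    message=f"Please rate this submission for our annual conference:\n\n{abstract}",
)
```

The design choice that matters is human_input_mode: set to "ALWAYS", nothing advances past the AI's draft without a person signing off, which is exactly the semi-automation, rather than full automation, being recommended here.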

Mallory Mejias: Amith, you and I talk a good bit about the importance of experimentation. On last week's episode with Sharon, we talked about the importance of experimentation; it seems like it comes up constantly. I can imagine whenever these AI agents are released, there will be people who feel fearful, certainly, of this experimentation piece.

I feel like with ChatGPT, you can go in, you can test it very low risk, very low stakes, get some results, work with those. It seems like when you're talking about putting an agent on your device, it's a bit more all or nothing. Like, either this agent is on my device or it's not. I know we don't have all the answers at this moment, but what do you foresee as a safe way to [00:24:00] experiment with an agent that has control of your device?

Amith Nagarajan: Well, I mean, you know, before I would do something like that, I'd make sure I'm getting an agent from a company that I have some experience with or some trust for. So I think OpenAI or Google, Microsoft, someone like that, I'd be a little bit more comfortable, um, just because quite frankly, those people already have all of your data anyway. Um, so anybody who uses any cloud services, which is all of us, pretty much has our data, you know, in the hands of Apple, in the hands of Microsoft, Google, Amazon, et cetera. Those guys have a lot to lose, so they're going to be focused on safety and security a lot more than probably some random startups.

I'd be very thoughtful about downloading some program from some, you know, rando AI company, as cool as it might sound, and just using it. That's one thing, and I think that's probably common sense, but maybe not so much when you're dealing with a hot topic where it seems like this is so revolutionary.

We should try it out. Um, experimentation also can mean having a sandbox that you set up, where you install it on a computer that any of your employees can go play with, [00:25:00] but not a computer that's connected to the Internet, not a computer that's connected to real data. That's easy enough to test. Maybe you wouldn't do that at home, but it's pretty easy to do that in an office environment, or even set that up in a virtual machine, meaning a computer that you can connect to through the cloud.

Um, so there's ways to do this. I think experimentation is so key here because agents, and really LLMs and AI as we know it today, represent a new platform, and a platform doesn't immediately reveal its potential applications. So think about the Internet and all the things we rely on the Internet for. As excited as the, you know, early pioneers and founders of the early Internet and early web technologies were about its potential, even they would not have been able to imagine all of the robust, you know, innovative examples of use of the web and use of the Internet. Similarly with mobile devices. Yesterday I gave a talk to a group of CEOs and I was talking about something called vector databases, which is a somewhat technical topic, but at the same [00:26:00] time it is an important foundational platform to understand, and in the same sense, AI agents are very much a new platform.

And so when a new platform comes out, you immediately have these use cases people get excited about. Usually it has to do with saving time, which I totally get. Um, but there's opportunities galore that you don't really think of. So in order to think about what those opportunities are, you have to go out there and experiment. The example I gave yesterday to the CEO group was that when mobile became a thing, meaning smartphones, you had for the first time mobile computing of significance, meaning the devices you were carrying around in your pocket or in your purse had substantial computing capabilities. And they had internet access through 3G at the time, which was remarkable back then. And they had GPS. So you had location awareness, internet access, and a lot of compute. And that led to applications like Uber, which never could have existed prior to that. So, um, I think that's the type of thing you have to think about.

What's [00:27:00] the Uber moment for your sector, and how can you be prepared for that? And not just how will agents affect your association, but also how will they affect your members?

Mallory Mejias: One of the immediate use cases I've thought of when hearing about these AI agents is that they could replace personal assistants and executive assistants. It's probably kind of a milder use case in the greater scheme, like you just mentioned, but I think that's what I imagine these first steps to be within the next few months or the next few years.

I don't know if this question is going to make sense to you, Amith, so I'm just going to say it, and then you can help me walk myself through it, but in my mind, personal assistants and executive assistants don't need control of your device, Amith, if we're talking about you, to take actions on your behalf.

They just need access to your accounts. Why does an agent need to access your device? Is there any world where an agent is able to take actions on your behalf by having access to your accounts, but using its quote unquote own device?

Amith Nagarajan: Sure. I mean, you can definitely envision agents that work within, say, your email account. But think about the personal assistant and executive assistant [00:28:00] scenario. A lot of CEOs I know actually have their assistants have full access to their email, including replying to people on their behalf. And sometimes they reply on the CEO's behalf with the knowledge of the person they're replying to, and sometimes not. And that's been going on for a long time, right? That kind of communication through an assistant. And so, uh, in the context of an email program, certainly you can have an advanced agent inside your Outlook, inside your Gmail, that does that. I guarantee you Microsoft and Google are working on stuff like that. Uh, the next editions of Copilot and Duet will do those types of things, undoubtedly. And the question is, what is the difference between accessing your accounts and accessing your device? Um, on the one side, I'll explain why I think an agent needs to actually be able to control hardware to do certain things in just a second. But what is the difference from a data privacy perspective? Because almost all of your data is in your accounts anyway. There's very rare scenarios where you don't have your files in a Dropbox or in a Google Drive or something like that. You have your photos on your iPhone or on [00:29:00] your computer or Android device, which sync with the Google cloud or with the Apple cloud.

So accessing your accounts essentially means accessing all of your data. Um, but for really high-security environments, which is very rare in most organizations that are listening, you know, you don't typically have those kinds of local-only data elements. So then why is it important for some of these scenarios that the agent have access to the actual hardware? The reason has to do with actually a parallel to what we talked about earlier, which is connecting things. So in the context of connecting people with computers, we said, hey, up until recently, the language unlock wasn't there. So you had to converse with the computer in its language, right? In computer code, essentially.

Now you can converse in language, in our language, um, which is cool. Well, the same thing applies to how agents interact with other programs. So there's a lot of programs out there that run on the desktop or run in a browser, and they're expecting a human to interact with those programs, not another program. And so, what happens [00:30:00] with these agents is they are able to learn those programs just by doing what you do. They can basically see the screen, they can move the mouse, they can type on the keyboard. And so, by essentially mimicking human interaction with these pre-existing programs, which are not designed to be used by an agent, those programs think that a human is working them. And so therefore it doesn't require any changes whatsoever to the literally millions of programs that are out there in the world that can then be used by the agent on your behalf. So let's think about this for a second in the context of our listeners, who are mostly associations and nonprofits, many of whom have legacy software. By legacy software, I generally mean older software. It doesn't typically have APIs. Uh, it's perhaps a little bit on the clunky side. It is perhaps desktop software instead of web based, and it's not going to play well with a modern ecosystem of APIs and data interchange and all that. So what if the agent could actually [00:31:00] learn that custom, proprietary, or otherwise legacy-style software and operate it on your behalf?

Right? It's a bridge. It's connecting older technology with newer technology. So that's the reason for device-level control, or at least control over the browser.

Mallory Mejias: That's really interesting. I think that's what I was trying to get to the meat of here in researching this topic. I was struggling to understand exactly why the agent would need to control our devices, but I think that makes a ton of sense.

Amith Nagarajan: I think the key to it in those scenarios is, um, you know, looking at it from the viewpoint of: if you let it go on device, what can it do on device that it couldn't do, to your question, with just account access? And so for contemporary technologies like a Gmail, or access to, you know, cloud-based services that have APIs and have account-level access, probably not a whole lot.

I mean, those APIs tend to be robust enough where you can do just about everything through the API, and so it wouldn't make sense for an agent to go through a user [00:32:00] interface designed for a human. But there's so many websites and so many programs that aren't designed that way, that the agent being able to access those device-level, you know, tools, like the keyboard and the mouse, and being able to read the screen, is really key. Um, one other quick point I want to make about screen reading that I think might be interesting for some folks. So for a long time, there have been tools, primarily used by quality assurance engineers in the world of software development, to be able to interact with a web-based app or even a Windows-based or Mac-based app. And these tools would essentially look at, like, a screen recording, and you would essentially train it: oh, this particular button on this screen does this thing, click it, then do this next thing, then fill in this text box, then click this next button. And these so-called automated scripts were actually significant efforts.

And a lot of companies would develop these full suites of automated scripts to do regression testing or comprehensive testing on their software, whether it's internal software or external software. [00:33:00] And those kinds of tools are still valuable in many respects, because they have a predetermined path that you're testing. But as we all know, end users are not predetermined in what they do. They just kind of randomly hop around software and do somewhat unexpected things. So, A, agents I think are going to be a really interesting tool for software testing, which is exciting. But B, um, the way these things work is different.

You don't have to give them pre-recorded steps. They literally look at the image on your screen and they interpret that image. So we've talked a touch about this in past episodes, but multimodality in these AI systems means they can understand not just text, but also images, video, audio. And so these models, like GPT-4 with vision as well as Gemini, are equally adept at looking at an image and understanding it as well as looking at text. And so, in the context of agents, that means they don't need any pre-training. You can just say, hey, I have this app, it does these things, go figure out how to use it and place an order in my, [00:34:00] you know, 20-year-old order entry system, and it'll figure it out.
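To make that screen-reading loop concrete, here is a heavily simplified sketch of the pattern, assuming a vision-capable model behind OpenAI's chat API and the pyautogui library for screenshots, mouse, and keyboard. The prompt, model name, and single-step loop are illustrative assumptions, not how any shipping agent is actually built.

```python
# Conceptual sketch: screenshot -> vision model decides one action -> execute it.
# Real agents add safety checks, retries, and human approval; this only shows the loop shape.
import base64
import io
import json

import pyautogui            # takes screenshots and controls mouse/keyboard
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set

def screenshot_as_data_url() -> str:
    """Capture the screen and encode it as a base64 data URL for the vision model."""
    image = pyautogui.screenshot()
    buffer = io.BytesIO()
    image.save(buffer, format="PNG")
    encoded = base64.b64encode(buffer.getvalue()).decode()
    return f"data:image/png;base64,{encoded}"

def next_action(goal: str) -> dict:
    """Ask the vision model for one action as JSON, e.g. {"type": "click", "x": 100, "y": 200}."""
    response = client.chat.completions.create(
        model="gpt-4-vision-preview",  # a vision-capable model available around the time of recording
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": (
                    f"Goal: {goal}. Look at this screenshot and reply with JSON only: "
                    '{"type": "click", "x": ..., "y": ...} or {"type": "type", "text": "..."}.'
                )},
                {"type": "image_url", "image_url": {"url": screenshot_as_data_url()}},
            ],
        }],
        max_tokens=200,
    )
    return json.loads(response.choices[0].message.content)

action = next_action("Open the order entry form")
if action["type"] == "click":
    pyautogui.click(action["x"], action["y"])   # the agent "moves the mouse and clicks"
elif action["type"] == "type":
    pyautogui.typewrite(action["text"])         # the agent "types on the keyboard"
```

In practice this loop would run repeatedly, feeding each new screenshot back to the model, and it would sit behind exactly the kinds of approval and sandboxing safeguards discussed earlier in the episode.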

Mallory Mejias: Wow. I had a scenario cross my mind while looking at this topic that I want to share with you, not so much a question, but more to get your thoughts on it. And that was, if we did have an agent that was controlling our email, more or less responding to things on our behalf, I was thinking of a situation where we receive a phishing email, maybe saying, hey, this is a bank alert.

And they want you to log into your bank account. You know how that goes. And thinking through whether an AI agent would be able to fall for phishing scams or not. What do you think about that? I mean, do you think AI agents will be better or worse than humans at discerning those types of things?

Amith Nagarajan: Great question. I think it's potentially a great use case for defense, for helping detect these types of scams on device. Um, I don't know. I think that, you know, it's kind of like cyber security where every time defense gets better, there's new threats that come up that take into account the defense. It's like move, counter move, move, counter move.

[00:35:00] And so there's offense and defense constantly playing out in cybersecurity, and the same thing applies to AI. So current-generation AI, will it be outsmarted by the next generation of AI? Undoubtedly so. And that's true with everything in life, right? So I think that's an important reason, if for no other reason, to get up to speed on this stuff. Uh, because the attacks are going to get more sophisticated. You're gonna get phishing attacks that actually, you know, are phone calls from people you know that sound an awful lot like them, uh, asking you for something, and it's gonna be, you know, maybe a five-minute conversation where they talk to you about recent events in your life, and you're going to really get fooled.

So it's super scary. And I think that agents might be able to help with that, where an agent might pick up on something you wouldn't, right? Like, an agent might be listening to the call while you're having the call with someone, which of course is a privacy question, right? But let's say you had this defensive cybersecurity agent on your phone that automatically was listening to every phone call. Maybe not recording it, because that's, you know, probably beyond what people would want, right? But, not recording it, but listening to the call with you, and then [00:36:00] giving you, you know, maybe the phone starts to vibrate, you know, really, really strongly if you are on a call where it detects a threat. Um, stuff like that is gonna happen, you know, and I actually hope it happens, because I know for a fact that there will be, you know, malicious actors out there trying to come after people more so than they ever have, because they have the scale of AI working on their behalf.

Mallory Mejias: We've said it before, the only thing that can fight bad AI is good AI, so it seems like we're entering into a world where, uh, it'll be exactly that, good AI versus bad, and we'll see who wins, I guess.

Amith Nagarajan: Well, it's yet another reason to invest in learning, because, you know, it's scary sounding. It's scary to those of us that are deep in this stuff every day. Um, but, you know, I really empathize with people in our market and beyond who haven't yet started on their learning journey and feel totally overwhelmed by AI.

They might be either curious about it or mostly afraid of it. Um, but they haven't yet really started to learn it, and that's the challenge. It feels like a bit of a steep [00:37:00] learning curve initially, um, just to understand this stuff. And I understand that. I think it's probably very true that there's some work you have to do to get going. But that's the key to it: you have to put a little bit of effort into learning these new technologies every day, every week, every month. Don't try to learn everything overnight. I mean, if you're super excited about it, by all means, there's a ton of resources out there where you can go deep on it. But most people don't have the time or the interest, necessarily, to learn it for learning's sake, right? Um, and so, I think that's really critical, to invest in it. Um, you know, it's as if aliens landed on Earth and all of a sudden we have to learn how to communicate with them.

Let's say that they're not here to kill us, but they're just, you know, another species that's landed from another planet. And now we have to figure out what to do about it. Um, you better figure out how to interact with those new creatures, right? And so AI is kind of like that in a weird way, right? It's, it's strange.

It's kind of scary. It's a little bit exciting. It's a lot exciting for some people. But the only solution to this, I think, really, is to [00:38:00] invest in yourself and to start learning this stuff now, just a little bit every day.

Mallory Mejias: Yep. And I'm thinking, too, going back to the earlier part of our conversation, if agents become more mainstream soon, and you're not on the train and you already missed kind of the LLM train, it seems like you'll just be that much more behind. However, for the people who are listening to this podcast, I would say you're probably a little bit more ahead than everybody else.

Amith Nagarajan: You know, if anyone listening has personally had the experience of not being particularly savvy with computers and having to get up to speed with computers, um, or has experienced that with a relative or a colleague who's kind of gone through that, um, I think it's a similar thing, where initially it's like the computer is this box that, like, you have to interact with, and you're, you know, just not sure where to start.

And then over time, you learn one program, you learn the next program, and things carry over. So I think that it's a similar experience, perhaps, with AI. The good news is, with AI, you'll be able to ask it to help you learn it, right? So just like you do with ChatGPT, asking it for scenarios to be concerned with with respect to [00:39:00] agents.

So this is the first time in tech where we've had tech that can help you learn it, right? As opposed to humans having to help you learn the new tech.

Mallory Mejias: We want to wrap up our discussion today with a talk about the competitive landscape and the future vision. So OpenAI is stepping into a fiercely competitive arena with its development of these AI agents. And while it's not news that OpenAI is in direct competition with Google and Meta as just a few examples, it's also competing with emerging companies like AdeptAI.

Adept, co founded by David Luan, a former OpenAI engineer, is creating computer using agents for enterprises. I went to their website and their tagline is, we're building a machine learning model that can interact with everything on your computer. A wave of AI experts from Google, DeepMind, and Meta left to form AdeptAI, so I would say they'll be a company to keep an eye on within the next few months for sure, and maybe well into the future.

I also, as a note, signed up for the waitlist for a demo of their product, so I will report back [00:40:00] when I get that. It's impossible to say who will get there first, but we can say certainly that this evolution toward more autonomous AI tools will lead to a redefinition of job roles, workflow processes, and productivity strategies, and it will absolutely alter the digital workspace as we know it.

Amith, we have seen fierce competition around LLMs, we talked about that earlier in this episode. Do you think this next wave in the age of AI will be revolving around agents? Do you think we'll still see the LLM competition going on in parallel? What are your thoughts?

Amith Nagarajan: It's going to be all of those things. So the agent competition is going to be extremely intense, because it's like a form of applied models that actually can take action on your behalf. And that's super interesting. The economic value, if you think of nothing else, just think about the economic value of the kinds of tools we've been discussing on this episode. It's just hard to fathom, right?

How much economic value agents could create both purely in digital form, but also in physical form. [00:41:00] Because if you then wrap an agent, the software with hardware in the form of some kind of robotics, you can then take actions in the physical world. And we're not far off from that in a much broader sense.

Right now, robots are super expensive and they're highly specialized. Soon they're gonna be more general purpose and a lot more available to businesses and consumers. And so all of these things are happening together. When I give my talk about exponential growth and exponential change, I talk about how the multiple exponential curves that are going on essentially compound each other in a way.

So AI is one exponential curve. Raw compute continues to grow at an incredible pace in multiple different forms. Um, and of course, agents are yet another curve, in a sense. I mean, it's obviously a branch of the AI curve, but agents are going to drive LLMs to do new things, and new LLM capabilities will enable features in agents that go far beyond what they can currently do. So it's definitely not one or the other. The competition, as you pointed out, will be fierce. I think one [00:42:00] thing to think about here is the same framework for analysis I would provide for language models or tools around language models, which is to think about what makes sense over a period of time as things settle down.

So let's say Adept, or someone else, has an unbelievable agent tool, and you're thinking about using it for your company, and you say, well, this is an amazing tool. Ask yourself a few questions. First, ask yourself: does it make sense for the major players to have this tool in their suite? The answer is not always yes.

Sometimes there's scenarios where specialized tools do make sense to run on their own. Um, but sometimes it doesn't, right? So if you think about, like, my favorite AI tool to beat up on as something that's a feature and not a product, it's meeting note takers for Zoom or Microsoft Teams or Slack.

These are obviously features that are gonna get embedded, and have now been embedded, in Teams and in Slack and in Zoom. And so the major platforms, what do they have? They obviously have economic, you [00:43:00] know, strength. They have a lot of financial resources, but the most important thing they have is distribution power. So if you're Microsoft, Google, Amazon, Meta, et cetera, you have hundreds of millions, if not billions, of people using your products. And so to tack on a feature that does agent-type work, uh, is a very natural extension. And so, of course, they're gonna do that, both to play offense, to go out there and capture more users and capture more time on device or in their platform, or to think about it from a defensive perspective, where they'll say, well, we can't let this other company have, you know, this area, because that's gonna undermine the importance of our platform.

It's the usual thing that plays out with every major new technology. Of course, everything's happening in fast-forward right now. So I think we're going to see LLMs continue to radically advance this year, next year, and beyond. And then agents are going to get smarter, not just as a byproduct of that, but as their own independent form of AI research as well. So, you know, from my viewpoint, it's a very important thing to be watching. Um, the takeaway I'd share here [00:44:00] on that whole thing for associations is to say, um, don't jump to go implement, like, a proprietary agent framework from a company like an Adept or anyone else. Um, just experiment with it first, and then do some analysis to think about whether or not it's likely to be a standalone company, or if it's likely to just get woven into an existing tool or framework you already have access to.

Mallory Mejias: And you are saying, in your opinion at least, you think things like AI agents will be woven into Google Duet or Microsoft Copilot.

Amith Nagarajan: I am confident that that will be true, but also those types of tools will be very general purpose. So there will also be room in the world for specialized agents. And that's where you might see companies coming out with tools or agents that do particular things really well. Like, for example, in the domain that you're in, if you're in healthcare or if you're in financial services, uh, you're going to see tools that are hyper-specific to those verticals. Uh, obviously for ourselves and the world of associations and nonprofits, you know, our [00:45:00] software companies are there because they provide a very specialized set of solutions that are tuned for the verticals that we're in. So you will see verticalized solutions that have a tremendous amount of staying power, because they've addressed the last-mile problem.

What I mean by that is that top level of functionality that makes the solution work really well in a particular industry. So there's definitely room for a lot of people to play. I don't think it's just gonna become like a web browser, where there's only a handful of them out there, or an operating system. I don't think it's that kind of play, necessarily. But I do think that the general-purpose type functionality will definitely be in the hands of the major players. And the major players that don't have an agent-type product in the mix will start acquiring people like Adept.

Mallory Mejias: So wrapping back up to what you said a little bit earlier, the competition for LLMs and for agents, none of that's going away. Are there any other areas within the world of AI that we need to keep an eye on? If we think LLMs are one, agents are another, any other third, fourth areas?

Amith Nagarajan: Yeah, I think the thing to watch for is smaller [00:46:00] models exploding, where they'll have capabilities that you think of as GPT-4 class problems, meaning, at the moment, the current state-of-the-art models. Um, pretty soon you'll see that level of functionality available on your phone. Um, and so when you have that local model, then it'll open up a whole bunch of other applications independent of the agent conversation. And I think some people will be caught by surprise by how quickly that will happen. So this year we're expecting to see Llama 3 from Meta, which is an open-source model, and there probably will be a version of Llama 3 that's small enough to run on device, maybe not on a phone, but certainly on a Mac or a PC. Um, and you'll see more and more of that from more and more companies, because the state of the art keeps advancing. And even if what we have today never advances further, it'll become more accessible. Although, by the way, I don't believe that it's not going to advance. I think there's tremendous advances in store for us right now, in part because of [00:47:00] competition, and the amount of capital being thrown at this opportunity, I think, is unprecedented, even compared to recent technology waves that have been tremendous.

So, um, I don't know. I think ultimately what people have to do is just stay tuned in and just pay attention to this stuff, because stuff's going to come out of seemingly left field that, when you hear about it, it's like, oh, that totally makes sense. Yes. Like a few episodes ago, we talked about Google DeepMind's GNoME project in the material science world. Um, projects like that, that are again super specialized, um, they will keep exploding, and that's going to have a lot of, uh, really interesting consequences in a lot of different fields. And the last thing I'll say about competition is, you know, the other news floating around this week is that, uh, OpenAI CEO Sam Altman has been out looking at raising a significant amount of money for other ventures around AI, specifically on the hardware side, but we're talking about trillions of dollars that he's looking to raise, not billions.

So, you know, back in the nineties and two thousands, you know, people were raising millions, or tens of millions, maybe occasionally hundreds of millions, for technology [00:48:00] ventures. And now with AI, and how hungry these models are for compute and energy, people are talking about raising billions of dollars. And Sam Altman, of course, is raising the bar, saying, hey, we have to do more with compute.

We have to do more, um, at this scale. So there's, you know, word out there that he's looking to raise between $5 and $7 trillion, um, in order to, you know, build new foundries for chip fabrication, energy supply, all the stuff that you need to really have orders of magnitude greater compute resources. And people are not only listening, he's got people lining up to invest in that type of opportunity. And the reason, if you think about it, is, you know, we have a hundred-plus trillion dollar global annual GDP. So you say, oh, $5 to $7 trillion sounds like a big number. Of course it is. Uh, but on a relative basis, compared to the economic impact

that AGI, or artificial general intelligence, could have, it's nothing. So, you know, you think about it from that scale, and it's just, it's kind of mind-blowing. So my bottom line is, yes, you will see a ton of competition around all of these [00:49:00] areas, and you ain't seen nothing yet. You know, what's happened so far isn't even the first inning. It's kind of like a pre-game warm-up.

Mallory Mejias: In terms of the future vision, is it safe to say that these companies know exactly how to build these AI agents and it's more of an issue of resources or compute? Or do you think we're still dealing with the fact that we don't know exactly how to create these products, but we will soon?

Amith Nagarajan: There are definitely scientific challenges that need to be solved, particularly around reasoning in the language models. Uh, reasoning is something the language models kind of simulate right now, but don't actually do. That's starting to change. In fact, Gemini Ultra has, uh, apparently some significantly enhanced reasoning capabilities compared to GPT-4.

But still, there's some scientific breakthroughs that are needed to really go to the next level of AGI-type capabilities. Scaling what we have now, if you just throw more compute and more data at the current models we have, will definitely take us further. But there probably will at some point be a limit to that, both in terms of what the models can do, but also just as a [00:50:00] practical matter.

How much energy and how much compute do you need to throw at these things? So there will be both, I think, scientific breakthroughs that will, you know, change our perspective on all of this, and engineering opportunities to say, let's reassemble pieces and parts of what we have and use those building blocks to create new applications that are very exciting.

Mallory Mejias: All right, my last, last question: assuming maybe in a few months, maybe in a few years, we get these AI agents, they roll out, they work really well, they don't fall victim to phishing attacks, and we have no major problems with them. What do you see, Amith, as the biggest opportunity for our listeners on this podcast?

Amith Nagarajan: Well, the big one that's fairly low-hanging fruit is just productivity. Just think about the number of things you're doing manually that you could offload to an agent, and then think about what you could do with that time. So that's the simplest one, because you don't have to really spend a lot of time thinking deeply about business processes or about, like, changing your business model, which I think are big opportunities. But just thinking about what you currently do, think about what you do repetitively, either you or others [00:51:00] on your team. Um, you know, think about the significance of that, right, in terms of just automating significant portions of those repetitive tasks. That's where agents can step in and make a big difference pretty quickly.

Mallory Mejias: Well, thank you so much, Amith, for your time today. Um, and happy Mardi Gras to all those who celebrate.

Amith Nagarajan: Happy Mardi Gras.

Mallory Mejias: There are tons of great AI resources out there for you, many of which are free. You can listen to this podcast, of course, but there are tons of other great AI podcasts out there, free AI courses. And of course, we want to remind you that we have the Sidecar AI Learning Hub as well. As a reminder, that is flexible, on-demand lessons that we regularly update.

You get access to weekly office hours with live experts, and you also get access to a full, flourishing community of fellow AI enthusiasts within the association and nonprofit space. And on that note, you can sign up as an individual, but you can also sign up your whole team, all the staff at your organization, for one flat rate based on your organization's revenue.

We have a [00:52:00] few teams that have done that thus far, and we've gotten some great feedback. So if you're interested in learning more about that opportunity, visit https://sidecarglobal.com/ai-learning-hub. Thanks for your time.

Amith Nagarajan: Thanks for tuning into Sidecar Sync this week. Looking to dive deeper? Download your free copy of our new book, Ascend: Unlocking the Power of AI for Associations, at ascendbook.org. It's packed with insights to power your association's journey with AI. And remember, Sidecar is here with more resources, from webinars to boot camps, to help you stay ahead in the association world.

We'll catch you in the next episode. Until then, keep learning, keep growing, and keep disrupting.