Generating Instant Avatars with HeyGen & Anthropic's New 'Computer Use' Feature [Sidecar Sync Episode 53]

Written by Henry McDavid | Oct 24, 2024 3:17:59 PM

Timestamps:

00:00 - Introduction
02:07 - digitalNow Preview
06:38 - HeyGen’s Instant Avatars
10:20 - Real-Time AI Translation and Dubbing Tools
18:21 - Applications of AI in Learning and Onboarding
21:03 - Claude’s New Computer Use Feature Explained
28:52 - The Future of AI-Powered Assistants

Summary:

In Episode 53 of the Sidecar Sync Podcast, hosts Amith Nagarajan and Mallory Mejias explore AI innovations for associations, focusing on HeyGen’s AI-driven video avatars and Anthropic’s new "computer use" feature for Claude. They discuss the potential of HeyGen for business applications despite current limitations while highlighting how Claude’s ability to interact with computer interfaces could streamline tasks like quality assurance and enhance member engagement. The episode underscores AI's growing role in reshaping association operations and workflows.

Let us know what you think about the podcast! Drop your questions or comments in the Sidecar community.

This episode is brought to you by digitalNow 2024, the most forward-thinking conference for top association leaders, bringing Silicon Valley and executive-level content to the association space.

Follow Sidecar on LinkedIn

🛠 AI Tools and Resources Mentioned in This Episode:

HeyGen ➡ https://www.heygen.com

Claude ➡ https://www.anthropic.com/index/claude

More about Your Hosts:

Amith Nagarajan is the Chairman of Blue Cypress 🔗 https://BlueCypress.io, a family of purpose-driven companies and proud practitioners of Conscious Capitalism. The Blue Cypress companies focus on helping associations, non-profits, and other purpose-driven organizations achieve long-term success. Amith is also an active early-stage investor in B2B SaaS companies. He’s had the good fortune of nearly three decades of success as an entrepreneur and enjoys helping others in their journey. Follow Amith on LinkedIn.

Mallory Mejias is the Director of Content and Learning at Sidecar, and she's passionate about creating opportunities for association professionals to learn, grow, and better serve their members using artificial intelligence. She enjoys blending creativity and innovation to produce fresh, meaningful content for the association space. Follow Mallory on LinkedIn.

Read the Transcript

Amith Nagarajan: [00:00:00] Hey everybody, and welcome to the Sidecar Sync, your source of all things AI for the association community. We're excited to be here again. My name is Amith Nagarajan,

Mallory Mejias: And my name is Mallory Mejias.

Amith Nagarajan: and we are your hosts. We can't wait to get into our topics for today, which as usual are interesting things happening in the world of AI, and how they apply specifically to the world of associations and nonprofits.

Before we dive in, let's take a moment to hear a quick word from our sponsor.

Mallory Mejias: As you all heard from our sponsor, digitalNow is upon us next week. October 27th is the official kickoff date. Amith, how are you feeling about that?

Amith Nagarajan: I'm feeling fantastic. It's so exciting. We've got, you know, record breaking registrations. We've got an awesome venue, uh, amazing speakers lined up. It's going to be a lot of fun.

Mallory Mejias: It's been so neat to see it all come together. I'm sure a lot of our listeners can relate. Events can be very time intensive to plan. At the [00:01:00] beginning, you don't see how it's all going to come together in the end, but being a few days out, I'm really excited to see our audience, to see all the relationships we've built throughout the last few years, and to celebrate the biggest digitalNow we've ever had.

Amith Nagarajan: It's going to be awesome. Yeah. And when you run an event for association people, knowing that association people are professionals in event planning, it, uh, raises the bar a little bit in terms of expectations. But, uh, it's also awesome because people who attend the event tend to know, uh, how challenging it can be to pull together an event, even at the scale of something like digitalNow. Many of the associations who listen in to this podcast or watch us on YouTube have events that are 10 or 50 times the size of digitalNow.

But, um, you really want to have every detail nailed down, and, uh, AI can help with some of that, but at the end of the day, it's, uh, people helping people at an in-person conference, which, you know, just as an aside, I'm really bullish on in-person conferences for associations. I think it's one of the things associations do an [00:02:00] incredible job with.

And, uh, no matter what happens with technology, like biologically, we're wired to want to have community and connection, and having that in person, you know, is something that's kind of roared back to life after COVID. And I think it's going to continue to be a strong suit. That doesn't mean that associations don't have to adapt their conferences to take into account AI augmentation in various ways. Of course they should, and they will.

But, uh, I'm really pumped about it. I think conference season, um, is just a fantastic time of the year.

Mallory Mejias: I agree. And I know this week, Amith, you're doing a lot of thinking and reflection about your own keynote. Are there any details or sneak peeks you'd want to share with our listeners?

Amith Nagarajan: Well, I have to figure out what I'm talking about first. No, I'm

Mallory Mejias: You know,

Amith Nagarajan: You know, it's, uh, as you know, because you're helping me put it together, I don't like to prepare a keynote too far in advance of when I'm going to be on stage somewhere, because ultimately things are changing so fast.

And some of the things we're going to be talking about next week during my keynote to [00:03:00] open up the event, um, are things that have happened in the last, like, week or two. So, um, you know, you can definitely count on it being focused on the latest and greatest in AI generally. But what I plan to do is talk specifically about agents.

So most of my keynote is about how exponential change is driving the rise of agents and how agents can dramatically affect customer service, um, both from the perspective of the association in terms of the workload, but also from the viewpoint of the member, um, receiving a far superior experience to what they've ever, you know, experienced.

So that's part of what I'm talking about. And, um, hopefully I'll be setting the stage for, uh, keynotes and other speakers to follow me that will be able to weave in, you know, other specialized topics, but I think that the world of agents is an area that we've just got to pay more attention to. Um, so much focus goes towards, Hey, there's a new model released and this model is really cool.

Like, yesterday, the Claude 3.5 Sonnet update came out, including computer use. And that's really [00:04:00] cool, but, um, that's just a model. And I say "just a model"; it's incredibly powerful and very exciting. But with agents, you can build all this other software around these models and do things today, even with last year's models, that are just stunning and that people don't realize.

So we're trying to cast a bigger spotlight on agents, and that's what my keynote aims to do. So if you have not yet registered for digitalNow, you're a slacker. But you can still come, so sign up today.

Mallory Mejias: You are a slacker, but you're not the only one. We've seen, as we've mentioned, a lot of registrations come in in the past few weeks. We still have some room. I mean, we're not capping digitalNow this year. So if that sounds of interest to you, please join us at the Omni Shoreham Hotel, October 27th. And as a reminder, we will be recording the keynote sessions as well and putting those into our AI Learning Hub.

If you can't make it or are joining from a faraway place. Today we are talking about two topics. First and foremost, HeyGen's instant avatars. I think that'll be a fun [00:05:00] conversation. And then we're also talking about exactly what Amith just mentioned, Claude's computer use feature that they just rolled out.

So, HeyGen instant avatars. To remind you, HeyGen is an AI-powered video creation platform that specializes in generating realistic, instant video avatars. If you have joined us for our Intro to AI webinar, you'll know that I typically show a video from HeyGen that Amith actually created, translating his video and his voice into different languages.

Their interactive avatars create AI-powered digital versions of individuals that are capable of joining Zoom meetings and other live interactions, mimicking the user's appearance and decision-making style. More on that later. The avatars are designed to look and sound like the user, with the ability to think and respond in a way that reflects the user's personality and preferences.

Users can input specific information like company data or brand guidelines to ensure the avatar accurately represents them or their [00:06:00] organization. These avatars can be used for various applications like customer support, online coaching, sales calls, job interviews, language learning, and even therapy sessions.

This comes from HeyGen, just so you know. The avatars integrate with Google Calendar for scheduling and can connect to large language models via API for enhanced interaction capabilities. So, I'll let you all in on a secret: our plan for today's episode was actually to have one of these interactive avatars join us on this Zoom call.

I tested it out, uh, and I'll let you all know we were not impressed enough to actually include that on the pod. So I didn't create an avatar of myself or of Amith; I thought that would be a little too weird. I used one of their avatar templates and I trained it up on the Sidecar website, and I provided it a prompt that I came up with using Claude, actually, that said, you're a seasoned technology strategist specializing in AI implementation for professional and trade associations, so on and so forth, with [00:07:00] different areas of expertise.

I then had the avatar join a Zoom call with me and then recorded and shared that with Amith, but overall, not super impressed. I was able to interact with the avatar and say, you know, welcome to the Sidecar Sync podcast. And it said, oh, I'm glad to be on this podcast. And then it was able to talk about AI for associations, able to discuss some of the Sidecar offerings around AI education.

I can surely say this is not going to replace me or any of you at this moment on a Zoom call, on a customer service call, on a sales call, or a coaching call. That's not to say that won't be the case in a few months. Maybe not weeks, but maybe a few months. So Amith, I'm curious: what do you have to say about HeyGen's interactive avatars?

Amith Nagarajan: Well, uh, my point of view is that, I mean, what you shared the other day, that little video of you having a chat with the avatar through Zoom, was really impressive on the one hand, because it was [00:08:00] near-real-time video generation, responding to you. Um, I thought the quality of the, uh, content in terms of what it was saying was reasonable. Um, they're obviously using some strong LLM underneath the hood to do that.

Um, I thought the quality of the synthesized audio is pretty poor and it also is not really optimized for giving shorter and sweeter responses as opposed to more of like a text based response where you ask a question like, Hey, what's the best way for my association to start using AI? And the avatar, um, you know, just kind of goes on and on and on.

So I think that, um, that part of it was less conversational and more speechy. Um, and the quality of the audio wasn't as great as what I've seen from ElevenLabs or ChatGPT's advanced voice mode, which is also available through their real-time API. Now, that being said, HeyGen's video is head and shoulders above everybody else's in terms of real-time avatar-like stuff. It's not trying to compete with video models like Runway or [00:09:00] things like that, or Sora. It's not in that realm, even though it's video.

Um, it's more about, like, you know, business use cases where we're saying, hey, we're going to create, uh, an avatar, or an avatar based on a video we provide. Right. Um, so it's really, really good at that. The fact that it's real time now versus previously, if I wanted to get my AI avatar to say something, I'd give it a text script.

And then I'd wait somewhere between 10 and 30 minutes. And then HeyGen would say, hey, it's ready. And then you'd download it and you'd view it. And it was usually fine. A little bit glitchy here and there, but it was usually fine. Now, a year later, you know, you can do near-real-time conversations. What that really tells you is the pace of progress is crazy.

Um, to your earlier point, Mallory, I anticipate in the next three to six months that the quality of all the stuff that I just criticized will be radically improved. You know, when you have something like ElevenLabs, for example, I'm a big fan. I use their tools all the time when I want to either do text-to-speech generation or speech-to-speech, uh, dubbing, [00:10:00] where, for example, I record something, but for whatever reason, I don't want my voice to be the voice in a voiceover, um, on a presentation or a video or something like that.

Well, I can take just the audio clip to ElevenLabs and get an awesome dubbing in a synthesized voice. Um, and the same thing with text to speech.

It's really, really good. The advantage, by the way, of doing audio to audio is that the timing, the intonation, all that stuff is coming from the information in your voice, and then it's just translated into somebody else's voice, essentially, or an AI's voice. Um, whereas with text to speech, you can't really give it that much information to generate the right timing and the right sequence and the right intonations and all that.

But in any event, um, the point about it is ElevenLabs is a super specialized audio platform and it's awesome. It's way better than HeyGen's audio. Um, but it's kind of like saying, listen, you know, do we have specialization so much so that we need to use these different things and stitch them together?

I suspect that because everything is improving so fast, HeyGen, being a [00:11:00] video platform, will get very good audio. It's kind of like this: you know, if you were to say, oh, okay, well, Boeing has an airplane that's a jetliner that cruises at 550 miles per hour, but Airbus has a hypersonic plane, um, that can get you from Paris to New York City in two hours.

Um, you know, there's a marked difference there between the two. The jetliner would seem like it's super antiquated, even though it's, you know, at the moment, state of the art. Right. Uh, the point would be that, um, you know, similarly, the audio in HeyGen's product seems clunky, um, compared to ElevenLabs, but, you know, if all the airliners are kind of cruising at Mach 5 or whatever, then it's going to be very different.

And, like, the incremental difference is so small, you wouldn't notice. So my point is, HeyGen's audio is going to get a hell of a lot better really fast. Um, I would say that for now, I wouldn't probably invite a HeyGen avatar into my conversations with people, but that's exactly where this stuff is going to go.

Within [00:12:00] the next few years, certainly you're going to have all sorts of AI avatars joining meetings. You might even have an avatar of yourself joining a meeting on your behalf, which is maybe scary, but, um, that's all going to be happening.

Mallory Mejias: Certainly exciting stuff on the HeyGen front. In terms of creating the avatar, it was a bit like creating a custom GPT. You could provide links. So I provided a link to the Sidecar website, of course, and then kind of a longer prompt to provide information on the persona that the avatar was taking, but there weren't a ton of controls outside of that.

And something I thought was interesting: there was a little disclaimer as I was training it up that said, if you assign your avatar a name, so let's say Susan, for example, every time your avatar responds, it might say, my name is Susan, and here's everything about AI for associations. So it just seemed a bit clunky all the way around.

Our avatar was named Wayne. I did not change the name, so we didn't encounter that issue. But across the board, I think the only thing that was surprising to me, and I don't [00:13:00] know what your thoughts are on this, Amith, was just how all of the marketing around it was so intense, like, you're gonna send this avatar to a Zoom meeting on your behalf, it's gonna make decisions, like, in your decision-making style.

I think we're a ways off from that.

Amith Nagarajan: Yeah, I agree. We're, you know, two or three kind of leaps, um, away from that, which, you know, might be two or three kind of AI cycles. So maybe 18 months, 24 months, something like that. And then, even if the technology was there, like, let's just say that we presuppose that in 24 months' time, um, we have an AI avatar version that is as good as you could possibly want, right?

In terms of quality and, um, just accuracy of answers, all that stuff. The question then is, do you want it or not? Do you actually want that? Do you want to use it as an additional participant in the meeting? Do you want it to attend a meeting on your behalf if you can't make it? Um, I don't know. Um, I think those are really interesting questions to ask.

Mallory Mejias: [00:14:00] Well, that was actually my next question for you, Amith. In this ideal world, latency's reduced, quality of the avatar shoots up. Do you, right in this moment, see that as a path that business leaders will go on? It seems kind of hard to grasp that I would send an avatar to a meeting in my place.

Amith Nagarajan: I could see myself having an avatar for, like, an AI assistant where there's some kind of visual representation of my AI assistant that knows me really well, but it's clearly a different entity than me. I don't, I don't personally think I'll ever feel comfortable with the idea of a version of me interacting with people.

Maybe if it's known to be like, hey, this avatar can answer questions or whatever, but it's clearly just an AI that's trained on content. But, you know, people very quickly lose track of the fact that they're dealing with AI. Um, and you know, if they see you and they hear you, they're going to quickly think it's you.

And so I'm not comfortable with that. No matter how good the AI is, I'd rather be in the meeting. But, um, I do think that what's going to happen is [00:15:00] AI avatars of your assistants. Um, you'll be able to kind of shape and mold them to whatever you want. They could be some alien-looking creature, or they could be a person or an animation or whatever, and you could include those in your meetings. And one of the things that may have to happen is, if you want to sell me something, instead of getting a hold of me, you'll always have to go through my gatekeeper, which is my AI assistant that knows me really well. And the goal of that assistant isn't to block all potential opportunities, but to really intelligently filter things out, and it knows me well enough to know what I actually want to take a look at.

Mallory Mejias: Hmm. I agree with that. I think if I ever were to meet with an Amith avatar at this point in time, I wouldn't take anything it said seriously, because I'd be like, well, what does Amith really think about what the Amith avatar has to say? So I agree with you. Perhaps an interesting, uh, use case in the future would be some sort of, uh, avatar that takes the counterpoint to whatever you're talking about in your meetings, just to help you kind of brainstorm and flesh out [00:16:00] ideas. That could be interesting.

Amith Nagarajan: Totally. And then an avatar for, like, a really great facilitator who can ensure that everyone in the conversation has had an active opportunity to participate. Um, using things like personality profiles, for example, to employ techniques. You know, usually it's the people that have kind of the, you know, highly extroverted kind of, uh, character traits that tend to be the ones that talk the most in a meeting.

That doesn't mean they have the best ideas, but a lot of times they just consume all the oxygen in the room just as soon as they walk in. And, uh, so potentially there's help on the way for that. If you have an AI avatar whose job is to facilitate the meeting and try to pull ideas out of everyone, that could be super powerful and valuable, just to remind everyone that there are more people in the room than those who, you know, jump into the speaking ring right away.

Mallory Mejias: Absolutely.

Amith Nagarajan: It's an interesting technology also, um, in the world of associations in the context of learning. So you think about, like, Sidecar's AI Learning Hub and think about all the different courses that are there. And then think about, [00:17:00] like, if you had an avatar available throughout that experience that you could just have a video-to-video, voice-to-voice conversation with about the content, about any topic, and it was an avatar of the instructor for that particular course, um, that could be very interesting.

Um, now it may be better to choose an avatar that's not the instructor, but some animated cartoony thing that's like, hey, I'm the AI assistant to this instructor, and I'm here, I know all the material, I'm here to help you in that way. It's clearly not that person. That's probably more comfortable, but still very useful.

So, you know, I think that's a way of adding a real-time element to asynchronous learning that could be very powerful. Um, it could also be something where it's like, hey, I'm having a hard time with this particular concept. I don't really understand how vectors work. Can you walk me through it? And then that avatar is in the context of your learning experience, knows where you're at in your learning journey based on your LMS data, knows all the curriculum in the LMS, and can give you really guided help.

Um, and we talked about similar ideas in the past, in the [00:18:00] world of AI education, uh, or AI for education, I should say, um, where, you know, you think about, like, what Khan Academy has done with Khanmigo. Um, it's actually somewhat similar to that; it's just a text thing versus video to video.

Mallory Mejias: That would be an excellent use case for us, to roll out an avatar for each course. I'm going to add that to the docket. Well, for

Amith Nagarajan: I bet, I bet most LMSs that are investing in R&D anyway, which is not every LMS, but most LMSs that are looking ahead, are going to add features like this.

Mallory Mejias: For all of our listeners and viewers, I would say at this point, unless you just want to try it out for fun, HeyGen interactive avatars are probably not something you're going to be using in your meetings regularly, but certainly something to keep an eye on in the next few months.

Amith Nagarajan: Yeah, I agree with that. I would just say, like, I do think checking out HeyGen for video translation is definitely a use case I'd encourage people to go try. There's a few other tools like it, but HeyGen is the one that seems to be the market leader. If you haven't yet translated a video of yourself speaking in whatever your native [00:19:00] language is into another language, go try that.

Just do a 30-second video on your phone, upload it to HeyGen, ask it to translate it into some other language, and see what happens. It should blow your mind if you haven't gone through that yourself. Um, it just opens up your brain to the idea that these tools exist. The AI avatar feature is their newest feature, uh, at HeyGen, and that one, as Mallory is saying, is perhaps not quite ready for prime time from our viewpoint.

But you know, if you go check it out and you have a different point of view, shoot us a note. We'd love to talk to you about it.

Mallory Mejias: Yep. We might bring you on a future pod episode, just a warning. All right. Next up, we're talking about Claude computer use. Anthropic introduced a new feature called computer use for its AI model Claude, which allows the AI to control a computer in a manner similar to human users. This feature is still in the public beta phase and is available through Anthropic's API, and it enables Claude to perform various tasks by interacting with computer interfaces, such as moving a cursor, typing, and executing commands based on visual [00:20:00] inputs from the screen.

Claude can perceive the computer screen, identify elements, and interact with them by moving the cursor and clicking based on pixel positions, which allows the model to use everyday software and tools without requiring task-specific programming. The AI can perform complex tasks autonomously, like filling out forms by gathering data from spreadsheets and CRM systems or navigating web browsers to complete coding tasks.

Despite its capabilities, Claude's computer use feature is still experimental and can be error prone, and its ability to use computers is not yet on par with human proficiency. On evaluations like OSWorld, it scores 14.9%, whereas typical human scores range from 70 to 75%, but that's nearly double the score of the next-best AI model in the same category, which scored 7.7%.

So Amith, you shared this with me. I think it just dropped yesterday. So this is, like, hot off the press for our association listeners. Um, what do you [00:21:00] see as some near-term, like, use cases for computer use?
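Editor's sketch: the loop Mallory describes here (look at the screen, pick a pixel position, click, then look again) can be illustrated with a toy simulation. This is not Anthropic's API. `Button`, `FakeScreen`, `choose_action`, and `run_agent` are hypothetical names made up for illustration, the "screenshot" is a list of visible buttons rather than pixels, and the "model" is a hard-coded rule standing in for Claude.

```python
from dataclasses import dataclass, field


@dataclass
class Button:
    label: str
    x: int
    y: int
    width: int = 80
    height: int = 24

    def contains(self, px: int, py: int) -> bool:
        # True when the pixel (px, py) falls inside this button's rectangle
        return (self.x <= px < self.x + self.width
                and self.y <= py < self.y + self.height)


@dataclass
class FakeScreen:
    """Hypothetical stand-in for a desktop: some buttons plus a click log."""
    buttons: list
    clicked: list = field(default_factory=list)

    def screenshot(self) -> list:
        # A real system returns pixels; this toy returns the buttons that
        # are still visible (a button disappears once it has been clicked).
        return [(b.label, b.x, b.y, b.width, b.height)
                for b in self.buttons if b.label not in self.clicked]

    def click(self, px: int, py: int) -> None:
        for b in self.buttons:
            if b.contains(px, py) and b.label not in self.clicked:
                self.clicked.append(b.label)


def choose_action(observation, goal_labels):
    """Placeholder for the model: click the centre of the first visible goal."""
    for label, x, y, w, h in observation:
        if label in goal_labels:
            return ("click", x + w // 2, y + h // 2)
    return ("done", 0, 0)


def run_agent(screen, goal_labels, max_steps=10):
    """Observe, decide, act, repeat. Returns the number of clicks made."""
    steps = 0
    for _ in range(max_steps):
        observation = screen.screenshot()          # 1. look at the screen
        action, px, py = choose_action(observation, goal_labels)  # 2. decide
        if action == "done":
            break
        screen.click(px, py)                       # 3. act on pixel coordinates
        steps += 1
    return steps


screen = FakeScreen([Button("Name field", 10, 10),
                     Button("Submit", 10, 50),
                     Button("Cancel", 100, 50)])
steps_taken = run_agent(screen, {"Name field", "Submit"})
```

The point of the sketch is the shape of the loop: every action is followed by a fresh look at the screen, which is why the real feature works with software that has no API.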

Amith Nagarajan: Well, um, you're right that at the moment, these models aren't as good as a human in terms of their use of a computer. Um, but they are really good in certain narrower contexts, and they're also good at learning over time as you tell them, well, no, not that, I meant this. Um, one of the things I think that's going to be really powerful for this is, um, improving quality assurance.

So think about your association's website and the number of times people tell you that there's a broken link on it, or that a particular feature doesn't work. Um, you know, you do QA typically when you have a new system that you're putting in, like a new e-commerce system or a new LMS. You'll do a bunch of QA on it.

I mean, you know, hopefully you're doing a bunch of QA on it before you go live with it. Um, and then once you go live, that kind of goes away and you just start incrementally changing it. And so then you add this button and that button and this image and that image and new pages, and all this stuff is happening.

And over time, it's like a death-by-a-thousand-paper-cuts scenario. And then your website no longer works well, it's [00:22:00] slow. So there have been automated quality assurance tools for years, and there's companies and a whole industry around this, but those tools have been the domain of QA engineers.

These are very technical tools. They're super expensive. Like, a good, uh, QA engineering kind of tool set can cost into the many tens of thousands of dollars per person, but they have been able to create automation scripts, and this is for years now, that are very, very valuable. And that's why companies, um, pay for those. But with this kind of advancement in AI, both the capabilities are better, and it's also available to anyone.

So you, Mallory, could go to Claude's new, uh, computer use and say, hey, this is our Sidecar AI Learning Hub LMS site. I want you to log in and I want you to test all the courses. I want you to go through and, you know, make sure that the course links work, that the videos size correctly. I want you to do this on desktop, and I also want you to simulate mobile use using this tool, and you can give Claude the tools.

Now, if you do that right now, [00:23:00] I suspect the current version of Claude would probably not be able to do the whole thing, um, but very close. And it's going to keep improving at this pace where, you know, you could put in place that type of an assistant that just every day is running that QA on a continuous basis and sending you feedback saying, hey Mallory, there's this broken link on this page, you know. So there's stuff like that.
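Editor's sketch: the "tell me about the broken link on this page" part of that daily QA run does not need an AI at all, and a small script shows what the check amounts to. This is a minimal illustration using only the Python standard library, not a description of any tool mentioned in the episode. The `fetch` callable, `FAKE_SITE`, and `fake_fetch` are assumptions introduced here so the example runs without network access; a real run would swap in an HTTP client and restrict crawling to your own domain.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collects the href of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def find_broken_links(start_url, fetch):
    """Crawl from start_url and return (page, broken_target) pairs.

    `fetch` is a caller-supplied function: url -> (status_code, html_text).
    """
    seen = {start_url}
    queue = [start_url]
    broken = []
    while queue:
        page = queue.pop(0)
        status, html = fetch(page)
        if status != 200:
            continue
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            target = urljoin(page, href)  # resolve relative links
            target_status, _ = fetch(target)
            if target_status != 200:
                broken.append((page, target))
            elif target not in seen:
                seen.add(target)
                queue.append(target)
    return broken


# Hypothetical in-memory site so the sketch runs offline.
FAKE_SITE = {
    "https://example.org/": '<a href="/courses">Courses</a> <a href="/about">About</a>',
    "https://example.org/courses": '<a href="/courses/intro">Intro</a> <a href="/">Home</a>',
    "https://example.org/courses/intro": "<p>Welcome</p>",
}


def fake_fetch(url):
    if url in FAKE_SITE:
        return 200, FAKE_SITE[url]
    return 404, ""


report = find_broken_links("https://example.org/", fake_fetch)
```

What computer use would add on top of a script like this is the part a script cannot easily do: logging in through the real interface, judging whether a video is sized correctly, and trying the same flow at a mobile screen size.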

The broader way to think about this is for a long time We have had to adapt to the interface of the web So think about going way back in time, um, and say, well, originally mainframe type computers, you actually didn't even interact with the keyboard. You would load your program with punch cards. Um, and these were literally perforated, you know, thicker pieces of paper cards that, you know, that's how you would encode a program using special machines.

I never worked with punch cards. I'm not that old, but, um, my dad told me a lot about it. He actually did that in his college work and loaded the punch cards for his [00:24:00] PhD thesis and ran them on a computer. And, um, that is a great example of an incredibly inefficient way of communicating with a machine.

Right. Um, and then from there we went to, you know, little green screens and keyboards. And then from there we got this thing called the mouse, which was a pointing device where we could tell the computer what part of the screen we want to interact with, and graphical user interfaces blew up from there.

And then with mobile, we had, you know, touch capabilities, multi-touch capabilities, and now we're getting into audio and video. So our interfaces with technology keep getting better. And that's what made ChatGPT in 2022 such a revelatory experience for people, because for the first time, um, for most people, they could interact with the computer in their choice of language.

Um, as opposed to having to use the computer's preferred way of interacting, whether that's punch cards, keyboards, or, if you want to make it do something more complex, programming code. Now we're shifting to the point where the computers can understand us, and that's kind of cool. But the other thing that's interesting, and this comes [00:25:00] back to this topic, is that as part of understanding us and gaining these new capabilities, they understand our world too, including, um, all of the devices we interact with, and the computer screen is one of the great examples. If the computer can see the computer, and then this AI can control the computer, it has access to every piece of software that's ever been written. It has access to everything. Right. And it already has the world's knowledge.

And so, um, it's a very powerful idea, because not all software is going to be available in a technical way, like via API, where the AI can interact with it, but there's lots of software that is used across so many businesses. Like, imagine if your association had a legacy AMS, and that AMS has been in place for God knows how long, but it basically works.

It might be really kludgy, but it works, and you don't want to spend millions of dollars and two years or longer to replace it. But what if you could use AI, using computer use, to automate processes, right? And then you [00:26:00] use the AI essentially as a way of extending the life of a legacy tool. There's a lot of examples like that that come to mind. The broader concept again, though, is it's computers using our interfaces to connect with and extend use cases.

It's another reason why humanoid robots are such a big thing. The whole world, we've crafted it, you know, in a way that basically interacts with us, right? So we have two hands and two feet and legs and all this other stuff. So if we can make humanoid robots, robotics that are similar to us, they just plug right in to everything else, you know?

So that's the idea behind, um, this in my mind is it's just, it's an accelerant. It's basically opening up a whole bunch of new doors for the AI to walk through.

Mallory Mejias: Mm hmm. If you check out the demo video, which we will include some of those in the show notes, you can see that Claude is actually taking screenshots of your screen over and over again, and that's how it's figuring out kind of what to do next and what to click and how to fill out these forms, which seems a little bit clunky.

So to your point, Amith, interfaces have gotten [00:27:00] better for us, um, throughout history. But then I'm wondering, do we now go backward with AI or not, or backwards for us in the sense that it doesn't seem like an AI needs a computer, right, to interact with all these things? You mentioned APIs being an option, but do you think that AI will eventually go away from using our systems and kind of create more of a direct path?

Amith Nagarajan: I think there's definitely a possibility. I mean, AI systems left to their own devices will actually create their own languages, you know, cause English is pretty inefficient, but I don't know if we want that or not. Um, I think there may be value in always forcing the AIs to go down a path that we can interpret and understand.

Um, but I do think that, you know, for API use, for example, an AI calling an API, let's say I have a more modern AMS and I want to create a new member in the database. If I have an API to do that, that's a structured way of connecting to the AMS programmatically. I could connect that to Zapier.

I could connect that to a CDP. I can connect that to whatever. And AI can, of course, [00:28:00] connect to APIs as well. That's going to be a higher resolution way of doing it, if you will, or a higher, uh, you know, probability of success kind of way of doing it. But there's a lot of things in the world that don't have, you know, those kinds of capabilities, or you have systems that may have APIs, but the API is very limited and the user interface can do a whole lot more.
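As a rough illustration of the "structured way of connecting" described here, the following Python sketch builds (but does not send) a request to create a member. The endpoint path, field names, and bearer-token scheme are all hypothetical; no real AMS API is being described.

```python
import json
from urllib.request import Request


def build_create_member_request(base_url: str, api_key: str,
                                first_name: str, last_name: str,
                                email: str) -> Request:
    """Build a POST request for a hypothetical AMS 'create member' endpoint."""
    # Field names are assumptions for illustration only.
    payload = {
        "first_name": first_name,
        "last_name": last_name,
        "email": email,
    }
    return Request(
        url=f"{base_url}/members",  # hypothetical path
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
        },
        method="POST",
    )


# Example: construct the request without sending it anywhere.
req = build_create_member_request(
    "https://ams.example.org/api", "test-key", "Ada", "Lovelace", "ada@example.org"
)
```

The same structured request could be issued by an automation tool or by an AI system, which is why this path tends to be more reliable than driving a user interface.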

Um, so there's a ton of, uh, really interesting business use cases around this. Um, I also think that, uh, independent of the computer doing the driving, think of it this way too. If you enable a tool like this to watch your screen and watch you work while you're perhaps talking to it continuously speech wise, right?

Forget about video, but just audio and I'm talking to Claude like I'm talking to you. I'm saying, Hey, Claude, listen, I'm about to work on this Excel spreadsheet. I'm going to do this and I'm stuck here. Like, what do you think's wrong? And then Claude says, Oh, well, that cell, it should be this. Uh, let me change it for you.

And it changes. So you're kind of like working in parallel. It's almost as if you imagine, Hey, I opened up a Google Sheet. And you and I are on an audio call or a video [00:29:00] call. We can both see the Google Sheet. We can both work on it together. That's how this tool, I think it's really, really interesting. Um, there are limitations right now, partly the model's intelligence.

Part of it though, is the latency with the way it's doing what you described, taking this sequence of screenshots, which becomes kind of like a children's flip book of what's happening animation wise, it's very low res, but that's because there's some major limitations with vision models right now. All that's changing, you know, we're on this accelerating exponential curve.

And so vision models today, the way you can efficiently interact with them is to pass them smaller images that they can process, make sense of, and then drive kind of the next thing that the model does. Um, over a period of time, you're going to have a real time, high resolution video feed going into the model.

It's going to have, you know, complete access to that on a continuous basis. And we're not going to even think about it. Just like we're using Zoom today and we have, you know, HD video in both directions. And we're not thinking about that as a remarkable thing, even though it really is. So we will soon have that as well, where the AI will be with us [00:30:00] with continuous video.

It's like 30 frames per second or higher, as opposed to like one frame every three seconds, which I think is what this thing's doing right now.
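The screenshot-driven loop discussed above can be sketched roughly as follows. This is a minimal Python illustration, not Anthropic's implementation: the `Action` type, `run_loop`, and the stubbed "model" in `demo` are all hypothetical stand-ins for a vision model and a real screen.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Action:
    kind: str          # "type", "click", or "done"
    detail: str = ""


def run_loop(take_screenshot: Callable[[], bytes],
             choose_action: Callable[[str, bytes, List[Action]], Action],
             perform: Callable[[Action], None],
             goal: str,
             max_steps: int = 20) -> List[Action]:
    """Observe the screen, ask the model for one action, perform it, repeat."""
    history: List[Action] = []
    for _ in range(max_steps):
        frame = take_screenshot()                     # observe
        action = choose_action(goal, frame, history)  # decide
        if action.kind == "done":
            break
        perform(action)                               # act
        history.append(action)
    return history


def demo() -> Tuple[List[Action], Dict[str, object]]:
    """Run the loop against a fake one-field form with a scripted 'model'."""
    screen: Dict[str, object] = {"name": "", "submitted": False}

    def take_screenshot() -> bytes:
        # Stand-in for a real screenshot: the screen state as text.
        return repr(screen).encode()

    def choose_action(goal: str, frame: bytes, history: List[Action]) -> Action:
        text = frame.decode()
        if "'name': ''" in text:
            return Action("type", "Ada")
        if "'submitted': False" in text:
            return Action("click", "submit")
        return Action("done")

    def perform(action: Action) -> None:
        if action.kind == "type":
            screen["name"] = action.detail
        elif action.kind == "click":
            screen["submitted"] = True

    return run_loop(take_screenshot, choose_action, perform, "fill out the form"), screen
```

Each pass through the loop costs one screenshot plus one model call, which is where the latency comes from. If the estimate of one frame every three seconds is right, a 30 frames per second feed would be 90 times as many frames.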

Mallory Mejias: And think about how powerful for onboarding new team members to just have the AI kind of follow you throughout your work for maybe a few weeks. You would have the ultimate onboarding avatar, even potentially, that you could use through HeyGen.

Amith Nagarajan: Well, and now imagine a version of this type of technology that is your association's, uh, AI avatar, um, that has perhaps the video part, perhaps not, but is capable of doing things like computer use, uh, in combination with knowing everything about your association. So let's say for example, you are a CPA society and your members are accountants.

What if your organization had an AI assistant that was capable, if the individual, uh, authorized it, to be able to kind of be part of their computer session? And your AI agent, of course, knew everything that the CPA society has training on, which is a [00:31:00] vast amount of content. And so that becomes a more powerful agent to potentially help.

And so I think it's almost like you think of the AIs as different collaborators. I might say, Hey, I'm going to invite this CPA society's AI agent to hang out with me right now, because I'm working on, you know, this report and I want that level of expertise. I just don't want, I don't want the generic Claude hanging out with me.

Cause that's not quite at that level, or maybe it's highly specialized software used for taxes or something. I'm kind of making this up, but every domain has uniquenesses to it. And associations are living in those veins where they have that deep domain expertise, but if they can essentially activate that expertise through new modalities and new channels like this, it gets more and more exciting.

So I see this being super relevant to associations in terms of internal use, where you can think about things that you do repetitively, or you should be doing more like the QA stuff I mentioned earlier, and use this type of technology, play with it to try to automate some of that, to save yourself time.

And then the other side of it is think about how this affects your members. And [00:32:00] think about ways you could potentially help those members use this technology. In some cases, it might just be training them on it, providing them, you know, an ability to learn this stuff through, you know, focused training that's relevant to them and their context.

Some cases it might be custom tools.

Mallory Mejias: think I'm in AI agent mode because of your keynote session, Amit, but I'm curious here. We have a model, FOD, that can take action through a feature called computer use. Uh, are we talking about an agent here or would you still call that something separate from an AI agent?

Amith Nagarajan: I think it's, it's kind of semantics at some point. Computer use a hundred percent is agentic in the sense that it has access to tools. It's able to interact with your computer. It's able to kind of semi-autonomously do things with your computer. So, but it's packaged within the model.

Now, is Claude 3.5 Sonnet capable of doing that as a raw LLM? No. An LLM is, you know, basically a model. There's software [00:33:00] engineering on top of that that's allowed Claude to see your screen, take the screenshots, pass them in. So there's definitely an agentic layer. Um, even though it's being kind of packaged into the Claude offering.

Um, but the reason I say it's semantics and you're just kind of, I don't know that we ultimately are going to care too much is that these are just going to be called systems. They're AI systems. There are one or more models involved and kind of the agentic sauce that surrounds those models will be a lot of times the things that actually drive the real world value.

Um, But I think the main differentiation between the two is that models are passive and waiting for you to tell them that you want some specific thing and that agents can be active and take action on your behalf, either autonomously or semi autonomously. And that's the definitional, uh, break. I think that most people use.

That's certainly how I think about it. Um, but I think ultimately it's not going to matter. It's just going to be AI solutions or AI systems, and you're going to have bits and pieces coming together. It's just like, you know, think about a system you're more familiar with, like a CRM [00:34:00] software. Within the CRM software, you have databases, you have web servers, you have user interfaces, you have all these, there's network elements.

There's all these different components that you pull together. The same thing is true in this world too.

Mallory Mejias: That makes sense. So essentially, as we've talked about with Microsoft Copilot Wave 2, with Claude's computer use, we really have the ability to create agents or agent like systems at our fingertips. So I think really the big question here is, is what, what are people going to do with it? What are associations going to do with that?

But even our individual listeners.

Amith Nagarajan: Yeah. What I'd love to be able to do with, you know, with the earlier example is say, Hey, Claude, hang out with me. We're going to do a QA testing session on this website. And now we know, okay, the Sidecar AI Learning Hub is awesome. We've tested it now. Hey, Claude, listen, what I want you to do now is to, uh, create a version of yourself that can rerun that exact set of tests.

As frequently as I'd like you to run it. And then Claude says, no problem. Claude probably [00:35:00] has to go write some code to do that, which Claude can do, right? All these models can do that well. And then test it and say, does that test replicate what our session had? And if so, then it's like, yeah, it's ready to go.

And so we'll say that's QA Claude. And so then I'll say to QA Claude, Hey, I want you to run, um, every day and send me a report, and tell me if there's any issues that were found and classify them as high priority, medium priority, low priority. Uh, and then if you have these issues and you know what the problem is, uh, recommend what the remediation is.

Uh, and then the next step beyond that is to say, well, in certain categories where it's like saying, oh, there's a broken link. Well, if we so authorized Claude or mini Claude or QA Claude or whatever we want to call them, um, we can say, well, here's our admin access to this particular account. You can go in there and change that if you find a broken link.

And of course, at every level of this, you have to trust the technology more and there's varying levels of comfort for that, but you can envision a thing where it's kind of like self healing, um, where it goes from QA to [00:36:00] actually making the repair. Um, so maybe you still get notification that something was corrected, but maybe you don't. You know, modern computer systems constantly have all sorts of issues.

And there are many self correcting mechanisms in place in databases, on networks, and the entire design of the internet is a continuous self healing, self correcting architecture. And so, um, that's how a lot of modern systems work. So I could see that being applied to association business processes.
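The daily QA report described above (classify issues by priority, recommend a remediation, and auto-repair only in authorized categories) might be sketched like this in Python. The issue kinds, priority mapping, and remediation text are invented for illustration; a real setup would define its own.

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical mappings: which issue kinds get which priority and advice.
PRIORITY_BY_KIND = {"checkout_error": "high", "broken_link": "medium", "typo": "low"}
REMEDIATION_BY_KIND = {
    "checkout_error": "Escalate to the web team immediately.",
    "broken_link": "Update or remove the link target.",
    "typo": "Correct the text at the next content update.",
}
# Categories the agent has been explicitly authorized to repair on its own.
AUTO_FIX_ALLOWED = {"broken_link"}


@dataclass
class Finding:
    kind: str
    location: str


def build_report(findings: List[Finding]) -> Dict[str, List[dict]]:
    """Group findings by priority, attaching a remediation and auto-fix flag."""
    report: Dict[str, List[dict]] = {"high": [], "medium": [], "low": []}
    for f in findings:
        # Unknown issue kinds default to medium and are left for a human.
        priority = PRIORITY_BY_KIND.get(f.kind, "medium")
        report[priority].append({
            "kind": f.kind,
            "location": f.location,
            "remediation": REMEDIATION_BY_KIND.get(f.kind, "Needs human review."),
            "auto_fix": f.kind in AUTO_FIX_ALLOWED,
        })
    return report


# Example run with made-up findings.
report = build_report([
    Finding("broken_link", "/courses/intro"),
    Finding("checkout_error", "/cart"),
    Finding("layout_glitch", "/home"),
])
```

Keeping the auto-fix allow-list separate from the priority mapping reflects the point about trust: authorization to repair can be widened one category at a time.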

Mallory Mejias: I don't know if there are all that many people out there who like QA. I know that I do not. It's kind of tedious, but I'm just thinking how exciting it is to live in a time where all these things that are pain points, tasks we don't enjoy doing tasks that are really tedious, um, AI is going to be able to do for us and thinking about all that time, it's going to free up for us to do the things we enjoy and the things that are more human.

Amith Nagarajan: Yeah, and another element of QA is get the AI to be like a simulated student to take your course end to end and actually consume the content, give you feedback, attempt to answer the questions. You'd have to do a lot of work to make sure the AI [00:37:00] doesn't use its prior training, uh, and knowledge to answer questions in an exam, for example. Or, what do you cap it off to? Hey, you're capped off at a 12th grade education or whatever.

There's all sorts of things you could, you could theorize around that, but there's just an enormous number of things you could do with this technology. And, and, uh, I think the feedback opportunity is tremendous.

Mallory Mejias: Thanks for tuning in to episode 53 of the Sidecar Sync Podcast. We will see you all next week, post-digitalNow.
