Unlocking AI-Powered Insights from Unstructured Data [Sidecar Sync Episode 48]

Written by Emilia DiFabrizio | Sep 19, 2024 7:23:50 PM

Timestamps:

00:00 - Introduction
06:05 - Unstructured Data: An Overview
09:42 - The Power of Unstructured Data for Associations
11:46 - Pre-AI Approaches to Unstructured Data
14:30 - AI-Powered Analysis: A Game Changer
20:18 - Playground Demo: Structured Insights from Unstructured Data
24:21 - Real-World Applications for Associations
33:43 - Predictive Insights and AI’s Future
38:26 - Real-World Applications
42:37 - Exploring Advanced AI Models and Predictive Capabilities
51:07 - Tackling Key Pain Points
55:14 - Classical Machine Learning vs. Foundation Models
58:23 - Predicting Future Trends with AI

Summary:

In this special episode of Sidecar Sync, Amith and Mallory dive into the transformative power of AI in analyzing unstructured data, especially for associations. They explore how AI can help organizations unlock insights from sources like emails, community discussions, event recordings, and more, all of which have been traditionally challenging to analyze. Amith shares real-world examples of how unstructured data can reveal hidden trends and opportunities, offering valuable insights for future decision-making. Tune in to discover how your association can stay ahead of the curve with AI-driven data strategies.

Let us know what you think about the podcast! Drop your questions or comments in the Sidecar community.

This episode is brought to you by digitalNow 2024, the most forward-thinking conference for top association leaders, bringing Silicon Valley and executive-level content to the association space.

Follow Sidecar on LinkedIn

🛠 AI Tools and Resources Mentioned in This Episode:

OpenAI Playground ➡ https://platform.openai.com
Sidecar AI Learning Hub ➡ https://sidecarglobal.com/hub
Free Book: Ascend - Unlocking the Power of AI for Associations ➡ https://sidecarglobal.com/ai

⚙️ Other Resources from Sidecar:

More about Your Hosts:

Amith Nagarajan is the Chairman of Blue Cypress 🔗 https://BlueCypress.io, a family of purpose-driven companies and proud practitioners of Conscious Capitalism. The Blue Cypress companies focus on helping associations, non-profits, and other purpose-driven organizations achieve long-term success. Amith is also an active early-stage investor in B2B SaaS companies. He’s had the good fortune of nearly three decades of success as an entrepreneur and enjoys helping others in their journey. Follow Amith on LinkedIn.

Mallory Mejias is the Manager at Sidecar, and she's passionate about creating opportunities for association professionals to learn, grow, and better serve their members using artificial intelligence. She enjoys blending creativity and innovation to produce fresh, meaningful content for the association space. Follow Mallory on LinkedIn.

Read the Transcript

Amith Nagarajan: Welcome back to the Sidecar Sync. We have another episode for you lined up today and it's going to be awesome. It's a special episode all about unstructured data. My name is Amith Nagarajan.

Mallory Mejias: And my name is Mallory Mejias.

Amith Nagarajan: And we are your hosts. Now, before we get into this special episode, all about unstructured data, which is way more fun than it sounds.

We're going to take a moment to give you a few thoughts from our sponsor.

Mallory Mejias: Amith, how are you today?

Amith Nagarajan: I'm doing great. I'm excited for this week, both because of this episode, we're recording a little bit earlier this week than normal, but that's because a lot of us from across the blue Cypress, uh, family are heading up to Utah for our annual leadership summit, where we have a lot of our senior managers and, and, uh, coming together to just learn together and network.

And it's always a lot of fun.

Mallory Mejias: Absolutely. I will be attending as well. I, right now I'm still in Atlanta, but I'll be heading to Park City tomorrow. Wow. Tomorrow evening, and I'm so excited. This will be my third year attending the leadership summit. Amith, what inspired you to create and host the Leadership Summit all those years ago?

Amith Nagarajan: Well, it was the, uh, timeframe was 2021, four years ago, and we were looking for a way to get people together, um, after, you know, a long period of time where there wasn't much face to face contact because of COVID. So it was September, a timeframe of 21 and getting together in an offsite has always been something I've enjoyed doing with, um, leadership teams and people from all levels of our organizations, because it's just a time to kind of disconnect from the day to day and think a little bit more broadly and deeply and just get to know people better. And that leads to all sorts of amazing things for the business. It's healthy for everyone to get away from their day to day work. And I think there's tremendous power that comes from that.

I've done this for years and years and years across different companies for Blue Cypress. Um, it's just been a really, you know, formative event for a lot of people in their careers here. And I think it's, it's one of the pillars of our, of our calendar, you know, when we get people together. So we bring everyone from across the senior management teams from all of our companies, um, as well as the BCHQ team.

So it's, it's a ton of fun. And I always learn a lot just, you know, Hanging out with people, talking to them. We should bring in a speaker or two from outside to give us some additional perspective. Uh, so we've got some interesting stuff lined up this year as well. And, uh, it's just, it's just a great part of our rhythm.

Mallory Mejias: So I've had the opportunity to attend the past two years, but I was helping more on the planning side. So I will say I'm really excited this year to get to attend. No strings attached. Just sit back, relax, and enjoy. But one of the most interesting things I've found about the past two years is we put a lot of work into the structure of the event, into the sessions that we curate.

And to me, some of the most interesting conversations that are had are those that kind of fall outside of those structured sessions, or it might be one specific topic that we do like a Q and a on that goes on for 30, 45 minutes. So I always think it's really interesting to see just how excited people are troubleshoot things together.

Amith Nagarajan: It is fantastic. And of course, this year, like the last few years, uh, we will be talking a lot about AI. We'll be talking a lot about clients and, you know, the state of our community of associations and nonprofits and thinking about, you know, how we can better serve the market. So there's, there's always a lot of interesting things.

To discuss, but I agree with you. It's the pieces that are not part of the formal agenda tend to be the best parts. I mean, that's really what we do it for. Ultimately, you know, the formal agenda, you could, you could deliver a lot of that content virtually and save a lot of money and a lot of time, but it's the other stuff that you can't replicate.

So, especially in a world where we are mostly remote. You know, we have offices in various parts of the country where we have small clusterings of people and, you know, we don't mandate people go in a specific number of days or anything like that. I know a lot of organizations do that. We've kept it pretty much a hundred percent remote, but just have the facility available and that seems to work for us.

But this type of event is fantastic. You know, we also do smaller events throughout the year. For example, for our technical crews, we do hackathons at various times of the year, and that's a great way to get people together. Uh, and then we also use the rhythm of the event calendar for the association community to bring our teams together.

So we'll often have people come together before or after an event. Uh, that's not, you know, a Blue Cypress specific event. So we, we think that's a real important part of relationship development and Uh, for us, since we are a family of over a dozen different brands, um, a lot of times people don't even know what the other companies do.

So, you know, we, we learn an awful lot about what our colleagues are doing and how they're approaching different problems and opportunities they see and so on.

Mallory Mejias: Yeah, absolutely. You nailed it on the head. That's one of the, the onboarding processes about joining the Blue Cypress family of companies. Truly, it's just figuring out what the other companies do in the portfolio. That's a lot of companies to really learn their business models, learn who works there.

It's a challenge. All right. Well, today, as Amith mentioned, we're talking about AI powered analysis of unstructured data specifically for associations. And we're going to kind of divide that into three sections. First, we're going to talk about. Unstructured data, what is it to set the foundation? Then we're going to talk more about unlocking the potential of that unstructured data.

And then finally, we're going to talk about why this matters and what exactly you as a listener can do next with this information. So first and foremost, what is unstructured data? It refers to all the information that doesn't fit neatly into organized formats like databases or spreadsheets. This includes things like emails.

blog posts, social media content, audio, and video recordings. It's the kind of data that you are generating every day, but it's often overlooked because it's not easy to analyze. In fact, around 80 to 90 percent of all the data we deal with is unstructured. And for associations, specifically, this can be anything from member emails to community discussions, event recordings, and even the transcripts of your board meetings.

The challenge with unstructured data is that while it contains valuable insights, it's not organized in a way that allows for quick analysis or action. And until recently, most organizations haven't had the tools to follow through. fully tap into this resource. AI is starting to change how we approach this kind of data, offering new ways to analyze, organize, and extract meaningful information from it.

So Amith, you kind of joked at the beginning that this is going to be more fun than it sounds like. Um, I'm wondering why this has been a topic that you've been focused on recently. What has it something you've been thinking about for years? Is it something that's kind of been sparked with the AI landscape?

Which is it?

Amith Nagarajan: I've been thinking about this for a long, long, long time. Uh, you know, for, for me, a lot of it is that the amount of unstructured data has always eclipsed the structured data that we've had available. And the rate of growth in unstructured data is far faster than the rate of growth in structured data. So we're at 80, 90%, you know, if, if the trend lines hold, it'll probably be nearly all of our data.

Will be unstructured. Now, keep in mind, structured data is also growing, but structured data takes a lot of work. You know, you have to structure it. You have to take the time to think about what that structure is. And then you have to build systems of both technology and systems of process to get that structured data into a structured database or at a minimum, like a spreadsheet.

So that's why there's so much less of it. Whereas particularly with mobile devices and cameras and all the things we have, We generate so much unstructured data every day. It's, it's staggering. So I've been thinking about it for a long, long time. Um, because in the world of business application software, uh, the insights that you glean from the structure data are helpful, but they're very limited.

It's like looking through, you know, go to your front door and try to look through that little hole to see who's at your front door, that's the kind of vision you get of the world, um, when you're looking at Just structure data and unstructured data has the potential to give us so much more insight. Uh, You know, we've talked a lot about how there's information loss as you go from one modality to another.

So for example, video to audio, you lose some information because you don't have the video. Audio to text, you lose some information because all you have is the transcript. Well, if you go to structured data, you lose even more because, you know, you're getting into this super, super narrow space. Um, so it's been an opportunity area.

And in theory, you know, when you think about like all the potential insights that are in there, In images, in videos, and obviously in text, um, if we could just ad hoc ask any question we had in a, uh, consistent way across all of that unstructured data and get back the results immediately and be able to use it, it would be quite interesting.

And I know we're gonna talk about that a lot, but to me, that's the opportunity for this market because associations have such a massive amount of unstructured data at their disposal.

Mallory Mejias: hmm. Pre AI, what is the process that you've seen in this industry and the association market for organizations to take advantage of their unstructured data? Or do you not see that?

Amith Nagarajan: Well, pre AI, if there really was a pressing challenge or opportunity, you would put people against it. So you would say, Hey, Mallory, we need to evaluate this. Data we have this content we have and we're gonna have to get a team of people to read all the content And then give us a report, um, and you might do that on a consistent, you know continual basis where you say hey We're going to do this kind of analysis on every document we put in our journal.

We're going to extract certain information. Um, or we might want to look, for example, at the market that we serve. So let's say that we're in the insurance sector. We might want to keep a close eye on all the companies in the insurance sector. How healthy are those companies? Are they growing? Are they shrinking?

Are they profitable? Are they having financial difficulties? Uh, and that would traditionally require both proprietary surveys that you might run with your members. But you might also look at publicly available data on companies that are publicly traded in your sector. But you would manually look at things like earnings calls or, uh, quarterly filings, things like that.

But you would throw people at it, is the short version of the answer to your question. And of course, that is not a scalable solution, both in terms of cost, but also in terms of time. Because it takes people a long time to do things like this. And they probably have to be pretty well informed in order to participate in those kinds of tasks.

Mallory Mejias: Now, there's obviously lots, many types of unstructured data. We talked about a few just now, being emails, community discussions, recordings, transcripts. It's kind of a tough question, Amith, but when you think of associations broadly, is there any specific type of unstructured data that you think, Oh, if you could just do something with this, you would have the key.

Amith Nagarajan: I mean, 100%. I think that the content that associations produce that's proprietary to them, whether it be, um, academic journals or professional journals or, uh, even blog posts, but certainly proceedings from conferences, video recordings from events. All of that is unstructured data that Basically, all computers have been able to do is store it for you, store it and transmit it.

So, um, and even that has been a fairly recent phenomenon for a long time. You didn't even have a lot of that digital, that content digitized. So, um, so basically, you know, we're talking about a big opportunity because the association world is full of content like that. And much of that content is proprietary.

It's not available to the public. And usually it's pretty relevant and pretty good content for the field that the association is serving. So from my point of view, there's all sorts of potential in there, um, to build new products, to certainly make existing processes faster and more accurate, but to create new products and services that generate new streams of revenue for associations that are also hard to compete with, um, for organizations outside of your association.

Mallory Mejias: And then last question here, when you are speaking to organizations or leaders about unstructured data, are there any common misconceptions that you hear over and over that you want to right on this podcast?

Amith Nagarajan: Well, I think a lot of people just assume that unstructured data is still kind of in this domain of the unreachable, you know, computers have historically honestly been pretty dumb. Um, they've just gotten better at doing dumb things, meaning that you've had to tell them everything that they needed to do.

And then they do that one task over and over and over again. They just have been able to do it faster and cheaper. Um, now we have a fundamentally new capability, which is that the computers don't have to be taught every single little thing, and they're able to learn in many respects on their own. And that yields all of these unprecedented capabilities.

So a lot of people don't realize that AI can, in fact, help you. automatically extract insights from unstructured data of all modalities. Uh, and that's part of what we're obviously trying to address by, you know, sharing this type of content. Um, but to me, that's the big issue is that there's a little bit of disbelief out there that you can indeed do this accurately, cost effectively and at scale.

Mallory Mejias: And yeah, that's a perfect segue for the next portion of our podcast, where we're actually going to show you a video of this in action. So you can see just how easy it is. As Amith mentioned, historically, this has been a people job. You had to have people read and watch and listen to all of your pieces of content to extract relevant information.

And you can imagine that sounds incredibly time consuming. Like the process was probably incomplete and of course, subject to human error as well. So we know AI right now is making it possible to not only analyze unstructured data more quickly, but turn it into structured insights. It can categorize large volumes of content, summarize those key insights, and even assign specific values to qualitative data.

For associations, this means extracting valuable information from your unstructured data sources, which makes it easier for you to interpret and act on those insights. So to show you what we mean here, we want to play a demo video. So you can see this in action, essentially AI helping to pull structured insights out of unstructured data.

If you want the full effect, I recommend that you check out the video version of this podcast episode on YouTube. Otherwise we do have a narration of the demo as well for our audio only listeners. And we'll play that now.

Amith, can you talk about the tool that you used in this demo and why you selected that tool?

Amith Nagarajan: So we used, uh, something very simple, but often unknown to most users. Even those who are familiar with a variety of AI tools, we use something called the playground, which is an open AI tool that allows you to basically play around with models in a far more detailed way than the consumer chat GPT product allows you to do.

And in the video, if you see that there, we really zoomed in a very narrow part of what the playground can do, but you can choose the specific model that you want to work with. You can set something called a system prompt, which has a profound impact on the way AI models behave. And then you can provide a whole series of different messages so you can chain them together in interesting ways.

And that also affects the way the model works. There are other settings like temperature and, uh, things that we won't get into, but there are more controls basically. So, uh, instead of being, you know, a very simplistic, uh, just type in a prompt and hit enter, there's, you know, a handful of controls. And the reason we used it for this demo is because it's a great place to show.

How a system could actually scale, um, this kind of concept because you wouldn't do this one document at a time. The demo showed, you know, a couple of earnings call transcripts for one company that were manually loaded up into the playground and then questions were manually asked. The idea behind it is that anything that can be done in the playground can also be done programmatically, meaning that a software developer can write code.

To do those same steps through the open AI API, as well as with lots of other models, we used open AI because their playground is probably the best playground type tool in the market. There are others. The grok cloud has a good playground. So does anthropics, a cloud environment. Most of the major developers have some kind of playground type environment.

Open AI has just been added a little bit longer and they have a more robust. Playground tools. That's why we used it for the demo, but you can use any language model. Um, I should say any, uh, significant language model, the smallest language models are not necessarily the best at all the tasks we talk about.

Uh, but even something like a llama 3. 1. Um, middle sized model, like the 70 billion parameter model would be perfectly fine at doing a lot of the things that we demonstrated. Um, so that's the basic idea is the playground is just a way of simulating what you might actually go and build.

Mallory Mejias: In your regular day to day, are you opting to use the Playground or just the normal version of ChatGPT? Hm.

Amith Nagarajan: I'm a normal consumer most of the time. So I'm using the chat GPT app on my phone. Cause I'm always moving around. Um, I use, you know, all the mainstream, you know, type tools as a consumer. If I want to do something where I'm thinking through. Hey, there's a new way we could do something with a new model.

Like, you know, open AI just released the O1 models and the O1 or the O1 preview. I should say in the O1 mini, uh, which is the new name for strawberry, which we talked about recently on this podcast is now called O1 preview and O1 mini, uh, which are an interesting topic. Uh, we can touch on a little bit today, but, uh, that particular model series is available on our website.

to some consumers by the A. P. I. And so if you want to be able to more rigidly test models with the same system prompt and sequence of user prompts, it's a great place to do that. Um, I get into that from time to time. You know, my typical workday, uh, is is not really predictable, which I kind of like. Um, but it's I do use that tool regularly to try to simulate.

Hey, this is what it would look like if we did this programmatically. And then I'll go talk to, you know, Different development teams across different companies and say, Hey, what do you think about this idea? Um, so it's a good prototyping tool, I guess, is the best way I describe it.

Mallory Mejias: That's helpful. Talking about the demo, you asked AI to evaluate the CEO's optimism on a scale of one to five. I'm sure we have some listeners that are hearing that or viewing it and wondering how exactly does AI determine something as subjective as optimism? Um, so I wanted you to touch on that and then to maybe expand on, on what practical value would that finding optimism on a, an earnings call have for associations.

Amith Nagarajan: Sure. Well, let me first start off with how it would handle something a little bit less, uh, subjective, which is I asked it to extract, uh, the names and titles of the people. That were in the call, and that is a little bit less of a Hey, how? How optimistic was Satya Nadella versus his CFO on that? It was able to knock that out, and almost any model would do that very well.

So that requires a little bit less of a leap of faith in the sense that it's looking for something very specific. In the other example though, of like optimism level in general, like how optimistic is this person about their company's future? How optimistic is this person about the economy? Broadly, how optimistic, in the case of the example, uh, was the CEO about AI specifically for their company.

So we're asking for a specific person. So in the transcript of the earning call, we had the AI has to narrow it down to just that person's commentary. We're asking for a particular type of optimism around AI. So it has to be smart enough to know when, when that person's talking about AI. But fundamentally, your question is an excellent one, because, you know, when you think about optimism, how would you determine if Satya Nadella is optimistic about AI, what would you do?

How would you go about doing that? Mallory, if I gave you the transcript and asked you to do that task?

Mallory Mejias: Well, if you gave me a transcript, that's a whole nother issue. I'm thinking audio, easy, right? I would listen to tone. I would listen to all, all the other little details there. With a transcript, it would be word choice. Um, maybe how wordy someone was, if they were elaborating a lot, or if they were short and concise and to the point.

The kinds of questions they were asking. I guess that.

Amith Nagarajan: Yeah, there's the substance of the thing, which is, what did they say? Did he actually say, I am very optimistic about artificial intelligence? Probably not, right? And so, lesser AIs in the past would have been basically thrown off by that. You know, old school, classical, natural language processing, or NLP, would have looked for like, keyword counts.

So, if Nadella never said AI, but he's talking about like, You know, intelligent computers, right? And it's not a simple synonym. Those kinds of systems would have been thrown off. So it's much, much more than that. It's essentially, uh, effect, effectively a facsimile of a, how a human might actually think about that task, right?

Where it's the AI is looking at the full corpus of content it's been given, which is the transcript of the earnings call. Agree with you completely, by the way, the loss of tone. Uh, really takes away a lot of the information, right? It's a perfect example of what we were just talking about. So I'd much rather feed the actual audio files to the AI when AI model, AI models tend to be multi modal.

So certainly with chat GPT, you can do that. You can feed in the audio file and you can ask it to listen to the transcript and then it's gonna be much, much better because there's more information there. But ultimately, it's making a judgment call. That's what it's doing. And the way AI makes judgment calls is through probability distributions.

So, in fact, you can even see what the token probabilities are for particular prompts. There's this thing called log probs that you can look at as a developer Most users would never encounter this and you can see, okay, well, the five was like the highest probability versus the four being the lesser probability.

So it's picking essentially the most likely token, right? We've talked about that a lot. It's still, it's still next token prediction. Um, and so ultimately that training data, this massive corpus of content that these models are trained on. Basically all of human knowledge, basically, you know, is teaching the model, how to think about what does optimism mean and how do I determine if this person's optimistic or not?

Um, so it's, it sounds really kind of weird. Um, but that's, what's exciting about it because normally, um, it would take humans. And their time and their judgment, like to read an earnings call transcript, probably even if you're a speed reader, it's probably 30 minutes to an hour. And then for every time I ask you a question about that particular transcript, and I might ask you all the questions up front and say, Hey, Mallory, can you read the last 20 quarters of Microsoft's earnings call transcripts and give me the answers to these 10 questions for all the last 20 transcripts that you could go do that, it would just take you a really long time.

And if after you got done, go, Oh, geez, I'm so sorry, Mallory. I forgot to ask you to answer these other two questions. You kind of have to redo a lot of the work, right? Because you didn't have that in your context when that, that, that extra, the extra questions. Um, so that's the power of this now that that's the technical answer.

Now you also asked, why does it matter? Right? What's the value of, in this particular example, earnings call transcripts. We like this example because it's public data. And most people have some general understanding that, you know, publicly traded companies report their results on a quarterly basis. And that data is public.

Um, and for associations, associations are all in industry or a sector. Some of them span multiple sectors, but, um, being able to instantly or close to instantly. Ask any arbitrary question across the entire corpus of content for all the earnings calls in your sector could be interesting. So let's go back to that insurance example.

Um, let's say that I am an association somehow involved in the insurance world. And I'm just curious how insurance companies feel about AI. Well, I could run every earnings call transcripts for all of the insurance companies and everyone that's kind of in a related field. Through that type of tool and get a pretty immediate understanding of where people are at, and that can be valuable in terms of publishing research reports, for example, um, or possibly even calibrating the kinds of products and services that I might bring to market.

Um, depending on how interested my sector seems to be in certain topics. Um, I might be able to use that to gain insight, to pick better topics for my annual conference and for my publication schedule and so forth.

Mallory Mejias: With the insurance company example you gave, I want to bring this back down to ground, down to the ground a little bit. If you had to estimate, and you might not be able to give a good estimate of this, this technology is available right now. Our listeners could go out and run that exact experiment right now.

What would you say would be the timeline and cost, roughly, if someone wanted to start that project today?

Amith Nagarajan: You're referring to doing like one document at a time manually, or are

Mallory Mejias: No, I mean like, automating, earnings calls, exactly what you just said. I want to do that right now. What does that look like?

Amith Nagarajan: Well, you could certainly go build something like that by hiring a programmer or maybe asking an AI to write the program for you. And it would probably get pretty far with it. Um, and then go and do this thing. And it would, it would take you probably a matter of a few weeks and probably not a massive amount of money, but, um, that requires skills or dollars.

Um, There is a tool that we have available that we didn't talk about in the demo, uh, that particular demo, but, um, it's called Member Junction, which is an open source common data platform that we publish, and it's a totally free tool for the nonprofit community, and Member Junction actually has functionality built into it.

Which allows you to do exactly what we're talking about. It allows you to essentially point member junction at one or more sources of content. You can say, Hey, the source of content is a website or is a cloud storage folder or whatever. There's lots of options for content sources. And then you can specify the questions that you want to ask, uh, of each of the documents that exist in that content source.

Uh, so it allows you to essentially It's just structure, like all the prompts, like we put them in one by one in that demo, but you can basically put them into member junction in what's called the content type. And then member junction will automatically go and do this against all of the documents that you have in whatever location you ask it to process.

And it'll keep doing it forever. So you can point it at a website and say, Hey, do this for my website. And it'll automatically process all the content on your website. As soon as new posts appear on your website, it'll automatically suck those in and keep processing them for you. If you add new questions, it'll go back and reprocess the old documents so that you have a consistent.

Set of structured insights across both new and old pieces of content. Uh, so member junction does this quite nicely and associations can, and third party vendors, anybody can download member junction and start using it right away for that. But you can also build these solutions yourself. It's not intended to be like an ad for MJ and of course, member junctions free anyway.

So the point is, is that there are ways to do this. That is one way that is super easy and inexpensive.

Mallory Mejias: Using member junction as the example, is this something that you set and forget? So you kind of do the initial setup and then just see the insights keep rolling in. Do you need to keep a close eye on the quality of those insights and then kind of work to, to fine tune things? What does that look like?

Amith Nagarajan: You know, it's, it's a, it's an iterative process in that you would probably, let's say I wanted to do this at the earnings calls. First of all, where, what's the content source. Am I going to subscribe to a paid source of content and then pull that data in? Or do I want to like point at specific company websites?

Do I want to try to pull data from the sec? There's lots of ways to pull that, but then the questions that you want to ask from time to time, you're going to change them, right? Um, the, this is where we're going from scarcity to abundance. We like to say a lot where, you know, Up until now, the ability to ask and have answered.

These kinds of questions against vast arrays of structured unstructured data has been very expensive and therefore it's been scarce, right? You haven't been able to do what I'm describing. Um, but for a very small number of questions asked against a small number of documents, and even that costs you a lot of money, but we're moving to a model of abundance where you can ask any number of questions at any time against any number of documents and basically get the answers for near free.

Um, so that is going to cause people to think. differently, but you're gonna have to think more creatively because you haven't had the ability to ask a whole lot of questions all of a sudden you do. So it is going to be an iterative process, the actual mechanical, uh, process of member junction doing what I described, you can set and forget, but when you, what you want to do is go back and look at what's extracting from the documents.

So. Imagine it's like essentially dropping in the answers to all those questions into the database for you. Once it's in a database, you can write reports against it. You can feed that into all sorts of other things. So you do definitely want to inspect the outcome, but I think the most important thing is figuring out what you want to ask.

And that by itself is something you have to learn how to do.

Mallory Mejias: In terms of AI analysis of unstructured data, Amith, I've heard you talk about the example of how this can apply to medical research. Can you kind of talk a little bit about that? It's much more complex, I would imagine, than what we're talking about here with transcript calls, but maybe not.

Amith Nagarajan: It's more complex in a way in that the subject matter, I think, would tend to be more, uh, maybe not even more complex, but more domain specific. Um, but in a way, a research paper is actually simpler because it's more predictably structured. Um, whereas an earnings call is just people talking, and they typically have some planned comments.

But then what makes earnings calls interesting is, Analysts can get on the phone on this conference call and ask whatever questions they want, right? And typically the executives from the company will indulge quite a few of those questions and answer them And you know, these people tend to have a lot of pr training So not a lot slips out but every once in a while you get some interesting comments that come from people outside of their prepared remarks so um It's interesting because that aspect of how unstructured it is makes it somewhat complex for the AI to analyze.

Um, now research papers coming back to your question across any domain, right? There's tons of research happening in the world in all sorts of fields. And you know, you can get access to the world's academic publications, a number of different ways. There's Google scholar, there's Arvix, there's other sites where you can download these things for free and you can search, but what if you wanted to say, Hey, listen, I want to monitor.

all of the research happening in, in my field and let's say a few adjacent fields. Um, and what I want to do is I want to automatically pull in all the papers as they're published and I want to be able to answer a number of questions about these papers. So if I'm in cancer research, maybe I want to know what particular types of cancer is this paper dealing with.

Maybe I want to know what type of research is it. Is it looking for Um, for example, is it looking for early detection type of techniques? Is it about curing the disease? What, what is the paper about substantively? And, um, there might be several classifications for that, and I might want to do that across every single paper that's in my field and perhaps in some adjacent spaces, which is a massive task, right?

Um, but associations that are in particular narrow domains, I think would do well to find a way to pay more attention to what's happening in their fields and then do things with that. I mean, what, what can you do? Well, first of all, you can inform your members about it. More effectively, if you have this newfound superpower to keep up.

Right. Because the volume of research that's happening in a lot of fields is so overwhelming that even the most in depth practitioners in those fields, maybe consume five or 10 percent of the, of the work that's out there. That's probably a massive overestimate, um, because there's so much volume of content.

So I think it allows an association to do a better job of curating, do a better job of then, you know, feeding their audiences, their members, and perhaps others with, uh, content. more timely insights. Um, and then, you know, the other thing that you could potentially do talking about putting layers on top of, uh, what you might do with the basics.

What if you had a really smart capability on your website where you allowed any of your member, uh, folks to ask questions of any type across. The entire corpus of content, your own journals and all these other things and to get research reports brought back, right? It's like a report being prepared by, um, someone who's doing the work manually, but done on a completely automated basis.

That would be a service. I think a lot of associations could monetize. In a pretty meaningful way.

Mallory Mejias: then especially using that proprietary data, it's really a service that no other company or organization could provide to the world, you know?

Amith Nagarajan: Yeah. You know, a lot of people in the world of, um, academic publishing, the actual number of pieces of content that make it through a peer reviewed journal and get published is very small. The number of pieces of content that are actually published by people is massive, right? Because those are like anyone can say, Hey, I'm publishing this paper.

There's like no constraint on that. I can go publish a paper on cancer research tomorrow. I have no credentials and that's, I don't want to read it, but, and I wouldn't know what I'm talking about, but like nothing stops me from doing that. Right. So, um, that's kind of like this massively wide array. And so therefore you get a lot of stuff that's not really great science, but, uh, and then in like, Say you take something like a journal like Nature, which is one of the most preeminent, you know, publications to get a paper published.

There is a really big deal on that takes a lot of time. Credibility that peer review process is extremely rigorous as it should be, because once something is published in one of those journals, it's considered essentially be like, Hey, we've we've shown that this experiment works, that this thing's Really a thing.

Um, whereas, you know, this massive volume of content that precedes that type of, that level of achievement doesn't mean it's not interesting, especially if it's from people who aren't well known a lot of times it's like anything else in life. When humans are involved in something, you know, if you take like, you know, the next paper that Jennifer Doudna writes, she's the person who was one of the main contributors to creating CRISPR.

Um, if she publishes a paper, every journal is going to look at it immediately because she's Jennifer Doudna. Um, but if some person who's never been published anywhere has this It's an unbelievably brilliant thing, and they want to publish it, it may or may not ever get noticed by anyone, right? So um, all I'm saying is, is that this gives us the scale of resourcing where we can do things that previously would be really out of reach.

Mallory Mejias: For the sake of these examples, on this episode of the podcast, we've focused mostly on text with the earnings calls, transcripts, and then even with the research that we just mentioned. Can you talk a little bit about what it would look like to use audio or video, um, or if you recommend sticking with text for now and then maybe trying that out later?

Amith Nagarajan: I would start with text for now for two reasons. One is it's easy to test out for anybody. Um, and it's really inexpensive. Um, multi modalities, particularly with video is going to cost you a lot more and I'd wait for the curve to keep working in your favor. Um, I'm not saying wait five years. I'm saying wait six to 12 months.

Maybe, um, for example, Okay. Uh, just last week, um, the Mistral folks in France released Pixtral, which is their, um, video or sorry, image classification model. They have a video version coming and, uh, that's open source and that's going to be able to do a lot of extraction of insights from images. So you can pass along an image of a picture of your house and say, tell me what this is and what, you know, what part of the country might this house be and all sorts of things like that.

And it will give you pretty good answers. Um, so that kind of extraction of insight from, uh, Other modalities is super interesting. Um, but I'd suggest people, you know, kind of crawl before they walk and walk before they run and text, I think is kind of an easier thing to play with.

Mallory Mejias: a report. A

Amith Nagarajan: The models are also getting a lot better too.

So, uh, as these models get smarter and smarter, you're more likely to get interesting insights. One of the things that could happen in an experiment with this is you just do something really simplistic. You pass in a chunk of data and you ask a question and you get something bad back and you go, Oh, well, clearly the model's not smart enough to do this.

I'm not going to bother with it. And that could be true. It could be that your use case is above the capabilities of the model. Or it could be that you didn't approach the prompting strategy in a way that is correct for your use case. Or maybe you're using a model that isn't quite at the forefront of the type of thing you're trying to do.

Um, I would recommend trying several different kinds of prompts. Um, in this experiment, I would also recommend trying two or three different models, like certainly try open AI, maybe try out. Um, the Llama 3. 1 models on the Grok cloud, um, or try Anthropic's Claude, which is an amazing model as well. So I would try multiple different models.

The other thing you can do is you can go to a Frontier model, you could even go to like, you know, GPT 01 Preview, and say, hey, this is what I'm trying to do. This is the type of content I'm working with. I want you to help me. Create the best prompting strategy to use with a lesser model. And so this super high end model, like a one preview might be able to do a really good job at doing the prompt engineering for you.

So I wouldn't give up too easily, but it's, it's easy to give up, right? Because you go, you know, especially people coming in going, I don't think the AI is smart enough to really pick up on all the nuance in my content. And that may be true, but it's also possible that you just need to do some iteration and try a bunch of different things.

That's oftentimes how a lot of stuff happens in AI engineering, is you're literally throwing stuff against the wall to see what sticks.

Mallory Mejias: Tiny tangent, since you mentioned it, have you tested out O1 and what do you think?

Amith Nagarajan: I have. I haven't tested it extensively. Um, so there's two new models that were released by OpenAI last week, O1 Preview and O1 Mini. Oh, one mini is not a preview. It's just a much, much smaller version of the latest cut of Oh, one Oh, one preview is labeled that way because it is not intended to be like a long term model people are going to use, there will be an Oh, one at some point, probably in the next month or two, this is just a preview.

Um, and what I found is that the chain of thought reasoning that the model does itself while it's, you know, quote unquote thinking, um, is really helpful for more complex questions. And it's interesting to watch. You know, how it's taking that additional time as we talked about in the last pod, it's using that time, it's using, um, this, this process, which is like chain of thought prompting.

It's similar to that. It's just been baked into the model. Um, and so it's, it is coming up with smarter answers to more complex problems, for example, with code generation. Oh, one is amazing at that. It's way better than GPT four. Oh, um, Oh, one has been shown to be able to solve a lot of problems at the PhD level in domains across, you know, a wide range of, of different disciplines, which is really exciting.

GPT four. Oh, is not at that level. Um, and you know, open AI used a process where they actually hired a bunch of PhDs and spent a ton of money doing reinforcement learning with, uh, And that's why they're able to claim. And there's, you know, quantitative reasons they're able to show like it's actually better than a lot of PhDs and a lot of these fields.

So it does represent an interesting breakthrough. Uh, I haven't gone super deep in it, but, um, there's a lot of good information out there about it, uh, at a technical level. If you're interested, there's a great podcast called Latent Space, which is very technical. It's for AI engineers that had a great pod that just dropped, I think two days ago, that is with an open AI person, as well as the hosts of that podcast.

They talk about O1 a fair bit. Um, and you know, I think there's just a lot of blogs you could read up on it that are, that do quite a good job, but I think our, our earlier podcast, actually Mallory, I think we did a pretty good job kind of explaining how then StrawberryNow01, uh, worked and it's pretty much as, you know, as was advertised.

Mallory Mejias: and we really structured that whole episode or that, at least that topic on, on reports and leaks, but it seems like most of those were pretty accurate. I will say from a not technical perspective, I like the new strawberry better. I wish they had gone with that instead of 01, I'm sure they have their reasoning, but I myself have tested it out just like one to two prompts, and it is really interesting to see the model, quote unquote, think, uh, I asked it about event planning, marketing, and it was like mapping out the plan, thinking through the timeline, and it kind of takes maybe 10 to 15 seconds before it generates a response, so I recommend that you all check it out, I'm excited to see what the big release looks like.

Amith Nagarajan: I suspect that in the coming weeks, certainly months, we will see similar types of things announced by Anthropic certainly, and probably all the other major players in the space. Um, and it'll be interesting to see, like, will there be a GPT five? You know what, what they've said is that, Oh, one represents a reset of the counter because they're now in kind of a new era of models.

So they're not. It doesn't sound like GPT 5 will ever be a thing, but there might be an O 1 and then an O 2 and an O 3, and the O stands for OpenAI. Um, GPT was, you know, I think a lot of people have become, uh, accustomed to GPT, chat GPT, but it's a very technical term, and doesn't really mean anything to anyone other than the fact that it's an acronym that they know means AI stuff.

Uh, but over time, GPT as the fundamental technology is also going to shift. So it's, it makes sense that they got rid of that because there will be other model architectures post transformer that will supersede GPTs. Anyway, I digress, but, uh, I think it's very much worthwhile as you put it for you to go experiment with it and, uh, try the things that you haven't been able to make work in a pre 01 world.

Mallory Mejias: I saw Ethan Mollick had a great post about that on LinkedIn, like you should have this list of challenges ready to go, run them through that model. If you don't have that list of challenges, what are you doing? And I thought that was really great post.

Amith Nagarajan: Yep,

Mallory Mejias: Well, we have been kind of dancing around this topic all episode in terms of why does this matter?

Why does this whole topic of the episode matter and what can it do for associations specifically? So I think. We both believe, and I think it's the case, that Analyzing unstructured data can fundamentally change how you operate and create value, and that allows you to tap into new opportunities. So we've mentioned this as well, but analyzing public unstructured data, for example, like research papers or industry trends and a mix of your proprietary unstructured data, Like member communications or even event feedback, you can uncover patterns and insights that were previously hidden.

And sure, this allows you to stay ahead of trends, to be more responsive to member needs, but also to identify gaps in the market that could turn into new revenue streams. AI might reveal trends in member behavior that suggest a demand for a new type of service or product that your association can offer.

And by understanding these trends faster than before, yes, your members will be happy, but you'll also be positioning your association as a leader in your field. And so the big question here is how exactly do you take advantage of this? My immediate thought, Amith, with the demo example is it's amazing extracting these insights.

Is great But kind of the follow up naturally is well, what what can you do with these? Let's say you scale this you automate it You have all these insights all the insights you could ever want. Well, what then what do you do with them? So, uh, I

Amith Nagarajan: Well, you know, with any new capability, it takes time to figure out how to use the thing, right? So electricity took a long time to make its way and all the different applications that are out there. And I think the same thing is true for AI, broadly speaking, and for this particular capability of AI.

Um, The way I'd put it is this. I would look for some pain first before I looked for opportunity. Depends on the organization. Um, so let me give you a very specific example. Many organizations run events and for those events, they will typically issue some kind of a call for speakers, call for papers, that type of a thing, depending on the style of event they have, they call it different things.

But the basic idea is they're opening it up for people to submit proposals, to speak at their event. And some events are very large scale where they might actually have many, many hundreds of sessions. Um, and in order to fill the hundreds of sessions, they might receive thousands of proposals. For talks, different people submitting these things typically over a period of a few months.

And so this abstract submission is typically what it's called this abstract submission process. This abstract submission process is one that, um, people also have for journals, uh, for a lot of different things. It's essentially like an application process. Um, well, it turns out that the way people deal with this right now is a combination of staff. As well as volunteers read the proposals, right?

Makes sense if you want to speak at digital now you submit a proposal and ultimately Mallory and others will have to read that proposal and determine do you qualify on certain basic criteria that maybe after that first pass you'll look at it and say, okay, well, let's get some input from different people in terms of the speaker, the particular topic.

And so on. And ultimately you make decisions based on not only the quality of the submission, but also the mix of topics that you want to include in an event, as well as, you know, other factors. So where AI can help in this particular case is at a minimum doing that first pass. So let's just say that you had a storage location in Azure or AWS or Dropbox, wherever, where you just dropped off these files.

So you get all these submissions coming in, their Word documents, their PDFs, whatever, right? And so you're getting hundreds or maybe thousands of these documents. And rather than going through and reading all of them, For that first pass of like, does the, does the submission check off the basic check boxes?

Like, does it have the information we asked for? Is it substantial enough? You know, sometimes people submit a two sentence description and you ask them for 500 words or whatever the case may be. Is it on a topic that is related to the program that you are building for the event? That's a little bit more nuanced, right?

Um, so what if we could automatically go through every one of those documents? And answer those kinds of key questions is this, does this document, like there's a rubric eventually, right? Where you say, Hey, these are the types of things we're looking for. And we put those into questions. We ask the AI and the AI app basically answers those questions automatically for every single abstract that's being submitted in real time.

So the abstracts drops in and it's close to real time. You can set it to be every few minutes. It'll do this. And then you can just look at a dashboard or a screen that shows you all the submissions and the extracted. structured values, meaning, um, you might have a rating scale of completeness, but you might also have a bunch of checkboxes says, did the author include contact information?

Is it at least, you know, 500 words in length? Did the author provide citations to other publications if that was required and on and on and on right In some cases you might have scenarios were like for digital now, for example We allow non association staff to submit but they have to have an association staff person co present with them We do that for a variety of reasons But we could ask the AI to determine whether or not there was an association staff person Co presenter along with the non association person.

And it'll be able to figure that kind of stuff out. So if you're getting this massive volume of documents coming in, if you can narrow it down to the documents that actually fulfill your criteria, you've saved a lot of time, and then you can go on to the more substantive work of actually evaluating the program.

Now you can, of course, get the AI to help you with those steps too, but just take that most basic, simple step. You'll probably save. A lot of time. And I think that's a great place to start. There's lots of pain points like that. You might have a similar scenario with volunteer applications where people are saying, Hey, I want to volunteer for this committee and you require some, maybe in a video submission saying, why do you want to be part of this committee?

And you want an AI to quickly look at the video and extract several pieces of information to make sure that the video is worth your time to watch. Um, now if you put a system around this, right, where you have, uh, some kind of tooling and you. have this process. Maybe you go back to people who have submitted something in an incomplete way or something that doesn't quite match and give them feedback.

Because as, uh, as an organization that operates off of volunteer provided content, you don't want to choke off that pipeline. You want to encourage people to submit. So wouldn't it be great if you could give very rapid feedback to the people who are submitting and say, Hey, your proposal didn't quite meet the mark.

Here's where it needs to improve. That's really useful because most of the time the experience for someone submitting to a conference is you hear nothing, nothing, nothing, nothing, and then you either get a email saying we're pleased to accept or thanks for your submission. You're not in. But usually there's no detail behind it, right?

So there's a lot you can do. That's just one example. The other examples we were talking about with cancer research or research papers in general and earnings calls are more like opportunistic saying, what could you do if you had this tooling, you could build new services and new products. And that gets me more excited than anything else, but I would start, if I was an association staff person, I would start with my pain.

I would look for where I was dealing with a lot of unstructured data in my day to day job, and then look to solve a problem or two that's fairly simple with this kind of tool.

Mallory Mejias: We have talked about real time insights or nearly real time insights or historical insights in the sense of, of looking at the past six months, let's say of, uh, calls from our call center. Can you talk a little bit about maybe phase two of this and, and if AI, well, I know it's also capable of creating predictive insights, but kind of if that's a separate process or if that's something that goes hand in hand with what you're talking about now.

Yeah.

Amith Nagarajan: Well one of the things to think about is that we have a lot of tooling for structured data that is not capable of operating against unstructured data. So you have report writing tools like a Tableau or a Microsoft power BI, and you have a lot of people also who know how to use those tools really well, right?

Whether they're vendors or your own staff. Well, once you have structured data. You can do a lot of things with your classical tooling around structured data, which can be super interesting. That can perhaps be descriptive analytics, where you're saying, Hey, tell me about the situation, right? Give me charts and graphs and tables that summarize this, the structured data that I've extracted from the unstructured.

Um, that's useful. Uh, you can also look for prescriptive analytics, where you're looking to solve specific problems. And there's different approaches to that. And you can look to do predictive work, where you say, Well, let's take. The data set we have, let's just say earnings calls transcripts, and let's say we had, I don't know, 30 or 40 questions that we're asking across all of the earnings calls transcripts for the last 10 years, let's just say, like AI optimism being one, and there's, let's say there's 30 or 40 other variables we're asking for, and we go and get that for an entire industry, say insurance or consumer packaged goods or financial services or whatever the sector is.

Or maybe even more broad than that, and we have a database that has all these additional features, right? These additional structured features. And then we also pull in some additional interesting public data, which is already structured, things like the volume of that security by day, by month, by week, the price, um, and other factors, right?

What was the market cap of the company, other things that would be interesting to look at. And then we train a classical machine learning model. To predict where the stock price might go based upon the variables in the unstructured data Now i'm not suggesting that is a good investment idea just to be clear But I would not be surprised in the least If hedge funds were formed around this type of concept, right?

Looking for additional things to trade, train quantitative strategies against. Now in the association world, most people aren't going to do what I just said. The reason I use that example is again, I think a lot of people are familiar with the stock market at a very basic level, at least, and these concepts can apply to predicting things that matter a lot in your sector, maybe in your sector the price of a certain commodity matters a lot to everyone in your space, or maybe being able to predict employment levels is really important. There's a lot of things that I think are factors that associations can do a much better job with from a predictive viewpoint, using the structured insights that come from this process.

Mallory Mejias: That's really helpful. You mentioned in your example, a classical machine learning model. Can you clarify how that's different from what else we've been talking about?

Amith Nagarajan: Yeah. So we spent all of our time talking about what a lot of people call foundation models, which are pre trained large scale models, even what we call small models, like small language models are fairly large in classical terms. Classical machine learning being, uh, basically where you don't have. This pre training and this foundational aspect or multi purpose use.

It's not for that. It's for solving one specific problem. Um, a good example of that would be like, uh, classical image classifiers where you would say, Hey, I'm gonna train on 10, million images. And I want to be able to say, Oh, this image contains a cat. This image contains a horse. Um, so models like that, but that's actually even, even that type of model is fairly general purpose.

Um, you might have another model that says I want to have a prediction on how likely it is for my members to renew. So what I do is I create a machine learning model, train on my, all my historical data on a variety of different variables on my members and. So, for example, I might have things like, you know, how long has the member been with us?

I might have their name. I might have their gender. I might have their location. I might have the total aggregate amount they've spent with us life to date and a variety of other structured variables, right? Um, so I have all of that and I can train a machine learning model to predict. with a different confidence levels, how likely it is for Mallory or Amith or somebody else to renew or not renew.

And there's a lot of value in that. The problem is, is that the structured data is this narrow, you know, pinhole of visibility into the true world of data. So what if you could also use every email that they've sent you, every comment they've ever made on your online community, and ask questions about each member across all of that unstructured communication At the member level, then populate your AMS or your other system with additional attributes that are telling you in real time, something like member happiness level, right?

Self reported NPS, you know, it's better than nothing, but people are pretty bad at self reporting how they feel about a lot of things. So if you could pick up on this digital exhaust, as we like to call it, which is this all this other stuff in your ecosystem and say, Hey, I'm going to add enrichment to my structure database.

So now I have 2345 additional fields in my member record. And then I train a machine learning model on that. And I start using, you know, their happiness level, right? Um, There's a lot of other things like that you could use and then train an ML model to predict renew, not renew. So that's what classical ML is, is in my definition of it.

It's, it's a purpose built model that does only one prediction type. Um, generative AI, by the way, is predictive. It's just predicting the next word or predicting the next pixel. Um, so all machine learning and all AI is about prediction. It's just that it's kind of the scale of it and how wide the use cases are.

Mallory Mejias: That makes sense. Okay, so for our listeners who are still with us, who have decided they want to go out and use AI to pull structured insights out of their unstructured data, would you say, or this is what I would recommend, and you tell me if I'm wrong, Amith, I would recommend that they select one unstructured data type based on a pain point that they're experiencing, not just any random data type, but a pain that they're experiencing.

Select a data type. Decide on the fields or attributes that they want to pull out of that, or essentially decide on the structured insights that they want to pull out of that, and then you fill in the rest.

Amith Nagarajan: Yeah. I mean, it's the questions that they'd love to know the answer to, right? It's like, what are you curious about? And you think about like, well, in all of the emails and all the online community posts and all the social listening we can do, could we answer how happy is Mallory with her membership at XYZ association?

Maybe not, but there's actually a good chance that you probably could. And so, and maybe you can do it definitely at the population level or at a segment level and say, Hey, across all the people who fit into this category of early career professionals or late career professionals or whatever, you know, so you could say, give me like, take all of the emails that are from people in those categories.

And then train against all of those, right? So maybe it's not at the individual level. So there's all those, it's the questions you would love to have answered as a marketer or as a customer service person, or as a CEO, there's tons of questions that you'd love to ask. If you had this magical machine that you could ask any kind of question like that and get reasonably good data back, you'd probably be pretty into that.

So that's, that's everything you just said. I agree with a hundred percent. And that's really the variables that you refer to are just the questions that you'd love to know the answer to.

Mallory Mejias: I love it, and I love the analogy you gave at the top of the episode with looking through the little door hole, so if you feel like that's you listening to this episode or viewing this episode and you want to expand that view just a little bit more, hopefully you've left this episode with a few tips and tricks.

We will see you all next week after Utah.

View full post