Securing the AI Frontier: Understanding AI Attack Vectors TY Page

Written by TRACY WALKER | Aug 8, 2025 9:44:50 AM

Transcript

00:07
Thanks Chris. Assuming let's share my screen. And I may have changed the title just a touch. But welcome everybody. Tracy Walker here to talk a little bit about AI. Not a very popular topic right now at all. So sorry if this might be a little boring sarcasm intended there.

00:33
First, this is kind of what we're going to cover today in this, kind of talk a little bit about what everybody is already seeing. So where AI can help APSEC mitigating your own risk with AI and LLMs, and some practical advice here as well, depending on where you're starting. I think, Chris, do we have a poll that we can start off with? We do. I'm left right now. So B to B.

01:02
looking out for a Polish Republic on your screen right here. Yeah, it did. Thank you. So basically, and I've got three or four poll questions. This first one just kind of gives us an idea. Without really knowing what everybody's level of experience is with AI, how much you are or are not doing with AI, for a general webinar like this, kind of difficult to not cover some things that some people are probably already going to know.

01:33
So I'll await some of the poll results there. That'll give me a little bit of a clue. And then while you're answering that poll question, I'll just do a quick introduction of myself. So again, my name is Tracy Walker. I joined DefectDojo last July. So I've been with the company for, coming up on seven months, or eight months actually, if you count the month of July, the zero. But,

02:02
Yeah, so have been with the DefectDojo for a short time and have spent about 30 years plus in information technology. The first 20 years was mainly software development operations, a lot of internal IT, some management within there and executive management, etc. But and a lot of consulting. And the benefit of all of that was it gave me a lot of hands on experience in a lot of different environments.

02:31
which really kind of taught me a few things about ego and knowing what you know and what you don't know and things like that. So in the terms of AI, let's just be real clear. I am not an expert. I have used AI tools probably for about the last year for various things, presenting documentations, helping my daughter get through college, things like that.

02:59
but also writing some code and code help, co-pilot, things like that. So I have some experience using it. I have some practical experience trying to use it in situations where maybe it wasn't the best choice of a tool and then you get a lot of spin and issues that you can run into. So that's the very first thing. I'm coming into this probably at a lower level than everybody else on this call. And so I've kind of.

03:27
created this webinar with that assumption that for some folks, I may need to catch you up a little bit. Other folks, you may see some of this and be like, yeah, duh, we know this. But we'll try to get you to the end to at least give you some practical things. And if we have some time, I can even show some of the ways I've been using AI tools for real work, mainly on the open source version of DefectDojo.

03:55
Um, so what was our poll results? Do we know looks like around 65% said yes. And then 35% said no. Okay. Awesome. That's really good to know. Thank you. Um, all right. So we're going to start off talking a little bit about what everybody is seeing in a software development. I would even say for some of those who said, no, it's not being used here.

04:24
I would assume it is because that's just the way it is. And colleagues and people that I've worked with for years and years, I play in poker games, things like that, when we start talking about this, everybody is using this, whether they're actually admitting it or not. So I got some really old statistics here. These were from eight months ago. These are dated and old. Even people who are watching the news over the last week with...

04:53
DeepSeq and some new LLMs that are now kind of emerging as this is better than that, uses less power and all these kind of things. So these are dated. This all comes from a GitHub research. These statistics have been quoted just about everywhere in lots of different publications and things like that. If you want to go and look at this particular PDF, it's available, has a lot of great statistics in it, and really kind of confirms a lot of the things that we're seeing.

05:24
Again, eight months ago, this number is absolutely low for the number of environments using it. 40% of new code is being generated by AI. This is based on their study, meaning new code, not necessarily code that's existed before, but probably closer to half of all the code now is being AI generated. 50,000 organizations at that time had adopted GitHub Copilot. This is

05:53
probably the most popular AI assistance tool for code completion, it'll finish your lines of code. I've seen users talk about, they hit tab three times and they had a PR. So this can be very, very efficient and a lot of people are using it. And then this was the one that I think was kind of the most important that in the study that this particular blog kind of goes through, and this was done with Accenture, but...

06:22
Some of these numbers may be skewed in some ways, but I still think that this has a big indicator in when 96% of developers install the IDE extension into their IDE, Visual Studio, whatever it may be, and start receiving and accepting suggestions immediately, that's pretty much everybody. As soon as they have it installed, they start using it because it's helpful, right?

06:50
So that is a big, big number. That's just the ones that are admitting that they're doing it. There's probably a lot of environments where they've got chat GPT or Claude or something on the side and kind of using it to help with tricky problems or things like that, or boilerplate code, things like that. There are two different kind of categories if you wanna think of it, of how developers are using or how development is using AI.

07:17
The extension with the copilot, this is what we're looking at here. So this is in Visual Studio, mainly for autocomplete. So direct code generation, right? A person using it to generate code, function implementation from comments. So being able to add code or kind of create a structure and then having the LLM generate code for those sections, or when you're adding, modifying code, generating test cases.

07:46
Code comments and documentation, one of the things that humans don't do as well, or consistency, or when you're under a deadline, you stop kind of commenting on your code and things like that. Well, AI is fantastic for this. And you can even take code that you're not going to have it change the code, document my code for me, identify what all these functions do in that kind of thing. Building out APIs, very, very easy and kind of direct, an easy way of kind of checking to make sure it's accurate.

08:15
And of course, refactoring and even conversion of code, that kind of thing. So direct code in generation or within a SDLC. So the visual that we have here, this is kind of that infinity, agile DevOps kind of flow, we're always just flowing, whether your sprints are quarterly, weekly, daily.

08:40
But all of these opportunities for all of these different places where we have security tests and things like that, that's what DefectDojo aggregates. But within that workflow, doing automated code review, dependency analysis, so all of these things that we're talking about here within the flow, usually you're doing some of these things anyway and maybe the tools that you're using have added some AI components, maybe they haven't, maybe there's other...

09:08
third party tools that kind of use AI. Whisper comes to mind for some reason for translating videos and that kind of thing. But there's lots of tools that you can add that can add some AI elements to this. So in the workflow integrations, we're kind of still seeing, we're not changing a lot of processes for this from what I can tell from various blogs and things like that, but definitely adding AI components to all the things that we're already doing. And that's a theme.

09:36
That's definitely a thing for implementing AI and maybe not treating it so much as this monster that we've never seen before. It's actually very similar to things we have seen before.

09:50
The impact on software development. So this also comes from that, actually this will come from that same blog. So there's some additional, what are the positive impacts of software development? Increase in productivity. This is really the quantity of code, not so much the quality of code. So just in matter of number of lines of code have been increased by 25%. Reduction in boilerplate code. So boilerplate code is repetitive code that doesn't really add any extra functionality.

10:19
but you kind of need it in there for definitions or whatever it may be. So reducing that is actually a good thing because this can take a lot of time just to manually, a lot of people will even automate building a lot of boilerplate code and templates and things like that. But reducing that is actually a positive thing. Faster code completion. So this marks basically spending half the time, right? So we're in creative, we're

10:47
doubling our speed, if you will. Not necessarily, doesn't necessarily translate one-to-one with developer productivity, because when that code completion is happening for us or the AI is building these things, sometimes then you also have to troubleshoot that code. And when you're troubleshooting code that you didn't actually write, it's a little bit more challenging, because troubleshooting means you really have to understand what it's doing, understanding what the LLM generated for you and what it's actually doing.

11:16
So sometimes that also can slow you down extra troubleshooting because you're kind of having to catch up to what is this thing trying to achieve here and think so yeah, could be twice as fast, but you may spend a lot of that time also troubleshooting, dealing with hallucinations and things like that. And this is a repeat because I want to repeat this anytime developers get exposed to this, this is a tool we will use. It's that's just the way it's going to be. If it makes me faster, if it can

11:46
automate repetitive tasks, of course we're going to use it and whether we're admitting it or not in an environment or not. And that's kind of what brings us to code quality concerns. And I lost my speaker notes here because yeah, this is coming from Adam Tornhill. He's the author of Your Code as a Crime Scene. Fantastic book.

12:15
like a crime scene. It's a really interesting way of kind of investigating what's going on here, how this thing happened. But this quote, I think, is kind of spot on. The ability to automate something also works if it's not that good. I'm gonna automate anything. If it's not super great, I'm just gonna continue to automate something that's not that good. So the majority of code that were on all of those statistics that we were looking at in those previous slides,

12:45
was for adding code, not changing code, not updating code, not necessarily refactoring code, or especially deleting code. So we're seeing more code, not necessarily less, and yeah, we're creating it faster and it can be more efficient from producing it, but if we're not also refactoring, deleting, making it more efficient, stuff like that, you got a lot more code to deal with, which is not gonna be easy for you or an LLM.

13:13
The code suggestion algorithms are incentivized toward acceptance. This is a pretty important thing to understand that it's trying to get you to use what it suggests. And so it's really kind of, I struggle with this word incentivized, prioritized, you know, how is it being rewarded from this? Well, because it was successful. If you took the suggestion, that must be a successful. The human said that did something good. You get a bonus point, right? So

13:42
Sometimes wanting to be accepted is not always the best approach, whether in life or AI. So you kind of got to be cautious. You may not be getting all of the suggestions. You may not be getting the best suggestions, but you're getting the one that the AI thinks that you're going to accept. Churn doubled, and that should be 2020 and 2024. So Churn is code that is updated frequently or code that has a lot of

14:12
Sometimes it's because it has issues. So there's always a reason why there might be a churn with code, but it's an indicator to look into understand the reason why, right? There is no single answer why you would have churn. Could be a very complex part of the application or it's the critical piece that everything kind of plugs into. So understanding why that churn is increasing and is being recorded over the last four years, if it doubled, that's definitely something that

14:41
in your specific environment that you're going to want to kind of take a look at. And this increase of copy pasted code, especially if it's generated by AI, well, it doesn't even really matter at that point. If we're using a lot of copy pasted code, that does reduce your maintainability because you end up changing a little tiny thing over here and then you lose track of all these little changes. And then you continue copy pasting from other places. You're less efficient. You're going to run into more issues.

15:09
And this also came from a get clear, you can see the link here, the pressure on code quality as a result of AI.

15:20
Alright.

15:25
So takeaways, did I? Yeah, takeaways. These are pretty easy. This is one of those duh moments. It's here whether you like it or not. There isn't any part of the SDLC that AI is not going to affect. And that should just be, you can take a look at that. Infinity map, every single tool, every single process that you're doing is probably going to be influenced or made more efficient or whatever.

15:52
via tools that are also enabled with AI. Significant productivity gains have been experienced and seen especially for junior level programmers. So sometimes using the AI tools speeds the junior folks, the less experienced folks up faster. I've experienced this myself. I'm not proficient in all languages or anything like that, but it does make you faster quick and it kind of helps you with.

16:18
syntax errors and just formatting. If you're not coding from the tips of your fingers, it can really, really speed things up. Quality control, and this is even going to prompt our next poll question. Chris, if we want to get the next one ready.

16:37
Yes. Is there a need for more human oversight when dealing with AI? I don't see the remainder of that question, but that's the question. Is there a need for more human oversight when considering AI generated code, AIG? Quality control. This is a discussion question. There may not be a right answer for this. And of course, security implications come from this.

17:06
from all sides because as we add AI artificially generated code, we're also introducing potential a lot of security risks, some that are very familiar to us, things that we're already dealing with with human-generated code. What's our result on our poll, Chris?

17:29
Looks like the vast majority are 100% so far saying yes to that question. I love it when a plan comes together. All right, so my question for those who say that yes, there needs to be more quality control for AI generated software. My question to you is, why are you not doing those things with human generated code? As I go through, for example, and if you go through a lot like the NIST, we're gonna take a look at NIST.

17:58
If you replace the word AI with human on all the security checks that everybody's saying, well, you got to do this for AI and da da da, replace the AI with human, do a mass substitute, and then reread all of the security controls. It reads just like any other security control. It doesn't seem to matter to me. Well, it does, but it doesn't. So the security controls that we have for AI probably shouldn't be any less or different than we're doing with human generated code.

18:29
We shouldn't be trusting humans any less than we're trusting AI, and we shouldn't be trusting humans necessarily anymore, because when we're talking about security, we don't really trust much, right? We kind of suspect everything. We're always proving that we're not being hacked. We're always proving that we're not under compromised or whatever, right? Good security culture is that you always assume that you're compromised, and you're constantly just trying to figure out where it is and constantly proving.

18:56
that it isn't compromised, et cetera, et cetera, right? So my challenge, and this is one of those that I don't know that there's a right answer, but if you're identifying places where you feel like you need more controls, more inspection, more quality security control around your AI-generated stuff, my first question is why are we not doing that for human-generated code? Why are we not adding those same kind of security controls, et cetera?

19:24
for humans no different than we are for AI. Let's up the game for everything, right? If we're gonna look at code, we shouldn't care necessarily where it came from. We should be testing, validating, securing, kind of the same way we've been doing for years and years or trying to. That's a great discussion. All right, where can AppSec, or where can AI help in AppSec? All right.

19:51
More statistics. So these came from various places doing research. 46% reduction in vulnerable code when using GitHub Copilot. So that did come from that next research with Accenture. And this is basically, you're kind of doing the same thing when you're doing static scans, code scans, right? You are removing vulnerable code or identifying vulnerable code. So when you're using GitHub Copilot, you're enhancing that. So...

20:19
46% reduction in code committed when using this. And also I would note the with the security focus features because I think sometimes those are variable, optional that you can turn those on or off. Increase in the vulnerability detection rate as you might expect through various scans, AI versus not. Reduction in the meantime to detect security incidents. So this is half the time.

20:47
If it takes me 15 minutes to detect an incident, well now it's maybe down to seven minutes, right? So we're decreasing the amount of time needed to detect things. And the 65% also kind of needs a little bit of explaining. If you read the cost of data breach report from IBM, this is really kind of using all of these things over here. And it's just really saying that...

21:11
For those companies that have implemented AI at various parts in their SDLC, code generation, whatever the case may be, they typically have reduced their breach costs. That's not to say that they have reduced the number of breaches, but they have been able to isolate, contain, et cetera, those breaches so that they're not as impactful. So reduced breach costs means it's a smaller breach, contained, didn't blow up the whole thing,

21:42
because you have had security layers that you've had in place to help, you know, contain these kinds of things. So these are good motivations to, to use here. Did I miss anything here? Nope, I didn't. Um, so some general, this might be a little bit of a dub, some general ways you can apply AI throughout, uh, an application security. So you're again, your process analysis. So this is one of the two, right?

22:11
you're using AI to write code and or you can use it across your SDLC. So, ability to check configurations, compliance checks, a lot of the kinds of things that you want to be doing over and over. Maybe you're spending a lot of time with from a human perspective, but now you can actually have the AI do a lot of that heavy lifting and then you can review the results of that. Red teams love AI, threat scenario generation, risk scoring.

22:39
mapping attack surfaces, data flow maps, service providers. So the benefit of having the AI from a security standpoint is a lot of times you don't actually get to get into the code, you're not able to see a lot of the things in the back end. Maybe you only have an idea of what the architecture is. Maybe there's a little bit more black box type of testing where you don't even necessarily know what's going on on the inside. AI can help you understand exactly what's going on.

23:04
in the eight side or at least understand the architecture, the components that make up that architecture and the strengths and weaknesses of those components, right? So just different ways of everything has a pro and a con or a trade off. So the threat modeling can at least give you, or at least that approach can give you an idea of different ways of attacking different components of an architecture because you now know what the, those components are and maybe what their tendencies are with those components. Code review.

23:34
That seems pretty straightforward, running code through an LLM, especially this one I love, the RBAC encode, right? Sometimes we have to put a lot of RBAC type statements throughout different functions to make sure when they get executed we're double checking that the person running that is able to do so. This is another great way of kind of making sure your code is consistent. We talked about the architecture validation. And then finally in Ops.

24:02
incident triage, all of these things here. The one that I liked the most was the documentation and the knowledge base maintenance. As these things happen, sometimes in the heat of firefighting, we don't actually go back and document the root cause and all of that kind of stuff. The big ones, yes, but some of the smaller ones, the daily things that we kind of deal with, or especially the ones that turned out to be nothing. This can also help with some of that kind of activity, documenting it.

24:32
knowledge base. And this can also be in code documentation, as I mentioned before.

24:38
So yeah, lots of ways that we can use this, right? Everybody has kind of a unique environment. I wanna double check on, are we coming up on, we're on slide 15, so mitigating your own risk. When is our next poll question, Chris? Apologies. It's on slide 20. So we had a few more slides. I can throw it up earlier though if we want to as well. Yeah. Oh, we'll wait, a couple more slides.

25:07
I'm just making first time live doing this. So let's take a step back, do a little bit bigger picture of, all right, so we have this thing, this AI thing that everybody's trying to quantify. Most exposed right now is LLMs, Claw, ChatGPT, et cetera, et cetera, right? But as tools are starting to build AI capabilities into those tools, they may be using a private LLM.

25:35
or one that's behind closed doors so you can't necessarily see the data that was used for training it and things like that. So when we think of AI, wanted to kind of touch on a SWOT analysis of this. And this was generated by three different LLMs. Just that's why there's no reference here. And I'll talk a little bit about that here in a moment because I got very similar results from completely different LLMs, mainly Claude.

26:04
and DeepSeq gave me the exact same elements, not always in the same order, might have an extra one or two, but for the most part with the exact same prompt gave me the exact same response, whether I found it very, very interesting. So just a little tidbit from my own experience. If you're using ChatGBT, DeepSeq is free. They claim a little bit faster. If you're trying to use it in the middle of the night, it'll be very slow. I wonder why.

26:34
But do some comparisons between the two. I was shocked at how similar these were. That doesn't mean it's right. That means that the data it's trained on, which was good up till last July, I think, the data that these things are trained on could be very similar. And depending on the internals of these LLMs, they could also be very similar. So I just found it really interesting that I got such similar results from the exact same prompt from two completely different LLMs.

27:02
I just found that fascinating. So test that on your own. So strengths, I think we kind of know what the strengths are, right? Weaknesses, so critical decision-making, not so good. Because again, it's emphasis to get you to accept something, not always the right thing. So that is immediate thing to be aware of. Creativity versus a zero-day attack.

27:31
I can tell you trying to use LLMs for various household things when something happens, the advice is not always really even usable. Contextual nuance, so this we kind of get into. Every environment is unique. Context is extremely important as to what's going on at very specific points within an application and it doesn't quite have that same kind of context real time. This is different situation than what was happening.

28:00
two days ago kind of thing. Ethical boundaries is a fascinating thing. You'll find this also in some of the frameworks like NIST of understanding what the ethical boundaries are. What are ethical boundaries? Humans have them kind of natively, but the AI, we have to train it as to, maybe the best solution isn't to not blow that thing up, right, the kind of negative, aggressive ways that it can try to solve problems. The quality of the data.

28:27
If you don't know what the data the LLM was trained on, then you don't know. And you'd also, this is a great way of attacking LLMs, is the data that they can be trained on, or as it continues, the things that you can kind of force it to train on to provide bad answers or things like that. Adversarial vulnerabilities, basically kind of like SQL injection into your LLM, trying to get it to expose things.

28:56
The opportunities I still red team hunting. If you're on a red team, you're doing a lot of pen testing kind of stuff. Please, I mean, I think you should definitely check it in to using it. It's, may give you some ideas of things to try that might have a higher risk factor for why you might want to prioritize fixing those kinds of things. Obviously tool integration, cost reduction, some of these things are great.

29:22
Threat intel sharing, I think is an opportunity between sharing information, even if it's within a company, but from different systems, if they're seeing the same kind of things. There's a lot of threat correlation that already happens with different seams and different tools like that, but the AI can take it to another level of, again, further reducing noise, showing you the things that are real. And training and exercises, fantastic scenario generation for different types of things you can test on.

29:52
All the things that are threats and risks, that's kind of where we're kind of headed from a security standpoint. These are the things that we are worried about. High cost. I added here myself because it's having started my career at the beginning of the Internet. First time I had access to the Internet, I was working for a defense contractor and nobody I knew had access to the Internet. So.

30:21
So I'm seeing a lot of those same kind of patterns now, even with AI, how big this is going to be, how it is going to affect our everyday lives like the internet did. But I also take you back to the time of data lakes, when we had big data and we were going to move all this data into these big data lakes and we were going to generate all this value and all this kind of stuff. Well, some companies did, but a lot of companies did not. A lot of companies spent a lot of money to create a solution.

30:50
thinking that they would find a problem that it would solve and never really did, ended up spending a ton of money and didn't get a whole lot out of it. So the start slow approach is absolutely applicable in this situation if you haven't already. But so some other things that we can cover there with the SWAT.

31:11
And so if you're kind of sitting back going like, well, none of this really helps me, right? Everybody's situation is kind of unique and one size is not going to fit all. So we have environment specific considerations, AI integration points, risk assessment framework. So these are kind of trying to compartmentalize the different things that you want to kind of consider when you're thinking about how are we going to secure ourselves when using

31:41
AI. So things like you're seeing on these lists, security controls, a lot of this is already the existing things you already have. And now you're probably going to add an AI flavor to that. Integration points, if you really look at these integration points, this is just your SDLC, right? It's the systems, it's how you're changing those systems, those applications, your development lifecycle, your operations that are taking place. So

32:08
All of the existing integration points that you have today are still applicable for using these kind of tools and integrating between those tools. A lot of times when we look at that infinity cycle, a lot of those tools don't talk to each other. That's why D-Vec Dojo kind of exists is to pull all that data together, right? So you can see it in one place. But this is also an approach of trying to connect the dots throughout an environment to get better analysis.

32:38
and kind of continual analysis because it's watching this stuff all the time. It doesn't have to wait for a Slack message. Risk assessment framework. Pick one, pick one that's kind of focused on AI. I'm going to show you one here in a moment, but use something as a guide to help you understand where you can begin to affect security around AI.

33:05
I would not necessarily advise showing up at your executive's office going, we're going to implement NIST and we're going to do all of these things. No, it's usually going to kind of go the other way. You're going to pick and choose the things that are applicable to your environment and apply those, which brings us to here's some additional attack vectors. So these are the things you're kind of securing.

33:30
against when it comes to especially for a private LLM. So if your company is going to run and host its own AI large language model, then these are things you're going to want to definitely be aware of. But even as I look at these direct injection into system prompts, well, that's like SQL injection. I'm going to put code into the prompt to see if I can get it to do something. Hidden injection through instructions.

33:57
Role-playing attacks to bypass restrictions. This used to be kind of fun to watch. There's various Reddit channels where you can see people get strange things that they were able to get the LLM to respond with to kind of reveal biases or reveal how it was trained. And that could also mean your company's data if it happens to be in that LLM. Context leakage, leakage exposing system details. So again,

34:24
I'm giving you this answer, but at the bottom it has the IP address, that kind of thing, kind of that ancillary wasn't supposed to be part of the answer, but it ended up in the answer. The training and data attacks, so data poisoning during the model training, different things like this. You can read the bullets. I will, I'm going to give you a personal experience here.

34:49
I have a car that I frequently look up on YouTube and I'll watch, you know, look at different videos kind of telling me about the car or different things like that. And I recently did a search on YouTube and I wanted to see any videos on this make and model of this car released in the last week, right? Within five days. And that's kind of like, well, maybe one or two, you know, around the world, right? No, there were hundreds. There were hundreds of videos on this make and model of car.

35:19
And when you started watching the videos, because some of them had like 200 views and 500 views, but they were all released in the last five days. And they were all AI generated, every one of them. In fact, most of them, as I started to look at them, it wasn't even the right car. It was a different model that said it was a Ford and it wasn't, but they're AI generated. And I'm assuming that they're also AI watched. And if so, if you think about it,

35:48
If all of a sudden you get a flood of AI generated information that overtakes all of the actual real videos that show things and all of these videos that are AI generated are also being watched by bots to increase their viewership, then you're also kind of retraining algorithms in YouTube or algorithms in Yelp or algorithms in all these other things that use reviews and things like that. We're already seeing this on Amazon. So all of a sudden you're going to get

36:17
see tons of AI generated content and it's going to be much more difficult to sift through that content to find the real stuff, actual valid or accurate stuff, which is going to mean the data that you're training your LLM is going to be even more critical to make sure it is what you want to train on and it hasn't been manipulated and can't be attacked like that. So thank you for allowing me that little story. Not like you had a choice though, right? Mitigation strategies.

36:47
The mitigation strategies usually fall along the same lines as your existing mitigation strategies. And as our poll suggested, we probably should have more for AI, but I'm thinking you should have the same mitigation strategies both ways. There are some special things about AI, the ethics, the things that it can generate, the security around some of the things, even internally that you don't want it to suggest.

37:12
you know, like, you know, explicit content or not safe for work kind of content. Um, so yes, there are some other things to consider there. But as I continue to do more research about ways of approaching this, it just reminded me of all the things that we've always been doing from a security standpoint of identifying risk, how, what's the level of the risk mitigating, et cetera, et cetera. So these are all things I think everybody's kind of familiar with.

37:41
I meant to have this pop up because I didn't want to give this, but I think we do have a poll question for this.

37:53
Yes. And we do. We're doing helpful. Oh, here we do, yes. There you go. So this is kind of a DefectDojo specific question. Would it be helpful to have an AI security framework as a benchmark in DefectDojo? And I wanted to show, here we go. So this is open source DefectDojo. And in open source DefectDojo, if you go to a product, you have a dropdown here that's called benchmarks.

38:23
And in those benchmarks, we have benchmarks for a couple of the OWASP ASVS versions here. And you can see I've applied benchmarks to this particular product and I can check boxes on these. If I go to them, you get a, you know, which level you want to be at. Do you want to publish it? Do you want to report on it? And which things you've passed or been working on versus the things that maybe you're things are applicable or not, versus the things you want to pass or fail.

38:50
or when they've passed that and you can kind of track all that. So my thinking was this would also maybe be a really good place to add something like the NIST AI framework version 1.0. And I worked on this hoping that I might be able to get this built and put it into the open source before this webinar. But alas, even when using AI tools to try to build this, I'm still working on it.

39:19
But I've made some progress and so it is so that was the question Would something like this be a value something more specific to AI? Something where you could use defecto to kind of track those kind of things and this brings us to well this In all of my years and all of the different environments that I have worked in all over the world I've been very very fortunate to work in a lot of different environments hands-on

39:48
First, I think I didn't lose root access to production environments until about my 15th, 16th year as I got into management and things like that. Well, you know, you can't have a CIO with root, right? But the thing that I have learned, especially through the consulting, every IMT environment is unique. A lot of times when you ask questions, you will get, well, it depends. Yeah, it depends on the environment, the culture, the people, the technologies that were chosen, the way they were integrated.

40:18
Uh, if you've ever been one of the, through one of those exercises where developers are given the exact same requirements and they build a quick application and it is two completely different applications, that experience of, you know, every environment is unique, not just different, unique. Now this seems obvious. Some will be going, duh, but this is where your humility comes in coming into an environment thinking that, well, I did it this way at the last three companies, it's sure to work here. Not necessarily.

40:47
Um, every environment was created usually by very smart people who had very real reasons why they chose the trade off that they chose. Yes. Sometimes it ends up, you get kind of caught in a corner and you don't have a lot of flexibility more because you've built some things around something that you didn't want to build or whatever. But the point is every environment is unique. Every environment has different risks, different processes, different cultures, different attitudes towards security. So

41:14
your environment is where you have to start understanding that environment, its capabilities, its maturity, and where things can be done to improve things from a security standpoint, because what works in one environment may absolutely not work in another. So this hopefully will give you a little bit of hope that, oh, I've got to catch up to all this stuff. No, not necessarily. You got to catch up to the stuff that you need in your environment. Nothing more, nothing less. Right. So let's talk.

41:41
I'm going to use the NIST, OWASP, they have the top 10 list. There's various, I think there's some other frameworks that come out. But as I really started looking at the NIST, the govern, the map, measure, manage sections, there are things in here that surprised me of things that I wouldn't have considered checking, especially when it comes around information integrity, safety evaluations, kind of.

42:09
measuring the kind of responses that they're getting? Are they accurate? Especially for like marketing, documentation, those kind of things. So this framework is extensive. I brought up a couple, there's a lot of different components to this. And I would be shocked that anybody would even attempt to try to implement this whole thing. And this is why I was saying, may not wanna go to your boss, we're gonna do NIST today. No, you're not.

42:37
But you can take things from this. You can take elements of this. Things that maybe if you read one of these things that you should be measuring or something you can measure and it's like, ah, we do have that one right there. So this may give you another way to kind of highlight things that you want to use from a security perspective or that need to be used. And then you can say, well, you know, it's in the NIST AI risk management framework. So it'll at least give you a place to kind of start. Now.

43:08
I won't cover it anymore as I see at the time. I can't believe I've been talking for 45 minutes. So NIST is a great place to start. But if that is too much, our last poll question, and then we'll wrap up and do some questions. Are you working or do you have an AI LLM usage policy? Yes or no?

43:36
I ask this because sometimes coming in with the NIST or coming in with an intensive framework is not always well received. We got other priorities right now or whatever the case may be, right? So sometimes coming in with a big framework isn't the best starting place. Chris, do we have results on our poll? I'm very curious about this one. We do. And it looks like about 64% said they do have a policy. And then

44:04
36% say they don't. Ah, I had no idea what they may or may not, what that may or may not indicate. So as a result, so it doesn't matter if you have one, awesome, if your company doesn't, this might be a simple place to start because this might give you a way to just, and again, this also AI generated across Deep Six, Claude and ChatGVT.

44:34
Crazy, same kind of, so if it's all wrong, all three of them are wrong. But things starting point, an easy acceptable use policy. These are the things that we want to encourage you to use this for, because it can help your company. It can speed things up, it can make things more accurate. There's a lot of benefits, obviously, but there's also a lot of risks. And like first and foremost, just a way of having a simple, these are things we do not want you to do.

45:02
Don't upload our financials. Do not upload any of our private source code. If you've got open source, publicly available code, that's a less risk. But the internal code, anything that's proprietary, proprietary information, you don't wanna be copying, pasting that into an LLM for any reason because that becomes part of that training model. You could really become exposed at that.

45:25
So there are a lot of prohibited. And this is the part I think that this serves the purpose of having some kind of usage policy, because these are the things you wanna make sure everybody's aware of and thinking about as they're using the tool. We wanna encourage you to use it to make us better, but we also wanna encourage you to know what things don't, you know, don't press these buttons, right? Network. Sometimes we'll even get, when we do parsers for DefectDojo,

45:51
Sometimes we'll get example files from a user that has an output file and we'll open that file up and it's got IP addresses and email addresses and domain names and all of this identifying information just out of one scan file coming out of some of these tools. So those are things you don't want to just take that file and say, hey, build me a parser. You want to remove any of that kind of identifying information, just anything that could

46:20
potentially used against you. Additional usage policy components. So these were some other things that the DeepSeq and Claude basically were seeing in promoting required practices, data handling, security consideration. So just an easy way to build a simple policy, do a mass email. You could even do, I think, even in DefectDojo, you could possibly even use the questionnaires to.

46:48
and say, hey, have you read that policy? Are you aware of what the things we want you to do or not? That kind of thing.

46:57
So it's here, it's real, and if you're a fan of Seinfeld, you got to get going. 40% of the code, so here's kind of the summary slide. Basically, and I think everybody kind of understands this, it feels overwhelming. It feels like there's a ton there. It's changing so fast that even week to week, it's just like there's new LLMs to be used and different things.

47:26
But again, know yourself, know thyself first. So understand your environment, apply the things that make sense in your environment. Try to start small, try to do things incrementally, try to be aware. One of the things in the NIST was just identifying all the different tools where you might be using AI. So you can kind of track those kinds of things. So there's just a lot there to try to wrap our arms around. And hopefully some of the suggestions or some of the pads for

47:55
using policies or things like that have been helpful. Any questions or comments, Chris? Thanks.

48:05
Looks like we don't see anything come in yet. So again, folks, we have to miss back to, or take all sorts of data like a little bit later too. Any questions, be sure to post it in the questions chat there or the chat. But so far, I don't think I see anything. I see 88% said that the having a NISC benchmark would be a good thing. I guess I'm gonna have to keep working on that. Like any feature requests, a lot of them next sprint. So stay tuned there, stay tuned.

48:36
Um, yeah, I don't see any questions or anything else comes up. Uh, I don't know if we want to go through the whole DefectDojo, like workflow stuff, but maybe that's time for another webinar, probably. I think we'll have to do that for another webinar. Uh, some examples of how I've been using, uh, LLMs to do analysis on parsers, um, even potentially creating a parser. Um, uh, so yeah, there's.

49:02
Lots of ways of kind of using that code for just really quickly the parser example because all of the parsers have example test files, you can take those test files. Do I have it up. I do. So this is a will do this for another webinar, but using the parser. So in the parsers. We have test files right as I was mentioning earlier that have, you know, example data fields and all this the

49:29
The cool thing that you can use an LLM for is because it can't run code, you can upload or what I did is I uploaded one of the sample test files. I uploaded the export after I parsed that data in. So I've got the parser Python code, which is publicly available. It's in the open source. But I can also, when I parse those findings, I can export those using an API call into

49:56
the finding field that has been mapped for that particular WIS CSV file, right? So by giving the three files to Defecto, or Defecto to Claude AI, I was able to identify exactly with a high degree of accuracy, because I've double checked all of these, where those fields are being mapped and the line of code where those are being mapped. What this gives me,

50:24
is then as I start to build data mappings for every parser, one gives me documentation of exactly what's being parsed and what's not, because not every parser is parsing every possible field that is coming in in the CSV file. You can see here, some of these things don't get mapped, right? Static field, parser processing, so AI does a fantastic job of doing analysis like this, but then I can feed this output

50:53
back into the LLM. Once I've checked it and verified that it's accurate, now the LLM can use this analysis, and this is one thing I love about CLAWS, you can do this in a project. So you can see down here, I can map all of my parser files, and now I can do analysis of, well, let's see, go back up here. Here's our finding for BlackDoc. So now I can build an index of every parser.

51:21
and every piece of code that is changing or parsing different elements from here to a finding field. And that gives me a way to kind of evaluate the functions that those parsers are using. Some of these were created years ago by members of the community, some have been updated, some haven't. But now I have a way of kind of evaluating, well, this one does a great job of parsing these fields, and maybe we want to update some of the older parsers, maybe include some of the fields that are not being mapped.

51:50
those kind of things. So kind of a new opportunity to do a refresh across all of our parsers and just by building on the information that we can use, what those mappings are, and then doing some analysis of all of those mappings. So it's just kind of a preview of some of the work I'm doing for a future webinar.

52:14
So that's a little bit of ammo for you guys to come into the next webinar. Like, stay tuned for that one. Get a little bit of a nice preview there. Yeah, I don't see any questions for comments, so I guess we can wrap this up early. Again, big thanks to you, Tracy. Thanks for presenting today. Thanks to everyone for joining us for this webinar. We'll send the recording link up probably today or tomorrow, give or take, how fast I can get this recorded to process. But stay tuned there. A lot more to come. And we'll hope to see you guys at the next webinar. Thank you, everybody.

View full post