
Your Data Is In—Now What? Best Practices to Refine, Route, and Report Security Data using DefectDojo

Written by TRACY WALKER | Jan 29, 2026 7:48:42 PM

Transcript

00:00 Introduction and Overview
01:38 Understanding Data Refinement
03:07 Asset Hierarchy and Organization
06:16 Import vs Reimport: Key Differences
10:12 Deduplication Techniques
11:22 Engagement Types and Metadata
15:40 Modifying Product Hierarchy
20:06 Handling Duplicate Findings
21:54 Prioritization and Contextual Metadata
23:52 Bug in Community Version
24:07 Pro Version Features
26:12 Risk Acceptance and SLAs
27:46 False Positives Handling
28:18 Quick Wins and Best Practices
29:56 Ticket Integrations
35:13 Automation with API and CLI
41:38 AI Tools and MCP Server
45:26 Conclusion and Final Tips

Introduction and Overview

All right, it's a couple of minutes after, so we can get started. This is going to be a hodgepodge of best practices. Basically, we're going to talk about refining data, routing, reporting, and different things. But this is not necessarily a training session.

This is going to be tips and tricks, things you may not have been aware of. If you sit through 20 or 30 minutes of this, I'm going to cover some things you already know, no question, or at least most people know. But what we have found as we do onboardings, when we talk to the community users who have transitioned to the pro version or are just using the community version, is that sometimes we will show something or talk about something and we'll get a reaction of, I didn't know that it did this.

I didn't even know that was there. We've been just dealing with this; we didn't think it was exactly right, but we didn't know there was something better, or another approach, or whatever. So I'm going to cover those things. A few people, maybe even the majority, are going to see a few things that you already know about.

But I may trip over something and you'll be like, I had no idea. And that's really the purpose of this presentation. So we are going to talk about ways of making your job easier as far as triage and doing the refinement portion, routing, getting the data where you want the data to be, and reporting.

So we're just gonna kinda show some tips and tricks, some best practices and things like that. 

Understanding Asset Hierarchy

This will probably be the biggest section when we're talking about refining data. And before we begin, I know this is about what to do once you get your data in, but we can't not cover the hierarchy and how important it is.

Especially when we have community users who are transitioning from open source, sometimes we'll look at the way they've broken down their asset hierarchy, and they're what we call fighting the tool. So we do want to cover this a little bit to make sure that even though you may already have a hierarchy, I want to give you some ways of looking at it to be like, oh, maybe if we changed a few things, it would save me having to use tags or different reporting, things like that.

So, assuming that you have an idea of what the hierarchy looks like: findings are discovered by scanners, and the scans are the tests, right? Engagements are ways of organizing your tests, usually; we've got versions and metadata there. You could have one engagement for an asset; that's not unusual.

You might have five or six; you might have a whole bunch of engagements. The asset level is the most important, and then organization, the old product type. If you've been a long-time community user, we just changed the names, from product type to organization and from product to asset. It's really just a name change; there is additional functionality coming around the asset, like being able to do parent-child relationships there.

Whenever we do the onboarding with customers, we will usually focus on this asset level, because this is where a lot of the functionality works. For example, deduplication works at the asset level, across all of the findings within that asset. The open source version will deduplicate within the same tool.

So if you're importing from Tenable over and over, that will deduplicate there. The pro version can also deduplicate between tools, so there are two deduplication passes in pro. But in either case, it's always from the asset level, covering everything within that asset. Organization and asset are the only two layers that have RBAC.

So the organization level, as I say, can be anything you want it to be. Whenever you create a new organization, I like to show that it's only four pieces of metadata: name, description, critical, and key. It can represent anything. The big tip here is that at the organization level, you want this to be granular, right?

If I had a large company with three subdivisions and then a whole bunch of layers within those subdivisions, I probably would not make my orgs just those three subdivisions; I want to make it more granular. If a user has access to the organization, they have access to all the assets underneath.

Or if you give permissions at the asset level, then you only see that asset and not the other ones within that organization. But those are the only two layers that have RBAC. So the organization layer is really all about reporting and who can see that data. And you do want it to be a little bit granular, because if I were to do a quick report on an organization, maybe I'll do an executive insights report.

You'll see I can always select multiple organizations and have that data roll up. But if I haven't broken it down, I can't break it down here, right? Maybe it's one giant org like Microsoft; maybe I should have broken Microsoft into smaller chunks so that I can always bring them together. Otherwise it's harder to slice and dice.

So that's another important tip: sometimes we see that the organization level is too high level and can't really be effective, because it doesn't have that granularity. You can always move things around. The hierarchy is basically one-to-many all the way down, so you can always just change the ID or reassign an asset, and everything underneath it will move with that asset.

So you can always restructure these. One other thing that I'll mention for organizations: if you have a large organization and you've spent a lot of time building this hierarchy, you might recognize, oh, I didn't realize that my ticketing is attached there, or the risk and priority is attached there.

So if you need to make some adjustments, we've also built some tools for our pro users to reorganize things: export your current hierarchy, make some changes, and reimport it back in with the changes that you want to have made. So being able to be flexible around this will also help you.

Import vs Reimport: Key Differences

This is another big topic that we run into, and it's probably one of the areas where we see the most confusion and misunderstanding of what it is. We have two ways of getting the data in, via the API or even in the UI: import versus reimport. And sometimes we will find situations where... let's go to an engagement here.

So I'm in an engagement and you can see I've got a bunch of scans that have been brought in from different tools. As an example, sometimes we will see a community user, and we'll go into an engagement and they have hundreds of Tenable scans, just over and over and over. And we will go into those scans and you'll see that most of them are duplicates, that kind of thing.

Also notice that the import history just shows that six were created. So every one of these is being done via an import. An import is always going to create a new test, and it creates findings for all of the results in that test. As you were just seeing, when I go into this Tenable scan, it created all six findings.

And even though five out of six were duplicates, it still created the findings. They're marked as duplicates, but the findings still exist. And if I go into any of those scans, all of the findings will exist, because they will always be created when you do an import. The difference is if I do a reimport.

So in this scan, instead of doing an import over and over into this engagement, notice up here where it says import scan. That's the import. If I go here, you can find it right here: see how it says reimport findings. When I click on the scan itself, or if I go into the scan, I can also reimport here.

And that is how you get the reimport. Notice this time it says five untouched, which means we didn't create five more findings; they're already there in the test. This also helps keep the number of findings lower, because we don't have to create the individual ones, then deduplicate them and mark them as duplicates, right?

So that's how you get this automated triage. With reimport, it's the same test, the same scope, the same repo scan. That enables the auto triage, which is what you're seeing right here. That is the diff, right? The last scan didn't have it; this one does. Then in the next scan the finding was not there,

so it closed that finding and marked it as mitigated; then it comes back, then it gets closed again. You're getting this triage history as you're doing the reimport. That is a big difference right there; the reimport is what does that. You can use reimport all the time, even for the first test.

There's a parameter, auto_create_context, that will allow you, if it's in a new engagement, to have that created on the fly. It will show up as the first import.

You don't have to write a script with logic that says the first time I have to use import, and after that I want to use reimport. You can use reimport right from the very beginning; just understand the behavior. If you're using import in a script over and over, it will always create a new test, and it will not do this auto triage, because it's assuming you want to keep those things separate.

All right, so that was a big tip and trick. Hopefully that was new information to some folks. 
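If you're scripting this, the flow above can be sketched against DefectDojo's v2 API. This is a minimal sketch, not a definitive implementation: the URL, token, engagement ID, and scan type below are placeholders, and you should check the `/api/v2/reimport-scan/` parameters against your version's API docs.

```python
def build_reimport_request(base_url, engagement_id, scan_type):
    """Endpoint URL and form fields for a reimport call.

    auto_create_context lets the very first run create the context on the
    fly, so the script never needs separate first-time-import logic.
    """
    return (
        f"{base_url}/api/v2/reimport-scan/",
        {
            "engagement": engagement_id,
            "scan_type": scan_type,
            "auto_create_context": True,
        },
    )

def reimport(base_url, api_key, scan_file, engagement_id, scan_type="Tenable Scan"):
    """POST the results; DefectDojo diffs against the existing test:
    unchanged findings stay untouched, missing ones are closed/mitigated,
    new ones are created (the auto-triage behavior described above)."""
    import requests  # third-party; imported here so the builder stays dependency-free

    url, data = build_reimport_request(base_url, engagement_id, scan_type)
    with open(scan_file, "rb") as f:
        resp = requests.post(
            url,
            headers={"Authorization": f"Token {api_key}"},
            data=data,
            files={"file": f},
        )
    resp.raise_for_status()
    return resp.json()
```

Running `reimport(...)` from a CI job on every scan, with `auto_create_context` set, is the pattern being described: same call every time, no import-then-reimport branching.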

Deduplication Tips and Tricks

Let's talk a little bit about deduplication. The community version, open source, does have same-tool deduplication, which means it deduplicates if you're importing from the same scanner, even if it's in different tests, as long as it's within an asset, right?

We do not deduplicate between different assets, because this asset is managed by this team and a different asset is managed by a different team, and we need them both to get the vulnerability. That's one of the keys to understand about same-tool deduplication, and all of this: it is from the asset level down.

As a tip, and this is one of those things sometimes people don't know: when you are in an engagement, you will notice there is a setting called isolate deduplication from other engagements. This gives you a way to say, yes, we deduplicate within the asset, except for that engagement right there.

I do not want anything deduplicated there; I want to see all the findings, I want to track them for whatever reason, and I want to keep those separate. That's how you can do that. It allows you to turn off deduplication just for that particular engagement and not commingle those. Another quick tip since we're here: the engagement types.

Sometimes we get questions about what the real difference is between these. There's not a real difference; there's no functional difference between a CI/CD engagement and an interactive engagement. It's a required field to have a type there. The difference is in the metadata, and all of the metadata is actually available to both.

So even if you were doing an interactive engagement, you could still, using the API or things like that, update the version, right? Version is one of those things we track. If you're doing lots of different automated CI/CD runs on PRs, maybe you have a different version every time. Maybe it creates a different engagement for each version, or you run multiple scans for a particular version.

So being familiar with some of the metadata will also help you understand the alignment: that's why I want to use CI/CD, because it has version in it, whereas maybe the interactive type doesn't have it. Actually, I guess it does there; I stand corrected. So familiarizing yourself with some of the metadata will help you understand some of that alignment.

So, same-tool deduplication in the community version: if you want to change the deduplication, you actually have to go into the code. You can find all of the deduplication algorithms, how they're being done, and which fields they might be using if they're not using something like the unique ID. I can't show it to you in the community version, because it doesn't allow you to configure the deduplication.

I can show you the same-tool deduplication settings in the pro version, because we do expose the deduplication setting. So here you'll see the same-tool deduplication, and this is actually set up exactly the way it is in the open source. All of the scanner parsers are there, and all of them have a deduplication algorithm.

This is what's set in the parser itself. You can see some parsers will use hash code and some will use the unique ID. If the tool has a unique ID for the finding, it makes it very easy to deduplicate, and if that's missing, you can actually use both. But this is already set in the open source, and same with the pro version.

It's already set; you don't have to do any configuration here. But you can change it here, and that's a difference with pro: you can change it without having to recompile and do all of that like you would with the community version. And in the pro version, one tip, and we talk about this in onboarding with new pro users: this is not set up automatically.

You have to go in, choose your tool, and then enable the algorithm, which will always be hash code, and then you pick fields for that hash code. The reason we don't have this set up out of the box is because we did the calculations: there are billions of different combinations of tools that any particular environment may have.

There is no way for us to guess which tools you're using and the data they're reporting to predict what would be the best matches to deduplicate. That is why we require users to set this up: we want you to know which tools you're using and the fields they're reporting, so that you can pick the fields that are consistent between all of the tools.

And you want the data to be consistent there too. For example, description is a notoriously bad field to pick, because the description can change quite a bit. The way that deduplication works for hash code is that we're creating a hash for each field. So I might pick a unique ID from the tools, or severity (actually, that's not a good one), but the vulnerability ID certainly is good.

If all of my tools are reporting an accurate file path, that's a fantastic key to use. Again, we will create a hash for each of these fields, and then we compare each individual field with other findings. That's how we can say this was almost a match but not quite, versus a full match, and things like that.
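To make the per-field idea concrete, here is an illustrative sketch. This is not DefectDojo's actual implementation, just the concept described above: hash each configured field separately, then score two findings by how many field hashes agree. The names in DEDUP_FIELDS are an example choice.

```python
import hashlib

# Example field selection -- in practice you pick fields that are
# consistently populated across all of your tools.
DEDUP_FIELDS = ["vulnerability_id", "file_path", "title"]

def field_hashes(finding: dict) -> dict:
    """One hash per configured field, so findings can be compared
    field-by-field instead of as a single opaque blob."""
    return {
        f: hashlib.sha256(str(finding.get(f, "")).encode()).hexdigest()
        for f in DEDUP_FIELDS
    }

def match_score(a: dict, b: dict) -> float:
    """Fraction of configured fields whose hashes agree: 1.0 is a full
    match (duplicate), while e.g. 2/3 is the 'near miss, almost a match
    but not quite' case that can be surfaced for human review."""
    ha, hb = field_hashes(a), field_hashes(b)
    return sum(ha[f] == hb[f] for f in DEDUP_FIELDS) / len(DEDUP_FIELDS)
```

This is also why description is a bad choice: one changed word produces a completely different hash, so the field never matches even when the finding is the same.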

Any questions so far, Chris? I wanna take a pause and make sure we haven't put everybody to sleep. 

We do have one here, mostly about the product hierarchy stuff. Oh, another question coming in too. Awesome.

Managing Product Hierarchy

The question is: I have my data in DefectDojo and I want to revisit or modify the product hierarchy later on.

How easy is that to do? And I guess with that too, any tips or anything else that you'd recommend when making that adjustment?

Yeah. If you want to revisit and modify, it depends on how much you want to modify. Thankfully, the API gives you the ability to really do some mass changes.

We have a tool that will allow you to... actually, sorry, I'll focus back on this question, because I'm about to go down a rabbit hole. It is fairly easy to reorganize things because, again, it is one-to-many down this hierarchy.

So it depends on how much you want to change, but it is fairly straightforward, and we've done a lot of scripts to export the hierarchy the way it exists, or even do a database dump. There are a lot of different ways you can reorganize it, depending on how detailed you're going to get.

The one thing: if the asset level was set so that it's this application, but that application actually has 10 different microservices, breaking those up is much more difficult. You almost have to create the assets that you want to have, make them part of that same organization, and break things up that way. If you have to go more granular, that may require creating some new things and moving some engagements around, stuff like that.

You can copy engagements. If it's too detailed, we help the pro users all the time with this kind of thing. My best recommendation is just to remember that the API on the backend allows you to change a lot of things. And if you're using AI tools like Claude Code or things like that, they're great at inspecting the API and helping write Python or bash scripts to do data exports, data manipulation, and updates.

I'll show you some examples of things we do with data augmentation, but yes, it's absolutely possible, and there are lots of ways to approach it. Reach out to the community in our Slack channel, or if you're a pro user, you can always engage us directly and we will help with that.
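As a small sketch of what such a mass change can look like: because the hierarchy is one-to-many, moving a product (asset) under a different product type (organization) is one PATCH per product via the v2 API. The endpoint path and the prod_type field follow the documented API, but treat them as assumptions and verify against your instance's schema.

```python
def build_move_requests(product_ids, new_prod_type_id):
    """One PATCH per product, changing only its prod_type link.

    Everything underneath each product (engagements, tests, findings)
    follows it to the new organization automatically, which is what makes
    this kind of restructuring cheap.
    """
    return [
        (f"/api/v2/products/{pid}/", {"prod_type": new_prod_type_id})
        for pid in product_ids
    ]
```

You would send each (path, body) pair with an authenticated PATCH; the point is that the reorganization itself is just a handful of single-field updates.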

Nice. A bunch of questions actually came in; let me just ask one more, then we can save some of these for the end. This question is: in my setup, my asset (product) covers many code repos. Is there any way I can get a list of findings, or search findings, from a particular code repo?

I've added the code repo metadata in my engagements. There doesn't seem to be a search option for engagement-level metadata for findings. Is there a better way to accomplish this?

Yeah. My rule of thumb is that I usually treat the asset level as a repo in GitHub. Now, of course, you can have a monorepo, kind of like I said: this one repo has 15 different microservices.

So in that case, I might make each microservice, in its branch, the asset level. Maybe you run a scan on that branch so that you can isolate different components that way. Every environment is unique, and that's not an excuse; it's a reason to say that I always focus on this asset level first, to get to the level of detail that I want for the automation of the scans. Organizing this stuff and automating it as it comes in is actually pretty key.

So I guess what I'm trying to say here is that there are lots of different ways of doing this. I think the bigger key is that once you understand the functionality at that level, like this list right here, you recognize: oh, if I want different RBAC for different assets, I'm going to have to break this down.
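One practical pattern, if each import is tagged with its repo name, is to slice a multi-repo asset with the tags filter on the findings list endpoint. A small sketch; treat the filter names as assumptions to check against your instance's API docs.

```python
from urllib.parse import urlencode

def findings_query(base_url, **filters):
    """Build a findings list URL with query-string filters.

    Example: findings_query(url, tags="repo-payments", active="true")
    would list active findings whose imports were tagged with that repo.
    """
    return f"{base_url}/api/v2/findings/?{urlencode(filters)}"
```

Tagging at import time (most import parameters accept tags) is what makes this slicing possible later, without restructuring the asset hierarchy.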

Yeah, I don't know if I answered that question really well; apologies. I see another question here: are we planning to change product type and product to organization and asset in the community edition? Yes, that is coming. Will I be able to transfer from DefectDojo Community Edition to Pro?

Yes. You can migrate your community version and database to the pro version. That is possible.

Okay, I think there was one more thing. Yeah, deleting duplicate findings; I wanted to cover this real quick, particularly for the pro users, because the license is based on the number of findings. You can have unlimited users, unlimited imports, all of that, but it is based on storage. So here in the pro settings, under system settings.

Yes, you always have to enable your finding deduplication, but this setting here, delete duplicate findings, I just want to explain real quick, because sometimes people aren't even aware that it's there. If I turn this on, you see how it says maximum duplicates right here, set to 10. What it will do is, if we're seeing the same duplicate over and over, we always keep the original finding that was first imported.

And we keep the most recent 10 minus one, so the next nine, the most recent nine. We always have the original, we always have the most recent nine that were imported, and then we delete everything in between those two. The advice is: do not set this to zero. We do use some of those duplicates to tell us this was a near miss, four out of five fields matched, so it was a close match but not a perfect match.

You can always say, no, that is a duplicate. And if you tell DefectDojo this is a duplicate, it will remember that, and then it will actually use additional algorithms to say four out of five is good enough: this one was a duplicate, so this one is too, that kind of thing. So don't set it to zero. I think the minimum that we recommend is five, or you can keep it at 10.

But that's just another little dial that you can use to reduce some of the noise of all the duplicates.


Prioritization and Metadata

This one will feel a little lackluster for the community version. What we're talking about here are these fields specifically, and this is context at the asset level. Business criticality, all of these things, are set at the asset level. So let's go into add product here in the community version.

You can see here business criticality, user records, revenue, external audience, and internet accessible. Those five specifically are what we're looking at. Now, revenue and user records: what do those mean? You can't change the title here, I can't name it something custom, but user records is just a number, and it needs to be a range of numbers.

You could actually say zero through 10, where ten is most important and zero is not. So yes, you can use it for user records, and if you had an idea, for every application, of how many users were using it, you could use that. But it just needs to be a range from zero, meaning I don't care about this, to another number that's larger than zero, right?

If you create that span, we'll understand what the breadth of that range is, and we will attribute points to it, essentially, in the pro version. So user records can represent anything you want. Revenue works exactly the same way: zero, I don't care about it; a million, I do. It's just a range.

You don't have to plug it into anything. It could be revenue loss, revenue gain, revenue from that customer; you can use it however you want. Now in the community version, if I go to all of my products here, you'll see that criticality, and you can see the metadata. So you get a visual

of what your risks are, and you can sort by those and prioritize this way. I may have clicked on... I think I found a bug in my community version. I guess I did; I'll grab that later.

Pro Version Features and Prioritization

But yeah, you can see this, and that's really all we're getting out of it in the community version.

If you have set this up in the pro version, we will use that data for our risk and prioritization. So on this Priority Insights dashboard, you can see urgent, needs action, medium. You know what? I thought I put those slides in here; I didn't. But I've got to back up right here.

This is how we're calculating our prioritization. You can see we're giving a base score based on the severity, and then based on EPSS (the pro version adds EPSS; I'm going to show you how you can actually do that in the community version too), KEV, where we're grabbing extra data, and endpoint coverage.

So we count those, but then here are those five fields again. We use those in the pro version: we give them points, and it creates a range, urgent, needs action, et cetera. And that's what comes through. You get a bunch of graphs, but then you've got your to-do list. These are my highest risk, highest priority items in this environment.

You can even see down here, I've got all of my fields showing right now, but you can see the known exploited, and there's the EPSS. I feel like I'm missing some of my columns... and I did. I took these out, and I apologize. The risk factors column is what I was looking for, those same risk factors, and they're in there somewhere. All right, I want to move on. So let's go back to here.

Okay. So, using the context: better prioritization. This is actually important whether you're using the pro version or not, because later we're going to talk about how you can also connect AI tools to DefectDojo, and those AI tools can use that data.

Risk Acceptance and False Positives

So it's good to have more data, more context, because you're also creating the basis for using those AI tools. Now, risk acceptance and false positives.

Sometimes people use risk acceptance and sometimes they don't; they may or may not have that workflow. The key that I want to make sure everybody understands is that risk acceptance is the only way you can change an SLA, so that it gives you that audit control.

So if I go to an engagement and I go into a finding, I click on one and I'm going to do a risk acceptance on it: create new, or add to existing. These are the fields that I want to make sure you're aware of. Yes, you can assign it to an existing risk acceptance, and down here at the bottom you can set an expiration date.

So maybe the engineering team needs a month before they can get to this. Then I would do a risk acceptance, set it to, say, the end of February, and then when it expires I'm going to reactivate it, and I have the option to restart the SLA then as well. That is the only way you can say: I'm going to change the SLA, you have extra time, it's not going to count against you.

And then the reporting will also be accurate as far as staying within that SLA. If you don't click that, they're going to miss their SLA and it's going to get counted against them, even though we accepted the risk. That's an important thing to understand. And you've created an audit trail of all of those: that you did reset the SLA, why you did it, and all of the details, the proof, all of those kinds of things.
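Via the API, the same flow can be sketched as a request body for POST /api/v2/risk_acceptance/. Treat the field names below as assumptions to check against your instance's schema; the intent is just to show the expiration date and the restart-SLA flag traveling together.

```python
from datetime import date

def risk_acceptance_payload(finding_ids, expires: date):
    """Sketch of a risk-acceptance body: accept the listed findings until
    the expiration date, reactivate them when it passes, and restart the
    SLA at that point. Without the restart flag, the reactivated finding
    counts as having already missed its SLA."""
    return {
        "name": f"Accepted until {expires.isoformat()}",
        "accepted_findings": list(finding_ids),
        "expiration_date": expires.isoformat(),
        "reactivate_expired": True,   # assumed field name -- verify in your schema
        "restart_sla_expired": True,  # assumed field name -- verify in your schema
    }
```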

Another key, and this is something I want to make sure everybody knows: false positives. If you're using reimport and you have marked a finding as a false positive, we will continue to mark it as a false positive. Again, this works with reimport, and it is from the asset level down. So if a false positive is marked in one asset, it does not mean it's marked in other assets, because maybe it's an FP over here but not in another asset.

So just another little tip there. Okay. 

Quick Wins and Best Practices

Quick wins, something you can take away today. Sometimes we'll look at environments, community, pro, or otherwise, and we will see certain things and we'll be like, you may be fighting the tool. I mentioned that earlier.

Some of those symptoms: tag explosions. If you're using tags for versions, that can obviously explode into lots of tags. There are ways of minimizing their use, or having a fixed set of tags, so they're not growing exponentially. Another symptom: duplicate tests.

Let me give you an example; let's go to this engagement. Again, if I see lots and lots of tests here, that means they're probably just using imports. There may be a very good reason for that, but it's also one of our clues: are you sure you're aware that you can also do reimport and get the triage?

So if you're seeing hundreds of tests coming in and they're not using that auto triage, that's also one of those clues that maybe we're not getting the most out of the tool. And I already mentioned the organization level, that product type slash organization: if it's too high level, a very large company with only three orgs, you're really not getting a whole lot out of it.

You should be more granular so that you can do reporting and security RBAC across that. So that's another clue that maybe there's a way to take a look at the tool again, and maybe we should reorganize some of this. All right: routing, getting the data where you want it.

Ticket Integrations and Management

Obviously, the open source has Jira for ticket integrations; I just want to make sure you're aware. And with the open source Jira integration, you can do bidirectional sync. That means if somebody updates the Jira ticket, it will update DefectDojo. In the pro version, we also support ServiceNow, GitHub, GitLab, and Azure DevOps.

Those are one-way integrations; we don't actually allow the ticketing system to update DefectDojo. Now, why would we do that if the open source supports bidirectional? Let me explain. Let's go back to this one right here, and you'll notice that in the second reimport, see how we closed one

and created two, with the rest untouched. So the one that we closed was in this Jira ticket right here, and it was made inactive. In fact, I think if I run that import again right now... no, never mind, I won't try that. But you'll notice this one was closed. In this particular situation, the developer closed the Jira ticket, which closed the finding in DefectDojo.

And that engineer had never touched the code, did not actually fix it. We have had community users come to us as they're transitioning to pro, saying they would see this issue: I gave my boss a report this morning saying all these things were fixed, and some of them were not fixed, because then they ran another scan.

And those findings got reactivated. That's because of the bidirectional nature, right? We're going to assume, if the developer closes the ticket, that it's closed. But if they didn't touch the code and they didn't run another scan, you will not know until they run the next scan, and then you'll see something get reactivated.

Because the next scan says it's still there, we reopen that ticket in Jira. So if you see a lot of that kind of thing... we find a lot of the pro users decide to turn that bidirectional off. They want to use DefectDojo as the source of truth: you have to prove to us that the scan is clean, right?

If the scan is clean, I have to assume, unless I'm going to go into the code, that the engineer has made the fix and it's working. So that's a little tip and trick on the integrations: know the bidirectional behavior and some of the issues you might see with it. Next, ticket fatigue. Occasionally we will encounter environments where they're auto-creating all of the tickets.

When you do an import, one of the options, even with reimport findings, is that you can automatically create tickets for anything that's new: just automatically create the Jira ticket, et cetera. Usually there's some filtering already happening, like filtering out the info-level findings.

so you're only creating tickets for the high-severity findings, is that this can still cause some problems. It causes ticket fatigue. So, as a best practice, if you're able to: with all the other time we're trying to save by getting data in automatically, doing the auto-triage, and tracking that a ticket is open or closed once it's created, if you can automate all the things around it, then this is where you can prioritize human verification, right?

If I go into a test, just like here, and I say, "I have verified this finding and now I want to push it to Jira," instead of doing it automatically on the import, then I'm providing a human filter. My engineers are not getting ticket fatigue, and they know that anytime I send them something, it's something that Tracy has verified and it's important, so they're more likely to remediate it and work with me.

But if I just set it and forget it, flood them with tickets, and let them figure it out, we find that is less effective, or at least we see that in a lot of environments.
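As a rough sketch of what that human-in-the-loop flow can look like against the v2 API (the host, token, and IDs are placeholders; confirm the exact field names against your instance's API documentation):

```python
import json

# Form fields for POST /api/v2/import-scan/ that leave ticket creation off,
# so a human can verify findings before anything reaches Jira.
def import_scan_fields(engagement_id, scan_type):
    return {
        "engagement": str(engagement_id),
        "scan_type": scan_type,          # e.g. "Trivy Scan"
        "minimum_severity": "Low",       # drop Info-level noise at import time
        "active": "true",
        "verified": "false",             # nothing is verified until a human says so
        "push_to_jira": "false",         # the key bit: no automatic Jira tickets
    }

# Later, after manual review, push just the one verified finding. Recent
# DefectDojo versions accept push_to_jira on a finding update; check yours.
def verify_and_push_body(finding_id):
    return json.dumps({"verified": True, "push_to_jira": True}).encode()
```

The same `push_to_jira` flag is what the UI toggles when you push a single finding by hand; keeping it off at import time is what prevents the flood.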

Selective notifications. Speaking of ticket fatigue: notification fatigue. Most of the users we see enable a select number of notifications, for example "approaching SLA," the ones that point to things that are wrong. By contrast, there's a notification for every time a scan is added; you'll start to ignore those, I promise.

But I do pay attention when a scan comes in empty, because that indicates there might be an issue. Or SLA breaches, combined into one notification, so I'm not getting a separate one for every single ticket. That's the tip and trick: focus the notifications on things you want to take action on, or that indicate something might be wrong, not the daily status of tickets coming in all the time.

Because otherwise you will create a filter, it will all go to an inbox folder, and you're never going to look at it.

API and CLI Flexibility

API and CLI flexibility. This is less a tip and trick and more of a reminder. With the Pro version, we have a CLI tool called the Universal Importer that allows you to run a command from a script or from the command line to push findings into DefectDojo. It makes things easier because you don't have to write curl commands; APIs are a little more technical. But it's still using the API.

So even if you have to take a couple of extra steps to write curl commands and things like that, being able to automate the scans in your pipeline is, to me, critical. I do not want to import scans manually. I don't want to import scans through the UI. We want to automate this as much as possible.

The API is very robust, and you can use it for everything. We've seen users who don't use the UI at all; they do everything automatically through the API. So just don't be afraid of the API. There's full documentation for it, and AI tools now (Claude Code, ChatGPT, Gemini, all of the LLMs) can help write curl commands and scripts to import things, update things, and so on.
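If writing raw curl feels intimidating, the equivalent in a few lines of Python is not much harder. A minimal sketch (the host and token are hypothetical; the `Token` auth header and the `/api/v2/` paths follow the documented convention):

```python
import urllib.request

def dojo_request(base_url, path, token):
    # Build an authenticated GET against the DefectDojo v2 REST API.
    return urllib.request.Request(
        "%s/api/v2/%s" % (base_url.rstrip("/"), path),
        headers={"Authorization": "Token " + token},
    )

# e.g. list the ten most recent engagements:
req = dojo_request("https://defectdojo.example.com", "engagements/?limit=10", "YOUR_API_KEY")
# urllib.request.urlopen(req) would return paginated JSON: {"count": ..., "results": [...]}
```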

And I wanted to use this as an opportunity to show some augmentation that we have done in the past. Let's see.

Here we go. So here is a script that we wrote and have provided to some users and customers. Let's say you want to use lots and lots of different scanners, right? Just like we were showing in some of these engagements. I'll jump to it here.

Will that give me... no, it won't. Sorry. I just hate having to filter through these to find the engagement I was looking for; I don't spend enough time here in my own instance. Let's go back to a product, it'll be easier to find this way. Apologies. Oh, this is my open source one, that's why I can't find it. It's here. Let's go to the open source product, and let's go to this scan right here. Yeah, perfect. So I imported this scan. Now, in the Pro version, we will update the EPSS automatically.

We'll also update the KEV: we go to first.org, we go to CISA. But you can also see how some of the data in these findings, take this one for example, maybe the mitigation... it's okay, it's not bad, but some tools don't provide a mitigation at all. So one example of something you can do with the API is a little script like this.

You can see I've got description, CVSS, severity, CWEs, so I can update any field from any source and just update the findings. This is a test; the test number is 2749. And you can see here how EPSS is not set, which we use for priority. In the Pro version, this updates every 24 hours or so.

So I'm just going to say 2749, since I need to make some updates to this. So let's update a test: 2749, that's my test ID, right? And I'm going to update the CVSS score, the CWE, the publish date. You know what, let's do this. Nope, don't proceed. I'm going to also add option two to configure these fields, and I want to make sure we also include seven.

So I want to get the mitigation. In fact, let's also do option one; why not just update all of those? I'll leave the severity alone. All right, so we'll continue with that, and now we'll type in: update the findings for 2749. Yep, we're going to update EPSS, we're going to update KEV. I've also written the script so you can use it with either NIST's NVD or the EU's EUVD.

It just goes and gets all that data, and then it updates DefectDojo. You can augment any data in DefectDojo with some simple scripts. I wrote this with Claude; it's very straightforward and gives me errors if something fails. Oh, that's not an error, that's saying the finding was already there. So it's updating all of these on the back end.

And I can automate that. If I hit refresh, you can now see my EPSS showing up, my priority has changed, and known-exploited is showing up. If I go into one of these now, I can see the description has been updated, and maybe the mitigation has been updated. So I can pull from different sources and really augment all of the data, in both the open source and Pro versions.
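A stripped-down sketch of that augmentation pattern: pull EPSS from the public first.org feed, then PATCH it back onto the finding. The first.org endpoint and response shape are real; whether `epss_score` is writable on the finding serializer depends on your DefectDojo version, so treat the PATCH field name as an assumption.

```python
import json
from typing import Optional

EPSS_API = "https://api.first.org/data/v1/epss"  # public FIRST.org feed, queried with ?cve=CVE-...

def epss_score(epss_response, cve):
    # type: (dict, str) -> Optional[float]
    # The feed returns {"data": [{"cve": "CVE-...", "epss": "0.97565", ...}, ...]}.
    for row in epss_response.get("data", []):
        if row.get("cve") == cve:
            return float(row["epss"])
    return None

def epss_patch_body(score):
    # Body for PATCH /api/v2/findings/<id>/ ; field name assumed writable.
    return json.dumps({"epss_score": score}).encode()
```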

Reporting. Oh, this is the biggest tip; this one I love. We sometimes get requests like, "I want to be able to create a view and send that view to someone." So, for example, if I come to this metrics page (we'll use these Insights, but this also works in the open source version),

say I had an executive insight and I did a report for Revenue and Research. I can include child assets, but let's just include all of them. So I've got those two orgs. Notice up at the very top we've got the date range and the product types; product types one and three are the two I chose for those organizations. You can just copy and paste this URL.

If you copy and paste that URL and give it to somebody else, they will come to this page with the filters you selected already applied. So that's another way of sharing a particular view: via a URL. Very straightforward. And then you can give that view to somebody else. The only caveat is they do have to have permission to see that data.

It expects that the recipient has the same permissions, because you will only ever see data that you have permission to see.
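The reason this works is that the filters live entirely in the query string. A toy illustration (the path and parameter names here are made up for the example; copy the real ones from your browser's address bar):

```python
from urllib.parse import urlencode

def shareable_view(base, **filters):
    # Everything the page needs to rebuild the view rides in the URL itself.
    return base + "?" + urlencode(filters, doseq=True)

url = shareable_view(
    "https://defectdojo.example.com/metrics",  # hypothetical path
    start_date="2026-01-01",
    end_date="2026-01-29",
    prod_type=[1, 3],  # the two product types picked in the demo
)
# Anyone with permission who opens `url` lands on the same filtered view.
```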

And that ties back to the metadata I was talking about a little bit before.

AI Tools and MCP Server

Because if you're not using AI tools, I've got news for you: you will be. There's not really going to be any escaping this; it's headed everybody's way. The closest thing I have seen to what's happening with AI was the internet in the early nineties.

There was a point in my career where I realized I would be using the internet for the rest of my life, and AI tools are exactly the same thing. You will reach a point where you're using AI tools every day. The beautiful thing about DefectDojo is that we were built to be ready for this: all that metadata I was talking about earlier matters to LLM analysis.

An LLM can do a lot of pattern matching and risk identification. We did a webinar three or four months ago about our MCP server. Oh, there's my risk and prioritization slide; sorry about that, those were supposed to go up above. You know what, it still counts, because this context applies here too.

All of this information would apply to an LLM, and yes, you can customize it. But here's the thing we talk about: if you were to connect all of your tools directly to an LLM, with all of your duplicates, all of your false positives, all of that stuff, that data is not normalized, right?

You've got data fields that aren't even the same type, and severities that may not be represented the same way. So you're asking the LLM to filter through all of that, normalize it, and do all those things, and that's going to create more hallucination; it may not get everything exactly right.

But using an MCP server with DefectDojo, you're getting the benefits of all the normalization, deduplication, and enrichment that's already been done. So here's one of the things I feel makes DefectDojo different. Yes, we provide an MCP server for the Pro version; it's under our AI settings.

You can enable it and then connect to it. We host it, we run it; it's a little MCP server that will give you deeper analysis. We've built some custom API endpoints for it, so it's more efficient, and things like that. But you can build this for the community version too: there is a community MCP server for DefectDojo out there.

Let me encourage you: you can build an MCP server. These are not hard to build; AI tools know how to build MCP servers. They're very easy to build, run, and connect to. The major difference when using MCP against the open source version (and I've done this, I've built MCP servers and connected them to the community version)

is that in larger environments, the context window is not going to be able to handle "hey, show me all the findings." Fifty thousand findings is going to blow up your context window. So you've got to know what you're doing and how to pull the specific information that you want. It might take a little extra work, but it is possible; you can do it.
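In practice that means never asking the API for everything: page and filter server-side so each MCP tool call returns a context-window-sized slice. A sketch of the query-building half (standard limit/offset pagination on the v2 findings endpoint; exact filter names may vary by version):

```python
from urllib.parse import urlencode

def findings_query(limit=25, offset=0, **filters):
    # Request one small page of server-side-filtered findings, e.g.
    # severity="Critical", active="true", so each LLM tool call stays small.
    params = {"limit": limit, "offset": offset}
    params.update(filters)
    return "findings/?" + urlencode(params)

q = findings_query(severity="Critical", active="true")
# -> "findings/?limit=25&offset=0&severity=Critical&active=true"
```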

We've even submitted a talk to OWASP to run a little pod teaching people how to build MCP servers, using DefectDojo open source as an example. So this isn't a proprietary thing. Yes, we've built one that does really well; we're getting great feedback on it already, and it should be out of beta soon. But it's something you can use right away.

We host it, it's already built, it's already tuned with its own API endpoints. But don't be afraid that you can't do the same thing with the community version, because you can. So that's the big takeaway there.

Conclusion and Q&A

Yeah, that was lots of tips and tricks. We went through a lot of things fast. Questions?