Semantic search for dynamically built queries in Java and CodeQL

7 min read

I recently faced the challenge of searching for SQL queries in a large codebase. Basic grep, or even IntelliJ search, runs into performance problems here:

  • the queries are long and built dynamically by appending fragments
  • the codebase is large
  • plain string searching is not performant enough.
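To make the first point concrete, here is a minimal sketch of a dynamically appended query. The class name `DynamicQuery`, the `users` table and its columns are hypothetical and exist only for illustration; they are not taken from any real codebase.

```java
// Hypothetical example: class, table and column names are made up for illustration.
final class DynamicQuery {

    // Builds the SQL text piece by piece, so the complete statement
    // never appears as a single string literal in the source file.
    static String build(boolean onlyActive, String sortColumn) {
        StringBuilder sql = new StringBuilder("SELECT id, name ");
        sql.append("FROM users");
        if (onlyActive) {
            sql.append(" WHERE active = 1");
        }
        if (sortColumn != null) {
            // Illustration only: real code should check sortColumn against an allow-list.
            sql.append(" ORDER BY ").append(sortColumn);
        }
        return sql.toString();
    }
}
```

At runtime `DynamicQuery.build(true, "name")` yields one complete statement, but a text search of the source for `FROM users WHERE` finds nothing, because those words sit in separate literals.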

The answer to this task lies in the early history of static analysis tools. The first tools used basic regexes, but that quickly turned out to be inefficient. Over time, more effort went into parsing source files into Abstract Syntax Trees, which allow more freedom in writing queries. Finally, the data flow approach was added alongside taint analysis, shaping the security tooling landscape we have today.

Semantic searching has two advantages:

  • searching bare tokens is orders of magnitude faster than searching strings, and searching Abstract Syntax Trees is in turn an order of magnitude faster than searching tokens
  • semantic search allows more precision in designing the queries, which only reinforces the first point.

CodeQL is one such tool: it understands the syntax of major languages, including Java, and is built for performant search of large codebases. I decided to have fun with it over the weekend and push it to its limits, since searching for dynamic queries is a hard enough problem. I will show how to set up the project and write some queries against a toy source file.

Let's get started.