
4 posts tagged with "java"


Semantic search for dynamically built queries in Java and CodeQL

· 7 min read

Recently I faced the challenge of searching for SQL queries in a large codebase. Basic grep, or even IntelliJ search, struggles here, mostly for performance reasons:

  • queries are long and dynamically appended
  • codebase is large
  • string searching is not performant enough.
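To make the first point concrete, here is a minimal sketch of a dynamically appended query. The class, table, and column names are hypothetical and exist only for illustration. The full statement never appears as a single string literal, so a text search for a fragment such as `FROM orders o WHERE` finds nothing in the source file even though the program produces it at runtime.

```java
// Hypothetical toy class: all names and the query itself are made up for illustration.
public class OrderQueryBuilder {

    // Builds the SQL text piece by piece, so the complete statement
    // exists only at runtime, never as one literal in the source.
    public static String buildQuery(boolean onlyActive) {
        StringBuilder sql = new StringBuilder();
        sql.append("SELECT o.id, o.total ");
        sql.append("FROM orders o ");
        if (onlyActive) {
            // Optional clause: whether it is present depends on program state.
            sql.append("WHERE o.status = 'ACTIVE' ");
        }
        sql.append("ORDER BY o.created_at");
        return sql.toString();
    }

    public static void main(String[] args) {
        System.out.println(buildQuery(true));
    }
}
```

A search that understands the code has to follow the `append` calls on the same `StringBuilder` to reassemble the statement, which is something plain string matching cannot do.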

The answer to this task is buried in the early history of static analysis tools. The first tools used basic regexes, but that quickly turned out to be inefficient. Then, incrementally, more focus was put on parsing source files into Abstract Syntax Trees, which allows more freedom in writing queries. Finally, the Data Flow approach was added alongside Taint Analysis, shaping the landscape of security tooling today.

Semantic searching has 2 advantages:

  • searching bare tokens is orders of magnitude faster than searching strings, and in turn searching Abstract Syntax Trees is an order of magnitude faster than searching tokens
  • semantic search offers more precision in designing the queries which only reinforces the first point.

CodeQL is one such tool: it knows the syntax of major languages (including Java) and caters for performant search of large codebases. I decided to have fun with it over the weekend and push it to its limits, as searching for dynamic queries is hard enough. I will show how to set up the project and write some queries for a toy source file.

Let's get started.

Reflections after writing simple Spring Boot library

· 4 min read

Sometimes learning from adversity is better than trying to avoid it. Taking it into careful consideration provides valuable lessons that will support you in the future.

I appreciate my job for one particular thing: it provides a steady stream of difficult problems that challenge my intellect. Recently I tried to wrap my head around the problem of how to write tests for a semi-large Spring Boot codebase (with no tests whatsoever) and then refactor it.

I started from the assumption that when you don't have any legacy tests at hand, you write them first. How can you know you haven't broken functionality without running tests? But the code was very unfriendly, and writing them would require writing mocks.

So I thought - why not automate stuff a little bit:

  • instrument given beans with reflection
  • dump args and results to json
  • load json directly in tests instead of writing mocks in plain Java
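The first two steps can be sketched with a JDK dynamic proxy. This is only an illustration under stated assumptions, not the library's actual code: `PriceService` is a hypothetical stand-in for a Spring bean, `record` is a made-up helper name, and the JSON line is hand-formatted without escaping, where a real setup would more likely use a JSON library.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class RecordingProxy {

    // Hypothetical service interface standing in for a Spring bean.
    public interface PriceService {
        int priceFor(String item, int quantity);
    }

    // Recorded calls, one JSON-like line per invocation (no escaping, sketch only).
    public static final List<String> RECORDED = new ArrayList<>();

    // Wraps the target so every call through the interface is forwarded
    // to the real object and its arguments and result are recorded.
    public static <T> T record(T target, Class<T> iface) {
        InvocationHandler handler = (proxy, method, args) -> {
            Object result = method.invoke(target, args);
            RECORDED.add(String.format(
                "{\"method\":\"%s\",\"args\":\"%s\",\"result\":\"%s\"}",
                method.getName(), Arrays.toString(args), result));
            return result;
        };
        Object proxy = Proxy.newProxyInstance(
            iface.getClassLoader(), new Class<?>[] {iface}, handler);
        return iface.cast(proxy);
    }

    public static void main(String[] args) {
        PriceService real = (item, quantity) -> item.length() * quantity;
        PriceService recorded = record(real, PriceService.class);
        recorded.priceFor("apple", 3);
        RECORDED.forEach(System.out::println);
    }
}
```

A test could later read such recorded lines back and replay the results instead of hand-writing mocks. JDK proxies only work for interfaces, so beans without one would need a different instrumentation mechanism.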

1BRC Challenge

· 12 min read

One thing that recently got me nerding out like hell was the 1 billion row challenge. Citing the original site:

Your mission, should you decide to accept it, is deceptively simple: write a Java program for retrieving temperature measurement values from a text file and calculating the min, mean, and max temperature per weather station. There’s just one caveat: the file has 1,000,000,000 rows!

I worked on it after hours, and one week after taking on the challenge there are several conclusions worth writing about.
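For orientation, the task itself can be sketched as a naive baseline. This is not the solution discussed in the post, just the simplest form of the aggregation, assuming each input line has the shape `station;temperature`. The class and method names are hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class NaiveBaseline {

    // Running aggregate for a single weather station.
    public static final class Stats {
        public double min = Double.POSITIVE_INFINITY;
        public double max = Double.NEGATIVE_INFINITY;
        public double sum = 0;
        public long count = 0;

        void add(double value) {
            min = Math.min(min, value);
            max = Math.max(max, value);
            sum += value;
            count++;
        }

        public double mean() {
            return sum / count;
        }
    }

    // Aggregates lines of the assumed form "station;temperature".
    // A TreeMap keeps the stations sorted by name.
    public static Map<String, Stats> aggregate(Iterable<String> lines) {
        Map<String, Stats> byStation = new TreeMap<>();
        for (String line : lines) {
            int sep = line.indexOf(';');
            String station = line.substring(0, sep);
            double value = Double.parseDouble(line.substring(sep + 1));
            byStation.computeIfAbsent(station, k -> new Stats()).add(value);
        }
        return byStation;
    }

    public static void main(String[] args) {
        List<String> sample = List.of("Hamburg;12.0", "Bulawayo;8.9", "Hamburg;34.2");
        aggregate(sample).forEach((station, s) ->
            System.out.printf("%s=%.1f/%.1f/%.1f%n", station, s.min, s.mean(), s.max));
    }
}
```

Every line here allocates substrings and goes through `Double.parseDouble`, which is fine for a sample but becomes the obvious cost to attack at a billion rows.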