Qraptor’s team members have a wide variety of technology backgrounds. I’d say our team is polyglot from the perspective of the programming languages the members know. Besides our data science know-how, we have a strong background in software engineering. What we have learned in countless corporate software projects is that it’s a mistake to push software engineers into specific technology stacks and frameworks. We chunk up problems into smaller sub-problems and let the developer/data scientist decide which stack solves a particular problem in the best way.
What’s the metric we are using, which is the best technology to solve? It’s simpler what you might think. It’s the just lines of code. For us, the best technology for solving a problem is that one that needs the fewest lines of code. Our problem-solving modules are simple microservices with just a few lines of code, which provide an API or a UI to the outside world. We pack the microservices into docker containers with everything that is needed to run it. The microservices are so dead-simple that sometimes even a developer that is not familiar with the programming languages can maintain it. We throw microservice away in case they are not suitable for a problem any longer. For many things, we use ready-to-use packed containers, e.g. ElasticSearch, MongoDB, Tensorboard, which reduces the setup efforts. Our goal is not to reinvent tools and lose time for configuration; instead, we want to focus on our core tasks.
Right now we use four programming languages, but we think we might use more in future.
While there are many discussions about the pros and cons of Python and R in data science, we think R is superior in the field of data analysis. For analysis and preprocessing, R is our choice. The syntax of base R sometimes feels a bit strange; however, with the libraries from the tidyverse ecosystem, the code is pretty readable. We love the grammar of graphics and data manipulation approach in ggplot2 and dplyr. We use Shiny Apps for creating analysis dashboards with bare minimum lines of codes.
We use Python for modelling. Our core reinforcement learning framework, which uses OpenAI gym and stable_baselines a lot, is developed with Python. We use Python also for several backend services.
We have developed a framework that lets us quickly set up training sessions in the MongoDB, and the framework initializes all the objects that are needed generically. The frameworks persist all artefacts that a training session produces in the database, e.g. the model, training parameters, along with everything that is needed for a training session.
The world hates Java, but we love it. We have many discussions when to use Java and when to use, e.g. Python, but it has always been a discussion on a high standard. No stupid flame wars. We believe that a strongly typed language like Java has many advantages, but that are also use cases when scripting languages are the better option. Sometimes you need even less code in Java than in Python, e.g. with the usage of the Spring Framework. We use Java when multi-threading is essential.
Written on September 9th, 2019 by Jens Laufer