From astrophysics to the automotive industry – how does that work?
I earned my PhD in the field of Galactic Dynamics and Survey Mining (Galactic Archeology). In particular, we worked on bringing together the world of theoretical physics and simulations with the world of astronomical observations in order to produce statistical analyses. Even then, I had already discovered Python and especially the Jupyter Notebook (then still the IPython Notebook) as a data science tool. Both still play a pivotal role for me and for the data science world. After finishing my dissertation, I spent a year at Oxford as a postdoc. When I realized that the analytical methods themselves intrigued me more than the results did, I made the transition to the world of business.
And where did your journey take you?
First I worked on introducing and establishing data science and Big Data approaches for various OEMs. There was a mountain of data, but a lot of fundamental work had to be done before it could be analyzed. A big issue was the automotive-specific binary data formats such as MDF4 or ADTF. I spent a lot of time marrying that data with the Hadoop ecosystem, for example HDFS or Spark. Ever since then, I’ve felt at home in the automotive world.
Why did you choose Valtech Mobility?
I was very enthusiastic about the team. The data science world is rather small and everybody knows everybody. And the team that was assembled here is really excellent. We cover various disciplines, ranging from embedded programmers through computer science and machine learning specialists to the more “classic” data scientists who view everything through their data lens. In general you could say that our core areas are data science, machine learning, data engineering and data operations (aka DevOps). We develop data products and end-to-end data pipelines.
What does your vision look like?
At the moment we are working on a project for a big German aerospace company, but our real focus is on autonomous driving. We don’t want to just follow the herd; we want to set milestones. In fact, we are tinkering with our own prototypes as we speak, with cameras, LiDAR and a Jetson TX2 for GPU support. Our first milestone, however, will be coordinated, Hadoop-compatible data acquisition. Here we are working on combining ROS (Robot Operating System) with Hadoop (more on this in our GitHub repos).
You said you were more of a “classic” data scientist. What do you mean by that?
Personally, I find it very satisfying to analyze a data set, i.e. to figure out its internal correlations and connections, but also to find the (always present) inconsistencies and remove them. Of course, the given objective is often an ML application, but data cleansing and feature engineering are still a must, even in times of deep learning.
Which tools do you use?
For prototyping and in DevOps mode, the best option for me is, hands down, the mix of Jupyter, Python and Apache Spark, since they allow you to combine data preparation, analysis, visualization and documentation. And now that Spark has introduced the DataFrame API, you have to make almost no compromises on performance with PySpark. And Keras gives you a really great API for TensorFlow. But, of course, you still have to orient yourself towards the customer’s IT structure and how their production is planned. You have to be flexible in this, otherwise the migration effort just gets way too big.
It is said that the technological prerequisites for autonomous driving are already available. Why has this not yet arrived in our everyday lives and on our roads?
Legal and moral concerns are often raised. The classic example is the dilemma of who an autonomous car would run over if it had to decide: the old lady or the school kid? The premise here is that the vehicle is controlled by a set of rules: if A then B. But that is not possible. The modern approach with neural networks works completely differently and does not make room for such fine adjustments. Legislators have to deal with the liability issues. Here, everybody is on the same page and moving in the same direction. Technologically, however, not everything is in place just yet. I think the Pareto principle applies in this case: the remaining 20% of the result needs 80% of the work. Just cruising in traffic does not pose a problem, but handling all the exceptional situations, or at least anticipating them in time, is the main problem, I think. But we’re working on it.