Big data, machine learning, natural language processing, data science. Concepts shrouded in mystery that become annoying buzzwords before most people can understand them. Unless… The head of our Big Data team, Hynek Jína, reveals their secrets and explains their role in business development.
Words and photo by Jan Strmiska
There's a mysterious data department at Creative Dock that you're the head of. How would you describe what you actually do?
Our field is a living organism that consists of four parts: Risk (probabilistic decision making), Data Science (magic), Reporting (images) and CRM (understanding the customer). The first area is risk analysis, which means decision-making processes: when you want to assess risk or perhaps evaluate a loan. We do this on a lot of our projects.
Are we talking about analyzing a particular person's data to see if, for example, a bank should give them a loan?
Yes, but it's not just analyzing, it's also finding that data. We try to find out something else about that person that companies wouldn't normally use.
So you're on Facebook all day.
Well, basically yes. (laughs) When you have your own business and you apply for a loan through our P2P project, Nafirmy.cz, we look at all the traditional registers – if you have any debts or anything like that. But at Creative Dock, we also deliver other information. We call it alternative scoring. We simply look at other information about your company. If it has a Facebook page, if you just started it last week, how many followers there are, what your website looks like. That's kind of the, say, first layer. But then we can go on and explore how active you are, what you're saying in general. There might be user reviews and other things.
Well, yes, I know that, I do this as a journalist.
But we do it automatically. Basically, anything you would do more than once, we automate. We try to generalize these things.
But the world is so complicated!
A lot of actions seem like you do them differently each time, but if you break them down into smaller tasks, you can find the same patterns. So we're looking for more general things and smaller tasks that are somehow graspable. And they become these kinds of packages that you can put together.
"Machine learning can be super-sophisticated. You can train algorithms on data. But then it might also turn into a black box where you don't even know why it gave that particular result."
Does automation always pay off?
Definitely not always. If you only want to look at one company, it's easier to do it "manually". See if they have a Facebook page and what's on there. But then you don't have a comparison to the whole. You might want to find differences or points of convergence to somehow relate it to other companies. And part of our product is that it doesn't just download information for you, it interprets it.
Is that how robots know more about me than my family does?
We can decide it's cool that you have a Facebook page, and it's cool that you have this many views. But we can also automatically recognize when people berate you in the comments.
But now we're talking about evaluating millions of people.
Let’s imagine a model situation. You have a business and you want to take out a loan. And we want to know what kind of company you are. If there are reviews about you somewhere, that's great, they're easy to interpret. One star is bad, five is good. When someone writes a comment, you as a person can see if they're berating you or praising you. But what if there are hundreds of comments? Even if we're only talking about one company, rating it manually is already quite a tedious task. Plus, it's constantly changing over time.
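To make the idea of reading hundreds of comments automatically more concrete, here is a minimal sketch of the simplest possible approach: counting praise and criticism words from a small hand-made word list. The word lists and function names are assumptions made for illustration, not Creative Dock's method; a production system would more likely use a trained language model.

```python
import re

# Hypothetical, hand-made word lists; a real system would use a trained model.
POSITIVE = {"great", "good", "excellent", "friendly", "fast", "recommend"}
NEGATIVE = {"bad", "terrible", "slow", "rude", "scam", "awful"}


def comment_polarity(text: str) -> int:
    """Return positive-word count minus negative-word count for one comment."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)


def summarize(comments: list[str]) -> dict[str, int]:
    """Count how many comments read as praise, criticism, or neither."""
    summary = {"praise": 0, "criticism": 0, "neutral": 0}
    for comment in comments:
        polarity = comment_polarity(comment)
        if polarity > 0:
            summary["praise"] += 1
        elif polarity < 0:
            summary["criticism"] += 1
        else:
            summary["neutral"] += 1
    return summary


if __name__ == "__main__":
    print(summarize([
        "Great service, fast delivery!",
        "Terrible, rude staff.",
        "Ordered on Monday.",
    ]))
```

Because the summary is recomputed from whatever comments exist at the time, rerunning it periodically also covers the point that ratings change over time.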
And then machine learning comes in, as an even higher stage?
Yes, and it can be super-sophisticated. You can train algorithms on data. But then it might also become a black box where you don't even know exactly why it gave that particular result. You don't want that when assessing risk, because with a bank, you’ll need to know why it's that way. On the other hand, with Google Translate, for instance, you don't need to know why it gave you that particular sentence. You don't analyze it. It’s the result and nobody knows why. And if you don't like it, you feed it different data so that it learns differently. But here at Creative Dock, we want to understand it, too. Because then when somebody says, "Hey, why did you give me a bad credit score when I'm actually great?", we can tell them, "But you're bad at this specific thing."
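One way to picture a score that can always answer "why" is a plain weighted sum that reports each signal's contribution alongside the total. The sketch below is illustrative only: the signal names echo the alternative-scoring examples mentioned earlier in the interview, but the weights, caps, and function name are assumptions, not the model Creative Dock uses.

```python
# Hypothetical weights, chosen only for illustration.
WEIGHTS = {
    "has_facebook_page": 10.0,    # flat bonus for having a page at all
    "page_age_years": 5.0,        # per year, capped at 5 years
    "followers_thousands": 2.0,   # per 1,000 followers, capped at 10
    "avg_review_stars": 8.0,      # per star above or below a neutral 3
}


def score_company(signals: dict) -> tuple[float, list[tuple[str, float]]]:
    """Return the total score and the per-signal contributions, worst first."""
    contributions = {
        "has_facebook_page": WEIGHTS["has_facebook_page"]
        * (1.0 if signals["has_facebook_page"] else 0.0),
        "page_age_years": WEIGHTS["page_age_years"]
        * min(signals["page_age_years"], 5.0),
        "followers_thousands": WEIGHTS["followers_thousands"]
        * min(signals["followers_thousands"], 10.0),
        "avg_review_stars": WEIGHTS["avg_review_stars"]
        * (signals["avg_review_stars"] - 3.0),
    }
    total = sum(contributions.values())
    # Sorting by contribution puts the signal that hurt the score most first.
    reasons = sorted(contributions.items(), key=lambda item: item[1])
    return total, reasons


if __name__ == "__main__":
    total, reasons = score_company({
        "has_facebook_page": True,
        "page_age_years": 2.0,
        "followers_thousands": 3.0,
        "avg_review_stars": 2.0,
    })
    print(total, reasons[0])
```

The first entry in `reasons` is the "specific thing" the company is bad at, which is exactly the answer a black-box model cannot give directly.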
So when Facebook bans me, their people might not even know why they did it?
That's possible. But if you complain, they can look at it and analyze it retroactively.
User-wise, machine learning is already affecting everyone these days?
Absolutely. I was amused when I recently gave the German translator DeepL a quick try; it is now beating Google Translate, not in terms of usage but in translation quality. A colleague told me that he put some of his Czech text in there, translated it into English, liked it, then translated it from English back into Czech and thought the result was better Czech than the original. (laughs)
How accessible are these technologies like machine learning for small businesses?
It depends. In Denmark, we worked on a project with Eat Grim, a company that grew up from pretty much nothing; they did everything by hand for a couple of months. The principle is selling crooked fruit and vegetables that don't match the big market's idea of what a cucumber or an apple should look like. When you want a machine to spot a "wrong" banana or vegetable, you can use neural networks. For example, you've got a conveyor belt that the produce runs on, a camera, and software that compares: Is it the right color? Shape? Is it ripe?
"Data science is the sexiest stuff."
So after risk and automation, what’s next in business?
Data Science, which we mentioned a while ago. That's the sexiest stuff to write in presentations afterwards. (laughs) For example, we did the Crash project. For an insurance company, we were identifying parts of a damaged car from pictures and estimating the amount of damage. Or a car insurance project assessing the quality of driving: based on position sensors in mobile phones, we were able to find out if the driver was driving safely.
But this area also includes more common things like the recommendation engine. You already know it from Netflix or YouTube, which will offer you the next thing to watch based on your behavior.
That's a real concern for us right now on our Albert app, for example. Based on what you buy and your preferences, we can recommend the most suitable products and recipes. But again, there are many layers. Even in such a big and successful project, the recommendation engine is basically made with the simplest logic. And the client is happy that they are able to understand it, that they know what results it gives and why. They don't want any magic. We've already come up with about three levels of how it could be made more robust and more accurate to specific metrics, but for now, the way it works is just fine.
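For a sense of what "the simplest logic" can look like in a recommendation engine, here is a minimal sketch that recommends products most often bought together with things the customer already buys. This is an assumption about one simple, explainable approach, not a description of the Albert app; the function names and sample baskets are made up.

```python
from collections import Counter
from itertools import combinations


def build_co_purchase_counts(baskets: list[list[str]]) -> dict[str, Counter]:
    """Count how often each pair of products appears in the same basket."""
    counts: dict[str, Counter] = {}
    for basket in baskets:
        for a, b in combinations(sorted(set(basket)), 2):
            counts.setdefault(a, Counter())[b] += 1
            counts.setdefault(b, Counter())[a] += 1
    return counts


def recommend(history: list[str], counts: dict[str, Counter], top_n: int = 3) -> list[str]:
    """Suggest products bought most often alongside items in the history."""
    owned = set(history)
    scores: Counter = Counter()
    for item in owned:
        for other, n in counts.get(item, Counter()).items():
            if other not in owned:
                scores[other] += n
    # Highest count first; ties broken alphabetically so results are stable.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:top_n]]


if __name__ == "__main__":
    baskets = [
        ["pasta", "tomato sauce", "basil"],
        ["pasta", "tomato sauce"],
        ["pasta", "cheese"],
        ["bread", "cheese"],
    ]
    counts = build_co_purchase_counts(baskets)
    print(recommend(["pasta"], counts, top_n=2))
```

Every recommendation here can be explained in one sentence ("customers who bought pasta bought this most often"), which is the kind of transparency the client in the interview values.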
Can you give me some more examples of how you used machine learning?
For one client, we were determining if the roof of a given house was suitable for solar panels. Or what the payback would be, how much you would need to put in, et cetera. The idea was that you put in your address, we'll pull up pictures from Google Maps, the land registry or a few other databases, and say: We're estimating such and such potential, which means you need an investment of maybe 20,000 euros to get a return in 7 years and blah blah blah. And all without anyone having to go see the house.
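The payback figure in such an estimate comes down to simple arithmetic once the roof's potential is known. The sketch below assumes the simplest possible model (a constant yearly saving, no discounting, no panel degradation); the roof area, yield, and electricity price are invented for illustration and do not come from the project.

```python
def annual_savings_eur(usable_roof_m2: float,
                       yield_kwh_per_m2: float,
                       price_eur_per_kwh: float) -> float:
    """Estimated yearly savings: usable area x yearly yield per m2 x price."""
    return usable_roof_m2 * yield_kwh_per_m2 * price_eur_per_kwh


def simple_payback_years(investment_eur: float, savings_per_year_eur: float) -> float:
    """Years until cumulative savings equal the up-front investment."""
    if savings_per_year_eur <= 0:
        raise ValueError("savings_per_year_eur must be positive")
    return investment_eur / savings_per_year_eur


if __name__ == "__main__":
    # All inputs are hypothetical example values.
    savings = annual_savings_eur(usable_roof_m2=50,
                                 yield_kwh_per_m2=200,
                                 price_eur_per_kwh=0.25)
    print(savings, simple_payback_years(20_000, savings))
```

In the project described, the hard part would be the inputs: deriving usable roof area and orientation from map imagery and registry data, not the division at the end.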
But it ended in the prototype stage. That's the tough part of the business. Often you make a move but it doesn't bring in as much as you expected, so you hit pause. Sometimes, after a while, it gets picked up again, and sometimes it doesn't.
Is there a cheerier example?
We've done something similar on the Refinanso project. Say you want to refinance your mortgage; you tell us what kind of property you have, and we say we're 80% sure you can do it. And that the price of your property is in the range of, say, 250 to 300 thousand euros. We do machine analysis of various maps, land registry and price maps, but also of the price level of listings for similar properties in the area. There are a lot of aspects to assess. And as a result, the bank doesn't have to send an agent out there to price the property.
"The market today is so fast that you're normally going to do things that nobody in the world has done or you're not able to find anyway."
So we could say that the advantage of a larger company like Creative Dock is a library of some ready-made solutions?
It's two opposing forces. Every company is always looking for its optimal size. The more experience you have, and the better you document it, the more your know-how grows. The whole evolution can be faster. That's where we already have a pretty good advantage at Creative Dock. We've already tried a lot of things.
But the market today is so fast that you're normally going to do things that nobody in the world has done or you're not able to find anyway. Because the technology today is completely different than it was, say, two years ago. Take for example, the