SUSU senior research fellow Aleksei Ruchai, Candidate of Sciences (Physics and Mathematics), and his colleagues are working on tasks from a wide variety of fields: digital banking, cattle breeding, face recognition, and tumour recognition in mammograms. In all these cases, the combination of innovative ideas and skilful use of equipment makes it possible to achieve interesting results.
Gradient boosting versus random forest
What is a transaction? It is a banking operation that transfers money, for example from one account to another.
A computer sees a transaction as data that reflect the history of the flow of funds. Why is this needed? So that you can always file a complaint or request information about a transfer.
Some transactions may look suspicious and attract the attention of tax officers, banks or law-enforcement agencies. Say, if equal amounts are regularly transferred to someone's account, could this indicate illicit trade? Certain amounts may point to drug distribution. The fight against "money laundering" is an important condition for financial stability and security.
The task of artificial intelligence is to learn to detect suspicious and illegal transactions. Unlike humans, computers do not have to put the reason into words. It is enough for them to build a mathematical criterion for deciding whether a transaction is legal or not. To learn this, they use accumulated databases of transactions in which suspicious entries have been labelled in advance. Ideally, such a database is well balanced, meaning that it contains roughly equal numbers of "normal" and "bad" transactions. In real banking, however, the share of suspicious and illegal transactions is quite small. So, when trained on such "real-life" data, the system needs more time to learn, and its results are worse.
To train and test their system, Aleksei Ruchai and his colleagues used the Elliptic database, a collection of bitcoin transactions. This database is not well balanced: it contains about two hundred thousand transactions, of which roughly one tenth are illegal. So, before using it, the scientists first had to balance the database.
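The article does not say how the database was balanced. One common option is random undersampling: keep every example of the rare class and an equally sized random subset of the common class. The sketch below assumes each transaction is a dict with a hypothetical boolean field `illicit`; the field name, the helper name and the toy data are illustrative only, not the Elliptic schema or the team's procedure.

```python
import random


def undersample(rows, label_key="illicit", seed=0):
    """Balance a labelled dataset by random undersampling of the majority class.

    Hypothetical helper: `rows` is a list of dicts and `label_key` names a boolean
    field (True = illicit, False = licit).
    """
    rng = random.Random(seed)
    positives = [r for r in rows if r[label_key]]
    negatives = [r for r in rows if not r[label_key]]
    minority, majority = sorted([positives, negatives], key=len)
    # Keep all minority examples plus an equally sized random sample of the majority.
    balanced = minority + rng.sample(majority, len(minority))
    rng.shuffle(balanced)
    return balanced


# Toy data: one transaction in ten is illicit.
data = [{"id": i, "illicit": i % 10 == 0} for i in range(1000)]
balanced = undersample(data)
print(len(balanced), sum(r["illicit"] for r in balanced))  # 200 100
```

Undersampling throws data away; the alternative is oversampling the rare class, which keeps all data but repeats (or synthesizes) rare examples.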
The Elliptic database is well known, so naturally it has attracted the interest of other specialists as well. They tested various machine learning methods with funny names: "random forest", "logistic regression", "naive Bayes classifier" and "multilayer perceptron". Aleksei Ruchai and his colleagues managed to surpass all previous results using gradient boosting.
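Boosting (literally "enhancement") helps improve the accuracy of predictions. Boosting builds a model step by step. At each step, a new simple model (usually a small decision tree) is trained on the mistakes that the previous steps left uncorrected, and its output is added to the existing ensemble. In this way each step fixes part of the remaining error while keeping what has already been learned, and the overall accuracy gradually improves.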
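To make the step-by-step idea concrete, here is a toy sketch of gradient boosting in plain Python for a numeric score with squared-error loss, using one-split decision trees ("stumps") as the simple models. The function names, the loss, the learning rate and the toy data are assumptions chosen for illustration; this is not the team's code, and production libraries such as XGBoost add deeper trees, classification losses and regularization.

```python
def fit_stump(xs, targets):
    """Fit a one-split regression tree ("stump") to targets by least squares."""
    best = None
    for t in sorted(set(xs)):
        left = [y for x, y in zip(xs, targets) if x <= t]
        right = [y for x, y in zip(xs, targets) if x > t]
        if not right:  # a split must leave points on both sides
            continue
        lv, rv = sum(left) / len(left), sum(right) / len(right)
        err = sum((y - lv) ** 2 for y in left) + sum((y - rv) ** 2 for y in right)
        if best is None or err < best[0]:
            best = (err, t, lv, rv)
    if best is None:  # all x values equal: fall back to a constant
        mean = sum(targets) / len(targets)
        return lambda x: mean
    _, t, lv, rv = best
    return lambda x: lv if x <= t else rv


def gradient_boost(xs, ys, n_steps=50, learning_rate=0.3):
    """Toy gradient boosting for squared loss with stumps as weak learners."""
    base = sum(ys) / len(ys)  # step 0: predict the mean for every example
    preds = [base] * len(xs)
    stumps = []
    for _ in range(n_steps):
        # For squared loss, the negative gradient equals the residual y - prediction,
        # i.e. exactly the mistakes the current ensemble still makes.
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        # Add a damped correction on top of the earlier steps instead of replacing them.
        preds = [p + learning_rate * stump(x) for p, x in zip(preds, xs)]

    def model(x):
        return base + sum(learning_rate * s(x) for s in stumps)

    return model


xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [0, 0, 0, 0, 1, 1, 1, 1]  # toy labels: 1 = "suspicious"
model = gradient_boost(xs, ys)
print([round(model(x), 3) for x in xs])
```

The small learning rate means each stump corrects only part of the remaining error, so no single step can wreck what earlier steps learned.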
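The term "gradient" comes from mathematical analysis. Gradient boosting resembles "gradient descent", a method that searches for the minimum of a function by repeatedly stepping in the direction in which the function decreases fastest, that is, against its gradient. In boosting, the function being minimized is the model's error. After all, we want to make as few mistakes as possible while detecting as many suspicious transactions as possible, right?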
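A minimal illustration of plain gradient descent on a one-variable function, assuming the simple example f(x) = (x - 3)^2, whose derivative is 2(x - 3). The function, starting point and step size are hypothetical choices for illustration.

```python
def gradient_descent(grad, x0, learning_rate=0.1, n_steps=100):
    """Repeatedly step against the gradient to approach a minimum."""
    x = x0
    for _ in range(n_steps):
        x -= learning_rate * grad(x)  # move "downhill"
    return x


# f(x) = (x - 3)**2 has its minimum at x = 3; its derivative is 2 * (x - 3).
x_min = gradient_descent(lambda x: 2 * (x - 3), x0=0.0)
print(round(x_min, 4))  # 3.0
```

Here each step shrinks the distance to the minimum by a factor of 0.8, so after 100 steps the result is 3 to within about 10^-9.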
As a result, on the Elliptic database the XGBClassifier algorithm achieved an accuracy (share of correct answers) of 0.9921, while the previous published result was only 0.9780. The difference between these two figures is critical: the previous result did not meet the reliability requirements for detecting abnormal transactions, and the new one does. So, our scientists have won!
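Why a gain in the third decimal place matters is easier to see through the error rate, the share of wrong answers. The sketch below computes accuracy for toy labels and compares the two published figures; the helper name is hypothetical, and the only real numbers used are the 0.9780 and 0.9921 quoted above.

```python
def accuracy(y_true, y_pred):
    """Share of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


print(accuracy([1, 0, 1, 1], [1, 0, 0, 1]))  # 0.75

previous, new = 0.9780, 0.9921
prev_err, new_err = 1 - previous, 1 - new
print(f"errors per 10,000 transactions: {prev_err * 10_000:.0f} -> {new_err * 10_000:.0f}")
print(f"error rate reduced about {prev_err / new_err:.1f} times")
```

Going from 220 to 79 mistakes per 10,000 transactions means the error rate fell by almost a factor of three. Note that on unbalanced data accuracy alone can look high even for a trivial model: if 90 % of transactions are legal, always answering "legal" already scores 0.9.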
Thirty-three cows: how many kilos?
Livestock must be weighed from time to time. This is known to be a stressful procedure for cows, and they lose 5–10 % of their body weight as a result. And it is not because of what they see on the scale display, as fashionistas on a diet might. Any unfamiliar procedure or close contact with a stranger can simply frighten the animals.
Cattle are not that easy to frighten, though. Things are much more complicated with farm pigs. Specialists say that if a pregnant or lactating sow is taken to be weighed, it can have a heart attack and die suddenly.
That is why all research studies require careful prior approval, and experiments must be approved from the standpoint of bioethics. Quarantine measures must also be observed. No animal may be harmed during the measurements!
SUSU senior research fellow Aleksei Ruchai, Candidate of Sciences (Physics and Mathematics), also participates in a project on estimating the body weight of livestock from their size and appearance, that is, from their conformation points or morphological characteristics. The work is supported by a grant from the Russian Science Foundation and is being carried out jointly with the Federal Research Centre for Biological Systems and Agricultural Technologies of the Russian Academy of Sciences in Orenburg. The research has been under way since 2017.
Taking tape measurements stresses cows as much as trying to put them on scales. That is why mathematics, computer vision and artificial intelligence come to the rescue.
Computer vision means observing with a digital camera and analysing the images. To assess a pig's profile, you need either to immobilize the animal or to let the camera find the right angle by itself. Estimating the parameters from a video of a moving animal is difficult, because the trajectory of its movement has to be taken into account. It also turned out that animals are afraid of cameras, especially if these move or make whirring sounds. As a result, animals can only be filmed from overhead. Cameras mounted in this position also stay much cleaner.
"Every year, we add images of 200-300 animals, both cows and pigs, to our database," says Aleksei Ruchai. "Creating such a database is labour-intensive. According to our preliminary calculations, estimating body weight from an overhead image of an animal can give satisfactory results, though side-view images would, of course, give more accurate measurements."
Overhead images also carry additional information that can be used to estimate both milk and meat yields.
The body weight of animals depends on the season, so it is important to take temperature and other environmental parameters into account. The model's error in estimating weight is expected to be 5–10 %, which is acceptable for the agricultural sector. The model can make a mistake if it receives an image of an animal from a different weight category, or one photographed on a rubber mat instead of a concrete floor. Such problems are common to all neural networks: they work worse on data that differ from what they were trained on.
"The geometry and conformation points of an animal certainly help determine its body weight," explains Aleksei Ruchai. "But our goal is more ambitious: we also want to link an animal's genetic parameters to its morphological characteristics. We carry out genetic analysis, make a selection and find out which genes correlate with height at the withers and at the hips, chest depth and width, and so on. We have already managed to detect several such genes."
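A 5–10 % error is a relative figure: for a 500 kg cow it corresponds to 25–50 kg. Assuming the error is measured as the mean absolute percentage error (the article does not name the metric), it can be computed as below; the scale readings and model estimates are hypothetical numbers for illustration only.

```python
def mean_absolute_percentage_error(true_kg, predicted_kg):
    """Average relative error, in percent, of predicted vs. measured weights."""
    return 100 * sum(abs(p - t) / t for t, p in zip(true_kg, predicted_kg)) / len(true_kg)


measured = [500.0, 620.0, 450.0]   # hypothetical scale readings, kg
estimated = [530.0, 590.0, 468.0]  # hypothetical model estimates, kg
print(f"{mean_absolute_percentage_error(measured, estimated):.1f} %")  # 4.9 %
```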
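The article does not describe the statistical method used to link genes to body measurements. To show only what "correlate" means, here is the Pearson correlation coefficient between a hypothetical genotype (number of copies, 0, 1 or 2, of one gene variant per animal) and the same animals' height at the withers. The data and names are invented for illustration; real genetic association studies use more elaborate models.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Hypothetical data for eight animals.
genotype = [0, 1, 2, 1, 0, 2, 1, 2]
height_cm = [128, 133, 138, 131, 127, 140, 134, 137]
print(round(pearson(genotype, height_cm), 2))
```

A value close to +1 means animals carrying more copies of the variant tend to be taller; a value near 0 means no linear link.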
Two and a half dimensions
Another field Aleksei Ruchai works in is face recognition. Here the team uses an interesting idea: 2.5D (literally "two-and-a-half-dimensional") images.
Face identification systems are part of our everyday reality now. Normally they work with two-dimensional images, and do so efficiently. But naturally they sometimes make mistakes. If a person's face is too far from the camera or too close to it, or if the angle is wrong, the neural network can give a false result. We could make people stand properly in front of the camera by introducing rules and standards, or we could instead work on improving the technology itself.
A flat image consists of pixels, each described by the intensities of its red, green and blue components. Every computer user knows the abbreviation RGB. In a 2.5-dimensional image, RGB is supplemented with D, the depth: the distance between the camera and the object's surface (a face, in our case).
There are three ways to measure depth. The first is simple but unreliable. It is called photometry: two flat images of the same scene, taken from points a certain distance apart, are compared. The second method uses a laser. The third uses infrared illumination: the object's geometry is determined from the "distortion" of a projected infrared grid.
Neural networks that have already learned to recognize standard two-dimensional images are used as the starting point. Training them requires a huge number of images and supercomputer-scale processing, which only big corporations can afford. One well-known database of this kind is ImageNet, and Google is among the corporations that have trained large networks on it. Based on such networks, the mathematicians from Chelyabinsk have created a more advanced artificial intelligence that also takes image depth into account.
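A minimal sketch of the RGB-D idea: each pixel keeps its three colour intensities and gains a fourth value, the depth. It assumes 8-bit colour (0-255) and depth in metres; the class name, function name and sample values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RGBDPixel:
    r: int          # red intensity, 0-255
    g: int          # green intensity, 0-255
    b: int          # blue intensity, 0-255
    depth_m: float  # distance from the camera to the surface, metres


def to_rgbd(rgb_image, depth_map):
    """Combine an RGB image (rows of (r, g, b) tuples) with a same-sized depth map."""
    if len(rgb_image) != len(depth_map) or any(
        len(a) != len(b) for a, b in zip(rgb_image, depth_map)
    ):
        raise ValueError("RGB image and depth map must have the same size")
    return [
        [RGBDPixel(*rgb, d) for rgb, d in zip(row_rgb, row_d)]
        for row_rgb, row_d in zip(rgb_image, depth_map)
    ]


rgb = [[(200, 180, 170), (198, 175, 168)],
       [(120, 100, 90), (118, 99, 92)]]
depth = [[0.61, 0.60],
         [0.75, 0.74]]  # hypothetical depth values, metres
image = to_rgbd(rgb, depth)
print(image[0][1])
```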
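The first method, comparing two images taken from points a certain distance apart, is the stereo principle. Under the standard assumptions of two identical, parallel pinhole cameras, the depth of a point is Z = f · B / d, where f is the focal length in pixels, B the distance between the cameras and d the disparity (how far the point shifts between the two images). The function name and the numbers below are illustrative assumptions.

```python
def depth_from_disparity(focal_px: float, baseline_m: float, disparity_px: float) -> float:
    """Distance Z to a point seen by two parallel cameras: Z = f * B / d.

    focal_px     -- focal length of the cameras, in pixels
    baseline_m   -- distance between the two cameras, in metres
    disparity_px -- horizontal shift of the point between the two images, in pixels
    """
    if disparity_px <= 0:
        raise ValueError("disparity must be positive (the point must be matched in both images)")
    return focal_px * baseline_m / disparity_px


print(depth_from_disparity(700, 0.1, 35))  # 2.0 metres
print(depth_from_disparity(700, 0.1, 34))  # about 2.06 metres
```

The second call shows one reason the method is unreliable: mismatching a point by a single pixel shifts the estimated depth by about 6 cm at a distance of 2 m.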
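The article does not say how depth was added to the pretrained networks. One common heuristic for reusing a network pretrained on RGB images with a fourth input channel is to keep its first-layer filters and initialise the new depth weights as the average of the three colour weights. The sketch shows that idea on plain nested lists; the data layout and function name are hypothetical, and in practice this is done on tensors in a deep learning framework.

```python
def add_depth_channel(first_layer_filters):
    """Extend pretrained RGB convolution filters with a fourth (depth) input channel.

    `first_layer_filters` is a list of filters; each filter is a list of three
    channel kernels (R, G, B), and each kernel is a 2D list of weights. The new
    depth kernel is initialised as the element-wise average of the colour kernels.
    """
    extended = []
    for r_k, g_k, b_k in first_layer_filters:
        depth_k = [
            [(r + g + b) / 3 for r, g, b in zip(r_row, g_row, b_row)]
            for r_row, g_row, b_row in zip(r_k, g_k, b_k)
        ]
        extended.append([r_k, g_k, b_k, depth_k])
    return extended


# One hypothetical pretrained filter with 2x2 kernels.
pretrained = [[[[0.3, 0.0], [0.6, 0.3]],    # R
               [[0.6, 0.3], [0.0, 0.3]],    # G
               [[0.0, 0.6], [0.3, 0.3]]]]   # B
rgbd_filters = add_depth_channel(pretrained)
print(len(rgbd_filters[0]))  # 4 channels: R, G, B, D
```

Starting from averaged weights lets the network begin fine-tuning with sensible depth filters instead of random ones, while the colour filters keep everything learned from 2D images.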
Beware! Cyber threat!
A new project that Aleksei Ruchai participates in is called "Intelligent Methods of Ensuring Cyber Security of Industrial Networks of Computer-Aided Process Control Systems at Enterprises". The project is supervised by SUSU senior research fellow Konstantin Kostromitin.
In the Ural region, such systems are used first and foremost at metallurgical enterprises. High temperatures, aggressive acid fumes and other hazards to workers, as well as to residents of a city that could become the epicentre of an environmental disaster, make automation a matter of saving lives.
Data from cameras and sensors are fed to ordinary computers connected to each other via protocols such as TCP/IP. What if a virus disrupts such a system, or an ill-intentioned hacker decides to create a local apocalypse in the city?
Of course, there are already many technical means of preventing cyber attacks. However, they resemble "plugging holes" one by one, while an iron and steel works can have a great many units that need protection.
Neural networks trained on many examples of cyber attacks could monitor the overall state of the system and detect suspicious conditions long before they cause trouble. Recall the "marginal" bank transactions described earlier. Artificial intelligence will help set reference points efficiently and monitor them, raising the alarm not just in time but in advance.
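The idea of reference points can be illustrated with a much simpler technique than a neural network: learn the mean and standard deviation of a sensor reading during normal operation, then flag readings that deviate by more than a chosen number of standard deviations (a z-score rule). This is a swapped-in stand-in, not the project's method; the function names, threshold and furnace temperatures are hypothetical.

```python
import statistics


def fit_reference(normal_readings):
    """Learn a reference point (mean) and the typical spread (standard deviation)."""
    return statistics.mean(normal_readings), statistics.stdev(normal_readings)


def is_suspicious(value, reference, threshold=3.0):
    """Flag a reading more than `threshold` standard deviations from the mean."""
    mean, std = reference
    return abs(value - mean) > threshold * std


# Hypothetical furnace temperatures (degrees C) recorded during normal operation.
normal = [1500, 1510, 1495, 1505, 1490, 1500, 1508, 1492]
ref = fit_reference(normal)
print(is_suspicious(1503, ref))  # False
print(is_suspicious(1560, ref))  # True
```

A neural network plays the same role for the whole system at once: it learns what the joint "normal state" of many sensors looks like and raises the alarm when the current state drifts away from it.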