Scientific supervisor, Doctor of Physical and Mathematical Sciences Ihnatenko Oleksii
Game theory
Learning in games. Axelrod tournaments
Project domain and motivation. Machine learning and game theory naturally intersect in developing strategies for iterated multiplayer games. Neural networks have given this research new momentum. The question of which strategy is best remains open.
Project description. An Axelrod tournament is a computer tournament in which programs play the repeated prisoner’s dilemma against each other. The prisoner’s dilemma is a crucial model for understanding conflict and cooperation over the long term. Currently, strategies based on finite automata and recurrent neural networks (RNNs) compete for first place, but the result strongly depends on the number and types of participants (see https://arxiv.org/abs/2001.05911 for a recent overview). The idea is to try new approaches that reach the state of the art (SOTA) and to propose a new, better algorithm.
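To make the tournament format concrete, here is a minimal round-robin sketch in plain Python. It assumes the classic payoff values used in Axelrod's tournaments (temptation 5, reward 3, punishment 1, sucker 0) and four textbook strategies; the function names, the 200-round match length, and the absence of noise are illustrative choices, and the sketch does not reproduce the setup of any particular published tournament.

```python
from itertools import combinations

# Payoffs (my_score, opponent_score) indexed by (my_move, opponent_move).
# "C" = cooperate, "D" = defect. Values follow the classic T=5, R=3, P=1, S=0.
PAYOFF = {
    ("C", "C"): (3, 3),  # mutual cooperation: reward R
    ("C", "D"): (0, 5),  # sucker S against temptation T
    ("D", "C"): (5, 0),
    ("D", "D"): (1, 1),  # mutual defection: punishment P
}


def tit_for_tat(my_hist, opp_hist):
    # Cooperate first, then repeat the opponent's previous move.
    return opp_hist[-1] if opp_hist else "C"


def always_defect(my_hist, opp_hist):
    return "D"


def always_cooperate(my_hist, opp_hist):
    return "C"


def grudger(my_hist, opp_hist):
    # Cooperate until the opponent defects once, then defect forever.
    return "D" if "D" in opp_hist else "C"


def play_match(s1, s2, rounds=200):
    """Play one repeated game and return the total scores of both players."""
    h1, h2 = [], []
    score1 = score2 = 0
    for _ in range(rounds):
        m1, m2 = s1(h1, h2), s2(h2, h1)  # each strategy sees its own history first
        p1, p2 = PAYOFF[(m1, m2)]
        score1 += p1
        score2 += p2
        h1.append(m1)
        h2.append(m2)
    return score1, score2


def round_robin(strategies, rounds=200):
    """Every strategy meets every other strategy once; return a ranking by total score."""
    totals = {s.__name__: 0 for s in strategies}
    for s1, s2 in combinations(strategies, 2):
        a, b = play_match(s1, s2, rounds)
        totals[s1.__name__] += a
        totals[s2.__name__] += b
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    players = [tit_for_tat, always_defect, always_cooperate, grudger]
    for name, score in round_robin(players):
        print(f"{name:18s} {score}")
```

With only these four entrants, always_defect finishes first with 1408 points, narrowly ahead of tit_for_tat and grudger at 1399 each, and always_cooperate is last with 1200. Even this tiny field shows how strongly the ranking depends on who else takes part.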
Desired outcome. Study recent neural network architectures and their performance in different tournament setups.
Project prerequisites. Basic game theory, machine learning algorithms
Natural language processing
Multi-messenger online analysis of astronomical data
Project domain and motivation. The field is NLP of astronomy-related texts. The current problem is managing a new type of astronomical “big data” flow generated by modern multi-messenger observational facilities: gravitational-wave and neutrino detectors, as well as radio, optical, X-ray, and gamma-ray telescopes. These vast amounts of multi-messenger data hold great discovery potential, which needs to be uncovered using modern data science approaches and open science data management principles (Findable, Accessible, Interoperable, Reusable: FAIR).
Project description. Multi-messenger astronomy has recently emerged with breakthrough developments in the detection of gravitational waves and extrasolar neutrinos, and it immediately revealed new challenges in combining essentially different, messenger-specific perspectives on astronomical objects. This makes a case for a data science study of possible synergies between the diverse data analysis methods required to leverage a broad range of observational techniques in addressing common scientific questions. The project also aims to close the gap between domain-specific data analytics expertise and broadly applicable data analysis techniques, focusing on the development and application of machine learning methods. We aim to improve the labeling of raw text in astronomy messages and to apply named entity recognition (NER) methods to connect messages to the knowledge graph of European science.
Desired outcome. Apply NLP algorithms to the astronomy observation dataset and connect messages with objects and papers in the knowledge graph.
Project prerequisites. NLP methods, NER, knowledge graphs.
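As a point of comparison for learned NER models, the simplest way to label object names in raw astronomy messages is rule-based matching of naming conventions. The sketch below is a hypothetical baseline, not the project's method: the three regular expressions cover only a few conventions (gamma-ray bursts, gravitational-wave events, IceCube neutrino alerts), the sample sentence is invented for illustration, and linking mentions to a knowledge graph is not shown.

```python
import re

# Hypothetical patterns for three common naming conventions (illustrative only;
# real messages use many more formats and spellings).
PATTERNS = {
    "GRB": re.compile(r"\bGRB\s?\d{6}[A-Z]?\b"),        # gamma-ray bursts, e.g. GRB 170817A
    "GW": re.compile(r"\bGW\d{6}(?:_\d{6})?\b"),        # gravitational-wave events, e.g. GW170817
    "NEUTRINO": re.compile(r"\bIceCube-\d{6}[A-Z]\b"),  # IceCube alerts, e.g. IceCube-170922A
}


def tag_entities(text):
    """Return (label, mention, start, end) tuples for all matches, sorted by position."""
    hits = [
        (label, m.group(), m.start(), m.end())
        for label, pattern in PATTERNS.items()
        for m in pattern.finditer(text)
    ]
    return sorted(hits, key=lambda h: h[2])


if __name__ == "__main__":
    # Invented sample sentence, used only to exercise the patterns.
    sample = "Optical follow-up of GW170817 and GRB 170817A; see also IceCube-170922A."
    for label, mention, start, end in tag_entities(sample):
        print(label, mention, start, end)
```

Such a baseline produces character offsets in the same form a statistical NER model would, so it can serve as a weak labeler for raw messages or as a sanity check for learned models.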
Sentiment analysis of call center audio conversations
Project domain and motivation. Speech emotion recognition is a relatively recent branch of sentiment analysis. Ukrainian speech recognition (as well as Ukrainian NLP in general) is still under-resourced. That is why businesses need models that help them solve concrete problems. One crucial problem is detecting conflicts and other emotions in call center audio conversations. Each call center produces a large amount of data, and each manager's quality of work currently has to be evaluated manually, so any automation will be of great value.
Project description. There is a dataset of prepared, labeled audio files with conversations. The audio can contain both Ukrainian and Russian speech, as well as technical slang typical of the application area. Participants may also talk fast or even simultaneously. The aim is to perform sentiment analysis of Ukrainian (and possibly Russian) speech and to detect intense emotions.
Desired outcome. The model predicts emotions with acceptable accuracy and is capable of classifying a large set of audio recordings to find possible conflicts or misunderstandings between managers and customers.
Project prerequisites. Familiarity with sentiment analysis and audio preprocessing.
Evaluation of script following
Project domain and motivation. Speech-to-text is an integral part of NLP. The industry problem is evaluating the quality of call center audio conversations. Quality here means how closely the manager follows a particular "scenario of talk" when making proposals and handling customers' questions. Each call center produces a large amount of data, and each manager's quality of work currently has to be evaluated manually, so any automation will be of great value.
Project description. There is a dataset of prepared, labeled audio files with conversations. The audio can contain both Ukrainian and Russian speech, as well as technical slang typical of the application area. Participants may also talk fast or even simultaneously. The aim is to perform sentiment analysis of Ukrainian (and possibly Russian) speech and to detect intense emotions.
Desired outcome. A model that predicts emotions with acceptable accuracy and can classify a large set of audio recordings to find probable conflicts or misunderstandings between managers and customers.
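One simple way to turn "how closely the manager follows the scenario" into a number is to check what fraction of scripted steps appear somewhere in the transcribed conversation. This is a hypothetical metric, not one defined by the project: it assumes the speech has already been transcribed and split into manager utterances, and the character-level `difflib` similarity and the 0.6 threshold are illustrative choices (a production system would more likely compare meaning rather than characters).

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Character-level similarity in [0, 1], ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def script_coverage(script_steps, utterances, threshold=0.6):
    """Fraction of script steps matched by at least one manager utterance.

    A step counts as covered if its best similarity to any utterance reaches
    the (assumed) threshold.
    """
    if not script_steps:
        return 0.0
    covered = sum(
        1
        for step in script_steps
        if max((similarity(step, u) for u in utterances), default=0.0) >= threshold
    )
    return covered / len(script_steps)


if __name__ == "__main__":
    script = ["Good morning, how can I help you?", "Is there anything else I can do for you?"]
    manager = ["good morning how can i help you", "ok, I will check your contract"]
    print(script_coverage(script, manager))
```

Tracking which steps fail to match, not only the ratio, would also show where a manager tends to leave the script.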
Project prerequisites. Familiarity with sentiment analysis and audio preprocessing.
NLP analysis of Russian propaganda messages
Project domain and motivation. Manipulation through the media is a significant problem today. We already know how much effort Russia invests in its propaganda networks of influence. Messages that spread panic and despair use known cliches, so we can try to detect them by searching social networks for specific patterns.
Project description. The Russian propaganda network gathered by YouControl (link) is an interesting subject for analysis. If it is an influence network, it is presumably controlled by an external source. This source supposedly sets specific goals and narratives to spread, so we can try to use NLP methods to search for patterns of manipulation. We have a verified dataset of manipulative posts gathered by Oksana Moroz, and the hypothesis is that we can train an algorithm to detect manipulative cliches. Such a tool would be handy for countering disinformation.
Desired outcome. We aim to get a trained algorithm that detects manipulative cliches in social media.
Project prerequisites. NLP, machine learning.
Reinforcement learning
Sokoban puzzle solving
Project domain and motivation. Reinforcement learning is a paradigm that generalizes supervised learning and is valuable for non-stationary environments. Methods range from simple controlled Markov chains to Monte Carlo tree search (MCTS) setups. Model-free reinforcement learning methods can achieve superhuman performance in environments where precise control and high reaction speed are paramount.
The unique properties of the Sokoban puzzle (deadlocks, loops) make it a rich domain for research.
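At the "controlled Markov chain" end of that spectrum, a problem with known transition probabilities can be solved exactly by dynamic programming. The sketch below runs value iteration on a toy five-state corridor; the environment, the 0.8 probability that a move succeeds, and the discount factor 0.9 are illustrative assumptions unrelated to Sokoban itself.

```python
GAMMA = 0.9       # discount factor (illustrative)
P_SUCCESS = 0.8   # chosen move succeeds with this probability; otherwise the agent stays put
N_STATES = 5      # corridor states 0..4
GOAL = N_STATES - 1  # terminal state; reaching it yields reward 1
ACTIONS = {"left": -1, "right": +1}


def transitions(s, a):
    """(probability, next_state, reward) triples for action a in non-terminal state s."""
    target = min(max(s + ACTIONS[a], 0), GOAL)  # moves are clamped at the corridor ends
    return [
        (P_SUCCESS, target, 1.0 if target == GOAL else 0.0),
        (1 - P_SUCCESS, s, 0.0),
    ]


def q_value(V, s, a):
    """Expected one-step return of taking action a in state s, then following V."""
    return sum(p * (r + GAMMA * V[s2]) for p, s2, r in transitions(s, a))


def value_iteration(tol=1e-10):
    V = [0.0] * N_STATES  # V[GOAL] stays 0 because the episode ends there
    while True:
        delta = 0.0
        for s in range(GOAL):  # sweep over non-terminal states only
            best = max(q_value(V, s, a) for a in ACTIONS)
            delta = max(delta, abs(best - V[s]))
            V[s] = best  # in-place (Gauss-Seidel style) update
        if delta < tol:
            break
    # Greedy policy with respect to the converged values.
    policy = [max(ACTIONS, key=lambda a: q_value(V, s, a)) for s in range(GOAL)]
    return V, policy


if __name__ == "__main__":
    values, policy = value_iteration()
    print([round(v, 3) for v in values], policy)
```

For realistic Sokoban levels the number of states is usually far too large to enumerate in a table like this, which is why approximate, model-free, and search-based methods become necessary.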
Project description. Deep reinforcement learning methods can learn complex heuristics in uncertain environments without prior knowledge. The Sokoban puzzle provides an excellent playground for complex decision-making in a relatively small environment. Because boxes can only be pushed, never pulled, the decision tree is unusual: some moves create deadlocks that make a successful finish impossible, even though this becomes evident only much later. In this project, we need to compare many reinforcement learning tools (model-based, model-free, and actor-critic methods) to achieve the best performance.
Desired outcome. An algorithm for solving and generating small puzzles that are interesting for humans.
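The simplest deadlock caused by the push-only constraint is a box in a corner: with a wall above or below it and a wall to its left or right, the box can never move again, so unless that square is a goal the level becomes unsolvable. The sketch below detects such squares in a level written in the usual Sokoban text notation; the tiny level and the helper names are made up for illustration.

```python
# Usual Sokoban text notation: '#' wall, '$' box, '.' goal,
# '*' box on goal, '@' player, '+' player on goal, ' ' floor.
LEVEL = [
    "#######",
    "#$   .#",
    "#  @  #",
    "# $ . #",
    "#######",
]


def parse(level):
    """Return the sets of wall, box, and goal coordinates as (row, col)."""
    walls, boxes, goals = set(), set(), set()
    for r, row in enumerate(level):
        for c, ch in enumerate(row):
            if ch == "#":
                walls.add((r, c))
            if ch in "$*":
                boxes.add((r, c))
            if ch in ".*+":
                goals.add((r, c))
    return walls, boxes, goals


def is_corner(walls, pos):
    """True if a box at pos can move neither vertically nor horizontally.

    A wall on either side along an axis blocks that axis: pushing toward the wall
    is impossible, and pushing away would require the player to stand in the wall.
    """
    r, c = pos
    blocked_vertically = (r - 1, c) in walls or (r + 1, c) in walls
    blocked_horizontally = (r, c - 1) in walls or (r, c + 1) in walls
    return blocked_vertically and blocked_horizontally


def dead_corner_squares(level):
    """Non-wall, non-goal cells where any box would be permanently stuck."""
    walls, _, goals = parse(level)
    cells = {(r, c) for r, row in enumerate(level) for c, ch in enumerate(row) if ch != "#"}
    return {p for p in cells if p not in goals and is_corner(walls, p)}


def corner_deadlocks(level):
    """Boxes already sitting on a dead corner square (the level cannot be solved)."""
    _, boxes, _ = parse(level)
    return boxes & dead_corner_squares(level)


if __name__ == "__main__":
    print(sorted(dead_corner_squares(LEVEL)))
    print(sorted(corner_deadlocks(LEVEL)))
```

A search-based or learned solver can discard any push that moves a box onto one of these squares. This catches only the simplest case; boxes frozen against each other, or pushed along a wall segment that has no goal, need further analysis.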
Project prerequisites. Reinforcement learning methods.
Computer vision
Object detection and classification in paleontology images
Project domain and motivation. The problem is to detect and classify objects in fossil images. There is a large database of images with fossil remains, and it is essential to understand what type of remains each image shows and what period it represents. A general fossil classification system could have business value because no such system exists now.
Project description. The idea of the project is to implement semantic segmentation of fossil images. One can start with object detection, which is not trivial because prints sometimes overlap or are partially missing. Next, supervised segmentation is possible, and finally, semantic segmentation.
Desired outcome. Semantic segmentation of fossil images.
Project prerequisites. Computer vision concepts and algorithms