#5QW: Katja Hose
Katja is unleashing the power of data. Our new Professor of Databases & AI tells us what data can do and why accuracy is the name of the game.
How would you describe your work in 90 seconds?
My goal is to bring meaning to large amounts of data. Achieving this is rather complex, involving data management, knowledge engineering, and data science. I’m looking into new methods to organize data, integrate different data sets, and efficiently get answers from them. This involves query optimization and analytics as well as machine learning and artificial intelligence. The main data model I work with is the knowledge graph, which is essentially a network for structuring data. In a knowledge graph, information is represented as data points, called nodes, and the connections between them. These nodes can be anything: a person, a city, or a gene. The connections between nodes describe how different things are related to each other. Why are knowledge graphs so exciting? Graphs can represent basically any kind of data as factual knowledge. They capture not only similar things and the relationships between them but also additional information about these things, their connections to other things, the information and connections of those things in turn, and so on. This means that we can access very complex and comprehensive knowledge.
My work is not bound to a specific field or data set. Whether it’s a table or an Excel sheet, numbers, or text, we can extract the data. I have worked in various domains, including medicine, bioscience, and sustainability assessment, but also with ‘traditional’ business data like sales benchmarks, with knowledge bases, and with Linked Data, which is native graph data modeled using ontologies and available on the Web.
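As a rough illustration of the node-and-connection idea described in this answer, here is a minimal Python sketch of a knowledge graph stored as subject, relation, object triples. The class name `KnowledgeGraph`, its methods, and the sample facts about a person called "Alice" are all hypothetical and chosen only for illustration; they are not taken from Katja Hose's work or from any particular graph database.

```python
from collections import defaultdict


class KnowledgeGraph:
    """Toy knowledge graph: nodes linked by labeled, directed connections."""

    def __init__(self):
        # Maps each subject node to a list of (relation, object) pairs.
        self._out = defaultdict(list)

    def add(self, subject, relation, obj):
        """Store one fact, e.g. ("Vienna", "capital_of", "Austria")."""
        self._out[subject].append((relation, obj))

    def neighbors(self, subject, relation=None):
        """Nodes directly connected to `subject`, optionally by one relation."""
        return [o for r, o in self._out.get(subject, [])
                if relation is None or r == relation]

    def reachable(self, start, max_hops):
        """All nodes reachable from `start` within `max_hops` connections."""
        seen = {start}
        frontier = [start]
        for _ in range(max_hops):
            next_frontier = []
            for node in frontier:
                for _, obj in self._out.get(node, []):
                    if obj not in seen:
                        seen.add(obj)
                        next_frontier.append(obj)
            frontier = next_frontier
        seen.discard(start)
        return seen


# Hypothetical sample facts: a person, a city, and a gene as nodes.
kg = KnowledgeGraph()
kg.add("Alice", "lives_in", "Vienna")
kg.add("Alice", "studies", "BRCA1")
kg.add("Vienna", "capital_of", "Austria")
kg.add("BRCA1", "is_a", "gene")

direct = kg.neighbors("Alice")        # things connected to Alice
two_hops = kg.reachable("Alice", 2)   # ...plus the connections of those things
```

Following connections hop by hop, as `reachable` does, is one simple way to picture how a graph gives access to "the connections of those things, and so on". Real systems typically use dedicated graph stores and query languages rather than an in-memory dictionary.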
How did you get in touch with informatics?
In school, we had informatics classes, and I remember thinking, ‘This just makes sense’. In language courses, you always had to interpret what things meant and what the author could have intended. In computer science, I didn’t have to go into lengthy interpretations but rather dealt with logical facts. As a teenager, I started to teach myself some programming in Pascal. My first ‘project’ was programming a bouncing ball on the screen; my code had goto jumps, which is probably the worst kind of programming style you can use. At some point, I decided that I wanted to do this properly. Torn between studying informatics and biology, I luckily was able to study medical computer science as a minor subject and got the best of both worlds. I still love working with biological and medical data. I have an ongoing project on so-called ‘metagenomic binning’, where we try to structure and analyze multiple genome snippets from different microbes. Usually, sequencing one genome is already a strenuous task. But with data engineering and machine learning, we can create new, efficient ways to sequence and analyze genomes from multiple microbe types.
Where do you see the connection between your research and everyday life?
Simple: Data is everywhere. We continuously produce data and, consequently, knowledge about ourselves and the world. The tricky part about this is my most important mission: If we’re generating knowledge, we must strive for accuracy. Not all models can ingest, analyze, and output data in a way that ensures factual knowledge. A good example is ChatGPT: as a large language model, it gives the most likely answers but not the most accurate ones. Knowledge graphs also work with large sets of combined data, but if your input is correct, you can be sure to get answers based on coherent facts. In the end, I also want my work to increase data quality so that if we’re using machine learning models, we can ensure that they are at least trained on accurate data.
What makes you happy in your work?
When I can gain new insights, find new solutions to problems, and share the results with others. In teaching, I love when students have these ‘aha’ moments. So I try to explain complicated topics in a way that different students can relate to. If I succeed and see that it just ‘clicked’, that makes me very happy.
Why do you think there are still so few women in computer science?
I’m convinced that it’s a societal problem. The percentage of women in informatics is different in other countries, such as India, where computer science is considered a rewarding career for girls to gain independence. Of course, the workings behind these different perceptions may also be questionable. Still, the fact is that how society sees computer science as a whole has a significant impact on women going into the field. You’ve already lost as soon as girls think it’s not for them. And this starts as early as kindergarten, where boys and girls are separated into different playgroups and taught different worldviews, and so develop different interests. Trying to convince them to study informatics as teenagers or young adults is mostly just cosmetic. We must explain to the youngest that they can program and influence all those nice tools they use daily. They need to connect their smartphones and iPads to informatics.
Katja Hose is Professor of Databases & AI at TU Wien Informatics.