Exploring Real-World Robustness in Malware Detection: Evaluating class distribution with SOREL-20M

Information

Author: Hanna Moberg Andersson
Expected completion: 2025-06
Supervisor: Anna Lindelöf
Supervisor's company/institution: Subset
Subject reviewer: Parosh Abdulla
Other: -


Presentation

Presenter: Hanna Moberg Andersson
Presentation time: 2025-10-01 10:15
Opponent: Linn Gattermann

Abstract

Malware is constantly evolving as attackers adopt more sophisticated techniques, and traditional detection methods based on static signatures, which require manual labeling, struggle to keep pace. In response, machine learning has emerged as a promising approach to malware detection. However, despite high accuracy in controlled experiments, these models often perform poorly when deployed in real-world environments. This thesis investigates one possible reason for this discrepancy: the difference between the proportion of malicious samples in the training data and in real-world environments. Using the SOREL-20M dataset, a series of experiments was conducted with a Random Forest classifier. To reflect realistic conditions, the proportion of malware in the training set was shifted from balanced to highly skewed, with the malicious class as the minority. Performance was evaluated using accuracy, precision, recall, and F1-score. The results show that class distribution has a significant impact on model performance, particularly on the trade-off between false positives and false negatives. Balanced training sets tend to yield higher recall but often generate a large number of false positives. Models trained on imbalanced data, on the other hand, achieve higher precision but may fail to detect many malicious samples. The results highlight the importance of considering dataset composition when developing AI-based malware detection systems. By adjusting the class distribution in training and testing, developers can tune their models' behavior to fit their purposes and better prepare them for deployment. This work contributes to ongoing efforts in cybersecurity to bridge the gap between experimental performance and real-world robustness.
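The abstract does not detail the experimental pipeline, so what follows is only a minimal sketch under stated assumptions: labels are assumed to be binary (1 = malicious, 0 = benign) and already extracted from SOREL-20M, and the helper names `resample_to_ratio` and `binary_metrics` are hypothetical. It shows how a training subset can be drawn at a chosen malware fraction and how accuracy, precision, recall, and F1-score follow from the confusion-matrix counts. The Random Forest itself is left out; a library implementation such as scikit-learn's `RandomForestClassifier` would typically be fitted on the resampled subset.

```python
import random
from typing import Dict, List, Sequence


def resample_to_ratio(labels: Sequence[int], malware_fraction: float,
                      total: int, seed: int = 0) -> List[int]:
    """Hypothetical helper: pick `total` indices so that about
    `malware_fraction` of them are malicious (label 1), sampling
    without replacement from each class separately."""
    if not 0.0 < malware_fraction < 1.0:
        raise ValueError("malware_fraction must be strictly between 0 and 1")
    rng = random.Random(seed)
    malicious = [i for i, y in enumerate(labels) if y == 1]
    benign = [i for i, y in enumerate(labels) if y == 0]
    n_mal = round(total * malware_fraction)
    n_ben = total - n_mal
    if n_mal > len(malicious) or n_ben > len(benign):
        raise ValueError("not enough samples in one class for this split")
    chosen = rng.sample(malicious, n_mal) + rng.sample(benign, n_ben)
    rng.shuffle(chosen)  # interleave the two classes instead of two blocks
    return chosen


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Hypothetical helper: accuracy, precision, recall and F1 with
    malware (1) as the positive class. Zero denominators give 0.0."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # benign flagged as malware
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # missed malware
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Toy pool of 500 malicious and 500 benign labels (not SOREL-20M data).
    labels = [1] * 500 + [0] * 500
    for frac in (0.5, 0.2, 0.05):
        idx = resample_to_ratio(labels, frac, total=400, seed=42)
        n_mal = sum(labels[i] for i in idx)
        print(f"target fraction {frac:.2f}: {n_mal} malicious of {len(idx)}")

    # A detector that never flags anything, tested on 5 % malware.
    m = binary_metrics([1] * 5 + [0] * 95, [0] * 100)
    print(f"accuracy={m['accuracy']:.2f} recall={m['recall']:.2f}")  # accuracy=0.95 recall=0.00
```

The last example illustrates why accuracy alone is a weak signal under heavy imbalance: a detector that never raises an alert scores 0.95 accuracy on a test set with 5 % malware while detecting no malware at all, which is why precision, recall, and F1-score are reported alongside it.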

Download the report
