Detecting unknown network attacks using language models
Konrad Rieck and Pavel Laskov
In: DIMVA 2006, 13-14 Jul 2006, Berlin.
We propose a method for network intrusion detection based on language models such as n-grams and words. Our method proceeds by extracting these models from TCP connection payloads and applying unsupervised anomaly detection. The essential part of our approach is linear-time computation of similarity measures between language models stored in trie data structures.
Experiments on two datasets of network traffic demonstrate the importance of higher-order n-grams for detecting unknown network attacks. Our method is also suitable for language models based on words, which are more amenable to practical security applications. An implementation of our system achieved a detection accuracy of over 80% with no false positives on instances of recent attacks in HTTP, FTP and SMTP traffic.
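
The pipeline summarized above (extract n-grams from a connection payload, compare language models, flag outliers) can be illustrated with a minimal Python sketch. It is not the authors' implementation: a hash-map `Counter` stands in for the trie, the Manhattan distance is only one assumed choice of similarity measure, and the mean-distance `anomaly_score` is an assumed, simplified anomaly detector. All function names are hypothetical.

```python
from collections import Counter


def ngrams(payload: bytes, n: int) -> Counter:
    """Count every contiguous byte n-gram in a payload."""
    # A payload shorter than n yields an empty histogram.
    return Counter(payload[i:i + n] for i in range(len(payload) - n + 1))


def manhattan(a: Counter, b: Counter) -> int:
    """Sum of absolute count differences over n-grams seen in either payload.

    Assumption: Manhattan distance is used here only as an example measure.
    Counter returns 0 for missing keys, so absent n-grams need no special case.
    """
    return sum(abs(a[g] - b[g]) for g in a.keys() | b.keys())


def anomaly_score(x: Counter, reference: list) -> float:
    """Mean distance from x to reference payloads (higher = more unusual).

    Assumption: a simplified stand-in for an unsupervised anomaly detector.
    """
    return sum(manhattan(x, r) for r in reference) / len(reference)


# Hypothetical payloads, for illustration only.
reference = [ngrams(b"GET /index.html", 3), ngrams(b"GET /about.html", 3)]
normal = ngrams(b"GET /index.html", 3)
suspicious = ngrams(b"GET /" + b"\x90" * 40, 3)

print(anomaly_score(normal, reference))
print(anomaly_score(suspicious, reference))
```

The cost of `manhattan` grows with the number of distinct n-grams in the two payloads, which is in turn bounded by their lengths. The trie-based comparison described in the abstract achieves its linear-time bound with a different data structure, which this sketch does not reproduce.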