Generating Realistic Test Traffic Using Markov Chains
October 28, 2020
This article shows how you can generate realistic user traffic for testing purposes by sampling requests in production and generating fake ones using Markov chains.
Example code is available on GitHub.
Markov chains
A Markov chain is a stochastic model telling us probability of an event based on previously observed events. Probability of next event is determined by last known event or previous events, which gives us a Markov chain of order :
We can track actual user behaviour and build a model by counting events and calculating weighted probabilities for each event based on events preceding them.
For example, users finished writing a blog post. users later published it and one of them trashed it. This gives us probabilities and . We can then selected a random event with roulette wheel selection.
This is useful for stress or smoke testing when you need some fake traffic that resembles real users.
Implementation
Let's declare a struct for a Markov chain.
Map occurrences
will hold event counts BTreeMap<T, usize>
for any previous series of events Vec<T>
.
#[derive(Clone)]
pub struct MarkovChain<T>
where
T: Clone + Ord,
{
order: usize,
occurrences: BTreeMap<Vec<T>, BTreeMap<T, usize>>,
memory: Vec<T>,
rng: ThreadRng,
}
To build a model we have to go through all observed events and count the number of times -th event occured after all events.
We also track last known events inside the memory
vector, which will be used to generate the next event.
impl<T> MarkovChain<T>
where
T: Clone + Ord,
{
pub fn update(&mut self, events: &[T]) {
let events: Vec<_> = events.to_vec();
for history in events.windows(self.order + 1) {
// Split window to 0..N-1 and N
let previous = history[0..self.order].to_vec();
let current = history.last().cloned().unwrap();
// Count occurrence for current event
self.occurrences
.entry(previous)
.or_default()
.entry(current)
.and_modify(|count| *count += 1)
.or_insert(1);
}
// Update internal memory
self.memory.reserve(self.order);
for event in events.into_iter().rev().take(self.order) {
self.memory.insert(0, event);
}
self.memory.truncate(self.order);
}
// ...
The generate_from
function takes in the memory, finds occurrences and chooses an event from those.
We use choose_weighted function provided by the rand crate.
pub fn generate_from(&mut self, memory: &[T]) -> Option<T> {
assert_eq!(memory.len(), self.order, "invalid memory size");
if let Some(occurrences) = self.occurrences.get(memory) {
// Get number of occurrences for each known event.
// We need a Vec for `SliceRandom::choose_weighted`.
let occurrence_counts: Vec<_> = occurrences
.iter()
.map(|(event, count)| (event.clone(), *count))
.collect();
// Chose a random event based on its count
occurrence_counts
.choose_weighted(&mut self.rng, |(_, count)| *count)
.map(|(event, _)| event)
.ok()
.cloned()
} else {
// No match from memory
None
}
}
After generating a new event, we update the internal memory. Next events will then be calculated from the most recent memory.
pub fn generate(&mut self, update_memory: bool) -> Option<T> {
let last_memory = self.memory.clone();
if let Some(next) = self.generate_from(&last_memory) {
if update_memory {
// Update internal memory
self.memory.insert(0, next.clone());
self.memory.truncate(self.order);
}
Some(next)
} else {
None
}
}
We can also write an iterator that returns generated events.
pub struct MarkovChainIter<'a, T>
where
T: Clone + Ord,
{
chain: &'a mut MarkovChain<T>,
}
impl<'a, T> Iterator for MarkovChainIter<'a, T>
where
T: Clone + Ord,
{
type Item = T;
fn next(&mut self) -> Option<Self::Item> {
self.chain.generate(true)
}
}
impl<T> MarkovChain<T>
where
T: Clone + Ord,
{
pub fn iter(&mut self) -> MarkovChainIter<T> {
MarkovChainIter { chain: self }
}
// ...
}
Example
To see it in action, we declare an enum of all possible actions. Of course, these can be way more complicated in the real world use-case.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum UserAction {
SignIn,
SignOut,
CreateTodo,
DeleteTodo,
ListTodos,
}
We build a chain from a sample of actions.
let mut chain = MarkovChain::new(1);
let actions = vec![
UserAction::SignIn,
UserAction::ListTodos,
UserAction::CreateTodo,
UserAction::CreateTodo,
UserAction::SignOut,
UserAction::SignIn,
UserAction::ListTodos,
UserAction::DeleteTodo,
UserAction::DeleteTodo,
UserAction::CreateTodo,
UserAction::CreateTodo,
UserAction::SignOut,
UserAction::SignIn,
UserAction::ListTodos,
UserAction::CreateTodo,
UserAction::DeleteTodo,
UserAction::CreateTodo,
UserAction::DeleteTodo,
UserAction::ListTodos,
UserAction::DeleteTodo,
UserAction::SignOut,
];
chain.update(&actions);
Then generate a few actions.
for action in chain.iter().take(16) {
if action == UserAction::SignIn {
println!("## New session ##");
}
println!("{:?}", action);
}
Which gives us the following output.
cargo run --example traffic
## New session ##
SignIn
ListTodos
DeleteTodo
SignOut
## New session ##
SignIn
ListTodos
DeleteTodo
DeleteTodo
CreateTodo
SignOut
## New session ##
SignIn
ListTodos
CreateTodo
CreateTodo
CreateTodo
DeleteTodo
Notice how after each SignIn
there's a ListTodos
, which means .
Built model can represent inherent rules of our application.
Some series of actions will not be generated, those having probability , which is a lot better than uniformly generated random data.
Some combination of actions might not be possible, because they'd break business rules. Those can be filtered out, or left in to test invalid requests.
Conclusion
We can do much more with this. This was only a brief introduction to a handy tool that can be used to generate some fake requests.
Sample code is available on GitHub.