Data Generator¶
Numalogic provides a data generator to create some synthetic time series data, that can be used as train or test data sets.
Using the synthetic data, we can:
- Compare and evaluate different ML algorithms, since we have labeled anomalies
- Understand different types of anomalies, and our models' performance on each of them
- Recreate realtime scenarios
Generate multivariate timeseries¶
from numalogic.synthetic import SyntheticTSGenerator
ts_generator = SyntheticTSGenerator(
seq_len=8000,
num_series=3,
freq="T",
primary_period=720,
secondary_period=6000,
seasonal_ts_prob=0.8,
baseline_range=(200.0, 350.0),
slope_range=(-0.001, 0.01),
amplitude_range=(10, 75),
cosine_ratio_range=(0.5, 0.9),
noise_range=(5, 15),
)
# shape: (8000, 3) with column names [s1, s2, s3]
ts_df = ts_generator.gen_tseries()
# Split into test and train
train_df, test_df = ts_generator.train_test_split(ts_df, test_size=1000)
Inject anomalies¶
Now, once we generate the synthetic data like above, we can inject anomalies into the test data set using AnomalyGenerator
.
AnomalyGenerator
supports the following types of anomalies:
- global: Outliers in the global context
- contextual: Outliers only in the seasonal context
- causal: Outliers caused by a temporal causal effect
- collective: Outliers present simultaneously in two or more time series
You can also use anomaly_ratio
to adjust the ratio of anomalous data points wrt number of samples.
from numalogic.synthetic import AnomalyGenerator
# columns to inject anomalies
injected_cols = ["s1", "s2"]
anomaly_generator = AnomalyGenerator(
train_df, anomaly_type="contextual", anomaly_ratio=0.3
)
outlier_test_df = anomaly_generator.inject_anomalies(
test_df, cols=injected_cols, impact=1.5
)