vijaye12 committed
Commit 58cd57c
1 Parent(s): bac72b9

Update README.md

Files changed (1): README.md (+236 -3)

---
license: cc-by-nc-sa-4.0
pipeline_tag: time-series-forecasting
tags:
- time series
- forecasting
- pretrained models
- foundation models
- time series foundation models
- time-series
---

# Tiny Time Mixer (TTM) Research-Use Model Card

<p align="center" width="100%">
<img src="ttm_image.webp" width="600">
</p>

TinyTimeMixers (TTMs) are compact pre-trained models for Multivariate Time-Series Forecasting, open-sourced by IBM Research.
**With model sizes starting from 1M parameters, TTM (accepted at NeurIPS 2024) introduces the notion of the first-ever "tiny" pre-trained models for Time-Series Forecasting.**

This model card contains the model weights for research use only and for full reproducibility of the results published in our [paper](https://arxiv.org/pdf/2401.03955.pdf).
However, if you are looking for TTM model weights for commercial and enterprise use, please refer to our Granite releases [here](https://huggingface.co/ibm-granite/granite-timeseries-ttm-r2).

TTM outperforms several popular benchmarks demanding billions of parameters in zero-shot and few-shot forecasting. TTMs are lightweight
forecasters, pre-trained on publicly available time series data with various augmentations. TTM provides state-of-the-art zero-shot forecasts and can easily be
fine-tuned for multivariate forecasting, remaining competitive with just 5% of the training data. Refer to our [paper](https://arxiv.org/pdf/2401.03955.pdf) for more details.

**The current open-source version supports point forecasting use-cases, specifically for minutely to hourly resolutions
(e.g., 10 min, 15 min, 1 hour).**

**Note that zero-shot, fine-tuning, and inference tasks using TTM can easily be executed on a single GPU machine or even on a laptop.**

## Model Description

TTM falls under the category of "focused pre-trained models", wherein each pre-trained TTM is tailored for a particular forecasting
setting (governed by the context length and forecast length). Instead of building one massive model supporting all forecasting settings,
we opt for the approach of constructing smaller pre-trained models, each focusing on a specific forecasting setting, thereby
yielding more accurate results. Furthermore, this approach ensures that our models remain extremely small and exceptionally fast,
facilitating easy deployment without demanding significant resources.

Hence, in this model card, we release several pre-trained
TTMs that cater to many common forecasting settings in practice. Additionally, we have released our source code along with
our pretraining scripts, which users can utilize to pretrain models on their own data. Pretraining TTMs is fast and can be completed in less than a day, as opposed to several days or weeks with traditional approaches.

Each pre-trained model is released in a separate branch of this model card. Kindly access the required model using our
getting started [notebook](https://github.com/IBM/tsfm/blob/main/notebooks/hfdemo/ttm_getting_started.ipynb), specifying the branch name.

## Model Releases (along with the branch name where the models are stored)

- **512-96-r2**: Given the last 512 time-points (i.e., context length), this model can forecast up to the next 96 time-points (i.e., forecast length) into the future. This model is pre-trained with a larger pretraining dataset for improved accuracy. Recommended for hourly and minutely resolutions (e.g., 10 min, 15 min, 1 hour, etc.). (branch name: main)

- **1024-96-r2**: Given the last 1024 time-points (i.e., context length), this model can forecast up to the next 96 time-points (i.e., forecast length) into the future. This model is pre-trained with a larger pretraining dataset for improved accuracy. Recommended for hourly and minutely resolutions (e.g., 10 min, 15 min, 1 hour, etc.). (branch name: 1024-96-r2)

- **1536-96-r2**: Given the last 1536 time-points (i.e., context length), this model can forecast up to the next 96 time-points (i.e., forecast length) into the future. This model is pre-trained with a larger pretraining dataset for improved accuracy. Recommended for hourly and minutely resolutions (e.g., 10 min, 15 min, 1 hour, etc.). (branch name: 1536-96-r2)

- **512-192-r2**: Given the last 512 time-points (i.e., context length), this model can forecast up to the next 192 time-points (i.e., forecast length) into the future. This model is pre-trained with a larger pretraining dataset for improved accuracy. Recommended for hourly and minutely resolutions (e.g., 10 min, 15 min, 1 hour, etc.). (branch name: 512-192-r2)

- **1024-192-r2**: Given the last 1024 time-points (i.e., context length), this model can forecast up to the next 192 time-points (i.e., forecast length) into the future. This model is pre-trained with a larger pretraining dataset for improved accuracy. Recommended for hourly and minutely resolutions (e.g., 10 min, 15 min, 1 hour, etc.). (branch name: 1024-192-r2)

- **1536-192-r2**: Given the last 1536 time-points (i.e., context length), this model can forecast up to the next 192 time-points (i.e., forecast length) into the future. This model is pre-trained with a larger pretraining dataset for improved accuracy. Recommended for hourly and minutely resolutions (e.g., 10 min, 15 min, 1 hour, etc.). (branch name: 1536-192-r2)

- **512-336-r2**: Given the last 512 time-points (i.e., context length), this model can forecast up to the next 336 time-points (i.e., forecast length) into the future. This model is pre-trained with a larger pretraining dataset for improved accuracy. Recommended for hourly and minutely resolutions (e.g., 10 min, 15 min, 1 hour, etc.). (branch name: 512-336-r2)

- **1024-336-r2**: Given the last 1024 time-points (i.e., context length), this model can forecast up to the next 336 time-points (i.e., forecast length) into the future. This model is pre-trained with a larger pretraining dataset for improved accuracy. Recommended for hourly and minutely resolutions (e.g., 10 min, 15 min, 1 hour, etc.). (branch name: 1024-336-r2)

- **1536-336-r2**: Given the last 1536 time-points (i.e., context length), this model can forecast up to the next 336 time-points (i.e., forecast length) into the future. This model is pre-trained with a larger pretraining dataset for improved accuracy. Recommended for hourly and minutely resolutions (e.g., 10 min, 15 min, 1 hour, etc.). (branch name: 1536-336-r2)

- **512-720-r2**: Given the last 512 time-points (i.e., context length), this model can forecast up to the next 720 time-points (i.e., forecast length) into the future. This model is pre-trained with a larger pretraining dataset for improved accuracy. Recommended for hourly and minutely resolutions (e.g., 10 min, 15 min, 1 hour, etc.). (branch name: 512-720-r2)

- **1024-720-r2**: Given the last 1024 time-points (i.e., context length), this model can forecast up to the next 720 time-points (i.e., forecast length) into the future. This model is pre-trained with a larger pretraining dataset for improved accuracy. Recommended for hourly and minutely resolutions (e.g., 10 min, 15 min, 1 hour, etc.). (branch name: 1024-720-r2)

- **1536-720-r2**: Given the last 1536 time-points (i.e., context length), this model can forecast up to the next 720 time-points (i.e., forecast length) into the future. This model is pre-trained with a larger pretraining dataset for improved accuracy. Recommended for hourly and minutely resolutions (e.g., 10 min, 15 min, 1 hour, etc.). (branch name: 1536-720-r2)

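A specific release is selected by passing its branch name through the `revision` argument of `from_pretrained`, as in the minimal sketch below. It assumes the `tsfm_public` package from the repository linked under Model Sources is installed; the `1024-96-r2` branch is used purely as an example, and the repository id should be adjusted if it differs from the one shown in the Uses section.

```
from tsfm_public.models.tinytimemixer import TinyTimeMixerForPrediction

# Any branch name from the release list above can be used, e.g. "1024-96-r2".
# The "main" branch holds the 512-96-r2 model.
model = TinyTimeMixerForPrediction.from_pretrained(
    "ibm/TTM",              # HF Hub repository id (replace with this model card's id if different)
    revision="1024-96-r2",  # branch name of the desired pre-trained model
)
```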

## Model Capabilities with example scripts

The example scripts below can be used with any of the above TTM models. Please update the HF model URL and branch name in the `from_pretrained` call appropriately to pick the model of your choice.

- Getting Started [[colab]](https://colab.research.google.com/github/ibm-granite/granite-tsfm/blob/main/notebooks/hfdemo/ttm_getting_started.ipynb)
- Zeroshot Multivariate Forecasting [[Example]](https://github.com/ibm-granite/granite-tsfm/blob/main/notebooks/hfdemo/ttm_getting_started.ipynb)
- Finetuned Multivariate Forecasting:
  - Channel-Independent Finetuning [[Example 1]](https://github.com/ibm-granite/granite-tsfm/blob/main/notebooks/hfdemo/ttm_getting_started.ipynb) [[Example 2]](https://github.com/ibm-granite/granite-tsfm/blob/main/notebooks/hfdemo/tinytimemixer/ttm_m4_hourly.ipynb)
  - Channel-Mix Finetuning [[Example]](https://github.com/ibm-granite/granite-tsfm/blob/main/notebooks/tutorial/ttm_channel_mix_finetuning.ipynb)
- **New Releases (extended features released in October 2024)**
  - Finetuning and Forecasting with Exogenous/Control Variables [[Example]](https://github.com/ibm-granite/granite-tsfm/blob/main/notebooks/tutorial/ttm_with_exog_tutorial.ipynb)
  - Finetuning and Forecasting with static categorical features [Example: To be added soon]
  - Rolling Forecasts - Extend forecast lengths beyond 96 via the rolling capability (see the sketch after this list) [[Example]](https://github.com/ibm-granite/granite-tsfm/blob/main/notebooks/hfdemo/ttm_rolling_prediction_getting_started.ipynb)
  - Helper scripts for optimal Learning Rate suggestions for Finetuning [[Example]](https://github.com/ibm-granite/granite-tsfm/blob/main/notebooks/tutorial/ttm_with_exog_tutorial.ipynb)

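For the rolling-forecast capability, the linked notebook uses the toolkit's own utilities; the sketch below only illustrates the underlying idea with a plain loop that repeatedly feeds the model's forecasts back as context to extend the horizon beyond 96 points. The names `past_values` and `prediction_outputs` are assumptions based on the model's Hugging Face-style interface and may differ from the toolkit's actual helper.

```
import torch

def rolling_forecast(model, context, target_length, forecast_length=96):
    """Illustrative rolling loop; context is a (batch, context_length, channels) tensor of scaled history."""
    chunks, remaining = [], target_length
    model.eval()
    with torch.no_grad():
        while remaining > 0:
            out = model(past_values=context)      # one forward pass of the pre-trained forecaster
            step = out.prediction_outputs         # assumed shape: (batch, forecast_length, channels)
            take = min(remaining, forecast_length)
            chunks.append(step[:, :take, :])
            # slide the window: drop the oldest points, append the newly forecast points
            context = torch.cat([context[:, take:, :], step[:, :take, :]], dim=1)
            remaining -= take
    return torch.cat(chunks, dim=1)               # (batch, target_length, channels)
```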

## Benchmarks

TTM outperforms popular benchmarks such as TimesFM, Moirai, Chronos, Lag-Llama, Moment, GPT4TS, TimeLLM, and LLMTime in zero-shot/few-shot forecasting while significantly reducing computational requirements.
Moreover, TTMs are lightweight and can be executed even on CPU-only machines, enhancing usability and fostering wider
adoption in resource-constrained environments. For more details, refer to our [paper](https://arxiv.org/pdf/2401.03955.pdf).
- TTM-B referred to in the paper maps to the 512-context models.
- TTM-E referred to in the paper maps to the 1024-context models.
- TTM-A referred to in the paper maps to the 1536-context models.


## Recommended Use
1. Users have to externally standard-scale their data independently for every channel before feeding it to the model (refer to [TSP](https://github.com/IBM/tsfm/blob/main/tsfm_public/toolkit/time_series_preprocessor.py), our data processing utility for data scaling); see the sketch after this list.
2. The current open-source version supports only minutely and hourly resolutions (e.g., 10 min, 15 min, 1 hour). Lower resolutions (e.g., weekly or monthly) are currently not supported in this version, as the model needs a minimum context length of 512 or 1024.
3. Enabling any upsampling or prepending zeros to virtually increase the context length for shorter-length datasets is not recommended and will
impact the model performance.
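
As a minimal illustration of point 1, per-channel standard scaling can be done with scikit-learn's `StandardScaler` (which scales each column independently); the linked TSP utility plays the same role inside the toolkit's data pipeline. The file and column names below are placeholders, not part of any released dataset.

```
import pandas as pd
from sklearn.preprocessing import StandardScaler

# df holds one multivariate series; "ch1"/"ch2" are placeholder channel columns.
df = pd.read_csv("my_timeseries.csv", parse_dates=["timestamp"])
channels = ["ch1", "ch2"]

scaler = StandardScaler()                      # fits mean/std per column, i.e. per channel
train_df = df.iloc[: int(0.8 * len(df))]       # fit on the training split only
scaler.fit(train_df[channels])

df[channels] = scaler.transform(df[channels])  # the scaled data is what TTM should see
# After forecasting, apply scaler.inverse_transform(...) to recover the original units.
```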


## Model Details

For more details on TTM architecture and benchmarks, refer to our [paper](https://arxiv.org/pdf/2401.03955.pdf).

TTM-1 currently supports 2 modes:

- **Zeroshot forecasting**: Directly apply the pre-trained model on your target data to get an initial forecast (with no training).

- **Finetuned forecasting**: Finetune the pre-trained model with a subset of your target data to further improve the forecast.

**Since TTM models are extremely small and fast, it is practical to finetune the model with your available target data in a few minutes
to get more accurate forecasts.**

The current release supports multivariate forecasting via both channel-independence and channel-mixing approaches.
Decoder channel-mixing can be enabled during fine-tuning to capture strong channel-correlation patterns across
time-series variates, a critical capability lacking in existing counterparts.

In addition, TTM also supports exogenous infusion and categorical data infusion.
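
A sketch of how decoder channel-mixing might be switched on when loading the model for fine-tuning is shown below. The keyword names (`num_input_channels`, `decoder_mode`, `prediction_channel_indices`) are assumptions based on the TTM configuration in the granite-tsfm repository; the channel-mix and exogenous tutorials linked above are the authoritative references.

```
from tsfm_public.models.tinytimemixer import TinyTimeMixerForPrediction

# Hypothetical configuration overrides for a fine-tuning run with decoder channel-mixing
# and exogenous channels; verify the argument names against the tutorial notebooks.
model = TinyTimeMixerForPrediction.from_pretrained(
    "ibm/TTM",
    revision="main",
    num_input_channels=5,               # total channels: forecast targets + exogenous
    decoder_mode="mix_channel",         # enable decoder channel-mixing
    prediction_channel_indices=[0, 1],  # which channels are forecast targets
)
```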

### Model Sources

- **Repository:** https://github.com/ibm-granite/granite-tsfm/tree/main/tsfm_public/models/tinytimemixer
- **Paper:** https://arxiv.org/pdf/2401.03955.pdf


### Blogs and articles on TTM
- Refer to our [wiki](https://github.com/ibm-granite/granite-tsfm/wiki)


## Uses

```
# Load the model from the HF Model Hub, specifying the branch name in the revision field
from transformers import Trainer
from tsfm_public.models.tinytimemixer import TinyTimeMixerForPrediction

model = TinyTimeMixerForPrediction.from_pretrained(
    "ibm/TTM", revision="main"
)

# Zero-shot: evaluate the pre-trained model on the test set without any training
zeroshot_trainer = Trainer(
    model=model,
    args=zeroshot_forecast_args,
)

zeroshot_output = zeroshot_trainer.evaluate(dset_test)


# Freeze the backbone and enable few-shot finetuning
for param in model.backbone.parameters():
    param.requires_grad = False

finetune_forecast_trainer = Trainer(
    model=model,
    args=finetune_forecast_args,
    train_dataset=dset_train,
    eval_dataset=dset_val,
    callbacks=[early_stopping_callback, tracking_callback],
    optimizers=(optimizer, scheduler),
)
finetune_forecast_trainer.train()
fewshot_output = finetune_forecast_trainer.evaluate(dset_test)
```

## Citation
Kindly cite the following paper if you intend to use our model or its associated architectures/approaches in your
work.

**BibTeX:**

```
@inproceedings{ekambaram2024tinytimemixersttms,
  title={Tiny Time Mixers (TTMs): Fast Pre-trained Models for Enhanced Zero/Few-Shot Forecasting of Multivariate Time Series},
  author={Vijay Ekambaram and Arindam Jati and Pankaj Dayama and Sumanta Mukherjee and Nam H. Nguyen and Wesley M. Gifford and Chandra Reddy and Jayant Kalagnanam},
  booktitle={Advances in Neural Information Processing Systems (NeurIPS 2024)},
  year={2024},
}
```

## Model Card Authors

Vijay Ekambaram, Arindam Jati, Pankaj Dayama, Wesley M. Gifford, Sumanta Mukherjee, Chandra Reddy and Jayant Kalagnanam


## IBM Public Repository Disclosure

All content in this repository including code has been provided by IBM under the associated
open source software license and IBM is under no obligation to provide enhancements,
updates, or support. IBM developers produced this code as an
open source project (not as an IBM product), and IBM makes no assertions as to
the level of quality nor security, and will not be maintaining this code going forward.