- We assume that the originally available data is in data.csv, containing entries for 6 time periods.
- We want to split this into one dataset (here represented by a file) used for fitting (training) a model and one dataset (again represented by a file) used for making predictions with the model.
- The important part is that the (disease) predictions of a model are made for a period that was not seen when fitting (training) the model.
- Most models that involve multiple time series (e.g. temperature and disease) do not work in an auto-regressive mode, that is, they cannot predict a coming period and then use the predicted value as input to predict the period after that. The reason is that they typically only predict the main outcome (e.g. disease) and not the other time series (e.g. temperature), so the inputs needed for the next step are missing. This means they will only make one prediction at a time.
- These models could be used in various partly auto-regressive modes, for example by auto-regressively predicting disease while using already available forecasted temperature time series, or by auto-regressively predicting based on data at a range of time lags and ignoring covariates as they become unavailable through the auto-regression. We will not go into that now.
- What we already want to solve is the orchestration of multiple assessment contexts, with the appropriate information made available in each context. This information is divided into a) the information to be used for fitting (training) a model, b) the information to be available when making predictions (excluding the values to be predicted), and c) the true target values (the answers) to compare the predictions against when making the assessment.
- In standard machine learning, the train and test sets are disjoint. In principle, training data could be made available when testing a trained (fitted) model, but there is no use for it. When testing ahead in time with a fitted (trained) time series model, however, time-lagged values are used when making predictions. If the train-test split is made as past-future at a specific time point, then past data will be needed when the fitted model predicts the first future values. Training data is thus also needed when applying the fitted model in the testing phase.
- One way to solve this would be to not make a train-test split of the data at all, and only remove the target values to be predicted from a single dataset made available to the model. The model could then choose which parts of that data to use for fitting parameters and for making the target predictions. Still, it seems natural that a model would only use what is defined as past values (typically ranging into far past history) to fit the model, while using the combination of available (typically near) past values and future values (forecasts for time series other than the target) when making its predictions. Thus, it seems natural to provide only past values (with the target variable available) for the fitting phase, and to provide a combination of past and (partly blinded) future values for the application phase.
- For a subset of models that are based on a well-defined, limited set of lagged values, the data can be provided in a simplified/streamlined form that is common to all such models: as time period entries (data rows) that each contain all the data to be used when making a target prediction (and when learning the relation between predictors and target). This allows a disjoint split into training and test data entries (rows), where the past values needed in the test phase are included as lagged values in each test entry (test data row).
- Each individual prediction task is defined as a combination of the time period for which the target is predicted and the time point at which the prediction is to be made (the time lag at which past data is available when making predictions). That is, predicting January 2024 values when standing in December 2023 is a different prediction task than when standing in November 2023. 
- Also note that if auto-regressively predicting a second period ahead, this corresponds to a prediction task of predicting a value two time periods ahead. That is, while the model may have an inner working of predicting one period at a time, that prediction task is still defined based on the time lag between the time point for which the prediction is made and the time point before which observed data is made available to the model.
- This means that if any time lag ahead is of interest, then the number of prediction tasks that can be defined grows quadratically with the number of time periods. If there are e.g. 5 time periods, one could set the cutoff point just before period 3, 4 or 5. At a given cutoff point, such as just before period 3, one can in turn aim to predict 1, 2 or 3 time periods ahead.
- In principle, one could for each cutoff also let the training period run from any start period to any end period before the cutoff, which would lead to O(N^4) possible prediction contexts. The main point of this note is not that it is meaningful to consider all N^4 contexts, since a slight change in training range should have limited influence on the model. Rather, the point is that the training period need not cover the full period until the cutoff, which allows e.g. saving computational time by reusing the same trained models across cutoff periods.
- The primary dimension in an analytical sense is not the cutoff time, but rather the prediction delay (how far ahead from the cutoff one is predicting). Prediction performance is likely to vary considerably across this dimension (harder the further ahead). The second dimension is the target period to be predicted (equivalent to the cutoff, given a specific value of the prediction delay). One would thus probably mainly be interested in performance across different prediction delays, and secondarily explore whether prediction works similarly well for different target time periods (e.g. according to seasonality). The order in which contexts are assessed can clearly be different from the order suggested by the analytical dimensions.
- A practical assessment approach could be as follows:
  * Loop through model fitting end time point at large step size
    - Loop through relevant time delays for model fitting: for auto-regressive models only one period ahead, for other models one could train a distinct model per desired prediction delay (time ahead)
      * Create training subset, fit model
    - Loop through data availability cutoff times at a finer step size than used for model fitting (e.g. at time step 1), reusing the same fitted model at these finer time steps within the broad model fitting steps
      * Create the dataset of data available at the cutoff. In general this would be the full dataset up until the cutoff, as well as forecasted values (or proxy forecasts, i.e. later observed data) for all time series except the target one (all data except future disease, which is blinded).
      * Simplified data representations could be provided to subsets of methods that operate in ways that allow this. One example is methods that use values for each feature at a specific, limited set of time lags. For such methods, model fitting and model application can be provided with disjoint sets of data entries (rows).
      * For each data availability cut off, loop through 1..k time periods ahead as target period of prediction (1..k prediction delay). 
        - For some methods, this can be run in auto-regressive mode, meaning that the same model is run for many steps, outputting one predicted value at each step (and updating its state with this value in its auto-regressive mode)
        - For other methods, separate models would be run (those trained with different time lags above)
        - Each prediction is stored, indexed by 1) prediction delay, 2) target time point, 3) method
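
The assessment approach above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not a definitive implementation: the toy rows standing in for data.csv, the column names (`period`, `temperature`, `disease`), the helper names (`training_rows`, `available_rows`, `true_targets`, `assess`) and the placeholder "drift" model are all hypothetical. The sketch builds the three kinds of information per context (training data, data available at prediction time with future targets blinded, and true target values), fits one model per prediction delay at coarse fitting steps, reuses those models at every finer cutoff, and stores each prediction indexed by prediction delay, target period and method.

```python
from typing import Callable, Dict, List, Optional, Tuple

Row = Dict[str, Optional[float]]
Model = Callable[[List[Row]], float]

# Hypothetical toy stand-in for data.csv: 6 time periods, one row per period.
ROWS: List[Row] = [
    {"period": t, "temperature": 20.0 + t, "disease": 10.0 * t}
    for t in range(1, 7)
]


def training_rows(rows: List[Row], fit_end: int) -> List[Row]:
    """a) Data for fitting: all periods up to fit_end, target included."""
    return [dict(r) for r in rows if r["period"] <= fit_end]


def available_rows(rows: List[Row], cutoff: int, max_delay: int) -> List[Row]:
    """b) Data available when predicting: the full past up to the cutoff,
    plus future covariates (proxy forecasts) with the target blinded."""
    out = []
    for r in rows:
        if r["period"] > cutoff + max_delay:
            continue
        row = dict(r)
        if r["period"] > cutoff:
            row["disease"] = None  # future disease is blinded
        out.append(row)
    return out


def true_targets(rows: List[Row], cutoff: int, max_delay: int) -> Dict[int, float]:
    """c) True target values to compare the predictions against."""
    return {
        r["period"]: r["disease"]
        for r in rows
        if cutoff < r["period"] <= cutoff + max_delay
    }


def fit_drift_model(train: List[Row], delay: int) -> Model:
    """Placeholder method (an assumption, not from the note): extrapolate the
    mean per-period change seen in training. Needs at least 2 training rows."""
    values = [r["disease"] for r in train]
    drift = (values[-1] - values[0]) / (len(values) - 1)

    def predict(available: List[Row]) -> float:
        observed = [r["disease"] for r in available if r["disease"] is not None]
        return observed[-1] + delay * drift

    return predict


def assess(
    rows: List[Row], fit_ends: List[int], max_delay: int
) -> Dict[Tuple[int, int, str], Tuple[float, float]]:
    """Return {(delay, target_period, method): (predicted, true)}."""
    last_period = rows[-1]["period"]
    results = {}
    for i, fit_end in enumerate(fit_ends):  # coarse model-fitting steps
        train = training_rows(rows, fit_end)
        # One fitted model per desired prediction delay.
        models = {d: fit_drift_model(train, d) for d in range(1, max_delay + 1)}
        next_fit_end = fit_ends[i + 1] if i + 1 < len(fit_ends) else last_period
        for cutoff in range(fit_end, next_fit_end):  # fine cutoff steps (step 1)
            available = available_rows(rows, cutoff, max_delay)
            answers = true_targets(rows, cutoff, max_delay)
            for delay in range(1, max_delay + 1):
                target = cutoff + delay
                if target not in answers:
                    continue  # target lies beyond the observed data
                predicted = models[delay](available)
                results[(delay, target, "drift")] = (predicted, answers[target])
    return results


if __name__ == "__main__":
    for key, value in sorted(assess(ROWS, fit_ends=[2, 4], max_delay=2).items()):
        print(key, value)
```

Because the cutoff is determined by the target period minus the prediction delay, the key (delay, target period, method) identifies each prediction uniquely, and performance can then be grouped by prediction delay first and by target period second.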