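Time series functions

Time series functions are aggregate functions that operate on sequences of data values measured at points in time.

The following sections describe some of the time series functions available in the different time series packages.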

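Transforms

A transform is a function that is applied to a time series and produces another time series. The time series library supports various types of transforms, including provided transforms (imported with: from tspy.functions import transformers) and user-defined transforms.

The following sample shows some of the provided transforms: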

# Interpolation
>>> import tspy
>>> from tspy.functions import *
>>> ts = tspy.time_series([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
>>> periodicity = 2
>>> interp = interpolators.nearest(0.0)
>>> interp_ts = ts.resample(periodicity, interp)
>>> interp_ts.print()
TimeStamp: 0     Value: 1.0
TimeStamp: 2     Value: 3.0
TimeStamp: 4     Value: 5.0

# Fillna
shift_ts = ts.shift(2)
print("shifted ts to add nulls")
print(shift_ts)
print("\nfilled ts to make nulls 0s")
null_filled_ts = shift_ts.fillna(interpolators.fill(0.0))
print(null_filled_ts)

shifted ts to add nulls
TimeStamp: 0     Value: null
TimeStamp: 1     Value: null
TimeStamp: 2     Value: 1.0
TimeStamp: 3     Value: 2.0
TimeStamp: 4     Value: 3.0
TimeStamp: 5     Value: 4.0

filled ts to make nulls 0s
TimeStamp: 0     Value: 0.0
TimeStamp: 1     Value: 0.0
TimeStamp: 2     Value: 1.0
TimeStamp: 3     Value: 2.0
TimeStamp: 4     Value: 3.0
TimeStamp: 5     Value: 4.0

# Additive White Gaussian Noise (AWGN)
>>> noise_ts = ts.transform(transformers.awgn(mean=0.0,sd=.03))
>>> print(noise_ts)
TimeStamp: 0     Value: 0.9962378841388397
TimeStamp: 1     Value: 1.9681980879378596
TimeStamp: 2     Value: 3.0289374962174405
TimeStamp: 3     Value: 3.990728648807705
TimeStamp: 4     Value: 4.935338359740761
TimeStamp: 5     Value: 6.03395072999318
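
To make the idea of a transform concrete, the following plain-Python sketch reimplements three of the operations above on a list of (timestamp, value) pairs. It does not use tspy: the helper names shift, fillna, and resample_nearest are hypothetical, and their behavior is inferred from the sample output rather than taken from the library implementation.

```python
# Illustrative sketch only: plain Python, not the tspy implementation.
# A time series is modeled as a list of (timestamp, value) pairs.

def shift(series, n):
    """Move every value n positions later; the first n slots become None."""
    values = [v for _, v in series]
    shifted = [None] * n + values[:max(len(values) - n, 0)]
    return [(t, v) for (t, _), v in zip(series, shifted)]

def fillna(series, fill_value):
    """Replace None values with fill_value."""
    return [(t, fill_value if v is None else v) for t, v in series]

def resample_nearest(series, period):
    """Sample every `period` ticks, taking the value with the nearest timestamp."""
    start, end = series[0][0], series[-1][0]
    result = []
    for t in range(start, end + 1, period):
        nearest = min(series, key=lambda obs: abs(obs[0] - t))
        result.append((t, nearest[1]))
    return result

ts = list(enumerate([1.0, 2.0, 3.0, 4.0, 5.0, 6.0]))
shifted = shift(ts, 2)
filled = fillna(shifted, 0.0)
resampled = resample_nearest(ts, 2)

print(shifted)    # nulls appear at timestamps 0 and 1
print(filled)     # the nulls are replaced by 0.0
print(resampled)  # one observation every 2 ticks
```

Each helper takes a series and returns a new series, which is the defining property of a transform. A user-defined transform would follow the same shape.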

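Segmentation

Segmentation, or windowing, is the process of splitting a time series into multiple segments. The time series library supports various forms of segmentation and also allows you to create user-defined segments.

Window based segmentation

This type of segmentation is based on user-specified segment sizes. The segments can be record based or time based. Several options are available for creating tumbling as well as sliding window based segments.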

>>> import sparktspy as tspy
>>> ts_orig = (tspy.builder()
...     .add(tspy.observation(1,1.0))
...     .add(tspy.observation(2,2.0))
...     .add(tspy.observation(6,6.0))
...     .result().to_time_series())
>>> ts_orig
TimeStamp: 1     Value: 1.0
TimeStamp: 2     Value: 2.0
TimeStamp: 6     Value: 6.0

>>> ts = ts_orig.segment_by_time(3,1)
>>> ts
TimeStamp: 1     Value: original bounds: (1,3) actual bounds: (1,2) observations: [(1,1.0),(2,2.0)]
TimeStamp: 2     Value: original bounds: (2,4) actual bounds: (2,2) observations: [(2,2.0)]
TimeStamp: 3     Value: this segment is empty
TimeStamp: 4     Value: original bounds: (4,6) actual bounds: (6,6) observations: [(6,6.0)]
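
The sliding time window above can be approximated in plain Python as follows. This is a sketch under stated assumptions, not the tspy implementation: the function name and parameters are hypothetical, window bounds are treated as inclusive, and windows stop once the last observation is covered, because that is what the sample output suggests.

```python
# Illustrative sketch only: plain Python, not the tspy implementation.

def segment_by_time(observations, window, step):
    """Slide a window of `window` ticks forward by `step` ticks.

    Assumes (timestamp, value) pairs sorted by timestamp and inclusive bounds.
    """
    first, last = observations[0][0], observations[-1][0]
    segments = []
    start = first
    while start + window - 1 <= last:
        end = start + window - 1
        inside = [(t, v) for t, v in observations if start <= t <= end]
        segments.append(((start, end), inside))
        start += step
    return segments

obs = [(1, 1.0), (2, 2.0), (6, 6.0)]
segments = segment_by_time(obs, window=3, step=1)
for bounds, inside in segments:
    print(bounds, inside if inside else "this segment is empty")
```

Setting step equal to window would give tumbling (non-overlapping) windows instead of sliding ones.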

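Anchor based segmentation

Anchor based segmentation is a very important type of segmentation that creates a segment by anchoring on a specific lambda, which can be a simple value. Examples are looking at the events that preceded a 500 error, or examining values after an anomaly is observed. Variants of anchor based segmentation include providing a range with multiple markers.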

>>> import sparktspy as tspy
>>> ts_orig = tspy.time_series([1.0, 2.0, 3.0, 4.0, 5.0])
>>> ts_orig
TimeStamp: 0     Value: 1.0
TimeStamp: 1     Value: 2.0
TimeStamp: 2     Value: 3.0
TimeStamp: 3     Value: 4.0
TimeStamp: 4     Value: 5.0

>>> ts = ts_orig.segment_by_anchor(lambda x: x % 2 == 0, 1, 2)
>>> ts
TimeStamp: 1     Value: original bounds: (0,3) actual bounds: (0,3) observations: [(0,1.0),(1,2.0),(2,3.0),(3,4.0)]
TimeStamp: 3     Value: original bounds: (2,5) actual bounds: (2,4) observations: [(2,3.0),(3,4.0),(4,5.0)]
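
The anchoring logic can be sketched in plain Python as follows. This is an illustration rather than the tspy implementation: the function and parameter names are hypothetical, and the reading of the two numeric arguments as "ticks before" and "ticks after" the anchor is an assumption based on the bounds in the sample output.

```python
# Illustrative sketch only: plain Python, not the tspy implementation.

def segment_by_anchor(observations, is_anchor, left_delta, right_delta):
    """Build one segment around every observation whose value satisfies is_anchor."""
    segments = []
    for t, v in observations:
        if is_anchor(v):
            lo, hi = t - left_delta, t + right_delta
            inside = [(ts, val) for ts, val in observations if lo <= ts <= hi]
            segments.append((t, (lo, hi), inside))
    return segments

obs = list(enumerate([1.0, 2.0, 3.0, 4.0, 5.0]))
segments = segment_by_anchor(obs, lambda x: x % 2 == 0, 1, 2)
for anchor, bounds, inside in segments:
    print(anchor, bounds, inside)
```

The even values 2.0 and 4.0 sit at timestamps 1 and 3, so two segments are produced, each reaching one tick back and two ticks forward from its anchor.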

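Segmenters

Several specialized segmenters are provided out of the box by importing the segmenters package (using from tspy.functions import segmenters). An example is a segmenter that uses regression to segment a time series: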

>>> ts = tspy.time_series([1.0,2.0,3.0,4.0,5.0,2.0,1.0,-1.0,50.0,53.0,56.0])
>>> max_error = .5
>>> skip = 1
>>> reg_sts = ts.to_segments(segmenters.regression(max_error,skip,use_relative=True))
>>> reg_sts

TimeStamp: 0     Value:   range: (0, 4)   outliers: {}
TimeStamp: 5     Value:   range: (5, 7)   outliers: {}
TimeStamp: 8     Value:   range: (8, 10)   outliers: {}
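
One simple way to think about regression based segmentation is a greedy scan: keep extending the current segment while a least-squares line still fits every point within the error bound, and start a new segment when it does not. The sketch below implements that idea in plain Python. It is not the tspy algorithm: it uses an absolute error bound and does not model the skip or use_relative arguments, and all function names are hypothetical.

```python
# Illustrative sketch only: a greedy least-squares segmenter, not the tspy algorithm.

def fit_line(points):
    """Ordinary least-squares line through (x, y) points; returns (slope, intercept)."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    if sxx == 0:
        return 0.0, mean_y
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def max_residual(points):
    """Largest absolute distance between a point and the fitted line."""
    slope, intercept = fit_line(points)
    return max(abs(y - (slope * x + intercept)) for x, y in points)

def regression_segments(values, max_error):
    """Return (start, end) index ranges, closing a segment when the fit degrades."""
    points = list(enumerate(values))
    ranges = []
    start = 0
    for end in range(1, len(points) + 1):
        if end == len(points) or max_residual(points[start:end + 1]) > max_error:
            ranges.append((start, end - 1))
            start = end
    return ranges

values = [1.0, 2.0, 3.0, 4.0, 5.0, 2.0, 1.0, -1.0, 50.0, 53.0, 56.0]
ranges = regression_segments(values, max_error=0.5)
print(ranges)
```

On this input the greedy scan happens to find the same three ranges as the sample output, but that agreement should not be expected in general.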

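Reducers

A reducer is a function that is applied to the values across a set of time series to produce a single value. The time series reducer functions are similar to the reducer concept used by Hadoop/Spark. The result can be a collection, but more commonly it is a single object. An example of a reducer function is averaging the values in a time series.

Several reducer functions are supported, including:

Distance reducers

Distance reducers are a class of reducers that compute the distance between two time series. The library supports numeric as well as categorical distance functions on sequences. These include time warping distance measurements such as Itakura Parallelogram, Sakoe-Chiba Band, DTW non-constrained, and DTW non-time warped constraints. Distribution distances such as Hungarian distance and Earth-Movers distance are also available.

For categorical time series distance measurements, you can use the Damerau Levenshtein and Jaro-Winkler distance measures.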
```
>>> from tspy.functions import *
>>> ts = tspy.time_series([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
>>> ts2 = ts.transform(transformers.awgn(sd=.3))
>>> dtw_distance = ts.reduce(ts2,reducers.dtw(lambda obs1, obs2: abs(obs1.value - obs2.value)))
>>> print(dtw_distance)
1.8557981638880405
```
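
For readers unfamiliar with dynamic time warping, the following plain-Python sketch shows the classic unconstrained DTW recurrence with an absolute-difference cost, mirroring the lambda passed to reducers.dtw above. It is a textbook formulation, not the library's implementation, and the function name dtw_distance is hypothetical.

```python
# Illustrative sketch only: textbook unconstrained DTW, not the tspy implementation.

def dtw_distance(a, b, cost=lambda x, y: abs(x - y)):
    """Minimum cumulative cost of aligning sequence a with sequence b."""
    inf = float("inf")
    n, m = len(a), len(b)
    # d[i][j] is the best cost of aligning a[:i] with b[:j].
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = cost(a[i - 1], b[j - 1])
            d[i][j] = step + min(d[i - 1][j],      # a advances, b repeats
                                 d[i][j - 1],      # b advances, a repeats
                                 d[i - 1][j - 1])  # both advance
    return d[n][m]

same = dtw_distance([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
stretched = dtw_distance([1.0, 2.0, 3.0], [1.0, 2.0, 2.0, 3.0])
offset = dtw_distance([1.0, 2.0, 3.0], [2.0, 3.0, 4.0])
print(same, stretched, offset)  # 0.0 0.0 2.0
```

The stretched case shows why warping matters: repeating a value in one series costs nothing, whereas a point-by-point comparison could not even align sequences of different lengths.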

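Math reducers

Several convenient math reducers for numeric time series are provided. These include basic ones such as average, sum, standard deviation, and moments. Entropy, kurtosis, FFT and its variants, various correlations, and histogram are also included. A convenient basic summarization reducer is the describe function, which provides basic information about the time series.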

>>> from tspy.functions import *
>>> ts = tspy.time_series([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
>>> ts2 = ts.transform(transformers.awgn(sd=.3))
>>> corr = ts.reduce(ts2, reducers.correlation())
>>> print(corr)
0.9938941942380525

>>> adf = ts.reduce(reducers.adf())
>>> print(adf)
pValue: -3.45
satisfies test: false

>>> ts2 = ts.transform(transformers.awgn(sd=.3))
>>> granger = ts.reduce(ts2, reducers.granger(1))
>>> print(granger) #f_stat, p_value, R2
-1.7123613937876463,-3.874412217575385,1.0
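
As a reference point for the correlation reducer, the following sketch computes the Pearson correlation coefficient of two equally long value sequences in plain Python. Whether reducers.correlation uses exactly this definition is an assumption, and the function name here is hypothetical.

```python
# Illustrative sketch only: Pearson correlation in plain Python.
import math

def correlation(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

values = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
scaled = [2.0 * v + 1.0 for v in values]   # perfectly linear in values
reversed_values = values[::-1]             # perfectly anti-correlated

corr_scaled = correlation(values, scaled)
corr_reversed = correlation(values, reversed_values)
print(corr_scaled, corr_reversed)
```

A series with a small amount of added noise, as in the sample above, would give a coefficient slightly below 1.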

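Another basic reducer that is very useful for getting a first-order understanding of a time series is the describe reducer. The following example illustrates it: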

>>> desc = ts.describe()
>>> print(desc)
min inter-arrival-time: 1
max inter-arrival-time: 1
mean inter-arrival-time: 1.0
top: null
unique: 6
frequency: 1
first: TimeStamp: 0     Value: 1.0
last: TimeStamp: 5     Value: 6.0
count: 6
mean:3.5
std:1.707825127659933
min:1.0
max:6.0
25%:1.75
50%:3.5
75%:5.25
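
Most of the figures in this summary can be reproduced with the Python standard library, as the following sketch shows. The choice of the population standard deviation and of the "exclusive" quartile method is an assumption made because those definitions match the numbers in the sample output. The describe function here is a hypothetical stand-in, not the tspy method.

```python
# Illustrative sketch only: reproduces part of the summary with the standard library.
import statistics

def describe(observations):
    """Summarize (timestamp, value) pairs sorted by timestamp."""
    timestamps = [t for t, _ in observations]
    values = [v for _, v in observations]
    gaps = [b - a for a, b in zip(timestamps, timestamps[1:])]
    # statistics.quantiles defaults to the "exclusive" method.
    q1, q2, q3 = statistics.quantiles(values, n=4)
    return {
        "min inter-arrival-time": min(gaps),
        "max inter-arrival-time": max(gaps),
        "mean inter-arrival-time": statistics.fmean(gaps),
        "count": len(values),
        "mean": statistics.fmean(values),
        "std": statistics.pstdev(values),  # population standard deviation
        "min": min(values),
        "max": max(values),
        "25%": q1,
        "50%": q2,
        "75%": q3,
    }

summary = describe(list(enumerate([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])))
for name, value in summary.items():
    print(f"{name}: {value}")
```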

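Temporal joins

The library includes functions for temporal joins, that is, for joining time series based on their timestamps. The join functions are similar to those in a database and include left, right, outer, inner, left outer, and right outer joins, among others. The following sample code shows some of these join functions: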

# Create a collection of observations (materialized TimeSeries)
observations_left = tspy.observations(tspy.observation(1, 0.0), tspy.observation(3, 1.0), tspy.observation(8, 3.0), tspy.observation(9, 2.5))
observations_right = tspy.observations(tspy.observation(2, 2.0), tspy.observation(3, 1.5), tspy.observation(7, 4.0), tspy.observation(9, 5.5), tspy.observation(10, 4.5))

# Build TimeSeries from Observations
ts_left = observations_left.to_time_series()
ts_right = observations_right.to_time_series()

# Perform full join
ts_full = ts_left.full_join(ts_right)
print(ts_full)

TimeStamp: 1     Value: [0.0, null]
TimeStamp: 2     Value: [null, 2.0]
TimeStamp: 3     Value: [1.0, 1.5]
TimeStamp: 7     Value: [null, 4.0]
TimeStamp: 8     Value: [3.0, null]
TimeStamp: 9     Value: [2.5, 5.5]
TimeStamp: 10     Value: [null, 4.5]

# Perform left align with interpolation
ts_left_aligned, ts_right_aligned = ts_left.left_align(ts_right, interpolators.nearest(0.0))

print("left ts result")
print(ts_left_aligned)
print("right ts result")
print(ts_right_aligned)

left ts result
TimeStamp: 1     Value: 0.0
TimeStamp: 3     Value: 1.0
TimeStamp: 8     Value: 3.0
TimeStamp: 9     Value: 2.5
right ts result
TimeStamp: 1     Value: 0.0
TimeStamp: 3     Value: 1.5
TimeStamp: 8     Value: 4.0
TimeStamp: 9     Value: 5.5
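
The semantics of a timestamp-based join can be sketched with dictionaries in plain Python. The example below is not the tspy implementation and its function names are hypothetical. It models a full join and an inner join over the same observations used above, with None standing in for null.

```python
# Illustrative sketch only: timestamp joins in plain Python, not the tspy implementation.

def full_join(left, right):
    """Keep every timestamp from either side; a missing side becomes None."""
    left_map, right_map = dict(left), dict(right)
    timestamps = sorted(set(left_map) | set(right_map))
    return [(t, [left_map.get(t), right_map.get(t)]) for t in timestamps]

def inner_join(left, right):
    """Keep only timestamps present on both sides."""
    left_map, right_map = dict(left), dict(right)
    timestamps = sorted(set(left_map) & set(right_map))
    return [(t, [left_map[t], right_map[t]]) for t in timestamps]

left = [(1, 0.0), (3, 1.0), (8, 3.0), (9, 2.5)]
right = [(2, 2.0), (3, 1.5), (7, 4.0), (9, 5.5), (10, 4.5)]

full = full_join(left, right)
inner = inner_join(left, right)
for t, pair in full:
    print(t, pair)
print(inner)
```

A left join follows the same pattern but iterates over the left timestamps only. An aligned join with an interpolator additionally fills the missing side with an interpolated value instead of None.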

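Forecasting

A key functionality provided by the time series library is forecasting. The library includes functions for simple as well as complex forecasting models, including ARIMA, Exponential, Holt-Winters, and BATS. The following example shows the function for creating a Holt-Winters model: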

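import random

# samples_per_season = 5 matches the seasonal period reported in the model output below;
# initial_training_seasons = 2 is an assumed value for illustration.
samples_per_season = 5
initial_training_seasons = 2
model = tspy.forecasters.hws(samples_per_season=samples_per_season, initial_training_seasons=initial_training_seasons)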

for i in range(100):
    timestamp = i
    value = random.randint(1,10) * 1.0
    model.update_model(timestamp, value)

print(model)

Forecasting Model
  Algorithm: HWSAdditive=5 (aLevel=0.001, bSlope=0.001, gSeas=0.001) level=6.087789839896166, slope=0.018901997884893912, seasonal(amp,per,avg)=(1.411203455586738,5, 0,-0.0037471500727535465)

# Forecast only if the model has been initialized
if model.is_initialized():
    print(model.forecast_at(120))

6.334135728495107

ts = tspy.time_series([float(i) for i in range(10)])

print(ts)

TimeStamp: 0     Value: 0.0
TimeStamp: 1     Value: 1.0
TimeStamp: 2     Value: 2.0
TimeStamp: 3     Value: 3.0
TimeStamp: 4     Value: 4.0
TimeStamp: 5     Value: 5.0
TimeStamp: 6     Value: 6.0
TimeStamp: 7     Value: 7.0
TimeStamp: 8     Value: 8.0
TimeStamp: 9     Value: 9.0

num_predictions = 5
model = tspy.forecasters.auto(8)
confidence = .99

predictions = ts.forecast(num_predictions, model, confidence=confidence)

print(predictions.to_time_series())

TimeStamp: 10     Value: {value=10.0, lower_bound=10.0, upper_bound=10.0, error=0.0}
TimeStamp: 11     Value: {value=10.997862810553725, lower_bound=9.934621260488143, upper_bound=12.061104360619307, error=0.41277640121597475}
TimeStamp: 12     Value: {value=11.996821082897318, lower_bound=10.704895525154571, upper_bound=13.288746640640065, error=0.5015571318964149}
TimeStamp: 13     Value: {value=12.995779355240911, lower_bound=11.50957896664928, upper_bound=14.481979743832543, error=0.5769793776877866}
TimeStamp: 14     Value: {value=13.994737627584504, lower_bound=12.33653268707341, upper_bound=15.652942568095598, error=0.6437557559526337}

print(predictions.to_time_series().to_df())

timestamp      value  lower_bound  upper_bound     error
0         10  10.000000    10.000000    10.000000  0.000000
1         11  10.997863     9.934621    12.061104  0.412776
2         12  11.996821    10.704896    13.288747  0.501557
3         13  12.995779    11.509579    14.481980  0.576979
4         14  13.994738    12.336533    15.652943  0.643756
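
To show the update-then-forecast pattern without the library, the following sketch implements Holt's linear trend method (double exponential smoothing) in plain Python. It is a different and much simpler model than the ones tspy selects. The class name HoltLinear and the smoothing constants are hypothetical, the method names only mimic those used in the sample above, and evenly spaced timestamps one tick apart are assumed. It produces point forecasts only, with no confidence bounds.

```python
# Illustrative sketch only: Holt's linear trend method, not a tspy forecaster.

class HoltLinear:
    """Double exponential smoothing with an online update step."""

    def __init__(self, alpha=0.5, beta=0.5):
        self.alpha = alpha          # smoothing factor for the level
        self.beta = beta            # smoothing factor for the trend
        self.level = None
        self.trend = None
        self.last_timestamp = None
        self._first = None          # first value, held until a second one arrives

    def update_model(self, timestamp, value):
        if self.level is None:
            if self._first is None:
                self._first = value
            else:
                # Initialize level and trend from the first two values.
                self.level = value
                self.trend = value - self._first
        else:
            previous_level = self.level
            self.level = self.alpha * value + (1 - self.alpha) * (self.level + self.trend)
            self.trend = self.beta * (self.level - previous_level) + (1 - self.beta) * self.trend
        self.last_timestamp = timestamp

    def is_initialized(self):
        return self.level is not None

    def forecast_at(self, timestamp):
        steps = timestamp - self.last_timestamp
        return self.level + steps * self.trend

model = HoltLinear()
for timestamp, value in enumerate(float(i) for i in range(10)):
    model.update_model(timestamp, value)

forecasts = [model.forecast_at(t) for t in range(10, 15)]
print(forecasts)
```

Because the input is perfectly linear, the sketch extrapolates the line, giving point forecasts close to those in the sample output.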

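Time series SQL

The time series library is tightly integrated with Apache Spark. By using new data types in Spark Catalyst, you can perform time series SQL operations that scale out horizontally using Apache Spark. This enables you to easily use time series extensions in IBM Analytics Engine or in solutions that include IBM Analytics Engine functionality, such as the Watson Studio Spark environments.

The SQL extensions cover most aspects of the time series functions, including segmentation, transformations, reducers, forecasting, and I/O. See Analyzing time series data.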

Learn more

To use the tspy Python SDK, see the tspy Python SDK documentation.