I would like to start this part with Bruce Babcock’s well-known phrase: “It is impossible to predict prices, but you do not need to in order to make money.” As someone who has been well acquainted with the concept of a “statistical forecast” since my student days, I did not agree with it. Why? Because any position in the market is a conscious or unconscious forecast of the sign of the future price increment, and partly of its size, since nobody has abolished commissions and slippage. Therefore, a more accurate version is: “It is impossible to predict prices exactly, but in order to make money, this is not necessary.”
Indeed, to make money it is enough to have an effective statistical forecast of the sign of the future price increment, i.e., one for which what we earn on correct forecasts of this sign exceeds what we lose on wrong ones.
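The profitability condition above can be written as a one-line expectation. A minimal sketch, assuming a fixed average gain on correct sign forecasts, a fixed average loss on wrong ones, and a fixed per-trade cost standing in for commissions and slippage (all hypothetical parameters, not taken from the text):

```python
def expected_pnl_per_trade(p_correct, avg_gain, avg_loss, cost):
    """Expected profit per trade of a sign forecast.

    p_correct: probability that the forecast sign is right
    avg_gain:  average profit when it is right (positive number)
    avg_loss:  average loss when it is wrong (positive number)
    cost:      commission plus slippage per trade (positive number)
    """
    return p_correct * avg_gain - (1.0 - p_correct) * avg_loss - cost


def is_effective(p_correct, avg_gain, avg_loss, cost):
    # "Effective" here means a strictly positive expected profit after costs.
    return expected_pnl_per_trade(p_correct, avg_gain, avg_loss, cost) > 0.0


# A 40% hit rate can pay if gains are twice the losses...
print(is_effective(0.40, avg_gain=2.0, avg_loss=1.0, cost=0.1))
# ...while a 55% hit rate can lose once costs are included.
print(is_effective(0.55, avg_gain=1.0, avg_loss=1.0, cost=0.15))
```

The first call prints True (0.8 - 0.6 - 0.1 > 0) and the second prints False (0.55 - 0.45 - 0.15 < 0): the hit rate alone does not decide whether a forecast is effective.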
From the point of view of the piecewise-constant model mentioned in the first part, a special case of Babcock’s statement also holds: “It is impossible to predict the points where the sign of the mean changes, but in order to make money, this is not necessary.” Indeed, if the segments on which the mean is constant are at least 3 points long, one can earn well on the forecast “the trend that began earlier in prices will continue.” This forecast is almost always wrong at the breakpoints of the broken line, but it is relatively accurate at all other points.
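A small simulation can illustrate why the trend-continuation forecast works on a piecewise-constant mean. It is a sketch under illustrative assumptions: the segment lengths, means and noise level are made up, and the forecast is simply “the next increment has the same sign as the last one.”

```python
import numpy as np


def trend_following_hits(increments):
    """Forecast sign(x[t+1]) = sign(x[t]); return True where the forecast was right."""
    x = np.asarray(increments, dtype=float)
    forecast = np.sign(x[:-1])  # "the trend that began earlier will continue"
    actual = np.sign(x[1:])
    return forecast == actual


# Piecewise-constant mean: every segment has at least 3 points and the sign
# of the mean alternates; the noise is small relative to the mean (assumption).
rng = np.random.default_rng(0)
lengths = [5, 3, 7, 4, 6]
means = [0.01, -0.01, 0.01, -0.01, 0.01]
mu = np.concatenate([np.full(n, m) for n, m in zip(lengths, means)])
x = mu + rng.normal(0.0, 0.001, size=mu.size)

hits = trend_following_hits(x)
# Forecast t predicts x[t+1]; it sits at a breakpoint when the mean changes sign there.
breaks = np.flatnonzero(np.diff(np.sign(mu)) != 0)
print("hit rate:", hits.mean())
print("misses at breakpoints:", int((~hits[breaks]).sum()), "of", breaks.size)
```

With noise this small, every miss falls on one of the four breakpoints, so the hit rate is 20/24; longer segments push it closer to 1.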
But trading on such a forecast is often hard to accept, especially for beginners. After all, with such a forecast we never sell at the top and never buy at the bottom, i.e., “we miss a lot of opportunities to earn.”
All of us were beginners once, myself included. So how can one avoid missing these opportunities? Only by building a forecast of the sign of the future increment that is better than the trend-continuation forecast mentioned above, that is, by adding to it the forecast “Tomorrow there will be a move opposite to the trend that has begun.” “Tomorrow” is in quotation marks because it does not have to be a day.
This part is about my unsuccessful search for such a forecast on daily SPY data for 1993-1997 (the data can be downloaded free of charge from Yahoo Finance). Why SPY and not the S&P 500? As I have written many times, the data of the latter has no real gaps.
I started simple: forecasting the sign of the future increment of the logarithm of the daily closing price from past increments of the logarithm of volume, using cross-correlation analysis. As I wrote in the first part, switching to logarithms changes nothing from the standpoint of cross-correlation analysis.
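The cross-correlation step can be sketched as follows. This is an illustration, not the author's code: `log_increments` and `lagged_cross_correlation` are hypothetical helpers, the data are synthetic, and the ±1.96/√n band is the usual large-sample approximation for uncorrelated series.

```python
import numpy as np


def log_increments(series):
    """Increments of the logarithm of a positive series."""
    return np.diff(np.log(np.asarray(series, dtype=float)))


def lagged_cross_correlation(driver, target, max_lag):
    """Sample correlation of driver[t - k] with target[t] for k = 1..max_lag.

    k >= 1 means the driver is in the past relative to the target, so a
    significant value supports a linear forecast of target from the driver.
    """
    d = np.asarray(driver, dtype=float)
    y = np.asarray(target, dtype=float)
    return {k: np.corrcoef(d[:-k], y[k:])[0, 1] for k in range(1, max_lag + 1)}


# Synthetic, independent prices and volumes (hypothetical data).
rng = np.random.default_rng(1)
close = 100.0 * np.exp(np.cumsum(rng.normal(0.0, 0.01, 500)))
volume = np.exp(rng.normal(15.0, 0.3, 500))

dv = log_increments(volume)  # past log-volume increments (driver)
dp = log_increments(close)   # future log-price increments (target)
band = 1.96 / np.sqrt(dp.size)
for k, r in lagged_cross_correlation(dv, dp, max_lag=5).items():
    flag = "outside band" if abs(r) > band else ""
    print(k, round(r, 3), flag)
```

A linear forecast built from the lags that fall outside the band would then be turned into a forecast of the sign simply by taking its sign.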
On the full series, this gave nothing at all. But I did not give up: I removed from the series the segments on which the price increments changed sign at every step, and split the remaining data into two series, one with positive m-grams of increments of log prices (runs of m same-sign increments, m ≥ 2) and one with negative m-grams. Along this path, an unexpected result awaited me: from price data one can try to build a linear forecast of the future increment of volume (cross-correlation analysis is limited to the class of linear forecasts), but the reverse forecast is impossible. Since this was not what I needed, I abandoned work in this direction.
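One way to do the filtering and splitting described above, under my reading that an m-gram is a maximal run of m consecutive increments of the same sign: runs of length 1 (where the sign flips at every step) are dropped and the rest are split by sign. `sign_runs` and `split_m_grams` are hypothetical names; zero increments are treated as a sign of their own and discarded.

```python
import numpy as np


def sign_runs(x):
    """Split x into maximal runs of equal sign; return (start, stop, sign) triples."""
    s = np.sign(np.asarray(x, dtype=float))
    runs, start = [], 0
    for i in range(1, len(s) + 1):
        if i == len(s) or s[i] != s[start]:
            runs.append((start, i, int(s[start])))
            start = i
    return runs


def split_m_grams(x, m_min=2):
    """Index arrays of positive and negative runs with at least m_min elements."""
    pos, neg = [], []
    for start, stop, sgn in sign_runs(x):
        if stop - start >= m_min and sgn != 0:
            (pos if sgn > 0 else neg).append(np.arange(start, stop))
    return pos, neg


# Toy log-price increments; the same index arrays can select the matching
# log-volume increments for the cross-correlation step.
dp = np.array([0.01, 0.02, -0.01, 0.01, -0.02, -0.01, -0.03, 0.02])
pos, neg = split_m_grams(dp)
print([p.tolist() for p in pos], [n.tolist() for n in neg])
```

For this toy input it prints `[[0, 1]] [[4, 5, 6]]`: the isolated increments at positions 2, 3 and 7 are dropped.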
Before the next stage, I had to construct the broken line whose breakpoints I wanted to predict. For this I took a modified Zigzag on log prices. At the first step, it was built like the standard Zigzag with Depth=2 and Backstep=1, but with a variable Deviation at each step, equal to the standard deviation of the previous 10 increments of log closing prices. At the second step, I “straightened” it with an inductive algorithm. First, the RMS of the entire series of increments of log closing prices was computed. If everything up to the i-th point had already been straightened, a corridor from -1.522*RMS to +1.525*RMS was built from this point (well, I like the probability of 0.75, which is obtained for such a corridor with a normal distribution with zero mean and this RMS), and every following Zigzag segment whose highs and lows lay inside this corridor was replaced by a straight line. The end point of the broken line segment following this straight line became the new starting point for the “straightening”.
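Both steps can be sketched as below, with two loud caveats. First, the zigzag is a simplified threshold-only version: it uses the variable deviation (standard deviation of the 10 previous increments) but does not reproduce the Depth/Backstep logic of the standard indicator. Second, `straighten` encodes my reading of the inductive rule, using one symmetric multiplier `k` for the corridor. All function names are hypothetical.

```python
import numpy as np


def zigzag_variable_deviation(y, window=10):
    """Simplified zigzag on log prices with a per-step deviation threshold.

    The threshold at step t is the standard deviation of the `window`
    increments before the one ending at t. A reversal is confirmed when y
    moves against the running extreme by more than the threshold.
    Returns the indices of the turning points (the last one is unconfirmed).
    """
    y = np.asarray(y, dtype=float)
    dy = np.diff(y)
    piv, ext, direction = [0], 0, 0  # direction: +1 rising, -1 falling, 0 unknown
    for t in range(window + 1, len(y)):
        thr = dy[t - window - 1:t - 1].std()
        if direction == 0:
            if abs(y[t] - y[0]) > thr:
                direction, ext = (1 if y[t] > y[0] else -1), t
        elif direction == 1:          # rising leg: ext is the running maximum
            if y[t] > y[ext]:
                ext = t
            elif y[ext] - y[t] > thr:
                piv.append(ext)
                direction, ext = -1, t
        else:                         # falling leg: ext is the running minimum
            if y[t] < y[ext]:
                ext = t
            elif y[t] - y[ext] > thr:
                piv.append(ext)
                direction, ext = 1, t
    if direction != 0 and ext != piv[-1]:
        piv.append(ext)
    return piv


def rms_of_increments(y):
    """Root mean square of the increments of y."""
    dy = np.diff(np.asarray(y, dtype=float))
    return float(np.sqrt(np.mean(dy ** 2)))


def straighten(y, piv, k, rms):
    """Merge zigzag vertices that stay inside a +-k*rms corridor around an anchor.

    Assumed reading: from anchor vertex a, the run of following vertices lying
    in [y[a] - k*rms, y[a] + k*rms] is replaced by a straight line; the first
    vertex outside the corridor ends the next segment and becomes the new anchor.
    """
    y = np.asarray(y, dtype=float)
    out, i = [piv[0]], 0
    while i < len(piv) - 1:
        lo, hi = y[piv[i]] - k * rms, y[piv[i]] + k * rms
        j = i + 1
        while j < len(piv) and lo <= y[piv[j]] <= hi:
            j += 1
        if j == len(piv):           # the rest of the series stays in the corridor
            out.append(piv[-1])
            break
        if j - 1 > i:
            out.append(piv[j - 1])  # end of the straight line
        out.append(piv[j])          # end of the following segment: new anchor
        i = j
    return out
```

For daily data one would call `zigzag_variable_deviation(np.log(close))` and pass its output to `straighten` with `k` around 1.52 and `rms=rms_of_increments(np.log(close))`, where `close` is a hypothetical array of closing prices.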
It is the future breakpoints of this straightened Zigzag that I actually tried to predict.
Again I started simple: with the classic overbought and oversold zones of Stochastic and RSI, using different parameter sets (20 for each indicator). I then computed two shares:
when the indicator is in the overbought zone, the share of points where a rising broken line gives way to a falling one;
when the indicator is in the oversold zone, the share of points where a falling broken line gives way to a rising one.
Alas, for all 20 parameter sets and both indicators, more than 60% of these points lay where the indicator was not in the corresponding zone. I dropped the idea of further optimizing the zone boundaries away from the classic ones because of the high risk of curve-fitting.
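The share computation can be sketched for RSI as below (Stochastic would be handled the same way). Assumptions not in the text: Wilder's smoothing for RSI, the conventional 70/30 zones, and a vertex counted as a top when the broken line rises into it and falls out of it. Early points where RSI is undefined (NaN) count as outside the zone. `rsi_wilder`, `classify_vertices` and `zone_shares` are hypothetical helpers.

```python
import numpy as np


def rsi_wilder(close, period=14):
    """RSI with Wilder's smoothing; NaN until `period` increments are available."""
    c = np.asarray(close, dtype=float)
    d = np.diff(c)
    gain, loss = np.where(d > 0, d, 0.0), np.where(d < 0, -d, 0.0)
    rsi = np.full(c.size, np.nan)
    if d.size < period:
        return rsi
    avg_g, avg_l = gain[:period].mean(), loss[:period].mean()
    for t in range(period, c.size):  # rsi[t] uses increments up to d[t - 1]
        if t > period:
            avg_g = (avg_g * (period - 1) + gain[t - 1]) / period
            avg_l = (avg_l * (period - 1) + loss[t - 1]) / period
        if avg_l == 0:
            rsi[t] = 100.0 if avg_g > 0 else 50.0
        else:
            rsi[t] = 100.0 - 100.0 / (1.0 + avg_g / avg_l)
    return rsi


def classify_vertices(y, vertices):
    """Split interior vertices of a broken line into tops and bottoms."""
    tops, bottoms = [], []
    for a, v, b in zip(vertices[:-2], vertices[1:-1], vertices[2:]):
        if y[a] < y[v] > y[b]:
            tops.append(v)
        elif y[a] > y[v] < y[b]:
            bottoms.append(v)
    return tops, bottoms


def zone_shares(ind, tops, bottoms, upper=70.0, lower=30.0):
    """Share of tops with the indicator >= upper and of bottoms with it <= lower."""
    ind = np.asarray(ind, dtype=float)
    top_share = float(np.mean(ind[tops] >= upper)) if tops else float("nan")
    bot_share = float(np.mean(ind[bottoms] <= lower)) if bottoms else float("nan")
    return top_share, bot_share
```

For one parameter value `p`, `zone_shares(rsi_wilder(close, p), *classify_vertices(np.log(close), vertices))` returns the two shares, with `vertices` taken from the straightened Zigzag. The result reported above (more than 60% of breakpoints outside the zone) corresponds to shares below 0.4.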
My next step was neural networks, or more precisely perceptrons. As perceptron inputs I used OHLCV for the previous 20 days and 400 Stochastic and RSI values for the same 20 days (400 = 20 × 20 parameters). Only perceptrons whose shares of correctly predicted Zigzag breakpoints on the training and test sets were not statistically different were selected. Alas, for these “stable” perceptrons, the share of correctly guessed points turned out to be statistically indistinguishable from ½.
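The selection rule and the final check can be expressed with two standard normal-approximation tests. This is a sketch: the text does not say which tests or significance level were used, so the pooled two-proportion z-test, the one-sample test against ½ and alpha = 0.05 are assumptions, and the hit counts in the example are made up.

```python
import math


def two_proportion_z(k1, n1, k2, n2):
    """Pooled z-test of H0: p1 == p2; returns (z, two-sided p-value)."""
    p1, p2 = k1 / n1, k2 / n2
    p = (k1 + k2) / (n1 + n2)
    se = math.sqrt(p * (1.0 - p) * (1.0 / n1 + 1.0 / n2))
    z = (p1 - p2) / se if se > 0 else 0.0
    return z, math.erfc(abs(z) / math.sqrt(2.0))


def one_proportion_z(k, n, p0=0.5):
    """z-test of H0: p == p0; returns (z, two-sided p-value)."""
    z = (k / n - p0) / math.sqrt(p0 * (1.0 - p0) / n)
    return z, math.erfc(abs(z) / math.sqrt(2.0))


def evaluate(k_train, n_train, k_test, n_test, alpha=0.05):
    """Return (stable, better_than_coin) for one trained perceptron.

    stable: training and test hit rates are not significantly different.
    better_than_coin: the test hit rate is significantly above 1/2.
    """
    _, p_same = two_proportion_z(k_train, n_train, k_test, n_test)
    _, p_half = one_proportion_z(k_test, n_test)
    return p_same >= alpha, (p_half < alpha and k_test / n_test > 0.5)


# Hypothetical hit counts on breakpoints: stable, but indistinguishable from 1/2.
print(evaluate(k_train=104, n_train=200, k_test=52, n_test=100))
```

This prints `(True, False)`: the perceptron passes the stability filter, yet its 52% hit rate on 100 test breakpoints cannot be told apart from a coin toss.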
Finally, I tried building a perceptron for the straightened Zigzag of daily RAO UES, feeding it not only RAO UES prices with their Stochastics and RSI but also the daily prices of Lukoil, Gazprom, Surgutneftegaz and SPY. The result was the same as for SPY.
What exactly did I prove? Only that statistical prediction of turning points is impossible with the methods considered. At the same time, note that the class of functions of the considered (!, i.e., not all known; my note) past information that the perceptron generates is sufficiently wide and, in particular, includes all linear ones. This matters because, according to theory, the best statistical forecast is always some function of the known past (!) information.
What next? I can say for sure that nothing has changed for me in terms of the predicted quantity: I am still interested only in forecasting the breakpoints of the straightened Zigzag on daily data, not on smaller timeframes. One can “play” with the input information, but only within certain limits, because its volume, for example for minute OHLCV over the previous 20 days, is huge and fraught with overfitting.
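A rough count shows why minute data quickly gets out of hand. Assumptions not in the text: a regular US session of 390 one-minute bars per day and about 252 trading days per year, so 1993-1997 gives on the order of 1,260 daily training examples (somewhat fewer for SPY, which started trading in early 1993).

```python
def ohlcv_inputs(days, bars_per_day, fields=5):
    """Number of raw OHLCV inputs for a window of `days` days."""
    return days * bars_per_day * fields


daily_inputs = ohlcv_inputs(20, 1)      # 20 days x 1 bar x 5 fields
minute_inputs = ohlcv_inputs(20, 390)   # 20 days x 390 bars x 5 fields (assumed session)
n_examples = 5 * 252                    # rough count of daily examples, 1993-1997

print(daily_inputs, minute_inputs, n_examples)
print("minute inputs per available example:", minute_inputs / n_examples)
```

This gives 100 daily inputs against 39,000 minute inputs, about 30 times the number of examples; even a single-layer perceptron on minute data would have far more weights than training points.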
Therefore, at this point I stopped trying to forecast turning points and focused on improving trading within the framework of the forecast “the trend in prices that began earlier will continue.” But that is another story.