stefan-jansen/machine-learning-for-trading

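This book aims to show how machine learning (ML) can add value to algorithmic trading strategies in a practical yet comprehensive way. It covers a broad range of ML techniques, from linear regression to deep reinforcement learning, and demonstrates how to build, backtest, and evaluate a trading strategy driven by model predictions.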

In four parts with 23 chapters plus an appendix, it covers on over 800 pages:

important aspects of data sourcing, financial feature engineering, and portfolio management,
the design and evaluation of long-short strategies based on supervised and unsupervised ML algorithms,
how to extract tradeable signals from financial text data like SEC filings, earnings call transcripts, or financial news,
how to use deep learning models like CNN and RNN with market and alternative data, generate synthetic data with generative adversarial networks, and train a trading agent using deep reinforcement learning

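This repo contains over 150 notebooks that put the concepts, algorithms, and use cases discussed in the book into action. They provide numerous examples that show:

how to work with and extract signals from market, fundamental, and alternative text and image data,
how to train and tune models that predict returns for different asset classes and investment horizons, including how to replicate recently published research,
how to design, backtest, and evaluate trading strategies.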

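We highly recommend reviewing the notebooks while reading the book. They are usually in an executed state and often contain additional information that the book omits due to space constraints.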

In addition to the information in this repo, the book's website contains chapter summaries and additional information.

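To make it easy for readers to ask questions about the book's content and code examples, as well as about the development and implementation of their own strategies and industry developments, we are hosting an online platform.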

Please join our community to connect with fellow traders interested in leveraging ML for trading strategies, share your experience, and learn from each other!

First and foremost, this book demonstrates how you can extract signals from a diverse set of data sources and design trading strategies for different asset classes using a broad range of supervised, unsupervised, and reinforcement learning algorithms. It also provides relevant mathematical and statistical knowledge to facilitate the tuning of an algorithm or the interpretation of the results. Furthermore, it covers the financial background that will help you work with market and fundamental data, extract informative features, and manage the performance of a trading strategy.

From a practical standpoint, the 2nd edition aims to equip you with the conceptual understanding and tools to develop your own ML-based trading strategies. To this end, it frames ML as a critical element in a process rather than a standalone exercise, introducing the end-to-end ML for trading workflow from data sourcing, feature engineering, and model optimization to strategy design and backtesting.

More specifically, the ML4T workflow starts with generating ideas for a well-defined investment universe, collecting relevant data, and extracting informative features. It also involves designing, tuning, and evaluating ML models suited to the predictive task. Finally, it requires developing trading strategies to act on the models' predictive signals, as well as simulating and evaluating their performance on historical data using a backtesting engine. Once you decide to execute an algorithmic strategy in a real market, you will find yourself iterating over this workflow repeatedly to incorporate new information and a changing environment.
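The workflow steps above can be sketched end to end in a few lines. This is only a minimal illustration under loud assumptions, not the book's implementation: the prices are synthetic random walks, the single feature (5-day momentum), the plain least-squares model, and the two-long/two-short rule are all hypothetical choices, and the vectorized return calculation stands in for a proper backtesting engine (no costs, no slippage).

```python
import numpy as np
import pandas as pd

# 1. Data: synthetic daily prices for a small, hypothetical universe.
rng = np.random.default_rng(seed=42)
dates = pd.bdate_range("2020-01-01", periods=500)
tickers = ["A", "B", "C", "D", "E", "F"]
returns = pd.DataFrame(rng.normal(0, 0.01, (len(dates), len(tickers))),
                       index=dates, columns=tickers)
prices = 100 * (1 + returns).cumprod()

# 2. Features and target: 5-day momentum predicts the next day's return.
momentum = prices.pct_change(5)
target = prices.pct_change().shift(-1)
data = pd.concat([momentum.stack().rename("momentum"),
                  target.stack().rename("target")], axis=1).dropna()

# 3. Model: ordinary least squares fit on the earlier part of the sample only.
split = dates[350]
in_train = data.index.get_level_values(0) < split
train, test = data[in_train], data[~in_train]
X_train = np.column_stack([np.ones(len(train)), train["momentum"]])
beta, *_ = np.linalg.lstsq(X_train, train["target"].to_numpy(), rcond=None)
pred = beta[0] + beta[1] * test["momentum"]

# 4. Strategy: each day, go long the 2 highest and short the 2 lowest predictions.
n = 2
ranks = pred.unstack().rank(axis=1)
weights = ((ranks > len(tickers) - n).astype(float)
           - (ranks <= n).astype(float)) / n

# 5. Evaluation: out-of-sample daily returns and an annualized Sharpe ratio.
strategy_returns = (weights * target.loc[weights.index]).sum(axis=1)
sharpe = strategy_returns.mean() / strategy_returns.std() * np.sqrt(252)
print(f"Out-of-sample days: {len(strategy_returns)}, Sharpe: {sharpe:.2f}")
```

Because the synthetic returns are pure noise, the resulting Sharpe ratio carries no meaning. The point is the shape of the loop: each stage consumes the previous stage's output, and the model only ever sees data that precedes the period it is evaluated on.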

The second edition's emphasis on the ML4T workflow translates into a new chapter on strategy backtesting, a new appendix describing over 100 different alpha factors, and many new practical applications. We have also rewritten most of the existing content for clarity and readability.

The trading applications now use a broader range of data sources beyond daily US equity prices, including international stocks and ETFs. The second edition also demonstrates how to use ML for an intraday strategy with minute-frequency equity data. Furthermore, it extends the coverage of alternative data sources to include SEC filings for sentiment analysis and return forecasts, as well as satellite images to classify land use.

Another innovation of the second edition is to replicate several trading applications recently published in top journals:

Chapter 18 demonstrates how to apply convolutional neural networks to time series converted to image format for return predictions based on Sezer and Ozbayoglu (2018).
Chapter 20 shows how to extract risk factors conditioned on stock characteristics for asset pricing using autoencoders based on Autoencoder Asset Pricing Models by Shihao Gu, Bryan T. Kelly, and Dacheng Xiu (2019), and
Chapter 21 shows how to create synthetic training data using generative adversarial networks based on Time-series Generative Adversarial Networks by Jinsung Yoon, Daniel Jarrett, and Mihaela van der Schaar (2019).

All applications now use the latest available (at the time of writing) software versions such as pandas 1.0 and TensorFlow 2.2. There is also a customized version of Zipline that makes it easy to include machine learning model predictions when designing a trading strategy.

The code examples rely on a wide range of Python libraries from the data science and finance domains.

It is not necessary to try to install all libraries at once because this increases the likelihood of encountering version conflicts. Instead, we recommend that you install the libraries required for a specific chapter as you go along.

Update March 2022: zipline-reloaded, pyfolio-reloaded, alphalens-reloaded, and empyrical-reloaded are now available on the conda-forge channel. The channel ml4t only contains outdated versions and will soon be removed.

Update April 2021: with the update of Zipline, it is no longer necessary to use Docker. The installation instructions now refer to OS-specific environment files that should simplify your running of the notebooks.

Update February 2021: code sample release 2.0 updates the conda environments provided by the Docker image to Python 3.8, Pandas 1.2, and TensorFlow 1.2, among others; the Zipline backtesting environment now uses Python 3.6.

The installation directory contains detailed instructions on setting up and using a Docker image to run the notebooks. It also contains configuration files for setting up various conda environments and installing the packages used in the notebooks directly on your machine if you prefer (and, depending on your system, are prepared to go the extra mile).

To download and preprocess many of the data sources used in this book, see the instructions in the README file alongside various notebooks in the data directory.

If you have any difficulties installing the environments, downloading the data, or running the code, please raise a GitHub issue in the repo (here). Working with GitHub issues is described here.

Update: You can download the algoseek data used in the book here. See Chapter 2 for preprocessing instructions and Chapter 12 for an intraday example with a gradient boosting model.

Update: The figures directory contains color versions of the charts used in the book.

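The book has four parts that address different challenges that arise when sourcing and working with market, fundamental, and alternative data; developing ML solutions to various predictive tasks in the trading context; and designing and evaluating a trading strategy that relies on predictive signals generated by an ML model.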

The directory for each chapter contains a README with additional information on content, code examples, and additional resources.

01 Machine Learning for Trading: From Idea to Execution
02 Market & Fundamental Data: Sources and Techniques
03 Alternative Data for Finance: Categories and Use Cases
04 Financial Feature Engineering: How to research Alpha Factors
05 Portfolio Optimization and Performance Evaluation
06 The Machine Learning Process
07 Linear Models: From Risk Factors to Return Forecasts
08 The ML4T Workflow: From Model to Strategy Backtesting
09 Time Series Models for Volatility Forecasts and Statistical Arbitrage
10 Bayesian ML: Dynamic Sharpe Ratios and Pairs Trading
11 Random Forests: A Long-Short Strategy for Japanese Stocks
12 Boosting your Trading Strategy
13 Data-Driven Risk Factors and Asset Allocation with Unsupervised Learning
14 Text Data for Trading: Sentiment Analysis
15 Topic Modeling: Summarizing Financial News
16 Word embeddings for Earnings Calls and SEC Filings
17 Deep Learning for Trading
18 CNN for Financial Time Series and Satellite Images
19 RNN for Multivariate Time Series and Sentiment Analysis
20 Autoencoders for Conditional Risk Factors and Asset Pricing
21 Generative Adversarial Nets for Synthetic Time Series Data
22 Deep Reinforcement Learning: Building a Trading Agent
23 Conclusions and Next Steps
24 Appendix - Alpha Factor Library

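The first part provides a framework for developing trading strategies driven by machine learning (ML). It focuses on the data that power the ML algorithms and strategies discussed in this book, outlines how to engineer and evaluate features suitable for ML models, and explains how to manage and measure a portfolio's performance while executing a trading strategy.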

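Chapter 1 explores the industry trends that have led to the emergence of ML as a source of competitive advantage in the investment industry. It also looks at where ML fits into the investment process to enable algorithmic trading strategies.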

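More specifically, it covers the following topics:

Key trends behind the rise of ML in the investment industry
The design and execution of a trading strategy that leverages ML
Popular use cases for ML in trading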

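Chapter 2 shows how to work with market and fundamental data and describes critical aspects of the environment that they reflect. For example, familiarity with various order types and the trading infrastructure matters not only for interpreting the data but also for correctly designing backtest simulations. It also illustrates how to use Python to access and manipulate trading and financial statement data.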

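Practical examples demonstrate how to work with NASDAQ tick data and Algoseek minute bar data, which offer a rich set of attributes capturing the demand-supply dynamic that we will later use for an ML-based intraday strategy. We also cover various data provider APIs and how to source financial statement information from the SEC.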
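To make the relationship between tick data and minute bars concrete, the sketch below aggregates trades into one-minute OHLC bars with volume and VWAP using pandas. The ticks are randomly generated and the column names (`price`, `shares`) and the helper `minute_bars` are hypothetical; real NASDAQ ITCH or Algoseek data have their own schemas and require parsing first.

```python
import numpy as np
import pandas as pd

# Hypothetical tick data: 1,000 trades spread over the first 30 minutes of a session.
rng = np.random.default_rng(0)
n_ticks = 1_000
seconds = np.sort(rng.uniform(0, 30 * 60, n_ticks))
timestamps = pd.Timestamp("2021-01-04 09:30") + pd.to_timedelta(seconds, unit="s")
ticks = pd.DataFrame({
    "price": 100 + rng.normal(0, 0.02, n_ticks).cumsum(),
    "shares": rng.integers(1, 500, n_ticks),
}, index=timestamps)


def minute_bars(ticks: pd.DataFrame) -> pd.DataFrame:
    """Summarize trades as one-minute bars: OHLC, volume, and VWAP."""
    grouped = ticks.resample("1min")
    bars = grouped["price"].ohlc()               # open, high, low, close
    bars["volume"] = grouped["shares"].sum()     # shares traded per minute
    dollar_volume = (ticks["price"] * ticks["shares"]).resample("1min").sum()
    bars["vwap"] = dollar_volume / bars["volume"]  # volume-weighted average price
    return bars.dropna()                         # drop minutes without any trades


bars = minute_bars(ticks)
print(bars.head())
```

Time bars are only one option. Sampling by a fixed amount of traded volume or dollar value instead of a fixed time interval follows the same pattern with a different grouping key.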

In particular, this chapter covers:

How market data reflects the structure of the trading environment
Working with intraday trade and quote data at minute frequency
Reconstructing the **limit order book** from tick data using NASDAQ ITCH
Summarizing tick data using various types of bars
Working with eXtensible Business Reporting Language (XBRL)-encoded **electronic filings**
Parsing and combining market and fundamental data to create a P/E series
How to access various market and fundamental data sources using Python
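As an illustration of combining market and fundamental data into a P/E series, the following sketch divides daily prices by trailing twelve-month earnings per share. All numbers are made up, and the assumption that each EPS value is indexed by the date its filing became public is ours; with real data, the EPS figures would first have to be parsed from the XBRL filings.

```python
import numpy as np
import pandas as pd

# Hypothetical quarterly EPS, indexed by the date each filing became public.
eps = pd.Series(
    [1.0, 1.1, 1.2, 1.3, 1.4, 1.5],
    index=pd.to_datetime(["2019-02-01", "2019-05-01", "2019-08-01",
                          "2019-11-01", "2020-02-03", "2020-05-01"]),
    name="eps",
)

# Trailing twelve-month EPS: the sum of the four most recent quarters.
ttm_eps = eps.rolling(4).sum().dropna()

# Hypothetical daily closing prices on business days.
dates = pd.bdate_range("2019-11-01", "2020-06-30")
prices = pd.Series(np.linspace(92.0, 108.0, len(dates)), index=dates, name="close")

# Carry each TTM value forward from its filing date until the next filing,
# so that every day only uses earnings that were already public.
ttm_aligned = ttm_eps.reindex(prices.index, method="ffill")
pe = (prices / ttm_aligned).rename("pe")
print(pe.tail())
```

Forward-filling from the filing date, rather than from the fiscal period end, is the design choice that avoids lookahead bias: the ratio on any given day reflects only information an investor could have had at the time.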

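Chapter 3 outlines categories and use cases of alternative data, describes criteria to assess the exploding number of sources and providers, and summarizes the current market landscape.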
