import streamlit as st
from PIL import Image
import os

st.title('Machine Learning Operations Pipeline')

st.markdown("""
# Machine Learning Operations (MLOps) Pipeline Documentation

This documentation covers each step of Bioma AI's time-series-forecasting MLOps pipeline.

## Sequential MLOps Steps

The information flow of the pipeline closely resembles that of a regression machine learning task. Model development consists of five sequential steps:

1. Ingestion
2. Transformation
3. Training
4. Evaluation
5. Registration
""")

img = Image.open(os.path.join('experimentation_mlops', 'mlops', 'pics', 'pipeline.png'))
st.image(img, caption="MLOps Pipeline for Bioma AI")

st.markdown("""
## 1. Ingestion

The pipeline extracts raw datasets from the internet (S3 buckets and other cloud services). The dataset is assumed to be one of the following file types: csv, json, parquet, or xlsx. The extracted data is saved as an artifact for documentation purposes.

For time series forecasting, the ingestion step receives data in one of these formats and converts it to a Pandas DataFrame for further processing. The data is downloaded from the web by issuing a request, then converted to parquet before being loaded as a Pandas DataFrame. The parquet file is saved as an artifact for documentation purposes.

## 2. Transformation

The data is split into train, test, and validation sets according to the timeframe of the time-series data. The user can customize the proportion of each set.

Various statistical transformations are applied to a selection of columns; both the columns and the methods are customizable. The methods considered include:

1. Logarithmic
2. Natural Logarithmic
3. Standardization
4. Identity
5. Logarithmic Difference

## 3. Training

The training process falls into one of two types, according to the number of variates being predicted: univariate or multivariate.

Each predictor is either:

1. An endogenous feature (changes in the target's value affect the predictor's value, or the other way around), or
2. An exogenous feature (changes in the predictor's value affect the target's value, but not the other way around)
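
As a rough illustration of the distinctions above, the sketch below records which columns are targets, endogenous predictors, and exogenous predictors, and derives the task type from the number of targets. The `ForecastSpec` class and the column names are hypothetical and are not part of the pipeline's actual code.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ForecastSpec:
    # Hypothetical container for illustration; not part of the pipeline's API.
    targets: List[str]
    endogenous: List[str] = field(default_factory=list)
    exogenous: List[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        if not self.targets:
            raise ValueError('at least one target column is required')
        # A predictor cannot be both endogenous and exogenous.
        overlap = set(self.endogenous) & set(self.exogenous)
        if overlap:
            raise ValueError(
                f'columns listed as both endogenous and exogenous: {sorted(overlap)}'
            )

    @property
    def task_type(self) -> str:
        # One predicted variate is univariate; more than one is multivariate.
        return 'univariate' if len(self.targets) == 1 else 'multivariate'

    @property
    def predictors(self) -> List[str]:
        # All predictor columns, endogenous first.
        return self.endogenous + self.exogenous


# Made-up column names, purely for illustration.
spec = ForecastSpec(
    targets=['price'],
    endogenous=['traded_volume'],
    exogenous=['rainfall'],
)
print(spec.task_type)
print(spec.predictors)
```

Keeping the two predictor groups separate makes the assumed direction of influence explicit, which may matter when choosing a model that treats the two kinds of predictor differently.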