13 ML Project Management

14 Basis for Machine Learning in Companies

A good business process entails a feedback loop from the client (receiver of the output) to further optimize the process:

$$\text{Input} \rightarrow \text{Process} \rightarrow \text{Output} \rightarrow \text{Feedback} \rightarrow \text{Optimization} \rightarrow \text{new Process} \rightarrow \dots$$

Business processes evolve along these phases, whose steps should not be skipped:

1. An individual employee works on the tasks, commonly using informal rules and heuristics.
2. A team works on the tasks. The process is formalized and standardized to ensure quality and effective collaboration. Don’t stay here for too long, since this phase is not scalable.
3. Digitization is used to automate (parts of) the process. This step should come before ML, since you need the data and architecture for the ML part anyway. Here you are more flexible and can fail and adapt more quickly. Don’t stay here too long, since you cannot assess the quality of your process well.
4. Analytics is used to measure the performance of the process and whether its optimizations are successful. Don’t skip this, since you need these indicators for ML optimization and monitoring anyway. Don’t stay here too long, or you will miss out on automation and scalability.
5. Machine learning is used to automate and optimize analysis, insights and decision making on the data. You still need some people from step 2 to analyze outcomes and failures and to react to them (monitoring).

An ML project runs through these steps:

1. Framing the problem
2. Data collection & management
3. Building infrastructure (data pipeline, databases, training & deployment pipelines; these should work at least the same way as in the designated product, unless it’s a quick’n’dirty PoC)
4. Data ingestion, transformation & feature engineering
5. Model selection, training, testing & evaluation
6. Deployment & integration
7. Monitoring
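
To make the steps above more concrete, here is a minimal sketch of how ingestion, training and evaluation might be chained into one pipeline. Everything in it is an assumption for illustration: the function names (`ingest`, `split`, `train`, `evaluate`, `run_pipeline`), the toy `ThresholdModel` and the hard-coded rows are hypothetical and stand in for real data sources and models.

```python
from __future__ import annotations

from dataclasses import dataclass
from statistics import mean


@dataclass
class ThresholdModel:
    """Toy model: predicts 1 when the feature is at or above the threshold."""
    threshold: float

    def predict(self, feature: float) -> int:
        return int(feature >= self.threshold)


def ingest() -> list[tuple[float, int]]:
    # Data collection & ingestion: hypothetical (feature, label) rows.
    return [(1.0, 0), (2.0, 0), (3.0, 0), (7.0, 1), (8.0, 1), (9.0, 1)]


def split(rows, test_every: int = 3):
    # Hold out every third row (starting with the first) for testing.
    train_rows = [r for i, r in enumerate(rows) if i % test_every != 0]
    test_rows = [r for i, r in enumerate(rows) if i % test_every == 0]
    return train_rows, test_rows


def train(rows) -> ThresholdModel:
    # Training: place the threshold halfway between the two class means.
    negatives = mean(x for x, y in rows if y == 0)
    positives = mean(x for x, y in rows if y == 1)
    return ThresholdModel(threshold=(negatives + positives) / 2)


def evaluate(model: ThresholdModel, rows) -> float:
    # Evaluation: share of held-out rows that are predicted correctly.
    return mean(float(model.predict(x) == y) for x, y in rows)


def run_pipeline():
    rows = ingest()
    train_rows, test_rows = split(rows)
    model = train(train_rows)
    return model, evaluate(model, test_rows)


if __name__ == "__main__":
    model, accuracy = run_pipeline()
    print(f"threshold={model.threshold}, held-out accuracy={accuracy}")
```

Keeping each step behind its own function means a quick PoC and the later product can share the same structure, with only the bodies of the individual steps being swapped out.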

ML view:
- What is being predicted?
- What data do we need as target and input?
Software development view:
- What info do we need from users to make a decision? (This defines the API)
- Who will use the service? How many people will that be?
- How is the process conducted today?
Data view:
- What data needs to be collected? From where?
- How do we need to transform the data to analyze it & make decisions on it? (Feature engineering)
- How do we react to the outputs of the algorithm? (e.g. kick off an automatic process, inform stakeholders…)
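
One way to record the answers to these questions is to write them down as types before any model exists. The sketch below assumes a purely hypothetical churn-prediction use case: the field names, the thresholds and the action names are invented for illustration and are not taken from a real project or library.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class PredictionRequest:
    """What we need from the user to make a decision (this defines the API)."""
    customer_id: str
    monthly_usage_hours: float
    open_support_tickets: int


@dataclass(frozen=True)
class PredictionResponse:
    """What is being predicted, plus how we react to it."""
    churn_probability: float
    action: str


def decide_action(probability: float,
                  notify_threshold: float = 0.5,
                  automate_threshold: float = 0.8) -> str:
    # Reaction to the model output: hypothetical thresholds and action names.
    if probability >= automate_threshold:
        return "start_retention_workflow"
    if probability >= notify_threshold:
        return "notify_account_manager"
    return "no_action"


def handle(request: PredictionRequest,
           predict: Callable[[PredictionRequest], float]) -> PredictionResponse:
    # `predict` is any callable returning a probability; the model is pluggable.
    probability = predict(request)
    return PredictionResponse(probability, decide_action(probability))
```

Because `handle` only depends on the request and response types, the service interface can be agreed on with users and developers while the model behind `predict` is still being worked out.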

Common pitfalls:

- Underestimating the effort for data collection, engineering, transformation & ingestion
- Putting too much effort into optimizing the machine learning algorithm instead of getting more data
- Having too few samples, too little diversity (independent attributes), or outdated/unrepresentative data
- Not maintaining the data properly, or not having it available as needed for the project
- Assuming that no oversight from subject matter experts will be necessary for data and target selection, feature engineering and monitoring
- Optimizing for a skewed indicator that causes unwanted side effects in model decisions
- Building models from scratch where pre-trained / off-the-shelf models do the job (especially for text, image, audio and video tasks employing neural networks)
- Having no process for monitoring and retraining
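
As an illustration of the last point, a monitoring process can start out very small. The sketch below assumes one simple heuristic that is not prescribed by the text: it compares the mean of a feature in recent live data against the training data, measured in training standard deviations, and flags the model for retraining when the shift exceeds a limit. The function names and the default limit of `2.0` are hypothetical choices.

```python
from __future__ import annotations

from statistics import mean, stdev


def drift_score(reference: list[float], live: list[float]) -> float:
    """Shift of the live mean from the reference mean, in reference std units."""
    reference_std = stdev(reference)  # sample standard deviation
    if reference_std == 0:
        raise ValueError("reference data has no variation")
    return abs(mean(live) - mean(reference)) / reference_std


def needs_retraining(reference: list[float], live: list[float],
                     max_score: float = 2.0) -> bool:
    # Hypothetical rule: retrain once the live data has moved too far away.
    return drift_score(reference, live) > max_score


if __name__ == "__main__":
    training_values = [10, 12, 11, 9, 13, 10, 11, 12]
    print(needs_retraining(training_values, [11, 10, 12, 11]))
    print(needs_retraining(training_values, [18, 19, 17, 20]))
```

A check like this only covers one feature and one kind of change; in practice the indicators defined in the analytics phase and the judgment of subject matter experts would decide what is monitored and when retraining is actually triggered.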