CAI-X

Project guide

Here you will find (almost) everything you need to know when managing an AI project. The guide has its starting point in the Region of Southern Denmark, but many of the issues and solutions apply to all AI projects regardless of geography.

Legislation such as the General Data Protection Regulation (GDPR) and the Danish Health Act forms a framework of rules that affect AI projects. Compliance with the data security regulations is easier when data is anonymised and/or managed in your organisation's own data centres.

Always consider:

  1. Is data personal?

    Personal data is any information that can be connected with a particular person. This applies even if the person is identifiable only when the information is combined with other information, e.g. information accessible to the general practitioner.
    If a private company is part of the project, a data processing agreement, a data transfer agreement for third countries, a risk assessment and a Data Protection Impact Assessment (DPIA) may be required.

  2. Should you use your own data centre or a cloud solution?

    As long as data is in the Region of Southern Denmark's data centre, it is easier to comply with the formal requirements for securing personal data. The University of Southern Denmark can also help with data handling in research projects. It becomes more complicated, however, if you cooperate with a supplier who processes data in the cloud. It is always advisable to assess the options for each project.
    It is both easier and faster if the project can place both the data and the software that processes the data in the region's own data centre. This applies to training, developing and running an algorithm. The Region of Southern Denmark therefore recommends investigating whether data and algorithm can be placed in the region's own data centre.

Contact [email protected] if you have any questions regarding the handling of personal data and the need for processing capability in the data centre (only applies to staff in the Region of Southern Denmark).
 

In the Region of Southern Denmark, you can access data for analyses in several ways - one is through the region's Department of Documentation and Business Intelligence. The department manages a comprehensive data warehouse of patient data and can help select and organise specific data tables with both codes and text.

Contact [email protected] if you have questions regarding patient data - not images (only applies to staff in the Region of Southern Denmark).

This part covers key points about data and the course of the project.

Supervised machine learning is a popular AI tool. The idea is to build a data set with a number of input variables (A) and one outcome variable (B) - ideally, B has only two or three possible values.

Data analysis (BI) can show the connections in historical data, e.g. that a particular group of input variables has the greatest significance for B. The aim is to form a hypothesis about correlations in the data, which helps define the data set used in the project. With machine learning, an algorithm is then trained on the basis of the hypotheses to predict the value of B.
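
As a minimal sketch of the idea of input variables (A) and one outcome variable (B), the example below trains a very simple rule on a made-up data set. The data, the variable meanings (age, prior admissions, readmission) and the function names are assumptions for illustration only. The model is a one-variable threshold rule (a "decision stump"), chosen here because it is easy to read, not because the guide prescribes it.

```python
from typing import List, Tuple

# Hypothetical toy data: each row holds input variables A (age, number of
# prior admissions) and B is a binary outcome (1 = readmitted, 0 = not).
A: List[Tuple[float, float]] = [
    (34, 0), (51, 1), (67, 3), (72, 4), (45, 0), (80, 5), (59, 2), (38, 1),
]
B: List[int] = [0, 0, 1, 1, 0, 1, 1, 0]


def train_stump(A, B):
    """Find the single variable and threshold that best separate B.

    Returns (feature_index, threshold, training_accuracy) for the rule
    "predict 1 when A[feature_index] >= threshold".
    """
    best = (0, 0.0, -1.0)
    n_features = len(A[0])
    for j in range(n_features):
        # Try every observed value of variable j as a candidate threshold.
        for threshold in sorted({row[j] for row in A}):
            predictions = [1 if row[j] >= threshold else 0 for row in A]
            accuracy = sum(p == b for p, b in zip(predictions, B)) / len(B)
            if accuracy > best[2]:
                best = (j, threshold, accuracy)
    return best


def predict(stump, row):
    """Apply the trained rule to one new row of input variables."""
    feature_index, threshold, _ = stump
    return 1 if row[feature_index] >= threshold else 0


stump = train_stump(A, B)
print(stump, predict(stump, (70, 1)))
```

In a real project the accuracy would be measured on data the model has not seen during training, not on the training data as in this sketch.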

Deep learning also works with A and B, but the difference is that B is predicted from A through artificial neurons that the model forms itself during training. It is relevant for image recognition. The result may be harder to explain, which can be a drawback for validation and for acceptance by both clinicians and patients.

For both approaches mentioned above, it is an advantage to have:

  • plenty of A data (causes) and simple outcomes of B (results)
  • an easily explainable model

When the hypothesis and the data sets are ready, the project can expect challenges with the data quality:

  • the data set is incomplete
  • registration practices differ between persons, between departments or over time

Validating data takes a lot of time, and data must be refreshed several times during the project period.
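
The two data quality challenges above can be checked with a few lines of code. The sketch below is only an illustration: the records, field names and function names are made up, and it assumes a missing registration is stored as `None`. It measures how often a field is missing overall and per department or year, which gives a first indication of incomplete data and of differing registration practices.

```python
from collections import defaultdict

# Hypothetical records; None marks a value that was never registered.
records = [
    {"department": "A", "year": 2019, "blood_pressure": 120, "smoker": None},
    {"department": "A", "year": 2020, "blood_pressure": None, "smoker": "no"},
    {"department": "B", "year": 2019, "blood_pressure": 135, "smoker": "yes"},
    {"department": "B", "year": 2020, "blood_pressure": 128, "smoker": "no"},
]


def missing_share(rows, field):
    """Share of rows in which `field` is missing (None)."""
    return sum(row[field] is None for row in rows) / len(rows)


def missing_share_by(rows, field, group_key):
    """Missing share of `field` per value of `group_key`, e.g. per department."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_key]].append(row)
    return {key: missing_share(group, field) for key, group in groups.items()}


print(missing_share(records, "smoker"))
print(missing_share_by(records, "blood_pressure", "department"))
print(missing_share_by(records, "smoker", "year"))
```

Large differences between groups suggest that registration practice, not the patients, explains part of the pattern in the data.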

Below is some advice for the phases of an AI project:

  1. Qualify the idea

    Find the particular part of the decision-making process where extra information enables you to make better and faster decisions. What would the screen presenting your possible actions look like? The solution must be valuable for the organisation and/or the patient.

  2. Design a test

    - Specify which measures the solution should present to the end user.
    - Make hypotheses about connections in the data.
    - Derive the data needs and data sources.
    - Be as specific as possible about the individual data fields.
    - Start with a simple model.
    - Validate data over several iterations.

  3. IT setup

    - Search the market for IT services to handle the relevant data.
    - There are many open source resources.
    - Draw how data flows, is handled and is presented to the end user.
    - Assign the roles, including data scientists, domain experts and the end users of the algorithm's results.

  4. Project proposal

    Describe the project in terms of purpose, aim, activity plan, risk management, organisation, budget and assessment. Anchor the project in the organisation so that it can be rolled out if successful.

  5. Carry the project through

    Carrying out an AI project is by no means trivial. Expect to spend a significant amount of time on data processing and development of algorithms. Data challenges are often time-consuming. Conclude the assessment with a plan for a possible scale-up.

  6. Move the solution from project to operation

    The change from innovation or research to operation entails a lot of challenges and tasks, among others:
    - organisational acceptance
    - procedure for continuous validation of data quality
    - requirements for real-time data processing
    - CE marking
    - rules on automated decision-making
    - information security
    - information for patients

Artificial intelligence makes decisions differently from human beings. Consequently, the results of an algorithm can raise challenges with bias and transparency.

  1. Bias

    Generally, algorithms are built to suggest the decisions that historical data shows were made in the past. This leads to historical generalisation and to inertia when guidelines for correct decisions change. To handle this, the algorithm must be validated continuously and, for example, the data that creates bias relative to current guidelines must be reset.

  2. Explanation

    If we do not understand the algorithm, we cannot discuss it, improve it or perhaps even accept it. For that reason, it is recommended to work with:
    - simple models (e.g. decision trees)
    - comparison of the algorithm's capabilities with human capabilities
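
One way to compare the algorithm's capabilities with human capabilities is to score both against the same confirmed outcomes. The sketch below is an illustration under assumptions: the outcome lists are made up, the function name is hypothetical, and sensitivity and specificity are used as the comparison measures although the guide does not name specific metrics.

```python
def sensitivity_specificity(truth, predicted):
    """Return (sensitivity, specificity) for binary labels (1 = positive).

    Assumes `truth` contains at least one positive and one negative case.
    """
    pairs = list(zip(truth, predicted))
    true_pos = sum(t == 1 and p == 1 for t, p in pairs)
    false_neg = sum(t == 1 and p == 0 for t, p in pairs)
    true_neg = sum(t == 0 and p == 0 for t, p in pairs)
    false_pos = sum(t == 0 and p == 1 for t, p in pairs)
    sensitivity = true_pos / (true_pos + false_neg)
    specificity = true_neg / (true_neg + false_pos)
    return sensitivity, specificity


# Hypothetical confirmed outcomes and the two sets of assessments.
truth = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
algorithm = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
clinician = [1, 1, 0, 0, 0, 0, 0, 0, 0, 1]

algorithm_scores = sensitivity_specificity(truth, algorithm)
clinician_scores = sensitivity_specificity(truth, clinician)
print("algorithm:", algorithm_scores)
print("clinician:", clinician_scores)
```

In this made-up example the algorithm finds more of the positive cases, while the clinician raises fewer false alarms. Presenting both figures side by side makes the trade-off open to discussion.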

Tools for data ethics when working with pooling of data (in Danish)