AI Radiologist: Teaching Computers to Recognize Brain Hemorrhage

Hello!
We are Rashid and Shaan, engineers working on AI and data analysis projects at Koozyt.
Recently, we took part in a Kaggle competition titled "RSNA Intracranial Hemorrhage Detection". Kaggle is a website that provides the datasets, computing resources, and other infrastructure needed to run machine learning contests.
(From here on, the post continues in English. Please bear with us.)

The brain is one of the most sensitive regions of the human body, and to be frank, we basically are our brains. The rest of us is just supporting structure that helps the brain do its thing, like watching cat videos on YouTube! So anything bad happening in or around the brain is SCARY!

For this competition, we tried to automate the detection of a condition called a "brain hemorrhage", or more specifically an "intracranial hemorrhage" (saying that name does put some stress on the teeth), using machine learning.

Wait! Does that mean computers can replace human doctors? Well, fortunately, or unfortunately, NO! Not for a decade or two (or maybe never). This is just an experiment to see how far machine learning can go.

A Short Talk on Brain Hemorrhages

Let's talk about one of the not-so-uncommon mishaps associated with the brain: a brain hemorrhage. It is a kind of stroke (poor blood flow in the brain caused by a blocked vessel or by bleeding). Brain hemorrhages have various causes, such as high blood pressure, weak blood vessels that may leak, drug abuse, and trauma (getting hit on the head, for example).

So what are the signs?

  • Headache (not always present)
  • Other symptoms depend on where in the brain the bleeding occurred:
    • Vision problems
    • Balance and coordination problems
    • Numbness or sudden seizures
    • Trouble speaking
    • Complete unresponsiveness or even coma
  • Symptoms may appear suddenly or gradually

How is it Identified?

Computed tomography (CT) and magnetic resonance imaging (MRI) scans are two common methods. Diagnosis is an urgent procedure: when a patient shows acute neurological symptoms such as a severe headache or loss of consciousness, highly trained specialists review medical images of the patient's cranium to look for the presence, location, and type of hemorrhage. The process is complicated and often time-consuming.

                       Some MRI examples of brains with hemorrhage.

                             Courtesy of: https://radiopaedia.org/

Intracranial Hemorrhages

Bleeding anywhere inside the skull is called an intracranial hemorrhage. There are several types as depicted in the figure below:

              Image courtesy of Kaggle RSNA Intracranial Hemorrhage Detection

Kaggle Competition:

The Kaggle competition was organized by the RSNA (Radiological Society of North America). The objective was simple, at least by description: detect the intracranial hemorrhage sub-types described above. One or more sub-types can be present at the same time.

Images are fed into the magical AI black box, which tries to detect hemorrhage in various regions and outputs six results: a probability for each of the five hemorrhage sub-types, plus an overall "any" label indicating whether any hemorrhage is present.

So we detect the five sub-types independently (more than one can be present in a single CT scan image).
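
For reference, the six outputs look like this. A minimal illustration in Python; the probability values below are made up:

    # The six labels predicted for every CT slice: five hemorrhage
    # sub-types plus an overall "any" flag.
    LABELS = [
        "epidural",
        "intraparenchymal",
        "intraventricular",
        "subarachnoid",
        "subdural",
        "any",
    ]

    # Example (made-up) output for one image: the probabilities are
    # independent, so several sub-types can be high at the same time.
    example_prediction = {
        "epidural": 0.02,
        "intraparenchymal": 0.81,
        "intraventricular": 0.65,
        "subarachnoid": 0.04,
        "subdural": 0.03,
        "any": 0.88,
    }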

Data Provided:

The provided image data is in DICOM format, a very common medical image format. DICOM stands for Digital Imaging and Communications in Medicine and is the standard format for images generated by medical imaging devices such as CT scanners, MRI machines, and X-ray equipment. Aside from the pixel data, DICOM files also contain associated metadata, such as patient information.
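
A DICOM file can be inspected with the pydicom library, for example. A minimal sketch; the file name below is hypothetical:

    import pydicom  # pip install pydicom

    # Read one CT slice (hypothetical file name)
    ds = pydicom.dcmread("ID_000012eaf.dcm")

    # Metadata lives alongside the pixel data
    print(ds.PatientID, ds.Modality, ds.StudyInstanceUID)

    # Raw pixel array (stored values), converted to Hounsfield Units
    # using the rescale slope/intercept stored in the file
    hu = ds.pixel_array * float(ds.RescaleSlope) + float(ds.RescaleIntercept)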

The competition was divided into 2 stages. In Stage 1, we were given 674,257 DICOM images with their associated labels to train on, and had to predict labels for another set of 78,545 DICOM images. In Stage 2, the labels for the Stage 1 test images were released, so we could also use those images to retrain our model. Then, finally, we had to predict a final test set consisting of 727,392 images. That's a lot of data to work with!

Workflow:

  • Apply Necessary Windows on Images:
    When radiologists interpret a CT scan, they generally apply 5 different types of "windowing".
    Well, what IS windowing? Windowing can be thought of as a function that, when applied to an image, highlights a specific range of values in that image. DICOM images are 16-bit, which means a single image can store 2^16 = 65,536 distinct values. But humans can only distinguish roughly 100 levels of gradation in a single grayscale image. So, to accurately interpret a DICOM image, we have to focus on a narrower range of values, and that is the job of the windowing functions. A window is defined by 2 variables: the window center L and the window width W. When applied, the window function keeps only the values in the range (L - W/2) to (L + W/2). L and W are measures of radiodensity expressed in Hounsfield Units (HU). Radiologists apply the following 5 types of windowing:
    1. Brain Window: W:80 L:40
    2. Blood/Subdural Window: W:130-300 L:50-100
    3. Bone Window: W:2800 L:600
    4. Grey Matter Window: W:8 L:32 or W:40 L:40
    5. Soft Tissue Window: W:350-400 L:20-60

Figure: Example of the effect of different window functions applied to an image.


Now, one could argue that humans have these limitations when interpreting images, but computers do not: a model could in principle pick up information that windowing throws away and that is invisible to humans. We did not pursue that line of thought, but it is an interesting area of research. Our assumption was that, since a windowed image contains less information and focuses on the important regions, it helps the model by removing what can be treated as noise. Also, we applied only 3 window functions (brain, subdural, and soft tissue) and stacked them to obtain 3-channel images as input to the model; a minimal sketch of this step follows below.
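
The sketch below shows one way to implement such a window function and stack the three windowed images into a 3-channel input. It assumes the pixel values have already been converted to Hounsfield Units, and the subdural and soft-tissue center/width values are one choice within the ranges listed above:

    import numpy as np

    def apply_window(hu_image, center, width):
        """Clip an HU image to [center - width/2, center + width/2] and scale to [0, 1]."""
        low, high = center - width / 2, center + width / 2
        windowed = np.clip(hu_image, low, high)
        return (windowed - low) / (high - low)

    def windowed_stack(hu_image):
        """Stack brain, subdural, and soft-tissue windows as a 3-channel image."""
        brain = apply_window(hu_image, center=40, width=80)
        subdural = apply_window(hu_image, center=80, width=200)     # within the L:50-100 / W:130-300 range
        soft_tissue = apply_window(hu_image, center=40, width=380)  # within the L:20-60 / W:350-400 range
        return np.stack([brain, subdural, soft_tissue], axis=-1)    # H x W x 3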

  • Remove Images that Contain Little Brain Data:
    A lot of images in the data contained little to no brain tissue, and those almost never showed any hemorrhage. We could get the percentage of brain tissue in an image from the image_pct_window value computed for that DICOM image. We found that, for images with less than 25% brain tissue, 94% contained no hemorrhage, so we could safely remove them from training. In this way we got rid of 82,314 images (a rough sketch of this filter appears after this list).
  • Perform Necessary Cropping to Focus on Only Brain Region in the Image:
    We used OpenCV to crop the images down to the brain region. When a CT scan is taken, the headrest almost always gets included in the image, so we tried to remove all such unnecessary areas and keep only the brain (see the cropping sketch after this list).
  • Use Transformations and Augmentations:
    We applied the following transformations randomly to the training images to generate more varied data so that the model can learn better (illustrated in the augmentation sketch after this list):
    1) Horizontal and Vertical Flips
    2) Lighting Change
    3) Applying Zoom
    4) Applying Warp
  • Choose Appropriate Models:
    We experimented with different pre-trained models to solve this problem, including VGG16, VGG19, MobileNet, Inception, ResNet, SE-ResNeXt, and finally EfficientNet. These are all models with very deep architectures. Because of the large amount of training data and the size of the models, training took a very long time: one epoch could take as long as 3-5 hours, meaning that in 3-5 hours the model got to see the training data only once! So, unfortunately, we could not train for longer or experiment with many different hyperparameters. Our final score came from an ensemble of EfficientNet (v1) and Inception-v3 models. Ensembling improved our score a lot, which is almost always the case in machine learning (a minimal ensembling sketch follows this list).
  • One-Cycle Training:
    The learning rate is one of the most important hyperparameters of a machine learning model. As the name implies, it controls how large the weight updates are at each training step, i.e. how fast the model learns.

                                    Source: Jeremy Jordan Blogpost

    The general rule of thumb is to start with a higher learning rate and decay it gradually, based on different factors: either decay constantly after each epoch, or decay suddenly by a factor when the metric stops improving for a certain number of epochs.


    But recently, in a paper, Leslie Smith proposed a new learning rate policy in which, instead of starting with a high learning rate, training goes through a number of cycles during which the learning rate oscillates between a lower and an upper bound. The idea was later refined: Smith suggested using just one cycle, in which both the learning rate and the momentum (another hyperparameter) vary. This method can produce a phenomenon called "super convergence", where the model reaches a good minimum much faster. More details can be found in this article.

    The fastai library provides a very easy-to-use implementation of one-cycle training, which we used and which gave us considerably better results (see the one-cycle sketch after this list).
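
The brain-percentage filter mentioned above can be approximated directly from the pixel data. A rough sketch; the 25% threshold is the one from the text, and the `slices` dict in the usage comment is hypothetical:

    import numpy as np

    def brain_pct(hu_image: np.ndarray, center: float = 40, width: float = 80) -> float:
        """Fraction of pixels inside the brain window (a rough proxy for brain tissue)."""
        low, high = center - width / 2, center + width / 2
        return float(((hu_image >= low) & (hu_image <= high)).mean())

    # Usage: keep only slices with at least 25% of pixels in the brain window.
    # `slices` is a hypothetical dict mapping file names to HU arrays.
    # keep = [name for name, hu in slices.items() if brain_pct(hu) >= 0.25]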
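
The OpenCV cropping step could look roughly like this. A sketch only: it assumes a brain-windowed image scaled to [0, 1] and a foreground threshold of our choosing, not necessarily the exact values we tuned during the competition:

    import cv2
    import numpy as np

    def crop_to_brain(windowed: np.ndarray, threshold: float = 0.1) -> np.ndarray:
        """Crop a brain-windowed image to the bounding box of its largest blob."""
        mask = (windowed > threshold).astype(np.uint8)
        # OpenCV 4.x signature: returns (contours, hierarchy)
        contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
        if not contours:
            return windowed  # nothing to crop
        largest = max(contours, key=cv2.contourArea)
        x, y, w, h = cv2.boundingRect(largest)
        return windowed[y:y + h, x:x + w]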
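
The augmentations listed above map onto fastai's built-in transforms. A sketch assuming fastai v2's aug_transforms; the exact magnitudes are illustrative:

    from fastai.vision.all import aug_transforms

    # Horizontal/vertical flips, lighting change, zoom, and warp,
    # applied randomly to each training batch.
    tfms = aug_transforms(
        do_flip=True,       # random horizontal flips
        flip_vert=True,     # also allow vertical flips
        max_lighting=0.2,   # lighting change
        max_zoom=1.1,       # zoom
        max_warp=0.2,       # perspective warp
    )
    # These would then be passed as batch_tfms when building the DataLoaders.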
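
Conceptually, ensembling here just means combining the two models' predicted probabilities. A minimal sketch with hypothetical prediction arrays and a plain unweighted average (a weighted average is an equally common choice):

    import numpy as np

    def ensemble(preds_effnet: np.ndarray, preds_inception: np.ndarray) -> np.ndarray:
        """Average two models' probabilities; both arrays have shape (n_images, 6)."""
        return (preds_effnet + preds_inception) / 2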
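
With fastai, one-cycle training is essentially a one-liner. A minimal sketch assuming fastai v2; the learner construction, backbone, and hyperparameter values are illustrative, not our exact setup, and `dls` is a hypothetical DataLoaders object built from the windowed CT images:

    from fastai.vision.all import *

    # dls: hypothetical DataLoaders over the windowed, cropped CT images
    learn = cnn_learner(dls, resnet34)

    # Find a reasonable maximum learning rate, then train with the one-cycle
    # policy: the learning rate ramps up and back down within a single cycle
    # while momentum moves in the opposite direction.
    learn.lr_find()
    learn.fit_one_cycle(5, 1e-3)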

Final Result:

We finished the competition in 253rd place out of 1,345 teams (top 19%) with a score of 0.06945. The score is a weighted multi-label logarithmic loss, in which the "any" label is weighted more heavily than the specific hemorrhage sub-types.
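
For the curious, a weighted multi-label log loss can be sketched as below. The exact weights are defined on the Kaggle evaluation page; here we assume, purely for illustration, a weight of 2 for "any" and 1 for each sub-type, in the label order shown earlier:

    import numpy as np

    # Illustrative weights: five sub-types, then "any" (assumed, not the official values).
    WEIGHTS = np.array([1, 1, 1, 1, 1, 2], dtype=float)

    def weighted_log_loss(y_true: np.ndarray, y_pred: np.ndarray, eps: float = 1e-15) -> float:
        """y_true, y_pred: arrays of shape (n_images, 6) with labels/probabilities."""
        p = np.clip(y_pred, eps, 1 - eps)
        per_label = -(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))  # (n, 6)
        return float((per_label * WEIGHTS).sum() / (WEIGHTS.sum() * len(y_true)))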

Improvement Opportunities:

The data contained multiple CT scan images per patient: for a single patient, slices are taken sequentially through the head. If we had incorporated this sequential structure into our model, for example with a sequence model such as an LSTM running over the slices of one patient, we could have built a better solution. At first, the competition organizers prohibited the use of metadata and intended predictions to be based solely on pixel data, so we did not try the LSTM approach. The rule was later reversed, but by then we did not have enough time to pursue this method.
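
The slice-sequence idea could look roughly like this in PyTorch. This is purely a sketch of the architecture we did not get to try; the CNN backbone, feature dimension, and hidden size are placeholders:

    import torch
    import torch.nn as nn

    class SliceSequenceModel(nn.Module):
        """Run a CNN on each CT slice, then an LSTM over one patient's slice sequence."""

        def __init__(self, cnn: nn.Module, feat_dim: int = 512, hidden: int = 256, n_labels: int = 6):
            super().__init__()
            self.cnn = cnn  # any backbone mapping (B, 3, H, W) -> (B, feat_dim)
            self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True, bidirectional=True)
            self.head = nn.Linear(2 * hidden, n_labels)

        def forward(self, x):  # x: (batch, n_slices, 3, H, W)
            b, s, c, h, w = x.shape
            feats = self.cnn(x.reshape(b * s, c, h, w)).reshape(b, s, -1)  # per-slice features
            seq, _ = self.lstm(feats)          # context shared across neighbouring slices
            return self.head(seq)              # (batch, n_slices, 6) logits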

Final Thoughts:

Applying machine learning to medical diagnostics is gradually becoming the norm. Thanks to advances in deep learning and in hardware, we are seeing many promising results. Still, machine learning is nowhere close to being a substitute for human experts in medical diagnosis, but it can be a helpful tool for accelerating diagnosis and decision making.
