Commit 3b11bb07 authored by Luis Fernandez Ruiz

imgs folder

- change sample images to include the cluster they belong to in the name

Readme
- update the repository instructions

Python
- GUI_SANS comments and cleaning
parent 99f9168e
# Scattered Images
In this repository you can find all the scripts (Python and Matlab) and example files (csv, images...) needed to
classify scattered images, infer the sample's parameters and process the images. To achieve this, **deep learning**
and **image processing** techniques have been used.
## Motivation
**Small Angle Neutron Scattering** experiments are described in the image below:
![Figure 1](readme_figures/ScatteredImage.PNG)
This technique for characterizing materials consists in firing a beam of neutrons at a sample. The interaction between
the neutrons and the sample's atoms causes the neutrons to scatter. This *scattering* is captured by a detector placed
at an appropriate distance. So we can say that in this technique we have three main variables:
* **instrument's parameters:** for each experiment, they are **known**.
* **sample's parameters:** they are what scientists want to obtain by analyzing the scattered image. They are **unknown**.
* **scattered image:** the *neutron scattering* pattern produced by a given configuration of sample and instrument parameters.
As we will see later, there are different types of scattered image, each one better suited to studying a particular
feature of the sample. This is the reason why we are going to classify the images into several **types (clusters)**.
**But... why is it necessary to apply deep learning and regression models here?**
The samples that scientists study are highly complex (even for them), and their properties are not known beforehand.
This leads scientists to beam their sample several times, trying different instrument setups, until they obtain the
optimal one that produces the type of scattered image that best characterizes their sample. This makes them lose a lot
of time...
The **subject of my internship** is, therefore, to use the data generated during all these years, together with
simulated data, and learn from it. After this *learning process*, the objective is to **offer scientists suggestions
for correctly setting up the instrument after only one iteration, so that they can maximize the time available to
study their samples**. This will significantly reduce experiment time while increasing the efficiency and quality of
the data collected.
To achieve this, I have used different techniques:
* **deep learning (convolutional neural network):** this kind of algorithm has shown great performance on image
classification. I have used it to determine whether a scattered image belongs to the type we want to obtain.
* **regression:** knowing the type of scattered image the scientists have obtained, and the instrument's parameters
that led to it, we infer the sample's parameters. If the prediction is good, we can search the database for a set of
instrument's parameters, particularized for this prediction, that produced the desired type of image in previous
experiments.
* **image processing:** noise cleaning, edge marking, k-means clustering... in short, different techniques that make
the task easier for our algorithm (a sketch follows this list).
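As an illustration of the image-processing step, here is a minimal sketch of the kind of operations mentioned above
(denoising, edge marking, k-means intensity clustering) using OpenCV; the file name and parameter values are
hypothetical, and the actual pipeline lives in [img_process.py](python/img_process.py).

```python
import cv2
import numpy as np

# Hypothetical input file; the real pipeline is in python/img_process.py.
img = cv2.imread("scattered_image.png", cv2.IMREAD_GRAYSCALE)

# 1. Remove detector noise (non-local means denoising).
clean = cv2.fastNlMeansDenoising(img, h=10)

# 2. Mark the edges of the high-intensity zones.
edges = cv2.Canny(clean, threshold1=50, threshold2=150)
cv2.imwrite("scattered_image_edges.png", edges)

# 3. k-means clustering of pixel intensities (here k=3 levels).
pixels = clean.reshape(-1, 1).astype(np.float32)
criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 10, 1.0)
_, labels, centers = cv2.kmeans(pixels, 3, None, criteria, 10,
                                cv2.KMEANS_RANDOM_CENTERS)
quantized = centers[labels.flatten()].reshape(img.shape).astype(np.uint8)
cv2.imwrite("scattered_image_processed.png", quantized)
```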
## Types of scattered images
As we have said, scientists use scattered images to extract information about their sample. Some types of scattered
images are more suitable than others for extracting a specific piece of information. In the figure below we can see
the different clusters into which we have grouped the scattered images.
![Figure 2](readme_figures/imgs_clusters.png)
The differences between all these images (i.e. the size of the high-intensity zone, the attenuation zone, ...) are due
to the values of the instrument and sample's parameters. As the scientists who come to the ILL for their experiments
are interested in different kinds of images (e.g. focused on the first peak, focused on the harmonics, ...), we are
going to classify all the images into 3 clusters that encompass the scientists' general requirements. Besides, we have
added 5 more groups to better distinguish these interesting ones. All of them were decided after discussion with an
ILL scientist. The clusters that are relevant to scientists are marked with a dashed-line square in the figure above;
they are:
* **Guinier:** these have a wide high-intensity zone and allow the size of the particles to be analyzed.
* **Two or three rings:** in this group we place the images with several peaks, which allow the scientist to focus on
the harmonics. Through Bragg's Law they can then obtain the distance between the atoms of the particles.
* **Background images:** allow scientists to study only the contribution of the background, so that they can later
subtract it from the *"sample + background"* image.
Also, in the figure above we can see that the images are ordered from less to more "zoom" into the sample. The first
images allow particle sizes to be studied (high wavelength), and the last ones atom sizes (lower wavelength).
**IMPORTANT: from now on, and in all the comments in the scripts, we may refer to the different clusters by the number
shown in the figure, e.g. cluster 0='bad Guinier', cluster 1='Guinier', ...**
## Repository structure
The repository is organized in 5 folders:
### python
All the scripts in this folder should be kept in the same directory in order to be usable.
Descriptions of the files are listed below:
* **GUI_SANS:** Graphical User Interface (not really fancy, by the way...). It asks the user to input the following
information by keyboard:
  * particle model: whether the user is going to input a *"Sphere"* or a *"Core-Shell Sphere"* scattered image.
  * whether the user is going to input a real or a simulated image.
  * the path of the scattered image he/she wants to transform into the desired type. If it is a real image, the path
    to the .nxs file that contains its info is also required.
  * the type of image into which he/she wants to transform the initial image.

  As an output, it prints the instrument's parameters that transform the initial scattered image into one belonging to
  the type the scientist has asked to see. It also produces, in the same folder as the original image, a simulated
  image generated with these instrument's parameters and the predicted sample's parameters.
* **img_process:** given a single real scattered image, this script applies functions to make it more alike to the
simulated ones.
* **img_process_dir:** similar to [img_process.py](python/img_process.py), but applied to a folder full of images.
* **label_classify_image_folder:** adaptation of [label_image.py](python/label_image.py). It serves the same purpose,
but for all the images in a folder (the user specifies the path). It applies the [label_image.py](python/label_image.py)
script to each image to classify it into a cluster, then moves the image into a folder corresponding to that cluster.
At the end, we obtain a "subfolder tree" (one subfolder per cluster) inside the folder passed as input argument.
Besides, it generates a log file ([results.csv](doc_files/Sphere/results.csv)) that is used in several other scripts.
* **main_SANS:** it executes several scripts sequentially to:
  1. generate simulated scattered images. [save_sim_sphere_coresphere.m](matlab/save_sim_sphere_coresphere.m)
  2. classify them into several clusters or types. [label_classify_image_folder.py](python/label_classify_image_folder.py)
  3. plot the distribution of images based on radius, wavelength, distance and classification. [plot_results.py](python/plot_results.py)
  4. suggest new instrumental parameters to transform the initial images (created in step 1) into the desired type of
     image (specified by the user). [multiv_multip_regression.py](python/multiv_multip_regression.py)
  5. generate the suggested images. [create_suggested_images.m](matlab/create_suggested_images.m)
  6. classify them. [label_classify_image_folder.py](python/label_classify_image_folder.py)
* **misclassified_images:** to train a Convolutional Neural Network to classify images, we first have to create a
"subfolder tree" structure in a directory: in each subfolder we manually place all the images we think belong to the
same type (cluster). When this is done, we apply [retrain.py](python/retrain.py) to the directory that contains the
subfolder tree. In this way the Convolutional Neural Network learns from our classification, but in the process it
also disagrees with our classification for some images. If we specify in [retrain.py](python/retrain.py) that we want
to receive the names of these misclassified images (by setting *--print_misclassified_test_images* to *True*), it
prints the paths of all these images to the console. If we copy this log to a *.txt* file and feed it to this script,
it shows us the "misclassified images" and gives us the option of relocating each one (by inputting 'y' for yes or 'n'
for no) to the subfolder the CNN has decided.
* **move_files:** several functions to move files, extract information from their names and classify them.
* **multiv_multip_regression:** reading a [results.csv](doc_files/Sphere/results.csv) file, it suggests a
transformation for all the images in a folder (the user specifies the path) to convert them into the desired type. To
do so, it applies a regression with:
  * **independent variables:** instrument's parameters and the CNN classification of the image
  * **dependent variables:** sample's parameters **(one or more)**

  In this way, we infer the sample's parameters. After that, with these predicted values, we go to the master
  [results.csv](doc_files/Sphere/results.csv) and look for instrument's parameters that, we know, produce the type of
  image we want for the predicted sample's parameters. As an output, this script creates a
  [suggest.csv](doc_files/Sphere/suggest.csv) file with the new instrument's parameters that will transform each image
  into the desired type (a regression sketch is shown after this list).
* **plot_results:** given a [results.csv](doc_files/Sphere/results.csv) file, it plots the distribution of the images
in it based on radius, wavelength, distance and cluster classification. It can only be applied to the **'Sphere'**
particle model.
* **regression_radius_deep:** similar to [multiv_multip_regression.py](python/multiv_multip_regression.py). The main
difference is that it uses deep learning as the regression method, and it can only infer one sample parameter, so it
can only be applied to *'Sphere'* particles (we only want to infer the *radius* in this kind of model).
**Tensorflow scripts:**
* **retrain:** transfer learning script to train and save a Convolutional Neural Network model. More info in
[retrain.py](https://github.com/tensorflow/hub/blob/master/examples/image_retraining/retrain.py).
* **label_image:** transfer learning script. It uses a model produced by [retrain.py](python/retrain.py) to classify
individual images. More info in
[label_image.py](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/examples/label_image/label_image.py).
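As a rough illustration of the regression step performed by *multiv_multip_regression.py*, here is a minimal sketch,
assuming the models are scikit-learn random forests (as the *rf* folder under [models](models) suggests) and assuming
hypothetical column names in [results.csv](doc_files/Sphere/results.csv); the real script may differ.

```python
import pandas as pd
import joblib
from sklearn.ensemble import RandomForestRegressor

# Hypothetical column names; check results.csv for the real headers.
results = pd.read_csv("doc_files/Sphere/results.csv")

# Fit one regressor per CNN cluster (the classification enters the model
# through this split): instrument parameters in, sample parameter out.
for cluster, group in results.groupby("prediction"):
    model = RandomForestRegressor(n_estimators=100, random_state=0)
    model.fit(group[["dist", "col", "wav"]], group["radius"])
    joblib.dump(model, f"cluster_{int(cluster)}.pkl")
```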
### matlab
All the scripts in this folder should be kept in the same directory in order to be usable.
Among all the scripts, we describe the main ones below; they call the other scripts in the folder.
* **save_sim_sphere_coresphere:** script developed by ILL scientists to generate simulated scattered images based on
instrumental and sample parameters. The following scripts are based on this one.
* **create_suggested_images:** creates the images whose description is specified in [suggest.csv](doc_files/Sphere/suggest.csv).
* **create_img_param:** script used in [multiv_multip_regression.py](python/multiv_multip_regression.py) for
generating individual scattered images. To do so, we have to give it as input an array of parameter values with the
following structure: *['dist', dist_value, 'col', col_value, 'wav', wav_value, ...]* (see the sketch below).
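As a sketch of how such a call could look from Python, assuming the MATLAB Engine API for Python is installed and that
*create_img_param* can be invoked as a function taking the name-value array above (the parameter values here are
hypothetical):

```python
import matlab.engine

# Start a MATLAB session and make the matlab/ folder visible to it.
eng = matlab.engine.start_matlab()
eng.addpath("matlab", nargout=0)

# Name-value array with the structure described above; values are hypothetical.
params = ["dist", 10.0, "col", 10.0, "wav", 6.0]
eng.create_img_param(params, nargout=0)

eng.quit()
```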
### doc_files
It contains template files that are generated by the python scripts.
* **results.csv:** file (csv format) produced by *label_classify_image_folder.py*. It contains info about the
classified images (dist, collimation, wavelength, the CNN prediction for each image, and the average prediction). It
is used in [multiv_multip_regression.py](python/multiv_multip_regression.py) and [plot_results.py](python/plot_results.py).
* **suggest.csv:** file (csv format). It contains the suggested *dist*, *col* and *wavelength*, and the *original
radius*. It is created by *multiv_multip_regression.py* and it is used in
[create_suggested_images.m](matlab/create_suggested_images.m).
* **Original_misclassified.txt:** file (txt format) produced by copying and pasting the log printed by
[retrain.py](python/retrain.py) when the option *--print_misclassified_test_images* is set to *True*. It is used in
[misclassified_images.py](python/misclassified_images.py) to reclassify images, moving them from their previous
location to the one the CNN decides.
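As an illustration of what [misclassified_images.py](python/misclassified_images.py) does with this file, here is a
minimal sketch, assuming (hypothetically) that the *.txt* lists one image path per line together with the cluster the
CNN proposed; the real format may differ.

```python
import os
import shutil

# Hypothetical line format: "<image_path> <suggested_cluster>".
with open("Original_misclassified.txt") as f:
    for line in f:
        if not line.strip():
            continue
        path, suggested = line.split()
        answer = input(f"Move {path} to cluster {suggested}? (y/n) ")
        if answer.lower() == "y":
            # Relocate the image to the sibling subfolder the CNN decided.
            dest = os.path.join(os.path.dirname(os.path.dirname(path)), suggested)
            os.makedirs(dest, exist_ok=True)
            shutil.move(path, dest)
```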
### imgs
It contains two folders (one for each particle model). Each of them has two subfolders: one with **simulated**
scattered images and another with **real** ones. At the moment, I don't have real *Core-Shell Sphere* scattered images.
Pay special attention to the simulated files' names: they contain the name and value of the parameters that led to the
resulting scattered image. This is how simulated images are usually going to be named. To make things easier for the
reader, in these examples we have added the name and number of the cluster at the beginning of the name, between "[]"
(a parsing sketch follows).
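Since [move_files.py](python/move_files.py) extracts information from these names, here is a minimal sketch of such a
parse; the file name used is hypothetical, so check the real naming convention against the files in [imgs](imgs).

```python
import re

# Hypothetical file name following the convention described above.
name = "[1_Guinier]dist_10_col_10_wav_6.png"

cluster = re.match(r"\[(.*?)\]", name).group(1)               # "1_Guinier"
params = dict(re.findall(r"([a-z]+)_(\d+(?:\.\d+)?)", name))  # {'dist': '10', ...}
print(cluster, params)
```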
### models
Similar to the [imgs folder](imgs), it contains two folders (one for each particle model). In each of them we can find
the following files:
* **[output_graph.pb](models/Sphere/output_graph.pb) and [output_labels.txt](models/Sphere/output_labels.txt):**
contain all the info related to the classification model. They are produced by [retrain.py](python/retrain.py) and are
referenced each time we want to classify images.
* **rf:** folder that contains the regression models: cluster_0.pkl, cluster_1.pkl, cluster_2.pkl... The number in the
name specifies the cluster to which that regression model applies, e.g. cluster_0.pkl applies to *"bad Guinier"*
scattered images (cluster 0). A loading sketch is shown below.
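Here is a minimal sketch of how one of these *.pkl* files could be loaded and used, assuming it is a scikit-learn
regressor serialized with joblib/pickle; the feature order (dist, col, wav) is hypothetical and must match whatever
the training script used.

```python
import joblib
import pandas as pd

# Load the regression model for cluster 0 ("bad Guinier" images).
model = joblib.load("models/Sphere/rf/cluster_0.pkl")

# Hypothetical feature order; must match the training script.
features = pd.DataFrame([[10.0, 10.0, 6.0]], columns=["dist", "col", "wav"])
radius_pred = model.predict(features)[0]

# Look up instrument parameters that, for a similar radius, produced
# the desired cluster (here cluster 1, with an arbitrary tolerance).
results = pd.read_csv("doc_files/Sphere/results.csv")
match = results[(results["prediction"] == 1) &
                (results["radius"].sub(radius_pred).abs() < 5)]
print(match[["dist", "col", "wav"]].head())
```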
## Workflow's summary
After presenting the structure of the repository, we are going to explain how to use all the files. For this purpose,
we follow the steps we took during the project.
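As a rough sketch of that workflow, following the order of the **main_SANS** steps above (the command lines are
assumptions; each script is assumed to read its configuration from paths set inside it, and `matlab -batch` requires a
recent MATLAB release):

```python
import subprocess

# Order taken from the main_SANS description; adjust paths/arguments as needed.
steps = [
    ["matlab", "-batch", "save_sim_sphere_coresphere"],  # 1. simulate images
    ["python", "label_classify_image_folder.py"],        # 2. classify them
    ["python", "plot_results.py"],                       # 3. plot distributions
    ["python", "multiv_multip_regression.py"],           # 4. suggest parameters
    ["matlab", "-batch", "create_suggested_images"],     # 5. generate suggestions
    ["python", "label_classify_image_folder.py"],        # 6. classify again
]
for cmd in steps:
    subprocess.run(cmd, check=True)
```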