Commit 00a02c52 authored by Luis Fernandez Ruiz

python

- regression_radius_deep: adapt to new suggest and results.csv structure. Modern algorithms added. Argparse added
- comments in all scripts
Readme
- update of instructions
parent e867a73e
# Scattered Images
In this repository, you can find all the scripts (Python and Matlab) and example files (csv, images...) needed
to classify scattered images, infer sample's parameters and process them.
In order to achieve this, **deep learning** and **image
processing** techniques have been used.
## Motivation
**Small Angle Neutron Scattering (SANS)** experiments are described in the image below:
![Figure 1](readme_figures/ScatteredImage.PNG)
@@ -15,12 +17,14 @@ neutrons and sample's atom, lead the first to scatter. This *scattering* is caug
distance. So we can say that in this technique, we have three main variables:
* **instrument's parameters:** for each experiment, they are **known**. e.g. distance, collimation, wavelength and
background. **We can** extract this info from the **real** scattered image.
* **sample's parameters:** they are what scientists want to obtain by analyzing the scattered image. They are **unknown**.
e.g. radius for the *Sphere* model; radius, shell, rho-shell... for the *Core-Shell Sphere* one. **We cannot** extract this info
from the **real** scattered image.
* **scattered image:** the *neutron scattering* pattern produced by a given configuration of sample and instrument parameters.
As we are going to see later, there are different types of scattered images. Each one of them is more suitable to study
a particular feature of the sample. This is the reason why we are going to classify images into several **types (clusters)**.
**We can** extract this info from the **real** scattered image by applying deep learning.
**But... why is it necessary to apply deep learning and regression models here?**
@@ -37,13 +41,14 @@ collected.
To achieve this, I have used different techniques:
* **deep learning (convolutional neural network):** this kind of algorithm has proven to perform very well on image
classification. I have used it to determine which type a specific scattered image belongs to.
* **regression:** knowing the type of a specific scattered image, and the instrument's parameters that have
led to it, we are going to infer the sample's parameters. If the prediction is good, we can search the database for a
set of instrument's parameters, particularized for this sample's parameters prediction, that produced in previous
experiments or simulations the type of image we want.
* **image processing:** noise cleaning, edge marking, k-means clustering... in short, different techniques to make the
task easier for our algorithms. I have used it mainly to make real images more alike to simulated ones (because our
algorithms have been trained with simulated images).
## Types of scattered images
@@ -76,7 +81,9 @@ figure. e.g. cluster 0='bad Guinier', cluster 1='Guinier',...**
## Workflow proposed
Once we have described the problem and the types of scattered images scientists want to obtain, we are going to describe
the workflow we plan to implement in the real instruments. See the image below:
![Figure 3](readme_figures/Workflow.png)
1. initial choice of instrumental parameters based on scientist's intuition about the sample.
2. set them up in the real instrument. It is going to produce a *real scattered image*, based on the fixed instrument's
parameters and the unknown sample's ones.
@@ -91,12 +98,16 @@ iteration**. Besides, we can offer scientists, an interval for their sample's pa
manually a real scattered image.
## Repository structure
After introducing the project, we are ready to explain the scripts intended to achieve the objectives proposed.
The repository is organized in 4 folders:
### python
All the scripts in this folder should be kept in the same directory when the time comes to use them. The most important
scripts (because they execute all the others) are [retrain.py](python/retrain.py) (to train the CNN),
[GUI_SANS.py](python/GUI_SANS.py) (the Graphical User Interface) and [main_SANS.py](python/main_SANS.py)
(which tests our complete algorithm (classification and regression) in order to improve it). **I strongly recommend
starting by reading them in order to understand all the others.** All the code is commented. I should continue analyzing
the structure of the log files created by these scripts (and how they are created): [results.csv](doc_files/Sphere/results.csv)
and [suggest.csv](doc_files/Sphere/suggest.csv).
A description of each file is listed below:
* **GUI_SANS.py:** Graphical User Interface (not really fancy, by the way...). It asks the user to input the following information by keyboard:
@@ -110,38 +121,53 @@ Description of files are listed below:
to the type that the scientist has asked to see. It also produces a simulated sample image with these instrument's
parameters and the predicted sample's parameters, in the same folder as the original image.
This script also contains a **function to read .nxs files**. These files contain information about the instrument's
parameters of real scattered images.
* **img_process.py:** given a single real scattered image, this script applies functions to it to make it more alike
to the simulated ones.
* **img_process_dir.py:** similar to [img_process.py](python/img_process.py) but applied to a folder full of images.
It applies the transformation to all the images in it; a simplified sketch of this kind of processing is shown below.
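A rough sketch of the kind of per-image processing these two scripts perform (assumed and simplified from the `img_preprocess` function shown later in this commit; the file paths are placeholders):

```python
import cv2
import numpy as np

img = cv2.imread("real_scattered_image.png")                     # hypothetical input image
img = cv2.cvtColor(np.uint8(img), cv2.COLOR_BGR2RGB)             # BGR -> RGB, as in img_preprocess
img = cv2.fastNlMeansDenoisingColored(img, None, 10, 10, 7, 21)  # denoise, same call as in img_preprocess
cv2.imwrite("processed_image.png", cv2.cvtColor(img, cv2.COLOR_RGB2BGR))  # save (back in BGR for OpenCV)
```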
* **label_classify_image_folder.py:** adaptation of [label_image.py](python/label_image.py). It serves the same purpose
but for all the images in a folder (the user specifies the path).
1. To each image, it applies the [label_image.py](python/label_image.py) script to classify it into a cluster.
2. After that, it moves the classified image into a subfolder corresponding to the cluster it has been assigned to.
At the end, we are going to have a "subfolder tree" (one subfolder for each cluster) inside the
folder we have passed as input argument. Besides, it is going to **generate a log file** that we are going to use
in several scripts ([results.csv](doc_files/Sphere/results.csv)).
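For reference, an invocation of this script looks roughly like the call made from [GUI_SANS.py](python/GUI_SANS.py) later in this commit (a simplified sketch; only some of the flags visible in this diff are included, and the paths are placeholders):

```python
import subprocess
from os.path import join

database_path = "models/Sphere"  # hypothetical path

subprocess.run(["python", "label_classify_image_folder.py",
                "--labels", join(database_path, "output_labels.txt"),
                "--input_layer", "Placeholder",
                "--output_layer", "final_result",
                "--results_path", "results.csv",
                "--silence", "True"])
```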
* **main_SANS.py:** it executes several scripts sequentially to test our whole workflow with thousands of images, that is:
1. generate simulated scattered images. [save_sim_sphere_coresphere.m](matlab/save_sim_sphere_coresphere.m)
2. classify them into several clusters or types. [label_classify_image_folder.py](python/label_classify_image_folder.py)
3. plot distribution of images based on radius, wavelength, distance and classification. [plot_results.py](python/plot_results.py)
4. suggest new instrumental parameters to transform the initial images (created in step 1) into the desired type
(specified by the user) [multiv_multip_regression.py](python/multiv_multip_regression.py)
5. generate suggested images. [create_suggested_images.m](matlab/create_suggested_images.m)
6. classify them. [label_classify_image_folder.py](python/label_classify_image_folder.py).
To summarize, what this script does to each image is shown in the figure below (particularized for the *Sphere* model):
![Figure 4](readme_figures/Regression.png)
* **misclassified_images.py:** to train a Convolutional Neural Network (CNN) to classify images, we first have to create a
"subfolder tree" structure in a directory. In each subfolder we are going to enter manually, all the images we think they
belong to the same type (cluster). When this is done, we apply [retrain.py](python/retrain.py) to the directory that
contains the subfolder tree. In this way, the Convolutional Neural Network learns from our classification but in this
process, it also disagree with our classification in some images. If we specify in [retrain.py](python/retrain.py) that
we want to receive the name of these misclassified images (setting *--print_misclassified_test_images* to *True*), it is
going to give us in the console the path of all these images. If we copy this log to a *.txt*, and we feed it to this
script, it is going to show us the "misclassified images" and it will give us the option of relocating the images (by
inputting a 'y' yes or 'n' no) to the subfolder that CNN has decided.
contains the subfolder tree.
In this way, the CNN learns from our classification. But, in this process, it also disagree with our manual
classification in some images. So, if we specify in [retrain.py](python/retrain.py) (CNN script we use to classify
images) that we want to receive the name of these misclassified images (setting *--print_misclassified_test_images* to
*True*), it is going to give us in the console the path of all these images. If we copy this log to a *.txt*, and we
feed it to this script, it is going to show us the "misclassified images" and it will give us the option of relocating
the images (by inputting a 'y' yes or 'n' no) to the subfolder that CNN has decided.
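A minimal sketch of such a run (assumed invocation; `--image_dir` is the standard flag of the TensorFlow retrain script, and the directory is a placeholder):

```python
import subprocess

# Train on the manually classified "subfolder tree" and capture the console
# output, which includes the paths of the misclassified test images.
with open("misclassified.txt", "w") as log:
    subprocess.run(["python", "retrain.py",
                    "--image_dir", "imgs/Sphere/simulated",  # hypothetical directory
                    "--print_misclassified_test_images"],
                   stdout=log)
```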
* **move_files.py:** several functions to move files and to extract information from file names and classifications.
* **multiv_multip_regression.py:** reading a [results.csv](doc_files/Sphere/results.csv) file, it suggests a transformation
for all the images in it to convert them into the desired type. To do so, it applies
a **regression for each cluster** with:
* **independent variables:** instrument's parameters and the CNN mean classification of the image
(i.e. an image that the CNN says is 80% type 1 (Guinier) and 20% type 2 (One ring) gives a "CNN mean classification" =
0.8\*1+0.2\*2 = **1.2**; see the sketch after this description)
* **dependent variables:** sample's parameters **(one or more)**
In this way, we infer the sample parameters. After that, with this sample's
@@ -149,20 +175,24 @@ a regression with:
that we know produce the type of image we want for the predicted sample's parameters. As an output, this script
creates a [suggest.csv](doc_files/Sphere/suggest.csv) file with the new instrument's parameters that are going to
transform each image to the desired type.
As we said before when describing SANS, we can extract from real images the instrument's parameters (by reading
.nxs files, see [example](imgs/Sphere/real/SilicaSphere/106268.nxs)) and the CNN classification (with the CNN).
But we cannot extract the sample's parameters. This is why they are the dependent variables.
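As a quick illustration of the "CNN mean classification" used as an independent variable above (variable names are hypothetical):

```python
# CNN output for one image: cluster number -> probability (values from the example above)
probs = {1: 0.8, 2: 0.2}  # 80% type 1 (Guinier), 20% type 2 (One ring)

# The mean classification is the probability-weighted average of the cluster numbers
mean_classification = sum(cluster * p for cluster, p in probs.items())
print(mean_classification)  # ~1.2 (up to floating-point rounding)
```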
* **plot_results.py:** given a [results.csv](doc_files/Sphere/results.csv) file, it plots the distribution of the images in it
based on radius, wavelength, distance and cluster classification. It can only be applied to the **'Sphere'** particle model.
* **regression_radius_deep.py:** similar to [multiv_multip_regression.py](python/multiv_multip_regression.py). The main
difference is that it uses deep learning as the regression method, and it can only infer one sample's parameter, so it can
only be applied to *'Sphere'* particles (we only want to infer the *radius* in this kind of model).
* **GUI_imgs:** folder that contains the images that are shown in [GUI_SANS.py](python/GUI_SANS.py) to help the user make
his/her decision.
**Tensorflow scripts:**
* **retrain.py:** *transfer learning* module to train and save a Convolutional Neural Network model. Tensorflow script. More info in
[web retrain](https://github.com/tensorflow/hub/blob/master/examples/image_retraining/retrain.py).
* **label_image.py:** transfer learning script. It uses a model produced in [retrain.py](python/retrain.py) to classify
individual images. More info in
[web label_image](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/examples/label_image/label_image.py).
### matlab
All the scripts in this folder should also be in the same directory in order to be usable.
@@ -177,7 +207,7 @@ structure: *['dist', dist_value, 'col', col_value, 'wav', wav_value, ...]*
### doc_files
It contains template files that are generated in python scripts.
* **results.csv:** file (csv format) produced in [label_classify_image_folder.py](python/label_classify_image_folder.py)
after classifying all the images in a folder into the subfolders corresponding to the clusters defined above. It contains
info of classified images:
* instrument's parameters: distance, collimation, wavelength, background
* CNN classification for this image, and the mean of classification. e.g. 80% type 1 (Guinier) and 20% type 2 (One ring)
@@ -202,17 +232,22 @@ The structure of the file is:
It contains two folders (one for each particle model). Each of them has two subfolders: one with **simulated** scattered
images and another with **real** ones. At the moment, I don't have real *Core-Shell Sphere* scattered images.
Pay special attention to the simulated files' names. Each contains the name followed by the value of the parameters that
led to the resulting scattered image. This is how simulated images are usually named. To make things easier
for the reader, in these examples we have added, at the beginning of the name, between "[]", the name and number of the
cluster they belong to. A sketch of recovering a parameter value from such a name is shown below.
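For illustration, a minimal sketch of extracting a parameter value from such a file name (the file name and the helper below are hypothetical; the repository's own version is `obtain_parameter_val` in [GUI_SANS.py](python/GUI_SANS.py)):

```python
import re

def parameter_value(file_name, parameter):
    # e.g. parameter_value("dist5.6col5.6wav6rad100.png", "wav") -> 6.0  (assumed naming scheme)
    match = re.search(parameter + r"(-?\d+\.?\d*)", file_name)
    return float(match.group(1)) if match else None
```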
### models
It contains two folders (one for each particle model). In each of them we can find
the CNN and regression models. It is important to keep this same structure so that scripts like
[GUI_SANS.py](python/GUI_SANS.py) work well.
* **[output_graph.pb](models/Sphere/output_graph.pb) and [output_labels.txt](models/Sphere/output_labels.txt):** they
contain all the info related to the classification model. They are produced in [retrain.py](python/retrain.py).
We reference them each time we want to classify images.
* **rf:** (random forest) folder that contains the regression models: e.g. Sphere_rf_cluster_0.pkl,
Sphere_rf_cluster_1.pkl, Sphere_rf_cluster_2.pkl... As we can see, the name of each model object contains the particle
model and the type of regression applied to it.
The number in the name specifies the cluster to which that regression model applies, e.g. Sphere_rf_cluster_0.pkl
applies to *"bad Guinier"* scattered images (cluster 0). A loading sketch is shown below.
## Datasets
@@ -2,10 +2,11 @@
/*********************************************************************************************
Graphical User Interface for Small Angle Neutron Scattering
Input: whether the image inputted is real or simulated, path to the scattered image,
type of scattered model inputted (Sphere or Core-Shell Sphere), category of scattered
image searched and .nxs file in case you are inputting a real image.
Output: optimal instrument parameters, prediction for sample's parameters
and a simulated image with these parameters' values
------------------------------------------------------
Author: FERNANDEZ RUIZ Luis
@@ -21,8 +22,8 @@ import cv2 # image processing
import numpy as np # Fundamental package for scientific computing
import h5py # Read .nxs files (get instr parameters for real scattered images)
import pandas as pd # Dataframe handling package
import os # Package to manipulate paths
from os.path import join # Join strings to make a path
import matlab.engine # Execute matlab code
import shutil # Delete directories and files inside
import subprocess # Execute codes from terminal
@@ -32,21 +33,21 @@ from sklearn.externals import joblib # Save regression models
'''
FUNCTION: read_nxs
Objective: given a .nxs file, extract its instrument parameters: distance, collimation, wavelength, background
Arguments:
input:
nxs_path (string): path to the nxs file
output:
inst_param (array): contains dist, col, wavelength and background (I have not found a way to obtain the background yet)
'''
def read_nxs(nxs_path):
    f = h5py.File(nxs_path, 'r')
    col_path = "entry0/D22/collimation/actual_position"  # Image collimation is written in this field
    col = f[col_path][0]
    dist = col  # collimation (meters from source to sample) and distance (meters from sample to detector) are symmetric
    wav_path = "entry0/D22/selector/wavelength"  # Image wavelength is written in this field
    wav = int(round(f[wav_path][0]))  # round the wavelength to the nearest integer
    bg = 1  # the background is not in the nxs file, so we assume it to be 1
    return dist, col, wav, bg
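# Example usage (assumed; the .nxs path is one of the example files in this repository):
# dist, col, wav, bg = read_nxs("imgs/Sphere/real/SilicaSphere/106268.nxs")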
@@ -67,7 +68,7 @@ def user_input(possible_values):
print("Type the number associated to one of the options shown:") # Ask the user to select an option
for idx in range(len(possible_values)): # Show all possible values with a number next to it
print("%i) %s" % (idx, possible_values[idx]))
decision = input() # The user writes
decision = input() # The user writes the number associated to the option user wants
try: # Check if the user has typed a number...
int(decision)
@@ -119,8 +120,8 @@ def obtain_parameter_val(file_name, parameter):
'''
FUNCTION: distinct_values
Objective: given a dataframe with a results.csv structure, obtain for each parameter specified in 'params' the
unique values. We can particularize for one type of image, or for the whole dataset
Arguments:
input:
df (dataframe): dataframe with results.csv structure
@@ -135,36 +136,35 @@ def distinct_values(df, params, prediction = -1):
    distinct_values_array = []
    column_names = []
    if prediction == -1:  # Apply the function to the whole dataframe
        df_categ = df
    else:  # Apply the function to a specific type of image
        df_categ = df[df["predicted"] == prediction]  # We focus on one cluster
    for param in params:  # We iterate through all the params
        if param == 'col':  # as this procedure is usually applied to instr params [dist, col, wav, bg] and 'dist'
            continue  # and 'col' are symmetric, we only analyze 'dist'
        value_list = [v for v in set(df_categ[param])]  # Extract unique values
        value_list.sort()  # Sort them
        distinct_values_array.append([str(v) for v in value_list])  # Save the values in an array [[dist values], [wav values], [bg values], ...]
        column_names.append([param + str(v) for v in value_list])  # Save the names in an array with format [['dist1.4', 'dist2', ...], ['wav5', ...], ...]
    return distinct_values_array, column_names
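# Illustrative example (hypothetical values): distinct_values(df, ['dist', 'col', 'wav', 'bg'], prediction=1)
# could return [['1.4', '2'], ['5', '6'], ['1']] for 'dist', 'wav' and 'bg' ('col' is skipped because it is
# symmetric with 'dist'), together with the column names [['dist1.4', 'dist2'], ['wav5', 'wav6'], ['bg1']].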
'''
FUNCTION: create_tol_list
Objective: given a dataframe with the structure of results.csv, extract the max and min of each column and define
the tolerance for each param with which we are going to iterate in 'suggest_data'.
Arguments:
input:
df (dataframe): dataframe which comes from reading a results.csv file
params (list of strings): name of the parameters we want to analyze. e.g. ["col", "dist", "radius", "poly", ...]
output:
tol_list (array): array with the tolerance for each parameter specified in 'params'
'''
def create_tol_list(df, params):
    tol_list = []
    for j in range(len(params)):  # Iterate over all the params
        # For each param, we take the max and min value in the dataframe and divide the interval by 800.
        # (It could be any other number, but we have chosen 800 because it is the radius interval)
        tol_list.append((max(df[params[j]]) - min(df[params[j]])) / 800)
    return tol_list  # Return the array with the tolerance for all the params
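# Example (hypothetical values): if df['radius'] ranges from 10 to 810, the tolerance
# for 'radius' is (810 - 10) / 800 = 1.0, the +/- margin used when iterating in 'suggest_data'.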
@@ -201,6 +201,7 @@ def order_priority(searched_categ):
    prior_array = [7, 6, 0, 5, 4, 3, 2, 1]
    return prior_array
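# Example: when the searched category maps to this priority, suggested images are
# checked in cluster order 7, 6, 0, 5, 4, 3, 2, 1 (see the loop over 'priority_array'
# at the end of this script).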
# Start timer
start = datetime.datetime.now()
@@ -263,15 +264,15 @@ while os.path.splitext(img_path)[1] not in [".jpg", ".jpeg", ".png"]: # we chec
print("The given path does not exists or it is not accessible from this computer. Try again.")
img_path = input() # User inputs the path...
print("SCATTERED IMAGE selected is located in: %s" % img_path) # print user decision
root_image, image_name = os.path.split(img_path)  # we split the image path into the directory and the image name
print("-------------------------------------------------------\n")
# Create a directory to save temporal images and files we are going to use for producing the final output
if os.path.isdir(join(database_path, tmp_folder)):  # If it already exists...
    shutil.rmtree(join(database_path, tmp_folder), ignore_errors=True)  # ... we delete it
os.mkdir(join(database_path, tmp_folder))  # We create the temporary directory
# Nexus file (only if the image is real). It contains the info about the instrument's parameters of real images
if real_sim == "Real":
print("-------------------------------------------------------")
print("Type the PATH TO THE .NXS FILE associated with the image. e.g. '/users/fernandez-ruiz/scatteringimage/imgs/simulated/nxs_name.nxs'")
@@ -304,8 +305,9 @@ subprocess.run(["python", join(python_script_path, "label_classify_image_folder.
"--labels", join(database_path, "output_labels.txt"), "--input_layer", "Placeholder",
"--output_layer", "final_result", "--results_path", join(database_path, tmp_folder, results_file),
"--sample_params_list", *sample_params_list, "--silence", "True"])
# ... and extract the prediction (the folder where the image is classified by the procedure above has the cluster number in its name)
predicted_cluster = int([d for d in os.listdir(join(database_path, tmp_folder)) if os.path.isdir(join(database_path, tmp_folder, d))][0])  # integer
# print the name of the cluster associated with the integer 'predicted_cluster'
print("This image belongs to cluster: %s" % (list(dict_categ_search.keys())[list(dict_categ_search.values()).index(predicted_cluster)].upper()))
# ------------------------ IMAGE INFO ------------------------
@@ -334,7 +336,7 @@ database_df = pd.read_csv(join(database_path, scatter_model, results_file))
# Extract unique values for each column
distinct_instr_values, reg_columns = distinct_values(database_df, categ_param, predicted_cluster)
# Our regression model differentiates between categorical and non-categorical variables. Only bg and the mean prediction are
# non-categorical. Our reg model deals with categorical variables as binary ones, as we can see below. We are going to focus
# on translating the img info to this format:
# mean_prediction bg dist1.4 dist2 ... wav5 wav6...
@@ -344,6 +346,7 @@ distinct_instr_values, reg_columns = distinct_values(database_df, categ_param, p
# ------------------------ ADAPT IMAGE PARAMETERS TO REG MODEL PARAMETERS ------------------------
# Map img instrument parameters with regression model variables. They are like: 'dist1.4', 'dist2',...
for param_idx in range(len(categ_param)):
    # distinct_instr_values has a structure like: [[dist values], [wav values], [bg values], ...]
    for distinct_idx in range(len(distinct_instr_values[param_idx])):
        if X_categ[param_idx] == float(distinct_instr_values[param_idx][distinct_idx]):
            reg_X.append(1)
@@ -352,25 +355,27 @@ for param_idx in range(len(categ_param)):
# We do nothing special with non-categorical variables
for param in non_categ_param:
    X_non_categ.append(img_df.iloc[0][param])
# create an array with both types of variables: first the non-categorical, then the categorical ones
reg_X = X_non_categ + reg_X
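# e.g. (hypothetical values) reg_X = [1.2, 1, 1, 0, ..., 0, 1, ...]
#      i.e. [mean_prediction, bg, dist1.4, dist2, ..., wav5, wav6, ...] as described above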
# ------------------------ REGRESSION MODEL ------------------------
# Load regression model. We have a reg model for each cluster: e.g. Sphere_rf_cluster_0.pkl, Sphere_rf_cluster_1.pkl,...
# Extract the path of the reg models files
reg_files = [join(database_path, scatter_model, reg_model_type, f)
             for f in os.listdir(join(database_path, scatter_model, reg_model_type))
             if os.path.isfile(join(database_path, scatter_model, reg_model_type, f))]
reg_files.sort() # Sort them by name
reg_list = []
for reg_model in reg_files:  # append the model objects to a list
    reg_list.append(joblib.load(reg_model))
# select the regression model ad-hoc for the predicted cluster based on CNN result
reg = reg_list[predicted_cluster]
print(reg_files[predicted_cluster])
# make the prediction
Y_array = reg.predict([reg_X])[0]
if type(Y_array) == np.float64:  # If we only have one sample parameter to infer, we transform it to an array to avoid errors later
    Y_array = [Y_array]
# print prediction
print("The sample's parameter prediction for the inputted image are:")
@@ -406,22 +411,22 @@ eng.addpath(matlab_path, '-end')
if os.path.isdir(join(database_path, tmp_folder)):
    shutil.rmtree(join(database_path, tmp_folder), ignore_errors=True)
os.mkdir(join(database_path, tmp_folder))
# Group the suggested configurations (dist, wav, col, radius...) by instrument parameters. After that, we generate
# images with the real sample's parameters and the resulting (dist, col, wav and bg) tuples
suggest_data = suggest_data.groupby(instr_params_list).count().reset_index()
for idx, row in suggest_data.iterrows():
    # Create an array with the following structure: ['dist', dist_value, 'col', col, 'wav', wav, ...]
    # we need this structure in "create_img_param_general.m"
    varargin = []
    for j in range(len(instr_params_list)):  # append the instrument's parameters
        varargin.append(instr_params_list[j])
        varargin.append(matlab.double([row[instr_params_list[j]]]))
    for j in range(len(sample_params_list)):  # append the sample's parameters
        varargin.append(sample_params_list[j])
        # if real_sim == "Simulated":
        #     varargin.append(matlab.double([img_df.iloc[0][sample_params_list[j]]]))
        # elif real_sim == "Real":
        varargin.append(matlab.double([Y_array[j]]))  # we now always use the predicted sample's parameters
    # Call the matlab script to generate the images into tmp_folder
    eng.create_img_param_general(join(database_path, tmp_folder), scatter_model, *varargin, nargout=0)
@@ -433,7 +438,7 @@ subprocess.run(["python", join(python_script_path, "label_classify_image_folder.
"--results_path", join(database_path, tmp_folder, results_file),
"--sample_params_list", *sample_params_list, "--silence", "True"])
# We search the suggested images in the just-created folders by priority (see the order_priority function).
# If we find a config (dist, col, wav) that produces the searched category, we break the loop
priority_array = order_priority(categ_search)
for j in priority_array:
@@ -2,7 +2,7 @@
/*********************************************************************************************
Real images are very different from simulated ones: dimmer colours and much more noise.
This script applies transformations to a single image. The objective is to make simulated and real images more similar.
In this way, the CNN could classify real images better
------------------------------------------------------
@@ -14,12 +14,13 @@
*********************************************************************************************
'''
import argparse  # Package to parse input arguments
import numpy as np  # Fundamental package for scientific computing
import os  # Package to manipulate paths
from os.path import join  # Join strings to make a path
import cv2  # image processing
import matplotlib.pyplot as plt  # plot images
from sklearn.cluster import KMeans
from skimage.color import rgb2gray
from skimage.feature import canny
@@ -30,7 +31,7 @@ Objective: plot 6 images (2x3): original, orig with cluster, orig with edges in
in the next row but denoise
Arguments:
input:
category (str): category of image. It is displayed in the plot.
orig (numpy.ndarray) (3D image) (0 to 255): original image. We are going to transform it.
orig_clust (numpy.ndarray) (3D image) (0 to 255): transformation applied to 'orig'. We apply kmeans to group pixels in k clusters (we specify k).
orig_edge (numpy.ndarray) (2D image) (bool): transformation applied to 'orig'. We only keep the edges of the image
@@ -91,10 +92,9 @@ def obtain_clust_edge_pic(img, clusters, sigma):
    upper = int(min(1, (1.0 + sigma) * v))
    # Obtain edge image
    edges = canny(gray, lower, upper)
    return cluster_pic, edges
'''
FUNCTION: mark_edges
Objective: highlight the edges of an image. We do this by overlapping an edge image on the original image
@@ -111,7 +111,7 @@ def mark_edges(img, edges):
    for x in range(width):
        for y in range(height):
            if edges[x, y] == True:  # if there is a border in this pixel, i.e. edge=True
                img[x, y, :] = 0  # we mark this part of the original image in black (black is the colour (0,0,0) in RGB)
    return img
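# Note: an equivalent vectorized alternative (assumed) would be `img[edges] = 0`,
# letting numpy broadcast the boolean mask over the colour channels instead of looping pixel by pixel.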
@@ -121,7 +121,7 @@ Objective: 1) plot 3x2 matrix of images we describe in the header.
2) save transformed images (mark edges) in a directory
Arguments:
input:
path_img (string): path to the image
path_save (string): folder where we want to save transformed image
clusters (int): number of clusters we want to make. READ 'obtain_clust_edge_pic'
sigma (float): it is used to define a lower and upper threshold. The edge pixels above the upper limit
@@ -130,8 +130,8 @@ Arguments:
void
'''
def img_preprocess(path_img, path_save, clusters, sigma):
    if os.path.splitext(path_img)[1] in [".jpg", ".jpeg", ".png"]:  # check that the inputted path is an image file
        root_image, image_name = os.path.split(path_img)  # split the directory and the file name
        img_orig = cv2.imread(path_img)  # read the image
        img_orig = cv2.cvtColor(np.uint8(img_orig), cv2.COLOR_BGR2RGB)  # convert it from BGR to RGB channel order
        img_denoise = cv2.fastNlMeansDenoisingColored(img_orig, None, 10, 10, 7, 21)  # denoise the img
@@ -2,7 +2,8 @@
/*********************************************************************************************
Real images are very different from simulated ones: dimmer colours and much more noise.
This script applies transformations to all the images contained in a folder. The objective
is to make simulated and real images more similar.
In this way, the CNN could classify real images better
------------------------------------------------------
@@ -15,17 +16,15 @@
'''
import argparse  # Package to parse input arguments
import numpy as np  # Fundamental package for scientific computing
import os  # Package to manipulate paths
from os.path import join  # Join strings to make a path
import cv2  # image processing
import matplotlib.pyplot as plt  # plot images
from sklearn.cluster import KMeans
from skimage.color import rgb2gray
from skimage.feature import canny
import datetime
from time import strftime
'''
FUNCTION: plot_images
@@ -95,8 +94,6 @@ def obtain_clust_edge_pic(img, clusters, sigma):
    upper = int(min(1, (1.0 + sigma) * v))
    # Obtain edge image
    edges = canny(gray, lower, upper)
    return cluster_pic, edges
'''
@@ -2,7 +2,7 @@
/*************************************************************
Functions to label images inside a directory based on the
model created in "retrain.py". It classifies images into folders
(depending on their cluster) and writes a log file
------------------------------------------------------
@@ -29,16 +29,18 @@ See the License for the specific language governing permissions and
limitations under the License.
==============================================================================
'''
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import argparse
import numpy as np  # Fundamental package for scientific computing
import tensorflow as tf  # CNN package
import os  # Package to manipulate paths
import csv  # read csv files
import datetime
@@ -57,6 +59,7 @@ def load_graph(model_file):