Commit 0a2e9f20 authored by Luis Fernandez Ruiz's avatar Luis Fernandez Ruiz

python

- regression_radius_deep: adapted to the new suggest and results.csv structure. Modern algorithms added. Argparse added.
- comments added in all scripts

Readme

- instructions updated
parent 00a02c52
# Scattered Images
In this repository, you can find all the scripts (Python and Matlab) and example files (csv, images...) needed
to classify, infer sample's parameters and process scattered images. You can clone it from Gitlab or also find it in
"/users/fernandez-ruiz/scatteringimage".
In order to achieve it, **deep learning**, **regression** and **image
processing** techniques have been used.
## Motivation
......@@ -101,7 +102,10 @@ manually a real scattered image.
After introducing the project, we are ready to explain the scripts intended to achieve the objectives proposed.
The repository is organized into 4 folders:
### python
All the scripts in this folder should be kept in the same directory when the time comes to use them. The packages you
should install to be able to use all of them are detailed in [python_packages_required.md](python_packages_required.md).
The most important
scripts (because they execute all the others) are [retrain.py](python/retrain.py) (to train the CNN),
[GUI_SANS.py](python/GUI_SANS.py) (Graphical User Interface) and [main_SANS.py](python/main_SANS.py)
(to test our complete algorithm (classification and regression) in order to improve it). **I strongly recommend starting
......@@ -128,8 +132,8 @@ Description of files are listed below:
to simulated ones.
* **img_process_dir.py:** similar to [img_process.py](python/img_process.py) but applying it to a folder full of images.
It applies the transformation to all the images in that folder.
* **label_classify_image_folder.py:** __VERY IMPORTANT SCRIPT__. Adaptation of [label_image.py](python/label_image.py). It
serves the same purpose, but for all the images in a folder (the user specifies the path).
1. To each image, it applies the [label_image.py](python/label_image.py) script to classify it into a cluster.
2. After that, it moves the classified image into the subfolder corresponding to the cluster it has been assigned to.
......@@ -137,6 +141,13 @@ At the end, we are going to have a "subfolder tree" (one subfolder for each clus
folder we have passed as input argument. Besides, it is going to **generate a log file** that we are going to use
in several scripts ([results.csv](doc_files/Sphere/results.csv)).
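As a minimal sketch of how other scripts consume that log file, the columns can be split around the `predicted` and `mean_prediction` markers (the two-row csv excerpt here is hypothetical; the real master files live in [doc_files](doc_files)):

```python
import csv
import io

# Hypothetical excerpt mimicking the results.csv structure:
# instrument params, then 'predicted'/'mean_prediction', then sample params.
csv_text = (
    "dist,col,wav,bg,predicted,mean_prediction,radius\n"
    "10,10,6,0,1,1.2,50\n"
    "10,10,6,0,2,2.1,120\n"
)
reader = csv.reader(io.StringIO(csv_text))
columns = next(reader)                                          # header row
instr_params = columns[:columns.index("predicted")]             # before 'predicted'
sample_params = columns[columns.index("mean_prediction") + 1:]  # after 'mean_prediction'
print(instr_params)   # ['dist', 'col', 'wav', 'bg']
print(sample_params)  # ['radius']
```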
**The most important feature of this script is that it can classify thousands of images automatically following the
criterion of an already trained CNN (passing its [output_graph.pb](models/Sphere/output_graph.pb) and
[output_labels.txt](models/Sphere/output_labels.txt) files).** In this way you can save time.
It is better to execute it via main_SANS.py, because I have found problems with the sample_params input when
executing it directly.
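The folder-to-subfolders step can be sketched as below; the `classify` callable here is a hypothetical stub standing in for the real CNN call made through label_image.py:

```python
import os
import shutil
import tempfile

def sort_into_clusters(folder, classify):
    """Move every file in `folder` into a subfolder named after its cluster.
    `classify` is any callable returning a cluster label for a file name
    (a stub here; the real script queries the trained CNN)."""
    for name in sorted(os.listdir(folder)):
        path = os.path.join(folder, name)
        if not os.path.isfile(path):
            continue
        cluster = str(classify(name))
        os.makedirs(os.path.join(folder, cluster), exist_ok=True)
        shutil.move(path, os.path.join(folder, cluster, name))

# Demo with dummy files and a name-based stub classifier
root = tempfile.mkdtemp()
for name in ("img_a.png", "img_b.png"):
    open(os.path.join(root, name), "w").close()
sort_into_clusters(root, lambda name: 1 if "a" in name else 2)
print(sorted(os.listdir(os.path.join(root, "1"))))  # ['img_a.png']
```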
* **main_SANS.py:** it executes several scripts sequentially to test our whole workflow with thousands of images. That is:
1. generate simulated scattered images. [save_sim_sphere_coresphere.m](matlab/save_sim_sphere_coresphere.m)
2. classify them into several clusters or types. [label_classify_image_folder.py](python/label_classify_image_folder.py)
......@@ -161,12 +172,13 @@ images) that we want to receive the name of these misclassified images (setting
feed it to this script, it is going to show us the "misclassified images" and it will give us the option of relocating
the images (by inputting 'y' for yes or 'n' for no) to the subfolder that the CNN has decided.
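A minimal sketch of that y/n prompt loop (the function name and the injectable `ask` parameter are illustrative, not the script's actual API):

```python
def ask_relocate(image_name, cluster, ask=input):
    """Keep asking until the user answers 'y' or 'n'; return True for 'y'.
    `ask` defaults to input() but is injectable so the loop can be tested."""
    answer = ""
    while answer not in ("y", "n"):
        answer = ask("Move %s to cluster %s? (y/n): " % (image_name, cluster)).strip().lower()
    return answer == "y"

# Simulated session: an invalid answer first, then 'y'
replies = iter(["maybe", "y"])
print(ask_relocate("img_001.png", "3", ask=lambda prompt: next(replies)))  # True
```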
* **move_files.py:** several functions to move files and to extract information from file names and files' classification.
* **multiv_multip_regression.py:** __VERY IMPORTANT SCRIPT__. Reading a master [results.csv](doc_files/Sphere/results.csv)
file (one that has all possible combinations of sample and instrument parameters; they are in [doc_files](doc_files)),
it suggests a transformation for all the images in it to convert them into the desired type. To do so, it
applies a **regression for each cluster** with:
* **independent variables:** instrument's parameters and the CNN mean classification of the image.
(e.g. an image that the CNN says is 80% type 1 (Guinier) and 20% type 2 (One ring) gives a "CNN mean classification" =
0.8\*1 + 0.2\*2 = **1.2**)
* **dependent variables:** sample's parameters **(one or more)**
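The "CNN mean classification" from the example above is just the probability-weighted average of the cluster indices, as this small sketch shows:

```python
def mean_classification(probs):
    """Probability-weighted average of cluster indices.
    probs[i] is the CNN's probability that the image belongs to cluster i."""
    return sum(p * cluster for cluster, p in enumerate(probs))

# 80% type 1 (Guinier), 20% type 2 (One ring) -> 0.8*1 + 0.2*2 = 1.2
probs = [0.0, 0.8, 0.2]
print(round(mean_classification(probs), 3))  # 1.2
```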
......@@ -227,6 +239,7 @@ The structure of the file is:
* **Original_misclassified.txt:** file (.txt format) produced by copying and pasting the logs produced by [retrain.py](python/retrain.py)
when option *--print_misclassified_images* is set to *True*. It is used in
[misclassified_images.py](python/misclassified_images.py) to reclassify images.
* Inside **Sphere** folder, with *results.csv* and *suggest.csv* files, we can also find the histograms of radius error.
### imgs
It contains two folders (one for each particle model). Each one of them has two folders: one with **simulated** scattered
......@@ -251,3 +264,6 @@ The number in the name specifies the cluster for which that regression model app
applies to *"bad guinier's"* scattered images (cluster 0).
## Datasets
All the datasets from which we have generated the master *results.csv* files located in [doc_files](doc_files) are in:
/home/dpt/sci_share/ScatterImage/fernandez-ruiz/
\ No newline at end of file
......@@ -240,12 +240,12 @@ plt.axis('off') # Delete axis
plt.imshow(GUI_img) # Show image
plt.show()
possible_categ_search = ["Good guinier", "One ring", "Two or three rings", "Background image"] # Define the possibilities
categ_search_words = user_input(possible_categ_search)  # User inputs the option he/she wants...
print("TYPE OF SCATTERED IMAGE chosen: %s" % (categ_search_words.upper()))  # print user decision
# We define a dictionary to translate the string decision to the number of cluster so the script can understand it
dict_categ_search = {"Bad guinier": 0, "Good guinier": 1, "One ring": 2, "Two or three rings": 3,
"Four or 5 rings": 4, "More rings": 5, "Bad background image": 6, "Background image": 7}
categ_search = dict_categ_search[categ_search_words]
print("-------------------------------------------------------\n")
# Whether user is going to input a Real/Simulated scattered image
......@@ -319,7 +319,7 @@ instr_params_list = columns[:columns.index("predicted")] # Columns before 'pred
sample_params_list = columns[columns.index("mean_prediction") + 1:]  # Columns after 'mean_prediction' are the sample parameters
print("The instrument's parameters with which we have obtained the input scattered image are:")
[print(instr_params_list[j], "=", img_df.iloc[0][instr_params_list[j]]) for j in range(len(instr_params_list))]
print("mean prediction: ", img_df.iloc[0].mean_prediction)
# fill an array with independent variables (instrument parameters). Input to the regression model
X_categ = []
......@@ -381,6 +381,17 @@ if type(Y_array) == np.float64: # If we only have one sample parameter to infer
print("The sample's parameter predictions for the inputted image are:")
[print(sample_params_list[j], "prediction =", Y_array[j]) for j in range(len(sample_params_list))]
# We check if, for the predicted sample params values, it is possible to generate the type of image that user
# has asked for. For that purpose we use the info in master results.csv
distinct_sample_values, sample_columns = distinct_values(database_df, sample_params_list, categ_search)
for param in range(len(sample_params_list)):
    min_val = min(np.array(distinct_sample_values[param]).astype(float))  # np.float is deprecated; plain float works
    max_val = max(np.array(distinct_sample_values[param]).astype(float))
    if not (min_val < Y_array[param] < max_val):
        print("It is not possible to obtain scattered images of type %s for the %s predicted (%f). It should be in the "
              "following interval [%f, %f]" % (categ_search_words.upper(), sample_params_list[param],
                                               Y_array[param], min_val, max_val))
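The feasibility check above can be isolated into a small helper, sketched here with hypothetical inputs (string values, as they come out of the master results.csv):

```python
def check_in_range(param_name, predicted, observed_values):
    """Return True when `predicted` lies strictly inside the interval of
    values observed for this parameter; otherwise warn and return False."""
    values = [float(v) for v in observed_values]
    min_val, max_val = min(values), max(values)
    ok = min_val < predicted < max_val
    if not ok:
        print("%s = %f is outside [%f, %f]" % (param_name, predicted, min_val, max_val))
    return ok

print(check_in_range("radius", 50.0, ["19", "120", "350"]))   # True
print(check_in_range("radius", 900.0, ["19", "120", "350"]))  # False
```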
# ------------------------ OPTIMAL INSTRUMENT PARAMETERS AND IMAGE GENERATION ------------------------
size_df = 0
repet = 0
......@@ -432,8 +443,8 @@ for idx, row in suggest_data.iterrows():
# We apply label_image to suggested images we have just created. In this way, we classify them into folders with name of clusters
subprocess.run(["python", join(python_script_path, "label_classify_image_folder.py"),
"--dir", join(database_path, tmp_folder), "--graph", join(database_path, scatter_model, "output_graph.pb"),
"--labels", join(database_path, scatter_model, "output_labels.txt"),
"--input_layer", "Placeholder", "--output_layer", "final_result",
"--results_path", join(database_path, tmp_folder, results_file),
"--sample_params_list", *sample_params_list, "--silence", "True"])
......@@ -464,8 +475,8 @@ for j in range(len(instr_params_list)):
print(instr_params_list[j] + " = " + str(suggest_instr_param_array[j]))
for j in range(len(sample_params_list)):
varargin.append(sample_params_list[j])
if real_sim == "Simulated":  # use the real sample's parameter values to see if the optimal parameters calculated for
# the predicted sample's parameters also fit the real ones (they are written in the file's name)
varargin.append(matlab.double([img_df.iloc[0][sample_params_list[j]]]))
elif real_sim == "Real":  # as we do not have the real values of the sample parameters (because they are not contained
# in the file's name), we test with the predicted radius whether the suggested instr parameters fit well
......
......@@ -135,8 +135,8 @@ def obtain_parameter_val(file_name, parameter):
if __name__ == "__main__":
input_height = 224  # 299
input_width = 224  # 299
input_mean = 0
input_std = 255
input_layer = "Placeholder"
......@@ -153,9 +153,12 @@ if __name__ == "__main__":
parser.add_argument("--input_layer", help="name of input layer")
parser.add_argument("--output_layer", help="name of output layer")
parser.add_argument("--results_path", type=str, help="name of the file where we write the results") # where we want to save log file (results.csv)
parser.add_argument('--sample_params_list', default="['radius']", nargs='+', required=True,
                    help="array of strings with the sample's param names. The param names should "
                         "be equal to those that appear in the img name, e.g. 'contrast' in "
                         "'SphereEmpty_Cell_..._contrast6e-06_..._wav8_beamcenter0'. "
                         "IMPORTANT: if we want to input several parameters, we should pass them like this: "
                         "--sample_params_list='['radius', 'shell', 'rhocore', 'rhoshell', 'rhomatrix']' "
                         "(don't forget the '' that surround the [])")
parser.add_argument("--silence", type=str, default="False", help="print messages")
args = parser.parse_args()
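As a side note on the quoting described in the help text: with `nargs='+'`, argparse already collects space-separated values into a list, so a plain invocation works without any bracketed string. A minimal, self-contained sketch:

```python
import argparse

parser = argparse.ArgumentParser()
# nargs='+' gathers one or more space-separated values into a Python list
parser.add_argument("--sample_params_list", nargs="+", default=["radius"])

# Equivalent to: script.py --sample_params_list radius shell rhocore
args = parser.parse_args(["--sample_params_list", "radius", "shell", "rhocore"])
print(args.sample_params_list)  # ['radius', 'shell', 'rhocore']
```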
......@@ -200,6 +203,7 @@ if __name__ == "__main__":
predict_params_list = ["predicted", "mean_prediction"]
# sample_params_list is given as an input parameter
params_names = instr_params_list + predict_params_list + sample_params_list
results_file.writerow(params_names)
# Create a variable for measuring the progress (percentage of labeled images)
......
......@@ -29,21 +29,21 @@ import shutil # create and remove directories
import datetime
# Define paths
scatter_model = "Core-Shell Sphere"  # Particles model ('Sphere', 'Core-Shell Sphere')
sample_params_list = ["radius", "rhocore", "rhoshell"]  # Sample's parameters we want to study
img_path = "/home/dpt/fernandez-ruiz/sim/sim_data/ShellSphere_resize/"
results_file = "results.csv" # name of results file. It is going to be created inside 'img_path'
suggest_file = "suggest_improved.csv" # name of suggest file. It is going to be created inside 'img_path'
tmp_folder = "tmp" # tmp file. It is going to be created inside 'img_path'
save_img_sug_folder = "Classif_by_categ" # folder where we are going to create suggested images. It is going to be created inside 'img_path'
python_script_path = "/users/fernandez-ruiz/scatteringimage/python/" # Where the python scripts are. Remember that all of them should be in the same folder
matlab_path = "/users/fernandez-ruiz/scatteringimage/matlab/" # Where the matlab scripts are. Remember that all of them should be in the same folder
retrain_model_path = "/home/dpt/fernandez-ruiz/TF_Results/GUI_Core/" # "output_labels.txt", "output_graph.pb" should be in this folder (files from retrain.py)
categ_search = "1" # [0: bad guinier, 1: good guinier, 2: one ring, 3: two or three rings, 4: four or five rings,
# 5: more than five rings, 6: bad background images, 7: background image]
num_clusters = "8" # number of clusters we are going to study
radius_min = "19" # we only consider images above this radius
radius_max = "800" # we only consider images below this radius
# Start timer
start = datetime.datetime.now()
......@@ -60,8 +60,8 @@ print("Images succesfully generated")
# 2) CLASSIFY THEM into subfolders (one for each cluster) inside img_path. Write a results.csv file with the following info:
# * instrument's parameters: distance, collimation, wavelength, background
# * CNN classification for this image, and the mean of classification. e.g. 80% type 1 (Guinier) and 20% type 2 (One ring)
# gives a "CNN classification"=1(Guinier) and a "mean of classification"=0.8*1+0.2*2= 1.2
# * sample's parameters: radius for the *Sphere* model; radius, shell, rho-shell... for the Core-Shell Sphere one.
subprocess.run(["python", join(python_script_path, "label_classify_image_folder.py"),
"--dir", img_path,
"--graph", join(retrain_model_path, "output_graph.pb"),
......
......@@ -197,7 +197,9 @@ def write_cnn_prediction(parent_dir, instr_params_list, sample_params_list):
# we search the subfolders inside the folder
child_dir = os.listdir(parent_dir) # Get the folders in parent_dir
for child in child_dir:
if child not in [str(i) for i in range(num_clusters)]:  # os.listdir returns strings, so compare against '0'..'7' as strings
continue
if os.path.isdir(join(parent_dir, child)): # we only deal with directories
files = [f for f in os.listdir(join(parent_dir, child))] # search files in child directory
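One pitfall worth flagging here: `os.listdir` yields folder names as strings, so testing them against `range(num_clusters)` (integers) never matches. Converting the range to strings first fixes the comparison:

```python
num_clusters = 8

# Membership against a range of ints silently fails for string names...
print('3' in range(num_clusters))            # False

# ...so build the cluster names as strings before comparing
cluster_names = [str(i) for i in range(num_clusters)]
print('3' in cluster_names)                  # True
print('9' in cluster_names)                  # False
```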
......@@ -301,9 +303,9 @@ def resize_imgs(parent_dir, path_save, size, num_clusters):
# Function's call
dirs = ["/home/dpt/fernandez-ruiz/sim/sim_data/ShellSphere_resize/"]
instr_params = ["dist", "col", "wav", "bg"]
sample_params = ["radius", "rhocore", "rhoshell"]
parent = "/home/dpt/fernandez-ruiz/sim/sim_data/CoreShellSphere/"
save = "/home/dpt/fernandez-ruiz/sim/sim_data/CoreShellSphere_resize/"
IMG_SIZE = (299, 299)
......@@ -316,7 +318,7 @@ start = datetime.datetime.now()
# Iterate over the directories specified (it can be only one)
for dir in dirs:
print(dir)
from_child_to_parent(dir)
# split_by_parameter(dir, "wav", 12, 12)
# write_cnn_prediction(dir, instr_params, sample_params)
# count_max_min_value_list(dir, sample_params)
......
......@@ -203,7 +203,6 @@ if __name__ == "__main__":
parser.add_argument('--radius_max', type=int, default=800, help="max value of radius we study")
parser.add_argument('--num_cluster', type=int, default=7, help="number of clusters we classify images into")
args = parser.parse_args()
if args.results_path:
results_path = args.results_path
if args.suggest_path:
......
......@@ -236,7 +236,7 @@ def plot_error_hist(error_array, prediction = -1):
plt.title("Radius error for the whole dataset, num observ = %i" % (len(error_array)))
# Set a clean upper y-axis limit.
plt.ylim(top=np.ceil(maxfreq / 10) * 10 if maxfreq % 10 else maxfreq + 10)
plt.show()
if __name__ == "__main__":
parser = argparse.ArgumentParser()
......@@ -466,9 +466,9 @@ if __name__ == "__main__":
suggest_inst_par_array.append(float(row[instr_params_list[k]]))
elif duration == "long":
# Create a temporal folder in which we are going to generate suggested images.
if os.path.isdir(join(root, tmp_folder)):  # First, if it exists...
shutil.rmtree(join(root, tmp_folder), ignore_errors=True)  # ... we delete it
os.mkdir(join(root, tmp_folder))  # and we create it
# Group the suggested configurations (dist, wav, col, radius...) by dist, col, bg and wav. After that
# we generate images with the real sample's parameters and the resulting (dist, bg and wav) tuples.
# We use the real sample's parameters to simulate the real situation. In the real world the radius doesn't
......@@ -485,7 +485,7 @@ if __name__ == "__main__":
varargin.append(sample_params_list[j])
varargin.append(matlab.double([predict_radius]))
# Call matlab script to generate images into tmp_folder
eng.create_img_param_general(tmp_folder, scatter_model, *varargin, nargout=0)
eng.create_img_param_general(join(root, tmp_folder), scatter_model, *varargin, nargout=0)
# We apply label_image to suggested images we have just created. In this way, we classify them into folders
# with name of clusters
......@@ -500,18 +500,18 @@ if __name__ == "__main__":
# If we have a config (dist, col, wav) that produce category searched. We break the loop
priority_array = order_priority(categ_search)
for j in priority_array:
if os.path.isdir(join(root, tmp_folder, str(j))):
new_prediction = str(j)
print("Suggested image is going to be: type ", new_prediction)
break
# ... we extract these config
files = [f for f in os.listdir(join(root, tmp_folder, new_prediction))]
for file in files:
suggest_inst_par_array = []
for k in range(len(instr_params_list)):
suggest_inst_par_array.append(
obtain_parameter_val(join(root, tmp_folder, new_prediction, file), instr_params_list[k]))
# After taking a certain configuration (dist, wav) we apply it to our "Real Radius"
l = suggest_inst_par_array + [img_prediction] + [real_radius] + [error]
......