# Active Learning for Loading Case Selection

Companion repository for the paper "Accelerating engineering design by automatic selection of simulation cases through Pool-Based Active Learning". It includes the notebooks and data needed to reproduce all results of the paper:

- J.H.C. Gaspar Elsas, N.A.G. Casaprima, I.F.M. Menezes, Accelerating engineering design by automatic selection of simulation cases through Pool-Based Active Learning, [arXiv:2009.01420](https://arxiv.org/abs/2009.01420)

### Citation

    @misc{elsas2020accelerating,
    title={Accelerating engineering design by automatic selection of simulation cases through Pool-Based Active Learning},
    author={Jos\'e Hugo C. Gaspar Elsas and Nicholas A. G. Casaprima and Ivan F. M. Menezes},
    year={2020},
    eprint={2009.01420},
    archivePrefix={arXiv},
    primaryClass={cs.CE}
    }

This repository contains notebooks with the code used to produce the results, and the data files that those notebooks process.

## Notebooks 
  - Spreadsheet aggregation.ipynb

     Spreadsheet aggregation performs the Extract, Transform, Load (ETL) part of the work. It converts the data from the different files into an integrated feature dataframe, encoding current and wave data in a format more amenable to Gaussian Process Regression.

  - Single-Target random and active learning for loading case selection.ipynb

     This is the baseline example of the active learning method, in which the information-based objective function is the uncertainty of a single inferred variable. For each variable of interest, the active learning procedure selects samples using that variable alone. It therefore serves as the best-case scenario, since the selected samples are the best possible for maximizing the information gained about that one variable.

     It also serves as a benchmark for the multi-target case: if the latter takes much longer than single-variable selection, that would indicate that jointly sampling multiple target variables is a much more difficult problem than sampling a single variable.
  
  - Multi-Target random and active learning for loading case selection.ipynb

     The main notebook for the work. It provides an implementation of the active learning method, using the geometric mean of the uncertainties of all variables of interest as the information-based objective function.
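
The single-target and multi-target selection criteria described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the notebooks' implementation: it uses a hand-written zero-mean Gaussian Process with an RBF kernel and fixed hyperparameters, a random toy pool in place of the real loading-case features, and hypothetical names (`gp_predictive_std`, `select_next_case`, `length_scales`). Each target gets its own length scale as a stand-in for separately fitted hyperparameters.

```python
import numpy as np


def rbf_kernel(a, b, length_scale=1.0):
    """Squared-exponential kernel between two sets of row vectors."""
    sq_dist = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-0.5 * sq_dist / length_scale**2)


def gp_predictive_std(x_train, x_pool, length_scale=1.0, noise=1e-6):
    """Posterior standard deviation of a zero-mean, unit-variance GP at x_pool."""
    k_tt = rbf_kernel(x_train, x_train, length_scale) + noise * np.eye(len(x_train))
    k_tp = rbf_kernel(x_train, x_pool, length_scale)
    chol = np.linalg.cholesky(k_tt)
    v = np.linalg.solve(chol, k_tp)
    var = 1.0 - (v**2).sum(axis=0)  # prior variance k(x, x) = 1 for this kernel
    return np.sqrt(np.clip(var, 0.0, None))


def select_next_case(x, labeled, length_scales):
    """Index of the unlabeled pool case with the largest geometric-mean std.

    With a single entry in length_scales this is the single-target criterion.
    """
    unlabeled = np.setdiff1d(np.arange(len(x)), labeled)
    stds = np.stack(
        [gp_predictive_std(x[labeled], x[unlabeled], ls) for ls in length_scales]
    )
    # Geometric mean over targets, computed in log space for stability.
    score = np.exp(np.log(stds + 1e-12).mean(axis=0))
    return int(unlabeled[np.argmax(score)])


# Toy pool standing in for the loading-case feature matrix (assumption).
rng = np.random.default_rng(0)
pool = rng.uniform(0.0, 1.0, size=(50, 2))

labeled = [0]  # start from one already-simulated case
for _ in range(5):
    labeled.append(select_next_case(pool, labeled, length_scales=[0.2, 0.5]))
```

Calling `select_next_case(pool, labeled, length_scales=[0.2])` gives the single-target variant. In a fitted model the hyperparameters, and hence the uncertainties, would come from each target's simulated data rather than from fixed values.
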

## Data Files 

   The project involves several files, which fall into three categories: case definition, results data, and intermediary files.

   Case definition files contain the data necessary to characterize a loading case, and therefore to run the simulation. Each loading case requires the definition of a current and a wave.

   Results data is the information produced by applying the machine learning inference process to the target dataset and iterating that process, through either random sampling or active learning sampling.

   ### Case definition

   cases.csv defines the (current, wave) pair corresponding to each loading case. currents.csv contains the parameters that characterize each current, namely the 2D velocity vector at each water depth, and waves.csv contains the JONSWAP wave model parameters for each wave. cardinal_directions.csv is an auxiliary file used to convert the data in currents.csv to a more amenable format.
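
The relationship between these files can be sketched as a pair of joins. The example below is only an illustration: it uses small in-memory tables instead of the real CSV files, and every column name (`case_id`, `current_id`, `wave_id`, `surface_speed`, `hs`, `tp`) is hypothetical, since the actual schemas may differ.

```python
import pandas as pd

# Toy stand-ins for cases.csv, currents.csv and waves.csv.
# All column names and values are hypothetical.
cases = pd.DataFrame(
    {"case_id": [0, 1, 2], "current_id": [10, 10, 11], "wave_id": [20, 21, 20]}
)
currents = pd.DataFrame({"current_id": [10, 11], "surface_speed": [0.8, 1.2]})
waves = pd.DataFrame({"wave_id": [20, 21], "hs": [2.5, 4.0], "tp": [9.0, 12.0]})

# Attach the current and wave parameters to each loading case.
# validate="many_to_one" raises if a current or wave id is duplicated
# in its lookup table.
features = cases.merge(
    currents, on="current_id", how="left", validate="many_to_one"
).merge(waves, on="wave_id", how="left", validate="many_to_one")
```

Each row of `features` then describes one loading case by the parameters of its current and its wave, which is the kind of table a regression model can consume.
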

   config0.csv, config1.csv, ..., config5.csv contain the results of the simulations, i.e. axial tension and DNVUF201 CLC, for 6 different riser configurations. The results are given for each loading case listed in cases.csv and are, ultimately, the targets for the machine learning model.

   - cases.csv
   - cardinal_directions.csv
   - currents.csv
   - waves.csv
   - config0.csv ... config5.csv

   ### Results data
   The output of the analysis is stored in separate folders for convenience.

   - data/ : processed deviation measures for ML predictions
   - plots/ : plotted graphs analyzing data/ files
   - results/ : case-by-case data of ML prediction

   ### Intermediary files

   Intermediary files are produced by the Spreadsheet aggregation notebook, which formats the data into features amenable to machine learning processing.

   - cases_full.csv
   - cases_full.xlsx