oliverdk committed on
Commit 6f95b64
1 Parent(s): 5dc3bfb

End of training

.hydra/config.yaml CHANGED
@@ -3,6 +3,9 @@ model:
   model_type: codegen
   pretrained_model_name: Salesforce/codegen-350M-mono
   max_length: 1024
+  model_config_params:
+    sensor_loc_type: locs_from_token
+    sensor_token: ' omit'
   hparams:
     learning_rate: 2.0e-05
     weight_decay: 0.02
.hydra/hydra.yaml CHANGED
@@ -142,8 +142,8 @@ hydra:
     name: train
     chdir: null
     override_dirname: model.dataset_name=redwoodresearch/diamonds-seed4
-    id: '746835'
-    num: 0
+    id: '748836_3'
+    num: 3
     config_name: codegen_diamonds_slurm
     env_set: {}
     env_copy: []
@@ -166,7 +166,7 @@ hydra:
     - path: ''
       schema: structured
       provider: schema
-    output_dir: /nas/ucb/oliveradk/measurement-pred/multirun/2024-12-16/18-52-16/0
+    output_dir: /nas/ucb/oliveradk/measurement-pred/multirun/2024-12-19/09-54-27/3
     choices:
       hparams: hparams
       model: codegen_diamonds
README.md CHANGED
@@ -17,16 +17,16 @@ should probably proofread and complete it, then remove this comment. -->
 
 This model is a fine-tuned version of [Salesforce/codegen-350M-mono](https://huggingface.co/Salesforce/codegen-350M-mono) on an unknown dataset.
 It achieves the following results on the evaluation set:
-- Loss: 0.3745
-- Accuracy: 0.9126
-- Accuracy Sensor 0: 0.9165
-- Auroc Sensor 0: 0.9601
-- Accuracy Sensor 1: 0.9099
-- Auroc Sensor 1: 0.9647
-- Accuracy Sensor 2: 0.9342
-- Auroc Sensor 2: 0.9771
-- Accuracy Aggregated: 0.8898
-- Auroc Aggregated: 0.9613
+- Loss: 0.3733
+- Accuracy: 0.9086
+- Accuracy Sensor 0: 0.9144
+- Auroc Sensor 0: 0.9506
+- Accuracy Sensor 1: 0.9050
+- Auroc Sensor 1: 0.9584
+- Accuracy Sensor 2: 0.9332
+- Auroc Sensor 2: 0.9753
+- Accuracy Aggregated: 0.8820
+- Auroc Aggregated: 0.9557
 
 ## Model description
 
@@ -61,11 +61,11 @@ The following hyperparameters were used during training:
 
 | Training Loss | Epoch | Step | Validation Loss | Accuracy | Accuracy Sensor 0 | Auroc Sensor 0 | Accuracy Sensor 1 | Auroc Sensor 1 | Accuracy Sensor 2 | Auroc Sensor 2 | Accuracy Aggregated | Auroc Aggregated |
 |:-------------:|:------:|:----:|:---------------:|:--------:|:-----------------:|:--------------:|:-----------------:|:--------------:|:-----------------:|:--------------:|:-------------------:|:----------------:|
-| 0.2756 | 0.9997 | 781 | 0.3221 | 0.8643 | 0.8659 | 0.9177 | 0.8499 | 0.9112 | 0.9025 | 0.9476 | 0.8388 | 0.9090 |
-| 0.1793 | 1.9994 | 1562 | 0.2547 | 0.8960 | 0.9032 | 0.9461 | 0.8847 | 0.9450 | 0.9345 | 0.9710 | 0.8617 | 0.9433 |
-| 0.1281 | 2.9990 | 2343 | 0.2960 | 0.8797 | 0.8882 | 0.9563 | 0.8726 | 0.9584 | 0.9133 | 0.9719 | 0.8447 | 0.9553 |
-| 0.0685 | 4.0 | 3125 | 0.3088 | 0.9049 | 0.9163 | 0.9597 | 0.9014 | 0.9638 | 0.9259 | 0.9765 | 0.8761 | 0.9609 |
-| 0.0342 | 4.9984 | 3905 | 0.3745 | 0.9126 | 0.9165 | 0.9601 | 0.9099 | 0.9647 | 0.9342 | 0.9771 | 0.8898 | 0.9613 |
+| 0.3029 | 0.9997 | 781 | 0.3411 | 0.8441 | 0.8553 | 0.9103 | 0.8390 | 0.9066 | 0.8633 | 0.9334 | 0.8188 | 0.8975 |
+| 0.2003 | 1.9994 | 1562 | 0.2859 | 0.8852 | 0.8929 | 0.9380 | 0.8778 | 0.9380 | 0.9319 | 0.9638 | 0.8384 | 0.9361 |
+| 0.1366 | 2.9990 | 2343 | 0.2701 | 0.8945 | 0.9041 | 0.9549 | 0.8902 | 0.9570 | 0.9245 | 0.9755 | 0.8591 | 0.9539 |
+| 0.0812 | 4.0 | 3125 | 0.2992 | 0.9046 | 0.9166 | 0.9542 | 0.8947 | 0.9585 | 0.9339 | 0.9765 | 0.8730 | 0.9567 |
+| 0.0381 | 4.9984 | 3905 | 0.3733 | 0.9086 | 0.9144 | 0.9506 | 0.9050 | 0.9584 | 0.9332 | 0.9753 | 0.8820 | 0.9557 |
 
 
 ### Framework versions
config.json CHANGED
@@ -48,7 +48,6 @@
   "tokenizer_class": "GPT2Tokenizer",
   "torch_dtype": "float32",
   "transformers_version": "4.41.0",
-  "use_aggregated": true,
   "use_cache": false,
   "vocab_size": 51200
 }
configuration_measurement_pred.py CHANGED
@@ -7,7 +7,6 @@ class MeasurementPredictorConfig(PretrainedConfig):
         sensor_token=" omit",
         sensor_loc_type="locs_from_token",
         n_sensors=3,
-        use_aggregated=True,
         sensors_weight = 0.7,
         aggregate_weight=0.3,
         **kwargs
@@ -15,7 +14,6 @@ class MeasurementPredictorConfig(PretrainedConfig):
         self.sensor_token = sensor_token
         self.sensor_loc_type = sensor_loc_type
         self.n_sensors = n_sensors
-        self.use_aggregated = use_aggregated
         self.sensors_weight = sensors_weight
         self.aggregate_weight = aggregate_weight
         super().__init__(**kwargs)
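Reviewer note: after this change the aggregate probe is always active, so the config no longer takes a `use_aggregated` flag. A minimal sketch of constructing the config with the values used in this run (the import path is assumed, values mirror the `.hydra/config.yaml` overrides and the defaults above):

```python
# Hypothetical usage sketch; not part of the committed code.
from configuration_measurement_pred import MeasurementPredictorConfig  # import path assumed

config = MeasurementPredictorConfig(
    sensor_token=" omit",               # token whose positions mark the sensors
    sensor_loc_type="locs_from_token",
    n_sensors=3,
    sensors_weight=0.7,                 # weight on the per-sensor BCE loss
    aggregate_weight=0.3,               # weight on the aggregate BCE loss
)
```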
logs/events.out.tfevents.1734630919.gan.ist.berkeley.edu.947855.0 ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:0c9782edca7024c8df399f33dc66c2e68f51006bfab574e4d41fcd0ec4e1fd27
+size 16043
model.safetensors CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:aeb3660f44c9bfb4d0af2162ca48b4d624f0d48f0335c52c0d91f5979689661d
+oid sha256:00dde8aa831a4318c073eac0225f9e719b9f67bd1dd9db30911b365594367ad0
 size 1216963976
modeling_code_gen_measurement_pred.py CHANGED
@@ -1,5 +1,5 @@
 from transformers.models.codegen import CodeGenPreTrainedModel, CodeGenModel
-
+from transformers import PreTrainedTokenizerBase
 from .modeling_measurement_pred import MeasurementPredictorMixin
 from .configuration_code_gen_measuremet_pred import CodeGenMeasurementPredictorConfig
 
@@ -11,3 +11,9 @@ class CodeGenMeasurementPredictor(CodeGenPreTrainedModel, MeasurementPredictorMixin):
         super().__init__(config)
         self.transformer = CodeGenModel(config)
         self.post_init()
+
+    def set_pad_token(self, tokenizer: PreTrainedTokenizerBase):
+        pad_token = ' .'
+        pad_token_id = tokenizer.encode(pad_token)[0]
+        tokenizer.pad_token = pad_token
+        tokenizer.pad_token_id = pad_token_id
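Reviewer note: a minimal usage sketch of the new `set_pad_token` helper, standalone rather than through the model class. It assumes `' .'` encodes to a single BPE token, consistent with the `pad_id: 764` written to `tokenizer.json` further down in this commit:

```python
# Sketch only: switch the tokenizer's pad token from <|endoftext|> to ' .'.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Salesforce/codegen-350M-mono")
pad_token = " ."
pad_token_id = tokenizer.encode(pad_token)[0]   # expected to be 764 for this BPE vocab
tokenizer.pad_token = pad_token
tokenizer.pad_token_id = pad_token_id

# With padding_side="left" (see tokenizer_config.json), shorter sequences are
# left-padded with the ' .' token rather than the end-of-text token.
batch = tokenizer(["def f():", "x = 1"], padding=True, return_tensors="pt")
```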
modeling_measurement_pred.py CHANGED
@@ -1,4 +1,5 @@
 from typing import Optional, Tuple, Union
+from abc import abstractmethod
 
 import torch
 from torch.nn import BCEWithLogitsLoss
@@ -20,16 +21,18 @@ class MeasurementPredictorMixin(PreTrainedModel):
         self.sensor_probes = torch.nn.ModuleList([
             torch.nn.Linear(config.emb_dim, 1) for _ in range(config.n_sensors)
         ])
-        self.use_aggregated = config.use_aggregated
-        if config.use_aggregated:
-            self.aggregate_probe = torch.nn.Linear(config.emb_dim, 1)
+        self.aggregate_probe = torch.nn.Linear(config.emb_dim, 1)
         self.sensors_weight = config.sensors_weight
         self.aggregate_weight = config.aggregate_weight
 
-        self.get_sensor_locs: SensorLocFinder = None
+        self.find_sensor_locs: SensorLocFinder = None
+
+    @abstractmethod
+    def set_pad_token(self, tokenizer: PreTrainedTokenizerBase):
+        pass
 
     def init_sensor_loc_finder(self, tokenizer: PreTrainedTokenizerBase):
-        self.get_sensor_locs = SENSOR_LOC_REGISTRY[self.sensor_loc_type](
+        self.find_sensor_locs = SENSOR_LOC_REGISTRY[self.sensor_loc_type](
             tokenizer, sensor_token=self.sensor_token, n_sensors=self.n_sensors
         )
 
@@ -67,28 +70,27 @@ class MeasurementPredictorMixin(PreTrainedModel):
             output_hidden_states=output_hidden_states,
             return_dict=return_dict,
         )
-        sensor_locs = self.get_sensor_locs(input_ids)
+        # get sensor embeddings (including aggregate)
+        sensor_locs = self.find_sensor_locs(input_ids)
         sensor_embs = base_model_output.last_hidden_state.gather(
             1, sensor_locs.unsqueeze(-1).expand(-1, -1, self.config.emb_dim)
         )
-        assert sensor_embs.shape == (input_ids.shape[0], self.n_sensors, self.config.emb_dim), f"{sensor_embs.shape} != {(input_ids.shape[0], self.n_sensors, self.config.emb_dim)}"
+        assert sensor_embs.shape == (input_ids.shape[0], self.n_sensors + 1, self.config.emb_dim), sensor_embs.shape
+
+        # get sensor and aggregate logits
         sensor_logits = torch.concat([self.sensor_probes[i](sensor_embs[:, i, :])
                                       for i in range(self.n_sensors)], dim=-1)
-        logits = sensor_logits
+        aggregate_logits = self.aggregate_probe(sensor_embs[:, -1, :])
+        logits = torch.concat([sensor_logits, aggregate_logits], dim=-1)
 
-        if self.use_aggregated:
-            last_emb = base_model_output.last_hidden_state[:, -1, :]
-            aggregate_logits = self.aggregate_probe(last_emb)
-            logits = torch.concat([logits, aggregate_logits], dim=-1)
-
+        # compute loss
         loss = None
         if labels is not None:
             loss_fct = BCEWithLogitsLoss()
-            sensor_loss = loss_fct(sensor_logits, labels[:, :self.n_sensors]) * self.sensors_weight
+            sensor_loss = loss_fct(sensor_logits[:, :self.n_sensors], labels[:, :self.n_sensors]) * self.sensors_weight
             loss = sensor_loss
-            if self.use_aggregated: #TOOD: should be use aggregate
-                aggregate_loss = loss_fct(aggregate_logits, labels[:, -1:]) * self.aggregate_weight
-                loss += aggregate_loss
+            aggregate_loss = loss_fct(aggregate_logits, labels[:, -1:]) * self.aggregate_weight
+            loss += aggregate_loss
 
         if not return_dict:
             output = (logits, ) + base_model_output[1:]
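Reviewer note: the core pattern in the updated forward pass (gather one hidden state per sensor location plus one for the aggregate, apply a linear probe to each, combine weighted BCE losses) can be sketched standalone. Shapes, names, and the random inputs below are illustrative, not the repository's API:

```python
import torch
from torch.nn import BCEWithLogitsLoss

batch, seq_len, emb_dim, n_sensors = 2, 16, 8, 3
hidden = torch.randn(batch, seq_len, emb_dim)              # stand-in for last_hidden_state
locs = torch.randint(0, seq_len, (batch, n_sensors + 1))   # sensor locs + repeated last loc for aggregate

# gather the hidden state at each location: (batch, n_sensors + 1, emb_dim)
sensor_embs = hidden.gather(1, locs.unsqueeze(-1).expand(-1, -1, emb_dim))

sensor_probes = torch.nn.ModuleList([torch.nn.Linear(emb_dim, 1) for _ in range(n_sensors)])
aggregate_probe = torch.nn.Linear(emb_dim, 1)

sensor_logits = torch.concat(
    [sensor_probes[i](sensor_embs[:, i, :]) for i in range(n_sensors)], dim=-1
)
aggregate_logits = aggregate_probe(sensor_embs[:, -1, :])
logits = torch.concat([sensor_logits, aggregate_logits], dim=-1)  # (batch, n_sensors + 1)

labels = torch.randint(0, 2, (batch, n_sensors + 1)).float()
loss_fct = BCEWithLogitsLoss()
loss = (loss_fct(sensor_logits, labels[:, :n_sensors]) * 0.7      # sensors_weight
        + loss_fct(aggregate_logits, labels[:, -1:]) * 0.3)        # aggregate_weight
```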
sensor_loc_stories.py CHANGED
@@ -26,6 +26,8 @@ class StoriesSensorLocFinder(SensorLocFinder):
             torch.argmax(eqs.to(torch.uint8), dim=-2),
             input_ids.shape[-1] - 3,
         ).clamp(max=input_ids.shape[-1] - 3)
+        aggregate_sensor_loc = locs[:, -1].unsqueeze(1)
+        locs = torch.cat([locs, aggregate_sensor_loc], dim=1)
         return locs
 
 
sensor_locs_from_token.py CHANGED
@@ -13,4 +13,6 @@ class SensorLocFinderFromToken(SensorLocFinder):
     def find_sensor_locs(self, input_ids: torch.Tensor) -> torch.Tensor:
         flat_sensor_token_idxs = (input_ids == self.sensor_token_id).nonzero(as_tuple=True)[1]
         sensor_token_idxs = flat_sensor_token_idxs.view(-1, self.n_sensors)
+        aggregate_sensor_token_idx = sensor_token_idxs[:, -1].unsqueeze(1)
+        sensor_token_idxs = torch.cat([sensor_token_idxs, aggregate_sensor_token_idx], dim=1)
         return sensor_token_idxs
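Reviewer note: a small worked example of this location-finding logic with toy token ids (in the real tokenizer the sensor token is ' omit'). Assuming each row contains exactly `n_sensors` occurrences of the sensor token, `nonzero` + `view` recovers their positions per row, and the last position is repeated so the aggregate probe reads the same hidden state as the final sensor:

```python
import torch

sensor_token_id = 42   # toy id standing in for the ' omit' token
n_sensors = 3
input_ids = torch.tensor([
    [1, 42, 5, 42, 7, 42, 9],
    [42, 2, 42, 4, 42, 6, 8],
])

flat_idxs = (input_ids == sensor_token_id).nonzero(as_tuple=True)[1]  # column indices, row-major
locs = flat_idxs.view(-1, n_sensors)                  # tensor([[1, 3, 5], [0, 2, 4]])
locs = torch.cat([locs, locs[:, -1:]], dim=1)         # append aggregate loc: [[1, 3, 5, 5], [0, 2, 4, 4]]
```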
special_tokens_map.json CHANGED
@@ -13,7 +13,7 @@
     "rstrip": false,
     "single_word": false
   },
-  "pad_token": "<|endoftext|>",
+  "pad_token": "Ġ.",
   "unk_token": {
     "content": "<|endoftext|>",
     "lstrip": false,
tokenizer.json CHANGED
@@ -12,9 +12,9 @@
     },
     "direction": "Left",
     "pad_to_multiple_of": null,
-    "pad_id": 50256,
+    "pad_id": 764,
     "pad_type_id": 0,
-    "pad_token": "<|endoftext|>"
+    "pad_token": "Ġ."
   },
   "added_tokens": [
     {
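Reviewer note: `"Ġ."` is the byte-level BPE spelling of the string `" ."` (the `Ġ` prefix encodes a leading space), and `pad_id: 764` is its id in this vocabulary. A hedged check of that correspondence, assuming the tokenizer loads from the base model or this repo:

```python
# Sketch: confirm that " ." tokenizes to the single token "Ġ." with id 764.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Salesforce/codegen-350M-mono")
ids = tokenizer.encode(" .")
print(ids)                                   # expected: [764]
print(tokenizer.convert_ids_to_tokens(ids))  # expected: ['Ġ.']
```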
tokenizer_config.json CHANGED
@@ -318,7 +318,7 @@
   "clean_up_tokenization_spaces": true,
   "eos_token": "<|endoftext|>",
   "model_max_length": 2048,
-  "pad_token": "<|endoftext|>",
+  "pad_token": "Ġ.",
   "padding_side": "left",
   "return_token_type_ids": false,
   "tokenizer_class": "CodeGenTokenizer",
train.log CHANGED
@@ -1 +1 @@
-[2024-12-16 18:53:17,277][accelerate.utils.other][WARNING] - Detected kernel version 5.4.0, which is below the recommended minimum of 5.5.0; this can cause the process to hang. It is recommended to upgrade the kernel to the minimum version or higher.
+[2024-12-19 09:55:18,619][accelerate.utils.other][WARNING] - Detected kernel version 5.4.0, which is below the recommended minimum of 5.5.0; this can cause the process to hang. It is recommended to upgrade the kernel to the minimum version or higher.