# SentenceTransformer based on answerdotai/ModernBERT-base
This is a sentence-transformers model finetuned from answerdotai/ModernBERT-base on the code_search_net dataset. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
## Model Details

### Model Description
- Model Type: Sentence Transformer
- Base model: answerdotai/ModernBERT-base
- Maximum Sequence Length: 4096 tokens
- Output Dimensionality: 768 dimensions
- Similarity Function: Cosine Similarity
- Training Dataset:
  - code_search_net
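The dimensions and limits above can be checked directly on the loaded model; a quick sketch using standard sentence-transformers attributes:

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("juanwisz/modernbert-python-code-retrieval")
print(model.max_seq_length)                      # 4096; longer inputs are truncated
print(model.get_sentence_embedding_dimension())  # 768
```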
### Model Sources

- Documentation: [Sentence Transformers Documentation](https://sbert.net)
- Repository: [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
- Hugging Face: [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)
### Full Model Architecture

```
SentenceTransformer(
  (0): Transformer({'max_seq_length': 4096, 'do_lower_case': False}) with Transformer model: ModernBertModel
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
```
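The `Pooling` block is configured with `pooling_mode_cls_token: True`, so the sentence embedding is simply the encoder's first-token (CLS) vector. A minimal sketch of that pooling with plain `transformers`, assuming the checkpoint loads directly with `AutoModel` (in practice, prefer `SentenceTransformer`, which applies this module for you):

```python
import torch
from transformers import AutoModel, AutoTokenizer

model_id = "juanwisz/modernbert-python-code-retrieval"
tokenizer = AutoTokenizer.from_pretrained(model_id)
encoder = AutoModel.from_pretrained(model_id)

inputs = tokenizer(["def add(a, b): return a + b"], return_tensors="pt")
with torch.no_grad():
    token_embeddings = encoder(**inputs).last_hidden_state  # [batch, seq_len, 768]

# CLS pooling: take the vector of the first token as the sentence embedding
cls_embedding = token_embeddings[:, 0]                      # [batch, 768]
print(cls_embedding.shape)
```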
## Usage

### Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

```bash
pip install -U sentence-transformers
```

Then you can load this model and run inference.
```python
from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("juanwisz/modernbert-python-code-retrieval")

# Run inference
sentences = [
    'Validates control dictionary for the experiment context',
    'def __validateExperimentControl(self, control):\n """ Validates control dictionary for the experiment context"""\n # Validate task list\n taskList = control.get(\'tasks\', None)\n if taskList is not None:\n taskLabelsList = []\n\n for task in taskList:\n validateOpfJsonValue(task, "opfTaskSchema.json")\n validateOpfJsonValue(task[\'taskControl\'], "opfTaskControlSchema.json")\n\n taskLabel = task[\'taskLabel\']\n\n assert isinstance(taskLabel, types.StringTypes), \\\n "taskLabel type: %r" % type(taskLabel)\n assert len(taskLabel) > 0, "empty string taskLabel not is allowed"\n\n taskLabelsList.append(taskLabel.lower())\n\n taskLabelDuplicates = filter(lambda x: taskLabelsList.count(x) > 1,\n taskLabelsList)\n assert len(taskLabelDuplicates) == 0, \\\n "Duplcate task labels are not allowed: %s" % taskLabelDuplicates\n\n return',
    'def load_file_list(path=None, regx=\'\\.jpg\', printable=True, keep_prefix=False):\n r"""Return a file list in a folder by given a path and regular expression.\n\n Parameters\n ----------\n path : str or None\n A folder path, if `None`, use the current directory.\n regx : str\n The regx of file name.\n printable : boolean\n Whether to print the files infomation.\n keep_prefix : boolean\n Whether to keep path in the file name.\n\n Examples\n ----------\n >>> file_list = tl.files.load_file_list(path=None, regx=\'w1pre_[0-9]+\\.(npz)\')\n\n """\n if path is None:\n path = os.getcwd()\n file_list = os.listdir(path)\n return_list = []\n for _, f in enumerate(file_list):\n if re.search(regx, f):\n return_list.append(f)\n # return_list.sort()\n if keep_prefix:\n for i, f in enumerate(return_list):\n return_list[i] = os.path.join(path, f)\n\n if printable:\n logging.info(\'Match file list = %s\' % return_list)\n logging.info(\'Number of files = %d\' % len(return_list))\n return return_list',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
```
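Because the model was trained on docstring→code pairs, the natural usage pattern is asymmetric retrieval: embed a natural-language query and a corpus of Python functions, then rank by similarity. A small illustrative sketch (the corpus snippets below are made-up examples, not from the training data):

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("juanwisz/modernbert-python-code-retrieval")

# Toy corpus of Python functions (illustrative only)
corpus = [
    "def read_json(path):\n    import json\n    with open(path) as f:\n        return json.load(f)",
    "def flatten(nested):\n    return [x for sub in nested for x in sub]",
]
query = "Load a JSON file from disk and return its contents"

corpus_embeddings = model.encode(corpus)
query_embedding = model.encode([query])

# model.similarity uses the model's similarity function (cosine, per this card)
scores = model.similarity(query_embedding, corpus_embeddings)  # shape [1, 2]
print(corpus[scores.argmax().item()])
```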
## Training Details

### Training Dataset

#### code_search_net

- Dataset: code_search_net
- Size: 412,178 training samples
- Columns: `query` and `positive`
- Approximate statistics based on the first 1000 samples:

  |         | query | positive |
  |:--------|:------|:---------|
  | type    | string | string |
  | details | min: 4 tokens<br>mean: 73.72 tokens<br>max: 2258 tokens | min: 46 tokens<br>mean: 300.87 tokens<br>max: 3119 tokens |
- Samples:

  **query:** Extracts the list of arguments that start with any of the specified prefix values

  **positive:**

  ```python
  def findArgs(args, prefixes):
      """
      Extracts the list of arguments that start with any of the specified prefix values
      """
      return list([
          arg for arg in args
          if len([p for p in prefixes if arg.lower().startswith(p.lower())]) > 0
      ])
  ```

  **query:** Removes any arguments in the supplied list that are contained in the specified blacklist

  **positive:**

  ```python
  def stripArgs(args, blacklist):
      """
      Removes any arguments in the supplied list that are contained in the specified blacklist
      """
      blacklist = [b.lower() for b in blacklist]
      return list([arg for arg in args if arg.lower() not in blacklist])
  ```

  **query:** Executes a child process and captures its output

  **positive:**

  ```python
  def capture(command, input=None, cwd=None, shell=False, raiseOnError=False):
      """
      Executes a child process and captures its output
      """
      # Attempt to execute the child process
      proc = subprocess.Popen(command, stdin=subprocess.PIPE, stdout=subprocess.PIPE,
                              stderr=subprocess.PIPE, cwd=cwd, shell=shell,
                              universal_newlines=True)
      (stdout, stderr) = proc.communicate(input)

      # If the child process failed and we were asked to raise an exception, do so
      if raiseOnError == True and proc.returncode != 0:
          raise Exception(
              'child process ' + str(command) +
              ' failed with exit code ' + str(proc.returncode) +
              '\nstdout: "' + stdout + '"' +
              '\nstderr: "' + stderr + '"'
          )

      return CommandOutput(proc.returncode, stdout, stderr)
  ```

- Loss: `MultipleNegativesRankingLoss` (sketched below) with these parameters:

  ```json
  {
      "scale": 20.0,
      "similarity_fct": "cos_sim"
  }
  ```
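For each `(query, positive)` pair in a batch, `MultipleNegativesRankingLoss` treats the positives of all other pairs as in-batch negatives: it computes `scale * cos_sim(query_i, positive_j)` for every `i, j` and applies cross-entropy so that the true pairing `j == i` scores highest. A minimal sketch of how the setup on this card could be reconstructed (module construction mirrors the architecture section above; this is illustrative, not the author's actual training script):

```python
from sentence_transformers import SentenceTransformer, models
from sentence_transformers.losses import MultipleNegativesRankingLoss

# Rebuild the architecture shown above: ModernBERT encoder + CLS pooling
word_embedding = models.Transformer("answerdotai/ModernBERT-base", max_seq_length=4096)
pooling = models.Pooling(word_embedding.get_word_embedding_dimension(), pooling_mode="cls")
model = SentenceTransformer(modules=[word_embedding, pooling])

# scale=20.0 and cosine similarity, as reported in the loss parameters above
loss = MultipleNegativesRankingLoss(model, scale=20.0)
```

Because every other in-batch positive acts as a negative, larger effective batch sizes generally make the ranking task harder and the resulting embeddings stronger.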
### Evaluation Dataset

#### code_search_net

- Dataset: code_search_net
- Size: 23,107 evaluation samples
- Columns: `query` and `positive`
- Approximate statistics based on the first 1000 samples:

  |         | query | positive |
  |:--------|:------|:---------|
  | type    | string | string |
  | details | min: 5 tokens<br>mean: 168.27 tokens<br>max: 2118 tokens | min: 48 tokens<br>mean: 467.9 tokens<br>max: 4096 tokens |
- Samples:
query positive Train a deepq model.
Parameters
-------
env: gym.Env
environment to train on
network: string or a function
neural network to use as a q function approximator. If string, has to be one of the names of registered models in baselines.common.models
(mlp, cnn, conv_only). If a function, should take an observation tensor and return a latent variable tensor, which
will be mapped to the Q function heads (see build_q_func in baselines.deepq.models for details on that)
seed: int or None
prng seed. The runs with the same seed "should" give the same results. If None, no seeding is used.
lr: float
learning rate for adam optimizer
total_timesteps: int
number of env steps to optimizer for
buffer_size: int
size of the replay buffer
exploration_fraction: float
fraction of entire training period over which the exploration rate is annealed
exploration_final_eps: float
final value of ra...def learn(env,
network,
seed=None,
lr=5e-4,
total_timesteps=100000,
buffer_size=50000,
exploration_fraction=0.1,
exploration_final_eps=0.02,
train_freq=1,
batch_size=32,
print_freq=100,
checkpoint_freq=10000,
checkpoint_path=None,
learning_starts=1000,
gamma=1.0,
target_network_update_freq=500,
prioritized_replay=False,
prioritized_replay_alpha=0.6,
prioritized_replay_beta0=0.4,
prioritized_replay_beta_iters=None,
prioritized_replay_eps=1e-6,
param_noise=False,
callback=None,
load_path=None,
**network_kwargs
):
"""Train a deepq model.
Parameters
-------
env: gym.Env
environment to train on
network: string or a function
neural network to use as a q function approximator. If string, has to be one of the ...Save model to a pickle located at
path
def save_act(self, path=None):
"""Save model to a pickle located atpath
"""
if path is None:
path = os.path.join(logger.get_dir(), "model.pkl")
with tempfile.TemporaryDirectory() as td:
save_variables(os.path.join(td, "model"))
arc_name = os.path.join(td, "packed.zip")
with zipfile.ZipFile(arc_name, 'w') as zipf:
for root, dirs, files in os.walk(td):
for fname in files:
file_path = os.path.join(root, fname)
if file_path != arc_name:
zipf.write(file_path, os.path.relpath(file_path, td))
with open(arc_name, "rb") as f:
model_data = f.read()
with open(path, "wb") as f:
cloudpickle.dump((model_data, self._act_params), f)CNN from Nature paper.
def nature_cnn(unscaled_images, **conv_kwargs):
"""
CNN from Nature paper.
"""
scaled_images = tf.cast(unscaled_images, tf.float32) / 255.
activ = tf.nn.relu
h = activ(conv(scaled_images, 'c1', nf=32, rf=8, stride=4, init_scale=np.sqrt(2),
**conv_kwargs))
h2 = activ(conv(h, 'c2', nf=64, rf=4, stride=2, init_scale=np.sqrt(2), **conv_kwargs))
h3 = activ(conv(h2, 'c3', nf=64, rf=3, stride=1, init_scale=np.sqrt(2), **conv_kwargs))
h3 = conv_to_fc(h3)
return activ(fc(h3, 'fc1', nh=512, init_scale=np.sqrt(2))) - Loss:
MultipleNegativesRankingLoss
with these parameters:{ "scale": 20.0, "similarity_fct": "cos_sim" }
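A held-out split like this is typically scored as a retrieval task: each query should rank its own positive above all other code snippets. A hedged sketch with `InformationRetrievalEvaluator` (how you materialize the `(query, positive)` pairs from your copy of the data is an assumption; substitute your own loader):

```python
from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import InformationRetrievalEvaluator

model = SentenceTransformer("juanwisz/modernbert-python-code-retrieval")

# pairs: list of (query, positive) tuples from the evaluation split (placeholder)
pairs = [
    ("Validates control dictionary for the experiment context",
     "def __validateExperimentControl(self, control): ..."),
]

queries = {str(i): q for i, (q, _) in enumerate(pairs)}
corpus = {str(i): p for i, (_, p) in enumerate(pairs)}
relevant_docs = {str(i): {str(i)} for i in range(len(pairs))}

evaluator = InformationRetrievalEvaluator(queries, corpus, relevant_docs, name="code-retrieval")
print(evaluator(model))  # accuracy@k, MRR, nDCG, MAP
```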
### Training Hyperparameters

#### Non-Default Hyperparameters

- `eval_strategy`: epoch
- `per_device_train_batch_size`: 4
- `gradient_accumulation_steps`: 4
- `learning_rate`: 2e-05
- `num_train_epochs`: 10
- `warmup_steps`: 1000
- `fp16`: True
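In sentence-transformers 3.x these values map onto `SentenceTransformerTrainingArguments` roughly as follows (a sketch; `output_dir` is a placeholder, and everything not listed keeps its default):

```python
from sentence_transformers.training_args import SentenceTransformerTrainingArguments

args = SentenceTransformerTrainingArguments(
    output_dir="modernbert-python-code-retrieval",  # placeholder
    eval_strategy="epoch",
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,  # effective batch size 4 * 4 = 16
    learning_rate=2e-5,
    num_train_epochs=10,
    warmup_steps=1000,
    fp16=True,
)
```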
#### All Hyperparameters

<details><summary>Click to expand</summary>

- `overwrite_output_dir`: False
- `do_predict`: False
- `eval_strategy`: epoch
- `prediction_loss_only`: True
- `per_device_train_batch_size`: 4
- `per_device_eval_batch_size`: 8
- `per_gpu_train_batch_size`: None
- `per_gpu_eval_batch_size`: None
- `gradient_accumulation_steps`: 4
- `eval_accumulation_steps`: None
- `torch_empty_cache_steps`: None
- `learning_rate`: 2e-05
- `weight_decay`: 0.0
- `adam_beta1`: 0.9
- `adam_beta2`: 0.999
- `adam_epsilon`: 1e-08
- `max_grad_norm`: 1.0
- `num_train_epochs`: 10
- `max_steps`: -1
- `lr_scheduler_type`: linear
- `lr_scheduler_kwargs`: {}
- `warmup_ratio`: 0.0
- `warmup_steps`: 1000
- `log_level`: passive
- `log_level_replica`: warning
- `log_on_each_node`: True
- `logging_nan_inf_filter`: True
- `save_safetensors`: True
- `save_on_each_node`: False
- `save_only_model`: False
- `restore_callback_states_from_checkpoint`: False
- `no_cuda`: False
- `use_cpu`: False
- `use_mps_device`: False
- `seed`: 42
- `data_seed`: None
- `jit_mode_eval`: False
- `use_ipex`: False
- `bf16`: False
- `fp16`: True
- `fp16_opt_level`: O1
- `half_precision_backend`: auto
- `bf16_full_eval`: False
- `fp16_full_eval`: False
- `tf32`: None
- `local_rank`: 0
- `ddp_backend`: None
- `tpu_num_cores`: None
- `tpu_metrics_debug`: False
- `debug`: []
- `dataloader_drop_last`: False
- `dataloader_num_workers`: 0
- `dataloader_prefetch_factor`: None
- `past_index`: -1
- `disable_tqdm`: False
- `remove_unused_columns`: True
- `label_names`: None
- `load_best_model_at_end`: False
- `ignore_data_skip`: False
- `fsdp`: []
- `fsdp_min_num_params`: 0
- `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- `fsdp_transformer_layer_cls_to_wrap`: None
- `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- `deepspeed`: None
- `label_smoothing_factor`: 0.0
- `optim`: adamw_torch
- `optim_args`: None
- `adafactor`: False
- `group_by_length`: False
- `length_column_name`: length
- `ddp_find_unused_parameters`: None
- `ddp_bucket_cap_mb`: None
- `ddp_broadcast_buffers`: False
- `dataloader_pin_memory`: True
- `dataloader_persistent_workers`: False
- `skip_memory_metrics`: True
- `use_legacy_prediction_loop`: False
- `push_to_hub`: False
- `resume_from_checkpoint`: None
- `hub_model_id`: None
- `hub_strategy`: every_save
- `hub_private_repo`: None
- `hub_always_push`: False
- `gradient_checkpointing`: False
- `gradient_checkpointing_kwargs`: None
- `include_inputs_for_metrics`: False
- `include_for_metrics`: []
- `eval_do_concat_batches`: True
- `fp16_backend`: auto
- `push_to_hub_model_id`: None
- `push_to_hub_organization`: None
- `mp_parameters`:
- `auto_find_batch_size`: False
- `full_determinism`: False
- `torchdynamo`: None
- `ray_scope`: last
- `ddp_timeout`: 1800
- `torch_compile`: False
- `torch_compile_backend`: None
- `torch_compile_mode`: None
- `dispatch_batches`: None
- `split_batches`: None
- `include_tokens_per_second`: False
- `include_num_input_tokens_seen`: False
- `neftune_noise_alpha`: None
- `optim_target_modules`: None
- `batch_eval_metrics`: False
- `eval_on_start`: False
- `use_liger_kernel`: False
- `eval_use_gather_object`: False
- `average_tokens_across_devices`: False
- `prompts`: None
- `batch_sampler`: batch_sampler
- `multi_dataset_batch_sampler`: proportional

</details>
### Training Logs

<details><summary>Click to expand</summary>

| Epoch | Step | Training Loss | Validation Loss |
|:-----:|:----:|:-------------:|:---------------:|
0.0078 | 200 | 0.634 | - |
0.0155 | 400 | 0.0046 | - |
0.0233 | 600 | 0.0009 | - |
0.0311 | 800 | 0.0004 | - |
0.0388 | 1000 | 0.0001 | - |
0.0466 | 1200 | 0.0002 | - |
0.0543 | 1400 | 0.0001 | - |
0.0621 | 1600 | 0.0001 | - |
0.0699 | 1800 | 0.0001 | - |
0.0776 | 2000 | 0.0 | - |
0.0854 | 2200 | 0.0 | - |
0.0932 | 2400 | 0.0 | - |
0.1009 | 2600 | 0.0 | - |
0.1087 | 2800 | 0.0005 | - |
0.1165 | 3000 | 0.0005 | - |
0.1242 | 3200 | 0.0002 | - |
0.1320 | 3400 | 0.0 | - |
0.1397 | 3600 | 0.0 | - |
0.1475 | 3800 | 0.0 | - |
0.1553 | 4000 | 0.0001 | - |
0.1630 | 4200 | 0.0 | - |
0.1708 | 4400 | 0.0001 | - |
0.1786 | 4600 | 0.0001 | - |
0.1863 | 4800 | 0.0 | - |
0.1941 | 5000 | 0.0 | - |
0.2019 | 5200 | 0.0 | - |
0.2096 | 5400 | 0.0 | - |
0.2174 | 5600 | 0.0 | - |
0.2251 | 5800 | 0.0 | - |
0.2329 | 6000 | 0.0004 | - |
0.2407 | 6200 | 0.0 | - |
0.2484 | 6400 | 0.0001 | - |
0.2562 | 6600 | 0.0 | - |
0.2640 | 6800 | 0.0 | - |
0.2717 | 7000 | 0.0 | - |
0.2795 | 7200 | 0.0 | - |
0.2873 | 7400 | 0.0 | - |
0.2950 | 7600 | 0.0 | - |
0.3028 | 7800 | 0.0 | - |
0.3105 | 8000 | 0.0 | - |
0.3183 | 8200 | 0.0 | - |
0.3261 | 8400 | 0.0004 | - |
0.3338 | 8600 | 0.0 | - |
0.3416 | 8800 | 0.0 | - |
0.3494 | 9000 | 0.0 | - |
0.3571 | 9200 | 0.0 | - |
0.3649 | 9400 | 0.0 | - |
0.3727 | 9600 | 0.0 | - |
0.3804 | 9800 | 0.0 | - |
0.3882 | 10000 | 0.0 | - |
0.3959 | 10200 | 0.0 | - |
0.4037 | 10400 | 0.0 | - |
0.4115 | 10600 | 0.0 | - |
0.4192 | 10800 | 0.0 | - |
0.4270 | 11000 | 0.0 | - |
0.4348 | 11200 | 0.0 | - |
0.4425 | 11400 | 0.0 | - |
0.4503 | 11600 | 0.0 | - |
0.4581 | 11800 | 0.0 | - |
0.4658 | 12000 | 0.0 | - |
0.4736 | 12200 | 0.0 | - |
0.4813 | 12400 | 0.0 | - |
0.4891 | 12600 | 0.0005 | - |
0.4969 | 12800 | 0.0 | - |
0.5046 | 13000 | 0.0 | - |
0.5124 | 13200 | 0.0001 | - |
0.5202 | 13400 | 0.0 | - |
0.5279 | 13600 | 0.0 | - |
0.5357 | 13800 | 0.0 | - |
0.5435 | 14000 | 0.0 | - |
0.5512 | 14200 | 0.0 | - |
0.5590 | 14400 | 0.0004 | - |
0.5667 | 14600 | 0.0 | - |
0.5745 | 14800 | 0.0 | - |
0.5823 | 15000 | 0.0 | - |
0.5900 | 15200 | 0.0 | - |
0.5978 | 15400 | 0.0 | - |
0.6056 | 15600 | 0.0 | - |
0.6133 | 15800 | 0.0 | - |
0.6211 | 16000 | 0.0 | - |
0.6289 | 16200 | 0.0 | - |
0.6366 | 16400 | 0.0006 | - |
0.6444 | 16600 | 0.0 | - |
0.6521 | 16800 | 0.0005 | - |
0.6599 | 17000 | 0.0 | - |
0.6677 | 17200 | 0.0 | - |
0.6754 | 17400 | 0.0 | - |
0.6832 | 17600 | 0.0 | - |
0.6910 | 17800 | 0.0 | - |
0.6987 | 18000 | 0.0005 | - |
0.7065 | 18200 | 0.0001 | - |
0.7143 | 18400 | 0.0 | - |
0.7220 | 18600 | 0.0 | - |
0.7298 | 18800 | 0.0 | - |
0.7375 | 19000 | 0.0 | - |
0.7453 | 19200 | 0.0 | - |
0.7531 | 19400 | 0.0 | - |
0.7608 | 19600 | 0.0 | - |
0.7686 | 19800 | 0.0001 | - |
0.7764 | 20000 | 0.0 | - |
0.7841 | 20200 | 0.0 | - |
0.7919 | 20400 | 0.0 | - |
0.7997 | 20600 | 0.0004 | - |
0.8074 | 20800 | 0.0 | - |
0.8152 | 21000 | 0.0 | - |
0.8229 | 21200 | 0.0 | - |
0.8307 | 21400 | 0.0009 | - |
0.8385 | 21600 | 0.0 | - |
0.8462 | 21800 | 0.0 | - |
0.8540 | 22000 | 0.0 | - |
0.8618 | 22200 | 0.0 | - |
0.8695 | 22400 | 0.0002 | - |
0.8773 | 22600 | 0.0 | - |
0.8851 | 22800 | 0.0 | - |
0.8928 | 23000 | 0.0001 | - |
0.9006 | 23200 | 0.0 | - |
0.9083 | 23400 | 0.0 | - |
0.9161 | 23600 | 0.0 | - |
0.9239 | 23800 | 0.0 | - |
0.9316 | 24000 | 0.0 | - |
0.9394 | 24200 | 0.0 | - |
0.9472 | 24400 | 0.0 | - |
0.9549 | 24600 | 0.0 | - |
0.9627 | 24800 | 0.0 | - |
0.9704 | 25000 | 0.0 | - |
0.9782 | 25200 | 0.0 | - |
0.9860 | 25400 | 0.0 | - |
0.9937 | 25600 | 0.0 | - |
1.0 | 25762 | - | 0.0001 |
1.0015 | 25800 | 0.0005 | - |
1.0092 | 26000 | 0.0 | - |
1.0170 | 26200 | 0.0 | - |
1.0248 | 26400 | 0.0 | - |
1.0325 | 26600 | 0.0 | - |
1.0403 | 26800 | 0.0 | - |
1.0481 | 27000 | 0.0 | - |
1.0558 | 27200 | 0.0 | - |
1.0636 | 27400 | 0.0 | - |
1.0713 | 27600 | 0.0 | - |
1.0791 | 27800 | 0.0 | - |
1.0869 | 28000 | 0.0 | - |
1.0946 | 28200 | 0.0 | - |
1.1024 | 28400 | 0.0 | - |
1.1102 | 28600 | 0.0 | - |
1.1179 | 28800 | 0.0 | - |
1.1257 | 29000 | 0.0 | - |
1.1335 | 29200 | 0.0 | - |
1.1412 | 29400 | 0.0 | - |
1.1490 | 29600 | 0.0 | - |
1.1567 | 29800 | 0.0 | - |
1.1645 | 30000 | 0.0 | - |
1.1723 | 30200 | 0.0 | - |
1.1800 | 30400 | 0.0 | - |
1.1878 | 30600 | 0.0 | - |
1.1956 | 30800 | 0.0 | - |
1.2033 | 31000 | 0.0 | - |
1.2111 | 31200 | 0.0 | - |
1.2189 | 31400 | 0.0 | - |
1.2266 | 31600 | 0.0004 | - |
1.2344 | 31800 | 0.0004 | - |
1.2421 | 32000 | 0.0 | - |
1.2499 | 32200 | 0.0 | - |
1.2577 | 32400 | 0.0 | - |
1.2654 | 32600 | 0.0 | - |
1.2732 | 32800 | 0.0 | - |
1.2810 | 33000 | 0.0 | - |
1.2887 | 33200 | 0.0 | - |
1.2965 | 33400 | 0.0 | - |
1.3043 | 33600 | 0.0 | - |
1.3120 | 33800 | 0.0 | - |
1.3198 | 34000 | 0.0 | - |
1.3275 | 34200 | 0.0 | - |
1.3353 | 34400 | 0.0 | - |
1.3431 | 34600 | 0.0 | - |
1.3508 | 34800 | 0.0004 | - |
1.3586 | 35000 | 0.0005 | - |
1.3664 | 35200 | 0.0004 | - |
1.3741 | 35400 | 0.0011 | - |
1.3819 | 35600 | 0.0 | - |
1.3897 | 35800 | 0.0 | - |
1.3974 | 36000 | 0.0 | - |
1.4052 | 36200 | 0.0 | - |
1.4129 | 36400 | 0.0 | - |
1.4207 | 36600 | 0.0 | - |
1.4285 | 36800 | 0.0 | - |
1.4362 | 37000 | 0.0 | - |
1.4440 | 37200 | 0.0001 | - |
1.4518 | 37400 | 0.0 | - |
1.4595 | 37600 | 0.0 | - |
1.4673 | 37800 | 0.0 | - |
1.4751 | 38000 | 0.0 | - |
1.4828 | 38200 | 0.0004 | - |
1.4906 | 38400 | 0.0003 | - |
1.4983 | 38600 | 0.0 | - |
1.5061 | 38800 | 0.0 | - |
1.5139 | 39000 | 0.0 | - |
1.5216 | 39200 | 0.0 | - |
1.5294 | 39400 | 0.0004 | - |
1.5372 | 39600 | 0.0004 | - |
1.5449 | 39800 | 0.0 | - |
1.5527 | 40000 | 0.0 | - |
1.5605 | 40200 | 0.0 | - |
1.5682 | 40400 | 0.0 | - |
1.5760 | 40600 | 0.0009 | - |
1.5837 | 40800 | 0.0 | - |
1.5915 | 41000 | 0.0009 | - |
1.5993 | 41200 | 0.0 | - |
1.6070 | 41400 | 0.0 | - |
1.6148 | 41600 | 0.0 | - |
1.6226 | 41800 | 0.0 | - |
1.6303 | 42000 | 0.0 | - |
1.6381 | 42200 | 0.0 | - |
1.6459 | 42400 | 0.0 | - |
1.6536 | 42600 | 0.0 | - |
1.6614 | 42800 | 0.0 | - |
1.6691 | 43000 | 0.0 | - |
1.6769 | 43200 | 0.0 | - |
1.6847 | 43400 | 0.0 | - |
1.6924 | 43600 | 0.0 | - |
1.7002 | 43800 | 0.0 | - |
1.7080 | 44000 | 0.0 | - |
1.7157 | 44200 | 0.0 | - |
1.7235 | 44400 | 0.0 | - |
1.7313 | 44600 | 0.0 | - |
1.7390 | 44800 | 0.0 | - |
1.7468 | 45000 | 0.0 | - |
1.7545 | 45200 | 0.0 | - |
1.7623 | 45400 | 0.0 | - |
1.7701 | 45600 | 0.0 | - |
1.7778 | 45800 | 0.0 | - |
1.7856 | 46000 | 0.0 | - |
1.7934 | 46200 | 0.0 | - |
1.8011 | 46400 | 0.0 | - |
1.8089 | 46600 | 0.0 | - |
1.8167 | 46800 | 0.0 | - |
1.8244 | 47000 | 0.0 | - |
1.8322 | 47200 | 0.0 | - |
1.8399 | 47400 | 0.0 | - |
1.8477 | 47600 | 0.0 | - |
1.8555 | 47800 | 0.0004 | - |
1.8632 | 48000 | 0.0 | - |
1.8710 | 48200 | 0.0 | - |
1.8788 | 48400 | 0.0 | - |
1.8865 | 48600 | 0.0 | - |
1.8943 | 48800 | 0.0 | - |
1.9021 | 49000 | 0.0004 | - |
1.9098 | 49200 | 0.0 | - |
1.9176 | 49400 | 0.0 | - |
1.9253 | 49600 | 0.0004 | - |
1.9331 | 49800 | 0.0 | - |
1.9409 | 50000 | 0.0 | - |
1.9486 | 50200 | 0.0 | - |
1.9564 | 50400 | 0.0 | - |
1.9642 | 50600 | 0.0004 | - |
1.9719 | 50800 | 0.0 | - |
1.9797 | 51000 | 0.0 | - |
1.9875 | 51200 | 0.0 | - |
1.9952 | 51400 | 0.0004 | - |
2.0 | 51524 | - | 0.0001 |
2.0030 | 51600 | 0.0 | - |
2.0107 | 51800 | 0.0 | - |
2.0185 | 52000 | 0.0 | - |
2.0262 | 52200 | 0.0 | - |
2.0340 | 52400 | 0.0004 | - |
2.0418 | 52600 | 0.0004 | - |
2.0495 | 52800 | 0.0 | - |
2.0573 | 53000 | 0.0008 | - |
2.0651 | 53200 | 0.0 | - |
2.0728 | 53400 | 0.0 | - |
2.0806 | 53600 | 0.0 | - |
2.0883 | 53800 | 0.0 | - |
2.0961 | 54000 | 0.0 | - |
2.1039 | 54200 | 0.0 | - |
2.1116 | 54400 | 0.0 | - |
2.1194 | 54600 | 0.0 | - |
2.1272 | 54800 | 0.0 | - |
2.1349 | 55000 | 0.0 | - |
2.1427 | 55200 | 0.0 | - |
2.1505 | 55400 | 0.0 | - |
2.1582 | 55600 | 0.0 | - |
2.1660 | 55800 | 0.0 | - |
2.1737 | 56000 | 0.0 | - |
2.1815 | 56200 | 0.0 | - |
2.1893 | 56400 | 0.0 | - |
2.1970 | 56600 | 0.0 | - |
2.2048 | 56800 | 0.0 | - |
2.2126 | 57000 | 0.0 | - |
2.2203 | 57200 | 0.0 | - |
2.2281 | 57400 | 0.0 | - |
2.2359 | 57600 | 0.0 | - |
2.2436 | 57800 | 0.0 | - |
2.2514 | 58000 | 0.0004 | - |
2.2591 | 58200 | 0.0 | - |
2.2669 | 58400 | 0.0004 | - |
2.2747 | 58600 | 0.0 | - |
2.2824 | 58800 | 0.0 | - |
2.2902 | 59000 | 0.0 | - |
2.2980 | 59200 | 0.0 | - |
2.3057 | 59400 | 0.0 | - |
2.3135 | 59600 | 0.0 | - |
2.3213 | 59800 | 0.0004 | - |
2.3290 | 60000 | 0.0 | - |
2.3368 | 60200 | 0.0004 | - |
2.3445 | 60400 | 0.0 | - |
2.3523 | 60600 | 0.0 | - |
2.3601 | 60800 | 0.0 | - |
2.3678 | 61000 | 0.0 | - |
2.3756 | 61200 | 0.0 | - |
2.3834 | 61400 | 0.0 | - |
2.3911 | 61600 | 0.0 | - |
2.3989 | 61800 | 0.0 | - |
2.4067 | 62000 | 0.0005 | - |
2.4144 | 62200 | 0.0 | - |
2.4222 | 62400 | 0.0 | - |
2.4299 | 62600 | 0.0 | - |
2.4377 | 62800 | 0.0 | - |
2.4455 | 63000 | 0.0 | - |
2.4532 | 63200 | 0.0 | - |
2.4610 | 63400 | 0.0 | - |
2.4688 | 63600 | 0.0 | - |
2.4765 | 63800 | 0.0 | - |
2.4843 | 64000 | 0.0 | - |
2.4921 | 64200 | 0.0 | - |
2.4998 | 64400 | 0.0 | - |
2.5076 | 64600 | 0.0 | - |
2.5153 | 64800 | 0.0 | - |
2.5231 | 65000 | 0.0 | - |
2.5309 | 65200 | 0.0 | - |
2.5386 | 65400 | 0.0 | - |
2.5464 | 65600 | 0.0004 | - |
2.5542 | 65800 | 0.0 | - |
2.5619 | 66000 | 0.0 | - |
2.5697 | 66200 | 0.0 | - |
2.5775 | 66400 | 0.0 | - |
2.5852 | 66600 | 0.0 | - |
2.5930 | 66800 | 0.0 | - |
2.6007 | 67000 | 0.0 | - |
2.6085 | 67200 | 0.0 | - |
2.6163 | 67400 | 0.0 | - |
2.6240 | 67600 | 0.0 | - |
2.6318 | 67800 | 0.0 | - |
2.6396 | 68000 | 0.0 | - |
2.6473 | 68200 | 0.0 | - |
2.6551 | 68400 | 0.0 | - |
2.6629 | 68600 | 0.0 | - |
2.6706 | 68800 | 0.0004 | - |
2.6784 | 69000 | 0.0 | - |
2.6861 | 69200 | 0.0 | - |
2.6939 | 69400 | 0.0 | - |
2.7017 | 69600 | 0.0004 | - |
2.7094 | 69800 | 0.0004 | - |
2.7172 | 70000 | 0.0 | - |
2.7250 | 70200 | 0.0 | - |
2.7327 | 70400 | 0.0 | - |
2.7405 | 70600 | 0.0 | - |
2.7483 | 70800 | 0.0 | - |
2.7560 | 71000 | 0.0004 | - |
2.7638 | 71200 | 0.0 | - |
2.7715 | 71400 | 0.0 | - |
2.7793 | 71600 | 0.0 | - |
2.7871 | 71800 | 0.0 | - |
2.7948 | 72000 | 0.0 | - |
2.8026 | 72200 | 0.0 | - |
2.8104 | 72400 | 0.0 | - |
2.8181 | 72600 | 0.0 | - |
2.8259 | 72800 | 0.0 | - |
2.8337 | 73000 | 0.0004 | - |
2.8414 | 73200 | 0.0 | - |
2.8492 | 73400 | 0.0 | - |
2.8569 | 73600 | 0.0 | - |
2.8647 | 73800 | 0.0004 | - |
2.8725 | 74000 | 0.0 | - |
2.8802 | 74200 | 0.0 | - |
2.8880 | 74400 | 0.0 | - |
2.8958 | 74600 | 0.0 | - |
2.9035 | 74800 | 0.0 | - |
2.9113 | 75000 | 0.0 | - |
2.9191 | 75200 | 0.0 | - |
2.9268 | 75400 | 0.0004 | - |
2.9346 | 75600 | 0.0 | - |
2.9423 | 75800 | 0.0 | - |
2.9501 | 76000 | 0.0 | - |
2.9579 | 76200 | 0.0 | - |
2.9656 | 76400 | 0.0 | - |
2.9734 | 76600 | 0.0004 | - |
2.9812 | 76800 | 0.0 | - |
2.9889 | 77000 | 0.0 | - |
2.9967 | 77200 | 0.0 | - |
3.0 | 77286 | - | 0.0000 |

</details>
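The per-epoch boundaries in the log (steps 25,762, 51,524, and 77,286) are consistent with the dataset and batch settings above: 412,178 training samples divided by the effective batch size of 16 (4 per device × 4 accumulation steps) gives 25,762 optimizer steps per epoch.

```python
import math

# Sanity check: optimizer steps per epoch implied by the hyperparameters above
steps_per_epoch = math.ceil(412_178 / (4 * 4))
print(steps_per_epoch)  # 25762, matching the epoch boundaries in the log
```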
## Framework Versions
- Python: 3.11.11
- Sentence Transformers: 3.3.1
- Transformers: 4.48.0
- PyTorch: 2.5.1+cu121
- Accelerate: 1.2.1
- Datasets: 3.2.0
- Tokenizers: 0.21.0
## Citation

### BibTeX

#### ModernBERT

```bibtex
@misc{warner2024smarterbetterfasterlonger,
    title={Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference},
    author={Benjamin Warner and Antoine Chaffin and Benjamin Clavié and Orion Weller and Oskar Hallström and Said Taghadouini and Alexis Gallagher and Raja Biswas and Faisal Ladhak and Tom Aarsen and Nathan Cooper and Griffin Adams and Jeremy Howard and Iacopo Poli},
    year={2024},
    eprint={2412.13663},
    archivePrefix={arXiv},
    primaryClass={cs.CL},
    url={https://arxiv.org/abs/2412.13663},
}
```
#### Sentence Transformers

```bibtex
@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}
```
#### MultipleNegativesRankingLoss

```bibtex
@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}
```