|
--- |
|
license: mit |
|
datasets: |
|
- Abirate/english_quotes |
|
- JeanKaddour/minipile |
|
- EleutherAI/wikitext_document_level |
|
- marksverdhei/wordnet-definitions-en-2021 |
|
language: |
|
- en |
|
metrics: |
|
- cer |
|
base_model: |
|
- ai21labs/AI21-Jamba-1.5-Mini |
|
--- |
|
Trained on 554M tokens for 1 epoch with a learning rate of 0.00987, on the following corpora (a loading sketch for the Hub-hosted sets follows the list):

- Brown Corpus
- quotes (Wikiquote, AZQuotes, Gracious Quotes, Abirate/english_quotes)
- idioms (scraped)
- definitions (WordNet)
- wikitext
- minipile
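Only some of these corpora are hosted on the Hugging Face Hub (the idiom and several quote sets were scraped, and the Brown Corpus ships with NLTK). Below is a minimal sketch of pulling the Hub-hosted sets named in the metadata with the `datasets` library; the config and split names are assumptions, not taken from the training script.

```python
from datasets import load_dataset

# Hub-hosted corpora from the model-card metadata. The scraped idiom/quote
# sets and the Brown Corpus are not on the Hub and are omitted here.
# Config and split names below are assumptions, not from the training script.
english_quotes = load_dataset("Abirate/english_quotes", split="train")
minipile = load_dataset("JeanKaddour/minipile", split="train")
wikitext = load_dataset(
    "EleutherAI/wikitext_document_level", "wikitext-103-raw-v1", split="train"
)
wordnet_defs = load_dataset("marksverdhei/wordnet-definitions-en-2021", split="train")

for ds in (english_quotes, minipile, wikitext, wordnet_defs):
    print(ds)
```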
|
|
|
Trained on RunPod for 5 days on a single RTX 3090.
|
|
|
Training code: https://gist.github.com/thistleknot/368ab298edf596ef50d2cfdcbec66fd1
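The gist above contains the actual training script. As a rough orientation only, the reported hyperparameters would map onto Hugging Face `TrainingArguments` roughly as sketched below; batch size, gradient accumulation, precision, and save settings are placeholder assumptions, not values from the gist.

```python
from transformers import TrainingArguments

# Sketch only: learning rate and epoch count come from the model card;
# every other value is an assumed placeholder, not taken from the gist.
training_args = TrainingArguments(
    output_dir="mamba_brown_trained",
    num_train_epochs=1,
    learning_rate=0.00987,
    per_device_train_batch_size=1,   # assumed; sized for a single RTX 3090
    gradient_accumulation_steps=8,   # assumed
    bf16=True,                       # assumed (Ampere supports bfloat16)
    logging_steps=50,
    save_strategy="epoch",
)
```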
|
|
|
```python
|
from transformers import AutoTokenizer, AutoModelForCausalLM
|
|
|
# Specify the path to the directory where the model is stored |
|
#model_dir = r"C:\Users\User\Documents\wiki\wiki\data science\nlp\research\mamba_brown_trained_556m\mamba_brown_trained\mamba_brown_trained" |
|
model_dir = "/home/user/mamba_brown_trained" |
|
|
|
# Load the tokenizer and model (use a causal language model for text generation)
|
tokenizer = AutoTokenizer.from_pretrained(model_dir) |
|
model = AutoModelForCausalLM.from_pretrained(model_dir) |
|
model.to('cuda') |
|
|
|
# Now, you can use the model and tokenizer for inference |
|
input_text = "Once upon a time" |
|
|
|
# Tokenize the input |
|
inputs = tokenizer(input_text, return_tensors="pt").to('cuda') |
|
|
|
# Generate output tokens using the model |
|
output_ids = model.generate(**inputs, max_length=50) |
|
|
|
# Decode the generated token IDs back into text |
|
decoded_output = tokenizer.decode(output_ids[0], skip_special_tokens=True) |
|
|
|
# Print the generated output text |
|
print(decoded_output) |
|
``` |
|
|
|
Output: "Once upon a time, the world is changing."
|
|
|
|
|
```python
|
# Now, you can use the model and tokenizer for inference |
|
input_text = "The Fulton County Grand Fair was set for Friday at" |
|
inputs = tokenizer(input_text, return_tensors="pt").to('cuda') |
|
|
|
# Generate output tokens using the model with repetition controls |
|
output_ids = model.generate( |
|
**inputs, |
|
max_length=256, # Max tokens to generate |
|
repetition_penalty=1.2, # Penalize repeated words |
|
no_repeat_ngram_size=3, # Prevent 3-gram repetitions |
|
do_sample=True,          # Enable sampling so temperature/top_k/top_p take effect
temperature=0.9,         # Adjust randomness (lower means more deterministic)
|
top_k=50, # Only sample from top 50 tokens |
|
top_p=0.9 # Use nucleus sampling to control diversity |
|
) |
|
|
|
# Decode the generated token IDs back into text |
|
decoded_output = tokenizer.decode(output_ids[0], skip_special_tokens=True) |
|
|
|
# Print the generated output text |
|
print(decoded_output) |
|
``` |
|
![image/png](https://cdn-uploads.huggingface.co/production/uploads/62578ad28c6638f8a93e8856/dpDosrj8gUt2puqx5TLt_.png) |
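The metadata lists character error rate (CER) as the evaluation metric. Below is a minimal sketch of scoring generations with the `evaluate` library's `cer` metric (which requires `jiwer`); the strings are placeholders, not actual evaluation data.

```python
import evaluate

# Character error rate via the `evaluate` library (backed by jiwer).
# Placeholder strings; real references would come from a held-out split.
cer = evaluate.load("cer")
predictions = ["Once upon a time, the world is changing."]
references = ["Once upon a time, the world was changing."]
print(cer.compute(predictions=predictions, references=references))
```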
|
|