Upload 11 files
- Extended-Keyword-dataset 2.csv +61 -0
- Extended-Keyword-dataset.csv +61 -0
- generation_config.json +6 -0
- handler.py +33 -0
- model.safetensors.index.json +202 -0
- modeling_falcon.py +1242 -0
- special_tokens_map.json +16 -0
- tokenizer.json +0 -0
- tokenizer_config.json +12 -0
Extended-Keyword-dataset 2.csv
ADDED
@@ -0,0 +1,61 @@
Inputs;Keywords
5-15 keyword for the following paragraph to summarise the whole academic scope: The integration of blockchain technology in supply chain management offers transparency and traceability. Companies can track products in real-time, reducing fraud and ensuring authenticity. This innovation also streamlines operations by automating transactions and improving data accuracy.;blockchain, supply chain management, transparency, traceability, real-time tracking, fraud reduction, authenticity, automated transactions, data accuracy
5-15 keyword for the following paragraph to summarise the whole academic scope: Artificial intelligence and machine learning are transforming the healthcare industry. AI-powered diagnostic tools are enhancing accuracy and speed in disease detection, while machine learning models predict patient outcomes and personalize treatment plans. This shift towards digital health is also driving innovations in telemedicine and remote monitoring.;AI, machine learning, healthcare, diagnostic tools, disease detection, patient outcomes, personalized treatment, digital health, telemedicine, remote monitoring
5-15 keyword for the following paragraph to summarise the whole academic scope: Renewable energy sources such as solar, wind, and geothermal are critical in combating climate change. These technologies reduce reliance on fossil fuels, decrease greenhouse gas emissions, and promote sustainable development. Governments and organizations worldwide are investing heavily in renewable energy infrastructure to ensure a greener future.;renewable energy, solar power, wind power, geothermal energy, climate change, fossil fuels, greenhouse gas emissions, sustainable development, energy infrastructure
5-15 keyword for the following paragraph to summarise the whole academic scope: the realm of education, online learning platforms have revolutionized access to knowledge. E-learning tools offer flexibility and personalized learning experiences, catering to diverse needs of students. This digital transformation in education is bridging gaps, providing opportunities for lifelong learning, and fostering global collaboration.;online learning, education, e-learning tools, flexibility, personalized learning, digital transformation, lifelong learning, global collaboration
5-15 keyword for the following paragraph to summarise the whole academic scope: The exploration of Mars has gained momentum with recent advancements in space technology. Robotic missions have successfully landed on the Martian surface, collecting valuable data about its geology and potential for life. These endeavors are laying the groundwork for future human exploration and potential colonization of Mars.;Mars exploration, space technology, robotic missions, Martian geology, potential for life, human exploration, Mars colonization
5-15 keyword for the following paragraph to summarise the whole academic scope: Cybersecurity remains a top priority as digital threats evolve. Organizations are investing in advanced security measures to protect sensitive data from cyber-attacks. Innovations in cryptography and AI-based threat detection are crucial in safeguarding information and maintaining privacy in an increasingly connected world.;cybersecurity, digital threats, advanced security, sensitive data, cyber-attacks, cryptography, AI-based threat detection, information safeguarding, privacy
5-15 keyword for the following paragraph to summarise the whole academic scope: The field of biotechnology is witnessing rapid growth with breakthroughs in genetic research. Techniques like CRISPR and gene therapy are opening new avenues for treating genetic disorders. These advancements are not only revolutionizing medicine but also have significant implications for agriculture and environmental conservation.;biotechnology, genetic research, CRISPR, gene therapy, genetic disorders, medical advancements, agriculture, environmental conservation
5-15 keyword for the following paragraph to summarise the whole academic scope: Urbanization and smart city initiatives are shaping the future of urban living. Technologies such as IoT, AI, and big data are enhancing urban infrastructure, improving resource management, and providing better services to citizens. Smart cities aim to create sustainable and efficient urban environments for the growing population.;urbanization, smart cities, IoT, AI, big data, urban infrastructure, resource management, citizen services, sustainable urban environments
5-15 keyword for the following paragraph to summarise the whole academic scope: The impact of social media on communication and society is profound. Platforms like Facebook, Twitter, and Instagram have changed the way people interact, share information, and form communities. While social media fosters connectivity, it also raises concerns about privacy, misinformation, and mental health.;social media, communication, society, Facebook, Twitter, Instagram, interaction, information sharing, online communities, privacy, misinformation, mental health
5-15 keyword for the following paragraph to summarise the whole academic scope: Advances in quantum physics are paving the way for quantum computing. Quantum computers leverage the principles of quantum mechanics to perform complex calculations at unprecedented speeds. This technology has the potential to revolutionize fields such as cryptography, materials science, and artificial intelligence.;quantum physics, quantum computing, quantum mechanics, complex calculations, cryptography, materials science, artificial intelligence
5-15 keyword for the following paragraph to summarise the whole academic scope: The world around us is evolving with the new age of artificial intelligence, and this has changed the way we think and work. Jobs are getting created and destroyed at a faster rate.;social, internet age, AI, AI impact, job market
5-15 keyword for the following paragraph to summarise the whole academic scope: Quantum computing promises to revolutionize numerous fields by providing unprecedented computational power, potentially solving problems deemed unsolvable by classical computers.;quantum computing, computational power, revolutionize, classical computers, unsolvable problems
5-15 keyword for the following paragraph to summarise the whole academic scope: Climate change is an urgent global issue, requiring immediate and sustained efforts to reduce greenhouse gas emissions and mitigate the impact on ecosystems and human societies.;climate change, global issue, greenhouse gases, mitigation, ecosystems, human societies
5-15 keyword for the following paragraph to summarise the whole academic scope: The discovery of CRISPR-Cas9 has opened new frontiers in genetic engineering, allowing precise editing of DNA and presenting opportunities for advancements in medicine and agriculture.;CRISPR-Cas9, genetic engineering, DNA editing, medical advancements, agricultural advancements
5-15 keyword for the following paragraph to summarise the whole academic scope: Machine learning algorithms are increasingly being used in finance to predict market trends, optimize investment strategies, and enhance risk management practices.;machine learning, finance, market prediction, investment strategies, risk management
5-15 keyword for the following paragraph to summarise the whole academic scope: The study of dark matter and dark energy is crucial in understanding the composition and expansion of the universe, as they constitute the majority of its mass-energy content.;dark matter, dark energy, universe composition, expansion, mass-energy content
5-15 keyword for the following paragraph to summarise the whole academic scope: The field of renewable energy technologies, including solar, wind, and hydropower, is essential for sustainable development and reducing dependence on fossil fuels.;renewable energy, solar power, wind power, hydropower, sustainable development, fossil fuels
5-15 keyword for the following paragraph to summarise the whole academic scope: Advancements in neuroscience are shedding light on brain function, offering insights into neurological disorders and potential therapeutic interventions.;neuroscience, brain function, neurological disorders, therapeutic interventions
5-15 keyword for the following paragraph to summarise the whole academic scope: The advent of 5G technology promises to enhance connectivity and enable a range of new applications, from smart cities to autonomous vehicles, revolutionizing daily life.;5G technology, connectivity, smart cities, autonomous vehicles, daily life revolution
5-15 keyword for the following paragraph to summarise the whole academic scope: Big data analytics is transforming healthcare by providing insights from large datasets, improving patient outcomes, and enabling personalized medicine.;big data analytics, healthcare, large datasets, patient outcomes, personalized medicine
5-15 keyword for the following paragraph to summarise the whole academic scope: The advancement of electric vehicles (EVs) is a significant step towards sustainable transportation. EVs reduce greenhouse gas emissions and reliance on fossil fuels. Innovations in battery technology and charging infrastructure are crucial to the widespread adoption of EVs.;electric vehicles, sustainable transportation, greenhouse gas emissions, fossil fuels, battery technology, charging infrastructure
5-15 keyword for the following paragraph to summarise the whole academic scope: The study of human microbiomes is revolutionizing our understanding of health and disease. Microbiome research uncovers the complex interactions between microbes and their human hosts, leading to new insights into conditions such as obesity, diabetes, and mental health disorders.;human microbiomes, health, disease, microbiome research, microbes, human hosts, obesity, diabetes, mental health disorders
5-15 keyword for the following paragraph to summarise the whole academic scope: Artificial intelligence in agriculture is enhancing crop management and productivity. AI-driven technologies like precision farming and predictive analytics help farmers optimize resource use, increase yields, and improve sustainability.;AI, agriculture, crop management, productivity, precision farming, predictive analytics, resource optimization, yields, sustainability
5-15 keyword for the following paragraph to summarise the whole academic scope: The field of nanotechnology is opening new frontiers in medicine. Nanoscale materials and devices are being developed for targeted drug delivery, improved imaging, and early disease detection. These innovations hold promise for more effective and less invasive medical treatments.;nanotechnology, medicine, nanoscale materials, targeted drug delivery, imaging, disease detection, medical treatments
5-15 keyword for the following paragraph to summarise the whole academic scope: Advancements in renewable energy storage systems, such as batteries and supercapacitors, are addressing the intermittency of solar and wind power. Improved energy storage solutions are key to integrating renewable energy into the grid and ensuring a stable power supply.;renewable energy storage, batteries, supercapacitors, solar power, wind power, energy storage solutions, power grid, stable power supply
5-15 keyword for the following paragraph to summarise the whole academic scope: The application of machine learning in natural language processing (NLP) is transforming how computers understand and generate human language. NLP technologies are used in various applications, including translation, sentiment analysis, and chatbots, improving human-computer interaction.;machine learning, natural language processing, NLP, human language, translation, sentiment analysis, chatbots, human-computer interaction
5-15 keyword for the following paragraph to summarise the whole academic scope: CRISPR technology continues to make headlines with its potential to edit genes with precision. This groundbreaking technology is being explored for treating genetic disorders, enhancing crop resilience, and even combating climate change by altering the genes of carbon-sequestering plants.;CRISPR, gene editing, genetic disorders, crop resilience, climate change, carbon-sequestering plants
5-15 keyword for the following paragraph to summarise the whole academic scope: The proliferation of Internet of Things (IoT) devices is reshaping various industries. IoT technology enables the collection and analysis of vast amounts of data, leading to smarter homes, efficient industrial processes, and improved healthcare outcomes.;Internet of Things, IoT, data collection, data analysis, smart homes, industrial processes, healthcare outcomes
5-15 keyword for the following paragraph to summarise the whole academic scope: The exploration of the deep sea is revealing new species and ecosystems. Advanced submersibles and remote-operated vehicles (ROVs) allow scientists to study previously inaccessible ocean depths, contributing to our knowledge of marine biology and environmental conservation.;deep sea exploration, new species, ecosystems, submersibles, remote-operated vehicles, ROVs, marine biology, environmental conservation
5-15 keyword for the following paragraph to summarise the whole academic scope: Blockchain technology is being adopted in the financial sector to enhance security and transparency. Cryptocurrencies, smart contracts, and decentralized finance (DeFi) platforms are revolutionizing the way financial transactions are conducted and recorded.;blockchain, financial sector, security, transparency, cryptocurrencies, smart contracts, decentralized finance, DeFi, financial transactions
5-15 keyword for the following paragraph to summarise the whole academic scope: Cultural anthropology studies human societies and cultures, examining customs, social structures, and cultural evolution.;cultural anthropology, human societies, cultures, customs, social structures, cultural evolution
5-15 keyword for the following paragraph to summarise the whole academic scope: Shakespeare's plays are renowned for their exploration of themes such as ambition, betrayal, and the human condition, making them enduring classics in literature.;Shakespeare, plays, ambition, betrayal, human condition, literature classics
5-15 keyword for the following paragraph to summarise the whole academic scope: Ecological restoration involves rehabilitating ecosystems that have been degraded, damaged, or destroyed, aiming to return them to a healthy and functional state.;ecological restoration, ecosystem rehabilitation, degradation, damage, functional state
5-15 keyword for the following paragraph to summarise the whole academic scope: Behavioral psychology focuses on the study of observable behavior and its modification through conditioning, reinforcement, and learning processes.;behavioral psychology, observable behavior, conditioning, reinforcement, learning processes
5-15 keyword for the following paragraph to summarise the whole academic scope: Historical archaeology examines the material remains of past societies to understand historical developments and cultural changes.;historical archaeology, material remains, past societies, historical developments, cultural changes
5-15 keyword for the following paragraph to summarise the whole academic scope: Renewable agriculture promotes sustainable farming practices that reduce environmental impact and improve soil health through crop rotation, organic farming, and conservation tillage.;renewable agriculture, sustainable farming, environmental impact, soil health, crop rotation, organic farming, conservation tillage
5-15 keyword for the following paragraph to summarise the whole academic scope: Public health policy addresses the planning and implementation of measures to improve community health, including vaccination programs and health education.;public health policy, community health, vaccination programs, health education, planning, implementation
5-15 keyword for the following paragraph to summarise the whole academic scope: Medieval history explores the social, political, and cultural aspects of the Middle Ages, including feudalism, the Crusades, and the Black Death.;medieval history, Middle Ages, feudalism, Crusades, Black Death, social aspects, political aspects, cultural aspects
5-15 keyword for the following paragraph to summarise the whole academic scope: Quantum mechanics is a fundamental theory in physics that describes the behavior of particles at the atomic and subatomic levels.;quantum mechanics, fundamental theory, physics, particle behavior, atomic level, subatomic level
5-15 keyword for the following paragraph to summarise the whole academic scope: Sociolinguistics examines how language use varies and changes in social contexts, including factors like region, class, and gender.;sociolinguistics, language use, social contexts, region, class, gender, language variation
5-15 keyword for the following paragraph to summarise the whole academic scope: Hydraulic engineering involves the design and management of water resources, including flood control, irrigation systems, and hydropower generation.;hydraulic engineering, water resources, flood control, irrigation systems, hydropower generation
5-15 keyword for the following paragraph to summarise the whole academic scope: The Renaissance period was marked by a revival of art, literature, and learning in Europe, significantly influenced by figures like Leonardo da Vinci and Michelangelo.;Renaissance period, art revival, literature, learning, Europe, Leonardo da Vinci, Michelangelo
5-15 keyword for the following paragraph to summarise the whole academic scope: Environmental ethics considers the moral relationship between humans and the natural world, focusing on issues like conservation, biodiversity, and sustainability.;environmental ethics, moral relationship, natural world, conservation, biodiversity, sustainability
5-15 keyword for the following paragraph to summarise the whole academic scope: Game theory studies strategic interactions where the outcome for each participant depends on the actions of others, with applications in economics, politics, and biology.;game theory, strategic interactions, outcome dependence, economics, politics, biology
5-15 keyword for the following paragraph to summarise the whole academic scope: Paleontology investigates the history of life on Earth through the study of fossils, providing insights into ancient ecosystems and evolutionary processes.;paleontology, history of life, fossils, ancient ecosystems, evolutionary processes
5-15 keyword for the following paragraph to summarise the whole academic scope: Neuropsychology explores the relationship between brain function and behavior, aiding in the diagnosis and treatment of neurological and psychological disorders.;neuropsychology, brain function, behavior, diagnosis, treatment, neurological disorders, psychological disorders
5-15 keyword for the following paragraph to summarise the whole academic scope: Classical music, spanning from the Baroque to the Romantic period, includes works by composers like Bach, Mozart, and Beethoven, known for their complex compositions and enduring influence.;classical music, Baroque period, Romantic period, Bach, Mozart, Beethoven, complex compositions, enduring influence
5-15 keyword for the following paragraph to summarise the whole academic scope: Urban sociology studies the social structures, interactions, and experiences within cities, focusing on issues like urbanization, segregation, and community dynamics.;urban sociology, social structures, interactions, cities, urbanization, segregation, community dynamics
5-15 keyword for the following paragraph to summarise the whole academic scope: Renewable natural resources management involves sustainable practices in utilizing forests, fisheries, and wildlife to ensure their availability for future generations.;renewable natural resources, sustainable practices, forests, fisheries, wildlife, availability, future generations
5-15 keyword for the following paragraph to summarise the whole academic scope: Bioinformatics applies computational methods to analyze and interpret biological data, advancing research in areas like genomics and proteomics.;bioinformatics, computational methods, biological data analysis, genomics, proteomics
5-15 keyword for the following paragraph to summarise the whole academic scope: Social media studies examine the impact of platforms like Facebook, Twitter, and Instagram on communication, social behavior, and public opinion.;social media studies, platform impact, communication, social behavior, public opinion, Facebook, Twitter, Instagram
5-15 keyword for the following paragraph to summarise the whole academic scope: Modernist literature, emerging in the late 19th and early 20th centuries, is characterized by a break with traditional forms and a focus on stream-of-consciousness narrative and fragmented structure.;modernist literature, traditional forms, stream-of-consciousness, narrative, fragmented structure, 19th century, 20th century
5-15 keyword for the following paragraph to summarise the whole academic scope: Human rights law focuses on protecting individual freedoms and ensuring justice, addressing issues like discrimination, freedom of speech, and fair trial.;human rights law, individual freedoms, justice, discrimination, freedom of speech, fair trial
5-15 keyword for the following paragraph to summarise the whole academic scope: Biomedical engineering integrates principles from biology and engineering to develop technologies for healthcare, including medical devices and diagnostic tools.;biomedical engineering, biology, engineering, healthcare technologies, medical devices, diagnostic tools
5-15 keyword for the following paragraph to summarise the whole academic scope: Political philosophy explores concepts like justice, power, and the role of government, contributing to the development of political systems and ideologies.;political philosophy, justice, power, government role, political systems, ideologies
5-15 keyword for the following paragraph to summarise the whole academic scope: Renewable energy policy involves the formulation of regulations and incentives to promote the adoption of renewable energy sources and reduce reliance on fossil fuels.;renewable energy policy, regulations, incentives, renewable energy adoption, fossil fuels reduction
5-15 keyword for the following paragraph to summarise the whole academic scope: Environmental chemistry studies the chemical processes that occur in natural environments, including the effects of pollutants and the behavior of chemical compounds in ecosystems.;environmental chemistry, chemical processes, natural environments, pollutants, chemical compounds, ecosystems
5-15 keyword for the following paragraph to summarise the whole academic scope: Artificial intelligence ethics addresses the moral implications of AI development and deployment, including issues like bias, accountability, and societal impact.;AI ethics, moral implications, AI development, deployment, bias, accountability, societal impact
5-15 keyword for the following paragraph to summarise the whole academic scope: Architectural engineering focuses on the design and construction of buildings, integrating principles of structural engineering, environmental systems, and aesthetic considerations.;architectural engineering, building design, construction, structural engineering, environmental systems, aesthetic considerations
5-15 keyword for the following paragraph to summarise the whole academic scope: Marine conservation aims to protect and restore ocean ecosystems, addressing threats like overfishing, pollution, and habitat destruction.;marine conservation, ocean ecosystems, protection, restoration, overfishing, pollution, habitat destruction
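The file above is semicolon-delimited with unquoted fields (the free-text commas in both columns are safe because `;` never appears inside a field). A minimal, self-contained sketch of parsing this dialect with Python's standard `csv` module, using one row copied from the data above in place of the actual file:

```python
import csv
import io

# One header row and one data row copied verbatim from the semicolon-delimited
# file above; a real script would open "Extended-Keyword-dataset 2.csv" instead.
sample = (
    "Inputs;Keywords\n"
    "5-15 keyword for the following paragraph to summarise the whole academic scope: "
    "Quantum computing promises to revolutionize numerous fields by providing unprecedented "
    "computational power, potentially solving problems deemed unsolvable by classical computers."
    ";quantum computing, computational power, revolutionize, classical computers, unsolvable problems\n"
)

# delimiter=";" is the key difference from the default comma dialect.
rows = list(csv.reader(io.StringIO(sample), delimiter=";"))
header, data = rows[0], rows[1:]
keywords = [k.strip() for k in data[0][1].split(",")]
```

Each parsed row yields exactly two fields, with the keyword field still needing a comma split into individual labels.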
Extended-Keyword-dataset.csv
ADDED
@@ -0,0 +1,61 @@
Inputs,Keywords
"5-15 keyword for the following paragraph to summarise the whole academic scope: The integration of blockchain technology in supply chain management offers transparency and traceability. Companies can track products in real-time, reducing fraud and ensuring authenticity. This innovation also streamlines operations by automating transactions and improving data accuracy.","blockchain, supply chain management, transparency, traceability, real-time tracking, fraud reduction, authenticity, automated transactions, data accuracy"
"5-15 keyword for the following paragraph to summarise the whole academic scope: Artificial intelligence and machine learning are transforming the healthcare industry. AI-powered diagnostic tools are enhancing accuracy and speed in disease detection, while machine learning models predict patient outcomes and personalize treatment plans. This shift towards digital health is also driving innovations in telemedicine and remote monitoring.","AI, machine learning, healthcare, diagnostic tools, disease detection, patient outcomes, personalized treatment, digital health, telemedicine, remote monitoring"
"5-15 keyword for the following paragraph to summarise the whole academic scope: Renewable energy sources such as solar, wind, and geothermal are critical in combating climate change. These technologies reduce reliance on fossil fuels, decrease greenhouse gas emissions, and promote sustainable development. Governments and organizations worldwide are investing heavily in renewable energy infrastructure to ensure a greener future.","renewable energy, solar power, wind power, geothermal energy, climate change, fossil fuels, greenhouse gas emissions, sustainable development, energy infrastructure"
"5-15 keyword for the following paragraph to summarise the whole academic scope: the realm of education, online learning platforms have revolutionized access to knowledge. E-learning tools offer flexibility and personalized learning experiences, catering to diverse needs of students. This digital transformation in education is bridging gaps, providing opportunities for lifelong learning, and fostering global collaboration.","online learning, education, e-learning tools, flexibility, personalized learning, digital transformation, lifelong learning, global collaboration"
"5-15 keyword for the following paragraph to summarise the whole academic scope: The exploration of Mars has gained momentum with recent advancements in space technology. Robotic missions have successfully landed on the Martian surface, collecting valuable data about its geology and potential for life. These endeavors are laying the groundwork for future human exploration and potential colonization of Mars.","Mars exploration, space technology, robotic missions, Martian geology, potential for life, human exploration, Mars colonization"
5-15 keyword for the following paragraph to summarise the whole academic scope: Cybersecurity remains a top priority as digital threats evolve. Organizations are investing in advanced security measures to protect sensitive data from cyber-attacks. Innovations in cryptography and AI-based threat detection are crucial in safeguarding information and maintaining privacy in an increasingly connected world.,"cybersecurity, digital threats, advanced security, sensitive data, cyber-attacks, cryptography, AI-based threat detection, information safeguarding, privacy"
5-15 keyword for the following paragraph to summarise the whole academic scope: The field of biotechnology is witnessing rapid growth with breakthroughs in genetic research. Techniques like CRISPR and gene therapy are opening new avenues for treating genetic disorders. These advancements are not only revolutionizing medicine but also have significant implications for agriculture and environmental conservation.,"biotechnology, genetic research, CRISPR, gene therapy, genetic disorders, medical advancements, agriculture, environmental conservation"
"5-15 keyword for the following paragraph to summarise the whole academic scope: Urbanization and smart city initiatives are shaping the future of urban living. Technologies such as IoT, AI, and big data are enhancing urban infrastructure, improving resource management, and providing better services to citizens. Smart cities aim to create sustainable and efficient urban environments for the growing population.","urbanization, smart cities, IoT, AI, big data, urban infrastructure, resource management, citizen services, sustainable urban environments"
"5-15 keyword for the following paragraph to summarise the whole academic scope: The impact of social media on communication and society is profound. Platforms like Facebook, Twitter, and Instagram have changed the way people interact, share information, and form communities. While social media fosters connectivity, it also raises concerns about privacy, misinformation, and mental health.","social media, communication, society, Facebook, Twitter, Instagram, interaction, information sharing, online communities, privacy, misinformation, mental health"
"5-15 keyword for the following paragraph to summarise the whole academic scope: Advances in quantum physics are paving the way for quantum computing. Quantum computers leverage the principles of quantum mechanics to perform complex calculations at unprecedented speeds. This technology has the potential to revolutionize fields such as cryptography, materials science, and artificial intelligence.","quantum physics, quantum computing, quantum mechanics, complex calculations, cryptography, materials science, artificial intelligence"
"5-15 keyword for the following paragraph to summarise the whole academic scope: The world around us is evolving with the new age of artificial intelligence, and this has changed the way we think and work. Jobs are getting created and destroyed at a faster rate.","social, internet age, AI, AI impact, job market"
"5-15 keyword for the following paragraph to summarise the whole academic scope: Quantum computing promises to revolutionize numerous fields by providing unprecedented computational power, potentially solving problems deemed unsolvable by classical computers.","quantum computing, computational power, revolutionize, classical computers, unsolvable problems"
"5-15 keyword for the following paragraph to summarise the whole academic scope: Climate change is an urgent global issue, requiring immediate and sustained efforts to reduce greenhouse gas emissions and mitigate the impact on ecosystems and human societies.","climate change, global issue, greenhouse gases, mitigation, ecosystems, human societies"
"5-15 keyword for the following paragraph to summarise the whole academic scope: The discovery of CRISPR-Cas9 has opened new frontiers in genetic engineering, allowing precise editing of DNA and presenting opportunities for advancements in medicine and agriculture.","CRISPR-Cas9, genetic engineering, DNA editing, medical advancements, agricultural advancements"
"5-15 keyword for the following paragraph to summarise the whole academic scope: Machine learning algorithms are increasingly being used in finance to predict market trends, optimize investment strategies, and enhance risk management practices.","machine learning, finance, market prediction, investment strategies, risk management"
"5-15 keyword for the following paragraph to summarise the whole academic scope: The study of dark matter and dark energy is crucial in understanding the composition and expansion of the universe, as they constitute the majority of its mass-energy content.","dark matter, dark energy, universe composition, expansion, mass-energy content"
"5-15 keyword for the following paragraph to summarise the whole academic scope: The field of renewable energy technologies, including solar, wind, and hydropower, is essential for sustainable development and reducing dependence on fossil fuels.","renewable energy, solar power, wind power, hydropower, sustainable development, fossil fuels"
"5-15 keyword for the following paragraph to summarise the whole academic scope: Advancements in neuroscience are shedding light on brain function, offering insights into neurological disorders and potential therapeutic interventions.","neuroscience, brain function, neurological disorders, therapeutic interventions"
"5-15 keyword for the following paragraph to summarise the whole academic scope: The advent of 5G technology promises to enhance connectivity and enable a range of new applications, from smart cities to autonomous vehicles, revolutionizing daily life.","5G technology, connectivity, smart cities, autonomous vehicles, daily life revolution"
"5-15 keyword for the following paragraph to summarise the whole academic scope: Big data analytics is transforming healthcare by providing insights from large datasets, improving patient outcomes, and enabling personalized medicine.","big data analytics, healthcare, large datasets, patient outcomes, personalized medicine"
"5-15 keyword for the following paragraph to summarise the whole academic scope: The advancement of electric vehicles (EVs) is a significant step towards sustainable transportation. EVs reduce greenhouse gas emissions and reliance on fossil fuels. Innovations in battery technology and charging infrastructure are crucial to the widespread adoption of EVs.","electric vehicles, sustainable transportation, greenhouse gas emissions, fossil fuels, battery technology, charging infrastructure"
"5-15 keyword for the following paragraph to summarise the whole academic scope: The study of human microbiomes is revolutionizing our understanding of health and disease. Microbiome research uncovers the complex interactions between microbes and their human hosts, leading to new insights into conditions such as obesity, diabetes, and mental health disorders.","human microbiomes, health, disease, microbiome research, microbes, human hosts, obesity, diabetes, mental health disorders"
"5-15 keyword for the following paragraph to summarise the whole academic scope: Artificial intelligence in agriculture is enhancing crop management and productivity. AI-driven technologies like precision farming and predictive analytics help farmers optimize resource use, increase yields, and improve sustainability.","AI, agriculture, crop management, productivity, precision farming, predictive analytics, resource optimization, yields, sustainability"
"5-15 keyword for the following paragraph to summarise the whole academic scope: The field of nanotechnology is opening new frontiers in medicine. Nanoscale materials and devices are being developed for targeted drug delivery, improved imaging, and early disease detection. These innovations hold promise for more effective and less invasive medical treatments.","nanotechnology, medicine, nanoscale materials, targeted drug delivery, imaging, disease detection, medical treatments"
"5-15 keyword for the following paragraph to summarise the whole academic scope: Advancements in renewable energy storage systems, such as batteries and supercapacitors, are addressing the intermittency of solar and wind power. Improved energy storage solutions are key to integrating renewable energy into the grid and ensuring a stable power supply.","renewable energy storage, batteries, supercapacitors, solar power, wind power, energy storage solutions, power grid, stable power supply"
"5-15 keyword for the following paragraph to summarise the whole academic scope: The application of machine learning in natural language processing (NLP) is transforming how computers understand and generate human language. NLP technologies are used in various applications, including translation, sentiment analysis, and chatbots, improving human-computer interaction.","machine learning, natural language processing, NLP, human language, translation, sentiment analysis, chatbots, human-computer interaction"
"5-15 keyword for the following paragraph to summarise the whole academic scope: CRISPR technology continues to make headlines with its potential to edit genes with precision. This groundbreaking technology is being explored for treating genetic disorders, enhancing crop resilience, and even combating climate change by altering the genes of carbon-sequestering plants.","CRISPR, gene editing, genetic disorders, crop resilience, climate change, carbon-sequestering plants"
"5-15 keyword for the following paragraph to summarise the whole academic scope: The proliferation of Internet of Things (IoT) devices is reshaping various industries. IoT technology enables the collection and analysis of vast amounts of data, leading to smarter homes, efficient industrial processes, and improved healthcare outcomes.","Internet of Things, IoT, data collection, data analysis, smart homes, industrial processes, healthcare outcomes"
"5-15 keyword for the following paragraph to summarise the whole academic scope: The exploration of the deep sea is revealing new species and ecosystems. Advanced submersibles and remote-operated vehicles (ROVs) allow scientists to study previously inaccessible ocean depths, contributing to our knowledge of marine biology and environmental conservation.","deep sea exploration, new species, ecosystems, submersibles, remote-operated vehicles, ROVs, marine biology, environmental conservation"
"5-15 keyword for the following paragraph to summarise the whole academic scope: Blockchain technology is being adopted in the financial sector to enhance security and transparency. Cryptocurrencies, smart contracts, and decentralized finance (DeFi) platforms are revolutionizing the way financial transactions are conducted and recorded.","blockchain, financial sector, security, transparency, cryptocurrencies, smart contracts, decentralized finance, DeFi, financial transactions"
"Cultural anthropology studies human societies and cultures, examining customs, social structures, and cultural evolution.","cultural anthropology, human societies, cultures, customs, social structures, cultural evolution"
"Shakespeare's plays are renowned for their exploration of themes such as ambition, betrayal, and the human condition, making them enduring classics in literature.","Shakespeare, plays, ambition, betrayal, human condition, literature classics"
"Ecological restoration involves rehabilitating ecosystems that have been degraded, damaged, or destroyed, aiming to return them to a healthy and functional state.","ecological restoration, ecosystem rehabilitation, degradation, damage, functional state"
"Behavioral psychology focuses on the study of observable behavior and its modification through conditioning, reinforcement, and learning processes.","behavioral psychology, observable behavior, conditioning, reinforcement, learning processes"
"Historical archaeology examines the material remains of past societies to understand historical developments and cultural changes.","historical archaeology, material remains, past societies, historical developments, cultural changes"
"Renewable agriculture promotes sustainable farming practices that reduce environmental impact and improve soil health through crop rotation, organic farming, and conservation tillage.","renewable agriculture, sustainable farming, environmental impact, soil health, crop rotation, organic farming, conservation tillage"
"Public health policy addresses the planning and implementation of measures to improve community health, including vaccination programs and health education.","public health policy, community health, vaccination programs, health education, planning, implementation"
"Medieval history explores the social, political, and cultural aspects of the Middle Ages, including feudalism, the Crusades, and the Black Death.","medieval history, Middle Ages, feudalism, Crusades, Black Death, social aspects, political aspects, cultural aspects"
"Quantum mechanics is a fundamental theory in physics that describes the behavior of particles at the atomic and subatomic levels.","quantum mechanics, fundamental theory, physics, particle behavior, atomic level, subatomic level"
"Sociolinguistics examines how language use varies and changes in social contexts, including factors like region, class, and gender.","sociolinguistics, language use, social contexts, region, class, gender, language variation"
"Hydraulic engineering involves the design and management of water resources, including flood control, irrigation systems, and hydropower generation.","hydraulic engineering, water resources, flood control, irrigation systems, hydropower generation"
"The Renaissance period was marked by a revival of art, literature, and learning in Europe, significantly influenced by figures like Leonardo da Vinci and Michelangelo.","Renaissance period, art revival, literature, learning, Europe, Leonardo da Vinci, Michelangelo"
"Environmental ethics considers the moral relationship between humans and the natural world, focusing on issues like conservation, biodiversity, and sustainability.","environmental ethics, moral relationship, natural world, conservation, biodiversity, sustainability"
"Game theory studies strategic interactions where the outcome for each participant depends on the actions of others, with applications in economics, politics, and biology.","game theory, strategic interactions, outcome dependence, economics, politics, biology"
"Paleontology investigates the history of life on Earth through the study of fossils, providing insights into ancient ecosystems and evolutionary processes.","paleontology, history of life, fossils, ancient ecosystems, evolutionary processes"
"Neuropsychology explores the relationship between brain function and behavior, aiding in the diagnosis and treatment of neurological and psychological disorders.","neuropsychology, brain function, behavior, diagnosis, treatment, neurological disorders, psychological disorders"
"Classical music, spanning from the Baroque to the Romantic period, includes works by composers like Bach, Mozart, and Beethoven, known for their complex compositions and enduring influence.","classical music, Baroque period, Romantic period, Bach, Mozart, Beethoven, complex compositions, enduring influence"
"Urban sociology studies the social structures, interactions, and experiences within cities, focusing on issues like urbanization, segregation, and community dynamics.","urban sociology, social structures, interactions, cities, urbanization, segregation, community dynamics"
"Renewable natural resources management involves sustainable practices in utilizing forests, fisheries, and wildlife to ensure their availability for future generations.","renewable natural resources, sustainable practices, forests, fisheries, wildlife, availability, future generations"
"Bioinformatics applies computational methods to analyze and interpret biological data, advancing research in areas like genomics and proteomics.","bioinformatics, computational methods, biological data analysis, genomics, proteomics"
"Social media studies examine the impact of platforms like Facebook, Twitter, and Instagram on communication, social behavior, and public opinion.","social media studies, platform impact, communication, social behavior, public opinion, Facebook, Twitter, Instagram"
"Modernist literature, emerging in the late 19th and early 20th centuries, is characterized by a break with traditional forms and a focus on stream-of-consciousness narrative and fragmented structure.","modernist literature, traditional forms, stream-of-consciousness, narrative, fragmented structure, 19th century, 20th century"
"Human rights law focuses on protecting individual freedoms and ensuring justice, addressing issues like discrimination, freedom of speech, and fair trial.","human rights law, individual freedoms, justice, discrimination, freedom of speech, fair trial"
"Biomedical engineering integrates principles from biology and engineering to develop technologies for healthcare, including medical devices and diagnostic tools.","biomedical engineering, biology, engineering, healthcare technologies, medical devices, diagnostic tools"
"Political philosophy explores concepts like justice, power, and the role of government, contributing to the development of political systems and ideologies.","political philosophy, justice, power, government role, political systems, ideologies"
"Renewable energy policy involves the formulation of regulations and incentives to promote the adoption of renewable energy sources and reduce reliance on fossil fuels.","renewable energy policy, regulations, incentives, renewable energy adoption, fossil fuels reduction"
"Environmental chemistry studies the chemical processes that occur in natural environments, including the effects of pollutants and the behavior of chemical compounds in ecosystems.","environmental chemistry, chemical processes, natural environments, pollutants, chemical compounds, ecosystems"
"Artificial intelligence ethics addresses the moral implications of AI development and deployment, including issues like bias, accountability, and societal impact.","AI ethics, moral implications, AI development, deployment, bias, accountability, societal impact"
"Architectural engineering focuses on the design and construction of buildings, integrating principles of structural engineering, environmental systems, and aesthetic considerations.","architectural engineering, building design, construction, structural engineering, environmental systems, aesthetic considerations"
"Marine conservation aims to protect and restore ocean ecosystems, addressing threats like overfishing, pollution, and habitat destruction.","marine conservation, ocean ecosystems, protection, restoration, overfishing, pollution, habitat destruction"
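Each dataset row pairs a keyword-extraction prompt (the `Inputs` column) with a comma-separated keyword string (the `Keywords` column). A minimal sketch of parsing such rows into (prompt, keyword-list) pairs with Python's stdlib `csv` module; the sample string below is illustrative, not copied verbatim from the file:

```python
import csv
import io

# Illustrative rows in the same quoted, comma-separated shape as the dataset.
sample = (
    'Inputs,Keywords\n'
    '"5-15 keyword for the following paragraph to summarise the whole academic '
    'scope: Quantum computing promises unprecedented computational power.",'
    '"quantum computing, computational power"\n'
)

reader = csv.DictReader(io.StringIO(sample))
pairs = [
    (row["Inputs"], [k.strip() for k in row["Keywords"].split(",")])
    for row in reader
]
print(pairs[0][1])  # ['quantum computing', 'computational power']
```

The quoting matters: the `Keywords` field contains commas, so it must stay inside double quotes for the row to parse as exactly two columns.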
generation_config.json
ADDED
@@ -0,0 +1,6 @@
{
  "_from_model_config": true,
  "bos_token_id": 11,
  "eos_token_id": 11,
  "transformers_version": "4.31.0.dev0"
}
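Note that `bos_token_id` and `eos_token_id` share a single special-token id, so `generate()` both starts from and stops at token id 11. A self-contained check of that property, with the config inlined as a string for illustration:

```python
import json

# generation_config.json contents, inlined so the check is self-contained.
raw = """
{
  "_from_model_config": true,
  "bos_token_id": 11,
  "eos_token_id": 11,
  "transformers_version": "4.31.0.dev0"
}
"""
cfg = json.loads(raw)

# One token id serves as both the beginning- and end-of-sequence marker.
same_token = cfg["bos_token_id"] == cfg["eos_token_id"]
print(same_token)  # True
```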
handler.py
ADDED
@@ -0,0 +1,33 @@
import torch

from typing import Any, Dict, List
from transformers import AutoModelForCausalLM, AutoTokenizer


class EndpointHandler:
    def __init__(self, path=""):
        # load model and tokenizer from path
        self.tokenizer = AutoTokenizer.from_pretrained(path)
        self.model = AutoModelForCausalLM.from_pretrained(
            path, device_map="auto", torch_dtype=torch.float16, trust_remote_code=True
        )
        self.device = "cuda" if torch.cuda.is_available() else "cpu"

    def __call__(self, data: Dict[str, Any]) -> List[Dict[str, str]]:
        # process input
        inputs = data.pop("inputs", data)
        parameters = data.pop("parameters", None)

        # preprocess
        inputs = self.tokenizer(inputs, return_tensors="pt").to(self.device)

        # pass inputs with all kwargs in data
        if parameters is not None:
            outputs = self.model.generate(**inputs, **parameters)
        else:
            outputs = self.model.generate(**inputs)

        # postprocess the prediction
        prediction = self.tokenizer.decode(outputs[0], skip_special_tokens=True)

        return [{"generated_text": prediction}]
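The handler follows the Hugging Face Inference Endpoints convention: the request body is a dict with an `inputs` string and an optional `parameters` dict forwarded to `model.generate`. A standalone sketch of just the payload handling, with an illustrative request body (no model is loaded here; `EndpointHandler` itself needs the checkpoint and a GPU):

```python
# Hypothetical request body in the shape EndpointHandler.__call__ expects
# (the prompt text and parameter values are illustrative).
data = {
    "inputs": "5-15 keyword for the following paragraph to summarise the "
              "whole academic scope: Renewable energy reduces emissions.",
    "parameters": {"max_new_tokens": 64, "do_sample": False},
}

# The same popping logic used in __call__: "inputs" is the prompt string,
# and everything under "parameters" becomes generate() keyword arguments.
inputs = data.pop("inputs", data)
parameters = data.pop("parameters", None)

print(inputs.endswith("emissions."))  # True
print(parameters["max_new_tokens"])   # 64
```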
model.safetensors.index.json
ADDED
@@ -0,0 +1,202 @@
{
  "metadata": {
    "total_size": 27686882816
  },
  "weight_map": {
    "transformer.h.0.input_layernorm.bias": "model-00001-of-00015.safetensors",
    "transformer.h.0.input_layernorm.weight": "model-00001-of-00015.safetensors",
    "transformer.h.0.mlp.dense_4h_to_h.weight": "model-00002-of-00015.safetensors",
    "transformer.h.0.mlp.dense_h_to_4h.weight": "model-00001-of-00015.safetensors",
    "transformer.h.0.self_attention.dense.weight": "model-00001-of-00015.safetensors",
    "transformer.h.0.self_attention.query_key_value.weight": "model-00001-of-00015.safetensors",
    "transformer.h.1.input_layernorm.bias": "model-00002-of-00015.safetensors",
    "transformer.h.1.input_layernorm.weight": "model-00002-of-00015.safetensors",
    "transformer.h.1.mlp.dense_4h_to_h.weight": "model-00002-of-00015.safetensors",
    "transformer.h.1.mlp.dense_h_to_4h.weight": "model-00002-of-00015.safetensors",
    "transformer.h.1.self_attention.dense.weight": "model-00002-of-00015.safetensors",
    "transformer.h.1.self_attention.query_key_value.weight": "model-00002-of-00015.safetensors",
    "transformer.h.10.input_layernorm.bias": "model-00005-of-00015.safetensors",
    "transformer.h.10.input_layernorm.weight": "model-00005-of-00015.safetensors",
    "transformer.h.10.mlp.dense_4h_to_h.weight": "model-00006-of-00015.safetensors",
    "transformer.h.10.mlp.dense_h_to_4h.weight": "model-00006-of-00015.safetensors",
    "transformer.h.10.self_attention.dense.weight": "model-00006-of-00015.safetensors",
    "transformer.h.10.self_attention.query_key_value.weight": "model-00006-of-00015.safetensors",
    "transformer.h.11.input_layernorm.bias": "model-00006-of-00015.safetensors",
    "transformer.h.11.input_layernorm.weight": "model-00006-of-00015.safetensors",
    "transformer.h.11.mlp.dense_4h_to_h.weight": "model-00006-of-00015.safetensors",
    "transformer.h.11.mlp.dense_h_to_4h.weight": "model-00006-of-00015.safetensors",
    "transformer.h.11.self_attention.dense.weight": "model-00006-of-00015.safetensors",
    "transformer.h.11.self_attention.query_key_value.weight": "model-00006-of-00015.safetensors",
    "transformer.h.12.input_layernorm.bias": "model-00006-of-00015.safetensors",
    "transformer.h.12.input_layernorm.weight": "model-00006-of-00015.safetensors",
    "transformer.h.12.mlp.dense_4h_to_h.weight": "model-00007-of-00015.safetensors",
    "transformer.h.12.mlp.dense_h_to_4h.weight": "model-00007-of-00015.safetensors",
    "transformer.h.12.self_attention.dense.weight": "model-00006-of-00015.safetensors",
    "transformer.h.12.self_attention.query_key_value.weight": "model-00006-of-00015.safetensors",
    "transformer.h.13.input_layernorm.bias": "model-00007-of-00015.safetensors",
    "transformer.h.13.input_layernorm.weight": "model-00007-of-00015.safetensors",
    "transformer.h.13.mlp.dense_4h_to_h.weight": "model-00007-of-00015.safetensors",
    "transformer.h.13.mlp.dense_h_to_4h.weight": "model-00007-of-00015.safetensors",
    "transformer.h.13.self_attention.dense.weight": "model-00007-of-00015.safetensors",
    "transformer.h.13.self_attention.query_key_value.weight": "model-00007-of-00015.safetensors",
    "transformer.h.14.input_layernorm.bias": "model-00007-of-00015.safetensors",
    "transformer.h.14.input_layernorm.weight": "model-00007-of-00015.safetensors",
    "transformer.h.14.mlp.dense_4h_to_h.weight": "model-00008-of-00015.safetensors",
    "transformer.h.14.mlp.dense_h_to_4h.weight": "model-00007-of-00015.safetensors",
    "transformer.h.14.self_attention.dense.weight": "model-00007-of-00015.safetensors",
    "transformer.h.14.self_attention.query_key_value.weight": "model-00007-of-00015.safetensors",
    "transformer.h.15.input_layernorm.bias": "model-00008-of-00015.safetensors",
    "transformer.h.15.input_layernorm.weight": "model-00008-of-00015.safetensors",
    "transformer.h.15.mlp.dense_4h_to_h.weight": "model-00008-of-00015.safetensors",
    "transformer.h.15.mlp.dense_h_to_4h.weight": "model-00008-of-00015.safetensors",
    "transformer.h.15.self_attention.dense.weight": "model-00008-of-00015.safetensors",
    "transformer.h.15.self_attention.query_key_value.weight": "model-00008-of-00015.safetensors",
    "transformer.h.16.input_layernorm.bias": "model-00008-of-00015.safetensors",
    "transformer.h.16.input_layernorm.weight": "model-00008-of-00015.safetensors",
    "transformer.h.16.mlp.dense_4h_to_h.weight": "model-00008-of-00015.safetensors",
    "transformer.h.16.mlp.dense_h_to_4h.weight": "model-00008-of-00015.safetensors",
    "transformer.h.16.self_attention.dense.weight": "model-00008-of-00015.safetensors",
    "transformer.h.16.self_attention.query_key_value.weight": "model-00008-of-00015.safetensors",
    "transformer.h.17.input_layernorm.bias": "model-00008-of-00015.safetensors",
    "transformer.h.17.input_layernorm.weight": "model-00008-of-00015.safetensors",
    "transformer.h.17.mlp.dense_4h_to_h.weight": "model-00009-of-00015.safetensors",
    "transformer.h.17.mlp.dense_h_to_4h.weight": "model-00009-of-00015.safetensors",
    "transformer.h.17.self_attention.dense.weight": "model-00009-of-00015.safetensors",
    "transformer.h.17.self_attention.query_key_value.weight": "model-00009-of-00015.safetensors",
    "transformer.h.18.input_layernorm.bias": "model-00009-of-00015.safetensors",
    "transformer.h.18.input_layernorm.weight": "model-00009-of-00015.safetensors",
    "transformer.h.18.mlp.dense_4h_to_h.weight": "model-00009-of-00015.safetensors",
    "transformer.h.18.mlp.dense_h_to_4h.weight": "model-00009-of-00015.safetensors",
    "transformer.h.18.self_attention.dense.weight": "model-00009-of-00015.safetensors",
    "transformer.h.18.self_attention.query_key_value.weight": "model-00009-of-00015.safetensors",
    "transformer.h.19.input_layernorm.bias": "model-00009-of-00015.safetensors",
    "transformer.h.19.input_layernorm.weight": "model-00009-of-00015.safetensors",
    "transformer.h.19.mlp.dense_4h_to_h.weight": "model-00010-of-00015.safetensors",
    "transformer.h.19.mlp.dense_h_to_4h.weight": "model-00010-of-00015.safetensors",
    "transformer.h.19.self_attention.dense.weight": "model-00009-of-00015.safetensors",
    "transformer.h.19.self_attention.query_key_value.weight": "model-00009-of-00015.safetensors",
    "transformer.h.2.input_layernorm.bias": "model-00002-of-00015.safetensors",
    "transformer.h.2.input_layernorm.weight": "model-00002-of-00015.safetensors",
    "transformer.h.2.mlp.dense_4h_to_h.weight": "model-00002-of-00015.safetensors",
    "transformer.h.2.mlp.dense_h_to_4h.weight": "model-00002-of-00015.safetensors",
    "transformer.h.2.self_attention.dense.weight": "model-00002-of-00015.safetensors",
    "transformer.h.2.self_attention.query_key_value.weight": "model-00002-of-00015.safetensors",
    "transformer.h.20.input_layernorm.bias": "model-00010-of-00015.safetensors",
    "transformer.h.20.input_layernorm.weight": "model-00010-of-00015.safetensors",
    "transformer.h.20.mlp.dense_4h_to_h.weight": "model-00010-of-00015.safetensors",
    "transformer.h.20.mlp.dense_h_to_4h.weight": "model-00010-of-00015.safetensors",
    "transformer.h.20.self_attention.dense.weight": "model-00010-of-00015.safetensors",
    "transformer.h.20.self_attention.query_key_value.weight": "model-00010-of-00015.safetensors",
    "transformer.h.21.input_layernorm.bias": "model-00010-of-00015.safetensors",
    "transformer.h.21.input_layernorm.weight": "model-00010-of-00015.safetensors",
    "transformer.h.21.mlp.dense_4h_to_h.weight": "model-00011-of-00015.safetensors",
    "transformer.h.21.mlp.dense_h_to_4h.weight": "model-00010-of-00015.safetensors",
    "transformer.h.21.self_attention.dense.weight": "model-00010-of-00015.safetensors",
    "transformer.h.21.self_attention.query_key_value.weight": "model-00010-of-00015.safetensors",
    "transformer.h.22.input_layernorm.bias": "model-00011-of-00015.safetensors",
    "transformer.h.22.input_layernorm.weight": "model-00011-of-00015.safetensors",
    "transformer.h.22.mlp.dense_4h_to_h.weight": "model-00011-of-00015.safetensors",
    "transformer.h.22.mlp.dense_h_to_4h.weight": "model-00011-of-00015.safetensors",
    "transformer.h.22.self_attention.dense.weight": "model-00011-of-00015.safetensors",
    "transformer.h.22.self_attention.query_key_value.weight": "model-00011-of-00015.safetensors",
    "transformer.h.23.input_layernorm.bias": "model-00011-of-00015.safetensors",
    "transformer.h.23.input_layernorm.weight": "model-00011-of-00015.safetensors",
    "transformer.h.23.mlp.dense_4h_to_h.weight": "model-00011-of-00015.safetensors",
    "transformer.h.23.mlp.dense_h_to_4h.weight": "model-00011-of-00015.safetensors",
    "transformer.h.23.self_attention.dense.weight": "model-00011-of-00015.safetensors",
    "transformer.h.23.self_attention.query_key_value.weight": "model-00011-of-00015.safetensors",
    "transformer.h.24.input_layernorm.bias": "model-00011-of-00015.safetensors",
    "transformer.h.24.input_layernorm.weight": "model-00011-of-00015.safetensors",
    "transformer.h.24.mlp.dense_4h_to_h.weight": "model-00012-of-00015.safetensors",
    "transformer.h.24.mlp.dense_h_to_4h.weight": "model-00012-of-00015.safetensors",
    "transformer.h.24.self_attention.dense.weight": "model-00012-of-00015.safetensors",
    "transformer.h.24.self_attention.query_key_value.weight": "model-00012-of-00015.safetensors",
    "transformer.h.25.input_layernorm.bias": "model-00012-of-00015.safetensors",
    "transformer.h.25.input_layernorm.weight": "model-00012-of-00015.safetensors",
    "transformer.h.25.mlp.dense_4h_to_h.weight": "model-00012-of-00015.safetensors",
    "transformer.h.25.mlp.dense_h_to_4h.weight": "model-00012-of-00015.safetensors",
    "transformer.h.25.self_attention.dense.weight": "model-00012-of-00015.safetensors",
    "transformer.h.25.self_attention.query_key_value.weight": "model-00012-of-00015.safetensors",
    "transformer.h.26.input_layernorm.bias": "model-00012-of-00015.safetensors",
    "transformer.h.26.input_layernorm.weight": "model-00012-of-00015.safetensors",
    "transformer.h.26.mlp.dense_4h_to_h.weight": "model-00013-of-00015.safetensors",
    "transformer.h.26.mlp.dense_h_to_4h.weight": "model-00013-of-00015.safetensors",
    "transformer.h.26.self_attention.dense.weight": "model-00012-of-00015.safetensors",
    "transformer.h.26.self_attention.query_key_value.weight": "model-00012-of-00015.safetensors",
    "transformer.h.27.input_layernorm.bias": "model-00013-of-00015.safetensors",
    "transformer.h.27.input_layernorm.weight": "model-00013-of-00015.safetensors",
    "transformer.h.27.mlp.dense_4h_to_h.weight": "model-00013-of-00015.safetensors",
    "transformer.h.27.mlp.dense_h_to_4h.weight": "model-00013-of-00015.safetensors",
    "transformer.h.27.self_attention.dense.weight": "model-00013-of-00015.safetensors",
    "transformer.h.27.self_attention.query_key_value.weight": "model-00013-of-00015.safetensors",
    "transformer.h.28.input_layernorm.bias": "model-00013-of-00015.safetensors",
    "transformer.h.28.input_layernorm.weight": "model-00013-of-00015.safetensors",
    "transformer.h.28.mlp.dense_4h_to_h.weight": "model-00014-of-00015.safetensors",
    "transformer.h.28.mlp.dense_h_to_4h.weight": "model-00013-of-00015.safetensors",
    "transformer.h.28.self_attention.dense.weight": "model-00013-of-00015.safetensors",
    "transformer.h.28.self_attention.query_key_value.weight": "model-00013-of-00015.safetensors",
    "transformer.h.29.input_layernorm.bias": "model-00014-of-00015.safetensors",
    "transformer.h.29.input_layernorm.weight": "model-00014-of-00015.safetensors",
    "transformer.h.29.mlp.dense_4h_to_h.weight": "model-00014-of-00015.safetensors",
    "transformer.h.29.mlp.dense_h_to_4h.weight": "model-00014-of-00015.safetensors",
    "transformer.h.29.self_attention.dense.weight": "model-00014-of-00015.safetensors",
    "transformer.h.29.self_attention.query_key_value.weight": "model-00014-of-00015.safetensors",
    "transformer.h.3.input_layernorm.bias": "model-00002-of-00015.safetensors",
    "transformer.h.3.input_layernorm.weight": "model-00002-of-00015.safetensors",
    "transformer.h.3.mlp.dense_4h_to_h.weight": "model-00003-of-00015.safetensors",
    "transformer.h.3.mlp.dense_h_to_4h.weight": "model-00003-of-00015.safetensors",
    "transformer.h.3.self_attention.dense.weight": "model-00003-of-00015.safetensors",
    "transformer.h.3.self_attention.query_key_value.weight": "model-00003-of-00015.safetensors",
    "transformer.h.30.input_layernorm.bias": "model-00014-of-00015.safetensors",
    "transformer.h.30.input_layernorm.weight": "model-00014-of-00015.safetensors",
    "transformer.h.30.mlp.dense_4h_to_h.weight": "model-00014-of-00015.safetensors",
    "transformer.h.30.mlp.dense_h_to_4h.weight": "model-00014-of-00015.safetensors",
    "transformer.h.30.self_attention.dense.weight": "model-00014-of-00015.safetensors",
    "transformer.h.30.self_attention.query_key_value.weight": "model-00014-of-00015.safetensors",
    "transformer.h.31.input_layernorm.bias": "model-00014-of-00015.safetensors",
    "transformer.h.31.input_layernorm.weight": "model-00014-of-00015.safetensors",
    "transformer.h.31.mlp.dense_4h_to_h.weight": "model-00015-of-00015.safetensors",
    "transformer.h.31.mlp.dense_h_to_4h.weight": "model-00015-of-00015.safetensors",
    "transformer.h.31.self_attention.dense.weight": "model-00015-of-00015.safetensors",
    "transformer.h.31.self_attention.query_key_value.weight": "model-00015-of-00015.safetensors",
    "transformer.h.4.input_layernorm.bias": "model-00003-of-00015.safetensors",
    "transformer.h.4.input_layernorm.weight": "model-00003-of-00015.safetensors",
    "transformer.h.4.mlp.dense_4h_to_h.weight": "model-00003-of-00015.safetensors",
    "transformer.h.4.mlp.dense_h_to_4h.weight": "model-00003-of-00015.safetensors",
    "transformer.h.4.self_attention.dense.weight": "model-00003-of-00015.safetensors",
    "transformer.h.4.self_attention.query_key_value.weight": "model-00003-of-00015.safetensors",
    "transformer.h.5.input_layernorm.bias": "model-00003-of-00015.safetensors",
    "transformer.h.5.input_layernorm.weight": "model-00003-of-00015.safetensors",
    "transformer.h.5.mlp.dense_4h_to_h.weight": "model-00004-of-00015.safetensors",
    "transformer.h.5.mlp.dense_h_to_4h.weight": "model-00004-of-00015.safetensors",
    "transformer.h.5.self_attention.dense.weight": "model-00003-of-00015.safetensors",
    "transformer.h.5.self_attention.query_key_value.weight": "model-00003-of-00015.safetensors",
    "transformer.h.6.input_layernorm.bias": "model-00004-of-00015.safetensors",
    "transformer.h.6.input_layernorm.weight": "model-00004-of-00015.safetensors",
    "transformer.h.6.mlp.dense_4h_to_h.weight": "model-00004-of-00015.safetensors",
    "transformer.h.6.mlp.dense_h_to_4h.weight": "model-00004-of-00015.safetensors",
    "transformer.h.6.self_attention.dense.weight": "model-00004-of-00015.safetensors",
|
179 |
+
"transformer.h.6.self_attention.query_key_value.weight": "model-00004-of-00015.safetensors",
|
180 |
+
"transformer.h.7.input_layernorm.bias": "model-00004-of-00015.safetensors",
|
181 |
+
"transformer.h.7.input_layernorm.weight": "model-00004-of-00015.safetensors",
|
182 |
+
"transformer.h.7.mlp.dense_4h_to_h.weight": "model-00005-of-00015.safetensors",
|
183 |
+
"transformer.h.7.mlp.dense_h_to_4h.weight": "model-00004-of-00015.safetensors",
|
184 |
+
"transformer.h.7.self_attention.dense.weight": "model-00004-of-00015.safetensors",
|
185 |
+
"transformer.h.7.self_attention.query_key_value.weight": "model-00004-of-00015.safetensors",
|
186 |
+
"transformer.h.8.input_layernorm.bias": "model-00005-of-00015.safetensors",
|
187 |
+
"transformer.h.8.input_layernorm.weight": "model-00005-of-00015.safetensors",
|
188 |
+
"transformer.h.8.mlp.dense_4h_to_h.weight": "model-00005-of-00015.safetensors",
|
189 |
+
"transformer.h.8.mlp.dense_h_to_4h.weight": "model-00005-of-00015.safetensors",
|
190 |
+
"transformer.h.8.self_attention.dense.weight": "model-00005-of-00015.safetensors",
|
191 |
+
"transformer.h.8.self_attention.query_key_value.weight": "model-00005-of-00015.safetensors",
|
192 |
+
"transformer.h.9.input_layernorm.bias": "model-00005-of-00015.safetensors",
|
193 |
+
"transformer.h.9.input_layernorm.weight": "model-00005-of-00015.safetensors",
|
194 |
+
"transformer.h.9.mlp.dense_4h_to_h.weight": "model-00005-of-00015.safetensors",
|
195 |
+
"transformer.h.9.mlp.dense_h_to_4h.weight": "model-00005-of-00015.safetensors",
|
196 |
+
"transformer.h.9.self_attention.dense.weight": "model-00005-of-00015.safetensors",
|
197 |
+
"transformer.h.9.self_attention.query_key_value.weight": "model-00005-of-00015.safetensors",
|
198 |
+
"transformer.ln_f.bias": "model-00015-of-00015.safetensors",
|
199 |
+
"transformer.ln_f.weight": "model-00015-of-00015.safetensors",
|
200 |
+
"transformer.word_embeddings.weight": "model-00001-of-00015.safetensors"
|
201 |
+
}
|
202 |
+
}
|
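The `weight_map` above is what tells a loader which of the 15 shards stores each tensor; `transformers`' `from_pretrained` resolves this automatically, but the lookup itself is plain JSON. A minimal sketch of that resolution, using a hypothetical three-entry subset of the index (the helper names `shard_for` and `tensors_per_shard` are illustrative, not part of this repository):

```python
import json

# Hypothetical three-entry subset of model.safetensors.index.json;
# the real file maps every Falcon parameter to one of 15 shard files.
index_json = """
{
  "weight_map": {
    "transformer.word_embeddings.weight": "model-00001-of-00015.safetensors",
    "transformer.h.31.mlp.dense_4h_to_h.weight": "model-00015-of-00015.safetensors",
    "transformer.ln_f.weight": "model-00015-of-00015.safetensors"
  }
}
"""

def shard_for(index: dict, param_name: str) -> str:
    """Return the shard file that stores a given parameter."""
    return index["weight_map"][param_name]

def tensors_per_shard(index: dict) -> dict:
    """Invert the weight map: shard file -> list of parameter names."""
    by_shard = {}
    for name, shard in index["weight_map"].items():
        by_shard.setdefault(shard, []).append(name)
    return by_shard

index = json.loads(index_json)
print(shard_for(index, "transformer.ln_f.weight"))  # model-00015-of-00015.safetensors
```

A loader only has to open the shards that actually contain the tensors it needs, which is why partial loading of sharded safetensors checkpoints is cheap.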
modeling_falcon.py
ADDED
@@ -0,0 +1,1242 @@
# coding=utf-8
# Copyright 2023 the Falcon authors and HuggingFace Inc. team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""PyTorch Falcon model."""

import math
from typing import Optional, Tuple, Union

import torch
import torch.utils.checkpoint
from torch import nn
from torch.nn import BCEWithLogitsLoss, CrossEntropyLoss, LayerNorm, MSELoss
from torch.nn import functional as F

from transformers.modeling_outputs import (
    BaseModelOutputWithPastAndCrossAttentions,
    CausalLMOutputWithCrossAttentions,
    QuestionAnsweringModelOutput,
    SequenceClassifierOutputWithPast,
    TokenClassifierOutput,
)
from transformers.modeling_utils import PreTrainedModel
from transformers.utils import add_code_sample_docstrings, add_start_docstrings, add_start_docstrings_to_model_forward, logging
from .configuration_falcon import FalconConfig


logger = logging.get_logger(__name__)

FALCON_PRETRAINED_MODEL_ARCHIVE_LIST = [
    "tiiuae/falcon-40b",
    "tiiuae/falcon-40b-instruct",
    "tiiuae/falcon-7b",
    "tiiuae/falcon-7b-instruct",
    "tiiuae/falcon-rw-7b",
    "tiiuae/falcon-rw-1b",
]
_CHECKPOINT_FOR_DOC = "Rocketknight1/falcon-rw-1b"
_CONFIG_FOR_DOC = "FalconConfig"


# NOTE(Hesslow): Unfortunately we did not fuse matmul and bias during training, which means that there's one
# additional quantization to bfloat16 between the operations.
# In order not to degrade the quality of our HF-port, we keep these characteristics in the final model.
class FalconLinear(nn.Linear):
    def forward(self, input: torch.Tensor) -> torch.Tensor:
        hidden_states = input @ self.weight.T
        if self.bias is None:
            return hidden_states
        return hidden_states + self.bias


# rotary pos emb helpers (torch.jit.script does not seem to support staticmethod...)
def rotate_half(x):
    x1, x2 = x[..., : x.shape[-1] // 2], x[..., x.shape[-1] // 2 :]
    return torch.cat((-x2, x1), dim=-1)


class FalconRotaryEmbedding(nn.Module):
    """Implementation of RotaryEmbedding from GPT-NeoX.

    This implementation is designed to operate on queries and keys that are compatible with `[batch_size,
    n_heads_per_partition, seq_len, head_dim]` (e.g. MinGPTAttention format).
    """

    def __init__(self, head_dim: int, base=10000):
        super().__init__()
        inv_freq = 1.0 / (base ** (torch.arange(0, head_dim, 2).float() / head_dim))
        self.register_buffer("inv_freq", inv_freq, persistent=False)
        self.head_dim = head_dim
        self.seq_len_cached = -1
        self.cos_cached: torch.Tensor | None = None
        self.sin_cached: torch.Tensor | None = None

    def cos_sin(self, seq_len: int, past_key_values_length: int, device="cpu", dtype=torch.bfloat16) -> torch.Tensor:
        total_length = seq_len + past_key_values_length
        if total_length > self.seq_len_cached:
            self.seq_len_cached = total_length
            t = torch.arange(total_length, device=device, dtype=self.inv_freq.dtype)
            freqs = torch.einsum("i,j->ij", t, self.inv_freq)
            emb = torch.cat((freqs, freqs), dim=-1).to(device)

            if dtype in [torch.float16, torch.bfloat16]:
                emb = emb.float()

            self.cos_cached = emb.cos()[None, :, :]
            self.sin_cached = emb.sin()[None, :, :]

            self.cos_cached = self.cos_cached.type(dtype)
            self.sin_cached = self.sin_cached.type(dtype)

        return (
            self.cos_cached[:, past_key_values_length : seq_len + past_key_values_length],
            self.sin_cached[:, past_key_values_length : seq_len + past_key_values_length],
        )

    def forward(self, query, key, past_key_values_length=0):
        batch, seq_len, head_dim = query.shape
        cos, sin = self.cos_sin(seq_len, past_key_values_length, query.device, query.dtype)
        return (query * cos) + (rotate_half(query) * sin), (key * cos) + (rotate_half(key) * sin)

def _make_causal_mask(
    input_ids_shape: torch.Size, device: torch.device, past_key_values_length: int
) -> torch.BoolTensor:
    """
    Make causal mask used for self-attention. This mask does not take the existing attention mask into account - it
    just blocks tokens from attending forwards in the sequence. The output shape will be `[batch_size, 1,
    target_length, target_length + past_key_values_length]`.
    """
    batch_size, target_length = input_ids_shape

    mask = torch.triu(torch.ones((target_length, target_length), dtype=torch.bool, device=device), diagonal=1)
    # If past_key_values_length is 0 this is an empty tensor and the concatenation is a no-op.
    # This code style is an unfortunate consequence of getting your TF engineer to port models; doing it this
    # way avoids a data-dependent conditional, which will help me when I have to port this to XLA later.
    past_mask = torch.zeros((target_length, past_key_values_length), dtype=torch.bool, device=device)
    mask = torch.cat([past_mask, mask], dim=-1)
    expanded_mask = mask[None, None, :, :].expand(batch_size, 1, target_length, target_length + past_key_values_length)
    return expanded_mask


def _expand_mask(mask: torch.Tensor, past_key_values_length: int) -> torch.BoolTensor:
    """
    Expands attention_mask from `[batch_size, seq_length]` to `[batch_size, 1, seq_length, seq_length + past_length]`.
    """
    batch_size, total_length = mask.shape
    seq_length = total_length - past_key_values_length if past_key_values_length is not None else total_length

    expanded_mask = ~(mask[:, None, None, :].to(torch.bool))
    return expanded_mask.expand(batch_size, 1, seq_length, total_length)


def build_alibi_tensor(attention_mask: torch.Tensor, num_heads: int, dtype: torch.dtype) -> torch.Tensor:
    batch_size, seq_length = attention_mask.shape
    closest_power_of_2 = 2 ** math.floor(math.log2(num_heads))
    base = torch.tensor(
        2 ** (-(2 ** -(math.log2(closest_power_of_2) - 3))), device=attention_mask.device, dtype=torch.float32
    )
    powers = torch.arange(1, 1 + closest_power_of_2, device=attention_mask.device, dtype=torch.int32)
    slopes = torch.pow(base, powers)

    if closest_power_of_2 != num_heads:
        extra_base = torch.tensor(
            2 ** (-(2 ** -(math.log2(2 * closest_power_of_2) - 3))), device=attention_mask.device, dtype=torch.float32
        )
        num_remaining_heads = min(closest_power_of_2, num_heads - closest_power_of_2)
        extra_powers = torch.arange(1, 1 + 2 * num_remaining_heads, 2, device=attention_mask.device, dtype=torch.int32)
        slopes = torch.cat([slopes, torch.pow(extra_base, extra_powers)], dim=0)

    # Note: alibi will be added to the attention bias that will be applied to the query, key product of attention
    # => therefore alibi will have to be of shape (batch_size, num_heads, query_length, key_length)
    # => here we set (batch_size=1, num_heads=num_heads, query_length=1, key_length=max_length)
    # => the query_length dimension will then be broadcasted correctly
    # This is more or less identical to T5's relative position bias:
    # https://github.com/huggingface/transformers/blob/f681437203baa7671de3174b0fa583c349d9d5e1/src/transformers/models/t5/modeling_t5.py#L527
    arange_tensor = ((attention_mask.cumsum(dim=-1) - 1) * attention_mask)[:, None, :]
    alibi = slopes[..., None].bfloat16() * arange_tensor
    return alibi.reshape(batch_size * num_heads, 1, seq_length).to(dtype)

# Copied from transformers.models.bloom.modeling_bloom.dropout_add
def dropout_add(x: torch.Tensor, residual: torch.Tensor, prob: float, training: bool) -> torch.Tensor:
    """
    Dropout add function

    Args:
        x (`torch.tensor`, *required*):
            input tensor
        residual (`torch.tensor`, *required*):
            residual tensor
        prob (`float`, *required*):
            dropout probability
        training (`bool`, *required*):
            training mode
    """
    out = F.dropout(x, p=prob, training=training)
    out = residual + out
    return out

class FalconAttention(nn.Module):
    def __init__(self, config: FalconConfig):
        super().__init__()

        self.hidden_size = config.hidden_size
        self.num_heads = config.num_attention_heads
        self.head_dim = self.hidden_size // self.num_heads
        self.split_size = self.hidden_size
        self.hidden_dropout = config.hidden_dropout

        if self.head_dim * self.num_heads != self.hidden_size:
            raise ValueError(
                f"`hidden_size` must be divisible by num_heads (got `hidden_size`: {self.hidden_size} and `num_heads`:"
                f" {self.num_heads})."
            )

        self.maybe_rotary = FalconRotaryEmbedding(config.head_dim) if config.rotary else lambda q, k, t: (q, k)

        # Layer-wise attention scaling
        self.inv_norm_factor = 1.0 / math.sqrt(self.head_dim)
        self.beta = self.inv_norm_factor
        if config.new_decoder_architecture:
            qkv_out_dim = (config.num_kv_heads * 2 + config.num_attention_heads) * self.head_dim
        elif config.multi_query:
            qkv_out_dim = self.hidden_size + 2 * self.head_dim
        else:
            qkv_out_dim = 3 * self.hidden_size
        self.query_key_value = FalconLinear(self.hidden_size, qkv_out_dim, bias=config.bias)
        self.new_decoder_architecture = config.new_decoder_architecture
        self.multi_query = config.multi_query
        self.dense = FalconLinear(self.hidden_size, self.hidden_size, bias=config.bias)
        self.attention_dropout = nn.Dropout(config.attention_dropout)
        self.num_kv_heads = config.num_kv_heads if (self.new_decoder_architecture or not self.multi_query) else 1

    def _split_heads(self, fused_qkv: torch.Tensor) -> Tuple[torch.Tensor, torch.Tensor, torch.Tensor]:
        """
        Split the last dimension into (num_heads, head_dim); the results share the same memory storage as `fused_qkv`.

        Args:
            fused_qkv (`torch.tensor`, *required*): [batch_size, seq_length, num_heads * 3 * head_dim]

        Returns:
            query: [batch_size, seq_length, num_heads, head_dim]
            key: [batch_size, seq_length, num_heads, head_dim]
            value: [batch_size, seq_length, num_heads, head_dim]
        """
        if self.new_decoder_architecture:
            batch, seq_len, _ = fused_qkv.shape
            qkv = fused_qkv.view(batch, seq_len, -1, self.num_heads // self.num_kv_heads + 2, self.head_dim)
            query = qkv[:, :, :, :-2]
            key = qkv[:, :, :, [-2]]
            value = qkv[:, :, :, [-1]]
            key = torch.broadcast_to(key, query.shape)
            value = torch.broadcast_to(value, query.shape)

            query, key, value = [x.flatten(2, 3) for x in (query, key, value)]
            return query, key, value
        elif not self.multi_query:
            batch_size, seq_length, three_times_hidden_size = fused_qkv.shape
            fused_qkv = fused_qkv.view(batch_size, seq_length, self.num_heads, 3, self.head_dim)
            return fused_qkv[..., 0, :], fused_qkv[..., 1, :], fused_qkv[..., 2, :]
        else:
            batch_size, seq_length, three_times_hidden_size = fused_qkv.shape
            fused_qkv = fused_qkv.view(batch_size, seq_length, self.num_heads + 2, self.head_dim)
            return fused_qkv[..., :-2, :], fused_qkv[..., [-2], :], fused_qkv[..., [-1], :]

    # Copied from transformers.models.bloom.modeling_bloom.BloomAttention._merge_heads
    def _merge_heads(self, x: torch.Tensor) -> torch.Tensor:
        """
        Merge heads together over the last dimension.

        Args:
            x (`torch.tensor`, *required*): [batch_size * num_heads, seq_length, head_dim]

        Returns:
            torch.tensor: [batch_size, seq_length, num_heads * head_dim]
        """
        # What we want to achieve is:
        # batch_size * num_heads, seq_length, head_dim -> batch_size, seq_length, num_heads * head_dim
        batch_size_and_num_heads, seq_length, _ = x.shape
        batch_size = batch_size_and_num_heads // self.num_heads

        # First view to decompose the batch size
        # batch_size * num_heads, seq_length, head_dim -> batch_size, num_heads, seq_length, head_dim
        x = x.view(batch_size, self.num_heads, seq_length, self.head_dim)

        # batch_size, num_heads, seq_length, head_dim -> batch_size, seq_length, num_heads, head_dim
        x = x.permute(0, 2, 1, 3)

        # batch_size, seq_length, num_heads, head_dim -> batch_size, seq_length, num_heads * head_dim
        return x.reshape(batch_size, seq_length, self.num_heads * self.head_dim)

    def forward(
        self,
        hidden_states: torch.Tensor,
        alibi: Optional[torch.Tensor],
        attention_mask: torch.Tensor,
        layer_past: Optional[Tuple[torch.Tensor, torch.Tensor]] = None,
        head_mask: Optional[torch.Tensor] = None,
        use_cache: bool = False,
        output_attentions: bool = False,
    ):
        fused_qkv = self.query_key_value(hidden_states)  # [batch_size, seq_length, 3 x hidden_size]
        num_kv_heads = self.num_heads if self.new_decoder_architecture else self.num_kv_heads
        # 3 x [batch_size, seq_length, num_heads, head_dim]
        (query_layer, key_layer, value_layer) = self._split_heads(fused_qkv)

        batch_size, query_length, _, _ = query_layer.shape

        query_layer = query_layer.transpose(1, 2).reshape(batch_size * self.num_heads, query_length, self.head_dim)
        key_layer = key_layer.transpose(1, 2).reshape(
            batch_size * num_kv_heads,
            query_length,
            self.head_dim,
        )
        value_layer = value_layer.transpose(1, 2).reshape(batch_size * num_kv_heads, query_length, self.head_dim)

        past_kv_length = 0 if layer_past is None else layer_past[0].shape[1]
        query_layer, key_layer = self.maybe_rotary(query_layer, key_layer, past_kv_length)

        if layer_past is not None:
            past_key, past_value = layer_past
            # concatenate along seq_length dimension:
            #  - key: [batch_size * self.num_heads, kv_length, head_dim]
            #  - value: [batch_size * self.num_heads, kv_length, head_dim]
            key_layer = torch.cat((past_key, key_layer), dim=1)
            value_layer = torch.cat((past_value, value_layer), dim=1)

        _, kv_length, _ = key_layer.shape
        if use_cache:
            present = (key_layer, value_layer)
        else:
            present = None

        attention_mask_float = (attention_mask * 1.0).masked_fill(attention_mask, float("-1e9")).to(query_layer.dtype)

        query_layer_ = query_layer.reshape(batch_size, self.num_heads, -1, self.head_dim)
        key_layer_ = key_layer.reshape(batch_size, num_kv_heads, -1, self.head_dim)
        value_layer_ = value_layer.reshape(batch_size, num_kv_heads, -1, self.head_dim)

        if alibi is None:
            if output_attentions:
                # F.scaled_dot_product_attention doesn't return the attention weights, so we have
                # to do it by hand if we want them
                attention_scores = query_layer_ @ key_layer_.transpose(-1, -2)
                attention_scores /= math.sqrt(self.head_dim)

                attention_scores = F.softmax(
                    attention_scores + attention_mask_float, dim=-1, dtype=hidden_states.dtype
                )
                attn_output = attention_scores @ value_layer_
            else:
                attn_output = F.scaled_dot_product_attention(
                    query_layer_, key_layer_, value_layer_, attention_mask_float, 0.0, is_causal=False
                )
                attention_scores = None

            attn_output = attn_output.view(batch_size, self.num_heads, query_length, self.head_dim)
            attn_output = attn_output.permute(0, 2, 1, 3)
            attn_output = attn_output.reshape(batch_size, query_length, self.num_heads * self.head_dim)

            output_tensor = self.dense(attn_output)

            if output_attentions:
                return output_tensor, present, attention_scores
            else:
                return output_tensor, present

        else:
            matmul_result = query_layer_ @ key_layer_.transpose(-1, -2)

            # change view to [batch_size, num_heads, q_length, kv_length]
            attention_scores = matmul_result.view(batch_size, self.num_heads, query_length, kv_length)

            # cast attention scores to fp32, compute scaled softmax and cast back to initial dtype - [batch_size, num_heads, q_length, kv_length]
            input_dtype = attention_scores.dtype
            # `float16` has a minimum value of -65504.0, whereas `bfloat16` and `float32` have a minimum value of `-3.4e+38`
            if input_dtype == torch.float16 or input_dtype == torch.bfloat16:
                attention_scores = attention_scores.to(torch.float32)
            # Matt (HF) note: We could possibly use F.scaled_dot_product_attention here too, by
            # adding (alibi * self.inv_norm_factor) to attention_mask_float. I think this would be mathematically
            # equivalent and more performant, but there might be a numerical difference. If you're reading this
            # and you'd like to experiment and maybe file a PR, feel free!
            attention_logits = attention_scores + alibi.view(batch_size, self.num_heads, 1, -1)
            attention_logits *= self.inv_norm_factor
            attention_probs = F.softmax(attention_logits + attention_mask_float, dim=-1, dtype=hidden_states.dtype)
            # [batch_size, num_heads, q_length, kv_length]
            attention_probs = self.attention_dropout(attention_probs)

            if head_mask is not None:
                attention_probs = attention_probs * head_mask

            # change view [batch_size, num_heads, q_length, kv_length]
            attention_probs_reshaped = attention_probs.view(batch_size, self.num_heads, query_length, kv_length)

            # matmul: [batch_size * num_heads, q_length, head_dim]
            context_layer = (attention_probs_reshaped @ value_layer_).flatten(0, 1)

            # change view [batch_size, q_length, num_heads * head_dim]
            context_layer = self._merge_heads(context_layer)

            output_tensor = self.dense(context_layer)

            if output_attentions:
                return output_tensor, present, attention_probs
            else:
                return output_tensor, present

class FalconMLP(nn.Module):
    def __init__(self, config: FalconConfig):
        super().__init__()
        hidden_size = config.hidden_size

        self.dense_h_to_4h = FalconLinear(hidden_size, 4 * hidden_size, bias=config.bias)
        self.act = nn.GELU()
        self.dense_4h_to_h = FalconLinear(4 * hidden_size, hidden_size, bias=config.bias)
        self.hidden_dropout = config.hidden_dropout

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.act(self.dense_h_to_4h(x))
        x = self.dense_4h_to_h(x)
        return x

class FalconDecoderLayer(nn.Module):
    def __init__(self, config: FalconConfig):
        super().__init__()
        hidden_size = config.hidden_size
        self.num_heads = config.num_attention_heads
        self.self_attention = FalconAttention(config)
        self.mlp = FalconMLP(config)
        self.hidden_dropout = config.hidden_dropout
        self.config = config

        if config.new_decoder_architecture:
            # The layer norm before self-attention
            self.ln_attn = LayerNorm(hidden_size, eps=config.layer_norm_epsilon)
            # The layer norm before the MLP
            self.ln_mlp = LayerNorm(hidden_size, eps=config.layer_norm_epsilon)
        else:
            self.input_layernorm = LayerNorm(hidden_size, eps=config.layer_norm_epsilon)
            if not config.parallel_attn:
                self.post_attention_layernorm = LayerNorm(hidden_size, eps=config.layer_norm_epsilon)

    def forward(
        self,
        hidden_states: torch.Tensor,
        alibi: Optional[torch.Tensor],
        attention_mask: torch.Tensor,
        layer_past: Optional[Tuple[torch.Tensor, torch.Tensor]] = None,
        head_mask: Optional[torch.Tensor] = None,
        use_cache: bool = False,
        output_attentions: bool = False,
    ):
        residual = hidden_states

        if self.config.new_decoder_architecture:
            attention_layernorm_out = self.ln_attn(hidden_states)
            mlp_layernorm_out = self.ln_mlp(hidden_states)
        else:
            attention_layernorm_out = self.input_layernorm(hidden_states)

        # Self attention.
        attn_outputs = self.self_attention(
            attention_layernorm_out,
            layer_past=layer_past,
            attention_mask=attention_mask,
            alibi=alibi,
            head_mask=head_mask,
            use_cache=use_cache,
            output_attentions=output_attentions,
        )

        attention_output = attn_outputs[0]

        if not self.config.new_decoder_architecture:
            if self.config.parallel_attn:
                mlp_layernorm_out = attention_layernorm_out
            else:
                residual = dropout_add(
                    attention_output, residual, self.config.attention_dropout, training=self.training
                )
                mlp_layernorm_out = self.post_attention_layernorm(residual)

        outputs = attn_outputs[1:]

        # MLP.
        mlp_output = self.mlp(mlp_layernorm_out)

        if self.config.new_decoder_architecture or self.config.parallel_attn:
            mlp_output += attention_output

        output = dropout_add(mlp_output, residual, self.config.hidden_dropout, training=self.training)

        if use_cache:
            outputs = (output,) + outputs
        else:
            outputs = (output,) + outputs[1:]

        return outputs  # hidden_states, present, attentions

FALCON_START_DOCSTRING = r"""
|
488 |
+
This model inherits from [`PreTrainedModel`]. Check the superclass documentation for the generic methods the
|
489 |
+
library implements for all its model (such as downloading or saving, resizing the input embeddings etc.)
|
490 |
+
This model is also a PyTorch [torch.nn.Module](https://pytorch.org/docs/stable/nn.html#torch.nn.Module) subclass.
|
491 |
+
Use it as a regular PyTorch Module and refer to the PyTorch documentation for all matter related to general usage
|
492 |
+
and behavior.
|
493 |
+
Parameters:
|
494 |
+
config ([`FalconConfig`]): Model configuration class with all the parameters of the model.
|
495 |
+
Initializing with a config file does not load the weights associated with the model, only the
|
496 |
+
configuration. Check out the [`~PreTrainedModel.from_pretrained`] method to load the model weights.
|
497 |
+
"""
|
498 |
+
|
499 |
+
FALCON_INPUTS_DOCSTRING = r"""
|
500 |
+
Args:
|
501 |
+
input_ids (`torch.LongTensor` of shape `(batch_size, input_ids_length)`):
|
502 |
+
`input_ids_length` = `sequence_length` if `past_key_values` is `None` else `past_key_values[0][0].shape[2]`
|
503 |
+
(`sequence_length` of input past key value states). Indices of input sequence tokens in the vocabulary.
|
504 |
+
If `past_key_values` is used, only `input_ids` that do not have their past calculated should be passed as
|
505 |
+
`input_ids`.
|
506 |
+
Indices can be obtained using [`AutoTokenizer`]. See [`PreTrainedTokenizer.encode`] and
|
507 |
+
[`PreTrainedTokenizer.__call__`] for details.
|
508 |
+
[What are input IDs?](../glossary#input-ids)
|
509 |
+
past_key_values (`Tuple[Tuple[torch.Tensor]]` of length `config.num_hidden_layers`):
|
510 |
+
Contains precomputed hidden-states (key and values in the attention blocks) as computed by the model (see
|
511 |
+
`past_key_values` output below). Can be used to speed up sequential decoding. The `input_ids` which have
|
512 |
+
their past given to this model should not be passed as `input_ids` as they have already been computed.
|
513 |
+
Each element of `past_key_values` is a tuple (past_key, past_value):
|
514 |
+
- past_key: [batch_size * num_heads, head_dim, kv_length]
|
515 |
+
- past_value: [batch_size * num_heads, kv_length, head_dim]
|
516 |
+
attention_mask (`torch.FloatTensor` of shape `(batch_size, sequence_length)`, *optional*):
|
517 |
+
Mask to avoid performing attention on padding token indices. Mask values selected in `[0, 1]`:
|
518 |
+
- 1 for tokens that are **not masked**,
|
519 |
+
- 0 for tokens that are **masked**.
|
520 |
+
[What are attention masks?](../glossary#attention-mask)
|
521 |
+
head_mask (`torch.FloatTensor` of shape `(num_heads,)` or `(num_layers, num_heads)`, *optional*):
|
522 |
+
Mask to nullify selected heads of the self-attention modules. Mask values selected in `[0, 1]`:
|
523 |
+
- 1 indicates the head is **not masked**,
|
524 |
+
- 0 indicates the head is **masked**.
|
525 |
+
inputs_embeds (`torch.FloatTensor` of shape `(batch_size, sequence_length, hidden_size)`, *optional*):
|
526 |
+
Optionally, instead of passing `input_ids` you can choose to directly pass an embedded representation. This
|
527 |
+
is useful if you want more control over how to convert `input_ids` indices into associated vectors than the
|
528 |
+
model's internal embedding lookup matrix.
|
529 |
+
If `past_key_values` is used, optionally only the last `inputs_embeds` have to be input (see
|
530 |
+
`past_key_values`).
|
531 |
+
use_cache (`bool`, *optional*):
|
532 |
+
If set to `True`, `past_key_values` key value states are returned and can be used to speed up decoding (see
|
533 |
+
`past_key_values`).
|
534 |
+
output_attentions (`bool`, *optional*):
|
535 |
+
Whether or not to return the attentions tensors of all attention layers. See `attentions` under returned
|
536 |
+
tensors for more detail.
|
537 |
+
output_hidden_states (`bool`, *optional*):
|
538 |
+
Whether or not to return the hidden states of all layers. See `hidden_states` under returned tensors for
|
539 |
+
more detail.
|
540 |
+
return_dict (`bool`, *optional*):
|
541 |
+
Whether or not to return a [`~file_utils.ModelOutput`] instead of a plain tuple.
|
542 |
+
"""
|
543 |
+
|
544 |
+
|
545 |
+
class FalconPreTrainedModel(PreTrainedModel):
|
546 |
+
"""
|
547 |
+
An abstract class to handle weights initialization and a simple interface for downloading and loading pretrained
|
548 |
+
models.
|
549 |
+
"""
|
550 |
+
|
551 |
+
config_class = FalconConfig
|
552 |
+
base_model_prefix = "transformer"
|
553 |
+
supports_gradient_checkpointing = True
|
554 |
+
_no_split_modules = ["FalconDecoderLayer"]
|
555 |
+
|
556 |
+
def __init__(self, *inputs, **kwargs):
|
557 |
+
super().__init__(*inputs, **kwargs)
|
558 |
+
|
559 |
+
def _init_weights(self, module: nn.Module):
|
560 |
+
"""Initialize the weights."""
|
561 |
+
if isinstance(module, nn.Linear) or isinstance(module, FalconLinear):
|
562 |
+
# Slightly different from the TF version which uses truncated_normal for initialization
|
563 |
+
# cf https://github.com/pytorch/pytorch/pull/5617
|
564 |
+
module.weight.data.normal_(mean=0.0, std=self.config.initializer_range)
|
565 |
+
if module.bias is not None:
|
566 |
+
module.bias.data.zero_()
|
567 |
+
elif isinstance(module, nn.Embedding):
|
568 |
+
module.weight.data.normal_(mean=0.0, std=self.config.initializer_range)
|
569 |
+
if module.padding_idx is not None:
|
570 |
+
module.weight.data[module.padding_idx].zero_()
|
571 |
+
elif isinstance(module, LayerNorm):
|
572 |
+
module.bias.data.zero_()
|
573 |
+
module.weight.data.fill_(1.0)
|
574 |
+
|
575 |
+
# Copied from transformers.models.bloom.modeling_bloom.BloomPreTrainedModel._set_gradient_checkpointing with BloomModel->FalconModel
|
576 |
+
def _set_gradient_checkpointing(self, module: nn.Module, value: bool = False):
|
577 |
+
if isinstance(module, FalconModel):
|
578 |
+
module.gradient_checkpointing = value
|
579 |
+
|
580 |
+
@staticmethod
|
581 |
+
def _convert_cache_to_standard_format(
|
582 |
+
past_key_value: Tuple[Tuple[torch.Tensor, torch.Tensor]], batch_size: int
|
583 |
+
) -> Tuple[Tuple[torch.Tensor, torch.Tensor]]:
|
584 |
+
"""
|
585 |
+
Standardizes the format of the cache so as to match most implementations, i.e. to tuple(tuple([batch_size,
|
586 |
+
num_heads, ...]))
|
587 |
+
"""
|
588 |
+
batch_size_times_num_heads, kv_length, head_dim = past_key_value[0][0].shape
|
589 |
+
# [batch_size * self.num_heads, kv_length, head_dim] -> [batch_size, num_heads, kv_length, head_dim]
|
590 |
+
# Note that don't want to use self.num_attention_heads because the number of heads may vary depending
|
591 |
+
# on whether we use multi_query attention.
|
592 |
+
num_heads = batch_size_times_num_heads // batch_size
|
593 |
+
return tuple(
|
594 |
+
(
|
595 |
+
layer_past[0].view(batch_size, num_heads, kv_length, head_dim),
|
596 |
+
layer_past[1].view(batch_size, num_heads, kv_length, head_dim),
|
597 |
+
)
|
598 |
+
for layer_past in past_key_value
|
599 |
+
)
|
600 |
+
|
601 |
+
@staticmethod
|
602 |
+
def _convert_to_rw_cache(
|
603 |
+
past_key_value: Tuple[Tuple[torch.Tensor, torch.Tensor]]
|
604 |
+
) -> Tuple[Tuple[torch.Tensor, torch.Tensor]]:
|
605 |
+
batch_size, num_heads, kv_length, head_dim = past_key_value[0][0].shape
|
606 |
+
batch_size_times_num_heads = batch_size * num_heads
|
607 |
+
# [batch_size, num_heads, kv_length, head_dim] -> [batch_size * num_heads, kv_length, head_dim]
|
608 |
+
return tuple(
|
609 |
+
(
|
610 |
+
layer_past[0].view(batch_size_times_num_heads, kv_length, head_dim),
|
611 |
+
layer_past[1].view(batch_size_times_num_heads, kv_length, head_dim),
|
612 |
+
)
|
613 |
+
for layer_past in past_key_value
|
614 |
+
)
|
615 |
+
|
616 |
+
|
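The two cache helpers above only reshape between the RW layout `[batch_size * num_heads, kv_length, head_dim]` and the standard layout `[batch_size, num_heads, kv_length, head_dim]`. A minimal sketch of that round trip (the sizes here are invented for illustration and assume `torch` is installed):

```python
import torch

# Illustrative sizes only, not tied to any real Falcon checkpoint
batch_size, num_heads, kv_length, head_dim = 2, 4, 5, 8

# Standard cache entry: [batch_size, num_heads, kv_length, head_dim]
standard = torch.randn(batch_size, num_heads, kv_length, head_dim)

# Standard -> RW: fuse the batch and head dimensions into one leading axis
rw = standard.view(batch_size * num_heads, kv_length, head_dim)

# RW -> standard: num_heads is recovered by dividing out batch_size,
# mirroring what _convert_cache_to_standard_format does
recovered_num_heads = rw.shape[0] // batch_size
restored = rw.view(batch_size, recovered_num_heads, kv_length, head_dim)
```

Because both directions are plain `view` calls, no data is copied; only the interpretation of the leading axis changes.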
@add_start_docstrings(
    "The bare Falcon Model transformer outputting raw hidden-states without any specific head on top.",
    FALCON_START_DOCSTRING,
)
class FalconModel(FalconPreTrainedModel):
    def __init__(self, config: FalconConfig):
        super().__init__(config)

        self.embed_dim = config.hidden_size
        self.num_heads = config.num_attention_heads
        self.use_alibi = config.alibi

        # Embedding + LN Embedding
        self.word_embeddings = nn.Embedding(config.vocab_size, self.embed_dim)

        # Transformer blocks
        self.h = nn.ModuleList([FalconDecoderLayer(config) for _ in range(config.num_hidden_layers)])

        # Final Layer Norm
        self.ln_f = LayerNorm(self.embed_dim, eps=config.layer_norm_epsilon)

        self.gradient_checkpointing = False

        # Initialize weights and apply final processing
        self.post_init()

    def get_input_embeddings(self):
        return self.word_embeddings

    @staticmethod
    def _prepare_attn_mask(
        attention_mask: torch.Tensor, input_shape: Tuple[int, int], past_key_values_length: int
    ) -> torch.BoolTensor:
        # Create a causal mask
        # The attention mask we receive as input should cover the whole extended sequence, including any past
        # cache, so its shape should be [batch_size, seq_length + past_key_values_length]
        # The output shape will be [batch_size, 1, seq_length, seq_length + past_key_values_length]
        if input_shape[1] + past_key_values_length != attention_mask.shape[1]:
            raise ValueError(
                "Attention mask shape should be (batch_size, seq_length + past_key_values_length)"
                f" but is {attention_mask.shape} with input_ids shape {input_shape} and past length"
                f" {past_key_values_length}."
            )
        combined_attention_mask = None
        device = attention_mask.device
        _, seq_length = input_shape

        if seq_length > 1:
            combined_attention_mask = _make_causal_mask(
                input_shape, device=device, past_key_values_length=past_key_values_length
            )

        # [batch_size, seq_length + past_key_values_length] -> [batch_size, 1, seq_length, seq_length + past_key_values_length]
        expanded_attn_mask = _expand_mask(attention_mask, past_key_values_length=past_key_values_length)
        combined_attention_mask = (
            expanded_attn_mask if combined_attention_mask is None else expanded_attn_mask | combined_attention_mask
        )

        return combined_attention_mask

    def set_input_embeddings(self, new_embeddings: torch.Tensor):
        self.word_embeddings = new_embeddings

    @add_start_docstrings_to_model_forward(FALCON_INPUTS_DOCSTRING)
    @add_code_sample_docstrings(
        checkpoint=_CHECKPOINT_FOR_DOC,
        output_type=BaseModelOutputWithPastAndCrossAttentions,
        config_class=_CONFIG_FOR_DOC,
    )
    def forward(
        self,
        input_ids: Optional[torch.LongTensor] = None,
        past_key_values: Optional[Tuple[Tuple[torch.Tensor, torch.Tensor], ...]] = None,
        attention_mask: Optional[torch.Tensor] = None,
        head_mask: Optional[torch.LongTensor] = None,
        inputs_embeds: Optional[torch.LongTensor] = None,
        use_cache: Optional[bool] = None,
        output_attentions: Optional[bool] = None,
        output_hidden_states: Optional[bool] = None,
        return_dict: Optional[bool] = None,
    ) -> Union[Tuple[torch.Tensor, ...], BaseModelOutputWithPastAndCrossAttentions]:
        output_attentions = output_attentions if output_attentions is not None else self.config.output_attentions
        output_hidden_states = (
            output_hidden_states if output_hidden_states is not None else self.config.output_hidden_states
        )
        use_cache = use_cache if use_cache is not None else self.config.use_cache
        return_dict = return_dict if return_dict is not None else self.config.use_return_dict

        if input_ids is not None and inputs_embeds is not None:
            raise ValueError("You cannot specify both input_ids and inputs_embeds at the same time")
        elif input_ids is not None:
            batch_size, seq_length = input_ids.shape
        elif inputs_embeds is not None:
            batch_size, seq_length, _ = inputs_embeds.shape
        else:
            raise ValueError("You have to specify either input_ids or inputs_embeds")

        if past_key_values is None:
            past_key_values = tuple([None] * len(self.h))
        else:
            past_key_values = self._convert_to_rw_cache(past_key_values)

        # Prepare head mask if needed
        # 1.0 in head_mask indicates we keep the head
        # attention_probs has shape batch_size x num_heads x N x N
        # head_mask has shape n_layer x batch x num_heads x N x N
        head_mask = self.get_head_mask(head_mask, self.config.num_hidden_layers)

        if inputs_embeds is None:
            inputs_embeds = self.word_embeddings(input_ids)

        hidden_states = inputs_embeds

        presents = () if use_cache else None
        all_self_attentions = () if output_attentions else None
        all_hidden_states = () if output_hidden_states else None

        # Compute alibi tensor: check build_alibi_tensor documentation
        past_key_values_length = 0
        if past_key_values[0] is not None:
            past_key_values_length = past_key_values[0][0].shape[1]  # 1 because RW-cache, not standard format
        if attention_mask is None:
            attention_mask = torch.ones((batch_size, seq_length + past_key_values_length), device=hidden_states.device)
        else:
            attention_mask = attention_mask.to(hidden_states.device)

        if self.use_alibi:
            alibi = build_alibi_tensor(attention_mask, self.num_heads, dtype=hidden_states.dtype)
        else:
            alibi = None

        causal_mask = self._prepare_attn_mask(
            attention_mask,
            input_shape=(batch_size, seq_length),
            past_key_values_length=past_key_values_length,
        )

        for i, (block, layer_past) in enumerate(zip(self.h, past_key_values)):
            if output_hidden_states:
                all_hidden_states = all_hidden_states + (hidden_states,)

            if self.gradient_checkpointing and self.training:
                if use_cache:
                    logger.warning(
                        "`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`..."
                    )
                    use_cache = False

                def create_custom_forward(module):
                    def custom_forward(*inputs):
                        # None for past_key_value
                        return module(*inputs, use_cache=use_cache, output_attentions=output_attentions)

                    return custom_forward

                outputs = torch.utils.checkpoint.checkpoint(
                    create_custom_forward(block),
                    hidden_states,
                    alibi,
                    causal_mask,
                    head_mask[i],
                )
            else:
                outputs = block(
                    hidden_states,
                    layer_past=layer_past,
                    attention_mask=causal_mask,
                    head_mask=head_mask[i],
                    use_cache=use_cache,
                    output_attentions=output_attentions,
                    alibi=alibi,
                )

            hidden_states = outputs[0]
            if use_cache is True:
                presents = presents + (outputs[1],)

            if output_attentions:
                all_self_attentions = all_self_attentions + (outputs[2 if use_cache else 1],)

        # Add last hidden state
        hidden_states = self.ln_f(hidden_states)

        if output_hidden_states:
            all_hidden_states = all_hidden_states + (hidden_states,)

        if presents is not None:
            presents = self._convert_cache_to_standard_format(presents, batch_size)

        if not return_dict:
            return tuple(v for v in [hidden_states, presents, all_hidden_states, all_self_attentions] if v is not None)

        return BaseModelOutputWithPastAndCrossAttentions(
            last_hidden_state=hidden_states,
            past_key_values=presents,
            hidden_states=all_hidden_states,
            attentions=all_self_attentions,
        )

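As a rough standalone illustration of the mask semantics `_prepare_attn_mask` combines (this is a hypothetical reconstruction, not the library's `_make_causal_mask`/`_expand_mask` themselves): with `past` cached positions, query position `i` may attend to absolute positions `0 .. past + i`, padding positions are additionally blocked, and the two boolean masks are merged with `|` exactly as in the combination step above (`True` marking a disallowed key position).

```python
import torch

seq_length, past = 3, 2  # illustrative sizes

# True marks a *disallowed* key position for each of the seq_length queries;
# tril with diagonal=past allows key j whenever j <= i + past
causal_disallowed = ~torch.tril(
    torch.ones(seq_length, seq_length + past, dtype=torch.bool), diagonal=past
)

# Padding mask over the full extended sequence (1 = real token, 0 = padding)
attention_mask = torch.tensor([1, 1, 1, 1, 0])
padding_disallowed = (attention_mask == 0).expand(seq_length, -1)

# Combined with |, as in _prepare_attn_mask
combined = causal_disallowed | padding_disallowed
```

The real helper additionally inserts the broadcastable `[batch_size, 1, ...]` dimensions; this sketch keeps only the 2-D core.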
@add_start_docstrings(
    "The Falcon Model transformer with a language modeling head on top (linear layer with weights tied to the input embeddings).",
    FALCON_START_DOCSTRING,
)
class FalconForCausalLM(FalconPreTrainedModel):
    _tied_weights_keys = ["lm_head.weight"]

    def __init__(self, config: FalconConfig):
        super().__init__(config)
        self.transformer = FalconModel(config)
        self.lm_head = nn.Linear(config.hidden_size, config.vocab_size, bias=False)

        # Initialize weights and apply final processing
        self.post_init()

    def get_output_embeddings(self):
        return self.lm_head

    def set_output_embeddings(self, new_embeddings: torch.Tensor):
        self.lm_head = new_embeddings

    def prepare_inputs_for_generation(
        self,
        input_ids: torch.LongTensor,
        past_key_values: Optional[torch.Tensor] = None,
        attention_mask: Optional[torch.Tensor] = None,
        **kwargs,
    ) -> dict:
        if past_key_values is not None:
            input_ids = input_ids[:, -1:]

        return {
            "input_ids": input_ids,
            "past_key_values": past_key_values,
            "use_cache": kwargs.get("use_cache"),
            "attention_mask": attention_mask,
        }

    @add_start_docstrings_to_model_forward(FALCON_INPUTS_DOCSTRING)
    @add_code_sample_docstrings(
        checkpoint=_CHECKPOINT_FOR_DOC,
        output_type=CausalLMOutputWithCrossAttentions,
        config_class=_CONFIG_FOR_DOC,
    )
    def forward(
        self,
        input_ids: Optional[torch.LongTensor] = None,
        past_key_values: Optional[Tuple[Tuple[torch.Tensor, torch.Tensor], ...]] = None,
        attention_mask: Optional[torch.Tensor] = None,
        head_mask: Optional[torch.Tensor] = None,
        inputs_embeds: Optional[torch.Tensor] = None,
        labels: Optional[torch.Tensor] = None,
        use_cache: Optional[bool] = None,
        output_attentions: Optional[bool] = None,
        output_hidden_states: Optional[bool] = None,
        return_dict: Optional[bool] = None,
    ) -> Union[Tuple[torch.Tensor], CausalLMOutputWithCrossAttentions]:
        r"""
        labels (`torch.LongTensor` of shape `(batch_size, sequence_length)`, *optional*):
            Labels for language modeling. Note that the labels **are shifted** inside the model, i.e. you can set
            `labels = input_ids`. Indices are selected in `[-100, 0, ..., config.vocab_size]`. All labels set to `-100`
            are ignored (masked); the loss is only computed for labels in `[0, ..., config.vocab_size]`.
        """

        return_dict = return_dict if return_dict is not None else self.config.use_return_dict

        transformer_outputs = self.transformer(
            input_ids,
            past_key_values=past_key_values,
            attention_mask=attention_mask,
            head_mask=head_mask,
            inputs_embeds=inputs_embeds,
            use_cache=use_cache,
            output_attentions=output_attentions,
            output_hidden_states=output_hidden_states,
            return_dict=return_dict,
        )
        hidden_states = transformer_outputs[0]

        lm_logits = self.lm_head(hidden_states)

        loss = None
        if labels is not None:
            # Shift so that tokens < n predict n
            shift_logits = lm_logits[..., :-1, :].contiguous()
            shift_labels = labels[..., 1:].contiguous()
            batch_size, seq_length, vocab_size = shift_logits.shape
            # Flatten the tokens
            loss_fct = CrossEntropyLoss()
            loss = loss_fct(
                shift_logits.view(batch_size * seq_length, vocab_size), shift_labels.view(batch_size * seq_length)
            )

        if not return_dict:
            output = (lm_logits,) + transformer_outputs[1:]
            return ((loss,) + output) if loss is not None else output

        return CausalLMOutputWithCrossAttentions(
            loss=loss,
            logits=lm_logits,
            past_key_values=transformer_outputs.past_key_values,
            hidden_states=transformer_outputs.hidden_states,
            attentions=transformer_outputs.attentions,
        )

    def _reorder_cache(
        self, past: Tuple[Tuple[torch.Tensor, torch.Tensor], ...], beam_idx: torch.LongTensor
    ) -> Tuple[Tuple[torch.Tensor, torch.Tensor], ...]:
        """
        This function is used to re-order the `past_key_values` cache if [`~PreTrainedModel.beam_search`] or
        [`~PreTrainedModel.beam_sample`] is called. This is required to match `past_key_values` with the correct
        beam_idx at every generation step.
        Output shares the same memory storage as `past`.
        """

        # Get a copy of `beam_idx` on all the devices where we need those indices.
        device_to_beam_idx = {
            past_state.device: beam_idx.to(past_state.device) for layer_past in past for past_state in layer_past
        }
        reordered_past = tuple(
            (
                layer_past[0].index_select(0, device_to_beam_idx[layer_past[0].device]),
                layer_past[1].index_select(0, device_to_beam_idx[layer_past[0].device]),
            )
            for layer_past in past
        )
        return reordered_past

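The label shifting in the causal-LM loss above can be sketched in isolation (a toy example with made-up sizes, assuming `torch` is available): logits at position `t` are scored against the label at position `t + 1`, so the last logit and the first label are dropped before flattening.

```python
import torch
from torch.nn import CrossEntropyLoss

batch_size, seq_length, vocab_size = 2, 5, 11  # illustrative sizes
lm_logits = torch.randn(batch_size, seq_length, vocab_size)
labels = torch.randint(0, vocab_size, (batch_size, seq_length))

# Tokens < n predict token n: drop the last logit and the first label
shift_logits = lm_logits[..., :-1, :].contiguous()
shift_labels = labels[..., 1:].contiguous()

# Flatten to [batch * (seq - 1), vocab] vs. [batch * (seq - 1)]
loss = CrossEntropyLoss()(
    shift_logits.view(-1, vocab_size), shift_labels.view(-1)
)
```

Setting `labels = input_ids` therefore "just works": the model performs the shift internally, as the docstring above notes.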
@add_start_docstrings(
    """
    The Falcon Model transformer with a sequence classification head on top (linear layer).
    [`FalconForSequenceClassification`] uses the last token in order to do the classification, as other causal models
    (e.g. GPT-1) do.
    Since it does classification on the last token, it requires to know the position of the last token. If a
    `pad_token_id` is defined in the configuration, it finds the last token that is not a padding token in each row. If
    no `pad_token_id` is defined, it simply takes the last value in each row of the batch. Since it cannot guess the
    padding tokens when `inputs_embeds` are passed instead of `input_ids`, it does the same (take the last value in
    each row of the batch).
    """,
    FALCON_START_DOCSTRING,
)
class FalconForSequenceClassification(FalconPreTrainedModel):
    def __init__(self, config: FalconConfig):
        super().__init__(config)
        self.num_labels = config.num_labels
        self.transformer = FalconModel(config)
        self.score = nn.Linear(config.hidden_size, config.num_labels, bias=False)

        # Initialize weights and apply final processing
        self.post_init()

    @add_start_docstrings_to_model_forward(FALCON_INPUTS_DOCSTRING)
    @add_code_sample_docstrings(
        checkpoint=_CHECKPOINT_FOR_DOC,
        output_type=SequenceClassifierOutputWithPast,
        config_class=_CONFIG_FOR_DOC,
    )
    def forward(
        self,
        input_ids: Optional[torch.LongTensor] = None,
        past_key_values: Optional[Tuple[Tuple[torch.Tensor, torch.Tensor], ...]] = None,
        attention_mask: Optional[torch.Tensor] = None,
        head_mask: Optional[torch.Tensor] = None,
        inputs_embeds: Optional[torch.Tensor] = None,
        labels: Optional[torch.Tensor] = None,
        use_cache: Optional[bool] = None,
        output_attentions: Optional[bool] = None,
        output_hidden_states: Optional[bool] = None,
        return_dict: Optional[bool] = None,
    ) -> Union[Tuple[torch.Tensor], SequenceClassifierOutputWithPast]:
        r"""
        labels (`torch.LongTensor` of shape `(batch_size,)`, *optional*):
            Labels for computing the sequence classification/regression loss. Indices should be in `[0, ...,
            config.num_labels - 1]`. If `config.num_labels == 1` a regression loss is computed (Mean-Square loss), If
            `config.num_labels > 1` a classification loss is computed (Cross-Entropy).
        """

        return_dict = return_dict if return_dict is not None else self.config.use_return_dict

        transformer_outputs = self.transformer(
            input_ids,
            past_key_values=past_key_values,
            attention_mask=attention_mask,
            head_mask=head_mask,
            inputs_embeds=inputs_embeds,
            use_cache=use_cache,
            output_attentions=output_attentions,
            output_hidden_states=output_hidden_states,
            return_dict=return_dict,
        )

        hidden_states = transformer_outputs[0]
        logits = self.score(hidden_states)

        if input_ids is not None:
            batch_size = input_ids.shape[0]
        else:
            batch_size = inputs_embeds.shape[0]

        if self.config.pad_token_id is None and batch_size != 1:
            raise ValueError("Cannot handle batch sizes > 1 if no padding token is defined.")
        if self.config.pad_token_id is None:
            sequence_lengths = -1
        else:
            if input_ids is not None:
                sequence_lengths = torch.ne(input_ids, self.config.pad_token_id).sum(dim=-1) - 1
            else:
                sequence_lengths = -1
                logger.warning(
                    f"{self.__class__.__name__} will not detect padding tokens in `inputs_embeds`. Results may be "
                    "unexpected if using padding tokens in conjunction with `inputs_embeds.`"
                )

        pooled_logits = logits[torch.arange(batch_size, device=logits.device), sequence_lengths]

        loss = None
        if labels is not None:
            if self.config.problem_type is None:
                if self.num_labels == 1:
                    self.config.problem_type = "regression"
                elif self.num_labels > 1 and (labels.dtype == torch.long or labels.dtype == torch.int):
                    self.config.problem_type = "single_label_classification"
                else:
                    self.config.problem_type = "multi_label_classification"

            if self.config.problem_type == "regression":
                loss_fct = MSELoss()
                if self.num_labels == 1:
                    loss = loss_fct(pooled_logits.squeeze(), labels.squeeze())
                else:
                    loss = loss_fct(pooled_logits, labels)
            elif self.config.problem_type == "single_label_classification":
                loss_fct = CrossEntropyLoss()
                loss = loss_fct(pooled_logits, labels)
            elif self.config.problem_type == "multi_label_classification":
                loss_fct = BCEWithLogitsLoss()
                loss = loss_fct(pooled_logits, labels)
        if not return_dict:
            output = (pooled_logits,) + transformer_outputs[1:]
            return ((loss,) + output) if loss is not None else output

        return SequenceClassifierOutputWithPast(
            loss=loss,
            logits=pooled_logits,
            past_key_values=transformer_outputs.past_key_values,
            hidden_states=transformer_outputs.hidden_states,
            attentions=transformer_outputs.attentions,
        )

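The `sequence_lengths` computation above picks the index of the last non-padding token in each row (assuming, like the model does, that padding sits at the end of the sequence). A small sketch with invented inputs and a hypothetical `pad_token_id` of 0:

```python
import torch

pad_token_id = 0  # hypothetical; the real value comes from the model config
input_ids = torch.tensor([[5, 6, 7, 0, 0],
                          [8, 9, 0, 0, 0]])
logits = torch.randn(2, 5, 3)  # [batch_size, seq_length, num_labels]

# Count non-padding tokens per row, minus one -> index of the last real token
sequence_lengths = torch.ne(input_ids, pad_token_id).sum(dim=-1) - 1

# Gather one logit vector per row at that position
pooled_logits = logits[torch.arange(2), sequence_lengths]
```

Row 0 has three real tokens so its last real index is 2; row 1 has two, so index 1.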
@add_start_docstrings(
    """
    Falcon Model with a token classification head on top (a linear layer on top of the hidden-states output) e.g. for
    Named-Entity-Recognition (NER) tasks.
    """,
    FALCON_START_DOCSTRING,
)
class FalconForTokenClassification(FalconPreTrainedModel):
    def __init__(self, config: FalconConfig):
        super().__init__(config)
        self.num_labels = config.num_labels

        self.transformer = FalconModel(config)
        if getattr(config, "classifier_dropout", None) is not None:
            classifier_dropout = config.classifier_dropout
        elif getattr(config, "hidden_dropout", None) is not None:
            classifier_dropout = config.hidden_dropout
        else:
            classifier_dropout = 0.1
        self.dropout = nn.Dropout(classifier_dropout)
        self.classifier = nn.Linear(config.hidden_size, config.num_labels)

        # Initialize weights and apply final processing
        self.post_init()

    @add_start_docstrings_to_model_forward(FALCON_INPUTS_DOCSTRING)
    @add_code_sample_docstrings(
        checkpoint=_CHECKPOINT_FOR_DOC,
        output_type=TokenClassifierOutput,
        config_class=_CONFIG_FOR_DOC,
    )
    def forward(
        self,
        input_ids: Optional[torch.LongTensor] = None,
        past_key_values: Optional[Tuple[Tuple[torch.Tensor, torch.Tensor], ...]] = None,
        attention_mask: Optional[torch.Tensor] = None,
        head_mask: Optional[torch.Tensor] = None,
        inputs_embeds: Optional[torch.Tensor] = None,
        labels: Optional[torch.Tensor] = None,
        use_cache: Optional[bool] = None,
        output_attentions: Optional[bool] = None,
        output_hidden_states: Optional[bool] = None,
        return_dict: Optional[bool] = None,
    ) -> Union[Tuple[torch.Tensor], TokenClassifierOutput]:
        r"""
        labels (`torch.LongTensor` of shape `(batch_size,)`, *optional*):
            Labels for computing the sequence classification/regression loss. Indices should be in `[0, ...,
            config.num_labels - 1]`. If `config.num_labels == 1` a regression loss is computed (Mean-Square loss), If
            `config.num_labels > 1` a classification loss is computed (Cross-Entropy).
        """

        return_dict = return_dict if return_dict is not None else self.config.use_return_dict

        transformer_outputs = self.transformer(
            input_ids,
            past_key_values=past_key_values,
            attention_mask=attention_mask,
            head_mask=head_mask,
            inputs_embeds=inputs_embeds,
            use_cache=use_cache,
            output_attentions=output_attentions,
            output_hidden_states=output_hidden_states,
            return_dict=return_dict,
        )

        hidden_states = transformer_outputs[0]
        hidden_states = self.dropout(hidden_states)
        logits = self.classifier(hidden_states)

        loss = None
        if labels is not None:
            batch_size, seq_length = labels.shape
            loss_fct = CrossEntropyLoss()
            loss = loss_fct(
                logits.view(batch_size * seq_length, self.num_labels), labels.view(batch_size * seq_length)
            )

        if not return_dict:
            output = (logits,) + transformer_outputs[2:]
            return ((loss,) + output) if loss is not None else output

        return TokenClassifierOutput(
            loss=loss,
            logits=logits,
            hidden_states=transformer_outputs.hidden_states,
            attentions=transformer_outputs.attentions,
        )


@add_start_docstrings(
    """
    The Falcon Model transformer with a span classification head on top for extractive question-answering tasks like
    SQuAD (linear layers on top of the hidden-states output to compute `span start logits` and `span end logits`).
    """,
    FALCON_START_DOCSTRING,
)
class FalconForQuestionAnswering(FalconPreTrainedModel):
    def __init__(self, config):
        super().__init__(config)
        self.transformer = FalconModel(config)
        self.qa_outputs = nn.Linear(config.hidden_size, 2)

        # Initialize weights and apply final processing
        self.post_init()

    @add_start_docstrings_to_model_forward(FALCON_INPUTS_DOCSTRING)
    def forward(
        self,
        input_ids: Optional[torch.LongTensor] = None,
        attention_mask: Optional[torch.FloatTensor] = None,
        head_mask: Optional[torch.FloatTensor] = None,
        inputs_embeds: Optional[torch.FloatTensor] = None,
        start_positions: Optional[torch.LongTensor] = None,
        end_positions: Optional[torch.LongTensor] = None,
        output_attentions: Optional[bool] = None,
        output_hidden_states: Optional[bool] = None,
        return_dict: Optional[bool] = None,
    ) -> Union[Tuple, QuestionAnsweringModelOutput]:
        r"""
+
start_positions (`torch.LongTensor` of shape `(batch_size,)`, *optional*):
|
1188 |
+
Labels for position (index) of the start of the labelled span for computing the token classification loss.
|
1189 |
+
Positions are clamped to the length of the sequence (`sequence_length`). Position outside of the sequence
|
1190 |
+
are not taken into account for computing the loss.
|
1191 |
+
end_positions (`torch.LongTensor` of shape `(batch_size,)`, *optional*):
|
1192 |
+
Labels for position (index) of the end of the labelled span for computing the token classification loss.
|
1193 |
+
Positions are clamped to the length of the sequence (`sequence_length`). Position outside of the sequence
|
1194 |
+
are not taken into account for computing the loss.
|
1195 |
+
"""
|
1196 |
+
return_dict = return_dict if return_dict is not None else self.config.use_return_dict
|
1197 |
+
|
1198 |
+
outputs = self.transformer(
|
1199 |
+
input_ids,
|
1200 |
+
attention_mask=attention_mask,
|
1201 |
+
head_mask=head_mask,
|
1202 |
+
inputs_embeds=inputs_embeds,
|
1203 |
+
output_attentions=output_attentions,
|
1204 |
+
output_hidden_states=output_hidden_states,
|
1205 |
+
return_dict=return_dict,
|
1206 |
+
)
|
1207 |
+
|
1208 |
+
sequence_output = outputs[0]
|
1209 |
+
|
1210 |
+
logits = self.qa_outputs(sequence_output)
|
1211 |
+
start_logits, end_logits = logits.split(1, dim=-1)
|
1212 |
+
start_logits = start_logits.squeeze(-1).contiguous()
|
1213 |
+
end_logits = end_logits.squeeze(-1).contiguous()
|
1214 |
+
|
1215 |
+
total_loss = None
|
1216 |
+
if start_positions is not None and end_positions is not None:
|
1217 |
+
# If we are on multi-GPU, split add a dimension
|
1218 |
+
if len(start_positions.size()) > 1:
|
1219 |
+
start_positions = start_positions.squeeze(-1)
|
1220 |
+
if len(end_positions.size()) > 1:
|
1221 |
+
end_positions = end_positions.squeeze(-1)
|
1222 |
+
# sometimes the start/end positions are outside our model inputs, we ignore these terms
|
1223 |
+
ignored_index = start_logits.size(1)
|
1224 |
+
start_positions = start_positions.clamp(0, ignored_index)
|
1225 |
+
end_positions = end_positions.clamp(0, ignored_index)
|
1226 |
+
|
1227 |
+
loss_fct = CrossEntropyLoss(ignore_index=ignored_index)
|
1228 |
+
start_loss = loss_fct(start_logits, start_positions)
|
1229 |
+
end_loss = loss_fct(end_logits, end_positions)
|
1230 |
+
total_loss = (start_loss + end_loss) / 2
|
1231 |
+
|
1232 |
+
if not return_dict:
|
1233 |
+
output = (start_logits, end_logits) + outputs[2:]
|
1234 |
+
return ((total_loss,) + output) if total_loss is not None else output
|
1235 |
+
|
1236 |
+
return QuestionAnsweringModelOutput(
|
1237 |
+
loss=total_loss,
|
1238 |
+
start_logits=start_logits,
|
1239 |
+
end_logits=end_logits,
|
1240 |
+
hidden_states=outputs.hidden_states,
|
1241 |
+
attentions=outputs.attentions,
|
1242 |
+
)
|
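The span-loss bookkeeping in `FalconForQuestionAnswering.forward` — clamping out-of-range start/end labels to `ignored_index` so that `CrossEntropyLoss(ignore_index=ignored_index)` skips them — can be sketched without any framework dependency. `clamp_span_positions` below is a hypothetical helper name used only for illustration; it is not part of the model code.

```python
def clamp_span_positions(positions, seq_length):
    """Mirror of `positions.clamp(0, ignored_index)` in the forward pass:
    any label outside [0, seq_length] is mapped into that range, and a label
    equal to seq_length is then ignored by the loss via ignore_index."""
    ignored_index = seq_length
    return [min(max(p, 0), ignored_index) for p in positions]

# A negative label and a label past the sequence end both get clamped;
# in-range labels pass through unchanged.
print(clamp_span_positions([-2, 5, 999], 128))  # [0, 5, 128]
```

The clamped-to-`seq_length` sentinel is what lets the model average `start_loss` and `end_loss` over only the valid examples in the batch.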
special_tokens_map.json
ADDED
@@ -0,0 +1,16 @@
```json
{
  "additional_special_tokens": [
    ">>TITLE<<",
    ">>ABSTRACT<<",
    ">>INTRODUCTION<<",
    ">>SUMMARY<<",
    ">>COMMENT<<",
    ">>ANSWER<<",
    ">>QUESTION<<",
    ">>DOMAIN<<",
    ">>PREFIX<<",
    ">>SUFFIX<<",
    ">>MIDDLE<<"
  ],
  "eos_token": "<|endoftext|>"
}
```
tokenizer.json
ADDED
The diff for this file is too large to render. See raw diff.
tokenizer_config.json
ADDED
@@ -0,0 +1,12 @@
```json
{
  "add_prefix_space": false,
  "eos_token": "<|endoftext|>",
  "model_input_names": [
    "input_ids",
    "attention_mask"
  ],
  "model_max_length": 2048,
  "name_or_path": "tiiuae/falcon_tokenizer",
  "special_tokens_map_file": null,
  "tokenizer_class": "PreTrainedTokenizerFast"
}
```
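The tokenizer configuration above is plain JSON, so its fields can be sanity-checked with the standard library alone. The literal below is a hand-copied subset of the file shown in this commit, not read from the repository itself:

```python
import json

# Subset of tokenizer_config.json, copied from the diff above for illustration.
config_text = '''
{
  "add_prefix_space": false,
  "eos_token": "<|endoftext|>",
  "model_input_names": ["input_ids", "attention_mask"],
  "model_max_length": 2048,
  "tokenizer_class": "PreTrainedTokenizerFast"
}
'''

cfg = json.loads(config_text)
print(cfg["model_max_length"])   # 2048
print(cfg["model_input_names"])  # ['input_ids', 'attention_mask']
```

`model_max_length` is what downstream code consults when truncating inputs, and `model_input_names` determines which tensors the fast tokenizer returns by default.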